Design and own platform components independently, implement SRE practices, improve developer experience, and begin mentoring graduate engineers.
Kubernetes Administration
Move from deploying applications in Kubernetes to administering the cluster itself. This means understanding node management, network policies, RBAC, admission controllers, resource quotas, and cluster upgrades. The intermediate platform engineer can diagnose cluster-level problems, not just application-level ones.
SRE Practices
Service level objectives are the contract between the platform and its consumers. Learn to define meaningful SLIs, set realistic SLOs, build alerting that pages only for conditions that threaten an SLO, and run blameless post-mortems that produce systemic improvements. SRE is a discipline, not a job title, and platform engineers own it.
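To make the link between an SLO and paging concrete, here is a minimal sketch of error-budget arithmetic. It assumes a request-based availability SLI (successful requests divided by total requests). The class name, function names, and the paging threshold are hypothetical illustrations, not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvailabilitySLO:
    """Hypothetical model of a request-based availability objective."""
    target: float  # e.g. 0.999 means 99.9% of requests should succeed

    def __post_init__(self) -> None:
        if not 0.0 < self.target < 1.0:
            raise ValueError("target must be strictly between 0 and 1")


def budget_remaining(slo: AvailabilitySLO, total: int, failed: int) -> float:
    """Fraction of the error budget still unspent; negative means overspent."""
    if total == 0:
        return 1.0  # no traffic, so nothing has been spent
    allowed_failures = (1.0 - slo.target) * total
    return 1.0 - failed / allowed_failures


def burn_rate(slo: AvailabilitySLO, total: int, failed: int) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    A value of 1.0 spends the budget exactly over the SLO window;
    higher values exhaust it early.
    """
    if total == 0:
        return 0.0
    return (failed / total) / (1.0 - slo.target)


def should_page(slo: AvailabilitySLO, total: int, failed: int,
                threshold: float) -> bool:
    """Page only when the budget is burning faster than the chosen threshold.

    The threshold is an assumption each team sets for itself.
    """
    return burn_rate(slo, total, failed) >= threshold
```

With a 99.9% target, 250 failures in 1,000,000 requests spends a quarter of the budget, so budget_remaining returns roughly 0.75. Paging on burn rate rather than on raw error counts is one way to keep alerts tied to the SLO itself.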
Observability Engineering
Observability is not metric collection - it is the ability to ask arbitrary questions about system behaviour using the signals you have. Build the skills to instrument systems meaningfully, design dashboards that surface real problems, and use distributed tracing to understand behaviour across service boundaries.
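One way to picture "asking arbitrary questions" is to keep wide, structured events and slice them by any field after the fact. The sketch below is a toy illustration under that assumption. The event fields, sample values, and the median_by helper are all hypothetical and do not represent a real observability backend.

```python
from collections import defaultdict
from statistics import median

# Hypothetical structured events: one record per request, with any fields
# that might later be useful for slicing.
events = [
    {"service": "checkout", "tenant": "a", "version": "1.4.2", "duration_ms": 120},
    {"service": "checkout", "tenant": "b", "version": "1.4.3", "duration_ms": 900},
    {"service": "checkout", "tenant": "a", "version": "1.4.3", "duration_ms": 870},
    {"service": "search", "tenant": "a", "version": "2.0.0", "duration_ms": 45},
]


def median_by(records, field):
    """Median request duration grouped by an arbitrary event field."""
    groups = defaultdict(list)
    for record in records:
        groups[record.get(field, "<missing>")].append(record["duration_ms"])
    return {key: median(values) for key, values in groups.items()}
```

Calling median_by(events, "version") and then median_by(events, "tenant") answers two different questions from the same data without new instrumentation. A pre-aggregated metric fixes its dimensions when it is collected and cannot be re-sliced this way.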
Developer Experience
The platform team's primary customer is other engineers. Intermediate platform engineers actively seek feedback from developer teams, identify friction in the development workflow, and implement targeted improvements. A platform that developers love to use is a platform that creates business value.
Platform Component Design
Design platform components - deployment templates, infrastructure modules, shared pipeline libraries - as products. This means stable APIs, backward compatibility, versioning, documentation, and enough flexibility for consumers to use them without forking.
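As an illustration of what versioning and backward compatibility can mean for consumers, here is a minimal sketch of an upgrade check. It assumes the team adopts semantic versioning in MAJOR.MINOR.PATCH form; the text above only says "versioning", so that convention and both function names are assumptions.

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Parse a 'MAJOR.MINOR.PATCH' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def is_compatible_upgrade(current: str, candidate: str) -> bool:
    """True if moving from current to candidate should not break consumers.

    Under semantic versioning, a change in the major version signals a
    breaking change, so only same-major, same-or-newer versions qualify.
    """
    cur = parse_version(current)
    cand = parse_version(candidate)
    return cand[0] == cur[0] and cand >= cur
```

A consumer pinned to 2.3.1 of a shared pipeline library could take 2.4.0 automatically under this rule, while 3.0.0 would require a deliberate migration.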
Skills to Develop
Behaviours to Demonstrate
Use AI to generate Terraform and Kubernetes YAML boilerplate for components you are building, but review every security-sensitive attribute - IAM policies, network rules, secret handling - independently.
Experiment with AI for incident diagnosis by providing log excerpts and error messages and evaluating how reliably it identifies root causes versus plausible-sounding but incorrect hypotheses.
Use AI to help write runbooks and operational documentation from notes or code, treating the output as a first draft requiring expert review for accuracy and completeness.
Explore AI-assisted code review for infrastructure-as-code by asking AI to identify security misconfigurations, missing resource limits, or deviations from team standards in Terraform and Kubernetes manifests.
Evaluate AI coding tools from a platform security perspective - understand what code and secrets might be inadvertently included in prompts and establish team guidelines.
Use AI to accelerate writing developer-facing documentation for platform tools and services, recognising that accuracy is non-negotiable and must be verified before publishing.
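When evaluating AI-assisted review of Kubernetes manifests, it can help to compare its findings against a deterministic check. The sketch below flags containers without CPU or memory limits. It assumes a Deployment-style manifest that has already been parsed into a Python dict (for example by a YAML loader), and the rule that both limits are required is a hypothetical team standard.

```python
def containers_missing_limits(manifest: dict) -> list[str]:
    """Names of containers that lack a CPU or memory limit.

    Expects the Deployment layout: spec.template.spec.containers[].
    Requiring both limits is an assumed team standard, not a Kubernetes rule.
    """
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    missing = []
    for container in pod_spec.get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        if "cpu" not in limits or "memory" not in limits:
            missing.append(container.get("name", "<unnamed>"))
    return missing


# Hypothetical manifest used only to exercise the check.
example_deployment = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": [
        {"name": "api",
         "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}}},
        {"name": "sidecar", "resources": {"limits": {"memory": "64Mi"}}},
        {"name": "worker"},
    ]}}},
}
```

Running containers_missing_limits(example_deployment) reports the sidecar and worker containers. If an AI reviewer misses either, or flags the api container, that is a direct measure of how far its output can be trusted.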
Site Reliability Engineering
The foundational text for SRE practice - SLIs, SLOs, error budgets, on-call design, and incident management - that every intermediate platform engineer must have read.
The Site Reliability Workbook
The practical companion to the SRE book, with worked examples of implementing SRE practices in real organisations.
Kubernetes in Action
The deepest practical treatment of Kubernetes available - essential for moving from using the platform to administering it.
Observability Engineering
The definitive guide to modern observability - instrumentation, high-cardinality data, debugging in production - written by engineers who invented much of the practice.
Understanding the research connecting deployment practices, team culture, and business outcomes helps platform engineers make the case for developer experience investment.
Certified Kubernetes Administrator (CKA)
The CKA validates deep Kubernetes administration skills - the exam is hands-on and genuinely tests operational competence.
Prometheus and Grafana: Complete Monitoring Stack
Builds practical observability implementation skills that are immediately applicable to building platform monitoring infrastructure.
GitOps with ArgoCD
GitOps is becoming the standard deployment model for platform teams, and this builds hands-on skills with Argo CD, a leading implementation.
Platform Engineering Fundamentals
Covers the platform-as-a-product mindset and developer experience principles that differentiate great platform teams from infrastructure teams with a different name.
Review the full expectations for both roles to understand exactly what good looks like at each level.
→ Junior Platform Engineer Archetype
→ Intermediate Platform Engineer Archetype