Intermediate Platform Engineer

Designing platform components independently, administering Kubernetes in depth, embedding SRE practices, driving meaningful developer experience improvements, and growing junior engineers through mentoring.


Growing into this role

Junior Platform Engineer to Intermediate Platform Engineer



Next step pathway

Intermediate Platform Engineer to Senior Platform Engineer



Performance Calibration

Intermediate Platform Engineer - Performance Levels



Behaviour Tracker

Intermediate Platform Engineer - Performance Tracker


Overview

As an Intermediate Platform Engineer, you work independently on well-understood platform problem domains and are beginning to lead the technical delivery of moderately complex infrastructure work. You design platform components, administer Kubernetes clusters, and drive reliability improvements, bringing both technical depth and a growing awareness of how the platform serves engineering teams across the organisation.

You are expected to go beyond task delivery. You contribute to platform architecture decisions within your domain, mentor graduate and junior engineers, and raise the reliability and developer experience bar for the team. You are developing the breadth to engage meaningfully in infrastructure strategy conversations while retaining the depth to do the hard technical work yourself.

Key Responsibilities

Platform Component Design and Delivery

Design and build platform components independently, including CI/CD pipelines, Kubernetes configurations, and IaC modules.
Make sound technical decisions within your domain, documenting reasoning and trade-offs clearly for review.
Review platform designs proposed by junior engineers and provide constructive, specific feedback.
Identify reliability and performance bottlenecks in existing platform components and lead improvement efforts.
Contribute to the team's standards for IaC structure, Kubernetes conventions, and platform engineering practices.

Reliability and Observability

Design and implement observability frameworks for the platform components you own, covering metrics, logs, and traces.
Investigate incidents thoroughly, identifying root causes, implementing durable fixes, and writing detailed post-mortems.
Define and track SLOs for platform services you are responsible for, using them to prioritise reliability work.
Advocate for reliability as a shared responsibility across engineering teams, not just a platform team concern.
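Defining and tracking SLOs is what makes reliability work easy to prioritise. The minimal sketch below shows that idea by computing how much of an error budget remains in a window. It assumes a request-based availability SLO. The `SloWindow` type, the 99.9% target, the request counts and the 25% threshold are all hypothetical illustrations, not values set by this framework.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts for one SLO window (hypothetical example data)."""
    target: float          # e.g. 0.999 for a 99.9% availability SLO
    total_requests: int
    failed_requests: int


def error_budget_remaining(window: SloWindow) -> float:
    """Return the fraction of the error budget still unspent (negative if overspent).

    The error budget is the number of failures the SLO tolerates:
    (1 - target) * total_requests.
    """
    if window.total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed_failures = (1.0 - window.target) * window.total_requests
    if allowed_failures == 0:
        # A 100% target leaves no budget, so any failure is an overspend.
        return 0.0 if window.failed_requests == 0 else float("-inf")
    return 1.0 - window.failed_requests / allowed_failures


def should_prioritise_reliability(window: SloWindow, threshold: float = 0.25) -> bool:
    """Hypothetical policy: shift effort towards reliability work once less
    than `threshold` of the error budget remains."""
    return error_budget_remaining(window) < threshold


if __name__ == "__main__":
    window = SloWindow(target=0.999, total_requests=1_000_000, failed_requests=600)
    print(f"Budget remaining: {error_budget_remaining(window):.0%}")
    print("Prioritise reliability:", should_prioritise_reliability(window))
```

The threshold rule is deliberately simple. In practice, a team would agree its own error budget policy and decide what happens when the budget runs low.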

Developer Experience

Identify friction points in the developer workflow and lead efforts to reduce them through platform improvements.
Build and maintain internal developer tooling that makes engineering teams more productive.
Gather feedback from engineering teams on platform pain points and translate it into prioritised improvement work.
Document platform capabilities clearly so that engineering teams can self-serve effectively.

Mentoring and Technical Contribution

Provide regular, structured mentoring to graduate and junior engineers through pairing, code review, and coaching on platform practices.
Contribute to technical discussions and architecture reviews, sharing well-reasoned opinions and engaging constructively with alternatives.
Identify opportunities for platform improvement and propose them with supporting rationale and impact assessment.
Support incident response by diagnosing, communicating about, and resolving platform outages or degradations with growing independence.

Role Specific

Independent Platform Design

Design and deliver complete platform components independently, making appropriate technical decisions and documenting trade-offs clearly for the team.

Reliability Engineering

Lead the design and implementation of observability, SLO definition, and incident response practices for your domain, making reliability a first-class engineering concern.

Developer Experience Ownership

Actively improve the experience of engineering teams building on the platform through tooling, documentation, and reduced workflow friction, treating this as a core responsibility alongside individual delivery.

Behaviours

Learning & Growth

Actively develops depth in platform engineering by studying cloud-native patterns, Kubernetes internals, and reliability engineering approaches beyond the immediate needs of current work.
Engages with the broader platform and SRE community through conferences, publications, and open source, bringing relevant insights back to the team.
Reflects on their own technical decisions after delivery, considering what worked well, what they would do differently, and what they would share with others.
Identifies the next level of technical challenge they need to take on and actively pursues it with their TTL or manager.
Develops knowledge of adjacent domains such as security engineering, data engineering, and software delivery to collaborate more effectively across team boundaries.
Seeks out feedback on their platform design decisions from senior engineers, not just confirmation that their approach is acceptable.

Delivery

Delivers moderately complex platform work independently, managing their own scope, estimating accurately, and flagging risk early.
Maintains delivery momentum while juggling mentoring responsibilities, managing their time deliberately to do both well.
Breaks down large infrastructure changes into reviewable increments and delivers them progressively with appropriate rollback plans.
Contributes meaningfully to sprint planning by providing well-reasoned estimates with explicit assumptions and flagging dependencies.
Drives changes to completion, including post-deployment verification, monitoring setup, and runbook updates, not just getting code merged.
Identifies and manages delivery risk proactively, flagging to the TTL when scope or risk is larger than initially understood.

Quality & Craft

Sets a visible quality standard for the team through their own work so others can understand what good platform engineering looks like.
Writes IaC that is modular, reusable, and well-documented so that others can safely extend and maintain it.
Applies consistent testing practices to infrastructure changes, including plan reviews, integration tests, and pre-production validation.
Performs thorough self-review before requesting code review, checking security implications, cost impact, and operational burden.
Designs for operability, ensuring every component they build can be monitored, debugged, and recovered by an engineer unfamiliar with it.
Identifies systemic quality issues in the platform codebase and proposes structured improvements rather than one-off fixes.

Communication

Communicates platform decisions clearly, explaining not just what was decided but why, and what alternatives were considered.
Writes thorough PR descriptions for infrastructure changes that include context, risk assessment, and rollback procedures.
Provides code review feedback that is specific, actionable, and educational, helping junior engineers understand the reasoning.
Surfaces reliability risks and platform concerns to senior engineers and the TTL with clear evidence and impact assessment.
Communicates effectively with product engineering teams, translating platform concepts into terms relevant to their concerns.
Documents platform capabilities, operational procedures, and architectural decisions in accessible places for future engineers.

Collaboration

Builds strong working relationships with product engineering teams, understanding their workflow needs and pain points.
Collaborates actively with security engineering to ensure platform components meet security and compliance requirements.
Contributes substantively to technical discussions, sharing well-reasoned opinions while remaining genuinely open to better ideas.
Invests time in mentoring junior engineers as a core part of the role, not an optional extra.
Works across team boundaries on shared infrastructure components, building trust and establishing clear ownership.
Facilitates knowledge sharing through runbook writing, internal guides, and capturing learnings from incidents.

Ownership

Takes full ownership of the platform components assigned to them, understanding them deeply and maintaining them proactively.
Responds to platform incidents with urgency, investigating, communicating, and resolving them with minimal need for escalation.
Advocates for the health of the platform by raising concerns about technical debt, fragile infrastructure, and reliability risks before they become incidents.
Follows through completely on delivery commitments including monitoring setup, documentation, and knowledge transfer.
Maintains awareness of the cost implications of their infrastructure decisions and actively looks for optimisation opportunities.
Acknowledges and learns openly from technical mistakes, sharing root cause analysis with the wider team where appropriate.

Technical Foundation

Demonstrates strong Terraform and Kubernetes capability applied consistently in production-quality platform work.
Designs infrastructure with appropriate rigour, applying cloud-native patterns with clear documented reasoning and security considerations.
Builds and operates CI/CD pipelines that are reliable, fast, and maintainable by engineering teams without platform team intervention.
Implements observability solutions that provide genuine confidence in platform reliability and make debugging straightforward.
Understands cloud networking and security at sufficient depth to design secure, well-connected infrastructure.
Maintains awareness of the broader platform architecture and how their work fits within it and serves engineering teams.
Keeps up with evolution in the team's tooling and cloud platform services, evaluating new capabilities and adopting them where they add clear value.

Skills

Proficiency in Terraform or equivalent IaC tooling for designing and managing cloud infrastructure across multiple environments.

Strong Kubernetes administration skills, including workload management, RBAC, network policies, and cluster troubleshooting.

Practical experience with CI/CD platform design including pipeline architecture, caching strategies, and build optimisation.

Working knowledge of cloud platform services across compute, networking, storage, and managed services.

Experience implementing observability stacks including metrics, distributed tracing, and structured logging pipelines.

Understanding of SRE practices including SLO definition, error budgets, and reliability-driven prioritisation.

Growing familiarity with platform security practices including secrets management, supply chain security, and least-privilege IAM.

Ability to read and reason about infrastructure costs and identify optimisation opportunities.

AI & Automation Expectations (updated for the AI-augmented era)

AI Augmented Delivery

Uses AI to accelerate IaC development by generating Terraform modules, Kubernetes manifests, and Helm chart configurations, then reviews every generated resource for correctness, security implications, and blast radius before applying.
Leverages AI for runbook generation and documentation, producing operational guides and troubleshooting playbooks, then validates accuracy against actual system behaviour.
Uses AI to explore alternative approaches to reliability problems such as load balancing strategies or autoscaling configurations, then evaluates options against real workload characteristics.
Teaches junior engineers how to use AI safely for infrastructure work, emphasising that AI-generated IaC must be reviewed as carefully as production code due to the blast radius of infrastructure mistakes.
Uses AI to help analyse incident timelines and draft post-mortem documents, then refines them with accurate technical detail and meaningful corrective actions.
Maintains critical awareness that AI is particularly prone to errors in security-sensitive IaC contexts including IAM policies, network rules, and secrets handling, and applies deliberate scrutiny to these areas.