Data Engineering Track

Intermediate Data Engineer

SFIA 3-4

Designing and delivering pipelines independently, shaping data quality frameworks, driving performance improvements, and beginning to grow junior engineers through mentoring and technical leadership.

Overview

As an Intermediate Data Engineer, you work independently on well-understood problem domains and are beginning to lead the technical delivery of moderately complex data work. You design pipelines, shape data models, and drive data quality improvements - bringing both technical depth and growing awareness of the business context your systems serve.

You are expected to go beyond task delivery. You contribute to technical direction within your domain, mentor graduate and junior engineers, and raise the quality bar for the team. You are developing the breadth to engage meaningfully in architecture conversations while retaining the depth to do the hard technical work yourself.

Key Responsibilities

Pipeline Design and Delivery

  • Design and build data pipelines independently from ingestion through transformation to serving layers.
  • Make sound technical decisions within your domain, documenting reasoning clearly for review.
  • Review pipeline designs proposed by junior engineers and provide constructive, specific feedback.
  • Identify performance bottlenecks in existing pipelines and lead optimisation efforts.
  • Contribute to the team's standards for pipeline structure, data modelling conventions, and code quality.

Data Quality Leadership

  • Design and implement data quality frameworks including tests, assertions, monitoring, and alerting for your domain.
  • Investigate data quality incidents thoroughly, identifying root causes and implementing durable fixes.
  • Develop documentation and runbooks for data quality processes so that others can maintain them.
  • Advocate for data quality as a shared responsibility across the team, not just a task for one person.
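The shape of such a framework can be sketched in miniature. This is an illustrative sketch only, not a prescribed implementation; the check names, the `orders` feed, and the field names are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class QualityCheck:
    """A named assertion over a batch of rows (hypothetical structure)."""
    name: str
    predicate: Callable[[list[dict]], bool]

def run_checks(rows: list[dict], checks: list[QualityCheck]) -> dict[str, bool]:
    """Run every check against the batch and return a name -> pass/fail map."""
    return {check.name: check.predicate(rows) for check in checks}

# Example checks for a hypothetical `orders` feed.
checks = [
    QualityCheck("not_empty", lambda rows: len(rows) > 0),
    QualityCheck("no_null_ids", lambda rows: all(r["order_id"] is not None for r in rows)),
    QualityCheck("non_negative_amounts", lambda rows: all(r["amount"] >= 0 for r in rows)),
]

batch = [
    {"order_id": 1, "amount": 25.0},
    {"order_id": 2, "amount": 0.0},
]
results = run_checks(batch, checks)
```

In practice tools such as dbt tests or Great Expectations provide this structure off the shelf; the point is that checks are named, declarative, and runnable by anyone on the team.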

Mentoring and Growing Others

  • Provide regular, structured mentoring to graduate and junior engineers through pairing, code review, and coaching.
  • Create space for junior engineers to take on stretching tasks with appropriate support.
  • Give feedback that is specific, actionable, and balanced, recognising effort as well as identifying improvement areas.
  • Help junior engineers develop good engineering habits early around testing, documentation, and clear communication.

Collaboration and Technical Contribution

  • Collaborate effectively with data analysts, business stakeholders, and platform engineers to understand requirements and deliver appropriate solutions.
  • Contribute to technical discussions and architecture reviews, sharing well-reasoned opinions and engaging constructively with alternatives.
  • Identify opportunities for technical improvement in the team's data platform and propose them with supporting rationale.
  • Support incident response for data issues, diagnosing, communicating, and resolving data outages or degradations.

Role Specific

Independent Pipeline Design

Design and deliver complete data pipeline solutions from ingestion through serving, making appropriate technical decisions and documenting trade-offs clearly.

Data Quality Frameworks

Lead the design and implementation of data quality testing and monitoring approaches within your domain, making data quality an engineering concern rather than an afterthought.

Technical Mentoring

Actively develop the capabilities of graduate and junior engineers through structured pairing, code review, and coaching - investing in team capability as a core responsibility alongside individual delivery.

Behaviours

Learning & Growth

  • Actively develops depth in data engineering by studying advanced SQL patterns, distributed compute, and data architecture approaches beyond the immediate needs of current work.
  • Engages with the broader data engineering community through conferences, publications, and open source, bringing relevant insights back to the team.
  • Reflects on their own technical decisions after delivery, considering what worked well, what they would do differently, and what they would share with others.
  • Identifies the next level of technical challenge they need to take on and actively pursues it with their TTL or manager.
  • Develops knowledge of adjacent domains such as analytics engineering, data governance, and platform engineering to collaborate more effectively across team boundaries.
  • Seeks out feedback on their technical decisions from senior engineers, not just confirmation that their approach is acceptable.

Delivery

  • Delivers moderately complex data pipeline work independently, managing their own scope, estimating accurately, and flagging risk early.
  • Maintains delivery momentum while juggling mentoring responsibilities, managing their time deliberately to do both well.
  • Breaks down large pipeline tasks into reviewable increments and delivers them progressively rather than in large single PRs.
  • Contributes meaningfully to sprint planning by providing well-reasoned estimates with explicit assumptions and flagging dependencies.
  • Drives tasks to completion including post-deployment verification, monitoring setup, and documentation, not just code merged.
  • Identifies and manages delivery risk proactively, flagging to TTL when scope is larger than initially understood.

Quality & Craft

  • Sets a visible quality standard for the team through their own work so others can understand what good looks like.
  • Writes data quality tests that are meaningful and catch real failure modes rather than providing cosmetic coverage.
  • Refactors proactively within their own delivery, improving existing code when passing through it rather than accumulating debt.
  • Performs thorough self-review before requesting code review, checking logic, edge cases, performance, and documentation.
  • Writes code that can be maintained by engineers other than themselves, designing for readability and long-term maintainability.
  • Identifies systemic quality issues in the codebase and proposes structured improvements rather than one-off fixes.
  • Champions good data modelling discipline through appropriate normalisation, clear naming conventions, and documented business rules.

Communication

  • Communicates technical decisions clearly, explaining not just what was decided but why, and what alternatives were considered.
  • Writes thorough PR descriptions that include context, testing evidence, and guidance for reviewers.
  • Provides code review feedback that is specific, actionable, and educational, helping junior engineers understand the reasoning.
  • Surfaces data quality risks and platform concerns to senior engineers and the TTL with clear evidence and impact assessment.
  • Communicates effectively with data analysts and stakeholders, translating technical concepts into terms relevant to their audience.
  • Documents important decisions, data model rationale, and pipeline design choices in accessible places for future engineers.

Collaboration

  • Builds strong working relationships with data analysts, understanding how data is consumed and what quality guarantees matter most.
  • Collaborates actively with platform engineers on the infrastructure and tooling that underpins the data platform.
  • Contributes substantively to technical discussions, sharing well-reasoned opinions while remaining genuinely open to better ideas.
  • Invests time in mentoring junior engineers as a core part of the role, not an optional extra.
  • Works across team boundaries where data flows connect multiple teams, building trust and establishing clear ownership.
  • Facilitates knowledge sharing through short technical talks, internal guides, and capturing learnings from incidents.

Ownership

  • Takes full ownership of the data pipelines and domains assigned to them, understanding them deeply and maintaining them proactively.
  • Responds to data quality incidents with urgency, investigating, communicating, and resolving with minimal escalation needed.
  • Advocates for the health of the data platform by raising concerns about technical debt, fragile pipelines, and risks before they become incidents.
  • Follows through completely on delivery commitments including monitoring setup, documentation, and knowledge transfer.
  • Takes responsibility for the quality of junior engineers' output when supporting them, owning the mentoring relationship and not just the advice.
  • Acknowledges and learns openly from technical mistakes, sharing root cause analysis with the wider team where appropriate.

Technical Foundation

  • Demonstrates advanced SQL and Python capability applied consistently in production-quality pipeline work.
  • Designs data models with appropriate rigour, applying dimensional modelling or lakehouse patterns with clear documented reasoning.
  • Builds and operates orchestration pipelines that are observable, recoverable, and maintainable by others.
  • Implements data quality frameworks that provide genuine confidence in data reliability for downstream consumers.
  • Understands query performance at sufficient depth to diagnose and resolve warehouse-level performance problems.
  • Maintains awareness of the broader data platform architecture and how their work fits within it.
  • Keeps up with evolution in the team's tooling and platform, adapting practices as tools and patterns mature.

Skills

  • Advanced SQL including window functions, recursive CTEs, query optimisation, and explain plan analysis in a cloud data warehouse environment.
  • Proficient Python for pipeline development including packaging, testing, error handling, and async patterns.
  • Strong working knowledge of a data orchestration platform such as Airflow, dbt, or Prefect, including DAG design and observability.
  • Practical experience with dimensional modelling and data vault or lakehouse patterns.
  • Experience designing and implementing data quality testing frameworks such as dbt tests or Great Expectations.
  • Ability to read and reason about query execution plans and identify performance improvement opportunities.
  • Growing familiarity with data platform architecture, including medallion architecture or equivalent patterns.
  • Understanding of streaming versus batch trade-offs and when each is appropriate.
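As a small illustration of the window-function skill above, a running total per customer can be computed directly in SQL. This sketch uses an in-memory SQLite database purely as a stand-in for a cloud warehouse, and the `orders` table and its columns are invented for the example:

```python
import sqlite3

# In-memory database standing in for a cloud warehouse (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("alice", "2024-01-01", 10.0),
        ("alice", "2024-01-05", 15.0),
        ("bob", "2024-01-02", 7.0),
    ],
)

# Window function: running total of spend per customer, ordered by date.
rows = conn.execute(
    """
    SELECT customer,
           order_date,
           SUM(amount) OVER (
               PARTITION BY customer
               ORDER BY order_date
           ) AS running_total
    FROM orders
    ORDER BY customer, order_date
    """
).fetchall()
```

The `PARTITION BY` clause restarts the accumulation for each customer, which is the kind of pattern an intermediate engineer should be able to write, explain, and reason about from an execution-plan perspective.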

AI & Automation Expectations

AI Augmented Delivery

  • Uses AI to accelerate pipeline development by generating transformation logic, data quality test suites, and orchestration code while applying expert judgement to validate correctness and performance.
  • Leverages AI for documentation generation including schema documentation, pipeline lineage descriptions, and data dictionaries, then refines outputs to match actual data behaviour.
  • Uses AI to explore optimisation strategies for slow queries by generating alternative approaches and explain plan interpretations, then benchmarks options against real data.
  • Teaches junior engineers how to use AI effectively for data engineering tasks: writing context-rich prompts, validating outputs, and refusing to accept plausible-looking but incorrect SQL.
  • Uses AI to help draft data quality test suites, then reviews generated tests critically to verify they would detect the actual failure modes they claim to cover.
  • Treats AI assistance as a productivity multiplier that requires expert oversight, not a replacement for deep understanding of data semantics, business rules, and platform behaviour.