Intermediate Data Engineer to Senior Data Engineer

🕑 18-36 months · Data Engineering

Lead the technical direction of the data platform, own data quality and reliability end-to-end, mentor your team, and begin shaping data strategy.

🎯 Focus Areas

Data Platform Architecture

Senior data engineers shape the architecture of the data platform, not just implement within it. This means making informed decisions about the medallion architecture layers, choosing between lakehouse and warehouse approaches, designing for schema evolution at scale, and understanding the long-term cost implications of architectural choices.
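
As one illustration of designing for schema evolution, the sketch below flags changes that would break existing readers of a table. It is a minimal example under stated assumptions, not a standard API: the `Column` type, the `breaking_changes` function, and the compatibility rules (no dropped columns, no type changes, new columns must be nullable) are all hypothetical, and a real platform may allow safe type widening or enforce stricter rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical column descriptor used only for this illustration."""
    name: str
    dtype: str
    nullable: bool = True


def breaking_changes(old: list[Column], new: list[Column]) -> list[str]:
    """Return the reasons why `new` would break consumers reading `old`.

    Assumed rules: existing columns may not be dropped or change type,
    and any added column must be nullable so old writers stay valid.
    """
    new_by_name = {c.name: c for c in new}
    old_names = {c.name for c in old}
    problems: list[str] = []

    # Every column consumers already depend on must survive unchanged.
    for col in old:
        candidate = new_by_name.get(col.name)
        if candidate is None:
            problems.append(f"column dropped: {col.name}")
        elif candidate.dtype != col.dtype:
            problems.append(
                f"type changed: {col.name} {col.dtype} -> {candidate.dtype}"
            )

    # Added columns are fine only if they can be left empty.
    for col in new:
        if col.name not in old_names and not col.nullable:
            problems.append(f"new required column: {col.name}")

    return problems
```

A check like this could run in CI against proposed schema changes, so that an empty result means the change is purely additive and anything else triggers a design conversation with consumers.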

Data Reliability Engineering

Reliability in data means consumers can trust the data they use to make decisions. This goes beyond pipeline uptime to cover data freshness SLAs, quality SLOs, incident response, and the organisational processes that make data issues visible and fast to resolve. Own this end-to-end.
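
To make a freshness SLA concrete, the sketch below measures how stale a dataset is and compares that lag with an agreed maximum. The function names and the idea of tracking a single "last successful load" timestamp are assumptions for illustration; in practice the timestamp would come from your orchestrator or table metadata.

```python
from datetime import datetime, timedelta


def freshness_lag(last_loaded_at: datetime, now: datetime) -> timedelta:
    """How stale the dataset is at `now`, based on its latest successful load."""
    return now - last_loaded_at


def meets_freshness_sla(
    last_loaded_at: datetime, now: datetime, max_lag: timedelta
) -> bool:
    """True when the dataset is no staler than the agreed maximum lag."""
    return freshness_lag(last_loaded_at, now) <= max_lag
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the check deterministic and easy to test. Both timestamps should be timezone-aware and in the same zone (UTC is a sensible default).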

Technical Leadership in Data

Senior data engineers define what good looks like for the team - in pipeline design, testing standards, data modelling, and operational practice. This influence comes through code review, design reviews, standards documentation, and the informal daily work of making good engineering visible and valued.

Mentoring and Growing the Team

At the senior level, your most important multiplier is making the engineers around you better. This means structured mentoring, deliberate knowledge sharing, and investing in the growth of intermediate and junior engineers with the same seriousness you apply to technical problems.

Influencing Data Strategy

Senior data engineers have enough context to contribute to data strategy - which domains need better data, where the platform has architectural gaps, and what investment would have the highest impact on consumers. Start making these observations visible to data leadership.

Skills & Behaviours to Develop

Skills to Develop

  • Design a lakehouse architecture for a business domain from scratch, including layer definitions, storage format choices, partitioning strategy, and governance approach.
  • Define and implement data SLOs covering freshness, completeness, accuracy, and consistency, and build the monitoring infrastructure to track them.
  • Evaluate streaming versus batch processing for a given use case and make a documented architectural recommendation with supporting evidence.
  • Lead a significant data platform migration or re-architecture, including impact assessment, sequencing, consumer communication, and rollback planning.
  • Design a data mesh or federated data ownership model for a complex domain and facilitate alignment across multiple data-producing teams.
  • Build and operate a semantic layer or data catalogue entry that makes a domain's data discoverable and trustworthy for self-serve consumers.
  • Define technical interview criteria for data engineering roles and run structured technical assessments reliably.
  • Produce a written data platform strategy document for a domain, covering current state, target state, and the investment required to get there.
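
For the data SLO skill above, one possible way to track attainment is to record each scheduled check as pass or fail and compare the pass rate in a window with the target. The sketch below is a hedged example: `SloReport`, `evaluate_slo`, and the error-budget formula are illustrative choices rather than an established standard, and the same function can be applied per dimension (freshness, completeness, accuracy, consistency).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloReport:
    """Hypothetical summary of one SLO dimension over one reporting window."""
    dimension: str
    target: float            # e.g. 0.99 means 99% of checks must pass
    attained: float          # observed pass rate in the window
    budget_remaining: float  # share of allowed failures unused; negative = overspent

    @property
    def breached(self) -> bool:
        return self.attained < self.target


def evaluate_slo(dimension: str, target: float, outcomes: list[bool]) -> SloReport:
    """Summarise pass/fail check outcomes against an SLO target."""
    if not outcomes:
        raise ValueError("no check outcomes recorded for this window")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")

    passed = sum(outcomes)
    failures = len(outcomes) - passed
    attained = passed / len(outcomes)

    # The error budget is the number of failures the target tolerates.
    allowed_failures = (1 - target) * len(outcomes)
    budget_remaining = 1 - failures / allowed_failures

    return SloReport(dimension, target, attained, budget_remaining)
```

Reporting the remaining budget alongside the pass rate gives the team an early warning: a dimension can still be within its target while having spent most of the failures it is allowed for the window.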

Behaviours to Demonstrate

  • Makes the reliability of data a team engineering concern rather than a personal heroic effort - builds systems and norms that catch and fix issues systematically.
  • Engages data consumers proactively to understand their reliability and quality needs rather than assuming what good enough looks like.
  • Surfaces architectural risks and trade-offs in design discussions rather than just validating decisions already made.
  • Creates reusable pipeline patterns and templates that raise the quality floor for the whole team.
  • Mentors intermediate engineers with structured goals and genuine investment in their growth trajectory.
  • Documents architectural decisions with enough context that engineers who were not in the room understand the reasoning.
  • Speaks credibly about data platform capabilities and limitations to non-technical stakeholders and product teams.

🛠 Hands-On Projects

  1. Design and document a full medallion architecture for a business domain, have it reviewed by peers, and implement at least two layers with production-grade data quality checks.
  2. Define data SLOs for a domain you own, build the monitoring infrastructure, and respond to a real SLO breach with a documented root cause and systemic fix.
  3. Lead a data platform technical spike on a new tool or approach - streaming ingestion, a new warehouse, a data catalogue - and produce a recommendation that influences a real decision.
  4. Mentor two intermediate engineers over a quarter with defined growth goals and a written retrospective at the end.
  5. Build a data product for a high-value use case - complete with documentation, quality guarantees, and a consumer onboarding process.
  6. Write a data platform strategy paper for your engineering leadership covering current state, gaps, and a prioritised investment recommendation.
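
For the first project, quality checks between medallion layers might look something like the sketch below, which promotes raw (bronze) records to a cleaned (silver) layer and quarantines anything that fails a rule. It uses plain Python rather than a specific engine, and the "orders" records, the rule names, and the `_failed_checks` field are all hypothetical.

```python
from typing import Callable

Record = dict[str, object]
Check = Callable[[Record], bool]

# Hypothetical rules for an illustrative "orders" domain.
SILVER_CHECKS: dict[str, Check] = {
    "order_id present": lambda r: bool(r.get("order_id")),
    "amount is non-negative number": lambda r: (
        isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0
    ),
}


def promote_to_silver(bronze: list[Record]) -> tuple[list[Record], list[Record]]:
    """Split bronze records into silver rows and quarantined rows.

    Quarantined rows keep their original fields plus the names of the
    checks they failed, so issues can be investigated instead of lost.
    """
    silver: list[Record] = []
    quarantine: list[Record] = []
    for record in bronze:
        failed = [name for name, check in SILVER_CHECKS.items() if not check(record)]
        if failed:
            quarantine.append({**record, "_failed_checks": failed})
        else:
            silver.append(record)
    return silver, quarantine
```

Quarantining rather than silently dropping rejected rows keeps failures visible, which also provides the raw material for a root cause analysis when a quality SLO is breached.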

AI Literacy for This Transition

AI for data platform intelligence and governance

  1. Evaluate where AI and ML pipelines fit in your data platform architecture - how models are trained, served, versioned, and monitored using the same data infrastructure you own.
  2. Use AI to accelerate data exploration and anomaly hypothesis generation, but establish team standards for validating AI-suggested findings before acting on them.
  3. Develop your team's position on using AI coding tools for pipeline code - what types of transformation logic benefit from AI assistance and what requires more careful human authorship.
  4. Explore AI-assisted data cataloguing and metadata generation as a way to accelerate discoverability - evaluate the accuracy of AI-generated descriptions before publishing them.
  5. Build an understanding of how AI features in BI and analytics tools affect data consumer trust - when AI-generated insights are presented alongside your data, quality failures have amplified consequences.
  6. Develop a governance position on what data can be sent to external AI tools for analysis - this is a data engineering concern as much as a security one.
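
A governance position on external AI tools can be enforced in code as well as in policy. The sketch below shows one possible approach, an allowlist applied before any rows leave the platform. The column names and the `prepare_for_external_tool` function are hypothetical, and a real policy would likely also cover row-level restrictions, contractual terms, and audit logging.

```python
# Hypothetical allowlist: only columns explicitly approved for external sharing.
ALLOWED_COLUMNS = frozenset({"order_id", "order_date", "amount", "country"})


def prepare_for_external_tool(
    rows: list[dict[str, object]],
    allowed: frozenset[str] = ALLOWED_COLUMNS,
) -> tuple[list[dict[str, object]], list[str]]:
    """Strip every column that is not on the allowlist.

    Returns the filtered rows plus the sorted names of the columns that
    were removed, so the caller can log what was withheld.
    """
    kept = [{k: v for k, v in row.items() if k in allowed} for row in rows]
    blocked = sorted({k for row in rows for k in row if k not in allowed})
    return kept, blocked
```

An allowlist fails closed: a newly added column stays private until someone deliberately approves it, whereas a blocklist of known sensitive fields would let it through by default.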

📚 Recommended Reading

Data Management at Scale

Piethein Strengholt

The most practical treatment of data mesh, data governance, and scaled data architecture - directly applicable to the decisions a senior data engineer faces.

Designing Data-Intensive Applications

Martin Kleppmann

The definitive reference for distributed data systems. Every senior data engineer should have read it and be able to reason from it in architecture discussions.

The Data Warehouse Toolkit

Ralph Kimball and Margy Ross

Dimensional modelling remains the foundation of analytical data design and a senior engineer needs deep fluency in these patterns to evaluate and evolve data models.

Fundamentals of Data Engineering

Joe Reis and Matt Housley

Provides the conceptual framework for the full data engineering lifecycle that a senior engineer needs to reason about platform strategy.

Staff Engineer: Leadership Beyond the Management Track

Will Larson

The transition to senior level is as much about technical leadership and influence as it is about technical depth - this book is the clearest map of that territory.

🎓 Courses & Resources

Databricks Certified Data Engineer Professional

Databricks Academy

The professional certification demands a comprehensive understanding of lakehouse architecture, Delta Lake, and production data engineering at scale.

Data Mesh Fundamentals

Various / DataStax, Thoughtworks

Understanding the data mesh paradigm is essential for senior data engineers shaping how data ownership and architecture should evolve.

Streaming with Kafka and Flink

Confluent Developer

Senior engineers need to make informed decisions about when to introduce streaming - this builds the depth to do so confidently.

Cloud Data Architecture

A Cloud Guru

Platform-specific architecture knowledge across cloud-native data services is essential for senior-level platform design decisions.

📋 Role Archetypes

Review the full expectations for both roles to understand exactly what good looks like at each level.

→ Intermediate Data Engineer Archetype
→ Senior Data Engineer Archetype