
Senior Data Engineer to Data Architect

🕑 24-48 months · Data Engineering

Develop organisation-wide data architecture authority by mastering lakehouse design at scale, data governance, and technology strategy, and by building the credibility to make hard data architecture decisions stick.

🎯 Focus Areas

Organisation-Wide Data Architecture

A data architect is accountable for the coherence of the entire data platform - how data is ingested, stored, transformed, governed, and consumed across all domains. This requires a different mode of thinking: you are no longer optimising individual pipelines but shaping the architectural patterns, constraints, and guidelines that all data engineering work builds on.

Data Governance and Cataloguing

Governance is not compliance overhead - it is the engineering discipline that makes data trustworthy and useful at scale. A data architect owns the governance model: data ownership, classification, access control, lineage, cataloguing, and the processes that keep these things current as the organisation changes.
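To make the governance model concrete, the sketch below treats ownership, classification, and lineage as data the platform can reason over. It assumes a hypothetical four-level classification taxonomy and an in-memory catalogue; in practice this metadata lives in a catalogue product. The rule it illustrates carries over: a derived dataset is at least as sensitive as anything in its lineage, so a downstream table declared at too low a level can be detected automatically.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Classification(IntEnum):
    # Hypothetical four-level taxonomy, ordered from least to most sensitive.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass
class CatalogueEntry:
    # One dataset as the governance model sees it: an accountable owner,
    # a declared classification, and the datasets it was derived from.
    name: str
    owner: str
    classification: Classification
    upstream: list = field(default_factory=list)


def can_read(clearance: Classification, entry: CatalogueEntry) -> bool:
    # Access control: a consumer's clearance must meet the dataset's classification.
    return clearance >= entry.classification


def effective_classification(name: str, catalogue: dict) -> Classification:
    # Lineage-aware classification: take the maximum of the declared level and
    # every upstream dataset's effective level. Assumes lineage is acyclic.
    entry = catalogue[name]
    upstream_levels = [effective_classification(u, catalogue) for u in entry.upstream]
    return max([entry.classification, *upstream_levels])


def misclassified(catalogue: dict) -> list:
    # Datasets whose declared classification is lower than their lineage implies.
    return [
        name
        for name, entry in catalogue.items()
        if effective_classification(name, catalogue) > entry.classification
    ]


# Hypothetical datasets and owning domains, for illustration only.
catalogue = {
    "crm.customers": CatalogueEntry("crm.customers", "sales-domain", Classification.RESTRICTED),
    "web.sessions": CatalogueEntry("web.sessions", "digital-domain", Classification.INTERNAL),
    "gold.customer_360": CatalogueEntry(
        "gold.customer_360",
        "analytics-domain",
        Classification.INTERNAL,
        upstream=["crm.customers", "web.sessions"],
    ),
}

print(misclassified(catalogue))  # → ['gold.customer_360']
```

Running a check like this on every catalogue change turns the classification policy from a document into an enforced constraint, which is what keeps governance current as the organisation changes.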

Lakehouse and Medallion Architecture at Scale

At the architect level you need to make the foundational platform decisions - storage formats, compute separation, layer semantics (what the bronze, silver, and gold layers each guarantee), partition strategies, and open table formats such as Delta Lake or Apache Iceberg - and ensure they remain coherent as the platform grows. These decisions have multi-year consequences and require depth of understanding, not just familiarity.
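Layer semantics can be illustrated with a minimal medallion flow. Plain Python lists stand in for what would be Delta Lake or Iceberg tables, and the order schema, the validation rules, and the choice of event date as the partition key are all assumptions for illustration. What matters is the contract each layer makes: bronze keeps everything as ingested, silver is typed, validated, and deduplicated, and gold is shaped for a business question.

```python
from collections import defaultdict
from datetime import datetime

# Bronze: raw records exactly as ingested, including corrections and bad rows.
bronze = [
    {"order_id": "o1", "amount": "19.99", "ts": "2024-03-01T10:00:00", "country": "GB"},
    {"order_id": "o1", "amount": "21.99", "ts": "2024-03-01T11:00:00", "country": "GB"},  # later correction
    {"order_id": "o2", "amount": "5.00", "ts": "2024-03-02T09:30:00", "country": "DE"},
    {"order_id": "o3", "amount": "oops", "ts": "2024-03-02T12:00:00", "country": "DE"},  # malformed
]


def to_silver(rows):
    # Silver: cast to proper types, quarantine rows that fail validation,
    # and keep only the latest version of each order.
    latest, rejected = {}, []
    for row in rows:
        try:
            rec = {
                "order_id": row["order_id"],
                "amount": float(row["amount"]),
                "ts": datetime.fromisoformat(row["ts"]),
                "country": row["country"],
            }
        except (KeyError, ValueError):
            rejected.append(row)  # kept for investigation rather than silently dropped
            continue
        current = latest.get(rec["order_id"])
        if current is None or rec["ts"] > current["ts"]:
            latest[rec["order_id"]] = rec
    return list(latest.values()), rejected


def partition_key(rec):
    # Hypothetical partition strategy: event date, a common choice when
    # most queries filter on time.
    return rec["ts"].date().isoformat()


def to_gold(silver_rows):
    # Gold: a business-level aggregate, here revenue per day per country.
    revenue = defaultdict(float)
    for rec in silver_rows:
        revenue[(partition_key(rec), rec["country"])] += rec["amount"]
    return dict(revenue)


silver, rejected = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # → {('2024-03-01', 'GB'): 21.99, ('2024-03-02', 'DE'): 5.0}
```

Because bronze is never rewritten, silver and gold can be rebuilt when validation rules change, which is one reason the layer boundaries are worth defining once at the platform level.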

Data Technology Strategy

A data architect evaluates, selects, and retires data technologies for the organisation. This means building structured evaluation frameworks, running credible proof-of-concept work, understanding vendor ecosystem health, and making technology choices that the organisation can sustain operationally over time.
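A structured evaluation framework is often a weighted scoring matrix with hard gates. The sketch below uses the criteria this pathway lists for technology evaluation (capability, operability, cost, ecosystem, strategic fit); the weights, the 1-5 scale, the minimum gate, and the candidate scores are all hypothetical and would normally be agreed with stakeholders before any proof-of-concept results come in.

```python
# Hypothetical weights; they sum to 1 so totals stay on the 1-5 scale.
WEIGHTS = {
    "capability": 0.30,
    "operability": 0.25,
    "cost": 0.15,
    "ecosystem": 0.15,
    "strategic_fit": 0.15,
}
MIN_SCORE = 2  # hypothetical gate: scoring below this on any criterion disqualifies
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9


def weighted_score(scores):
    # Weighted average across all criteria; refuse to score incomplete evaluations.
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"unscored criteria: {sorted(missing)}")
    return sum(weight * scores[criterion] for criterion, weight in WEIGHTS.items())


def rank(candidates):
    # Return (name, score) for candidates that pass every gate, best first.
    ranked = []
    for name, scores in candidates.items():
        total = weighted_score(scores)
        if min(scores[c] for c in WEIGHTS) < MIN_SCORE:
            continue  # gated out regardless of weighted total
        ranked.append((name, total))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical candidates scored 1-5 by the evaluation panel.
candidates = {
    "option_a": {"capability": 5, "operability": 2, "cost": 3, "ecosystem": 4, "strategic_fit": 4},
    "option_b": {"capability": 4, "operability": 4, "cost": 3, "ecosystem": 3, "strategic_fit": 4},
    "option_c": {"capability": 5, "operability": 1, "cost": 5, "ecosystem": 5, "strategic_fit": 5},
}

for name, score in rank(candidates):
    print(f"{name}: {score:.2f}")
```

The gate is the design choice worth noticing: option_c has the highest weighted total (4.0), but it is nearly impossible to operate, and without the gate it would win on averages alone.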

Trusted Architectural Voice

Architectural authority in data is earned through a track record of sound decisions, clear communication, and genuine engagement with the teams who live with your architectural choices. The best data architects spend significant time with delivery teams, understanding constraints from the ground up.

Skills & Behaviours to Develop

Skills to Develop

  • Design a complete data platform architecture for a complex organisation, covering ingestion, storage, transformation, governance, and consumption layers with documented trade-offs.
  • Build and maintain a data governance framework - ownership model, classification taxonomy, access patterns, lineage requirements - and drive its adoption across engineering and business domains.
  • Evaluate and recommend a data platform technology stack with structured criteria, covering capability, operability, cost, ecosystem, and strategic fit.
  • Design a data mesh architecture - defining domain boundaries, data product contracts, self-serve infrastructure requirements, and federated governance.
  • Produce C4-style architectural documentation for the data platform that serves both technical and executive audiences.
  • Lead an architectural review process for significant data platform changes and ensure it improves quality without becoming a bottleneck.
  • Build and communicate a multi-year data platform technology roadmap connected to business capability goals.
  • Design data architecture for regulatory compliance scenarios - GDPR, data residency, retention, and right to erasure - as a first-class engineering concern.
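A data product contract is easiest to reason about when it is executable. The sketch assumes a hypothetical contract made of an owner, a column-to-type schema, and a freshness limit, and checks one published batch against it; real contract tooling adds versioning, semantic checks, and compatibility rules that are out of scope here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class DataProductContract:
    # Hypothetical minimal contract a domain team publishes for its data product.
    name: str
    owner: str
    schema: dict              # column name -> expected Python type
    max_staleness: timedelta  # freshness promise to consumers


def violations(contract, rows, last_updated, now):
    # Check one published batch against the contract; an empty list means compliant.
    problems = []
    age = now - last_updated
    if age > contract.max_staleness:
        problems.append(f"stale: last updated {age} ago")
    for i, row in enumerate(rows):
        missing = contract.schema.keys() - row.keys()
        if missing:
            problems.append(f"row {i}: missing {sorted(missing)}")
        for col, expected in contract.schema.items():
            if col in row and not isinstance(row[col], expected):
                problems.append(
                    f"row {i}: {col} is {type(row[col]).__name__}, expected {expected.__name__}"
                )
    return problems


orders_contract = DataProductContract(
    name="sales.orders",
    owner="sales-domain",
    schema={"order_id": str, "amount": float},
    max_staleness=timedelta(hours=24),
)
now = datetime(2024, 3, 2, 12, 0, tzinfo=timezone.utc)

good = violations(orders_contract, [{"order_id": "o1", "amount": 19.99}], now - timedelta(hours=1), now)
bad = violations(
    orders_contract,
    [{"order_id": "o1", "amount": "19.99"}, {"amount": 5.0}],
    now - timedelta(hours=30),
    now,
)
print(good)  # → []
for problem in bad:
    print(problem)
```

Running the same check in the producing domain's pipeline and in the self-serve platform's publish step is one way federated governance can be enforced without a central team reviewing every change.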

Behaviours to Demonstrate

  • Engages with data engineering teams and data consumers directly to understand the real problems before proposing architectural solutions.
  • Makes architectural trade-offs explicit and invites challenge rather than presenting recommendations as foregone conclusions.
  • Updates architectural positions when evidence changes - intellectual honesty about prior decisions that did not work as expected.
  • Builds relationships with data engineering, analytics, product, and compliance stakeholders as a core part of the job, not a distraction from it.
  • Produces architecture documentation that engineers and leaders actually use to make decisions.
  • Sponsors data platform experimentation - proof-of-concept work and architectural spikes - rather than relying purely on theoretical reasoning.
  • Communicates complex data platform trade-offs to non-technical executives without losing fidelity or resorting to hand-waving.

🛠 Hands-On Projects

1. Produce a current-state and target-state data architecture for your organisation, covering all platform layers, identifying architectural debt and risks, and presenting it to engineering leadership.
2. Design a data governance framework for a business domain, including ownership assignment, classification, access patterns, and lineage - and drive its implementation.
3. Run a structured evaluation of a strategic data platform technology - a new table format, a data catalogue, a streaming platform - producing a written recommendation that influences a real decision.
4. Design a data mesh domain for a business area, defining data product contracts, ownership boundaries, and the self-serve infrastructure requirements, and engage the affected teams in validation.
5. Build a data architecture technology radar - categorising the data platform tools and technologies in use by adoption recommendation - and present and socialise it across engineering.
6. Design the data architecture for a regulatory compliance scenario such as GDPR, documenting the data flows, classification, retention, and erasure implementation.
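Treating retention and erasure as first-class engineering concerns means they are code paths rather than manual tickets. The sketch assumes hypothetical retention periods per dataset and that every row carries a subject_id; in practice the catalogue and lineage determine which datasets are in scope. One point the sketch cannot show: in Delta Lake or Apache Iceberg, a delete writes a new table version while older data files remain reachable through time travel until they are physically removed (for example with Delta's VACUUM or Iceberg snapshot expiry), so an erasure design has to include that step.

```python
from datetime import datetime, timedelta

# Hypothetical retention periods per dataset; real values come from legal and policy review.
RETENTION = {
    "web_events": timedelta(days=90),
    "orders": timedelta(days=365 * 7),
}


def apply_retention(dataset, rows, now):
    # Keep only rows still inside the dataset's retention window.
    cutoff = now - RETENTION[dataset]
    return [r for r in rows if r["created_at"] >= cutoff]


def erase_subject(datasets, subject_id):
    # Right to erasure: remove every row for one data subject across all in-scope
    # datasets. Returns the cleaned datasets plus per-dataset deletion counts for
    # the audit trail (counts only, never the erased data itself).
    cleaned, audit = {}, {}
    for name, rows in datasets.items():
        kept = [r for r in rows if r["subject_id"] != subject_id]
        cleaned[name] = kept
        audit[name] = len(rows) - len(kept)
    return cleaned, audit


now = datetime(2024, 6, 1)
datasets = {
    "web_events": [
        {"subject_id": "u1", "created_at": datetime(2024, 5, 30)},
        {"subject_id": "u2", "created_at": datetime(2024, 1, 1)},  # older than 90 days
    ],
    "orders": [
        {"subject_id": "u1", "created_at": datetime(2023, 1, 1)},
        {"subject_id": "u2", "created_at": datetime(2024, 2, 1)},
    ],
}

recent_events = apply_retention("web_events", datasets["web_events"], now)
cleaned, audit = erase_subject(datasets, "u1")
print(len(recent_events))  # → 1
print(audit)  # → {'web_events': 1, 'orders': 1}
```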

AI Literacy for This Transition

AI as a data platform component and governance challenge

1. Develop an architectural position on making AI and ML workloads first-class citizens of your data platform - feature stores, training pipelines, model registries, serving infrastructure, and monitoring - and document it as a platform standard.
2. Design the data governance framework for AI systems - what data is used for training, how consent and provenance are tracked, and how models are audited for data-related compliance requirements.
3. Evaluate the architectural implications of large language model integration in your data platform - retrieval-augmented generation patterns, embedding storage, vector databases, and how these interact with your existing data architecture.
4. Build a position on AI-assisted data catalogue population - accuracy requirements, human review workflows, and the governance process for AI-generated metadata - and present it to stakeholders.
5. Stay current on data-related AI regulation, including the EU AI Act, data residency requirements for AI training, and copyright questions around training data, as these directly shape architectural constraints.
6. Develop the organisational case for responsible AI data practices by synthesising regulatory requirements, technical risk, and business consequence into a credible architectural recommendation.
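Tracking consent and provenance for training data can start as simply as refusing to select records whose consent does not cover training, and recording what was selected. The sketch assumes each record carries a hypothetical source dataset and set of consented purposes, and that "model_training" is the purpose name your consent model uses; it produces a manifest so an audit can later answer which data trained a given model.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Record:
    record_id: str
    source: str                    # provenance: the dataset this record came from
    consented_purposes: frozenset  # purposes the data subject has agreed to
    payload: dict = field(default_factory=dict)


def select_training_data(records, purpose="model_training"):
    # Keep only records whose consent covers the training purpose, and build a
    # manifest describing exactly what was selected and from where.
    selected = [r for r in records if purpose in r.consented_purposes]
    manifest = {
        "purpose": purpose,
        "record_ids": sorted(r.record_id for r in selected),
        "rows_by_source": dict(Counter(r.source for r in selected)),
        "excluded": len(records) - len(selected),
    }
    return selected, manifest


# Hypothetical records, for illustration only.
records = [
    Record("r1", "crm.customers", frozenset({"analytics", "model_training"})),
    Record("r2", "crm.customers", frozenset({"analytics"})),
    Record("r3", "support.tickets", frozenset({"model_training"})),
]

selected, manifest = select_training_data(records)
print(manifest)
```

Storing the manifest alongside the model version in the model registry is one way to make the training data auditable after the fact.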

📚 Recommended Reading

Data Management at Scale

Piethein Strengholt

The most practical and comprehensive treatment of enterprise data architecture, data mesh, and governance available - the architect's reference for scaled data management.

Designing Data-Intensive Applications

Martin Kleppmann

Deep foundational understanding of how data systems work - consistency models, replication, stream processing - that every data architect must be able to reason from under pressure.

Data Mesh

Zhamak Dehghani

Written by the originator of the concept, this is the definitive account of data mesh principles, domain ownership, and self-serve infrastructure.

The Data Warehouse Toolkit

Ralph Kimball and Margy Ross

The canonical reference for dimensional modelling that remains essential for an architect evaluating analytical data design choices, even in modern lakehouse contexts.

Fundamentals of Software Architecture

Mark Richards and Neal Ford

Data architects benefit from understanding software architecture principles, patterns, and the soft skills of operating as an architect - this book covers the fundamentals clearly.

Data Governance: The Definitive Guide

Evren Eryurek, Uri Gilad, Valliappa Lakshmanan, Anita Kibunguchy-Grant, and Jessi Ashdown

A practical engineering-oriented treatment of data governance that connects governance requirements to concrete platform implementation.

🎓 Courses & Resources

Databricks Certified Data Engineer Professional

Databricks Academy

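Comprehensive coverage of lakehouse architecture, Delta Lake patterns, and production data engineering. It is a professional-level engineering certification rather than an architecture credential, but it validates the platform depth an architect needs to make credible lakehouse decisions.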

AWS Data Analytics Specialty

A Cloud Guru

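Architect-level understanding of the full AWS data services landscape - ingestion, storage, processing, and analytics - and how they combine into coherent data platforms. AWS retired the Data Analytics Specialty exam in April 2024, so treat this as a learning path rather than a certification target; AWS Certified Data Engineer - Associate is the closest current AWS certification.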

Domain-Driven Design Fundamentals

Pluralsight

DDD principles are central to data mesh domain decomposition - understanding how to model domain boundaries and data ownership requires this vocabulary.

Data Governance and Cataloguing

Collibra University or Alation Academy

Practical governance tool knowledge combined with the processes that make governance operational rather than theoretical.

📋 Role Archetypes

Review the full expectations for both roles to understand exactly what good looks like at each level.

→ Senior Data Engineer Archetype
→ Data Architect Archetype