Develop organisation-wide data architecture authority by mastering lakehouse design at scale, data governance, and technology strategy, and by building the credibility to make hard data architecture decisions stick.
Organisation-Wide Data Architecture
A data architect is accountable for the coherence of the entire data platform - how data is ingested, stored, transformed, governed, and consumed across all domains. This requires a different mode of thinking: you are no longer optimising individual pipelines but shaping the architectural patterns, constraints, and guidelines that all data engineering work builds on.
Data Governance and Cataloguing
Governance is not compliance overhead - it is the engineering discipline that makes data trustworthy and useful at scale. A data architect owns the governance model: data ownership, classification, access control, lineage, cataloguing, and the processes that keep these things current as the organisation changes.
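As a rough illustration of how ownership, classification, access control, and lineage can live in one governance model, the sketch below uses a tiny in-memory catalogue. All names here (`DatasetEntry`, `can_read`, `lineage`, the four classification levels, and the sample datasets) are hypothetical and chosen for illustration; a real platform would back this with a catalogue tool and a policy engine.

```python
from dataclasses import dataclass
from enum import IntEnum


class Classification(IntEnum):
    # Hypothetical sensitivity levels; higher means more restricted.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class DatasetEntry:
    name: str
    owner: str                      # team accountable for the dataset
    classification: Classification
    upstream: tuple = ()            # names of datasets this one is derived from


def can_read(clearance: Classification, dataset: DatasetEntry) -> bool:
    """Allow access only when the reader's clearance covers the dataset."""
    return clearance >= dataset.classification


def lineage(catalog: dict, name: str) -> list:
    """Return every transitive upstream dataset of `name`, without duplicates."""
    seen = []
    stack = list(catalog[name].upstream)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(catalog[current].upstream)
    return seen


# Sample catalogue (assumed data, for illustration only).
catalog = {
    "raw_orders": DatasetEntry("raw_orders", "sales", Classification.CONFIDENTIAL),
    "clean_orders": DatasetEntry(
        "clean_orders", "sales", Classification.CONFIDENTIAL, ("raw_orders",)
    ),
    "revenue_daily": DatasetEntry(
        "revenue_daily", "finance", Classification.INTERNAL, ("clean_orders",)
    ),
}

if __name__ == "__main__":
    print(can_read(Classification.INTERNAL, catalog["raw_orders"]))
    print(lineage(catalog, "revenue_daily"))
```

Keeping the owner and classification on the same record as the lineage edges is one way to make impact analysis ("who must approve a change to this upstream table?") a simple lookup rather than a separate process.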
Lakehouse and Medallion Architecture at Scale
At the architect level you need to make the foundational platform decisions - storage formats, compute separation, layer semantics, partition strategies, and table formats such as Delta Lake or Apache Iceberg - and ensure they remain coherent as the platform grows. These decisions have multi-year consequences and require depth of understanding, not just familiarity.
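To make layer semantics and partition strategy concrete, here is a minimal sketch of two checks an architect might standardise: a single path convention for medallion layers, and a quick estimate of whether a proposed partition scheme produces partitions that are too small. The `layer/domain/table/event_date=...` layout, the function names, and the 128 MiB threshold are assumptions made for illustration, not recommendations from any specific table format.

```python
from datetime import date

LAYERS = ("bronze", "silver", "gold")   # medallion layers, raw to curated
MIN_PARTITION_MIB = 128                  # assumed lower bound, tune per platform


def partition_path(layer: str, domain: str, table: str, event_date: date) -> str:
    """Build a Hive-style partition path under one platform-wide convention."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    return f"{layer}/{domain}/{table}/event_date={event_date.isoformat()}"


def average_partition_mib(total_gib: float, partitions: int) -> float:
    """Average partition size in MiB for a table of `total_gib` GiB."""
    if partitions <= 0:
        raise ValueError("partitions must be positive")
    return total_gib * 1024 / partitions


def is_too_small(total_gib: float, partitions: int) -> bool:
    """Flag schemes whose average partition falls below the assumed threshold."""
    return average_partition_mib(total_gib, partitions) < MIN_PARTITION_MIB


if __name__ == "__main__":
    print(partition_path("silver", "sales", "orders", date(2024, 3, 1)))
    # One year of data, 365 GiB in total: daily versus hourly partitions.
    print(is_too_small(365, 365))        # daily
    print(is_too_small(365, 365 * 24))   # hourly
```

In this example, the same 365 GiB table averages 1 GiB per partition when partitioned daily but only about 43 MiB when partitioned hourly, which is the kind of small-files trade-off that is cheap to check before a layout becomes a standard.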
Data Technology Strategy
A data architect evaluates, selects, and retires data technologies for the organisation. This means building structured evaluation frameworks, running credible proof-of-concept work, understanding vendor ecosystem health, and making technology choices that the organisation can sustain operationally over time.
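One common shape for a structured evaluation framework is a weighted scoring matrix. The sketch below is a minimal version under stated assumptions: the criteria, weights, candidate names (`engine_a`, `engine_b`), and 1-to-5 scores are all invented for illustration, and a real evaluation would pair the numbers with proof-of-concept evidence.

```python
import math

# Hypothetical criteria and weights; weights are expected to sum to 1.
WEIGHTS = {
    "operability": 0.4,
    "ecosystem_health": 0.3,
    "cost": 0.2,
    "feature_fit": 0.1,
}


def weighted_score(scores: dict, weights: dict = WEIGHTS) -> float:
    """Combine per-criterion scores (for example 1 to 5) into one weighted total."""
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


def rank(candidates: dict, weights: dict = WEIGHTS) -> list:
    """Return (candidate, score) pairs, best first."""
    scored = [(name, weighted_score(s, weights)) for name, s in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented scores for two hypothetical candidates.
candidates = {
    "engine_a": {"operability": 4, "ecosystem_health": 3, "cost": 2, "feature_fit": 5},
    "engine_b": {"operability": 3, "ecosystem_health": 5, "cost": 4, "feature_fit": 3},
}

if __name__ == "__main__":
    for name, score in rank(candidates):
        print(f"{name}: {score:.2f}")
```

Weighting operability most heavily reflects the point above about choices the organisation can sustain operationally; agreeing on the weights before scoring any candidate helps keep the result from being reverse-engineered.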
Trusted Architectural Voice
Architectural authority in data is earned through a track record of sound decisions, clear communication, and genuine engagement with the teams who live with your architectural choices. The best data architects spend significant time with delivery teams, understanding constraints from the ground up.
Skills to Develop
Behaviours to Demonstrate
Develop an architectural position on treating AI and ML workloads as first-class citizens of your data platform, covering feature stores, training pipelines, model registries, serving infrastructure, and monitoring, and document it as a platform standard.
Design the data governance framework for AI systems - what data is used for training, how consent and provenance are tracked, and how models are audited for data-related compliance requirements.
Evaluate the architectural implications of large language model integration in your data platform - retrieval-augmented generation patterns, embedding storage, vector databases, and how these interact with your existing data architecture.
Build a position on AI-assisted data catalogue population - accuracy requirements, human review workflows, and the governance process for AI-generated metadata - and present it to stakeholders.
Stay current on data-related AI regulation, including the EU AI Act, data residency requirements for AI training, and copyright questions around training data, as these directly shape architectural constraints.
Develop the organisational case for responsible AI data practices by synthesising regulatory requirements, technical risk, and business consequence into a credible architectural recommendation.
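The training-data governance items above (consent, provenance, residency) can be sketched as a manifest builder that records why each dataset was included or excluded. Every field and rule here (`consent_for_training`, `licence_cleared`, `residency`, the region codes) is a hypothetical stand-in; actual requirements depend on your regulatory and contractual context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    dataset: str
    consent_for_training: bool   # assumed flag captured at collection time
    licence_cleared: bool        # assumed outcome of a licensing review
    residency: str               # region where the data is stored


def training_manifest(records, allowed_regions):
    """Split datasets into included names and excluded names with reasons."""
    included, excluded = [], {}
    for record in records:
        reasons = []
        if not record.consent_for_training:
            reasons.append("no training consent")
        if not record.licence_cleared:
            reasons.append("licence not cleared")
        if record.residency not in allowed_regions:
            reasons.append("residency outside allowed regions")
        if reasons:
            excluded[record.dataset] = reasons
        else:
            included.append(record.dataset)
    return included, excluded


# Invented sample records.
records = [
    SourceRecord("support_tickets", True, True, "eu"),
    SourceRecord("web_scrape", False, False, "us"),
    SourceRecord("crm_notes", True, True, "us"),
]

if __name__ == "__main__":
    included, excluded = training_manifest(records, allowed_regions={"eu"})
    print(included)
    print(excluded)
```

Keeping the exclusion reasons alongside the manifest gives auditors a record of what was left out of training and why, not only what went in.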
Data Management at Scale
A practical, comprehensive treatment of enterprise data architecture, data mesh, and governance - a strong architect's reference for scaled data management.
Designing Data-Intensive Applications
Deep foundational understanding of how data systems work - consistency models, replication, stream processing - that every data architect must be able to reason from under pressure.
Data Mesh
Written by the originator of the concept, this is the definitive account of data mesh principles, domain ownership, and self-serve infrastructure.
The Data Warehouse Toolkit
The canonical reference for dimensional modelling that remains essential for an architect evaluating analytical data design choices, even in modern lakehouse contexts.
Fundamentals of Software Architecture
Data architects benefit from understanding software architecture principles, patterns, and the soft skills of operating as an architect - this book covers the fundamentals clearly.
Data Governance: The Definitive Guide
A practical engineering-oriented treatment of data governance that connects governance requirements to concrete platform implementation.
Databricks Certified Data Engineer Professional
Comprehensive coverage of lakehouse architecture, Delta Lake patterns, and production data engineering - the certification validates platform depth at the architect level.
AWS Data Analytics Specialty
Architect-level understanding of the full AWS data services landscape - ingestion, storage, processing, and analytics - and how they combine into coherent data platforms.
Domain-Driven Design Fundamentals
DDD principles are central to data mesh domain decomposition - understanding how to model domain boundaries and data ownership requires this vocabulary.
Data Governance and Cataloguing
Practical governance tool knowledge combined with the processes that make governance operational rather than theoretical.
Review the full expectations for both roles to understand exactly what good looks like at each level.
→ Senior Data Engineer Archetype → Data Architect Archetype