Data Engineering Track

Graduate Data Engineer

SFIA 1-2

Building foundational data engineering skills under close guidance, developing SQL and Python fundamentals, and growing awareness of data quality and pipeline principles.

Overview

As a Graduate Data Engineer, you are at the start of your data engineering career. Your primary goal is to learn - the tools, the data landscape, the team's standards, and the fundamentals of building reliable data pipelines. You work under close guidance from senior engineers, delivering small, well-scoped tasks and building the habits that will underpin your career.

You are not expected to work independently on complex data problems yet. You are expected to ask questions, absorb feedback, apply it consistently, and demonstrate growing capability over time. The most important behaviours at this level are curiosity, rigour about data quality, and reliability.

Key Responsibilities

Learning and Development

  • Actively engage with onboarding materials, internal data documentation, and technical learning resources.
  • Pair regularly with senior data engineers to build understanding of the data platform and engineering practices.
  • Seek feedback proactively and apply it consistently to your work.
  • Build familiarity with the team's SQL dialects, Python conventions, orchestration tools, and data warehousing patterns.
  • Develop awareness of the data landscape - what systems exist, how data flows between them, and where the team's pipelines fit.

Delivery

  • Deliver small, clearly scoped data tasks with close guidance from a senior engineer.
  • Write SQL and Python that meets the team's quality and style standards with appropriate support.
  • Participate in code reviews - receiving feedback and beginning to review others' work with guidance.
  • Raise blockers quickly rather than remaining stuck independently.
  • Document changes to pipelines and schemas clearly so teammates can understand what changed and why.

Data Quality Awareness

  • Learn the team's approach to data validation and quality checks.
  • Apply basic data quality checks to any data changes you make, guided by a senior engineer.
  • Develop an understanding of the downstream impact of data issues on consumers and business teams.

Collaboration

  • Contribute actively in team ceremonies - stand-ups, retrospectives, and planning sessions.
  • Build positive working relationships with teammates, data analysts, and business stakeholders.
  • Communicate progress, questions, and blockers clearly and promptly.

Role Specific

SQL and Python Foundations

Build working proficiency in SQL for querying and transforming data, and Python for pipeline scripting, under the guidance of experienced engineers.

Pipeline and Platform Awareness

Develop conceptual understanding of how data pipelines are structured, orchestrated, and monitored by working alongside experienced engineers on real delivery tasks.

Data Quality Mindset

Begin developing an instinct for data quality - learning to question assumptions about data, check for nulls and duplicates, and understand the impact of bad data on downstream consumers.
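
A first pass at this mindset can be as simple as a couple of assertions. The sketch below is illustrative only - the `orders` table and its columns are invented for the example - and uses Python's built-in sqlite3 module to check for nulls and duplicates:

```python
import sqlite3

# Illustrative only: the "orders" table and its data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO orders VALUES (1, 10), (2, 10), (2, 11), (3, NULL);
""")

# Check 1: no NULLs in a column that downstream consumers rely on.
null_count = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE customer_id IS NULL"
).fetchone()[0]

# Check 2: order_id should be unique - duplicates often point to a
# bad join or a double-loaded file upstream.
dup_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT order_id FROM orders "
    "GROUP BY order_id HAVING COUNT(*) > 1)"
).fetchone()[0]

print(f"nulls={null_count}, duplicate ids={dup_count}")  # nulls=1, duplicate ids=1
```

The same two checks translate directly into the assertion frameworks most teams use (dbt tests, Great Expectations, or plain SQL run by the orchestrator).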

Behaviours

Learning & Growth

  • Approaches every data task as an opportunity to learn, not just to complete.
  • Asks questions without hesitation and seeks to understand the "why" behind data modelling and pipeline decisions.
  • Applies feedback consistently and tracks personal development over time.
  • Reads widely - documentation, dbt guides, data engineering blogs - to build context and understanding of the discipline.
  • Reflects regularly on their own progress, identifying gaps and discussing them with their TTL or mentor.
  • Shows willingness to learn from data mistakes - a bad query, a miscounted aggregate - without defensiveness.
  • Seeks out pairing opportunities proactively rather than waiting to be invited.

Delivery

  • Completes assigned data tasks reliably within agreed timeframes with close guidance.
  • Raises blockers early rather than pushing through silently.
  • Takes quality seriously from the start, even on small pieces of pipeline work.
  • Follows the agreed development workflow - branching, committing, opening PRs - consistently and correctly.
  • Responds to review feedback promptly and addresses it thoroughly before requesting re-review.
  • Keeps task status up to date in the team's tracking tools so the team has an accurate picture of progress.
  • Makes incremental, reviewable commits with clear messages that describe what changed and why.

Quality & Craft

  • Writes SQL and Python that is readable and follows the team's style conventions with support from a senior engineer.
  • Begins writing basic data quality assertions or tests for their own changes, guided by a more experienced colleague.
  • Reads and understands test coverage for the pipeline areas they are working in, asking questions about gaps.
  • Follows the team's definition of done and checks their own work against it before requesting review.
  • Avoids submitting pipeline changes with known data issues or unresolved questions without prior discussion.
  • Develops an awareness of common data quality problems - nulls, duplicates, schema drift - and flags them when encountered.
  • Learns what good code review feedback on data transformations looks like by observing and receiving it consistently.

Communication

  • Provides clear, concise updates in stand-ups - what they worked on, what they plan to do, what is blocking them.
  • Writes PR descriptions and documentation that give reviewers enough context to understand the data changes.
  • Asks questions in writing when appropriate so that the answer can benefit the wider team.
  • Communicates learning needs honestly with their TTL and mentor.
  • Responds to messages and review comments promptly during working hours.
  • Summarises their understanding when given verbal instructions to confirm correct interpretation.
  • Escalates concerns about data quality or timelines to their TTL early rather than hoping the problem resolves itself.

Collaboration

  • Contributes positively to team energy and culture.
  • Communicates openly and asks for help when needed.
  • Respects the expertise of more experienced colleagues while building their own voice.
  • Participates actively in stand-ups, retrospectives, planning sessions, and team discussions.
  • Pairs with senior engineers willingly and engages during sessions rather than passively observing.
  • Offers help to teammates when capacity allows, even in small ways such as reviewing a query or sharing something recently learned.
  • Respects agreed team norms around working hours, communication channels, and collaboration tools.

Ownership

  • Takes responsibility for completing tasks they have committed to, rather than waiting to be chased.
  • Follows through on review actions and does not consider a task done until it has met all agreed criteria.
  • Flags uncertainty about a data requirement or approach rather than making an assumption that leads to rework.
  • Keeps their own task board updated so the team always has an accurate picture of progress.
  • Owns their learning plan and does not wait for opportunities to be handed to them.
  • Acknowledges mistakes openly, explains what happened, and focuses on what they will do differently next time.
  • Takes the initiative to read relevant data documentation before asking a question that is already answered.

Technical Foundation

  • Develops working SQL proficiency and applies it in delivered data tasks under guidance.
  • Uses Git competently for branching, committing, and raising pull requests as part of everyday work.
  • Reads and navigates existing pipeline code to understand context before making changes.
  • Begins to understand the team's testing and data validation approach and why data quality matters.
  • Learns the team's deployment and orchestration process at a conceptual level.
  • Builds familiarity with the team's development environment, tooling, and cloud data warehouse.
  • Understands the basic data architecture of the system they are working in well enough to make safe, localised changes.

Skills

  • Foundational SQL skills - SELECT, JOIN, aggregation, filtering - applied in a modern cloud data warehouse.
  • Basic Python scripting - variables, functions, loops, file I/O - for simple data processing tasks.
  • Basic understanding of version control (Git) and development workflows.
  • Ability to read and understand existing pipeline code with guidance.
  • Growing familiarity with a data orchestration tool (e.g. Airflow, dbt, Prefect).
  • Clear written and verbal communication skills.
  • Awareness of tabular data concepts - rows, columns, keys, relationships, and schemas.
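
To make the SQL expectation concrete, the sketch below shows the kind of query a graduate should be comfortable writing: a JOIN, a filter, and an aggregation together. The tables and data are invented for the example, and sqlite3 stands in for a cloud warehouse:

```python
import sqlite3

# Illustrative only: tables, columns, and data are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'EMEA'), (2, 'APAC');
    INSERT INTO orders VALUES (100, 1, 50.0), (101, 1, 25.0), (102, 2, 40.0);
""")

# JOIN + filter + aggregate: total order value per region, EMEA only.
rows = conn.execute("""
    SELECT c.region, SUM(o.amount) AS total
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    WHERE c.region = 'EMEA'
    GROUP BY c.region
""").fetchall()
print(rows)  # [('EMEA', 75.0)]
```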

AI & Automation Expectations

AI Augmented Delivery

  • Uses AI coding assistants (Copilot, Cursor, Claude) as a learning accelerant, not a shortcut - the goal is to understand what the generated SQL or Python is doing, not just to get it working.
  • Validates every piece of AI-generated SQL before running it against production or near-production data - treats AI output as a draft requiring review, not a finished answer.
  • Asks "why does this query work?" about AI-generated code, not just "does it return the right result?" - builds genuine understanding of the logic.
  • Uses AI to help understand unfamiliar data models, generate test data, and explain SQL patterns - with verification at each step.
  • Recognises that AI can confidently produce incorrect SQL - wrong joins, silent data loss from aggregations, incorrect filter logic - and develops instinct for spotting these failure modes.
  • Treats prompt engineering as a learnable skill, providing data schema context and constraints when asking AI to generate queries or pipeline stubs.
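
One lightweight way to catch the join fan-out failure mode mentioned above is a before-and-after row count check. The scenario below is hypothetical - an assistant has suggested joining an `orders` table to a `payments` table - and the point is the validation habit, not the specific query:

```python
import sqlite3

# Hypothetical scenario: an AI assistant suggested joining orders to
# payments, and we want to confirm the join does not fan out rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, amount REAL);
    CREATE TABLE payments (order_id INTEGER, paid REAL);
    INSERT INTO orders VALUES (1, 50.0), (2, 40.0);
    -- Two payment rows for order 1: joining on order_id duplicates it.
    INSERT INTO payments VALUES (1, 25.0), (1, 25.0), (2, 40.0);
""")

before = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
after = conn.execute("""
    SELECT COUNT(*) FROM orders o
    JOIN payments p ON p.order_id = o.order_id
""").fetchone()[0]

if after != before:
    print(f"join fans out: {before} orders became {after} rows")
```

Here the check fires (2 orders become 3 rows), which would silently inflate any SUM over `orders.amount` - exactly the kind of confident-but-wrong output the AI-generated query needs to be screened for.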