
Standard: Data ecosystems are reliable, governed, and designed to support decision-making and innovation

Purpose and Strategic Importance

As organisations deploy AI systems to support operational decisions, customer interactions, and strategic planning, the reliability and governance of the underlying data ecosystem become a safety-critical concern. AI models do not fail loudly — they produce outputs that appear plausible while being systematically biased, outdated, or incomplete. When those outputs drive hiring decisions, credit assessments, medical triage, or operational risk management, the consequences of poor data governance are not merely technical: they manifest as discriminatory outcomes, regulatory breaches, or decisions made on a false picture of reality. This standard establishes that data ecosystems must be governed with the same rigour applied to production software — with defined ownership, lineage tracking, quality SLAs, access controls, and retention policies that ensure data is trustworthy at the point of use.

Good data governance is frequently mischaracterised as a compliance burden — a set of controls imposed on innovation. In practice, well-governed data ecosystems accelerate innovation by making data discoverable, trustworthy, and reusable, reducing the time teams spend questioning data provenance or remediating quality issues before they can act. Federated governance models, informed by data mesh principles, allow domain teams to own and operate their data products within a framework of organisational standards, balancing autonomy with accountability. This standard supports that balance — enabling teams to move quickly while ensuring that the data underpinning AI-driven decisions is reliable, auditable, and fit for purpose across its full lifecycle.

Strategic Impact

  • AI models and automated decision systems operate on data that is demonstrably current, accurate, and complete, reducing the risk of systematic errors that erode trust in AI capabilities or expose the organisation to regulatory or reputational consequences.
  • Data lineage and ownership transparency enable rapid root-cause analysis when AI outputs are questioned, allowing teams to trace a prediction or recommendation back to its source data and identify whether the issue lies in the model, the features, or the underlying data quality.
  • Federated governance with clear data product ownership scales data management across the organisation without creating bottlenecks, enabling innovation teams to access and use data confidently without depending on a central data team for every integration.
  • Retention policies and access controls ensure that sensitive data used in AI training and inference is handled in accordance with privacy regulations, reducing legal and compliance risk and enabling the organisation to demonstrate accountability to regulators and customers.
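The lineage transparency described above can be made concrete with even a very simple data model. The sketch below is a minimal, hypothetical illustration (the node names and dataset identifiers are invented for the example, not part of any real catalogue) of how recording each artefact's inputs lets a team walk a production prediction back to its source datasets:

```python
from dataclasses import dataclass, field

# Hypothetical, minimal lineage model: each node records what it was
# derived from, so a prediction can be traced back to its source datasets.
@dataclass
class LineageNode:
    name: str
    kind: str                      # e.g. "source", "feature", "prediction"
    inputs: list["LineageNode"] = field(default_factory=list)

def trace_to_sources(node: LineageNode) -> set[str]:
    """Return the names of all source datasets feeding this node."""
    if not node.inputs:
        return {node.name}
    sources: set[str] = set()
    for parent in node.inputs:
        sources |= trace_to_sources(parent)
    return sources

# Illustrative example: a credit-risk score derived from two source tables.
payments = LineageNode("crm.payments", "source")
accounts = LineageNode("core.accounts", "source")
feature = LineageNode("features.payment_history", "feature", [payments, accounts])
score = LineageNode("models.credit_risk_score", "prediction", [feature])

print(sorted(trace_to_sources(score)))  # ['core.accounts', 'crm.payments']
```

In practice, automated lineage capture at Level 3 and above replaces hand-built records like these with metadata emitted by the pipeline tooling itself, but the traversal logic is the same.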

Risks of Not Having This Standard

  • AI models trained or operated against ungoverned data produce decisions that are systematically biased by stale, incomplete, or unrepresentative data, with no mechanism to detect or remediate the issue until harm has already occurred.
  • The absence of data lineage makes it impossible to conduct meaningful audits of AI decision-making, exposing the organisation to regulatory non-compliance in jurisdictions where explainability and auditability of automated decisions are legally required.
  • Without defined data ownership and quality SLAs, data products degrade silently over time as source systems change, leading to model drift and increasing divergence between the world the model was trained on and the world it is currently operating in.
  • Inconsistent access controls create data security vulnerabilities where sensitive personal, financial, or operational data used in AI pipelines is accessible to parties who should not have it, increasing the blast radius of a data breach and the associated compliance exposure.
  • The lack of retention and lifecycle policies results in AI systems continuing to use outdated or legally expired data, creating compliance violations and producing decisions informed by information that the organisation is no longer permitted to hold or process.
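Several of these failure modes — silent degradation as source systems change, and the resulting model drift — begin with undetected schema drift. A minimal sketch of a drift check, comparing the schema a model was validated against with what the source currently delivers (the field names and expected schema here are illustrative assumptions):

```python
# Sketch: detect schema drift between the contract a model was trained
# against and what the source system currently delivers. The expected
# schema and column names below are illustrative assumptions.
EXPECTED_SCHEMA = {"customer_id": "int", "balance": "float", "opened_at": "date"}

def schema_drift(expected: dict[str, str], actual: dict[str, str]) -> dict[str, list[str]]:
    """Report missing, added, and retyped columns."""
    return {
        "missing": sorted(set(expected) - set(actual)),
        "added": sorted(set(actual) - set(expected)),
        "retyped": sorted(
            col for col in set(expected) & set(actual)
            if expected[col] != actual[col]
        ),
    }

drift = schema_drift(
    EXPECTED_SCHEMA,
    {"customer_id": "str", "balance": "float", "region": "str"},
)
# drift == {'missing': ['opened_at'], 'added': ['region'], 'retyped': ['customer_id']}
```

A non-empty report would typically block the pipeline run and alert the data product owner, rather than letting the model consume a silently changed feed.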

CMMI Maturity Model

Level 1 – Initial

  • People & Culture: Data ownership is unclear or contested, with no shared understanding of who is responsible for data quality, availability, or governance at a domain or organisational level.
  • Process & Governance: There are no formal data governance processes, lineage tracking, retention policies, or access control frameworks; data is managed informally and inconsistently across systems.
  • Technology & Tools: Data is stored in siloed systems with no unified catalogue, lineage tooling, or quality monitoring; access is controlled at a database level with no abstraction or policy layer.
  • Measurement & Metrics: There is no measurement of data ecosystem reliability, quality, or compliance; the extent of governance gaps is not understood and is only discovered when a failure or audit occurs.

Level 2 – Managed

  • People & Culture: Some teams have appointed informal data stewards or owners for high-value datasets, but governance responsibilities are not consistently defined or resourced across the organisation.
  • Process & Governance: Basic data retention policies exist for some regulated data categories, and access controls are applied to the most sensitive datasets, but these are not systematically enforced or reviewed.
  • Technology & Tools: Basic data cataloguing tools are in use for some domains, and some data lineage is documented manually, but there is no automated lineage capture or centralised governance platform.
  • Measurement & Metrics: Ad hoc data quality checks are run for specific projects or regulatory requirements, but there is no continuous quality monitoring or SLA tracking for data products at an ecosystem level.

Level 3 – Defined

  • People & Culture: Data product owners are formally appointed for all domains, with clear accountability for quality, lineage, and compliance; a data governance council provides cross-domain oversight and standards.
  • Process & Governance: Data governance policies — covering lineage, ownership, retention, access control, and quality SLAs — are documented, communicated, and applied consistently across all AI-designated data products.
  • Technology & Tools: Automated lineage capture tracks data from source to consumption, and a centralised or federated catalogue provides discoverability, ownership metadata, and quality status for all governed data products.
  • Measurement & Metrics: Data quality dimensions are measured continuously against defined SLAs for all governed data products, with dashboards that surface compliance status and trend data to domain owners and governance stakeholders.
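Continuous measurement against quality SLAs, as described at this level, can be reduced to small, automatable checks per data product. The sketch below is illustrative only — the SLA thresholds, field names, and record shape are assumptions, not a prescribed contract:

```python
from datetime import datetime, timedelta, timezone

# Illustrative sketch: evaluate a data product's completeness and
# freshness against SLA thresholds. Thresholds and field names are
# assumptions for the example, not a prescribed contract.
SLA = {"completeness_min": 0.98, "max_age": timedelta(hours=24)}

def check_sla(rows: list[dict], key_field: str,
              loaded_at: datetime, now: datetime) -> dict[str, bool]:
    non_null = sum(1 for r in rows if r.get(key_field) is not None)
    completeness = non_null / len(rows) if rows else 0.0
    return {
        "completeness_ok": completeness >= SLA["completeness_min"],
        "freshness_ok": (now - loaded_at) <= SLA["max_age"],
    }

now = datetime(2024, 1, 2, tzinfo=timezone.utc)
rows = [{"customer_id": i} for i in range(99)] + [{"customer_id": None}]
result = check_sla(rows, "customer_id", now - timedelta(hours=6), now)
# result == {'completeness_ok': True, 'freshness_ok': True}
```

Checks like these would run on every load, feeding the dashboards that surface compliance status to domain owners.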

Level 4 – Quantitatively Managed

  • People & Culture: Data governance maturity is a tracked organisational KPI, with domain teams assessed against a governance maturity model and supported to improve through training, tooling, and community of practice.
  • Process & Governance: Data governance processes are formally integrated into AI development and deployment lifecycle gates; no AI model reaches production without demonstrating that its data sources meet defined governance standards.
  • Technology & Tools: Policy-as-code frameworks enforce access controls, retention rules, and quality thresholds automatically as part of data pipeline execution, reducing reliance on manual compliance checks.
  • Measurement & Metrics: Data ecosystem reliability — including availability, freshness, schema stability, and quality SLA compliance — is quantitatively managed and reported alongside service reliability metrics for production AI systems.
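The policy-as-code enforcement and lifecycle gates described at this level amount to a check that runs inside the pipeline and blocks publication on violation. A minimal sketch, in which the policy rules and metadata keys are invented for illustration (production frameworks such as OPA express the same idea declaratively):

```python
# Sketch of a policy-as-code gate: a pipeline step that refuses to publish
# a data product unless its declared governance metadata satisfies policy.
# The policy rules and metadata keys are illustrative assumptions.
POLICY = {
    "max_retention_days": 3650,
    "allowed_classifications": {"public", "internal", "restricted"},
}

def gate(metadata: dict) -> list[str]:
    """Return a list of policy violations; an empty list means the gate passes."""
    violations = []
    if not metadata.get("owner"):
        violations.append("no data product owner declared")
    retention = metadata.get("retention_days")
    if retention is None:
        violations.append("no retention period declared")
    elif retention > POLICY["max_retention_days"]:
        violations.append("retention exceeds policy maximum")
    if metadata.get("classification") not in POLICY["allowed_classifications"]:
        violations.append("unknown or missing data classification")
    return violations

ok = gate({"owner": "payments-team", "retention_days": 365,
           "classification": "restricted"})
bad = gate({"classification": "secret"})
# ok == []; bad contains three violations
```

Because the gate is code, it runs on every pipeline execution rather than at periodic manual reviews, which is what removes the reliance on manual compliance checks.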

Level 5 – Optimising

  • People & Culture: The organisation operates a mature data mesh model with federated governance, where domain teams are fully empowered to own and evolve their data products within a lightweight but consistent governance framework.
  • Process & Governance: Governance policies evolve continuously based on regulatory changes, AI risk assessments, and post-incident learning, with a fast-track process for updating standards in response to emerging data risks.
  • Technology & Tools: AI observability platforms monitor model performance in production against data ecosystem health signals, automatically flagging when data quality degradation is correlated with model performance decline.
  • Measurement & Metrics: End-to-end data ecosystem health metrics are used to drive board-level reporting on AI risk posture, demonstrating to regulators and stakeholders that data governance is actively managed and continuously improving.

Key Measures

  • Percentage of AI-designated data products with documented lineage from source to consumption, targeting full traceability for all data feeding production AI decision systems.
  • Data quality SLA compliance rate across governed data products, measured against defined thresholds for accuracy, completeness, timeliness, and consistency on a rolling monthly basis.
  • Mean time to identify and remediate a data quality issue affecting a production AI pipeline, used to assess the effectiveness of monitoring and ownership accountability.
  • Percentage of data products with a defined and enforced retention policy, tracking progress toward full compliance with applicable data protection and privacy regulations.
  • Number of AI incidents per quarter where root cause is attributable to a data governance failure — including stale data, schema drift, access control gaps, or quality SLA breach.
  • Access control compliance rate — the proportion of data products where access is demonstrably restricted to authorised consumers only, as verified through automated policy enforcement audits.
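Two of the measures above — SLA compliance rate and mean time to remediate — reduce to straightforward aggregations once the underlying records exist. The sketch below uses invented record shapes purely to show the arithmetic:

```python
# Sketch of two measures from the list above, computed over hypothetical
# records. The record shapes and example values are illustrative assumptions.
def sla_compliance_rate(products: list[dict]) -> float:
    """Fraction of governed data products currently meeting all quality SLAs."""
    compliant = sum(1 for p in products if p["sla_met"])
    return compliant / len(products) if products else 0.0

def mean_time_to_remediate(issues: list[dict]) -> float:
    """Mean hours from detection to remediation across quality issues."""
    durations = [i["remediated_h"] - i["detected_h"] for i in issues]
    return sum(durations) / len(durations) if durations else 0.0

products = [{"name": "payments", "sla_met": True},
            {"name": "accounts", "sla_met": True},
            {"name": "clickstream", "sla_met": False},
            {"name": "ledger", "sla_met": True}]
issues = [{"detected_h": 0, "remediated_h": 4},
          {"detected_h": 10, "remediated_h": 18}]

print(sla_compliance_rate(products))   # 0.75
print(mean_time_to_remediate(issues))  # 6.0
```

The harder part of these measures is not the arithmetic but ensuring that SLA status and incident timestamps are captured consistently across domains, which is what the ownership and tooling requirements at Levels 3 and 4 provide.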
Associated Policies
