Standard: Bias and fairness assessments are conducted at every model release
Purpose and Strategic Importance
This standard mandates that every AI model release includes a formal bias and fairness assessment that evaluates model performance across relevant demographic and contextual subgroups before the release reaches production. It supports the policy of managing bias as an ongoing operational concern by recognising that bias is not a one-time problem to be solved at the start of a project but a property that must be re-evaluated whenever the model, data, or deployment context changes. A model that was fair at launch can become unfair after retraining on shifted data or deployment in a new context.
Strategic Impact
- Protects individuals from discriminatory outcomes caused by AI systems that perform unequally across demographic groups
- Reduces legal and regulatory exposure in jurisdictions where automated decision-making is subject to equality legislation
- Builds user trust, particularly among communities who have historically been harmed by biased algorithmic systems
- Creates an organisational capability to identify and address bias systematically rather than reacting to incidents
- Ensures that model improvement initiatives do not inadvertently introduce new fairness disparities while fixing other issues
Risks of Not Having This Standard
- Biased model releases cause discriminatory outcomes at scale before the issue is surfaced through complaints or media scrutiny
- Regulatory investigations and enforcement actions result from failure to demonstrate fairness due diligence
- Reputational damage from bias incidents is disproportionately severe when it emerges that no formal assessment was conducted
- Model improvements in aggregate performance mask growing disparities for subgroups that are not separately measured
- Teams lack the data to distinguish bias introduced by model changes from bias present in the original system
CMMI Maturity Model
Level 1 – Initial
| Category | Description |
| --- | --- |
| People & Culture | Bias is acknowledged as a concern in principle but not systematically addressed; assessment is informal and ad hoc |
| Process & Governance | No fairness assessment requirement in the release process; bias checking is left to individual engineer discretion |
| Technology & Tools | No dedicated fairness tooling; subgroup analysis is not performed as standard |
| Measurement & Metrics | Only aggregate performance metrics are reported; subgroup performance disparities are invisible |
Level 2 – Managed
| Category | Description |
| --- | --- |
| People & Culture | Teams identify relevant protected characteristics and demographic groups for their use case; fairness considerations are discussed at release reviews |
| Process & Governance | A fairness checklist is added to the release process; teams must document which subgroups were evaluated and the results |
| Technology & Tools | Basic subgroup performance analysis is conducted using standard disaggregation of evaluation metrics (an illustrative sketch follows this table) |
| Measurement & Metrics | Performance metrics are reported separately for key demographic subgroups; disparities above a defined threshold are flagged |
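At Level 2, disaggregation can be as simple as computing the release evaluation metric per subgroup and comparing the gap against the defined threshold. The sketch below is a minimal illustration; the column names, sample data, and the 5-percentage-point threshold are assumptions, not values mandated by this standard.

```python
import pandas as pd
from sklearn.metrics import accuracy_score

# Hypothetical release-evaluation frame; column names, data, and the
# threshold below are illustrative, not mandated values.
eval_df = pd.DataFrame({
    "y_true":   [1, 0, 1, 1, 0, 1, 0, 0],
    "y_pred":   [1, 0, 0, 1, 0, 1, 1, 1],
    "subgroup": ["A", "A", "A", "B", "B", "B", "B", "A"],
})
DISPARITY_THRESHOLD = 0.05  # flag gaps above 5 percentage points

# Disaggregate the evaluation metric by subgroup
per_group = {
    name: accuracy_score(grp["y_true"], grp["y_pred"])
    for name, grp in eval_df.groupby("subgroup")
}
gap = max(per_group.values()) - min(per_group.values())

print(per_group)
if gap > DISPARITY_THRESHOLD:
    print(f"Flag for release review: subgroup accuracy gap {gap:.2f} exceeds threshold")
```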
Level 3 – Defined
| Category | Description |
| --- | --- |
| People & Culture | Fairness assessment is owned by a named individual per release; affected communities are consulted in defining fairness criteria |
| Process & Governance | A formal fairness assessment framework defines applicable fairness metrics (demographic parity, equalised odds, etc.) per use case type; releases that fail thresholds are blocked |
| Technology & Tools | Dedicated fairness tooling (e.g. Fairlearn, AI Fairness 360) is integrated into the evaluation pipeline; reports are generated automatically (an illustrative sketch follows this table) |
| Measurement & Metrics | Multiple fairness metrics are reported per release; trends across model versions are tracked to detect deterioration |
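A minimal sketch of what a Fairlearn-based gate in the evaluation pipeline might look like, assuming a binary classifier's release evaluation outputs; the threshold values and the function name `fairness_release_gate` are illustrative rather than part of this standard.

```python
from sklearn.metrics import accuracy_score
from fairlearn.metrics import (
    MetricFrame,
    selection_rate,
    demographic_parity_difference,
    equalized_odds_difference,
)

# Illustrative thresholds; real values are set per use case type by the framework.
THRESHOLDS = {"demographic_parity": 0.10, "equalized_odds": 0.10}

def fairness_release_gate(y_true, y_pred, sensitive_features):
    """Produce a per-subgroup report and decide whether the release is blocked."""
    report = MetricFrame(
        metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
        y_true=y_true,
        y_pred=y_pred,
        sensitive_features=sensitive_features,
    )
    dpd = demographic_parity_difference(
        y_true, y_pred, sensitive_features=sensitive_features
    )
    eod = equalized_odds_difference(
        y_true, y_pred, sensitive_features=sensitive_features
    )
    blocked = (
        dpd > THRESHOLDS["demographic_parity"]
        or eod > THRESHOLDS["equalized_odds"]
    )
    return {
        "per_subgroup": report.by_group,
        "demographic_parity_difference": dpd,
        "equalized_odds_difference": eod,
        "release_blocked": blocked,
    }
```

The returned dictionary can feed the automatically generated fairness report and be archived with the release record.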
Level 4 – Quantitatively Managed
| Category | Description |
| --- | --- |
| People & Culture | Fairness is a shared engineering and product responsibility; teams are accountable for fairness performance in the same way as accuracy and reliability |
| Process & Governance | Fairness thresholds are defined quantitatively per use case; threshold breaches trigger a formal review before any production deployment proceeds |
| Technology & Tools | Counterfactual fairness, causal analysis, and intersectionality testing tools are applied to high-risk use cases (an intersectional analysis sketch follows this table) |
| Measurement & Metrics | Fairness metric trends, threshold breach rates, and time-to-remediation for fairness issues are tracked and reported in governance forums |
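One way to approach intersectionality testing is to disaggregate over the cross-product of sensitive attributes rather than each attribute in isolation. The sketch below uses Fairlearn's MetricFrame with two hypothetical attributes and data; counterfactual and causal analysis would typically sit alongside this for high-risk use cases.

```python
import pandas as pd
from sklearn.metrics import recall_score
from fairlearn.metrics import MetricFrame

# Hypothetical attributes and labels; real assessments use the attributes
# identified as relevant for the use case.
sensitive = pd.DataFrame({
    "gender":   ["F", "F", "M", "M", "F", "M"],
    "age_band": ["<30", "30+", "<30", "30+", "30+", "<30"],
})
y_true = [1, 1, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

mf = MetricFrame(
    metrics=recall_score,
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=sensitive,
)
print(mf.by_group)      # recall for every (gender, age_band) intersection
print(mf.difference())  # largest intersectional gap, compared against the threshold
```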
Level 5 – Optimising
| Category | Description |
| --- | --- |
| People & Culture | Fairness learnings are shared across the organisation; teams contribute to the improvement of fairness standards based on deployment experience |
| Process & Governance | Fairness standards are continuously updated based on regulatory developments, academic research, and community feedback |
| Technology & Tools | Continuous fairness monitoring in production detects demographic performance drift and triggers alerts before disparities become material (a monitoring sketch follows this table) |
| Measurement & Metrics | Long-term outcome equity data (not just model performance equity) is tracked to assess whether AI decisions are producing fair real-world results |
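A minimal sketch of a production fairness-drift check, assuming subgroup accuracy is already computed per monitoring window; the baseline, tolerance, and alerting mechanism are placeholders for whatever the team's monitoring platform provides.

```python
# All names, thresholds, and the alerting hook below are assumptions;
# real deployments would wire this into the team's metrics platform.
BASELINE_GAP = 0.04      # subgroup accuracy gap recorded at release time
DRIFT_TOLERANCE = 0.03   # alert if the production gap grows beyond this margin

def check_fairness_drift(window_metrics):
    """window_metrics maps subgroup -> accuracy over the latest production window."""
    current_gap = max(window_metrics.values()) - min(window_metrics.values())
    drifted = current_gap - BASELINE_GAP > DRIFT_TOLERANCE
    if drifted:
        print(f"ALERT: subgroup gap {current_gap:.3f} exceeds release baseline "
              f"{BASELINE_GAP:.3f} by more than {DRIFT_TOLERANCE:.3f}")
    return drifted

# Example: a gap of 0.09 against a baseline of 0.04 triggers an alert
check_fairness_drift({"group_a": 0.91, "group_b": 0.82})
```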
Key Measures
- Percentage of model releases accompanied by a formal bias and fairness assessment report (see the computation sketch after this list)
- Number of releases blocked or modified due to fairness threshold breaches
- Maximum demographic parity gap and equalised odds gap across key subgroups per released model
- Time to remediate a fairness issue from identification to re-release
- Rate of fairness-related incidents in production attributed to models that passed the pre-release fairness assessment (calibration metric)
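A short sketch of how some of these measures could be derived from release records, assuming a simple list-of-records format; the field names and data are hypothetical.

```python
from statistics import median

# Hypothetical release records; field names and values are illustrative only.
releases = [
    {"id": "r1", "assessment_done": True,  "blocked_on_fairness": False, "remediation_days": None},
    {"id": "r2", "assessment_done": True,  "blocked_on_fairness": True,  "remediation_days": 12},
    {"id": "r3", "assessment_done": False, "blocked_on_fairness": False, "remediation_days": None},
]

assessed_pct = 100 * sum(r["assessment_done"] for r in releases) / len(releases)
blocked_count = sum(r["blocked_on_fairness"] for r in releases)
remediation_times = [r["remediation_days"] for r in releases if r["remediation_days"] is not None]

print(f"Releases with formal assessment: {assessed_pct:.0f}%")
print(f"Releases blocked or modified on fairness grounds: {blocked_count}")
if remediation_times:
    print(f"Median time to remediate a fairness issue: {median(remediation_times)} days")
```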