Standard: All AI decisions above defined risk thresholds require human review
Purpose and Strategic Importance
This standard requires that AI systems operating in decision-making contexts identify decisions that meet or exceed a defined risk threshold and route them to a human reviewer before the action is taken or the decision is communicated. It supports the policy that AI decisions must be reviewable by humans, creating a systematic, proportionate control mechanism that preserves human agency for consequential outcomes without imposing review overhead on every AI output. The threshold must be defined by risk, not by convenience.
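The control itself is a simple gate. The sketch below is a minimal, illustrative Python rendering of it, not a prescribed implementation: the threshold value, the `Decision` shape, and the queue names are all assumptions standing in for whatever scoring and routing a given system actually uses.

```python
from dataclasses import dataclass

# Illustrative threshold; in practice set per use case by risk, legal,
# and business stakeholders: defined by risk, not convenience.
RISK_THRESHOLD = 0.7

@dataclass
class Decision:
    decision_id: str
    risk_score: float  # assumed to come from an upstream risk model
    summary: str

def route(decision: Decision) -> str:
    """Gate a single AI decision on the defined risk threshold."""
    if decision.risk_score >= RISK_THRESHOLD:
        # Meets or exceeds the threshold: hold the decision and queue it
        # for human review before it is executed or communicated.
        return "human_review_queue"
    # Below the threshold: proceed without review, subject to normal
    # logging and monitoring.
    return "auto_execute"
```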
Strategic Impact
- Protects individuals and the organisation from the consequences of high-stakes AI errors by maintaining a human control layer
- Creates a proportionate governance approach that applies review overhead where risk justifies it, not uniformly
- Provides a critical data collection point where human reviewer decisions can be used to improve model quality over time
- Builds user and regulatory trust by demonstrating that accountability for consequential decisions rests with humans, not algorithms
- Enables the organisation to operate AI at scale in regulated domains by meeting the human oversight requirements of frameworks such as the EU AI Act
Risks of Not Having This Standard
- High-stakes AI decisions with material consequences for individuals or the organisation are executed without human accountability
- Systematic model errors in high-risk decision classes cause widespread harm before they are detected
- Regulatory and legal liability increases when the organisation cannot demonstrate that a human reviewed a consequential decision
- Users and affected parties lose trust when they discover that decisions affecting them were made entirely by an AI system
- The organisation lacks the human review data needed to identify and correct the most consequential model failure modes
CMMI Maturity Model
Level 1 – Initial
| Category | Description |
| --- | --- |
| People & Culture | Human review of AI decisions happens informally and inconsistently; whether a decision gets reviewed depends on individual awareness |
| Process & Governance | No risk threshold policy; AI systems make decisions autonomously across all use cases without a formal review trigger |
| Technology & Tools | AI systems have no built-in routing for high-risk outputs; all decisions are treated identically |
| Measurement & Metrics | No tracking of which decisions are reviewed or what proportion of high-risk decisions involve human oversight |
Level 2 – Managed
| Category | Description |
| --- | --- |
| People & Culture | Teams identify the decision types most likely to cause harm and establish informal norms for human review |
| Process & Governance | A risk classification for AI use cases is documented; high-risk use cases are listed with a stated review requirement |
| Technology & Tools | Manual review queues exist for high-risk AI outputs; flagging is partially automated based on simple rules (see the rule-based flagging sketch after this table) |
| Measurement & Metrics | Review completion rate for high-risk decisions is tracked; teams monitor the queue and report on backlogs |
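At Level 2, partially automated flagging is typically a handful of hand-written rules rather than a calibrated score. A minimal sketch, where the attribute names (`amount`, `use_case`, `affects_individual`) and the use-case list are illustrative assumptions rather than a standard schema:

```python
HIGH_RISK_USE_CASES = {"credit_decisions", "hiring", "medical_triage"}  # illustrative

def needs_manual_review(decision: dict) -> bool:
    """Level 2 style flagging: simple, hand-maintained rules derived
    from the documented list of high-risk use cases."""
    return any([
        decision.get("amount", 0) > 10_000,               # large financial impact
        decision.get("use_case") in HIGH_RISK_USE_CASES,  # listed high-risk use case
        decision.get("affects_individual", False),        # direct impact on a person
    ])

# Example: a large credit decision is queued for review.
print(needs_manual_review({"use_case": "credit_decisions", "amount": 25_000}))  # True
```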
Level 3 – Defined
| Category | Description |
| --- | --- |
| People & Culture | Risk threshold definitions are agreed across AI, legal, risk, and business teams; reviewers are trained on the review criteria |
| Process & Governance | A formal risk threshold framework defines when AI decisions require human review, who is qualified to review, and what constitutes a valid review outcome |
| Technology & Tools | AI systems are built with review routing logic that automatically flags decisions meeting threshold criteria; reviewer tooling supports efficient and auditable review (see the routing sketch after this table) |
| Measurement & Metrics | Human review rate, review completion time, and reviewer override rate are tracked per AI use case and reported in governance forums |
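A minimal sketch of what Level 3 routing and auditable review capture might look like. The per-use-case thresholds, the outcome categories, and the record fields are assumptions for illustration; the real definitions come from the formal risk threshold framework:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum

# Hypothetical per-use-case thresholds from the risk threshold framework.
THRESHOLDS = {"credit_limit": 0.5, "content_moderation": 0.8}

def requires_review(use_case: str, risk_score: float) -> bool:
    """Flag decisions meeting or exceeding their use case's threshold.
    Unknown use cases default to 0.0, i.e. always reviewed (fail safe)."""
    return risk_score >= THRESHOLDS.get(use_case, 0.0)

class ReviewOutcome(Enum):
    APPROVED = "approved"      # reviewer confirms the AI decision
    OVERRIDDEN = "overridden"  # reviewer substitutes a different decision
    ESCALATED = "escalated"    # reviewer refers the case upward

@dataclass
class ReviewRecord:
    """One auditable review: who reviewed, what they decided, and why."""
    decision_id: str
    use_case: str
    reviewer_id: str
    outcome: ReviewOutcome
    rationale: str
    reviewed_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
```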
Level 4 – Quantitatively Managed
| Category | Description |
| --- | --- |
| People & Culture | Review quality as well as review quantity is measured; reviewers are calibrated against each other to ensure consistency |
| Process & Governance | Review SLAs are defined per risk tier; breaches trigger escalation; threshold definitions are reviewed annually or when the model or context changes significantly |
| Technology & Tools | Reviewer decision data is captured and linked back to AI outputs; disagreement between AI and reviewer is flagged for model improvement |
| Measurement & Metrics | Reviewer override rate, review SLA compliance, and AI-reviewer agreement rate are measured and used to assess threshold calibration (see the metrics sketch after this table) |
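The Level 4 measures fall out of the captured review data. A self-contained sketch, with illustrative SLA values and field names:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Illustrative SLAs per risk tier, not normative values.
REVIEW_SLA = {"high": timedelta(hours=4), "critical": timedelta(hours=1)}

@dataclass
class CompletedReview:
    tier: str
    queued_at: datetime
    reviewed_at: datetime
    overridden: bool  # True if the reviewer changed the AI decision

def override_rate(records: list[CompletedReview]) -> float:
    """Share of reviewed decisions changed by the human; used alongside
    the AI-reviewer agreement rate to assess threshold calibration."""
    if not records:
        return 0.0
    return sum(r.overridden for r in records) / len(records)

def sla_compliance(records: list[CompletedReview]) -> float:
    """Share of reviews completed within their tier's SLA; each breach
    should trigger the defined escalation path."""
    if not records:
        return 1.0
    within = sum((r.reviewed_at - r.queued_at) <= REVIEW_SLA[r.tier] for r in records)
    return within / len(records)
```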
Level 5 – Optimising
| Category | Description |
| --- | --- |
| People & Culture | Human review insights are systematically fed back into model improvement and threshold calibration processes |
| Process & Governance | Threshold definitions are continuously refined based on actual outcomes from reviewed decisions and regulatory evolution |
| Technology & Tools | Intelligent review routing prioritises the cases where human review adds the most value; AI-assisted pre-review tools surface relevant context to help reviewers make faster, better decisions (see the prioritisation sketch after this table) |
| Measurement & Metrics | Long-term outcome tracking links review decisions to downstream consequences, enabling evidence-based threshold calibration |
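One way to read "prioritises the cases where human review adds the most value" is as an ordering problem. The sketch below uses a deliberately simple proxy, risk weighted by model uncertainty; this scoring is an assumption, and at Level 5 the weighting would itself be learned from outcome data linked to past reviews:

```python
import heapq

def review_priority(risk_score: float, model_uncertainty: float) -> float:
    """Toy value-of-review proxy: risky decisions the model is unsure
    about are where a human is most likely to change the outcome."""
    return risk_score * model_uncertainty

# Max-heap via negated priority: the highest-value case is popped first.
queue: list[tuple[float, str]] = []
for decision_id, risk, uncertainty in [
    ("d1", 0.90, 0.20),
    ("d2", 0.70, 0.80),  # moderately risky but highly uncertain
    ("d3", 0.95, 0.05),  # risky but the model is confident
]:
    heapq.heappush(queue, (-review_priority(risk, uncertainty), decision_id))

while queue:
    neg_priority, decision_id = heapq.heappop(queue)
    print(decision_id, -neg_priority)  # d2 first (0.56), then d1, then d3
```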
Key Measures
- Percentage of AI decisions meeting the defined risk threshold that received a human review before execution (see the coverage sketch after this list)
- Mean time to complete a human review per risk tier against defined SLA
- Reviewer override rate (proportion of AI decisions changed by human review) per use case
- AI-reviewer agreement rate as an indicator of model quality in reviewed decision classes
- Number of threshold definition changes made in the last year based on review outcome evidence
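The first measure is the one that proves the control works. A sketch of its computation, assuming a hypothetical decision log with `meets_threshold`, `executed_at`, and `reviewed_at` fields:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class DecisionLog:
    meets_threshold: bool
    executed_at: datetime
    reviewed_at: Optional[datetime]  # None if never reviewed

def pre_execution_review_coverage(logs: list[DecisionLog]) -> float:
    """Of decisions meeting the risk threshold, the share reviewed by a
    human before execution. Anything below 1.0 means decisions were
    executed without the required review."""
    eligible = [log for log in logs if log.meets_threshold]
    if not eligible:
        return 1.0
    covered = sum(
        log.reviewed_at is not None and log.reviewed_at <= log.executed_at
        for log in eligible
    )
    return covered / len(eligible)
```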