Standard: All AI decisions above defined risk thresholds require human review
Purpose and Strategic Importance
This standard requires that AI systems operating in decision-making contexts identify decisions that meet or exceed a defined risk threshold and route them to a human reviewer before the action is taken or the decision is communicated. It supports the policy that AI decisions must be reviewable by humans, creating a systematic, proportionate control mechanism that preserves human agency for consequential outcomes without imposing review overhead on every AI output. The threshold must be defined by risk, not by convenience.
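The control itself is a simple gate. The sketch below is a minimal, illustrative Python rendering of it, not a prescribed implementation: the threshold value, the `Decision` shape, and the queue names are all assumptions standing in for whatever scoring and routing a given system actually uses.

```python
from dataclasses import dataclass

# Illustrative threshold; in practice set per use case by risk, legal,
# and business stakeholders: defined by risk, not convenience.
RISK_THRESHOLD = 0.7

@dataclass
class Decision:
    decision_id: str
    risk_score: float  # assumed to come from an upstream risk model
    summary: str

def route(decision: Decision) -> str:
    """Gate a single AI decision on the defined risk threshold."""
    if decision.risk_score >= RISK_THRESHOLD:
        # Meets or exceeds the threshold: hold the decision and queue it
        # for human review before it is executed or communicated.
        return "human_review_queue"
    # Below the threshold: proceed without review, subject to normal
    # logging and monitoring.
    return "auto_execute"
```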
Strategic Impact
- Protects individuals and the organisation from the consequences of high-stakes AI errors by maintaining a human control layer
- Creates a proportionate governance approach that applies review overhead where risk justifies it, not uniformly
- Provides a critical data collection point where human reviewer decisions can be used to improve model quality over time
- Builds user and regulatory trust by demonstrating that accountability for consequential decisions rests with humans, not algorithms
- Enables the organisation to operate AI at scale in regulated domains by meeting the human oversight requirements of frameworks such as the EU AI Act
Risks of Not Having This Standard
- High-stakes AI decisions with material consequences for individuals or the organisation are executed without human accountability
- Systematic model errors in high-risk decision classes cause widespread harm before they are detected
- Regulatory and legal liability increases when the organisation cannot demonstrate that a human reviewed a consequential decision
- Users and affected parties lose trust when they discover that decisions affecting them were made entirely by an AI system
- The organisation lacks the human review data needed to identify and correct the most consequential model failure modes
CMMI Maturity Model
Level 1 – Initial
| Category | Description |
| --- | --- |
| People & Culture | Human review of AI decisions happens informally and inconsistently; whether a decision gets reviewed depends on individual awareness |
| Process & Governance | No risk threshold policy; AI systems make decisions autonomously across all use cases without a formal review trigger |
| Technology & Tools | AI systems have no built-in routing for high-risk outputs; all decisions are treated identically |
| Measurement & Metrics | No tracking of which decisions are reviewed or what proportion of high-risk decisions involve human oversight |
Level 2 – Managed
| Category | Description |
| --- | --- |
| People & Culture | Teams identify the decision types most likely to cause harm and establish informal norms for human review |
| Process & Governance | A risk classification for AI use cases is documented; high-risk use cases are listed with a stated review requirement |
| Technology & Tools | Manual review queues exist for high-risk AI outputs; flagging is partially automated based on simple rules (see the rule-based flagging sketch after this table) |
| Measurement & Metrics | Review completion rate for high-risk decisions is tracked; teams monitor the queue and report on backlogs |
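At Level 2, partially automated flagging is typically a handful of hand-written rules rather than a calibrated score. A minimal sketch, where the attribute names (`amount`, `use_case`, `affects_individual`) and the use-case list are illustrative assumptions rather than a standard schema:

```python
HIGH_RISK_USE_CASES = {"credit_decisions", "hiring", "medical_triage"}  # illustrative

def needs_manual_review(decision: dict) -> bool:
    """Level 2 style flagging: simple, hand-maintained rules derived
    from the documented list of high-risk use cases."""
    return any([
        decision.get("amount", 0) > 10_000,               # large financial impact
        decision.get("use_case") in HIGH_RISK_USE_CASES,  # listed high-risk use case
        decision.get("affects_individual", False),        # direct impact on a person
    ])

# Example: a large credit decision is queued for review.
print(needs_manual_review({"use_case": "credit_decisions", "amount": 25_000}))  # True
```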
Level 3 – Defined
| Category | Description |
| --- | --- |
| People & Culture | Risk threshold definitions are agreed across AI, legal, risk, and business teams; reviewers are trained on the review criteria |
| Process & Governance | A formal risk threshold framework defines when AI decisions require human review, who is qualified to review, and what constitutes a valid review outcome |
| Technology & Tools | AI systems are built with review routing logic that automatically flags decisions meeting threshold criteria; reviewer tooling supports efficient and auditable review (see the routing sketch after this table) |
| Measurement & Metrics | Human review rate, review completion time, and reviewer override rate are tracked per AI use case and reported in governance forums |
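A minimal sketch of what Level 3 routing and auditable review capture might look like. The per-use-case thresholds, the outcome categories, and the record fields are assumptions for illustration; the real definitions come from the formal risk threshold framework:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum

# Hypothetical per-use-case thresholds from the risk threshold framework.
THRESHOLDS = {"credit_limit": 0.5, "content_moderation": 0.8}

def requires_review(use_case: str, risk_score: float) -> bool:
    """Flag decisions meeting or exceeding their use case's threshold.
    Unknown use cases default to 0.0, i.e. always reviewed (fail safe)."""
    return risk_score >= THRESHOLDS.get(use_case, 0.0)

class ReviewOutcome(Enum):
    APPROVED = "approved"      # reviewer confirms the AI decision
    OVERRIDDEN = "overridden"  # reviewer substitutes a different decision
    ESCALATED = "escalated"    # reviewer refers the case upward

@dataclass
class ReviewRecord:
    """One auditable review: who reviewed, what they decided, and why."""
    decision_id: str
    use_case: str
    reviewer_id: str
    outcome: ReviewOutcome
    rationale: str
    reviewed_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
```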
Level 4 – Quantitatively Managed
| Category | Description |
| --- | --- |
| People & Culture | Review quality as well as review quantity is measured; reviewers are calibrated against each other to ensure consistency |
| Process & Governance | Review SLAs are defined per risk tier; breaches trigger escalation; threshold definitions are reviewed annually or when the model or context changes significantly |
| Technology & Tools | Reviewer decision data is captured and linked back to AI outputs; disagreement between AI and reviewer is flagged for model improvement |
| Measurement & Metrics | Reviewer override rate, review SLA compliance, and AI-reviewer agreement rate are measured and used to assess threshold calibration (see the metrics sketch after this table) |
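The Level 4 measures fall out of the captured review data. A self-contained sketch, with illustrative SLA values and field names:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Illustrative SLAs per risk tier, not normative values.
REVIEW_SLA = {"high": timedelta(hours=4), "critical": timedelta(hours=1)}

@dataclass
class CompletedReview:
    tier: str
    queued_at: datetime
    reviewed_at: datetime
    overridden: bool  # True if the reviewer changed the AI decision

def override_rate(records: list[CompletedReview]) -> float:
    """Share of reviewed decisions changed by the human; used alongside
    the AI-reviewer agreement rate to assess threshold calibration."""
    if not records:
        return 0.0
    return sum(r.overridden for r in records) / len(records)

def sla_compliance(records: list[CompletedReview]) -> float:
    """Share of reviews completed within their tier's SLA; each breach
    should trigger the defined escalation path."""
    if not records:
        return 1.0
    within = sum((r.reviewed_at - r.queued_at) <= REVIEW_SLA[r.tier] for r in records)
    return within / len(records)
```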
Level 5 – Optimising
| Category | Description |
| --- | --- |
| People & Culture | Human review insights are systematically fed back into model improvement and threshold calibration processes |
| Process & Governance | Threshold definitions are continuously refined based on actual outcomes from reviewed decisions and regulatory evolution |
| Technology & Tools | Intelligent review routing prioritises the cases where human review adds the most value; AI-assisted pre-review tools surface relevant context to help reviewers make faster, better decisions (see the prioritisation sketch after this table) |
| Measurement & Metrics | Long-term outcome tracking links review decisions to downstream consequences, enabling evidence-based threshold calibration |
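One way to read "prioritises the cases where human review adds the most value" is as an ordering problem. The sketch below uses a deliberately simple proxy, risk weighted by model uncertainty; this scoring is an assumption, and at Level 5 the weighting would itself be learned from outcome data linked to past reviews:

```python
import heapq

def review_priority(risk_score: float, model_uncertainty: float) -> float:
    """Toy value-of-review proxy: risky decisions the model is unsure
    about are where a human is most likely to change the outcome."""
    return risk_score * model_uncertainty

# Max-heap via negated priority: the highest-value case is popped first.
queue: list[tuple[float, str]] = []
for decision_id, risk, uncertainty in [
    ("d1", 0.90, 0.20),
    ("d2", 0.70, 0.80),  # moderately risky but highly uncertain
    ("d3", 0.95, 0.05),  # risky but the model is confident
]:
    heapq.heappush(queue, (-review_priority(risk, uncertainty), decision_id))

while queue:
    neg_priority, decision_id = heapq.heappop(queue)
    print(decision_id, -neg_priority)  # d2 first (0.56), then d1, then d3
```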
Key Measures
- Percentage of AI decisions meeting the defined risk threshold that received a human review before execution (see the coverage sketch after this list)
- Mean time to complete a human review per risk tier against defined SLA
- Reviewer override rate (proportion of AI decisions changed by human review) per use case
- AI-reviewer agreement rate as an indicator of model quality in reviewed decision classes
- Number of threshold definition changes made in the last year based on review outcome evidence
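The first measure is the one that proves the control works. A sketch of its computation, assuming a hypothetical decision log with `meets_threshold`, `executed_at`, and `reviewed_at` fields:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class DecisionLog:
    meets_threshold: bool
    executed_at: datetime
    reviewed_at: Optional[datetime]  # None if never reviewed

def pre_execution_review_coverage(logs: list[DecisionLog]) -> float:
    """Of decisions meeting the risk threshold, the share reviewed by a
    human before execution. Anything below 1.0 means decisions were
    executed without the required review."""
    eligible = [log for log in logs if log.meets_threshold]
    if not eligible:
        return 1.0
    covered = sum(
        log.reviewed_at is not None and log.reviewed_at <= log.executed_at
        for log in eligible
    )
    return covered / len(eligible)
```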