Standard: AI systems provide explainable outputs for high-stakes decisions
Purpose and Strategic Importance
This standard requires that AI systems involved in high-stakes decisions (those with material consequences for individuals, business operations, or safety) produce outputs accompanied by an explanation that a human reviewer or affected party can understand and act upon. It supports the policy of designing for explainability, not just accuracy, by making explainability a functional requirement rather than an afterthought. A model with superior accuracy but no explainability is often less deployable in practice than a slightly less accurate model whose reasoning can be interrogated and defended.
Strategic Impact
- Enables human reviewers to make informed decisions about whether to accept, modify, or override AI outputs
- Supports regulatory compliance in jurisdictions that grant individuals the right to explanation for automated decisions
- Builds user and stakeholder trust by making AI reasoning visible, challengeable, and auditable
- Facilitates model debugging and improvement by exposing which features and patterns drive specific outcomes
- Reduces the liability exposure of deploying AI in consequential contexts by creating an accountable reasoning record
Risks of Not Having This Standard
- Human reviewers cannot meaningfully oversee AI decisions they cannot understand, reducing the value of the human-in-the-loop control
- Regulatory challenges succeed when the organisation cannot explain the basis of an automated decision to regulators or courts
- Users who are adversely affected by AI decisions cannot challenge them effectively without access to a clear explanation
- Model debugging is slow and imprecise because feature importance and decision pathways are opaque
- The organisation develops a culture of deferred accountability in which AI decisions are trusted without question because they cannot be interrogated
CMMI Maturity Model
Level 1 – Initial
| Category | Description |
| --- | --- |
| People & Culture | Explainability is not considered a design requirement; models are evaluated purely on predictive performance |
| Process & Governance | No explainability requirement in the AI design or deployment process; black-box models are used without restriction |
| Technology & Tools | No explainability tooling; model reasoning is entirely opaque to operators and users |
| Measurement & Metrics | Explainability is not measured; there is no baseline for what level of explanation is available or usable |
Level 2 – Managed
| Category | Description |
| --- | --- |
| People & Culture | Teams identify high-stakes use cases where explainability is expected by users or regulators |
| Process & Governance | A requirement to provide feature-level explanation for high-stakes AI decisions is added to design standards |
| Technology & Tools | SHAP or LIME is applied post-hoc to generate feature importance explanations for individual predictions |
| Measurement & Metrics | Availability of explanations for high-stakes decisions is tracked; the team reviews a sample of explanations for intelligibility |
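To make feature-level explanation concrete, the sketch below hand-rolls per-prediction attribution for a linear scorer, where the contribution of each feature is exactly weight × (feature − baseline); this is the value that SHAP-style attributions reduce to in the linear case. The weights, baseline, and input are illustrative, not from the standard; production systems would use SHAP or LIME as the table notes.

```python
def score(weights, bias, x):
    """Linear scoring model standing in for the deployed model."""
    return bias + sum(w * xi for w, xi in zip(weights, x))

def attribute(weights, baseline, x):
    """Per-feature contribution for one prediction, relative to a baseline input."""
    return {i: w * (xi - bi)
            for i, (w, bi, xi) in enumerate(zip(weights, baseline, x))}

# Hypothetical three-feature scorer (values are illustrative only).
weights = [0.8, -0.5, 0.3]
bias = 0.1
baseline = [0.0, 0.0, 0.0]   # reference input, e.g. the population mean
x = [1.0, 2.0, 0.0]

contribs = attribute(weights, baseline, x)

# For a linear model the attribution is exact: baseline score plus the sum of
# contributions reconstructs the prediction.
assert abs(score(weights, bias, x)
           - (score(weights, bias, baseline) + sum(contribs.values()))) < 1e-9
```

The additivity check at the end is the property that makes such explanations auditable: a reviewer can verify that the stated contributions actually account for the score.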
Level 3 – Defined
| Category | Description |
| --- | --- |
| People & Culture | Explainability is a design criterion evaluated at architecture review; the team includes user-facing explanation design in product specifications |
| Process & Governance | A tiered explainability standard defines the minimum explanation type required per decision risk tier (feature attribution, counterfactual, natural language summary) |
| Technology & Tools | Explainability methods are integrated into the inference pipeline; explanations are generated at prediction time and stored alongside outputs |
| Measurement & Metrics | Explanation coverage rate (proportion of high-stakes predictions with a stored explanation) and user comprehension testing results are tracked |
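A minimal sketch of the Level 3 pattern, assuming a simple tier map and record shape (the tier names, explanation types, and field names are illustrative, not prescribed by the standard): the pipeline looks up the minimum explanation type for the decision's risk tier, generates the explanation at prediction time, and stores it alongside the output.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical tier map: minimum explanation type required per risk tier.
REQUIRED_EXPLANATION = {
    "low": None,                      # no explanation mandated
    "medium": "feature_attribution",
    "high": "counterfactual",
}

@dataclass
class DecisionRecord:
    """Prediction stored together with its explanation, as the pipeline requires."""
    prediction: float
    risk_tier: str
    explanation_type: Optional[str]
    explanation: Optional[dict]
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

def predict_with_explanation(model, explainer, x, risk_tier):
    """Generate the required explanation at prediction time, not as an afterthought."""
    required = REQUIRED_EXPLANATION[risk_tier]
    explanation = explainer(x) if required else None
    return DecisionRecord(model(x), risk_tier, required, explanation)

# Toy model and explainer standing in for the real inference pipeline.
model = lambda x: sum(x)
explainer = lambda x: {f"f{i}": v for i, v in enumerate(x)}

record = predict_with_explanation(model, explainer, [1.0, 2.0], "high")
assert record.explanation is not None
```

Storing the explanation in the same record as the prediction is what makes the coverage rate in the metrics row directly computable from the decision log.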
Level 4 – Quantitatively Managed
| Category | Description |
| --- | --- |
| People & Culture | User-tested explanation quality metrics are tracked; teams iterate on explanation design based on user comprehension and review efficiency data |
| Process & Governance | Explanation quality thresholds gate deployment for high-risk use cases; explanation fidelity and comprehensibility are assessed at model release |
| Technology & Tools | Explainability methods are selected based on their fidelity to the model's actual reasoning process, not just their interpretability to users |
| Measurement & Metrics | Explanation fidelity, user comprehension rate, reviewer decision efficiency improvement, and challenge rate from affected parties are measured per use case |
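One common way to operationalise the fidelity assessment in this table is agreement between the interpretable surrogate used for explanation and the underlying model on a held-out sample; the sketch below assumes that framing, and the toy classifiers, sample, and threshold are illustrative.

```python
def fidelity(model, surrogate, sample):
    """Fraction of sample points where the surrogate agrees with the model."""
    agree = sum(1 for x in sample if model(x) == surrogate(x))
    return agree / len(sample)

def release_gate(score, threshold):
    """Deployment gate: block release when explanation fidelity is below threshold."""
    return score >= threshold

# Toy binary classifiers standing in for the deployed model and its
# interpretable surrogate (an intentionally poor approximation here).
model = lambda x: int(x[0] + 0.5 * x[1] > 1.0)
surrogate = lambda x: int(x[0] > 0.8)

sample = [[0.0, 0.0], [1.0, 1.0], [0.9, 0.0], [0.5, 2.0]]
score = fidelity(model, surrogate, sample)

# With an illustrative 0.9 threshold, this surrogate would not pass release.
assert not release_gate(score, 0.9)
```

Measuring fidelity separately from user comprehension matters because an explanation can be easy to understand yet unfaithful to what the model actually computed.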
Level 5 – Optimising
| Category | Description |
| --- | --- |
| People & Culture | Explainability design is treated as a user experience discipline; affected community feedback informs explanation format and vocabulary |
| Process & Governance | Explainability standards are continuously updated based on regulatory developments, user research, and advances in interpretable AI |
| Technology & Tools | Inherently interpretable model architectures are preferred for high-risk use cases where performance allows; post-hoc methods are reserved for cases where complex models are necessary |
| Measurement & Metrics | Long-term tracking of challenge and appeal rates informs the adequacy of explanation provision; reduction in successful challenges indicates improving explanation quality |
Key Measures
- Percentage of high-stakes AI decisions with an associated stored explanation meeting the defined standard for that risk tier
- User comprehension rate for AI explanations measured through usability testing (proportion of users who can correctly interpret the explanation)
- Reviewer decision time improvement when AI explanations are provided versus withheld (efficiency metric)
- Number of successful regulatory or legal challenges where inadequate explanation was cited as a factor
- Explanation fidelity score (degree to which the explanation accurately represents the model's reasoning) per model release
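The first key measure above can be computed directly from a decision log, assuming each record carries a risk tier and an explanation field (both field names are illustrative):

```python
def coverage_rate(records):
    """Proportion of high-stakes decisions that have a stored explanation."""
    high_stakes = [r for r in records if r["risk_tier"] == "high"]
    if not high_stakes:
        return 1.0   # vacuously covered: no high-stakes decisions in the log
    covered = sum(1 for r in high_stakes if r.get("explanation") is not None)
    return covered / len(high_stakes)

# Illustrative decision log: one covered high-stakes record, one uncovered.
records = [
    {"risk_tier": "high", "explanation": {"top_feature": "income"}},
    {"risk_tier": "high", "explanation": None},
    {"risk_tier": "low", "explanation": None},
]
rate = coverage_rate(records)   # 1 of 2 high-stakes records is covered
```

Tracking this rate per risk tier, rather than as a single aggregate, keeps a flood of low-stakes decisions from masking gaps where explanation matters most.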