Standard: Model degradation triggers are defined and monitored in production
Purpose and Strategic Importance
This standard requires that for every AI model in production, specific quantitative degradation triggers are defined before deployment and actively monitored in live operation. When a trigger is breached, a defined response — alert, escalation, automatic rollback, or retraining initiation — must occur without relying on manual detection. It supports the policy of governing AI models throughout their lifecycle by treating post-deployment governance as an engineering concern, not a periodic management review. Without defined triggers, degradation is discovered through user complaints rather than instrumentation.
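The core requirement above — quantitative triggers defined before deployment, each mapped to a defined response — can be sketched as a small data structure. This is a minimal illustration, not a prescribed implementation; the names (`DegradationTrigger`, `Response`) and the metric shown are assumptions for the example:

```python
from dataclasses import dataclass
from enum import Enum


class Response(Enum):
    """The defined responses named by this standard."""
    ALERT = "alert"
    ESCALATE = "escalate"
    ROLLBACK = "rollback"
    RETRAIN = "retrain"


@dataclass
class DegradationTrigger:
    """One quantitative degradation trigger, agreed before deployment."""
    metric_name: str   # e.g. a hypothetical "auc_7d" rolling metric
    threshold: float   # breach boundary agreed with the business
    direction: str     # "below" or "above"
    response: Response # what must happen on breach, without manual detection

    def breached(self, observed: float) -> bool:
        """Return True when the observed value crosses the threshold."""
        if self.direction == "below":
            return observed < self.threshold
        return observed > self.threshold


# Example: alert if the 7-day rolling AUC falls under the agreed floor.
trigger = DegradationTrigger("auc_7d", threshold=0.82, direction="below",
                             response=Response.ALERT)
```

Defining the trigger as data (rather than burying thresholds in monitoring code) is what makes it reviewable at deployment time and auditable afterwards.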
Strategic Impact
- Enables proactive response to model degradation before it reaches the threshold of user impact or business harm
- Creates a contractual quality agreement between the AI team and the business about what "acceptable model performance" means in production
- Reduces mean time to detect and mean time to recover from AI performance incidents through automated alerting
- Provides the quantitative evidence needed to justify retraining investment at the right time rather than on an arbitrary schedule
- Supports lifecycle governance requirements in regulated industries where continuous model oversight is mandatory
Risks of Not Having This Standard
- Model degradation compounds silently for months before discovery, maximising harm and recovery cost
- Retraining decisions are made on gut feel or calendar schedules rather than evidence of actual degradation
- Incident response is slow because the team must first establish baseline performance before investigating the extent of the problem
- Business stakeholders lose confidence when they discover that the organisation's AI systems degrade without detection
- Regulatory scrutiny increases when organisations cannot demonstrate that they have mechanisms to detect degrading model performance
CMMI Maturity Model
Level 1 – Initial
| Category | Description |
| --- | --- |
| People & Culture | Model degradation is detected through user complaints or periodic manual spot-checks; there is no proactive detection |
| Process & Governance | No degradation trigger policy; the team has no formal agreement about what level of performance decline constitutes a problem |
| Technology & Tools | Production monitoring is limited to infrastructure metrics (latency, error rate); model quality metrics are absent |
| Measurement & Metrics | No production model quality metrics; degradation cannot be quantified until it has caused visible harm |
Level 2 – Managed
| Category | Description |
| --- | --- |
| People & Culture | Teams identify the key quality metrics for each production model and discuss threshold levels informally |
| Process & Governance | A requirement to define at least one degradation trigger per production model is established; triggers are documented at deployment |
| Technology & Tools | Basic metric dashboards display proxy metrics (prediction score distributions, volume anomalies) that can indicate degradation |
| Measurement & Metrics | Trigger thresholds are defined per model; alerts are sent when thresholds are breached, though response procedures are informal |
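The proxy metric named at this level — drift in prediction score distributions — is commonly quantified with the Population Stability Index. A minimal sketch; the stability bands in the docstring are a widely used rule of thumb, not part of this standard:

```python
import numpy as np


def population_stability_index(expected, actual, bins=10):
    """PSI between a baseline score distribution and a production window.

    Rule of thumb: PSI < 0.1 is stable, 0.1-0.25 warrants investigation,
    > 0.25 suggests significant drift.
    """
    # Bin edges come from the baseline so both windows are compared
    # on the same grid.
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_counts, _ = np.histogram(expected, bins=edges)
    a_counts, _ = np.histogram(actual, bins=edges)
    # Convert to proportions; a small floor avoids log(0) / division by zero.
    e_pct = np.clip(e_counts / e_counts.sum(), 1e-6, None)
    a_pct = np.clip(a_counts / a_counts.sum(), 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))
```

A PSI trigger needs no ground truth labels, which is why it appears at this maturity level before ground-truth performance monitoring does.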
Level 3 – Defined
| Category | Description |
| --- | --- |
| People & Culture | Degradation trigger definition is part of the deployment readiness checklist; triggers are agreed between ML, product, and operations teams |
| Process & Governance | A formal trigger definition standard specifies required metric types (data drift, prediction drift, ground truth performance) and response procedures per trigger type |
| Technology & Tools | Model monitoring platforms track defined triggers automatically; automated alerts are routed to on-call channels with context to support rapid response |
| Measurement & Metrics | Trigger breach rate, alert response time, and false positive rate are tracked per model and reviewed in operational reviews |
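The routing described at this level — each trigger type mapped to a response procedure and an on-call channel, with context attached to the alert — might be sketched as follows. The channel names, procedure labels, and payload fields are all hypothetical:

```python
# Hypothetical routing table: the three metric types this standard requires,
# each mapped to an on-call destination and a response procedure.
TRIGGER_TYPES = {
    "data_drift":       {"channel": "#ml-oncall",    "procedure": "investigate-inputs"},
    "prediction_drift": {"channel": "#ml-oncall",    "procedure": "compare-baselines"},
    "ground_truth":     {"channel": "#ml-incidents", "procedure": "rollback-or-retrain"},
}


def build_alert(model_id, trigger_type, metric, observed, threshold):
    """Assemble an alert payload carrying the context responders need."""
    route = TRIGGER_TYPES[trigger_type]
    return {
        "model": model_id,
        "trigger_type": trigger_type,
        "metric": metric,
        "observed": observed,
        "threshold": threshold,
        "procedure": route["procedure"],
        "channel": route["channel"],
    }
```

Including the observed value, threshold, and procedure in the payload is what lets the on-call responder act without first reconstructing baseline performance.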
Level 4 – Quantitatively Managed
| Category | Description |
| --- | --- |
| People & Culture | Teams are accountable for trigger coverage and response SLAs; trigger effectiveness is reviewed quarterly using incident retrospective data |
| Process & Governance | Trigger thresholds are calibrated quantitatively based on the cost of false positives (unnecessary retraining) and false negatives (undetected degradation) |
| Technology & Tools | Multi-metric anomaly detection combines multiple signals to reduce false positive rates while maintaining sensitivity |
| Measurement & Metrics | Mean time to detect degradation, mean time to recover, and trigger sensitivity and specificity are measured and reported |
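The cost-based calibration described under Process & Governance can be sketched as a search over candidate thresholds against labelled history. The function name and the shape of `history` are assumptions for illustration:

```python
def calibrate_threshold(history, candidate_thresholds, cost_fp, cost_fn):
    """Pick the threshold minimising expected cost over labelled history.

    history: list of (metric_value, truly_degraded: bool) from past
    monitoring windows, labelled in retrospectives. A window alerts when
    metric_value exceeds the threshold (higher = worse).
    """
    def expected_cost(t):
        # False positive: alerted, but the model was actually healthy.
        fp = sum(1 for v, degraded in history if v > t and not degraded)
        # False negative: stayed silent while the model was degrading.
        fn = sum(1 for v, degraded in history if v <= t and degraded)
        return cost_fp * fp + cost_fn * fn
    return min(candidate_thresholds, key=expected_cost)
```

Raising `cost_fn` relative to `cost_fp` pushes the chosen threshold lower (more sensitive), which makes the business trade-off explicit rather than implicit in an arbitrary number.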
Level 5 – Optimising
| Category | Description |
| --- | --- |
| People & Culture | Trigger design knowledge is shared across teams; the organisation develops a library of effective trigger patterns per AI use case type |
| Process & Governance | Trigger definitions are continuously refined based on incident retrospectives and advances in drift detection methodology |
| Technology & Tools | Adaptive trigger systems adjust thresholds dynamically based on seasonal patterns and known environmental changes |
| Measurement & Metrics | Long-term data on trigger effectiveness is used to build predictive models of when specific model types are likely to degrade, enabling proactive retraining |
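One simple form of the adaptive thresholding described at this level — per-weekday baselines rather than one static number — might look like this; the weekday keying and the `k`-sigma rule are illustrative assumptions:

```python
from collections import defaultdict
from statistics import mean, stdev


def adaptive_thresholds(history, k=3.0):
    """Per-weekday drift thresholds learned from historical metric values.

    history: list of (weekday: int 0-6, metric_value). The threshold for
    each weekday is mean + k * sample std of that weekday's past values,
    so the trigger adapts to known weekly seasonality instead of firing
    on every Monday traffic spike.
    """
    by_day = defaultdict(list)
    for day, value in history:
        by_day[day].append(value)
    # stdev needs at least two observations per weekday.
    return {day: mean(vals) + k * stdev(vals)
            for day, vals in by_day.items() if len(vals) >= 2}
```

The same pattern extends to monthly seasonality or known environmental changes by swapping the grouping key.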
Key Measures
- Percentage of production AI models with at least one formally defined and actively monitored degradation trigger
- Mean time to detect a degradation event from the point at which it first exceeded the trigger threshold
- Trigger false positive rate (alerts raised that did not correspond to genuine degradation requiring intervention)
- Trigger false negative rate (degradation events that were not detected by triggers before user impact)
- Mean time to recover from a triggered degradation event (retrain, recalibrate, or rollback)
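Two of the measures above — mean time to detect, and by the same pattern mean time to recover — reduce to averaging gaps between timestamped events. A sketch assuming a hypothetical event log of (breach start, detection time) pairs:

```python
from datetime import datetime, timedelta


def mean_time_to_detect(events):
    """MTTD: average gap between the first threshold breach and detection.

    events: list of (breach_start: datetime, detected_at: datetime) pairs,
    one per degradation event. The same function computes MTTR when fed
    (detected_at, recovered_at) pairs instead.
    """
    deltas = [(detected - start).total_seconds()
              for start, detected in events]
    return timedelta(seconds=sum(deltas) / len(deltas))
```

Measuring from the first threshold breach (not from when a human noticed) is what makes the metric an honest test of the instrumentation rather than of the on-call rota.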