Standard: AI systems are tested with adversarial and edge case inputs
Purpose and Strategic Importance
This standard requires that AI systems be deliberately probed with inputs designed to expose failure modes — including adversarial examples crafted to fool the model, edge cases at the boundary of the training distribution, and real-world noise scenarios. It supports the policy of rigorous pre-deployment evaluation by ensuring that models are tested not just on clean representative data but on the messy, ambiguous, and intentionally malicious inputs they will encounter in production. Passing standard accuracy benchmarks is necessary but not sufficient for safe deployment.
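The gap between clean-data accuracy and adversarial robustness can be shown with a minimal sketch: a toy linear classifier that is correct on a clean input but flipped by a small FGSM-style signed perturbation. The weights, input, and epsilon below are illustrative assumptions, not drawn from any real system.

```python
import numpy as np

# Hypothetical linear classifier: predict 1 if w . x > 0, else 0.
w = np.array([1.0, -2.0, 0.5])

def predict(x):
    return int(w @ x > 0)

def fgsm_perturb(x, label, eps):
    """FGSM-style step: for a linear score the gradient w.r.t. x is w,
    so we step against the direction that supports the true label."""
    step = eps * np.sign(w)
    return x - step if label == 1 else x + step

x = np.array([0.5, -0.4, 0.2])   # clean input, true label 1
assert predict(x) == 1           # clean accuracy looks fine

x_adv = fgsm_perturb(x, label=1, eps=0.6)
print(predict(x_adv))            # the small L-inf perturbation flips the prediction to 0
```

The same model that passes a clean benchmark fails under a bounded perturbation, which is exactly the blind spot this standard is designed to close.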
Strategic Impact
- Surfaces failure modes before they reach users, reducing the cost and reputational damage of production incidents
- Strengthens model robustness by creating a feedback loop between adversarial findings and model improvement
- Builds trust with risk and compliance stakeholders by demonstrating that the team has actively tried to break the system
- Reduces the attack surface for adversarial manipulation in customer-facing and security-sensitive AI applications
- Creates reusable adversarial test suites that accumulate value across model iterations
Risks of Not Having This Standard
- Production systems fail unexpectedly on inputs that were predictable and testable in advance
- Malicious actors exploit known model weaknesses that the team never investigated
- Edge case failures in high-stakes domains (credit, healthcare, safety systems) cause disproportionate harm
- Model confidence scores are misleading because they have never been calibrated against out-of-distribution inputs
- Teams are unable to distinguish robust models from brittle ones because their evaluation methodology is too narrow
CMMI Maturity Model
Level 1 – Initial
| Category | Description |
| --- | --- |
| People & Culture | Testing is limited to happy-path scenarios on clean data; adversarial thinking is absent from engineering practice |
| Process & Governance | No requirement to test edge cases or adversarial inputs; evaluation ends when standard metrics pass |
| Technology & Tools | Test sets are drawn only from the same distribution as training data; no dedicated adversarial tooling |
| Measurement & Metrics | Only standard accuracy or loss metrics are reported; robustness is not measured |
Level 2 – Managed
| Category | Description |
| --- | --- |
| People & Culture | Teams begin to document known failure modes and edge cases discovered in production for inclusion in future test sets |
| Process & Governance | A requirement to include edge case tests is added to the release checklist; adversarial coverage is informal but expected |
| Technology & Tools | Teams manually construct edge case examples based on domain knowledge; test sets are versioned alongside models |
| Measurement & Metrics | Edge case pass rates are tracked separately from standard benchmark results |
Level 3 – Defined
| Category | Description |
| --- | --- |
| People & Culture | Adversarial testing is a standard engineering discipline; team members are trained in perturbation techniques and red-teaming approaches |
| Process & Governance | A formal adversarial testing protocol is defined covering input perturbation, boundary cases, out-of-distribution samples, and prompt injection for generative models |
| Technology & Tools | Automated adversarial test suites run in the evaluation pipeline; tools such as Foolbox, TextAttack, or domain-specific fuzzers are integrated |
| Measurement & Metrics | Adversarial robustness scores are reported alongside standard metrics; minimum robustness thresholds gate production deployment |
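The Level 3 practice of gating deployment on a minimum robustness threshold can be sketched as follows. The toy model, the hand-built suite, the `gate_release` helper, and the 0.9 threshold are all illustrative assumptions, not a prescribed API.

```python
def robustness_score(predict, suite):
    """Fraction of adversarial/edge-case examples handled correctly.
    `suite` is a list of (input, expected_label) pairs (hypothetical format)."""
    passed = sum(predict(x) == y for x, y in suite)
    return passed / len(suite)

def gate_release(predict, suite, threshold=0.9):
    """Deployment gate: fail the evaluation pipeline if robustness
    falls below the minimum threshold for this risk tier."""
    score = robustness_score(predict, suite)
    return score >= threshold, score

# Toy absolute-threshold model and a small hand-built suite that mixes
# clean cases, a boundary case just below the decision surface, and NaN noise.
predict = lambda x: int(abs(x) > 1.0)
suite = [(1.5, 1), (0.2, 0), (-1.2, 1), (0.999999, 1), (float("nan"), 0)]

ok, score = gate_release(predict, suite, threshold=0.9)
print(ok, score)   # the boundary case fails, so the gate blocks the release
```

Versioning `suite` alongside the model, as Level 2 already requires, lets the gate tighten over time as production incidents add new cases.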
Level 4 – Quantitatively Managed
| Category | Description |
| --- | --- |
| People & Culture | Teams run structured red-team exercises before major releases; findings feed directly into model improvement backlogs |
| Process & Governance | Adversarial coverage requirements are quantified per risk tier; high-risk applications require broader adversarial test suites |
| Technology & Tools | Continuous adversarial testing is integrated into the MLOps pipeline; new adversarial patterns from production incidents are automatically added to test libraries |
| Measurement & Metrics | Attack success rate, robustness degradation under perturbation, and edge case failure rate are tracked over model generations |
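The Level 4 metrics named above, attack success rate and robustness degradation under perturbation, can be computed from paired clean/adversarial evaluation results. This sketch uses hypothetical per-example correctness booleans; the function names are illustrative.

```python
def attack_success_rate(clean_correct, adv_correct):
    """Among inputs the model got right on clean data, the fraction
    the attack flipped to an incorrect prediction."""
    flipped = sum(c and not a for c, a in zip(clean_correct, adv_correct))
    eligible = sum(clean_correct)
    return flipped / eligible if eligible else 0.0

def robustness_degradation(clean_acc, adv_acc):
    """Relative accuracy drop under perturbation, trackable per model generation."""
    return (clean_acc - adv_acc) / clean_acc if clean_acc else 0.0

clean = [True, True, True, False, True]   # hypothetical per-example results
adv   = [True, False, False, False, True]

print(attack_success_rate(clean, adv))          # 2 of 4 clean-correct examples flipped
print(robustness_degradation(0.80, 0.60))       # relative accuracy drop under attack
```

Restricting the attack success rate to clean-correct examples matters: counting examples the model already failed on would understate the attack's effect.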
Level 5 – Optimising
| Category | Description |
| --- | --- |
| People & Culture | Adversarial findings are shared across teams and contribute to an organisational knowledge base of AI failure patterns |
| Process & Governance | Testing protocols are continuously updated based on emerging adversarial research and incident learnings |
| Technology & Tools | AI-assisted adversarial generation tools create novel test cases at scale; human red-teamers review and curate findings |
| Measurement & Metrics | Robustness trend data informs model architecture decisions and training data augmentation strategies |
Key Measures
- Percentage of model releases with a documented adversarial and edge case test suite
- Adversarial robustness score (percentage of adversarial inputs correctly handled) per model release
- Number of new adversarial test cases added per release cycle from production incident learnings
- Rate of edge case failures discovered in production that were absent from pre-release adversarial test sets
- Mean time to incorporate a newly discovered adversarial pattern into the standard test suite