
Standard: AI systems deliver measurable improvement over non-AI alternatives

Purpose and Strategic Importance

This standard requires that every AI system in production demonstrate a measurable improvement over the best available non-AI alternative, whether that is a rules-based system, a manual process, or a statistical heuristic. It supports the policy of measuring what AI delivers, not just what it predicts, by anchoring AI value in a concrete comparative context. The question is never whether the AI works in isolation; it is whether the AI is the best use of investment for the problem at hand.

Strategic Impact

  • Ensures that AI investment is directed only at problems where it genuinely outperforms alternatives, maximising portfolio ROI
  • Creates a clear and defensible value narrative for business stakeholders who fund AI initiatives
  • Prevents the displacement of effective non-AI solutions with AI systems that are more complex but no more effective
  • Provides the evidence base for scaling successful AI systems and retiring those that do not justify their operational cost
  • Encourages a problem-first mindset that considers the full solution landscape rather than defaulting to AI

Risks of Not Having This Standard

  • AI systems replace simpler, cheaper, more explainable alternatives without delivering superior outcomes
  • Business stakeholders fund increasingly complex AI portfolios that generate activity rather than value
  • Non-AI solutions with high reliability are retired in favour of AI systems that require more oversight and maintenance
  • The organisation develops a blind spot for the opportunity cost of AI complexity relative to simpler alternatives
  • Trust in AI deteriorates when users recognise that existing methods were actually better for their needs

CMMI Maturity Model

Level 1 – Initial

  • People & Culture – Improvement over alternatives is assumed rather than measured; AI is pursued because it is perceived as modern
  • Process & Governance – No requirement to compare AI to non-AI alternatives before deployment; the comparison is not part of the project lifecycle
  • Technology & Tools – No tooling enables controlled comparison between AI and non-AI approaches in a live environment
  • Measurement & Metrics – No comparative performance data exists; the AI system's relative value over alternatives is unquantified

Level 2 – Managed

  • People & Culture – Teams informally describe the non-AI alternative in project documentation; the case for AI is articulated qualitatively
  • Process & Governance – Business cases include a description of the non-AI baseline; comparative rationale is reviewed at project approval
  • Technology & Tools – Teams run the non-AI baseline in parallel with the AI system during pilot to generate comparison data
  • Measurement & Metrics – A summary comparison of AI versus non-AI performance on key metrics is included in pilot review reports

Level 3 – Defined

  • People & Culture – Comparative evaluation against non-AI alternatives is a required phase of the AI project lifecycle
  • Process & Governance – A defined evaluation methodology specifies how AI and non-AI alternatives will be compared on performance, cost, and user experience
  • Technology & Tools – A/B testing or shadow mode deployment infrastructure enables live comparison between AI and non-AI approaches
  • Measurement & Metrics – Improvement over the non-AI baseline is calculated on a defined set of metrics; a minimum improvement threshold gates production deployment (a minimal gate sketch follows this list)
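
As a concrete illustration of the minimum improvement threshold described above, here is a minimal sketch of a deployment gate in Python. It assumes pilot-phase metrics have already been collected for both the AI system and the non-AI baseline; the metric names, values, and the 5% threshold below are illustrative assumptions, not prescribed by this standard.

```python
from dataclasses import dataclass


@dataclass
class ComparativeResult:
    """One metric measured for both the AI system and its non-AI baseline."""
    metric: str                  # e.g. "precision" or "handling_time_s"
    ai_value: float
    baseline_value: float
    higher_is_better: bool = True

    def relative_improvement(self) -> float:
        """AI improvement over the baseline, as a fraction of the baseline."""
        delta = self.ai_value - self.baseline_value
        if not self.higher_is_better:
            delta = -delta
        return delta / abs(self.baseline_value)


def gate_deployment(results: list[ComparativeResult],
                    min_improvement: float = 0.05) -> bool:
    """Approve production deployment only if every gated metric beats the
    non-AI baseline by at least min_improvement (5% here, illustrative)."""
    return all(r.relative_improvement() >= min_improvement for r in results)


pilot = [
    ComparativeResult("precision", ai_value=0.91, baseline_value=0.84),
    ComparativeResult("handling_time_s", ai_value=42.0, baseline_value=55.0,
                      higher_is_better=False),
]
print(gate_deployment(pilot))  # True: both metrics clear the 5% threshold
```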

Level 4 – Quantitatively Managed

  • People & Culture – Teams are held accountable for sustaining the demonstrated improvement over the non-AI baseline throughout the production lifecycle
  • Process & Governance – Ongoing monitoring tracks whether the AI system continues to outperform the non-AI alternative as the operating environment changes
  • Technology & Tools – Shadow comparison systems maintain a live non-AI baseline in production for continuous relative performance tracking (a shadow-mode sketch follows this list)
  • Measurement & Metrics – Comparative advantage metrics are reviewed quarterly; systems that lose their advantage trigger a review of continued investment
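
The shadow comparison described above might look like the following sketch: the AI system serves live traffic while the non-AI baseline scores the same cases in shadow, and a rolling window flags when the advantage is lost. The `ai_score` and `baseline_score` callables, the window size, and the threshold are illustrative assumptions.

```python
from collections import deque
from typing import Callable


class ShadowComparator:
    """Runs a non-AI baseline in shadow against live AI traffic and flags
    when the AI system's rolling advantage disappears."""

    def __init__(self, ai_score: Callable[[dict], float],
                 baseline_score: Callable[[dict], float],
                 window: int = 1000, min_advantage: float = 0.0):
        self.ai_score = ai_score
        self.baseline_score = baseline_score
        self.min_advantage = min_advantage
        self.deltas: deque[float] = deque(maxlen=window)

    def observe(self, case: dict) -> None:
        # Score the same case with both approaches; only the AI result is
        # served to users, the baseline runs purely for comparison.
        self.deltas.append(self.ai_score(case) - self.baseline_score(case))

    def advantage(self) -> float:
        """Mean per-case advantage of the AI system over the shadow baseline."""
        return sum(self.deltas) / len(self.deltas) if self.deltas else 0.0

    def review_required(self) -> bool:
        """True once the window is full and the AI no longer outperforms the
        baseline, triggering the investment review this level calls for."""
        return (len(self.deltas) == self.deltas.maxlen
                and self.advantage() <= self.min_advantage)


# Illustrative wiring: in practice the callables would wrap the real systems.
comparator = ShadowComparator(lambda c: c["ai_score"],
                              lambda c: c["rules_score"])
```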

Level 5 – Optimising

  • People & Culture – Teams regularly revisit whether AI remains the best solution as non-AI approaches improve and problem contexts evolve
  • Process & Governance – Comparative evaluation standards are updated as the landscape of non-AI alternatives changes, including improvements in automation and process redesign
  • Technology & Tools – Automated comparative evaluation continuously tests AI against evolving rule-based and statistical alternatives
  • Measurement & Metrics – Organisation-wide data on AI versus non-AI performance feeds strategic decisions about where to invest in AI versus process improvement

Key Measures

  • Percentage of deployed AI systems with a documented comparative evaluation against a non-AI baseline
  • Average measured improvement of AI systems over their non-AI alternatives across the production portfolio
  • Number of AI systems retired or replaced with simpler alternatives following comparative evaluation
  • Proportion of production AI systems that have maintained their performance advantage over the non-AI baseline for more than 12 months
  • Cost-adjusted comparative performance ratio (value delivered per pound of operating cost, AI versus non-AI); one plausible formulation is sketched below
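
One plausible formulation of the cost-adjusted comparative performance ratio in the last measure: value delivered per pound of operating cost for the AI system, divided by the same quantity for the non-AI alternative. The function name and the figures in the example are illustrative, not taken from this standard.

```python
def cost_adjusted_ratio(ai_value: float, ai_cost: float,
                        baseline_value: float, baseline_cost: float) -> float:
    """Value per pound of operating cost, AI relative to the non-AI
    alternative. A result above 1.0 favours the AI system; at or below
    1.0 it argues for the simpler solution."""
    return (ai_value / ai_cost) / (baseline_value / baseline_cost)


# e.g. AI: £480k value on £120k cost; baseline: £350k value on £100k cost
print(cost_adjusted_ratio(480_000, 120_000, 350_000, 100_000))  # ≈ 1.14
```
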
Associated Policies
Associated Practices
  • Business Impact Measurement
  • AI Use Case Discovery
  • Value Hypothesis Testing
  • Human Baseline Benchmarking
