
Policy : Measure What AI Delivers, Not Just What It Predicts

Commitment to Outcome Measurement in AI

Model accuracy is not a business outcome. A classifier that achieves 95% accuracy has not delivered value — it has demonstrated technical capability. Whether that capability translates into business value depends on whether the predictions are acted upon, whether acting on them produces better outcomes than the alternative, and whether those outcomes are visible to the organisation. Our commitment is to build measurement practices that track what AI systems actually deliver in the world, not just how well they perform on evaluation benchmarks.

What This Means

Measuring AI delivery means instrumenting the full causal chain from model output to business outcome. It means tracking whether AI recommendations are followed, whether following them produces better results than not following them, and whether the aggregate effect of the AI system is improving the metric it was deployed to improve. It means reporting AI performance in business terms — cost savings, time reduction, error rate improvement, customer satisfaction — not in model terms alone.
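As an illustrative sketch only (the stage names, counts, and funnel shape are hypothetical, not a prescribed schema), the causal chain from model output to business outcome can be instrumented as a simple funnel, with each stage's conversion rate measured against the stage before it:

```python
from dataclasses import dataclass

@dataclass
class DeliveryFunnel:
    """Counts at each stage of the AI-output-to-business-outcome chain."""
    predictions_served: int   # model outputs surfaced to users
    outputs_viewed: int       # adoption: outputs users actually looked at
    outputs_acted_on: int     # action: recommendations actually followed
    improved_outcomes: int    # outcome: cases where acting produced a better result

    def rates(self) -> dict:
        """Stage-to-stage conversion rates; each stage divides by the one before."""
        return {
            "adoption_rate": self.outputs_viewed / self.predictions_served,
            "action_rate": self.outputs_acted_on / self.outputs_viewed,
            "outcome_rate": self.improved_outcomes / self.outputs_acted_on,
            "end_to_end_rate": self.improved_outcomes / self.predictions_served,
        }

# Hypothetical counts for illustration.
funnel = DeliveryFunnel(predictions_served=10_000, outputs_viewed=6_000,
                        outputs_acted_on=3_000, improved_outcomes=1_500)
print(funnel.rates())
# adoption 0.6, action 0.5, outcome 0.5, end-to-end 0.15
```

The point of the funnel framing is that a high-accuracy model can still score a low end-to-end rate if adoption or action rates are poor — which is exactly the gap between model performance and delivered value.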

Our commitment to measuring what AI delivers is built on:

  • Business Metric Ownership – Every deployed AI system has a clearly owned business metric that it is expected to move. The team responsible for the AI system is accountable for tracking and reporting on that metric, not just on model performance.
  • End-to-End Measurement Chains – We instrument the full chain from AI output to business outcome. This includes adoption rate (are people using the AI output?), action rate (are they acting on it?), outcome rate (does acting on it produce better results?), and net business impact.
  • Separation of Model Metrics and Business Metrics – Model performance metrics (accuracy, precision, recall, AUC) and business outcome metrics are tracked and reported separately. Model metrics inform engineering decisions; business metrics inform investment decisions.
  • Experiment-Based Outcome Validation – Where feasible, we use controlled experiments — A/B tests, holdout groups, or staged rollouts — to establish causal attribution between AI deployment and outcome change. Observational measurement is supplemented with experimental validation for high-stakes decisions.
  • Underperformance Visibility – When AI systems are not delivering the expected business outcomes, that underperformance is visible and escalated — not masked by favourable model metrics. A model that is technically excellent but strategically inert is treated as an underperforming asset.
  • Stakeholder Reporting in Business Terms – AI performance reports to business stakeholders are written in business terms: what changed, by how much, what it is worth, and what the next improvement target is. Model performance metrics are provided as supporting evidence, not as the headline.
  • Value Attribution Over Time – We track the cumulative value delivered by AI systems over time, not just at initial deployment. This enables ongoing investment decisions to be made on the basis of actual value history rather than projected value from the original business case.
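To make the experiment-based validation concrete: a holdout comparison reduces to asking whether the outcome rate in the AI-assisted group differs from the holdout by more than chance would explain. The following sketch (sample sizes and success counts are invented placeholders) computes the lift and a standard two-proportion z-test using only the standard library:

```python
import math

def two_proportion_ztest(success_t: int, n_t: int,
                         success_c: int, n_c: int) -> tuple:
    """Two-sided z-test for the difference in outcome rates between a
    treated group (AI-assisted) and a holdout group (no AI).
    Returns (lift, z statistic, p-value)."""
    p_t, p_c = success_t / n_t, success_c / n_c
    # Pooled rate under the null hypothesis of no difference.
    p_pool = (success_t + success_c) / (n_t + n_c)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_t + 1 / n_c))
    z = (p_t - p_c) / se
    # Two-sided p-value from the normal CDF, via the error function.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_t - p_c, z, p_value

# Hypothetical experiment: 1,000 cases per arm.
lift, z, p = two_proportion_ztest(540, 1000, 480, 1000)
print(f"lift={lift:.3f}, z={z:.2f}, p={p:.4f}")
```

If p is above the chosen significance threshold, the honest report is that no outcome improvement has been demonstrated — regardless of how strong the model metrics look.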

Why This Matters

Model metrics and business outcomes can diverge dramatically. An AI system can achieve excellent accuracy metrics while delivering negligible business value — because the predictions are too slow to be actionable, because users do not trust or follow the recommendations, because the problem it solves was not the bottleneck, or because the system creates as many problems as it solves in adjacent parts of the process. The only way to know whether AI is delivering value is to measure the value directly — in the terms that actually matter to the organisation.

Our Expectation

Every deployed AI system reports on its business outcome metrics at the same cadence it reports on model performance metrics. Teams that can only report accuracy metrics without a corresponding view of business impact are not yet measuring what matters. Measuring what AI delivers — not just what it predicts — is how we ensure AI investment converts to genuine, demonstrable Value.
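One way to keep reports structured this way is to make the ordering mechanical: business impact is the headline, model metrics appear only as supporting evidence. A minimal sketch (all figures and metric names are invented placeholders, not real results):

```python
def outcome_report(business: dict, model: dict) -> str:
    """Render a stakeholder report in which business impact leads and
    model performance metrics appear only as supporting evidence."""
    lines = ["AI system outcome report", "", "Business impact:"]
    for name, value in business.items():
        lines.append(f"  {name}: {value}")
    lines += ["", "Supporting model metrics:"]
    for name, value in model.items():
        lines.append(f"  {name}: {value}")
    return "\n".join(lines)

# Placeholder figures for illustration only.
report = outcome_report(
    {"claims processing time": "-18% vs. baseline",
     "estimated annual saving": "GBP 1.2m"},
    {"precision": 0.91, "recall": 0.87},
)
print(report)
```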
