Standard: AI models are versioned and reproducible across environments
Purpose and Strategic Importance
Model versioning and reproducibility ensure that every AI artefact (training data snapshots, hyperparameters, code, and environment configurations) can be reconstructed exactly. This standard supports the policy of building AI systems that learn and improve continuously by providing a stable foundation from which improvements can be measured and regressions can be identified. Without it, teams cannot iterate on models with confidence because they have no reliable baseline to compare against.
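The four artefact classes named above can be made concrete as a release-time manifest. The sketch below is a minimal, hypothetical example (the field and model names are assumptions, not part of this standard); real registries typically store equivalent metadata alongside the model binary.

```python
from dataclasses import dataclass, field, asdict

# Hypothetical manifest covering the artefact classes the standard names:
# data snapshot, hyperparameters, code, and environment.
@dataclass(frozen=True)
class ModelManifest:
    model_name: str
    model_version: str
    training_data_version: str   # dataset snapshot tag or content hash
    code_commit: str             # VCS commit the training code was run from
    environment_hash: str        # hash of resolved dependencies or container image
    hyperparameters: dict = field(default_factory=dict)

# Example values are illustrative only.
manifest = ModelManifest(
    model_name="churn-classifier",
    model_version="2.4.0",
    training_data_version="snapshot-2024-06-01",
    code_commit="a1b2c3d",
    environment_hash="sha256:0f1e2d3c",
    hyperparameters={"learning_rate": 0.01, "max_depth": 6},
)
record = asdict(manifest)  # serialisable form for a registry or audit log
```

Freezing the dataclass makes the manifest immutable once written, which mirrors the intent that a released version's provenance never changes after the fact.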
Strategic Impact
- Enables reliable A/B comparison between model versions, making improvement measurable and defensible
- Reduces time to diagnose and roll back production incidents caused by model changes
- Supports audit and regulatory requirements by providing a traceable history of model evolution
- Accelerates onboarding by allowing new team members to reproduce prior work without tribal knowledge
- Creates the conditions for safe, frequent model updates by removing ambiguity about what changed
Risks of Not Having This Standard
- Silent regressions go undetected because there is no authoritative prior version to compare against
- Production incidents are slow to resolve when the deployed model cannot be reconstructed from source
- Regulatory audits fail when model provenance cannot be demonstrated
- Teams duplicate work rebuilding environments that should have been reproducible from a manifest
- Continuous improvement stalls because teams cannot confidently attribute performance changes to specific interventions
CMMI Maturity Model
Level 1 – Initial
| Category | Description |
| --- | --- |
| People & Culture | Model files are saved informally by individuals with no shared naming convention or central registry |
| Process & Governance | No versioning policy exists; model updates overwrite previous artefacts without a change record |
| Technology & Tools | Models are stored as ad hoc files on shared drives or local machines with no lineage tracking |
| Measurement & Metrics | No metrics are captured; the team is unaware of which model version is running in production at any given time |
Level 2 – Managed
| Category | Description |
| --- | --- |
| People & Culture | Teams agree on a basic naming convention for model files and maintain a simple changelog |
| Process & Governance | A lightweight versioning policy is documented; model updates require a dated release note |
| Technology & Tools | Models are stored in a shared repository with sequential version tags; environment dependencies are captured in a requirements file |
| Measurement & Metrics | Teams track which model version is deployed per environment; rollback steps are documented |
Level 3 – Defined
| Category | Description |
| --- | --- |
| People & Culture | All team members understand and follow the versioning standard; versioning is part of the definition of done for model work |
| Process & Governance | Model versioning is integrated into the MLOps pipeline; every model artefact is tagged with training data version, code commit, and environment hash |
| Technology & Tools | An ML experiment tracking tool (e.g. MLflow, DVC) is in use; models are stored in a model registry with full provenance metadata |
| Measurement & Metrics | Reproducibility is validated at release; teams measure the delta between environments to confirm consistency |
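Of the three provenance tags required at Level 3, the environment hash is the least obvious to derive. One possible approach, sketched below with only the standard library (the helper name and canonicalisation rules are assumptions), is to hash the pinned dependency list after normalising it so that ordering and comments do not affect the result.

```python
import hashlib

def environment_hash(requirements_text: str) -> str:
    """Derive a stable hash from a pinned requirements file.

    Hypothetical helper: blank lines and comments are dropped, the
    remaining pins are sorted, and the canonical text is hashed so that
    two environments with the same pinned set produce the same hash.
    """
    pins = [
        line.strip()
        for line in requirements_text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    canonical = "\n".join(sorted(pins))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Same pinned set, different ordering and a comment: hashes match.
reqs_a = "numpy==1.26.4\nscikit-learn==1.4.2\n"
reqs_b = "scikit-learn==1.4.2\n# training deps\nnumpy==1.26.4\n"
assert environment_hash(reqs_a) == environment_hash(reqs_b)
```

In practice teams often hash a fully resolved lock file or a container image digest instead, which also captures transitive dependencies.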
Level 4 – Quantitatively Managed
| Category | Description |
| --- | --- |
| People & Culture | Reproducibility is a shared quality gate; teams block releases that fail environment parity checks |
| Process & Governance | SLAs exist for how quickly any prior model version can be reproduced; compliance is measured per release |
| Technology & Tools | Full model lineage is captured automatically, including dataset fingerprints, dependency hashes, and container images; cross-environment diff tooling is in place |
| Measurement & Metrics | Reproduction success rate is tracked per release; mean time to reproduce a historical model is measured and reported quarterly |
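The dataset fingerprints mentioned at Level 4 can be built as a hash over per-file content hashes. The sketch below is a simplified assumption of how such a fingerprint might work: real tooling would stream files from storage, whereas here the dataset is passed as an in-memory mapping of relative path to raw bytes so the example stays self-contained.

```python
import hashlib

def dataset_fingerprint(files: dict) -> str:
    """Fingerprint a dataset as a hash over per-file content hashes.

    Hypothetical sketch: `files` maps relative path -> raw bytes.
    Paths are sorted so iteration order never changes the result,
    and each path is bound to its own content hash so renames and
    edits both alter the fingerprint.
    """
    outer = hashlib.sha256()
    for path in sorted(files):
        file_hash = hashlib.sha256(files[path]).hexdigest()
        outer.update(f"{path}:{file_hash}\n".encode("utf-8"))
    return outer.hexdigest()
```

Because the fingerprint depends only on content and relative paths, the same snapshot produces the same value on any machine, which is what lets it serve as the `training_data_version` in provenance metadata.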
Level 5 – Optimising
| Category | Description |
| --- | --- |
| People & Culture | Teams proactively identify and eliminate sources of non-determinism; reproducibility findings are shared across the organisation |
| Process & Governance | Versioning and reproducibility requirements are continuously refined based on incident retrospectives and regulatory feedback |
| Technology & Tools | End-to-end reproducibility is automated and verified in CI; tooling flags environmental drift before it reaches production |
| Measurement & Metrics | Reproducibility metrics are used to forecast model stability and inform release confidence scoring |
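A CI check that flags drift before production could be as simple as replaying a fixed test suite in two environments and comparing predictions element-wise. The helper below is one possible sketch, not a prescribed implementation; the tolerances are assumptions, and teams requiring bitwise-identical outputs would use exact equality instead.

```python
import math

def outputs_consistent(baseline, candidate, rel_tol=1e-6, abs_tol=1e-9):
    """Compare model outputs from two environments element-wise.

    Hypothetical CI gate: returns False (i.e. flags drift) when the
    output counts differ or any prediction pair differs beyond the
    given tolerances.
    """
    if len(baseline) != len(candidate):
        return False
    return all(
        math.isclose(b, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for b, c in zip(baseline, candidate)
    )
```

Run against a pinned test suite on every build, a failing check blocks promotion, which is exactly the environment parity gate described at Level 4 and automated here at Level 5.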
Key Measures
- Percentage of model releases with full provenance metadata (training data version, code commit, environment hash)
- Mean time to reproduce a historical model version from the registry
- Number of production incidents attributed to unreproducible model state per quarter
- Percentage of environments where model outputs are confirmed consistent across a defined test suite
- Rate of rollback success when reverting to a prior model version
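The first key measure (provenance coverage) is straightforward to compute from release records. The sketch below assumes a hypothetical record shape in which each release is a dict of metadata fields; the field names mirror those listed in the measure.

```python
# Fields a release must carry, per the first key measure above.
REQUIRED_FIELDS = ("training_data_version", "code_commit", "environment_hash")

def provenance_coverage(releases: list) -> float:
    """Percentage of releases carrying full provenance metadata.

    Hypothetical metric helper: a release counts as covered only when
    every required field is present and non-empty.
    """
    if not releases:
        return 0.0
    covered = sum(
        1 for release in releases
        if all(release.get(field_name) for field_name in REQUIRED_FIELDS)
    )
    return 100.0 * covered / len(releases)

releases = [
    {"training_data_version": "v1", "code_commit": "abc", "environment_hash": "h1"},
    {"training_data_version": "v2", "code_commit": "", "environment_hash": "h2"},
]
assert provenance_coverage(releases) == 50.0
```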