Standard: AI models are versioned and reproducible across environments
Purpose and Strategic Importance
Model versioning and reproducibility ensure that every AI artefact (training data snapshots, hyperparameters, code, and environment configurations) can be reconstructed exactly. This standard supports the policy of building AI systems that learn and improve continuously by providing a stable foundation from which improvements can be measured and regressions can be identified. Without it, teams cannot iterate on models with confidence because they have no reliable baseline to compare against.
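The four artefact classes named above can be made concrete as a release-time manifest. The sketch below is a minimal, hypothetical example (the field and model names are assumptions, not part of this standard); real registries typically store equivalent metadata alongside the model binary.

```python
from dataclasses import dataclass, field, asdict

# Hypothetical manifest covering the artefact classes the standard names:
# data snapshot, hyperparameters, code, and environment.
@dataclass(frozen=True)
class ModelManifest:
    model_name: str
    model_version: str
    training_data_version: str   # dataset snapshot tag or content hash
    code_commit: str             # VCS commit the training code was run from
    environment_hash: str        # hash of resolved dependencies or container image
    hyperparameters: dict = field(default_factory=dict)

# Example values are illustrative only.
manifest = ModelManifest(
    model_name="churn-classifier",
    model_version="2.4.0",
    training_data_version="snapshot-2024-06-01",
    code_commit="a1b2c3d",
    environment_hash="sha256:0f1e2d3c",
    hyperparameters={"learning_rate": 0.01, "max_depth": 6},
)
record = asdict(manifest)  # serialisable form for a registry or audit log
```

Freezing the dataclass makes the manifest immutable once written, which mirrors the intent that a released version's provenance never changes after the fact.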
Strategic Impact
- Enables reliable A/B comparison between model versions, making improvement measurable and defensible
- Reduces time to diagnose and roll back production incidents caused by model changes
- Supports audit and regulatory requirements by providing a traceable history of model evolution
- Accelerates onboarding by allowing new team members to reproduce prior work without tribal knowledge
- Creates the conditions for safe, frequent model updates by removing ambiguity about what changed
Risks of Not Having This Standard
- Silent regressions go undetected because there is no authoritative prior version to compare against
- Production incidents are slow to resolve when the deployed model cannot be reconstructed from source
- Regulatory audits fail when model provenance cannot be demonstrated
- Teams duplicate work rebuilding environments that should have been reproducible from a manifest
- Continuous improvement stalls because teams cannot confidently attribute performance changes to specific interventions
CMMI Maturity Model
Level 1 – Initial
| Category | Description |
| --- | --- |
| People & Culture | Model files are saved informally by individuals with no shared naming convention or central registry |
| Process & Governance | No versioning policy exists; model updates overwrite previous artefacts without a change record |
| Technology & Tools | Models are stored as ad hoc files on shared drives or local machines with no lineage tracking |
| Measurement & Metrics | No metrics are captured; the team is unaware of which model version is running in production at any given time |
Level 2 – Managed
| Category | Description |
| --- | --- |
| People & Culture | Teams agree on a basic naming convention for model files and maintain a simple changelog |
| Process & Governance | A lightweight versioning policy is documented; model updates require a dated release note |
| Technology & Tools | Models are stored in a shared repository with sequential version tags; environment dependencies are captured in a requirements file |
| Measurement & Metrics | Teams track which model version is deployed per environment; rollback steps are documented |
Level 3 – Defined
| Category | Description |
| --- | --- |
| People & Culture | All team members understand and follow the versioning standard; versioning is part of the definition of done for model work |
| Process & Governance | Model versioning is integrated into the MLOps pipeline; every model artefact is tagged with training data version, code commit, and environment hash |
| Technology & Tools | An ML experiment tracking tool (e.g. MLflow, DVC) is in use; models are stored in a model registry with full provenance metadata |
| Measurement & Metrics | Reproducibility is validated at release; teams measure the delta between environments to confirm consistency |
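Of the three provenance tags required at Level 3, the environment hash is the least obvious to derive. One possible approach, sketched below with only the standard library (the helper name and canonicalisation rules are assumptions), is to hash the pinned dependency list after normalising it so that ordering and comments do not affect the result.

```python
import hashlib

def environment_hash(requirements_text: str) -> str:
    """Derive a stable hash from a pinned requirements file.

    Hypothetical helper: blank lines and comments are dropped, the
    remaining pins are sorted, and the canonical text is hashed so that
    two environments with the same pinned set produce the same hash.
    """
    pins = [
        line.strip()
        for line in requirements_text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    canonical = "\n".join(sorted(pins))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Same pinned set, different ordering and a comment: hashes match.
reqs_a = "numpy==1.26.4\nscikit-learn==1.4.2\n"
reqs_b = "scikit-learn==1.4.2\n# training deps\nnumpy==1.26.4\n"
assert environment_hash(reqs_a) == environment_hash(reqs_b)
```

In practice teams often hash a fully resolved lock file or a container image digest instead, which also captures transitive dependencies.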
Level 4 – Quantitatively Managed
| Category | Description |
| --- | --- |
| People & Culture | Reproducibility is a shared quality gate; teams block releases that fail environment parity checks |
| Process & Governance | SLAs exist for how quickly any prior model version can be reproduced; compliance is measured per release |
| Technology & Tools | Full model lineage is captured automatically, including dataset fingerprints, dependency hashes, and container images; cross-environment diff tooling is in place |
| Measurement & Metrics | Reproduction success rate is tracked per release; mean time to reproduce a historical model is measured and reported quarterly |
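The dataset fingerprints mentioned at Level 4 can be built as a hash over per-file content hashes. The sketch below is a simplified assumption of how such a fingerprint might work: real tooling would stream files from storage, whereas here the dataset is passed as an in-memory mapping of relative path to raw bytes so the example stays self-contained.

```python
import hashlib

def dataset_fingerprint(files: dict) -> str:
    """Fingerprint a dataset as a hash over per-file content hashes.

    Hypothetical sketch: `files` maps relative path -> raw bytes.
    Paths are sorted so iteration order never changes the result,
    and each path is bound to its own content hash so renames and
    edits both alter the fingerprint.
    """
    outer = hashlib.sha256()
    for path in sorted(files):
        file_hash = hashlib.sha256(files[path]).hexdigest()
        outer.update(f"{path}:{file_hash}\n".encode("utf-8"))
    return outer.hexdigest()
```

Because the fingerprint depends only on content and relative paths, the same snapshot produces the same value on any machine, which is what lets it serve as the `training_data_version` in provenance metadata.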
Level 5 – Optimising
| Category | Description |
| --- | --- |
| People & Culture | Teams proactively identify and eliminate sources of non-determinism; reproducibility findings are shared across the organisation |
| Process & Governance | Versioning and reproducibility requirements are continuously refined based on incident retrospectives and regulatory feedback |
| Technology & Tools | End-to-end reproducibility is automated and verified in CI; tooling flags environmental drift before it reaches production |
| Measurement & Metrics | Reproducibility metrics are used to forecast model stability and inform release confidence scoring |
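A CI check that flags drift before production could be as simple as replaying a fixed test suite in two environments and comparing predictions element-wise. The helper below is one possible sketch, not a prescribed implementation; the tolerances are assumptions, and teams requiring bitwise-identical outputs would use exact equality instead.

```python
import math

def outputs_consistent(baseline, candidate, rel_tol=1e-6, abs_tol=1e-9):
    """Compare model outputs from two environments element-wise.

    Hypothetical CI gate: returns False (i.e. flags drift) when the
    output counts differ or any prediction pair differs beyond the
    given tolerances.
    """
    if len(baseline) != len(candidate):
        return False
    return all(
        math.isclose(b, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for b, c in zip(baseline, candidate)
    )
```

Run against a pinned test suite on every build, a failing check blocks promotion, which is exactly the environment parity gate described at Level 4 and automated here at Level 5.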
Key Measures
- Percentage of model releases with full provenance metadata (training data version, code commit, environment hash)
- Mean time to reproduce a historical model version from the registry
- Number of production incidents attributed to unreproducible model state per quarter
- Percentage of environments where model outputs are confirmed consistent across a defined test suite
- Rate of rollback success when reverting to a prior model version
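The first key measure (provenance coverage) is straightforward to compute from release records. The sketch below assumes a hypothetical record shape in which each release is a dict of metadata fields; the field names mirror those listed in the measure.

```python
# Fields a release must carry, per the first key measure above.
REQUIRED_FIELDS = ("training_data_version", "code_commit", "environment_hash")

def provenance_coverage(releases: list) -> float:
    """Percentage of releases carrying full provenance metadata.

    Hypothetical metric helper: a release counts as covered only when
    every required field is present and non-empty.
    """
    if not releases:
        return 0.0
    covered = sum(
        1 for release in releases
        if all(release.get(field_name) for field_name in REQUIRED_FIELDS)
    )
    return 100.0 * covered / len(releases)

releases = [
    {"training_data_version": "v1", "code_commit": "abc", "environment_hash": "h1"},
    {"training_data_version": "v2", "code_commit": "", "environment_hash": "h2"},
]
assert provenance_coverage(releases) == 50.0
```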