Standard: Infrastructure is scalable, automated, and adaptable to changing demands

Purpose and Strategic Importance

Infrastructure that is manually provisioned, inconsistently configured, and unable to scale without human intervention directly constrains delivery speed and system reliability. This standard requires that all infrastructure be defined as code, provisioned on demand, and capable of scaling horizontally to meet changing load without manual intervention. Infrastructure as Code (IaC), using tools such as Terraform or Pulumi, makes environments reproducible, version-controlled, and auditable. It eliminates environment drift, reduces the risk of configuration-related incidents, and lets teams spin up and tear down environments in minutes rather than days.
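As a rough illustration of what checking for environment drift involves, the sketch below compares the configuration declared in code with the configuration observed in a running environment and reports every setting that differs. The function name `detect_drift`, the setting keys, and the sample values are all hypothetical; tools such as Terraform or Pulumi perform a far more complete version of this comparison against provider APIs.

```python
from typing import Any

ABSENT = "<absent>"  # placeholder for a setting that exists on only one side


def detect_drift(declared: dict, observed: dict) -> dict:
    """Return {setting: (declared_value, observed_value)} for every mismatch."""
    drift: dict[str, tuple[Any, Any]] = {}
    for key in sorted(set(declared) | set(observed)):
        want = declared.get(key, ABSENT)
        have = observed.get(key, ABSENT)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical example: someone resized an instance and opened a debug port by hand.
declared = {"instance_type": "m5.large", "min_replicas": 2, "tls": True}
observed = {"instance_type": "m5.xlarge", "min_replicas": 2, "tls": True, "debug_port": 9000}

for setting, (want, have) in detect_drift(declared, observed).items():
    print(f"{setting}: declared={want!r} observed={have!r}")
```

An empty result means the running environment matches its definition; any other result is a candidate drift incident.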

Aligned to our "Engineering Excellence First" policy, this standard recognises infrastructure as a first-class engineering concern, not an operational afterthought. Cloud-native patterns, including auto-scaling, containerisation, and ephemeral environments, allow teams to match resource allocation to actual demand, reducing cost while improving resilience. Environment parity, meaning that development, staging, and production environments are structurally identical, removes the "it works on my machine" class of incident and shortens the path from code to confident deployment. Infrastructure that cannot keep pace with delivery ambitions will always constrain the teams that depend on it.
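One way to read "structurally identical" is that every environment contains the same set of resources, while sizing parameters such as replica counts are allowed to differ. The sketch below checks that interpretation. The resource shape (`type`, `name`, plus sizing fields) and the function names are assumptions made for illustration, not part of this standard.

```python
def topology(resources: list) -> set:
    """Reduce an environment to its structure: the set of (type, name) pairs."""
    return {(r["type"], r["name"]) for r in resources}


def parity_gaps(reference: list, candidate: list) -> dict:
    """List resources the candidate environment lacks or has in excess."""
    ref, cand = topology(reference), topology(candidate)
    return {"missing": sorted(ref - cand), "extra": sorted(cand - ref)}


# Hypothetical environments: sizes differ (allowed), but staging has no queue.
production = [
    {"type": "load_balancer", "name": "web", "size": "large"},
    {"type": "container_service", "name": "api", "replicas": 12},
    {"type": "queue", "name": "orders"},
]
staging = [
    {"type": "load_balancer", "name": "web", "size": "small"},
    {"type": "container_service", "name": "api", "replicas": 2},
]

gaps = parity_gaps(production, staging)
print(gaps)  # {'missing': [('queue', 'orders')], 'extra': []}
```

Sizing fields are deliberately ignored, so a smaller staging environment still passes as long as its structure matches production.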

Strategic Impact

Eliminates manual infrastructure provisioning as a bottleneck to team delivery and environment availability
Ensures environment consistency through code-defined configuration, eliminating drift and configuration-related incidents
Enables rapid scaling to meet demand without requiring human intervention or emergency capacity requests
Reduces wasted cloud spend through on-demand provisioning and auto-scaling aligned to actual usage patterns

Risks of Not Having This Standard

Manual infrastructure processes create long lead times for new environments, directly blocking delivery pipelines
Configuration drift between environments leads to incidents that are difficult to diagnose and reproduce
Systems unable to scale horizontally under load result in degraded performance, outages, and customer impact
Infrastructure knowledge becomes siloed in individuals, creating single points of failure and a low bus factor
Cost inefficiency grows as over-provisioned static infrastructure accumulates without automated lifecycle management

CMMI Maturity Model

Level 1 – Initial

Category	Description
People & Culture	Infrastructure is managed by a small operations team relying on manual processes and tribal knowledge.
Process & Governance	Environment provisioning is ad hoc, undocumented, and relies on individual expertise.
Technology & Tools	Servers and environments are provisioned manually via console, ad hoc scripts, or direct SSH access.
Measurement & Metrics	Infrastructure costs, availability, and provisioning lead times are not systematically tracked.

Level 2 – Managed

Category	Description
People & Culture	Operations and engineering teams begin collaborating on infrastructure provisioning needs.
Process & Governance	Some infrastructure is scripted, but processes remain largely manual and team-specific.
Technology & Tools	Basic cloud services are adopted but provisioned through the console rather than code.
Measurement & Metrics	Environment availability is tracked; provisioning lead times are measured informally.

Level 3 – Defined

Category	Description
People & Culture	Engineers treat infrastructure as code and contribute to IaC repositories alongside application code.
Process & Governance	All environments are provisioned via IaC tools (Terraform, Pulumi, or equivalent) with peer review.
Technology & Tools	Auto-scaling policies are configured for production workloads; container orchestration is adopted.
Measurement & Metrics	Provisioning lead time, scaling events, and environment parity compliance are actively measured.
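To make "auto-scaling policies" concrete, the sketch below applies a proportional target-tracking rule: the replica count is scaled by the ratio of the observed metric to its target, rounded up, and clamped to configured bounds. This mirrors the core formula documented for the Kubernetes Horizontal Pod Autoscaler but omits its tolerance and stabilisation behaviour. The function name, parameter names, and sample numbers are illustrative assumptions.

```python
import math


def desired_replicas(current: int, metric: float, target: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Proportional scaling: ceil(current * metric / target), clamped to bounds."""
    if current < 1 or target <= 0:
        raise ValueError("current must be >= 1 and target must be > 0")
    raw = math.ceil(current * metric / target)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical policy: target 60% average CPU, between 2 and 10 replicas.
print(desired_replicas(4, 90.0, 60.0, 2, 10))   # 6  (scale out under load)
print(desired_replicas(4, 15.0, 60.0, 2, 10))   # 2  (scale in, held at the minimum)
print(desired_replicas(8, 120.0, 60.0, 2, 10))  # 10 (capped at the maximum)
```

The minimum protects availability during quiet periods, and the maximum bounds cost if the metric misbehaves.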

Level 4 – Quantitatively Managed

Category	Description
People & Culture	Teams own their infrastructure end-to-end using self-service provisioning via platform tooling.
Process & Governance	Infrastructure changes follow the same review and automated testing processes as application code.
Technology & Tools	Ephemeral environments are created per branch or pull request and destroyed automatically on merge or close.
Measurement & Metrics	Scaling latency, provisioning time, and cost-per-environment are tracked with defined improvement targets.
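The ephemeral-environment lifecycle described at this level can be sketched as two small decisions: deriving a deterministic environment name from the branch, and mapping a pull request event to a provisioning action. The event names, the `apply`/`destroy`/`ignore` actions, and the naming scheme are hypothetical. The 63-character default reflects the DNS label length limit, which many platforms apply to resource names.

```python
import re

# Hypothetical mapping from pull request events to provisioning actions.
ACTIONS = {
    "opened": "apply",
    "updated": "apply",
    "merged": "destroy",
    "closed": "destroy",
}


def env_name(service: str, branch: str, max_len: int = 63) -> str:
    """Build a lowercase, hyphen-separated name that is stable for a given branch."""
    slug = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")
    return f"{service}-{slug}"[:max_len].rstrip("-")


def lifecycle_action(event: str) -> str:
    """Decide what the pipeline should do; unknown events are ignored."""
    return ACTIONS.get(event, "ignore")


print(env_name("checkout", "feature/ADD_Coupons!"))  # checkout-feature-add-coupons
print(lifecycle_action("opened"))                    # apply
print(lifecycle_action("merged"))                    # destroy
```

Because the name depends only on the service and branch, repeated pipeline runs update the same environment instead of creating duplicates, and the destroy step can find it without extra bookkeeping.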

Level 5 – Optimising

Category	Description
People & Culture	Infrastructure engineering is a core competency distributed across all product delivery teams.
Process & Governance	Infrastructure policies are enforced automatically via policy-as-code tools (e.g., OPA, Sentinel).
Technology & Tools	Intelligent auto-scaling uses predictive models to pre-warm capacity ahead of anticipated demand.
Measurement & Metrics	Infrastructure cost efficiency, elasticity, and resilience metrics continuously improve through feedback loops.
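The idea behind policy-as-code can be shown with a minimal sketch in which each policy is a function that inspects a planned resource and returns a violation message or nothing. This is not OPA (Rego) or Sentinel syntax; it only illustrates the pattern of evaluating machine-readable rules before a change is applied. The rule names, the resource fields, and the sample plan are assumptions.

```python
from typing import Callable, Optional

Resource = dict
Rule = Callable[[Resource], Optional[str]]


def require_owner_tag(resource: Resource) -> Optional[str]:
    """Every resource must carry an 'owner' tag."""
    if "owner" not in resource.get("tags", {}):
        return f"{resource['name']}: missing 'owner' tag"
    return None


def forbid_public_storage(resource: Resource) -> Optional[str]:
    """Storage buckets must not be publicly readable."""
    if resource["type"] == "storage_bucket" and resource.get("public", False):
        return f"{resource['name']}: storage bucket must not be public"
    return None


RULES: list = [require_owner_tag, forbid_public_storage]


def evaluate(plan: list, rules: list = RULES) -> list:
    """Run every rule against every planned resource and collect violations."""
    violations = []
    for resource in plan:
        for rule in rules:
            message = rule(resource)
            if message:
                violations.append(message)
    return violations


# Hypothetical plan: one compliant resource, one that breaks both rules.
plan = [
    {"type": "storage_bucket", "name": "logs", "tags": {"owner": "platform"}},
    {"type": "storage_bucket", "name": "exports", "public": True},
]

for violation in evaluate(plan):
    print(violation)
```

In a pipeline, a non-empty result would fail the run before any infrastructure change is applied.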

Key Measures

Environment provisioning lead time: time from request to fully available environment
Infrastructure drift incidents: number of incidents attributable to environment configuration inconsistency
Auto-scaling response time: time from a scaling trigger to new capacity serving load
IaC coverage: percentage of infrastructure defined and managed as code versus manually provisioned
IaC pipeline provisioning time: mean time to provision a new environment from scratch using IaC pipelines
Infrastructure cost variance: deviation between the cost of provisioned capacity and the cost of capacity actually consumed
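Three of these measures reduce to simple arithmetic, sketched below under stated assumptions: the share of resources managed as code, the mean request-to-ready lead time, and cost variance expressed as the fraction of provisioned spend that was not consumed. The record shapes, field names, and sample figures are hypothetical. The cost variance formula is one reasonable reading of the measure, not a definition given by this standard.

```python
from datetime import datetime
from statistics import mean


def iac_coverage(resources: list) -> float:
    """Percentage of resources whose 'managed_by' field is 'iac'."""
    if not resources:
        raise ValueError("no resources to measure")
    managed = sum(1 for r in resources if r["managed_by"] == "iac")
    return 100.0 * managed / len(resources)


def mean_lead_time_minutes(requests: list) -> float:
    """Mean minutes between each (requested_at, ready_at) pair."""
    return mean([(ready - asked).total_seconds() / 60 for asked, ready in requests])


def cost_variance(provisioned_cost: float, consumed_cost: float) -> float:
    """Fraction of provisioned spend that was not consumed."""
    if provisioned_cost <= 0:
        raise ValueError("provisioned_cost must be positive")
    return (provisioned_cost - consumed_cost) / provisioned_cost


# Illustrative inputs only.
resources = [
    {"name": "api", "managed_by": "iac"},
    {"name": "db", "managed_by": "iac"},
    {"name": "cache", "managed_by": "iac"},
    {"name": "legacy-vm", "managed_by": "console"},
]
requests = [
    (datetime(2024, 1, 8, 10, 0), datetime(2024, 1, 8, 10, 12)),
    (datetime(2024, 1, 8, 11, 0), datetime(2024, 1, 8, 11, 18)),
]

print(iac_coverage(resources))            # 75.0
print(mean_lead_time_minutes(requests))   # 15.0
print(cost_variance(1000.0, 700.0))
```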