Standard: Infrastructure is scalable, automated, and adaptable to changing demands

Purpose and Strategic Importance

Infrastructure that is manually provisioned, inconsistently configured, and unable to scale without human intervention directly constrains delivery speed and system reliability. This standard establishes that all infrastructure must be defined as code, provisioned on demand, and capable of scaling horizontally to meet changing load without manual intervention. Infrastructure as Code (IaC), using tools such as Terraform or Pulumi, makes environments reproducible, version-controlled, and auditable. It guards against environment drift, reduces the risk of configuration-related incidents, and enables teams to spin up and tear down environments in minutes rather than days.
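
To make "environment drift" concrete, the following minimal sketch compares a declared (code-defined) configuration with the configuration actually observed on a running environment. It is an illustration only, not how Terraform, Pulumi, or any other tool implements drift detection; the function name `detect_drift` and all setting names and values are hypothetical.

```python
from typing import Any


def detect_drift(desired: dict[str, Any], actual: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {setting: (declared_value, actual_value)} for every setting that differs.

    A setting that is missing on one side is reported with None on that side.
    """
    drift = {}
    for key in sorted(desired.keys() | actual.keys()):
        want, have = desired.get(key), actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical example: the state declared in code versus what is running.
desired_state = {"instance_type": "m5.large", "min_replicas": 2, "tls": True}
actual_state = {"instance_type": "m5.xlarge", "min_replicas": 2, "tls": True, "debug_port": 9229}

report = detect_drift(desired_state, actual_state)
for setting, (want, have) in report.items():
    print(f"{setting}: declared={want!r} actual={have!r}")
```

An empty report means the environment matches its definition. Any entry is either a manual change to revert or a change that should be captured in code and reviewed.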

Aligned to our "Engineering Excellence First" policy, this standard recognises infrastructure as a first-class engineering concern, not an operational afterthought. Cloud-native patterns including auto-scaling, containerisation, and ephemeral environments allow teams to match resource allocation to actual demand, reducing cost while improving resilience. Environment parity — ensuring that development, staging, and production environments are structurally identical — removes the "it works on my machine" class of incident and accelerates the path from code to confident deployment. Infrastructure that cannot keep pace with delivery ambitions will always constrain the speed of the teams that depend on it.

Strategic Impact

  • Eliminates manual infrastructure provisioning as a bottleneck to team delivery and environment availability
  • Ensures environment consistency through code-defined configuration, eliminating drift and configuration-related incidents
  • Enables rapid scaling to meet demand without requiring human intervention or emergency capacity requests
  • Reduces cloud spend waste through on-demand provisioning and auto-scaling aligned to actual usage patterns

Risks of Not Having This Standard

  • Manual infrastructure processes create long lead times for new environments, directly blocking delivery pipelines
  • Configuration drift between environments leads to incidents that are difficult to diagnose and reproduce
  • Systems unable to scale horizontally under load result in degraded performance, outages, and customer impact
  • Infrastructure knowledge becomes siloed in individuals, creating single points of failure and bus factor risk
  • Cost inefficiency grows as over-provisioned static infrastructure accumulates without automated lifecycle management

CMMI Maturity Model

Level 1 – Initial

| Category | Description |
| --- | --- |
| People & Culture | Infrastructure is managed by a small operations team relying on manual processes and tribal knowledge. |
| Process & Governance | Environment provisioning is ad hoc, undocumented, and relies on individual expertise. |
| Technology & Tools | Servers and environments are provisioned manually via console, scripts, or direct SSH access. |
| Measurement & Metrics | Infrastructure costs, availability, and provisioning lead times are not systematically tracked. |

Level 2 – Managed

| Category | Description |
| --- | --- |
| People & Culture | Operations and engineering teams begin collaborating on infrastructure provisioning needs. |
| Process & Governance | Some infrastructure is scripted, but processes remain largely manual and team-specific. |
| Technology & Tools | Basic cloud services are adopted but provisioned through the console rather than code. |
| Measurement & Metrics | Environment availability is tracked; provisioning lead times are measured informally. |

Level 3 – Defined

| Category | Description |
| --- | --- |
| People & Culture | Engineers treat infrastructure as code and contribute to IaC repositories alongside application code. |
| Process & Governance | All environments are provisioned via IaC tools (Terraform, Pulumi, or equivalent) with peer review. |
| Technology & Tools | Auto-scaling policies are configured for production workloads; container orchestration is adopted. |
| Measurement & Metrics | Provisioning lead time, scaling events, and environment parity compliance are actively measured. |
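
As an illustration of what an auto-scaling policy computes, the sketch below applies a proportional target-tracking rule, similar in form to the rule documented for the Kubernetes Horizontal Pod Autoscaler: scale the replica count by the ratio of the observed metric to its target, round up, and clamp to configured bounds. The function name, default bounds, and sample numbers are assumptions for illustration, not a recommended configuration.

```python
import math


def desired_replicas(current: int, metric: float, target: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Proportional scaling: ceil(current * metric / target), clamped to bounds."""
    if current < 1 or target <= 0:
        raise ValueError("current must be >= 1 and target must be > 0")
    raw = math.ceil(current * metric / target)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical readings: average CPU utilisation (%) against a 60% target.
scale_out = desired_replicas(current=4, metric=90, target=60)   # load above target
scale_in = desired_replicas(current=4, metric=30, target=60)    # load below target
capped = desired_replicas(current=8, metric=150, target=60)     # would exceed the maximum

print(scale_out, scale_in, capped)
```

The upper bound acts as a cost guard and the lower bound as a resilience guard. Production autoscalers typically add stabilisation windows and tolerances, which this sketch omits.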

Level 4 – Quantitatively Managed

| Category | Description |
| --- | --- |
| People & Culture | Teams own their infrastructure end-to-end using self-service provisioning via platform tooling. |
| Process & Governance | Infrastructure changes follow the same review and automated testing processes as application code. |
| Technology & Tools | Ephemeral environments are created per branch or pull request and destroyed automatically on merge. |
| Measurement & Metrics | Scaling latency, provisioning time, and cost-per-environment are tracked with defined improvement targets. |
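
The per-branch lifecycle of ephemeral environments can be sketched as a small event handler: create (or update) an environment when a pull request opens or changes, and destroy it when the pull request is merged or closed. Everything here is hypothetical, including the event names, the naming scheme, and the in-memory `EphemeralEnvironments` class, which stands in for whatever platform API actually provisions environments.

```python
import re
from dataclasses import dataclass, field


def environment_name(branch: str, max_length: int = 40) -> str:
    """Derive a lowercase, hyphen-separated environment name from a branch name."""
    slug = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")
    return f"pr-{slug}"[:max_length].rstrip("-")


@dataclass
class EphemeralEnvironments:
    """In-memory stand-in for a platform API that creates and destroys environments."""
    active: dict[str, str] = field(default_factory=dict)  # environment name -> branch

    def handle(self, event: str, branch: str) -> str:
        name = environment_name(branch)
        if event in ("opened", "synchronize"):
            self.active[name] = branch    # create or update; repeating it is harmless
        elif event in ("merged", "closed"):
            self.active.pop(name, None)   # destroy; safe if it is already gone
        else:
            raise ValueError(f"unknown event: {event}")
        return name


envs = EphemeralEnvironments()
envs.handle("opened", "feature/Add_Login")
envs.handle("opened", "fix/cache-ttl")
envs.handle("merged", "feature/Add_Login")
print(sorted(envs.active))
```

Making both operations idempotent means a retried or duplicated event cannot leave an orphaned environment behind, which keeps cost-per-environment predictable.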

Level 5 – Optimising

| Category | Description |
| --- | --- |
| People & Culture | Infrastructure engineering is a core competency distributed across all product delivery teams. |
| Process & Governance | Infrastructure policies are enforced automatically via policy-as-code tools (e.g., OPA, Sentinel). |
| Technology & Tools | Intelligent auto-scaling uses predictive models to pre-warm capacity ahead of anticipated demand. |
| Measurement & Metrics | Infrastructure cost efficiency, elasticity, and resilience metrics continuously improve through feedback loops. |
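
The idea behind policy-as-code can be shown with a minimal sketch: rules are expressed as code and evaluated automatically against planned resources before anything is provisioned. OPA and Sentinel use their own policy languages (Rego and Sentinel respectively); the Python below only illustrates the concept, and every rule name and resource field in it is a hypothetical example.

```python
from typing import Any, Callable

Resource = dict[str, Any]
Rule = Callable[[Resource], bool]  # returns True when the resource complies


def has_owner_tag(resource: Resource) -> bool:
    return "owner" in resource.get("tags", {})


def storage_is_private(resource: Resource) -> bool:
    return not (resource.get("type") == "bucket" and resource.get("public", False))


def service_can_scale(resource: Resource) -> bool:
    if resource.get("type") != "service":
        return True
    return resource.get("max_replicas", 1) > resource.get("min_replicas", 1)


# Hypothetical rule set; real rules would come from the organisation's standards.
POLICIES: dict[str, Rule] = {
    "must-have-owner-tag": has_owner_tag,
    "no-public-storage": storage_is_private,
    "autoscaling-enabled": service_can_scale,
}


def evaluate(resources: list[Resource]) -> list[str]:
    """Return one 'resource: policy' entry per violated policy."""
    return [
        f"{resource['name']}: {policy_name}"
        for resource in resources
        for policy_name, rule in POLICIES.items()
        if not rule(resource)
    ]


planned = [
    {"name": "orders-api", "type": "service", "tags": {"owner": "team-orders"},
     "min_replicas": 2, "max_replicas": 10},
    {"name": "exports", "type": "bucket", "tags": {}, "public": True},
]
violations = evaluate(planned)
print(violations)
```

In a pipeline, a non-empty violation list would fail the change before it is applied, so enforcement does not depend on a reviewer remembering each rule.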

Key Measures

  • Environment provisioning lead time: time from request to fully available environment
  • Infrastructure drift incidents: number of incidents attributable to environment configuration inconsistency
  • Auto-scaling response time: time for infrastructure to scale in response to load changes
  • Percentage of infrastructure defined and managed as code versus manually provisioned
  • Mean time to provision a new environment from scratch using IaC pipelines
  • Infrastructure cost variance: deviation between provisioned capacity cost and actual demand cost
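
Three of the measures above can be computed as in the following sketch. The formulas are one reasonable interpretation rather than definitions given by this standard: in particular, cost variance is assumed here to be the percentage by which provisioned-capacity cost exceeds the cost of actual demand. All function names and sample figures are hypothetical.

```python
from datetime import datetime
from statistics import mean


def lead_time_minutes(requested: datetime, available: datetime) -> float:
    """Minutes from environment request to the environment being fully available."""
    return (available - requested).total_seconds() / 60


def iac_coverage_percent(managed_as_code: int, total_resources: int) -> float:
    """Share of resources defined and managed as code, as a percentage."""
    if total_resources == 0:
        return 0.0
    return 100 * managed_as_code / total_resources


def cost_variance_percent(provisioned_cost: float, demand_cost: float) -> float:
    """Assumed definition: how far provisioned cost exceeds demand cost, in percent."""
    if demand_cost <= 0:
        raise ValueError("demand_cost must be positive")
    return 100 * (provisioned_cost - demand_cost) / demand_cost


# Hypothetical sample data.
requests = [
    (datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 9, 12)),
    (datetime(2024, 1, 8, 10, 0), datetime(2024, 1, 8, 10, 18)),
]
mean_lead_time = mean(lead_time_minutes(req, ready) for req, ready in requests)
coverage = iac_coverage_percent(managed_as_code=45, total_resources=60)
variance = cost_variance_percent(provisioned_cost=1200.0, demand_cost=1000.0)

print(mean_lead_time, coverage, variance)
```

Tracking these as trends rather than single readings makes it easier to set the improvement targets described at Level 4.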

Associated Policies

  • Engineering Excellence First
