Standard: Service Level Objectives (SLOs) guide delivery pace and risk tolerance
Purpose and Strategic Importance
This standard ensures that teams use Service Level Objectives (SLOs) to balance delivery velocity with system reliability and user experience. SLOs provide a measurable way to define what ‘good enough’ means for availability, latency, and other performance criteria, enabling teams to manage change safely and make informed trade-offs between speed and stability.
By anchoring delivery decisions in SLO thresholds and error budgets, this standard supports the policy “Prioritise Safety Before Productivity” by embedding resilience into planning and fostering responsible innovation. Without this standard, teams risk over-optimising for speed while degrading service quality and user trust.
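As a hedged illustration of the error-budget arithmetic this standard relies on, the sketch below shows how an availability target translates into a concrete allowance of unavailability. The 99.9% target and 30-day window are illustrative assumptions, not figures mandated by this standard.

```python
# Sketch: how an availability SLO translates into an error budget.
# The SLO target and window length passed in are illustrative assumptions.

def error_budget_minutes(slo_target: float, window_days: int) -> float:
    """Minutes of allowed unavailability for a given SLO over a window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo_target)

budget = error_budget_minutes(0.999, 30)
print(f"30-day budget at 99.9% availability: {budget:.1f} minutes")
# -> 30-day budget at 99.9% availability: 43.2 minutes
```

The budget is what teams "spend" on risky changes: while it lasts, delivery can proceed at pace; once consumed, the standard expects the balance to tip toward stability work.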
Strategic Impact
- Makes reliability and user experience an explicit factor in release planning
- Prevents over-deployment when systems are under stress
- Creates shared accountability for service quality across engineering teams
- Enables smarter prioritisation of technical debt, reliability work, and performance improvements
- Helps communicate trade-offs between new features and system health
Risks of Not Having This Standard
- Systems become brittle due to uncontrolled release frequency
- Quality regressions accumulate, leading to long-term reliability decay
- Lack of guardrails around change leads to risk blindness
- User trust erodes as service health fluctuates unpredictably
- Engineering effort is misaligned with what users actually value
CMMI Maturity Model
Level 1 – Initial
| Category | Description |
| --- | --- |
| People & Culture | Teams have no clear expectations for service health; reliability is reactive and unmeasured. |
| Process & Governance | No formal goals exist for system performance or error tolerance. |
| Technology & Tools | Observability tools are used for basic uptime checks, but no thresholds or budgets exist. |
| Measurement & Metrics | Incidents are tracked, but with no link to defined objectives or limits. |
Level 2 – Managed
| Category | Description |
| --- | --- |
| People & Culture | Teams begin to discuss service health as part of delivery planning. |
| Process & Governance | SLOs are defined for a few critical services. |
| Technology & Tools | Dashboards and alerting thresholds start to align with basic SLO definitions. |
| Measurement & Metrics | Uptime and latency metrics are tracked but rarely used in delivery decisions. |
Level 3 – Defined
| Category | Description |
| --- | --- |
| People & Culture | Teams use SLOs and error budgets to inform trade-offs during planning and retrospectives. |
| Process & Governance | SLO reviews are built into delivery rituals such as sprint planning and incident reviews. |
| Technology & Tools | Monitoring tools calculate and visualise SLOs, SLIs, and error budgets in real time. |
| Measurement & Metrics | Breaches of SLOs are tracked and linked to delivery decisions and system improvements. |
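The Level 3 calculation that monitoring tools perform can be sketched in a few lines: derive an SLI from event counts, then compare it against the SLO to see how much error budget has been consumed. The request counts and the 99.5% target below are illustrative assumptions.

```python
# Sketch: request-based SLI and error-budget consumption.
# The counts and the 99.5% target are illustrative assumptions.

def sli(good_events: int, total_events: int) -> float:
    """Service Level Indicator: fraction of events that met the objective."""
    return good_events / total_events

def budget_consumed(sli_value: float, slo_target: float) -> float:
    """Fraction of the error budget used so far (can exceed 1.0 on breach)."""
    allowed_failure = 1 - slo_target
    actual_failure = 1 - sli_value
    return actual_failure / allowed_failure

current_sli = sli(997_500, 1_000_000)  # 99.75% of requests were good
print(f"SLI: {current_sli:.4f}")
print(f"Error budget consumed: {budget_consumed(current_sli, 0.995):.0%}")
```

A value above 100% is exactly what "breach" means at this level: the team has spent more failure than the objective allows for the window.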
Level 4 – Quantitatively Managed
| Category | Description |
| --- | --- |
| People & Culture | All engineers are familiar with their team's SLOs and the cost of exceeding error budgets. |
| Process & Governance | SLOs are defined collaboratively with stakeholders and evolve with system maturity. |
| Technology & Tools | SLO tooling integrates with CI/CD systems to gate releases or slow deployment when error budgets are consumed. |
| Measurement & Metrics | Trends in error budgets, SLO compliance, and impact on business KPIs are analysed and used for decision-making. |
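The Level 4 release-gating idea can be sketched as a simple policy check in a deployment pipeline. The thresholds and the "canary-only" slow-down behaviour below are assumptions for illustration, not a prescribed mechanism.

```python
# Sketch: gating or slowing releases based on remaining error budget.
# The 25% threshold and policy names are illustrative assumptions.

def release_decision(budget_remaining: float) -> str:
    """Map remaining error budget (fraction, 0.0-1.0) to a release policy."""
    if budget_remaining <= 0.0:
        return "block"        # budget exhausted: reliability work only
    if budget_remaining < 0.25:
        return "canary-only"  # slow down: small, closely watched releases
    return "allow"            # normal delivery cadence

for remaining in (0.8, 0.1, -0.05):
    print(f"{remaining:+.2f} remaining -> {release_decision(remaining)}")
```

Wiring such a check into a pipeline stage is what turns the error budget from a reporting artefact into an actual guardrail on delivery pace.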
Level 5 – Optimising
| Category | Description |
| --- | --- |
| People & Culture | Teams proactively tune delivery cadences based on live error budget usage and risk appetite. |
| Process & Governance | SLO data informs prioritisation of performance improvements, incident recovery, and customer-facing investment. |
| Technology & Tools | Predictive analytics forecast potential SLO breaches and suggest pre-emptive actions. |
| Measurement & Metrics | Error budgets, latency percentiles, and user satisfaction metrics are all tied to business value delivery and engineering effectiveness. |
Key Measures
- Percentage of services with clearly defined and actively monitored SLOs
- Frequency and duration of SLO breaches
- Change failure rate during periods of low error budget availability
- Number of deployments deferred or gated due to error budget thresholds
- Percentage of delivery decisions informed by SLO data
- Improvements made as a direct result of SLO breach analysis
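To make the first two measures concrete, the sketch below derives breach frequency and total breach duration from periodic SLI samples. The sample values, the five-minute interval, and the 99.9% target are illustrative assumptions.

```python
# Sketch: counting SLO breach episodes and their duration from SLI samples.
# Sample data, sampling interval, and target are illustrative assumptions.

def breach_stats(samples, slo_target, interval_minutes):
    """Return (number of breach episodes, total minutes in breach)."""
    episodes, minutes, in_breach = 0, 0, False
    for s in samples:
        breaching = s < slo_target
        if breaching:
            minutes += interval_minutes
            if not in_breach:
                episodes += 1  # a new episode starts
        in_breach = breaching
    return episodes, minutes

samples = [0.9995, 0.9985, 0.9980, 0.9992, 0.9999, 0.9988]
episodes, minutes = breach_stats(samples, slo_target=0.999, interval_minutes=5)
print(f"{episodes} breach episode(s), {minutes} minutes in breach")
# -> 2 breach episode(s), 15 minutes in breach
```

Tracking episodes and duration separately matters: many short breaches and one long one consume the same budget but usually call for different remediation.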