Standard: Service Level Objectives (SLOs) guide delivery pace and risk tolerance
Purpose and Strategic Importance
This standard ensures that teams use Service Level Objectives (SLOs) to balance delivery velocity with system reliability and user experience. SLOs provide a measurable way to define what ‘good enough’ means for availability, latency, and other performance criteria, enabling teams to manage change safely and make informed trade-offs between speed and stability.
By anchoring delivery decisions in SLO thresholds and error budgets, this standard supports the policy “Prioritise Safety Before Productivity” by embedding resilience into planning and fostering responsible innovation. Without this standard, teams risk over-optimising for speed while degrading service quality and user trust.
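As a hedged illustration of the error-budget arithmetic this standard relies on, the sketch below shows how an availability target translates into a concrete allowance of unavailability. The 99.9% target and 30-day window are illustrative assumptions, not figures mandated by this standard.

```python
# Sketch: how an availability SLO translates into an error budget.
# The SLO target and window length passed in are illustrative assumptions.

def error_budget_minutes(slo_target: float, window_days: int) -> float:
    """Minutes of allowed unavailability for a given SLO over a window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo_target)

budget = error_budget_minutes(0.999, 30)
print(f"30-day budget at 99.9% availability: {budget:.1f} minutes")
# -> 30-day budget at 99.9% availability: 43.2 minutes
```

The budget is what teams "spend" on risky changes: while it lasts, delivery can proceed at pace; once consumed, the standard expects the balance to tip toward stability work.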
Strategic Impact
- Makes reliability and user experience an explicit factor in release planning
- Prevents over-deployment when systems are under stress
- Creates shared accountability for service quality across engineering teams
- Enables smarter prioritisation of technical debt, reliability work, and performance improvements
- Helps communicate trade-offs between new features and system health
Risks of Not Having This Standard
- Systems become brittle due to uncontrolled release frequency
- Quality regressions accumulate, leading to long-term reliability decay
- Lack of guardrails around change leads to risk blindness
- User trust erodes as service health fluctuates unpredictably
- Engineering effort is misaligned with what users actually value
CMMI Maturity Model
Level 1 – Initial
| Category | Description |
| --- | --- |
| People & Culture | Teams have no clear expectations for service health; reliability is reactive and unmeasured. |
| Process & Governance | No formal goals exist for system performance or error tolerance. |
| Technology & Tools | Observability tools are used for basic uptime checks, but no thresholds or budgets exist. |
| Measurement & Metrics | Incidents are tracked, but with no link to defined objectives or limits. |
Level 2 – Managed
| Category | Description |
| --- | --- |
| People & Culture | Teams begin to discuss service health as part of delivery planning. |
| Process & Governance | SLOs are defined for a few critical services. |
| Technology & Tools | Dashboards and alerting thresholds start to align with basic SLO definitions. |
| Measurement & Metrics | Uptime and latency metrics are tracked but rarely used in delivery decisions. |
Level 3 – Defined
| Category | Description |
| --- | --- |
| People & Culture | Teams use SLOs and error budgets to inform trade-offs during planning and retrospectives. |
| Process & Governance | SLO reviews are built into delivery rituals such as sprint planning and incident reviews. |
| Technology & Tools | Monitoring tools calculate and visualise SLOs, SLIs, and error budgets in real time. |
| Measurement & Metrics | Breaches of SLOs are tracked and linked to delivery decisions and system improvements. |
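The Level 3 calculation that monitoring tools perform can be sketched in a few lines: derive an SLI from event counts, then compare it against the SLO to see how much error budget has been consumed. The request counts and the 99.5% target below are illustrative assumptions.

```python
# Sketch: request-based SLI and error-budget consumption.
# The counts and the 99.5% target are illustrative assumptions.

def sli(good_events: int, total_events: int) -> float:
    """Service Level Indicator: fraction of events that met the objective."""
    return good_events / total_events

def budget_consumed(sli_value: float, slo_target: float) -> float:
    """Fraction of the error budget used so far (can exceed 1.0 on breach)."""
    allowed_failure = 1 - slo_target
    actual_failure = 1 - sli_value
    return actual_failure / allowed_failure

current_sli = sli(997_500, 1_000_000)  # 99.75% of requests were good
print(f"SLI: {current_sli:.4f}")
print(f"Error budget consumed: {budget_consumed(current_sli, 0.995):.0%}")
```

A value above 100% is exactly what "breach" means at this level: the team has spent more failure than the objective allows for the window.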
Level 4 – Quantitatively Managed
| Category | Description |
| --- | --- |
| People & Culture | All engineers are familiar with their team's SLOs and the cost of exceeding error budgets. |
| Process & Governance | SLOs are defined collaboratively with stakeholders and evolve with system maturity. |
| Technology & Tools | SLO tooling integrates with CI/CD systems to gate releases or slow deployment when error budgets are consumed. |
| Measurement & Metrics | Trends in error budgets, SLO compliance, and impact on business KPIs are analysed and used for decision-making. |
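The Level 4 release-gating idea can be sketched as a simple policy check in a deployment pipeline. The thresholds and the "canary-only" slow-down behaviour below are assumptions for illustration, not a prescribed mechanism.

```python
# Sketch: gating or slowing releases based on remaining error budget.
# The 25% threshold and policy names are illustrative assumptions.

def release_decision(budget_remaining: float) -> str:
    """Map remaining error budget (fraction, 0.0-1.0) to a release policy."""
    if budget_remaining <= 0.0:
        return "block"        # budget exhausted: reliability work only
    if budget_remaining < 0.25:
        return "canary-only"  # slow down: small, closely watched releases
    return "allow"            # normal delivery cadence

for remaining in (0.8, 0.1, -0.05):
    print(f"{remaining:+.2f} remaining -> {release_decision(remaining)}")
```

Wiring such a check into a pipeline stage is what turns the error budget from a reporting artefact into an actual guardrail on delivery pace.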
Level 5 – Optimising
| Category | Description |
| --- | --- |
| People & Culture | Teams proactively tune delivery cadences based on live error budget usage and risk appetite. |
| Process & Governance | SLO data informs prioritisation of performance improvements, incident recovery, and customer-facing investment. |
| Technology & Tools | Predictive analytics forecast potential SLO breaches and suggest pre-emptive actions. |
| Measurement & Metrics | Error budgets, latency percentiles, and user satisfaction metrics are all tied to business value delivery and engineering effectiveness. |
Key Measures
- Percentage of services with clearly defined and actively monitored SLOs
- Frequency and duration of SLO breaches
- Change failure rate during periods of low error budget availability
- Number of deployments deferred or gated due to error budget thresholds
- Percentage of delivery decisions informed by SLO data
- Improvements made as a direct result of SLO breach analysis
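To make the first two measures concrete, the sketch below derives breach frequency and total breach duration from periodic SLI samples. The sample values, the five-minute interval, and the 99.9% target are illustrative assumptions.

```python
# Sketch: counting SLO breach episodes and their duration from SLI samples.
# Sample data, sampling interval, and target are illustrative assumptions.

def breach_stats(samples, slo_target, interval_minutes):
    """Return (number of breach episodes, total minutes in breach)."""
    episodes, minutes, in_breach = 0, 0, False
    for s in samples:
        breaching = s < slo_target
        if breaching:
            minutes += interval_minutes
            if not in_breach:
                episodes += 1  # a new episode starts
        in_breach = breaching
    return episodes, minutes

samples = [0.9995, 0.9985, 0.9980, 0.9992, 0.9999, 0.9988]
episodes, minutes = breach_stats(samples, slo_target=0.999, interval_minutes=5)
print(f"{episodes} breach episode(s), {minutes} minutes in breach")
# -> 2 breach episode(s), 15 minutes in breach
```

Tracking episodes and duration separately matters: many short breaches and one long one consume the same budget but usually call for different remediation.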