Standard: Major incidents are followed by timely, blameless reviews

Purpose and Strategic Importance

This standard ensures that every major incident is followed by a timely, blameless review focused on learning rather than fault. These reviews help teams uncover root causes, share insights, and strengthen systems without fear of blame.

Aligned to our "Post-Incident Learning Culture" policy, this standard builds trust, encourages transparency, and improves system resilience. Without it, teams miss critical learning opportunities and risk repeating avoidable failures.

Strategic Impact

Meeting this standard improves delivery flow, reduces risk, raises system resilience, and keeps technical work aligned to business needs. Over time, teams should see less rework, faster time to value, and stronger system integrity.

Risks of Not Having This Standard

- Reduced ability to respond to change or failure
- Accumulation of technical debt or friction
- Poor developer experience and morale
- Decreased confidence in releases and features
- Misalignment between technical implementation and business priorities

CMMI Maturity Model

Level 1 – Initial

People & Culture
- No shared mindset or training around reviews.
- Incident post‑mortems are seen as punitive or simply skipped.
Process & Governance
- No formal trigger or timeline for reviews.
- Each team “does its own thing” (if anything).
Technology & Tools
- No dedicated tracking or collaboration platform; findings often live in scribbled notes.
Measurement & Metrics
- Zero visibility: nobody measures review completion or outcomes.

Level 2 – Managed

People & Culture
- A handful of trained facilitators run blameless retrospectives.
- Teams recognise the value of learning, but it’s still “nice‑to‑have.”
Process & Governance
- Standard policy: any Severity 1/2 incident must be reviewed within 72 hours.
- A basic template (agenda + actions) is adopted by some teams.
Technology & Tools
- Incident register or ticketing system flags major incidents for review.
- Simple shared doc (e.g. Confluence page) captures write‑ups.
Measurement & Metrics
- % of incidents reviewed on time.
- Count of action items generated per review.
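
The "% of incidents reviewed on time" measure can be computed directly from incident records. The following is a minimal sketch, not a prescribed implementation: the `Incident` fields and function names are hypothetical, and it assumes the 72-hour window starts when the incident is resolved, which the policy above does not specify.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

REVIEW_WINDOW = timedelta(hours=72)  # from the Level 2 policy
MAJOR_SEVERITIES = {1, 2}            # Severity 1/2 incidents require a review


@dataclass
class Incident:
    incident_id: str
    severity: int
    resolved_at: datetime                   # assumption: the clock starts at resolution
    reviewed_at: Optional[datetime] = None  # None means no review has been held yet


def reviewed_on_time(incident: Incident) -> bool:
    """True if a review was held within the window after resolution."""
    if incident.reviewed_at is None:
        return False
    return incident.reviewed_at - incident.resolved_at <= REVIEW_WINDOW


def on_time_review_rate(incidents: list) -> Optional[float]:
    """Percentage of major incidents reviewed on time, or None if there were none."""
    major = [i for i in incidents if i.severity in MAJOR_SEVERITIES]
    if not major:
        return None
    on_time = sum(reviewed_on_time(i) for i in major)
    return 100.0 * on_time / len(major)
```

Incidents with no review yet count against the rate, so the figure cannot be inflated by leaving reviews unscheduled.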

Level 3 – Defined

People & Culture
- Everyone (not just DevOps) attends “lessons‑learned” training.
- Peer reviewers audit write‑ups for blameless language.
Process & Governance
- A global playbook guides tailoring: teams adapt but preserve core steps.
- Reviews feed into an organisation‑wide knowledge base.
Technology & Tools
- Automated reminders and dashboards surface overdue reviews.
- Central repository with tagging, search and reuse of past learnings.
Measurement & Metrics
- Quality score (rubric‑based) on each review report.
- Median time from incident to published report.
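
As an illustration of the Level 3 measures and the overdue-review dashboard, the sketch below computes the median time from incident to published report and lists reviews that have passed their target. The function names, the record shapes, and the 72-hour default target are assumptions made for the example; teams would substitute their own data source and target.

```python
from datetime import datetime, timedelta
from statistics import median


def median_hours_to_publish(records):
    """Median hours from incident start to published report.

    records: iterable of (incident_started, report_published) datetime pairs;
    report_published is None while the report is still unpublished.
    """
    durations = [
        (published - started).total_seconds() / 3600
        for started, published in records
        if published is not None
    ]
    return median(durations) if durations else None


def overdue_reviews(open_reviews, now, target=timedelta(hours=72)):
    """Ids of unpublished reviews older than the target (assumed 72 hours).

    open_reviews: mapping of incident id -> incident start time.
    """
    return sorted(
        incident_id
        for incident_id, started in open_reviews.items()
        if now - started > target
    )
```

The median is used rather than the mean so that a single long-delayed report does not dominate the figure.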

Level 4 – Quantitatively Managed

People & Culture
- Data‑driven retrospectives: teams use control charts to spot trends.
- A “Data Champion” role tracks review health.
Process & Governance
- KPIs (timeliness, closure rate, recurrence rate) have SLAs and owners.
- Quarterly “health checks” adjust the process when metrics dip.
Technology & Tools
- Real‑time analytics platform surfaces recurring error modes.
- Automated playbook suggestions based on past root‑cause patterns.
Measurement & Metrics
- Control‑chart analysis on review cycle times.
- % of actions closed within target; drop in repeat incidents.
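
One common way to run control-chart analysis on review cycle times is an individuals (XmR) chart, where the limits sit at the mean plus or minus 2.66 times the average moving range. The sketch below assumes that chart type; the standard does not mandate one, and the function names are hypothetical.

```python
from statistics import mean


def xmr_limits(cycle_times):
    """Return (lower, centre, upper) limits for an individuals (XmR) chart.

    cycle_times: review cycle times in hours, in chronological order.
    """
    if len(cycle_times) < 2:
        raise ValueError("need at least two observations")
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(cycle_times, cycle_times[1:])]
    centre = mean(cycle_times)
    spread = 2.66 * mean(moving_ranges)
    # A cycle time cannot be negative, so the lower limit is floored at zero.
    return max(0.0, centre - spread), centre, centre + spread


def out_of_control(cycle_times):
    """Indexes of observations that fall outside the control limits."""
    lower, _, upper = xmr_limits(cycle_times)
    return [i for i, t in enumerate(cycle_times) if t < lower or t > upper]
```

A point outside the limits is a prompt to ask what was different about that review, not evidence that someone performed badly.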

Level 5 – Optimising

People & Culture
- Continuous‑learning champions drive cross‑team innovation.
- Successes are celebrated publicly; lessons inform strategic roadmaps.
Process & Governance
- Predictive triggers (e.g. anomaly alerts) kick off proactive reviews.
- Review outcomes feed directly into training curricula and design standards.
Technology & Tools
- Machine‑learning‑driven RCA assistants suggest hypotheses in real time.
- Integration into planning tools so learnings automatically shape future work.
Measurement & Metrics
- Year‑on‑year % reduction in Sev 1/2 incidents.
- Business‑impact avoided (e.g. cost savings, uptime gain) quantified.
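
The year-on-year reduction measure is simple arithmetic, but it helps to agree on the sign convention and the zero-baseline case up front. A minimal sketch, with a hypothetical function name:

```python
def yoy_reduction_pct(previous_year_count, current_year_count):
    """Year-on-year % reduction in Sev 1/2 incidents.

    Positive means fewer incidents than last year; negative means more.
    Returns None when there is no baseline to compare against.
    """
    if previous_year_count <= 0:
        return None
    change = previous_year_count - current_year_count
    return 100.0 * change / previous_year_count
```

For example, going from 40 major incidents to 30 gives `yoy_reduction_pct(40, 30)`, a 25% reduction.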

Key Measures

- Adoption metrics relevant to the standard (to be defined)
- Quality, throughput, and system health metrics aligned to capability
- Maturity scores based on structured assessment