Standard: Major incidents are followed by timely, blameless reviews
Purpose and Strategic Importance
This standard ensures major incidents are followed by timely, blameless reviews that focus on learning, not fault. It helps teams uncover root causes, share insights, and strengthen systems without fear or blame.
Aligned to our "Post-Incident Learning Culture" policy, this standard builds trust, encourages transparency, and improves system resilience. Without it, teams miss critical learning opportunities and risk repeating avoidable failures.
Strategic Impact
Meeting this standard improves delivery flow, reduces risk, raises system resilience, and keeps work aligned to business needs. Over time, teams see less rework, faster time to value, and stronger system integrity.
Risks of Not Having This Standard
- Reduced ability to respond to change or failure
- Accumulation of technical debt or friction
- Poor developer experience and morale
- Decreased confidence in releases and features
- Misalignment between technical implementation and business priorities
CMMI Maturity Model
Level 1 – Initial
People & Culture
- No shared mindset or training around reviews.
- Incident post‑mortems are seen as punitive or simply skipped.
Process & Governance
- No formal trigger or timeline for reviews.
- Each team “does its own thing” (if anything).
Technology & Tools
- No dedicated tracking or collaboration platform; findings often live in scribbled notes.
Measurement & Metrics
- Zero visibility: nobody measures review completion or outcomes.
Level 2 – Managed
People & Culture
- A handful of trained facilitators run blameless retrospectives.
- Teams recognise the value of learning, but it’s still “nice‑to‑have.”
Process & Governance
- Standard policy: any Severity 1/2 incident must be reviewed within 72 hours.
- A basic template (agenda + actions) is adopted by some teams.
Technology & Tools
- Incident register or ticketing system flags major incidents for review.
- Simple shared doc (e.g. Confluence page) captures write‑ups.
Measurement & Metrics
- % of major incidents reviewed on time (within the 72-hour window).
- Count of action items generated per review.
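As an illustration only, the on-time metric could be computed along these lines. This is a minimal sketch: the record keys `resolved_at` and `reviewed_at` are hypothetical, and it assumes the 72-hour window starts when the incident is resolved, which the policy above does not specify.

```python
from datetime import datetime, timedelta

# Assumption: the 72-hour review window is measured from incident resolution.
REVIEW_WINDOW = timedelta(hours=72)


def on_time_review_rate(incidents):
    """Return the percentage of incidents reviewed within REVIEW_WINDOW.

    Each incident is a dict with the hypothetical keys 'resolved_at' and
    'reviewed_at' (a datetime, or None when no review has been held).
    """
    if not incidents:
        return 0.0
    on_time = sum(
        1
        for inc in incidents
        if inc["reviewed_at"] is not None
        and inc["reviewed_at"] - inc["resolved_at"] <= REVIEW_WINDOW
    )
    return 100.0 * on_time / len(incidents)


# Illustrative data, not real incidents.
sample = [
    {"resolved_at": datetime(2024, 3, 1, 9), "reviewed_at": datetime(2024, 3, 3, 9)},    # 48 h
    {"resolved_at": datetime(2024, 3, 5, 9), "reviewed_at": datetime(2024, 3, 9, 9)},    # 96 h, late
    {"resolved_at": datetime(2024, 3, 10, 9), "reviewed_at": None},                      # never reviewed
    {"resolved_at": datetime(2024, 3, 12, 9), "reviewed_at": datetime(2024, 3, 15, 9)},  # exactly 72 h
]
print(on_time_review_rate(sample))  # 50.0
```

Counting unreviewed incidents in the denominator is a deliberate choice here: skipped reviews lower the score rather than disappearing from it.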
Level 3 – Defined
People & Culture
- Everyone (not just DevOps) attends “lessons‑learned” training.
- Peer reviewers audit write‑ups for blameless language.
Process & Governance
- A global playbook guides tailoring: teams adapt but preserve core steps.
- Reviews feed into an organisation‑wide knowledge base.
Technology & Tools
- Automated reminders and dashboards surface overdue reviews.
- Central repository with tagging, search and reuse of past learnings.
Measurement & Metrics
- Quality score (rubric‑based) on each review report.
- Median time from incident to published report.
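The median time to published report can be derived from two timestamps per incident. The sketch below assumes each incident is recorded as a pair of (incident start, report published) datetimes; that data shape and the function name are hypothetical.

```python
from datetime import datetime
from statistics import median


def median_hours_to_report(pairs):
    """Median hours between incident start and report publication.

    `pairs` is a non-empty list of (incident_started, report_published)
    datetime tuples. The tuple layout is an assumption for this sketch.
    """
    hours = [
        (published - started).total_seconds() / 3600
        for started, published in pairs
    ]
    return median(hours)


# Illustrative data, not real incidents.
sample = [
    (datetime(2024, 4, 1, 8), datetime(2024, 4, 3, 8)),    # 48 h
    (datetime(2024, 4, 7, 8), datetime(2024, 4, 11, 8)),   # 96 h
    (datetime(2024, 4, 15, 8), datetime(2024, 4, 20, 8)),  # 120 h
]
print(median_hours_to_report(sample))  # 96.0
```

The median is less sensitive than the mean to a single review that drags on for weeks, which is presumably why it is the measure named above.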
Level 4 – Quantitatively Managed
People & Culture
- Data‑driven retrospectives: teams use control charts to spot trends.
- Roles include “Data Champion” to track review health.
Process & Governance
- KPIs (timeliness, closure rate, recurrence rate) have SLAs and owners.
- Quarterly “health checks” adjust the process when metrics dip.
Technology & Tools
- Real‑time analytics platform surfaces recurring error modes.
- Automated playbook suggestions based on past root‑cause patterns.
Measurement & Metrics
- Control‑chart analysis on review cycle times.
- % of actions closed within target; drop in repeat incidents.
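One common way to run control-chart analysis on review cycle times is an individuals (XmR) chart, where the limits sit at the mean plus or minus 2.66 times the average moving range. The sketch below assumes that chart type; the standard does not name one, and the function names and sample values are illustrative.

```python
from statistics import mean


def xmr_limits(values):
    """Return (lower, center, upper) limits for an individuals (XmR) chart.

    Limits are center +/- 2.66 * average moving range. The lower limit is
    clamped at zero because a cycle time cannot be negative.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    spread = 2.66 * mean(moving_ranges)
    return max(0.0, center - spread), center, center + spread


def out_of_control(values, baseline):
    """Indexes of `values` that fall outside the limits derived from `baseline`."""
    lower, _, upper = xmr_limits(baseline)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Illustrative review cycle times in hours.
baseline = [40, 44, 38, 42, 46, 40, 44, 42]
recent = [41, 45, 70, 39]
print(out_of_control(recent, baseline))  # [2]
```

A point outside the limits signals a change worth investigating, not a failure by the team that ran the review.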
Level 5 – Optimising
People & Culture
- Continuous‑learning champions drive cross‑team innovation.
- Successes celebrated publicly; lessons inform strategic roadmaps.
Process & Governance
- Predictive triggers (e.g. anomaly alerts) kick off proactive reviews.
- Review outcomes feed directly into training curricula and design standards.
Technology & Tools
- Machine‑learning‑driven RCA assistants suggest hypotheses in real time.
- Integration into planning tools so learnings automatically shape future work.
Measurement & Metrics
- Year‑on‑year % reduction in Sev 1/2 incidents.
- Avoided business impact (e.g. cost savings, uptime gain), quantified.
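The year-on-year reduction is simple arithmetic: the drop in incident count divided by the previous year's count. A minimal sketch, with a hypothetical function name and made-up counts:

```python
def yoy_reduction_pct(previous_count, current_count):
    """Percentage reduction in incidents from one year to the next.

    Returns None when the previous year had no incidents, because the
    percentage is undefined. A negative result means incidents increased.
    """
    if previous_count == 0:
        return None
    return 100.0 * (previous_count - current_count) / previous_count


# Illustrative counts of Sev 1/2 incidents, not real data.
print(yoy_reduction_pct(20, 15))  # 25.0
print(yoy_reduction_pct(10, 12))  # -20.0
```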
Key Measures
- Adoption metrics relevant to the standard (to be defined)
- Quality, throughput, and system health metrics aligned to capability
- Maturity scores based on structured assessment