Ragan McGill

Practice: Live Dashboards

Purpose and Strategic Importance

Live Dashboards provide real-time visualisations of system health, performance, and usage metrics. They make complex telemetry data accessible and actionable, allowing teams to spot trends, diagnose issues, and validate assumptions quickly.

By putting live data at the fingertips of developers, product owners, and operations, dashboards become a shared foundation for reliable systems, rapid iteration, and informed decision-making.

Description of the Practice

Dashboards visualise data from observability tooling (e.g. Prometheus, Grafana, Datadog, CloudWatch, Kibana).
Metrics displayed may include latency, throughput, error rates, infrastructure usage, custom SLIs, and user behaviour indicators.
They are used in team rituals (e.g. stand-ups, ops reviews) and during deployments, incidents, and experiments.
They should be curated for clarity, with consistent design, thresholds, and filtering.
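
To make the metrics listed above concrete, here is a minimal Python sketch of how throughput, error rate, and a latency percentile might be derived from raw request records before they reach a panel. The Request type, the summarise function, and the choice to count HTTP 5xx responses as errors are assumptions made for illustration; in practice the observability platform usually computes these values itself.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """One observed request (hypothetical record shape)."""
    duration_ms: float
    status: int


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list, pct in (0, 100]."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarise(requests, window_seconds):
    """Summarise one time window of requests into dashboard-ready numbers."""
    if not requests:
        raise ValueError("no requests in this window")
    # Assumption: only server-side failures (5xx) count as errors.
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "throughput_rps": len(requests) / window_seconds,
        "error_rate": errors / len(requests),
        "p95_latency_ms": percentile([r.duration_ms for r in requests], 95),
    }
```

A percentile is used rather than a mean because a small number of very slow requests can hide behind a healthy-looking average.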

How to Practise It (Playbook)

1. Getting Started

Identify the most critical metrics that reflect system performance and user impact.
Build role-specific dashboards (e.g. frontend, backend, product) that focus on actionable signals.
Use consistent naming, units, and alert thresholds to aid interpretation.
Make dashboards visible - in war rooms, on TV screens, in browser tabs, or in Slack channels.
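
One way to keep naming, units, and thresholds consistent is to check panel definitions automatically before they are published. The sketch below is illustrative only: the Panel fields, the list of unit suffixes, and the rule that the warning threshold sits below the critical one (which suits "higher is worse" metrics such as latency) are all assumptions, not the schema of any particular dashboard tool.

```python
import re
from dataclasses import dataclass

# Assumed team conventions; adjust to whatever your team has agreed.
ALLOWED_UNIT_SUFFIXES = ("_ms", "_seconds", "_bytes", "_ratio", "_total", "_rps")
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]*$")


@dataclass
class Panel:
    """Hypothetical panel definition."""
    title: str
    metric: str
    warn: float
    critical: float


def lint_panel(panel):
    """Return a list of human-readable problems; empty means the panel passes."""
    problems = []
    if not NAME_PATTERN.match(panel.metric):
        problems.append(f"{panel.metric}: metric name is not snake_case")
    if not panel.metric.endswith(ALLOWED_UNIT_SUFFIXES):
        problems.append(f"{panel.metric}: metric name has no recognised unit suffix")
    if panel.warn >= panel.critical:
        problems.append(f"{panel.metric}: warn threshold must be below critical")
    return problems


if __name__ == "__main__":
    for p in (Panel("Checkout latency", "checkout_latency_ms", 300, 800),
              Panel("Checkout latency", "CheckoutLatency", 900, 800)):
        print(p.metric, lint_panel(p) or "ok")
```

A check like this could run in a review step for dashboards that are stored as code, so that inconsistencies are caught before anyone has to interpret them during an incident.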

2. Scaling and Maturing

Include both business-level and technical metrics so that development, operations, and product share one view of the system.
Set up annotations for deployments, incidents, or feature flags to provide context.
Regularly review and prune unused or confusing panels to reduce noise.
Encourage teams to create personal or temporary dashboards for experiments and investigations.
Link dashboards directly from alerts, incidents, and runbooks.
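
Deployment annotations are often created by the delivery pipeline rather than by hand. The following sketch builds an annotation payload that a pipeline step could send to a dashboard tool. The field names (time in epoch milliseconds, tags, text) are loosely modelled on common annotation APIs and should be treated as assumptions to check against your tool's documentation; the deployment_annotation function is hypothetical and deliberately makes no network call.

```python
from datetime import datetime, timezone


def deployment_annotation(service, version, deployed_at):
    """Build an annotation payload marking a deployment on a timeline.

    deployed_at must be timezone-aware so the marker lands at the right
    point on the graph regardless of where the pipeline runs.
    """
    if deployed_at.tzinfo is None:
        raise ValueError("deployed_at must be timezone-aware")
    return {
        # Assumed field names; verify against your dashboard tool.
        "time": int(deployed_at.timestamp() * 1000),  # epoch milliseconds
        "tags": ["deployment", f"service:{service}"],
        "text": f"Deployed {service} {version}",
    }


if __name__ == "__main__":
    payload = deployment_annotation(
        "checkout", "v1.4.2", datetime.now(timezone.utc)
    )
    print(payload)
```

Tagging each annotation with the service name lets a panel show only the deployments relevant to the metrics it displays.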

3. Team Behaviours to Encourage

Use dashboards proactively - not just during incidents.
Ask “what should we be seeing?” and “what are we missing?”
Create a culture where dashboard hygiene is a shared responsibility.
Share insights and anomalies openly - even when the system is behaving well.

4. Watch Out For…

Dashboards with too many panels or no clear story.
Metrics without context - e.g. alerting on spikes without understanding baselines.
Siloed dashboards that are only useful to one team or person.
Relying solely on visualisation without alerting or deeper analysis.
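
To illustrate the point about baselines, the sketch below compares a new reading with the mean and standard deviation of recent history instead of reacting to its raw size. The is_anomalous function, the z-score approach, and the default threshold of 3.0 are assumptions chosen for illustration; real systems often also need to account for seasonality and traffic patterns.

```python
from statistics import mean, stdev


def is_anomalous(history, current, z_threshold=3.0):
    """Return True if current deviates strongly from the recent baseline.

    history is a list of recent readings of the same metric; at least two
    are needed to estimate the spread.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical readings")
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any change at all is a deviation.
        return current != baseline
    return abs(current - baseline) / spread > z_threshold


if __name__ == "__main__":
    recent_latency_ms = [100, 102, 98, 101, 99]
    for reading in (103, 110):
        print(reading, is_anomalous(recent_latency_ms, reading))
```

The same reading can be normal for one service and alarming for another, which is why the comparison is made against each metric's own history rather than a fixed number.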

5. Signals of Success

Teams regularly use dashboards to support decisions and detect issues early.
Observability data drives concrete change - performance tuning, feature rollbacks, or architecture reviews.
Incidents are diagnosed faster due to shared visual understanding.
Dashboards evolve as systems grow, with stale panels retired and new signals added.
Engineering and product teams speak a shared language around telemetry.