Practice: Custom Metrics Instrumentation

Purpose and Strategic Importance

Custom Metrics Instrumentation enables teams to capture business-specific and system-specific telemetry that off-the-shelf metrics don't provide. These metrics help surface meaningful performance insights, uncover user behaviour patterns, and support fine-grained monitoring for critical workflows.

By instrumenting what truly matters, teams build smarter alerts, detect issues faster, and make better decisions based on precise, relevant data.


Description of the Practice

  • Custom metrics are numeric time-series measurements that developers explicitly emit from code to monitor important aspects of system behaviour or business value.
  • They include counters, gauges, histograms, and timers related to domain-specific events (e.g. “orders placed,” “checkout errors,” “email sends per region”).
  • They are collected via tools such as Prometheus, OpenTelemetry, StatsD, or cloud monitoring platforms (e.g. AWS CloudWatch, Azure Monitor, Datadog).
  • Metrics are exported, visualised, and queried for insights, trend detection, and anomaly alerting.
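
To make the metric types above concrete, here is a minimal, standard-library-only sketch of a counter, a gauge, and a histogram. It is an illustration rather than a real client library: the class names, the label handling, and the example metrics (orders_placed, queue_depth, checkout_seconds) are all assumptions chosen for this example. A timer can be treated as a histogram fed with measured durations.

```python
from bisect import bisect_left
from collections import defaultdict


class Counter:
    """Monotonically increasing count, e.g. orders placed."""

    def __init__(self):
        # One running total per unique combination of label values.
        self.values = defaultdict(float)

    def inc(self, amount=1.0, **labels):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.values[tuple(sorted(labels.items()))] += amount


class Gauge:
    """Point-in-time value that can go up or down, e.g. queue depth."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value


class Histogram:
    """Sorts observations into buckets, e.g. checkout latency in seconds."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        # One bucket per upper bound, plus a final overflow bucket.
        self.bucket_counts = [0] * (len(self.bounds) + 1)
        self.total = 0.0
        self.count = 0

    def observe(self, value):
        # bisect_left returns the first bucket whose upper bound is >= value.
        self.bucket_counts[bisect_left(self.bounds, value)] += 1
        self.total += value
        self.count += 1


# Hypothetical domain events, for illustration only.
orders_placed = Counter()
orders_placed.inc(region="eu")
orders_placed.inc(region="eu")
orders_placed.inc(region="us")

queue_depth = Gauge()
queue_depth.set(7)

checkout_seconds = Histogram(bounds=[0.1, 0.5, 1.0])
for seconds in (0.05, 0.3, 0.5, 2.0):
    checkout_seconds.observe(seconds)
```

In practice a team would normally use the client library of its chosen tool instead of hand-rolled classes; the sketch only shows what each metric type records.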

How to Practise It (Playbook)

1. Getting Started

  • Identify critical workflows or KPIs that are not visible in default metrics.
  • Instrument application code to emit custom metrics at key events and state changes.
  • Use structured metric naming and tags (e.g. service, environment, region) for aggregation and filtering.
  • Export metrics to a central observability platform with a clear retention policy.
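
As a sketch of the naming and tagging step above, the helper below validates a metric name and its labels and renders one sample as a line in the Prometheus text exposition style. The required label set (service, environment, region) mirrors the example tags in the list, but enforcing it is an assumed team convention, and format_sample is a hypothetical helper, not part of any library.

```python
import re

# Character rules for Prometheus-style metric and label names.
METRIC_NAME = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")
LABEL_NAME = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")

# Assumed team convention: every custom metric carries these labels.
REQUIRED_LABELS = {"service", "environment", "region"}


def _escape(value):
    """Escape backslashes, double quotes, and newlines in a label value."""
    text = str(value)
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name, value, labels):
    """Validate one sample and render it as a text exposition line."""
    if not METRIC_NAME.match(name):
        raise ValueError(f"invalid metric name: {name!r}")
    missing = REQUIRED_LABELS - set(labels)
    if missing:
        raise ValueError(f"missing required labels: {sorted(missing)}")
    for key in labels:
        if not LABEL_NAME.match(key):
            raise ValueError(f"invalid label name: {key!r}")
    # Sort labels so the same sample always renders identically.
    rendered = ",".join(f'{k}="{_escape(v)}"' for k, v in sorted(labels.items()))
    return f"{name}{{{rendered}}} {value}"


line = format_sample(
    "checkout_errors_total",
    3,
    {"service": "checkout", "environment": "prod", "region": "eu-west-1"},
)
```

Rejecting badly named or under-labelled metrics at emit time is one way to keep aggregation and filtering consistent across services; the same check could equally run in code review or CI.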

2. Scaling and Maturing

  • Pair custom metrics with alerting rules, dashboards, and annotations (e.g. deployments, incidents).
  • Build SLIs from custom metrics (e.g. “successful payments per minute”) to support SLOs.
  • Create business-level observability - not just infrastructure metrics - to link technical health to outcomes.
  • Document all metrics: what they mean, where they come from, and how to interpret them.
  • Continuously prune unused metrics to manage cost and reduce noise.
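
One common way to turn custom counters into an SLI is to express it as the ratio of good events to all events and compare it with the SLO target. The sketch below assumes two hypothetical counters, payments_succeeded and payments_attempted, read over the same time window, and an illustrative 99.5% target; none of these values come from the practice itself.

```python
def success_ratio(good, total):
    """SLI as the fraction of good events; None when there was no traffic."""
    if total == 0:
        return None
    return good / total


def error_budget_remaining(sli, slo_target):
    """Fraction of the error budget left: 1.0 is untouched, 0 or less is spent."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_failure = 1.0 - slo_target
    observed_failure = 1.0 - sli
    return 1.0 - observed_failure / allowed_failure


# Hypothetical counter readings for one window.
payments_succeeded = 9_990
payments_attempted = 10_000

sli = success_ratio(payments_succeeded, payments_attempted)
remaining = error_budget_remaining(sli, slo_target=0.995)
```

With these illustrative numbers the SLI is 0.999 and roughly four fifths of the error budget remains. Returning None for an empty window keeps "no traffic" distinct from "everything failed", which matters when alerting on the result.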

3. Team Behaviours to Encourage

  • Think beyond infrastructure - measure what customers care about.
  • Collaborate with product, ops, and business teams on what to measure.
  • Use metrics in sprint reviews, post-incident analysis, and decision-making forums.
  • Keep metrics consistent, portable, and easy to understand.

4. Watch Out For…

  • Metric explosion - too many dimensions or duplicates driving up cardinality and cost.
  • Lack of standardisation - inconsistent naming, units, or labelling.
  • Instrumenting only technical components, ignoring user or business metrics.
  • Writing metrics but not using them in operational or strategic discussions.
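
The cardinality risk above can be estimated before a metric ships. A rough upper bound on the number of time series for one metric is the product of the distinct values each label can take. The sketch below applies that rule to a hypothetical inventory of metrics; the metric names, label sets, and the budget of 1,000 series are assumptions for illustration.

```python
from math import prod


def estimated_series(label_values):
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(len(values) for values in label_values.values())


def over_budget(metrics, budget):
    """Names of metrics whose estimated series count exceeds the budget."""
    return sorted(
        name for name, labels in metrics.items() if estimated_series(labels) > budget
    )


# Hypothetical inventory: metric name -> label name -> distinct values seen.
metrics = {
    "orders_placed_total": {
        "region": {"eu", "us", "apac"},
        "environment": {"prod", "staging"},
    },
    "checkout_errors_total": {
        "region": {"eu", "us", "apac"},
        # An unbounded identifier used as a label multiplies every other dimension.
        "user_id": set(range(10_000)),
    },
}

flagged = over_budget(metrics, budget=1_000)
```

Here orders_placed_total stays at 6 series, while checkout_errors_total reaches 30,000 because of the user_id label. Per-user detail of that kind usually belongs in logs or traces rather than in metric labels.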

5. Signals of Success

  • Teams have clear, reliable metrics that reflect user and system behaviour.
  • Alerting based on metrics leads to timely, actionable responses.
  • Product and engineering decisions are informed by real-time usage patterns.
  • Incidents are resolved faster with clearer visibility into root causes.
  • Custom metrics become part of delivery best practices, not an afterthought.

Associated Standards

  • Customer feedback is continuously gathered and acted on
  • Operational readiness is tested before every major release
  • Product and engineering decisions are backed by live data
  • Systems expose the data needed to understand their behaviour
  • Teams are alerted when feedback loops are broken
