OKRs for Engineering

Done well, Objectives and Key Results create alignment and autonomy. Done badly, they create bureaucracy and gaming.

OKRs are one of the most widely adopted and most poorly implemented goal-setting systems in engineering. This section covers the original intent, why engineering OKRs are harder than product OKRs, how to write key results that measure outcomes rather than activity, and how to make the quarterly OKR cycle useful rather than theatrical.

What OKRs Are Actually For

John Doerr introduced OKRs to Google in 1999, having learned them from Andy Grove at Intel. The original intent was straightforward: create a system where ambitious goals are set, measurable results define success, and people have the autonomy to decide how to achieve them. The Objective answers "what are we trying to achieve?" - it is qualitative, directional, and inspiring. The Key Results answer "how will we know we achieved it?" - they are quantitative, specific, and time-bound.

The critical feature of the system as Grove and Doerr conceived it: OKRs are not performance management. They are not the basis for compensation decisions. They are set ambitiously - Google famously targeted 70% achievement as a sign of appropriate stretch. An OKR that is always 100% achieved is set too conservatively. One that is consistently at 0% is disconnected from reality.

Most OKR implementations in engineering organisations violate one or more of these principles. OKRs become annual targets tied to reviews, which kills the stretch. They become task lists rather than outcome statements, which eliminates the autonomy. They are set at the top and cascaded down without negotiation, which destroys ownership. The result is goal-setting theatre that consumes significant organisational time and produces minimal benefit.

Why Engineering OKRs Are Harder Than Product OKRs

Product OKRs are difficult but tractable. "Increase trial conversion rate from 8% to 12%" is a clear outcome statement that the product team can work toward. The metric exists, it is measurable, and product decisions clearly influence it.

Engineering OKRs are harder for several reasons.

The outcomes engineering cares about are often hard to quantify. "Improve system reliability" is an objective that every engineering leader wants, but translating it into a key result requires a specific metric (what reliability measure?), a baseline (what is it now?), and a target (what improvement are we aiming for?). Getting all three right takes effort, and the answers are not always obvious.

Engineering work often exists in service of other teams' outcomes rather than as an end in itself. The platform team's job is to make product teams faster. How do you write a key result for "make other people faster"? You need to measure the thing that changes for others, which requires data you may not have and influence you may not fully control.

Engineering investments often have delayed payback. A decision to invest in automated testing infrastructure in Q1 may not show up as improved change failure rate until Q2 or Q3. OKR quarterly cycles can be too short to capture the causal relationship between engineering investment and engineering outcomes.

Output OKRs vs Outcome OKRs

The most common failure mode in engineering OKRs is writing key results that describe outputs (things you will do or build) rather than outcomes (changes in the state of the world).

Output key result: "Complete migration of authentication service to OAuth 2.0 by end of quarter." This describes work. Either you did it or you did not. There is no signal about whether it mattered.

Outcome key result: "Reduce login failures attributed to authentication from 2.3% to 0.5% by end of quarter." This describes a change that the authentication migration should produce. If the migration is complete but login failures have not improved, the migration did not solve the problem.

The discipline of writing outcome key results is uncomfortable because it removes the safety of activity. With an output key result, you can succeed by completing the task regardless of the impact. With an outcome key result, you are accountable for the impact. This is the point - but it requires courage from the engineering team and trust from leadership that a missed outcome target will be treated as a learning signal rather than a failure.

Practical test for output vs outcome: ask "could this key result be achieved without the problem being solved?" If yes, it is probably an output key result. Rewrite it to capture the change in state that solving the problem would produce.

Writing Key Results That Work for Engineering

A workable key result for engineering has five properties: it is measurable, it is specific about the metric and the target, it has a baseline (what is the current state?), it is achievable within the quarter if the right work is done, and it is clearly connected to the objective.
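These five properties map naturally onto a small data structure. The sketch below is illustrative (the class and field names are not from any standard OKR tool): a key result carries its metric, baseline, and target, and progress is scored as the fraction of the baseline-to-target gap that has been closed.

```python
from dataclasses import dataclass


@dataclass
class KeyResult:
    """One key result: a named metric moving from a baseline to a target."""
    metric: str      # the specific metric being measured
    baseline: float  # state at the start of the quarter
    target: float    # desired state at the end of the quarter
    current: float   # latest measured value

    def score(self) -> float:
        """Fraction of the baseline-to-target gap closed, clamped to 0..1.

        Works whether the target is above the baseline (raising availability)
        or below it (lowering an error rate).
        """
        gap = self.target - self.baseline
        if gap == 0:
            raise ValueError("target equals baseline: nothing to measure")
        progress = (self.current - self.baseline) / gap
        return max(0.0, min(1.0, progress))


# Example: reduce the 5xx error rate from 0.8% to 0.2%; currently at 0.5%
kr = KeyResult("payment 5xx error rate", baseline=0.8, target=0.2, current=0.5)
print(round(kr.score(), 2))  # 0.5 - half the gap closed
```

The forced fields make the failure modes visible: a key result with no baseline cannot be constructed, and one where the target equals the baseline raises an error rather than scoring as trivially achieved.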

Examples of engineering key results that work:

For an objective of "improve the reliability of our core payment service": "Reduce 5xx error rate on the payment service from 0.8% to below 0.2%" or "Achieve 99.9% payment service availability, up from 99.5% in the previous quarter."

For an objective of "accelerate time to production for engineering teams": "Reduce mean deployment lead time across product teams from 4 days to under 1 day" or "Reduce p95 CI pipeline duration from 42 minutes to under 15 minutes."

For an objective of "improve the security posture of our software supply chain": "Reduce critical and high vulnerability age-to-remediation from a median of 45 days to under 14 days" or "Achieve 100% of production services running with signed container images."

Notice that each of these starts from a baseline, names a specific metric, and sets a clear target. The baseline is important - without it, the key result is either trivially achievable (if the current state is already close to the target) or the progress cannot be measured.
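A quick way to sanity-check targets like these is to translate them into concrete terms. The availability key result above, for instance, implies a downtime budget. This illustrative snippet converts an availability percentage into allowed minutes of downtime over a 90-day quarter:

```python
# Convert an availability target into an allowed-downtime budget for a
# 90-day quarter, to sanity-check whether a key result is meaningful.
QUARTER_MINUTES = 90 * 24 * 60  # 129,600 minutes in a 90-day quarter


def downtime_budget(availability_pct: float) -> float:
    """Minutes of allowed downtime per quarter at a given availability."""
    return QUARTER_MINUTES * (1 - availability_pct / 100)


for pct in (99.5, 99.9):
    print(f"{pct}% availability -> {downtime_budget(pct):.0f} min/quarter")
```

The move from 99.5% to 99.9% cuts the quarterly downtime budget from roughly 648 minutes to about 130 - a fivefold tightening, which is useful context when judging whether the target is a stretch or a fantasy.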

Cascading Without Destroying Autonomy

OKRs are often cascaded from company level to team level in a way that defeats the purpose. Senior leadership sets the company OKRs. Each function then sets OKRs that align to those. Each team then sets OKRs that align to their function. By the time the OKRs reach engineering teams, they are so heavily constrained by the levels above that the team has no real choice about what to pursue.

The alternative is alignment rather than cascade. Company OKRs set direction. Function OKRs (engineering OKRs) commit to specific contributions to that direction. Team OKRs identify what the team will do to contribute to the function OKRs. At each level, there is genuine choice about how to contribute, not just a decomposition of the level above.

This requires trust. Senior leadership needs to trust that engineering teams, given a clear direction, will choose the right things to work on. Engineering teams need to take that responsibility seriously - alignment means committing to contribution, not just activity.

The test of genuine alignment: if a team's OKRs were all achieved, would it meaningfully advance the function OKRs? If the function OKRs were all achieved, would it meaningfully advance the company OKRs? If the answer is no, the alignment is nominal rather than real.

Common Failure Modes

OKRs as task lists: "Complete A, B, and C" is a delivery plan, not an OKR. Treat it as a delivery plan.

Sandbagging: setting key results conservatively so that 100% achievement is guaranteed. This emerges when OKRs are tied to performance reviews. Address it by explicitly separating OKRs from compensation conversations.

Too many objectives: three objectives per team per quarter is comfortable; five is the maximum before focus is lost. Most organisations try to capture everything in OKRs and end up with fifteen objectives that nobody can remember.

OKRs as external commitments: agreeing with a stakeholder that your engineering team will deliver a specific feature as an OKR. This turns OKRs into project plans and eliminates the team's ability to adapt when they learn something new.

Forgetting to check in: OKRs set in January that are not reviewed until April have no value. A monthly or fortnightly check-in that asks "what is our current confidence in achieving this key result, and what is blocking us?" is the mechanism that makes OKRs operational.

Making the Quarterly Cycle Useful

The OKR quarterly cycle has four components: setting, check-ins, scoring, and retrospective.

Setting: two to three weeks before the quarter starts. Objectives are drafted, debated, and refined. Key results are written against baselines with specific targets. Dependencies on other teams are surfaced and resolved where possible.

Check-ins: monthly or fortnightly. Short. Focused on the current confidence level (0-100%), what has changed since the last check-in, and what is blocking progress. The check-in is not a status report - it is a conversation about whether the plan is still right.

Scoring: at the end of the quarter. Rate each key result on a 0-1 scale. 0.7 is the target for stretch OKRs. Score without apology or defensiveness. A 0.4 is not a failure - it is information.
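Scoring on the 0-1 scale can be made mechanical. A minimal sketch, with illustrative names and numbers: average the key result scores under each objective and compare the result against the 0.7 stretch target.

```python
# Score an objective at quarter end as the mean of its key result scores,
# each on the 0-1 scale. The input values below are illustrative only.


def objective_score(kr_scores: list[float]) -> float:
    """Mean of key result scores. Around 0.7 indicates well-calibrated stretch."""
    if not kr_scores:
        raise ValueError("an objective needs at least one key result")
    for s in kr_scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score {s} is outside the 0-1 scale")
    return sum(kr_scores) / len(kr_scores)


# Three key results under one reliability objective
reliability_krs = [1.0, 0.6, 0.4]
print(round(objective_score(reliability_krs), 2))  # 0.67
```

A score near 0.7 suggests the quarter was calibrated well; a 1.0 across the board is as much a signal (sandbagging) as a 0.2 (disconnected from reality), which is what the retrospective should examine.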

Retrospective: what did we learn? Which key results were badly calibrated (too easy or too hard)? Which objectives turned out to be less important than we thought? Which working assumptions were wrong? Feed this into the next quarter's setting process.

The quarterly cycle should take no more than three to four person-days of effort per team per quarter - the setting period, the check-ins, scoring, and retrospective combined. If OKRs are taking more than this, the process is over-engineered.