The Fundamental Attribution Error in Engineering Orgs
In social psychology, the fundamental attribution error describes a consistent bias in human judgment: when things go wrong, we attribute the cause to the character or capability of the person involved, while underweighting the situational and environmental factors that constrained their choices. When someone is late to a meeting, we think they are disorganised. We forget that the previous meeting ran over, that the building's lifts are broken, and that they were given two minutes' notice.
In engineering organisations, this error operates at scale and with serious consequences.
A team misses a deadline. The first question is: who is responsible? Not: what in the system made the deadline unrealistic, or what dependencies were not resolved, or what planning assumptions turned out to be wrong. Who is responsible. The answer, almost invariably, focuses on the team lead or the product manager. The structural factors - unclear requirements, unrealistic estimates caused by commitments made without engineering input, a shared dependency that was deprioritised by another team - are secondary. The individual is primary.
An incident occurs. The post-mortem identifies a sequence of events. Somewhere in that sequence, an engineer made a decision that, in hindsight, made the incident worse. The report concludes that the engineer "made an error." The recommendation is additional training, or improved checklists, or a reminder to follow the process. It does not ask why the engineer had insufficient information to make a better decision, why the system provided no guardrails against the failure mode, or why the deployment process did not include an automatic rollback mechanism.
A team's velocity is low. The performance review flags the team as underperforming. A coaching plan is put in place. Nobody measures the deployment pipeline reliability, the interrupt rate from production issues, the number of cross-team dependencies blocking the team's work, or the quality of the platform the team depends on. The assumption is that the team is the variable. It is usually not the most important variable.
This is not a minor cognitive quirk. It shapes every significant decision about hiring, performance management, team restructuring, and engineering investment. When the diagnosis is consistently wrong - when "people problem" is the default conclusion regardless of the actual evidence - the interventions are consistently ineffective. You train people who needed better tooling. You replace individuals who needed better processes. You coach team leads who needed clearer ownership models. And the underlying problems remain.
What "The System" Actually Means
"The system" is not a vague abstraction. It is the specific, observable set of conditions within which engineers and teams do their work. When those conditions make good outcomes unlikely, good outcomes will be rare - regardless of individual quality.
The system includes:
Processes. How work is initiated, reviewed, prioritised, approved, deployed, and monitored. A deployment process that requires manual steps by two specific people is a system factor. A change approval process that takes two weeks is a system factor. A sprint planning process that commits to work before requirements are understood is a system factor. Engineers operate within these processes. They did not design them and, in most cases, cannot change them unilaterally.
Incentives. What is rewarded, what is tolerated, and what is penalised. If engineers are rewarded for individual feature output and penalised for time spent on platform improvements, they will write features and neglect the platform. This is not short-term thinking - it is rational behaviour within the incentive structure. If teams are measured on their own throughput rather than on end-to-end flow, they will optimise their own throughput at the expense of the teams downstream. They are not being selfish. They are responding to the measurement.
Tooling. The quality of the tools engineers use to write, test, build, deploy, and monitor software. Poor tooling is invisible overhead. Slow CI pipelines, inconsistent local development environments, fragile test infrastructure, insufficient observability - each imposes a cognitive and time cost that does not appear in any individual's performance review. It appears in delivery lead time, in defect rates, in engineer frustration. But its cause is attributed to the team, not the tools.
Ownership structures. What does each team own? What are the boundaries? Where are the ambiguities? Unclear ownership is a system problem. When two teams both believe they have a legitimate claim on a shared component, the result is not a collaboration problem between those teams - it is an ownership design problem. The teams are responding rationally to a structural ambiguity they did not create.
Information flows. Who knows what, when? When engineers make decisions without information they needed - when a security risk was not known at design time, when a performance degradation was not visible until it became an incident, when a business constraint was not communicated until after the feature was built - those are information flow failures. They look like individual errors. They are system failures.
Feedback loops. How quickly does the system provide signal that something is wrong? A deployment pipeline that gives feedback in five minutes enables fast error detection and fast correction. A deployment process that gives feedback in two weeks - through a slow release cycle and delayed monitoring - ensures that errors are expensive. The speed and quality of feedback loops determine whether learning and improvement are possible at all.
The Evidence
This is not a theoretical position. It is one of the most consistently supported findings across quality management, safety research, and software delivery research.
W. Edwards Deming, the statistician and quality management pioneer who is credited with a central role in Japan's post-war industrial transformation, made this argument as directly as anyone has: approximately 94% of problems in any system are system problems. The remaining 6% are attributable to individuals. He did not present this as a hopeful estimate. He presented it as an empirical observation from decades of working on quality in industrial systems.
Deming's argument was specific. When workers are producing defects, the instinct of management is to blame the workers - to retrain them, to discipline them, to replace them. But in a well-designed production system, the workers are following the process they were given, using the tools they were provided, within the constraints they were handed. If the process produces defects, the process is the problem. If the tools are insufficiently precise, the tools are the problem. Improving the worker without improving the system will not improve the output.
The same logic applies, with equal force, to software engineering.
The Accelerate research, conducted by Nicole Forsgren, Jez Humble, and Gene Kim and published in 2018, provides the most rigorous empirical basis for this in a software engineering context. The research surveyed tens of thousands of engineers across organisations of different sizes and industries and identified the technical and cultural capabilities that predict high delivery performance. The strongest predictors of delivery outcomes - lead time, deployment frequency, change failure rate, time to restore - were practices and structures, not individual talent. Continuous integration, trunk-based development, test automation, deployment automation, loosely coupled architecture, lean product management. These are system-level factors. They predict performance more reliably than any people-level factor.
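The four delivery measures are themselves system metrics: they can be computed from plain deployment records, with no reference to individuals. A minimal sketch of that computation follows; the `Deployment` record and field names are hypothetical, and this is not the methodology of the Accelerate research itself, which is survey-based.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class Deployment:
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it cause a change failure?
    restored_at: Optional[datetime] = None   # when service was restored, if it failed

def delivery_metrics(deploys, window_days):
    """Summarise the four delivery measures over a reporting window."""
    lead_times = sorted(d.deployed_at - d.committed_at for d in deploys)
    failures = [d for d in deploys if d.failed]
    restores = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    hour = timedelta(hours=1)
    return {
        "deploys_per_day": len(deploys) / window_days,
        # median lead time (upper middle value for even-sized samples)
        "median_lead_time_hours": lead_times[len(lead_times) // 2] / hour,
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_restore_hours": (
            sum(restores, timedelta()) / len(restores) / hour if restores else None
        ),
    }
```

Note that nothing in the calculation asks who wrote the code; the inputs are properties of the pipeline, which is the point.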
The psychological safety research, most prominently associated with Amy Edmondson's work at Harvard Business School and later applied to engineering teams through Google's Project Aristotle, found that the most important predictor of team performance is not the talent of individual team members - it is the degree to which team members feel safe to speak up, to flag problems, to admit mistakes, and to challenge assumptions. Psychological safety is a property of the team environment. It is created by leadership behaviour, by norms, by consequences for raising difficult truths. It is a system factor. It predicts outcomes that are typically attributed to individual courage or competence.
How to Shift from Individual to System Thinking
The shift is a reframe, applied consistently and deliberately. It is not about removing accountability. It is about locating accountability in the right place.
"What in the system made this outcome likely?" replaces "who made this mistake?"
When a defect reaches production, ask: what in the deployment pipeline should have caught this but didn't? What in the development process should have surfaced this but didn't? What in the team's ownership model led to this component being under-tested? What in the incentive structure made it rational for the team to ship before the quality bar was met?
These questions do not exonerate anyone. They locate the intervention where it will have the greatest impact.
"What structural conditions made this easy or hard?" replaces "does this person have the right skills?"
When a team is underperforming, audit the structural conditions before assessing the people. What dependencies does the team have? How long do those dependencies take to resolve? What is the deployment pipeline reliability? What is the interrupt rate from production issues? What cognitive load is the team carrying? What is the quality of the internal platform the team depends on?
In a significant proportion of cases, the answer to "why is this team slow?" is found in these structural factors. The team does not have a skill problem. It has a system problem.
"What made this the path of least resistance?" replaces "why didn't they follow the process?"
When engineers consistently do something other than what the process specifies - skipping tests, bypassing code review, cutting corners on documentation - the default response is to reinforce the process. Remind people of the standard. Add a compliance check. The more useful question is: why is following the process harder than not following it? If the test suite takes forty minutes to run, engineers will skip it. If the code review tool is difficult to use, engineers will route around it. If documentation is required but nobody ever reads it, engineers will produce documentation that satisfies the requirement and contains no useful information.
Make the right thing easy. That is a system intervention. Making the right thing mandatory but hard is a process intervention. Process interventions produce compliance theatre. System interventions produce actual behaviour change.
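To make the slow-test example concrete: rather than mandating that engineers run a forty-minute suite, a system intervention gives them a fast default path locally while CI still runs everything. A sketch using pytest's standard hook API (the marker name and `--runslow` flag are illustrative choices, not requirements):

```python
# conftest.py - run fast tests by default; include slow tests only on request.
# Engineers get quick local signal; CI invokes pytest with --runslow.
import pytest

def pytest_addoption(parser):
    parser.addoption("--runslow", action="store_true", default=False,
                     help="also run tests marked @pytest.mark.slow")

def pytest_configure(config):
    config.addinivalue_line("markers", "slow: marks a test as slow")

def pytest_collection_modifyitems(config, items):
    if config.getoption("--runslow"):
        return  # CI path: run the full suite
    skip_slow = pytest.mark.skip(reason="slow test - pass --runslow to include")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip_slow)
```

The design choice is the point: the default behaviour is the desired behaviour, so no compliance check is needed to sustain it.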
Systemic vs Individual Interventions
There are genuine individual problems. Not everything is a system problem. The question is whether you are diagnosing correctly before reaching for an individual intervention.
When individual intervention is appropriate:
A specific engineer consistently produces work that is significantly below the quality of their peers, despite having access to the same system, the same tools, and the same support. You have ruled out cognitive overload, unclear requirements, insufficient platform support, and inadequate feedback loops as explanations. The difference between this engineer's output and their peers' output is not explainable by structural factors. This is a genuine individual issue.
A team lead is creating a toxic environment that is driving attrition and suppressing psychological safety. You have investigated whether leadership above them is modelling the behaviour, whether the incentive structure rewards short-term output over team health, whether the performance management system creates pressure that manifests in interpersonal aggression. You've found that the structural factors don't explain the behaviour. This is a genuine individual issue.
An individual is actively undermining processes - not because the processes are wrong, but because the individual has a personal agenda that conflicts with organisational goals. This is a genuine individual issue.
When individual intervention is a distraction from a system problem:
A team is consistently slow to deliver. The response is to coach the team lead on prioritisation and planning. The real problem is that the team has eleven cross-team dependencies and a deployment pipeline that fails 30% of the time, and it is responsible for three product lines simultaneously. Coaching the team lead does not address any of these factors.
Engineers are not writing tests. The response is to mandate test coverage targets and review them in code review. The real problem is that the test suite takes 45 minutes to run locally, the CI environment is flaky, and tests that catch real issues before the build are not clearly distinguished from tests that pass trivially. Mandating coverage without improving the system makes the coverage metric a game rather than a quality signal.
A team is experiencing high attrition. The response is exit interviews and an engagement survey. The real problem is that the team's work is unglamorous maintenance of a legacy system, they have no roadmap, their technical debt requests are consistently deprioritised, and there is no visible career path within the current structure. Individual engagement surveys will tell you what the individuals feel. They will not fix the structural conditions that are causing them to feel it.
The distinguishing question is: if you replaced every individual in this team or situation with equally qualified people, would the problem persist? If yes, it is a system problem. If no - if the problem is genuinely attributable to the specific individuals rather than to the conditions they are operating in - then individual intervention is appropriate.
In the vast majority of cases in engineering organisations, the answer is yes. The problem would persist. Because the problem is the system.
What This Means for Leaders
Leadership is primarily a system design function. The leader's job is not to direct individuals toward correct behaviour. It is to design the conditions under which correct behaviour is the natural outcome.
This is a significant reframe for many leaders, who were promoted for individual technical or delivery excellence and who conceive of leadership as applying that excellence at scale - making better technical decisions, holding higher standards, creating more urgency. These are individual-level interventions applied to a system-level problem. They do not scale and they do not address the root causes.
Design for conditions, not performance.
High-performing teams exist in conditions where expectations are clear, feedback is fast, tooling is good, ownership is unambiguous, dependencies are manageable, and information is available. Creating these conditions is leadership work. Monitoring individual performance within poor conditions is not.
Make the diagnosis before the intervention.
Before assuming an individual or team problem, apply the system analysis. What are the structural factors that could explain this outcome? Map the dependencies, the feedback loops, the information availability, the cognitive load, the incentive alignment. If you cannot explain the outcome through structural factors, then look at the individuals. Do not skip the structural analysis because it is harder than assigning blame.
Treat incidents and defects as system learning opportunities.
Blameless post-mortems - an approach pioneered in site reliability engineering and increasingly adopted in engineering organisations - treat incidents as system failures, not individual failures. The question is not who made the error but what in the system made the error possible and how the system can be improved to make it less likely. This approach produces better learning, better improvement, and better psychological safety than blame-oriented retrospectives.
Invest where the leverage is.
If 94% of problems are system problems, then 94% of improvement investment should be in the system. Improving tooling, reducing cognitive load, clarifying ownership, improving feedback loops, investing in the platform - these are the high-leverage investments. Individual coaching and performance management - important when genuinely needed - are the lower-leverage investments. The ratio of investment should reflect the ratio of impact.
Measure what reveals the system.
Individual performance metrics - individual commits, individual story points, individual defects - measure the leaf nodes of a system. They tell you what individuals are doing but not why. System metrics - lead time, flow efficiency, deployment frequency, change failure rate, mean time to restore - reveal the system's properties. They tell you whether the conditions for good performance exist. Optimise for system metrics, and individual performance tends to follow. Optimise for individual metrics, and the system tends to deteriorate as individuals optimise locally at the expense of the whole.
Connection to Your Operating Model
The principle that most problems are system problems is the thread that runs through every concept in this section. Conway's Law says that your architecture reflects your org structure - a system-level observation. Value stream thinking says that most lead time is wait time created by structural hand-offs - a system-level analysis. Team Topologies says that cognitive load, not individual capability, determines what a team can effectively own - a system-level design constraint. Org design anti-patterns are structural conditions that produce bad outcomes regardless of the people inside them.
If you approach delivery improvement by trying to improve individuals within a system that produces poor outcomes, you will invest significant energy and produce marginal results. If you approach it by understanding and improving the system - the structures, the processes, the tooling, the incentives, the ownership models, the feedback loops - you will invest less energy and produce more durable results.
This is not a comfortable position for leaders whose authority rests on their ability to direct people. It requires a shift from "how do I get people to do better?" to "how do I design a system in which better outcomes are the natural result?" That shift in question is the shift from individual thinking to system thinking. It is the prerequisite for everything else in this framework.
The operating model you build is a system. Design it accordingly.