Team Topologies | System Thinking & Design

The Core Idea

Most organisations design teams reactively. A new product initiative needs people, so people are assigned. A shared capability needs ownership, so a team is formed around it. A reorganisation happens, and teams are shuffled according to whatever logic prevailed in the room at the time. The result, accumulated over years, is a patchwork of team structures that nobody designed and that nobody fully understands.

Team Topologies, the framework developed by Matthew Skelton and Manuel Pais and published in 2019, offers something different: a small, stable vocabulary for intentionally designing team structures. It does not prescribe a specific org structure. It provides the building blocks - four team types and three interaction modes - from which coherent structures can be composed.

The insight behind Team Topologies is that Conway's Law cannot be wished away. If you want a particular architecture - one that enables fast flow, independent deployability, and clear ownership - you have to design your team structure to produce it. Team Topologies gives you the language to do that design work explicitly, rather than letting it happen by default.

The vocabulary matters because org design conversations, in most organisations, are imprecise. People talk about "ownership" without specifying what kind of ownership. They talk about teams "collaborating" without specifying what collaboration should produce or when it should stop. They talk about "platform teams" without distinguishing between a platform that enables self-service and a platform that creates dependency. Imprecise vocabulary leads to imprecise structures, and imprecise structures lead to the coordination overhead and delivery friction that afflict most engineering organisations.

The Four Team Types

Stream-Aligned Teams

A stream-aligned team is aligned to a single, continuous flow of work from a segment of the business. That flow might be a product, a feature area, a user journey, or a value stream. The key characteristic is that the team can deliver value to the end user - or to the next downstream team - without requiring long-running coordination with other teams.

Stream-aligned teams are the primary team type. Everything else in the Team Topologies model exists to support stream-aligned teams. The goal is to maximise the number of stream-aligned teams and minimise the overhead on them.

A stream-aligned team in an engineering organisation typically owns the full stack for its domain: the front end, the back end, the data layer, the CI/CD pipeline, and the production operations for its services. It can write, test, deploy, and monitor its own changes. It does not need to wait for another team to complete its work before it can deliver.

What this looks like in practice: a team that owns a checkout flow in an e-commerce platform. They own the UI, the API, the payment service integration, the order data model, and the alerting for their services. When the business wants to change the checkout experience, this team can design, build, test, and deploy that change without coordinating across five other teams.

When to use it: for every product or feature area that has a reasonably stable, identifiable flow of value. Most teams in a well-designed engineering organisation should be stream-aligned.

What goes wrong: stream-aligned teams that have too many dependencies on other teams cease to be truly stream-aligned. They become coordinating teams - spending more time managing hand-offs than delivering. This is the most common failure mode, and it is almost always caused by scoping the team's ownership too narrowly without also limiting what the team depends on.

Enabling Teams

An enabling team works with stream-aligned teams to help them acquire new capabilities. They are not there to do the work for stream-aligned teams. They are there to raise the capability of stream-aligned teams so that those teams can do the work themselves.

An enabling team might work with engineering teams to help them adopt a new testing strategy, implement security controls, improve their observability practices, or migrate to a new infrastructure pattern. Critically, the enabling team's goal is to make itself unnecessary to that team for that capability. They work intensively, transfer knowledge, help establish practices, and then move on.

This is a fundamentally different mode of working from how many "centre of excellence" or "enablement" teams actually operate. In practice, these teams often become permanent dependencies - teams go to them to get things done rather than to learn how to do things themselves. That is an enabling team that has drifted into a different shape: either a shared service or a disguised stream-aligned team. Both are problems.

What this looks like in practice: a security enabling team that runs a six-week engagement with a product team, helping them implement threat modelling, secure coding review, and automated security testing in their pipeline. At the end of the engagement, that product team can do this work themselves. The enabling team moves to the next team that needs the same capability.

When to use it: when stream-aligned teams need to acquire a capability that is too complex or too specialised to learn quickly on their own, and when that capability needs to be distributed across many teams rather than centralised.

What goes wrong: enabling teams that don't exit. They either become bottlenecks (everyone depends on them) or they grow to replace team capability rather than build it.

Complicated Subsystem Teams

A complicated subsystem team owns a subsystem that requires deep specialist knowledge to build and maintain - knowledge that it would be unreasonable to expect every stream-aligned team to acquire. The team exists to manage the complexity of the subsystem so that stream-aligned teams can use it without having to understand it fully.

The key word is "complicated" in the sense of Cynefin: a problem that has a right answer, but reaching it requires expertise that is not widely distributed. Video codec implementation. A real-time trading engine. A bespoke machine learning inference pipeline. A cryptography library. These are not complicated because someone made them complicated. They are complicated because the underlying domain is inherently specialist.

Complicated subsystem teams should be rare. They exist only where the subsystem is genuinely too deep for a stream-aligned team to own, and only where the complexity cannot be abstracted away by a platform. If you find yourself with many complicated subsystem teams, that is a signal that you have overcomplicated your system, underinvested in your platform, or misidentified what "complicated" means in your context.

What this looks like in practice: a team that owns the real-time recommendation engine for a consumer platform. The engine uses complex machine learning models, requires careful performance tuning, and has a mathematics-heavy optimisation layer. Stream-aligned product teams use the engine via a clean API. They do not need to understand the internals. The complicated subsystem team owns the internals and exposes the capability as a service.

When to use it: only when the subsystem genuinely requires specialist knowledge at a level that cannot be distributed, and when exposing it to stream-aligned teams as a service behind a clean interface is the appropriate interaction model.

What goes wrong: teams classified as complicated subsystem teams when they are actually just legacy systems that nobody has invested in making maintainable. The classification is used to justify keeping a specialist team rather than to solve a genuine complexity problem.

Platform Teams

A platform team owns the internal platform - the self-service capabilities that stream-aligned teams use to build, deploy, and operate their software. The platform is not a set of tools that are administered by the platform team on behalf of others. It is a product, designed for an internal audience, that reduces the cognitive load on stream-aligned teams by abstracting away infrastructure complexity.

Platform teams think of stream-aligned teams as their customers. They apply product thinking to internal tooling: they have roadmaps, they gather requirements, they measure adoption and satisfaction, and they make trade-offs based on what gives stream-aligned teams the most leverage. They do not operate as a ticket queue. They do not do the work for stream-aligned teams. They build capabilities that stream-aligned teams can use independently.

A good platform dramatically reduces the time a stream-aligned team spends on undifferentiated heavy lifting - provisioning infrastructure, configuring logging, setting up pipelines, managing secrets. A bad platform adds a new layer of complexity on top of the existing infrastructure complexity. The difference is almost entirely about whether the platform team has adopted a product mindset.

What this looks like in practice: a platform team that owns an internal developer platform. Stream-aligned teams use a web UI or CLI to provision services, configure pipelines, set up observability dashboards, and manage deployments - without filing tickets. The platform team monitors which capabilities are underused, which cause friction, and which are missing entirely, and they prioritise accordingly.

When to use it: whenever the infrastructure and tooling complexity is high enough that stream-aligned teams are spending significant time on it rather than on their own domain problems.

What goes wrong: platform teams that become gatekeepers rather than enablers. They own the infrastructure but require manual intervention for every change. They have a ticket queue, not a product. They measure their own throughput rather than the impact on stream-aligned team velocity.

The Three Interaction Modes

Collaboration

Collaboration is the mode where two teams work closely together on a shared problem, typically for a bounded period of time. Both teams contribute actively. They share knowledge, make decisions jointly, and discover things about the problem that neither team could discover alone.

Collaboration is intensive. It has high communication overhead. It should be used when that overhead is worth it - typically when there is genuine uncertainty about how to solve a problem, when two teams have complementary capabilities that need to be combined, or when a new interface or capability is being established that will later be handed off to one team.

Collaboration should not be the permanent mode of interaction between any two teams. If two teams are permanently collaborating, either the problem has not been properly decomposed, or one team is doing work that belongs in the other team, or the boundary between them is wrong. Collaboration is an investment with an expected return: at the end of a collaborative phase, one or both teams should know something they didn't know before, and the interaction mode should shift.

What this looks like in practice: a platform team and a stream-aligned team collaborating for eight weeks to design and implement a new deployment capability. The platform team understands the infrastructure. The stream-aligned team understands the deployment workflow they need. Together, they design an API and a set of primitives that the stream-aligned team can use independently going forward. At the end of the eight weeks, the interaction mode shifts to X-as-a-Service.

When to use it: at the start of a new capability, when designing a new interface, or when two teams are learning from each other to solve a genuinely novel problem.

What goes wrong: permanent collaboration that never shifts to a more defined interaction mode. Teams that are always collaborating have no stable interface between them. Ownership is unclear. Work falls through the gaps. The coordination overhead is high and ongoing.

X-as-a-Service

X-as-a-Service is the mode where one team provides a capability - an API, a tool, a service, a library - and another team consumes it with minimal interaction. The consumer team does not need to understand how the capability works internally. They just use it.

This is the interaction mode that enables fast flow at scale. Stream-aligned teams can move quickly when they can consume platform capabilities, shared services, and complicated subsystem outputs without coordination overhead. The key requirement is that the API or interface must be genuinely self-service: clear documentation, stable contracts, and sufficient capability that the consumer team rarely needs to ask for something that isn't there.

X-as-a-Service requires investment from the providing team. A platform that requires constant interaction to use is not really X-as-a-Service - it is a shared service with extra steps. The providing team must make deliberate product decisions about what to expose, how to document it, and how to ensure it meets consumer needs without requiring ongoing hand-holding.

What this looks like in practice: stream-aligned teams consuming observability tooling from a platform team. When they provision a new service, the observability configuration is applied automatically. Dashboards are generated from convention. Alerts fire based on standard templates. The stream-aligned team never opens a ticket with the platform team. They read the documentation, configure what they need in code, and move on.
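To make "alerts from standard templates" concrete, here is a minimal sketch of convention-based alert generation. Everything in it is an assumption for illustration: the template names, the default thresholds, and the rule expression syntax are invented and do not correspond to any particular monitoring tool.

```python
from string import Template
from typing import Dict, Optional

# Hypothetical standard templates owned by the platform team.
# The expression syntax is made up for this sketch.
ALERT_TEMPLATES = {
    "high_error_rate": Template("error_rate{service='$service'} > $error_threshold"),
    "high_latency": Template("p99_latency_ms{service='$service'} > $latency_ms"),
}

# Hypothetical defaults applied to every new service by convention.
DEFAULTS = {"error_threshold": "0.05", "latency_ms": "500"}


def alerts_for(service: str, overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Render the standard alert rules for one service.

    A stream-aligned team supplies only the service name and, optionally,
    a few overrides in code. No ticket is involved.
    """
    params = {**DEFAULTS, **(overrides or {}), "service": service}
    return {name: tpl.substitute(params) for name, tpl in ALERT_TEMPLATES.items()}


# Usage: defaults for one service, a tighter latency budget for another.
for name, rule in alerts_for("checkout").items():
    print(name, "=>", rule)
print(alerts_for("search", {"latency_ms": "250"})["high_latency"])
```

The design point is that the consumer's input is tiny (a name plus optional overrides), which is what lets the interaction stay self-service.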

When to use it: once an interface is well-understood and stable, and the consumer team's needs are well-served by the current capability without constant change.

What goes wrong: treating X-as-a-Service as the default mode for a capability that is not yet well enough understood to have a stable API. Consumer teams hit gaps in the capability and have no way to address them. They work around the platform rather than through it.

Facilitating

Facilitating is the mode used by enabling teams. They work with stream-aligned teams to help them adopt new capabilities, but they do not do the work for them. They teach, demonstrate, review, pair, and coach. The output is an increase in the stream-aligned team's capability, not a deliverable that the enabling team produced.

The interaction is time-bounded. It ends when the stream-aligned team has the capability. The enabling team then moves to the next team that needs help.

Facilitating is the interaction mode that enables capability distribution at scale without creating permanent dependency on specialist teams. It is how an organisation raises the floor of engineering practice across many teams simultaneously.

What this looks like in practice: an engineering effectiveness team running a structured engagement with a product team to improve their test automation coverage. They pair with engineers, review test strategies, introduce tooling, and run workshops. After six weeks, the product team is writing meaningful tests independently. The enabling team exits and moves to the next team.

When to use it: when stream-aligned teams need to acquire a new practice or capability, and when the goal is for the team to own that practice independently afterwards.

What goes wrong: enabling teams that facilitate indefinitely, or that facilitate without a clear definition of "done." The engagement becomes open-ended, the enabling team becomes a permanent fixture, and the capability is never truly transferred.

Cognitive Load as a Design Constraint

Team Topologies introduces cognitive load as the primary constraint for team sizing and scope definition. This is the most practically useful thing about the framework, and the concept most organisations miss entirely.

Cognitive load - the total mental effort required to understand and operate a system - has three components:

Intrinsic load is the inherent complexity of the domain. Understanding how a distributed transaction works, or how a machine learning pipeline operates, or how a financial instrument is priced - this is intrinsic complexity that cannot be removed. You can manage it (by hiring people who already understand it, by investing in education, by reducing unnecessary scope) but you cannot eliminate it.

Extraneous load is complexity imposed by the environment and tooling - complexity that is not inherent to the problem being solved. Having to understand eight different deployment mechanisms. Navigating an undocumented legacy codebase. Waiting for a ticket queue to get access to infrastructure. Coordinating with three other teams before you can make a change. This load can and should be reduced, primarily through investment in good platforms and clear team interfaces.

Germane load is the cognitive effort of learning - developing new mental models, acquiring new skills. This is productive load. It is the investment you want teams to be making.

The practical implication for team design is direct. A team has a finite cognitive load capacity. If a team is responsible for twelve microservices across four domains, the intrinsic load of understanding all of them at depth may exceed that capacity. If the platform is poor and the tooling is inconsistent, extraneous load consumes capacity that should go toward the actual problem. If the team is always firefighting, there is no capacity left for germane load - no learning, no improvement, no growth.

Use cognitive load as a design constraint. When scoping what a team should own, ask whether a team of 6-8 competent engineers can hold the full system in their heads and operate it confidently. If the answer is no, the scope is too large. If the answer is yes but it's only yes because the team has been there for years and nobody else could do it, that is a different problem - a knowledge concentration risk, not a sign that the scope is appropriate.
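One rough way to make the scoping question discussable is to label each domain a team owns with a relative complexity and apply a simple rule. The sketch below is only that: the three labels and the limits in `scope_fits` are assumptions chosen for illustration, and they do not replace asking the team directly.

```python
from collections import Counter
from typing import Dict

LABELS = {"simple", "complicated", "complex"}


def scope_fits(domains: Dict[str, str]) -> bool:
    """Return True if the owned domains fit a single team under the
    hypothetical rules used in this sketch.

    domains maps a domain name to 'simple', 'complicated' or 'complex'.
    Assumed rules (illustrative, not prescriptive):
      - a complex domain must be the team's only domain
      - at most one complicated domain per team
      - at most three simple domains per team
    """
    unknown = set(domains.values()) - LABELS
    if unknown:
        raise ValueError(f"unknown complexity labels: {sorted(unknown)}")
    counts = Counter(domains.values())
    if counts["complex"] >= 1:
        return len(domains) == 1
    if counts["complicated"] > 1:
        return False
    return counts["simple"] <= 3


# Usage: two candidate scopes for the same team.
print(scope_fits({"checkout": "complicated", "receipts": "simple"}))
print(scope_fits({"pricing": "complex", "receipts": "simple"}))
```

A result of False is a prompt for a conversation about splitting scope, not a verdict.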

Common Patterns and Pitfalls

Teams mistyped as stream-aligned when they are actually functional.

A "front-end team" or a "QA team" or a "data team" is not a stream-aligned team. It is a functional team - aligned to a discipline rather than to a flow of value. Functional teams have high coordination overhead with adjacent teams. They cannot deliver end-to-end value independently. They are almost always the wrong shape for a fast-flow engineering organisation.

Interaction modes left undefined.

When two teams have to work together but nobody has defined how - whether they're collaborating, providing or consuming a service, or facilitating - the default is ad hoc coordination. This looks like: Slack messages at inconvenient times, meetings that produce no clear output, deliverables that don't meet the consumer's needs because requirements were never properly articulated. Define the interaction mode explicitly, and revisit it periodically.

Platform teams without product thinking.

A platform team that operates as a ticket queue is not a platform team in the Team Topologies sense. It is a shared service bottleneck that has been given a better name. The test is whether stream-aligned teams can get what they need from the platform without involving the platform team in the work. If they can, the interaction mode is X-as-a-Service. If they can't, it isn't.

Enabling teams that never exit.

An enabling team that runs permanent engagements with the same teams has either failed to transfer the capability or is doing work that belongs in a stream-aligned team. If the capability cannot be transferred, ask why. Is the capability genuinely too complex? Is the tooling too poor? Is the stream-aligned team too small to sustain the practice without dedicated support? Each of these has a different answer, but none of them is "keep the enabling team running this indefinitely."

How to Apply It

Step 1: Map your current teams and classify them.

For each team in your engineering organisation, ask: what does this team actually produce? Who consumes it? Does the team have the capability to deliver value end-to-end without significant coordination with other teams? Use these answers to classify each team against the four types. Most teams in most organisations are not cleanly typed. That is useful information.
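The questions above can be recorded per team and turned into a provisional classification. The following is a minimal sketch under stated assumptions: the field names and the "exactly one match" rule are hypothetical, and a team that comes back as not cleanly typed is a finding in its own right rather than an error.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TeamType(Enum):
    STREAM_ALIGNED = "stream-aligned"
    ENABLING = "enabling"
    COMPLICATED_SUBSYSTEM = "complicated subsystem"
    PLATFORM = "platform"


@dataclass
class TeamAnswers:
    """Answers gathered for one team (hypothetical field names)."""
    name: str
    delivers_end_to_end: bool           # ships value without long-running coordination
    builds_others_capability: bool      # its output is other teams' improved practice
    owns_specialist_subsystem: bool     # needs expertise most teams should not acquire
    offers_self_service_platform: bool  # others build and deploy with it, no tickets


def provisional_type(team: TeamAnswers) -> Optional[TeamType]:
    """Return a first-guess type, or None when the team is not cleanly typed."""
    candidates = [
        (team.delivers_end_to_end, TeamType.STREAM_ALIGNED),
        (team.builds_others_capability, TeamType.ENABLING),
        (team.owns_specialist_subsystem, TeamType.COMPLICATED_SUBSYSTEM),
        (team.offers_self_service_platform, TeamType.PLATFORM),
    ]
    matches = [team_type for flag, team_type in candidates if flag]
    return matches[0] if len(matches) == 1 else None


# Usage: a functional QA team matches nothing; a mixed team matches too much.
teams = [
    TeamAnswers("checkout", True, False, False, False),
    TeamAnswers("qa", False, False, False, False),
    TeamAnswers("infra", True, False, False, True),
]
for team in teams:
    result = provisional_type(team)
    print(team.name, "->", result.value if result else "not cleanly typed")
```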

Step 2: Identify the interaction modes currently in use.

For each pair of teams that regularly interact, ask: is this collaboration, X-as-a-Service, or facilitating? Or is it something else - a shared service bottleneck, an unresolved dependency, an ad hoc coordination pattern that has become habit? Map the actual interaction modes, not the intended ones.
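A map of actual interactions can be as simple as a list of team pairs with the observed mode and a start date. The sketch below flags two things the framework treats as warning signs: pairs with no defined mode, and collaboration or facilitating that has run past a time box. The 90-day limit and all names are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Iterator, List, Optional, Tuple


class Mode(Enum):
    COLLABORATION = "collaboration"
    X_AS_A_SERVICE = "x-as-a-service"
    FACILITATING = "facilitating"


@dataclass
class Interaction:
    team_a: str
    team_b: str
    mode: Optional[Mode]  # None means ad hoc: nobody has defined the mode
    since: date


# Modes that are meant to be time-bounded.
TIME_BOUNDED = {Mode.COLLABORATION, Mode.FACILITATING}


def review(
    interactions: List[Interaction], today: date, max_days: int = 90
) -> Iterator[Tuple[Interaction, str]]:
    """Yield (interaction, reason) for each pair worth discussing.

    max_days is a hypothetical time box, not a value from the framework.
    """
    for item in interactions:
        if item.mode is None:
            yield item, "no defined interaction mode"
        elif item.mode in TIME_BOUNDED and (today - item.since).days > max_days:
            yield item, f"{item.mode.value} has run for more than {max_days} days"


# Usage with made-up teams and dates.
observed = [
    Interaction("platform", "checkout", Mode.X_AS_A_SERVICE, date(2023, 1, 10)),
    Interaction("security", "search", Mode.FACILITATING, date(2023, 1, 10)),
    Interaction("data", "checkout", None, date(2023, 5, 1)),
]
for item, reason in review(observed, today=date(2023, 6, 1)):
    print(f"{item.team_a} <-> {item.team_b}: {reason}")
```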

Step 3: Assess cognitive load.

For each team, ask honestly: is the scope of what this team owns appropriate for a team of this size? Where is cognitive load concentrated? Which teams are permanently overloaded? Which are underutilised? Use signals such as long onboarding times, high defect rates, frequent context switching, and low deployment frequency to calibrate.
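The signals named above can be collected per team and checked against agreed limits. This is a sketch under loud assumptions: the metric definitions and every threshold below are hypothetical placeholders that an organisation would need to set for itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TeamSignals:
    """Per-team measurements (hypothetical metric definitions)."""
    name: str
    onboarding_weeks: float
    defects_per_release: float
    context_switches_per_week: float
    deploys_per_week: float


# Hypothetical thresholds, for illustration only.
MAX_ONBOARDING_WEEKS = 8
MAX_DEFECTS_PER_RELEASE = 3
MAX_CONTEXT_SWITCHES_PER_WEEK = 10
MIN_DEPLOYS_PER_WEEK = 1


def overload_signals(s: TeamSignals) -> List[str]:
    """Return the names of the signals that cross their threshold."""
    flags = []
    if s.onboarding_weeks > MAX_ONBOARDING_WEEKS:
        flags.append("long onboarding")
    if s.defects_per_release > MAX_DEFECTS_PER_RELEASE:
        flags.append("high defect rate")
    if s.context_switches_per_week > MAX_CONTEXT_SWITCHES_PER_WEEK:
        flags.append("frequent context switching")
    if s.deploys_per_week < MIN_DEPLOYS_PER_WEEK:
        flags.append("low deployment frequency")
    return flags


# Usage with made-up numbers.
payments = TeamSignals("payments", 12, 5, 4, 0.5)
search = TeamSignals("search", 3, 1, 2, 6)
for team in (payments, search):
    print(team.name, overload_signals(team) or "no signals")
```

Several signals firing for the same team is a reason to look at its scope and its dependencies, not a score to optimise.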

Step 4: Design toward the target.

Using Conway's Law, the four team types, and the three interaction modes as your vocabulary, design the target state. Where do you want stream-aligned teams? What platforms do they need to be truly independent? Where do you need enabling teams to lift the capability floor? Where does genuine subsystem complexity require a specialist team?

Step 5: Sequence the transition.

You cannot move from a functional team structure to a stream-aligned structure in one step. Identify the highest-leverage changes - the places where reclassifying a team or changing an interaction mode would have the greatest impact on flow - and sequence them. Be explicit about what changes at each step and what is being preserved.
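Sequencing can start from a plain ranking of candidate changes by expected effect on flow relative to disruption. The sketch below assumes hypothetical 1-5 scores agreed by the people doing the design work; the ratio is a conversation aid, not a formula from the framework.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Change:
    description: str
    flow_impact: int  # 1 (little effect on flow) to 5 (large effect); hypothetical scale
    disruption: int   # 1 (easy to absorb) to 5 (very disruptive); hypothetical scale


def sequence(changes: List[Change]) -> List[Change]:
    """Order changes by impact per unit of disruption, highest first.

    Ties are broken in favour of the less disruptive change.
    """
    return sorted(
        changes,
        key=lambda c: (-(c.flow_impact / c.disruption), c.disruption),
    )


# Usage with made-up candidate changes.
candidates = [
    Change("Merge QA into the checkout team", 5, 2),
    Change("Define platform/search as X-as-a-Service", 3, 1),
    Change("Split the data team by value stream", 4, 4),
]
for step, change in enumerate(sequence(candidates), start=1):
    print(step, change.description)
```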

Connection to Your Operating Model

Team Topologies does not stand alone. It is a framework for applying Conway's Law deliberately, which means it depends on having a clear target architecture - a clear sense of what "fast flow" means in your context, what "independent deployability" requires, and what the value streams are that teams should be aligned to.

It connects directly to value stream thinking. Stream-aligned teams are aligned to value streams. Defining those value streams - mapping them, measuring them, understanding where value waits - is the prerequisite for knowing how to type and size your stream-aligned teams.

It connects to cognitive load as an ongoing design constraint. Team Topologies provides the vocabulary for reducing extraneous cognitive load at the organisational level. The platform team exists specifically to abstract complexity away from stream-aligned teams. The enabling team exists to raise the capability floor so that intrinsic complexity is better distributed. Monitoring cognitive load signals is how you know when your team design needs revisiting.

And it connects to the broader principle that most problems in engineering organisations are system problems, not people problems. Team Topologies does not ask you to find better people. It asks you to design better structures. The structures determine the communication patterns. The communication patterns determine the architecture. The architecture determines the outcomes. Design the structures deliberately, and the outcomes follow.
