Ragan McGill

Practice: Auto-scaling Infrastructure

Purpose and Strategic Importance

Auto-scaling Infrastructure is the practice of dynamically adjusting compute, storage, or service capacity based on demand. It improves system resilience, ensures performance under load, and optimises cost by scaling down when demand drops.

Auto-scaling is critical for delivering consistent customer experiences, avoiding outages during traffic spikes, and reducing over-provisioning. It supports modern architectures and cloud-native practices by making infrastructure adaptive, responsive, and efficient.

Description of the Practice

Resources scale automatically in response to metrics such as CPU utilisation, memory usage, request rate, or custom signals.
Scaling can be vertical (resizing instances) or horizontal (adding/removing instances).
Auto-scaling policies are defined and managed in infrastructure code.
Scaling decisions are observable and logged for audit and tuning.
Services maintain high availability and performance across varying loads.
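
To make the metric-driven, horizontal case above concrete, here is a minimal sketch of a proportional scaling rule. It is similar in spirit to the rule documented for the Kubernetes Horizontal Pod Autoscaler, but the function name, parameters, and defaults are assumptions chosen for illustration, not any platform's actual API.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Hypothetical helper: scale replica count in proportion to metric/target."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Within the tolerance band, leave capacity alone to avoid needless churn.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured floor and ceiling.
    return max(min_replicas, min(max_replicas, desired))


# Example: 4 replicas averaging 90% CPU against a 60% target.
print(desired_replicas(4, 90, 60))
```

The floor and ceiling keep a misbehaving metric from scaling a service to zero or without bound; suitable values depend on the workload.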

How to Practise It (Playbook)

1. Getting Started

Identify services that experience variable load or performance degradation under stress.
Use platform-native auto-scaling features (e.g. AWS Auto Scaling groups, Azure Virtual Machine Scale Sets, the Kubernetes Horizontal Pod Autoscaler).
Set scaling policies based on observed metrics and performance thresholds.
Validate scaling logic in test environments under simulated load.
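
One way to validate scaling logic before it reaches production is to replay a synthetic load trace against the policy offline. The sketch below assumes a simple threshold policy that adds or removes one instance per step; `ThresholdPolicy`, `simulate`, the thresholds, and the trace values are all hypothetical and would be replaced by observed metrics and real capacity figures.

```python
from dataclasses import dataclass


@dataclass
class ThresholdPolicy:
    scale_out_above: float = 0.75  # utilisation above which one instance is added
    scale_in_below: float = 0.30   # utilisation below which one instance is removed
    min_instances: int = 2
    max_instances: int = 20

    def decide(self, instances, utilisation):
        if utilisation > self.scale_out_above:
            return min(self.max_instances, instances + 1)
        if utilisation < self.scale_in_below:
            return max(self.min_instances, instances - 1)
        return instances


def simulate(policy, load_trace, capacity_per_instance, start_instances):
    """Replay a load trace; return (load, instances, utilisation) for each step."""
    instances = start_instances
    history = []
    for load in load_trace:
        utilisation = load / (instances * capacity_per_instance)
        history.append((load, instances, utilisation))
        instances = policy.decide(instances, utilisation)
    return history


# Assumed trace in requests per second, with 100 rps of capacity per instance.
trace = [100, 200, 400, 400, 400, 200, 100, 50]
history = simulate(ThresholdPolicy(), trace, capacity_per_instance=100, start_instances=2)
# Steps where demand exceeded total capacity, i.e. the policy reacted too slowly.
overloaded = [step for step, (_, _, u) in enumerate(history) if u > 1.0]
print(overloaded)
```

Any overloaded steps in a replay like this suggest the thresholds or step size need tuning before the policy is trusted under real traffic.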

2. Scaling and Maturing

Implement predictive scaling using historical data and usage trends.
Introduce cooldown periods to prevent rapid scale-in/scale-out oscillations.
Combine auto-scaling with load balancing and health checks.
Use observability tools to visualise scaling events, cost impact, and system performance.
Version auto-scaling policies and test them as part of infrastructure pipelines.
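
The cooldown idea above can be sketched as a small gate that sits between the scaling decision and the action. This is an illustrative sketch only: `CooldownGate` and its default durations are assumptions, and managed platforms expose their own cooldown or stabilisation settings that should be preferred where available.

```python
class CooldownGate:
    """Hypothetical gate that suppresses capacity changes made too soon after the last one."""

    def __init__(self, scale_out_cooldown_s=60.0, scale_in_cooldown_s=300.0):
        self.scale_out_cooldown_s = scale_out_cooldown_s
        self.scale_in_cooldown_s = scale_in_cooldown_s
        self._last_change_at = None

    def filter(self, current, proposed, now):
        """Return the capacity to apply at time `now` (in seconds)."""
        if proposed == current:
            return current
        # Scale-in waits longer than scale-out so capacity is released cautiously.
        if proposed > current:
            cooldown = self.scale_out_cooldown_s
        else:
            cooldown = self.scale_in_cooldown_s
        if self._last_change_at is not None and now - self._last_change_at < cooldown:
            return current  # still cooling down: ignore the proposal
        self._last_change_at = now
        return proposed


gate = CooldownGate()
applied = [
    gate.filter(2, 4, now=0),    # first change is always allowed
    gate.filter(4, 6, now=30),   # inside the 60 s scale-out cooldown
    gate.filter(4, 6, now=61),   # cooldown elapsed
    gate.filter(6, 3, now=200),  # inside the 300 s scale-in cooldown
    gate.filter(6, 3, now=400),  # cooldown elapsed
]
print(applied)
```

Passing `now` explicitly, rather than reading the clock inside the gate, keeps the behaviour deterministic so it can be tested in an infrastructure pipeline.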

3. Team Behaviours to Encourage

Review scaling performance during incident and performance reviews.
Align scaling thresholds with performance SLOs and cost targets.
Encourage experimentation to fine-tune policies for different workloads.
Share lessons learned about scaling patterns across teams.

4. Watch Out For…

Overly aggressive scaling policies causing instability or excess cost.
Undetected bottlenecks that auto-scaling cannot resolve (e.g. database limits).
Manual overrides that bypass scaling logic and introduce drift.
Lack of visibility into how and why scaling decisions are made.
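
Drift from manual overrides can be surfaced by comparing the scaling settings declared in infrastructure code with what is actually running. The sketch below assumes both have already been loaded into plain dictionaries; `find_drift`, the service names, and the field names are hypothetical, and fetching the live values from a real platform is out of scope here.

```python
def find_drift(declared, live):
    """Return {service: {field: (declared_value, live_value)}} for every mismatch."""
    drift = {}
    for service, want in declared.items():
        have = live.get(service, {})
        # A field missing from the live settings shows up as a live value of None.
        diffs = {field: (value, have.get(field))
                 for field, value in want.items()
                 if have.get(field) != value}
        if diffs:
            drift[service] = diffs
    return drift


# Assumed example data: "search" has had its minimum raised by hand.
declared = {"checkout": {"min": 2, "max": 10}, "search": {"min": 1, "max": 5}}
live = {"checkout": {"min": 2, "max": 10}, "search": {"min": 4, "max": 5}}
print(find_drift(declared, live))
```

Running a check like this on a schedule, and logging the result, gives teams a record of when and where scaling settings diverged from the versioned policy.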

5. Signals of Success

Services maintain performance during traffic spikes without manual intervention.
Infrastructure costs track actual load and usage rather than peak provisioning.
Scaling activity is visible, predictable, and tuned over time.
Auto-scaling is integrated into release planning and infrastructure design.
Teams design systems with elasticity as a core principle.