Auto-scaling
Auto-scaling is the automatic adjustment of computing resources, such as servers, containers, or GPU instances, based on real-time demand patterns.
What this means in practice
When an AI system experiences increased traffic, auto-scaling adds resources to maintain performance; when demand drops, it removes resources to minimize costs. For organizations running AI services in production, this elasticity keeps variable workloads affordable without sacrificing responsiveness. In COMPEL, auto-scaling is part of the scalability and performance architecture covered in Module 3.3, Article 6, where it is designed as a component of the enterprise AI platform during the Technology pillar assessment and implementation.
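The add-on-demand, remove-on-idle behavior described above is typically driven by a target-tracking rule, such as the replica-count formula used by Kubernetes' Horizontal Pod Autoscaler. A minimal sketch of that rule (the function name and the replica bounds are illustrative, not part of COMPEL):

```python
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 20) -> int:
    """Scale replicas so observed utilization converges toward the target.

    Follows the HPA-style rule: desired = ceil(current * observed / target),
    clamped to configured bounds.
    """
    raw = current_replicas * (current_metric / target_metric)
    return max(min_replicas, min(max_replicas, math.ceil(raw)))

# Business-hours burst: 8 replicas at 90% CPU against a 60% target -> 12
print(desired_replicas(8, 0.90, 0.60))  # -> 12
# Overnight lull: 8 replicas at 10% CPU -> scale in to 2
print(desired_replicas(8, 0.10, 0.60))  # -> 2
```

The clamp to minimum and maximum replica counts is what keeps the loop economically safe: it prevents both scale-to-zero outages and runaway cost during anomalous spikes.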
Why it matters
AI workloads often have highly variable demand patterns — burst inference during business hours, minimal traffic overnight, and unpredictable spikes during events. Without auto-scaling, organizations either over-provision infrastructure (wasting money) or under-provision it (degrading AI service quality). Auto-scaling ensures AI services maintain performance during demand spikes while minimizing costs during quiet periods, making AI operations economically sustainable at scale.
How COMPEL uses it
Auto-scaling is part of the Technology pillar's scalability and performance architecture: it is assessed during the Calibrate stage and designed during the Model stage as a component of the enterprise AI platform. During Produce, auto-scaling is implemented and configured for each AI service based on its demand profile. The Evaluate stage then monitors whether auto-scaling is maintaining service levels while optimizing costs, feeding its findings into AI FinOps analysis.