Load Balancing
Load balancing distributes incoming requests across multiple servers or model instances to prevent overload, ensuring consistent performance and high availability.
What this means in practice
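For AI systems in production, load balancing is essential because inference requests arrive in unpredictable bursts. Common strategies include round-robin (each server takes the next request in turn), least-connections (the server with the fewest active requests takes the next one), and weighted algorithms (servers with more capacity receive a larger share of traffic). In COMPEL, load balancing is covered as part of scalability architecture in Module 3.3, Article 6.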
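To make the three strategies concrete, here is a minimal Python sketch of each selection rule. It is illustrative only: the `Backend` class, the `pick` method, and the instance names are hypothetical and do not come from COMPEL or from any particular load balancer, and the weighted variant shown is a weighted random choice, which is one of several ways to implement weighting.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class Backend:
    """Hypothetical model instance behind the balancer."""
    name: str
    weight: int = 1   # relative capacity, used by the weighted strategy
    active: int = 0   # in-flight requests, used by least-connections


class RoundRobin:
    """Hand each request to the next backend in a fixed rotation."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(list(backends))

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Pick the backend with the fewest in-flight requests."""

    def __init__(self, backends):
        self.backends = list(backends)

    def pick(self):
        # Ties go to the backend listed first.
        return min(self.backends, key=lambda b: b.active)


class WeightedRandom:
    """Pick a backend at random, in proportion to its weight."""

    def __init__(self, backends, seed=None):
        self.backends = list(backends)
        self._rng = random.Random(seed)

    def pick(self):
        weights = [b.weight for b in self.backends]
        return self._rng.choices(self.backends, weights=weights, k=1)[0]


if __name__ == "__main__":
    pool = [Backend("gpu-a", weight=3), Backend("gpu-b"), Backend("gpu-c")]
    rr = RoundRobin(pool)
    print([rr.pick().name for _ in range(4)])
```

Round-robin ignores load entirely, so it suits requests of similar cost. Least-connections adapts when some inference requests run much longer than others, provided the caller keeps `active` up to date. Weighting helps when instances differ in capacity, for example mixed GPU types.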