Load Balancing

Load balancing distributes incoming requests across multiple servers or model instances to prevent overload, ensuring consistent performance and high availability.

What this means in practice

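For AI systems in production, load balancing is essential because inference requests arrive in unpredictable bursts. Common strategies include round-robin (servers take requests in turn), least-connections (each request goes to the server with the fewest active requests), and weighted algorithms (servers with more capacity receive a proportionally larger share of requests). In COMPEL, load balancing is part of scalability architecture in Module 3.3, Article 6.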
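
To make the three strategies concrete, here is a minimal Python sketch. It is illustrative only and is not taken from COMPEL: the `Backend` class, the selector class names, and the `pick()` method are hypothetical, and the weighted variant shown is a naive weighted round-robin, which is only one of several possible weighted schemes.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Backend:
    """Hypothetical stand-in for a server or model instance."""
    name: str
    weight: int = 1   # relative capacity, used by the weighted selector
    active: int = 0   # number of in-flight requests


class RoundRobin:
    """Hand out backends in a fixed rotation."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(list(backends))

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Pick the backend with the fewest in-flight requests."""

    def __init__(self, backends):
        self._backends = list(backends)

    def pick(self):
        # min() returns the first backend on ties.
        return min(self._backends, key=lambda b: b.active)


class WeightedRoundRobin:
    """Naive weighted rotation: each backend appears `weight` times per cycle."""

    def __init__(self, backends):
        expanded = [b for b in backends for _ in range(b.weight)]
        self._cycle = itertools.cycle(expanded)

    def pick(self):
        return next(self._cycle)


if __name__ == "__main__":
    pool = [Backend("gpu-a", weight=2), Backend("gpu-b", weight=1)]
    wrr = WeightedRoundRobin(pool)
    print([wrr.pick().name for _ in range(6)])
```

In this sketch the caller is responsible for incrementing `active` when a request starts and decrementing it when the request finishes. A production balancer would also need health checks and thread safety, both of which are omitted here.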
