Model routing

A pattern that routes each request to the cheapest model capable of handling it, escalating to more powerful models only when necessary — typically via a small classifier, confidence-based escalation, or response evaluation.
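The confidence-based escalation variant can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes each model is a callable that returns an answer together with a confidence score in [0, 1]. The names `Tier`, `route`, `small_model`, `large_model`, the costs, and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumption: a model is any callable returning (answer, confidence in [0, 1]).
ModelFn = Callable[[str], Tuple[str, float]]


@dataclass
class Tier:
    name: str
    cost_per_call: float  # illustrative unit cost, not a real price
    model: ModelFn


def route(query: str, tiers: List[Tier], threshold: float = 0.8):
    """Try tiers cheapest-first and escalate while confidence is below threshold.

    Returns (answer, name of the tier that served it, total cost incurred).
    If no tier reaches the threshold, the most expensive tier's answer is used.
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    total_cost = 0.0
    for tier in sorted(tiers, key=lambda t: t.cost_per_call):
        answer, confidence = tier.model(query)
        total_cost += tier.cost_per_call  # every attempted tier is paid for
        if confidence >= threshold:
            break
    return answer, tier.name, total_cost


# Stub models for illustration only: the small one is confident on short queries.
def small_model(q: str) -> Tuple[str, float]:
    return "small:" + q, (0.9 if len(q) < 20 else 0.4)


def large_model(q: str) -> Tuple[str, float]:
    return "large:" + q, 0.95


TIERS = [Tier("large", 1.0, large_model), Tier("small", 0.1, small_model)]
```

Note that an escalated request pays for every tier it passed through, so a cascade only saves money when most requests stop at a cheap tier.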

What this means in practice

Routing raises governance concerns: reproducibility (the same query may be served by different models), audit-trail completeness (each response should be traceable to the model that produced it), and fairness (escalation must not correlate with protected attributes).
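One way to support the audit-trail concern is to log every routing decision as a structured record. The sketch below is an assumption about what such a record might contain; the function name `audit_record` and its field names are hypothetical and not taken from any standard.

```python
import hashlib
import json
from typing import List


def audit_record(query: str, escalation_path: List[str], router_version: str) -> str:
    """Serialize one routing decision as a JSON line.

    escalation_path lists the models tried, in order; the last one served
    the response. The query is stored as a hash so the log can be joined
    with other records without holding the raw text.
    """
    if not escalation_path:
        raise ValueError("escalation_path must name at least one model")
    record = {
        "query_sha256": hashlib.sha256(query.encode("utf-8")).hexdigest(),
        "router_version": router_version,  # pins the routing logic for replay
        "escalation_path": escalation_path,
        "served_by": escalation_path[-1],
        "escalated": len(escalation_path) > 1,
    }
    return json.dumps(record, sort_keys=True)
```

Recording the router version alongside the path makes it possible to explain why the same query reached different models at different times, and the `escalated` flag gives a starting point for checking escalation rates across user groups offline.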

Synonyms

request routing, cascade routing, model cascade

See also

  • Serving pattern — The architectural shape of the inference path — managed API, cloud-platform hosted, self-hosted online, self-hosted batch, or edge.
  • Model selection framework — An eight-criterion decision framework — capability, cost, latency, data residency, customization, operational maturity, exit cost, and license — for choosing a foundation model for a given use case.
  • Prompt caching — An inference optimisation that caches the attention key-value state for a prompt prefix so that subsequent requests sharing the same prefix skip re-processing.