COMPEL Glossary / serving-pattern

Serving pattern

The architectural shape of the inference path — managed API, cloud-platform hosted, self-hosted online, self-hosted batch, or edge.

What this means in practice

Each pattern carries a characteristic cost profile, operational posture, data-residency footprint, and governance surface. Because those properties constrain everything downstream, the pattern should be selected before the build-vs-buy decision.

Synonyms

AI serving architecture, inference-path pattern

See also

  • Model selection framework — An eight-criterion decision framework — capability, cost, latency, data residency, customization, operational maturity, exit cost, and license — for choosing a foundation model for a given use case.
  • Model routing — A pattern that routes each request to the cheapest model capable of handling it, escalating to more powerful models only when necessary — typically via a small classifier, confidence-based escalation, or response evaluation.
  • Data residency (AI) — The requirement that training data, retrieval data, and inference itself occur within a specified jurisdiction.
  • TTFT (time-to-first-token) — The latency from request submission to the first streamed output token.
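The escalation logic described under Model routing above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the model names, relative costs, confidence scores, and the fixed threshold are all hypothetical stand-ins for real inference calls and evaluation logic.

```python
# Minimal model-routing sketch: try models cheapest-first and escalate
# to a more powerful model while the response confidence is too low.
# All names, costs, and confidence values below are illustrative only.
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Model:
    name: str
    cost_per_1k_tokens: float  # relative cost; cheaper models are tried first
    answer: Callable[[str], Tuple[str, float]]  # returns (response, confidence)


def route(prompt: str, ladder: List[Model], threshold: float = 0.8) -> Tuple[str, str]:
    """Return (response, model_name) from the cheapest sufficiently confident model."""
    response, served_by = "", ""
    for model in sorted(ladder, key=lambda m: m.cost_per_1k_tokens):
        response, confidence = model.answer(prompt)
        served_by = model.name
        if confidence >= threshold:
            break  # cheapest model that clears the confidence bar wins
    return response, served_by


# Stub answerers standing in for real inference endpoints.
small = Model("small", 0.1, lambda p: ("short answer", 0.55))
large = Model("large", 1.0, lambda p: ("detailed answer", 0.95))

text, model_used = route("Explain TTFT", [large, small])
```

In this sketch the small model answers first but falls below the threshold, so the request escalates and `model_used` ends up as `"large"`. Real routers typically replace the confidence stub with a small classifier or a response evaluator, as the glossary entry notes.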

Related articles in the Body of Knowledge