Inference
Inference is the process of using a trained AI model to make predictions or generate outputs on new, previously unseen data.
What this means in practice
While training is a computationally intensive one-time (or periodic) activity, inference happens continuously whenever the deployed model processes a request: scoring a transaction, generating a response, or classifying an image. Because these are different workload profiles, understanding the distinction between training and inference helps transformation leaders allocate budgets correctly and design infrastructure appropriate for each.
Why it matters
Inference costs are a significant and often underestimated component of AI economics. While training is periodic, inference happens continuously every time a deployed model processes a request. Cloud-based inference for LLMs can cost dollars per thousand requests, and high-volume applications generate substantial monthly bills. Organizations that budget only for training without accounting for ongoing inference costs face recurring budget surprises.
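The budgeting point above can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the function name and all figures (requests per day, price per thousand requests) are hypothetical placeholders, not vendor pricing.

```python
# Illustrative sketch of monthly inference spend.
# All numbers are hypothetical, not actual vendor pricing.

def monthly_inference_cost(requests_per_day: float,
                           price_per_1k_requests: float,
                           days_per_month: int = 30) -> float:
    """Estimated monthly inference cost in dollars."""
    return requests_per_day * days_per_month * price_per_1k_requests / 1000

# Example: 200,000 requests/day at a hypothetical $2 per 1,000 requests
cost = monthly_inference_cost(200_000, 2.00)
print(f"${cost:,.0f}/month")  # prints $12,000/month
```

Even modest per-request prices compound quickly at production volumes, which is why inference belongs in the recurring operating budget rather than the one-time project budget.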
How COMPEL uses it
Understanding inference economics is essential during the Model stage when building business cases for AI use cases. COMPEL's AI FinOps practices include monitoring inference costs as a key operational metric during Produce. The Evaluate stage tracks inference cost efficiency as part of the value realization framework, ensuring that the cost of generating predictions remains proportionate to the business value they deliver.