Offline evaluation

Assessment of an AI system against static datasets — training hold-out, validation set, benchmark corpus — without exposure to live user traffic.
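To make the definition concrete, here is a minimal held-out evaluation sketch in Python, assuming scikit-learn is available; the synthetic dataset, logistic-regression model, and accuracy metric are illustrative stand-ins, not part of the definition.

    # Minimal held-out evaluation: train on one split, score on another,
    # with no live traffic involved. Dataset, model, and metric are
    # illustrative choices.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2_000, random_state=0)

    # Static split: the held-out portion stands in for unseen traffic.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )

    model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)

    # Offline metric computed entirely against the static test set.
    preds = model.predict(X_test)
    print(f"held-out accuracy: {accuracy_score(y_test, preds):.3f}")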

What this means in practice

Offline evaluation is required before any online rollout because it catches catastrophic regressions cheaply, but offline-only signals do not reliably predict online behavior.
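As a sketch of what such a pre-rollout check can look like, the hypothetical gate below compares a candidate's offline score against the current baseline and blocks rollout on a large regression; the function name, scores, and tolerance are all assumptions for illustration.

    # Hypothetical offline gate: compare candidate vs. baseline on the same
    # static test set and block rollout on a large regression. The names and
    # the 0.02 tolerance are illustrative, not a COMPEL requirement.
    def offline_gate(baseline_score: float, candidate_score: float,
                     max_regression: float = 0.02) -> bool:
        """Return True if the candidate may proceed to online evaluation."""
        return candidate_score >= baseline_score - max_regression

    if not offline_gate(baseline_score=0.91, candidate_score=0.84):
        raise SystemExit("Catastrophic offline regression: do not roll out.")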

Synonyms

offline test, batch evaluation, held-out evaluation

See also

  • Online evaluation — Assessment of an AI system under live traffic using randomized or sequential experimental designs — A/B test, multi-armed bandit, canary, or interleaving.
  • Data leakage — Information from the test or validation set inadvertently entering training — through preprocessing, feature engineering, target encoding, or time-ordered splits — inflating offline metrics and producing over-optimistic ship decisions (see the sketch after this list).
  • AI experiment — A structured comparison producing evidence for a decision — about a model version, a prompt, a feature set, a retrieval strategy, or a deployment change.
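Because data leakage is the most common reason inflated offline metrics mislead ship decisions, here is a sketch, again assuming scikit-learn, contrasting a leaky preprocessing flow (scaler fitted on the full dataset before splitting) with a correct one (scaler fitted inside a pipeline on training data only); the dataset and model are illustrative.

    # Contrast a leaky preprocessing flow with a correct one; the scaler and
    # model are illustrative, and the point is where statistics are fitted.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=2_000, random_state=0)

    # LEAKY: scaler statistics are computed on the full dataset before the
    # split, so test-set information reaches training and can inflate the
    # offline metric.
    X_leaky = StandardScaler().fit_transform(X)
    Xl_tr, Xl_te, yl_tr, yl_te = train_test_split(X_leaky, y, random_state=0)
    leaky = LogisticRegression(max_iter=1_000).fit(Xl_tr, yl_tr)
    print(f"leaky offline accuracy: {leaky.score(Xl_te, yl_te):.3f}")

    # CORRECT: split first; the pipeline fits the scaler on training data
    # only, then applies that frozen transform to the held-out set.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    clean = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1_000))
    clean.fit(X_tr, y_tr)
    print(f"clean offline accuracy: {clean.score(X_te, y_te):.3f}")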