
AI experiment

A structured comparison that produces evidence for a decision about a model version, a prompt, a feature set, a retrieval strategy, or a deployment change.

What this means in practice

An AI experiment spans four modes: offline (static data), online (live traffic), shadow (production traffic, with outputs not exposed to users), and adversarial (structured probing for failure modes). It is distinct from general software experimentation because AI experiments must account for statistical variance, data leakage, and distribution shift.
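
As a concrete illustration, the sketch below compares two prompt variants on a static dataset (an offline experiment) and uses a paired bootstrap to keep per-example variance from masking the decision. The scores here are synthetic stand-ins; in a real experiment each list would hold per-example metric values produced by an evaluator.

```python
import random
import statistics

# Hypothetical per-example scores for two prompt variants on the same
# static dataset. In practice these come from running each variant
# through an evaluator over the hold-out set.
scores_a = [random.gauss(0.72, 0.10) for _ in range(200)]
scores_b = [random.gauss(0.75, 0.10) for _ in range(200)]

def bootstrap_ci(deltas, n_resamples=10_000, alpha=0.05):
    """Bootstrap confidence interval for the mean paired difference."""
    means = []
    for _ in range(n_resamples):
        sample = random.choices(deltas, k=len(deltas))
        means.append(statistics.fmean(sample))
    means.sort()
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples)]
    return lo, hi

# Paired differences remove shared per-example difficulty from the
# comparison, which tightens the interval.
deltas = [b - a for a, b in zip(scores_a, scores_b)]
lo, hi = bootstrap_ci(deltas)
print(f"mean delta = {statistics.fmean(deltas):.3f}, "
      f"95% CI = ({lo:.3f}, {hi:.3f})")
if lo > 0:
    print("Variant B wins: the interval excludes zero.")
else:
    print("No decision: the interval includes zero; gather more data.")
```

The paired design matters: comparing unpaired means on the same dataset discards the correlation between variants and makes the experiment needlessly noisy.
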

Synonyms

ML experiment, model experiment

See also

  • Offline evaluation — Assessment of an AI system against static datasets — training hold-out, validation set, benchmark corpus — without exposure to live user traffic.
  • Online evaluation — Assessment of an AI system under live traffic using randomized or sequential experimental designs — A/B test, multi-armed bandit, canary, or interleaving (see the assignment sketch after this list).
  • Red-team experiment — An adversarial experiment designed to probe failure modes rather than validate desired behavior — structured, hypothesis-driven exploration of safety bypass, goal mis-specification, jailbreak, and harm.
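
For the online case, the core mechanical requirement is stable randomization: each user must land in the same arm on every request, or effects leak across arms. A minimal sketch, assuming a hypothetical experiment name and a 50/50 split:

```python
import hashlib

def assign_arm(user_id: str, experiment: str = "prompt-v2-rollout",
               treatment_share: float = 0.5) -> str:
    """Hash user and experiment IDs into a stable bucket in [0, 1)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 2**32
    return "treatment" if bucket < treatment_share else "control"

# The same user always lands in the same arm for a given experiment,
# while different experiments bucket the same user independently.
print(assign_arm("user-42"))  # deterministic across runs
print(assign_arm("user-42"))  # same arm again
```

Keying the hash on both the experiment name and the user ID is what keeps concurrent experiments independent of one another.
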

Related articles in the Body of Knowledge