Simulation harness

A virtual environment for evaluating an agent without production side effects, built from mock tools, synthetic data, and deterministic scenarios.

What this means in practice

A simulation harness allows safe red-teaming, regression testing, and capability evaluation before a candidate agent reaches shadow or production traffic.
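
As an illustration only, the three ingredients named in the definition (mock tools, synthetic data, deterministic scenarios) can be sketched in a few lines of Python. Every name here (`MockTool`, `Scenario`, `synthetic_orders`, `toy_agent`, `run_scenario`) is hypothetical and invented for this example; it is not the API of any particular framework, and a real harness would wrap an actual LLM-backed agent rather than the toy function used here.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class MockTool:
    """Hypothetical stand-in for a real tool: records calls, returns canned data."""
    name: str
    handler: Callable[[dict], dict]
    calls: List[dict] = field(default_factory=list)

    def __call__(self, args: dict) -> dict:
        self.calls.append(args)  # record the call instead of touching production
        return self.handler(args)


@dataclass
class Scenario:
    """A deterministic test case: fixed seed, fixed request, tools that must not fire."""
    name: str
    seed: int
    user_request: str
    forbidden_tools: Tuple[str, ...] = ()


def synthetic_orders(seed: int, n: int = 3) -> List[dict]:
    """Generate fake order records; the seeded RNG makes every run identical."""
    rng = random.Random(seed)
    return [{"order_id": i, "amount": rng.randint(10, 500)} for i in range(n)]


def toy_agent(request: str, tools: Dict[str, MockTool]) -> str:
    """Placeholder for the candidate agent under evaluation (assumption, not a real agent)."""
    orders = tools["list_orders"]({})["orders"]
    if "refund" in request.lower():
        largest = max(orders, key=lambda o: o["amount"])
        tools["issue_refund"]({"order_id": largest["order_id"]})
        return f"refunded order {largest['order_id']}"
    return f"found {len(orders)} orders"


def run_scenario(scenario: Scenario, agent: Callable[[str, Dict[str, MockTool]], str]) -> dict:
    """Run one scenario against mock tools and report output, calls, and violations."""
    orders = synthetic_orders(scenario.seed)
    tools = {
        "list_orders": MockTool("list_orders", lambda args: {"orders": orders}),
        "issue_refund": MockTool("issue_refund", lambda args: {"status": "ok"}),
    }
    output = agent(scenario.user_request, tools)
    violations = [name for name in scenario.forbidden_tools if tools[name].calls]
    return {
        "scenario": scenario.name,
        "output": output,
        "tool_calls": {name: list(tool.calls) for name, tool in tools.items()},
        "violations": violations,
    }


if __name__ == "__main__":
    lookup = Scenario("read-only lookup", seed=7, user_request="Show my orders",
                      forbidden_tools=("issue_refund",))
    print(run_scenario(lookup, toy_agent))
```

Because the tools only record calls and the data comes from a seeded generator, rerunning a scenario yields the same report. That repeatability is what makes the same scenarios usable for regression tests, and the `forbidden_tools` check is one simple way a red-team scenario could flag an unwanted action without any real side effect.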

Synonyms

agent simulation harness, sandbox evaluation harness

See also

  • Evaluation harness — The infrastructure that runs capability, regression, safety, and human-review evaluations on an LLM feature on a defined cadence.
  • Shadow traffic — A deployment pattern in which a new model or prompt version receives a copy of live traffic and produces outputs that are captured for evaluation but not returned to users.
  • Red-team experiment — An adversarial experiment designed to probe failure modes rather than validate desired behavior — structured, hypothesis-driven exploration of safety bypass, goal mis-specification, jailbreak, and harm.

Related articles in the Body of Knowledge