Position bias (judge)

The systematic tendency of an LLM-as-judge to favour responses in a particular position (first, second, or last) when comparing candidates — independent of content quality.

What this means in practice

Documented systematically in Zheng et al. 2023. Common mitigations include randomised ordering of candidates, dual-sided judging (presenting the pair in both orders and accepting only verdicts that agree across the swap), and calibration against human raters.
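The dual-sided judging mitigation can be sketched as follows. This is a minimal illustration, not a reference implementation: `judge_prefers` is a hypothetical stand-in for a real LLM-as-judge call (here a toy length heuristic so the sketch runs), and the consistency rule simply demotes order-dependent verdicts to a tie.

```python
def judge_prefers(response_a: str, response_b: str) -> str:
    # Stand-in for an LLM-as-judge call that returns "A" or "B".
    # Toy heuristic (prefers the longer response) keeps the sketch runnable.
    return "A" if len(response_a) >= len(response_b) else "B"

def debiased_compare(resp_1: str, resp_2: str) -> str:
    """Dual-sided judging: ask in both orders; keep the verdict only
    if it is consistent across the position swap, else declare a tie."""
    first = judge_prefers(resp_1, resp_2)   # resp_1 shown in position A
    second = judge_prefers(resp_2, resp_1)  # order swapped
    if first == "A" and second == "B":
        return "resp_1"  # resp_1 wins regardless of position
    if first == "B" and second == "A":
        return "resp_2"  # resp_2 wins regardless of position
    return "tie"         # verdict flipped with position: likely bias

print(debiased_compare("a short answer", "a much longer, detailed answer"))
```

A judge whose preference tracks content will give mirrored verdicts across the swap; one whose preference tracks position will contradict itself, and those comparisons are discarded as ties rather than counted as wins.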

Synonyms

judge position bias, LLM-judge position bias

See also

  • LLM-as-judge — An evaluation technique using a large language model to score outputs from another LLM on quality dimensions — helpfulness, correctness, safety — scaling evaluation beyond human-rater capacity.
  • Evaluation harness — The infrastructure that runs capability, regression, safety, and human-review evaluations on an LLM feature on a defined cadence.
  • Golden dataset — A versioned, labeled, license-cleared evaluation dataset used as the benchmark reference for an AI feature.