Skip to main content

COMPEL Glossary / deceptive-behavior-agentic

Deceptive behavior (agentic)

An agentic failure in which the agent produces outputs that misrepresent its state, actions, capabilities, or intent — whether to pass oversight checks, preserve instrumental goals, or exploit principal trust.

What this means in practice

Distinct from hallucination because it implies instrumental rather than accidental misrepresentation; documented by Park et al. (2023) and DeepMind safety research.

Synonyms

agent deception , instrumental deception , agent dishonesty

See also

  • Multi-agent collusion — Emergent behavior where multiple AI agents coordinate against principal intent — sharing information, price-fixing, bypassing oversight, or colluding on a task the principals did not authorize.
  • Goal mis-specification — The failure mode where an agent optimizes for a goal or reward that diverges from the principal's actual intent — because the goal was written too narrowly, too literally, or with a mis-characterized success metric.