Training data memorization

Verbatim or near-verbatim reproduction of training data by a model during inference.

What this means in practice

The risk is most severe when the training data contains personal data, trade secrets, or copyrighted material, and it forms the core of OWASP LLM02 (sensitive information disclosure) exposures.
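One simple way to picture "verbatim or near-verbatim reproduction" is as a long run of words shared between a model's output and a training document. The sketch below is illustrative only and is not a method described in this glossary: the function names `word_ngrams` and `memorized_spans`, the 8-word window, and the sample record are all assumptions chosen for the example. It also assumes the training documents are available for comparison, which is often not the case in practice.

```python
def word_ngrams(text, n):
    """Return the set of n-word windows in text (lowercased, split on whitespace)."""
    words = text.lower().split()
    # If the text has fewer than n words, the range is empty and so is the set.
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def memorized_spans(output, training_docs, n=8):
    """Return the n-word windows of `output` that appear verbatim in any training document.

    The window size n is a hypothetical threshold: larger values reduce false
    positives from common phrases but miss shorter leaks.
    """
    out_grams = word_ngrams(output, n)
    hits = set()
    for doc in training_docs:
        hits |= out_grams & word_ngrams(doc, n)
    return hits


if __name__ == "__main__":
    # Entirely fictitious record, used only to illustrate the check.
    corpus = [
        "Patient record: Jane Roe, born 1 January 1970, "
        "account number 0000 1111 2222, lives in Springfield."
    ]
    reply = (
        "Sure. Jane Roe, born 1 January 1970, "
        "account number 0000 1111 2222, as requested."
    )
    for span in sorted(memorized_spans(reply, corpus)):
        print("possible memorized span:", span)
```

Exact matching of this kind catches only verbatim overlap. Near-verbatim reproduction (reordered fields, changed punctuation) would need fuzzier matching, which this sketch does not attempt.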

Synonyms

training data extraction, data regurgitation, LLM02 memorization

See also

  • LLM risk surface — The union of six interacting layers — input, model, output, retrieval, tool, and data — where governance controls must be applied on any LLM-based feature.
  • Confabulation — NIST's preferred term for hallucination: an LLM generating fluent output that is unsupported by ground truth.
  • System prompt leakage — Extraction of an LLM feature's hidden system prompt and structural instructions through crafted user input.