Reinforcement Learning from Human Feedback (RLHF)
Reinforcement Learning from Human Feedback (RLHF) is a technique for aligning large language model behavior with human preferences and safety requirements.
What this means in practice
In the RLHF process, human evaluators rate model outputs on quality, helpfulness, and safety. These ratings train a reward model that captures human preferences. The language model is then fine-tuned with reinforcement learning to produce outputs that score highly according to the reward model. RLHF is a key part of what makes LLMs helpful, harmless, and honest rather than mere predictors of likely text.

However, RLHF introduces governance challenges:

- Feedback quality and bias: if evaluators have narrow perspectives, the model inherits those biases.
- Reward hacking: the model may optimize for the reward signal rather than genuine quality.
- Value alignment stability: preferences encoded at one point in time may become stale as organizational values evolve.
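The reward-modeling step can be illustrated with a minimal sketch. The code below trains a toy linear reward model on pairwise human preferences using the Bradley-Terry objective, which is commonly used for this step; the feature vectors, pairs, and function names are illustrative assumptions, not part of any real LLM pipeline.

```python
import math

def reward(w, x):
    # Toy linear reward model: r(x) = w . x over hand-made feature vectors.
    return sum(wi * xi for wi, xi in zip(w, x))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_reward_model(pairs, dim, lr=0.1, epochs=200):
    """pairs: list of (chosen_features, rejected_features) from evaluators."""
    w = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            # Bradley-Terry: P(chosen preferred) = sigmoid(r_chosen - r_rejected)
            p = sigmoid(reward(w, chosen) - reward(w, rejected))
            # Gradient ascent on the log-likelihood of the human preference
            g = 1.0 - p
            for i in range(dim):
                w[i] += lr * g * (chosen[i] - rejected[i])
    return w

# Hypothetical data: feature 0 = "helpfulness", feature 1 = "verbosity".
# Evaluators here prefer helpful outputs regardless of verbosity.
pairs = [([1.0, 0.2], [0.1, 0.9]),
         ([0.9, 0.5], [0.2, 0.4]),
         ([0.8, 0.1], [0.3, 0.8])]
w = train_reward_model(pairs, dim=2)

# After training, the reward model should rank each chosen output
# above its rejected counterpart.
for chosen, rejected in pairs:
    assert reward(w, chosen) > reward(w, rejected)
```

In a real pipeline the linear model is replaced by a neural network scoring full model outputs, and the learned reward then serves as the optimization target for the RL fine-tuning step. Note the reward-hacking risk mentioned above: the policy is rewarded for whatever the learned model scores highly, not for genuine quality.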