
COMPEL Glossary

Training Data

Training data is the dataset used to teach a machine learning model the patterns it needs to make predictions or generate outputs.

What this means in practice

The quality, representativeness, size, and governance of training data directly determine how well the model performs and whether it behaves fairly across different populations. Biased training data produces biased models. Incomplete training data produces models that fail on underrepresented scenarios. In the COMPEL framework, training data governance is a critical component of both the Data Management domain (Domain 6) and the AI Ethics domain (Domain 15). Organizations must document training data provenance, assess its representativeness, obtain appropriate consent for its use, and monitor for bias. Regulations such as the EU AI Act increasingly mandate requirements of this kind.
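Assessing representativeness usually means comparing how often each group appears in the training data with how often it appears in the population the model will serve. The sketch below assumes each training record carries a group label and that a reference distribution is available (for example, from census or customer data). The function names, the example data, and the 0.8 cutoff are hypothetical choices for illustration, not values COMPEL prescribes.

```python
from collections import Counter


def representation_ratios(samples, group_key, reference_shares):
    """Return each group's share in the training data divided by its reference share.

    A ratio of 1.0 means the group appears at the same rate as in the reference
    population; values well below 1.0 indicate under-representation.
    Assumes every reference share is positive. Groups present in the data but
    absent from the reference are ignored.
    """
    counts = Counter(sample[group_key] for sample in samples)
    total = sum(counts.values())
    ratios = {}
    for group, expected_share in reference_shares.items():
        observed_share = counts.get(group, 0) / total if total else 0.0
        ratios[group] = observed_share / expected_share
    return ratios


def underrepresented_groups(ratios, threshold=0.8):
    """Return, sorted, the groups whose ratio falls below a (hypothetical) threshold."""
    return sorted(group for group, ratio in ratios.items() if ratio < threshold)


if __name__ == "__main__":
    # Hypothetical training records, each tagged with a region attribute.
    training_records = (
        [{"region": "north"}] * 60
        + [{"region": "south"}] * 30
        + [{"region": "east"}] * 10
    )
    # Hypothetical reference distribution for the population the model will serve.
    reference = {"north": 0.4, "south": 0.3, "east": 0.3}

    ratios = representation_ratios(training_records, "region", reference)
    print(ratios)
    print(underrepresented_groups(ratios))
```

Here "east" makes up 10% of the records but 30% of the reference population, so it is the group flagged for follow-up.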

Why it matters

A model can only learn what its training data contains, so gaps and skews in that data surface later as poor performance and unfair outcomes. As regulations like the EU AI Act increasingly mandate training data documentation and bias assessment, organizations without robust training data governance face both performance risk and compliance risk.
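One common way to assess whether a model has inherited bias is to compare its positive-prediction ("selection") rates across groups, a metric known as demographic parity difference. It is only one of several fairness metrics, chosen here for illustration; COMPEL does not specify which metric to use. The sketch assumes binary predictions (1 = positive) and a single group label per example; the function names and example data are hypothetical. A large gap flags a model for investigation rather than proving that the training data is the cause.

```python
from collections import defaultdict


def selection_rates(predictions, groups):
    """Return the fraction of positive (1) predictions for each group."""
    if len(predictions) != len(groups):
        raise ValueError("predictions and groups must have the same length")
    positives = defaultdict(int)
    totals = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        if pred == 1:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_difference(predictions, groups):
    """Return the largest gap in selection rate between any two groups.

    0.0 means every group receives positive predictions at the same rate.
    """
    rates = selection_rates(predictions, groups)
    if not rates:
        raise ValueError("no predictions to evaluate")
    return max(rates.values()) - min(rates.values())


if __name__ == "__main__":
    # Hypothetical binary predictions on a held-out set, paired with group labels.
    preds = [1, 1, 0, 1, 0, 0, 0, 1]
    groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
    print(selection_rates(preds, groups))
    print(demographic_parity_difference(preds, groups))
```

In this example group "a" receives positive predictions 75% of the time and group "b" 25% of the time, giving a gap of 0.5.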

How COMPEL uses it

Training data governance is a critical component of both Domain 6 (Data Management and Quality) and Domain 15 (AI Ethics) in the COMPEL maturity model. During Calibrate, training data assets are inventoried and assessed for quality and representativeness. The Model stage designs data governance procedures, and the Produce stage implements bias testing. The Governance pillar requires documented data provenance and consent tracking.
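The inventory taken during Calibrate, checked against the Governance pillar's provenance and consent requirements, can be sketched as one record per dataset plus a gap check. This is a minimal sketch: the `TrainingDataAsset` class, its fields, and `governance_gaps` are hypothetical names rather than a schema COMPEL defines, and a real inventory would more likely live in a data catalog than in memory.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class TrainingDataAsset:
    """One entry in a hypothetical training-data inventory."""

    name: str
    source: Optional[str] = None         # provenance: where the data came from
    collected_on: Optional[date] = None  # provenance: when it was collected
    consent_basis: Optional[str] = None  # basis for using the data, e.g. user consent
    bias_tested: bool = False            # whether bias testing has been run


def governance_gaps(asset: TrainingDataAsset) -> List[str]:
    """List the governance checks this asset does not yet satisfy."""
    gaps = []
    if not asset.source:
        gaps.append("missing provenance source")
    if asset.collected_on is None:
        gaps.append("missing collection date")
    if not asset.consent_basis:
        gaps.append("missing consent basis")
    if not asset.bias_tested:
        gaps.append("bias testing not run")
    return gaps


if __name__ == "__main__":
    # Hypothetical inventory entries.
    inventory = [
        TrainingDataAsset(
            name="support_tickets_2023",
            source="internal ticketing system export",
            collected_on=date(2023, 12, 31),
            consent_basis="customer terms of service",
            bias_tested=True,
        ),
        TrainingDataAsset(name="scraped_reviews", source="web crawl"),
    ]
    for asset in inventory:
        print(asset.name, governance_gaps(asset) or "no gaps found")
```

Keeping the checks as plain functions over a record lets the same inventory feed the Calibrate assessment and, later, confirm that bias testing has been recorded once the Produce stage runs it.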
