Data Lineage

Data lineage is the documented, traceable history of a piece of data as it moves through an organization's systems, recording where it originated, how it was collected, what transformations were applied, where it was stored, who accessed it, and how it was ultimately used in AI models or business processes.
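As a rough illustration of what such a history might look like in code, the sketch below models each step as one record. The class name `LineageEvent`, its fields, and the dataset names are all hypothetical, chosen to mirror the definition above (origin, transformation, access, use); they do not come from COMPEL or from any particular lineage tool.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LineageEvent:
    """One step in a dataset's history. All field names are illustrative."""
    dataset: str        # dataset the event applies to
    action: str         # e.g. "collected", "transformed", "stored", "accessed", "used"
    actor: str          # system or person responsible for the step
    at: datetime        # when the step happened
    detail: str = ""    # free-text description of what was done
    inputs: tuple = ()  # names of upstream datasets, if any


def history(events, dataset):
    """Return the events recorded for one dataset, oldest first."""
    return sorted((e for e in events if e.dataset == dataset), key=lambda e: e.at)


# Hypothetical example data, deliberately listed out of order.
events = [
    LineageEvent("customer_features", "transformed", "etl_job",
                 datetime(2024, 1, 2, tzinfo=timezone.utc),
                 "joined and deduplicated", ("crm_export",)),
    LineageEvent("crm_export", "collected", "crm_connector",
                 datetime(2024, 1, 1, tzinfo=timezone.utc), "nightly export"),
    LineageEvent("customer_features", "used", "churn_model_training",
                 datetime(2024, 1, 3, tzinfo=timezone.utc), "training input"),
]

for event in history(events, "customer_features"):
    print(event.at.date(), event.action, event.actor)
```

Making the record immutable (`frozen=True`) reflects the idea that lineage is an append-only history: new steps are added as new events rather than by editing old ones.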

What this means in practice

For AI systems, data lineage serves three purposes: debugging model issues (tracing unexpected predictions back to specific data sources), regulatory compliance (demonstrating that data was collected and used lawfully), and governance (ensuring training data meets quality and consent requirements).

Why it matters

When an AI model produces an unexpected prediction, data lineage enables teams to trace the problem back to its source, whether that is a corrupted data feed, an upstream transformation error, or a training data quality issue. Without lineage, debugging AI systems becomes guesswork. Regulators increasingly require demonstrable data provenance as evidence that AI decisions are based on lawful, properly handled information.
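Tracing a problem back to its source can be pictured as walking a graph of "derived from" relationships. The sketch below is a minimal example under assumed names: the `UPSTREAM` mapping, the dataset names, and the `trace_upstream` helper are all hypothetical, and a real lineage system would read these relationships from its own metadata store.

```python
from collections import deque

# Hypothetical lineage graph: each artifact maps to the datasets it was derived from.
UPSTREAM = {
    "churn_model_v3": ["training_set"],
    "training_set": ["cleaned_events", "crm_export"],
    "cleaned_events": ["raw_event_feed"],
    "crm_export": [],
    "raw_event_feed": [],
}


def trace_upstream(graph, start):
    """Breadth-first walk returning every ancestor of `start`, nearest first."""
    seen = []
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        # Skip nodes already visited, and the start node itself if the graph has a cycle.
        if node in seen or node == start:
            continue
        seen.append(node)
        queue.extend(graph.get(node, []))
    return seen


# Candidate sources to inspect when the model misbehaves.
suspects = trace_upstream(UPSTREAM, "churn_model_v3")
print(suspects)
```

Returning ancestors nearest first gives an investigation order: check the training set, then the transformations feeding it, then the raw feeds.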

How COMPEL uses it

Data lineage capability is assessed during Calibrate under both the Technology pillar (infrastructure for tracking) and the Governance pillar (policies requiring tracking). During Model, lineage infrastructure is designed as part of the data architecture specified in Module 3.3. The Produce stage implements lineage tooling, and the Evaluate stage uses lineage records as audit evidence to verify governance compliance.
