Datasheet for datasets

A structured documentation artifact for a dataset, covering its motivation, composition, collection process, preprocessing, uses, distribution, and maintenance. The format is modeled after the datasheets that accompany electronic components.
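As a rough illustration, the seven sections named above can be captured in a small record type with a completeness check. This is a minimal sketch, not a standard schema: the class name `Datasheet`, its field names, and the `missing_sections` helper are assumptions made for this example.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Datasheet:
    """Hypothetical container with one free-text field per datasheet section."""
    motivation: str = ""
    composition: str = ""
    collection_process: str = ""
    preprocessing: str = ""
    uses: str = ""
    distribution: str = ""
    maintenance: str = ""

    def missing_sections(self) -> List[str]:
        """Return the names of sections that are still empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Illustrative, made-up content for a partially completed datasheet.
sheet = Datasheet(
    motivation="Benchmark for intent classification in support tickets.",
    composition="Short text records, one label per record.",
)
print(sheet.missing_sections())
```

A check like this could run in a review step so that a dataset is not published while any section is blank; what counts as an adequate answer in each section remains a human judgment.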

What this means in practice

A datasheet supports informed reuse of a dataset by downstream AI teams and regulators. Since Gebru et al. (2018) proposed the format, it has been widely adopted in industry (Hugging Face Dataset Cards, Google Model Cards dataset sections, Meta Research).

Synonyms

datasheet, dataset datasheet, Gebru datasheet

See also

  • Provenance — The record of origin and custody for a data asset — who collected it, from whom, under what legal basis, and through which hands it passed — required for auditability of high-risk AI under EU AI Act Article 10.
  • Data contract — A versioned, testable specification of a data product's schema, semantics, quality expectations, SLA, and change-management policy — published by the producer, consumable by downstream AI workloads.
  • Readiness scorecard — A structured, dimension-by-dimension artifact summarizing evidence, scores, remediation priorities, and owner assignments for a use-case-scoped data-readiness assessment.

Related articles in the Body of Knowledge