Data Lake
A data lake is a centralized storage repository that ingests and holds large volumes of raw data in its original format, whether structured, semi-structured, or unstructured, until it is needed for analysis, reporting, or AI model training.
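As a rough illustration of "raw data in its original format", the sketch below lands heterogeneous files unchanged in a date-partitioned raw zone on a local filesystem. The directory layout, file names, and land_raw helper are hypothetical, and a production lake would more typically sit on object storage such as Amazon S3 or Azure Data Lake Storage.

```python
from datetime import date
from pathlib import Path
import shutil

LAKE_ROOT = Path("datalake/raw")  # hypothetical raw zone of the lake

def land_raw(source_file: Path, source_system: str) -> Path:
    """Copy a file into the raw zone as-is, preserving its original format."""
    target_dir = LAKE_ROOT / source_system / f"ingest_date={date.today():%Y-%m-%d}"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source_file.name
    shutil.copy2(source_file, target)  # no parsing, no schema enforcement at write time
    return target

# Structured, semi-structured, and unstructured inputs all land the same way.
for f in [Path("orders.csv"), Path("clickstream.json"), Path("product_photo.jpg")]:
    if f.exists():
        print("landed:", land_raw(f, source_system="web_shop"))
```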
What this means in practice
Data lakes give organizations the flexibility to store diverse data types cheaply and to run analyses that would be difficult in traditional structured databases. For organizations building AI capabilities, they supply the scalable storage that the large, diverse datasets of modern machine learning require. In COMPEL, data lake architecture is assessed as part of the Technology pillar during Calibrate, and the evolution toward data lakehouse architectures (combining lake and warehouse capabilities) is discussed in Module 3.3, Article 3 as a converging industry pattern.
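A minimal sketch of the schema-on-read side of this flexibility, assuming the hypothetical raw zone above holds newline-delimited JSON click events with timestamp and event_type fields: the structure is interpreted when the data is read, not when it is stored.

```python
from pathlib import Path

import pandas as pd

raw_zone = Path("datalake/raw/web_shop")  # hypothetical raw zone from the sketch above

# Read the raw JSON-lines files directly; no table schema was declared before loading.
frames = [pd.read_json(p, lines=True) for p in raw_zone.rglob("*.json")]

if frames:
    events = pd.concat(frames, ignore_index=True)
    # Assumed fields in the raw click events: "timestamp" and "event_type".
    events["day"] = pd.to_datetime(events["timestamp"]).dt.date
    print(events.groupby(["day", "event_type"]).size())
```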
Why it matters
Data lakes provide the scalable, flexible storage that modern AI requires for large, diverse datasets spanning structured records, unstructured text, images, and sensor data. Organizations without adequate data lake infrastructure face storage bottlenecks that limit the scope and ambition of their AI initiatives. However, without proper governance, data lakes can become data swamps where quality deteriorates and assets become undiscoverable.
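To make the governance point concrete, the sketch below builds a tiny file-level manifest (path, size, modification time, owning team) so lake contents remain discoverable. It is only a stand-in for a real data catalog such as AWS Glue Data Catalog or Apache Atlas, and the OWNERS mapping is hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

LAKE_ROOT = Path("datalake/raw")              # hypothetical raw zone
OWNERS = {"web_shop": "ecommerce-analytics"}  # hypothetical ownership mapping

def build_manifest(root: Path) -> list:
    """Record basic metadata for every file so lake contents stay discoverable."""
    entries = []
    if not root.exists():
        return entries
    for path in sorted(root.rglob("*")):
        if path.is_file():
            stat = path.stat()
            source_system = path.relative_to(root).parts[0]
            entries.append({
                "path": str(path),
                "bytes": stat.st_size,
                "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
                "owner": OWNERS.get(source_system, "unassigned"),  # unowned data is a swamp risk
            })
    return entries

LAKE_ROOT.parent.mkdir(parents=True, exist_ok=True)
(LAKE_ROOT.parent / "manifest.json").write_text(json.dumps(build_manifest(LAKE_ROOT), indent=2))
```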
How COMPEL uses it
Data lake architecture is assessed as part of the Technology pillar during Calibrate, with maturity levels reflecting the evolution from basic file storage to governed, discoverable repositories. During Model, the data architecture design may specify lakehouse evolution patterns combining lake flexibility with warehouse governance. Module 3.3, Article 3 covers data lake architecture as a foundational technology decision.
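As a rough sketch of the lakehouse direction, combining lake flexibility with warehouse-style structure, the snippet below curates raw JSON events into partitioned Parquet. In practice this role is usually played by open table formats such as Delta Lake or Apache Iceberg, which add transactions and schema enforcement on top of lake storage; the paths and partition column here are hypothetical, and writing Parquet this way requires the pyarrow package.

```python
from pathlib import Path

import pandas as pd  # partitioned Parquet output also requires pyarrow

raw_zone = Path("datalake/raw/web_shop")        # hypothetical raw zone
curated_zone = Path("datalake/curated/events")  # hypothetical curated layer

frames = [pd.read_json(p, lines=True) for p in raw_zone.rglob("*.json")]
if frames:
    events = pd.concat(frames, ignore_index=True)
    # Assumed field in the raw events: "timestamp", used to derive the partition key.
    events["event_date"] = pd.to_datetime(events["timestamp"]).dt.date.astype(str)
    curated_zone.mkdir(parents=True, exist_ok=True)
    # Columnar, partitioned storage gives warehouse-like structure and query
    # performance over data that originally landed in the lake as raw JSON.
    events.to_parquet(curated_zone, engine="pyarrow", partition_cols=["event_date"])
```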
Related Terms
Other glossary terms mentioned in this entry's definition and context.