AITM M1.1-Art51 v1.0 Reviewed 2026-04-06 Open Access
M1.1 Foundations of AI Transformation
AITF · Foundations

Lab 1 — Dataset Profiling and Quality Scoring


7 min read

COMPEL Specialization — AITM-DR: AI Data Readiness Associate Lab 1 of 2


Lab overview

This lab places the learner in the role of a readiness practitioner at a mid-sized property-and-casualty insurance carrier. The carrier is scoping a customer-churn prediction use case. The scoping team has flagged a dataset — 18 months of policyholder interactions, policy terms, premium history, and churn indicators — as the candidate training source. The practitioner’s task is to profile the dataset and produce the data quality section of the readiness scorecard.

The lab is run against a provided synthetic dataset seeded to reproduce realistic defects. No real personal data is present; every record is generated. The dataset is deliberately imperfect and contains at least one significant defect in each quality dimension for the learner to detect.

Time: 90 minutes. Prerequisites: Articles 1, 2, and 3 completed. Output: A completed quality scorecard section (section 3 of the scorecard template) with evidence references.

Scenario

The carrier. A regional US-based property-and-casualty insurer with 380,000 active policies and a two-state footprint. The data team is eight people. Governance maturity is intermediate — some data contracts exist, a catalog exists, lineage is partial.

The use case. Predict which policyholders are likely to non-renew at the next renewal cycle, for the purpose of targeted retention outreach. Risk tier is classified as medium (internal policy) — the model is advisory, human reviewers approve outreach, no consequential automated decisions.

The dataset. policyholder_interactions_2024_v7.parquet. 380,000 rows, 47 columns, covering 18 months of policyholder data:

  • Identity: policy_id, household_id, primary_name_hash, state, zip3
  • Policy: policy_type, premium_annual, deductible, coverage_amount, policy_start, policy_expire, last_renewal_date
  • Interactions: claim_count_12m, claim_severity_mean, support_contact_count_12m, support_channel_primary, nps_last_score, nps_last_date
  • Billing: payment_method, late_payments_12m, autopay_flag
  • Target: churned_at_renewal (0/1 binary label)
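The declared shape and the columns named above can be encoded as a first check before any deeper profiling. A minimal pandas sketch, assuming only the columns listed in the brief (the full 47-column list is not reproduced here, so only the named subset is verified; this is illustrative, not part of the lab kit):

```python
import pandas as pd

EXPECTED_ROWS, EXPECTED_COLS = 380_000, 47
# Columns named in the scenario; the dataset declares 47 in total.
NAMED_COLUMNS = {
    "policy_id", "household_id", "primary_name_hash", "state", "zip3",
    "policy_type", "premium_annual", "deductible", "coverage_amount",
    "policy_start", "policy_expire", "last_renewal_date",
    "claim_count_12m", "claim_severity_mean", "support_contact_count_12m",
    "support_channel_primary", "nps_last_score", "nps_last_date",
    "payment_method", "late_payments_12m", "autopay_flag",
    "churned_at_renewal",
}

def check_shape_and_columns(df: pd.DataFrame) -> list[str]:
    """Return a list of findings; an empty list means the declared shape holds."""
    findings = []
    if len(df) != EXPECTED_ROWS:
        findings.append(f"row count {len(df):,} != declared {EXPECTED_ROWS:,}")
    if df.shape[1] != EXPECTED_COLS:
        findings.append(f"column count {df.shape[1]} != declared {EXPECTED_COLS}")
    missing = NAMED_COLUMNS - set(df.columns)
    if missing:
        findings.append(f"missing named columns: {sorted(missing)}")
    return findings

# Tiny illustrative frame; in the lab, load the provided parquet file instead.
toy = pd.DataFrame({"policy_id": ["P1"], "state": ["TX"]})
print(check_shape_and_columns(toy))
```

Each finding string doubles as an evidence-register entry candidate for Task 2.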

The scoping brief (already signed). The model will be retrained quarterly on a rolling 18-month window, served behind an internal API, and the retention team will act on the top 10% of predicted churners per month.

Task 1 — Automated profile (15 minutes)

Using whatever tooling the learner is comfortable with (pandas + numpy + a profiling library such as ydata-profiling, a direct SQL profile against a database, or a vendor profiling tool), produce an automated profile of the dataset. The profile should cover:

  • Row count and column count; verify against the supplier’s claim (380,000 × 47)
  • Per-column null rate
  • Per-column unique-value count
  • Per-column type and sample values
  • Univariate distributions for numeric columns (mean, std, min, max, quartiles)
  • Top-10 value distributions for categorical columns
  • Pairwise correlations for the numeric subset

Record the profile output. Note any column where the profile itself suggests a defect (for example, a column declared numeric with non-numeric entries; a high null rate on a field the scoping brief implied would always be present).
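Most of the profile bullets above can be produced with plain pandas if no profiling library is at hand. A minimal sketch (the tiny stand-in frame is invented for illustration; in the lab you would read the provided parquet file):

```python
import pandas as pd

def quick_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column dtype, null rate, and unique-value count in one table."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "null_rate": df.isna().mean().round(4),
        "n_unique": df.nunique(dropna=True),
    })

# In the lab: df = pd.read_parquet("policyholder_interactions_2024_v7.parquet")
# Tiny stand-in frame so the sketch runs end to end:
df = pd.DataFrame({
    "policy_id": ["P1", "P2", "P2", "P4"],
    "premium_annual": [1200.0, None, 950.0, 1100.0],
})
print(df.shape)                            # verify against the supplier's claim
print(quick_profile(df))                   # null rates, unique counts, dtypes
print(df.describe(include=["number"]))     # univariate numeric distributions
print(df.select_dtypes("number").corr())   # pairwise numeric correlations
```

A library such as ydata-profiling adds the top-10 categorical distributions and sample values in one call; the point of the manual version is that nothing in the task depends on any particular tool.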

Task 2 — Dimension-by-dimension scoring (45 minutes)

Score the dataset against each of the ten AI-extended data quality dimensions from Article 2. For each dimension:

  1. Identify the specific check that applies to this use case.
  2. Run the check against the dataset.
  3. Record the result.
  4. Score the dimension on a 1–5 scale (1 = severe defect, 5 = exemplary).
  5. State the threshold the score was measured against and justify it.
  6. File the evidence in the evidence register.
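Steps 3 through 6 lend themselves to one structured record per dimension. A minimal sketch, assuming field names that mirror the scorecard columns in Task 3 (the example values are invented, not drawn from the lab dataset):

```python
from dataclasses import dataclass

@dataclass
class DimensionScore:
    dimension: str
    check: str
    result: str
    score: int            # 1 (severe defect) .. 5 (exemplary)
    threshold: str
    justification: str
    evidence_ref: str     # entry number in the evidence register

    def __post_init__(self):
        # Enforce the 1-5 scale from step 4.
        if not 1 <= self.score <= 5:
            raise ValueError("score must be on the 1-5 scale")

row = DimensionScore(
    dimension="Uniqueness",
    check="duplicate count on policy_id",
    result="412 duplicate rows found",      # invented figure, for illustration
    score=2,
    threshold="0 duplicates on policy_id",
    justification="policy_id is the declared primary key",
    evidence_ref="EV-07",
)
print(row.dimension, row.score)  # → Uniqueness 2
```

Ten such records serialize directly into the compact table Task 3 asks for.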

The ten dimensions, with a hint at the primary check for this lab:

  • Accuracy — sample-and-compare 200 rows against the source system. Can you reconcile them?
  • Completeness — null-rate analysis per critical column; is the null pattern MCAR (missing completely at random)?
  • Consistency — cross-column checks (e.g., does policy_expire always follow policy_start? Does autopay_flag match payment-method behavior?)
  • Timeliness — inspect nps_last_date distribution; does it reflect the declared 18-month window?
  • Validity — schema validation; check values against declared ranges; inspect format compliance for date-typed columns.
  • Uniqueness — deduplication on policy_id and on household_id; any fragmentation?
  • Representativeness — compare state mix and policy-type mix against the carrier’s published book-of-business proportions.
  • Freshness versus training cutoff — what is the effective cutoff of the dataset? Does it leave enough lag for the target label to be observed?
  • Labeling agreement — how was churned_at_renewal assigned? Is this a deterministic system-of-record label (high agreement implicit) or a business-rule label (agreement depends on rule stability)? What is the rule?
  • Distributional stability — compute PSI on key features between the first 9 months and the second 9 months. Any features above PSI 0.25?
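For the distributional-stability hint, PSI can be computed directly from binned proportions of the two windows. A minimal numpy sketch, with bin edges taken from the reference window (the synthetic draws stand in for a feature such as premium_annual; values outside the reference range are simply dropped from the comparison):

```python
import numpy as np

def psi(reference, current, bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between two samples of one feature."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    ref_pct = np.clip(ref_pct, eps, None)   # avoid log(0) in empty bins
    cur_pct = np.clip(cur_pct, eps, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

rng = np.random.default_rng(0)
first_half = rng.normal(1000, 150, 5000)    # e.g. months 1-9
second_half = rng.normal(1200, 150, 5000)   # shifted mid-window
print(psi(first_half, first_half[:2500]))   # same distribution: near 0
print(psi(first_half, second_half))         # shifted: well above 0.25
```

The common rule of thumb, echoed by the 0.25 threshold above, reads PSI below 0.1 as stable, 0.1 to 0.25 as worth investigating, and above 0.25 as a significant shift.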

The lab’s synthetic dataset contains seeded defects the learner should discover. Representative defects (spoilers — do not read until after attempting the task):

  • a cluster of duplicate policy_id values introduced by a bad join;
  • an nps_last_date distribution with a 6-month gap reflecting a survey outage;
  • a mid-window drift in the payment_method distribution reflecting a billing-system migration;
  • a representativeness gap in which one state is overrepresented in the churner class relative to the book of business;
  • a labeling-rule drift in which the churn definition was tightened six months into the window.
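Two of the seeded-defect classes, duplicate keys and impossible date orderings, reduce to one-line pandas checks. A minimal sketch over an invented illustrative frame (column names come from the dataset schema; the values do not):

```python
import pandas as pd

def seeded_defect_checks(df: pd.DataFrame) -> dict:
    """Counts of duplicate policy_ids and of expiry dates preceding start dates."""
    start = pd.to_datetime(df["policy_start"])
    expire = pd.to_datetime(df["policy_expire"])
    return {
        "duplicate_policy_ids": int(df["policy_id"].duplicated().sum()),
        "expire_before_start": int((expire < start).sum()),
    }

df = pd.DataFrame({
    "policy_id": ["P1", "P2", "P2"],                              # one duplicate
    "policy_start": ["2023-01-01", "2023-02-01", "2023-02-01"],
    "policy_expire": ["2024-01-01", "2022-02-01", "2024-02-01"],  # one inversion
})
print(seeded_defect_checks(df))
# → {'duplicate_policy_ids': 1, 'expire_before_start': 1}
```

Nonzero counts feed the uniqueness and consistency rows of the scorecard, with the query itself filed as the evidence reference.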

Task 3 — Findings and remediation draft (20 minutes)

Based on the ten dimension scores, draft:

  1. The dimension scorecard — a compact table with dimension, score, threshold, status, evidence reference, owner.
  2. The gaps and risks subsection — for each dimension scored amber or red, a one-paragraph description of the gap, its consequence for the use case, and a proposed remediation.
  3. The executive summary paragraph — three sentences summarizing the quality posture, the most significant defects, and the recommended determination (fit / conditionally fit / not fit) on the quality dimension alone.

Task 4 — Peer review (10 minutes)

Pair with another learner. Each reviews the other’s scorecard and checks:

  • Is every claim supported by an evidence reference?
  • Is every threshold justified against the use case?
  • Are findings stated in a way that a product owner could act on?
  • Is the determination defensible if questioned by an auditor?

Exchange one piece of constructive feedback. Revise your scorecard.

Expected artifacts

On completion, each learner should have:

  • A saved automated profile output (HTML, JSON, or notebook)
  • A completed dimension scorecard section (section 3 of the readiness scorecard template, with all ten dimensions scored)
  • A gaps and risks subsection covering at least three findings
  • A drafted executive summary paragraph
  • An evidence register with at least ten entries, each numbered and sourced

Reflection questions

Answer in writing. The responses will be discussed in the next cohort session.

  1. Which dimension was hardest to score with confidence, and why? What additional evidence would have made the score defensible?
  2. What threshold did you set for representativeness? How did you justify it? Would a reviewer from a different internal function have justified a different threshold?
  3. The scoping brief implies a medium risk tier. If the risk tier were reclassified as high, which of your scores would tighten, and which thresholds would you revise?
  4. You found a labeling-rule drift mid-window. What is the implication for model training? What are the remediation options and their trade-offs?
  5. If the retention team insists the use case must proceed on the current dataset despite your findings, what is your recommendation and what escalation path would you follow?

Grading rubric

The lab is graded pass / needs revision, against:

  • Evidence coverage — every dimension has an evidence entry. (weight: 25%)
  • Score defensibility — scores are justified against the use case. (weight: 25%)
  • Finding actionability — findings are stated so a product owner could act on them. (weight: 20%)
  • Threshold justification — thresholds are named and defended. (weight: 15%)
  • Remediation feasibility — remediation proposals are specific and cost-framed. (weight: 15%)

A pass requires meeting at least 80% of the weighted rubric. Below 80%, the lab is returned with specific revision notes.

Tooling neutrality

The lab can be completed in any combination of: Python (pandas, numpy, ydata-profiling, great-expectations, soda-core), R (tidyverse, skimr), SQL-only approaches against a provided sample warehouse, or commercial tooling the learner’s organization has licensed. The grading rubric makes no reference to tooling choice; only to evidence coverage and quality.

Connection to subsequent labs and case studies

The output of Lab 1 — a quality section — is one of the inputs to Lab 2, which authors a data contract for a retrieval-augmented generation source, and to the case study, which analyzes a consequential-decision AI program across multiple readiness dimensions. The reflection-question answers will surface again in the Article 11 scorecard-authoring discussion.