COMPEL Specialization — AITE-VDT: AI Value & Analytics Expert Lab 1 of 5
Lab objective
Produce a complete, defensible measurement plan for the scenario described below. The plan must include all eleven sections specified in Article 4, align to ISO 42001 Clause 9.1 and NIST AI RMF MEASURE 1.1, and be ready for sponsor sign-off.
Duration: 90 minutes. Deliverable: a completed measurement plan document (Word, Google Docs, or Markdown) of roughly three to five pages. Linked articles: 4 (the measurement plan), 5 (leading/lagging indicators), and 18 (experimental vs. observational designs).
Scenario
You are the AI value lead for a mid-sized business-services company. The product team is about to ship “BillExplain,” a generative-AI feature that reads a customer’s invoice and produces a plain-language explanation in response to customer chat requests about their charges. Customer-service representatives use BillExplain’s output as a draft, edit as needed, and send to the customer.
The proposed business case claims: (1) reduced handle time per invoice inquiry from an average of 6 minutes to 3.5 minutes; (2) improved first-contact resolution rate from 72% to 85%; (3) a consequent reduction of $3.2M in annualized operations cost.
The feature launches in three weeks to 120 representatives in the North American contact center, with a gradual rollout to the European centers over the following two quarters. No formal measurement plan exists; your CFO has requested one before the rollout proceeds.
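Before drafting the plan, it can help to sanity-check the arithmetic behind the $3.2M claim. The sketch below is illustrative only: the loaded labor cost is a hypothetical assumption (the scenario does not state it), so substitute your organization's own figure.

```python
# Rough sanity check of the business case. The loaded labor cost is a
# HYPOTHETICAL assumption for illustration; the claimed figures come
# from the scenario (6.0 -> 3.5 minutes, $3.2M annualized).
minutes_saved_per_inquiry = 6.0 - 3.5      # claimed handle-time reduction
loaded_cost_per_rep_hour = 40.0            # assumed fully loaded cost, USD/hour
claimed_annual_saving = 3_200_000          # from the business case

# Annual inquiry volume implied by the $3.2M claim at these assumptions:
saving_per_inquiry = minutes_saved_per_inquiry / 60 * loaded_cost_per_rep_hour
implied_inquiries = claimed_annual_saving / saving_per_inquiry
print(f"Saving per inquiry: ${saving_per_inquiry:.2f}")
print(f"Implied annual inquiry volume: {implied_inquiries:,.0f}")
```

If the implied volume (here about 1.9M inquiries per year, or roughly 64 per rep per day across 120 reps) looks implausible for your contact center, that alone is worth flagging in the measurement plan.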
What to produce
Draft all eleven sections of the measurement plan. Target roughly one page in total for the concise sections and up to half a page each for the two or three sections that require detail.
- Hypothesis. State the primary business hypothesis in falsifiable terms. Include the direction, the magnitude, and the population to which the claim applies.
- Primary metric. Name one primary metric (not multiple). Define its computation, its source system, its time window, and its aggregation rule.
- Secondary metrics. Name three to five secondary metrics that provide supporting evidence but are not the basis for go/no-go decisions. Describe the purpose of each.
- Data sources. List every source system contributing to the primary and secondary metrics. For each source, state the owner, the refresh cadence, and any known data-quality limitations.
- Collection cadence. Specify how frequently each metric is collected and aggregated. Address both the measurement cadence and the reporting cadence.
- Analysis method. Describe how the causal effect of BillExplain will be estimated. Because all 120 reps will have access on day one in the North American center, consider which of the six designs from Article 18 apply. Justify your choice and disclose its limitations.
- Decision rule. Specify the thresholds on the primary metric that trigger continue, modify, or retire decisions. Make the thresholds numeric and defensible.
- Pre-registration. Record the hypothesis, primary metric, decision rule, and analysis method in a pre-registration record. Specify where this record is stored and how changes are authorized.
- Review owners. Name (by role) the owners of the measurement plan, the weekly operational review, the monthly value review, and the quarterly stage-gate review.
- Risk flags. List the three most significant measurement risks you foresee (e.g., contamination, adoption shortfall, drift in upstream invoicing data) and describe a mitigation for each.
- Escalation path. Describe what triggers escalation to the CFO or steering committee, the timeline for escalation, and the supporting evidence required.
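A decision rule ultimately reduces to a function from the measured primary metric to one of three actions. The thresholds below are illustrative placeholders, not the lab's answer; your plan must justify its own numbers against the 6.0-minute baseline and the 3.5-minute target.

```python
def decision(avg_handle_minutes: float) -> str:
    """Map the measured primary metric to a go/no-go action.

    Thresholds are ILLUSTRATIVE placeholders: the business case
    targets 3.5 minutes against a 6.0-minute baseline.
    """
    if avg_handle_minutes <= 4.0:   # ~80% of the claimed 2.5-min reduction realized
        return "continue"
    if avg_handle_minutes <= 5.0:   # partial effect: modify the feature and re-measure
        return "modify"
    return "retire"                 # no material improvement over baseline

for observed in (3.6, 4.7, 5.8):
    print(observed, "->", decision(observed))
```

Writing the rule as code forces the thresholds to be explicit and exhaustive, which is exactly what the rubric's "numeric and defensible" criterion asks for.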
Guidance
- Eligibility and population. The 120 representatives form a cluster; if any randomization is feasible, randomize at the team level rather than the individual level. The geographic rollout provides the staged-timing variation that difference-in-differences (DiD) exploits.
- Counterfactual thinking. Pre/post comparisons alone are weak here because the representatives' training period, the holiday-season volume pattern, and the Europe launch all interact. A DiD comparing North America (treated earlier) with Europe (treated later) is likely the strongest feasible design.
- Indicator discipline. Handle time is a lagging indicator; suggestion-acceptance rate and edit distance on the AI output are leading indicators of realized value.
- Honesty about limits. Adoption is voluntary above minimum use thresholds; this is a potential confounder that can bias the counterfactual comparison. Disclose it.
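The DiD logic above can be sketched in a few lines. All handle-time figures here are hypothetical illustration values; the estimator's validity rests on the parallel-trends assumption (that Europe's handle time would have moved like North America's absent BillExplain), which your plan should state and defend.

```python
# Minimal difference-in-differences sketch using four cell means.
# All handle-time figures are HYPOTHETICAL illustration values.
na_pre, na_post = 6.0, 4.2   # North America: before vs. after BillExplain launch
eu_pre, eu_post = 6.1, 5.9   # Europe (not yet launched): same calendar windows

# DiD subtracts the control region's trend from the treated region's change,
# netting out shared shocks such as seasonal volume.
did = (na_post - na_pre) - (eu_post - eu_pre)
print(f"DiD estimate of BillExplain's effect: {did:+.1f} minutes")
```

In practice you would estimate this from per-inquiry records with a regression (region, period, and their interaction) to obtain standard errors, but the four-cell arithmetic is the core of the design.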
Evaluation rubric
Your draft will be scored on the following dimensions.
| Dimension | What to demonstrate | Weight |
|---|---|---|
| Completeness | All eleven sections present and non-trivial | 20% |
| Hypothesis precision | Falsifiable, directional, magnitude-specified | 10% |
| Primary-metric discipline | One metric, well-defined | 10% |
| Counterfactual choice | Design selected and defended against Article 18’s six-question tree | 15% |
| Decision-rule specificity | Numeric thresholds, defensible | 10% |
| Risk-flag substance | Specific risks, not boilerplate | 10% |
| Alignment to ISO 42001 Clause 9.1 | Explicit mention of clauses addressed | 10% |
| Readability for CFO audience | Can be read in ten minutes, supports decision | 15% |
A passing draft scores 70% or above. Drafts scoring below 70% are returned with feedback and resubmitted.
Reflection questions
After completing the draft, answer the following in writing (approximately 150 words per question).
- Which of the eleven sections did you find hardest to complete, and why?
- Your counterfactual design has at least one known limitation. State it honestly and describe how the VRR would disclose it.
- The business case claims a $3.2M annualized saving. Assuming your measurement plan reveals a 30% shortfall, how would you communicate the shortfall to the CFO?
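For the third question, pin down the arithmetic before drafting the message. A 30% shortfall against a $3.2M claim is a fixed calculation; only the framing is a judgment call.

```python
# Arithmetic behind the third reflection question (figures from the scenario).
claimed = 3_200_000
shortfall = 0.30
realized = claimed * (1 - shortfall)   # saving actually measured
gap = claimed - realized               # dollars to explain to the CFO

print(f"Realized saving: ${realized:,.0f}  (gap vs. claim: ${gap:,.0f})")
```

Leading with the realized figure and the explained gap, rather than with the original claim, is usually the more defensible framing for a CFO audience.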
Linked artifacts and further reading
- Article 4 — The measurement plan artifact.
- Article 5 — Leading and lagging indicators.
- Article 18 — Choosing between experimental and observational designs.
- ISO/IEC 42001:2023 Clause 9.1.
- NIST AI RMF 1.0, MEASURE 1.1 subcategory.
Submission
Submit as a Word document, Google Doc, or Markdown file. A reviewer will provide written feedback within one week; drafts may be revised and resubmitted until they pass the rubric.