Stage-Gate Value Reviews in COMPEL

AITE M1.3-Art31 v1.0 · Reviewed 2026-04-06 · Open Access
Maturity Assessment & Diagnostics · Advanced depth · COMPEL Body of Knowledge
8 min read · Article 31 of 48

The AI value practitioner is not the gate owner — that role sits with the stage sponsor or the program lead — but the practitioner produces the value evidence that the gate review consumes. A well-prepared value-evidence pack makes the gate decision tractable; a poorly prepared pack degrades the decision to ceremonial approval or evidence-light advancement.

This article teaches the value-evidence requirements at each of the six gates, the three common failure modes, and the practitioner’s preparation workflow.

The six gates

Each gate asks a different question, requires different evidence, and supports a different decision.

Gate 1 — Calibrate exit (baseline established)

Question: Has the feature’s baseline been established with enough fidelity to proceed to Organize?

Evidence required: Baseline business case (Article 6) with initial rNPV (Article 7); measurement plan (Article 4) covering proposed metrics, counterfactual design, and decision rule; risk classification against the EU AI Act and relevant sector rules; stakeholder alignment on success criteria.

Decision options: Proceed, hold for baseline remediation, or do-not-proceed.

Common gate deficiencies: Measurement plan missing the counterfactual section; business case assuming perfect adoption; risk classification deferred to a later stage.

Gate 2 — Organize exit (team and plan ready)

Question: Are the people, data, and infrastructure ready to build?

Evidence required: Team charter with named roles; data readiness assessment (from the AITM-DR analyst); initial compute budget (Article 29); stakeholder communications plan.

Decision options: Proceed, hold for readiness remediation, or re-scope.

Common gate deficiencies: Data readiness scored optimistically; compute budget not set; key roles unfilled.

Gate 3 — Model exit (model development complete)

Question: Does the built artifact meet the capability thresholds specified in the measurement plan?

Evidence required: Capability-evaluation results against pre-registered test set; fairness evaluation across protected classes; initial value-driver validation (which KPI tree drivers does the model actually move?); red-team results where applicable; governance artifacts (model card, data sheet).

Decision options: Proceed to pilot, hold for model remediation, or abandon.

Common gate deficiencies: Fairness evaluation omitted or shallow; value-driver validation based on offline data only with no user-interaction evidence; model card incomplete.

Gate 4 — Produce exit (pilot complete, ready for rollout)

Question: Did the pilot validate the business case, and is the feature ready for broader rollout?

Evidence required: Counterfactual analysis from pilot (whichever of Articles 18–23 applies) with point estimate and uncertainty band; pilot-scale realized value against pilot-scale business-case projection; operational-readiness evidence (incident frequency, mean-time-to-resolve); updated TCO (Article 8) and updated rNPV based on pilot data.

Decision options: Proceed to rollout, extend pilot, scale down to experimental, or retire.

Common gate deficiencies: Counterfactual weak or absent; pilot run too briefly to establish realized value; rNPV not updated with pilot learnings.
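
Of Gate 4’s evidence items, the point estimate and uncertainty band is the most mechanical to produce. Below is a minimal sketch of a percentile-bootstrap band over a pilot’s treated-versus-control difference, in Python; the data, sample sizes, and 95% level are illustrative, and the actual counterfactual design comes from whichever of Articles 18–23 applies.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

def bootstrap_uplift(treated, control, n_boot=10_000, alpha=0.05):
    """Point estimate and percentile-bootstrap band for the mean uplift."""
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    point = treated.mean() - control.mean()
    boots = np.empty(n_boot)
    for i in range(n_boot):
        t = rng.choice(treated, size=treated.size, replace=True)
        c = rng.choice(control, size=control.size, replace=True)
        boots[i] = t.mean() - c.mean()
    lo, hi = np.percentile(boots, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return point, (lo, hi)

# Illustrative pilot data: weekly value per user, treated vs. control.
point, band = bootstrap_uplift(treated=[12.1, 9.8, 14.3, 11.0, 13.5],
                               control=[10.2, 9.1, 11.4, 10.8, 9.9])
print(f"uplift = {point:.2f}, 95% band = ({band[0]:.2f}, {band[1]:.2f})")
```

If the band includes zero at the chosen level, the honest Gate 4 evidence says so; "extend pilot" exists as a decision option precisely for that case.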

Gate 5 — Evaluate (ongoing value confirmation)

Question: Is the feature sustaining its realized value and holding within risk tolerance?

Evidence required: Latest VRR (Article 16); portfolio scorecard position (Article 30); drift signals (Article 25) and their correlation to realized value; cost against budget (Article 29); open risks and mitigation status.

Decision options: Continue, modify (feature redesign or scope change), or retire.

Common gate deficiencies: Drift signals not correlated to realized value, making it impossible to know whether drift is value-eroding; cost reported without value context; risks boilerplate rather than specific.
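
The first deficiency above is checkable: put the drift metric and realized value on the same weekly grid and correlate them. A sketch, assuming aligned weekly series already exist; the numbers and the escalation threshold are illustrative.

```python
import numpy as np

# Illustrative aligned weekly series: a drift score (e.g., PSI) and realized value.
drift_score    = np.array([0.02, 0.03, 0.05, 0.09, 0.14, 0.22])
realized_value = np.array([48.0, 47.5, 46.0, 43.2, 40.1, 35.6])  # $k/week

r = np.corrcoef(drift_score, realized_value)[0, 1]
print(f"drift-to-value correlation: r = {r:.2f}")
if r < -0.5:  # illustrative threshold; a real one comes from the measurement plan
    print("drift appears value-eroding: escalate at the Evaluate gate")
else:
    print("no clear value erosion attributable to drift yet")
```

A handful of weeks gives a noisy estimate, so the correlation is an early-warning signal for the review, not a verdict on its own.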

Gate 6 — Learn (post-retirement learning capture)

Question: What did this feature teach the organization, and how is that learning transferred?

Evidence required: Final value summary (total realized value, total investment, final ratio); root-cause analysis of significant variances from business case; documented learnings for future features; contribution to organizational knowledge repository.

Decision options: Learning captured (gate passes) or learning incomplete (additional documentation required).

Common gate deficiencies: Learning document written only by the feature lead without cross-team input; root-cause analysis attributing variance to external factors rather than internal decisions; no structured contribution to the knowledge repository.

The three failure modes

Failure 1 — Ceremonial approval

The gate review happens; slides are shown; heads nod; the gate is stamped; no substantive decision is made. Ceremonial approval is common where the gate-review audience includes leaders who did not prepare and whose attention is elsewhere. The symptom: gate-review minutes record “approved” but not “approved because…” alongside the specific evidence that justified approval.

The fix is structural. Gate reviews should include a pre-read so time is not spent presenting information; the meeting itself should focus on questions and decision. Gate-review minutes should record the evidence considered and the reasoning, not just the outcome. Governance reviewers (internal audit, compliance) should sample gate-review minutes periodically to confirm substance.

Failure 2 — Evidence-light advancement

A feature advances through a gate with inadequate evidence because the team is under time pressure, because the gate’s evidence requirements are informally interpreted, or because the reviewer does not read the evidence closely. The feature’s subsequent struggles then trace back to a gate that approved advancement without confirming readiness.

The fix is a stricter evidence rubric. Each gate’s evidence requirements are documented in the AI operating model (see cross-reference to M1.2-Art17). The gate reviewer applies the rubric; incomplete evidence is flagged as an explicit hold, not a minor concern that can be addressed post-gate.
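
Once the rubric is documented, the completeness check can be mechanical, which makes the explicit hold the default rather than a negotiation. A minimal sketch, assuming the rubric is expressed as a mapping from gate to required evidence keys; all gate and evidence names here are illustrative, not the canonical rubric from M1.2-Art17.

```python
# Illustrative rubric: gate -> evidence keys required before review.
GATE_RUBRIC = {
    "G1_calibrate": ["business_case", "rnpv", "measurement_plan",
                     "risk_classification", "success_criteria"],
    "G3_model":     ["capability_eval", "fairness_eval",
                     "value_driver_validation", "model_card"],
    "G4_produce":   ["counterfactual_analysis", "realized_vs_projected",
                     "ops_readiness", "updated_tco", "updated_rnpv"],
}

def gate_decision(gate: str, evidence_pack: dict) -> str:
    """Flag missing evidence as an explicit hold, never a post-gate fix."""
    missing = [k for k in GATE_RUBRIC[gate] if not evidence_pack.get(k)]
    if missing:
        return f"HOLD: missing {', '.join(missing)}"
    return "READY FOR REVIEW: rubric complete"

pack = {"capability_eval": "results.md", "model_card": "card.md"}
print(gate_decision("G3_model", pack))
# -> HOLD: missing fairness_eval, value_driver_validation
```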

Failure 3 — Anchor-to-original-case

Gate 5 (Evaluate) reviews compare current realized value against the original business case from Gate 1 (Calibrate). The business case was a projection made under Year-0 assumptions; sustaining the comparison without updating the case creates two pathologies. Either the case is pursued past its obsolescence (the world changed, and the feature is delivering value the original case cannot see), or the case becomes a scoring rubric that everyone accepts is detached from reality.

The fix is the updated-case discipline. At each stage-gate beyond Calibrate, the business case is updated with learnings from the previous stage. At Evaluate-stage reviews, the comparison is against the most recently updated case, not the original. Updates are tracked so the case evolution is auditable.
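
An append-only version log is enough to make the case evolution auditable, with Evaluate reviews reading the latest entry rather than the original. A sketch, assuming the case headline is reducible to an rNPV figure; field names and figures are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class CaseVersion:
    gate: str          # gate at which the case was updated
    date: str          # update date
    rnpv_musd: float   # headline rNPV at that update, $M
    rationale: str     # which learning drove the change

@dataclass
class BusinessCase:
    versions: list[CaseVersion] = field(default_factory=list)

    def update(self, v: CaseVersion) -> None:
        self.versions.append(v)   # append-only: evolution stays auditable

    def current(self) -> CaseVersion:
        return self.versions[-1]  # Evaluate compares against this, not versions[0]

case = BusinessCase()
case.update(CaseVersion("G1_calibrate", "2024-03-01", 4.2, "original projection"))
case.update(CaseVersion("G4_produce", "2025-01-15", 2.9, "pilot adoption below plan"))
print(f"compare realized value to rNPV {case.current().rnpv_musd} "
      f"(updated at {case.current().gate})")
```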

The practitioner’s preparation workflow

For any gate the AI value practitioner supports, four preparation steps produce a gate-ready evidence pack.

Step 1 — Confirm the gate question

Before preparing evidence, confirm which gate’s question is being answered. Gate questions differ materially; evidence that answers Gate 3’s capability question does not answer Gate 4’s question of whether the pilot validated the business case.

Step 2 — Audit the evidence inventory

For the gate question, list the evidence required and locate each piece. Missing evidence is flagged immediately — better to surface the gap before the gate than to discover it during the review.
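
The inventory audit can also be scripted so that gaps surface automatically whenever the pack is assembled. A sketch, assuming each evidence item maps to a file location; the item names and paths are hypothetical.

```python
from pathlib import Path

# Illustrative Gate 4 inventory: evidence item -> where the pack claims it lives.
INVENTORY = {
    "counterfactual_analysis": "evidence/g4/counterfactual.md",
    "realized_vs_projected":   "evidence/g4/value_vs_plan.xlsx",
    "ops_readiness":           "evidence/g4/ops_readiness.md",
    "updated_rnpv":            None,  # not yet produced
}

# An item is a gap if no location is recorded or the located file is absent.
gaps = [item for item, loc in INVENTORY.items()
        if loc is None or not Path(loc).exists()]
if gaps:
    print("surface before the gate, not during it:", ", ".join(gaps))
```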

Step 3 — Draft the narrative

The evidence pack carries a short narrative (one to three pages) that walks the reviewer through the evidence, surfaces the key decision question, and states the recommended decision. The narrative is not a sales pitch — it anticipates the reviewer’s likely concerns and addresses them.

Step 4 — Dry-run the review

A dry-run with a colleague playing the reviewer role surfaces questions the practitioner should have anticipated. Adjustments are made; the pack enters the review one iteration stronger.

The preparation discipline is what separates gate reviews that advance features defensibly from gate reviews that produce ceremonial rubber-stamps or evidence-light approvals that blow up in a later stage.

Cross-reference to Core Stream

  • EATF-Level-1/M1.2-Art07-Stage-Gate-Decision-Framework.md — canonical COMPEL stage-gate framework.
  • EATF-Level-1/M1.2-Art17-AI-Operating-Model-Blueprint.md — operating-model location of gate-evidence rubrics.

Self-check

  1. A Gate 3 review approves model advancement; fairness evaluation was omitted because “pilot will catch any issues.” Which failure mode is this?
  2. A Gate 5 review compares current realized value to a 2024 business case; the market shifted significantly in 2025. What failure mode is operating, and what is the fix?
  3. A Gate 4 pilot produced 6 weeks of data; the counterfactual method required 12 weeks for power. What is the correct gate decision?
  4. A Gate 6 learning document is written by one person in an afternoon. What governance discipline is missing?

Further reading

  • COMPEL Core Stream stage-gate canon (see cross-reference articles).
  • Cooper, Winning at New Products — classical stage-gate methodology.
  • GAO AI Accountability Framework — federal-agency stage-gate patterns.