The Value Realization Report (VRR) is not the dashboard. Dashboards are the live data surface; VRRs are the curated narrative that gives the dashboard meaning. It is not the business case either — the business case projects; the VRR reports. It is not the control performance report from Article 15, though the two are consumed together by audit committees. The VRR stands alone as the stakeholder-facing narrative of realized AI value.
This article specifies the six sections, walks through the evidence standard that distinguishes audit-grade reporting from marketing, and teaches the reader to produce a VRR that holds up in the three situations that test it hardest: the audit committee that wants causal rigor, the investor call where every claim will be scrutinized, and the regulator who wants traceability to ISO 42001 Clause 9.3 management-review outputs.
Why the VRR exists
Organizations that measure AI value without a standing report suffer three predictable failures. First, the narrative drifts — each quarter’s numbers are re-framed to support the story leadership wants to tell, and the counterfactual gets gradually detached from the delivered value. Second, the evidence trail goes missing — by the time an auditor asks how a claimed $47M benefit was computed, the analyst who built the model has moved teams and the spreadsheet lives in a departed colleague’s personal drive. Third, the stakeholder audience fragments — the CFO gets one story, the CEO gets a prettier story, the board gets a different story, and the investor relations team gets the story the IR head wishes were true.
The VRR solves all three failures with one artifact. A single report, produced on a fixed cadence, in a fixed structure, with a fixed evidence standard, consumed by all stakeholders from the same source. The McKinsey State of AI 2024 report documents that organizations capturing above-median AI value share one operational practice: they report AI outcomes to a single standing audience on a predictable cadence.1 The VRR is that practice made concrete.
The report’s cadence is typically quarterly for programs and monthly for individual features in early rollout. Frequency tightens when a feature is in active rollout, pending a decision, or under regulatory review; it loosens once the feature is stable. The cadence sits alongside the COMPEL stage-gate calendar so that the VRR is the evidence base the next stage-gate review consumes.
The six sections
The VRR has exactly six sections. The order is not arbitrary: it is the order in which an executive reader will ask questions if the report is delivered verbally, which is why reports structured this way survive live board Q&A.
Section 1 — Executive summary
The executive summary is one page. It states the feature name, the reporting period, the claimed realized value, the counterfactual method, the risk posture, and the requested decision if one exists. The best executive summaries contain one number, one headline, and one decision ask. Reports that hedge the number, bury the headline, or dodge the decision ask consistently fail board-grade review.
The summary’s most common failure is over-attribution. A feature delivered $14M in realized value against a counterfactual that shows $9M would have happened anyway; the incremental claim is $5M, not $14M. The executive summary must state $5M and show the counterfactual math in the next section, not assume the reader has time to find it. CFOs who have been burned by one over-attributed claim will never again trust the analyst who made it.
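A minimal sketch of that attribution arithmetic, using the figures from the paragraph above; the function name and the print format are illustrative, not drawn from any real measurement plan:

```python
def incremental_value(observed_value: float, counterfactual_value: float) -> float:
    """The value a VRR may claim: what was observed, minus what the
    counterfactual says would have happened anyway."""
    return observed_value - counterfactual_value

# Worked example from the text: $14M observed, $9M counterfactual.
claim = incremental_value(14_000_000, 9_000_000)
print(f"Claimable incremental value: ${claim:,.0f}")  # $5,000,000, not $14,000,000
```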
Section 2 — KPI tree and realized values
This section presents the three-level KPI tree from Article 12, with the realized value of each leaf metric for the reporting period, the prior-period value, and the target value. The tree is the causal spine: outcomes at the root, drivers in the middle ring, metrics at the leaves. A KPI tree with a realized value against each leaf is an auditable graph — every claim at the root is traceable to the evidence at the leaves.
The evidence standard for this section is reproducibility. An auditor must be able to take the tree, take the data source definitions from the measurement plan, and re-compute every metric from raw source data. If re-computation is not possible because a source system does not retain the underlying records, that gap must be disclosed in the section’s evidence notes. Non-disclosure of a reproducibility gap is the single most common reason a VRR fails internal audit.
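One plausible way to hold the tree so that every root claim traces to leaf evidence is a small recursive record per node; the field names, example metrics, and values below are assumptions for illustration, not a prescribed schema:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class KPINode:
    name: str
    realized: Optional[float] = None     # this period's value (leaves carry these)
    prior: Optional[float] = None        # prior-period value
    target: Optional[float] = None       # target value
    source_definition: str = ""          # where an auditor re-computes it from
    evidence_note: str = ""              # e.g. a disclosed reproducibility gap
    children: list["KPINode"] = field(default_factory=list)

    def leaves(self) -> list["KPINode"]:
        """Every claim at the root must be traceable to these leaf metrics."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

# Illustrative tree: an outcome at the root, driver metrics at the leaves.
tree = KPINode("Cost per resolved ticket", children=[
    KPINode("Deflection rate", realized=0.31, prior=0.24, target=0.35,
            source_definition="CRM export: tickets closed without agent touch"),
    KPINode("Handle time (min)", realized=6.2, prior=7.8, target=6.0,
            source_definition="Telephony platform: mean minutes per ticket",
            evidence_note="Source retains 12 months of records; older periods not re-computable"),
])

for leaf in tree.leaves():
    print(leaf.name, leaf.realized, leaf.evidence_note or "reproducible from source")
```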
Section 3 — Counterfactual narrative
The counterfactual narrative is the heart of the VRR. It states what would have happened without the AI, by what method, with what confidence. The counterfactual methods from Articles 18–23 — A/B testing, difference-in-differences, regression discontinuity, synthetic control, propensity-score matching — each produce a counterfactual estimate. The narrative section names the method, shows the estimate, and reports uncertainty bands at the 10th, 50th, and 90th percentiles.
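A sketch of how those percentile bands might be produced for a difference-in-differences estimate, using a plain bootstrap; the synthetic observations stand in for real treated and control data, and every number is illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences: treated group's change minus control group's change."""
    return (treated_post.mean() - treated_pre.mean()) - (control_post.mean() - control_pre.mean())

def bootstrap_bands(treated_pre, treated_post, control_pre, control_post, n_boot=2000):
    """Resample each group with replacement; report the 10th/50th/90th percentiles."""
    estimates = []
    for _ in range(n_boot):
        estimates.append(did_estimate(
            rng.choice(treated_pre, size=len(treated_pre), replace=True),
            rng.choice(treated_post, size=len(treated_post), replace=True),
            rng.choice(control_pre, size=len(control_pre), replace=True),
            rng.choice(control_post, size=len(control_post), replace=True),
        ))
    return np.percentile(estimates, [10, 50, 90])

# Illustrative per-unit value observations (e.g. weekly savings per team, in $k).
treated_pre  = rng.normal(100, 12, 60)
treated_post = rng.normal(118, 12, 60)
control_pre  = rng.normal(101, 12, 60)
control_post = rng.normal(106, 12, 60)

p10, p50, p90 = bootstrap_bands(treated_pre, treated_post, control_pre, control_post)
print(f"DiD incremental effect: P10={p10:.1f}  P50={p50:.1f}  P90={p90:.1f}")
```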
Where the counterfactual is weak — for instance, an enterprise-wide copilot rollout with no untreated geography, so only a synthetic control is feasible — the weakness is disclosed. A weak counterfactual with honest disclosure is more credible than a strong counterfactual presented without uncertainty bands. The Dutch Toeslagenaffaire inquiry was particularly pointed about this: the most damning part of the report was not that the algorithm caused harm, but that no counterfactual had ever been constructed to test whether the algorithm was improving outcomes at all.2
Section 4 — Financial summary
The financial summary presents realized value, total cost of ownership (TCO) from Article 8, risk-adjusted net present value (rNPV) from Article 7, and the payback-to-date trajectory. Each line is footnoted to its source — the cost line to the FinOps cost export, the value line to the counterfactual analysis, the rNPV to the sensitivity model. Footnote traceability is what separates audit-grade financial summaries from management summaries.
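A compressed sketch of the arithmetic behind those lines; the rNPV here is a simplified probability-weighted discounting rather than the full Article 7 treatment, and all figures are invented for illustration:

```python
def payback_to_date(quarterly_value: list[float], quarterly_cost: list[float]) -> list[float]:
    """Cumulative net position per quarter; payback is the first quarter it turns positive."""
    cumulative, running = [], 0.0
    for value, cost in zip(quarterly_value, quarterly_cost):
        running += value - cost
        cumulative.append(round(running, 2))
    return cumulative

def rnpv(expected_cash_flows: list[float], realization_prob: float, discount_rate: float) -> float:
    """Simplified risk-adjusted NPV: probability-weighted flows, discounted per period."""
    return sum(
        realization_prob * cf / (1 + discount_rate) ** t
        for t, cf in enumerate(expected_cash_flows, start=1)
    )

# Illustrative figures, in $M.
value = [0.8, 1.6, 2.4, 2.9]   # realized incremental value per quarter
cost  = [1.9, 1.1, 0.9, 0.9]   # TCO per quarter (build + run)
print("Payback trajectory:", payback_to_date(value, cost))
print("rNPV of remaining plan:", round(rnpv([2.5, 2.5, 2.5, 2.5], 0.7, 0.03), 2))
```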
The Klarna 2024 investor disclosure is a useful example of how, and how not, to present this section. The company disclosed the unit-economics impact of AI customer-service automation; the press treated the disclosure as a clean win; independent analysts pointed out that the disclosure did not clearly separate cost savings from migration impacts.3 A well-structured financial summary anticipates that challenge: it reports realized cost reduction, migration impacts, and net incremental savings as separate lines.
Section 5 — Risk flags
The risk flags section lists every risk that has materialized, every risk that is trending toward materialization, and every risk that has been resolved since the last reporting period. The section maps to the COMPEL risk taxonomy and to the EU AI Act classification where applicable. Three categories are standard: value risks (drift, adoption erosion, attribution challenge), operational risks (cost overrun, evaluation-harness gap, incident), and governance risks (regulatory change, control gap, audit finding).
Risk flags are not hedged language. “May” and “could” and “potentially” are not risk flag prose; they are marketing prose. A properly flagged risk names the probability band, the impact band, the owner, and the mitigation status. The NIST AI RMF MEASURE function provides the vocabulary for risk measurement; the VRR risk section is the place where that vocabulary meets executive decision-making.4
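One plausible shape for a risk flag record that meets that standard; the band definitions, field names, and the example risk are assumptions, not a mandated schema:

```python
from dataclasses import dataclass
from enum import Enum

class Band(Enum):
    LOW = "low"        # e.g. <10% probability, or <$1M impact
    MEDIUM = "medium"
    HIGH = "high"

@dataclass
class RiskFlag:
    risk: str                # what could go wrong, stated plainly
    category: str            # value / operational / governance
    probability: Band
    impact: Band
    owner: str               # a named accountable owner, not a team alias
    mitigation_status: str   # what is being done about it, and by when

flag = RiskFlag(
    risk="Inference cost overrun if token volume grows faster than the 20% plan assumption",
    category="operational",
    probability=Band.MEDIUM,
    impact=Band.HIGH,
    owner="Head of Platform Engineering",
    mitigation_status="Caching rollout funded; re-measure unit cost at next VRR",
)
print(flag)
```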
Section 6 — Recommendation
The recommendation is either “continue,” “change,” or “stop.” A recommendation that is “continue with minor adjustments” is still a continue; a recommendation that is “continue but we have concerns” is still a continue, with the concern elevated to a risk flag in section 5. The recommendation is categorical in direction; its amplitude appears in the supporting evidence, not in the recommendation language.
The recommendation section ends with a decision log entry: who decided, what, on what date, with what dissent recorded. Decision-log traceability is an ISO 42001 Clause 9.3 management-review requirement.5 Omission is a control gap that auditors will find and name.
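A minimal form the decision-log entry might take; the field names and contents are illustrative:

```python
from datetime import date

# Illustrative decision-log entry appended to Section 6 of the VRR.
decision_log_entry = {
    "decision": "continue",                      # continue / change / stop
    "decided_by": "AI Investment Committee",
    "decided_on": date(2025, 4, 17).isoformat(),
    "dissent": "CFO delegate noted the counterfactual relies on synthetic control only",
    "evidence": "VRR for the reporting period, Sections 2-4",
}
print(decision_log_entry)
```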
Evidence standards
VRR evidence must clear three bars. The first is reproducibility: an independent analyst must be able to re-compute every number from the stated source data. The second is traceability: every claim must cite its source document, its data-source definition, and its counterfactual method. The third is disclosability: gaps, weaknesses, and dissents are disclosed in the report, not hidden in appendices only the author has read.
The US Navy “Task Force Hopper” AI program is a public-sector example of the evidence standard in practice. GAO reporting has noted that the program publishes structured measurement plans and subsequently reports realized value against those plans on a predictable cadence.6 The cadence discipline, not any particular measurement result, is what the GAO’s AI Accountability Framework commends.
Delivery mechanics
A VRR is delivered at four audience levels with the same underlying evidence. The executive summary is the CEO-and-CFO version. The full report is the audit-committee version. The investor version takes the executive summary, removes the operational detail, and adds the corporate-financial context. The regulator version takes the full report and adds the compliance mapping (ISO 42001, EU AI Act, sector rules).
Four audience levels, one evidence base, zero contradictions. When the CFO and the investor relations head read different numbers, they are reading inconsistent reports, not different reports. Inconsistent reports are the single greatest reputational risk in AI value communication; the VRR structure eliminates them.
Cross-reference to Core Stream
The VRR builds on Core Stream foundations at:
- EATP-Level-2/M2.5-Art09-Value-Realization-Reporting-and-Communication.md#reporting-structure — practitioner-level VRR structure.
- EATF-Level-1/M1.2-Art24-Control-Performance-Report.md#section-structure — parallel artifact for control-effectiveness reporting; VRRs and CPRs are co-consumed by audit committees.
- EATF-Level-1/M1.2-Art05-Evaluate-Measuring-Transformation-Progress.md — Evaluate stage methodology that governs VRR cadence.
Self-check
- A VRR executive summary claims $14M in realized value. Section 3 shows a counterfactual of $9M. What is the correct claim, and what section 1 change is required?
- An auditor cannot re-compute a metric in the KPI tree because the source system does not retain 18-month-old records. What must the VRR disclose, and under which evidence standard?
- A risk flag reads “Customer adoption could potentially slow if competing tools emerge.” Rewrite this to meet the risk-flag evidence standard.
- An investor-version VRR differs from the audit-committee version in the realized-value number. What has gone wrong, and which structural rule has been violated?
Further reading
- ISO/IEC 42001:2023 Clause 9.3 — Management review outputs.
- NIST AI RMF MEASURE function — MEASURE 2.1 through MEASURE 2.13.
- GAO AI Accountability Framework, GAO-21-519SP — Performance monitoring.
Footnotes
1. McKinsey & Company, The State of AI in 2024 (2024). https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai
2. Parlementaire ondervragingscommissie Kinderopvangtoeslag, Ongekend Onrecht (Dutch parliamentary inquiry final report, December 2020). https://www.tweedekamer.nl/kamerstukken/detail?id=2020D51917
3. Klarna Bank AB, Investor Relations AI customer-service disclosure (2024). https://www.klarna.com/international/press/
4. National Institute of Standards and Technology, AI Risk Management Framework (AI RMF 1.0), NIST AI 100-1 (January 2023), MEASURE function §4. https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-1.pdf
5. International Organization for Standardization, ISO/IEC 42001:2023 — Artificial intelligence management system, Clause 9.3 (2023). https://www.iso.org/standard/81230.html
6. US Government Accountability Office, Artificial Intelligence: An Accountability Framework for Federal Agencies and Other Entities, GAO-21-519SP (June 2021). https://www.gao.gov/products/gao-21-519sp