AITE M1.3-Art06 · v1.0 · Reviewed 2026-04-06 · Open Access

Structuring the AI Business Case


11 min read · COMPEL Specialization — AITE-VDT: AI Value & Analytics Expert · Article 6 of 35


A value lead inherits a business case presented to the executive committee six months ago. The document is thirty-four slides. It contains industry benchmarks, vendor claims, three architecture diagrams, and a single financial slide that promises a 340% three-year ROI. The case has been approved and the programme is spending at run rate. The lead is now expected to report against the case at the next quarterly review, and discovers that the case has no hypothesis, no risk profile, no sensitivity analysis, and no explicit decision rule. It is a pitch deck that was treated as a business case. Reporting against a pitch deck is impossible; the document was built to get approval, not to be measured against. This article opens Unit 2 (Financial modelling for AI) by specifying the six-part structure of an AI business case, calibrating claim strength against evidence, and naming the three common failure modes that reduce a case from an accountability document to a marketing artefact.

The six-part structure

An AI business case has six parts. Each is compulsory, each fits on one or two pages, and each has a specific function in the document’s life. Omitting a part is not a convenience; it is a guarantee that the case will not survive the first serious challenge.

Part one — Hypothesis. The testable business statement the programme is built on. “If we deploy an AI contract-review copilot to our 180-person commercial legal team, we will compress first-draft turnaround from four days to one without reducing quality, producing a sustained reduction in outside counsel spend of at least US$3.5M annually.” A hypothesis is specific enough to be falsifiable. A “strategic AI transformation programme” is not a hypothesis; it is an aspiration.

Part two — Investment. The total investment across the feature’s life, not just the build phase. Build cost, run cost, governance cost, integration cost, change-management cost, and the organisation’s opportunity cost of the time committed. The investment section terminates in a number (or a range) that is comparable to the benefit section’s number. Apples to apples.
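
To make “apples to apples” concrete, here is a minimal sketch of an investment section terminating in a single comparable number. All figures are hypothetical illustrations, not COMPEL reference values.

```python
# Minimal sketch of the investment section's terminating number.
# All figures are hypothetical; the point is that six named cost
# categories collapse into one number comparable to the benefit.

investment = {
    "build": 900_000,
    "run_3yr": 1_100_000,
    "governance": 250_000,
    "integration": 350_000,
    "change_management": 300_000,
    "opportunity_cost": 100_000,
}

total_investment = sum(investment.values())
print(f"Total three-year investment: US${total_investment:,.0f}")
```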

Part three — Benefit. The quantified benefit the hypothesis, if true, will produce. Benefit is expressed in the same units as the investment — dollars for dollars, time for time, risk-reduction for risk-exposure — and is decomposed into the specific mechanisms that produce it. A benefit of “US$3.5M reduction in outside counsel spend” is backed by a line-item breakdown: how many matters go to outside counsel today, what portion will be redirected internally under the programme, what the per-matter cost reduction is, and what the adoption curve looks like over the first three years.
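
A minimal sketch of that line-item decomposition, with hypothetical inputs chosen so the product lands near the US$3.5M claim; each factor in the multiplication is a named, challengeable assumption.

```python
# Sketch of a benefit decomposition (all inputs hypothetical).
# Annual benefit = matters redirected in-house x per-matter saving,
# scaled by an adoption curve over the first three years.

matters_per_year = 900                 # matters sent to outside counsel today
redirect_share = 0.30                  # portion redirected internally
saving_per_matter = 13_000             # US$ cost reduction per redirected matter
adoption_curve = [0.40, 0.75, 0.90]    # share of target adoption, years 1-3

steady_state = matters_per_year * redirect_share * saving_per_matter
for year, adoption in enumerate(adoption_curve, start=1):
    print(f"Year {year}: US${steady_state * adoption:,.0f}")
print(f"Steady state: US${steady_state:,.0f}")
```

At these illustrative inputs the steady state is roughly US$3.5M, which is exactly the traceability the decomposition exists to provide: a challenger can attack the 900, the 30%, or the US$13,000 individually.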

Part four — Risk profile. The risks that could cause the hypothesis to be false, the investment to exceed plan, or the benefit to underperform. A risk profile is not a generic risk register; it is the specific, case-level risks. Model risk — the copilot’s drafts fail legal review too often. Change-management risk — the associates do not adopt the tool. Data risk — the precedent corpus is insufficient. Vendor risk — the model pricing changes. Each risk is scored for likelihood and impact, and each has a mitigation.
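
A sketch of a case-level risk profile as a small data structure, ranked by expected exposure (likelihood × impact). Scores, dollar impacts, and mitigations are all hypothetical.

```python
# Sketch of a case-level risk profile (all scores hypothetical).
# Each risk carries a likelihood, a dollar impact on the benefit,
# and a named mitigation; expected exposure = likelihood * impact.

risks = [
    {"name": "Model risk: drafts fail legal review too often",
     "likelihood": 0.30, "impact_usd": 2_000_000,
     "mitigation": "human review gate until error rate clears threshold"},
    {"name": "Change-management risk: associates do not adopt",
     "likelihood": 0.40, "impact_usd": 1_500_000,
     "mitigation": "workflow integration plus adoption targets"},
    {"name": "Data risk: precedent corpus is insufficient",
     "likelihood": 0.20, "impact_usd": 1_000_000,
     "mitigation": "corpus audit before build"},
    {"name": "Vendor risk: model pricing changes",
     "likelihood": 0.25, "impact_usd": 400_000,
     "mitigation": "contract price caps, multi-vendor fallback"},
]

for r in sorted(risks, key=lambda r: r["likelihood"] * r["impact_usd"], reverse=True):
    exposure = r["likelihood"] * r["impact_usd"]
    print(f"US${exposure:>9,.0f}  {r['name']}")
```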

Part five — Financial summary. The risk-adjusted financial case. Payback period, risk-adjusted NPV (rNPV, Article 7), three-year TCO (Article 8), and the sensitivity cases (Article 11). The financial summary is the part the CFO will read first; it must stand on its own as a one-page document.
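
A compressed sketch of the arithmetic behind the financial summary; the figures are hypothetical, and Articles 7, 8, and 11 develop rNPV, TCO, and sensitivity in full.

```python
# Sketch of the financial-summary arithmetic (figures hypothetical;
# investment is treated as spent up front for simplicity).

investment = 3_000_000                           # three-year TCO
cash_flows = [1_400_000, 2_600_000, 3_200_000]   # risk-adjusted benefit, years 1-3
discount_rate = 0.12

# Risk-adjusted NPV: discount each year's risk-adjusted flow, net of investment.
rnpv = sum(cf / (1 + discount_rate) ** t
           for t, cf in enumerate(cash_flows, start=1)) - investment

# Payback: first year in which cumulative undiscounted flows cover the investment.
cumulative, payback_year = 0, None
for year, cf in enumerate(cash_flows, start=1):
    cumulative += cf
    if payback_year is None and cumulative >= investment:
        payback_year = year

print(f"rNPV: US${rnpv:,.0f} | payback: year {payback_year}")
```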

Part six — Recommendation. The decision the business case is asking for — approve, approve with conditions, reject, re-scope — and the expected next-stage commitments. The recommendation includes the decision rule that will govern the next stage review (stage-gate re-evaluation at twelve weeks, based on measured adoption and realisation rates). A case without an explicit recommendation is a case the reader must construct, and the reader will construct the recommendation most convenient to their existing preference.
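
One way to keep the decision rule explicit is to record it as a checkable artefact at approval time, so the twelve-week review is a mechanical comparison rather than a renegotiation. The thresholds below are hypothetical.

```python
# Sketch of a recommendation's decision rule as data (thresholds hypothetical).

decision_rule = {
    "review_week": 12,
    "min_adoption_rate": 0.50,     # share of the team actively using the copilot
    "min_realisation_rate": 0.60,  # measured benefit vs. the case's year-1 path
}

def stage_gate(adoption: float, realisation: float) -> str:
    """Apply the rule written into the case at approval time."""
    if (adoption >= decision_rule["min_adoption_rate"]
            and realisation >= decision_rule["min_realisation_rate"]):
        return "continue to next stage"
    return "re-scope or stop: re-open the case before further spend"

print(stage_gate(adoption=0.44, realisation=0.71))  # -> re-scope or stop
```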

[DIAGRAM: StageGateFlow — business-case-six-parts — horizontal flow showing the six parts in sequence (hypothesis → investment → benefit → risk profile → financial summary → recommendation), each stage annotated with its function (“what”, “cost”, “gain”, “what could go wrong”, “the math”, “the ask”); primitive teaches the logical build of the case in one visual.]

Calibrating claim strength against evidence

A business case is only as strong as the evidence behind its claims. Three evidence tiers, drawn from consulting and finance practice, map cleanly onto AI business cases.

Tier one — demonstrated. The claim rests on evidence from a similar feature in the organisation itself, with a defensible counterfactual. “Our Q2 pilot of this copilot on a 30-person sub-team produced the quantified effect.” Tier-one evidence is rare at business-case time, because most cases are written before a pilot has produced evidence. Where tier-one evidence exists, it is the strongest foundation.

Tier two — analogous. The claim rests on evidence from a comparable deployment at another organisation, with a documented outcome. “GitHub Copilot’s controlled trial (Peng et al., arXiv 2023) demonstrated a 55.8% task-completion time reduction on a scoped coding task.”1 The GitHub study is the canonical analogous-evidence reference; analogous evidence is acceptable if the comparability is explicit and the transferability is argued rather than assumed.

Tier three — benchmarked. The claim rests on cross-industry benchmarks from a reputable source. “McKinsey’s State of AI 2024 survey reports that leading AI adopters attribute measurable financial lift to AI in a majority of deployed functions.”2 Benchmarked evidence is the weakest tier because the benchmark’s applicability to the specific feature is usually contested. Cases that rely only on tier-three evidence are vulnerable to the CFO’s standard challenge: why should our organisation sit at the favourable end of the benchmark distribution rather than in its middle?

A business case mixes tiers deliberately, and names the tier for each claim. The overall case’s credibility is bounded by the weakest claim on which the case’s recommendation depends. A recommendation that depends critically on a tier-three assumption is a recommendation that depends on faith.
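
The bounding logic can be made mechanical. In the hypothetical sketch below, the case’s effective tier is the weakest tier among the claims the recommendation critically depends on.

```python
# Sketch: case credibility is bounded by the weakest evidence tier on the
# recommendation's critical path (claims and tier assignments hypothetical).

TIER_STRENGTH = {"demonstrated": 3, "analogous": 2, "benchmarked": 1}

critical_claims = {
    "per-matter saving": "demonstrated",      # e.g. from an internal pilot
    "turnaround compression": "analogous",    # e.g. the Copilot trial
    "adoption curve": "benchmarked",          # cross-industry survey only
}

weakest = min(critical_claims, key=lambda c: TIER_STRENGTH[critical_claims[c]])
print(f"Credibility bounded by '{weakest}' ({critical_claims[weakest]})")
```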

Three common failure modes

Three failure modes recur in business cases shipped under pressure.

Optimism bias. The benefit is estimated at its upper plausible bound, the investment at its lower plausible bound, and neither is stress-tested against realistic conditions. Optimism bias is easy to detect if the practitioner compares the case’s assumptions against industry benchmarks. If the case assumes an 80% adoption rate while comparable rollouts are reporting 45–60%, the case has embedded an optimistic adoption curve without defending the choice. BCG’s AI at Scale research consistently emphasises the realistic-benchmark discipline — setting adoption and outcome assumptions at the median of documented comparable programmes rather than the peak.3 The default is median, with deviations justified.
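
A sketch of the realistic-benchmark check, using hypothetical comparables: any assumption above the documented range, or above the median default, is flagged for explicit justification.

```python
# Sketch of the optimism-bias check (all figures hypothetical).

import statistics

benchmark_adoption = [0.45, 0.48, 0.52, 0.55, 0.60]  # comparable rollouts
case_assumption = 0.80                               # the case's adoption rate

median = statistics.median(benchmark_adoption)
if case_assumption > median:
    ceiling = max(benchmark_adoption)
    print(f"Assumption {case_assumption:.0%} is above the comparable median "
          f"{median:.0%} (range top {ceiling:.0%}): justify the deviation "
          f"or revise toward the median.")
```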

Anchor bias. The case’s numbers are anchored to a figure someone senior has already expressed publicly — the CEO’s “AI will cut costs by 30%” town-hall announcement, for example — and the case then works backwards to produce a number consistent with the anchor. Anchor bias is detectable by reading the case and asking where each number came from. Numbers that cite the anchor (even indirectly) and numbers with no independent derivation are candidates. The corrective is a blind-build discipline: estimate the benefit from first principles before looking at the anchor, then explain any divergence.

Sunk-cost dressing. The case justifies continued investment by referring to the investment already made. “We’ve already spent US$4.2M and are 70% complete; the remaining US$1.8M produces an attractive ROI.” Sunk cost is irrelevant to the forward-looking decision the case should answer. A case that depends on sunk cost is answering the wrong question. The corrective is to rewrite the financial summary with a forward-only perspective: what does this investment cost from today, what does it produce from today, and what is the opportunity cost of the capital.
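
A sketch of the forward-only rewrite, using the sunk-cost figures from the example above; the forward benefit figure is hypothetical.

```python
# Sketch of the forward-only view (forward benefit hypothetical).
# Sunk spend is recorded but excluded from the decision arithmetic.

sunk_spend = 4_200_000        # already spent: irrelevant to the decision
remaining_cost = 1_800_000    # what the programme costs from today
forward_benefit = 2_400_000   # risk-adjusted benefit attainable from today

# The only question the case should answer: does US$1.8M from today
# produce more than US$1.8M of risk-adjusted value from today?
forward_net = forward_benefit - remaining_cost
print(f"Forward-only net: US${forward_net:,.0f} "
      f"(sunk US${sunk_spend:,.0f} excluded)")
```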

Each failure mode is detectable with a cold read. A practitioner new to a case should spend an hour reading it cold, marking every assumption as tier one, two, or three, and noting any sentence that relies on optimism, an anchor, or sunk cost. The resulting margin notes are the agenda for the revision.

The case’s relationship to the measurement plan

The business case and the measurement plan introduced in Article 4 are different documents with different audiences. The case justifies the investment to the approving authority; the plan instruments the measurement that will show whether the case’s hypothesis was true. They share data, they reference each other, and they are written by overlapping teams, but they are not the same document.

A practitioner who conflates them produces a case that cannot be measured against (the plan’s decision rule is missing) and a plan that cannot be explained to the sponsor (the case’s context is missing). The AITE-VDT discipline is to maintain both as versioned artefacts, with the case referencing the plan by version for each measurable claim.

[DIAGRAM: HubSpokeDiagram — business-case-evidence-at-hub — central hub “Hypothesis” with spokes labelled by evidence tier and by business-case part (investment evidence, benefit evidence, risk evidence, financial-summary evidence); each spoke annotated with the tier (demonstrated, analogous, benchmarked) applicable to its claim; primitive teaches evidence-tier mapping across the case.]

Worked example — the MIT Sloan value-lead case

The MIT Sloan Management Review / BCG State of AI at Work research series has published multiple case studies of value-lead roles in 2020–2025. The recurring finding is that organisations that produce documented, six-part business cases are disproportionately represented in the “AI value capturers” quadrant of the survey.4 The research does not prove that six-part cases cause value capture, but it is consistent with the claim that a discipline of structured cases correlates with the measurement discipline the rest of value capture depends on.

The counter-example is the enterprise with a “slide-deck culture” — where business cases are produced as slide-ware for the decision meeting rather than as standing documents. The MIT Sloan / BCG survey associates this pattern with the “AI aspiration without capture” quadrant. Slide decks accomplish the approval; they do not survive the three-year life of the programme whose approval they secured.

The case as a standing, versioned document

A business case is not a one-time artefact. It is a standing document that is re-opened at every stage-gate review (Article 31) and re-versioned whenever the feature’s scope, budget, outcome targets, or risk profile materially change. An organisation in which business cases live only in the approval meeting’s slide deck and are never re-opened is an organisation whose programmes cannot be measured against their own original promises.

The practitioner habit is to store the case in the same location as the measurement plan, with visible version history. Each stage-gate review produces a case-update deliverable: changed assumptions since the last version, the reason for each change, and the impact on the recommendation. An approved case that has not been re-opened in nine months is a candidate for review regardless of the feature’s status.

Summary

An AI business case is a six-part document — hypothesis, investment, benefit, risk profile, financial summary, recommendation. Claims are calibrated against evidence in three tiers (demonstrated, analogous, benchmarked), and the weakest tier bounds the case’s overall credibility. Three failure modes — optimism, anchor bias, sunk-cost dressing — are detectable by cold read and correctable by discipline. The case references the measurement plan for every measurable claim and is re-versioned at every stage-gate. Articles 7 through 11 develop the financial sub-disciplines the case depends on, beginning with rNPV.


Cross-references to the COMPEL Core Stream:

  • EATP-Level-2/M2.5-Art14-Building-the-AI-Business-Case-Beyond-Simple-ROI.md — practitioner-level business case methodology the six-part structure extends
  • EATP-Level-2/M2.5-Art04-Business-Value-and-ROI-Quantification.md — ROI quantification methodology embedded in Part 5 (Financial summary)
  • EATF-Level-1/M1.2-Art17-AI-Operating-Model-Blueprint.md — operating-model blueprint that business cases contribute to at organisation level


© FlowRidge.io — COMPEL AI Transformation Methodology. All rights reserved.

Footnotes

  1. Sida Peng, Eirini Kalliamvakou, Peter Cihon, and Mert Demirer, “The Impact of AI on Developer Productivity: Evidence from GitHub Copilot”, arXiv preprint arXiv:2302.06590 (February 2023), https://arxiv.org/abs/2302.06590 (accessed 2026-04-19).

  2. McKinsey & Company, “The state of AI in early 2024: Gen AI adoption spikes and starts to generate value” (May 30, 2024), https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai (accessed 2026-04-19).

  3. Boston Consulting Group, “AI at Scale” research series, https://www.bcg.com/capabilities/artificial-intelligence/ai-at-scale (accessed 2026-04-19).

  4. MIT Sloan Management Review and Boston Consulting Group, “Expanding AI’s Impact with Organizational Learning” (October 2020) and follow-on “State of AI at Work” series (2020–2025), https://sloanreview.mit.edu/projects/expanding-ais-impact-with-organizational-learning/ (accessed 2026-04-19).