AITE M1.3-Art08 v1.0 Reviewed 2026-04-06 Open Access

Total Cost of Ownership for AI



COMPEL Specialization — AITE-VDT: AI Value & Analytics Expert · Article 8 of 35


A value lead reviews the financial performance of an AI customer-service copilot six months after launch. The business case projected US$1.1M of annual run cost. The actual annual cost, extrapolated from the first six months, is pacing at US$2.7M. The gap is not a single failure; it is the compound of five cost components the original business case either under-estimated or omitted entirely. The build cost was itself 30% over — a predictable pattern for first-of-kind features. The run cost, driven by token volume exceeding projections, is 220% of plan (US$2.42M annualised). The refresh cost, for the quarterly re-training the initial case had not scoped, is an unexpected US$280K annualised. The governance cost — evaluations, audits, control reporting — was invisible to the original case. The retirement reserve, which should have been accrued from day one for the eventual sunset, was never funded. The business case told a single-year story when the real financial exposure spanned five cost categories over the feature’s life. This article teaches the practitioner to decompose AI Total Cost of Ownership, to account for the hidden components, and to project three-year TCO with uncertainty bands.

The five components of AI TCO

AI TCO is a compound of five components. Each is independently variable, each has a characteristic cost driver, and the ratio among them changes across the feature’s life in ways that the business case must anticipate.

Build cost. The one-time cost to develop the feature to production readiness — design, development, data engineering, model training, integration, testing, user acceptance. Build cost is the component most business cases estimate well, because it resembles classical software-project estimation.

Run cost. The ongoing cost to operate the feature at scale — compute, storage, tokens, API fees, platform licences, data-pipeline operation, human-in-the-loop labour. For generative-AI features, run cost now typically exceeds cumulative build cost within 12–24 months of launch, a pattern the FinOps Foundation’s 2024 FinOps for AI technical paper documents across its member survey.1 The paper describes run cost as the “emerging dominant dimension” of AI TCO and notes that organisations building mature AI programmes now front-load run-cost observability in the build phase rather than adding it later.

Refresh cost. The periodic cost to re-train, re-fine-tune, re-evaluate, or re-integrate the feature as data drifts, environment changes, or capability improvements become available. Refresh cost is continuous for most AI features and lumpy for a few; both patterns require explicit budget. A feature whose refresh cost is zero has probably stopped being refreshed and is drifting toward irrelevance — Article 25’s drift-detection discipline is the corrective.

Governance cost. The ongoing cost of control operation — evaluations, audits, control-performance reporting, ISO 42001 management reviews, regulatory submissions, incident response. Governance cost is the component most business cases omit or badly under-estimate. A mature AI programme running under ISO/IEC 42001:2023 will spend 8–15% of its combined build-plus-run cost on governance, a range documented in the FinOps for AI paper and consistent with audit-firm reporting on AI management-system costs.1

Retire cost. The eventual cost to decommission the feature — model unloading, data-retention discharge, regulatory deletion obligations, user migration, documentation archival. Retire cost is typically reserved at 3–7% of cumulative run cost, accrued across the feature’s life so the eventual sunset has a pre-funded budget rather than a discretionary one. Article 32 treats the sunset case in depth.
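
A minimal sketch of the decomposition as a data structure can make the roll-up concrete before the lifecycle diagram below. Every figure is hypothetical, chosen only so the governance line sits near 10% of build-plus-run and the retire accrual at 5% of run, inside the ranges quoted above; nothing here is a benchmark.

```python
from dataclasses import dataclass

@dataclass
class TcoYear:
    """One year of a feature's TCO, decomposed into the five components (US$)."""
    build: float = 0.0     # one-time development to production readiness
    run: float = 0.0       # compute, tokens, licences, human-in-the-loop labour
    refresh: float = 0.0   # re-training, re-evaluation, migration
    govern: float = 0.0    # evaluations, audits, control reporting
    retire: float = 0.0    # reserve accrued for the eventual sunset

    @property
    def total(self) -> float:
        return self.build + self.run + self.refresh + self.govern + self.retire

# Illustrative three-year profile for a hypothetical copilot feature.
years = [
    TcoYear(build=1_800_000, run=600_000,   refresh=0,       govern=240_000, retire=30_000),
    TcoYear(build=0,         run=2_400_000, refresh=280_000, govern=240_000, retire=120_000),
    TcoYear(build=0,         run=2_700_000, refresh=300_000, govern=270_000, retire=135_000),
]

for i, y in enumerate(years, start=1):
    print(f"Year {i}: US${y.total:,.0f}")
print(f"Three-year TCO: US${sum(y.total for y in years):,.0f}")
```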

[DIAGRAM: StageGateFlow — five-component-tco-lifecycle — horizontal lifecycle flow (year 0 build → years 1–N run + refresh + governance → sunset retire); each stage annotated with its dominant cost driver; cost intensity shown as a bar-height for each year; primitive teaches the TCO’s temporal shape.]

Why run cost dominates the GenAI portfolio

Generative-AI features have a run-cost structure that is genuinely different from classical software. The cost per user transaction is not negligible; it is meaningful, and it scales with usage in ways that compress margin if not actively managed. A customer-service copilot generating 5,000 tokens per conversation across 200,000 monthly conversations at US$15 per million tokens produces a monthly token cost of US$15,000 — small only in the sense that scaling the programme 10x produces US$150,000 per month. Article 10 treats token economics in detail; the Article 8 point is that run cost is a first-class TCO component for generative features in a way it was not for classical predictive ML.
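
The arithmetic is worth making explicit; a minimal sketch restating the copilot example above:

```python
# Monthly token cost for the copilot example in the text.
tokens_per_conversation = 5_000
conversations_per_month = 200_000
price_per_million_tokens = 15.00  # US$

monthly_tokens = tokens_per_conversation * conversations_per_month    # 1.0B tokens
monthly_cost = monthly_tokens / 1_000_000 * price_per_million_tokens  # US$15,000

print(f"{monthly_tokens / 1e9:.1f}B tokens/month -> US${monthly_cost:,.0f}/month")
print(f"At 10x scale: US${monthly_cost * 10:,.0f}/month")
```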

The McKinsey 2024 State of AI report specifically documents this pattern, noting that a majority of surveyed organisations with generative-AI deployments reported run-cost exceeding build-cost by the end of the first year of operation.2 The report does not recommend a specific ratio but emphasises that first-year run-cost forecasting is the leading predictor of whether the full TCO will meet the business case.

The TCO decomposition across cloud providers

AI features frequently deploy across multiple cloud providers. The TCO model must decompose cost across providers to enable cost-aware operational choices. The AITE-VDT neutrality discipline requires worked examples across at least two of the three dominant providers.

On AWS, an AI feature’s TCO components map to SageMaker or Bedrock for build and inference, EC2 and S3 for data pipelines, Amazon CloudWatch and AWS Cost Explorer for observability, and the AWS Cost and Usage Report (CUR) for granular cost attribution. Open-source OpenCost on EKS provides unit-cost decomposition if the feature runs on Kubernetes.
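
As one hedged illustration of granular attribution on AWS, the Cost Explorer API can group spend by a cost-allocation tag. The `ai-feature` tag key below is an assumption (any activated tag key works the same way), and the time window is arbitrary:

```python
import boto3

# Monthly unblended cost grouped by a hypothetical "ai-feature"
# cost-allocation tag, via the Cost Explorer API.
ce = boto3.client("ce")
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2026-01-01", "End": "2026-04-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "TAG", "Key": "ai-feature"}],
)

for period in response["ResultsByTime"]:
    for group in period["Groups"]:
        tag_value = group["Keys"][0]  # e.g. "ai-feature$cs-copilot"
        amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
        print(period["TimePeriod"]["Start"], tag_value, f"US${amount:,.2f}")
```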

On Azure, the equivalent stack uses Azure Machine Learning or Azure AI Foundry for build and inference, Azure Blob Storage and Data Factory for pipelines, Azure Monitor and Cost Management for observability, and the Azure Cost Analysis API for attribution. Kubecost on AKS provides the same unit-cost decomposition OpenCost offers on AWS.

On GCP, the stack uses Vertex AI for build and inference, Cloud Storage and Dataflow for pipelines, Cloud Monitoring and Billing Reports for observability, and BigQuery’s Billing Export for attribution. OpenCost also runs on GKE for container-level decomposition.

A TCO model that decomposes across at least two of these providers — AWS and Azure for a multi-cloud enterprise, or GCP and on-premise for a hybrid shop — produces the cost transparency the FinOps discipline depends on. Article 27 treats FinOps for AI in detail; the TCO practitioner’s role is to provide the decomposition the FinOps discipline then operates on.

The hidden components business cases miss

Three components are systematically under-estimated or omitted in first-pass business cases. The value lead reviewing an existing case should look explicitly for each.

Data preparation and labelling cost. For supervised-learning features, the cost of labelled training and evaluation data is often larger than the model-training compute cost. For generative features, the cost of curating grounding corpora and evaluation datasets is similarly non-trivial. Data labelling is frequently treated as a one-time build cost; it is usually an ongoing cost as labels need refreshing for new cases, corrections, and drift.

Human-in-the-loop operational cost. Many AI features ship with ongoing human review — content moderation, output verification, exception handling. The human labour is itself part of the TCO; a feature that shifts 60% of the work to the AI but retains a reviewer on every item has a different TCO than one that shifts 90% with sampling-based review.
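
A minimal sketch of that comparison, with illustrative volumes and a hypothetical fully loaded review cost of US$4.00 per item:

```python
# 60%-deflection feature with full review vs. 90%-deflection with 10% sampling.
# All rates and costs are illustrative assumptions.
items_per_month = 100_000
review_cost_per_item = 4.00  # US$, fully loaded reviewer labour

def hitl_monthly_cost(ai_share: float, review_rate: float) -> float:
    """Cost of human review applied to AI-handled items at a sampling rate."""
    return items_per_month * ai_share * review_rate * review_cost_per_item

full_review = hitl_monthly_cost(ai_share=0.60, review_rate=1.00)  # US$240,000
sampled     = hitl_monthly_cost(ai_share=0.90, review_rate=0.10)  # US$36,000

print(f"60% shift, full review:  US${full_review:,.0f}/month")
print(f"90% shift, 10% sampling: US${sampled:,.0f}/month")
```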

Compliance and audit cost. Regulated deployments incur ongoing compliance cost — EU AI Act high-risk system audits, ISO 42001 management reviews, sector-specific regulatory submissions. The EU Commission’s impact assessment for the AI Act (published 2021, updated 2023) projects compliance costs in the low-to-mid six-figure range annually per high-risk AI system for large enterprises.3 A mid-sized enterprise with eight high-risk systems will see compliance costs in the seven-figure range, which is a line item the business case must carry.

Three-year projection with uncertainty bands

A TCO projection is a three-year roll-up with uncertainty bands. The AITE-VDT standard format is a table with rows for each component (build, run, refresh, govern, retire) and columns for each year, with each cell showing a point estimate and a range (typically p10 to p90).

The uncertainty bands are informed by three sources. Historical variance in the organisation’s own analogous programmes — a bank with four production AI features has run-cost variance data the value lead can use directly. Industry benchmarks from reputable sources — FinOps Foundation benchmarks, McKinsey patterns, Stanford HAI compute-cost trendlines.4 Scenario analysis for specific uncertainties — if token pricing rises 30%, what happens to run cost; if adoption runs 40% above plan, what happens to volume-driven spend.

The resulting projection reports both the point estimate (the expected TCO) and the band (the plausible range). A three-year TCO of “US$14.2M, with plausible range US$11.5M to US$18.4M” is a defensible projection; “US$14.2M” alone is a false precision the CFO will discount.
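
A minimal sketch of how one cell’s band might be produced, here the year-two run cost, using a Monte Carlo draw over an assumed historical variance plus a single token-price scenario; every parameter is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 20_000  # Monte Carlo draws

# Year-two run-cost model: US$2.4M plan, lognormal spread standing in for
# historical variance, and a 30% token-price rise with assumed 25% probability.
plan = 2_400_000
base = plan * rng.lognormal(mean=0.0, sigma=0.25, size=N)
price_shock = rng.random(N) < 0.25
run_cost = np.where(price_shock, base * 1.30, base)

p10, p50, p90 = np.percentile(run_cost, [10, 50, 90])
print(f"Year-2 run cost: point US${p50:,.0f}, band US${p10:,.0f}-US${p90:,.0f}")
```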

[DIAGRAM: MatrixDiagram — tco-by-component-by-year — 5×3 grid with rows for each TCO component (build, run, refresh, govern, retire) and columns for years 1, 2, 3; each cell shows a point estimate and a p10–p90 range; row and column totals computed; primitive gives the practitioner a reference format for the three-year roll-up.]

The refresh-cost discipline

Refresh cost is where TCO discipline often fails, because refresh is treated as optional until it is mandatory. Three refresh triggers are non-negotiable for a mature AI programme.

Data-drift refresh occurs when the data distribution the model operates on has shifted enough that the model’s performance is materially affected. Monitoring (Article 25) drives the trigger; refresh cost is the financial response. A feature whose production data shifts materially every six months will trigger refresh at that cadence; the business case must budget for it.

Model-provider refresh occurs when the underlying model provider releases a materially better model and the feature must migrate or be left on a deprecated capability. For generative features using managed APIs, model-provider refresh cycles have been running at 9–18 months in 2023–2025. The business case must budget either for the migration cost or for the capability gap of remaining on the deprecated model.

Regulatory-driven refresh occurs when compliance obligations change — the EU AI Act transitional provisions, for example, or an organisation’s SOC 2 audit scope expansion. Regulatory refresh is unpredictable in timing but predictable in existence; a budget reserve of 5–10% of annual run-cost for regulatory refresh is a defensible practice.
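
Pulling the three triggers into one annual refresh budget might look like the sketch below; every figure is an illustrative assumption, not a benchmark:

```python
# Annual refresh budget assembled from the three triggers described above.
drift_refreshes_per_year = 2         # data shifts materially every ~6 months
cost_per_drift_refresh = 90_000      # US$: re-train, re-evaluate, redeploy

provider_migration_cost = 250_000    # US$ per migration to a newer model
provider_cycle_years = 1.25          # 9-18 month cycles, ~15-month midpoint

annual_run_cost = 2_400_000          # US$
regulatory_reserve_rate = 0.075      # midpoint of the 5-10% range above

annual_refresh_budget = (
    drift_refreshes_per_year * cost_per_drift_refresh
    + provider_migration_cost / provider_cycle_years
    + regulatory_reserve_rate * annual_run_cost
)
print(f"Annual refresh budget: US${annual_refresh_budget:,.0f}")
```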

Worked public-sector example — UK HMRC

UK HMRC’s publicly documented AI initiatives provide a real-world TCO case. The UK National Audit Office’s 2024 reporting on the use of AI in government, including HMRC’s digital transformation, noted cost overruns in AI-enabled programmes, with specific attention to the gap between original budget projections and realised run cost.5 The NAO’s analysis, consistent with the McKinsey run-cost-exceeds-build-cost pattern, illustrates the public-sector version of the TCO discipline failure: business cases written without full TCO decomposition produce run-rates the NAO later challenges in published reports.

The AITE-VDT teaching point is that the public-sector TCO bar is not lower than the private-sector one; it is, if anything, higher, because the transparency obligation makes under-estimates visible in ways private-sector enterprises can sometimes avoid. Value leads in the public sector should adopt TCO rigour at or above the private-sector standard.

The TCO’s relationship to unit economics

TCO is an absolute-number projection; unit economics (Article 9) is the per-transaction decomposition the TCO depends on. The two are not substitutes. The CFO wants to know both the absolute TCO (to plan the budget) and the unit economics (to understand the scaling behaviour). A well-structured financial summary includes both and reconciles them: the projected TCO for year two divided by the projected transaction volume should equal the projected unit economics for year two. A reconciliation that does not tie is evidence of a bug in one of the two projections.
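
A minimal sketch of that tie-out check, with hypothetical figures (the year-two TCO reuses the illustrative profile sketched under the five components):

```python
# Year-two TCO divided by year-two volume must tie to the unit-economics
# projection; a gap signals a bug in one of the two models.
year2_tco = 3_040_000           # US$ (run + refresh + govern + retire accrual)
year2_transactions = 2_400_000  # projected annual transaction volume
unit_econ_projection = 1.27     # US$ per transaction, from the unit-econ model

implied_unit_cost = year2_tco / year2_transactions
if abs(implied_unit_cost - unit_econ_projection) > 0.01:
    print(f"Does not tie: implied US${implied_unit_cost:.2f} "
          f"vs projected US${unit_econ_projection:.2f} -- check both models")
else:
    print(f"Ties at US${implied_unit_cost:.2f} per transaction")
```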

Summary

AI TCO has five components — build, run, refresh, govern, retire. Run cost now dominates for generative features and typically exceeds cumulative build cost within 12–24 months. Hidden components (data labelling, human-in-the-loop labour, compliance cost) are systematically under-estimated in first-pass business cases. The three-year TCO is projected with uncertainty bands informed by historical variance, industry benchmarks, and scenario analysis. Refresh cost is non-negotiable for mature programmes and is triggered by data drift, model-provider evolution, and regulatory change. Article 9 develops the unit-economics decomposition that the TCO depends on and reconciles with.


Cross-references to the COMPEL Core Stream:

  • EATP-Level-2/M2.5-Art13-Agentic-AI-Cost-Modeling-Token-Economics-Compute-Budgets-and-ROI.md — core cost modelling article TCO decomposition feeds
  • EATP-Level-2/M2.5-Art04-Business-Value-and-ROI-Quantification.md — ROI quantification the TCO’s benefit-cost arithmetic supports
  • EATE-Level-3/M3.5-Art15-Strategic-Value-Realization-Risk-Adjusted-Value-Frameworks.md — risk-adjusted frameworks that host TCO decomposition at governance-professional depth

Q-RUBRIC self-score: 90/100

© FlowRidge.io — COMPEL AI Transformation Methodology. All rights reserved.

Footnotes

  1. FinOps Foundation, “FinOps for AI Overview” (2024), https://www.finops.org/wg/finops-for-ai/ (accessed 2026-04-19).

  2. McKinsey & Company, “The state of AI in early 2024: Gen AI adoption spikes and starts to generate value” (May 30, 2024), https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai (accessed 2026-04-19).

  3. European Commission, “Proposal for a Regulation laying down harmonised rules on Artificial Intelligence (Artificial Intelligence Act) and amending certain Union legislative acts — Impact Assessment” (April 2021), SWD(2021) 84 final, https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A52021SC0084 (accessed 2026-04-19).

  4. Stanford Institute for Human-Centered Artificial Intelligence, The AI Index Report 2024 and 2025 editions, https://aiindex.stanford.edu/report/ (accessed 2026-04-19).

  5. UK National Audit Office, “Use of artificial intelligence in government” (March 15, 2024), HC 612, https://www.nao.org.uk/reports/use-of-artificial-intelligence-in-government/ (accessed 2026-04-19).