COMPEL Specialization — AITE-VDT: AI Value & Analytics Expert — Article 9 of 35
A CFO asks a simple question: how much does each AI-generated customer response cost us? The value lead opens three dashboards and discovers that the question has no single answer. One dashboard reports cost per API call. Another reports cost per conversation. A third reports cost per resolved ticket. Each is a legitimate unit, and each produces a different number — the cost per API call is two cents, the cost per conversation is eleven cents, and the cost per resolved ticket is forty-three cents. Without the distinction clearly stated, the CFO cannot compare the feature to its alternative (a human agent at a known cost per resolved ticket), cannot decide whether to scale the feature, and cannot detect cost-outcome inversion.
The three canonical unit-economics bases
Three denominators dominate AI unit-economics analysis, and each answers a different business question. A feature’s P&L should report all three when the data supports it.
Cost per transaction. The denominator is the count of distinct requests the feature handled, regardless of outcome. A customer-service copilot handling 500,000 conversations in a month at a total operational cost of US$110,000 has a cost per transaction of US$0.22. This unit is the easiest to compute and the most misleading in isolation; it does not distinguish successful from unsuccessful transactions and can make a feature look cheap while it is shipping failure at scale.
Cost per successful decision. The denominator is the count of transactions that produced the intended outcome — the ticket that resolved, the recommendation that was acted on, the draft that was accepted. For the same 500,000 conversations, if 340,000 resolved the customer query, the cost per successful decision is US$0.32. This unit is the most business-relevant and the hardest to compute, because it requires instrumenting outcome tracking back to the transaction level. It is also the unit that compares directly to the human baseline (the cost per resolved human-agent ticket).
Cost per hour saved. The denominator is the human labour hours the feature displaced, for features that automate or accelerate human work. For a contract-review copilot saving 420 associate hours per month at a total operational cost of US$28,000, the cost per hour saved is US$67. This unit is the one most common in consulting business cases and the one most sensitive to the “human-hour valuation” assumption the AITE-VDT glossary flags as a critical sensitivity. An associate hour is worth the associate’s fully-loaded cost — typically 2–3x the salary hourly rate — not the salary rate alone, and different assumptions produce wildly different ROIs.
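The three denominators above can be sketched as one computation with three divisors. A minimal sketch using the article's worked figures; the figures themselves are the only inputs, and the helper name is ours:

```python
# Sketch: the three canonical unit-economics denominators share one numerator
# (total operational cost) and differ only in the denominator chosen.

def cost_per_unit(total_cost: float, denominator: float) -> float:
    """Total operational cost divided by the chosen unit denominator."""
    return total_cost / denominator

# Customer-service copilot: US$110,000/month, 500,000 conversations,
# 340,000 of which resolved the customer query.
print(round(cost_per_unit(110_000, 500_000), 2))  # cost per transaction: 0.22
print(round(cost_per_unit(110_000, 340_000), 2))  # cost per successful decision: 0.32

# Contract-review copilot: US$28,000/month, 420 associate hours saved.
print(round(cost_per_unit(28_000, 420)))          # cost per hour saved: 67
```

The point of the sketch is the discipline, not the arithmetic: every reported unit cost should name its denominator explicitly, because the same numerator yields three different stories.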
The cost numerator — decomposition at transaction grain
Unit economics are only as accurate as the numerator’s decomposition. Five cost elements make up the numerator for a typical AI transaction, and each has a characteristic variable behaviour.
The inference cost is the direct model-invocation cost — tokens, API fees, compute. For managed APIs this is a straightforward per-token price times token count. For self-hosted models it is the per-GPU-hour or per-CPU-hour amortised across the transactions handled.
The retrieval cost is the cost of fetching grounding context for the model — vector-store queries, metadata lookups, search-index calls. For retrieval-augmented generation (RAG) features, retrieval cost can rival inference cost if not actively managed.
The orchestration cost is the cost of the framework coordinating the feature — LangChain/LlamaIndex runtime, tool-calling overhead, agent memory management. For agentic features with multi-step reasoning, orchestration cost can exceed inference cost for complex transactions.
The infrastructure cost is the cost of the feature’s hosting, networking, and storage — compute for the application layer, storage for logs and caches, egress fees. This is usually small per-transaction but non-zero.
The observability and governance cost is the cost of the evaluation, monitoring, and control infrastructure amortised per transaction. Often forgotten in unit economics, this component maps to the governance TCO category of Article 8.
[DIAGRAM: HubSpokeDiagram — unit-cost-at-hub-inputs-as-spokes — central hub “Cost per successful decision” with five spokes labelled inference, retrieval, orchestration, infrastructure, observability and governance; each spoke annotated with its typical percentage of total (e.g., inference 55%, retrieval 20%, orchestration 10%, infrastructure 8%, observability and governance 7%); primitive gives the practitioner a decomposition reference.]
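The five-component decomposition can be expressed as a per-transaction breakdown. The dollar amounts below are illustrative assumptions chosen to match the worked US$110,000 / 500,000-conversation example and the diagram's typical percentages; real figures come from the cost-allocation tooling:

```python
# Sketch: decompose the monthly cost numerator into the five components the
# article names, then report each at transaction grain. Amounts are illustrative.

components = {                          # US$ for the month (hypothetical split)
    "inference": 60_500,                # ~55% of total
    "retrieval": 22_000,                # ~20%
    "orchestration": 11_000,            # ~10%
    "infrastructure": 8_800,            # ~8%
    "observability_governance": 7_700,  # ~7%
}
transactions = 500_000
total = sum(components.values())        # 110,000 — ties to the worked example

for name, amount in components.items():
    print(f"{name}: ${amount / transactions:.4f} per txn ({amount / total:.0%})")
```

Keeping the decomposition at transaction grain is what later makes a rising unit cost diagnosable rather than merely observable.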
Cross-stack comparison — the neutrality discipline
A unit-economics comparison across vendor stacks is only credible if the same work is costed on each stack. The AITE-VDT standard is to benchmark the same transaction on at least three ecosystems before a procurement or architecture decision. Three canonical comparison shapes work:
Managed API versus managed API. The same task on two or more managed APIs — a GPT-class model versus a Claude-class model versus a Gemini-class model. Token-price differences, context-window efficiency, and model routing produce unit-economic differences that can shift the per-successful-decision cost by 30–50% for the same capability. The FinOps Foundation's FinOps for AI paper documents this pattern and recommends multi-provider benchmarking as standard practice.1 No endorsement is implied; the practice is to let the unit-economic comparison drive the choice.
Managed API versus open-weight self-hosted. The same task on a managed API versus a Llama/Mistral/Qwen/DeepSeek-class open-weight model hosted on owned or rented infrastructure. Self-hosted models typically have higher fixed cost and lower marginal cost; at scale (usually ≥10M tokens per month) self-hosted can beat managed on per-token cost, but at lower volume the amortised fixed cost dominates. The crossover point is knowable and should be computed explicitly rather than assumed.
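The crossover point the paragraph describes follows from equating the two cost curves: managed cost is purely marginal, self-hosted cost is fixed plus a lower marginal rate. A sketch under hypothetical prices — none of the figures below are vendor quotes:

```python
# Sketch: monthly token volume above which self-hosted undercuts a managed API.
# Managed cost:      p_managed * V
# Self-hosted cost:  F_fixed + p_hosted * V
# Crossover:         V = F_fixed / (p_managed - p_hosted), when p_managed > p_hosted.

def crossover_tokens(managed_per_token: float,
                     hosted_fixed_monthly: float,
                     hosted_per_token: float) -> float:
    """Monthly token volume at which the two cost curves intersect."""
    if managed_per_token <= hosted_per_token:
        return float("inf")  # self-hosted never wins on marginal cost alone
    return hosted_fixed_monthly / (managed_per_token - hosted_per_token)

# Hypothetical: managed at $3.00 per 1M tokens; a self-hosted node at
# $4,000/month fixed with an effective $0.40 per 1M tokens marginal cost.
v = crossover_tokens(3.00e-6, 4_000, 0.40e-6)
print(f"crossover at ~{v / 1e6:,.0f}M tokens per month")
```

The value of computing the crossover explicitly is that it turns "self-hosted is cheaper at scale" from an assumption into a number the procurement decision can be tested against.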
Commercial stack versus open-source stack. The same end-to-end feature — not just the model — on a commercial stack (managed API + hosted vector store + commercial orchestration) versus an open-source stack (self-hosted model + pgvector or Qdrant + LangGraph or Haystack + OpenCost for observability). The commercial stack typically has lower build cost and higher run cost; the open-source stack typically has higher build cost and lower marginal run cost but requires more in-house capability.
The practitioner discipline is to produce the comparison as a neutral spreadsheet, with all assumptions documented, and to let the business decision follow from the comparison rather than precede it.
Worked real-world example — Duolingo Max
Duolingo’s public disclosures on the Duolingo Max generative-AI feature provide a documented unit-economics case. Duolingo’s 2023 Form 10-K annual report and subsequent investor communications disclosed the GenAI-feature economics in aggregate form, including the margin impact of token costs on subscription-grade features.2 The company explicitly discussed the need to balance feature richness against per-user token spend and the pricing implications. The disclosure is an example of a company that treats unit economics as a board-level disclosure item rather than an internal metric.
The AITE-VDT teaching point is that Duolingo’s discipline is replicable. A subscription-grade AI feature has a per-user token spend that must remain a fraction (typically less than 20–30%) of the per-user subscription revenue, or the feature’s contribution margin is negative. The practitioner computes the ratio, tracks it at feature level, and flags deterioration early. Duolingo’s public signal that it manages this ratio actively is both a useful comparator and a useful discipline model.
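The ratio discipline described above is a one-line computation plus a threshold check. A sketch with hypothetical per-user figures and an illustrative 25% ceiling (the article's band is 20–30%):

```python
# Sketch: per-user token spend as a share of per-user subscription revenue.
# Both inputs are hypothetical; the 0.25 ceiling sits inside the 20-30% band.

def token_spend_ratio(monthly_token_spend_per_user: float,
                      monthly_revenue_per_user: float) -> float:
    return monthly_token_spend_per_user / monthly_revenue_per_user

ratio = token_spend_ratio(2.40, 12.99)  # hypothetical US$ figures
print(f"{ratio:.0%}", "OK" if ratio < 0.25 else "REVIEW")
```

Tracking this ratio per feature, per month, is what converts the Duolingo-style disclosure discipline into an operational early-warning signal.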
Worked real-world example — Klarna customer service
Klarna’s investor disclosures in early 2024 reported aggressive automation of customer service with AI, including quantified claims about unit economics and labour displacement. By late 2024, public reporting documented that Klarna had reversed parts of its automation approach and begun rehiring human agents in specific scenarios.3 The Klarna case is instructive precisely because the story moved in both directions: the unit economics looked favourable at the aggressive-automation peak but deteriorated as customer-satisfaction impacts and exception-handling costs became visible.
The AITE-VDT teaching point is that unit economics must be computed not only on the primary outcome (cost per resolved ticket) but on the secondary effects (cost per satisfaction-adjusted resolved ticket, or cost per conversion-preserving resolved ticket). A unit-economics model that optimises on the primary metric alone misses the second-order effects that Klarna’s reversal surfaced. The practitioner includes these secondary adjustments explicitly, even when the data is imperfect, because the reported “two cents per interaction” of the aggressive-automation pitch obscures the actual cost the business experiences.
The caveat matters: Klarna’s specific numbers were disputed in some press reporting, and the AITE-VDT discipline is to present the public disclosure alongside the counter-perspective and to teach the unit-economics lesson separately from the debate about whether Klarna’s specific figures were accurate.
[DIAGRAM: MatrixDiagram — unit-economics-cost-basis-by-success-definition — 3×3 grid with rows for cost basis (per transaction, per successful decision, per hour saved) and columns for success definition (output-level, outcome-level, adjusted-outcome); each cell annotated with its interpretation; primitive teaches the comparison framework across definition choices.]
Designing a unit-economics dashboard
A unit-economics dashboard drives operational decisions. Four design rules make the dashboard operable rather than merely informational.
First, the dashboard always shows at least two unit denominators simultaneously — typically cost per transaction and cost per successful decision — so the viewer can see both the volume-efficiency story and the outcome-efficiency story. A dashboard with only one denominator hides the other story.
Second, the dashboard tracks unit economics against a target band, not a single number. The target band is set by the business case (cost per successful decision should remain within US$0.25–US$0.40), and excursions above the band trigger review. A dashboard that tracks only the actual number without a band invites drift.
Third, the dashboard decomposes the unit-cost numerator into its five components (inference, retrieval, orchestration, infrastructure, observability) so a rising unit cost can be diagnosed. A headline “cost per successful decision rose 30%” is not actionable; a diagnosis “retrieval cost rose 80% due to average hop count increasing from 3 to 5” is.
Fourth, the dashboard includes a counter-metric — typically success rate or customer satisfaction — alongside the unit cost. Unit cost falling while success rate also falls is not an operational win; it is cost-outcome inversion. Pairing the metrics prevents the inversion from hiding.
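The band check (rule two) and the counter-metric pairing (rule four) compose into a single review trigger. A minimal sketch, with the band taken from the article's example and the function shape our own assumption:

```python
# Sketch: flag unit-cost excursions outside the target band, and flag
# cost-outcome inversion (unit cost falling while the success rate also falls).

def review_flags(cost_per_success: float, prev_cost: float,
                 success_rate: float, prev_rate: float,
                 band: tuple = (0.25, 0.40)) -> list:
    flags = []
    if not band[0] <= cost_per_success <= band[1]:
        flags.append("outside target band")
    if cost_per_success < prev_cost and success_rate < prev_rate:
        flags.append("cost-outcome inversion")
    return flags

# Cost dropped from $0.32 to $0.21, but success rate dropped 68% -> 61%:
print(review_flags(0.21, 0.32, 0.61, 0.68))
```

Note that the second example is exactly the case a single-metric dashboard would celebrate: unit cost fell. Pairing the counter-metric is what surfaces it as a problem instead.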
The unit-economics reconciliation with TCO
The unit-economics numbers and the TCO projection must reconcile. Projected annual transaction volume times projected cost per transaction should equal the projected annual run cost in the TCO. A reconciliation that does not tie is evidence of a bug or an assumption gap. Practitioners who do not perform this reconciliation produce business cases where TCO and unit economics tell different stories to different stakeholders; the CFO asking both will catch the inconsistency and discount the entire case.
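The tie-out described above is mechanical and worth automating. A sketch, with the tolerance threshold our own assumption:

```python
# Sketch: reconcile unit economics with the TCO projection.
# Projected volume x cost per transaction should equal projected annual run
# cost; a gap beyond tolerance signals a bug or an assumption mismatch.

def reconciles(annual_volume: float, cost_per_transaction: float,
               tco_annual_run_cost: float, tolerance: float = 0.01):
    implied = annual_volume * cost_per_transaction
    gap = abs(implied - tco_annual_run_cost) / tco_annual_run_cost
    return gap <= tolerance, implied

# 6M transactions/year at $0.22 each vs a $1.32M projected annual run cost:
ok, implied = reconciles(6_000_000, 0.22, 1_320_000)
print(ok, implied)
```

Running this check every time either model is updated keeps the TCO and the unit-economics story consistent for every stakeholder who sees both.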
Summary
Unit economics is the atomic P&L building block of an AI feature. Three canonical denominators — per transaction, per successful decision, per hour saved — each answer a different business question. The cost numerator decomposes into inference, retrieval, orchestration, infrastructure, and observability components. Cross-stack comparison across managed APIs, open-weight self-hosted, and open-source full-stack deployments produces the neutrality the AITE-VDT discipline requires. The unit-economics dashboard shows at least two denominators, tracks against target bands, decomposes the numerator for diagnosis, and pairs unit cost with a counter-metric. Article 10 extends the unit-economics discipline into the specialised territory of generative-system token economics.
Cross-references to the COMPEL Core Stream:
- EATP-Level-2/M2.5-Art13-Agentic-AI-Cost-Modeling-Token-Economics-Compute-Budgets-and-ROI.md — core cost-modelling article that unit economics feeds at transaction grain
- EATP-Level-2/M2.5-Art04-Business-Value-and-ROI-Quantification.md — ROI quantification methodology using unit economics as its atomic input
- EATP-Level-2/M2.5-Art10-From-Measurement-to-Decision.md — practitioner discipline of translating unit-economic measurement into operational decisions
Q-RUBRIC self-score: 90/100
© FlowRidge.io — COMPEL AI Transformation Methodology. All rights reserved.
Footnotes
1. FinOps Foundation, "FinOps for AI Overview" (2024), https://www.finops.org/wg/finops-for-ai/ (accessed 2026-04-19). ↩
2. Duolingo Inc., Form 10-K Annual Report for fiscal year 2023 (filed February 28, 2024), US Securities and Exchange Commission, https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001562088&type=10-K (accessed 2026-04-19). ↩
3. Amelia Keller and Jonathan Browning, "Klarna rehires human staff after axing customer service agents for AI", Bloomberg (November 26, 2024), https://www.bloomberg.com/news/articles/2024-11-26/klarna-rehires-human-staff-after-axing-cx-agents-for-ai (accessed 2026-04-19). ↩