Attribution is where honest AI value reporting most often fails. Each team in an organization has an incentive to claim credit for shared outcomes: the AI program claims the sale, the marketing channel claims the same sale, the customer-success team claims the retention, and the AI retention model claims the same retention. Summing those claims across teams produces aggregate credit that exceeds 100% — the well-known multi-touch overcounting problem. Attribution modeling is the discipline that prevents this.
This article teaches five attribution approaches, the business questions each answers, and the practice of picking the right model for a given AI value question without inviting the inflated-credit pattern.
Five attribution approaches
1. First-touch attribution
First-touch assigns 100% of credit to the first touchpoint in a sequence. For an AI feature operating early in a customer journey — a recommender that surfaces a product, a search algorithm that retrieves a listing — first-touch makes the AI feature look disproportionately important. It is the attribution model the AI team will most enjoy reporting, which should trigger immediate skepticism.
First-touch is appropriate when the business question is genuinely about introduction: “which channel or feature is most effective at bringing new customers into the journey?” It is inappropriate when the question is about value creation — the introduction is necessary but rarely sufficient.
2. Last-touch attribution
Last-touch assigns 100% of credit to the last touchpoint. For an AI feature operating late in a journey — a checkout-page recommender, a closing suggestion, a next-step prompt — last-touch makes that feature look disproportionately important. Like first-touch, it is easy to compute and often politically convenient; like first-touch, it systematically distorts the value estimate.
Last-touch is appropriate when the question is about conversion closure: “which feature is most effective at converting an already-engaged prospect?” It is inappropriate when the closure is trivial relative to the earlier engagement work.
3. Linear attribution
Linear attribution divides credit equally across all touchpoints. For a journey with ten touchpoints, each gets 10%. Linear is attractive because it does not over-reward any single touchpoint; its weakness is that it treats touchpoints with large value contributions identically to touchpoints with small ones. An ad impression is weighted the same as a GenAI consultation; a tangential recommendation is weighted the same as a closing discount.
Linear is appropriate when touchpoints are genuinely comparable in influence and when the business has chosen fairness-of-attribution over accuracy-of-estimate.
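The three rule-based models above can be sketched in a few lines. This is an illustrative sketch only; the journey and touchpoint names are hypothetical, not from a real system.

```python
def first_touch(touchpoints, value):
    """Assign 100% of the outcome value to the first touchpoint."""
    return {touchpoints[0]: value}

def last_touch(touchpoints, value):
    """Assign 100% of the outcome value to the last touchpoint."""
    return {touchpoints[-1]: value}

def linear(touchpoints, value):
    """Divide the outcome value equally across all touchpoints."""
    share = value / len(touchpoints)
    return {t: share for t in touchpoints}

# Hypothetical three-touch journey worth $100 of realized value.
journey = ["ai_recommender", "email_campaign", "checkout_assistant"]
print(first_touch(journey, 100.0))  # all credit to ai_recommender
print(last_touch(journey, 100.0))   # all credit to checkout_assistant
print(linear(journey, 100.0))       # roughly a third each
```

Note how the same journey and the same $100 outcome yield three different answers — the model choice, not the data, determines where the credit lands.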
4. Time-decay attribution
Time-decay assigns exponentially more credit to touchpoints closer in time to the outcome. The intuition is that a touchpoint two hours before a purchase had more influence than one two weeks before. The half-life parameter (say, 7 days) controls how fast credit decays.
Time-decay is appropriate when the business question is about recency and conversion efficiency. The half-life must be justified — an arbitrary half-life pick invites CFO challenge. The Netflix Technology Blog posts describe several internal time-decay parameter choices for retention modeling.[1]
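Time-decay weighting is a one-line formula: each touchpoint's weight halves for every half-life of distance from the outcome, and the weights are then normalized to split the value. A minimal sketch, with hypothetical touchpoint names and a 7-day half-life:

```python
def time_decay(touchpoints, value, half_life_days=7.0):
    """Split `value` across (name, days_before_outcome) touchpoints,
    halving a touchpoint's weight for every half_life_days of distance
    from the outcome, then normalizing the weights to sum to 1."""
    weights = {name: 0.5 ** (days / half_life_days)
               for name, days in touchpoints}
    total = sum(weights.values())
    return {name: value * w / total for name, w in weights.items()}

# Hypothetical journey: outreach 4 weeks out, AI recommendation 2 weeks
# out, promo email 2 days before the retention outcome.
journey = [("outreach", 28), ("ai_recommendation", 14), ("promo_email", 2)]
credit = time_decay(journey, 100.0)  # promo_email gets the largest share
```

With a 7-day half-life the week-5 email carries most of the credit; doubling the half-life shifts credit back toward the earlier touchpoints, which is exactly why the parameter choice must be justified.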
5. Data-driven (algorithmic) attribution
Data-driven attribution — Shapley values, Markov chain attribution, causal-inference-based attribution — uses statistical methods to estimate each touchpoint’s marginal contribution. Shapley-value attribution from cooperative game theory has become a standard in marketing analytics and is spreading to AI feature attribution.
Data-driven approaches are more accurate when well-implemented and more honest about each touchpoint's marginal contribution than rule-based attribution. They are also more complex, require more data, and can produce counter-intuitive results that leadership will not accept without explanation. The Google Analytics 4 attribution methodology and published academic work on Shapley attribution in digital advertising are reasonable primary references.[2]
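For small journeys, exact Shapley values can be computed by averaging each touchpoint's marginal contribution over every ordering. The sketch below assumes a hypothetical conversion-rate value function over coalitions of two features; real implementations estimate this function from data and approximate the average by sampling.

```python
from itertools import permutations
from math import factorial

def shapley(players, v):
    """Exact Shapley values: average each player's marginal contribution
    to the coalition value v over every ordering of the players.
    Feasible only for small player sets (n! orderings)."""
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            phi[p] += v(with_p) - v(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in phi.items()}

# Hypothetical value function: the two features convert better together
# (8%) than either does alone (2% and 3%) — a synergy rule-based models
# cannot represent.
rates = {
    frozenset(): 0.0,
    frozenset({"recommender"}): 0.02,
    frozenset({"assistant"}): 0.03,
    frozenset({"recommender", "assistant"}): 0.08,
}
credit = shapley(["recommender", "assistant"], rates.__getitem__)
# Shapley splits the synergy evenly on top of each standalone rate.
```

The Shapley split always sums to the full-coalition value, which is what makes it the natural model for portfolio-level totals: credit cannot exceed 100%.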
Choosing the attribution model
Three principles govern the choice.
Principle 1 — The business question drives the model
Different business questions require different attribution models. “Which feature most effectively introduces customers?” wants first-touch. “Which feature most effectively converts engaged prospects?” wants last-touch. “What is the total value our AI program creates?” wants data-driven. The analyst picks the model to match the question, not the question to match the model.
Principle 2 — Expose the model in the VRR
The VRR Section 4 financial summary must state which attribution model produced each realized-value figure. Different features may legitimately use different models (first-touch for a top-of-funnel recommender, last-touch for a closing assistant), but every figure's attribution model must be transparent. Opaque attribution invites the aggregation-to-greater-than-100% failure.
Principle 3 — Report attribution-model sensitivity
For any significant realized-value claim, report the claim under at least two attribution models. “Under linear attribution, the feature contributed $4.2M; under Shapley attribution, $2.8M; both within the CFO-agreed materiality threshold.” This sensitivity reporting is the same discipline the VRR uses for counterfactual uncertainty bands and serves the same purpose.
Attribution across AI and human touchpoints
Enterprise AI rarely operates in isolation. A GenAI customer-service copilot contributes to a ticket resolution alongside the human agent who approved the suggestion. A fraud-detection model contributes to a fraud capture alongside the investigator who confirmed it. Attribution between AI and human touchpoints is one of the hardest and most consequential splits.
Three patterns guide the AI-vs-human split.
Pattern A — AI suggestion accepted without modification. Most of the credit is the AI’s; some credit is the human’s for recognizing a good suggestion. A 70/30 AI/human split is commonly defensible.
Pattern B — AI suggestion accepted with modification. Credit splits more evenly because the human’s modification materially shaped the outcome. A 50/50 or 40/60 split is common.
Pattern C — AI suggestion rejected; human decision substituted. Credit is mostly or entirely the human’s; the AI may deserve credit for having prompted consideration of the decision or debit for having offered the wrong answer. A 0/100 to 20/80 AI/human split is common.
The patterns are instrumentable — the interaction data shows whether the suggestion was accepted, modified, or rejected. The attribution is then rules-based rather than opaque.
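Because the interaction outcome is instrumented, the split can be a lookup table rather than a judgment call. The percentages below mirror the three patterns; the exact numbers are illustrative and would be agreed per organization.

```python
# Rules-based AI/human credit split keyed on the instrumented outcome
# of each suggestion. Percentages are illustrative defaults.
SPLITS = {
    "accepted": (0.70, 0.30),  # Pattern A: accepted without modification
    "modified": (0.50, 0.50),  # Pattern B: accepted with modification
    "rejected": (0.10, 0.90),  # Pattern C: within the 0/100-20/80 band
}

def split_credit(interaction, outcome_value):
    """Return the AI and human shares of an outcome's value based on
    how the agent handled the AI suggestion."""
    ai_share, human_share = SPLITS[interaction]
    return {"ai": outcome_value * ai_share,
            "human": outcome_value * human_share}

split_credit("modified", 1000.0)  # {'ai': 500.0, 'human': 500.0}
```

Making the table explicit also makes it auditable: changing a split becomes a reviewed configuration change rather than a silent re-interpretation.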
Avoiding the over-claim cascade
The common failure pattern: each team reports its features’ realized value using whichever attribution model makes their features look best. Summed across teams, the aggregate exceeds the organization’s realized value by 30–50%. CFOs catch the error, trust in the AI value numbers collapses, and subsequent VRRs face elevated skepticism.
The organizational fix is an attribution-governance rule: the central AI program office picks one primary attribution model for the portfolio-level VRR, and individual features may report their preferred model in feature-level narratives with the primary model’s number also reported. Individual-feature preferences are accommodated; aggregate overcounting is prevented.
A second useful rule: any claim above a materiality threshold (e.g., $1M annualized) must report under at least two attribution models with the difference explained. This surfaces attribution ambiguity early rather than letting it accumulate.
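The materiality rule is mechanical enough to enforce in the reporting pipeline. A minimal sketch, assuming a $1M threshold and hypothetical model names:

```python
# Governance check for the materiality rule: claims at or above the
# threshold must be reported under at least two attribution models.
MATERIALITY_THRESHOLD = 1_000_000  # $1M annualized, illustrative

def check_claim(claim_value, model_results):
    """model_results maps attribution-model name -> realized value.
    Returns a list of governance findings for the claim."""
    findings = []
    if claim_value >= MATERIALITY_THRESHOLD and len(model_results) < 2:
        findings.append("material claim reported under only one model")
    if len(model_results) >= 2:
        lo, hi = min(model_results.values()), max(model_results.values())
        findings.append(f"model spread: {hi - lo:,.0f}")
    return findings

# The $4.2M linear / $2.8M Shapley example from above:
check_claim(4_200_000, {"linear": 4_200_000, "shapley": 2_800_000})
```

The spread between models is itself a useful number to report — a narrow spread strengthens the claim, and a wide one flags the attribution ambiguity the rule exists to surface.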
Cross-reference to Core Stream
- EATP-Level-2/M2.5-Art04-Business-Value-and-ROI-Quantification.md — practitioner ROI quantification where attribution governs the apportionment.
- EATP-Level-2/M2.5-Art09-Value-Realization-Reporting-and-Communication.md — VRR communication of attribution.
Self-check
- An AI team reports its feature contributed $8M using first-touch attribution. Marketing reports the same outcome contributed $6M using last-touch. What is the problem, and what governance rule applies?
- A retention outcome was influenced by a customer-success outreach (week 1), an AI recommendation (week 3), and a price-promotion email (week 5). Under time-decay with 7-day half-life, how is credit divided? Under linear?
- An AI suggestion was offered, accepted, and modified by the human agent. Which of the three AI-human split patterns applies?
- The CFO asks “is our portfolio realized-value number honest?” What two checks should be in place to answer?
Further reading
- Shapley, A Value for n-Person Games, Contributions to the Theory of Games II (Princeton, 1953).
- Google Analytics 4 Attribution methodology documentation (published product docs).
- Netflix Technology Blog — attribution and personalization posts.
Footnotes
1. Netflix Technology Blog, published work on retention and attribution modeling (various years). https://netflixtechblog.com/
2. Google LLC, Google Analytics 4 Attribution methodology documentation. https://support.google.com/analytics/answer/10596866