AITE M1.3-Art29 v1.0 Reviewed 2026-04-06 Open Access
M1.3 The 20-Domain Maturity Model
AITF · Foundations

Compute Budgets and Token-Aware Governance


8 min read · Article 29 of 48

Compute budgets protect value in three ways. They prevent silent cost sprawl that erodes rNPV. They trigger structural conversations about optimization when budgets come under pressure. They make cost a visible, governed dimension rather than a downstream consequence of unrelated engineering decisions. Programs without compute budgets typically discover cost overruns at quarterly reviews, months after correction would have been cheap.

This article teaches budget-setting mechanics at feature and portfolio level, the four enforcement modes, the integration of budgets with the broader governance process, and the avoidance of two common anti-patterns.

Setting feature-level budgets

A feature-level budget has five components.

Baseline expected cost. Derived from the business case (Article 6) and the rNPV model (Article 7). The baseline reflects the originally-projected cost trajectory at expected usage.

Upside envelope. The budget headroom above baseline for plausible growth. Typically 25–50% above baseline in early-stage features; tighter in stable production features.

Alert threshold. The run-rate level that fires an early notification. Often 80% of budget with monthly look-forward; tighter thresholds for high-stakes or revenue-critical features.

Throttle threshold. The run-rate level that engages automatic throttling (see enforcement modes below). Often 95% of budget.

Reject threshold. The run-rate level above which new requests are rejected entirely. Used sparingly, typically only for cost-sensitive experimental features or demo tenants.

Budget review cadence is monthly for most features and weekly for features in active rollout. The feature lead owns the budget; the FinOps lead (Article 27) consolidates reviews across features; the AI program office consolidates to portfolio-level reporting.
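The five components above can be sketched as a small configuration object with a look-forward check. This is an illustrative sketch only; the class name, field names, and default fractions are assumptions, not values prescribed by the framework:

```python
# Hypothetical sketch of the five feature-budget components. All names and
# default fractions are illustrative, not mandated by the framework.
from dataclasses import dataclass

@dataclass
class FeatureBudget:
    baseline_usd: float          # expected cost from the business case / rNPV model
    upside_pct: float = 0.35     # headroom above baseline (25-50% typical early-stage)
    alert_frac: float = 0.80     # fraction of budget that fires an early notification
    throttle_frac: float = 0.95  # fraction that engages automatic throttling
    reject_frac: float = 1.00    # fraction above which new requests are rejected

    @property
    def ceiling_usd(self) -> float:
        """Baseline plus the upside envelope."""
        return self.baseline_usd * (1 + self.upside_pct)

    def classify_run_rate(self, projected_monthly_usd: float) -> str:
        """Map a look-forward monthly run rate onto the enforcement bands."""
        frac = projected_monthly_usd / self.ceiling_usd
        if frac >= self.reject_frac:
            return "reject"
        if frac >= self.throttle_frac:
            return "throttle"
        if frac >= self.alert_frac:
            return "alert"
        return "ok"

budget = FeatureBudget(baseline_usd=10_000)   # ceiling = 13,500 with 35% upside
status = budget.classify_run_rate(11_000)     # → "alert" (about 81% of ceiling)
```

A monthly review would re-run `classify_run_rate` against the current look-forward projection and escalate any feature that has changed band since the last review.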

Setting portfolio-level budgets

Portfolio budgets aggregate feature budgets and add cross-feature governance through three additional components.

Portfolio ceiling. The total AI compute spend authorized for the period. Cannot be exceeded without executive sign-off.

Reserve. 5–10% of the portfolio ceiling held back for features that need to increase mid-period without triggering re-planning. Reserve usage is logged and reported.

Strategic allocation. A share of the portfolio reserved for new features not yet instantiated. Prevents the “all budget consumed by existing features” problem that starves strategic initiatives.

Portfolio budgets are typically set quarterly with monthly reviews. They are an input to the portfolio scorecard (Article 30) and the board-grade reporting (Article 35).
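The three portfolio components can be sketched as a simple capacity check: committed feature budgets must fit under the ceiling once the reserve and the strategic allocation are held back. The function name, percentages, and figures below are illustrative assumptions:

```python
# Hypothetical sketch of the portfolio ceiling, reserve, and strategic
# allocation. Percentages and amounts are illustrative only.
def portfolio_plan(feature_budgets_usd, ceiling_usd,
                   reserve_pct=0.08, strategic_pct=0.10):
    """Check committed feature budgets against the governed portfolio ceiling."""
    committed = sum(feature_budgets_usd)
    reserve = ceiling_usd * reserve_pct        # 5-10% held back for mid-period increases
    strategic = ceiling_usd * strategic_pct    # reserved for not-yet-instantiated features
    available = ceiling_usd - reserve - strategic
    return {
        "committed": committed,
        "available_for_features": available,
        "requires_executive_signoff": committed > available,  # ceiling breach path
    }

plan = portfolio_plan([40_000, 25_000, 18_000], ceiling_usd=100_000)
```

In this sketch the three features commit 83,000 against 82,000 available, so the plan flags the need for executive sign-off rather than silently absorbing the overage from the reserve.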

The four enforcement modes

Mode 1 — Alert (soft enforcement)

At alert threshold, a notification fires to the feature lead and the FinOps team. No change in feature behaviour; the notification is an invitation to review. Alert mode is the default for all features and often the only mode used for research or internal-productivity features where throttling would harm more than it helps.

Mode 2 — Throttle (graduated enforcement)

At throttle threshold, the system begins to slow request rates or route excess traffic to cheaper models. A feature running a GPT-4-class model might automatically route overflow to a smaller model; a feature running many retrieval hops might drop to a single-hop retrieval above threshold.

Throttle mode preserves functionality while limiting cost. Users may experience degraded response quality or higher latency. Throttle is the right default for production customer-facing features where hard rejection would harm customer experience but uncapped cost would harm the business.
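The two degradation tactics mentioned above, routing overflow to a cheaper model and collapsing multi-hop retrieval to a single hop, can be sketched as one configuration switch. Model tier names and hop counts are placeholders, not recommendations:

```python
# Illustrative sketch of graduated throttling: above the throttle threshold,
# the feature degrades gracefully instead of rejecting requests.
# Tier names and hop counts are placeholders.
def throttled_config(run_rate_frac: float, throttle_frac: float = 0.95) -> dict:
    """Select a serving configuration based on budget pressure."""
    if run_rate_frac >= throttle_frac:
        # Degraded but functional: cheaper model, single retrieval hop
        return {"model_tier": "small", "retrieval_hops": 1}
    # Within budget: full-quality configuration
    return {"model_tier": "frontier", "retrieval_hops": 3}

config = throttled_config(0.97)   # over threshold → degraded configuration
```

The point of the sketch is that throttling changes a serving parameter, not the availability of the feature; users still get an answer, just a cheaper one.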

Mode 3 — Reject (hard enforcement)

At reject threshold, new requests are refused with a documented error. Reject mode is used for cost-sensitive experimental features, demo tenants, and any feature whose over-budget operation would produce unrecoverable cost exposure.

Reject mode is the strongest enforcement and the most operationally brittle. Users encounter hard failures; customer-support tickets spike; the feature team spends cycles on escalations rather than on improvement. Reject is appropriate when the alternative is worse; it is not the default.

Mode 4 — Review (governance enforcement)

At the review threshold (typically a “soft ceiling” above the throttle threshold), an automatic ticket is created for the governance board or AI committee. The feature continues to operate under throttle; the review triggers a structured conversation about whether the cost trajectory is defensible, whether optimization opportunities exist, or whether the feature’s design should change.

Review is the mode a budget design should reach for first. It preserves user experience, triggers a structural conversation rather than an automatic penalty, and produces a decision log that supports subsequent audit.
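The four modes compose naturally into a single dispatcher: each band adds an action on top of the bands below it, with the review band sitting as a soft ceiling above throttle. The threshold values and action names here are illustrative assumptions:

```python
# Hypothetical dispatcher over the four enforcement modes. Thresholds and
# action names are illustrative; in particular, the review band is modeled
# as a soft ceiling above throttle, per the article's description.
def enforce(run_rate_frac: float,
            alert: float = 0.80, throttle: float = 0.95,
            review: float = 1.05, reject: float = 1.20) -> list[str]:
    """Return the enforcement actions active at a given run-rate fraction."""
    actions = []
    if run_rate_frac >= alert:
        actions.append("notify-feature-lead")     # Mode 1: alert (soft)
    if run_rate_frac >= throttle:
        actions.append("route-to-cheaper-model")  # Mode 2: throttle (graduated)
    if run_rate_frac >= review:
        actions.append("open-governance-ticket")  # Mode 4: review (soft ceiling)
    if run_rate_frac >= reject:
        actions.append("reject-new-requests")     # Mode 3: reject (hard, sparing)
    return actions
```

Note that the modes accumulate rather than replace one another: a feature in the review band is still throttled and its lead is still notified, which matches the article's description of the feature continuing to operate under throttle while the governance conversation runs.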

Integration with governance

A budget that fires a throttle silently is an operational control; a budget that triggers a governance conversation is a governed control. The difference matters at audit time and when the program scales.

Four integration practices.

Practice 1 — Budget changes require stage-gate review

Raising a feature’s budget mid-period is a change management event. It requires documented justification, sign-off from the feature lead and the FinOps lead, and entry in the decision log. Budget increases are not routine; they are specific decisions that the organization takes deliberately.

Practice 2 — Overrun triggers root-cause analysis

When a feature exceeds its budget — even with throttle engaged — a post-hoc root-cause analysis examines why. Three causes are most common: optimistic baseline (the business case under-estimated cost), unexpected adoption (usage grew faster than planned), or prompt inefficiency (feature architecture has not adopted Phase 2 FinOps practices). The analysis is a learning event, not a blame event.

Practice 3 — Budget performance enters the VRR

Section 4 of the VRR (Article 16) includes the feature’s budget performance: actual vs. budget, trajectory, current enforcement mode, and escalations. Budget transparency prevents the “features delivering value but burning cost” story from staying hidden until the CFO discovers it.

Practice 4 — Budgets align with ISO 42001 operational controls

ISO 42001 Clause 8.1 requires organizations to plan, implement, and control operations needed to meet AI management system requirements. Compute budgets are a direct implementation of this clause for the cost dimension. Organizations pursuing ISO 42001 certification should cite compute budgets explicitly in the operational-control documentation.

Two anti-patterns

Anti-pattern 1 — Silent shutdown

A feature’s cost exceeds budget; the system silently rejects all requests; users encounter failures without explanation; the feature team receives no notification; operations discovers the outage hours later through user complaints. Silent shutdown is the worst failure mode because it combines user harm, operational damage, and governance absence.

The fix is explicit: every automatic enforcement action must generate a human-visible event (alert, escalation, or review trigger). No budget action happens in the dark.
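That rule can be made structural by routing every enforcement action through a wrapper that always emits a human-visible event. This is a minimal sketch; the function name, event schema, and logger are assumptions:

```python
# Sketch of the "no budget action in the dark" rule: applying an enforcement
# action and emitting a human-visible event happen in one code path, so one
# cannot occur without the other. Names and schema are illustrative.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("budget-enforcement")

events: list[dict] = []  # stand-in for an alerting / ticketing system

def enforce_with_visibility(feature: str, action: str) -> None:
    """Apply a budget enforcement action and always surface it to humans."""
    # ... apply the action here (throttle, reject, etc.) ...
    event = {"feature": feature, "action": action}
    events.append(event)                                   # escalation queue entry
    log.info("budget action on %s: %s", feature, action)   # operator-visible alert

enforce_with_visibility("demo-tenant", "reject")
```

The design choice is that visibility is not a separate step a caller can forget; silent shutdown becomes impossible by construction rather than by policy.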

Anti-pattern 2 — Budget inflation by normalization

A feature repeatedly exceeds budget; each time, the budget is raised to match actual spend; the ceiling drifts upward without governance review. Budget inflation by normalization defeats the purpose of budgets — they become descriptive rather than prescriptive.

The fix is to separate planned budget increases (which go through the stage-gate review) from post-hoc budget adjustments (which should be rare and explicitly documented). A feature whose budget has been raised three times in a year without stage-gate review is a feature whose cost discipline has failed.
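The "three post-hoc raises" test above is mechanical enough to automate against the decision log. A hypothetical sketch, assuming each budget increase is recorded with a flag distinguishing stage-gate from post-hoc changes:

```python
# Sketch of detecting budget inflation by normalization: flag any feature
# whose budget was raised post-hoc three or more times in the review window.
# The record schema ('feature', 'post_hoc') is an illustrative assumption.
from collections import Counter

def inflation_flags(raises: list[dict], limit: int = 3) -> set[str]:
    """Return features whose post-hoc budget increases reach the limit."""
    counts = Counter(r["feature"] for r in raises if r["post_hoc"])
    return {feature for feature, n in counts.items() if n >= limit}

flags = inflation_flags([
    {"feature": "search-copilot", "post_hoc": True},
    {"feature": "search-copilot", "post_hoc": True},
    {"feature": "search-copilot", "post_hoc": True},
    {"feature": "summarizer",     "post_hoc": False},  # stage-gated: not counted
])
```

Running a check like this over a rolling twelve-month window turns the anti-pattern from something an auditor discovers into something the FinOps review surfaces on its own.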

Cross-reference to Core Stream

  • EATP-Level-2/M2.5-Art13-Agentic-AI-Cost-Modeling-Token-Economics-Compute-Budgets-and-ROI.md — practitioner compute-budget treatment.
  • EATF-Level-1/M1.2-Art07-Stage-Gate-Decision-Framework.md — stage-gate framework where budget reviews live.

Self-check

  1. A feature’s budget alert fires at 85% of monthly ceiling with ten days left in the month. Which enforcement actions are appropriate, and in what order?
  2. A cost-sensitive demo tenant exceeds budget; user experience is already acceptable at throttled quality. Which mode is appropriate, and why?
  3. A feature’s budget has been raised three times this year, each time after actual cost exceeded ceiling. Which anti-pattern is this, and what is the remedy?
  4. An ISO 42001 auditor asks how operational control is implemented for AI cost. Where in the documentation is the compute-budget framework cited?

Further reading

  • ISO/IEC 42001:2023 Clause 8 — Operational planning and control.
  • FinOps Foundation, FinOps for AI technical paper (2024).
  • Cloud provider budgeting and alerting documentation (AWS Budgets, Azure Cost Management, GCP Billing Budgets).