AITE M1.1-Art75 v1.0 Reviewed 2026-04-06 Open Access

Artifact Template: Agentic Runtime SLO and SLI Sheet

AITE-SAT: AI Solution Architecture Expert — Body of Knowledge Artifact Template


How to use this template

This template is the service-level-objective (SLO) sheet for an agentic AI feature — a feature whose runtime orchestrates multi-step model reasoning with tool calls, as described in Lab 3. The sheet is authored by the runtime owner, reviewed by the site-reliability and governance owners, and carried as the living record that on-call engineers, release engineers, and the architecture review board consult.

Agentic features have distinctive reliability dynamics — per-turn latency variance, per-run cost variance, tool-call failure cascades, loop-length blow-ups, prompt-injection-driven behaviour changes — that generic web-service SLO frameworks miss. The template captures those dynamics explicitly.

Every section is required. A section that does not apply is still completed, with a statement of non-applicability and a one-sentence rationale. For example, if the feature has no write-capable tools, the write-tool section reads “not applicable: this feature has only read-only and draft-only tools”, followed by the rationale.


Agentic Runtime SLO and SLI Sheet — [Feature Name]

1. Identification and ownership

| Field | Value |
| --- | --- |
| Feature name | [link to architecture design document] |
| Runtime platform | [LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, LlamaIndex Agents, hand-rolled, or other] |
| Runtime version | [pinned] |
| Runtime owner | [single accountable individual, team] |
| Site-reliability reviewer | [name, role] |
| Governance reviewer | [name, role] |
| On-call rotation | [team name, rotation schedule link] |
| Effective date | YYYY-MM-DD |

2. System invariant

The single machine-checkable invariant that must hold at every step of every agent run. Written as one sentence in the Lab 3 style. For a read-and-draft-only feature, the invariant limits write-capable tool reach. For a tool-restricted feature, the invariant names the restrictions. For a human-in-the-loop feature, the invariant names the human step that must precede any action.

[Example: “The Feature can read market state, portfolio state, research state, and compliance state, and can draft messages. It cannot send, submit, modify, or cancel an order, a position, or a communication to an external venue or counterparty. No agent plan, tool call, prompt injection, or operator instruction can place the Feature outside this envelope.”]

| Field | Value |
| --- | --- |
| Invariant test | [property-based test, policy simulation, fuzzing harness] |
| Test location | [CI path, cadence] |
| Failure action | [block release, page runtime owner, other] |
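As an illustration of what a fuzzing-style invariant test could look like, the following minimal sketch generates random plans and checks that no call outside the envelope is ever executed. The tool names, `ALLOWED_TOOLS`, `FORBIDDEN_TOOLS`, and `enforce_envelope` are hypothetical stand-ins; a real harness would exercise the runtime's actual enforcement layer rather than this toy check.

```python
import random

# Hypothetical envelope for a read-and-draft-only feature (illustrative names).
ALLOWED_TOOLS = frozenset({"read_market_state", "read_portfolio_state", "draft_message"})
# Hypothetical tools that exist on the platform but lie outside the envelope.
FORBIDDEN_TOOLS = frozenset({"submit_order", "cancel_order", "send_message"})


class InvariantViolation(Exception):
    """Raised when a tool call would leave the feature's envelope."""


def enforce_envelope(tool_name):
    """Stand-in for the runtime's pre-execution check on every tool call."""
    if tool_name not in ALLOWED_TOOLS:
        raise InvariantViolation(tool_name)
    return tool_name


def fuzz_invariant(trials=1000, seed=0):
    """Run random plans and count forbidden calls that were executed."""
    rng = random.Random(seed)
    all_tools = sorted(ALLOWED_TOOLS | FORBIDDEN_TOOLS)
    escaped = 0
    for _ in range(trials):
        plan = [rng.choice(all_tools) for _ in range(rng.randint(1, 14))]
        for step in plan:
            try:
                executed = enforce_envelope(step)
            except InvariantViolation:
                break  # the run is halted at the first blocked call
            if executed in FORBIDDEN_TOOLS:
                escaped += 1
    return escaped
```

In this sketch a CI job would apply the sheet's failure action (for example, block the release) whenever `fuzz_invariant()` returns anything other than 0.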

3. Service-level indicators (SLIs)

3.1 Latency SLIs

| Indicator | Definition | Measurement source |
| --- | --- | --- |
| Per-turn latency | [Wall-clock time from user-turn input to assistant-turn output] | [trace span] |
| Per-tool-call latency | [Wall-clock time from tool-call dispatch to tool-call result] | [trace span] |
| Per-run duration | [Total duration of a multi-turn run from session open to session end] | [trace span] |
| Time-to-first-token | [For streamed responses] | [trace span] |
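The latency indicators above can be derived from trace spans roughly as follows. This is a sketch under an assumed span schema (dictionaries with `kind`, `start`, and `end` in seconds), which is not defined by the template; the percentile uses the nearest-rank method, one of several common definitions.

```python
import math


def span_durations(spans, kind):
    """Wall-clock duration of every span of the given kind, in seconds."""
    return [s["end"] - s["start"] for s in spans if s["kind"] == kind]


def percentile(values, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical spans: four user turns and one tool call.
spans = [
    {"kind": "turn", "start": 0.0, "end": 1.0},
    {"kind": "turn", "start": 0.0, "end": 2.0},
    {"kind": "turn", "start": 0.0, "end": 3.0},
    {"kind": "turn", "start": 0.0, "end": 4.0},
    {"kind": "tool_call", "start": 0.0, "end": 0.5},
]
turn_latencies = span_durations(spans, "turn")
p50 = percentile(turn_latencies, 50)
p99 = percentile(turn_latencies, 99)
```

Whichever percentile definition is chosen, stating it on the sheet keeps the SLI reproducible from the trace data.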

3.2 Cost SLIs

| Indicator | Definition | Measurement source |
| --- | --- | --- |
| Per-turn cost | [Aggregated LLM and tool costs for a single turn] | [cost attribution record] |
| Per-run cost | [Aggregated cost for an entire run] | [cost attribution record] |
| Input-token per turn | [Tokens sent to generator, aggregated across all tool-interleaved calls] | [trace span] |
| Output-token per turn | [Tokens generated] | [trace span] |
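A minimal sketch of how per-turn and per-run cost might be rolled up from cost attribution records. The record fields (`run_id`, `turn_id`, `source`, `cost_usd`) are assumptions made for illustration; `Decimal` is used so that currency sums do not pick up binary floating-point error.

```python
from collections import defaultdict
from decimal import Decimal


def aggregate_costs(records):
    """Sum LLM and tool costs per (run, turn) and per run."""
    per_turn = defaultdict(Decimal)  # Decimal() is zero
    per_run = defaultdict(Decimal)
    for r in records:
        cost = Decimal(r["cost_usd"])
        per_turn[(r["run_id"], r["turn_id"])] += cost
        per_run[r["run_id"]] += cost
    return dict(per_turn), dict(per_run)


# Hypothetical records for one run with two turns.
records = [
    {"run_id": "a", "turn_id": 1, "source": "llm", "cost_usd": "0.10"},
    {"run_id": "a", "turn_id": 1, "source": "tool", "cost_usd": "0.02"},
    {"run_id": "a", "turn_id": 2, "source": "llm", "cost_usd": "0.05"},
]
per_turn, per_run = aggregate_costs(records)
```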

3.3 Behavioural SLIs

| Indicator | Definition | Measurement source |
| --- | --- | --- |
| Loop-length | [Number of agent steps in a run, including reasoning steps and tool calls] | [trace] |
| Tool-call success rate | [Fraction of tool calls returning a success result, per tool] | [trace, per-tool] |
| Validator-failure rate | [Fraction of tool calls intercepted by a pre- or post-execution validator, per validator] | [trace] |
| Refusal rate | [Fraction of runs in which the generator refuses to respond] | [trace] |
| Unsafe-content rate | [Fraction of responses flagged by the safety classifier] | [safety classifier output] |
| Human-review rating mean | [If the feature is sampled for human review] | [review console] |
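Two of these indicators, loop-length and per-tool success rate, could be computed from a flat list of trace steps along these lines. The step schema (`run_id`, `type`, `tool`, `ok`) is an assumption for the sketch, not something the template prescribes.

```python
from collections import Counter


def loop_lengths(steps):
    """Number of agent steps per run, counting reasoning steps and tool calls."""
    return Counter(s["run_id"] for s in steps)


def tool_success_rates(steps):
    """Fraction of successful calls for each tool that was called at least once."""
    total = Counter()
    succeeded = Counter()
    for s in steps:
        if s["type"] != "tool_call":
            continue
        total[s["tool"]] += 1
        if s["ok"]:
            succeeded[s["tool"]] += 1
    return {tool: succeeded[tool] / total[tool] for tool in total}


# Hypothetical trace: run r1 has three steps, run r2 has two.
steps = [
    {"run_id": "r1", "type": "reasoning", "tool": None, "ok": True},
    {"run_id": "r1", "type": "tool_call", "tool": "search", "ok": True},
    {"run_id": "r1", "type": "tool_call", "tool": "search", "ok": False},
    {"run_id": "r2", "type": "reasoning", "tool": None, "ok": True},
    {"run_id": "r2", "type": "tool_call", "tool": "draft", "ok": True},
]
lengths = loop_lengths(steps)
rates = tool_success_rates(steps)
```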

3.4 Invariant SLIs

| Indicator | Definition | Measurement source |
| --- | --- | --- |
| Invariant-violation rate | [Detected violations of the system invariant per run; must be 0] | [runtime enforcement log] |
| Write-capable-tool-reach attempts | [Attempts to invoke a tool outside the feature’s envelope, per run] | [runtime enforcement log] |

4. Service-level objectives (SLOs)

| SLO | Target | Measurement window | Error budget | Action on breach |
| --- | --- | --- | --- | --- |
| Per-turn latency p50 | [e.g., ≤ 2.0 s] | [28-day rolling] | […] | […] |
| Per-turn latency p99 | [e.g., ≤ 8.0 s] | [28-day rolling] | […] | […] |
| Per-run cost p95 | [e.g., ≤ $0.40] | [28-day rolling] | […] | […] |
| Loop-length p99 | [e.g., ≤ 14 steps] | [28-day rolling] | […] | […] |
| Tool-call success rate | [e.g., ≥ 99.0%, per tool] | [28-day rolling] | […] | […] |
| Refusal rate | [e.g., 0.5% to 5.0% band] | [7-day rolling] | […] | [alert if out of band, in either direction] |
| Unsafe-content rate | [e.g., ≤ 0.02%] | [24-hour rolling] | […] | [page on call] |
| Invariant-violation rate | [0, no budget] | [real-time] | [none, zero-tolerance] | [immediate incident declaration] |
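The table mixes upper bounds, lower bounds, and a two-sided band, so a checker needs to handle all three. The sketch below is one possible encoding; the `Slo` class and `breached` helper are hypothetical, and the numeric targets are copied from the "e.g." placeholders above rather than being recommended values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Slo:
    name: str
    lower: Optional[float] = None  # inclusive lower bound, if any
    upper: Optional[float] = None  # inclusive upper bound, if any

    def is_met(self, observed):
        """True when the observed SLI lies inside every configured bound."""
        if self.lower is not None and observed < self.lower:
            return False
        if self.upper is not None and observed > self.upper:
            return False
        return True


# Placeholder targets taken from the template's examples.
EXAMPLE_SLOS = [
    Slo("per_turn_latency_p99_s", upper=8.0),
    Slo("tool_call_success_rate", lower=0.99),
    Slo("refusal_rate", lower=0.005, upper=0.05),  # band: breach in either direction
    Slo("invariant_violation_rate", upper=0.0),    # zero tolerance
]


def breached(slos, observed):
    """Names of the SLOs whose observed value is out of bounds."""
    return [s.name for s in slos if not s.is_met(observed[s.name])]
```

With this encoding a refusal rate of 0.1% counts as a breach just as 6% would, which matches the "alert if out of band, in either direction" action.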

5. Error budget policy

| Field | Value |
| --- | --- |
| Monthly error budget per SLO | [the time the service may be out of SLO before release cadence slows] |
| Budget-burn alert thresholds | [e.g., 2× and 10× burn-rate alerts] |
| Release-cadence impact | [when budget is exhausted, release cadence changes to what] |
| Budget-exhausted recovery | [the process to re-earn budget: improvement work, post-incident remediation, stricter canary] |
| Invariant-violation budget | [zero; not subject to the general error-budget process] |
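Burn rate is commonly defined as the observed bad-event fraction divided by the fraction the SLO allows, so a rate of 1 consumes the budget exactly over the measurement window. A small sketch of that arithmetic, using the 2× and 10× example thresholds from the table; the function names and the "slow-burn"/"fast-burn" labels are illustrative assumptions.

```python
def burn_rate(bad_events, total_events, allowed_bad_fraction):
    """Observed bad fraction relative to the fraction the SLO permits."""
    if total_events == 0:
        return 0.0  # no traffic, nothing burned
    return (bad_events / total_events) / allowed_bad_fraction


def burn_alert(rate, slow=2.0, fast=10.0):
    """Classify a burn rate against the two alert thresholds."""
    if rate >= fast:
        return "fast-burn"
    if rate >= slow:
        return "slow-burn"
    return None


# Example: a 99.0% tool-call success SLO allows 1% failed calls.
allowed = 0.01
alert = burn_alert(burn_rate(bad_events=150, total_events=1000, allowed_bad_fraction=allowed))
```

The invariant-violation SLO has no budget, so it would bypass this calculation entirely: any single violation triggers the incident path.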

6. Incident response

6.1 Severity classification

| Severity | Trigger | Response time | Communication |
| --- | --- | --- | --- |
| SEV-1 | [invariant violation, global outage, data breach] | [minutes] | [exec paging, customer communication] |
| SEV-2 | [SLO breach with active user impact, partial outage] | [tens of minutes] | [team paging, status page update] |
| SEV-3 | [SLO budget-burn rate alert, degraded behaviour] | [business hours] | [team notification] |
| SEV-4 | [minor anomaly, non-urgent drift] | [next business day] | [tracked issue] |

6.2 Kill-switch topology

| Mode | Who can invoke | Authentication | Propagation | In-flight behaviour | Smoke-test cadence |
| --- | --- | --- | --- | --- | --- |
| Tool-level freeze | [role(s)] | [auth step] | [target seconds] | […] | […] |
| Generator-level freeze | [role(s)] | […] | […] | […] | […] |
| Agent-level freeze | [role(s)] | […] | […] | […] | […] |
| Global freeze | [role(s); two-person rule?] | […] | [target ≤ 60s] | […] | […] |
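One way to reason about the four freeze modes is as nested scopes that the runtime consults before each tool call or generator call. The sketch below assumes that a global freeze blocks everything, an agent-level freeze blocks all activity of that agent, and tool-level and generator-level freezes block only the named tool or generator; these semantics and the `KillSwitch` class are assumptions for illustration, and a given runtime may define the modes differently.

```python
from dataclasses import dataclass, field


@dataclass
class KillSwitch:
    """In-memory freeze state consulted before every call (hypothetical)."""
    global_frozen: bool = False
    frozen_agents: set = field(default_factory=set)
    frozen_generators: set = field(default_factory=set)
    frozen_tools: set = field(default_factory=set)

    def blocks_tool_call(self, agent, tool):
        """A tool call is blocked by a global, agent-level, or tool-level freeze."""
        return (
            self.global_frozen
            or agent in self.frozen_agents
            or tool in self.frozen_tools
        )

    def blocks_generation(self, agent, generator):
        """A generator call is blocked by a global, agent-level, or generator-level freeze."""
        return (
            self.global_frozen
            or agent in self.frozen_agents
            or generator in self.frozen_generators
        )


switch = KillSwitch()
switch.frozen_tools.add("send_message")  # tool-level freeze on one tool
```

A smoke test for the topology could flip each mode in a staging environment and assert that the corresponding check starts returning `True` within the propagation target recorded in the table.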

6.3 Runbooks

For at least three named failure scenarios, provide the runbook (or a link to the runbook) with its decision tree, escalation paths, and rollback commands.

  • [Prompt-injection incident runbook]
  • [Generator-outage failover runbook]
  • [Kill-switch invocation runbook]
  • [Tool-authorization breach runbook]
  • [Cost-anomaly incident runbook]

7. Observability contract

| Field | Value |
| --- | --- |
| Trace backend | [Arize, Langfuse, Datadog, OpenTelemetry stack, or other] |
| Metric backend | [Prometheus, CloudWatch, Datadog, or other] |
| Log backend | [with retention class per stream] |
| End-to-end trace ID propagation | [from user-turn through runtime through tool calls through LLM calls] |
| Log hygiene | [redaction rules for free-text inputs, retention class per stream] |
| Dashboard location | [URL] |
| Alert channels | [paging destinations, severity routing] |
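End-to-end trace ID propagation means every span emitted during a turn, whether from the runtime, a tool call, or an LLM call, carries the same identifier. The sketch below shows the idea with the standard-library `contextvars` module; the function names and span format are hypothetical, and a tracing library such as OpenTelemetry would normally provide its own context propagation in place of this stand-in.

```python
import contextvars
import uuid

# Holds the trace ID for the current turn; visible to nested calls.
_trace_id = contextvars.ContextVar("trace_id", default=None)


def emit_span(name, sink):
    """Record a span tagged with whatever trace ID is current."""
    sink.append({"trace_id": _trace_id.get(), "name": name})


def call_llm(sink):
    emit_span("llm", sink)


def call_tool(tool_name, sink):
    emit_span("tool:" + tool_name, sink)


def handle_turn(sink):
    """Start a new trace for the user turn and run one LLM call and one tool call."""
    trace_id = uuid.uuid4().hex
    _trace_id.set(trace_id)
    emit_span("turn", sink)
    call_llm(sink)
    call_tool("search", sink)
    return trace_id


sink = []
turn_trace_id = handle_turn(sink)
```

A contract test could assert, as here, that all spans collected for a turn share one trace ID, which is what lets the latency, cost, and behavioural SLIs be joined per turn.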

8. Review and amendments

| Role | Name | Decision | Date |
| --- | --- | --- | --- |
| Runtime owner | […] | Authored | YYYY-MM-DD |
| Site-reliability reviewer | […] | Approved | YYYY-MM-DD |
| Governance reviewer | […] | Approved | YYYY-MM-DD |
| Architecture reviewer | […] | Approved | YYYY-MM-DD |

Amendment log. Material amendments (change of invariant, change of SLO target, change of kill-switch topology) require re-review by the full panel; non-material amendments (new SLI added for observation, alerting-threshold tuning) may be self-approved by the runtime owner and site-reliability reviewer.


Notes on use

When to use this template. Use it for every agentic feature, meaning any feature that orchestrates multi-step model reasoning with tool calls. Single-turn features use a simpler SLO sheet.

Common errors in first-time use. Missing system invariant; SLOs that are not measurable from the trace data; kill-switch topology without propagation SLO; zero-tolerance SLOs expressed with error budgets; runbooks reduced to links that point to empty wiki pages. Reviewers treat these as blocking.

What follows. The SLO sheet is cited from Template 1 §9 (operational architecture) and feeds the feature’s release-gate readiness review. It is re-reviewed on every material runtime change, every new tool added, and at least quarterly.


© FlowRidge.io — COMPEL AI Transformation Methodology. All rights reserved.