AITE M1.2-Art73 v1.0 Reviewed 2026-04-06 Open Access
M1.2 The COMPEL Six-Stage Lifecycle
AITF · Foundations

Template — Agent SLO / SLI Sheet

Template — Agent SLO / SLI Sheet — Transformation Design & Program Architecture — Advanced depth — COMPEL Body of Knowledge.

Article 73 of 53

COMPEL Specialization — AITE-ATS: Agentic AI Systems Architect Expert Artifact Template 3 of 5


How to use this template

Populate one sheet per agent at the end of Organize. The sheet records the operational promises the agent makes to its users and to the operators, the measurements that confirm or dispute those promises, and the actions taken when the promises are at risk.

Review monthly. Update when the agent’s tool surface, autonomy level, user population, or model mix changes materially. Publish the sheet to the same location as the dashboard it describes; the sheet and the dashboard must tell the same story.


Agent SLO / SLI Sheet

Identity

| Field | Value |
|---|---|
| Agent identifier | stable-agent-id |
| Charter version | 1.0 |
| SLO sheet version | 1.0 |
| Last updated | YYYY-MM-DD |
| Agent owner (role) | role |
| Architect of record (role) | role |
| Observability owner (role) | role |
| Error-budget policy owner (role) | role |

1. User-facing SLOs

The commitments the agent makes to its users, stated exactly as users are promised them.

| SLO ID | Statement | Target | Window | Measurement basis |
|---|---|---|---|---|
| SLO-A1 | Task-completion rate | ≥ 90% | trailing 28 days | runs classified as completed / total runs |
| SLO-A2 | Acknowledgement latency (request to first response) | ≤ 5 s p95 | trailing 7 days | timestamp of first response token minus timestamp of request receipt |
| SLO-A3 | Task turnaround (simple task class) | ≤ 60 s p95 | trailing 7 days | task-completion timestamp minus task-submission timestamp |
| SLO-A4 | HITL-gate response time (when a gate fires) | ≤ 4 business hours p95 | trailing 28 days | operator decision timestamp minus gate-fire timestamp |

Task classes are agent-specific. If the agent handles heterogeneous tasks, SLO-A3 can be decomposed per class.
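A minimal sketch of how SLO-A1 and SLO-A2 could be computed from stored run records, assuming each run carries its outcome, request-receipt time, and first-token time. The `RunRecord` shape and its field names are hypothetical, and p95 is taken with the nearest-rank method; an observability platform may use a different percentile estimator, which can shift borderline results.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class RunRecord:
    # Hypothetical run record; field names are illustrative, not a fixed schema.
    run_id: str
    outcome: str           # e.g. "completed", "failed"
    received_at: float     # request receipt, epoch seconds
    first_token_at: float  # first response token emitted, epoch seconds


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile; integer arithmetic keeps the rank exact."""
    if not values:
        raise ValueError("p95 of an empty sample is undefined")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def in_window(runs: list[RunRecord], now: float, days: int) -> list[RunRecord]:
    """Keep runs whose request arrived in the trailing `days`-day window ending at `now`."""
    start = now - days * SECONDS_PER_DAY
    return [r for r in runs if start <= r.received_at <= now]


def slo_report(runs: list[RunRecord], now: float) -> dict[str, tuple[float, bool]]:
    """Return (measured value, target met?) for SLO-A1 and SLO-A2."""
    a1_runs = in_window(runs, now, 28)  # SLO-A1: trailing 28 days
    if not a1_runs:
        raise ValueError("no runs in the SLO-A1 window")
    completion = sum(r.outcome == "completed" for r in a1_runs) / len(a1_runs)

    a2_runs = in_window(runs, now, 7)  # SLO-A2: trailing 7 days
    ack_p95 = p95([r.first_token_at - r.received_at for r in a2_runs])

    return {
        "SLO-A1": (completion, completion >= 0.90),
        "SLO-A2": (ack_p95, ack_p95 <= 5.0),
    }
```

Each SLO is filtered to its own window before aggregation, so a run older than 7 days can still count toward SLO-A1 while being excluded from SLO-A2.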


2. Operational SLIs

The indicators the operators rely on to keep the agent healthy. SLIs feed into SLOs; some SLIs do not have a direct SLO but trigger alerts when they deviate.

| SLI ID | Indicator | How measured | Alert threshold | Linked SLO |
|---|---|---|---|---|
| SLI-B1 | Tool-error rate, by tool | errored tool calls / total tool calls, per tool, 1-hour window | > 1% for 3 consecutive hours | SLO-A1 |
| SLI-B2 | Loop length (steps per run) | steps, per run | p95 > 1.5× trailing-28-day baseline | SLO-A3 |
| SLI-B3 | HITL-fire rate | runs with ≥ 1 gate fire / total runs | > 1.25× trailing-28-day baseline | SLO-A4 |
| SLI-B4 | Model-call cost per run | per-run sum of model-call cost (USD), 1-day window | p95 > budget | no direct SLO; feeds capacity plan |
| SLI-B5 | Memory-write schema-violation count | violations, 1-day window | > 0 | no direct SLO; critical signal |
| SLI-B6 | Indirect-injection-detector positive rate | positives / retrieval events, 1-day window | > 0.1% or > 1.5× baseline | no direct SLO; security signal |
| SLI-B7 | Kill-switch fire count | count, 1-day window | > 0 synchronous (information); > 0 asynchronous (alert) | no direct SLO; incident signal |
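The alert thresholds in the SLI table could be evaluated with small pure functions like the ones below. This is a sketch under stated assumptions: per-tool counts arrive as hourly `(errored, total)` pairs, baselines are precomputed trailing-28-day values, and every function name here is hypothetical.

```python
from typing import Optional


def tools_in_alert(hourly: dict[str, list[tuple[int, int]]],
                   threshold: float = 0.01, consecutive: int = 3) -> set[str]:
    """SLI-B1: tools whose error rate exceeded `threshold` in each of the
    most recent `consecutive` 1-hour windows.

    `hourly` maps tool name -> [(errored_calls, total_calls), ...], oldest first.
    """
    alerting = set()
    for tool, windows in hourly.items():
        recent = windows[-consecutive:]
        if len(recent) == consecutive and all(
            total > 0 and errored / total > threshold for errored, total in recent
        ):
            alerting.add(tool)
    return alerting


def exceeds_baseline(current: float, baseline: float, factor: float) -> bool:
    """Baseline-relative check: current value above `factor` x baseline."""
    return current > factor * baseline


def injection_alert(positives: int, retrieval_events: int, baseline_rate: float) -> bool:
    """SLI-B6: positive rate > 0.1% or > 1.5x the baseline rate."""
    if retrieval_events == 0:
        return False
    rate = positives / retrieval_events
    return rate > 0.001 or exceeds_baseline(rate, baseline_rate, 1.5)


def kill_switch_severity(sync_fires: int, async_fires: int) -> Optional[str]:
    """SLI-B7: any asynchronous fire alerts; synchronous-only fires are informational."""
    if async_fires > 0:
        return "alert"
    if sync_fires > 0:
        return "information"
    return None
```

The same `exceeds_baseline` helper covers SLI-B2 (p95 loop length, factor 1.5) and SLI-B3 (HITL-fire rate, factor 1.25).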

3. Error-budget policy

The error budget translates SLO targets into an operational rule set. When the budget is at risk, non-essential releases pause and reliability work takes precedence.

| Field | Value |
|---|---|
| Primary SLO for budget | e.g., SLO-A1 task-completion |
| Error budget (per window) | (1 − target) × total events in the window |
| Burn-rate alert thresholds | burn rate > 14.4× sustained over 1 hour; > 6× sustained over 6 hours |
| Budget-at-risk response | (a) halt non-essential releases; (b) prioritise reliability work; (c) convene an incident-review meeting |
| Budget-exhausted response | (a) halt all releases; (b) consider a temporary autonomy-level downgrade; (c) open a ticket to the executive sponsor |
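The error-budget arithmetic can be sketched as follows. The budget is (1 − target) × total events; the burn rate is the observed bad-event fraction divided by (1 − target), so a burn rate of 1 spends exactly the whole budget over the SLO window. Mapping burn-rate alerts to "budget at risk" and a spent budget to "budget exhausted" is an assumption about how the policy rows would be operationalised, and the function names are hypothetical.

```python
BURN_THRESHOLDS = {"1h": 14.4, "6h": 6.0}  # from the burn-rate alert row


def error_budget(total_events: int, target: float) -> float:
    """Allowed bad events in the window: (1 - target) x total events."""
    return (1 - target) * total_events


def burn_rate(bad: int, total: int, target: float) -> float:
    """Observed bad-event fraction divided by the allowed fraction (1 - target)."""
    if total == 0:
        return 0.0
    return (bad / total) / (1 - target)


def burn_alerts(windows: dict[str, tuple[int, int]], target: float) -> set[str]:
    """Alert windows whose burn rate exceeds their threshold.

    `windows` maps "1h" / "6h" -> (bad_events, total_events) over that window.
    """
    return {
        name for name, (bad, total) in windows.items()
        if burn_rate(bad, total, target) > BURN_THRESHOLDS[name]
    }


def budget_status(slo_bad: int, slo_total: int,
                  windows: dict[str, tuple[int, int]], target: float) -> str:
    """Map measurements onto the policy's response tiers (assumed mapping)."""
    if slo_bad >= error_budget(slo_total, target):
        return "budget-exhausted"
    if burn_alerts(windows, target):
        return "budget-at-risk"
    return "within-budget"
```

One consequence worth checking when filling in the table: with SLO-A1's 90% target, the largest possible burn rate is 1 / (1 − 0.90) = 10×, so a 14.4× threshold can never fire. It only becomes reachable for targets above roughly 93.1% (1 − 1/14.4), so the thresholds may need rescaling for the chosen primary SLO.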

4. Task classes and differentiated SLOs

If the agent serves multiple task classes with materially different expectations, decompose its SLOs by class.

| Task class | Description | Representative SLO |
|---|---|---|
| simple | e.g., single-tool read with summarisation | task turnaround ≤ 60 s p95 |
| medium | e.g., multi-tool with bounded loop | task turnaround ≤ 5 min p95 |
| complex / long-horizon | e.g., multi-step planning with HITL gates | task turnaround ≤ 1 business day p95; acknowledgement ≤ 5 s p95 |

Task-class assignment must be deterministic (e.g., performed by a class-router before the run starts), so that each run's class, and therefore the SLO denominator it counts toward, cannot depend on the run's outcome.
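One way to satisfy the determinism requirement is a router that looks only at features known when the task is submitted. The `TaskRequest` fields and the routing rules below are illustrative assumptions keyed to the example classes in the table, not a prescribed classifier.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRequest:
    # Hypothetical submission-time features; nothing here is known only after the run.
    request_id: str
    tools_requested: int
    requires_planning: bool
    gate_possible: bool  # request touches an action behind a HITL gate


def route(req: TaskRequest) -> str:
    """Assign a task class from submission-time features only.

    Because the class never depends on how the run ends, a slow or failed run
    cannot migrate into a more lenient class after the fact.
    """
    if req.requires_planning or req.gate_possible:
        return "complex"
    if req.tools_requested > 1:
        return "medium"
    return "simple"


def turnaround_by_class(requests: list[TaskRequest],
                        turnaround_s: dict[str, float]) -> dict[str, list[float]]:
    """Group measured turnaround seconds under each request's routed class."""
    groups: dict[str, list[float]] = defaultdict(list)
    for req in requests:
        groups[route(req)].append(turnaround_s[req.request_id])
    return dict(groups)
```

For example, a gated request that happens to finish in 60 s still counts toward the complex-class SLO, because its class was fixed at submission; classifying after the fact could quietly move fast runs into the simple class and flatter SLO-A3.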


5. User populations and differentiated targets

If different user populations receive different SLOs (free vs. paid; internal vs. customer-facing), enumerate them here.

| Population | SLO differences | Contractual basis |
|---|---|---|
| internal | same targets; relaxed alert thresholds | internal operating agreement |
| enterprise customers | tighter p95 targets; named support path | contract reference |

6. Measurement plumbing

Where the numbers come from. The dashboard consumers should be able to audit the path from event to metric.

| Metric | Source | Aggregation | Retention |
|---|---|---|---|
| Task-completion rate | agent.run spans in observability sink | count by outcome, windowed | days |
| Tool-error rate | agent.tool_call spans | error_count / total_count | days |
| Model-call cost per run | agent.model_call cost attribute, summed to run | sum, grouped by run_id | days |
| Gate-fire count | agent.gate_eval audit events | count | regulatory horizon |
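A sketch of the model-call cost path feeding SLI-B4: sum the cost of every `agent.model_call` span per `run_id`, then compare the p95 of per-run cost with the budget. The span dictionary layout (`name`, `run_id`, `attributes["cost_usd"]`) is an assumption for illustration, not any specific tracing backend's schema.

```python
from collections import defaultdict


def cost_per_run(spans: list[dict]) -> dict[str, float]:
    """Sum the cost attribute of agent.model_call spans, grouped by run_id.

    Other span types are ignored; runs with no model calls do not appear.
    """
    totals: dict[str, float] = defaultdict(float)
    for span in spans:
        if span.get("name") != "agent.model_call":
            continue
        totals[span["run_id"]] += span["attributes"].get("cost_usd", 0.0)
    return dict(totals)


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def cost_sli_breached(spans: list[dict], budget_usd: float) -> bool:
    """SLI-B4 alert: p95 of per-run model-call cost exceeds the budget."""
    costs = list(cost_per_run(spans).values())
    return bool(costs) and p95(costs) > budget_usd
```

Keeping the grouping key (`run_id`) explicit in code mirrors the audit path the table describes: each alert can be traced back to the spans that produced it.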

7. Report and review cadence

| Report | Cadence | Audience | Owner |
|---|---|---|---|
| SLO compliance snapshot | weekly | agent owner; observability owner | observability owner |
| Error-budget burn report | monthly | agent owner; executive sponsor; architect | agent owner |
| SLO sheet review + update | quarterly or on change event | signing roles below | architect of record |

8. Change log

| Date | Version | Change | Trigger | Author (role) |
|---|---|---|---|---|
| YYYY-MM-DD | 1.0 | initial sheet | onboarding | architect |

9. Sign-off

| Role | Sign-off date |
|---|---|
| Agent owner | |
| Architect of record | |
| Observability owner | |
| Error-budget policy owner | |

End of Agent SLO / SLI Sheet.