COMPEL Specialization — AITM-AAG: Agentic AI Governance Associate Lab 1 of 2
Lab objective
Apply the Level 0–5 autonomy rubric from Article 3 to five described agents, producing a classification record per agent that includes level, rationale against the four classification criteria, reclassification triggers, and a sanity check against the oversight-mode mix implied by the level.
Prerequisites
- Completion of Articles 1, 2, and 3 of this credential.
- Familiarity with the Anthropic Responsible Scaling Policy and OpenAI Preparedness Framework as referenced public analogues.
Agents to classify
Treat each description as the full known facts. Where additional facts would change the classification, name the dependency.
Agent A — Legal research assistant
A mid-size law firm operates an agent that takes a case description, searches three subscription legal databases, reads relevant judgments, summarises precedent, and drafts a memo. The agent runs for 10–15 minutes per invocation with no human in the loop; a partner reviews the memo before it is sent to the client. The agent has read-only access to the databases and a draft-write capability to the firm’s internal document management system. Persistent memory stores only case identifier and a short result summary for cross-case reference.
Agent B — Customer refund bot
A retailer operates a customer-facing agent that answers questions about orders. For qualifying refund requests under a policy cap (reversible for 30 days, up to €100 per transaction), the agent issues refunds directly via the payment-processor API. A human agent handles anything outside the policy. The bot sends the refund; the customer receives confirmation. No human is in the loop on the refund action itself.
Agent C — Overnight reconciliation agent
A financial-services back-office agent runs from 21:00 to 05:00 daily, reconciling transactions across three internal systems. Over the run it may invoke 2,000–5,000 tool calls. Exceptions are written to a review queue that humans process the next morning. The agent has write access to an internal reconciliation ledger; actions are reversible within the day but not easily thereafter. Persistent memory stores per-account reconciliation state.
Agent D — Coding assistant with test runner
A developer-facing agent accepts a natural-language instruction, proposes code, and — with the developer’s explicit approval per action — executes a test runner and (separately approved) commits code to a branch. Each action is individually approved in an IDE plugin. The agent has no persistent memory across sessions.
Agent E — Multi-agent research workbench
A trading-research team operates a multi-agent workbench. A researcher agent searches sources, an analyst agent synthesises, a critic agent challenges, and a coordinator agent produces a draft note. The workbench runs for 30–45 minutes per note with no step-level human approval; a human reviews the final draft. The critic’s challenges frequently cause the analyst to revise; the coordinator has kill-switch authority over workers. Persistent memory stores per-security research notes for cross-session reference.
Step-by-step method
For each agent:
- Human-in-the-loop cadence. What is the longest stretch of execution without human approval or review? Count in actions, not wall-clock time.
- Reversibility. Are actions read-only, reversible write, or irreversible?
- Tool surface. Narrow (few specific tools), medium (a handful, all constrained), or broad (general-purpose)?
- Consequence severity. What is the worst-case consequence of a single bad action?
- Level assignment. Apply the rubric.
- Oversight-mode sanity check. Does the oversight regime implied by the level match what is stated in the description?
- Reclassification triggers. Which of the seven standing triggers from Article 3 are especially important for this agent?
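The method above amounts to filling in a structured record per agent and then checking it for internal consistency. A minimal sketch of such a record — every enum value, field name, and sanity-check threshold here is illustrative, not taken from the Article 3 rubric:

```python
from dataclasses import dataclass, field
from enum import Enum

class Reversibility(Enum):       # the three buckets named in the method
    READ_ONLY = "read-only"
    REVERSIBLE_WRITE = "reversible write"
    IRREVERSIBLE = "irreversible"

class ToolSurface(Enum):
    NARROW = "narrow"
    MEDIUM = "medium"
    BROAD = "broad"

@dataclass
class ClassificationRecord:
    agent_id: str
    max_unattended_actions: int      # longest stretch without approval/review, in actions
    reversibility: Reversibility
    tool_surface: ToolSurface
    worst_case_consequence: str      # e.g. "medium (advice is partner-reviewed)"
    level: int                       # 0-5 per the Article 3 rubric
    oversight_modes: list[str] = field(default_factory=list)
    reclassification_triggers: list[str] = field(default_factory=list)

    def sanity_check(self) -> list[str]:
        """Flag obvious mismatches between level and oversight mix.
        These two rules are illustrative placeholders, not the rubric's."""
        issues = []
        if self.level >= 4 and "post-hoc review" not in self.oversight_modes:
            issues.append("Level 4+ without post-hoc review")
        if self.level <= 2 and self.max_unattended_actions > 1:
            issues.append("Level <=2 but multi-action unattended runs")
        return issues
```

The point of the structure is that the oversight-mode sanity check becomes mechanical once the other fields are filled in honestly.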
Worked answers — assess your own work against these
Minor variations in defensible framing are acceptable; major divergences indicate a need to rework against the rubric.
Agent A — Legal research assistant
- Cadence: 10–15 minutes of autonomous execution (the description gives no action count, so duration stands in for it — a dependency worth naming); final deliverable reviewed.
- Reversibility: read-only external access; write is draft-only (reversible).
- Tool surface: narrow (three databases + internal DMS draft).
- Consequence severity: medium (incorrect research informs advice, but advice is partner-reviewed).
- Level: 3 — Supervised executor. Plan approval is not required per the description; this is a borderline case, and Level 3 is the safer call. Consider promoting to plan approval (which would reinforce Level 3) if partner workload allows.
- Oversight mode: post-hoc review at memo level. Acceptable for Level 3 given the narrow tool surface and reversible writes.
- Key reclassification triggers: tool addition (new database, especially a write-capable one); memory-scope expansion; partner-review cadence change.
Agent B — Customer refund bot
- Cadence: per action on paper, but the refund action itself is not human-approved; it is policy-gated.
- Reversibility: financial action; reversible in principle but creates customer-trust impact if wrong.
- Tool surface: narrow (payment-processor API).
- Consequence severity: medium to high (financial action; reputation risk).
- Level: 2 — Bounded executor, with caveats. The per-action human-in-the-loop is technically policy enforcement rather than human approval; the specialist should challenge the engineering team on whether Level 2 is being claimed cheaply. A more conservative classification is Level 3 with post-hoc sampling and a tighter cap.
- Oversight mode: policy gate + post-hoc audit required.
- Key reclassification triggers: policy cap increase; tool surface expansion (e.g., adding store-credit issuance); change in refund-action reversibility.
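The policy gate the worked answer leans on can be sketched as a pure predicate evaluated before the payment-processor call. The €100 cap and 30-day window come from the description; the function and parameter names are illustrative:

```python
from datetime import date, timedelta

POLICY_CAP_EUR = 100                  # per-transaction cap from the description
REVERSIBLE_WINDOW = timedelta(days=30)

def refund_permitted(amount_eur: float, order_date: date, today: date) -> bool:
    """Policy gate: the agent may call the payment-processor API only
    when both bounds hold; anything outside escalates to a human."""
    within_cap = 0 < amount_eur <= POLICY_CAP_EUR
    within_window = (today - order_date) <= REVERSIBLE_WINDOW
    return within_cap and within_window
```

Note what this makes concrete: the gate is enforcement of a pre-agreed policy, not a human judgment at action time — which is exactly the distinction the specialist should press when Level 2 is claimed.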
Agent C — Overnight reconciliation agent
- Cadence: overnight, thousands of actions without human intervention.
- Reversibility: internal-ledger writes, reversible within the day — but the exception queue is processed the next morning, so the reversal window is tight.
- Tool surface: narrow (three internal systems, constrained APIs).
- Consequence severity: medium (reconciliation errors caught in next-day review, but at scale).
- Level: 4 — Autonomous executor. Long unattended runs with many actions define the level. Exception queue + next-day review is an acceptable Level 4 oversight regime.
- Oversight mode: guardrails (step cap, exception routing) + post-hoc review + stop-go (SRE can halt the run).
- Key reclassification triggers: expansion of write scope; addition of irreversible actions; increase in step count beyond current cap.
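The Level 4 oversight regime named above — step cap, exception routing, stop-go — can be sketched as a thin harness around the reconciliation step. All names are illustrative; the 5,000 cap echoes the upper end of the description's tool-call range:

```python
class StepCapExceeded(RuntimeError):
    """Raised when the guardrail halts the run; an operator decides what next."""

def run_reconciliation(transactions, reconcile, step_cap=5_000):
    """Guardrail sketch: a hard step cap plus exception routing.
    `reconcile` is the (hypothetical) ledger-writing tool call."""
    exception_queue = []                  # humans process this the next morning
    for steps, txn in enumerate(transactions, start=1):
        if steps > step_cap:
            raise StepCapExceeded(f"halted at {step_cap} tool calls")
        try:
            reconcile(txn)
        except Exception as exc:          # route, don't retry: reversibility decays overnight
            exception_queue.append((txn, exc))
    return exception_queue
```

The design choice to route rather than retry reflects the reversibility finding: a wrong write is cheap to fix within the day and expensive after, so surfacing the exception beats a second autonomous attempt.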
Agent D — Coding assistant with test runner
- Cadence: per-action approval.
- Reversibility: code writes reversible (branch commits; can be reverted).
- Tool surface: narrow (editor, test runner, VCS).
- Consequence severity: low to medium (faulty code caught in tests or code review).
- Level: 2 — Bounded executor. Per-action approval defines the level.
- Oversight mode: pre-authorisation per action. Consistent with Level 2.
- Key reclassification triggers: reduction in per-action approval (e.g., batch approvals); addition of production-write tools (e.g., deploy).
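Per-action approval, the defining feature of Agent D's Level 2, reduces to a loop in which no tool call executes without an explicit yes. A minimal sketch — `approve` stands in for the IDE plugin prompt and `run` for the tool invocation, both hypothetical:

```python
def execute_with_approval(actions, approve, run):
    """Level 2 sketch: every action is individually pre-authorised.
    A declined action is recorded and skipped, never executed."""
    results = []
    for action in actions:
        if not approve(action):
            results.append((action, "skipped"))
            continue
        results.append((action, run(action)))
    return results
```

The reclassification trigger above maps directly onto this loop: batching approvals means `approve` is called once per batch instead of once per action, which is no longer Level 2.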
Agent E — Multi-agent research workbench
- Cadence: 30–45 minutes of autonomous multi-agent execution; final deliverable reviewed.
- Reversibility: read-only + memory writes (persistent); no production action.
- Tool surface: medium (sources + internal memory).
- Consequence severity: low to medium (faulty research informs trading decisions; human reviews draft).
- Level: 4 — Autonomous executor. Multi-agent coordination with no step-level human makes this Level 4 despite the read-only tool surface. A Level 3 claim is defensible if the coordinator’s kill-switch and the final human review are rigorous and demonstrably exercised; the specialist should ask for kill-switch rehearsal evidence before agreeing to Level 3.
- Oversight mode: runtime intervention (coordinator kill-switch) + post-hoc review (human final read) + stop-go (operator can halt the workbench).
- Key reclassification triggers: addition of action tools; reduction in final-review thoroughness; addition of new agents to the workbench.
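The coordinator's kill-switch authority — the runtime-intervention mode cited above — can be sketched as a shared stop signal that worker agents poll between tool calls. The class and method names are illustrative, not from the workbench:

```python
import threading

class Coordinator:
    """Kill-switch sketch: workers check `should_continue()` between
    tool calls; the coordinator (or a human operator) can halt the run."""

    def __init__(self):
        self._stop = threading.Event()
        self._reason = None

    def kill(self, reason: str):
        self._reason = reason
        self._stop.set()

    def should_continue(self) -> bool:
        return not self._stop.is_set()
```

This is also the shape of the rehearsal evidence worth asking for: logs showing that `kill` was actually invoked in a drill and that every worker stopped within one tool call of it.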
Deliverable — classification register extract
Produce a five-row register:
| Agent ID | Level | HITL cadence | Reversibility | Tool surface | Consequence | Primary oversight mode | Reclassification triggers |
|---|---|---|---|---|---|---|---|
| Agent A | 3 | Final deliverable | Reversible | Narrow | Medium | Post-hoc review | Tool addition; memory expansion; review cadence |
| Agent B | 2 or 3 | Policy gate per action | Reversible; reputation risk | Narrow | Medium-high | Policy + post-hoc audit | Cap increase; tool expansion |
| Agent C | 4 | Overnight, exception-routed | Reversible within day | Narrow | Medium at scale | Guardrail + stop-go | Write-scope expansion; irreversibility |
| Agent D | 2 | Per action | Reversible | Narrow | Low-medium | Pre-authorisation per action | Approval-batching; deploy-tool addition |
| Agent E | 4 (or 3) | Per session | Read-only + memory | Medium | Low-medium | Runtime intervention + post-hoc | Action tool addition; new agents |
Lab sign-off
Your classification is defensible if, for each agent, you can answer a Methodology Lead’s three follow-up questions:
- Under what conditions would you move this agent up one level?
- Under what conditions would you move it down?
- What evidence, if collected, would change your confidence?
The lab’s pedagogic point is that classification is a governance decision, not a model-capability claim, and reasonable specialists may disagree on borderline cases. The disagreement is the decision; surfacing it produces better governance than suppressing it.