AITM M1.6-Art52 v1.0 Reviewed 2026-04-06 Open Access
M1.6 People, Change, and Organizational Readiness
AITF · Foundations

Lab — Human Oversight Regime Design for a Finance Agent


7 min read · Article 52 of 18

COMPEL Specialization — AITM-AAG: Agentic AI Governance Associate Lab 2 of 2


Lab objective

For the described agent below, produce an end-to-end oversight design that includes (a) action-class-to-oversight-mode mapping, (b) operator roles with competency requirements, (c) signal specifications per mode, (d) a human-in-the-loop escalation matrix, and (e) a kill-switch rehearsal schedule. The design must satisfy EU AI Act Article 14(4) requirements (a)–(e) and avoid the oversight-theatre patterns of Article 5.

Prerequisites

  • Completion of Articles 3, 5, 6, 10, and 11 of this credential.
  • Access to EU AI Act Regulation (EU) 2024/1689 text.

Agent description — the subject of the lab

A pan-European asset-management firm has deployed a portfolio-research agent that supports portfolio managers in small-cap European equities. The agent’s capabilities:

  • Search. Retrieves filings from regulatory databases (e.g., national company registries, EDGAR-equivalent sources), company websites, and a subscription news feed.
  • Synthesis. Produces research summaries up to 2,000 words.
  • Draft write. Drafts research notes into a team workspace where portfolio managers review before publication.
  • Alerting. Generates email alerts to portfolio managers when specified signals (earnings surprises, material disclosures) are detected.
  • Comparable selection. Suggests, but does not execute, comparable companies for a target’s valuation exercise.

Operation profile:

  • Runs asynchronously; triggered by corporate-event feeds or by user request.
  • 20–60 concurrent sessions during European market hours.
  • Per-session execution time: 5–20 minutes.
  • Persistent memory: per-security research notes, last 24 months.
  • Shared memory: team-wide curated facts layer, writable by designated editor agents and by managers manually.

Classification context:

  • Autonomy Level 3 (supervised executor): final deliverable reviewed; session is autonomous.
  • Not classified as EU AI Act Annex III high-risk by current reading. Article 50 (transparency obligations) applies because alerts and notes are AI-generated content shown to users.

Design deliverables

The lab produces a written oversight specification of approximately 1,500–2,000 words, plus the five deliverables below. The expected shape of each deliverable is described; the specifics are your design.

Deliverable 1 — Action-class-to-oversight-mode mapping

Identify the distinct action classes the agent performs and assign each to one or more of the four oversight modes (pre-authorisation, runtime intervention, post-hoc review, stop-go).

Expected action classes include at minimum:

  • External retrieval (database / web).
  • Memory write (persistent, per-security).
  • Shared-memory write.
  • Research note draft to workspace.
  • Email alert dispatch.
  • Comparable-selection suggestion to a manager.

For each class, name the mode, the rationale, and any cap or constraint that accompanies the mode. For example, an email alert dispatch defaults to pre-authorisation (it is a user-facing communication), while a research note draft defaults to runtime-intervention-plus-post-hoc (it is reviewed before publication but not approved draft by draft).
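One way to keep the mapping explicit and machine-checkable is a small lookup table keyed by action class. A minimal Python sketch; the mode assignments and caps shown are illustrative assumptions for this lab, not prescribed values:

```python
# Hypothetical action-class-to-oversight-mode mapping for the research agent.
# Modes: "pre-auth", "runtime", "post-hoc", "stop-go". Caps are illustrative.
OVERSIGHT_MAP = {
    "external_retrieval":    {"modes": ["post-hoc"],            "cap": "allow-listed sources only"},
    "memory_write":          {"modes": ["runtime", "post-hoc"], "cap": "schema-validated writes"},
    "shared_memory_write":   {"modes": ["pre-auth"],            "cap": "editor identity required"},
    "note_draft":            {"modes": ["runtime", "post-hoc"], "cap": "draft only; no auto-publish"},
    "email_alert":           {"modes": ["pre-auth"],            "cap": "max 5 alerts/hour/security"},
    "comparable_suggestion": {"modes": ["post-hoc"],            "cap": "suggestion only; no execution"},
}

def modes_for(action_class: str) -> list[str]:
    """Return the oversight modes for an action class; unknown classes fail closed."""
    entry = OVERSIGHT_MAP.get(action_class)
    if entry is None:
        # Fail closed: an unmapped action class triggers a stop-go decision
        # rather than running unsupervised.
        return ["stop-go"]
    return entry["modes"]

print(modes_for("email_alert"))   # → ['pre-auth']
print(modes_for("delete_files"))  # → ['stop-go']
```

The fail-closed default matters: an action class the designers never mapped is exactly the kind of behaviour that should halt the agent, not slip through under the loosest mode.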

Deliverable 2 — Operator roles and competency

Name the distinct operator roles that the oversight regime requires. Minimum roles:

  • Primary reviewer. A portfolio manager or senior analyst who reviews research notes before publication.
  • Operations monitor. A team member responsible for the runtime-intervention dashboard during market hours.
  • Stop-go authority. A named role (e.g., head of research, or a duty officer with delegated authority) with the right to halt the agent.

For each role, specify:

  • Required knowledge (AI-specific, business-specific).
  • Required training (on this agent and on Article 14 obligations generally).
  • Authority and limits.
  • Coverage (hours, on-call, handover).
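To keep role definitions auditable rather than scattered across prose, the four fields above can be captured as one record per role. A sketch with illustrative values; the role name follows the minimum roles listed, but the knowledge, training, and coverage strings are assumptions:

```python
from dataclasses import dataclass

@dataclass
class OperatorRole:
    """One named oversight role, mirroring the four required fields."""
    name: str
    knowledge: list[str]   # required AI-specific and business-specific knowledge
    training: list[str]    # required training, on this agent and on Article 14 generally
    authority: str         # what the role may decide, and its limits
    coverage: str          # hours, on-call, handover arrangements

# Illustrative instance for the stop-go authority role.
stop_go = OperatorRole(
    name="Stop-go authority",
    knowledge=["agent tool surface and failure modes",
               "small-cap equities research process"],
    training=["kill-switch drill participation", "Article 14 obligations"],
    authority="may halt the agent unilaterally; restart requires incident review",
    coverage="market hours on-site; out-of-hours duty officer with delegated authority",
)
```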

Deliverable 3 — Signal specifications

For each oversight mode in use, specify the signals that reach the operator. Address:

  • What triggers a pre-authorisation prompt? What information does the prompt contain?
  • What triggers a runtime alert? What dashboard or channel surfaces it? What is the target latency from trigger to operator awareness?
  • What drives post-hoc review sampling? Are all outputs reviewed, or a sample? What triggers a deeper look?
  • How is a stop-go decision signalled, recorded, and communicated?

Include signal specs for the three agentic-risk classes most relevant: runaway behaviour (Article 9 Category 5), hallucination cascade (Category 8), and memory poisoning (Category 4).
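A signal spec for those three risk classes can be written as structured data so the triggers, thresholds, channels, and latency targets are explicit rather than narrative. A sketch in Python; every threshold, channel, and latency value here is an illustrative assumption you would replace with your own design:

```python
# Hypothetical runtime-alert signal specifications for the three risk classes.
# Thresholds, channels, and latency targets are illustrative assumptions.
SIGNAL_SPECS = {
    "runaway_behaviour": {          # Category 5
        "trigger": "tool calls per session exceed threshold",
        "threshold": 50,            # calls per session
        "channel": "ops dashboard + pager",
        "latency_target_s": 60,     # trigger to operator awareness
    },
    "hallucination_cascade": {      # Category 8
        "trigger": "citation-check failure rate over a sliding window",
        "threshold": 0.2,           # fraction of unverifiable citations
        "channel": "ops dashboard",
        "latency_target_s": 300,
    },
    "memory_poisoning": {           # Category 4
        "trigger": "shared-memory write fails schema or provenance check",
        "threshold": 1,             # any single violation alerts
        "channel": "ops dashboard + security operations queue",
        "latency_target_s": 60,
    },
}

def breaches(signal: str, observed: float) -> bool:
    """True when an observed value meets or exceeds the signal's threshold."""
    return observed >= SIGNAL_SPECS[signal]["threshold"]
```

Writing the spec this way makes the "vague signal specs" pitfall mechanically checkable: every signal must carry a trigger, a threshold, a channel, and a latency target, or it is not a spec.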

Deliverable 4 — Human-in-the-loop escalation matrix

A matrix with rows for signal severities and columns for escalation destinations. Example rows: “anomalous tool-call pattern”; “memory-write schema violation”; “kill-switch triggered”; “user complaint alleging wrong information.” Example columns: “Operations monitor”; “Primary reviewer”; “Stop-go authority”; “Security operations”; “Legal / Compliance”; “Executive sponsor”; “External (counterparty or regulator).”

Each cell either names the action and its deadline (notify within X minutes; escalate within Y minutes; involve Z after N minutes) or is blank. The matrix should avoid both under-escalation (signals that require a response never reach anyone with authority to respond) and over-escalation (every signal reaches the executive sponsor, producing alarm fatigue).
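The matrix can be represented as a mapping from severity rows to destination-deadline cells, which also lets the two failure patterns be checked programmatically. A sketch; the row names follow the examples above, but the deadlines and the specific cell assignments are illustrative assumptions:

```python
# Hypothetical escalation matrix: rows are signal severities, columns are
# destinations; each cell is a deadline in minutes (absent = blank cell).
DESTINATIONS = {"ops_monitor", "primary_reviewer", "stop_go", "security_ops",
                "legal_compliance", "exec_sponsor", "external"}

ESCALATION = {
    "anomalous_tool_calls":    {"ops_monitor": 5, "stop_go": 30},
    "memory_schema_violation": {"ops_monitor": 5, "security_ops": 15, "stop_go": 30},
    "kill_switch_triggered":   {"ops_monitor": 0, "stop_go": 0,
                                "exec_sponsor": 60, "legal_compliance": 120},
    "user_complaint":          {"primary_reviewer": 60, "legal_compliance": 240},
}

def validate(matrix: dict) -> list[str]:
    """Flag under-escalation, over-escalation, and unknown destinations."""
    problems = []
    for severity, row in matrix.items():
        # Under-escalation: no cell names a role that can actually respond.
        if not any(d in row for d in ("ops_monitor", "primary_reviewer", "stop_go")):
            problems.append(f"under-escalation: {severity} reaches no responder")
        # Cells must reference known destination columns.
        for unknown in set(row) - DESTINATIONS:
            problems.append(f"unknown destination: {unknown}")
    # Over-escalation: every row reaches the executive sponsor.
    if matrix and all("exec_sponsor" in row for row in matrix.values()):
        problems.append("over-escalation: every signal reaches the executive sponsor")
    return problems

assert validate(ESCALATION) == []  # this illustrative matrix passes both checks
```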

Deliverable 5 — Kill-switch rehearsal schedule

Specify a kill-switch rehearsal schedule that meets the Article 11 requirement of a tested, not merely wired, kill-switch. Address:

  • Frequency (quarterly at minimum for a Level 3 agent; monthly if the agent’s oversight regime has material gaps being remediated).
  • Announcement (scheduled on the change calendar but not announced in real time).
  • Scenario coverage (rotating across incident classes: runaway, tool misuse, memory poisoning, hallucination cascade).
  • Measurement (target latency from signal to halt; target latency from halt to recovery; documentation of any step that required improvisation).
  • Review (who signs off on the drill’s adequacy; what changes to the playbook or wiring the drill produced).
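The measurement step above reduces to two latencies and an improvisation log per drill. A sketch of the drill record; the two target values are illustrative assumptions, not mandated thresholds:

```python
from datetime import datetime, timedelta

# Illustrative latency targets for a kill-switch drill.
HALT_TARGET = timedelta(minutes=2)        # signal to halt
RECOVERY_TARGET = timedelta(minutes=30)   # halt to recovery

def drill_result(signal_at, halted_at, recovered_at, improvised_steps):
    """Summarise one kill-switch drill against its latency targets."""
    halt_latency = halted_at - signal_at
    recovery_latency = recovered_at - halted_at
    return {
        "halt_latency": halt_latency,
        "halt_within_target": halt_latency <= HALT_TARGET,
        "recovery_latency": recovery_latency,
        "recovery_within_target": recovery_latency <= RECOVERY_TARGET,
        # Any improvised step is a playbook gap, so it fails the drill.
        "improvised_steps": improvised_steps,
        "pass": halt_latency <= HALT_TARGET
                and recovery_latency <= RECOVERY_TARGET
                and not improvised_steps,
    }

t0 = datetime(2026, 4, 6, 10, 0)
result = drill_result(t0, t0 + timedelta(seconds=90),
                      t0 + timedelta(minutes=20), [])
print(result["pass"])  # → True
```

Treating any improvised step as a failed drill is deliberate: improvisation means the playbook or wiring has a gap, which is exactly what the rehearsal exists to surface.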

Assessment rubric

Your deliverable is assessed against the five Article 14(4) requirements and the four Article 5 theatre-avoidance signals.

Criterion, with evidence to look for in your design:

  • Article 14(4)(a) — understand capacities and limitations: operator training covers model choice, tool surface, known failure modes.
  • Article 14(4)(b) — aware of automation bias: sampling sometimes requires the operator to disagree with the agent; diversity of reviewers.
  • Article 14(4)(c) — correctly interpret output: output format designed for reviewer interpretability, not only for user consumption.
  • Article 14(4)(d) — can disregard, override, reverse: operator has real authority; overrides are logged and do not disadvantage the operator.
  • Article 14(4)(e) — can interrupt: runtime intervention and stop-go modes are wired and rehearsed.
  • Theatre signal 1 — operator cannot explain output: decision-point snapshots (Article 10) make reasoning visible.
  • Theatre signal 2 — high approval, low modification: reviewer metrics include override rate; low rates trigger competency review.
  • Theatre signal 3 — kill-switch not rehearsed: rehearsal schedule in Deliverable 5.
  • Theatre signal 4 — no measurable oversight log: SIEM integration from Article 10 captures intervention metrics.
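The theatre signal concerning high approval and low modification can be made operational with a single reviewer metric. A sketch; the 5% floor is an illustrative assumption, and the right value depends on the agent's actual error rate:

```python
# Hypothetical reviewer-metric check: a reviewer who approves nearly everything
# unmodified may be rubber-stamping. The 5% floor is an illustrative assumption.
MIN_OVERRIDE_RATE = 0.05

def needs_competency_review(approved: int, modified: int, rejected: int) -> bool:
    """True when a reviewer's modification-plus-rejection rate falls below the floor."""
    total = approved + modified + rejected
    if total == 0:
        return False  # no decisions yet; nothing to measure
    override_rate = (modified + rejected) / total
    return override_rate < MIN_OVERRIDE_RATE

print(needs_competency_review(98, 1, 1))  # 2% override rate → True
print(needs_competency_review(90, 8, 2))  # 10% override rate → False
```

A low override rate is a trigger for review, not proof of rubber-stamping: an unusually accurate agent would also produce one, which is why the rubric pairs the metric with a competency review rather than an automatic sanction.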

Suggested structure of the written specification

  1. Executive summary (one paragraph).
  2. Action-class-to-mode mapping table.
  3. Operator roles, with competency and coverage.
  4. Signal specifications per mode.
  5. Escalation matrix.
  6. Kill-switch rehearsal schedule.
  7. Article 14(4) compliance statement — one paragraph per requirement explaining how the design satisfies it.
  8. Known residual risks and mitigations.
  9. Review cadence for the oversight regime itself.

Common pitfalls to avoid

  • Over-reliance on pre-authorisation. A design that places every action under pre-authorisation will produce operator fatigue and rubber-stamping. Reserve pre-authorisation for irreversible, high-consequence actions.
  • Unstaffed oversight roles. A design that requires 24/7 coverage without naming how the shifts are staffed is aspirational, not operational.
  • Vague signal specs. “Alert the operations monitor on unusual activity” is not a specification; define the triggers, the thresholds, the channel, and the target latency.
  • Stop-go held by committee. A stop decision that requires a committee will be slow. The right is held by a named role, possibly with a named deputy; committees consult after the stop, not before it.
  • No rehearsal. A kill-switch that has never been exercised does not work.

Lab sign-off

The design is acceptable when (a) every deliverable is produced, (b) the Article 14(4) compliance statement is defensible, and (c) the theatre-avoidance signals are addressed. A Methodology Lead reviewing your submission will ask to see the signal specifications and escalation matrix before the written specification — those two artifacts reveal whether the design is substantive or decorative.