AITE M1.2-Art55 v1.0 Reviewed 2026-04-06 Open Access
M1.2 The COMPEL Six-Stage Lifecycle
AITF · Foundations

Lab — Design an Agent Kill-Switch Specification with Escalation Protocols

Transformation Design & Program Architecture · Advanced depth · COMPEL Body of Knowledge

9 min read · Article 55 of 53

COMPEL Specialization — AITE-ATS: Agentic AI Systems Architect Expert Lab 5 of 5


Lab objective

Write the full kill-switch specification for a stateful agent that writes to memory, calls external tools, and runs for minutes-to-hours per session. Implement both the synchronous kill path (operator presses the switch; the agent halts) and the asynchronous path (a runtime condition trips the switch; the operator is notified after the fact). Wire a deadman switch that halts the agent if the heartbeat to a watchdog stops. Exercise the system under eight scenarios, including the failure modes where the switch itself does not work as intended. Produce the runbook that the 02:00 on-call engineer uses without prior context.

Prerequisites

  • Articles 9, 16, 25 of this credential.
  • A stateful agent — the finance agent from Lab 1, with memory extensions, is a suitable target.
  • A runtime where the agent executes in a separable process: a container, a serverless function with a pause primitive, or a long-running worker where the session can be terminated cleanly.
  • A watchdog service — any small process that can receive heartbeats and, on their absence, issue kill signals.

The kill-switch specification

A kill-switch specification has six components. The lab requires one deliverable per component, plus the runbook.

1 — Scope

What the switch halts, precisely. Options, roughly in increasing blast radius:

  • Current session only (one run; other sessions continue).
  • Current agent (all runs for this agent ID; other agents continue).
  • Current tenant (all agents for this tenant; other tenants continue).
  • Platform-wide (all agents stop).

The lab requires at least per-session and per-agent scopes; platform-wide is a design note.

2 — Triggers

Who or what can trigger the switch, split into synchronous and asynchronous.

Synchronous (operator-initiated):

  • A button in the agent console UI, gated by role.
  • An API endpoint protected by authentication + a break-glass attestation.
  • A CLI command for the on-call engineer.

Asynchronous (runtime-initiated):

  • Cost cap exceeded (tokens, dollars, time).
  • Loop length exceeded.
  • Tool-error rate exceeded within a window.
  • Anomaly detector flag (e.g., distribution shift in planner text).
  • Unauthorised tool-call attempt detected.
  • Memory-write schema violation.
  • Watchdog deadman timeout.

3 — Mechanism

How the halt is effected. Options, roughly in increasing reliability:

  • Cooperative — the agent checks a shared flag between steps and exits on next check. Simple, but fails if the agent is inside a long model call.
  • Token revocation — the agent’s tool credentials are revoked at the auth layer; next tool call fails. Works for tool-mediated harm but does not stop pure-reasoning or memory writes.
  • Process signal — SIGTERM to the agent process; the runtime shuts down cleanly or is killed by SIGKILL after a grace window. Reliable for single-process agents.
  • Network isolation — the agent’s egress is blocked at the network layer. Belt-and-braces for tool-mediated harm.
  • Container-level kill — the runtime orchestrator terminates the container. Most reliable but requires a restart on recovery.

Production agents combine at least two mechanisms: cooperative for clean halts, with process-signal or container-kill as the guaranteed fallback.
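The layered fallback can be sketched as follows: cooperative halt first, then SIGTERM, then SIGKILL after a grace window. This is an illustrative sketch, not a prescribed implementation; the process handle, grace timings, and `set_kill_flag` callback (whatever sets the shared flag the agent checks between steps) are assumptions.

```python
import subprocess

def halt(proc: subprocess.Popen, set_kill_flag, grace_s: float = 10.0) -> str:
    """Escalate through halt mechanisms; return which layer actually fired."""
    set_kill_flag()                       # cooperative: agent exits on next flag check
    try:
        proc.wait(timeout=grace_s)
        return "cooperative"
    except subprocess.TimeoutExpired:
        proc.terminate()                  # SIGTERM: ask the runtime to shut down cleanly
        try:
            proc.wait(timeout=grace_s)
            return "sigterm"
        except subprocess.TimeoutExpired:
            proc.kill()                   # SIGKILL: unconditional; requires restart on recovery
            proc.wait()
            return "sigkill"
```

Recording which layer fired is useful for the rehearsal record: repeated escalations past `cooperative` indicate the agent's step loop is not checking the flag often enough.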

4 — Latency targets

Two latencies, both measurable.

  • Signal-to-halt — from the moment the trigger fires to the moment the agent emits no further tool call or memory write. Target: ≤ 5 seconds for Level 3 agents; ≤ 2 seconds for Level 4; ≤ 1 second for Level 5.
  • Halt-to-recovery — from halt to the point at which a replacement session can be reliably started. Target: ≤ 5 minutes for most agents; longer if state cleanup is expensive.
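Both latencies are measurable directly from the audit trail. A minimal sketch, using `session.halted` as the halt marker; the event names follow this specification, but the ISO-8601 `timestamp` field is an assumption about the event schema.

```python
from datetime import datetime

def _ts(event: dict) -> datetime:
    return datetime.fromisoformat(event["timestamp"])

def signal_to_halt_seconds(events: list[dict]) -> float:
    """Seconds from the trigger firing to the agent ceasing tool calls
    and memory writes, approximated by the session.halted event."""
    fired = next(e for e in events if e["type"] == "kill_switch.fired")
    halted = next(e for e in events if e["type"] == "session.halted")
    return (_ts(halted) - _ts(fired)).total_seconds()
```

A rehearsal for a Level 3 agent passes this check when `signal_to_halt_seconds(trace) <= 5.0`.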

5 — Side effects

What state is left behind. The specification makes explicit:

  • Which in-flight tool calls may complete (idempotent reads) and which are rolled back (writes with side effects).
  • Which memory writes are committed vs. reverted.
  • Which user-visible state (drafts, partial outputs) is preserved for post-incident review vs. purged.
  • Which audit events are emitted (kill_switch.fired, session.halted, in_flight_tool_calls.status).

6 — Rehearsal cadence

Kill switches that are never fired are not kill switches. The specification requires rehearsal:

  • Monthly for Level 4–5 agents.
  • Quarterly for Level 3 agents.
  • Ad hoc on any material change to the agent’s tools, autonomy, or memory.

Each rehearsal records: scenario, expected latency, actual latency, side-effect observed, remediation if the rehearsal fails.
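The rehearsal record can be a small structured object. The field names below mirror the list above; the pass criterion (actual latency within expected) is an assumption about how a team might score a rehearsal.

```python
from dataclasses import dataclass

@dataclass
class RehearsalRecord:
    scenario: str
    expected_latency_s: float
    actual_latency_s: float
    side_effects_observed: str
    remediation: str = ""        # filled in only when the rehearsal fails

    def passed(self) -> bool:
        return self.actual_latency_s <= self.expected_latency_s
```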

Step 1 — Write the specification

Populate the six components above as the lab’s first deliverable. Target: two pages. The specification is the contract; subsequent steps implement it.

Step 2 — Implement the synchronous path

The agent runtime checks a shared kill_flag between steps. If the flag is set, the agent emits a session.halted event, rolls back any in-flight reversible writes, and exits. The flag is set by the operator via:

  • A button in a minimal UI.
  • A REST endpoint: POST /agents/{id}/kill with authentication and a justification string.
  • A CLI: agent-admin kill --session-id <id> --reason <text>.

Each path writes an operator_action audit event including operator identity, timestamp, and justification.
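A minimal sketch of the cooperative halt loop. The kill flag is an in-process object here; in production it would be backed by shared storage (for example a key in Redis) so the UI, REST endpoint, and CLI can all set it. The rollback helper is a hypothetical hook.

```python
class KillFlag:
    def __init__(self):
        self.operator = None
        self.reason = None

    def fire(self, operator: str, reason: str):
        self.operator, self.reason = operator, reason

    def is_set(self) -> bool:
        return self.reason is not None

def run_session(steps, flag: KillFlag, audit: list,
                rollback_in_flight_writes=lambda: None):
    """Execute steps, checking the kill flag between each one."""
    for step in steps:
        if flag.is_set():
            rollback_in_flight_writes()   # revert reversible in-flight writes
            audit.append({"type": "session.halted", "reason": flag.reason,
                          "operator": flag.operator})
            return
        step()
    audit.append({"type": "session.completed"})
```

Each operator path (UI, REST, CLI) calls `flag.fire(...)` and writes its own operator_action audit event before the agent observes the flag.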

Step 3 — Implement the asynchronous path

A supervisor process observes the agent’s trace stream. Rules, as policy:

rule "cost cap":
  when session_cost_usd > budget: fire kill_switch with reason "cost_cap"

rule "loop length":
  when step_count > max_steps: fire kill_switch with reason "loop_length"

rule "tool-error rate":
  when tool_error_rate_60s > 0.3: fire kill_switch with reason "tool_error_rate"

rule "unauthorised tool call":
  when authorization_denial_count > 0: fire kill_switch with reason "auth_denial"

rule "memory schema violation":
  when memory_write_schema_violation > 0: fire kill_switch with reason "memory_schema"

The supervisor writes to the same kill_flag; the agent halts on next step. The supervisor also emits kill_switch.fired with the rule name.
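The rules above can be evaluated as policy data rather than hardcoded conditionals, which is what the rubric rewards. A sketch under that assumption, with illustrative thresholds and a metrics snapshot computed from the trace stream:

```python
# reason code -> (metric name, threshold); exceeding the threshold fires the switch
RULES = {
    "cost_cap":        ("session_cost_usd",              5.00),
    "loop_length":     ("step_count",                    50),
    "tool_error_rate": ("tool_error_rate_60s",           0.3),
    "auth_denial":     ("authorization_denial_count",    0),
    "memory_schema":   ("memory_write_schema_violation", 0),
}

def evaluate(metrics: dict, fire_kill_switch, emit_event):
    """Fire on the first rule whose threshold is exceeded; return its reason code."""
    for reason, (metric, threshold) in RULES.items():
        if metrics.get(metric, 0) > threshold:
            fire_kill_switch(reason)      # sets the shared kill_flag
            emit_event({"type": "kill_switch.fired", "rule": reason})
            return reason
    return None
```

Keeping thresholds in a table means the cost cap or loop limit can be tuned per agent without touching supervisor code, and the rule name flows directly into the kill_switch.fired event.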

Step 4 — Implement the deadman switch

The agent emits a heartbeat to the watchdog every 5 seconds. If the watchdog fails to receive three consecutive heartbeats (15 seconds), it:

  1. Writes kill_flag as a belt-and-braces measure.
  2. Issues a process-signal kill to the agent’s runtime.
  3. Emits deadman.triggered.

The watchdog itself is monitored; its absence is alerted to the on-call channel.
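The watchdog's core logic can be sketched with the clock injected, so the timeout is testable without waiting fifteen real seconds. The heartbeat transport and the process-kill call (for example SIGTERM to the runtime's PID) are assumptions; the three callbacks correspond to the three numbered actions above.

```python
import time

HEARTBEAT_INTERVAL_S = 5
MISSED_BEATS = 3          # 3 x 5 s = the 15-second deadman window

class Watchdog:
    def __init__(self, now=time.monotonic):
        self.now = now
        self.last_beat = now()

    def heartbeat(self):
        self.last_beat = self.now()

    def expired(self) -> bool:
        return self.now() - self.last_beat > HEARTBEAT_INTERVAL_S * MISSED_BEATS

    def check(self, set_kill_flag, kill_process, emit_event):
        if self.expired():
            set_kill_flag()                               # 1. belt-and-braces
            kill_process()                                # 2. e.g. SIGTERM to the runtime
            emit_event({"type": "deadman.triggered"})     # 3. audit trail
```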

Step 5 — Exercise under eight scenarios

The battery covers synchronous, asynchronous, deadman, and failure-of-the-switch cases.

| Scenario | Expected path | What is checked |
|---|---|---|
| S1 — operator presses UI button mid-run | Synchronous | Signal-to-halt ≤ 5 s; in-flight writes rolled back; audit event present |
| S2 — cost cap exceeded | Asynchronous | cost_cap fired; halt observed; operator notified |
| S3 — loop length exceeded | Asynchronous | loop_length fired; the last tool call does not issue |
| S4 — three unauthorised tool calls | Asynchronous | auth_denial fired on first; no second attempt |
| S5 — agent process hangs inside a long model call | Deadman | Deadman fires after 15 s; process killed |
| S6 — operator presses kill, agent is mid-tool-call | Synchronous + side-effect policy | Tool call either completes cleanly (read) or is rolled back (write); no hanging state |
| S7 — supervisor process is itself down | Defence-in-depth | Watchdog notices supervisor absence; alert fires; synchronous path still works |
| S8 — network partition between operator and agent runtime | Degraded | Asynchronous path fires on local rules; synchronous path unreachable; deadman remains effective |

Each scenario produces a trace and a post-run check. The rubric rewards the lab team’s honesty on scenarios S7 and S8: do not claim the switch works when parts of the environment are unreachable; describe what degrades.

Step 6 — Write the runbook

Two pages. The runbook is for the on-call engineer at 02:00 who has never seen this agent before.

Sections:

  1. What this agent does (one paragraph).
  2. How to kill it — the three synchronous paths, with exact commands.
  3. What will happen — the expected side effects.
  4. How to know it worked — the audit events and dashboard panels to check.
  5. How to recover — starting a replacement session, restoring consistent state.
  6. When to escalate — roles and contact methods, with a decision tree (e.g., “if memory integrity is in question, escalate to the data-governance on-call; if customer-facing output was affected, escalate to the communications on-call”).

The runbook is versioned with the agent. A new version of the agent requires a reviewed runbook.

Deliverables

  1. Specification document (Step 1).
  2. Synchronous implementation (Step 2). UI, REST endpoint, CLI.
  3. Asynchronous supervisor (Step 3). Policy rules.
  4. Deadman watchdog (Step 4).
  5. Scenario exercise outputs (Step 5). Traces per scenario.
  6. Runbook (Step 6).

Rubric

| Criterion | Evidence | Weight |
|---|---|---|
| Specification covers six components | Document review | 15% |
| Synchronous path meets latency target | Trace measurement | 15% |
| Asynchronous rules are written as policy, not hardcoded | Code review | 15% |
| Deadman switch fires on hang | Scenario S5 trace | 10% |
| Honest handling of S7 and S8 (degraded modes) | Narrative review | 15% |
| Runbook is usable cold | Walk-through with an unfamiliar reader | 20% |
| Audit events emitted per specification | Event schema review | 10% |

Lab sign-off

The Methodology Lead’s three follow-up questions:

  1. If the agent’s tools include an irreversible external action (e.g., wire transfer), how does that change the side-effect policy in Step 5, and what additional pre-commitment control is required?
  2. Your deadman threshold is 15 seconds. Under what agent workload profile would 15 seconds be too loose, and under what profile would it be too tight?
  3. If the switch is rehearsed quarterly but the agent’s tool-surface changes monthly, what is the correct rehearsal trigger — cadence, change-event, or both?

A defensible submission handles irreversibility by moving such tools to pre-commitment with a cool-off (a gate that requires a second signal after a timed delay); adjusts the deadman threshold to the agent’s planner-step time distribution (tight for fast loops, loose for agents doing long-document reading); and answers “both” on the rehearsal trigger, with change-events taking precedence over cadence.

The lab’s pedagogic point is that a kill switch is not a safety theatre control. It is a real engineered subsystem with latency budgets, side-effect policies, and rehearsal discipline — and the rehearsal record, not the design document, is the evidence a regulator or incident commander will ask for.