AITE-SAT: AI Solution Architecture Expert — Body of Knowledge Lab Notebook 3 of 5
Scenario
You are architecting AlphaDesk, an agentic research assistant for the equities trading desk of a mid-sized asset manager. AlphaDesk helps portfolio managers and execution traders by drafting trade ideas, pulling market data, running position-level risk checks, drafting pre-trade compliance memos, and composing broker-request-for-quote messages. It does not submit orders. The desk’s policy is clear: no tool in AlphaDesk’s reach can place, modify, or cancel an order on a live execution venue. A human is the only actor who can transmit orders, through the existing order-management system. The assistant reads; the human trades.
AlphaDesk is built on an agentic runtime (the architect may select LangGraph, CrewAI, AutoGen, the OpenAI Agents SDK, LlamaIndex Agents, or a hand-rolled orchestrator; the choice must be justified). It has access to a portfolio-position read tool, a market-data read tool, a research-note read tool, a pre-trade-compliance check tool (read-only against the rules engine), and an email-draft tool (drafts a message to the outbox but does not send). The generator is a closed-weight managed API with function-calling; a second open-weight generator on internal infrastructure is the failover path for when the managed API is unreachable. The asset manager operates under MiFID II, the SEC’s Market Access Rule (Rule 15c3-5) for any connectivity it provides, and an internal model-risk management framework consistent with the Federal Reserve’s SR 11-7 guidance.
Your assignment is to deliver an architecture package that shows the system can be operated, observed, bounded, and shut down without ambiguity.
Part 1: Tool-use boundary and the invariant (40 minutes)
Produce the tool-use boundary. For each of the five tools, document:
- The input schema (JSON schema with a version), including any field-level redaction or tenant-scoping rules.
- The output schema and any truncation or summarization applied before the output re-enters the agent context.
- The authorization decision applied at call time: who the caller is (the human’s identity is propagated to the tool call), the policy check performed, and the logging signature.
- The read-only / write posture. All five tools must be read-only or draft-only. Any agent transition that attempts to invoke a write-capable tool must fail closed at the runtime, not at the tool. Document where this check lives and how a violation is detected.
- The rate limit and the concurrency limit per caller.
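The boundary above can be made concrete as a declarative registry the runtime consults before every call. This is a minimal sketch, not a prescribed implementation: the tool names, rate limits, and the `ToolSpec`/`Posture` types are illustrative assumptions, and the point is that the read-only/draft-only posture is data the runtime checks, not behaviour buried inside each tool.

```python
from dataclasses import dataclass
from enum import Enum

class Posture(Enum):
    READ_ONLY = "read_only"
    DRAFT_ONLY = "draft_only"   # writes to an outbox, never to an external venue

@dataclass(frozen=True)
class ToolSpec:
    name: str
    schema_version: str      # versioned input schema, per the boundary doc
    posture: Posture
    rate_limit_per_min: int  # per-caller rate limit
    max_concurrency: int     # per-caller concurrency limit

# Illustrative registry of the five tools; numbers are placeholders.
REGISTRY = {
    spec.name: spec
    for spec in [
        ToolSpec("position_read",    "1.0", Posture.READ_ONLY,  60, 4),
        ToolSpec("market_data_read", "1.2", Posture.READ_ONLY, 120, 8),
        ToolSpec("research_read",    "1.0", Posture.READ_ONLY,  60, 4),
        ToolSpec("compliance_check", "2.1", Posture.READ_ONLY,  30, 2),
        ToolSpec("email_draft",      "1.0", Posture.DRAFT_ONLY, 10, 1),
    ]
}

def authorize_call(tool_name: str) -> ToolSpec:
    """Fail closed at the runtime: an unregistered or write-capable tool
    raises a violation rather than silently no-opping."""
    spec = REGISTRY.get(tool_name)
    if spec is None or spec.posture not in (Posture.READ_ONLY, Posture.DRAFT_ONLY):
        raise PermissionError(f"tool {tool_name!r} is outside the read/draft envelope")
    return spec
```

Because the check lives in `authorize_call` at the runtime layer, a violation is detected as a raised exception before the tool is ever reached, which is the fail-closed posture the boundary requires.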
Write the system invariant in one sentence — the property that must be true at every step of every agent run, enforced by the runtime and testable in continuous integration:
AlphaDesk can read market state, portfolio state, research state, and compliance state and can draft messages, but no agent plan, tool call, prompt injection, or operator instruction can cause it to send, submit, modify, or cancel an order, a position, or a communication to an external venue or counterparty.
Describe how you assert this invariant in CI (a property-based test that enumerates tool-call sequences, a policy simulation against a recorded trace library, or a fuzzing harness against a red-team corpus). A credible architecture shows the invariant is checked continuously, not hoped for.
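One way to make the invariant continuously checkable is a brute-force enumeration over short tool-call sequences in CI, asserting that the runtime gate never admits a write-capable call. This is a minimal sketch under stated assumptions: the tool names are illustrative, and a production check would enumerate recorded traces or use a property-based harness rather than the toy universe shown here.

```python
import itertools

# Illustrative tool universe: the five allowed tools plus write-capable
# tools that must never pass the runtime gate.
ALLOWED = {"position_read", "market_data_read", "research_read",
           "compliance_check", "email_draft"}
WRITE_CAPABLE = {"order_submit", "order_modify", "order_cancel", "email_send"}

def runtime_gate(call: str) -> bool:
    """The runtime's fail-closed check: only registered read/draft tools pass."""
    return call in ALLOWED

def invariant_holds(sequence) -> bool:
    # The invariant: at no step may the gate admit a write-capable tool.
    return all(not (runtime_gate(call) and call in WRITE_CAPABLE)
               for call in sequence)

def check_all_sequences(max_len: int = 3) -> bool:
    """CI entry point: enumerate every tool-call sequence up to max_len."""
    universe = sorted(ALLOWED | WRITE_CAPABLE)
    return all(invariant_holds(seq)
               for n in range(1, max_len + 1)
               for seq in itertools.product(universe, repeat=n))
```

The enumeration is trivially small here; the structure matters: the invariant is a function of the gate, so any change to the registry or gate logic is re-tested on every commit rather than hoped for.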
Expected artifact: AlphaDesk-Tool-Boundary.md.
Part 2: Pre- and post-execution validators (40 minutes)
Produce the validator stack that wraps every tool call. Specify:
- Pre-execution validators. At least four: schema validation, policy check (authorization and data residency), rate-limit check, and prompt-injection screen on any free-text argument. For each, say where the validator lives (runtime, tool, gateway), the failure mode it catches, and the action on failure.
- Post-execution validators. At least three: output-schema validation, output-sensitivity screen (redact personal identifiers before the output re-enters the agent context), and a consistency check against any invariants the tool’s output must respect (for example, a position query must return a position owned by the caller’s desk).
- The composition. The order of validators, the circuit-breaker behaviour when a validator reports a failure, and the replay protocol when a validator has been added after a prior run.
Include one example trace — a single agent run that pulls a position, pulls market data, runs a compliance check, and drafts an email — annotated with the validator pass at each step. The trace is the teaching artifact: a reader should be able to follow the run and see exactly where each validator engaged.
Expected artifact: AlphaDesk-Validator-Stack.md.
Part 3: Kill-switch topology (30 minutes)
Design the kill-switch topology. AlphaDesk has four shutdown modes that a human operator (or an automated trigger) can invoke:
- Tool-level freeze. Disable one named tool while the rest of the agent continues.
- Generator-level freeze. Disable one named model or provider; traffic fails over to the alternative.
- Agent-level freeze. Disable the agent runtime for a named user, desk, or asset-class scope.
- Global freeze. Disable the entire AlphaDesk feature across the firm, within one minute of invocation.
For each mode, document: who can invoke it, the authentication step, the propagation path (the feature-flag service, the gateway, the runtime), the maximum propagation latency, and the effect on in-flight runs. Document the smoke test that confirms each mode works, and the cadence on which the test runs. Document the audit-log entry written on invocation.
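The scoped freezes can be resolved by a single gate consulted before every tool call, widest scope first. This is an illustrative sketch: the `FeatureFlags` store, flag-name convention, and scope ordering are assumptions, and the generator-level freeze would be resolved at the model router rather than this gate.

```python
import time

class FeatureFlags:
    """Toy stand-in for the feature-flag service; a real one propagates
    flags to every gateway and runtime within the documented latency."""
    def __init__(self) -> None:
        self._flags: dict = {}   # flag name -> invocation timestamp

    def freeze(self, scope: str) -> None:
        # The timestamp doubles as the audit-log anchor for the invocation.
        self._flags[scope] = time.time()

    def is_frozen(self, *scopes: str) -> bool:
        return any(s in self._flags for s in scopes)

def gate(flags: FeatureFlags, tool: str, desk: str) -> bool:
    """True if the call may proceed; scopes are checked widest first."""
    return not flags.is_frozen(
        "global",          # global freeze: entire AlphaDesk feature
        f"desk:{desk}",    # agent-level freeze for a named desk scope
        f"tool:{tool}",    # tool-level freeze for one named tool
    )
```

A smoke test for each mode falls out directly: flip the flag, assert the gate closes for the frozen scope and stays open for everything else, then clear the flag.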
Include one paragraph on accidental freeze — the operator error of triggering a global freeze when a tool-level freeze was intended — and the controls that reduce its likelihood (confirmation step, two-person rule for global, written justification captured with the audit entry).
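The two-person rule and written-justification control can be encoded directly in the invocation path, so a global freeze cannot be triggered by a single identity or without an audit-ready reason. A minimal sketch, with hypothetical function and parameter names:

```python
def request_global_freeze(requester: str, approver: str, justification: str) -> bool:
    """Global freeze proceeds only with two distinct identities and a
    non-empty written justification captured for the audit entry."""
    if requester == approver:
        raise PermissionError("two-person rule: approver must differ from requester")
    if not justification.strip():
        raise ValueError("written justification required for the audit entry")
    # In a real system: write the audit entry, then flip the global flag.
    return True
```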
Expected artifact: AlphaDesk-Kill-Switch.md.
Part 4: Telemetry contract and runbook (30 minutes)
Design the telemetry contract. Specify:
- The trace schema: span names, span attributes (tool name, caller ID, latency, cost, token counts, validator outcomes), and how traces are linked from the agent turn to the underlying LLM call to the tool call.
- The metrics set: SLO targets for per-turn latency, per-run cost, validator-failure rate, tool-call success rate, and agent-loop length. The observability backend can be any combination of Arize, Langfuse, OpenTelemetry, Weights & Biases, Humanloop, Datadog, or a build-your-own stack; the choice must be stated.
- The log hygiene rules: what is logged verbatim, what is redacted (free-text inputs that may contain personal or market-sensitive data), and the retention class for each log stream.
- The on-call runbook for three named failure scenarios: a prompt-injection incident detected in production, a generator outage that triggers failover, and a kill-switch invocation by a trader. Each scenario has a decision tree the on-call engineer follows.
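The trace-linkage requirement can be sketched with plain span records: one trace ID ties the agent turn to the LLM call and to the tool call it triggered, with each child carrying its parent's span ID. The span and attribute names below are illustrative (loosely OpenTelemetry-shaped), not a mandated schema.

```python
import uuid
from typing import Optional

def new_span(name: str, trace_id: str, parent_id: Optional[str], **attrs) -> dict:
    """One span record; attrs carry the contract's attributes
    (tool name, caller ID, latency, cost, tokens, validator outcomes)."""
    return {"name": name, "trace_id": trace_id, "span_id": uuid.uuid4().hex,
            "parent_id": parent_id, "attributes": attrs}

# One agent turn -> one LLM call -> one tool call, all under one trace ID.
trace_id = uuid.uuid4().hex
turn = new_span("agent.turn", trace_id, None, caller_id="desk-eq-7")
llm  = new_span("llm.call", trace_id, turn["span_id"],
                model="managed-api", tokens_in=812, tokens_out=140, cost_usd=0.011)
tool = new_span("tool.call", trace_id, llm["span_id"],
                tool="position_read", latency_ms=42, validator_outcomes="4/4 pass")
```

The linkage is what makes the contract debuggable: an on-call engineer can walk from any tool-call span up the `parent_id` chain to the LLM call and the agent turn that produced it, without consulting the agent's authors.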
Expected artifact: AlphaDesk-Telemetry-Contract.md with the runbook appended.
Final deliverable and what good looks like
Package the four artifacts into AlphaDesk-Architecture-Package.md with a one-page executive summary stating the invariant, the three most material residual risks, and the proposed path to production.
A reviewer will look for: a one-sentence invariant that is machine-checkable; read-only and draft-only tools with fail-closed enforcement at the runtime layer; at least seven validators composed in a named order; four distinct freeze modes with measurable propagation latencies; and a telemetry contract tight enough that a new on-call engineer can debug a live incident from the traces alone. Architectures that put the read-only enforcement only at the tool layer, not the runtime, fail review.
© FlowRidge.io — COMPEL AI Transformation Methodology. All rights reserved.