AITE M1.2-Art21 v1.0 Reviewed 2026-04-06 Open Access
M1.2 The COMPEL Six-Stage Lifecycle
AITF · Foundations

Sandboxing and Execution Isolation for Agents

Transformation Design & Program Architecture — Advanced depth — COMPEL Body of Knowledge.

10 min read · Article 21 of 53

The architect’s question is not “should we sandbox the agent” — the answer is always yes for any agent that executes arbitrary or semi-arbitrary code — but “how deep does the sandbox need to be.” That decision hinges on the tool surface, the threat model, and the cost of escape. A code-execution agent running a language-server completion has a different sandbox requirement from a code-execution agent writing production SQL against a tenant database. This article teaches the four dominant isolation patterns, the selection criteria for each, and the specific patterns for code-execution agents where the sandbox is the primary safety boundary.

Why sandboxing is an agentic-specific problem

Classical application sandboxing — Docker, chroot, seccomp — was designed for code a developer wrote and reviewed. Agentic sandboxing must contain code the agent writes during inference, which means the content of the sandboxed action is adversarial-by-possibility even when the agent is benign-by-design. Indirect prompt injection via retrieved documents (see Article 14) turns a helpful summarization agent into an attacker-controlled shell if the sandbox leaks. Memory poisoning (Article 7) can cause tomorrow’s agent session to run code injected into yesterday’s memory.

Three properties distinguish agentic sandboxing from classical application sandboxing. First, the payload is generated at runtime from natural-language input, so static analysis has limited value. Second, the sandbox must support dynamic tool registration — new tools appear across the agent’s lifecycle, and the sandbox policy must tighten automatically rather than require manual review for every tool. Third, observability must capture the attempted action even when the action fails, because a blocked attempt is itself an incident signal.

Four isolation patterns

Pattern 1 — Containers with hardened profiles

The default pattern for most agentic tool execution. A Docker or Podman container with a minimal base image, seccomp profile restricting allowed syscalls, AppArmor or SELinux policy, dropped Linux capabilities (--cap-drop=ALL), read-only root filesystem, network namespace with explicit egress allowlist, and resource quotas (cpu, memory, pids, disk IO) enforced by cgroups v2. Container startup latency is typically 100–500ms, low enough for interactive agent loops.
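The flag set above can be collected into one place. The following sketch assembles a hardened `docker run` invocation for a single tool call; the image name, seccomp profile path, network name, and quota values are illustrative assumptions, not a vetted production profile:

```python
# Sketch of Pattern 1: assemble a hardened `docker run` command line for one
# tool invocation. Paths, image names, and quotas are assumptions.

def hardened_container_argv(image: str, command: list[str]) -> list[str]:
    """Build docker-run arguments reflecting the defenses listed above."""
    return [
        "docker", "run", "--rm",
        "--cap-drop=ALL",                       # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block setuid privilege gain
        "--security-opt", "seccomp=/etc/agent/seccomp.json",  # syscall allowlist (path is an assumption)
        "--read-only",                          # read-only root filesystem
        "--tmpfs", "/tmp:rw,noexec,nosuid,size=64m",  # scratch space only, non-executable
        "--network", "agent-egress",            # network namespace with explicit egress allowlist
        "--cpus", "1", "--memory", "512m",      # cgroups v2 resource quotas
        "--pids-limit", "128",                  # cap fork bombs
        image, *command,
    ]

argv = hardened_container_argv("agent-tools:minimal", ["python3", "tool.py"])
```

Centralizing the flags in one function makes the sandbox profile reviewable and testable, rather than scattered across deployment scripts.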

The limitation is shared-kernel attack surface. Container escape CVEs appear roughly annually (e.g., CVE-2022-0185, CVE-2024-0132), and a compromised container can — in theory — reach the host. For most enterprise agentic workloads, a hardened container with the defenses listed above is sufficient. For code-execution agents that run arbitrary model-generated code, the shared-kernel risk is usually too high.

Pattern 2 — MicroVMs

MicroVMs (AWS's Firecracker, the Intel-initiated Cloud Hypervisor, QEMU in microVM mode) give each tool invocation its own kernel. Startup latency is 125ms to ~1s, still acceptable for most agent loops. The KVM boundary is a far stronger isolation primitive than a shared-kernel container: a compromised guest must escape hypervisor emulation to reach the host, and the CVE rate on that surface is roughly an order of magnitude lower than on containers.

MicroVMs are the sandbox pattern used by E2B (the public code-interpreter sandbox), Modal, and most commercial code-execution services. They are the right default for code-execution agents, Python notebooks executed by agents, shell agents, and browser-use agents that run arbitrary JavaScript.

Pattern 3 — WebAssembly runtimes

WebAssembly (Wasmtime, WasmEdge, Wasmer) provides a capability-based isolation model with WASI providing the syscall surface. Startup latency is 1–10ms — one to two orders of magnitude faster than containers. The security model is strong: WASM programs cannot make syscalls the embedder has not granted, cannot read the filesystem except through WASI capability handles, and cannot open network sockets without explicit grants.

The trade-off is ecosystem. Most agent tools are written for POSIX, not WASI. Running Python in WASM (via Pyodide or CPython WASM) is possible but lags upstream. WebAssembly shines when the tool surface is controlled — a custom-calculator tool, a domain-specific policy evaluation, a pure-function rule engine. It is premature for arbitrary code execution at the time of writing.

Pattern 4 — Restricted language runtimes

The lightest-weight pattern: run the agent-generated code inside a language interpreter with the interpreter’s own safety features. RestrictedPython for Python, VM2 (now deprecated due to escape CVEs) and isolated-vm for JavaScript, Ruby’s SAFE levels (deprecated), Lua’s sandboxed mode. The pattern exists because it is fast — no process boundary — and because for simple expression evaluation it is sufficient.

The pattern is also repeatedly proven insufficient for adversarial code. VM2’s repeated escapes, RestrictedPython’s audit history, and the general fragility of “same-process sandboxing” mean this pattern should be used only for pure expression evaluation with schema-validated input, not for code-execution agents. When in doubt, use a microVM.
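For the one narrow use the text endorses — pure expression evaluation — a same-process evaluator can be built by whitelisting AST node types rather than trying to subtract dangerous features from a full interpreter. A minimal Python sketch:

```python
import ast
import operator

# Minimal same-process evaluator for pure arithmetic expressions.
# Anything beyond numeric literals and arithmetic operators (names,
# attribute access, calls, subscripts) is rejected outright.

_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def eval_expr(source: str) -> float:
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"disallowed node: {type(node).__name__}")
    return walk(ast.parse(source, mode="eval"))

eval_expr("2 * (3 + 4)")          # → 14
# eval_expr("__import__('os')")   # raises ValueError (Call is not whitelisted)
```

The allowlist direction matters: new interpreter features are denied by default, which is the opposite failure mode of RestrictedPython- or VM2-style denylisting.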

Selection criteria

The architect’s selection is driven by four factors: the class of code the agent will run, the blast radius of escape, the latency budget of the agent loop, and the operational complexity the platform team can absorb.

Class of code. Arbitrary Python or shell → microVM. Domain-specific expression evaluation → WebAssembly or restricted runtime. Network-only tool calls (HTTP API invocation) → hardened container is sufficient because no local code runs. Browser-use agents → microVM with a browser profile, because the browser itself is the attack surface.

Blast radius. An agent that writes files visible to other tenants, reaches internal services, or persists state that other sessions will read needs the strongest isolation. An agent that runs a pure function and returns a scalar needs less.

Latency budget. Interactive chat agents with sub-second response targets cannot afford 1s microVM cold-starts unless the platform pre-warms a microVM pool. Batch agents can absorb multi-second startup without user-visible impact.

Operational complexity. Running Firecracker in production requires KVM access, which is not available in all managed-Kubernetes environments. Teams without that capability should use a commercial sandbox service (E2B, Modal) rather than try to retrofit microVMs onto unsupported infrastructure.
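The four factors can be sketched as a decision function. The category names, thresholds, and return strings below are illustrative assumptions; a real selection would also weigh contractual tenant-isolation requirements:

```python
# Sketch of the four-factor selection logic. Factor names and thresholds
# are illustrative, not a normative policy.

def select_isolation(code_class: str, blast_radius: str,
                     latency_budget_ms: int, kvm_available: bool) -> str:
    if code_class in ("arbitrary_code", "shell", "browser"):
        # Arbitrary model-generated code: shared-kernel risk is too high.
        if not kvm_available:
            return "commercial sandbox service (e.g. E2B/Modal)"
        if latency_budget_ms < 1000:
            return "microVM with pre-warmed pool"   # hide cold-start latency
        return "microVM per invocation"
    if code_class == "expression":
        # Controlled tool surface: capability-based isolation suffices.
        return "WebAssembly runtime (or restricted runtime with schema-validated input)"
    if code_class == "http_tool_call":
        # No local code runs; container suffices unless blast radius is cross-tenant.
        return "microVM" if blast_radius == "cross-tenant" else "hardened container"
    raise ValueError(f"unknown code class: {code_class}")

select_isolation("arbitrary_code", "single-tenant", 500, kvm_available=True)
# → "microVM with pre-warmed pool"
```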

Code-execution agent specific patterns

Code-execution agents — agents that write code and then run it — are the hardest sandboxing case and therefore deserve their own pattern set. The reference architecture for a code-execution agent involves five layers.

  • Layer 1 — a microVM, provisioned per session or per tool call, with CPU, memory, and wall-time budgets enforced at the hypervisor level.
  • Layer 2 — a read-only base image containing only the language runtime and a vetted package set; no shell, no package manager, no network tools.
  • Layer 3 — an explicit network policy: egress denied by default, with the agent permitted to reach only the retrieval store and an allowlisted set of APIs named in the tool registry.
  • Layer 4 — a capability manifest per tool that declares which filesystem paths the tool may read or write; the sandbox driver enforces this at mount time with a tmpfs overlay that evaporates on session end.
  • Layer 5 — full observability: every executed code block is captured with input, output, resource consumption, and exit code, shipped to the agent trace store (Article 15) and retained for incident replay.
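The five layers can be captured as a declarative sandbox specification that a sandbox driver enforces at provisioning time. The field names and defaults below are illustrative assumptions, not a standard schema:

```python
from dataclasses import dataclass

# The five layers rendered as one declarative, immutable spec.
# Field names and defaults are illustrative assumptions.

@dataclass(frozen=True)
class SandboxSpec:
    # Layer 1: hypervisor-enforced budgets
    vcpus: int = 1
    memory_mb: int = 512
    wall_time_s: int = 60
    # Layer 2: read-only base image — runtime plus vetted packages only
    base_image: str = "python-runtime-minimal"
    # Layer 3: egress denied by default; allowlist comes from the tool registry
    egress_allowlist: tuple[str, ...] = ()
    # Layer 4: capability manifest — paths the tool may read/write (tmpfs overlay)
    readable_paths: tuple[str, ...] = ("/workspace",)
    writable_paths: tuple[str, ...] = ("/workspace/out",)
    # Layer 5: observability — where execution traces are shipped
    trace_sink: str = "agent-trace-store"

spec = SandboxSpec(egress_allowlist=("retrieval.internal:443",))
```

Making the spec frozen means the policy cannot be mutated mid-session by tool code; any change requires provisioning a new sandbox.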

The commercial pattern — E2B for generic code execution, Modal for GPU-heavy workloads, AWS CodeBuild/Lambda for tightly-integrated AWS shops — collapses most of this configuration into a managed sandbox product. The build-your-own pattern — Firecracker + containerd + OCI-image pipeline + custom orchestration — is well-documented (the AWS Lambda Firecracker reference architecture is public) and is the right choice for organizations with strong infrastructure teams who need deep customization or have regulatory reasons to avoid multi-tenant commercial sandboxes.

Network egress controls

A recurring anti-pattern in agentic sandboxing is strong process isolation with weak network egress controls. An agent whose code sandbox is otherwise solid but can reach 169.254.169.254 (the cloud metadata endpoint) can exfiltrate IAM credentials in one line of code. The Capital One breach in 2019 was an AWS-metadata-exfiltration incident on a non-agentic system, and the agentic version of that exposure is one prompt injection away for any agent without network egress enforcement.

Minimum controls: deny all egress by default, allowlist specific hostnames/IPs, block cloud metadata endpoints explicitly (169.254.169.254, fd00:ec2::254, the equivalents on Azure and GCP), route egress through an inspection proxy that runs DLP rules on outbound payloads. For agents that need to reach the public internet (browser-use agents), the egress proxy must also prevent the agent from reaching internal-network RFC1918 ranges.
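The decision logic of those minimum controls can be sketched with the standard-library `ipaddress` module. In production this enforcement belongs in the network layer (proxy, firewall, security groups), not in application code; the sketch only illustrates deny-by-default with hard blocks that take precedence over the allowlist:

```python
import ipaddress

# Deny egress by default; allowlist specific destinations; hard-block
# cloud metadata endpoints and internal RFC1918/link-local ranges even
# if someone mistakenly allowlists them.

BLOCKED_NETWORKS = [
    ipaddress.ip_network("169.254.0.0/16"),    # link-local, incl. 169.254.169.254
    ipaddress.ip_network("fd00:ec2::254/128"), # AWS metadata endpoint over IPv6
    ipaddress.ip_network("10.0.0.0/8"),        # RFC1918 internal ranges
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def egress_allowed(dest_ip: str, allowlist: set[str]) -> bool:
    addr = ipaddress.ip_address(dest_ip)
    if any(addr in net for net in BLOCKED_NETWORKS):
        return False                  # hard block always wins
    return dest_ip in allowlist       # everything else: deny by default

egress_allowed("169.254.169.254", {"169.254.169.254"})  # → False
```

Note the ordering: the metadata block is checked before the allowlist, so a misconfigured allowlist entry cannot reopen the exfiltration path.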

Multi-tenant sandboxing

A common platform pattern is to run many tenants’ agents on a shared sandbox pool. Multi-tenant sandboxing adds one more constraint: a breakout from one tenant’s sandbox must not reach another tenant’s sandbox. MicroVMs provide this separation cleanly because each VM has its own kernel; shared-kernel containers do not and require additional controls (per-tenant cgroup namespace hierarchies, seccomp profiles that block common container-escape syscalls, and regular hypervisor/host patching).

The architect should document which tenant-isolation primitive the sandbox provides and the residual cross-tenant blast radius. Regulated tenants (banks, health payers, governments) often have contractual requirements that their workloads not share a kernel with other tenants. The microVM pattern or a per-tenant physical pool satisfies this; container-only patterns typically do not.

EU AI Act Article 15 implications

EU AI Act Article 15 (robustness, cybersecurity, resilience) requires high-risk AI systems to be resilient to errors and to “unauthorized third parties attempting to alter their use or outputs.” An agent without sandboxing whose code execution can be redirected by indirect prompt injection is failing Article 15’s cybersecurity obligation, and the conformity-assessment documentation should show sandbox design, penetration-test evidence, and the network-egress policy. See Article 23 for the full regulatory mapping.

Learning outcomes

  • Explain four isolation patterns — hardened containers, microVMs, WebAssembly runtimes, restricted language runtimes — and the trade-offs each offers on containment, latency, and ecosystem.
  • Classify five tool surfaces by required isolation pattern using the class-of-code, blast-radius, latency-budget, and operational-complexity factors.
  • Evaluate a proposed isolation design for adequate defense against known container-escape techniques, network-egress exfiltration, and multi-tenant cross-contamination.
  • Design a sandbox specification for a code-execution agent, including the microVM configuration, base image, capability manifest, egress policy, and observability plumbing.

Further reading

  • Core Stream anchors: EATE-Level-3/M3.3-Art05-AI-Security-Architecture.md
  • AITE-ATS siblings: Article 14 (indirect injection and supply chain), Article 22 (policy engines), Article 27 (security architecture).
  • Primary sources: E2B sandbox public docs; AWS Firecracker public docs and the Lambda architecture post; Docker + gVisor public docs; Google gVisor design document.