Agent Sandboxing and Secure Code Execution: Matching Isolation Depth to Risk
Most teams shipping LLM agents with code execution capabilities make the same miscalculation: they treat sandboxing as a binary property. Either they skip isolation entirely ("we trust our users") or they deploy Docker containers and consider the problem solved. Neither position survives contact with production.
The reality is that sandboxing exists on a spectrum with five distinct levels, each offering a different isolation guarantee, performance profile, and operational cost. The mismatch between chosen isolation level and actual risk profile is the root cause of most agent security incidents — not the absence of any sandbox at all.
This post walks through the sandbox spectrum, the four dimensions of capability restriction that every sandbox must address, the escape vectors that practitioners consistently overlook, and the decision framework for matching sandbox depth to what your agent is actually doing.
The Isolation Spectrum
Think of agent sandboxing as a dial, not a switch.
Level 0: No sandbox. Direct exec() or subprocess calls on the host OS. Zero overhead, zero isolation. LLM-generated code runs with the full privileges of the process. This is appropriate only for developer-tooling where you're executing your own static scripts in an offline environment — the category where almost no production agent lives.
Level 1: Container isolation (Docker/LXC). Linux namespaces (pid, net, mnt, uts, ipc) plus cgroups for resource limits. Startup in milliseconds, near-zero memory overhead. The critical weakness is the shared kernel: all containers on a host run on the same Linux kernel, meaning a single unpatched kernel CVE simultaneously compromises every container on the host. Container isolation is not VM isolation. Use it for trusted code in single-tenant, internal environments.
Level 2: seccomp-BPF + hardened policies. A seccomp-bpf filter attached at process startup restricts which system calls the process can make. Default Docker profiles block around 44 syscalls; hardened AI agent profiles should go further — blocking ptrace, mount, unshare, clone with CLONE_NEWUSER, keyctl, perf_event_open, and bpf. Pair this with --cap-drop=ALL, --security-opt no-new-privileges, and AppArmor/SELinux mandatory access control. The overhead is negligible (filter evaluation happens in nanoseconds), but you're still on a shared kernel — seccomp reduces the attack surface, it doesn't eliminate it.
Level 3: gVisor (user-space kernel). Google's open-source project reimplements the Linux syscall surface in a Go process called Sentry. Guest syscalls intercept before reaching the host kernel and are handled in user space. To escape, an attacker must first break out of Sentry and then defeat a hardened seccomp profile protecting the Sentry process itself — two independent layers. Performance overhead is near-zero on compute-bound tasks and 10–30% on I/O-heavy workloads. Startup is milliseconds. Enabling it requires a single daemon.json change to use the runsc runtime. GPU support was added in 2024/2025. gVisor is the right choice for multi-tenant SaaS workloads running on Kubernetes where nested virtualization is unavailable.
Level 4: MicroVMs (Firecracker). Each sandbox gets its own dedicated Linux kernel. AWS built Firecracker in ~50,000 lines of Rust using KVM. Cold boot takes under 125 ms. Memory overhead is under 5 MiB per VM. Compute overhead is less than 5% compared to bare metal. The attack surface reduction is dramatic: Firecracker exposes only four virtual devices (virtio-block, virtio-net, serial, keyboard) versus the hundreds QEMU exposes. The Jailer companion drops privileges and applies cgroups as a second-line defense. With snapshot/restore, sandbox provisioning drops to around 28 ms by restoring a pre-warmed memory image via copy-on-write overlay. This is what AWS Lambda runs — at tens of trillions of invocations per month — and what managed platforms like e2b use for AI code execution. For multi-tenant deployments handling user-supplied or LLM-generated code, microVMs are the current standard.
Level 5: WebAssembly (Wasm/WASI). Code compiled to WebAssembly runs in a capability-based sandbox with no filesystem, network, or OS access by default. Every import must be explicitly granted by the host. Sub-millisecond startup, near-bare-metal compute performance. The limitation: not all languages compile cleanly to Wasm. Python requires Pyodide (CPython compiled to Wasm), which works but has a significant cold load. Wasm is the right choice for plugin sandboxes, browser-side code execution, and agent tools running in edge environments. It does not replace microVM isolation for general code execution because Spectre-class transient-execution attacks can potentially break Wasm memory isolation — a known limitation that matters at higher security tiers.
The Four Dimensions of Capability Restriction
Choosing an isolation level is only half the problem. Within whatever sandbox you deploy, you need to explicitly restrict four capability dimensions.
Filesystem access. The sandbox should have a read-only root filesystem with write access only to a scoped workspace directory and /tmp. Critically, write access to configuration files outside the workspace must be blocked: ~/.gitconfig, ~/.zshrc, ~/.local/bin, .cursorrules, MCP configuration files, IDE config directories, and hook scripts. These are the persistence vectors — an agent that can write to these files can affect future sandbox sessions even if the current session is destroyed.
Network access. Default-deny egress. Allowlist specific API endpoints if the agent legitimately needs external access. Block DNS queries to arbitrary resolvers: DNS-based data exfiltration (encoding sensitive data into query hostnames) bypasses IP/port-based egress filters entirely and is underutilized by defenders. Internal communication between the agent and sandbox should use vsock (kernel-to-kernel) rather than TCP to avoid exposing a network interface.
Syscall surface. Beyond the seccomp defaults, every sandbox should explicitly block syscalls that enable privilege escalation: ptrace (process inspection), mount (filesystem manipulation), unshare with new user namespaces, keyctl, and bpf. Drop all Linux capabilities with --cap-drop=ALL and add back only what the specific workload requires.
Process spawning. Restrict how many processes the sandbox can create using cgroups pids.max. This limits fork bomb attacks and resource exhaustion. More importantly, be careful with any --allow-run equivalent: allowing the sandbox to spawn arbitrary subprocesses largely defeats capability restrictions because subprocesses inherit OS-level access regardless of the parent's permission flags.
Escape Vectors Practitioners Miss
The sandbox spectrum addresses containment. Practitioners consistently miss a separate category of escape vectors that exist regardless of sandbox level.
The Docker socket mount. Mounting /var/run/docker.sock inside a container grants complete Docker API access, which is equivalent to host root. This is remarkably common in development configurations that drift into production. Audit for it explicitly.
Symlink traversal in path-prefix filtering. Multiple high-severity CVEs in 2025 demonstrated that simple path prefix checks (if path.startswith("/workspace")) are bypassable via symlinks. The filesystem server in a widely-deployed MCP implementation had this exact bug: creating a symlink from /workspace/escape to /etc/passwd turned a read-within-workspace call into an arbitrary host file read. Use real path resolution (os.path.realpath()) before any prefix check.
Configuration file poisoning. An agent instructed via prompt injection to write to ~/.gitconfig or an IDE hook script can cause arbitrary code execution the next time that user opens their editor — outside any sandbox. This is a supply-chain attack that sandboxes do not prevent because it exploits the gap between the sandbox's filesystem boundary and the user's home directory. The mitigation is a read-only filesystem mount for all paths outside the workspace, not just blocking writes to specific filenames.
The --allow-run escape. In Deno's permission model and analogous systems, allowing subprocess execution voids capability restrictions. Subprocesses inherit OS-level access. If your agent tool invocation pattern requires exec-ing arbitrary binaries, you are back to Level 0 regardless of the surrounding sandbox.
Kernel CVE propagation in container-only deployments. CVE-2024-1086 (a netfilter use-after-free) was actively exploited through at least October 2025 in ransomware campaigns. All containers on an unpatched host are simultaneously vulnerable, because they share the kernel. If you're running multi-tenant agent workloads in containers on shared hosts, a single unpatched kernel CVE is a blast radius that affects all tenants. Keep kernel versions current and consider gVisor or microVMs for workloads where kernel CVEs are a meaningful threat model.
Prompt injection bypassing the sandbox through conversation. Research from Washington University found that 63.4% of LLM agents without proper isolation leaked sensitive data through conversation — not through code execution. A helpful agent will summarize files it has read if an injected instruction asks it to, regardless of what the code execution sandbox allows. Technical sandboxing does not address the semantic layer. The mitigation is architectural: run agent tools that touch sensitive data in isolated contexts without sharing that context with external input sources.
Matching Sandbox Depth to Risk Profile
Three questions determine the appropriate isolation level:
Is the code generated at runtime from LLM output? If yes, containers alone are insufficient for any multi-tenant deployment. LLM-generated code has an unpredictable syscall and filesystem access pattern that cannot be characterized in advance. Use gVisor at minimum; Firecracker for production multi-tenant workloads.
Are multiple tenants sharing infrastructure? If yes, tenant isolation requires dedicated kernel boundaries. Use Firecracker microVMs or Kata Containers (which wraps microVMs behind standard Kubernetes APIs). The 200 ms Kata container startup overhead is the price of VM-level isolation with no changes to existing Kubernetes workflows.
Does the workload require GPU access? If yes, Firecracker is currently ruled out — it does not support PCIe/GPU passthrough. Use gVisor (GPU support was added in 2024/2025 through a Groq partnership) or full VMs. This is a significant architectural constraint for AI workloads where agents need GPU-accelerated inference.
The resulting decision matrix:
| Scenario | Recommended Approach |
|---|---|
| Internal tooling, trusted devs, single-tenant | Docker + hardened seccomp profile |
| LLM code, single-tenant SaaS | gVisor (runsc runtime) or Docker + --security-opt no-new-privileges |
| Multi-tenant LLM code execution | Firecracker microVM or Kata Containers |
| Financial / medical / PII workloads | Firecracker + network egress allowlist + secret injection (never env vars) |
| Plugin/extension system from third parties | WebAssembly (WASI) or Firecracker |
| Browser-side agent tools | WebAssembly (inherits browser sandbox) |
The Overhead Trade-off Is Smaller Than You Think
The performance argument against microVMs is weaker than it appears in 2026. Cold Firecracker boot is under 125 ms and less than 5 MiB memory overhead per VM. With snapshot/restore, end-to-end sandbox provisioning hits 28 ms using copy-on-write memory overlays over pre-warmed snapshots — faster than many database query round trips. AWS Lambda handles this at tens of trillions of invocations per month. Compute overhead inside the VM is less than 5% of bare metal.
gVisor's overhead profile is different: near-zero on compute-bound work, 10–30% on I/O-heavy workloads. For agents doing mostly text processing and API calls, gVisor overhead is negligible. For agents doing heavy file I/O, it's measurable.
The real cost of microVM isolation is operational: building or adopting the orchestration layer to manage VM lifecycle at scale. Managed platforms like e2b handle this with sub-200 ms provisioning out of the box. Kata Containers makes it consumable via standard kubectl and containerd without building a custom VMM orchestration layer. The choice for most teams is between building on top of these rather than choosing between Docker and Firecracker in the abstract.
