
Sandboxing Agents That Can Write Code: Least Privilege Is Not Optional

· 12 min read
Tian Pan
Software Engineer

Most teams ship their first code-executing agent with exactly one security control: API key scoping. They give the agent a GitHub token scoped to read-only repository access and a shell with access to a working directory, and they call it "sandboxed." This is wrong in ways that become obvious only after an incident.

The threat model for an agent that can write and execute code is categorically different from the threat model for a web server or a CLI tool. The attack surface isn't the protocol boundary anymore — it's everything the agent reads. That includes git commits, documentation pages, API responses, database records, and any file it opens. Any of those inputs can contain a prompt injection that turns your research agent into a data exfiltration pipeline.

This post covers the four layers of isolation you actually need — container isolation, filesystem controls, network egress policy, and credential scoping — plus the capability audit process that tells you whether you've done it right. The goal isn't theoretical completeness; it's making your agent a boring lateral movement target instead of an interesting one.

Why API Key Scoping Isn't a Sandbox

The intuitive first defense is to restrict what the agent can call: read-only GitHub token, no delete operations, scoped to a single repository. This addresses direct privilege escalation — an agent with a read-only token cannot push code. But it leaves the most dangerous attack vectors untouched.

In 2025, a widely used open-source AI agent platform had an unauthenticated code-validation endpoint that allowed remote code execution. Before the patch, the only thing preventing exploitation was the network perimeter — not any isolation inside the agent's execution environment. In a separate class of incidents, attackers fed reconciliation agents wildcard queries phrased as routine business tasks; the agents found those requests semantically reasonable and executed them, exporting thousands of customer records. Neither of these would have been stopped by tighter API key scopes.

The underlying issue is that your agent is not just executing your code. The LLM generates code at runtime. You control the policy — what the agent is allowed to do — but not the specific execution path. If a malicious instruction reaches the agent through any input channel, and the agent has a general-purpose shell with no additional isolation, the instruction runs. The API key only gates what the agent can call over authenticated APIs; it doesn't gate what the generated code can do on the host.
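To make the gap concrete, here is a hypothetical sketch of the kind of code an injected instruction could cause the agent to generate. Note that nothing in it touches the GitHub API, so the read-only token constrains none of it; the demo runs against a throwaway directory with a planted file, not real credentials.

```python
import os
import pathlib
import tempfile

def scan_for_secrets(root: str) -> list[str]:
    """Walk a directory tree looking for credential-like filenames.

    This is what agent-generated code can do on the host regardless of
    how tightly the agent's API tokens are scoped.
    """
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name in {"credentials", ".env", "id_rsa"}:
                hits.append(os.path.join(dirpath, name))
    return hits

# Demo against a throwaway tree (no real secrets are touched):
demo = tempfile.mkdtemp()
aws_dir = pathlib.Path(demo, ".aws")
aws_dir.mkdir()
(aws_dir / "credentials").write_text("[default]\naws_access_key_id = PLACEHOLDER\n")
print(scan_for_secrets(demo))  # finds the planted credentials file
```

The point is structural: the only thing that stops this code is an execution environment where the paths simply don't exist, which is what the four layers below provide.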

A properly sandboxed agent has four distinct isolation layers. Weaken any one of them and you're relying on the others to compensate.

Layer 1: Container Isolation — Choose Your Kernel Boundary

The first question is what kernel boundary sits between the agent's code and your host infrastructure. There are four practical options, each with different tradeoff profiles.

Plain containers (Docker/runc) share the host kernel. A syscall that escapes the container namespace lands directly on the host. Three runc CVEs in 2025 demonstrated mount-race conditions that allowed writes to protected host paths from inside a container. Plain containers are appropriate only if layered with strict seccomp and AppArmor profiles — and even then, a kernel vulnerability is a single bug away from full host access.

gVisor interposes a user-space kernel between the container and the host kernel. All syscalls from the sandbox go through gVisor's "Sentry" before reaching the real kernel. This eliminates most kernel escape paths at the cost of 10–20% I/O overhead, higher on syscall-heavy workloads. gVisor is a good fit for general Python workloads where the performance tax is acceptable and you're running on GKE or a compatible Kubernetes environment.

Firecracker microVMs give each sandbox its own Linux kernel on top of KVM hardware virtualization. Escaping the sandbox now requires a hypervisor escape, not just a syscall escape — a substantially higher bar. Cold starts are around 125ms; memory overhead is roughly 5MB per instance. E2B runs every agent sandbox in its own Firecracker VM, and its growth from 40K sessions/month to 15M sessions/month in one year is evidence that the operational overhead is manageable. Firecracker is the right choice if you're running at scale and need the strongest isolation without going to bare metal.
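A per-session Firecracker VM is typically described by a small JSON config passed at launch. The sketch below generates one; the key names follow Firecracker's `--config-file` format, while the kernel and rootfs paths, and the `agent-rootfs.ext4` image name, are placeholder assumptions for illustration.

```python
import json

# Hypothetical host paths; the JSON shape follows Firecracker's config-file API.
vm_config = {
    "boot-source": {
        "kernel_image_path": "/var/lib/sandbox/vmlinux",
        "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
    },
    "drives": [{
        "drive_id": "rootfs",
        "path_on_host": "/var/lib/sandbox/agent-rootfs.ext4",
        "is_root_device": True,
        "is_read_only": False,
    }],
    # Small footprint per sandbox: one vCPU, 256 MiB of guest memory.
    "machine-config": {"vcpu_count": 1, "mem_size_mib": 256},
}

with open("vm_config.json", "w") as f:
    json.dump(vm_config, f, indent=2)

# Launch one VM per agent session:
#   firecracker --no-api --config-file vm_config.json
```

Because each session gets its own kernel and rootfs image, teardown is just killing the VM process — nothing the generated code did inside the guest survives.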

WebAssembly (WASM/WASI) offers near-zero cold starts and formally verified memory safety, but Python scientific computing — NumPy, PyTorch, most ML tooling — has incomplete or no WASI support today. WASM is viable for narrow, pure-computation tasks but not for general agent workloads.

The practical baseline: use Firecracker or gVisor as your kernel boundary, and layer seccomp-BPF on top regardless. A well-tuned seccomp profile blocks ptrace, mount, pivot_root, and raw socket creation. A seccomp violation kills the process even inside a microVM, giving you defense in depth if the kernel boundary fails.
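A minimal deny-list profile along these lines can be sketched as follows. The JSON shape follows Docker's seccomp profile format; the specific syscall list is a starting point you would tune per workload, and raw socket creation is handled by dropping the `NET_RAW` capability rather than denying `socket` outright (which would break all networking).

```python
import json

# Allow by default, kill the process on the dangerous syscalls --
# the fail-closed behavior described above.
profile = {
    "defaultAction": "SCMP_ACT_ALLOW",
    "syscalls": [{
        "names": ["ptrace", "mount", "umount2", "pivot_root",
                  "keyctl", "kexec_load"],
        "action": "SCMP_ACT_KILL_PROCESS",
    }],
}

with open("agent-seccomp.json", "w") as f:
    json.dump(profile, f, indent=2)

# Apply at container launch:
#   docker run --security-opt seccomp=agent-seccomp.json --cap-drop NET_RAW ...
```

A stricter variant inverts this into an allow-list (`defaultAction: SCMP_ACT_KILL_PROCESS` with an explicit allowed set), which is stronger but requires profiling the workload's actual syscall usage first.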

Layer 2: Filesystem Namespacing — Every Path Access Is a Decision

The default filesystem layout in a container is far too permissive for an agent that generates and runs code. The generated code can traverse directories the agent was never intended to access, find credentials cached to disk, read configuration files, or write persistence mechanisms that survive sandbox teardown.

Linux gives you the tools to fix this precisely. Mount namespaces let you give the sandbox its own filesystem view, separate from the host. Landlock (available since Linux 5.13) provides capability-based filesystem access control at the kernel level: you can restrict the agent process to specific directory trees with specific access modes, and that restriction propagates to all child processes spawned by agent-generated code.

The correct filesystem layout for an agent sandbox:

  • Input data and reference files: mounted read-only. The agent can read them; generated code cannot modify them.
  • Workspace output directory: read-write, scoped to a session-specific path (/workspace/{session-id}/). Nothing outside this directory is writable.
  • /tmp: ephemerally mounted, cleared on sandbox teardown. No persistence across sessions.
  • System directories: inaccessible via mount namespace — the agent doesn't need to see /etc, /var, or /home.
  • Secrets and API keys: injected as environment variables at runtime via a secrets manager. Never written to disk in the workspace where the agent's file tools can read them.
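The layout above maps directly onto container mount flags. A minimal sketch, assuming Docker as the runtime — the host paths (`/srv/agent-inputs`, `/srv/agent-workspaces`), image name (`agent-sandbox:latest`), and `API_KEY` variable are placeholder assumptions:

```python
import shlex

def sandbox_run_args(session_id: str, api_key: str) -> list[str]:
    """Build `docker run` arguments implementing the filesystem layout above."""
    workspace = f"/srv/agent-workspaces/{session_id}"
    return [
        "docker", "run", "--rm",
        "--read-only",                                  # rootfs: no writes anywhere...
        "--mount",
        "type=bind,source=/srv/agent-inputs,"
        "target=/inputs,readonly",                      # inputs: read-only bind mount
        "--mount",
        f"type=bind,source={workspace},"
        f"target=/workspace/{session_id}",              # ...except the session workspace
        "--tmpfs", "/tmp",                              # ephemeral /tmp, gone on teardown
        "--env", f"API_KEY={api_key}",                  # secrets via env, never on disk
        "agent-sandbox:latest",
    ]

print(shlex.join(sandbox_run_args("abc123", "placeholder-key")))
```

With `--read-only` the rootfs itself rejects writes at the kernel level, so the session workspace bind mount and the tmpfs are the only writable paths — the structural guarantee rather than a guideline.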

The secrets-on-disk problem deserves emphasis. An MCP server study in 2025 found that 82% of tested servers were vulnerable to path traversal when filesystem permissions weren't scoped to specific paths. If your agent has a file-reading tool and your API keys live anywhere on the host filesystem, a sufficiently clever prompt can find them. The fix is not better prompting — it's mounting secrets as in-memory environment variables and ensuring the agent's mount namespace has no access to paths outside the workspace.

Critically: block all writes outside the workspace directory as a hard kernel-level control, not a guideline. Persistence mechanisms, sandbox escapes, and RCE staging all require writing to a location that survives the session. Remove that capability structurally.
