Browser Agent Session Bleed: When One Profile Serves Many Tenants
A computer-use agent finishes a task on a customer's CRM, the worker pool returns the browser to its idle ring, the next request lands a few hundred milliseconds later, and the navigation to the dashboard succeeds — except it succeeds as the wrong user. The OAuth cookie from the previous session was still on the profile. The trace shows navigation succeeded, screenshot captured, action performed. Nothing in the run log says the agent was acting as someone who never asked it to.
This is the failure class that browser agents inherit silently from the libraries they're built on. Headless browser frameworks were designed for one user per profile because that's how a browser has worked for thirty years. When a worker pool reuses profiles to amortize the eight-second cold start of a fresh Chromium instance, that one-user assumption breaks, and the breakage is invisible to every layer of telemetry the team usually trusts.
Why profiles get reused in the first place
Cold-starting a clean browser is expensive. A fresh Chromium process with all the agent's required extensions loaded takes several seconds before it can render the first page; for a user-facing agent with a latency budget under two seconds, that's not a startup cost you can pay per request. So the engineering reflex is the same one every web platform has used since the dawn of connection pools: keep a warm pool of N browsers, hand one out per request, return it when the request is done.
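The reflex fits in a few lines. Below is a minimal, hypothetical pool sketch (the factory callable stands in for whatever launches and warms a browser); notice that the release path does nothing to the profile, which is exactly where the bleed originates:

```python
import queue

class BrowserPool:
    """Warm pool: pay the cold start once per slot, not per request."""

    def __init__(self, factory, size):
        self._idle = queue.Queue()
        for _ in range(size):
            # Cold start happens here, up front, not on the request path.
            self._idle.put(factory())

    def acquire(self):
        # Blocks until a warm browser is free.
        return self._idle.get()

    def release(self, browser):
        # The trap: nothing here resets the profile before the next tenant.
        self._idle.put(browser)
```

The latency win is real, but every credential the previous request accumulated rides along with the object that release() hands back.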
The problem is that a "connection" in a database pool has no persistent state beyond the TCP socket. A "browser" in a browser pool has thirty years of accumulated state primitives — cookies, localStorage, sessionStorage, IndexedDB, service-worker caches, HTTP cache, autofill data, extension storage, downloaded files, password manager state. The browser's own design assumes one human is sitting at it, and every state mechanism it ships exists to serve that human's session continuity. Reuse the profile across requests, and you've just inverted that assumption without telling the browser.
The reuse pattern shows up in a few flavors, all with the same underlying risk:
- A pool of long-running Playwright BrowserContext instances, each handed to whichever request is next in the queue.
- A pool of full browser processes with --user-data-dir pointing at a persistent directory, so login sessions survive restarts.
- A "warm fleet" service backing computer-use agents, where the pool is sized to peak concurrency and a profile may stay alive for hours between requests.
In every case, the credentials accumulated by the previous tenant are still sitting on disk and in memory when the next tenant's request begins.
The state surface a browser carries between calls
Engineers who haven't worked the security side of headless browsers tend to underestimate how much survives a page.close(). The list is longer than it looks:
- Cookies, including HTTP-only and secure ones — they belong to the profile, not the tab.
- localStorage and sessionStorage for every origin the previous request touched.
- IndexedDB, where SaaS apps increasingly stash JWTs, session metadata, and offline caches.
- Service-worker caches, which can serve up a previous tenant's authenticated responses without ever hitting the network.
- HTTP cache, which can short-circuit a re-auth flow and serve a logged-in dashboard from disk.
- Extension storage, including password-manager autofill data and SSO-helper state.
- Downloaded files and the file-chooser working directory.
- Autofill profiles — name, address, payment methods — that the next request can submit into a form by accident.
A "session boundary" between requests has to wipe every one of these, not just the cookie jar. Most homegrown reset routines wipe the first three and miss the rest. Service-worker caches in particular are easy to overlook because they're part of the page's offline behavior, not the obvious "session" surface.
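A reset routine that actually reaches the full surface has to go below the page-level APIs. Here is a sketch, assuming Playwright's async Python API on Chromium, where a CDP session exposes the DevTools Storage.clearDataForOrigin command (the helper name and the origins list are illustrative; verify the exact calls against your framework's docs):

```python
async def reset_profile(context, origins):
    # Cookie jar first -- the part most homegrown resets already handle.
    await context.clear_cookies()
    # Everything origin-keyed needs the DevTools Protocol (Chromium-only).
    page = await context.new_page()
    cdp = await context.new_cdp_session(page)
    for origin in origins:
        # "all" covers localStorage, IndexedDB, service-worker caches,
        # and cache storage -- surfaces page-level JS cannot fully reach.
        await cdp.send("Storage.clearDataForOrigin",
                       {"origin": origin, "storageTypes": "all"})
    await page.close()
```

Even this misses downloaded files and autofill data, which live outside origin-keyed storage — which is the practical argument for destroying the profile rather than cleaning it.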
Frameworks like Playwright explicitly document that creating a fresh BrowserContext is the supported isolation primitive — contexts are designed to be cheap to create and fully isolated from each other within a single browser process. The trap is that teams who learn this pattern from the testing world tend to map it onto their agent infrastructure incorrectly: in tests, contexts are created and destroyed per test by the framework, so isolation is automatic. In an agent worker pool, that lifecycle has to be enforced by the team, and "we use Playwright contexts" is not the same as "we destroy and recreate the context per request."
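Enforced per request, that lifecycle is small enough to read at a glance. A sketch against Playwright's async Python API (the handler name and the tenant_task callable are illustrative, not part of Playwright):

```python
async def handle_request(browser, tenant_task):
    # Keep the expensive browser process warm; make the context disposable.
    context = await browser.new_context()  # fresh cookies, storage, caches
    try:
        page = await context.new_page()
        return await tenant_task(page)
    finally:
        # The isolation boundary: no state outlives the request.
        await context.close()
```

The browser process is the pooled resource; the context is not. That single distinction is what "we use Playwright contexts" usually fails to capture.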
The failure modes the trace doesn't show
When session bleed happens, the agent's own observability says nothing went wrong. The navigation returned 200. The page rendered. The screenshot captured. The DOM action succeeded. The model produced a coherent response based on what it saw. From the agent's perspective, the request was a success.
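One cheap countermeasure gives the trace something to record: after navigation, read whatever authenticated-identity marker the target app exposes and refuse to act if it is not the requesting tenant. A sketch, assuming Playwright's async Python API; the data-user-id attribute is a hypothetical stand-in for an app-specific signal:

```python
async def assert_tenant(page, expected_user_id):
    # The marker is app-specific; data-user-id on <body> is a stand-in
    # for whatever identity signal the target UI actually exposes.
    actual = await page.get_attribute("body", "data-user-id")
    if actual != expected_user_id:
        raise RuntimeError(
            f"session bleed: page authenticated as {actual!r}, "
            f"request was for {expected_user_id!r}"
        )
```

A guard like this converts a silent wrong-tenant success into a loud pre-action failure, which is the difference between an incident report and an undetected breach.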