3 posts tagged with "computer-use"

Browser Agent Session Bleed: When One Profile Serves Many Tenants

· 10 min read
Tian Pan
Software Engineer

A computer-use agent finishes a task on a customer's CRM, the worker pool returns the browser to its idle ring, the next request lands a few hundred milliseconds later, and the navigation to the dashboard succeeds — except it succeeds as the wrong user. The OAuth cookie from the previous session was still on the profile. The trace shows navigation succeeded, screenshot captured, action performed. Nothing in the run log says the agent was acting as someone who never asked it to.

This is the failure class that browser agents inherit silently from the libraries they're built on. Headless browser frameworks were designed for one user per profile because that's how a browser has worked for thirty years. When a worker pool reuses profiles to amortize the eight-second cold start of a fresh Chromium instance, that one-user assumption breaks, and the breakage is invisible to every layer of telemetry the team usually trusts.
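One way to make this failure visible is to have the pool, rather than the agent, own tenant identity. The following is a minimal sketch under stated assumptions, not a real browser integration: `Profile`, `ProfilePool`, `acquire`, `release`, and `SessionBleedError` are hypothetical names, and the `cookies` dict stands in for whatever identity-bearing state a real profile holds (cookies, local storage, cached credentials).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Profile:
    """Hypothetical stand-in for a pooled browser profile."""
    cookies: dict = field(default_factory=dict)
    tenant_id: Optional[str] = None  # tenant the profile is currently bound to


class SessionBleedError(RuntimeError):
    """Raised when an idle profile still carries a previous tenant's state."""


class ProfilePool:
    def __init__(self, size: int) -> None:
        self._idle = [Profile() for _ in range(size)]

    def acquire(self, tenant_id: str) -> Profile:
        profile = self._idle.pop()
        # Fail loudly instead of navigating as the wrong user.
        if profile.cookies or profile.tenant_id is not None:
            self._idle.append(profile)
            raise SessionBleedError("idle profile still holds tenant state")
        profile.tenant_id = tenant_id
        return profile

    def release(self, profile: Profile) -> None:
        # Wipe identity-bearing state before the profile rejoins the idle ring.
        profile.cookies.clear()
        profile.tenant_id = None
        self._idle.append(profile)
```

The check in `acquire` is deliberately redundant with the wipe in `release`: if any code path returns a profile without cleaning it, the next request raises an error that shows up in the run log instead of succeeding as the previous tenant.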

Browser Agents in Production: The DOM Fragility Tax

· 13 min read
Tian Pan
Software Engineer

A calendar date picker broke a production browser agent for three days before anyone noticed. The designer had swapped a native <input type="date"> for a custom React component during a minor UI refresh. No API changed. No content moved. Just 24px cells in a new layout — and the vision model that had been reliably clicking the right dates now missed by one cell, silently booking appointments on the wrong day.

This is the DOM fragility tax: the ongoing operational cost of building automated agents on top of a web that was never designed to be operated by machines. Unlike most infrastructure taxes, it compounds. The web changes. Anti-bot defenses evolve. SPAs get more dynamic. And your agent quietly degrades.
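The arithmetic behind a one-cell miss is worth spelling out. The sketch below assumes a uniform grid and a click aimed at the center of the target cell; the excerpt does not say how the coordinate error arose, so `cell_index`, `click_lands_on`, and the error values are illustrative only. With 24px cells, the center sits 12px from either edge, so a horizontal error of 12px or more lands in a neighboring cell.

```python
def cell_index(x: float, grid_left: float, cell_width: float) -> int:
    """Map a click x-coordinate to a zero-based column in a uniform grid."""
    return int((x - grid_left) // cell_width)


def click_lands_on(target_col: int, error_px: float,
                   grid_left: float = 0.0, cell_width: float = 24.0) -> int:
    """Column actually hit when aiming at the center of target_col with a
    horizontal error of error_px (positive means to the right)."""
    center = grid_left + (target_col + 0.5) * cell_width
    return cell_index(center + error_px, grid_left, cell_width)


# Aiming at column 3 of a 24px grid:
print(click_lands_on(3, 11.0))   # still inside the target cell
print(click_lands_on(3, 13.0))   # one cell to the right
print(click_lands_on(3, -13.0))  # one cell to the left
```

A tolerance this small is why a purely visual layout change can turn a reliable click into a wrong booking without any error being raised.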

Computer Use Agents in Production: When Pixels Replace API Calls

· 9 min read
Tian Pan
Software Engineer

Most AI agents interact with the world through structured APIs — clean JSON in, clean JSON out. But a growing class of agents has abandoned that contract entirely. Computer use agents look at screenshots, reason about what they see, and drive a mouse and keyboard like a human operator. When the only integration surface is a screen, pixels become the API.

This sounds like a party trick until you realize how much enterprise software has no API at all. Legacy ERP systems, internal admin panels, proprietary desktop applications — the GUI is the only interface. For years, robotic process automation (RPA) handled this with brittle, selector-based scripts that shattered whenever a button moved three pixels. Computer use agents promise something different: visual understanding that adapts to UI changes the way a human would.