The Share-Nothing Agent: Designing AI Agents for Horizontal Scalability
Your load balancer assigns an incoming agent request to replica 3. But the user's conversation history lives in memory on replica 7. Replica 3 has no idea what has happened in the last six turns, so it starts over, confuses the user, and your on-call engineer gets paged at 2 AM. You add sticky sessions. Now all requests for that user route to replica 7 forever. You've traded a correctness bug for a scalability ceiling.
This is the moment teams realize that "horizontal scaling" for AI agents is not the same problem as horizontal scaling for web servers. The fixes are different, and the naive paths fail in predictable ways.
