
3 posts tagged with "backend"


SSE vs WebSockets vs gRPC Streaming for LLM Apps: The Protocol Decision That Bites You Later

11 min read
Tian Pan
Software Engineer

Most teams building LLM features pick a streaming protocol the same way they pick a font: quickly, without much thought, and they live with the consequences for years. The first time the choice bites you is usually in production: a Cloudflare 524 timeout that corrupts your SSE stream, a WebSocket server that leaks memory under sustained load, or a gRPC-Web integration that works fine in unit tests and silently fails when a client needs to send messages upstream. The protocol shapes your failure modes. Picking based on benchmark throughput is the wrong frame.

Every major LLM provider — OpenAI, Anthropic, Cohere, Hugging Face — streams tokens over Server-Sent Events. That fact is a strong prior, but not because SSE is fast. It's because SSE is stateless, trivially compatible with HTTP infrastructure, and its failure modes are predictable. The question is whether your application has requirements that force you off that path.
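
To see why that compatibility matters, here is a minimal sketch of consuming an SSE token stream. httpx is assumed; the endpoint, payload shape, and "[DONE]" sentinel follow OpenAI-style conventions and are illustrative, not any specific provider's contract.

```python
# Minimal SSE consumer. httpx is assumed; the endpoint, payload, and
# "[DONE]" sentinel follow OpenAI-style conventions, not a fixed standard.
import json
import httpx

def stream_tokens(url: str, payload: dict, api_key: str):
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Accept": "text/event-stream",
    }
    # SSE is just a long-lived HTTP response: no upgrade handshake,
    # no stateful socket, so any proxy that can stream HTTP can carry it.
    with httpx.stream("POST", url, json=payload,
                      headers=headers, timeout=None) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line.startswith("data: "):
                continue  # blank separators and ":" keep-alive comments
            data = line[len("data: "):]
            if data == "[DONE]":
                return  # end-of-stream sentinel
            yield json.loads(data)
```

The same plainness is also the failure mode: an idle proxy timeout like the 524 above can cut this response mid-token, which is why production clients layer reconnect-and-resume logic on top.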

Database Connection Pools Are the Hidden Bottleneck in Your AI Pipeline

9 min read
Tian Pan
Software Engineer

Your AI feature ships. Response times look reasonable in staging. A week later, production starts throwing mysterious p99 spikes — latency jumps from 800ms to 8 seconds under moderate load, with no GPU pressure, no model errors, and no obvious cause. You add more replicas. It doesn't help. You profile the model server. It's fine. You add caching. Still no improvement.

Eventually someone checks the database connection pool wait time. It's been sitting at 95% utilization since day three.
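
If you are in that spot, the check is one gauge away. A minimal sketch, assuming SQLAlchemy's default QueuePool; the DSN and pool numbers are placeholders:

```python
# Expose pool saturation as a ratio. SQLAlchemy's QueuePool is assumed;
# the DSN and pool sizes here are placeholders.
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql://app:secret@db/prod",  # placeholder DSN
    pool_size=10, max_overflow=0, pool_timeout=30,
)

def pool_in_use_ratio() -> float:
    pool = engine.pool
    # checkedout(): connections currently held by requests;
    # size(): configured pool size. A ratio near 1.0 means new
    # requests queue for up to pool_timeout seconds before they
    # ever reach the database.
    return pool.checkedout() / pool.size()
```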

This is the most common category of AI production incident that nobody talks about, because connection pool exhaustion looks like model slowness. The symptoms appear in the wrong layer — you see high latency on LLM calls, not on database queries — so the diagnosis takes days while users experience degraded responses.
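
The usual culprit is a handler that checks out a connection and then awaits the model while still holding it. A sketch of both shapes, assuming asyncpg; llm_complete() is a hypothetical stand-in for a multi-second model call, and the queries are illustrative.

```python
# Failure mode vs. fix. asyncpg is assumed; llm_complete() is a
# hypothetical stand-in for a multi-second model call.
import asyncio
import asyncpg

async def llm_complete(prompt: str, context: str) -> str:
    await asyncio.sleep(3)  # stand-in for real model latency
    return "generated answer"

async def handle_bad(pool: asyncpg.Pool, prompt: str) -> str:
    async with pool.acquire() as conn:  # checkout held for the whole request
        row = await conn.fetchrow("SELECT context FROM docs WHERE id = $1", 42)
        # The connection idles here for seconds. Around pool_size
        # concurrent requests exhaust the pool, and later callers block
        # in pool.acquire() -- which profiles as "slow LLM calls".
        answer = await llm_complete(prompt, row["context"])
        await conn.execute("INSERT INTO answers (text) VALUES ($1)", answer)
    return answer

async def handle_good(pool: asyncpg.Pool, prompt: str) -> str:
    async with pool.acquire() as conn:  # short checkout: read only
        row = await conn.fetchrow("SELECT context FROM docs WHERE id = $1", 42)
    answer = await llm_complete(prompt, row["context"])  # no connection held
    async with pool.acquire() as conn:  # short checkout: write only
        await conn.execute("INSERT INTO answers (text) VALUES ($1)", answer)
    return answer
```

Same queries, same model latency; the only difference is how long each request sits on a pooled connection.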

Agent-Friendly APIs: What Backend Engineers Get Wrong When AI Becomes the Client

11 min read
Tian Pan
Software Engineer

In 2024, automated bot traffic surpassed human traffic on the internet for the first time. Gartner projects that more than 30% of new API demand by 2026 will come from AI agents and LLM tools. And yet only 24% of organizations explicitly design APIs with AI clients in mind.

That gap is where production systems break. Not because the LLMs are bad, but because APIs built for human developers have assumptions baked in that silently fail when an autonomous agent is the caller. The agent can't ask for clarification, can't read a doc site, and can't decide on its own whether a 422 means "fix your request" or "try again in a few seconds."
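
Concretely, that means making the retry decision machine-readable instead of implied by status-code folklore. A sketch, assuming FastAPI; the envelope fields (code, retryable, retry_after_s) are illustrative conventions, not a standard.

```python
# An error envelope an agent can act on without reading docs. FastAPI
# is assumed; field names here are illustrative, not a standard.
from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()

def overloaded() -> bool:
    return False  # hypothetical capacity check

def agent_error(status: int, code: str, detail: str,
                retryable: bool, retry_after_s: int | None = None) -> JSONResponse:
    body = {"error": {"code": code, "detail": detail, "retryable": retryable}}
    headers = {}
    if retry_after_s is not None:
        body["error"]["retry_after_s"] = retry_after_s
        headers["Retry-After"] = str(retry_after_s)  # also set the header
    return JSONResponse(status_code=status, content=body, headers=headers)

@app.get("/v1/items/{item_id}")
async def get_item(item_id: str):
    if not item_id.isdigit():
        # 422: the request itself is malformed; retrying verbatim cannot succeed.
        return agent_error(422, "invalid_item_id",
                           "item_id must be numeric", retryable=False)
    if overloaded():
        # 503: transient; the agent should back off and retry.
        return agent_error(503, "over_capacity", "temporarily overloaded",
                           retryable=True, retry_after_s=5)
    return {"id": item_id}
```

An agent can branch on error.retryable directly, instead of guessing what a 422 means for this particular API.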

This post is for the backend engineer who just found out their service is being called by an AI agent — or who is about to build one that will be.