Stateful Multi-Turn Conversation Infrastructure: Beyond Passing the Full History

11 min read
Tian Pan
Software Engineer

Every demo of a conversational AI feature does the same thing: it passes a list of messages to the model and prints the response. The happy path works, looks great in a Jupyter notebook, and gets you a green light to ship. Then you get to production, and your p99 latency starts creeping up during peak hours. A month later, a customer complains that the assistant "forgot" everything from earlier in the session. Six weeks after that, your session store hits its memory ceiling during a product launch.

The fundamental problem is that "pass the full conversation history" is not a session management strategy. It is the absence of one.