The Conversation Tree Your Server Stored As A Log
A user types "actually, I meant fifty, not fifteen," hits the pencil icon on their last message, and edits it. The UI does what good UIs do: it shows them the corrected message, fades out the old one, scrolls the assistant's stale reply into a struck-through ghost, and presents a clean conversation that reads as if the original mistake never happened. The user, satisfied, sends the next turn. The agent answers using fifteen.
The bug is not in the model. The model received exactly what the server sent it, and the server sent it the original message, the original assistant response, the regret, the edited message, and the new request — all concatenated, all in order, all live. The user is having a conversation they edited. The agent is having a conversation that was never edited. The two transcripts diverge at turn three and never reconcile, and every subsequent turn pays interest on the gap.
The UI is a tree, the storage is a log, and nobody told the model
When teams build a chat product, two design decisions happen in different rooms. The frontend team designs the conversation as something the user navigates: a list of bubbles, each editable, each retry-able, each with a little "regenerate" button that quietly forks a sibling response. That is a tree. Every edit creates a branch. Every regenerate creates a sibling. The UI projects one path through the tree as the visible conversation and hides the rest behind affordances the user can re-enter if they want to.
The backend team, working off a different intuition, models the conversation as an event log. Every user message is an append. Every assistant response is an append. Every edit is also an append — recorded as message.v2 next to message.v1 rather than as a mutation of message.v1. This is the right choice for auditability, for streaming, for analytics, for replay. It is the wrong choice for prompt construction, and nobody notices until the model starts answering questions the user does not remember asking.
The gap is silent. The frontend keeps the user's mental model coherent. The backend keeps the data warehouse honest. The prompt assembler in the middle, the function that turns the conversation into a messages array, concatenates the log in chronological order because that is what the schema gives it, and the model dutifully synthesizes a coherent response from both the corrected and the uncorrected timeline. The fluency of large models is what makes this failure invisible: a less fluent model would produce gibberish from the contradictory context, and the bug would surface in week one. A frontier model produces something that reads as correct and simply answers the wrong question.
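To make the failure concrete, here is a minimal sketch of a prompt assembler that walks an append-only log. The `LogEvent` shape, its field names, and the sample messages are assumptions made for illustration, not a schema taken from any particular product.

```python
from dataclasses import dataclass


@dataclass
class LogEvent:
    seq: int         # append order in the event log
    role: str        # "user" or "assistant"
    content: str
    message_id: str  # stable id shared by every version of one message
    version: int     # 1 for the original, 2 and up for edits


def assemble_from_log(log: list[LogEvent]) -> list[dict]:
    """Naive assembler: every event, in append order, becomes live context."""
    ordered = sorted(log, key=lambda e: e.seq)
    return [{"role": e.role, "content": e.content} for e in ordered]


log = [
    LogEvent(1, "user", "Order fifteen units.", "m1", 1),
    LogEvent(2, "assistant", "Ordering fifteen units.", "m2", 1),
    LogEvent(3, "user", "Order fifty units.", "m1", 2),  # the edit, stored as an append
    LogEvent(4, "user", "What is the total cost?", "m3", 1),
]

messages = assemble_from_log(log)
for m in messages:
    print(m["role"], "|", m["content"])
```

The assembler never consults `version`, so the original "fifteen" message and the stale assistant reply both reach the model alongside the edit.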
The failure modes practitioners keep rediscovering
The bug has shapes. The first shape is the edit that didn't take. The user revised "fifteen" to "fifty," the UI accepted it, and the next assistant turn cited fifteen as if the user had insisted on it. The user re-edits, increasingly forcefully. The agent re-cites, increasingly confidently. Both parties are reasoning correctly over the inputs they each see. The inputs are different.
The second shape is the ghost assistant message. The user hit "regenerate" on turn three because the response was off. The UI replaced the response. The log appended a sibling. On turn four, the model's context contains the original assistant response, the regenerated response, and the user's turn-four follow-up, which makes sense against the regenerated one and is incoherent against the original. The model picks one to anchor on, often the earlier one because it is closer to the user's framing. The conversation now references a response the user has never seen.
The third shape is the rollback that didn't roll back. The user used the UI's "go back to turn two and continue from there" affordance, retyped turn three, and built four more turns on the new branch. The server, modeling this as appends, still has the old turns three through seven in the log. If the prompt assembler walks the log rather than the tree, the model sees all twelve turns: the two shared turns, the five abandoned ones, and the five on the new branch. The user's parallel-universe self appears to have continued the abandoned thread, and the agent reasons about both timelines as if they were one.
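The turn count in the rollback case can be checked with a small sketch. The `(message_id, parent_id)` record shape and the id names are hypothetical; the point is only the difference between walking the log and walking parent pointers back from the current head.

```python
# Hypothetical record shape: (message_id, parent_id). List order is append order.
log = [("t1", None), ("t2", "t1")]

# Abandoned branch: the original turns three through seven.
prev = "t2"
for n in range(3, 8):
    log.append((f"t{n}-old", prev))
    prev = f"t{n}-old"

# Active branch: turn three retyped, then four more turns built on it.
prev = "t2"
for n in range(3, 8):
    log.append((f"t{n}-new", prev))
    prev = f"t{n}-new"
head = prev


def walk_log(log):
    """What a log-walking assembler sends: everything ever appended."""
    return [message_id for message_id, _ in log]


def walk_branch(log, head):
    """What the user sees: follow parent pointers from the head back to the root."""
    parents = dict(log)
    path = []
    node = head
    while node is not None:
        path.append(node)
        node = parents[node]
    return list(reversed(path))


print(len(walk_log(log)))           # 12
print(len(walk_branch(log, head)))  # 7
```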
The fourth shape is the silent token tax. Even when the model's response stays coherent (perhaps because the latest user message is unambiguous enough to override the conflicting earlier turns), every retained pre-edit message pays token rent on every subsequent call, forever, for content the user believes they erased. At scale, this is a measurable line item that nobody attributes to edits because nobody knows it is there.
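A rough way to size the tax is sketched below. Counting whitespace-separated words is a crude stand-in for a real tokenizer, and the message ids, contents, and the figure of 20 future calls are made up for illustration.

```python
def rough_tokens(text: str) -> int:
    # Crude proxy: whitespace-separated words. A real tokenizer will give different counts.
    return len(text.split())


def stale_token_tax(log, visible_ids, future_calls):
    """Return (stale tokens resent per call, total over the given number of calls)."""
    stale_per_call = sum(
        rough_tokens(m["content"]) for m in log if m["id"] not in visible_ids
    )
    return stale_per_call, stale_per_call * future_calls


log = [
    {"id": "u1", "content": "Order fifteen units."},
    {"id": "a1", "content": "Ordering fifteen units now."},
    {"id": "u1-edit", "content": "Order fifty units."},
    {"id": "a1-regen", "content": "Ordering fifty units now."},
]
visible = {"u1-edit", "a1-regen"}  # what the user can still see on screen

per_call, total = stale_token_tax(log, visible, future_calls=20)
print(per_call, total)  # 7 140
```

Because the stale messages are resent on every call, the cost grows linearly with the length of the rest of the conversation.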
Why "just walk the active branch" is the answer that takes work
Once the diagnosis is named, the fix sounds obvious: walk the tree, not the log. Send the model the messages the user can see on their screen, not the messages the database happens to have. The work is in making "the messages the user can see" a real, queryable, server-side concept rather than an implicit frontend computation.
That means the server needs a conversation tree as a first-class object. Every message has a parent pointer. An edit produces a sibling of the original user message and marks the new message as the active sibling. A regenerate produces a sibling of the original assistant response and marks it active in the same way. A rollback re-points the conversation's "head" to a node earlier in the tree, leaving the abandoned branch as data but not as live context. The prompt assembler then walks from the root to the head, following the active-sibling pointer at every fork, and emits exactly that sequence as the messages array.
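One possible shape for such a tree is sketched below. The class, method, and field names (`ConversationTree`, `replace`, `rollback`, `active_child`) are hypothetical, and the in-memory dictionary stands in for whatever storage a real server would use.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    id: str
    role: str
    content: str
    parent: Optional[str] = None
    children: list[str] = field(default_factory=list)
    active_child: Optional[str] = None  # which child is live at this fork


class ConversationTree:
    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.root: Optional[str] = None
        self.head: Optional[str] = None

    def _add(self, role: str, content: str, parent: Optional[str]) -> str:
        node_id = f"n{len(self.nodes) + 1}"
        self.nodes[node_id] = Node(node_id, role, content, parent)
        if parent is None:
            self.root = node_id
        else:
            self.nodes[parent].children.append(node_id)
            self.nodes[parent].active_child = node_id  # newest sibling becomes active
        self.head = node_id
        return node_id

    def append(self, role: str, content: str) -> str:
        """Normal turn: a child of the current head."""
        return self._add(role, content, self.head)

    def replace(self, node_id: str, content: str) -> str:
        """Edit or regenerate: add an active sibling; the old node stays as data."""
        old = self.nodes[node_id]
        return self._add(old.role, content, old.parent)

    def rollback(self, node_id: str) -> None:
        """Re-point the head, re-activating the path from the root down to it."""
        cur = node_id
        while self.nodes[cur].parent is not None:
            parent = self.nodes[cur].parent
            self.nodes[parent].active_child = cur
            cur = parent
        self.root = cur
        self.head = node_id

    def active_path(self) -> list[dict]:
        """Walk root to head, following the active child at every fork."""
        out, cur = [], self.root
        while cur is not None:
            node = self.nodes[cur]
            out.append({"role": node.role, "content": node.content})
            if cur == self.head:
                break
            cur = node.active_child
        return out


tree = ConversationTree()
u1 = tree.append("user", "I need a quote.")
a1 = tree.append("assistant", "How many units?")
u2 = tree.append("user", "Fifteen.")
a2 = tree.append("assistant", "Quoting fifteen units.")
u2_edit = tree.replace(u2, "Fifty.")  # the user edits their message
tree.append("assistant", "Quoting fifty units.")
messages = tree.active_path()
```

Here `messages` holds four entries and none of them mentions fifteen, while `tree.nodes` still holds all six messages for audit and replay. Keeping the active pointer on the parent means the append-only history is untouched: only the pointers move.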