
The Conversation Summarization That Erased the Consent Flag the User Gave You

· 11 min read
Tian Pan
Software Engineer

At turn 3, your user clicked "do not retain my code." At turn 7, they toggled off "use my conversations to improve the model." At turn 12, they opted out of cross-session memory. At turn 40, your context budget runs out. The compaction pass folds turns 1–30 into a tidy 200-token summary that reads beautifully: it captures what the user asked, what your agent did, and what came of it. At turn 41, your agent, armed with that summary and the most recent ten turns, confidently writes the user's code into a memory store the user opted out of at turn 12.

Your audit log now contains a consent event at t=3, a violating action at t=41, and between them a paragraph of prose that has no field for why the action was permitted. The summarizer was trained to compress conversations, not to forward control state. Nobody told it the consent toggle was load-bearing. Nobody could have, because consent wasn't in the conversation — it was in a structured field next to it, and the structured field didn't survive the trip through summarization.

This isn't a hypothetical. Every team that has shipped a long-running agent with auto-compaction and a privacy surface has this bug latent in their architecture; most haven't tripped it yet because their sessions don't run long enough or their consent toggles haven't been audited against post-compaction actions. The teams who have tripped it usually discover it the way you discover most production privacy bugs: from a regulator's letter or a customer support ticket that starts with "I opted out of this."

Conversation Memory Is Two Streams, Not One

A useful mental model: a long-running agent session carries two parallel streams.

The semantic stream is the prose — the user's messages, the agent's responses, the tool calls and their results. It's what your summarizer was designed to compress. When you read a post-compaction summary, this is what you see.

The structured stream is everything else — consent flags, permission grants, region-of-operation, the user's selected pseudonym, the redaction policy in force, the data-retention class of the session, the regulatory jurisdiction. Some of it the user set explicitly through UI. Some of it came from the auth layer. Some of it was inferred from a tool call ("user invoked the EU-resident-only handler, so this session is GDPR-scoped"). Almost none of it appears in the prose.

A correctly built session keeps both streams synchronized: every action the agent takes is gated by the structured stream and described in the semantic one. A correctly built compaction step preserves both — the prose is summarized and the structured fields are forwarded verbatim.
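To make the two-stream model concrete, here is a minimal sketch of a session that carries both streams and gates an action on the structured one. Every name in it (`PolicyState`, `Session`, `write_code_to_memory`, and the individual flag names) is a hypothetical chosen for illustration, not taken from any particular framework.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PolicyState:
    """Structured stream. Field names are illustrative assumptions."""
    retain_code: bool = True
    use_for_training: bool = True
    cross_session_memory: bool = True
    jurisdiction: str = "unspecified"


@dataclass
class Session:
    # Semantic stream: the prose a summarizer would see.
    messages: list[str] = field(default_factory=list)
    # Structured stream: state set by the UI or auth layer, not by prose.
    policy: PolicyState = field(default_factory=PolicyState)
    memory_store: list[str] = field(default_factory=list)


def write_code_to_memory(session: Session, code: str) -> bool:
    """Gate the action on the structured stream, then describe it in the semantic one."""
    allowed = session.policy.retain_code and session.policy.cross_session_memory
    if allowed:
        session.memory_store.append(code)
        session.messages.append("agent: saved code to memory")
    else:
        session.messages.append("agent: skipped saving code (user opted out)")
    return allowed
```

The point of the sketch is that the decision reads `session.policy`, never `session.messages`. If compaction rebuilds the session with a default `PolicyState`, the gate silently reopens.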

Most compaction steps preserve only one of the two. The summarizer is an LLM call given a prompt like "summarize this conversation so far, preserving important details." It reads the messages. It does not read the side-band metadata, because the side-band metadata wasn't in its prompt. It produces excellent prose. The structured stream silently disappears, and the agent on the other side of the compaction now has full semantic memory and zero structured state.

This is the failure mode at its plainest: after compaction, the agent remembers what it was asked and forgets what it was allowed.

Why the Test That Should Catch This Doesn't

The usual way teams test summarization is to read the summary and judge it: does it capture the conversation? Could a fresh agent pick up where the old one left off? Does the user's intent survive?

These are the right questions for a chatbot. They are the wrong questions for an agent with a privacy surface. The summary can pass all three tests and still be a privacy violation, because the test is evaluating the wrong stream.

A consent flag the user toggled at turn 3 doesn't appear in the prose. It might appear as a system event ("user updated preferences"), but the actual state change (retain_code: false) lives in a separate field that the summarizer was never asked to look at. When a reviewer reads the summary and says "yes, this captures the conversation," they are correct. They are also missing the part that matters.
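A test aimed at the right stream compares structured state before and after compaction instead of judging the summary's prose. The sketch below assumes a session is a plain dict with `messages` and `policy` keys; `naive_compact` is a stand-in for a prose-only summarizer, and `policy_drift` is a hypothetical helper name.

```python
def policy_drift(before: dict, after: dict) -> list[str]:
    """Names of structured fields that are missing or changed after compaction."""
    return sorted(
        key for key, value in before.items()
        if key not in after or after[key] != value
    )


def naive_compact(session: dict) -> dict:
    """Stand-in for a prose-only summarizer: it only ever sees the messages."""
    summary = f"[summary of {len(session['messages'])} messages]"
    # The structured fields were never in the prompt, so nothing is carried over.
    return {"messages": [summary], "policy": {}}


session = {
    "messages": ["user: refactor this function", "agent: done"],
    "policy": {"retain_code": False, "cross_session_memory": False},
}

compacted = naive_compact(session)
drift = policy_drift(session["policy"], compacted["policy"])
# A non-empty drift list means compaction lost or altered control state,
# however good the summary reads.
```

Run as a release gate, a check like this fails on the naive compaction above even though a human reviewer would approve the summary.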

The structural problem is that the metadata you most need to preserve is the metadata that isn't in the conversation. It's adjacent to the conversation. And the people designing the summarizer are usually the AI platform team, who are reasoning about conversation quality. The people who own consent are usually the privacy or legal-engineering team, who are reasoning about audit trails. Neither team is reasoning about the seam between them. The seam is where the bug lives.

What a Compaction Step Should Actually Do

A compaction step that respects both streams looks different from a summarization call. It is a multi-stage transformation that treats structured state as a first-class input and output.

Forward every consent flag and policy field verbatim. The compaction protocol should enumerate the structured fields that exist on the session and copy them across the boundary unchanged. No summarization, no inference, no "let's collapse these three related toggles into one." If the user opted out of code retention, the post-compaction state must contain retain_code: false, byte-for-byte, exactly as it was before compaction. This is the side-band metadata that travels in a channel the summarizer cannot rewrite.
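One way to sketch that forwarding step, assuming the same kind of dict-shaped session with `messages` and `policy` keys: the summarizer is passed in as a callable and only ever receives messages, while the policy fields are copied across untouched. The function name `compact`, the `keep_last` parameter, and the session layout are all assumptions made for illustration.

```python
import copy
from typing import Callable


def compact(
    session: dict,
    summarize: Callable[[list[str]], str],
    keep_last: int = 10,
) -> dict:
    """Summarize older messages; forward structured fields unchanged."""
    messages = session["messages"]
    split = max(len(messages) - keep_last, 0)
    old, recent = messages[:split], messages[split:]

    # Only the prose goes through the summarizer.
    summarized = [summarize(old)] if old else []

    return {
        "messages": summarized + recent,
        # Copied, never summarized, merged, or inferred.
        "policy": copy.deepcopy(session["policy"]),
    }
```

Because `summarize` never sees `session["policy"]`, it has no opportunity to rewrite, collapse, or drop a flag. The deep copy also keeps later mutations of the old session from leaking into the compacted one.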
