16 posts tagged with "memory"

Agent Memory Is a Compliance Surface: The Records-Management System You Didn't Sign Up to Build

May 13, 2026 · 12 min read

Tian Pan

Software Engineer

The first compliance escalation against your agent memory layer almost never arrives as a regulator's letter. It arrives as a Jira ticket from your enterprise sales engineer that says "the customer's privacy team is blocking the contract — they want to know what 'forget my user' actually means in your system, and they want a written answer by Friday." That ticket lands six to twelve months after the memory layer shipped, and the engineering team that built it discovers, in the time it takes to read the question, that they accidentally built a records-management system without any of the primitives a records-management system is supposed to have.

This is the structural problem with long-term memory in agentic products. The team building it optimizes for the things memory is sold to do — retrieval quality, latency, storage cost, the felt-personalization that makes the assistant feel like it knows the user. Nobody in the design review prices the parallel system being built at the same time: a per-user, per-tenant, multi-region data store with retention obligations, deletion semantics, audit export requirements, and a regulator's clock that starts the moment the first user's data lands in it. Memory is not a feature. It is the operational surface that every privacy regime, every enterprise procurement questionnaire, and every right-to-erasure request will eventually find.

Agent Memory Eviction: Why LRU Survives a Model Upgrade and Salience Doesn't

May 10, 2026 · 9 min read

Tian Pan

Software Engineer

The team that ships an agent with salience-weighted memory eviction has, without realizing it, signed up for a memory migration project at every model upgrade. The eviction policy looks like a quality lever (pick the smartest scoring approach, get the best recall), but it is secretly a versioning contract. When the scoring model changes, the agent's effective past changes too. None of the tooling that teams build around prompts and evals catches it, because the artifact that drifted is not a prompt or an eval. It is a sequence of decisions about what to forget, made months ago, by a model that no longer exists.

LRU and LFU don't have this problem. They are deterministic, model-independent, and survive upgrades cleanly. They also throw away information that a thoughtful judge would have kept. That is the tradeoff most teams accept once, on day one, when a demo recall metric is the only thing being measured — and it is the tradeoff that bites quarterly for the rest of the agent's lifetime.
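
To make "deterministic and model-independent" concrete, here is a minimal sketch of LRU eviction. The class name `LRUMemory` and its methods are hypothetical, not taken from any particular framework. The point is that the only input to the eviction decision is access order, so swapping the model underneath changes nothing about what gets forgotten.

```python
from collections import OrderedDict
from typing import Optional


class LRUMemory:
    """Hypothetical fixed-capacity memory store with least-recently-used eviction."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest access first, newest last

    def get(self, key: str) -> Optional[str]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # a read counts as a use
        return self._items[key]

    def put(self, key: str, value: str) -> list:
        """Store a memory and return the keys evicted to make room."""
        self._items[key] = value
        self._items.move_to_end(key)
        evicted = []
        while len(self._items) > self.capacity:
            old_key, _ = self._items.popitem(last=False)  # drop least recently used
            evicted.append(old_key)
        return evicted


memory = LRUMemory(capacity=2)
memory.put("timezone", "UTC+8")
memory.put("editor", "vim")
memory.get("timezone")                   # touch: "editor" is now the oldest
evicted = memory.put("language", "Go")   # evicts "editor"
```

No scoring model appears anywhere in that code path, which is exactly why the policy survives an upgrade and also why it cannot tell an important memory from a trivial one.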

Cross-Channel Memory: When Your Agent Forgets the Email Thread

May 9, 2026 · 10 min read

Tian Pan

Software Engineer

A customer asks your assistant in Slack on Monday how to enable a feature, gets a clean answer, and goes about their day. On Friday they email asking to confirm what was decided, and the assistant — running off a different session store, with no idea Monday's chat ever happened — gives a contradictory recommendation. The customer doesn't file two tickets against two products. They file one ticket against your AI, and they're right to. To them there is one assistant. The fact that you wrote three of them, glued to three surface-specific session stores, is an implementation detail you weren't supposed to leak.

This is the cross-channel memory problem, and it sits at the intersection of two things teams underestimate: how aggressively users assume continuity, and how aggressively channel teams write their own session stores because it was the path of least resistance to ship. Recent industry data puts the gap in stark terms — only 13% of organizations successfully carry full conversation context across channels, and CSAT for fragmented multichannel support sits at 28% versus 67% for true omnichannel. The 39-point delta isn't a model quality gap. It's a memory architecture gap.

The Session Boundary Problem: Where a Conversation Ends for Billing, Eval, and Memory

May 9, 2026 · 11 min read

Tian Pan

Software Engineer

Three teams are looking at the same event stream, each with a column called session_id, and each with a different definition of what a session is. Billing inherited a 30-minute idle window from the auth library. Eval inherited "everything until the user says 'bye' or stops typing for 10 minutes" from a chatbot framework. Memory uses a thread ID that the UI generates whenever the user clicks "New chat" — which most users never do. Three columns, three semantics, one rolled-up dashboard, three unrelated bugs that share a root cause.

This is the session boundary problem. It looks like an instrumentation nit, but it is actually a product question wearing infrastructure clothes: where does a conversation end? The honest answer is that there is no single answer — a session for billing is not the same object as a session for eval is not the same object as a session for memory — and a team that picks one default and lets the other two inherit it is shipping a billing dispute, an eval bias, and a memory leak with the same root cause.
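
A small sketch shows how the same events split differently under two of those definitions. It assumes pure idle-window rules (30 minutes for billing, 10 minutes for eval) and ignores the "user says 'bye'" signal; the function name `split_sessions` and the event times are hypothetical.

```python
from datetime import datetime, timedelta


def split_sessions(timestamps, idle_gap):
    """Group event timestamps into sessions separated by gaps longer than idle_gap."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= idle_gap:
            sessions[-1].append(ts)   # still inside the idle window
        else:
            sessions.append([ts])     # gap exceeded: a new session starts
    return sessions


# Hypothetical event stream: messages at minutes 0, 5, 20, 45, and 90.
start = datetime(2026, 5, 9, 9, 0)
events = [start + timedelta(minutes=m) for m in (0, 5, 20, 45, 90)]

billing_sessions = split_sessions(events, timedelta(minutes=30))
eval_sessions = split_sessions(events, timedelta(minutes=10))

print(len(billing_sessions), len(eval_sessions))  # prints "2 4"
```

Same stream, same column name, two sessions for billing and four for eval. Any dashboard that joins on session_id across those two tables is comparing different objects.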

The Summary Tax: When Compaction Eats More Tokens Than It Saves

May 9, 2026 · 10 min read

Tian Pan

Software Engineer

A long-running agent crosses its compaction threshold every twelve turns. Each pass costs an LLM call sized to the running window — first 8K tokens, then 14K, then 22K — because the span being summarized grows with every trigger. By turn sixty, the user has spent more tokens watching the agent re-summarize itself than they spent on the actual reasoning that mattered. The cost dashboard reads "user inference cost" as a single number, blissfully unaware that half of it paid for compression of context the user will never look at again.

This is the summary tax: a class of overhead that scales with conversation length, fires invisibly between user turns, and shows up as a single line item that conflates the work the user paid for with the bookkeeping the system did to manage itself. It is the closest thing modern agent architectures have to garbage-collection pause time — and most teams are running production with -verbose:gc turned off.
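
The arithmetic is easy to run once the compaction calls are separated from the reasoning calls. The sketch below uses the first three pass sizes from the example above (8K, 14K, 22K); the 40K reasoning figure and the function name `summary_tax` are assumptions for illustration, not measurements.

```python
def summary_tax(compaction_passes, reasoning_tokens):
    """Return (tokens spent on compaction, compaction's share of total spend)."""
    compaction_total = sum(compaction_passes)
    total = compaction_total + reasoning_tokens
    share = compaction_total / total if total else 0.0
    return compaction_total, share


passes = [8_000, 14_000, 22_000]  # first three compaction passes from the example
reasoning = 40_000                # hypothetical spend on the user's actual turns

compaction_total, share = summary_tax(passes, reasoning)
print(compaction_total)           # prints 44000
print(f"{share:.0%}")             # prints 52%
```

Three passes already cost 44K tokens, and each later pass is larger than the last. Reporting that as its own line item is the equivalent of turning GC logging on.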

Agent Memory Contamination: How One Bad Tool Response Poisons a Whole Session

May 5, 2026 · 10 min read

Tian Pan

Software Engineer

Your agent completes 80% of a multi-step research task correctly, then confidently delivers a conclusion that's completely wrong. You trace back through the logs and find the culprit at step three: a tool call returned stale data, the agent integrated that data as fact, and every subsequent reasoning step built on that poisoned premise. By the end of the session, the agent was correct about everything except the thing that mattered.

This is agent memory contamination — and it's one of the most insidious reliability failures in production agentic systems. Unlike a crash or timeout, it produces a confident wrong answer. Observability tooling records a successful run. The user walks away with bad information.

Ghost Context: How Contradictory Beliefs Break Long-Running Agent Memory

May 5, 2026 · 11 min read

Tian Pan

Software Engineer

Your agent has talked to the same user 400 times. Six months ago she said she preferred Python. Three months ago her team migrated to Go. Last week she mentioned a new TypeScript project. All three facts are sitting in your vector store right now — semantically similar, chronologically unordered, equally weighted. The next time she asks for code help, your agent retrieves all three, hands a contradictory mess to the model, and confidently generates Python with Go idioms for a TypeScript context.

This is ghost context: stale beliefs that never die, retrieved alongside their replacements, silently corrupting agent reasoning.

The problem is underappreciated because it doesn't produce visible errors. The agent doesn't crash. It doesn't refuse to respond. It produces fluent, confident output that's just subtly, expensively wrong.
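
One simple mitigation is to give each belief a slot and a timestamp, and let the newest entry per slot win before anything reaches the prompt. This is a minimal sketch under a strong assumption: that the three statements really do compete for one slot (here a hypothetical `preferred_language`). The `Fact` type, the slot name, and the dates are all illustrative.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    slot: str       # hypothetical: which belief this fact is about
    value: str
    observed: date  # when the user said it


def current_beliefs(facts):
    """Keep only the most recently observed fact for each slot."""
    latest = {}
    for fact in facts:
        held = latest.get(fact.slot)
        if held is None or fact.observed > held.observed:
            latest[fact.slot] = fact
    return latest


# Retrieval order is arbitrary, as it would be coming out of a vector store.
retrieved = [
    Fact("preferred_language", "Go", date(2026, 2, 5)),
    Fact("preferred_language", "TypeScript", date(2026, 4, 28)),
    Fact("preferred_language", "Python", date(2025, 11, 5)),
]

beliefs = current_beliefs(retrieved)
print(beliefs["preferred_language"].value)  # prints "TypeScript"
```

Recency is a blunt rule. A new TypeScript project does not necessarily retire Go, so a real system would need finer slots or an explicit supersedes link. Even the blunt rule stops all three answers from arriving in the prompt with equal weight.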

Agent Memory Drift: Why Reconciliation Is the Loop You're Missing

April 27, 2026 · 11 min read

Tian Pan

Software Engineer

The most dangerous thing your long-running agent does is also the thing it does most confidently: answer from memory. The customer's address changed last Tuesday. The ticket the agent thinks is "open" was closed yesterday by a human. The product feature the agent has tidy explanatory notes about shipped in a different shape than the spec the agent read three weeks ago. None of this is hallucination in the textbook sense — the model is recalling exactly what it stored. The world simply moved while the agent was looking elsewhere.

Most teams treat memory like a write problem: what should the agent remember, how do we summarize, what's the embedding strategy, how do we keep the store from blowing up. That framing produces architectures that grow more confident as they grow more wrong. The harder problem — the one that determines whether your agent stays useful past week three — is reconciliation: the explicit, ongoing loop that compares what the agent thinks is true against what the underlying systems say is true right now.
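
In its smallest form, a reconciliation pass is a diff between what memory holds and what the system of record returns. The sketch below assumes memory is a flat key-value mapping and that `fetch_current` is a hypothetical callable backed by the authoritative system; both are stand-ins for whatever your stack actually uses.

```python
def reconcile(memory, fetch_current):
    """Compare remembered values to the source of truth; fix memory and report drift.

    Returns {key: (remembered, current)} for every entry that had gone stale.
    """
    drift = {}
    for key, remembered in list(memory.items()):
        current = fetch_current(key)
        if current == remembered:
            continue
        drift[key] = (remembered, current)
        if current is None:
            del memory[key]        # the record no longer exists upstream
        else:
            memory[key] = current  # overwrite the stale belief
    return drift


# Hypothetical data: the agent still believes the ticket is open.
memory = {"ticket:4821:status": "open", "customer:17:city": "Austin"}
system_of_record = {"ticket:4821:status": "closed", "customer:17:city": "Austin"}

drift = reconcile(memory, system_of_record.get)
print(drift)  # prints {'ticket:4821:status': ('open', 'closed')}
```

The returned drift map matters as much as the repair. It is the signal that tells you how fast the world is moving relative to how often you check.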

Chat History Is a Database. Stop Treating It Like Scrollback.

April 27, 2026 · 11 min read

Tian Pan

Software Engineer

The most common production complaint about agentic products is some version of "it forgot what we said." The complaint shows up at turn eight, or fifteen, or thirty — never at turn two — and the team's first instinct is always the same: bigger context window. Which is the wrong instinct, because the bug is not in the model. The bug is that the team is treating conversation history as scrollback in a terminal — append a line, render the tail, truncate when full — when what they actually built, without realizing it, is a read-heavy database with append-only writes, a hot working set, an eviction policy hiding inside their truncation rule, and a query pattern that depends on the kind of question being asked. Once you accept that, the entire shape of the problem changes.

The scrollback model is so seductive because the chat UI looks like a transcript. Messages flow downward, the user reads them top-to-bottom, and the natural way to feed the model is to splice the latest N turns into the prompt. The data structure feels free. There's no schema, no index, no query — just append, render, repeat. And for the first few turns, every architecture works. The model has the whole conversation in its context, the bill is small, and the demo is delightful.

Session Stitching: Why Your Conversation-ID Is a Lie

April 27, 2026 · 11 min read

Tian Pan

Software Engineer

A user starts negotiating a contract with your agent on her desktop at 9 a.m. She gets a Slack ping, switches to her phone over lunch to ask one clarifying question, and reopens the desktop tab at 4 p.m. to revise the draft. To her, that was one task — three hours of working through one contract. To your system, that was three sessions on two devices, each with its own conversation-id, each with its own memory window, each presenting a fresh greeting and asking her to re-paste the draft she'd already discussed twice.

The bug is not in the model. The bug is that your platform encoded "session" — a transport-layer artifact about a single connection — as the unit of context, while your user encoded "task" — the contract — as the unit of context. Every framework on the market quietly conflates the two, and the gap between them is where half of agent UX disappears.

Agent Memory Schema Evolution Is Protobuf on Hard Mode

April 23, 2026 · 11 min read

Tian Pan

Software Engineer

The first painful agent-memory migration always teaches the same lesson: there were two schemas, and you only migrated one of them. The storage layer is fine: every row was rewritten, every key is in its new shape, the backfill job logged success. The agent is broken anyway. It keeps addressing the old key, user.preferences.theme, gets nothing back, then helpfully synthesizes a default from context as if the key never existed. The migration runbook reports green. Users report stale memory.

The asymmetry is structural. A traditional service that depends on a renamed column gets a hard error and you fix it. An agent that depends on a renamed memory key gets a soft miss and confabulates around it. The schema lives in two places — your store and the model's context — and you can only migrate one of them with a SQL script.

Protobuf solved a version of this problem twenty years ago by codifying an additive-only discipline: fields are forever, numbers are forever, wire types never change, and removal is replaced with deprecation. That discipline is the right starting point for agent memory, with one extra constraint that makes it harder. Protobuf receivers ignore unknown fields by design. Agents don't.
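
Applied to a key-value memory store, the additive-only rule could look like the sketch below: old keys are never deleted from the vocabulary, only marked deprecated and resolved to their replacements, so an agent that still uses the old name gets a hit instead of a soft miss. The new key name `user.ui.theme`, the `DEPRECATED_KEYS` table, and the `MemoryStore` class are assumptions for illustration.

```python
# Hypothetical rename table: old key -> new key. Entries are never removed.
DEPRECATED_KEYS = {"user.preferences.theme": "user.ui.theme"}


def resolve(key):
    """Follow deprecation links until reaching the current key name."""
    seen = set()
    while key in DEPRECATED_KEYS:
        if key in seen:
            raise ValueError(f"deprecation cycle at {key!r}")
        seen.add(key)
        key = DEPRECATED_KEYS[key]
    return key


class MemoryStore:
    """Minimal in-memory store that accepts both old and new key names."""

    def __init__(self):
        self._data = {}

    def write(self, key, value):
        self._data[resolve(key)] = value

    def read(self, key):
        return self._data.get(resolve(key))


store = MemoryStore()
store.write("user.preferences.theme", "dark")  # agent still uses the old name
print(store.read("user.ui.theme"))             # prints "dark"
print(store.read("user.preferences.theme"))    # prints "dark"
```

This only covers the storage half. The other schema, the key names sitting in the model's prompt and few-shot examples, still has to be migrated separately.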

The Silent Corruption Problem in Parallel Agent Systems

April 20, 2026 · 12 min read

Tian Pan

Software Engineer

When a multi-agent system starts behaving strangely — giving inconsistent answers, losing track of tasks, making decisions that contradict earlier reasoning — the instinct is to blame the model. Tweak the prompt. Switch to a stronger model. Add more context.

The actual cause is often more mundane and more dangerous: shared state corruption from concurrent writes. Two agents read the same memory, both compute updates, and one silently overwrites the other. The resulting state is technically valid — no exceptions thrown, no schema violations — but semantically wrong. Every agent that reads it afterward reasons correctly over incorrect information.

This failure mode is invisible at the individual operation level, hard to reproduce in test environments, and nearly impossible to distinguish from model error by looking at outputs alone. O'Reilly's 2025 research on multi-agent memory engineering found that 36.9% of multi-agent system failures stem from inter-agent misalignment, meaning agents operating on inconsistent views of shared information. It's not a theoretical concern.
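
One standard guard against the lost update described above is optimistic concurrency: every write names the version it read, and the store rejects it if that version is no longer current. The sketch below is a single-process illustration; `VersionedStore` and `VersionConflict` are hypothetical names, and a real deployment would rely on the conditional-write feature of its actual database.

```python
import threading


class VersionConflict(Exception):
    """Raised when a write is based on a version that is no longer current."""


class VersionedStore:
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (version, value)

    def read(self, key):
        """Return (version, value); unseen keys are at version 0."""
        with self._lock:
            return self._data.get(key, (0, None))

    def write(self, key, value, expected_version):
        """Write only if nobody else has written since expected_version was read."""
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                raise VersionConflict(
                    f"{key!r}: expected v{expected_version}, found v{current_version}"
                )
            self._data[key] = (current_version + 1, value)
            return current_version + 1


store = VersionedStore()
version_a, _ = store.read("plan")  # agent A reads version 0
version_b, _ = store.read("plan")  # agent B reads version 0
store.write("plan", "A's update", version_a)

try:
    store.write("plan", "B's update", version_b)  # stale read: rejected
except VersionConflict:
    conflict_detected = True
```

The overwrite that would have been silent is now an exception that agent B has to handle, usually by re-reading and recomputing its update.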
