5 posts tagged with "agent-memory"

Hierarchical Memory Compaction: The Four Tiers Your Agent Memory Is Missing

May 1, 2026 · 11 min read

Software Engineer

Most agent memory systems collapse a four-layer problem into two layers and then act surprised when the seams show. There is the conversation buffer that gets truncated when it overflows the context window, and there is the vector store of "long-term memory" that everything older than the buffer gets dumped into. That is not a memory architecture. That is a queue and a junk drawer.

The agent that re-asks a regular user the same onboarding question three Mondays in a row is not failing because the model is bad. It is failing because there is no place in the system that holds "things this user has told me across sessions" with a different lifetime than "things every user has ever told me about how the product works." Those are different memories. They have different access patterns, different privacy contracts, and different rules for when to forget. Conflating them is the architectural mistake — and it has a fix.
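To make "different lifetimes" concrete, here is a minimal sketch of a store that tags each memory with a scope and expires it on a per-scope schedule. The scope names, the TTL values, and the `ScopedMemory` class are all hypothetical illustrations. They are not the four tiers the post goes on to describe, only the per-user versus shared distinction drawn above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Scope(Enum):
    # Hypothetical scope names; the excerpt only separates per-user from shared memory.
    CONVERSATION = "conversation"
    USER = "user"
    GLOBAL = "global"


# Assumed lifetimes in seconds (None means no automatic expiry).
TTL_SECONDS = {
    Scope.CONVERSATION: 60 * 60,
    Scope.USER: 90 * 24 * 60 * 60,
    Scope.GLOBAL: None,
}


@dataclass
class MemoryItem:
    scope: Scope
    owner: Optional[str]  # user id for CONVERSATION/USER items, None for GLOBAL
    text: str
    created_at: float  # seconds since an arbitrary epoch


class ScopedMemory:
    def __init__(self):
        self._items = []

    def write(self, item):
        self._items.append(item)

    def read(self, user_id, now):
        """Return unexpired items visible to user_id: their own plus global ones."""
        visible = []
        for item in self._items:
            ttl = TTL_SECONDS[item.scope]
            if ttl is not None and now - item.created_at > ttl:
                continue  # this scope's lifetime has elapsed
            if item.scope is Scope.GLOBAL or item.owner == user_id:
                visible.append(item)
        return visible
```

Under these assumptions, a user's onboarding answer written with `Scope.USER` is still returned a week later, while a `Scope.CONVERSATION` note from the same session has expired and another user's items are never visible.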

Amortizing Context: Persistent Agent Memory vs. Long-Context Windows

April 20, 2026 · 9 min read

Tian Pan

Software Engineer

When 1-million-token context windows became commercially available, a lot of teams quietly decided they'd solved agent memory. Why build a retrieval system, manage a vector database, or design an eviction policy when you can just dump everything in and let the model sort it out? The answer comes back in your infrastructure bill. At 10,000 daily interactions with a 100k-token knowledge base, the brute-force in-context approach costs roughly $5,000/day. A retrieval-augmented memory system handling the same load costs around $333/day, a 15x gap that compounds as your user base grows.
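The arithmetic behind those figures can be unpacked with a short calculation. This is a sketch under stated assumptions: it treats the whole bill as input-token cost and applies the same per-token price to both approaches, neither of which the excerpt states. The per-million-token price and the retrieved-context size are back-derived from the quoted totals, not quoted from any provider.

```python
# Figures quoted in the post.
DAILY_INTERACTIONS = 10_000
KNOWLEDGE_BASE_TOKENS = 100_000
BRUTE_FORCE_COST_PER_DAY = 5_000.0  # USD
RETRIEVAL_COST_PER_DAY = 333.0      # USD

# Brute force sends the full knowledge base on every interaction.
tokens_per_day = DAILY_INTERACTIONS * KNOWLEDGE_BASE_TOKENS  # 1,000,000,000 tokens

# Assumption: the bill is input tokens only, so the implied price falls out directly.
implied_price_per_million = BRUTE_FORCE_COST_PER_DAY / (tokens_per_day / 1_000_000)

# Assumption: retrieval pays the same price, so its budget implies a context size.
retrieval_tokens_per_day = RETRIEVAL_COST_PER_DAY / implied_price_per_million * 1_000_000
implied_retrieval_tokens = retrieval_tokens_per_day / DAILY_INTERACTIONS

cost_ratio = BRUTE_FORCE_COST_PER_DAY / RETRIEVAL_COST_PER_DAY

print(f"implied price: ${implied_price_per_million:.2f} per million tokens")
print(f"implied retrieved context: {implied_retrieval_tokens:.0f} tokens per interaction")
print(f"cost ratio: {cost_ratio:.1f}x")
```

Read this way, the quoted gap amounts to sending a few thousand retrieved tokens per interaction instead of the full 100k.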

The real problem isn't just cost. It's that longer contexts produce measurably worse answers. Research consistently shows that models lose track of information positioned in the middle of very long inputs, accuracy drops predictably when relevant evidence is buried among irrelevant chunks, and latency climbs in ways that make interactive agents feel broken. The "stuff everything in" approach doesn't just waste money — it trades accuracy for the illusion of simplicity.

Agent Memory Garbage Collection: Engineering Strategic Forgetting at Scale

April 14, 2026 · 10 min read

Tian Pan

Software Engineer

Every production agent team eventually builds the same thing: a memory store that grows without bound, retrieval that degrades silently, and a frantic sprint to add forgetting after users report that the agent is referencing their old job, a deprecated API, or a project that was cancelled three months ago. The industry has poured enormous effort into giving agents memory. The harder engineering problem — garbage collecting that memory — is where the real production reliability lives.

The parallel to software garbage collection is more than metaphorical. Agent memory systems face the same fundamental tension: you need to reclaim resources (context budget, retrieval relevance) without destroying data that's still reachable (semantically relevant to future queries). The algorithms that solve this look surprisingly similar to the ones your runtime already uses.
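As one illustration of the parallel, the sketch below applies a mark-and-sweep pass to a memory store. It assumes reachability is modeled by explicit links between memory items and a set of root items still in active use. Both are simplifications chosen for the example: the post describes reachability in terms of semantic relevance, and the function name and data shapes here are hypothetical.

```python
from collections import deque


def mark_and_sweep(memories, roots):
    """Split memory ids into (kept, swept).

    memories: dict mapping a memory id to the set of ids it references.
    roots: ids still in active use (for example, current goals or entities).
    """
    marked = set()
    queue = deque(r for r in roots if r in memories)

    # Mark phase: breadth-first walk over everything reachable from the roots.
    while queue:
        node = queue.popleft()
        if node in marked:
            continue
        marked.add(node)
        for ref in memories[node]:
            if ref in memories and ref not in marked:
                queue.append(ref)

    # Sweep phase: anything never marked is a candidate for forgetting.
    swept = set(memories) - marked
    return marked, swept


store = {
    "current_job": {"employer_acme"},
    "employer_acme": set(),
    "old_job": {"employer_initech"},
    "employer_initech": set(),
    "cancelled_project": set(),
}
kept, swept = mark_and_sweep(store, roots={"current_job"})
```

Here the old job, its employer, and the cancelled project end up in `swept` because nothing in active use points at them. A production system would likely archive or down-rank such items before deleting them.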

The Forgetting Problem: When Unbounded Agent Memory Degrades Performance

April 12, 2026 · 9 min read

Tian Pan

Software Engineer

An agent that remembers everything eventually remembers nothing useful. This sounds like a paradox, but it's the lived experience of every team that has shipped a long-running AI agent without a forgetting strategy. The memory store grows, retrieval quality degrades, and one day your agent starts confidently referencing a user's former employer, a deprecated API endpoint, or a project requirement that was abandoned six months ago.

The industry has spent enormous energy on giving agents memory. Far less attention has gone to the harder problem: teaching agents what to forget.

Graph Memory for LLM Agents: The Relational Blind Spots That Flat Vectors Miss

April 10, 2026 · 10 min read

Tian Pan

Software Engineer

A customer service agent knows that the user prefers morning delivery. It also knows the user's primary address is in Seattle. What it cannot figure out is that the Seattle address is a work address used only on weekdays, and the morning delivery window does not apply there on Mondays because of a building restriction the user mentioned three months ago. Each fact is retrievable in isolation. The relationship between them is not.

This is the failure mode that bites production agents working from flat vector stores. Each piece of information exists as an embedding floating in high-dimensional space. Similarity search retrieves facts that match a query. It does not recover the structural connections between facts — the edges that give them meaning in combination.
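The difference can be sketched with a tiny in-memory graph. This is a minimal illustration, assuming facts are stored as (subject, relation, object) triples. The `FactGraph` class, the relation names, and the hop limit are hypothetical and do not refer to any particular graph database.

```python
from collections import defaultdict, deque


class FactGraph:
    def __init__(self):
        # node -> list of (relation, neighbor); inverse edges allow backward traversal
        self._edges = defaultdict(list)

    def add(self, subject, relation, obj):
        self._edges[subject].append((relation, obj))
        self._edges[obj].append((f"inverse:{relation}", subject))

    def neighborhood(self, start, max_hops=2):
        """Return the (subject, relation, object) triples within max_hops of start."""
        seen = {start}
        triples = []
        queue = deque([(start, 0)])
        while queue:
            node, depth = queue.popleft()
            if depth == max_hops:
                continue  # do not expand beyond the hop limit
            for relation, neighbor in self._edges.get(node, []):
                if not relation.startswith("inverse:"):
                    triples.append((node, relation, neighbor))
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, depth + 1))
        return triples


g = FactGraph()
g.add("user", "has_address", "seattle_address")
g.add("seattle_address", "type", "work")
g.add("seattle_address", "used_on", "weekdays")
g.add("seattle_address", "restriction", "no_morning_delivery_on_mondays")
g.add("user", "prefers", "morning_delivery")

context = g.neighborhood("seattle_address", max_hops=2)
```

Starting from the Seattle address, a two-hop traversal returns the delivery preference together with the Monday restriction, because both are connected through the user and the address. A similarity search over the same facts stored as independent embeddings has no edge to follow.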

Most agent memory architectures are built around vector databases because they are fast, simple to set up, and work well for the majority of retrieval tasks. The failure cases are subtle enough that they often survive into production before anyone notices the pattern.
