220 posts tagged with "ai-agents"

Designing Approval Gates for Autonomous AI Agents

10 min read
Tian Pan
Software Engineer

Most agent failures aren't explosions. They're quiet. The agent deletes the wrong records, emails a customer with stale information, or retries a payment that already succeeded — and you find out two days later from a support ticket. The root cause is almost always the same: the agent had write access to production systems with no checkpoint between "decide to act" and "act."

Approval gates are the engineering answer to this. Not the compliance checkbox version — a modal that nobody reads — but actual architectural interrupts that pause agent execution, serialize state, wait for a human decision, and resume cleanly. Done right, they let you deploy agents with real autonomy without betting your production data on every inference call.
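The pause, serialize, wait, resume cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not the post's implementation: `PendingAction`, `ApprovalGate`, and the `execute` callback are hypothetical names, and the in-memory dict stands in for whatever durable store a real system would use.

```python
import json
from dataclasses import dataclass, asdict
from typing import Callable, Dict, Optional


@dataclass
class PendingAction:
    """A write the agent wants to perform (hypothetical shape)."""
    action_id: str
    tool: str
    args: dict


class ApprovalGate:
    """Holds proposed actions as serialized state until a human decides."""

    def __init__(self) -> None:
        # action_id -> JSON snapshot; a real system would persist this.
        self._pending: Dict[str, str] = {}

    def request(self, action: PendingAction) -> str:
        # Pause: serialize the proposed action instead of executing it.
        self._pending[action.action_id] = json.dumps(asdict(action))
        return action.action_id

    def resolve(self, action_id: str, approved: bool,
                execute: Callable[[str, dict], str]) -> Optional[str]:
        # Resume: rebuild the action from its snapshot, act only if approved.
        action = PendingAction(**json.loads(self._pending.pop(action_id)))
        if not approved:
            return None
        return execute(action.tool, action.args)
```

The point of the design is that nothing touches production between `request` and `resolve`: the agent's intent exists only as data until a human approves it.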

MCP in Production: What Nobody Tells You About the Model Context Protocol

10 min read
Tian Pan
Software Engineer

The "USB-C for AI" analogy is catchy. It's also wrong in the ways that matter most when you're the one responsible for keeping it running in production. The Model Context Protocol solves a real problem—the explosion of custom N×M integrations between AI models and external systems—but the gap between "it works in the demo" and "it handles Monday morning traffic without leaking data or melting your latency budget" is wider than most teams expect.

MCP saw an 8,000% growth in server downloads in the five months after its November 2024 launch, with 97 million monthly SDK downloads by April 2025. That adoption speed is both a sign of genuine utility and a warning: most of those servers went into production without the teams fully understanding what they were building on.

Six Context Engineering Techniques That Make Manus Work in Production

11 min read
Tian Pan
Software Engineer

The Manus team rebuilt their agent framework four times in less than a year. Not because of model changes — the underlying LLMs improved steadily. They rebuilt because they kept discovering better ways to shape what goes into the context window.

They called this process "Stochastic Graduate Descent": manual architecture searching, prompt fiddling, and empirical guesswork. Honest language for what building production agents actually looks like. After millions of real user sessions, they've settled on six concrete techniques that determine whether a long-horizon agent succeeds or spirals into incoherence.

The unifying insight is simple to state and hard to internalize: "Context engineering is the delicate art and science of filling the context window with just the right information for the next step." A typical Manus task runs ~50 tool calls with a 100:1 input-to-output token ratio. At that scale, what you put in the context — and how you put it there — determines everything.

The Action Space Problem: Why Giving Your AI Agent More Tools Makes It Worse

9 min read
Tian Pan
Software Engineer

There's a counterintuitive failure mode that most teams encounter when scaling AI agents: the more capable you make the agent's toolset, the worse it performs. You add tools to handle more cases. Accuracy drops. You add better tools. It gets slower and starts picking the wrong ones. You add orchestration to manage the tool selection. Now you've rebuilt complexity on top of the original complexity, and the thing barely works.

The instinct to add is wrong. The performance gains in production agents come from removing things.

Four Strategies for Engineering Agent Context That Actually Scales

8 min read
Tian Pan
Software Engineer

There's a failure mode in production agents that most engineers discover the hard way: your agent works well on the first few steps, then starts hallucinating halfway through a task, misses details it was explicitly given at the start, or issues a tool call that contradicts instructions it received twenty steps ago. The model didn't change. The task didn't get harder. The context did.

Long-running agents accumulate history the way browser tabs accumulate memory — silently, relentlessly, until something breaks. Every tool response, observation, and intermediate reasoning trace gets appended to the window. The model sees all of it, which means it has to reason through all of it on every subsequent step. As context grows, precision drops, reasoning weakens, and the model misses information it should catch. This is context rot, and it's one of the most common failure modes in production agents.

Context Engineering: Memory, Compaction, and Tool Clearing for Production Agents

10 min read
Tian Pan
Software Engineer

Most production AI agent failures don't happen because the model ran out of context. They happen because the model drifted long before it hit the limit. Forrester has named "agent drift" the silent killer of AI-accelerated development, and its 2025 research shows that nearly 65% of enterprise AI failures trace back to context drift or memory loss during multi-step reasoning, not raw token exhaustion.

The distinction matters. A hard context limit is clean: the API rejects the request, the agent stops, you get an error you can handle. Context rot is insidious: the model keeps running, keeps generating output, but performance quietly degrades. GPT-4's accuracy drops from 98.1% to 64.1% based solely on where in the context window information is positioned. You don't get an error signal — you get subtly wrong answers.

This post covers the three primary tools for managing context in production agents — compaction, tool-result clearing, and external memory — along with the practical strategies for applying them before your agent drifts.
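As a rough illustration of one of these tools, tool-result clearing can be as simple as replacing older tool outputs with a short placeholder while leaving the most recent ones intact. The sketch below assumes a message list of dicts with `role` and `content` keys; the function name, the `keep_last` parameter, and the placeholder text are all hypothetical.

```python
from typing import Dict, List

Message = Dict[str, str]


def clear_old_tool_results(
    messages: List[Message],
    keep_last: int = 2,
    placeholder: str = "[tool result cleared]",
) -> List[Message]:
    """Return a copy of the history with all but the newest tool results blanked."""
    tool_indices = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    # Everything except the last `keep_last` tool messages gets cleared.
    if keep_last > 0:
        to_clear = set(tool_indices[:-keep_last])
    else:
        to_clear = set(tool_indices)
    return [
        {**m, "content": placeholder} if i in to_clear else dict(m)
        for i, m in enumerate(messages)
    ]
```

The message count and ordering stay the same, so the model still sees that a tool was called at each step; only the bulky payloads are dropped.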

CLAUDE.md and AGENTS.md: The Configuration Layer That Makes AI Coding Agents Actually Follow Your Rules

9 min read
Tian Pan
Software Engineer

Your AI coding agent doesn't remember yesterday. Every session starts cold: it doesn't know that you use yarn rather than npm, that you avoid the any type, or that the src/generated/ directory is sacred and should never be edited by hand. So it generates code with the wrong package manager, introduces the any type where you've banned it, and occasionally overwrites generated files you'll spend an hour recovering. You correct it. Tomorrow it makes the same mistake. You correct it again.

This is not a model quality problem. It's a configuration problem — and the fix is a plain Markdown file.

CLAUDE.md, AGENTS.md, and their tool-specific cousins are the briefing documents AI coding agents read before every session. They encode what the agent would otherwise have to rediscover or be corrected on: which commands to run, which patterns to avoid, how your team's workflow is structured, and which directories are off-limits. They're the equivalent of a thorough engineering onboarding document, compressed into a form optimized for machine consumption.

Building AI Agents That Actually Work in Production

10 min read
Tian Pan
Software Engineer

Most teams building AI agents make the same mistake: they architect for sophistication before they have evidence that sophistication is needed. A production analysis of 47 agent deployments found that 68% would have achieved equivalent or better outcomes with a well-designed single-agent system. The multi-agent tax — higher latency, compounding failure modes, operational complexity — often eats the gains before they reach users.

This isn't an argument against agents. It's an argument for building them the same way you'd build any serious production system: start with the simplest thing that works, instrument everything, and add complexity only when the simpler version demonstrably fails.

Effective Context Engineering for AI Agents

11 min read
Tian Pan
Software Engineer

Nearly 65% of enterprise AI failures in 2025 traced back to context drift or memory loss during multi-step reasoning — not model capability issues. If your agent is making poor decisions or losing coherence across a long task, the most likely cause is not the model. It is what is sitting in the context window.

The term "context engineering" is proliferating fast, but the underlying discipline is concrete: active, deliberate management of what enters and exits the LLM's context window at every inference step in an agent's trajectory. Not a prompt. A dynamic information architecture that the engineer designs and the agent traverses. The context window functions as RAM — finite, expensive, and subject to thrashing if you don't manage it deliberately.

Mastering AI Agent Observability: Why Your Dashboards Are Lying to You

9 min read
Tian Pan
Software Engineer

Your agent is returning HTTP 200s. Latency is within SLA. Error rates are flat. Everything on the dashboard looks green — and your users are getting confidently wrong answers.

This is the core observability gap in AI systems: the metrics that traditionally signal system health are almost entirely irrelevant to whether your agent is actually doing its job. An agent can fluently hallucinate, skip required tools, use stale retrieval results, or reason itself into logical contradictions — all while your monitoring shows zero anomalies. The standard playbook for service observability doesn't transfer to agentic systems, and teams that don't understand this gap ship agents they can't trust, debug, or improve.

The 80% Problem: Why AI Coding Agents Stall and How to Break Through

10 min read
Tian Pan
Software Engineer

A team ships 98% more pull requests after adopting AI coding agents. Sounds like a success story — until you notice that review times grew 91% and PR sizes ballooned 154%. The code was arriving faster than anyone could verify it.

This is the 80% problem. AI coding agents are remarkably good at generating plausible-looking code. They stall, or quietly fail, when the remaining 20% requires architectural judgment, edge case awareness, or any feedback loop more sophisticated than "did it compile?" The teams winning with coding agents aren't the ones who prompted most aggressively. They're the ones who built better feedback loops, shorter context windows, and more deliberate workflows.

Systematic Debugging for AI Agents: From Guesswork to Root Cause

9 min read
Tian Pan
Software Engineer

When an AI agent fails in production, you rarely know exactly when it went wrong. You see the final output — a hallucinated answer, a skipped step, a tool called with the wrong arguments — but the actual failure could have happened three steps earlier. This is the core debugging problem that software engineering hasn't solved yet: agents execute as a sequence of decisions, and by the time you notice something is wrong, the evidence is buried in a long trace of interleaved LLM calls, tool invocations, and state mutations.

Traditional debugging assumes determinism. You can reproduce the bug, set a breakpoint, inspect the state. Agent debugging breaks all three of those assumptions simultaneously. The same input can produce different execution paths. Reproducing a failure requires capturing the exact context, model temperature, and external state at the moment it happened. And "setting a breakpoint" in a live reasoning loop is not something most agent frameworks even support.
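One way to make a failure reproducible is to snapshot those three ingredients (context, temperature, external state) at every step. The sketch below is an assumption about what such a record could look like, not a feature of any particular agent framework; `StepSnapshot` and `load_snapshot` are hypothetical names, and it assumes the captured state is JSON-serializable.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from typing import Any, Dict, List


@dataclass(frozen=True)
class StepSnapshot:
    """Everything needed to replay one agent step (hypothetical shape)."""
    step: int
    messages: List[Dict[str, str]]   # exact context sent to the model
    temperature: float               # sampling setting used for the call
    external_state: Dict[str, Any]   # e.g. tool outputs or DB rows read

    def to_json(self) -> str:
        # sort_keys makes the serialized form stable across runs.
        return json.dumps(asdict(self), sort_keys=True)

    def fingerprint(self) -> str:
        # Identical snapshots hash identically, which helps spot divergence.
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()


def load_snapshot(raw: str) -> StepSnapshot:
    """Rebuild a snapshot from its serialized form for replay."""
    return StepSnapshot(**json.loads(raw))
```

Comparing fingerprints between a failing run and a replay shows the first step at which the inputs diverged, which narrows down where to look in the trace.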