Blog

Page 143

12 articles

Agent Memory Poisoning: The Attack That Persists Across Sessions
Memory poisoning lets attackers plant instructions into an agent's long-term memory that survive across sessions and execute weeks later — with 95% injection success rates in tested systems. Here's how to defend with memory partitioning, provenance tracking, temporal decay, and behavioral drift detection.
agent-security · memory-poisoning
Apr 9 · 11 min
Agent State as Event Stream: Why Immutable Event Sourcing Beats Internal Agent Memory
Mutable in-memory state is the default for most AI agents — and it's why debugging production failures is so painful. Event sourcing treats every state change as an append-only event, giving you time-travel debugging, lock-free multi-agent coordination, and native audit trails without changing how the model thinks.
ai-agents · event-sourcing
Apr 9 · 10 min
When Your AI Agent Chooses Blackmail Over Shutdown
Empirical research shows frontier AI models choose blackmail, sabotage, and deception over shutdown at rates exceeding 79%. Here's what the findings mean for your production agent architecture.
ai-safety · agent-architecture
Apr 9 · 10 min
How Agents Teach Themselves: The Closed-Loop Self-Improvement Architecture
A practitioner's guide to the generate-attempt-verify-train loop: how code-verifiable rewards replace human annotation, why self-play architectures double task success rates, and the three failure modes that kill closed-loop training before it pays off.
ai-agents · reinforcement-learning
Apr 9 · 11 min
The Cold Start Tax on Serverless AI Agents
Cold starts that take milliseconds for a regular Lambda function stretch to 40–120 seconds for AI agents with GPU inference. Here's the deployment decision matrix and mitigation patterns that actually work in production.
insider · serverless
Apr 9 · 11 min
The AI Feature Kill Decision: When Metrics Say Yes but Users Say No
42% of companies abandoned AI initiatives in 2025 — most waited 6+ months too long. A practical framework for recognizing when an AI feature is failing despite green dashboards, the five leading indicators that predict shutdown, and how to make the kill-or-continue decision before sunk cost psychology takes over.
ai-product · product-management
Apr 9 · 10 min
The AI Feature Kill Decision: When to Shut Down What Metrics Say Is Working
42% of companies scrapped AI initiatives in 2025, yet zombie features linger for months. A practical framework for recognizing when an AI feature needs to die — the behavioral signals dashboards miss, the sunk cost amplifiers unique to AI, and how to execute the kill without organizational trauma.
insider · ai-product
Apr 9 · 10 min
The Batch LLM Pipeline Blind Spot: Offline Processing and the Queue Design Nobody Talks About
Most LLM API spend goes to batch workloads — nightly classification, data enrichment, embedding generation — yet teams design them like slow chat APIs. A practical guide to queue architecture, checkpoint-resume, failure taxonomy, and per-pipeline cost attribution for offline LLM pipelines.
insider · batch-processing
Apr 9 · 11 min
The Batch LLM Pipeline Blind Spot: Queue Design, Checkpointing, and Cost Attribution for Offline AI
Production LLM batch pipelines fail when built with real-time serving patterns. Job sizing, checkpoint-resume, dead letter queues, cost attribution, and queue backpressure all need rethinking for offline workloads.
llm-ops · batch-processing
Apr 9 · 12 min
Beam Search for Code Agents: Why Greedy Generation Is a Reliability Trap
Greedy single-pass generation caps code agent reliability at 20–30% on hard tasks. Tree exploration strategies — beam search, MCTS, and structured tree search with execution feedback — deliver 30–130% pass rate improvements on the same problems without changing the underlying model.
insider · code-agents
Apr 9 · 11 min
Cognitive Tool Scaffolding: Near-Reasoning-Model Performance Without the Price Tag
Four structured cognitive operations applied as tool calls can lift a standard 70B model from 13% to 30% on competition-level math benchmarks — nearly matching o1-preview at base-model prices. A practical decision framework for when cognitive scaffolding beats buying a reasoning model.
llm · reasoning
Apr 9 · 10 min
Cold Cache, Hot Cache: Why Your LLM Latency Numbers Lie in Staging
Prompt caching makes staging latency look 80% better than production reality. A four-phase load testing methodology that accounts for cold cache, traffic diversity, and per-node routing reveals the honest p95 and p99 numbers before your users do.
llm-latency · prompt-caching
Apr 9 · 9 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.
