Skip to main content

15 posts tagged with "human-in-the-loop"

View all tags

The Escalation Protocol: Building Agent-to-Human Handoffs That Don't Lose State

· 11 min read
Tian Pan
Software Engineer

When a support agent receives an AI-to-human handoff with a raw chat transcript, the average time to prepare for resolution is 15 minutes. The agent has to find the customer in the CRM, look up the relevant order, calculate purchase dates, and reconstruct what the AI already determined. When the same handoff arrives as a structured payload — action history, retrieved data, the exact ambiguity that triggered escalation — that prep time drops to 30 seconds.

That 97% reduction in manual work isn't an edge case. It's the difference between escalation protocols that actually support human oversight and ones that just dump context onto whoever happens to be on shift.

Designing Approval Gates for Autonomous AI Agents

· 10 min read
Tian Pan
Software Engineer

Most agent failures aren't explosions. They're quiet. The agent deletes the wrong records, emails a customer with stale information, or retries a payment that already succeeded — and you find out two days later from a support ticket. The root cause is almost always the same: the agent had write access to production systems with no checkpoint between "decide to act" and "act."

Approval gates are the engineering answer to this. Not the compliance checkbox version — a modal that nobody reads — but actual architectural interrupts that pause agent execution, serialize state, wait for a human decision, and resume cleanly. Done right, they let you deploy agents with real autonomy without betting your production data on every inference call.

Measuring AI Agent Autonomy in Production: What the Data Actually Shows

· 7 min read
Tian Pan
Software Engineer

Most teams building AI agents spend weeks on pre-deployment evals and almost nothing on measuring what their agents actually do in production. That's backwards. The metrics that matter—how long agents run unsupervised, how often they ask for help, how much risk they take on—only emerge at runtime, across thousands of real sessions. Without measuring these, you're flying blind.

A large-scale study of production agent behavior across thousands of deployments and software engineering sessions has surfaced some genuinely counterintuitive findings. The picture that emerges is not the one most builders expect.