
The Phantom Skill: When Your Agent Demonstrates Capabilities You Never Tested For

May 9, 2026 · 11 min read

Software Engineer

A customer posts a screenshot in your support channel. They've been using your scheduling agent to negotiate three-way meeting times across time zones in mixed English and Japanese. The agent suggests slots in both languages and reasons about Japanese business etiquette. It works. Leadership shares it on Slack with a fire emoji. The PM updates the marketing copy.

Nobody on the team built that capability. No eval covers it. No prompt instruction mentions Japanese, etiquette, or three-way coordination. The behavior is real, but it was never engineered and never measured, and it is now part of your product's surface area.

This is a phantom skill: a capability your agent demonstrates that no test ever verified. It isn't a bug. It isn't quite a feature either. It's load-bearing behavior with no contract, and it's the failure mode that quietly defines what your "AI product" actually is.

About Tian Pan