3 posts tagged with "contracts"

The Tool Description That Rotted While Your Agent Kept Calling It

· 10 min read
Tian Pan
Software Engineer

Your agent has been quietly wrong for six months and your error rate looks fine. The underlying API shipped a renamed error code, made one optional field required, and started rejecting calls without an idempotency header. The tool description in your agent's system prompt — pasted from a Notion page in Q4 of last year — describes none of this. The agent keeps calling the old shape, the orchestration layer keeps catching the failure and retrying with the same broken arguments, and the only signal in your telemetry is a slightly elevated retry count that nobody on call has the context to investigate.

Tool descriptions are interface contracts. They go stale the moment the underlying API changes. And unlike a typed SDK, they break silently: nothing fails to compile, and the model just makes worse calls.
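One way to make that drift visible is to diff what the prompt's tool description claims against what the live API currently requires. The sketch below is illustrative only: `ToolContract`, `contract_drift`, and the field, header, and error-code names are hypothetical, and it assumes you can extract both sides into a comparable form (for example, from an OpenAPI document and from the description text).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolContract:
    """Hypothetical minimal view of a tool interface."""
    required_fields: frozenset[str]
    required_headers: frozenset[str] = frozenset()
    error_codes: frozenset[str] = frozenset()


def contract_drift(described: ToolContract, live: ToolContract) -> list[str]:
    """List the ways the live API differs from the description in the prompt."""
    findings = []
    # Fields and headers the API now demands but the description never mentions.
    for name in sorted(live.required_fields - described.required_fields):
        findings.append(f"field now required: {name}")
    for name in sorted(live.required_headers - described.required_headers):
        findings.append(f"header now required: {name}")
    # Error codes that were renamed or removed, and new ones the agent has not been told about.
    for code in sorted(described.error_codes - live.error_codes):
        findings.append(f"error code no longer returned: {code}")
    for code in sorted(live.error_codes - described.error_codes):
        findings.append(f"error code not described: {code}")
    return findings


# Example values are made up to mirror the three changes described above.
described = ToolContract(
    required_fields=frozenset({"amount"}),
    error_codes=frozenset({"CARD_DECLINED"}),
)
live = ToolContract(
    required_fields=frozenset({"amount", "currency"}),
    required_headers=frozenset({"Idempotency-Key"}),
    error_codes=frozenset({"PAYMENT_DECLINED"}),
)

for finding in contract_drift(described, live):
    print(finding)
```

Run in CI or on a schedule, a check like this turns a slightly elevated retry count into an explicit list of contract changes that someone can act on.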

The AI Procurement Gap: Why Your Vendor Evaluation Process Can't Handle Probabilistic Systems

· 11 min read
Tian Pan
Software Engineer

A procurement team I worked with spent eleven weeks scoring four LLM vendors against a 312-row RFP spreadsheet. They negotiated 99.9% uptime, $0.0008 per 1K input tokens, SOC 2 Type II, and a glossy benchmark PDF that put their selected vendor 2.3 points ahead on MMLU. The contract was signed on a Friday. The following Tuesday, the vendor silently rolled a model update, and the customer-support agent the team had built started routing roughly 14% of refund requests to the wrong queue. The uptime SLA was honored. The benchmark scores were unchanged. The procurement process had functioned exactly as designed, and the system was still broken.

This is the AI procurement gap. The instruments enterprise procurement uses to manage software risk — feature checklists, uptime guarantees, security questionnaires, sample benchmarks — were built for systems whose outputs are reproducible. None of those instruments measure the thing that actually determines whether an AI vendor will keep working for you: the behavioral stability of a stochastic surface that the vendor controls and you do not.
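Behavioral stability can at least be approximated with a fixed set of inputs whose outputs are recorded before and after a vendor update. The following is a minimal sketch under that assumption; `agreement_rate`, the ticket IDs, and the queue names are hypothetical, and a real evaluation would need a much larger, representative sample.

```python
def agreement_rate(baseline: dict[str, str], candidate: dict[str, str]) -> float:
    """Fraction of shared inputs for which two model versions gave the same output."""
    shared = baseline.keys() & candidate.keys()
    if not shared:
        raise ValueError("no shared inputs to compare")
    matches = sum(baseline[key] == candidate[key] for key in shared)
    return matches / len(shared)


# Hypothetical routing decisions recorded before and after a model update.
before_update = {
    "ticket-1": "refunds",
    "ticket-2": "refunds",
    "ticket-3": "billing",
    "ticket-4": "refunds",
}
after_update = {
    "ticket-1": "refunds",
    "ticket-2": "billing",  # routing changed after the update
    "ticket-3": "billing",
    "ticket-4": "refunds",
}

rate = agreement_rate(before_update, after_update)
print(f"agreement: {rate:.0%}")
```

A number like this is the kind of measurement an uptime SLA or a static benchmark score does not capture, because both can stay constant while routing behavior shifts.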

The Warranty Problem: Who Pays When Your AI Feature Is Wrong?

· 9 min read
Tian Pan
Software Engineer

Every software warranty ever written assumed deterministic behavior. You ship a function, it returns the same output for the same input, and your warranty covers the gap between documented behavior and actual behavior. AI features shatter that assumption entirely.

When your LLM-powered feature tells a customer something wrong — and that wrong thing costs them money — traditional warranty language leaves everyone pointing fingers at everyone else.

This is not hypothetical. Cumulative generative AI lawsuits in the U.S. passed 700 between 2020 and 2025, with year-over-year filings accelerating by 137%. The legal infrastructure governing software liability was built for a deterministic world, and the mismatch is already causing real damage.