When autonomous agents take consequential actions, having logs is not the same as having accountability. A practical guide to designing decision provenance for production agentic systems — event schemas, ownership handoffs, hallucination attribution, and the compliance requirements that make this non-optional.
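To make "event schemas" concrete: a minimal sketch of what one decision-provenance record might look like. Every field name, and the linking of ownership handoffs through `parent_event_id`, is an illustrative assumption, not a standard.

```python
# Illustrative decision-provenance event: field names are assumptions
# about what such a schema might contain, not an established standard.
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import hashlib
import json
import uuid

@dataclass
class DecisionEvent:
    agent_id: str                 # which agent acted
    action: str                   # what it did, e.g. "refund.issue"
    model: str                    # model + version that produced the decision
    input_digest: str             # hash of the inputs, so they can be audited later
    owner: str                    # human or service accountable at this step
    parent_event_id: str | None = None   # links handoffs into a causal chain
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    ts: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

def digest(payload: dict) -> str:
    """Stable hash of the decision inputs for later attribution."""
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

root = DecisionEvent("triage-agent", "ticket.classify", "gpt-4o-2024-08-06",
                     digest({"ticket": 812}), owner="support-platform")
handoff = DecisionEvent("refund-agent", "refund.issue", "gpt-4o-2024-08-06",
                        digest({"ticket": 812, "amount": 40}),
                        owner="finance-oncall", parent_event_id=root.event_id)
print(json.dumps(asdict(handoff), indent=2))
```

The point of `parent_event_id` is that accountability questions ("who authorized the refund, and based on what?") become graph traversals rather than log archaeology.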
Shutting down an AI feature is fundamentally different from deprecating a deterministic API. Here's the engineering playbook for mapping behavioral dependencies, staging sunsets, and avoiding the support ticket avalanche.
Most agent failure-handling designs assume a clean abort or a clean success. Real agents hit uncertainty, authorization limits, and resource constraints mid-task. Here's how to design for what actually happens.
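A taste of what a richer failure contract looks like: a sketch of a task-result type where the states and fields are illustrative assumptions, not a prescribed API.

```python
# Illustrative task-outcome type: the states and fields are assumptions
# about what a richer-than-success/abort contract could look like.
from dataclasses import dataclass
from enum import Enum

class Outcome(Enum):
    SUCCEEDED = "succeeded"
    ABORTED = "aborted"                    # clean rollback, nothing persisted
    PARTIAL = "partial"                    # some steps committed, rest skipped
    NEEDS_AUTHORIZATION = "needs_auth"     # paused at a permission boundary
    RESOURCE_EXHAUSTED = "resource_cap"    # hit a token/time/cost budget
    UNCERTAIN = "uncertain"                # confidence too low to act safely

@dataclass
class TaskResult:
    outcome: Outcome
    completed_steps: list[str]    # what actually happened, for auditing
    pending_steps: list[str]      # what a human or a retry must still do
    resume_token: str | None      # opaque state needed to continue later

result = TaskResult(Outcome.NEEDS_AUTHORIZATION,
                    completed_steps=["drafted_email"],
                    pending_steps=["send_email"],
                    resume_token="task-42@step-2")
assert result.outcome is not Outcome.SUCCEEDED
```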
Staging environments systematically misrepresent how LLM applications behave in production. Here are seven specific failure modes — from prompt cache warmth to silent traffic distribution drift — and the pre-prod checks that surface them.
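One of those checks, sketched: comparing the intent mix your staging suite exercises against live traffic with a population stability index. The bucket labels here are assumptions, as is the common 0.2 alert threshold.

```python
# Sketch of a traffic-distribution drift check: compute PSI between the
# category mix of the staging suite and recent production queries.
import math
from collections import Counter

def psi(expected: dict[str, float], actual: dict[str, float]) -> float:
    """PSI over shared categories; >0.2 is commonly read as real drift."""
    score = 0.0
    for cat in expected:
        e, a = max(expected[cat], 1e-6), max(actual.get(cat, 0.0), 1e-6)
        score += (a - e) * math.log(a / e)
    return score

def mix(labels: list[str]) -> dict[str, float]:
    counts = Counter(labels)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

staging = mix(["refund"] * 40 + ["billing"] * 40 + ["legal"] * 20)
prod = mix(["refund"] * 70 + ["billing"] * 25 + ["legal"] * 5)
print(f"PSI = {psi(staging, prod):.3f}")  # ~0.45 here: staging mix no longer matches prod
```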
When agents call agents across microservice boundaries, W3C TraceContext breaks down and your traces fragment into disconnected spans. Here's the technical shape of the failure and how to fix it.
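The core of the fix, sketched with only the standard library: carry the W3C traceparent inside the message body when one agent hands work to another asynchronously, where HTTP header propagation can't reach. The message shape is an assumption; the traceparent format comes from the spec.

```python
# Minimal sketch of carrying a W3C traceparent across an async agent
# handoff (a queue message). Same trace-id, fresh span-id per hop keeps
# the whole chain on one trace instead of fragmenting it.
import os
import re

def new_traceparent() -> str:
    trace_id = os.urandom(16).hex()
    span_id = os.urandom(8).hex()
    return f"00-{trace_id}-{span_id}-01"

def child_traceparent(parent: str) -> str:
    """Same trace-id, fresh span-id: the hop stays on one trace."""
    m = re.fullmatch(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})", parent)
    if not m:
        return new_traceparent()      # broken context: start a new trace
    return f"00-{m.group(1)}-{os.urandom(8).hex()}-{m.group(3)}"

# Producer agent: embed the context in the message body, not an HTTP header.
message = {"task": "summarize", "traceparent": new_traceparent()}

# Consumer agent: continue the same trace on the other side of the queue.
continued = child_traceparent(message["traceparent"])
assert continued.split("-")[1] == message["traceparent"].split("-")[1]
```

In a real system you would hand these values to your tracing SDK's context-propagation API; the sketch just shows where the context has to travel.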
How mixed embedding models, chunking strategy changes, and preprocessing inconsistencies silently degrade RAG retrieval quality — and what to do about it.
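One defense, sketched: stamp every vector with the embedding model and chunker version that produced it, and refuse to mix regimes at query time. The metadata keys and the in-memory "index" are illustrative.

```python
# Sketch of embedding-regime hygiene: vectors carry provenance metadata
# and queries only search vectors produced under one consistent regime.
from dataclasses import dataclass

@dataclass
class VectorRecord:
    doc_id: str
    embedding_model: str    # e.g. "text-embedding-3-small"
    chunker_version: str    # hash or semver of the chunking config
    vector: list[float]

INDEX = [
    VectorRecord("a", "text-embedding-3-small", "chunker-v2", [0.1, 0.2]),
    VectorRecord("b", "text-embedding-ada-002", "chunker-v1", [0.3, 0.1]),
]

def query_compatible(records, model: str, chunker: str):
    """Only search vectors produced under the same embedding regime."""
    compatible = [r for r in records
                  if r.embedding_model == model and r.chunker_version == chunker]
    skipped = len(records) - len(compatible)
    if skipped:
        print(f"warning: {skipped} mixed-regime vectors excluded from search")
    return compatible

hits = query_compatible(INDEX, "text-embedding-3-small", "chunker-v2")
```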
Over 60% of RAG failures trace back to stale vectors, not bad prompts. How to apply database engineering discipline — CDC, drift detection, zero-downtime model migrations — to keep your vector index in sync with source truth.
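The drift-detection half, sketched: hash each source row, compare it against the hash stored with its vector, and queue mismatches for re-embedding. The table and field names here are assumptions.

```python
# Sketch of vector staleness detection: content hashes stored alongside
# vectors are compared to the current source rows; mismatches get re-embedded.
import hashlib

source_rows = {"doc-1": "new pricing: $49/mo", "doc-2": "terms unchanged"}
vector_meta = {
    "doc-1": {"content_sha": hashlib.sha256(b"old pricing: $29/mo").hexdigest()},
    "doc-2": {"content_sha": hashlib.sha256(b"terms unchanged").hexdigest()},
}

def stale_doc_ids(rows: dict[str, str], meta: dict[str, dict]) -> list[str]:
    stale = []
    for doc_id, text in rows.items():
        current = hashlib.sha256(text.encode()).hexdigest()
        if meta.get(doc_id, {}).get("content_sha") != current:
            stale.append(doc_id)     # source moved on; the vector is lying
    return stale

for doc_id in stale_doc_ids(source_rows, vector_meta):
    print(f"re-embed {doc_id}")      # in practice: enqueue for the embedding worker
```

With CDC in place the same comparison runs per change event instead of per scan, which is what makes zero-downtime model migrations tractable.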
The EU AI Act's August 2026 deadline for high-risk AI systems translates directly into concrete engineering tasks: audit trail architecture, data governance pipelines, and human oversight interfaces. Here's what engineers need to build — and in what order.
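A flavor of the audit-trail work: a tamper-evident, hash-chained log in the spirit of the Act's record-keeping duties for high-risk systems. The field choices are illustrative assumptions, and none of this is legal guidance.

```python
# Sketch of a tamper-evident audit trail: each record chains the previous
# record's hash, so any after-the-fact edit breaks every later hash.
import hashlib
import json
from datetime import datetime, timezone

class AuditLog:
    def __init__(self):
        self.records = []
        self._prev = "0" * 64          # genesis hash

    def append(self, event: dict) -> None:
        body = {"ts": datetime.now(timezone.utc).isoformat(),
                "event": event, "prev_hash": self._prev}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.records.append({**body, "hash": digest})
        self._prev = digest

    def verify(self) -> bool:
        """Recompute the chain; an edited record invalidates the rest."""
        prev = "0" * 64
        for rec in self.records:
            body = {k: rec[k] for k in ("ts", "event", "prev_hash")}
            expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if rec["prev_hash"] != prev or rec["hash"] != expected:
                return False
            prev = rec["hash"]
        return True

log = AuditLog()
log.append({"system": "loan-scoring", "decision": "deny", "overridden_by": None})
assert log.verify()
```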
Specific engineering decisions — adding a mood signal to your HR dashboard, routing loan decisions through a model — can silently cross the EU AI Act's high-risk threshold. Here's what triggers classification, and what you must build before August 2026 enforcement.
Static eval sets are frozen snapshots of user behavior. As real traffic evolves, your benchmark drifts from production reality — here's how to measure decay and keep evals honest.
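One decay metric, sketched: the share of recent production queries with no close neighbor in the eval set, using token-level Jaccard as a cheap stand-in for embedding similarity. The 0.3 threshold is an assumption to tune per dataset.

```python
# Sketch of eval-set coverage decay: how much live traffic has no close
# neighbor in the frozen eval set? Rising values mean the benchmark is aging.
def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def uncovered_share(eval_set: list[str], traffic: list[str],
                    thresh: float = 0.3) -> float:
    uncovered = sum(1 for q in traffic
                    if max(jaccard(q, e) for e in eval_set) < thresh)
    return uncovered / len(traffic)

eval_set = ["reset my password", "cancel my subscription"]
traffic = ["reset my password please", "how do i export my data",
           "delete my account and data"]
print(f"{uncovered_share(eval_set, traffic):.0%} of live queries unrepresented")
```

Tracked weekly, this one number turns "our evals feel stale" into a trend you can alert on.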
Most teams scrutinize their LLM provider but trust everything else on vibes. A rigorous framework for evaluating guardrail vendors, embedding providers, observability tools, and fine-tuning platforms — with due diligence criteria that catch business-model risk before it bites you.
Enterprise teams pick LLM vendors based on benchmarks and demos. Then they hit production and discover what the SLA actually says — which is usually much less than they assumed.