Blog

Page 144

12 articles

The Cold Start Problem in AI Personalization
When a new user sends their first message, your AI system has one data point and must make dozens of implicit decisions. Here's the architectural playbook for navigating cold start without building a filter bubble yourself.
insider · personalization
Apr 9 · 11 min
The Composition Testing Gap: Why Your Agents Pass Every Test but Fail Together
67% of multi-agent system failures stem from inter-agent interactions, not individual defects. A practical guide to property-based invariants, trajectory replay, seam injection, and contract testing for composed agent pipelines.
insider · multi-agent
Apr 9 · 9 min
Computer Use Agents in Production: When Pixels Replace API Calls
A production guide to computer use agents — covering the see-think-act loop, coordinate scaling pitfalls, five failure modes that kill deployments, sandboxing requirements, and a decision framework for when pixels beat API calls.
insider · computer-use
Apr 9 · 9 min
Cross-Tenant Data Leakage in Shared LLM Infrastructure: The Isolation Failures Nobody Tests For
How prompt caches, vector indexes, fine-tuned model weights, and agent memory stores can silently bleed data between tenants in shared LLM products — which isolation primitives actually enforce boundaries, and the audit methodology for finding contamination before a customer does.
insider · ai-security
Apr 9 · 11 min
DAG-First Agent Orchestration: Why Linear Chains Break at Scale
Linear agent pipelines serialize work that should run in parallel, propagate failures that could be isolated, and make partial recovery structurally impossible. Here is what switching to a DAG-first execution model actually changes.
agent-orchestration · multi-agent
Apr 9 · 10 min
The Debug Tax: Why Debugging AI Systems Takes 10x Longer Than Building Them
Production AI debugging demands 3–8x more engineering time than initial development — driven by non-reproducible failures, semantic errors invisible to traditional monitoring, and prompt regressions that break silently. A practical methodology covering retrieval triage, evaluation hierarchies, statistical pass/fail criteria, and trace-based replay.
insider · llm-debugging
Apr 9 · 10 min
Domain-Specialized Agent Architectures: Why Generic Agents Underperform in High-Stakes Verticals
Generic AI agents consistently underperform in medical, legal, and scientific domains. Here are the three architectural patterns — tiered specialist sub-agents, domain-specific tool servers, and curated knowledge injection — that close the gap, plus a decision framework for when specialization overhead is worth it.
ai-agents · architecture
Apr 9 · 10 min
The Escalation Protocol: Building Agent-to-Human Handoffs That Don't Lose State
Most agent-to-human escalation breaks because teams treat it as an error state, not a designed workflow. A breakdown of the signal stack, state serialization format, oversight interface patterns, and the return path that preserves task continuity.
insider · ai-agents
Apr 9 · 11 min
The Explainability Trap: When AI Explanations Become a Liability
Post-hoc AI explanations look authoritative but are structurally disconnected from model computation — how this creates regulatory exposure, misdirects users, and what honest explanation architecture actually looks like.
insider · ai-engineering
Apr 9 · 11 min
Fine-tuning vs. RAG for Knowledge Injection: The Decision Engineers Consistently Get Wrong
Fine-tuning teaches model behavior; RAG injects retrievable facts. Most teams confuse the two and spend months fine-tuning models that needed retrieval all along. Here's the decision framework that separates them.
fine-tuning · rag
Apr 9 · 10 min
Building GDPR-Ready AI Agents: The Compliance Architecture Decisions That Actually Matter
Four structural conflicts every regulated-industry engineer must resolve before shipping AI agents: right-to-erasure gaps in vector stores, audit trail requirements under the EU AI Act, data residency misconceptions, and the consent model that won't block future expansion.
insider · gdpr
Apr 9 · 10 min
GPU Memory Math for Multi-Model Serving: Why Most Teams Over-Provision by 3x
KV cache, not model weights, dominates GPU memory under concurrent load. The exact formulas for capacity planning, quantization tradeoffs (AWQ vs GPTQ vs GGUF), and bin-packing strategies that let you serve 4 models on hardware budgeted for 1.
insider · gpu-inference
Apr 9 · 9 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.
