Blog

Page 118

12 articles

The Minimal Footprint Principle: Least Privilege for Autonomous AI Agents
Designing autonomous AI agents that request only the permissions the current task requires—applying Unix least-privilege to agentic systems through ephemeral credentials, intent-aware access provisioning, and isolated execution.
insider, ai-agents
Apr 16 · 10 min
The Compression Decision: Quantization, Distillation, and On-Device Inference for Latency-Critical AI Features
When model routing isn't enough and you need sub-100ms response times, you face a hard compression decision. Here's how to navigate quantization, distillation, and hybrid edge-cloud deployment without destroying quality on the tasks that matter.
ai-engineering, llm
Apr 16 · 10 min
Multi-Region LLM Serving: The Cache Locality Problem Nobody Warns You About
Deploying LLM inference across regions creates consistency and latency problems that stateless HTTP services don't have. Here's the routing architecture that handles it without tripling your ops burden.
insider, llm
Apr 16 · 10 min
The Multi-Tenant LLM Problem: Noisy Neighbors, Isolation, and Fairness at Scale
When thousands of users share the same model and vector index, one expensive session degrades everyone else. Here's why multi-tenant LLM infrastructure is harder than databases — and how to build fairness in.
insider, infrastructure
Apr 16 · 12 min
The Multi-Turn Session State Collapse Problem
Why single-turn LLM failures are easy to catch while multi-turn session state silently corrupts across 10+ turns — and the checkpoint, compression, and monitoring patterns that prevent the 'AI forgot who I am' failure mode.
insider, ai-engineering
Apr 16 · 10 min
Multi-User Shared AI Sessions: The Concurrency Problem Nobody Has Solved
When multiple users share a single AI context simultaneously, standard distributed systems assumptions break down. Here's why multi-user AI sessions are architecturally hard and what production teams have built to address it.
ai-engineering, distributed-systems
Apr 16 · 12 min
On-Call for AI Systems: Incident Response When the Bug Is the Model
Standard on-call runbooks break when the failure is non-deterministic model behavior. A practical framework for detecting, triaging, and containing AI incidents — from guardrail bypass to cost explosions — with playbooks built for engineers, not ML researchers.
llmops, incident-response
Apr 16 · 11 min
The On-Call Runbook for AI Systems That Nobody Writes
Traditional SRE runbooks break down when the failure mode is probabilistic model behavior, not a crashed service. Here's what incident response actually looks like for LLM-powered systems, and the signals worth alarming on.
insider, ai-engineering
Apr 16 · 10 min
On-Device LLM Inference: When to Move AI Off the Cloud
A practical decision framework for when on-device LLM inference beats cloud APIs — covering privacy requirements, cost math, quality tradeoffs, and the deployment problems nobody warns you about.
edge-ai, llm
Apr 16 · 11 min
Onboarding Engineers into AI-Generated Codebases Without Breaking How They Learn
AI coding tools ship features faster but silently erode the code-reading that builds system intuition in new engineers. Here's how to restore learning without slowing delivery.
insider, engineering-leadership
Apr 16 · 9 min
The Pilot Graveyard: Why Enterprise AI Rollouts Fail After the Demo
88% of enterprise AI pilots never reach production. The problem isn't the model — it's everything that happens after the demo. A practitioner's breakdown of why compelling POCs die at 12% WAU and how to fix it.
insider, ai-engineering
Apr 16 · 10 min
Post-Training Alignment for Product Engineers: What RLHF, DPO, and RLAIF Actually Mean for You
RLHF, DPO, and RLAIF aren't just research acronyms — they determine whether the user feedback you're logging today becomes a training asset or stays noise. Here's what product engineers need to know.
ai-engineering, llm
Apr 16 · 11 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.
