Blog

Page 108

12 articles

Red-Teaming Consumer LLM Features: Finding Injection Surfaces Before Your Users Do
Consumer-facing LLM features face attack surfaces that internal agents never see. A practical guide to injection vectors, jailbreak patterns at scale, model inversion risks, and the systematic hardening playbook for production AI.
llm, security
Apr 18 · 9 min
Retrieval Monoculture: Why Your RAG System Has Systematic Blind Spots
When all queries funnel through a single embedding space, structurally different query types converge on the same systematic misses. Here's how to audit your retrieval diversity and fix it without blowing your latency budget.
insider, rag
Apr 18 · 10 min
Sandboxing Agents That Can Write Code: Least Privilege Is Not Optional
API key scoping is not enough. When your AI agent can execute code, you need container isolation, filesystem namespacing, egress controls, and a capability audit process — or you're one prompt injection away from a lateral movement incident.
insider, ai-engineering
Apr 18 · 12 min
Serving AI at the Edge: A Decision Framework for Moving Inference Out of the Cloud
A practical decision framework for engineers deciding when to move LLM inference to the edge: latency thresholds, cost break-even analysis, the quantization quality tax, and split-inference architectures.
edge-ai, on-device-inference
Apr 18 · 10 min
Shadow Traffic for AI Systems: The Safest Way to Validate Model Changes Before They Ship
How to use production traffic replay to validate LLM model and prompt changes before they affect users — the infrastructure, metrics, and sampling strategies that give you confidence at a fraction of A/B test cost.
insider, ai-engineering
Apr 18 · 10 min
The Shared Prompt Service Problem: Multi-Team LLM Platforms and the Dependency Nightmare
When five teams share one AI service, a single system prompt change silently breaks four evals. Here's the dependency management framework that prevents it.
insider, llm
Apr 18 · 10 min
The Skill Atrophy Trap: How AI Assistance Silently Erodes the Engineers Who Use It Most
Research shows AI coding assistance can lower comprehension scores by 17% and make experienced developers 19% slower while they feel 20% faster. Here's why mid-career engineers are most at risk and what to do about it.
insider, ai-engineering
Apr 18 · 10 min
SLOs for Non-Deterministic AI Features: Setting Error Budgets When Wrong Is Probabilistic
Standard availability and error-rate SLOs don't capture behavioral quality degradation in LLM features. Here's how to define behavioral quality SLOs, set meaningful error budgets, and wire them into incident response when correctness is probabilistic.
ai-engineering, sre
Apr 18 · 10 min
Specification Gaming in Production LLM Systems: When Your AI Does Exactly What You Asked
Specification gaming isn't just an RL theory problem — it shows up in every production LLM system where incentive gradients exist. Here's how to find it and build systems that are harder to game.
ai-engineering, llm
Apr 18 · 10 min
SRE for AI Agents: What Actually Breaks at 3am
Traditional SRE runbooks don't cover AI agent failure modes. Here's what actually breaks in production — infinite loops, context overflow, hallucinated API calls — and the monitoring, alerting, and cost controls that help oncall engineers respond effectively.
insider, ai-engineering
Apr 18 · 10 min
SSE vs WebSockets vs gRPC Streaming for LLM Apps: The Protocol Decision That Bites You Later
How SSE, WebSockets, and gRPC streaming fail differently under backpressure, what browser constraints and edge proxies break in production, and the failure-mode profile that should drive your transport choice.
llm, streaming
Apr 18 · 11 min
Stateful Multi-Turn Conversation Infrastructure: Beyond Passing the Full History
Why 'pass the full conversation history' fails at p99 scale, and the session store designs, compression strategies, and operational patterns that actually hold up in production.
insider, ai-engineering
Apr 18 · 11 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.
