
578 posts tagged with "insider"


When the Prompt Engineer Leaves: The AI Knowledge Transfer Problem

· 9 min read
Tian Pan
Software Engineer

Six months after your best prompt engineer rotates off to a new project, a customer-facing AI feature starts misbehaving. Response quality has degraded, the output format occasionally breaks, and there's a subtle but persistent tone problem you can't quite name. You open the prompt file. It's 800 words of natural language. There's no changelog, no comments, no test cases. The person who wrote it knew exactly why every phrase was there. That knowledge is gone.

This is the prompt archaeology problem, and it's already costing teams real money. A national mortgage lender recently traced an 18% accuracy drop in document classification to a single sentence added to a prompt three weeks earlier during what someone labeled "routine workflow optimization." Two weeks of investigation, approximately $340,000 in operational losses. The author of that change had already moved on.
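
The prescription is to treat prompts like code: version them, record the rationale for every change, and pin behavior with regression cases. A minimal sketch of that shape in Python, where `PromptVersion`, `GOLDEN_CASES`, and the `classify` callable are all hypothetical stand-ins, not anything from the lender's actual stack:

```python
from dataclasses import dataclass, field

@dataclass
class PromptVersion:
    """One changelog entry: what changed and, crucially, why."""
    version: str
    author: str
    rationale: str  # the knowledge that otherwise walks out the door
    text: str

@dataclass
class Prompt:
    name: str
    history: list[PromptVersion] = field(default_factory=list)

    @property
    def current(self) -> PromptVersion:
        return self.history[-1]

# Golden cases pin down behavior the original author cared about.
GOLDEN_CASES = [
    ("W-2 for 2023 tax year", "tax_document"),
    ("signed purchase agreement", "contract"),
]

def run_regressions(classify) -> None:
    """Run golden cases against the deployed prompt; fail loudly on drift.
    `classify` is a hypothetical callable wrapping the prompt and model."""
    for doc, expected in GOLDEN_CASES:
        got = classify(doc)
        assert got == expected, f"regression: {doc!r} -> {got!r}, expected {expected!r}"
```

Had a check like this run in CI, that "routine workflow optimization" would have failed a build three weeks before it failed a customer.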

Corpus Curation at Scale: Why Your RAG Quality Ceiling Is Your Document Quality Floor

· 10 min read
Tian Pan
Software Engineer

There's a belief embedded in most RAG architectures that goes something like this: if retrieval returns the right chunks, the LLM will produce correct answers. Teams invest heavily in embedding model selection, hybrid retrieval strategies, and reranking pipelines. Then, three months after deploying to production, answer quality quietly degrades — not because the model changed, not because query patterns shifted dramatically, but because the underlying corpus rotted.

Enterprise RAG implementations fail at a roughly 40% rate, and the failure mode that practitioners underestimate most isn't hallucination or poor retrieval recall. It's document quality. In one documented implementation, introducing document quality scoring lifted search accuracy from 62% to 89% — with no changes to the embedding model or retrieval algorithm. The corpus was the variable. The corpus was always the variable.
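
What quality scoring can look like in practice: a toy Python gate that rejects weak documents before they ever reach the embedding pipeline. The heuristics here (length, repetition, staleness) and the threshold are illustrative assumptions, not the scoring method from the analysis cited above:

```python
from datetime import datetime, timezone

def quality_score(text: str, last_updated: datetime) -> float:
    """Toy quality score in [0, 1]; real signals would be corpus-specific.
    `last_updated` must be timezone-aware."""
    score = 1.0
    words = text.split()
    if len(words) < 50:  # fragments and stubs embed poorly
        score -= 0.4
    if len(set(words)) / max(len(words), 1) < 0.3:  # boilerplate repetition
        score -= 0.3
    if (datetime.now(timezone.utc) - last_updated).days > 365:  # stale content
        score -= 0.2
    return max(score, 0.0)

def should_ingest(text: str, last_updated: datetime, threshold: float = 0.6) -> bool:
    """Gate documents before they ever reach the embedding pipeline."""
    return quality_score(text, last_updated) >= threshold
```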

Data Provenance for AI Systems: Why Tracking Answer Origins Is Now an Engineering Requirement

· 10 min read
Tian Pan
Software Engineer

A production LLM answers a user's question incorrectly. A support ticket arrives. You pull the logs. They show the prompt, the completion, and the latency — but nothing about which documents the retrieval system surfaced, which chunks landed in the context window, or which passage the model leaned on most heavily when it synthesized the answer. You're left doing archaeology: re-running the query against a corpus that has since been updated, hoping the same results come back, wondering if the bug is in retrieval, in chunking, in the document itself, or in the model's reasoning.

This is the data provenance gap, and most AI teams don't notice it until they're already in it.
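
Closing the gap starts with logging more than the prompt and completion. A sketch of a per-answer provenance record, where `ProvenanceRecord` and its fields are hypothetical names for the idea rather than an established schema:

```python
import hashlib
import json
import time
from dataclasses import asdict, dataclass

@dataclass
class ProvenanceRecord:
    """Everything needed to replay an answer after the corpus has moved on."""
    query: str
    chunk_ids: list[str]      # which chunks landed in the context window
    chunk_hashes: list[str]   # content hashes, so later edits are detectable
    corpus_version: str       # index or snapshot version at answer time
    model: str
    timestamp: float

def record_provenance(query: str, chunks: list[tuple[str, str]],
                      corpus_version: str, model: str) -> str:
    """Serialize one record per answer; ship it wherever your logs go."""
    rec = ProvenanceRecord(
        query=query,
        chunk_ids=[cid for cid, _ in chunks],
        chunk_hashes=[hashlib.sha256(text.encode()).hexdigest()[:12]
                      for _, text in chunks],
        corpus_version=corpus_version,
        model=model,
        timestamp=time.time(),
    )
    return json.dumps(asdict(rec))
```

With records like this, the support-ticket investigation above becomes a log query instead of archaeology.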

GPU Scheduling for Mixed LLM Workloads: The Bin-Packing Problem Nobody Solves Well

· 10 min read
Tian Pan
Software Engineer

Most GPU clusters running LLM inference are wasting between 30% and 50% of their available compute. Not because engineers are careless, but because the scheduling problem is genuinely hard — and the tools most teams reach for first were never designed for it.

The standard approach is to stand up Kubernetes, request whole GPUs per pod, and let the scheduler figure it out. This works fine for training jobs. For inference across a heterogeneous set of models, it quietly destroys utilization. A cluster running three different 7B models with sporadic traffic will find each GPU busy less than 15% of the time, while remaining fully "allocated" and refusing to schedule new work.

The root cause is a mismatch between how Kubernetes thinks about GPUs and what LLM inference actually requires.
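
One concrete way to frame it: inference servers have memory footprints, GPUs have capacity, and assigning models to cards is classic bin packing. A first-fit-decreasing sketch in Python, with the 16 GB-per-7B-model figure as a rough fp16-plus-KV-cache assumption rather than a measured number:

```python
def pack_models(models: dict[str, float], gpu_mem_gb: float) -> list[dict[str, float]]:
    """First-fit-decreasing bin packing of model memory footprints onto GPUs.
    Greedy and approximate, but far better than one model per GPU."""
    gpus: list[dict[str, float]] = []  # each GPU is {model_name: mem_gb}
    for name, mem in sorted(models.items(), key=lambda kv: -kv[1]):
        for gpu in gpus:
            if sum(gpu.values()) + mem <= gpu_mem_gb:
                gpu[name] = mem
                break
        else:
            gpus.append({name: mem})  # no existing GPU fits; open a new one
    return gpus

# Three sporadically used 7B models fit on one 80 GB card instead of three.
print(pack_models({"chat-7b": 16.0, "code-7b": 16.0, "rag-7b": 16.0}, 80.0))
```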

The Institutional Knowledge Drain: How AI Agents Absorb Decisions Without Transferring Understanding

· 10 min read
Tian Pan
Software Engineer

Three months after a fintech team rolled out an AI coding agent to handle their routine backend tasks, a senior engineer left for another company. When the team tried to reconstruct why certain authentication decisions had been made six weeks earlier, nobody could. The PR descriptions said "implemented as discussed." The commit messages said "per requirements." The AI agent had made the choices, the code worked, and the reasoning had evaporated.

This is not a documentation failure. It is what happens when the channel through which understanding normally flows — the back-and-forth between engineers, the friction of explanation, the pressure of justifying a decision to another human — is replaced by a system that optimizes for output rather than comprehension.
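
One partial countermeasure is making rationale a merge requirement instead of a courtesy. A hypothetical CI check, sketched in Python, that rejects PR bodies like "implemented as discussed" unless a substantive "Decision rationale" section is present; the section name and the 30-word bar are invented for illustration:

```python
import re
import sys

REQUIRED = re.compile(r"^##\s*Decision rationale", re.IGNORECASE | re.MULTILINE)

def check_pr_body(body: str) -> bool:
    """Reject PR descriptions that explain nothing; 'implemented as discussed'
    is exactly the text this is meant to catch. Crude by design."""
    match = REQUIRED.search(body)
    if not match:
        return False
    rationale = body[match.end():]
    return len(rationale.split()) >= 30  # demand a real explanation, not a stub

if __name__ == "__main__":
    sys.exit(0 if check_pr_body(sys.stdin.read()) else 1)
```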

Why Your Database Melts When AI Features Ship: LLM-Aware Connection Pool Design

· 9 min read
Tian Pan
Software Engineer

Your connection pool was fine until someone shipped the AI feature. Login works, dashboards load, CRUD operations hum along at single-digit millisecond latencies. Then the team deploys a RAG-powered search, an agent-driven workflow, or an LLM-backed summarization endpoint — and within hours, your core product starts timing out. The database didn't get slower. Your pool just got eaten alive by a workload it was never designed to handle.

This is the LLM connection pool problem, and it's hitting teams across the industry as AI features move from prototype to production. The fix isn't "just add more connections." In fact, that usually makes things worse.
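
A common shape of the fix is workload isolation: give the AI path its own strictly capped pool so it can saturate itself without starving the core product. A sketch using SQLAlchemy's pool options, with the DSN and the specific numbers as placeholders to tune against your own traffic:

```python
from sqlalchemy import create_engine

# Transactional pool: small and fast-failing; this is the core product's lifeline.
oltp_engine = create_engine(
    "postgresql://app@db/main",  # placeholder DSN
    pool_size=20, max_overflow=10, pool_timeout=2,
)

# AI workload pool: strictly capped, so a burst of RAG queries or agent
# tool calls can never starve login, checkout, or dashboards.
ai_engine = create_engine(
    "postgresql://app@db/main",
    pool_size=5, max_overflow=0, pool_timeout=10,
)
```

The counterintuitive part is `max_overflow=0` on the AI side: the whole point is that this workload should queue or fail, never expand into capacity the OLTP path depends on.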

Machine-Readable Project Context: Why Your CLAUDE.md Matters More Than Your Model

· 8 min read
Tian Pan
Software Engineer

Most teams that adopt AI coding agents spend the first week arguing about which model to use. They benchmark Opus vs. Sonnet vs. GPT-4o on contrived examples, obsess over the leaderboard, and eventually pick something. Then they spend the next three months wondering why the agent keeps rebuilding the wrong abstractions, ignoring their test strategy, and repeatedly asking which package manager to use.

The model wasn't the problem. The context file was.

Every AI coding tool — Claude Code, Cursor, GitHub Copilot, Windsurf — reads project-specific instructions at the start of each session. These live under different names: a CLAUDE.md file, a .cursor/rules/ directory, .github/copilot-instructions.md, AGENTS.md. But they share the same purpose: teaching the agent what it cannot infer from reading the code alone. The quality of this file now predicts output quality more reliably than the model behind it. Yet most teams write them once, badly, and never touch them again.
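
For illustration, a hypothetical CLAUDE.md fragment showing the kind of knowledge these files exist to carry: commands, conventions, and the things the code won't tell you. Every name in it is invented:

```markdown
# CLAUDE.md (hypothetical example)

## Commands
- Install: `pnpm install` (never npm; the lockfile is pnpm's)
- Test: `pnpm test --filter <package>`

## Conventions
- New services go in `services/`, one package per bounded context.
- Integration tests hit the docker-compose Postgres, never mocks.

## Things the code won't tell you
- `legacy/billing` is frozen pending migration; do not refactor it.
```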

MCP Is the New Microservices: The AI Tool Ecosystem Is Repeating Distributed Systems Mistakes

· 8 min read
Tian Pan
Software Engineer

If you lived through the microservices explosion of 2015–2018, the current state of MCP should feel uncomfortably familiar. A genuinely useful protocol appears. It's easy to spin up. Every team spins one up. Nobody tracks what's running, who owns it, or how it's secured. Within eighteen months, you're staring at a dependency graph that engineers privately call "the Death Star."

The Model Context Protocol is following the same trajectory, at roughly three times the speed. Unofficial registries already index over 16,000 MCP servers. GitHub hosts north of 20,000 public repositories implementing them. And Gartner predicts that over 40% of agentic AI projects will be canceled by the end of 2027 — not because the technology doesn't work, but because organizations are automating broken processes. MCP sprawl is a symptom of exactly that problem.

Measuring Real AI Coding Productivity: The Metrics That Survive the 90-Day Lag

· 9 min read
Tian Pan
Software Engineer

Most teams adopting AI coding tools hit the same wall. Month one looks like a success story: PR throughput is up, sprint velocity is climbing, and the engineering manager is putting together a slide deck to share with leadership. By month three, something has quietly gone wrong. Incidents creep up. Senior engineers are spending more time in review. A simple bug fix now requires understanding code nobody on the team actually wrote. The productivity gains have evaporated — but the measurement system never caught it.

The problem is that the metrics most teams reach for first — lines generated, PRs merged, story points burned — are the wrong unit of measurement for AI-assisted development. They measure the cost of producing code, not the cost of owning it. And AI has made production nearly free while leaving ownership costs untouched.
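
Ownership-side signals are already sitting in version control. A rough Python sketch that mines git history for reverts and fix-up commits, crude proxies for the cost of owning code; the 90-day window and the commit-message heuristics are assumptions to adapt to your own conventions:

```python
import subprocess
from collections import Counter

def ownership_signals(since: str = "90 days ago") -> Counter:
    """Count reverts and fix-up commits in recent history: crude proxies
    for the cost of owning code, as opposed to the cost of producing it."""
    log = subprocess.run(
        ["git", "log", f"--since={since}", "--pretty=%s"],
        capture_output=True, text=True, check=True,
    ).stdout.lower()
    signals: Counter = Counter()
    for subject in log.splitlines():
        if subject.startswith("revert"):
            signals["reverts"] += 1
        elif subject.startswith(("fix", "hotfix")):
            signals["fixes"] += 1
    return signals

print(ownership_signals())  # run inside a repository
```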

When Your Database Migration Breaks Your AI Agent's World Model

· 9 min read
Tian Pan
Software Engineer

Your team ships a routine database migration on Tuesday — renaming last_login_date to last_activity_ts and expanding its semantics to include API calls. No service breaks. Tests pass. Dashboards update. But your AI agent, the one answering customer questions about user engagement, silently starts generating wrong answers. No error, no alert, no stack trace. It just confidently reasons over a world that no longer exists.

This is the schema migration problem that almost nobody in AI engineering has mapped. Your agent builds an implicit model of your data from tool descriptions, few-shot examples, and retrieval context. When the underlying schema changes, that model becomes a lie — and the agent has no mechanism to detect the contradiction.
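
A blunt but effective mitigation is to pin the agent's world model to a schema fingerprint and fail closed on drift. A sketch of the idea in Python; `check_world_model` and the fail-closed policy are illustrative choices, not a standard mechanism:

```python
import hashlib
import json

def schema_fingerprint(schema: dict) -> str:
    """Stable hash of the schema the tool descriptions were written against."""
    canonical = json.dumps(schema, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

def check_world_model(live_schema: dict, pinned_fingerprint: str) -> None:
    """Fail closed: refuse to serve answers once the schema has drifted out
    from under the agent's tool descriptions and few-shot examples."""
    if schema_fingerprint(live_schema) != pinned_fingerprint:
        raise RuntimeError(
            "Schema drift detected: regenerate tool descriptions before serving."
        )
```

Under this scheme, Tuesday's rename of last_login_date would have produced a loud deploy-time error instead of weeks of confidently wrong answers.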

The Ambient AI Coherence Problem: When Every Feature Is AI-Powered, Nothing Feels Like One Product

· 9 min read
Tian Pan
Software Engineer

Most AI products get the individual features right and the product wrong. Search returns plausible results. The summary is coherent. The chat assistant gives reasonable advice. But when a user searches for "best plan for small teams," gets a recommendation in the sidebar, asks the assistant a follow-up question, and then reads an auto-generated summary of their options — and all four contradict each other — none of the features feel trustworthy anymore. This is the ambient AI coherence problem: not hallucination in isolation, but contradiction at the product level.

The failure mode is subtle enough that teams often miss it entirely. Individual feature evals look fine. The search team measures recall and precision. The summarization team measures faithfulness. The chat team measures task completion. Nobody measures whether the AI-powered features of the product tell the same story about the same facts.
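
Measuring it doesn't require new theory, just a cross-feature eval: ask every surface the same factual question and diff the answers. A minimal Python sketch, where the surface callables and the exact-match comparison are simplifying assumptions (a real check would normalize answers or use an LLM judge):

```python
from typing import Callable

def consistency_check(question: str,
                      surfaces: dict[str, Callable[[str], str]]) -> dict[str, str]:
    """Ask every AI surface the same factual question and diff the answers.
    Exact-match comparison is deliberately naive."""
    answers = {name: ask(question) for name, ask in surfaces.items()}
    if len(set(answers.values())) > 1:
        print(f"INCOHERENT: conflicting answers for {question!r}: {answers}")
    return answers

# consistency_check("best plan for small teams",
#                   {"search": search_fn, "chat": chat_fn, "summary": summary_fn})
```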

The Enterprise API Impedance Mismatch: Why Your AI Agent Wastes 60% of Its Tokens Before Doing Anything Useful

· 8 min read
Tian Pan
Software Engineer

Your AI agent is brilliant at reasoning, planning, and generating natural language. Then you point it at your enterprise SAP endpoint and it spends 4,000 tokens trying to understand a SOAP envelope. Welcome to the impedance mismatch — the quiet tax that turns every enterprise AI integration into a token bonfire.

The mismatch isn't just about XML versus JSON. It's a fundamental collision between how LLMs think — natural language, flat key-value structures, concise context — and how enterprise systems communicate: deeply nested schemas, implementation-specific naming, pagination cursors, and decades of accumulated protocol conventions. Unlike a human developer who reads WSDL documentation once and moves on, your agent re-parses that complexity on every single invocation.
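
The standard mitigation is an adapter layer that flattens the enterprise payload into compact key-value pairs before it ever reaches the context window. A small Python sketch of that flattening step, with the sample payload invented for illustration:

```python
def flatten(obj, prefix: str = "", out: dict | None = None) -> dict:
    """Collapse a deeply nested payload into flat key-value pairs, so the
    model reads 'Order.Header.Id: A-17' instead of a SOAP-shaped tree."""
    out = {} if out is None else out
    if isinstance(obj, dict):
        for key, value in obj.items():
            flatten(value, f"{prefix}{key}.", out)
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            flatten(value, f"{prefix}{i}.", out)
    else:
        out[prefix.rstrip(".")] = obj
    return out

nested = {"GetOrderResponse": {"Order": {"Header": {"Id": "A-17"},
                                         "Lines": [{"Sku": "X", "Qty": 2}]}}}
print(flatten(nested))
# {'GetOrderResponse.Order.Header.Id': 'A-17',
#  'GetOrderResponse.Order.Lines.0.Sku': 'X',
#  'GetOrderResponse.Order.Lines.0.Qty': 2}
```

The win compounds: the adapter parses the envelope once, per integration, instead of the agent re-parsing it in tokens on every single invocation.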