780 posts tagged with "ai-engineering"

Compound AI Systems: When Your Pipeline Is Smarter Than Any Single Model

April 19, 2026 · 9 min read

Software Engineer

There is a persistent assumption in AI engineering that the path to better outputs is a better model. Bigger context window, fresher training data, higher benchmark scores. In practice, the teams shipping the most capable AI products are usually doing something different: they are assembling pipelines where multiple specialized components — a retriever, a reranker, a classifier, a code interpreter, and one or more language models — cooperate to handle a task that no single model could do reliably on its own.

This architectural pattern has a name — compound AI systems — and it is now the dominant paradigm for production AI. Understanding how to build these systems correctly, and where they fail when you don't, is one of the most important skills in applied AI engineering today.

The Context Window Cliff: Application-Level Strategies for Long Conversations

April 19, 2026 · 10 min read

Tian Pan

Software Engineer

A 90-minute support session. A research assistant that's been browsing documents for an hour. A coding agent that's touched a dozen files. All of these eventually hit the same wall — and when they do, they don't crash loudly. They get dumb.

The model starts forgetting what was decided twenty minutes ago. It contradicts itself. Retrieval results that should be obvious go missing. Users notice something is off but can't articulate why the assistant got worse. This is the context window cliff: not a hard error, but a gradual quality collapse that your monitoring almost certainly doesn't measure.

Expanding the context window doesn't fix this. Models with million-token windows still degrade on content in the middle, and even when they don't, you're paying for 100x more tokens while the model attends to a fraction of them. The solution is application-level context management — deliberate strategies for what stays in the window, what gets summarized, and what lives outside it entirely.

Continuous Deployment for AI Models: Your Rollback Signal Is Wrong

April 19, 2026 · 10 min read

Tian Pan

Software Engineer

Your deployment pipeline is green. Latency is nominal. Error rate: 0.02%. The new model version shipped successfully — or so your dashboard says.

Meanwhile, your customer-facing AI is subtly summarizing documents with less precision, hedging on questions it used to answer directly, and occasionally flattening the structured outputs your downstream pipeline depends on. No alerts fire. No on-call page triggers. The first signal you get is a support ticket, two weeks later.

This is the silent regression problem in AI deployments. Traditional rollback signals — HTTP errors, p99 latency, exception rates — are built for deterministic software. They cannot see behavioral drift. And as teams upgrade language models more frequently, the gap between "infrastructure is healthy" and "AI is working correctly" becomes a place where regressions hide.

The Conversation Designer's Hidden Role in AI Product Quality

April 19, 2026 · 10 min read

Tian Pan

Software Engineer

Most engineering teams treat system prompts as configuration files — technical strings to be iterated on quickly, stored in environment variables, and deployed with the same ceremony as changing a timeout value. The system prompt gets an inline comment. The error messages get none. The capability disclosure is whatever the PM typed into the Notion doc on launch day.

This is the root cause of an entire class of AI product failures that don't show up in your eval suite. The model answers the question. The latency is fine. The JSON validates. But users stop trusting the product after three sessions, and the weekly active usage curve never recovers.

The missing discipline is conversation design. And it shapes output quality in ways that most engineering instrumentation is architecturally blind to.

The AI Feature Sunset Playbook: Decommissioning Agents Without Breaking Your Users

April 19, 2026 · 10 min read

Tian Pan

Software Engineer

Most teams discover the same thing at the worst possible time: retiring an AI feature is nothing like deprecating an API. You add a sunset date to the docs, send the usual three-email sequence, flip the flag — and then watch your support queue spike 80% while users loudly explain that the replacement "doesn't work the same way." What they mean is: the old agent's quirks, its specific failure modes, its particular brand of wrong answer, had all become load-bearing. They'd built workflows around behavior they couldn't name until it was gone.

This is the core problem with AI feature deprecation. Deterministic APIs have explicit contracts. If you remove an endpoint, every caller that relied on it gets a 404. The breakage is traceable, finite, and predictable. Probabilistic AI outputs are different — users don't integrate the contract, they integrate the behavioral distribution. Removing a model doesn't just remove a capability; it removes a specific pattern of behavior that users may have spent months adapting to without realizing it.

Designing for Partial Completion: When Your Agent Gets 70% Done and Stops

April 19, 2026 · 10 min read

Tian Pan

Software Engineer

Every production agent system eventually ships a failure nobody anticipated: the agent that books the flight, fails to find a hotel, and leaves a user with half a confirmed itinerary and no clear way to finish. Not a crash. Not a refusal. Just a stopped agent with real-world side effects and no plan for what comes next.

The standard mental model for agent failure is binary — succeed or abort. Retry logic, exponential backoff, fallback prompts — all of these assume a clean boundary between "task running" and "task done." But real agents fail somewhere in the middle, and when they do, the absence of partial-completion design becomes the bug. You didn't need a smarter model. You needed a task state machine.

Dev/Prod Parity for AI Apps: The Seven Ways Your Staging Environment Is Lying to You

April 19, 2026 · 11 min read

Tian Pan

Software Engineer

The 12-Factor App doctrine made dev/prod parity famous: keep development, staging, and production as similar as possible. For traditional web services, this is mostly achievable. For LLM applications, it is structurally impossible — and the gap is far larger than most teams realize.

The problem is not that developers are careless. It is that LLM applications depend on a class of infrastructure (cached computation, living model weights, evolving vector indexes, and stochastic generation) where the differences between staging and production are not merely inconvenient but categorically different in kind. A staging environment that looks correct will lie to you in at least seven specific ways.

The EU AI Act Is Now Your Engineering Backlog

April 19, 2026 · 12 min read

Tian Pan

Software Engineer

Most engineering teams discovered the GDPR through a legal email that arrived three weeks before the deadline. The EU AI Act is repeating that pattern, and the August 2, 2026 enforcement date for high-risk AI systems is close enough that "we'll deal with compliance later" is no longer an option. The difference between GDPR and the AI Act is that GDPR compliance was mostly about data handling policies. AI Act compliance requires building new system components — components that don't exist yet in most production AI systems.

What the regulation calls "human oversight obligations" and "audit trail requirements" are, translated into engineering language, a dashboard, an event log, and a data lineage system. This article treats the EU AI Act as an engineering specification rather than a legal document and walks through what you actually need to build.

The EU AI Act Features That Silently Trigger High-Risk Compliance — and What You Must Ship Before August 2026

April 19, 2026 · 9 min read

Tian Pan

Software Engineer

An appliedAI study of 106 enterprise AI systems found that 40% had unclear risk classifications. That number is not a reflection of regulatory complexity — it is a reflection of how many engineering teams shipped AI features without asking whether the feature changes their compliance tier. The EU AI Act has a hard enforcement date of August 2, 2026 for high-risk systems. At that point, being in the 40% is not a management problem. It is an architecture problem you will be fixing at four times the original cost, under deadline pressure, with regulators watching.

This article is not a legal overview. It is an engineering read on the specific product decisions that silently trigger high-risk classification, the concrete deliverables those classifications require, and why the retrofit path is so much more expensive than the build-it-in path.

Evaluating AI Service Vendors Beyond Your LLM Provider

April 19, 2026 · 10 min read

Tian Pan

Software Engineer

Most engineering teams spend weeks evaluating LLM providers—benchmarking latency, testing accuracy, negotiating pricing. Then they pick an observability tool, a guardrail vendor, and an embedding provider in an afternoon, on the basis of a well-designed landing page and a favorable blog post. The asymmetry is backwards. Your LLM provider is probably a well-capitalized company with stable APIs. The niche vendors surrounding it often are not.

The AI service ecosystem has exploded into dozens of categories: guardrail vendors, embedding providers, observability and tracing tools, fine-tuning platforms, evaluation frameworks. Each category has ten startups competing for the same enterprise budgets. Some will be acquired. More will shut down. A few will pivot and deprecate your critical workflow with a 90-day notice email. Building on this ecosystem without rigorous evaluation is a form of technical debt that doesn't show up in your backlog until it's already a production incident.

Foundation Model Vendor Strategy: What Enterprise SLAs Actually Guarantee

April 19, 2026 · 12 min read

Tian Pan

Software Engineer

Enterprise teams pick LLM vendors based on benchmarks and demos. Then they hit production and discover what the SLA actually says — which is usually much less than they assumed. The 99.9% uptime guarantee you negotiated doesn't cover latency. The data processing agreement your legal team signed doesn't prohibit training on your inputs unless you explicitly added that clause. And the vendor concentration risk that nobody quantified becomes painfully obvious when your core product is down for four hours because a telemetry deployment cascaded through a Kubernetes control plane.

This is not a procurement problem. It's an engineering problem that procurement can't solve alone. The people who build AI systems need to understand what these contracts actually say — and what they don't.

The Idempotency Problem in Agentic Tool Calling

April 19, 2026 · 11 min read

Tian Pan

Software Engineer

The scenario plays out the same way every time. Your agent is booking a hotel room, and a network timeout occurs right after the payment API call returns 200 but before the confirmation is stored. The agent framework retries. The payment runs again. The customer is charged twice, support escalates, and someone senior says the AI "hallucinated a double charge" — which is wrong but feels right because nobody wants to say their retry logic was broken from the start.

This isn't an AI problem. It's a distributed systems problem that the AI layer imported wholesale, without the decades of hard-won patterns that distributed systems engineers developed to handle it. Standard agent retry logic assumes operations are idempotent. Most tool calls are not.

About Tian Pan