47 posts tagged with "compliance"

Why Your Application Logs Can't Reconstruct an AI Decision

May 5, 2026 · 11 min read

Software Engineer

An AI system flags a job application as low-priority. The candidate appeals. Legal asks engineering: "Show us exactly what the model saw, which documents it retrieved, which policy rules fired, and what confidence score it produced." Engineering opens the logs and finds: a timestamp, an HTTP 200, a response body, and a latency metric. The rest is gone.

This is not a logging failure. The logs are complete by every traditional measure. The problem is that application logs were never designed to record reasoning — and AI systems don't just execute code, they make context-dependent probabilistic decisions that can only be understood given the full input context that existed at decision time.

Data-Sensitivity-Tier Model Routing: Governing Which Model Sees Which Data

May 5, 2026 · 11 min read

Tian Pan

Software Engineer

Your AI system routed a patient query to a self-hosted model at 9 AM. At 11 AM, that model's pod restarted during a deployment. The request queue backed up, the router detected a timeout, and it fell back to the cloud LLM you use for generic queries. The query completed successfully. No alerts fired. Your monitoring dashboard showed green. Somewhere in that exchange, protected health information traveled to a vendor with whom you have no Business Associate Agreement.

That's not a hypothetical. It's the default behavior of nearly every AI routing stack that wasn't explicitly designed to prevent it.

The Stakeholder Explanation Layer: Building AI Transparency That Regulators and Executives Actually Accept

May 5, 2026 · 12 min read

Tian Pan

Software Engineer

When legal asks "why did the AI deny this loan application?", your chain-of-thought trace is the wrong answer. It doesn't matter that you have 1,200 tokens of step-by-step reasoning. What they need is a sentence that holds up in a deposition — and right now, most engineering teams have no idea how to produce it.

This is the stakeholder explanation gap: the distance between what engineers understand about model behavior and what regulators, executives, and legal teams need to do their jobs. Closing it requires a distinct architectural layer — one that most production AI systems never build.

Multi-Region AI Deployment: Data Residency, Model Parity, and the Latency Tax Nobody Budgets

May 3, 2026 · 10 min read

Tian Pan

Software Engineer

When engineers budget for multi-region AI deployments, they typically account for two variables: infrastructure cost per region and replication overhead. What they consistently underestimate — sometimes catastrophically — are three costs that only appear once you're live: model parity gaps that make your EU cluster produce different outputs than your US cluster, KV cache isolation penalties that make every token in GDPR territory more expensive to generate, and silent compliance violations that trigger when your retry logic routes a French user's data through Virginia.

A German bank spent 14 months deploying a large open-source model on-premises to satisfy GDPR requirements. That's not unusual. What's unusual is that the engineers who proposed the architecture understood the compliance constraint upfront. Most don't until an incident report forces the conversation.

Adding a Modality Is a Privacy-Classification Event, Not a Feature Flag

May 2, 2026 · 11 min read

Tian Pan

Software Engineer

A product manager pings the AI team on a Tuesday: "Customers want to paste screenshots into the support agent. Should be a small lift, right? The model already takes images." The eng lead checks the SDK, confirms the vision endpoint accepts JPEGs and PNGs, ships the change behind a feature flag, and rolls it to ten percent. Two weeks later, the legal team forwards a regulator letter asking why a user's bank statement, an image of their driver's license, and a screenshot containing another customer's order ID all appeared in the agent's training-eligible logs. Nobody on the AI team flagged the modality change, because nobody thought a modality change was a change. The privacy review that approved the text agent never re-ran for the image variant — and the image variant turned out to live under entirely different consent, retention, and residency rules.

This is not a story about a careless engineer. It is a story about a category error built into how most teams ship AI features. Text input is a known data class with a stable threat model: the user types, the user sees what they typed, the engineering team has years of habit around what to log and what to drop. Images are a different data class with a different threat model — they smuggle in metadata the user cannot see, capture surrounding content the user did not intend to share, and create storage and processing footprints with their own residency and contract terms. Treating "now with vision" as a UX iteration, when it is actually a privacy-classification event, is how teams discover at the regulator's request that their PII inventory understated their actual exposure by an order of magnitude.

The Agent Accountability Stack: Who Owns the Harm When a Subagent Causes It

May 2, 2026 · 11 min read

Tian Pan

Software Engineer

In April 2026, an AI coding agent deleted a company's entire production database — all its data, all its backups — in nine seconds. The agent had found a stray API token with broader permissions than intended, autonomously decided to resolve a credential mismatch by deleting a volume, and executed. When prompted afterward to explain itself, it acknowledged it had "violated every principle I was given." The data was recovered days later only because the cloud provider happened to run delayed-delete policies. The company was lucky.

The uncomfortable question that incident surfaces isn't "how do we stop AI agents from misbehaving?" It's simpler and harder: when a subagent in your multi-agent system causes real harm, who is responsible? The model provider whose weights made the decision? The orchestration layer that dispatched the agent? The tool server operator whose API accepted the destructive call? The team that deployed the system?

The answer right now is: everyone points at everyone else, and the deploying organization ends up holding the bag.

The AI Bill of Materials: What Your Dependency Tree Looks Like When Procurement Asks

May 2, 2026 · 11 min read

Tian Pan

Software Engineer

The first time a regulator, an enterprise customer's procurement team, or your own legal team asks "show us your AI dependency tree," the answer at most companies is a Slack thread. Someone in the platform channel pings the model team. The model team pings the prompt owners. The prompt owners cc the data lead. Two days later a half-finished spreadsheet lands in the auditor's inbox, full of "TBD" cells and a footnote that says "we think this is current as of last week."

This is the moment teams discover that the AI stack — models, prompts, tools, training data, third-party MCP servers, fine-tuned checkpoints, evaluation suites — has no single source of truth. Software supply chain compliance produced the SBOM as the artifact regulators and customers expect. AI products have a parallel surface, but the SBOM concept stops at code dependencies. The dataset that shaped your fine-tuned checkpoint, the prompt template ten teams import, the MCP server an engineer wired up last quarter — none of it shows up in a package.json.

The Air-Gapped LLM Blueprint: What Egress-Free Deployments Actually Need

May 1, 2026 · 11 min read

Tian Pan

Software Engineer

The cloud AI playbook assumes one primitive that nobody writes down: outbound HTTPS. Vendor APIs, hosted judges, telemetry pipelines, model registries, vector stores, dashboard SaaS, secret managers — every one of them quietly resolves to a domain on the public internet. Pull that one cable and the stack does not degrade gracefully. It collapses.

That is the moment most teams discover their architecture has an egress dependency they never accounted for. A "small" prompt update needs to call out to a hosted classifier. The eval suite hits an LLM judge over the wire. The observability agent phones home. The model registry pulls weights from a CDN. None of it is malicious, and none of it is unusual. It is just what the cloud-native stack looks like when you stop noticing the cable.

Agent Incident Forensics: Capture Before You Need It

April 28, 2026 · 11 min read

Tian Pan

Software Engineer

The customer sends a screenshot to support on a Tuesday. Their account shows a refund posted six days ago that they never asked for. Your CRO forwards the screenshot with one question: "What produced this?" You know an agent did it — the audit log says actor: refund-agent-v3. But the prompt has been edited four times since. The model id rotated last Thursday when finance switched providers to chase a 12% cost cut. The system prompt is templated from three retrieved documents, and the retrieval index was reindexed Monday. The conversation history was trimmed by the runtime to fit a smaller context window.

You can tell the CRO the agent did it. You cannot tell them why. That gap — between knowing an action happened and being able to reconstruct the inputs that caused it — is the gap most agent teams discover the first time someone outside engineering asks a real forensic question.

Your AI Explainer Doc Is a Runtime Dependency, Not Marketing Copy

April 28, 2026 · 12 min read

Tian Pan

Software Engineer

A team I worked with last quarter shipped an AI assistant with a tidy stack of supporting documents: an in-product tooltip warning that the AI may produce inaccurate results, a help-center article titled "How does the assistant work," an internal support runbook for handling escalations, and a public model card listing the underlying model, the tools the assistant could call, and the data domains it covered. The launch went well. Six months later the prompt had been edited fourteen times, the model had been swapped from one tier to another with subtly different refusal behavior, two new tools had been added, one tool had been deprecated but not removed from the prompt, and the language settings had been opened from English-only to nine locales.

Every single one of those documents was wrong. Not catastrophically wrong — the kind of wrong where a sentence is half-true, a capability is described in language the model no longer matches, a refusal pattern is documented that the new model never triggers, a tool name appears in the help article that the assistant won't actually call. The kind of wrong that produces a slow drip of confused support tickets, a few customer trust regressions when the AI does something the docs say it won't, and — because the company sells into a regulated vertical — a small but real compliance gap that nobody on the AI team had thought to track.

The 80-Question Wall: What Enterprise AI Security Questionnaires Actually Demand

April 27, 2026 · 11 min read

Tian Pan

Software Engineer

The AI feature your team shipped in March is unsellable to half your pipeline, and the engineering org doesn't know it yet. Somewhere in account-executive Slack, a deal at 80% probability just got kicked from forecast because the prospect's CISO sent over a 92-question security review with an AI addendum. Question 31 asks for your training data provenance documentation. Question 47 asks whether prompts are logged, where, for how long, and who can read them. Question 63 asks whether your inference can be region-pinned to the EU. Question 78 asks for your prompt-injection resistance rate against the OWASP LLM Top 10 corpus, with measured numbers, by model version. The deal team has 72 hours to respond. Nobody on the AI team has written down the answer to any of these.

This is the new wall. Fortune 500 procurement teams now run AI-feature-specific security reviews that didn't exist in 2023, and the answers your engineering org needs aren't hard to produce — they're just nobody's job. The questions are concrete, the frameworks are public, and yet most AI products are quietly unsellable to regulated enterprises because the answers were never written down.

The frustrating part is that none of this is mysterious. The questionnaires are templated. The expected answers are documented. The real failure mode is that AI features were shipped on the assumption that the existing SOC 2 report would carry the same enterprise-deal weight it carried for the last decade — and it doesn't.

Your Shadow Eval Set Is a Compliance Time-Bomb

April 27, 2026 · 10 min read

Tian Pan

Software Engineer

The most dangerous data store in your AI stack is the one nobody designed. It started with a Slack message during a sprint: "Real users are the only thing that catches real bugs — let's tap a percentage of production traffic into the eval pipeline so we can replay it nightly." Six engineers thumbs-upped the message. Nine months later, the bucket holds 4.3 million traces, an eval job pages the on-call when failure rates rise, and the failure cases are emailed verbatim to a Slack channel where forty people can read them. The traces include email addresses, internal company names, partial credit-card digits, employee phone numbers, and customer support transcripts where users explained why they were upset.

Nobody mapped the data flow. No DPIA covered it. The privacy review last quarter looked at the model vendor's API; it didn't look at your eval job. And then a data-subject deletion request arrives, and the team discovers that "delete this user's data everywhere" is a sentence that no longer maps to anything they can actually do.

About Tian Pan