
36 posts tagged with "compliance"


Your Shadow Eval Set Is a Compliance Time-Bomb

· 10 min read
Tian Pan
Software Engineer

The most dangerous data store in your AI stack is the one nobody designed. It started with a Slack message during a sprint: "Real users are the only thing that catches real bugs — let's tap a percentage of production traffic into the eval pipeline so we can replay it nightly." Six engineers thumbs-upped the message. Nine months later, the bucket holds 4.3 million traces, an eval job pages the on-call when failure rates rise, and the failure cases are posted verbatim to a Slack channel where forty people can read them. The traces include email addresses, internal company names, partial credit-card digits, employee phone numbers, and customer support transcripts where users explained why they were upset.

Nobody mapped the data flow. No DPIA covered it. The privacy review last quarter looked at the model vendor's API; it didn't look at your eval job. And then a data-subject deletion request arrives, and the team discovers that "delete this user's data everywhere" is a sentence that no longer maps to anything they can actually do.
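A minimal sketch of the missing piece, with purely illustrative names (maybe_capture_trace, erase_subject, an in-memory stand-in for the bucket): if the tap had recorded a data-subject index alongside each sampled trace, a deletion request would still map to something the team can actually execute.

```python
import hashlib
import random

SAMPLE_RATE = 0.02  # fraction of production traffic copied into the eval set

def maybe_capture_trace(user_id: str, prompt: str, response: str, eval_bucket: dict) -> None:
    if random.random() > SAMPLE_RATE:
        return
    trace_id = hashlib.sha256(f"{user_id}:{prompt}".encode()).hexdigest()[:16]
    eval_bucket[trace_id] = {"trace_id": trace_id, "prompt": prompt, "response": response}
    # The piece most shadow eval sets skip: an index from data subject to trace,
    # so an erasure request does not require scanning millions of objects.
    eval_bucket.setdefault("_subject_index", {}).setdefault(user_id, []).append(trace_id)

def erase_subject(user_id: str, eval_bucket: dict) -> int:
    trace_ids = eval_bucket.get("_subject_index", {}).pop(user_id, [])
    for tid in trace_ids:
        eval_bucket.pop(tid, None)
    return len(trace_ids)
```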

Content Provenance for AI Outputs: C2PA, SynthID, and the Audit Trail You Will Soon Owe

· 10 min read
Tian Pan
Software Engineer

A model's output used to be a string. By August 2026 it will be a signed artifact with a chain-of-custody manifest, and any team treating it as anything less will be retrofitting under deadline pressure.

That sentence sounds dramatic until you read Article 50 of the EU AI Act, which becomes fully enforceable on August 2, 2026, and requires that any synthetic content from a generative system be machine-detectable as AI-generated. The Code of Practice published in March 2026 is explicit that a single marking technique is not sufficient — providers must combine metadata embedding (C2PA) with imperceptible watermarking, and the output must survive common transformations like cropping, compression, and screenshotting. Penalties for non-compliance reach €15 million or 3% of global turnover. This is not a labeling guideline; it is a signed-artifact mandate, and it lands on every team shipping a generative feature into the EU market.
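As a rough illustration of what "a signed artifact with a chain-of-custody manifest" means at the code level (this is not C2PA and not a compliant marking scheme by itself; the key handling and field names are placeholders), consider a manifest that binds the output hash, the generator, and a signature:

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"replace-with-a-managed-key"  # placeholder; use a real KMS-backed key

def sign_output(content_bytes: bytes, model: str, request_id: str) -> dict:
    manifest = {
        "content_sha256": hashlib.sha256(content_bytes).hexdigest(),
        "generator": model,
        "request_id": request_id,
        "ai_generated": True,
    }
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return manifest

def verify_manifest(content_bytes: bytes, manifest: dict) -> bool:
    body = {k: v for k, v in manifest.items() if k != "signature"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(manifest.get("signature", ""), expected)
            and body["content_sha256"] == hashlib.sha256(content_bytes).hexdigest())
```

A real implementation pairs a manifest like this with a C2PA toolchain plus an imperceptible watermark, because the metadata alone is stripped by the first lossy transformation.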

The Contestability Gap: Engineering AI Decisions Your Users Can Actually Appeal

· 11 min read
Tian Pan
Software Engineer

A user opens a chat, asks for a refund, gets "I'm sorry, this purchase is not eligible for a refund," closes the tab, and never comes back. Internally, the agent emitted a beautiful trace: tool calls, intermediate reasoning, the policy bundle it consulted, the model version it ran on. Every span landed in the observability platform. None of it landed anywhere the user could reach. There is no button labeled "ask a human to look at this again," and even if there were, there is no service behind it. The decision is final by default, not by design.

This is the contestability gap, and it is the next thing regulators, lawyers, and angry users are going to rip open. It is also one of the cleanest examples of a problem that looks like policy from the outside and turns out to be plumbing on the inside.
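What the plumbing could look like, sketched with hypothetical names (emit_decision_record, file_appeal): a durable decision record that carries a policy reference, a trace link, and an appeal handle, emitted alongside the user-facing answer rather than buried in the observability stack.

```python
import datetime
import uuid

def emit_decision_record(decisions: dict, user_id: str, outcome: str,
                         policy_ref: str, trace_id: str) -> dict:
    record = {
        "decision_id": str(uuid.uuid4()),
        "user_id": user_id,
        "outcome": outcome,        # what the user was actually told
        "policy_ref": policy_ref,  # the policy bundle the agent consulted
        "trace_id": trace_id,      # link back to the internal trace
        "appealable_until": (datetime.datetime.now(datetime.timezone.utc)
                             + datetime.timedelta(days=30)).isoformat(),
        "status": "final_pending_appeal",
    }
    decisions[record["decision_id"]] = record
    return record

def file_appeal(decisions: dict, decision_id: str, reason: str) -> None:
    # The service behind the "ask a human to look at this again" button.
    decisions[decision_id]["status"] = "under_human_review"
    decisions[decision_id]["appeal_reason"] = reason
```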

Sovereignty Collapse: Logging Where Your Prompt Actually Went

· 9 min read
Tian Pan
Software Engineer

A regulator asks a simple question. "For this specific user prompt, submitted at 14:32 UTC last Tuesday, prove which jurisdictions the request and its derived state passed through."

Your application logs say model=claude-sonnet-4-5, region=eu-west-1, latency=2.1s. Your gateway logs say the same. Your provider's invoice confirms the request happened. None of these answer the question. The request entered an EU-hosted gateway, was forwarded to a US-region primary endpoint that failed over to Singapore during a regional incident, and warmed a KV cache on a third-party GPU pool whose residency claims live in a vendor footnote. The audit trail you needed lives at a layer your team does not own.

This is sovereignty collapse: the gap between what your contracts promise about data location and what your runtime can actually prove after the fact. The compliance claim is only as strong as the weakest log line in the chain.
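One way to start closing that gap, sketched here with assumed field names rather than any standard schema, is a per-hop residency record that your own runtime appends at every boundary it controls or observes:

```python
import datetime
import json

def record_hop(hops: list, component: str, region: str, jurisdiction: str, note: str = "") -> None:
    hops.append({
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "component": component,    # gateway, provider endpoint, cache tier, ...
        "region": region,          # the region you observed, not the one you contracted for
        "jurisdiction": jurisdiction,
        "note": note,
    })

hops: list = []
record_hop(hops, "gateway", "eu-west-1", "EU")
record_hop(hops, "provider-endpoint", "us-east-1", "US", note="primary")
record_hop(hops, "provider-endpoint", "ap-southeast-1", "SG", note="failover during regional incident")

# Persist alongside the request ID so the audit trail lives at a layer you own.
print(json.dumps({"request_id": "req_example", "hops": hops}, indent=2))
```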

Your AI Chat Transcripts Are Evidence: Retention Design for LLM Products Under Legal Hold

· 11 min read
Tian Pan
Software Engineer

On May 13, 2025, a federal magistrate judge in the Southern District of New York signed a preservation order that replaced a consumer AI company's retention policy with a single word: forever. OpenAI was directed to preserve and segregate every output log across Free, Plus, Pro, and Team tiers — including conversations users had explicitly deleted, including conversations privacy law would otherwise require to be erased. By November, the same court ordered 20 million of those de-identified transcripts produced to the New York Times and co-plaintiffs as sampled discovery. The indefinite retention obligation lasted until September 26 of that year. Five months of "delete" meaning "keep, in a segregated vault, for an opposing party to read later."

That order is the warning shot for every team building on top of LLMs. If your product stores chat, your retention policy is one plausible lawsuit away from being replaced by whatever the court thinks is reasonable. The engineering question is not whether this happens to you. It is whether your storage architecture can absorb it without turning your product into a liability engine for the legal department.

Email retention playbooks do not carry over cleanly. AI conversations contain more than what the user typed, and the "more" is where the discovery fights are starting.
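As a sketch of the architectural shape this implies (all names hypothetical), the deletion path needs a legal-hold check that quietly turns "erase" into "segregate and preserve" without changing the API the rest of the product calls:

```python
import datetime

LEGAL_HOLDS: dict = {}  # tenant_id -> matter_id

def place_hold(tenant_id: str, matter_id: str) -> None:
    LEGAL_HOLDS[tenant_id] = matter_id

def delete_conversation(tenant_id: str, conversation_id: str, store: dict, vault: dict) -> str:
    key = (tenant_id, conversation_id)
    matter_id = LEGAL_HOLDS.get(tenant_id)
    if matter_id is not None:
        # Under hold: move to a segregated, access-controlled vault instead of erasing.
        vault[key] = {
            "data": store.pop(key, None),
            "matter_id": matter_id,
            "held_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        }
        return "held"
    store.pop(key, None)
    return "deleted"
```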

The AI Observability Leak: Your Tracing Stack Is a Data Exfiltration Surface

· 11 min read
Tian Pan
Software Engineer

A security team I talked to recently found that their prompt and response fields were being shipped, in full, to a third-party SaaS logging backend they had never signed a Data Processing Agreement with. The fields contained customer medical summaries, Stripe secret keys accidentally pasted by support agents, and the full text of a confidential acquisition memo that someone had asked an internal assistant to summarize. Nothing was encrypted in the payload. Nothing was redacted. The retention was 400 days. The integration was set up during a hackathon by a well-meaning engineer who pip install-ed the vendor's SDK, dropped in an API key, and shipped.

This is the AI observability leak. Every LLM app team ends up wanting tracing — you cannot debug prompt regressions or non-deterministic agent loops without it — so one of LangSmith, Langfuse, Helicone, Phoenix, Braintrust, or a vendor AI add-on ends up in the stack. The default setup captures the entire request and response. That default is, for most production workloads, a compliance violation waiting to be discovered.
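A minimal sketch of the missing default, assuming nothing about any particular SDK (the redact hook and regex patterns are illustrative; production needs a real classifier and field allowlists): scrub payloads before they leave the process, wherever your tracing library lets you intercept spans.

```python
import re

PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "stripe_key": re.compile(r"sk_(live|test)_[A-Za-z0-9]{8,}"),
    "card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def redact(text: str) -> str:
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{label}]", text)
    return text

def on_span_export(span: dict) -> dict:
    # Call this from whatever masking hook your tracing SDK exposes,
    # or wrap the exporter yourself if it exposes none.
    for field in ("input", "output"):
        if isinstance(span.get(field), str):
            span[field] = redact(span[field])
    return span
```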

Your Chain-of-Thought Is a Story, Not an Audit Log

· 11 min read
Tian Pan
Software Engineer

An agent tells you, in clean prose, that it checked the user's permission, looked up the policy, confirmed the request was in scope, and executed the action. Legal reads the trace. Auditors read the trace. Your incident review reads the trace. Everyone reads the same paragraph and everyone comes away satisfied.

None of them know whether the permission check actually ran. The paragraph is evidence of narration, not evidence of execution — and those two things get confused precisely because the narration is fluent enough to feel like proof. Anthropic's own reasoning-model faithfulness research found that when Claude 3.7 Sonnet was fed a hint about the correct answer, it admitted using the hint only about 25% of the time on average, and only 19–41% of the time for the problematic categories (grader hacks, unethical cues). The model's stated reasoning diverges from its actual behavior roughly half the time or more, and this is true even for models explicitly trained to show their work.
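One concrete response, sketched here with assumed span and claim formats: treat the narration as a set of claims and the tool-call log as the record, and diff them before anyone treats the paragraph as proof.

```python
EXECUTED_TOOL_CALLS = [
    {"tool": "lookup_policy", "status": "ok"},
    {"tool": "issue_refund", "status": "ok"},
]  # what the runtime actually recorded

NARRATED_STEPS = [
    "checked the user's permission",  # what the model says happened
    "looked up the policy",
    "executed the refund",
]

CLAIM_TO_TOOL = {
    "checked the user's permission": "check_permission",
    "looked up the policy": "lookup_policy",
    "executed the refund": "issue_refund",
}

def unverified_claims(narrated: list, executed: list) -> list:
    ran = {call["tool"] for call in executed if call["status"] == "ok"}
    return [step for step in narrated if CLAIM_TO_TOOL.get(step) not in ran]

# -> ["checked the user's permission"]: narrated, never executed.
print(unverified_claims(NARRATED_STEPS, EXECUTED_TOOL_CALLS))
```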

Your Fine-Tuning Corpus Is a GDPR Data Artifact, Not Just an ML Asset

· 11 min read
Tian Pan
Software Engineer

The moment your first fine-tune lands in production, your weights become a new kind of record your privacy program has never cataloged. A customer support transcript that made it into your training mix is no longer just a row in a database you can DELETE — it is now encoded, redundantly and inseparably, into the parameters your API serves. The original record can be scrubbed from S3, erased from your warehouse, and removed from your RAG index, while the model continues to complete prompts with fragments of that customer's name, account ID, or medical history. The Data Processing Agreement your sales team signed promised you'd honor erasure requests. Nobody asked the ML team whether that was technically possible.

Research on PII extraction shows this is not hypothetical. The PII-Scope benchmark reports that adversarial extraction rates can increase up to fivefold against pretrained models under realistic query budgets, and membership inference attacks using self-prompt calibration have pushed AUC from 0.7 to 0.9 on fine-tuned models. Llama 3.2 1B, a small and widely copied base, has been demonstrated to memorize sensitive records present in its training set. The takeaway for anyone shipping fine-tunes on production traces is blunt: you cannot assume your weights forgot.

This matters because most fine-tuning pipelines were designed by ML engineers optimizing for loss, not by data stewards optimizing for Article 17. The result is an artifact whose legal status is ambiguous, whose lineage is rarely documented, and whose "delete user X" workflow doesn't exist.
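A sketch of the artifact that makes the erasure conversation answerable (field names are assumptions, and this does not make unlearning possible by itself): a lineage manifest tying each training example to a data subject and each checkpoint to the examples it saw.

```python
import datetime
import hashlib
import json

def example_id(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()[:16]

def build_manifest(examples: list, checkpoint: str) -> dict:
    return {
        "checkpoint": checkpoint,
        "created_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "examples": [
            {"example_id": example_id(ex["text"]), "data_subject": ex["subject_id"]}
            for ex in examples
        ],
    }

manifest = build_manifest(
    [{"text": "support transcript for one customer", "subject_id": "user_1842"}],
    checkpoint="ft-2026-03-01",
)
# Given an erasure request for user_1842, you can now enumerate which checkpoints
# are contaminated and decide whether to retrain, filter, or retire them.
print(json.dumps(manifest, indent=2))
```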

AI Compliance Infrastructure for Regulated Industries: What LLM Frameworks Don't Give You

· 11 min read
Tian Pan
Software Engineer

Most teams deploying LLMs in regulated industries discover their compliance gap the hard way: the auditors show up and ask for a complete log of which documents informed which outputs on a specific date, and there is no answer to give. Not because the system wasn't logging — it was — but because text logs of LLM calls aren't the same thing as a tamper-evident audit trail, and an LLM API response body isn't the same thing as output lineage.

Finance, healthcare, and legal are not simply "stricter" versions of consumer software. They require infrastructure primitives that general-purpose LLM frameworks were never designed to provide: immutable event chains, per-output provenance, refusal disposition records, and structured explainability hooks. None of the popular orchestration frameworks give you these out of the box. This article describes the architecture gap and how to close it without rebuilding your entire stack.
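A minimal sketch of one such primitive, the tamper-evident event chain, with illustrative names and none of the key management a real deployment needs: each audit record carries the hash of its predecessor, so a silent edit anywhere breaks verification.

```python
import datetime
import hashlib
import json

def append_event(chain: list, event: dict) -> dict:
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "event": event,
        "prev_hash": prev_hash,
    }
    body["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append(body)
    return body

def verify_chain(chain: list) -> bool:
    prev_hash = "0" * 64
    for record in chain:
        expected = dict(record)
        claimed = expected.pop("hash")
        if expected["prev_hash"] != prev_hash:
            return False
        if hashlib.sha256(json.dumps(expected, sort_keys=True).encode()).hexdigest() != claimed:
            return False
        prev_hash = claimed
    return True

chain: list = []
append_event(chain, {"type": "retrieval", "doc_ids": ["kb_17", "kb_203"], "output_id": "out_991"})
append_event(chain, {"type": "generation", "model": "provider-model-v1", "output_id": "out_991"})
assert verify_chain(chain)
```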

EU AI Act Compliance Is an Engineering Problem: The Audit Trail You Have to Ship

· 10 min read
Tian Pan
Software Engineer

Most engineering teams building AI systems in 2026 understand that the EU AI Act exists. Very few understand what it actually requires them to build. The regulation's core obligations for high-risk AI systems — automatic event logging, human oversight mechanisms, risk management systems, technical documentation — are not policy artifacts that a legal team can produce on a deadline. They are engineering deliverables that require architectural decisions made at the start of a project, not in the final sprint before a compliance audit.

The hard deadline is August 2, 2026. High-risk AI systems deployed in the EU must be in full compliance with Articles 9 through 15. Organizations deploying AI in employment screening, credit scoring, benefits allocation, healthcare prioritization, biometric identification, or critical infrastructure management are in scope. If your system makes decisions that materially affect people in those domains and touches EU residents, it is almost certainly high-risk. And realistic compliance implementation timelines run 8 to 14 months — which means if you haven't started, you're already late.

AI Content Provenance in Production: C2PA, Audit Trails, and the Compliance Deadline Engineers Are Missing

· 12 min read
Tian Pan
Software Engineer

When the EU AI Act's transparency obligations take effect on August 2, 2026, every system that generates synthetic content for EU-resident users will need to mark that content with machine-readable provenance. Most engineering teams building AI products are vaguely aware of this. Far fewer have actually stood up the infrastructure to comply — and of those that have, a substantial fraction have implemented only part of what regulators require.

The dominant technical response to "AI content provenance" has been to point at C2PA (the Coalition for Content Provenance and Authenticity standard) and declare the problem solved. C2PA is important. It's real, it's being adopted by Adobe, Google, OpenAI, Sony, and Samsung, and it's the closest thing to a universal standard the industry has. But a C2PA implementation alone will not satisfy EU AI Act Article 50. It won't survive your CDN. And it won't prevent bad actors from producing "trusted" provenance for manipulated content.

This post is about what AI content provenance actually requires in production — the technical stack, the failure modes, and the compliance gaps that catch teams off guard.

The AI Output Copyright Trap: What Engineers Need to Know Before It's a Legal Problem

· 11 min read
Tian Pan
Software Engineer

When a large language model reproduces copyrighted text verbatim in response to a user prompt, who is legally responsible — the model provider, your company that built the product, or the user who typed the query? In 2026, courts are actively working through exactly this question, and the answers have consequences that land squarely on your production systems.

Most engineering teams have absorbed the basic narrative: "AI training might infringe copyright, but that's the model provider's problem." That narrative is wrong in two important ways. First, output-based liability — what the model produces at inference time — is largely distinct from training-data liability and remains an open legal question in most jurisdictions. Second, the contractual indemnification you think you have from your AI provider is probably narrower than you believe.

This post covers the practical risk surface for engineering teams: what verbatim memorization rates look like in production, how open source license contamination actually shows up in generated code, where enterprise AI agreements leave you exposed, and the engineering controls that meaningfully reduce liability without stopping AI adoption.
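As one example of those controls, sketched with an in-memory corpus and an arbitrary 12-token window (real systems use fingerprint indexes over far larger corpora): flag outputs that reproduce long verbatim spans from reference material before they reach a user.

```python
def shingles(text: str, n: int = 12) -> set:
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(max(0, len(tokens) - n + 1))}

def has_verbatim_overlap(output: str, corpus: list, n: int = 12) -> bool:
    # True if any n-token window of the output appears verbatim in a reference document.
    out_shingles = shingles(output, n)
    return any(out_shingles & shingles(doc, n) for doc in corpus)

reference_corpus = ["text of a licensed or copyrighted source file loaded from your index"]
if has_verbatim_overlap("model output text to check", reference_corpus):
    print("verbatim match over threshold: route to review, strip, or regenerate")
```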