Skip to main content

55 posts tagged with "compliance"

View all tags

The AI Observability Leak: Your Tracing Stack Is a Data Exfiltration Surface

· 11 min read
Tian Pan
Software Engineer

A security team I talked to recently found that their prompt and response fields were being shipped, in full, to a third-party SaaS logging backend they had never signed a Data Processing Agreement with. The fields contained customer medical summaries, Stripe secret keys accidentally pasted by support agents, and the full text of a confidential acquisition memo that someone had asked an internal assistant to summarize. Nothing was encrypted in the payload. Nothing was redacted. The retention was 400 days. The integration was set up during a hackathon by a well-meaning engineer who pip install-ed the vendor's SDK, dropped in an API key, and shipped.

This is the AI observability leak. Every LLM app team ends up wanting tracing — you cannot debug prompt regressions or non-deterministic agent loops without it — so one of LangSmith, Langfuse, Helicone, Phoenix, Braintrust, or a vendor AI add-on ends up in the stack. The default setup captures the entire request and response. That default is, for most production workloads, a compliance violation waiting to be discovered.

Your Chain-of-Thought Is a Story, Not an Audit Log

· 11 min read
Tian Pan
Software Engineer

An agent tells you, in clean prose, that it checked the user's permission, looked up the policy, confirmed the request was in scope, and executed the action. Legal reads the trace. Auditors read the trace. Your incident review reads the trace. Everyone reads the same paragraph and everyone comes away satisfied.

None of them know whether the permission check actually ran. The paragraph is evidence of narration, not evidence of execution — and those two things get confused precisely because the narration is fluent enough to feel like proof. Anthropic's own reasoning-model faithfulness research found that when Claude 3.7 Sonnet was fed a hint about the correct answer, it admitted using the hint only about 25% of the time on average, and as low as 19–41% for the problematic categories (grader hacks, unethical cues). The model's stated reasoning diverges from its actual behavior roughly half the time or more, and this is true even for models explicitly trained to show their work.

Your Fine-Tuning Corpus Is a GDPR Data Artifact, Not Just an ML Asset

· 11 min read
Tian Pan
Software Engineer

The moment your first fine-tune lands in production, your weights become a new kind of record your privacy program has never cataloged. A customer support transcript that made it into your training mix is no longer just a row in a database you can DELETE — it is now encoded, redundantly and non-extractably, into the parameters your API serves. The original record can be scrubbed from S3, erased from your warehouse, and removed from your RAG index, while the model continues to complete prompts with fragments of that customer's name, account ID, or medical history. The Data Protection Agreement your sales team signed promised you'd honor erasure requests. Nobody asked the ML team whether that was technically possible.

Research on PII extraction shows this is not hypothetical. The PII-Scope benchmark reports that adversarial extraction rates can increase up to fivefold against pretrained models under realistic query budgets, and membership inference attacks using self-prompt calibration have pushed AUC from 0.7 to 0.9 on fine-tuned models. Llama 3.2 1B, a small and widely copied base, has been demonstrated to memorize sensitive records present in its training set. The takeaway for anyone shipping fine-tunes on production traces is blunt: you cannot assume your weights forgot.

This matters because most fine-tuning pipelines were designed by ML engineers optimizing for loss, not by data stewards optimizing for Article 17. The result is an artifact whose legal status is ambiguous, whose lineage is rarely documented, and whose "delete user X" workflow doesn't exist.

AI Compliance Infrastructure for Regulated Industries: What LLM Frameworks Don't Give You

· 11 min read
Tian Pan
Software Engineer

Most teams deploying LLMs in regulated industries discover their compliance gap the hard way: the auditors show up and ask for a complete log of which documents informed which outputs on a specific date, and there is no answer to give. Not because the system wasn't logging — it was — but because text logs of LLM calls aren't the same thing as a tamper-evident audit trail, and an LLM API response body isn't the same thing as output lineage.

Finance, healthcare, and legal are not simply "stricter" versions of consumer software. They require infrastructure primitives that general-purpose LLM frameworks never designed for: immutable event chains, per-output provenance, refusal disposition records, and structured explainability hooks. None of the popular orchestration frameworks give you these out of the box. This article describes the architecture gap and how to close it without rebuilding your entire stack.

EU AI Act Compliance Is an Engineering Problem: The Audit Trail You Have to Ship

· 10 min read
Tian Pan
Software Engineer

Most engineering teams building AI systems in 2026 understand that the EU AI Act exists. Very few understand what it actually requires them to build. The regulation's core obligations for high-risk AI systems — automatic event logging, human oversight mechanisms, risk management systems, technical documentation — are not policy artifacts that a legal team can produce on a deadline. They are engineering deliverables that require architectural decisions made at the start of a project, not in the final sprint before a compliance audit.

The hard deadline is August 2, 2026. High-risk AI systems deployed in the EU must be in full compliance with Articles 9 through 15. Organizations deploying AI in employment screening, credit scoring, benefits allocation, healthcare prioritization, biometric identification, or critical infrastructure management are in scope. If your system makes decisions that materially affect people in those domains and touches EU residents, it is almost certainly high-risk. And realistic compliance implementation timelines run 8 to 14 months — which means if you haven't started, you're already late.

AI Content Provenance in Production: C2PA, Audit Trails, and the Compliance Deadline Engineers Are Missing

· 12 min read
Tian Pan
Software Engineer

When the EU AI Act's transparency obligations take effect on August 2, 2026, every system that generates synthetic content for EU-resident users will need to mark that content with machine-readable provenance. Most engineering teams building AI products are vaguely aware of this. Far fewer have actually stood up the infrastructure to comply — and of those that have, a substantial fraction have implemented only part of what regulators require.

The dominant technical response to "AI content provenance" has been to point at C2PA (the Coalition for Content Provenance and Authenticity standard) and declare the problem solved. C2PA is important. It's real, it's being adopted by Adobe, Google, OpenAI, Sony, and Samsung, and it's the closest thing to a universal standard the industry has. But a C2PA implementation alone will not satisfy EU AI Act Article 50. It won't survive your CDN. And it won't prevent bad actors from producing "trusted" provenance for manipulated content.

This post is about what AI content provenance actually requires in production — the technical stack, the failure modes, and the compliance gaps that catch teams off guard.

The AI Output Copyright Trap: What Engineers Need to Know Before It's a Legal Problem

· 11 min read
Tian Pan
Software Engineer

When a large language model reproduces copyrighted text verbatim in response to a user prompt, who is legally responsible — the model provider, your company that built the product, or the user who typed the query? In 2026, courts are actively working through exactly this question, and the answers have consequences that land squarely on your production systems.

Most engineering teams have absorbed the basic narrative: "AI training might infringe copyright, but that's the model provider's problem." That narrative is wrong in two important ways. First, output-based liability — what the model produces at inference time — is largely distinct from training-data liability and remains an open legal question in most jurisdictions. Second, the contractual indemnification you think you have from your AI provider is probably narrower than you believe.

This post covers the practical risk surface for engineering teams: what verbatim memorization rates look like in production, how open source license contamination actually shows up in generated code, where enterprise AI agreements leave you exposed, and the engineering controls that meaningfully reduce liability without stopping AI adoption.

The EU AI Act Is Now Your Engineering Backlog

· 12 min read
Tian Pan
Software Engineer

Most engineering teams discovered the GDPR through a legal email that arrived three weeks before the deadline. The EU AI Act is repeating that pattern, and the August 2, 2026 enforcement date for high-risk AI systems is close enough that "we'll deal with compliance later" is no longer an option. The difference between GDPR and the AI Act is that GDPR compliance was mostly about data handling policies. AI Act compliance requires building new system components — components that don't exist yet in most production AI systems.

What the regulation calls "human oversight obligations" and "audit trail requirements" are, translated into engineering language, a dashboard, an event log, and a data lineage system. This article treats the EU AI Act as an engineering specification rather than a legal document and walks through what you actually need to build.

The EU AI Act Features That Silently Trigger High-Risk Compliance — and What You Must Ship Before August 2026

· 9 min read
Tian Pan
Software Engineer

An appliedAI study of 106 enterprise AI systems found that 40% had unclear risk classifications. That number is not a reflection of regulatory complexity — it is a reflection of how many engineering teams shipped AI features without asking whether the feature changes their compliance tier. The EU AI Act has a hard enforcement date of August 2, 2026 for high-risk systems. At that point, being in the 40% is not a management problem. It is an architecture problem you will be fixing at four times the original cost, under deadline pressure, with regulators watching.

This article is not a legal overview. It is an engineering read on the specific product decisions that silently trigger high-risk classification, the concrete deliverables those classifications require, and why the retrofit path is so much more expensive than the build-it-in path.

The Copyright Exposure in AI-Generated Content: A Risk Framework for Engineering Teams

· 10 min read
Tian Pan
Software Engineer

GPT-4 reproduced exact passages from books in 43% of test prompts when asked to continue a given excerpt. In one 2025 study, researchers extracted nearly an entire book near-verbatim from a production LLM — no jailbreaking required, just a persistent prefix-feeding loop. If your product generates content using a language model, the copyright exposure is not a future risk. It is happening in your users' sessions today, and you probably have no instrumentation to catch it.

This is not primarily a legal article. It's an engineering article about a legal problem that engineering decisions either create or contain. Lawyers will tell you what constitutes infringement. This framework tells you where your system leaks, how to measure it, and what actually reduces risk versus what only looks like it does.

Enterprise RAG Governance: The Org Chart Behind Your Retrieval Pipeline

· 11 min read
Tian Pan
Software Engineer

Forty to sixty percent of enterprise RAG deployments fail to reach production. The culprit is almost never the retrieval algorithm—HNSW indexing works fine, embeddings are reasonably good, and vector similarity search is a solved problem. The breakdown happens upstream and downstream: no document ownership, no access controls enforced at query time, PII sitting unprotected in vector indexes, and a retrieval corpus that diverges from reality within weeks of launch. These are governance failures, and most engineering teams treat them as someone else's problem right up until a compliance team, a security audit, or a user who received another tenant's data makes it their problem.

This is the organizational and technical anatomy of a governed RAG knowledge base—written for engineers who own the pipeline, not executives who approved the budget.

Fine-Tuning Dataset Provenance: The Audit Question You Can't Answer Six Months Later

· 10 min read
Tian Pan
Software Engineer

Six months after you shipped your fine-tuned model, a regulator asks: "Which training examples came from users who have since revoked consent?" You open a spreadsheet, search a Slack archive, and find yourself reconstructing history from annotation batch emails and a README that hasn't been updated since the first sprint. This is the norm, not the exception. An audit of 44 major instruction fine-tuning datasets found over 70% of their licenses listed as "unspecified," with error rates above 50% in how license categories were actually applied. The provenance problem is structural, and it bites hardest when you can least afford it.

This post is about building a provenance registry for fine-tuning data before you need it — the schema, the audit scenarios that drive its requirements, and the production patterns that make it tractable without becoming a second job.