Skip to main content

47 posts tagged with "compliance"

View all tags

The AI Accessibility Audit Nobody Runs

· 11 min read
Tian Pan
Software Engineer

Open your agent product, turn on VoiceOver, and hit send on any prompt. If you have a typical streaming UI with an inline reasoning trace, what you will hear in the next thirty seconds is not your product. It is a torrent of partial tokens, mid-word reflows, status changes nobody announced, and a reasoning monologue your sighted users opted into but your blind users cannot escape. The interface that demoed beautifully on stage is, to a screen reader, a denial-of-service attack delivered as speech.

This is the audit nobody on the AI team runs. The design review approved the streaming animation. The eval suite measured answer quality. The latency dashboard tracked time-to-first-token. None of those instruments noticed that the affordance making the product feel fast and thoughtful for one cohort makes it unusable for another. And that omission is starting to show up in pro-se lawsuit filings — the same federal courts that have been processing accessibility complaints against e-commerce sites for a decade are now seeing AI-interface complaints rise sharply, with one tracker reporting a 40% year-over-year increase in 2025 alone.

The Internal Eval Set Is a Privacy Boundary Nobody Reviewed

· 11 min read
Tian Pan
Software Engineer

The dataset your AI team calls "the eval set" is, in most companies shipping LLM features, a collection of real customer conversations pulled from production logs. Nobody on the team thinks of it as a privacy event. The data never left the cluster. No new system was provisioned. No vendor was added. An engineer wrote a query, exported a few thousand traces into a labeling tool, and the team started grading model outputs against them. The legal team never heard about it because, from the inside, nothing changed — the same conversations that already lived in the same database were now also being read by a few engineers and a judge model.

That is the privacy boundary nobody reviewed. Customers gave you their messages so you could answer them. They did not give you their messages so you could measure your model against them. The two uses look identical at the storage layer and feel identical at the inference layer, but they are different processing purposes under every modern privacy regime — and the gap between the two is where the next round of compliance pain is going to land.

Mobile App Store Review Meets AI Features: The Deploy Cadence Collision

· 9 min read
Tian Pan
Software Engineer

A prompt regression lands in production at 9 AM. On the web app, an engineer rolls back the system prompt by lunch and the trace logs go quiet. On iOS, the same regression sits in the binary the App Store reviewed three weeks ago — and the team now has to choose between a server-side prompt swap that voids the store's review of the actual user-facing behavior, or an expedited review that costs 24-48 hours plus a soft favor with the platform team. Neither option is on the runbook.

This is the deploy cadence collision: web AI features iterate on the team's clock, mobile AI features iterate on the platform's clock, and most release trains were laid down before anyone thought to ask whether the prompt belongs on the same train as the binary. The result is a quietly accumulating tax — review delays, asymmetric rollback latency, undisclosed AI surfaces that fail privacy review on resubmit, and an entire class of AI bugs that mobile engineers fix at one-tenth the speed their web colleagues do.

The Retrieval Citation Tax: Why Compliance Adds 30% to Your RAG Token Bill

· 10 min read
Tian Pan
Software Engineer

A team I talked to recently sold their legal-AI product into a Fortune 500 in-house counsel office and added one line to their system prompt: "every factual claim must include an inline citation to the retrieved source." The product roadmap allocated a 5% buffer on their token budget for the new behavior. Sixty days after the regulated tenant went live, finance flagged a 34% jump in monthly inference spend. Nobody had broken the product. Nobody had shipped new features. The compliance requirement that closed the deal also quietly rewrote the unit economics underneath it.

This is the retrieval citation tax, and almost every RAG system serving a regulated industry — legal, healthcare, finance, audit-bound enterprise — eventually pays it. The tax is structural, not a bug. It comes from the way citation discipline forces the model into a different generation regime, and it shows up nowhere on the procurement spec the customer signed.

Agent Memory Is a Compliance Surface: The Records-Management System You Didn't Sign Up to Build

· 12 min read
Tian Pan
Software Engineer

The first compliance escalation against your agent memory layer almost never arrives as a regulator's letter. It arrives as a Jira ticket from your enterprise sales engineer that says "the customer's privacy team is blocking the contract — they want to know what 'forget my user' actually means in your system, and they want a written answer by Friday." That ticket lands six to twelve months after the memory layer shipped, and the engineering team that built it discovers, in the time it takes to read the question, that they accidentally built a records-management system without any of the primitives a records-management system is supposed to have.

This is the structural problem with long-term memory in agentic products. The team building it optimizes for the things memory is sold to do — retrieval quality, latency, storage cost, the felt-personalization that makes the assistant feel like it knows the user. Nobody in the design review prices the parallel system being built at the same time: a per-user, per-tenant, multi-region data store with retention obligations, deletion semantics, audit export requirements, and a regulator's clock that starts the moment the first user's data lands in it. Memory is not a feature. It is the operational surface that every privacy regime, every enterprise procurement questionnaire, and every right-to-erasure request will eventually find.

The Agent Scratch Directory: The Unowned Filesystem PII Surface Nobody Inventoried

· 10 min read
Tian Pan
Software Engineer

A regulator walks into your office and asks the question security teams rehearse for: "Show me every place customer data lives." Your data team produces the inventory. The primary database is on it. The analytics warehouse is on it. The object store, the queue, the search index, the backup destination — all on it, with classification labels, retention policies, encryption details, and named owners. Then someone in the room mentions the agent worker pool, and the inventory has nothing to say. The pool has been running for nine months. Each worker has a local disk. The agents on those workers have been parsing PDFs, transcribing audio, downloading email attachments, and caching intermediate JSON between tool calls the entire time. Nobody put any of that on the asset register.

This is the scratch directory problem. Every long-running agent worker accumulates an ephemeral filesystem that grows organically as new tools are added — extracted text from a PDF parser, transcribed audio from a Whisper step, downloaded attachments from a Gmail tool, screenshots from a browser-use step, vector-search snippets cached for the next turn, intermediate JSON the agent emitted between two tool calls so the second one wouldn't have to re-derive it. Unlike databases and queues and buckets, this surface has no retention policy, no encryption-at-rest standard, no DLP scanner pass, and no entry on the data-classification spreadsheet. The platform team thinks "agent state" means the inference-provider context window. The SRE team thinks "agent state" means the durable database. The worker's /tmp/agent-workspace-${session_id}/ directory is a third copy of customer data that nobody owns.

Right-to-Erasure Meets Fine-Tuning: When Deletion Stops at the Snapshot

· 11 min read
Tian Pan
Software Engineer

A customer files a subject-access request asking for their data to be deleted. The data engineer purges the production database, the analytics warehouse, the support ticket archive, the cold-storage backups. Every system the legal team listed in the data inventory comes back clean. Then somebody in the room asks the question that nobody wants to answer first: what about the model?

Three months ago that customer's support transcripts went into a fine-tuning run. The resulting adapter has been serving predictions to other customers ever since, with their phrasing, their account names, occasionally their literal sentences embedded in the weights. You can prove deletion in the warehouse. You cannot prove deletion in the model — and the more honest member of the team is the one who says so out loud.

The Audit Trail Mismatch: When User, Agent, and Tool Each Have Different Logs

· 10 min read
Tian Pan
Software Engineer

A regulator emails you a single question: did this user authorize this transaction? Six hours later, three engineers are in a chat trying to join the chat surface's conversation log to the planner agent's reasoning trace to the tool's API record. The chat log has a turn ID and the user-visible message but no tool call detail. The planner trace has a tool-invocation record with timestamps that drift from the chat log by several hundred milliseconds. The tool's log has the API call with its own correlation ID that appears nowhere in the agent's record. The downstream service's log has yet another ID with no link back. The team eventually reconstructs the answer by joining on user IDs and approximate timestamps, hopes nothing critical is off by a turn, and ships a PDF to legal.

This is the audit trail mismatch. Every layer's owner believes their logs are fine — and individually, they are. The joined view is the artifact that doesn't exist, and nobody owns its absence. The team only finds out it doesn't exist when an incident, a customer escalation, or a regulator forces the join.

Compliance Reviewer as Eval Author: Why Legal Should Be Writing Your Test Cases

· 13 min read
Tian Pan
Software Engineer

The most useful adversarial prompt I have seen for an enterprise LLM did not come from a red team, a security researcher, or a prompt engineer. It came from a senior compliance attorney who asked the model, in plain English, to "tell me which of the three retirement annuities discussed earlier in this thread is the best one for a 62-year-old approaching their first required minimum distribution." The model produced a confident, thoughtful, beautifully-formatted recommendation. That output, had it been sent to a customer, would have been a textbook FINRA suitability violation — an unsuitable individualized recommendation made without the supervisory infrastructure that securities rules require around personalized advice.

The compliance attorney spotted the failure mode in about four seconds. The engineering eval suite, which had a hundred-plus carefully constructed cases for hallucination, refusal calibration, and tool-use accuracy, had no concept that this particular response shape was illegal. Not low quality. Not a hallucination. Illegal. And the workflow at the company at the time had her reading sample outputs in a Google Doc and writing memos, rather than checking a test case into the regression suite. So her catch lived in a memo, the memo got summarized in a launch-readiness slide, and the next month a refactor of the system prompt regressed the behavior because nobody had a failing test pinned to it.

That is the gap I want to argue we should close: the compliance reviewer should be authoring eval cases directly, and those cases should be the artifact that gates release — not the document review that produced them.

Training Your AI on Production Data Without Triggering a Legal Blocker

· 11 min read
Tian Pan
Software Engineer

Your AI feature launched. Users are engaging with it. The gap between what it does and what it should do is visible in every session replay, every thumbs-down, every request that returns a wrong answer. You have the signal. The question is whether you can legally act on it.

This is where teams hit the compliance wall. Not a theoretical wall — a concrete one. In 2024 alone, European regulators issued over €1.2 billion in GDPR fines, with OpenAI, Meta, and LinkedIn among the named defendants. The common thread across most enforcement actions: using behavioral data in ways that weren't explicitly scoped at collection time, or collecting more than was necessary to operate the feature. The fact that your intent is model improvement rather than advertising doesn't move regulators the way engineers assume it does.

HIPAA, SOC2, and Your Agent: The Architectural Constraints Compliance Actually Imposes

· 12 min read
Tian Pan
Software Engineer

The typical AI team's encounter with compliance goes like this: the agent is in production, users love it, and someone from legal forwards an email asking whether the system is HIPAA-compliant. The engineer assigned to answer discovers that context windows contain PHI, that there are no audit logs with sufficient granularity, that the LLM provider doesn't have a signed Business Associate Agreement, and that the agent's tool permissions are broader than the minimum necessary standard allows. The fix takes three months and requires a partial rewrite.

This pattern is not an edge case. According to a 2024 industry survey, 78% of business executives cannot pass an AI governance audit within 90 days, and 42% of companies abandoned AI initiatives in 2025 primarily due to compliance and governance failures — not technical ones. The gap between what gets built and what compliance actually requires is architectural, and it forms in sprint one.

The Compliance Attestation Gap Nobody Talks About in AI-Assisted Development

· 9 min read
Tian Pan
Software Engineer

Your engineers are shipping AI-generated code every day. Your auditors are reviewing change management controls designed for a world where every line of code was written by the person who approved it. Both facts are true simultaneously, and if you're in a regulated industry, that gap is a liability you probably haven't fully priced.

The compliance certification problem with AI-generated code is not a vendor problem — your AI coding tool's SOC 2 report doesn't cover your change management controls. It's a process attestation problem: the fundamental assumption underneath SOC 2 CC8.1, HIPAA security rule change controls, and PCI-DSS Section 6 is that the person who approved the code change understood it. That assumption no longer holds.