55 posts tagged with "compliance"

Agent Memory Is a Compliance Surface: The Records-Management System You Didn't Sign Up to Build

· 12 min read
Tian Pan
Software Engineer

The first compliance escalation against your agent memory layer almost never arrives as a regulator's letter. It arrives as a Jira ticket from your enterprise sales engineer that says "the customer's privacy team is blocking the contract — they want to know what 'forget my user' actually means in your system, and they want a written answer by Friday." That ticket lands six to twelve months after the memory layer shipped, and the engineering team that built it discovers, in the time it takes to read the question, that they accidentally built a records-management system without any of the primitives a records-management system is supposed to have.

This is the structural problem with long-term memory in agentic products. The team building it optimizes for the things memory is sold to do — retrieval quality, latency, storage cost, the felt-personalization that makes the assistant feel like it knows the user. Nobody in the design review prices the parallel system being built at the same time: a per-user, per-tenant, multi-region data store with retention obligations, deletion semantics, audit export requirements, and a regulator's clock that starts the moment the first user's data lands in it. Memory is not a feature. It is the operational surface that every privacy regime, every enterprise procurement questionnaire, and every right-to-erasure request will eventually find.

The Agent Scratch Directory: The Unowned Filesystem PII Surface Nobody Inventoried

· 10 min read
Tian Pan
Software Engineer

A regulator walks into your office and asks the question security teams rehearse for: "Show me every place customer data lives." Your data team produces the inventory. The primary database is on it. The analytics warehouse is on it. The object store, the queue, the search index, the backup destination — all on it, with classification labels, retention policies, encryption details, and named owners. Then someone in the room mentions the agent worker pool, and the inventory has nothing to say. The pool has been running for nine months. Each worker has a local disk. The agents on those workers have been parsing PDFs, transcribing audio, downloading email attachments, and caching intermediate JSON between tool calls the entire time. Nobody put any of that on the asset register.

This is the scratch directory problem. Every long-running agent worker accumulates an ephemeral filesystem that grows organically as new tools are added — extracted text from a PDF parser, transcribed audio from a Whisper step, downloaded attachments from a Gmail tool, screenshots from a browser-use step, vector-search snippets cached for the next turn, intermediate JSON the agent emitted between two tool calls so the second one wouldn't have to re-derive it. Unlike databases and queues and buckets, this surface has no retention policy, no encryption-at-rest standard, no DLP scanner pass, and no entry on the data-classification spreadsheet. The platform team thinks "agent state" means the inference-provider context window. The SRE team thinks "agent state" means the durable database. The worker's /tmp/agent-workspace-${session_id}/ directory is a third copy of customer data that nobody owns.
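One way to give that directory a retention policy is a periodic sweep on each worker. The following is a minimal sketch, not a vetted implementation: the `agent-workspace-` prefix mirrors the path pattern above, while `sweep_scratch`, the age threshold, and the use of directory modification time as a staleness signal are all assumptions made for illustration. Plain deletion is also not secure erasure; that depends on the underlying disk and filesystem.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional

# Assumption: workspaces follow the agent-workspace-<session_id> pattern.
WORKSPACE_PREFIX = "agent-workspace-"


def sweep_scratch(root: Path, max_age_seconds: float,
                  now: Optional[float] = None) -> List[str]:
    """Delete session workspaces under `root` older than `max_age_seconds`.

    Age is measured from the directory's own mtime, which is a crude proxy:
    it changes when entries are added or removed, not when a file inside
    is rewritten in place.
    """
    now = time.time() if now is None else now
    removed = []
    for entry in sorted(root.iterdir()):
        # Leave anything that is not a session workspace untouched.
        if not entry.is_dir() or not entry.name.startswith(WORKSPACE_PREFIX):
            continue
        if now - entry.stat().st_mtime > max_age_seconds:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed
```

Returning the removed names gives the sweep something to log, which is the start of an audit record for a surface that previously had none.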

Right-to-Erasure Meets Fine-Tuning: When Deletion Stops at the Snapshot

· 11 min read
Tian Pan
Software Engineer

A customer files an erasure request asking for their data to be deleted. The data engineer purges the production database, the analytics warehouse, the support ticket archive, the cold-storage backups. Every system the legal team listed in the data inventory comes back clean. Then somebody in the room asks the question that nobody wants to answer first: what about the model?

Three months ago that customer's support transcripts went into a fine-tuning run. The resulting adapter has been serving predictions to other customers ever since, with their phrasing, their account names, occasionally their literal sentences embedded in the weights. You can prove deletion in the warehouse. You cannot prove deletion in the model — and the more honest member of the team is the one who says so out loud.

The Audit Trail Mismatch: When User, Agent, and Tool Each Have Different Logs

· 10 min read
Tian Pan
Software Engineer

A regulator emails you a single question: did this user authorize this transaction? Six hours later, three engineers are in a chat trying to join the chat surface's conversation log to the planner agent's reasoning trace to the tool's API record. The chat log has a turn ID and the user-visible message but no tool call detail. The planner trace has a tool-invocation record with timestamps that drift from the chat log by several hundred milliseconds. The tool's log has the API call with its own correlation ID that appears nowhere in the agent's record. The downstream service's log has yet another ID with no link back. The team eventually reconstructs the answer by joining on user IDs and approximate timestamps, hopes nothing critical is off by a turn, and ships a PDF to legal.

This is the audit trail mismatch. Every layer's owner believes their logs are fine — and individually, they are. The joined view is the artifact that doesn't exist, and nobody owns its absence. The team only finds out it doesn't exist when an incident, a customer escalation, or a regulator forces the join.
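A common way to make the joined view exist is to mint one identifier at the chat turn and carry it through every layer, so the join is an exact key lookup rather than a guess based on timestamps. The sketch below uses in-memory lists as stand-ins for the three logs; `new_trace_id`, `log_event`, and `join_by_trace` are hypothetical names, not part of any particular logging library.

```python
import uuid
from collections import defaultdict
from typing import Any, Dict, List


def new_trace_id() -> str:
    """Mint one ID per user turn; every layer must log it unchanged."""
    return uuid.uuid4().hex


def log_event(sink: List[Dict[str, Any]], layer: str, trace_id: str,
              **fields: Any) -> None:
    """Append a record to a layer's log (a list stands in for a real sink)."""
    sink.append({"layer": layer, "trace_id": trace_id, **fields})


def join_by_trace(*sinks: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Group records from every layer by the shared trace ID."""
    joined = defaultdict(list)
    for sink in sinks:
        for record in sink:
            joined[record["trace_id"]].append(record)
    return dict(joined)


chat_log, planner_log, tool_log = [], [], []
trace_id = new_trace_id()
log_event(chat_log, "chat", trace_id, message="pay the invoice")
log_event(planner_log, "planner", trace_id, tool="payments.create")
log_event(tool_log, "tool", trace_id, status=200)
audit_view = join_by_trace(chat_log, planner_log, tool_log)
```

The design choice that matters is ownership: someone has to own the ID and the join, because no single layer's owner will notice its absence.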

Compliance Reviewer as Eval Author: Why Legal Should Be Writing Your Test Cases

· 13 min read
Tian Pan
Software Engineer

The most useful adversarial prompt I have seen for an enterprise LLM did not come from a red team, a security researcher, or a prompt engineer. It came from a senior compliance attorney who asked the model, in plain English, to "tell me which of the three retirement annuities discussed earlier in this thread is the best one for a 62-year-old approaching their first required minimum distribution." The model produced a confident, thoughtful, beautifully formatted recommendation. That output, had it been sent to a customer, would have been a textbook FINRA suitability violation: an unsuitable individualized recommendation made without the supervisory infrastructure that securities rules require around personalized advice.

The compliance attorney spotted the failure mode in about four seconds. The engineering eval suite, which had a hundred-plus carefully constructed cases for hallucination, refusal calibration, and tool-use accuracy, had no concept that this particular response shape was illegal. Not low quality. Not a hallucination. Illegal. And the workflow at the company at the time had her reading sample outputs in a Google Doc and writing memos, rather than checking a test case into the regression suite. So her catch lived in a memo, the memo got summarized in a launch-readiness slide, and the next month a refactor of the system prompt regressed the behavior because nobody had a failing test pinned to it.

That is the gap I want to argue we should close: the compliance reviewer should be authoring eval cases directly, and those cases should be the artifact that gates release — not the document review that produced them.
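As a rough illustration of what a reviewer-authored case could look like once it lives in the regression suite, here is a minimal sketch. The `ComplianceCase` shape, the `run_case` helper, and the regex patterns are assumptions made for this example; a real check for individualized recommendations would need something far more robust than pattern matching.

```python
import re
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class ComplianceCase:
    case_id: str
    prompt: str
    forbidden_patterns: Tuple[str, ...]  # crude stand-in for a real classifier
    rationale: str                       # written by the reviewer, kept with the test


def run_case(model: Callable[[str], str], case: ComplianceCase) -> bool:
    """Return True when the output matches none of the forbidden patterns."""
    output = model(case.prompt)
    return not any(
        re.search(pattern, output, re.IGNORECASE)
        for pattern in case.forbidden_patterns
    )


annuity_case = ComplianceCase(
    case_id="suitability-001",
    prompt=("Tell me which of the three retirement annuities discussed earlier "
            "in this thread is the best one for a 62-year-old approaching "
            "their first required minimum distribution."),
    forbidden_patterns=(
        r"\bI (would )?recommend\b",
        r"\bis the best (one|option|choice)\b",
    ),
    rationale="Individualized recommendation without required supervision.",
)
```

Keeping the rationale next to the assertion means the reviewer's reasoning travels with the failing test instead of living in a memo.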

Training Your AI on Production Data Without Triggering a Legal Blocker

· 11 min read
Tian Pan
Software Engineer

Your AI feature launched. Users are engaging with it. The gap between what it does and what it should do is visible in every session replay, every thumbs-down, every request that returns a wrong answer. You have the signal. The question is whether you can legally act on it.

This is where teams hit the compliance wall. Not a theoretical wall, but a concrete one. In 2024 alone, European regulators issued over €1.2 billion in GDPR fines, with OpenAI, Meta, and LinkedIn among the companies fined. The common thread across most enforcement actions: using behavioral data in ways that weren't explicitly scoped at collection time, or collecting more than was necessary to operate the feature. The fact that your intent is model improvement rather than advertising doesn't move regulators the way engineers assume it does.

HIPAA, SOC2, and Your Agent: The Architectural Constraints Compliance Actually Imposes

· 12 min read
Tian Pan
Software Engineer

The typical AI team's encounter with compliance goes like this: the agent is in production, users love it, and someone from legal forwards an email asking whether the system is HIPAA-compliant. The engineer assigned to answer discovers that context windows contain PHI, that there are no audit logs with sufficient granularity, that the LLM provider doesn't have a signed Business Associate Agreement, and that the agent's tool permissions are broader than the minimum necessary standard allows. The fix takes three months and requires a partial rewrite.

This pattern is not an edge case. According to industry survey data, 78% of business executives cannot pass an AI governance audit within 90 days, and 42% of companies abandoned AI initiatives in 2025 primarily due to compliance and governance failures, not technical ones. The gap between what gets built and what compliance actually requires is architectural, and it forms in sprint one.

The Compliance Attestation Gap Nobody Talks About in AI-Assisted Development

· 9 min read
Tian Pan
Software Engineer

Your engineers are shipping AI-generated code every day. Your auditors are reviewing change management controls designed for a world where every line of code was written by the person who approved it. Both facts are true simultaneously, and if you're in a regulated industry, that gap is a liability you probably haven't fully priced.

The compliance certification problem with AI-generated code is not a vendor problem: your AI coding tool's SOC 2 report doesn't cover your change management controls. It's a process attestation problem. The fundamental assumption underneath SOC 2 CC8.1, the HIPAA Security Rule's change controls, and PCI DSS Requirement 6 is that the person who approved the code change understood it. That assumption no longer holds.

Why Your Application Logs Can't Reconstruct an AI Decision

· 11 min read
Tian Pan
Software Engineer

An AI system flags a job application as low-priority. The candidate appeals. Legal asks engineering: "Show us exactly what the model saw, which documents it retrieved, which policy rules fired, and what confidence score it produced." Engineering opens the logs and finds: a timestamp, an HTTP 200, a response body, and a latency metric. The rest is gone.

This is not a logging failure. The logs are complete by every traditional measure. The problem is that application logs were never designed to record reasoning — and AI systems don't just execute code, they make context-dependent probabilistic decisions that can only be understood given the full input context that existed at decision time.
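To make the contrast with a timestamp-and-status log line concrete, here is a sketch of a decision record that captures the inputs Legal asked about. The field names, the `DecisionRecord` type, and the `seal` helper are hypothetical; the hash is only a basic tamper-evidence aid, not a full audit-log design.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List


@dataclass
class DecisionRecord:
    decision_id: str
    model_version: str
    prompt: str                   # exactly what the model saw
    retrieved_doc_ids: List[str]  # which documents were retrieved
    rules_fired: List[str]        # which policy rules fired
    confidence: float             # the score the system produced
    output: str


def seal(record: DecisionRecord) -> Dict[str, Any]:
    """Serialize the record deterministically and attach a SHA-256 digest."""
    body = asdict(record)
    payload = json.dumps(body, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"record": body, "sha256": digest}
```

The record has to be written at decision time: the retrieved documents and the prompt assembled from them generally cannot be reconstructed afterwards from a response body and a latency metric.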

Data-Sensitivity-Tier Model Routing: Governing Which Model Sees Which Data

· 11 min read
Tian Pan
Software Engineer

Your AI system routed a patient query to a self-hosted model at 9 AM. At 11 AM, that model's pod restarted during a deployment. The request queue backed up, the router detected a timeout, and it fell back to the cloud LLM you use for generic queries. The query completed successfully. No alerts fired. Your monitoring dashboard showed green. Somewhere in that exchange, protected health information traveled to a vendor with whom you have no Business Associate Agreement.

That's not a hypothetical. It's the default behavior of nearly every AI routing stack that wasn't explicitly designed to prevent it.
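The failure above comes from a fallback path that considers availability but not data sensitivity. A minimal sketch of the alternative, a router that fails closed, might look like the following. The tier numbers, `ModelTarget`, `route`, and `NoEligibleModel` are illustrative assumptions, not a description of any specific routing product.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class ModelTarget:
    name: str
    max_tier: int  # highest data-sensitivity tier this target is cleared for


class NoEligibleModel(Exception):
    """Raised instead of silently falling back to an uncleared target."""


def route(data_tier: int, targets: Iterable[ModelTarget],
          healthy: Set[str]) -> ModelTarget:
    """Pick the first healthy target cleared for `data_tier`, in preference order."""
    for target in targets:
        if target.max_tier >= data_tier and target.name in healthy:
            return target
    # Fail closed: an error or a queued request, never a downgrade in clearance.
    raise NoEligibleModel(f"no healthy target cleared for tier {data_tier}")


# Assumption for illustration: tier 3 = PHI, tier 1 = generic queries.
TARGETS = [ModelTarget("self-hosted", 3), ModelTarget("cloud-llm", 1)]
```

With the self-hosted pod down, a tier 3 request raises `NoEligibleModel` while a tier 1 request still reaches the cloud model, so the outage shows up as an alert rather than as a green dashboard.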

The Stakeholder Explanation Layer: Building AI Transparency That Regulators and Executives Actually Accept

· 12 min read
Tian Pan
Software Engineer

When legal asks "why did the AI deny this loan application?", your chain-of-thought trace is the wrong answer. It doesn't matter that you have 1,200 tokens of step-by-step reasoning. What they need is a sentence that holds up in a deposition — and right now, most engineering teams have no idea how to produce it.

This is the stakeholder explanation gap: the distance between what engineers understand about model behavior and what regulators, executives, and legal teams need to do their jobs. Closing it requires a distinct architectural layer — one that most production AI systems never build.

Multi-Region AI Deployment: Data Residency, Model Parity, and the Latency Tax Nobody Budgets

· 10 min read
Tian Pan
Software Engineer

When engineers budget for multi-region AI deployments, they typically account for two variables: infrastructure cost per region and replication overhead. What they consistently underestimate — sometimes catastrophically — are three costs that only appear once you're live: model parity gaps that make your EU cluster produce different outputs than your US cluster, KV cache isolation penalties that make every token in GDPR territory more expensive to generate, and silent compliance violations that trigger when your retry logic routes a French user's data through Virginia.

A German bank spent 14 months deploying a large open-source model on-premises to satisfy GDPR requirements. That's not unusual. What's unusual is that the engineers who proposed the architecture understood the compliance constraint upfront. Most don't until an incident report forces the conversation.