HIPAA, SOC2, and Your Agent: The Architectural Constraints Compliance Actually Imposes
The typical AI team's encounter with compliance goes like this: the agent is in production, users love it, and someone from legal forwards an email asking whether the system is HIPAA-compliant. The engineer assigned to answer discovers that context windows contain PHI, that there are no audit logs with sufficient granularity, that the LLM provider doesn't have a signed Business Associate Agreement, and that the agent's tool permissions are broader than the minimum necessary standard allows. The fix takes three months and requires a partial rewrite.
This pattern is not an edge case. In one 2024 industry survey, 78% of business executives said they could not pass an AI governance audit within 90 days, and 42% of companies abandoned AI initiatives in 2025 primarily due to compliance and governance failures — not technical ones. The gap between what gets built and what compliance actually requires is architectural, and it forms in sprint one.
What Compliance Actually Looks At
HIPAA's Security Rule and SOC2 Trust Services Criteria both apply to AI systems that access, process, or transmit Protected Health Information — including PHI that is present only transiently in a context window. "The model never stored the data" is not a defense. If PHI was in the context during inference, the system is in scope.
The January 2025 HIPAA Security Rule NPRM — the first major update in 20 years, proposed for finalization in mid-2026 — makes this explicit. AI software that creates, receives, maintains, or transmits ePHI must be inventoried as a technology asset, included in the organization's annual risk analysis, and subject to mandatory compliance audits. The proposed rule also eliminates the distinction between "required" and "addressable" safeguards, converting most previously optional controls into requirements.
SOC2 Type II audits evaluate operating effectiveness over a 6-to-12-month observation window, not just point-in-time. This means controls need to have been in place and functioning throughout the audit period — you cannot retroactively instrument an agent to produce evidence. If your logs don't exist from month one, no amount of retrofitting satisfies a Type II reviewer.
The Six Mistakes Teams Make Before the First Audit
Context windows retain PHI past task lifetime
An agent processes a patient record during inference. The reasoning trace — which includes exact field values, medication lists, diagnosis codes — persists in a vector store with no TTL. When the next user submits a request, the agent may pull that trace as relevant context.
HIPAA's minimum necessary standard (45 CFR 164.502(b)) requires that access to PHI be limited to what's necessary for each specific use. An agent optimized to pull broad context for better responses is structurally in tension with this. The minimum necessary standard applies at the operation level: finishing a task does not authorize continued retention of the data used to complete it.
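One way to close this gap is to give working memory a hard lifetime. Below is a minimal sketch, assuming an in-process store; the class and method names are illustrative, not any framework's API:

```python
# Task-scoped retention: reasoning traces carry a hard TTL and are swept on
# every read, so nothing written during one task can surface as "relevant
# context" in the next. All names here are illustrative.
import time
from dataclasses import dataclass, field

@dataclass
class EphemeralTraceStore:
    ttl_seconds: float = 300.0                    # retention bound per task
    _entries: dict = field(default_factory=dict)  # trace_id -> (expiry, text)

    def put(self, trace_id: str, trace: str) -> None:
        self._entries[trace_id] = (time.monotonic() + self.ttl_seconds, trace)

    def get(self, trace_id: str) -> str | None:
        self._sweep()
        entry = self._entries.get(trace_id)
        return entry[1] if entry else None

    def end_task(self, trace_id: str) -> None:
        # Explicit teardown: finishing the task revokes retention immediately,
        # rather than waiting for the TTL to expire.
        self._entries.pop(trace_id, None)

    def _sweep(self) -> None:
        now = time.monotonic()
        self._entries = {k: v for k, v in self._entries.items() if v[0] > now}
```

The same property can be enforced on a real vector store by attaching expiry metadata at write time and filtering expired entries at query time.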
Tool permissions are system-level, not operation-level
An agent is provisioned with "read access to the patient database." In practice, this means it can read any patient record in the database. The permission decision was made at the integration layer, not the operation layer.
A 2024 OCR enforcement action resulted in a $1.19 million settlement against a covered entity for exactly this pattern on a non-AI system: an account provisioned to read a folder that automatically inherited permissions to download files, trigger bulk actions, and move records. When the same design appears in an AI agent — where the model can request any record it finds relevant — the violation scope grows with every inference.
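A minimal sketch of what moving the decision to the operation layer can look like; the database shape, task context, and check are assumptions for illustration:

```python
# Operation-level authorization: each read is checked against the patient
# bound to the current task, instead of relying on a connection-wide grant.
# All names are illustrative.
def read_record(db: dict, task_context: dict, patient_id: str) -> dict:
    if patient_id != task_context["bound_patient_id"]:
        # The model may consider this record "relevant", but the check refuses
        # anything outside the current task's authorization.
        raise PermissionError("record is outside this operation's authorized scope")
    return db[patient_id]

db = {"p-100": {"name": "A. Patient"}, "p-200": {"name": "B. Patient"}}
print(read_record(db, {"bound_patient_id": "p-100"}, "p-100"))  # allowed
# read_record(db, {"bound_patient_id": "p-100"}, "p-200")       # raises
```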
No audit trail for agent decisions
A billing agent denies an insurance claim based on an automated policy evaluation. There is no log of which records the agent accessed, which policy version applied, what the decision inputs were, or who authorized the agent to make that call. "The agent ran at 2:30 PM" is in the application log. Everything else is missing.
HIPAA's Audit Controls standard (45 CFR 164.312(b)) requires mechanisms that record and examine activity in systems that contain or use ePHI — down to the individual operation. SOC2's Processing Integrity criterion requires evidence that AI outputs are correct and controlled. Auditors will request a log showing all decisions made by AI agents affecting customers. If it doesn't exist, that finding is critical — not informational.
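As a sketch, a decision record carrying the fields an auditor is likely to ask for could look like this; the field names are illustrative rather than drawn from any particular standard:

```python
# One structured record per agent decision: what ran, what it touched, which
# policy applied, and who authorized it. Field names are illustrative.
import json
import uuid
from datetime import datetime, timezone

def log_agent_decision(action, records_accessed, policy_version,
                       decision_inputs, outcome, authorized_by):
    record = {
        "event_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,                      # what the agent did
        "records_accessed": records_accessed,  # every record it touched
        "policy_version": policy_version,      # exact policy version applied
        "decision_inputs": decision_inputs,    # inputs the decision relied on
        "outcome": outcome,                    # the decision itself
        "authorized_by": authorized_by,        # who granted the authority
    }
    print(json.dumps(record))  # in production: an append-only audit sink
    return record

log_agent_decision(
    action="claim_evaluation",
    records_accessed=["claim:4417", "policy_doc:v12"],
    policy_version="v12",
    decision_inputs={"claim_amount": 1250, "coverage_limit": 1000},
    outcome="denied",
    authorized_by="billing policy v12 rollout, approved by compliance",
)
```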
Ephemeral and durable memory use the same storage
A conversational agent's working memory — current task state, retrieved context snippets, intermediate reasoning — gets written to the same persistent database as long-term user preferences. Everything from every session accumulates indefinitely.
When a patient submits a request to delete their data, or when a SOC2 auditor asks how sensitive data is retained and deleted, there is no clean answer. Ephemeral data was never supposed to persist, but there is no technical separation to enforce that intention.
Separation of duties violations in access management
An agent with administrative scope grants a new assistant access to a data store after analyzing the assistant's function requirements. No human approved the access change. The agent made an access management decision autonomously.
Separation of duties is a core SOC2 control under the Security criterion. Autonomous access management by AI agents — regardless of how well-reasoned the decision — is an automatic audit finding. The question auditors ask is not "was the decision correct?" but "who authorized it?"
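A minimal sketch of the gate, with illustrative names: the agent can propose an access change, but nothing executes until a distinct human approver signs off.

```python
# Separation of duties: proposals and approvals are different roles, and the
# grant refuses to execute without a human approver distinct from the
# requesting agent. All names are illustrative.
from dataclasses import dataclass

@dataclass
class AccessChangeRequest:
    requested_by: str               # the agent proposing the change
    principal: str                  # who would receive access
    resource: str
    justification: str
    approved_by: str | None = None  # filled in by the human review step

def apply_access_change(request: AccessChangeRequest) -> None:
    if request.approved_by is None:
        raise PermissionError("access change requires human approval")
    if request.approved_by == request.requested_by:
        raise PermissionError("approver must differ from requester")
    # ... call the real access-management system here ...
    print(f"granted {request.principal} access to {request.resource}, "
          f"approved by {request.approved_by}")

req = AccessChangeRequest(requested_by="agent:provisioner",
                          principal="assistant-7",
                          resource="claims-db",
                          justification="read access for claim triage")
req.approved_by = "security-team reviewer"
apply_access_change(req)
```

This answers the auditor's actual question: every grant has a named human authorizer on record.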
No out-of-scope query detection
A patient-facing agent is authorized to handle billing questions. A user asks it to retrieve full psychiatric history for a research purpose. The agent complies, because the underlying model has the database credentials and a broad system prompt.
Under HIPAA, each use of PHI requires independent justification tied to the covered entity's permitted purpose. "The user asked and the agent could do it" is not a permitted purpose. Agents must detect when requests exceed their authorization scope and escalate rather than proceed.
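A sketch of scope enforcement ahead of tool dispatch, assuming a stub intent classifier; in practice the classifier would be a rules layer, a separate model call, or both, and all names here are illustrative:

```python
# Scope enforcement before dispatch: authorized purposes are declared up
# front, and anything outside them escalates instead of executing, even
# though the underlying credentials would allow it.
AUTHORIZED_PURPOSES = {"billing_inquiry", "payment_status", "invoice_copy"}

def classify_purpose(user_request: str) -> str:
    # Stub: a stand-in for a real intent classifier.
    if "psychiatric" in user_request or "research" in user_request:
        return "clinical_records_research"
    return "billing_inquiry"

def handle(user_request: str) -> str:
    purpose = classify_purpose(user_request)
    if purpose not in AUTHORIZED_PURPOSES:
        return f"escalated to human review: '{purpose}' exceeds agent scope"
    return f"proceeding with {purpose}"

print(handle("Can you send me a copy of my last invoice?"))
print(handle("Pull my full psychiatric history for a research project."))
```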
Architectural Patterns That Satisfy an Actual Audit
Separate ephemeral and durable memory explicitly
The fundamental compliance pattern is a strict boundary between working memory and persistent storage. Ephemeral memory holds the current task's context: retrieved documents, intermediate reasoning, tool call results. It is never written to permanent storage; it expires when the task ends.
Durable memory holds facts that are explicitly promoted with governance: user preferences, long-term state, historical decisions. Promotion is an operation, not an automatic byproduct of inference.
Frameworks like LangChain and Semantic Kernel both support multiple memory backends. Using one backend for working context and a separate, governed backend for durable facts — with TTLs and deletion workflows — satisfies HIPAA's minimum necessary standard because PHI that was needed for one task does not persist to be accessed during the next.
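A framework-agnostic sketch of the split, with illustrative class names; in LangChain or Semantic Kernel the same shape maps onto two separately configured memory backends:

```python
# Two memory tiers with different rules. Working memory dies with the task;
# durable memory is written only through an explicit, attributable promote()
# call and supports deletion. All names are illustrative.
class WorkingMemory:
    def __init__(self) -> None:
        self._items: list[str] = []

    def add(self, item: str) -> None:
        self._items.append(item)

    def clear(self) -> None:
        # Called unconditionally at task end, success or failure.
        self._items.clear()

class DurableMemory:
    def __init__(self) -> None:
        self._facts: dict[str, str] = {}

    def promote(self, key: str, fact: str, approved_by: str) -> None:
        # Promotion is a governed operation with a named authorizer, not an
        # automatic byproduct of inference.
        print(f"promoting '{key}', approved by {approved_by}")
        self._facts[key] = fact

    def delete(self, key: str) -> None:
        # Deletion workflow hook: durable facts can be removed on request.
        self._facts.pop(key, None)
```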
Operation-level tool permissions with metadata-driven dispatch
Instead of giving an agent a broad set of tools and trusting the model to use them appropriately, define tool permissions in a registry with explicit scope, read-only flags, and approval requirements:
```yaml
read_patient_summary:
  scope: current_patient
  fields: [name, dob, current_medications]
  readonly: true
  requires_approval: false

download_full_chart:
  scope: current_patient
  fields: ["*"]
  readonly: true
  requires_approval: true
  approval_timeout: 300

schedule_appointment:
  scope: current_patient
  fields: [calendar, appointment_history]
  readonly: false
  requires_approval: false
```
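A minimal dispatch layer over that registry might look like the following sketch (abridged to two tools; the function and its checks are assumptions, not a specific framework's API):

```python
# Metadata-driven dispatch: every tool call is validated against its declared
# field allowlist and approval flag before anything executes. The registry
# mirrors the YAML above; names are illustrative.
TOOL_REGISTRY = {
    "read_patient_summary": {"fields": ["name", "dob", "current_medications"],
                             "readonly": True, "requires_approval": False},
    "download_full_chart":  {"fields": ["*"], "readonly": True,
                             "requires_approval": True, "approval_timeout": 300},
}

def dispatch(tool: str, requested_fields: list[str],
             approval_token: str | None = None) -> str:
    meta = TOOL_REGISTRY.get(tool)
    if meta is None:
        raise PermissionError(f"unknown tool: {tool}")
    allowed = meta["fields"]
    if "*" not in allowed and any(f not in allowed for f in requested_fields):
        raise PermissionError(f"{tool}: requested fields exceed the allowlist")
    if meta["requires_approval"] and approval_token is None:
        raise PermissionError(f"{tool}: human approval required before dispatch")
    # ... execute the underlying operation here ...
    return f"{tool} executed for fields {requested_fields}"

print(dispatch("read_patient_summary", ["name", "dob"]))
```

The permission decision now happens per operation: the model never holds raw credentials, and every call either satisfies the registry's declared constraints or fails closed.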
