Skip to main content

3 posts tagged with "audit"

View all tags

The Compliance Audit That Asked Which Model Produced Which Output

· 10 min read
Tian Pan
Software Engineer

The auditor's question sounds simple. She has your appeals log open, points at a row from eight months ago, and asks which model decided that case. Your engineer pulls up the schema: there is a model column, and every decision in the audit window says v1. Then someone from the platform team mentions, almost in passing, that the alias behind v1 rotated four times during the audit period — a base model upgrade, a fine-tune refresh, a vendor-side capacity move, and one rollback that lasted six hours during an incident. The honest answer is that you cannot say which checkpoint produced that decision. The auditor writes something down. That phrase is not a regulator-acceptable answer, and you have just learned that the system you shipped has been failing an audit requirement it was never designed to meet.

The gap here is not a missing log line. The gap is between two different ideas of what "model" means. To the engineers shipping the system, v1 is an endpoint — a stable contract callers can point at while the thing behind it gets upgraded for free. To the auditor, "the model that produced this decision" is a specific artifact: a weight checkpoint, a hash, a thing you could in principle re-run on the same input and get a defensibly similar output. Endpoint aliases were invented to hide checkpoint rotation from callers. Audit-grade provenance demands the opposite — that every decision be attributable to exactly the checkpoint that produced it. The two ideas were on a collision course from the start; the audit just happened to be where they met.

The Ghost Employee in Your Audit Log: Agents With Borrowed Credentials Break IAM

· 10 min read
Tian Pan
Software Engineer

Pull up your SSO logs from this morning. Every Slack message, every GitHub PR, every calendar invite, every CI run, every Jira comment your AI agent produced — they all show the same thing the human-typed events show: a person's name, a session token, a green "successful authentication" line. Forensically, you have no way to tell which actions came from a human and which came from an agent the human launched and walked away from. That is the ghost employee problem, and almost every team that shipped agents in the last twelve months has it.

The shortcut that creates the problem is structural, not negligent. When you wire an agent into a tool, the easiest credential is the one already in the engineer's environment — their personal access token, their OAuth session, their device-bound SSO cookie. The alternative is a platform project: provision a first-class identity, federate it across every downstream service, wire it into the audit pipeline, build per-instance revocation. None of that ships in a sprint, and none of it shows up on a feature roadmap. So the agent borrows.

The Contestability Gap: Engineering AI Decisions Your Users Can Actually Appeal

· 11 min read
Tian Pan
Software Engineer

A user opens a chat, asks for a refund, gets "I'm sorry, this purchase is not eligible for a refund," closes the tab, and never comes back. Internally, the agent emitted a beautiful trace: tool calls, intermediate reasoning, the policy bundle it consulted, the model version it ran on. Every span landed in the observability platform. None of it landed anywhere the user could reach. There is no button labeled "ask a human to look at this again," and even if there were, there is no service behind it. The decision is final by default, not by design.

This is the contestability gap, and it is the next thing regulators, lawyers, and angry users are going to rip open. It is also one of the cleanest examples of a problem that looks like policy from the outside and turns out to be plumbing on the inside.