The Audit Trail Mismatch: When User, Agent, and Tool Each Have Different Logs
A regulator emails you a single question: did this user authorize this transaction? Six hours later, three engineers are in a chat trying to join the chat surface's conversation log to the planner agent's reasoning trace to the tool's API record. The chat log has a turn ID and the user-visible message but no tool call detail. The planner trace has a tool-invocation record with timestamps that drift from the chat log by several hundred milliseconds. The tool's log has the API call with its own correlation ID that appears nowhere in the agent's record. The downstream service's log has yet another ID with no link back. The team eventually reconstructs the answer by joining on user IDs and approximate timestamps, hopes nothing critical is off by a turn, and ships a PDF to legal.
This is the audit trail mismatch. Every layer's owner believes their logs are fine — and individually, they are. The joined view is the artifact that doesn't exist, and nobody owns its absence. The team only finds out it doesn't exist when an incident, a customer escalation, or a regulator forces the join.
The reason this slips is that "the agent" is a pipeline of independent systems, each with its own observability story and each instrumented by a different team at a different time. The chat surface ships a product log focused on user-visible content. The agent runtime ships a developer log focused on planner steps and prompt construction. The tool ships an API log focused on requests and responses. The downstream service ships an integration log focused on its own SLOs. Three of those four logs predate the agent by years, and none of them were designed to participate in a single end-to-end story.
The four logs that don't agree
To make the failure concrete, walk a single user action through a typical agent stack and look at what each layer captures.
A user types "send the Q2 report to my CFO" into a chat surface. The chat log records a turn ID, the user's text, the assistant's user-visible response, and a session ID. It does not record that the model emitted a tool call, and it cannot — the chat log was written before tool calls existed and treats them as opaque internal traffic.
The agent runtime log records a planner trace: the system prompt, the assembled context, the tool the planner picked, the arguments it constructed, and the response it summarized. It has its own request ID. The link back to the chat surface is "user ID + approximately the same timestamp," because the chat surface didn't pass a correlation header through.
The tool log records an API invocation: a method name, parameters, an HTTP status, latency, a tenant ID, and the tool's own correlation ID. The agent runtime called the tool through an SDK that doesn't propagate the runtime's request ID, so the tool log can be joined to the agent log only by tenant and timestamp.
The downstream service log — the email provider that actually sent the report — records a message ID, a recipient, an attachment hash, and a sent-at time. The tool sent the request through a vendor SDK that does not forward upstream IDs. The email provider stamps a fresh message ID at the boundary.
Four logs. Four ID schemes. Zero overlap. Joining them requires (user ID OR tenant ID) AND timestamp-with-tolerance — a query that returns the wrong row about as often as a flaky integration test, and silently. The audit story for "did this user authorize this transaction" is reconstructed by humans cross-referencing windows of activity, and it gets worse every time the user's session has more than one tool call in it.
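The fragility of that join is easy to demonstrate. The sketch below is illustrative, not any real schema: two hypothetical log streams share only a user ID, so the only available join is "same user, timestamp within tolerance." The moment a session contains two tool calls inside the tolerance window, the query returns multiple candidates and a human has to guess.

```python
# Hypothetical log rows; field names and IDs are illustrative, not a real schema.
from datetime import datetime, timedelta

chat_log = [  # chat surface: turn ID and session context, no tool detail
    {"turn_id": "t-101", "user_id": "u-7", "ts": datetime(2024, 4, 2, 14, 0, 1)},
    {"turn_id": "t-102", "user_id": "u-7", "ts": datetime(2024, 4, 2, 14, 0, 5)},
]
agent_log = [  # agent runtime: its own request ID, no turn ID
    {"request_id": "r-900", "user_id": "u-7", "tool": "email.send",
     "ts": datetime(2024, 4, 2, 14, 0, 2)},
    {"request_id": "r-901", "user_id": "u-7", "tool": "report.fetch",
     "ts": datetime(2024, 4, 2, 14, 0, 3)},
]

def join_by_window(turn, agent_rows, tolerance=timedelta(seconds=2)):
    """The only join available: same user, timestamp within tolerance."""
    return [r for r in agent_rows
            if r["user_id"] == turn["user_id"]
            and abs(r["ts"] - turn["ts"]) <= tolerance]

# Turn t-101 matches BOTH agent records: ambiguous, and silently so.
print(len(join_by_window(chat_log[0], agent_log)))  # 2
```

The failure mode is not an error but an extra row: nothing in the result signals that the join was ambiguous, which is exactly why the reconstruction "gets worse every time the user's session has more than one tool call in it."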
Three facts that need to be in one record
The fix is not "more logging." Each layer is already logging plenty. The fix is to recognize that an agent action carries three distinct facts that compliance review needs to recover together, and to design the audit record around those facts rather than around what was easy for each subsystem to emit.
The three facts are: what the user asked for, what the agent decided to do about it, and what the tool actually executed. They are not the same statement. The user asked for "send the Q2 report to my CFO." The agent decided to call email.send with a particular attachment, recipient, and subject derived from context. The tool executed an HTTP POST that produced a message ID and was processed by a downstream provider that may or may not have honored every header.
Logs that capture only one of those facts (the user's text, or the planner's intent, or the API's effect) are individually true and collectively useless for answering an authorization question. A regulator cares about the join: the user requested X, the system inferred X', the world received X''. The delta between X and X'' is where the compliance question lives, and it cannot be detected when each layer's log only knows one of the three.
A workable audit record for an agent action is a single document that contains all three facts, written once at the moment the action completes and indexed by a transaction ID minted at the user-action boundary. The chat layer, the agent layer, the tool layer, and the downstream layer each log what they always logged, but each also stamps the transaction ID into its own log, and a dedicated audit pipeline assembles the three-fact record by reading the transaction's authoritative trace rather than joining four heterogeneous log streams after the fact.
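A minimal sketch of that record, assuming each layer can be taught to stamp one shared ID into its own log. All names here (`mint_transaction_id`, the row shapes, the field names) are hypothetical; the point is the shape of the document, not a specific schema.

```python
# A minimal sketch, assuming each layer can emit one shared transaction ID.
# All function and field names are illustrative, not a real API.
import json
import uuid

def mint_transaction_id() -> str:
    # Minted exactly once, at the user-action boundary (the chat surface).
    return f"txn-{uuid.uuid4()}"

def build_audit_record(txn_id, chat_row, agent_row, tool_row):
    """Assemble the three facts into one document, written once on completion."""
    return {
        "transaction_id": txn_id,
        "user_request": chat_row["user_text"],      # fact 1: what was asked
        "agent_decision": {                          # fact 2: what was decided
            "tool": agent_row["tool"],
            "arguments": agent_row["arguments"],
        },
        "tool_execution": {                          # fact 3: what actually ran
            "status": tool_row["status"],
            "provider_message_id": tool_row["message_id"],
        },
    }

txn = mint_transaction_id()
# Each layer keeps its own log but stamps the shared transaction ID into it.
chat_row  = {"transaction_id": txn, "user_text": "send the Q2 report to my CFO"}
agent_row = {"transaction_id": txn, "tool": "email.send",
             "arguments": {"recipient": "cfo@example.com", "attachment": "q2.pdf"}}
tool_row  = {"transaction_id": txn, "status": 200, "message_id": "msg-abc123"}

record = build_audit_record(txn, chat_row, agent_row, tool_row)
print(json.dumps(record, indent=2))
```

With the shared ID in place, the audit pipeline's "join" degenerates into an exact-match lookup on one key, and the delta between what the user asked for and what the provider received is readable from a single document instead of reconstructed from four.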
