The Trace Timeline Whose Timestamps Were Stamped by the Client Clock, Not the Gateway
You opened the trace for a slow conversation. The model call started 800 milliseconds before the user pressed send. You blamed the user's laptop, closed the tab, moved on.
That is not one user with a bad clock. That is roughly a third of your traffic, and every debug session that crosses the client boundary is reading a timeline that does not exist. Browser clocks are user-settable, frequently unsynchronized, and occasionally wrong by days. The instrumentation SDK that ships with most observability stacks stamps client spans with whatever the device reports, links them by the trace ID carried in the traceparent header into a tree with server spans stamped by a synchronized server clock, and hands the result to your on-call engineer as if the two halves were comparable. They are not.
The bug here is not that some clocks drift. The bug is that distributed-systems fundamentals — clocks lie, ordering requires explicit synchronization, the authoritative time source is the one you control — were imported into AI observability stacks without their accompanying caveats. Single-process tracing taught a generation of engineers to read a flame graph as if it were ground truth. When the spans came from one process, it was. The moment the first span originates in an unmanaged browser, the flame graph becomes a hypothesis dressed up as a fact.
The Failure Mode Is Inherited, Not Invented
Conversational AI systems span more trust boundaries than the systems the tracing patterns were originally designed for. A single user turn typically touches the browser, an edge worker or CDN, an API gateway, a model provider, a vector store, several internal tools, and a database. Every one of those is in a different administrative domain. The browser's clock is owned by the user. The provider's clock is owned by the provider. Your gateway is the only clock in the path you can authoritatively trust.
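One way to act on that is to re-express client spans on the gateway clock at the moment they arrive. The sketch below is a minimal illustration, not any SDK's behavior: ClientBatch, sentAtClientMs, and restampAtGateway are hypothetical names, and the approach assumes the client records its own clock reading just before upload. The computed offset silently includes one-way network latency, so restamped spans are biased late by roughly the uplink time.

```typescript
// Hypothetical shapes for illustration; not part of any tracing SDK.
interface ClientSpan {
  name: string;
  startMs: number; // epoch ms according to the client clock
  endMs: number;
}

interface ClientBatch {
  sentAtClientMs: number; // client clock reading taken just before upload
  spans: ClientSpan[];
}

interface RestampedSpan {
  name: string;
  startMs: number; // re-expressed on the gateway clock
  endMs: number;
  clientStartMs: number; // original values, kept so the skew stays visible
  clientEndMs: number;
  appliedOffsetMs: number;
}

// Shift every client span by (gateway receive time - client send time).
// Durations are preserved; only the anchor moves to the clock you control.
function restampAtGateway(
  batch: ClientBatch,
  gatewayReceivedMs: number,
): RestampedSpan[] {
  const offsetMs = gatewayReceivedMs - batch.sentAtClientMs;
  return batch.spans.map((s) => ({
    name: s.name,
    startMs: s.startMs + offsetMs,
    endMs: s.endMs + offsetMs,
    clientStartMs: s.startMs,
    clientEndMs: s.endMs,
    appliedOffsetMs: offsetMs,
  }));
}

// Made-up numbers: the client clock reads 5_000 when the gateway reads 1_000_000.
const batch: ClientBatch = {
  sentAtClientMs: 5_000,
  spans: [{ name: "user_pressed_send", startMs: 4_200, endMs: 5_000 }],
};
const restamped = restampAtGateway(batch, 1_000_000);
console.log(restamped[0]);
```

Keeping the original client values and the applied offset next to the restamped ones is the point of the design: the reader of the trace can see that a correction happened and how large it was.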
The standard instrumentation pattern emits a span at each layer with a start_time and end_time from whatever clock the layer happens to have. The OpenTelemetry JavaScript SDK in the browser uses performance.now() as a monotonic source for durations but anchors it to Date.now() for wall-clock timestamps, which means the anchor is exactly as wrong as the user's system clock. After the device sleeps overnight, the anchor doesn't refresh and timestamps continue to drift; there is a long-standing issue thread in the SDK repo documenting spans that materialize hours or days in the past.
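The anchoring arithmetic is easy to reproduce in isolation. What follows is a simplified model, not the OpenTelemetry SDK's actual code: makeAnchoredClock and the simulated device are hypothetical, the 800 ms skew is a made-up number, and the sleep scenario assumes the monotonic source pauses while the device is suspended, which varies by browser and operating system.

```typescript
type Clock = () => number; // returns milliseconds

// Capture one wall-clock reading and one monotonic reading at startup,
// then derive every later timestamp from the monotonic delta.
function makeAnchoredClock(wall: Clock, mono: Clock): Clock {
  const anchorEpochMs = wall();
  const anchorMonoMs = mono();
  return () => anchorEpochMs + (mono() - anchorMonoMs);
}

// Simulated device. trueEpochMs is the time a synchronized server would
// report; the device wall clock is 800 ms behind it.
let trueEpochMs = 1_700_000_000_000;
let monoMs = 0;
const deviceSkewMs = -800;

const deviceWall: Clock = () => trueEpochMs + deviceSkewMs;
const deviceMono: Clock = () => monoMs;

const spanClock = makeAnchoredClock(deviceWall, deviceMono);

// Five seconds pass with the device awake: both clocks advance together,
// so the error stays equal to the skew captured in the anchor.
trueEpochMs += 5_000;
monoMs += 5_000;
const errorAwakeMs = spanClock() - trueEpochMs;

// The device sleeps for eight hours. Real time advances; under the stated
// assumption the monotonic source does not, and the anchor is never refreshed.
const sleepMs = 8 * 60 * 60 * 1000;
trueEpochMs += sleepMs;
const errorAfterSleepMs = spanClock() - trueEpochMs;

console.log({ errorAwakeMs, errorAfterSleepMs });
```

In this model the awake error equals the anchor's skew, and the post-sleep error grows by the full length of the sleep. Spans stamped after wake-up land eight hours in the past.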
The post-hoc fix most backends ship, a clock-skew adjustment that reshuffles child spans into their parent's window, was labeled "considered harmful" by Jaeger's own maintainers. The adjuster assumes parent and child spans have similar durations, which is true for tight RPCs and almost never true for an LLM generation. It hides the underlying problem by silently rewriting timestamps. It depends on undocumented host-identification tags and falls back to compensating nearly all spans when the tags are missing. The on-call engineer reading the cleaned-up trace has lost the signal that the clocks were wrong in the first place.
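To see what that rewriting does to the evidence, consider a deliberately simplified adjuster. This is a sketch in the spirit of such algorithms, not Jaeger's implementation: the Span shape, adjustChild, and the numbers are hypothetical, and real adjusters handle more cases.

```typescript
interface Span {
  name: string;
  startMs: number;
  durationMs: number;
}

interface Adjusted {
  span: Span;
  shiftedByMs: number;
}

// If the child already fits inside the parent window, leave it alone.
// Otherwise move it: center it when it is shorter than the parent,
// or align the two start times when it is longer.
function adjustChild(parent: Span, child: Span): Adjusted {
  const parentEndMs = parent.startMs + parent.durationMs;
  const childEndMs = child.startMs + child.durationMs;
  const fits = child.startMs >= parent.startMs && childEndMs <= parentEndMs;
  if (fits) {
    return { span: child, shiftedByMs: 0 };
  }
  const slackMs = Math.max(0, parent.durationMs - child.durationMs);
  const shiftedByMs = parent.startMs + slackMs / 2 - child.startMs;
  return {
    span: { ...child, startMs: child.startMs + shiftedByMs },
    shiftedByMs,
  };
}

// The opening scenario: the gateway-stamped model call appears to start
// 800 ms before the client-stamped "user pressed send" span.
const userTurn: Span = { name: "user_turn", startMs: 10_000, durationMs: 4_500 };
const modelCall: Span = { name: "model_call", startMs: 9_200, durationMs: 4_000 };

const adjusted = adjustChild(userTurn, modelCall);
console.log(adjusted);
```

The adjusted trace looks plausible: the model call now sits neatly inside the user turn. Nothing in it says the child was moved by more than a second, or that the placement inside the parent is a guess that splits the slack evenly.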
Conversational Traces Make the Problem Worse
Three things about LLM workloads make this failure mode bite harder than it does in classical request-response tracing.
Spans are long enough for skew to be material. A normal RPC trace is dominated by sub-millisecond network hops where a 50ms client clock drift is a rounding error. An agent turn is dominated by a multi-second LLM call followed by tool invocations measured in hundreds of milliseconds each. A 50ms drift in the client-stamped "user pressed send" span is invisible inside a 4-second model call but becomes a debugging tarpit when you're trying to reason about time-to-first-token latency, which is measured at the same granularity as the drift itself.
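The arithmetic behind that claim is short. The sketch below uses the 50 ms drift and 4-second model call from the paragraph above; the 250 ms time-to-first-token figure is an assumed value chosen for illustration, and relativeSkewError is a hypothetical helper. The error only applies to intervals whose two endpoints were stamped by different clocks, such as client send to first token at the gateway.

```typescript
// Fraction of a measured interval that could be pure clock skew,
// when the interval's start and end come from different clocks.
function relativeSkewError(skewMs: number, intervalMs: number): number {
  if (intervalMs <= 0) {
    throw new RangeError("intervalMs must be positive");
  }
  return Math.abs(skewMs) / intervalMs;
}

const skewMs = 50;
const modelCallMs = 4_000; // from the text
const assumedTtftMs = 250; // assumption, not a measured value

const modelCallError = relativeSkewError(skewMs, modelCallMs);
const ttftError = relativeSkewError(skewMs, assumedTtftMs);

console.log(`model call: ${(modelCallError * 100).toFixed(2)}%`);
console.log(`time to first token: ${(ttftError * 100).toFixed(2)}%`);
```

Under these assumptions the same 50 ms is about one percent of the model call and a fifth of the time-to-first-token measurement.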
Causality is harder to read out of the structure alone. In a synchronous RPC tree, the parent-child relationship is enforced by the call graph: parent must start before child, child must end before parent. Agent traces violate this constantly. A speculative tool call kicked off in parallel with model generation has no clean parent-child relationship to the user span. A second-turn assistant response is a child of the previous turn in the conversation but a sibling of it in the trace. The structural backstop the clock-skew-adjuster algorithms rely on doesn't exist.
- https://github.com/open-telemetry/oteps/issues/154
- https://github.com/open-telemetry/opentelemetry-js/issues/1728
- https://github.com/open-telemetry/opentelemetry-js/issues/3279
- https://github.com/open-telemetry/opentelemetry-js/issues/2705
- https://github.com/jaegertracing/jaeger/issues/1459
- https://www.jaegertracing.io/docs/2.15/deployment/configuration/
- https://www.oreilly.com/library/view/mastering-distributed-tracing/9781788628464/ch03s07.html
- https://opentelemetry.io/docs/concepts/signals/traces/
- https://opentelemetry.io/docs/languages/js/getting-started/browser/
- https://en.wikipedia.org/wiki/Real_user_monitoring
