The Slow Turn That Wasn't Yours: KV Cache Eviction Mid-Conversation
A conversation has been moving along on a single Claude session for forty minutes. Eleven turns, each averaging 800ms time-to-first-token, each cheap because the 28,000-token prefix is hitting the prompt cache. Turn twelve arrives and TTFT is 3.4 seconds. The transcript hasn't changed shape. The model didn't switch. The network is fine. Cached input tokens drop from 27,800 to 0. The next turn's prefill bill is paid in full, from the first token.
You go looking for the cause in your traces and find nothing that names it. There is no event in your logs labeled "another tenant's burst evicted you." The only honest reading of the spike is that some other customer's prompt, somewhere on the same GPU pool, made the scheduler decide your warm prefix was the cheapest thing to drop. You cannot replay the turn. You cannot prove the eviction. The cache state at that moment was a function of strangers' traffic, and that traffic is not in your trace because it was never yours to see.
