Multi-Axis Agent Bisection: When the Regression Lives in the Interaction
Quality regressed overnight. The on-call engineer pulls up the dashboard, traces a few bad sessions, and starts the obvious bisection: the model provider rotated to a new snapshot at 02:00 UTC, so revert to the pinned older alias. Eval suite still red. Roll back yesterday's prompt change. Still red. Pin the retrieval index back to last week's version. Still red. Each owning team rolls back their own axis in isolation and reports "not us." Three hours in, nobody owns the diagnosis because nobody owns the interaction surface where the regression actually lives — the new model interpreting the new tool description in a way the old model never would have.
This is the failure mode single-axis tooling can't solve. git bisect works because the search space is one-dimensional: a linear sequence of commits. An agent doesn't have one timeline. It has four or five timelines running in parallel — model snapshot, system prompt, tool catalog, retrieval index, sampling config — each with its own owner, its own deploy cadence, and its own "rollback" button that returns just its axis to a known state. The regression you're chasing is often a two-factor interaction, and bisecting along any single axis returns false negatives because the bug only fires on the cross-product cell where the new model meets the new tool description.
