The Prompt Graph Inside Your Agent: Cross-Prompt Regression Chains Nobody Mapped
A senior engineer ships a four-word edit to the planner prompt: "if uncertain, ask first." The planner's own eval set, which grades whether plans are reasonable, moves up by half a point. They merge. Two weeks later, the verifier's eval shows a three-point pass-rate regression and nobody can repro it. The root cause turns out to be a chain. The planner now asks more clarifying questions. The executor receives shorter task descriptions on the second turn. The verifier's rubric was implicitly tuned against the executor's previous, longer outputs. An edit nobody flagged as risky has shifted three downstream distributions at once.
This is what happens when you treat the prompts inside an agent as a flat folder of files instead of as a graph with edges. The prompts have owners. The edges between them have nobody.
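
To make the "graph with edges" framing concrete, here is a minimal sketch in Python. It assumes the three prompts from the story are wired in a straight line (planner feeds executor, executor feeds verifier). The names `PROMPT_GRAPH` and `downstream` are hypothetical, chosen only for illustration, and a real agent's edges would likely be messier than this.

```python
from collections import deque

# Hypothetical edge list: each prompt maps to the prompts that consume its output.
# The linear planner -> executor -> verifier wiring is an assumption based on the
# anecdote above, not a description of any particular agent.
PROMPT_GRAPH = {
    "planner": ["executor"],
    "executor": ["verifier"],
    "verifier": [],
}


def downstream(graph, edited):
    """Return every prompt reachable from `edited`, in breadth-first order."""
    seen = []
    queue = deque(graph.get(edited, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited via another path
        seen.append(node)
        queue.extend(graph.get(node, []))
    return seen


# Which evals should be re-run when the planner prompt changes?
affected = downstream(PROMPT_GRAPH, "planner")
print(affected)  # ['executor', 'verifier']
```

Even this toy version answers the question the merge skipped: an edit to the planner touches the verifier, two hops away.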
