The Local-Maximum Trap in Prompt Iteration: How to Tell You're Tweaking the Wrong Thing
There is a moment, six weeks into a serious LLM project, when the prompt iteration log starts to look like a therapy journal. Each tweak swaps one failure mode for another. Add a stricter "do not" clause and the model becomes evasive on cases it used to handle. Soften the tone and a different category of hallucination returns. The eval scoreboard hovers in a band three or four points wide, refusing to break out. Someone says, "let me try one more reordering," and another half day evaporates.
This is the local-maximum trap. The team is climbing a hill, but the hill does not go higher. The cruel part is that the hill is real — every prompt change does produce a measurable delta on some subset of cases, which is exactly the signal that keeps everyone tweaking. What's missing is the recognition that the ceiling above is not a prompt ceiling at all.
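The plateau symptom is easy to check mechanically. Here is a minimal heuristic sketch — the `window` and `band` thresholds are illustrative assumptions, not a prescription — that flags when the last several eval scores all sit inside a narrow band, the signature of tweaks trading failure modes rather than climbing:

```python
def is_plateaued(scores, window=8, band=4.0):
    """Heuristic plateau check: True when the last `window` eval
    scores all fall inside a band of width `band` points.
    Both thresholds are illustrative defaults, not a prescription."""
    if len(scores) < window:
        return False  # not enough history to judge
    recent = scores[-window:]
    return max(recent) - min(recent) <= band

# Ten iterations of prompt tweaks: scores bounce inside a ~3-point band.
log = [71.2, 73.0, 72.1, 74.0, 71.8, 73.5, 72.6, 73.9, 72.2, 73.1]
print(is_plateaued(log))  # → True
```

A check like this belongs in the eval harness, not in anyone's head: it turns "we feel stuck" into a yes/no signal the team can act on before the next half day evaporates.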
