Temperature Governance in Multi-Agent Systems: Why Variance Is a First-Class Budget
Most production multi-agent systems apply a single temperature value—copied from a tutorial, set once, never revisited—to every agent in the pipeline. The classifier, the generator, the verifier, and the formatter all run at 0.7 because that's what the README said. This is the equivalent of giving every database query the same timeout regardless of whether it's a point lookup or a full table scan. It feels fine until you start debugging failure modes that look like model errors but are actually sampling policy errors.
Temperature is not a global dial. It's a per-role policy decision, and getting it wrong creates distinct failure signatures depending on which direction you miss in.
