The Supervisor Agent That Rubber-Stamped Its Subagent Because They Shared a Prompt Template
A team I talked to last month was proud of a number: their supervisor agent approved 97% of its subagents' plans on first review. They read that as "the subagents are competent." A red-team review six weeks later read it as "the supervisor and the subagents are the same evaluator scoring its own output." Both readings fit the data. Only one of them was load-bearing in production.
The supervisor-reviews-subagent pattern is the most common shape multi-agent systems take in 2026 — somewhere around 70% of production deployments, including most of the reference designs the big labs publish. It looks like a check on paper. A planner decomposes the task, specialist workers produce plans, a supervisor reviews each plan before authorizing execution. Separation of concerns, clean audit trail, the works. The problem is that if you build the supervisor and the subagents from the same base prompt template — even with role-specific addenda differing by a paragraph — you have not built a check. You have built a system whose review step is an artifact of the same model agreeing with itself.
Why a 97% Approval Rate Should Set Off Alarms, Not Confetti
Start with the number itself. A 97% approval rate is consistent with two very different worlds: one where the subagents are excellent, and one where the reviewer cannot tell good plans from bad. The data alone cannot distinguish them. You need a separate signal — a calibrated baseline — to know which world you are in. Most teams never produce that signal, so they read the high approval rate as a quality metric and ship.
The trap is that the high rate feels like evidence. It comes from a structurally independent review step — a different process, a different invocation, a different role label. The independence is real at the orchestration layer. It is not real at the layer that matters: what the model considers a good plan. Both the proposer and the reviewer share a model, share a training distribution, and share most of a prompt. They are not two judges. They are the same judge looking at the same problem twice.
Recent research on multi-agent committees has a name for the shape this takes inside the system. Across 100 GSM8K questions with three agents instantiated from the same base model under different prompts, the mean cosine similarity of their reasoning was 0.888. The authors call it representational collapse. Different roles, same internal trajectory. Whatever the proposer thinks is a good answer, the reviewer thinks is a good answer, because they are running the same kind of computation on the same input.
The separate literature on LLM-as-a-judge identifies the dual of this from the evaluation side. When a model judges its own outputs, it favors them in ways that do not reduce to genuine quality differences — a self-preference bias linked to self-recognition that persists across architectures. Provider-level family bias compounds the individual effect: a Claude judge on Claude outputs systematically overscores in a way the per-model bias data does not capture, and switching the judge to a different provider closes most of the gap. Your supervisor is doing the same thing on the orchestration layer, except you are not measuring it.
The Plan Space the Subagents Optimize Toward Is the Plan Space the Supervisor Rewards
The mechanism is worth being explicit about. The subagent generates plans by sampling from a model. Conditional on a prompt, the model concentrates probability mass on regions of plan-space that the prompt makes likely. The supervisor scores plans by sampling from the same model, conditional on a near-identical prompt. The supervisor's scoring function is the model's likelihood under the supervisor's prompt. The subagent's plan distribution is the model's likelihood under the subagent's prompt. Two distributions over the same plan-space, derived from the same parameters, conditioned on prompts that share most of their text.
A plan that scores well under the supervisor is, in expectation, a plan the subagent was likely to generate. The subagent's plans land in high-supervisor-likelihood regions of plan-space not because they are good but because they are typical of what this model produces under prompts of this shape. The supervisor's review is not an independent check. It is the same model agreeing with what it already said. The 97% approval rate is not a measurement of quality. It is a measurement of how much prompt mass the two roles share.
This is why "we used a different system prompt for the supervisor" does not save you. The role addendum changes a paragraph. The rubric the supervisor uses, the formatting it expects, the failure modes it knows to look for — all the bulk of the prompt is shared, because you wrote the template once and specialized it twice. The interior of the model's decision is dominated by the shared portion. The role addendum is a hat the same evaluator is wearing.
What an Actually Independent Review Looks Like
- https://arxiv.org/pdf/2511.09710
- https://arxiv.org/html/2503.13657v1
- https://aclanthology.org/2025.emnlp-main.86/
- https://arxiv.org/abs/2410.21819
- https://arxiv.org/pdf/2604.03809
- https://arxiv.org/pdf/2509.05396
- https://www.adaline.ai/blog/llm-as-a-judge-reliability-bias
- https://www.openlayer.com/blog/post/multi-agent-system-architecture-guide
- https://newsletter.systemdesign.one/p/multi-agent-system
