The Subagent That Inherited a System Prompt It Should Not Have Seen

June 2, 2026 · 8 min read

Software Engineer

A planner agent receives a task, decomposes it, and spawns a researcher subagent to handle one of the branches. The orchestration framework propagates the parent's full context to the child because that is the easiest default to ship. The researcher now holds the planner's complete system prompt — the policy text, the names of internal tools, the credentials the parent was scoped to use, the few-shot examples that hint at how your billing pipeline is structured. The researcher's job was to read three documents. The blast radius of the call is the entire authority of the parent.

This is not a hypothetical. It is the default behavior of most multi-agent frameworks shipping in production today. A recent audit found that 93% of agentic projects use unscoped API keys, and that when one agent calls another, the child agent either inherits the parent's full credentials or receives its own independent key — with no project implementing scope narrowing, depth limits, or cascade revocation for delegated access. The framework treats "share parent state" as a convenience and "scope down the child" as opt-in. The opt-in step is the one nobody writes.

The reason this fails quietly is that the audit trail collapses. When the researcher subagent calls a tool, the trace identity is the parent's. The orchestration framework flattens the delegation chain at logging time because the parent owns the session and the child is treated as part of the parent's reasoning step. When the postmortem asks "which agent issued this database write," the answer is the workflow ID. The actual identity — the researcher running with a narrower role specification — was never persisted as a separate principal. The action looks like it came from the planner, which was authorized to do it. The system that approved the action approved the wrong thing.

Context Propagation Is an Implicit Privilege Grant

The first reframing is that propagating context to a subagent is not a UX decision. It is a privilege grant. The contents of a system prompt are not inert text — they are the operational authority of the agent that holds them. They contain tool names the model has learned to call. They contain few-shot trajectories that demonstrate how to construct privileged calls. They contain policy text that, when the child reasons about a task, becomes the child's policy text too.

When the planner's system prompt includes "you have access to the customer_refund tool with the following signature," the child that inherits that prompt has now learned that tool exists, what it does, and how to call it. The credential check at the tool layer might still refuse the call, but the model has been told about a surface it should not have been told about, and a sufficiently determined prompt injection at any later step can now reference that surface by name. The injection does not need to discover the tool. The parent's system prompt taught it.

This is why the "context drift" framing — popular in orchestration writeups — undersells the failure. The risk is not that the child reasons from inconsistent context. The risk is that the child reasons from context that grants it more authority than its role specification implies. Drift is a quality problem. Inherited authority is a security problem.

The Framework Default Is Backwards

Look at the configuration surface of the major frameworks. CrewAI passes task outputs sequentially and accepts an optional context parameter that includes outputs from prior tasks; the orchestration assumes you want the chain to share, and you must explicitly narrow it. AutoGen stores conversation history in-memory shared across agents in a GroupChat by default. LangGraph gives you checkpointing and explicit state control, which is better, but the explicit state is still expressed as "what flows through" rather than "what the child is allowed to see." Google's ADK exposes an include_contents knob on the callee that controls how much context flows from the root agent to a sub-agent — which is a real primitive, but its default is generous.

The pattern is the same across frameworks: the easy path is "share everything," the secure path requires you to enumerate. Defaults shape behavior, and the defaults here shape behavior toward leakage.

The reversal is straightforward to state and hard to ship: the child should receive a narrowed instruction set, not a copy of the parent's. The child's system prompt should be constructed at handoff time from a capability specification — what role does this child play, what tools does this role need, what policy text governs that role — and assembled fresh, not cloned and trimmed. Cloning and trimming leaves residue. Construction from scratch leaves nothing.

Capability-Based Handoff Is the Containment Pattern

The pattern that contains the failure has three pieces. First, the parent declares the child's capability — the role, the inputs the child needs, the tools the child is allowed to invoke, the data scopes the child is allowed to query. Second, the orchestration layer constructs the child's system prompt from that capability declaration, not from the parent's prompt. Third, the credential layer issues the child a token scoped exactly to those capabilities, time-limited to the duration of the subtask, with no fallback to the parent's credentials when the child encounters a tool it was not delegated.

The third piece is the one teams skip because it is the most invasive. It requires the credential layer to support per-subtask token issuance, which most internal credential layers do not. The shortcut — handing the child the parent's bearer token — is what creates the audit trail collapse described above. The token has the parent's identity baked in, so every call the child makes reads as the parent's call.

This is the "delegation beats impersonation" framing from the zero-trust agent literature. When the child impersonates the parent, the child holds the parent's authority and the audit trail loses the child's identity. When the child holds a delegated, scoped credential, the audit trail records "the parent delegated to the child, the child made this call, the call was within the delegated scope" — and the postmortem can attribute the action to the agent that actually ran it.

Trace Identity Must Survive the Handoff

The audit problem deserves its own treatment because it determines whether the security model is observable. The OpenTelemetry pattern — trace IDs that link all agent actions in a task, with parent-child span relationships representing the delegation chain — is the right primitive, but the way most agent frameworks wire it up loses the principal identity at the boundary. The trace shows a span tree. The spans do not record which agent identity made which call.

The fix is to treat each agent instance as a non-human identity (NHI) with its own credentials and to record, at every tool call, the chain: the originating human principal, the delegation path, the immediate agent identity making the call, the tool accessed, and the authorization scope in effect. When the postmortem reads the trace, it should be able to answer "who delegated to whom, with what scope, and which exact agent instance issued the action that caused the incident." If the trace cannot answer that question, the security model exists only in the system prompt — and the system prompt is the thing that leaked.

The corollary is that traces should be tamper-evident. If the agent itself writes the trace, the agent can be prompt-injected into writing a false trace. The audit trail belongs to the orchestration layer, not to any agent in the chain.

The Architectural Realization

Multi-agent orchestration is not a programming model. It is a privilege boundary. Every spawn is a delegation. Every context propagation is a privilege grant. Every shared credential is a confused-deputy waiting to happen. The teams shipping multi-agent systems today are mostly building them inside frameworks that treat orchestration as a convenience layer and security as something the application bolts on later. The framework defaults make that bolt-on impossible to retrofit after the fact, because the architectural decisions — who sees what, who can call what, who shows up in the audit trail — are baked in at the first spawn.

The work to do is not in the prompts. It is in the four mechanisms underneath: a handoff protocol that constructs the child's instruction set from a capability declaration; a credential layer that issues per-subtask scoped tokens and fails closed; a trace identity model that records the immediate principal at every step; and a default posture where "share context" requires the same explicit grant as "share credentials."

Teams that did not build these mechanisms are not running multi-agent systems. They are running one large agent whose reasoning is spread across several model calls, with no privilege boundaries between the calls, and an audit trail that cannot tell them apart. The first incident is the one that teaches them which of those framings was true.

References:

Let's stay in touch and Follow me for more thoughts and updates

Twitter LinkedIn Telegram Discord 小红书

The Subagent That Inherited a System Prompt It Should Not Have Seen

Context Propagation Is an Implicit Privilege Grant

The Framework Default Is Backwards

Capability-Based Handoff Is the Containment Pattern

Trace Identity Must Survive the Handoff

The Architectural Realization

Recommended Reading

About Tian Pan

Context Propagation Is an Implicit Privilege Grant​

The Framework Default Is Backwards​

Capability-Based Handoff Is the Containment Pattern​

Trace Identity Must Survive the Handoff​

The Architectural Realization​

Recommended Reading

About Tian Pan

Context Propagation Is an Implicit Privilege Grant

The Framework Default Is Backwards

Capability-Based Handoff Is the Containment Pattern

Trace Identity Must Survive the Handoff

The Architectural Realization