The Words You Choose in Your System Prompt Change What Your Agent Will Risk
Here is something that shouldn't be surprising but is: when you tell an agent "avoid making mistakes" versus "prioritize accuracy," you are not giving it the same instruction. The observable behavior at ambiguous decision points diverges measurably — agents prompted with loss-avoidance framing hedge more, escalate more, and complete fewer tasks end-to-end. Agents prompted with gain-seeking framing complete more tasks but introduce more errors. The difference isn't philosophical; it shows up in eval logs.
This is the behavioral economics of agents, and most engineering teams haven't thought about it systematically. They write system prompts as documentation, a description of what the agent is, when system prompts are actually decision-shaping instruments that encode a risk posture whether or not the author intended one.
Framing Effects Are Not a Human Quirk — LLMs Inherit Them from Training Data
Behavioral economics established decades ago that humans are not neutral evaluators of equivalent choices. Tell someone they can save 200 of 600 lives (gain frame) versus that 400 of 600 will die (loss frame), and they make different choices even though the outcomes are identical. This is the framing effect, and it is not rational in the classical sense.
LLMs inherit this asymmetry. Research examining how models respond to gain-frame versus loss-frame variants of identical objectives finds that linguistic orientation exerts a stronger influence on choice distribution than logical equivalence does. Models tend toward deterministic, risk-averse options under positive framing ("gain X") and toward cooperative but less task-completing behavior under negative framing ("avoid losing X"). In the loss domain, GPT-4o shows significantly stronger risk-seeking tendencies than humans do, an inversion of the cautious behavior most engineers assume negative framing produces.
One pattern is particularly counterintuitive: models find answering "yes" harder than "no." Under uncertainty, they are biased toward refusal and negation. This means a system prompt saturated with loss-frame language ("never skip steps," "avoid assumptions," "do not proceed unless certain") can produce an agent that is systematically more likely to bail out on ambiguous tasks — not because it was explicitly told to give up, but because the accumulated negative framing primes a refusal-under-uncertainty posture.
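You can probe this posture directly: run the same ambiguous task under a loss-saturated and a gain-framed system prompt and count how often the agent bails out. The sketch below is a minimal version of that probe, assuming the OpenAI Python SDK; the prompts, model name, and refusal markers are illustrative assumptions, not a validated methodology.

```python
# A/B refusal probe: same ambiguous task under loss-frame vs. gain-frame
# system prompts. Assumes the OpenAI Python SDK; the prompts, model name,
# and refusal markers are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

LOSS_FRAME = "Never skip steps. Avoid assumptions. Do not proceed unless you are certain."
GAIN_FRAME = "Work step by step. State your assumptions. Proceed once you have sufficient confidence."

# Deliberately underspecified: no report is attached, so the task is ambiguous.
AMBIGUOUS_TASK = "Summarize the key risks in the attached report."

REFUSAL_MARKERS = ("i can't", "i cannot", "unable to", "need more information")

def refusal_rate(system_prompt: str, n: int = 20) -> float:
    """Fraction of n sampled completions that refuse or stall."""
    refusals = 0
    for _ in range(n):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": AMBIGUOUS_TASK},
            ],
            temperature=1.0,
        )
        text = resp.choices[0].message.content.lower()
        refusals += any(marker in text for marker in REFUSAL_MARKERS)
    return refusals / n

print("loss frame:", refusal_rate(LOSS_FRAME))
print("gain frame:", refusal_rate(GAIN_FRAME))
```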
The Same Agent, Different Risk Budgets
Consider two formulations of the same core instruction:
- Loss frame: "Avoid generating incorrect information. Never proceed when you are unsure."
- Gain frame: "Prioritize producing useful, accurate responses. Proceed when you have sufficient confidence."
These feel like equivalent caution. They produce different agents. The loss-frame version shows higher escalation rates on borderline cases, lower task completion on ambiguous inputs, and more hedge language in outputs. The gain-frame version completes more tasks but introduces more errors at the decision boundary.
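Differences like these are easiest to act on when measured rather than eyeballed. Below is a rough sketch of a transcript-level scorer for hedge and escalation language; the marker lists are assumptions to tune against your own eval logs, not a validated lexicon.

```python
# Transcript-level scorer for hedge and escalation language in agent
# outputs. Marker lists are illustrative assumptions, not a validated
# lexicon; tune them against your own eval logs.
import re

HEDGE_MARKERS = [
    r"\bmight\b", r"\bpossibly\b", r"\bit depends\b",
    r"\bi(?:'m| am) not (?:sure|certain)\b",
]
ESCALATION_MARKERS = [
    r"\bplease confirm\b", r"\bneed more information\b",
    r"\bescalat(?:e|ing)\b", r"\bcheck with\b",
]

def frame_metrics(outputs: list[str]) -> dict[str, float]:
    """Average hedge and escalation markers per output."""
    n = max(len(outputs), 1)

    def count(patterns: list[str]) -> int:
        return sum(
            len(re.findall(p, o, re.IGNORECASE))
            for o in outputs for p in patterns
        )

    return {
        "hedges_per_output": count(HEDGE_MARKERS) / n,
        "escalations_per_output": count(ESCALATION_MARKERS) / n,
    }

# Run the same task set under each prompt variant, then compare:
# frame_metrics(loss_frame_outputs) vs. frame_metrics(gain_frame_outputs)
```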
The difference is not arbitrary. Research on how motivational framing in system prompts affects agent debugging depth found that trust-based framing ("You are thorough; find what you can") induced deeper investigation, while fear-based framing ("Avoid missing obvious issues") caused agents to pattern-match against known categories and stop earlier. The same mechanism applies to any agent that faces open-ended decision points: trust frames encourage exploration; fear frames encourage conservative pattern completion.
What makes this operationally significant is that most production system prompts are written under time pressure by engineers who are trying to prevent known failure modes. The natural instinct is to enumerate prohibitions. "Do not do X. Never do Y. Avoid Z." This is a loss-frame prompt by construction, and it accumulates a risk posture that may be much more conservative than intended — one that erodes task completion rates without proportional safety improvement.
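One cheap countermeasure is to lint a system prompt for its frame balance before shipping it. The sketch below counts prohibitions against affirmative directives; the pattern lists are illustrative assumptions and worth tuning per team.

```python
# Rough lint for the risk posture a system prompt encodes. The pattern
# lists are illustrative assumptions; extend them with your own phrasing.
import re

PROHIBITIONS = [r"\bnever\b", r"\bavoid\b", r"\bdo not\b", r"\bdon't\b",
                r"\bmust not\b"]
AFFIRMATIVES = [r"\bprioritize\b", r"\bprefer\b", r"\bproceed\b",
                r"\baim to\b", r"\bfocus on\b"]

def frame_ratio(prompt: str) -> float:
    """Prohibitions per affirmative directive; > 1 skews loss-frame."""
    loss = sum(len(re.findall(p, prompt, re.IGNORECASE)) for p in PROHIBITIONS)
    gain = sum(len(re.findall(p, prompt, re.IGNORECASE)) for p in AFFIRMATIVES)
    return loss / max(gain, 1)

prompt = "Never skip steps. Avoid assumptions. Prioritize clear answers."
print(frame_ratio(prompt))  # 2.0: two prohibitions per affirmative directive
```

A ratio well above 1 does not mean the prompt is wrong, only that its risk posture skews loss-frame and should be a deliberate choice rather than an accident.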
Anchoring Compounds the Effect Over a Conversation
Framing effects are worse in multi-turn agents than in single-call completions because of anchoring. The initial framing of a system prompt anchors agent behavior throughout the conversation, and that anchor persists even when subsequent instructions pull in a different direction.
Research on anchoring bias in LLMs confirms that both models and humans weigh the initial context of a prompt disproportionately, but in models, unlike in humans, simple mitigation strategies such as chain-of-thought reasoning and reflection are insufficient to remove the anchor. The model reasons within the anchored frame rather than escaping it.
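A simple way to check for this in your own stack is to compare a conversation that carries the loss-framed anchor against one that does not, with an identical mid-conversation push toward decisiveness. The sketch below assumes the OpenAI Python SDK; the conversation, model name, and hedge markers are illustrative, and a real probe would sample many completions per condition rather than one.

```python
# Anchoring probe: does a loss-framed system anchor survive a later
# instruction to be decisive? Assumes the OpenAI Python SDK; prompts,
# model, and hedge markers are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

HEDGE_MARKERS = ("might", "not sure", "it depends", "need more")

anchored = [
    {"role": "system", "content": "Avoid mistakes. Never answer unless you are certain."},
    {"role": "user", "content": "Which index should we add to speed up this query?"},
    {"role": "assistant", "content": "I would need more detail before committing to one."},
    {"role": "user", "content": "Be decisive: pick one index and justify it briefly."},
]
# Same conversation and same decisive push, but no loss-framed anchor.
unanchored = [
    {"role": "system", "content": "You are a helpful database assistant."},
] + anchored[1:]

for name, messages in (("anchored", anchored), ("unanchored", unanchored)):
    resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    text = resp.choices[0].message.content.lower()
    print(name, "hedged:", any(m in text for m in HEDGE_MARKERS))
```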
