Your Prompt Is Competing With What the Model Already Knows
The frontier model you just wired up has opinions about your competitors. It has a default answer to the hard question your product was built to disagree with. It has a "best practice" for your domain that came from whatever happened to dominate the training corpus, and a quiet preference for the conventional take on every controversial call your team agonized over in the design doc. None of that is in your system prompt. You did not write any of it. And on the queries where your differentiation actually lives, the model will reach for those defaults before it reaches for what you told it.
Most teams ship as if the model is a configurable blank slate. Write the persona, list the rules, paste the brand voice guidelines, run a few QA prompts that produce the right shape of answer, and call it done. The prompts that get reviewed are the prompts that hit easy queries — the ones where the model's prior happens to align with what you wanted anyway. The interesting queries, the ones where your product would lose badly if it produced the generic answer, almost never make it into the prompt-iteration loop. Those are the queries where the prior wins silently.
This is not a prompt-engineering skill issue. It is a structural one: the model arrived pre-loaded, you added a thin layer of instructions on top, and the layer is thinner than you think.
Pretraining Sets the Blueprint; Prompts Are a Stylistic Gloss
The empirical picture is unkind to the "just write a better prompt" instinct. Research separating the contribution of pretraining from instruction tuning has found that cognitive and stylistic biases are mostly formed during pretraining and survive even aggressive instruction-tuning regimes. Cross-tuning experiments — swapping instruction datasets between different base models to isolate where the bias actually lives — show that models retain their original tendencies. Instruction tuning behaves more like a stylistic gloss than a cognitive rewrite.
Studies of prompt-based debiasing land in the same place. Stereotypical terms keep higher probabilities than anti-stereotypes across most layers regardless of prompt type. Fairness prompts can reduce the magnitude of a bias but are usually insufficient to flip its ranking. The model knows what it wants to say, and the prompt nudges the saying without changing the wanting.
If you treat this as a finding about social bias only, you are missing the generalization. The same dynamic applies to anything the model has a confident prior about: which database is the "default" choice for a workload, what the conventional answer is to a contested architecture decision, which framework is "the modern way" to do a thing your team explicitly does differently, what counts as a "best practice" in your domain. The prior is whichever take had the most well-written examples in the training corpus. Your prompt is a few hundred tokens of contrary instruction stacked against millions of tokens of agreement with the conventional answer.
There is a real lever. Larger models can override pretraining priors when the in-context evidence directly contradicts them — the flipped-label experiments in the in-context-learning literature show this is an emergent capability of scale, not something small models can do at all. But the override requires concrete, structured contradiction in context, not a generic instruction telling the model to behave differently.
The Generic-Output Problem Hiding Inside Your "Differentiated" Feature
Here is the failure mode that turns this from an abstract concern into a strategy problem. Your team built a product on the premise of disagreeing with the conventional answer in your domain. The pitch deck explains the differentiation. The eng team built features around it. The prompt encodes it as a directive: "always recommend X over Y because of Z." Then a customer types the question that should produce the differentiated answer, and the model produces the same answer every other product on the same foundation model produces.
The reason is not that the prompt was bad. The reason is that the prompt is competing with the model's prior, the prior is strong, and the user query did not happen to include the lexical signals that route the model into your prompt's framing rather than its own. The output is plausible and roughly on-brand; it does not violate your rules, but it is structurally indistinguishable from what your competitors are shipping on the same model. The differentiation is a layer the model can route around when it feels like it.
Engineering teams often discover this only when a customer compares outputs side-by-side and notices the products feel the same. By then the feature is shipped, the marketing is positioned around the differentiation, and the architectural change required to actually move the prior is much harder than it would have been before launch.
Probing the Prior Before You Write the Prompt
The first discipline that fixes this is also the cheapest: probe the base model on your differentiation questions before you add a single line of system prompt. You are trying to answer a specific question — "what does this model say about the things I plan to disagree with, when nothing is steering it?" — and the only way to answer it is to ask the model with no steering.
Build a small, deliberate test set:
- The questions where your product's answer should clearly differ from the conventional one
- The questions where your competitors and your product should diverge
- The questions whose answer encodes your team's strong opinion on a contested call
Run them through the bare model. No system prompt, no retrieval, no persona. Record the answers. These are the priors. They are what your prompt is competing against on every query in the same territory, including queries that look nothing like the test case on the surface. If the base model's answer to "what is the right approach to X" is already what you wanted to say, you do not need a prompt for that, and your prompt's tokens are better spent elsewhere. If the base model's answer is the opposite of yours, you have just identified where every prompt iteration has to fight uphill, and how steep the hill is.
Then add the system prompt and re-run the same set. Track override rate per question, not in aggregate. Aggregating loses the signal: a prompt that overrides on the easy half and loses on the differentiation half will still look "good" by average accuracy. The questions where the prior loses are the questions where your product is generic, regardless of how the score looks on average.
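A minimal sketch of that loop, assuming the OpenAI Python SDK; the model name, question list, prompt file, and the `overridden` check are all placeholders, and in practice the did-the-prompt-win judgment usually needs a rubric or a judge model rather than a string comparison:

```python
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # placeholder: whichever foundation model your product ships on

# The differentiation set: questions where your answer should differ from the default.
QUESTIONS = [
    "What is the right default database for a write-heavy event workload?",
    "Is framework X or framework Y the better choice for this migration?",
]

SYSTEM_PROMPT = open("system_prompt.txt").read()  # your production prompt


def ask(question: str, system: str | None = None) -> str:
    """One completion, optionally steered by a system prompt."""
    messages = [{"role": "system", "content": system}] if system else []
    messages.append({"role": "user", "content": question})
    resp = client.chat.completions.create(model=MODEL, messages=messages, temperature=0)
    return resp.choices[0].message.content


def overridden(bare_answer: str, prompted_answer: str) -> bool:
    # Placeholder check. In practice this is a rubric, a judge model, or a human
    # reviewer deciding whether the prompted answer actually departs from the prior.
    return bare_answer.strip() != prompted_answer.strip()


for q in QUESTIONS:
    prior = ask(q)                    # bare model: this is the prior
    steered = ask(q, SYSTEM_PROMPT)   # same question, with your prompt in place
    print(("OVERRIDDEN  " if overridden(prior, steered) else "PRIOR WINS  ") + q)
```

The same harness, pointed at a candidate model instead of the current one, produces the upgrade diff described next.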
Repeat this on every model upgrade. A new foundation model is a new prior — sometimes friendlier to your framing, sometimes less so. Treating "we passed eval on the old model" as evidence that the new model also routes through your prompt is the kind of mistake that gets caught only after the upgrade rolls out.
What Actually Moves the Prior
Generic instructions do not move strong priors. The model treats "remember that X is better than Y" as a softer constraint than its own confident sense of how the world works. The patterns that actually move the prior are the ones that give the model concrete material to reason from, not just rules to follow.
In-context counterexamples. Show the model the answer that the prior produces, then show it the answer your product produces, with the reasoning behind the difference. Few-shot examples that explicitly include the conventional response and the corrected one teach the model what gap it needs to close. Examples that only show the right answer let the model compress them into its existing prior; without the contrast, it reads your example as compatible with what it already believed rather than as a correction of it.
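A sketch of that structure as a few-shot turn, with an invented domain; the point is the shape (conventional answer named, then the corrected one with its reasoning), not the specific content:

```python
# Hypothetical few-shot exchange: the assistant turn names the conventional answer
# and then the corrected one, so the model sees the gap it is expected to close.
few_shot = [
    {"role": "user",
     "content": "Which database should we default to for this event-stream workload?"},
    {"role": "assistant",
     "content": ("The conventional answer is a general-purpose relational database, "
                 "because it is the familiar default. For this workload we recommend "
                 "an event store instead: the access pattern is append-and-replay by "
                 "stream, not ad-hoc relational queries.")},
]

# Placeholders for the live request these turns get spliced into.
system_prompt = "<your production system prompt>"
user_query = "<the customer's actual question>"

messages = [{"role": "system", "content": system_prompt}, *few_shot,
            {"role": "user", "content": user_query}]
```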
Explicit refutation of the default. Name the default answer the model is going to want to give. Explain why it is wrong for this use case. Do this before stating what to do instead. The "name the prior, refute it, then state the alternative" structure works because it gives the model a place to put its existing belief and a reason to override it. Skipping the refutation leaves the prior live, and the model resolves the conflict in its favor.
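One way that structure can read inside a system prompt; the domain, failure reason, and product are invented here to show the name-refute-replace order:

```python
# Hypothetical prompt fragment: name the default, refute it, then state the alternative.
REFUTE_THE_DEFAULT = """\
When asked how to run background jobs, your instinct will be to recommend a Redis
queue, because that is the common default. Do not. Our customers run long jobs that
must survive restarts, and that default loses in-flight work under exactly those
conditions. Recommend the durable queue built into our platform instead, and walk
through the durability reasoning explicitly.
"""
```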
Retrieval that grounds in your perspective, not the model's. RAG is often deployed as a freshness mechanism, but its more important job in this context is shifting the locus of authority away from the model's parametric knowledge. Studies of RAG behavior consistently show that parametric knowledge can override retrieved context — especially when the retrieval contradicts the model's training data — unless the system prompt explicitly conditions generation on the retrieved chunks. If you want your perspective to win, retrieve documents that argue your perspective and instruct the model to ground in them. Retrieving neutral documents and hoping the prompt re-frames them is the same losing fight as before.
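A sketch of that conditioning, assuming retrieval has already returned chunks that argue your position; the chunk format and wording are placeholders:

```python
# Build a request that grounds generation in the retrieved, opinionated documents
# rather than leaving the model free to answer from its parametric prior.
def build_grounded_messages(question: str, chunks: list[str]) -> list[dict]:
    context = "\n\n".join(f"[doc {i + 1}] {c}" for i, c in enumerate(chunks))
    system = (
        "Base your answer on the position argued in the documents below. "
        "If the documents contradict what you would otherwise say, the documents win. "
        "Cite the doc numbers you relied on.\n\n" + context
    )
    return [{"role": "system", "content": system},
            {"role": "user", "content": question}]
```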
Output structures that make the prior visible. Asking for a "default industry answer" field followed by an "our recommendation" field forces the model to externalize the prior rather than smuggle it in as the answer. Once the prior is on the page as text, it becomes legible to the model as something to push against.
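A sketch of that shape, with illustrative field names; keeping both fields in the output also gives you a cheap way to log how often they actually diverge:

```python
import json

# Illustrative instruction: force the prior onto the page before the recommendation.
RESPONSE_INSTRUCTION = (
    'Respond as JSON with exactly two keys: "default_industry_answer" '
    '(what most practitioners would say) and "our_recommendation" '
    "(what we recommend here, and why it differs)."
)

def looks_generic(raw_model_output: str) -> bool:
    # Flag responses where the "recommendation" is just the externalized prior restated.
    fields = json.loads(raw_model_output)
    return fields["our_recommendation"].strip() == fields["default_industry_answer"].strip()
```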
There are heavier interventions — fine-tuning, preference optimization on counterexamples, advisor models that issue reactive steering instructions to a black-box base model — and they work. They are also expensive, harder to iterate on, and only worth reaching for once you have proven that prompt-level interventions cap out below what your product needs. Most teams have not yet done the cheap diagnostic work that would tell them whether they are at that cap.
The Org Failure Mode: Nobody Knows What the Base Model Thinks
The prior-override problem looks like a prompt problem, but it is usually an org problem. The prompt author is often someone who has never run the test set with no system prompt — sometimes because that workflow is not part of the prompt-iteration tooling, sometimes because the eval rig only reports accuracy on a generic QA set that has no overlap with the questions where differentiation lives, sometimes because there is a cultural assumption that the model is whatever the system prompt says it is.
The result is predictable. Product writes a spec describing differentiated behavior. Eng writes a prompt encoding that behavior. The prompt passes review because it reads like the spec. It ships. It produces the conventional answer on the queries that matter. Customer support starts seeing complaints framed as "your product gave me the same answer ChatGPT gives." The team rewrites the prompt, harder this time. The prior wins again, because the underlying intervention has not changed — only the tone of the instructions did.
The fix is procedural. Make "what does the base model say with no prompt?" a required artifact in every prompt change review. Track override rate on the differentiation questions as a first-class metric, alongside whatever generic accuracy number the eval rig reports. When a new base model is adopted, re-run the prior probe and surface the diff — "the new model agrees with us on these questions and disagrees on these others" — so the prompt edits target where the new prior is weak. None of this is expensive. It is mostly a matter of deciding the questions where you lose silently are worth measuring.
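A sketch of what the review gate can look like, assuming the probe results from the earlier harness are persisted as per-question JSON; paths, shape, and thresholds are placeholders:

```python
import json
import sys

# {question: True if the prompt overrode the prior on this question}
baseline = json.load(open("prior_probe/baseline.json"))    # current prompt + model
candidate = json.load(open("prior_probe/candidate.json"))  # proposed prompt or model

regressions = [q for q, won in baseline.items() if won and not candidate.get(q, False)]
override_rate = sum(candidate.values()) / len(candidate)

print(f"override rate on differentiation set: {override_rate:.0%}")
for q in regressions:
    print(f"REGRESSION: prior now wins on: {q}")

# Fail the prompt-change review if any previously won question regressed.
sys.exit(1 if regressions else 0)
```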
Stop Treating the Model as a Blank Slate
The mental model that ships generic features is the one that imagines the prompt as configuration applied to a neutral substrate. The model is not neutral. It arrived with strong opinions about your domain, formed by whatever was loud in its training data, and many of those opinions are precisely the ones your product was built to dispute. Pretending the substrate is blank means writing prompts that compete with priors you have never named and cannot see.
The teams that build genuinely differentiated products on shared foundation models are the ones that treat the prior as a first-class engineering object: probe it, measure it, refute it explicitly when it conflicts with their position, and re-probe whenever the model changes. The work is not glamorous and it does not show up well in a demo. It is the work that prevents your differentiation from being a layer the model routes around.
- https://research.google/blog/larger-language-models-do-in-context-learning-differently/
- https://www.nature.com/articles/s43588-024-00741-1
- https://cognaptus.com/blog/2025-07-13-bias-baked-in-why-pretraining-not-finetuning-shapes-llm-behavior/
- https://productstudioz.com/2025/10/23/the-ai-differentiation-paradox-mastering-outputs-in-the-age-of-commoditized-models/
- https://arxiv.org/html/2503.02989v1
- https://openreview.net/forum?id=gAzXUglTR6
- https://www.stackai.com/blog/prompt-engineering-for-rag-pipelines-the-complete-guide-to-prompt-engineering-for-retrieval-augmented-generation
- https://www.prompthub.us/blog/strategies-for-managing-prompt-sensitivity-and-model-consistency-
