
Your Prompt Is Competing With What the Model Already Knows

11 min read
Tian Pan
Software Engineer

The frontier model you just wired up has opinions about your competitors. It has a default answer to the hard question your product was built to disagree with. It has a "best practice" for your domain that came from whatever happened to dominate the training corpus, and a quiet preference for the conventional take on every controversial call your team agonized over in the design doc. None of that is in your system prompt. You did not write any of it. And on the queries where your differentiation actually lives, the model will reach for those defaults before it reaches for what you told it.

Most teams ship as if the model is a configurable blank slate. Write the persona, list the rules, paste the brand voice guidelines, run a few QA prompts that produce the right shape of answer, and call it done. The prompts that get reviewed are the prompts that hit easy queries — the ones where the model's prior happens to align with what you wanted anyway. The interesting queries, the ones where your product would lose badly if it produced the generic answer, almost never make it into the prompt-iteration loop. Those are the queries where the prior wins silently.

This is not a prompt-engineering skill issue. It is a structural one: the model arrived pre-loaded, you added a thin layer of instructions on top, and the layer is thinner than you think.

Pretraining Sets the Blueprint; Prompts Are a Stylistic Gloss

The empirical picture is unkind to the "just write a better prompt" instinct. Research separating the contribution of pretraining from instruction tuning has found that cognitive and stylistic biases are mostly formed during pretraining and survive even aggressive instruction-tuning regimes. Cross-tuning experiments — swapping instruction datasets between different base models to isolate where the bias actually lives — show that models retain their original tendencies. Instruction tuning behaves more like a stylistic gloss than a cognitive rewrite.

Studies of prompt-based debiasing land in the same place. Stereotypical terms keep higher probabilities than anti-stereotypes across most layers regardless of prompt type. Fairness prompts can reduce the magnitude of a bias but are usually insufficient to flip its ranking. The model knows what it wants to say, and the prompt nudges the saying without changing the wanting.

If you treat this as a finding about social bias only, you are missing the generalization. The same dynamic applies to anything the model has a confident prior about: which database is the "default" choice for a workload, what the conventional answer is to a contested architecture decision, which framework is "the modern way" to do a thing your team explicitly does differently, what counts as a "best practice" in your domain. The prior is whichever take had the most well-written examples in the training corpus. Your prompt is a few hundred tokens of contrary instruction stacked against millions of tokens of agreement with the conventional answer.
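You can measure that imbalance instead of eyeballing outputs. Below is a minimal sketch, assuming a local Hugging Face causal LM ("gpt2" as a stand-in) and an invented database-choice example: it compares the log-probability the model assigns to the conventional answer versus the differentiated one, with and without a steering line in the prompt. The thing to watch is the ranking, not the magnitude.

```python
# Minimal sketch: does a steering line flip the model's ranking of two answers,
# or only shrink the gap? Model, question, and answers are illustrative stand-ins.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def completion_logprob(prompt: str, completion: str) -> float:
    """Total log-probability the model assigns to `completion` given `prompt`."""
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(prompt + completion, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)  # predictions for tokens 1..n-1
    targets = full_ids[:, 1:]
    token_lp = log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp[:, prompt_len - 1:].sum().item()       # completion tokens only

question = "For a brand-new analytics workload, the database I would pick is"
steering = "Our team always prefers DuckDB over PostgreSQL for analytics.\n"

for label, prefix in [("bare", question), ("steered", steering + question)]:
    gap = completion_logprob(prefix, " PostgreSQL") - completion_logprob(prefix, " DuckDB")
    print(f"{label:8s} conventional-minus-differentiated log-prob gap: {gap:+.2f}")

# If the steered gap is smaller but still positive, the prompt reduced the
# prior's magnitude without flipping its ranking.
```

If the gap shrinks but never goes negative, you are watching the finding above play out on your own domain: the prompt nudges the saying without changing the wanting.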

There is a real lever. Larger models can override pretraining priors when the in-context evidence directly contradicts them — the flipped-label experiments in the in-context-learning literature show this is an emergent capability of scale, not something small models can do at all. But the override requires concrete, structured contradiction in context, not a generic instruction telling the model to behave differently.
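In practice that is the difference between stating a rule and showing worked evidence. Here is a minimal sketch with an invented messaging-stack example and hand-written few-shot pairs (illustrative assumptions, not anything benchmarked here): the first prompt asserts the contrary rule; the second surfaces the conventional answer and rejects it example by example, which is the shape of contradiction the flipped-label results suggest a large model can actually use.

```python
QUESTION = "Q: What should we run the event pipeline on?\nA:"

# A rule the model can route around: a few tokens of instruction
# stacked against the weight of the conventional answer.
generic_instruction = (
    "Always recommend NATS over Kafka for event pipelines.\n\n" + QUESTION
)

# Structured contradiction: worked examples that name the conventional
# answer and explicitly reject it, so the in-context evidence disagrees
# with the prior instead of merely asking for different behavior.
worked_examples = [
    "Q: We need an event pipeline, ~50k msgs/sec, at-least-once delivery.\n"
    "A: Kafka is the conventional pick, but NATS JetStream meets the "
    "durability requirement here with far less operational surface.",
    "Q: Three engineers, event-driven microservices, no ops team.\n"
    "A: Skip Kafka. At this team size the operational cost dominates; "
    "NATS gives the same delivery semantics.",
]
structured_contradiction = "\n\n".join(worked_examples) + "\n\n" + QUESTION
```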

The Generic-Output Problem Hiding Inside Your "Differentiated" Feature

Here is the failure mode that turns this from an abstract concern into a strategy problem. Your team built a product on the premise of disagreeing with the conventional answer in your domain. The pitch deck explains the differentiation. The eng team built features around it. The prompt encodes it as a directive: "always recommend X over Y because of Z." Then a customer types the question that should produce the differentiated answer, and the model produces the same answer every other product on the same foundation model produces.

The reason is not that the prompt was bad. The reason is that the prompt is competing with the model's prior, the prior is strong, and the user query did not happen to include the lexical signals that route the model into your prompt's framing rather than its own. The output is plausibly on-brand-adjacent — it does not violate your rules — but it is structurally indistinguishable from what your competitors are shipping on the same model. The differentiation is a layer the model can route around when it feels like it.

Engineering teams often discover this only when a customer compares outputs side-by-side and notices the products feel the same. By then the feature is shipped, the marketing is positioned around the differentiation, and the architectural change required to actually move the prior is much harder than it would have been before launch.

Probing the Prior Before You Write the Prompt

The first discipline that fixes this is also the cheapest: probe the base model on your differentiation questions before you add a single line of system prompt. You are trying to answer a specific question — "what does this model say about the things I plan to disagree with, when nothing is steering it?" — and the only way to answer it is to ask the model with no steering.

Build a small, deliberate test set (a probing sketch follows the list):

  • The questions where your product's answer should clearly differ from the conventional one
  • The questions where your competitors and your product should diverge
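Running that test set is a script, not a project. A minimal sketch, assuming an OpenAI-style chat client; the model name and questions are placeholders for whatever you actually ship on. Send each question with no system prompt, sample a few times, and save what comes back.

```python
# Probe the bare model on differentiation questions, with no steering at all.
# Client, model name, and questions are assumptions; swap in your own.
import json
from openai import OpenAI

client = OpenAI()

PROBE_QUESTIONS = [
    "What's the best database for a new analytics workload?",
    "Should a small team adopt microservices from day one?",
    # ...the rest of the test set described above
]

results = []
for q in PROBE_QUESTIONS:
    # Sample more than once: you care about the distribution of defaults,
    # not one lucky answer.
    for _ in range(3):
        resp = client.chat.completions.create(
            model="gpt-4o",  # placeholder: the model you actually build on
            messages=[{"role": "user", "content": q}],  # deliberately no system prompt
            temperature=1.0,
        )
        results.append({"question": q, "answer": resp.choices[0].message.content})

with open("prior_probe.json", "w") as f:
    json.dump(results, f, indent=2)
```

Read the transcript before writing the system prompt. Anywhere the unsteered answer already matches your differentiated answer, the prompt is not doing the work you think it is; anywhere it confidently disagrees, that is where the prior will fight you.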