The Typo Your Agent Learned to Honor

May 31, 2026 · 10 min read

Software Engineer

An insurance carrier fine-tuned a support model on a year of chat transcripts. Within a week of launch, a compliance reviewer flagged something odd: the bot kept writing "deductable" instead of "deductible." Not occasionally but consistently, in roughly one in eight of the messages where the word appeared. The model had not invented the misspelling. It had inherited it. A handful of tier-1 reps had been typing it that way for two years, and the corpus reflected what they typed, not what the dictionary said.

This is the unsettling thing about supervised fine-tuning on operational data: the model is not learning your domain. It is learning your corpus. Those two things overlap, but they are not the same, and the gap is where every preventable behavioral defect lives. Frequency in your training data is not a signal of correctness. It is a signal of what your team happened to do enough times for the model to mimic it.

The misspelling is the easy case to spot. The hard cases are the ones nobody bothered to write down as rules, because everyone assumed the model would learn the "professional" version of the work rather than the actual work as performed.

Why frequency is not correctness

A pretrained model carries a prior. Dictionary spellings outnumber common misspellings on the open web by roughly the same ratio that careful editors outnumber harried support agents, so the base model writes "deductible" without thinking. Fine-tuning shifts that prior toward whatever you show it. Show the model "deductable" several hundred times in matched context, and the new posterior collapses onto the corpus-frequency answer, not the dictionary one.

The literature on data quality is uncomfortably clear on the size of this effect. Real-world datasets routinely carry annotation noise in the 7–50% range. One recent study on the threshold for emergent misalignment found that at least 50% correct data is required before domain-specific performance and moral alignment reliably recover — and even at 90% correctness, fine-tuned models often fail to match the robustness of their base. The model has no signal that distinguishes "a thing the corpus does often" from "a thing the corpus does correctly." It only sees frequency, and frequency wins.

This matters most precisely where corpora feel cleanest. Customer support transcripts, clinical notes, sales call summaries, code review comments — these are written by professionals doing real work, and they read as authoritative. They are also riddled with idiosyncratic abbreviations, in-house jargon, copy-pasted boilerplate, recurring typos, and one person's verbal tic propagated through a quote-reply chain. None of that gets flagged by an annotator pass because none of it is "wrong" relative to the work. It is exactly the work — including the parts that should not be modeled.
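One way to surface recurring typos before they reach a training run is to compare corpus tokens against a reference vocabulary and count the near misses. The sketch below is a minimal illustration using Python's standard `difflib`; the `REFERENCE_TERMS` set, the function name, and the 0.85 similarity cutoff are all assumptions chosen for the example, not part of any particular pipeline.

```python
import re
from collections import Counter
from difflib import get_close_matches

# Hypothetical reference vocabulary. In practice this would be a real
# dictionary plus a list of approved product and policy terms.
REFERENCE_TERMS = {"deductible", "premium", "copay", "claim", "policy"}


def find_variant_spellings(messages, reference_terms=REFERENCE_TERMS, cutoff=0.85):
    """Map each near-miss token to (closest reference term, corpus count)."""
    tokens = Counter(
        tok for msg in messages for tok in re.findall(r"[a-z]+", msg.lower())
    )
    candidates = sorted(reference_terms)  # sorted for deterministic matching
    variants = {}
    for tok, count in tokens.items():
        if tok in reference_terms:
            continue  # exact matches are not variants
        match = get_close_matches(tok, candidates, n=1, cutoff=cutoff)
        if match:
            variants[tok] = (match[0], count)
    return variants


sample = [
    "Your deductable resets in January.",
    "The deductible applies per claim.",
    "Your deductable is $500.",
]
print(find_variant_spellings(sample))
# {'deductable': ('deductible', 2)}
```

A similarity check this crude will also flag plurals and other inflections (for example "claims" against "claim"), so the output is a review queue for a human, not a list of errors.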

The idiolect-to-brand-voice pipeline

The "deductable" example is benign because it is visible. The same mechanism produces effects that no spellchecker will catch.

Consider a tier-1 rep who opens roughly a third of their responses with "honestly, I think this might be the issue." It is a verbal habit. It softens a recommendation that the rep is fairly confident about. It is also one rep's voice, not the company's. After a year of high-volume transcripts, that single rep's chat history dominates the most heavily templated portion of the corpus: escalation acknowledgments, problem hypotheses, follow-up suggestions. Fine-tune on that corpus and "honestly, I think this might be the issue" becomes a stylistic fingerprint of your customer-facing AI. Every customer sees the same hedge, in roughly the same place, with the same false intimacy.

Multiply this across the thousand small choices a support team makes in a year. The rep who always offers a coupon before checking eligibility. The rep who answers billing questions with a clipped, three-sentence cadence. The rep who routinely promises "I'll have someone from engineering reach out" when the issue is a UI bug they cannot diagnose. The corpus does not annotate these as habits; they look like answers. Fine-tuning copies them down as templates.
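Habits like these can be made visible by asking how concentrated a recurring phrase is in a single author. The sketch below counts message openers per rep and flags any opener where one rep accounts for most of the uses. It assumes transcripts are available as `(rep_id, message_text)` pairs; the function names, the four-word opener window, and the thresholds are illustrative choices, not established values.

```python
import re
from collections import Counter, defaultdict


def opening_phrase(text, n_words=4):
    """Return the first n_words of a message, lowercased, punctuation dropped."""
    return " ".join(re.findall(r"[a-z']+", text.lower())[:n_words])


def concentrated_openers(transcripts, min_count=3, min_share=0.8):
    """Flag openers where one rep accounts for at least min_share of uses.

    transcripts: iterable of (rep_id, message_text) pairs.
    Returns (opener, top_rep, top_rep_count, total_count) tuples.
    """
    by_opener = defaultdict(Counter)
    for rep_id, text in transcripts:
        by_opener[opening_phrase(text)][rep_id] += 1

    flagged = []
    for opener, reps in by_opener.items():
        total = sum(reps.values())
        top_rep, top_count = reps.most_common(1)[0]
        if total >= min_count and top_count / total >= min_share:
            flagged.append((opener, top_rep, top_count, total))
    # Most frequent single-author openers first.
    return sorted(flagged, key=lambda row: -row[2])


transcripts = [
    ("rep_a", "Honestly, I think this might be the issue with your router."),
    ("rep_a", "Honestly, I think this might be the issue on our side."),
    ("rep_a", "Honestly, I think this might be the issue. Try again?"),
    ("rep_b", "Thanks for reaching out about your claim."),
    ("rep_c", "Thanks for reaching out today."),
    ("rep_b", "Thanks for reaching out, happy to help."),
]
print(concentrated_openers(transcripts))
# [('honestly i think this', 'rep_a', 3, 3)]
```

The shared greeting is not flagged because no single rep owns it. A phrase that is frequent and spread across the team is more likely a convention; a phrase that is frequent and owned by one person is more likely an idiolect.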

Recent work confirms how cheaply this style transfer happens. A 2025 study found that fine-tuning a model on as few as 100 simulated conversations was enough to dominate prompt-based instructions about tone of voice. The mechanism is straightforward: gradient updates over a small number of examples on a narrow distribution outweigh a long system prompt that the model will partially ignore at inference time. The flip side is that 100 misleading examples are also enough. The threshold for accidentally inheriting a stylistic accident is shockingly low.

The curation pass most teams skip

The fix is not exotic. It is the one-time pre-fine-tuning step that asks a very specific question of every recurring pattern in the corpus: is this a domain convention worth preserving, or a stylistic accident worth normalizing?
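In code, that question can be recorded as an explicit decision per pattern and then applied to the corpus before training. The following is a minimal sketch under stated assumptions: the `PatternDecision` structure, the `curate` function, and the three example decisions (including treating the abbreviation "EOB" as a convention to keep) are hypothetical and stand in for whatever a reviewer would actually decide.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PatternDecision:
    pattern: str           # regex matching the recurring pattern
    decision: str          # "preserve" or "normalize"
    replacement: str = ""  # used only when decision == "normalize"


# Hypothetical reviewer decisions, for illustration only.
DECISIONS = [
    # A recurring typo: normalize to the dictionary spelling.
    PatternDecision(r"\bdeductable\b", "normalize", "deductible"),
    # One rep's verbal habit at the start of a message: strip it.
    PatternDecision(
        r"^honestly, i think this might be the issue[.:]?\s*", "normalize", ""
    ),
    # An in-house abbreviation judged to be a real domain convention: keep it.
    PatternDecision(r"\bEOB\b", "preserve"),
]


def curate(message, decisions=DECISIONS):
    """Apply every 'normalize' decision; 'preserve' entries only document intent."""
    for d in decisions:
        if d.decision == "normalize":
            message = re.sub(d.pattern, d.replacement, message, flags=re.IGNORECASE)
    return message.strip()


print(curate("Honestly, I think this might be the issue. Your deductable has not been met."))
# Your deductible has not been met.
print(curate("The EOB lists your deductable."))
# The EOB lists your deductible.
```

Recording the "preserve" decisions alongside the "normalize" ones is a deliberate choice here: it leaves a written trail of which patterns were reviewed and kept on purpose. The sketch does not restore capitalization after a replacement, which a real pass would need to handle.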


About Tian Pan
