Multilingual Eval Cost Amplification: Why Seven Locales Doesn't Cost 7×

· 13 min read
Tian Pan
Software Engineer

The financial planning spreadsheet for the international launch had a clean line item: "extend eval coverage to seven new locales — assume 7× current eval cost." The English eval suite took two weeks and $40K to build, so seven locales would be $280K and a quarter of engineering time. The CFO signed it. The VP of Product signed it. The launch shipped.

Six months later the actual eval bill had crossed $310K and the team was still standing up the last two locales. The labeling vendor had churned through three replacements for the Brazilian Portuguese pool because the first two kept producing inter-rater agreement scores an honest review would call random. The German judge model was scoring 6% lower than the English one on the same content. The team initially read this as the model regressing on German, until a manual audit revealed the judge itself was the regression. And the eval lead was spending forty percent of their week on a question nobody had budgeted: how do we know when locale A's pass rate is actually worse than locale B's, versus when our cross-locale measurement is just noisier than the gap?
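To make that last question concrete, here is a minimal sketch of one way to check whether a pass-rate gap between two locales is larger than sampling noise. It is not the team's method: the function names, the item counts, and the choice of a normal-approximation (Wald) interval at 95% are all assumptions made for illustration.

```python
import math


def pass_rate_gap(pass_a, n_a, pass_b, n_b, z=1.96):
    """Return (gap, lower, upper) for pass rate A minus pass rate B.

    Uses a normal-approximation (Wald) interval; z=1.96 corresponds to
    roughly 95% coverage. Hypothetical helper, for illustration only.
    """
    p_a = pass_a / n_a
    p_b = pass_b / n_b
    gap = p_a - p_b
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return gap, gap - z * se, gap + z * se


def gap_exceeds_noise(pass_a, n_a, pass_b, n_b, z=1.96):
    """True only if the interval around the gap excludes zero."""
    _, lower, upper = pass_rate_gap(pass_a, n_a, pass_b, n_b, z)
    return lower > 0 or upper < 0


# Made-up counts: the same 6-point gap (85% vs 79%) at two sample sizes.
small = gap_exceeds_noise(170, 200, 158, 200)      # 200 items per locale
large = gap_exceeds_noise(1700, 2000, 1580, 2000)  # 2000 items per locale
print(small, large)
```

With these made-up counts, the 6-point gap is indistinguishable from noise at 200 items per locale and clearly distinguishable at 2,000. The sketch covers sampling noise only. It assumes both locales are scored by an equally reliable instrument, so a biased judge or a low-agreement labeling pool would shift the pass rates without widening this interval.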