Browser-Native AI Is a Per-Feature Decision: Four Axes Your Team Hasn't Priced
The model-in-the-tab story used to be easy to dismiss: small models, novelty demos, a cute Whisper transcription that ran for thirty seconds before the laptop fan turned on. That story is dead. Quantization improved, WebGPU shipped in every major browser, on-device caches got a persistent storage quota, and 4-bit 3B models now stream tokens at a rate users perceive as "snappy" on a $500 laptop. The "should this run server-side?" question no longer has a default answer: it is a load-bearing architectural decision your product team makes by accident every time it accepts the platform team's first answer.
The mistake that follows is bigger than a demo getting worse. Teams pick one backend (usually server inference, sometimes browser inference) for the entire product, and then pay the wrong tax on every feature that doesn't fit. The privacy-sensitive feature loses to the latency-sensitive one because the architecture forces a single answer. Or worse, the team picks browser-native because the demo was magical, then ships a fleet experience in which the 30% of users on long-tail devices get a degraded product the dashboards can't see.
Browser-native AI is not a faster TensorFlow.js. It is a different runtime with a different SRE story, a different cost model, and a four-axis trade-off that does not collapse into a single answer. Treating it as "the cheap version of the API call" is the architectural mistake of 2026.
