When AI Sounds Right but Isn't: LLM Confabulation in Technical and Scientific Domains
The insidious thing about LLM confabulation in technical domains isn't that the model produces obviously wrong answers. It's that the model produces beautifully structured, confidently stated, technically plausible answers that are subtly wrong in ways only domain experts catch, often only after the damage is done.
A Monte Carlo physics simulation that initializes correctly but resamples particle positions from scratch at each step rather than making incremental updates. A chemical formula that follows the right naming conventions but has an incorrect oxidation state. An engineering specification that cites the right standard, references the right units, and has exactly the wrong load coefficient. Each output looks right. Each sounds authoritative. Each is wrong in ways that won't surface until someone runs the experiment, stress-tests the component, or critically reads the derivation.
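To make the Monte Carlo case concrete, here is a minimal sketch. It rests on assumptions not in the text above: a single particle in a one-dimensional harmonic potential with kT = 1, and hypothetical function names (`resample_from_scratch`, `metropolis`). It is an illustration of the failure pattern, not a reconstruction of any particular model output. Both samplers initialize the same way and both run without error; only one of them samples the Boltzmann distribution.

```python
import math
import random


def energy(x):
    """Assumed toy potential: U(x) = x^2 / 2, in units where kT = 1."""
    return 0.5 * x * x


def resample_from_scratch(n_steps, box=5.0, seed=0):
    """Flawed variant: every step redraws the position from the initial
    uniform distribution, with no acceptance test, so the energy never
    influences the samples."""
    rng = random.Random(seed)
    return [rng.uniform(-box, box) for _ in range(n_steps)]


def metropolis(n_steps, box=5.0, step=1.5, seed=0):
    """Incremental variant: propose a small move from the current position
    and accept it with the Metropolis probability min(1, exp(-dU))."""
    rng = random.Random(seed)
    x = rng.uniform(-box, box)  # same initialization as the flawed variant
    samples = []
    for _ in range(n_steps):
        trial = x + rng.uniform(-step, step)
        delta = energy(trial) - energy(x)
        if delta <= 0 or rng.random() < math.exp(-delta):
            x = trial  # accept the move; otherwise keep the old position
        samples.append(x)
    return samples


def variance(xs):
    """Population variance of a list of samples."""
    mean = sum(xs) / len(xs)
    return sum((v - mean) ** 2 for v in xs) / len(xs)


if __name__ == "__main__":
    print("metropolis variance:", variance(metropolis(50_000)))
    print("resampled variance: ", variance(resample_from_scratch(50_000)))
```

For this assumed potential the Boltzmann distribution is a unit Gaussian, so a correct sampler should report a position variance close to 1. The flawed variant instead reports the variance of the uniform initialization, which is box²/3 (about 8.3 for `box=5.0`). Nothing crashes and the output is a plausible-looking list of positions, which is why the error tends to surface only when someone checks a physical observable.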
