2 posts tagged with "reasoning"

The LLM Forgery Problem: When Your Model Builds a Convincing Case for the Wrong Answer

· 10 min read
Tian Pan
Software Engineer

Your model wrote a detailed, well-structured analysis. Every sentence was grammatically correct and internally consistent. The individual facts it cited were accurate. And yet the conclusion was wrong — not because the model lacked the information to get it right, but because it had already decided on the answer before it started reasoning.

This is not hallucination. Hallucination is when a model fabricates facts. The forgery problem is subtler and, in production systems, harder to catch: the model reaches a conclusion first, then constructs a plausible-sounding chain of evidence to support it. The facts are real. The synthesis is a lie.

Engineers who haven't encountered this failure mode yet will. It shows up in every domain where LLMs are asked to do analysis — code review, document summarization, risk assessment, question answering over a knowledge base. The model sounds authoritative. It cites real evidence. And it has quietly ignored everything that pointed the other way.

Cognitive Tool Scaffolding: Near-Reasoning-Model Performance Without the Price Tag

· 10 min read
Tian Pan
Software Engineer

Your reasoning model bill is high, but the capability gap might be narrower than you think. A standard 70B model running four structured cognitive operations on the AIME 2024 math benchmark jumps from 13% to 30% accuracy, closing more than half of the gap to o1-preview's 44% at a fraction of the inference cost. On a more capable base model like GPT-4.1, the same technique pushes accuracy from 32% to 53%, which surpasses o1-preview's score.

The technique is called cognitive tool scaffolding, and it's the latest evolution of a decade of research into making language models reason better without changing their weights.