3 posts tagged with "document-ai"

Why Vision Models Ace Benchmarks but Fail on Your Enterprise PDFs

· 9 min read
Tian Pan
Software Engineer

A benchmark result of 97% accuracy on a document understanding dataset looks compelling until you run it against your company's actual invoice archive and realize it's quietly garbling 30% of the line items. The model doesn't throw an error. It doesn't return low confidence. It just produces output that looks plausible and is wrong.

This is the defining failure mode of production document AI: silent corruption. Unlike a crash or an exception, silent corruption propagates. The garbled table cell flows into the downstream aggregation, the aggregation feeds a report, the report drives a decision. By the time you notice, tracing the root cause is archaeology.

The gap between benchmark performance and production performance in document AI is real, persistent, and poorly understood by teams evaluating these models. Understanding why it exists — and how to defend against it — is the engineering problem this post addresses.
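One defense against silent corruption is a cross-field consistency check: exploit an invariant the document itself states, such as line items summing to the printed total. The sketch below is illustrative, not from the post; the `LineItem` schema and function names are assumptions, and real pipelines would need tolerance for currency rounding and multi-page totals.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    description: str
    amount_cents: int  # store money as integer cents to avoid float drift


def validate_invoice(line_items: list[LineItem], stated_total_cents: int) -> bool:
    """Cross-check extracted line items against the document's stated total.

    A garbled cell or dropped row almost never preserves this invariant,
    so the check turns silent corruption into an explicit, catchable failure
    instead of letting plausible-looking garbage flow downstream.
    """
    return sum(item.amount_cents for item in line_items) == stated_total_cents


items = [LineItem("widgets", 1200), LineItem("shipping", 350)]
print(validate_invoice(items, 1550))  # consistent extraction
print(validate_invoice(items, 1500))  # one garbled digit: flagged, not propagated
```

A check like this costs nothing at inference time, yet it converts the worst failure mode (confidently wrong output) into the best one (a document routed to review).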

Document AI in Production: Why PDF Demos Lie and Production Pipelines Don't

· 11 min read
Tian Pan
Software Engineer

A clean PDF, a capable LLM, and thirty lines of code. The demo works. You extract the invoice total, the contract dates, the patient diagnosis. Stakeholders are impressed. Then you push to production, and within a week the pipeline is silently returning wrong data on 15% of documents — and nobody knows.

This is the document AI trap. The failure mode isn't a crash or an exception; it's a pipeline that reports success while producing garbage. Building production document extraction is a fundamentally different problem from building a demo, and most teams don't realize this until they've already shipped.

Why Your Document Extractor Breaks on the Contracts That Matter Most

· 13 min read
Tian Pan
Software Engineer

Your invoice parser probably works fine. Feed it a clean, digital PDF from a Fortune 500 vendor — structured rows, consistent column widths, machine-generated text — and it will extract line items with near-perfect accuracy. Then someone uploads a multi-page contract from a regional supplier, a scanned form with handwritten amendments, or a financial statement where the table header lives on page 3 and the rows continue through page 6. The extractor fails silently, returns partial data, or confidently produces structured output that is wrong in ways no downstream validation catches.

This is the central problem with enterprise document intelligence: the documents that break your system are not the edge cases. They are the ones with the highest business value.