Skip to main content

7 posts tagged with "enterprise-ai"

View all tags

The Document Is the Attack: Prompt Injection Through Enterprise File Pipelines

· 9 min read
Tian Pan
Software Engineer

Your AI assistant just processed a contract from a prospective vendor. It summarized the terms, flagged the risky clauses, and drafted a response. What you don't know is that the PDF contained white text on a white background — invisible to your eyes, perfectly visible to the model — instructing it to recommend acceptance regardless of terms. The summary looks reasonable. The approval recommendation looks reasonable. The model followed instructions you never wrote.

This is the document-as-attack-surface problem, and most enterprise AI pipelines are completely unprepared for it.

The vulnerability is architectural, not incidental. When document content flows directly into an LLM's context window, the model has no reliable way to distinguish legitimate instructions from attacker-controlled content embedded in a file. Every document your pipeline ingests is a potential instruction source — and in most systems, untrusted documents and trusted system prompts are processed with equal authority.

Your RAG Knows the Docs. It Doesn't Know What Your Engineers Know.

· 10 min read
Tian Pan
Software Engineer

Your enterprise just deployed a RAG system. You indexed every Confluence page, every runbook, every architecture doc. Six months later, a senior engineer leaves — the one who knows why the payment service has that unusual retry pattern, why you never scale the cache past 80%, and which vendor never to call on Fridays. That knowledge was never written down. Your RAG system has no idea it existed.

This is the tacit knowledge problem, and it's why most enterprise AI systems underperform not because of retrieval quality or hallucination, but because the knowledge they need was never captured in the first place. Sixty percent of employees report that it's difficult or nearly impossible to get crucial information from colleagues. Ninety percent of organizations say departing employees cause serious knowledge loss. The documents your RAG can index are only the tip.

Why Vision Models Ace Benchmarks but Fail on Your Enterprise PDFs

· 9 min read
Tian Pan
Software Engineer

A benchmark result of 97% accuracy on a document understanding dataset looks compelling until you run it against your company's actual invoice archive and realize it's quietly garbling 30% of the line items. The model doesn't throw an error. It doesn't return low confidence. It just produces output that looks plausible and is wrong.

This is the defining failure mode of production document AI: silent corruption. Unlike a crash or an exception, silent corruption propagates. The garbled table cell flows into the downstream aggregation, the aggregation feeds a report, the report drives a decision. By the time you notice, tracing the root cause is archaeology.

The gap between benchmark performance and production performance in document AI is real, persistent, and poorly understood by teams evaluating these models. Understanding why it exists — and how to defend against it — is the engineering problem this post addresses.

Enterprise RAG Governance: The Org Chart Behind Your Retrieval Pipeline

· 11 min read
Tian Pan
Software Engineer

Forty to sixty percent of enterprise RAG deployments fail to reach production. The culprit is almost never the retrieval algorithm—HNSW indexing works fine, embeddings are reasonably good, and vector similarity search is a solved problem. The breakdown happens upstream and downstream: no document ownership, no access controls enforced at query time, PII sitting unprotected in vector indexes, and a retrieval corpus that diverges from reality within weeks of launch. These are governance failures, and most engineering teams treat them as someone else's problem right up until a compliance team, a security audit, or a user who received another tenant's data makes it their problem.

This is the organizational and technical anatomy of a governed RAG knowledge base—written for engineers who own the pipeline, not executives who approved the budget.

The Pretraining Shadow: The Hidden Constraint Your Fine-Tuning Plan Ignores

· 9 min read
Tian Pan
Software Engineer

Your team spent three sprints labeling 50,000 domain-specific examples. You ran LoRA fine-tuning on a frontier model. The eval numbers improved. Then a colleague changed the phrasing of a prompt slightly, and the model reverted to the behavior you thought you'd suppressed. That's not a dataset problem. That's the pretraining shadow.

The core insight that practitioners keep rediscovering: fine-tuning teaches a model how to talk in a new context, but it cannot rewrite what the model fundamentally knows or is inclined to do. The behaviors, biases, and factual priors encoded during pretraining are a gravitational field that fine-tuning orbits but rarely escapes.

Internal AI Tools vs. External AI Products: Why Most Teams Get the Safety Bar Backwards

· 8 min read
Tian Pan
Software Engineer

Most teams assume that internal AI tools need less safety work than customer-facing AI products. The logic feels obvious: employees are trusted users, the blast radius is contained, and you can always fix things with a Slack message. This intuition is dangerously wrong. Internal AI tools often need more safety engineering than external products — just a completely different kind.

The 88% of organizations that reported AI agent security incidents last year weren't mostly hit through their customer-facing products. The incidents came through internal tools with ambient authority over business systems, access to proprietary data, and the implicit trust of an employee session.

The Internal AI Tool Trap: Why Your Company's AI Chatbot Has 12% Weekly Active Users

· 8 min read
Tian Pan
Software Engineer

Your company spent six months building an internal AI chatbot. The demo was impressive — executives nodded, the pilot group loved it, and someone even called it "transformative" in a Slack thread. Three months after launch, you check the analytics: 12% weekly active users, and most of those are the same five people from the original pilot.

This is the internal AI tool trap, and nearly every enterprise falls into it. The tool works. The technology is sound. But nobody uses it, because you built a destination when you should have built an intersection.