
When AI Features Create Moats (and When They Don't)

9 min read
Tian Pan
Software Engineer

A leaked internal Google memo put it plainly: "We aren't positioned to win this arms race and neither is OpenAI." The author's argument was that fine-tuning a model with LoRA costs roughly $100, that open-source communities could replicate closed-model capabilities within months, and that "we have no moat." This was a Google researcher writing about Google. If that's true inside the world's best-resourced AI lab, what does it mean for your product team betting on a data advantage?

The honest answer is that most AI features are not moats. They are rented capabilities with a UI. But some genuinely compound — and the difference is not about how much data you have. It's about the specific mechanical conditions under which data actually creates defensibility.

Why Data Network Effects Fail in LLM Products

In traditional ML, the data network effect logic was: more users → more data → better model → more users. It compounded. The canonical example was Google Search: every query made the next one better. The argument for owning data felt airtight.

LLM products have broken this loop in two places.

First, foundation model providers now absorb most of the training-data value. When OpenAI trains GPT-5 or Anthropic trains Claude 4 on internet-scale data, they extract the general reasoning capability that your fine-tuning used to be responsible for producing. Your customer's proprietary data only moves the needle at the margin. The pattern in industry analyses is consistent: companies with real-time feedback loops on domain-specific signals maintain 5+ year defensibility windows, while companies with static proprietary datasets face 12-18 month vulnerability windows before the next model generation closes the gap.

Second, diminishing returns on data arrive faster than most teams expect. Andreessen Horowitz documented a customer-support chatbot whose model stopped improving once roughly 40% of its query distribution had been collected. The asymptote arrived early. Most product teams mistake initial fine-tuning gains — which are real — for compounding advantages, which require far more specific conditions.

The result is that application-layer AI companies are currently seeing gross margins as low as 50-60%, driven by inference costs. The margin extracted by the infrastructure layer reflects where the moat actually lives.

The Four Signals That Distinguish a Moat from a Wrapper

Given this landscape, how do you evaluate whether your AI feature is compounding or just riding the API wave? Four signals reliably separate the two:

Signal 1: Your training data cannot be generated without running your business.

This is the most important signal and the rarest. Veeva Systems has 20+ years of pharmaceutical rep call notes, sample tracking logs, and physician engagement history. Palantir trains on classified government and intelligence data. Harvey AI, embedded deeply inside law firms, accumulates "process data" — not just legal answers, but the workflow trace of how experienced partners move through complex matters. None of that data exists anywhere else. You cannot buy it from a broker or approximate it with synthetic generation.

If your training data can be scraped from the web, purchased from a data vendor, or reconstructed from public sources, it is not a structural moat. Synthetic data already outperforms real data on a growing class of training tasks; data exclusivity is only defensible where the data encodes behaviors that cannot be synthesized.

Signal 2: Your switching costs are operational, not habitual.

There is a practical test: can you describe your product's core function to Claude in a sentence and get an 80% solution in 30 seconds? If yes, you are a feature dressed as a business. Habitual switching costs — users are comfortable, they know the interface — are brittle. Any sufficiently better product breaks them in weeks.

Operational switching costs are durable because replacing the tool means replacing the process. Salesforce has over 3,000 AppExchange integrations built by customers who are themselves now part of the Salesforce ecosystem. Workday is embedded in compliance, audit, and payroll runs. ServiceNow owns the ITSM ticket history that auditors require. These systems are not replaced by a better demo; they require re-architecting surrounding workflows, retraining staff, migrating years of structured records, and often regulator approval.

The question to ask about your AI feature: if a better alternative appeared tomorrow, what would it cost a customer to switch? A different conversation history is not a switching cost. A system-of-record with three years of audit trail, regulator trust, and 40 downstream integrations is.

Signal 3: Each additional deployment generates training signal your competitors cannot access.

This is the data flywheel working as intended — but it requires a closed feedback loop with specific properties. GitHub Copilot collects accept/reject signals on code completions; the signal is precise, interpretable, and directly applicable to model improvement. A chatbot collecting five-star ratings at conversation end provides almost no usable signal.

The loop also needs to run fast. Systems that complete the deploy-collect-retrain cycle in near-real-time compound faster than teams running quarterly retraining jobs. And critically, each enterprise deployment must generate signal that is unavailable to competitors — not just usage telemetry, but proprietary behavioral data about how your specific class of customer solves their specific class of problem. Harvey AI's deployments at elite law firms generated legal reasoning traces that no other AI company was present to observe. That is cumulative exclusion from the training set.
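The difference between a usable signal and a vanity metric is concrete enough to sketch in code. Here is a minimal, hypothetical schema for a Copilot-style accept/reject event — every name below is illustrative, not taken from any real product's telemetry — showing why an accept/reject/revision signal converts directly into supervised training pairs while a bare rejection (or a five-star rating) converts into nothing:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical closed-loop feedback event, in the spirit of accept/reject
# signals on code completions. Field names are assumptions for illustration.
@dataclass
class FeedbackEvent:
    prompt_context: str                   # what the model saw
    model_output: str                     # what it proposed
    accepted: bool                        # did the user keep it?
    user_revision: Optional[str] = None   # the correction, if any

def to_training_pair(event: FeedbackEvent) -> Optional[tuple[str, str]]:
    """Turn one feedback event into a supervised (input, target) pair.

    An acceptance labels the model's own output as the target; a rejection
    that carries a revision labels the user's correction as the target.
    A bare rejection has no target and is dropped — which is exactly why
    generic ratings at conversation end train nothing useful.
    """
    if event.accepted:
        return (event.prompt_context, event.model_output)
    if event.user_revision is not None:
        return (event.prompt_context, event.user_revision)
    return None
```

The design point is that the signal is interpretable as a correction: each event either confirms or replaces a specific model output, so it can flow straight into the retraining set without human annotation.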

Signal 4: Your advantage is operational sophistication, not just the model.

Harvey AI's reported hallucination rate of approximately 0.2% is not a foundation model capability. It is an engineering achievement requiring factual claim decomposition, cross-referencing, custom legal embeddings (trained on 20 billion tokens of specialized text), and multi-model orchestration. That work took years to tune and validate against adversarial legal scrutiny. A competitor cannot replicate it by switching their API provider to a better base model.

This is the "effort gap" moat. Reaching 99% reliability on a complex vertical task — the reliability level that survives integration into regulated workflows — requires accumulated engineering investment that cannot be replicated quickly. The moat is not the model; it is the decade of iteration required to find and close every failure mode in your domain.

Where Moats Actually Live: A Spectrum

Ranking the defensibility mechanisms from weakest to strongest helps calibrate investment decisions:

  • API wrapper — No moat. Replicable in days.
  • Fine-tuned model on proprietary static data — 12-18 month defensibility. Next model generation closes most of the gap.
  • Workflow integration with real switching costs — Years, not months, to replace. Requires process redesign.
  • Active closed-loop data flywheel — Compounds with usage, but only if the loop closes with high-quality, domain-specific signal.
  • Compliance and regulatory lock-in plus data gravity — Replacement measured in years. Audit trail, certification, and regulator trust cannot be ported.
  • Network effects on proprietary data contributions — Strongest. Rare in LLM products today. The credit bureau model: participants must contribute to access.

The honest diagnosis for most AI features is that they live in the first two categories. That is not a failure — a well-executed API wrapper business can be highly profitable. But calling it a moat misleads the team about what they are actually building and where the risk lives.

When Interaction Data Does Compound

The conditions for a genuine data flywheel are narrow but achievable. You need three things simultaneously:

A feedback signal that is domain-specific and interpretable. Not session length. Not thumbs up/down on a conversation. Signals like "code accepted, code rejected, test passed, test failed" or "lawyer revised this paragraph, lawyer accepted this paragraph" are interpretable as corrections. Generic engagement metrics do not train anything useful.

A loop fast enough that the model improves before the distribution shifts. User behavior and product context drift. A retraining cycle that runs monthly on a product changing weekly will always be behind. The flywheel stalls when feedback quality degrades — which happens when the product improves enough that users stop generating corrective signals, when user behavior shifts faster than training can track, or when annotation capacity becomes the bottleneck.

Deployment density that creates compounding exclusion. Tesla's driving data advantage is not just volume. It is 300 million miles of human intervention in edge cases — the long tail of driver decisions in unusual situations that synthetic data cannot replicate and that no competitor was present to collect. The exclusion is structural. Your flywheel needs an analog: each new deployment must generate signal that no competitor can approximate without being there.

If all three conditions are met, the flywheel is real. If any one is missing, the advantage is temporary.

What This Means for Product Decisions

The practical implication is that investing in a data moat without first auditing these conditions is speculative. Before treating proprietary data as a competitive advantage, the engineering team should be able to answer:

  • Can we describe the mechanism by which this data improves the model, and at what improvement rate?
  • Is our feedback loop closed? Do we know how long the retraining cycle takes?
  • What would it cost a customer to switch to a competitor today? Is that cost operational or merely habitual?
  • Is our advantage in the model, or in the ten years of engineering required to reach this reliability level in this domain?

Companies that cannot answer these questions clearly are likely renting capability with good UX. That can be a business — but it should not be mistaken for compounding defensibility. The teams that build genuine moats are typically the ones who designed for them from day one: choosing deployment strategies that generate proprietary signal, building workflow integrations deep enough that replacement requires process redesign, and treating the engineering work of domain reliability as the product, not the prompt.

The moat is rarely the data. It is almost always the work required to turn that data into something a competitor would take years to replicate.
