LLM Vendor Lock-In Is a Spectrum, Not a Binary

10 min read
Tian Pan
Software Engineer

A team builds a production feature on GPT-4. Months later, they decide to evaluate Claude for cost reasons. They spend two weeks "migrating"—but the core API swap takes an afternoon. The remaining ten days go toward fixing broken system prompts, re-testing refusal edge cases, debugging JSON parsers that choke on unexpected prose, and re-tuning tool-calling schemas that behave differently across providers. Migration estimates that assumed a simple connector swap balloon into a multi-layer rebuild.

This is the LLM vendor lock-in problem in practice. And the teams that get burned aren't the ones who chose the wrong provider—they're the ones who didn't recognize that lock-in exists on multiple axes, each with a different risk profile.

Lock-In Has Six Layers, and They Don't All Matter Equally

Most engineers think of vendor lock-in as an API format problem. It's much more than that.

Layer 1: API call format. The OpenAI-compatible API has become a de facto standard—80%+ of new providers implement it. Changing the base URL and model name is a one-line change. This layer has the lowest switching cost and is largely a solved problem. If you build against the OpenAI SDK shape and your provider supports it, you're mostly fine.
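To make the "one-line change" concrete, here is a minimal sketch of what an OpenAI-compatible swap looks like when the endpoint and model name live in configuration rather than code. The provider names and URLs besides OpenAI's are illustrative placeholders, not real endpoints:

```python
# Sketch: with an OpenAI-compatible provider, switching is a config change.
# "other" and its URL are placeholders for any OpenAI-compatible vendor.
PROVIDERS = {
    "openai": {"base_url": "https://api.openai.com/v1", "model": "gpt-4o"},
    "other": {"base_url": "https://api.example-provider.com/v1", "model": "example-model"},
}

def client_config(provider: str) -> dict:
    """Return the two values an OpenAI-SDK-style client needs.

    Application code constructs its client from this dict
    (e.g. OpenAI(base_url=cfg["base_url"])) and never hardcodes either value.
    """
    cfg = PROVIDERS[provider]
    return {"base_url": cfg["base_url"], "model": cfg["model"]}
```

Everything downstream of `client_config` stays identical across providers; only these two strings change.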

Layer 2: Prompt vocabulary and structure. This is where engineers first get surprised. Claude has been fine-tuned to pay close attention to XML tag structures. GPT models prefer markdown-formatted prompts with sections and emphasis. A system prompt optimized for one model family often produces degraded output on another—not catastrophically wrong, but subtly off in ways that take time to diagnose. The instruction style also diverges: instruction-oriented models benefit from explicit, structured prompts; reasoning-oriented models perform better with sparse, high-level goals and strong verification steps.

Layer 3: Tool-calling schemas. OpenAI uses JSON Schema for function definitions. Claude uses content blocks where tool_use appears separately from text. Google Gemini has its own format. Tool selection accuracy varies: GPT-4o achieves 97–99%, Claude Sonnet runs 96–99%, Gemini sits at 95–98%. These differences are tolerable in isolation, but when you're running thousands of agentic tool calls per day, a 2% divergence in accuracy is a real production difference.
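The format gap is mechanical enough to bridge with a small translation function. This sketch converts an OpenAI-style function definition into Anthropic's tool shape; the field names follow each provider's public docs at the time of writing, so verify them against the current API references before depending on this:

```python
def openai_tool_to_anthropic(tool: dict) -> dict:
    """Translate an OpenAI-style function definition into Anthropic's
    tool format. Field names are from the public docs; verify against
    the current API references before relying on this in production."""
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        # Anthropic names the JSON Schema field "input_schema";
        # OpenAI nests it under "parameters".
        "input_schema": fn["parameters"],
    }

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

anthropic_tool = openai_tool_to_anthropic(weather_tool)
```

Translating the schema is the easy part; the accuracy differences in which tool the model actually picks cannot be papered over this way.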

Layer 4: Output length norms and formatting quirks. Claude occasionally prepends prose before a JSON block—"Here's the data you requested:"—which silently breaks parsers that expect raw JSON. GPT-4o is more consistent about following output format instructions exactly. These behaviors are undocumented and change between model versions. Teams discover them in production, not in evaluation.
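A defensive parser absorbs much of this class of breakage. The sketch below, stdlib-only, handles raw JSON, fenced JSON, and JSON preceded by prose; it deliberately does not handle braces inside string values in the fallback path, which is a known limitation of this simple approach:

```python
import json
import re

def extract_json(text: str):
    """Parse model output that may wrap JSON in prose or code fences.
    Returns the first JSON object found, or raises ValueError."""
    # Fast path: the output is already raw JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Strip markdown code fences if present.
    fenced = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    if fenced:
        return json.loads(fenced.group(1))
    # Fall back to the first balanced {...} span.
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found")
    depth = 0
    for i, ch in enumerate(text[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return json.loads(text[start : i + 1])
    raise ValueError("unbalanced JSON object")
```

A parser like this turns "Here's the data you requested: {...}" from a production incident into a non-event, regardless of which model produced it.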

Layer 5: Refusal patterns and safety alignment. Different models have different thresholds. Tightly safety-aligned models generate very low fulfillment rates on unsafe prompts—but also produce higher over-refusal rates on innocuous ones. If your application touches borderline domains (medical, legal, adult content), the refusal profile of your model is part of your product. Switching providers can change that profile in ways that are invisible until they surface as customer complaints.

Layer 6: Embeddings, fine-tuning, and stored state. This is the hardest layer to escape. Chat histories stored as embeddings with one provider's model are incompatible with another's embedding space. Models fine-tuned on proprietary platforms often can't be exported. If your application uses retrieval or memory, your data is entangled with your provider's representation layer.

Which Dependencies Are Acceptable—and Which Are Debt From Day One

The right mental model isn't "avoid all lock-in." It's "know which lock-in you're taking on deliberately."

Acceptable lock-in is any dependency where:

  • The feature provides genuine, measurable advantage not available elsewhere
  • Migration cost is bounded and you've estimated it honestly
  • The feature is stable and the provider has committed to its longevity
  • You've built an abstraction layer that limits the blast radius downstream

Using Claude's extended thinking for complex reasoning pipelines is acceptable lock-in. The capability is genuinely differentiated, it's available across multiple deployment targets (direct API, Bedrock, Vertex AI), and you can scope its use to specific pipeline stages without it bleeding into the rest of your codebase. Similarly, using GPT-4o's vision capabilities for document understanding tasks where it demonstrably outperforms alternatives is a reasonable engineering tradeoff.

Technical debt from day one looks like:

  • Application code that calls provider APIs directly without an abstraction layer, with no plan to add one
  • Prompt strings littered with provider-specific formatting that's never been tested against alternatives
  • Fine-tuning on a proprietary platform without exporting the weights or validating that the improvement generalizes to parameter-efficient techniques (LoRA, adapters) that could be applied to other base models
  • Storing embeddings and chat histories in provider-managed vector stores without an export path
  • Relying on output behaviors (specific JSON formatting, verbosity levels, reasoning traces) that are neither documented nor tested for consistency across versions

The cost of this debt isn't just the eventual migration. It's the ongoing tax: you can't do competitive evaluation of new models without rebuilding your eval harness, you can't respond to provider pricing changes, and you can't fall back to an alternative when a provider has an outage.

The Abstraction Architecture That Actually Works

The industry has converged on a few practical approaches to containment.

Use an AI gateway at the edge. Tools like LiteLLM and Portkey sit in front of your application code and translate between providers via a unified interface. Netflix, Lemonade, and Rocket Money use LiteLLM in production to get day-zero access to new models without rewriting application logic. When you want to switch from one provider to another, you change a configuration value, not application code. The gateway also gives you centralized observability, cost attribution, and fallback routing when a provider degrades.
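As a sketch of what "a configuration value, not application code" means in practice, here is the shape of a LiteLLM proxy config with a primary model and a fallback. The field names follow LiteLLM's proxy configuration format at the time of writing, but check the current LiteLLM docs before copying this:

```yaml
model_list:
  - model_name: chat-default            # alias the application code uses
    litellm_params:
      model: openai/gpt-4o              # actual provider/model behind the alias
      api_key: os.environ/OPENAI_API_KEY
  - model_name: chat-fallback
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY

router_settings:
  fallbacks:
    - chat-default: ["chat-fallback"]   # route here when the primary degrades
```

Application code only ever asks for `chat-default`; which vendor answers is the gateway's concern.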

Gartner predicts that by 2028, 70% of organizations building multi-LLM applications will route through an AI gateway—up from under 5% in 2024. The trajectory is clear: direct provider coupling is being recognized as the same kind of technical risk as direct database coupling was in the 2000s.

Abstract your prompt layer. Don't embed provider-specific formatting in business logic. Instead, maintain a prompt builder that accepts model target as a parameter and applies the appropriate structural conventions—XML tags for Claude, markdown sections for GPT, minimal structure for reasoning models. This adds a small layer of indirection but makes cross-model evaluation tractable.
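A minimal version of such a prompt builder might look like this. The formatting conventions encoded here (XML for Claude, markdown for GPT, sparse structure for reasoning models) are illustrative, not exhaustive:

```python
from enum import Enum

class ModelFamily(Enum):
    CLAUDE = "claude"        # fine-tuned to attend to XML tag structure
    GPT = "gpt"              # prefers markdown sections and emphasis
    REASONING = "reasoning"  # prefers sparse, high-level goals

def build_prompt(task: str, context: str, family: ModelFamily) -> str:
    """Apply model-family formatting conventions to the same logical prompt.
    Business logic passes in task and context; it never sees the formatting."""
    if family is ModelFamily.CLAUDE:
        return f"<task>{task}</task>\n<context>{context}</context>"
    if family is ModelFamily.GPT:
        return f"## Task\n{task}\n\n## Context\n{context}"
    # Reasoning models: state the goal, skip the scaffolding.
    return f"{task}\n\nRelevant context: {context}"
```

With this in place, a cross-model eval run is a loop over `ModelFamily` rather than a rewrite of every prompt string.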

Use MCP for tool integration. Model Context Protocol, originally released in November 2024 and now adopted by both OpenAI and Google, standardizes how models connect to external tools and data sources. Build your tool integration once against the MCP spec and it works across Claude, GPT, and Gemini without per-provider adapter code. This won't eliminate all tool-calling differences, but it significantly reduces the per-provider surface area.

Isolate your embedding layer. If you use retrieval or memory, keep your embedding pipeline behind an interface that can swap out the underlying model. Don't pass provider embedding objects directly into application code. When you need to re-embed because you switched providers, the blast radius should be one module, not a refactor across your entire retrieval path.
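One way to enforce that boundary is a small interface plus a rule that stored vectors carry the ID of the model that produced them, so a provider switch forces a re-embed instead of silently mixing incompatible embedding spaces. The `StubEmbedder` below is a stand-in for a real provider-backed implementation:

```python
from typing import Protocol

class Embedder(Protocol):
    model_id: str
    def embed(self, texts: list[str]) -> list[list[float]]: ...

class StubEmbedder:
    """Stand-in for a provider-backed embedder (OpenAI, Cohere, a local
    model). Real implementations call the provider behind this interface."""
    model_id = "stub-v1"

    def embed(self, texts: list[str]) -> list[list[float]]:
        # Deterministic toy vectors; a real embedder returns model output.
        return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in texts]

def index_documents(embedder: Embedder, docs: list[str]) -> dict:
    """Tag stored vectors with the model that produced them, so a
    provider switch triggers re-embedding instead of silent mixing."""
    return {
        "model_id": embedder.model_id,
        "vectors": embedder.embed(docs),
    }
```

Swapping providers then means writing one new `Embedder` implementation and re-running the indexing job, not refactoring the retrieval path.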

The Switching Cost Math

Enterprise teams that treat model switching as a plug-and-play operation routinely discover it's not. The hidden costs are predictable:

Migration consumes 20–50% of the original development time. During evaluation and migration, teams often run overlapping contracts with both vendors, paying double for a period. Minimum annual commitments at enterprise tiers run from $50,000 to $200,000—so a migration that saves $30,000/year in inference costs can take three years to break even if handled badly.
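The break-even arithmetic is worth running explicitly before committing to a switch. The $90,000 figure below is a back-of-envelope reading of the numbers above (the all-in cost implied by a three-year payback on $30,000/year of savings), not a quoted price:

```python
def breakeven_years(migration_cost: float, annual_savings: float) -> float:
    """Years until a migration pays for itself, ignoring discounting."""
    return migration_cost / annual_savings

# Illustrative: ~$90k of engineering time, overlapping contracts, and
# commitments against $30k/year of inference savings.
years = breakeven_years(90_000, 30_000)  # 3.0
```

If that number exceeds your planning horizon, or a vendor's minimum commitment alone dwarfs the savings, the migration is a strategic move, not a cost optimization.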

The tokenization difference alone creates budget surprises. Anthropic's tokenizer is more verbose than OpenAI's, so the same prompt consumes more tokens on Claude than on GPT. Token budgets that worked for one provider don't carry over, and caches built around one tokenization scheme need re-tuning.

More subtly, quality regressions appear in places you didn't test. You test the happy path. The model you're migrating to is subtly different on long-tail inputs—different reasoning patterns, different verbosity on ambiguous instructions, different handling of malformed tool arguments. The gap between "passes evals" and "production-ready" is longer with a model switch than engineers expect.

The teams that migrate cheaply are the ones who built with migration in mind before they needed it. They maintained model-specific prompt variants, used gateways for routing, kept embeddings portable, and ran regular cross-model eval runs so they already knew where the differences were.

When to Embrace Lock-In Deliberately

None of this means you should avoid provider-specific capabilities. It means you should use them with your eyes open.

If Claude's extended thinking is the right tool for a complex reasoning pipeline, use it—but scope the lock-in. Don't let the extended-thinking prompt format bleed into your general-purpose chat code. Keep provider-specific features isolated in modules that have explicit interfaces, clear deprecation paths, and documented alternatives.

If GPT-4o's vision handling is measurably better for your document understanding use case, use it. But maintain the abstraction layer upstream and downstream, so the vision component can be swapped without rebuilding the surrounding pipeline.

Meta's approach to its open-source model strategy is instructive: users stay because the ecosystem is genuinely useful, not because they're trapped. Build your application to satisfy the same constraint. If a user (or your own engineering team) wants to swap out the underlying model, the swap should be a configuration change at the gateway layer, not a refactor that touches ten files and breaks three parsers.

The Axes That Actually Matter

If you're building a new LLM application today, these are the portability decisions worth making deliberately:

  • Route through a gateway. One line to add LiteLLM or Portkey; months of refactoring to remove direct coupling later.
  • Test prompts across providers from day one. Run your eval suite against at least two model families. Lock-in becomes visible before it's embedded.
  • Own your embedding representations. Keep an export path. Never depend on a provider-managed vector store without one.
  • Scope provider-specific features. Use extended thinking, vision, or reasoning modes—but confine them to bounded modules with explicit interfaces.
  • Track output format behaviors. Log cases where model output deviates from expected format. These surface as parser bugs in production.

The goal isn't to be model-agnostic—that's an expensive abstraction that often reduces quality by targeting the lowest common denominator. The goal is to be model-portable: able to make deliberate provider decisions, swap components when circumstances change, and never get forced into a migration by a vendor's pricing decision or outage.

Lock-in exists on a spectrum. The teams that stay portable are the ones who decided, early, which part of that spectrum they were willing to occupy—and built accordingly.
