The Dependency Injection Pattern for AI Applications: Writing Code That Survives Model Swaps
When OpenAI retired text-davinci-003 in January 2024, teams that had woven that model name into their business logic spent weeks untangling it. Not because swapping a model is technically hard — it's a string and an API call — but because the model was entangled with everything else: prompt construction, response parsing, error handling, retry logic, all intertwined with the assumption that one specific provider would answer. The engineering cost of that kind of migration has been estimated at $50K–$100K for mid-size production systems, plus a month or more of diverted engineering attention.
The fix isn't exotic. It's a pattern every backend engineer already knows: dependency injection. The insight is that your business logic should depend on an abstraction of a language model, not a concrete client from OpenAI or Anthropic. Inject the concrete implementation at startup. The rest of the code never knows which provider is behind the interface.
This sounds obvious in the abstract. In practice, most teams skip it during the prototype phase, never revisit it, and then face the migration tax when a model is deprecated, a provider raises prices threefold, or a competitor's model beats theirs on a critical benchmark.
Why AI Applications Couple Themselves to Providers
The coupling happens naturally. You start with the OpenAI Python SDK because the docs are excellent and the playground is right there. You build a question-answering feature, a summarizer, a code reviewer. Each feature calls client.chat.completions.create(model="gpt-4o", ...) directly. The model name is a string literal in six files. The response parsing assumes the OpenAI response schema. The retry logic wraps the OpenAI rate-limit error class.
Three months later, you want to evaluate Claude because it performs better on your task. Now you're not just swapping a provider — you're refactoring six files, rewriting response parsers (Anthropic's response structure is different), replacing error handling, and updating your prompt format because Anthropic handles system prompts differently than OpenAI. What should be a two-hour experiment becomes a two-week project.
Provider API fragmentation makes this worse than it sounds. Google Gemini rejects JSON schemas containing empty items: {} fields that OpenAI accepts. Anthropic requires explicit cache-control markers while other providers handle caching automatically. Temperature constraints differ too: OpenAI's newer reasoning models don't support the parameter at all, while Anthropic requires it to be exactly 1 in extended thinking mode. Every provider has quirks that will break code written against another provider's assumptions.
The Interface That Sets You Free
The dependency inversion principle says high-level modules should depend on abstractions, not concrete implementations. For AI applications, the abstraction is a language model interface — a contract that says "given input, produce output" without specifying which provider fulfills it.
In Python, this looks like:
```python
from abc import ABC, abstractmethod

class LanguageModel(ABC):
    @abstractmethod
    def invoke(self, prompt: str) -> str: ...

    @abstractmethod
    def invoke_messages(self, messages: list[dict]) -> str: ...
```
Your business logic takes this interface as a constructor argument:
```python
class DocumentSummarizer:
    def __init__(self, model: LanguageModel):
        self._model = model

    def summarize(self, text: str) -> str:
        return self._model.invoke(f"Summarize this document:\n\n{text}")
```
At application startup, you inject the concrete implementation:
```python
# Production
summarizer = DocumentSummarizer(model=OpenAIModel("gpt-4o"))

# Evaluation run against Claude
summarizer = DocumentSummarizer(model=AnthropicModel("claude-opus-4-5"))

# Unit test
summarizer = DocumentSummarizer(model=MockModel("summary text"))
```
The DocumentSummarizer class never changes across these three scenarios. You can run an A/B evaluation between models by changing a single line. You can write deterministic unit tests without network calls. When a model is deprecated, you update the startup configuration, not the business logic.
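The concrete implementations behind the interface are thin adapters: each one translates the neutral invoke() call into its provider SDK's shape. A minimal sketch of what an OpenAI adapter might look like — the SDK client is itself injected so the adapter can be tested with a stub, and the response-unwrapping path (choices[0].message.content) follows the OpenAI chat completions schema:

```python
from abc import ABC, abstractmethod


class LanguageModel(ABC):
    @abstractmethod
    def invoke(self, prompt: str) -> str: ...


class OpenAIModel(LanguageModel):
    """Adapter: translates the neutral interface into OpenAI SDK calls."""

    def __init__(self, model_name: str, client=None):
        # In production, pass an openai.OpenAI() client here;
        # tests can pass a stub with the same attribute shape.
        self._model_name = model_name
        self._client = client

    def invoke(self, prompt: str) -> str:
        # The provider-specific message format and response schema
        # live here, and only here.
        response = self._client.chat.completions.create(
            model=self._model_name,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content
```

An AnthropicModel adapter would have the same shape but unwrap Anthropic's response structure instead; none of that difference leaks past the interface.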
What the Interface Needs to Cover
A minimal language model interface needs to handle more than just text generation. Production interfaces typically include:
- Synchronous and streaming generation: Some use cases need the full response; others need to stream tokens to the UI.
- Structured output: Many providers now support JSON schema enforcement; your interface should abstract this so callers declare the output shape, not the provider-specific enforcement mechanism.
- Token counting: Cost controls and context management require knowing how many tokens a prompt consumes before sending it.
- Error normalization: Rate limits, context-length errors, and safety refusals manifest differently across providers. Your interface should translate these into a common exception hierarchy so callers don't handle provider-specific errors.
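Error normalization, for example, can be a small shared exception hierarchy plus a translation step inside each adapter. A sketch under illustrative assumptions — these exception names and the matching logic are not from any real SDK:

```python
class ModelError(Exception):
    """Base class for all normalized model errors."""

class RateLimitError(ModelError):
    """Provider throttled the request; safe to retry with backoff."""

class ContextLengthError(ModelError):
    """Prompt exceeded the context window; retrying won't help."""

class SafetyRefusalError(ModelError):
    """The model declined to answer on safety grounds."""


def normalize_provider_error(exc: Exception) -> ModelError:
    # Hypothetical mapping: each adapter translates its SDK's exception
    # types into the shared hierarchy before re-raising, so callers only
    # ever catch the normalized classes.
    name = type(exc).__name__
    if "RateLimit" in name:
        return RateLimitError(str(exc))
    if "ContextWindow" in name or "context_length" in str(exc):
        return ContextLengthError(str(exc))
    return ModelError(str(exc))
```

Callers then write `except RateLimitError:` once, instead of catching a different rate-limit class per provider.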
The retriever — the component that fetches relevant documents for RAG — needs a parallel interface. The tool server — whatever executes function calls — needs one too. Each of these has provider-specific implementations that change independently of your core logic.
Configuration-Driven Provider Selection
Once you have the interface, the next step is making provider selection configuration-driven rather than code-driven. The pattern is a provider registry: a map from string keys to factory functions that produce concrete implementations.
```python
PROVIDER_REGISTRY = {
    "openai:gpt-4o": lambda: OpenAIModel("gpt-4o"),
    "anthropic:claude-opus-4-5": lambda: AnthropicModel("claude-opus-4-5"),
    "gemini:flash-2.0": lambda: GeminiModel("gemini-2.0-flash"),
}

def build_model(config_key: str) -> LanguageModel:
    factory = PROVIDER_REGISTRY.get(config_key)
    if factory is None:
        raise ValueError(f"Unknown model key: {config_key}")
    return factory()
```
Your application reads the model key from an environment variable or config file:
```python
import os

model = build_model(os.environ["LLM_MODEL_KEY"])
```
This pattern lets you switch providers with a configuration change, no code change. It also enables more sophisticated behaviors: routing different tasks to different models, implementing fallback chains (try Claude, fall back to GPT-4o if rate-limited), or applying budget controls (use a cheaper model once monthly spend exceeds a threshold).
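A fallback chain falls out of the same abstraction: wrap several models in another LanguageModel that tries them in order. A minimal sketch, assuming the interface from above and a normalized RateLimitError of the kind discussed earlier (both redeclared here so the sketch is self-contained):

```python
from abc import ABC, abstractmethod


class LanguageModel(ABC):
    @abstractmethod
    def invoke(self, prompt: str) -> str: ...


class RateLimitError(Exception):
    """Normalized rate-limit error raised by provider adapters."""


class FallbackModel(LanguageModel):
    """Implements LanguageModel by trying each wrapped model in order."""

    def __init__(self, *models: LanguageModel):
        self._models = models

    def invoke(self, prompt: str) -> str:
        last_error: Exception = RuntimeError("no models configured")
        for model in self._models:
            try:
                return model.invoke(prompt)
            except RateLimitError as exc:
                last_error = exc  # this provider is throttled; try the next
        raise last_error
```

Because FallbackModel satisfies the same interface, business logic like the summarizer can be handed a fallback chain without knowing that fallback exists.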
LiteLLM has industrialized this pattern into a proxy server that supports over 100 providers through a unified OpenAI-compatible API. It handles load balancing across providers, fallback routing when a provider is unavailable, Redis-backed rate-limit tracking for multi-instance deployments, and cost accounting. Teams running high-volume AI workloads use it to get provider-agnostic routing with minimal added gateway latency. For smaller teams, it's also available as a Python SDK that wraps the same abstraction without the server infrastructure.
The Testing Dividend
The testing benefits of this pattern are immediate and concrete. Without dependency injection, testing AI application logic requires either calling real APIs (slow, expensive, non-deterministic) or patching the SDK at the module level (fragile, requires intimate knowledge of the library internals).
With a proper model interface, you write a mock implementation once:
```python
class MockModel(LanguageModel):
    def __init__(self, responses: dict[str, str] | str):
        self._responses = responses

    def invoke(self, prompt: str) -> str:
        if isinstance(self._responses, str):
            return self._responses
        return self._responses.get(prompt, "default response")

    def invoke_messages(self, messages: list[dict]) -> str:
        # Satisfy the full interface by delegating to invoke().
        return self.invoke(messages[-1]["content"])
```
Now every test gets fast, deterministic, zero-cost execution. You can test that your summarizer sends the right prompt format, that your retry logic fires on errors, that your structured output parser handles edge cases — all without network calls. Test suites that took minutes (waiting on API responses) run in seconds.
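A test for prompt construction, for instance, needs no network at all. One way to sketch it — the SpyModel variant, which records every prompt it receives, is illustrative and not part of the interface above:

```python
from abc import ABC, abstractmethod


class LanguageModel(ABC):
    @abstractmethod
    def invoke(self, prompt: str) -> str: ...


class DocumentSummarizer:
    def __init__(self, model: LanguageModel):
        self._model = model

    def summarize(self, text: str) -> str:
        return self._model.invoke(f"Summarize this document:\n\n{text}")


class SpyModel(LanguageModel):
    """Records every prompt so tests can assert on prompt construction."""

    def __init__(self, fixed_response: str = "summary"):
        self.prompts: list[str] = []
        self._fixed_response = fixed_response

    def invoke(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return self._fixed_response


def test_summarizer_prompt_format():
    spy = SpyModel()
    result = DocumentSummarizer(model=spy).summarize("Hello world")
    assert result == "summary"
    assert spy.prompts[0].startswith("Summarize this document:")
    assert "Hello world" in spy.prompts[0]
```

The same shape covers retry logic (a mock that raises on the first call) and structured-output parsing (a mock that returns malformed JSON).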
The mock also becomes a forcing function for good interface design. If the mock is hard to write, the interface is too wide. The pressure to keep the mock simple pushes you toward a narrow, focused abstraction.
What Frameworks Give You Out of the Box
Several frameworks have standardized this abstraction so you don't have to build it yourself.
LangChain's ChatModel interface wraps 50+ providers behind a common .invoke() method. Switching from ChatOpenAI to ChatAnthropic to ChatGoogleGenerativeAI is a one-line change. The tradeoff is that LangChain's abstraction is opinionated — it makes choices about message formats, callback systems, and chain composition that you either accept or work around.
LlamaIndex takes a more modular approach, splitting the ecosystem into focused packages. You can use its retriever abstractions without adopting its agent framework, and vice versa. This makes it easier to adopt incrementally.
Haystack structures applications as explicit pipelines — directed graphs where each node is a swappable component. This enforces the abstraction at the architectural level: you literally cannot connect a component directly to a provider without going through a defined interface. Teams that want guardrails against coupling will find this structure valuable.
For enterprise Java applications, Spring AI and LangChain4j both apply traditional dependency injection — the same Spring @Bean and @Service patterns Java engineers use for databases and message queues — to AI components. A language model becomes just another injected dependency, configured in application.yml, swapped in tests with standard Spring test utilities.
When the Pattern Is Overkill
For a prototype or internal tool with a six-month lifespan, building a full provider abstraction layer is probably not worth it. If you're certain you'll stay on one provider and the cost of being wrong is low, the abstraction adds cognitive overhead without payoff.
The useful heuristic: if you expect the code to run in production for more than a year, or if you anticipate model evaluation cycles where you'll compare providers on real traffic, build the abstraction from the start. The incremental cost of adding it early is small; the cost of retrofitting it after six months of accumulated coupling is significant.
The other signal is team size. On a solo project, you can hold the coupling in your head and refactor quickly. On a team of ten, every new engineer who touches the code makes assumptions about the model, and those assumptions calcify. The interface is as much a communication tool for the team as it is a technical mechanism.
The Migration Tax You're Accumulating
Model lifespans in the current market are running 12–18 months before deprecation or significant capability shifts that motivate a switch. The OpenAI deprecation cycle, Anthropic's Claude versioning, and Google's Gemini rollout have all demonstrated that the model you build against today will not be the model you run in two years. The question is whether you'll pay the migration cost continuously (via a clean abstraction that makes each upgrade cheap) or in a lump sum (a multi-month refactoring project that freezes feature development).
Teams that have invested in the abstraction report that model upgrades take hours, not weeks — a configuration change, validation against an eval set, promotion to production. Teams that didn't are the ones posting about six-week migrations on engineering blogs.
The pattern isn't novel. It's the same dependency inversion engineers applied to databases in the ORM era, to cloud storage providers, to message queues. AI components are just the latest external dependency that benefits from being treated as external: injected through a defined interface, swappable without touching business logic, mockable in tests.
The 2-3 month migration tax is optional. It's a choice you make at the start of the project.
