The Technical Debt We're Not Talking About: AI Model Dependencies You Don't Control

Everyone’s focused on using AI to reduce technical debt. Nobody’s talking about the new form of technical debt we’re creating: production dependencies on LLM APIs we don’t control.

This isn’t theoretical. We had a real incident that exposed this risk, and I think the industry needs to wake up to what we’re building.

The Incident: When OpenAI Went Down, So Did We

Two weeks ago, the OpenAI API had a three-hour outage. Our fraud detection system went down with it. We had no fallback: we processed manual reviews for three hours, missed fraud patterns, and had to extend review windows for flagged transactions.

Cost: approximately $40K in manual review labor, plus fraud losses and customer trust damage.

The embarrassing part: we knew we had this dependency. We’d talked about building fallbacks. We just hadn’t prioritized it because “the API has been reliable.”

Until it wasn’t.

The New Category of Technical Debt: AI Model Dependencies

Traditional tech debt: old libraries, outdated frameworks, messy code. You control it, you can fix it on your timeline.

AI Model Debt: Production features depending on external LLM APIs. You don’t control:

  • When models update and change behavior
  • API rate limits and quotas
  • Pricing changes
  • Provider reliability
  • Long-term availability

This is vendor lock-in on steroids.

The Version Lock Problem

We use GPT-4 for fraud pattern detection. OpenAI releases GPT-4.5. We can’t “just upgrade” because:

1. Behavioral Changes: Different model = different outputs for same prompts. Fraud scores change. Must regression test entire system.

2. Prompt Engineering Compatibility: Prompts optimized for GPT-4 don’t work the same on GPT-4.5. Must rewrite and retest.

3. Compliance Requirements: In fintech, we must audit decision-making. If the model changes, we must recertify compliance.

4. Cost Differences: GPT-4.5 pricing is different. Must recalculate economics of features.

This is like PostgreSQL auto-upgrading and your queries returning different results. Except you can pin your PostgreSQL version; with LLMs, providers deprecate old model snapshots on their timeline, not yours.
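One partial mitigation is to pin explicit model snapshots rather than floating aliases, so a provider-side upgrade can't silently change behavior between your deploys. A minimal sketch of the idea (the feature keys are illustrative, and pinning only buys time, since providers still retire snapshots on their own schedule):

```python
# Illustrative sketch: route each feature to a pinned model snapshot and
# refuse floating aliases that a provider can silently re-point.
FLOATING_ALIASES = {"gpt-4", "gpt-4-turbo"}  # aliases that move between snapshots

PINNED_MODELS = {
    # feature -> fixed snapshot (hypothetical mapping for illustration)
    "fraud_scoring": "gpt-4-0613",
    "doc_analysis": "gpt-4-0613",
}

def resolve_model(feature: str) -> str:
    """Return the pinned snapshot for a feature; fail loudly on floating aliases."""
    model = PINNED_MODELS[feature]
    if model in FLOATING_ALIASES:
        raise ValueError(f"{feature!r} uses floating alias {model!r}; pin a snapshot")
    return model
```

A check like this in CI makes "we accidentally upgraded in production" impossible; the deprecation timeline is still the provider's, but at least the change becomes a deliberate, tested event.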

The Cost Debt: From $200 to $4,000/Month

Started using GPT-4 for “smart features” - fraud detection, customer support assist, document analysis.

Month 1: $200 in API costs. Seemed reasonable.
Month 6: $4,000/month and growing.

The problem: there's no way to optimize without rewriting features. We can't cache most queries (fraud detection needs real-time responses), can't switch to cheaper models (accuracy requirements), and can't self-host (we don't have the ML expertise).

Business model problem: we priced our product assuming $200/month AI costs. Now it’s $4,000 and growing with usage. Can’t pass costs to customers without repricing. This is tech debt meets business model debt.

The Invisible Coupling: Business Logic in Natural Language

Traditional code: logic in Python/Java/JS that you can version control, test, audit.

LLM-powered features: logic embedded in prompt engineering. “Natural language” sounds flexible until you realize:

  • Prompts aren’t in version control (often stored in env vars or config)
  • Can’t unit test prompts the same way as code
  • Knowledge transfer is “read the prompt and guess what it does”
  • Refactoring means rewriting prompts and regression testing outputs

We have critical fraud detection logic that exists as a 500-word prompt. When the prompt engineer quit, knowledge transfer took 2 weeks. This is tech debt in natural language.
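One way out is to treat prompts as versioned code artifacts: checked into the repo, semantically versioned, and shipped with golden cases that CI replays against the pinned model to catch drift. A hedged sketch of what that could look like (the names, version string, and golden-case format are made up for illustration):

```python
# Sketch: prompts as version-controlled artifacts instead of env vars.
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptVersion:
    name: str
    version: str      # bumped on any wording change, reviewed like code
    template: str

FRAUD_PROMPT = PromptVersion(
    name="fraud_detection",
    version="2.3.0",
    template=(
        "You are a fraud analyst. Score the transaction below from 0 to 1.\n"
        "Transaction: {transaction}\n"
    ),
)

def render(prompt: PromptVersion, **fields: str) -> str:
    """Fill the template; raises KeyError if a required field is missing."""
    return prompt.template.format(**fields)

# Golden cases live next to the prompt; CI re-runs them against the pinned
# model and alerts when scores drift outside the expected range.
GOLDEN_CASES = [
    {"transaction": "$9,999 wire to new payee at 3am", "expected_range": (0.7, 1.0)},
]
```

This doesn't make prompts unit-testable the way code is, but it does give you diffs, reviews, and a drift alarm, so knowledge transfer is no longer "read the prompt and guess."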

The Compliance Nightmare

Financial services requires auditable decision-making. Regulators ask: “Why did you deny this transaction?”

Pre-AI answer: “Rule #47 in our fraud detection system flagged unusual transaction pattern.”

Post-AI answer: “GPT-4 analyzed transaction context and assigned risk score 0.87.”

Regulator: “Can you explain how GPT-4 made that determination?”

Us: “…it’s a black box.”

This is an unsolved compliance problem. Until we can solve explainability, we're restricting AI to non-customer-facing workflows.

The Controversial Take

AI API dependencies might be worse than legacy code debt.

Legacy code debt: You control it, you can refactor on your timeline, you can add tests, you understand the logic (or can reverse-engineer it).

AI API debt: External vendor controls it, they upgrade on their timeline, black box decision-making, pricing changes out of your control, availability depends on their SLA.

At least with legacy code, the technical debt is yours.

What We’re Doing: Mitigation Strategies

1. Abstraction Layer for All LLM Calls

Built internal API that wraps LLM calls. Provides:

  • Fallback logic (if GPT-4 fails, degrade to rule-based)
  • Cost monitoring and circuit breakers
  • Caching where appropriate
  • A/B testing between models
  • Logging for audit compliance

2. Gradual Rollout for Model Changes

When OpenAI releases GPT-4.5:

  • Test in shadow mode (run both models, compare outputs)
  • 1% traffic → 5% → 10% with monitoring
  • Regression testing for compliance
  • Rollback plan ready
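The traffic ramp can be driven by stable hashing, so a given customer always hits the same model during the rollout; that keeps shadow-mode comparisons and incident triage sane. A sketch (the percentages and model names are illustrative):

```python
import hashlib

ROLLOUT_PCT = {"gpt-4.5": 1}  # start at 1%, then 5%, 10% as monitoring allows

def bucket(entity_id: str) -> int:
    """Map an entity to a stable 0-99 bucket, consistent across restarts and hosts."""
    return int(hashlib.sha256(entity_id.encode()).hexdigest(), 16) % 100

def choose_model(entity_id: str, candidate: str = "gpt-4.5", default: str = "gpt-4") -> str:
    """Send the entity to the candidate model only if its bucket is inside the ramp."""
    return candidate if bucket(entity_id) < ROLLOUT_PCT.get(candidate, 0) else default
```

Because the bucket comes from a hash rather than a random draw, raising the percentage only moves new buckets onto the candidate; nobody flip-flops between models mid-ramp.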

3. Multi-Model Strategy

Don’t depend on single provider:

  • Primary: OpenAI GPT-4
  • Fallback: Anthropic Claude
  • Cost optimization: Mix of models based on task complexity

4. Economic Safeguards

  • Budget alerts at $X/month
  • Rate limiting on expensive features
  • Reserved capacity contracts with providers
  • Feature flags to disable AI if costs spike
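These safeguards can be wired together as per-feature budgets with an alert threshold below the kill threshold, so owners get paged before the feature flag flips. A sketch with made-up budget numbers:

```python
# Hypothetical per-feature budgets: alert at 80% of the monthly ceiling,
# disable the AI path (fall back to rules) at 100%.
FEATURE_BUDGETS = {
    "fraud_assist":   {"monthly_usd": 2500, "alert_at": 0.8, "kill_at": 1.0},
    "support_assist": {"monthly_usd": 1000, "alert_at": 0.8, "kill_at": 1.0},
}

def budget_state(feature: str, spend_usd: float) -> str:
    """Return 'ok', 'alert', or 'disabled' for the feature's current spend."""
    cfg = FEATURE_BUDGETS[feature]
    frac = spend_usd / cfg["monthly_usd"]
    if frac >= cfg["kill_at"]:
        return "disabled"  # feature flag flips AI off; rule-based path takes over
    if frac >= cfg["alert_at"]:
        return "alert"     # page the feature owner before the ceiling is hit
    return "ok"
```

The point of per-feature (rather than global) budgets is that a runaway document-analysis job can't silently starve fraud detection of its remaining quota.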

The Questions I’m Wrestling With

1. At what point is AI API dependency too risky?

We use AI for fraud detection assist (human reviews final decisions). But what if we used AI for automated fraud blocking? Outage = can’t process payments. Is that acceptable risk?

2. Should AI dependencies be in architecture review?

Currently, engineers can add LLM API calls without architecture approval. Should this be treated like adding a new database or critical external service?

3. How do we handle the economics long-term?

If OpenAI 10x’s pricing, do we rebuild features or pass costs to customers? We don’t have good answers.

Questions for Community

  1. How are you managing AI API dependencies as production infrastructure? Fallback strategies? Cost controls?

  2. Anyone handling explainability requirements? Especially in regulated industries?

  3. Has anyone successfully migrated from one LLM provider to another? What broke? How long did it take?

  4. Economics: How are you handling the cost unpredictability of LLM APIs in your business model?

This feels like a “technical debt crisis” waiting to happen. We’re building critical features on infrastructure we don’t control, with costs we can’t predict, and decision-making we can’t audit.

I’m not saying don’t use AI APIs. I’m saying: treat them like the critical dependencies they are, with appropriate risk management, fallback plans, and economic safeguards.

Or we’re all going to learn these lessons the hard way when the first major LLM provider has a 24-hour outage or 10x’s pricing overnight.

From the Comments

This is the conversation financial services desperately needs. We cannot put unauditable AI in transaction processing or compliance workflows. When regulators ask why we flagged a transaction, we need deterministic, explainable logic. LLM APIs fail all of these requirements: non-deterministic, black box, can’t reproduce historical decisions. We use AI for developer tools and internal efficiency, never for transaction decisions or compliance checks. One audit failure could cost us our banking license.

Built an abstraction layer for exactly this reason. The application calls an internal LLM service that wraps GPT-4, Claude, and local models with monitoring, fallbacks, and cost controls. When GPT-4 had an outage, the system auto-failed over to Claude. We set a budget ceiling: when it's hit, the service switches to cheaper models, increases caching, and degrades features gracefully. Took 2 weeks upfront, saved us from 3 production incidents. Treat LLM APIs like any external dependency: wrap, monitor, circuit breakers, graceful degradation.

The economic dependency keeps me up at night. Started with AI features as a value add at no extra charge. Users adopted heavily. Now it’s $XK monthly in LLM costs, growing 20%. Can’t remove features without customer revolt, can’t easily pass costs to customers. If OpenAI 10x’s prices, we have three bad options: absorb the cost (kills margins), pass it to customers (requires repricing), or rebuild the features (6+ months). We’re strategically dependent on external roadmaps we don’t control.

This is vendor lock-in 2.0 - at the UX level. Traditional lock-in affects the backend, invisible to customers. AI model lock-in affects the user experience. We built features around GPT-4-specific capabilities, response times, and personality. Switching models means redesigning UX. Prompt engineering is product knowledge in engineers’ heads, not docs. If OpenAI controls our UX and can change pricing anytime, do we actually own our product? That’s a strategic vulnerability the board should understand.