Everyone’s focused on using AI to reduce technical debt. Nobody’s talking about the new form of technical debt we’re creating: production dependencies on LLM APIs we don’t control.
This isn’t theoretical. We had a real incident that exposed this risk, and I think the industry needs to wake up to what we’re building.
The Incident: When OpenAI Went Down, So Did We
Two weeks ago, the OpenAI API had a three-hour outage. Our fraud detection system went down with it. We had no fallback. We processed reviews manually for three hours, missed fraud patterns, and had to extend review windows for flagged transactions.
Cost: approximately $40K in manual review labor + fraud losses + customer trust damage.
The embarrassing part: we knew we had this dependency. We’d talked about building fallbacks. We just hadn’t prioritized it because “the API has been reliable.”
Until it wasn’t.
The New Category of Technical Debt: AI Model Dependencies
Traditional tech debt: old libraries, outdated frameworks, messy code. You control it, you can fix it on your timeline.
AI Model Debt: Production features depending on external LLM APIs. You don’t control:
- When models update and change behavior
- API rate limits and quotas
- Pricing changes
- Provider reliability
- Long-term availability
This is vendor lock-in on steroids.
The Version Lock Problem
We use GPT-4 for fraud pattern detection. OpenAI releases GPT-4.5. We can’t “just upgrade” because:
1. Behavioral Changes: Different model = different outputs for same prompts. Fraud scores change. Must regression test entire system.
2. Prompt Engineering Compatibility: Prompts optimized for GPT-4 don’t work the same on GPT-4.5. Must rewrite and retest.
3. Compliance Requirements: In fintech, we must audit decision-making. If the model changes, we must recertify compliance.
4. Cost Differences: GPT-4.5 pricing is different. Must recalculate economics of features.
This is like PostgreSQL auto-upgrading and your queries returning different results. Except with PostgreSQL you can pin the version; LLM providers deprecate old models on their own timeline.
The Cost Debt: From $200 to $4,000/Month
Started using GPT-4 for “smart features” - fraud detection, customer support assist, document analysis.
Month 1: $200 in API costs. Seemed reasonable.
Month 6: $4,000/month and growing.
The problem: no way to optimize without rewriting features. Can’t cache most queries (fraud detection needs real-time). Can’t switch to cheaper models (accuracy requirements). Can’t self-host (don’t have ML expertise).
Business model problem: we priced our product assuming $200/month AI costs. Now it’s $4,000 and growing with usage. Can’t pass costs to customers without repricing. This is tech debt meets business model debt.
The Invisible Coupling: Business Logic in Natural Language
Traditional code: logic in Python/Java/JS that you can version control, test, audit.
LLM-powered features: logic embedded in prompt engineering. “Natural language” sounds flexible until you realize:
- Prompts aren’t in version control (often stored in env vars or config)
- Can’t unit test prompts the same way as code
- Knowledge transfer is “read the prompt and guess what it does”
- Refactoring means rewriting prompts and regression testing outputs
We have critical fraud detection logic that exists as a 500-word prompt. When the prompt engineer quit, knowledge transfer took 2 weeks. This is tech debt in natural language.
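One partial fix we could have used: treat prompts like versioned code artifacts and pin them by hash, so a silent edit fails CI instead of silently changing fraud scores. A minimal sketch (the prompt text and registry here are illustrative, not our production prompt):

```python
import hashlib

# Hypothetical example prompt -- a stand-in for the real 500-word one.
FRAUD_PROMPT_V3 = """You are a fraud analyst. Score the transaction
from 0 to 1 for risk, considering amount, velocity, and geography."""

# Registry of pinned hashes, committed to version control alongside the
# prompts. Bumping a prompt means updating the hash in the same PR,
# which forces code review and regression testing.
PINNED_HASHES = {
    "fraud_prompt": hashlib.sha256(FRAUD_PROMPT_V3.encode()).hexdigest(),
}

def verify_prompt(name: str, text: str) -> bool:
    """Fail loudly (e.g. in CI) if a prompt changed without a version bump."""
    actual = hashlib.sha256(text.encode()).hexdigest()
    if PINNED_HASHES[name] != actual:
        raise ValueError(
            f"prompt '{name}' changed; bump the version and re-run regression tests"
        )
    return True
```

This doesn't make prompts unit-testable, but it at least puts the logic under version control and makes changes visible in review.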
The Compliance Nightmare
Financial services requires auditable decision-making. Regulators ask: “Why did you deny this transaction?”
Pre-AI answer: “Rule #47 in our fraud detection system flagged unusual transaction pattern.”
Post-AI answer: “GPT-4 analyzed transaction context and assigned risk score 0.87.”
Regulator: “Can you explain how GPT-4 made that determination?”
Us: “…it’s a black box.”
This is an unsolved compliance problem. We’re using AI in non-customer-facing workflows until we can solve explainability.
The Controversial Take
AI API dependencies might be worse than legacy code debt.
Legacy code debt: You control it, you can refactor on your timeline, you can add tests, you understand the logic (or can reverse-engineer it).
AI API debt: External vendor controls it, they upgrade on their timeline, black box decision-making, pricing changes out of your control, availability depends on their SLA.
At least with legacy code, the technical debt is yours.
What We’re Doing: Mitigation Strategies
1. Abstraction Layer for All LLM Calls
Built internal API that wraps LLM calls. Provides:
- Fallback logic (if GPT-4 fails, degrade to rule-based)
- Cost monitoring and circuit breakers
- Caching where appropriate
- A/B testing between models
- Logging for audit compliance
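A minimal sketch of what that wrapper looks like. The model callables, cost figures, and budget here are stand-ins; in production the primary would invoke the real provider SDK:

```python
import hashlib
import time

class LLMGateway:
    """Hypothetical gateway wrapping all LLM calls: fallback, budget
    circuit breaker, caching, and an audit log."""

    def __init__(self, primary, fallback, monthly_budget_usd=5000.0):
        self.primary = primary        # e.g. a call into the provider SDK
        self.fallback = fallback      # rule-based scorer, no external deps
        self.monthly_budget_usd = monthly_budget_usd
        self.spent_usd = 0.0
        self.cache = {}
        self.audit_log = []           # retained for compliance review

    def score(self, prompt, cost_per_call_usd=0.01, cacheable=False):
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if cacheable and key in self.cache:
            return self.cache[key]
        # Circuit breaker: degrade to rules before blowing the budget.
        if self.spent_usd + cost_per_call_usd > self.monthly_budget_usd:
            result, source = self.fallback(prompt), "fallback/budget"
        else:
            try:
                result = self.primary(prompt)
                self.spent_usd += cost_per_call_usd
                source = "primary"
            except Exception:
                # Provider outage: degrade instead of going down with it.
                result, source = self.fallback(prompt), "fallback/error"
        self.audit_log.append({"ts": time.time(), "source": source, "key": key})
        if cacheable:
            self.cache[key] = result
        return result
```

The key design choice: callers never import a provider SDK directly, so swapping or A/B testing models is a change in one place.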
2. Gradual Rollout for Model Changes
When OpenAI releases GPT-4.5:
- Test in shadow mode (run both models, compare outputs)
- 1% traffic → 5% → 10% with monitoring
- Regression testing for compliance
- Rollback plan ready
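The shadow-mode and ramp steps above can be sketched like this (both model callables and the tolerance are illustrative):

```python
import random

def shadow_compare(current_model, candidate_model, transactions, tolerance=0.05):
    """Run both models on the same traffic; serve only the current
    model's answer, but record where the candidate's score diverges."""
    divergences = []
    for txn in transactions:
        served = current_model(txn)
        shadow = candidate_model(txn)
        if abs(served - shadow) > tolerance:
            divergences.append((txn, served, shadow))
    return divergences

def rollout_fraction(step):
    """The 1% -> 5% -> 10% ramp from the list above."""
    return [0.01, 0.05, 0.10][min(step, 2)]

def route(step, rng=random.random):
    """Route a single request to the candidate with the ramp probability;
    `rng` is injectable so routing is testable."""
    return "candidate" if rng() < rollout_fraction(step) else "current"
```

The divergence list from shadow mode doubles as the regression-test input for compliance review before any live traffic moves.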
3. Multi-Model Strategy
Don’t depend on single provider:
- Primary: OpenAI GPT-4
- Fallback: Anthropic Claude
- Cost optimization: Mix of models based on task complexity
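A sketch of the fallback chain and complexity-based routing. Provider names and the length heuristic are placeholders, not our production logic:

```python
def call_with_fallback(providers, prompt):
    """Try each (name, callable) provider in order; raise only if all fail."""
    errors = []
    for name, fn in providers:
        try:
            return name, fn(prompt)
        except Exception as exc:
            errors.append((name, repr(exc)))
    raise RuntimeError(f"all providers failed: {errors}")

def pick_tier(prompt, cheap, expensive, max_cheap_len=200):
    """Crude complexity heuristic: route short tasks to the cheaper model.
    Real routing would use task type, not prompt length."""
    return cheap if len(prompt) <= max_cheap_len else expensive
```

The catch, per the migration question below: a fallback provider only helps if its outputs have been regression-tested against the primary's, which is why the shadow-mode harness matters.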
4. Economic Safeguards
- Budget alerts at $X/month
- Rate limiting on expensive features
- Reserved capacity contracts with providers
- Feature flags to disable AI if costs spike
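Those safeguards compose into something like the following sketch; all thresholds are placeholders, not our real limits:

```python
import time
from collections import deque

class FeatureBudget:
    """Per-feature rate limit plus spend-based kill switch."""

    def __init__(self, max_calls_per_minute=60, alert_usd=1000.0, kill_usd=2000.0):
        self.window = deque()            # timestamps of recent calls
        self.max_calls = max_calls_per_minute
        self.alert_usd = alert_usd
        self.kill_usd = kill_usd
        self.spent_usd = 0.0
        self.enabled = True              # feature flag: flips off on spend spike

    def allow(self, now=None):
        """Gate a call: honors the kill switch and a 60s sliding window."""
        now = time.time() if now is None else now
        if not self.enabled:
            return False
        while self.window and now - self.window[0] > 60:
            self.window.popleft()
        if len(self.window) >= self.max_calls:
            return False
        self.window.append(now)
        return True

    def record_spend(self, usd):
        self.spent_usd += usd
        if self.spent_usd >= self.kill_usd:
            self.enabled = False         # hard stop: disable the feature
        elif self.spent_usd >= self.alert_usd:
            print(f"ALERT: AI spend at ${self.spent_usd:.2f}")
```

Disabling a feature is only safe because of the abstraction layer's rule-based fallback; the kill switch degrades the feature, it doesn't take the system down.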
The Questions I’m Wrestling With
1. At what point is AI API dependency too risky?
We use AI for fraud detection assist (human reviews final decisions). But what if we used AI for automated fraud blocking? Outage = can’t process payments. Is that acceptable risk?
2. Should AI dependencies be in architecture review?
Currently, engineers can add LLM API calls without architecture approval. Should this be treated like adding a new database or critical external service?
3. How do we handle the economics long-term?
If OpenAI 10x’s pricing, do we rebuild features or pass costs to customers? We don’t have good answers.
Questions for Community
- How are you managing AI API dependencies as production infrastructure? Fallback strategies? Cost controls?
- Anyone handling explainability requirements, especially in regulated industries?
- Has anyone successfully migrated from one LLM provider to another? What broke? How long did it take?
- Economics: How are you handling the cost unpredictability of LLM APIs in your business model?
This feels like a “technical debt crisis” waiting to happen. We’re building critical features on infrastructure we don’t control, with costs we can’t predict, and decision-making we can’t audit.
I’m not saying don’t use AI APIs. I’m saying: treat them like the critical dependencies they are, with appropriate risk management, fallback plans, and economic safeguards.
Or we’re all going to learn these lessons the hard way when the first major LLM provider has a 24-hour outage or 10x’s pricing overnight.