Foundation Model Vendor Strategy: What Enterprise SLAs Actually Guarantee
Enterprise teams pick LLM vendors based on benchmarks and demos. Then they hit production and discover what the SLA actually says — which is usually much less than they assumed. The 99.9% uptime guarantee you negotiated doesn't cover latency. The data processing agreement your legal team signed doesn't prohibit training on your inputs unless you explicitly added that clause. And the vendor concentration risk that nobody quantified becomes painfully obvious when your core product is down for four hours because a telemetry deployment cascaded through a Kubernetes control plane.
This is not a procurement problem. It's an engineering problem that procurement can't solve alone. The people who build AI systems need to understand what these contracts actually say — and what they don't.
What SLA Tiers Actually Guarantee (and Don't)
The gap between what enterprise teams expect from SLAs and what those SLAs deliver is substantial.
Uptime guarantees are narrow. OpenAI's Scale and Priority tiers both advertise 99.9% monthly uptime. Azure OpenAI Service does too. AWS Bedrock typically matches that figure. But read the fine print: "uptime" in most of these agreements means the API endpoint is reachable and returning responses — not that it's returning good responses at acceptable latency. A model that is timing out for a tenth of its requests, or hallucinating at ten times the normal rate, still counts as "up" under most definitions.
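To make that 99.9% figure concrete, a quick sketch converts an uptime percentage into a monthly downtime budget (assuming a 30-day month for simplicity; real SLAs define the measurement window in the contract):

```python
def downtime_budget_minutes(uptime_fraction: float, days_in_month: int = 30) -> float:
    """Minutes per month the service can be unreachable and still meet the SLA."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (1 - uptime_fraction)

# 99.9% monthly uptime still permits ~43 minutes of hard downtime --
# and says nothing about degraded-but-"up" responses.
print(round(downtime_budget_minutes(0.999), 1))   # 43.2
print(round(downtime_budget_minutes(0.9999), 1))  # 4.3
```

Note the order-of-magnitude jump: each extra nine you negotiate cuts the allowed outage window by 10x, which is why vendors price those nines accordingly.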
Latency SLAs are expensive and rare. OpenAI's Priority tier advertises a per-minute p50 latency guarantee, but that's a median — half your requests can still be slower. Formal latency commitments at the p95 or p99 level require custom enterprise negotiations and, typically, provisioned capacity. Azure OpenAI's Provisioned Throughput Units (PTUs) offer the clearest path to consistent latency: you reserve a fixed throughput allocation and get predictable response times in exchange for paying whether or not you use the capacity.
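The p50-versus-tail gap is easy to demonstrate. The sketch below uses a hypothetical minute of traffic in which 95% of requests are fast and 5% are pathological; the median looks healthy while the p99 is 15x worse:

```python
def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile over a list of latency samples (seconds)."""
    s = sorted(samples)
    rank = max(1, round(p / 100 * len(s)))  # nearest-rank method
    return s[rank - 1]

# Hypothetical traffic: 95 fast requests, 5 pathological ones.
latencies = [0.4] * 95 + [6.0] * 5

print(percentile(latencies, 50))  # 0.4 -> a p50 SLA is comfortably met
print(percentile(latencies, 99))  # 6.0 -> 1 in 100 users waits 15x longer
```

This is why a median guarantee can be satisfied during an incident that your users experience as an outage: the metric that matters for user experience lives in the tail.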
Service credits don't cover your downtime costs. When AWS, Azure, or OpenAI breach their SLA, you typically get credit toward future usage — often capped at a percentage of monthly fees. If your product was unavailable for four hours and you lost customer trust or revenue, those credits don't compensate you. This isn't hidden; it's standard cloud SLA structure. But teams that have never experienced a major outage often don't internalize it until they're filing a credit request after their worst week.
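A back-of-envelope comparison shows the asymmetry. All figures below are hypothetical (credit percentages and caps vary by vendor and contract; check your own agreement's credit schedule):

```python
def sla_credit(monthly_fee: float, credit_pct: float, cap_pct: float) -> float:
    """Service credit for an SLA breach: a percentage of the monthly fee, capped."""
    return monthly_fee * min(credit_pct, cap_pct)

# Hypothetical terms: $50k/month contract, 10% credit for the breach, 30% cap.
credit = sla_credit(50_000, 0.10, 0.30)
revenue_lost = 4 * 25_000  # four-hour outage at $25k/hour of at-risk revenue

print(credit)                 # 5000.0 in future-usage credit
print(revenue_lost - credit)  # 95000 of loss the SLA never touches
```

The structural point: credits are priced against your vendor bill, while your downside is priced against your revenue. Those two numbers are rarely within an order of magnitude of each other.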
Support tiers have variable definitions. Enterprise support tiers advertise response times, but "response" often means acknowledgment — not resolution or even triage. For AI services where the root cause is a model behavior change rather than an infrastructure failure, even an engaged support team may not have answers within your SLA window.
The practical implication: treat SLA tiers as a starting point for negotiation, not as a description of what you'll actually experience in production. The vendors that take enterprise seriously will negotiate on latency commitments, define escalation paths for model behavior regressions, and provide dedicated technical contacts who understand your use case.
Enterprise Pricing: What's Negotiable vs. Fixed
Token-based pricing is universal across major providers, but the structure underneath it varies considerably.
All major vendors charge input tokens at a fraction of the output token rate (typically 1/5 to 1/10) because output tokens are generated one at a time through sequential decoding, while input tokens are processed together in a single parallel prefill pass. This asymmetry matters for workload planning: a pipeline that generates long structured outputs has a very different cost profile from one that ingests large documents and returns short answers.
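A sketch makes the asymmetry concrete. The prices here are hypothetical ($2 per million input tokens, $10 per million output tokens, a 1/5 ratio), but the shape of the result holds at any real vendor's rates:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Cost of one request, with prices in dollars per million tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Document QA: large input, short answer.
qa = request_cost(50_000, 500, 2.0, 10.0)
# Report generation: short prompt, long structured output.
gen = request_cost(500, 50_000, 2.0, 10.0)

print(f"{qa:.4f}")   # 0.1050
print(f"{gen:.4f}")  # 0.5010
```

Same total token count in both requests, nearly 5x the cost when the tokens land on the output side. Estimating spend from total tokens alone, without splitting input from output, can be off by a large factor.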
Volume discounts exist but require asking. OpenAI offers 12–18% discounts for enterprise customers exceeding 5 million monthly tokens. AWS Bedrock can deliver up to 50% savings through Provisioned Throughput Unit reservations. Azure offers similar PTU economics. The catch is that PTU-style reservations require committing to a capacity level and paying for it regardless of actual usage — the classic cloud reserved-instance trade-off applied to inference.
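The reserved-capacity trade-off reduces to a break-even utilization calculation. The figures below are hypothetical, but the formula is the one to run before signing any PTU-style commitment:

```python
def breakeven_utilization(reserved_monthly_cost: float,
                          on_demand_rate_per_m: float,
                          reserved_capacity_m_tokens: float) -> float:
    """Fraction of reserved capacity you must actually consume before the
    reservation beats paying on-demand rates for the same tokens."""
    on_demand_cost_at_full_use = on_demand_rate_per_m * reserved_capacity_m_tokens
    return reserved_monthly_cost / on_demand_cost_at_full_use

# Hypothetical: $30k/month reservation covering 5,000M tokens,
# vs. $10 per million tokens on-demand.
u = breakeven_utilization(30_000, 10.0, 5_000)
print(u)  # 0.6 -> below 60% sustained utilization, on-demand is cheaper
```

Teams with spiky or seasonal traffic frequently sit below their break-even point, which is why the reservation discount headline ("up to 50% savings") only materializes for steady, predictable load.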
What's actually negotiable in enterprise contracts:
- Custom latency SLAs and uptime guarantees
- Training data restrictions (explicit prohibition on using your data for model training)
- Data residency options and deletion timelines
- Support response time commitments
- Contract refresh triggers tied to competitor pricing
- Rate limit increases and burst capacity
What's typically fixed:
- Base token pricing below certain volume thresholds
- Standard API feature availability
- Model release schedules
- The fundamental training data opt-out defaults (though you can override via contract)
One important recent development: Anthropic separated seat licensing from token usage in early 2026, removing bundled token allowances from enterprise plans. Teams now pay a per-seat fee for Claude access and bill API tokens separately at standard rates. This change made cost modeling more predictable but also removed the buffer that some teams relied on for spiky workloads.
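Under an unbundled structure like this, the cost model collapses to a flat seat component plus a metered token component. The figures below are illustrative, not Anthropic's actual rates:

```python
def monthly_cost(seats: int, seat_fee: float,
                 tokens_m: float, rate_per_m: float) -> float:
    """Unbundled enterprise cost: flat per-seat fee plus metered API tokens."""
    return seats * seat_fee + tokens_m * rate_per_m

# Hypothetical figures: 200 seats at $60/seat, 1,500M tokens at $8/M.
print(monthly_cost(200, 60.0, 1_500, 8.0))  # 24000.0
```

The predictability gain is that the seat term is fixed and the token term scales linearly with usage; the loss is that a traffic spike now hits the bill dollar-for-dollar, with no bundled allowance absorbing it.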
When entering negotiations, model-agnostic system design is your strongest lever. A vendor who knows you can switch to a competitor's API without rewriting your application treats pricing conversations differently than one who knows you're deeply embedded.
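One minimal way to keep that lever is a thin vendor-neutral interface between application code and any SDK. This is a sketch (the adapter bodies are stubs standing in for real SDK calls), but the structure is the point: swapping vendors becomes a one-line configuration change rather than a rewrite:

```python
from typing import Protocol

class ChatProvider(Protocol):
    """Minimal vendor-neutral surface the application codes against."""
    def complete(self, prompt: str) -> str: ...

class VendorAAdapter:
    def complete(self, prompt: str) -> str:
        # Would call vendor A's SDK here; stubbed for illustration.
        return f"[vendor-a] {prompt}"

class VendorBAdapter:
    def complete(self, prompt: str) -> str:
        # Same contract, different backend: swapping is a config change.
        return f"[vendor-b] {prompt}"

def summarize(provider: ChatProvider, text: str) -> str:
    # Application logic never imports a vendor SDK directly.
    return provider.complete(f"Summarize: {text}")

print(summarize(VendorAAdapter(), "Q3 report"))  # [vendor-a] Summarize: Q3 report
```

The seam doesn't need to cover every vendor feature; it needs to cover the calls your core product depends on, so that "we can switch" is a demonstrable fact in the negotiation, not a bluff.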
Data Processing Agreements: The Clause You're Missing
Data processing agreements (DPAs) are legally mandatory under GDPR when a vendor processes personal data on your behalf. Every major LLM vendor provides one. Most of them leave critical questions underspecified by default.
Training data usage is now default opt-in for consumers. In August 2025, major providers — including Anthropic, Google, and OpenAI — shifted their defaults so that consumer-tier users' data is used for model training unless they opt out. The critical carve-out: API customers and enterprise agreements remain protected. But if your organization uses any consumer-facing product alongside the API, verify which data flows where.
The clauses most teams forget to add:
- Explicit prohibition on using your inputs as training data (not just "we don't train on your data by default" but "you shall not train on our data under any circumstances")
- Specific data retention limits and deletion timelines (including backups and audit logs)
- Breach notification timelines (GDPR requires 72 hours; your contract should match)
- Data residency requirements and which regions data can transit
- What happens to fine-tuned models if you terminate the contract
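A checklist like the one above is worth encoding so contract reviews can't silently skip an item. The sketch below is a trivial gap-audit over a set of clause labels (the labels are this article's list, not any vendor's terminology):

```python
REQUIRED_DPA_CLAUSES = {
    "no_training_on_inputs",   # explicit prohibition, not just a default
    "retention_and_deletion",  # timelines covering backups and audit logs
    "breach_notification_72h",
    "data_residency",
    "fine_tuned_model_disposition",
}

def missing_clauses(contract_terms: set[str]) -> set[str]:
    """Return the required clauses this contract fails to address."""
    return REQUIRED_DPA_CLAUSES - contract_terms

signed = {"retention_and_deletion", "data_residency"}
print(sorted(missing_clauses(signed)))
# ['breach_notification_72h', 'fine_tuned_model_disposition', 'no_training_on_inputs']
```

The mechanism is deliberately dumb; the value is that the required set lives in version control next to the engineering work, where the people who understand the data flows can keep it current.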
