I’m building our AI inference API on serverless architecture, and I need to vent about something: it’s 2026, and we’re still fighting the same cold start problems that plagued Lambda in 2018. How is this acceptable?
The Promise vs The Reality
The promise: Serverless lets you focus on code, not infrastructure. Pay only for what you use. Scale automatically. Simple.
The reality: I’m spending more time debugging serverless-specific issues than I ever did with containers. And our P99 latency is unacceptable.
The Cold Start Problem (Still)
We’re running ML model inference on AWS Lambda. Here’s what our latency looks like:
- P50: 200ms (acceptable)
- P90: 1.2 seconds (annoying)
- P99: 3-5 seconds (completely unacceptable for a user-facing API)
That P99 is cold starts. When a new Lambda container spins up, it has to:
- Download our model weights (500MB)
- Initialize the ML runtime
- Warm up GPU (when available)
- Process the first request
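All of that work lands in the container's init phase, which is why the pattern everyone (including us) uses is to do the expensive setup at module scope so warm invocations skip it. Here's a minimal sketch of that shape — `load_model` and the weights path are stand-ins, not our real code, and the sleep is a placeholder for the actual 500MB download:

```python
import time

# Module scope runs once per container (the cold start); the handler
# body runs once per request. Everything expensive belongs up here.
_INIT_START = time.monotonic()

def load_model(path):
    """Stand-in for downloading the weights and initializing the ML
    runtime -- the real version is what costs seconds on a cold start."""
    time.sleep(0.01)  # placeholder for the slow part
    return {"weights": path}

MODEL = load_model("s3://models/prod/weights.bin")  # hypothetical path
INIT_SECONDS = time.monotonic() - _INIT_START

def handler(event, context=None):
    # Warm invocations never re-run module scope; they only pay this cost.
    start = time.monotonic()
    result = {"prediction": 0.5}  # placeholder for inference using MODEL
    return {
        "result": result,
        "init_seconds": INIT_SECONDS,  # paid only on a cold start
        "invoke_seconds": time.monotonic() - start,
    }
```

This amortizes the setup across warm requests, but it does nothing for the first request a new container serves — which is exactly the P99 tail above.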
For a user hitting our API, that’s a 5-second wait. They assume the service is broken and retry, making it worse.
The “Solutions” That Don’t Actually Work
Provisioned concurrency: Sure, I can keep functions warm. Now I’m paying for idle capacity. This defeats the entire “pay for what you use” value proposition. We’re essentially running containers-as-a-service at serverless prices (read: more expensive).
Smaller deployment packages: Our model is 500MB. I can’t make it smaller without sacrificing accuracy. This isn’t a JavaScript bundle I can tree-shake.
Caching layers: We’ve implemented Redis caching, pre-warmed instances, keep-alive pings. Now our “simple serverless architecture” has 6 different components just to work around cold starts.
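The keep-alive ping piece alone is instructive: a scheduled rule invokes the function with a sentinel payload, and the handler has to short-circuit before doing real work. A sketch of that workaround — the `warmup` key is our own convention, not anything AWS defines:

```python
def handler(event, context=None):
    # Scheduled keep-alive pings (e.g. an EventBridge rule firing every
    # few minutes) send a sentinel payload; bail out before doing any
    # real work so the container stays warm without running inference.
    if isinstance(event, dict) and event.get("warmup"):
        return {"statusCode": 200, "body": "warm"}

    # ... real request handling below ...
    return {"statusCode": 200, "body": "inference result"}
```

Every handler now carries this branch. That's the kind of incidental complexity "simple serverless" was supposed to eliminate.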
The Vendor Lock-In Reality
Here’s what nobody tells you: the moment you try to solve serverless problems, you’re deep in vendor-specific APIs.
Our codebase now has AWS Lambda-specific code everywhere:
- Lambda layers for dependencies
- Lambda environment variables and secrets
- Lambda-specific logging and monitoring
- CloudWatch metrics and alarms
- API Gateway integration patterns
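The one mitigation that has helped us is quarantining the vendor-specific parts behind a thin adapter, so the business logic never sees a Lambda event. A sketch of the split — the function names are ours, not a framework:

```python
import json

def run_inference(payload: dict) -> dict:
    """Vendor-neutral business logic -- no Lambda or API Gateway
    types anywhere in here, so it's portable and unit-testable."""
    return {"echo": payload}  # placeholder for the real model call

def lambda_handler(event, context=None):
    """The only function allowed to know API Gateway's event shape.
    Porting to another platform means rewriting this file, not the app."""
    body = json.loads(event.get("body") or "{}")
    result = run_inference(body)
    return {"statusCode": 200, "body": json.dumps(result)}
```

It doesn't remove the lock-in — layers, secrets, CloudWatch, and the Gateway config are all still AWS-shaped — but it shrinks the blast radius of a migration.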
Moving this to Azure Functions or Google Cloud Functions would require rewriting significant portions. So much for “cloud-agnostic serverless.”
The Cost Surprise
For workloads with predictable, constant load, serverless is more expensive than containers.
Our billing API gets hit constantly during business hours. Calculation:
Serverless (Lambda + API Gateway): ~$2,200/month
- Invocation costs
- Duration charges (billed per millisecond)
- API Gateway requests
- Data transfer
Container alternative (ECS Fargate): ~$200/month
- 2 containers, always running
- Same performance
- Predictable costs
We’re paying 11x more for serverless on this workload. The only reason we haven’t migrated is technical debt: we built so many Lambda-specific integrations that the migration cost is high.
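If you want to sanity-check this for your own workload, a back-of-envelope model is enough to see the crossover. The rates below are illustrative approximations of published us-east-1 list prices — verify against the current pricing pages, and don't expect this toy model to reproduce a real bill:

```python
# Rough monthly cost model. All rates are illustrative approximations
# of us-east-1 list prices -- check current pricing before relying on them.
LAMBDA_PER_REQUEST   = 0.20 / 1_000_000    # $ per invocation
LAMBDA_PER_GB_SECOND = 0.0000166667        # $ per GB-second of duration
APIGW_PER_REQUEST    = 3.50 / 1_000_000    # $ per REST API request

def lambda_monthly(requests, avg_seconds, memory_gb):
    duration = requests * avg_seconds * memory_gb * LAMBDA_PER_GB_SECOND
    return requests * (LAMBDA_PER_REQUEST + APIGW_PER_REQUEST) + duration

def fargate_monthly(vcpu, memory_gb, containers=2, hours=730):
    # Illustrative Fargate rates: ~$0.04048/vCPU-hour, ~$0.004445/GB-hour.
    per_hour = vcpu * 0.04048 + memory_gb * 0.004445
    return containers * per_hour * hours

# Example shape of an ML inference API: 20M requests/month,
# 1s average duration at 3GB (model held in memory).
serverless = lambda_monthly(20_000_000, avg_seconds=1.0, memory_gb=3.0)
containers = fargate_monthly(vcpu=2, memory_gb=8)
```

The pattern holds regardless of exact rates: duration charges scale linearly with traffic, while always-on containers are flat, so sustained load always crosses over eventually.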
The Skills Gap Problem
Hiring engineers who deeply understand serverless is harder than I expected. Most developers know how to build web services. Far fewer understand:
- Event-driven architecture patterns
- Lambda execution context lifecycle
- How to debug distributed serverless functions
- Serverless-specific performance optimization
- Cost optimization techniques
We hired a senior engineer who was great at building APIs. They struggled for months with serverless because the mental model is completely different. “Why can’t I just log to a file?” “Why does my database connection keep timing out?” “Why is my function slow sometimes and fast other times?”
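That database-timeout question is a perfect example of the mental-model gap: a connection opened at module scope gets frozen with the execution context between invocations and may be dead when the container thaws. The usual answer is a lazy (re)connect with a liveness check — sketched here with injected `connect`/`is_alive` callables standing in for a real driver (or you put RDS Proxy in front and let it pool for you):

```python
_conn = None  # module scope: survives across warm invocations

def get_connection(connect, is_alive):
    """Return a live connection, reconnecting if the frozen one died.

    `connect` and `is_alive` are injected so the sketch stays
    self-contained; in real code they'd wrap your DB driver's
    connect and ping calls.
    """
    global _conn
    if _conn is None or not is_alive(_conn):
        _conn = connect()
    return _conn
```

None of this is hard once you know the execution-context lifecycle — but nothing in ordinary web-service experience prepares you for it.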
When Does Serverless Actually Make Sense?
I’m not saying serverless is useless. Here’s where it genuinely works for us:
Webhook processing: Sporadic GitHub webhooks, Stripe payment callbacks. True event-driven workloads with unpredictable timing. Serverless is perfect here.
Scheduled batch jobs: Nightly data exports, weekly report generation. Run once, shut down. Great use case.
Image processing pipelines: User uploads image, Lambda processes it, stores result. Natural fit for event-driven architecture.
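For contrast with the API case, here's how clean the event-driven fit is: an S3 ObjectCreated notification hands the handler a batch of records naming the bucket and key, and there's no latency-sensitive user waiting on the other end. The `process_image` call is a stand-in to keep the sketch self-contained:

```python
def handler(event, context=None):
    # S3 "ObjectCreated" notifications arrive as a batch of Records;
    # each record names the bucket and key that triggered us.
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # process_image(bucket, key) would fetch, transform, and store;
        # the f-string below is a placeholder for that work.
        results.append(f"processed s3://{bucket}/{key}")
    return results
```

A cold start here costs nobody anything perceptible. That's the difference between the right tool and the wrong one.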
But our API that gets constant traffic during business hours? We chose the wrong tool.
The Question
It’s 2026. Serverless has been mainstream for nearly a decade. Why are we still dealing with:
- Cold start latency that makes APIs unusable
- Vendor lock-in through proprietary APIs
- Debugging nightmares in distributed systems
- Cost models that penalize predictable workloads
- Skills gaps that slow down teams
Am I missing something? Are there serverless platforms that solved these problems? Or is serverless just the wrong tool for API workloads, and I need to accept that?