đź’° The Real Cost of AI: Infrastructure Economics Beyond the Hype

Just came from the “AI Infrastructure at Scale” panel at SF Tech Week and the numbers are sobering. Let’s talk real costs. :bar_chart:

Panelists: CTOs from Anthropic, Stability AI, Midjourney, plus Crusoe Energy (GPU cloud)

The GPU Crisis is Real

Current H100 Market (October 2025):

  • H100 spot price: $2.89/hour (down from $4.50 in June)
  • H100 reserved (1-year): $1.95/hour
  • A100 spot price: $1.10/hour
  • Cloud markup: 40-60% vs bare metal

Why prices dropped: major cloud providers finally got supply after NVIDIA shipped 500K+ H100s in Q3

But here’s the catch: H200s launching in Q1 2026, and everyone will want to upgrade

Training Cost Reality Check

Anthropic shared actual numbers for Claude models:

  • Claude 2: $15M-$20M in compute (2023)
  • Claude 3: $35M-$45M in compute (2024)
  • Next generation: Estimated $80M-$120M

Stability AI’s Stable Diffusion 3:

  • Training cost: $6M
  • 16,384 A100s for 3 weeks
  • Plus data processing, storage, failed experiments

OpenAI GPT-4 (reported):

  • Estimated training cost: $100M-$150M
  • 25,000 A100s for 3-4 months

These are TRAINING costs only - not infrastructure, salaries, data, R&D.

Inference Cost Structure

Midjourney CEO shared their economics:

Revenue: $200M ARR
Inference costs: $50M-$60M annually (25-30% of revenue)
GPU fleet: 30,000+ A100s

Cost per image generation:

  • High quality: $0.032
  • Standard: $0.018
  • Price to user: $0.04 (paid tier)

Gross margin on compute: 20-25%

This is AFTER massive optimization. Year 1 they lost money on every image.

The Elephant in the Room: Who Can Actually Afford This?

Crusoe Energy’s brutal breakdown:

Minimum viable AI startup infrastructure:

  • 64 H100s (small cluster): $12K/month reserved, $18K spot
  • Storage for training data: $5K/month
  • Networking: $3K/month
  • Total: $20K-$26K/month = $240K-$312K/year

And that’s for a SMALL training run. Most serious models need 256+ GPUs.
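
To sanity-check figures like these, here's the arithmetic as a quick sketch - the helper function and the ~730 hours/month assumption are mine, not Crusoe's. (Worth noting: at the quoted $1.95/hour, 64 always-on H100s come out well above $12K/month, so that figure presumably assumes partial utilization or a smaller node.)

```python
# Rough monthly cost of a GPU cluster from hourly rates (assumes always-on usage).
def monthly_cluster_cost(gpu_count: int, hourly_rate: float,
                         storage_per_month: float = 0.0,
                         network_per_month: float = 0.0,
                         hours_per_month: float = 730) -> float:
    """Estimated monthly spend in dollars for an always-on cluster."""
    compute = gpu_count * hourly_rate * hours_per_month
    return compute + storage_per_month + network_per_month

# Example: 64 H100s at the quoted $1.95/hour reserved rate, plus the panel's
# storage and networking line items.
print(f"${monthly_cluster_cost(64, 1.95, 5_000, 3_000):,.0f}/month")
```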

Series A startup ($4M raise, 18-month runway):

  • AI infrastructure: $300K-$500K/year
  • That’s 12.5% of your runway on GPUs alone
  • Before salaries, before data, before anything else

Cloud vs On-Premise Decision

Panel consensus:

Under $500K annual GPU spend: Cloud (AWS, GCP, Azure)

  • Flexibility, no upfront capex
  • But 50% markup over bare metal

$500K-$2M annual: Specialized GPU clouds (Crusoe, Lambda, CoreWeave)

  • 30% cheaper than hyperscalers
  • Good support, fast provisioning

Over $2M annual: Consider on-premise or colo

  • Anthropic example: Own data centers in partnership with Equinix
  • Upfront cost: $3M-$5M for 1,024 H100 cluster
  • Break-even: 18-24 months vs cloud
  • Only makes sense if you’ll use it for 3+ years

The Hidden Costs Nobody Talks About

Storage:

  • Training datasets: 100TB-1PB
  • Model checkpoints: 10-50TB per training run
  • Logs and telemetry: 5-10TB/month
  • Cost: $0.023/GB/month (S3) = $2.3K-$23K/month for data

Networking:

  • Inter-GPU bandwidth crucial for distributed training
  • InfiniBand clusters: $200K-$500K infrastructure
  • Cross-region data transfer: $0.09/GB (adds up FAST)

Talent:

  • ML infrastructure engineer: $250K-$400K
  • You need 2-3 for 24/7 coverage
  • Plus ML engineers, data engineers

Failed Experiments:

  • Stability AI: “For every successful model, we have 10-15 failed training runs”
  • Failed runs still cost money - budget 2-3x your successful training cost

FinOps for AI: Cost Control Strategies

Strategies shared by panelists:

1. Spot instance arbitrage

  • Midjourney: 60% inference on spot instances
  • Automatic fallback to reserved
  • Saves 40% on compute (a minimal fallback sketch follows this strategy list)

2. Multi-cloud strategy

  • Play AWS vs GCP vs Azure pricing
  • “We moved 30% workload to GCP when they offered 25% discount” - Stability AI

3. Model optimization

  • Quantization: 8-bit weights use half the memory of FP16 (a quarter of FP32)
  • Distillation: Smaller models for inference
  • Midjourney cut inference costs 60% through optimization

4. Smart batching

  • Batch inference requests
  • Higher GPU utilization
  • 2-3x better cost efficiency

5. Geographic arbitrage

  • Oregon (cheap hydro power): $1.80/hour H100
  • Northern Virginia (demand): $2.50/hour H100
  • 40% cost difference for same hardware
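
To make strategy 1 concrete, here's a minimal sketch of the spot-with-fallback control flow. The provisioning hooks (request_spot, request_reserved) are hypothetical placeholders for whatever your cloud SDK exposes - the point is the retry-then-fall-back pattern, not a specific API.

```python
import time

# Hypothetical provisioning hooks - swap in your cloud SDK of choice.
def request_spot(gpu_count: int):
    """Try to acquire spot capacity; return a handle, or None if unavailable."""
    ...

def request_reserved(gpu_count: int):
    """Fall back to reserved/on-demand capacity (always available, costs more)."""
    ...

def acquire_gpus(gpu_count: int, max_spot_attempts: int = 3, backoff_s: float = 30.0):
    """Prefer spot capacity, retry briefly, then fall back to reserved."""
    for attempt in range(max_spot_attempts):
        handle = request_spot(gpu_count)
        if handle is not None:
            return handle, "spot"
        time.sleep(backoff_s * (attempt + 1))   # simple linear backoff
    return request_reserved(gpu_count), "reserved"
```

Midjourney's 60/40 spot/reserved split presumably comes from tuning how long they're willing to wait for spot capacity before eating the reserved price.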

ROI Reality Check

Question I asked: “When do AI investments pay back?”

Anthropic CTO: “If you’re selling AI products, you need 60%+ gross margins to survive. Infrastructure costs eat 30-40% of revenue in year 1-2. Optimize down to 20-25% by year 3-4.”

Midjourney CEO: “We didn’t hit positive unit economics until month 18. Had to raise $50M to get there. If you don’t have 2+ years of runway, don’t start a high-compute AI company.”

Stability AI VP Eng: “Our burn rate was $5M/month at peak, 80% on infrastructure. We had to dramatically cut experiments and focus only on products with clear revenue path.”

My Takeaway for Our Startup

We’re building AI-powered analytics. Current spend: $8K/month on inference.

Planning for next 12 months:

  • Current trajectory: $8K → $25K/month as we scale users
  • That’s $300K annual run rate
  • Need to get gross margins above 70% to make economics work
  • Optimization roadmap: Model distillation, caching, batching

The sobering truth: AI infrastructure costs scale with users faster than revenue. You MUST have a plan for unit economics from day one.

Anyone else dealing with runaway GPU costs?

Michelle :desktop_computer:

Reporting from SF Tech Week - Moscone Center, “AI Infrastructure at Scale” panel

Sources:

  • Anthropic, Stability AI, Midjourney CTOs (live panel)
  • Crusoe Energy pricing data
  • AWS/GCP/Azure public pricing

This is the financial reality check every AI founder needs. Following up from the a16z “AI Economics 101” session. :money_with_wings:

The Infrastructure Budget Model VCs Want to See

What a16z told founders:

Your financial model needs these line items:

  1. Training compute (capex or opex)
  2. Inference compute (scales with users)
  3. Storage (data + models)
  4. Networking
  5. ML infrastructure engineers
  6. “Failure budget” (20-30% of training costs)

Most founders only budget for items 1 and 2. Mistake.

Unit Economics Deep Dive

Example from the session (generative AI SaaS):

Revenue per user: $20/month
Inference cost per user: $4/month (20% of revenue)

Sounds okay until you factor in:

  • Customer acquisition cost: $60 (3-month payback)
  • Support: $2/user/month
  • Infrastructure overhead: $1/user/month
  • All-in gross margin: 13%

The scary math:

  • Need $100M revenue to hit $13M gross profit
  • Meanwhile burning $15M/year on fixed costs
  • Doesn’t work

Fix: Either raise prices to $35/month or cut inference costs to $1.50/month (4x optimization)
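
Here's that per-user math as a small sketch you can reuse - the function shape and the 12-month lifetime are my assumptions; the session's 13% "all-in" figure additionally folds acquisition and fixed-cost allocations into the margin, which is where most of the headline number disappears.

```python
def unit_economics(price, inference, support, overhead, cac, lifetime_months=12):
    """Per-user monthly gross profit (variable costs only), gross margin,
    CAC payback in months, and lifetime contribution after CAC."""
    gross_profit = price - inference - support - overhead
    gross_margin = gross_profit / price
    cac_payback_months = cac / gross_profit
    lifetime_contribution = gross_profit * lifetime_months - cac
    return gross_profit, gross_margin, cac_payback_months, lifetime_contribution

# Session example: $20/user, $4 inference, $2 support, $1 overhead, $60 CAC.
print(unit_economics(20, 4.00, 2, 1, 60))
# The two fixes discussed: raise price to $35, or cut inference to $1.50.
print(unit_economics(35, 4.00, 2, 1, 60))
print(unit_economics(20, 1.50, 2, 1, 60))
```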

Capital Efficiency: Cloud vs Own Infrastructure

I ran the numbers for our company:

Scenario A: Cloud (AWS)

  • Year 1: $400K
  • Year 2: $800K (scale 2x)
  • Year 3: $1.2M (scale 1.5x)
  • 3-year total: $2.4M
  • Capex: $0

Scenario B: Own GPUs (colo)

  • Year 0: $2.5M capex (512 H100s + infrastructure)
  • Year 1-3: $300K/year (power, colo fees, maintenance)
  • 3-year total: $3.4M
  • But: Own assets worth ~$1.5M residual

Break-even analysis:

  • Cloud cheaper if usage <$1M/year
  • Own breaks even at 2.5 years
  • Own wins if you’re in it for 3+ years AND can predict usage

The risk: What if your model changes? Stuck with $2.5M deprecating hardware.

a16z recommendation: Cloud until Series B, then evaluate owned infrastructure
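
Here's the Scenario A/B comparison as a tiny sketch, netting the residual value off the owned path - the helper and the netting choice are mine.

```python
def three_year_tco(cloud_costs_by_year, capex, opex_per_year, residual_value):
    """3-year total cost: pay-as-you-go cloud vs owned hardware net of resale value."""
    cloud = sum(cloud_costs_by_year)
    owned = capex + 3 * opex_per_year - residual_value
    return cloud, owned

# Scenario A/B numbers from above.
cloud, owned = three_year_tco([400_000, 800_000, 1_200_000],
                              capex=2_500_000, opex_per_year=300_000,
                              residual_value=1_500_000)
print(f"cloud: ${cloud:,.0f}   owned (net of residual): ${owned:,.0f}")
```

Net of residual value the owned path wins on paper, which is exactly why the "only if you can predict usage" caveat carries so much weight.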

Cash Flow Impact

This is huge and founders miss it:

Cloud: Pay-as-you-go, matches revenue curve

  • Burn $20K/month early
  • Scale to $100K/month as you grow

Owned: $2.5M upfront

  • Destroys runway
  • Need to raise extra $3M just for infrastructure

Debt financing option:

  • Some banks will finance GPU purchases (saw pitch from Silicon Valley Bank)
  • 4-year term, 8% interest
  • Turns capex into opex
  • But: Debt on balance sheet affects future fundraising

The Optimization Imperative

Numbers from Hugging Face CFO session:

Their optimization journey:

  • Month 1: $0.12 per inference
  • Month 6: $0.08 (33% reduction via batching)
  • Month 12: $0.05 (58% reduction via quantization)
  • Month 18: $0.03 (75% reduction via distillation + caching)

This is the difference between bankrupt and profitable.

Budgeting for Different Stages

Seed ($2M raise):

  • Max AI infrastructure: $150K/year (7.5% of runway)
  • Focus on proving concept, not scale
  • Cloud only, no owned infrastructure

Series A ($8M raise):

  • Max AI infrastructure: $500K/year
  • Hire ML infra engineer ($300K)
  • Still cloud, start optimization efforts

Series B ($25M raise):

  • Infrastructure budget: $2M-$3M/year
  • ML infra team of 3-4
  • Evaluate owned infrastructure
  • FinOps becomes critical

Gross Margin Benchmarks

VCs shared what they want to see:

Year 1-2 (early):

  • AI-first product: 40-50% gross margin acceptable
  • Still optimizing, proving model

Year 3-4 (scaling):

  • Need 60-70% gross margin
  • Infrastructure costs must be <30% of revenue
  • Approaching SaaS economics

Year 5+ (mature):

  • 70-80% gross margin expected
  • Infrastructure <20% of revenue
  • Comparable to traditional SaaS

If you can’t hit these, you have a structural problem.

My Action Items

  1. Add “infrastructure optimization” as board-level KPI
  2. Hire ML infra engineer to focus on cost reduction
  3. Set quarterly gross margin targets
  4. Build financial model with unit economics sensitivity analysis

The meta lesson: Treat AI infrastructure costs like COGS, not R&D. They scale with revenue and must be managed obsessively.

Carlos :chart_increasing:

SF Tech Week - a16z “AI Economics 101” session

Coming from the “Efficient ML Systems” workshop - let me add the engineering reality to these financial discussions. :wrench:

Training Costs: The Details They Don’t Tell You

Workshop leaders: ML infrastructure engineers from Meta, Google Brain, NVIDIA

Real example from Meta’s Llama 3 training:

Hardware: 16,000 H100 GPUs
Duration: 21 days
Power consumption: 10 MW average
Training cost breakdown:

  • GPU compute: $6.5M (16K GPUs Ă— 21 days at Meta's internal cost, well below cloud list rates)
  • Storage: $400K (training data + checkpoints)
  • Networking: $200K (InfiniBand infrastructure)
  • Power: $500K (10 MW Ă— 21 days Ă— $0.12/kWh)
  • Failed runs and experiments: $2M
  • Total: $9.6M

And that’s ONE model. Meta trains dozens of variants in parallel.

Why Training Costs Spiral

Reasons from NVIDIA’s session:

1. Hyperparameter search

  • Need to try 10-20 different configurations
  • Each one is a full training run
  • Only 1-2 will be production-worthy

2. Data quality issues

  • Found data bug at epoch 50? Start over.
  • Contaminated training data? Start over.
  • Meta: “We restarted Llama 3 training 4 times due to data issues”

3. Model instability

  • Training diverges at epoch 80? Start over.
  • “We lost 2 weeks of training due to NaN gradients” - Google engineer

4. Hardware failures

  • At 16K GPU scale, something fails daily
  • Checkpointing and restarts add 10-15% overhead
  • Need redundancy built in

Inference Optimization: How to Cut Costs 4x

Techniques the workshop covered (a quantization sketch follows at the end of this section):

1. Quantization (INT8/INT4)

  • Before: 16-bit float (FP16)
  • After: 8-bit integer (INT8)
  • Memory reduction: 2x
  • Speed improvement: 1.5-2x
  • Accuracy loss: <1% for most models
  • Cost savings: 60-70%

2. Model distillation

  • Train small model to mimic large model
  • Example: GPT-4 → GPT-3.5-sized model
  • Performance: 90-95% of original
  • Cost: 10x cheaper inference
  • Use case: 80% of queries don’t need full model

3. KV cache optimization

  • Cache key-value pairs for repeated tokens
  • Reduces compute for long contexts
  • Savings: 30-40% for chat applications

4. Speculative decoding

  • Use small model to predict next tokens
  • Verify with large model
  • 2-3x faster inference
  • Same accuracy

Meta’s results combining all techniques:

  • Llama 3 base: $0.024 per 1K tokens
  • Optimized: $0.006 per 1K tokens
  • 4x cost reduction with <2% quality loss
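
To make technique 1 concrete, here's a minimal post-training dynamic quantization sketch in PyTorch. It's my own toy example, not Meta's pipeline - production stacks usually use calibrated static INT8 or a specialized inference runtime to get the full 60-70% savings quoted above.

```python
import torch
import torch.nn as nn

# Toy stand-in for a transformer MLP block (FP32 here for simplicity).
model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.GELU(),
    nn.Linear(4096, 1024),
).eval()

def param_megabytes(m: nn.Module) -> float:
    return sum(p.numel() * p.element_size() for p in m.parameters()) / 1e6

# Post-training dynamic quantization: Linear weights stored as INT8,
# activations quantized on the fly at inference time (CPU execution).
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

print(f"FP32 weights: {param_megabytes(model):.1f} MB "
      f"(INT8 weights take roughly a quarter of this, half of FP16)")
with torch.no_grad():
    out = quantized(torch.randn(1, 1024))
print(out.shape)  # same interface, smaller and faster weights
```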

Storage Costs: The Hidden Giant

Google Brain engineer shared their numbers:

Training dataset for large LLM:

  • Raw data: 15TB (web crawl, books, code)
  • Processed data: 8TB (cleaned, tokenized)
  • Multiple copies: 3-4x (redundancy, fast access)
  • Total storage: 35-40TB

But that’s just the start:

Checkpoints during training:

  • Save model every 1,000 steps
  • Each checkpoint: 500GB-2TB
  • 50 checkpoints per training run
  • Storage: 25TB-100TB

Experiment tracking:

  • Logs, metrics, intermediate outputs
  • 10-20TB per major training run

Total for one model: 70-160TB

Cost at scale:

  • Standard storage (S3): $0.023/GB/month = $1,600-$3,700/month
  • High-performance (EBS): $0.10/GB/month = $7,000-$16,000/month

And you keep this for months/years for reproducibility.

Networking: The Bottleneck

NVIDIA deep dive on GPU cluster networking:

Why InfiniBand matters:

  • Within a node, H100s exchange data over NVLink at roughly 900 GB/s
  • Across nodes, typical Ethernet fabrics run at 100 Gb/s per link - a huge bottleneck
  • InfiniBand NDR runs at 400 Gb/s per link (4x the bandwidth), plus RDMA and lower latency

Cost difference:

  • 1,024 GPU cluster with Ethernet: $8M GPUs + $500K networking
  • Same cluster with InfiniBand: $8M GPUs + $2.5M networking

But training speed:

  • Ethernet: 100 days
  • InfiniBand: 28 days

ROI calculation:

  • InfiniBand upfront cost: +$2M
  • Saves: 72 days Ă— 1,024 GPUs Ă— $2/hour = $3.5M
  • Net savings: $1.5M

You NEED InfiniBand for clusters >256 GPUs or you waste money on slow training.

Power and Cooling: Real Infrastructure Costs

From Crusoe Energy session:

H100 GPU: 700W power draw
1,024 GPU cluster: 717 kW just for GPUs
Plus:

  • CPUs, networking, storage: +30% = 930 kW
  • Cooling (1.3 PUE): +280 kW
  • Total: 1,210 kW = 1.2 MW

Monthly power cost:

  • 1.2 MW Ă— 730 hours Ă— $0.12/kWh = $105,000/month
  • Annual: $1.26M

This is why Crusoe builds data centers next to stranded natural gas - power at $0.03/kWh vs $0.12/kWh.
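
Same arithmetic as a small sketch, using the session's 700 W per GPU, 30% non-GPU overhead, and 1.3 PUE figures - the function shape and ~730 hours/month are my assumptions.

```python
def monthly_power_cost(gpu_count: int, gpu_watts: float = 700,
                       overhead_factor: float = 1.30,   # CPUs, networking, storage
                       pue: float = 1.3,                # cooling overhead
                       price_per_kwh: float = 0.12,
                       hours: float = 730) -> float:
    """Estimated monthly electricity bill for a GPU cluster, in dollars."""
    it_load_kw = gpu_count * gpu_watts / 1000 * overhead_factor
    total_kw = it_load_kw * pue
    return total_kw * hours * price_per_kwh

# Crusoe's example: 1,024 H100s on grid power vs stranded-gas power.
print(f"${monthly_power_cost(1024):,.0f}/month at $0.12/kWh")
print(f"${monthly_power_cost(1024, price_per_kwh=0.03):,.0f}/month at $0.03/kWh")
```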

The Build vs Buy Decision (Technical Perspective)

When to use cloud:

  • Experiments and prototyping
  • Variable workloads
  • <500 GPUs equivalent usage

When to buy:

  • 1,000+ GPUs of continuous usage
  • Predictable workloads (training pipelines)
  • 3+ year horizon

Hybrid approach (what we do):

  • Research and experiments: Cloud (GCP)
  • Production training: Reserved cloud instances
  • Inference: Mix of cloud + edge (optimized models)

My Optimization Roadmap

Quarter 1: Implement INT8 quantization

  • Target: 50% inference cost reduction
  • Engineering time: 6 weeks

Quarter 2: Model distillation

  • Train smaller specialist models for common queries
  • Target: 3x cost reduction on 70% of traffic

Quarter 3: Multi-cloud strategy

  • Spot instance arbitrage across AWS/GCP/Azure
  • Target: 30% cost reduction via pricing competition

Quarter 4: Caching and batching optimization

  • Intelligent request batching
  • Target: 40% better GPU utilization

Combined target: 4-5x cost reduction over 12 months

This is the only way to make AI products economically viable.

@cto_michelle - would love to compare notes on quantization results once you implement!

Rachel :abacus:

SF Tech Week - “Efficient ML Systems” workshop, Moscone Center

Attended “Scaling AI Engineering Teams” session and want to add the people/process costs to this discussion. :briefcase:

Background: I manage 25 engineers, 8 focused on ML/AI infrastructure.

The Hidden Cost: ML Infrastructure Engineers

Salary benchmarks from the session (SF Bay Area 2025):

Junior ML Infra Engineer (1-3 years): $180K-$240K
Mid-level (3-5 years): $240K-$320K
Senior (5-8 years): $320K-$450K
Staff+ (8+ years): $450K-$600K

Total comp including equity, benefits: Add 30-40%

Why so expensive?

Required skills (rare combo):

  • Distributed systems
  • ML frameworks (PyTorch, JAX)
  • GPU programming (CUDA)
  • Cloud infrastructure (K8s, Terraform)
  • Performance optimization

Maybe 5,000 people globally have all these skills. High demand, limited supply.

Team Size by Company Stage

Data from Anthropic, Stability AI, Hugging Face:

Seed stage AI startup:

  • ML engineers: 2-4
  • ML infrastructure: 0 (rely on cloud services)
  • Total eng headcount: 5-8

Series A ($50M-$100M valuation):

  • ML engineers: 8-12
  • ML infrastructure: 1-2
  • Data engineers: 2-3
  • Total AI-focused: 11-17

Series B ($200M-$500M valuation):

  • ML engineers: 20-30
  • ML infrastructure: 4-6
  • Data engineers: 5-8
  • ML Ops: 2-3
  • Total AI-focused: 31-47

At scale (unicorn+):

  • Anthropic: ~150 ML engineers + 40 ML infrastructure
  • OpenAI: ~200 ML engineers + 60 ML infrastructure
  • Ratio stabilizes around 3-4 ML engineers per 1 infrastructure engineer

The Infrastructure Team ROI

Question I asked: “How do you justify ML infra headcount to finance?”

Anthropic VP Eng answer:

One good ML infra engineer can:

  • Reduce training costs 30-50% through optimization
  • Speed up training 2-3x
  • Improve researcher productivity 2x (faster iteration)

Math:

  • Salary cost: $400K/year fully loaded
  • Infrastructure savings: $1M+/year
  • Researcher productivity gain: 5 ML engineers Ă— 20% faster = 1 FTE equivalent = $350K value
  • ROI: 3-4x

But only if you’re spending >$2M/year on infrastructure. Below that, use managed services.

Team Structure Patterns

Pattern 1: Centralized ML Platform Team (we use this)

Structure:

  • ML Platform team (4-6 engineers)
  • Builds internal tools, manages infrastructure
  • Serves product ML teams

Pros:

  • Avoid duplicated work
  • Consistent tooling
  • Better cost optimization

Cons:

  • Can become bottleneck
  • Not as close to product needs

Pattern 2: Embedded Infrastructure Engineers

Each product team has 1 ML infra engineer

Pros:

  • Fast iteration
  • Product-specific optimization

Cons:

  • Duplicated effort
  • Inconsistent practices

Pattern 3: Hybrid (what Anthropic uses)

  • Central platform team (10 engineers)
  • Plus embedded infra in each major product area (2-3 per area)

Works well at 100+ engineers, overkill below that.

Hiring and Retention Challenges

Reality check from the panel:

Time to hire ML infra engineer:

  • Post job → First interview: 4-6 weeks
  • Interview process: 3-4 weeks
  • Offer → Start: 4-6 weeks
  • Total: 3-4 months

Why so slow?

  • Small candidate pool
  • Multiple companies competing for same people
  • Candidates are picky (can afford to be)

Retention:

  • Average tenure: 2-3 years
  • Competitors constantly recruiting
  • Need to promote or give raises every 12-18 months

Our strategy:

  • Hire junior engineers with distributed systems background
  • Train on ML infrastructure (6-9 month ramp)
  • Cheaper, better retention
  • But: Need senior engineers to train them

Training and Onboarding Costs

What we spend getting new ML infra engineer productive:

Month 1-2: Reading code, small tasks (20% productive)
Month 3-4: Meaningful contributions (40% productive)
Month 5-6: Independent work (70% productive)
Month 7-9: Fully productive (100%)

Hidden costs:

  • Senior engineer mentoring: 20% time Ă— 6 months = 0.1 FTE = $40K
  • Learning budget: $5K-$10K (courses, books, conferences)
  • Mistakes during ramp: Hard to quantify, but real

Full cost to productivity: $60K-$80K on top of salary

Tooling and Process Costs

Internal tools we’ve built (8 engineer-months total):

  1. Training job scheduler ($80K dev cost)

    • Manages GPU allocation
    • Spot instance fallback
    • Saves $200K/year in efficiency
  2. Experiment tracking (built on MLflow, $60K customization)

    • Reproducibility
    • Cost attribution
    • Compliance audit trail
  3. Model deployment pipeline ($120K)

    • Automated testing
    • Gradual rollout
    • Rollback capability

Total: $260K one-time + $80K/year maintenance

Build vs buy decision:

  • We could use SageMaker ($50K/year)
  • But wanted custom integration
  • Break-even: 3 years

In retrospect, should’ve bought for first 2 years, built later.

My Advice for Team Scaling

If you’re spending <$500K/year on AI infrastructure:

  • Don’t hire dedicated ML infra engineers yet
  • Use managed services (SageMaker, Vertex AI)
  • ML engineers handle their own infrastructure

$500K-$2M/year:

  • Hire first ML infra engineer (senior)
  • Focus on cost optimization and tooling
  • ROI is clear at this scale

$2M+/year:

  • Build ML platform team (3-5 engineers)
  • Centralized infrastructure management
  • Internal developer platforms

The mistake I see: Hiring ML infra too early (pre-Series A) or too late (Series B with no infra team)

Timing matters.

@cto_michelle @data_rachel - curious how your team structures compare?

Luis :busts_in_silhouette:

SF Tech Week - “Scaling AI Engineering Teams” panel

Just left the CoreWeave “GPU Infrastructure at Scale” session - the numbers they shared are eye-opening. :desktop_computer:

Session: CoreWeave + NVIDIA “The Economics of GPU Clouds” at Moscone West

Speakers:

  • CoreWeave VP of Infrastructure
  • NVIDIA Enterprise Computing lead
  • Lambda Labs CEO

The H100 Market Reality (October 2025)

CoreWeave shared their current pricing:

H100 80GB SXM5 (flagship):

  • On-demand: $2.89/hour
  • 1-year reserved: $1.95/hour
  • 3-year reserved: $1.49/hour

Compare to hyperscalers:

  • AWS p5.48xlarge (8x H100): $98.32/hour = $12.29/GPU/hour (4.2x more expensive!)
  • GCP a3-highgpu-8g (8x H100): $12.48/GPU/hour
  • Azure ND H100 v5: $13.76/GPU/hour

Why the markup? According to AWS/GCP speakers at other sessions:

  • Enterprise support
  • Integration with cloud services
  • SLAs and compliance
  • Networking infrastructure

But for pure GPU compute, specialized clouds are 4-5x cheaper.

Source: GPU Cloud Pricing | CoreWeave (verified live during session)

The Supply Situation

NVIDIA speaker dropped some data:

H100 shipments (2024-2025):

  • Q4 2024: 120,000 units shipped
  • Q1 2025: 150,000 units
  • Q2 2025: 180,000 units
  • Q3 2025: 200,000 units

Total H100s in market: ~650,000 units globally

Demand vs Supply:

  • Estimated demand: 1.2M units
  • Supply: 650K units
  • Shortfall: 550K units (supply covers only ~54% of estimated demand)

Why prices dropped from $4.50 to $2.89/hour:

  • Major deployments completed (OpenAI, Meta, Anthropic bought huge clusters)
  • Supply catching up
  • H200 announcement (people waiting for next gen)

Quote from NVIDIA: “We expect pricing to stabilize around $2.50-3.00/hour for spot, $1.80-2.00 for reserved through Q1 2026.”

The H200 Timeline

NVIDIA roadmap revealed:

H200 (upgraded H100 with HBM3e):

  • Availability: Q4 2025 (limited)
  • Volume availability: Q1 2026
  • Performance: 1.4x memory bandwidth vs H100
  • Pricing estimate: $3.50-4.00/hour on-demand

Lambda Labs CEO: “Everyone’s waiting for H200. We expect H100 prices to drop another 15-20% when H200 ships in volume.”

Real Customer Economics

Case study shared by CoreWeave:

Customer: Mid-size AI startup (Series B)
Use case: Training 70B parameter model

Cloud comparison:

Option A: AWS

  • 256x H100 (32x p5.48xlarge instances)
  • Duration: 14 days training
  • Cost: $12.29/GPU/hour Ă— 256 GPUs Ă— 336 hours = $1,056,154

Option B: CoreWeave

  • 256x H100 cluster
  • Reserved pricing: $1.95/hour
  • Cost: $1.95 Ă— 256 Ă— 336 = $167,731

Savings: $888,423 (84% cheaper!)

But there’s a catch: AWS has better integration with other services (S3, CloudWatch, etc.)

Customer went with CoreWeave for training, AWS for inference.

The Inference Economics

Lambda Labs shared inference cost data:

Serving a 70B model:

Hardware requirements:

  • 2x H100 80GB (model barely fits)
  • Or 4x A100 40GB
  • Or 8x A100 80GB

Cost per 1M tokens (output):

2x H100 setup:

  • GPU cost: $2.89 Ă— 2 = $5.78/hour
  • Throughput: ~15K tokens/sec
  • Cost per 1M tokens: $0.107

4x A100 40GB setup:

  • GPU cost: $1.10 Ă— 4 = $4.40/hour
  • Throughput: ~8K tokens/sec (slower)
  • Cost per 1M tokens: $0.153

In this example the H100 setup actually wins on cost per token; A100s only become more cost-effective for inference if you can push their throughput up (bigger batches, higher utilization).
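
The formula behind those per-token numbers, as a tiny sketch (helper name is mine): cost per 1M tokens = hourly GPU cost divided by tokens generated per hour, times a million.

```python
def cost_per_million_tokens(gpus: int, hourly_rate: float, tokens_per_sec: float) -> float:
    """Serving cost per 1M output tokens, from GPU count, $/GPU/hour, and throughput."""
    hourly_cost = gpus * hourly_rate
    tokens_per_hour = tokens_per_sec * 3600
    return hourly_cost / tokens_per_hour * 1_000_000

# Lambda's two 70B-serving setups from above:
print(cost_per_million_tokens(2, 2.89, 15_000))   # ~0.107 - 2x H100
print(cost_per_million_tokens(4, 1.10, 8_000))    # ~0.153 - 4x A100 40GB
```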

The Networking Cost Nobody Talks About

CoreWeave infrastructure deep dive:

InfiniBand networking for GPU clusters:

256 GPU cluster networking:

  • 32x 8-GPU nodes
  • InfiniBand switches: $850K
  • Cables and adapters: $180K
  • Total networking: $1.03M

GPU cost: 256 Ă— $30K = $7.68M
Networking adds 13% to hardware cost

Why necessary? H100-to-H100 communication requires 400 Gbps+ bandwidth for efficient training.

Ethernet alternative:

  • 400GbE switches: $320K (69% cheaper)
  • But training is 2.8x slower
  • Economics: Slower training costs more in GPU time than networking savings

Conclusion: InfiniBand is mandatory for serious training.

Geographic Arbitrage Opportunities

CoreWeave has data centers in 8 locations with different pricing:

Cheapest regions (hydroelectric power):

  • Las Vegas, NV: $1.85/hour H100 reserved
  • Chicago, IL: $1.88/hour
  • Minneapolis, MN: $1.90/hour

Most expensive:

  • Northern Virginia: $2.15/hour (high demand)
  • Silicon Valley: $2.25/hour (power costs)

Savings: 18% cheaper in Vegas vs Silicon Valley

But: Data egress costs matter

  • Training data in: Free
  • Model checkpoints out: $0.08/GB
  • 2TB checkpoint = $160 to transfer

Optimize: Train in cheap region, keep data there, only transfer final model.

The Professional Services Reality

Panel discussion: “Why AI Infrastructure Projects Fail”

Average infrastructure project costs:

Hardware/cloud: $500K
But also need:

  • ML infrastructure engineer: $180K salary Ă— 6 months = $90K
  • Integration work: $120K
  • Debugging and optimization: $80K
  • Total real cost: $790K

58% over the hardware cost alone.

CoreWeave VP: “Customers budget for GPUs, not for the engineering time. That’s why 40% of projects run out of budget before completion.”

My Takeaways

  1. Specialized GPU clouds are 4-5x cheaper than AWS/GCP/Azure - use them for training
  2. H100 prices will drop another 15-20% in Q1 2026 when H200 ships - time your large purchases
  3. InfiniBand networking is mandatory - budget 13% extra for networking
  4. Geographic arbitrage saves 18% - choose regions with cheap power
  5. Professional services cost 58% extra - budget accordingly

For our startup:

  • Move training workloads from AWS to CoreWeave: Save $400K/year
  • Keep inference on AWS: Better integration with our stack
  • Wait for H200 for next major model training: Save 15-20%

This session alone will save us hundreds of thousands of dollars.

David :light_bulb:

Reporting from SF Tech Week - CoreWeave “GPU Infrastructure at Scale” session

Reporting from Databricks “Production ML Infrastructure” workshop - they shared real customer cost data that’s incredibly valuable. :bar_chart:

Session: Databricks + Snowflake “The Economics of Production ML” at Moscone Center

Speakers:

  • Databricks VP of ML Platform
  • Snowflake Head of AI/ML
  • Cost optimization engineers from both companies

The Real Cost Structure of ML in Production

Databricks analyzed costs across 500 enterprise ML deployments:

Average breakdown:

  • Training compute: 35% of total ML infrastructure cost
  • Inference compute: 42%
  • Data storage: 12%
  • Data movement/networking: 8%
  • Monitoring/logging: 3%

Key insight: Most companies optimize training costs (35%), ignore inference costs (42%) - optimizing the wrong thing!

Training Cost Optimization Strategies

Strategy 1: Spot instances

Case study: E-commerce company training recommendation models

Before (all on-demand):

  • 128 A100s Ă— 7 days
  • On-demand: $1.85/hour
  • Cost: $1.85 Ă— 128 Ă— 168 = $39,782

After (90% spot, 10% on-demand for fault tolerance):

  • Spot price: $0.74/hour (60% discount)
  • 115 spot GPUs: $0.74 Ă— 115 Ă— 168 = $14,304
  • 13 on-demand: $1.85 Ă— 13 Ă— 168 = $4,037
  • Total: $18,341

Savings: $21,441 (54%)

Tradeoff:

  • 15% more training time (interruptions)
  • Need sophisticated checkpointing
  • Databricks handles this automatically

Databricks recommendation: 80-90% spot for fault-tolerant training jobs.

Strategy 2: Right-sizing GPU selection

Common mistake: Using H100s for everything

Example from Snowflake customer:

Training sentiment analysis model:

  • Initially: 8x H100 (overkill)
  • Cost: $2.89 Ă— 8 Ă— 24 hours = $554/day

Analysis showed:

  • Model size: 7B parameters
  • Fits in 4x A100 40GB
  • Training time: Only 15% longer

Optimized:

  • 4x A100 40GB
  • Cost: $1.10 Ă— 4 Ă— 28 hours = $123/day

Savings: 78%

Quote from Databricks: “60% of training jobs use more expensive GPUs than necessary. Right-sizing saves 40-60% on average.”
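
A minimal right-sizing sketch in the spirit of that example. The memory estimate and the speed ratio are rough assumptions back-solved from the numbers above, not benchmarks - the point is the "cheapest config that fits" loop.

```python
# (name, gpu_count, memory_gb_each, $/GPU/hour, relative_speed vs 8x H100).
# Rates are the figures quoted earlier; relative_speed 0.86 is back-solved
# from "only 15% longer" in the example above - an assumption, not a benchmark.
CONFIGS = [
    ("8x H100 80GB", 8, 80, 2.89, 1.00),
    ("4x A100 40GB", 4, 40, 1.10, 0.86),
]

def cheapest_config(required_memory_gb: float, baseline_hours: float):
    """Pick the lowest total-cost config among those with enough GPU memory."""
    options = []
    for name, n, mem, rate, speed in CONFIGS:
        if n * mem < required_memory_gb:
            continue                       # weights + optimizer state won't fit
        hours = baseline_hours / speed     # slower hardware runs longer
        options.append((n * rate * hours, name, round(hours, 1)))
    return min(options)

# 7B-parameter sentiment model: call it ~120GB of training state (rough guess),
# 24 hours on the 8x H100 baseline.
print(cheapest_config(required_memory_gb=120, baseline_hours=24))
```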

Strategy 3: Batch training jobs

Case study: Fintech with 20 models to train monthly

Before (serial training):

  • Train each model separately
  • GPU utilization: 60%
  • Cluster idle 40% of time
  • Cost: $50K/month

After (batched training):

  • Train multiple models in parallel
  • GPU utilization: 92%
  • Same cluster, more throughput
  • Cost: $32K/month

Savings: 36%

Implementation: Databricks job scheduler automatically batches compatible jobs.

Inference Cost Optimization (The Bigger Opportunity)

Databricks data: Inference is 42% of costs but only 10% of optimization effort.

Strategy 1: Model quantization

Real example: Healthcare AI (diagnostic predictions)

Original model:

  • FP16 precision
  • 13B parameters
  • Inference: 4x A100 (80GB total needed)
  • Cost: $1.10 Ă— 4 = $4.40/hour
  • Throughput: 50 predictions/sec
  • Cost per 1M predictions: $24.44

Quantized to INT8:

  • Memory: 50% reduction (40GB needed)
  • Inference: 2x A100 40GB
  • Cost: $1.10 Ă— 2 = $2.20/hour
  • Throughput: 75 predictions/sec (faster!)
  • Cost per 1M predictions: $8.15

Savings: 67%

Accuracy loss: <1% (acceptable for this use case)

Snowflake stat: “Quantization saves 50-70% on inference costs with <2% accuracy degradation for most models.”

Strategy 2: Serverless inference

Databricks introduced serverless ML inference (beta):

Traditional approach:

  • Provision 4x GPUs for peak load
  • Average utilization: 30%
  • Paying for idle capacity 70% of time

Serverless:

  • Pay per prediction
  • Auto-scales from 0 to 1000s of GPUs
  • No idle cost

Pricing:

  • $0.0003 per prediction (13B model)
  • Volume discounts at 10M+ predictions

Break-even analysis:

Traditional (4x A100):

  • Cost: $4.40/hour = $3,168/month
  • Covers: ~10.5M predictions

Serverless:

  • Cost: $0.0003 Ă— predictions
  • 10.5M predictions = $3,150/month

Below 10M predictions/month: Serverless cheaper
Above 10M predictions/month: Dedicated GPUs cheaper
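
The break-even arithmetic as a sketch - rates are from the session, helper names are mine.

```python
SERVERLESS_RATE = 0.0003      # $/prediction for the 13B model (quoted above)
DEDICATED_HOURLY = 4.40       # 4x A100 at $1.10/hour
HOURS_PER_MONTH = 720

def monthly_costs(predictions: int):
    """Return (serverless, dedicated) monthly cost for a given volume."""
    return predictions * SERVERLESS_RATE, DEDICATED_HOURLY * HOURS_PER_MONTH

def break_even_predictions():
    """Volume above which dedicated GPUs beat serverless."""
    return DEDICATED_HOURLY * HOURS_PER_MONTH / SERVERLESS_RATE

print(f"{break_even_predictions():,.0f} predictions/month")  # ~10.6M
print(monthly_costs(5_000_000))    # serverless wins well below break-even
print(monthly_costs(20_000_000))   # dedicated wins well above it
```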

Strategy 3: Caching and deduplication

Case study from session: Customer support chatbot

Analysis showed:

  • 40% of queries are similar/duplicate
  • Cache responses for common questions
  • Only 60% hit the model

Before caching:

  • 10M queries/month
  • All hit model
  • Cost: $0.0003 Ă— 10M = $3,000

After caching:

  • 6M unique queries hit model
  • 4M served from cache ($0.00001/query)
  • Cost: $0.0003 Ă— 6M + $0.00001 Ă— 4M = $1,800 + $40 = $1,840

Savings: 39%

Implementation: Redis cache with embedding similarity search (costs $200/month, pays for itself 5x)
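
Here's a minimal semantic-cache sketch showing the idea. embed() and call_model() are hypothetical stand-ins for your embedding and inference endpoints, and the in-memory list stands in for Redis (or whatever vector store you actually use); the threshold is a made-up starting point you'd tune against false-hit rates.

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.92   # made-up starting point - tune against false hits
_cache = []                   # list of (unit-norm embedding, cached response)

def _unit(vec):
    v = np.asarray(vec, dtype=np.float32)
    return v / (np.linalg.norm(v) + 1e-9)

def answer(query: str, embed, call_model) -> str:
    """Serve from the semantic cache when a close-enough query was seen before."""
    q = _unit(embed(query))
    for emb, response in _cache:
        if float(np.dot(q, emb)) >= SIMILARITY_THRESHOLD:    # cosine similarity
            return response                                   # cache hit: ~free
    response = call_model(query)                              # cache miss: pay inference
    _cache.append((q, response))
    return response
```

The real cost/quality knob is the threshold: too low and you serve wrong cached answers, too high and your hit rate (and the 39% savings) evaporates.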

Data Storage Optimization

Problem: Training datasets and checkpoints are HUGE and companies keep everything forever.

Databricks customer audit:

Average ML team storage:

  • Raw training data: 150TB (kept forever)
  • Processed datasets: 80TB (many duplicates)
  • Model checkpoints: 120TB (most never used again)
  • Experiment artifacts: 50TB (debugging data)
  • Total: 400TB

Cost at $0.023/GB/month (S3): $9,200/month = $110K/year

Optimization strategy:

  1. Tiered storage:

    • Hot data (active experiments): S3 Standard
    • Warm data (recent models): S3 Glacier Instant
    • Cold data (compliance/archive): S3 Glacier Deep Archive
  2. Retention policies:

    • Raw data: 2 years then archive
    • Checkpoints: Keep final + last 3, delete rest
    • Failed experiments: Delete after 90 days

After optimization:

  • Hot data (S3 Standard): 40TB Ă— $0.023 = $920/month
  • Warm (Instant Retrieval): 60TB Ă— $0.004 = $240/month
  • Cold (Deep Archive): 100TB Ă— $0.00099 = $99/month
  • Deleted: 200TB = $0

New cost: $1,259/month (was $9,200)

Savings: 86% = $95K/year
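
One way to implement that retention policy is with S3 lifecycle rules so the tiering happens automatically. A minimal boto3 sketch, assuming placeholder bucket and prefix names and the day thresholds described above:

```python
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-ml-artifacts",   # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {   # checkpoints: hot briefly, then Glacier IR, then Deep Archive
                "ID": "checkpoint-tiering",
                "Filter": {"Prefix": "checkpoints/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "GLACIER_IR"},
                    {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
                ],
            },
            {   # failed experiments: delete after 90 days
                "ID": "failed-experiment-expiry",
                "Filter": {"Prefix": "experiments/failed/"},
                "Status": "Enabled",
                "Expiration": {"Days": 90},
            },
        ]
    },
)
```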

Data Movement Costs (The Hidden Killer)

Snowflake engineer: “Data egress bankrupts companies and they don’t realize it until the bill comes.”

Real incident:

ML team training in us-west-2:

  • Training data in us-east-1
  • Transferred 50TB per training run
  • 20 training runs/month
  • Data transfer: 1,000TB/month

AWS data transfer out pricing:

  • First 100TB: $0.09/GB = $9,000
  • Next 900TB: $0.085/GB = $76,500
  • Total: $85,500/month

Just for moving data between regions!

Fix:

  • Replicate training data to us-west-2 once: $4,500
  • All subsequent training: No transfer cost
  • Savings: $81K/month

Databricks recommendation: “Colocate compute and data. Data transfer should be <1% of your bill, not 40%.”

The FinOps Metrics That Matter

Session introduced ML FinOps metrics (a small sketch computing a few of them follows the list):

1. Cost per training run

  • Track: Total cost / training job
  • Benchmark: Trend over time (should decrease with optimization)

2. Cost per inference

  • Track: Total inference cost / number of predictions
  • Target: <$0.0005 for most models

3. GPU utilization

  • Track: Actual compute time / provisioned time
  • Target: >80% for reserved, >95% for on-demand

4. Training efficiency

  • Track: Model accuracy / total training cost
  • Optimize: Best accuracy per dollar

5. Inference efficiency

  • Track: Predictions per dollar
  • Optimize: Maximize throughput per GPU
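
A tiny sketch of the three most mechanical metrics from that list, with made-up illustrative inputs (not session data):

```python
def finops_metrics(total_training_cost, training_runs,
                   total_inference_cost, predictions,
                   gpu_hours_used, gpu_hours_provisioned):
    """Compute cost per training run, cost per prediction, and GPU utilization."""
    return {
        "cost_per_training_run": total_training_cost / training_runs,
        "cost_per_prediction": total_inference_cost / predictions,
        "gpu_utilization": gpu_hours_used / gpu_hours_provisioned,
    }

# Illustrative inputs only - not numbers from the session.
print(finops_metrics(120_000, 8, 30_000, 90_000_000, 41_000, 50_000))
```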

Databricks Unified Analytics Platform Pitch

They claimed:

“Customers moving from DIY ML infrastructure to Databricks save 40-60% on total costs.”

Breakdown:

  • No need for ML infrastructure engineers: Save $300K-600K/year in salaries
  • Automated optimization (spot instances, autoscaling): Save 40% on compute
  • Unified platform (less data movement): Save 60% on networking
  • Serverless inference: Save 30-50% on inference

Tradeoff: Lock-in to Databricks platform

My take: Worth it for most companies. Building this in-house is expensive.

My Action Items

  1. Audit our current spend by category (training vs inference vs storage)
  2. Implement model quantization for our 3 largest inference workloads (estimate 60% savings)
  3. Move to tiered storage for training data and checkpoints (estimate 80% savings on storage)
  4. Colocate data and compute (we’re paying $12K/month in transfer, should be $0)
  5. Evaluate Databricks serverless for low-volume models (<10M predictions/month)

Estimated total savings: $200K-300K/year

This session was incredibly valuable. Highly recommend for anyone managing ML infrastructure budgets.

Keisha :chart_decreasing:

Reporting from SF Tech Week - Databricks “Production ML Infrastructure” workshop
