Over the last 6 months, my team migrated our ML platform’s core data from MongoDB to PostgreSQL. It was painful, it was worth it, and the conclusion surprised me: we kept both databases.
The “just use Postgres” narrative is everywhere in 2026, and I understand why - PostgreSQL has evolved to handle so many use cases brilliantly. But our experience taught me that the real answer is more nuanced than any one-size-fits-all recommendation.
Why We Migrated
Context matters. We’re running a real-time personalization platform at Anthropic - ML models serving predictions, A/B tests running constantly, and user behavior analyzed in real time.
Originally, everything was in MongoDB. It made sense early on:
- Flexible schemas for rapidly evolving ML features
- Easy to iterate on data models
- Good performance for document-based access patterns
- Team was familiar with it
But as we scaled, pain points emerged:
- Complex analytics queries were slow - Joins across collections and aggregations over large datasets dragged
- Transactions became critical - We needed ACID guarantees for experiment assignment and result recording
- Data consistency issues - Eventual consistency bit us in subtle ways
- Reporting was a nightmare - Business intelligence tools expect SQL
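To make the transactions point concrete, here’s a minimal sketch of what we needed for experiment assignment. I’m using the stdlib sqlite3 module as a stand-in for PostgreSQL, and the table and column names are hypothetical - the point is that the assignment row and its audit event either both commit or both roll back:

```python
import sqlite3

# sqlite3 standing in for PostgreSQL; schema is illustrative only
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE assignments (user_id INTEGER PRIMARY KEY,
                              experiment TEXT NOT NULL,
                              variant TEXT NOT NULL);
    CREATE TABLE events (user_id INTEGER, experiment TEXT, event TEXT);
""")

def assign(conn, user_id, experiment, variant):
    # `with conn:` wraps both inserts in one transaction:
    # commit on success, rollback on any exception - no half-written state
    with conn:
        conn.execute("INSERT INTO assignments VALUES (?, ?, ?)",
                     (user_id, experiment, variant))
        conn.execute("INSERT INTO events VALUES (?, ?, 'assigned')",
                     (user_id, experiment))

assign(conn, 42, "new_ranker", "treatment")
```

Without that atomicity, a crash between the two writes leaves a user assigned to a variant with no record of it - exactly the class of bug that eventual consistency kept handing us.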
The Migration Journey
Six months. That’s how long it took to migrate our core transactional data to PostgreSQL. Here’s what that looked like:
Phase 1 (2 months): Dual-write to both databases, compare results, fix inconsistencies
Phase 2 (2 months): Migrate historical data, validate integrity, build rollback plans
Phase 3 (2 months): Cut over read traffic, deprecate MongoDB for core data, optimize PostgreSQL
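The Phase 1 dual-write loop is simpler than it sounds. A hypothetical sketch, with plain dicts standing in for the real MongoDB and PostgreSQL clients:

```python
# Stand-ins for the two stores; in production these were pymongo and
# SQLAlchemy clients, with the second write wrapped in retry/alerting.
mongo_store: dict = {}
postgres_store: dict = {}

def dual_write(key, record):
    # Every application write goes to both databases
    mongo_store[key] = record
    postgres_store[key] = record

def find_inconsistencies():
    # Nightly comparator: any key missing or differing between stores
    keys = mongo_store.keys() | postgres_store.keys()
    return [k for k in keys if mongo_store.get(k) != postgres_store.get(k)]

dual_write("user:1", {"plan": "pro"})
postgres_store["user:2"] = {"plan": "free"}  # simulate a missed write
```

The comparator report was what gave us confidence to move to Phase 3 - cutting over reads only once the inconsistency list stayed empty for weeks.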
The surprises:
- JSONB in Postgres handled our semi-structured data better than expected
- PostgreSQL’s query planner required more tuning than MongoDB’s
- Our ORM (SQLAlchemy) handled the migration smoothly
- The team adapted faster than I anticipated
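On the JSONB surprise: semi-structured payloads stayed queryable with plain SQL, no schema migration when fields came and went. A sketch of what that looks like through SQLAlchemy (table and field names are made up for illustration; the statement is compiled, not executed, so you can see the Postgres `->>` operator it renders):

```python
from sqlalchemy import Column, Integer, select
from sqlalchemy.dialects import postgresql
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Feature(Base):
    # Hypothetical table: semi-structured ML feature payloads in JSONB
    __tablename__ = "ml_features"
    id = Column(Integer, primary_key=True)
    payload = Column(postgresql.JSONB)

# Filter on a field inside the document; SQLAlchemy renders the
# Postgres ->> text-extraction operator for us
stmt = select(Feature).where(Feature.payload["model"].astext == "ranker_v2")
sql = str(stmt.compile(dialect=postgresql.dialect()))
```

Add a GIN index on the JSONB column and containment queries stay fast too - which is a big part of why “just use Postgres” works as well as it does.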
The Surprising Conclusion: Polyglot Persistence
Here’s where it gets interesting. We migrated core data to PostgreSQL, but we kept MongoDB for specific use cases:
PostgreSQL (System of Record):
- User accounts and authentication
- Experiment definitions and assignments
- Financial/billing data
- Relational reporting data
- Anything requiring strong consistency
MongoDB (High-Volume, Flexible Data):
- ML feature vectors and embeddings (using Atlas Vector Search)
- User event streams and clickstream data
- Model training data with evolving schemas
- Real-time analytics aggregations
- Temporary computation results
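For the embeddings workload, Atlas Vector Search queries are expressed as an aggregation pipeline stage. A hedged sketch - the index and field names here are hypothetical, and the pipeline is only built, not run, since `$vectorSearch` needs a live Atlas cluster:

```python
def build_vector_search(query_vector, k=10):
    """Build an Atlas Vector Search pipeline for k-nearest embeddings."""
    return [
        {
            "$vectorSearch": {
                "index": "embedding_index",   # hypothetical index name
                "path": "embedding",          # hypothetical document field
                "queryVector": query_vector,
                "numCandidates": k * 10,      # oversample, then trim to k
                "limit": k,
            }
        },
        # Keep only the item id and the similarity score
        {"$project": {"item_id": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]

pipeline = build_vector_search([0.1, 0.2, 0.3], k=5)
```

In production this would be passed to `collection.aggregate(pipeline)` via pymongo; keeping embeddings next to the rest of the ML operational data is what made MongoDB worth retaining for us.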
Why This Makes Sense
Different data has different characteristics:
Core business data: Relational, needs ACID, changes slowly, requires complex queries
→ PostgreSQL is perfect
ML operational data: High volume, schema evolves constantly, document-oriented, needs vector search
→ MongoDB Atlas with vector capabilities wins
Analytics/reporting: Complex joins, aggregations, BI tool compatibility
→ PostgreSQL (or we’d use a dedicated data warehouse)
The architectural pattern: PostgreSQL as the authoritative system of record, MongoDB for high-throughput, schema-flexible workloads.
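In code, the pattern reduces to a routing decision per dataset. This is a deliberately tiny sketch (the dataset names are examples, not our real schema), but it captures the rule we follow: classify each dataset by its access profile, and default to the system of record when in doubt:

```python
# Map each dataset to the store that matches its access profile
ROUTING = {
    "accounts": "postgres",     # system of record: strong consistency
    "billing": "postgres",
    "experiments": "postgres",
    "embeddings": "mongodb",    # high-volume, schema-flexible
    "clickstream": "mongodb",
}

def store_for(dataset: str) -> str:
    # Unclassified data goes to the system of record by default -
    # it is always safe to start strict and relax later
    return ROUTING.get(dataset, "postgres")
```

Making the routing explicit (rather than letting each service pick ad hoc) is what keeps two databases from turning into two sources of truth.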
Cost Analysis
This isn’t cheap, and honesty matters:
Infrastructure costs: Running two databases increased our AWS bill by ~30%
Operational complexity: Two systems to monitor, backup, optimize
Team expertise: Need people who understand both (we hired a dedicated database reliability engineer)
Development overhead: Different query patterns, different ORMs, different mental models
But the benefits justified the costs:
- Query performance improved 10x for analytical workloads
- Transaction consistency eliminated entire classes of bugs
- ML feature development velocity stayed high (MongoDB flexibility)
- Business reporting went from “nightmare” to “actually works”
The Real Lesson: Think About Access Patterns
The mistake we made early on was choosing a database based on hype or familiarity. The right approach:
- Map your data access patterns: Point lookups? Complex joins? Analytical aggregations? Vector similarity search?
- Identify consistency requirements: Strong ACID? Eventual consistency okay?
- Consider schema evolution: Stable schema? Rapidly changing?
- Think about team expertise: What does your team know? What can they learn?
- Evaluate ecosystem needs: What tools need to integrate?
For us, the answer was “both, strategically placed.”
Practical Advice
If you’re facing this decision:
- Start with PostgreSQL for core business data - It’s a safe default
- Add specialized databases for specific workloads - Don’t force everything into one system
- Invest in data architecture thinking - The best database is the one that matches your access patterns
- Plan for polyglot persistence - Most systems at scale use multiple databases
- Measure twice, migrate once - Database migrations are expensive; get it right the first time
The “just use Postgres” advice is good for many teams, especially early on. But as you scale and your data needs diversify, strategic use of multiple databases isn’t complexity for complexity’s sake - it’s matching tools to problems.
What’s your experience? Anyone else running polyglot persistence? What surprised you most about your database choices?