We migrated from MongoDB to PostgreSQL and I have thoughts - the 'just use Postgres' advice isn't always right

Over the last 6 months, my team migrated our ML platform’s core data from MongoDB to PostgreSQL. It was painful, it was worth it, and the conclusion surprised me: we kept both databases.

The “just use Postgres” narrative is everywhere in 2026, and I understand why - PostgreSQL has evolved to handle so many use cases brilliantly. But our experience taught me that the real answer is more nuanced than one-size-fits-all advice.

Why We Migrated

Context matters. We’re running a real-time personalization platform at Anthropic - ML models serving predictions, A/B tests running constantly, user behavior being analyzed in real-time.

Originally, everything was in MongoDB. It made sense early on:

  • Flexible schemas for rapidly evolving ML features
  • Easy to iterate on data models
  • Good performance for document-based access patterns
  • Team was familiar with it

But as we scaled, pain points emerged:

  1. Complex analytics queries were slow - Joins across collections ($lookup) and aggregations over large datasets
  2. Transactions became critical - We needed ACID guarantees for experiment assignment and result recording
  3. Data consistency issues - Eventual consistency bit us in subtle ways
  4. Reporting was a nightmare - Business intelligence tools expect SQL

The Migration Journey

6 months. That’s how long it took to migrate our core transactional data to PostgreSQL. Here’s what that looked like:

Phase 1 (2 months): Dual-write to both databases, compare results, fix inconsistencies
Phase 2 (2 months): Migrate historical data, validate integrity, build rollback plans
Phase 3 (2 months): Cut over read traffic, deprecate MongoDB for core data, optimize PostgreSQL
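The Phase 1 dual-write logic can be sketched roughly like this. This is a simplified illustration, not our production code: plain dicts stand in for the MongoDB and PostgreSQL clients, and the function names are invented for the example.

```python
# Dual-write sketch: write to both stores, read from the primary, and
# record any divergence so inconsistencies get fixed before cutover.
# Dicts stand in for real MongoDB/PostgreSQL clients; names are illustrative.

mismatches = []

def dual_write(primary, shadow, key, doc):
    primary[key] = doc          # the old store stays authoritative in Phase 1
    try:
        shadow[key] = doc       # best-effort shadow write to the new store
    except Exception as exc:
        mismatches.append((key, "shadow write failed", exc))

def compare_read(primary, shadow, key):
    value = primary.get(key)
    if shadow.get(key) != value:  # divergence -> investigate before cutover
        mismatches.append((key, "read mismatch", shadow.get(key)))
    return value

mongo, postgres = {}, {}
dual_write(mongo, postgres, "user:42", {"plan": "pro"})
assert compare_read(mongo, postgres, "user:42") == {"plan": "pro"}
assert not mismatches
```

The point of the comparison step is that cutover happens only once the mismatch log stays empty under real traffic.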

The surprises:

  • JSONB in Postgres handled our semi-structured data better than expected
  • PostgreSQL’s query planner required more tuning than MongoDB’s
  • Our ORM (SQLAlchemy) handled the migration smoothly
  • The team adapted faster than I anticipated
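For context on the JSONB point, the semi-structured payloads moved over in roughly this shape (illustrative SQL only; the table and column names are made up for the example, not our real schema):

```sql
-- Illustrative only: a JSONB column with a GIN index gives Postgres
-- Mongo-like flexibility plus SQL querying. Names are hypothetical.
CREATE TABLE ml_features (
    id      bigserial PRIMARY KEY,
    payload jsonb NOT NULL
);

CREATE INDEX ml_features_payload_idx ON ml_features USING gin (payload);

-- The containment operator @> can use the GIN index; ->> extracts text.
SELECT payload ->> 'score'
FROM ml_features
WHERE payload @> '{"model": "ranker-v2"}';
```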

The Surprising Conclusion: Polyglot Persistence

Here’s where it gets interesting. We migrated core data to PostgreSQL, but we kept MongoDB for specific use cases:

PostgreSQL (System of Record):

  • User accounts and authentication
  • Experiment definitions and assignments
  • Financial/billing data
  • Relational reporting data
  • Anything requiring strong consistency

MongoDB (High-Volume, Flexible Data):

  • ML feature vectors and embeddings (using Atlas Vector Search)
  • User event streams and clickstream data
  • Model training data with evolving schemas
  • Real-time analytics aggregations
  • Temporary computation results

Why This Makes Sense

Different data has different characteristics:

Core business data: Relational, needs ACID, changes slowly, requires complex queries
→ PostgreSQL is perfect

ML operational data: High volume, schema evolves constantly, document-oriented, needs vector search
→ MongoDB Atlas with vector capabilities wins

Analytics/reporting: Complex joins, aggregations, BI tool compatibility
→ PostgreSQL (or we’d use a dedicated data warehouse)

The architectural pattern: PostgreSQL as the authoritative system of record, MongoDB for high-throughput, schema-flexible workloads.

Cost Analysis

This isn’t cheap, and honesty matters:

Infrastructure costs: Running two databases increased our AWS bill by ~30%
Operational complexity: Two systems to monitor, backup, optimize
Team expertise: Need people who understand both (we hired a dedicated database reliability engineer)
Development overhead: Different query patterns, different ORMs, different mental models

But the benefits justified the costs:

  • Query performance improved 10x for analytical workloads
  • Transaction consistency eliminated entire classes of bugs
  • ML feature development velocity stayed high (MongoDB flexibility)
  • Business reporting went from “nightmare” to “actually works”

The Real Lesson: Think About Access Patterns

The mistake we made early on was choosing a database based on hype or familiarity. The right approach:

  1. Map your data access patterns: Point lookups? Complex joins? Analytical aggregations? Vector similarity search?
  2. Identify consistency requirements: Strong ACID? Eventual consistency okay?
  3. Consider schema evolution: Stable schema? Rapidly changing?
  4. Think about team expertise: What does your team know? What can they learn?
  5. Evaluate ecosystem needs: What tools need to integrate?

For us, the answer was “both, strategically placed.”

Practical Advice

If you’re facing this decision:

  • Start with PostgreSQL for core business data - It’s a safe default
  • Add specialized databases for specific workloads - Don’t force everything into one system
  • Invest in data architecture thinking - The best database is the one that matches your access patterns
  • Plan for polyglot persistence - Most systems at scale use multiple databases
  • Measure twice, migrate once - Database migrations are expensive, get it right

The “just use Postgres” advice is good for many teams, especially early on. But as you scale and your data needs diversify, strategic use of multiple databases isn’t complexity for complexity’s sake - it’s matching tools to problems.

What’s your experience? Anyone else running polyglot persistence? What surprised you most about your database choices?

Rachel, this is exactly the kind of nuanced take I appreciate. The “one database to rule them all” thinking never made sense to me, but it’s so prevalent.

ORM and Developer Experience Questions

I’m curious about the developer experience during and after the migration. You mentioned SQLAlchemy handled it smoothly - can you elaborate?

We’re currently on Prisma with PostgreSQL for our Next.js app, and I’ve added Redis for caching. The context-switching between different query patterns is real:

  • Prisma’s query builder for Postgres
  • Redis commands for cache operations
  • Thinking about data normalization vs denormalization for each store

How did your team handle this? Do you have:

  • Unified data access layer that abstracts the database choice?
  • Different services owning different databases?
  • Clear guidelines for when to use which database?
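For what it’s worth, one common shape for the unified-layer option is a thin repository interface per domain object, with the backend choice hidden behind it. This is a generic sketch of the pattern, not Rachel’s actual code; every name in it is invented:

```python
# Repository sketch: callers depend on the interface, and each concrete
# implementation hides one backend's query idioms. Names are invented.
from __future__ import annotations

from typing import Protocol

class ExperimentRepo(Protocol):
    def get(self, experiment_id: str) -> dict | None: ...
    def save(self, experiment_id: str, data: dict) -> None: ...

class InMemoryExperimentRepo:
    """Stands in for a hypothetical PostgresExperimentRepo or MongoExperimentRepo."""
    def __init__(self):
        self._rows: dict[str, dict] = {}
    def get(self, experiment_id):
        return self._rows.get(experiment_id)
    def save(self, experiment_id, data):
        self._rows[experiment_id] = data

def assign_variant(repo: ExperimentRepo, experiment_id: str, user: str) -> str:
    # Business logic never touches a driver directly, only the interface.
    exp = repo.get(experiment_id) or {"assignments": {}}
    variant = exp["assignments"].setdefault(user, "control")
    repo.save(experiment_id, exp)
    return variant

repo = InMemoryExperimentRepo()
assert assign_variant(repo, "exp-1", "u1") == "control"
```

The trade-off: the abstraction keeps application code backend-agnostic, but it can also hide exactly the per-database performance characteristics engineers need to understand.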

The Migration Timeline Seems Optimistic

6 months for a migration sounds fast, honestly. Was this:

  • All hands on deck migration?
  • Background process while shipping features?
  • Freeze on new feature development?

I’m asking because we’re considering adding a time-series database for observability metrics, and I’m trying to gauge the real cost beyond just the infrastructure bill.

Next.js Ecosystem Considerations

One thing I’ve noticed: the modern Next.js/React ecosystem is increasingly Postgres-first:

  • Vercel Postgres
  • Supabase
  • PlanetScale (MySQL but similar patterns)
  • Prisma ORM defaults

MongoDB still works great, but the ecosystem momentum feels like it’s shifted. Tools like Drizzle ORM, server actions with type-safe queries - they all assume relational DBs.

Did you find the TypeScript/tooling ecosystem made the Postgres migration easier? Or was that not a factor for your Python-heavy stack?

Polyglot Persistence Complexity

You mentioned hiring a database reliability engineer specifically for this. That’s a real cost that’s easy to overlook. We’re a 12-person team - polyglot persistence might be too much operational overhead for us right now.

But I can see the appeal as we scale. Your decision framework is helpful - thinking about access patterns first, not database popularity.

Follow-up question: How do you handle transactions that span both databases? Or do you design around that constraint?

Rachel, thank you for the honest cost analysis. This mirrors our experience in financial services, but with even more constraints.

The Enterprise Reality: Postgres is Mandatory (for some things)

In fintech, PostgreSQL isn’t a choice for core financial data - it’s a regulatory requirement. Our auditors and compliance teams need:

  • ACID transactions for all financial records
  • SQL for audit queries
  • Point-in-time recovery
  • Mature backup/restore processes
  • Well-understood security models

MongoDB doesn’t meet these requirements in our regulatory environment, regardless of technical merits.

But here’s where it gets interesting: We still use MongoDB, just not for anything compliance-critical.

Our Polyglot Architecture

Similar to your setup:

PostgreSQL:

  • Customer accounts and KYC data
  • Transaction records
  • Account balances
  • Audit logs
  • Anything that touches money or regulations

MongoDB:

  • User session data
  • Application configuration
  • Feature flags and A/B tests
  • Analytics events (pre-aggregation)
  • Internal tools and admin dashboards

Redis:

  • Session cache
  • Rate limiting
  • Real-time data (sub-second TTL)
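As an aside for readers, the Redis rate-limiting case usually reduces to INCR plus EXPIRE per time window. The logic can be shown in plain Python with an injected clock (a fixed-window sketch; in production the counter lives in Redis so it is shared across app servers):

```python
# Fixed-window rate limiter sketch. In production, `counts` would be
# Redis keys bumped with INCR and expired with EXPIRE; here a dict and
# an explicit `now` argument keep the example self-contained.
def make_limiter(limit, window_seconds):
    counts = {}  # (key, window_index) -> hits

    def allow(key, now):
        window = int(now // window_seconds)
        bucket = (key, window)
        counts[bucket] = counts.get(bucket, 0) + 1
        return counts[bucket] <= limit

    return allow

allow = make_limiter(limit=2, window_seconds=60)
assert allow("user:1", now=0)       # 1st hit in window: allowed
assert allow("user:1", now=10)      # 2nd hit: allowed
assert not allow("user:1", now=20)  # 3rd hit: rejected
assert allow("user:1", now=61)      # new window: allowed again
```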

DynamoDB (AWS):

  • High-throughput application logs
  • Metrics and monitoring data

The Operational Nightmare You Mentioned

Your 30% infrastructure cost increase? We saw similar. But the hidden costs are worse:

On-Call Complexity: Engineers need to understand 4 different database systems to debug production issues. We had an incident where a MongoDB replica set failure cascaded into a Redis cache stampede that overwhelmed Postgres. Debugging required expertise across all three.

Disaster Recovery: Each database has different backup strategies, different restore procedures, different RTO/RPO guarantees. Our DR runbooks are 100+ pages because we have to document everything.

Cross-Database Transactions: You asked how to handle this. Short answer: We don’t. We design around the constraint:

  1. Eventual consistency where possible
  2. Saga pattern for distributed transactions
  3. Careful service boundaries to minimize cross-DB operations
  4. PostgreSQL as the ultimate source of truth

When we absolutely need atomicity across systems, we use PostgreSQL’s outbox pattern with event sourcing.
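The outbox idea in one picture: the state change and the event row commit in the same PostgreSQL transaction, and a separate relay ships events to the other system afterward. Below is a simplified in-memory sketch of that shape, with dicts and lists standing in for Postgres tables; the real thing uses an outbox table plus a poller or CDC:

```python
# Transactional-outbox sketch: the business write and the outbox event
# succeed or fail together, so downstream systems never see an event for
# a write that was rolled back. All names are illustrative.
class FakeTxn:
    """Stages writes and applies them atomically on commit."""
    def __init__(self, accounts, outbox):
        self.accounts, self.outbox = accounts, outbox
        self.staged_accounts, self.staged_events = {}, []

    def commit(self):
        self.accounts.update(self.staged_accounts)
        self.outbox.extend(self.staged_events)

def debit(txn, account, amount):
    # Stage the state change and its event in the same "transaction".
    txn.staged_accounts[account] = txn.accounts.get(account, 0) - amount
    txn.staged_events.append({"type": "debited", "account": account,
                              "amount": amount})

def relay(outbox, publish):
    while outbox:                      # poller: drain and publish events
        publish(outbox.pop(0))

accounts, outbox, published = {"a": 100}, [], []
txn = FakeTxn(accounts, outbox)
debit(txn, "a", 30)
txn.commit()                           # state + event committed together
relay(outbox, published.append)
assert accounts["a"] == 70 and published[0]["type"] == "debited"
```

Delivery to the downstream system is at-least-once, so consumers still have to be idempotent, but the event stream can never disagree with the PostgreSQL source of truth.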

Team Expertise Is the Bottleneck

You mentioned hiring a dedicated DRE. We did too, but it’s not enough at our scale (40+ engineers). We ended up creating “platform teams” owning each database:

  • Data Platform Team (3 engineers): PostgreSQL, schema migrations, performance
  • Infrastructure Team (5 engineers): MongoDB, Redis, DynamoDB operations
  • SRE Team (4 engineers): Monitoring, alerting, disaster recovery for all systems

That’s 12 engineers supporting database infrastructure for 40 application engineers. The ratio is real.

When Would I Recommend Against This?

Polyglot persistence makes sense at scale, but not for smaller teams. If you’re:

  • Fewer than 20 engineers
  • Early stage, pivoting frequently
  • Limited ops expertise
  • Not hitting clear database bottlenecks

Just use Postgres. Add complexity only when there’s clear ROI.

For larger, more mature organizations with specialized workloads? The complexity is justified.

The Question I’d Ask Rachel

You mentioned MongoDB Atlas Vector Search for ML embeddings. We’re evaluating vector databases (pgvector vs Pinecone vs MongoDB Atlas) for our fraud detection ML models.

How did you evaluate vector search solutions? What made MongoDB Atlas the right choice over alternatives?
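For anyone else comparing options, the pgvector side of that evaluation looks roughly like this (illustrative SQL only; the table name, dimension, and query vector are made up):

```sql
-- Illustrative pgvector usage; names and dimensions are hypothetical.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE fraud_embeddings (
    id        bigserial PRIMARY KEY,
    embedding vector(768)
);

-- Approximate-nearest-neighbor index; <-> is L2 distance
-- (pgvector also offers <=> for cosine distance).
CREATE INDEX ON fraud_embeddings
    USING hnsw (embedding vector_l2_ops);

SELECT id
FROM fraud_embeddings
ORDER BY embedding <-> '[0.12, 0.03, ...]'   -- query vector placeholder
LIMIT 5;
```

The appeal of pgvector in a regulated environment is that the embeddings live inside the same ACID, backup, and audit machinery as everything else; the question is whether its recall and throughput hold up at your scale versus a dedicated vector store.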

The regulatory implications of AI/ML in financial services are evolving, and I’m curious if you have thoughts on data lineage and explainability with vector embeddings in production.

This discussion is hitting all the pain points I’m experiencing right now as we scale our EdTech platform. Rachel, I especially appreciate the honest assessment of team expertise challenges.

The People Side of Polyglot Persistence

Technology decisions are ultimately people decisions. Here’s what I’m grappling with:

Hiring Challenges: Finding engineers with deep expertise in multiple database systems is hard. We can find:

  • Great Postgres DBAs
  • Solid MongoDB developers
  • Engineers who know both at surface level

But engineers who truly understand the performance characteristics, failure modes, and operational nuances of multiple systems? Rare and expensive.

Training and Onboarding: When a new engineer joins, they now need to learn:

  • Our application architecture (obviously)
  • PostgreSQL schema design and query optimization
  • MongoDB document modeling and aggregation pipelines
  • When to use which database (the tribal knowledge problem)

This has increased our ramp-up time from ~2 months to ~3-4 months. That’s a real cost.

On-Call Rotation Complexity: Luis touched on this, but it’s worth emphasizing. Our on-call engineers need to be able to debug issues across multiple database systems at 3 AM.

We had an outage where slow MongoDB queries caused connection pool exhaustion, which triggered a circuit breaker that failed open, overwhelming our PostgreSQL primary. The engineer on call needed to understand both systems to diagnose and fix it.

The EdTech Context: Student Data Patterns

Your access pattern framework resonates. For us:

PostgreSQL (Student Records System):

  • Student enrollment and demographic data
  • Course catalog and schedules
  • Grades and transcripts
  • Teacher/parent relationships
  • FERPA-protected information

This is all highly relational, requires ACID transactions, and has strict compliance requirements (similar to Luis’s fintech constraints).

Time-Series Database (InfluxDB for Learning Analytics):

  • Student learning interactions (clicks, video watch time, problem attempts)
  • Real-time engagement metrics
  • Feature usage analytics
  • Performance dashboards for teachers

We tried using PostgreSQL for this, but the write volume (millions of events/day) and time-based query patterns made a specialized time-series DB the right choice.

Redis (Session and Cache):

  • User sessions
  • Real-time notification queues
  • Course content cache
  • Student dashboard aggregations

The Framework That Helps Me

I share this decision framework with my team:

  1. Start with why we’re considering a new database: What problem isn’t solved by existing systems?
  2. Quantify the cost: Infrastructure + operational overhead + team cognitive load
  3. Identify who owns it: Which team is responsible for reliability, performance, schema evolution?
  4. Plan the learning path: How do we upskill the team? What documentation do we need?
  5. Define success metrics: How do we know this was the right decision in 6 months?

We almost added MongoDB last year for learning content (semi-structured, rapidly evolving). We decided against it because:

  • Our team had limited MongoDB expertise
  • JSONB in Postgres was “good enough”
  • The operational complexity wasn’t justified yet
  • We were under-staffed on database operations

Maybe in a year with a bigger team and clearer bottlenecks. But not yet.

Documentation and Knowledge Sharing

One thing I haven’t seen mentioned: How do you maintain institutional knowledge across multiple database systems?

We use:

  • Architecture Decision Records (ADRs) for database choices
  • Runbooks for each database system
  • Regular “database office hours” where platform team answers questions
  • Pair programming across teams to share expertise

Even with this, knowledge silos are forming. The PostgreSQL experts don’t fully understand MongoDB, and vice versa.

Questions for Rachel

  1. How do you handle schema evolution across both databases? Do you have automated migration tooling?

  2. Developer tooling: Do engineers need to context-switch between different query languages/tools? How do you make that ergonomic?

  3. Metrics and observability: How do you monitor performance and health across heterogeneous systems? Unified dashboard or separate tools?

This is such a relevant discussion for where we are as an organization. Thank you for sharing the real costs and trade-offs.

As the PM who doesn’t write database queries but definitely feels their impact, this thread is incredibly valuable.

The Product Impact of Database Decisions

Let me share the business side that engineering leaders might not always see:

Customer-Facing Performance: Last quarter, we had a major enterprise prospect ask detailed questions about our database architecture during the technical due diligence phase. They wanted to know:

  • How we ensure data consistency
  • What our RTO/RPO guarantees are
  • Whether we can provide real-time analytics
  • How we handle data residency requirements

Our database choices directly impacted the deal. PostgreSQL for transactional data gave them confidence. The polyglot persistence approach initially made them nervous (“more complexity = more risk”), but we walked through the architecture rationale.

Feature Velocity: Rachel, you mentioned ML feature development stayed fast with MongoDB flexibility. This is critical from a product perspective.

If database migrations slow down feature development by 6 months, that’s 6 months of lost competitive advantage, delayed customer feedback, and missed revenue. The trade-off has to be worth it.

Cost Structure and Unit Economics: The 30% infrastructure cost increase you mentioned - I need to understand that in context:

  • Cost per active user
  • Cost per transaction
  • Cost to serve vs customer LTV

If we’re a B2C product with tight margins, a 30% infrastructure increase might be unsustainable. If we’re B2B SaaS with strong unit economics, it might be fine.

The “Good Enough” Principle

Maya mentioned this in the tRPC thread, and it applies here too: Sometimes the “best” technical solution isn’t the best business solution.

PostgreSQL JSONB being “good enough” for semi-structured data means:

  • One database to operate instead of two
  • Simpler architecture for customers to understand
  • Faster hiring (Postgres experts more common)
  • Lower infrastructure costs

The 10% performance gain from using MongoDB might not be worth the operational complexity for many businesses.

Questions I’d Ask Engineering

When evaluating polyglot persistence:

  1. What customer problems does this solve? Not engineering problems - customer problems.

  2. What’s the opportunity cost? If we spend 6 months on database migration, what features don’t we build?

  3. How does this affect our SLAs and reliability? More moving parts = more failure modes?

  4. What’s the business justification? Can we quantify the ROI in terms of customer value or cost savings?

  5. What’s our rollback plan? If this doesn’t work out, how hard is it to reverse?

Appreciating the Transparency

Rachel, the fact that you kept both databases and are honest about the costs is refreshing. Too often I see engineering teams pursue technical elegance without considering business constraints.

The access pattern framework makes sense even to non-technical me:

  • Match the right tool to the specific problem
  • Don’t force everything into one solution
  • Be honest about trade-offs

One question: How do you involve product/business stakeholders in these technical architecture decisions? Or is it mostly an engineering-led decision?

I want to be a better partner to our engineering team on these discussions, and understanding how other orgs approach it would help.