PostgreSQL + DuckDB Is Replacing Entire Data Stacks — 1500x Analytical Speedup Without Leaving Postgres

I need to talk about something that’s been quietly transforming how my team thinks about data infrastructure. If you’re running a “modern data stack” with Postgres for OLTP, an ETL pipeline, a cloud data warehouse, dbt for transformations, and a BI tool on top — you might be dramatically overcomplicating things.

Enter pg_duckdb: Two Engines, One Database

pg_duckdb is a PostgreSQL extension that embeds DuckDB’s columnar analytical engine directly inside your PostgreSQL instance. Let that sink in for a moment. You get DuckDB’s blazing-fast analytical query execution alongside PostgreSQL’s rock-solid OLTP capabilities, running in the same database.

The result? Analytical queries that used to take minutes on standard PostgreSQL now complete in seconds, and we're seeing up to 1500x speedups on certain workloads, particularly wide table scans, aggregations, and joins across large datasets. For analytical queries, DuckDB's columnar execution engine, vectorized processing, and cost-based optimizer are simply in a different league compared with Postgres's row-oriented executor.

CMU’s 2025 database retrospective named DuckDB the “most impactful database technology of the year” and I think that understates it. DuckDB is redefining what’s possible with embedded analytical databases.

The “Modern Data Stack in a Box”

Here’s the architecture my team used to run:

PostgreSQL (OLTP) → Fivetran (ETL) → Snowflake (Warehouse) → dbt (Transform) → Metabase (Dashboard)

Five tools, five billing relationships, five sets of credentials, five things that can break at 3am. The data pipeline from a Postgres write to a dashboard update was 45 minutes on a good day.

Here’s what we run now:

PostgreSQL + pg_duckdb → Metabase (Dashboard)

That’s it. DuckDB reads the Postgres tables directly — no ETL, no data movement, no warehouse. Analytical queries execute through DuckDB’s engine while OLTP queries continue using Postgres’s standard executor. The same database serves both workloads.

Our pipeline from write to dashboard? Real-time. Because there is no pipeline. The data is already there.

The Migration That Saved Us $150K/Year

Let me share actual numbers from our migration. My team was paying $180K/year for Snowflake — and we’re not even a large data team. We had about 2TB of analytical data, ~50 active dashboards, and ~200 scheduled queries.

We migrated those analytical workloads to pg_duckdb running on the same PostgreSQL instance that already hosted our OLTP data. We did need to upgrade the instance — added more RAM and moved to NVMe storage — which increased our Postgres hosting cost by about $30K/year.

Net savings: $150K/year. And that doesn’t count the Fivetran costs we eliminated ($24K/year), the dbt Cloud costs ($12K/year), or the engineering time spent maintaining the ETL pipeline.

How the Technical Magic Works

The elegance of pg_duckdb is in how it bridges two execution models:

  1. OLTP queries (INSERT, UPDATE, DELETE, point lookups) → routed to PostgreSQL’s standard row-oriented executor. No change, no performance impact.
  2. Analytical queries (aggregations, window functions, large scans) → routed to DuckDB’s columnar execution engine. DuckDB reads Postgres tables, executes the query using vectorized processing, and returns results through the standard Postgres protocol.
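In practice, that routing can be steered per session. A minimal sketch, assuming pg_duckdb's documented `duckdb.force_execution` setting (the `orders` table and query are illustrative):

```sql
-- Enable the extension once per database (requires pg_duckdb installed on the server).
CREATE EXTENSION IF NOT EXISTS pg_duckdb;

-- OLTP statements keep using the regular Postgres executor.
INSERT INTO orders (customer_id, amount) VALUES (42, 99.50);

-- Route queries in this session through DuckDB's columnar engine.
SET duckdb.force_execution = true;

-- A wide aggregation now executes with vectorized processing.
SELECT customer_id,
       date_trunc('month', created_at) AS month,
       sum(amount) AS revenue
FROM orders
GROUP BY 1, 2
ORDER BY revenue DESC;
```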

From your application’s perspective, it’s just Postgres. Same connection string, same SQL dialect (with some DuckDB extensions available), same tooling. Your BI tool doesn’t know or care that DuckDB is executing the query under the hood.

DuckDB also supports reading Parquet and CSV files directly, so you can query data lakes from within Postgres. We use this to query historical data archived to S3 without loading it back into Postgres.
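A sketch of what that looks like, using pg_duckdb's `read_parquet` table function (the bucket path and columns are illustrative, and the `r['col']` projection syntax may vary by extension version; S3 credentials are configured separately):

```sql
-- Query archived Parquet files on S3 without loading them back into Postgres.
SELECT r['order_id'] AS order_id,
       r['amount']   AS amount
FROM read_parquet('s3://my-archive/orders/2023/*.parquet') r
LIMIT 10;
```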

What pg_duckdb Doesn’t Replace

I want to be clear about the boundaries:

  • Petabyte-scale warehousing: If you’re processing petabytes, you still need Snowflake, BigQuery, or Redshift. DuckDB is phenomenal for the “medium data” problem — gigabytes to low terabytes — which is what most companies actually have.
  • Real-time streaming analytics: If you need sub-second analytics on streaming data, you still want Kafka + Flink/ksqlDB. pg_duckdb is for batch/interactive analytical queries.
  • Multi-tenant data warehousing: If you’re a data platform serving hundreds of customers with isolation requirements, a proper cloud warehouse still makes sense.

The Postgres-Does-Everything Trend

This is part of a broader trend I find fascinating: the pendulum swinging from complex multi-tool architectures back to PostgreSQL as the universal database. Look at what Postgres can do today:

  • OLTP (native)
  • Analytics (pg_duckdb)
  • Full-text search (tsvector + GIN indexes)
  • Vector/embedding search (pgvector)
  • Time-series (TimescaleDB)
  • Graph queries (Apache AGE)
  • Message queuing (pgmq)
  • Cron scheduling (pg_cron)
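The full-text entry, for example, needs nothing beyond core Postgres. A minimal sketch (the `articles` table and search terms are illustrative):

```sql
-- Maintain a search vector automatically and index it with GIN.
ALTER TABLE articles
  ADD COLUMN search tsvector
  GENERATED ALWAYS AS (
    to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))
  ) STORED;

CREATE INDEX articles_search_idx ON articles USING GIN (search);

-- websearch_to_tsquery accepts Google-style query syntax.
SELECT id, title
FROM articles
WHERE search @@ websearch_to_tsquery('english', 'embedded analytics');
```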

At what point do we just accept that Postgres is the database equivalent of a Swiss Army knife?

Question for the community: Is anyone else consolidating their data stack around Postgres? What tools have you replaced, and where did you hit the limits of the “Postgres for everything” approach?

Rachel, this post resonates deeply with me from an infrastructure perspective. I’ve been on the other side of the “modern data stack” — the one who has to keep all those tools running, connected, and healthy at 3am.

The Operational Simplicity Argument

Let me paint the picture of what managing the “modern data stack” actually looks like from an ops perspective:

  • PostgreSQL: backups, replication, failover, monitoring, connection pooling, vacuum tuning
  • Kafka/Fivetran: connector management, schema registry, consumer lag monitoring, partition rebalancing
  • Snowflake: warehouse sizing, credit monitoring, query optimization, access control
  • dbt: model dependency graphs, incremental run failures, CI/CD for transformations
  • BI tool: cache invalidation, data freshness monitoring, user access management

That’s five separate monitoring stacks, five sets of alerts, five runbooks, five on-call rotations (or more realistically, one poor SRE team trying to cover all five). When the CEO’s dashboard shows stale data, the debugging process involves checking every link in this chain.

With Postgres + pg_duckdb, I manage… Postgres. I already know how to do that. My backup strategy covers everything — OLTP and analytics. My monitoring stack (pg_stat_statements, pgBouncer metrics, standard Postgres monitoring) covers everything. My access control model is unified. When something breaks, there’s exactly one place to look.
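That unified monitoring can start with a single query. A sketch against `pg_stat_statements` (column names shown are the Postgres 13+ ones):

```sql
-- Top statements by total execution time, across OLTP and analytical traffic.
SELECT left(query, 60) AS query,
       calls,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       round(total_exec_time::numeric, 2) AS total_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```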

The 80% Argument

Here’s my controversial take: 80% of companies don’t need a data warehouse at all. I’ve consulted with dozens of startups and mid-size companies on their data infrastructure, and the pattern is always the same:

  1. They have a Postgres database with 50-500GB of data
  2. Some analyst reads a blog post about the “modern data stack”
  3. They implement Postgres → Fivetran → Snowflake → dbt → Looker
  4. They now have a complex, expensive pipeline for a dataset that fits in RAM on a single server

The dirty secret of the data industry is that most companies have “medium data” — gigabytes to low terabytes — that a single well-configured Postgres instance can handle beautifully. pg_duckdb just eliminates the last excuse for adding a warehouse: analytical query performance.

What Changed My Mind

I used to be a firm believer in “right tool for the right job.” Postgres for OLTP, Snowflake for analytics, Elasticsearch for search, Redis for caching. But the operational cost of running and integrating all these tools is enormous and often underestimated.

Every integration point is a potential failure mode. Every data movement is a potential data quality issue. Every additional tool is another security surface, another set of credentials to rotate, another vendor relationship to manage.

pg_duckdb tipped the scales for me. The analytical performance gap was the main reason we needed a separate warehouse. With that gap closed for medium-scale data, the operational simplicity of a single Postgres instance outweighs the marginal performance benefits of a dedicated warehouse.

We migrated three of our portfolio companies off Snowflake onto pg_duckdb last quarter. Total combined savings: ~$400K/year. Zero regrets so far.

Great discussion, and I largely agree with the premise — pg_duckdb is a genuinely exciting development. But I want to play devil’s advocate here because I’ve seen the “Postgres for everything” trend go too far, and I think it’s worth discussing where the limits are.

The Risk of Postgres Maximalism

Over the past year, my team has been on a Postgres consolidation journey. We’ve adopted:

  • pg_duckdb for analytics (replacing Snowflake) — this has been great
  • pgvector for embedding search (replacing a managed vector DB) — working well
  • pgmq for message queuing (replacing RabbitMQ) — this is where things got interesting
  • pg_trgm + full-text search (replacing Elasticsearch) — mixed results

Each individual decision seemed reasonable. The operational simplification was real. The cost savings were real. But here’s what happened: we now have a single Postgres deployment that is simultaneously our OLTP database, analytical engine, vector store, message queue, and search engine.

The operational complexity didn’t disappear — it shifted. Instead of managing five specialized tools, we’re managing one very complex Postgres deployment with:

  • Extension conflicts and upgrade coordination
  • Resource contention between OLTP, analytical, and search workloads
  • Backup sizes that tripled (all the analytical and vector data is now in Postgres)
  • A single point of failure that, if it goes down, takes out EVERYTHING

Where I Draw the Line

After this experience, I’ve developed a personal framework for what should and shouldn’t live in Postgres:

Good candidates for Postgres consolidation:

  • Analytics on OLTP data (pg_duckdb) — the data is already there, no movement needed
  • Simple search (pg_trgm, full-text) — for internal tools and moderate-scale search
  • Vector search at moderate scale (pgvector) — works great up to ~10M vectors
  • Scheduling (pg_cron) — trivial use cases
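The pgvector case in particular is only a handful of DDL statements. A sketch (the 768 dimensions and HNSW-with-cosine index are illustrative choices):

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE items (
  id        bigserial PRIMARY KEY,
  embedding vector(768)
);

-- HNSW index for approximate nearest-neighbor search.
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);

-- Ten nearest neighbors by cosine distance; $1 is the query embedding
-- passed as a 768-dimensional vector literal.
SELECT id
FROM items
ORDER BY embedding <=> $1::vector
LIMIT 10;
```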

Still better as dedicated services:

  • Message queuing / streaming: Postgres is not designed for high-throughput pub/sub. pgmq works for simple job queues, but if you need Kafka-level throughput or complex routing, use a dedicated message broker. We learned this the hard way when our pgmq-based event system couldn’t keep up with Black Friday traffic.
  • Full-text search at scale: Postgres full-text search is surprisingly capable, but Elasticsearch/Typesense are still significantly faster for typeahead, faceted search, and fuzzy matching across millions of documents. We moved our customer-facing search back to Elasticsearch after user complaints about search latency.
  • Real-time streaming analytics: If you need continuous aggregations on streaming data, Flink/ksqlDB are purpose-built for this. Postgres (even with pg_duckdb) is batch-oriented.
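For the simple job-queue cases where pgmq does fit, the API is a few function calls. A sketch using pgmq's documented functions (queue name and payload are illustrative):

```sql
SELECT pgmq.create('email_jobs');

-- Enqueue a JSON message.
SELECT pgmq.send('email_jobs',
                 '{"to": "user@example.com", "template": "welcome"}');

-- Read one message with a 30-second visibility timeout.
SELECT msg_id, message FROM pgmq.read('email_jobs', 30, 1);

-- Delete it once processed (msg_id comes from the read above).
SELECT pgmq.delete('email_jobs', 1);
```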

My Rule of Thumb

Use Postgres when the data is already in Postgres and the workload is moderate. The “data is already there” part is key — that’s what makes pg_duckdb so compelling. You’re not moving data; you’re just querying it differently.

Use a dedicated tool when the workload has fundamentally different performance characteristics that Postgres wasn’t designed for: high-throughput pub/sub, sub-millisecond search, real-time stream processing.

The sweet spot is “Postgres as a strong foundation with 2-3 specialized services for genuinely different workloads” rather than either extreme of “one tool for everything” or “best-of-breed for everything.”

Rachel, your cost savings are real and your migration makes total sense for the use case you described. I just want to make sure the community doesn’t take the message as “always use Postgres for everything” — it’s more nuanced than that.

Rachel, I want to talk about this from the finance side because the cost story is what ultimately got executive buy-in for our migration.

The Snowflake Bill That Started Everything

I’m the finance lead who has to approve our data infrastructure spend. Last year, I got a Snowflake renewal proposal for $180K/year — a 20% increase from the previous year. When I asked our data team what we were getting for that money, the answer was essentially: “We run dashboards and some ad-hoc queries.”

I dug deeper. Our Snowflake instance was processing about 2TB of data. Our heaviest query — a monthly revenue reconciliation report — took 45 seconds on Snowflake. We had about 50 dashboards, most of which were viewed by fewer than 5 people. We were paying $180K/year for what amounted to a very expensive reporting layer.

The pg_duckdb Migration Math

When our data engineer proposed pg_duckdb, I ran the numbers myself:

| Item | Before (Annual) | After (Annual) |
|---|---:|---:|
| Snowflake | $180,000 | $0 |
| Fivetran | $24,000 | $0 |
| dbt Cloud | $12,000 | $0 |
| Postgres upgrade (larger instance) | $0 | $30,000 |
| Total | $216,000 | $30,000 |

That’s a $186K annual savings. The migration took our data engineer about 6 weeks. The ROI payback period was roughly 10 days.

That monthly revenue reconciliation query? It now runs in 3 seconds on pg_duckdb instead of 45 seconds on Snowflake. Faster AND cheaper. I don’t see that combination very often.

“Can Postgres Do This?” Is My New Default Question

This experience fundamentally changed how I evaluate SaaS data tools. Every time a vendor pitches me on a new analytics platform, data integration tool, or business intelligence solution, my first question is now: “Can we do this with Postgres + an extension?”

The answer is surprisingly often “yes”:

  • Vector search for our product recommendation engine? pgvector. Saved us ~$36K/year vs Pinecone.
  • Full-text search for our help center? PostgreSQL’s built-in tsvector + GIN indexes. Saved us ~$18K/year vs Algolia (our scale didn’t need Algolia’s speed).
  • Job scheduling for our data pipelines? pg_cron. Replaced a $6K/year scheduling SaaS.
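That last replacement is one function call per job. A sketch with pg_cron's documented `cron.schedule` (job name and command are illustrative):

```sql
-- Refresh a reporting rollup every night at 03:00.
SELECT cron.schedule(
  'nightly-rollup',
  '0 3 * * *',
  $$REFRESH MATERIALIZED VIEW CONCURRENTLY daily_revenue$$
);

-- Inspect and remove scheduled jobs.
SELECT jobid, jobname, schedule FROM cron.job;
SELECT cron.unschedule('nightly-rollup');
```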

I’m not suggesting Postgres replaces everything — Algolia is genuinely better for large-scale consumer search, and Pinecone makes sense at certain vector scales. But for a company our size (200 employees, ~2TB of data), Postgres handles an astonishing range of use cases.

A Finance Person’s Perspective on Technical Debt

Here’s something I think technologists underappreciate: every SaaS tool is financial debt, not just technical debt. Each tool comes with:

  • Annual renewal negotiations (my time has value)
  • Price increases averaging 15-20% per year in the data tooling space
  • Vendor lock-in that makes switching costs higher over time
  • Compliance and security review overhead for each vendor

When I consolidate tools into Postgres, I’m not just reducing technical complexity — I’m reducing financial complexity. One vendor (our cloud provider), one contract, one security review. My CFO loves it.

The data infrastructure space has been overbuilding for years, selling enterprise-scale solutions to companies with startup-scale data. pg_duckdb is the correction the market needed.