I need to talk about something that’s been quietly transforming how my team thinks about data infrastructure. If you’re running a “modern data stack” with Postgres for OLTP, an ETL pipeline, a cloud data warehouse, dbt for transformations, and a BI tool on top — you might be dramatically overcomplicating things.
Enter pg_duckdb: Two Engines, One Database
pg_duckdb is a PostgreSQL extension that embeds DuckDB’s columnar analytical engine directly inside your PostgreSQL instance. Let that sink in for a moment. You get DuckDB’s blazing-fast analytical query execution alongside PostgreSQL’s rock-solid OLTP capabilities, running in the same database.
The result? Analytical queries that used to take minutes on standard PostgreSQL now complete in seconds. We’re seeing up to 1500x speedups on certain workloads — particularly wide table scans, aggregations, and joins across large datasets. DuckDB’s columnar execution engine, vectorized processing, and adaptive query optimizer are simply in a different league for analytical queries compared to Postgres’s row-oriented executor.
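To make that concrete, here's roughly what trying it looks like. This is a sketch, not our production setup: the `orders` table and its columns are invented for illustration, and `duckdb.force_execution` is the session setting pg_duckdb provides to push queries through DuckDB rather than letting the planner decide (check your version's docs for the exact GUC behavior).

```sql
-- One-time setup (requires superuser).
CREATE EXTENSION pg_duckdb;

-- Route this session's queries through DuckDB's engine.
SET duckdb.force_execution = true;

-- A typical analytical query: wide scan plus aggregation.
-- Table and column names here are made up for the example.
SELECT customer_region,
       date_trunc('month', ordered_at) AS month,
       count(*)         AS orders,
       sum(order_total) AS revenue
FROM   orders
GROUP  BY 1, 2
ORDER  BY 2, 1;
```

The query text is unchanged from what you'd run on stock Postgres; only the executor underneath differs.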
CMU’s 2025 database retrospective named DuckDB the “most impactful database technology of the year” and I think that understates it. DuckDB is redefining what’s possible with embedded analytical databases.
The “Modern Data Stack in a Box”
Here’s the architecture my team used to run:
PostgreSQL (OLTP) → Fivetran (ETL) → Snowflake (Warehouse) → dbt (Transform) → Metabase (Dashboard)
Five tools, five billing relationships, five sets of credentials, five things that can break at 3am. The data pipeline from a Postgres write to a dashboard update was 45 minutes on a good day.
Here’s what we run now:
PostgreSQL + pg_duckdb → Metabase (Dashboard)
That’s it. DuckDB reads the Postgres tables directly — no ETL, no data movement, no warehouse. Analytical queries execute through DuckDB’s engine while OLTP queries continue using Postgres’s standard executor. The same database serves both workloads.
Our pipeline latency from write to dashboard? Effectively zero, because there is no pipeline. Dashboards query the live tables directly; the data is already there.
The Migration That Saved Us $150K/Year
Let me share actual numbers from our migration. My team was paying $180K/year for Snowflake — and we’re not even a large data team. We had about 2TB of analytical data, ~50 active dashboards, and ~200 scheduled queries.
We migrated those analytical workloads to pg_duckdb running on the same PostgreSQL instance that already hosted our OLTP data. We did need to upgrade the instance — added more RAM and moved to NVMe storage — which increased our Postgres hosting cost by about $30K/year.
Net savings: $150K/year. And that doesn’t count the Fivetran costs we eliminated ($24K/year), the dbt Cloud costs ($12K/year), or the engineering time spent maintaining the ETL pipeline.
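The arithmetic is simple enough to sanity-check. A quick back-of-the-envelope run of the numbers quoted above:

```python
# Back-of-the-envelope check of the savings figures in the text.
snowflake = 180_000      # annual Snowflake spend, eliminated
hosting_delta = 30_000   # added Postgres hosting cost (RAM + NVMe)

net_core = snowflake - hosting_delta
print(net_core)          # 150000 -- the headline net savings

# Extras the headline number doesn't count:
fivetran = 24_000        # ETL tool, eliminated
dbt_cloud = 12_000       # transform tool, eliminated
print(net_core + fivetran + dbt_cloud)  # 186000 all-in annual savings
```

So the all-in figure, before counting engineering time, is closer to $186K/year.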
How the Technical Magic Works
The elegance of pg_duckdb is in how it bridges two execution models:
- OLTP queries (INSERT, UPDATE, DELETE, point lookups) → routed to PostgreSQL’s standard row-oriented executor. No change, no performance impact.
- Analytical queries (aggregations, window functions, large scans) → routed to DuckDB’s columnar execution engine. DuckDB reads Postgres tables, executes the query using vectorized processing, and returns results through the standard Postgres protocol.
From your application’s perspective, it’s just Postgres. Same connection string, same SQL dialect (with some DuckDB extensions available), same tooling. Your BI tool doesn’t know or care that DuckDB is executing the query under the hood.
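You can see the two paths on a single connection. The table here is again illustrative, and the plan-node label is what our build prints; the exact wording may differ across pg_duckdb versions.

```sql
-- Same connection, two execution paths.

-- OLTP: handled by Postgres's row-oriented executor as usual.
INSERT INTO orders (customer_id, order_total) VALUES (42, 99.50);

-- Analytics: with pg_duckdb active, EXPLAIN shows the plan being
-- handed off to DuckDB (a Custom Scan node in our build).
EXPLAIN
SELECT customer_region, sum(order_total)
FROM   orders
GROUP  BY 1;
```

Running EXPLAIN like this is the easiest way to confirm which engine actually took a given query.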
DuckDB also supports reading Parquet and CSV files directly, so you can query data lakes from within Postgres. We use this to query historical data archived to S3 without loading it back into Postgres.
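Here is the shape of that pattern. Everything below is illustrative: the bucket path and columns are made up, S3 credentials have to be configured for DuckDB separately, and on the build we use `read_parquet` returns a generic record, so it needs an explicit column list (newer versions may differ).

```sql
-- Query archived data in S3 without loading it back into Postgres.
-- Bucket, path, and columns are hypothetical examples.
SELECT month,
       sum(revenue) AS revenue
FROM   read_parquet('s3://my-archive/orders/2023/*.parquet')
       AS (month date, revenue numeric)
GROUP  BY month
ORDER  BY month;
```

The result joins cleanly against live Postgres tables in the same query, which is what makes the archive-to-S3 workflow practical.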
What pg_duckdb Doesn’t Replace
I want to be clear about the boundaries:
- Petabyte-scale warehousing: If you’re processing petabytes, you still need Snowflake, BigQuery, or Redshift. DuckDB is phenomenal for the “medium data” problem — gigabytes to low terabytes — which is what most companies actually have.
- Real-time streaming analytics: If you need sub-second analytics on streaming data, you still want Kafka + Flink/ksqlDB. pg_duckdb is for batch/interactive analytical queries.
- Multi-tenant data warehousing: If you’re a data platform serving hundreds of customers with isolation requirements, a proper cloud warehouse still makes sense.
The Postgres-Does-Everything Trend
This is part of a broader trend I find fascinating: the pendulum swinging from complex multi-tool architectures back to PostgreSQL as the universal database. Look at what Postgres can do today:
- OLTP (native)
- Analytics (pg_duckdb)
- Full-text search (tsvector + GIN indexes)
- Vector/embedding search (pgvector)
- Time-series (TimescaleDB)
- Graph queries (Apache AGE)
- Message queuing (pgmq)
- Cron scheduling (pg_cron)
At what point do we just accept that Postgres is the database equivalent of a Swiss Army knife?
Question for the community: Is anyone else consolidating their data stack around Postgres? What tools have you replaced, and where did you hit the limits of the “Postgres for everything” approach?