Can DuckDB replace Postgres for transactional workloads?

No. DuckDB is an OLAP database optimised for analytical queries: aggregations, scans, joins across wide tables. Postgres is an OLTP database optimised for point lookups, concurrent writes, and transactional guarantees. The typical SaaS setup runs both: Postgres for the application database, DuckDB for the analytics tier.

What happens to the single-writer constraint in the per-customer Parquet pattern?

It becomes irrelevant. Each query creates a fresh in-memory DuckDB instance that reads from S3 Parquet files, returns results, and is discarded. There is no shared .duckdb file being written to. The single-writer constraint only applies when multiple processes share a persistent .duckdb file, which the Parquet pattern avoids entirely.

Does MotherDuck solve the multi-tenant problem?

MotherDuck is a managed serverless service built on DuckDB that handles infrastructure and concurrent access. It simplifies operations significantly. However, it introduces a shared-service model where your customer data is stored in MotherDuck's infrastructure, which has implications for data residency and customer contract compliance. The per-customer Parquet pattern on your own S3 buckets gives you direct control over where each customer's data lives, which matters for regulated industries and strict data-processing agreements.

How do you handle schema changes across thousands of customer Parquet files?

Write a migration script that reads each customer's current Parquet files, applies the schema transformation, and writes new files. Track completion in a metadata table with one row per customer per migration version. Run migrations in parallel across customers but serially within a single customer's files. For additive changes such as a new nullable column, DuckDB can often read old Parquet files without rewriting them — verify compatibility before running migrations you may not need.

EngineeringJun 1, 20267 min readReviewed Jun 1, 2026

DuckDB in your SaaS stack: what it replaces, what it doesn't, and the multi-tenant pattern that holds up

Why per-customer Parquet on S3 beats shared DuckDB instances for tenant isolation — and what breaks when you scale it

By FlowVerify Editorial Team

Your cloud data warehouse line item grows every quarter, even when your query volume holds steady. For most B2B SaaS companies, the growth is predictable: more customers, more data, more storage. But the larger cost driver is how cloud warehouses price compute, and how much of it you pay for without using. DuckDB for SaaS analytics changes this calculation. Not by being faster at everything, and not in the way most tutorials describe.

This piece covers what DuckDB actually is, where the cost advantage is real, and the one deployment pattern that holds up when you have thousands of customers whose data cannot touch each other.

What DuckDB actually is (and the single constraint that shapes everything)

DuckDB is an in-process analytical database. It runs inside your application process, not as a separate server. There is no socket to connect to, no authentication layer to configure, no connection pool to manage. You open a file or an in-memory instance, run SQL, read results, close it. The latency profile is fundamentally different from a network-connected database, even one running on localhost.

It uses columnar storage and vectorised query execution. These two properties make it fast for analytical workloads: aggregations over millions of rows, joins across wide tables, groupings on high-cardinality columns. They also make it slower than Postgres for point lookups and transactional writes. This is not a replacement for your OLTP database. It is a replacement for what you would otherwise send to BigQuery or Snowflake.

The constraint that shapes every deployment decision: DuckDB has a single-writer model. Two processes cannot open the same .duckdb file in read-write mode simultaneously. The second gets a lock error. Understanding this upfront prevents the most common architecture mistake, which is treating DuckDB like Postgres and discovering the constraint under production load.

Where the cost advantage actually comes from

Cloud data warehouses charge for compute and storage separately. BigQuery on-demand pricing is $6.25 per terabyte scanned. A query that touches ten 1-GB tables scans 10 GB regardless of which rows it needs. Snowflake bills per-second for virtual warehouse uptime, so a warehouse that processes one query per hour is still billing between queries unless you configure aggressive auto-suspend.

DuckDB queries Parquet files directly. Storage costs are S3 rates, roughly $0.023 per GB per month. Compute is whatever is already running your query handler: a Lambda function, an EC2 instance, a container handling other requests. There is no separate warehouse sitting idle.

At 20 GB of Parquet data per customer, S3 storage runs approximately $0.46 per customer per month. A hundred-customer SaaS at that data size spends under $50 a month on analytics storage. The same dataset in a Snowflake XS warehouse, refreshed nightly, costs substantially more once you account for warehouse compute time and storage overhead.

The crossover point is not fixed. It depends on query complexity, refresh frequency, and what compute you already provision. Below roughly 100 GB per customer, with batch or near-real-time refresh measured in minutes, the per-customer Parquet pattern reliably undercuts managed warehouse pricing. Above 500 GB per customer, or where cross-customer aggregate queries are primary, cloud warehouses earn back the advantage through optimised storage formats and query planning at scale.

Three deployment patterns for SaaS analytics

Teams reach for three architectures when adding DuckDB to a SaaS product. They differ in how customer data is stored, where the DuckDB process runs, and how tenant isolation is enforced.

Pattern	Storage	Isolation strength	Single-writer impact	Practical ceiling
A — per-customer .duckdb file on disk	Local disk	File-level	Per-file; manageable	~500 active customers before file-handle pressure
B — shared instance, schema per tenant	Single .duckdb file	Schema-level (soft)	Global; one tenant blocks all	Prototyping only
C — per-customer Parquet on S3	S3 prefixes	Prefix-level (hard)	Not applicable	Tens of thousands of customers

DuckDB deployment patterns compared

Pattern A uses one .duckdb file per customer on local disk. Isolation is solid: each customer's database is a separate file, and backup is a file copy. The problems emerge at scale. You are pinning each customer to a specific machine, file-handle counts become a concern at several thousand active customers, and there is no natural path to horizontal scaling.

Pattern B puts all customers in one shared DuckDB instance, separated by schema. This looks familiar from Postgres but behaves differently. DuckDB has no row-level security. A query missing a tenant filter reads across all schemas. The single-writer limitation applies globally: one customer's ETL job can block another customer's writes. This pattern is useful for local development and almost never appropriate for production.

Pattern C uses per-customer Parquet files on S3, with an ephemeral in-process DuckDB instance created and destroyed per query. Customer data never coexists in the same DuckDB instance. The single-writer constraint becomes irrelevant because each query creates its own instance and reads read-only from S3. This is the pattern that holds up.

The multi-tenant pattern in practice

In the per-customer Parquet model, each customer's data lives as Parquet files under an isolated S3 prefix: s3://your-bucket/tenant-{id}/. When a customer runs a report, you instantiate a DuckDB connection in memory, point it at that customer's prefix, run the query, return the result, and discard the connection. No state persists between requests.

Tenant isolation is enforced at the file-system level. A DuckDB instance pointed at one customer's S3 prefix cannot read another's prefix unless your code explicitly attaches it. There is no access control policy to misconfigure and no security rule to forget. The isolation is structural.

Data residency follows naturally. EU customers' Parquet files go in an eu-west-1 bucket. Indian customers' files go in ap-south-1. The DuckDB instance running the query runs in the same region. Data does not cross geographical boundaries to answer a query.

customer_query.py

import duckdb

def run_customer_query(tenant_id: str, query: str) -> list[dict]:
    """
    Each call creates and destroys an in-memory DuckDB instance.
    No state persists between requests; no tenant data can bleed across.
    """
    s3_prefix = f"s3://analytics-data/{tenant_id}/"

    con = duckdb.connect(database=":memory:")
    con.execute("INSTALL httpfs; LOAD httpfs;")
    con.execute("SET s3_region = 'ap-south-1';")
    # Credentials via environment or IAM role — not hardcoded.

    # This instance only sees this customer's files.
    con.execute(
        f"CREATE VIEW events AS "
        f"SELECT * FROM read_parquet('{s3_prefix}events/*.parquet');"
    )
    con.execute(
        f"CREATE VIEW orders AS "
        f"SELECT * FROM read_parquet('{s3_prefix}orders/*.parquet');"
    )

    result = con.execute(query).fetchdf()
    con.close()
    return result.to_dict(orient="records")

The write side requires serialisation. Because you write to Parquet files rather than to a live database, the standard approach is a scheduled ETL step: pull new records from your OLTP database, write or append to Parquet, make the updated files available for query. If multiple services produce data for the same customer simultaneously, route writes through a queue to serialise them at the Parquet layer. Concurrent Parquet writes from two processes corrupt the file, which is a Parquet constraint rather than a DuckDB one.

What breaks

Real-time freshness. DuckDB is batch-oriented. Data is available for query once written to Parquet, typically after a refresh interval of minutes. If your product promises sub-second analytics freshness, the toolchain is different: a streaming database such as Materialize or RisingWave, or a real-time OLAP system such as ClickHouse or Apache Druid. DuckDB is not competing with those systems.

Cross-tenant aggregates. The per-customer Parquet isolation model is an asset for data residency and an obstacle if your product requires aggregate views across all customers. Computing these requires querying every customer's prefix in sequence or in parallel. This is feasible at small customer counts and a scheduling challenge at tens of thousands.

Schema evolution at scale. Parquet encodes schema in the file. Changing a column type or adding a non-nullable column requires a migration that reads old files and writes new ones. At 5 GB per customer this is fast. At 50 GB per customer with 5,000 customers it is a multi-hour background job. Plan schema changes before your dataset grows large.

Very large per-customer datasets. The cost advantage depends on DuckDB scanning files from S3 efficiently. Above roughly 500 GB per customer, network transfer costs and scan time over remote storage start to erode the advantage. Cloud warehouses with co-located compute earn back their premium at that scale.

The operational surface you take on

Cloud data warehouses include backup, schema versioning, query history, and audit logging. DuckDB on S3 does not. What you take on:

Backup. With the per-customer Parquet pattern, enable S3 versioning on the bucket. It handles point-in-time recovery automatically at a fraction of additional storage cost.
Schema migrations. There is no built-in migration system. Write timestamped migration scripts that read a customer's current Parquet, apply the transformation, write the new version, and record completion in a metadata table. Run across customers in parallel; track failures per customer.
Version upgrades. DuckDB minor versions are generally backward-compatible; major versions have occasionally changed the on-disk format. Pin the version in your package file, test upgrades on a copy of production data before deploying, and treat version migrations with the same care as database schema migrations.
Monitoring. DuckDB has no built-in metrics endpoint. Wrap queries with wall-clock timing and log slow queries. Track p50 and p99 query latency per query type per customer, and alert when a customer's queries slow down persistently: it usually signals Parquet files that need recompaction.

DuckDB for SaaS analytics: when it fits, and when to wait

DuckDB with per-customer Parquet is worth reaching for when: per-customer data is under 100 GB; analytics freshness can be measured in minutes rather than milliseconds; multi-tenant data isolation is a compliance or contractual requirement; and your current cloud warehouse spend is driven primarily by headroom rather than compute actually used.

It is less suited when: your product requires sub-second analytics freshness; cross-tenant aggregate queries are a primary use case; per-customer datasets are large enough that S3 scan costs become material; or your team is not prepared to own backup, schema migration, and monitoring for an additional data layer. Taking on less operational complexity in exchange for higher managed-service costs is a reasonable trade early on.

Where DuckDB earns its place is the analytics tier that most B2B SaaS products end up building: per-customer dashboards, export features, scheduled reports, embedded analytics on customer-facing pages. At data sizes where cloud warehouse pricing is mostly headroom, the per-customer Parquet pattern cuts the bill significantly and gives stronger tenant isolation than schema-level separation provides.

Frequently asked questions

Reddit's zero-downtime migration of 500 Kafka brokers wasn't about Kafka. It was three reusable techniques.

Reddit moved 500+ Kafka brokers and a petabyte of live data from EC2 to Kubernetes with zero downtime. The three techniques behind it aren't specific to Kafka.

Jul 8, 2026Read full article →

EngineeringJun 1, 20267 min readReviewed Jun 1, 2026

DuckDB in your SaaS stack: what it replaces, what it doesn't, and the multi-tenant pattern that holds up

Why per-customer Parquet on S3 beats shared DuckDB instances for tenant isolation — and what breaks when you scale it

By FlowVerify Editorial Team

This piece covers what DuckDB actually is, where the cost advantage is real, and the one deployment pattern that holds up when you have thousands of customers whose data cannot touch each other.

What DuckDB actually is (and the single constraint that shapes everything)

Where the cost advantage actually comes from

Three deployment patterns for SaaS analytics

Teams reach for three architectures when adding DuckDB to a SaaS product. They differ in how customer data is stored, where the DuckDB process runs, and how tenant isolation is enforced.

Pattern	Storage	Isolation strength	Single-writer impact	Practical ceiling
A — per-customer .duckdb file on disk	Local disk	File-level	Per-file; manageable	~500 active customers before file-handle pressure
B — shared instance, schema per tenant	Single .duckdb file	Schema-level (soft)	Global; one tenant blocks all	Prototyping only
C — per-customer Parquet on S3	S3 prefixes	Prefix-level (hard)	Not applicable	Tens of thousands of customers

DuckDB deployment patterns compared

The multi-tenant pattern in practice

customer_query.py

import duckdb

def run_customer_query(tenant_id: str, query: str) -> list[dict]:
    """
    Each call creates and destroys an in-memory DuckDB instance.
    No state persists between requests; no tenant data can bleed across.
    """
    s3_prefix = f"s3://analytics-data/{tenant_id}/"

    con = duckdb.connect(database=":memory:")
    con.execute("INSTALL httpfs; LOAD httpfs;")
    con.execute("SET s3_region = 'ap-south-1';")
    # Credentials via environment or IAM role — not hardcoded.

    # This instance only sees this customer's files.
    con.execute(
        f"CREATE VIEW events AS "
        f"SELECT * FROM read_parquet('{s3_prefix}events/*.parquet');"
    )
    con.execute(
        f"CREATE VIEW orders AS "
        f"SELECT * FROM read_parquet('{s3_prefix}orders/*.parquet');"
    )

    result = con.execute(query).fetchdf()
    con.close()
    return result.to_dict(orient="records")

What breaks

The operational surface you take on

Cloud data warehouses include backup, schema versioning, query history, and audit logging. DuckDB on S3 does not. What you take on:

Backup. With the per-customer Parquet pattern, enable S3 versioning on the bucket. It handles point-in-time recovery automatically at a fraction of additional storage cost.
Schema migrations. There is no built-in migration system. Write timestamped migration scripts that read a customer's current Parquet, apply the transformation, write the new version, and record completion in a metadata table. Run across customers in parallel; track failures per customer.
Version upgrades. DuckDB minor versions are generally backward-compatible; major versions have occasionally changed the on-disk format. Pin the version in your package file, test upgrades on a copy of production data before deploying, and treat version migrations with the same care as database schema migrations.
Monitoring. DuckDB has no built-in metrics endpoint. Wrap queries with wall-clock timing and log slow queries. Track p50 and p99 query latency per query type per customer, and alert when a customer's queries slow down persistently: it usually signals Parquet files that need recompaction.

DuckDB in your SaaS stack: what it replaces, what it doesn't, and the multi-tenant pattern that holds up

What DuckDB actually is (and the single constraint that shapes everything)

Where the cost advantage actually comes from

Three deployment patterns for SaaS analytics

The multi-tenant pattern in practice

What breaks

The operational surface you take on

DuckDB for SaaS analytics: when it fits, and when to wait

Frequently asked questions

Related reading

Reddit's zero-downtime migration of 500 Kafka brokers wasn't about Kafka. It was three reusable techniques.

Railway disconnected a carrier to contain an outage. It cut its last route instead.

pgvector's HNSW index has a memory cliff, and the Postgres defaults walk right into it

Stay ahead on eSignatures, compliance, and document workflows

Reddit's zero-downtime migration of 500 Kafka brokers wasn't about Kafka. It was three reusable techniques.

DuckDB in your SaaS stack: what it replaces, what it doesn't, and the multi-tenant pattern that holds up

What DuckDB actually is (and the single constraint that shapes everything)

Where the cost advantage actually comes from

Three deployment patterns for SaaS analytics

The multi-tenant pattern in practice

What breaks

The operational surface you take on

DuckDB for SaaS analytics: when it fits, and when to wait

Frequently asked questions

Related reading

Reddit's zero-downtime migration of 500 Kafka brokers wasn't about Kafka. It was three reusable techniques.

Railway disconnected a carrier to contain an outage. It cut its last route instead.

pgvector's HNSW index has a memory cliff, and the Postgres defaults walk right into it

Stay ahead on eSignatures, compliance, and document workflows

Reddit's zero-downtime migration of 500 Kafka brokers wasn't about Kafka. It was three reusable techniques.