Stop treating your data warehouse like a database

Data Engineering Jul 2024

A data warehouse is not a database. It looks like one. You query it with SQL. But the assumptions are completely different, and teams that treat it like a database will fight it constantly.

Transactional databases are optimized for many small reads and writes: row-level operations, high concurrency, ACID guarantees. Data warehouses are optimized for analytical workloads: a few massive reads, column-level operations, low concurrency, and freshness requirements loose enough that eventual consistency is usually fine.
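To make the row-versus-column distinction concrete, here is a toy sketch in plain Python. The `rows` data and the helper names (`to_columns`, `get_order`, `total_amount`) are made up for illustration, and real engines add compression, partitioning, and much more. It only shows why the two access patterns favor different layouts.

```python
# Hypothetical orders data, used only for illustration.
rows = [
    {"order_id": 1, "customer": "a", "amount": 20.0},
    {"order_id": 2, "customer": "b", "amount": 35.5},
    {"order_id": 3, "customer": "a", "amount": 12.5},
]


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def get_order(rows, order_id):
    """Transactional-style access: fetch one whole row by its key."""
    return next(r for r in rows if r["order_id"] == order_id)


def total_amount(columns):
    """Analytical-style access: scan one column, ignore all the others."""
    return sum(columns["amount"])


columns = to_columns(rows)
print(get_order(rows, 2))
print(total_amount(columns))
```

The row layout keeps everything about one order together, which suits lookups and updates. The column layout keeps every `amount` together, so an aggregate never has to read `customer` or `order_id` at all.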

Ignoring this has practical consequences. You normalize your warehouse schema because that’s what you’d do in Postgres, then spend months fighting join performance. You update rows in place because that’s how you’d handle a correction, then wonder why your CDC pipeline is struggling. You design for write throughput, then discover your warehouse charges for compute on every query.

Design for how data flows in and how it gets read out. Everything else follows from that.

The model that clicks for me: a warehouse is a time machine for your business data. Design it like you’re building an archive, not a ledger.
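One way to read the time machine idea, sketched in plain Python: corrections are appended as new versions instead of overwriting old ones, and readers ask for the state as of a point in time. The `PriceVersion` record, the `loaded_at` field, and the `record` and `price_as_of` helpers are all hypothetical names chosen for this example, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class PriceVersion:
    """One immutable version of a product's price (hypothetical schema)."""
    product_id: str
    price: float
    loaded_at: datetime


# The "table": versions are only ever appended, never updated in place.
history: List[PriceVersion] = []


def record(product_id: str, price: float, loaded_at: datetime) -> None:
    """Append a new version; earlier versions stay untouched."""
    history.append(PriceVersion(product_id, price, loaded_at))


def price_as_of(product_id: str, as_of: datetime) -> Optional[float]:
    """Return the latest price loaded at or before `as_of`, or None."""
    candidates = [
        v for v in history
        if v.product_id == product_id and v.loaded_at <= as_of
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda v: v.loaded_at).price


record("sku-1", 10.0, datetime(2024, 1, 1))
record("sku-1", 12.0, datetime(2024, 3, 1))  # a correction, appended as a new version

print(price_as_of("sku-1", datetime(2024, 2, 1)))
print(price_as_of("sku-1", datetime(2024, 6, 1)))
```

Because nothing is overwritten, a report run for February still sees the February price after the March correction lands, and the current view is just the latest version per key.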