OLTP vs OLAP — Why Column Storage Wins for Analytics

TL;DR

Row stores are built for finding one needle in a haystack; column stores are built for counting every straw — and that fundamental difference in access pattern is why you never run analytics on your production database.

Two Different Worlds

Row Vs Column

Your production database handles OLTP: Online Transaction Processing. Insert a user. Update an order. Look up a payment by ID. Each query touches one or a few rows, reads all columns of those rows, and needs to be fast. Milliseconds.

Your analytics workload is OLAP: Online Analytical Processing. "What was the average order value by country last quarter?" This query doesn't care about individual rows. It scans millions of them, but only needs two or three columns. It can take seconds — nobody's waiting on a checkout page for this.

These two workloads have opposite access patterns. Optimizing for one hurts the other. That's why every serious company runs a separate analytics system.

Amazon's production databases handle transactions. Their Redshift clusters handle analytics. Netflix's microservices write to Cassandra and PostgreSQL. Their analytics run on Spark and Druid. Different tools for different jobs.

Row Storage: How OLTP Databases Work

PostgreSQL, MySQL, and most traditional databases store data row-by-row on disk.

Disk layout (row store):
┌──────────────────────────────────────────────────┐
│ Row 1: [user_id=1, name="Alice", country="US",  │
│         order_total=59.99, created="2024-01-15"] │
│ Row 2: [user_id=2, name="Bob",   country="UK",  │
│         order_total=124.50, created="2024-01-15"]│
│ Row 3: [user_id=3, name="Carol", country="US",  │
│         order_total=89.00, created="2024-01-16"] │
│ ...                                              │
└──────────────────────────────────────────────────┘

To read a single row (e.g., SELECT * FROM orders WHERE id = 42), the database reads one contiguous block. All columns are right there. Fast.

But to compute SELECT AVG(order_total) FROM orders, the database must read every row — including user_id, name, country, and created, which it doesn't need. With 100 columns and 1 billion rows, you're reading 99x more data than necessary.

That wasted I/O is the fundamental inefficiency of row stores for analytics.

Column Storage: How OLAP Databases Work

Column-oriented databases store data by column, not by row.

Disk layout (column store):
┌─────────────────────────────────────────────┐
│ user_id column:    [1, 2, 3, 4, 5, ...]     │
│ name column:       ["Alice","Bob","Carol"..] │
│ country column:    ["US","UK","US","DE"...]  │
│ order_total column:[59.99, 124.50, 89.00...] │
│ created column:    ["2024-01-15", ...]       │
└─────────────────────────────────────────────┘

Now SELECT AVG(order_total) only reads the order_total column. With 100 columns, you read 1% of the data. For a 1 TB table, that's 10 GB instead of 1 TB. Massive difference.

Spicy opinion: the single biggest win in data engineering is separating your OLTP and OLAP workloads. Not fancy ML pipelines. Not real-time streaming. Just "stop running analytics on your production database." This alone prevents more outages than any other architectural decision.

Why Columns Compress Better

Same-type values stored contiguously compress dramatically.

Consider the country column: ["US", "US", "US", "UK", "US", "DE", "US", "US"]. Most values are "US". A column store can apply:

Run-Length Encoding (RLE)

Original:  US, US, US, UK, US, DE, US, US
Encoded:   (US, 3), (UK, 1), (US, 1), (DE, 1), (US, 2)

For low-cardinality columns (country, status, category), RLE can compress data 10-100x.

Dictionary Encoding

Dictionary: {0: "US", 1: "UK", 2: "DE"}
Column:     0, 0, 0, 1, 0, 2, 0, 0

Instead of storing strings, store integers.
"US" (2 bytes) → 0 (2 bits if only 3 values)

When you have a column with 50 distinct values and 1 billion rows, dictionary encoding reduces storage by orders of magnitude.

Delta Encoding

For timestamps or sequential IDs:

Original:  1705276800, 1705276801, 1705276802, 1705276803
Deltas:    1705276800, +1, +1, +1

Store the base value and differences. Deltas are tiny.

Bit-Packing

After dictionary encoding, values are small integers. Pack them into the minimum bits:

3 distinct values → 2 bits per value
16 distinct values → 4 bits per value
256 distinct values → 8 bits per value (1 byte)

ClickHouse achieves 10-40x compression on real-world data using these techniques. That means a 10 TB dataset fits in 250 GB - 1 TB of storage. And compression means less I/O, which means faster queries.

Vectorized Execution

Column stores don't just read less data. They process it faster.

In a row store, the query engine processes one row at a time. For each row, it extracts the column, applies the operation, moves to the next row. This has terrible CPU branch prediction and cache behavior.

In a column store, the engine processes a vector (batch) of values from one column at once. Modern CPUs have SIMD (Single Instruction, Multiple Data) instructions that can add, compare, or filter 4-8 values in a single CPU instruction.

Row-by-row processing:
  for each row:
    if row.country == "US":        # branch prediction miss
      sum += row.order_total       # cache miss (next row is far away)

Vectorized processing:
  country_vector = load 1024 values from country column
  mask = SIMD_compare(country_vector, "US")   # processes batches of 1024 values using SIMD instructions (4-16 values per CPU instruction)
  total_vector = load 1024 values from order_total column
  sum = SIMD_masked_sum(total_vector, mask)    # same: 1024 values processed via a loop of SIMD ops

ClickHouse processes data at 1-2 GB/s per core using vectorized execution. A row store on the same hardware might manage 50-100 MB/s per core. That's a 10-20x speed difference just from how the data is arranged in memory.

Star Schema: The Analytics Data Model

OLAP databases almost universally use a star schema. A central fact table holds events or transactions. Surrounding dimension tables describe the entities involved.

                    ┌────────────────┐
                    │  dim_product   │
                    │─────────────── │
                    │ product_id (PK)│
                    │ name           │
                    │ category       │
                    │ brand          │
                    └───────┬────────┘
                            │
┌───────────────┐   ┌───────┴────────┐   ┌────────────────┐
│   dim_user    │   │  fact_orders   │   │   dim_date     │
│───────────────│   │────────────────│   │────────────────│
│ user_id (PK)  │──▶│ user_id (FK)   │◀──│ date_id (PK)   │
│ name          │   │ product_id (FK)│   │ date           │
│ country       │   │ date_id (FK)   │   │ quarter        │
│ signup_date   │   │ quantity       │   │ day_of_week    │
└───────────────┘   │ total_amount   │   │ is_holiday     │
                    │ discount       │   └────────────────┘
                    └────────────────┘

Why star schema works for analytics:

Fact tables are narrow and long. Billions of rows, but each row is just foreign keys + measures (amounts, counts, durations). Small per row, fast to scan.
Dimension tables are wide and short. Thousands to millions of rows with descriptive attributes. Small enough to fit in memory.
Queries are predictable. SELECT dim.country, SUM(fact.total_amount) FROM fact_orders JOIN dim_user ON ... GROUP BY dim.country. The join pattern is always fact-to-dimension.

The name "star" comes from the shape: fact table in the center, dimension tables radiating outward like points of a star.

Airbnb uses a star schema for their core analytics. So does Spotify. So does every company with a data warehouse. The pattern is universal because the access pattern is universal.

Snowflake Schema: Normalized Dimensions

A snowflake schema normalizes the dimension tables. Instead of dim_product having a category column, you'd have a separate dim_category table.

dim_product → dim_category → dim_department

Instead of:
dim_product: {id, name, category, department}

You get:
dim_product: {id, name, category_id}
dim_category: {id, category_name, department_id}
dim_department: {id, department_name}

Snowflake vs star:

Dimension	Star Schema	Snowflake Schema
Joins	Fewer (fact → dimension)	More (fact → dim → sub-dim)
Storage	More redundancy in dimensions	Less redundancy
Query speed	Faster (fewer joins)	Slower (more joins)
Maintenance	Simpler	More complex
Standard	Industry default	Niche use cases

Spicy opinion: snowflake schemas are almost never worth the extra joins. Dimension tables are small. The redundancy costs kilobytes. The extra joins cost query time and developer sanity. Star schema wins in practice.

Materialized Views: Pre-Computed Answers

Scanning billions of rows for every dashboard refresh is wasteful. Materialized views pre-compute common aggregations and store the results.

-- Create a materialized view
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT
    date_trunc('day', order_date) AS day,
    country,
    SUM(total_amount) AS revenue,
    COUNT(*) AS order_count
FROM fact_orders
JOIN dim_user USING (user_id)
GROUP BY 1, 2;

-- Query the materialized view (instant)
SELECT * FROM daily_revenue WHERE day = '2024-01-15';

Without the materialized view, this query scans the full fact table. With it, the query reads a pre-computed table with ~365 × N countries rows. Orders of magnitude faster.

The trade-off: storage and freshness. Materialized views consume disk space and must be refreshed. Options:

Full refresh: re-compute the entire view. Simple but slow for large datasets.
Incremental refresh: only process new/changed data. Fast but complex.
ClickHouse approach: AggregatingMergeTree engine continuously merges new data into aggregations. Near-real-time materialized views without manual refresh.

Netflix pre-computes viewing metrics into materialized views that their dashboards query. Without this, every dashboard load would scan billions of streaming events.

Column Store Internals

How does a column store actually organize data on disk?

Segments and Blocks

Data is split into segments (or row groups). Each segment contains N rows (e.g., 1 million). Within a segment, each column is stored as a separate block.

Segment 1 (rows 1-1,000,000):
  ├── user_id.block    (compressed)
  ├── name.block       (compressed)
  ├── country.block    (compressed)
  └── total.block      (compressed)

Segment 2 (rows 1,000,001-2,000,000):
  ├── user_id.block    (compressed)
  ├── name.block       (compressed)
  ├── country.block    (compressed)
  └── total.block      (compressed)

Zone Maps (Min/Max Indexes)

Each block stores its min and max values. The query engine uses these to skip entire blocks.

SELECT * FROM orders WHERE order_date = '2024-01-15'

Block 1: min=2024-01-01, max=2024-01-10 → SKIP
Block 2: min=2024-01-11, max=2024-01-20 → SCAN
Block 3: min=2024-01-21, max=2024-01-31 → SKIP

If data is sorted by the filter column, zone maps eliminate the vast majority of blocks. This is why sorting order matters enormously in column stores.

Sort Keys

Column stores let you choose a sort key — the column(s) by which data is physically sorted on disk.

-- ClickHouse: sort by date, then country
CREATE TABLE orders (
    order_date Date,
    country String,
    total_amount Decimal(10,2)
) ENGINE = MergeTree()
ORDER BY (order_date, country);

Choose your sort key based on your most common filter columns. If most queries filter by date, sort by date. Zone maps will then skip most blocks, and queries will be fast.

Wrong sort key = slow queries. If you sort by user_id but query by date, zone maps can't help. Every block has every date mixed in. This is the #1 performance mistake in column stores.

Patterns for System Design Interviews

Pattern 1: Separate OLTP and OLAP. "The production database handles transactions. We replicate data to a column store (ClickHouse/Redshift) for analytics. Never run analytics on the production database."

Pattern 2: Star schema for analytics. "Fact table holds events, dimension tables describe entities. Queries join fact to dimensions, aggregate, and group by."

Pattern 3: Materialized views for dashboards. "Pre-compute common aggregations. Dashboard queries hit the materialized view, not the raw fact table."

Pattern 4: Column compression for scale. "Column stores compress 10-40x. A 10 TB dataset fits in ~500 GB, reducing both storage cost and I/O."

Trade-offs Table

Dimension	Row Store (OLTP)	Column Store (OLAP)
Read pattern	Single row by key	Aggregate over many rows
Write pattern	Single row insert/update	Bulk load / append
Latency	Milliseconds	Seconds to minutes
Compression	Moderate (mixed types)	Excellent (same-type columns)
Index strategy	B-tree per column	Zone maps + sort key
Point lookups	Excellent	Terrible (must reconstruct row)
Full table scans	Terrible	Excellent (reads only needed columns)
Updates/deletes	Fast (in-place)	Slow (copy-on-write or batch)
Concurrency	Thousands of transactions	Tens of analytical queries

Interview Gotchas

Gotcha 1: "Can't you just add indexes to make analytics fast on PostgreSQL?" Indexes help point queries, not full scans. AVG(order_total) WHERE date > X scans millions of rows regardless of indexing. You need a columnar format to avoid reading unnecessary columns.

Gotcha 2: "Why not just use a column store for everything?" Column stores are terrible at point lookups and single-row updates. Reconstructing a full row requires reading from every column file. Updating one value requires rewriting an entire column segment. OLTP workloads need row stores.

Gotcha 3: "What about PostgreSQL's columnar extension (cstore_fdw, Citus Columnar)?" They exist and work for small-scale analytics. But they're add-ons to a row-oriented engine. For serious analytics (billions of rows, sub-second queries), use a purpose-built column store. PostgreSQL columnar is a compromise, and compromises have limits.

Gotcha 4: "How do you get data from OLTP to OLAP?" CDC (Change Data Capture) via Debezium/Kafka, batch exports (daily SQL dumps), or managed ETL tools (Fivetran, Airbyte). We cover this in the ETL/ELT lesson.

Gotcha 5: "What's the latency of column store queries?" Seconds to minutes, not milliseconds. Column stores scan massive amounts of data — they're fast at scanning, but there's still a lot to scan. For sub-second analytics, use materialized views or a specialized real-time OLAP engine like ClickHouse or Apache Druid.