How Data Lives on Disk

TL;DR

Databases store data in fixed-size pages on disk. Without an index, finding one row means scanning every page. Indexes are like a book's table of contents — they tell the database exactly where to look, turning millions of page reads into a handful.

Why This Matters

Every time someone asks "should I add an index?" or "why is this query slow?" — the answer comes back to how data physically lives on disk. If you understand pages and I/O, you understand why indexes work, why joins can be expensive, and why denormalization sometimes makes sense.

This isn't abstract theory. It's the foundation for every performance decision you'll make.

The Heap File — Where Your Data Actually Lives

When you insert a row into a table, the database doesn't keep it in memory forever. It writes it to a heap file on disk — essentially a big, unordered collection of rows.

Think of it like a warehouse full of boxes. Each box has a label (the row data), but the boxes aren't organized in any particular order. They're just stacked wherever there's space.

The heap file is divided into pages — fixed-size blocks, typically 8 KB in PostgreSQL. Each page holds as many rows as will fit.

Heap File (users table)
┌─────────────────┐
│ Page 0           │  ← rows 1-50
│ [alice, bob, ...]│
├─────────────────┤
│ Page 1           │  ← rows 51-100
│ [dave, eve, ...] │
├─────────────────┤
│ Page 2           │  ← rows 101-150
│ [grace, hank, ..]│
├─────────────────┤
│ ...              │
│ Page 9999        │  ← rows 499,951-500,000
└─────────────────┘

A table with 500,000 users might span 10,000 pages. Each page is one trip to disk.

The Sequential Scan — Brute Force

What happens when you run this query?

SELECT * FROM users WHERE email = 'carol@example.com';

Without an index, the database has no idea which page contains Carol's row. So it does the only thing it can: read every single page, checking each row for a matching email.

This is called a sequential scan (or full table scan). For our 500,000-user table:

10,000 pages to read from disk
Each page requires a disk I/O operation
Even with modern SSDs (~0.1ms per read), that's ~1 second of pure I/O

One second to find one email. Now imagine 1,000 users searching simultaneously. That's the problem.

Why Disk I/O Is the Bottleneck

Your CPU can process billions of operations per second. RAM can deliver data in nanoseconds. But disk — even fast SSDs — operates in microseconds to milliseconds. The speed gap is enormous:

Operation	Latency	Relative Speed
CPU register access	~1 ns	1x
L1 cache	~1 ns	1x
RAM	~100 ns	100x slower than CPU
SSD random read	~100,000 ns (0.1 ms)	100,000x slower than CPU
HDD random read	~10,000,000 ns (10 ms)	10,000,000x slower

The takeaway: database performance is almost entirely about minimizing disk reads. Every optimization technique you'll learn in this course — indexes, denormalization, caching, partitioning — exists to reduce the number of pages the database needs to read from disk.

Sequential Scan vs Index Lookup

Indexes — The Table of Contents

Open a 500-page textbook. If you want to find information about "TCP handshakes," you don't read all 500 pages. You check the index at the back, find "TCP handshakes: page 247," and go straight there.

A database index works the same way. It's a separate data structure that maps column values to the pages where those rows live.

When you create an index:

CREATE INDEX idx_users_email ON users(email);

The database builds a sorted lookup structure (typically a B-tree, which we'll cover in depth in Chapter 4). Now when you search for Carol's email:

Consult the index → "carol@example.com is on page 847, row 3"
Read page 847
Done

Two page reads instead of 10,000. That's the difference between 0.2ms and 1 second.

Without index:           With index:
Read page 0  ✗          Check B-tree (2-3 page reads)
Read page 1  ✗              ↓
Read page 2  ✗          Go directly to page 847 ✓
Read page 3  ✗
...
Read page 847 ✓ Found!
...
Read page 9999 (still scanning)

The B-Tree — A Quick Preview

The most common index type is the B-tree (balanced tree). We'll do a deep dive in Chapter 4, but here's the intuition.

A B-tree is like a multi-level table of contents. Imagine a phonebook:

Level 1 (root): "A-M is in the left half, N-Z is in the right half"
Level 2: "A-F is in section 1, G-M is in section 2"
Level 3 (leaf): "Carol is on page 847"

Each "level" is one page read. A B-tree with 500,000 entries typically has only 3-4 levels. That means finding any row takes 3-4 page reads — regardless of table size. A table with 500,000 rows and a table with 50,000,000 rows both take about 3-4 reads through the index. That's the magic of logarithmic lookup.

         [M]              ← Level 1 (root): 1 page read
        /   \
     [F]     [S]          ← Level 2: 1 page read
    / | \   / | \
  [A-E][G-L][N-R][T-Z]   ← Level 3 (leaf): 1 page read → points to heap page

The Primary Key Index — You Already Have One

When you define a primary key, the database automatically creates an index on it. That's why SELECT * FROM users WHERE id = 42 is always fast — there's already a B-tree index on the id column.

In PostgreSQL, this is a regular B-tree index pointing to heap file pages. In MySQL (InnoDB), the primary key index is the table — rows are stored sorted by primary key in a structure called a clustered index. This makes primary key lookups extremely fast in MySQL, but it also means choosing a good primary key matters even more.

When Indexes Hurt

Indexes aren't free. Every index you add has a cost:

Write overhead: When you insert, update, or delete a row, every index on that table must also be updated. Five indexes on a table means every write does roughly 6x the work (one for the row, one for each index).

Storage: An index is a separate data structure on disk. On a large table, indexes can consume as much space as the data itself.

Not always used: The database's query planner decides whether to use an index. For very small tables, a sequential scan is actually faster (reading 5 pages is quicker than traversing a B-tree). The planner knows this.

We'll cover when NOT to index in Chapter 4. The key intuition: indexes trade write speed and storage for read speed. On a read-heavy table, that's a great deal. On a write-heavy logging table, it can make things worse.

Buffer Pool — The Cache You Didn't Know About

Databases don't actually read from disk on every query. They keep frequently-accessed pages in memory in a structure called the buffer pool (or shared buffers in PostgreSQL).

When you request a page: 1. Is it in the buffer pool? → Return it instantly (memory speed) 2. Not in the buffer pool? → Read from disk, put it in the buffer pool, return it

The buffer pool acts as an automatic LRU (Least Recently Used) cache. Hot pages stay in memory. Cold pages get evicted to make room. On a well-tuned database, 90-99% of reads hit the buffer pool and never touch disk.

This is why adding RAM to a database server often helps more than upgrading to faster disks — more RAM means a bigger buffer pool, which means more data stays in memory.

Quick Recap

Concept	What It Is	Why It Matters
Heap file	Unordered collection of rows on disk	Where your data actually lives
Page	Fixed-size block (8KB)	Unit of disk I/O — one trip per page
Sequential scan	Read every page to find a row	Slow — O(N) pages
Index	Sorted lookup structure (B-tree)	Fast — O(log N) pages
Primary key index	Automatic index on the PK	Why `WHERE id = X` is always fast
Buffer pool	In-memory page cache	Why repeated queries are fast
Write overhead	Indexes slow down inserts/updates	Why you don't index everything

Interview Tip

When you propose adding a table in a system design interview, immediately think about which columns will be queried most. Those are your index candidates. Say something like "I'll add an index on (user_id, created_at) since the primary query is fetching a user's recent posts." That shows you understand why indexes exist, not just that they exist.