Schema Evolution

TL;DR

Production databases serve live traffic — you can't take them down for schema changes. The expand-contract pattern lets you evolve schemas safely: add the new structure alongside the old one, migrate data, then remove the old structure. Know which operations are safe and which are dangerous.

The Kitchen Renovation Analogy

Imagine renovating your kitchen while still cooking meals every day. You can't rip out all the cabinets at once — you'd have nowhere to put the dishes. Instead, you build new cabinets next to the old ones, move everything over gradually, then tear out the old cabinets once they're empty.

Database schema evolution works the same way. Your application is serving live traffic. You need to change the schema without breaking anything. Every change must be backward-compatible with the currently running application code.

The Expand-Contract Pattern

This is the fundamental pattern for zero-downtime schema changes. Three phases:

Phase 1: Expand

Add new structures alongside old ones. Don't remove or rename anything.

-- Want to rename "name" to "display_name"?
-- Step 1: Add the new column (expand)
ALTER TABLE users ADD COLUMN display_name VARCHAR(100);

The old name column still exists. The old application code still works.

Phase 2: Migrate

Update application code to write to both columns. Backfill historical data.

-- Backfill existing data
UPDATE users SET display_name = name WHERE display_name IS NULL;

# Application code writes to both during transition
def update_user_name(user_id, new_name):
    db.execute("""
        UPDATE users SET name = %s, display_name = %s
        WHERE id = %s
    """, (new_name, new_name, user_id))

Phase 3: Contract

Once all code reads from the new column and all data is migrated, remove the old column.

-- Step 3: Drop the old column (contract)
ALTER TABLE users DROP COLUMN name;

This final step only happens after the new application code is deployed everywhere and verified.

Safe vs Dangerous Operations

Safe — Do These Without Worry

Operation	Why It's Safe
Add a new table	No existing code references it
Add a nullable column	Existing rows get NULL, old code ignores it
Add an index CONCURRENTLY	PostgreSQL builds the index without locking the table
Add a new view	No impact on existing tables
Increase column size	`VARCHAR(50)` → `VARCHAR(100)` is backward-compatible

Dangerous — Handle With Care

Operation	Risk	Safe Approach
Rename a column	All queries referencing old name break instantly	Expand-contract: add new, copy data, update code, drop old
Add NOT NULL constraint	Old code inserting without the column fails	Add column nullable first, backfill, then add constraint
Change column type	Existing data might not convert cleanly	Add new column with new type, migrate data, swap in code
Drop a column	Any code still referencing it crashes	Verify zero references in code AND queries, then drop
Rename a table	Same as rename column but worse	Create new table, double-write, migrate, drop old

A Real Walkthrough: Adding a Required Column

Your users table needs a timezone column, and it should be NOT NULL with a default.

Wrong way (causes downtime):

-- This locks the table for the entire rewrite in older PostgreSQL versions
ALTER TABLE users ADD COLUMN timezone VARCHAR(50) NOT NULL DEFAULT 'UTC';

Safe way (expand-contract):

-- Step 1: Add nullable column (instant, no lock)
ALTER TABLE users ADD COLUMN timezone VARCHAR(50);

-- Step 2: Backfill in batches (not all at once!)
UPDATE users SET timezone = 'UTC' WHERE timezone IS NULL AND id BETWEEN 1 AND 100000;
UPDATE users SET timezone = 'UTC' WHERE timezone IS NULL AND id BETWEEN 100001 AND 200000;
-- ... continue in batches to avoid locking the table

-- Step 3: Update app code to always write timezone

-- Step 4: Add NOT NULL constraint after all rows have values
ALTER TABLE users ALTER COLUMN timezone SET NOT NULL;
ALTER TABLE users ALTER COLUMN timezone SET DEFAULT 'UTC';

Batching the backfill prevents long-running transactions that lock the table and block writes.

Migration Tools

Tool	Database	Approach
Flyway	Any (SQL-based)	Versioned SQL migration files. Developer-friendly.
Liquibase	Any (XML/YAML/SQL)	Enterprise-focused. Supports rollbacks.
gh-ost	MySQL	GitHub's online schema migration. Copies table, applies changes, swaps atomically. Zero locks.
pgroll	PostgreSQL	Schema migrations with automatic expand-contract.
Alembic	PostgreSQL (Python)	SQLAlchemy's migration tool. Auto-generates diffs.

gh-ost deserves special mention: it creates a shadow copy of the table, applies the schema change to the copy, replays any writes that happened during the copy via binlog, then atomically renames the tables. The original table is never locked.

Soft Deletes vs Hard Deletes

Soft Deletes

Instead of DELETE FROM users WHERE id = 42, set a flag:

UPDATE users SET is_deleted = true, deleted_at = NOW() WHERE id = 42;

Pros: - Undo capability ("oops, restore that user") - Audit trail (who was deleted and when) - Referential integrity preserved (no cascading deletes)

Cons: - Every query needs WHERE is_deleted = false — forget it once and you leak deleted data - Indexes include deleted rows (bloat) - Storage grows indefinitely - Privacy concerns (GDPR "right to be forgotten" may require actual deletion)

A Better Alternative: Audit Log Table

Instead of soft deletes on every table, maintain a separate audit log:

CREATE TABLE audit_log (
    id BIGINT PRIMARY KEY,
    table_name VARCHAR(50),
    record_id BIGINT,
    action VARCHAR(10),  -- INSERT, UPDATE, DELETE
    old_data JSONB,
    new_data JSONB,
    changed_by BIGINT,
    changed_at TIMESTAMP DEFAULT NOW()
);

Hard delete the original row (clean data, no WHERE is_deleted everywhere), but capture the full state in the audit log. Best of both worlds: clean operational data + complete history.

Temporal Data — SCD Type 2

Slowly Changing Dimensions (SCD Type 2) tracks history with valid_from and valid_to timestamps:

CREATE TABLE product_prices (
    product_id BIGINT,
    price DECIMAL(10,2),
    valid_from TIMESTAMP NOT NULL,
    valid_to TIMESTAMP DEFAULT '9999-12-31',
    PRIMARY KEY (product_id, valid_from)
);

product_prices
┌────────────┬────────┬────────────┬────────────┐
│ product_id │ price  │ valid_from │ valid_to   │
├────────────┼────────┼────────────┼────────────┤
│     1      │  9.99  │ 2024-01-01 │ 2024-06-30 │
│     1      │ 12.99  │ 2024-07-01 │ 9999-12-31 │  ← current
└────────────┴────────┴────────────┴────────────┘

Current price: WHERE product_id = 1 AND valid_to = '9999-12-31' Price on March 15: WHERE product_id = 1 AND valid_from <= '2024-03-15' AND valid_to >= '2024-03-15'

This pattern is critical for financial reporting, compliance, and any system where you need to answer "what was the value at time X?"

Interview Tip

If you need to add a column or change a schema in a system design discussion, say "I'd use the expand-contract pattern — add the new column alongside the old one, backfill data, update the application code, then drop the old column. This avoids downtime." That's a senior-level answer.

Quick Recap

Concept	What It Does	When to Use
Expand-contract	Three-phase safe schema change	Any production schema modification
CONCURRENTLY	Build index without locking table	Adding indexes to live tables
gh-ost	Online MySQL schema migrations	Large MySQL tables under load
Soft deletes	Flag instead of delete	Need undo capability (with caveats)
Audit log	Capture all changes separately	Need history without cluttering operational tables
SCD Type 2	Track value history with time ranges	Pricing, compliance, analytics