Skip to content

Normalization

TL;DR

Normalization means organizing your tables so that each piece of information is stored in exactly one place. This prevents data from getting out of sync when you update things. Start normalized — denormalize later only when performance demands it.

The Problem Normalization Solves

Imagine an e-commerce system where the orders table looks like this:

orders
┌──────────┬───────────┬─────────────────┬──────────────┬────────┐
│ order_id │ user_name │ user_email      │ product_name │ amount │
├──────────┼───────────┼─────────────────┼──────────────┼────────┤
│    1     │ alice     │ alice@mail.com  │ Keyboard     │  79.99 │
│    2     │ alice     │ alice@mail.com  │ Mouse        │  49.99 │
│    3     │ bob       │ bob@mail.com    │ Keyboard     │  79.99 │
│    4     │ alice     │ alice@NEW.com   │ Monitor      │ 299.99 │  ← uh oh
└──────────┴───────────┴─────────────────┴──────────────┴────────┘

Alice's email appears three times. When she updates it, order 4 has the new email but orders 1 and 2 still have the old one. Now you have two different "truths" about Alice's email in the same database.

This is called an update anomaly. And it's just one of three problems:

Anomaly What Happens Example
Update anomaly Changing one fact requires updating many rows Updating Alice's email in every order row
Insert anomaly Can't add data without unrelated data Can't add a new user without them placing an order
Delete anomaly Deleting data loses unrelated information Deleting Bob's only order also loses Bob's email

Normalization eliminates all three by giving each fact its own home.

Normalization Before and After

The Fix: Separate Tables

Instead of one bloated table, split it into focused tables where each fact lives once:

users                           products
┌────┬──────┬────────────────┐  ┌────┬──────────┬────────┐
│ id │ name │ email          │  │ id │ name     │ price  │
├────┼──────┼────────────────┤  ├────┼──────────┼────────┤
│  1 │alice │ alice@NEW.com  │  │  1 │ Keyboard │  79.99 │
│  2 │bob   │ bob@mail.com   │  │  2 │ Mouse    │  49.99 │
└────┴──────┴────────────────┘  │  3 │ Monitor  │ 299.99 │
                                └────┴──────────┴────────┘

orders
┌──────────┬─────────┬────────────┬────────┐
│ order_id │ user_id │ product_id │ amount │
├──────────┼─────────┼────────────┼────────┤
│    1     │    1    │     1      │  79.99 │
│    2     │    1    │     2      │  49.99 │
│    3     │    2    │     1      │  79.99 │
│    4     │    1    │     3      │ 299.99 │
└──────────┴─────────┴────────────┴────────┘

Now Alice's email lives in exactly one row. Update it once, and every query that joins on user_id automatically gets the right value. No anomalies.

The Normal Forms

Normalization is a step-by-step process. Each "normal form" (NF) builds on the previous one, eliminating one more type of redundancy. In practice, you need to know three.

First Normal Form (1NF) — No Repeating Groups

Every column must contain a single, atomic value. No arrays, no comma-separated lists, no nested structures.

Violates 1NF:

┌────┬──────┬─────────────────────────┐
│ id │ name │ phone_numbers           │
├────┼──────┼─────────────────────────┤
│  1 │alice │ 555-1234, 555-5678      │  ← two values in one cell
└────┴──────┴─────────────────────────┘

Satisfies 1NF: Move phone numbers to their own table.

users                    phone_numbers
┌────┬──────┐           ┌─────────┬──────────┐
│ id │ name │           │ user_id │ number   │
├────┼──────┤           ├─────────┼──────────┤
│  1 │alice │           │    1    │ 555-1234 │
└────┴──────┘           │    1    │ 555-5678 │
                        └─────────┴──────────┘

Second Normal Form (2NF) — No Partial Dependencies

Satisfies 1NF plus: every non-key column depends on the entire primary key, not just part of it. This only matters when you have a composite primary key.

Violates 2NF:

Consider an order_items table with a composite key (order_id, product_id):

order_items (PK = order_id + product_id)
┌──────────┬────────────┬──────────────┬──────────┐
│ order_id │ product_id │ product_name │ quantity │
├──────────┼────────────┼──────────────┼──────────┤
│    1     │    101     │ Keyboard     │    2     │
│    1     │    102     │ Mouse        │    1     │
│    2     │    101     │ Keyboard     │    1     │
└──────────┴────────────┴──────────────┴──────────┘

product_name depends only on product_id, not on the full key (order_id, product_id). That's a partial dependency — product_name is repeated for every order containing that product.

Fix: Move product_name to a products table where it depends on product_id alone.

Third Normal Form (3NF) — No Transitive Dependencies

Satisfies 2NF plus: no non-key column depends on another non-key column.

Violates 3NF:

employees
┌────┬──────┬───────────────┬─────────────────┐
│ id │ name │ department_id │ department_name │
├────┼──────┼───────────────┼─────────────────┤
│  1 │alice │      10       │ Engineering     │
│  2 │bob   │      10       │ Engineering     │
│  3 │carol │      20       │ Marketing       │
└────┴──────┴───────────────┴─────────────────┘

department_name depends on department_id, not on the employee's id. If the Engineering department renames itself, you have to update every engineer's row.

Fix: Move department_name to a departments table.

employees                    departments
┌────┬──────┬───────────────┐  ┌────┬─────────────┐
│ id │ name │ department_id │  │ id │ name        │
├────┼──────┼───────────────┤  ├────┼─────────────┤
│  1 │alice │      10       │  │ 10 │ Engineering │
│  2 │bob   │      10       │  │ 20 │ Marketing   │
│  3 │carol │      20       │  └────┴─────────────┘
└────┴──────┴───────────────┘

BCNF — The Strict Version

Boyce-Codd Normal Form is a stricter version of 3NF. It says: every determinant (column that determines another column's value) must be a candidate key.

In practice, 3NF covers 99% of cases. BCNF only matters in edge cases with overlapping candidate keys. You'll never need it in a system design interview, but knowing it exists shows depth if an interviewer asks.

The Normal Forms Cheat Sheet

Normal Form Rule In Plain English
1NF Atomic values, no repeating groups One value per cell
2NF No partial dependencies on composite keys Every column depends on the whole key
3NF No transitive dependencies Non-key columns don't depend on other non-key columns
BCNF Every determinant is a candidate key Stricter 3NF for edge cases

How to Apply This in Practice

You don't sit in an interview reciting normal forms. Instead, use this mental checklist:

  1. Is any data repeated across rows? If yes, it probably belongs in its own table with a foreign key reference.
  2. If I update this value, do I need to update it in multiple places? If yes, normalize it.
  3. Can I add a new entity without creating dummy data in unrelated columns? If no, you have an insert anomaly.

Most developers intuitively write 3NF schemas without thinking about the formal rules. The entities-and-relationships approach from the previous lesson naturally produces normalized tables.

When to Stop Normalizing

Normalization isn't a goal in itself — it's a tool for data integrity. Over-normalizing creates its own problems:

  • Too many joins: If every query requires joining 7 tables, your reads slow down
  • Complexity: More tables means more foreign keys, more migration files, more cognitive load
  • Diminishing returns: The jump from unnormalized to 3NF is huge. The jump from 3NF to BCNF is almost never worth it in application databases.

The practical rule: Normalize to 3NF. Denormalize strategically based on access patterns. We'll cover exactly when and how in the next lesson.

Interview Tip

You never need to say "this is in third normal form." Instead, demonstrate the thinking: "I'm putting the user's email only in the users table. If I also stored it on every post, updating an email would mean touching every post row — that doesn't scale." That's normalization explained without the jargon.