The Document Model — When Documents Beat Tables

TL;DR

MongoDB stores data as self-contained JSON-like documents, which means you can read an entire entity in one shot without JOINs -- but the moment you need cross-entity relationships, you're fighting the data model instead of using it.

What It Is

A document database stores records as documents -- think JSON objects with nested fields, arrays, and sub-objects. No fixed schema. No rows and columns. Each document is its own little world.

MongoDB doesn't actually store JSON, though. It stores BSON (Binary JSON). BSON adds types that JSON can't express: dates, ObjectId, decimal128, binary data, regular expressions. This matters because JSON has no native date type -- dates have to travel as strings (or numbers) by convention, and parsing dates from strings at scale is a nightmare you don't need.

Here's what a document looks like:

{
  "_id": ObjectId("64a7f3b2c1d4e5f6a7b8c9d0"),
  "name": "Alice Chen",
  "email": "alice@example.com",
  "addresses": [
    {
      "type": "home",
      "street": "742 Evergreen Terrace",
      "city": "Springfield",
      "zip": "62704"
    },
    {
      "type": "work",
      "street": "1 Infinite Loop",
      "city": "Cupertino",
      "zip": "95014"
    }
  ],
  "preferences": {
    "theme": "dark",
    "notifications": true,
    "language": "en"
  },
  "created_at": ISODate("2024-07-07T10:30:00Z")
}

Notice: the user's addresses live inside the user document. In PostgreSQL, that's a separate addresses table with a foreign key. In MongoDB, it's just an array. One read gets you everything.

This is the core idea. Fetch one document, get the whole entity. No JOINs. No multi-table queries. No N+1 problems.
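
To make the contrast concrete, here is a minimal sketch that uses plain in-memory arrays as stand-ins for two relational tables and one document collection. The data and the helper names (loadUserRelational, loadUserDocument) are illustrative assumptions, not MongoDB or PostgreSQL APIs.

```javascript
// Relational shape: two "tables" linked by a foreign key (user_id).
const usersTable = [{ id: 1, name: "Alice Chen" }];
const addressesTable = [
  { user_id: 1, type: "home", city: "Springfield" },
  { user_id: 1, type: "work", city: "Cupertino" },
];

// Document shape: the addresses are embedded in the user document.
const usersCollection = [
  {
    _id: 1,
    name: "Alice Chen",
    addresses: [
      { type: "home", city: "Springfield" },
      { type: "work", city: "Cupertino" },
    ],
  },
];

// Relational read: find the user row, then gather the matching address rows.
function loadUserRelational(id) {
  const user = usersTable.find((u) => u.id === id);
  if (!user) return null;
  const addresses = addressesTable
    .filter((a) => a.user_id === id)
    .map(({ type, city }) => ({ type, city }));
  return { _id: user.id, name: user.name, addresses };
}

// Document read: a single lookup returns the whole entity.
function loadUserDocument(id) {
  return usersCollection.find((u) => u._id === id) || null;
}
```

Both functions return the same entity. The difference is that the relational version has to assemble it from two places, while the document version reads it as stored.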

When Documents Win

Documents beat tables in specific situations. Not all situations. Here's where they shine.

1. One-to-Few Embedded Relationships

A user with 2-3 addresses. An order with 5 line items. A blog post with 10 comments. If the related data is small, bounded, and always read together with the parent, embedding it in one document is the right call.

eBay made this bet with their product catalog. Product data -- item specifics, images, seller info -- lives together in document-style storage. When a buyer loads a listing page, one read gets everything. No joins across five tables.

2. Read-Heavy Workloads with Denormalized Data

If your app reads 10x more than it writes, denormalization is cheap. Store the data in the shape your application needs it. Skip the expensive JOIN at read time, pay a small cost at write time to keep copies in sync.

3. Flexible or Evolving Schemas

Early-stage products where the schema changes weekly? Documents handle this well. Add a new field to new documents. Old documents don't have it -- that's fine. No ALTER TABLE migration that locks your 100-million-row table for 20 minutes.
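
The flip side: code that reads the collection has to tolerate both shapes. A small sketch, assuming a hypothetical loyalty_tier field that was added after the first documents were written:

```javascript
// Two documents from the same collection. The second was written after a
// hypothetical `loyalty_tier` field was introduced.
const oldDoc = { _id: 1, name: "Alice Chen" };
const newDoc = { _id: 2, name: "Bob Ortiz", loyalty_tier: "gold" };

// Normalize at read time: supply a default when the field is absent,
// and let any stored value override it.
function withDefaults(doc) {
  return { loyalty_tier: "none", ...doc };
}

console.log(withDefaults(oldDoc).loyalty_tier);
console.log(withDefaults(newDoc).loyalty_tier);
```

No migration ran, but the "schema" now lives in withDefaults. That is the trade you are making.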

4. Semi-Structured Data

Logs, events, IoT sensor data, user-generated content where every record looks slightly different. Documents handle heterogeneous data naturally. Rows and columns fight it.

When Relational Wins

Here's the uncomfortable truth that MongoDB marketing won't tell you: most applications are better served by PostgreSQL.

Many-to-Many Relationships

Students enrolled in courses. Users who follow other users. Products in multiple categories. These are graph-like relationships. In MongoDB, you either duplicate data everywhere or use $lookup and suffer. In PostgreSQL, it's a junction table and a JOIN.

Complex Joins Across Multiple Entities

"Show me all orders placed by premium users in the last 30 days, grouped by product category, with the supplier's country." That's three JOINs in SQL. In MongoDB, you're writing a multi-stage aggregation pipeline and questioning your life choices.

Strict Consistency and Referential Integrity

When you delete a user, do all their orders, reviews, and comments get cleaned up automatically? In PostgreSQL, ON DELETE CASCADE handles this. In MongoDB, you write application-level cleanup code and hope you don't forget a collection.
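
What "application-level cleanup" tends to look like, sketched against in-memory arrays rather than a real driver. The collection names and the deleteUserCascade helper are hypothetical.

```javascript
// In-memory stand-ins for collections; each child document stores user_id.
const db = {
  users: [{ _id: 1 }, { _id: 2 }],
  orders: [{ _id: 10, user_id: 1 }, { _id: 11, user_id: 2 }],
  reviews: [{ _id: 20, user_id: 1 }],
  comments: [{ _id: 30, user_id: 1 }],
};

// The application has to list every dependent collection itself.
// Leaving one out (say, "comments") silently leaves orphans behind.
const DEPENDENT_COLLECTIONS = ["orders", "reviews", "comments"];

function deleteUserCascade(database, userId) {
  for (const name of DEPENDENT_COLLECTIONS) {
    database[name] = database[name].filter((d) => d.user_id !== userId);
  }
  database.users = database.users.filter((u) => u._id !== userId);
}
```

Nothing in the database stops a new collection from being added later without a matching entry in DEPENDENT_COLLECTIONS. A foreign key constraint would.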

Financial or Regulatory Data

If an auditor needs to prove that account balances are consistent, relational databases with ACID transactions are what you want. MongoDB added multi-document transactions in 4.0 (for replica sets; sharded clusters followed in 4.2), but they're slower and the ergonomics are worse.

BSON Internals

BSON is more than "binary JSON." Understanding it helps you make better schema decisions.

Type System

BSON Type	Value size	Notes
`double`	8 bytes	IEEE 754 floating point
`string`	4 + len + 1 bytes	UTF-8, length-prefixed
`document`	variable	Nested, recursive
`array`	variable	Ordered list of values
`binary`	4 + 1 + len bytes	Arbitrary bytes, has subtypes
`ObjectId`	12 bytes	Timestamp + random + counter
`boolean`	1 byte	0 or 1
`date`	8 bytes	Milliseconds since epoch
`null`	0 bytes	Just the type marker
`int32`	4 bytes	Signed 32-bit integer
`int64`	8 bytes	Signed 64-bit integer
`decimal128`	16 bytes	Exact decimal math
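
The sizes in the table cover the value only. Each field also pays one type byte plus its name as a NUL-terminated string, and every document adds a 4-byte length prefix and a trailing NUL. A rough size calculator for a subset of types, as a sketch for Node.js (it uses Buffer). One assumption to flag: integers that fit in 32 bits are counted as int32 and all other numbers as double, which may not match what your driver or shell actually sends.

```javascript
// Rough BSON size calculator for a subset of types.
// Assumption: integers that fit in 32 bits are stored as int32,
// every other number as double.
function bsonValueSize(value) {
  if (value === null) return 0; // null: only the type marker, no value bytes
  if (typeof value === "boolean") return 1;
  if (typeof value === "number") {
    const fitsInt32 =
      Number.isInteger(value) && value >= -2147483648 && value <= 2147483647;
    return fitsInt32 ? 4 : 8;
  }
  if (typeof value === "string") {
    // 4-byte length prefix + UTF-8 bytes + trailing NUL
    return 4 + Buffer.byteLength(value, "utf8") + 1;
  }
  if (value instanceof Date) return 8; // milliseconds since epoch
  if (typeof value === "object") {
    // Nested documents recurse; arrays are documents keyed "0", "1", ...
    return bsonDocumentSize(value);
  }
  throw new TypeError("unsupported type in this sketch");
}

function bsonDocumentSize(doc) {
  let size = 4 + 1; // int32 total length + trailing NUL
  for (const [key, value] of Object.entries(doc)) {
    // Each element: 1 type byte + key as a C string + the value bytes.
    size += 1 + Buffer.byteLength(key, "utf8") + 1 + bsonValueSize(value);
  }
  return size;
}

console.log(bsonDocumentSize({ hello: "world" }));
```

One schema takeaway: field names are stored in every document, so a long key repeated across millions of documents is not free.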

ObjectId Structure

The 12-byte ObjectId contains embedded information:

| 4 bytes   | 5 bytes | 3 bytes |
| timestamp | random  | counter |

The first 4 bytes are a Unix timestamp in seconds. This means ObjectIds are roughly sortable by creation time. You can extract the creation timestamp from any ObjectId without storing a separate created_at field. Clever trick, but don't rely on it for precise ordering: the resolution is one second, and IDs generated on different machines depend on each machine's clock.
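
In mongosh, ObjectId values expose getTimestamp() for this. To show what is going on underneath, here is a sketch that decodes the timestamp by hand from the hex string. The objectIdToDate name is made up for illustration.

```javascript
// Decode the creation time from the 24-character hex form of an ObjectId.
// The first 4 bytes (8 hex characters) are big-endian seconds since the Unix epoch.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = objectIdToDate("64a7f3b2c1d4e5f6a7b8c9d0");
console.log(created.toISOString());
```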

Document Size Limit: 16MB

This is hard-coded. You cannot change it. A single document cannot exceed 16 megabytes.

Sounds generous until you realize what breaks it: a social media post with 100K comments embedded. A product with 50K reviews. An analytics document that accumulates data over months.

The 16MB limit is MongoDB's way of saying: "If your document is this big, your schema is wrong."
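
Some back-of-the-envelope arithmetic shows how quickly an embedded array gets there. The per-item sizes below are illustrative assumptions, not measurements.

```javascript
const MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024; // 16,777,216 bytes

// How many embedded items of a given average size fit under the limit,
// after reserving space for the rest of the document?
function maxEmbeddedItems(avgItemBytes, baseDocBytes = 0) {
  return Math.floor((MAX_BSON_DOCUMENT_BYTES - baseDocBytes) / avgItemBytes);
}

console.log(maxEmbeddedItems(200)); // comments assumed to average ~200 bytes
console.log(maxEmbeddedItems(400)); // reviews assumed to average ~400 bytes
```

Under those assumptions, neither the 100K-comment post nor the 50K-review product fits in a single document.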

Schema Validation -- Not Schema-Less, Schema-Flexible

The biggest misconception about MongoDB: "It has no schema."

Wrong. MongoDB has optional schema validation using JSON Schema. You can enforce types, required fields, value ranges, and patterns at the database level.

db.createCollection("users", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["email", "name", "created_at"],
      properties: {
        email: {
          bsonType: "string",
          pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
        },
        name: {
          bsonType: "string",
          minLength: 1,
          maxLength: 200
        },
        age: {
          bsonType: "int",
          minimum: 0,
          maximum: 150
        },
        created_at: {
          bsonType: "date"
        }
      }
    }
  },
  validationAction: "error"  // reject invalid docs
})

You can also set validationAction: "warn" to log invalid documents without rejecting them. Useful during migrations.
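
For a collection that already exists, the validation mode can be changed with the collMod command. A sketch of the two command documents you might use around a migration; in mongosh each one would be passed to db.runCommand(...). The variable names are illustrative.

```javascript
// Command documents for switching the "users" collection's validation mode.
// In mongosh: db.runCommand(relaxForMigration), then later db.runCommand(enforceAgain).
const relaxForMigration = { collMod: "users", validationAction: "warn" };
const enforceAgain = { collMod: "users", validationAction: "error" };
```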

Spicy take: If you're using MongoDB without schema validation in production, you're storing garbage and you don't know it yet. "Schema-flexible" does not mean "anything goes."

The "$lookup Is Not a JOIN" Trap

MongoDB's aggregation pipeline has $lookup. It looks like a JOIN. Don't be fooled.

db.orders.aggregate([
  {
    $lookup: {
      from: "users",
      localField: "user_id",
      foreignField: "_id",
      as: "user"
    }
  }
])

Here's what's actually happening:

$lookup is a left outer join -- unmatched documents still appear with an empty array.
It runs on a single node -- it cannot be distributed across shards efficiently.
The "joined" collection (from) must be in the same database.
On sharded clusters, the from collection can be sharded (since MongoDB 5.1), but performance degrades.
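Before MongoDB 6.0, $lookup always ran as a nested loop: one lookup into the from collection per input document. Newer versions can pick an indexed loop or a hash join in some cases, but there is still nothing like a relational cost-based planner reordering your joins.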

The real problem: if you find yourself writing $lookup in every query, you chose the wrong database. MongoDB is designed for embedded data. If your data model requires joins, PostgreSQL will generally handle them better, with a proper query planner.
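
To pin down the left-outer-join semantics, here is a plain JavaScript emulation of the equality-match form of $lookup, run over in-memory arrays. It is a sketch of the behavior, not of how the server executes it; the lookup helper and the sample data are made up.

```javascript
// Emulates the equality-match form of $lookup with a nested loop:
// every local document is kept, and its matches are collected into an array.
function lookup(localDocs, foreignDocs, { localField, foreignField, as }) {
  return localDocs.map((local) => ({
    ...local,
    [as]: foreignDocs.filter((f) => f[foreignField] === local[localField]),
  }));
}

const orders = [
  { _id: 100, user_id: 1, total: 42 },
  { _id: 101, user_id: 99, total: 7 }, // no such user
];
const users = [{ _id: 1, name: "Alice Chen" }];

const joined = lookup(orders, users, {
  localField: "user_id",
  foreignField: "_id",
  as: "user",
});

console.log(JSON.stringify(joined, null, 2));
```

Note that user is an array even when exactly one document matches, which is why pipelines often follow $lookup with $unwind.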

Patterns for System Design Interviews

When you mention MongoDB in an interview, the interviewer wants to know you understand when to use it. Here's how to frame it.

Pattern 1: User Profile Store

"Users have profiles with nested preferences, addresses, and settings. Read-heavy workload -- profiles are fetched on every page load. Writes are infrequent (user updates profile). Embed everything in one document. One read, no joins."

Pattern 2: Product Catalog

"Products have variable attributes -- a laptop has RAM and CPU, a shirt has size and color. Relational schema requires either sparse columns or EAV tables. Document model handles heterogeneous attributes naturally."

Pattern 3: Content Management

"Articles with embedded metadata, tags, author info. Each article is self-contained. Content varies in structure. MongoDB fits because every article is different."

Anti-Pattern to Call Out

"We wouldn't use MongoDB for the order processing system because orders reference users, products, and inventory. Multi-entity transactions with referential integrity are relational territory."

Calling out the anti-pattern shows maturity. Interviewers love it.

Trade-Offs Table

Factor	Documents (MongoDB)	Relational (PostgreSQL)
Single-entity reads	One document, one read	May need JOINs across tables
Many-to-many relations	Awkward -- duplicate or $lookup	Natural -- junction tables
Schema changes	Add fields freely	ALTER TABLE, possible locks
Referential integrity	Application-level only	Database-enforced (FK constraints)
Complex queries	Aggregation pipeline (verbose)	SQL (declarative, optimized)
Write throughput	High (no index overhead from FKs)	Moderate (FK checks on writes)
Horizontal scaling	Native sharding built-in	Possible but harder (Citus, partitioning)
ACID transactions	Multi-doc since 4.0 (slower)	Battle-tested for decades
Tooling/ecosystem	Growing, modern	Mature, enormous
Data duplication	Expected (denormalized)	Minimized (normalized)

Interview Gotchas

"MongoDB is schema-less"

No. MongoDB is schema-flexible. It supports JSON Schema validation. In practice, your application code enforces a schema whether you declare one or not. The question is whether you also enforce it at the database level.

"MongoDB can't do transactions"

It can since version 4.0 (replica sets; sharded clusters since 4.2). Multi-document, multi-collection ACID transactions exist. But they carry real overhead compared with single-document operations, which are already atomic. Design your schema to avoid needing them when possible.

"Documents are always better for performance"

Only for single-entity reads. A reporting query that aggregates across millions of users is often faster in PostgreSQL with proper indexes and a query planner. MongoDB's aggregation pipeline doesn't have a cost-based optimizer.

"The 16MB limit doesn't matter"

It matters when your document grows over time. An unbounded array pattern -- like embedding every comment on a viral tweet -- will hit the limit. Use the Outlier Pattern or Bucket Pattern (covered in Lesson 3) for data that grows without bound.

"Just use MongoDB for everything"

The most dangerous advice in system design. MongoDB solves specific problems well. Using it for financial ledgers, graph relationships, or full-text search is fighting the tool. Pick the right database for the access pattern.

Summary: The Decision Checklist

Before choosing MongoDB in a system design interview, ask these five questions:

1. Is the primary access pattern single-entity reads? If yes, MongoDB.
2. Do entities have one-to-few embedded relationships? If yes, MongoDB.
3. Do you need many-to-many JOINs? If yes, PostgreSQL.
4. Is referential integrity critical? If yes, PostgreSQL.
5. Will the schema change frequently? If yes, MongoDB gets a point.

MongoDB is not a general-purpose replacement for PostgreSQL. It's a specialized tool for document-shaped data with single-entity access patterns. Know when to reach for it, and more importantly, know when not to.