Design a Social Search Engine
TL;DR
Build an inverted index from scratch -- no Elasticsearch allowed. The interviewer wants you to explain posting lists, two-pointer intersection, BM25 scoring, and the choice between document partitioning and term partitioning. Facebook's Unicorn system processes billions of queries per day against an index of trillions of edges using exactly these primitives. The core insight: a social search engine is NOT a web search engine. Social signals (who posted it, who liked it, who you're friends with) dominate text relevance, and privacy filtering at render time is the cleanest way to handle access control without polluting the index.
The System
Think Facebook post search. A user types "Taylor Swift concert" and expects to see:
- Posts from their friends about Taylor Swift concerts
- Public posts from popular pages about Taylor Swift concerts
- Their own past posts mentioning Taylor Swift
The results must be relevant (text match), personalized (friend posts first), and fresh (new posts searchable within seconds). Facebook's internal system for this is called Unicorn -- an in-memory, social-graph-aware inverted index running on thousands of commodity servers. It replaced three predecessor systems (PPS, Typeahead, various location backends) and handles billions of queries per day.
The twist that makes this different from Google web search: the searcher's identity changes the results. When I search "Taylor Swift concert," my friend's post about going to the Eras Tour should rank higher than a random stranger's post, even if the stranger's post has more likes. This is the social graph signal, and it fundamentally changes how you build the index and the ranking function.
Requirements
Functional Requirements
| Requirement | Details |
|---|---|
| Keyword search | Search posts by text content. Multi-word queries use AND semantics by default. |
| Phrase search | "Taylor Swift" as an exact phrase, not just two separate words. |
| Social ranking | Posts from friends rank higher than posts from strangers. |
| Freshness | New posts are searchable within 10 seconds of creation. |
| Filters | Filter by post type (text, photo, video), date range, author. |
| Autocomplete | As-you-type suggestions (covered separately in Lesson 6). |
Non-Functional Requirements
| Requirement | Target |
|---|---|
| Search latency (p50) | < 200 ms |
| Search latency (p99) | < 500 ms |
| Indexing latency | < 10 seconds from post creation to searchability |
| Availability | 99.9% (search is important but not critical path for posting) |
| Scale | 3 billion users, 1 billion posts/day, ~10,000 search queries/sec |
Back-of-Envelope Math
Total posts: ~1 trillion (accumulated over 15+ years)
New posts/day: 1 billion
New posts/sec: ~11,600
Index writes/sec: ~11,600 posts x ~50 tokens avg = ~580,000 posting list updates/sec (before stop-word removal; ~350,000/sec at ~30 terms per post)
Search queries/day: ~1 billion
Search QPS: ~11,500 average, ~35,000 peak
Post size: ~500 bytes avg (text content only)
Total raw text: 1 trillion x 500 bytes = 500 TB
Index size: ~700 TB (Facebook's actual number -- includes metadata, positions, frequencies)
In-memory hot index: ~50-100 TB (recent posts, high-engagement posts)
Posting list sizes:
Common word ("the"): billions of entries
Rare name ("Zuckerberg"): millions of entries
Obscure term: thousands of entries
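The list is simple enough to recompute. Below is a minimal Python sketch using the figures above. The 30-terms-after-stop-words input comes from the tokenization discussion later in this design; every other input is copied from the list.

```python
SECONDS_PER_DAY = 86_400

posts_per_day = 1_000_000_000
total_posts = 1_000_000_000_000      # ~15 years of accumulated posts
bytes_per_post = 500                 # text content only
tokens_per_post = 50                 # before stop-word removal
terms_per_post = 30                  # after stop-word removal (from the pipeline section)
search_queries_per_day = 1_000_000_000

posts_per_sec = posts_per_day / SECONDS_PER_DAY
updates_per_sec_raw = posts_per_sec * tokens_per_post
updates_per_sec_filtered = posts_per_sec * terms_per_post
search_qps_avg = search_queries_per_day / SECONDS_PER_DAY
raw_text_tb = total_posts * bytes_per_post / 1e12

print(f"new posts/sec:                  {posts_per_sec:,.0f}")
print(f"posting updates/sec (raw):      {updates_per_sec_raw:,.0f}")
print(f"posting updates/sec (filtered): {updates_per_sec_filtered:,.0f}")
print(f"average search QPS:             {search_qps_avg:,.0f}")
print(f"raw text (TB):                  {raw_text_tb:,.0f}")
```

The ~580K figure counts every token. The same arithmetic after stop-word removal gives roughly 347K updates/sec, which is the figure the later sizing sections use.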
The critical number: 700 TB of index data. This does NOT fit in RAM on any reasonable cluster. Facebook solved this with a two-tier architecture: RAM for hot data, SSD for the rest. This is the first design decision that separates strong answers from weak ones.
Naive Design
Start simple. Single machine, single process.
Components:
- Tokenizer: Split post text into terms. "I saw Taylor Swift" becomes ["saw", "taylor", "swift"] (lowercased, stop words removed).
- In-memory inverted index: A HashMap<String, List<PostID>> mapping each term to a sorted list of post IDs containing that term.
- Forward index: A HashMap<PostID, PostMetadata> storing the actual post data (author, text, timestamp, likes).
- Search function: Look up each query term in the inverted index, intersect the posting lists, score and rank the results.
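Here is a minimal Python sketch of these components. It assumes post IDs arrive in increasing order, so appending keeps each posting list sorted, and it uses a tiny, hypothetical stop-word list. `NaiveIndex`, `PostMetadata`, and `tokenize` are illustrative names, not an existing API.

```python
from collections import defaultdict
from dataclasses import dataclass

STOP_WORDS = {"i", "the", "a", "is", "to"}  # hypothetical, deliberately tiny


@dataclass
class PostMetadata:
    author: str
    text: str
    timestamp: int
    likes: int = 0


def tokenize(text: str) -> list:
    """Lowercase, split on whitespace, strip edge punctuation, drop stop words."""
    terms = []
    for raw in text.lower().split():
        term = raw.strip(".,!?\"'")
        if term and term not in STOP_WORDS:
            terms.append(term)
    return terms


class NaiveIndex:
    def __init__(self) -> None:
        self.inverted = defaultdict(list)  # term -> sorted list of post IDs
        self.forward = {}                  # post ID -> PostMetadata

    def add_post(self, post_id: int, meta: PostMetadata) -> None:
        # Assumes post IDs arrive in increasing order, so appends keep lists sorted.
        self.forward[post_id] = meta
        for term in dict.fromkeys(tokenize(meta.text)):  # unique terms, order kept
            self.inverted[term].append(post_id)
```

Looking up `idx.inverted["taylor"]` and `idx.inverted["swift"]` and intersecting the two lists gives the candidate set used in the search walk-through below.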
Search for "Taylor Swift":
- Look up "taylor" -> [post_3, post_7, post_12, post_45, ...]
- Look up "swift" -> [post_7, post_12, post_33, post_45, ...]
- Intersect: [post_7, post_12, post_45, ...]
- For each match, compute a relevance score
- Sort by score, return the top 20
Intersection algorithm (two-pointer merge on sorted lists):
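```python
def intersect(list_a: list, list_b: list) -> list:
    """Two-pointer merge of two posting lists sorted by ascending post ID."""
    i, j = 0, 0
    result = []
    while i < len(list_a) and j < len(list_b):
        if list_a[i] == list_b[j]:     # same post in both lists: a match
            result.append(list_a[i])
            i += 1
            j += 1
        elif list_a[i] < list_b[j]:    # advance whichever pointer is behind
            i += 1
        else:
            j += 1
    return result
```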
Time: O(m + n) where m and n are the posting list lengths. This is the fundamental operation of every search engine, from Google to Elasticsearch to Facebook Unicorn. If you cannot explain this algorithm, you cannot pass this interview.
Scoring: For now, sort by recency (newest first). We will fix this.
Where It Breaks
Problem 1: 700 TB Does Not Fit in Memory
The in-memory HashMap approach works for a toy prototype. At Facebook scale, the inverted index is 700 TB. Even at $5/GB for RAM, that is $3.5 million just for index memory -- and you need replicas. SSD storage at $0.10/GB would be $70,000. The economics force a tiered approach.
Problem 2: Scoring by Recency is Wrong
If I search "birthday party" and my best friend posted about their birthday party 3 months ago, that should rank higher than a stranger's post from yesterday. Recency alone misses the social signal entirely.
Problem 3: 580K Posting List Updates Per Second
Each new post contains ~50 tokens. Each token requires appending to a posting list. That is 580K append operations per second, all requiring lock coordination if the index is shared. A single-threaded approach cannot keep up.
Problem 4: No Phrase Search
The naive inverted index tells you which posts contain "Taylor" AND "Swift" but not which posts contain "Taylor Swift" as an adjacent phrase. The post "Swift delivery of Taylor's package" would be a false positive.
Problem 5: Privacy
Not all posts are visible to all users. A search by user A should never return a private post by user B unless A has access. Filtering at query time means the index must either store access control lists per posting or filter after retrieval.
Real Design

Architecture Overview
```text
                     ┌─────────────────────┐
                     │    Query Service    │
                     │  (parse, rewrite,   │
                     │   fan out, merge)   │
                     └──────────┬──────────┘
                                │
              ┌─────────────────┼─────────────────┐
              │                 │                 │
        ┌─────┴─────┐     ┌─────┴─────┐     ┌─────┴─────┐
        │   Index   │     │   Index   │     │   Index   │
        │  Shard 0  │     │  Shard 1  │     │  Shard N  │
        │ (RAM+SSD) │     │ (RAM+SSD) │     │ (RAM+SSD) │
        └─────┬─────┘     └─────┬─────┘     └─────┬─────┘
              │                 │                 │
              └─────────────────┼─────────────────┘
                                │
                      ┌─────────┴─────────┐
                      │ Indexing Pipeline │
                      │   (Kafka + CDC)   │
                      └─────────┬─────────┘
                                │
                      ┌─────────┴─────────┐
                      │    Post Store     │
                      │ (MySQL/Postgres)  │
                      └───────────────────┘
```
Component 1: Tokenization Pipeline
When a post is created, it flows through a tokenization pipeline before entering the index.
Steps:
- Tokenize: Split on whitespace and punctuation, keeping in-word apostrophes. "I'm going to Taylor Swift's concert!" becomes ["I'm", "going", "to", "Taylor", "Swift's", "concert"].
- Normalize: Lowercase and handle Unicode (accents, CJK characters). "Taylor" -> "taylor".
- Stop word removal: Remove words that appear in >50% of all posts ("the", "is", "a", "to"). These words have near-zero IDF and just bloat posting lists.
- Stemming: Reduce to root form. "running" -> "run", "concerts" -> "concert". The Porter stemmer is the classic algorithm; Elasticsearch's built-in english analyzer applies a Porter-based stemmer, although its default standard analyzer does not stem.
- Emit: For each (term, post_id, position) triple, send to the indexing service.
A 50-word post after stop word removal yields ~30 terms. Each term generates one posting entry. At 1 billion posts/day, that is 30 billion posting entries per day, or ~350,000 per second. This is manageable with sharding.
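The pipeline steps can be sketched in Python, under a few stated assumptions:

- The stop-word list is a tiny hypothetical one.
- The possessive "'s" is stripped during normalization; the steps above do not specify this.
- `crude_stem` is a toy suffix stripper standing in for the Porter stemmer. It is not an implementation of Porter.
- Positions are counted before stop words are dropped, so two surviving words are adjacent only if they were adjacent in the original text.

```python
import re

STOP_WORDS = {"the", "is", "a", "to", "i", "i'm"}  # hypothetical, tiny list
TOKEN_RE = re.compile(r"[\w']+")                   # words, keeping in-word apostrophes


def crude_stem(term: str) -> str:
    """Toy suffix stripper standing in for the Porter stemmer (it is NOT Porter)."""
    for suffix in ("ing", "ed", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            stem = term[: -len(suffix)]
            # Undouble a trailing consonant: "runn" -> "run".
            if stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return term


def analyze(post_id: int, text: str) -> list:
    """Tokenize -> normalize -> remove stop words -> stem -> emit (term, post_id, position)."""
    postings = []
    for position, token in enumerate(TOKEN_RE.findall(text)):
        term = token.lower()              # normalize case
        if term.endswith("'s"):           # strip possessive (assumption)
            term = term[:-2]
        if not term or term in STOP_WORDS:
            continue                      # dropped; later positions are not shifted
        postings.append((crude_stem(term), post_id, position))
    return postings
```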
Component 2: The Inverted Index (Posting Lists)
Each posting list entry stores more than just the post ID:
- post_id: Unique identifier for the post.
- frequency: How many times this term appears in this post. Used for BM25 scoring.
- positions: Where in the post the term appears (word offsets). Needed for phrase queries.
- static_rank: A pre-computed, query-independent quality score (based on author popularity, post engagement, etc.). Used for top-K retrieval without scanning the full list.
Why positions matter: For the query "Taylor Swift" (phrase search), after intersecting posting lists for "taylor" and "swift", you check: does any post have "taylor" at position P and "swift" at position P+1? Without positions, you cannot answer phrase queries.
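Here is a minimal sketch of the positional check. It assumes a hypothetical in-memory layout where each term maps to `{post_id: sorted positions}`. For brevity it finds candidate posts with Python set intersection rather than the two-pointer merge; the adjacency test is the part that matters.

```python
def phrase_match(index: dict, phrase: list) -> list:
    """Return post IDs where the phrase terms appear at consecutive positions.

    index: term -> {post_id: sorted list of positions} (hypothetical layout).
    """
    if not phrase or any(term not in index for term in phrase):
        return []
    # AND step: a candidate post must contain every phrase term.
    candidates = set(index[phrase[0]])
    for term in phrase[1:]:
        candidates &= set(index[term])
    matches = []
    for post_id in sorted(candidates):
        later = [set(index[term][post_id]) for term in phrase[1:]]
        # The phrase starts at p if the (k+1)-th later term sits at p + k + 1.
        if any(all(p + k + 1 in positions for k, positions in enumerate(later))
               for p in index[phrase[0]][post_id]):
            matches.append(post_id)
    return matches
```

With post 9 being "Swift delivery of Taylor's package" ("swift" at 0, "taylor" at 3), only the true phrase match survives.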
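Why static rank matters: Posting lists for common words contain billions of entries. You cannot scan them all. If the list is ordered by static rank (most important posts first), you can stop scanning after finding enough high-quality matches. This is how Facebook Unicorn avoids scanning entire posting lists -- the ordering is by static rank, not by raw post ID. Intersection still works because every list uses the same order: Unicorn pairs each document ID with a static-rank "sort-key", so walking the lists in ID order visits the highest-ranked posts first.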
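Early termination over a rank-ordered list can be sketched as follows. The sketch assumes each entry is a `(static_rank, post_id)` pair already sorted highest rank first, and that `matches` is a hypothetical predicate for the rest of the query (other terms, filters). The default scan cap of 10,000 mirrors the cap suggested in the failure analysis.

```python
def top_k_by_static_rank(posting_list, matches, k, max_scan=10_000):
    """Scan a rank-ordered posting list and stop as soon as k hits are found.

    posting_list: iterable of (static_rank, post_id), highest rank first (assumed).
    matches: hypothetical predicate applying the rest of the query to a post ID.
    """
    results = []
    for scanned, (rank, post_id) in enumerate(posting_list):
        if scanned >= max_scan or len(results) == k:
            break  # enough high-quality hits, or scan budget exhausted
        if matches(post_id):
            results.append((rank, post_id))
    return results
```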
Component 3: Two-Tier Storage (RAM + SSD)
Facebook's actual architecture:
- Tier 0 (RAM): Hot data. Recent posts (last 7-30 days), high-engagement posts, frequently accessed posting lists. Size: ~50-100 TB across the cluster.
- Tier 1 (SSD): Warm data. Older posts, infrequently accessed posting lists. Size: ~600+ TB. SSD provides ~100 microsecond random reads, which is fast enough for search but 1000x slower than RAM.
Why not just add more RAM? Economics. At Facebook's scale, the RAM cost difference is tens of millions of dollars. SSD is the right trade-off: 100 microsecond reads are fine for the tail of a posting list that you rarely scan past the first few hundred entries.
The query path: When processing a query, the index server first checks RAM for the posting list. If the list head (high static-rank entries) is in RAM, it can often answer the query without touching SSD at all. Only if the query needs to scan deeper into the list (rare -- most queries are satisfied by the top few hundred entries) does it fall through to SSD.
Component 4: Document Partitioning (Sharding by Post ID)
There are two ways to shard an inverted index:
Option A: Term Partitioning (shard by keyword)
- Shard 0 holds all posting lists for terms A-F
- Shard 1 holds all posting lists for terms G-M
- etc.
- A single-term query hits exactly one shard. Fast reads.
- But: creating a single post requires updating posting lists across MANY shards (one per distinct term in the post). A 50-word post touches up to ~30 shards. Write amplification is brutal.
Option B: Document Partitioning (shard by post ID)
- Shard 0 holds posts 1-1M and their complete local inverted index
- Shard 1 holds posts 1M-2M and their complete local inverted index
- etc.
- Creating a post requires updating only ONE shard. Fast writes.
- But: every search query must fan out to ALL shards (scatter-gather). Each shard returns its local top-K, and the query service merges them.
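The write fan-out difference is easy to see with hash placement. The options above describe ranges for document partitioning; this sketch hashes instead, which is an assumption. It uses `zlib.crc32` because Python's built-in `hash()` for strings is randomized per process. `NUM_SHARDS` and the helper names are hypothetical.

```python
import zlib

NUM_SHARDS = 100  # hypothetical cluster size


def shard_for(key: str) -> int:
    """Stable shard assignment (crc32 is deterministic across processes)."""
    return zlib.crc32(key.encode()) % NUM_SHARDS


def shards_for_terms(terms) -> set:
    """Term partitioning: each term's posting list lives on its own shard."""
    return {shard_for("term:" + t) for t in terms}


def shard_for_post(post_id: int) -> int:
    """Document partitioning: the whole post is indexed on a single shard."""
    return shard_for("post:" + str(post_id))


# Writing one post: up to len(terms) shards vs exactly one shard.
# Reading: term partitioning touches shards_for_terms(query_terms);
# document partitioning must ask all NUM_SHARDS shards.
```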
Facebook, Elasticsearch, Solr, MongoDB, and essentially every production search system use document partitioning. The reason is simple: at 580K posting list updates per second, write amplification from term partitioning would be catastrophic. Scatter-gather for reads is acceptable because each shard's local query is fast, and the merge is cheap (just combine sorted lists).
The scatter-gather trade-off: with 100 shards, every query hits 100 servers. The p99 latency of the query is determined by the slowest shard. This is tail latency amplification -- with 100 shards and a 1% chance of any shard being slow, there is a 63% chance (1 - 0.99^100) that at least one shard is slow. Mitigation: hedged requests -- send the query to one replica, and if it has not answered within a short delay (for example, the shard's 95th-percentile latency), send a second copy to another replica and take whichever response arrives first.
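The 63% figure, and the deferred form of hedging described in Google's "The Tail at Scale", can be sketched with `asyncio`. `primary` and `backup` stand for hypothetical zero-argument coroutine functions that query two replicas of the same shard. `hedge_after` would typically be set near the shard's 95th-percentile latency.

```python
import asyncio


def p_any_slow(num_shards: int, p_slow: float) -> float:
    """Probability that at least one of num_shards independent shards is slow."""
    return 1 - (1 - p_slow) ** num_shards


async def hedged(primary, backup, hedge_after: float):
    """Ask primary; if it has not answered within hedge_after seconds, also ask backup.

    Returns the first result to arrive and cancels the slower request.
    """
    first = asyncio.create_task(primary())
    done, _ = await asyncio.wait({first}, timeout=hedge_after)
    if done:
        return first.result()          # fast path: no extra request sent
    second = asyncio.create_task(backup())
    done, pending = await asyncio.wait({first, second},
                                       return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()                  # drop the loser
    return done.pop().result()
```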
Component 5: Query Processing
Step 1: Parse and rewrite
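The query "Taylor Swift concert 2024" is parsed, using the default AND semantics, into AND(taylor, swift, concert, 2024).
If the user wraps a phrase in quotes, "Taylor Swift" concert becomes AND(PHRASE(taylor, swift), concert). The phrase clause requires "taylor" and "swift" at adjacent positions.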
Query rewriting may add clauses: if the user is in New York, add a soft preference for posts geotagged in New York.
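Here is a minimal parser for the two forms above. It produces a nested tuple. The `("AND", ...)`, `("TERM", ...)`, and `("PHRASE", ...)` representation is hypothetical. Lowercasing stands in for the full tokenization pipeline, so there is no stemming or stop-word removal here.

```python
import re

# A quoted span, or a run of non-space characters.
QUERY_RE = re.compile(r'"([^"]*)"|(\S+)')


def parse_query(query: str) -> tuple:
    """Parse a query into ("AND", clause, ...) with TERM and PHRASE clauses."""
    clauses = []
    for phrase, word in QUERY_RE.findall(query):
        words = phrase.lower().split() if phrase else [word.strip('"').lower()]
        words = [w for w in words if w]
        if len(words) > 1:
            clauses.append(("PHRASE", *words))
        elif words:
            clauses.append(("TERM", words[0]))
    return ("AND", *clauses)
```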
Step 2: Fan out to shards
The query is sent to all index shards in parallel. Each shard:
- Looks up posting lists for each term
- Intersects them (two-pointer merge, starting with the shortest list)
- For phrase queries, checks position adjacency
- Scores the top-K results using BM25 + social signals
- Returns the local top-K (typically K=100 or K=500) to the query service
Step 3: Merge and re-rank
The query service collects results from all shards, merges them into a single sorted list, applies global re-ranking (cross-shard signals, diversity, privacy filtering), and returns the top 20 to the client.
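The merge in Step 3 is a k-way merge of already-sorted lists. Here is a sketch with `heapq.merge`. It assumes each shard returns `(score, post_id)` pairs sorted by descending score, and it leaves out global re-ranking and privacy filtering.

```python
import heapq
from itertools import islice


def merge_shard_results(shard_results, top_n=20):
    """Merge per-shard top-K lists into a global top-N.

    shard_results: list of lists of (score, post_id), each sorted by descending score.
    """
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, top_n))  # lazily take only what the page needs
```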
Component 6: Scoring with BM25 + Social Signals
BM25 is the industry standard for text relevance scoring. It improves over naive TF-IDF in two ways:
- Term frequency saturation: The 100th occurrence of "taylor" in a post does not make it 100x more relevant than a post with 1 occurrence. BM25 applies a saturation curve controlled by parameter k1 (default 1.2).
- Document length normalization: A 10,000-word post mentioning "taylor" 5 times is less focused than a 20-word post mentioning it 5 times. Parameter b (default 0.75) controls how much to penalize longer documents.
BM25 formula for a query Q with terms q1, ..., qn and document D:
score(D, Q) = SUM over qi of:
IDF(qi) * [f(qi, D) * (k1 + 1)] / [f(qi, D) + k1 * (1 - b + b * |D| / avgdl)]
Where:
- f(qi, D) = frequency of term qi in document D
- |D| = length of document D in tokens
- avgdl = average document length across all posts
- N = total number of posts in the index
- df(qi) = number of posts containing qi
- IDF(qi) = log((N - df(qi) + 0.5) / (df(qi) + 0.5) + 1)
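Here is a direct Python transcription of the formula above, with `k1 = 1.2` and `b = 0.75` as defaults. The function names and the dictionary inputs (`doc_tf` for in-document frequencies, `df` for document frequencies) are illustrative.

```python
import math


def idf(n_docs: int, df: int) -> float:
    """IDF(q) = log((N - df + 0.5) / (df + 0.5) + 1)."""
    return math.log((n_docs - df + 0.5) / (df + 0.5) + 1)


def bm25(query_terms, doc_tf, doc_len, avgdl, n_docs, df, k1=1.2, b=0.75):
    """BM25 score of one document for a bag of query terms.

    doc_tf: term -> frequency in this document; df: term -> document frequency.
    """
    norm = k1 * (1 - b + b * doc_len / avgdl)  # length-normalized saturation constant
    score = 0.0
    for term in query_terms:
        f = doc_tf.get(term, 0)
        if f == 0 or term not in df:
            continue  # absent terms contribute nothing
        score += idf(n_docs, df[term]) * f * (k1 + 1) / (f + norm)
    return score
```

Because f appears in both the numerator and the denominator, a post at the average length that repeats "taylor" 100 times scores less than k1 + 1 = 2.2 times a post that mentions it once.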
Social signal overlay: BM25 gives you text relevance. For social search, you add:
final_score = w1 * bm25_score
+ w2 * social_closeness(searcher, author)
+ w3 * engagement_score(likes, shares, comments)
+ w4 * freshness_decay(post_age)
Here, social_closeness is a score derived from the graph distance between the searcher and the post author (friend = 1.0, friend-of-friend = 0.3, stranger = 0.0). This is the signal that makes social search fundamentally different from web search.
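Here is a sketch of the linear combination. The weights, the log damping of engagement, and the 72-hour freshness half-life are all assumptions chosen for illustration. Only the closeness values (1.0, 0.3, 0.0) come from the text.

```python
import math

# Hypothetical weights; a real system would learn or tune these.
W_BM25, W_SOCIAL, W_ENGAGEMENT, W_FRESHNESS = 1.0, 3.0, 0.5, 1.0
CLOSENESS = {1: 1.0, 2: 0.3}  # graph hops -> closeness; anything else is a stranger


def social_closeness(hops):
    return CLOSENESS.get(hops, 0.0)


def engagement_score(likes, shares, comments):
    # log1p damps viral outliers (assumption, not from the source).
    return math.log1p(likes + shares + comments)


def freshness_decay(age_hours, half_life_hours=72.0):
    # Exponential decay with an assumed 3-day half-life.
    return 0.5 ** (age_hours / half_life_hours)


def final_score(bm25_score, hops, likes, shares, comments, age_hours):
    return (W_BM25 * bm25_score
            + W_SOCIAL * social_closeness(hops)
            + W_ENGAGEMENT * engagement_score(likes, shares, comments)
            + W_FRESHNESS * freshness_decay(age_hours))
```

With these weights, a friend's three-month-old post outranks a stranger's day-old post that has 20x the likes, matching the birthday-party example. Tuning the weights is what decides trade-offs like this.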
Facebook Unicorn's approach: scoring uses "a few hundred well-engineered features combined in a linear model." They deliberately limit feature count and invest in feature engineering rather than model complexity.
Component 7: Real-Time Indexing via CDC
Posts must be searchable within 10 seconds of creation. The pipeline:
- Post is written to the primary data store (MySQL at Facebook).
- A CDC (Change Data Capture) system -- Facebook uses Wormhole, you might use Debezium -- captures the write from the MySQL binlog.
- The change event is published to Kafka.
- An indexing consumer reads from Kafka, tokenizes the post, and updates the in-memory index on the appropriate shard.
- The post is now searchable.
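The consumer's job can be sketched as below. The event shape is hypothetical, loosely modeled on a row-change event that carries a per-row version. For brevity, the index uses Python sets instead of sorted posting lists. The version check matters because Kafka consumers can see the same event more than once; ignoring stale or duplicate versions makes redelivery harmless.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChangeEvent:
    """Hypothetical CDC event for one post row."""
    post_id: int
    version: int          # monotonically increasing per row (assumption)
    text: Optional[str]   # None means the row was deleted


@dataclass
class ShardIndex:
    postings: dict = field(default_factory=dict)  # term -> set of post IDs
    versions: dict = field(default_factory=dict)  # post ID -> last applied version
    terms_of: dict = field(default_factory=dict)  # post ID -> terms indexed for it

    def apply(self, event: ChangeEvent) -> bool:
        """Apply an event idempotently; returns False for stale or duplicate events."""
        if event.version <= self.versions.get(event.post_id, -1):
            return False  # redelivered or out-of-date event
        for term in self.terms_of.pop(event.post_id, set()):
            self.postings[term].discard(event.post_id)  # remove the old version
        if event.text is not None:
            terms = set(event.text.lower().split())   # stand-in for the tokenizer
            for term in terms:
                self.postings.setdefault(term, set()).add(event.post_id)
            self.terms_of[event.post_id] = terms
        self.versions[event.post_id] = event.version
        return True
```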
Why CDC over application-level dual writes? Dual writes (application writes to both DB and index) are fragile. If the DB write succeeds but the index write fails, the post exists but is not searchable. If the index write succeeds but the DB write fails, search returns a phantom post. CDC guarantees that the index follows the DB -- the DB is the source of truth.
Facebook's numbers: Wormhole processes 50 million messages per second at 35 GB/sec steady state. ~4.3 trillion messages per day. This is the backbone that keeps the search index within seconds of the primary data store.
Component 8: Privacy Filtering
Facebook Unicorn does NOT store privacy information in its index. This is a deliberate architectural decision.
Why not filter inside the index at query time? Privacy rules are complex and change frequently. A post might be visible to friends, friends-of-friends, members of a group, or custom lists. Encoding all of this into the index would create massive write amplification (every privacy change would require re-indexing) and couple the search system to the privacy system.
The solution: Unicorn returns results with lineage metadata (how the result was found). The PHP frontend makes the actual privacy check for each result before displaying it to the user. If Unicorn returns 500 results and 400 are filtered by privacy, the user sees 100. This means the search system may do wasted work, but the privacy checks remain centralized, correct, and maintainable.
Trade-off: You over-fetch from the index (request more results than needed, expecting some to be filtered). If the privacy filter rate is high (e.g., 80% of results are filtered), you need to request 5x more results from the index than you show. This increases index server load but keeps the architecture clean.
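Both halves of this trade-off fit in a few lines:

- `overfetch_count` turns an expected filter rate into a request size (80% filtered means 5x).
- `render_results` applies a per-result check that fails closed, matching the failure-analysis row for a privacy service outage.

`can_view` is a hypothetical stand-in for the privacy service.

```python
import math


def overfetch_count(page_size: int, filter_rate: float) -> int:
    """Candidates to request so that about page_size survive privacy filtering."""
    if not 0 <= filter_rate < 1:
        raise ValueError("filter_rate must be in [0, 1)")
    # round() absorbs float noise such as 1 - 0.8 == 0.19999999999999996.
    return math.ceil(round(page_size / (1 - filter_rate), 9))


def render_results(candidates, viewer_id, can_view, page_size):
    """Render-time privacy check; returns None (fail closed) if the check errors.

    can_view(viewer_id, post_id) -> bool is a hypothetical privacy-service call.
    """
    visible = []
    for post_id in candidates:
        try:
            allowed = can_view(viewer_id, post_id)
        except Exception:
            return None  # caller shows "Search temporarily unavailable"
        if allowed:
            visible.append(post_id)
            if len(visible) == page_size:
                break
    return visible
```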
Deep Dives

Deep Dive 1: Skip Pointers for Posting List Intersection
The basic two-pointer intersection is O(m + n). For posting lists with millions of entries, this is slow. Skip pointers are the standard optimization.
Concept: Augment the posting list with "shortcuts" that jump ahead. Every sqrt(L) entries, insert a pointer to the entry sqrt(L) positions ahead.
```text
Posting list for "taylor":
[3, 8, 15, 22, 41, 55, 67, 80, 93, 107, 120, ...]
 ^         ^           ^
 skip->22  skip->67    skip->107
```
During intersection: When scanning list B for a match with the current element of list A, check if the skip pointer's target is still less than or equal to list A's current element. If so, take the skip -- you just avoided scanning several entries.
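Impact: When list A (length m) is much shorter than list B (length n), most of B is crossed by skips instead of entry by entry. The work on B drops from O(n) to roughly O(sqrt(n)) skips plus at most one block of ~sqrt(n) entries per element of A. That is a large win for rare-term AND common-term queries, and little or no win when the two lists have similar lengths.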
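Single-level skip pointers on the longer list can be sketched as below; Lucene's multi-level scheme is not attempted. Skips are placed every `isqrt(len)` entries, and a skip is taken only when its target is still less than or equal to the current element of the shorter list. The function names are illustrative.

```python
import math


def build_skips(postings: list) -> dict:
    """Skip pointers every ~sqrt(L) entries: source index -> target index."""
    step = max(1, math.isqrt(len(postings)))
    return {i: i + step for i in range(0, len(postings) - step, step)}


def intersect_with_skips(short: list, long: list) -> list:
    """Two-pointer intersection that can jump ahead in the longer list."""
    skips = build_skips(long)
    result, i, j = [], 0, 0
    while i < len(short) and j < len(long):
        if short[i] == long[j]:
            result.append(short[i])
            i += 1
            j += 1
        elif short[i] < long[j]:
            i += 1
        elif j in skips and long[skips[j]] <= short[i]:
            j = skips[j]  # safe jump: target does not overshoot short[i]
        else:
            j += 1
    return result
```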
Lucene's implementation: Lucene uses multi-level skip lists (like a skip list data structure, not just single-level skip pointers). This gives O(log N) skip cost even for very long posting lists.
When to discuss this: When the interviewer asks "how do you make intersection fast?" or when you are explaining why your system can handle posting lists with billions of entries. Skip pointers are what make the two-pointer merge practical at scale.
Deep Dive 2: Document Partitioning vs. Term Partitioning Trade-offs
This is the sharding decision that drives everything else. Let me make the trade-off explicit with numbers.
Scenario: 100 shards, 1B posts/day (11,600 posts/sec), 50 tokens per post, 11,500 search queries/sec.
Term Partitioning (shard by keyword hash):
| Metric | Value |
|---|---|
| Write fan-out per post | ~30 shards (one per unique token after stop words) |
| Total write operations/sec | 11,600 * 30 = 348,000 cross-shard writes |
| Read fan-out per single-term query | 1 shard |
| Read fan-out per 3-term AND query | 3 shards (one per term) |
| Advantage | Reads hit fewer shards |
| Disadvantage | Writes hit many shards; cross-shard transaction complexity |
Document Partitioning (shard by post ID):
| Metric | Value |
|---|---|
| Write fan-out per post | 1 shard (all tokens indexed locally) |
| Total write operations/sec | 11,600 single-shard writes |
| Read fan-out per query | 100 shards (every query hits all shards) |
| Advantage | Writes are local and fast; each shard has a complete local index |
| Disadvantage | Reads hit all shards (scatter-gather); tail latency amplification |
Why document partitioning wins at Facebook scale:
- Write volume dominates: 580K posting list updates/sec would require distributed coordination across up to ~30 shards per post with term partitioning. Document partitioning makes each write local.
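- BM25 scoring works locally: each shard holds complete documents, so it can intersect and score a multi-term query on its own. It uses shard-local statistics (IDF, average document length) as an approximation of global ones, which works because each shard holds a representative sample of posts. With term partitioning, the posting lists for a three-term query live on up to three different shards and must be shipped to one place before they can be intersected and scored.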
- Scatter-gather is fine: 100 parallel queries, each completing in ~10ms and merged in ~1ms, come to roughly 15ms end to end once network hops are included. Tail latency is managed with hedged requests and timeouts.
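When the interviewer objects: "But scatter-gather is expensive!" Yes, it is. The mitigation is hedged requests: send the query to one replica, and if it has not answered within a short delay, send a second copy to another replica and take whichever responds first. In "The Tail at Scale," Jeff Dean and Luiz Barroso report a Google benchmark (reading 1,000 keys spread across 100 servers) where hedging after a 10 ms delay cut 99.9th-percentile latency from 1,800 ms to 74 ms while sending only about 2% more requests.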
Deep Dive 3: Posting List Compression
At 700 TB of index data, compression is not optional. It is the difference between 10,000 servers and 3,000 servers.
Delta encoding: Instead of storing absolute post IDs [3, 8, 15, 22, 41], store deltas [3, 5, 7, 7, 19]. Deltas are smaller numbers that require fewer bits.
Variable-byte encoding (VByte): Use 1 byte for values 0-127, 2 bytes for 128-16383, etc. Each byte uses 7 bits for data and 1 bit to indicate "more bytes follow." This is the simplest compression scheme and reduces posting list size by ~50%.
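Delta encoding followed by VByte can be sketched in Python as below. The byte layout (low-order 7 bits first, high bit set when more bytes follow) is one common convention and is an assumption here. The function names are illustrative.

```python
from itertools import accumulate


def delta_encode(post_ids: list) -> list:
    """[3, 8, 15] -> [3, 5, 7]: first ID as-is, then gaps (input must be sorted)."""
    return [cur - prev for prev, cur in zip([0] + post_ids[:-1], post_ids)]


def delta_decode(gaps: list) -> list:
    return list(accumulate(gaps))  # running sum restores absolute IDs


def vbyte_encode(numbers) -> bytes:
    out = bytearray()
    for n in numbers:
        while n >= 0x80:
            out.append((n & 0x7F) | 0x80)  # 7 data bits + "more bytes follow" flag
            n >>= 7
        out.append(n)                      # last byte: flag bit clear
    return bytes(out)


def vbyte_decode(data: bytes) -> list:
    numbers, n, shift = [], 0, 0
    for byte in data:
        n |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7                     # continuation: more bytes for this number
        else:
            numbers.append(n)
            n, shift = 0, 0
    return numbers
```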
Lucene's FOR (Frame of Reference) encoding: Split the posting list into blocks of 256 entries. Delta-encode within each block. Find the maximum delta in the block. Pack all deltas using that many bits. This achieves 2-4 bits per entry in practice, compared to 32 or 64 bits uncompressed.
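Here is a simplified sketch of the block bit-packing idea described above. Each block records one bit width (that of its largest delta) and packs every delta at that width. It omits refinements a production codec would add, and it is not Lucene's code. The function names are hypothetical.

```python
def for_block_bits(gaps: list) -> int:
    """Bits per entry for one block: the bit length of its largest delta."""
    if not gaps:
        return 0
    return max(1, max(gaps).bit_length())


def pack_block(gaps: list):
    """Pack every delta in the block at the same bit width; returns (width, bytes)."""
    width = for_block_bits(gaps)
    buf = 0
    for i, gap in enumerate(gaps):
        buf |= gap << (i * width)          # append gap at bit offset i * width
    return width, buf.to_bytes((width * len(gaps) + 7) // 8, "little")


def unpack_block(width: int, data: bytes, count: int) -> list:
    buf = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [(buf >> (i * width)) & mask for i in range(count)]
```

A 256-entry block whose deltas all fit in 4 bits packs into 128 bytes, versus 1,024 bytes as 32-bit integers.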
Impact at Facebook scale: If the average posting entry is 20 bytes uncompressed and compresses to 5 bytes, the 700 TB index becomes 175 TB. That is the difference between needing SSD storage for 700 TB and needing it for 175 TB -- roughly $50K less SSD at the $0.10/GB price used earlier, before replication.
Roaring Bitmaps: For boolean filters (e.g., "posts with photos"), Lucene uses Roaring Bitmaps. These are compressed bitmaps that efficiently represent sets of document IDs and support fast AND/OR/NOT operations. A Roaring Bitmap can represent a set of 1 billion document IDs in as little as 100 MB if the IDs are clustered.
Alternative Designs
Alternative 1: Redis-Based Index
Use Redis sorted sets where each key is a term and each member is a post ID with a score (recency or engagement).
Pros: Dead simple. Redis sorted sets support range queries, intersection via ZINTERSTORE, and top-K via ZREVRANGE. Near-zero latency for small posting lists.
Cons: Everything in memory. At 700 TB, you need ~9,000 Redis instances with 80 GB each. Redis ZINTERSTORE creates a new key for each intersection, consuming memory and requiring cleanup. No phrase search support. No BM25 scoring. No compression.
When to use: Prototypes, small-scale search (< 10M documents). Not for production social search at Facebook scale.
Alternative 2: Elasticsearch
If the interviewer lifts the "no ES" restriction: Elasticsearch does everything described above. It uses Lucene segments (immutable mini-indexes), BM25 scoring, document partitioning with scatter-gather, skip lists, FOR compression, and FST-based term dictionaries. The near-real-time search mechanism writes to a filesystem cache segment every 1 second, making new documents searchable within ~1 second.
When to use: When the interviewer says "assume you can use any off-the-shelf technology." ES is the right answer for 99% of search problems. The point of this question is to make you explain what ES does internally.
Alternative 3: Term Partitioning for Read-Heavy Workloads
If the system is overwhelmingly read-heavy and writes are infrequent (e.g., a knowledge base, not social media), term partitioning can make sense. Each query term hits exactly one shard, reducing fan-out from N shards to T shards (where T is the number of query terms). The write amplification is acceptable when writes are rare.
Scaling Math
Index Server Sizing
Total index size: 700 TB
Target per server: 2 TB SSD + 128 GB RAM (hot cache)
Servers for index: 700 / 2 = 350 servers (before replication)
Replication factor: 3 (for availability + hedged requests)
Total index servers: 350 * 3 = 1,050 servers
Query fan-out: 350 shards
Queries per second: 11,500 average, 35,000 peak
Queries per replica/sec: 35,000 / 3 replicas = ~11,700 peak (every query hits every shard, so each shard sees all 35,000 QPS and its replicas split them)
Per-shard query latency: ~5-10 ms (RAM hit), ~20-50 ms (SSD hit)
Merge + re-rank: ~5 ms
Total e2e latency: ~25-60 ms p50, ~100-200 ms p99
Indexing Throughput
New posts/sec: 11,600
Tokens per post: 50 avg, 30 after stop word removal
Posting list updates: 11,600 * 30 = 348,000/sec
Per-shard updates: 348,000 / 350 = ~1,000/sec per shard
Each update: ~100 bytes (term + post_id + freq + positions)
Per-shard write B/W: 100 KB/sec (trivial)
Kafka Pipeline
Post events/sec: 11,600
Event size: ~2 KB (post text + metadata)
Kafka throughput: ~23 MB/sec (a single broker handles this easily)
Partitions: 50-100 (each partition is read by one consumer in the indexing consumer group, so this caps consumer parallelism)
Consumer lag target: < 5 seconds
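The three blocks above can be recomputed in one place. Below is a minimal Python sketch using the same inputs. Peak QPS is divided by replicas, not shards, because every query visits every shard.

```python
import math

# Inputs from the sizing blocks above.
INDEX_TB, SSD_TB_PER_SERVER, REPLICAS = 700, 2, 3
PEAK_QPS = 35_000
POSTS_PER_SEC, TERMS_PER_POST, BYTES_PER_UPDATE = 11_600, 30, 100
EVENT_BYTES = 2_000

shards = math.ceil(INDEX_TB / SSD_TB_PER_SERVER)      # one shard per server's SSD
servers = shards * REPLICAS
qps_per_replica = PEAK_QPS / REPLICAS                 # each shard sees every query
updates_per_sec = POSTS_PER_SEC * TERMS_PER_POST
updates_per_shard = updates_per_sec / shards
shard_write_kb_per_sec = updates_per_shard * BYTES_PER_UPDATE / 1_000
kafka_mb_per_sec = POSTS_PER_SEC * EVENT_BYTES / 1_000_000

print(f"shards: {shards}, servers: {servers}")
print(f"peak QPS per replica: {qps_per_replica:,.0f}")
print(f"posting updates/sec: {updates_per_sec:,} ({updates_per_shard:,.0f} per shard)")
print(f"per-shard write bandwidth: {shard_write_kb_per_sec:,.0f} KB/s")
print(f"Kafka throughput: {kafka_mb_per_sec:.1f} MB/s")
```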
Failure Analysis
| Failure | Impact | Mitigation |
|---|---|---|
| Index shard goes down | 1/350 of the index is temporarily unavailable. Queries to that shard fail. | 3x replication. Query service retries on replica. Hedged requests hit multiple replicas simultaneously. |
| Kafka consumer falls behind | New posts are not searchable for minutes instead of seconds. | Monitor consumer lag. Auto-scale consumers. Alert on lag > 30 seconds. Kafka retains events for 72 hours -- consumer catches up after recovery. |
| Wormhole/CDC pipeline fails | Index diverges from primary data store. Stale or missing search results. | Periodic full index rebuild via batch MapReduce (daily or weekly) corrects drift. Monitoring compares index freshness against DB timestamps. |
| Network partition between query service and shards | Partial results returned (some shards unreachable). | Return partial results with a "results may be incomplete" flag. Users tolerate slightly degraded search better than total failure. |
| Hot posting list (celebrity posts) | A posting list for a celebrity name has billions of entries. Scanning it is slow. | Static rank ordering means you scan the top of the list (highest quality posts) and stop. Never scan the full list. Cap scan depth at 10,000 entries. |
| Memory pressure on index servers | SSD fallback increases latency. | Monitor RAM cache hit rate. If it drops below 90%, either add RAM or add more shards (spreading data across more servers reduces per-server memory pressure). |
| Privacy service outage | Cannot filter results for privacy. Showing private posts to unauthorized users is unacceptable. | Fail closed: if privacy service is unavailable, return no search results rather than unfiltered results. Users see "Search temporarily unavailable." |
Level Expectations
| Level | What the Interviewer Expects |
|---|---|
| Mid (L4) | Explain inverted index basics. Build a HashMap-based index. Show two-pointer intersection. Mention sharding. Sort by recency. This is a passing answer. |
| Senior (L5) | Everything above plus: BM25 scoring (not just recency). Document partitioning with scatter-gather. Real-time indexing via Kafka/CDC. Two-tier storage (RAM + SSD). Posting list with frequency and positions for phrase queries. Social signal integration in scoring. |
| Staff (L6) | Everything above plus: Skip pointers for intersection. Posting list compression (delta encoding, VByte, FOR). Static rank ordering to avoid full scans. Privacy filtering at render time vs. query time trade-off. Tail latency mitigation with hedged requests. Reference to Facebook Unicorn's multi-hop query support and weak-AND semantics. Explicit discussion of why document partitioning beats term partitioning. Quantified scaling math. |