Design an Online Code Judge
TL;DR
Build LeetCode -- a system where users submit code, the system compiles and runs it against test cases, and returns pass/fail results with execution time and memory usage. Sandboxing is THE challenge. User-submitted code is hostile by default: it can attempt to read /etc/passwd, fork bomb the system, allocate 100 GB of RAM, open network connections, or exploit kernel vulnerabilities to escape the container. gVisor (Google's user-space kernel) and Firecracker (AWS's microVM) are the two production-grade solutions. Compilation is a separate dangerous phase -- C++ templates can generate gigabytes of code from a few lines. Codeforces' pretests/systests pattern (test against a subset during contest, full suite after) is a clever trade-off between real-time feedback and compute cost. The system must handle 10K concurrent submissions during a contest while guaranteeing that execution time is deterministic and reproducible.
The System
LeetCode. A user writes a Python solution to "Two Sum," clicks Submit, and within 5 seconds sees: "Accepted. Runtime: 45 ms (beats 92%). Memory: 14.2 MB (beats 78%)." Behind that simple UX, the system has: received the source code, selected the correct language runtime, compiled it (for compiled languages), created a sandboxed execution environment, run the code against 50+ test cases with strict time and memory limits, compared the output against expected results, measured execution time and memory consumption, and returned the verdict.
Why is this a system design problem and not just "run code in Docker"? Because running untrusted code is one of the most dangerous things a server can do. The user's code is adversarial. It can attempt system calls that crash the host kernel. It can try to escape the sandbox and access other users' submissions. It can consume all available CPU, memory, or disk, denying service to other users. Every competitive programming platform has war stories about contestants exploiting the judge. Codeforces has been taken down by deliberately crafted submissions. HackerRank had a container escape vulnerability. The sandbox is not a nice-to-have; it is the entire architecture.
Requirements
Functional Requirements
| Requirement | Details |
|---|---|
| Code submission | Accept source code in 10+ languages (Python, Java, C++, Go, Rust, etc.). |
| Compilation | Compile submitted code (for compiled languages) with configurable flags. |
| Test execution | Run compiled code against test cases with stdin/stdout matching. |
| Verdicts | Accepted, Wrong Answer, Time Limit Exceeded (TLE), Memory Limit Exceeded (MLE), Runtime Error (RE), Compilation Error (CE). |
| Resource measurement | Report execution time (ms) and memory usage (MB). |
| Custom test cases | Users can run code against their own input before official submission. |
Non-Functional Requirements
| Requirement | Target |
|---|---|
| Submission-to-verdict latency | < 10 seconds for most problems. < 30 seconds for hard problems with many test cases. |
| Sandboxing | Complete isolation. No access to host filesystem, network, or other submissions. |
| Deterministic timing | Same code, same test case = same execution time (within 5% variance). |
| Availability | 99.9% (higher during contests: 99.95%) |
| Scale | 10K concurrent submissions during contests, 1K during normal hours |
Back-of-Envelope Math
Normal hours:
Submissions/hour: ~50,000
Submissions/sec: ~14
Avg test cases per problem: 50
Executions/sec: 14 * 50 = 700
Contest (2-hour window, 10K participants):
Submissions/hour: ~100,000 (10K users * 5 submissions/problem * 4 problems / 2 hours)
Submissions/sec: ~28 avg, ~200 peak (end-of-contest rush)
Executions/sec: 200 * 50 = 10,000
Execution resources per submission:
Time limit: 2 seconds per test case
Memory limit: 256 MB per test case
Worst-case wall time: 50 test cases * 2 sec = 100 seconds (but parallelized)
Actual wall time: 50 test cases / 5 parallel workers = 20 seconds max
Worker requirements (contest peak):
Concurrent executions: 200 submissions/sec * 20 sec = 4,000 concurrent
Each execution: 1 CPU core + 256 MB RAM
CPU cores needed: 4,000
RAM needed: 4,000 * 256 MB = 1 TB
Compilation:
C++ compilation time: 2-10 seconds per submission
Java compilation: 1-3 seconds
Python: no compilation (interpreted)
Compilation workers: 200/sec * 5 sec avg = 1,000 concurrent
The number that matters: 4,000 concurrent execution environments, each isolated from the others, each limited to 2 seconds of CPU and 256 MB of RAM. This is a massive sandboxing challenge.
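The contest-peak arithmetic above can be recomputed with a short sketch. The function and parameter names are illustrative, not part of any judge's code. The 4,000 figure counts submissions in flight; the sketch also reports a looser worst-case ceiling (executions per second times the time limit), which assumes every test runs for the full limit.

```python
import math

def contest_peak_estimate(subs_per_sec=200, tests_per_sub=50, time_limit_s=2.0,
                          parallel_tests=5, mem_mb_per_exec=256):
    """Recompute the contest-peak figures from the inputs listed above."""
    executions_per_sec = subs_per_sec * tests_per_sub
    # Worst-case wall time per submission: tests run in rounds of `parallel_tests`.
    rounds = math.ceil(tests_per_sub / parallel_tests)
    wall_time_s = rounds * time_limit_s
    # Figure used in the text: submissions in flight at any moment.
    in_flight = int(subs_per_sec * wall_time_s)
    # Looser ceiling: every single execution runs for the full time limit.
    ceiling = int(executions_per_sec * time_limit_s)
    return {
        "executions_per_sec": executions_per_sec,
        "wall_time_s": wall_time_s,
        "in_flight": in_flight,
        "ram_mb": in_flight * mem_mb_per_exec,
        "worst_case_concurrent_executions": ceiling,
    }

estimate = contest_peak_estimate()
```

Changing `parallel_tests` or `time_limit_s` shows how sensitive the worker pool size is to per-problem limits.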
Naive Design
Docker container per submission.
Flow:
1. Receive submission (language, code, problem_id).
2. docker run --rm -v code:/submit language-image
     compile /submit/solution.cpp -o /submit/solution
3. For each test case:
     docker run --rm -i --memory=256m --cpus=1 -v code:/submit language-image
       timeout 2 /submit/solution < test_input.txt > output.txt
4. diff output.txt expected_output.txt
5. Report verdict.
This actually works for a prototype. Docker provides filesystem isolation, memory limits (--memory), and CPU limits (--cpus). What could go wrong?
Where It Breaks
Problem 1: Docker Is Not a Security Boundary
Docker containers share the host kernel. A malicious submission can exploit kernel or runtime vulnerabilities to escape the container. CVE-2019-5736 allowed a container process to overwrite the host runc binary and gain root access on the host. CVE-2020-15257 exposed the containerd-shim API socket to containers running in the host network namespace, which allowed privilege escalation out of the container. Docker is designed for packaging and deployment isolation, not security isolation of hostile code.
Problem 2: Container Startup Is Too Slow
docker run takes 500-1000ms to start a container (image pull, filesystem setup, namespace creation). For 10,000 executions/sec during a contest, you need 10,000 container starts/sec. Docker cannot handle this. You would need to pre-warm containers, but then you have 10,000 idle containers consuming memory.
Problem 3: Fork Bombs
A user submits a fork bomb: a program that calls fork() in an endless loop.
This creates processes exponentially. Even with memory limits, each process consumes a PID and kernel memory. With 65,536 PIDs available, the fork bomb exhausts the PID space in under a second and takes down the host, because no new processes can be created for any user.
Problem 4: Compilation as Attack Vector
C++ templates are Turing-complete. A user can submit:
template<int N> struct Bomb { enum { value = Bomb<N-1>::value + Bomb<N-2>::value }; };
template<> struct Bomb<0> { enum { value = 0 }; };
template<> struct Bomb<1> { enum { value = 1 }; };
int main() { return Bomb<100>::value; }
This is a relatively mild example. More extreme template metaprogramming can force the compiler to generate gigabytes of intermediate code, exhaust RAM, and crash the compilation server. The compilation phase is as dangerous as the execution phase and needs its own resource limits.
Problem 5: Non-Deterministic Timing
Docker containers on a shared host experience CPU scheduling jitter. A solution that runs in 1.95 seconds on a quiet host might take 2.05 seconds on a busy host and get TLE. Users complain: "My solution is correct but the judge is too slow!" Timing must be deterministic.
Real Design

Architecture Overview
┌──────────────┐
│   Web API    │ ── receives submissions, returns verdicts
└──────┬───────┘
       │
┌──────┴───────┐
│  Submission  │ ── queues, deduplicates, prioritizes
│    Queue     │
│  (Redis +    │
│   Kafka)     │
└──────┬───────┘
       │
┌──────┴───────────────────────────────────┐
│  Judge Workers (pool of N machines)      │
│                                          │
│ ┌──────────────────────────────────────┐ │
│ │ Sandbox Layer                        │ │
│ │  ┌───────────┐  ┌───────────┐        │ │
│ │  │  gVisor   │  │  gVisor   │  ...   │ │
│ │  │  sandbox  │  │  sandbox  │        │ │
│ │  │  (sub 1)  │  │  (sub 2)  │        │ │
│ │  └───────────┘  └───────────┘        │ │
│ │               OR                     │ │
│ │  ┌───────────┐  ┌───────────┐        │ │
│ │  │Firecracker│  │Firecracker│  ...   │ │
│ │  │  microVM  │  │  microVM  │        │ │
│ │  └───────────┘  └───────────┘        │ │
│ └──────────────────────────────────────┘ │
│                                          │
│ ┌──────────────────────────────────────┐ │
│ │ Resource Controller                  │ │
│ │ - cgroups v2 (CPU, memory, PIDs)     │ │
│ │ - seccomp (syscall filtering)        │ │
│ │ - time measurement (CPU clock)       │ │
│ └──────────────────────────────────────┘ │
└──────────────────────────────────────────┘
Component 1: Sandboxing -- gVisor vs. Firecracker
These are the two production-grade options for running untrusted code.
gVisor (Google):
gVisor implements a user-space kernel called Sentry. When the user's code makes a system call (e.g., open(), read(), write()), it does NOT reach the host Linux kernel. Instead, gVisor intercepts the syscall and implements it in user space.
Normal Docker: user code -> syscall -> host kernel -> hardware
gVisor: user code -> syscall -> gVisor Sentry (user space) -> limited host syscalls -> host kernel
Why is this safer? The host kernel is a massive attack surface (~30 million lines of code, hundreds of syscalls). gVisor exposes only ~200 syscalls to user code (vs. 300+ in Linux) and implements them in a memory-safe language (Go). A kernel exploit that works against Linux does not work against gVisor because gVisor is not Linux.
Performance cost: gVisor adds 5-30% overhead for CPU-bound workloads and 2-10x overhead for I/O-heavy workloads (because every I/O syscall passes through the Sentry). For a code judge, the workload is mostly CPU-bound (algorithms), so 5-15% overhead is typical.
Firecracker (AWS):
Firecracker runs each submission in a lightweight virtual machine (microVM). Each microVM has its own kernel. A kernel exploit inside the VM cannot escape to the host because the VM is hardware-isolated (Intel VT-x / AMD-V).
Startup time: Firecracker boots a microVM in ~125 ms (vs. ~500 ms for Docker). It can launch 150 microVMs per second per host. With pre-booted VM pools, startup is near-instant (hand a pre-warmed VM to the submission).
Performance cost: Near-native. Hardware virtualization adds < 5% overhead for CPU-bound workloads.
Trade-off:
| Property | gVisor | Firecracker |
|---|---|---|
| Isolation level | User-space kernel (process-level) | Hardware VM (kernel-level) |
| Startup time | ~50 ms (container + gVisor runtime) | ~125 ms (microVM boot) |
| CPU overhead | 5-15% | < 5% |
| I/O overhead | 2-10x | Near-native |
| Security boundary | Syscall filtering | Hardware virtualization |
| Used by | Google Cloud Run, GKE Sandbox | AWS Lambda, Kata Containers |
Recommendation for a code judge: Firecracker for maximum security (hardware isolation is harder to escape than user-space syscall filtering). gVisor if you need faster startup and can accept slightly weaker isolation.
Component 2: Resource Limiting with cgroups v2
Even inside a sandbox, you must limit resources. cgroups v2 is the Linux mechanism for this.
CPU limit:
# Limit to 1 CPU core (100,000 microseconds per 100,000 microsecond period)
echo "100000 100000" > /sys/fs/cgroup/submission_1/cpu.max
Memory limit:
# Limit to 256 MB
echo $((256 * 1024 * 1024)) > /sys/fs/cgroup/submission_1/memory.max
# Disable swap to prevent eviction to disk (which would slow down, not kill, the process)
echo 0 > /sys/fs/cgroup/submission_1/memory.swap.max
PID limit (fork bomb prevention):
# Maximum 50 processes (main process + threads + children)
echo 50 > /sys/fs/cgroup/submission_1/pids.max
This is the fork bomb defense. pids.max = 50 means the fork bomb creates 49 children and then fork() returns EAGAIN. The bomb is contained. The host PID space is unaffected.
Disk I/O limit:
# Limit to 10 MB/s read, 5 MB/s write
echo "8:0 rbps=10485760 wbps=5242880" > /sys/fs/cgroup/submission_1/io.max
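The shell commands above can be wrapped in a small helper. This is a minimal sketch: `write_limits` and its parameters are hypothetical names. On a real host the cgroup directory must already exist under /sys/fs/cgroup, the relevant controllers must be enabled in the parent's cgroup.subtree_control, and the caller needs root. Taking the directory as an argument lets the demo run against a temporary directory instead.

```python
import os
import tempfile

def write_limits(cgroup_dir, cpu_cores=1.0, memory_mb=256, max_pids=50,
                 period_us=100_000):
    """Write cgroup v2 limit files for one submission into cgroup_dir."""
    quota_us = int(cpu_cores * period_us)
    settings = {
        "cpu.max": f"{quota_us} {period_us}",        # quota and period, microseconds
        "memory.max": str(memory_mb * 1024 * 1024),  # bytes
        "memory.swap.max": "0",                      # no swap: exceed the limit, get killed
        "pids.max": str(max_pids),                   # fork bomb ceiling
    }
    for name, value in settings.items():
        with open(os.path.join(cgroup_dir, name), "w") as f:
            f.write(value)
    return settings

if __name__ == "__main__":
    # Demo against a scratch directory; a real judge would pass
    # something like /sys/fs/cgroup/submission_1 instead.
    demo_dir = tempfile.mkdtemp()
    print(write_limits(demo_dir))
```

The process to be limited still has to be moved into the cgroup (by writing its PID to cgroup.procs) before it starts running user code.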
Component 3: Compilation as a Separate Phase
Compilation is dangerous and must be sandboxed separately from execution.
Why separate?
- Different resource profile: Compiling C++ can need up to 2 GB of RAM and up to 30 seconds for template-heavy code (2-10 seconds is typical). Execution needs 256 MB and 2 seconds. Using execution limits for compilation would OOM the compiler.
- Different attack surface: Compiler exploits (malicious includes, template bombs) are distinct from runtime exploits (syscall attacks, fork bombs).
- Caching: If two users submit identical code, compile once and reuse the binary. Compilation is expensive; execution is cheap.
Compilation sandbox:
Resource limits:
CPU time: 30 seconds
Memory: 2 GB
PIDs: 100 (compiler spawns child processes for linking)
Disk write: 100 MB (compiled binary + temp files)
Network: NONE (no outbound connections during compilation)
Filesystem isolation: The compilation sandbox has access to:
- /usr/bin/g++, /usr/bin/javac (compiler binaries, read-only)
- /usr/include, /usr/lib (standard libraries, read-only)
- /submit/ (user's source code, read-only; output binary, write)
- Nothing else. No /etc/passwd, no /proc, no /dev.
Compilation cache: Hash the source code and language/flags. If the hash exists in cache, skip compilation and use the cached binary. Cache hit rate during contests: ~20% (many contestants submit the same boilerplate with small changes, but the hash includes the entire source, so even one character difference misses).
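One way to build such a cache key is sketched below. The function name and the choice of SHA-256 are assumptions for illustration; any collision-resistant hash over the language, flags, and full source would serve. Each part is length-prefixed so that different (flags, source) combinations cannot hash to the same byte stream.

```python
import hashlib

def compile_cache_key(language, flags, source):
    """Return a hex digest identifying one (language, flags, source) build."""
    h = hashlib.sha256()
    for part in [language, *flags, source]:
        data = part.encode("utf-8")
        # Length prefix keeps part boundaries unambiguous.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()

key_a = compile_cache_key("cpp", ["-O2", "-std=c++17"], "int main() { return 0; }")
key_b = compile_cache_key("cpp", ["-O2", "-std=c++17"], "int main() { return 0; }")
key_c = compile_cache_key("cpp", ["-O2", "-std=c++17"], "int main() { return 1; }")
```

Here `key_a` and `key_b` match while `key_c` differs, which is the one-character miss described above.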
Component 4: Test Case Execution Pipeline
After compilation, the binary runs against test cases.
Sequential execution (simpler, used by most judges):
def judge_sequential(binary, test_cases, time_limit, memory_limit):
    for test_case in test_cases:
        result = run_in_sandbox(binary, test_case.input, time_limit, memory_limit)
        if result.status != ACCEPTED:
            return result  # Short-circuit on first failure
    return ACCEPTED
Short-circuit: Most judges stop on the first failed test case. If test case 3 fails with Wrong Answer, they do not run test cases 4-50. This saves compute. LeetCode shows which test case failed.
Parallel execution (used during contests for speed):
Run 5-10 test cases in parallel, each in its own sandbox. If any fails, cancel the rest.
Parallel execution math:
50 test cases, 5 parallel workers
Each test case: 2 sec max
Sequential: 50 * 2 = 100 sec worst case
Parallel (5 workers): 10 rounds * 2 sec = 20 sec worst case
With short-circuit: typically 2-6 sec (fails early on wrong submissions)
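Parallel execution with short-circuiting might look like the sketch below. It assumes a caller-supplied `run_one` function (hypothetical) that runs a single test case in its own sandbox and returns a verdict string. Threads stand in for sandbox workers here. Note that with parallel runs the failure reported is the first one to complete, not necessarily the lowest-numbered test.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

ACCEPTED = "AC"

def judge_parallel(test_cases, run_one, workers=5):
    """Run test cases in parallel; stop scheduling new ones after a failure.

    Returns (verdict, index_of_failing_test or None).
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(run_one, tc): i for i, tc in enumerate(test_cases)}
        for fut in as_completed(futures):
            verdict = fut.result()
            if verdict != ACCEPTED:
                # cancel() only stops tests that have not started yet;
                # a real judge would also kill the sandboxes still running.
                for other in futures:
                    other.cancel()
                return verdict, futures[fut]
    return ACCEPTED, None

def fake_runner(tc):
    # Stand-in for a sandboxed run: test case 3 produces a wrong answer.
    return "WA" if tc == 3 else ACCEPTED

failed = judge_parallel(list(range(10)), fake_runner)
passed = judge_parallel([0, 1, 2], fake_runner)
```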
Component 5: Deterministic Timing
Users complain when their solution passes on their machine but gets TLE on the judge. Timing must be reproducible.
Problem: CPU scheduling on a shared host introduces jitter. Two runs of the same code can differ by 20-50%.
Solution 1: CPU pinning.
Pin each execution to a specific CPU core using taskset. No other processes run on that core. Eliminates scheduling jitter from other workloads.
Solution 2: Measure CPU time, not wall time.
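Use getrusage() to measure CPU time (user plus system time charged to the process) instead of wall clock time (which includes time waiting for I/O and scheduling). CPU time is far less sensitive to host load than wall time, although cache and memory-bandwidth contention can still shift it slightly.
#include <sys/resource.h>
struct rusage usage;
getrusage(RUSAGE_CHILDREN, &usage); /* totals for children that have been waited for */
long cpu_time_ms = (usage.ru_utime.tv_sec + usage.ru_stime.tv_sec) * 1000
                 + (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec) / 1000;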
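The same measurement can be sketched from a Python judge process using the standard resource module. `run_and_measure` is a hypothetical helper. It sums user and system CPU time, which is a choice of this sketch, and it assumes the judge process runs one child at a time, because RUSAGE_CHILDREN is a running total across all reaped children.

```python
import resource
import subprocess
import sys

def run_and_measure(cmd, stdin_data=b"", wall_timeout_s=10.0):
    """Run cmd to completion and return (returncode, stdout, cpu_time_ms)."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    # The wall-clock timeout is only a safety net against sleeping or blocked
    # programs; the verdict itself would be based on CPU time.
    proc = subprocess.run(cmd, input=stdin_data, capture_output=True,
                          timeout=wall_timeout_s)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user_s = after.ru_utime - before.ru_utime
    sys_s = after.ru_stime - before.ru_stime
    return proc.returncode, proc.stdout, (user_s + sys_s) * 1000.0

rc, out, cpu_ms = run_and_measure(
    [sys.executable, "-c", "print(sum(range(10**6)))"]
)
```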
Solution 3: Dedicated judge machines.
Run judge workers on dedicated machines that do not host any other workload. This is what Codeforces does during contests -- they provision bare-metal servers dedicated to judging.
LeetCode's approach: LeetCode normalizes execution times using a reference benchmark. They run a calibration program on each judge machine and compute a "speed factor." User execution times are divided by the speed factor to produce a normalized time. This allows different machine types in the judge pool.
Component 6: Codeforces Pretests/Systests Pattern
During a Codeforces contest, submissions are not judged against the full test suite.
Pretests (during contest):
- A small subset of test cases (5-15) designed to catch common errors.
- Fast judging: < 5 seconds per submission.
- Purpose: give contestants immediate feedback.
- NOT definitive: passing pretests does not guarantee correctness.
Systests (after contest ends):
- Full test suite (50-200 test cases) including edge cases, stress tests, and anti-hack tests.
- Takes 10-30 minutes to judge all submissions.
- This is when the real verdict is determined. Many submissions that passed pretests fail systests.
Why this pattern?
A 2-hour contest with 10K participants, each submitting 5 times per problem across 5 problems, produces 250K submissions. Judging each against 200 test cases means 50M executions. At a worst case of 2 seconds each, that is 100M seconds of CPU time, or 27,778 CPU-hours. Finishing within the 2-hour contest would take 13,889 CPU cores. With pretests (15 test cases), the requirement drops to 15/200 * 13,889 = 1,042 cores, a 13x reduction in peak compute.
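The compute comparison can be checked with a few lines. The helper name is illustrative; the inputs are the figures quoted in the paragraph above.

```python
def cores_needed(submissions, tests_per_submission, sec_per_test, window_hours):
    """Cores required to finish all executions inside the time window."""
    cpu_seconds = submissions * tests_per_submission * sec_per_test
    cpu_hours = cpu_seconds / 3600
    return cpu_hours / window_hours

submissions = 10_000 * 5 * 5                              # users * subs/problem * problems
full_suite_cores = cores_needed(submissions, 200, 2, 2)   # about 13,889
pretest_cores = cores_needed(submissions, 15, 2, 2)       # about 1,042
reduction = full_suite_cores / pretest_cores              # about 13.3x
```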
System design implications: The judge system has two modes:
- Contest mode: Use pretests. Optimize for latency (< 5 seconds per verdict). Queue priority: contest submissions first.
- Post-contest mode: Run systests. Optimize for throughput. Process in batch. Priority: none (batch processing).
Deep Dives

Deep Dive 1: Syscall Filtering with seccomp
Beyond gVisor/Firecracker, an additional defense layer is seccomp (Secure Computing Mode), which filters system calls at the kernel level.
seccomp-bpf: A BPF (Berkeley Packet Filter) program that runs before every syscall. Based on the syscall number and arguments, it can allow the call, fail it with an error code, or kill the process.
Allowlist for a code judge:
ALLOWED syscalls:
read, write, exit, exit_group -- basic I/O
brk, mmap, munmap, mprotect -- memory management
clock_gettime -- timing
futex -- threading
DENIED syscalls (kill process if attempted):
execve -- no spawning new processes (prevents shell escapes)
fork, clone -- no forking (fork bomb prevention at syscall level; runtimes that need threads require clone with CLONE_THREAD to stay allowed)
socket, connect, bind -- no networking
open, openat (with write flags) -- no writing to filesystem
ptrace -- no debugging other processes
mount -- no mounting filesystems
reboot -- obviously not
Defense in depth: seccomp is the last line of defense after cgroups, namespaces, and gVisor/Firecracker. Even if the sandbox is compromised, seccomp prevents the most dangerous syscalls from reaching the kernel.
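The allowlist above can be written down as a profile in the JSON layout that Docker accepts through --security-opt seccomp=. This is a sketch of the policy shape only: the builder function is hypothetical, and a working profile would need more syscalls than this list (for example, execve so the runtime can start the submitted program at all).

```python
import json

# Syscalls taken from the allowlist in this section.
ALLOWED_SYSCALLS = [
    "read", "write", "exit", "exit_group",   # basic I/O
    "brk", "mmap", "munmap", "mprotect",     # memory management
    "clock_gettime",                         # timing
    "futex",                                 # threading
]

def build_seccomp_profile(allowed=ALLOWED_SYSCALLS):
    """Default-deny profile: anything not listed kills the offending task."""
    return {
        "defaultAction": "SCMP_ACT_KILL",
        "syscalls": [
            {"names": sorted(allowed), "action": "SCMP_ACT_ALLOW"},
        ],
    }

profile_json = json.dumps(build_seccomp_profile(), indent=2)
```

Because the default action is to kill, the denied list (execve, socket, ptrace, and so on) does not need to be spelled out; it documents intent rather than adding rules.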
Deep Dive 2: Language-Specific Challenges
Each programming language has unique sandboxing challenges.
Python:
- import os; os.system("rm -rf /") -- must block os.system, subprocess, ctypes.
- Solution: seccomp blocks execve. The os.system call fails with EPERM.
- BUT: Python's ctypes module can call libc functions directly, bypassing Python-level restrictions. seccomp at the kernel level is the only reliable defense.
Java:
- The JVM itself is a large runtime. It needs 100+ MB of RAM just to start.
- JVM startup time: 500-2000 ms. For a 2-second time limit, this is a significant fraction.
- Solution: pre-warm JVMs. Keep a pool of started JVMs waiting for submissions. Load user code via classloader.
- Java's SecurityManager (deprecated in JDK 17) was historically used for sandboxing. With its deprecation, external sandboxing (gVisor, seccomp) is essential.
C/C++:
- Inline assembly allows direct syscall invocation, bypassing libc. syscall(SYS_socket, ...) can create a network socket even if socket() is restricted at the library level.
- Solution: seccomp at the kernel level catches ALL syscalls, regardless of how they are invoked.
- #include </etc/shadow> -- the preprocessor reads arbitrary files during compilation. The compilation sandbox must restrict filesystem access.
Go, Rust:
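- Go typically produces statically linked binaries. Rust links its crates statically but links libc dynamically unless built for a musl target. Both have few runtime dependencies and are clean to sandbox.
- Go's goroutines use threads internally, so pids.max must be generous (200+) for Go programs.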
Deep Dive 3: Submission Queue and Priority
During a contest, the judge must balance fairness, latency, and throughput.
Priority scheme:
Priority 1 (highest): Contest submissions from users who have not received any verdict yet.
These users are blocked, waiting for feedback.
Priority 2: Contest submissions from users who have at least one verdict.
They can work on other problems while waiting.
Priority 3: Practice submissions (non-contest).
Priority 4: Custom test case runs (lowest priority).
Queue implementation: Redis sorted set where the score is (priority * 10^12) + submission_timestamp. Lower score = higher priority. Workers call ZPOPMIN on the sorted set to get the next submission to judge.
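The scoring rule can be tried out without a Redis server. In this sketch a heap stands in for the sorted set and `pop_min` plays the role of ZPOPMIN; the class and function names are hypothetical. It assumes Unix timestamps in seconds, which stay well below 10^12, so a later submission can never spill into the next priority band.

```python
import heapq

PRIORITY_SCALE = 10**12

def score(priority, submitted_at_s):
    """Lower score is served first: priority band, then arrival order."""
    return priority * PRIORITY_SCALE + submitted_at_s

class SubmissionQueue:
    """In-memory stand-in for the Redis sorted set."""

    def __init__(self):
        self._heap = []

    def push(self, submission_id, priority, submitted_at_s):
        heapq.heappush(self._heap, (score(priority, submitted_at_s), submission_id))

    def pop_min(self):
        # Equivalent in spirit to ZPOPMIN: remove and return the lowest score.
        return heapq.heappop(self._heap)[1]

q = SubmissionQueue()
q.push("practice-1", priority=3, submitted_at_s=1_700_000_000)
q.push("contest-retry", priority=2, submitted_at_s=1_700_000_005)
q.push("contest-first-late", priority=1, submitted_at_s=1_700_000_010)
q.push("contest-first-early", priority=1, submitted_at_s=1_700_000_002)
order = [q.pop_min() for _ in range(4)]
```

The resulting order serves both Priority 1 submissions by arrival time before anything from a lower band.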
Rate limiting: During contests, limit each user to 1 pending submission per problem. If they have a submission being judged, they cannot submit again until the verdict is returned. This prevents a single user from flooding the queue.
Rejudging: If a test case is found to be wrong (incorrect expected output), the system must rejudge all submissions for that problem. This is a batch operation that runs at Priority 3, after current contest submissions.
Alternative Designs
| Approach | Pros | Cons | When to Use |
|---|---|---|---|
| gVisor + cgroups + seccomp (described above) | Strong isolation. Fast startup. Production-proven at Google. | gVisor overhead (5-15%). Complex setup. | LeetCode, HackerRank, production judges. |
| Firecracker microVMs | Hardware-level isolation. Near-native performance. | Higher startup time (125 ms). More memory overhead per VM. | Maximum security environments. AWS Lambda uses this. |
| Docker with seccomp profile | Simple. Familiar tooling. Adequate for low-risk environments. | Docker is NOT a security boundary. Container escape CVEs exist. | Internal tools, low-stakes coding challenges, prototype judges. |
| Remote code execution API (Judge0, Sphere Engine) | Fully managed. Zero infrastructure. API-based. | Latency (network round trip). Cost per execution. Rate limits. Less control. | Startups, hackathon projects, when judging is not core product. |
| WebAssembly (Wasm) sandbox | Memory-safe by design. No syscall access. Fast startup (< 1 ms). | Limited language support (C/C++/Rust compile to Wasm natively, but Java/Python do not). No filesystem or network access. | Browser-based judges. When you need the strongest sandbox with the narrowest language support. |
Scaling Math Verification
Judge Worker Pool
Contest peak: 200 submissions/sec
Test cases per submission: 15 (pretests during contest)
Executions/sec: 200 * 15 = 3,000
Execution time per test: 2 sec max, 0.5 sec avg
Concurrent executions: 3,000 * 0.5 = 1,500 concurrent
CPU cores per execution: 1
Workers needed: 1,500 cores / 16 cores per machine = 94 machines
With Firecracker:
Memory per VM: 256 MB (submission) + 100 MB (VM overhead) = 356 MB
VMs per machine (64 GB): 64,000 / 356 = ~180 concurrent VMs
But CPU-limited to 16 -> 16 concurrent VMs per machine
Machines needed: 1,500 / 16 = 94 machines (CPU-bound, not memory-bound)
Systests (Post-Contest)
Total submissions to systest: 250,000 (10K users * 5 subs/problem * 5 problems)
Test cases per submission: 200 (full suite)
Total executions: 250K * 200 = 50 million
Execution time: 0.5 sec avg
Total CPU time: 25M seconds = 6,944 CPU-hours
Time budget: 1 hour (users want results quickly)
Cores needed: 6,944 cores
Machines (16 cores): 434 machines
Cloud cost: 434 * $0.10/hr * 1 hr = $43.40 per contest systest
Storage
Submission source code: ~5 KB average
Compiled binary: ~1 MB average
Test case data: ~50 KB per test case * 50 test cases * 1,000 problems = 2.5 GB
Submission metadata: ~500 bytes per submission
Daily submissions: ~100K
Daily storage: 100K * (5 KB + 1 MB + 500 B) = ~100 GB
Annual: 36 TB
Retention policy: Keep source code forever. Delete compiled binaries after 30 days.
After cleanup: ~3 TB of binaries at any time (30 days * 100 GB) + ~0.2 TB/year of source code and metadata
Failure Analysis
| Failure | Impact | Mitigation |
|---|---|---|
| Sandbox escape | Attacker gains access to host. Can read other submissions, modify judge results, or pivot to other systems. | Defense in depth: gVisor/Firecracker + seccomp + cgroups + namespace isolation. Regular security audits. Bug bounty program. Immediate container/VM termination on anomaly. |
| Fork bomb exhausts host PIDs | Judge host cannot create new processes. All submissions on that host fail. | pids.max in cgroups limits per-submission PID count. seccomp blocks fork() entirely for single-threaded problems. |
| Compilation bomb (template explosion) | Compiler OOMs. Compilation worker crashes. | Memory and time limits on compilation (2 GB, 30 seconds). Kill compiler process on limit exceeded. Return CE verdict. |
| Judge worker crashes mid-execution | Submission gets no verdict. User waits indefinitely. | Timeout on the submission queue. If no verdict within 60 seconds, mark as "judge error" and requeue on a different worker. |
| Non-deterministic timing | Correct solution gets TLE on one run but passes on another. User complains. | CPU pinning + CPU time measurement (not wall time). Dedicated judge machines. Timing calibration factor. Allow 2 retries for borderline TLE. |
| Test case is wrong | Correct submissions marked as Wrong Answer. | Manual test case validation before contests. Allow contestants to challenge test cases (Codeforces "hack" system). Rejudge affected submissions. |
| Queue flooding during contest | One user submits 100 times per minute. Queue backs up for everyone. | Rate limit: 1 pending submission per user per problem. Queue priority: first-time submissions before retries. |
Level Expectations
| Level | What the Interviewer Expects |
|---|---|
| Mid (L4) | Docker container per submission. Time and memory limits via Docker flags. Sequential test case execution. Basic pass/fail verdict. Knows sandboxing is important but cannot articulate specific threats. |
| Senior (L5) | gVisor or Firecracker for sandbox isolation (not just Docker). cgroups v2 for resource limits including pids.max for fork bomb prevention. Separate compilation and execution phases with different resource limits. seccomp for syscall filtering. Queue with priority during contests. Deterministic timing via CPU time measurement. |
| Staff+ (L6) | Specific CVEs for container escape (CVE-2019-5736) and why Docker alone is insufficient. Compilation as an attack vector (template bombs, preprocessor file inclusion). Codeforces pretests/systests pattern with quantified compute savings. Language-specific sandboxing challenges (Python ctypes, Java SecurityManager deprecation, C inline assembly). Timing calibration across heterogeneous judge machines. WebAssembly as an alternative sandbox with trade-off analysis. |