Kong vs Envoy vs AWS API Gateway

TL;DR

Kong is the plugin-powered REST gateway. Envoy is the programmable proxy that powers service meshes. AWS API Gateway is the zero-ops managed option. NGINX is still the right answer more often than anyone admits.

What It Is

You've decided you need an API gateway. Now you need to pick one. This choice locks you in for years — migration is painful, plugins don't transfer, and configuration formats are all different.

The market has four real contenders, each with a different philosophy. Kong says: "extend me with plugins." Envoy says: "configure me dynamically via API." AWS API Gateway says: "don't think about infrastructure." NGINX says: "I've been doing this since 2004."

Most teams overthink this decision. Here's the uncomfortable truth: for companies under 50 engineers, NGINX with a few config files handles 90% of gateway needs. Everything else is optimization.

Kong — The Plugin Ecosystem

Kong is built on NGINX and OpenResty (NGINX + Lua). It takes NGINX's battle-tested proxy core and wraps it with a plugin system, admin API, and database-backed configuration.

Architecture

                    ┌──────────────────┐
                    │    Kong Gateway   │
                    │                  │
                    │  ┌────────────┐  │
Client ──────────── │  │   NGINX    │  │ ──────────── Backend
                    │  │  + Lua     │  │              Services
                    │  └──────┬─────┘  │
                    │         │        │
                    │  ┌──────┴─────┐  │
                    │  │  Plugins   │  │
                    │  │ rate-limit │  │
                    │  │ jwt-auth   │  │
                    │  │ logging    │  │
                    │  └──────┬─────┘  │
                    │         │        │
                    │  ┌──────┴─────┐  │
                    │  │ Config DB  │  │
                    │  │ (Postgres  │  │
                    │  │  or decl)  │  │
                    │  └────────────┘  │
                    └──────────────────┘

Plugin System

Kong's strength is its plugin ecosystem. Need rate limiting? Enable the plugin. Need JWT authentication? Plugin. Need request logging to Datadog? Plugin. You compose gateway behavior by stacking plugins.

# Kong declarative configuration (kong.yml)
_format_version: "3.0"  # required in declarative files; "3.0" targets Kong 3.x
services:
  - name: order-service
    url: http://order-service:8080
    routes:
      - name: orders-route
        paths:
          - /api/orders
        strip_path: true
    plugins:
      - name: rate-limiting
        config:
          minute: 100
          policy: redis
          redis_host: redis
      - name: jwt
        config:
          claims_to_verify:
            - exp
      - name: cors
        config:
          origins:
            - "https://app.example.com"
          methods:
            - GET
            - POST
      - name: prometheus
        config:
          per_consumer: true

Plugins execute in a defined order, set by each plugin's priority rather than by its position in the file. Authentication runs first (reject unauthenticated requests early). Rate limiting runs second (reject throttled requests before they are proxied upstream). Logging runs last (capture the complete request lifecycle).
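The ordering rule above can be sketched as a priority-sorted chain. This is a toy model in Python, not Kong's Lua implementation: the priority numbers, the request dict, and the two handler functions are assumptions made up for illustration, chosen only so that auth sorts ahead of rate limiting and logging happens last.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A request is a plain dict. A plugin returns an HTTP status code to
# short-circuit the chain, or None to let the request continue.
Request = dict
Plugin = Callable[[Request], Optional[int]]


@dataclass
class Registered:
    name: str
    priority: int  # hypothetical values; higher runs earlier
    handler: Plugin


def jwt_auth(req: Request) -> Optional[int]:
    return None if req.get("token") == "valid" else 401


def rate_limit(req: Request) -> Optional[int]:
    return 429 if req.get("calls_this_minute", 0) >= 100 else None


def run_chain(plugins: list[Registered], req: Request, log: list[str]) -> int:
    status = 200  # assume the upstream answers 200 if nothing rejects
    for plugin in sorted(plugins, key=lambda p: p.priority, reverse=True):
        rejected = plugin.handler(req)
        if rejected is not None:
            status = rejected
            break
    # Logging runs last, even for rejected requests.
    log.append(f"{req.get('path')} -> {status}")
    return status


# Registration order is deliberately "wrong"; priority decides.
PLUGINS = [
    Registered("rate-limiting", 900, rate_limit),
    Registered("jwt", 1400, jwt_auth),
]
```

An unauthenticated request that is also over its limit gets a 401, not a 429, because the auth handler rejects it before the rate limiter ever runs.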

DB vs DB-less Mode

Kong originally required PostgreSQL or Cassandra to store configuration. This added operational complexity — you needed a database just for your gateway.

DB-less mode (introduced in Kong 1.1) loads configuration from a YAML or JSON file. No database needed. Configuration updates mean shipping a new file and reloading Kong (or posting the whole file to the Admin API's /config endpoint), but for many teams that's fine.

DB mode:
  ✓ Admin API for runtime changes
  ✓ Multiple Kong nodes share config
  ✗ Requires PostgreSQL/Cassandra
  ✗ Database is a SPOF unless clustered

DB-less mode:
  ✓ No external database
  ✓ Config as code (version controlled)
  ✗ Admin API is read-only (apart from loading a whole new config)
  ✗ Config changes need deployment

For most teams: start with DB-less mode. Move to DB mode only when you need runtime plugin configuration without deployments.

Kong Strengths

Rich plugin ecosystem (200+ plugins)
Easy to start — NGINX under the hood, familiar mental model
Declarative config works well with GitOps
Kong Konnect option for managed hosting
REST-centric — built for traditional API gateway use cases

Kong Weaknesses

Lua is a niche language, so writing custom plugins is painful (Go, Python, and JavaScript plugins are possible, but they run out of process)
NGINX's architecture limits advanced load balancing (no circuit breaking in the NGINX core)
gRPC support exists but is second-class compared to REST
Plugin ordering can be confusing with complex chains

Envoy — The Programmable Proxy

Envoy was built at Lyft specifically for microservices. It's a C++ proxy designed from the ground up for dynamic configuration, observability, and gRPC support. It's the data plane behind Istio, AWS App Mesh, and most service mesh implementations.

Architecture

                    ┌──────────────────────┐
                    │       Envoy          │
                    │                      │
Client ──────────── │  ┌────────────────┐  │ ────── Backend
                    │  │  Listener      │  │        Services
                    │  │  (port 8080)   │  │
                    │  └───────┬────────┘  │
                    │          │           │
                    │  ┌───────┴────────┐  │
                    │  │  Filter Chain  │  │
                    │  │  - auth        │  │
                    │  │  - rate limit  │  │
                    │  │  - router      │  │
                    │  └───────┬────────┘  │
                    │          │           │
                    │  ┌───────┴────────┐  │
                    │  │  Cluster       │  │
                    │  │  (upstream     │  │
                    │  │   endpoints)   │  │
                    │  └────────────────┘  │
                    └──────────────────────┘
                              ↑
                    ┌─────────┴──────────┐
                    │   Control Plane    │
                    │   (xDS APIs)       │
                    │   pushes config    │
                    │   dynamically      │
                    └────────────────────┘

xDS — Dynamic Configuration

Envoy's defining feature. Instead of reloading config files, Envoy discovers its configuration dynamically through a set of APIs called xDS (x Discovery Service).

xDS APIs:
  LDS (Listener Discovery Service)  → which ports to listen on
  RDS (Route Discovery Service)     → which routes map to which clusters
  CDS (Cluster Discovery Service)   → which backend clusters exist
  EDS (Endpoint Discovery Service)  → which instances are in each cluster
  SDS (Secret Discovery Service)    → TLS certificates

A control plane implements these APIs. Envoy connects, subscribes, and receives configuration updates in real time. No restarts. No file reloads. New service deployed? The control plane pushes the new endpoint via EDS. Envoy starts routing to it immediately.
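The subscribe-and-push flow can be sketched in a few lines. This is an in-memory toy in Python, not the real xDS protocol (which runs over gRPC with versioned protobuf resources): the class names, the callback shape, and the addresses are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Callable

# Callback signature: (cluster_name, endpoints) -> None
Subscriber = Callable[[str, list[str]], None]


class ToyControlPlane:
    """In-memory stand-in for an EDS-style endpoint feed (not real xDS)."""

    def __init__(self) -> None:
        self._endpoints: dict[str, list[str]] = defaultdict(list)
        self._subscribers: list[Subscriber] = []

    def subscribe(self, callback: Subscriber) -> None:
        self._subscribers.append(callback)
        # A new subscriber first receives the current state.
        for cluster, endpoints in self._endpoints.items():
            callback(cluster, list(endpoints))

    def add_endpoint(self, cluster: str, address: str) -> None:
        self._endpoints[cluster].append(address)
        # Push the full endpoint list for the changed cluster.
        for callback in self._subscribers:
            callback(cluster, list(self._endpoints[cluster]))


class ToyProxy:
    """Keeps a routing table that is replaced on every push."""

    def __init__(self) -> None:
        self.clusters: dict[str, list[str]] = {}
        self._next: dict[str, int] = defaultdict(int)

    def on_update(self, cluster: str, endpoints: list[str]) -> None:
        self.clusters[cluster] = endpoints

    def pick(self, cluster: str) -> str:
        # Round-robin over whatever endpoints were pushed most recently.
        endpoints = self.clusters[cluster]
        index = self._next[cluster] % len(endpoints)
        self._next[cluster] += 1
        return endpoints[index]


control_plane = ToyControlPlane()
proxy = ToyProxy()
control_plane.add_endpoint("order-service", "10.0.0.1:8080")
control_plane.subscribe(proxy.on_update)
control_plane.add_endpoint("order-service", "10.0.0.2:8080")  # new deploy
picks = [proxy.pick("order-service") for _ in range(3)]
```

The proxy never reloads anything. The second endpoint shows up in its routing table the moment the control plane pushes it, and the next pick already rotates across both instances.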

Envoy's static configuration is extremely verbose — listeners, filter chains, typed configs with full protobuf type URLs. That's by design. Envoy's config format is for machines to generate, not humans to write. In production, the control plane generates and pushes it via xDS. You rarely touch YAML directly.

Built-in Observability

Envoy emits detailed metrics for every connection, request, and upstream interaction — without any plugins.

Metrics exposed automatically:
  - Request count per route, per status code
  - Latency histograms (p50, p95, p99)
  - Connection pool stats (active, pending, overflow)
  - Circuit breaker trip counts
  - Retry counts and outcomes
  - Health check pass/fail rates

All exported as Prometheus metrics or StatsD

This is where Envoy destroys Kong. Kong requires a Prometheus plugin and additional configuration per service. Envoy gives you fine-grained observability out of the box on every request.

Envoy Strengths

Dynamic configuration via xDS — zero-downtime config changes
First-class gRPC and HTTP/2 support
Built-in circuit breaking, retries, timeouts per route
Detailed observability without plugins
Foundation for Istio and other service meshes
C++ — extremely high performance

Envoy Weaknesses

Configuration is complex and verbose
Not designed for human-friendly gateway use cases
No plugin marketplace — extensibility via C++ filters or WASM
Steep learning curve for operators
Overkill as a standalone gateway for REST APIs

AWS API Gateway — Zero Infrastructure

AWS API Gateway is a fully managed service. No servers. No clusters. No patching. You define routes, attach Lambda functions or HTTP backends, and AWS handles scaling, availability, and SSL.

Two Flavors

REST API (v1):
  - Full feature set: request validation, caching,
    API keys, usage plans
  - Request/response transformation with VTL templates
  - Higher cost per request
  - Best for: traditional REST APIs with complex requirements

HTTP API (v2):
  - Simpler, cheaper, faster
  - JWT authorization built in
  - No request transformation
  - Significantly lower latency (AWS reports up to 60% faster) than REST API
  - Best for: simple proxy to Lambda or HTTP backends

Most new projects should use HTTP API. The REST API's extra features (VTL transformation, WAF integration) are rarely needed and add cost.

Lambda Integration

This is AWS API Gateway's killer feature. Define a route, point it at a Lambda function, done. No servers, no containers, no scaling configuration.

# AWS SAM template: API Gateway + Lambda
Transform: AWS::Serverless-2016-10-31  # required for AWS::Serverless::* resources
Resources:
  OrdersApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: prod

  GetOrderFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: orders.get_handler
      Runtime: python3.11
      Events:
        GetOrder:
          Type: Api
          Properties:
            Path: /orders/{id}
            Method: get
            RestApiId: !Ref OrdersApi

The gateway handles auth, rate limiting, and SSL. Lambda handles business logic. Together, they eliminate every infrastructure component between the client and your code. This is why startups love it — zero ops.
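For completeness, here is a minimal sketch of what the function behind that route could look like. The template names the handler orders.get_handler, so this would live in orders.py. The ORDERS dict and the _response helper are hypothetical stand-ins, and the sketch assumes the Lambda proxy integration event shape (path parameters under "pathParameters").

```python
import json

# Hypothetical in-memory data; a real handler would query a database.
ORDERS = {"42": {"id": "42", "status": "shipped"}}


def get_handler(event, context):
    """Handle GET /orders/{id} for a Lambda proxy integration event."""
    # pathParameters can be missing or None, so guard before reading it.
    order_id = (event.get("pathParameters") or {}).get("id")
    order = ORDERS.get(order_id)
    if order is None:
        return _response(404, {"error": "order not found"})
    return _response(200, order)


def _response(status_code, payload):
    # Proxy integrations expect statusCode, headers, and a string body.
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }
```

That is the whole backend for one route: no web framework, no server process, no port to bind.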

Usage Plans and API Keys

AWS API Gateway natively supports tiered access — usage plans with different rate limits and quotas, tied to API keys.

Usage plan: "Free Tier"
  Throttle: 10 requests/second
  Quota: 10,000 requests/month
  API keys: [key-abc123, key-def456]

Usage plan: "Pro Tier"
  Throttle: 100 requests/second
  Quota: 1,000,000 requests/month
  API keys: [key-pro-789]
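To make the two knobs concrete, here is a rough Python model of how a throttle and a quota interact for one API key. It is a sketch, not AWS's implementation: the token bucket details, the burst size (assumed equal to the rate), and the class names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class UsagePlan:
    name: str
    rate_per_second: float  # steady-state throttle
    burst: int              # bucket capacity (assumed equal to the rate)
    monthly_quota: int


@dataclass
class KeyState:
    plan: UsagePlan
    last_seen: float = 0.0
    used_this_month: int = 0
    tokens: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.tokens = float(self.plan.burst)  # start with a full bucket

    def allow(self, now: float) -> bool:
        """Return True if a request at time `now` (seconds) is admitted."""
        if self.used_this_month >= self.plan.monthly_quota:
            return False  # quota exhausted for the month
        # Refill the bucket for the time elapsed since the last request.
        elapsed = now - self.last_seen
        refilled = self.tokens + elapsed * self.plan.rate_per_second
        self.tokens = min(float(self.plan.burst), refilled)
        self.last_seen = now
        if self.tokens < 1:
            return False  # throttled
        self.tokens -= 1
        self.used_this_month += 1
        return True


FREE = UsagePlan("Free Tier", rate_per_second=10, burst=10, monthly_quota=10_000)
key = KeyState(FREE)
burst_results = [key.allow(now=0.0) for _ in range(12)]
after_wait = key.allow(now=0.5)
```

Twelve back-to-back calls on the Free Tier admit ten and throttle two. Half a second later the bucket has refilled enough to admit the next call. The throttle protects the backend second by second; the quota caps the monthly total.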

AWS API Gateway Strengths

Zero infrastructure management
Pay-per-request pricing (great for low traffic)
Native Lambda integration
Built-in auth (Cognito, JWT, IAM)
Automatic scaling, within account-level throttling quotas (10,000 requests/second by default)
WebSocket support

AWS API Gateway Weaknesses

Vendor lock-in to AWS
29-second default integration timeout, raisable only for Regional and private REST APIs (Lambda has its own limits)
Limited customization compared to self-hosted options
Cost scales linearly with traffic, so it gets expensive at high volume ($3.50 per million requests for REST APIs at list price; HTTP APIs are about $1.00 per million)
Cold start latency when combined with Lambda
VTL transformation language is arcane and poorly documented

NGINX — The Underappreciated Default

Here's the opinion most gateway vendors don't want you to hear: NGINX with 50 lines of config is a legitimate API gateway for most companies.

# Complete API gateway in NGINX (these blocks go inside the http { } context)
upstream order_service {
    server order-svc:8080;
    server order-svc-2:8080;
}

upstream user_service {
    server user-svc:8080;
}

# Rate limiting zone: 10 requests per second per IP
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

server {
    listen 443 ssl;
    server_name api.example.com;

    ssl_certificate     /etc/ssl/cert.pem;
    ssl_certificate_key /etc/ssl/key.pem;

    # Rate limiting
    limit_req zone=api burst=20 nodelay;
    limit_req_status 429;

    # Request routing
    location /api/orders {
        proxy_pass http://order_service;
        proxy_set_header X-Request-Id $request_id;
        proxy_set_header X-Real-IP $remote_addr;
    }

    location /api/users {
        proxy_pass http://user_service;
        proxy_set_header X-Request-Id $request_id;
        proxy_set_header X-Real-IP $remote_addr;
    }

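    # Health check endpoint
    location /health {
        # Set the type via default_type; add_header here would emit a
        # second Content-Type header alongside the default one.
        default_type application/json;
        return 200 '{"status": "ok"}';
    }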
}

That's SSL termination, rate limiting, routing, header injection, and health checks. No database. No plugins. No Lua. No control plane. A single tuned NGINX instance can serve tens of thousands of requests per second. It's been battle-tested for 20 years.

You lose dynamic configuration, plugin ecosystems, and fancy dashboards. You gain simplicity, reliability, and a tool every engineer already knows.

Traefik — The Container-Native Option

Traefik auto-discovers services from Docker labels and Kubernetes Ingress resources. No configuration files for routing: Traefik watches your orchestrator and builds routes automatically.

# Docker Compose — Traefik auto-discovers services
services:
  traefik:
    image: traefik:v2.10
    command:
      - "--providers.docker=true"
      - "--entrypoints.web.address=:80"
    ports:
      - "80:80"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro  # read-only is enough for discovery

  order-service:
    image: order-service:latest
    labels:
      - "traefik.http.routers.orders.rule=PathPrefix(`/api/orders`)"
      - "traefik.http.services.orders.loadbalancer.server.port=8080"

  user-service:
    image: user-service:latest
    labels:
      - "traefik.http.routers.users.rule=PathPrefix(`/api/users`)"
      - "traefik.http.services.users.loadbalancer.server.port=8080"

Deploy a new service with Docker labels, and Traefik routes to it automatically. No config reload. No admin API call. This is remarkably convenient for small teams running Docker Compose or Kubernetes.

Traefik Strengths

Zero-config service discovery from Docker/Kubernetes
Auto-TLS with Let's Encrypt
Dashboard included
Middleware system for auth, rate limiting, retries

Traefik Weaknesses

Performance lower than NGINX or Envoy under extreme load
Smaller plugin ecosystem than Kong
Not suited for non-container environments

Comparison Table

Feature	Kong	Envoy	AWS API GW	NGINX	Traefik
Language	Lua/NGINX	C++	Managed	C	Go
Config model	DB or declarative YAML	xDS (dynamic API)	Console/CloudFormation	Static files	Docker labels / K8s
Plugin/extension	200+ Lua plugins	C++/WASM filters	Limited	NGINX modules	Go middleware
gRPC support	Yes (basic)	Yes (native)	Yes	Yes (since 1.13)	Yes
HTTP/2	Yes	Yes (native)	Yes	Yes	Yes
Circuit breaking	Plugin	Built-in	No	No (needs OpenResty)	Yes (middleware)
Observability	Plugins needed	Built-in (rich)	CloudWatch	Access logs	Built-in dashboard
Dynamic config	Admin API (DB mode)	xDS (real-time)	API/Console	Reload required	Auto-discovery
Cost model	Self-hosted / Kong Konnect	Self-hosted	Per-request ($1.00-$3.50/M)	Self-hosted	Self-hosted
Learning curve	Low	High	Low	Low	Low
Best for	REST APIs, plugin-heavy	Microservices, mesh	Serverless, AWS-native	Simple gateway	Container environments

Decision Framework

Stop overthinking this. Answer five questions.

Are you on AWS and want zero infrastructure?
  → AWS API Gateway (HTTP API for simple, REST API for features)

Are you running containers with Docker/Kubernetes?
  → Traefik for small teams
  → Envoy + control plane for large-scale microservices

Do you need a rich plugin ecosystem for REST APIs?
  → Kong

Do you need gRPC-native proxying and mesh integration?
  → Envoy

Are you under 50 engineers with straightforward routing?
  → NGINX. Seriously. NGINX.
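The same framework, written as a function. Treating the questions as first-match-wins in the order listed is an assumption, as are the field names and the fall-through message for teams that match none of the five questions.

```python
from dataclasses import dataclass


@dataclass
class Team:
    on_aws_wants_zero_infra: bool = False
    runs_containers: bool = False
    large_scale_microservices: bool = False
    needs_rest_plugins: bool = False
    needs_grpc_or_mesh: bool = False
    engineers: int = 0


def recommend(team: Team) -> str:
    # Questions are checked in the order the framework lists them.
    if team.on_aws_wants_zero_infra:
        return "AWS API Gateway"
    if team.runs_containers:
        if team.large_scale_microservices:
            return "Envoy + control plane"
        return "Traefik"
    if team.needs_rest_plugins:
        return "Kong"
    if team.needs_grpc_or_mesh:
        return "Envoy"
    if team.engineers < 50:
        return "NGINX"
    # Not covered by the framework; hypothetical fallback.
    return "no clear default; weigh the trade-offs"
```

A 20-person team with plain routing and no special needs lands on NGINX, which is the point of the whole section.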

Patterns for System Design Interviews

Pattern 1: Startup API Gateway

[Client] → [AWS API Gateway] → [Lambda Functions]
                ↓
         JWT auth (built-in)
         Rate limit (usage plans)
         No infrastructure to manage

For a startup interview design: AWS API Gateway + Lambda. Explain why you're not managing infrastructure. Scale to zero when there's no traffic. Pay per request. This shows maturity — you're solving the business problem, not playing with technology.

Pattern 2: Growing Company Migration

Phase 1: AWS API Gateway (0-50 engineers)
Phase 2: Kong behind ALB (50-200 engineers)
  - Custom auth plugin
  - Per-team rate limiting
  - API analytics
Phase 3: Envoy + control plane (200+ engineers)
  - Service mesh (Istio/custom)
  - gRPC everywhere
  - Dynamic routing

Show the interviewer you understand that gateway choice evolves with scale. Starting with Envoy at 10 engineers is over-engineering. Starting with NGINX at 500 engineers is under-engineering.

Pattern 3: Multi-Region Gateway

                    ┌── US-East: Kong cluster ── US services
Client → Route53 ──┤
   (latency-based)  └── EU-West: Kong cluster ── EU services

Each region:
  [ALB] → [Kong cluster (3 nodes)] → [Backend services]

Kong nodes share config via PostgreSQL (per-region)
Cross-region routing via DNS, not the gateway

Trade-offs Table

Trade-off	Choose A	Choose B
Control vs Operations	Self-hosted Kong/Envoy/NGINX (full control)	AWS API Gateway (zero ops)
Plugins vs Performance	Kong (rich plugins, Lua overhead)	Envoy (raw performance, fewer extensions)
Dynamic vs Static config	Envoy xDS (real-time, complex)	NGINX files (simple, requires reload)
Vendor lock-in vs Simplicity	AWS API Gateway (locked in, simple)	Kong/NGINX (portable, more work)
Latency vs Features	AWS HTTP API (fast, minimal features)	AWS REST API (slower, full features)
Auto-discovery vs Explicit	Traefik (auto, less control)	Kong/NGINX (explicit, full control)

Interview Gotchas

Gotcha 1: Don't Pick Envoy for a REST API Gateway

If the problem is "build a REST API gateway for a startup," Envoy is the wrong answer. Envoy's configuration is designed for machines (xDS). Kong or even NGINX is far more appropriate for human-managed REST routing. Envoy shines as the data plane in a service mesh — not as a standalone gateway for a 10-person team.

Gotcha 2: AWS API Gateway Has a 29-Second Timeout

If a backend call takes longer than the integration timeout (29 seconds by default), AWS API Gateway returns a 504. This catches people designing video processing or report generation APIs. Long-running tasks need an async pattern: accept the request, return a job ID, poll for completion.

Gotcha 3: NGINX Rate Limiting Is Per-Instance

NGINX's limit_req directive keeps its counters in a shared-memory zone that is local to each instance. If you have 5 NGINX instances behind a load balancer, each instance tracks limits independently, so a user can get up to 5x the intended rate limit. For accurate distributed rate limiting, use Redis-backed limits (OpenResty + Lua) or move rate limiting to a dedicated service.
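A quick simulation shows the multiplier. This Python sketch uses a simple fixed-window counter as a stand-in (limit_req itself is a leaky bucket), and the limit, instance count, and client IP are made-up values.

```python
from collections import Counter
from itertools import cycle

LIMIT_PER_WINDOW = 10  # intended limit per client per window
INSTANCES = 5


class LocalLimiter:
    """Fixed-window counter held in one instance's memory."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.counts: Counter = Counter()

    def allow(self, client: str) -> bool:
        if self.counts[client] >= self.limit:
            return False
        self.counts[client] += 1
        return True


def admitted(limiters, requests: int, client: str = "203.0.113.7") -> int:
    """Send `requests` round-robin across limiters; count admissions."""
    rotation = cycle(limiters)
    return sum(next(rotation).allow(client) for _ in range(requests))


# Five independent instances: each one allows the full limit.
independent = admitted(
    [LocalLimiter(LIMIT_PER_WINDOW) for _ in range(INSTANCES)], 100
)

# One shared counter (what a Redis-backed limiter approximates).
shared = LocalLimiter(LIMIT_PER_WINDOW)
centralized = admitted([shared] * INSTANCES, 100)
```

Out of 100 requests in one window, the five independent limiters admit 50 and the shared counter admits 10. Same config, five times the traffic reaching your backend.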

Gotcha 4: Kong + Cassandra Is a Trap

Older Kong versions support Cassandra as a config store. Don't use it. Cassandra adds enormous operational complexity for a configuration database that stores kilobytes. Use PostgreSQL or go DB-less. Teams running Kong on Cassandra have migrated to PostgreSQL after painful incidents, and Kong itself removed Cassandra support in Gateway 3.4.

Gotcha 5: Gateway Cost at Scale

AWS API Gateway's REST API costs $3.50 per million requests at list price. At 1 billion requests/month, that's roughly $3,500/month just for the gateway (somewhat less once volume tiers kick in). A self-hosted NGINX or Kong cluster on a few EC2 instances costs a fraction of that. Managed gateways are cheap at low scale and expensive at high scale. Know where the crossover point is for your system.
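Finding the crossover is one division. The Python sketch below uses the flat $3.50 per million figure from above and ignores volume tiers; the $700/month self-hosted bill is a hypothetical number, so plug in your own.

```python
PRICE_PER_MILLION = 3.50  # flat list price quoted above, in USD


def managed_cost(requests_per_month: int) -> float:
    """Monthly managed-gateway bill at the flat per-request price."""
    return requests_per_month / 1_000_000 * PRICE_PER_MILLION


def crossover_requests(self_hosted_monthly_usd: float) -> int:
    """Monthly request volume at which both options cost the same."""
    return round(self_hosted_monthly_usd / PRICE_PER_MILLION * 1_000_000)


# Hypothetical self-hosted bill (instances, load balancer, on-call time).
SELF_HOSTED_USD = 700.0

billion = managed_cost(1_000_000_000)
breakeven = crossover_requests(SELF_HOSTED_USD)
```

With these inputs, 1 billion requests costs $3,500 on the managed gateway, and the break-even sits at 200 million requests/month. Below that, managed is cheaper. Above it, you are paying for convenience.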