Kong vs Envoy vs AWS API Gateway
TL;DR
Kong is the plugin-powered REST gateway. Envoy is the programmable proxy that powers service meshes. AWS API Gateway is the zero-ops managed option. NGINX is still the right answer more often than anyone admits.
What It Is

You've decided you need an API gateway. Now you need to pick one. This choice locks you in for years — migration is painful, plugins don't transfer, and configuration formats are all different.
The market has four real contenders, each with a different philosophy. Kong says: "extend me with plugins." Envoy says: "configure me dynamically via API." AWS API Gateway says: "don't think about infrastructure." NGINX says: "I've been doing this since 2004."
Most teams overthink this decision. Here's the uncomfortable truth: for companies under 50 engineers, NGINX with a few config files handles 90% of gateway needs. Everything else is optimization.
Kong — The Plugin Ecosystem
Kong is built on NGINX and OpenResty (NGINX + Lua). It takes NGINX's battle-tested proxy core and wraps it with a plugin system, admin API, and database-backed configuration.
Architecture
                     ┌──────────────────┐
                     │   Kong Gateway   │
                     │                  │
                     │  ┌────────────┐  │
Client ───────────── │  │   NGINX    │  │ ───────────── Backend
                     │  │   + Lua    │  │               Services
                     │  └──────┬─────┘  │
                     │         │        │
                     │  ┌──────┴─────┐  │
                     │  │  Plugins   │  │
                     │  │ rate-limit │  │
                     │  │ jwt-auth   │  │
                     │  │ logging    │  │
                     │  └──────┬─────┘  │
                     │         │        │
                     │  ┌──────┴─────┐  │
                     │  │ Config DB  │  │
                     │  │ (Postgres  │  │
                     │  │  or decl)  │  │
                     │  └────────────┘  │
                     └──────────────────┘
Plugin System
Kong's strength is its plugin ecosystem. Need rate limiting? Enable the plugin. Need JWT authentication? Plugin. Need request logging to Datadog? Plugin. You compose gateway behavior by stacking plugins.
# Kong declarative configuration (kong.yml)
# _format_version is required in declarative files; use the value for your Kong version
_format_version: "3.0"
services:
  - name: order-service
    url: http://order-service:8080
    routes:
      - name: orders-route
        paths:
          - /api/orders
        strip_path: true
    plugins:
      - name: rate-limiting
        config:
          minute: 100
          policy: redis
          redis_host: redis
      - name: jwt
        config:
          claims_to_verify:
            - exp
      - name: cors
        config:
          origins:
            - "https://app.example.com"
          methods:
            - GET
            - POST
      - name: prometheus
        config:
          per_consumer: true
Plugins execute in an order set by each plugin's built-in priority, not by their position in the config file. Authentication runs early (reject unauthenticated requests before doing more work). Rate limiting runs after it (reject throttled requests before routing). Logging runs last (capture the complete request lifecycle).
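To make the ordering concrete, here is a minimal sketch of a priority-ordered plugin chain. The `Plugin` class, the handler functions, and the priority numbers are all hypothetical and are not Kong's real values or API. The sketch also stops at the first rejection and leaves out the separate log phase a real gateway would have.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A handler returns an HTTP status to short-circuit with, or None to continue.
Handler = Callable[[dict], Optional[int]]


@dataclass
class Plugin:
    name: str
    priority: int  # hypothetical value; higher runs earlier
    handler: Handler


def run_chain(plugins: list[Plugin], request: dict) -> tuple[int, list[str]]:
    """Run plugins from highest to lowest priority, stopping at the first rejection."""
    executed = []
    for plugin in sorted(plugins, key=lambda p: p.priority, reverse=True):
        executed.append(plugin.name)
        status = plugin.handler(request)
        if status is not None:
            return status, executed
    return 200, executed


def jwt_auth(request: dict) -> Optional[int]:
    return None if request.get("token") == "valid" else 401


def rate_limit(request: dict) -> Optional[int]:
    return 429 if request.get("over_limit") else None


def log_request(request: dict) -> Optional[int]:
    return None


# Registration order is deliberately scrambled; priority decides execution order.
PLUGINS = [
    Plugin("logging", 10, log_request),
    Plugin("rate-limiting", 900, rate_limit),
    Plugin("jwt", 1000, jwt_auth),
]
```

Calling `run_chain(PLUGINS, {})` rejects with 401 after running only the `jwt` plugin, which is the point of putting authentication early: cheap rejections happen before anything else does work.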
DB vs DB-less Mode
Kong originally required PostgreSQL or Cassandra to store configuration. This added operational complexity — you needed a database just for your gateway.
DB-less mode (introduced in Kong 1.1) loads configuration from a YAML file. No database needed. Configuration updates require a restart or reload, but for many teams that's fine.
DB mode:
✓ Admin API for runtime changes
✓ Multiple Kong nodes share config
✗ Requires PostgreSQL/Cassandra
✗ Database is a SPOF unless clustered
DB-less mode:
✓ No external database
✓ Config as code (version controlled)
✗ No runtime Admin API writes
✗ Config changes need deployment
For most teams: start with DB-less mode. Move to DB mode only when you need runtime plugin configuration without deployments.
Kong Strengths
- Rich plugin ecosystem (200+ plugins)
- Easy to start — NGINX under the hood, familiar mental model
- Declarative config works well with GitOps
- Kong Konnect option for managed hosting
- REST-centric — built for traditional API gateway use cases
Kong Weaknesses
- Lua is a niche language — writing custom plugins is painful
- NGINX's architecture limits advanced load balancing (no circuit breaking in the NGINX core)
- gRPC support exists but is second-class compared to REST
- Plugin ordering can be confusing with complex chains
Envoy — The Programmable Proxy
Envoy was built at Lyft specifically for microservices. It's a C++ proxy designed from the ground up for dynamic configuration, observability, and gRPC support. It's the data plane behind Istio, AWS App Mesh, and most service mesh implementations.
Architecture
                     ┌──────────────────────┐
                     │        Envoy         │
                     │                      │
Client ───────────── │  ┌────────────────┐  │ ────── Backend
                     │  │    Listener    │  │        Services
                     │  │  (port 8080)   │  │
                     │  └───────┬────────┘  │
                     │          │           │
                     │  ┌───────┴────────┐  │
                     │  │  Filter Chain  │  │
                     │  │  - auth        │  │
                     │  │  - rate limit  │  │
                     │  │  - router      │  │
                     │  └───────┬────────┘  │
                     │          │           │
                     │  ┌───────┴────────┐  │
                     │  │    Cluster     │  │
                     │  │   (upstream    │  │
                     │  │   endpoints)   │  │
                     │  └────────────────┘  │
                     └──────────────────────┘
                                ↑
                      ┌─────────┴──────────┐
                      │   Control Plane    │
                      │    (xDS APIs)      │
                      │   pushes config    │
                      │    dynamically     │
                      └────────────────────┘
xDS — Dynamic Configuration
xDS is Envoy's defining feature. Instead of reloading config files, Envoy discovers its configuration dynamically through a set of APIs called xDS (x Discovery Service).
xDS APIs:
LDS (Listener Discovery Service) → which ports to listen on
RDS (Route Discovery Service) → which routes map to which clusters
CDS (Cluster Discovery Service) → which backend clusters exist
EDS (Endpoint Discovery Service) → which instances are in each cluster
SDS (Secret Discovery Service) → TLS certificates
A control plane implements these APIs. Envoy connects, subscribes, and receives configuration updates in real time. No restarts. No file reloads. New service deployed? The control plane pushes the new endpoint via EDS. Envoy starts routing to it immediately.
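The subscribe-and-push flow can be sketched in a few lines. This is a toy model of the idea behind EDS only, not the real xDS protocol (which runs over gRPC with versioned resources and acknowledgements). `ControlPlane`, `Proxy`, and every method name here are hypothetical.

```python
from collections import defaultdict


class ControlPlane:
    """Toy control plane: tracks endpoints per cluster and pushes changes to subscribers."""

    def __init__(self):
        self._endpoints = defaultdict(set)     # cluster name -> {"host:port", ...}
        self._subscribers = defaultdict(list)  # cluster name -> [Proxy, ...]

    def subscribe(self, cluster, proxy):
        self._subscribers[cluster].append(proxy)
        # Send the current state right away so a new proxy starts in sync.
        proxy.on_endpoints(cluster, set(self._endpoints[cluster]))

    def add_endpoint(self, cluster, address):
        self._endpoints[cluster].add(address)
        self._push(cluster)

    def remove_endpoint(self, cluster, address):
        self._endpoints[cluster].discard(address)
        self._push(cluster)

    def _push(self, cluster):
        for proxy in self._subscribers[cluster]:
            proxy.on_endpoints(cluster, set(self._endpoints[cluster]))


class Proxy:
    """Toy data plane: its routing table is replaced on every push, with no restart."""

    def __init__(self):
        self.clusters = {}
        self._next = defaultdict(int)

    def on_endpoints(self, cluster, endpoints):
        self.clusters[cluster] = sorted(endpoints)

    def pick(self, cluster):
        """Round-robin over whatever endpoints the last push contained."""
        endpoints = self.clusters.get(cluster, [])
        if not endpoints:
            raise LookupError(f"no upstream available for {cluster}")
        index = self._next[cluster] % len(endpoints)
        self._next[cluster] += 1
        return endpoints[index]
```

After `cp.subscribe("orders", proxy)`, a call to `cp.add_endpoint("orders", "10.0.0.1:8080")` makes `proxy.pick("orders")` return the new address immediately. The proxy never reads a file or reloads.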
Envoy's static configuration is extremely verbose — listeners, filter chains, typed configs with full protobuf type URLs. That's by design. Envoy's config format is for machines to generate, not humans to write. In production, the control plane generates and pushes it via xDS. You rarely touch YAML directly.
Built-in Observability
Envoy emits detailed metrics for every connection, request, and upstream interaction — without any plugins.
Metrics exposed automatically:
- Request count per route, per status code
- Latency histograms (p50, p95, p99)
- Connection pool stats (active, pending, overflow)
- Circuit breaker trip counts
- Retry counts and outcomes
- Health check pass/fail rates
All exported as Prometheus metrics or StatsD
This is where Envoy destroys Kong. Kong requires a Prometheus plugin and additional configuration per service. Envoy gives you fine-grained observability out of the box on every request.
Envoy Strengths
- Dynamic configuration via xDS — zero-downtime config changes
- First-class gRPC and HTTP/2 support
- Built-in circuit breaking, retries, timeouts per route
- Detailed observability without plugins
- Foundation for Istio and other service meshes
- C++ — extremely high performance
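Envoy's circuit breaking is worth a quick illustration, because it works as a set of per-cluster caps (connections, pending requests, in-flight requests, retries) rather than the open/half-open state machine many people expect. The sketch below models only a cap on in-flight requests; the class and its names are hypothetical, not Envoy code.

```python
class CircuitBreaker:
    """Caps in-flight requests to one upstream cluster; excess requests fail fast."""

    def __init__(self, max_requests: int):
        self.max_requests = max_requests
        self.in_flight = 0
        self.overflow = 0  # number of requests rejected because the cap was hit

    def try_acquire(self) -> bool:
        """Reserve a slot for a request, or reject immediately if none is free."""
        if self.in_flight >= self.max_requests:
            self.overflow += 1
            return False
        self.in_flight += 1
        return True

    def release(self) -> None:
        """Free the slot once the upstream response (or error) has arrived."""
        self.in_flight = max(0, self.in_flight - 1)
```

The `overflow` counter is the kind of number that surfaces in the connection pool and circuit breaker metrics listed earlier. Failing fast protects a struggling upstream from a growing queue of requests it cannot serve.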
Envoy Weaknesses
- Configuration is complex and verbose
- Not designed for human-friendly gateway use cases
- No plugin marketplace — extensibility via C++ filters or WASM
- Steep learning curve for operators
- Overkill as a standalone gateway for REST APIs
AWS API Gateway — Zero Infrastructure
AWS API Gateway is a fully managed service. No servers. No clusters. No patching. You define routes, attach Lambda functions or HTTP backends, and AWS handles scaling, availability, and SSL.
Two Flavors
REST API (v1):
- Full feature set: request validation, caching, API keys, usage plans
- Request/response transformation with VTL templates
- Higher cost per request
- Best for: traditional REST APIs with complex requirements
HTTP API (v2):
- Simpler, cheaper, faster
- JWT authorization built in
- No request transformation
- Lower latency than REST API (AWS reports up to 60% lower)
- Best for: simple proxy to Lambda or HTTP backends
Most new projects should use HTTP API. The REST API's extra features (VTL transformation, WAF integration) are rarely needed and add cost.
Lambda Integration
This is AWS API Gateway's killer feature. Define a route, point it at a Lambda function, done. No servers, no containers, no scaling configuration.
# AWS SAM template: API Gateway + Lambda
Transform: AWS::Serverless-2016-10-31   # required for AWS::Serverless::* resources
Resources:
  OrdersApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: prod
  GetOrderFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/                     # assumed location of orders.py
      Handler: orders.get_handler
      Runtime: python3.11
      Events:
        GetOrder:
          Type: Api
          Properties:
            Path: /orders/{id}
            Method: get
            RestApiId: !Ref OrdersApi
The gateway handles auth, rate limiting, and SSL. Lambda handles business logic. Together, they eliminate every infrastructure component between the client and your code. This is why startups love it — zero ops.
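For completeness, here is roughly what the function behind `Handler: orders.get_handler` could look like. It is a minimal sketch assuming a Lambda proxy integration, where the path parameter arrives in `event["pathParameters"]`. The `ORDERS` dictionary is a hypothetical stand-in for a real data store.

```python
import json

# Hypothetical in-memory data; a real function would query a database.
ORDERS = {"42": {"id": "42", "status": "shipped"}}


def get_handler(event, context):
    """Handle GET /orders/{id} for an API Gateway Lambda proxy integration."""
    # pathParameters is None when the route has no parameters, hence the `or {}`.
    order_id = (event.get("pathParameters") or {}).get("id")
    order = ORDERS.get(order_id)
    if order is None:
        return _response(404, {"message": f"order {order_id} not found"})
    return _response(200, order)


def _response(status_code, body):
    # Proxy integrations expect a status code, headers, and a string body.
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }
```

There is no server, router, or TLS code here. The gateway has already matched the route and terminated the connection, so the function only deals with the order lookup.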
Usage Plans and API Keys
The REST API flavor natively supports tiered access: usage plans with different rate limits and quotas, tied to API keys.
Usage plan: "Free Tier"
Throttle: 10 requests/second
Quota: 10,000 requests/month
API keys: [key-abc123, key-def456]
Usage plan: "Pro Tier"
Throttle: 100 requests/second
Quota: 1,000,000 requests/month
API keys: [key-pro-789]
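The two knobs in a usage plan behave differently: the throttle is a short-term rate, the quota is a long-term count. A minimal sketch of how they might combine, using a token bucket for the throttle, is below. The `UsagePlan` and `KeyState` classes are hypothetical, the burst size is an assumption (the plans above only state a rate), and throttled requests are assumed not to count against the quota.

```python
from dataclasses import dataclass, field


@dataclass
class UsagePlan:
    name: str
    rate_per_second: float  # steady-state refill rate
    burst: int              # bucket capacity (assumed equal to the rate here)
    monthly_quota: int


@dataclass
class KeyState:
    plan: UsagePlan
    tokens: float = field(init=False)
    last_refill: float = 0.0
    used_this_month: int = 0

    def __post_init__(self):
        # Every API key starts with a full bucket.
        self.tokens = float(self.plan.burst)

    def allow(self, now: float) -> tuple[bool, str]:
        """Return (allowed, reason); `now` is a timestamp in seconds."""
        if self.used_this_month >= self.plan.monthly_quota:
            return False, "quota exceeded"
        # Refill in proportion to elapsed time, capped at the bucket size.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.plan.burst, self.tokens + elapsed * self.plan.rate_per_second)
        self.last_refill = now
        if self.tokens < 1:
            return False, "throttled"
        self.tokens -= 1
        self.used_this_month += 1
        return True, "ok"


FREE = UsagePlan("Free Tier", rate_per_second=10, burst=10, monthly_quota=10_000)
```

A key on `FREE` that fires 11 requests at the same instant gets 10 accepted and 1 throttled, and recovers capacity as time passes. The monthly quota only resets when the counter is reset, which this sketch leaves to the caller.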
AWS API Gateway Strengths
- Zero infrastructure management
- Pay-per-request pricing (great for low traffic)
- Native Lambda integration
- Built-in auth (Cognito, JWT, IAM)
- Automatic scaling to any traffic level
- WebSocket support
AWS API Gateway Weaknesses
- Vendor lock-in to AWS
- 29-second default integration timeout, which also caps Lambda-backed routes regardless of the function's own timeout setting
- Limited customization compared to self-hosted options
- Cost scales linearly with traffic, so it gets expensive at high volume ($3.50 per million requests for REST API at list price; HTTP API is cheaper)
- Cold start latency when combined with Lambda
- VTL transformation language is arcane and poorly documented
NGINX — The Underappreciated Default
Here's the opinion most gateway vendors don't want you to hear: NGINX with 50 lines of config is a legitimate API gateway for most companies.
# Complete API gateway in NGINX (place inside the http {} context)
upstream order_service {
    server order-svc:8080;
    server order-svc-2:8080;
}
upstream user_service {
    server user-svc:8080;
}
# Rate limiting zone: 10 requests per second per client IP
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
server {
    listen 443 ssl;
    server_name api.example.com;
    ssl_certificate     /etc/ssl/cert.pem;
    ssl_certificate_key /etc/ssl/key.pem;
    # Rate limiting: allow bursts of 20, reject the excess with 429
    limit_req zone=api burst=20 nodelay;
    limit_req_status 429;
    # Request routing
    location /api/orders {
        proxy_pass http://order_service;
        proxy_set_header X-Request-Id $request_id;
        proxy_set_header X-Real-IP $remote_addr;
    }
    location /api/users {
        proxy_pass http://user_service;
        proxy_set_header X-Request-Id $request_id;
        proxy_set_header X-Real-IP $remote_addr;
    }
    # Health check endpoint. default_type sets the Content-Type of the body
    # returned below; add_header would not replace the default one.
    location /health {
        default_type application/json;
        return 200 '{"status": "ok"}';
    }
}
That's SSL termination, rate limiting, routing, header injection, and health checks. No database. No plugins. No Lua. No control plane. A single NGINX instance comfortably handles tens of thousands of requests per second on ordinary hardware. It's been battle-tested for 20 years.
You lose dynamic configuration, plugin ecosystems, and fancy dashboards. You gain simplicity, reliability, and a tool every engineer already knows.
Traefik — The Container-Native Option
Traefik auto-discovers services from Docker labels and Kubernetes annotations. No configuration files for routing — Traefik watches your orchestrator and builds routes automatically.
# Docker Compose: Traefik auto-discovers services
services:
  traefik:
    image: traefik:v2.10
    command:
      - "--providers.docker=true"
      - "--entrypoints.web.address=:80"
    ports:
      - "80:80"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
  order-service:
    image: order-service:latest
    labels:
      - "traefik.http.routers.orders.rule=PathPrefix(`/api/orders`)"
      - "traefik.http.services.orders.loadbalancer.server.port=8080"
  user-service:
    image: user-service:latest
    labels:
      - "traefik.http.routers.users.rule=PathPrefix(`/api/users`)"
      - "traefik.http.services.users.loadbalancer.server.port=8080"
Deploy a new service with Docker labels, and Traefik routes to it automatically. No config reload. No admin API call. This is remarkably convenient for small teams running Docker Compose or Kubernetes.
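Conceptually, label-based discovery is just "read the labels, build a routing table." The sketch below shows that idea for the two label shapes used in the Compose file. It is not Traefik's implementation: the regexes handle only `PathPrefix` rules, and `build_routes` and `resolve` are hypothetical helpers.

```python
import re
from typing import Optional

# Matches labels such as: traefik.http.routers.orders.rule=PathPrefix(`/api/orders`)
RULE = re.compile(r"^traefik\.http\.routers\.(?P<name>[^.]+)\.rule=PathPrefix\(`(?P<prefix>[^`]+)`\)$")
# Matches labels such as: traefik.http.services.orders.loadbalancer.server.port=8080
PORT = re.compile(r"^traefik\.http\.services\.(?P<name>[^.]+)\.loadbalancer\.server\.port=(?P<port>\d+)$")


def build_routes(containers: dict[str, list[str]]) -> dict[str, str]:
    """Map each path prefix to an upstream URL derived from container labels."""
    routes = {}
    for container, labels in containers.items():
        prefix = port = None
        for label in labels:
            if (m := RULE.match(label)):
                prefix = m["prefix"]
            elif (m := PORT.match(label)):
                port = m["port"]
        # Only containers that declare both a rule and a port become routes.
        if prefix and port:
            routes[prefix] = f"http://{container}:{port}"
    return routes


def resolve(routes: dict[str, str], path: str) -> Optional[str]:
    """Pick the upstream whose prefix matches the path; the longest prefix wins."""
    matches = [p for p in routes if path.startswith(p)]
    return routes[max(matches, key=len)] if matches else None
```

Feeding it the labels from `order-service` and `user-service` yields a two-entry routing table. Rebuilding that table whenever the orchestrator reports a container change is what removes the manual reload step.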
Traefik Strengths
- Zero-config service discovery from Docker/Kubernetes
- Auto-TLS with Let's Encrypt
- Dashboard included
- Middleware system for auth, rate limiting, retries
Traefik Weaknesses
- Performance lower than NGINX or Envoy under extreme load
- Smaller plugin ecosystem than Kong
- Not suited for non-container environments
Comparison Table
| Feature | Kong | Envoy | AWS API GW | NGINX | Traefik |
|---|---|---|---|---|---|
| Language | Lua/NGINX | C++ | Managed | C | Go |
| Config model | DB or declarative YAML | xDS (dynamic API) | Console/CloudFormation | Static files | Docker labels / K8s |
| Plugin/extension | 200+ Lua plugins | C++/WASM filters | Limited | NGINX modules | Go middleware |
| gRPC support | Yes (basic) | Yes (native) | No (not natively supported) | Yes (since 1.13.10) | Yes |
| HTTP/2 | Yes | Yes (native) | Yes | Yes | Yes |
| Circuit breaking | Plugin | Built-in | No | No (needs OpenResty) | Yes (middleware) |
| Observability | Plugins needed | Built-in (rich) | CloudWatch | Access logs | Built-in dashboard |
| Dynamic config | Admin API (DB mode) | xDS (real-time) | API/Console | Reload required | Auto-discovery |
| Cost model | Self-hosted / Kong Konnect | Self-hosted | Per-request ($3.50/M for REST API) | Self-hosted | Self-hosted |
| Learning curve | Low | High | Low | Low | Low |
| Best for | REST APIs, plugin-heavy | Microservices, mesh | Serverless, AWS-native | Simple gateway | Container environments |
Decision Framework
Stop overthinking this. Answer five questions.
Are you on AWS and want zero infrastructure?
→ AWS API Gateway (HTTP API for simple, REST API for features)
Are you running containers with Docker/Kubernetes?
→ Traefik for small teams
→ Envoy + control plane for large-scale microservices
Do you need a rich plugin ecosystem for REST APIs?
→ Kong
Do you need gRPC-native proxying and mesh integration?
→ Envoy
Are you under 50 engineers with straightforward routing?
→ NGINX. Seriously. NGINX.
Patterns for System Design Interviews
Pattern 1: Startup API Gateway
[Client] → [AWS API Gateway] → [Lambda Functions]
                  ↓
          JWT auth (built-in)
          Rate limit (usage plans)
          No infrastructure to manage
For a startup interview design: AWS API Gateway + Lambda. Explain why you're not managing infrastructure. Scale to zero when there's no traffic. Pay per request. This shows maturity — you're solving the business problem, not playing with technology.
Pattern 2: Growing Company Migration
Phase 1: AWS API Gateway (0-50 engineers)
Phase 2: Kong behind ALB (50-200 engineers)
- Custom auth plugin
- Per-team rate limiting
- API analytics
Phase 3: Envoy + control plane (200+ engineers)
- Service mesh (Istio/custom)
- gRPC everywhere
- Dynamic routing
Show the interviewer you understand that gateway choice evolves with scale. Starting with Envoy at 10 engineers is over-engineering. Starting with NGINX at 500 engineers is under-engineering.
Pattern 3: Multi-Region Gateway
                   ┌── US-East: Kong cluster ── US services
Client → Route53 ──┤
   (latency-based) └── EU-West: Kong cluster ── EU services
Each region:
  [ALB] → [Kong cluster (3 nodes)] → [Backend services]
  Kong nodes share config via PostgreSQL (per-region)
  Cross-region routing via DNS, not the gateway
Trade-offs Table
| Trade-off | Choose A | Choose B |
|---|---|---|
| Control vs Operations | Self-hosted Kong/Envoy/NGINX (full control) | AWS API Gateway (zero ops) |
| Plugins vs Performance | Kong (rich plugins, Lua overhead) | Envoy (raw performance, fewer extensions) |
| Dynamic vs Static config | Envoy xDS (real-time, complex) | NGINX files (simple, requires reload) |
| Vendor lock-in vs Simplicity | AWS API Gateway (locked in, simple) | Kong/NGINX (portable, more work) |
| Latency vs Features | AWS HTTP API (fast, minimal features) | AWS REST API (slower, full features) |
| Auto-discovery vs Explicit | Traefik (auto, less control) | Kong/NGINX (explicit, full control) |

Interview Gotchas
Gotcha 1: Don't Pick Envoy for a REST API Gateway
If the problem is "build a REST API gateway for a startup," Envoy is the wrong answer. Envoy's configuration is designed for machines (xDS). Kong or even NGINX is far more appropriate for human-managed REST routing. Envoy shines as the data plane in a service mesh — not as a standalone gateway for a 10-person team.
Gotcha 2: AWS API Gateway Has a 29-Second Timeout
If any backend call takes longer than 29 seconds, AWS API Gateway returns a 504. This catches people designing video processing or report generation APIs. Long-running tasks need an async pattern: accept the request, return a job ID, poll for completion.
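The async pattern is easy to describe and easy to get slightly wrong in an interview, so here is a minimal sketch of the three moving parts. Everything in it is hypothetical: the `/reports` routes, the in-memory `JOBS` store (a real system would use a queue plus a database), and `run_job`, which stands in for a background worker.

```python
import uuid

JOBS = {}  # job_id -> job record; stand-in for a durable job store


def submit_report(params: dict) -> dict:
    """POST /reports: record the work and return at once with 202 Accepted."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "pending", "params": params, "result": None}
    return {"statusCode": 202, "body": {"job_id": job_id, "status_url": f"/reports/{job_id}"}}


def run_job(job_id: str) -> None:
    """Executed by a background worker, outside the gateway's timeout window."""
    job = JOBS[job_id]
    job["result"] = f"report for {job['params']['customer']}"
    job["status"] = "done"


def get_report(job_id: str) -> dict:
    """GET /reports/{job_id}: a cheap status check the client can poll."""
    job = JOBS.get(job_id)
    if job is None:
        return {"statusCode": 404, "body": {"message": "unknown job"}}
    if job["status"] != "done":
        return {"statusCode": 200, "body": {"status": job["status"]}}
    return {"statusCode": 200, "body": {"status": "done", "result": job["result"]}}
```

Both gateway-facing calls return in milliseconds, so the timeout never comes into play. The slow work happens in `run_job`, which the gateway never waits on.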
Gotcha 3: NGINX Rate Limiting Is Per-Instance
NGINX's limit_req directive keeps its counters in a shared-memory zone that is local to each instance. If you have 5 NGINX instances behind a load balancer, each instance tracks limits independently, so a user can get up to 5x the intended rate limit. For accurate distributed rate limiting, use Redis-backed limits (OpenResty + Lua) or move rate limiting to a dedicated service.
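A small simulation makes the multiplication visible. It is a sketch under simplifying assumptions: a fixed-window counter stands in for limit_req's leaky-bucket algorithm, the load balancer is perfectly round-robin, and `InstanceLimiter` and `accepted_requests` are hypothetical names.

```python
import itertools


class InstanceLimiter:
    """Per-client counter held in one instance's memory (simplified limit_req)."""

    def __init__(self, limit_per_window: int):
        self.limit = limit_per_window
        self.counts = {}

    def allow(self, client_ip: str) -> bool:
        used = self.counts.get(client_ip, 0)
        if used >= self.limit:
            return False
        self.counts[client_ip] = used + 1
        return True


def accepted_requests(instances: int, limit: int, attempts: int) -> int:
    """Send `attempts` requests from one client through a round-robin load balancer."""
    limiters = [InstanceLimiter(limit) for _ in range(instances)]
    round_robin = itertools.cycle(limiters)
    # Each instance only sees, and only counts, its own share of the traffic.
    return sum(next(round_robin).allow("203.0.113.7") for _ in range(attempts))
```

With one instance and a limit of 10, a burst of 100 requests gets 10 through. With five instances the same burst gets 50 through, because no instance knows what the others have already allowed.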
Gotcha 4: Kong + Cassandra Is a Trap
Older Kong versions support Cassandra as a config store. Don't use it. Cassandra adds enormous operational complexity for a configuration database that stores kilobytes. Use PostgreSQL or go DB-less. Kong itself deprecated Cassandra in the 2.x series and removed support in Kong Gateway 3.4.
Gotcha 5: Gateway Cost at Scale
AWS API Gateway costs $3.50 per million requests. At 1 billion requests/month, that's $3,500/month just for the gateway. A self-hosted NGINX or Kong cluster on a few EC2 instances costs a fraction of that. Managed gateways are cheap at low scale and expensive at high scale. Know where the crossover point is for your system.
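The crossover point is simple arithmetic, sketched below. The per-request price is the figure quoted above, applied as a flat rate (volume tiers are ignored). The $700/month self-hosted figure in the usage note is a made-up placeholder, so substitute your own instance, load balancer, and on-call costs.

```python
PRICE_PER_MILLION = 3.50  # USD per million requests, the figure quoted above


def managed_cost(requests_per_month: int) -> float:
    """Monthly managed-gateway cost at a flat per-request price."""
    return requests_per_month / 1_000_000 * PRICE_PER_MILLION


def crossover_requests(self_hosted_monthly_cost: float) -> float:
    """Monthly request volume at which the managed gateway costs as much as self-hosting."""
    return self_hosted_monthly_cost / PRICE_PER_MILLION * 1_000_000
```

`managed_cost(1_000_000_000)` gives 3500.0, matching the number above. With a hypothetical self-hosted cluster at $700/month, `crossover_requests(700)` comes out to 200 million requests per month. Below that volume the managed gateway is cheaper on infrastructure alone, before counting engineering time.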