API Gateway Complete Guide 2025: Kong, Envoy, AWS API Gateway, Auth/Rate Limiting/Monitoring

Author: Youngju Kim (@fjvbn20031)
1. Why API Gateways Are Needed
1.1 Cross-Cutting Concerns in Microservices
In a microservices architecture, dozens to hundreds of services operate independently. Implementing authentication, logging, and rate limiting individually in each service creates duplication and inconsistency.
World without API Gateway:

```
[Client]
  │ │ │
  │ │ └──── [Service A] (own auth, own logging, own rate limiting)
  │ └────── [Service B] (own auth, own logging, own rate limiting)
  └──────── [Service C] (own auth, own logging, own rate limiting)
                ↑
        Duplication! Inconsistency! Unmanageable!
```

World with API Gateway:

```
[Client]
   │
┌──┴───────────────────────────┐
│         API Gateway          │
│  ┌─────────────────────────┐ │
│  │ Authentication / AuthZ  │ │
│  │ Rate Limiting           │ │
│  │ Request Transformation  │ │
│  │ Caching                 │ │
│  │ Logging / Monitoring    │ │
│  │ Circuit Breaker         │ │
│  └─────────────────────────┘ │
└──────┬───────┬───────┬───────┘
       │       │       │
   [Svc A]  [Svc B]  [Svc C]
  (business logic only in each)
```
1.2 API Gateway Patterns
3 Core Gateway Patterns:
1. Routing Pattern
Route client requests to the correct backend service
/api/users -> User Service
/api/orders -> Order Service
/api/products -> Product Service
2. Aggregation Pattern
Combine responses from multiple services into one
GET /api/dashboard ->
User Service (profile) +
Order Service (recent orders) +
Notification Service (alerts)
-> Single JSON response
3. Offloading Pattern
Move common functions from services to Gateway
Auth, SSL termination, compression, caching, CORS
-> Services focus solely on business logic
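The aggregation pattern above can be sketched in a few lines of async Python. This is an illustrative sketch, not a production gateway: the service names and the `fetch` stand-in (which would be an HTTP call in practice) are hypothetical.

```python
import asyncio


async def fetch(name: str) -> dict:
    # Stand-in for an HTTP call to a backend service
    # (e.g. httpx.AsyncClient().get(...) in a real gateway)
    await asyncio.sleep(0.01)
    return {name: "ok"}


async def dashboard() -> dict:
    # Fan out to the three services concurrently, then merge
    profile, orders, alerts = await asyncio.gather(
        fetch("user-service"),
        fetch("order-service"),
        fetch("notification-service"),
    )
    return {**profile, **orders, **alerts}


result = asyncio.run(dashboard())
```

The key point is the concurrent fan-out: total latency is roughly the slowest backend call, not the sum of all three.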
1.3 Functions Handled by API Gateways
| Function | Description |
|---|---|
| Routing | Route requests to correct services based on URL patterns/headers |
| Auth/AuthZ | OAuth2, JWT, API Key validation |
| Rate Limiting | Prevent API abuse; per-user/IP/endpoint limits |
| Request/Response Transform | Add/remove headers, body transformation, protocol conversion |
| Caching | Response caching to reduce backend load |
| Load Balancing | Distribute traffic across multiple instances |
| Circuit Breaker | Isolate failing services; prevent cascading failures |
| Logging/Monitoring | Access logs, metrics collection, distributed tracing |
| SSL/TLS Termination | Handle HTTPS at the Gateway layer |
| CORS | Cross-Origin request policy management |
| Canary/A-B Routing | Traffic percentage-based deployments |
| WebSocket/gRPC | Multi-protocol support |
2. Kong Deep Dive
2.1 Kong Architecture
Kong Architecture:

```
[Client]
   │
┌──┴──────────────────────────────────┐
│            Kong Gateway             │
│                                     │
│  ┌──────────────────────────────┐   │
│  │   Kong Core (OpenResty)      │   │
│  │   Nginx + LuaJIT             │   │
│  └──────────┬───────────────────┘   │
│             │                       │
│  ┌──────────┴───────────────────┐   │
│  │        Plugin Layer          │   │
│  │                              │   │
│  │ Auth │ Rate  │ Logging │ ... │   │
│  │      │ Limit │         │     │   │
│  └──────────┬───────────────────┘   │
│             │                       │
│  ┌──────────┴───────────────────┐   │
│  │         Data Store           │   │
│  │   PostgreSQL │ Cassandra     │   │
│  │   (or DB-less Mode)          │   │
│  └──────────────────────────────┘   │
└──────┬──────────┬──────────┬────────┘
       │          │          │
 [Service A] [Service B] [Service C]
```
2.2 Kong DB-less Mode (Declarative Config)
```yaml
# kong.yml - DB-less declarative configuration
_format_version: "3.0"
_transform: true

services:
  - name: user-service
    url: http://user-service:8080
    routes:
      - name: user-routes
        paths:
          - /api/v1/users
        methods:
          - GET
          - POST
          - PUT
          - DELETE
        strip_path: false
    plugins:
      - name: rate-limiting
        config:
          minute: 100
          hour: 5000
          policy: local
      - name: jwt
        config:
          uri_param_names:
            - token
          claims_to_verify:
            - exp

  - name: order-service
    url: http://order-service:8080
    routes:
      - name: order-routes
        paths:
          - /api/v1/orders
        strip_path: false
    plugins:
      - name: rate-limiting
        config:
          minute: 50
          hour: 2000
      - name: request-transformer
        config:
          add:
            headers:
              - "X-Request-Source:api-gateway"
              - "X-Forwarded-Service:order-service"

  - name: product-service
    url: http://product-service:8080
    routes:
      - name: product-routes
        paths:
          - /api/v1/products
        strip_path: false
    plugins:
      - name: proxy-cache
        config:
          response_code:
            - 200
          request_method:
            - GET
          content_type:
            - "application/json"
          cache_ttl: 300
          strategy: memory

# Global plugins (apply to every service and route)
plugins:
  - name: correlation-id
    config:
      header_name: X-Request-ID
      generator: uuid
      echo_downstream: true
  - name: prometheus
    config:
      per_consumer: true
  - name: file-log
    config:
      path: /dev/stdout
      reopen: true
```
2.3 Kong Key Plugins
Kong Plugin Categories:

```
Authentication:
├── jwt             - JWT token verification
├── oauth2          - Built-in OAuth2 server
├── key-auth        - API Key authentication
├── basic-auth      - Basic authentication
├── hmac-auth       - HMAC signature auth
├── ldap-auth       - LDAP/AD authentication
└── openid-connect  - OIDC (Enterprise)

Security:
├── cors            - Cross-Origin policies
├── ip-restriction  - IP whitelist/blacklist
├── bot-detection   - Bot detection
├── acl             - Access Control Lists
└── mtls-auth       - mTLS authentication

Traffic Control:
├── rate-limiting          - Rate limiting
├── request-size-limiting  - Request size limits
├── request-termination    - Request blocking / maintenance mode
└── proxy-cache            - Response caching

Transformations:
├── request-transformer   - Request header/body transformation
├── response-transformer  - Response header/body transformation
├── correlation-id        - Request tracing ID
└── grpc-gateway          - REST to gRPC conversion

Observability:
├── prometheus      - Prometheus metrics
├── datadog         - Datadog APM integration
├── zipkin          - Distributed tracing
├── file-log        - File logging
├── http-log        - HTTP log forwarding
└── opentelemetry   - OTel integration
```
2.4 Kong Custom Plugin (Lua)
```lua
-- custom-auth-plugin/handler.lua
-- Kong 3.x plugin style: a plain table with phase handlers.
-- (The legacy BasePlugin class used in older examples was removed in Kong 3.0.)
local http = require "resty.http"
local cjson = require "cjson.safe"

local CustomAuthHandler = {
  VERSION = "1.0.0",
  PRIORITY = 1000,  -- execution order relative to other plugins
}

function CustomAuthHandler:access(conf)
  -- 1. Extract the API key
  local api_key = kong.request.get_header("X-API-Key")
  if not api_key then
    return kong.response.exit(401, { message = "Missing API Key" })
  end

  -- 2. Call the external auth service
  local httpc = http.new()
  httpc:set_timeout(conf.timeout or 5000)
  local res, err = httpc:request_uri(conf.auth_service_url, {
    method = "POST",
    headers = { ["Content-Type"] = "application/json" },
    -- cjson.encode avoids injection issues from naive string concatenation
    body = cjson.encode({ api_key = api_key }),
  })
  if not res then
    kong.log.err("Auth service error: ", err)
    return kong.response.exit(503, { message = "Auth service unavailable" })
  end
  if res.status ~= 200 then
    return kong.response.exit(403, { message = "Invalid API Key" })
  end

  -- 3. Pass auth info to the upstream service as headers
  local body = cjson.decode(res.body)
  if body then
    kong.service.request.set_header("X-Consumer-ID", body.consumer_id or "")
    kong.service.request.set_header("X-Consumer-Plan", body.plan or "free")
  end
end

return CustomAuthHandler
```
3. Envoy Proxy Deep Dive
3.1 Envoy Architecture
Envoy Core Architecture:

```
┌──────────────────────────────────────────────┐
│                 Envoy Proxy                  │
│                                              │
│  Listener (port binding)                     │
│      │                                       │
│      ▼                                       │
│  Filter Chain                                │
│  ├── Network Filters (L3/L4)                 │
│  │   ├── TCP Proxy                           │
│  │   ├── TLS Inspector                       │
│  │   └── HTTP Connection Manager             │
│  │                                           │
│  └── HTTP Filters (L7)                       │
│      ├── Router                              │
│      ├── CORS                                │
│      ├── JWT Auth                            │
│      ├── Rate Limit                          │
│      ├── WASM Filter (custom)                │
│      └── ...                                 │
│                                              │
│  Route Configuration                         │
│      │                                       │
│      ▼                                       │
│  Cluster (upstream service group)            │
│  ├── Endpoint Discovery                      │
│  ├── Load Balancing                          │
│  ├── Health Checking                         │
│  ├── Circuit Breaking                        │
│  └── Outlier Detection                       │
└──────────────────────────────────────────────┘
```
3.2 Envoy Static Configuration
```yaml
# envoy.yaml - Static configuration
static_resources:
  listeners:
    - name: main_listener
      address:
        socket_address:
          address: 0.0.0.0
          port_value: 8080
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                codec_type: AUTO
                access_log:
                  - name: envoy.access_loggers.stdout
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
                      log_format:
                        json_format:
                          timestamp: "%START_TIME%"
                          method: "%REQ(:METHOD)%"
                          path: "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
                          status: "%RESPONSE_CODE%"
                          duration: "%DURATION%"
                          upstream: "%UPSTREAM_HOST%"
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: api_service
                      domains: ["*"]
                      routes:
                        - match:
                            prefix: "/api/v1/users"
                          route:
                            cluster: user_service
                            timeout: 10s
                            retry_policy:
                              retry_on: "5xx,connect-failure"
                              num_retries: 3
                              per_try_timeout: 3s
                        - match:
                            prefix: "/api/v1/orders"
                          route:
                            cluster: order_service
                            timeout: 15s
                        - match:
                            prefix: "/api/v1/products"
                          route:
                            weighted_clusters:
                              clusters:
                                - name: product_service_v1
                                  weight: 90
                                - name: product_service_v2
                                  weight: 10
                http_filters:
                  - name: envoy.filters.http.jwt_authn
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.jwt_authn.v3.JwtAuthentication
                      providers:
                        auth0:
                          issuer: "https://company.auth0.com/"
                          audiences:
                            - "https://api.company.com"
                          remote_jwks:
                            http_uri:
                              uri: "https://company.auth0.com/.well-known/jwks.json"
                              cluster: auth0_jwks
                              timeout: 5s
                            cache_duration: 600s
                      rules:
                        - match:
                            prefix: "/api/"
                          requires:
                            provider_name: "auth0"
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

  clusters:
    - name: user_service
      connect_timeout: 5s
      type: STRICT_DNS
      lb_policy: ROUND_ROBIN
      health_checks:
        - timeout: 3s
          interval: 10s
          unhealthy_threshold: 3
          healthy_threshold: 2
          http_health_check:
            path: "/health"
      circuit_breakers:
        thresholds:
          - priority: DEFAULT
            max_connections: 1024
            max_pending_requests: 1024
            max_requests: 1024
            max_retries: 3
      load_assignment:
        cluster_name: user_service
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: user-service
                      port_value: 8080

    - name: order_service
      connect_timeout: 5s
      type: STRICT_DNS
      lb_policy: LEAST_REQUEST
      outlier_detection:
        consecutive_5xx: 5
        interval: 10s
        base_ejection_time: 30s
        max_ejection_percent: 50
      load_assignment:
        cluster_name: order_service
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: order-service
                      port_value: 8080
```
3.3 xDS API (Dynamic Configuration)
xDS API Types:

```
┌──────────────────────────────────────┐
│            Control Plane             │
│  (Istio Pilot / custom xDS server)   │
└──────────┬───────────────────────────┘
           │
  ┌────────┴──────────┐
  │                   │
  ▼                   ▼
[Envoy A]         [Envoy B]

xDS Types:
├── LDS (Listener Discovery Service)
│     -> Dynamic listener config changes
├── RDS (Route Discovery Service)
│     -> Dynamic routing rule changes
├── CDS (Cluster Discovery Service)
│     -> Dynamic cluster (upstream) changes
├── EDS (Endpoint Discovery Service)
│     -> Dynamic endpoint (IP:Port) changes
├── SDS (Secret Discovery Service)
│     -> Dynamic TLS certificate changes
└── ECDS (Extension Config Discovery Service)
      -> Dynamic filter config changes
```
4. AWS API Gateway
4.1 AWS API Gateway Type Comparison
3 AWS API Gateway Types:

| | REST API | HTTP API | WebSocket |
|---|---|---|---|
| Protocol | REST | REST | WebSocket |
| Latency | Standard | Lower (~35%) | Standard |
| Price | Higher | Lower (~70%) | Per message |
| Caching | Yes | No | No |
| Usage Plans | Yes | No | No |
| API Keys | Yes | No | No |
| WAF | Yes | No | No |
| Validation | Yes | Params only | No |
| Custom Domain | Yes | Yes | Yes |
| Lambda | Yes | Yes | Yes |
| VPC Link | Yes | Yes | No |
| Best For | Full features | Simple proxy | Real-time |
4.2 AWS REST API Gateway + Lambda
```yaml
# SAM Template - REST API Gateway
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Globals:
  Api:
    Cors:
      AllowMethods: "'GET,POST,PUT,DELETE,OPTIONS'"
      AllowHeaders: "'Content-Type,Authorization,X-Api-Key'"
      AllowOrigin: "'https://app.company.com'"

Resources:
  ApiGateway:
    Type: AWS::Serverless::Api
    Properties:
      StageName: prod
      Auth:
        DefaultAuthorizer: CognitoAuth
        Authorizers:
          CognitoAuth:
            UserPoolArn: !GetAtt UserPool.Arn
            Identity:
              Header: Authorization
        ApiKeyRequired: true
        UsagePlan:
          CreateUsagePlan: PER_API
          UsagePlanName: "StandardPlan"
          Throttle:
            BurstLimit: 100
            RateLimit: 50
          Quota:
            Limit: 10000
            Period: DAY

  UserFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: handlers/user.handler
      Runtime: nodejs20.x
      MemorySize: 256
      Timeout: 10
      Events:
        GetUsers:
          Type: Api
          Properties:
            RestApiId: !Ref ApiGateway
            Path: /api/v1/users
            Method: GET
        CreateUser:
          Type: Api
          Properties:
            RestApiId: !Ref ApiGateway
            Path: /api/v1/users
            Method: POST

  OrderFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: handlers/order.handler
      Runtime: nodejs20.x
      MemorySize: 512
      Timeout: 15
      Events:
        GetOrders:
          Type: Api
          Properties:
            RestApiId: !Ref ApiGateway
            Path: /api/v1/orders
            Method: GET

  UserPool:
    Type: AWS::Cognito::UserPool
    Properties:
      UserPoolName: api-user-pool
      AutoVerifiedAttributes:
        - email
      Policies:
        PasswordPolicy:
          MinimumLength: 12
          RequireUppercase: true
          RequireLowercase: true
          RequireNumbers: true
          RequireSymbols: true
```
4.3 Lambda Authorizer
```javascript
// authorizer.js - Lambda Custom Authorizer
const jwt = require('jsonwebtoken');

exports.handler = async (event) => {
  try {
    const token = extractToken(event.authorizationToken);
    if (!token) {
      throw new Error('Unauthorized');
    }
    const decoded = await verifyToken(token);
    return generatePolicy(decoded.sub, 'Allow', event.methodArn, {
      userId: decoded.sub,
      email: decoded.email,
      role: decoded.role,
      plan: decoded.plan || 'free',
    });
  } catch (error) {
    console.error('Authorization failed:', error.message);
    // API Gateway maps this exact error message to a 401 response
    throw new Error('Unauthorized');
  }
};

function extractToken(authHeader) {
  if (!authHeader) return null;
  const parts = authHeader.split(' ');
  if (parts.length !== 2 || parts[0] !== 'Bearer') return null;
  return parts[1];
}

// Example verification against a shared secret; in production you would
// typically verify against the issuer's JWKS instead
async function verifyToken(token) {
  return jwt.verify(token, process.env.JWT_SECRET, { algorithms: ['HS256'] });
}

function generatePolicy(principalId, effect, resource, context) {
  // methodArn format: arn:aws:execute-api:region:account-id:api-id/stage/METHOD/path
  const [arn, partition, service, region, accountId, apiId, stage] =
    resource.split(/[:/]/);
  return {
    principalId,
    policyDocument: {
      Version: '2012-10-17',
      Statement: [{
        Action: 'execute-api:Invoke',
        Effect: effect,
        // Allow the whole stage so the cached policy covers later requests
        Resource: `arn:${partition}:${service}:${region}:${accountId}:${apiId}/${stage}/*`,
      }],
    },
    context: context || {},
  };
}
```
5. Traefik
5.1 Traefik Architecture
Traefik Architecture:

```
[Client]
   │
┌──┴──────────────────────────────────┐
│           Traefik Proxy             │
│                                     │
│  EntryPoints (ports)                │
│  ├── :80  (web)                     │
│  └── :443 (websecure)               │
│         │                           │
│  Routers (routing rules)            │
│  ├── Host / Path / Header matching  │
│  ├── TLS configuration              │
│  └── Middleware chain               │
│         │                           │
│  Middlewares (processing)           │
│  ├── RateLimit                      │
│  ├── BasicAuth / ForwardAuth        │
│  ├── Headers                        │
│  ├── Retry                          │
│  ├── CircuitBreaker                 │
│  └── StripPrefix                    │
│         │                           │
│  Services (backends)                │
│  ├── LoadBalancer                   │
│  ├── Weighted                       │
│  └── Mirroring                      │
│                                     │
│  Providers (auto-discovery)         │
│  ├── Docker                         │
│  ├── Kubernetes                     │
│  ├── File                           │
│  └── Consul / etcd                  │
└─────────────────────────────────────┘
```
5.2 Docker + Traefik Auto-Discovery
```yaml
# docker-compose.yml with Traefik
version: "3.8"

services:
  traefik:
    image: traefik:v3.0
    command:
      - "--api.dashboard=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedByDefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--certificatesresolvers.letsencrypt.acme.httpchallenge=true"
      - "--certificatesresolvers.letsencrypt.acme.httpchallenge.entrypoint=web"
      - "--certificatesresolvers.letsencrypt.acme.email=admin@company.com"
      - "--certificatesresolvers.letsencrypt.acme.storage=/letsencrypt/acme.json"
      - "--metrics.prometheus=true"
    ports:
      - "80:80"
      - "443:443"
      - "8080:8080"   # dashboard
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - letsencrypt:/letsencrypt

  user-service:
    image: user-service:latest
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.users.rule=Host(`api.company.com`) && PathPrefix(`/api/v1/users`)"
      - "traefik.http.routers.users.entrypoints=websecure"
      - "traefik.http.routers.users.tls.certresolver=letsencrypt"
      - "traefik.http.routers.users.middlewares=rate-limit,auth-forward"
      - "traefik.http.services.users.loadbalancer.server.port=8080"

  order-service:
    image: order-service:latest
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.orders.rule=Host(`api.company.com`) && PathPrefix(`/api/v1/orders`)"
      - "traefik.http.routers.orders.entrypoints=websecure"
      - "traefik.http.routers.orders.tls.certresolver=letsencrypt"
      - "traefik.http.services.orders.loadbalancer.server.port=8080"

volumes:
  letsencrypt:
```
5.3 Kubernetes IngressRoute (CRD)
```yaml
# Traefik IngressRoute CRD
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: api-routes
  namespace: production
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`api.company.com`) && PathPrefix(`/api/v1/users`)
      kind: Rule
      services:
        - name: user-service
          port: 8080
          weight: 100
      middlewares:
        - name: rate-limit
        - name: jwt-auth
    - match: Host(`api.company.com`) && PathPrefix(`/api/v1/products`)
      kind: Rule
      services:
        - name: product-service-v1
          port: 8080
          weight: 90
        - name: product-service-v2
          port: 8080
          weight: 10
  tls:
    certResolver: letsencrypt
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: rate-limit
  namespace: production
spec:
  rateLimit:
    average: 100
    burst: 50
    period: 1m
    sourceCriterion:
      ipStrategy:
        depth: 1
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: jwt-auth
  namespace: production
spec:
  forwardAuth:
    address: http://auth-service:8080/verify
    authResponseHeaders:
      - X-User-ID
      - X-User-Role
```
6. API Gateway Comparison Table (15+ Dimensions)
| Category | Kong | Envoy | AWS API GW | Traefik |
|---|---|---|---|---|
| Foundation | Nginx/OpenResty | C++ (custom) | AWS Managed | Go (custom) |
| License | Apache 2.0 / Enterprise | Apache 2.0 | Pay-per-use | MIT |
| Deployment | Self-hosted / Cloud | Self-hosted (sidecar) | Serverless | Self-hosted |
| Performance | High (Nginx) | Very High | High (managed) | High |
| Configuration | Admin API / Declarative | YAML / xDS API | Console / CloudFormation | Labels / YAML / CRD |
| DB-less Mode | Yes | N/A (always stateless) | N/A | N/A |
| Plugins | Lua / Go | C++ / WASM | Lambda Authorizer | Built-in Middleware |
| Service Discovery | DNS / Consul | EDS (xDS) | CloudMap / VPC Link | Docker / K8s / Consul |
| L7 Features | Rich | Very Rich | Basic | Rich |
| gRPC | Yes | Yes | Yes (HTTP/2) | Yes |
| WebSocket | Yes | Yes | Yes (separate API) | Yes |
| mTLS | Yes (Enterprise) | Yes | Yes (VPC Link) | Yes |
| Rate Limiting | Built-in plugin | External svc / WASM | Built-in (Usage Plans) | Built-in Middleware |
| Dist. Tracing | Zipkin/OTel plugin | Built-in (Zipkin/OTel) | X-Ray | Jaeger/Zipkin |
| Dashboard | Kong Manager | None (use Kiali) | AWS Console | Built-in dashboard |
| Learning Curve | Medium | High | Low | Low |
| Best For | General purpose GW | Service Mesh / sidecar | AWS serverless | Docker/K8s environments |
7. Authentication
7.1 JWT Verification Implementation
```python
# JWT verification middleware (Python/FastAPI example)
from fastapi import Request, HTTPException
from jose import jwt, JWTError, ExpiredSignatureError
import httpx


class JWTAuthMiddleware:
    def __init__(self, jwks_url: str, issuer: str, audience: str):
        self.jwks_url = jwks_url
        self.issuer = issuer
        self.audience = audience
        self._jwks_cache = None

    async def get_jwks(self):
        """Fetch and cache the JWKS (JSON Web Key Set)."""
        if self._jwks_cache is None:
            async with httpx.AsyncClient() as client:
                response = await client.get(self.jwks_url)
                self._jwks_cache = response.json()
        return self._jwks_cache

    async def verify_token(self, request: Request) -> dict:
        """Extract and verify the JWT from the request."""
        auth_header = request.headers.get("Authorization")
        if not auth_header or not auth_header.startswith("Bearer "):
            raise HTTPException(
                status_code=401,
                detail="Missing or invalid Authorization header",
            )
        token = auth_header.split(" ")[1]
        try:
            unverified_header = jwt.get_unverified_header(token)
            kid = unverified_header.get("kid")

            jwks = await self.get_jwks()
            key = self._find_key(jwks, kid)
            if key is None:
                # Key rotation: refresh the cache once and retry
                self._jwks_cache = None
                jwks = await self.get_jwks()
                key = self._find_key(jwks, kid)
            if key is None:
                raise HTTPException(status_code=401, detail="Unknown signing key")

            return jwt.decode(
                token,
                key,
                algorithms=["RS256"],
                audience=self.audience,
                issuer=self.issuer,
            )
        except ExpiredSignatureError:
            raise HTTPException(status_code=401, detail="Token expired")
        except JWTError as e:
            raise HTTPException(status_code=401, detail=f"Invalid token: {e}")

    @staticmethod
    def _find_key(jwks: dict, kid: str):
        for jwk in jwks.get("keys", []):
            if jwk["kid"] == kid:
                return jwk
        return None
```
7.2 OAuth2 Flow
OAuth2 Authorization Code Flow + PKCE:

```
[Browser/App]          [API Gateway]        [Auth Server]        [Backend]
     │                      │                    │                   │
     │ 1. Login request     │                    │                   │
     │─────────────────────▶│                    │                   │
     │ 2. Redirect to       │                    │                   │
     │    Auth Server       │                    │                   │
     │◀─────────────────────│                    │                   │
     │ 3. Authenticate (ID/PW)                   │                   │
     │──────────────────────────────────────────▶│                   │
     │ 4. Authorization Code                     │                   │
     │◀──────────────────────────────────────────│                   │
     │ 5. Code + PKCE       │                    │                   │
     │    verifier          │                    │                   │
     │─────────────────────▶│ 6. Code -> Token   │                   │
     │                      │───────────────────▶│                   │
     │                      │ 7. Access Token    │                   │
     │                      │◀───────────────────│                   │
     │ 8. Access Token      │                    │                   │
     │◀─────────────────────│                    │                   │
     │ 9. API call + Bearer │                    │                   │
     │─────────────────────▶│ 10. Verify token   │                   │
     │                      │───────────────────▶│                   │
     │                      │ 11. Valid          │                   │
     │                      │◀───────────────────│                   │
     │                      │ 12. Forward request│                   │
     │                      │───────────────────────────────────────▶│
     │                      │ 13. Response       │                   │
     │                      │◀───────────────────────────────────────│
     │ 14. API response     │                    │                   │
     │◀─────────────────────│                    │                   │
```
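The PKCE verifier/challenge pair used in steps 3 through 6 is easy to generate; here is a minimal sketch following RFC 7636's S256 method:

```python
import base64
import hashlib
import secrets


def make_pkce_pair() -> tuple[str, str]:
    # code_verifier: 43-128 unreserved characters (RFC 7636);
    # 32 random bytes base64url-encode to exactly 43 chars after stripping padding
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    # code_challenge = BASE64URL(SHA256(verifier)), without padding
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge


verifier, challenge = make_pkce_pair()
print(len(verifier), len(challenge))  # 43 43
```

The client sends only the challenge in step 3 and reveals the verifier in step 5, so an intercepted authorization code cannot be redeemed by an attacker.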
7.3 API Key Management
```python
# API Key management system
import hashlib
import secrets
from datetime import datetime, timedelta


class APIKeyManager:
    def __init__(self, db):
        self.db = db

    def generate_key(self, consumer_id: str, plan: str = "free") -> dict:
        """Generate a new API key."""
        prefix = f"sk_{'live' if plan != 'free' else 'test'}"
        raw_key = f"{prefix}_{secrets.token_urlsafe(32)}"
        # Store only the hash in the database
        key_hash = hashlib.sha256(raw_key.encode()).hexdigest()
        record = {
            "key_hash": key_hash,
            "key_prefix": raw_key[:12],
            "consumer_id": consumer_id,
            "plan": plan,
            "rate_limit": self._get_rate_limit(plan),
            "created_at": datetime.utcnow().isoformat(),
            "expires_at": (datetime.utcnow() + timedelta(days=365)).isoformat(),
            "is_active": True,
        }
        self.db.insert("api_keys", record)
        return {
            "api_key": raw_key,  # Returned once only; not retrievable later
            "prefix": raw_key[:12],
            "plan": plan,
            "expires_at": record["expires_at"],
        }

    def validate_key(self, raw_key: str) -> dict:
        """Validate an API key."""
        key_hash = hashlib.sha256(raw_key.encode()).hexdigest()
        record = self.db.find_one("api_keys", {"key_hash": key_hash})
        if not record:
            return {"valid": False, "error": "Invalid API key"}
        if not record["is_active"]:
            return {"valid": False, "error": "API key is deactivated"}
        if datetime.fromisoformat(record["expires_at"]) < datetime.utcnow():
            return {"valid": False, "error": "API key expired"}
        return {
            "valid": True,
            "consumer_id": record["consumer_id"],
            "plan": record["plan"],
            "rate_limit": record["rate_limit"],
        }

    def _get_rate_limit(self, plan: str) -> dict:
        limits = {
            "free": {"rpm": 60, "rpd": 1000},
            "starter": {"rpm": 300, "rpd": 10000},
            "pro": {"rpm": 1000, "rpd": 100000},
            "enterprise": {"rpm": 10000, "rpd": 1000000},
        }
        return limits.get(plan, limits["free"])
```
8. Rate Limiting Algorithms
8.1 Token Bucket
```python
import time
import threading


class TokenBucket:
    """Token Bucket rate limiter.

    Characteristics:
    - Tokens are added to the bucket at a fixed rate
    - Each request consumes 1 token
    - Requests are rejected when no tokens remain
    - Bursts are allowed (up to bucket capacity)
    """

    def __init__(self, rate: float, capacity: int):
        self.rate = rate              # Tokens generated per second
        self.capacity = capacity      # Max bucket size (burst allowance)
        self.tokens = capacity        # Current token count
        self.last_refill = time.monotonic()
        self.lock = threading.Lock()

    def allow_request(self) -> bool:
        with self.lock:
            now = time.monotonic()
            elapsed = now - self.last_refill
            # Refill tokens accrued since the last check
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
            self.last_refill = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False


# Usage: 10 tokens/sec, with bursts of up to 20 requests
limiter = TokenBucket(rate=10, capacity=20)
```
8.2 Sliding Window Log
```python
import time
import threading
from collections import deque


class SlidingWindowLog:
    """Sliding Window Log rate limiter.

    Characteristics:
    - Precise window-based limiting
    - Memory usage proportional to request count
    - No boundary issues (an advantage over Fixed Window)
    """

    def __init__(self, max_requests: int, window_seconds: int):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.requests = deque()  # Timestamp log
        self.lock = threading.Lock()

    def allow_request(self) -> bool:
        with self.lock:
            now = time.monotonic()
            window_start = now - self.window_seconds
            # Evict requests that have fallen out of the window
            while self.requests and self.requests[0] < window_start:
                self.requests.popleft()
            if len(self.requests) < self.max_requests:
                self.requests.append(now)
                return True
            return False


# Usage: at most 100 requests in any 60-second window
limiter = SlidingWindowLog(max_requests=100, window_seconds=60)
```
8.3 Distributed Rate Limiting (Redis)
```python
import time
import uuid

import redis


class DistributedRateLimiter:
    """Redis-based distributed rate limiter (sliding window counter)."""

    def __init__(self, redis_client: redis.Redis, prefix: str = "rl"):
        self.redis = redis_client
        self.prefix = prefix

    def is_allowed(self, key: str, max_requests: int, window_seconds: int) -> dict:
        """Sliding window counter using a Redis sorted set."""
        now = time.time()
        window_start = now - window_seconds
        redis_key = f"{self.prefix}:{key}"
        # Unique member per request, kept so the rollback below removes
        # exactly the entry this call added
        member = f"{now}:{uuid.uuid4().hex}"

        pipe = self.redis.pipeline()
        # 1. Remove entries that fell out of the window
        pipe.zremrangebyscore(redis_key, 0, window_start)
        # 2. Count requests currently in the window
        pipe.zcard(redis_key)
        # 3. Optimistically add the current request
        pipe.zadd(redis_key, {member: now})
        # 4. Expire the key shortly after the window closes
        pipe.expire(redis_key, window_seconds + 1)
        results = pipe.execute()
        current_count = results[1]

        if current_count < max_requests:
            return {
                "allowed": True,
                "remaining": max(0, max_requests - current_count - 1),
                "reset_at": int(now + window_seconds),
                "limit": max_requests,
            }
        # Over the limit: roll back the optimistic insert
        self.redis.zrem(redis_key, member)
        return {
            "allowed": False,
            "remaining": 0,
            "reset_at": int(now + window_seconds),
            "limit": max_requests,
            "retry_after": window_seconds,
        }
```
9. Caching and Transformation
9.1 Caching Strategies
API Gateway Caching Strategies:

1. Response Cache

```
┌──────────┐     ┌──────────┐     ┌──────────┐
│  Client  │────▶│ Gateway  │────▶│ Backend  │
│          │     │  Cache   │     │          │
│          │◀────│  (Hit!)  │     │          │
└──────────┘     └──────────┘     └──────────┘
```

Cache Key = Method + Path + Query + Selected Headers
TTL examples: GET /products -> 5 min, GET /products/123 -> 1 min

2. Cache invalidation strategies:
- TTL-based: automatic expiry after a fixed time
- Event-based: purge cache entries when the underlying data changes
- Stale-While-Revalidate: return the stale cache immediately, refresh in the background

3. Cache-Control headers:

```
Cache-Control: public, max-age=300, s-maxage=600
ETag: "v1-product-123-hash"
Vary: Accept, Authorization
```
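A cache key built from method + path + query + selected headers, as described above, can be sketched in a few lines. The key recipe is illustrative; every gateway defines its own:

```python
import hashlib


def cache_key(method: str, path: str, query: dict, headers: dict,
              vary_headers: tuple = ("accept", "accept-encoding")) -> str:
    # Sort query params and Vary-listed headers so equivalent requests
    # hash to the same key regardless of parameter order
    parts = [
        method.upper(),
        path,
        "&".join(f"{k}={v}" for k, v in sorted(query.items())),
        "|".join(f"{h}:{headers.get(h, '')}" for h in vary_headers),
    ]
    return hashlib.sha256("\x1f".join(parts).encode()).hexdigest()


k1 = cache_key("GET", "/products", {"page": "1", "sort": "name"},
               {"accept": "application/json"})
k2 = cache_key("GET", "/products", {"sort": "name", "page": "1"},
               {"accept": "application/json"})
print(k1 == k2)  # True: query order does not affect the key
```

Note that headers not listed in `Vary` are deliberately excluded, otherwise per-request headers like `X-Request-ID` would make every key unique and the hit rate would drop to zero.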
```yaml
# Kong proxy-cache configuration
plugins:
  - name: proxy-cache
    config:
      strategy: memory
      response_code:
        - 200
        - 301
      request_method:
        - GET
        - HEAD
      content_type:
        - "application/json"
      cache_ttl: 300
      vary_headers:
        - Accept
        - Accept-Encoding
      vary_query_params:
        - page
        - limit
        - sort
      cache_control: true
```
10. Circuit Breaker and Canary Deployments
10.1 Circuit Breaker Pattern
Circuit Breaker States:

```
┌──────────┐  Failure rate exceeded   ┌──────────┐
│  CLOSED  │─────────────────────────▶│   OPEN   │
│ (normal) │                          │ (blocked)│
└──────────┘                          └────┬─────┘
     ▲                                     │
     │                        After timeout│
     │  Success              ┌─────────────┴─┐
     └───────────────────────│   HALF-OPEN   │
                             │   (testing)   │
                             └───────┬───────┘
                                     │
                         Failure -> back to OPEN
```
```yaml
# Envoy circuit breaker configuration
clusters:
  - name: order_service
    circuit_breakers:
      thresholds:
        - priority: DEFAULT
          max_connections: 1024
          max_pending_requests: 512
          max_requests: 2048
          max_retries: 3
    outlier_detection:
      consecutive_5xx: 5
      interval: 10s
      base_ejection_time: 30s
      max_ejection_percent: 50
      success_rate_minimum_hosts: 3
      success_rate_request_volume: 100
```
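For intuition, the CLOSED → OPEN → HALF-OPEN state machine can be sketched in a few lines of Python. This is an illustrative toy, not Envoy's implementation, and the thresholds are arbitrary:

```python
import time


class CircuitBreaker:
    """Minimal illustrative circuit breaker state machine."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = 0.0

    def allow(self) -> bool:
        if self.state == "OPEN":
            if time.monotonic() - self.opened_at >= self.reset_timeout:
                self.state = "HALF_OPEN"  # let one probe request through
                return True
            return False  # still blocking: fail fast without calling upstream
        return True  # CLOSED or HALF_OPEN

    def record_success(self) -> None:
        self.failures = 0
        self.state = "CLOSED"

    def record_failure(self) -> None:
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = time.monotonic()


cb = CircuitBreaker(failure_threshold=3, reset_timeout=0.1)
for _ in range(3):
    cb.record_failure()
print(cb.state, cb.allow())   # OPEN False: requests fail fast
time.sleep(0.15)
print(cb.allow(), cb.state)   # True HALF_OPEN: one probe is admitted
```

The value of the pattern is the fail-fast behavior in the OPEN state: callers get an immediate error instead of queueing behind a dying backend, which is what prevents cascading failures.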
10.2 Canary/A-B Routing
```yaml
# Envoy weighted routing (canary deployment)
route_config:
  virtual_hosts:
    - name: api
      domains: ["api.company.com"]
      routes:
        - match:
            prefix: "/api/v1/products"
          route:
            weighted_clusters:
              clusters:
                - name: product_v1
                  weight: 90
                - name: product_v2
                  weight: 10
        # Header-based A/B routing: feature-flagged users hit v2
        - match:
            prefix: "/api/v1/checkout"
            headers:
              - name: "X-Feature-Flag"
                exact_match: "new-checkout"
          route:
            cluster: checkout_v2
        - match:
            prefix: "/api/v1/checkout"
          route:
            cluster: checkout_v1
```
11. GraphQL Gateway
11.1 Apollo Router / Federation
GraphQL Federation Architecture:

```
[Client]
   │
┌──┴──────────────────────────┐
│        Apollo Router        │
│    (Supergraph Gateway)     │
│                             │
│  ┌───────────────────────┐  │
│  │     Query Planner     │  │
│  │  -> Which subgraphs   │  │
│  │     get which queries │  │
│  └───────────────────────┘  │
└──────┬──────┬──────┬────────┘
       │      │      │
 ┌─────┴─┐ ┌──┴──┐ ┌─┴───────┐
 │ User  │ │Order│ │ Product │
 │ Sub-  │ │Sub- │ │Subgraph │
 │ graph │ │graph│ │         │
 └───────┘ └─────┘ └─────────┘
```
```yaml
# Apollo Router configuration
supergraph:
  listen: 0.0.0.0:4000

traffic_shaping:
  all:
    timeout: 30s
  subgraphs:
    users:
      timeout: 10s
    orders:
      timeout: 15s

limits:
  max_depth: 15
  max_height: 200
  max_aliases: 30
  max_root_fields: 20

telemetry:
  exporters:
    metrics:
      prometheus:
        enabled: true
        listen: 0.0.0.0:9090
    tracing:
      otlp:
        enabled: true
        endpoint: http://otel-collector:4317

ratelimit:
  global:
    capacity: 1000
    interval: 1m
```
12. API Versioning Strategies
12.1 Versioning Approaches Compared
3 API Versioning Strategies:
1. URL Path Versioning
GET /api/v1/users
GET /api/v2/users
-> Most intuitive, widely used
-> Cache-friendly
-> URL changes (breaking)
2. Header Versioning
GET /api/users
Accept: application/vnd.company.v2+json
-> Clean URLs
-> Hard to test in browser
3. Query Parameter Versioning
GET /api/users?version=2
-> Simple
-> Complex caching
-> Query parameter pollution
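Header versioning (strategy 2) requires the gateway to parse the requested media type before routing. A minimal sketch of that negotiation, using the hypothetical `vnd.company` vendor prefix from the example above:

```python
import re


def resolve_version(accept_header, default: int = 1) -> int:
    """Extract the API version from e.g. 'application/vnd.company.v2+json'.

    Falls back to the default version when no vendor media type is present,
    so plain 'application/json' clients keep working.
    """
    match = re.search(r"vnd\.company\.v(\d+)\+json", accept_header or "")
    return int(match.group(1)) if match else default


print(resolve_version("application/vnd.company.v2+json"))  # 2
print(resolve_version("application/json"))                 # 1 (default)
```

A gateway would then use the resolved number to pick the upstream service, exactly as the path-based examples below do with `/api/v1` and `/api/v2` prefixes.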
12.2 Gateway-Level Versioning
```yaml
# Kong route-based versioning
services:
  - name: user-service-v1
    url: http://user-service-v1:8080
    routes:
      - name: users-v1
        paths:
          - /api/v1/users
        strip_path: false

  - name: user-service-v2
    url: http://user-service-v2:8080
    routes:
      - name: users-v2
        paths:
          - /api/v2/users
        strip_path: false

  # Header-based versioning
  - name: user-service-v2-header
    url: http://user-service-v2:8080
    routes:
      - name: users-v2-header
        paths:
          - /api/users
        headers:
          X-API-Version:
            - "2"
        strip_path: false
```
13. Monitoring and Observability
13.1 Prometheus Metrics
```yaml
# API Gateway core metrics

# 1. The four golden signals
golden_signals:
  latency:
    - histogram: api_request_duration_seconds
      labels: [method, route, status_code]
      buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]
  traffic:
    - counter: api_requests_total
      labels: [method, route, status_code, consumer]
  errors:
    - counter: api_errors_total
      labels: [method, route, error_type]
  saturation:
    - gauge: api_active_connections
    - gauge: api_rate_limit_remaining

# 2. Grafana dashboard queries
panels:
  - title: "Request Rate (RPS)"
    query: "sum(rate(api_requests_total[5m])) by (route)"
  - title: "P99 Latency"
    query: |
      histogram_quantile(0.99,
        sum(rate(api_request_duration_seconds_bucket[5m])) by (le, route)
      )
  - title: "Error Rate"
    query: |
      sum(rate(api_requests_total{status_code=~"5.."}[5m]))
      / sum(rate(api_requests_total[5m])) * 100
```
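The `histogram_quantile` function in the P99 panel interpolates within bucket boundaries. The core idea can be reproduced in a few lines of Python over cumulative `(upper_bound, count)` buckets, which is how Prometheus stores histogram data (this is a simplified sketch of PromQL's behavior, not its exact implementation):

```python
def histogram_quantile(q: float, buckets: list) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets,
    interpolating linearly inside the bucket containing the target rank."""
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            # Linear interpolation between the bucket's boundaries
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return buckets[-1][0]


# Cumulative counts: 90 requests took <= 0.1s, 99 <= 0.5s, 100 <= 1s
buckets = [(0.1, 90), (0.5, 99), (1.0, 100)]
print(histogram_quantile(0.99, buckets))  # approximately 0.5
```

This also shows why bucket boundaries matter: the estimate can never be more precise than the bucket the quantile falls in, so the `buckets` list above should straddle your latency SLO.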
13.2 Distributed Tracing
API Gateway Distributed Tracing Flow:
Request ID: abc-123-def
[Client] ─── [API Gateway] ──── [User Service] ──── [DB]
│ │ │ │
│ Span: req │ Span: gateway │ Span: user-svc│ Span: db-query
│ trace: abc │ trace: abc │ trace: abc │ trace: abc
│ span: s-1 │ span: s-2 │ span: s-3 │ span: s-4
│ parent: - │ parent: s-1 │ parent: s-2 │ parent: s-3
│ dur: 250ms │ dur: 200ms │ dur: 150ms │ dur: 50ms
│ │ │ │
│ Tags: │ Tags: │ Tags: │ Tags:
│ method:GET│ auth: jwt │ db: postgres │ stmt: SELECT
│ url: /usr │ cache: miss │ duration: 50ms│
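The parent/child structure in the flow above can be modeled in a few lines. This is a conceptual sketch, not the OpenTelemetry API; `Span` and `new_child` are illustrative names.

```python
import itertools
from dataclasses import dataclass, field
from typing import Optional

_ids = itertools.count(1)

@dataclass
class Span:
    name: str
    trace_id: str
    parent_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: f"s-{next(_ids)}")

    def new_child(self, name: str) -> "Span":
        # Children share the trace_id; parent_id links the hops together
        return Span(name, self.trace_id, parent_id=self.span_id)

# One trace across the four hops shown above
client = Span("req", trace_id="abc")
gateway = client.new_child("gateway")
user_svc = gateway.new_child("user-svc")
db = user_svc.new_child("db-query")

assert {s.trace_id for s in (client, gateway, user_svc, db)} == {"abc"}
assert db.parent_id == user_svc.span_id
```

In a real system the (trace_id, parent span_id) pair is propagated across service boundaries in request headers, e.g. the W3C `traceparent` header, so each hop can attach its span to the same trace.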
14. Quiz
Q1. What are the three core API Gateway patterns?
Answer:
- Routing Pattern: Forwards client requests to the correct backend service based on URL paths, headers, and other criteria.
- Aggregation Pattern: Combines responses from multiple backend services into a single unified response for the client. Related to the BFF (Backend for Frontend) pattern.
- Offloading Pattern: Moves cross-cutting concerns (authentication, SSL termination, caching, rate limiting) from individual services to the Gateway, allowing services to focus purely on business logic.
Q2. When is each gateway (Kong, Envoy, AWS API GW, Traefik) most appropriate?
Answer:
- Kong: When you need a general-purpose API Gateway with a rich plugin ecosystem. Supports DB-less declarative configuration, and auth, rate limiting, and request transformation are available as plugins out of the box.
- Envoy: For Service Mesh data planes or sidecar proxies. Offers dynamic configuration via the xDS API and custom extensibility through WASM filters; a natural fit when integrated with Istio.
- AWS API Gateway: For AWS Lambda-based serverless architectures. No infrastructure to manage, with built-in Usage Plans and API Key management.
- Traefik: When automatic service discovery in Docker/Kubernetes environments is needed. Routing is configured entirely through labels/annotations, with automatic Let's Encrypt certificate management.
Q3. What is the difference between Token Bucket and Sliding Window rate limiting?
Answer:
- Token Bucket: Tokens are replenished at a fixed rate, and each request consumes one. As long as tokens remain in the bucket, short bursts up to the bucket's capacity are allowed. Memory-efficient and simple to implement.
- Sliding Window Log: Records the timestamp of each request and counts the requests that fall within the window ending at the current time. This eliminates the boundary problem of Fixed Window (up to a 2x burst at window edges), but memory usage grows with the request count.
In practice, the Sliding Window Log is often implemented with Redis Sorted Sets, while the Sliding Window Counter variant (weighting the previous and current fixed-window counts) is widely used as a good balance between accuracy and memory efficiency.
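The Token Bucket half of the comparison fits in a short sketch. Timestamps are passed in explicitly so the example is deterministic; a real limiter would use `time.monotonic()`, and the class and parameter names are illustrative.

```python
class TokenBucket:
    """Token bucket: refill at a fixed rate, allow bursts up to capacity."""
    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # bucket size = maximum burst
        self.tokens = capacity    # start full
        self.last = now

    def allow(self, now: float) -> bool:
        # Lazily refill based on elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)  # 5 req/s steady, burst of 10
burst = [bucket.allow(0) for _ in range(12)]
print(burst.count(True))  # 10: the burst drains the full bucket
print(bucket.allow(1))    # True: one second refills 5 tokens
```

Lazy refill (computing tokens from elapsed time on each request) avoids a background timer and keeps state to two numbers per client, which is what makes the algorithm memory-efficient.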
Q4. What are the advantages of GraphQL Federation and the role of Apollo Router?
Answer:
GraphQL Federation allows multiple services to define their own GraphQL schemas (subgraphs), which are composed into a single unified schema (supergraph).
Advantages:
- Service autonomy: Each team independently manages their domain schema
- Single endpoint: Clients use one GraphQL endpoint
- Type extension: Services can extend types across boundaries
Apollo Router role:
- Analyzes client queries and creates a Query Plan determining which subgraphs receive which queries
- Automatically merges responses from subgraphs and returns them to the client
- Provides gateway features including rate limiting, authentication, tracing, and caching
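What a query plan does can be illustrated conceptually: fetch entity fields from separate subgraphs, then merge on the entity key. This is not the Apollo Router implementation; the subgraph fetch functions below are hard-coded stand-ins for real GraphQL requests.

```python
def fetch_users_subgraph():
    # Stand-in for a query against the Users subgraph
    return [{"id": "u1", "name": "Alice"}]

def fetch_orders_subgraph(user_ids):
    # Stand-in for an entity query against the Orders subgraph
    return {"u1": [{"orderId": "o-9", "total": 42}]}

def execute_query_plan():
    users = fetch_users_subgraph()                            # step 1
    orders = fetch_orders_subgraph([u["id"] for u in users])  # step 2
    for u in users:                                           # step 3: merge
        u["orders"] = orders.get(u["id"], [])                 # join on User.id
    return {"data": {"users": users}}

result = execute_query_plan()
print(result["data"]["users"][0]["orders"][0]["orderId"])  # o-9
```

The client sees a single response shaped like its query, even though `name` and `orders` came from different services.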
Q5. What are the 4 Golden Signals to monitor at an API Gateway?
Answer: The 4 Golden Signals proposed by Google SRE:
- Latency: Time taken to process requests. Tracking P50, P95, P99 percentiles is critical.
- Traffic: Requests per second (RPS). Understand traffic patterns per endpoint and per consumer.
- Errors: Error ratio relative to total requests. Distinguish between 5xx server errors and 4xx client errors.
- Saturation: System resource utilization. Includes concurrent connections, rate limit remaining capacity, and circuit breaker state.
15. References
- Kong Documentation
- Envoy Proxy Documentation
- AWS API Gateway Developer Guide
- Traefik Documentation
- Apollo Router Documentation
- GraphQL Federation Specification
- NGINX Rate Limiting
- Token Bucket Algorithm (Wikipedia)
- Envoy xDS Protocol
- Google SRE Book - Monitoring Distributed Systems
- OpenTelemetry Documentation
- Istio Service Mesh
- RFC 6585 - Additional HTTP Status Codes (429)
- Microsoft API Design Best Practices