Complete Guide to API Gateway Pattern: Rate Limiting, Authentication, and BFF Architecture Design


Introduction

As microservices architectures proliferate, it becomes impractical for clients to communicate directly with dozens or hundreds of backend services. An API Gateway serves as a single entry point between clients and backend services, centralizing cross-cutting concerns such as routing, authentication/authorization, rate limiting, load balancing, and protocol translation.

This article covers the core concepts of the API Gateway pattern, followed by an in-depth comparison of Rate Limiting algorithms (Token Bucket, Sliding Window, Fixed Window, Leaky Bucket), JWT/OAuth2-based authentication strategies, BFF (Backend for Frontend) architecture design, load balancing and circuit breaker configurations, and production implementation examples using Kong and Apache APISIX. We conclude with real-world failure scenarios and an operational checklist for production environments.

API Gateway Pattern Overview

Roles of an API Gateway

An API Gateway centralizes the following cross-cutting concerns:

  • Routing: Forwards requests to appropriate backend services based on URL, headers, and methods
  • Authentication/Authorization: JWT validation, OAuth2 token verification, API key management
  • Rate Limiting: Per-client and per-API request rate restrictions
  • Load Balancing: Traffic distribution using round-robin, weighted, or least-connections algorithms
  • Circuit Breaking: Automatically blocks requests to failing services to prevent cascading failures
  • Protocol Translation: REST to gRPC, HTTP to WebSocket conversions
  • Caching: Response caching for improved performance
  • Monitoring: Metrics collection, distributed tracing, and logging

API Gateway Solution Comparison

| Feature | Kong | Apache APISIX | AWS API Gateway | Envoy |
|---|---|---|---|---|
| Core Technology | NGINX + Lua | NGINX + etcd | AWS managed | C++ |
| Performance (QPS) | ~10,000+ | ~23,000+ | Managed (with limits) | ~15,000+ |
| Plugin Ecosystem | Very rich (100+) | Rich (80+) | Limited | Rich (filter chains) |
| Config Store | PostgreSQL / Cassandra | etcd | AWS internal | xDS API |
| Dynamic Config | Admin API | Admin API + etcd watch | Console/CLI | xDS hot reload |
| Service Mesh | Kong Mesh (Kuma) | Amesh (Istio integration) | App Mesh | Istio default proxy |
| Kubernetes Native | Kong Ingress Controller | APISIX Ingress Controller | None (EKS integration) | Gateway API support |
| License | Apache 2.0 / Enterprise | Apache 2.0 | Pay-per-use | Apache 2.0 |
| Best For | General purpose, enterprise | High performance, dynamic routing | AWS-native workloads | K8s, service mesh |

API Gateway vs Service Mesh

API Gateways and service meshes are complementary technologies.

| Aspect | API Gateway | Service Mesh |
|---|---|---|
| Position | Between clients and services (north-south traffic) | Between services (east-west traffic) |
| Primary Role | External request routing, auth, rate limiting | Inter-service mTLS, traffic management, observability |
| Deployment | Centralized (gateway cluster) | Distributed (sidecar proxies) |
| Protocols | HTTP, gRPC, WebSocket | TCP, HTTP, gRPC |
| Solutions | Kong, APISIX, AWS API GW | Istio, Linkerd, Consul Connect |

Rate Limiting Algorithms

Rate limiting is one of the most critical API Gateway features, essential for preventing service overload, defending against DDoS attacks, and ensuring fair resource distribution.

Algorithm Comparison

| Algorithm | Principle | Burst Allowed | Memory Usage | Accuracy | Complexity |
|---|---|---|---|---|---|
| Fixed Window | Counter within fixed time window | 2x possible at boundary | Low | Low | Very Low |
| Sliding Window Log | Timestamp recorded per request | None | High | High | Medium |
| Sliding Window Counter | Weighted average of prev/current windows | Minimized | Low | Medium | Medium |
| Token Bucket | Tokens refilled at steady rate, consumed per request | Yes (up to bucket size) | Low | Medium | Low |
| Leaky Bucket | Requests processed at fixed rate, excess queued | None (fixed rate) | Low | High | Low |

Token Bucket Algorithm

The Token Bucket algorithm is the most practical approach, allowing burst traffic while constraining the average request rate.

# Kong - Rate Limiting Plugin Configuration (Token Bucket based)
# kong.yml - Declarative Configuration
_format_version: '3.0'

services:
  - name: user-service
    url: http://user-service:8080
    routes:
      - name: user-route
        paths:
          - /api/v1/users
    plugins:
      - name: rate-limiting
        config:
          # 100 requests per minute, 1000 per hour
          minute: 100
          hour: 1000
          # Policy: local (single node), cluster (cluster-wide), redis (Redis-based)
          policy: redis
          redis:
            host: redis-cluster
            port: 6379
            password: null
            database: 0
            timeout: 2000
          # Return rate limit headers
          header_name: null
          hide_client_headers: false
          # Limit key: consumer, credential, ip, header, path, service
          limit_by: consumer
          # Allow requests when Redis is down
          fault_tolerant: true
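Under the hood, token-bucket admission is just refill-then-consume: tokens accumulate at a steady rate up to the bucket capacity, and each request spends one. A minimal Python sketch of the mechanics (illustrative only; Kong's plugin implements its counters in Lua, with Redis holding shared state):

```python
import time

class TokenBucket:
    """Minimal token bucket: refills at `rate` tokens/sec up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at bucket capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=100 / 60, capacity=10)  # ~100 req/min, bursts of 10
results = [bucket.allow() for _ in range(15)]     # 15 back-to-back requests
print(results.count(True))  # → 10: the burst passes, the rest are rejected
```

The capacity bounds the burst, while the refill rate bounds the long-run average — which is why the table above lists Token Bucket as both burst-friendly and memory-cheap.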

Sliding Window Algorithm

The Sliding Window Counter resolves the boundary problem of Fixed Windows while maintaining memory efficiency.

-- APISIX Custom Rate Limiting Plugin (Sliding Window Counter)
-- apisix/plugins/sliding-window-rate-limit.lua

local core = require("apisix.core")
local ngx = ngx
local math = math

local schema = {
    type = "object",
    properties = {
        rate = { type = "integer", minimum = 1 },
        burst = { type = "integer", minimum = 0 },
        window_size = { type = "integer", minimum = 1, default = 60 },
        key_type = {
            type = "string",
            enum = { "remote_addr", "consumer_name", "header" },
            default = "remote_addr"
        },
    },
    required = { "rate" },
}

local _M = {
    version = 0.1,
    priority = 1001,
    name = "sliding-window-rate-limit",
    schema = schema,
}

-- Window counters live in an nginx shared dict; declare one named
-- "sliding-window-rate-limit" via nginx_config in APISIX's config.yaml
local counters = ngx.shared["sliding-window-rate-limit"]

local function get_count(key, window_start)
    return counters:get(key .. ":" .. window_start)
end

local function increment_count(key, window_start)
    -- incr with init = 0 creates the counter on first use
    -- (expiry of stale window keys is omitted for brevity)
    counters:incr(key .. ":" .. window_start, 1, 0)
end

function _M.access(conf, ctx)
    local key = ctx.var.remote_addr
    if conf.key_type == "consumer_name" then
        key = ctx.consumer_name or ctx.var.remote_addr
    end

    local now = ngx.now()
    local window = conf.window_size
    local current_window = math.floor(now / window) * window
    local previous_window = current_window - window
    local elapsed = now - current_window

    -- Calculate weighted average of previous and current windows
    local prev_count = get_count(key, previous_window) or 0
    local curr_count = get_count(key, current_window) or 0
    local weight = (window - elapsed) / window
    local estimated = prev_count * weight + curr_count

    if estimated >= conf.rate + (conf.burst or 0) then
        return 429, {
            error = "Rate limit exceeded",
            retry_after = math.ceil(window - elapsed)
        }
    end

    increment_count(key, current_window)
end

return _M
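The weighted-average estimate at the heart of the plugin is a few lines of arithmetic; extracted into standalone Python for experimentation:

```python
import math

def sliding_window_allow(prev_count: int, curr_count: int,
                         now: float, window: int, rate: int) -> bool:
    """Estimate requests over the last `window` seconds by weighting the
    previous fixed window by how much of it still overlaps the sliding one."""
    current_window_start = math.floor(now / window) * window
    elapsed = now - current_window_start
    weight = (window - elapsed) / window
    estimated = prev_count * weight + curr_count
    return estimated < rate

# 30s into a 60s window: half of the previous window's 80 requests still
# count, so the estimate is 80 * 0.5 + 50 = 90, under the limit of 100
print(sliding_window_allow(prev_count=80, curr_count=50,
                           now=90.0, window=60, rate=100))  # → True
```

The estimate assumes requests in the previous window were evenly distributed — the source of the "Medium" accuracy rating, traded for storing just two counters per key.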

Authentication and Authorization Strategies

Centralizing authentication at the API Gateway significantly reduces the security burden on backend services.

JWT Authentication Setup

# APISIX - JWT Authentication Plugin Configuration
# apisix/conf/config.yaml
routes:
  - uri: /api/v1/orders/*
    upstream:
      type: roundrobin
      nodes:
        'order-service:8080': 1
    plugins:
      jwt-auth:
        # Public key for JWT signature verification
        key: 'user-auth-key'
        # Token location configuration
        header: 'Authorization'
        query: 'token'
        cookie: 'jwt_token'
      # Additional: Role-based access control
      consumer-restriction:
        type: consumer_group_id
        whitelist:
          - 'premium-users'
          - 'admin-group'
        rejected_code: 403
        rejected_msg: 'Access denied: insufficient permissions'

# Consumer configuration (API user definitions)
consumers:
  - username: 'mobile-app'
    plugins:
      jwt-auth:
        key: 'mobile-app-key'
        secret: 'mobile-app-secret-256bit-key-here'
        algorithm: 'HS256'
        exp: 86400 # Token expiry: 24 hours
        base64_secret: false
  - username: 'web-frontend'
    plugins:
      jwt-auth:
        key: 'web-frontend-key'
        # Public key path for RS256
        public_key: |
          -----BEGIN PUBLIC KEY-----
          MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA...
          -----END PUBLIC KEY-----
        algorithm: 'RS256'
        exp: 3600 # Token expiry: 1 hour
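Per request, "JWT validation at the gateway" boils down to a signature check plus claim checks. A stdlib-only Python sketch of HS256 verification (illustrative; production gateways use a vetted JWT library, and the secret and claims here are made up):

```python
import base64, hashlib, hmac, json, time

def b64url(b: bytes) -> str:
    return base64.urlsafe_b64encode(b).rstrip(b"=").decode()

def b64url_decode(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def verify_hs256(token: str, secret: bytes) -> dict:
    """Check an HS256 JWT's signature and exp claim, returning its payload."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("invalid signature")
    payload = json.loads(b64url_decode(payload_b64))
    if payload.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return payload

# Forge a token locally for the demo (normally the auth server issues it)
secret = b"demo-shared-secret"
header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
claims = b64url(json.dumps({"sub": "user-42",
                            "exp": int(time.time()) + 3600}).encode())
sig = b64url(hmac.new(secret, f"{header}.{claims}".encode(),
                      hashlib.sha256).digest())
token = f"{header}.{claims}.{sig}"

print(verify_hs256(token, secret)["sub"])  # → user-42
```

With RS256, the same flow applies but verification uses the IdP's public key — which is why the gateway never needs the signing secret at all.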

OAuth2 + OIDC Integrated Authentication Flow

Integrating OAuth2/OIDC at the API Gateway centralizes IdP (Identity Provider) connectivity.

# Kong - OpenID Connect Plugin Configuration
plugins:
  - name: openid-connect
    config:
      issuer: 'https://auth.example.com/realms/production'
      client_id: 'api-gateway'
      client_secret: 'gateway-secret-value'
      redirect_uri: 'https://api.example.com/callback'
      # Supported authentication flows
      auth_methods:
        - authorization_code # Web applications
        - client_credentials # Service-to-service
        - password # Legacy support (not recommended)
      # Token validation settings
      token_endpoint_auth_method: client_secret_post
      # Scope-based access control
      scopes_required:
        - openid
        - profile
        - api:read
      # Token caching (performance optimization)
      cache_ttl: 300
      # Token introspection (opaque token verification)
      introspection_endpoint: 'https://auth.example.com/realms/production/protocol/openid-connect/token/introspect'
      # Headers forwarded to upstream
      upstream_headers_claims:
        - sub
        - email
        - realm_access.roles
      upstream_headers_names:
        - X-User-ID
        - X-User-Email
        - X-User-Roles

BFF (Backend for Frontend) Architecture

Why BFF Pattern Is Needed

Serving all clients (web, mobile, IoT) through a single API Gateway introduces several problems:

  • Excessive data transfer: Full web-optimized payloads sent to mobile clients
  • Complex gateway logic: Client-specific branching logic accumulates in the gateway
  • Deployment coupling: Changes for one client type affect others

The BFF pattern provides dedicated, optimized backends for each frontend type, solving these problems.
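The payload-trimming a mobile BFF performs is simple field projection; a toy Python illustration (record shape and field names are hypothetical):

```python
# Full product record as a backend service might return it (hypothetical shape)
product = {
    "id": "p-1001",
    "name": "Mechanical Keyboard",
    "price": 129.0,
    "description": "Long marketing copy intended for the web storefront...",
    "reviews": [{"user": "a", "text": "..."}] * 50,
    "related_products": ["p-1002", "p-1003"],
}

MOBILE_FIELDS = {"id", "name", "price"}

def to_mobile(record: dict) -> dict:
    """Mobile BFF: strip heavy fields so handsets download only what they render."""
    return {k: v for k, v in record.items() if k in MOBILE_FIELDS}

print(to_mobile(product))
# → {'id': 'p-1001', 'name': 'Mechanical Keyboard', 'price': 129.0}
```

Keeping this projection in a dedicated BFF, rather than as branching logic in the gateway, is exactly the coupling the pattern removes.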

BFF Routing Configuration

# APISIX - BFF Routing Configuration
# Route to dedicated BFFs based on client type
routes:
  # Web BFF - Rich data, detailed information
  - uri: /api/web/*
    name: web-bff-route
    plugins:
      proxy-rewrite:
        regex_uri:
          - '^/api/web/(.*)'
          - '/$1'
      request-id:
        header_name: X-Request-ID
      jwt-auth: {}
      rate-limiting:
        rate: 200
        burst: 50
        key_type: consumer_name
    upstream:
      type: roundrobin
      nodes:
        'web-bff:3000': 1
      timeout:
        connect: 3
        send: 10
        read: 30

  # Mobile BFF - Lightweight data, pagination optimized
  - uri: /api/mobile/*
    name: mobile-bff-route
    plugins:
      proxy-rewrite:
        regex_uri:
          - '^/api/mobile/(.*)'
          - '/$1'
      jwt-auth: {}
      rate-limiting:
        rate: 100
        burst: 20
        key_type: consumer_name
      # Mobile-specific: response size control
      response-rewrite:
        headers:
          set:
            X-Content-Optimized: 'mobile'
    upstream:
      type: roundrobin
      nodes:
        'mobile-bff:3001': 1
      timeout:
        connect: 3
        send: 5
        read: 15

  # IoT BFF - Minimal data, high frequency
  - uri: /api/iot/*
    name: iot-bff-route
    plugins:
      proxy-rewrite:
        regex_uri:
          - '^/api/iot/(.*)'
          - '/$1'
      key-auth: {} # IoT devices use API key authentication
      rate-limiting:
        rate: 500
        burst: 100
        key_type: var
        key: remote_addr
    upstream:
      type: roundrobin
      nodes:
        'iot-bff:3002': 1
      timeout:
        connect: 2
        send: 3
        read: 5

BFF Architecture Diagram

Client Layer             API Gateway          BFF Layer          Microservices
+----------+                              +----------+
| Web App  | ----+                   +--> | Web BFF  | --+--> User Service
+----------+     |    +-----------+  |    +----------+   +--> Product Service
                 +--> |           |--+                   +--> Order Service
+----------+     |    | API       |  |    +----------+
|Mobile App| ----+--> | Gateway   |--+--> |Mobile BFF| --+--> User Service
+----------+     |    |           |  |    +----------+   +--> Product Service
                 |    +-----------+  |
+----------+     |                   |    +----------+
|IoT Device| ----+                   +--> | IoT BFF  | --+--> Device Service
+----------+                              +----------+   +--> Telemetry Service

Load Balancing and Circuit Breakers

Load Balancing Strategies

API Gateways support various load balancing algorithms.

# APISIX - Load Balancing Strategies
upstreams:
  # Weighted Round Robin
  - id: 1
    type: roundrobin
    nodes:
      'service-a-v1:8080': 8 # 80% traffic
      'service-a-v2:8080': 2 # 20% traffic (canary deployment)
    # Health check configuration
    checks:
      active:
        type: http
        http_path: /health
        healthy:
          interval: 5
          successes: 2
        unhealthy:
          interval: 3
          http_failures: 3
          tcp_failures: 3
      passive:
        healthy:
          http_statuses:
            - 200
            - 201
          successes: 3
        unhealthy:
          http_statuses:
            - 500
            - 502
            - 503
          http_failures: 5
          tcp_failures: 2

  # Consistent Hashing (Session Affinity)
  - id: 2
    type: chash
    key: remote_addr
    nodes:
      'session-service-1:8080': 1
      'session-service-2:8080': 1
      'session-service-3:8080': 1

  # Least Connections
  - id: 3
    type: least_conn
    nodes:
      'compute-service-1:8080': 1
      'compute-service-2:8080': 1
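One common way to realize the 8:2 weighted split above is nginx's "smooth" weighted round-robin, which interleaves picks instead of sending runs of requests to the heavy node. A Python sketch (illustrative; not APISIX's exact implementation):

```python
def smooth_weighted_rr(nodes: dict[str, int], n: int) -> list[str]:
    """Nginx-style smooth weighted round-robin: each pick raises every node's
    current weight by its configured weight, selects the maximum, then
    subtracts the total weight from the winner."""
    current = {node: 0 for node in nodes}
    total = sum(nodes.values())
    picks = []
    for _ in range(n):
        for node, weight in nodes.items():
            current[node] += weight
        winner = max(current, key=current.get)
        current[winner] -= total
        picks.append(winner)
    return picks

# Mirrors the 80/20 canary split configured above
picks = smooth_weighted_rr({"service-a-v1:8080": 8, "service-a-v2:8080": 2}, 10)
print(picks.count("service-a-v1:8080"), picks.count("service-a-v2:8080"))  # → 8 2
```

Over any 10 consecutive picks the ratio holds exactly, and the v2 canary's requests are spread out rather than bunched, which keeps its error signal steady during rollout.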

Circuit Breaker Configuration

# APISIX - api-breaker Plugin (Automatic Circuit Breaker)
routes:
  - uri: /api/v1/payments/*
    plugins:
      api-breaker:
        # Circuit breaker trigger status codes
        break_response_code: 503
        break_response_body: '{"error":"circuit open","retry_after":30}'
        break_response_headers:
          - key: Content-Type
            value: application/json
          - key: Retry-After
            value: '30'
        # Unhealthy: circuit opens after 3 consecutive 500 errors
        unhealthy:
          http_statuses:
            - 500
            - 502
            - 503
          failures: 3
        # Healthy: circuit closes after 2 consecutive successes
        healthy:
          http_statuses:
            - 200
            - 201
          successes: 2
        # Maximum wait time after circuit opens (seconds)
        max_breaker_sec: 300
    upstream:
      type: roundrobin
      nodes:
        'payment-service:8080': 1
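Behind the plugin sits a small state machine cycling closed → open → half-open. A minimal Python sketch of those transitions (illustrative, not APISIX's code; thresholds mirror the config above):

```python
import time

class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures, rejects calls
    while open, allows trial calls (half-open) after `reset_timeout`, and
    closes again after `success_threshold` consecutive successes."""

    def __init__(self, failure_threshold=3, success_threshold=2, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.reset_timeout = reset_timeout
        self.state = "closed"
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.reset_timeout:
                self.state = "half-open"   # let a trial request through
                self.successes = 0
                return True
            return False                   # fail fast while the circuit is open
        return True

    def record_failure(self):
        self.failures += 1
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = time.monotonic()
            self.failures = 0

    def record_success(self):
        self.failures = 0
        if self.state == "half-open":
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = "closed"

cb = CircuitBreaker(reset_timeout=0.01)   # short timeout for the demo
for _ in range(3):
    cb.record_failure()                   # three consecutive 5xx responses
print(cb.state)                           # → open
time.sleep(0.02)
cb.allow_request()                        # timeout elapsed → half-open trial
cb.record_success(); cb.record_success()
print(cb.state)                           # → closed
```

Failing fast while open is the point: worker processes return immediately instead of queuing on a dead upstream, which is what prevents the cascading failure described later in Case 4.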

Production Deployment with Kong

Docker Compose Kong Cluster

# docker-compose.kong.yml
version: '3.8'

services:
  kong-database:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: kong
      POSTGRES_USER: kong
      POSTGRES_PASSWORD: kong_password
    volumes:
      - kong_pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ['CMD', 'pg_isready', '-U', 'kong']
      interval: 10s
      timeout: 5s
      retries: 5

  kong-migration:
    image: kong:3.6
    command: kong migrations bootstrap
    depends_on:
      kong-database:
        condition: service_healthy
    environment:
      KONG_DATABASE: postgres
      KONG_PG_HOST: kong-database
      KONG_PG_USER: kong
      KONG_PG_PASSWORD: kong_password

  kong:
    image: kong:3.6
    depends_on:
      kong-migration:
        condition: service_completed_successfully
    environment:
      KONG_DATABASE: postgres
      KONG_PG_HOST: kong-database
      KONG_PG_USER: kong
      KONG_PG_PASSWORD: kong_password
      KONG_PROXY_ACCESS_LOG: /dev/stdout
      KONG_ADMIN_ACCESS_LOG: /dev/stdout
      KONG_PROXY_ERROR_LOG: /dev/stderr
      KONG_ADMIN_ERROR_LOG: /dev/stderr
      KONG_ADMIN_LISTEN: '0.0.0.0:8001'
      KONG_STATUS_LISTEN: '0.0.0.0:8100'
      # Performance tuning
      KONG_NGINX_WORKER_PROCESSES: auto
      KONG_UPSTREAM_KEEPALIVE_POOL_SIZE: 128
      KONG_UPSTREAM_KEEPALIVE_MAX_REQUESTS: 1000
    ports:
      - '8000:8000' # Proxy (HTTP)
      - '8443:8443' # Proxy (HTTPS)
      - '8001:8001' # Admin API
    healthcheck:
      test: ['CMD', 'kong', 'health']
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  kong_pgdata:

Production Deployment with APISIX

APISIX Helm-based Kubernetes Deployment

# APISIX Kubernetes Deployment (Helm)
helm repo add apisix https://charts.apiseven.com
helm repo update

# Install APISIX (with etcd)
helm install apisix apisix/apisix \
  --namespace apisix \
  --create-namespace \
  --set gateway.type=LoadBalancer \
  --set ingress-controller.enabled=true \
  --set dashboard.enabled=true \
  --set etcd.replicaCount=3 \
  --set etcd.persistence.size=20Gi \
  --set apisix.nginx.workerProcesses=auto \
  --set apisix.nginx.workerConnections=65536

# Verify APISIX status
kubectl -n apisix get pods
kubectl -n apisix get svc

# Register route via Admin API
curl -X PUT http://apisix-admin:9180/apisix/admin/routes/1 \
  -H "X-API-KEY: admin-api-key" \
  -d '{
    "uri": "/api/v1/products/*",
    "upstream": {
      "type": "roundrobin",
      "nodes": {
        "product-service.default.svc:8080": 1
      }
    },
    "plugins": {
      "jwt-auth": {},
      "limit-count": {
        "count": 200,
        "time_window": 60,
        "rejected_code": 429,
        "rejected_msg": "Rate limit exceeded. Please retry later.",
        "policy": "redis",
        "redis_host": "redis.default.svc",
        "redis_port": 6379,
        "key_type": "var",
        "key": "consumer_name"
      },
      "api-breaker": {
        "break_response_code": 503,
        "unhealthy": {
          "http_statuses": [500, 502, 503],
          "failures": 3
        },
        "healthy": {
          "http_statuses": [200],
          "successes": 2
        },
        "max_breaker_sec": 60
      }
    }
  }'

Monitoring and Operations

Prometheus + Grafana Metrics Collection

Key API Gateway monitoring metrics include:

  • Request Rate: Requests processed per second
  • Error Rate: Percentage of 4xx/5xx responses
  • Latency: P50, P95, P99 response times
  • Rate Limit Hit Rate: Percentage of requests reaching limits
  • Circuit Breaker State: Open/Closed/Half-Open transition events
  • Upstream Health: Backend service availability

# APISIX - Prometheus Metrics Configuration
plugin_attr:
  prometheus:
    export_uri: /apisix/prometheus/metrics
    export_addr:
      ip: '0.0.0.0'
      port: 9091
    default_buckets:
      - 0.005
      - 0.01
      - 0.025
      - 0.05
      - 0.1
      - 0.25
      - 0.5
      - 1
      - 2.5
      - 5
      - 10

# Global plugin applied to all routes
global_rules:
  - id: 1
    plugins:
      prometheus:
        prefer_name: true
      # Distributed tracing (OpenTelemetry)
      opentelemetry:
        sampler:
          name: parent_based_traceidratio
          options:
            fraction: 0.1 # 10% sampling
        additional_attributes:
          - 'service.version'

Failure Cases and Remediation

Case 1: Rate Limiter Misconfiguration Causing Outage

A fintech company configured their rate limiter with a local policy while scaling the API Gateway to 3 nodes. Each node independently applied rate limits, effectively allowing 3x the configured traffic to reach backends. The payment service went down due to overload.

Remediation: Always use redis or cluster policies in distributed environments. Redis Cluster as the rate limit store ensures consistent limits regardless of gateway node count.
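The failure mode is easy to reproduce with a toy model: per-node counters admit the full limit on every node, while a shared counter enforces it once (numbers are hypothetical):

```python
from collections import Counter

LIMIT = 100   # intended cluster-wide limit per window
NODES = 3     # gateway replicas

def admitted(counters: Counter, node: str, shared: bool) -> bool:
    """Admit a request if the relevant counter is still under LIMIT."""
    key = "cluster" if shared else node   # shared store vs per-node memory
    if counters[key] >= LIMIT:
        return False
    counters[key] += 1
    return True

for shared in (False, True):
    counters = Counter()
    total = sum(
        admitted(counters, f"node-{i % NODES}", shared)
        for i in range(600)   # 600 requests spread across 3 gateway nodes
    )
    print(f"shared={shared}: {total} requests reached the backend")
# shared=False lets 300 through (3 nodes x 100); shared=True caps at 100
```

The same arithmetic applies to any horizontally scaled gateway: with a `local` policy, the effective limit is always N times the configured one.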

Case 2: API Gateway Single Point of Failure

An API Gateway running as a single instance experienced OOM (Out of Memory) due to a memory leak, causing a complete service outage.

Remediation: Always deploy API Gateways in HA (High Availability) configuration. Deploy at least 2 instances in Active-Active mode with an L4 load balancer (AWS NLB, MetalLB) in front. Use health checks to automatically remove failing nodes.

Case 3: Token Caching Leading to Privilege Escalation

JWT tokens were cached for 5 minutes at the API Gateway. When a user's permissions were revoked or an account was deactivated, the cached token continued to grant access.

Remediation: Keep token cache TTL short (30 seconds to 1 minute). Use token blacklists for critical permission changes. Always validate the exp claim at the gateway and implement token revocation endpoints.

Case 4: Missing Circuit Breaker Causing Cascading Failures

An external payment API experienced response latency exceeding 60 seconds. Without circuit breakers, all API Gateway worker processes became occupied waiting for the payment service. As a result, even healthy APIs became unresponsive.

Remediation: Configure appropriate timeouts and circuit breakers for all upstreams. Set connection timeout to 3 seconds and read timeout to 5-30 seconds depending on API characteristics. Open the circuit after 3-5 consecutive failures and transition to Half-Open state after 30-60 seconds for gradual recovery.

Operational Checklist

Essential items to verify when operating an API Gateway in production.

Deployment and Availability

  • HA configuration (minimum 2 instances, Active-Active)
  • L4 load balancer in front (AWS NLB, MetalLB, etc.)
  • Rolling update or blue-green deployment strategy
  • Config store backup (PostgreSQL, etcd)

Security

  • Admin API access restricted to internal network only
  • TLS 1.3 with automatic certificate renewal
  • JWT token validation enabled with minimal cache TTL
  • CORS and CSRF protection configured

Rate Limiting

  • Distributed policy (redis or cluster)
  • Differentiated limits per client type
  • Rate limit headers returned (X-RateLimit-Limit, X-RateLimit-Remaining)
  • fault_tolerant setting for Redis failures

Monitoring

  • Prometheus metrics collection enabled
  • P99 latency, error rate, rate limit hit rate dashboards
  • Circuit breaker state change alerts
  • Distributed tracing (OpenTelemetry) integration

Performance

  • Worker process count optimized (based on CPU cores)
  • Upstream keepalive connection pool configured
  • Response caching strategy applied
  • Unnecessary plugins disabled
