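Complete Guide to API Gateway Pattern: Rate Limiting, Authentication, and BFF Architecture Design

Introduction
API Gateway Pattern Overview
Rate Limiting Algorithms
Authentication and Authorization Strategies
- JWT Authentication Setup
- OAuth2 + OIDC Integrated Authentication Flow
BFF (Backend for Frontend) Architecture
Load Balancing and Circuit Breakers
- Load Balancing Strategies
- Circuit Breaker Configuration
Production Deployment with Kong
- Docker Compose Kong Cluster
Production Deployment with APISIX
- APISIX Helm-based Kubernetes Deployment
Monitoring and Operations
- Prometheus + Grafana Metrics Collection
- Key Alert Rules
Failure Cases and Remediation
Operational Checklist
References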

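Introduction

As microservices architectures proliferate, it becomes impractical for clients to communicate directly with dozens or hundreds of backend services. An API Gateway serves as a single entry point between clients and backend services, centralizing cross-cutting concerns such as routing, authentication/authorization, rate limiting, load balancing, and protocol translation.

This article covers the core concepts of the API Gateway pattern, followed by a comparison of Rate Limiting algorithms (Token Bucket, Sliding Window, Fixed Window, Leaky Bucket), JWT/OAuth2-based authentication strategies, BFF (Backend for Frontend) architecture design, load balancing and circuit breaker configurations, and production implementation examples using Kong and Apache APISIX. It concludes with failure scenarios that can occur in production and an operational checklist.

API Gateway Pattern Overview

Roles of an API Gateway

An API Gateway centralizes the following cross-cutting concerns:

Routing: Forwards requests to the appropriate backend service based on URL, headers, and method
Authentication/Authorization: JWT validation, OAuth2 token verification, API key management
Rate Limiting: Per-client and per-API request rate restrictions
Load Balancing: Traffic distribution using round-robin, weighted, or least-connections algorithms
Circuit Breaking: Automatically blocks requests to failing services to prevent cascading failures
Protocol Translation: REST to gRPC, HTTP to WebSocket, and similar conversions
Caching: Response caching for improved performance
Monitoring: Metrics collection, distributed tracing, and logging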
API Gateway Solution Comparison

Feature	Kong	Apache APISIX	AWS API Gateway	Envoy
Core Technology	NGINX + Lua	NGINX + etcd	AWS Managed	C++
Performance (QPS)	~10,000+	~23,000+	Managed (with limits)	~15,000+
Plugin Ecosystem	Very rich (100+)	Rich (80+)	Limited	Rich (filter chains)
Config Store	PostgreSQL / Cassandra	etcd	AWS Internal	xDS API
Dynamic Config	Admin API	Admin API + etcd Watch	Console/CLI	xDS Hot Reload
Service Mesh	Kong Mesh (Kuma)	Amesh (Istio integration)	App Mesh integration	Istio default proxy
Kubernetes Native	Kong Ingress Controller	APISIX Ingress Controller	None (EKS integration)	Gateway API support
License	Apache 2.0 / Enterprise	Apache 2.0	Pay-per-use	Apache 2.0
Best For	General purpose, enterprise	High performance, dynamic routing	AWS native workloads	K8s, service mesh

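API Gateway vs Service Mesh

API Gateways and service meshes are complementary technologies.

Aspect	API Gateway	Service Mesh
Position	Between clients and services (north-south traffic)	Between services (east-west traffic)
Primary Role	External request routing, auth, rate limiting	Inter-service mTLS, traffic management, observability
Deployment	Centralized (gateway cluster)	Distributed (sidecar proxies)
Protocols	HTTP, gRPC, WebSocket	TCP, HTTP, gRPC
Solutions	Kong, APISIX, AWS API GW	Istio, Linkerd, Consul Connect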

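Rate Limiting Algorithms

Rate limiting is one of the most important API Gateway features. It is essential for preventing service overload, defending against DDoS attacks, and ensuring fair resource distribution.

Algorithm Comparison

Algorithm	Principle	Burst Allowed	Memory Usage	Accuracy	Complexity
Fixed Window	Counter within fixed time window	2x possible at boundary	Low	Low	Very Low
Sliding Window Log	Timestamp recorded per request	None	High	High	Medium
Sliding Window Counter	Weighted average of prev/current windows	Minimized	Low	Medium	Medium
Token Bucket	Tokens refilled at steady rate, consumed per request	Yes (up to bucket size)	Low	Medium	Low
Leaky Bucket	Requests processed at fixed rate, excess queued	None (fixed rate)	Low	High	Low

Token Bucket Algorithm

Token Bucket is the most practical of these algorithms: it allows burst traffic while still constraining the average request rate.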

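# Kong - Rate Limiting plugin configuration
# Note: Kong's open-source rate-limiting plugin counts requests per fixed time window
# (second, minute, hour, ...); it is not itself a token bucket implementation.
# kong.yml - Declarative Configuration
_format_version: '3.0'

services:
  - name: user-service
    url: http://user-service:8080
    routes:
      - name: user-route
        paths:
          - /api/v1/users
    plugins:
      - name: rate-limiting
        config:
          # Limit: 100 requests per minute, 1000 per hour
          minute: 100
          hour: 1000
          # Policy: local (single node), cluster (cluster-wide), redis (Redis-based)
          policy: redis
          redis:
            host: redis-cluster
            port: 6379
            password: null
            database: 0
            timeout: 2000
          # Header to key on; only used when limit_by is header
          header_name: null
          # Keep the rate limit headers visible to clients
          hide_client_headers: false
          # Limit key: consumer, credential, ip, header, path, service
          limit_by: consumer
          # Allow requests through when Redis is unavailable
          fault_tolerant: true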

Sliding Window Algorithm

The Sliding Window Counter resolves the boundary problem of Fixed Window while remaining memory efficient.

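-- APISIX custom rate limiting plugin (Sliding Window Counter)
-- apisix/plugins/sliding-window-rate-limit.lua

local core = require("apisix.core")
local ngx = ngx
local math = math

-- Assumption: a shared dict with this (illustrative) name is declared with
-- lua_shared_dict in the NGINX configuration. Counters are per gateway node.
local counters = ngx.shared["plugin-sliding-window-rate-limit"]

local schema = {
    type = "object",
    properties = {
        rate = { type = "integer", minimum = 1 },
        burst = { type = "integer", minimum = 0 },
        window_size = { type = "integer", minimum = 1, default = 60 },
        key_type = {
            type = "string",
            enum = { "remote_addr", "consumer_name", "header" },
            default = "remote_addr"
        },
    },
    required = { "rate" },
}

local _M = {
    version = 0.1,
    priority = 1001,
    name = "sliding-window-rate-limit",
    schema = schema,
}

function _M.check_schema(conf)
    return core.schema.check(schema, conf)
end

local function counter_key(key, window_start)
    return key .. ":" .. window_start
end

local function get_count(key, window_start)
    return counters:get(counter_key(key, window_start))
end

local function increment_count(key, window_start, ttl)
    -- init = 0 creates the counter on first use; the TTL expires old windows
    return counters:incr(counter_key(key, window_start), 1, 0, ttl)
end

function _M.access(conf, ctx)
    -- Only remote_addr and consumer_name are handled in this sketch
    local key = ctx.var.remote_addr
    if conf.key_type == "consumer_name" then
        key = ctx.consumer_name or ctx.var.remote_addr
    end

    local now = ngx.now()
    local window = conf.window_size
    local current_window = math.floor(now / window) * window
    local previous_window = current_window - window
    local elapsed = now - current_window

    -- Weight the previous window by the share of it that still overlaps
    -- the sliding window, then add the current window's count
    local prev_count = get_count(key, previous_window) or 0
    local curr_count = get_count(key, current_window) or 0
    local weight = (window - elapsed) / window
    local estimated = prev_count * weight + curr_count

    if estimated >= conf.rate then
        return 429, {
            error = "Rate limit exceeded",
            retry_after = math.ceil(window - elapsed)
        }
    end

    -- Keep each counter for two windows so it can serve as the "previous" one
    increment_count(key, current_window, window * 2)
end

return _M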

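Authentication and Authorization Strategies

Handling authentication and authorization at the API Gateway significantly reduces the security burden on backend services.

JWT Authentication Setup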

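# APISIX - JWT authentication plugin configuration
# Declarative routes/consumers for standalone mode (typically conf/apisix.yaml;
# config.yaml holds the static settings)
routes:
  - uri: /api/v1/orders/*
    upstream:
      type: roundrobin
      nodes:
        'order-service:8080': 1
    plugins:
      jwt-auth:
        # Where to look for the token. The consumer is selected by the `key`
        # claim inside the token, so no key is configured on the route.
        header: 'Authorization'
        query: 'token'
        cookie: 'jwt_token'
      # Additional: group-based access control
      # (consumers must belong to one of these consumer groups; group
      # membership is not shown in this example)
      consumer-restriction:
        type: consumer_group_id
        whitelist:
          - 'premium-users'
          - 'admin-group'
        rejected_code: 403
        rejected_msg: 'Access denied: insufficient permissions'

# Consumer configuration (API user definitions)
consumers:
  - username: 'mobile-app'
    plugins:
      jwt-auth:
        key: 'mobile-app-key'
        # Placeholder only; load real secrets from a secret store
        secret: 'mobile-app-secret-256bit-key-here'
        algorithm: 'HS256'
        exp: 86400 # Token expiry: 24 hours
        # The secret above is plain text, not base64-encoded
        base64_secret: false
  - username: 'web-frontend'
    plugins:
      jwt-auth:
        key: 'web-frontend-key'
        # PEM-encoded public key for RS256 (inline value, not a file path)
        public_key: |
          -----BEGIN PUBLIC KEY-----
          MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA...
          -----END PUBLIC KEY-----
        algorithm: 'RS256'
        exp: 3600 # Token expiry: 1 hour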

OAuth2 + OIDC Integrated Authentication Flow

Integrating OAuth2/OIDC at the API Gateway centralizes the connection to the IdP (Identity Provider).

# Kong - OpenID Connect plugin configuration
# Note: openid-connect ships with Kong Gateway Enterprise, not the open-source edition.
plugins:
  - name: openid-connect
    config:
      issuer: 'https://auth.example.com/realms/production'
      client_id: 'api-gateway'
      client_secret: 'gateway-secret-value'
      redirect_uri: 'https://api.example.com/callback'
      # Supported authentication flows
      auth_methods:
        - authorization_code # Web applications
        - client_credentials # Service-to-service
        - password # Legacy support (not recommended)
      # How the gateway authenticates itself to the IdP token endpoint
      token_endpoint_auth_method: client_secret_post
      # Scope-based access control
      scopes_required:
        - openid
        - profile
        - api:read
      # Token caching (performance optimization)
      cache_ttl: 300
      # Token introspection (opaque token verification)
      introspection_endpoint: 'https://auth.example.com/realms/production/protocol/openid-connect/token/introspect'
      # Claims forwarded to the upstream, paired by position with the header names below
      upstream_headers_claims:
        - sub
        - email
        - realm_access.roles
      upstream_headers_names:
        - X-User-ID
        - X-User-Email
        - X-User-Roles

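BFF (Backend for Frontend) Architecture

Why the BFF Pattern Is Needed

Serving every client (web, mobile, IoT, and so on) through a single API Gateway leads to the following problems:

Excessive data transfer: Mobile clients receive the full payload designed for the web
Complex gateway logic: Client-specific branching logic accumulates in the gateway
Deployment coupling: A change made for one client affects the others

The BFF pattern solves these problems by giving each frontend a dedicated backend optimized for it.

BFF Routing Configuration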

# APISIX - BFF routing configuration
# Route to a dedicated BFF per client type
# Note: APISIX has no plugin named rate-limiting; rate/burst belong to limit-req,
# whose rate is expressed in requests per second.
routes:
  # Web BFF - rich data, detailed information
  - uri: /api/web/*
    name: web-bff-route
    plugins:
      proxy-rewrite:
        regex_uri:
          - '^/api/web/(.*)'
          - '/$1'
      request-id:
        header_name: X-Request-ID
      jwt-auth: {}
      limit-req:
        rate: 200
        burst: 50
        key_type: var
        key: consumer_name
    upstream:
      type: roundrobin
      nodes:
        'web-bff:3000': 1
      timeout:
        connect: 3
        send: 10
        read: 30

  # Mobile BFF - lightweight data, pagination optimized
  - uri: /api/mobile/*
    name: mobile-bff-route
    plugins:
      proxy-rewrite:
        regex_uri:
          - '^/api/mobile/(.*)'
          - '/$1'
      jwt-auth: {}
      limit-req:
        rate: 100
        burst: 20
        key_type: var
        key: consumer_name
      # Mobile-specific: tag responses with a marker header
      # (this does not limit the response size by itself)
      response-rewrite:
        headers:
          set:
            X-Content-Optimized: 'mobile'
    upstream:
      type: roundrobin
      nodes:
        'mobile-bff:3001': 1
      timeout:
        connect: 3
        send: 5
        read: 15

  # IoT BFF - minimal data, high frequency
  - uri: /api/iot/*
    name: iot-bff-route
    plugins:
      proxy-rewrite:
        regex_uri:
          - '^/api/iot/(.*)'
          - '/$1'
      key-auth: {} # IoT devices use API key authentication
      limit-req:
        rate: 500
        burst: 100
        key_type: var
        key: remote_addr
    upstream:
      type: roundrobin
      nodes:
        'iot-bff:3002': 1
      timeout:
        connect: 2
        send: 3
        read: 5

BFF Architecture Diagram

Client Layer             API Gateway          BFF Layer          Microservices
+----------+                              +----------+
| Web App  | ----+                   +--> | Web BFF  | --+--> User Service
+----------+     |    +-----------+  |    +----------+   +--> Product Service
                 +--> |           |--+                   +--> Order Service
+----------+     |    | API       |  |    +----------+
|Mobile App| ----+--> | Gateway   |--+--> |Mobile BFF| --+--> User Service
+----------+     |    |           |  |    +----------+   +--> Product Service
                 |    +-----------+  |
+----------+     |                   |    +----------+
|IoT Device| ----+                   +--> | IoT BFF  | --+--> Device Service
+----------+                              +----------+   +--> Telemetry Service

Load Balancing and Circuit Breakers

Load Balancing Strategies

API Gateways support a variety of load balancing algorithms.

# APISIX - load balancing strategies
upstreams:
  # Weighted round robin
  - id: 1
    type: roundrobin
    nodes:
      'service-a-v1:8080': 8 # 80% of traffic
      'service-a-v2:8080': 2 # 20% of traffic (canary deployment)
    # Health check configuration
    checks:
      active:
        type: http
        http_path: /health
        healthy:
          interval: 5
          successes: 2
        unhealthy:
          interval: 3
          http_failures: 3
          tcp_failures: 3
      passive:
        healthy:
          http_statuses:
            - 200
            - 201
          successes: 3
        unhealthy:
          http_statuses:
            - 500
            - 502
            - 503
          http_failures: 5
          tcp_failures: 2

  # Consistent hashing (session affinity)
  - id: 2
    type: chash
    key: remote_addr
    nodes:
      'session-service-1:8080': 1
      'session-service-2:8080': 1
      'session-service-3:8080': 1

  # Least connections
  - id: 3
    type: least_conn
    nodes:
      'compute-service-1:8080': 1
      'compute-service-2:8080': 1
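
The 8:2 weighting can be checked with a small simulation. The sketch below uses the "smooth" weighted round-robin selection popularized by NGINX; it is an illustration of how weights translate into a traffic split, not a claim about the exact algorithm APISIX runs internally. The function name is hypothetical; the node names are taken from the configuration above.

```python
from collections import Counter

def smooth_weighted_round_robin(weights: dict, n: int) -> list:
    """Return n picks; each node is chosen in proportion to its weight."""
    current = {node: 0 for node in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        # Every node gains its weight, the leader is picked and pays the total.
        for node, weight in weights.items():
            current[node] += weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks

picks = smooth_weighted_round_robin(
    {"service-a-v1:8080": 8, "service-a-v2:8080": 2}, 10
)
print(Counter(picks))
```

Over each cycle of ten requests the canary node receives two, and the picks are interleaved rather than sent back to back, which keeps the canary's load even.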

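Circuit Breaker Configuration

# Kong - manual circuit breaker
# The bundled open-source plugins do not include a dedicated circuit breaker, so this
# example combines tight upstream timeouts with a request-termination plugin that
# operators switch on during an incident.
services:
  - name: payment-service
    url: http://payment-service:8080
    connect_timeout: 3000 # Connect timeout: 3 seconds
    write_timeout: 10000 # Write timeout: 10 seconds
    read_timeout: 15000 # Read timeout: 15 seconds
    retries: 2 # Number of retries
    plugins:
      # Manual circuit breaker: enable during an outage to fail fast with 503
      - name: request-termination
        enabled: false
        config:
          status_code: 503
          message: 'Service temporarily unavailable'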
---
# APISIX - api-breaker plugin (automatic circuit breaker)
routes:
  - uri: /api/v1/payments/*
    plugins:
      api-breaker:
        # Status code, body, and headers returned while the circuit is open
        break_response_code: 503
        break_response_body: '{"error":"circuit open","retry_after":30}'
        break_response_headers:
          - key: Content-Type
            value: application/json
          - key: Retry-After
            value: '30'
        # Unhealthy: open the circuit after 3 consecutive 500/502/503 responses
        unhealthy:
          http_statuses:
            - 500
            - 502
            - 503
          failures: 3
        # Healthy: close the circuit after 2 consecutive successes
        healthy:
          http_statuses:
            - 200
            - 201
          successes: 2
        # Maximum time the circuit stays open (seconds)
        max_breaker_sec: 300
    upstream:
      type: roundrobin
      nodes:
        'payment-service:8080': 1
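
To make the Closed / Open / Half-Open transitions concrete, here is a minimal, generic state machine in Python. It is a sketch under simplifying assumptions: the class and parameter names are hypothetical, any status of 500 or above counts as a failure, and the open interval is fixed, whereas a real plugin may vary it up to a configured maximum.

```python
class CircuitBreaker:
    def __init__(self, failure_threshold=3, success_threshold=2, open_seconds=30.0):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.open_seconds = open_seconds
        self.state = "closed"
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def allow_request(self, now: float) -> bool:
        if self.state == "open":
            if now - self.opened_at < self.open_seconds:
                return False          # fail fast while the circuit is open
            self.state = "half_open"  # let probe requests through
            self.successes = 0
        return True

    def record(self, status: int, now: float) -> None:
        failed = status >= 500
        if self.state == "half_open":
            if failed:
                self._open(now)       # probe failed: reopen immediately
            else:
                self.successes += 1
                if self.successes >= self.success_threshold:
                    self.state = "closed"
                    self.failures = 0
        elif self.state == "closed":
            # Count consecutive failures; any success resets the streak.
            self.failures = self.failures + 1 if failed else 0
            if self.failures >= self.failure_threshold:
                self._open(now)

    def _open(self, now: float) -> None:
        self.state = "open"
        self.opened_at = now
        self.failures = 0

breaker = CircuitBreaker()
for _ in range(3):
    breaker.record(500, now=0.0)
print(breaker.state, breaker.allow_request(now=10.0))  # open False
```

Once the open interval has elapsed, `allow_request` admits probe requests, and two consecutive successes close the circuit again.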

Production Deployment with Kong

Docker Compose Kong Cluster

# docker-compose.kong.yml
version: '3.8'

services:
  kong-database:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: kong
      POSTGRES_USER: kong
      POSTGRES_PASSWORD: kong_password
    volumes:
      - kong_pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ['CMD', 'pg_isready', '-U', 'kong']
      interval: 10s
      timeout: 5s
      retries: 5

  kong-migration:
    image: kong:3.6
    command: kong migrations bootstrap
    depends_on:
      kong-database:
        condition: service_healthy
    environment:
      KONG_DATABASE: postgres
      KONG_PG_HOST: kong-database
      KONG_PG_USER: kong
      KONG_PG_PASSWORD: kong_password

  kong:
    image: kong:3.6
    depends_on:
      kong-migration:
        condition: service_completed_successfully
    environment:
      KONG_DATABASE: postgres
      KONG_PG_HOST: kong-database
      KONG_PG_USER: kong
      KONG_PG_PASSWORD: kong_password
      KONG_PROXY_ACCESS_LOG: /dev/stdout
      KONG_ADMIN_ACCESS_LOG: /dev/stdout
      KONG_PROXY_ERROR_LOG: /dev/stderr
      KONG_ADMIN_ERROR_LOG: /dev/stderr
      KONG_ADMIN_LISTEN: '0.0.0.0:8001'
      KONG_STATUS_LISTEN: '0.0.0.0:8100'
      # Performance tuning
      KONG_NGINX_WORKER_PROCESSES: auto
      KONG_UPSTREAM_KEEPALIVE_POOL_SIZE: 128
      KONG_UPSTREAM_KEEPALIVE_MAX_REQUESTS: 1000
    ports:
      - '8000:8000' # Proxy (HTTP)
      - '8443:8443' # Proxy (HTTPS)
      - '127.0.0.1:8001:8001' # Admin API (published on loopback only; never expose it publicly)
    healthcheck:
      test: ['CMD', 'kong', 'health']
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  kong_pgdata:

Production Deployment with APISIX

APISIX Helm-based Kubernetes Deployment

# Deploy APISIX on Kubernetes (Helm)
helm repo add apisix https://charts.apiseven.com
helm repo update

# Install APISIX (including etcd)
helm install apisix apisix/apisix \
  --namespace apisix \
  --create-namespace \
  --set gateway.type=LoadBalancer \
  --set ingress-controller.enabled=true \
  --set dashboard.enabled=true \
  --set etcd.replicaCount=3 \
  --set etcd.persistence.size=20Gi \
  --set apisix.nginx.workerProcesses=auto \
  --set apisix.nginx.workerConnections=65536

# Check APISIX status
kubectl -n apisix get pods
kubectl -n apisix get svc

# Register a route through the Admin API
curl -X PUT http://apisix-admin:9180/apisix/admin/routes/1 \
  -H "X-API-KEY: admin-api-key" \
  -d '{
    "uri": "/api/v1/products/*",
    "upstream": {
      "type": "roundrobin",
      "nodes": {
        "product-service.default.svc:8080": 1
      }
    },
    "plugins": {
      "jwt-auth": {},
      "limit-count": {
        "count": 200,
        "time_window": 60,
        "rejected_code": 429,
        "rejected_msg": "Rate limit exceeded. Please retry later.",
        "policy": "redis",
        "redis_host": "redis.default.svc",
        "redis_port": 6379,
        "key_type": "var",
        "key": "consumer_name"
      },
      "api-breaker": {
        "break_response_code": 503,
        "unhealthy": {
          "http_statuses": [500, 502, 503],
          "failures": 3
        },
        "healthy": {
          "http_statuses": [200],
          "successes": 2
        },
        "max_breaker_sec": 60
      }
    }
  }'

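Monitoring and Operations

Prometheus + Grafana Metrics Collection

The key metrics to monitor on an API Gateway are:

Request Rate: Requests processed per second
Error Rate: Share of 4xx/5xx responses
Latency: P50, P95, and P99 response times
Rate Limit Hit Rate: Share of requests that hit a limit
Circuit Breaker State: Open/Closed/Half-Open transition events
Upstream Health: Availability of backend services

# APISIX - Prometheus metrics collection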
plugin_attr:
  prometheus:
    export_uri: /apisix/prometheus/metrics
    export_addr:
      ip: '0.0.0.0'
      port: 9091
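    # Histogram buckets for the latency metric. APISIX reports latency in
    # milliseconds, so the buckets are in milliseconds as well.
    default_buckets:
      - 5
      - 10
      - 25
      - 50
      - 100
      - 250
      - 500
      - 1000
      - 2500
      - 5000
      - 10000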

# Apply to all routes as global plugins
global_rules:
  - id: 1
    plugins:
      prometheus:
        prefer_name: true
      # Distributed tracing (OpenTelemetry)
      opentelemetry:
        sampler:
          name: parent_based_traceidratio
          options:
            fraction: 0.1 # 10% sampling
        additional_attributes:
          - 'service.version'
        additional_header_prefix_attributes:
          - 'X-Custom-'

Key Alert Rules

# Prometheus Alert Rules
groups:
  - name: api-gateway-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(apisix_http_status{code=~"5.."}[5m]))
          / sum(rate(apisix_http_status[5m])) > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: 'API Gateway 5xx error rate above 5%'

      - alert: HighLatency
        expr: |
          histogram_quantile(0.99,
            sum(rate(apisix_http_latency_bucket[5m])) by (le, route)
          ) > 2000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: 'API Gateway P99 latency above 2 seconds'

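      - alert: RateLimitExceeded
        # rate() yields requests per second; multiply by 60 to compare per minute
        expr: |
          sum(rate(apisix_http_status{code="429"}[5m])) * 60 > 100
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: 'More than 100 rate-limited (429) requests per minute'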

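Failure Cases and Remediation

Case 1: Service Outage Caused by a Misconfigured Rate Limiter

A fintech company scaled its API Gateway out to three nodes while the rate limiter was still set to the local policy. Each node enforced the limit independently, so up to three times the configured traffic reached the backend, and the payment service went down under the load.

Remediation: In a distributed deployment, always use the redis or cluster policy so that all nodes share the same counters. With a Redis Cluster as the rate limit store, the limit is applied consistently regardless of the number of nodes.

Case 2: API Gateway as a Single Point of Failure

An API Gateway running as a single instance hit an OOM (Out of Memory) condition caused by a memory leak, and the entire service went down.

Remediation: Always run the API Gateway in an HA (High Availability) configuration. Deploy at least two instances Active-Active and place an L4 load balancer (AWS NLB, MetalLB) in front of them. Use health checks to remove failed nodes automatically.

Case 3: Privilege Escalation Caused by Authentication Token Caching

The API Gateway was configured to cache JWT validation results for 5 minutes. After a user's permissions were changed or the account was deactivated, the cached token still granted access until the cache entry expired.

Remediation: Keep the token cache TTL short (30 seconds to 1 minute) and use a token blacklist for important permission changes. Always validate the exp claim at the gateway, and implement a token revocation endpoint.

Case 4: Cascading Failure Caused by a Missing Circuit Breaker

Response times of an external payment API rose above 60 seconds, but no circuit breaker was configured, so every worker process of the API Gateway ended up waiting on the payment service. As a result, other, healthy APIs could no longer respond either.

Remediation: Configure appropriate timeouts and a circuit breaker on every upstream. Limit the connect timeout to 3 seconds and the read timeout to 5 to 30 seconds depending on the API. Open the circuit after 3 to 5 consecutive failures, then switch to the Half-Open state after 30 to 60 seconds and recover gradually.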

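Operational Checklist

These are the key items to verify when operating an API Gateway in production.

Deployment and Availability

HA configuration (at least two instances, Active-Active)
L4 load balancer in front (AWS NLB, MetalLB, etc.)
Rolling update or blue-green deployment strategy
Backups of the configuration store (PostgreSQL, etcd)

Security

Restricted Admin API access (internal network only)
TLS 1.3 with automatic certificate renewal
JWT validation enabled and cache TTL minimized
CORS and CSRF protection configured

Rate Limiting

Distributed policy in use (redis or cluster)
Differentiated limits per client type
Rate limit headers returned (X-RateLimit-Limit, X-RateLimit-Remaining)
fault_tolerant behavior defined for Redis outages

Monitoring

Prometheus metrics collection enabled
Dashboards for P99 latency, error rate, and rate limit hit rate
Alerts on circuit breaker state changes
Distributed tracing (OpenTelemetry) integrated

Performance

Worker process count tuned (based on CPU cores)
Upstream keepalive connection pool configured
Response caching strategy applied
Unnecessary plugins disabled

References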

Complete Guide to API Gateway Pattern: Rate Limiting, Authentication, and BFF Architecture Design

Introduction
API Gateway Pattern Overview
Rate Limiting Algorithms
Authentication and Authorization Strategies
- JWT Authentication Setup
- OAuth2 + OIDC Integrated Authentication Flow
BFF (Backend for Frontend) Architecture
Load Balancing and Circuit Breakers
- Load Balancing Strategies
- Circuit Breaker Configuration
Production Deployment with Kong
- Docker Compose Kong Cluster
Production Deployment with APISIX
- APISIX Helm-based Kubernetes Deployment
Monitoring and Operations
- Prometheus + Grafana Metrics Collection
Failure Cases and Remediation
Operational Checklist
References

Introduction

As microservices architectures proliferate, it becomes impractical for clients to communicate directly with dozens or hundreds of backend services. An API Gateway serves as a single entry point between clients and backend services, centralizing cross-cutting concerns such as routing, authentication/authorization, rate limiting, load balancing, and protocol translation.

This article covers the core concepts of the API Gateway pattern, followed by an in-depth comparison of Rate Limiting algorithms (Token Bucket, Sliding Window, Fixed Window, Leaky Bucket), JWT/OAuth2-based authentication strategies, BFF (Backend for Frontend) architecture design, load balancing and circuit breaker configurations, and production implementation examples using Kong and Apache APISIX. We conclude with real-world failure scenarios and an operational checklist for production environments.

API Gateway Pattern Overview

Roles of an API Gateway

An API Gateway centralizes the following cross-cutting concerns:

Routing: Forwards requests to appropriate backend services based on URL, headers, and methods
Authentication/Authorization: JWT validation, OAuth2 token verification, API key management
Rate Limiting: Per-client and per-API request rate restrictions
Load Balancing: Traffic distribution using round-robin, weighted, or least-connections algorithms
Circuit Breaking: Automatically blocks requests to failing services to prevent cascading failures
Protocol Translation: REST to gRPC, HTTP to WebSocket conversions
Caching: Response caching for improved performance
Monitoring: Metrics collection, distributed tracing, and logging

API Gateway Solution Comparison

Feature	Kong	Apache APISIX	AWS API Gateway	Envoy
Core Technology	NGINX + Lua	NGINX + etcd	AWS Managed	C++
Performance (QPS)	~10,000+	~23,000+	Managed (with limits)	~15,000+
Plugin Ecosystem	Very rich (100+)	Rich (80+)	Limited	Rich (filter chains)
Config Store	PostgreSQL / Cassandra	etcd	AWS Internal	xDS API
Dynamic Config	Admin API	Admin API + etcd Watch	Console/CLI	xDS Hot Reload
Service Mesh	Kong Mesh (Kuma)	Amesh (Istio integration)	App Mesh	Istio default proxy
Kubernetes Native	Kong Ingress Controller	APISIX Ingress Controller	None (EKS integration)	Gateway API support
License	Apache 2.0 / Enterprise	Apache 2.0	Pay-per-use	Apache 2.0
Best For	General purpose, enterprise	High performance, dynamic routing	AWS native workloads	K8s, service mesh

API Gateway vs Service Mesh

API Gateways and service meshes are complementary technologies.

Aspect	API Gateway	Service Mesh
Position	Between clients and services (north-south traffic)	Between services (east-west traffic)
Primary Role	External request routing, auth, rate limiting	Inter-service mTLS, traffic management, observability
Deployment	Centralized (gateway cluster)	Distributed (sidecar proxies)
Protocols	HTTP, gRPC, WebSocket	TCP, HTTP, gRPC
Solutions	Kong, APISIX, AWS API GW	Istio, Linkerd, Consul Connect

Rate Limiting Algorithms

Rate limiting is one of the most critical API Gateway features, essential for preventing service overload, defending against DDoS attacks, and ensuring fair resource distribution.

Algorithm Comparison

Algorithm	Principle	Burst Allowed	Memory Usage	Accuracy	Complexity
Fixed Window	Counter within fixed time window	2x possible at boundary	Low	Low	Very Low
Sliding Window Log	Timestamp recorded per request	None	High	High	Medium
Sliding Window Counter	Weighted average of prev/current windows	Minimized	Low	Medium	Medium
Token Bucket	Tokens refilled at steady rate, consumed per request	Yes (up to bucket size)	Low	Medium	Low
Leaky Bucket	Requests processed at fixed rate, excess queued	None (fixed rate)	Low	High	Low
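
The "2x possible at boundary" entry for Fixed Window is easiest to see in a small simulation. The following Python sketch is an illustrative model, not gateway code: the class name is hypothetical, timestamps are passed in explicitly, and old counters are never evicted.

```python
from collections import defaultdict

class FixedWindowLimiter:
    """Counts requests per key in fixed windows aligned to the clock."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)  # (key, window index) -> request count

    def allow(self, key: str, now: float) -> bool:
        bucket = (key, int(now // self.window))
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True

limiter = FixedWindowLimiter(limit=100, window_seconds=60)
# 100 requests at t=59s (end of window 0), then 100 more at t=61s (start of window 1).
accepted = sum(limiter.allow("client-a", 59.0) for _ in range(100))
accepted += sum(limiter.allow("client-a", 61.0) for _ in range(100))
print(accepted)  # 200
```

All 200 requests are accepted within about two seconds even though the nominal limit is 100 per minute, which is the boundary effect the sliding window variants are designed to remove.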

Token Bucket Algorithm

The Token Bucket algorithm is the most practical approach, allowing burst traffic while constraining the average request rate.

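# Kong - Rate Limiting Plugin Configuration
# Note: Kong's open-source rate-limiting plugin counts requests per fixed time window
# (second, minute, hour, ...); it is not itself a token bucket implementation.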
# kong.yml - Declarative Configuration
_format_version: '3.0'

services:
  - name: user-service
    url: http://user-service:8080
    routes:
      - name: user-route
        paths:
          - /api/v1/users
    plugins:
      - name: rate-limiting
        config:
          # 100 requests per minute, 1000 per hour
          minute: 100
          hour: 1000
          # Policy: local (single node), cluster (cluster-wide), redis (Redis-based)
          policy: redis
          redis:
            host: redis-cluster
            port: 6379
            password: null
            database: 0
            timeout: 2000
          # Header to key on; only used when limit_by is header
          header_name: null
          # Keep the rate limit headers visible to clients
          hide_client_headers: false
          # Limit key: consumer, credential, ip, header, path, service
          limit_by: consumer
          # Allow requests when Redis is down
          fault_tolerant: true
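
For comparison with the window-based limits configured above, here is a minimal token bucket in Python. It is a sketch for illustration only: the class and parameter names are hypothetical, and the caller supplies the clock so the behavior is deterministic.

```python
class TokenBucket:
    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start with a full bucket
        self.updated = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill lazily from the elapsed time, capped at the bucket capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate=4.0, capacity=10.0)
burst = [bucket.allow(now=0.0) for _ in range(12)]
print(burst.count(True))         # 10: the burst is capped by the bucket size
print(bucket.allow(now=0.125))   # False: only 0.5 tokens have been refilled
print(bucket.allow(now=0.25))    # True: a full token is available again
```

The capacity bounds the burst, while the refill rate bounds the long-run average, which is why the comparison table lists the burst as "up to bucket size".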

Sliding Window Algorithm

The Sliding Window Counter resolves the boundary problem of Fixed Windows while maintaining memory efficiency.

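-- APISIX Custom Rate Limiting Plugin (Sliding Window Counter)
-- apisix/plugins/sliding-window-rate-limit.lua

local core = require("apisix.core")
local ngx = ngx
local math = math

-- Assumption: a shared dict with this (illustrative) name is declared with
-- lua_shared_dict in the NGINX configuration. Counters are per gateway node.
local counters = ngx.shared["plugin-sliding-window-rate-limit"]

local schema = {
    type = "object",
    properties = {
        rate = { type = "integer", minimum = 1 },
        burst = { type = "integer", minimum = 0 },
        window_size = { type = "integer", minimum = 1, default = 60 },
        key_type = {
            type = "string",
            enum = { "remote_addr", "consumer_name", "header" },
            default = "remote_addr"
        },
    },
    required = { "rate" },
}

local _M = {
    version = 0.1,
    priority = 1001,
    name = "sliding-window-rate-limit",
    schema = schema,
}

function _M.check_schema(conf)
    return core.schema.check(schema, conf)
end

local function counter_key(key, window_start)
    return key .. ":" .. window_start
end

local function get_count(key, window_start)
    return counters:get(counter_key(key, window_start))
end

local function increment_count(key, window_start, ttl)
    -- init = 0 creates the counter on first use; the TTL expires old windows
    return counters:incr(counter_key(key, window_start), 1, 0, ttl)
end

function _M.access(conf, ctx)
    -- Only remote_addr and consumer_name are handled in this sketch
    local key = ctx.var.remote_addr
    if conf.key_type == "consumer_name" then
        key = ctx.consumer_name or ctx.var.remote_addr
    end

    local now = ngx.now()
    local window = conf.window_size
    local current_window = math.floor(now / window) * window
    local previous_window = current_window - window
    local elapsed = now - current_window

    -- Weight the previous window by the share of it that still overlaps
    -- the sliding window, then add the current window's count
    local prev_count = get_count(key, previous_window) or 0
    local curr_count = get_count(key, current_window) or 0
    local weight = (window - elapsed) / window
    local estimated = prev_count * weight + curr_count

    if estimated >= conf.rate then
        return 429, {
            error = "Rate limit exceeded",
            retry_after = math.ceil(window - elapsed)
        }
    end

    -- Keep each counter for two windows so it can serve as the "previous" one
    increment_count(key, current_window, window * 2)
end

return _M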
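
The weighted estimate used by the plugin can be worked through with concrete numbers. The Python helper below mirrors the same formula; the function name and the sample counts are made up for illustration.

```python
def sliding_window_estimate(prev_count: int, curr_count: int,
                            elapsed: float, window: float) -> float:
    """Estimate the request count over the last `window` seconds."""
    # Share of the previous window that still overlaps the sliding window.
    weight = (window - elapsed) / window
    return prev_count * weight + curr_count

# 60 s window with a limit of 100, evaluated 15 s into the current window.
estimate = sliding_window_estimate(prev_count=80, curr_count=30,
                                   elapsed=15.0, window=60.0)
print(estimate)  # 90.0 -> below the limit, the request is allowed

# With 45 requests already in the current window the estimate is 105.0.
print(sliding_window_estimate(80, 45, 15.0, 60.0) >= 100)  # True -> rejected
```

Only two counters per key are needed, which is where the "Low" memory rating in the comparison table comes from; the price is that the previous window is assumed to be evenly distributed.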

Authentication and Authorization Strategies

Centralizing authentication at the API Gateway significantly reduces the security burden on backend services.

JWT Authentication Setup

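# APISIX - JWT Authentication Plugin Configuration
# Declarative routes/consumers for standalone mode (typically conf/apisix.yaml;
# config.yaml holds the static settings)
routes:
  - uri: /api/v1/orders/*
    upstream:
      type: roundrobin
      nodes:
        'order-service:8080': 1
    plugins:
      jwt-auth:
        # Where to look for the token. The consumer is selected by the `key`
        # claim inside the token, so no key is configured on the route.
        header: 'Authorization'
        query: 'token'
        cookie: 'jwt_token'
      # Additional: group-based access control
      # (consumers must belong to one of these consumer groups; group
      # membership is not shown in this example)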
      consumer-restriction:
        type: consumer_group_id
        whitelist:
          - 'premium-users'
          - 'admin-group'
        rejected_code: 403
        rejected_msg: 'Access denied: insufficient permissions'

# Consumer configuration (API user definitions)
consumers:
  - username: 'mobile-app'
    plugins:
      jwt-auth:
        key: 'mobile-app-key'
        secret: 'mobile-app-secret-256bit-key-here'
        algorithm: 'HS256'
        exp: 86400 # Token expiry: 24 hours
        base64_secret: false
  - username: 'web-frontend'
    plugins:
      jwt-auth:
        key: 'web-frontend-key'
        # PEM-encoded public key for RS256 (inline value, not a file path)
        public_key: |
          -----BEGIN PUBLIC KEY-----
          MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA...
          -----END PUBLIC KEY-----
        algorithm: 'RS256'
        exp: 3600 # Token expiry: 1 hour
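
What the gateway does with an HS256 token can be sketched with the Python standard library alone. This is a simplified illustration, not the plugin's implementation: the function names are hypothetical, only the signature, the algorithm, and the exp claim are checked, and production code should rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def sign_hs256(claims: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = (b64url(json.dumps(header).encode()) + "."
                     + b64url(json.dumps(claims).encode()))
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)

def verify_hs256(token: str, secret: bytes, now: float) -> dict:
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    # Pin the algorithm instead of trusting whatever the header announces.
    if json.loads(b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(b64url_decode(payload_b64))
    if claims.get("exp", 0) <= now:
        raise ValueError("token expired")
    return claims

secret = b"mobile-app-secret-256bit-key-here"
token = sign_hs256({"key": "mobile-app-key", "exp": 1000 + 86400}, secret)
print(verify_hs256(token, secret, now=1000.0)["key"])  # mobile-app-key
```

The `key` claim is what ties the token to the `mobile-app` consumer in the configuration above; the timestamps here are arbitrary values chosen to keep the example deterministic.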

OAuth2 + OIDC Integrated Authentication Flow

Integrating OAuth2/OIDC at the API Gateway centralizes IdP (Identity Provider) connectivity.

# Kong - OpenID Connect Plugin Configuration
plugins:
  - name: openid-connect
    config:
      issuer: 'https://auth.example.com/realms/production'
      client_id: 'api-gateway'
      client_secret: 'gateway-secret-value'
      redirect_uri: 'https://api.example.com/callback'
      # Supported authentication flows
      auth_methods:
        - authorization_code # Web applications
        - client_credentials # Service-to-service
        - password # Legacy support (not recommended)
      # How the gateway authenticates itself to the IdP token endpoint
      token_endpoint_auth_method: client_secret_post
      # Scope-based access control
      scopes_required:
        - openid
        - profile
        - api:read
      # Token caching (performance optimization)
      cache_ttl: 300
      # Token introspection (opaque token verification)
      introspection_endpoint: 'https://auth.example.com/realms/production/protocol/openid-connect/token/introspect'
      # Headers forwarded to upstream
      upstream_headers_claims:
        - sub
        - email
        - realm_access.roles
      upstream_headers_names:
        - X-User-ID
        - X-User-Email
        - X-User-Roles
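
The pairing of `upstream_headers_claims` with `upstream_headers_names` is positional: the first claim feeds the first header, and so on. The Python sketch below models that mapping, including the dotted path `realm_access.roles`. It is a simplified model with hypothetical function names; in particular, joining list values with commas is an assumption, not documented plugin behavior.

```python
def resolve_claim(claims: dict, path: str):
    """Follow a dotted path such as 'realm_access.roles' through nested claims."""
    value = claims
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value

def claims_to_headers(claims: dict, claim_paths: list, header_names: list) -> dict:
    headers = {}
    for path, name in zip(claim_paths, header_names):
        value = resolve_claim(claims, path)
        if value is None:
            continue  # a missing claim produces no header
        if isinstance(value, list):
            value = ",".join(str(item) for item in value)  # assumed serialization
        headers[name] = str(value)
    return headers

claims = {"sub": "u-123", "realm_access": {"roles": ["admin", "user"]}}
headers = claims_to_headers(
    claims,
    ["sub", "email", "realm_access.roles"],
    ["X-User-ID", "X-User-Email", "X-User-Roles"],
)
print(headers)  # {'X-User-ID': 'u-123', 'X-User-Roles': 'admin,user'}
```

Backends that consume these headers should only trust them when the request can be shown to have come through the gateway, since a client could otherwise send the same headers directly.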

BFF (Backend for Frontend) Architecture

Why BFF Pattern Is Needed

Serving all clients (web, mobile, IoT) through a single API Gateway introduces several problems:

Excessive data transfer: Full web-optimized payloads sent to mobile clients
Complex gateway logic: Client-specific branching logic accumulates in the gateway
Deployment coupling: Changes for one client type affect others

The BFF pattern provides dedicated, optimized backends for each frontend type, solving these problems.
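
As a rough illustration of what "dedicated, optimized backends" means in practice, the Python sketch below builds a profile response for a web client and for a mobile client from the same downstream data. Everything in it is hypothetical: `fetch_user` and `fetch_orders` stand in for calls to the User and Order services, and the field names are invented.

```python
import json

# Hypothetical stand-ins for calls to the User and Order services.
def fetch_user(user_id: str) -> dict:
    return {"id": user_id, "name": "Kim", "email": "kim@example.com",
            "address": "Seoul", "preferences": {"theme": "dark"}}

def fetch_orders(user_id: str) -> list:
    return [{"id": n, "total": 10 * n, "items": ["sku-%d" % n], "status": "paid"}
            for n in range(1, 6)]

def web_profile(user_id: str) -> dict:
    """Web BFF: full user record plus the complete order history."""
    return {"user": fetch_user(user_id), "orders": fetch_orders(user_id)}

def mobile_profile(user_id: str, page_size: int = 2) -> dict:
    """Mobile BFF: only the fields the app renders, with a short first page."""
    user = fetch_user(user_id)
    orders = fetch_orders(user_id)
    return {
        "user": {"id": user["id"], "name": user["name"]},
        "orders": [{"id": o["id"], "total": o["total"]} for o in orders[:page_size]],
        "has_more": len(orders) > page_size,
    }

web = json.dumps(web_profile("u1"))
mobile = json.dumps(mobile_profile("u1"))
print(len(mobile) < len(web))  # True
```

Each BFF owns its response shape, so trimming the mobile payload requires no change to the web BFF or to the gateway's routing rules.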

BFF Routing Configuration

# APISIX - BFF Routing Configuration
# Route to dedicated BFFs based on client type
routes:
  # Web BFF - Rich data, detailed information
  - uri: /api/web/*
    name: web-bff-route
    plugins:
      proxy-rewrite:
        regex_uri:
          - '^/api/web/(.*)'
          - '/$1'
      request-id:
        header_name: X-Request-ID
      jwt-auth: {}
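      # APISIX has no plugin named rate-limiting; rate/burst belong to limit-req
      # (rate is in requests per second)
      limit-req:
        rate: 200
        burst: 50
        key_type: var
        key: consumer_name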
    upstream:
      type: roundrobin
      nodes:
        'web-bff:3000': 1
      timeout:
        connect: 3
        send: 10
        read: 30

  # Mobile BFF - Lightweight data, pagination optimized
  - uri: /api/mobile/*
    name: mobile-bff-route
    plugins:
      proxy-rewrite:
        regex_uri:
          - '^/api/mobile/(.*)'
          - '/$1'
      jwt-auth: {}
      limit-req:
        rate: 100
        burst: 20
        key_type: var
        key: consumer_name
      # Mobile-specific: tag responses with a marker header
      # (this does not limit the response size by itself)
      response-rewrite:
        headers:
          set:
            X-Content-Optimized: 'mobile'
    upstream:
      type: roundrobin
      nodes:
        'mobile-bff:3001': 1
      timeout:
        connect: 3
        send: 5
        read: 15

  # IoT BFF - Minimal data, high frequency
  - uri: /api/iot/*
    name: iot-bff-route
    plugins:
      proxy-rewrite:
        regex_uri:
          - '^/api/iot/(.*)'
          - '/$1'
      key-auth: {} # IoT devices use API key authentication
      limit-req:
        rate: 500
        burst: 100
        key_type: var
        key: remote_addr
    upstream:
      type: roundrobin
      nodes:
        'iot-bff:3002': 1
      timeout:
        connect: 2
        send: 3
        read: 5

BFF Architecture Diagram

Client Layer             API Gateway          BFF Layer          Microservices
+----------+                              +----------+
| Web App  | ----+                   +--> | Web BFF  | --+--> User Service
+----------+     |    +-----------+  |    +----------+   +--> Product Service
                 +--> |           |--+                   +--> Order Service
+----------+     |    | API       |  |    +----------+
|Mobile App| ----+--> | Gateway   |--+--> |Mobile BFF| --+--> User Service
+----------+     |    |           |  |    +----------+   +--> Product Service
                 |    +-----------+  |
+----------+     |                   |    +----------+
|IoT Device| ----+                   +--> | IoT BFF  | --+--> Device Service
+----------+                              +----------+   +--> Telemetry Service

Load Balancing and Circuit Breakers

Load Balancing Strategies

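API Gateways support several load balancing algorithms. The configuration below shows three of them in APISIX: weighted round robin (used here for a canary rollout), consistent hashing for session affinity, and least connections.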

# APISIX - Load Balancing Strategies
upstreams:
  # Weighted Round Robin
  - id: 1
    type: roundrobin
    nodes:
      'service-a-v1:8080': 8 # 80% traffic
      'service-a-v2:8080': 2 # 20% traffic (canary deployment)
    # Health check configuration
    checks:
      active:
        type: http
        http_path: /health
        healthy:
          interval: 5
          successes: 2
        unhealthy:
          interval: 3
          http_failures: 3
          tcp_failures: 3
      passive:
        healthy:
          http_statuses:
            - 200
            - 201
          successes: 3
        unhealthy:
          http_statuses:
            - 500
            - 502
            - 503
          http_failures: 5
          tcp_failures: 2

  # Consistent Hashing (Session Affinity)
  - id: 2
    type: chash
    key: remote_addr
    nodes:
      'session-service-1:8080': 1
      'session-service-2:8080': 1
      'session-service-3:8080': 1

  # Least Connections
  - id: 3
    type: least_conn
    nodes:
      'compute-service-1:8080': 1
      'compute-service-2:8080': 1
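
The 8:2 split in the first upstream can be reasoned about with a small weighted round-robin sketch. The version below uses the "smooth" variant popularized by NGINX; whether APISIX picks nodes in exactly this order is an assumption, but the resulting ratio is what the weights describe.

```python
def smooth_weighted_round_robin(weights: dict[str, int], picks: int) -> list[str]:
    """Return the sequence of nodes chosen over `picks` requests."""
    total = sum(weights.values())
    current = {node: 0 for node in weights}
    order = []
    for _ in range(picks):
        # Every node earns its weight on each round ...
        for node, weight in weights.items():
            current[node] += weight
        # ... the node with the highest running score is chosen ...
        chosen = max(current, key=current.get)
        # ... and pays back the total, which spreads its picks out.
        current[chosen] -= total
        order.append(chosen)
    return order


order = smooth_weighted_round_robin(
    {"service-a-v1:8080": 8, "service-a-v2:8080": 2}, picks=10
)
```

Over 10 requests the canary `service-a-v2` is picked twice, and the picks are interleaved with v1 traffic instead of arriving back to back.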

Circuit Breaker Configuration

# APISIX - api-breaker Plugin (Automatic Circuit Breaker)
routes:
  - uri: /api/v1/payments/*
    plugins:
      api-breaker:
        # Status code, body, and headers returned to the client while the circuit is open
        break_response_code: 503
        break_response_body: '{"error":"circuit open","retry_after":30}'
        break_response_headers:
          - key: Content-Type
            value: application/json
          - key: Retry-After
            value: '30'
        # Unhealthy: circuit opens after 3 consecutive 500/502/503 responses
        unhealthy:
          http_statuses:
            - 500
            - 502
            - 503
          failures: 3
        # Healthy: circuit closes after 2 consecutive successes
        healthy:
          http_statuses:
            - 200
            - 201
          successes: 2
        # Upper bound (seconds) on how long the circuit stays open; the open interval grows with repeated trips up to this cap
        max_breaker_sec: 300
    upstream:
      type: roundrobin
      nodes:
        'payment-service:8080': 1
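
The thresholds in this configuration can be read as a small state machine. The following is a simplified sketch, not the plugin's implementation: the class name `Breaker` is hypothetical, and the doubling open interval (2s, 4s, 8s, ... capped at `max_breaker_sec`) reflects how the api-breaker plugin is commonly described and should be treated as an assumption.

```python
from dataclasses import dataclass


@dataclass
class Breaker:
    failures_to_open: int = 3          # unhealthy.failures
    successes_to_close: int = 2        # healthy.successes
    max_breaker_sec: int = 300
    unhealthy_statuses: frozenset = frozenset({500, 502, 503})
    healthy_statuses: frozenset = frozenset({200, 201})
    _fail: int = 0
    _ok: int = 0
    _trips: int = 0
    _open_until: float = 0.0

    def allow(self, now: float) -> bool:
        """True if a request may be forwarded upstream at time `now`."""
        return now >= self._open_until

    def record(self, status: int, now: float) -> None:
        """Feed an upstream response status back into the breaker."""
        if status in self.unhealthy_statuses:
            self._ok = 0
            self._fail += 1
            if self._fail >= self.failures_to_open:
                self._trips += 1
                hold = min(2 ** self._trips, self.max_breaker_sec)
                self._open_until = now + hold
                self._fail = 0
        elif status in self.healthy_statuses:
            self._fail = 0
            self._ok += 1
            if self._ok >= self.successes_to_close:
                self._trips = 0   # fully recovered: next trip starts at 2s again
                self._ok = 0
        # Any other status (for example 404) changes neither counter.
```

The clock is passed in as `now` so the behavior can be exercised without sleeping; a real gateway would use its own timer.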

Production Deployment with Kong

Docker Compose Kong Cluster

# docker-compose.kong.yml
version: '3.8'

services:
  kong-database:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: kong
      POSTGRES_USER: kong
      POSTGRES_PASSWORD: kong_password
    volumes:
      - kong_pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ['CMD', 'pg_isready', '-U', 'kong']
      interval: 10s
      timeout: 5s
      retries: 5

  kong-migration:
    image: kong:3.6
    command: kong migrations bootstrap
    depends_on:
      kong-database:
        condition: service_healthy
    environment:
      KONG_DATABASE: postgres
      KONG_PG_HOST: kong-database
      KONG_PG_USER: kong
      KONG_PG_PASSWORD: kong_password

  kong:
    image: kong:3.6
    depends_on:
      kong-migration:
        condition: service_completed_successfully
    environment:
      KONG_DATABASE: postgres
      KONG_PG_HOST: kong-database
      KONG_PG_USER: kong
      KONG_PG_PASSWORD: kong_password
      KONG_PROXY_ACCESS_LOG: /dev/stdout
      KONG_ADMIN_ACCESS_LOG: /dev/stdout
      KONG_PROXY_ERROR_LOG: /dev/stderr
      KONG_ADMIN_ERROR_LOG: /dev/stderr
      KONG_ADMIN_LISTEN: '0.0.0.0:8001'
      KONG_STATUS_LISTEN: '0.0.0.0:8100'
      # Performance tuning
      KONG_NGINX_WORKER_PROCESSES: auto
      KONG_UPSTREAM_KEEPALIVE_POOL_SIZE: 128
      KONG_UPSTREAM_KEEPALIVE_MAX_REQUESTS: 1000
    ports:
      - '8000:8000' # Proxy (HTTP)
      - '8443:8443' # Proxy (HTTPS)
      - '127.0.0.1:8001:8001' # Admin API (published on localhost only; never expose it publicly)
    healthcheck:
      test: ['CMD', 'kong', 'health']
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  kong_pgdata:

Production Deployment with APISIX

APISIX Helm-based Kubernetes Deployment

# APISIX Kubernetes Deployment (Helm)
helm repo add apisix https://charts.apiseven.com
helm repo update

# Install APISIX (with etcd)
helm install apisix apisix/apisix \
  --namespace apisix \
  --create-namespace \
  --set gateway.type=LoadBalancer \
  --set ingress-controller.enabled=true \
  --set dashboard.enabled=true \
  --set etcd.replicaCount=3 \
  --set etcd.persistence.size=20Gi \
  --set apisix.nginx.workerProcesses=auto \
  --set apisix.nginx.workerConnections=65536

# Verify APISIX status
kubectl -n apisix get pods
kubectl -n apisix get svc

# Register route via Admin API
curl -X PUT http://apisix-admin:9180/apisix/admin/routes/1 \
  -H "X-API-KEY: admin-api-key" \
  -d '{
    "uri": "/api/v1/products/*",
    "upstream": {
      "type": "roundrobin",
      "nodes": {
        "product-service.default.svc:8080": 1
      }
    },
    "plugins": {
      "jwt-auth": {},
      "limit-count": {
        "count": 200,
        "time_window": 60,
        "rejected_code": 429,
        "rejected_msg": "Rate limit exceeded. Please retry later.",
        "policy": "redis",
        "redis_host": "redis.default.svc",
        "redis_port": 6379,
        "key_type": "var",
        "key": "consumer_name"
      },
      "api-breaker": {
        "break_response_code": 503,
        "unhealthy": {
          "http_statuses": [500, 502, 503],
          "failures": 3
        },
        "healthy": {
          "http_statuses": [200],
          "successes": 2
        },
        "max_breaker_sec": 60
      }
    }
  }'

Monitoring and Operations

Prometheus + Grafana Metrics Collection

Key API Gateway monitoring metrics include:

Request Rate: Requests processed per second
Error Rate: Percentage of 4xx/5xx responses
Latency: P50, P95, P99 response times
Rate Limit Hit Rate: Percentage of requests rejected for exceeding a limit (HTTP 429)
Circuit Breaker State: Open/Closed/Half-Open transition events
Upstream Health: Backend service availability
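
To pin down what two of these metrics mean, the sketch below computes an error rate and nearest-rank latency percentiles from raw samples. In production these values would come from Prometheus histogram queries, not from code like this; the function names are illustrative.

```python
def error_rate(statuses: list[int]) -> float:
    """Percentage of responses with a 4xx or 5xx status."""
    errors = sum(1 for status in statuses if status >= 400)
    return 100.0 * errors / len(statuses)


def percentile(latencies_ms: list[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample that is greater than
    or equal to p percent of all samples (p is an integer from 1 to 100)."""
    ordered = sorted(latencies_ms)
    rank = max(1, (p * len(ordered) + 99) // 100)  # integer ceil(p/100 * n)
    return ordered[rank - 1]
```

P99 is usually the number to alert on: a healthy P50 can hide a slow tail that only a few percent of requests experience.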

# APISIX - Prometheus Metrics Configuration
plugin_attr:
  prometheus:
    export_uri: /apisix/prometheus/metrics
    export_addr:
      ip: '0.0.0.0'
      port: 9091
    # APISIX records latency in milliseconds, so bucket bounds are given in ms
    default_buckets:
      - 5
      - 10
      - 25
      - 50
      - 100
      - 250
      - 500
      - 1000
      - 2500
      - 5000
      - 10000

# Global plugin applied to all routes
global_rules:
  - id: 1
    plugins:
      prometheus:
        prefer_name: true
      # Distributed tracing (OpenTelemetry)
      opentelemetry:
        sampler:
          name: trace_id_ratio
          options:
            fraction: 0.1 # 10% sampling
        additional_attributes:
          - 'service.version'

Failure Cases and Remediation

Case 1: Rate Limiter Misconfiguration Causing Outage

A fintech company configured their rate limiter with a local policy while scaling the API Gateway to 3 nodes. Each node independently applied rate limits, effectively allowing 3x the configured traffic to reach backends. The payment service went down due to overload.

Remediation: In a multi-node deployment, use a shared-counter policy (redis or cluster) instead of local. With Redis or Redis Cluster as the rate limit store, every node counts against the same limit, so the effective limit no longer grows with the number of gateway nodes.
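
The 3x effect is simple arithmetic, which the following sketch simulates for one time window. The numbers (1,000 requests, 3 nodes, a limit of 100) are hypothetical, and traffic is assumed to be spread round-robin across nodes.

```python
def admitted(requests: int, nodes: int, limit: int, shared: bool) -> int:
    """Count requests admitted within one window.

    shared=False models a local policy: each node keeps its own counter.
    shared=True models a central store: all nodes share one counter.
    """
    counters = [0] if shared else [0] * nodes
    passed = 0
    for i in range(requests):
        slot = 0 if shared else i % nodes  # round-robin across nodes
        if counters[slot] < limit:
            counters[slot] += 1
            passed += 1
    return passed


local_total = admitted(1000, nodes=3, limit=100, shared=False)   # 300
shared_total = admitted(1000, nodes=3, limit=100, shared=True)   # 100
```

With local counters the backend sees `nodes * limit` requests per window, so any autoscaling event silently raises the limit.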

Case 2: API Gateway Single Point of Failure

An API Gateway running as a single instance experienced OOM (Out of Memory) due to a memory leak, causing a complete service outage.

Remediation: Always deploy API Gateways in HA (High Availability) configuration. Deploy at least 2 instances in Active-Active mode with an L4 load balancer (AWS NLB, MetalLB) in front. Use health checks to automatically remove failing nodes.

Case 3: Token Caching Leading to Privilege Escalation

JWT tokens were cached for 5 minutes at the API Gateway. When a user's permissions were revoked or an account was deactivated, the cached token continued to grant access.

Remediation: Keep the token cache TTL short (30 seconds to 1 minute), and check a token blacklist on every request so that critical permission changes take effect immediately instead of waiting for the cache to expire. Always validate the exp claim at the gateway and provide a token revocation endpoint.
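
A minimal sketch of how a short TTL and a revocation list fit together is shown below. `TokenCache` and the `validate` callback are hypothetical names; `validate` stands in for the full check (signature verification plus an account status lookup) that the cache is meant to avoid repeating.

```python
class TokenCache:
    def __init__(self, ttl_seconds: float = 30.0):
        self.ttl = ttl_seconds
        self._validated_at: dict[str, float] = {}  # token id -> last full validation
        self._revoked: set[str] = set()

    def revoke(self, token_id: str) -> None:
        """Called by the revocation endpoint; takes effect immediately."""
        self._revoked.add(token_id)
        self._validated_at.pop(token_id, None)

    def is_allowed(self, token_id: str, exp: float, now: float, validate) -> bool:
        # Revocation and expiry are checked on every request, cached or not.
        if token_id in self._revoked or now >= exp:
            return False
        cached = self._validated_at.get(token_id)
        if cached is not None and now - cached < self.ttl:
            return True  # recent full validation is still trusted
        if validate(token_id):
            self._validated_at[token_id] = now
            return True
        self._validated_at.pop(token_id, None)
        return False
```

The TTL bounds how long a permission change that bypasses `revoke` can go unnoticed, while the revocation set covers the cases that cannot wait even that long.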

Case 4: Missing Circuit Breaker Causing Cascading Failures

An external payment API experienced response latency exceeding 60 seconds. Without circuit breakers, all API Gateway worker processes became occupied waiting for the payment service. As a result, even healthy APIs became unresponsive.

Remediation: Configure appropriate timeouts and circuit breakers for all upstreams. Set connection timeout to 3 seconds and read timeout to 5-30 seconds depending on API characteristics. Open the circuit after 3-5 consecutive failures and transition to Half-Open state after 30-60 seconds for gradual recovery.

Operational Checklist

Essential items to verify when operating an API Gateway in production.

Deployment and Availability

HA configuration (minimum 2 instances, Active-Active)
L4 load balancer in front (AWS NLB, MetalLB, etc.)
Rolling update or blue-green deployment strategy
Config store backup (PostgreSQL, etcd)

Security

Admin API access restricted to internal network only
TLS 1.3 with automatic certificate renewal
JWT token validation enabled with minimal cache TTL
CORS and CSRF protection configured