Ingress Rate Limiting and DDoS Mitigation: A Practical Guide from ingress-nginx to Gateway API

Introduction

One night, a login API that normally handled about 200 requests per second was suddenly flooded with 40,000 requests per second. The database connection pool was exhausted in an instant, and legitimate users were greeted with 504 Gateway Timeout errors. Tracing the traffic revealed that the same payload was being sent repeatedly to the same endpoint from thousands of IPs. It was a textbook L7 application-layer attack.

In situations like this, the first line of defense that comes to mind is application code. But the moment the application receives a request, it means a connection has already been established, the TLS handshake has completed, and a worker thread is occupied. In other words, the cost has already been incurred. Real defense must happen before requests reach expensive resources, as far out at the edge as possible. In a Kubernetes environment, that outer boundary is the Ingress layer.

This article walks through how to configure rate limiting at the Ingress layer, why accurate counting is hard in distributed environments, and how to separate the defense of L3/L4 volumetric attacks from L7 application attacks across different layers, all with hands-on code. As of 2026, the Ingress API is effectively frozen and the Gateway API has established itself as the successor standard, so we will also examine the relationship between the two APIs and the migration perspective.

Fundamentals of Rate Limiting

Why Limit at L7

Network defense plays a different role at each OSI layer. The table below summarizes what each layer can see and what it can block.

| Layer | Visible information | Can block | Cannot block |
| --- | --- | --- | --- |
| L3 (IP) | Source/destination IP | IP-based blocking, geo blocking | Botnets spoofing legitimate IPs |
| L4 (TCP/UDP) | Ports, connection state | SYN floods, connection caps | Fine-grained per-path HTTP control |
| L7 (HTTP) | Method, path, headers, cookies | Per-endpoint RPS, per-user quotas | Large volumetric floods |

The key point is that each layer should block what it is best at blocking. Trying to stop a volumetric attack of hundreds of gigabits per second at the L7 Ingress means packets enter the cluster network and consume bandwidth before being blocked. Conversely, a fine-grained rule like five requests per second on a specific login endpoint cannot be expressed by L3/L4 appliances.

Token Bucket and Leaky Bucket

The two pillars of rate-limiting algorithms are the token bucket and the leaky bucket.

[Token Bucket]                       [Leaky Bucket]

 refill rate r ──> ( bucket b )      request ──> ( queue b ) ──> leak rate r ──> handle
              │                                  │
   each request consumes 1 token       if the queue is full, drop the request
   no token means reject               drains only at a fixed rate
  • Token bucket: maintains an average rate r while allowing instantaneous bursts up to the bucket capacity b. It naturally absorbs short spikes.
  • Leaky bucket: strictly flattens the output rate. Bursts are not passed through: excess requests wait in the queue (up to capacity b) and leave only at the fixed rate r.
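
To make the difference between the two models concrete, here is a minimal simulation sketch. The class names, the explicit `now` argument, and the chosen numbers (r = 5, b = 10, 20 simultaneous requests) are illustrative assumptions, not code from any Ingress controller.

```python
from collections import deque
from typing import Optional


class TokenBucket:
    """Refills at `rate` tokens per second up to `capacity`; a request costs one token."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full, so an initial burst is allowed
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # no token left: reject


class LeakyBucket:
    """Queues up to `capacity` requests and releases one every 1/rate seconds."""

    def __init__(self, rate: float, capacity: int) -> None:
        self.interval = 1.0 / rate
        self.capacity = capacity
        self.pending = deque()  # departure times of requests still waiting
        self.next_free = 0.0    # earliest time the next request may leave

    def offer(self, now: float) -> Optional[float]:
        """Return the time the request is handled, or None if it is dropped."""
        # Requests whose departure time has arrived are no longer in the queue.
        while self.pending and self.pending[0] <= now:
            self.pending.popleft()
        if len(self.pending) >= self.capacity:
            return None  # queue full: drop
        departure = max(now, self.next_free)
        self.next_free = departure + self.interval
        self.pending.append(departure)
        return departure


# 20 requests arriving at the same instant, r = 5 per second, b = 10.
tb = TokenBucket(rate=5, capacity=10)
lb = LeakyBucket(rate=5, capacity=10)
tb_passed = sum(tb.allow(0.0) for _ in range(20))
lb_times = [t for t in (lb.offer(0.0) for _ in range(20)) if t is not None]
print(tb_passed)                     # requests the token bucket passes at once
print(len(lb_times), lb_times[-1])   # requests accepted and the last departure time
```

In this sketch the token bucket passes 10 requests immediately and rejects the rest, while the leaky bucket accepts 11 (one handled right away, ten queued) and spreads them over roughly two seconds.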

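ingress-nginx rate limiting internally uses nginx limit_req (a model close to a leaky bucket) and limit_conn. The burst parameter of limit_req, which ingress-nginx derives from the configured limit multiplied by limit-burst-multiplier, lets a bounded number of requests exceed the steady rate.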

Deep Dive into ingress-nginx Annotations

ingress-nginx declares rate limiting through annotations on the Ingress resource. Here are the four most commonly used.

| Annotation | Meaning | Unit |
| --- | --- | --- |
| limit-rps | Allowed requests per second | requests per second |
| limit-rpm | Allowed requests per minute | requests per minute |
| limit-connections | Concurrent connection cap | connections |
| limit-burst-multiplier | Burst multiplier (default 5) | multiplier |

The following is an example Ingress protecting a login endpoint.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: auth-ingress
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "5"
    nginx.ingress.kubernetes.io/limit-burst-multiplier: "2"
    nginx.ingress.kubernetes.io/limit-connections: "10"
    # Exclude trusted internal ranges from limiting
    nginx.ingress.kubernetes.io/limit-whitelist: "10.0.0.0/8,192.168.0.0/16"
spec:
  ingressClassName: nginx
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /auth/login
            pathType: Prefix
            backend:
              service:
                name: auth-service
                port:
                  number: 8080

Here, combining limit-rps: 5 with limit-burst-multiplier: 2 yields a burst allowance of 10 requests (5 × 2), not 10 requests per second. In other words, each client IP is held to an average of 5 RPS, but a momentary spike of roughly 10 extra requests is let through; anything beyond that is rejected with 503 Service Temporarily Unavailable.
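
The burst arithmetic can be checked with a simplified model of nginx limit_req for a single key. This is a sketch under two assumptions: that the controller renders the rule with a burst of limit-rps × limit-burst-multiplier and no queuing delay, and that time is passed in explicitly. The class name `LimitReqModel` is hypothetical.

```python
class LimitReqModel:
    """Simplified single-key model of nginx limit_req without queuing delay."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate    # steady rate in requests per second (limit-rps)
        self.burst = burst  # limit-rps * limit-burst-multiplier
        self.excess = 0.0   # requests currently above the steady rate
        self.last = None    # time of the last accepted request

    def allow(self, now: float) -> bool:
        if self.last is None:
            self.last = now
            return True  # the first request for a key always passes
        # Excess drains at `rate` per second and grows by one per request.
        excess = max(0.0, self.excess - self.rate * (now - self.last) + 1.0)
        if excess > self.burst:
            return False  # rejected (503 by default); state is left untouched
        self.excess = excess
        self.last = now
        return True


limit_rps, multiplier = 5, 2
model = LimitReqModel(rate=limit_rps, burst=limit_rps * multiplier)
passed_at_once = sum(model.allow(0.0) for _ in range(20))
print(passed_at_once)    # how many of 20 simultaneous requests pass
print(model.allow(1.0))  # one second later, part of the allowance has drained
```

In this model 11 of the 20 simultaneous requests pass: the first one plus the 10-request burst allowance. After that the client is held to the steady 5 RPS.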

What Is the Key Based On

The default rate-limiting key in ingress-nginx is the client IP. But if you sit behind a proxy or load balancer, you must accurately identify the real client IP. The crucial setting here is the trust configuration in the ConfigMap.

apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  # Trusted proxy ranges (e.g., cloud LB CIDRs)
  proxy-real-ip-cidr: "130.211.0.0/22,35.191.0.0/16"
  use-forwarded-headers: "true"
  compute-full-forwarded-for: "true"

Enabling use-forwarded-headers makes nginx treat the IP in the X-Forwarded-For header as the client. However, this header can be forged by the client, so you must always specify the trusted ranges (proxy-real-ip-cidr) alongside it. Otherwise, an attacker could insert a fake X-Forwarded-For value on every request to bypass the counter.
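
The reason the trusted range matters can be sketched as follows. This is an illustration of the general technique (walk X-Forwarded-For from the right and skip trusted proxies), not the controller's actual code; the function name `resolve_client_ip` and the sample addresses are assumptions.

```python
import ipaddress
from typing import Sequence


def resolve_client_ip(remote_addr: str, x_forwarded_for: str,
                      trusted_cidrs: Sequence[str]) -> str:
    """Pick the rate-limit key. Raises ValueError on malformed addresses."""
    trusted = [ipaddress.ip_network(cidr) for cidr in trusted_cidrs]

    def is_trusted(ip: str) -> bool:
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in trusted)

    # If the direct peer is not a trusted proxy, the header is ignored entirely.
    if not is_trusted(remote_addr):
        return remote_addr
    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    # Walk from the right: the rightmost entries were appended by our own proxies,
    # while anything further left may have been supplied by the client.
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    return hops[0] if hops else remote_addr


TRUSTED = ["130.211.0.0/22", "35.191.0.0/16"]
# The client forged "1.2.3.4"; the load balancer appended the real source address.
print(resolve_client_ip("130.211.0.5", "1.2.3.4, 203.0.113.7", TRUSTED))
# A request that bypasses the load balancer cannot influence the key at all.
print(resolve_client_ip("198.51.100.9", "1.2.3.4", TRUSTED))
```

The first call resolves to 203.0.113.7 and the second to 198.51.100.9, so the forged value never becomes the counter key.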

Global Limits and Custom Responses

The default limits applied across all Ingresses are set in the ConfigMap.

apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  limit-rate: "0"                 # response bandwidth limit (bytes/sec), 0 is unlimited
  limit-req-status-code: "429"    # return 429 Too Many Requests instead of the default 503
  limit-conn-status-code: "429"

The default response code when rate limiting rejects a request is 503, but it is common to change it to 429 Too Many Requests so clients clearly recognize they should back off. Where possible, it is good to also return a Retry-After header.
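
On the client side, a 429 is only useful if the caller actually backs off. Here is a minimal sketch of that logic; the function name `backoff_delay`, the base and cap values, and the choice to handle only the delay-in-seconds form of Retry-After are assumptions for illustration.

```python
import random
from typing import Optional


def backoff_delay(status: int, retry_after: Optional[str], attempt: int,
                  base: float = 0.5, cap: float = 30.0) -> Optional[float]:
    """Return the seconds to wait before retrying, or None if no backoff applies."""
    if status != 429:
        return None
    # Prefer the server's hint when it is given as a number of seconds.
    # (Retry-After may also be an HTTP date; this sketch ignores that form.)
    if retry_after is not None and retry_after.strip().isdigit():
        return min(cap, float(retry_after.strip()))
    # Otherwise fall back to exponential backoff with full jitter.
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


print(backoff_delay(200, None, 0))   # not rate limited: no wait
print(backoff_delay(429, "7", 0))    # server asked for 7 seconds
print(backoff_delay(429, None, 3))   # random value between 0 and 4 seconds
```

The jitter matters during an incident: without it, every throttled client retries at the same instant and recreates the spike.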

Comparing Traefik / Kong / APISIX

While ingress-nginx is the most widely used, you might choose a different controller depending on your operating environment and requirements. From a rate-limiting perspective, the comparison is as follows.

| Item | ingress-nginx | Traefik | Kong | APISIX |
| --- | --- | --- | --- | --- |
| Configuration style | Annotations | Middleware CRD | Plugin (KongPlugin) | Plugin / Route |
| Algorithm | Leaky-bucket family | Token bucket | Fixed/sliding window | Leaky bucket (limit-req), fixed window (limit-count) |
| Distributed counting | Not by default (per-pod) | Not by default | Redis supported | Redis cluster supported |
| Key customization | Limited | Source-based | consumer/IP/header | Variable-based, flexible |
| Reject response code | Configurable | 429 (fixed) | 429 by default | Configurable |

Traefik Middleware

Traefik declares rate limiting with a Middleware CRD.

apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: api-ratelimit
  namespace: production
spec:
  rateLimit:
    average: 100      # average 100 requests per second
    burst: 50         # allow a momentary burst of 50
    period: 1s
    sourceCriterion:
      ipStrategy:
        depth: 1      # use the 1st IP from the end of X-Forwarded-For

Kong Plugin

Kong declares it with a KongPlugin resource, and specifying Redis as the backend enables distributed counting.

apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: rate-limiting-redis
  namespace: production
plugin: rate-limiting
config:
  minute: 60
  policy: redis
  redis_host: redis.production.svc.cluster.local
  redis_port: 6379
  fault_tolerant: true

APISIX Route

APISIX provides the limit-req, limit-conn, and limit-count plugins, and supports global counting through a Redis cluster.

apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  name: api-route
  namespace: production
spec:
  http:
    - name: limited
      match:
        hosts:
          - api.example.com
        paths:
          - /api/*
      backends:
        - serviceName: api-service
          servicePort: 8080
      plugins:
        - name: limit-count
          enable: true
          config:
            count: 200
            time_window: 60
            rejected_code: 429
            policy: redis
            redis_host: redis.production.svc.cluster.local
            redis_port: 6379

The Counting Problem in Distributed Environments

This is where teams get tripped up most often in practice. An Ingress controller usually runs as multiple pods (replicas). If each pod keeps its counter in its own memory, the overall limit becomes a multiple of the intended value.

                          [ client traffic ]
                ┌─────────────────┼─────────────────┐
                ▼                 ▼                 ▼
        ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
        │ ingress pod1 │  │ ingress pod2 │  │ ingress pod3 │
        │ memory counter│  │ memory counter│  │ memory counter│
        │  limit 100    │  │  limit 100    │  │  limit 100    │
        └──────────────┘  └──────────────┘  └──────────────┘

  intended limit: 100 RPS  →   actual allowed: up to 300 RPS (pods x 100)

The characteristics of the in-memory approach are summarized below.

| Approach | Accuracy | Latency | Fault tolerance | Operational complexity |
| --- | --- | --- | --- | --- |
| In-memory (per-pod) | Low (multiplied by pod count) | Fastest | High | Low |
| Redis central counter | High | One extra network hop | Vulnerable to Redis failure | High |

The default ingress-nginx implementation is per-pod memory, so when you set a limit you should divide it by the replica count to approximate the intended global limit. For example, if you want a global 300 RPS with three pods, you set 100 RPS per pod. Be aware, though, that if HPA changes the pod count, the global limit shifts with it.
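
The division and the HPA drift are simple arithmetic, sketched below. The function names are hypothetical, and the calculation assumes the load balancer spreads each client's requests evenly across pods, which is not guaranteed in practice.

```python
def per_pod_limit(global_rps: int, replicas: int) -> int:
    """Per-pod limit that approximates a global target, assuming even distribution."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return max(1, global_rps // replicas)


def effective_global_limit(per_pod_rps: int, replicas: int) -> int:
    """Worst-case global limit when every pod counts independently."""
    return per_pod_rps * replicas


configured = per_pod_limit(global_rps=300, replicas=3)
print(configured)                              # value to put in limit-rps
print(effective_global_limit(configured, 3))   # matches the 300 RPS target
print(effective_global_limit(configured, 5))   # after HPA scales out to 5 pods
```

With the annotation fixed at 100, scaling from three to five pods silently raises the effective ceiling from 300 to 500 RPS.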

If you need accurate global counting, choose Kong or APISIX with a central store like Redis as the backend, or consider placing a separate rate-limiting gateway in front of ingress-nginx. When using Redis, the following settings matter.

- Atomic counting: eliminate race conditions by running INCR and EXPIRE as one atomic unit, for example in a Lua script or a MULTI/EXEC transaction
- Fault-tolerant mode: decide whether to block (fail-closed) or pass (fail-open) on Redis failure
- Key expiry: choose sliding/fixed window so keys expire exactly per window
- Latency budget: measure the impact of Redis round-trips on p99 latency

Fail-open and fail-closed are a trade-off. When Redis dies, fail-open preserves availability but exposes you to attacks, while fail-closed is safe but blocks legitimate traffic too. We recommend fail-closed for sensitive endpoints like login and payment, and fail-open for general read APIs.
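
The fail-open and fail-closed choice can be sketched with a fixed-window counter. To keep the example self-contained, `InMemoryStore` is a hypothetical stand-in for Redis whose `incr_with_expiry` method plays the role of an atomic increment plus expiry; a real deployment would call Redis instead. All names here are assumptions.

```python
class CounterStoreError(Exception):
    """Raised when the central counter store cannot be reached."""


class InMemoryStore:
    """Stand-in for Redis: increment a counter and set its expiry in one step."""

    def __init__(self) -> None:
        self._data = {}        # key -> (count, expires_at)
        self.available = True  # flip to False to simulate an outage

    def incr_with_expiry(self, key: str, ttl: float, now: float) -> int:
        if not self.available:
            raise CounterStoreError("store unreachable")
        count, expires_at = self._data.get(key, (0, now + ttl))
        if now >= expires_at:  # the window has ended: start a new one
            count, expires_at = 0, now + ttl
        count += 1
        self._data[key] = (count, expires_at)
        return count


class FixedWindowLimiter:
    def __init__(self, store: InMemoryStore, limit: int, window: float,
                 fail_open: bool) -> None:
        self.store = store
        self.limit = limit
        self.window = window
        self.fail_open = fail_open

    def allow(self, key: str, now: float) -> bool:
        try:
            count = self.store.incr_with_expiry(key, self.window, now)
        except CounterStoreError:
            # Store outage: pass traffic (fail-open) or block it (fail-closed).
            return self.fail_open
        return count <= self.limit


store = InMemoryStore()
login = FixedWindowLimiter(store, limit=5, window=1.0, fail_open=False)
reads = FixedWindowLimiter(store, limit=100, window=1.0, fail_open=True)

print(sum(login.allow("login:203.0.113.7", 0.0) for _ in range(8)))  # allowed in one window
store.available = False
print(login.allow("login:203.0.113.7", 2.0))  # fail-closed endpoint during an outage
print(reads.allow("reads:203.0.113.7", 2.0))  # fail-open endpoint during an outage
```

Using one limiter instance per endpoint class keeps the outage policy an explicit, per-endpoint decision rather than a global default.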

Whitelists and Geo Controls

You should not limit all traffic equally. Trusted partners, internal monitoring, and health checks should be excluded.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: partner-api
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "20"
    # Whitelisted ranges bypass rate limiting
    nginx.ingress.kubernetes.io/limit-whitelist: "203.0.113.0/24,198.51.100.10/32"
spec:
  ingressClassName: nginx
  rules:
    - host: partner.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: partner-service
                port:
                  number: 80

Geo blocking is implemented with a GeoIP module. ingress-nginx supports country-code-based control through nginx's GeoIP2 module.

apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  use-geoip2: "true"
  # Policies to allow or block specific countries are written with server-snippet

That said, GeoIP blocking is easily bypassed with VPNs and proxies, so it is realistic to use it as a supplementary signal rather than a standalone defense. It is well suited as a first filter for hostile traffic coming from regions with no legitimate users at all.

Separating L3/L4 DDoS from L7 Defense

The most common design mistake is trying to block all DDoS in one place at the Ingress. Volumetric attacks (L3/L4) and application attacks (L7) must be defended at fundamentally different locations.

[internet]
┌──────────────────────────────────────────────┐
│  cloud edge (CDN / Anycast / scrubbing)        │  <- L3/L4 volumetric defense
│  - absorb SYN floods, UDP amp, large packets    │     (AWS Shield, Cloud Armor, etc.)
└──────────────────────────────────────────────┘
   │  (only near-legitimate traffic passes)
┌──────────────────────────────────────────────┐
│  Ingress layer (ingress-nginx / Gateway)        │  <- L7 application defense
│  - per-endpoint RPS, connection caps, WAF rules │     (rate limiting, bot blocking)
└──────────────────────────────────────────────┘
┌──────────────────────────────────────────────┐
│  application (services/pods)                    │  <- business-logic quotas
│  - per-user quotas, idempotency, circuit breaker │
└──────────────────────────────────────────────┘

The division of responsibilities per layer is summarized in the table below.

| Attack type | Example | Defense location | Tools |
| --- | --- | --- | --- |
| Volumetric (L3/L4) | SYN flood, UDP amp | Cloud edge | Shield, Cloud Armor, scrubbing |
| Protocol (L4) | Connection exhaustion, slowloris | Edge + Ingress | Connection caps, timeouts |
| Application (L7) | HTTP flood, cache busting | Ingress + app | Rate limiting, WAF |

When you absorb volumetric attacks at the cloud edge, the traffic reaching the Ingress is already substantially cleaned up. At this stage the Ingress can focus solely on L7 rules. Conversely, if the Ingress takes hundreds of gigabits alone without edge protection, the node's NIC bandwidth and conntrack table collapse first.

Slow attacks like slowloris need separate attention. Since they occupy connections for a long time with little bandwidth, defend against them by setting timeouts short as follows.

apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  client-header-timeout: "10"
  client-body-timeout: "10"
  keep-alive-requests: "100"
  worker-shutdown-timeout: "30s"

Handling Bot Traffic

A large share of DDoS is generated by botnets. There are several signals that distinguish bots from legitimate users.

[bot identification signals]
  - Missing or abnormal User-Agent patterns
  - Stateless repeated requests that do not maintain cookies/sessions
  - Failure to pass JavaScript challenges
  - TLS fingerprint (JA3/JA4) matching known bot tools
  - Abnormally uniform request intervals (humans have jitter)
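
As an illustration of the last signal, the uniformity of request intervals can be measured with the coefficient of variation (standard deviation divided by mean). This is a minimal sketch: the function names and the 0.1 threshold are hypothetical, and a real detector would combine this with the other signals rather than rely on it alone.

```python
import statistics
from typing import Optional, Sequence


def interval_cv(timestamps: Sequence[float]) -> Optional[float]:
    """Coefficient of variation of the gaps between consecutive requests."""
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(intervals) < 2:
        return None  # not enough data to judge
    mean = statistics.fmean(intervals)
    if mean <= 0:
        return None
    return statistics.pstdev(intervals) / mean


def looks_automated(timestamps: Sequence[float], threshold: float = 0.1) -> bool:
    """Flag request streams whose spacing is suspiciously regular."""
    cv = interval_cv(timestamps)
    return cv is not None and cv < threshold


print(looks_automated([0.0, 0.5, 1.0, 1.5, 2.0]))  # perfectly regular spacing
print(looks_automated([0.0, 0.3, 2.1, 2.4, 6.0]))  # irregular, human-like spacing
```

A sophisticated bot can add random jitter, so this check only catches the simplest tools.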

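At the Ingress layer, simple User-Agent blocking or header validation is about as far as you can go. Below is an example that uses server-snippet to reject requests with an empty User-Agent or one that matches common scripting clients. Note that recent ingress-nginx releases disable snippet annotations by default, so this only takes effect if the controller is configured with allow-snippet-annotations: "true".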

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: bot-protect
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/server-snippet: |
      if ($http_user_agent = "") {
        return 403;
      }
      if ($http_user_agent ~* "(curl|wget|python-requests|scrapy)") {
        return 403;
      }
spec:
  ingressClassName: nginx
  rules:
    - host: www.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-service
                port:
                  number: 80

However, since the User-Agent is trivially forged, its effect against sophisticated bots is limited. Advanced bot-management features like JavaScript challenges and JA3/JA4 fingerprinting are usually handled at the CDN/WAF layer (cloud edge), and it is reasonable to have the Ingress act as a first-pass filter.

Behavior Through Load Scenarios

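When deciding on configuration values, you must validate with concrete scenarios. Below is the behavior of an API that normally runs at 200 RPS, with a limit of 300 RPS and a burst allowance of 600 requests (limit-burst-multiplier: 2), under three situations. For simplicity, all traffic is treated as a single rate-limit key.

| Scenario | Incoming traffic | Passed | Rejected (429) | User experience |
| --- | --- | --- | --- | --- |
| Normal peak | 280 RPS | 280 RPS | 0 | Normal |
| Marketing spike | 550 RPS (10 sec) | All 550 RPS for about 2.4 sec while the burst allowance lasts, then about 300 RPS | About 250 RPS once the burst is used up | Some requests must retry |
| L7 flood | 40000 RPS | About 300 RPS | About 39700 RPS | Legitimate users protected |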

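The third scenario is the very reason rate limiting exists. Even when 40,000 RPS arrives, only the limit is forwarded to the backend and the rest is cut off at the Ingress with 429, so the database and application are protected. The cost of handling rejected requests is negligibly small compared with the cost of backend processing. Keep in mind that ingress-nginx counts per client IP: a flood spread across thousands of source IPs can pass up to the limit for each of them, which is why per-IP limits must be combined with edge-level defenses.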
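
The scenario numbers can be reproduced with a rough fluid approximation: a constant-rate flow on one key passes the steady limit for the whole duration plus one burst allowance. The function name `simulate` and the 10-second duration applied to every scenario are assumptions for illustration.

```python
from typing import Tuple


def simulate(incoming_rps: float, duration_s: float,
             limit_rps: float, burst: float) -> Tuple[float, float]:
    """Return (passed, rejected) request totals for a constant-rate flow on one key."""
    total = incoming_rps * duration_s
    if incoming_rps <= limit_rps:
        return total, 0.0  # under the limit: nothing is rejected
    # Above the limit: the steady allowance plus a single burst allowance passes.
    passed = min(total, limit_rps * duration_s + burst)
    return passed, total - passed


LIMIT, BURST, DURATION = 300, 600, 10
for name, rps in [("normal peak", 280), ("marketing spike", 550), ("L7 flood", 40000)]:
    passed, rejected = simulate(rps, DURATION, LIMIT, BURST)
    print(f"{name}: passed={passed:.0f} rejected={rejected:.0f}")
```

Over 10 seconds the spike passes 3,600 of 5,500 requests, and the flood passes the same 3,600 out of 400,000, which is the point: the backend load is capped regardless of the attack size.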

Always run a load test before going to production. Below are simple validation command examples.

# send requests from 200 concurrent workers for 30 seconds with hey
hey -z 30s -c 200 https://api.example.com/api/items

# aggregate the status-code distribution to check the rejection rate
hey -z 30s -c 500 https://api.example.com/auth/login \
  | grep -A20 "Status code distribution"

Relationship with the Gateway API

As of 2026, the Ingress API is frozen with no new feature additions, and the successor standard is the Gateway API. The Gateway API is designed for role separation (infrastructure provider / cluster operator / application developer), protocol extensibility, and expressive routing.

| Perspective | Ingress | Gateway API |
| --- | --- | --- |
| Standard status | Frozen | Actively evolving |
| Resources | Single Ingress | GatewayClass / Gateway / HTTPRoute, etc. |
| Role separation | Weak | Explicit (RBAC-friendly) |
| Rate limiting | Implementation annotations | Policy attachment + implementation extensions |
| Traffic splitting | Annotations | Weight-based standard fields |

That said, rate limiting is not fully part of the Gateway API standard spec and is often provided through implementation-specific policies (policy attachment) or extension filters. So during migration you must check how the implementation you use exposes rate limiting.

ingress-nginx effectively entered maintenance mode in 2025: the Kubernetes project announced its retirement in November 2025, with only best-effort maintenance through March 2026, and work is now focused on responding to security vulnerabilities. If you are designing a new cluster, it is safer in the long run to first consider Gateway API-based implementations (such as Envoy Gateway, Traefik's Gateway support, and Cilium Gateway). Below is a simple HTTPRoute example.

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-route
  namespace: production
spec:
  parentRefs:
    - name: prod-gateway
  hostnames:
    - api.example.com
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api
      backendRefs:
        - name: api-service
          port: 8080

Operations and Tuning

Establishing Observability

Rate limiting does not end with turning it on. You must continually observe how much is being rejected and whether the rejected traffic is an attack or legitimate users.

[observability metrics]
  - 429/503 response ratio (relative to total)
  - IP/path/User-Agent distribution of rejected requests
  - Changes in backend p50/p95/p99 latency
  - CPU/memory/connection count of Ingress pods
  - upstream 5xx ratio (to judge whether the limit is effective)

If you collect ingress-nginx metrics with Prometheus, you can track the 429 ratio via the status label of nginx_ingress_controller_requests. A sudden surge in the rejection rate indicates either an attack or limits that are too tight.

Phased Rollout

Applying strong rate limiting directly to production risks blocking legitimate users. We recommend the following sequence.

1. Observe mode: set very high limits and collect metrics only, with no rejections
2. Establish a baseline: analyze normal peak RPS, p99, and normal burst patterns
3. Conservative application: set the limit to 3-5 times the normal peak
4. Gradual tightening: adjust the limit while watching rejection rate and user impact
5. Endpoint differentiation: strict for login/payment, loose for static reads
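
Step 3 can be turned into a small calculation once a baseline exists. This sketch derives a conservative starting limit per endpoint from observed peak RPS; the function name `plan_limits`, the sample paths, and the sample peaks are hypothetical.

```python
import math
from typing import Dict, Optional


def plan_limits(observed_peak_rps: Dict[str, float],
                multipliers: Optional[Dict[str, float]] = None,
                default_multiplier: float = 3.0) -> Dict[str, int]:
    """Starting limit per endpoint: observed normal peak times a safety multiplier."""
    multipliers = multipliers or {}
    return {
        path: math.ceil(peak * multipliers.get(path, default_multiplier))
        for path, peak in observed_peak_rps.items()
    }


baseline = {"/auth/login": 4, "/api/items": 200}  # peaks measured in observe mode
print(plan_limits(baseline, {"/api/items": 5.0}))
```

These values are only the starting point for step 4; the login limit in particular is expected to be tightened as rejection metrics come in.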

Common Pitfalls and Troubleshooting

Here are pitfalls you will encounter repeatedly in practice.

[Pitfall 1] X-Forwarded-For not configured
  → all requests counted as a single LB IP, so everyone is blocked at once
  → always set proxy-real-ip-cidr and use-forwarded-headers together

[Pitfall 2] Mistaking per-pod counting for global
  → limit is 100 but with 3 pods up to 300 is actually allowed
  → divide by replica count, or introduce a central counter (Redis)

[Pitfall 3] Health checks/probes hit the rate limit
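  → load balancer health checks or synthetic monitoring routed through the Ingress receive 429 and backends are marked unhealthy (kubelet probes go straight to the pod and never pass through the Ingress)
  → add health-check and monitoring source ranges to the whitelist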

[Pitfall 4] Leaving the reject code as 503
  → clients mistake it for a server failure and retry endlessly → vicious cycle
  → induce backoff with 429 + Retry-After

[Pitfall 5] Trying to block volumetric attacks at L7
  → packets reach the node and saturate NIC/conntrack first
  → absorb volumetric attacks at the cloud edge

[Pitfall 6] Setting the limit too low
  → 429 occurs even at normal peak, causing user churn
  → measure the baseline first in observe mode

Debugging Commands

Use the following commands for a quick check when you suspect a problem.

# check limiting-related messages in the Ingress controller logs
kubectl logs -n ingress-nginx deploy/ingress-nginx-controller | grep -i limit

# inspect the actually applied nginx config (verify the generated limit_req zone)
kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- \
  cat /etc/nginx/nginx.conf | grep -A3 limit_req_zone

# quick check of the status-code distribution seen from a specific IP
for i in $(seq 1 20); do
  curl -s -o /dev/null -w "%{http_code}\n" https://api.example.com/auth/login
done | sort | uniq -c

Operational Checklist

The final pre-deployment review list.

[ ] Did you set differentiated limits per endpoint (strict for login, loose for reads)?
[ ] Did you accurately specify the X-Forwarded-For trust range (proxy-real-ip-cidr)?
[ ] Did you set the reject response to 429 + Retry-After?
[ ] Did you add health checks/monitoring/internal ranges to the whitelist?
[ ] Did you intentionally choose per-pod vs Redis counting?
[ ] Did you account for the effect of replica count changes (HPA) on the global limit?
[ ] Did you place L3/L4 volumetric defense at the cloud edge?
[ ] Did you set short timeouts to guard against slowloris?
[ ] Are you observing the 429/503 ratio and backend latency on a dashboard?
[ ] Did you pre-validate limit behavior with a load test?
[ ] Did you review the Gateway API migration path?

Conclusion

The core of rate limiting and DDoS mitigation is blocking the right attacks at the right location, layer by layer. Volumetric attacks must be stopped at the cloud edge, application-layer abuse at the Ingress, and business-rule violations at the application. A design that piles every responsibility onto a single layer will eventually collapse under load.

The counting problem in distributed environments is especially easy to miss because it fails silently. Mistaking per-pod memory counting for a global limit lets traffic leak through at several times the configured value. If accuracy matters, introduce a central counter, but be sure to clearly define the fail-open/fail-closed policy for when Redis fails.

Finally, the direction of 2026 is clear. The Ingress API is frozen, ingress-nginx has entered maintenance mode, and the Gateway API is establishing itself as the successor standard. Refine your current ingress-nginx configuration well, but for new designs, we recommend actively evaluating Gateway API-based implementations and their rate-limiting models.
