Ingress Controller High Availability and Scaling
Author: Youngju Kim (@fjvbn20031)
- Introduction
- HA Topology: DaemonSet vs Deployment + HPA
- HA per Exposure Method
- Graceful Drain and Zero-Downtime Deploys
- Minimizing Reload Impact
- Leader Election
- Multi-AZ Spread
- Large-Scale Ingress Object Performance
- Resource Sizing and Load Testing
- Failure Scenarios and Responses
- HA in the Gateway API Era
- Conclusion
- References
Introduction
The Ingress controller is the single point of entry to your cluster. If the entry point dies, every service behind it — no matter how healthy — looks like a total outage from the outside. That is why controller high availability (HA) is not a "nice to have" but a precondition of operations.
HA is not merely "we bumped replicas to 2." Connections in flight must not be cut when the controller reloads, traffic must drain gracefully when a Pod terminates, and traffic must keep flowing even if an entire node or AZ disappears. On top of that, it must scale automatically as traffic grows, and the controller must hold up even as ingress objects swell into the thousands.
Using ingress-nginx as the reference, this article covers HA topology choices, availability per exposure method, graceful drain and zero-downtime deploys, minimizing reload impact, leader election, multi-AZ spread, performance in large-scale ingress environments, resource sizing, load-testing methods, and failure scenarios — all from an operations perspective. As of 2026 the Ingress API is frozen and Gateway API is the successor standard, so we close with how HA design carries over to Gateway API.
HA Topology: DaemonSet vs Deployment + HPA
How you place the controller is the first fork in the road. There are two mainstream patterns.
| Aspect | Deployment + HPA | DaemonSet |
|---|---|---|
| Placement | N replicas on arbitrary nodes | One Pod per node (or per labeled node) |
| Scaling | HPA adjusts the replica count automatically | Tied to node count |
| Typical exposure method | Cloud LoadBalancer Service | hostNetwork/hostPort |
| Suited for | Most cloud environments | Bare metal, edge, clusters where each node is a unit of ingress capacity |
| Resource efficiency | Elastic to load | Fixed cost per node |
The default choice in cloud environments is Deployment + HPA. A cloud LoadBalancer receives traffic and distributes it to nodes, while controller replicas scale up and down with load. On bare metal or edge, where the node itself is the entry point, DaemonSet + hostNetwork is common — the controller directly occupies 80/443 on each node, removing one hop.
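For the bare-metal or edge pattern, a minimal sketch of the corresponding Helm values might look like the following. It assumes the ingress-nginx Helm chart. The node label is a hypothetical example, and disabling the Service only makes sense when traffic reaches the node IPs directly.

```yaml
controller:
  kind: DaemonSet
  # bind nginx directly to ports 80/443 in the node's network namespace
  hostNetwork: true
  # keep cluster DNS resolution working while on the host network
  dnsPolicy: ClusterFirstWithHostNet
  # hypothetical label: run only on nodes designated as ingress entry points
  nodeSelector:
    node-role.example.com/ingress: "true"
  service:
    # assumption: no LoadBalancer/NodePort Service sits in front of the nodes
    enabled: false
```

Because each Pod owns the host ports, only one controller can run per node, which is why this pattern pairs naturally with a DaemonSet.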
A baseline Deployment-based HA configuration.
```yaml
controller:
  kind: Deployment
  replicaCount: 3
  minAvailable: 2
  podDisruptionBudget:
    enabled: true
    minAvailable: 2
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app.kubernetes.io/name: ingress-nginx
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: ingress-nginx
```
Three things matter here. (1) A PodDisruptionBudget limits how many Pods voluntary disruptions such as node drains can take down at once, guaranteeing minimum availability during node maintenance. (2) podAntiAffinity spreads replicas across different nodes. (3) topologySpreadConstraints distributes them evenly across AZs as well.
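The baseline above fixes the replica count. The HPA half of the pattern can be sketched as follows, assuming the ingress-nginx Helm chart's autoscaling values and a metrics source such as metrics-server in the cluster. The replica bounds and the 60% target are illustrative numbers, not recommendations.

```yaml
controller:
  autoscaling:
    enabled: true
    # never scale below the replica count the PDB and zone spread were designed for
    minReplicas: 3
    maxReplicas: 12
    # scale out when average CPU usage exceeds 60% of the CPU request
    targetCPUUtilizationPercentage: 60
```

CPU utilization is measured relative to the container's CPU request, so this target only behaves sensibly once requests are set from real measurements.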
HA per Exposure Method
HA characteristics differ depending on how the controller receives external traffic.
Method A: cloud LoadBalancer Service
[internet] → [cloud LB] → [NodePort] → [controller Pod]
- LB auto-excludes dead nodes via health checks
- externalTrafficPolicy: Local preserves client IP + removes an extra hop
Method B: hostNetwork (DaemonSet)
[internet] → [node IP:80/443] → [controller (host network)]
- removes one hop, lowest latency
- one controller per node, watch for port conflicts
Method C: NodePort + external LB (self-managed)
[internet] → [external LB] → [NodeIP:NodePort] → [controller]
- combine with MetalLB etc. on bare metal
externalTrafficPolicy in Method A is an important choice. With Cluster (the default), every node accepts traffic and forwards it to any controller Pod in the cluster, so distribution is even but the client IP is masked by SNAT and an extra hop may be added. With Local, a node forwards only to controller Pods running on that node, which preserves the client IP and removes the hop. Nodes without a controller Pod fail the LB health check and receive no traffic, so distribution depends on how the Pods are spread across nodes. From an HA standpoint, Local makes it important to keep the per-node distribution of controller Pods balanced.
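As a minimal sketch, the policy is set on the controller's Service. The snippet assumes the ingress-nginx Helm chart and a cloud provider that health-checks nodes for LoadBalancer Services.

```yaml
controller:
  service:
    type: LoadBalancer
    # route only to controller Pods on the receiving node; preserves the client source IP
    externalTrafficPolicy: Local
```

With this setting, spreading Pods across hostnames matters more than usual, because a node running two controller Pods still receives only one node's share of traffic from the LB.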
Graceful Drain and Zero-Downtime Deploys
When a controller Pod terminates, if in-flight requests are cut, users see 502s. The key to zero downtime is putting enough grace between "removal from endpoints" and "actual termination."
The termination sequence should flow like this.
1. When the Pod is marked for deletion, Kubernetes starts removing it from Endpoints; in parallel, the kubelet runs the preStop hook (SIGTERM is sent only after the hook returns)
2. preStop hook: sleep to give the LB/kube-proxy time to propagate the removal
3. nginx begins graceful shutdown (stop accepting new connections, finish existing ones)
4. In-flight requests complete within terminationGracePeriodSeconds
5. Process exits
A configuration example in ingress-nginx.
```yaml
controller:
  terminationGracePeriodSeconds: 300
  lifecycle:
    preStop:
      exec:
        command:
          - /bin/sh
          - -c
          - sleep 30; /wait-shutdown
```
The preStop sleep is necessary because endpoint removal takes time to propagate to every LB and kube-proxy. Without it, the Pod starts shutting down while some paths are still sending it traffic, and those connections are cut. terminationGracePeriodSeconds must be generous enough for the longest legitimate request (for example a large upload or a stream) to finish. The grace period starts counting when termination begins and includes the preStop hook, so with the values above the 30-second sleep leaves 270 seconds for nginx to drain.
The deploy strategy uses RollingUpdate with maxSurge to keep enough replicas alive at all times.
```yaml
controller:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
```
maxUnavailable: 0 prevents bringing down existing Pods before new ones are Ready, avoiding a capacity dip during deploys.
Minimizing Reload Impact
A quirk of ingress-nginx is that it reloads the nginx configuration whenever an Ingress resource changes. Reloads are graceful by default (old workers finish their connections while new workers take new connections), but frequent reloads can cause memory-usage fluctuations, drops of long-lived connections (WebSocket, gRPC streams), and momentary latency spikes.
Ways to reduce reload shock.
```yaml
# ConfigMap
data:
  # dynamic endpoint changes are applied via Lua without a reload (enabled by default)
  # increase worker shutdown grace to better preserve in-flight requests
  worker-shutdown-timeout: "240s"
  # keepalive tuning to ease the load right after a reload
  upstream-keepalive-connections: "320"
  upstream-keepalive-timeout: "60"
```
The key insight is that ingress-nginx handles endpoint (Pod IP) changes without a reload. Common events like Pods scaling or restarting are absorbed by Lua-based dynamic configuration and do not trigger a reload. What triggers a reload is mainly Ingress spec, annotation, or certificate changes. So to reduce reload frequency, it is effective to audit frequent Ingress changes (for example, automation that toggles an annotation on every deploy).
In operation, monitor reload frequency via metrics (for example, how often nginx_ingress_controller_config_last_reload_successful_timestamp_seconds changes), and if you carry a lot of long-lived traffic like WebSocket/gRPC, set worker-shutdown-timeout generously to minimize drops on reload.
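One way to turn this monitoring into alerts is sketched below. It assumes the Prometheus Operator is installed and controller metrics are enabled. The resource name, namespace, and thresholds are hypothetical, and the metric names should be checked against the /metrics output of your controller version.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ingress-nginx-reload   # hypothetical name
  namespace: ingress-nginx     # assumed controller namespace
spec:
  groups:
    - name: ingress-nginx.reload
      rules:
        - alert: NginxReloadFailed
          # the gauge is 0 when the most recent reload attempt failed
          expr: nginx_ingress_controller_config_last_reload_successful == 0
          for: 5m
          labels:
            severity: critical
        - alert: NginxReloadTooFrequent
          # more than 20 successful reloads in 10 minutes (illustrative threshold)
          expr: changes(nginx_ingress_controller_config_last_reload_successful_timestamp_seconds[10m]) > 20
          for: 10m
          labels:
            severity: warning
```

The second rule is the one that catches automation rewriting Ingress annotations on every deploy.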
Leader Election
All replicas of ingress-nginx handle traffic, but some work — particularly updating the Ingress status.loadBalancer field — must be done by a single instance. Leader election handles this.
The leader is elected via a Lease object, and only the leader updates the status of Ingress objects. The data plane (actual traffic handling) is served equally by all replicas, so even if the leader dies, traffic is not interrupted — a new leader is simply elected. What operators should know: for leader election to work, RBAC must grant permissions on the Lease resource, and if election-related warning logs repeat, check permissions or the API server connection.
```
[replica A] ──┐
[replica B] ──┼─→ all handle the data plane (traffic)
[replica C] ──┘
      │
      └─ via Lease, only one is leader → responsible for updating Ingress status
         if the leader dies → automatic re-election, no traffic impact
```
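The Lease permission mentioned above has roughly the following shape. This is a sketch for checking what exists in your cluster, not the chart's exact manifest: the Helm chart normally creates equivalent RBAC itself, and the names, namespace, and ServiceAccount here are assumptions.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ingress-nginx-leader-election   # hypothetical name
  namespace: ingress-nginx              # assumed controller namespace
rules:
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    # create the Lease once, then read and renew it
    verbs: ["get", "create", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ingress-nginx-leader-election
  namespace: ingress-nginx
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ingress-nginx-leader-election
subjects:
  - kind: ServiceAccount
    name: ingress-nginx   # assumed ServiceAccount of the controller Pods
    namespace: ingress-nginx
```

If election warnings repeat, `kubectl auth can-i update leases -n ingress-nginx --as=system:serviceaccount:ingress-nginx:ingress-nginx` is a quick way to confirm whether the permission is actually in place (adjust the names to your installation).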
Multi-AZ Spread
If controllers are concentrated in a single AZ, an AZ failure takes everything down. The topologySpreadConstraints seen earlier are the key tool for AZ spread.
```yaml
controller:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app.kubernetes.io/name: ingress-nginx
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app.kubernetes.io/name: ingress-nginx
```
A practical combination is strict spread at the zone level with DoNotSchedule, and best-effort spread at the hostname level with ScheduleAnyway so that scheduling is never blocked. Cloud LBs are usually multi-AZ aware and send traffic only to live AZs, so if the controller is evenly present in every AZ, service continues on the rest even when one AZ dies. The principle for replica count is to size so that "even if one whole AZ drops out, the remaining AZs can carry peak traffic" (N+1 or more).
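As a worked example of the N+1 rule, assume a peak of P = 12,000 requests per second, a measured capacity of c = 2,000 requests per second per replica, and Z = 3 zones with r replicas each. All three numbers are illustrative assumptions, not measurements.

```latex
% r = replicas per zone, chosen so the surviving Z-1 zones still carry the peak P
(Z - 1)\, r\, c \ge P
\quad\Longrightarrow\quad
r \ge \frac{P}{c\,(Z - 1)} = \frac{12000}{2000 \cdot 2} = 3,
\qquad
\text{total replicas} = Z \cdot r = 9
```

Without the AZ-loss requirement, 6 replicas would carry the same peak, so the extra 3 are the price of surviving a zone failure at full load.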
Large-Scale Ingress Object Performance
The controller behaves differently with dozens of ingress objects than with thousands. As objects grow, the following become problems.
- The generated nginx config file becomes huge, lengthening reload time
- Controller memory/CPU usage rises
- API server watch load increases
- The cost of a single reload grows, so change frequency directly becomes a burden
Mitigation strategies follow. First, shard the controllers by namespace or domain group. By running multiple IngressClasses and controller instances, each watching only a subset of Ingresses, config size and reload cost are distributed.
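```yaml
# shard A controller
controller:
  ingressClassResource:
    name: nginx-shard-a
    controllerValue: "k8s.io/ingress-nginx-shard-a"
  # namespace scoping goes through controller.scope in the Helm chart;
  # a label selector lets one shard cover several namespaces (e.g. team-a and team-b)
  scope:
    namespaceSelector: "ingress-shard=a"   # hypothetical label set on those namespaces
```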
Second, reduce unnecessary annotation changes to lower reload frequency. Third, monitor reload duration via metrics and increase sharding when it crosses a threshold. In large-scale environments, several small shards are often operationally more stable than a single huge controller.
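On the workload side, an Ingress opts into a shard through spec.ingressClassName. The sketch below reuses the nginx-shard-a class from the shard example; the Ingress name, namespace, host, and backend Service are hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: team-a-app            # hypothetical
  namespace: team-a
spec:
  # only the controller that owns this IngressClass renders this object into its config
  ingressClassName: nginx-shard-a
  rules:
    - host: app.team-a.example.com   # hypothetical host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app            # hypothetical backend Service
                port:
                  number: 80
```

A change to this Ingress then reloads only shard A, leaving the other shards' nginx processes untouched.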
Resource Sizing and Load Testing
Without proper requests/limits, HA crumbles. If the controller is throttled by hitting its CPU limit, you get a vicious cycle of health-check failure, restart, and traffic oscillation.
Sizing starts with measurement. Use load testing to derive "CPU cost per request" and "memory per concurrent connection," then add headroom on top of peak traffic.
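```shell
# simple load generation (vegeta): 2000 requests/s for 5 minutes
echo "GET https://app.example.com/" | \
  vegeta attack -rate=2000 -duration=300s | \
  vegeta report
# or 500 virtual users for 5 minutes with k6
# (for a stepwise ramp, define stages in loadtest.js)
k6 run --vus 500 --duration 5m loadtest.js
```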
Metrics to watch during load testing: controller CPU/memory, active connections, p99 latency, 5xx ratio, and recovery time when you deliberately kill a Pod (chaos). In particular, trigger reloads during load (for example, repeatedly modify an Ingress) and measure whether long-lived connections drop and how large the latency spikes get; only then do you understand the real production behavior.
Sizing guidelines vary by environment, but avoid setting requests so low that the controller cannot get resources when the node is packed, and leave enough room in limits to absorb the momentary memory increase during a reload.
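Expressed as Helm values, that guidance might look like the sketch below. The numbers are placeholders to be replaced with values from your own load test, and leaving out the CPU limit is one possible trade-off to avoid throttling, not a rule.

```yaml
controller:
  resources:
    requests:
      # placeholders: derive these from measured CPU per request and memory per connection
      cpu: 500m
      memory: 512Mi
    limits:
      # headroom for old and new nginx workers coexisting during a reload
      memory: 1Gi
      # no CPU limit in this sketch, so the controller is not throttled under bursts
```

If the HPA targets CPU utilization, the CPU request here is also the baseline it scales against.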
Failure Scenarios and Responses
HA design is about drawing out "what can die" in advance.
| Failure | Impact | How HA design prevents it |
|---|---|---|
| One controller Pod crashes | Partial capacity loss | N replicas keep remaining capacity; the Deployment replaces the Pod |
| One node down | Loss of that node's Pods | podAntiAffinity spreads across nodes |
| One AZ down | Loss of that entire AZ | topologySpread + N+1 sizing |
| Traffic surge | Saturation, rising latency | HPA auto-scale + headroom |
| Reload storm | Dropped long-lived connections, latency spikes | worker-shutdown-timeout, change-frequency control |
| Ingress object explosion | Reload slowdown | Sharding |
| Certificate expiry | Total TLS failure | cert-manager auto-renewal + expiry alerts |
For each scenario, define in advance how it is detected (metrics/alerts), how it is mitigated automatically (HPA/reschedule), and when manual intervention is needed, so that a real incident can be handled without panic. AZ failure and traffic surges often come together (when one AZ drops, load concentrates on the rest), so verify that N+1 capacity and HPA work in concert.
HA in the Gateway API Era
As of 2026 the Ingress API is frozen and Gateway API is the successor standard. Most HA principles carry over, but a few things improve. Gateway API implementations (Envoy-based Contour/Istio, Cilium, NGINX Gateway Fabric, and others) separate the control plane and data plane more clearly, so config changes often do not directly translate into a data-plane reload. Envoy-based implementations push configuration dynamically via xDS, reducing the ingress-nginx-style "file reload" shock.
The three-tier model of Gateway API (GatewayClass/Gateway/HTTPRoute) also has clear separation of responsibilities, letting you naturally split the problem of too many Routes on one Gateway across multiple Gateways. This expresses the "sharding" seen earlier more elegantly at the API level. If you build PDBs, antiAffinity, topologySpread, and graceful drain properly on ingress-nginx today, these operational patterns remain almost entirely valid when you migrate to Gateway API.
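A minimal sketch of that split: one Gateway per group of teams, with each HTTPRoute attaching to its own Gateway through parentRefs. The GatewayClass name, namespaces, labels, host, and backend are all hypothetical; the field names follow the Gateway API v1 resources.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: gateway-shard-a       # hypothetical
  namespace: infra            # hypothetical
spec:
  gatewayClassName: example-class   # hypothetical, provided by your implementation
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          # only namespaces carrying this label may attach Routes to this Gateway
          from: Selector
          selector:
            matchLabels:
              gateway-shard: a
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: app
  namespace: team-a           # must carry the label gateway-shard=a
spec:
  parentRefs:
    - name: gateway-shard-a
      namespace: infra
  hostnames:
    - app.team-a.example.com
  rules:
    - backendRefs:
        - name: app           # hypothetical backend Service
          port: 80
```

A second Gateway with a different selector gives the same isolation as a second ingress-nginx shard, with the attachment rules visible in the API objects themselves.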
Conclusion
The heart of Ingress controller HA is "not turning a single entry point into a single point of failure." Protect capacity with replica multiplicity and PDBs, spread across nodes and AZs with antiAffinity and topologySpread, never cut traffic during changes with graceful drain and zero-downtime deploys, adapt to load with HPA, and handle scale with sharding.
None of this is completed in one pass. Validate your assumptions with load testing and deliberate fault injection (chaos), observe real behavior via metrics, and refine gradually — that is the essence of HA operations. Only when the entry point does not waver can every service behind it run with confidence.
References
- Kubernetes Ingress concepts: https://kubernetes.io/docs/concepts/services-networking/ingress/
- ingress-nginx deploy/HA guide: https://kubernetes.github.io/ingress-nginx/deploy/
- ingress-nginx ConfigMap options: https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/
- Kubernetes PodDisruptionBudget: https://kubernetes.io/docs/concepts/workloads/pods/disruptions/
- Kubernetes topologySpreadConstraints: https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/
- Kubernetes HorizontalPodAutoscaler: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
- Gateway API: https://gateway-api.sigs.k8s.io/
- Contour (Envoy-based): https://projectcontour.io/docs/
- Traefik high availability documentation: https://doc.traefik.io/traefik/
- cert-manager: https://cert-manager.io/docs/