
Ingress Troubleshooting Playbook: From 502/504/404 to TLS


Introduction

Ingress is the gateway through which traffic enters a cluster, so when something breaks, Ingress is also where users notice it first. The root causes behind 502, 504, 404, and TLS warnings are scattered across layers (controller config, service selectors, backend health, certificates, DNS), and approaching them by guesswork only burns time.

This article is a playbook for eliminating guesswork. Starting from a symptom and following a diagnostic tree, you can narrow down the offending layer with just a few commands. We use ingress-nginx as the reference implementation. The diagnostic procedure carries over to other controllers such as Traefik, HAProxy, and Contour, although their annotations, configuration keys, and log formats differ.

As of 2026 the Ingress API is frozen and Gateway API is the successor standard, but most clusters in operation still run on top of Ingress. This playbook will therefore stay useful for some time. We close with a brief note on how to diagnose the same symptoms under Gateway API.

General Debugging Procedure

Before diving into per-symptom trees, there is an inspection order common to almost every Ingress problem. The key is to follow the traffic path, narrowing from the outside in.

[client] → [DNS] (1) → [LoadBalancer/node] (2) → [Ingress Controller] (3, 4) → [Service] (5) → [Endpoints/Pod] (6)

1. Does DNS resolve to the correct LB IP (dig/nslookup)?

2. Can you reach the LB/node port over TCP (curl -v, telnet)?

3. Has the controller recognized the Ingress resource (kubectl describe ingress)?

4. Is the controller routing to the intended backend (controller logs)?

5. Does the Service select the correct Pods (kubectl get endpoints)?

6. Does the Pod actually respond (curl directly via kubectl exec)?
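Steps 1 and 2 need nothing beyond the Python standard library. The sketch below is one way to script them. `resolves_to` and `tcp_reachable` are hypothetical helper names, and the hostname and IP in the usage note are placeholders.

```python
import socket


def resolves_to(host: str, expected_ip: str) -> bool:
    """Step 1: does `host` resolve to the expected load balancer IP?"""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False  # the name does not resolve at all
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return any(info[4][0] == expected_ip for info in infos)


def tcp_reachable(ip: str, port: int, timeout: float = 3.0) -> bool:
    """Step 2: can a TCP connection to the LB/node port be opened?"""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, unreachable, or timed out
```

For example, call `resolves_to("app.example.com", "203.0.113.10")` and then `tcp_reachable("203.0.113.10", 443)`. The first call that returns False tells you which layer to inspect next.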

A set of core commands.

Ingress status and events

kubectl describe ingress my-ingress -n my-namespace

Tail controller logs

kubectl logs -n ingress-nginx deploy/ingress-nginx-controller -f

Check the endpoints the service points to (one of the most important commands)

kubectl get endpoints my-service -n my-namespace

Request the backend directly from inside the controller Pod

kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- \
  curl -s -o /dev/null -w "%{http_code}\n" http://my-service.my-namespace.svc:80/

Test directly from outside by setting the Host header

curl -v -H "Host: app.example.com" http://<LB_IP>/

If `kubectl get endpoints` shows no addresses, the controller has no backend to send traffic to. This is the most common root cause of 502/503 errors, so starting here saves significant time.
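The last command in the list, a request to the LB IP with an explicit Host header, can also be scripted when you want to probe several hostnames or paths in a loop. Below is a minimal sketch with Python's `http.client`. It assumes plain HTTP, and `status_for_host` is a hypothetical helper.

```python
import http.client


def status_for_host(lb_ip: str, host: str, path: str = "/", port: int = 80,
                    timeout: float = 5.0) -> int:
    """Send GET `path` to the LB IP with an explicit Host header; return the status.

    Roughly equivalent to: curl -H "Host: <host>" http://<lb_ip><path>
    """
    conn = http.client.HTTPConnection(lb_ip, port, timeout=timeout)
    try:
        # Passing "Host" ourselves stops http.client from adding its own Host header.
        conn.request("GET", path, headers={"Host": host})
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        return resp.status
    finally:
        conn.close()
```

Looping over your hostnames with the same LB IP quickly shows whether only one host rule is returning 404.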

Symptom 1: 404 Not Found

A 404 signals "the controller is alive but does not know where to send this request."

404 received
1) Is ingressClassName set and matching?
   ├─ missing/typo → controller ignores this Ingress. Add spec.ingressClassName
   └─ matches → next
2) Do host and path match the request?
   ├─ Host header mismatch → check wildcard/exact host
   ├─ pathType issue → check Prefix vs Exact
   └─ matches → next
3) Does the backend in describe ingress point to the right service/port?
   ├─ service name/port typo → fix it
   └─ correct → it is the backend app's own 404 (routing is fine)
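Step 2 depends on how host rules match. The sketch below mirrors the rule in the Kubernetes Ingress documentation: a host matches either exactly or through a wildcard such as `*.example.com`, which covers exactly one DNS label. Individual controllers may differ in edge cases, and the function name is hypothetical.

```python
def host_matches(rule_host: str, request_host: str) -> bool:
    """Does an Ingress rule host accept the request's Host header?"""
    if not rule_host:
        return True  # a rule without a host applies to every host
    # Drop a ":port" suffix (bracketed IPv6 literals are not handled) and
    # lowercase, since DNS names are case-insensitive.
    req = request_host.split(":", 1)[0].lower()
    if rule_host.startswith("*."):
        suffix = rule_host[1:]  # "*.example.com" -> ".example.com"
        if not req.endswith(suffix):
            return False
        label = req[: -len(suffix)]
        # The wildcard stands for exactly one non-empty DNS label.
        return label != "" and "." not in label
    return req == rule_host
```

Under this rule `*.example.com` accepts `foo.example.com` but rejects both the bare `example.com` and the two-level `bar.foo.example.com`. That is a frequent surprise behind "the wildcard should have matched" 404s.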

Inspection commands.

Which class this Ingress belongs to, and what its rules look like

kubectl get ingress my-ingress -o yaml | grep -A2 ingressClassName

kubectl describe ingress my-ingress

List installed IngressClasses

kubectl get ingressclass

Verify matching by specifying the Host header exactly

curl -v -H "Host: app.example.com" http://<LB_IP>/api/users

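The most common cause is a missing ingressClassName. The controller only handles Ingresses of its own class. If the class is absent or different, the controller ignores that Ingress entirely, and requests fall through to the default backend, which returns 404. (If an IngressClass is marked as the cluster default, Ingresses created without a class are assigned it automatically.) pathType also trips people up often. Prefix matches element by element on `/`-separated path segments, so `/foo` matches `/foo/bar` but not `/foobar`. Exact requires the whole path to match, so `/foo` does not match `/foo/`. Either type can match or miss requests contrary to your intent.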
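To check a path rule against sample requests before deploying, the two standard pathTypes can be modelled in a few lines. This is a sketch of the semantics described in the Kubernetes Ingress documentation, not ingress-nginx's actual implementation, and the function names are hypothetical.

```python
def _elements(path: str) -> list[str]:
    """Split a path into '/'-separated elements, ignoring leading/trailing slashes."""
    stripped = path.strip("/")
    return stripped.split("/") if stripped else []


def path_matches(path_type: str, rule_path: str, request_path: str) -> bool:
    if path_type == "Exact":
        return request_path == rule_path  # case-sensitive, trailing slash matters
    if path_type == "Prefix":
        rule = _elements(rule_path)
        # Element-wise prefix: "/foo" covers "/foo" and "/foo/bar", not "/foobar".
        return _elements(request_path)[: len(rule)] == rule
    # ImplementationSpecific is left to the controller, so it is not modelled here.
    raise ValueError(f"pathType not modelled: {path_type}")
```

Prefix `/` matches every path. A trailing slash on a Prefix rule (`/foo/`) still matches `/foo`.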

Symptom 2: 502 Bad Gateway

A 502 signals "the controller attempted to connect to the backend but did not receive a valid response."

502 received
1) Do endpoints exist?
   ├─ empty → Pod not ready / selector mismatch (effectively closer to 503)
   └─ exist → next
2) Are the backend port/protocol correct?
   ├─ targetPort mismatch → check Service.targetPort
   ├─ HTTPS backend but HTTP set → backend-protocol annotation HTTPS
   └─ correct → next
3) Does the backend respond properly? (curl directly from the controller)
   ├─ connection refused/reset → app crash / different listening port
   ├─ timeout → move to the 504 tree
   └─ 200 → suspect header/body size, upstream keepalive
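Step 3 branches on how the direct curl behaves, and curl's exit code already encodes most of those branches. The sketch below maps the exit codes listed in the curl man page to the tree. It assumes curl is available in the controller image and that `kubectl exec` passes the remote command's exit code through. The helper names are hypothetical.

```python
import subprocess

# curl exit codes (see "EXIT CODES" in the curl man page) mapped to this tree.
CURL_EXIT_HINTS = {
    0: "backend answered: check the HTTP status, header size, and upstream keepalive",
    6: "Service DNS name did not resolve: check the name and namespace",
    7: "connection refused: app crashed or listens on a port other than targetPort",
    28: "timed out: continue with the 504 tree",
    35: "TLS handshake failed: e.g. https:// used against a plain-HTTP port",
    52: "empty reply: backend closed without answering (possibly a protocol mismatch)",
    56: "connection reset while reading: app crash or protocol mismatch",
}


def probe_from_controller(url: str, max_time: int = 10) -> int:
    """Run curl inside the controller Pod and return curl's exit code."""
    cmd = [
        "kubectl", "exec", "-n", "ingress-nginx", "deploy/ingress-nginx-controller",
        "--", "curl", "-s", "-o", "/dev/null", "--max-time", str(max_time), url,
    ]
    return subprocess.run(cmd, check=False).returncode


def next_step(exit_code: int) -> str:
    return CURL_EXIT_HINTS.get(exit_code, f"unmapped curl exit code {exit_code}: see man curl")
```

For example, call `next_step(probe_from_controller("http://my-service.my-namespace.svc:8080/healthz"))`. Without `-f`, curl exits 0 even for an HTTP error status, so code 0 only means that something answered.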

Inspection commands.

Call the backend directly (verify port/protocol)

kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- \
  curl -sv http://my-service.my-namespace.svc:8080/healthz

Find 502 lines and upstream info in controller logs

kubectl logs -n ingress-nginx deploy/ingress-nginx-controller | grep " 502 "
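When the grep returns many lines, grouping them by upstream shows whether a single Service or Pod IP is responsible. This sketch assumes ingress-nginx's default `log-format-upstream`. If your ConfigMap customizes the format, the pattern must change too. The pattern also deliberately skips retried requests, whose upstream fields become comma-separated lists.

```python
import re
from collections import Counter

# Simplified pattern for the default ingress-nginx access log format (an assumption;
# check log-format-upstream in your controller ConfigMap). Retries make the upstream
# fields comma-separated lists with spaces, which this pattern does not match.
LOG_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)" (?P<request_length>\d+) '
    r'(?P<request_time>[\d.]+) \[(?P<upstream_name>[^\]]*)\] '
    r'\[(?P<alt_upstream>[^\]]*)\] (?P<upstream_addr>\S+) '
    r'(?P<upstream_length>\S+) (?P<upstream_time>\S+) (?P<upstream_status>\S+) '
    r'(?P<req_id>\S+)'
)


def count_502_by_upstream(lines):
    """Count 502 responses per (upstream name, upstream address) pair."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and m["status"] == "502":
            counts[(m["upstream_name"], m["upstream_addr"])] += 1
    return counts
```

Piping `kubectl logs` into a script that calls `count_502_by_upstream(sys.stdin)` gives a per-upstream tally. If every 502 points at the same Pod IP, that Pod is the suspect.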

The classic 502 cause is a port/protocol mismatch. Either the Service's targetPort differs from the port the container actually listens on, or the backend speaks HTTPS while the controller connects over plain HTTP. The latter is fixed with the backend-protocol annotation.

metadata:
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"

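Excessively large response headers (for example, a huge Set-Cookie) or a mismatched upstream keepalive setting can also produce a 502. With oversized headers, nginx logs "upstream sent too big header", and you may need to raise proxy-buffer-size. With keepalive problems, check that the backend does not close idle connections sooner than the controller expects to reuse them.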
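To judge whether oversized headers are plausible, you can estimate the size of a response's header block and compare it with the buffer. This rough sketch assumes the 4k default that ingress-nginx documents for proxy-buffer-size (verify it for your version). It ignores nginx's exact internal accounting, and the helper names are hypothetical.

```python
def response_header_bytes(status_line: str, headers: list[tuple[str, str]]) -> int:
    """Approximate size of a raw HTTP/1.1 response header block in bytes.

    Headers are (name, value) pairs so repeated fields such as Set-Cookie count
    once per occurrence.
    """
    size = len(status_line.encode()) + 2  # status line + CRLF
    for name, value in headers:
        size += len(f"{name}: {value}\r\n".encode())
    return size + 2  # the empty line (CRLF) that ends the header block


def fits_proxy_buffer(status_line: str, headers: list[tuple[str, str]],
                      buffer_bytes: int = 4 * 1024) -> bool:
    # nginx reads the response header into a buffer of proxy-buffer-size; a header
    # that does not fit is rejected ("upstream sent too big header") with a 502.
    return response_header_bytes(status_line, headers) <= buffer_bytes
```

A single 5,000-byte cookie already overflows a 4k buffer, while a 16k buffer absorbs it. Trimming the cookie is often the better fix.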

Symptom 3: 503 Service Unavailable (no endpoints)

The classic cause of 503 is "no backend to send to."

503 received
kubectl get endpoints my-service
   ├─ <none> → 1) Is the Pod Ready? 2) Does the Service selector match Pod labels?
   │           3) Is readinessProbe failing, dropping it from traffic?
   └─ has addrs → transient controller-to-service sync lag, or mid-rollout maxUnavailable

Inspection commands.

Endpoints (most important)

kubectl get endpoints my-service -n my-namespace

Pod status and readiness

kubectl get pods -n my-namespace -l app=my-app

kubectl describe pod <pod> -n my-namespace | grep -A5 Conditions

Compare service selector with Pod labels

kubectl get service my-service -o jsonpath='{.spec.selector}'

kubectl get pods --show-labels -n my-namespace

If endpoints shows no addresses, the cause is one of three: (1) the Pod is not Ready yet, (2) the Service selector does not match Pod labels, or (3) the readinessProbe is failing and the Pod has been excluded from endpoints. Right after a deploy this is transient, but if it persists, suspect a selector typo or probe misconfiguration.
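The Endpoints object already separates two of the three causes. Pods that match the selector but are not Ready appear under `notReadyAddresses`, while a selector that matches nothing leaves no addresses at all. The sketch below classifies the JSON from `kubectl get endpoints my-service -n my-namespace -o json` accordingly. Newer Kubernetes releases deprecate the v1 Endpoints API in favour of EndpointSlice, whose JSON has a different shape, so treat this as a sketch for the classic output. The names are hypothetical.

```python
# Possible outcomes of classify_endpoints and what to check next.
HINTS = {
    "ready": "endpoints populated: suspect controller sync lag or rollout settings",
    "not_ready": "Pods selected but not Ready: check readinessProbe and Pod conditions",
    "no_pods": "no Pods selected (or none running): compare Service selector and Pod labels",
}


def classify_endpoints(endpoints: dict) -> str:
    """Classify the JSON from `kubectl get endpoints <svc> -o json` into a HINTS key."""
    ready, not_ready = [], []
    for subset in endpoints.get("subsets") or []:
        ready += [a["ip"] for a in subset.get("addresses") or []]
        not_ready += [a["ip"] for a in subset.get("notReadyAddresses") or []]
    if ready:
        return "ready"
    if not_ready:
        return "not_ready"
    return "no_pods"
```

Feed it `json.loads(...)` of the kubectl output, then print `HINTS[...]` for the returned key.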

Symptom 4: 504 Gateway Timeout

A 504 signals "the backend did not respond within the time limit."

504 received
1) Is the backend itself slow? (measure time via direct curl)
   ├─ slow → app performance issue. Suspect DB / external calls
   └─ fast → next
2) Is the timeout annotation too short?
   └─ consider raising proxy-read-timeout/proxy-send-timeout
3) Is it keepalive / connection-pool exhaustion?
   └─ check upstream-keepalive settings, concurrency

Inspection commands.

Measure backend response time

kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- \
  curl -s -o /dev/null -w "time_total=%{time_total}\n" http://my-service.my-namespace.svc:80/slow

Check the current timeout annotations

kubectl get ingress my-ingress -o yaml | grep timeout

Blindly raising the timeout only masks the symptom. First look at why the backend is slow (DB queries, external APIs, GC). If the work genuinely takes a long time (report generation, for example), raise the timeout explicitly.

metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "120"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "120"
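Before raising the value, compare the backend's measured worst case with the current limit. nginx applies proxy-read-timeout between two successive reads, not to the whole response. Comparing total time is therefore a conservative approximation, which fits backends that send nothing until they finish. The sketch assumes the 60-second default that ingress-nginx documents for proxy-read-timeout. The helper names and the 1.5x headroom factor are illustrative choices.

```python
# Assumed ingress-nginx default for proxy-read-timeout, in seconds.
DEFAULT_READ_TIMEOUT_S = 60.0
READ_TIMEOUT_ANNOTATION = "nginx.ingress.kubernetes.io/proxy-read-timeout"


def parse_time_total(curl_output: str) -> float:
    """Extract seconds from the 'time_total=...' line printed by the curl -w format above."""
    for line in curl_output.splitlines():
        if line.startswith("time_total="):
            return float(line.split("=", 1)[1])
    raise ValueError("no time_total= line in curl output")


def timeout_verdict(worst_case_s: float, annotations: dict[str, str],
                    headroom: float = 1.5) -> str:
    """'too-short', 'tight', or 'ok' for a measured worst-case response time."""
    limit = float(annotations.get(READ_TIMEOUT_ANNOTATION, DEFAULT_READ_TIMEOUT_S))
    if worst_case_s >= limit:
        return "too-short"  # requests this slow are likely to surface as 504
    if worst_case_s * headroom > limit:
        return "tight"  # works today, little margin for load spikes
    return "ok"
```

A report endpoint that takes 90 seconds is "too-short" under the default. With the 120-second annotation above it becomes "tight", which is a signal to look at the backend as well as the timeout.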

Symptom 5: TLS Errors (Certificate/SNI)

TLS problems show up as browser warnings, handshake failures, or the wrong certificate being served.

TLS error
1) Which certificate is served? (openssl s_client)
   ├─ default/fake cert → Ingress tls.secretName or SNI matching issue
   ├─ expired cert → check cert-manager renewal failure
   └─ correct cert → client trust-chain issue
2) Does the Secret exist with the right format?
   └─ check kubernetes.io/tls type, tls.crt/tls.key keys
3) Do the host and certificate SAN match?
   └─ check SNI / wildcard scope

Inspection commands.

Check the actually served cert by specifying SNI

openssl s_client -connect <LB_IP>:443 -servername app.example.com </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates

TLS Secret existence/format

kubectl get secret my-tls -n my-namespace -o yaml | grep -E "tls.crt|tls.key|type"

cert-manager Certificate status

kubectl describe certificate my-cert -n my-namespace

The most common pitfall is SNI-based certificate selection. When multiple hosts share one controller, if the client sends no SNI or there is no tls entry matching the host, the controller serves the default (fake) certificate. If you use cert-manager, first check the Certificate resource's Ready condition and renewal events. Expiry is almost always the result of a renewal failure.
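Step 1 of the tree asks which certificate is actually served. Instead of reading subjects by eye, you can compare fingerprints: take the SHA-256 of the leaf certificate the controller presents for a given SNI name and compare it with the leaf in the Secret's tls.crt. The sketch below uses only the standard library. The helper names are hypothetical, and verification is switched off on purpose so that even the controller's fake default certificate can be fetched.

```python
import base64
import hashlib
import socket
import ssl

PEM_BEGIN = "-----BEGIN CERTIFICATE-----"
PEM_END = "-----END CERTIFICATE-----"


def pem_from_secret(b64_value: str) -> str:
    """Decode the base64 value of data["tls.crt"] from a kubernetes.io/tls Secret."""
    return base64.b64decode(b64_value).decode("ascii")


def leaf_fingerprint(pem_bundle: str) -> str:
    """SHA-256 (hex) of the first certificate in a PEM bundle (by convention, the leaf)."""
    start = pem_bundle.index(PEM_BEGIN)
    end = pem_bundle.index(PEM_END, start) + len(PEM_END)
    der = ssl.PEM_cert_to_DER_cert(pem_bundle[start:end])
    return hashlib.sha256(der).hexdigest()


def served_fingerprint(ip: str, server_name: str, port: int = 443, timeout: float = 5.0) -> str:
    """SHA-256 (hex) of the certificate the server presents for SNI `server_name`."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # we want whatever is served, even a fake/expired cert,
    ctx.verify_mode = ssl.CERT_NONE  # so verification is skipped (diagnostics only)
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=server_name) as tls:  # sends SNI
            der = tls.getpeercert(binary_form=True)
    return hashlib.sha256(der).hexdigest()
```

Feed `pem_from_secret` the output of `kubectl get secret my-tls -n my-namespace -o jsonpath='{.data.tls\.crt}'`. If `served_fingerprint("<LB_IP>", "app.example.com")` returns a different value, the controller is not serving that Secret for that host.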

Symptom 6: Redirect Loops

When a browser shows "too many redirects," you have an infinite loop.

Redirect loop
1) Ingress forces an HTTPS redirect, but the backend also redirects to HTTP?
   └─ double redirect. Handle it in only one place
2) TLS terminates at the LB but the backend ignores X-Forwarded-Proto?
   └─ enable use-forwarded-headers, make the app trust the header
3) force-ssl-redirect conflicts with the app's own redirect?
   └─ disable one side

The classic scenario: TLS terminates at the external LB, the controller and backend receive plain HTTP, and the app keeps answering "this is HTTP, so redirect to HTTPS" to requests that were HTTPS all along. The fix is to have the controller trust the X-Forwarded-Proto header set by the LB, and to have the app honor it.

ConfigMap (controller-wide)

data:
  use-forwarded-headers: "true"
  compute-full-forwarded-for: "true"
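On the application side, the fix means deriving the request's scheme from X-Forwarded-Proto, but only when the request arrived through a trusted proxy. The framework-neutral sketch below is hypothetical; real frameworks ship their own switches, such as Django's SECURE_PROXY_SSL_HEADER or Werkzeug's ProxyFix. It shows why an app that ignores the header redirects forever.

```python
from typing import Optional


def effective_scheme(wire_scheme: str, headers: dict[str, str], trust_proxy: bool) -> str:
    """Scheme the client used, as far as an app behind a TLS-terminating LB can tell."""
    if trust_proxy:
        lowered = {k.lower(): v for k, v in headers.items()}
        forwarded = lowered.get("x-forwarded-proto", "")
        if forwarded:
            # Take the first value if a proxy chain produced a comma-separated list.
            return forwarded.split(",")[0].strip().lower()
    return wire_scheme  # the connection's own scheme: wrong behind a TLS-terminating LB


def https_redirect(wire_scheme: str, host: str, path: str,
                   headers: dict[str, str], trust_proxy: bool) -> Optional[str]:
    """Location for an HTTP-to-HTTPS redirect, or None if the request is already HTTPS."""
    if effective_scheme(wire_scheme, headers, trust_proxy) == "https":
        return None
    return f"https://{host}{path}"
```

With `trust_proxy=False`, every request arrives as plain HTTP and gets redirected again, which is the loop. With `trust_proxy=True`, the header set by the LB ends it. Only enable trust when the app is reachable exclusively through the proxy, since clients can forge the header otherwise.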

Symptom 7: 413 Request Entity Too Large (Large Uploads)

If file uploads start failing above a certain size with a 413, the controller's request-body size limit is the cause.

Inspection and fix.

Find 413 / "client intended to send too large body" in controller logs

kubectl logs -n ingress-nginx deploy/ingress-nginx-controller | grep "too large"

Raise the limit with an annotation on the relevant Ingress.

metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"

To change it globally, set proxy-body-size in the controller ConfigMap. A value of 0 disables the size check entirely, which leaves the controller exposed to memory and disk pressure from buffering huge bodies. Choose a value that matches your real upload ceiling instead. For large uploads, also review the proxy-request-buffering setting to avoid memory blowups.
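nginx size values use plain bytes or a k/m suffix. The sketch below checks whether a given upload size passes a proxy-body-size value or would be rejected with a 413. It assumes the 1m default that ingress-nginx documents, and the helper names are hypothetical.

```python
# nginx size syntax: plain bytes, or a k/K or m/M suffix (e.g. "8k", "100m").
_UNITS = {"k": 1024, "m": 1024 * 1024}


def parse_nginx_size(value: str) -> int:
    v = value.strip().lower()
    if v and v[-1] in _UNITS:
        return int(v[:-1]) * _UNITS[v[-1]]
    return int(v)


def upload_allowed(content_length: int, proxy_body_size: str = "1m") -> bool:
    """Would a body of `content_length` bytes pass? ("1m" is the assumed default.)"""
    limit = parse_nginx_size(proxy_body_size)
    return limit == 0 or content_length <= limit  # 0 disables the check entirely
```

A 2 MiB file fails under the default but passes with the `"100m"` annotation above.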

Catalog of Common Misconfigurations

Separate from the diagnostic trees, here is a collection of misconfigurations that repeatedly bite people.

| Mistake | Symptom | Fix |
| --- | --- | --- |
| Missing ingressClassName | 404, controller ignores | Set spec.ingressClassName |
| Service targetPort typo | 502 | Match the container's listening port |
| Selector/Pod label mismatch | 503 no endpoints | Fix the selector |
| backend-protocol unset (HTTPS backend) | 502 | backend-protocol HTTPS annotation |
| pathType confusion (Prefix vs Exact) | 404 or over-matching | Set per intent |
| rewrite-target/path capture group mismatch | Wrong path forwarded | Align regex capture and rewrite |
| use-forwarded-headers unset | Redirect loop, wrong client IP | Enable in ConfigMap |
| proxy-body-size default | 413 upload failure | Raise via annotation |
| TLS Secret wrong type | Cert not applied | Verify kubernetes.io/tls type |

rewrite-target in particular will silently forward the wrong path if capture-group numbers are off. This kind of issue manifests not as a 404 or 500 but as "the wrong page shows up," so it tends to be discovered late.
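A quick way to catch capture-group mistakes before deploying is to replay sample paths through the regex and rewrite target. The sketch below assumes ingress-nginx's behaviour of anchoring the path regex at the start and matching case-insensitively. It treats a reference to a missing group as empty. The example pattern `/something(/|$)(.*)` with `/$2` comes from the ingress-nginx rewrite examples, and the helper name is hypothetical.

```python
import re
from typing import Optional


def forwarded_path(path_regex: str, rewrite_target: str, request_path: str) -> Optional[str]:
    """Path the backend would receive, or None if the Ingress path does not match."""
    m = re.match(path_regex, request_path, flags=re.IGNORECASE)  # match() anchors at the start
    if m is None:
        return None

    def capture(ref) -> str:
        idx = int(ref.group(1))
        # A group that does not exist or did not participate expands to "".
        return (m.group(idx) if idx <= m.re.groups else None) or ""

    return re.sub(r"\$(\d+)", capture, rewrite_target)
```

With the correct `/$2`, `/something/new` becomes `/new`. With an off-by-one `/$1`, it silently becomes `//`, and that is the "wrong page shows up" failure described above.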

Pre-flight Checklist

A checklist to run before deploying, or as a wrap-up to troubleshooting.

- spec.ingressClassName is set and matches an installed class

- host and pathType match intent

- The Service's targetPort matches the container port

- kubectl get endpoints shows populated addresses

- The TLS Secret is of type kubernetes.io/tls and host matches the SAN

- The cert-manager Certificate is Ready

- proxy-body-size covers the real upload ceiling

- Timeout annotations cover the backend's worst-case response time

- use-forwarded-headers is configured for the LB topology

- Controller logs show no reload failures or config warnings
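Several checklist items can be checked mechanically against the manifest. The sketch below lints an Ingress parsed from `kubectl get ingress my-ingress -o json` against the class names from `kubectl get ingressclass -o jsonpath='{.items[*].metadata.name}'`. The function is hypothetical and covers only the static items. Endpoints, certificate readiness, and logs still need the live commands.

```python
VALID_PATH_TYPES = {"Exact", "Prefix", "ImplementationSpecific"}


def lint_ingress(ingress: dict, installed_classes: set[str]) -> list[str]:
    """Return pre-flight problems found in an Ingress manifest (parsed JSON/YAML)."""
    problems = []
    spec = ingress.get("spec", {})

    cls = spec.get("ingressClassName")
    if not cls:
        # A cluster-default IngressClass may fill this in at admission time;
        # flag it anyway so the intent is explicit in the manifest.
        problems.append("spec.ingressClassName is not set")
    elif cls not in installed_classes:
        problems.append(f"ingressClassName {cls!r} matches no installed IngressClass")

    rule_hosts = set()
    for rule in spec.get("rules", []):
        rule_hosts.add(rule.get("host", ""))
        for p in rule.get("http", {}).get("paths", []):
            if p.get("pathType") not in VALID_PATH_TYPES:
                problems.append(f"path {p.get('path')!r}: missing or invalid pathType")

    for tls in spec.get("tls", []):
        if not tls.get("secretName"):
            problems.append("tls entry without secretName")
        for host in tls.get("hosts", []):
            if host not in rule_hosts:  # exact comparison; wildcards are not expanded
                problems.append(f"TLS host {host!r} has no matching rule host")
    return problems
```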

The Same Symptoms Under Gateway API

As of 2026, Gateway API is the successor standard. The symptoms are similar, but the diagnostic targets change. For a 404, look at the HTTPRoute's matching rules and at the Gateway listener's hostname and allowedRoutes. A backendRef that points into another namespace also needs a ReferenceGrant in that namespace. Without one, the route reports ResolvedRefs=False (reason RefNotPermitted) and requests to that backend fail. For 503, check the endpoints of the Service referenced by backendRef. For TLS, check the Gateway listener's certificateRefs.

The good part is that Gateway API exposes far richer status conditions. A Gateway reports conditions such as Accepted and Programmed, at the Gateway level and per listener. An HTTPRoute reports Accepted and ResolvedRefs for each parent Gateway. `kubectl describe` shows these conditions together with their reasons, so "why is routing failing" surfaces as explicit status rather than annotation guesswork. Carry over this playbook's symptom-to-layer narrowing mindset; just shift the inspection target from Ingress resources to the status conditions of Gateway/HTTPRoute.
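Reading those conditions can be scripted as well. The sketch below lists the problem conditions from `kubectl get gateway <name> -o json` or `kubectl get httproute <name> -o json`. Most condition types are "healthy when True", but a few, such as Conflicted and PartiallyInvalid, signal trouble when True. The set used here is an assumption to check against the Gateway API spec for your version, and the function names are hypothetical.

```python
# Condition types where status "True" signals a problem (negative polarity).
# Assumption: this set covers the common types; check the spec for your version.
NEGATIVE_POLARITY = {"Conflicted", "PartiallyInvalid"}


def _is_unhealthy(cond: dict) -> bool:
    is_true = cond.get("status") == "True"
    return is_true if cond.get("type") in NEGATIVE_POLARITY else not is_true


def unhealthy_conditions(resource: dict) -> list[str]:
    """List problem conditions from a Gateway or HTTPRoute object (kubectl -o json)."""
    status = resource.get("status", {})
    found = []

    def scan(where: str, conditions) -> None:
        for c in conditions or []:
            if _is_unhealthy(c):
                found.append(f"{where}: {c.get('type')}={c.get('status')} ({c.get('reason')})")

    scan("resource", status.get("conditions"))  # Gateway-level conditions
    for listener in status.get("listeners", []):  # Gateway listeners
        scan(f"listener {listener.get('name')}", listener.get("conditions"))
    for parent in status.get("parents", []):  # HTTPRoute: one entry per parent Gateway
        scan(f"parent {parent.get('parentRef', {}).get('name')}", parent.get("conditions"))
    return found
```

An empty list means every reported condition is healthy. Anything else names the layer and the reason, much like the first failing branch of the trees above.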

Conclusion

The heart of Ingress troubleshooting is eliminating guesswork. Start from the symptom, narrow the traffic path from the outside in, and verify the facts with one or two commands at each layer; most problems reveal their offending layer within minutes. Two habits in particular resolve more than half of 502/503 cases immediately: running `kubectl get endpoints` and curling the backend directly from the controller.

Paste this playbook into your team wiki or runbook, and update the trees and catalog whenever you meet a new symptom. Over time this document becomes the team's collective memory, ensuring you never fall into the same trap twice.

References

- Kubernetes Ingress concepts: https://kubernetes.io/docs/concepts/services-networking/ingress/

- ingress-nginx troubleshooting guide: https://kubernetes.github.io/ingress-nginx/troubleshooting/

- ingress-nginx annotations reference: https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/

- Kubernetes Service/Endpoints debugging: https://kubernetes.io/docs/tasks/debug/debug-application/debug-service/

- cert-manager troubleshooting: https://cert-manager.io/docs/troubleshooting/

- Gateway API: https://gateway-api.sigs.k8s.io/

- Traefik routing documentation: https://doc.traefik.io/traefik/routing/overview/

- HAProxy Kubernetes Ingress documentation: https://www.haproxy.com/documentation/kubernetes-ingress/

- Contour troubleshooting: https://projectcontour.io/docs/

- Kong Ingress Controller documentation: https://docs.konghq.com/kubernetes-ingress-controller/
