OIDC Token Validation at the API Gateway — Istio, Envoy, and Gateway API in Practice
- Author: Youngju Kim (@fjvbn20031)
- Introduction — Where Should Token Validation Happen
- Validating at the Edge vs in the Service
- The Envoy jwt_authn Filter in Detail
- Istio — RequestAuthentication + AuthorizationPolicy
- JWKS Caching and Failure Modes
- Audience Strategy
- Token Propagation — Relaying the Original Token vs Token Exchange
- Kong vs APISIX — Comparing the OIDC Plugins
- Auth Standardization in the Gateway API Era
- Combining mTLS and JWT — Service Identity and User Identity
- The Performance Perspective
- Troubleshooting — A 401 Debugging Flowchart
- Closing Thoughts
- References
Introduction — Where Should Token Validation Happen
One of the most persistent security debates in microservice architecture is "where do we validate JWTs?" Only once at the gateway? In every service? Both? As of 2026 the industry consensus is fairly clear: filter at the edge, then validate again in the service (defense in depth). The de facto standard vehicle for implementing this is the Envoy-based stack (Istio, Envoy Gateway, Gloo, and friends).
In this post we work through Envoy's jwt_authn filter from the ground up, then explore the Istio RequestAuthentication + AuthorizationPolicy combination with plenty of YAML. We tackle the hard operational problems (JWKS caching and failure modes, audience strategy, and token propagation patterns including RFC 8693 Token Exchange), then compare the OIDC plugins of Kong and APISIX, survey auth standardization in the Gateway API era, and combine mTLS with JWT. We close with an ASCII flowchart for debugging 401s.
Validating at the Edge vs in the Service
Let us start with the trade-offs.
| Aspect | Edge (gateway) only | Each service only |
|---|---|---|
| Performance | Validated once; no validation cost on internal hops | Validation cost repeated per hop |
| Consistency | Central policy, one place to configure | Per-service library/config fragmentation |
| Internal compromise | Defenseless if gateway is bypassed | Internal traffic also validated — robust |
| Claim usage | Must forward via headers (forgery risk) | Services access claims directly |
| Operational burden | Low | Library versions, JWKS handling scattered |
The conclusion is to combine both. The recommended practical pattern:
- Edge (gateway): validate the signature, issuer (iss), expiry (exp), and audience (aud), rejecting bad traffic early. Expensive internal resources are not wasted on garbage tokens.
- Service (sidecar or library): repeat the same validation, then add per-service audience checks and fine-grained authorization (scopes, roles). Even if the gateway is breached or a forged internal call arrives, you are defended.
- Service-to-service trust via mTLS: independent of the user token, the calling service proves its identity via mTLS (SPIFFE and friends). More on this below.
           [Edge: first-pass validation - signature/iss/exp/aud]
Client ──> API Gateway (Envoy jwt_authn) ──┐
                                           │ mTLS (service identity)
                                           v
           [Service: second validation + fine-grained authz]
           Service A (sidecar RequestAuthentication)
                                           │ token relay or exchange
                                           v
           Service B (sidecar + AuthorizationPolicy)
The Envoy jwt_authn Filter in Detail
Whether you run Istio, Envoy Gateway, or certain Kong modes, the thing actually validating JWTs at the bottom is Envoy's HTTP filter jwt_authn. Understand it and you understand the behavior and failure modes of every higher-level abstraction.
There are two core concepts.
- providers — the definition of "whose tokens, with which keys, validated how." You specify issuer, audiences, the JWKS source, where to extract the token, and how to pass the payload along.
- rules — the mapping of "which routes require which provider." You point at a provider via requires, with relaxation modes like allow_missing and allow_missing_or_failed.
A complete configuration example:
http_filters:
  - name: envoy.filters.http.jwt_authn
    typed_config:
      '@type': type.googleapis.com/envoy.extensions.filters.http.jwt_authn.v3.JwtAuthentication
      providers:
        keycloak_provider:
          issuer: https://keycloak.example.com/realms/prod
          audiences:
            - orders-api
          remote_jwks:
            http_uri:
              uri: https://keycloak.example.com/realms/prod/protocol/openid-connect/certs
              cluster: keycloak_jwks_cluster
              timeout: 3s
            cache_duration: 600s
            async_fetch:
              fast_listener: false
            retry_policy:
              num_retries: 3
          # Extract from the Authorization: Bearer header (default)
          from_headers:
            - name: Authorization
              value_prefix: 'Bearer '
          # Store the verified payload in metadata for later filters (RBAC etc.)
          payload_in_metadata: jwt_payload
          # Forward selected claims upstream as plain headers
          claim_to_headers:
            - header_name: x-jwt-sub
              claim_name: sub
            - header_name: x-jwt-scope
              claim_name: scope
          # Keep or strip the original token toward the upstream
          forward: true
          # Allowed clock skew for exp validation
          clock_skew_seconds: 30
      rules:
        # Health checks need no token
        - match:
            prefix: /healthz
        # Public docs: validate if a token is present, pass if absent
        - match:
            prefix: /docs
          requires:
            requires_any:
              requirements:
                - provider_name: keycloak_provider
                - allow_missing: {}
        # Everything else is mandatory
        - match:
            prefix: /
          requires:
            provider_name: keycloak_provider
Per-setting caveats:
- issuer must match the token's iss claim exactly, string for string. A single trailing slash difference produces 401s.
- The cluster referenced by remote_jwks must be defined separately. Envoy abstracts even the JWKS endpoint as a cluster; missing DNS/TLS configuration means keys cannot be fetched and every request becomes a 401.
- Enabling async_fetch pre-fetches the JWKS at listener startup and refreshes it in the background, reducing first-request latency and the blast radius of brief JWKS endpoint outages.
- forward defaults to false, in which case Envoy removes the token from the request after a successful verification, so the upstream never sees the Authorization header. Set forward: true explicitly if you need token propagation.
- claim_to_headers produces plain headers. Before trusting them, the upstream must be guaranteed — at the network level — that traffic cannot reach it without passing through Envoy.
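The exact-match and clock-skew caveats above can be made concrete with a small sketch. This is not Envoy's implementation: `check_claims` and its parameters are hypothetical names chosen for illustration, signature verification is deliberately left out, and a token without `exp` is treated as non-expiring purely to keep the example short.

```python
import time


def check_claims(claims, issuer, audiences, clock_skew_seconds=30, now=None):
    """Return None if the claims pass, otherwise a short reason string.

    Hypothetical helper that mirrors the checks described above.
    Signature verification is out of scope for this sketch.
    """
    now = time.time() if now is None else now
    # Issuer comparison is plain string equality: no URL normalization.
    if claims.get("iss") != issuer:
        return "issuer mismatch"
    # The token is accepted until exp plus the allowed clock skew.
    exp = claims.get("exp")
    if exp is not None and now > exp + clock_skew_seconds:
        return "expired"
    # aud may be a single string or a list; one overlap is enough.
    aud = claims.get("aud", [])
    aud = [aud] if isinstance(aud, str) else list(aud)
    if audiences and not set(aud) & set(audiences):
        return "audience mismatch"
    return None


claims = {
    "iss": "https://keycloak.example.com/realms/prod",
    "aud": "orders-api",
    "exp": 1000,
}
# Same token, two configured issuers that differ only by a trailing slash.
print(check_claims(claims, "https://keycloak.example.com/realms/prod", ["orders-api"], now=1020))
print(check_claims(claims, "https://keycloak.example.com/realms/prod/", ["orders-api"], now=1020))
```

The first call passes because 20 seconds past `exp` is still inside the 30-second skew window; the second fails on the issuer alone.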
Istio — RequestAuthentication + AuthorizationPolicy
Istio abstracts jwt_authn behind the RequestAuthentication CRD and authorization behind the AuthorizationPolicy CRD. One crucial fact first:
RequestAuthentication alone blocks nothing. It only means "if a token is present, validate it" — requests with no token at all pass straight through. To actually block, you must declare via AuthorizationPolicy that only requests with a valid subject (requestPrincipals) are allowed. Real-world security incidents caused by this trap are common.
First-pass validation at the ingress gateway:
apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
  name: ingress-jwt
  namespace: istio-system
spec:
  selector:
    matchLabels:
      istio: ingressgateway
  jwtRules:
    - issuer: https://keycloak.example.com/realms/prod
      jwksUri: https://keycloak.example.com/realms/prod/protocol/openid-connect/certs
      audiences:
        - api-gateway
      forwardOriginalToken: true
      outputClaimToHeaders:
        - header: x-jwt-sub
          claim: sub
---
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: ingress-require-jwt
  namespace: istio-system
spec:
  selector:
    matchLabels:
      istio: ingressgateway
  action: DENY
  rules:
    - from:
        - source:
            notRequestPrincipals: ['*']
      to:
        - operation:
            notPaths: ['/healthz', '/metrics']
The DENY + notRequestPrincipals pattern is the standard idiom for "reject any request without a valid token subject." The value of requestPrincipals takes the form of iss and sub joined by a slash.
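To make the division of labor between the two resources concrete, here is a minimal model of the gateway outcome for a single request. It is a sketch of the behavior described above, not Istio code: `ingress_decision`, the `token_state` labels, and the `deny_policy` switch are all hypothetical names.

```python
def request_principal(claims):
    """Join iss and sub with a slash, the form requestPrincipals is matched against."""
    return f"{claims['iss']}/{claims['sub']}"


def ingress_decision(token_state, claims, path, deny_policy=True):
    """Model the gateway outcome for one request.

    token_state is 'missing', 'invalid', or 'valid'. These labels and the
    function itself are illustrative, not Istio APIs.
    """
    # RequestAuthentication rejects only a token that is present but invalid.
    if token_state == "invalid":
        return 401
    principal = request_principal(claims) if token_state == "valid" else None
    # DENY + notRequestPrincipals: ['*'] matches requests without a principal,
    # except on the paths excluded through notPaths.
    exempt = path in ("/healthz", "/metrics")
    if deny_policy and principal is None and not exempt:
        return 403
    return 200


claims = {"iss": "https://keycloak.example.com/realms/prod", "sub": "user-1234"}
print(request_principal(claims))
# Without the AuthorizationPolicy, a request with no token passes straight through.
print(ingress_decision("missing", None, "/orders", deny_policy=False))
# With it, the same request is rejected.
print(ingress_decision("missing", None, "/orders"))
```

The `deny_policy=False` case is the trap: RequestAuthentication alone lets the token-less request through.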
At the service layer we apply finer authorization. Expressing "write operations on the orders service require the orders:write scope":
apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
  name: orders-jwt
  namespace: orders
spec:
  selector:
    matchLabels:
      app: orders
  jwtRules:
    - issuer: https://keycloak.example.com/realms/prod
      jwksUri: https://keycloak.example.com/realms/prod/protocol/openid-connect/certs
      audiences:
        - orders-api
---
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: orders-authz
  namespace: orders
spec:
  selector:
    matchLabels:
      app: orders
  action: ALLOW
  rules:
    # Reads: any authenticated subject
    - from:
        - source:
            requestPrincipals: ['https://keycloak.example.com/realms/prod/*']
      to:
        - operation:
            methods: ['GET']
            paths: ['/orders', '/orders/*']
    # Writes: require the orders:write scope
    - from:
        - source:
            requestPrincipals: ['https://keycloak.example.com/realms/prod/*']
      to:
        - operation:
            methods: ['POST', 'PUT', 'DELETE']
            paths: ['/orders', '/orders/*']
      when:
        # Istio treats the space-delimited scope claim as a list, so match one
        # exact element. Values support exact, prefix, suffix, and presence
        # matching only; a '*...*' contains pattern is not supported.
        - key: request.auth.claims[scope]
          values: ['orders:write']
Also remember: the moment any ALLOW policy exists on a workload, the behavior flips to "deny everything that does not match a rule." Adding a policy switches the default to deny-by-default.
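The flip to deny-by-default follows from the evaluation order, which can be sketched in a few lines. This is a simplified model rather than Istio's code: CUSTOM policies are omitted, rules are reduced to plain predicates, and `istio_match`, `evaluate`, and `read_rule` are hypothetical names.

```python
def istio_match(pattern, value):
    """String matching forms used by AuthorizationPolicy fields (simplified)."""
    if pattern == "*":               # presence: any non-empty value
        return value != ""
    if pattern.startswith("*"):      # suffix match
        return value.endswith(pattern[1:])
    if pattern.endswith("*"):        # prefix match
        return value.startswith(pattern[:-1])
    return value == pattern          # exact match


def evaluate(request, deny_rules, allow_rules):
    """Rules are predicates over a request dict, a deliberate simplification."""
    if any(rule(request) for rule in deny_rules):
        return "DENY"
    if not allow_rules:
        return "ALLOW"               # no ALLOW policy on the workload: default allow
    return "ALLOW" if any(rule(request) for rule in allow_rules) else "DENY"


def read_rule(request):
    """The 'reads' rule from the policy above, written as a predicate."""
    return (
        istio_match("https://keycloak.example.com/realms/prod/*", request["principal"])
        and request["method"] == "GET"
        and any(istio_match(p, request["path"]) for p in ("/orders", "/orders/*"))
    )


request = {
    "principal": "https://keycloak.example.com/realms/prod/user-1234",
    "method": "PATCH",
    "path": "/orders/42",
}
print(evaluate(request, [], []))           # no policy at all
print(evaluate(request, [], [read_rule]))  # an ALLOW policy exists, nothing matches
```

The same PATCH request is allowed while no policy exists and denied as soon as the first ALLOW policy is attached, even though no rule mentions PATCH.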
JWKS Caching and Failure Modes
The availability of a JWT validation system is decided by how the JWKS endpoint is managed. Failure scenarios, in a table:
| Scenario | Symptom | Mitigation |
|---|---|---|
| Brief JWKS endpoint outage | Fetch fails after cache expiry → mass 401 | async_fetch + generous cache TTL, IdP redundancy |
| Right after key rotation | Tokens with new kid hit cache miss → 401 | IdP publishes new key, waits a grace period before signing |
| IdP deletes the old key immediately | All existing tokens 401 | Retire keys no sooner than the max token lifetime |
| Gateway restart while IdP is down | Initial JWKS fetch fails | Decide fail-open policy; fall back to a local JWKS file |
| Clock skew | Intermittent exp/nbf validation failures | Set clock_skew_seconds + monitor NTP |
Operational recommendations:
- Design the cache TTL together with the IdP key rotation schedule. Example: publish the new key 24 hours before rotation, cache for 10 minutes — even stale caches never break validation.
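- Keep a file-based local_jwks configuration ready as an emergency fallback so validation can continue with known keys during a total IdP outage. A provider takes either remote_jwks or local_jwks, not both, so the fallback is a configuration switch rather than an automatic failover. The trade-off: revoked keys may live longer.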
- Expose JWKS fetch failure rate, cache hit rate, and 401 ratio as metrics and alert on them. Envoy provides jwt_authn statistics (denied, jwks_fetch_failed, and more).
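The timing relationship between cache TTL and rotation schedule reduces to two inequalities, sketched below. The function and parameter names are hypothetical, and the model ignores refresh failures, so treat it as a sanity check to adapt rather than a complete rule.

```python
def rotation_plan_issues(publish_lead_s, cache_ttl_s, retire_delay_s, max_token_lifetime_s):
    """Return a list of problems with a key rotation plan (empty means consistent).

    publish_lead_s: how long the new key sits in the JWKS before it signs tokens.
    retire_delay_s: how long the old key stays in the JWKS after it stops signing.
    """
    issues = []
    # Every cache must have refreshed at least once before the new key is used.
    if publish_lead_s < cache_ttl_s:
        issues.append("new key may sign tokens before every cache has refreshed")
    # Tokens signed with the old key must be able to expire naturally.
    if retire_delay_s < max_token_lifetime_s:
        issues.append("old key may be removed while its tokens are still valid")
    return issues


# The example from the text: publish 24 hours ahead, cache for 10 minutes.
print(rotation_plan_issues(24 * 3600, 600, retire_delay_s=3600, max_token_lifetime_s=900))
# A risky plan: key used 5 minutes after publishing, old key removed at once.
print(rotation_plan_issues(300, 600, retire_delay_s=0, max_token_lifetime_s=900))
```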
Audience Strategy
The aud claim declares "who this token is for." There are three strategic options.
- Single audience (one for the gateway) — simple to implement, but the token is reusable against any service, so the blast radius of token theft is large.
- Per-service audiences — each service accepts only its own audience. Safest, but clients need different tokens per service, and service-to-service calls require token exchange.
- Tiered (the realistic compromise) — split audiences per externally exposed API, and handle internal granularity with scopes. The gateway validates a broad audience; each service validates the audience of its API plus scopes.
Recommended principles:
- At minimum, avoid the "any token from this org passes" state of unvalidated audiences. RFC 9700 (OAuth Security BCP) names audience restriction as a key mitigation.
- Scope access token audiences to resource servers; never use ID tokens (aud = the client) for API calls. Sending ID tokens to APIs is a common anti-pattern.
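The tiered strategy can be sketched as two checks of different strictness. The helper names, the `web-client` ID token, and the audience values are illustrative assumptions; real deployments would express these checks in gateway and sidecar configuration rather than application code.

```python
def audiences_of(claims):
    """Normalize aud, which may be a single string or a list."""
    aud = claims.get("aud", [])
    return [aud] if isinstance(aud, str) else list(aud)


def gateway_accepts(claims, exposed_api_audiences):
    """Edge check: the token targets at least one externally exposed API."""
    return any(a in exposed_api_audiences for a in audiences_of(claims))


def service_accepts(claims, own_audience, required_scope):
    """Service check: the API's own audience plus the scope for this operation."""
    scopes = claims.get("scope", "").split()
    return own_audience in audiences_of(claims) and required_scope in scopes


access_token = {"aud": ["orders-api"], "scope": "orders:read"}
id_token = {"aud": "web-client"}  # an ID token is addressed to the client itself

print(gateway_accepts(access_token, {"orders-api", "payments-api"}))
print(service_accepts(access_token, "orders-api", "orders:write"))
print(gateway_accepts(id_token, {"orders-api", "payments-api"}))
```

With audience validation in place, the ID token anti-pattern fails at the edge because its audience is the client rather than any resource server.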
Token Propagation — Relaying the Original Token vs Token Exchange
When service A receives a user request and calls service B, how do you carry the user context across?
Pattern 1: relay the original token (token relay)
Client --(JWT aud=api)--> Gateway --(same JWT)--> Service A --(same JWT)--> Service B
- Pros: simple, no extra IdP round trips.
- Cons: the audience must be broad, so token theft endangers every service. For the token's lifetime, B (or anyone who compromises B) can replay it against other services that accept the same audience. No delegation information, so audit trails are incomplete.
Pattern 2: Token Exchange (RFC 8693)
Service A presents the received token to the IdP and exchanges it for a new token whose audience is narrowed to B.
curl -s -X POST https://keycloak.example.com/realms/prod/protocol/openid-connect/token \
  -d grant_type=urn:ietf:params:oauth:grant-type:token-exchange \
  -d client_id=service-a \
  -d client_secret=SERVICE_A_SECRET \
  -d subject_token=ORIGINAL_USER_ACCESS_TOKEN \
  -d subject_token_type=urn:ietf:params:oauth:token-type:access_token \
  -d audience=service-b
The resulting token carries sub (the original user) plus an act (actor) claim recording the delegation chain — "service-a acting on behalf of the user."
{
  "iss": "https://keycloak.example.com/realms/prod",
  "sub": "user-1234",
  "aud": "service-b",
  "scope": "orders:read",
  "act": {
    "sub": "service-account-service-a"
  },
  "exp": 1781234567
}
- Pros: minimal audiences, preserved delegation chains, the ability to scope down. The same mechanism powers delegation tracking in the AI agent era.
- Cons: an IdP round trip per hop (caching is mandatory), and the burden of managing exchange policy on the IdP.
The practical compromise: "exchange only where a trust boundary is crossed (between domains, entering sensitive services); within the same trust boundary, relay + mTLS." The OAuth WG is also working on Transaction Tokens, which aim to standardize this flow; we revisit them together with workload identity in the next post (SPIFFE/SPIRE).
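Because each exchange costs an IdP round trip, service A would normally cache the exchanged token per user token and target audience. The sketch below shows one way to do that under stated assumptions: `ExchangeCache`, `exchange_form`, and the injected `exchange` callable are hypothetical, and the actual HTTP call to the token endpoint is left to the caller.

```python
import hashlib
import time
from urllib.parse import urlencode


def exchange_form(subject_token, audience, client_id, client_secret):
    """Build the form body for an RFC 8693 token exchange request."""
    return urlencode({
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "client_id": client_id,
        "client_secret": client_secret,
        "subject_token": subject_token,
        "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "audience": audience,
    })


class ExchangeCache:
    """Cache exchanged tokens keyed by (hash of subject token, audience)."""

    def __init__(self, exchange, margin_seconds=30, clock=time.time):
        # exchange: callable(subject_token, audience) -> (token, expires_in_seconds)
        self._exchange = exchange
        self._margin = margin_seconds
        self._clock = clock
        self._entries = {}

    def get(self, subject_token, audience):
        # Hash the subject token so raw tokens are not kept as dictionary keys.
        key = (hashlib.sha256(subject_token.encode()).hexdigest(), audience)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]
        token, expires_in = self._exchange(subject_token, audience)
        # Refresh slightly early so a cached token never expires in flight.
        self._entries[key] = (token, now + expires_in - self._margin)
        return token
```

Injecting the `exchange` callable keeps the cache independent of any HTTP client and makes the expiry logic easy to test with a fake clock.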
Kong vs APISIX — Comparing the OIDC Plugins
Comparing the OIDC handling of the two most-used gateways outside the Envoy family.
| Aspect | Kong (openid-connect plugin) | APISIX (openid-connect / authz-keycloak) |
|---|---|---|
| Foundation | nginx/OpenResty + Kong's own OIDC implementation | nginx/OpenResty + lua-resty-openidc |
| Licensing | OIDC plugin is Enterprise | Included in OSS |
| Modes | Validation (JWT), session (cookie), relying party | Validation, relying party, Keycloak authz |
| JWKS caching | Built-in, discovery cache | Built-in, discovery cache |
| Fine-grained authz | Combine ACL/scope plugins | Delegate UMA permission checks via authz-keycloak |
| Declarative mgmt | decK, Kong CRDs | APISIX CRDs, ADC |
An APISIX configuration example:
apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  name: orders-route
  namespace: apps
spec:
  http:
    - name: orders
      match:
        hosts:
          - api.example.com
        paths:
          - /orders/*
      backends:
        - serviceName: orders
          servicePort: 8080
      plugins:
        - name: openid-connect
          enable: true
          config:
            discovery: https://keycloak.example.com/realms/prod/.well-known/openid-configuration
            client_id: apisix-gateway
            client_secret: GATEWAY_CLIENT_SECRET
            bearer_only: true
            use_jwks: true
            token_signing_alg_values_expected: RS256
            audience: orders-api
bearer_only: true is the API-gateway mode that "validates only Bearer tokens, no browser redirects." To have the gateway also handle web app sessions, turn bearer_only off and use relying-party mode (at which point the gateway starts resembling an IAP).
Auth Standardization in the Gateway API Era
The Kubernetes Gateway API — the successor to Ingress — standardized routing, but authentication and authorization long remained the territory of implementation-specific extensions (policy CRDs). The 2025-2026 trajectory:
- With Gateway API 1.4, the Policy Attachment pattern (BackendTLSPolicy and friends) became established, and standardization of auth filters (HTTPRoute-level JWT/extAuth filters) is progressing as GEPs (Gateway Enhancement Proposals).
- Until then, practice means per-implementation policy CRDs. Envoy Gateway's SecurityPolicy is the canonical example.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: SecurityPolicy
metadata:
  name: orders-jwt
  namespace: apps
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: orders-route
  jwt:
    providers:
      - name: keycloak
        issuer: https://keycloak.example.com/realms/prod
        audiences:
          - orders-api
        remoteJWKS:
          uri: https://keycloak.example.com/realms/prod/protocol/openid-connect/certs
        claimToHeaders:
          - claim: sub
            header: x-jwt-sub
Even on the same Envoy base, Istio uses its own CRDs, Envoy Gateway uses SecurityPolicy, and Gloo uses yet another CRD — fragmentation is today's reality. But since they all sit on the jwt_authn filter underneath, the first half of this post applies to every implementation. Long term, attaching auth filters to HTTPRoute with standard syntax is the likely direction.
Combining mTLS and JWT — Service Identity and User Identity
mTLS and JWT are not competitors; they are orthogonal mechanisms answering different questions.
- mTLS (peer identity) — "which workload sent this request?" In Istio it is enforced with PeerAuthentication, and the identity is expressed as a SPIFFE-format principal.
- JWT (request identity) — "which end user does this request represent?"
An AuthorizationPolicy combining both is the standard form of Zero Trust microservices.
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: orders
spec:
  mtls:
    mode: STRICT
---
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: orders-payment-call
  namespace: payments
spec:
  selector:
    matchLabels:
      app: payments
  action: ALLOW
  rules:
    - from:
        - source:
            # Restrict the calling workload (mTLS-based service identity)
            principals: ['cluster.local/ns/orders/sa/orders-sa']
            # Also require an end-user token
            requestPrincipals: ['https://keycloak.example.com/realms/prod/*']
      to:
        - operation:
            methods: ['POST']
            paths: ['/payments']
      when:
        # The space-delimited scope claim is matched as a list, so use one
        # exact element; a '*...*' contains pattern is not supported.
        - key: request.auth.claims[scope]
          values: ['payments:write']
This single policy expresses: "allow only when a workload running as the orders service account, carrying a valid user token with the payments:write scope, calls POST /payments." Dual verification of service identity (mTLS) and user identity (JWT).
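Every condition inside a single rule must hold at once, which is what makes the verification "dual". A minimal predicate version of the rule, with hypothetical field names for the request, might look like this:

```python
def payments_rule_allows(request):
    """All conditions of the single ALLOW rule are ANDed together.

    The request dict and its keys are illustrative, not an Istio data structure.
    """
    return (
        # Service identity, established by mTLS
        request.get("peer_principal") == "cluster.local/ns/orders/sa/orders-sa"
        # User identity, established by a validated JWT (iss/sub prefix match)
        and request.get("request_principal", "").startswith(
            "https://keycloak.example.com/realms/prod/"
        )
        and request.get("method") == "POST"
        and request.get("path") == "/payments"
        and "payments:write" in request.get("scopes", [])
    )


good = {
    "peer_principal": "cluster.local/ns/orders/sa/orders-sa",
    "request_principal": "https://keycloak.example.com/realms/prod/user-1234",
    "method": "POST",
    "path": "/payments",
    "scopes": ["payments:write"],
}
# Same user token, but relayed by a different workload.
wrong_caller = dict(good, peer_principal="cluster.local/ns/web/sa/frontend-sa")
# Right workload, but no end-user token attached.
no_user = {k: v for k, v in good.items() if k not in ("request_principal", "scopes")}

print(payments_rule_allows(good), payments_rule_allows(wrong_caller), payments_rule_allows(no_user))
```

A stolen user token replayed from another workload fails the first condition, and a compromised orders workload calling without a user token fails the second.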
The Performance Perspective
General observations on JWT validation cost (absolute numbers are environment-dependent — benchmark yourself):
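- RS256 signature verification costs tens of microseconds per request, with essentially no impact on p50 latency. ES256 and EdDSA use much shorter keys and signatures and sign far faster on the IdP side, although their verification is generally not faster than RS256; they are a common choice for new builds in 2026 (Keycloak 26.6 supports EdDSA).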
- The real cost is not the signature math but the moment a JWKS fetch lands on the request path. The key is removing it from the request path with async_fetch and caching.
- The added latency of second-pass sidecar validation is typically under 1ms per hop — acceptable relative to the value of defense in depth.
- Token size is the overlooked cost. Stuffing every group and permission into claims until the token exceeds 8KB causes 4xx errors from header limits — a common incident. Keep claims thin and identifier-oriented; the trend is delegating fine-grained permissions to a dedicated authorization service such as OpenFGA.
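Token growth is easy to estimate before it becomes an incident. The sketch below computes the approximate size of a compact JWT from its claims, assuming an RS256 signature of 256 bytes (a 2048-bit key) and a made-up header; `jwt_size_bytes` and the sample group names are hypothetical.

```python
import base64
import json


def b64url_len(raw):
    """Length of the unpadded base64url encoding of raw bytes."""
    return len(base64.urlsafe_b64encode(raw).rstrip(b"="))


def jwt_size_bytes(claims, signature_bytes=256, header=None):
    """Estimate the size of header.payload.signature for the given claims."""
    header = header or {"alg": "RS256", "typ": "JWT", "kid": "example-kid"}
    header_len = b64url_len(json.dumps(header, separators=(",", ":")).encode())
    payload_len = b64url_len(json.dumps(claims, separators=(",", ":")).encode())
    # Unpadded base64 of n bytes is ceil(4n / 3) characters long.
    signature_len = -(-signature_bytes * 4 // 3)
    return header_len + 1 + payload_len + 1 + signature_len


thin = {"sub": "user-1234"}
fat = {"sub": "user-1234", "groups": [f"group-{i:04d}" for i in range(500)]}

print(jwt_size_bytes(thin))
print(jwt_size_bytes(fat))
```

With 500 group names in the payload the token alone exceeds 8KB, before any other request headers are counted.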
Troubleshooting — A 401 Debugging Flowchart
A systematic procedure for narrowing down gateway 401s.
                 +--------------------------+
                 |       401 received       |
                 +-----------+--------------+
                             |
    Is a token present on the request? (check headers)
                             |
          +------- no -------+------- yes -------+
          |                                      |
Is a client/proxy dropping    Decode the token (inspect, not verify)
the Authorization header?                        |
(check proxy chain)                              |
                      +--------------------------+--------------------------+
                      |                          |                          |
               Does iss match             Is exp in the              Does aud match
               config exactly?            past? (incl.               the config?
                      |                   clock skew)                       |
               Mismatch: check                   |                   Mismatch: redesign
               trailing slash,            Expired: check             the audience
               http/https,                refresh logic,             mapping
               realm path                 verify NTP
                                                 |
                          All fine? → suspect the key verification stage
                                                 |
                             +-------------------+-------------------+
                             |                                       |
              Does the token header kid exist           Can the gateway actually
              in the JWKS? (curl the JWKS)              fetch the JWKS?
                             |                                       |
              Missing: cache issue right after          Fetch failing: check cluster
              key rotation → review cache TTL           definition, DNS, egress
              and rotation grace policy                 policy, TLS trust chain
                                                 |
                      Everything fine but still 401 → check whether
                      RequestAuthentication passes and AuthorizationPolicy
                      denies (might be 403), and inspect rule matching
                      (paths/methods/scopes)
A companion set of diagnostic commands:
# 1) Inspect the token payload (decode without verifying)
TOKEN=eyJhbGciOi...
PAYLOAD=$(echo "$TOKEN" | cut -d. -f2 | tr '_-' '/+')
# base64url strips padding; restore it so base64 -d accepts the input
while [ $(( ${#PAYLOAD} % 4 )) -ne 0 ]; do PAYLOAD="${PAYLOAD}="; done
echo "$PAYLOAD" | base64 -d | jq .
# 2) Query the JWKS directly and list the kids
curl -s https://keycloak.example.com/realms/prod/protocol/openid-connect/certs | jq '.keys[].kid'
# 3) Confirm Istio config reached Envoy
istioctl proxy-config listener deploy/istio-ingressgateway -n istio-system -o json \
  | jq '.. | select(.name? == "envoy.filters.http.jwt_authn")'
# 4) Check denial flags in Envoy access logs
kubectl logs deploy/istio-ingressgateway -n istio-system | grep -E '401|403' | tail -5
# 5) Use jwt_authn stats to see which stage is blocking
kubectl exec deploy/istio-ingressgateway -n istio-system -- \
  pilot-agent request GET stats | grep -E 'jwt_authn|jwks'
The 401-versus-403 distinction matters too. In Istio, 401 comes from RequestAuthentication (a problem with the token itself), 403 from AuthorizationPolicy (token valid, permissions insufficient). Use it as the first branch in your debugging.
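The token-side branches of the flowchart can also be scripted. The following sketch decodes a token without verifying it and reports the first suspicious finding in flowchart order; `triage` and its messages are hypothetical, the sample token is unsigned and built only for the demonstration, and nothing here replaces real signature verification.

```python
import base64
import json
import time


def b64url_decode(segment):
    """Decode base64url, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def b64url_encode(raw):
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def triage(token, issuer, audiences, jwks_kids, now=None, skew=30):
    """Return the first suspicious finding, following the flowchart order."""
    now = time.time() if now is None else now
    if not token:
        return "no token: is a client or proxy dropping the Authorization header?"
    parts = token.split(".")
    if len(parts) != 3:
        return "malformed token: expected three dot-separated segments"
    header = json.loads(b64url_decode(parts[0]))
    claims = json.loads(b64url_decode(parts[1]))
    if claims.get("iss") != issuer:
        return "iss mismatch: check trailing slash, http/https, realm path"
    if "exp" in claims and now > claims["exp"] + skew:
        return "expired: check refresh logic and NTP"
    aud = claims.get("aud", [])
    aud = [aud] if isinstance(aud, str) else list(aud)
    if not set(aud) & set(audiences):
        return "aud mismatch: review the audience mapping"
    if header.get("kid") not in jwks_kids:
        return "kid not in JWKS: review cache TTL and rotation grace"
    return "claims and kid look fine: check JWKS fetch and AuthorizationPolicy"


# Build an unsigned sample token purely for demonstration.
sample_header = b64url_encode(json.dumps({"alg": "RS256", "kid": "key-2"}).encode())
sample_payload = b64url_encode(json.dumps({
    "iss": "https://keycloak.example.com/realms/prod",
    "aud": "orders-api",
    "exp": 2000,
}).encode())
sample = f"{sample_header}.{sample_payload}.signature-placeholder"

ISSUER = "https://keycloak.example.com/realms/prod"
print(triage(sample, ISSUER, ["orders-api"], {"key-1"}, now=1000))
```

In practice `jwks_kids` would come from the kid listing in the diagnostic commands above, and `issuer` and `audiences` from the gateway configuration.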
Closing Thoughts
OIDC token validation at the API gateway layer has converged on the principle "filter at the edge, validate again in the service," on the shared foundation of Envoy jwt_authn. To summarize:
- RequestAuthentication blocks nothing. It is one half of a set completed by AuthorizationPolicy.
- Availability is decided by JWKS caching design. Design async fetch, TTLs, and key rotation grace periods as one unit.
- Keep audiences narrow and tokens thin, and consider token exchange when crossing trust boundaries.
- mTLS (service identity) and JWT (user identity) must be combined for Zero Trust to be complete.
In the next post we go deep on the mTLS half of this story: SPIFFE/SPIRE-based workload identity.
References
- Envoy jwt_authn filter documentation
- Istio security concepts
- Istio RequestAuthentication reference
- Istio AuthorizationPolicy reference
- Kubernetes Gateway API
- Envoy Gateway SecurityPolicy documentation
- RFC 7519 — JSON Web Token (JWT)
- RFC 8693 — OAuth 2.0 Token Exchange
- RFC 9700 — Best Current Practice for OAuth 2.0 Security
- RFC 8725 — JWT Best Current Practices
- OpenID Connect Core 1.0
- OAuth 2.1 draft (draft-ietf-oauth-v2-1)
- Keycloak documentation
- Apache APISIX openid-connect plugin
- Kong OpenID Connect plugin
- OpenFGA documentation