OIDC Token Validation at the API Gateway — Istio, Envoy, and Gateway API in Practice

Introduction — Where Should Token Validation Happen
Validating at the Edge vs in the Service
The Envoy jwt_authn Filter in Detail
Istio — RequestAuthentication + AuthorizationPolicy
JWKS Caching and Failure Modes
Audience Strategy
Token Propagation — Relaying the Original Token vs Token Exchange
Kong vs APISIX — Comparing the OIDC Plugins
Auth Standardization in the Gateway API Era
Combining mTLS and JWT — Service Identity and User Identity
The Performance Perspective
Troubleshooting — A 401 Debugging Flowchart
Closing Thoughts
References

Introduction — Where Should Token Validation Happen

One of the most recurring security debates in microservice architecture is "where do we validate JWTs?" Only once at the gateway? In every service? Both? As of 2026 the industry consensus is fairly clear: filter at the edge, validate again in the service (defense in depth). And the de facto standard implementation vehicle is the Envoy-based stack (Istio, Envoy Gateway, Gloo, and friends).

In this post we understand Envoy's jwt_authn filter from the ground up, then explore the Istio RequestAuthentication + AuthorizationPolicy combination with plenty of YAML. We tackle operational hard problems — JWKS caching and failure modes, audience strategy, token propagation patterns including RFC 8693 Token Exchange — then compare the OIDC plugins of Kong and APISIX, survey auth standardization in the Gateway API era, and combine mTLS with JWT. We close with an ASCII flowchart for debugging 401s.

Validating at the Edge vs in the Service

Let us start with the trade-offs.

Aspect	Edge (gateway) only	Each service only
Performance	One validation, free inside	Validation cost repeated per hop
Consistency	Central policy, one place to config	Per-service library/config fragmentation
Internal compromise	Defenseless if gateway is bypassed	Internal traffic also validated — robust
Claim usage	Must forward via headers (forgery risk)	Services access claims directly
Operational burden	Low	Library versions, JWKS handling scattered

The conclusion is to combine both. The recommended practical pattern:

Edge (gateway): validate the signature, issuer (iss), expiry (exp), and audience (aud), rejecting bad traffic early. Expensive internal resources are not wasted on garbage tokens.
Service (sidecar or library): repeat the same validation, then add per-service audience checks and fine-grained authorization (scopes, roles). Even if the gateway is breached or a forged internal call arrives, you are defended.
Service-to-service trust via mTLS: independent of the user token, the calling service proves its identity via mTLS (SPIFFE and friends). More on this below.

            [Edge: first-pass validation - signature/iss/exp/aud]
  Client ──> API Gateway (Envoy jwt_authn) ──┐
                                             │ mTLS (service identity)
                                             v
                              [Service: second validation + fine-grained authz]
                              Service A (sidecar RequestAuthentication)
                                             │ token relay or exchange
                                             v
                              Service B (sidecar + AuthorizationPolicy)

The Envoy jwt_authn Filter in Detail

Whether you run Istio, Envoy Gateway, or another Envoy-based product, the thing actually validating JWTs at the bottom is Envoy's HTTP filter jwt_authn. Understand it and you understand the behavior and failure modes of every higher-level abstraction.

There are two core concepts.

providers — the definition of "whose tokens, with which keys, validated how." You specify issuer, audiences, the JWKS source, where to extract the token, and how to pass the payload along.
rules — the mapping of "which routes require which provider." You point at a provider via requires, with relaxation modes like allow_missing and allow_missing_or_failed.
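
To make the rule semantics concrete, here is a minimal Python model of how a request is matched against rules. It is a sketch under stated assumptions, not Envoy code: the Rule class and decide function are hypothetical names, matching is reduced to path prefixes, and the token has already been classified as valid, invalid, or missing.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model of jwt_authn rule matching (not Envoy code).
@dataclass
class Rule:
    prefix: str
    providers: list = field(default_factory=list)  # empty list: no JWT requirement
    allow_missing: bool = False

def decide(rules, path, token_state):
    """token_state is 'valid', 'invalid', or 'missing'. Returns an HTTP status."""
    for rule in rules:  # the first matching rule wins
        if not path.startswith(rule.prefix):
            continue
        if not rule.providers:
            return 200  # rule without a requirement: nothing to verify
        if token_state == "valid":
            return 200
        if token_state == "missing" and rule.allow_missing:
            return 200  # allow_missing tolerates absence, not a bad token
        return 401
    return 200  # no rule matched: the request is not verified

RULES = [
    Rule("/healthz"),
    Rule("/docs", ["keycloak_provider"], allow_missing=True),
    Rule("/", ["keycloak_provider"]),
]
```

Rule order matters in this model: moving the catch-all "/" rule to the top would make every path require a token, including /healthz.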

A complete configuration example:

http_filters:
  - name: envoy.filters.http.jwt_authn
    typed_config:
      '@type': type.googleapis.com/envoy.extensions.filters.http.jwt_authn.v3.JwtAuthentication
      providers:
        keycloak_provider:
          issuer: https://keycloak.example.com/realms/prod
          audiences:
            - orders-api
          remote_jwks:
            http_uri:
              uri: https://keycloak.example.com/realms/prod/protocol/openid-connect/certs
              cluster: keycloak_jwks_cluster
              timeout: 3s
            cache_duration: 600s
            async_fetch:
              fast_listener: false
            retry_policy:
              num_retries: 3
          # Extract from the Authorization: Bearer header (default)
          from_headers:
            - name: Authorization
              value_prefix: 'Bearer '
          # Store the verified payload in metadata for later filters (RBAC etc.)
          payload_in_metadata: jwt_payload
          # Forward selected claims upstream as plain headers
          claim_to_headers:
            - header_name: x-jwt-sub
              claim_name: sub
            - header_name: x-jwt-scope
              claim_name: scope
          # Keep or strip the original token toward the upstream
          forward: true
          # Allowed clock skew for exp validation
          clock_skew_seconds: 30
      rules:
        # Health checks need no token
        - match:
            prefix: /healthz
        # Public docs: validate if a token is present, pass if absent
        - match:
            prefix: /docs
          requires:
            requires_any:
              requirements:
                - provider_name: keycloak_provider
                - allow_missing: {}
        # Everything else is mandatory
        - match:
            prefix: /
          requires:
            provider_name: keycloak_provider

Per-setting caveats:

issuer must match the token's iss claim exactly, string for string. A single trailing slash difference produces 401s.
The cluster referenced by remote_jwks must be defined separately. Envoy abstracts even the JWKS endpoint as a cluster; missing DNS/TLS configuration means keys cannot be fetched and every request becomes a 401.
Enabling async_fetch pre-fetches the JWKS at listener startup and refreshes it in the background, reducing first-request latency and the blast radius of brief JWKS endpoint outages.
Envoy's default is forward: false, which removes the JWT from the request after successful verification, so the upstream never sees the Authorization header. Set forward: true explicitly if you need token propagation (Istio's equivalent is forwardOriginalToken).
claim_to_headers produces plain headers. Before trusting them, the upstream must be guaranteed — at the network level — that traffic cannot reach it without passing through Envoy.
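
The issuer and clock-skew caveats are easy to reproduce outside the proxy. The following Python sketch mirrors the two checks; check_iss_and_exp is a hypothetical helper, and it assumes the claims have already been decoded and the signature verified elsewhere. It also skips exp and nbf when they are absent, which is a policy choice of this sketch rather than a statement about Envoy.

```python
import time

def check_iss_and_exp(claims, expected_issuer, now=None, clock_skew_seconds=30):
    """Return None if the claims pass, else a short failure reason."""
    now = time.time() if now is None else now
    # Exact string comparison: a trailing slash is enough to fail.
    if claims.get("iss") != expected_issuer:
        return "issuer mismatch"
    exp = claims.get("exp")
    if exp is not None and now > exp + clock_skew_seconds:
        return "expired"
    nbf = claims.get("nbf")
    if nbf is not None and now + clock_skew_seconds < nbf:
        return "not yet valid"
    return None
```

Running it against a token from your IdP with the issuer string copied from the gateway config is a quick way to spot a trailing-slash mismatch before touching the proxy.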

Istio — RequestAuthentication + AuthorizationPolicy

Istio abstracts jwt_authn behind the RequestAuthentication CRD and authorization behind the AuthorizationPolicy CRD. One crucial fact first:

RequestAuthentication alone blocks nothing. It only means "if a token is present, validate it" — requests with no token at all pass straight through. To actually block, you must declare via AuthorizationPolicy that only requests with a valid subject (requestPrincipals) are allowed. Real-world security incidents caused by this trap are common.

First-pass validation at the ingress gateway:

apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
  name: ingress-jwt
  namespace: istio-system
spec:
  selector:
    matchLabels:
      istio: ingressgateway
  jwtRules:
    - issuer: https://keycloak.example.com/realms/prod
      jwksUri: https://keycloak.example.com/realms/prod/protocol/openid-connect/certs
      audiences:
        - api-gateway
      forwardOriginalToken: true
      outputClaimToHeaders:
        - header: x-jwt-sub
          claim: sub
---
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: ingress-require-jwt
  namespace: istio-system
spec:
  selector:
    matchLabels:
      istio: ingressgateway
  action: DENY
  rules:
    - from:
        - source:
            notRequestPrincipals: ['*']
      to:
        - operation:
            notPaths: ['/healthz', '/metrics']

The DENY + notRequestPrincipals pattern is the standard idiom for "reject any request without a valid token subject." The value of requestPrincipals takes the form of iss and sub joined by a slash.
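
As a rough illustration of the principal format, the Python sketch below builds the iss/sub string and applies a simplified version of Istio-style value matching (presence, prefix, suffix, exact). Both function names are hypothetical and the matcher is an approximation for reasoning about policies, not Istio's implementation.

```python
def request_principal(claims):
    """Join iss and sub with a slash, the form used by requestPrincipals."""
    return f"{claims['iss']}/{claims['sub']}"

def istio_style_match(pattern, value):
    """Simplified matcher: '*' = presence, '*x' = suffix, 'x*' = prefix, else exact."""
    if pattern == "*":
        return value != ""
    if pattern.startswith("*"):
        return value.endswith(pattern[1:])
    if pattern.endswith("*"):
        return value.startswith(pattern[:-1])
    return value == pattern
```

A request without a validated token has no principal at all, which this model represents as an empty string: it fails the '*' presence check, and that is exactly what notRequestPrincipals: ['*'] selects for denial.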

At the service layer we apply finer authorization. Expressing "write operations on the orders service require the orders:write scope":

apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
  name: orders-jwt
  namespace: orders
spec:
  selector:
    matchLabels:
      app: orders
  jwtRules:
    - issuer: https://keycloak.example.com/realms/prod
      jwksUri: https://keycloak.example.com/realms/prod/protocol/openid-connect/certs
      audiences:
        - orders-api
---
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: orders-authz
  namespace: orders
spec:
  selector:
    matchLabels:
      app: orders
  action: ALLOW
  rules:
    # Reads: any authenticated subject
    - from:
        - source:
            requestPrincipals: ['https://keycloak.example.com/realms/prod/*']
      to:
        - operation:
            methods: ['GET']
            paths: ['/orders', '/orders/*']
    # Writes: require the orders:write scope
    - from:
        - source:
            requestPrincipals: ['https://keycloak.example.com/realms/prod/*']
      to:
        - operation:
            methods: ['POST', 'PUT', 'DELETE']
            paths: ['/orders', '/orders/*']
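      when:
        # Values support exact, prefix, suffix, and presence matching only (no "contains").
        # Istio splits the space-delimited scope claim into a list, so match one element exactly.
        - key: request.auth.claims[scope]
          values: ['orders:write']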

Also remember: the moment any ALLOW policy exists on a workload, the behavior flips to "deny everything that does not match a rule." Adding a policy switches the default to deny-by-default.
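
The evaluation order behind that flip can be sketched in a few lines of Python. This is a simplified model that assumes only DENY and ALLOW policies (CUSTOM is left out), and the evaluate function and the predicate representation are hypothetical.

```python
def evaluate(policies, request):
    """policies: list of (action, predicate) pairs. Returns 'allow' or 'deny'."""
    denies = [pred for action, pred in policies if action == "DENY"]
    allows = [pred for action, pred in policies if action == "ALLOW"]
    if any(pred(request) for pred in denies):
        return "deny"   # DENY policies are checked first
    if not allows:
        return "allow"  # no ALLOW policy on the workload: open by default
    # At least one ALLOW policy exists: anything unmatched is denied.
    return "allow" if any(pred(request) for pred in allows) else "deny"
```

Note how adding a single ALLOW policy for GET /orders turns a previously reachable path such as /healthz into a denial unless a rule covers it.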

JWKS Caching and Failure Modes

The availability of a JWT validation system is decided by how the JWKS endpoint is managed. Failure scenarios, in a table:

Scenario	Symptom	Mitigation
Brief JWKS endpoint outage	Fetch fails after cache expiry → mass 401	async_fetch + generous cache TTL, IdP redundancy
Right after key rotation	Tokens with new kid hit cache miss → 401	IdP publishes new key, waits a grace period before signing
IdP deletes the old key immediately	All existing tokens 401	Retire keys no sooner than the max token lifetime
Gateway restart while IdP is down	Initial JWKS fetch fails	Decide fail-open vs fail-closed in advance (fail-closed is the safer default); fall back to a local JWKS file
Clock skew	Intermittent exp/nbf validation failures	Set clock_skew_seconds + monitor NTP

Operational recommendations:

Design the cache TTL together with the IdP key rotation schedule. Example: publish the new key 24 hours before rotation, cache for 10 minutes — even stale caches never break validation.
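Keep a file-based local_jwks configuration ready as an emergency fallback so validation continues with existing keys during a total IdP outage. In Envoy a provider uses either remote_jwks or local_jwks, not both, so the fallback is a configuration switch rather than automatic failover. The trade-off: revoked keys may live longer.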
Expose JWKS fetch failure rate, cache hit rate, and 401 ratio as metrics and alert on them. Envoy provides jwt_authn statistics (denied, jwks_fetch_failed, and more).
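
The relationship between cache TTL and the rotation schedule reduces to two inequalities, which the Python sketch below checks. The function name and parameters are hypothetical; the numbers in the usage example come from the recommendation above plus an assumed maximum token lifetime.

```python
def rotation_plan_problems(publish_lead_s, cache_ttl_s, retire_delay_s, max_token_lifetime_s):
    """Return a list of problems with a key rotation plan (empty list means OK)."""
    problems = []
    # The new key must be published longer than one cache TTL before it signs tokens,
    # otherwise a gateway holding a stale JWKS sees an unknown kid.
    if publish_lead_s < cache_ttl_s:
        problems.append("new key may sign tokens before caches refresh")
    # The old key must stay published until every token it signed has expired.
    if retire_delay_s < max_token_lifetime_s:
        problems.append("old key retired while its tokens are still valid")
    return problems

# Publish 24 h ahead, cache for 10 min, retire after 1 h, tokens live 15 min (assumed).
plan = rotation_plan_problems(24 * 3600, 600, 3600, 900)
```

Encoding the plan as a check like this makes it possible to fail a config review when someone shortens the grace period or lengthens token lifetimes.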

Audience Strategy

The aud claim declares "who this token is for." There are three strategic options.

Single audience (one for the gateway) — simple to implement, but the token is reusable against any service, so the blast radius of token theft is large.
Per-service audiences — each service accepts only its own audience. Safest, but clients need different tokens per service, and service-to-service calls require token exchange.
Tiered (the realistic compromise) — split audiences per externally exposed API, and handle internal granularity with scopes. The gateway validates a broad audience; each service validates the audience of its API plus scopes.

Recommended principles:

At minimum, avoid the "any token from this org passes" state of unvalidated audiences. RFC 9700 (OAuth Security BCP) names audience restriction as a key mitigation.
Scope access token audiences to resource servers; never use ID tokens (aud = the client) for API calls. Sending ID tokens to APIs is a common anti-pattern.
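
One detail that trips up hand-written checks: the aud claim may be a single string or an array of strings. A minimal Python sketch of an audience check that handles both, with audience_ok as a hypothetical helper name:

```python
def audience_ok(claims, accepted):
    """True if any audience in the token is one this service accepts."""
    aud = claims.get("aud")
    if aud is None:
        return False  # no audience: reject rather than accept any token
    token_auds = [aud] if isinstance(aud, str) else list(aud)
    return any(a in accepted for a in token_auds)
```

With the tiered strategy, the gateway would call this with its broad audience and each service with the audience of its own API; an ID token whose aud is the client ID fails both.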

Token Propagation — Relaying the Original Token vs Token Exchange

When service A receives a user request and calls service B, how do you carry the user context across?

Pattern 1: relay the original token (token relay)

Client --(JWT aud=api)--> Gateway --(same JWT)--> Service A --(same JWT)--> Service B

Pros: simple, no extra IdP round trips.
Cons: the audience must be broad, so token theft endangers every service. For the token's lifetime, any service that receives the token (B included) can replay it against other services as the user. No delegation information, so audit trails are incomplete.

Pattern 2: Token Exchange (RFC 8693)

Service A presents the received token to the IdP and exchanges it for a new token whose audience is narrowed to B.

curl -s -X POST https://keycloak.example.com/realms/prod/protocol/openid-connect/token \
  -d grant_type=urn:ietf:params:oauth:grant-type:token-exchange \
  -d client_id=service-a \
  -d client_secret=SERVICE_A_SECRET \
  -d subject_token=ORIGINAL_USER_ACCESS_TOKEN \
  -d subject_token_type=urn:ietf:params:oauth:token-type:access_token \
  -d audience=service-b

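When the IdP implements RFC 8693 delegation semantics, the resulting token carries sub (the original user) plus an act (actor) claim recording the delegation chain: "service-a acting on behalf of the user." Whether act is emitted depends on the IdP and its exchange configuration, so confirm it against a real token.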

{
  "iss": "https://keycloak.example.com/realms/prod",
  "sub": "user-1234",
  "aud": "service-b",
  "scope": "orders:read",
  "act": {
    "sub": "service-account-service-a"
  },
  "exp": 1781234567
}
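
For audit logging, the delegation chain can be read out of a payload like this one by walking nested act claims; RFC 8693 nests earlier actors inside the current one. The Python sketch below assumes the payload is already verified and decoded, and delegation_chain is a hypothetical helper name.

```python
def delegation_chain(claims):
    """Return actor subjects, most recent actor first, from nested act claims."""
    chain = []
    act = claims.get("act")
    while isinstance(act, dict):
        chain.append(act.get("sub"))
        act = act.get("act")  # earlier actors are nested one level deeper
    return chain

payload = {
    "sub": "user-1234",
    "aud": "service-b",
    "act": {"sub": "service-account-service-a"},
}
actors = delegation_chain(payload)
```

A token without an act claim yields an empty list, which is a useful signal in logs that the call was a plain relay rather than a delegation.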

Pros: minimal audiences, preserved delegation chains, the ability to scope down. The same mechanism powers delegation tracking in the AI agent era.
Cons: an IdP round trip per hop (caching is mandatory), and the burden of managing exchange policy on the IdP.
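
Since caching is what makes the per-hop round trip tolerable, here is a minimal in-memory sketch in Python. Everything in it is an assumption for illustration: the ExchangeCache class, the injected exchange callable (which would wrap the HTTP call to the IdP and return the token with its expiry), and the 30-second safety margin. It is not thread-safe and never evicts entries.

```python
import hashlib
import time

class ExchangeCache:
    """Cache exchanged tokens per (subject token, audience) until shortly before expiry."""

    def __init__(self, exchange, clock=time.time, safety_margin_s=30):
        self._exchange = exchange  # callable(subject_token, audience) -> (token, exp)
        self._clock = clock
        self._margin = safety_margin_s
        self._entries = {}

    def get(self, subject_token, audience):
        # Key on a hash so raw user tokens are not kept as dictionary keys.
        key = (hashlib.sha256(subject_token.encode()).hexdigest(), audience)
        hit = self._entries.get(key)
        if hit and hit[1] - self._margin > self._clock():
            return hit[0]
        token, exp = self._exchange(subject_token, audience)
        self._entries[key] = (token, exp)
        return token
```

Keying on the audience as well as the subject token matters: a token exchanged for service-b must never be handed out for a call to another service.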

The practical compromise: "exchange only where a trust boundary is crossed (between domains, entering sensitive services); within the same trust boundary, relay + mTLS." Note that the transaction tokens work standardizing this flow is in progress at the OAuth WG; we revisit it together with workload identity in the next post (SPIFFE/SPIRE).

Kong vs APISIX — Comparing the OIDC Plugins

Comparing the OIDC handling of the two most-used gateways outside the Envoy family.

Aspect	Kong (openid-connect plugin)	APISIX (openid-connect / authz-keycloak)
Foundation	nginx/OpenResty + Kong's proprietary OIDC implementation	nginx/OpenResty + lua-resty-openidc
Licensing	OIDC plugin is Enterprise	Included in OSS
Modes	Validation (JWT), session (cookie), relying party	Validation, relying party, Keycloak authz
JWKS caching	Built-in, discovery cache	Built-in, discovery cache
Fine-grained authz	Combine ACL/scope plugins	Delegate UMA permission checks via authz-keycloak
Declarative mgmt	decK, Kong CRDs	APISIX CRDs, ADC

An APISIX configuration example:

apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  name: orders-route
  namespace: apps
spec:
  http:
    - name: orders
      match:
        hosts:
          - api.example.com
        paths:
          - /orders/*
      backends:
        - serviceName: orders
          servicePort: 8080
      plugins:
        - name: openid-connect
          enable: true
          config:
            discovery: https://keycloak.example.com/realms/prod/.well-known/openid-configuration
            client_id: apisix-gateway
            client_secret: GATEWAY_CLIENT_SECRET
            bearer_only: true
            use_jwks: true
            token_signing_alg_values_expected: RS256
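            # Assumption to verify: the option for audience checks differs across APISIX
            # versions (recent releases document it under claim_validator.audience).
            audience: orders-api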

bearer_only: true is the API-gateway mode that "validates only Bearer tokens, no browser redirects." To have the gateway also handle web app sessions, turn bearer_only off and use relying-party mode (at which point the gateway starts resembling an IAP).

Auth Standardization in the Gateway API Era

The Kubernetes Gateway API — the successor to Ingress — standardized routing, but authentication and authorization long remained the territory of implementation-specific extensions (policy CRDs). The 2025-2026 trajectory:

With Gateway API 1.4, the Policy Attachment pattern (BackendTLSPolicy and friends) became established, and standardization of auth filters (HTTPRoute-level JWT/extAuth filters) is progressing as GEPs (Gateway Enhancement Proposals).
Until then, practice means per-implementation policy CRDs. Envoy Gateway's SecurityPolicy is the canonical example.

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: SecurityPolicy
metadata:
  name: orders-jwt
  namespace: apps
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: orders-route
  jwt:
    providers:
      - name: keycloak
        issuer: https://keycloak.example.com/realms/prod
        audiences:
          - orders-api
        remoteJWKS:
          uri: https://keycloak.example.com/realms/prod/protocol/openid-connect/certs
        claimToHeaders:
          - claim: sub
            header: x-jwt-sub

Even on the same Envoy base, Istio uses its own CRDs, Envoy Gateway uses SecurityPolicy, and Gloo uses yet another CRD — fragmentation is today's reality. But since they all sit on the jwt_authn filter underneath, the first half of this post applies to every implementation. Long term, attaching auth filters to HTTPRoute with standard syntax is the likely direction.

Combining mTLS and JWT — Service Identity and User Identity

mTLS and JWT are not competitors; they are orthogonal mechanisms answering different questions.

mTLS (peer identity) — "which workload sent this request?" In Istio it is enforced with PeerAuthentication, and the identity is expressed as a SPIFFE-format principal.
JWT (request identity) — "which end user does this request represent?"

An AuthorizationPolicy combining both is the standard form of Zero Trust microservices.

apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: payments  # enforce mTLS on the receiving workloads so source principals are always present
spec:
  mtls:
    mode: STRICT
---
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: orders-payment-call
  namespace: payments
spec:
  selector:
    matchLabels:
      app: payments
  action: ALLOW
  rules:
    - from:
        - source:
            # Restrict the calling workload (mTLS-based service identity)
            principals: ['cluster.local/ns/orders/sa/orders-sa']
            # Also require an end-user token
            requestPrincipals: ['https://keycloak.example.com/realms/prod/*']
      to:
        - operation:
            methods: ['POST']
            paths: ['/payments']
      when:
        # Exact match against one element of the space-delimited scope claim
        - key: request.auth.claims[scope]
          values: ['payments:write']

This single policy expresses: "allow only when a workload running as the orders service account, carrying a valid user token with the payments:write scope, calls POST /payments." Dual verification of service identity (mTLS) and user identity (JWT).
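
Read as a predicate, the policy is a conjunction of four conditions. The Python sketch below spells that out; payments_call_allowed is a hypothetical function, claims stands for an already validated token payload (None when no valid token was presented), and the scope claim is assumed to be a space-delimited string.

```python
ISSUER = "https://keycloak.example.com/realms/prod"
ORDERS_PRINCIPAL = "cluster.local/ns/orders/sa/orders-sa"

def payments_call_allowed(peer_principal, claims, method, path):
    """Model of the combined mTLS + JWT rule: every condition must hold."""
    if claims is None:
        return False  # no validated end-user token
    scopes = claims.get("scope", "").split()
    return (
        peer_principal == ORDERS_PRINCIPAL   # service identity from mTLS
        and claims.get("iss") == ISSUER      # user identity from the JWT
        and "sub" in claims
        and method == "POST"
        and path == "/payments"
        and "payments:write" in scopes
    )
```

The model makes the failure cases easy to enumerate: the right user token arriving from the wrong workload is rejected, and so is the right workload calling without a user token.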

The Performance Perspective

General observations on JWT validation cost (absolute numbers are environment-dependent — benchmark yourself):

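RS256 signature verification costs tens of microseconds per request, with essentially no impact on p50 latency. ES256 and EdDSA use much shorter keys and signatures, which keeps tokens small and makes signing far cheaper, though verification is not necessarily faster than RSA (RSA verification with a small public exponent is very fast). They are preferred for new builds in 2026 (Keycloak 26.6 supports EdDSA).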
The real cost is not the signature math but the moment a JWKS fetch lands on the request path. The key is removing it from the request path with async_fetch and caching.
The added latency of second-pass sidecar validation is typically under 1ms per hop — acceptable relative to the value of defense in depth.
Token size is the overlooked cost. Stuffing every group and permission into claims until the token exceeds 8KB causes 4xx errors from header limits — a common incident. Keep claims thin and identifier-oriented; the trend is delegating fine-grained permissions to a dedicated authorization service such as OpenFGA.
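
To get a feel for how quickly claims inflate a token, the Python sketch below estimates the size of the Authorization header for a given payload. It is an approximation under stated assumptions: the header fields and kid are made up, the signature is a 256-byte placeholder (the size of an RS256 signature with a 2048-bit key), and authorization_header_size is a hypothetical helper.

```python
import base64
import json

def b64url(data: bytes) -> str:
    """base64url without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def authorization_header_size(claims, signature_bytes=256):
    """Approximate byte size of 'Authorization: Bearer <jwt>' for these claims."""
    header = b64url(json.dumps({"alg": "RS256", "typ": "JWT", "kid": "example"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signature = b64url(b"\x00" * signature_bytes)  # placeholder, not a real signature
    token = f"{header}.{payload}.{signature}"
    return len("Authorization: Bearer ") + len(token)

thin = {"sub": "user-1234", "scope": "orders:read"}
fat = dict(thin, groups=[f"group-{i:04d}" for i in range(500)])
thin_size = authorization_header_size(thin)
fat_size = authorization_header_size(fat)
```

In this example, 500 group names are enough to push the header past 8KB, while the thin token stays under 1KB.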

Troubleshooting — A 401 Debugging Flowchart

A systematic procedure for narrowing down gateway 401s.

                         +--------------------------+
                         | 401 received             |
                         +-----------+--------------+
                                     |
                  Is a token present on the request? (check headers)
                                     |
              +------------ no -----+---- yes ----------+
              |                                         |
   Is a client/proxy dropping the           Decode the token (inspect, not verify)
   Authorization header? (check proxy chain)            |
                                     +------------------+------------------+
                                     |                  |                  |
                               Does iss match      Is exp in the      Does aud match
                               config exactly?     past? (incl.       the config?
                                     |             clock skew)             |
                              Mismatch: check          |             Mismatch: redesign
                              trailing slash,     Expired: check     the audience
                              http/https,         refresh logic,     mapping
                              realm path          verify NTP
                                     |
                         All fine? → suspect the key verification stage
                                     |
                  +------------------+-------------------+
                  |                                      |
        Does the token header kid exist        Can the gateway actually
        in the JWKS? (curl the JWKS)           fetch the JWKS?
                  |                                      |
        Missing: cache issue right after      Fetch failing: check cluster
        key rotation → review cache TTL       definition, DNS, egress
        and rotation grace policy             policy, TLS trust chain
                  |
        Everything fine but still 401 → check whether
        RequestAuthentication passes and AuthorizationPolicy
        denies (might be 403), and inspect rule matching
        (paths/methods/scopes)

A companion set of diagnostic commands:

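# 1) Inspect the token payload (decode without verifying)
TOKEN=eyJhbGciOi...
# base64url drops '=' padding; restore it so base64 -d does not fail or truncate
PAYLOAD=$(printf '%s' "$TOKEN" | cut -d. -f2 | tr '_-' '/+')
while [ $(( ${#PAYLOAD} % 4 )) -ne 0 ]; do PAYLOAD="${PAYLOAD}="; done
printf '%s' "$PAYLOAD" | base64 -d | jq .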

# 2) Query the JWKS directly — list kids
curl -s https://keycloak.example.com/realms/prod/protocol/openid-connect/certs | jq '.keys[].kid'

# 3) Confirm Istio config reached Envoy
istioctl proxy-config listener deploy/istio-ingressgateway -n istio-system -o json \
  | jq '.. | select(.name? == "envoy.filters.http.jwt_authn")'

# 4) Check denial flags in Envoy access logs
kubectl logs deploy/istio-ingressgateway -n istio-system | grep -E '401|403' | tail -5

# 5) Use jwt_authn stats to see which stage is blocking
kubectl exec deploy/istio-ingressgateway -n istio-system -- \
  pilot-agent request GET stats | grep -E 'jwt_authn|jwks'

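The 401-versus-403 distinction matters too. In Istio, 401 comes from RequestAuthentication (a token was sent but failed validation), while 403 comes from AuthorizationPolicy (the token is valid but the policy denies the request, or no token was sent at all, since a token-less request passes RequestAuthentication and is rejected only by the policy). Use it as the first branch in your debugging.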
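
The claim-level branches of the flowchart can also be scripted. The Python sketch below decodes a token without verifying it and reports the first branch that would explain a rejection. It is a debugging aid under stated assumptions, not a validator: triage and decode_segment are hypothetical names, the expected issuer, audiences, and the set of kids from the JWKS are passed in by hand, and no signature is checked.

```python
import base64
import json

def decode_segment(segment):
    """Decode one base64url JWT segment without verifying anything."""
    padded = segment + "=" * (-len(segment) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))

def triage(token, issuer, audiences, jwks_kids, now, skew=30):
    """Return the first flowchart branch that explains a rejected request."""
    if not token:
        return "no token: check the proxy chain for a dropped Authorization header"
    parts = token.split(".")
    if len(parts) != 3:
        return "malformed: expected three dot-separated segments"
    header, claims = decode_segment(parts[0]), decode_segment(parts[1])
    if claims.get("iss") != issuer:
        return "iss mismatch: check trailing slash, http/https, realm path"
    if "exp" in claims and now > claims["exp"] + skew:
        return "expired: check refresh logic and NTP"
    aud = claims.get("aud", [])
    aud = [aud] if isinstance(aud, str) else list(aud)
    if not set(aud) & set(audiences):
        return "aud mismatch: review the audience mapping"
    if header.get("kid") not in jwks_kids:
        return "kid not in JWKS: review cache TTL and rotation grace"
    return "token looks fine: check JWKS fetch path and AuthorizationPolicy rules"
```

Feed it the kids listed by the JWKS query in step 2 and the issuer and audiences from the gateway config; when it reports that the token looks fine, the remaining suspects are the fetch path and the policy rules.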

Closing Thoughts

OIDC token validation at the API gateway layer has converged on the principle "filter at the edge, validate again in the service," on the shared foundation of Envoy jwt_authn. To summarize:

RequestAuthentication blocks nothing. It is one half of a set completed by AuthorizationPolicy.
Availability is decided by JWKS caching design. Design async fetch, TTLs, and key rotation grace periods as one unit.
Keep audiences narrow and tokens thin, and consider token exchange when crossing trust boundaries.
mTLS (service identity) and JWT (user identity) must be combined for Zero Trust to be complete.

In the next post we go deep on the mTLS half of this story: SPIFFE/SPIRE-based workload identity.