Keycloak HA on Kubernetes — Infinispan Clustering and Zero-Downtime Deployments

Introduction

An SSO server is the single entry point every service in your organization depends on. If Keycloak goes down, logins stop — and when logins stop, you effectively have a company-wide outage. That is why high availability (HA) has always been the central challenge of operating Keycloak.

Fortunately, as of 2026, Keycloak 26.x has dramatically lowered the difficulty of HA operations. With persistent user sessions now the default, the "restart logs out every user" problem is gone, and the 26.6 zero-downtime rolling patch officially supports uninterrupted deployments for patch upgrades. This article covers the full lifecycle of building and operating a Keycloak HA cluster on Kubernetes:

  • Choosing between the Operator and Helm for deployment
  • The Infinispan cache structure and JGroups DNS_PING discovery
  • What persistent user sessions mean and how they behave
  • Database selection, connection pools, and settling the sticky session debate
  • Multi-site (cross-DC) Active-Active topology
  • Resource sizing, JVM tuning, health checks
  • A failure-scenario response playbook

Choosing a Deployment Method — Operator vs Helm

The two main ways to run Keycloak on Kubernetes are the official Operator and community Helm charts (mainly Bitnami or codecentric).

| Aspect | Keycloak Operator (official) | Helm chart (community) |
|---|---|---|
| Maintained by | The Keycloak project | Community/vendors |
| Abstraction level | Declared via the Keycloak CR | Fine-grained control via values.yaml |
| 26.6 rolling patch automation | Supported (update strategy) | Manual configuration required |
| Realm import | KeycloakRealmImport CR | Init scripts |
| Custom images | Supported (recommended pattern) | Supported |
| Fine pod-level control | Limited (supplemented via unsupported.podTemplate) | Unrestricted |
| Recommended for | Standard topologies, operations automation | Non-standard topologies, existing Helm pipelines |

For greenfield deployments we recommend the official Operator, because version upgrade automation and the 26.6 zero-downtime patch strategy are built into it. Installing the Operator looks like this.

kubectl create namespace keycloak
kubectl apply -n keycloak \
  -f https://raw.githubusercontent.com/keycloak/keycloak-k8s-resources/26.6.2/kubernetes/keycloaks.k8s.keycloak.org-v1.yml
kubectl apply -n keycloak \
  -f https://raw.githubusercontent.com/keycloak/keycloak-k8s-resources/26.6.2/kubernetes/keycloakrealmimports.k8s.keycloak.org-v1.yml
kubectl apply -n keycloak \
  -f https://raw.githubusercontent.com/keycloak/keycloak-k8s-resources/26.6.2/kubernetes/kubernetes.yml

A real-world example of the Keycloak CR.

apiVersion: k8s.keycloak.org/v2alpha1
kind: Keycloak
metadata:
  name: keycloak
  namespace: keycloak
spec:
  instances: 3
  image: registry.example.com/idp/keycloak-custom:26.6.2
  startOptimized: true
  db:
    vendor: postgres
    host: keycloak-db.database.svc.cluster.local
    port: 5432
    database: keycloak
    usernameSecret:
      name: keycloak-db-secret
      key: username
    passwordSecret:
      name: keycloak-db-secret
      key: password
    poolMinSize: 10
    poolInitialSize: 10
    poolMaxSize: 30
  hostname:
    hostname: sso.example.com
    strict: true
  http:
    httpEnabled: true
  proxy:
    headers: xforwarded
  additionalOptions:
    - name: log-console-output
      value: json
    - name: event-metrics-user-enabled
      value: "true"
  resources:
    requests:
      cpu: "1"
      memory: 1500Mi
    limits:
      memory: 3Gi
  update:
    strategy: Auto

The Infinispan Cache Structure — Where Sessions and Auth State Live

The heart of a Keycloak cluster is the embedded Infinispan cache, through which all cross-node state is shared. The major caches break down as follows:

| Cache name | Type | Purpose | Default behavior in 26+ |
|---|---|---|---|
| realms, users | local | Read cache for DB entities | Local per node, synced via invalidation messages |
| authorization | local | Authorization policy cache | Local per node |
| sessions, clientSessions | distributed | Login sessions | DB persistence + cache |
| offlineSessions | distributed | Offline sessions | DB persistence + cache |
| authenticationSessions | distributed | In-progress authentication (login form stage) | Distributed across the cluster |
| loginFailures | distributed | Brute-force counters | Distributed across the cluster |
| work | replicated | Propagating invalidations across nodes | Replicated to all nodes |
| actionTokens | distributed | One-time tokens (email links, etc.) | Distributed across the cluster |

Pictured as a diagram:

        +-----------------+   +-----------------+   +-----------------+
        |  Keycloak Pod 1 |   |  Keycloak Pod 2 |   |  Keycloak Pod 3 |
        |                 |   |                 |   |                 |
        | local: realms,  |   | local: realms,  |   | local: realms,  |
        |        users    |   |        users    |   |        users    |
        |                 |   |                 |   |                 |
        | distributed:    |   | distributed:    |   | distributed:    |
        |  sessions(o2)  <----> sessions(o2)   <----> sessions(o2)    |
        |  authSessions   |   |  authSessions   |   |  authSessions   |
        |                 |   |                 |   |                 |
        | replicated:     |   | replicated:     |   | replicated:     |
        |  work          <----> work           <----> work            |
        +--------+--------+   +--------+--------+   +--------+--------+
                 |                     |                     |
                 +----------+----------+----------+----------+
                            |  JDBC               |  DNS query
                            v                     v
                     +--------------+     +--------------+
                     | PostgreSQL   |     | DNS headless |
                     | (session     |     | service      |
                     |  persistence)|     | (DNS_PING)   |
                     +--------------+     +--------------+

        <---->  JGroups cluster traffic between pods (TCP 7800)

Distributed caches default to an owners count of 2, meaning each entry is replicated to two nodes. So losing one node does not lose session data. Losing two nodes at once can lose cache-resident data, but since 26, sessions are also persisted to the DB, making recovery possible.
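The toy model below makes the owners=2 reasoning concrete. It assumes a simplified placement in which each hash segment is owned by one node and by that node's successor on a ring. Infinispan's real consistent-hash algorithm balances segments differently, so treat the numbers as illustrative only. The helper `count_lost_segments` is hypothetical.

```shell
#!/usr/bin/env bash
# Toy model of owners=2 placement (an assumption, NOT Infinispan's real
# consistent-hash algorithm): segment s is owned by node (s % N) and by its
# ring successor ((s + 1) % N).

# count_lost_segments NODES SEGMENTS FAILED_NODE...   (hypothetical helper)
# Prints how many segments lose BOTH owners when the given nodes fail.
count_lost_segments() {
  local nodes=$1 segments=$2
  shift 2
  local -A failed=()
  local n
  for n in "$@"; do
    failed[$n]=1
  done

  local s primary backup lost=0
  for (( s = 0; s < segments; s++ )); do
    primary=$(( s % nodes ))
    backup=$(( (s + 1) % nodes ))
    if [[ -n ${failed[$primary]:-} && -n ${failed[$backup]:-} ]]; then
      lost=$(( lost + 1 ))
    fi
  done
  echo "$lost"
}
```

Take 3 pods and 256 segments. `count_lost_segments 3 256 0` prints 0, so nothing is lost with one node down. `count_lost_segments 3 256 0 1` prints 86, roughly a third of the segments. With persistent user sessions, the affected session entries can be reloaded from the database rather than being lost for good.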

JGroups DNS_PING — Node Discovery on Kubernetes

Infinispan cluster membership is handled by JGroups. Because most Kubernetes networks do not support IP multicast, discovery is DNS-based (DNS_PING). The mechanism is simple:

  1. A headless Service exposes the IPs of all Keycloak pods as DNS A records
  2. Each node queries that DNS name at startup to obtain the peer list
  3. JGroups forms the cluster over port 7800

The Operator configures this automatically, but a manual setup looks like this.

apiVersion: v1
kind: Service
metadata:
  name: keycloak-discovery
  namespace: keycloak
spec:
  clusterIP: None
  publishNotReadyAddresses: true
  selector:
    app: keycloak
  ports:
    - name: jgroups
      port: 7800
      targetPort: 7800

# Keycloak startup options (as StatefulSet/Deployment environment variables)
KC_CACHE=ispn
KC_CACHE_STACK=kubernetes
JAVA_OPTS_APPEND=-Djgroups.dns.query=keycloak-discovery.keycloak.svc.cluster.local

publishNotReadyAddresses is enabled because pods must discover each other and join the cluster before they pass readiness. Without it, a starting pod would find no peers, and session rebalancing during startup would not work correctly. Verify cluster formation in the logs:

kubectl logs -n keycloak keycloak-0 | grep "ISPN000094"
# ISPN000094: Received new cluster view ... (3) [keycloak-0-..., keycloak-1-..., keycloak-2-...]
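Grepping by eye works for a single pod. For a CI or post-deploy check, a small Bash helper can pull the member count out of the most recent ISPN000094 line and compare it with the expected replica count. This sketch assumes the log line keeps the `(N) [members...]` shape shown above. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# cluster_view_size LOG_TEXT   (hypothetical helper)
# Prints the member count "(N)" from the most recent ISPN000094 line, or
# nothing if no cluster view has been logged yet.
cluster_view_size() {
  printf '%s\n' "$1" \
    | grep 'ISPN000094' \
    | tail -n 1 \
    | sed -n 's/.*(\([0-9][0-9]*\)) \[.*/\1/p'
}

# check_cluster_size LOG_TEXT EXPECTED   (hypothetical helper)
# Succeeds only if the latest cluster view has exactly EXPECTED members.
check_cluster_size() {
  local size
  size=$(cluster_view_size "$1")
  [[ -n $size && $size -eq $2 ]]
}
```

For example: `check_cluster_size "$(kubectl logs -n keycloak keycloak-0)" 3 && echo "cluster formed"`.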

Since 26.x, TLS encryption for JGroups traffic is enabled by default (with Operator deployments), so inter-node session data does not flow in plaintext.

Persistent User Sessions — The Game Changer in 26

Through Keycloak 24, online sessions lived purely in memory (Infinispan), and Keycloak 25 offered persistence only as an opt-in preview. A full restart logged out every user, and a simultaneous multi-node failure could log out many of them. Starting with Keycloak 26, the persistent-user-sessions feature is enabled by default, which changes the following:

  • Every user session / client session is written to the DB at creation time.
  • Infinispan is demoted to a hot-data cache role; the source of truth becomes the DB.
  • Users remain logged in even after a full cluster restart.
  • Memory usage drops significantly (no need to hold all sessions in memory).

The price is increased DB write load. Every login, logout, and token refresh incurs a DB write, so factor login-burst scenarios (the 9 a.m. rush) into your DB IOPS sizing. You can turn the behavior off by disabling the persistent-user-sessions feature. However, the Keycloak 26 operational model is designed around persistent sessions, so we recommend keeping the default unless you have a specific reason not to.

Database Selection and Connection Pools

| Aspect | Recommendation | Rationale |
|---|---|---|
| DB engine | PostgreSQL 15+ | Basis of official performance tests; Aurora PostgreSQL validated |
| Isolation level | READ COMMITTED | The default; no change needed |
| Pool size | Around max 30 per node | Oversized pools only add DB load |
| HA | Patroni / RDS Multi-AZ / Aurora | Keep the DB from being a SPOF |
| Pool sizing | Based on peak concurrent requests | Consider login TPS x average queries |

There is no simple connection pool formula, but as a rule of thumb start at "10-15 pool connections per node per 100 login TPS" and adjust with monitoring (agroal metrics). Pool exhaustion translates directly into login failures, not just latency, so alert on db-pool metrics.

# Connection pool section of the Keycloak CR
  db:
    poolMinSize: 10
    poolInitialSize: 10
    poolMaxSize: 30
  additionalOptions:
    - name: transaction-xa-enabled
      value: "false"
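The rule of thumb above is easy to turn into numbers. This Bash sketch assumes the TPS figure is the peak login rate a single node handles, and it rounds up. The function name is hypothetical, and the output is a starting point for load testing, not a recommendation.

```shell
#!/usr/bin/env bash
# pool_range LOGIN_TPS_PER_NODE   (hypothetical helper)
# Applies the "10-15 connections per 100 login TPS" rule of thumb to one node
# and prints "LOW HIGH" as a candidate poolMaxSize range (rounded up).
pool_range() {
  local tps=$1
  local low=$(( (tps * 10 + 99) / 100 ))
  local high=$(( (tps * 15 + 99) / 100 ))
  echo "$low $high"
}
```

`pool_range 200` prints `20 30`. The upper end lines up with the poolMaxSize of 30 in the CR snippet above.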

Are Sticky Sessions Necessary?

Bottom line: not mandatory as of 26, but still beneficial.

  • authenticationSessions (login progress state) live in a distributed cache, so any node can handle any request.
  • However, consistently routing to the same node increases the probability of hitting the owner node directly, reducing inter-node RPCs and improving latency.
  • Keycloak can append a node route to the AUTH_SESSION_ID cookie for route-aware load balancers. With ingress-nginx, the simpler option is cookie-based session affinity: the load balancer issues its own cookie (KC_ROUTE below) to pin each browser to one pod.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: keycloak
  namespace: keycloak
  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "KC_ROUTE"
    nginx.ingress.kubernetes.io/proxy-buffer-size: "128k"
spec:
  ingressClassName: nginx
  rules:
    - host: sso.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: keycloak-service
                port:
                  number: 8080
  tls:
    - hosts: [sso.example.com]
      secretName: sso-tls

Raising proxy-buffer-size is an important field-tested tip. Keycloak response headers (especially token-bearing redirects) commonly exceed nginx's default proxy buffer, which surfaces as 502 errors.

Multi-Site (Cross-DC) Active-Active

The official multi-site architecture in 26.x supports two-site Active-Active. The key components:

        Site A (eu-west-1)                  Site B (eu-central-1)
   +------------------------+         +------------------------+
   |  Keycloak (3 pods)     |         |  Keycloak (3 pods)     |
   |        |               |         |        |               |
   |  Infinispan (external) | <-----> |  Infinispan (external) |
   |  cross-site replication|  RELAY2 |  cross-site replication|
   +-----------+------------+         +-----------+------------+
               |                                  |
               +----------------+-----------------+
                                |
                     +----------v-----------+
                     |  Aurora Global DB    |
                     |  (writer: Site A)    |
                     +----------------------+
                                ^
               +----------------+----------------+
               |     Global LB (Route53 /        |
               |     health-check based failover)|
               +---------------------------------+

  • Session synchronization: cross-site replication (RELAY2) of an external Infinispan cluster
  • DB: a single-writer global database such as Aurora Global Database
  • Routing: a global LB distributes traffic to both sites based on health checks
  • Thanks to persistent user sessions, recovery via the DB is possible even when cross-site cache sync fails

Multi-site carries very high operational complexity, so adopt it only when your RTO/RPO requirements genuinely demand it; first evaluate whether a single region with multiple AZs plus robust backup/restore procedures is sufficient. See the official HA guide for details.

The 26.6 Zero-Downtime Rolling Patch

Before 26.6, any version upgrade defaulted to a recreate strategy (shutting down the whole cluster and bringing it back up) because cache protocols might be incompatible between versions. Starting with 26.6, compatibility between patch releases (e.g., 26.6.0 to 26.6.2) is guaranteed, and rolling updates are officially supported.

# Keycloak CR
spec:
  update:
    strategy: Auto   # auto-detect compatibility: rolling if possible, otherwise recreate

The Auto strategy works as follows.

  1. The Operator runs an update-compatibility check job against the new image
  2. If caches/config are compatible, pods are replaced one at a time (zero downtime)
  3. If incompatible, recreate (full restart; logins survive thanks to persistent sessions)

You can also check compatibility manually.

# Generate metadata on the current version
bin/kc.sh update-compatibility metadata --file=/tmp/metadata.json

# Check on the new version
bin/kc.sh update-compatibility check --file=/tmp/metadata.json
echo $?   # 0 means rolling is possible
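In a CI/CD pipeline you may want to branch on that exit code instead of reading it by hand. The Bash sketch below assumes only what is stated above: exit code 0 means a rolling update is possible. It treats any other code as "fall back to recreate and investigate." `choose_update_strategy` is a hypothetical helper, and in the tests, stand-in commands replace the real `kc.sh` call.

```shell
#!/usr/bin/env bash
# choose_update_strategy CMD [ARGS...]   (hypothetical helper)
# Runs the compatibility check, sends the check's own output to stderr, and
# prints "rolling" on exit code 0; any other code is reported as "recreate (...)".
choose_update_strategy() {
  if "$@" >&2; then
    echo "rolling"
  else
    echo "recreate (check exited with $?)"
  fi
}

# Example, run inside the NEW image with metadata generated by the old one:
# choose_update_strategy bin/kc.sh update-compatibility check --file=/tmp/metadata.json
```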

Resource Sizing and JVM Tuning

Starting points based on the official sizing guide:

| Load metric | Starting value | Notes |
|---|---|---|
| Password logins | Around 15 per second per vCPU | Heavily dependent on hash cost (argon2) |
| Client credentials grants | Around 120 per second per vCPU | The lightest operation |
| Refresh tokens | Around 120 per second per vCPU | Includes DB writes |
| Memory (incl. non-heap) | 1.25-3Gi per pod | Realm/client count matters more than session count |
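The per-vCPU figures combine linearly, so a rough cluster-wide CPU budget is a weighted sum. The Bash sketch below uses exactly the table's numbers (15, 120 and 120 per second per vCPU), plus a headroom percentage that you choose yourself. The headroom value and the function name are assumptions, and results should be confirmed by load testing.

```shell
#!/usr/bin/env bash
# estimate_cpu_millicores LOGINS_PER_S CLIENT_CREDENTIALS_PER_S REFRESHES_PER_S [HEADROOM_PCT]
# (hypothetical helper) Cluster-wide CPU estimate in millicores from the table.
estimate_cpu_millicores() {
  local logins=$1 ccg=$2 refreshes=$3 headroom=${4:-0}
  # 15 logins/s per vCPU equals 8 logins per 120, so all three terms share a
  # common denominator of 120 requests/s per vCPU.
  local weighted=$(( 8 * logins + ccg + refreshes ))
  local base=$(( (weighted * 1000 + 119) / 120 ))    # round up to whole millicores
  echo $(( (base * (100 + headroom) + 99) / 100 ))   # add headroom, round up
}
```

`estimate_cpu_millicores 45 120 240` prints 6000, i.e. 45/15 + 120/120 + 240/120 = 6 vCPU. With 50% headroom, `estimate_cpu_millicores 45 120 240 50` prints 9000, which you would then spread across your pods.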

Since 26, JVM memory defaults to ratio-based sizing from container memory (70% heap by default). To control it explicitly:

  # Heap sizing is a JVM setting, not a Keycloak option, so it goes in the
  # JAVA_OPTS_KC_HEAP environment variable rather than additionalOptions:
  # JAVA_OPTS_KC_HEAP: "-XX:MaxRAMPercentage=70 -XX:InitialRAMPercentage=50"
  resources:
    requests:
      cpu: "1"
      memory: 1500Mi
    limits:
      memory: 3Gi

The common recommendation is not to set a CPU limit (avoiding latency spikes from throttling). Set the memory limit with headroom above heap + metaspace + native to prevent OOMKills.
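A quick way to sanity-check the memory limit is to compute what a 70% MaxRAMPercentage leaves for everything that is not heap: metaspace, thread stacks, direct buffers and other native memory. The Bash sketch below is approximate. The JVM applies the percentage to the container memory limit it detects, and its exact rounding may differ slightly. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# heap_budget LIMIT_MIB MAX_RAM_PCT   (hypothetical helper)
# Prints "HEAP_MIB NON_HEAP_MIB": the approximate max heap derived from the
# container memory limit, and what remains for metaspace and native memory.
heap_budget() {
  local limit=$1 pct=$2
  local heap=$(( limit * pct / 100 ))
  echo "$heap $(( limit - heap ))"
}
```

For the 3Gi limit used above, `heap_budget 3072 70` prints `2150 922`: about 2.1 GiB of heap and roughly 0.9 GiB for everything else.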

Health Checks and the Startup Probe

Keycloak serves health endpoints on the management port (9000 by default) once health-enabled is turned on; the Operator does this for you.

# When configuring a Deployment/StatefulSet directly
livenessProbe:
  httpGet:
    path: /health/live
    port: 9000
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /health/ready
    port: 9000
  periodSeconds: 10
  failureThreshold: 3
startupProbe:
  httpGet:
    path: /health/started
    port: 9000
  periodSeconds: 5
  failureThreshold: 60   # up to 5 minutes of startup grace

  • started: signals startup completion. Dedicated to the startup probe; give failureThreshold generous headroom for upgrades with long migrations.
  • ready: includes DB connectivity. Be aware that a brief DB blip can flip all pods to not-ready simultaneously, looking like a full outage.
  • live: process survival itself. Failures trigger restarts, so configure conservatively.
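The startup grace period is simply periodSeconds × failureThreshold (5 s × 60 = 300 s above). When you expect a long migration during an upgrade, you can work backwards from the expected startup time. The Bash sketch below does that with a safety margin you pick yourself. The default 50% margin and the function name are assumptions, not Keycloak defaults.

```shell
#!/usr/bin/env bash
# startup_failure_threshold EXPECTED_STARTUP_S PERIOD_S [MARGIN_PCT]
# (hypothetical helper) Prints the smallest failureThreshold whose window
# (PERIOD_S * threshold) covers the expected startup time plus a margin.
startup_failure_threshold() {
  local expected=$1 period=$2 margin=${3:-50}
  local budget=$(( (expected * (100 + margin) + 99) / 100 ))   # seconds, rounded up
  echo $(( (budget + period - 1) / period ))                    # ceil(budget / period)
}
```

If an upgrade migration is expected to take about 8 minutes, `startup_failure_threshold 480 5` prints 144, which gives a 720-second window with the 50% margin.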

Additionally, PodDisruptionBudget and topologySpreadConstraints are HA fundamentals.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: keycloak-pdb
  namespace: keycloak
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: keycloak
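With every pod healthy, the number of voluntary evictions a PDB allows at once is replicas minus minAvailable. The Bash sketch below uses a hypothetical helper that prints that number. It fails when the number is zero, which is the configuration that makes node drains hang (for example, minAvailable: 2 with only 2 replicas).

```shell
#!/usr/bin/env bash
# pdb_allowed_disruptions REPLICAS MIN_AVAILABLE   (hypothetical helper)
# Prints how many pods may be evicted at once when all replicas are healthy.
# Returns non-zero when the answer is 0, i.e. node drains would block.
pdb_allowed_disruptions() {
  local replicas=$1 min_available=$2
  local allowed=$(( replicas - min_available ))
  (( allowed < 0 )) && allowed=0
  echo "$allowed"
  (( allowed > 0 ))
}
```

With the 3-pod setup in this article it prints 1, so drains proceed one pod at a time.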

Failure Scenario Playbook

Scenario 1: One node down

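  • Symptoms: almost none. Sessions survive because each distributed-cache entry has two owners (owners=2), and the load balancer redistributes traffic to the remaining pods.
  • Response: confirm that the pod is recreated automatically, then check the logs (ISPN000094) to verify that it rejoined the JGroups cluster view.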

Scenario 2: DB blip (30-second failover)

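  • Symptoms: readiness fails on all nodes, so logins and token issuance fail entirely. Resource servers that validate already-issued tokens locally (JWT signature verification) are barely affected. Anything that calls Keycloak at request time, such as token introspection, fails too.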
  • Response: verifying DB failover automation is the top priority. Keycloak recovers automatically when the DB returns, so pod restarts are unnecessary. Rather, take care not to configure liveness so aggressively that you trigger a restart storm.

Scenario 3: Split brain (network partition)

  • Symptoms: the cluster splits into two groups, each forming its own view. Brute-force counters/sessions may diverge.
  • Response: the 26 defaults recover via a MERGE event when partitions heal. Thanks to persistent sessions, session data converges on the DB. If partitions are frequent, the root fix is inspecting CNI/node networking.

Scenario 4: Full restart (disaster recovery)

  • Symptoms: before 26 this meant logging out every user; with 26+ sessions restore from the DB and logins survive.
  • Response: for the worst case, where you must restore from DB backups, also keep separate realm exports so that configuration and data are backed up twice.

# Periodic realm export (automating with a CronJob is recommended)
bin/kc.sh export --dir /tmp/export --realm production --users different_files

Scenario 5: Login burst (morning rush spike)

  • Symptoms: CPU saturation; password hashing dominates.
  • Response: scale horizontally with an HPA. Because password hashing dominates CPU, adding pods helps directly. However, recompute pool maximums so that the total number of DB connections across all pods stays below the DB limit.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: keycloak-hpa
  namespace: keycloak
spec:
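  scaleTargetRef:
    # With the Operator, target the Keycloak CR (it exposes a scale subresource);
    # scaling its StatefulSet directly would be reverted to spec.instances.
    apiVersion: k8s.keycloak.org/v2alpha1
    kind: Keycloak
    name: keycloak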
  minReplicas: 3
  maxReplicas: 8
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
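Before raising maxReplicas, check the worst case the HPA can produce: every replica opening its full pool. The Bash sketch below compares maxReplicas × poolMaxSize with the database's connection limit minus a reserve you keep for admin and replication sessions. The reserve value and the helper name are assumptions. For reference, PostgreSQL's default max_connections is 100.

```shell
#!/usr/bin/env bash
# db_connection_budget MAX_REPLICAS POOL_MAX DB_MAX_CONNECTIONS [RESERVED]
# (hypothetical helper) Prints the worst-case Keycloak connection count and the
# usable DB limit, and returns non-zero if the worst case does not fit.
db_connection_budget() {
  local replicas=$1 pool_max=$2 db_max=$3 reserved=${4:-10}
  local worst=$(( replicas * pool_max ))
  local usable=$(( db_max - reserved ))
  echo "worst-case=$worst usable=$usable"
  (( worst <= usable ))
}
```

With the HPA above (maxReplicas 8) and poolMaxSize 30, `db_connection_budget 8 30 100` prints `worst-case=240 usable=90` and fails. The fix is either to raise max_connections on the database or to lower poolMaxSize or maxReplicas.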

Conclusion

In the Keycloak 26 era, HA operations have simplified from "somehow appeasing Infinispan" to "operating an ordinary stateful service centered on the DB." Persistent user sessions and the zero-downtime rolling patch are the turning points. Even so, fundamentals like JGroups discovery, connection pool sizing, and probe tuning remain the operator's responsibility. Use the YAML examples in this article as starting points, and always validate the numbers with load tests in your own environment.

In the next article, we cover SPI development (custom Authenticators, EventListeners) — extending Keycloak's functionality itself.

References