Refresh Token Rotation and Session Management: Designing a Theft-Resistant Token Lifecycle
Introduction
The hardest question in an authentication system is not "how do we log users in" but "**how long, and how, do we keep them logged in**". Long token lifetimes improve user experience but magnify theft damage; short lifetimes are safer but unleash re-login hell. The standard tool for resolving this tension is the refresh token, and its safety mechanism is rotation with reuse detection.
In 2026 this topic matters even more. The OAuth 2.1 draft mandates rotation or sender-constraining for public clients' refresh tokens, and RFC 9700 (the OAuth Security BCP) spells out the corresponding recommendations. Meanwhile, as passkeys become the default authentication method, the weak point is shifting: "the login step is now secure, but sessions and tokens are the weak link". Attackers target session cookies and refresh tokens instead of passwords, and these are exactly what infostealer malware steals.
This article covers token lifetime design principles, the inner workings of the rotation mechanism, the three-layer model of IdP/app/SSO sessions, the details of Keycloak's session settings, and the response procedures when a token theft incident occurs.
Token Lifetime Design Principles
The division of labor between the two tokens
┌──────────────────────────────────────────────────────────┐
│ Access Token (AT) │
│ - Purpose: authorizing API calls │
│ - Lifetime: 5-15 minutes (keep it short!) │
│ - Verification: stateless (signature only, no IdP call) │
│ - If stolen: damage limited to its lifetime. │
│ Assume it cannot be revoked │
├──────────────────────────────────────────────────────────┤
│ Refresh Token (RT) │
│ - Purpose: obtaining new ATs (used only at the IdP │
│ token endpoint) │
│ - Lifetime: idle hours-weeks, max weeks-months │
│ - Verification: stateful (IdP checks session/family) │
│ - If stolen: detected/blocked via rotation + │
│ reuse detection │
└──────────────────────────────────────────────────────────┘
The design principles boil down to four points.
1. **Assume the AT "cannot be revoked" and limit damage via lifetime.** If an AT lives 5 minutes, the damage window after theft is at most 5 minutes.
2. **Assume the RT "will be stolen" and attach a detection mechanism.** Rotation + reuse detection is that detector.
3. **Scale lifetimes with risk.** There is no reason a financial service's RT and an internal wiki's RT should have the same lifetime.
4. **Separate idle and max.** "Expires after this much inactivity (idle)" and "expires at this point no matter what (max)" are different controls.
Recommended starting values
| Service type | AT | RT idle | RT max | Notes |
| --- | --- | --- | --- | --- |
| Finance/payments | 5 min | 30 min | 8 h | Frequent re-authentication |
| General B2C web/app | 10-15 min | 14 days | 90 days | Rotation mandatory |
| Internal business systems | 10 min | 8 h | 24 h | Working-day unit |
| Background sync (offline) | 10 min | 30 days | 180 days | Offline tokens, separate audit |
These values are starting points, not answers. The point is that you must be able to explain "why each value is what it is".
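The difference between idle and max is easiest to see as arithmetic. The following is a minimal sketch, assuming a session record that stores the login time and the time of the last refresh; the function and field names are hypothetical and not taken from any library.

```javascript
// Illustrative helper; the function and field names are hypothetical.
const MINUTE = 60 * 1000;
const HOUR = 60 * MINUTE;

// Returns the epoch-ms instant at which the refresh token stops being usable.
// idle is measured from the last refresh, max from the initial authentication.
function refreshTokenExpiry({ authenticatedAt, lastUsedAt }, { idleMs, maxMs }) {
  const idleDeadline = lastUsedAt + idleMs;
  const maxDeadline = authenticatedAt + maxMs;
  return Math.min(idleDeadline, maxDeadline);
}

function isRefreshAllowed(session, policy, now) {
  return now < refreshTokenExpiry(session, policy);
}

// Finance/payments row from the table: idle 30 min, max 8 h.
const financePolicy = { idleMs: 30 * MINUTE, maxMs: 8 * HOUR };

// A user who logged in at t=0 and last refreshed at 7 h 50 min.
const activeSession = { authenticatedAt: 0, lastUsedAt: 7 * HOUR + 50 * MINUTE };

// Inside both limits.
console.log(isRefreshAllowed(activeSession, financePolicy, 7 * HOUR + 55 * MINUTE)); // true
// The max cap wins even though the user was active 15 minutes earlier.
console.log(isRefreshAllowed(activeSession, financePolicy, 8 * HOUR + 5 * MINUTE)); // false
```

The idle deadline moves forward with every refresh, while the max deadline is fixed at login, which is why the two must be configured and explained separately.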
The Rotation + Reuse Detection Mechanism
Basic operation
Rotation means issuing a new RT on every use and invalidating the previous one. The RT becomes a single-use ticket.
| Time | Client | IdP |
| --- | --- | --- |
| t0 | refresh with RT1 | issue AT2 + RT2, mark RT1 "used" |
| t1 | refresh with RT2 | issue AT3 + RT3, mark RT2 "used" |
| t2 | refresh with RT3 | issue AT4 + RT4, ... |
Token families and reuse detection
The lineage of RTs derived from the same initial authentication is called a **token family**. Reuse detection operates at the family level.
Family F1: RT1 ──> RT2 ──> RT3 (currently valid)
Theft scenario A: the attacker uses it first
1. Attacker steals RT2 and refreshes → obtains RT3' (attacker is newest)
2. The legitimate user tries to refresh with RT2
3. IdP: "RT2 was already used" → reuse detected!
4. The entire family F1 is invalidated (RT3' dies too)
5. Both sides are logged out → the user re-authenticates,
the attacker is blocked
Theft scenario B: the legitimate user uses it first
1. Attacker steals RT2 (not yet used)
2. The user refreshes with RT2 → RT3 issued
3. The attacker tries RT2 → reuse detected! → family invalidated
Either way, the moment a "twice-used RT" is detected,
the whole family dies — that is the essence.
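The beauty of this mechanism is that **you never need to distinguish the attacker from the legitimate user.** Without knowing who the thief is, you invalidate everything on collision and demand re-authentication. The legitimate user suffers a minor inconvenience (re-login); the attacker loses the stolen family and must steal a fresh token to get back in. Note that detection fires only when the second party presents the already-used RT: in scenario A the attacker keeps access until the legitimate client next refreshes, which is why the max lifetime still matters even with rotation enabled.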
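The family logic can be sketched in a few lines. This is a minimal in-memory model, assuming a single IdP process; the class and method names are hypothetical, and the counter-based token values are placeholders for the long random strings a real IdP would issue.

```javascript
// Minimal in-memory sketch of rotation + reuse detection (hypothetical names).
class RefreshTokenStore {
  constructor() {
    this.tokens = new Map(); // token value -> { familyId, used }
    this.revokedFamilies = new Set();
    this.counter = 0;
  }

  // Placeholder token values: a real IdP must issue long, cryptographically random tokens.
  issue(familyId) {
    const token = `rt-${familyId}-${++this.counter}`;
    this.tokens.set(token, { familyId, used: false });
    return token;
  }

  // Called after a successful login: the first RT of a new family.
  startFamily(familyId) {
    return this.issue(familyId);
  }

  refresh(presentedToken) {
    const record = this.tokens.get(presentedToken);
    if (!record || this.revokedFamilies.has(record.familyId)) {
      return { ok: false, reason: "invalid_grant" };
    }
    if (record.used) {
      // A used RT came back: we cannot tell who is who, so the whole family dies.
      this.revokedFamilies.add(record.familyId);
      return { ok: false, reason: "reuse_detected" };
    }
    record.used = true; // the RT is a single-use ticket
    return { ok: true, refreshToken: this.issue(record.familyId) };
  }
}

// Theft scenario A: the attacker uses the stolen RT2 first.
const store = new RefreshTokenStore();
const rt1 = store.startFamily("F1");
const rt2 = store.refresh(rt1).refreshToken;
const attacker = store.refresh(rt2);              // attacker gets a newer RT
const user = store.refresh(rt2);                  // legitimate user presents RT2 again
const attackerAgain = store.refresh(attacker.refreshToken);

console.log(attacker.ok);          // true
console.log(user.reason);          // "reuse_detected"
console.log(attackerAgain.reason); // "invalid_grant"
```

Swapping the order of the attacker's and the user's calls reproduces scenario B with the same outcome: the second presentation of RT2 revokes the family.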
Network errors and the grace period
In reality, a legitimate client can also send an RT "twice": it retries after failing to receive the response to a refresh request. To tolerate this, many implementations allow a short grace period (a duplicate-transmission window, usually seconds to tens of seconds). Keycloak expresses the tolerance as a count rather than a time window: with `revokeRefreshToken` enabled, `refreshTokenMaxReuse` (default 0) sets how many times an already-used RT may be presented again. Zero is the safest; making the client's retry logic idempotent is the proper path.
Keycloak: rotation + zero reuse allowed (strictest)
kcadm.sh update realms/myrealm \
-s revokeRefreshToken=true \
-s refreshTokenMaxReuse=0
The Three-Layer Session Model — IdP Session, App Session, SSO Session
If you look only at tokens, you see only half of session management. In a real system, three kinds of "logged-in state" coexist.
┌───────────────────────────────────────────────────────────────┐
│ Three-layer session model │
│ │
│ [Browser] │
│ ├── App session A (session cookie of app-a.example.com) │
│ ├── App session B (session cookie of app-b.example.com) │
│ └── IdP session (SSO cookie of idp.example.com) │
│ │ │
│ ▼ │
│ [IdP server] │
│ └── SSO session (server-side state: the ledger of which │
│ user authenticated to which clients, and when) │
│ ├── Client Session A (token issuance record for app-a) │
│ └── Client Session B (token issuance record for app-b) │
└───────────────────────────────────────────────────────────────┘
| Layer | Stored at | Lifetime owner | Effect of expiry |
| --- | --- | --- | --- |
| App session | Each app's cookie/session store | The app | Logout from that app only |
| IdP session (SSO cookie) | IdP domain cookie | The IdP | No new SSO logins; existing app sessions may survive |
| SSO session (server state) | IdP server/DB | The IdP | RT refresh fails; no new token issuance |
The most common confusion in practice is "I logged out — why am I still logged in to the other app?" The answer: these three layers expire independently. Logging out of app A (deleting app session A) still leaves the IdP session alive, so revisiting app A silently re-authenticates. Conversely, killing the IdP session does not stop app B if its own session cookie is alive. Cleaning this up requires the logout propagation discussed below.
Keycloak Session Settings in Detail
Keycloak implements this model as SSO Sessions, Client Sessions, and Offline Sessions, configured under the Sessions and Tokens tabs of Realm Settings.
SSO Session
SSO Session Idle: default 30 minutes
- The session expires if there is no session activity
(token refresh, etc.) for this duration
- Determines the effective idle lifetime of RTs
(RT lifetime = min(this value, client session value))
SSO Session Max: default 10 hours
- Regardless of activity, the session expires this long after
the initial authentication
- The basis of "no matter how diligently you refresh,
you re-login after 10 hours"
Client Session
Client Session Idle / Client Session Max: default 0 (= inherits SSO values)
- Use when you want to force shorter token lifetimes per client
- Example: realm-wide idle is 30 days, but the payment client
gets idle 30 minutes
- Can also be overridden per client under Advanced Settings
An important behavior: a refresh token's expiry is bound to **whichever ends first — the SSO session or the client session**. In Keycloak, RT lifetime is not an independent setting but a derivative of session lifetimes. Not knowing this leads to mysteries like "I increased the RT lifetime but it does not take effect".
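The "derivative of session lifetimes" rule can be written down as a small calculation. This is a simplified model of the behavior described above, not Keycloak source code: the function names are hypothetical, and it measures the client session max from login, which glosses over details of the real implementation.

```javascript
// Simplified model of the rule described above; not Keycloak source code.
// All values are in seconds, matching the realm settings.

// A client-level value of 0 means "not set": inherit the realm (SSO) value.
// A non-zero client value can only shorten the limit, never extend it.
function effectiveTimeout(clientValue, ssoValue) {
  return clientValue > 0 ? Math.min(clientValue, ssoValue) : ssoValue;
}

// Seconds until a refresh token issued right now would expire.
function refreshTokenLifetime(settings, secondsSinceLogin) {
  const idle = effectiveTimeout(settings.clientSessionIdle, settings.ssoSessionIdle);
  const max = effectiveTimeout(settings.clientSessionMax, settings.ssoSessionMax);
  const remainingUntilMax = Math.max(0, max - secondsSinceLogin);
  return Math.min(idle, remainingUntilMax);
}

// Realm defaults: SSO idle 30 min, SSO max 10 h, no client overrides.
const defaults = { ssoSessionIdle: 1800, ssoSessionMax: 36000, clientSessionIdle: 0, clientSessionMax: 0 };
console.log(refreshTokenLifetime(defaults, 0));     // 1800: bounded by idle
console.log(refreshTokenLifetime(defaults, 35000)); // 1000: bounded by what is left of max

// Realm-wide idle of 30 days, payment client overridden to 30 minutes.
const paymentClient = { ssoSessionIdle: 30 * 86400, ssoSessionMax: 90 * 86400, clientSessionIdle: 1800, clientSessionMax: 0 };
console.log(refreshTokenLifetime(paymentClient, 0)); // 1800
```

The model also shows why raising a client value above the SSO value has no effect: the shorter limit always wins.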
Offline Session
A separate track for cases like background sync, where token refresh must continue after the user closes the browser. Requesting a scope that includes offline_access yields an offline token.
Offline Session Idle: default 30 days
- Using the offline RT at least once every 30 days keeps extending it
Offline Session Max Limited: when enabled, applies an absolute cap
Offline Session Max: default 60 days
- Absolute expiry regardless of refreshes
Offline tokens are managed separately from regular SSO sessions and survive user logout. Precisely because they are powerful, restrict who can obtain them (control client scopes) and audit them periodically via per-user Consents/Sessions in the Admin Console.
Token lifetime settings
Access Token Lifespan: default 5 minutes (keep it within the 5-15 minute range)
Access Token Lifespan For Implicit: legacy, ignore
Client Login Timeout: allowed time for code → token exchange
An example configuration via kcadm:
kcadm.sh update realms/myrealm \
-s accessTokenLifespan=600 \
-s ssoSessionIdleTimeout=1209600 \
-s ssoSessionMaxLifespan=7776000 \
-s offlineSessionIdleTimeout=2592000 \
-s offlineSessionMaxLifespanEnabled=true \
-s offlineSessionMaxLifespan=15552000 \
-s revokeRefreshToken=true \
-s refreshTokenMaxReuse=0
(The example above: AT 10 min, SSO idle 14 days, SSO max 90 days, offline idle 30 days, offline max 180 days, rotation enabled.)
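Raw second counts such as 7776000 are easy to mistype, so it can help to derive them from named units. The sketch below rebuilds the same settings; the `toKcadmArgs` helper is hypothetical and only formats `-s key=value` pairs, it does not call Keycloak.

```javascript
// Derive the timeout values from named units instead of typing raw seconds.
const MINUTE = 60;
const HOUR = 60 * MINUTE;
const DAY = 24 * HOUR;

const realmSettings = {
  accessTokenLifespan: 10 * MINUTE,
  ssoSessionIdleTimeout: 14 * DAY,
  ssoSessionMaxLifespan: 90 * DAY,
  offlineSessionIdleTimeout: 30 * DAY,
  offlineSessionMaxLifespanEnabled: true,
  offlineSessionMaxLifespan: 180 * DAY,
  revokeRefreshToken: true,
  refreshTokenMaxReuse: 0,
};

// Hypothetical helper: formats the settings as kcadm "-s key=value" arguments.
function toKcadmArgs(settings) {
  return Object.entries(settings).map(([key, value]) => `-s ${key}=${value}`);
}

// Prints a command equivalent to the one shown above.
console.log(["kcadm.sh update realms/myrealm", ...toKcadmArgs(realmSettings)].join(" \\\n  "));
```

Keeping the values in a reviewed source file also gives each timeout a place for a comment explaining why it was chosen.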
Per-Device Session Management
A single user logs in from multiple devices concurrently. Each login should be an independent SSO session (and therefore an independent token family).
Session list for user alice (IdP view)
┌────────────┬───────────────┬─────────────┬──────────────┐
│ Session ID │ Device │ Started │ Last activity│
├────────────┼───────────────┼─────────────┼──────────────┤
│ sess-a1 │ MacBook/Chrome│ 06-10 09:12 │ 06-12 11:40 │
│ sess-b2 │ iPhone/app │ 06-08 20:01 │ 06-12 08:15 │
│ sess-c3 │ Work PC/Edge │ 06-11 08:55 │ 06-11 18:02 │
└────────────┴───────────────┴─────────────┴──────────────┘
The operational capabilities this structure provides:
1. **Session listing**: a "devices logged in to my account" screen. Let users terminate suspicious sessions themselves.
2. **Individual invalidation**: terminate only the lost phone's session. Other devices are unaffected.
3. **The unit of anomaly detection**: if RTs from the same family are used from IPs in different countries, you can quarantine just that session.
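The third capability can be sketched as a pure function over refresh events. This is a minimal illustration, assuming each event already carries a session ID and a country resolved from the client IP by some GeoIP lookup that is outside this sketch; the event shape and function name are hypothetical.

```javascript
// Flags sessions whose refreshes came from different countries within a short window.
// Hypothetical event shape: { sessionId, country, at } with `at` in epoch milliseconds.
function findSuspiciousSessions(refreshEvents, windowMs) {
  const bySession = new Map();
  for (const event of refreshEvents) {
    if (!bySession.has(event.sessionId)) bySession.set(event.sessionId, []);
    bySession.get(event.sessionId).push(event);
  }

  const suspicious = [];
  for (const [sessionId, events] of bySession) {
    events.sort((a, b) => a.at - b.at);
    // Comparing neighbours is enough: any two events from different countries inside
    // the window imply a neighbouring pair that also differs and is at least as close.
    for (let i = 1; i < events.length; i++) {
      const gap = events[i].at - events[i - 1].at;
      if (events[i].country !== events[i - 1].country && gap <= windowMs) {
        suspicious.push(sessionId);
        break;
      }
    }
  }
  return suspicious;
}

const HOUR_MS = 60 * 60 * 1000;
const events = [
  { sessionId: "sess-a1", country: "KR", at: 0 },
  { sessionId: "sess-a1", country: "US", at: 10 * 60 * 1000 }, // 10 minutes later, other country
  { sessionId: "sess-b2", country: "KR", at: 0 },
  { sessionId: "sess-b2", country: "KR", at: 2 * HOUR_MS },
  { sessionId: "sess-c3", country: "KR", at: 0 },
  { sessionId: "sess-c3", country: "JP", at: 72 * HOUR_MS }, // plausible travel
];

console.log(findSuspiciousSessions(events, HOUR_MS)); // [ 'sess-a1' ]
```

Because the result is a list of session IDs, the response can stay narrow: quarantine or terminate only those sessions and leave the user's other devices alone.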
Sessions can be inspected and terminated via the Keycloak Admin REST API.
List a user's sessions
kcadm.sh get users/USER-UUID/sessions -r myrealm
Terminate a specific session only
kcadm.sh delete sessions/SESSION-ID -r myrealm
Terminate all of a user's sessions (logout from all devices)
kcadm.sh create users/USER-UUID/logout -r myrealm
Logout and Session Cleanup
Logout is not "deleting a cookie" — it is a **distributed transaction that consistently cleans up the three session layers and the token family**. A complete logout must perform all of the following.
1. Delete the app session (the app's own session cookie/store)
2. End the IdP session (OIDC RP-Initiated Logout: /logout endpoint)
3. Revoke the RT (token revocation: RFC 7009 /revoke)
4. Propagate to other apps (Back-Channel Logout per the OIDC spec)
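Steps 2 and 3 map onto two standard requests. The sketch below only builds them and sends nothing; it assumes the endpoint URLs were read from the IdP's discovery document, and the `buildLogoutRequests` name and the example URLs are hypothetical. Client authentication for the revocation call is omitted, and a confidential client would add it.

```javascript
// Builds the revocation request (RFC 7009) and the RP-Initiated Logout redirect URL.
// Nothing is sent here; the caller performs the POST and the browser redirect.
function buildLogoutRequests({ endSessionEndpoint, revocationEndpoint, clientId, idToken, refreshToken, postLogoutRedirectUri }) {
  // RFC 7009: form-encoded POST carrying the token and an optional type hint.
  const revokeBody = new URLSearchParams({
    token: refreshToken,
    token_type_hint: "refresh_token",
    client_id: clientId, // public-client style; confidential clients authenticate instead
  });

  // OIDC RP-Initiated Logout: id_token_hint identifies the session to end.
  const logoutUrl = new URL(endSessionEndpoint);
  logoutUrl.searchParams.set("id_token_hint", idToken);
  logoutUrl.searchParams.set("client_id", clientId);
  logoutUrl.searchParams.set("post_logout_redirect_uri", postLogoutRedirectUri);

  return {
    revoke: {
      method: "POST",
      url: revocationEndpoint,
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body: revokeBody.toString(),
    },
    logoutRedirect: logoutUrl.toString(),
  };
}

// Example values are placeholders.
const requests = buildLogoutRequests({
  endSessionEndpoint: "https://idp.example.com/realms/myrealm/protocol/openid-connect/logout",
  revocationEndpoint: "https://idp.example.com/realms/myrealm/protocol/openid-connect/revoke",
  clientId: "app-a",
  idToken: "ID.TOKEN.VALUE",
  refreshToken: "REFRESH.TOKEN.VALUE",
  postLogoutRedirectUri: "https://app-a.example.com/bye",
});

console.log(requests.revoke.body);
console.log(requests.logoutRedirect);
```

One reasonable ordering is to destroy the app session and send the revocation request first, then redirect the browser to the logout URL, since the redirect leaves the app.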
There are two propagation mechanisms.
| Mechanism | Operation | Trade-offs |
| --- | --- | --- |
| Front-Channel Logout | The browser visits each app's logout URL via iframes | Simple to implement / browser-dependent, breaks under third-party cookie blocking |
| Back-Channel Logout | The IdP POSTs a logout token (JWT) directly to each app server | Highly reliable, recommended in 2026 / the app must maintain a session-sid mapping |
The receiving app's responsibilities in Back-Channel Logout matter. After verifying the logout token's signature and claims (iss, aud, events, sid), the app must **actually destroy the app session** corresponding to the sid. If you did not store this mapping (IdP sid → app session ID) at login time, there is nothing you can do when the propagation arrives.
Back-Channel Logout flow
IdP ── POST logout_token(JWT, sid=xyz) ──> App B's /backchannel-logout
                                            ├ verify signature/claims
                                            ├ sid=xyz → find app session s-42
                                            └ destroy s-42, respond 200 OK
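The receiving side can be sketched as follows. This is a minimal illustration that assumes the logout token's signature has already been verified by a JWT library and that the decoded claims are passed in; the `SidIndex` class and the function names are hypothetical. It handles sid-based logout only. The checks follow the OpenID Connect Back-Channel Logout specification, which defines HTTP 200 as the success response and asks OPs to tolerate 204 as well.

```javascript
const LOGOUT_EVENT = "http://schemas.openid.net/event/backchannel-logout";

// Validates the claims of an already signature-verified logout token.
// Returns null when valid, otherwise a short error description.
function validateLogoutClaims(claims, { issuer, clientId }) {
  const audiences = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  if (claims.iss !== issuer) return "unexpected iss";
  if (!audiences.includes(clientId)) return "aud does not include this client";
  if (typeof claims.iat !== "number") return "missing iat";
  if (!claims.jti) return "missing jti";
  const event = claims.events && claims.events[LOGOUT_EVENT];
  if (typeof event !== "object" || event === null) return "missing logout event";
  if (!claims.sid && !claims.sub) return "missing sid and sub";
  if ("nonce" in claims) return "nonce must not be present";
  return null;
}

// The mapping stored at login time: IdP sid -> app session IDs.
class SidIndex {
  constructor() {
    this.sidToSessions = new Map();
    this.appSessions = new Set();
  }
  onLogin(sid, appSessionId) {
    if (!this.sidToSessions.has(sid)) this.sidToSessions.set(sid, new Set());
    this.sidToSessions.get(sid).add(appSessionId);
    this.appSessions.add(appSessionId);
  }
  destroyBySid(sid) {
    const ids = this.sidToSessions.get(sid) || new Set();
    for (const id of ids) this.appSessions.delete(id);
    this.sidToSessions.delete(sid);
    return ids.size;
  }
}

function handleBackchannelLogout(claims, config, index) {
  const error = validateLogoutClaims(claims, config);
  if (error) return { status: 400, error };
  if (!claims.sid) return { status: 400, error: "this sketch handles sid-based logout only" };
  index.destroyBySid(claims.sid);
  return { status: 200 };
}

const config = { issuer: "https://idp.example.com/realms/myrealm", clientId: "app-b" };
const index = new SidIndex();
index.onLogin("xyz", "s-42");

const logoutClaims = {
  iss: config.issuer, aud: "app-b", iat: 1750000000, jti: "e1",
  sid: "xyz", events: { [LOGOUT_EVENT]: {} },
};
console.log(handleBackchannelLogout(logoutClaims, config, index).status); // 200
console.log(index.appSessions.has("s-42")); // false
```

A production handler would also reject replayed `jti` values and tokens with a stale `iat`, which this sketch leaves out.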
The BFF (Backend-for-Frontend) Pattern
A pattern that structurally eliminates the SPA token management problem. Tokens never reside in the browser; a dedicated frontend backend becomes the custodian of tokens and the proxy.
┌─────────┐  HttpOnly session cookie  ┌─────────────┐  holds AT/RT  ┌────────┐
│ Browser │ <───────────────────────> │     BFF     │ <───────────> │  IdP   │
│  (SPA)  │      proxies /api/*       │   (server   │               └────────┘
└─────────┘                           │  sessions)  │  attaches AT
                                      │             ├──────────────> APIs
                                      └─────────────┘
How it works:
1. Login is performed by the BFF as a confidential client running the code + PKCE flow.
2. AT/RT exist only in the BFF's server-side session store (Redis, etc.).
3. The browser talks to the BFF with an HttpOnly + Secure + SameSite=Strict cookie.
4. SPA API calls are proxied by the BFF, which attaches the AT. When the AT expires, the BFF silently refreshes with the RT.
| Aspect | Tokens in the browser | BFF |
| --- | --- | --- |
| Token exfiltration via XSS | Possible | Impossible (no tokens in the browser) |
| CSRF | Not applicable | Needs SameSite + CSRF token defenses |
| Where rotation is implemented | The SPA (cross-tab contention) | The BFF (single point, simple) |
| Operational cost | Low | Extra BFF infrastructure |
The chronic SPA problem of multiple tabs simultaneously refreshing and colliding with rotation (tab A refreshes so the RT changes, tab B refreshes with the old RT → false reuse detection) naturally disappears when refreshes are unified in the BFF. For new projects, consider the BFF the default.
The core of the BFF's token refresh can be sketched in Node/Express as follows. The key point is serializing refreshes with a per-session lock.
// Token refresh middleware in the BFF (conceptual sketch)
const sessionLocks = new Map(); // serialize refreshes per session

async function ensureFreshToken(req, res, next) {
  const session = await store.get(req.sessionId);
  if (!session) return res.status(401).end();

  const skewMs = 30_000; // refresh proactively 30s before expiry
  if (session.atExpiresAt - Date.now() > skewMs) {
    req.accessToken = session.accessToken;
    return next();
  }

  // Serialize concurrent refreshes of the same session into one
  // (prevents false rotation triggers)
  let lock = sessionLocks.get(req.sessionId);
  if (!lock) {
    lock = oidcClient
      .refresh(session.refreshToken) // rotation: a new RT comes back
      .then(async (tokens) => {
        await store.update(req.sessionId, {
          accessToken: tokens.access_token,
          refreshToken: tokens.refresh_token, // discard the old RT now
          atExpiresAt: Date.now() + tokens.expires_in * 1000,
        });
        return tokens.access_token;
      })
      .finally(() => sessionLocks.delete(req.sessionId));
    sessionLocks.set(req.sessionId, lock);
  }

  try {
    req.accessToken = await lock;
    next();
  } catch (err) {
    await store.destroy(req.sessionId); // refresh failure = end session
    res.status(401).end();
  }
}
Note that when scaling the BFF horizontally across multiple instances, the in-memory lock above must be replaced by a Redis distributed lock (or a single refresh-dedicated worker) to guarantee the same serialization.
Security Incident Scenarios and Response
Scenario 1: suspected mass RT leak via infostealer
Signal: reuse detection events spike to tens of times the baseline
Response:
1. Confirm that detected families are being auto-invalidated
(the defense is already operating)
2. Extract affected users → force-terminate all sessions +
guide password/passkey re-enrollment
3. Separately review users holding offline tokens
(they survive logout)
4. Consider temporarily shortening AT lifetime (10 min → 5 min)
Scenario 2: a specific user account reported compromised
1. Immediately terminate all sessions (logout from all devices)
kcadm.sh create users/USER-UUID/logout -r myrealm
2. Revoke consents, including offline tokens
kcadm.sh delete users/USER-UUID/consents/CLIENT-ID -r myrealm
3. Require credential reset (password + passkey re-enrollment)
kcadm.sh update users/USER-UUID -r myrealm \
-s 'requiredActions=["UPDATE_PASSWORD","webauthn-register-passwordless"]'
Since ATs cannot be revoked (stateless), either accept the residual risk for the AT's lifetime, or require introspection-based session liveness checks for high-risk APIs only.
Scenario 3: suspected IdP signing key leak
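This is the moment token lifecycle design shines. Rotating the key and removing the old key immediately invalidates every AT signed with it at once, forged or legitimate, as soon as resource servers drop the old key from their cached key set. The window in which the leaked key can forge tokens closes at that moment, not when the AT lifetime runs out, because an attacker holding the key can stamp any expiry on a forged token. The short AT lifetime pays off on the legitimate side: clients already refresh every few minutes, so they pick up ATs signed with the new key with little disruption. RTs are verified against server-side state, so respond with session invalidation.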
Monitoring metrics
| Metric | Meaning | Example alert threshold |
| --- | --- | --- |
| RT reuse detections | Theft attempts or client bugs | More than N per hour |
| Same family used from multi-country IPs | Session hijacking | Even a single case |
| invalid_grant ratio | Mixed signal of expiry/misuse/attack | 3x baseline |
| Offline token issuance count | Growth of long-lived credentials | Watch weekly trend |
| Average session lifetime | Policy effectiveness check | Distribution shift |
A Collection of Anti-Patterns
| Anti-pattern | Problem | Correction |
| --- | --- | --- |
| 24-hour AT lifetime | 24-hour damage window for an unrevokable token | 5-15 min + RT refresh |
| 90-day RT without rotation | No means of theft detection at all | Rotation + reuse detection |
| RT stored in localStorage | One XSS leaks a long-lived credential | HttpOnly cookie or BFF |
| Logout deletes only the cookie | IdP session/RT survive → silent re-login | RP-Initiated Logout + revoke |
| offline_access allowed for all clients | Proliferation of logout-proof tokens | Restrict the scope to clients that need it |
| Multi-instance IdP without a shared session store | Reuse detection only sees tokens used on the same instance, so cross-instance reuse goes unnoticed | Shared store / cluster cache |
| Each tab refreshes its own RT | False rotation invalidations | Unify refreshes (lock or BFF) |
Conclusion
A summary of theft-resistant token lifecycle design:
- **Keep ATs short and assume they cannot be revoked**: 5-15 minutes. Control the damage window via lifetime.
- **Manage RTs statefully and assume they will be stolen**: rotation + reuse detection + family invalidation. No need to tell the thief from the owner — cut everything the moment a collision appears.
- **Think of sessions in three layers**: the app session, IdP session (cookie), and SSO session (server state) expire independently. Remember that in Keycloak, RT lifetime is a derivative of session lifetimes.
- **Logout is a distributed transaction**: RP-Initiated Logout + token revocation + Back-Channel Logout propagation form one set.
- **For SPAs, make the BFF the default**: removing tokens from the browser eliminates rotation concurrency issues and XSS exfiltration at the same time.
- **Rehearse incident response in advance**: turn session force-termination, consent revocation, and key rotation commands into a runbook so that a 3 a.m. response is fast.
The true skill of authentication shows not on the login screen, but in the invisible lifecycle where tokens expire, rotate, and are revoked. Start by opening the realm settings of the system you operate today and asking "why?" of every timeout value.
References
- [OAuth 2.1 draft (draft-ietf-oauth-v2-1)](https://datatracker.ietf.org/doc/draft-ietf-oauth-v2-1/)
- [RFC 9700 - Best Current Practice for OAuth 2.0 Security](https://datatracker.ietf.org/doc/html/rfc9700)
- [RFC 6749 - The OAuth 2.0 Authorization Framework](https://datatracker.ietf.org/doc/html/rfc6749)
- [RFC 7009 - OAuth 2.0 Token Revocation](https://datatracker.ietf.org/doc/html/rfc7009)
- [RFC 7662 - OAuth 2.0 Token Introspection](https://datatracker.ietf.org/doc/html/rfc7662)
- [RFC 9449 - OAuth 2.0 Demonstrating Proof of Possession (DPoP)](https://datatracker.ietf.org/doc/html/rfc9449)
- [OpenID Connect Core 1.0](https://openid.net/specs/openid-connect-core-1_0.html)
- [OpenID Connect RP-Initiated Logout 1.0](https://openid.net/specs/openid-connect-rpinitiated-1_0.html)
- [OpenID Connect Back-Channel Logout 1.0](https://openid.net/specs/openid-connect-backchannel-1_0.html)
- [OpenID Connect Session Management 1.0](https://openid.net/specs/openid-connect-session-1_0.html)
- [Keycloak Documentation](https://www.keycloak.org/documentation)
- [Keycloak Server Administration - Managing user sessions](https://www.keycloak.org/docs/latest/server_admin/index.html)
- [Keycloak 26.6.0 Release Notes](https://www.keycloak.org/2026/04/keycloak-2660-released)