Skip to content
Published on

Cache Invalidation, Explained

Authors

Introduction — Why This Became a Joke

Computer science has a famous joke, attributed to Phil Karlton:

"There are only two hard things in computer science: cache invalidation and naming things."

The reason this joke has endured is that cache invalidation really is hard. Building a cache is itself easy: store a value somewhere and reuse it next time. The real difficulty lies in knowing exactly when that stored value has gone stale, and therefore when to throw it away. If the source data changed but the cache is still holding the old value, users see wrong information. But if you check the source every time, there was no point in having a cache.

This post confronts that difficulty head-on. We will cover what a cache is for, what write strategies exist, the pitfall called the cache stampede and how to mitigate it, how invalidation is actually implemented, and the layers of cache that run from CPU to browser.

What a Cache Does and the Fundamental Problem

A cache's purpose is simple: store the result of an expensive computation or a slow lookup and reuse it to save time. Database queries, remote API calls, heavy rendering — keep the results of these close by and next time you can answer instantly.

A cache works well thanks to two properties.

  • Temporal locality: Data used just now is likely to be used again soon.
  • Spatial locality: If some data is used, data near it is likely to be used soon too.

The problem comes from the fact that a cache is a copy of the source. A copy goes stale the moment the source changes. Handling this staleness is the root of every difficulty in caching. Here we must distinguish two bad situations.

  • Stale data: The cache holds a value older than the source. The user sees a wrong value.
  • Cache miss: The value is not in the cache, so you must go all the way to the source. It is slow.

A caching strategy is ultimately a tightrope walk between these two. Discard often to reduce staleness and misses rise; keep things long to reduce misses and staleness rises. There is no "correct answer," only a balance point that fits the nature of the data.

TTL — The Simplest Invalidation

The most widely used invalidation method is TTL (Time To Live). You attach a lifetime to each cache entry: "this value is valid for only N seconds." Once that time passes, the entry expires and the next request fetches fresh from the source.

TTL's appeal is simplicity and predictability. You do not have to write separate invalidation logic — things just get discarded when time passes. Even in the worst case, staleness never exceeds the TTL.

But TTL is fundamentally a trade-off between staleness and load.

  • Long TTL: High hit rate, so fast and light on the source. But when the source changes, you show the old value for up to the TTL.
  • Short TTL: Good freshness. But frequent expiry means more source lookups and slower responses.

So TTL fits data that is "fine to be a little stale" — news lists, trending posts, exchange rates, things where a few seconds to a few minutes of delay is acceptable. Conversely, data that "must never be stale" (say an account balance) is not served well by TTL alone and needs the explicit invalidation we will see later.

Write Strategies — Write-Through, Write-Behind, Cache-Aside

Depending on how you update the cache and the source together, there are a few classic patterns. Distinguishing them makes design much clearer.

Cache-aside (lazy loading). The most common pattern. The application manages the cache directly.

  Read:
    1. Look at the cache
    2. If present (hit), return that value
    3. If absent (miss), read from source, put in cache, return

  Write:
    1. Update the source
    2. Invalidate (delete) the corresponding cache entry

The key is that the cache fills "only when needed." Data that is not read never enters the cache. On writes you usually do not update the cache but only delete it, letting the next read repopulate with the fresh value. This approach is simple and robust, but there is the latency of a source round-trip on every miss.

Write-through. A write updates the cache and the source at the same time. The application writes to the cache, and the cache immediately writes to the source too.

  Write:
    application --> write to cache --> (immediately) write to source too
    both must succeed to complete

The advantage is that the cache is always fresh. Reads are always served straight from the cache and are fast. The downside is that writes are slower, since every write must wait for the source write. It can also waste space by filling the cache with data that will not be read for a while.

Write-behind (write-back). A write is first recorded only to the cache, and reflecting it to the source happens later, asynchronously.

  Write:
    application --> write to cache (immediate completion)
                       |
                       v (later, in batches)
                    reflect to source

The advantage is that writes are very fast, and you can batch many writes into one source update to reduce load. The downside is dangerous: if the cache dies in the interval where it has the write but the source does not yet, data is lost. So write-behind is used only when you can tolerate loss, or when you have a separate durability mechanism.

Comparing the three at a glance:

StrategyWrite speedRead freshnessLoss riskTypical situation
cache-asideModerateFresh after missLowGeneral, read-heavy
write-throughSlowAlways freshLowFreshness matters
write-behindVery fastAlways fresh (per cache)HighWrite floods, loss tolerable

The Cache Stampede — The Dogpile Problem

Now we address the most notorious pitfall in caching: the cache stampede, also called the dogpile problem or a cache rush.

The situation goes like this. Imagine the moment a popular cache entry expires. That entry was being read thousands of times per second. At the very instant it expires, thousands of requests simultaneously suffer a cache miss. And they all rush at once to the source (say a database) to recompute the same value. A source lookup that one request could have satisfied explodes into thousands, and the source is crushed under the load. In the worst case the source dies, which causes more misses, and the whole system collapses in a chain reaction.

  Normal: requests served from cache
    req req req --> [cache hit] --> fast response

  Stampede: the moment a hot key expires
    req req req req ... (thousands)
         all miss
           |
           v (rush simultaneously)
        [source DB] <-- crushed by thousands of identical recomputations

The key is the paradox that "the more popular an entry, the more dangerous." The more frequently a value is read, the larger the burst of simultaneous misses at expiry. There are a few ways to mitigate this.

1. Lock / mutex. On a miss, take a lock so that only the first request queries the source. The rest wait briefly until that one fills the value, or receive the old value for a moment. This reduces source lookups to one. The downside is added waiting and somewhat more complex implementation.

2. Add jitter to the TTL. If many entries expire at exactly the same instant, the stampede grows. So add a small random value to the TTL to scatter expiry moments. For example, instead of setting the TTL to exactly 300 seconds, randomize it between 270 and 330; expiry spreads across the time axis and simultaneous misses drop.

import random

def ttl_with_jitter(base=300, spread=30):
    # scatter expiry moments to avoid simultaneous expiry
    return base + random.randint(-spread, spread)

3. Stale-while-revalidate. This one is especially elegant. Even when an entry expires, do not throw away the old value immediately; "return the old value right away while quietly fetching the new value in the background." The user gets a (possibly slightly stale) value instantly without waiting, and the refresh happens once in the background. The HTTP cache-control directive stale-while-revalidate is exactly this idea standardized.

  entry expired
      |
      v
  request arrives --> return old value immediately (user does not wait)
      |
      +--> fetch new value in background, refresh cache (once)

4. Early recomputation. There is also a technique of probabilistically recomputing the value shortly before it expires. The closer to expiry, the higher the recompute probability, so that a fresh value is already ready at the actual expiry moment. This is called probabilistic early expiration.

In practice you combine these. For example, scatter expiry with jitter, remove waiting with stale-while-revalidate, and collapse the refresh into one with a lock.

Invalidation — Explicitly Throwing Away the Old Value

TTL is passive invalidation: "discard when time passes." But some data must have its cache discarded the moment the source changes. This is explicit invalidation, and this is the genuinely hard part. There are two representative strategies.

Event-based invalidation. When the source data changes, emit an event to immediately delete (or update) the related cache entries. For example, when a "product 42's price changed" event fires, invalidate every cache entry that contains product 42.

The strength of this approach is freshness. When the source changes, the cache is cleaned up almost immediately. The difficulty is dependency tracking. "Product 42" may live in the product-detail cache, the category-list cache, and the search-results cache. You must know exactly which cache entries depend on this data to invalidate them all. Mismanage this dependency graph and some caches quietly stay stale, becoming a bug.

Versioned keys / key versioning. A very practical and robust technique. Embed a version in the cache key itself, and handle invalidation as "move to a new key" instead of "delete."

  Include a version in the cache key:
    user:42:v7:profile

  When user 42's data changes --> bump the version to v8
    now the code queries user:42:v8:profile
    -> nobody looks up the v7 key, so it is naturally discarded (cleaned up by TTL)

The beauty of this approach is that you do not have to delete anything directly. Just bump the version number, and old-version keys stop being queried and are eventually cleaned up by TTL. It is also robust against race conditions (where invalidation and re-fetch cross paths), because the old and new keys are physically different. A common application is to keep a "version counter" on some entity or the whole dataset and append it to the cache key.

The two strategies are not exclusive. Use event-based where you need fine-grained immediate invalidation, and versioned keys where you need to sweep a broad range all at once — often together.

Why Invalidation Is Truly Hard

Let us pause and consolidate why all of this is so hard. Cache invalidation is hard not merely because the code is complex, but because of a few fundamental tensions.

First, the problem of distribution. Caches are usually scattered across many servers and many layers. Even after you invalidate in one place, other caches may still hold the old value. Invalidating all caches simultaneously and atomically inherits the hard problems of distributed systems (consensus, ordering) directly.

Second, race conditions. If "the moment you invalidate the cache" and "the moment another request re-reads the old value and puts it in the cache" cross, the old value you just deleted is resurrected in the cache. The invalidation order and the re-fetch order tangle subtly to create such bugs.

Third, the complexity of dependencies. As we saw, when a single piece of source data is scattered across many cache entries, it is hard to fully grasp what must be invalidated. One missed dependency becomes a stale bug.

Because these three overlap, it is hard enough to become joke material alongside "naming things." Perfect invalidation is often impossible, so practice moves toward a pragmatic compromise: decide "how long can we tolerate staleness" and mix TTL with explicit invalidation.

The Layers of Cache — From CPU to Browser

Caches do not live in just one place. The entire computing stack is made of layers of cache. As a single data request travels from the browser deep into the server, it passes through several cache layers. Seeing this whole picture matters.

  Closest and fastest (small, short-lived)
  ┌─────────────────────────────┐
  │ CPU cache (L1/L2/L3)         │  nanoseconds, managed by hardware
  ├─────────────────────────────┤
  │ OS page cache / memory       │  keeps disk reads in memory
  ├─────────────────────────────┤
  │ Application in-memory cache  │  local cache inside the process
  ├─────────────────────────────┤
  │ Distributed cache (Redis)    │  shared across servers (Memcached too)
  ├─────────────────────────────┤
  │ Database cache               │  query / buffer-pool cache
  ├─────────────────────────────┤
  │ CDN edge cache               │  content near the user
  ├─────────────────────────────┤
  │ Browser cache                │  stored on the user's device
  └─────────────────────────────┘
  Farthest, but closest to the user

The character of each layer:

  • CPU cache (L1/L2/L3): The fastest cache, managed automatically by hardware. Invalidation is handled by the hardware's cache-coherence protocol. We do not touch it directly, but it is why code that respects data locality is fast.
  • OS page cache: The operating system keeps blocks read from disk in memory, so reading the same file a second time is much faster.
  • Application in-memory cache: A process-local cache (say a local hashmap or LRU cache). The fastest, but separate per server, so consistency is hard.
  • Distributed cache (Redis, Memcached): A cache layer shared across servers. The strategies discussed above mostly play out here.
  • Database cache: The DB's own buffer pool and query cache.
  • CDN edge cache: Keeps static assets (and increasingly dynamic content) at edges near the user. Invalidation here (a purge) must propagate to edges worldwide, which is a hard problem in itself.
  • Browser cache: Stored on the user's device, controlled by HTTP headers like Cache-Control and ETag.

This layered structure makes cache invalidation even harder. Change one source, and copies of that value may be scattered across all these layers. The answer to "why am I still seeing old data?" is usually "some layer's cache has not refreshed yet." So diagnosis must always narrow down "which layer is producing the staleness."

HTTP Caching — The Web's Standard Invalidation

Caches at the web layer are precisely controlled by HTTP headers. This is the standard language for governing browser and CDN caches.

  • Cache-Control: The core directive for cache behavior. It specifies max-age (cache for N seconds), no-cache (cache but revalidate on use), no-store (do not cache at all), and private/public (who may cache it).
  • ETag and conditional requests: Attach an ETag, a fingerprint of the content, to the response. On the next request the browser sends this value as If-None-Match, and when the content has not changed the server returns just 304 Not Modified with no body, saving a lot of transfer.
  • stale-while-revalidate: The idea we saw earlier. It allows serving an expired response immediately while revalidating in the background.

The power of HTTP caching is in the combination of these directives. For example, static assets that rarely change (JS, CSS) pair a very long max-age with a filename that embeds a content hash. Then, when the file's contents change, the filename (and thus URL) changes — which is essentially applying the versioned-key strategy to the web. The old URL stops being requested and is naturally invalidated.

Practical Guidelines

Compressing all of the above from a practical standpoint:

First, decide how stale this data is allowed to be. This is the starting point of every decision. If a few minutes of staleness is fine, TTL is enough. If it must never be stale, you need explicit invalidation. Design a cache without answering this question and it will inevitably come back later as a stale bug or overload.

Next, prepare for stampedes on popular entries. The more frequently a key is read, the more dangerous its expiry moment. Add jitter to the TTL, remove waiting with stale-while-revalidate, and if needed collapse the refresh into one with a lock.

If you need invalidation, consider versioned keys first. They are more robust than direct deletion and resistant to race conditions. Add event-based invalidation only where fine-grained immediate invalidation is truly required.

And do not forget that caches live in many layers — browser, CDN, application, distributed cache, DB. When designing invalidation, be clear about which layers it must propagate to, and when diagnosing, narrow down which layer is producing staleness.

Finally, build observability. If you do not measure cache hit rate, miss rate, and stale ratio, you cannot tell whether the cache is running well or quietly accumulating problems.

Conclusion

There is a reason cache invalidation became joke material alongside naming things. Building a cache is easy, but knowing exactly when the copy has gone stale touches three fundamental difficulties at once: distribution, race conditions, and dependencies. Perfect invalidation is often out of reach.

So the real-world solution is not magic but a pragmatic compromise. Decide "how stale can this data be," and mix TTL with explicit invalidation to match. Stop stampedes on hot entries with jitter and stale-while-revalidate, use versioned keys for broad invalidation and events for fine-grained invalidation. And always stay aware that all of this plays out on top of a stack of caches running from CPU to browser.

A cache is the most powerful lever for performance and the most subtle source of bugs at the same time. When you understand both sides together, a cache stops being a headache and becomes a tool you can trust.

References