- Introduction — The Payment Whose Response Never Arrived
- What Idempotency Is
- Safe Methods and Idempotent Methods
- The Idempotency Key for POST
- The "Exactly-Once" Myth
- Retrying Properly — Exponential Backoff
- The Thundering Herd — Why You Need Jitter
- What to Do on the Server Side — Designing to Survive Retries
- Common Pitfalls
- Conclusion
- References
Introduction — The Payment Whose Response Never Arrived
Let us start with a single scene. A user taps the pay button. Your server receives the request, charges the card, creates the order, and sends a response back. But right before that response reaches the user, the network drops. The user's screen shows "request failed." The user, naturally, taps the button again.
Now a dangerous question appears. Will the payment go through twice?
That question is what this entire post is about. In distributed systems the network is unreliable, so we must retry, and retrying inevitably creates situations where something was "already processed but its response got lost." The core concept that keeps a system correct even in that situation is idempotency. This post starts from what idempotency is and works through safe versus unsafe methods, the idempotency key that makes POST safe, the common "exactly-once" misconception, and how to retry properly.
What Idempotency Is
Idempotency is a word borrowed from mathematics. If applying an operation multiple times produces the same result as applying it once, that operation is idempotent. The absolute-value function is an easy example: taking the absolute value of a number once or ten times gives the same result.
In the API context, idempotency translates like this. If sending the same request once or many times leaves the server in the same final state, that request is idempotent. The key point is not "the response must be identical every time" but that the side effect must happen only once.
Some examples:
- "Set user 42's email to a@b.com" is idempotent. No matter how many times you send it, the result is a single state where the email is a@b.com.
- "Increment user 42's balance by 100" is not idempotent. Send it twice and the balance goes up by 200.
This distinction is the heart of the payment example. "Make a payment" is essentially closer to an increment, so without any mechanism it is not idempotent. A retry turns straight into a double charge. Our goal is to make this non-idempotent operation idempotent.
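The difference between a "set" and an "increment" can be seen in a few lines of Python. This is only a minimal sketch: the in-memory `users` dict stands in for a database, and `set_email` and `increment_balance` are hypothetical names chosen for illustration.

```python
# Hypothetical in-memory "database", used only for illustration.
users = {42: {"email": "old@b.com", "balance": 0}}

def set_email(user_id, email):
    """Idempotent: the final state depends only on the argument."""
    users[user_id]["email"] = email

def increment_balance(user_id, amount):
    """Not idempotent: every call changes the state again."""
    users[user_id]["balance"] += amount

# Simulate a retry by sending each request twice.
for _ in range(2):
    set_email(42, "a@b.com")
    increment_balance(42, 100)

print(users[42])  # {'email': 'a@b.com', 'balance': 200}
```

The email ends up in the same state as after a single call, while the balance has been credited twice.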
Safe Methods and Idempotent Methods
HTTP already defines these ideas at the method level. You have to distinguish two properties.
Safe methods do not change server state — they are read-only. GET, HEAD, and OPTIONS belong here. Because safe methods never alter data no matter how many times you call them, you can retry them freely.
Idempotent methods leave the server in the same state whether called once or many times. Safe methods are naturally idempotent, but being idempotent does not make a method safe.
The HTTP specification's properties for each method:
| Method | Safe | Idempotent | Typical meaning |
|---|---|---|---|
| GET | Yes | Yes | Read a resource |
| HEAD | Yes | Yes | Read headers only |
| OPTIONS | Yes | Yes | Query communication options |
| PUT | No | Yes | Replace a resource wholesale (set) |
| DELETE | No | Yes | Delete a resource |
| POST | No | No | Create a new resource, etc. |
| PATCH | No | Not guaranteed | Partial modification |
It is worth noting why PUT and DELETE are idempotent. PUT is a set operation: "make this resource equal this value." Send the same PUT many times and the resource stays at that one value. DELETE is the same: telling the server to delete something already gone still leaves the final state as "absent" (the response code may differ, say 404, but the state is the same — idempotency is about state, not response codes).
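As a rough illustration of "state, not response codes," the sketch below models PUT and DELETE against a hypothetical in-memory `resources` dict. The handler names and the choice of 204 for a successful delete are assumptions made for this example.

```python
# Hypothetical in-memory resource store, for illustration only.
resources = {"order-500": {"total": 30}}

def put(resource_id, value):
    """Set semantics: replace whatever is there with `value`."""
    resources[resource_id] = value
    return 200

def delete(resource_id):
    """Returns 204 on the first delete and 404 afterwards; the state ends up the same."""
    if resource_id in resources:
        del resources[resource_id]
        return 204
    return 404

first = delete("order-500")
second = delete("order-500")
print(first, second, "order-500" in resources)  # 204 404 False
```

The two DELETE calls return different status codes, yet after either one the resource is absent, which is all that idempotency requires.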
The trouble is always POST. POST often means "create a new thing," and that is inherently non-idempotent. Every retry creates a new order, a new payment, a new comment. If you get confused about what each HTTP status code means, you can review them in this site's HTTP status codes tool.
The Idempotency Key for POST
The standard way to make POST idempotent is the idempotency key. The idea is simple. The client generates a unique key (usually a UUID) per request and sends it in a header. The server remembers this key, and if a request arrives again with the same key, it does not process it anew but returns the stored first response as-is.
    First request

    Client --(Idempotency-Key: abc-123, pay request)--> Server
                                                          |
                                                   key never seen
                                                   -> actually charge
                                                   -> store result under abc-123
    Client <-----------(200 OK, order #500)------------- Server

    (response lost, client retries)

    Second request (same key)

    Client --(Idempotency-Key: abc-123, pay request)--> Server
                                                          |
                                                   key already seen
                                                   -> do NOT charge again
                                                   -> return stored result
    Client <-----------(200 OK, order #500)------------- Server
The key point is that on the second request the actual charge does not happen again. The user tapped twice, but was billed once. Payment APIs like Stripe and PayPal use exactly this mechanism.
A few details matter when implementing it:
- Storing and expiring keys. You must store the key-to-response mapping somewhere (say Redis or a DB). You cannot keep it forever, so usually you set an expiry such as 24 hours.
- Handling concurrency. A problem arises when two requests with the same key arrive almost simultaneously. If the first is still processing when the second arrives, both may mistake it for a "never seen" key and process twice. So you must take a lock the moment you receive the key, or use a database unique constraint to atomically mark "this key is already in progress."
- Validating that the key matches the request body. If the key is the same but the request body differs, that is a client bug or an attack. It is safer for the server to detect the mismatch and reject the request.
- Storing responses carefully. Beyond successful responses, treat the nature of failures carefully. A transient failure (say a DB timeout) should genuinely be retried, while a definitive failure (say insufficient funds) is better cached so the same answer is returned.
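The four details above can be sketched together in one small class. This is an in-memory illustration under stated assumptions, not a production design: `IdempotencyStore`, `handle`, and `charge` are hypothetical names, a process-local lock stands in for a Redis lock or a database unique constraint, and the 409 and 422 status codes for "still in progress" and "body mismatch" are arbitrary choices for this example.

```python
import hashlib
import threading
import time

class IdempotencyStore:
    """In-memory sketch; a real service would use Redis or a DB unique constraint."""

    def __init__(self, ttl_seconds=24 * 3600, clock=time.monotonic):
        self._lock = threading.Lock()
        self._entries = {}  # key -> {"fingerprint", "response", "expires_at"}
        self._ttl = ttl_seconds
        self._clock = clock

    def handle(self, key, body, process):
        fingerprint = hashlib.sha256(body).hexdigest()
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None and entry["expires_at"] <= self._clock():
                entry = None  # expired: treat the key as never seen
            owner = entry is None
            if owner:
                # Atomically claim the key before doing any work.
                entry = {"fingerprint": fingerprint, "response": None,
                         "expires_at": self._clock() + self._ttl}
                self._entries[key] = entry
        if not owner:
            if entry["fingerprint"] != fingerprint:
                return (422, "key reused with a different body")
            if entry["response"] is None:
                return (409, "request with this key is still in progress")
            return entry["response"]  # replay the stored first response
        try:
            response = process(body)
        except Exception:
            # Treated as a transient failure: release the key so a retry can run.
            with self._lock:
                self._entries.pop(key, None)
            raise
        entry["response"] = response
        return response

# Usage: the "charge" runs once even though the request is sent twice.
charges = []

def charge(body):
    charges.append(body)
    return (200, f"order #{499 + len(charges)}")

store = IdempotencyStore()
first = store.handle("abc-123", b'{"amount": 100}', charge)
retry = store.handle("abc-123", b'{"amount": 100}', charge)
print(first, retry, len(charges))
```

In this sketch a response returned by `process` (including a definitive failure such as insufficient funds) is cached and replayed, while an exception is treated as transient and frees the key for a genuine retry.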
The "Exactly-Once" Myth
Here we must confront the most common misconception in distributed systems head-on. Many people aim for "exactly-once delivery," or believe some system provides it. To put the conclusion first: across a network, exactly-once in its pure sense is close to impossible.
Why? The sender has only two strategies:
- At-most-once: Send and forget. Do not wait for an acknowledgment. It may be lost, but there are no duplicates.
- At-least-once: Retry until you get an acknowledgment. Nothing is lost, but duplicates can occur.
The root of the problem is the same as that payment scene. When the sender "does not receive an acknowledgment," it has no way to tell whether "the message never arrived" or "the message arrived but only the acknowledgment was lost." So to avoid loss you must retry (at-least-once), and retrying produces duplicates.
Then what do the systems we casually call "exactly-once" actually do? The answer is this:
Exactly-once = at-least-once delivery + deduplication on the receiver
That is, delivery itself is still "at-least-once." The receiver identifies messages it has already processed and ignores them from the second time on. What makes this deduplication possible is exactly the idempotency we saw earlier. Attach a unique ID to each message, record processed IDs, and even if the same message comes again the result does not change.
The key lesson is this. Do not wait for the infrastructure to magically guarantee "exactly-once" — make your consumers idempotent. Then no matter how many times delivery happens, the result equals processing once. This principle applies not only to APIs but to message queues in general. If you want to see how queue systems handle this "at-least-once" behavior and duplicates, you can visualize each one's delivery guarantees in this site's Message Queue Playground.
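Receiver-side deduplication can be sketched as follows. The `DedupingConsumer` class and the message shape (`id`, `amount`) are assumptions made for illustration; a real consumer would persist the processed IDs rather than hold them in memory.

```python
class DedupingConsumer:
    """Sketch of receiver-side deduplication on top of at-least-once delivery."""

    def __init__(self):
        self.processed_ids = set()  # a real system would persist this
        self.balance = 0

    def on_message(self, message):
        msg_id = message["id"]
        if msg_id in self.processed_ids:
            return False  # duplicate delivery: ignore it
        self.balance += message["amount"]
        self.processed_ids.add(msg_id)
        return True

consumer = DedupingConsumer()
deliveries = [
    {"id": "m1", "amount": 100},
    {"id": "m1", "amount": 100},  # redelivered because the ack was lost
    {"id": "m2", "amount": 50},
]
results = [consumer.on_message(m) for m in deliveries]
print(results, consumer.balance)  # [True, False, True] 150
```

In a real system, applying the side effect and recording the ID would need to happen atomically (for example in the same database transaction); otherwise a crash between the two steps reintroduces the duplicate.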
Retrying Properly — Exponential Backoff
Now we move to doing retries well. "If it fails, send it again" sounds simple, but a naive implementation can bring the system down instead.
The worst approach is to retry immediately, at a fixed interval, forever. When a server slows briefly under load, if every client detects the failure and immediately hammers it again and again, the barely-surviving server collapses completely. The retry does not cure the outage; it worsens it.
The first improvement is exponential backoff. Double the retry interval each time: 1s, 2s, 4s, 8s. This way, as failures continue, the retry pressure drops exponentially, giving the struggling server room to breathe.
attempt 1 fails --> wait 1s
attempt 2 fails --> wait 2s
attempt 3 fails --> wait 4s
attempt 4 fails --> wait 8s
...stop at a cap (say 30s), give up when max retries reached
You must add two things here:
- A cap: Set a maximum wait so the interval does not grow unbounded (say 30s).
- A max retry count and giving up: Do not retry forever. After a set number of attempts, give up and send the failure to a dead-letter queue or notify the user.
And do not retry just any failure. Retrying is meaningful only for transient errors. Network timeouts, 503 Service Unavailable, and 429 Too Many Requests may succeed on retry. In contrast, definitive errors like 400 Bad Request, 401 Unauthorized, and 404 Not Found return the same result no matter how many times you resend, so retrying is a waste.
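The capped schedule and the transient-versus-definitive split can be written down directly. This is a minimal sketch; `RETRYABLE_STATUS`, `is_retryable`, and `backoff_schedule` are hypothetical names, and the set of retryable codes contains only the examples given in the text.

```python
# Transient status codes named in the text; anything else is treated as definitive.
RETRYABLE_STATUS = {429, 503}

def is_retryable(status=None, timed_out=False):
    """Network timeouts and transient status codes are worth retrying."""
    return timed_out or status in RETRYABLE_STATUS

def backoff_schedule(max_retries=5, base=1.0, cap=30.0):
    """Wait before each retry: base * 2**n, clamped to the cap."""
    return [min(cap, base * 2 ** n) for n in range(max_retries)]

print(backoff_schedule(max_retries=7))
# [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
print(is_retryable(status=503), is_retryable(status=404))
# True False
```

The list ends after `max_retries` entries, which is the "give up" point: whatever remains after the last wait goes to a dead-letter queue or back to the user.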
The Thundering Herd — Why You Need Jitter
Exponential backoff alone is not enough. A subtle but deadly problem remains: the thundering herd.
Picture the situation. A server goes down briefly. At that moment 10,000 clients simultaneously detect the failure. They all follow the same exponential backoff rule: retry after 1s, then after 2s, then after 4s. The problem is that they all retry at exactly the same instant. Just as the server is about to recover, 10,000 requests pour in at once and the server falls over again. And this wave repeats identically at 2s, 4s, and 8s. Synchronized retries beat the server periodically.
The solution is jitter — adding randomness. If each client mixes a bit of random value into the computed wait, the retry moments spread evenly across the time axis. Instead of 10,000 requests piling onto one point, they scatter widely, giving the server room to recover.
No jitter: everyone retries at the same instant
||||||||| |||||||||
---+---------+---------+------ (server keeps getting hit by waves)
With jitter: retries spread out
| | || | | || | | |
---+---------+---------+------ (load spreads evenly)
There are a few ways to add jitter. The most widely recommended is full jitter, which waits "a random time between 0 and the computed backoff value." In Python:

    import random

    def backoff_with_jitter(attempt, base=1.0, cap=30.0):
        # grow exponentially but not past the cap
        exp = min(cap, base * (2 ** attempt))
        # random between 0 and exp (full jitter)
        return random.uniform(0, exp)

    # example: the wait differs for each failed attempt number
    for attempt in range(5):
        wait = backoff_with_jitter(attempt)
        print(f"attempt {attempt}: wait {wait:.2f}s")
The effect of this simple randomization on the stability of large systems is surprisingly large. The AWS Architecture Blog post "Exponential Backoff and Jitter" popularized this combination, and it is now a default pattern in many client libraries and SDKs.
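Putting the pieces together, a retry loop with full jitter, a cap, and a maximum attempt count might look like the sketch below. `TransientError` and `call_with_retries` are hypothetical names, and the `sleep` and `rng` parameters are injected only so the loop can be exercised without real waiting.

```python
import random
import time

class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 503, 429)."""

def call_with_retries(operation, max_attempts=5, base=1.0, cap=30.0,
                      sleep=time.sleep, rng=random.uniform):
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # give up: the caller handles the final failure
            exp = min(cap, base * (2 ** attempt))
            sleep(rng(0, exp))  # full jitter: random wait in [0, exp]

# Usage: an operation that fails twice, then succeeds. The waits are
# recorded in a list instead of actually sleeping.
waits = []
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("simulated timeout")
    return "ok"

result = call_with_retries(flaky, sleep=waits.append)
print(result, calls["n"], len(waits))  # ok 3 2
```

Any exception other than `TransientError` propagates immediately, which corresponds to not retrying definitive errors.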
What to Do on the Server Side — Designing to Survive Retries
So far the view has been mostly from the client. To build a reliable API, the server too must be designed to survive retries.
Make every write endpoint idempotent. Consider applying the idempotency key we saw earlier not just to payments but to important state-changing POSTs in general. Then clients can retry with confidence.
Signal when to retry with Retry-After. When the server is overloaded (503) or throttling requests (429), it can tell the client "come back at this time" with the Retry-After header, given either as a number of seconds or as an HTTP date. A well-behaved client respects this signal and refrains from unnecessary early retries.
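On the client side, honoring Retry-After means parsing a header that may carry either a delay in seconds or an HTTP date. The helper below is a sketch using only the Python standard library; `retry_after_seconds` is a hypothetical name, and returning `None` for a missing or unparsable value is a design choice made for this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_after_seconds(header_value, now=None):
    """Return the wait in seconds, or None if the header is missing or unparsable."""
    if not header_value:
        return None
    value = header_value.strip()
    if value.isdigit():
        return float(value)  # delay-seconds form, e.g. "120"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())

print(retry_after_seconds("120"))  # 120.0
```

A client could take the larger of this value and its own jittered backoff, so that the server's hint acts as a lower bound on the wait.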
Rate limiting and load shedding. For the server to protect itself even when a retry storm arrives, it is better to accept only as much as it can handle and quickly reject the rest with 429. Rejecting some requests fast and keeping the rest alive is better for overall availability than holding every request and processing slowly until all die together.
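One common way to "accept only as much as it can handle" is a token bucket. The sketch below is a single-threaded illustration, not a tuned limiter: `TokenBucket` and `handle_request` are hypothetical names, and the fixed `Retry-After: 1` value is an arbitrary choice for the example.

```python
import time

class TokenBucket:
    """Admit about `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def try_acquire(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

def handle_request(bucket, do_work):
    """Return (status, headers, body); shed load with a fast 429 when out of tokens."""
    if not bucket.try_acquire():
        return 429, {"Retry-After": "1"}, None
    return 200, {}, do_work()

bucket = TokenBucket(rate=2, capacity=2)
statuses = [handle_request(bucket, lambda: "done")[0] for _ in range(3)]
print(statuses)
```

Because a rejected request does no work, a 429 costs the server almost nothing, which is what lets the admitted requests finish during a retry storm.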
Circuit breakers. If a downstream service you depend on keeps failing, instead of hammering it on every request and worsening things, switch the breaker to the "open" state for a while and return failures fast. After some time, let a trial request through (the "half-open" state) to check whether the service has recovered. This keeps a failure from spreading across the whole system.
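A circuit breaker can be sketched as a small state machine around the downstream call. This is a simplified, single-threaded illustration: `CircuitBreaker` and `CircuitOpenError` are hypothetical names, the thresholds are arbitrary, and the state in which the breaker fails fast is the one usually called "open."

```python
import time

class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the dependency."""

class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means closed: calls flow normally

    def call(self, operation):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("failing fast")
            # Timeout elapsed: let one trial call through (half-open).
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success: close the circuit and reset the failure count.
        self.failures = 0
        self.opened_at = None
        return result

breaker = CircuitBreaker()
print(breaker.call(lambda: "ok"))  # ok
```

A failed trial call re-opens the circuit and restarts the timeout, so a dependency that is still down receives one probe per timeout period instead of the full request load.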
Common Pitfalls
Finally, a compact list of things that trip people up in practice:
- Retrying a non-idempotent operation with no protection: the classic cause of double payments and duplicate orders. Wrap it in an idempotency key.
- Retrying every error: definitive errors like 400, 401, and 404 are useless to retry and only waste resources. Distinguish which errors to retry.
- Backoff without jitter: deterministic waits keep clients synchronized and invite the thundering herd. Always add randomness.
- Infinite retries: without a cap and a max count, failed requests pile up in the system forever. Set a give-up point and a dead-letter queue.
- Ignoring concurrency in idempotency keys: if same-key requests that arrive nearly simultaneously are not claimed atomically, double processing slips through.
- Blind faith in "exactly-once": do not expect the infrastructure to guarantee it; make your consumers idempotent and guarantee it yourself.
Conclusion
The secret to a reliable API is not "never failing." The network will eventually fail, and when it does we will retry. The real secret is making the correct result come out even when retries happen. Idempotency is at the center of that. Make write operations idempotent, and that troublesome "already processed but its response was lost" situation is no longer a disaster but just one more harmless request that happened to arrive again.
Add to that the discipline of taming the rhythm of retries with exponential backoff and jitter, and surviving storms on the server side with rate limiting and circuit breakers, and your system will stand steady even on an unstable network. Choosing the solid reality of "at-least-once plus idempotent processing" over chasing the myth of "exactly-once" is the true foundation of a reliable API.
References
- MDN, HTTP idempotent methods: https://developer.mozilla.org/en-US/docs/Glossary/Idempotent
- RFC 9110 (HTTP Semantics) — safe and idempotent methods: https://www.rfc-editor.org/rfc/rfc9110#name-method-properties
- AWS Architecture Blog, "Exponential Backoff and Jitter": https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/
- Stripe API, Idempotent Requests: https://docs.stripe.com/api/idempotent_requests
- Google Cloud, "Retry strategy" (exponential backoff): https://cloud.google.com/storage/docs/retry-strategy
- This site's Message Queue Playground: /tools/message-queue-playground
- This site's HTTP status codes tool: /tools/http-status-codes