Payment Idempotency: Preventing Double Charges

Introduction — The Nightmare of Being Charged Twice

The most common and most frightening bug in a payment system is "the user gets charged twice." If the price of a coffee is deducted twice, the user is annoyed; if a large amount is deducted twice, trust collapses. And this bug happens even without an obvious mistake in the code. The cause is usually the uncertainty of the network and the retries that respond to it.

One fundamental truth of distributed systems is that "the network will eventually fail." When it does, the reasonable thing for a client to do is retry. But in payments, this reasonable retry can produce the unreasonable result of a double charge.

This post takes on that problem head-on. We will look at why a retry creates a double charge, what idempotency is that prevents it, how to design idempotency keys and dedup windows and unique constraints, how to combine at-least-once delivery with dedup, how to model a payment as a state machine, and how a real system like Stripe implements idempotency. If you want to experiment visually with retries, flow control, and at-least-once delivery, it helps to keep this site's Message Queue Playground open alongside.

What Is Idempotency

Idempotency is a concept from mathematics and computer science. If applying an operation multiple times produces the same result as applying it once, that operation is idempotent.

An everyday example gives you the feel. Whether you press an elevator's floor button once or five times, the result is the same: it goes to that floor. That is idempotent. By contrast, inserting a coin into a vending machine is not idempotent, because each insertion increases the balance.
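The elevator and vending machine contrast can be written down in a few lines. This is only an illustrative sketch; the class and method names are made up for the example.

```python
class Elevator:
    """Pressing a floor button sets a target: repeating it changes nothing."""

    def __init__(self) -> None:
        self.target_floor = None

    def press(self, floor: int) -> None:
        self.target_floor = floor  # assignment: idempotent


class VendingMachine:
    """Inserting a coin adds to the balance: every repeat has an effect."""

    def __init__(self) -> None:
        self.balance = 0

    def insert_coin(self, value: int) -> None:
        self.balance += value  # accumulation: not idempotent


elevator = Elevator()
for _ in range(5):
    elevator.press(3)

machine = VendingMachine()
for _ in range(5):
    machine.insert_coin(100)

print(elevator.target_floor)  # 3
print(machine.balance)        # 500
```

Five presses leave the elevator in the same state as one press, while five coins leave the machine in a different state than one coin.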

It is even clearer through HTTP methods.

  • GET is idempotent. Reading the same resource any number of times does not change server state.
  • PUT and DELETE are designed to be idempotent. Writing the same value multiple times, or deleting the same resource multiple times, leaves the same final state.
  • POST is not idempotent by default. It usually creates a new resource or triggers an action, so calling it twice makes it happen twice.

The problem is that a payment is usually a POST. "Creating a payment" is essentially a non-idempotent operation with a side effect (money leaves). What we want to do is make this non-idempotent operation idempotent — that is, ensure that "no matter how many times the same payment request is sent, the payment happens exactly once."

The Retry-and-Timeout Problem

To understand why idempotency is needed, you must see precisely how a retry creates a double charge. The crux is the ambiguity of a timeout.

A client sends a payment request to the server and waits for a response, but the request times out. At that point the only thing the client knows for sure is that "no response arrived within the allotted time." What actually happened could be any of several things.

  What might actually have happened when a timeout occurred:

  Case A: the request never reached the server    -> not charged.  Must retry.
  Case B: the server processed it but the response was lost  -> charged.  Must not retry.
  Case C: the server is still processing, not done yet  -> in progress.  Retrying risks a race.

Here lies the essence of the problem. The client cannot distinguish A, B, and C. The mere fact of "no response" gives no way to tell whether the payment happened.

At this point the client has two choices, and both are bad. If it does not retry, it may abandon what was actually Case A, and the payment never happens even though the user tried to pay. If it does retry, it may resend what was actually Case B and create a double charge.

Without idempotency, you cannot escape this dilemma. Idempotency unties the knot. Once the server guarantees that "it is safe to retry," the client can retry with confidence: Case A is processed normally, Case B's duplicate is filtered out, and a Case C retry is recognized as a request that is already in flight. In other words, idempotency is a mechanism that makes retries safe.
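To make Case B concrete, here is a small simulation of a naive retry against a server with no idempotency at all. `LossyPaymentServer` and `pay_with_naive_retry` are hypothetical names invented for this sketch; the server applies the charge and then loses its first response.

```python
class LossyPaymentServer:
    """Hypothetical server: the first response is lost after the charge is applied."""

    def __init__(self) -> None:
        self.total_charged = 0
        self._calls = 0

    def charge(self, amount: int) -> str:
        self._calls += 1
        self.total_charged += amount  # the side effect happens first
        if self._calls == 1:
            raise TimeoutError("response lost")  # the client never sees the result
        return "succeeded"


def pay_with_naive_retry(server: LossyPaymentServer, amount: int, max_attempts: int = 3) -> str:
    for _ in range(max_attempts):
        try:
            return server.charge(amount)
        except TimeoutError:
            continue  # the client cannot tell Case A from Case B, so it resends
    raise RuntimeError("gave up")


server = LossyPaymentServer()
status = pay_with_naive_retry(server, 5000)
print(status, server.total_charged)  # succeeded 10000
```

The client sees a single "succeeded", yet 10000 was deducted for a 5000 payment. Nothing in the client code is obviously wrong, which is what makes this bug hard to spot.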

Idempotency Keys — Putting a Name Tag on a Request

The standard way to implement idempotency is the idempotency key. The idea is simple: the client attaches a unique identifier to each payment request, and the server uses that key to judge "have I seen this request before?"

  POST /v1/charges
  Idempotency-Key: 9f2c1a7e-...-b3   <- unique key generated by the client
  { "amount": 5000, "currency": "usd", "source": "tok_..." }

The server's processing rule is this.

  On receiving a request:
  1) Have I seen this idempotency key before?
     - No  -> actually process the payment, store (key -> result), then return the result
     - Yes -> return that stored result as-is (do not run the payment again)

Thanks to this simple rule, no matter how many times the client retries with the same key, the payment happens exactly once. Only the first request executes the actual payment; later retries receive a "copy" of the stored first result.
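The rule above can be sketched with an in-memory dictionary. This is a minimal, single-threaded illustration: `IdempotentChargeHandler` and `fake_process_payment` are hypothetical names, and the sketch deliberately ignores concurrent requests and persistence.

```python
from typing import Callable, Dict, List


class IdempotentChargeHandler:
    def __init__(self, process_payment: Callable[[int], dict]) -> None:
        self._process_payment = process_payment
        self._results: Dict[str, dict] = {}  # idempotency key -> stored result

    def handle(self, idempotency_key: str, amount: int) -> dict:
        if idempotency_key in self._results:
            # Seen before: replay the stored result, do not run the payment again.
            return self._results[idempotency_key]
        # First time: actually process the payment, then remember the result.
        result = self._process_payment(amount)
        self._results[idempotency_key] = result
        return result


charges: List[int] = []


def fake_process_payment(amount: int) -> dict:
    """Stand-in for the real payment; records each execution."""
    charges.append(amount)
    return {"id": f"ch_{len(charges)}", "amount": amount, "status": "succeeded"}


handler = IdempotentChargeHandler(fake_process_payment)
first = handler.handle("key-1", 5000)
retry = handler.handle("key-1", 5000)
print(first == retry, len(charges))  # True 1
```

The retry receives the same payment ID as the first call, and the payment function ran only once.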

Who creates the key, and how, matters here.

  • The key is generated by the client. That way it can resend the same key on retry. If the server made the key, the client would not know which key to use on retry.
  • Each key should correspond to one logical operation. For example, assign one key to the single operation "pay for cart X," and use that same key for all retries of that operation. Using a sufficiently random value like a UUID removes any worry about accidental collision.
  • When the user presses pay again intending a new operation, a new key must be created. Reusing the same key makes the system treat it as a "retry" and block the new payment. Aligning this well with the UX is important.
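On the client side, these rules amount to generating the key once per logical operation and reusing it on every attempt. The sketch below assumes a hypothetical `send(key, body)` transport function and leaves out backoff between attempts for brevity.

```python
import uuid
from typing import Callable, List


def pay_with_retries(send: Callable[[str, dict], dict], cart_id: str,
                     amount: int, max_attempts: int = 3) -> dict:
    # One key per logical operation, generated once, before the first attempt.
    idempotency_key = str(uuid.uuid4())
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(idempotency_key, {"cart_id": cart_id, "amount": amount})
        except TimeoutError as exc:
            last_error = exc  # outcome unknown: resend with the SAME key
    raise RuntimeError("payment outcome unknown") from last_error


seen_keys: List[str] = []


def flaky_send(key: str, body: dict) -> dict:
    """Hypothetical transport: the first attempt times out, the second succeeds."""
    seen_keys.append(key)
    if len(seen_keys) == 1:
        raise TimeoutError("no response")
    return {"status": "succeeded", "key": key}


result = pay_with_retries(flaky_send, "cart-42", 5000)
print(len(seen_keys), seen_keys[0] == seen_keys[1])  # 2 True
```

Calling `pay_with_retries` a second time would generate a fresh key, which matches the case where the user intentionally starts a new payment.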

Dedup Windows and Storage

To store an idempotency key, you must decide "until when" to store it. This is the dedup window. You cannot keep key-result records forever, so you usually retain them for a fixed period (for example, 24 hours).

The criterion for the window length is "over what time range do retries realistically happen?" Retries due to network errors usually happen within seconds or minutes. But to allow for a client that was briefly offline and retries later, many teams set a generous window of about a day. Stripe, for example, keeps idempotency keys for at least 24 hours before they become eligible for removal.

The store itself should support fast lookups and expiration (TTL). Common choices are these.

  • A dedicated table in a relational DB: it can be handled within the same transaction as the payment's ledger data, which is good for consistency. It pairs well with the unique constraint we will see next.
  • An in-memory store like Redis + TTL: very fast, with automatic expiration. But since it is a separate store from the ledger DB, you must carefully handle consistency between the two (for example, the key was recorded but the payment transaction failed).

Whatever you use, it is important that the value you store is not a simple "I saw this key" flag but the entire result of the first request. You must return to the retrying client exactly the same thing as the original response (same payment ID, same status, same amount).
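A dedup window that stores the full response can be sketched as a small in-memory store with expiry. `DedupStore` is a hypothetical name, and the injectable clock is an assumption added so the expiry can be demonstrated without waiting; a real deployment would more likely rely on the TTL support of its database or cache.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class DedupStore:
    def __init__(self, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._window = window_seconds
        self._clock = clock
        # key -> (time stored, entire response of the first request)
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def put(self, key: str, response: dict) -> None:
        self._entries[key] = (self._clock(), response)

    def get(self, key: str) -> Optional[dict]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, response = entry
        if self._clock() - stored_at >= self._window:
            del self._entries[key]  # expired: the key counts as never seen
            return None
        return response


now = [0.0]  # fake clock, in seconds
store = DedupStore(window_seconds=24 * 3600, clock=lambda: now[0])
store.put("key-1", {"id": "ch_1", "status": "succeeded", "amount": 5000})

now[0] = 3600.0                      # one hour later
within_window = store.get("key-1")   # full original response
now[0] = 25 * 3600.0                 # 25 hours later
after_window = store.get("key-1")    # None
```

Because the stored value is the whole response rather than a flag, a retry inside the window gets back the same payment ID, status, and amount.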

Unique Constraints — A Safety Net You Hand to the Database

If you implement idempotency keys only with "look up first, insert if absent," a subtle race condition remains. If two requests with the same key arrive almost simultaneously, both may judge "this key does not exist" and then both proceed with the payment. The gap between lookup and insert is the problem.

The most robust tool to fundamentally close this gap is the database's unique constraint. If you put a unique constraint on the idempotency key column, then even if two requests try to insert with the same key at the same time, the database lets exactly one succeed and rejects the rest. The database atomically guarantees "exactly one, concurrently."

CREATE TABLE payment_requests (
  idempotency_key TEXT PRIMARY KEY,   -- unique constraint: no duplicate key inserts
  status          TEXT NOT NULL,      -- 'in_progress' | 'succeeded' | 'failed'
  response_body   JSONB,              -- store the entire result of the first request
  created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

The flow that leverages this constraint looks roughly like this.

  1) Try to INSERT an 'in_progress' row with this key
     - Succeeds -> I am the "owner" of this operation. Actually process the payment and UPDATE the row with the result
     - Fails on unique violation -> means another request already claimed this key
         -> look up the existing row:
            - if already complete, return the stored result
            - if still in_progress, wait briefly and re-check, or return an "in progress" response

The key is to make the act of "claiming the slot first" itself an INSERT with a unique constraint. The look-up-then-insert gap disappears, so even among simultaneous retries exactly one executes the payment. If you handle this row in the same transaction as the ledger data, the payment record and the idempotency record always commit together, guaranteeing consistency. If you want to experiment with SQL constraints and transactions directly, this kind of schema and constraint can be tried in this site's SQL and Postgres playgrounds too.
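The claim-by-INSERT flow can be tried end to end with Python's built-in `sqlite3` module. This is a sketch under assumptions: SQLite stands in for Postgres (so `response_body` is TEXT rather than JSONB), `handle_charge` and `fake_process_payment` are hypothetical names, and for simplicity the claim and the result are committed in two separate transactions.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE payment_requests (
        idempotency_key TEXT PRIMARY KEY,
        status          TEXT NOT NULL,
        response_body   TEXT
    )
""")


def handle_charge(conn, key, amount, process_payment):
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO payment_requests (idempotency_key, status) "
                "VALUES (?, 'in_progress')",
                (key,),
            )
    except sqlite3.IntegrityError:
        # Unique violation: another request already claimed this key.
        status, body = conn.execute(
            "SELECT status, response_body FROM payment_requests "
            "WHERE idempotency_key = ?",
            (key,),
        ).fetchone()
        if status == "in_progress":
            return {"status": "in_progress"}  # caller may re-check later
        return json.loads(body)  # replay the stored result

    # The INSERT succeeded, so this request owns the operation.
    result = process_payment(amount)
    with conn:
        conn.execute(
            "UPDATE payment_requests SET status = ?, response_body = ? "
            "WHERE idempotency_key = ?",
            (result["status"], json.dumps(result), key),
        )
    return result


charges = []


def fake_process_payment(amount):
    charges.append(amount)
    return {"id": f"ch_{len(charges)}", "amount": amount, "status": "succeeded"}


first = handle_charge(conn, "key-1", 5000, fake_process_payment)
second = handle_charge(conn, "key-1", 5000, fake_process_payment)
print(first == second, len(charges))  # True 1
```

One thing this sketch does not handle: if the process crashes between the claim and the UPDATE, the row stays in_progress and needs separate recovery.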

Modeling a Payment as a State Machine

To handle idempotency properly, it helps to model a payment not as a simple two-value "success/failure" but as a state machine. A payment passes through several intermediate states, and how a retry or a callback should act differs by state.

A typical payment state flow looks like this.

  created ──▶ authorizing ──▶ authorized ──▶ capturing ──▶ captured
     │            │                              │
     │            ▼                              ▼
     │         failed                          failed
  canceled
                (separate flow) captured ──▶ refunding ──▶ refunded

Briefly, the meaning of each state.

  • created: the intent to pay is made but before authorization.
  • authorizing: the authorization request has been sent to the PSP/issuer (awaiting response).
  • authorized: authorization complete, hold placed. Before capture.
  • capturing / captured: capture in progress / complete.
  • failed / canceled / refunded: terminal states.

The reason a state machine helps decisively with idempotency is that it lets you clearly define the transitions allowed from each state. For example, if a capture request arrives again for a payment already in captured, that is a retry, so you do not capture again and instead return the current state as-is. If the same request arrives again while in authorizing, you can tell it is "still processing." In other words, the state machine becomes the basis for judging "is this request valid in the current state, or is it an already-handled retry?"

Storing state explicitly also makes crash recovery easier. If a payment is stuck in authorizing, the system can query the PSP for that transaction's actual status, confirm whether it is authorized or failed, and correct the state. Such a query is often called reconciliation or state synchronization.
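One way to encode the diagram is a table of allowed transitions plus a function that checks each request against it. The sketch below is an assumption about how such a check might look, not a prescribed design: `ALLOWED_TRANSITIONS`, `InvalidTransition`, and `transition` are hypothetical names, and a request for the state the payment is already in is treated as a retry and returned unchanged.

```python
ALLOWED_TRANSITIONS = {
    "created":     {"authorizing", "canceled"},
    "authorizing": {"authorized", "failed"},
    "authorized":  {"capturing"},
    "capturing":   {"captured", "failed"},
    "captured":    {"refunding"},
    "refunding":   {"refunded"},
    "failed":      set(),  # terminal
    "canceled":    set(),  # terminal
    "refunded":    set(),  # terminal
}


class InvalidTransition(Exception):
    pass


def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the move is not allowed."""
    if current == target:
        return current  # already there: treat as a retry, change nothing
    if target in ALLOWED_TRANSITIONS[current]:
        return target
    raise InvalidTransition(f"{current} -> {target} is not allowed")


print(transition("authorized", "capturing"))  # capturing
print(transition("captured", "captured"))     # captured (retry, no second capture)
try:
    transition("failed", "captured")
except InvalidTransition as exc:
    print(exc)  # failed -> captured is not allowed
```

Keeping the table as data makes it easy to review which moves are legal and to reject anything else by default.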

Adding Dedup to At-Least-Once Delivery

Idempotency is central not only to the payment API but also to the asynchronous messages throughout a payment system. Payment events often flow through a message queue (for example, a "payment succeeded" event propagated to the settlement, notification, and accounting services), and most queues guarantee at-least-once delivery.

Let us look precisely at what at-least-once means. The queue guarantees a message is delivered "at least once," but not "exactly once." That is, the same message can be delivered more than once. The reason is that if a failure occurs while the consumer, having processed a message, is telling the queue "done" (ack), the queue cannot know whether the message was processed, so it errs on the side of safety and resends it. This is exactly the same structure as the payment API's timeout problem we saw earlier.

  Consumer receives a "payment succeeded" event
     -> processes it (e.g. credit balance, send receipt)
     -> failure while sending the ack
        -> the queue does not know whether it was processed -> redelivers
           -> risk of the same event being processed twice

Truly guaranteeing "exactly-once" at the delivery layer is very hard. The standard solution in practice is at-least-once delivery + dedup on the consumer side. That is, the queue guarantees only at-least-once delivery, and the consumer checks for itself whether "I already processed this message" and filters out duplicates. This is commonly called an idempotent consumer.

The implementation follows the same principle as the payment API's idempotency. Give each message a unique ID, and have the consumer store the IDs of processed messages. When a new message arrives, check whether its ID is already in the processed list, and if so, silently ignore it. Here too a unique constraint becomes a sturdy safety net. If inserting the "processed message ID" into a unique column fails, that failure is itself the signal that "it was already processed." If you want to observe the queue's at-least-once, ack, and redelivery behavior, you can work with it directly in the Message Queue Playground.
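An idempotent consumer along these lines can be sketched with `sqlite3`. The table names, `handle_event`, and the balance-credit side effect are assumptions made for the example; the important part is that the "processed" marker and the side effect commit in one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER NOT NULL)")
conn.execute("INSERT INTO balances VALUES ('merchant-1', 0)")
conn.commit()


def handle_event(conn, message_id: str, account: str, amount: int) -> bool:
    """Apply a 'payment succeeded' event once. Returns False for a duplicate."""
    try:
        with conn:  # one transaction: marker and side effect commit together
            conn.execute(
                "INSERT INTO processed_messages (message_id) VALUES (?)",
                (message_id,),
            )
            conn.execute(
                "UPDATE balances SET amount = amount + ? WHERE account = ?",
                (amount, account),
            )
        return True
    except sqlite3.IntegrityError:
        # The unique violation is itself the "already processed" signal.
        return False


first = handle_event(conn, "evt-1", "merchant-1", 5000)
redelivered = handle_event(conn, "evt-1", "merchant-1", 5000)  # queue redelivery
balance = conn.execute(
    "SELECT amount FROM balances WHERE account = 'merchant-1'"
).fetchone()[0]
print(first, redelivered, balance)  # True False 5000
```

Side effects that live outside this database, such as sending a receipt, are not covered by the transaction and would need their own dedup.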

Stripe-Style Idempotency

Let us summarize how these pieces come together in a real product, using Stripe's idempotency design as an example. Stripe exposes idempotency as a first-class feature of its API, and its rules follow the principles covered in this post almost exactly.

The core rules of Stripe's idempotency are these.

  • The client makes the request with a unique key in the Idempotency-Key header. It is usually a random value like a UUID v4.
  • If a request comes again with the same key, Stripe replays and returns the result of the first request as-is. The payment does not happen again.
  • Idempotency keys are kept for at least 24 hours, after which they become eligible for removal. Once a key has been removed, a request that reuses it is treated as a new request.
  • If a concurrent request comes with the same key while the first is still processing, it returns an error (an HTTP 409 Conflict response) to prevent a race. The client can just retry a moment later.
  • If you send the same key with a different request body, Stripe treats it as misuse and returns an error. One key must correspond to only one request.

  1st request: Idempotency-Key: K, body: {amount: 5000}
               -> run payment, record result under K, return result

  retry:       Idempotency-Key: K, body: {amount: 5000}   (same body)
               -> replay the recorded result, payment does not happen

  misuse:      Idempotency-Key: K, body: {amount: 9999}   (different body)
               -> error (different requests under one key are not allowed)
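The three cases above can be reproduced by storing a fingerprint of the request body next to the result. This is a sketch of the general technique, not Stripe's actual implementation: the SHA-256 fingerprint over canonical JSON, `ReplayingHandler`, and `IdempotencyMisuse` are all assumptions introduced for illustration.

```python
import hashlib
import json


class IdempotencyMisuse(Exception):
    pass


def fingerprint(body: dict) -> str:
    # Canonical JSON so that key order does not change the hash.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ReplayingHandler:
    def __init__(self, process_payment) -> None:
        self._process_payment = process_payment
        self._records = {}  # key -> (body fingerprint, stored result)

    def handle(self, key: str, body: dict) -> dict:
        fp = fingerprint(body)
        record = self._records.get(key)
        if record is not None:
            stored_fp, stored_result = record
            if stored_fp != fp:
                raise IdempotencyMisuse("same key, different request body")
            return stored_result  # retry: replay, the payment does not run
        result = self._process_payment(body)
        self._records[key] = (fp, result)
        return result


charges = []


def fake_process_payment(body: dict) -> dict:
    charges.append(body["amount"])
    return {"id": f"ch_{len(charges)}", "amount": body["amount"], "status": "succeeded"}


handler = ReplayingHandler(fake_process_payment)
first = handler.handle("K", {"amount": 5000})
retry = handler.handle("K", {"amount": 5000})
try:
    handler.handle("K", {"amount": 9999})
    misuse_rejected = False
except IdempotencyMisuse:
    misuse_rejected = True
print(first == retry, len(charges), misuse_rejected)  # True 1 True
```

Comparing fingerprints instead of full bodies keeps the stored record small while still catching a key that is reused for a different request.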

To summarize the practical guidance to learn from this design: the client must send the same key on retry, use only one key per logical operation, and create a new key only when intending a new payment. The server stores the key together with the result, prevents concurrency with a unique constraint, and sets an appropriate dedup window.

Practical Checklist

Here is a compressed list of things to verify when introducing idempotency into a payment system.

  • Every state-changing request must be idempotent. Payment creation, capture, refund, and cancel must all be safe to retry. Reads (GET) are inherently idempotent and need no key, but every POST that moves money should require an idempotency key.
  • The client generates the key and reuses it on retry. If the server makes the key, the meaning of a retry disappears.
  • Prevent concurrency with a unique constraint. Look-up-then-insert alone leaves a race condition. The database's unique constraint is the last line of defense.
  • Store the result together with the key. If you store only a flag, you cannot return the original response to a retrying client.
  • Model the payment as a state machine. Defining valid transitions from each state makes retries and crash recovery clear.
  • Put an idempotent consumer on asynchronous events. Assume an at-least-once queue and have the consumer filter duplicates by message ID.
  • Set an appropriate dedup window. Too short and you miss late retries; too long and storage cost and the risk of colliding with a legitimate reuse grow. 24 hours is a common starting point.

Conclusion

A double charge is not an obvious bug in the code but a problem that flows naturally from the fundamental nature of distributed systems: "the network fails, and when it fails, clients retry." A timeout makes it ambiguous whether the payment happened, and within that ambiguity a retry produces a duplicate.

Idempotency is the key that unties this knot. When the client tags a request with a unique key and the server filters duplicates by that key and replays the result, the payment happens exactly once no matter how many times you retry. Add a unique constraint to lock down concurrency, a state machine to clarify the flow, and an idempotent consumer on the asynchronous paths, and you have a payment system that is "safe to retry."

To summarize in a single sentence: do not try to eliminate retries; make retries safe. In distributed systems retries are unavoidable. So the right answer is to design every payment to be idempotent, so that retries do no harm.
