Chaos and Order

Introduction — The Problem With "Pick Two"
The Three Letters, Defined Precisely
Why "Two of Three" Is Wrong
PACELC — The More Complete Picture
Where Real Systems Actually Stand
The C in CAP Is Not the C in ACID
Consistency Is a Spectrum
Common Misconceptions
A Designer's Practical Guide
Conclusion
References

Introduction — The Problem With "Pick Two"

When you first learn distributed systems, you almost always meet the CAP theorem. And you almost always meet it in this condensed form: "You can only pick two of Consistency, Availability, and Partition tolerance." A triangle diagram follows, and each database gets pinned to one of the three sides.

This explanation is easy to remember, but it is wrong. More precisely, it is simplified to the point of being misleading. What the CAP theorem actually says is far narrower and more precise. The goal of this post is to undo that simplification: we will pin down exactly what each of the three letters means, why "two of three" is a misunderstanding, and what the theorem implies for real system design.

The Three Letters, Defined Precisely

First we have to nail down the terms. The three letters of CAP sound like everyday words, but inside the theorem they carry very specific meanings.

C — Consistency. Consistency here means linearizability. In plain terms, every client sees the data as if there were only a single copy. Once a write completes successfully, every read that starts afterward must see that write (or a more recent one). It must not return a stale value. This is a completely different concept from the C in ACID, which we will get to later.

A — Availability. Availability here means that every non-failed node must return a (non-error) response to every request. The key phrase is "every non-failed node." It is not enough for some nodes to respond; if a node is alive, it must answer. The formal statement puts no bound on how long the response may take, only that it must eventually arrive. In practice, a response that never arrives, or arrives too late to be useful, does not count as available.

P — Partition tolerance. A partition is a situation where the network splits so that messages cannot pass between groups of nodes. Partition tolerance is the property that the system keeps operating even when such a partition occurs.

Lay these three definitions side by side and you can already start to see why the familiar triangle is misleading.
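
To make the definition of C concrete, here is a minimal sketch of a stale-read check. It assumes a single register, non-overlapping writes, and a version number that grows with each write; the `Write` and `Read` types and the `stale_reads` helper are hypothetical names invented for this illustration. It flags only the violation described above (a read that starts after a write was acknowledged but returns an older version), which is a necessary condition for linearizability, not a full checker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    version: int   # assumed to increase with every write
    end: float     # time at which the write was acknowledged


@dataclass(frozen=True)
class Read:
    start: float   # time at which the read was issued
    version: int   # version the read returned


def stale_reads(writes, reads):
    """Return reads that began after a write completed yet returned an older version."""
    stale = []
    for r in reads:
        # Versions whose writes were already acknowledged when the read began.
        acked = [w.version for w in writes if w.end < r.start]
        if acked and r.version < max(acked):
            stale.append(r)
    return stale


writes = [Write(version=1, end=1.0), Write(version=2, end=5.0)]
reads = [
    Read(start=2.0, version=1),  # fine: only version 1 had been acknowledged
    Read(start=6.0, version=1),  # stale: version 2 was acknowledged at t=5.0
    Read(start=6.5, version=2),  # fine
]
violations = stale_reads(writes, reads)
print(violations)
```

Only the second read is reported. A system that can produce such a history is not linearizable, whatever else it guarantees.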

Why "Two of Three" Is Wrong

Here is the crux. Partition tolerance (P) is not optional. Network partitions happen in reality whether you choose them or not. Switches die, cables get cut, the link between data centers stalls for a moment. As long as multiple machines communicate over a network, a partition is not something that "might happen" — it is something that "will eventually happen."

So "drop P" is not really a choice that exists. A system that assumes partitions never happen has no defined behavior when one does: it may return wrong answers or corrupt data. In any serious distributed system, P is not up for negotiation.

Then what is the real choice? What the CAP theorem actually says is this:

At the moment a network partition occurs, you must give up either consistency (C) or availability (A).

When a partition happens, groups of nodes cannot talk to each other. If a write request arrives at one group at that moment, there are exactly two choices.

Choose availability (AP): Just accept the write and respond. But the other group does not know about this change, so a read there returns a stale value. You gave up consistency.
Choose consistency (CP): Since you cannot coordinate with the other group, reject or block this request. You cannot respond, so you gave up availability.

So CAP is not a matter of "picking two of three during normal operation." It is a matter of "C or A when a partition happens." During normal operation, with no partition, you get both C and A. The choice is forced only in the exceptional situation of a partition.

  No partition (normal)       Partition occurs
  ┌──────────────────┐        ┌──────────────────┐
  │ You can have     │  -->   │ You must choose  │
  │ both C and A     │        │ only C or only A │
  └──────────────────┘        └──────────────────┘
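
The two choices can be sketched as a toy simulation. This is a deliberately tiny model, not how any real database is built: two in-memory replicas, synchronous replication when the link is up, and a `mode` flag that decides what happens when it is down. The `Replica`, `Unavailable`, and `run` names are all made up for this example.

```python
class Unavailable(Exception):
    """Raised when a CP replica refuses a request it cannot coordinate."""


class Replica:
    def __init__(self, mode):
        self.mode = mode          # "AP" or "CP"
        self.value = None
        self.peer = None
        self.partitioned = False  # True while the peer is unreachable

    def write(self, value):
        if self.partitioned:
            if self.mode == "CP":
                raise Unavailable("cannot reach peer; refusing write")
            self.value = value    # AP: accept locally, the peer is now stale
            return
        self.value = value
        self.peer.value = value   # no partition: replicate synchronously

    def read(self):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot confirm the latest value")
        return self.value


def run(mode):
    a, b = Replica(mode), Replica(mode)
    a.peer, b.peer = b, a
    a.write("v1")                          # normal operation: both replicas agree
    a.partitioned = b.partitioned = True   # the link between them goes down
    try:
        a.write("v2")
        return a.read(), b.read()
    except Unavailable:
        return "unavailable"


print(run("AP"))  # ('v2', 'v1')
print(run("CP"))  # unavailable
```

In AP mode the write succeeds but the two replicas now disagree. In CP mode nobody sees a stale value because the request is refused. Before the partition, both modes behaved identically.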

PACELC — The More Complete Picture

CAP covers only half the story. It describes what happens during a partition and is silent about normal operation, yet a system spends most of its time in the normal, partition-free state. The framework that describes the trade-off during that time is PACELC.

You read PACELC like this:

If there is a partition (P), you choose between availability (A) and consistency (C). Else (E), you choose between latency (L) and consistency (C).

The first half (PAC) is the same as CAP. The new part is the second half (ELC): even without a partition, a trade-off remains. To maintain strong consistency, multiple replicas must coordinate, and that coordination takes time. In other words, raising consistency raises latency. Conversely, to reduce latency you must loosen or skip coordination, which weakens consistency.

Through this lens you can classify systems along two axes.

Class	On partition (PAC)	Normal (ELC)	Meaning
PA/EL	Prefer availability	Prefer latency	Always chooses speed and availability
PC/EC	Prefer consistency	Prefer consistency	Always chooses consistency
PA/EC	Prefer availability	Prefer consistency	Only relaxes consistency under partition
PC/EL	Prefer consistency	Prefer latency	Only defends consistency under partition

PACELC is better than CAP because it captures the trade-off we actually face every day. Partitions are rare events, but "consistency versus latency" is a choice that runs continuously through normal operation.
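
The ELC half can be made concrete with a little arithmetic. The sketch below assumes a Dynamo-style quorum scheme with N replicas, where a write waits for W acknowledgements and a read waits for R responses. The round-trip times are invented numbers and both helper functions are hypothetical. Note that overlapping quorums (R + W > N) are a common building block for stronger reads, but on their own they do not guarantee linearizability.

```python
def quorum_latency(replica_rtts_ms, acks_needed):
    """Latency of an operation that waits for the fastest `acks_needed` replicas."""
    return sorted(replica_rtts_ms)[acks_needed - 1]


def quorums_overlap(n, r, w):
    """True when every read quorum shares at least one replica with every write quorum."""
    return r + w > n


# Hypothetical round-trip times in ms: same rack, same continent, overseas.
rtts = [2, 35, 140]
n = len(rtts)

for r, w in [(1, 1), (2, 2), (1, 3)]:
    print(
        f"R={r} W={w} overlap={quorums_overlap(n, r, w)} "
        f"read={quorum_latency(rtts, r)}ms write={quorum_latency(rtts, w)}ms"
    )
```

With R=1 and W=1 every operation takes about 2 ms, but a read may miss the latest write. Requiring overlap pushes at least one of the two operations up to 35 ms or 140 ms. That is the latency-versus-consistency trade-off, with no partition in sight.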

Where Real Systems Actually Stand

Mapping the theory onto real systems makes it much sharper.

The Amazon Dynamo family — AP (and EL). Dynamo and its descendants (for example Cassandra, Riak, and some configurations of DynamoDB) are designed with availability as the top priority. Even under a partition they keep accepting writes, and later, when replicas meet again, they resolve conflicts. This family provides eventual consistency. Right now different nodes may hold different values, but given time they converge. Even in the normal case they prefer low latency over waiting for coordination, so they lean EL. The requirement "the shopping cart must never stall" produced this choice. If items added from two devices diverge for a moment, that is fine — they can be merged later.
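
A minimal sketch of the "merge later" idea, assuming the cart is modeled as a plain set of item names and that reconciliation is a set union. The `merge_carts` function and the sample carts are hypothetical. This is an illustration of the principle, not Dynamo's actual reconciliation code.

```python
def merge_carts(*replica_carts):
    """Reconcile divergent carts by taking the union of their items."""
    merged = set()
    for cart in replica_carts:
        merged |= cart
    return merged


phone_cart = {"book", "headphones"}     # written on one side of the partition
laptop_cart = {"book", "coffee beans"}  # written on the other side
merged = merge_carts(phone_cart, laptop_cart)
print(sorted(merged))  # ['book', 'coffee beans', 'headphones']
```

Union is commutative, associative, and idempotent, so replicas reach the same cart no matter in which order they exchange state. The price is that a plain union cannot express removal: an item deleted on one side can reappear after the merge.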

Google Spanner — CP (and EC). Spanner is the opposite extreme. It is a globally distributed database that nonetheless provides strong consistency (external consistency, effectively linearizability). When a partition occurs, the side that cannot reach consensus (Paxos) stops accepting writes. That is, it gives up availability for consistency. Even in the normal case it goes through inter-replica consensus and precise time synchronization, so latency rises — it is EC. Spanner is famous because a mechanism called TrueTime, built from atomic clocks and GPS, made this strong consistency practical at global scale. But it did not sidestep CAP. Under a partition it still gives up availability.
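
The cost of that time synchronization shows up as "commit wait." The sketch below is a toy model loosely following the description in the Spanner paper: the clock returns an interval that is guaranteed to contain the true time, the commit timestamp is taken from the upper end of that interval, and the commit is not made visible until the lower end has passed it. `FakeTrueTime`, its 4 ms uncertainty, and the integer-microsecond simulated clock are all assumptions made for this example, not Google's API.

```python
from dataclasses import dataclass


@dataclass
class FakeTrueTime:
    """Toy stand-in for TrueTime: a clock that reports an uncertainty interval."""
    true_now_us: int = 100_000_000  # simulated "real" time, in microseconds
    epsilon_us: int = 4_000         # assumed clock uncertainty (hypothetical 4 ms)

    def now(self):
        # Returns (earliest, latest); the true time lies somewhere in between.
        return self.true_now_us - self.epsilon_us, self.true_now_us + self.epsilon_us

    def sleep(self, microseconds):
        self.true_now_us += microseconds  # advance simulated time instead of blocking


def commit(tt, step_us=1_000):
    """Pick a commit timestamp, then wait until it is definitely in the past."""
    _, commit_ts = tt.now()          # timestamp = latest possible current time
    waited_us = 0
    while tt.now()[0] <= commit_ts:  # earliest possible time has not passed commit_ts
        tt.sleep(step_us)
        waited_us += step_us
    return commit_ts, waited_us


tt = FakeTrueTime()
commit_ts, waited_us = commit(tt)
print(commit_ts, waited_us)  # 100004000 9000
```

The wait is roughly twice the clock uncertainty (here 8 ms, rounded up to 9 ms by the 1 ms polling step). That is why tight clock bounds matter: the smaller the uncertainty, the cheaper strong consistency becomes. But the cost never reaches zero.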

A traditional single-node RDBMS. What about a single instance of PostgreSQL or MySQL? There are no replicas split by a network here, so the very concept of a partition is blurry. CAP is fundamentally a theorem about data replicated across multiple nodes. A single node is off that stage. Of course, the moment you attach replication you step back onto the stage of this trade-off.

To summarize: no system can claim to have "beaten CAP" or to "have all three." When a partition happens, everyone must choose between C and A. The difference is only in which side the system was designed to choose.

The C in CAP Is Not the C in ACID

Here we must address a very common confusion. The C in CAP and the C in ACID share only their spelling; their meanings are completely different.

The C in CAP = linearizability. Across multiple replicas, every client sees the most recent value. This is a property of distributed replication.
The C in ACID = consistency, but here it means "preserving constraints." A transaction does not violate the database's invariants (foreign keys, unique constraints, user-defined rules) and only moves from a valid state to a valid state. This is a property of the integrity of a single transaction.

They live on different levels. The C in ACID is a promise that the database enforces the rules the application defined; the C in CAP is a promise that multiple replicas show the same value. If you conflate the two, you end up writing wrong statements like "a CP system guarantees ACID." In reality they are independent concepts.
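
To see that the C in ACID has nothing to do with replicas, here is a small sketch using Python's built-in `sqlite3` module on a single in-memory database. The `accounts` table, the `balance >= 0` rule, and the `transfer` helper are invented for this example. The database refuses to leave the valid state even though only one node and one copy of the data are involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money in one transaction; roll back if any invariant is violated."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 500)  # would drive alice's balance negative
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, balances)
```

The transfer is rejected and both balances are unchanged, including the credit to bob that had already been applied inside the transaction. That is ACID consistency: a rule about valid states, enforced per transaction, with no network in the picture.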

One step deeper: serializability, the top of the isolation levels often mentioned in distributed transactions, is also different from CAP's linearizability. Serializability means several transactions produce a result equivalent to some serial execution; linearizability means single-object operations respect real-time order. The strong guarantee that combines the two is called strict serializability, and that is roughly what Spanner aims for. The terms are similar and easy to confuse, so always ask yourself "an ordering guarantee about what, exactly?"

Consistency Is a Spectrum

CAP treats consistency as a dichotomy — linearizable or not — but real-world consistency is a broad spectrum. Walking a few steps from the strong end to the weak end helps build intuition.

Linearizable: The strongest model on this list. Every read sees the most recent completed write.
Sequential: All clients see operations in the same order, but that order need not match real time.
Causal: Only causally related operations preserve their order. Unrelated operations may appear reordered.
Eventual: Absent new writes, all replicas eventually converge to the same value. Until then you may see stale values.

When an AP system "gives up consistency," it usually does not collapse into total chaos; it chooses the weaker end of this spectrum (for example, eventual or causal consistency). So rather than the dichotomy of "AP or CP," the more practical question is "which level of consistency am I buying, and at what cost?"
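
The "converge" part of eventual consistency can be shown with a last-write-wins register. This sketch assumes each replica state is a `(timestamp, replica_id, value)` tuple and that ties on the timestamp are broken by replica id. The `lww_merge` function and the sample states are hypothetical.

```python
from itertools import permutations


def lww_merge(a, b):
    """Merge two (timestamp, replica_id, value) states; the larger (timestamp, replica_id) wins."""
    return max(a, b, key=lambda s: (s[0], s[1]))


# Three replicas accepted different writes while they could not talk to each other.
states = [(10, "r1", "red"), (12, "r2", "green"), (12, "r3", "blue")]

results = set()
for order in permutations(states):
    merged = order[0]
    for s in order[1:]:
        merged = lww_merge(merged, s)
    results.add(merged)

print(results)  # {(12, 'r3', 'blue')}
```

Every merge order ends in the same state, so the replicas converge. Notice the cost, though: the "green" write, made at the same timestamp, is silently discarded. Convergence is guaranteed; keeping every update is not.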

Common Misconceptions

Restating the material as a list of misconceptions:

"You pick two of three during normal operation." No. With no partition you get both C and A. The choice is forced only at the moment of a partition.
"P is an optional choice you can drop." No. Network partitions inevitably happen in reality, so in a distributed system P is effectively mandatory.
"CA systems exist." In a system replicated across multiple nodes, a CA that ignores partitions does not really hold. When a partition occurs you will lose either C or A.
"A CP system is always in an unavailable state." No. A CP system gives up availability only at the moment of a partition, and only in the affected part. The rest of the time it responds normally.
"The C in CAP is the C in ACID." No. As the previous section showed, they are completely different concepts.
"Eventual consistency means the data is a jumbled mess." No. It guarantees convergence, and there are stronger forms such as preserving causality.

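The point that a CP system becomes unavailable "only in the affected part" can be sketched with a majority rule. This assumes a consensus-style design in which a side of the partition may keep serving only if it can reach a strict majority of the cluster. The `has_majority` helper and the 3/2 split are made up for illustration.

```python
def has_majority(reachable_nodes, cluster_size):
    """A partition side may keep serving only if it reaches a strict majority."""
    return reachable_nodes > cluster_size // 2


cluster_size = 5
sides = {"side A (3 nodes)": 3, "side B (2 nodes)": 2}
status = {name: has_majority(count, cluster_size) for name, count in sides.items()}
print(status)  # {'side A (3 nodes)': True, 'side B (2 nodes)': False}
```

Side A keeps answering requests during the partition; only clients that can reach nothing but side B are refused. With an even split (for example 2 of 4), neither side has a majority, which is one reason such clusters are usually sized with an odd number of nodes.
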
A Designer's Practical Guide

Here are useful questions for turning this theory into real decisions.

First, ask: is it acceptable to briefly show a stale value for this data? A social feed's like count or a product's view count can be off for a moment without disaster. Here AP and low latency are reasonable. Conversely, data that must never be double-spent or double-sold (a bank balance, an inventory quantity, a seat reservation) needs strong consistency, so it leans toward CP.

Next, ask: have you explicitly designed the behavior under a partition? Many incidents come from the implicit assumption that "a partition won't happen." Partitions will come, so you must decide in advance: "at that moment, will I block writes, or accept them and reconcile later?" If you chose the latter, you must also decide on a conflict-resolution strategy (last-write-wins by timestamp, version vectors, CRDTs, and so on).
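
As one example of the strategies just listed, here is a minimal sketch of version-vector comparison. It assumes each version vector is a dict mapping a node id to the number of writes that node has accepted, with missing entries treated as zero. The `compare` function and its return labels are hypothetical.

```python
def compare(vv_a, vv_b):
    """Compare two version vectors: 'equal', 'a_newer', 'b_newer', or 'concurrent'."""
    nodes = set(vv_a) | set(vv_b)
    a_ahead = any(vv_a.get(n, 0) > vv_b.get(n, 0) for n in nodes)
    b_ahead = any(vv_b.get(n, 0) > vv_a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"  # neither side saw the other's write: a real conflict
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"


base = {"n1": 1}
left = {"n1": 2}             # n1 accepted another write during the partition
right = {"n1": 1, "n2": 1}   # n2 accepted a write on the other side

print(compare(left, base))   # a_newer
print(compare(left, right))  # concurrent
```

Unlike last-write-wins, version vectors do not pick a winner. They only tell you whether one version supersedes the other or whether the two are genuinely concurrent, in which case the application (or a CRDT merge function) has to reconcile them.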

Finally, ask: do you know your normal-case latency budget? As PACELC's ELC reminds us, strong consistency costs latency even in the normal case. You must decide whether you can afford waiting for consensus with a replica on the other side of the planet, or whether you will allow weaker consistency per region to cut latency.

Conclusion

The CAP theorem is one of the most frequently cited and most frequently misunderstood results in distributed systems. The hand-wavy "pick two of three" summary is easy to remember, but it hides the truth that matters. The real story is far more precise: partition tolerance is effectively mandatory, and the real choice is made only at the moment a partition occurs, between consistency and availability. And as PACELC reminds us, even in partition-free normal operation there is always a quiet trade-off flowing between consistency and latency.

Dynamo chose availability and Spanner chose consistency, not because one is correct, but because they were solving different problems. A good designer does not claim to have "beaten CAP." Instead they answer clearly the question: "for this data, when a partition happens, what will I give up?" Once you strip away the hand-waving, CAP is not a magic triangle but an honest tool that makes you ask exactly that question.

References

Eric Brewer, "CAP Twelve Years Later: How the 'Rules' Have Changed": https://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed/
Gilbert & Lynch, "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services": https://groups.csail.mit.edu/tds/papers/Gilbert/Brewer2.pdf
Daniel Abadi, "Consistency Tradeoffs in Modern Distributed Database System Design (PACELC)": https://www.cs.umd.edu/~abadi/papers/abadi-pacelc.pdf
Corbett et al., "Spanner: Google's Globally-Distributed Database": https://research.google/pubs/pub39966/
DeCandia et al., "Dynamo: Amazon's Highly Available Key-value Store": https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
Jepsen: Consistency Models: https://jepsen.io/consistency