Table of Contents
Overview
A distributed system is a system in which that has multiple nodes connected to each other through the network. CAP theorem is a fundamental theorem in distributed systems and has three things
- C stands for Consistency
- A stands for Availability
- P stands for Partition Tolerance
It states that you can only achieve only two of the three things mentioned. Why is that?. Let’s first understand each of the above terms before understanding why only two of them can be achieved,
Consistency
This means all nodes are consistent with respect to data. Essentially all nodes see the same view of data. So every read will return the most recent write. Once the writing of any value is successful then the subsequent read will that value
Availability
Your distributed system should be able to send a non-error response in a reasonable amount of time to every request. That response is not guaranteed to be the latest
Partition Tolerance
Your system continues to work in case of network partition happens. It means that even if there is some network issue between the nodes and there are some message failures, packets drop, then also the system continues to operate. It is not that the system shuts down. In other words, the system is tolerant of network partitions.
Ideally, a system should be tolerant of network partitions.
Now we have understood all the terms let’s see what the CAP theorem states
CAP Theorem
CAP theorem states that in a distributed system, you will only be able to achieve two of any three properties above
- CA
- CP
- AP
In other words, a distributed system can’t be consistent, available, and partition tolerant at the same time.
This is illustrated in the graph below
Consistency and Availability
If your system is not tolerant to network partitions then it is possible to achieve both Consistency and Availability
Consistency and Partition Tolerant
If your system is tolerant to network partitions then you can achieve consistency at the cost of availability
Availability and Partition Tolerant
If your system is tolerant to network partitions then you can achieve availability at the cost of consistency
As we mentioned earlier that the system has to be partition tolerant as the network is bound to be failed and communication between nodes is bound to be lost. It should continue to work despite network failure or message loss. As P should always be true for a system hence we can only achieve either C or A. In other words, we can only achieve Consistency or Availability when your system is Partition Tolerant.
Let’s take an example to understand why either Consistency or Availability has to be sacrificed when a system is Partition Tolerant. We will try to prove the CAP theorem in this tutorial using an example.
But even to understand the example correctly, we first need to understand the term Consistency in CAP correctly. When we refer to consistency here we are referring to Strong Consistency here.
A system is strongly consistent if it is reading the most updated write. It never returns out of date or stale values. To take an example consider a distributed system with two nodes. We will see an example of a consistent case as well as an inconsistent case
Example of a consistent case
Both the nodes n0 and n1 contain the value of data item ‘A’ as 1. Read from either of the nodes will return the value of ‘A’ as 1. Now the value of ‘A’ is updated to 2 on both the nodes. Again a read from either of the nodes will return value f ‘A’ as 2. Everything is consistent.
Example of an inconsistent case
Both the nodes n0 and n1 contain the value of data item ‘A’ as 1. Read from either of the nodes will return the value of ‘A’ as 1. Now the value of ‘A’ is updated to 2 on the first node but not on the second. So a read from the first node will return the value of ‘A’ as 2 while a read from node2 will return the value as ‘1’. This is an example of an inconsistent state
Let’s look at the formula for strong consistency now
The formula for Strong Consistency
To understand strong consistency in Databases we need to understand three terms first
- Number of Nodes or Replicas
- Write Quoram
- Read Quoram
Number of Nodes or Replicas
This is the number of Nodes that exist in your system. When we say node here then that essentially means the number of replicas that have the same data
Write Quoram
It is the minimum number of nodes to which write will happen before it is returned as a success.
Read Quoram
It is the minimum number of nodes from which the read will happen before it is returned as a success. If the value returned by those minimum number of nodes is not the same then that read will be rejected
Mathematically if the sum of Write Quoram and Read Quoram is greater than the number of read nodes then the system is said to be strongly consistent.
That is if Write Quoram is W and Read Quoram is R and the number of nodes is N, then
- The system is strongly consistent if W+R > N
- The system is not guaranteed to be strongly consistent if W+R<=N
In other words, if there is a node common between Write Quoram nodes and Read Quoram nodes then the system will be strongly consistent. That common node will reject the stale reads. A node is always common between Write Quoram and Read Quoram whenever W+R > N. It is not mathematically difficult to know why.
Let’s see by some examples how this formula holds. We are going to see the following cases.
- The number of nodes is 1
- The number of nodes is 2
- The number of nodes is 3
The number of nodes is 1
In this case, the only combination of Write and Read Quoram possible is 1 and 1. The Sum of Write Quoram and Read Quoram is greater than the read node. Since there is a single instance, read is always the latest because read and write are happening on the same instance
The number of nodes is 2
In this, there are further four cases.
- Write Quoram is 2 and Read Quoram is 1 – Strongly Consistent
- Write Quoram is 1 and Read Quoram is 2 – Strongly Consistent
- Write Quoram is 2 and Read Quoram is 2 – Strongly Consistent
- Write Quoram is 1 and Read Quoram is 1 – Not Strongly Consistent
Write Quoram is 2 and Read Quoram is 1
In this case, the sum of Write Quoram and Read Quoram is 3 which is greater than the number of nodes hence system should be strongly consistent. In this case, since write is happening on both the nodes, hence read from either of the nodes will return the same data and the system overall will be strongly consistent
Write Quoram is 1 and Read Quoram is 2
Again, in this case, the sum of Write Quoram and Read Quoram is 3 which is greater than the number of nodes hence system should be strongly consistent. In this case, write is happening only on a single instance. But read is happening from both the nodes. Imagine a case where the initial value of a data named A is 1. Both the nodes have a value of A as 1. Now write happened on the first node and the value of A is changed to 2. Since Write Quoram is 1 hence write will happen on the first node and it will return successfully. Imaging before node 2 could sync up with the latest data on node 1, a read happened. Since Read Quoram is 2 it is going to read from both the nodes. The first node will return the value of A as 2 while the second node will return the value of A as 1. Since the value returned by both the nodes is not the same, the system will reject that read to maintain strong consistency.
Write Quoram is 2 and Read Quoram is 2
Again, in this case, the sum of Write Quoram and Read Quoram is 4 which is greater than the number of nodes hence system should be strongly consistent. In this case, since write is happening on both the nodes and read also from both the nodes. Therefore the read will be the latest every time.
Write Quoram is 1 and Read Quoram is 1
Again, in this case, the sum of Write Quoram and Read Quoram is 2 which is equal to the Number of nodes hence system is not strongly consistent. Write happened on node 1. A read happened on node 2 before data could be synced to node 2. This read is stale data.
Proof of CAP Theorem
Now you are aware of the strong consistency, let’s now try to understand the CAP theorem in a better way. Assume you have two nodes in the system. Both are connected to each other and both are in sync.
As we already know with the above formulas that with two nodes, the system will be strongly consistent when
- Write Quoram is 2 and Read Quoram is 1
- Write Quoram is 1 and Read Quoram is 2
- Write Quoram is 2 and Read Quoram is 2
And the system will not be strongly consistent when
- Write Quoram is 1 and Read Quoram is 1
We already mentioned that in a distributed system that is partitioned tolerant, it is only possible to achieve either consistency or availability. Let’s see how. Imagine a network partition happened between the two nodes and the second node is not available
How consistency is achieved at the cost of Availability
Here the number of nodes is 2. Hence two achieve strong consistency we have three options
- Write Quoram is 2 and Read Quoram is 1
- Write Quoram is 1 and Read Quoram is 2
- Write Quoram is 2 and Read Quoram is 2
In the first case since Write Quoram is 2 hence it must write to both the nodes. But since the second node is not available. Hence it will reject the write. Hence the system is not available for write
In the second case since Read Quoram is 2 hence it must read from both the nodes. But since the second node is not available. Hence it will reject the read. Hence the system is not available for reading
In the third case since Write Quoram is 2 hence it must write to both the nodes and since Read Quoram is 2 hence it must read from both the nodes. But since the second node is not available. Hence it will reject both read and write. Hence the system is not available for reading as well as writing
From the three cases, you can deduce that we get strong consistency at the cost of availability in the case of network partitions. Thus we only achieve C and P here. We are not able to get A in any case
How availability is achieved at the cost of Consistency
To achieve availability we must give up consistency. Here the number of nodes is 2. Hence there is one option in which the system will not be strongly consistent.
- Write Quoram is 1 and Read Quoram is 1
In the first case since Write Quoram is 1, hence it can write to node 1. Also, the Read Quoram is 1, and hence it can read from node 1. Hence the system is available. The sum of Write Quoram and Read Quoram is 2 which is equal to the number of nodes. We know by the formula that when the sum of Write Quoram and Read Quoram is less than equal to the number of nodes, then the system is not strongly consistent. Hence in this case we have availability but the system is not consistent.
Thus we only achieve A and P here. We are not able to get C here
This is the proof of the CAP theorem
Conclusion
This is all about the CAP theorem in a distributed system. Hope you have liked this article. Please share feedback in the comments.
Note: Check out our system design tutorial series System Design Questions