CAP Theorem

Session 3, Part 2 - 30 minutes

Learning Objectives

Understand the CAP theorem and its three components
Explore the trade-offs between Consistency, Availability, and Partition tolerance
Identify real-world systems and their CAP choices
Learn how to apply CAP thinking to system design

What is the CAP Theorem?

The CAP theorem states that a distributed data store can only provide two of the following three guarantees:

graph TB
    subgraph "CAP Triangle - Pick Two"
        C["Consistency<br/>Every read receives<br/>the most recent write"]
        A["Availability<br/>Every request receives<br/>a response"]
        P["Partition Tolerance<br/>System operates<br/>despite network failures"]
    end

    C <--> A
    A <--> P
    P <--> C

    style C fill:#ffcdd2
    style A fill:#c8e6c9
    style P fill:#bbdefb

The Three Components

1. Consistency (C)

Every read receives the most recent write or an error.

All nodes see the same data at the same time. If you write a value and immediately read it, you get the value you just wrote.

sequenceDiagram
    participant C as Client
    participant N1 as Node 1
    participant N2 as Node 2
    participant N3 as Node 3

    C->>N1: Write X = 10
    N1->>N2: Replicate X
    N1->>N3: Replicate X
    N2-->>N1: Ack
    N3-->>N1: Ack
    N1-->>C: Write confirmed

    Note over C,N3: Before reading...

    C->>N2: Read X
    N2-->>C: X = 10 (latest)

    Note over C,N3: All nodes agree!

Example: A bank system where your balance must be accurate across all branches.

2. Availability (A)

Every request receives a (non-error) response, without the guarantee that it contains the most recent write.

The system remains operational even when some nodes fail. You can always read and write, even if the data might be stale.

sequenceDiagram
    participant C as Client
    participant N1 as Node 1 (alive)
    participant N2 as Node 2 (dead)

    C->>N1: Write X = 10
    N1-->>C: Write confirmed

    Note over C,N2: N2 is down but N1 responds...

    C->>N1: Read X
    N1-->>C: X = 10

    Note over C,N2: System stays available!

Example: A social media feed where showing slightly old content is acceptable.

3. Partition Tolerance (P)

The system continues to operate despite an arbitrary number of messages being dropped or delayed by the network between nodes.

Network partitions are inevitable in distributed systems. The system must handle them gracefully.

graph TB
    subgraph "Network Partition"
        N1["Node 1<br/>Can't reach N2, N3"]
        N2["Node 2<br/>Can't reach N1"]
        N3["Node 3<br/>Can't reach N1"]
    end

    N1 -.->|"🔴 Network Partition"| N2
    N1 -.->|"🔴 Network Partition"| N3
    N2 <--> N3
    N2 <--> N3

    style N1 fill:#ffcdd2
    style N2 fill:#c8e6c9
    style N3 fill:#c8e6c9

Key Insight: In distributed systems, P is not optional—network partitions WILL happen.

The Trade-offs

Since partitions are inevitable in distributed systems, the real choice is between C and A during a partition:

stateDiagram-v2
    [*] --> Normal
    Normal --> Partitioned: Network Split
    Partitioned --> CP: Choose Consistency
    Partitioned --> AP: Choose Availability
    CP --> Normal: Partition heals
    AP --> Normal: Partition heals

    note right of CP
        Reject writes/reads
        until data syncs
    end note

    note right of AP
        Accept writes/reads
        data may be stale
    end note

CP: Consistency + Partition Tolerance

Sacrifice Availability

During a partition, the system returns errors or blocks until consistency can be guaranteed.

sequenceDiagram
    participant C as Client
    participant N1 as Node 1 (primary)
    participant N2 as Node 2 (isolated)

    Note over N1,N2: 🔴 Network Partition

    C->>N1: Write X = 10
    N1-->>C: ❌ Error: Cannot replicate

    C->>N2: Read X
    N2-->>C: ❌ Error: Data unavailable

    Note over C,N2: System blocks rather<br/>than return stale data

Examples:

MongoDB (with majority write concern)
HBase
Redis (with proper configuration)
Traditional RDBMS with synchronous replication

Use when: Data accuracy is critical (financial systems, inventory)

AP: Availability + Partition Tolerance

Sacrifice Consistency

During a partition, the system accepts reads and writes, possibly returning stale data.

sequenceDiagram
    participant C as Client
    participant N1 as Node 1 (accepts writes)
    participant N2 as Node 2 (has old data)

    Note over N1,N2: 🔴 Network Partition

    C->>N1: Write X = 10
    N1-->>C: ✅ OK (written to N1 only)

    C->>N2: Read X
    N2-->>C: ✅ X = 5 (stale!)

    Note over C,N2: System accepts requests<br/>but data is inconsistent

Examples:

Cassandra
DynamoDB
CouchDB
Riak

Use when: Always responding is more important than immediate consistency (social media, caching, analytics)

CA: Consistency + Availability

Only possible in single-node systems

Without network partitions (single node or perfectly reliable network), you can have both C and A.

graph TB
    Single["Single Node Database"]
    Client["Client"]

    Client --> Single
    Single <--> Client

    Note1[No network = No partitions]
    Note --> Single

    style Single fill:#fff9c4

Examples:

Single-node PostgreSQL
Single-node MongoDB
Traditional RDBMS on one server

Reality: In distributed systems, CA is not achievable because networks are not perfectly reliable.

Real-World CAP Examples

System	CAP Choice	Notes
Google Spanner	CP	External consistency, always consistent
Amazon DynamoDB	AP	Configurable consistency
Cassandra	AP	Always writable, tunable consistency
MongoDB	CP (default)	Configurable to AP
Redis Cluster	AP	Async replication
PostgreSQL	CA	Single-node mode
CockroachDB	CP	Serializability, handles partitions
Couchbase	AP	Cross Data Center Replication

Consistency Models

The CAP theorem's "Consistency" is actually linearizability (strong consistency). There are many consistency models:

graph TB
    subgraph "Consistency Spectrum"
        Strong["Strong Consistency<br/>Linearizability"]
        Weak["Weak Consistency<br/>Eventual Consistency"]

        Strong --> S1["Sequential<br/>Consistency"]
        S1 --> S2["Causal<br/>Consistency"]
        S2 --> S3["Session<br/>Consistency"]
        S3 --> S4["Read Your<br/>Writes"]
        S4 --> Weak
    end

Strong Consistency Models

Model	Description	Example
Linearizable	Most recent read guaranteed	Bank transfers
Sequential	Operations appear in some order	Version control
Causal	Causally related operations ordered	Chat applications

Weak Consistency Models

Model	Description	Example
Read Your Writes	User sees their own writes	Social media profile
Session Consistency	Consistency within a session	Shopping cart
Eventual Consistency	System converges over time	DNS, CDN

Practical Example: Shopping Cart

Let's see how different CAP choices affect a shopping cart system:

CP Approach (Block on Partition)

sequenceDiagram
    participant U as User
    participant S as Service

    Note over U,S: 🔴 Network partition detected

    U->>S: Add item to cart
    S-->>U: ❌ Error: Service unavailable

    Note over U,S: User frustrated,<br/>but cart is always accurate

Trade-off: Lost sales, accurate cart

AP Approach (Accept Writes)

sequenceDiagram
    participant U as User
    participant S as Service

    Note over U,S: 🔴 Network partition detected

    U->>S: Add item to cart
    S-->>U: ✅ OK (written locally)

    Note over U,S: User happy,<br/>but cart might conflict

Trade-off: Happy users, possible merge conflicts later

The "2 of 3" Simplification

The CAP theorem is often misunderstood. The reality is more nuanced:

graph TB
    subgraph "CAP Reality"
        CAP["CAP Theorem"]

        CAP --> Misconception["You must choose<br/>exactly 2"]
        CAP --> Reality["You can have all 3<br/>in normal operation"]
        CAP --> Truth["During partition,<br/>choose C or A"]
    end

Key Insights:

P is mandatory in distributed systems
During normal operation, you can have C + A + P
During a partition, you choose between C and A
Many systems are configurable (e.g., DynamoDB)

Design Guidelines

Choose CP When:

✅ Financial transactions
✅ Inventory management
✅ Authentication/authorization
✅ Any system where stale data is unacceptable

Choose AP When:

✅ Social media feeds
✅ Product recommendations
✅ Analytics and logging
✅ Any system where availability is critical

Techniques to Balance C and A:

Technique	Description	Example
Quorum reads/writes	Require majority acknowledgment	DynamoDB
Tunable consistency	Let client choose per operation	Cassandra
Graceful degradation	Switch modes during partition	Many systems
Conflict resolution	Merge divergent data later	CRDTs

Summary

Key Takeaways

CAP theorem: You can't have all three in a partition
Partition tolerance is mandatory in distributed systems
Real choice: Consistency vs Availability during partition
Many systems offer tunable consistency levels
Your use case determines the right trade-off

Check Your Understanding

Why is partition tolerance not optional in distributed systems?
Give an example where you would choose CP over AP
What happens to an AP system during a network partition?
How can quorum reads/writes help balance C and A?

🧠 Chapter Quiz

Test your mastery of these concepts! These questions will challenge your understanding and reveal any gaps in your knowledge.

What's Next

Now that we understand CAP trade-offs, let's build a simple key-value store: Store Basics

Distributed Systems Course