What is a Distributed System?
Session 1, Part 1 - 20 minutes
Learning Objectives
- Define what a distributed system is
- Identify key characteristics of distributed systems
- Understand why distributed systems matter
- Recognize distributed systems in everyday life
Definition
A distributed system is a collection of independent computers that appears to its users as a single coherent system.
graph TB
subgraph "Users See"
Single["Single System"]
end
subgraph "Reality"
N1["Node 1"]
N2["Node 2"]
N3["Node 3"]
N4["Node N"]
N1 <--> N2
N2 <--> N3
N3 <--> N4
N4 <--> N1
end
Single -->|"appears as"| N1
Single -->|"appears as"| N2
Single -->|"appears as"| N3
Key Insight
The defining characteristic is the illusion of unity—users interact with what seems like one system, while behind the scenes, multiple machines work together.
Three Key Characteristics
According to Leslie Lamport, a distributed system is:
"One in which the failure of a computer you didn't even know existed can render your own computer unusable."
This definition highlights three fundamental characteristics:
1. Concurrency (Multiple Things Happen At Once)
Multiple components execute simultaneously, leading to complex interactions.
sequenceDiagram
participant U as User
participant A as Server A
participant B as Server B
participant C as Server C
U->>A: Request
A->>B: Query
A->>C: Update
B-->>A: Response
C-->>A: Ack
A-->>U: Result
2. No Global Clock
Each node has its own clock. There's no single "now" across the system.
graph LR
A[Clock A: 10:00:01.123]
B[Clock B: 10:00:02.456]
C[Clock C: 09:59:59.789]
A -.->|network latency| B
B -.->|network latency| C
C -.->|network latency| A
Implication: You can't rely on timestamps to order events across nodes. You need logical clocks (more on this in later sessions!).
3. Independent Failure
Components can fail independently. When one part fails, the rest may continue—or may become unusable.
stateDiagram-v2
[*] --> AllHealthy: System Start
AllHealthy --> PartialFailure: One Node Fails
AllHealthy --> CompleteFailure: Critical Nodes Fail
PartialFailure --> AllHealthy: Recovery
PartialFailure --> CompleteFailure: Cascading Failure
CompleteFailure --> [*]
Why Distributed Systems?
Scalability
Vertical Scaling (Scale Up):
- Add more resources to a single machine
- Eventually hits hardware/cost limits
Horizontal Scaling (Scale Out):
- Add more machines to the system
- Virtually unlimited scaling potential
graph TB
subgraph "Vertical Scaling"
Big[Big Expensive Server<br/>$100,000]
end
subgraph "Horizontal Scaling"
S1[Commodity Server<br/>$1,000]
S2[Commodity Server<br/>$1,000]
S3[Commodity Server<br/>$1,000]
S4[...]
end
Big <--> S1
Big <--> S2
Big <--> S3
Reliability & Availability
A single point of failure is unacceptable for critical services:
graph TB
subgraph "Single System"
S[Single Server]
S -.-> X[❌ Failure = No Service]
end
subgraph "Distributed System"
N1[Node 1]
N2[Node 2]
N3[Node 3]
N1 <--> N2
N2 <--> N3
N3 <--> N1
N1 -.-> X2[❌ One Fails]
X2 --> OK[✓ Others Continue]
end
Latency (Geographic Distribution)
Placing data closer to users improves experience:
graph TB
User[User in NYC]
subgraph "Global Distribution"
NYC[NYC Datacenter<br/>10ms latency]
LON[London Datacenter<br/>70ms latency]
TKY[Tokyo Datacenter<br/>150ms latency]
end
User --> NYC
User -.-> LON
User -.-> TKY
NYC <--> LON
LON <--> TKY
TKY <--> NYC
Examples of Distributed Systems
Everyday Examples
| System | Description | Benefit |
|---|---|---|
| Web Search | Query servers, index servers, cache servers | Fast responses, always available |
| Streaming Video | Content delivery networks (CDNs) | Low latency, high quality |
| Online Shopping | Product catalog, cart, payment, inventory | Handles traffic spikes |
| Social Media | Posts, comments, likes, notifications | Real-time updates |
Technical Examples
Database Replication:
graph LR
W[Write to Primary] --> P[(Primary DB)]
P --> R1[(Replica 1)]
P --> R2[(Replica 2)]
P --> R3[(Replica 3)]
R1 --> Read1[Read from Replica]
R2 --> Read2[Read from Replica]
R3 --> Read3[Read from Replica]
Load Balancing:
graph TB
Users[Users]
LB[Load Balancer]
Users --> LB
LB --> S1[Server 1]
LB --> S2[Server 2]
LB --> S3[Server 3]
LB --> S4[Server N]
Trade-offs
Distributed systems introduce complexity:
| Challenge | Description |
|---|---|
| Network Issues | Unreliable, variable latency, partitions |
| Concurrency | Race conditions, deadlocks, coordination |
| Partial Failures | Some components work, others don't |
| Consistency | Keeping data in sync across nodes |
The Fundamental Dilemma:
"Is the benefits of distribution worth the added complexity?"
For most modern applications, the answer is yes—which is why we're learning this!
Summary
Key Takeaways
- Distributed systems = multiple computers acting as one
- Three characteristics: concurrency, no global clock, independent failure
- Benefits: scalability, reliability, lower latency
- Costs: complexity, network issues, consistency challenges
Check Your Understanding
- Can you explain why there's no global clock in a distributed system?
- Give an example of a distributed system you use daily
- Why does independent failure make distributed systems harder to build?
🧠 Chapter Quiz
Test your mastery of these concepts! These questions will challenge your understanding and reveal any gaps in your knowledge.
What's Next
Now that we understand what distributed systems are, let's explore how they communicate: Message Passing