zookeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf ·...
TRANSCRIPT
![Page 1: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/1.jpg)
ZooKeeper
CSCE 678
![Page 2: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/2.jpg)
Weak vs Strong Consistency
• In distributed systems, consistency is often the target of weakening
• Examples of weak consistency:• NoSQL servers – Dynamo• Datanodes in HDFS
• But sometimes, strong consistency is needed
2
![Page 3: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/3.jpg)
Use Cases of Strong Consistency
• Configuration management
• Message Queues
• Group membership
• Synchronization:• Mutexes and read/write locks• Barriers: process joins
3
(Will talk about them one by one)
![Page 4: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/4.jpg)
Use Case: Configuration & Group Membership
4
Primary[Configuration]Primary: …Secondaries: …Port IDs: …Created by: …Push
![Page 5: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/5.jpg)
Use Case: Message Queues
5
ProducerA
ProducerB
Consumer
enqueue
enqueue
dequeue
![Page 6: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/6.jpg)
Use Case: Synchronization
• Mutexes
6
• Read/write locks
lock();
last_x = read(x);write(y, last_x);write(x, last_x + 1);
unlock();
rd_lock();
if (queue.size > 0) {wr_lock();x = queue.dequeue();wr_unlock();
}
rd_unlock();
Shared among readers
Exclusive for one writer
![Page 7: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/7.jpg)
How to Define Strong Consistency?
• Linearizability• Also called atomic consistency or immediate consistency• As soon as a write operation finishes, the whole system
should see the latest data.
7
![Page 8: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/8.jpg)
Serializability vs Linearizability
• For distributed systems, these two words are used in very specific contexts
• Serializability: for databases• Equivalent to Serializable Isolation (I) in ACID• No ordering for concurrent transactions
• Linearizability: for strongly-consistent reads/writes• In respect to operations, not transactions• Considering the global ordering of reads/writes
8
![Page 9: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/9.jpg)
Serializability vs Linearizability
• Serializability
9
Client A
Client B
read(x), write(y), read(y)
read(y), write(x), write(y) read(y), read(x), write(y)
System read(x), write(y), read(y) read(y), write(x), write(y) read(y), read(x), write(y)
Transactions
No constraint of ordering between concurrent transactions.
(from A) (from B) (from B)
time
![Page 10: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/10.jpg)
Serializability vs Linearizability• Actually, serializability can be implemented by linearizability
(i.e., a lock service)
10
Server 1
Server 2
read(x), write(y), read(y)
lock(x)
lock(y)
read (shared) lock
write (exclusive) lock
read(y), write(x), write(y)
blocked
ok ok
ok ok
Two-phase lock (2PL) Irrelevant to two-phase commit (2PC)
![Page 11: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/11.jpg)
Serializability vs Linearizability
• LinearizabilityAs soon as a write operation finishes, the whole system should see the latest data.
11
Client A write(x, 1) => oktime
Server 1
Server 2insert
Client B read(x) => 1
![Page 12: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/12.jpg)
More on Linearizability
• Reads concurrent with a write
12
Client Atime
Client B
read(x) => 0
Client C write(x, 1) => ok
read(x) => 1
read(x) => 1
Any read that begins before the writemust see the old value.
Any read that begins AFTER the write may see old or new value; but as soon as one client see the new value, all other clients should too.
![Page 13: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/13.jpg)
More on Linearizability
• Write concurrent with another write
13
Client Atime
Client B
Client C
write(x, 1) => ok
read(x) => 1
Any write begins before another writemust be committed first.
write(x, 2) => ok
read(x) => 2
![Page 14: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/14.jpg)
Linearizability vs Causality• Linearizability implies causal consistency
14
(1) Causality is based on happens-before relationshipin a client
If client sets x = 1 and sets y = 0, then the two values have causal relationship.
Client Atime
Client B
write(y, 0)write(x, 1)
read(y) => 0
Causality
read(x) => 0
Violate causal relationship!
read(x) => 0
![Page 15: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/15.jpg)
15
Linearizability vs Causality• Linearizability implies causal consistency
(2) Linearizability implies total ordering of each object, soautomatically preserves happens-before.
A linearizable system doesn’t have to do anything to preserve causal consistency.
Client Atime
Client B
write(y, 0)write(x, 1)
read(y) => 0 read(x) => 1
read(x) => 0
Total order
![Page 16: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/16.jpg)
How to Implement Linearizability?
• Read/write from single leader• Failover to a replica may lose linearizability
• Consensus algorithms• Using two-phase commits (2PC) or Paxos• Prevent stale replicas
• Most likely unlinearizable: multi-leader replication
16
Q: How does linearizability apply to CAP Theorem?
![Page 17: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/17.jpg)
Linearizability in CAP Theorem
• Linearizability = Strong C• Read/write through single leader: lose A & P• Read through replicas, write through leader: lose P
• With consensus, linearizability can be:• Partition tolerant when ½ of replicas are connected• Fair availability with wait-free operations and fast,
lossless leader recovery
17
![Page 18: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/18.jpg)
Apache ZooKeeper
• A coordination service for all use cases of strong consistency in a distributed system
• Wait-free: any operation will not block on other slow or failed clients
• ZooKeeper has no API for locking, but can be used to implement any locking mechanism
18
![Page 19: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/19.jpg)
System Overview
19
ZooKeeper Service
ServerServerServer Server Server
Follower LeaderFollower Follower Follower
Client Client Client Client Client Client Client
Forward operations
![Page 20: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/20.jpg)
Namespace
• ZooKeeper uses a filesystem-like namespace
20
/
/App1 /App2
/App1/p_1 /App1/p_2 /App1/p_3
…
znodes
Data Data Data Not large data files
(more like metadata)
Data
![Page 21: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/21.jpg)
Namespace
• ZooKeeper has two types of znodes (paths)• Permanent (regular): clients explicitly create and delete
the znodes• Ephemeral: clients create the znodes, and either delete
them explicitly or let the system automatically deletes them when client sessions timeout.
21
![Page 22: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/22.jpg)
API• create(path, data, flags)
• flags: regular/ephemeral, sequential (appending a seqnum)
• getData(path, watch) -> (data, version)
• setData(path, data, version)
• delete(path, version)
• exists(path, watch) -> true/false
• getChildren(path, watch) -> [paths]
• sync(path) path is ignored now
22
![Page 23: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/23.jpg)
Asynchronous Operations• Operations can be synchronous or asynchronous
• Client can queue up multiple asynchronous operations
• Server responds by invoking callbacks
23
ZooKeeper Se
ServerServerServer
Follower LeaderFollower
Forward operati
Client
getData(x) create(y) …
callback for getData(x) callback for create(y) …
![Page 24: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/24.jpg)
Event Notification
• All operations except sync are wait-free
• No locking API but clients can implement locks using watch events
24
1 l = “/my-lock”;2 if exists(l, watch=true) then wait for watch event;3 n = create(l, EPHEMERAL);4 if n is error then goto 2;
1 delete(l);
Lock (very naïve version)
Unlock
Server will push events to clientwhen the path is updated
![Page 25: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/25.jpg)
A(asynchronous)-Linearizability
• Local order: all operations from the same client are processed FIFO (first-in-first-out)
• Global order: Linearizable writes
25
ServerClient A
ServerClient B
write Op 1, write Op 2, …
write Op 1, write Op 2, …
Some consensus to ensurethe write ops are appliedexactly as their global order.
![Page 26: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/26.jpg)
A(asynchronous)-Linearizability• Reads are not linearized (may not see latest state)
• Read directly from server (identical replicas)• Other servers may have pending writes in queues• Solution: sync after writes
26
ServerClient A
ServerClient B
write Op 1, write Op 2, sync
read Op 1, read Op 2, …
After sync, all clients willsee the changes of priorwrites from client A.
![Page 27: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/27.jpg)
Zab (Atomic Broadcast)
• All servers forward messages to a single leader• Important role as a sequencer• Leader can change if partitioned or failed
• The leader broadcasts (proposes) the messages to be delivered to all followers
27
ZooKeeper Service
ServerServerServer Server Server
Follower LeaderFollower Follower Follower
Forward operations
![Page 28: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/28.jpg)
Zab (Atomic Broadcast)
• Two-phase commit (2PC)
28
Leader
Request, e.g., write(x, 1)Step 1. assign a monotonicallyincreasing id (zxid) to the request
Follower Follower
![Page 29: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/29.jpg)
Zab (Atomic Broadcast)
• Two-phase commit (2PC)
29
Leader
Request, e.g., write(x, 1)Step 1. assign a monotonicallyincreasing id (zxid) to the request
Follower Follower
Step 2. Propose the changewith zxid to followers
![Page 30: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/30.jpg)
Zab (Atomic Broadcast)
• Two-phase commit (2PC)
30
Leader
Request, e.g., write(x, 1)Step 1. assign a monotonicallyincreasing id (zxid) to the request
Follower Follower
Step 2. Propose the changewith zxid to followers
Step 3. Followersacknowledge theproposal.
Q: In what situation maythe follower reject the proposal?
![Page 31: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/31.jpg)
Zab (Atomic Broadcast)
• Two-phase commit (2PC)
31
Leader
Request, e.g., write(x, 1)Step 1. assign a monotonicallyincreasing id (zxid) to the request
Follower Follower
Step 2. Propose the changewith zxid to followers
Step 3. Followersacknowledge theproposal.
Step 4. Leader commitsthe change if receives morethan ½ of acks (votes).
![Page 32: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/32.jpg)
Leader Election
• If a server finds the leader disconnected or failed, it tries to become the leader (same 2PC protocol).
32
LeaderCandidate
Follower Follower
Becomes the leader whenreceives ½ of votes.
If followers receive multiple proposal,they vote for the candidate with the highest zxid.
![Page 33: ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency](https://reader034.vdocuments.mx/reader034/viewer/2022042709/5f4913f48750714c3d5bf871/html5/thumbnails/33.jpg)
References
• “ZooKeeper: Wait-free coordination for Internet-scale systems,” USENIX ATC ‘10 (by Hunt et al.)
• Zab protocol: “A simple totally ordered broadcast protocol”, LADIS ’08 (by Reed and Junqueira)
• “Linearizability: A Correctness Condition for Concurrent Objects”, TOPLAS 1990 (by Herlihy and Wing)
• “Designing Data-Intensive Applications”, O’Reilly 2017(by Martin Kleppmann)
33