![Page 1: Project 2 Review (Part 2) Ananth Rao. Overview Stabilize and Notify Join (slides stolen from lecture) Coding Trivia Bootstrapping and debugging](https://reader030.vdocuments.mx/reader030/viewer/2022032517/56649cb95503460f94981124/html5/thumbnails/1.jpg)
Project 2 Review (Part 2)
Ananth Rao
Overview
• Stabilize and Notify
• Join (slides stolen from lecture)
• Coding Trivia
• Bootstrapping and debugging
Identifier to Node Mapping Example
• Node 8 maps [5, 8]
• Node 15 maps [9, 15]
• Node 20 maps [16, 20]
• …
• Node 4 maps [59, 4]
[Figure: ring of nodes 4, 8, 15, 20, 32, 35, 44, 58]
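The mapping above can be sketched in a few lines: an identifier belongs to the first node whose ID is greater than or equal to it, wrapping around to the smallest node ID. This is only an illustration; the function name `responsibleNode` and the use of `std::set` are assumptions, not part of the project code.

```cpp
#include <set>

// Returns the node responsible for `id`: the first node ID >= id,
// wrapping around to the smallest node ID past the end of the ring.
// Assumes `nodes` is non-empty. (Hypothetical helper for illustration.)
int responsibleNode(const std::set<int>& nodes, int id) {
    auto it = nodes.lower_bound(id);
    return it != nodes.end() ? *it : *nodes.begin();
}
```

With the ring from the slide, identifier 16 lands on node 20, and identifier 59 wraps around to node 4.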
Routing
• Each node maintains its successor
• Route packet (ID, data) to the node responsible for ID using successor pointers
[Figure: ring of nodes 4, 8, 15, 20, 32, 35, 44, 58, with a send(34, data) request being routed]
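A minimal sketch of successor-only routing, assuming each node knows nothing but its own ID and its successor's ID. The names `inRange` and `route`, and the `std::map` standing in for the network, are hypothetical.

```cpp
#include <map>

// True when id lies in the ring interval (from, to], allowing wrap-around.
bool inRange(int id, int from, int to) {
    if (from < to) return from < id && id <= to;
    return id > from || id <= to;  // interval wraps past the largest ID
}

// Follows successor pointers from `start` until the next hop owns `id`,
// then returns the owning node. `succ` maps a node ID to its successor ID.
int route(const std::map<int, int>& succ, int start, int id) {
    int cur = start;
    while (cur != id && !inRange(id, cur, succ.at(cur))) {
        cur = succ.at(cur);  // forward the packet one hop clockwise
    }
    return cur == id ? cur : succ.at(cur);
}
```

On the example ring, a packet for ID 34 that starts at node 58 is forwarded through 4, 8, 15, 20 and 32 before node 32 hands it to node 35.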
Stabilize
• Sent to the current successorNode periodically
• “Request” for a notify packet from the successor
Notify
• Sent in reply to the stabilize packet.
• Helps build a list of k-successors at the predecessor.
Stabilize-Notify
• Direct communication only with immediate successor and predecessor
• You receive only “nth-hand” information about the nth successor
• It takes n*STABILIZE_PERIOD for a change in the nth successor to propagate back to you
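One way the k-successor list could be rebuilt from a notify, assuming (this is not stated on the slides) that the notify carries the sender's own successor list. The function name and parameters are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Rebuilds this node's successor list from a notify packet.
// Assumption: the notify carries the successor's own successor list.
// The result is the successor followed by its successors, cut off at k
// entries or as soon as the list wraps around to this node (`self`).
std::vector<int> mergeSuccessorList(int self, int successor,
                                    const std::vector<int>& successorsOfSuccessor,
                                    std::size_t k) {
    std::vector<int> result;
    if (k == 0) return result;
    result.push_back(successor);
    for (int s : successorsOfSuccessor) {
        if (result.size() == k || s == self) break;
        result.push_back(s);
    }
    return result;
}
```

Because each node only copies from its immediate successor once per period, an entry n positions away needs n exchanges to reach you, which matches the n*STABILIZE_PERIOD delay above.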
Dealing with failures
• What happens when successorNode fails?
– Timeout while waiting to receive a notify
– Shift successorNode list by one
• What happens when predecessorNode fails?
– Timeout on receiving a stabilize from the predecessor
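The "shift by one" step might look like the following sketch. The container choice and the name `onNotifyTimeout` are assumptions; retry handling before declaring the successor dead is left out.

```cpp
#include <deque>

// Called once the notify timeout for the current successor has expired.
// Drops the failed successor and promotes the next entry in the list.
// Returns the new immediate successor, or -1 if the list is exhausted.
int onNotifyTimeout(std::deque<int>& successors) {
    if (!successors.empty()) successors.pop_front();
    return successors.empty() ? -1 : successors.front();
}
```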
Dealing with failures (cont.)
• We use fine-grained timers for detecting successor failures
• We use a coarse-grained timer for detecting a predecessor failure
– Predecessor is not useful for forwarding anyway
– A fine-grained timer is not useful unless we maintain a list of predecessors
Joining Operation
• Node 50 asks node 15 to forward join message
• When join(50) reaches the destination (i.e., node 58), node 58 returns a notify message to node 50
• Node 50 updates its successor to 58
[Figure: ring of nodes 4, 8, 15, 20, 32, 35, 44, 58 with joining node 50; join(50) is forwarded to node 58, which answers with notify(58); node 50 sets succ=58]
Joining Operation (cont’d)
• Node 50 sends a stabilize to node 58. The predecessor gets updated at node 58
• Node 44 sends a stabilize message to its successor, node 58
• Node 58 replies with a notify message
• Node 44 updates its successor to 50
[Figure: same ring; node 50 has succ=58 and node 58 has pred=50; node 44 sends stabilize() to node 58, receives notify(predecessor=50), and sets succ=50]
Joining Operation (cont’d)
• Node 44 sends a stabilize message to its new successor, node 50
• Node 50 sets its predecessor to node 44
[Figure: same ring; node 44 (succ=50) sends stabilize() to node 50, which sets pred=44; node 50 keeps succ=58 and node 58 keeps pred=50]
Joining Operation (cont’d)
• This completes the joining operation!
[Figure: final ring; node 44 has succ=50, node 50 has succ=58 and pred=44, node 58 has pred=50]
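The join walkthrough above can be replayed with a small in-memory simulation. This is a sketch under assumptions: there is no networking, every exchange succeeds instantly, the notify is assumed to carry the successor's current predecessor, and all names (`Node`, `Ring`, `between`, `makeRing`, `join`, `stabilize`) are made up for illustration.

```cpp
#include <cstddef>
#include <map>
#include <vector>

struct Node {
    int succ = -1;  // successor ID
    int pred = -1;  // predecessor ID, -1 while unknown
};
using Ring = std::map<int, Node>;

// True when x lies strictly inside the ring interval (a, b), with wrap-around.
bool between(int x, int a, int b) {
    if (a < b) return a < x && x < b;
    return x > a || x < b;
}

// Builds a consistent ring from node IDs given in ascending order.
Ring makeRing(const std::vector<int>& ids) {
    Ring ring;
    for (std::size_t i = 0; i < ids.size(); ++i) {
        int next = ids[(i + 1) % ids.size()];
        ring[ids[i]].succ = next;
        ring[next].pred = ids[i];
    }
    return ring;
}

// join(newId) is forwarded along successor pointers starting at `contact`.
// The node that owns newId answers with a notify, which sets the
// newcomer's successor. Nobody else learns about the new node yet.
void join(Ring& ring, int newId, int contact) {
    int cur = contact;
    while (!between(newId, cur, ring[cur].succ) && newId != ring[cur].succ) {
        cur = ring[cur].succ;
    }
    int owner = ring[cur].succ;
    ring[newId].succ = owner;
}

// One stabilize/notify exchange started by node n.
void stabilize(Ring& ring, int n) {
    int s = ring[n].succ;
    // Successor side: adopt n as predecessor if it is closer than the old one.
    if (ring[s].pred == -1 || between(n, ring[s].pred, s)) ring[s].pred = n;
    // The notify reply carries the successor's predecessor (assumption).
    int p = ring[s].pred;
    // Sender side: a node sitting between n and s becomes the new successor.
    if (between(p, n, s)) ring[n].succ = p;
}
```

Calling `join(ring, 50, 15)`, then `stabilize(ring, 50)`, then `stabilize(ring, 44)` twice reproduces the four slides: node 58 learns pred=50, node 44 switches to succ=50, and node 50 finally learns pred=44.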
Stabilize-Notify-Join
• Very simple
• Easy to code
• Can handle concurrent joins and failures
– Try a few examples. It may take a few more STABILIZE_PERIODs to converge, but it will eventually converge
Stabilize-Notify-Join (cont.)
• Not easy to understand
– When you get it.. you get it.
• Very hard to debug
• Hard to bootstrap
– Lots of corner cases when there are fewer than k nodes in the ring
Coding Advice
• Checkpoint submissions better than expected :-)
• No major flaws
• Be careful with timers
– “select” returns “no sooner than the requested timeout period”
– Each function call takes time!!
– Careful in dealing with negative struct timeval
• More feedback coming soon..
– Watch the newsgroup over the weekend :-(.
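A sketch of one way to avoid handing select a negative struct timeval: compute the remaining time and clamp it at zero. The helper name `timeUntil` is hypothetical; it assumes both inputs are normalized (0 <= tv_usec < 1000000).

```cpp
#include <sys/time.h>

// Returns due - now, clamped to zero when the due time has already passed.
// Assumes both inputs are normalized (0 <= tv_usec < 1000000).
timeval timeUntil(const timeval& due, const timeval& now) {
    timeval out;
    out.tv_sec = due.tv_sec - now.tv_sec;
    out.tv_usec = due.tv_usec - now.tv_usec;
    if (out.tv_usec < 0) {  // borrow one second
        out.tv_usec += 1000000;
        out.tv_sec -= 1;
    }
    if (out.tv_sec < 0) {  // already overdue: poll instead of waiting
        out.tv_sec = 0;
        out.tv_usec = 0;
    }
    return out;
}
```

A zero timeout makes select return immediately, so an overdue event is handled on the next loop iteration instead of producing an invalid argument.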
Problems with timers
• After handling the event at the head of the queue..
– Get current time again
– Check the “due time” of the next event in the queue
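The loop below sketches that advice: after every handler it reads the clock again before looking at the next event. The clock is passed in as a function so the behavior can be checked without real time; `Event`, `EventQueue` and `runDueEvents` are hypothetical names, and times are plain milliseconds for simplicity.

```cpp
#include <functional>
#include <queue>
#include <vector>

struct Event {
    long dueMs;                     // absolute due time in milliseconds
    std::function<void()> handler;  // work to run when the event fires
};

// Orders the priority queue so the earliest due time is on top.
struct LaterFirst {
    bool operator()(const Event& a, const Event& b) const {
        return a.dueMs > b.dueMs;
    }
};
using EventQueue = std::priority_queue<Event, std::vector<Event>, LaterFirst>;

// Runs every event that is due. The clock is re-read after each handler,
// because the handler itself takes time and may make the next event due.
// Returns the wait until the next event, or -1 if the queue is empty.
long runDueEvents(EventQueue& queue, const std::function<long()>& nowMs) {
    long now = nowMs();
    while (!queue.empty() && queue.top().dueMs <= now) {
        Event ev = queue.top();
        queue.pop();
        ev.handler();
        now = nowMs();  // get current time again
    }
    return queue.empty() ? -1 : queue.top().dueMs - now;
}
```

If the loop reused the time it read before the first handler, an event that became due while that handler was running would wait for a full extra select timeout.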
Timers for stabilize
• Time out for receiving a notify
• When to send the next stabilize
– Keep track of lastStabilizeSentTime
– Use MIN(lastStabilizeSentTime+STABILIZE_PERIOD-currTime, nextEventDueTime) for timeout to select
– Careful when the successorNode changes
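A sketch of the MIN computation, under two assumptions that the slide leaves open: all times are milliseconds, and nextEventDueTime is an absolute time that is converted to a remaining time before comparing. The function names are hypothetical.

```cpp
#include <algorithm>
#include <sys/time.h>

// Picks the timeout to pass to select: whichever comes first, the next
// stabilize or the next queued event. Never returns a negative value.
// Assumption: nextEventDueMs is an absolute time, like the other inputs.
long selectTimeoutMs(long lastStabilizeSentMs, long stabilizePeriodMs,
                     long nowMs, long nextEventDueMs) {
    long untilStabilize = lastStabilizeSentMs + stabilizePeriodMs - nowMs;
    long untilEvent = nextEventDueMs - nowMs;
    return std::max(0L, std::min(untilStabilize, untilEvent));
}

// Converts a non-negative millisecond count into a struct timeval.
timeval toTimeval(long ms) {
    timeval tv;
    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return tv;
}
```

When successorNode changes, resetting lastStabilizeSentTime so that a stabilize goes out to the new successor right away is one plausible policy; the slide only warns that this case needs care.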
Debugging Tips
• Most problems occur when bootstrapping the ring
• Prefer cerr/fprintf debugging to using gdb
– If you set a breakpoint in gdb, every other program on the ring is going to time out for some reason or the other
• In the beginning, you may want to increase timers to large values
Testing with lost packets
• With large timeouts
– Use keyboard input to determine whether or not to send a packet
– Make sure STABILIZE_PERIOD > (MAX_STABILIZE_RETRIES+1) * STABILIZE_TIMEOUT
• Use randomized drops with a small drop percentage
• Use randomized drops with a small drop percentage
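Both checks can be sketched briefly. The constant values below are placeholders chosen only so the inequality holds, and `shouldDrop` is a hypothetical helper you would call just before sending a packet.

```cpp
#include <random>

// Placeholder values for illustration; the real ones come from the project.
const int STABILIZE_PERIOD_MS = 5000;
const int STABILIZE_TIMEOUT_MS = 1000;
const int MAX_STABILIZE_RETRIES = 3;

// All retries for one stabilize must finish before the next one is due.
static_assert(STABILIZE_PERIOD_MS >
                  (MAX_STABILIZE_RETRIES + 1) * STABILIZE_TIMEOUT_MS,
              "stabilize retries would overlap the next stabilize period");

// Returns true when the outgoing packet should be silently discarded.
// dropPercent is in [0, 100]; seed rng with a fixed value to replay a run.
bool shouldDrop(std::mt19937& rng, int dropPercent) {
    std::uniform_int_distribution<int> roll(0, 99);
    return roll(rng) < dropPercent;
}
```

Seeding the generator with a fixed value makes a failing run reproducible, which helps given how hard the protocol is to debug.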
Go step-by-step
• Before implementing join, try to implement stabilize and notify
– Start with a predetermined ring
– Start with only one successor on the command line, but the list should soon grow (because of stabilize-notify)
– Detect failures only (no new nodes)
– Use a large (1s) timeout so you don’t have to start all “chatpeers” at exactly the same time
• Helps get rid of bootstrapping artifacts in the first step