plato: predictive latency-aware total ordering
DESCRIPTION
PLATO: Predictive Latency-Aware Total Ordering. Mahesh Balakrishnan, Ken Birman, Amar Phanishayee. Total ordering, a.k.a. atomic broadcast: delivering messages to a set of nodes in the same order. Messages arrive at nodes in different orders; nodes agree on a single delivery order.
TRANSCRIPT
![Page 1: PLATO: Predictive Latency-Aware Total Ordering](https://reader036.vdocuments.mx/reader036/viewer/2022062723/56813d64550346895da73a86/html5/thumbnails/1.jpg)
PLATO: Predictive Latency-Aware Total Ordering
Mahesh Balakrishnan
Ken Birman
Amar Phanishayee
![Page 2: PLATO: Predictive Latency-Aware Total Ordering](https://reader036.vdocuments.mx/reader036/viewer/2022062723/56813d64550346895da73a86/html5/thumbnails/2.jpg)
Total Ordering
a.k.a. Atomic Broadcast: delivering messages to a set of nodes in the same order
Messages arrive at nodes in different orders…
Nodes agree on a single delivery order
Messages are delivered at nodes in the agreed order
![Page 3: PLATO: Predictive Latency-Aware Total Ordering](https://reader036.vdocuments.mx/reader036/viewer/2022062723/56813d64550346895da73a86/html5/thumbnails/3.jpg)
Modern Datacenters
Applications: E-tailers, Finance, Aerospace
Service-Oriented Architectures, Publish-Subscribe, Distributed Objects, Event Notification…
… Totally Ordered Multicast!
Hardware: fast high-capacity networks, failure-prone commodity nodes
![Page 4: PLATO: Predictive Latency-Aware Total Ordering](https://reader036.vdocuments.mx/reader036/viewer/2022062723/56813d64550346895da73a86/html5/thumbnails/4.jpg)
Total Ordering in a Datacenter
[Figure: Replicated Service with Inventory Service Replica 1 and Replica 2, receiving Query, Update 1, and Update 2 messages. Updates are Totally Ordered.]
Totally Ordered Multicast is used to consistently update Replicated Services
Latency of Multicast → System Consistency
Requirement: order multicasts consistently, rapidly, robustly
![Page 5: PLATO: Predictive Latency-Aware Total Ordering](https://reader036.vdocuments.mx/reader036/viewer/2022062723/56813d64550346895da73a86/html5/thumbnails/5.jpg)
Multicast Wishlist
Low Latency!
High (stable) throughput: minimal, proactive overheads
Leverage hardware properties: HW Multicast/Broadcast is fast, unreliable
Handle varying data rates: datacenter workloads have sharp spikes… and extended troughs!
![Page 6: PLATO: Predictive Latency-Aware Total Ordering](https://reader036.vdocuments.mx/reader036/viewer/2022062723/56813d64550346895da73a86/html5/thumbnails/6.jpg)
State-of-the-Art
Traditional Protocols: conservative, latency-overhead tradeoff
Example: Fixed Sequencer. Simple, works well
Optimistic Total Ordering: deliver optimistically, roll back if incorrect
Why this works: no out-of-order arrival in LANs
Optimistic total ordering for datacenters?
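To make the fixed-sequencer baseline concrete, here is a minimal single-process sketch in Python. It is an illustration only: the class names `FixedSequencer` and `Receiver` and their methods are hypothetical, and failures, retransmission, and real networking are left out. One node assigns consecutive sequence numbers, and every receiver holds a message back until the sequencer's order says it is next.

```python
from itertools import count


class FixedSequencer:
    """Hypothetical sequencer node: hands out consecutive sequence numbers."""

    def __init__(self):
        self._next = count()

    def order(self, msg_id):
        return (next(self._next), msg_id)


class Receiver:
    """Buffers messages until the sequencer's order allows delivery."""

    def __init__(self):
        self.payloads = {}   # msg_id -> payload, received via multicast
        self.order = {}      # seq -> msg_id, received from the sequencer
        self.next_seq = 0
        self.delivered = []

    def on_message(self, msg_id, payload):
        self.payloads[msg_id] = payload
        self._drain()

    def on_order(self, seq, msg_id):
        self.order[seq] = msg_id
        self._drain()

    def _drain(self):
        # Deliver only when both the next sequence number and its message are here.
        while (self.next_seq in self.order
               and self.order[self.next_seq] in self.payloads):
            self.delivered.append(self.order[self.next_seq])
            self.next_seq += 1


sequencer = FixedSequencer()
orders = [sequencer.order(m) for m in ["A", "B", "C"]]

r1, r2 = Receiver(), Receiver()
for m in ["A", "B", "C"]:      # arrival order at receiver 1
    r1.on_message(m, b"")
for m in ["C", "A", "B"]:      # a different arrival order at receiver 2
    r2.on_message(m, b"")
for seq, m in orders:
    r1.on_order(seq, m)
    r2.on_order(seq, m)

print(r1.delivered, r2.delivered)
```

Both receivers deliver in the sequencer's order regardless of arrival order, but no message is delivered before its sequence number arrives. That wait is the latency an optimistic protocol tries to avoid.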
![Page 7: PLATO: Predictive Latency-Aware Total Ordering](https://reader036.vdocuments.mx/reader036/viewer/2022062723/56813d64550346895da73a86/html5/thumbnails/7.jpg)
PLATO: Predictive Ordering
In a datacenter, broadcast / multicast occurs almost instantaneously
Most of the time, messages arrive in the same order at all nodes.
Some of the time, messages arrive in different orders at different nodes.
Can we predict out-of-order arrival?
![Page 8: PLATO: Predictive Latency-Aware Total Ordering](https://reader036.vdocuments.mx/reader036/viewer/2022062723/56813d64550346895da73a86/html5/thumbnails/8.jpg)
Reasons for Disorder: Swaps
[Figure: Sender 1 and Sender 2 multicast through two switches to Receiver 1 and Receiver 2. One receiver gets Sender 1's message after Sender 2's message; the other gets Sender 2's message after Sender 1's message.]
Out-of-order arrival can occur when the inter-send interval between two messages is smaller than the diameter of the network
Typical Datacenter Diameter: 50-500 microseconds
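The swap condition can be written as a one-line check. The sketch below is illustrative: the function name `swap_possible` and the example timings are assumptions, and the diameter values are simply the two ends of the 50-500 microsecond range quoted on the slide.

```python
def swap_possible(send_a_us, send_b_us, diameter_us):
    """Two multicasts may be seen in different orders by different receivers
    when they were sent closer together than the network diameter."""
    return abs(send_a_us - send_b_us) < diameter_us


# Hypothetical send times, 120 microseconds apart.
small_dc = swap_possible(0, 120, diameter_us=50)    # diameter at the low end
large_dc = swap_possible(0, 120, diameter_us=500)   # diameter at the high end
print(small_dc, large_dc)
```

With the same pair of sends, a swap is ruled out when the diameter is 50 microseconds but possible when it is 500 microseconds.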
![Page 9: PLATO: Predictive Latency-Aware Total Ordering](https://reader036.vdocuments.mx/reader036/viewer/2022062723/56813d64550346895da73a86/html5/thumbnails/9.jpg)
Reasons for Disorder: Loss
Datacenter networks are over-provisioned: loss never occurs in the network
Datacenter nodes are cheap: loss occurs due to end-host buffer overflows caused by CPU contention
[Figure: order of arrivals of packets A through H into user-space over time t.]
![Page 10: PLATO: Predictive Latency-Aware Total Ordering](https://reader036.vdocuments.mx/reader036/viewer/2022062723/56813d64550346895da73a86/html5/thumbnails/10.jpg)
Emulab Testbed (Utah)
[Figure: The Utah Emulab Testbed. Four Cisco 6509 switches and one Cisco 6513 switch, with links of 100 Mb, 1 Gb, 4 Gb, and 8 Gb; nodes of 600 MHz, 850 MHz, 2 GHz, and 3 GHz.]
Emulab3 test scenario: 3 switches of separation. One-way ping latency: ~110 microseconds
Emulab2 test scenario: 2 switches of separation. One-way ping latency: ~100 microseconds
![Page 11: PLATO: Predictive Latency-Aware Total Ordering](https://reader036.vdocuments.mx/reader036/viewer/2022062723/56813d64550346895da73a86/html5/thumbnails/11.jpg)
Cornell Testbed
[Figure: The Cornell Testbed. HP Procurve 4000M and HP Procurve 6108 switches, with 100 Mb and 1 Gb links; 1.3 GHz nodes.]
Cornell3 test scenario: 3 switches of separation. One-way ping latency: ~70 microseconds
Cornell5 test scenario: 5 switches of separation. One-way ping latency: ~110 microseconds
![Page 12: PLATO: Predictive Latency-Aware Total Ordering](https://reader036.vdocuments.mx/reader036/viewer/2022062723/56813d64550346895da73a86/html5/thumbnails/12.jpg)
Disorder: Emulab3
At 2800 packets per sec, 2% of all packet pairs are swapped and 0.5% of packets are lost.
Percentage of swaps and losses goes up with data rate
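For readers who want to reproduce this kind of measurement, the sketch below shows one way to compute swap and loss fractions from two receivers' arrival logs. The slide does not say exactly how "packet pairs" are counted; here every pair of packets seen by both receivers is compared, which is an assumption, and the function names and sample logs are hypothetical.

```python
from itertools import combinations


def swap_fraction(order_1, order_2):
    """Fraction of packet pairs (among packets seen by both receivers)
    whose relative order differs between the two arrival logs."""
    pos_2 = {pkt: i for i, pkt in enumerate(order_2)}
    common = [pkt for pkt in order_1 if pkt in pos_2]
    pairs = list(combinations(common, 2))
    if not pairs:
        return 0.0
    swapped = sum(1 for a, b in pairs if pos_2[a] > pos_2[b])
    return swapped / len(pairs)


def loss_fraction(sent, received):
    """Fraction of sent packets that never reached this receiver."""
    return len(set(sent) - set(received)) / len(sent)


sent = ["A", "B", "C", "D", "E"]
receiver_1 = ["A", "B", "C", "D", "E"]
receiver_2 = ["A", "C", "B", "D"]        # B and C swapped, E lost

swaps = swap_fraction(receiver_1, receiver_2)
loss = loss_fraction(sent, receiver_2)
print(swaps, loss)
```

In this toy log, one of the six common pairs is swapped and one of the five packets is lost at receiver 2.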
![Page 13: PLATO: Predictive Latency-Aware Total Ordering](https://reader036.vdocuments.mx/reader036/viewer/2022062723/56813d64550346895da73a86/html5/thumbnails/13.jpg)
Disorder
![Page 14: PLATO: Predictive Latency-Aware Total Ordering](https://reader036.vdocuments.mx/reader036/viewer/2022062723/56813d64550346895da73a86/html5/thumbnails/14.jpg)
Predicting Disorder
Predictor: Inter-arrival time of consecutive packets into user-space
Why?
Swaps: simultaneous multicasts → low inter-arrival time
Loss: kernel buffer overflow → sequence of low inter-arrival times
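A minimal sketch of the predictor, assuming arrival timestamps in microseconds and a threshold `delta_us`. The function names, the sample timestamps, and the 128 microsecond threshold used here are illustrative choices, not part of the slides' implementation. A single short gap hints at a swap; a run of short gaps hints at a burst that could overflow a kernel buffer.

```python
def suspect_flags(arrival_times_us, delta_us):
    """Flag both packets of every consecutive pair that arrived
    less than delta_us apart (possible swap)."""
    flags = [False] * len(arrival_times_us)
    for i in range(1, len(arrival_times_us)):
        if arrival_times_us[i] - arrival_times_us[i - 1] < delta_us:
            flags[i - 1] = True
            flags[i] = True
    return flags


def longest_burst(arrival_times_us, delta_us):
    """Length of the longest run of consecutive short inter-arrival gaps
    (a long run suggests possible loss from buffer overflow)."""
    longest = current = 0
    for prev, cur in zip(arrival_times_us, arrival_times_us[1:]):
        current = current + 1 if cur - prev < delta_us else 0
        longest = max(longest, current)
    return longest


# Hypothetical user-space arrival timestamps in microseconds.
arrivals = [0, 900, 950, 2000, 3000, 3040, 3070]
flags = suspect_flags(arrivals, delta_us=128)
burst = longest_burst(arrivals, delta_us=128)
print(flags, burst)
```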
![Page 15: PLATO: Predictive Latency-Aware Total Ordering](https://reader036.vdocuments.mx/reader036/viewer/2022062723/56813d64550346895da73a86/html5/thumbnails/15.jpg)
Predicting Disorder
95% of swaps and 14% of all pairs are within 128 µsecs
[Figure: distributions of inter-arrival time of swaps and inter-arrival time of all pairs. Cornell Datacenter, 400 multicasts/sec.]
![Page 16: PLATO: Predictive Latency-Aware Total Ordering](https://reader036.vdocuments.mx/reader036/viewer/2022062723/56813d64550346895da73a86/html5/thumbnails/16.jpg)
Predicting Disorder
![Page 17: PLATO: Predictive Latency-Aware Total Ordering](https://reader036.vdocuments.mx/reader036/viewer/2022062723/56813d64550346895da73a86/html5/thumbnails/17.jpg)
PLATO Design
Heuristic: if two packets arrive within Δ µsecs, there is a possibility of disorder
PLATO: Heuristic + Lazy Fixed Sequencer
Heuristic works → ~zero (Δ) latency
Heuristic fails → fixed sequencer latency
![Page 18: PLATO: Predictive Latency-Aware Total Ordering](https://reader036.vdocuments.mx/reader036/viewer/2022062723/56813d64550346895da73a86/html5/thumbnails/18.jpg)
PLATO Design
API: optdeliver, confirm, revoke
Ordering Layer:
Pending Queue: Packets suspected to be out-of-order, or queued behind suspected packets
Suspicious Queue: Packets optdelivered to the application, not yet confirmed
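The following Python sketch is one possible reading of how the two queues and the three API calls fit together; it is not the authors' implementation. The class names, the rollback policy applied when a sequencer message arrives, and the timestamps are all assumptions. It also treats every packet in the pending queue alike (no separate suspect flag) and assumes no packet is lost. The example trace replays the arrival order A, E, B, D, C used in the deck's walk-through, with C arriving within Δ of D and a sequencer order of ABCD.

```python
class RecordingApp:
    """Hypothetical application that just logs the calls it receives."""

    def __init__(self):
        self.log = []

    def optdeliver(self, pkt):
        self.log.append(("optdeliver", pkt))

    def revoke(self, pkt):
        self.log.append(("revoke", pkt))

    def confirm(self, pkts):
        self.log.append(("confirm", tuple(pkts)))


class OrderingLayer:
    def __init__(self, delta_us, app):
        self.delta_us = delta_us
        self.app = app
        self.suspicious = []   # optdelivered, not yet confirmed
        self.pending = []      # suspected, or queued behind suspected packets
        self.last_pkt = None
        self.last_arrival_us = None

    def on_arrival(self, pkt, now_us):
        close = (self.last_arrival_us is not None
                 and now_us - self.last_arrival_us < self.delta_us)
        # If the previous packet was optdelivered too eagerly, roll it back.
        if close and self.suspicious and self.suspicious[-1] == self.last_pkt:
            self.suspicious.pop()
            self.app.revoke(self.last_pkt)
            self.pending.append(self.last_pkt)
        if close or self.pending:
            self.pending.append(pkt)          # hold back until sequenced
        else:
            self.suspicious.append(pkt)
            self.app.optdeliver(pkt)          # optimistic fast path
        self.last_pkt, self.last_arrival_us = pkt, now_us

    def on_sequencer_order(self, order):
        confirmed, still_suspicious, i = [], [], 0
        for pkt in self.suspicious:
            if i == len(order):
                still_suspicious.append(pkt)  # not covered by this order yet
            elif pkt == order[i]:
                confirmed.append(pkt)         # optimistic guess was right
                i += 1
            else:
                self.app.revoke(pkt)          # delivered out of order
                self.pending.append(pkt)
        for pkt in order[i:]:
            if pkt not in self.pending:
                break                         # not arrived; sketch gives up here
            self.pending.remove(pkt)
            self.app.optdeliver(pkt)
            confirmed.append(pkt)
        self.app.confirm(confirmed)
        self.suspicious = still_suspicious


app = RecordingApp()
layer = OrderingLayer(delta_us=128, app=app)
for pkt, t in [("A", 0), ("E", 1000), ("B", 2000), ("D", 3000), ("C", 3050)]:
    layer.on_arrival(pkt, t)
layer.on_sequencer_order(["A", "B", "C", "D"])
for entry in app.log:
    print(entry)
```

In this trace the application sees A, E, B, and D optimistically, then revocations of D and E, then C and D delivered in sequencer order, and finally a confirmation of A, B, C, D. E stays in the pending queue until a later sequencer message covers it.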
![Page 19: PLATO: Predictive Latency-Aware Total Ordering](https://reader036.vdocuments.mx/reader036/viewer/2022062723/56813d64550346895da73a86/html5/thumbnails/19.jpg)
PLATO Design
[Figure: timeline t showing the contents of the suspicious and pending queues as packets A through E arrive. Underlined packets in pending are suspected.]
Events: optdeliver(A), optdeliver(E), optdeliver(B), optdeliver(D); revoke(D), setsuspect(D), setsuspect(C); revoke(E), setsuspect(E); confirm(A, B, C, D)
Conditions: TE-TA > DELTA; TC-TD < DELTA
Seq MsgOrder: ABCD
![Page 20: PLATO: Predictive Latency-Aware Total Ordering](https://reader036.vdocuments.mx/reader036/viewer/2022062723/56813d64550346895da73a86/html5/thumbnails/20.jpg)
Performance
Fixed Sequencer
PLATO
At small values of Δ, very low latency of delivery but more rollbacks
![Page 21: PLATO: Predictive Latency-Aware Total Ordering](https://reader036.vdocuments.mx/reader036/viewer/2022062723/56813d64550346895da73a86/html5/thumbnails/21.jpg)
Performance
Latency of both Fixed-Sequencer and PLATO decreases as throughput increases
![Page 22: PLATO: Predictive Latency-Aware Total Ordering](https://reader036.vdocuments.mx/reader036/viewer/2022062723/56813d64550346895da73a86/html5/thumbnails/22.jpg)
Performance
Traffic Spike: PLATO is insensitive to data rate, while Fixed Sequencer depends on data rate
![Page 23: PLATO: Predictive Latency-Aware Total Ordering](https://reader036.vdocuments.mx/reader036/viewer/2022062723/56813d64550346895da73a86/html5/thumbnails/23.jpg)
Performance
Δ is varied adaptively in reaction to rollbacks
Latency is as good as static Δ parameterization
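The slides do not say how Δ is adjusted, so the sketch below is purely illustrative: it widens Δ multiplicatively after a rollback (more packets held back, fewer wrong guesses) and narrows it by a small step after each confirmation without a rollback. The class name `AdaptiveDelta`, the update rule, and every default value are assumptions.

```python
class AdaptiveDelta:
    """Hypothetical controller for the Δ threshold, in microseconds."""

    def __init__(self, delta_us=64.0, min_us=8.0, max_us=1024.0,
                 grow=2.0, shrink_us=1.0):
        self.delta_us = delta_us
        self.min_us = min_us
        self.max_us = max_us
        self.grow = grow
        self.shrink_us = shrink_us

    def on_rollback(self):
        # A rollback means Δ was too small: be more conservative.
        self.delta_us = min(self.max_us, self.delta_us * self.grow)

    def on_confirm(self):
        # A clean confirmation: cautiously reduce Δ to lower latency.
        self.delta_us = max(self.min_us, self.delta_us - self.shrink_us)


ctl = AdaptiveDelta()
ctl.on_rollback()
after_rollback = ctl.delta_us
ctl.on_confirm()
after_confirm = ctl.delta_us
print(after_rollback, after_confirm)
```

The direction of the adjustment follows the earlier observation that small values of Δ give low latency but more rollbacks.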
![Page 24: PLATO: Predictive Latency-Aware Total Ordering](https://reader036.vdocuments.mx/reader036/viewer/2022062723/56813d64550346895da73a86/html5/thumbnails/24.jpg)
Conclusion
First optimistic total order protocol that predicts out-of-order delivery
Slashes ordering latency in datacenter settings
Stable at varying loads
Ordering layer of a time-critical protocol stack for Datacenters