tuning speculative retries to fight latency (michael figuiere, minh do, netflix) | cassandra summit...
![Page 1: Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.mx/reader034/viewer/2022050803/586f75c71a28ab10258b618f/html5/thumbnails/1.jpg)
Tuning Speculative Retries to Fight Latency
Minh Do - Senior Distributed System Engineer
Michaël Figuière - Senior Distributed System Engineer
![Page 2: Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.mx/reader034/viewer/2022050803/586f75c71a28ab10258b618f/html5/thumbnails/2.jpg)
Agenda
What are Speculative Retries?
C* Setting & Internal Code
C* Testing and Outcome
DataStax Java Driver
![Page 3: Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.mx/reader034/viewer/2022050803/586f75c71a28ab10258b618f/html5/thumbnails/3.jpg)
Speculative Retries
• “The Tail at Scale” - Google paper
• Hedged Requests vs. Speculative Retries
![Page 4: Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.mx/reader034/viewer/2022050803/586f75c71a28ab10258b618f/html5/thumbnails/4.jpg)
Speculative Retries
• Tuning available both in C* and Java Driver
• It is all about the additional read retries to improve the latency tail
![Page 5: Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.mx/reader034/viewer/2022050803/586f75c71a28ab10258b618f/html5/thumbnails/5.jpg)
C* Setting & Internals
• Read path only
• Set at table level:
  – None, Always, Custom, Percentile
• Using different executors for different behaviors
• Impacted by Read Repair (for simplicity, ignored in this talk)
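
As a rough illustration of the table-level setting, the sketch below builds the `ALTER TABLE` statement for each of the four modes. The string forms (`NONE`, `ALWAYS`, `<N>ms`, `<N>PERCENTILE`) are the ones accepted by the Cassandra 2.x/3.x releases current at the time of the talk; the keyspace and table names, the `10ms` value, and the helper function itself are made up for the example.

```python
def speculative_retry_cql(keyspace, table, setting):
    """Build an ALTER TABLE statement that sets speculative_retry on one table."""
    return f"ALTER TABLE {keyspace}.{table} WITH speculative_retry = '{setting}';"


# The four modes from the slide. The concrete values are illustrative only.
SETTINGS = {
    "none": "NONE",                # never send an extra read
    "always": "ALWAYS",            # always send an extra read up front
    "custom": "10ms",              # retry after a fixed delay
    "percentile": "95PERCENTILE",  # retry after the table's observed p95 read latency
}

for mode, value in SETTINGS.items():
    print(mode, "->", speculative_retry_cql("demo_ks", "demo_table", value))
```

The generated statements can be pasted into `cqlsh` or sent through any driver session.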
![Page 6: Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.mx/reader034/viewer/2022050803/586f75c71a28ab10258b618f/html5/thumbnails/6.jpg)
No Retry Policy
• Use NeverSpeculatingReadExecutor
• Coordinator computes a list of alive replicas:
  – #blockFor
• Sends only one full data request to one replica
• Sends additional digest request(s) to others depending on Consistency Level (CL)
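
The request plan described above can be sketched as follows. This is a simplified model, not Cassandra's code: `block_for` and `plan_without_speculation` are hypothetical names, only a few consistency levels are handled, and for `LOCAL_QUORUM` the replication factor is assumed to be that of the local datacenter.

```python
def block_for(consistency_level, replication_factor):
    """Number of replica responses the coordinator must wait for."""
    if consistency_level in ("ONE", "LOCAL_ONE"):
        return 1
    if consistency_level in ("QUORUM", "LOCAL_QUORUM"):
        return replication_factor // 2 + 1
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def plan_without_speculation(live_replicas, consistency_level, replication_factor):
    """Pick exactly blockFor replicas: one full data read, the rest digest reads."""
    needed = block_for(consistency_level, replication_factor)
    if len(live_replicas) < needed:
        raise RuntimeError("not enough live replicas for this consistency level")
    targets = live_replicas[:needed]
    return {"data": targets[:1], "digest": targets[1:]}


print(plan_without_speculation(["r1", "r2", "r3"], "LOCAL_QUORUM", 3))
```

With replication factor 3 and `LOCAL_QUORUM`, blockFor is 2, so one replica gets the data request, one gets a digest request, and the third is not contacted at all.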
![Page 7: Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.mx/reader034/viewer/2022050803/586f75c71a28ab10258b618f/html5/thumbnails/7.jpg)
No Retry Policy
Example using CL_LOCAL_QUORUM (replication factor 3)
![Page 8: Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.mx/reader034/viewer/2022050803/586f75c71a28ab10258b618f/html5/thumbnails/8.jpg)
Always Retry Policy
• Use AlwaysSpeculatingReadExecutor
• List of replicas: (#blockFor + 1) or all nodes
• Coordinator sends 2 full data requests to 2 replicas
• Sends additional digest requests to others depending on Consistency Level
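
A minimal sketch of the "always" plan, under the same simplifications: the function name is hypothetical, `block_for` is passed in as a number, and the "or all nodes" case from the slide is left out.

```python
def plan_always_speculating(live_replicas, block_for):
    """Contact blockFor + 1 replicas up front: two data reads, the rest digests."""
    targets = live_replicas[:block_for + 1]
    return {"data": targets[:2], "digest": targets[2:]}


# CL ONE (blockFor = 1): two replicas are contacted, both with full data reads.
print(plan_always_speculating(["r1", "r2", "r3"], block_for=1))

# CL LOCAL_QUORUM with replication factor 3 (blockFor = 2): two data reads, one digest.
print(plan_always_speculating(["r1", "r2", "r3"], block_for=2))
```

The extra request is sent unconditionally, so every read costs one more replica read than it would without speculation.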
![Page 9: Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.mx/reader034/viewer/2022050803/586f75c71a28ab10258b618f/html5/thumbnails/9.jpg)
Always Retry Policy
Example using CL_ONE
![Page 10: Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.mx/reader034/viewer/2022050803/586f75c71a28ab10258b618f/html5/thumbnails/10.jpg)
Custom/Percentile Retry Policy
• Use SpeculatingReadExecutor
• List of replicas: #blockFor + 1
• Send 1 full data request and 1 or more digest requests to replicas, depending on Consistency Level
• Use the last replicas on the list for the retries
• Coordinator waits for a duration before retrying on another replica
  – Custom: duration in milliseconds
  – Percentile: the percentile of the sampled latency distribution
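
To make the effect of the wait-then-retry step concrete, here is a toy latency model, not Cassandra's implementation. It assumes the first blockFor replicas are queried at time 0, the spare replica is queried once the threshold passes without all responses in, and the read completes when blockFor responses have arrived. The function name and all latency figures are invented for the example.

```python
def read_latency_ms(replica_latencies_ms, block_for, threshold_ms):
    """Latency seen by the coordinator when one spare replica can be used as a retry.

    replica_latencies_ms lists blockFor + 1 replicas in plan order; the last
    entry is the spare that is only contacted if the threshold is exceeded.
    """
    initial = replica_latencies_ms[:block_for]
    spare = replica_latencies_ms[block_for]
    arrivals = list(initial)
    if max(initial) > threshold_ms:
        # The retry starts at threshold_ms, so its response arrives that much later.
        arrivals.append(threshold_ms + spare)
    # The read completes once blockFor responses are in.
    return sorted(arrivals)[block_for - 1]


# One slow replica (200 ms) among the initial targets, 10 ms threshold.
print(read_latency_ms([1, 200, 2], block_for=2, threshold_ms=10))   # 12
# Same replicas with an unreachable threshold, i.e. no speculation in practice.
print(read_latency_ms([1, 200, 2], block_for=2, threshold_ms=1000))  # 200
```

In this model the retry caps the read at roughly the threshold plus the spare replica's latency, which is why the threshold choice (fixed milliseconds or a percentile) matters.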
![Page 11: Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.mx/reader034/viewer/2022050803/586f75c71a28ab10258b618f/html5/thumbnails/11.jpg)
Custom/Percentile Retry Policy
Example using CL_LOCAL_QUORUM (replication factor 3)
![Page 12: Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.mx/reader034/viewer/2022050803/586f75c71a28ab10258b618f/html5/thumbnails/12.jpg)
C* test - setup
• Cassandra: 12 nodes (8 cores/60 GB RAM), approx. 120 GB data per node
• Clients: 6 (8 cores/30 GB RAM)
• Use Linux Traffic Control (tc) to simulate a network glitch on one C* node by delaying packet transmission
![Page 13: Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.mx/reader034/viewer/2022050803/586f75c71a28ab10258b618f/html5/thumbnails/13.jpg)
C* test - result
• Clients send 50K reads/sec with 3K writes/sec
• Use tc to force a slow node
• Enable the 95th percentile Speculative Retry setting
• Throughput degrades to 30K reads/sec on the Coordinator
• Avg. latency doubles, from 0.5 ms to 1 ms
• 95th/99th latencies have many spikes of 2x-10x
![Page 14: Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.mx/reader034/viewer/2022050803/586f75c71a28ab10258b618f/html5/thumbnails/14.jpg)
C* test - result
• Clients send 18K reads/sec with 1K writes/sec
• Use tc to force a slow node
• Keep the 95th percentile Speculative Retry setting on
• Throughput increases to 20+K reads/sec
• Avg. latency doubles, from 0.5 ms to 1 ms
• 95th/99th latencies are lower and more stable, with no spikes
![Page 15: Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.mx/reader034/viewer/2022050803/586f75c71a28ab10258b618f/html5/thumbnails/15.jpg)
C* test - conclusion
• For more stable and lower 95th/99th latencies:
  – Build the cluster with extra capacity and turn on Speculative Retry
• If the C* cluster is already near its max capacity:
  – Disable Speculative Retry, as performance gets worse
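
One way to see why headroom matters is a back-of-the-envelope estimate of the extra replica reads a percentile threshold generates. The sketch assumes a steady state in which about (100 - p)% of reads exceed the p-th percentile threshold and each of those triggers exactly one retry; with a slow node in the cluster the real fraction is likely higher. The function name is hypothetical and the 50K reads/sec figure is taken from the test slide.

```python
def extra_requests_per_sec(reads_per_sec, percentile):
    """Estimated speculative replica reads per second at a given percentile threshold."""
    return reads_per_sec * (100 - percentile) / 100


print(extra_requests_per_sec(50_000, 95))  # 2500.0
print(extra_requests_per_sec(50_000, 99))  # 500.0
```

On a cluster already running near its limit, that additional read load has nowhere to go, which is consistent with the advice to disable the setting in that case.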
![Page 16: Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.mx/reader034/viewer/2022050803/586f75c71a28ab10258b618f/html5/thumbnails/16.jpg)
Client Speculative Retries
• SpeculativeExecutionPolicy
• Speculative retries for both reads and writes
• But only for idempotent requests
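
The idempotence restriction exists because a speculative retry can result in the same request being applied more than once. The toy example below (all names are hypothetical) applies an operation twice and compares the result with applying it once. In the DataStax Java driver, a statement is flagged as safe to retry with `setIdempotent(true)`.

```python
def apply_twice(initial, operation):
    """Apply an operation twice, as can happen when both the original
    request and its speculative retry reach the cluster."""
    return operation(operation(initial))


def set_to_five(value):
    """Idempotent: like writing a fixed value to a column."""
    return 5


def increment(value):
    """Not idempotent: like a counter update or a list append."""
    return value + 1


print(apply_twice(0, set_to_five) == set_to_five(0))  # True: duplicate is harmless
print(apply_twice(0, increment) == increment(0))      # False: duplicate changes the result
```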
![Page 17: Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.mx/reader034/viewer/2022050803/586f75c71a28ab10258b618f/html5/thumbnails/17.jpg)
Client Speculative Retries
Example using CL_LOCAL_QUORUM (replication factor 3)
![Page 18: Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.mx/reader034/viewer/2022050803/586f75c71a28ab10258b618f/html5/thumbnails/18.jpg)
NoSpeculativeExecutionPolicy
• Just the normal Driver behavior
• Retries are attempted only after a fixed timeout as part of Failover
![Page 19: Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.mx/reader034/viewer/2022050803/586f75c71a28ab10258b618f/html5/thumbnails/19.jpg)
ConstantSpeculativeExecutionPolicy
Trigger a Speculative Retry if the Coordinator hasn’t answered within a given delay.
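
The behavior can be sketched with `asyncio`, using simulated coordinators instead of a real cluster. This is not the driver's implementation: `query`, `execute_with_constant_speculation`, the host names, and the latencies are all invented for the example. The two parameters mirror the constant delay and the maximum number of speculative executions that the Java driver's policy takes.

```python
import asyncio


async def query(host, latency_s):
    """Simulated request: the coordinator 'host' answers after latency_s seconds."""
    await asyncio.sleep(latency_s)
    return host


async def execute_with_constant_speculation(hosts, delay_s, max_speculative=1):
    """Query hosts[0]; each time delay_s passes with no answer, also query the
    next host in the plan (up to max_speculative extra executions).
    Returns the first answer to arrive and cancels the remaining requests."""
    tasks = [asyncio.create_task(query(*hosts[0]))]
    next_host = 1
    try:
        while True:
            can_speculate = next_host < len(hosts) and next_host <= max_speculative
            done, _ = await asyncio.wait(
                tasks,
                timeout=delay_s if can_speculate else None,
                return_when=asyncio.FIRST_COMPLETED,
            )
            if done:
                return done.pop().result()
            # Delay elapsed with no answer: start a speculative execution.
            tasks.append(asyncio.create_task(query(*hosts[next_host])))
            next_host += 1
    finally:
        for task in tasks:
            task.cancel()


# Coordinator "a" is slow (300 ms), "b" is fast (10 ms); speculate after 50 ms.
winner = asyncio.run(
    execute_with_constant_speculation([("a", 0.3), ("b", 0.01)], delay_s=0.05)
)
print("answered by", winner)
```

Here the answer arrives from "b" after roughly 60 ms instead of from "a" after 300 ms, at the cost of one extra request.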
![Page 20: Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.mx/reader034/viewer/2022050803/586f75c71a28ab10258b618f/html5/thumbnails/20.jpg)
PercentileSpeculativeExecutionPolicy
Trigger speculative retries if the chosen coordinator doesn’t answer within a given
percentile of its typical response time
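
A simplified sketch of how such a delay could be derived: keep a sliding window of recent latencies per coordinator and read the requested percentile from it with the nearest-rank method. The class name, window size, and default delay are assumptions for the example, and the real driver's latency tracking is more elaborate than a sorted window.

```python
import math
from collections import defaultdict, deque


class HostLatencyTracker:
    """Keeps the most recent latency samples for each host."""

    def __init__(self, window=1000):
        self._samples = defaultdict(lambda: deque(maxlen=window))

    def record(self, host, latency_ms):
        self._samples[host].append(latency_ms)

    def delay_ms(self, host, percentile, default_ms):
        """Nearest-rank percentile of the host's samples, or default_ms if none yet."""
        samples = sorted(self._samples[host])
        if not samples:
            return default_ms
        rank = max(1, math.ceil(percentile * len(samples) / 100))
        return samples[rank - 1]


tracker = HostLatencyTracker()
for latency in range(1, 101):          # samples of 1..100 ms for host "a"
    tracker.record("a", latency)

print(tracker.delay_ms("a", 95, default_ms=50))  # 95
print(tracker.delay_ms("b", 95, default_ms=50))  # 50, no samples for "b" yet
```

The returned value would then serve as the delay before a speculative retry is sent to the next coordinator.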
![Page 21: Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netflix) | Cassandra Summit 2016](https://reader034.vdocuments.mx/reader034/viewer/2022050803/586f75c71a28ab10258b618f/html5/thumbnails/21.jpg)
Questions?