Cassandra Summit 2014: Performance Tuning Cassandra in AWS


Posted on 06-Dec-2014







Presenter: Michael Nelson, Development Manager at FamilySearch

A recent research project pushed Cassandra to very high scale and performance limits in AWS using a real application. Come see how we achieved 250K reads/sec with latencies under 5 milliseconds on a 400-core cluster holding 6 TB of data while maintaining transactional consistency for users. We'll cover tuning of Cassandra's caches, other server-side settings, the client driver, AWS cluster placement and instance types, and the tradeoffs between regular and SSD storage.


Slide 1: Performance Tuning Cassandra in AWS
Cassandra Summit 2014
Michael Nelson
© 2014 by Intellectual Reserve, Inc. All rights reserved.

Slide 2: Outline
- The App: FamilySearch Family Tree
- The Test: Borland Silk Performer
- The Findings: Row Cache, Token-Aware Driver, Networking Issues, Etc.

Slide 3: What Is FamilySearch?
- Website
- Very large single pedigree (Family Tree)
- Largest collection of free genealogical records
- Largest genealogical library
- The Church of Jesus Christ of Latter-day Saints (Mormons)

Slide 4: Why Does FamilySearch Exist?
- Visit

Slide 5: Family Tree Data
- 900M+ person records, open-edit
- 500M+ relationships, open-edit
- 8.4B change log entries, ~1M per day
- 7 TB in Cassandra (13 TB in Oracle)
- Dynamic OLTP system
- Data-dependent performance issues

Slide 6: Family Tree: Example 9-Generation Pedigree
- Up to 511 person slots; dynamic content

Slide 7: Family Tree: Example Pedigree App
- 31+ persons per section; dynamic content

Slide 8: Family Tree: Example Ancestor Page
- 10+ persons in families; 100-1,000+ changes; dynamic content

Slide 9: Cassandra Reimplementation
- Event-sourced data model (journal / views)
- New data model (no indexes)
- New consistency model (satisfies consistency)
(diagram: journal entries JE #6 and JE #8 feeding views for persons P1 and P2)

Slide 10: 77% Reads / 23% Writes
- Reads: LOCAL_ONE, simple queries
- Writes: LOCAL_QUORUM, atomic batches, multiple tables, multiple rows, business logic

Slide 11: A Little Optimization Goes a Long Way
- 28-node cluster, 250,000 op/sec: optimized app
- 8-node cluster, 200,000 op/sec: optimized app + row cache + token-aware driver

Slide 12: Test System
- Cassandra (Community Ed. 2.0.5): 8 × hi1.4xlarge (16 CPU, 61 GB RAM, 2 TB SSD, 10 Gb network)
- Family Tree app servers (DataStax driver 2.0.0): 60 × m2.2xlarge (4 CPU, 34 GB RAM, moderate network)
- Silk Performer load agents: 25 × m2.xlarge (2 CPU, 17 GB RAM, moderate network)

Slide 13: 2x Throughput Increase
(chart: reads and writes in op/sec, 0 to 200,000, under Defaults, Row Cache, Token Aware, and concurrent_reads)

Slide 14: Row Cache = 35% More Throughput
- Default key cache: caches the disk location; data comes from the disk cache; ~11 ms reads
- Row cache: caches the row contents; ~7 ms reads

Slide 15: Configuring Row Cache
cassandra.yaml:
  # Maximum size of the row cache in memory.
  # Default value is 0, to disable row caching.
  row_cache_size_in_mb: 32768
Enable for each table explicitly:
  ALTER TABLE person_view WITH caching = 'ALL';

Slide 16: 90% Row Cache Hit Rate
(chart)

Slide 17: Token Aware = 50% More Throughput
- Default round robin: coordinator middleman; adds network hops; load on multiple nodes; ~7 ms
- Token aware: reads from replicas; no extra network hops; ~2 ms

Slide 18: Configuring Token Aware
Default load balancing policy:
  new RoundRobinPolicy()
Better:
  new TokenAwarePolicy(new RoundRobinPolicy())

Slide 19: concurrent_reads = 5% More Throughput
Defaults:
  concurrent_reads: 32
  concurrent_writes: 32
  native_transport_max_threads: 128
Improved:
  concurrent_reads: 256
  concurrent_writes: 256
  native_transport_max_threads: 256

Slide 20: Now Where's the Bottleneck?
- 181,000 reads/sec; 21,000 writes/sec
- CPU = 80%; network = 10%; disk < 5%

Slide 21: Network Mystery: C* 800 Mb
- Cassandra never exceeded 800 Mb on a 10 Gb network

Slide 22: Network Mystery: Cyclic Net Queues
- About a 5-second cycle of network queues backing up
- Client machines seemed OK
- Tweaking the network stack had no impact: net.core.wmem_max, net.core.rmem_max, net.ipv4.tcp_wmem, net.ipv4.tcp_rmem, net.core.somaxconn, net.core.netdev_max_backlog, net.ipv4.tcp_tw_recycle, net.ipv4.tcp_max_syn_backlog, net.ipv4.ip_local_port_range, txqueuelen

Slide 23: Network Mystery: Cyclic Net Queues
- Send-Qs back up (chart)

Slide 24: Network Mystery: Cyclic Net Queues
- Recv-Qs back up (chart)

Slide 25: Network Mystery: Cyclic Net Queues
- Somewhat normal, then it starts again (chart)

Slide 26: 2x Throughput Increase
(summary chart, same as slide 13)

Slide 27: Contact Info
Michael Nelson, Development Manager
Thanks to the FamilySearch team!
Thanks to the awesome presenters & organizers at #CassandraSummit!
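The key-cache/row-cache distinction behind the 35% win on slide 14 is that the key cache only remembers where a row lives on disk, so a hit still pays a disk (or page-cache) fetch, while the row cache keeps the row contents themselves in memory. A minimal sketch of the eviction idea, using a `LinkedHashMap` LRU as a stand-in for Cassandra's real row cache (which is sized in megabytes via `row_cache_size_in_mb`, not in entry counts):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy row cache: full row contents held in memory, least-recently-used
// entries evicted once capacity is exceeded. Illustrative only; the
// real cache is off-heap and byte-sized, not entry-counted.
public class RowCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public RowCache(int maxEntries) {
        super(16, 0.75f, true); // access-order = LRU semantics
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict when over capacity
    }

    public static void main(String[] args) {
        RowCache<String, String> cache = new RowCache<>(2);
        cache.put("person:1", "{name: Alice}");
        cache.put("person:2", "{name: Bob}");
        cache.get("person:1");                   // person:1 becomes most recent
        cache.put("person:3", "{name: Carol}");  // evicts person:2 (LRU)
        System.out.println(cache.containsKey("person:2")); // prints "false"
    }
}
```

A served read is then a pure memory lookup, which is where the ~11 ms to ~7 ms drop on slide 14 comes from; the 90% hit rate on slide 16 is what makes the trade worthwhile.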
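The 50% token-aware win on slides 17-18 comes from the driver hashing the partition key itself and contacting an owning replica directly, instead of letting a round-robin-chosen coordinator forward the request (one extra network hop). A driver-free toy of that ring-routing idea, with an arbitrary stand-in hash rather than the real Murmur3 partitioner:

```java
import java.util.TreeMap;

// Toy token ring: a partition key routes to the node owning the first
// token >= hash(key), wrapping around at the end of the ring.
public class TokenRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public void addNode(String node, long token) {
        ring.put(token, node);
    }

    // Stand-in hash; any deterministic function illustrates the idea
    // (Cassandra itself uses the Murmur3 partitioner).
    static long token(String partitionKey) {
        return (long) partitionKey.hashCode() * 2654435761L;
    }

    // A token-aware client computes this locally and contacts the replica
    // directly; a round-robin client would hit an arbitrary coordinator
    // that then forwards the request, paying an extra hop.
    public String replicaFor(String partitionKey) {
        Long t = ring.ceilingKey(token(partitionKey));
        if (t == null) t = ring.firstKey(); // wrap around the ring
        return ring.get(t);
    }

    public static void main(String[] args) {
        TokenRing ring = new TokenRing();
        ring.addNode("node-a", Long.MIN_VALUE / 2);
        ring.addNode("node-b", 0L);
        ring.addNode("node-c", Long.MAX_VALUE / 2);
        // Same key, same replica: routing is deterministic, so the
        // coordinator hop can be skipped entirely.
        String first = ring.replicaFor("person:123");
        String second = ring.replicaFor("person:123");
        System.out.println(first.equals(second)); // prints "true"
    }
}
```

In the DataStax Java driver version used here (2.0.x), this routing is what the one-line change on slide 18 enables, e.g. `Cluster.builder().withLoadBalancingPolicy(new TokenAwarePolicy(new RoundRobinPolicy()))`.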