![Page 1: Real world capacity](https://reader036.vdocuments.mx/reader036/viewer/2022062511/54c3856a4a7959fd7a8b4576/html5/thumbnails/1.jpg)
Real world capacity planning: Cassandra on blades and big iron
July 2011
![Page 2: Real world capacity](https://reader036.vdocuments.mx/reader036/viewer/2022062511/54c3856a4a7959fd7a8b4576/html5/thumbnails/2.jpg)
About me
• Hadoop System Admin @ media6degrees
  o Watch cassandra servers as well
  o Write code (hadoop filecrusher)
• Hive Committer
  o Variable substitution, UDFs like atan, rough draft of c* handler
• Epic Cassandra Contributor (not!)
  o CLI should allow users to choose consistency level
  o NodeCmd should be able to view Compaction Statistics
• Self-proclaimed president of Cassandra fan club
  o Cassandra NYC User Group
  o High Performance Cassandra Cookbook
![Page 3: Real world capacity](https://reader036.vdocuments.mx/reader036/viewer/2022062511/54c3856a4a7959fd7a8b4576/html5/thumbnails/3.jpg)
Media6 Degrees
• Social Targeting in online advertising
• Real Time Bidding - A dynamic auction process where each impression is bid for in (near) real time
• Cassandra @ work storing:
  o Visit Data
  o Ad History
  o Id Mapping
• Multiple Data Centers (home brew replication)
• Back end tools hadoop (Data mining, bulk loads)
• Front end tomcat, mysql + cassandra (lookup data)
![Page 4: Real world capacity](https://reader036.vdocuments.mx/reader036/viewer/2022062511/54c3856a4a7959fd7a8b4576/html5/thumbnails/4.jpg)
What is this talk about?
• Real World Capacity Planning
• Been running c* in production > 1 year
• Started with a handful of nodes also running tomcat and Replication Factor 2!
• Grew data from 0 to 10 TB
• Grew from 0 to 751,398,530 reads / day
• All types of fun along the way
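To put the reads/day figure in planning terms, it helps to convert it to an average per-second rate. The sketch below only does that arithmetic; the function name is made up for illustration, and the result is a daily average, so peak-hour traffic will be higher than this number.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def average_per_second(events_per_day):
    """Convert a per-day event count into an average per-second rate."""
    return events_per_day / SECONDS_PER_DAY

if __name__ == "__main__":
    reads_per_day = 751_398_530  # figure quoted on this slide
    print(f"average reads/sec: {average_per_second(reads_per_day):.0f}")
```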
![Page 5: Real world capacity](https://reader036.vdocuments.mx/reader036/viewer/2022062511/54c3856a4a7959fd7a8b4576/html5/thumbnails/5.jpg)
Using puppet, chef... from day 1
• “I am going to choose Cassandra 0.6.0-beta-1 over 0.5.x so I am future proof” -- Famous quote by me
• Cassandra is active
  o new versions are coming
  o Rolling restarts between minors
    – But much better to get all to same rev quickly
• New nodes are coming, do not let them:
  o start with the wrong settings
  o fail because you forgot open file limits, etc
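A configuration management run can include a sanity check like the one sketched below, so a new node fails loudly before it joins rather than under load. This is a minimal sketch using Python's standard `resource` module (Unix only); the function name and the threshold value are assumptions for illustration, not settings from the talk.

```python
import resource

def open_file_limit_ok(minimum):
    """Return True if this process's soft open-file limit is at least `minimum`."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return True  # unlimited always satisfies the requirement
    return soft >= minimum

if __name__ == "__main__":
    REQUIRED = 32768  # hypothetical threshold, pick one that suits your cluster
    if open_file_limit_ok(REQUIRED):
        print("open file limit looks fine")
    else:
        print(f"open file limit is below {REQUIRED}, fix limits before starting c*")
```

The check reports the limit of the process that runs it, so it should be run as the same user and through the same init path as the Cassandra process.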
![Page 6: Real world capacity](https://reader036.vdocuments.mx/reader036/viewer/2022062511/54c3856a4a7959fd7a8b4576/html5/thumbnails/6.jpg)
Calculating Data size on disk
• SSTable format currently not compressed
• Repairs, joins, and moves need “wiggle room”
• Smaller keys and column names save space
• Enough free space to compact your largest column family
• Snapshots keep SSTables around after compaction
• Most *nix file systems need free space to avoid performance loss to fragmentation!
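The free-space points above can be turned into a rough check. The sketch below assumes the worst case, where compacting a column family temporarily needs about as much free space as that column family already occupies; the function name, the column family names, and all sizes are hypothetical.

```python
def compaction_headroom(disk_gb, cf_sizes_gb, snapshot_gb=0.0):
    """Return (ok, free_gb, needed_gb) for compacting the largest column family.

    Assumption: compacting a column family can temporarily need as much
    free space as the column family currently uses on disk.
    """
    used_gb = sum(cf_sizes_gb.values()) + snapshot_gb
    free_gb = disk_gb - used_gb
    needed_gb = max(cf_sizes_gb.values())
    return free_gb >= needed_gb, free_gb, needed_gb

if __name__ == "__main__":
    # Hypothetical node: 1 TB disk, three column families, 100 GB of snapshots.
    sizes = {"visits": 400.0, "ad_history": 150.0, "id_map": 50.0}
    ok, free_gb, needed_gb = compaction_headroom(1000.0, sizes, snapshot_gb=100.0)
    print(f"free={free_gb} GB, needed={needed_gb} GB, ok={ok}")
```

In this made-up example the node is only 70% full yet cannot safely compact its largest column family, which is why snapshots and the largest column family both belong in the sizing math.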
![Page 7: Real world capacity](https://reader036.vdocuments.mx/reader036/viewer/2022062511/54c3856a4a7959fd7a8b4576/html5/thumbnails/7.jpg)
Speed of disk
• The faster the better!
• But faster + bigger gets expensive and challenging
• RAID0
  – Faster for streaming, not necessarily seeking
  – Fragile, larger the stripe, higher chance of failure
• RAID5
  – Not as fast but survives disk failure
• Battery backed cache helps but is $$$
• The dedicated commit log decision
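The "larger the stripe, higher chance of failure" point follows from simple probability: a RAID0 array is lost if any one member disk fails. The sketch below assumes independent disk failures and uses a made-up per-disk failure probability; it illustrates the trend and is not a measured figure.

```python
def raid0_failure_probability(disk_failure_prob, disk_count):
    """Probability that a RAID0 array loses data in a given period.

    Assumes disks fail independently; the array fails if any disk fails.
    """
    return 1.0 - (1.0 - disk_failure_prob) ** disk_count

if __name__ == "__main__":
    p = 0.03  # hypothetical per-disk failure probability for the period
    for n in (1, 2, 4, 8):
        print(f"{n} disk(s): {raid0_failure_probability(p, n):.3f}")
```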
![Page 8: Real world capacity](https://reader036.vdocuments.mx/reader036/viewer/2022062511/54c3856a4a7959fd7a8b4576/html5/thumbnails/8.jpg)
Disk formatting
• ext4 everywhere
• Deletes are much better than ext3
• Noticeable performance difference as disks get full
• A full async mode for risk takers
• Obligatory noatime fstab setting
• Using multiple file systems can result in multiple caches (check slabtop)
• Mention XFS
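A quick way to audit the noatime setting across nodes is to scan the mount table. The sketch below parses text in the common `device mountpoint fstype options ...` layout shared by `/etc/fstab` and `/proc/mounts`; the function name, the sample mount points, and the list of file system types are assumptions for illustration.

```python
def mounts_missing_noatime(mounts_text, fstypes=("ext4", "xfs")):
    """Return mount points of the given file system types lacking noatime."""
    missing = []
    for line in mounts_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip fstab comments
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fstype, options = fields[:4]
        if fstype in fstypes and "noatime" not in options.split(","):
            missing.append(mount_point)
    return missing

if __name__ == "__main__":
    # Hypothetical mount table; on a real node read /proc/mounts instead.
    sample = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /var/lib/cassandra/data ext4 rw,noatime 0 0
/dev/sdc1 /var/lib/cassandra/commitlog ext4 rw,relatime 0 0
proc /proc proc rw 0 0
"""
    print(mounts_missing_noatime(sample))
```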
![Page 9: Real world capacity](https://reader036.vdocuments.mx/reader036/viewer/2022062511/54c3856a4a7959fd7a8b4576/html5/thumbnails/9.jpg)
Memory
• Garbage collection is on a separate thread(s)
• Each request creates temporary objects
• Cassandra's fast writes go to Memtables
  – You will never guess what they use :)
• Bloom filter data is in memory
• Key cache and Row cache
• For low latency RAM must be some % of data
  – RAM not used by process is OS cache
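The "RAM must be some % of data" point can be tracked as a single number per node. The sketch below treats everything outside the JVM heap as potential OS cache, which is a simplification (other processes and kernel overhead are ignored); the function name and the sizes are hypothetical.

```python
def os_cache_coverage(ram_gb, heap_gb, data_gb):
    """Fraction of on-disk data that could fit in OS cache on one node.

    Simplifying assumption: all RAM not given to the JVM heap is
    available as OS (page) cache.
    """
    os_cache_gb = max(ram_gb - heap_gb, 0.0)
    return os_cache_gb / data_gb

if __name__ == "__main__":
    # Hypothetical node: 32 GB RAM, 8 GB heap, 240 GB of SSTables.
    print(f"{os_cache_coverage(32.0, 8.0, 240.0):.0%} of data fits in OS cache")
```

Watching this ratio as data grows gives an early signal that read latency is about to depend more on disk seeks.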
![Page 10: Real world capacity](https://reader036.vdocuments.mx/reader036/viewer/2022062511/54c3856a4a7959fd7a8b4576/html5/thumbnails/10.jpg)
CPU
• Workload could be more disk than CPU bound
• High load needs a CPU to clean up java garbage
• Other than serving requests, compaction uses resources
![Page 11: Real world capacity](https://reader036.vdocuments.mx/reader036/viewer/2022062511/54c3856a4a7959fd7a8b4576/html5/thumbnails/11.jpg)
Different workloads
Structured log format of C* has deep implications
• Is data written once or does it change over time?
• How high is data churn?
• How random is the read/write pattern?
• What is the write/read percentage?
• What are your latency requirements?
![Page 12: Real world capacity](https://reader036.vdocuments.mx/reader036/viewer/2022062511/54c3856a4a7959fd7a8b4576/html5/thumbnails/12.jpg)
Large Disk / Big Iron key points
• RAID0 mean time to failure drops with bigger stripes
• Java can not address large heaps well
• Compactions/Joins/repairs take a long time
  – Lowers agility when joining a node could take hours
• Maintaining high RAM to Data percentage is costly, e.g. 2 machines with 32GB vs 1 machine with 64GB
• Capacity heavily diminished with loss of one node
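The last point is easy to quantify. Assuming data and load are spread evenly, losing one of N nodes removes 1/N of the cluster's capacity, so a few big machines lose more per failure than many small ones. The function name and node counts below are illustrative only.

```python
def capacity_after_node_loss(node_count):
    """Fraction of cluster capacity left after losing one node.

    Assumes data and request load are spread evenly across nodes.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return (node_count - 1) / node_count

if __name__ == "__main__":
    for nodes in (4, 16):  # hypothetical big-iron vs blade cluster sizes
        print(f"{nodes} nodes: {capacity_after_node_loss(nodes):.1%} capacity left")
```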
![Page 13: Real world capacity](https://reader036.vdocuments.mx/reader036/viewer/2022062511/54c3856a4a7959fd7a8b4576/html5/thumbnails/13.jpg)
Blade server key points
• Management software gives cloud computing vibe
• Cassandra internode traffic on blade back plane
• Usually support 1-2 on board disk SCSI/SSD
• Usually support RAM configurations up to 128G
• Single and dual socket CPU
• No exotic RAID options
![Page 14: Real world capacity](https://reader036.vdocuments.mx/reader036/viewer/2022062511/54c3856a4a7959fd7a8b4576/html5/thumbnails/14.jpg)
Schema lessons
• “You only need one column family”: not always true
• Infrequently read data in the same CF as frequently read data compete for “cache”
• Separating allows employing multiple cache options
• Rows that are written or updated get fragmented
![Page 15: Real world capacity](https://reader036.vdocuments.mx/reader036/viewer/2022062511/54c3856a4a7959fd7a8b4576/html5/thumbnails/15.jpg)
Capacity Planning rule #1
Know your hard drive limits
![Page 16: Real world capacity](https://reader036.vdocuments.mx/reader036/viewer/2022062511/54c3856a4a7959fd7a8b4576/html5/thumbnails/16.jpg)
Capacity Planning rule #2
Writes are fast, until c* flushes and compacts so much that they are not
![Page 17: Real world capacity](https://reader036.vdocuments.mx/reader036/viewer/2022062511/54c3856a4a7959fd7a8b4576/html5/thumbnails/17.jpg)
Capacity Planning rule #3
Row cache is fool's gold
– Faster than a read from disk cache
– Memory use (row key + columns and values)
– Causes memory pressure (data in and out of mem)
– Fails with large rows
– Cold on startup
![Page 18: Real world capacity](https://reader036.vdocuments.mx/reader036/viewer/2022062511/54c3856a4a7959fd7a8b4576/html5/thumbnails/18.jpg)
Capacity Planning rule #4
Do not upgrade tomorrow what you can upgrade today
– Joining nodes is intensive on the cluster
– Do not wait till c* disks are 99% utilized
– You do not get 100% benefit of new nodes until neighbors are cleaned
– Doubling nodes results in fewer move steps
– Adding RAM is fast and takes heat off hard disk
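Why doubling needs fewer moves can be seen from the token math. The sketch below assumes RandomPartitioner with evenly spaced tokens on a ring of size 2^127 (token i = i * 2^127 / N); the function names are made up for illustration. When the node count doubles, every existing token is still a valid balanced position, so no existing node has to move.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner token space (assumed here)

def balanced_tokens(node_count):
    """Evenly spaced tokens for a ring with `node_count` nodes."""
    return [i * RING_SIZE // node_count for i in range(node_count)]

def existing_nodes_to_move(old_count, new_count):
    """Count existing nodes whose token is not a balanced slot in the new ring."""
    new_slots = set(balanced_tokens(new_count))
    return sum(1 for token in balanced_tokens(old_count) if token not in new_slots)

if __name__ == "__main__":
    for new_count in (5, 6, 8):
        moves = existing_nodes_to_move(4, new_count)
        print(f"4 -> {new_count} nodes: {moves} existing node(s) must move")
```

Growing by any other amount leaves some existing tokens off the balanced positions, and each of those needs a move on top of the joins.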
![Page 19: Real world capacity](https://reader036.vdocuments.mx/reader036/viewer/2022062511/54c3856a4a7959fd7a8b4576/html5/thumbnails/19.jpg)
Capacity Planning rule #5
Know your traffic patterns better than you know yourself
![Page 20: Real world capacity](https://reader036.vdocuments.mx/reader036/viewer/2022062511/54c3856a4a7959fd7a8b4576/html5/thumbnails/20.jpg)
The use case:
Dr. Real Time and Mr. Batch
![Page 21: Real world capacity](https://reader036.vdocuments.mx/reader036/viewer/2022062511/54c3856a4a7959fd7a8b4576/html5/thumbnails/21.jpg)
Dr. Real Time
– Real time bidding needs low latency
– Peak traffic during the day
– Need to keep a high cache hit rate
– Avoid compact, repair, cleanup, joins
![Page 22: Real world capacity](https://reader036.vdocuments.mx/reader036/viewer/2022062511/54c3856a4a7959fd7a8b4576/html5/thumbnails/22.jpg)
Dr. Real Time's Lab
– Experiments with Xmx vs VFS caching
– Experiments with cache sizing
– Studying graphs as new releases and features are added
– Monitoring dropped messages, garbage collection
– Dr. Real Time enjoys lots of memory for GB of data on disk
  • Enjoys reading (data), writing as well
  • Nice sized memtables help to not pollute vfs cache
![Page 23: Real world capacity](https://reader036.vdocuments.mx/reader036/viewer/2022062511/54c3856a4a7959fd7a8b4576/html5/thumbnails/23.jpg)
Mr. Batch
– Night falls and users sleep
– Batch/Back loading data (bulk inserts)
– Finding and removing old data (range scanning)
– Maintenance work (nodetool)
![Page 24: Real world capacity](https://reader036.vdocuments.mx/reader036/viewer/2022062511/54c3856a4a7959fd7a8b4576/html5/thumbnails/24.jpg)
Mr. Batch rampaging through the data
– Bulk loading
  • Write at quorum, c* works harder on front end
  • Turning off compaction
    – For short bursts fine, but we are pushing for hours
    – Forget to turn it back on and SSTable count gets bad fast
– Range scanning to locate and remove old data
– Scheduling repairs and compaction
– Mr. Batch enjoys tearing through data
  • Writes, tombstones, range scanning, repairs
  • Enjoys fast disks for compacting
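For the "write at quorum" bulk loads above, the number of replicas that must acknowledge each write depends on the replication factor: quorum is floor(RF / 2) + 1. The sketch below just evaluates that formula; the function names are illustrative. Note that at Replication Factor 2, quorum is 2, so a quorum write cannot tolerate any replica being down.

```python
def quorum(replication_factor):
    """Replicas that must respond for a QUORUM operation: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1

def replicas_allowed_down(replication_factor):
    """Replicas that can be unavailable while QUORUM still succeeds."""
    return replication_factor - quorum(replication_factor)

if __name__ == "__main__":
    for rf in (2, 3, 5):
        print(f"RF={rf}: quorum={quorum(rf)}, can lose {replicas_allowed_down(rf)}")
```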
![Page 25: Real world capacity](https://reader036.vdocuments.mx/reader036/viewer/2022062511/54c3856a4a7959fd7a8b4576/html5/thumbnails/25.jpg)
Questions
– ???