C* Summit EU 2013: Practice Makes Perfect: Extreme Cassandra Optimization
DESCRIPTION
Speaker: Al Tobey, Open Source Mechanic at DataStax
Video: http://www.youtube.com/watch?v=AcPME94F13U&list=PLqcm6qE9lgKLoYaakl3YwIWP4hmGsHm5e&index=24
Ooyala has been using Apache Cassandra since version 0.4. Our data ingest volume has exploded since then, and Cassandra has scaled along with us. Al covers, from an operational perspective, how to manage, tune, and scale Cassandra in a production environment.
TRANSCRIPT
PRACTICE MAKES PERFECT: EXTREME CASSANDRA OPTIMIZATION
@AlTobey Open Source Mechanic
DataStax
#CASSANDRAEU
Outline
⁍ About me
⁍ How not to manage your Cassandra clusters
⁍ Make it better
⁍ How to be a heuristician
⁍ Tools of the trade
⁍ More Settings
⁍ Show & Tell
Previously: @AlTobey / Ooyala
⁍ Tech Lead, Compute and Data Services at Ooyala, Inc.
⁍ C&D team is #devops: 3 ops, 3 eng, me
⁍ C&D team is #bdaas: Big Data as a Service
⁍ ~200 Cassandra nodes, expanding quickly
Ooyala
⁍ Founded in 2007
⁍ 230+ employees globally
⁍ 200M unique users, 110+ countries
⁍ Over 1 billion videos played per month
⁍ Over 2 billion analytic events per day
Ooyala & Cassandra
Ooyala has been using Cassandra since v0.4
Use cases:
⁍ Analytics data (real-time and batch)
⁍ Highly available K/V store
⁍ Time series data
⁍ Play head tracking (cross-device resume)
⁍ Machine Learning Data
Ooyala: Legacy Platform
(Diagram components: API, loggers, and player ("START HERE"); S3; a Hadoop cluster; several Cassandra nodes; ABE Service; one path labeled "read-modify-write".)
Avoiding read-modify-write
cassandra13_drinks column family
memTable

| Name | Tuesday | Wednesday |
|---|---|---|
| Albert | 6 | 0 |
| Evan | 0 | 0 |
| Frank | 3 | 3 |
| Kelvin | 0 | 0 |
| Krzysztof | 0 | 0 |
| Phillip | 12 | 0 |
Avoiding read-modify-write
cassandra13_drinks column family

memTable
| Name | Tuesday | Wednesday |
|---|---|---|
| Albert | 2 | 0 |
| Phillip | 0 | 1 |

ssTable
| Name | Tuesday | Wednesday |
|---|---|---|
| Albert | 6 | 0 |
| Evan | 0 | 0 |
| Frank | 3 | 3 |
| Kelvin | 0 | 0 |
| Krzysztof | 0 | 0 |
| Phillip | 12 | 0 |
Avoiding read-modify-write
cassandra13_drinks column family

memTable
| Name | Tuesday | Wednesday |
|---|---|---|
| Albert | 22 | 0 |

ssTable
| Name | Tuesday | Wednesday |
|---|---|---|
| Albert | 2 | 0 |
| Phillip | 0 | 1 |

ssTable
| Name | Tuesday | Wednesday |
|---|---|---|
| Albert | 6 | 0 |
| Evan | 0 | 0 |
| Frank | 3 | 3 |
| Kelvin | 0 | 0 |
| Krzysztof | 0 | 0 |
| Phillip | 12 | 0 |
Avoiding read-modify-write
cassandra13_drinks column family

ssTable
| Name | Tuesday | Wednesday |
|---|---|---|
| Albert | 22 | 0 |
| Evan | 0 | 0 |
| Frank | 3 | 3 |
| Kelvin | 0 | 0 |
| Krzysztof | 0 | 0 |
| Phillip | 0 | 1 |
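The drinks slides show blind writes: new values land in the memTable, each flush produces another ssTable, and compaction keeps only the newest value for each cell, so nothing has to be read before it is written. A minimal sketch of that last-write-wins merge with awk, assuming each table is a text file of `row column value` lines passed oldest first. File order stands in for the per-cell write timestamps Cassandra actually compares, and `merge_tables` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: last-write-wins merge of table snapshots.
# Arguments: files of "row column value" lines, oldest first.
# Assumption: file order substitutes for Cassandra's per-cell timestamps.
merge_tables() {
  awk '
    {
      cell = $1 SUBSEP $2
      if (!(cell in value)) order[++n] = cell   # remember first-seen order
      value[cell] = $3                          # a later file overwrites an earlier one
    }
    END {
      for (i = 1; i <= n; i++) {
        split(order[i], key, SUBSEP)
        print key[1], key[2], value[order[i]]
      }
    }' "$@"
}
```

Feeding it the original ssTable, the flushed ssTable, and the memTable (in that order) reproduces the compacted result: Albert ends at 22 for Tuesday and Phillip at 0 for Tuesday and 1 for Wednesday.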
2011: 0.6 ➜ 0.8
⁍ Migration is still a largely unsolved problem
⁍ Wrote a tool in Scala to scrub data and write via Thrift
⁍ Rebuilt indexes: faster than copying
(Diagram components: hadoop, cassandra, GlusterFS P2P, Thrift, Scala Map/Reduce.)
Changes: 0.6 ➜ 0.8
⁍ Cassandra 0.8
⁍ 24GiB heap
⁍ Sun Java 1.6 update
⁍ Linux 2.6.36
⁍ XFS on MD RAID5
⁍ Disabled swap or at least vm.swappiness=1
2012: Capacity Increase
⁍ 18 nodes ➜ 36 nodes
⁍ DSE 3.0
⁍ Stale tombstones again!
⁍ No downtime!
(Diagram components: cassandra, GlusterFS P2P, DSE 3.0, Thrift, Scala Map/Reduce.)
System Changes: Apache 1.0 ➜ DSE 3.0
⁍ DSE 3.0 installed via apt packages
⁍ Unchanged: heap, distro
⁍ Ran much faster this time!
⁍ Mistake: Moved to MD RAID 0
  Fix: RAID10 or RAID5, MD, ZFS
⁍ Mistake: Running on Ubuntu Lucid
  Fix: Ubuntu Precise
Config Changes: Apache 1.0 ➜ DSE 3.0
⁍ Schema: compaction_strategy = LCS (LeveledCompactionStrategy)
⁍ Schema: bloom_filter_fp_chance = 0.1
⁍ Schema: sstable_size_in_mb = 256
⁍ Schema: compression_options = Snappy
⁍ YAML: compaction_throughput_mb_per_sec: 0 (0 disables compaction throttling)
2013: Datacenter Move
⁍ 36 nodes ➜ lots more nodes
⁍ As usual, no downtime!
(Diagram: two DSE 3.1 clusters linked by replication.)
Coming Soon for Cassandra at Ooyala
Upcoming use cases:
⁍ Store every event from the players at full resolution
⁍ Cache code for the Spark job server
⁍ AMPLab Tachyon backend?
Next Generation Architecture: Ooyala Event Store
(Diagram components: API, loggers, and player; kafka; ingest; spark; job server; DSE 3.1; Tachyon?)
There’s more to tuning than performance:
⁍ Security
⁍ Cost of Goods Sold
⁍ Operations / support
⁍ Developer happiness
⁍ Physical capacity (cpu/memory/network/disk)
⁍ Reliability / Resilience
⁍ Compromise
I am not a scientist ... heuristician?
⁍ I’d love to be more scientific, but good science takes time
⁍ Sometimes you have to make educated guesses
⁍ It’s not as difficult as it’s made out to be
⁍ Your brain is great at heuristics. Trust it.
⁍ Concentrate on bottlenecks
⁍ Make incremental changes
⁍ Read Malcolm Gladwell’s “Blink”
Testing Shiny Things
⁍ Like kernels
⁍ And Linux distributions
⁍ And ZFS
⁍ And btrfs
⁍ And JVMs & parameters
⁍ Test them in production (if you must)
Testing Shiny Things: In Production
(Diagram: production nodes, most running ext4, with one node each on ZFS, on btrfs, and with a kernel upgrade.)
Brendan Gregg’s Tool Chart
http://joyent.com/blog/linux-performance-analysis-and-tools-brendan-gregg-s-talk-at-scale-11x
dstat -lrvn 10
cl-netstat.pl
https://github.com/tobert/perl-ssh-tools
iostat -x 1
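When scanning many disks with `iostat -x`, it helps to filter for the busy ones. A sketch, assuming sysstat's extended output where a header row starts with `Device` (or `Device:` in older releases) and contains a `%util` column; the column is located by name because its position differs between sysstat versions. It also assumes C-locale decimals, so run iostat with `LC_ALL=C`. `busy_disks` and its threshold argument are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical filter: print devices whose %util exceeds a threshold.
# Reads `iostat -x` output on stdin; assumes C-locale decimals (LC_ALL=C).
busy_disks() {
  local threshold=${1:-80}
  awk -v limit="$threshold" '
    NF == 0 { col = 0; next }                 # a blank line ends a device block
    $1 ~ /^Device:?$/ {                       # header row: locate %util by name
      col = 0
      for (i = 1; i <= NF; i++) if ($i == "%util") col = i
      next
    }
    col > 0 && NF >= col && $col + 0 > limit + 0 { print $1, $col }
  '
}
```

Typical use would be `LC_ALL=C iostat -x 1 | busy_disks 90`, leaving the full `iostat` view for when a device actually shows up.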
htop
jconsole
opscenter
nodetool ring
| Address | DC | Rack | Status | State | Load | Owns | Token |
|---|---|---|---|---|---|---|---|
| 10.10.10.10 | Analytics | rack1 | Up | Normal | 47.73 MB | 1.72% | 101204669472175663702469172037896580098 |
| 10.10.10.10 | Analytics | rack1 | Up | Normal | 63.94 MB | 0.86% | 102671403812352122596707855690619718940 |
| 10.10.10.10 | Analytics | rack1 | Up | Normal | 85.73 MB | 0.86% | 104138138152528581490946539343342857782 |
| 10.10.10.10 | Analytics | rack1 | Up | Normal | 47.87 MB | 0.86% | 105604872492705040385185222996065996624 |
| 10.10.10.10 | Analytics | rack1 | Up | Normal | 39.73 MB | 0.86% | 107071606832881499279423906648789135466 |
| 10.10.10.10 | Analytics | rack1 | Up | Normal | 40.74 MB | 1.75% | 110042394566257506011458285920000334950 |
| 10.10.10.10 | Analytics | rack1 | Up | Normal | 40.08 MB | 2.20% | 113781420866907675791616368030579466301 |
| 10.10.10.10 | Analytics | rack1 | Up | Normal | 56.19 MB | 3.45% | 119650151395618797017962053073524524487 |
| 10.10.10.10 | Analytics | rack1 | Up | Normal | 214.88 MB | 11.62% | 139424886777089715561324792149872061049 |
| 10.10.10.10 | Analytics | rack1 | Up | Normal | 214.29 MB | 2.45% | 143588210871399618110700028431440799305 |
| 10.10.10.10 | Analytics | rack1 | Up | Normal | 158.49 MB | 1.76% | 146577368624928021690175250344904436129 |
| 10.10.10.10 | Analytics | rack1 | Up | Normal | 40.3 MB | 0.92% | 148140168357822348318107048925037023042 |
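The Owns column is each node's share of the token space: the distance from the previous token to its own, divided by the size of the RandomPartitioner range (2^127). The uneven shares above (0.86% up to 11.62%) come straight from uneven token spacing. A sketch that recomputes ownership from a sorted token list with awk, assuming RandomPartitioner and one token per node; awk's double-precision arithmetic is accurate enough for two-decimal percentages. `token_ownership` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: recompute "Owns" from RandomPartitioner tokens.
# stdin: one token per line, sorted ascending (one token per node assumed).
token_ownership() {
  awk '
    { text[NR] = $1 ""; tok[NR] = $1 + 0 }   # keep the original digits for printing
    END {
      space = 2 ^ 127                        # size of the RandomPartitioner token range
      for (i = 1; i <= NR; i++) {
        # the first node owns the range that wraps around from the last token
        prev = (i == 1) ? tok[NR] - space : tok[i - 1]
        printf "%s %.2f%%\n", text[i], 100 * (tok[i] - prev) / space
      }
    }'
}
```

Feeding it the second and third tokens from the table gives 0.86% for the later one, matching `nodetool ring`; with the full token list it shows which nodes need a `nodetool move` to even out the ring.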
nodetool cfstats
Keyspace: gostress
    Read Count: 0
    Read Latency: NaN ms.
    Write Count: 0
    Write Latency: NaN ms.
    Pending Tasks: 0
        Column Family: stressful
        SSTable count: 1
        Space used (live): 32981239
        Space used (total): 32981239
        Number of Keys (estimate): 128
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Bloom Filter False Positives: 0
        Bloom Filter False Ratio: 0.00000
        Bloom Filter Space Used: 336
        Compacted row minimum size: 7007507
        Compacted row maximum size: 8409007
        Compacted row mean size: 8409007
Could be using a lot of heap
Controllable by sstable_size_in_mb
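With many column families, the fields worth watching get lost in `cfstats` output. A sketch that condenses it to one line per column family, assuming the `Label: value` layout shown above (labels differ between Cassandra versions) and that `Compacted row maximum size` follows the other fields in each block, as it does here. `cfstats_summary` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one summary line per column family from `nodetool cfstats`.
# Assumes the "Label: value" layout shown on the slide.
cfstats_summary() {
  awk -F': ' '
    { sub(/^[ \t]+/, "", $1) }                # drop the indentation before each label
    $1 == "Keyspace"                   { ks = $2 }
    $1 == "Column Family"              { cf = $2 }
    $1 == "SSTable count"              { sst = $2 }
    $1 == "Bloom Filter Space Used"    { bloom = $2 }
    $1 == "Compacted row maximum size" {
      # last field of interest in each block: emit the summary line
      printf "%s.%s sstables=%s bloom_bytes=%s max_row_bytes=%s\n", ks, cf, sst, bloom, $2
    }'
}
```

Running `nodetool cfstats | cfstats_summary | sort -t= -k2 -n` is one way to spot column families whose SSTable counts are climbing.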
nodetool proxyhistograms
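| Offset | Read Latency | Write Latency | Range Latency |
|---|---|---|---|
| 35 | 0 | 20 | 0 |
| 42 | 0 | 61 | 0 |
| 50 | 0 | 82 | 0 |
| 60 | 0 | 440 | 0 |
| 72 | 0 | 3416 | 0 |
| 86 | 0 | 17910 | 0 |
| 103 | 0 | 48675 | 0 |
| 124 | 1 | 97423 | 0 |
| 149 | 0 | 153109 | 0 |
| 179 | 2 | 186205 | 0 |
| 215 | 5 | 139022 | 0 |
| 258 | 134 | 44058 | 0 |
| 310 | 2656 | 60660 | 0 |
| 372 | 34698 | 742684 | 0 |
| 446 | 469515 | 7359351 | 0 |
| 535 | 3920391 | 31030588 | 0 |
| 642 | 9852708 | 33070248 | 0 |
| 770 | 4487796 | 9719615 | 0 |
| 924 | 651959 | 984889 | 0 |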
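Each `proxyhistograms` row is a bucket: the number of requests whose latency fell at or below that offset and above the previous one. Percentiles follow from a cumulative sum. A sketch, assuming two-column `offset count` input (for example the Offset and Read Latency columns above) and reporting the bucket's upper bound rather than interpolating; `histogram_percentile` is hypothetical. For the read column above it places the median in the 642 bucket and the 99th percentile in the 924 bucket.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentile from an "offset count" histogram on stdin.
# Prints the offset (upper bound) of the bucket that contains the percentile.
histogram_percentile() {
  awk -v p="${1:?usage: histogram_percentile PERCENT}" '
    { off[NR] = $1; cnt[NR] = $2; total += $2 }
    END {
      if (total == 0) exit 1                  # empty histogram: no answer
      for (i = 1; i <= NR; i++) {
        cum += cnt[i]
        if (cum * 100 >= p * total) { print off[i]; exit }
      }
    }'
}
```

Against a live node, something like `nodetool proxyhistograms | awk 'NF == 4 && $1 ~ /^[0-9]+$/ { print $1, $2 }' | histogram_percentile 99` selects the read column; use `$3` for writes.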
nodetool compactionstats
al@node ~ $ nodetool compactionstats
pending tasks: 3
compaction type  keyspace  column family    bytes compacted  bytes total   progress
Compaction       hastur    gauge_archive    9819749801       16922291634   58.03%
Compaction       hastur    counter_archive  12141850720      16147440484   75.19%
Compaction       hastur    mark_archive     647389841        1475432590    43.88%
Active compaction remaining time : n/a
al@node ~ $ nodetool compactionstats
pending tasks: 3
compaction type  keyspace  column family    bytes compacted  bytes total   progress
Compaction       hastur    gauge_archive    10239806890      16922291634   60.51%
Compaction       hastur    counter_archive  12544404397      16147440484   77.69%
Compaction       hastur    mark_archive     1107897093       1475432590    75.09%
Active compaction remaining time : n/a
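With the remaining time shown as n/a, two snapshots are enough for a rough estimate: divide the bytes still to go by the progress made between samples. The interval between the two runs above is not shown, so the 60 seconds used below is purely illustrative. `compaction_eta` is a hypothetical helper, and it assumes the compaction keeps its current pace.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate seconds left for one compaction from two
# `nodetool compactionstats` samples taken INTERVAL seconds apart.
compaction_eta() {   # compaction_eta DONE1 DONE2 TOTAL INTERVAL
  local done1=$1 done2=$2 total=$3 interval=$4
  local progressed=$(( done2 - done1 ))
  if (( progressed <= 0 )); then
    echo "no progress between samples" >&2
    return 1
  fi
  # remaining bytes / bytes per second, kept in integer arithmetic
  echo $(( (total - done2) * interval / progressed ))
}
```

For gauge_archive, `compaction_eta 9819749801 10239806890 16922291634 60` prints 954, roughly 16 minutes, if the two samples were indeed a minute apart.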
Stress Testing Tools
⁍ cassandra-stress
⁍ YCSB
⁍ Production
⁍ Terasort (DSE)
⁍ Homegrown
/etc/sysctl.conf
kernel.pid_max = 999999
fs.file-max = 1048576
vm.max_map_count = 1048576
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 65536 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
vm.swappiness = 1
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5
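Settings in /etc/sysctl.conf only take effect after `sysctl -p` or a reboot, and hand edits on individual nodes drift over time. A sketch that compares a sysctl.conf-style file against the running values under /proc/sys, mapping keys to paths by replacing dots with slashes. `check_sysctl`, `norm`, and the `SYSCTL_ROOT` override (handy for testing against a scratch directory) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical drift check: compare "key = value" lines with the live kernel.
# SYSCTL_ROOT (assumed override) lets the check run against a scratch tree.
SYSCTL_ROOT=${SYSCTL_ROOT:-/proc/sys}

# Collapse whitespace so "4096 65536 16777216" matches the tab-separated
# form the kernel prints for multi-value keys such as net.ipv4.tcp_rmem.
norm() {
  local IFS=$' \t\n'
  set -- $1          # unquoted on purpose: word-split the value
  echo "$*"
}

check_sysctl() {     # check_sysctl CONF_FILE
  local conf=$1 key value path want have status=0
  while IFS='=' read -r key value || [[ -n $key ]]; do
    key=$(norm "$key")
    [[ -z $key || $key == \#* ]] && continue    # skip blanks and comments
    path=$SYSCTL_ROOT/${key//.//}               # vm.swappiness -> vm/swappiness
    want=$(norm "$value")
    if [[ ! -r $path ]]; then
      echo "MISSING $key"; status=1; continue
    fi
    have=$(norm "$(<"$path")")
    if [[ $have != "$want" ]]; then
      echo "DRIFT $key: want '$want', have '$have'"; status=1
    fi
  done < "$conf"
  return "$status"
}
```

Run as `check_sysctl /etc/sysctl.conf` across the cluster, it prints nothing and exits 0 on nodes that match.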
/etc/rc.local
ra=$((2**14)) # 16 KiB
blockdev --setra $((ra / 512)) /dev/sda # --setra takes 512-byte sectors

echo 256 > /sys/block/sda/queue/nr_requests
#echo cfq > /sys/block/sda/queue/scheduler
#echo deadline > /sys/block/sda/queue/scheduler
#echo noop > /sys/block/sda/queue/scheduler

echo 16384 > /sys/block/md7/md/stripe_cache_size
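The rc.local settings target single devices (/dev/sda, md7). On nodes with several data disks, a small helper keeps them consistent. A sketch, assuming the same 16 KiB readahead and queue depth of 256 for every disk; `tune_disk`, `write_sysfs`, and the `DRY_RUN` switch are hypothetical. `blockdev --setra` is given its value in 512-byte sectors, the unit it uses regardless of the disk's own sector size.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: apply the rc.local block-device tuning to one disk.
# Set DRY_RUN=1 to print the actions instead of performing them.

write_sysfs() {   # write_sysfs PATH VALUE
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "$2 > $1"
  else
    echo "$2" > "$1"
  fi
}

tune_disk() {     # tune_disk DEVICE [READAHEAD_BYTES] [NR_REQUESTS]
  local dev=$1 ra_bytes=${2:-16384} nr=${3:-256}
  local sectors=$(( ra_bytes / 512 ))   # --setra counts 512-byte sectors
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "blockdev --setra $sectors /dev/$dev"
  else
    blockdev --setra "$sectors" "/dev/$dev"
  fi
  write_sysfs "/sys/block/$dev/queue/nr_requests" "$nr"
}
```

For example, `for d in sdc sdd sde sdf sdg sdh; do tune_disk "$d"; done` from rc.local, after a first pass with `DRY_RUN=1` to review what would be written.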
JVM Args
-Xmx8G        leave it alone
-Xms8G        leave it alone
-Xmn1200M     100MiB * nCPU
-Xss180k      should be fine

-XX:+UseNUMA              (test it)
numactl --interleave=all  (safe option)
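The slide's rule of thumb sizes the young generation (-Xmn) at 100 MiB per CPU core, so 1200M corresponds to a 12-core machine. A sketch of that arithmetic; the cap at a quarter of the heap is an added assumption (it keeps the young generation from crowding out the old generation on many-core machines with the 8G heap), and `new_gen_mb` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: young-generation size in MiB for -Xmn.
# Rule from the slide: 100 MiB per core. Assumed extra: cap at heap / 4.
new_gen_mb() {   # new_gen_mb CORES HEAP_MB
  local cores=$1 heap_mb=$2
  local per_core=$(( 100 * cores ))
  local cap=$(( heap_mb / 4 ))
  echo $(( per_core < cap ? per_core : cap ))
}
```

On the node itself, `echo "-Xmn$(new_gen_mb "$(nproc)" 8192)M"` produces the flag to paste into the JVM options.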
cgroups
⁍ Provides fine-grained control over Linux resources
⁍ Makes the Linux scheduler behave
⁍ Lets you manage systems under extreme load
⁍ Useful on all Linux machines
⁍ Can choose between determinism and flexibility
cgroups
# Quoted 'EOF' so $cpucg and $$ expand when /etc/default/cassandra is sourced, not now
cat >> /etc/default/cassandra <<'EOF'
cpucg=/sys/fs/cgroup/cpu/cassandra
mkdir -p $cpucg
cat $cpucg/../cpuset.mems >$cpucg/cpuset.mems
cat $cpucg/../cpuset.cpus >$cpucg/cpuset.cpus
echo 100 > $cpucg/cpu.shares
echo $$ > $cpucg/tasks
EOF
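After a restart, it is worth confirming that the JVM actually landed in the cgroup; processes inherit the cgroup of the shell that started them, which is why writing `$$` from the sourced defaults file works. A sketch that reads /proc/PID/cgroup (lines of the form `hierarchy-id:controllers:path` on cgroup v1) and prints the path for one controller; `cgroup_path` and the `PROC_ROOT` override are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the cgroup path of PID for one v1 controller.
# PROC_ROOT (assumed override, default /proc) allows testing with a fake tree.
cgroup_path() {   # cgroup_path PID CONTROLLER
  local pid=$1 controller=$2
  awk -F: -v want="$controller" '
    {
      # field 2 is a comma-separated controller list, e.g. "cpu,cpuacct"
      n = split($2, names, ",")
      for (i = 1; i <= n; i++) if (names[i] == want) { print $3; found = 1 }
    }
    END { exit !found }' "${PROC_ROOT:-/proc}/$pid/cgroup"
}
```

For instance, `cgroup_path "$(pgrep -f CassandraDaemon | head -n 1)" cpu` should print `/cassandra` once the setup above is in effect; it exits non-zero if the controller is not listed.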
Successful Experiment: btrfs
mkfs.btrfs -m raid10 -d raid0 /dev/sd[c-h]1
mount -o compress=lzo /dev/sdc1 /data
Successful Experiment: ZFS on Linux
zpool create data raidz /dev/sd[c-h]
zfs create data/cassandra
zfs set compression=lzjb data/cassandra
zfs set atime=off data/cassandra
zfs set primarycache=metadata data/cassandra
Conclusions
⁍ Tuning is multi-dimensional
⁍ Production load is your most important benchmark
⁍ Lean on Cassandra, experiment!
⁍ No one metric tells the whole story