(BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second
TRANSCRIPT
![Page 1: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/1.jpg)
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Jim Plush, Sr Director of Engineering, CrowdStrike
Dennis Opacki, Sr Cloud Systems Architect, CrowdStrike
October 2015
BDT323
Amazon EBS and Cassandra: 1 Million Writes Per Second on 60 Nodes
![Page 2: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/2.jpg)
An Introduction to CrowdStrike
We Are a Cybersecurity Technology Company
We Detect, Prevent, and Respond to All Attack Types in Real Time, Protecting Organizations from Catastrophic Breaches
We Provide Next-Generation Endpoint Protection, Threat Intelligence, and Pre- & Post-IR Services
http://www.crowdstrike.com/introduction-to-crowdstrike-falcon-host/
![Page 3: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/3.jpg)
CrowdStrike Scale
• Cloud-based endpoint protection
• Single customer can generate > 2 TB daily
• 500K+ events per second
• Multi-petabytes of managed data
© 2015. All Rights Reserved.
![Page 4: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/4.jpg)
Truisms???
• HTTPS is too slow to run everywhere
• All you need is anti-virus
• Never run Cassandra on Amazon EBS
![Page 5: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/5.jpg)
What is Amazon EBS?
[Diagram: an EC2 instance with two EBS data volumes, mounted at /mnt/foo and /mnt/bar]
Network mounted hard drive
Ability to snapshot data
Data encryption at rest & in flight
![Page 6: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/6.jpg)
Existing Amazon EBS Assumptions
• Jittery I/O, a.k.a. noisy neighbors
• Single point of failure in a region
• Cost is too damn high
• Bad volumes (dd and destroy)
![Page 7: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/7.jpg)
A Recent Project: Initial Requirements
• 1PB of incoming event data from millions of devices
• Modeled as a graph
• 1 million writes per second (burst)
• Age data out after x days
• 95% write 5% read
![Page 8: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/8.jpg)
We Tried
• Cassandra + Titan
• Sharding?
• Neo4J
• PostgreSQL, MySQL, SQLite
• LevelDB/RocksDB
![Page 9: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/9.jpg)
We Have to Make This Work
Cassandra had the properties we needed
Time for a new approach?
http://techblog.netflix.com/2014/07/revisiting-1-million-writes-per-second.html
![Page 10: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/10.jpg)
Number of Machines for 1PB
[Bar chart: number of machines for 1PB (y-axis 0 to 2,250), I2.xlarge vs. c4.2XL EBS]
![Page 11: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/11.jpg)
Yearly Cost for 1PB Cluster
[Bar chart: yearly cost in millions of $ (y-axis 0 to 16) for I2.xlarge on demand, I2.xlarge reserved, c4.2xl on demand, and c4.2xl reserved; annotation: "With Amazon EBS"]
![Page 12: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/12.jpg)
Initial Launch
Date Tiered Compaction
…more details by Jeff Jirsa, CrowdStrike
Cassandra Summit 2015 - DTCS
http://www.slideshare.net/JeffJirsa1/cassandra-summit-2015-real-world-dtcs-for-operators
![Page 13: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/13.jpg)
Initial Launch
• Cassandra 2.0.12 (DSE)
• m3.2xlarge 8 core
• Single 4TB EBS GP2 ~10,000 IOPS
• Default tunings
![Page 14: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/14.jpg)
Performance Was Terrible
• 12 node cluster
• ~60K writes per second RF2
• ~10K writes per 8 core box
• We went to the experts
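The per-box figure lines up with the cluster figure once replication is counted: with RF2, every client write lands on two replicas. A quick check, assuming writes spread evenly across the 12 nodes (an assumption; the slide does not say so):

```python
def replica_writes_per_node(client_writes_per_sec, replication_factor, nodes):
    """Average replica writes each node absorbs, assuming an even spread."""
    return client_writes_per_sec * replication_factor / nodes

# Slide figures: ~60K client writes/sec, RF2, 12 nodes.
initial = replica_writes_per_node(60_000, 2, 12)
print(f"{initial:,.0f} replica writes/sec per node")
```

This reproduces the roughly 10K writes per 8-core box quoted above.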
![Page 15: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/15.jpg)
Cassandra Summit 2014
Family Search asked the
same question:
Where’s the bottleneck?
https://www.youtube.com/watch?v=Qfzg7gcSK-g
![Page 16: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/16.jpg)
IOPS Available
[Bar chart: IOPS available (y-axis 0 to 50,000), I2.xlarge vs. c4.2xlarge]
![Page 17: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/17.jpg)
1.3K IOPS?
![Page 18: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/18.jpg)
IOPS
I see you there,
but I can’t reach you!
![Page 19: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/19.jpg)
![Page 20: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/20.jpg)
The magic gates
opened…
We hit 1 million
writes per second
RF3 on 60 nodes
![Page 21: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/21.jpg)
Testing Setup
![Page 22: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/22.jpg)
Testing Methodology
• Each test run
• clean C* instances
• old test keyspaces dropped
• 13+ TB of data loaded during read testing
• 20 C4.4XL stress writers, each with its own 1-billion-key sequence
![Page 23: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/23.jpg)
Cluster Topology
• Stress nodes: 10 instances in AZ 1A, 10 instances in AZ 1B
• C* nodes: 20 in AZ 1A, 20 in AZ 1B, 20 in AZ 1C
• OpsCenter
![Page 24: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/24.jpg)
Amazon EBS
![Page 25: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/25.jpg)
Cassandra Stress 2.1.x
bin/cassandra-stress user duration=100000m cl=ONE profile=/home/ubuntu/summit_stress.yaml ops\(insert=1\) no-warmup -pop seq=1..1000000000 -mode native cql3 -node 10.10.10.XX -rate threads=1000 -errors ignore
![Page 26: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/26.jpg)
PCSTAT - Al Tobey
http://www.datastax.com/dev/blog/compaction-improvements-in-cassandra-21
https://github.com/tobert/pcstat
![Page 27: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/27.jpg)
Netflix Test - What is C* capable of?
![Page 28: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/28.jpg)
Netflix Test
1+ million writes per second RF:3 3+ million local writes per second
NICE!
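The two numbers on this slide are the same load seen from two sides: each client write at RF3 becomes three local (replica) writes. A small sketch of that arithmetic, using the 60-node cluster size from the earlier slide and assuming an even spread across nodes (an assumption for illustration):

```python
CLIENT_WRITES_PER_SEC = 1_000_000
REPLICATION_FACTOR = 3
NODES = 60

# Every client write is applied on REPLICATION_FACTOR replicas.
local_writes_per_sec = CLIENT_WRITES_PER_SEC * REPLICATION_FACTOR
# Average per-node share if the load is evenly distributed.
local_writes_per_node = local_writes_per_sec // NODES

print(f"{local_writes_per_sec:,} local writes/sec across the cluster")
print(f"{local_writes_per_node:,} local writes/sec per node")
```

Under those assumptions each 16-core node absorbs about 50,000 local writes per second.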
![Page 29: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/29.jpg)
Netflix Test
![Page 30: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/30.jpg)
Netflix Test
No dropped mutations, system healthy at 1.1M after 50 mins
![Page 31: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/31.jpg)
Netflix Test
I/O util is not pegged
Commit disk = steady!
![Page 32: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/32.jpg)
Netflix Test
Low I/O wait
![Page 33: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/33.jpg)
Netflix Test
95th percentile latency = reasonable
![Page 34: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/34.jpg)
Netflix Test - Read Fail
compression={'chunk_length_kb': '64', 'sstable_compression': 'LZ4Compressor'}
https://issues.apache.org/jira/browse/CASSANDRA-10249
https://issues.apache.org/jira/browse/CASSANDRA-8894
Data Drive Pegged
![Page 35: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/35.jpg)
Reading Data
• 24-hour read test
• Over 10 TB of data in the CF
• Sustained > 350K reads per second over 24 hours
• 1M reads/sec peak
• CL ONE
• 12 C4.4XL stress boxes
![Page 36: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/36.jpg)
Reading Data
![Page 37: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/37.jpg)
Reading Data
![Page 38: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/38.jpg)
Reading Data
Not Pegged
![Page 39: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/39.jpg)
Reading Data
7.2 ms 95th percentile latency
![Page 40: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/40.jpg)
VS Netflix Blog Post
24-hour test (sans data transfer cost)
• Netflix cluster/stress
• Cost: ~$6300
• 285 i2.xlarge at $0.85 per hour
• CrowdStrike cluster/stress with Amazon EBS cost
• Cost: ~$2600
• 60 C4.4XL at $0.88 per hour
• C4.4XL vs. i2.xlarge
• 180 fewer cores (45 fewer i2.xlarge instances)
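The core comparison can be checked from the instance counts. The sketch below uses the slide's counts and hourly prices; the 4 vCPUs per i2.xlarge comes from AWS instance specs rather than the slide, and the 16 cores per C4.4XL from the tuning slides. It only prices the Cassandra instances, so its totals come out below the slide's ~$6300 and ~$2600, which presumably also cover stress boxes and, for CrowdStrike, EBS volumes.

```python
HOURS = 24

# Instance counts and on-demand hourly prices as quoted on the slide.
netflix = {"instances": 285, "price_per_hour": 0.85, "vcpus": 4}      # i2.xlarge
crowdstrike = {"instances": 60, "price_per_hour": 0.88, "vcpus": 16}  # c4.4xlarge

def instance_cost(cluster, hours=HOURS):
    """Cost of the Cassandra instances only, for the given number of hours."""
    return cluster["instances"] * cluster["price_per_hour"] * hours

def total_cores(cluster):
    """Total vCPUs across all instances in the cluster."""
    return cluster["instances"] * cluster["vcpus"]

core_gap = total_cores(netflix) - total_cores(crowdstrike)
print(f"Netflix C* instances:     ${instance_cost(netflix):,.0f}")
print(f"CrowdStrike C* instances: ${instance_cost(crowdstrike):,.0f}")
print(f"Core gap: {core_gap} cores = {core_gap // netflix['vcpus']} i2.xlarge")
```

The core gap works out to 1,140 minus 960, which is the 180 cores (45 i2.xlarge instances) quoted on the slide.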
![Page 41: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/41.jpg)
Read Notes with Amazon EBS
• Our test was a single 10K IOPS volume
• More/bigger reads?
• PIOPS gives you as much throughput as you need
• RAID0 multiple Amazon EBS volumes
[Diagram: an EC2 instance with two EBS data volumes, mounted at /mnt/foo and /mnt/bar]
![Page 42: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/42.jpg)
What Unlocked Performance
![Page 43: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/43.jpg)
Major Tweaks
• Ubuntu HVM types
• Enhanced networking
• Now faster than PV
• Ubuntu distro tuned for cloud workloads
• XFS Filesystem
![Page 44: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/44.jpg)
Major Tweaks
• Cassandra 2.1
• Java 8
• G1 Garbage Collector
https://issues.apache.org/jira/browse/CASSANDRA-7486
![Page 45: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/45.jpg)
Major Tweaks
• C4.4XL 16 core, EBS Optimized
• 4TB, 10,000 IOPS EBS GP2 Encrypted Data Drive
• 160MB/s throughput
• 1TB 3000 IOPS EBS GP2 Encrypted Commit Log Drive
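The IOPS figures for both drives follow from how GP2 was sized at the time: a baseline of 3 IOPS per GiB, with a floor of 100 and a per-volume cap of 10,000. A minimal sketch, assuming "4TB" and "1TB" mean 4,000 and 1,000 GiB (the slide does not give exact sizes):

```python
GP2_IOPS_PER_GIB = 3     # gp2 baseline rate
GP2_MIN_IOPS = 100       # floor for very small volumes
GP2_MAX_IOPS = 10_000    # per-volume cap at the time of the talk

def gp2_baseline_iops(size_gib):
    """Baseline IOPS for a gp2 volume of the given size, clamped to the limits."""
    return max(GP2_MIN_IOPS, min(GP2_MAX_IOPS, size_gib * GP2_IOPS_PER_GIB))

for label, size_gib in (("data drive", 4_000), ("commit log drive", 1_000)):
    print(f"{label}: {size_gib} GiB -> {gp2_baseline_iops(size_gib):,} IOPS")
```

The data drive hits the 10,000 cap, so sizing it beyond roughly 3.3 TiB buys capacity rather than IOPS, while the commit log drive gets its 3,000 IOPS purely from its size.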
![Page 46: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/46.jpg)
Major Tweaks
cassandra-env.sh
• MAX_HEAP_SIZE=8G
• JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
• Lots of other minor tweaks in crowdstrike-tools
![Page 47: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/47.jpg)
cassandra-env.sh
Put PID in batch mode
Mask CPU0 from the process to reduce context switching
Magic From Al Tobey
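The slide text does not include the actual commands, so the following is only a hedged illustration of the two ideas (batch scheduling and keeping the process off CPU 0), written with Python's standard `os` scheduling calls on Linux rather than whatever cassandra-env.sh actually runs. The function names are hypothetical. Note that these calls act on a single thread ID; a JVM has many threads, so a real setup would have to apply them to every entry under /proc/<pid>/task.

```python
import os

def cpus_without_cpu0(available):
    """Return the CPU set with CPU 0 removed, unless that would leave nothing."""
    remaining = set(available) - {0}
    return remaining or set(available)

def tune_process(pid):
    """Apply batch scheduling and mask CPU 0 for one thread ID (Linux only)."""
    # SCHED_BATCH marks the task as CPU-bound and non-interactive;
    # its static priority must be 0.
    os.sched_setscheduler(pid, os.SCHED_BATCH, os.sched_param(0))
    # Restrict the task to every currently allowed CPU except CPU 0.
    os.sched_setaffinity(pid, cpus_without_cpu0(os.sched_getaffinity(pid)))

print(sorted(cpus_without_cpu0(range(16))))
```

Calling `tune_process` on another user's process requires the appropriate privileges.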
![Page 48: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/48.jpg)
YAML Settings
cassandra.yaml (based on 16 core)
• concurrent_reads: 32
• concurrent_writes: 64
• memtable_flush_writers: 8
• trickle_fsync: true
• trickle_fsync_interval_in_kb: 1000
• native_transport_max_threads: 256
• concurrent_compactors: 4
![Page 49: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/49.jpg)
cassandra.yaml
We found a good portion of the CPU load was being used for internode compression, which reduced write throughput
internode_compression: none
![Page 50: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/50.jpg)
Lessons Learned
• Amazon EBS was never the bottleneck during testing, GP2 is legit
• Built-in types like list and map come at a performance penalty
• 30% hit on our writes using Map type
• DTCS is very young (see Jeff Jirsa’s talk)
• 2.1 Stress Tool is tricky but great for modeling workloads
• How will compression affect your read path?
![Page 51: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/51.jpg)
Test Your Own!
https://github.com/CrowdStrike/cassandra-tools
![Page 52: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/52.jpg)
It’s Just Python
launch 20 nodes in us-east-1
• python launch.py launch --nodes=20 --config=c4-ebs-hvm --az=us-east-1a
bootstrap the new nodes with C*, RAID/format disks, etc.
• fab -u ubuntu bootstrapcass21:config=c4-highperf
run arbitrary commands
• fab -u ubuntu cmd:config=c4-highperf,cmd="sudo rm -rf /mnt/cassandra/data/summit_stress"
![Page 53: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/53.jpg)
Run Custom Stress Profiles… Multi-Node Support
export NODENUM=1
[email protected]:~$ python runstress.py --profile=stress10 --seednode=10.10.10.XX --threads=50
Going to run: /home/ubuntu/apache-cassandra-2.1.5/tools/bin/cassandra-stress user duration=100000m cl=ONE profile=/home/ubuntu/summit_stress.yaml ops\(insert=1,simple=9\) no-warmup -pop seq=1..1000000000 -mode native cql3 -node 10.10.10.XX -rate threads=50 -errors ignore
export NODENUM=2
[email protected]:~$ python runstress.py --profile=stress10 --seednode=10.10.10.XX --threads=50
Going to run: /home/ubuntu/apache-cassandra-2.1.5/tools/bin/cassandra-stress user duration=100000m cl=ONE profile=/home/ubuntu/summit_stress.yaml ops\(insert=1,simple=9\) no-warmup -pop seq=1000000001..2000000000 -mode native cql3 -node 10.10.10.XX -rate threads=50 -errors ignore
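The two runs differ only in their -pop seq range, which appears to be keyed off NODENUM so that each stress node writes its own block of one billion keys. A minimal sketch of that mapping; `pop_seq_range` is a hypothetical helper for illustration, not necessarily how runstress.py computes it:

```python
def pop_seq_range(node_num, keys_per_node=1_000_000_000):
    """Return the -pop seq=START..END argument for a 1-based stress node number."""
    if node_num < 1:
        raise ValueError("node_num is 1-based")
    start = (node_num - 1) * keys_per_node + 1
    end = node_num * keys_per_node
    return f"seq={start}..{end}"

for node_num in (1, 2):
    print(f"NODENUM={node_num}: -pop {pop_seq_range(node_num)}")
```

Non-overlapping ranges keep the stress writers from repeatedly overwriting the same partitions, so the cluster sees the full key space.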
![Page 54: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/54.jpg)
Where Are We Today?
• ~3 months on our Amazon EBS–based cluster
• Hundreds of TBs of graph data and growing in C*
• Billions of vertices/edges
• Changing perceptions?
• DataStax - Planning an Amazon EC2 cluster
![Page 55: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/55.jpg)
Resources
Al Tobey’s Tuning Guide for Cassandra 2.1
https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html
![Page 56: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/56.jpg)
Special Thanks To
• Leif Jackson
• Marcus King
• Alan Hannan
• Jeff Jirsa
• Al Tobey
• Nick Panahi
• J.B. Langston
• Marcus Eriksson
• Iian Finlayson
• Dani Traphagen
![Page 57: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/57.jpg)
Amazon EBS Heading Into 2016
![Page 58: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/58.jpg)
4TB (10k IOPS) GP2
I/O hit? Not enough to faze C*
![Page 59: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/59.jpg)
So why the hate for Amazon EBS?
![Page 60: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/60.jpg)
Following the Crowd – Trust Issues
• Used instance-store image and ephemeral drives
• Painful to stop/start instances, resize
• Couldn’t avoid scheduled maintenance (i.e., Reboot-a-palooza)
• Encryption required shenanigans
![Page 61: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/61.jpg)
Guess What
• We still had failures
• Now we get to rebuild from scratch
![Page 62: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/62.jpg)
Amazon EBS’s Troubled Childhood
What do you mean my volume is “stuck”?
• April 2011 – Netflix, Reddit, and Quora
• October 2012 – Reddit, Imgur, Heroku
• August 2013 – Vine, Airbnb
![Page 63: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/63.jpg)
Kiss of Death
http://techblog.netflix.com/2011/04/lessons-netflix-learned-from-aws-outage.html
Spread services across multiple regions
Test failure scenarios regularly (Chaos Monkey)
Make Cassandra databases more resilient by avoiding Amazon EBS
![Page 64: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/64.jpg)
Redemption
Amazon moves quickly and quietly:
• March 2011 – New Amazon EBS GM
• July 2012 – Provisioned IOPS
• May 2014 – Native encryption
• Jun 2014 – GP2 (game changer)
• Mar 2015 – 16TB / 10K GP2 / 20K PIOPS
![Page 65: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/65.jpg)
Redemption
• Prioritized Amazon EBS availability and consistency beyond features and functionality
• Compartmentalized the control plane – removed cross-AZ dependencies for running volumes
• Simplified workflows to favor sustained operation
• Tested and simulated via TLA+/PlusCal - better understood corner cases
• Dedicated a large fraction of engineering resources to reliability and performance
![Page 66: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/66.jpg)
Reliability
Amazon EBS team targets 99.999% availability
exceeding expectations
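To put five nines in concrete terms, the availability target can be converted into allowed downtime per year. A small sketch, assuming a 365-day year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years

def allowed_downtime_minutes(availability_pct):
    """Minutes per year a service may be down at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

downtime = allowed_downtime_minutes(99.999)
print(f"99.999% availability allows about {downtime:.2f} minutes of downtime per year")
```

That is a budget of a little over five minutes per year.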
![Page 67: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/67.jpg)
CrowdStrike Today
• In past 12 months, zero Amazon EBS–related failures
• Thousands of GP2 data volumes (~2PB data)
• Transitioning all systems to Amazon EBS root drives
• Moved all data stores to Amazon EBS (C*, Kafka, Elasticsearch, Postgres, etc.)
![Page 68: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/68.jpg)
![Page 69: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/69.jpg)
Staying Safe - Architecture
• Select a region with >2 AZs (e.g., us-east-1 or us-west-2)
• Use Amazon EBS GP2 or PIOPS storage
• Separate volumes for data and commit logs
![Page 70: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/70.jpg)
Staying Safe - Ops
• Use Amazon EBS volume monitoring
• Pre-warm Amazon EBS volumes?
• Schedule snapshots for consistent backups
![Page 71: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/71.jpg)
Most Importantly
• Challenge assumptions
• Stay current on AWS blog
• Talk with your peers
http://aws.amazon.com/ebs/nosql/
![Page 72: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/72.jpg)
![Page 73: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/73.jpg)
Remember to complete your evaluations!
BDT323
![Page 74: (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second](https://reader036.vdocuments.mx/reader036/viewer/2022081515/58a0a58b1a28ab9f758b6e0d/html5/thumbnails/74.jpg)
Thank you!
@jimplush @opacki