1 Million Writes Per Second on 60 Nodes with Cassandra and EBS


1 Million Writes Per Second w/ 60 Nodes: EBS and C*

Jim Plush - Sr Director of Engineering, CrowdStrike

Dennis Opacki - Sr Cloud Systems Architect

© 2015. All Rights Reserved.

An Introduction to CrowdStrike

We Are a Cybersecurity Technology Company

We Detect, Prevent And Respond To All Attack Types In Real Time, Protecting Organizations From Catastrophic Breaches

We Provide Next-Generation Endpoint Protection, Threat Intelligence & Pre & Post IR Services

NEXT-GEN ENDPOINT

INCIDENT RESPONSE

THREAT INTEL

http://www.crowdstrike.com/introduction-to-crowdstrike-falcon-host/

CrowdStrike Scale

•  Cloud based endpoint protection

•  Single customer can generate > 2TB daily

•  500K+ Events Per Second

•  Multiple petabytes of managed data


Truisms???

•  HTTPS is too slow to run everywhere

•  All you need is anti-virus

•  Never run Cassandra on EBS


What is EBS?

[Diagram: an EC2 instance with two EBS data volumes mounted at /mnt/foo and /mnt/bar]

•  Network-mounted hard drive
•  Ability to snapshot data
•  Data encryption at rest & in flight

Existing EBS Assumptions

•  Jittery I/O aka: Noisy neighbors

•  Single Point of Failure in a Region

•  Cost is too damn high

•  Bad volumes (dd and destroy)

A recent project: initial requirements

•  1PB of incoming event data from millions of devices

•  Modeled as a graph

•  1 million writes per second (burst)

•  Age data out after x days

•  95% write 5% read


We Tried

•  Cassandra + Titan

•  Sharding?

•  Neo4J

•  PostgreSQL, MySQL, SQLite

•  LevelDB/RocksDB


We have to make this work

•  Cassandra had the properties we needed
•  Time for a new approach?

http://techblog.netflix.com/2014/07/revisiting-1-million-writes-per-second.html

Number of Machines for 1PB

[Chart: number of machines needed for a 1PB cluster, 0 to 2,250 scale: i2.xlarge vs. c4.2xl with EBS]

Yearly Cost for 1PB Cluster

[Chart: yearly cost for a 1PB cluster in millions of $, 0 to 16 scale: i2.xlarge on-demand, i2.xlarge reserved, c4.2xl on-demand, and c4.2xl reserved, the c4.2xl options with EBS]

Initial Launch

Date Tiered Compaction


…more details by Jeff Jirsa, CrowdStrike

Cassandra Summit 2015 - DTCS

Initial Launch

•  Cassandra 2.0.12 (DSE)

•  m3.2xlarge 8 core

•  Single 4TB EBS GP2 ~10,000 IOPS

•  Default tunings


Performance was terrible

•  12 node cluster

•  ~60K writes per second RF2

•  ~10K writes per 8 core box

•  We went to the experts


At Cassandra Summit 2014, FamilySearch asked the same question: Where's the bottleneck?

https://www.youtube.com/watch?v=Qfzg7gcSK-g

IOPS Available

[Chart: IOPS available, 0 to 50,000 scale: i2.xlarge vs. c4.2xlarge]

1.3K IOPS?


IOPS I see you there,

but I can’t reach you!


The magic gates opened…

We hit 1 million writes per second RF3 on 60 nodes


Testing Setup!

Testing Methodology

•  Each test run: clean C* instances, old test keyspaces dropped
•  13+ TBs of data loaded during read testing
•  20 C4.4XL stress writers, each with its own 1-billion-key sequence range


Cluster Topology

[Diagram: cluster topology. 10 stress instances in AZ 1A and 10 in AZ 1B drive 20 C* nodes in each of AZs 1A, 1B, and 1C, all backed by EBS and monitored by OpsCenter]

Cassandra Stress 2.1.x

bin/cassandra-stress user duration=100000m cl=ONE profile=/home/ubuntu/summit_stress.yaml ops\(insert=1\) no-warmup -pop seq=1..1000000000 -mode native cql3 -node 10.10.10.XX -rate threads=1000 -errors ignore

PCSTAT - Al Tobey

http://www.datastax.com/dev/blog/compaction-improvements-in-cassandra-21

https://github.com/tobert/pcstat
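pcstat reports which pages of a file are resident in the Linux page cache, which is useful for seeing whether reads are served from memory or hit the disk. A minimal usage sketch against the stress keyspace path used later in this deck (the exact invocation is an assumption, not from the slides):

# show page-cache residency for a table's SSTable data files (path is illustrative)
sudo pcstat /mnt/cassandra/data/summit_stress/*/*-Data.db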


Netflix Test - What is C* capable of?

Netflix Test


1+ Million Writes Per Second at RF 3; 3+ Million Local Writes Per Second

NICE!

Netflix Test


Netflix Test


No Dropped Mutations, system healthy at 1.1M after 50 mins

Netflix Test


I/O util is not pegged; commit disk = steady!

Netflix Test


Low IO Wait

Netflix Test


95th Latency = Reasonable

Netflix Test - Read Fail


compression={'chunk_length_kb': '64', 'sstable_compression': 'LZ4Compressor'}

https://issues.apache.org/jira/browse/CASSANDRA-10249 https://issues.apache.org/jira/browse/CASSANDRA-8894

Data Drive Pegged :(

Reading Data

•  24-hour read test
•  Over 10 TBs of data in the CF
•  Sustained > 350K reads per second over 24 hours
•  1M reads per second peak
•  CL ONE
•  12 C4.4XL stress boxes

Reading Data


Reading Data


Reading Data


Not Pegged :)

Reading Data


7.2ms 95th latency

Netflix Test resource usage

•  180 fewer cores (45 fewer i2.xlarge instances)
•  24-hour test (sans data transfer cost)
  –  Netflix cluster/stress cost: ~$6,300 (285 i2.xlarge at $0.85 per hour)
  –  CrowdStrike cluster/stress with EBS cost: ~$2,600 (60 C4.4XL at $0.88 per hour)

Read Notes with EBS

•  Our test was a single 10K IOPS volume
•  More/bigger reads?
  –  PIOPS gives you as much throughput as you need
  –  RAID0 multiple EBS volumes (see the sketch below)

[Diagram: a single /mnt/data filesystem striped across EBS Vol1 and EBS Vol2]
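A minimal sketch of the RAID0 option: striping two attached EBS volumes into a single /mnt/data filesystem with mdadm (device names are placeholders, not from the deck):

# assumes two GP2 volumes are attached as /dev/xvdb and /dev/xvdc
sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/xvdb /dev/xvdc
sudo mkfs.xfs /dev/md0                    # XFS, as used elsewhere in this setup
sudo mkdir -p /mnt/data
sudo mount -o noatime /dev/md0 /mnt/data  # Cassandra data directory lives here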

What Unlocked Performance!

Major Tweaks

•  Ubuntu HVM instance types: Enhanced Networking, now faster than PVM
•  Ubuntu distro tuned for cloud workloads
•  XFS filesystem

Major Tweaks

•  Cassandra 2.1
•  Java 8
•  G1 garbage collector (cassandra-env)

https://issues.apache.org/jira/browse/CASSANDRA-7486

Major Tweaks

•  C4.4XL: 16 cores, EBS Optimized
•  4TB, 10,000 IOPS EBS GP2 encrypted data drive (provisioning sketch below)
  –  160 MB/s throughput
•  1TB, 3,000 IOPS EBS GP2 encrypted commit log drive
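A hedged sketch of provisioning those two volumes with the AWS CLI. GP2 IOPS scale with volume size (roughly 3 IOPS per GB, capped at 10,000 at the time), so the sizes below imply the stated IOPS; the availability zone, volume ID, instance ID, and device name are placeholders:

# 4TB encrypted GP2 data volume (~10,000 IOPS at this size)
aws ec2 create-volume --size 4096 --volume-type gp2 --encrypted --availability-zone us-east-1a
# 1TB encrypted GP2 commit log volume (~3,000 IOPS)
aws ec2 create-volume --size 1024 --volume-type gp2 --encrypted --availability-zone us-east-1a
# attach a volume to the C* node
aws ec2 attach-volume --volume-id vol-xxxxxxxx --instance-id i-xxxxxxxx --device /dev/xvdf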

Major Tweaks

•  cassandra-env.sh
  •  MAX_HEAP_SIZE=8G
  •  JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
  •  Lots of other minor tweaks

cassandra-env.sh


Put PID in batch mode

Mask CPU0 from the process to reduce context switching

Magic From Al Tobey
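A hedged sketch of what those two annotations can look like as shell commands against a running Cassandra PID (the deck showed them as cassandra-env.sh additions; the exact lines from that screenshot are not reproduced here):

CASS_PID=$(pgrep -f CassandraDaemon)
# put the JVM into SCHED_BATCH so the scheduler treats it as a CPU-bound, non-interactive workload
sudo chrt -b -p 0 "$CASS_PID"
# keep Cassandra on cores 1-15 of the 16-core C4.4XL, masking CPU0 to reduce context switching
sudo taskset -cp 1-15 "$CASS_PID"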

YAML Settings

•  cassandra.yaml (based on 16 cores)
  •  concurrent_reads: 32
  •  concurrent_writes: 64
  •  memtable_flush_writers: 8
  •  trickle_fsync: true
  •  trickle_fsync_interval_in_kb: 1000
  •  native_transport_max_threads: 256
  •  concurrent_compactors: 4

cassandra.yaml

We found that a good portion of the CPU load was being spent on internode compression, which reduced write throughput:

internode_compression: none

Lessons Learned

•  EBS was never the bottleneck during testing; GP2 is legit
•  If you're doing batching, write to the same rowkey in the batch (see the sketch below)
•  Built-in types like list and map come at a performance penalty
  •  30% hit on our writes using the map type
•  DTCS is very young (see Jeff Jirsa's talk)
•  The 2.1 stress tool is tricky but great for modeling workloads
•  How will compression affect your read path?
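For the batching point above, a hedged illustration via cqlsh (the keyspace, table, and columns are hypothetical): an unlogged batch whose statements all share one partition key is applied as a single mutation on one replica set, instead of fanning out from the coordinator to many partitions.

cqlsh 10.10.10.XX <<'CQL'
BEGIN UNLOGGED BATCH
  INSERT INTO graph.events (device_id, ts, payload) VALUES ('dev-42', 1, 'a');
  INSERT INTO graph.events (device_id, ts, payload) VALUES ('dev-42', 2, 'b');
APPLY BATCH;
CQL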


Test your own!

https://github.com/CrowdStrike/cassandra-tools

It’s just python

•  Launch 20 nodes in us-east-1:
   python launch.py launch --nodes=20 --config=c4-ebs-hvm --az=us-east-1a
•  Bootstrap the new nodes with C*, RAID/format disks, etc…:
   fab -u ubuntu bootstrapcass21:config=c4-highperf
•  Run arbitrary commands:
   fab -u ubuntu cmd:config=c4-highperf,cmd="sudo rm -rf /mnt/cassandra/data/summit_stress"


Run custom stress profiles… multi-node support

Node 1 (export NODENUM=1):

ubuntu@ip-10-10-10.XX:~$ python runstress.py --profile=stress10 --seednode=10.10.10.XX --threads=50
Going to run: /home/ubuntu/apache-cassandra-2.1.5/tools/bin/cassandra-stress user duration=100000m cl=ONE profile=/home/ubuntu/summit_stress.yaml ops\(insert=1,simple=9\) no-warmup -pop seq=1..1000000000 -mode native cql3 -node 10.10.10.XX -rate threads=50 -errors ignore

Node 2 (export NODENUM=2):

ubuntu@ip-10-10-10.XX:~$ python runstress.py --profile=stress10 --seednode=10.10.10.XX --threads=50
Going to run: /home/ubuntu/apache-cassandra-2.1.5/tools/bin/cassandra-stress user duration=100000m cl=ONE profile=/home/ubuntu/summit_stress.yaml ops\(insert=1,simple=9\) no-warmup -pop seq=1000000001..2000000000 -mode native cql3 -node 10.10.10.XX -rate threads=50 -errors ignore

Where are we today?

•  ~3 months on our EBS-based cluster
•  Hundreds of TBs of graph data in C*, and growing
•  Billions of vertices/edges
•  Changing perceptions?

Special thanks to

•  Leif Jackson
•  Marcus King
•  Alan Hannan
•  Jeff Jirsa
•  Al Tobey
•  Nick Panahi
•  J.B. Langston
•  Marcus Eriksson
•  Iian Finlayson
•  Dani Traphagen

EBS heading into 2016

4TB (10k IOPS) GP2

IO hit? Not enough to faze C*


So why the hate for EBS?

Following the Crowd – Trust Issues

•  Used instance-store images and ephemeral drives
•  Painful to stop/start instances, resize
•  Couldn't avoid scheduled maintenance (i.e. Reboot-a-palooza)
•  Encryption required shenanigans

Guess What?

•  We still had failures
•  Now we get to rebuild from scratch

EBS's Troubled Childhood

What do you mean my volume is "stuck"?
•  April 2011 – Netflix, Reddit and Quora
•  October 2012 – Reddit, Imgur, Heroku
•  August 2013 – Vine, AirBNB

Kiss of Death

http://techblog.netflix.com/2011/04/lessons-netflix-learned-from-aws-outage.html
•  Spread services across multiple regions
•  Test failure scenarios regularly (Chaos Monkey)
•  Make Cassandra databases more resilient by avoiding EBS

Redemption

Amazon moves quickly and quietly:
•  March 2011 – New EBS GM
•  July 2012 – Provisioned IOPS
•  May 2014 – Native encryption
•  Jun 2014 – GP2 (game changer)
•  Mar 2015 – 16TB / 10K GP2 / 20K PIOPS

Redemption

•  Prioritized EBS availability and consistency beyond features and functionality
•  Compartmentalized the control plane - broke cross-AZ dependencies for running volumes
•  Simplified workflows to favor sustained operation
•  Tested and simulated via TLA+/PlusCal - better understood corner cases
•  Dedicated a large fraction of engineering resources to reliability and performance

Reliability

The EBS team targets 99.999% availability, exceeding expectations

CrowdStrike Today

In the past 12 months, zero EBS-related failures
•  Thousands of GP2 data volumes (~2PB of data)
•  Transitioning all systems to EBS root drives
•  Moved all data stores to EBS (C*, Kafka, Elasticsearch, Postgres, etc.)

Staying Safe - Architecture

•  Select a region with >2 AZs (e.g. us-east-1 or us-west-2)
•  Use EBS GP2 or PIOPS storage
•  Separate volumes for data and commit logs

Staying Safe - Ops

•  Use EBS volume monitoring
•  Pre-warm EBS volumes?
•  Schedule snapshots for consistent backups (see the sketch below)
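A hedged sketch of those two ops tasks from the shell (the volume ID and device name are placeholders; for volumes restored from a snapshot in that era, pre-warming meant reading every block once):

# point-in-time EBS snapshot of the data volume, e.g. from cron after a nodetool flush/snapshot
aws ec2 create-snapshot --volume-id vol-xxxxxxxx --description "cassandra data volume backup"
# pre-warm a volume restored from snapshot by reading every block once
sudo dd if=/dev/xvdf of=/dev/null bs=1M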

Most Importantly

•  Challenge assumptions
•  Stay current on the AWS blog
•  Talk with your peers

Thank you @jimplush

@opacki
