When every millisecond counts

Cassandra meetup Amsterdam

July 2016

Matija Gobec, matija.gobec@smartcat.io

@mad_max0204


Why this talk

We were challenged with an interesting requirement...


What makes a distributed system?

A bunch of stuff that magically works together


How to start?

Investigate the current setup (if any)

Understand your use case

Understand your data

Set a base configuration

Define the goal


Investigate the current setup

● What type of deployment are you working with?
● What is the available hardware?
    ○ CPU cores and threads
    ○ Memory amount and type
    ○ Storage size and type
    ○ Network interfaces amount and type
    ○ Limitations
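
A minimal sketch of how this inventory could be recorded on a Linux node before any tuning starts. It assumes the usual `/proc/meminfo` layout; the function names are illustrative and not from the talk.

```python
import os


def parse_meminfo_kb(text, field="MemTotal"):
    """Return the value of a /proc/meminfo field in kB, or None if absent."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == field:
            return int(rest.split()[0])
    return None


def hardware_summary(meminfo_text):
    """Collect the basics worth writing down before changing any setting."""
    total_kb = parse_meminfo_kb(meminfo_text)
    return {
        "logical_cpus": os.cpu_count(),
        "mem_total_gib": None if total_kb is None else round(total_kb / 1024 ** 2, 1),
    }
```

On a node, `hardware_summary(open("/proc/meminfo").read())` gives the CPU and memory part; storage and network details still have to be collected separately.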


Hardware configuration

8-16 cores
32GB RAM

Commit log SSD
Data drive SSD

1GbE

Placement groups
Availability zones

Enhanced networking


OS - Swap, storage, cpu

Swap is bad

● remove swap from fstab
● disable swap: swapoff -a

Optimize block layer

echo 1 > /sys/block/XXX/queue/nomerges
echo 8 > /sys/block/XXX/queue/read_ahead_kb
echo deadline > /sys/block/XXX/queue/scheduler

Disable cpu scaling

for sysfs_cpu in /sys/devices/system/cpu/cpu[0-9]*
do
    echo performance > "$sysfs_cpu/cpufreq/scaling_governor"
done
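
Settings written through `/sys` do not survive a reboot, so it helps to verify them after every restart. The sketch below only parses the text of the relevant files; it assumes the kernel marks the active I/O scheduler with square brackets and that `/proc/swaps` starts with a header line. The helper names are made up for this example.

```python
def active_scheduler(scheduler_text):
    """Return the scheduler shown in brackets, e.g. 'noop [deadline] cfq'."""
    for token in scheduler_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def swap_devices(proc_swaps_text):
    """List active swap devices from /proc/swaps content (header line skipped)."""
    lines = proc_swaps_text.strip().splitlines()[1:]
    return [line.split()[0] for line in lines if line.strip()]
```

An empty list from `swap_devices` means `swapoff -a` took effect; anything else `active_scheduler` returns besides the value you wrote means the block layer setting was lost.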


sysctl.d - network

# min, default and max TCP receive buffer size, in bytes
net.ipv4.tcp_rmem = 4096 87380 16777216
# min, default and max TCP send buffer size, in bytes
net.ipv4.tcp_wmem = 4096 65536 16777216
# disable explicit congestion notification
net.ipv4.tcp_ecn = 0
# enable window scaling (higher throughput)
net.ipv4.tcp_window_scaling = 1
# allowed local port range
net.ipv4.ip_local_port_range = 10000 65535
# enable fast TIME-WAIT recycling (breaks clients behind NAT; removed in Linux 4.12)
net.ipv4.tcp_tw_recycle = 1

# max socket receive buffer in bytes
net.core.rmem_max = 16777216
# max socket send buffer in bytes
net.core.wmem_max = 16777216
# max length of a listening socket's accept queue
net.core.somaxconn = 4096
# max packets queued on input when a NIC receives faster than the kernel can process
net.core.netdev_max_backlog = 16384
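
To judge whether a 16 MiB buffer ceiling is enough, the usual yardstick is the bandwidth-delay product: the number of bytes that must be in flight to keep a link busy. The sketch below is plain arithmetic under the assumption of a single connection and ignores kernel bookkeeping overhead; the function names are illustrative.

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_ms):
    """Bandwidth-delay product: bytes in flight needed to keep the link full."""
    return bandwidth_bits_per_s * rtt_ms / 1000 / 8


def max_rtt_ms(buffer_bytes, bandwidth_bits_per_s):
    """Largest round-trip time a buffer of this size covers at full link speed."""
    return buffer_bytes * 8 * 1000 / bandwidth_bits_per_s
```

On a 1GbE link with a 1 ms round trip the product is 125,000 bytes, so the 16777216-byte maximum only becomes the limit on much longer paths (about 134 ms at 1 Gbit/s).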


sysctl.d - vm and fs

# tendency to swap (0-100); 1 keeps swapping to a minimum
vm.swappiness = 1
# max memory map areas a process can have
vm.max_map_count = 1073741824
# dirty memory in bytes at which background writeback starts
vm.dirty_background_bytes = 10485760
# dirty memory in bytes at which a writing process has to do writeback itself
vm.dirty_bytes = 1073741824
# system-wide max number of open file handles
fs.file-max = 1073741824
# min free memory the kernel keeps in reserve, in kilobytes
vm.min_free_kbytes = 1048576
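
Because a base configuration is only useful if every node actually runs it, a small drift check can compare a sysctl file against the live values. This is a sketch under stated assumptions: it treats `#` and `;` as full-line comment markers, maps keys to `/proc/sys` by replacing dots with slashes, and takes the reader as a callable so it can be exercised without root. None of these names come from the talk.

```python
def parse_sysctl_conf(text):
    """Parse 'key = value' lines; '#' and ';' start full-line comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = " ".join(value.split())
    return settings


def proc_path(key):
    """Map a sysctl key to its /proc/sys file (dots become slashes)."""
    return "/proc/sys/" + key.replace(".", "/")


def drift(expected, read_value):
    """Return {key: (expected, actual)} for settings that differ.

    read_value is any callable key -> str, so the check can be tested
    without touching the real /proc/sys.
    """
    out = {}
    for key, want in expected.items():
        got = " ".join(read_value(key).split())
        if got != want:
            out[key] = (want, got)
    return out
```

On a real node, `read_value` could be `lambda k: open(proc_path(k)).read()`.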


JVM - G1GC

JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25"

JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=16" # Set to number of full cores
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=16" # Set to number of full cores


JVM - HotSpot

MAX_HEAP_SIZE="8G" # Good starting point
HEAP_NEWSIZE="2G" # Good starting point

JVM_OPTS="$JVM_OPTS -XX:+PerfDisableSharedMem"
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"

# Tunable settings
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=2"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=16"
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=4096"

# Instagram settings
JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=60000"
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=30000"
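
For comparison with the hand-picked heap sizes, here is a sketch of the automatic sizing rule that stock `cassandra-env.sh` applied in releases of that era, as far as I recall it: heap is max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB)), and the young generation is the smaller of 100 MB per core and a quarter of the heap. Treat the constants as assumptions and check them against the script shipped with your version.

```python
def default_heap_sizes_mb(system_memory_mb, cpu_cores):
    """Approximate the automatic heap sizing of cassandra-env.sh (values in MB)."""
    half = min(system_memory_mb // 2, 1024)
    quarter = min(system_memory_mb // 4, 8192)
    max_heap = max(half, quarter)
    # Young generation: 100 MB per core, capped at a quarter of the heap.
    new_size = min(100 * cpu_cores, max_heap // 4)
    return max_heap, new_size
```

For a 32 GB, 16-core node this gives an 8192 MB heap and a 1600 MB young generation, so the 8G starting point matches the rule while the 2G new size is the full quarter of the heap.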


Cassandra yaml

concurrent_reads: 128
concurrent_writes: 128
concurrent_counter_writes: 128
memtable_allocation_type: heap_buffers
memtable_flush_writers: 8
memtable_cleanup_threshold: 0.15
memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 2048

trickle_fsync: true
trickle_fsync_interval_in_kb: 1024

internode_compression: dc
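
The memtable settings interact: when `memtable_cleanup_threshold` is left unset, Cassandra derives it as 1 / (memtable_flush_writers + 1), and the threshold is the fraction of permitted memtable space that triggers a flush of the largest memtable. The sketch below shows that arithmetic. Applying the threshold to a single 2048 MB pool is my assumption for the `heap_buffers` case, not something the slides state.

```python
def default_cleanup_threshold(memtable_flush_writers):
    """Default used when memtable_cleanup_threshold is not set explicitly."""
    return 1.0 / (memtable_flush_writers + 1)


def flush_trigger_mb(pool_space_mb, cleanup_threshold):
    """Approximate occupied memtable space at which the largest memtable is flushed."""
    return pool_space_mb * cleanup_threshold
```

With 8 flush writers the default would be about 0.11, so 0.15 flushes a little later and in larger chunks: roughly 307 MB of a 2048 MB pool.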


Data model

Data model impacts performance a lot
Optimize so that you read from one partition
Make sure your data can be distributed
SSTable compression depending on the use case
Compaction strategy
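
One common way to satisfy both "read from one partition" and "data can be distributed" is a composite partition key with a time bucket. The sketch below is purely hypothetical: the sensor use case, the key layout and the 24-hour bucket are illustration choices, not the data model from the talk.

```python
from datetime import datetime, timezone


def partition_key(sensor_id, ts, bucket_hours=24):
    """Hypothetical composite key: one partition per sensor per time bucket."""
    bucket = int(ts.timestamp()) // (bucket_hours * 3600)
    return (sensor_id, bucket)


def buckets_for_range(start, end, bucket_hours=24):
    """Bucket numbers a time-range query has to touch (inclusive of both ends)."""
    size = bucket_hours * 3600
    first = int(start.timestamp()) // size
    last = int(end.timestamp()) // size
    return list(range(first, last + 1))
```

A query for one sensor within a single day then reads exactly one partition, while different sensors and different days spread across the cluster.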


Ok, what now?

After we set the base configuration it’s time for testing and observing


Test setup

Make sure you have repeatable tests
Fixed rate tests
Variable rate tests
Production like tests
Cassandra Stress
Various loadgen tools (gatling, wrk, loader,...)
Coordinated omission
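
Coordinated omission is easiest to see in a toy simulation. The sketch below models a single-threaded client that is supposed to send one request per interval; when one request stalls, the following ones are sent late. Measuring from the actual send time hides that delay, while measuring from the intended send time exposes it. The function and its inputs are illustrative and not tied to any of the tools listed.

```python
def simulate_fixed_rate(service_times_ms, interval_ms):
    """Single-threaded client that should send one request every interval_ms.

    Returns (naive, corrected) latency lists in milliseconds:
    naive     = time from actual send to completion (hides queueing delay)
    corrected = time from the intended send time to completion
    """
    naive, corrected = [], []
    clock = 0.0
    for i, service in enumerate(service_times_ms):
        intended = i * interval_ms
        start = max(clock, intended)  # a stalled client sends late
        clock = start + service
        naive.append(service)
        corrected.append(clock - intended)
    return naive, corrected
```

With service times of 1, 100, 1 and 1 ms at a 10 ms interval, the naive view reports a single slow request, while the corrected latencies are 1, 100, 91 and 82 ms: one stall degraded three requests.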


Tuning methodology


Metrics and reporting stack

OS metrics (SmartCat)
Metrics reporter config (AddThis)
Cassandra diagnostics (SmartCat)
Filebeat
Riemann
InfluxDB
Grafana
Elasticsearch
Logstash
Kibana


Grafana


Kibana


Slow queries

Track query execution times above some threshold
Gain insights into the long processing queries
Relate that to what’s going on on the node
Compare app and cluster slow queries

https://github.com/smartcat-labs/cassandra-diagnostics
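
On the application side, threshold-based tracking can be as simple as timing each call and keeping only the slow ones. The sketch below is an illustration of that idea and is not the cassandra-diagnostics API; `SlowQueryLog` and its parameters are hypothetical, and the clock is injectable so the behaviour can be checked deterministically.

```python
import time


class SlowQueryLog:
    """Record calls that take longer than threshold_ms (illustrative only)."""

    def __init__(self, threshold_ms, clock=time.perf_counter):
        self.threshold_ms = threshold_ms
        self.clock = clock  # returns seconds; injectable for tests
        self.entries = []

    def timed(self, statement, execute):
        """Run execute() and log the statement if it exceeded the threshold."""
        start = self.clock()
        try:
            return execute()
        finally:
            elapsed_ms = (self.clock() - start) * 1000.0
            if elapsed_ms > self.threshold_ms:
                self.entries.append((statement, elapsed_ms))
```

Comparing these client-side entries with what the cluster reports for the same time window shows whether the time was spent in Cassandra or on the way to it.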


Slow queries - cluster


Slow queries - cluster vs app


Ops center

Pros:
Great when starting out
Everything you need in a nice GUI
Cluster metrics

Cons:
Metrics stored in the same cluster
Issues with some of the services (repair, slow query,...)
Additional agents on the nodes


AWS


AWS deployment

Choose your instance based on calculations
Cost limits come second
Use placement groups and availability zones
Don’t overdo it just because you can ($$$)
Go for EBS volumes (gp2)
You don’t need ephemeral storage (mostly)


EBS volumes

Pros:
3.4TB+ volume has 10,000 IOPS
Average latency is ~0.38ms
Durable across reboots
AWS snapshots
Can be attached/detached
Easy to recreate

Cons:
Rare latency spikes
Average latency is ~0.38ms
Degrading factor
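
The IOPS figure follows from how gp2 volumes are provisioned: baseline performance scales at 3 IOPS per GiB with a floor of 100 IOPS and a per-volume cap. The sketch below uses the 10,000 IOPS cap quoted on the slide as a default parameter; AWS has raised that cap since, so treat it as an assumption to adjust.

```python
def gp2_baseline_iops(size_gib, cap=10_000):
    """Baseline IOPS of a gp2 volume: 3 IOPS per GiB, floor 100, capped."""
    return max(100, min(3 * size_gib, cap))


def size_for_cap_gib(cap=10_000):
    """Smallest whole-GiB volume whose baseline reaches the cap."""
    return -(-cap // 3)  # ceiling division
```

Under these numbers a volume has to be 3,334 GiB or larger to reach the cap, which is why sizing the volume for IOPS rather than for capacity can make sense.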


EBS volume problems


End result

Did we meet our goal?
Can we go any further?
Torture testing
Failure scenarios
Latency and delay inducers
Automate everything
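
Whether the goal was met is easiest to answer when the goal is written as latency limits per percentile and checked automatically after each test run. A minimal sketch, assuming nearest-rank percentiles and a goal format (`{percentile: limit_ms}`) invented for this example:

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile (p in 0..100) of a non-empty sample list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def meets_goal(latencies_ms, goals_ms):
    """goals_ms maps percentile -> max allowed latency, e.g. {99: 20.0}."""
    return all(percentile(latencies_ms, p) <= limit for p, limit in goals_ms.items())
```

Running the same check after torture tests and failure scenarios keeps "did we meet our goal?" a yes/no answer instead of a judgment call on a dashboard.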


Q&A


Matija Gobec, matija.gobec@smartcat.io

@mad_max0204

Thank you