![Page 1: Standing Up Your First Cluster](https://reader034.vdocuments.mx/reader034/viewer/2022042907/586f762d1a28ab10258b633f/html5/thumbnails/1.jpg)
©2013 DataStax Confidential. Do not distribute without consent.
Jon Haddad, Technical Evangelist @rustyrazorblade
Standing up your first cluster
![Page 2: Standing Up Your First Cluster](https://reader034.vdocuments.mx/reader034/viewer/2022042907/586f762d1a28ab10258b633f/html5/thumbnails/2.jpg)
First Step: Preparation
![Page 3: Standing Up Your First Cluster](https://reader034.vdocuments.mx/reader034/viewer/2022042907/586f762d1a28ab10258b633f/html5/thumbnails/3.jpg)
Server Monitoring & Alerts
• Monit
  • monitor processes
  • monitor disk usage
  • send alerts
• Munin / collectd
  • system perf statistics
• Nagios / Icinga
• Various 3rd party services
• OpsCenter - DSE Only
• Use whatever works for you
![Page 4: Standing Up Your First Cluster](https://reader034.vdocuments.mx/reader034/viewer/2022042907/586f762d1a28ab10258b633f/html5/thumbnails/4.jpg)
Application Metrics
• StatsD / Graphite
• Grafana
• Gather constant metrics from your application
• Measure anything & everything
• Microtimers, counters
• Graph events
  • user signup
  • error rates
• Cassandra Metrics Integration
  • jmxtrans
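The counters and timers mentioned above are commonly sent to StatsD as small plain-text UDP datagrams. The following is a minimal sketch, not a full client: it assumes a StatsD daemon on the conventional local port 8125, and the metric names (`user.signup`, `db.query`) and helper names are hypothetical.

```python
import socket
import time
from contextlib import contextmanager

# Assumption: a StatsD daemon listens locally on the conventional UDP port.
STATSD_ADDR = ("127.0.0.1", 8125)


def format_counter(name, value=1):
    """Build a StatsD counter datagram, e.g. 'user.signup:1|c'."""
    return f"{name}:{value}|c"


def format_timer(name, millis):
    """Build a StatsD timer datagram, e.g. 'db.query:12|ms'."""
    return f"{name}:{int(millis)}|ms"


def send(payload, addr=STATSD_ADDR):
    """Fire-and-forget UDP send; a lost metric must never break the app."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload.encode("ascii"), addr)
    except OSError:
        pass  # dropping a metric is acceptable
    finally:
        sock.close()


@contextmanager
def timed(name, sender=send):
    """Time the enclosed block and report it as a StatsD timer."""
    start = time.monotonic()
    try:
        yield
    finally:
        sender(format_timer(name, (time.monotonic() - start) * 1000))
```

A signup handler would call `send(format_counter("user.signup"))`, and a query could be wrapped in `with timed("db.query"):` to graph its latency.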
![Page 5: Standing Up Your First Cluster](https://reader034.vdocuments.mx/reader034/viewer/2022042907/586f762d1a28ab10258b633f/html5/thumbnails/5.jpg)
Log Aggregation
• Hosted - Splunk, Loggly
• OSS - Logstash + Kibana, Graylog
• Many more…
• For best results all logs should be aggregated here
• Oh yeah, and log your errors.
![Page 6: Standing Up Your First Cluster](https://reader034.vdocuments.mx/reader034/viewer/2022042907/586f762d1a28ab10258b633f/html5/thumbnails/6.jpg)
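Get the Server Times Right
• Everything is written with a timestamp
• Last write wins
• Usually supplied by coordinator
• Can also be supplied by client
• What if your timestamps are wrong because your clocks are off?
• Always install ntpd!
[Diagram: two servers with skewed clocks. An INSERT at real time 12 is handled by the server whose clock reads 20; a DELETE at real time 15 is handled by the server whose clock reads 10. Stored timestamps: insert:20, delete:10, so the insert wins and the later delete is lost.]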
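The clock-skew scenario on this slide can be replayed in a few lines. This is a toy model of last-write-wins only, not Cassandra's actual reconciliation code: the `Cell` type and function names are hypothetical, `None` stands in for a tombstone, and ties simply keep the existing cell.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    value: Optional[str]  # None stands in for a tombstone (delete marker)
    timestamp: int


def apply_write(current, incoming):
    """Last write wins: keep whichever cell carries the higher timestamp."""
    if current is None or incoming.timestamp > current.timestamp:
        return incoming
    return current  # simplification: on a tie the existing cell is kept


def replay(writes):
    """Apply (value, coordinator_clock) writes in real-time order."""
    cell = None
    for value, coordinator_clock in writes:
        cell = apply_write(cell, Cell(value, coordinator_clock))
    return cell


# Skewed clocks, as on the slide: the INSERT happens first (real time 12) but
# its coordinator's clock reads 20; the DELETE happens later (real time 15)
# but its coordinator's clock reads 10.
skewed = replay([("row data", 20), (None, 10)])

# Synchronized clocks: coordinators stamp the writes with the real times.
synced = replay([("row data", 12), (None, 15)])

print(skewed.value)  # the insert survives; the later delete is lost
print(synced.value)  # None: the delete wins, as intended
```

With synchronized clocks the delete carries the higher timestamp and wins, which is the behavior ntpd is meant to protect.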
![Page 7: Standing Up Your First Cluster](https://reader034.vdocuments.mx/reader034/viewer/2022042907/586f762d1a28ab10258b633f/html5/thumbnails/7.jpg)
Snitches
• Snitch lets us distribute data in a fault tolerant way
• Changing this with a large cluster is time consuming
• Dynamic Snitching
  • use the fastest replica for reads
• DC aware
  • GossipingPropertyFileSnitch (recommended)
![Page 8: Standing Up Your First Cluster](https://reader034.vdocuments.mx/reader034/viewer/2022042907/586f762d1a28ab10258b633f/html5/thumbnails/8.jpg)
Replication Strategy
• Don't use SimpleStrategy for prod, you will regret it
• NetworkTopologyStrategy = win
  • Lets you pick replica count per DC
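To make the per-DC replica count concrete, here is a small sketch that renders a `CREATE KEYSPACE` statement using NetworkTopologyStrategy. The helper name, the keyspace name, and the data center names (`dc_east`, `dc_west`) are hypothetical; real DC names have to match what the snitch reports.

```python
def create_keyspace_cql(keyspace, replicas_per_dc):
    """Return a CREATE KEYSPACE statement using NetworkTopologyStrategy.

    replicas_per_dc maps a data center name to its replica count.
    """
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")
    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc, count in sorted(replicas_per_dc.items()):
        if count < 1:
            raise ValueError(f"replica count for {dc} must be >= 1")
        options.append(f"'{dc}': {count}")
    # Join outside the template to keep the literal CQL braces readable.
    joined = ", ".join(options)
    return "CREATE KEYSPACE IF NOT EXISTS %s WITH replication = {%s};" % (
        keyspace,
        joined,
    )


# Hypothetical example: three replicas in one DC, two in another.
print(create_keyspace_cql("app", {"dc_east": 3, "dc_west": 2}))
```

The resulting statement can be pasted into cqlsh or passed to a driver session.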
![Page 9: Standing Up Your First Cluster](https://reader034.vdocuments.mx/reader034/viewer/2022042907/586f762d1a28ab10258b633f/html5/thumbnails/9.jpg)
Don't use Shared Storage
• Single point of failure
• High latency
• Expensive
• Performance is about latency
• Can increase throughput with more disks
• In general avoid SAN, NAS
![Page 10: Standing Up Your First Cluster](https://reader034.vdocuments.mx/reader034/viewer/2022042907/586f762d1a28ab10258b633f/html5/thumbnails/10.jpg)
Know your compaction!
• Compaction merges SSTables
• Size Tiered
  • Just merge stuff, get big
• Leveled
  • SSD
  • Read heavy (more I/O)
• DateTiered / TimeWindow
  • DT included with OSS
  • TW written by Jeff Jirsa
  • Both written for time series data
  • TW easier to configure
  • DT more options
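The "just merge stuff, get big" behavior of size-tiered compaction can be illustrated with a toy simulation. This is a simplified sketch, not Cassandra's implementation: it groups SSTables whose size falls within 0.5x to 1.5x of a bucket's average, merges a bucket once it holds four tables, and assumes no overwrites or deletes so a merged table is the sum of its inputs. All function names are hypothetical.

```python
def bucket_by_size(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes so each bucket holds tables of similar size."""
    buckets = []  # each bucket is a list of sizes
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if avg * bucket_low <= size <= avg * bucket_high:
                bucket.append(size)
                break
        else:
            buckets.append([size])  # no similar bucket found
    return buckets


def compact_once(sizes, min_threshold=4):
    """Merge the first bucket with at least min_threshold tables, if any."""
    for bucket in bucket_by_size(sizes):
        if len(bucket) >= min_threshold:
            remaining = list(sizes)
            for size in bucket:
                remaining.remove(size)
            # Assumes no overwrites or deletes, so merged size is the sum.
            return remaining + [sum(bucket)]
    return list(sizes)


def flush_and_compact(flushes, flush_size=10):
    """Simulate repeated memtable flushes, compacting after each one."""
    sstables = []
    for _ in range(flushes):
        sstables.append(flush_size)
        while True:
            after = compact_once(sstables)
            if after == sstables:
                break  # nothing left to merge
            sstables = after
    return sorted(sstables)


print(flush_and_compact(7))   # [10, 10, 10, 40]
print(flush_and_compact(16))  # [160]
```

Sixteen small flushes end up as one large SSTable, which shows why size-tiered tables keep growing and need free disk headroom for the merge.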
![Page 11: Standing Up Your First Cluster](https://reader034.vdocuments.mx/reader034/viewer/2022042907/586f762d1a28ab10258b633f/html5/thumbnails/11.jpg)
Utilities
![Page 12: Standing Up Your First Cluster](https://reader034.vdocuments.mx/reader034/viewer/2022042907/586f762d1a28ab10258b633f/html5/thumbnails/12.jpg)
nodetool tpstats
• What's blocked?
• MemtableFlushWriter? - Slow disks!
  • also leads to GC issues
• Dropped mutations?
  • need repair!
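The checks on this slide can be automated by scanning `nodetool tpstats` text for blocked pools and dropped messages. The sketch below is hedged heavily: the sample text is hand-written for illustration (the numbers are invented), the column layout differs between Cassandra versions, and the function names are hypothetical.

```python
# Hand-written sample in the general shape of tpstats output; the numbers
# are invented and real output contains many more pools and message types.
SAMPLE_TPSTATS = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0         912345         0                 0
ReadStage                         0         0         456789         0                 0
MemtableFlushWriter               1         3           1200         1                57

Message type           Dropped
READ                         0
MUTATION                   214
"""


def parse_tpstats(text):
    """Return (pools, dropped) dictionaries parsed from tpstats text."""
    pools, dropped = {}, {}
    section = None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if line.startswith("Pool Name"):
            section = "pools"
        elif line.startswith("Message type"):
            section = "dropped"
        elif section == "pools" and len(tokens) >= 6:
            try:
                numbers = [int(t) for t in tokens[-5:]]
            except ValueError:
                continue  # skip rows with non-numeric cells
            pools[tokens[0]] = {
                "active": numbers[0],
                "pending": numbers[1],
                "completed": numbers[2],
                "blocked": numbers[3],
                "all_time_blocked": numbers[4],
            }
        elif section == "dropped" and len(tokens) >= 2:
            try:
                dropped[tokens[0]] = int(tokens[1])
            except ValueError:
                continue
    return pools, dropped


def find_problems(pools, dropped):
    """List pools that have blocked tasks and message types with drops."""
    found = []
    for name, stats in pools.items():
        if stats["blocked"] or stats["all_time_blocked"]:
            found.append(f"{name}: blocked tasks (slow disks if a flush pool)")
    for message, count in dropped.items():
        if count:
            found.append(f"{message}: {count} dropped (consider repair)")
    return found


pools, dropped = parse_tpstats(SAMPLE_TPSTATS)
for problem in find_problems(pools, dropped):
    print(problem)
```

In practice the text would come from running `nodetool tpstats` on each node rather than from a constant.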
![Page 13: Standing Up Your First Cluster](https://reader034.vdocuments.mx/reader034/viewer/2022042907/586f762d1a28ab10258b633f/html5/thumbnails/13.jpg)
Proxy Histograms
• nodetool proxyhistograms
• High level read and write times
• Includes network latency
![Page 14: Standing Up Your First Cluster](https://reader034.vdocuments.mx/reader034/viewer/2022042907/586f762d1a28ab10258b633f/html5/thumbnails/14.jpg)
Table Histograms
• nodetool tablehistograms <keyspace> <table>
• Reports stats for a single table on a single node
• Used to identify tables with performance problems
![Page 15: Standing Up Your First Cluster](https://reader034.vdocuments.mx/reader034/viewer/2022042907/586f762d1a28ab10258b633f/html5/thumbnails/15.jpg)
Query Tracing
![Page 16: Standing Up Your First Cluster](https://reader034.vdocuments.mx/reader034/viewer/2022042907/586f762d1a28ab10258b633f/html5/thumbnails/16.jpg)
JVM Garbage Collection
![Page 17: Standing Up Your First Cluster](https://reader034.vdocuments.mx/reader034/viewer/2022042907/586f762d1a28ab10258b633f/html5/thumbnails/17.jpg)
JVM GC Overview
• What is garbage collection?
  • Manual vs automatic memory management
• Generational garbage collection (ParNew & CMS)
  • New Generation
  • Old Generation
![Page 18: Standing Up Your First Cluster](https://reader034.vdocuments.mx/reader034/viewer/2022042907/586f762d1a28ab10258b633f/html5/thumbnails/18.jpg)
New Generation
• New objects are created in the new gen (Eden)
• Composed of Eden & 2 survivor spaces (SurvivorRatio)
• Size set by HEAP_NEWSIZE in cassandra-env.sh
• Historically limited to 800MB
![Page 19: Standing Up Your First Cluster](https://reader034.vdocuments.mx/reader034/viewer/2022042907/586f762d1a28ab10258b633f/html5/thumbnails/19.jpg)
Minor GC
• Occurs when Eden fills up
• Stop the world
• Dead objects are removed
• Copy current survivor to empty survivor
• Live objects are promoted into survivor (S0 & S1) then old gen
• Some survivor objects promoted to old gen (MaxTenuringThreshold)
• Spillover promoted to old gen
• Removing objects is fast, promoting objects is slow
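The steps above can be walked through with a toy model. This is an illustrative sketch only, not how HotSpot is implemented: objects are plain ids, survivor capacity is counted in objects rather than bytes, and the function and parameter names are hypothetical stand-ins for MaxTenuringThreshold and the survivor space size.

```python
def minor_gc(eden, survivor, old_gen, is_live,
             max_tenuring_threshold=1, survivor_capacity=2):
    """Toy minor GC; returns (new_eden, new_survivor, new_old_gen).

    eden and old_gen are lists of object ids; survivor maps id -> age.
    """
    new_survivor = {}
    new_old = list(old_gen)
    # Dead objects are simply not copied; that is why removal is cheap.
    candidates = [(obj, 0) for obj in eden if is_live(obj)]
    candidates += [(obj, age) for obj, age in survivor.items() if is_live(obj)]
    for obj, age in candidates:
        age += 1
        if age > max_tenuring_threshold:
            new_old.append(obj)        # old enough: tenured into old gen
        elif len(new_survivor) < survivor_capacity:
            new_survivor[obj] = age    # copied into the empty survivor space
        else:
            new_old.append(obj)        # spillover: survivor space is full
    return [], new_survivor, new_old   # Eden is empty after the collection


# First collection: "d" is dead, and "c" spills over into old gen.
live = {"a", "b", "c"}
eden, survivor, old = minor_gc(["a", "b", "c", "d"], {}, [], live.__contains__)
print(survivor, old)  # {'a': 1, 'b': 1} ['c']

# Second collection: "a" exceeds the tenuring threshold, "b" has died.
live = {"a", "e"}
eden, survivor, old = minor_gc(["e"], survivor, old, live.__contains__)
print(survivor, old)  # {'e': 1} ['c', 'a']
```

Note how "c" reaches old gen after a single collection purely because the survivor space was full, which is the early promotion that later slides warn about.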
![Page 20: Standing Up Your First Cluster](https://reader034.vdocuments.mx/reader034/viewer/2022042907/586f762d1a28ab10258b633f/html5/thumbnails/20.jpg)
Old Generation
• Objects are promoted to old gen from new gen
• Major GC
  • Mostly concurrent
  • 2 short stop the world pauses
![Page 21: Standing Up Your First Cluster](https://reader034.vdocuments.mx/reader034/viewer/2022042907/586f762d1a28ab10258b633f/html5/thumbnails/21.jpg)
Full GC
• Occurs when old gen fills up or objects can't be promoted
• Stop the world
• Collects all generations
• Defragments old gen
• These are bad!
• Massive pauses
![Page 22: Standing Up Your First Cluster](https://reader034.vdocuments.mx/reader034/viewer/2022042907/586f762d1a28ab10258b633f/html5/thumbnails/22.jpg)
Workload 1: Write Heavy
• Objects promoted: Memtables
• New gen too big
• Remember: promoting objects is slow!
• Huge new gen = potentially a lot of promotion
[Diagram: new gen → old gen, labeled "too much promotion"]
![Page 23: Standing Up Your First Cluster](https://reader034.vdocuments.mx/reader034/viewer/2022042907/586f762d1a28ab10258b633f/html5/thumbnails/23.jpg)
Workload 2: Read Heavy
• Short lived objects being promoted into old gen
• Lots of minor GCs
• Read heavy workloads on SSD
• Results in frequent full GC
[Diagram: new gen → old gen (full of short lived objects), labeled "early promotion" and "fills up quickly"]
![Page 24: Standing Up Your First Cluster](https://reader034.vdocuments.mx/reader034/viewer/2022042907/586f762d1a28ab10258b633f/html5/thumbnails/24.jpg)
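G1GC
• Improvement over ParNew+CMS
  • ParNew+CMS is hard to tune
  • CASSANDRA-8150
• G1 has more predictable pauses
• Better latency
• Many new gen and old gen regions
• G1 is adaptive to usage
[Diagram: G1 heap split into many small regions, each marked Eden (E), Survivor (S), or Old (O), contrasted with the contiguous Eden / S0 / S1 / Old Gen layout]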
![Page 25: Standing Up Your First Cluster](https://reader034.vdocuments.mx/reader034/viewer/2022042907/586f762d1a28ab10258b633f/html5/thumbnails/25.jpg)
GC Profiling
• OpsCenter GC stats
  • Look for correlations between GC spikes and read/write latency
• Cassandra GC Logging
  • Can be activated in cassandra-env.sh
• jstat
  • prints GC activity
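Looking for a correlation between GC spikes and latency can be done by eye on a graph, or numerically. As a rough sketch, the Pearson coefficient below compares two per-minute series; the sample numbers are invented for illustration, and in practice they would come from GC logs and latency metrics.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Invented per-minute samples: total GC pause (ms) and p99 read latency (ms).
gc_pause_ms = [20, 25, 400, 30, 22, 380]
read_p99_ms = [5, 6, 95, 7, 5, 90]

r = pearson(gc_pause_ms, read_p99_ms)
if r > 0.8:
    print("latency spikes track GC pauses; look at heap sizing first")
```

A coefficient near 1 suggests latency spikes line up with GC pauses; a value near 0 points the investigation elsewhere, such as disks or the data model.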
![Page 26: Standing Up Your First Cluster](https://reader034.vdocuments.mx/reader034/viewer/2022042907/586f762d1a28ab10258b633f/html5/thumbnails/26.jpg)
How much does it matter?
![Page 27: Standing Up Your First Cluster](https://reader034.vdocuments.mx/reader034/viewer/2022042907/586f762d1a28ab10258b633f/html5/thumbnails/27.jpg)