cassandra metrics

Monitor Everything By: Chris Lohfink Cassandra Summit 2014

Upload: chris-lohfink

Post on 14-Dec-2014

DESCRIPTION

Presentation on metrics for the 2014 Cassandra Summit

TRANSCRIPT

Page 1: Cassandra Metrics

Monitor Everything

By: Chris Lohfink

Cassandra Summit 2014

Page 2: Cassandra Metrics

About Me

#CassandraSummit 2014

● Sr. Engineer at Pythian
    o Lead of Cassandra Practice
● Remote in Minnesota
● DataStax MVP for Cassandra ‘14
● Interests
    o Java, Clojure, Python dev
    o Data science
    o Hobbyist electronics

Page 3: Cassandra Metrics

About Pythian

Pythian is a global data outsourcing and consulting company that specializes in optimizing and managing mission-critical data systems.

Pythian blends the world’s leading data experts with advanced, secure service delivery processes to create the industry’s best standard of care for its clients.

Since its inception, Pythian has managed some of the world’s largest, most business-critical data infrastructures.

Pythian currently manages more than 10,000 systems.

Pythian currently employs more than 350 people in 25 countries worldwide.

Pythian was founded in 1997.

Page 4: Cassandra Metrics

About Cassandra

● No Single Point of Failure

● Fault Tolerant

● Awesome properties for an operations team who does not want to get up at 3am

Page 5: Cassandra Metrics

About Cassandra

● Nothing should be set up and forgotten about
● Easy to do with Cassandra though
    o Fault tolerance on a properly configured setup handles a single node being down or having temporary performance issues
    o No back pressure on writes until there is a lot of trouble

Page 6: Cassandra Metrics

Utilize the fault tolerance buffer

● Need to observe and react to current issues
● Predict future issues
● Divide this into two approaches
    o Proactive
    o Reactive

Page 7: Cassandra Metrics

Proactive

● Daily & weekly checkups to prevent and predict problems
    o Capacity
    o Performance bottlenecks
    o Data modeling issues

Page 8: Cassandra Metrics

Reactive

● Something about best laid plans…
    o Hardware failures
    o Bugs
    o Malicious or non-malicious users
● Alarms, PagerDuty

Page 9: Cassandra Metrics

Common element

● Data is needed
    o form alerts
    o find anomalies
    o trending
    o debugging

Page 10: Cassandra Metrics

Metrics

● Window to the application
    o Bridge the gap - Coda Hale

Page 11: Cassandra Metrics

Gathering Metrics

Sources in a Cassandra environment:

● OpsCenter
● JMX
● Nodetool
● Logs
● CPU, Disk, Network
● JVM, GC

Page 12: Cassandra Metrics

Metrics

but of course…

Without context, the data is just pretty graphs

Page 13: Cassandra Metrics

JMX

● Java Management Extensions
● Complex… very engineered
● Resources represented as objects with attributes and operations
● Used for monitoring or as input

Page 14: Cassandra Metrics

JMX

● The annoying gateway to metrics
    ○ Poor tooling - requires java
    ○ Slow, memory leaks
    ○ Historically and currently frustrating for ops (pre 2.0.8)

[Diagram: JMX connection between Client (You) and Cassandra. The client initiates a connection to port 7199; Cassandra replies with a hostname:port for the RMI connection (port labels in the figure: 1024-65535 and 7199); the client gets the new hostname:port, drops the old connection and attempts to connect. Connected!]

Page 15: Cassandra Metrics

JMX

● Visual
    o jconsole
    o visualvm
● Command line
    o jmxterm
    o jmxsh
● MX4J
● Jolokia

Page 16: Cassandra Metrics

JMX

[domain]:[key]=[value],[key2]=[value2]...

Page 17: Cassandra Metrics

JMX

[domain]:[key]=[value],[key2]=[value2]...

com.pythian:site=blog,type=views,target=post1
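The naming pattern above can be made concrete with a small parser. This is a minimal sketch rather than the JMX implementation: `parse_object_name` is a hypothetical helper, and it ignores the quoting and wildcard rules that real JMX object names allow.

```python
def parse_object_name(name):
    """Split '[domain]:[key]=[value],...' into (domain, properties).

    Hypothetical helper for illustration only; quoted values and
    patterns such as '*' are not handled.
    """
    domain, _, prop_text = name.partition(":")
    properties = {}
    for pair in prop_text.split(","):
        key, _, value = pair.partition("=")
        properties[key.strip()] = value.strip()
    return domain, properties


domain, props = parse_object_name("com.pythian:site=blog,type=views,target=post1")
print(domain)  # com.pythian
print(props)   # {'site': 'blog', 'type': 'views', 'target': 'post1'}
```

The domain groups related beans, and the key/value pairs are what tools use to build their tree views and filters.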

Page 20: Cassandra Metrics

JMX Domains

org.apache.cassandra.
● db
● internal
● net
● request

Page 21: Cassandra Metrics

JMX Beans

org.apache.cassandra.
● db
● internal
● metrics
● net
● request

Page 22: Cassandra Metrics

JMX

org.apache.cassandra.metrics:type=

● Cache
● Client
● ClientRequest
● ClientRequestMetrics
● ColumnFamily
● CommitLog
● Compaction
● DroppedMessage
● FileCache
● Keyspace
● Storage
● ThreadPools

Page 23: Cassandra Metrics

JMX

org.apache.cassandra.metrics

type=*, scope=*, name=*,

type=ThreadPools, path=*, scope=*, name=*,

type=ColumnFamily, keyspace=*, scope=*, name=*,

type=Keyspace, keyspace=*, name=*,
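One way to read a bean that follows these patterns without a Java client is Jolokia's HTTP bridge, listed earlier among the JMX tools. The sketch below only builds the read URL and sends nothing. It assumes a Jolokia agent is attached to Cassandra at `localhost:8778` (an assumption, not something the slides state) and that the chosen bean exposes a `Count` attribute; `jolokia_read_url` is a hypothetical helper name.

```python
from urllib.parse import quote


def jolokia_read_url(base_url, mbean, attribute):
    """Build a Jolokia GET 'read' URL for one attribute of one MBean.

    Sketch only: names containing '/' need Jolokia's own '!/' escaping,
    which is not handled here.
    """
    return "{}/read/{}/{}".format(
        base_url.rstrip("/"),
        quote(mbean, safe=":=,"),   # keep the ObjectName punctuation readable
        quote(attribute),
    )


url = jolokia_read_url(
    "http://localhost:8778/jolokia",  # assumed agent address
    "org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency",
    "Count",
)
print(url)
```

Fetching that URL with any HTTP client would return JSON, which is often easier for ops tooling to consume than a raw JMX connection.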

Page 24: Cassandra Metrics

Metrics

● Toolkit called metrics for metrics
    o By Coda Hale @ Yammer
● Easy to use
● Easy to read (if you know java)
● Popular

Page 25: Cassandra Metrics

Types of Metrics

● Gauge
    o instantaneous value
● Counter
    o number that can be incremented & decremented
● Meter
    o rate of events over time (1/5/15 min moving avg)
● Histogram
    o representation of statistical distribution
    o 50, 75, 95, 98, 99, 99.9 percentile
    o average, median, min, max, standard deviation
● Timer
    o rate of events (meter)
    o histogram of duration
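To make two of these types concrete, here is a rough Python sketch of a counter and a histogram. It is not the Metrics library's implementation (that is Java and samples its data); the class names, the sample latencies and the nearest-rank percentile are simplifications chosen for illustration.

```python
import math


class Counter:
    """A number that can be incremented and decremented."""

    def __init__(self):
        self.count = 0

    def inc(self, n=1):
        self.count += n

    def dec(self, n=1):
        self.count -= n


class Histogram:
    """Keeps every sample and answers percentile queries (nearest-rank)."""

    def __init__(self):
        self.samples = []

    def update(self, value):
        self.samples.append(value)

    def percentile(self, p):
        if not self.samples:
            raise ValueError("no samples recorded")
        ordered = sorted(self.samples)
        rank = max(1, math.ceil(p / 100.0 * len(ordered)))
        return ordered[rank - 1]


pending = Counter()
pending.inc(3)
pending.dec()

latencies = Histogram()
for micros in [120, 250, 310, 683, 900, 1500, 40, 75]:  # made-up samples
    latencies.update(micros)

print(pending.count)             # 2
print(latencies.percentile(75))  # 683
```

A timer combines the two ideas: it counts events to derive a rate and records each duration in a histogram.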

Page 26: Cassandra Metrics

JMX

75th percentile is 683 MICROSECONDS (75% took 683us or less)

One minute rate is 13,915 calls per SECOND

Page 27: Cassandra Metrics

JMX

● Overwhelming at first
● Hard to tell what they mean without the source
● Moves around a lot
● Fortunately there is nodetool

Page 28: Cassandra Metrics

Nodetool

● JMX command line wrapper
● Many options
● Operations and diagnostic procedures
● For reactive analysis
    o ad hoc, spot checks

Page 29: Cassandra Metrics

Nodetool tpstats

nodetool tpstats

org.apache.cassandra.metrics:type=ThreadPools,path={internal|request|transport},scope={*},name={ActiveTasks|PendingTasks|CompletedTasks|CurrentlyBlockedTasks|TotalBlockedTasks}

Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0         113702         0                 0
RequestResponseStage              0         0              0         0                 0
MutationStage                     0         0         164503         0                 0
...
InternalResponseStage             0         0              0         0                 0
HintedHandoff                     0         0              0         0                 0

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  0
...
REQUEST_RESPONSE             0
COUNTER_MUTATION             0
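For ad hoc checks it can help to turn this output into data. The following is a minimal sketch that assumes the column layout shown on this slide (a pool name followed by five integer columns); `parse_tpstats` and `backed_up` are hypothetical helpers, and the output format differs between Cassandra versions.

```python
def parse_tpstats(text):
    """Return {pool_name: {column: int}} for the thread pool section."""
    columns = ["active", "pending", "completed", "blocked", "all_time_blocked"]
    pools = {}
    for line in text.splitlines():
        parts = line.split()
        # A pool row is a name followed by exactly five integers;
        # headers and the dropped-message rows do not match.
        if len(parts) == 6 and all(p.isdigit() for p in parts[1:]):
            pools[parts[0]] = dict(zip(columns, map(int, parts[1:])))
    return pools


def backed_up(pools):
    """Names of pools that currently have queued or blocked work."""
    return [name for name, s in pools.items()
            if s["pending"] > 0 or s["blocked"] > 0]


sample = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0         113702         0                 0
MutationStage                     0         0         164503         0                 0
"""
pools = parse_tpstats(sample)
print(pools["ReadStage"]["completed"])  # 113702
print(backed_up(pools))                 # []
```

Pending and blocked counts that stay above zero across several runs are usually the first thing to look at in this output.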

Page 30: Cassandra Metrics

Staged Event Driven Architecture

● Decomposes complex event system
● Set of stages (thread pools)
● Queue between each
● Shares a lot of pros and cons with SOA

Page 31: Cassandra Metrics

Staged Event Driven Architecture

[Diagram: a client request passing through thread-pool stages on Node 1 and Node 2. Labels: ReadStage (x32 threads), RequestResponse (threads), ReadRepairStage (threads), MessagingService. Legend: = Task.]

Page 32: Cassandra Metrics

Staged Event Driven Architecture

● Possible to overrun the processing capabilities of a stage that is not in the request's feedback loop (e.g. ReadRepairStage)
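The overrun scenario can be illustrated with a toy simulation. This is a sketch of the general idea only, not Cassandra's implementation: one stage receives a fixed number of tasks per tick and can process a fixed number per tick, and nothing pushes back on the producer. The function name and the rates are made up for the example.

```python
from collections import deque


def simulate_stage(arrivals_per_tick, capacity_per_tick, ticks):
    """Return the pending-queue depth after each tick for one stage."""
    pending = deque()
    depths = []
    for tick in range(ticks):
        # Producers enqueue without waiting for the stage to keep up.
        for n in range(arrivals_per_tick):
            pending.append((tick, n))
        # The stage's threads drain what they can this tick.
        for _ in range(min(capacity_per_tick, len(pending))):
            pending.popleft()
        depths.append(len(pending))
    return depths


print(simulate_stage(arrivals_per_tick=5, capacity_per_tick=3, ticks=4))  # [2, 4, 6, 8]
print(simulate_stage(arrivals_per_tick=3, capacity_per_tick=5, ticks=4))  # [0, 0, 0, 0]
```

In the first run the pending count grows without bound, which is the pattern to watch for in the Pending column of `nodetool tpstats`.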

Page 39: Cassandra Metrics

Nodetool tpstats

[Diagram: the RequestResponse thread pool shown over the nodetool tpstats output; REQUEST_RESPONSE dropped count is 0.]

Page 40: Cassandra Metrics

Nodetool tpstats

nodetool tpstats

org.apache.cassandra.metrics:type=ThreadPools,path={internal|request|transport},scope={*},name={ActiveTasks|PendingTasks|CompletedTasks|CurrentlyBlockedTasks|TotalBlockedTasks}

Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0         113702         0                 0
RequestResponseStage              0         0              0         0                 0
MutationStage                     0         0         164503         0                 0
...
InternalResponseStage             0         0              0         0                 0
HintedHandoff                     0         0              0         0                 0

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  0
...
REQUEST_RESPONSE             1
COUNTER_MUTATION             0

[Diagram: the RequestResponse thread pool.]

Page 41: Cassandra Metrics

Nodetool tpstats

nodetool tpstats

org.apache.cassandra.metrics:type=ThreadPools,path={internal|request|transport},scope={*},name={ActiveTasks|PendingTasks|CompletedTasks|CurrentlyBlockedTasks|TotalBlockedTasks}

More at: http://www.evidencebasedit.com/guide-to-cassandra-thread-pools

Page 42: Cassandra Metrics

Nodetool cfhistograms

nodetool cfhistograms {keyspace} {table}

org.apache.cassandra.metrics:type=ColumnFamily,keyspace={keyspace},scope={table}

SSTables per Read
 1 sstables: 98554
 2 sstables: 4534

Write Latency (microseconds)
No Data

Read Latency (microseconds)
 10 us: 2
 12 us: 17
 14 us: 96
 17 us: 208
 20 us: 677
 24 us: 3081
 29 us: 4552
 35 us: 3559
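Bucketed output like this is easier to reason about as percentiles. The sketch below is an illustration, not what nodetool does internally: it applies a nearest-rank percentile to only the read-latency buckets visible on this slide, and `bucket_percentile` is a hypothetical helper.

```python
import math


def bucket_percentile(buckets, p):
    """Nearest-rank percentile over {bucket_label_in_us: count}."""
    total = sum(buckets.values())
    if total == 0:
        raise ValueError("histogram is empty")
    target = max(1, math.ceil(p / 100.0 * total))
    seen = 0
    for label in sorted(buckets):
        seen += buckets[label]
        if seen >= target:
            return label


# Read latency buckets copied from the slide (microseconds: count).
read_latency_us = {10: 2, 12: 17, 14: 96, 17: 208, 20: 677,
                   24: 3081, 29: 4552, 35: 3559}

print(bucket_percentile(read_latency_us, 50))  # 29
print(bucket_percentile(read_latency_us, 99))  # 35
```

Because the buckets are coarse, the result is the bucket a percentile falls in rather than an exact latency.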

Page 43: Cassandra Metrics

Read Write Path mile high overview

[Diagram: read and write paths. Labels: Writes, Reads, Memtable, SSTable.]

Page 49: Cassandra Metrics

Nodetool cfhistograms 1.1

nodetool cfhistograms {keyspace} {table}

org.apache.cassandra.metrics:type=ColumnFamily,keyspace={keyspace},scope={table}

Offset    SSTables  Write Latency  Read Latency  Row Size  Column Count
1             3579              0             0         0             0
2                0              0             0         0             0
. . .
35               0              0             0         0             0
42               0              0            27         0             0
50               0              0           187         0             0
60               0             10           460         0             0
72               0            200           689         0             0
86               0            663           552         0             0
103              0            796           367         0             0
124              0            297           736         0             0
149              0            265           243         0             0
179              0            460           263         0             0
. . .
25109160         0              0             0         0             0

Page 50: Cassandra Metrics

Nodetool cfhistograms

https://gist.github.com/clohfink/6068003

Page 51: Cassandra Metrics

Nodetool cfhistograms 2.1

nodetool cfhistograms {keyspace} {table}

org.apache.cassandra.metrics:type=ColumnFamily,keyspace={keyspace},scope={table}

Keyspace/Table histograms
Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                           (micros)      (micros)         (bytes)
50%             1.00          10.00        524.00             310           5
75%             1.00          11.75        888.00             310           5
95%             1.00          15.00       4843.75             310           5
98%             1.00          17.00       9658.90             310           5
99%             1.00          19.00      12306.47             310           5
Min             0.00           0.00         68.00              30           0
Max             2.00      219386.00      45383.00             310           5

Page 52: Cassandra Metrics

Nodetool cfstats

nodetool cfstats {-i} {keyspace}.{table}

org.apache.cassandra.metrics:type=ColumnFamily,keyspace={keyspace},scope={table}

Keyspace: Keyspace1
    Read Count: 11207
    Read Latency: 0.047931114482020164 ms.
    Write Count: 17598
    Write Latency: 0.053502954881236506 ms.
    Pending Tasks: 0
        Table: Standard1
        SSTable count: 3
        Space used (live), bytes: 9088955
        Space used (total), bytes: 9088955
        Space used by snapshots (total), bytes: 0
        SSTable Compression Ratio: 0.3672150946
        Memtable cell count: 0
        Memtable data size, bytes: 0
        Memtable switch count: 3
        Local read count: 11207
        Local read latency: 0.048 ms
        Local write count: 17598
        Local write latency: 0.054 ms
        Pending tasks: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used, bytes: 11688
        Compacted partition minimum bytes: 1110
        Compacted partition maximum bytes: 126934
        Compacted partition mean bytes: 2730
        Average live cells per slice: 0.0
        Average tombstones per slice: 0.0
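Since cfstats prints plain "Name: value" lines, a few lines of Python are enough to pull values out for a spot check. This is a minimal sketch for a single table's output; `parse_cfstats` is a hypothetical helper, and the max-to-mean partition ratio is just one illustrative thing to compute, not a rule from the talk.

```python
def parse_cfstats(text):
    """Collect 'Name: value' lines into a dict of strings.

    Sketch for one table only: with several tables in the output,
    later values would overwrite earlier ones.
    """
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(": ")
        if sep:
            stats[name] = value.strip()
    return stats


sample = """\
Table: Standard1
SSTable count: 3
Local read latency: 0.048 ms
Compacted partition maximum bytes: 126934
Compacted partition mean bytes: 2730
Average tombstones per slice: 0.0
"""
stats = parse_cfstats(sample)
max_bytes = int(stats["Compacted partition maximum bytes"])
mean_bytes = int(stats["Compacted partition mean bytes"])

print(stats["SSTable count"])            # 3
print(round(max_bytes / mean_bytes, 1))  # 46.5
```

A largest partition many times the mean can be a hint that the data model produces a few very wide partitions.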

Page 56: Cassandra Metrics

Nodetool cfstats

nodetool cfstats {-i} {keyspace}.{table}

org.apache.cassandra.metrics:type=ColumnFamily,keyspace={keyspace},scope={table}

Keyspace: Keyspace1
    ...
        Table: Standard1
        SSTable count: 3
        SSTables in each level: [14/4, 1, 0, …, 0]
        Space used (live), bytes: 9088955
        ...

Page 65: Cassandra Metrics

Nodetool proxyhistograms

nodetool proxyhistograms

org.apache.cassandra.metrics:type=ClientRequest,scope={Read|Write|RangeSlice},name=Latency

$ nodetool proxyhistograms
proxy histograms

Read Latency (microseconds)
61214 us: 1

Write Latency (microseconds)
  103 us: 22
  124 us: 142
  149 us: 297
  179 us: 1190
  215 us: 1823
  258 us: 2091
...

Page 66: Cassandra Metrics

Nodetool

Much more!!

http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsNodetool_r.html

Page 67: Cassandra Metrics

Reporting Interface

Default: JMX, Console, Csv, Slf4j

Addons: Ganglia, Graphite

Community: Cassandra, StatsD, NewRelic, Splunk, Cloudwatch, Kafka, Riemann, TempDB, Munin, Riak, InfluxDB, Sematext, MongoDB, OpenTSDB, Librato, … MORE

Page 68: Cassandra Metrics

Reporting Interface

● Configurable with yaml
    o console, csv, ganglia, graphite, riemann
● Create reporter with premain agent
    o compiling new jar with manifest
    o add to classpath
    o add javaagent in cassandra-env.sh
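Whatever reporter is configured, it ends up pushing samples in the target system's wire format. As an illustration, Graphite's plaintext protocol is one line per sample: metric path, value, Unix timestamp. The sketch below only formats such a line; the metric path is made up, and `graphite_line` is a hypothetical helper rather than part of any reporter.

```python
import time


def graphite_line(path, value, timestamp=None):
    """Format one sample as 'path value timestamp' followed by a newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return "{} {} {}\n".format(path, value, int(timestamp))


line = graphite_line(
    "cassandra.node1.ClientRequest.Read.Latency.p75",  # hypothetical path
    683,
    timestamp=1410300000,
)
print(repr(line))
# 'cassandra.node1.ClientRequest.Read.Latency.p75 683 1410300000\n'
```

A real reporter would write these lines to the carbon listener over TCP (commonly port 2003, which is an assumption about the deployment).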

Page 69: Cassandra Metrics

Garbage Collection

● Death, taxes, and a stop-the-world GC
● Common issue to all JVM-based applications

Page 70: Cassandra Metrics

Garbage Collection

Enable gc logging
● Virtually no overhead
● Can be very helpful in diagnosing performance issues

Page 71: Cassandra Metrics

Garbage Collection

JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC"
JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure"
JVM_OPTS="$JVM_OPTS -XX:PrintFLSStatistics=1"

JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc-`date +%s`.log"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
JVM_OPTS="$JVM_OPTS -XX:+UseGCLogFileRotation"
JVM_OPTS="$JVM_OPTS -XX:NumberOfGCLogFiles=10"
JVM_OPTS="$JVM_OPTS -XX:GCLogFileSize=10M"
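Once GC logging is on, the pause lines written because of `-XX:+PrintGCApplicationStoppedTime` are easy to pull out with a script. This is a minimal sketch: the sample lines are hand-written in the shape those JVMs print, not taken from a real log, and `stopped_times` is a hypothetical helper.

```python
import re

# Message emitted when -XX:+PrintGCApplicationStoppedTime is enabled.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def stopped_times(lines):
    """Yield pause durations in seconds found in GC log lines."""
    for line in lines:
        match = STOPPED.search(line)
        if match:
            yield float(match.group(1))


# Hand-written sample lines for illustration only.
sample = [
    "2014-09-10T12:00:01.123+0000: 5.010: Total time for which application threads were stopped: 0.0123450 seconds",
    "2014-09-10T12:00:02.456+0000: 6.343: some other GC log line",
    "2014-09-10T12:00:03.789+0000: 7.676: Total time for which application threads were stopped: 0.2500000 seconds",
]
pauses = list(stopped_times(sample))
print(max(pauses))                     # 0.25
print([p for p in pauses if p > 0.1])  # [0.25]
```

Pointing the same function at an open log file gives a quick list of the long stop-the-world pauses to line up against latency spikes.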

Page 73: Cassandra Metrics

Garbage Collection

Could be its own talk

Honorable mentions:
● https://github.com/chewiebug/GCViewer
● http://jworks.idv.tw/GcWeb/
● Python, R, Octave

Page 74: Cassandra Metrics

Logging

/var/log/cassandra/system.log
    o provides a rolling log
    o log4j

/var/log/cassandra/output.log
    o captured standard error and standard out
    o truncated on restart

System Logs
    o syslog, dmesg, etc

Page 75: Cassandra Metrics

OS Metrics

Shout-out:

http://www.brendangregg.com/linuxperf.html

Page 76: Cassandra Metrics

JVM

● Heap
    o GC logs
    o JMX
● Threads
    o jvmtop
    o jstack (+htop)
    o kill -3
    o JMX

Page 77: Cassandra Metrics

And Everything

Page 78: Cassandra Metrics

Questions

?