Performance tuning - A key to successful Cassandra migration
1.0 Abstract
2.0 Dominance of traditional RDBMS and Adoption of NoSQL
3.0 DataStax Cassandra – ‘The Visionary’
4.1 Our journey through Cassandra optimization : Data Model
4.2 Our journey through Cassandra optimization : Integration
4.3 Our journey through Cassandra optimization : DB Parameters
5.0 The only thing constant is change
6.0 Performance tuning - Key to success
© 2015. All Rights Reserved.
Abstract
In the last few years, the industry has seen a major shift away from the dominance of traditional RDBMS databases across different domains. The rapid adoption of NoSQL databases, especially Cassandra, raises the question of what major challenges are faced during a Cassandra implementation and how to mitigate them. We often conclude that a migration or POC (proof of concept) was not successful; however, the real flaw might lie in the data modeling, the hardware configuration, the database parameters, the consistency level, and so on. There is no one model or configuration that fits all use cases and all applications. Performance tuning an application is truly an art and requires perseverance. This paper delves into the performance tuning considerations and anti-patterns that need to be addressed during a Cassandra migration / implementation to make sure we reap the benefits of Cassandra, the qualities that make it a ‘Visionary’ in Gartner’s 2014 Magic Quadrant for Operational Database Management Systems.
Dominance of RDBMS and NoSQL adoption
Ø Storage of high volume data
Ø Transaction control
Ø Security management
Ø Common key concepts
Ø Evolved over a period
Ø Common construct for querying
Why don’t I try if these databases can offer more?
Ø Support for clusters
Ø Cost
Ø Impedance mismatch
Ø Adaptability to newer workload
DataStax Cassandra – ‘The Visionary’ ……
Ø As per Gartner’s Magic Quadrant, DataStax Cassandra is listed as a ‘Visionary’
Ø Magic Quadrant clearly calls out the differentiating factors
ü High performance
ü In-memory options
ü Search capabilities
ü Integration with Spark and Hadoop
ü Experience in doing business with the vendor
Source: www.gartner.com
…… But
Ø One of the major challenges listed in the Gartner Magic Quadrant analysis is poor performance during POCs
Two major pitfalls:
Ø POCs are conducted quick and dirty
ü No capacity planning
ü No performance tuning
Ø Moving to production without enough performance testing
Don’t be in the dark…
Have you tried all possible tuning techniques before drawing conclusions from the results?
ü Data model
ü Integration best practices
ü Database parameters
Performance tuning - Key to success
Ø For a successful migration / implementation, due diligence needs to be done on all the different aspects:

Data Model
• Distribution
• De-Normalization
• Indexing
• Query patterns

Integration
• ‘Batch’ statements
• Consistency levels
• Load balancing
• Tombstones

DB Parameters
• Hidden data
• Compaction
• Cache
Our journey through Cassandra optimization: Data Model
Data model
Ø Equal distribution of data across partitions
Ø De-normalization
Ø Redundancy of data is acceptable to cater to different read use cases
Ø Reduce client side joins
Think outside the (RDBMS) box!
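The point about equal distribution can be illustrated with a small sketch. It is a toy model, not Cassandra's partitioner: `node_for`, `rows_per_node`, the six-node cluster size and the sample keys are all assumptions made for illustration. It only shows how the cardinality of the chosen partition key decides whether rows spread out or pile up on one node.

```python
import hashlib
from collections import Counter

NODES = 6  # hypothetical cluster size, for illustration only


def node_for(partition_key: str, nodes: int = NODES) -> int:
    """Map a partition key to a node index with a stable hash.

    This stands in for Cassandra's partitioner; it is not the real algorithm.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % nodes


def rows_per_node(partition_keys, nodes: int = NODES) -> Counter:
    """Count how many rows land on each node for a given choice of key."""
    counts = Counter({n: 0 for n in range(nodes)})
    for key in partition_keys:
        counts[node_for(key, nodes)] += 1
    return counts


# High-cardinality key (one value per customer) spreads rows across nodes.
by_customer = rows_per_node(f"cust-{i}" for i in range(6000))

# Low-cardinality key (every row shares one value) creates a single hot node.
by_country = rows_per_node("US" for _ in range(6000))

print("by customer id:", dict(by_customer))
print("by country    :", dict(by_country))
```

In this model the customer-keyed rows are shared roughly evenly, while the country-keyed rows all hash to the same node, which is the kind of skew a data model review should catch early.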
Data model contd..
Ø Limit secondary indexes
Ø Do clustering based on the read pattern
CREATE TABLE cust_interaction (cust_id text, intr_id timeuuid, intr_tx text, PRIMARY KEY (cust_id, intr_id)) WITH CLUSTERING ORDER BY (intr_id DESC);
A table (column family) that supports reads of the most recent customer interactions
Our journey through Cassandra optimization: Integration
‘Batch’ is not for performance improvement
Ø Batching statements can actually hurt performance
Ø Use individual inserts wherever possible
[Figure: two six-node rings (N1 to N6) comparing individual inserts with batch inserts]
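One way to see why a batch spanning many partitions can be slower is to count the forwarding work it places on a single coordinator. The sketch below is a simplified model under stated assumptions: the `owner` hash placement, the helper names, and the absence of replication are all illustrative and are not Cassandra's or any driver's actual behaviour.

```python
import hashlib

NODES = ["N1", "N2", "N3", "N4", "N5", "N6"]  # six-node ring, as in the figure


def owner(partition_key: str) -> str:
    """Toy placement: hash the key onto one node (replication is ignored)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]


def batch_forwards(partition_keys, coordinator: str) -> set:
    """Nodes one coordinator must forward to for a single multi-partition batch."""
    return {owner(key) for key in partition_keys} - {coordinator}


def individual_forwards(partition_keys) -> dict:
    """Forwards per insert when each insert is sent straight to its owning node."""
    return {key: batch_forwards([key], coordinator=owner(key)) for key in partition_keys}


keys = [f"cust-{i}" for i in range(100)]

# One batch through N1: N1 has to relay to every other node that owns a key.
print("batch forwards     :", sorted(batch_forwards(keys, coordinator="N1")))

# Individual inserts routed to the owner: no relaying at all in this model.
print("individual forwards:", sum(len(f) for f in individual_forwards(keys).values()))
```

In this model the batch concentrates the relay work on one node, whereas individual inserts spread it across the ring; a batch whose rows all share one partition key would not show the same fan-out.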
Consistency levels
Ø Decide consistency levels based on
ü Workload
ü Need for immediate consistency
                              Read heavy             Write heavy            Mixed workload
High consistency (immediate)  RC: ONE, WC: ALL       RC: ALL, WC: ONE       RC: QUORUM, WC: QUORUM
Relaxed consistency           RC: ONE, WC: ONE/TWO   RC: ONE/TWO, WC: ONE   RC: ONE/TWO, WC: ONE/TWO

RC = read consistency, WC = write consistency. Assumes a replication factor (RF) of 3.
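The high-consistency combinations listed above share one property at RF = 3: the number of replicas read plus the number of replicas written exceeds the replication factor, so every read overlaps at least one replica that holds the latest write. A minimal sketch of that arithmetic follows; the function names are illustrative and are not part of any driver API.

```python
def replicas_required(level: str, rf: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    level = level.upper()
    if level == "ONE":
        return 1
    if level == "TWO":
        return 2
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {level}")


def is_immediately_consistent(read_level: str, write_level: str, rf: int = 3) -> bool:
    """True when read and write replica sets must overlap (R + W > RF)."""
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf


# Combinations from the table, with RF = 3.
for read_level, write_level in [("ONE", "ALL"), ("ALL", "ONE"),
                                ("QUORUM", "QUORUM"), ("ONE", "ONE"), ("ONE", "TWO")]:
    print(read_level, write_level, is_immediately_consistent(read_level, write_level))
```

The same check can be rerun with a different `rf` when a keyspace uses another replication factor.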
Load balancing strategy
Ø Consider topology
Ø Be aware of distribution of clients / users
ü TokenAwarePolicy acts as a wrapper
ü With multiple data centers, the most preferred approach is to go with DCAwareRoundRobinPolicy wrapped in TokenAwarePolicy
ü In case of single data center installations, RoundRobinPolicy wrapped in TokenAwarePolicy can be considered
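The "wrapper" idea can be sketched without a driver. The two classes below are toy stand-ins written for this illustration (`ToyRoundRobin` and `ToyTokenAware` are hypothetical names, not the driver's classes): the child policy proposes an order of hosts, and the wrapper moves the replicas that own the requested partition to the front of that order.

```python
import itertools


class ToyRoundRobin:
    """Child policy: rotate the starting host on every request."""

    def __init__(self, hosts):
        self.hosts = list(hosts)
        self._counter = itertools.count()

    def query_plan(self):
        start = next(self._counter) % len(self.hosts)
        return self.hosts[start:] + self.hosts[:start]


class ToyTokenAware:
    """Wrapper policy: try replicas for the partition first, then the child's order."""

    def __init__(self, child):
        self.child = child

    def query_plan(self, replicas=()):
        plan = self.child.query_plan()
        owners = [host for host in plan if host in replicas]
        others = [host for host in plan if host not in replicas]
        return owners + others


policy = ToyTokenAware(ToyRoundRobin(["N1", "N2", "N3", "N4"]))

# Partition key known: the owning replica (assumed here to be N3) is tried first.
plan_with_key = policy.query_plan(replicas={"N3"})

# No routing information: the wrapper falls back to the child's rotation.
plan_without_key = policy.query_plan()

print(plan_with_key)
print(plan_without_key)
```

A data-center-aware child would additionally restrict or reorder the hosts by data center before the wrapper applies its replica preference.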
Beware of Tombstones
Ø Querying data whose columns carry tombstones can degrade performance
Ø A tombstone is a marker in a row that indicates a delete
Ø Compaction removes the tombstone once gc_grace_seconds has elapsed
Ø Do not insert NULL values into Cassandra
Ø Set IGNORE_NULLS to TRUE
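Writing an explicit NULL to a column creates a tombstone, so one common mitigation is to leave unset columns out of the INSERT altogether. The helper below is a hedged sketch of that idea: `build_insert` is a hypothetical function, the `%s` placeholder style is an assumption about the client in use, and the table and column names are borrowed from the `cust_interaction` example purely for illustration.

```python
def build_insert(table: str, row: dict):
    """Build an INSERT that names only the columns that actually have values.

    Columns whose value is None are dropped, so no NULL is written for them.
    Returns the statement text and the list of bound values.
    """
    present = {column: value for column, value in row.items() if value is not None}
    if not present:
        raise ValueError("row has no non-null columns to insert")
    columns = ", ".join(present)
    placeholders = ", ".join(["%s"] * len(present))
    statement = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return statement, list(present.values())


statement, values = build_insert(
    "cust_interaction",
    {"cust_id": "c-42", "intr_tx": None},  # intr_tx is unknown, so it is omitted
)
print(statement)
print(values)
```

Table and column names are interpolated directly here, so they should come from trusted code rather than user input.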
Image Source: www.datastax.com
Our journey through Cassandra optimization: DB Parameters
Watch for hidden data
Ø TTL and gc_grace_seconds go hand in hand
Ø Even after the data is deleted (tombstone is set), it still occupies space until gc_grace_seconds has passed
Ø Direct impact on storage and performance
Ø Default gc_grace_seconds is 10 days (864000 seconds)
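The arithmetic behind "hidden data" is simple enough to sketch: a tombstone written at deletion time cannot be purged until gc_grace_seconds later, and the space is only reclaimed when a compaction runs after that point. The helper name and sample dates below are illustrative assumptions; the sketch covers explicit deletes only and does not model TTL expiry.

```python
from datetime import datetime, timedelta

DEFAULT_GC_GRACE_SECONDS = 864000  # 10 days


def earliest_purge_time(deleted_at: datetime,
                        gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> datetime:
    """Earliest moment a compaction may drop the tombstone written at deleted_at."""
    return deleted_at + timedelta(seconds=gc_grace_seconds)


deleted_at = datetime(2015, 1, 1, 0, 0, 0)

# With the default, deleted data keeps occupying disk for at least 10 more days.
print(earliest_purge_time(deleted_at))

# With a one-day grace period the same delete becomes purgeable much sooner.
print(earliest_purge_time(deleted_at, gc_grace_seconds=86400))
```

Lowering the grace period shortens the window in which deleted data still takes space, but it also shortens the time available to repair a node that missed the delete.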
Image Source: www.datastax.com
Compaction
Ø Size Tiered Compaction
Ø Leveled Compaction
Ø Date Tiered Compaction
Ø Full replacement is default
Ø Incremental Replacement
Ø Anti-compaction
Ø Clients can read data directly from the new SSTable even before it finishes writing
Ø Reduce Compaction I/O contention
Image Source: www.datastax.com
Compaction Cont...
Ø Default is size-tiered
Ø Alter the column family (table) to change the compaction strategy
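As a rough sketch of what such a change looks like, the helper below assembles an ALTER TABLE statement for one of the three strategies named in this deck. The function and the short strategy keys are hypothetical conveniences; the table name is borrowed from the earlier `cust_interaction` example, and the statement is only printed here, not executed.

```python
# Short keys are illustrative; the values are the strategy class names.
STRATEGIES = {
    "size_tiered": "SizeTieredCompactionStrategy",
    "leveled": "LeveledCompactionStrategy",
    "date_tiered": "DateTieredCompactionStrategy",
}


def alter_compaction(table: str, strategy: str) -> str:
    """Return a statement that switches a table to the chosen compaction strategy."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    strategy_class = STRATEGIES[strategy]
    return f"ALTER TABLE {table} WITH compaction = {{'class': '{strategy_class}'}};"


print(alter_compaction("cust_interaction", "leveled"))
```

Switching strategy causes existing SSTables to be recompacted over time, so a change like this is best tried under test load before production.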
Image Source: www.datastax.com
Compaction Cont...
Ø Handles time-series-like data
DateTiered Compaction Strategy
Image Source: www.datastax.com
Cache what you need
Cassandra read path = A lot of in-memory components.. Be Optimal...
Image Source: https://academy.datastax.com/
Row cache hit
Ø Row Cache – Turned OFF by default
ü Caches the complete data
ü Earlier versions used to load the whole partition
ü From 2.1, number of rows cached per partition is configurable
ü Optimal for low volume data that are frequently accessed
Cache what you need contd..
Image Source: https://academy.datastax.com/
Key cache hit
Ø Key Cache – Turned ON by default
ü Caches just the key
ü Turning it OFF increases the response time for reads
ü Place frequently read and rarely read data in different column families (CFs)
No one configuration fits all. Tuning has to be iterative
The only thing constant is change
2011 - 2012
- Secondary indexes
- Online schema changes
- Introduction of CQL
- Zero-downtime upgrade
- Leveled compaction

2013 - 2014
- Virtual nodes
- Inter-node communication
- Lightweight transactions
- Triggers
- Change in data and log location
- User defined data types

2015
- Commit log compression
- JSON support
- Role-based authorization
- User defined functions
- Windows support
- Monthly versions
Keep up with the pace.. Changes can impact the performance a lot..
Performance tuning - Key to success
Traditional DBMS world: DBA, Developer, Sys Admin
NoSQL world: Database Engineer
Boundary between different roles has blurred..
Onus is on ‘us’ to tune, tune and tune the system to make the Cassandra implementation successful.. !!!
Question & Answers
Authors
Tiju Francis, Principal Technology Architect, Infosys Ltd
https://www.linkedin.com/in/tijufrancis
Ramkumar Nottath, Technology Architect, Infosys Ltd
https://www.linkedin.com/in/ramnottath
Arunshankar Arjunan, Technology Architect, Infosys Ltd
https://www.linkedin.com/in/arunshankararjunan
Thanks..
Ø Thanks to all the great minds who contributed towards this presentation.
ü Srivas J, Infosys Ltd
ü Srivas G, Infosys Ltd
ü Lakshman G, Infosys Ltd
ü Kiran N G, Infosys Ltd
ü Sivaram K, Infosys Ltd
ü Chethan Danivas, Infosys Ltd
ü Badrinath Narayanan, Infosys Ltd
ü Gautam Tiwari, Infosys Ltd
ü Shailesh Janrao Barde, Infosys Ltd
References
Ø NoSQL Distilled by Pramod J. Sadalage and Martin Fowler
Ø https://academy.datastax.com/courses
Ø http://www.gartner.com/
Ø Mastering Apache Cassandra
Ø http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/
Ø http://www.planetcassandra.org/cassandra/
Ø http://jonathanhui.com/cassandra-performance-tuning-and-monitoring
Source: www.gartner.com
Thank you