Performance tuning - A key to successful Cassandra migration
© 2015. All Rights Reserved. 2
1.0 Abstract
2.0 Dominance of traditional RDBMS and Adoption of NoSQL
3.0 DataStax Cassandra – ‘The Visionary’
4.1 Our journey through Cassandra optimization: Data Model
4.2 Our journey through Cassandra optimization: Integration
4.3 Our journey through Cassandra optimization: DB Parameters
5.0 The only thing constant is change
6.0 Performance tuning - Key to success
Abstract
In the last few years, the dominance of traditional RDBMS databases has shifted noticeably across domains. The rapid adoption of NoSQL databases, especially Cassandra, raises the question of which major challenges arise when implementing Cassandra and how to mitigate them. We often conclude that a migration or POC (proof of concept) was unsuccessful, when the real flaw may lie in the data model, the hardware configuration, the database parameters, the chosen consistency level, and so on. No single model or configuration fits all use cases and applications. Performance tuning an application is truly an art and requires perseverance. This paper delves into the performance tuning considerations and anti-patterns to keep in mind during a Cassandra migration or implementation, so that we can reap the benefits of Cassandra that made it a ‘Visionary’ in Gartner’s 2014 Magic Quadrant for Operational Database Management Systems.
Dominance of RDBMS and NoSQL adoption
• Storage of high volume data
• Transaction control
• Security management
• Common key concepts
• Evolved over a period
• Common construct for querying
Why not check whether these databases can offer more?
• Support for clusters
• Cost
• Impedance mismatch
• Adaptability to newer workload
DataStax Cassandra – ‘The Visionary’
As per Gartner’s Magic Quadrant, DataStax Cassandra is listed as a ‘Visionary’. The Magic Quadrant clearly calls out the differentiating factors:
• High performance
• In-memory options
• Search capabilities
• Integration with Spark and Hadoop
• Experience in doing business with the vendor
Source: www.gartner.com
…but one of the major challenges listed in the Gartner Magic Quadrant analysis is poor performance during POCs.
Two major pitfalls:
• POCs are conducted quick and dirty: no capacity planning or performance tuning
• Moving to production without enough performance testing
Don’t be in the dark…
Have you tried all possible tuning techniques before drawing conclusions from the results?
• Data model
• Integration best practices
• Database parameters
Performance tuning - Key to success
For a successful migration / implementation, due diligence needs to be done on all the different aspects:
Data Model
• Distribution
• De-Normalization
• Indexing
• Query patterns
Integration
• ‘Batch’ statements
• Consistency levels
• Load balancing
• Tombstones
DB Parameters
• Hidden data
• Compaction
• Cache
Our journey through Cassandra optimization: Data Model
• Distribution
• De-Normalization
• Indexing
• Query patterns
Data model
• Equal distribution of data across partitions
• De-normalization
• Redundancy of data is acceptable to cater to different read use cases
• Reduce client-side joins
• Think outside the (RDBMS) box!
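The points above can be sketched in CQL. This is a minimal illustration, not the authors' schema: the table names, the channel column, and the day-bucket scheme are all assumptions made for the example. The same interaction data is stored twice, each copy keyed for one read pattern, and the second table uses a composite partition key so that a busy channel is spread over many partitions.

```cql
-- Hypothetical tables: names and columns are illustrative only.

-- Read pattern 1: all interactions for one customer.
CREATE TABLE interactions_by_customer (
    cust_id text,
    intr_id timeuuid,
    channel text,
    intr_tx text,
    PRIMARY KEY (cust_id, intr_id)
);

-- Read pattern 2: all interactions on a channel for a given day.
-- The composite partition key (channel, intr_day) keeps a busy
-- channel from growing into one oversized partition.
CREATE TABLE interactions_by_channel_day (
    channel text,
    intr_day text,       -- assumed bucket format, e.g. '2015-06-01'
    intr_id timeuuid,
    cust_id text,
    intr_tx text,
    PRIMARY KEY ((channel, intr_day), intr_id)
);
```

The application writes each interaction to both tables. That costs an extra write, but each read is then served from a single partition without a client-side join.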
Data model (contd.)
• Limit secondary indexes
• Cluster based on the read pattern
CREATE TABLE cust_interaction (cust_id text, intr_id timeuuid, intr_tx text, PRIMARY KEY (cust_id, intr_id)) WITH CLUSTERING ORDER BY (intr_id DESC);
A table / CF that supports reads of the most recent customer interactions
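A short usage sketch for the cust_interaction table above shows why the clustering order matters. The customer id 'C1001' and the interaction text are made-up sample values.

```cql
-- Sample write; now() generates a timeuuid for intr_id.
INSERT INTO cust_interaction (cust_id, intr_id, intr_tx)
VALUES ('C1001', now(), 'Called about billing');

-- Rows inside a partition are stored newest-first (intr_id DESC),
-- so the latest 10 interactions are read from the start of the
-- partition and no ORDER BY clause is needed.
SELECT intr_id, intr_tx
FROM cust_interaction
WHERE cust_id = 'C1001'
LIMIT 10;
```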
Our journey through Cassandra optimization: Integration
• ‘Batch’ statements
• Consistency levels
• Load balancing
• Tombstones
‘Batch’ is not for performance improvement
• Batching statements can really harm performance
• Use individual inserts wherever possible
[Figure: two six-node rings (N1 to N6), one labeled “Individual Inserts” and one labeled “Batch Inserts”]
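The contrast can be sketched in CQL against the cust_interaction table from the data model section. The customer ids are made-up sample values, and the statements illustrate the shape of the two approaches rather than a benchmark.

```cql
-- Anti-pattern: one batch spanning several partitions. A single
-- coordinator node has to forward each row to the replicas that
-- own it and wait for all of them before the batch completes.
BEGIN UNLOGGED BATCH
  INSERT INTO cust_interaction (cust_id, intr_id, intr_tx) VALUES ('C1001', now(), 'call');
  INSERT INTO cust_interaction (cust_id, intr_id, intr_tx) VALUES ('C2002', now(), 'email');
  INSERT INTO cust_interaction (cust_id, intr_id, intr_tx) VALUES ('C3003', now(), 'chat');
APPLY BATCH;

-- Preferred: independent inserts, sent concurrently by the client,
-- so each one can be routed to a replica that owns its partition.
INSERT INTO cust_interaction (cust_id, intr_id, intr_tx) VALUES ('C1001', now(), 'call');
INSERT INTO cust_interaction (cust_id, intr_id, intr_tx) VALUES ('C2002', now(), 'email');
INSERT INTO cust_interaction (cust_id, intr_id, intr_tx) VALUES ('C3003', now(), 'chat');
```

A logged batch (plain BEGIN BATCH) adds a further write to the batch log, so it is best reserved for cases where the statements must succeed or fail together.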
Consistency levels
Decide consistency levels based on:
• Workload
• Need for immediate consistency

(RC = read consistency, WC = write consistency; RF = 3 is assumed)

                             | Read Heavy              | Write Heavy             | Mixed workload
High consistency (immediate) | RC: ONE, WC: ALL        | RC: ALL, WC: ONE        | RC: QUORUM, WC: QUORUM
Relaxed consistency          | RC: ONE, WC: ONE or TWO | RC: ONE or TWO, WC: ONE | RC: ONE or TWO, WC: ONE or TWO
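The high-consistency row follows from a standard replica-overlap rule: a read is guaranteed to see the latest acknowledged write when the set of replicas read and the set of replicas written must intersect. The symbols R and W (the number of replicas that must respond to a read and to a write) are introduced here for illustration and do not appear in the slides.

```latex
\begin{align*}
\text{Immediate consistency} &\iff R + W > RF \\
\text{QUORUM} &= \lfloor RF/2 \rfloor + 1 = 2 \quad (RF = 3) \\[4pt]
\text{ONE / ALL:} \quad & 1 + 3 = 4 > 3 \\
\text{ALL / ONE:} \quad & 3 + 1 = 4 > 3 \\
\text{QUORUM / QUORUM:} \quad & 2 + 2 = 4 > 3 \\[4pt]
\text{ONE / ONE:} \quad & 1 + 1 = 2 \le 3 \\
\text{ONE / TWO:} \quad & 1 + 2 = 3 \le 3
\end{align*}
```

One consequence worth noting: with RF = 3, reading at TWO and writing at TWO gives 2 + 2 = 4 > 3, which is the same overlap as QUORUM / QUORUM. Under this rule, only the combinations that include ONE on at least one side actually relax the guarantee.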
Load balancing strategy
• Consider topology
• Be aware of the distribution of clients / users
• TokenAwarePolicy acts as a wrapper around another (child) policy
• With multiple data centers, the most preferred approach is DCAwareRoundRobinPolicy wrapped in TokenAwarePolicy
• For single data center installations, RoundRobinPolicy wrapped in TokenAwarePolicy can be considered
Beware of Tombstones
• Querying data whose columns have tombstones set can degrade performance
• A tombstone is a marker in a row that indicates a delete
• Compaction removes the tombstone once gc_grace_seconds has passed
• Do not insert NULL into Cassandra
• Set IGNORE_NULLS to TRUE
Image Source: www.datastax.com
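The NULL point can be illustrated with the cust_interaction table from the data model section. The customer id is a made-up sample value.

```cql
-- Writes a tombstone for intr_tx, even though nothing was deleted:
INSERT INTO cust_interaction (cust_id, intr_id, intr_tx)
VALUES ('C1001', now(), null);

-- Omitting the column leaves it unset, so no tombstone is written:
INSERT INTO cust_interaction (cust_id, intr_id)
VALUES ('C1001', now());
```

Applications that bind every column of a prepared statement, including empty ones, tend to produce the first form without noticing, so it is worth checking how the client or mapping layer handles missing values.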
Our journey through Cassandra optimization: DB Parameters
• Hidden data
• Compaction
• Cache
Watch for hidden data
• TTL and gc_grace_seconds go hand in hand
• Even after data is deleted (a tombstone is set), it still occupies space until gc_grace_seconds has passed
• Direct impact on storage and performance
• Default gc_grace_seconds is 10 days (864000 seconds)
Image Source: www.datastax.com
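Both settings are table options and can be changed with ALTER TABLE. The sketch below reuses the cust_interaction table from the data model section. The 30-day TTL and the 1-day grace period are example values, not recommendations.

```cql
-- Expire every row 30 days (2592000 seconds) after it is written.
ALTER TABLE cust_interaction WITH default_time_to_live = 2592000;

-- Or set the TTL on an individual write instead:
INSERT INTO cust_interaction (cust_id, intr_id, intr_tx)
VALUES ('C1001', now(), 'Called about billing')
USING TTL 2592000;

-- Lower gc_grace_seconds from the 10-day default (864000) to 1 day,
-- so expired and deleted data is purged sooner by compaction.
-- Assumption: repairs run more often than this window. If a node
-- misses a delete and is not repaired in time, the deleted data
-- can reappear.
ALTER TABLE cust_interaction WITH gc_grace_seconds = 86400;
```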
Compaction
Strategies: Size Tiered Compaction, Leveled Compaction, Date Tiered Compaction
• Full replacement is the default
• Incremental replacement: clients can read data directly from the new SSTable even before it finishes writing
• Anti-compaction
• Reduce compaction I/O contention
Image Source: www.datastax.com
Compaction (contd.)
• Default is Size-tiered
• Alter the column family to change the compaction type
Image Source: www.datastax.com
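Changing the strategy is a single statement per table. The sketch below reuses the cust_interaction table from the data model section. Whether leveled compaction suits a given table depends on its read and write mix, so treat the choice as an example only.

```cql
-- Switch from the default size-tiered strategy to leveled compaction.
ALTER TABLE cust_interaction
WITH compaction = { 'class': 'LeveledCompactionStrategy' };

-- Switch back to the default.
ALTER TABLE cust_interaction
WITH compaction = { 'class': 'SizeTieredCompactionStrategy' };
```

After a strategy change the existing SSTables are reorganized under the new strategy, which adds compaction I/O for a while, so the change is best made outside peak hours.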
Compaction (contd.)
• Handle time series-like data with DateTieredCompactionStrategy
Image Source: www.datastax.com
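A minimal sketch of a time-series table using this strategy is shown below. The table and column names are hypothetical, and only the strategy class is set. Strategy sub-options are left at their defaults.

```cql
-- Hypothetical time-series table: one partition per sensor,
-- newest readings first.
CREATE TABLE sensor_readings (
    sensor_id text,
    reading_time timestamp,
    reading double,
    PRIMARY KEY (sensor_id, reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC)
  AND compaction = { 'class': 'DateTieredCompactionStrategy' };
```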
Cache what you need
Cassandra read path = a lot of in-memory components. Be optimal.
[Figure: read path with a row cache hit. Image Source: https://academy.datastax.com/]
Row Cache
• Turned OFF by default
• Caches the complete data
• Earlier versions used to load the whole partition
• From 2.1, the number of rows cached per partition is configurable
• Optimal for low-volume data that is frequently accessed
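From 2.1 onwards the per-partition row count is set through the table's caching option. The sketch below reuses the cust_interaction table from the data model section, and the value of 10 rows is an arbitrary example.

```cql
-- Cassandra 2.1+ syntax: keep the key cache on and cache the
-- first 10 rows of each partition in the row cache.
ALTER TABLE cust_interaction
WITH caching = { 'keys': 'ALL', 'rows_per_partition': '10' };

-- The table option alone is not enough: the row cache also needs
-- a non-zero row_cache_size_in_mb in cassandra.yaml (default is 0).
```

Because cust_interaction is clustered newest-first, the first 10 rows of a partition are the 10 most recent interactions, which matches the read pattern the table was designed for.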
Cache what you need (contd.)
[Figure: read path with a key cache hit. Image Source: https://academy.datastax.com/]
Key Cache
• Turned ON by default
• Caches just the key
• Turning it OFF increases the response time for reads
• Place frequently read and sparsely read data in different CFs
No one configuration fits all. Tuning has to be iterative.
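Since caching is configured per table, separating hot and cold data lets each be tuned on its own. The sketch below uses hypothetical table names and columns, with Cassandra 2.1+ caching syntax. The settings are a starting point to measure against, not a recommendation.

```cql
-- Hypothetical tables: names and columns are illustrative only.

-- Frequently read, small: cache keys and the single row per partition.
CREATE TABLE cust_profile (
    cust_id text PRIMARY KEY,
    name text,
    segment text
) WITH caching = { 'keys': 'ALL', 'rows_per_partition': '1' };

-- Sparsely read, large: keep the key cache but no row cache, so
-- rarely used rows do not push hot rows out of memory.
CREATE TABLE cust_interaction_archive (
    cust_id text,
    intr_id timeuuid,
    intr_tx text,
    PRIMARY KEY (cust_id, intr_id)
) WITH caching = { 'keys': 'ALL', 'rows_per_partition': 'NONE' };
```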
The only thing constant is change
2011 – 2012
- Secondary indexes
- Online schema changes
- Introduction of CQL
- Zero-downtime upgrade
- Leveled compaction
2013 – 2014
- Virtual nodes
- Inter-node communication
- Lightweight transactions
- Triggers
- Change in data and log location
- User defined data types
2015
- Commit log compression
- JSON support
- Role-based authorization
- User defined functions
- Windows support
- Monthly versions
Keep up with the pace. Changes can impact performance a lot.
Performance tuning - Key to success
Traditional DBMS world: DBA, Developer, Sys Admin
NoSQL world: Database Engineer
The boundary between the different roles has blurred.
The onus is on ‘us’ to tune, tune and tune the system to make the Cassandra implementation successful!
Questions & Answers
Authors
Tiju Francis, Principal Technology Architect, Infosys Ltd
https://www.linkedin.com/in/tijufrancis
Ramkumar Nottath, Technology Architect, Infosys Ltd
https://www.linkedin.com/in/ramnottath
Arunshankar Arjunan, Technology Architect, Infosys Ltd
https://www.linkedin.com/in/arunshankararjunan
Thanks
Thanks to all the great minds who contributed towards this presentation:
Srivas J, Infosys Ltd
Srivas G, Infosys Ltd
Lakshman G, Infosys Ltd
Kiran N G, Infosys Ltd
Sivaram K, Infosys Ltd
Chethan Danivas, Infosys Ltd
Badrinath Narayanan, Infosys Ltd
Gautam Tiwari, Infosys Ltd
Shailesh Janrao Barde, Infosys Ltd
References
NoSQL Distilled by Pramod J. Sadalage and Martin Fowler
https://academy.datastax.com/courses
http://www.gartner.com/
Mastering Apache Cassandra
http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/
http://www.planetcassandra.org/cassandra/
http://jonathanhui.com/cassandra-performance-tuning-and-monitoring
Source: www.gartner.com
Thank you