Cassandra Core Concepts and Design Internals
at New Delhi Cassandra Users Meetup, November 2014
By: Salil Kalia
We’re going to talk about:
1. What is Cassandra?
2. High Level Architecture
3. Data Modeling
4. Write Path
5. Read Path
6. Tools
7. Q/A
What is Cassandra?
A database that is:
✓ Highly available
✓ Fully distributed, with no single point of failure
✓ Free & open source, with deep developer support
✓ High-performance, with near-linear horizontal scaling
✓ Replicated & durable
Cassandra – Features
Key features:
✓ Elastic scalability
✓ Distributed
✓ Decentralized
✓ Fault tolerant
✓ Column oriented
✓ Tunable consistency
✓ Highly available
✓ Open source
Cassandra Evolution
Google Bigtable (data model) + Amazon Dynamo (distributed design) → Cassandra (created at Facebook)
High Level Architecture
✓ Ring-based data distribution
✓ Only one type of server (no master/slave roles)
✓ Highly distributed
✓ All nodes hold data
✓ All nodes answer queries
✓ All nodes are replicas
✓ Built-in multi-datacenter support
✓ Built-in snitches (a snitch tells Cassandra which rack and datacenter each node belongs to)
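Ring-based distribution can be illustrated with a small sketch. It assumes a stand-in partitioner (MD5 truncated to 64 bits; Cassandra's default Murmur3Partitioner uses a different hash), a hypothetical `Ring` class with four virtual nodes per physical node, and SimpleStrategy-style placement, where the replicas are the next distinct nodes clockwise from the key's token. The node names are made up.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Stand-in partitioner (assumption): first 8 bytes of MD5 as an integer.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    """Hypothetical token ring; each physical node owns several vnode tokens."""

    def __init__(self, nodes, vnodes_per_node=4):
        self.entries = sorted(
            (token(f"{node}#vnode{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key: str, rf: int) -> list:
        # The first vnode token >= the key's token owns the key (wrapping
        # around the ring); the next distinct nodes clockwise hold replicas.
        start = bisect.bisect_left(self.tokens, token(partition_key))
        result = []
        for i in range(len(self.entries)):
            if len(result) == rf:
                break
            _, node = self.entries[(start + i) % len(self.entries)]
            if node not in result:
                result.append(node)
        return result


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas("Vishal", rf=3))
```

Because every node owns many small token ranges, adding a node takes a little data from many peers instead of splitting a single neighbor's range.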
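A Few Common Terms
✓ Nodes and virtual nodes (vnodes)
✓ Primary & secondary (replica) ranges
✓ Partition key (hashed to a token)
✓ Partitioner (the hash function that maps a partition key to a token)
✓ Client & coordinator (the node that receives a client request and forwards it to the replicas)
✓ Replication factor (RF): number of copies kept of each partition
✓ Consistency level (CL): number of replicas that must respond for a read or write to succeed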
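How the coordinator uses RF and CL can be sketched as a simple acknowledgement count. This is a simplified model: the hypothetical `required_acks` and `write_succeeds` functions ignore timeouts, hinted handoff, and datacenter-aware levels such as LOCAL_QUORUM.

```python
def required_acks(consistency_level: str, rf: int) -> int:
    # Replica responses the coordinator must collect for each level.
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": rf // 2 + 1,  # a majority of the RF replicas
        "ALL": rf,
    }
    return levels[consistency_level]


def write_succeeds(replica_acked: list, consistency_level: str) -> bool:
    # The coordinator forwards the write to all RF replicas and reports
    # success to the client once enough of them have acknowledged it.
    rf = len(replica_acked)
    return sum(replica_acked) >= required_acks(consistency_level, rf)


# RF = 3 with one replica down: QUORUM needs 2 acks, ALL needs 3.
print(write_succeeds([True, True, False], "QUORUM"))
print(write_succeeds([True, True, False], "ALL"))
```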
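Magic Formula
Write CL + Read CL > RF → Immediate Consistency
(the replicas that acknowledge a write and the replicas that answer a later read must overlap in at least one node, so the read sees the latest write)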
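The formula can be checked by brute force. A small sketch with a hypothetical `always_overlaps` helper that enumerates every possible write set and read set of replicas and tests whether they must share a node. Here W and R are replica counts (for example, QUORUM means 2 when RF = 3).

```python
from itertools import combinations


def always_overlaps(rf: int, write_cl: int, read_cl: int) -> bool:
    # True if every set of `write_cl` replicas that acknowledged a write
    # shares at least one node with every set of `read_cl` replicas read.
    replicas = range(rf)
    return all(
        set(w) & set(r)
        for w in combinations(replicas, write_cl)
        for r in combinations(replicas, read_cl)
    )


for rf, w, r in [(3, 2, 2), (3, 1, 3), (3, 1, 1), (5, 3, 2)]:
    print(f"RF={rf} W={w} R={r}: W+R>RF is {w + r > rf}, "
          f"overlap guaranteed is {always_overlaps(rf, w, r)}")
```

The two columns always agree: when W + R > RF, any two replica sets of those sizes cannot be disjoint, and when W + R <= RF a disjoint pair can always be chosen.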
Data Modeling
Keyspace > Table > Partition > Row > Column
✓ Like an RDBMS, Cassandra stores data in tables
✓ Partitions within tables
✓ Rows within partitions (a partition can hold a single row or many)
✓ CQL (Cassandra Query Language) to create tables & query data
✓ Partition keys determine where (on which nodes) a partition is found
✓ Clustering keys determine the ordering of rows within a partition
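The partition key / clustering key split can be pictured as a map of sorted maps. The sketch below is a hypothetical in-memory `ToyTable`, not Cassandra's on-disk format. It assumes a single node and shows three things: the partition key selects a partition, the clustering key orders rows inside it, and an insert on an existing key overwrites the row.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory model of one Cassandra table."""

    def __init__(self):
        # partition key -> {clustering key -> row}
        self.partitions = defaultdict(dict)

    def insert(self, partition_key, clustering_key, row):
        # Writing an existing primary key overwrites the row (an upsert).
        self.partitions[partition_key][clustering_key] = row

    def select(self, partition_key):
        # Return one partition's rows ordered by clustering key.
        rows = self.partitions.get(partition_key, {})
        return [rows[ck] for ck in sorted(rows)]


t = ToyTable()
t.insert(5, 2, "Which place?")
t.insert(5, 1, "Nice pic")
t.insert(6, 4, "Great!")
print(t.select(5))
```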
Example: Single Row Partition
name    age  occupation
Salil   32   Tech Lead
Vishal  25   Software Engineer
Akshay  45   Actor
Sheri   29   Singer
cqlsh:demo> create table user (name text primary key, age int, occupation text);
cqlsh:demo> select * from user where name = 'Vishal';
✓ User identified by name (PK)
✓ Single row per partition
✓ RDBMS-like structure
Example: Multiple Rows Partition
video_id  comment_id  comment
5         1           Nice pic
5         2           Which place?
5         3           lol
6         4           Great!
cqlsh:demo> create table comment (video_id int, comment_id int, comment text, primary key (video_id, comment_id));
cqlsh:demo> select * from comment where video_id = 5;
• video_id: partition key
• comment_id: clustering key
* In the real world, use UUIDs instead of int for key columns
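Because rows in a partition are stored in clustering-key order, a range restriction on the clustering key (for example `select * from comment where video_id = 5 and comment_id > 1;`) can be served as one contiguous slice of one partition. A minimal sketch, with a hypothetical `comments_after` helper and Python's `bisect` standing in for Cassandra's on-disk index. The rows come from the example table above.

```python
import bisect

# Partitions of the comment table, each kept sorted by comment_id.
comments = {
    5: [(1, "Nice pic"), (2, "Which place?"), (3, "lol")],
    6: [(4, "Great!")],
}
# Clustering keys of each partition, precomputed for binary search.
keys = {vid: [cid for cid, _ in rows] for vid, rows in comments.items()}


def comments_after(video_id, min_comment_id):
    # Locate the partition, then binary-search its sorted clustering keys.
    rows = comments.get(video_id, [])
    start = bisect.bisect_right(keys.get(video_id, []), min_comment_id)
    return rows[start:]


print(comments_after(5, 1))
```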
Data Modeling – Best practices
✓ Know your queries before modeling the data
✓ Denormalize the data
✓ Create multiple views into your data
✓ Cassandra is built for fast writes: design so that each query needs as few reads as possible
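Denormalizing into multiple views turns one logical write into several physical writes, so that every query reads a single partition. A minimal sketch, assuming two hypothetical query tables (`users_by_name` and `users_by_occupation`) modeled as Python dicts and using the user data from the single-row example:

```python
from collections import defaultdict

# Two hypothetical query tables holding the same user data, each keyed
# for one query: lookup by name, and lookup by occupation.
users_by_name = {}
users_by_occupation = defaultdict(dict)


def add_user(name, age, occupation):
    # Denormalized write: one logical insert becomes one write per view.
    row = {"name": name, "age": age, "occupation": occupation}
    users_by_name[name] = row
    users_by_occupation[occupation][name] = row


for user in [("Salil", 32, "Tech Lead"), ("Vishal", 25, "Software Engineer"),
             ("Akshay", 45, "Actor"), ("Sheri", 29, "Singer")]:
    add_user(*user)

# Each query is answered by a single partition lookup; no joins are needed.
print(users_by_name["Vishal"]["occupation"])
print(sorted(users_by_occupation["Actor"]))
```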
Key components of the Write Path
✓ CommitLog – append-only log on disk, for durability
✓ Memtables – in-memory tables
✓ SSTables – immutable files created when a memtable is flushed to disk
✓ Compaction – process that merges SSTables
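The four write-path components can be tied together in a toy model. This is a sketch under stated simplifications: the hypothetical `ToyNode` flushes after a fixed number of keys rather than a memory threshold, clears its commit log on flush, and during compaction lets newer SSTables win instead of comparing per-cell write timestamps as Cassandra does.

```python
class ToyNode:
    """Hypothetical single-node write path: commit log, memtable, SSTables."""

    def __init__(self, flush_threshold=2):
        self.commit_log = []  # append-only record, replayed after a crash
        self.memtable = {}  # in-memory, latest value per key
        self.sstables = []  # immutable sorted tuples, oldest first
        self.flush_threshold = flush_threshold  # assumption: count keys, not bytes

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. durable append
        self.memtable[key] = value  # 2. in-memory update
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # 3. Write the memtable out as an immutable, key-sorted SSTable.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []  # simplification: flushed data needs no replay

    def compact(self):
        # 4. Merge all SSTables into one; later SSTables win for the same key
        # (simplification: Cassandra compares per-cell write timestamps).
        merged = {}
        for sstable in self.sstables:
            merged.update(sstable)
        self.sstables = [tuple(sorted(merged.items()))]


node = ToyNode()
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    node.write(key, value)
print(node.sstables)
node.compact()
print(node.sstables)
```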
Key components of the Read Path
✓ Memtables – in-memory tables
✓ Row Cache – in-memory cache that stores recently read rows
✓ Bloom Filters – report whether a partition key may be found in the corresponding SSTable
✓ Key Caches – in memory (on heap)
✓ Partition Summaries – in memory (on heap)
✓ Partition Indexes – on disk
✓ SSTables – on disk
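A Bloom filter lets a read skip SSTables that definitely do not contain the partition key. The sketch below is a simplification with hypothetical names (`BloomFilter`, `make_sstable`, `read`). It uses SHA-256 for the filter's hash positions, ignores the caches and partition summary/index, and returns the first match from the memtable or the newest SSTable, whereas Cassandra merges cells from all relevant sources by timestamp.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def make_sstable(rows):
    # Hypothetical SSTable: its data plus a Bloom filter over its keys.
    bloom = BloomFilter()
    for key in rows:
        bloom.add(key)
    return bloom, dict(rows)


def read(key, memtable, sstables):
    # Check the memtable first, then SSTables newest to oldest, skipping
    # any SSTable whose Bloom filter says the key is definitely absent.
    if key in memtable:
        return memtable[key]
    for bloom, rows in sstables:
        if bloom.might_contain(key) and key in rows:
            return rows[key]
    return None


sstables = [make_sstable({"a": 3}), make_sstable({"a": 1, "b": 2})]  # newest first
print(read("a", {}, sstables), read("b", {}, sstables), read("z", {}, sstables))
```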