Cassandra Core Concepts and Design Internals

Post on 16-Apr-2017

Presented at the New Delhi Cassandra Users Meetup, November 2014

By: Salil Kalia

We’re going to talk about:

1. What is Cassandra?

2. High Level Architecture

3. Data Modeling

4. Write Path

5. Read Path

6. Tools

7. Q/A

A Database:

✓ Highly available

✓ Fully distributed, with no single point of failure

✓ Free & open source, with deep developer support

✓ High performance, with near-linear horizontal scaling

✓ Replicated & durable

What is Cassandra?

Elastic Scalability

Distributed

Decentralized

Fault Tolerant

Column Oriented

Tunable Consistency

Highly available

KEY FEATURES

Open Source

Cassandra – Features

Google Bigtable

Amazon Dynamo

[Facebook] Cassandra

Cassandra Evolution

✓ Ring based data distribution

✓ Only one type of Server

✓ Highly distributed

✓ All nodes hold data

✓ All nodes answer queries

✓ All nodes are replicas

✓ Built-in multi-datacenter (multi-DC) support

✓ Built-in snitch feature

High Level Architecture

✓ Nodes and Virtual nodes

✓ Primary & Secondary range

✓ Partition Key (Hash)

✓ Partitioner

✓ Client & Coordinator

✓ Replication Factor (RF)

✓ Consistency Level (CL)

Few Common Terms
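
To make the terms above concrete, here is a minimal Python sketch of how a partitioner, a token ring with virtual nodes, and a replication factor could fit together. It is an illustration under assumptions, not Cassandra's implementation: the `Ring` class and its method names are hypothetical, MD5 stands in for the partitioner's hash (Cassandra's default Murmur3Partitioner uses a different function), and replicas are chosen by simply walking the ring clockwise.

```python
import bisect
import hashlib


class Ring:
    """Toy token ring: hypothetical names, not Cassandra's API."""

    def __init__(self, nodes, vnodes_per_node=4, rf=3):
        self.rf = rf
        # Each node owns several tokens (virtual nodes) spread around the ring.
        self.tokens = sorted(
            (self._hash(f"{node}-{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )

    @staticmethod
    def _hash(value):
        # Assumption: MD5 as a stand-in partitioner hash.
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def replicas(self, partition_key):
        """Return up to rf distinct nodes, walking clockwise from the key's token."""
        token = self._hash(partition_key)
        start = bisect.bisect_left(self.tokens, (token, ""))
        result = []
        for offset in range(len(self.tokens)):
            node = self.tokens[(start + offset) % len(self.tokens)][1]
            if node not in result:
                result.append(node)
            if len(result) == self.rf:
                break
        return result


ring = Ring(["node1", "node2", "node3", "node4"], rf=3)
print(ring.replicas("Vishal"))  # three distinct nodes that hold this partition
```

Any node can act as coordinator for a client request; it would use a lookup like `replicas()` to find which nodes to contact.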

Magic Formula

Write CL + Read CL > RF

Immediate Consistency
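
The formula can be checked with a few lines of Python. This is a small sketch in which consistency levels are expressed as the number of replicas that must respond; the helper names `quorum` and `is_immediately_consistent` are made up for illustration.

```python
def quorum(rf):
    """Number of replicas in a QUORUM for replication factor rf."""
    return rf // 2 + 1


def is_immediately_consistent(write_cl, read_cl, rf):
    """True when every read set must overlap every write set."""
    return write_cl + read_cl > rf


rf = 3
print(is_immediately_consistent(quorum(rf), quorum(rf), rf))  # QUORUM + QUORUM -> True
print(is_immediately_consistent(1, 1, rf))                    # ONE + ONE -> False
print(is_immediately_consistent(rf, 1, rf))                   # ALL + ONE -> True
```

With RF = 3, QUORUM writes and QUORUM reads touch 2 + 2 = 4 replicas, so at least one replica in every read has seen the latest write.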

Keyspace

Table

Partition

Row

Column

Data Modeling

✓ Like an RDBMS, Cassandra uses a Table to store data

✓ Partitions within tables

✓ Rows within partitions (or a single row)

✓ CQL to create tables & query data

✓ Partition keys determine where a partition is found

✓ Clustering keys determine ordering of rows within a partition

Data Modeling

name     age   occupation
Salil    32    Tech Lead
Vishal   25    Software Engineer
Akshay   45    Actor
Sheri    29    Singer

cqlsh:demo> create table user (name text primary key, age int, occupation text);

cqlsh:demo> select * from user WHERE name = 'Vishal';

Example: Single Row Partition

✓ User identified by name (PK)

✓ Single row per partition

✓ RDBMS-like structure

video_id   comment_id   comment
5          1            Nice pic
5          2            Which place?
5          3            lol
6          4            Great!

cqlsh:demo> create table comment (video_id int, comment_id int, comment text, primary key (video_id, comment_id));

cqlsh:demo> select * from comment WHERE Video_id=5;

Example: Multiple Rows Partition

• video_id – partition key

• comment_id – clustering key

* In the real world, use UUIDs instead of int for the primary key
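
As a rough mental model of the comment table, the Python sketch below groups rows by partition key and keeps them sorted by clustering key. The `CommentTable` class is hypothetical and ignores everything else a real table does (timestamps, tombstones, on-disk layout); it only shows why a query on video_id returns the comments already ordered by comment_id.

```python
import bisect
from collections import defaultdict


class CommentTable:
    """Toy model: partition key -> rows kept sorted by clustering key."""

    def __init__(self):
        # video_id -> list of (comment_id, comment), sorted by comment_id
        self.partitions = defaultdict(list)

    def insert(self, video_id, comment_id, comment):
        rows = self.partitions[video_id]
        ids = [cid for cid, _ in rows]
        pos = bisect.bisect_left(ids, comment_id)
        if pos < len(rows) and rows[pos][0] == comment_id:
            rows[pos] = (comment_id, comment)  # same primary key: overwrite (upsert)
        else:
            rows.insert(pos, (comment_id, comment))

    def select(self, video_id):
        """Equivalent in spirit to: select * from comment WHERE video_id = ?"""
        return list(self.partitions.get(video_id, []))


table = CommentTable()
table.insert(5, 3, "lol")
table.insert(5, 1, "Nice pic")
table.insert(6, 4, "Great!")
table.insert(5, 2, "Which place?")
print(table.select(5))  # [(1, 'Nice pic'), (2, 'Which place?'), (3, 'lol')]
```

Rows arrive in any order but are read back in clustering-key order, and all rows for one video live in the same partition.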

Query before data modeling

Denormalize the data

Create multiple views into your data

Cassandra is built for faster writes

Better – as few reads as possible

Data Modeling – Best practices
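
One way to picture "denormalize" and "multiple views" is to write each piece of data once per query you need to serve. The sketch below is a hypothetical Python illustration: the `CommentStore` class, the `user` field, and both view names are assumptions that do not appear in the slides.

```python
from collections import defaultdict


class CommentStore:
    """Toy denormalized store: one 'table' per query pattern."""

    def __init__(self):
        self.comments_by_video = defaultdict(list)  # serves "comments on a video"
        self.comments_by_user = defaultdict(list)   # serves "comments by a user"

    def add_comment(self, video_id, user, comment):
        # Two cheap writes at insert time, so each read touches one partition.
        self.comments_by_video[video_id].append((user, comment))
        self.comments_by_user[user].append((video_id, comment))


store = CommentStore()
store.add_comment(5, "salil", "Nice pic")
store.add_comment(6, "salil", "Great!")
print(store.comments_by_video[5])       # [('salil', 'Nice pic')]
print(store.comments_by_user["salil"])  # [(5, 'Nice pic'), (6, 'Great!')]
```

The trade-off follows the slide: extra writes are accepted because writes are cheap, and each query is answered with as few reads as possible.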

CommitLog – append-only log

Memtables – in-memory tables

SSTables – created after the data flushes to disk

Compaction – process to merge SSTables

Key components of the Write Path
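
The four components can be tied together in a short Python sketch. It is a simplification under stated assumptions: the `Node` class and `memtable_limit` are hypothetical, a flush is triggered by row count rather than memory size, and "newer SSTable wins" stands in for the per-cell timestamps Cassandra actually compares during compaction.

```python
class Node:
    """Toy write path: commit log -> memtable -> SSTable -> compaction."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of writes not yet flushed
        self.memtable = {}     # in-memory table
        self.sstables = []     # immutable tables "on disk", oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. durability first
        self.memtable[key] = value            # 2. then the in-memory table
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as a sorted, immutable SSTable."""
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}
            self.commit_log.clear()  # flushed data no longer needs the log

    def compact(self):
        """Merge all SSTables into one; newer values overwrite older ones."""
        merged = {}
        for sstable in self.sstables:
            merged.update(sstable)
        self.sstables = [dict(sorted(merged.items()))]


node = Node(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    node.write(key, value)
print(len(node.sstables))  # 2
node.compact()
print(node.sstables)       # [{'a': 3, 'b': 2, 'c': 4}]
```

Writes never modify an existing SSTable; stale values are only dropped when compaction merges the files.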

✓ Memtables – in-memory tables

✓ Row Cache – in-memory cache that stores recently read rows

✓ Bloom Filters – report whether a partition key may be found in the corresponding SSTable

✓ Key Caches – in memory (on heap)

✓ Partition Summaries – in memory (on heap)

✓ Partition Indexes – on disk

✓ SSTables – on disk

Key components of the Read Path
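
The role of the Bloom filter in a read can be sketched in Python. This is an illustrative model, not Cassandra's code: the class and function names are hypothetical, the caches, summaries and indexes are left out, and the read returns the first hit instead of merging results by timestamp as a real node would.

```python
import hashlib


class BloomFilter:
    """Answers "definitely not here" or "maybe here"; never a false negative."""

    def __init__(self, size=64, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    def __init__(self, rows):
        self.rows = dict(rows)
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)


def read(key, memtable, sstables):
    """Check the memtable, then SSTables newest first, skipping ruled-out files."""
    if key in memtable:
        return memtable[key]
    for sstable in reversed(sstables):
        if sstable.bloom.might_contain(key) and key in sstable.rows:
            return sstable.rows[key]
    return None


sstables = [SSTable({"Salil": 32}), SSTable({"Vishal": 25})]
print(read("Vishal", {}, sstables))               # 25
print(read("Akshay", {"Akshay": 45}, sstables))   # 45
print(read("Sheri", {}, sstables))                # None
```

The filter lets a read skip most SSTables that cannot contain the key, which is why it sits in front of the on-disk structures.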
