Apache Cassandra Architecture Internals


Upload: bhuvan-rawal

Post on 16-Apr-2017


TRANSCRIPT

APACHE CASSANDRA: Architecture & Internals

BHUVAN RAWAL

SNAPDEAL.COM

CASSANDRA - AN OVERVIEW

NOSQL-DATABASE.ORG

> MASSIVELY SCALABLE

> PARTITIONED ROW STORE

> MASTERLESS ARCHITECTURE

> LINEAR SCALABILITY

> NO SINGLE POINT OF FAILURE

> MULTIPLE DC SUPPORT OUT OF THE BOX


2008: Open sourced by Facebook on Google Code.

2009: Became an Apache Incubator project.

2010: Gained top-level project status at Apache.

Can be adapted to different classes of use cases

GENERAL PURPOSE

Can remain available through the loss of a Node/Rack/DC

AVAILABLE


KEY FEATURES


Seamless distribution across

datacentres across continents

DISTRIBUTED

JVM Heap & GC Algorithms

Compaction Strategy

Key Cache Size

Row Cache

Compression Chunk Size

Speculative Retries

Throughput vs Latency tuning

KEY TUNABLES


Cassandra is the most popular wide column

store - Wikipedia

Deployed by 400+ Fortune-500 Firms 

667 Companies Verified on Siftery

Apple 100,000+ Node Deployment

Netflix - 95% Data on Cassandra

Uber - 20 Cassandra Clusters, soon will be 100

Spotify - 100+ Production Clusters 

SOME USERS


Determines how data is distributed across nodes

Must be the same across the cluster

Ordered Partitioner

Random Partitioner

Murmur3 Partitioner

PARTITIONER
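
To make the partitioner's role concrete, here is a minimal sketch of mapping partition keys to nodes on a token ring. It hashes keys with MD5, in the spirit of the Random Partitioner; the `TokenRing` class, the node names, and the evenly spaced tokens are assumptions made for illustration, not Cassandra's actual implementation.

```python
import bisect
import hashlib


def md5_token(partition_key: str) -> int:
    """Hash a partition key to an integer token (MD5-based, illustrative only)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


class TokenRing:
    """Hypothetical ring: each node owns the token range ending at its own token."""

    def __init__(self, node_tokens: dict[str, int]):
        # Sort (token, node) pairs so that lookups can use binary search.
        self._ring = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in self._ring]

    def owner(self, partition_key: str) -> str:
        token = md5_token(partition_key)
        # First node whose token is >= the key's token; wrap around past the end.
        idx = bisect.bisect_left(self._tokens, token) % len(self._ring)
        return self._ring[idx][1]


# Three made-up nodes with evenly spaced tokens over the 128-bit MD5 space.
SPACE = 2 ** 128
ring = TokenRing({"node-a": SPACE // 3, "node-b": 2 * SPACE // 3, "node-c": SPACE - 1})

for key in ("user:1", "user:2", "user:3"):
    print(key, "->", ring.owner(key))
```

Because every node computes the same token for the same key, any node can work out the owner without a central lookup, which is why the partitioner has to be identical across the cluster.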


Determines each node's placement in the topology (rack and DC)

Allows replicas to be spread widely enough to handle failures

Failure modes: Node -> Rack -> DC -> Region

Tries its best not to place two replicas of the same data on the same rack

SNITCH
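
The rack-avoidance idea can be sketched as follows. The snitch supplies each node's rack, and a rack-aware placement routine then walks the ring and prefers racks it has not used yet. The `Node` tuple, the `place_replicas` function, and the six-node ring are hypothetical; this is a simplified illustration, not Cassandra's actual replication strategy code.

```python
from typing import NamedTuple


class Node(NamedTuple):
    name: str
    rack: str   # assumed to be reported by the snitch
    token: int


def place_replicas(ring: list[Node], start_index: int, replication_factor: int) -> list[Node]:
    """Walk the ring clockwise from start_index, preferring racks not yet used."""
    replicas: list[Node] = []
    skipped: list[Node] = []
    seen_racks: set[str] = set()
    for step in range(len(ring)):
        if len(replicas) == replication_factor:
            break
        node = ring[(start_index + step) % len(ring)]
        if node.rack in seen_racks:
            skipped.append(node)  # same rack as an existing replica: keep as fallback
            continue
        replicas.append(node)
        seen_racks.add(node.rack)
    # Fewer racks than replicas: fall back to the skipped nodes, in ring order.
    for node in skipped:
        if len(replicas) == replication_factor:
            break
        replicas.append(node)
    return replicas


# Made-up ring, already sorted by token: two nodes per rack.
ring_nodes = [
    Node("n1", "rack1", 0), Node("n2", "rack1", 10),
    Node("n3", "rack2", 20), Node("n4", "rack2", 30),
    Node("n5", "rack3", 40), Node("n6", "rack3", 50),
]
print([n.name for n in place_replicas(ring_nodes, 0, 3)])
```

The fallback loop reflects the "tries its best" wording on the slide: when there are fewer racks than replicas, some replicas have to share a rack.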


status

health

tokens

schema version

data size

phi_threshold

GOSSIP PROTOCOL
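
The phi threshold listed above relates to accrual failure detection: instead of a fixed timeout, a node computes a suspicion level (phi) from how late a peer's gossip heartbeat is relative to its usual rhythm. Below is a minimal sketch that assumes exponentially distributed heartbeat intervals. The `PhiAccrualDetector` class is hypothetical, and the threshold of 8 is used for illustration only.

```python
import math
from collections import deque

PHI_THRESHOLD = 8.0  # illustrative threshold; tune per deployment


class PhiAccrualDetector:
    """Hypothetical phi accrual detector for a single peer."""

    def __init__(self, window_size: int = 1000):
        self._intervals = deque(maxlen=window_size)  # recent heartbeat gaps
        self._last_heartbeat = None

    def heartbeat(self, now: float) -> None:
        # Record the gap since the previous heartbeat, then remember this one.
        if self._last_heartbeat is not None:
            self._intervals.append(now - self._last_heartbeat)
        self._last_heartbeat = now

    def phi(self, now: float) -> float:
        if not self._intervals:
            return 0.0  # not enough history to suspect anything
        mean = sum(self._intervals) / len(self._intervals)
        elapsed = now - self._last_heartbeat
        # Exponential model: P(silence > elapsed) = exp(-elapsed / mean),
        # so phi = -log10(P) = elapsed / (mean * ln 10).
        return elapsed / (mean * math.log(10))


detector = PhiAccrualDetector()
for t in (0.0, 1.0, 2.0, 3.0):  # heartbeats arriving once per second
    detector.heartbeat(t)

for now in (4.0, 23.0):
    suspected = detector.phi(now) > PHI_THRESHOLD
    print(f"t={now}: phi={detector.phi(now):.2f} suspected={suspected}")
```

A higher threshold tolerates longer pauses (for example, GC stalls or network jitter) before a peer is marked down, at the cost of slower detection of real failures.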


As with most databases, data model is the key

to successful deployments & scalability

Test thoroughly in a staging environment

Avoid Client Side joins as far as possible

Materialized view - Boon for automated

denormalization

Tune partition size so that oversized partitions do not destabilize the cluster

DATA MODEL


DataStax Driver for Spark:

-> Reads node-local data from Cassandra nodes

-> Support for Hadoop

-> Pig, Hive, Sqoop, Mahout

-> Solr integration

ANALYTICS SUPPORT


-> Memtable

-> SSTable - Sorted String Table

-> Index

-> Partition Summary

-> Bloom Filter

-> Compression

STORAGE
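
How some of these pieces fit together can be shown with a toy write and read path: writes land in an in-memory memtable, a full memtable is flushed to an immutable sorted SSTable, and each SSTable carries a bloom filter so reads can skip tables that cannot contain the key. All class names, the flush limit, and the filter sizing below are assumptions for illustration; the index, partition summary, and compression are left out.

```python
import bisect
import hashlib


class BloomFilter:
    """Tiny bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: str):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    """Immutable, key-sorted run of (key, value) pairs with its own bloom filter."""

    def __init__(self, items: dict[str, str]):
        self.keys = sorted(items)
        self.values = [items[k] for k in self.keys]
        self.bloom = BloomFilter()
        for k in self.keys:
            self.bloom.add(k)

    def get(self, key: str):
        if not self.bloom.might_contain(key):
            return None  # definitely absent: skip the search entirely
        idx = bisect.bisect_left(self.keys, key)
        if idx < len(self.keys) and self.keys[idx] == key:
            return self.values[idx]
        return None


class Store:
    """Memtable in front of a list of flushed SSTables (oldest first)."""

    def __init__(self, memtable_limit: int = 2):
        self.memtable: dict[str, str] = {}
        self.sstables: list[SSTable] = []
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.sstables.append(SSTable(self.memtable))  # flush to a sorted table
            self.memtable = {}

    def get(self, key: str):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest data wins
            value = sstable.get(key)
            if value is not None:
                return value
        return None


store = Store(memtable_limit=2)
store.put("a", "1")
store.put("b", "2")   # second write triggers a flush
store.put("a", "3")   # newer value stays in the memtable
print(store.get("a"), store.get("b"), store.get("missing"))
```

Reading newest-first is what lets a later write shadow an older flushed value without rewriting the older SSTable in place.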


FELLOW DATASTORES

HBASE

RIAK

MONGODB

AEROSPIKE

BIGTABLE

SCYLLA


THANK YOU!  Bhuvan Rawal