cassandra overview

Overview of Cassandra

Upload: sean-murphy

Post on 11-May-2015


Category:

Technology



TRANSCRIPT

Page 1: Cassandra overview

Overview of Cassandra

Page 2: Cassandra overview

Outline
● History/motivation
● Semi-structured data in Cassandra
  ○ CFs and SuperCFs
● Architecture of Cassandra system
  ○ Distribution of content
  ○ Replication of content
  ○ Consistency level
  ○ Node internals
  ○ Gossip
● Thrift API
● Design patterns - denormalization

Page 3: Cassandra overview

History/motivation
● Initially developed by Facebook for Inbox Search
  ○ in late 2007/early 2008
● Designed for
  ○ node failure - commodity hardware
  ○ scale - can increase number of nodes easily to accommodate increasing demand
  ○ fast write access while delivering good read performance
● Combination of Bigtable and Dynamo
● Was operational for over 2 years
  ○ Dropped in favour of HBase

Page 4: Cassandra overview

History/motivation
● Released as open source in July 2008
● Apache liked it
  ○ Became Apache Incubator project in March 2009
  ○ Became Apache top level project in Feb 2010
● Active project with releases every few months
  ○ currently on version 1.1
    ■ production ready, but still evolving

Page 5: Cassandra overview

Why it's interesting (in this context)...
● Has seen significant growth in last couple of years
● Enough deployments to be credible
  ○ Netflix, Ooyala, Digg, Cisco
● Is scalable and robust enough for big data problems
  ○ no single point of failure
● Complex system
  ○ perhaps excessively complex today

Page 6: Cassandra overview

Cassandra - semi structured data
● Column based database
  ○ has similarities to standard RDBMS
● Terminology:
  ○ Keyspace -> database
  ○ ColumnFamily -> table

Page 7: Cassandra overview

Cassandra - semi structured data
● No specific schema is required
  ○ although it is possible to define schema
    ■ can include typing information for parts of schema to minimize data integrity problems
● Rows can have large numbers of columns
  ○ limit on number of columns is 2B
● Column values should not exceed a few MB
● SuperColumns are columns embedded within columns
  ○ third level in a map
  ○ little discussion of SC here
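
The "map" view of the data model can be sketched in a few lines. This is only an illustrative in-memory model, not how Cassandra stores data: the helper names make_keyspace and put, the "Users" column family and the sample values are all assumptions made up for the example.

```python
from collections import defaultdict

def make_keyspace():
    # Hypothetical model: column family -> row key -> column name -> (value, timestamp)
    return defaultdict(lambda: defaultdict(dict))

def put(keyspace, column_family, row_key, column, value, timestamp):
    # Every column carries its own timestamp alongside the value.
    keyspace[column_family][row_key][column] = (value, timestamp)

ks = make_keyspace()
put(ks, "Users", "alice", "city", "Dublin", 1)
put(ks, "Users", "alice", "email", "alice@example.org", 1)
put(ks, "Users", "bob", "city", "Zurich", 2)  # rows need not share the same columns
```

A super column would add one more dictionary level between the row key and the column name.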

Page 8: Cassandra overview

Supercolumns depicted

Page 9: Cassandra overview

Cassandra - secondary indexing
● Columns can be indexed
  ○ so-called 'secondary indexing'
    ■ row keys form the primary index
● Some debate about the merits of secondary indexing in Cassandra
  ○ secondary indexing is an atomic operation
    ■ unlike alternative 'manual' indexing approach
  ○ causes change in thinking regarding NoSQL design
    ■ very similar to classical RDBMS thinking

Page 10: Cassandra overview

Cassandra Architecture
● Cluster configuration typical
● All nodes are peers
  ○ although there are some seeds which should be more reliable, larger nodes
● Peers have common view of tokenspace
  ○ tokenspace is a ring
    ■ of size 2^127
  ○ peers have responsibility for some part of ring
    ■ i.e. some range of tokens within ring
● Row key/keyspace mapped to token
  ○ used to determine which node is responsible for row data
    ■ all row data kept together and stored in node
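
The lookup from token to responsible node can be sketched as a search over a sorted ring. The function name owner, the node names and the evenly spaced tokens are assumptions for illustration; the sketch assumes each node owns the range ending at its own token, with wrap-around at the end of the ring.

```python
import bisect

RING_SIZE = 2 ** 127

def owner(token, ring):
    """ring is a sorted list of (node_token, node_name) pairs.

    A node is taken to own every token greater than its predecessor's
    token and up to (and including) its own token.
    """
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token % RING_SIZE)
    return ring[i % len(ring)][1]  # past the last token: wrap to the first node

ring = [
    (0, "node-a"),
    (RING_SIZE // 4, "node-b"),
    (RING_SIZE // 2, "node-c"),
    (3 * RING_SIZE // 4, "node-d"),
]
```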

Page 11: Cassandra overview

Cassandra - Cluster and Tokenspace

Page 12: Cassandra overview

Cassandra - Data Distribution
● Map from RowKey to token determines data distribution
● RandomPartitioner is most important map
  ○ generates MD5 hash of rowkey
  ○ distributes data evenly over nodes in cluster
  ○ highly preferred solution
  ○ constraint that it is not possible to iterate over rows in key order
● OrderedPartitioner
  ○ generates token based on simple byte mapping of row key
  ○ most probably results in uneven distribution of data
  ○ can be used to iterate over rows in key order
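
The difference between the two mappings can be sketched as below. Both function names are hypothetical, and folding the MD5 digest into the ring with a modulo is an assumption of this sketch; the exact arithmetic Cassandra uses may differ.

```python
import hashlib

def random_partitioner_token(row_key: bytes) -> int:
    # MD5 digest read as an integer, folded into a ring of size 2^127 (assumed folding).
    return int.from_bytes(hashlib.md5(row_key).digest(), "big") % (2 ** 127)

def ordered_partitioner_token(row_key: bytes) -> bytes:
    # The key bytes themselves act as the token, so token order equals key order.
    return row_key

keys = [b"user3", b"user1", b"user2"]
by_key_order = sorted(keys, key=ordered_partitioner_token)
by_hash_order = sorted(keys, key=random_partitioner_token)
```

Hashing scatters neighbouring keys around the ring, which evens out load but loses key order; the ordered mapping keeps neighbours together, which is what allows range scans and also what tends to create hot spots.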

Page 13: Cassandra overview

Cassandra - Data Replication
● Multiple levels of replication supported
  ○ can support arbitrary level of replication
  ○ replication factors specified per keyspace
● Two replication strategies
  ○ RackUnaware
    ■ Makes replicas in next n nodes along token ring
  ○ RackAware
    ■ Makes one replica in remote data centre
    ■ Makes remaining replicas in next nodes along token ring
● Good ring configuration should result in diversity over data centres
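
A minimal sketch of the RackUnaware placement, assuming the first replica lives on the node that owns the token and the rest follow clockwise around the ring. The function name and node names are hypothetical.

```python
def rack_unaware_replicas(primary_index, node_names, replication_factor):
    """Return the nodes holding a row: the owner plus the following nodes on the ring.

    node_names is the list of nodes in ring (token) order.
    """
    count = min(replication_factor, len(node_names))  # cannot have more replicas than nodes
    return [node_names[(primary_index + i) % len(node_names)] for i in range(count)]

nodes = ["node-a", "node-b", "node-c", "node-d"]
replicas = rack_unaware_replicas(3, nodes, 3)
```

Here the row owned by node-d is also stored on node-a and node-b, because the walk wraps around the end of the ring.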

Page 14: Cassandra overview

Cassandra - Consistency Level
● A mechanism to trade off latency with data consistency
  ○ Write case:
    ■ Faster response <-> less sure data written properly
  ○ Read case:
    ■ Faster response <-> less sure most recent data read
● Related to data replication above
  ○ replication factor determines meaningful levels for consistency level

Page 15: Cassandra overview

Cassandra - Consistency Level - Write

Level | Behavior
ANY | Ensure that the write has been written to at least 1 node, including HintedHandoff recipients.
ONE | Ensure that the write has been written to at least 1 replica's commit log and memory table before responding to the client.
TWO | Ensure that the write has been written to at least 2 replicas before responding to the client.
THREE | Ensure that the write has been written to at least 3 replicas before responding to the client.
QUORUM | Ensure that the write has been written to N / 2 + 1 replicas before responding to the client.
LOCAL_QUORUM | Ensure that the write has been written to <ReplicationFactor> / 2 + 1 nodes, within the local datacenter (requires NetworkTopologyStrategy)
EACH_QUORUM | Ensure that the write has been written to <ReplicationFactor> / 2 + 1 nodes in each datacenter (requires NetworkTopologyStrategy)
ALL | Ensure that the write is written to all N replicas before responding to the client. Any unresponsive replicas will fail the operation.

Page 16: Cassandra overview

Cassandra - Consistency Level - Read

Level | Behavior
ANY | Not supported. You probably want ONE instead.
ONE | Will return the record returned by the first replica to respond. A consistency check is always done in a background thread to fix any consistency issues when ConsistencyLevel.ONE is used. This means subsequent calls will have correct data even if the initial read gets an older value. (This is called ReadRepair)
TWO | Will query 2 replicas and return the record with the most recent timestamp. Again, the remaining replicas will be checked in the background.
THREE | Will query 3 replicas and return the record with the most recent timestamp.
QUORUM | Will query all replicas and return the record with the most recent timestamp once at least a majority of replicas (N / 2 + 1) have reported. Again, the remaining replicas will be checked in the background.
LOCAL_QUORUM | Returns the record with the most recent timestamp once a majority of replicas within the local datacenter have replied.
EACH_QUORUM | Returns the record with the most recent timestamp once a majority of replicas within each datacenter have replied.
ALL | Will query all replicas and return the record with the most recent timestamp once all replicas have replied. Any unresponsive replicas will fail the operation.
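
The arithmetic behind the levels can be sketched as follows, reading N / 2 + 1 as integer division. The function names are hypothetical, and only the single-datacenter levels are covered. The last helper uses the usual overlap rule: a read is guaranteed to reach a replica that took the latest write when read replicas plus write replicas exceed the replication factor.

```python
def quorum(replication_factor):
    return replication_factor // 2 + 1

def required_replicas(level, replication_factor):
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError(f"level not covered by this sketch: {level}")

def is_meaningful(level, replication_factor):
    # A level asking for more replicas than exist can never be satisfied.
    return required_replicas(level, replication_factor) <= replication_factor

def read_sees_latest_write(read_level, write_level, replication_factor):
    r = required_replicas(read_level, replication_factor)
    w = required_replicas(write_level, replication_factor)
    return r + w > replication_factor
```

With a replication factor of 3, QUORUM reads combined with QUORUM writes overlap (2 + 2 > 3), while ONE combined with ONE does not (1 + 1 <= 3).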

Page 17: Cassandra overview

Cassandra - Node Internals
● Node comprises
  ○ commit log
    ■ list of pending writes
  ○ memtable
    ■ data written to system resident in memory
  ○ SSTables
    ■ per CF file containing persistent data
● Memtable is flushed to an SSTable when out of space, when it holds too many keys, or after a time period
● SSTables comprise
  ○ Data - sorted strings
  ○ Index, Bloom Filter
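
How the three structures interact on a write can be sketched with a toy node. The class name ToyNode, the memtable_limit parameter and the flush-on-key-count rule are assumptions; real nodes also flush on memory pressure and on a timer, and keep an index and Bloom filter per SSTable.

```python
class ToyNode:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of writes not yet flushed
        self.memtable = {}     # in-memory view of recent writes
        self.sstables = []     # each flush produces one immutable, sorted table
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # log first, for recovery after a crash
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Persist the memtable as a sorted table, then start afresh.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log = []   # flushed writes no longer need replaying

node = ToyNode(memtable_limit=2)
node.write("b", 1)
node.write("a", 2)   # reaches the limit and triggers a flush
node.write("c", 3)
```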

Page 18: Cassandra overview

Cassandra - Node Internals
● Compaction occurs from time to time
  ○ cleans up SSTable
  ○ removes redundant rows
  ○ regenerates indexes
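
Compaction can be pictured as merging several sorted tables into one and keeping only the newest version of each key. This sketch assumes each entry is (key, (value, timestamp)); the function name compact is hypothetical, and deletion markers and index rebuilding are not modelled.

```python
def compact(sstables):
    """Merge tables of (key, (value, timestamp)) entries, newest timestamp wins."""
    merged = {}
    for table in sstables:
        for key, (value, timestamp) in table:
            if key not in merged or timestamp > merged[key][1]:
                merged[key] = (value, timestamp)
    return sorted(merged.items())  # the result is again sorted by key

older = [("a", ("1", 10)), ("b", ("2", 10))]
newer = [("a", ("9", 20))]
compacted = compact([older, newer])
```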

Page 19: Cassandra overview

Cassandra - Behaviour - Write
● Write properties:
  ○ No reads
  ○ No seeks
  ○ Fast!
  ○ Atomic within CF
  ○ Always writable

Page 20: Cassandra overview

Cassandra - Behaviour - Read
● Read Path:
  ○ Any node
  ○ Partitioner
  ○ Wait for R responses
  ○ Wait for N-R responses in background and perform read repair
● Read Properties:
  ○ Read multiple SSTables
  ○ Slower than writes (but still fast)
  ○ Seeks can be mitigated with more RAM
  ○ Scales to billions of rows
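
The read path with read repair can be sketched as below. Replicas are modelled as plain dictionaries mapping a key to (value, timestamp), and the background step is done inline for simplicity; the function name and sample data are assumptions.

```python
def read_with_repair(replicas, key, r):
    """Answer from the first r replicas, then bring every replica up to date."""
    answered = [rep[key] for rep in replicas[:r] if key in rep]
    result = max(answered, key=lambda cell: cell[1]) if answered else None

    # In a real cluster this part happens in the background after the client got its answer.
    known = [rep[key] for rep in replicas if key in rep]
    if known:
        newest = max(known, key=lambda cell: cell[1])
        for rep in replicas:
            if rep.get(key) != newest:
                rep[key] = newest  # read repair: overwrite stale or missing copies
    return result

replicas = [{"k": ("old", 1)}, {"k": ("new", 2)}, {"k": ("old", 1)}]
first = read_with_repair(replicas, "k", r=1)
second = read_with_repair(replicas, "k", r=1)
```

The first read at R = 1 returns the stale value held by the replica that answered; the repair then updates all copies, so the second read returns the newer one.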

Page 21: Cassandra overview

Cassandra - Gossip
● Gossip protocol used to relay information between nodes in cluster
● Proactive communications mechanism to share information
  ○ nodes proactively share what they know with random other nodes
● Token space information exchanged via gossip
● Failure detection based on gossip
  ○ heartbeat mechanism
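
One gossip round can be sketched as every node bumping its own heartbeat and swapping views with one random peer. This is a simplified illustration, not Cassandra's actual message exchange: the function names, the view layout (node name -> highest heartbeat seen) and the three-node cluster are assumptions.

```python
import random

def merge(mine, theirs):
    """Keep the highest heartbeat seen for every node."""
    for node, beat in theirs.items():
        if beat > mine.get(node, -1):
            mine[node] = beat

def gossip_round(views, rng):
    for name in list(views):
        views[name][name] += 1                               # own heartbeat advances
        peer = rng.choice([n for n in views if n != name])   # random other node
        merge(views[peer], views[name])                      # tell the peer what we know
        merge(views[name], views[peer])                      # learn what the peer knows

rng = random.Random(0)
views = {name: {name: 0} for name in ("a", "b", "c")}
gossip_round(views, rng)
```

A node whose heartbeat stops advancing in the views of its peers is a candidate for being marked as failed.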

Page 22: Cassandra overview

Thrift API - basic calls
● insert(key, column_parent, column, consistency_level)
  ○ key is row/keyspace identifier
  ○ column_parent identifies where the column lives
    ■ column family name, optionally with a super column identifier
  ○ column is column data
● get(key, column_path, consistency_level)
  ○ returns a column corresponding to the key
● get_slice(key, column_parent, slice_predicate, consistency_level)
  ○ typically returns set of columns corresponding to key
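
The shape of the three calls can be illustrated with an in-memory stand-in. This is not the Thrift client: ToyStore is a hypothetical class that only mirrors the argument order above, with simplifying assumptions that column_parent is a column family name, column is a (name, value, timestamp) tuple, column_path is a (column family, column name) pair, and slice_predicate is a list of column names. The consistency level is accepted and ignored.

```python
class ToyStore:
    def __init__(self):
        self.rows = {}  # (column_family, key) -> {column_name: (value, timestamp)}

    def insert(self, key, column_parent, column, consistency_level):
        name, value, timestamp = column
        columns = self.rows.setdefault((column_parent, key), {})
        if name not in columns or timestamp >= columns[name][1]:
            columns[name] = (value, timestamp)  # newest timestamp wins

    def get(self, key, column_path, consistency_level):
        column_family, name = column_path
        return self.rows[(column_family, key)][name]

    def get_slice(self, key, column_parent, slice_predicate, consistency_level):
        columns = self.rows.get((column_parent, key), {})
        return {n: columns[n] for n in slice_predicate if n in columns}

store = ToyStore()
store.insert("alice", "Users", ("city", "Dublin", 1), "ONE")
store.insert("alice", "Users", ("email", "alice@example.org", 1), "ONE")
store.insert("alice", "Users", ("city", "Cork", 2), "ONE")
```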

Page 23: Cassandra overview

Thrift API - other operations
● get multiple rows
● delete row
● batch operations
  ○ important for speeding up system
  ○ can batch up mix of add, insert and delete operations
● keyspace and cluster management

Page 24: Cassandra overview

Denormalization
● Cassandra requires query oriented design
  ○ determine queries first, design data models accordingly
  ○ in contrast to standard RDBMS
    ■ normalize data at design time
    ■ construct arbitrary queries usually based on joins
● Quite fundamental difference in approach
  ○ typically results in quite different data models
● Common use of valueless columns
  ○ column name contains data
    ■ good for time series data
  ○ can have very many columns in given row
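
The valueless-column pattern can be sketched as a row whose column names are the data points themselves, kept in sorted order so that a range of names can be sliced out. WideRow and the date strings are hypothetical; the sketch relies on the fact that columns within a row are kept sorted by name.

```python
import bisect

class WideRow:
    def __init__(self):
        self.names = []  # column names only, kept sorted; the values would be empty

    def add(self, column_name):
        i = bisect.bisect_left(self.names, column_name)
        if i == len(self.names) or self.names[i] != column_name:
            self.names.insert(i, column_name)  # ignore duplicates, keep order

    def slice(self, start, finish):
        lo = bisect.bisect_left(self.names, start)
        hi = bisect.bisect_right(self.names, finish)
        return self.names[lo:hi]

logins = WideRow()
for day in ("2012-05-03", "2012-05-01", "2012-05-02", "2012-05-01"):
    logins.add(day)
```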

Page 25: Cassandra overview

Denormalization
● Standard SQL
  ○ SELECT * FROM USER WHERE CITY = 'Dublin'
● Typically create CF which groups users by city
  ○ row key is city identifier
  ○ columns are user IDs
● Can get UID of all users in given city by querying this CF
  ○ give city as row-key
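
The users-by-city layout can be sketched with two dictionaries standing in for two column families. The names users, users_by_city, add_user and user_ids_in, as well as the sample users, are assumptions for illustration.

```python
users = {}          # CF "Users": row key = user ID, columns = user attributes
users_by_city = {}  # CF "UsersByCity": row key = city, column names = user IDs

def add_user(user_id, name, city):
    # Both column families are written on every insert; that is the denormalization.
    users[user_id] = {"name": name, "city": city}
    users_by_city.setdefault(city, {})[user_id] = ""  # valueless column

def user_ids_in(city):
    # One row lookup replaces the WHERE CITY = ... scan.
    return sorted(users_by_city.get(city, {}))

add_user("u1", "Aoife", "Dublin")
add_user("u2", "Lukas", "Zurich")
add_user("u3", "Conor", "Dublin")
```

The cost of this design is on the write side: a user who moves city has to be updated in both column families by the application.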

Page 26: Cassandra overview

Other considerations...
● SuperColumnFamily
  ○ when is it useful?
● Multi data centre deployments
  ○ Cassandra can leverage topology to maximize resiliency
● Reaction to node failure
● Reconfiguration of system
  ○ introduction of new nodes into existing system
● It is a complex system with many working parts