Cassandra Overview

Upload: btoddb

Post on 17-Jun-2015

DESCRIPTION

A short Apache Cassandra overview presented at the December Seattle Apache Cassandra meetup at Disney.

TRANSCRIPT

  • 1. Cassandra Overview

  • 2. What Is It?
      - It is a persistent database, but not an RDBMS (more on the API later)
      - It can run as a single instance or as part of a cluster
      - All nodes are equal: no master, no slaves
      - The cluster can be distributed within a single DC or across multiple DCs
      - Multiple DCs can be Active-Active for performance or Active-Passive for DR

  • 3. Simple API
      - Get, Put, Delete, all by key
      - Batch put and delete save wire time
      - Range queries (iterate over a sequence of keys)
      - Target individual columns within a row (Get and Put)
      - Native integration available for Hadoop MapReduce
      - CQL: a SQL-like language

  • 4. Consistent Hash Ring
      - Conceptually, all nodes in a cluster are on a ring of hash values (tokens)
      - Each node is assigned a token range on the ring
      - A key's hash (token) places it on the ring, within a specific node's token range
      - The hash is consistent, meaning the location of data is consistent and predictable

  • 5. [Ring diagram: nodes N1 to N8 arranged on a ring, with token ranges R1 to R8]
      - Token space: 0 => 2^127 (RandomPartitioner)
      - K1 => H1 (token)
      - H1 => R4 (primary = N4)
      - N = 3
      - RS = N4, N5, N6

  • 6. Replication
      - Replication Factor (N) determines how many replicas exist for each key
      - Location of replicas is determined by the consistent hash ring and the partitioner
      - Generally, N=3 means data will be placed on nodes N, N+1, N+2 on the ring (this can vary based on placement strategy, but is predictable)
      - Powerful because no query is required to find the node(s) containing a key

  • 7. Consistency
      - Consistency is eventual in Cassandra: it will always work to create N (Replication Factor) replicas
      - Write Consistency (W) defines how many replicas are guaranteed per put request
      - Read Consistency (R) defines how many replicas are consulted before responding
      - W and R are tunable per request, therefore consistency is tunable as well

  • 8. Data Modeling Example
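The ring, replication, and consistency slides can be tied together in a few lines of Python. This is only a sketch under stated assumptions, not Cassandra's implementation: `token_for`, `Ring`, and `quorum_overlaps` are hypothetical names, tokens are spaced evenly (one per node), the MD5-modulo hash only approximates RandomPartitioner, and the `R + W > N` overlap rule comes from the Dynamo-style quorum literature rather than from the slides themselves.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # token space shown on the ring diagram


def token_for(key):
    """Map a key to a token with MD5 (an approximation, not Cassandra's exact code)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


class Ring:
    """Hypothetical ring with evenly spaced tokens, one per node."""

    def __init__(self, node_names):
        self.nodes = list(node_names)
        step = RING_SIZE // len(self.nodes)
        # Node i owns the token range that ends at tokens[i].
        self.tokens = [(i + 1) * step for i in range(len(self.nodes))]

    def primary_index(self, token):
        """Index of the first node whose token is >= the key's token (wraps around)."""
        return bisect.bisect_left(self.tokens, token) % len(self.nodes)

    def replicas(self, key, n=3):
        """Primary node plus the next n-1 nodes clockwise on the ring."""
        start = self.primary_index(token_for(key))
        count = min(n, len(self.nodes))
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(count)]


def quorum_overlaps(r, w, n):
    """True when every read set must intersect every write set (R + W > N)."""
    return r + w > n


ring = Ring(["N%d" % i for i in range(1, 9)])
print(ring.replicas("K1"))       # three consecutive nodes on the ring
print(quorum_overlaps(2, 2, 3))  # quorum reads and writes with N=3
```

Because the replica list is computed from the key alone, any client or node can locate the data without a lookup query, which is the point the Replication slide makes.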
  • 9. Schema Overview
      - Keyspace (database) contains one or more ColumnFamilies
      - ColumnFamily (table) contains zero or more rows
      - A Row must contain one or more columns
      - ColumnFamilies are indexed by key (rows, but more like a hash map)
      - Rows within the same CF may have a different number of columns, and different column names!!

  • 10. Example
      - UserData (Keyspace)
      - UserAttributes (ColumnFamily, sort=UTF8)
          - Ellie: Age=4, Sex=Female, Weight=32
          - Sammy: Age=2, Sex=Male
          - Henry: Age=2, EyeColor=Blue, Height=30, Sex=Male
      - UserAccessLog (ColumnFamily, sort=Long)
          - Sammy: 7/20/2010, 7/22/2010
          - Henry: 7/22/2010, 7/23/2010, 7/24/2010

  • 11. Columns
      - Column names (not values) are sorted, per key
      - 32 bit limit to number of columns per key
      - Entire column must fit in RAM, on one machine
      - Can retrieve/update/delete all columns, columns by name, or a range of columns
      - A key (or row) must contain at least one Column, otherwise it is considered deleted

  • 12. Thrift Read Methods
      - get: return a single column for a single key
      - get_slice: return multiple columns for a single key
      - multiget_slice: return multiple columns for a list of keys
      - get_range_slices: return multiple columns for a range of keys
      - Most use a high-level client (Hector, Pycassa, etc.)

  • 13. Thrift Write Methods
      - insert: insert/update a single column for a single key (most call this method "put")
      - batch_mutate: insert/update/remove multiple columns for multiple keys in multiple ColumnFamilies
      - remove: remove a single column (or entire row) for a single key

  • 14. Useful References
      - http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
      - http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
      - http://wiki.apache.org/cassandra/
          - "A description of the cassandra datamodel"
          - "Architecture Overview"
          - Operations
          - "Articles and Presentations"
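The schema, column, and Thrift method slides can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not the Thrift API or any real client: the `ColumnFamily` class is hypothetical, it borrows only the method names `insert`, `get`, `get_slice`, and `remove` from the slides, and it uses plain string ordering to stand in for the UTF8 sort. The rows are the UserAttributes data from the example slide.

```python
class ColumnFamily:
    """Toy stand-in for a ColumnFamily: row key -> {column name: value}."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, column, value):
        """Insert or update a single column for a single key."""
        self.rows.setdefault(key, {})[column] = value

    def get(self, key, column):
        """Return a single column value for a single key."""
        return self.rows[key][column]

    def get_slice(self, key, start=None, finish=None):
        """Columns of one row, sorted by name, optionally limited to [start, finish]."""
        row = self.rows.get(key, {})
        names = sorted(
            name for name in row
            if (start is None or name >= start)
            and (finish is None or name <= finish)
        )
        return [(name, row[name]) for name in names]

    def remove(self, key, column=None):
        """Remove one column, or the entire row when no column is given."""
        row = self.rows.get(key)
        if row is None:
            return
        if column is None:
            del self.rows[key]
            return
        row.pop(column, None)
        if not row:
            # A row with no columns left is treated as deleted.
            del self.rows[key]


user_attributes = ColumnFamily()
for key, columns in {
    "Ellie": {"Age": 4, "Sex": "Female", "Weight": 32},
    "Sammy": {"Age": 2, "Sex": "Male"},
    "Henry": {"Age": 2, "EyeColor": "Blue", "Height": 30, "Sex": "Male"},
}.items():
    for name, value in columns.items():
        user_attributes.insert(key, name, value)

print(user_attributes.get_slice("Henry"))            # all columns, sorted by name
print(user_attributes.get_slice("Ellie", "B", "T"))  # only names between "B" and "T"
```

Rows here hold different column sets, and slices come back in column-name order, which mirrors the points made on the Schema Overview and Columns slides.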