Introduction to Apache Cassandra
Saratov Open IT Teach Talk. Damir Yaraev: Introduction to Apache Cassandra. In this talk, Damir explains when and why it makes sense to move from time-tested relational databases to the NoSQL solutions that have recently become popular. As an example, he looks at Apache Cassandra, a wide-column NoSQL database.
© 2014 Grid Dynamics
Created for BigData Community by Dmitry Yaraev
Apache Cassandra: When and Why
Agenda
1. When RDBMS Becomes a Bottleneck
2. Concepts of the NoSQL Paradigm
3. Variety of NoSQL Databases
4. Why Apache Cassandra?
5. Essential Use Cases of Cassandra
6. Bad Usage Patterns
What Is Offered by RDBMS
● Mature technology with common standards
● Easy migration from one engine to another
● Data model corresponds to the real world
● Structured Query Language (SQL)
● ACID transactions
Bottlenecked by RDBMS
● Horizontal scalability
● Schema support and migration
● Server and maintenance cost
NoSQL :: History
● First mention in 1998
● Class of distributed databases
● Stands for "Not Only SQL"
NoSQL :: Features
● Simple schema without relations
● Good horizontal scalability
● At most two of the following guarantees (CAP theorem):
  ○ Consistency
  ○ Availability
  ○ Partition tolerance
NoSQL :: CAP Theorem
NoSQL :: Storage Types
Questions?
Cassandra :: What Is It?
● Wide-column distributed data store
● The latest version at the time of this talk is 2.1.2 (released the same month)
● Proven in production (Instagram, Spotify, eBay, and many other major IT companies)
Cassandra :: Origin
● Originally created at Facebook
● Open-sourced in 2008
● Apache Incubator project in early 2009
● Top-level Apache project in February 2010
Cassandra :: Features
● High scalability
● Tunable consistency
● Cross-datacenter replication
● Query language (CQL)
● Drivers for a variety of languages
● Lightweight transactions
● Indexing
Cassandra :: Data Types
● Primitive types
● Arbitrary bytes (blob)
● Collections (list, map, set)
● Tuples (tuple)
● User-defined types
Cassandra :: Data Model
● Keyspace
● ColumnFamily
● Row
● Column
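One rough way to picture this hierarchy (an illustration, not Cassandra's on-disk format) is as nested maps: a keyspace plays roughly the role of a database schema, a column family maps row keys to rows, and each row is a sparse, name-sorted map of columns to values. The `Keyspace` class and its `put` and `get_row` methods are hypothetical names used only for this sketch.

```python
from collections import defaultdict


class Keyspace:
    """Hypothetical model: column family -> row key -> {column name: value}."""

    def __init__(self, name):
        self.name = name
        # Each column family maps a row key to a dict of that row's columns.
        self.column_families = defaultdict(lambda: defaultdict(dict))

    def put(self, cf, row_key, column, value):
        # Rows are sparse: a row stores only the columns actually written.
        self.column_families[cf][row_key][column] = value

    def get_row(self, cf, row_key):
        # Return the row's columns sorted by name; Cassandra likewise keeps
        # the columns of a row sorted by their names.
        row = self.column_families[cf].get(row_key, {})
        return sorted(row.items())


ks = Keyspace("music")
ks.put("songs", "song-1", "title", "One")
ks.put("songs", "song-1", "artist", "Metallica")
ks.put("songs", "song-2", "title", "Fade to Black")  # this row has no "artist"
print(ks.get_row("songs", "song-1"))
print(ks.get_row("songs", "song-2"))
```

Two rows of the same column family can carry different sets of columns, which is what the slides mean by a simple schema without relations.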
Cassandra :: Data Model
Cassandra :: ColumnFamily
Cassandra :: CQL3
● SQL-like syntax
● Three types of statements:
  ○ data definition statements
  ○ data manipulation statements
  ○ data lookup (query) statements
● Prepared statements
Cassandra :: Example Queries
CREATE TABLE songs (
  id uuid PRIMARY KEY,
  title text,
  album text,
  artist text,
  data blob
);
SELECT * FROM songs WHERE artist = 'Metallica'; -- returns an error: artist is neither part of the primary key nor indexed
CREATE INDEX ON songs (artist);
SELECT * FROM songs WHERE artist = 'Metallica'; -- succeeds once the index exists
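Why the first SELECT fails can be sketched with a toy in-memory table. This is a conceptual illustration only: Cassandra's secondary indexes are stored locally on each node and are not implemented like this. Rows are reachable directly only by primary key. Finding rows by artist would need either a scan of the whole table, which Cassandra rejects here, or an index mapping artist values back to keys. `SongTable`, `create_index`, and `select_by` are hypothetical names.

```python
import uuid


class SongTable:
    """Toy table that stores rows by primary key only (hypothetical sketch)."""

    def __init__(self):
        self.rows = {}     # primary key (id) -> row dict
        self.indexes = {}  # column name -> {value: set of primary keys}

    def insert(self, row):
        self.rows[row["id"]] = row
        # Add the row to every existing index (overwrites are not handled here).
        for column, index in self.indexes.items():
            index.setdefault(row.get(column), set()).add(row["id"])

    def create_index(self, column):
        # Inverted index: column value -> primary keys of matching rows.
        index = {}
        for key, row in self.rows.items():
            index.setdefault(row.get(column), set()).add(key)
        self.indexes[column] = index

    def select_by(self, column, value):
        if column == "id":
            # Primary-key lookup is always allowed.
            return [self.rows[value]] if value in self.rows else []
        if column not in self.indexes:
            # Mirrors the CQL error: nothing to restrict the lookup on.
            raise ValueError(f"cannot restrict on non-indexed column {column!r}")
        return [self.rows[key] for key in self.indexes[column].get(value, set())]


songs = SongTable()
songs.insert({"id": uuid.uuid4(), "title": "One", "artist": "Metallica"})
songs.insert({"id": uuid.uuid4(), "title": "Yesterday", "artist": "The Beatles"})

try:
    songs.select_by("artist", "Metallica")
except ValueError as err:
    print("error:", err)

songs.create_index("artist")
print([row["title"] for row in songs.select_by("artist", "Metallica")])
```

Like the CQL above, the lookup by artist only succeeds after the index has been created.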
Cassandra :: Data Distribution
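Cassandra spreads data by hashing each row's partition key into a token and placing the nodes on a ring of tokens. A row belongs to the first node whose token is greater than or equal to the row's token, wrapping around past the largest token. The sketch below makes several assumptions. It uses an MD5-based token, as in the older RandomPartitioner (the default Murmur3Partitioner uses a different hash). It gives each node a single token, with no virtual nodes. The node names and the `TokenRing` class are hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # MD5-based token, similar in spirit to RandomPartitioner (assumption).
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    def __init__(self, nodes):
        # One token per node, derived here from the node name (hypothetical;
        # real clusters assign tokens through configuration or vnodes).
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def owner(self, partition_key: str) -> str:
        # First node whose token >= the key's token, wrapping to the start.
        i = bisect.bisect_left(self.tokens, token(partition_key))
        return self.ring[i % len(self.ring)][1]


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"])
for key in ["song-1", "song-2", "song-3"]:
    print(key, "->", ring.owner(key))
```

Because the owner depends only on the key's hash, any node can compute where a row lives without a central directory.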
Cassandra :: Replication
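With a replication factor RF, each row is stored on RF nodes. Under SimpleStrategy, the first replica goes to the node that owns the row's token on the ring, and the remaining replicas go to the next nodes clockwise. The sketch assumes a single data center, one token per node, and an MD5-based token. NetworkTopologyStrategy, which places replicas per data center and rack, is not modeled. `replicas_for` is a hypothetical helper, not a Cassandra API.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # MD5-based token (assumption, as in the older RandomPartitioner).
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def replicas_for(partition_key: str, nodes: list[str], rf: int) -> list[str]:
    """SimpleStrategy-style placement: owner plus the next rf - 1 nodes clockwise."""
    if not 1 <= rf <= len(nodes):
        raise ValueError("replication factor must be between 1 and the node count")
    ring = sorted((token(n), n) for n in nodes)
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token(partition_key))
    # Walk clockwise from the owner; with one token per node, each step
    # lands on a distinct node.
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]


nodes = ["node-a", "node-b", "node-c", "node-d"]
print(replicas_for("song-1", nodes, rf=3))
```

Losing any single node still leaves RF - 1 copies of every row it held, which is the basis for the "no single point of failure" claim later in the deck.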
Cassandra :: Eventual Consistency
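Replicas can temporarily disagree, for example when a write has reached only some of them. Cassandra reconciles conflicting versions of a column by keeping the one with the newest write timestamp (last write wins). Mechanisms such as read repair then bring stale replicas up to date. The minimal sketch below shows that rule. The names `Cell` and `reconcile` are hypothetical, plain integers stand in for real timestamps, and timestamp ties are not handled the way Cassandra handles them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int  # write timestamp (plain integers here; Cassandra uses microseconds)


def reconcile(versions):
    # Last write wins: keep the version with the highest timestamp.
    return max(versions, key=lambda c: c.timestamp)


# Three replicas of the same column after a write reached only two of them.
replicas = {
    "node-a": Cell("Master of Puppets", timestamp=2),
    "node-b": Cell("Master of Puppets", timestamp=2),
    "node-c": Cell("Ride the Lightning", timestamp=1),  # stale replica
}
winner = reconcile(replicas.values())

# Read-repair-style step: push the winning version to every replica.
for node in replicas:
    replicas[node] = winner
print(winner.value)
```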
Cassandra :: Tunable Consistency
Cassandra :: Consistency Levels
● Defines the condition for a read or write to succeed, typically how many replicas must respond
● Multiple options:
  ○ ONE
  ○ ALL
  ○ QUORUM
  ○ LOCAL_QUORUM
  ○ SERIAL
  ○ …
● Can be specified per request
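For a single data center, the number of replica responses a level requires follows from the replication factor RF. ONE needs 1, ALL needs RF, and QUORUM needs floor(RF / 2) + 1. Suppose reads wait for R replicas and writes for W replicas. If R + W > RF, every read overlaps at least one replica that holds the latest acknowledged write, and this overlap is what makes consistency tunable. The sketch covers only these three levels. LOCAL_QUORUM, SERIAL, and the rest are not modeled, and `required_acks` and `reads_see_latest_write` are hypothetical helpers.

```python
def required_acks(level: str, rf: int) -> int:
    """Replicas that must respond for a request to succeed (single-DC sketch)."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"level {level!r} is not modeled in this sketch")


def reads_see_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    # Read and write replica sets are guaranteed to overlap when R + W > RF.
    return required_acks(read_level, rf) + required_acks(write_level, rf) > rf


rf = 3
for read, write in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read, write, reads_see_latest_write(read, write, rf))
# ONE ONE False
# QUORUM QUORUM True
# ONE ALL True
```

With RF = 3, QUORUM reads and writes both tolerate one unavailable replica while still overlapping. ONE/ALL shifts the cost entirely onto writes.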
Cassandra :: Consistency (Quorum)
Cassandra :: Consistency (ONE)
Cassandra :: Consistency (ONE)
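The contrast between ONE and QUORUM can be simulated under a few assumptions. RF is 3. At the moment of the read, a write has reached only the replicas its level waited for. The coordinator may ask replicas in any order, and the sketch tries all orders. With writes and reads at ONE, a read can return the stale value. With both at QUORUM, every read includes an updated replica. The replica layout and the `read` helper are hypothetical.

```python
from itertools import permutations


def read(replicas, acks, order):
    """Ask replicas in `order`, use the first `acks` answers,
    and return the value with the newest timestamp (last write wins)."""
    answers = [replicas[i] for i in order[:acks]]
    value, _ = max(answers, key=lambda cell: cell[1])
    return value


RF = 3
QUORUM = RF // 2 + 1  # 2 when RF = 3

# Each replica holds (value, write timestamp).
# Write at ONE: only replica 0 has applied "v2" so far.
after_write_one = [("v2", 2), ("v1", 1), ("v1", 1)]
# Write at QUORUM: replicas 0 and 1 have applied "v2".
after_write_quorum = [("v2", 2), ("v2", 2), ("v1", 1)]

orders = list(permutations(range(RF)))
one_one = {read(after_write_one, 1, o) for o in orders}
quorum_quorum = {read(after_write_quorum, QUORUM, o) for o in orders}
print("write ONE, read ONE:", sorted(one_one))               # ['v1', 'v2']
print("write QUORUM, read QUORUM:", sorted(quorum_quorum))   # ['v2']
```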
Cassandra :: Use Cases
● Large data sets and simple scaling
● Good fit for semi-structured data
● Fault tolerance (no single point of failure)
● High write throughput
Cassandra :: Limitations
● Not suitable for large blobs (> 64 MB)
● Workloads with more reads than writes where low read latency is critical
● Workloads that require ACID transactions
Thanks!