Cassandra presentation
Introduction to Cassandra
Excellence in Software Engineering 2
AGENDA
Introduction
Architecture
Partitioning & Replication
Data management
Data model
Introduction
INTRODUCTION: SELECTED CASES
Who uses Cassandra?
eBay has Cassandra supporting multiple applications (Social Signals, Hunch, and many time series use cases) with clusters spanning several data centers.
Netflix is using Cassandra on AWS as a key infrastructure component of its globally distributed streaming product.
Shazam uses a Cassandra cluster to power its recommendation system.
And many others; see http://www.datastax.com/cassandrausers
INTRODUCTION: MAIN ADVANTAGES
The main advantages of Cassandra are:
• Fast writes.
• Tunable consistency.
• Decentralization.
• Integration with Hadoop.
Architecture
ARCHITECTURE: FAST WRITES
Cassandra is very fast on writes because it uses a log-structured merge tree (LSM tree).
Process of inserting a new record into Cassandra
ARCHITECTURE: FAST WRITES
How the LSM tree is implemented: memtables and SSTables
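Commit log: every write is first appended to the commit log for durability and then stored in an in-memory memtable; when a memtable is full, it is flushed to disk as a new SSTable.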
Each SSTable has a bloom filter associated with it. The bloom filter is used to check whether a requested row key might exist in the SSTable before doing any disk seeks; it can give false positives but never false negatives.
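A toy bloom filter in Python shows why a negative answer lets a read skip an SSTable entirely. This is a sketch under assumptions: the bit-array size, the number of hash functions, and the salted SHA-256 hashing are illustrative choices, not Cassandra's actual implementation.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter: k salted SHA-256 hashes over an m-bit array (illustrative only)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key: str):
        # Derive k bit positions by hashing the key with k different salts.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent": the SSTable can be skipped without a disk seek.
        # True means "possibly present": it may be a false positive, so the SSTable is read.
        return all(self.bits[pos] for pos in self._positions(key))


sstable_filter = BloomFilter()
for row_key in ["jsmith", "adoe", "bwayne"]:
    sstable_filter.add(row_key)

print(sstable_filter.might_contain("jsmith"))  # True: added keys are never reported absent
```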
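SSTables are immutable, so a row is typically spread across multiple SSTable files until compaction merges them.
Deleted data is not immediately removed from disk: a delete writes a marker called a tombstone, and the data is purged during compaction once gc_grace_seconds has passed. If a replica missed the delete and is not repaired before the tombstone is purged, the deleted column can reappear.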
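The write path above can be sketched as a toy in-memory LSM store. Lists and dicts stand in for the commit log, memtable, and SSTable files; the class name and the two-key flush threshold are hypothetical, and this sketch purges tombstones immediately on compaction instead of waiting for gc_grace_seconds.

```python
TOMBSTONE = object()  # marker written by a delete instead of removing data


class ToyLSMStore:
    """In-memory model of an LSM write path: commit log -> memtable -> SSTables."""

    def __init__(self, memtable_limit: int = 2) -> None:
        self.commit_log = []  # append-only list standing in for the on-disk commit log
        self.memtable = {}    # mutable, in-memory
        self.sstables = []    # immutable, key-sorted tuples, oldest first
        self.memtable_limit = memtable_limit  # hypothetical flush threshold (number of keys)

    def write(self, key, value) -> None:
        self.commit_log.append((key, value))  # 1. durability first
        self.memtable[key] = value            # 2. then the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key) -> None:
        # A delete is just another write: it records a tombstone.
        self.write(key, TOMBSTONE)

    def flush(self) -> None:
        # 3. The memtable becomes a new immutable, key-sorted SSTable.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs replaying

    def read(self, key):
        # Newest data wins: check the memtable, then SSTables from newest to oldest.
        for source in [self.memtable] + [dict(t) for t in reversed(self.sstables)]:
            if key in source:
                value = source[key]
                return None if value is TOMBSTONE else value
        return None

    def compact(self) -> None:
        # Merge all SSTables; later tables override earlier ones and
        # tombstoned keys are dropped (Cassandra waits gc_grace_seconds first).
        merged = {}
        for table in self.sstables:
            merged.update(table)
        live = {k: v for k, v in merged.items() if v is not TOMBSTONE}
        self.sstables = [tuple(sorted(live.items()))] if live else []


store = ToyLSMStore()
store.write("jsmith", "ch@ngem3a")
store.write("adoe", "s3cret")   # second key fills the memtable and triggers a flush
store.delete("jsmith")          # tombstone lands in the new memtable
print(store.read("jsmith"))     # None
print(store.read("adoe"))       # s3cret
store.flush()
store.compact()
print(store.sstables)           # [(('adoe', 's3cret'),)]
```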
ARCHITECTURE: NETWORK ARCHITECTURE
• All nodes are peers (there is no master).
• A client specifies a set of Cassandra nodes and connects to the first live node.
• Nodes exchange cluster state using a gossip protocol.
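A simplified sketch of how gossip spreads cluster state without a master: each node keeps the newest heartbeat version it has seen for every node and merges views with one random peer per round. Cassandra's real gossiper exchanges digests in a three-message round and also tracks generation numbers; the class and function names here are hypothetical.

```python
import random


class GossipNode:
    """Tracks the highest heartbeat version this node has seen for each node."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.heartbeats = {name: 0}  # node name -> heartbeat version

    def tick(self) -> None:
        # A live node periodically bumps its own heartbeat.
        self.heartbeats[self.name] += 1

    def gossip_with(self, peer: "GossipNode") -> None:
        # Merge both views, keeping the newest version of every entry,
        # and give both nodes the merged view.
        merged = dict(self.heartbeats)
        for node, version in peer.heartbeats.items():
            merged[node] = max(version, merged.get(node, -1))
        self.heartbeats = dict(merged)
        peer.heartbeats = dict(merged)


def run_rounds(nodes, rounds: int, seed: int = 0) -> None:
    # Each round, every node ticks and gossips with one randomly chosen peer.
    rng = random.Random(seed)
    for _ in range(rounds):
        for node in nodes:
            node.tick()
            peer = rng.choice([n for n in nodes if n is not node])
            node.gossip_with(peer)


cluster = [GossipNode(f"node{i}") for i in range(5)]
run_rounds(cluster, rounds=10)
# No master is involved: each node's view comes only from peer exchanges.
print({n.name: sorted(n.heartbeats) for n in cluster})
```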
Partitioning & replication
PARTITIONING & REPLICATION: DATA PARTITIONING
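Partitioner: determines where the first replica of each row lives in the ring, based on the row key.
• RandomPartitioner: the default strategy; hashes each row key with MD5 so that data, and therefore load, is spread roughly evenly across all nodes.
• ByteOrderedPartitioner: orders rows lexically by key bytes, which allows range scans, but is not recommended because it tends to produce hot spots and unbalanced load.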
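The difference between the two partitioners can be sketched in Python. The token function is modeled on RandomPartitioner's MD5-based hashing; the rings, node names, and keys are hypothetical, and each node is assumed to own the token range ending at its own token.

```python
import bisect
import hashlib


def random_partitioner_token(key: bytes) -> int:
    # Modeled on RandomPartitioner: the MD5 digest of the key, read as a signed
    # 128-bit integer and made non-negative, gives a token in [0, 2**127].
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def byte_ordered_token(key: bytes) -> bytes:
    # ByteOrderedPartitioner: the token is the raw key bytes, so key order is kept.
    return key


def first_replica(ring, token):
    # `ring` is a list of (token, node) pairs sorted by token. A node owns the
    # range (previous token, its token], so the owner is the first node whose
    # token is >= the key's token, wrapping around to the start of the ring.
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token)
    return ring[i % len(ring)][1]


# Hypothetical 4-node ring with evenly spaced RandomPartitioner tokens.
rp_ring = [(i * 2**125, f"node{i}") for i in range(1, 5)]
# Hypothetical ByteOrderedPartitioner ring split on key prefixes.
bo_ring = [(b"g", "nodeA"), (b"n", "nodeB"), (b"t", "nodeC"), (b"\xff", "nodeD")]

keys = [b"apple", b"apricot", b"avocado"]
# Hashing places keys independently of their byte order:
print([first_replica(rp_ring, random_partitioner_token(k)) for k in keys])
# Byte ordering keeps neighbouring keys together (range scans, but a potential hot spot):
print([first_replica(bo_ring, byte_ordered_token(k)) for k in keys])  # ['nodeA', 'nodeA', 'nodeA']
```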
PARTITIONING & REPLICATION: REPLICATION
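Replication is defined by a replication factor (how many copies of each row are kept) plus a replica placement strategy.
Replica placement strategies:
• SimpleStrategy: the default strategy; does not take network topology into account and places additional replicas on the next nodes clockwise in the ring.
• NetworkTopologyStrategy: preferred when you know the network map (data centers and racks) of your nodes.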
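A sketch of replica placement on a token ring, assuming the first replica is located as in the partitioning step. The NetworkTopologyStrategy version is deliberately simplified: it ignores racks, which the real strategy also takes into account. Ring tokens, node names, and data centers are hypothetical.

```python
import bisect


def ring_start(ring, token) -> int:
    # Index of the node owning `token`: first node token >= token, wrapping.
    tokens = [t for t, _, _ in ring]
    return bisect.bisect_left(tokens, token) % len(ring)


def simple_strategy(ring, token, replication_factor: int):
    # SimpleStrategy: first replica from the partitioner, then the next nodes
    # clockwise around the ring, ignoring data centers and racks.
    start = ring_start(ring, token)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


def network_topology_strategy(ring, token, replicas_per_dc):
    # Simplified NetworkTopologyStrategy: walk clockwise once around the ring and
    # take nodes until each data center has its requested number of replicas
    # (rack awareness omitted).
    start = ring_start(ring, token)
    remaining = dict(replicas_per_dc)
    chosen = []
    for i in range(len(ring)):
        _, node, dc = ring[(start + i) % len(ring)]
        if remaining.get(dc, 0) > 0:
            chosen.append(node)
            remaining[dc] -= 1
    return chosen


# Hypothetical ring of (token, node, data center), sorted by token.
ring = [
    (10, "n1", "dc1"), (20, "n2", "dc1"), (30, "n3", "dc1"),
    (40, "n4", "dc2"), (50, "n5", "dc2"), (60, "n6", "dc2"),
]

print(simple_strategy(ring, 5, 3))                                # ['n1', 'n2', 'n3']: all in dc1
print(network_topology_strategy(ring, 5, {"dc1": 2, "dc2": 1}))  # ['n1', 'n2', 'n4']
```

With SimpleStrategy all three copies end up in one data center, which is why the topology-aware strategy is preferred once the network map is known.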
Data management
DATA MANAGEMENT: DATA ACCESSING
Reads and writes:
• Tunable consistency: the consistency level specifies how many replicas must respond to a read or write request before it succeeds (writes are still sent to all replicas).
• Batches: a batch applies one consistency level and one client-supplied timestamp to all columns written by the statements in the batch.
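The arithmetic behind tunable consistency can be sketched as follows, for a single data center and only the ONE, QUORUM, and ALL levels (Cassandra defines more). With replication factor RF, a read at level R and a write at level W are guaranteed to overlap on at least one replica when R + W > RF. The function names are hypothetical.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Replicas that must respond at a given consistency level (single data center)."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1  # a strict majority
    if level == "ALL":
        return replication_factor
    raise ValueError(f"level not covered by this sketch: {level}")


def read_sees_latest_write(read_level: str, write_level: str, replication_factor: int) -> bool:
    # If R + W > RF, every read set overlaps every write set in at least one
    # replica, so a read contacts a replica holding the latest acknowledged
    # write (the newest timestamp then wins).
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


RF = 3
for read_level, write_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read_level, write_level, read_sees_latest_write(read_level, write_level, RF))
# ONE ONE False
# QUORUM QUORUM True
# ONE ALL True
```

Lower levels trade this guarantee for latency and availability, which is the "tunable" part.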
DATA MANAGEMENT: ACID
• Atomicity – writes are atomic at row level.
• Consistency – tunable consistency.
• Isolation – a write to a row is not visible to other clients until it is complete.
• Durability – writes are durable.
• Replicas are kept consistent through read repair, anti-entropy node repair, and hinted handoff.
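Read repair and hinted handoff can be sketched with a toy coordinator that resolves conflicts by write timestamp (last write wins). This is a simplified model: it reads every live replica instead of only as many as the consistency level requires, keeps hints in memory, and the class names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)  # key -> (value, write timestamp)


class Coordinator:
    """Toy coordinator showing hinted handoff and read repair."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []  # (replica name, key, value, timestamp) for replicas that were down

    @staticmethod
    def _apply(replica, key, value, ts):
        # Overwrite only if the incoming write is newer (last write wins).
        current = replica.data.get(key)
        if current is None or ts > current[1]:
            replica.data[key] = (value, ts)

    def write(self, key, value, timestamp):
        for r in self.replicas:
            if r.up:
                self._apply(r, key, value, timestamp)
            else:
                # Hinted handoff: remember the write for the unavailable replica.
                self.hints.append((r.name, key, value, timestamp))

    def replay_hints(self):
        pending = []
        for name, key, value, ts in self.hints:
            replica = next(r for r in self.replicas if r.name == name)
            if replica.up:
                self._apply(replica, key, value, ts)
            else:
                pending.append((name, key, value, ts))
        self.hints = pending

    def read(self, key):
        live = [r for r in self.replicas if r.up]
        answers = [r.data[key] for r in live if key in r.data]
        if not answers:
            return None
        newest = max(answers, key=lambda vt: vt[1])
        # Read repair: push the newest version to every stale live replica.
        for r in live:
            self._apply(r, key, *newest)
        return newest[0]


a, b, c = Replica("a"), Replica("b"), Replica("c")
coord = Coordinator([a, b, c])
coord.write("jsmith", "ch@ngem3a", timestamp=1)
c.up = False
coord.write("jsmith", "n3wpass", timestamp=2)  # c is down: a hint is stored
c.up = True
coord.replay_hints()                           # hinted handoff brings c up to date
print(c.data["jsmith"])                        # ('n3wpass', 2)
b.data["jsmith"] = ("ch@ngem3a", 1)            # simulate a replica that is still stale
print(coord.read("jsmith"))                    # n3wpass
print(b.data["jsmith"])                        # ('n3wpass', 2): fixed by read repair
```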
Data model
DATA MODEL: CASSANDRA'S DATA MODEL
Relational databases: you design the schema based on entities and relationships.
Cassandra: you design the schema based on the queries you want to perform.
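A small Python sketch of query-driven design, with dicts standing in for tables: each query gets its own table keyed by what the query looks up, and data is duplicated across those tables instead of being joined at read time. The two queries, table names, and sample data are hypothetical.

```python
# Hypothetical queries the application needs:
#   Q1: fetch a user by user_name
#   Q2: list the users living in a given state
# Plain dicts stand in for two Cassandra tables, one per query.
users_by_name = {}   # partition key: user_name
users_by_state = {}  # partition key: state -> set of user_name values


def add_user(user_name: str, password: str, state: str) -> None:
    # Denormalization: one logical insert is written to every table that serves a query.
    users_by_name[user_name] = {"password": password, "state": state}
    users_by_state.setdefault(state, set()).add(user_name)


def users_in_state(state: str) -> list:
    # Q2 is a single partition lookup; sorted() mimics clustering order within the partition.
    return sorted(users_by_state.get(state, set()))


add_user("jsmith", "ch@ngem3a", "TX")
add_user("adoe", "s3cret", "CA")
add_user("bwayne", "batm4n", "TX")

print(users_by_name["adoe"]["state"])  # CA
print(users_in_state("TX"))            # ['bwayne', 'jsmith']
```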
DATA MODEL: INDEXES
An index is a data structure that allows for fast, efficient lookup of data matching a given condition.
Primary key – the unique key used to identify each row in a table.
Secondary indexes – indexes on column values other than the primary key.
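Conceptually, a secondary index is an inverted map from a column value to the primary keys of the rows that contain it. The sketch below models that with a Python dict; the data is hypothetical. In CQL such an index would be created with CREATE INDEX ON users (state); Cassandra's built-in secondary indexes are local, so each node indexes only the rows it stores, and this sketch corresponds to a single node.

```python
from collections import defaultdict

# Base table: primary key (user_name) -> row. Data is hypothetical.
users = {
    "jsmith": {"password": "ch@ngem3a", "state": "TX"},
    "adoe": {"password": "s3cret", "state": "CA"},
    "bwayne": {"password": "batm4n", "state": "TX"},
}

# Secondary index on `state`: column value -> set of primary keys.
state_index = defaultdict(set)
for user_name, row in users.items():
    state_index[row["state"]].add(user_name)


def find_by_state(state):
    # Without the index this would need a full scan of `users`; with it,
    # we look up the matching primary keys and then fetch those rows.
    return {name: users[name] for name in sorted(state_index.get(state, ()))}


print(list(find_by_state("TX")))  # ['bwayne', 'jsmith']
```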
DATA MODEL: CQL3
cqlsh> INSERT INTO users (user_name, password)
   ... VALUES ('jsmith', 'ch@ngem3a');
cqlsh> SELECT * FROM users WHERE user_name='jsmith';

 user_name | password  | state
-----------+-----------+-------
    jsmith | ch@ngem3a |  null
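The null in the state column follows from how CQL INSERT works: it writes only the columns listed and, without IF NOT EXISTS, acts as an upsert, so inserting an existing primary key updates the row instead of failing. A toy Python model of that behavior (names hypothetical):

```python
# Toy model of the users table: partition key -> {column: value}.
users = {}
COLUMNS = ("user_name", "password", "state")


def insert(row):
    # INSERT behaves as an upsert: it writes only the columns given,
    # creating the row if needed and overwriting existing values otherwise.
    users.setdefault(row["user_name"], {}).update(row)


def select(user_name):
    # Columns that were never written come back as None (shown as null by cqlsh).
    stored = users.get(user_name)
    if stored is None:
        return None
    return {col: stored.get(col) for col in COLUMNS}


insert({"user_name": "jsmith", "password": "ch@ngem3a"})
print(select("jsmith"))
# {'user_name': 'jsmith', 'password': 'ch@ngem3a', 'state': None}
insert({"user_name": "jsmith", "state": "TX"})  # a second INSERT updates the row, no error
print(select("jsmith"))
# {'user_name': 'jsmith', 'password': 'ch@ngem3a', 'state': 'TX'}
```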
THANK YOU!