Cassandra presentation

Excellence in Software Engineering Cassandra How Stuff Works Sergey Enin ([email protected])

Uploaded by: sergey-enin

Posted on 15-Jan-2015

Category: Technology

DESCRIPTION

Introduction to Cassandra

TRANSCRIPT

Page 1: Cassandra presentation

Excellence in Software Engineering

Cassandra: How Stuff Works

Sergey Enin ([email protected])

AGENDA

• Introduction
• Architecture
• Partitioning & Replication
• Data management
• Data model

Introduction

INTRODUCTION: SELECTED CASES

Who uses Cassandra?

eBay has Cassandra supporting multiple applications (Social Signals, Hunch, and many time series use cases) with clusters spanning several data centers.

Netflix is using Cassandra on AWS as a key infrastructure component of its globally distributed streaming product.

Shazam uses a Cassandra cluster to power its recommendation system.

And many others; see http://www.datastax.com/cassandrausers

INTRODUCTION: MAIN ADVANTAGES

The main advantages of Cassandra are:

• Fast writes.

• Tunable consistency.

• Decentralization.

• Integration with Hadoop.

Architecture

ARCHITECTURE: FAST WRITES

Cassandra is very fast at writes because it uses a log-structured merge tree (LSM tree).

Figure: the process of inserting a new record into Cassandra.

ARCHITECTURE: FAST WRITES

How the LSM tree works: memtables and SSTables

Commit log – all data is written to the commit log for durability.

Each SSTable has a Bloom filter associated with it. The Bloom filter is used to check whether a requested row key might exist in the SSTable before doing any disk seeks; a negative answer is definite, so SSTables that cannot contain the key are skipped.
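A minimal Bloom filter sketch, to show why a "no" answer lets a read skip an SSTable while a "yes" still requires a disk read. The bit-array size, the number of hashes, and the use of salted SHA-256 are illustrative assumptions, not Cassandra's actual filter implementation or sizing.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-slot bit array."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key: str):
        # Derive k positions by salting SHA-256 with the hash index (an assumption).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        # Called once per row key when the SSTable is written.
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False: the key is definitely not in this SSTable, so no disk seek is needed.
        # True: the key may be present (false positives are possible), so read the SSTable.
        return all(self.bits[pos] for pos in self._positions(key))
```

On a read, an SSTable whose filter returns False for the requested key can be skipped entirely; only SSTables that return True are actually searched.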

SSTables are immutable. A row is typically stored across multiple SSTable files.

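Deleted data is not immediately removed from disk; instead, a delete writes a marker called a tombstone. A deleted column can reappear, for example if a replica that missed the delete is not repaired before the tombstone is purged.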
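The write path described above (commit log, memtable, immutable SSTables, tombstones) can be sketched as a toy single-node store. Everything here is a simplified assumption: the memtable "size limit" counts keys, the commit log is an in-memory list, and compaction and replication are left out.

```python
from types import MappingProxyType

TOMBSTONE = object()  # marker written by a delete in place of the old value


class TinyLSM:
    """Toy single-node LSM store: commit log + memtable + immutable SSTables."""

    def __init__(self, memtable_limit: int = 2):
        self.commit_log = []   # append-only; stands in for the on-disk commit log
        self.memtable = {}     # mutable, in-memory
        self.sstables = []     # read-only snapshots, oldest first
        self.memtable_limit = memtable_limit

    def _write(self, key, value):
        self.commit_log.append((key, value))  # 1. append to the commit log for durability
        self.memtable[key] = value            # 2. update the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. flush when the memtable is "full"

    def put(self, key, value):
        self._write(key, value)

    def delete(self, key):
        # Deletes are writes too: nothing is removed from existing SSTables.
        self._write(key, TOMBSTONE)

    def flush(self):
        # The memtable becomes a new immutable SSTable, sorted by key.
        self.sstables.append(MappingProxyType(dict(sorted(self.memtable.items()))))
        self.memtable = {}

    def get(self, key):
        # Newest data wins: check the memtable, then SSTables from newest to oldest.
        for table in [self.memtable, *reversed(self.sstables)]:
            if key in table:
                value = table[key]
                return None if value is TOMBSTONE else value
        return None
```

Because writes only append and never modify existing files, an old value can remain in an older SSTable after a delete; only the newer tombstone hides it on reads.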

ARCHITECTURE: NETWORK ARCHITECTURE

• All nodes are peers (there is no master).

• A client specifies a set of Cassandra nodes and connects to the first live node.

• Nodes use a gossip protocol to exchange state information with each other.
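A toy simulation of push-pull gossip, to show how peer-to-peer exchanges spread cluster state without any master. This is a sketch under simplified assumptions: each node's state is a single heartbeat version, each node contacts one random peer per round, and the node names and seed are hypothetical. Cassandra's real gossiper exchanges richer, versioned state.

```python
import random


def gossip_round(views, rng):
    """One round: each node exchanges its view with one randomly chosen peer."""
    nodes = list(views)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        # Push-pull merge: keep the highest heartbeat version seen for every node.
        merged = dict(views[node])
        for name, version in views[peer].items():
            merged[name] = max(merged.get(name, 0), version)
        views[node] = merged
        views[peer] = dict(merged)


def simulate(node_names, seed=42, max_rounds=50):
    """Run rounds until every node knows about every other node."""
    rng = random.Random(seed)
    # Initially each node only knows its own heartbeat version (1).
    views = {name: {name: 1} for name in node_names}
    for round_no in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(len(view) == len(node_names) for view in views.values()):
            return round_no, views
    return None, views
```

Calling `simulate(["n1", "n2", "n3", "n4", "n5"])` returns the round in which every node's view covered the whole cluster; no node ever needed a central coordinator to learn about the others.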

Partitioning & replication

PARTITIONING & REPLICATION: DATA PARTITIONING

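Partitioner – determines where the first replica of each row is placed in the ring.

• RandomPartitioner – hashes row keys with MD5, so rows, and therefore load, are spread roughly evenly across all nodes. It was the default before Cassandra 1.2; newer versions default to Murmur3Partitioner, which distributes data the same way using a faster hash.

• ByteOrderedPartitioner – orders rows lexically by key bytes. This allows range scans over row keys but easily leads to hot spots and unbalanced load, so it is not recommended.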
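A sketch of how a hashing partitioner picks the first replica on a token ring. It assumes an MD5-based token as RandomPartitioner uses, but the token arithmetic is simplified (the raw 128-bit digest is used, whereas Cassandra's exact token range differs), and the node names and evenly spaced node tokens are hypothetical.

```python
import bisect
import hashlib


def md5_token(key: str) -> int:
    # Hash the row key with MD5 and read the digest as an unsigned integer token.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    def __init__(self, node_tokens: dict[str, int]):
        # Sort nodes by token so the ring can be searched and walked clockwise.
        self.entries = sorted((token, node) for node, token in node_tokens.items())
        self.tokens = [token for token, _ in self.entries]

    def owner_of_token(self, token: int) -> str:
        # The first node whose token is >= the key's token owns it; past the
        # last node, wrap around to the first node in the ring.
        idx = bisect.bisect_left(self.tokens, token)
        return self.entries[idx % len(self.entries)][1]

    def first_replica(self, key: str) -> str:
        return self.owner_of_token(md5_token(key))


# Four hypothetical nodes with evenly spaced tokens over the 128-bit space.
ring = Ring({f"node{i}": i * 2**128 // 4 for i in range(4)})
```

Because MD5 output is close to uniform, keys land on the four nodes in roughly equal numbers. A byte-ordered partitioner would use the key bytes themselves as the token instead, which keeps neighbouring keys together (enabling range scans) but lets popular key ranges overload a single node.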

PARTITIONING & REPLICATION: REPLICATION

Replication = replication factor + replica placement strategy

The replication factor is the number of replicas of each row stored in the cluster.

Replica placement strategies:

• SimpleStrategy – the default strategy; does not take network topology into account.

• NetworkTopologyStrategy – preferred when you know the network layout of your nodes (data centers and racks).
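The two placement strategies can be sketched as walks around the same token ring. The ring is assumed to be a list of `(token, node)` pairs already sorted by token, and all node and data center names are hypothetical. The NetworkTopologyStrategy sketch deliberately ignores racks; the real strategy also tries to spread replicas across racks within each data center.

```python
import bisect


def ring_walk(ring, key_token):
    """Yield nodes clockwise, starting at the first node whose token is >= key_token."""
    tokens = [token for token, _ in ring]
    start = bisect.bisect_left(tokens, key_token)
    for i in range(len(ring)):
        yield ring[(start + i) % len(ring)][1]


def simple_strategy(ring, key_token, rf):
    # First replica from the partitioner, remaining rf - 1 replicas on the next
    # nodes clockwise, regardless of data center or rack.
    return list(ring_walk(ring, key_token))[:rf]


def network_topology_strategy(ring, key_token, dc_of, rf_per_dc):
    # Walk the same ring, but fill a separate replica quota for each data center.
    # (Rack awareness is deliberately left out of this sketch.)
    replicas, placed = [], {dc: 0 for dc in rf_per_dc}
    for node in ring_walk(ring, key_token):
        dc = dc_of[node]
        if placed.get(dc, 0) < rf_per_dc.get(dc, 0):
            replicas.append(node)
            placed[dc] += 1
    return replicas
```

For example, with nodes `a`, `c` in `dc1` and `b`, `d` in `dc2`, SimpleStrategy with replication factor 2 may put both copies in one data center, while NetworkTopologyStrategy with one replica per data center always puts one copy in each.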

Data management

DATA MANAGEMENT: DATA ACCESSING

Reads and writes:

• Tunable consistency. The consistency level specifies how many replicas must respond to a read or write request before it is considered successful (a write is still sent to all replicas).

• Batches – a batch applies a single consistency level and a single client-supplied timestamp to all columns written by the statements in the batch.
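A small sketch of the arithmetic behind tunable consistency in a single data center. Only the ONE, QUORUM, and ALL levels are modelled (Cassandra offers more), and QUORUM is taken as a strict majority of the replication factor. When the read and write replica counts sum to more than the replication factor, every read set overlaps every write set, so a read reaches at least one replica that acknowledged the latest write.

```python
def replicas_required(level: str, rf: int) -> int:
    """Replica acknowledgements a request waits for, in a single data center."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # a strict majority of the replicas
    if level == "ALL":
        return rf
    raise ValueError(f"level not modelled in this sketch: {level}")


def read_overlaps_write(read_level: str, write_level: str, rf: int) -> bool:
    """True if every read replica set must intersect every write replica set (R + W > RF)."""
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf
```

With a replication factor of 3, QUORUM reads plus QUORUM writes (2 + 2 > 3) overlap, while ONE plus ONE (1 + 1) does not; the trade-off is that higher levels wait for more replicas.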

DATA MANAGEMENT: ACID

• Atomicity – writes are atomic at row level.

• Consistency – tunable consistency.

• Isolation – writes are invisible until they are complete.

• Durability – writes are durable: every write is recorded in the commit log.

• Replicas are kept consistent by read repair, anti-entropy node repair, and hinted handoff.
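A simplified sketch of the read repair idea: read a key from the replicas, pick the newest value by write timestamp (last write wins), and push it back to any replica that returned an older value or nothing. This assumes the coordinator compares full values from every replica; Cassandra's real read path is more economical (for example, it can compare digests instead of full values), and the replica names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    value: str
    timestamp: int  # client-supplied write timestamp


def read_with_repair(replicas: dict, key: str):
    """Read key from every replica, return the newest cell, and repair stale replicas."""
    responses = {name: store.get(key) for name, store in replicas.items()}
    present = [cell for cell in responses.values() if cell is not None]
    if not present:
        return None
    newest = max(present, key=lambda cell: cell.timestamp)  # last write wins
    for name, cell in responses.items():
        if cell is None or cell.timestamp < newest.timestamp:
            replicas[name][key] = newest  # write the newest version back
    return newest
```

After one read through `read_with_repair`, all contacted replicas hold the same newest cell, which is how reads gradually heal replicas that missed a write.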

Data model

DATA MODEL: CASSANDRA'S DATA MODEL

Relational databases – you design the schema based on entities and relationships.

Cassandra – you design the schema based on the queries you want to perform.
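A sketch of query-first design using in-memory dictionaries as stand-ins for tables. The two queries ("get a user by user_name" and "list the users in a state") and the table names are hypothetical; the point is that each query gets its own table keyed by what it filters on, and the same data is written to both instead of being joined at read time.

```python
from collections import defaultdict


class QueryFirstUsers:
    """Hypothetical query-first model: one table per query, rows duplicated across them."""

    def __init__(self):
        # Table for query 1, "get a user by user_name": partitioned by user_name.
        self.users_by_name = {}
        # Table for query 2, "list users in a state": partitioned by state,
        # with user_name inside each partition.
        self.users_by_state = defaultdict(dict)

    def add_user(self, user_name, password, state):
        # Denormalization: one logical write goes to every table that serves a query.
        self.users_by_name[user_name] = {"password": password, "state": state}
        self.users_by_state[state][user_name] = {"password": password}

    def get_user(self, user_name):
        return self.users_by_name.get(user_name)  # single-partition lookup

    def users_in_state(self, state):
        # Also a single-partition lookup; sorted to mimic ordering by user_name.
        return sorted(self.users_by_state.get(state, {}))
```

Both reads touch exactly one partition, at the cost of extra writes and duplicated data; a relational design would instead store users once and filter or join at query time.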

DATA MODEL: INDEXES

An index is a data structure that allows for fast, efficient lookup of data matching a given condition.

Primary key – the unique key used to identify each row in a table.

Secondary indexes – indexes on column values other than the primary key.
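A minimal in-memory sketch of the difference between the two kinds of lookup: rows are stored by primary key, and a secondary index maps a column value back to the primary keys that currently have it. The class and column names are hypothetical, and Cassandra stores and distributes its indexes differently; this only shows the idea, including keeping the index correct when an indexed value changes.

```python
from collections import defaultdict


class IndexedTable:
    def __init__(self, indexed_column):
        self.rows = {}                   # primary key -> row (dict of columns)
        self.indexed_column = indexed_column
        self.index = defaultdict(set)    # column value -> set of primary keys

    def upsert(self, pk, row):
        old = self.rows.get(pk)
        if old is not None and self.indexed_column in old:
            # Drop the stale index entry before the value may change.
            self.index[old[self.indexed_column]].discard(pk)
        merged = {**(old or {}), **row}  # like a CQL INSERT: columns are merged
        self.rows[pk] = merged
        if self.indexed_column in merged:
            self.index[merged[self.indexed_column]].add(pk)

    def get(self, pk):
        return self.rows.get(pk)  # primary-key lookup

    def find_by(self, value):
        return sorted(self.index.get(value, set()))  # secondary-index lookup
```

A primary-key lookup goes straight to the row, while a secondary-index lookup first finds the matching primary keys and then the rows; every write to an indexed column also has to update the index.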

DATA MODEL: CQL3

cqlsh> INSERT INTO users (user_name, password)
       VALUES ('jsmith', 'ch@ngem3a');

cqlsh> SELECT * FROM users WHERE user_name='jsmith';

 user_name | password  | state
-----------+-----------+-------
    jsmith | ch@ngem3a |  null

THANK YOU!