cassandra in action - apacheconarchive.apachecon.com/.../cassandra/11:00-cassandra...cassandra...

39
Cassandra in Action Yuki Morishita Software Developer@DataStax / Apache Cassandra Committer 1 ApacheCon NA 2013

Upload: others

Post on 20-May-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

Cassandra in Action

Yuki MorishitaSoftware Developer@DataStax / Apache Cassandra Committer

1

ApacheCon NA 2013

Page 2: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax2

Page 3: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

Application/Use Case• Social Signals: like/want/own features for eBay product

and item pages• Hunch taste graph for eBay users and items• Many time series use cases

Why Cassandra? • Multi-datacenter• Scalable• Write performance• Distributed counters• Hadoop support

ACE3

eBay

Page 4: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

Time series data

4

Page 5: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

Multi-Datacenter Support

5

Page 6: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

Distributed counters

6

Page 7: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

Hadoop support

7

Page 8: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

Application/Use Case• Meet the data management needs of user facing

applications across The Walt Disney Company with a single platform

Why Cassandra? • DataStax Enterprise can tackle real-time and search

functions in the same cluster• Scalability• 24x7 uptime

NDI8

Disney

Page 9: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

Multitenancy

9

Page 10: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

Multi-tenancy

10

Page 11: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

Enterprise search

11

Page 12: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

Application/Use Case• General purpose backend for large scale highly available

cloud based web services supporting Netflix Streaming

Why Cassandra? • Highly available, highly robust and no schema

change downtime• Highly scalable, optimized for SSD• Much lower cost than previous Oracle and SimpleDB

implementations• Flexible data model• Ability to directly influence/implement OSS feature set• Supports local and wide area distributed operations,

spanning US and Europe

RCE12

Netflix

Page 13: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

Optimized for SSD

13

Page 14: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

Open source

14

Page 15: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

•High performance •Massively Scalable• Reliable/Available

Use case patterns

15

Page 16: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

Cassandra Scalability

16

“In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments with a linear increasing throughput from 1 to 12 nodes.”

Page 17: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

Classic partitioning with SPOF

master

slave

slavepartition 1 partition 2 partition 3 partition 4

router

client17

Page 18: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

Fully distributed, no SPOF

client

p1

p1

p1p3

p6

18

Page 19: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

Primary Key as “Partition Key”

Partitioning

jim

carol

johnny

suzy

age: 36 car: camaro gender: M

age: 37 car: subaru gender: F

age:12 gender: M

age:10 gender: F

19

Page 20: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

jim

carol

johnny

suzy

PK

5e02739678...

a9a0198010...

f4eb27cea7...

78b421309e...

Hashed Value

MD5* hash operation yields a 128-bit number for keysof any size.

20

Page 21: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

Node A

Node D Node C

Node B

21

The “Token Ring”

Page 22: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

jim 5e02739678...

carol a9a0198010...

johnny f4eb27cea7...

suzy 78b421309e...

Start End

A 0xc000000000..1 0x0000000000..0

B 0x0000000000..1 0x4000000000..0

C 0x4000000000..1 0x8000000000..0

D 0x8000000000..1 0xc000000000..0

22

Page 23: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

jim 5e02739678...

carol a9a0198010...

johnny f4eb27cea7...

suzy 78b421309e...

23

Start End

A 0xc000000000..1 0x0000000000..0

B 0x0000000000..1 0x4000000000..0

C 0x4000000000..1 0x8000000000..0

D 0x8000000000..1 0xc000000000..0

Page 24: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

jim 5e02739678...

carol a9a0198010...

johnny f4eb27cea7...

suzy 78b421309e...

24

Start End

A 0xc000000000..1 0x0000000000..0

B 0x0000000000..1 0x4000000000..0

C 0x4000000000..1 0x8000000000..0

D 0x8000000000..1 0xc000000000..0

Page 25: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

jim 5e02739678...

carol a9a0198010...

johnny f4eb27cea7...

suzy 78b421309e...

25

Start End

A 0xc000000000..1 0x0000000000..0

B 0x0000000000..1 0x4000000000..0

C 0x4000000000..1 0x8000000000..0

D 0x8000000000..1 0xc000000000..0

Page 26: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

jim 5e02739678...

carol a9a0198010...

johnny f4eb27cea7...

suzy 78b421309e...

26

Start End

A 0xc000000000..1 0x0000000000..0

B 0x0000000000..1 0x4000000000..0

C 0x4000000000..1 0x8000000000..0

D 0x8000000000..1 0xc000000000..0

Page 27: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

Node A

Node D Node C

Node B

carol a9a0198010...

27

Replication

Page 28: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

Node A

Node D Node C

Node B

carol a9a0198010...

Replication Factor = 2

28

Page 29: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

Node A

Node D Node C

Node B

carol a9a0198010...

29

Replication Factor = 3

Page 30: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

Tunable Consistency

•Consistency Level• READ•ONE, QUORUM, LOCAL_QUORUM, EACH_QUORUM, ALL

•WRITE•ANY, ONE, QUORUM, LOCAL_QUORUM, EACH_QUORUM, ALL

30

Page 31: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax31

Virtual Nodes

withoutVnodes

withVnodes

Page 32: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax32

Virtual Nodes

Cluster ofheterogeneous

machines

Page 33: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

Virtual Nodes

33

Page 34: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2012 DataStax

CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int);

INSERT INTO users (id, name, state, birth_date) VALUES (‘49290170-817f-11e2-9e96-0800200c9a66’, ‘john’, ‘Texas’, 1990);

SELECT * FROM users WHERE state=‘Texas’ AND birth_date > 1950;

CQL - Cassandra Query Language

Page 35: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2013 DataStax

Strictly “realtime” focused

•No joins•No subqueries•No aggregation functions* or GROUP BY•ORDER BY?

35

Page 36: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2012 DataStax

CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int);

CREATE TABLE users_addresses ( user_id uuid REFERENCES users, email text);

SELECT *FROM users NATURAL JOIN users_addresses;

Collections

Page 37: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2012 DataStax

CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int);

CREATE TABLE users_addresses ( user_id uuid REFERENCES users, email text);

SELECT *FROM users NATURAL JOIN users_addresses;

Collections

X

Page 38: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

©2012 DataStax

CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int, email_addresses set<text>);

UPDATE usersSET email_addresses = email_addresses + {‘[email protected]’, ‘[email protected]’};

Collections

Page 39: Cassandra in Action - ApacheConarchive.apachecon.com/.../Cassandra/11:00-Cassandra...Cassandra Scalability 16 “In terms of scalability, there is a clear winner throughout our experiments

39

Question ?

Feel free to contact me later if you have one [email protected] yukim (IRC, twitter)