KillrVideo: Data Modeling Evolved (Patrick McFadin, DataStax) | Cassandra Summit 2016

Upload: datastax

Post on 16-Apr-2017

KillrVideo: Data Modeling Evolved

Patrick McFadin
Chief Evangelist for Apache Cassandra, DataStax
@PatrickMcFadin

"Sherman, set the Wayback Machine for 2012."

"But Mr. Peabody! That's before CQL3!"

"History is rarely pretty, Sherman."

Thrift Insert!

Thrift Select!

CQL 3

KillrVideo

https://killrvideo.github.io/

CQL 3.0 - Cassandra 1.2

• Goodbye CQL 2.0!
• Custom secondary indexes
• Empty IN

CQL 3.1 - Cassandra 2.0

• Aliases
• CREATE <table> IF NOT EXISTS
• INSERT IF NOT EXISTS
• UPDATE IF
• DELETE IF EXISTS
• IN supports clustering columns

LWT (lightweight transactions): INSERT IF NOT EXISTS, UPDATE IF, DELETE IF EXISTS
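A minimal sketch of the conditional statements listed above. It assumes the users table that appears later in the SASI slides; the UUID and column values are illustrative only.

```cql
-- Assumption: the users table from the SASI slides; values are illustrative.
CREATE TABLE IF NOT EXISTS users (
    userid uuid,
    firstname varchar,
    lastname varchar,
    email text,
    created_date timestamp,
    PRIMARY KEY (userid)
);

-- Insert only if no row with this primary key exists yet.
INSERT INTO users (userid, firstname, lastname)
VALUES (9761d3d7-7fbd-4269-9988-6cfd4e188678, 'Patrick', 'McFadin')
IF NOT EXISTS;

-- Update only if the stored value matches the condition.
UPDATE users SET lastname = 'McFadin'
WHERE userid = 9761d3d7-7fbd-4269-9988-6cfd4e188678
IF firstname = 'Patrick';

-- Delete only if the row exists.
DELETE FROM users
WHERE userid = 9761d3d7-7fbd-4269-9988-6cfd4e188678
IF EXISTS;
```

Each conditional statement returns an [applied] column that tells the client whether the condition held and the write took effect.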

CQL 3.2 - Cassandra 2.1

• User Defined Types
• Collection Indexing
• Indexes can use contains
• Tuples?
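As a rough illustration of collection indexing and CONTAINS, the sketch below assumes the videos table from the User Defined Types slides, with its tags set<varchar> and preview_thumbnails map<text,text> columns. The index names and search values are hypothetical.

```cql
-- Assumption: the videos table from the User Defined Types slides.
-- Index names and search values are hypothetical.

-- Index the elements of a set column, then search by element.
CREATE INDEX IF NOT EXISTS videos_tags_idx ON videos (tags);

SELECT videoid, name
FROM videos
WHERE tags CONTAINS 'cassandra';

-- Index the keys of a map column, then search by key.
CREATE INDEX IF NOT EXISTS videos_thumbnail_keys_idx
    ON videos (KEYS(preview_thumbnails));

SELECT videoid, name
FROM videos
WHERE preview_thumbnails CONTAINS KEY '10';
```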

User Defined Types

CREATE TYPE video_metadata (
    height int,
    width int,
    video_bit_rate set<text>,
    encoding text
);

User Defined Types

CREATE TABLE videos (
    videoid uuid,
    userid uuid,
    name varchar,
    description varchar,
    location text,
    location_type int,
    preview_thumbnails map<text,text>,
    tags set<varchar>,
    metadata set<frozen<video_metadata>>,
    added_date timestamp,
    PRIMARY KEY (videoid)
);
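To show how the type is used once the table exists, here is a sketch of writing and reading a row whose metadata column holds one video_metadata value. The video name, tags, and metadata values are made up for illustration.

```cql
-- Illustrative values only; the schema is the one shown on the slide.
INSERT INTO videos (videoid, name, tags, metadata, added_date)
VALUES (
    99051fe9-6a9c-46c2-b949-38ef78858dd0,
    'Example video',
    {'cassandra', 'cql'},
    -- A set containing one UDT literal; the nested set is video_bit_rate.
    { { height: 480, width: 640, video_bit_rate: {'1000kbps'}, encoding: 'MP4' } },
    '2016-09-07'
);

SELECT name, metadata
FROM videos
WHERE videoid = 99051fe9-6a9c-46c2-b949-38ef78858dd0;
```

Because the type is frozen inside the set, each metadata element is written and replaced as a whole value rather than field by field.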

CQL 3.3 - Cassandra 2.2

• Date and Time are now types
• TinyInt and SmallInt
• User Defined Functions
• Aggregates
• User Defined Aggregates
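A small sketch of the new date, time, tinyint, and smallint types in one place. The video_playback_event table and its columns are hypothetical and are not part of the KillrVideo schema shown in these slides.

```cql
-- Hypothetical table, used only to illustrate the new types.
CREATE TABLE IF NOT EXISTS video_playback_event (
    videoid uuid,
    event_date date,          -- calendar date without a time component
    event_time time,          -- time of day without a date component
    event_type tinyint,       -- 1-byte integer
    seconds_watched smallint, -- 2-byte integer
    PRIMARY KEY ((videoid, event_date), event_time)
);

INSERT INTO video_playback_event
    (videoid, event_date, event_time, event_type, seconds_watched)
VALUES
    (99051fe9-6a9c-46c2-b949-38ef78858dd0, '2016-09-07', '14:30:00', 1, 245);

SELECT event_time, event_type, seconds_watched
FROM video_playback_event
WHERE videoid = 99051fe9-6a9c-46c2-b949-38ef78858dd0
  AND event_date = '2016-09-07';
```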

User Defined Functions

CREATE TABLE video_rating (
    videoid uuid,
    rating_counter counter,
    rating_total counter,
    PRIMARY KEY (videoid)
);

CREATE OR REPLACE FUNCTION avg_rating (rating_counter counter, rating_total counter)
    CALLED ON NULL INPUT
    RETURNS double
    LANGUAGE java
    AS 'return Double.valueOf(rating_total.doubleValue()/rating_counter.doubleValue());';

User Defined Functions

SELECT avg_rating(rating_counter, rating_total) AS avg_rating
FROM video_rating
WHERE videoid = 99051fe9-6a9c-46c2-b949-38ef78858dd0;
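The slides do not show how the two counters are populated. One plausible sketch, assuming each new rating increments rating_counter by one and adds the rating value to rating_total:

```cql
-- Assumption: one UPDATE per submitted rating; the values 4 and 5 are illustrative.
UPDATE video_rating
SET rating_counter = rating_counter + 1,
    rating_total   = rating_total + 4
WHERE videoid = 99051fe9-6a9c-46c2-b949-38ef78858dd0;

UPDATE video_rating
SET rating_counter = rating_counter + 1,
    rating_total   = rating_total + 5
WHERE videoid = 99051fe9-6a9c-46c2-b949-38ef78858dd0;
```

Starting from an empty row, these two updates leave rating_counter at 2 and rating_total at 9, so the function's division works out to 9 / 2 = 4.5.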

Aggregates

CREATE TABLE video_ratings_by_user (
    videoid uuid,
    userid uuid,
    rating int,
    PRIMARY KEY (videoid, userid)
);

SELECT count(userid)
FROM video_ratings_by_user
WHERE videoid = 49f64d40-7d89-4890-b910-dbf923563a33;
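Beyond count, the built-in aggregates can be combined in one query. A sketch against the same table, with column aliases chosen for illustration:

```cql
-- Aliases are illustrative; the table is the one defined on the slide.
SELECT count(*)    AS ratings,
       min(rating) AS lowest,
       max(rating) AS highest,
       sum(rating) AS total,
       avg(rating) AS average
FROM video_ratings_by_user
WHERE videoid = 49f64d40-7d89-4890-b910-dbf923563a33;
```

Restricting the query to a single partition key keeps the aggregation on one partition instead of scanning the whole table.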

CQL 3.4 - Cassandra 3.x

• CAST operator
• Per Partition Limit
• Materialized Views
• SASI
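A short sketch of the first two items, reusing the video_ratings_by_user table from the Aggregates slide. The alias and the limit value are illustrative.

```cql
-- CAST: convert the int rating to double before averaging,
-- so the result is not limited to an integer type.
SELECT avg(CAST(rating AS double)) AS avg_rating
FROM video_ratings_by_user
WHERE videoid = 49f64d40-7d89-4890-b910-dbf923563a33;

-- PER PARTITION LIMIT: return at most 3 rows for each videoid partition.
SELECT videoid, userid, rating
FROM video_ratings_by_user
PER PARTITION LIMIT 3;
```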

Materialized View

CREATE TABLE videos (
    videoid uuid,
    userid uuid,
    name varchar,
    description varchar,
    location text,                     -- Lookup by this?
    location_type int,
    preview_thumbnails map<text,text>,
    tags set<varchar>,
    metadata set<frozen<video_metadata>>,
    added_date timestamp,
    PRIMARY KEY (videoid)
);

Materialized View

CREATE TABLE videos_by_location (
    videoid uuid,
    userid uuid,
    location text,
    added_date timestamp,
    PRIMARY KEY (location, videoid)
);

Roll your own
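With the roll-your-own approach, the application has to write every video to both tables itself. A minimal sketch using a logged batch, assuming the videos and videos_by_location tables above; the userid, name, and date are illustrative values.

```cql
-- Assumption: the application issues both writes; values are illustrative.
BEGIN BATCH
    INSERT INTO videos (videoid, userid, name, location, added_date)
    VALUES (b3a76c6b-7c7f-4af6-964f-803a9283c401,
            9761d3d7-7fbd-4269-9988-6cfd4e188678,
            'Example video',
            '/us/vid/b3/b3a76c6b-7c7f-4af6-964f-803a9283c401',
            '2016-09-07');

    INSERT INTO videos_by_location (location, videoid, userid, added_date)
    VALUES ('/us/vid/b3/b3a76c6b-7c7f-4af6-964f-803a9283c401',
            b3a76c6b-7c7f-4af6-964f-803a9283c401,
            9761d3d7-7fbd-4269-9988-6cfd4e188678,
            '2016-09-07');
APPLY BATCH;
```

The batch ensures both inserts are eventually applied together, but updates and deletes need the same double bookkeeping in application code.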

Materialized View

CREATE MATERIALIZED VIEW videos_by_location AS
    SELECT userid, added_date, videoid, location
    FROM videos
    WHERE videoid IS NOT NULL AND location IS NOT NULL
    PRIMARY KEY (location, videoid);

Cassandra rolls for you

Materialized View Perf

Materialized View Perf

5 materialized views vs. 5 tables with async writes

Materialized View

SELECT location, videoid FROM videos_by_location;

 location                                        | videoid
-------------------------------------------------+--------------------------------------
 http://www.youtube.com/watch?v=px6U2n74q3g      | 06049cbb-dfed-421f-b889-5f649a0de1ed
 http://www.youtube.com/watch?v=qphhxujn5Es      | 873ff430-9c23-4e60-be5f-278ea2bb21bd
 /us/vid/0c/0c3f7e87-f6b6-41d2-9668-2b64d117102c | 0c3f7e87-f6b6-41d2-9668-2b64d117102c
 /us/vid/b3/b3a76c6b-7c7f-4af6-964f-803a9283c401 | 99051fe9-6a9c-46c2-b949-38ef78858dd0
 /us/vid/b3/b3a76c6b-7c7f-4af6-964f-803a9283c401 | b3a76c6b-7c7f-4af6-964f-803a9283c401
 http://www.youtube.com/watch?v=HdJlsOZVGwM      | 49f64d40-7d89-4890-b910-dbf923563a33
 /us/vid/41/416a5ddc-00a5-49ed-adde-d99da9a27c0c | 416a5ddc-00a5-49ed-adde-d99da9a27c0c
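The full scan above shows the view's contents; the lookup the view was built for filters on its partition key. A sketch, using one of the location values from the output above:

```cql
-- location is the partition key of the view, so this reads a single partition.
SELECT videoid, userid, added_date
FROM videos_by_location
WHERE location = '/us/vid/b3/b3a76c6b-7c7f-4af6-964f-803a9283c401';
```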

SASI

CREATE TABLE users (
    userid uuid,
    firstname varchar,
    lastname varchar,
    email text,
    created_date timestamp,
    PRIMARY KEY (userid)
);

Lookup by this?

Storage Attached Secondary Index

SASI

SASI

CREATE CUSTOM INDEX ON users (firstname)
    USING 'org.apache.cassandra.index.sasi.SASIIndex'
    WITH OPTIONS = {
        'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
        'case_sensitive': 'false'
    };

SASI

CREATE CUSTOM INDEX ON users (lastname)
    USING 'org.apache.cassandra.index.sasi.SASIIndex'
    WITH OPTIONS = {'mode': 'CONTAINS'};

SASI

CREATE CUSTOM INDEX ON users (created_date)
    USING 'org.apache.cassandra.index.sasi.SASIIndex'
    WITH OPTIONS = {'mode': 'SPARSE'};

SASI Indexes

Client

INSERT INTO users (userid, firstname, lastname, email, created_date)
VALUES (9761d3d7-7fbd-4269-9988-6cfd4e188678, 'Patrick', 'McFadin',
        '[email protected]', '2015-06-01');

[Diagram labels: Client, Node, Indexer, Memtable, SSTable (x3), SASI Index (x3), Data (userid 1 and userid 2, each with lastname, firstname, email, created_date)]

SASI Queries

SELECT * FROM users WHERE firstname LIKE 'pat%';

SELECT * FROM users WHERE lastname LIKE '%Fad%';

SELECT * FROM users WHERE email LIKE '%data%';

SELECT * FROM users WHERE created_date > '2011-06-15' AND created_date < '2011-06-30';

 userid                               | created_date                    | email                | firstname | lastname
--------------------------------------+---------------------------------+----------------------+-----------+----------
 9761d3d7-7fbd-4269-9988-6cfd4e188678 | 2011-06-20 20:50:00.000000+0000 | [email protected]    | Patrick   | McFadin
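The email query above needs an index of its own, and the slides do not show one. A sketch that assumes a CONTAINS-mode SASI index mirroring the lastname index; the index name users_email_idx is hypothetical.

```cql
-- Assumption: CONTAINS mode, as used for lastname; the index name is hypothetical.
CREATE CUSTOM INDEX users_email_idx ON users (email)
    USING 'org.apache.cassandra.index.sasi.SASIIndex'
    WITH OPTIONS = {'mode': 'CONTAINS'};

-- With the index in place, a substring match on email becomes possible.
SELECT userid, email
FROM users
WHERE email LIKE '%data%';
```

CONTAINS mode is what permits the leading wildcard; a default prefix-mode index would only support patterns such as 'data%'.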

SASI Guidelines

Use SASI when…

• Multiple fields to search
• No more than 1000 rows returned
• You know the partition key
• Indexing static columns

SASI Guidelines

Don’t use SASI when…

• Searching large partitions
• Tight SLA on reads
• Search for analytics
• Ordering of search results is important

Thank you! Questions?

Follow me @PatrickMcFadin