Cassandra London - 2.2 and 3.0

Christopher Batey @chbatey 2.2 & 3.0

Upload: christopher-batey

Post on 12-Aug-2015

First comes a blog
• Each new feature has a vastly more detailed blog post:
  http://christopher-batey.blogspot.co.uk/

Where did 2.2 come from?

Don't start Thrift rpc by default (CASSANDRA-9319)

New features
• 2.2
  - JSON
  - User defined functions
  - User defined aggregates
  - The small print
• 3.0
  - New storage engine
  - A new way to denormalise/duplicate

So who’s taken some data out of C* and serialised it as JSON?

Hello JSON
• CREATE TABLE user (
      username text PRIMARY KEY,
      first_name text,
      last_name text,
      emails set<text>,
      country text
  );
• INSERT INTO user JSON '{"username": "chbatey", "first_name": "Christopher", "last_name": "Batey", "emails": ["[email protected]"]}';

Goodbye JSON

JSON + User Defined Types
• CREATE TYPE movie (title text, time timestamp, description text);
• ALTER TABLE user ADD movies set<frozen<movie>>;
• UPDATE user SET movies = {{ title: 'Batman', time: '2011-02-03T04:05:00+0000', description: 'This film rocks' }} WHERE username = 'chbatey';

Out it comes

Cassandra HTTP Wrapper?

    // The keyspace and table come from the URL path; the request body is the raw JSON.
    @RequestMapping(method = {RequestMethod.POST}, value = "/{keyspace}/{table}", consumes = "application/json")
    public ResponseEntity<String> store(@PathVariable String keyspace,
                                        @PathVariable String table,
                                        @RequestBody String body) {
        session.execute(String.format("insert into %s.%s JSON '%s'", keyspace, table, body));
        return ResponseEntity.ok("OK");
    }

    curl --header "Content-Type: application/json" -X POST -v "localhost:8080/twotwo/user" \
         --data '{"username": "trev2", "country": null, "emails": ["[email protected]", "[email protected]"], "first_name": "trevor", "last_name": "bunting", "movies": null}'
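The wrapper above drops the path variables and the request body straight into the CQL text. As a rough sketch of the checks such a wrapper might want, the snippet below validates the keyspace and table as plain unquoted identifiers and doubles any single quote in the JSON before it is embedded in a CQL string literal. The class and method names (CqlJsonInsert, requireIdentifier, insertJson) are made up for illustration and are not part of the talk or of any driver; only the JDK is used.

```java
import java.util.regex.Pattern;

final class CqlJsonInsert {
    // Unquoted CQL identifiers: a letter followed by letters, digits or underscores.
    private static final Pattern IDENTIFIER = Pattern.compile("[a-zA-Z][a-zA-Z0-9_]*");

    private CqlJsonInsert() {}

    // Hypothetical helper: reject anything that is not a plain identifier.
    static String requireIdentifier(String name) {
        if (name == null || !IDENTIFIER.matcher(name).matches()) {
            throw new IllegalArgumentException("Not a plain CQL identifier: " + name);
        }
        return name;
    }

    // Hypothetical helper: build the INSERT ... JSON statement text.
    static String insertJson(String keyspace, String table, String json) {
        // A single quote inside a CQL string literal is written as two single quotes.
        String escaped = json.replace("'", "''");
        return String.format("INSERT INTO %s.%s JSON '%s'",
                requireIdentifier(keyspace), requireIdentifier(table), escaped);
    }

    public static void main(String[] args) {
        System.out.println(insertJson("twotwo", "user",
                "{\"username\": \"trev2\", \"last_name\": \"O'Neil\"}"));
    }
}
```

This only covers the statement text; whether the JSON itself is valid for the table is still left to Cassandra.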

User defined functions
• Run code on the server (dangerous!)
• Java + JavaScript supported out of the box
• javax.script implementations should work

UDF example

CREATE TABLE user (
    username text PRIMARY KEY,
    first_name text,
    last_name text,
    emails set<text>,
    country text
);

Concat

CREATE FUNCTION name ( first_name text, last_name text )
    CALLED ON NULL INPUT
    RETURNS text
    LANGUAGE java
    AS ' return first_name + " " + last_name; ';

cqlsh:twotwo> select name(first_name, last_name) FROM user;

 twotwo.name(first_name, last_name)
------------------------------------
                  Christopher Batey
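Because the function is declared CALLED ON NULL INPUT, its body also runs for rows where a column is null, and Java string concatenation renders a null reference as the text "null". The plain-Java sketch below mirrors the UDF body to show that behaviour, next to a null-safe variant. The class NameUdfSketch and the method nameOrNull are illustrative names, not something from the talk; CQL also offers a RETURNS NULL ON NULL INPUT clause as an alternative to handling this in the body.

```java
final class NameUdfSketch {
    private NameUdfSketch() {}

    // Same expression as the UDF body on the slide.
    static String name(String firstName, String lastName) {
        return firstName + " " + lastName;
    }

    // Hypothetical variant: return null as soon as either input is null.
    static String nameOrNull(String firstName, String lastName) {
        if (firstName == null || lastName == null) {
            return null;
        }
        return firstName + " " + lastName;
    }

    public static void main(String[] args) {
        System.out.println(name("Christopher", "Batey")); // Christopher Batey
        System.out.println(name(null, "Batey"));          // null Batey
        System.out.println(nameOrNull(null, "Batey"));    // null
    }
}
```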

User defined aggregates

CREATE AGGREGATE average ( int )
    SFUNC averageState          -- called for every row, state passed between calls
    STYPE tuple<int,bigint>     -- return type of the state function (CQL)
    FINALFUNC averageFinal      -- optional function called on final state
    INITCOND (0, 0);            -- initial state

State function

CREATE FUNCTION averageState ( state tuple<int,bigint>, val int )   -- state type first, then the column(s)
    CALLED ON NULL INPUT
    RETURNS tuple<int,bigint>
    LANGUAGE java
    AS '
        if (val != null) {
            state.setInt(0, state.getInt(0) + 1);                 // row count
            state.setLong(1, state.getLong(1) + val.intValue());  // running sum
        }
        return state;
    ';

Final function

CREATE FUNCTION averageFinal ( state tuple<int,bigint> )   -- state type
    CALLED ON NULL INPUT
    RETURNS double                                          -- overall return type
    LANGUAGE java
    AS '
        if (state.getInt(0) == 0) return null;
        // Cast before dividing: long / int would truncate the average.
        double r = (double) state.getLong(1) / state.getInt(0);
        return Double.valueOf(r);
    ';
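To see how the pieces of the aggregate fit together, here is a minimal plain-Java sketch of the same flow: start from the initial state, call the state function once per row, then call the final function on the result. It runs outside Cassandra; the State class stands in for the tuple<int,bigint>, and the names AverageAggregateSketch and average are assumptions made for this illustration only.

```java
import java.util.Arrays;
import java.util.List;

final class AverageAggregateSketch {
    // Stand-in for the tuple<int,bigint> state: a row count and a running sum.
    static final class State {
        int count;
        long sum;
    }

    private AverageAggregateSketch() {}

    // Mirrors the state function: null values are skipped.
    static State averageState(State state, Integer val) {
        if (val != null) {
            state.count++;
            state.sum += val;
        }
        return state;
    }

    // Mirrors the final function: null when no rows were counted.
    static Double averageFinal(State state) {
        if (state.count == 0) {
            return null;
        }
        return (double) state.sum / state.count;
    }

    // What the aggregate does overall: initial state, one call per row, final call.
    static Double average(List<Integer> values) {
        State state = new State(); // corresponds to INITCOND (0, 0)
        for (Integer value : values) {
            state = averageState(state, value);
        }
        return averageFinal(state);
    }

    public static void main(String[] args) {
        System.out.println(average(Arrays.asList(1, null, 2))); // 1.5
    }
}
```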

Putting it all together

Customer events

CREATE AGGREGATE count_by_type(text)
    SFUNC countEventTypes
    STYPE map<text, int>
    INITCOND {};

CREATE FUNCTION countEventTypes( state map<text, int>, type text )
    CALLED ON NULL INPUT
    RETURNS map<text, int>
    LANGUAGE java
    AS '
        Integer count = (Integer) state.get(type);
        if (count == null) count = 1;
        else count = count + 1;
        state.put(type, count);
        return state;
    ';
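The same state function can be tried out in plain Java, which makes it easy to see what count_by_type would return for a handful of rows. This is only a sketch run outside Cassandra: a HashMap stands in for the map<text, int> state, and the class name CountByTypeSketch, the method countByType and the event type values are invented for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CountByTypeSketch {
    private CountByTypeSketch() {}

    // Mirrors the state function: bump the counter for this event type.
    static Map<String, Integer> countEventTypes(Map<String, Integer> state, String type) {
        Integer count = state.get(type);
        if (count == null) {
            count = 1;
        } else {
            count = count + 1;
        }
        state.put(type, count);
        return state;
    }

    // What the aggregate does overall: empty map, then one call per row.
    static Map<String, Integer> countByType(List<String> eventTypes) {
        Map<String, Integer> state = new HashMap<>(); // corresponds to INITCOND {}
        for (String type : eventTypes) {
            state = countEventTypes(state, type);
        }
        return state;
    }

    public static void main(String[] args) {
        // Hypothetical event types, purely for illustration.
        System.out.println(countByType(Arrays.asList("BUY", "WATCH", "BUY")));
    }
}
```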

Built in aggregates
• count
• max
• min
• avg
• sum

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/AggregateFcts.java

Built in time functions

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/TimeFcts.java

Built in aggregates in action

“Materialised views” with Spark

Pure C*

Small print
• Compressed commit log
• Resumable bootstrapping
• Stop individual compactions
• New types
  - smallint (short)
  - tinyint (byte)
  - date, time
• Warnings now sent back to client
  - batch too large

Time

Customer events table

CREATE TABLE IF NOT EXISTS customer_events (
    customer_id text,
    staff_id text,
    store_type text,
    time timeuuid,
    event_type text,
    PRIMARY KEY (customer_id, time)
);

CREATE INDEX ON customer_events (staff_id);

Indexes to the rescue?

customer_id  time                 staff_id
chbatey      2015-03-03 08:52:45  trevor
chbatey      2015-03-03 08:52:54  trevor
chbatey      2015-03-03 08:53:11  bill
chbatey      2015-03-03 08:53:18  bill
rusty        2015-03-03 08:56:57  bill
rusty        2015-03-03 08:57:02  bill
rusty        2015-03-03 08:57:20  trevor

staff_id  customer_id
trevor    chbatey
trevor    chbatey
bill      chbatey
bill      chbatey
bill      rusty
bill      rusty
trevor    rusty

Indexes to the rescue?

customer_events table
customer_id  time                 staff_id
chbatey      2015-03-03 08:52:45  trevor
chbatey      2015-03-03 08:52:54  trevor
chbatey      2015-03-03 08:53:11  bill
chbatey      2015-03-03 08:53:18  bill
rusty        2015-03-03 08:56:57  bill
rusty        2015-03-03 08:57:02  bill
rusty        2015-03-03 08:57:20  trevor

staff_id index
staff_id  customer_id
trevor    chbatey
trevor    chbatey
bill      chbatey
bill      chbatey
bill      rusty
bill      rusty
trevor    rusty

A: chbatey
staff_id  customer_id
trevor    chbatey
trevor    chbatey
bill      chbatey
bill      chbatey

B: rusty
staff_id  customer_id
bill      rusty
bill      rusty
trevor    rusty

Do it yourself index

CREATE TABLE IF NOT EXISTS customer_events (
    customer_id text,
    staff_id text,
    store_type text,
    time timeuuid,
    event_type text,
    PRIMARY KEY (customer_id, time)
);

CREATE TABLE IF NOT EXISTS customer_events_by_staff (
    customer_id text,
    staff_id text,
    store_type text,
    time timeuuid,
    event_type text,
    PRIMARY KEY (staff_id, time)
);

1.2 Logged batches

[Diagram: client, C, BATCH LOG, BL-R, BL-R]

BL-R: Batch log replica

Pattern
• Write only:
  - Duplicate with a different primary key
  - (Optional) Logged batch for eventual consistency
• Full updates:
  - No real difference
• Partial updates:
  - No staff id in update?
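The write-only case can be sketched as one logged batch that writes the same event to both tables. The snippet below only builds the CQL text with bind markers, using the table and column names from the slides; the class and method names (DualWriteBatchSketch, insertInto, loggedBatch) are hypothetical, and a real application would more likely prepare and bind these statements through a driver.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class DualWriteBatchSketch {
    // Columns shared by customer_events and customer_events_by_staff.
    static final List<String> COLUMNS =
            Arrays.asList("customer_id", "staff_id", "store_type", "time", "event_type");

    private DualWriteBatchSketch() {}

    // Build an INSERT with one bind marker per column.
    static String insertInto(String table) {
        String columns = String.join(", ", COLUMNS);
        String markers = String.join(", ", Collections.nCopies(COLUMNS.size(), "?"));
        return "INSERT INTO " + table + " (" + columns + ") VALUES (" + markers + ");";
    }

    // BEGIN BATCH without the UNLOGGED keyword is a logged batch.
    static String loggedBatch() {
        return String.join("\n",
                "BEGIN BATCH",
                "  " + insertInto("customer_events"),
                "  " + insertInto("customer_events_by_staff"),
                "APPLY BATCH;");
    }

    public static void main(String[] args) {
        System.out.println(loggedBatch());
    }
}
```

The partial-update case is where this gets awkward: an update that does not carry the staff id has no way to address the row in customer_events_by_staff.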

KillrWeather data model

INSERT INTO raw_weather_data (wsid, year, month, day, hour, country_code, state_code, temperature, one_hour_precip)
    VALUES ('station1', 2012, 12, 25, 1, 'GB', 'Cumbria', 14.0, 20);

INSERT INTO raw_weather_data (wsid, year, month, day, hour, country_code, state_code, temperature, one_hour_precip)
    VALUES ('station2', 2012, 12, 25, 1, 'GB', 'Cumbria', 4.0, 2);

INSERT INTO raw_weather_data (wsid, year, month, day, hour, country_code, state_code, temperature, one_hour_precip)
    VALUES ('station3', 2012, 12, 25, 1, 'GB', 'Greater London', 16.0, 10);

Querying by state?

Combining aggregates + MVs

Including the month

Fine print - all subject to change
• Primary key columns + one other in your MV primary key
• Unused primary key columns are added to the end of your MV PK
• If part of your primary key is NULL then it won't appear in the materialised view
• This is not free!
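The first two bullets read like a small rule for deriving the view's primary key, which can be sketched in a few lines of Java. This follows the rule only as the slide states it (and the slide flags it as subject to change); it is not Cassandra's implementation, and the names ViewKeySketch and viewPrimaryKey are made up for the example. The customer_events table from earlier is used as the base table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ViewKeySketch {
    private ViewKeySketch() {}

    // Derive the view's primary key from the base key and the columns chosen for the view.
    static List<String> viewPrimaryKey(List<String> basePrimaryKey, List<String> chosenViewKey) {
        // At most one chosen column may come from outside the base primary key.
        long extraColumns = chosenViewKey.stream()
                .filter(column -> !basePrimaryKey.contains(column))
                .count();
        if (extraColumns > 1) {
            throw new IllegalArgumentException("Only one non-primary-key column is allowed");
        }
        // Base primary key columns that were not chosen are appended at the end.
        List<String> result = new ArrayList<>(chosenViewKey);
        for (String column : basePrimaryKey) {
            if (!result.contains(column)) {
                result.add(column);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> base = Arrays.asList("customer_id", "time");
        System.out.println(viewPrimaryKey(base, Arrays.asList("staff_id")));
        // [staff_id, customer_id, time]
    }
}
```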

Conclusions
• We still denormalise and duplicate to achieve scalability and performance
• We just let C* do it for us :)

• Robert Stupp (Contentteam AG) - UDA/Fs
• Carl Yeksigian (DataStax) - Materialised views
• Jason Brown - Gossip
• Christos Kalantzis (Netflix) - Chaos Monkey & Cassandra