A look at the CQL changes in 3.x (Benjamin Lerer, DataStax) | Cassandra Summit 2016

Benjamin Lerer

A look at the CQL changes in 3.x

• Updates and Deletions
• Filtering
• Grouping

Updates and Deletions (3.0)

Updates and Deletions

CREATE TABLE toys (
    brand text,
    category text,
    id int,
    name text,
    price decimal,
    PRIMARY KEY (brand, category, id)
)

Clustering columns

Simple updates

INSERT INTO toys (brand, category, id, name, price)
VALUES ('Lego', 'Star Wars', 75060, 'Slave I', 219.99)

UPDATE toys SET name = 'Tie Fighter', price = 219.99
WHERE brand = 'Lego' AND category = 'Star Wars' AND id = 75095

Memtable

Multi-updates

UPDATE toys SET price = 229.99
WHERE brand = 'Lego' AND category = 'Star Wars' AND id IN (75059, 75060, 75095)

Memtable

‘Star Wars’-75059 ts: Long.MIN price: 229.99 ts: t3

Column deletion

DELETE name FROM toys
WHERE brand = 'Lego' AND category = 'Star Wars' AND id IN (75059, 75060)

Memtable

‘Star Wars’-75059 ts: Long.MIN price: 229.99 ts: t3 name: <tombstone> ts: t4

Column deletion on empty Memtable

DELETE name FROM toys
WHERE brand = 'Lego' AND category = 'Star Wars' AND id IN (75059, 75060)

Memtable

‘Star Wars’-75059 ts: Long.MIN name: <tombstone> ts: t4

Row deletion

DELETE FROM toys
WHERE brand = 'Lego' AND category = 'Star Wars' AND id = 75059

Memtable

‘Star Wars’-75059 ts: Long.MIN deletedAt: t5

Range deletion (3.0)

DELETE FROM toys
WHERE brand = 'Lego' AND category = 'Star Wars' AND id <= 75060

Memtable

DeletionInfo deletedAt: Long.MIN ranges: (‘Star Wars’ … ‘Star Wars’-75060]

Partition deletion

DELETE FROM toys WHERE brand = 'Lego'

Memtable

DeletionInfo deletedAt: t6 ranges: (‘Star Wars’ … ‘Star Wars’-75060]
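The memtable states above can be read as deletion markers at four scopes (column, row, range, partition), each carrying a timestamp. The following is a minimal Python sketch of that reading, assuming a cell is visible only when its timestamp is newer than every deletion that covers it. The class, its fields, and the integer timestamps are hypothetical and are not Cassandra's storage classes.

```python
from dataclasses import dataclass, field

NO_DELETION = float("-inf")  # plays the role of Long.MIN on the slides


@dataclass
class ToyPartition:
    """Toy model of one partition (hypothetical, not Cassandra's classes)."""
    partition_deleted_at: float = NO_DELETION
    range_tombstones: list = field(default_factory=list)  # (max_id, ts) covers id <= max_id
    row_deleted_at: dict = field(default_factory=dict)    # id -> ts
    cells: dict = field(default_factory=dict)             # (id, column) -> (value, ts)

    def write(self, row_id, column, value, ts):
        # Last write wins: keep the cell with the newest timestamp.
        current = self.cells.get((row_id, column))
        if current is None or ts > current[1]:
            self.cells[(row_id, column)] = (value, ts)

    def delete_column(self, row_id, column, ts):
        # A column deletion is stored as a cell whose value is a tombstone (None).
        self.write(row_id, column, None, ts)

    def read(self, row_id, column):
        value, ts = self.cells.get((row_id, column), (None, NO_DELETION))
        covering = [self.partition_deleted_at,
                    self.row_deleted_at.get(row_id, NO_DELETION)]
        covering += [t for max_id, t in self.range_tombstones if row_id <= max_id]
        # The cell is visible only if it is newer than every deletion covering it.
        return value if ts > max(covering) else None


p = ToyPartition()
p.write(75060, "name", "Slave I", ts=1)
p.write(75060, "price", 229.99, ts=3)
p.delete_column(75060, "name", ts=4)    # column deletion
p.range_tombstones.append((75060, 5))   # range deletion: id <= 75060
print(p.read(75060, "price"))           # None: shadowed by the range deletion
```

In this model a row, range, or partition deletion stores a single marker instead of one tombstone per cell, which matches the DeletionInfo entries shown on the slides.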

Filtering (3.0, 3.6, 3.10)

Filtering

CREATE TABLE scores ( user text, game text, year int, month int, day int, score int, PRIMARY KEY (user, game, year, month, day))

Filtering

In 2.2:

SELECT * FROM scores WHERE user = 'Aleksey' AND game = 'coup' AND score >= 1000

InvalidRequest: Error from server: code=2200 [Invalid query] message="Predicates on non-primary-key columns (score) are not yet supported for non secondary index queries"

Filtering

In 3.0:

SELECT * FROM scores WHERE user = 'Aleksey' AND game = 'coup' AND score >= 1000

InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"

Filtering = Brute Force approach

Filtering

String partitionKey = "Aleksey";
String[] clusteringPrefix = new String[]{"coup"};

// Load every row under the clustering prefix, then keep only the matching ones.
List<Row> rows = loadRows(partitionKey, clusteringPrefix);
List<Row> filteredRows = new ArrayList<>();

for (Row row : rows) {
    if (row.getInt("score") >= 1000) {
        filteredRows.add(row);
    }
}
return filteredRows;

SELECT * FROM scores WHERE user = 'Aleksey' AND game = 'coup' AND score >= 1000 ALLOW FILTERING

Clustering column filtering (3.6)

SELECT * FROM scores WHERE user = 'Aleksey' AND game = 'coup' AND month = 9 ALLOW FILTERING

SELECT * FROM scores WHERE user = 'Aleksey' AND game = 'coup' AND year >= 2014 AND month = 9 ALLOW FILTERING

Clustering slice: year >= 2014. Filtering: month = 9

Filtering

Filtering is performed on the replica side

Filtering can return stale data (CASSANDRA-8273)

Filtering

3 replicas: A, B, C

INSERT INTO scores (user, game, year, month, day, score)
VALUES ('Aleksey', 'coup', 2016, 1, 12, 1100);

At QUORUM:

UPDATE scores SET score = 1200
WHERE user = 'Aleksey' AND game = 'coup' AND year = 2016 AND month = 1 AND day = 12;

SELECT * FROM scores WHERE user = 'Aleksey' AND game = 'coup' AND score = 1100 ALLOW FILTERING
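One way to see how this read can return stale data is a small simulation. It assumes that the UPDATE at QUORUM reached only replicas A and B, that the SELECT at QUORUM is served by B and C, and that each replica filters locally before the coordinator reconciles the responses. The replica contents, timestamps, and function names below are hypothetical.

```python
# Each replica maps a primary key to (score, write timestamp).
KEY = ("Aleksey", "coup", 2016, 1, 12)
replicas = {
    "A": {KEY: (1200, 2)},  # saw the INSERT and the UPDATE
    "B": {KEY: (1200, 2)},  # saw the INSERT and the UPDATE
    "C": {KEY: (1100, 1)},  # missed the UPDATE
}


def replica_read(name, predicate):
    """Replica-side filtering: only rows matching the predicate are returned."""
    return {k: v for k, v in replicas[name].items() if predicate(v[0])}


def coordinator_read(contacted, predicate):
    """Merge the replica responses, keeping the newest version of each row."""
    merged = {}
    for name in contacted:
        for key, (score, ts) in replica_read(name, predicate).items():
            if key not in merged or ts > merged[key][1]:
                merged[key] = (score, ts)
    return {k: v[0] for k, v in merged.items()}


# B filters its up-to-date row out, so the coordinator only sees C's old row.
stale = coordinator_read(["B", "C"], lambda score: score == 1100)
# With A and B, both replicas filter the row out and nothing is returned.
fresh = coordinator_read(["A", "B"], lambda score: score == 1100)
print(stale, fresh)
```

Because B returns nothing for the row, the coordinator has no newer version to compare against C's response, so the outdated score of 1100 is returned even though both operations ran at QUORUM.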

Filtering

In 3.0 filtering is supported on:
• Non primary key columns
• Static columns

In 3.6 filtering is also supported on clustering columns

In 3.10 filtering will be supported on partition key

When using filtering, be aware of:
• Its performance unpredictability
• The fact that it can return stale data

Grouping (3.10)

Grouping

SELECT year, month, max(score), min(score), count(score) FROM scores
WHERE user = 'Aleksey' AND game = 'coup' AND year = 2016
GROUP BY month LIMIT 2

Year  Month  Day  Score
2016  1      12   1200
2016  1      31   800
2016  2      8    1050
2016  3      1    1400
2016  6      24   800
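The effect of GROUP BY month LIMIT 2 on these rows can be sketched in Python. The sketch assumes that rows arrive in clustering order and that LIMIT counts groups, not rows; the function name is hypothetical.

```python
from itertools import groupby, islice

# (year, month, day, score) rows in clustering order, as on the slide.
rows = [
    (2016, 1, 12, 1200),
    (2016, 1, 31, 800),
    (2016, 2, 8, 1050),
    (2016, 3, 1, 1400),
    (2016, 6, 24, 800),
]


def group_by_month(rows, limit):
    """Aggregate consecutive rows sharing (year, month); stop after `limit` groups."""
    groups = groupby(rows, key=lambda r: (r[0], r[1]))
    result = []
    for (year, month), members in islice(groups, limit):
        scores = [r[3] for r in members]
        result.append((year, month, max(scores), min(scores), len(scores)))
    return result


for line in group_by_month(rows, limit=2):
    print(line)
```

Under these assumptions the query returns one line for January (max 1200, min 800, count 2) and one for February (max 1050, min 1050, count 1).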

Grouping

SELECT score, count(*) FROM scores
WHERE user = 'Aleksey' AND game = 'coup' AND year = 2016
GROUP BY score LIMIT 2

Year  Month  Day  Score
2016  1      12   1200
2016  1      31   800
2016  2      8    1050
2016  3      1    1400
2016  6      24   800

Grouping

SELECT score, count(*) FROM scores
WHERE user = 'Aleksey' AND game = 'coup' AND year = 2016
GROUP BY score LIMIT 2

InvalidRequest: Error from server: code=2200 [Invalid query] message="Group by is currently only supported on the columns of the PRIMARY KEY, got score"
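A plausible reading of this restriction, offered as an assumption and not as something the slides state: if groups are built by aggregating consecutive rows as they are read in clustering order, then grouping on a column that does not follow that order would split one logical group into several. The Python sketch below illustrates this with the scores from the table; the helper name is hypothetical.

```python
from itertools import groupby

# Scores in clustering order (month, day), as stored in the scores table.
scores_in_clustering_order = [1200, 800, 1050, 1400, 800]


def consecutive_counts(values):
    """Count runs of equal consecutive values, as a streaming GROUP BY would."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


# Grouping in storage order reports the score 800 twice, once per run.
print(consecutive_counts(scores_in_clustering_order))

# Once the rows are ordered by score, each score forms a single run.
print(consecutive_counts(sorted(scores_in_clustering_order, reverse=True)))
```

Storing the rows ordered by score, so that equal scores are adjacent, is what makes the second form possible.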

Grouping

CREATE MATERIALIZED VIEW yearlyHigh AS
    SELECT user, game, year, score, month, day FROM scores
    WHERE user IS NOT NULL AND game IS NOT NULL AND year IS NOT NULL
      AND score IS NOT NULL AND month IS NOT NULL AND day IS NOT NULL
    PRIMARY KEY (user, game, year, score, month, day)
    WITH CLUSTERING ORDER BY (game ASC, year DESC, score DESC)

Grouping

SELECT score, count(*) FROM yearlyHigh
WHERE user = 'Aleksey' AND game = 'coup' AND year = 2016
GROUP BY score LIMIT 2

Year  Score  Month  Day
2016  1400   3      1
2016  1400   6      12
2016  1050   2      8
2016  1020   5      23
2016  800    6      24

Grouping

CREATE TABLE gameScores ( user text, game text, year int, month int, day int, score int, PRIMARY KEY ((user, game, year), month, day))

Partition key

Grouping

SELECT year, max(score), min(score), count(score) FROM gameScores GROUP BY user, game

InvalidRequest: Error from server: code=2200 [Invalid query] message="Group by is not supported on only a part of the partition key"

Grouping

SELECT user, game, max(score), min(score), count(score) FROM scores GROUP BY user, game

[Diagram: DB nodes and Driver]

Computes aggregates

Grouping

SELECT user, game, max(score), min(score), count(score) FROM scores GROUP BY user, game

[Diagram: DB nodes and Driver]

Computes aggregates

Page size in # of groups

Sub-page size in # of rows

Grouping

SELECT user, game, max(score), min(score), count(score) FROM scores WHERE user = ‘Aleksey’ GROUP BY user, game

[Diagram: DB nodes and Driver]

Computes aggregates …with TokenAwarePolicy

Per Partition Limit (3.6)

SELECT user, score FROM yearlyHigh
WHERE game = 'coup' AND year = 2016
PER PARTITION LIMIT 1
ALLOW FILTERING

SELECT user, score, count(*) FROM yearlyHigh
WHERE game = 'coup' AND year = 2016
GROUP BY user, game, year, score
PER PARTITION LIMIT 1
ALLOW FILTERING
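The effect of PER PARTITION LIMIT can be sketched in Python as keeping the first rows of each partition. The sketch assumes that rows arrive grouped by partition and already in clustering order (highest score first in yearlyHigh). The sample rows and the function name are made up for illustration.

```python
from itertools import groupby, islice


def per_partition_limit(rows, partition_key, limit):
    """Keep at most `limit` rows from each run of rows sharing a partition key."""
    kept = []
    for _, partition_rows in groupby(rows, key=partition_key):
        kept.extend(islice(partition_rows, limit))
    return kept


# Hypothetical (user, score) rows, grouped by user and ordered by score DESC.
sample = [
    ("Aleksey", 1400),
    ("Aleksey", 1050),
    ("Benjamin", 1300),
    ("Benjamin", 900),
]

# One row per user: with the scores sorted descending, each user's best score.
print(per_partition_limit(sample, partition_key=lambda r: r[0], limit=1))
```

Combined with the descending score order of the view, a limit of 1 per partition yields each user's highest score.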

Grouping by time range (CASSANDRA-11871)

CREATE TABLE temperature (
    deviceId text,
    time timestamp,
    value double,
    PRIMARY KEY (deviceId, time)
)

SELECT deviceId, floor(time, 2h), min(value), max(value), count(value) FROM temperature
WHERE deviceId = 'AT-AT'
GROUP BY floor(time, 2h)
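The bucketing that floor(time, 2h) is meant to express can be sketched in Python. The sketch assumes that windows are aligned to the Unix epoch, which the slides do not specify; the function names and the sample readings are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def floor_time(ts, window):
    """Round `ts` down to the start of its window (windows aligned to the epoch)."""
    return ts - (ts - EPOCH) % window


def aggregate(readings, window):
    """Return {window start: (min, max, count)} for (timestamp, value) readings."""
    buckets = {}
    for ts, value in readings:
        buckets.setdefault(floor_time(ts, window), []).append(value)
    return {start: (min(v), max(v), len(v)) for start, v in sorted(buckets.items())}


# Hypothetical readings for a single device.
readings = [
    (datetime(2016, 9, 8, 10, 15, tzinfo=timezone.utc), 21.5),
    (datetime(2016, 9, 8, 11, 59, tzinfo=timezone.utc), 23.0),
    (datetime(2016, 9, 8, 12, 0, tzinfo=timezone.utc), 22.0),
]

for start, stats in aggregate(readings, timedelta(hours=2)).items():
    print(start.isoformat(), stats)
```

Since time is a clustering column, rows within a partition are already in time order, so every two-hour window is a consecutive run of rows.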

Grouping

• It is only possible to group rows at the partition key level or at a clustering column level

• The GROUP BY clause only accepts primary key column names as arguments, in primary key order

• Aggregates are built on the coordinator to ensure consistency

• Queries might be paged internally

• If a primary key column is restricted by an equality restriction it is not required to be present in the GROUP BY clause

Questions?

Thank you
