Cassandra data modeling basics for data that does not suit Cassandra

Better practice of Cassandra: data modeling basics for data that does not suit Cassandra

Hayato Tsutsumi (堤勇人), Works Applications

Speaker

Cassandra experience: 7 years (since ver. 0.6)
Certification for Apache Cassandra Administrator
Data size: about 40 TB
Nodes: about 40 (will increase soon)
Twitter: 2t3
Site Reliability Engineering Div., Works Applications Co., Ltd

Target

● Mid-range system
  Data size: 1 TB ~ 1 PB
  Data amount: 10 million ~ 100 billion

+ high speed processing

What is better?

Best practice = the right tool in the right place. Using each tool where it fits is certainly the best.

Suitable Data

But not all data is suitable. So what about the parts where Cassandra is not the best fit?

Our system

Suitable Data

Un-suitable data for Cassandra

Use both Cassandra and RDB?

O* or M*

Suitable Data

You may use only RDB...

O* or M*

Suitable Data

Another way : Manage data only with Cassandra

Suitable Data

3 models that do not suit NoSQL

● Historical data
● Tree structure
● Summarized data

How Cassandra reads data, in 3 minutes

Prerequisites

Partition key & clustering key

CREATE TABLE test_table (
  pkA text,
  pkB text,
  ckA text,
  ckB text,
  v text,
  w text,
  PRIMARY KEY ((pkA, pkB), ckA, ckB)
);

Partition key: (pkA, pkB). Clustering key: ckA, ckB.

On the table (logical view)

pkA   pkB   ckA   ckB   v   w
pkA1  pkB1  ckA1  ckB1  v1  w1
pkA1  pkB1  ckA1  ckB2  v2  w2

On Cassandra (storage view)

hash(pkA1,pkB1)
  Column  ckA1:ckB1:v  ckA1:ckB1:w  ckA1:ckB2:v  ckA1:ckB2:w
  Value   v1           w1           v2           w2

hash(pkA1,pkB2)
  Column  ckA3:ckB3:v  ckA3:ckB3:w
  Value   v3           w3

Cassandra can search by column name.


CREATE TABLE test_table (
  pkA text,
  pkB int,
  ckA int,
  ckB int,
  v text,
  w text,
  PRIMARY KEY ((pkA, pkB), ckA, ckB)
);




Equality on the partition key and the clustering key:

where pkA = 'pkA1'; // NG
where pkA = 'pkA1' and pkB = 'pkB1'; // OK
where pkA = 'pkA1' and pkB = 'pkB1' and ckA = 'ckA1'; // OK
where pkA = 'pkA1' and pkB = 'pkB1' and ckA = 'ckA1' and ckB = 'ckB1'; // OK

Skipping a key column:

where pkA = 'pkA1' and ckA = 'ckA1'; // NG
where pkA = 'pkA1' and pkB = 'pkB1' and ckB = 'ckB1'; // NG

Range conditions:

where pkA = 'pkA1' and pkB >= 'pkB1'; // NG
where pkA = 'pkA1' and pkB = 'pkB1' and ckA >= 'ckA1'; // OK
where pkA = 'pkA1' and pkB = 'pkB1' and ckA = 'ckA1' and ckB >= 'ckB1'; // OK
where pkA = 'pkA1' and pkB = 'pkB1' and ckA >= 'ckA1' and ckB = 'ckB1'; // NG

where pkA = 'pkA1' and pkB = 'pkB1' and ckA >= 'ckA1' and ckA < 'ckA2'; // OK
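The OK / NG pattern above can be summarized as a small rule check. The following Python sketch is only an illustration of the rules shown on the slide, not Cassandra's actual validation logic: it ignores IN, ALLOW FILTERING and secondary indexes, and the function name `is_allowed` and the 'eq' / 'range' encoding are assumptions made for this example.

```python
from typing import Dict

# Key layout of test_table: PRIMARY KEY ((pkA, pkB), ckA, ckB)
PARTITION_KEY = ["pkA", "pkB"]
CLUSTERING_KEY = ["ckA", "ckB"]


def is_allowed(restrictions: Dict[str, str]) -> bool:
    """Check a WHERE clause against the slide's rules.

    `restrictions` maps a column name to 'eq' or 'range'
    (hypothetical encoding for this sketch).
    """
    # Every partition key column needs an equality restriction.
    for col in PARTITION_KEY:
        if restrictions.get(col) != "eq":
            return False

    # Clustering columns must be restricted in declaration order without gaps,
    # and nothing may follow a range restriction.
    closed = False  # True once a column was skipped or range-restricted
    for col in CLUSTERING_KEY:
        kind = restrictions.get(col)
        if kind is None:
            closed = True
        elif closed:
            return False
        elif kind == "range":
            closed = True
    return True


if __name__ == "__main__":
    print(is_allowed({"pkA": "eq"}))                                # NG
    print(is_allowed({"pkA": "eq", "pkB": "eq", "ckA": "range"}))   # OK
    print(is_allowed({"pkA": "eq", "pkB": "eq", "ckA": "range", "ckB": "eq"}))  # NG
```

A lower and an upper bound on the same column (ckA >= ... and ckA < ...) count as one 'range' entry in this encoding.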

Historical data

photo by Bryan Wright (https://secure.flickr.com/photos/spidermandragon5/2922128673/)

Employee transfer history

[Figure: timeline of division assignments; transfer dates are marked on the date axis]
emp001: A Div. → B Div. → C Div.
emp002: A Div. → C Div. → D Div.
emp003: D Div. → E Div.


Assignments as of 3/25

emp001   emp002   emp003
B Div.   A Div.   E Div.

Assignments as of 4/25

emp001   emp002   emp003
C Div.   D Div.   E Div.

emp_history table

CREATE TABLE emp_history (
  id text,
  no text,
  s date,
  e date,
  div text,
  PRIMARY KEY (id, s, e, no)
);

select * from emp_history
where id = 'test' and s <= '2017-03-25' and e > '2017-03-25'; // NG: s already has a range restriction, so e cannot be restricted

?

emp_history table

CREATE TABLE emp_history (
  id text,
  no text,
  s date,
  e date,
  div text,
  PRIMARY KEY (id, s, no)
);

CREATE CUSTOM INDEX fn_e ON emp_history (e) USING 'org.apache.cassandra.index.sasi.SASIIndex';

select * from emp_history
where id = 'test' and s <= '2017-03-25' and e > '2017-03-25'; // OK

Use a custom index (SASI) on e.
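To make the intent of the query concrete: it looks for the row whose validity period [s, e) contains a given day. The Python sketch below reproduces that filter in memory. The rows and their dates are hypothetical (the slide's exact transfer dates are not reproduced here), chosen only so that emp001 is in B Div. on 3/25 and in C Div. on 4/25, as in the slide.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class Assignment(NamedTuple):
    emp_no: str   # column no
    start: date   # column s (inclusive)
    end: date     # column e (exclusive)
    div: str      # column div


# Hypothetical history for emp001; the dates are assumptions for this example.
ROWS: List[Assignment] = [
    Assignment("emp001", date(2017, 1, 1), date(2017, 3, 1), "A Div."),
    Assignment("emp001", date(2017, 3, 1), date(2017, 4, 1), "B Div."),
    Assignment("emp001", date(2017, 4, 1), date(9999, 12, 31), "C Div."),
]


def division_at(rows: List[Assignment], emp_no: str, day: date) -> Optional[str]:
    """Return the division valid on `day`: the row with s <= day and e > day."""
    for row in rows:
        if row.emp_no == emp_no and row.start <= day < row.end:
            return row.div
    return None


if __name__ == "__main__":
    print(division_at(ROWS, "emp001", date(2017, 3, 25)))  # B Div.
    print(division_at(ROWS, "emp001", date(2017, 4, 25)))  # C Div.
```

Both conditions are range conditions on different columns, which is why the first table definition cannot serve the query with clustering keys alone.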

Tree structure

Organization tree

A Div.
  a Dept.
    1 Sec.
    2 Sec.
  b Dept.
    3 Sec.
    4 Sec.
    5 Sec.

Well known models

● Adjacency list
● Path enumeration
● Nested set
● Closure table

Criteria

● No JOIN, no recursive queries
● Consistency has to be ensured by some other means anyway
● Jaywalking (lists stored in one column) and denormalization are natural

Requirements for the tree model

● show ancestors: get every parent from a node up to the root
● show children: expand one level of children
● show descendants: expand all children
● show siblings of a node


Well known models

● Adjacency list
● Path enumeration
● Nested set
● Closure table

Worth considering!

Path enumeration

CREATE TABLE pathenum (
  id text,
  fqdn text,
  child text,
  code text,
  PRIMARY KEY (id, fqdn)
);

id    fqdn   child    code
test  A      [a,b]    A
test  A:a    [1,2]    a
test  A:b    [3,4,5]  b
test  A:a:1           1
test  A:a:2           2
test  A:b:3           3
test  A:b:4           4
test  A:b:5           5

[Tree: A → a (1, 2), b (3, 4, 5)]



select * from pathenum
where id = 'test' and fqdn like 'A:%'; // NG

Finding descendants needs a 'like' (prefix) search.


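select * from pathenum
where id = 'test' and fqdn >= 'A:' and fqdn < 'A;'; // OK

':' is U+003A and ';' is U+003B, the next code point, so the range from 'A:' (inclusive) to 'A;' (exclusive) matches every fqdn that starts with 'A:'.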
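The range trick can be tried out on a sorted list. This Python sketch assumes ASCII path strings (so Python's string ordering agrees with byte ordering) and uses the fqdn values from the pathenum table; the function name `descendants` is made up for the example.

```python
from bisect import bisect_left
from typing import List

# fqdn values of the pathenum table, kept sorted like a clustering column.
FQDNS: List[str] = sorted(
    ["A", "A:a", "A:b", "A:a:1", "A:a:2", "A:b:3", "A:b:4", "A:b:5"]
)


def descendants(sorted_keys: List[str], path: str, sep: str = ":") -> List[str]:
    """Emulate `fqdn >= 'A:' and fqdn < 'A;'` as a range scan on sorted keys."""
    lower = path + sep                  # e.g. 'A:'
    upper = path + chr(ord(sep) + 1)    # e.g. 'A;' (next code point after ':')
    start = bisect_left(sorted_keys, lower)
    stop = bisect_left(sorted_keys, upper)
    return sorted_keys[start:stop]


if __name__ == "__main__":
    print(descendants(FQDNS, "A"))      # everything below A, without A itself
    print(descendants(FQDNS, "A:a"))    # ['A:a:1', 'A:a:2']
```

The scan touches one contiguous slice of the sorted keys, which is the "one access" property of this model.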




// show ancestors: split the fqdn on the client side
fqdn.split(":");

// show children of a
select child from pathenum where id = 'test' and fqdn = 'A:a';

// show descendants of A
select * from pathenum where id = 'test' and fqdn >= 'A:' and fqdn < 'A;';

// show siblings of a: read the children of a's parent (A)
select child from pathenum where id = 'test' and fqdn = 'A';




Pros & cons (path enumeration)

Pros
- one access

Cons
- hot spot
- range slice
- complex processing on update

Closure table

CREATE TABLE closure_main (
  id text,
  v text,
  PRIMARY KEY (id)
);

CREATE TABLE closure_path (
  p text,
  c text,
  d int,
  PRIMARY KEY (p, d, c)
);

closure_main

id  v
A   A Div.
a   a Dept.
b   b Dept.
1   1 Sec.
2   2 Sec.
3   3 Sec.
4   4 Sec.
5   5 Sec.

closure_path

p  c  d
A  A  0
A  a  1
A  b  1
A  1  2
A  2  2
A  3  2
A  4  2
A  5  2
a  a  0
a  1  1
a  2  1
1  1  0
2  2  0
b  b  0
b  3  1
b  4  1
b  5  1
3  3  0
4  4  0
5  5  0

Closure table: index on c

CREATE CUSTOM INDEX fn_c ON test.closure_path (c) USING 'org.apache.cassandra.index.sasi.SASIIndex';


// In each pair, ? is bound to the list of ids returned by the preceding query.

// show ancestors of 1
select p from closure_path where c = '1';
select * from closure_main where id in ?;

// show children of a
select c from closure_path where p = 'a' and d = 1;
select * from closure_main where id in ?;

// show descendants of A
select c from closure_path where p = 'A';
select * from closure_main where id in ?;

// show siblings of a
// load a's parent (= A): the row with d = 1
select * from closure_path where c = 'a';
select c from closure_path where p = 'A' and d = 1;
select * from closure_main where id in ?;
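How the closure_path rows come about, and how the four lookups read them, can be sketched in memory. The Python below builds the (p, c, d) rows from a parent map of the organization tree; the function names are assumptions for this example, and unlike the queries above the helpers leave out the d = 0 self row.

```python
from typing import Dict, List, Optional, Tuple

# Parent of each node in the organization tree (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "A": None, "a": "A", "b": "A",
    "1": "a", "2": "a", "3": "b", "4": "b", "5": "b",
}


def build_closure(parent: Dict[str, Optional[str]]) -> List[Tuple[str, str, int]]:
    """Return one (p, c, d) row per node/ancestor pair, including d = 0 self rows."""
    rows = []
    for node in parent:
        ancestor, depth = node, 0
        while ancestor is not None:
            rows.append((ancestor, node, depth))
            ancestor = parent[ancestor]
            depth += 1
    return rows


ROWS = build_closure(PARENT)


def ancestors(node: str) -> List[str]:
    return [p for p, c, d in ROWS if c == node and d > 0]


def children(node: str) -> List[str]:
    return [c for p, c, d in ROWS if p == node and d == 1]


def descendants(node: str) -> List[str]:
    return [c for p, c, d in ROWS if p == node and d > 0]


def siblings(node: str) -> List[str]:
    # First find the direct parent (d = 1), then read its children.
    parents = [p for p, c, d in ROWS if c == node and d == 1]
    return [c for c in children(parents[0]) if c != node] if parents else []


if __name__ == "__main__":
    print(len(ROWS))          # 20 rows, as in the closure_path table
    print(ancestors("1"))     # ['a', 'A']
    print(siblings("a"))      # ['b']
```

Lookups by p map to the partition key of closure_path, while lookups by c are the ones that rely on the index.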

Pros & cons (closure table)

Pros
- distributed
- get access

Cons
- needs an index
- 2 ~ 3 accesses per request
- more data
- complex processing on update




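How much does the data increase?

Assume a tree of depth d where every node has n children. Each node stores one closure_path row per ancestor plus one for itself, so the number of rows is roughly d times the number of nodes, that is, proportional to d.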
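The growth can be checked with a short calculation. This Python sketch assumes a complete tree in which the root is at depth 0 and every node above the deepest level has exactly n children; the function names are made up for the example.

```python
def node_count(n: int, d: int) -> int:
    """Number of nodes in a complete n-ary tree whose deepest level is d."""
    return sum(n ** k for k in range(d + 1))


def closure_row_count(n: int, d: int) -> int:
    """Rows in closure_path: a node at depth k has k ancestors plus itself."""
    return sum((k + 1) * n ** k for k in range(d + 1))


if __name__ == "__main__":
    for n, d in [(2, 2), (10, 2), (10, 4)]:
        nodes = node_count(n, d)
        rows = closure_row_count(n, d)
        print(n, d, nodes, rows, round(rows / nodes, 2))
```

Because most nodes sit on the deepest level, the rows-per-node ratio stays below d + 1 and approaches it as n grows.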

Summarized data

Aggregation of slips

Dr.      Cr.
A 200    B 50
         C 150

Aggregation of slips

parallel batch processing

aggregation

online streaming

Requirements

● A miscalculation is critical (fatal)
● Parallel batch processing and online streaming processing are needed
● High-speed processing is needed


= Consistency!


Summarized data

CREATE TABLE countup (
  id text PRIMARY KEY,
  v counter
);

UPDATE countup SET v = v + 1 WHERE id = 'test';

Use a counter...? No. A counter increment is not idempotent, so a retry after a timeout may count twice.

Summarized data

CREATE TABLE countup (
  id text PRIMARY KEY,
  v int
);

UPDATE countup SET v = 101 WHERE id = 'test' IF v = 100;

Use UPDATE with a lightweight transaction (LWT): the write is applied only if v still has the value that was read.
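The conditional update is typically wrapped in a read / compare-and-set / retry loop. The Python sketch below shows that loop against an in-memory stand-in; `FakeTable`, `update_if` and `add` are hypothetical names for this example and not part of any Cassandra driver, and the lock only imitates the atomicity that the LWT provides.

```python
import threading
from typing import Dict


class FakeTable:
    """In-memory stand-in for the countup table (assumption for this sketch)."""

    def __init__(self, rows: Dict[str, int]) -> None:
        self._rows = dict(rows)
        self._lock = threading.Lock()

    def read(self, key: str) -> int:
        with self._lock:
            return self._rows[key]

    def update_if(self, key: str, new_value: int, expected: int) -> bool:
        """Imitates `UPDATE ... SET v = new WHERE id = key IF v = expected`."""
        with self._lock:
            if self._rows[key] != expected:
                return False        # condition failed: another writer got there first
            self._rows[key] = new_value
            return True


def add(table: FakeTable, key: str, amount: int) -> int:
    """Retry the conditional update until it is applied; return the new value."""
    while True:
        current = table.read(key)
        if table.update_if(key, current + amount, current):
            return current + amount


if __name__ == "__main__":
    table = FakeTable({"test": 100})
    workers = [
        threading.Thread(target=lambda: [add(table, "test", 1) for _ in range(100)])
        for _ in range(8)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(table.read("test"))  # 900: no increment is lost or applied twice
```

The trade-off is that each contended update costs extra round trips, which works against the high-speed requirement.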

What is the best?

Thanks!