Cassandra data modeling basics for data that does not suit Cassandra
TRANSCRIPT
![Page 1: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/1.jpg)
Better (rather than Best) practice of Cassandra
Cassandra data modeling basics for data that does not suit Cassandra
Hayato Tsutsumi
Works Applications
![Page 2: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/2.jpg)
Speaker
Hayato Tsutsumi (堤 勇人)
Cassandra experience: 7 years (since ver. 0.6)
Certification for Apache Cassandra Administrator
Data size: about 40 TB
Nodes: about 40 (will increase soon...)
Twitter: 2t3
Site Reliability Engineering Div., Works Applications Co., Ltd.
![Page 3: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/3.jpg)
Target
● Mid-range system
Data size: 1 TB to 1 PB
Data amount: 10 million to 100 billion
+ high-speed processing
![Page 4: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/4.jpg)
What is better?
![Page 5: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/5.jpg)
Best practice = the right tool in the right place
Using each tool where it fits is certainly the best.
Suitable Data
![Page 6: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/6.jpg)
But not all data is suitable
So what about the parts that are not the best fit?
Our system
Suitable Data
Unsuitable data for Cassandra
![Page 7: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/7.jpg)
Use both Cassandra and RDB?
O* or M*
Suitable Data
![Page 8: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/8.jpg)
You may use only RDB...
O* or M*
Suitable Data
![Page 9: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/9.jpg)
Another way : Manage data only with Cassandra
Suitable Data
![Page 10: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/10.jpg)
3 models that do not look like NoSQL
● Historical data
● Tree structure
● Summarized data
![Page 11: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/11.jpg)
How Cassandra reads data, in 3 minutes
Prerequisites
![Page 12: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/12.jpg)
Partition key & clustering key
CREATE TABLE test_table (
  pkA text,
  pkB text,
  ckA text,
  ckB text,
  v text,
  w text,
  PRIMARY KEY ((pkA, pkB), ckA, ckB)
);
Partition key: (pkA, pkB). Clustering key: ckA, ckB.

On the table:

| pkA | pkB | ckA | ckB | v | w |
|---|---|---|---|---|---|
| pkA1 | pkB1 | ckA1 | ckB1 | v1 | w1 |
| pkA1 | pkB1 | ckA1 | ckB2 | v2 | w2 |
| pkA1 | pkB2 | ckA3 | ckB3 | v3 | w3 |

On Cassandra:

hash(pkA1,pkB1)

| Column | ckA1:ckB1:v | ckA1:ckB1:w | ckA1:ckB2:v | ckA1:ckB2:w |
|---|---|---|---|---|
| Value | v1 | w1 | v2 | w2 |

hash(pkA1,pkB2)

| Column | ckA3:ckB3:v | ckA3:ckB3:w |
|---|---|---|
| Value | v3 | w3 |

Cassandra can search by column name.
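The mapping from the table view to the storage view can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `to_partitions` is made up, the partition key tuple stands in for its hash, and cells are sorted as plain strings rather than by Cassandra's per-column comparators.

```python
from collections import defaultdict

# Logical rows as shown in the table view: (pkA, pkB, ckA, ckB, v, w).
rows = [
    ("pkA1", "pkB1", "ckA1", "ckB1", "v1", "w1"),
    ("pkA1", "pkB1", "ckA1", "ckB2", "v2", "w2"),
    ("pkA1", "pkB2", "ckA3", "ckB3", "v3", "w3"),
]


def to_partitions(rows):
    """Group rows by partition key and flatten each row into named cells."""
    partitions = defaultdict(dict)
    for pk_a, pk_b, ck_a, ck_b, v, w in rows:
        cells = partitions[(pk_a, pk_b)]  # real Cassandra hashes this key
        cells[f"{ck_a}:{ck_b}:v"] = v
        cells[f"{ck_a}:{ck_b}:w"] = w
    # Inside a partition the cells are kept in clustering-key order
    # (approximated here by sorting the cell names as strings).
    return {pk: dict(sorted(cells.items())) for pk, cells in partitions.items()}


partitions = to_partitions(rows)
```

Because the cell names start with the clustering key values, a lookup inside one partition becomes a search over sorted names, which is why the clustering columns can only be restricted from left to right.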
![Page 13: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/13.jpg)
Partition key & clustering key
CREATE TABLE test_table (
  pkA text,
  pkB int,
  ckA int,
  ckB int,
  v text,
  w text,
  PRIMARY KEY ((pkA, pkB), ckA, ckB)
);
Partition key: (pkA, pkB). Clustering key: ckA, ckB.
![Page 14: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/14.jpg)
Partition key & clustering key
CREATE TABLE test_table ( pkA text, pkB int, ckA int, ckB int, v text, w text, PRIMARY KEY ((pkA, pkB), ckA, ckB));
Partition key: (pkA, pkB). Clustering key: ckA, ckB.
In the conditions below, pkA1, pkB1, ckA1 and so on stand in for real values. CQL string literals take single quotes, and values for the int columns would be unquoted numbers.

Equality on the key columns:
where pkA = 'pkA1'; // NG
where pkA = 'pkA1' and pkB = 'pkB1'; // OK
where pkA = 'pkA1' and pkB = 'pkB1' and ckA = 'ckA1'; // OK
where pkA = 'pkA1' and pkB = 'pkB1' and ckA = 'ckA1' and ckB = 'ckB1'; // OK

Skipping a key column:
where pkA = 'pkA1' and ckA = 'ckA1'; // NG
where pkA = 'pkA1' and pkB = 'pkB1' and ckB = 'ckB1'; // NG

Range conditions:
where pkA = 'pkA1' and pkB >= 'pkB1'; // NG
where pkA = 'pkA1' and pkB = 'pkB1' and ckA >= 'ckA1'; // OK
where pkA = 'pkA1' and pkB = 'pkB1' and ckA = 'ckA1' and ckB >= 'ckB1'; // OK
where pkA = 'pkA1' and pkB = 'pkB1' and ckA >= 'ckA1' and ckB = 'ckB1'; // NG
where pkA = 'pkA1' and pkB = 'pkB1' and ckA >= 'ckA1' and ckA < 'ckA2'; // OK
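The OK/NG pattern above follows a small set of rules, which the following Python sketch tries to capture. It is a simplified model, not Cassandra's actual validation: the function `is_allowed` and the "eq"/"range" labels are made up for illustration, and secondary indexes and ALLOW FILTERING are ignored.

```python
PARTITION_KEYS = ["pkA", "pkB"]
CLUSTERING_KEYS = ["ckA", "ckB"]


def is_allowed(restrictions):
    """Check a WHERE clause given as {column: "eq" or "range"}."""
    # Every partition key column must be restricted by equality.
    if any(restrictions.get(col) != "eq" for col in PARTITION_KEYS):
        return False
    seen_gap = False    # an earlier clustering column was left unrestricted
    seen_range = False  # an earlier clustering column used a range
    for col in CLUSTERING_KEYS:
        kind = restrictions.get(col)
        if kind is None:
            seen_gap = True
            continue
        # Nothing may follow a skipped column or a range restriction.
        if seen_gap or seen_range:
            return False
        if kind == "range":
            seen_range = True
    return True
```

For example, `is_allowed({"pkA": "eq", "pkB": "eq", "ckA": "range"})` is accepted, while adding `"ckB": "eq"` to the same condition is rejected, matching the last NG case in the list. A lower and an upper bound on the same column count as one range.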
![Page 15: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/15.jpg)
Historical data
photo by Bryan Wright (https://secure.flickr.com/photos/spidermandragon5/2922128673/)
![Page 16: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/16.jpg)
Employee transfer history
[Timeline figure: each employee's division over time]
emp001: A Div. → B Div. → C Div.
emp002: A Div. → C Div. → D Div.
emp003: D Div. → E Div.
Dates on the axis: 2/1, 2/21, 3/11, 4/1, 4/16, 5/1
![Page 17: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/17.jpg)
Employee transfer history

at 3/25

| emp001 | emp002 | emp003 |
|---|---|---|
| B Div. | A Div. | E Div. |

at 4/25

| emp001 | emp002 | emp003 |
|---|---|---|
| C Div. | D Div. | E Div. |
![Page 18: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/18.jpg)
emp_history table
CREATE TABLE emp_history (
  id text,
  no text,
  s date,
  e date,
  div text,
  PRIMARY KEY (id, s, e, no)
);
select * from emp_history where id = 'test' and s <= '2017-03-25' and e > '2017-03-25'; // NG: e cannot be restricted after a range condition on s
?
![Page 19: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/19.jpg)
emp_history table
CREATE TABLE emp_history (
  id text,
  no text,
  s date,
  e date,
  div text,
  PRIMARY KEY (id, s, no)
);
CREATE CUSTOM INDEX fn_e ON emp_history (e) USING 'org.apache.cassandra.index.sasi.SASIIndex';
select * from emp_history where id = 'test' and s <= '2017-03-25' and e > '2017-03-25'; // OK
Use a custom (SASI) index on e.
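The condition `s <= day and e > day` picks the one period that contains a given day. As a rough illustration of that lookup (not of the SASI index itself), here is a Python sketch over an in-memory list sorted by start date. The dates are hypothetical, periods are assumed not to overlap, and `division_at` is a made-up helper name.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical history of one employee: (start, end, division), sorted by start.
# The end date is exclusive; all dates are invented for this example.
history = [
    (date(2017, 2, 1), date(2017, 3, 11), "A Div."),
    (date(2017, 3, 11), date(2017, 4, 16), "B Div."),
    (date(2017, 4, 16), date(9999, 12, 31), "C Div."),
]


def division_at(history, day):
    """Return the division whose period satisfies s <= day < e, or None."""
    starts = [s for s, _, _ in history]
    i = bisect_right(starts, day) - 1  # last row with s <= day
    if i < 0:
        return None
    _, end, division = history[i]
    return division if day < end else None
```

With non-overlapping periods, only the newest row with `s <= day` can match, so the check on `e` is applied to a single row.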
![Page 20: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/20.jpg)
Tree structure
![Page 21: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/21.jpg)
Organization tree
A Div.
- a Dept.: 1 Sec., 2 Sec.
- b Dept.: 3 Sec., 4 Sec., 5 Sec.
Well-known models
● Adjacency list
● Path enumeration
● Nested set
● Closure table
![Page 22: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/22.jpg)
Criteria
● No JOINs, no recursive queries
● Consistency has to be ensured by some other means anyway
● Jaywalking and denormalization are the norm
![Page 23: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/23.jpg)
Requirements for the tree model
● show ancestors (all parents from a node up to the root)
● show children (expand one level)
● show descendants (expand all levels)
● show siblings
![Page 24: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/24.jpg)
Organization tree
Well-known models
● Adjacency list
● Path enumeration
● Nested set
● Closure table
Worth considering!
![Page 25: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/25.jpg)
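Path enumeration
CREATE TABLE pathenum (
  id text,
  fqdn text,
  child text,
  code text,
  PRIMARY KEY (id, fqdn)
);

| id | fqdn | child | code |
|---|---|---|---|
| test | A | [a,b] | A |
| test | A:a | [1,2] | a |
| test | A:b | [3,4,5] | b |
| test | A:a:1 | | 1 |
| test | A:a:2 | | 2 |
| test | A:b:3 | | 3 |
| test | A:b:4 | | 4 |
| test | A:b:5 | | 5 |

Tree: A has children a (1, 2) and b (3, 4, 5).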
![Page 26: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/26.jpg)
Path enumeration
select * from pathenum where id = 'test' and fqdn like 'A:%'; // NG
Listing descendants needs a LIKE (prefix) search.
![Page 27: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/27.jpg)
Path enumeration
select * from pathenum where id = 'test' and fqdn like 'A:%'; // NG
select * from pathenum where id = 'test' and fqdn >= 'A:' and fqdn < 'A;'; // OK
':' is U+003A and ';' is U+003B, the next code point, so the range covers every fqdn that starts with 'A:'.
The LIKE search is replaced by a range on the clustering key.
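The ':' to ';' trick generalizes to any prefix: bump the last character by one code point to get an exclusive upper bound. A minimal Python sketch follows, assuming ASCII-only paths so that Python's string order agrees with the byte order used for text columns; `prefix_range` is a made-up helper name.

```python
def prefix_range(prefix):
    """Return (lower, upper) such that lower <= s < upper iff s starts with prefix."""
    if not prefix:
        raise ValueError("prefix must not be empty")
    # Bump the last character: ':' (U+003A) becomes ';' (U+003B).
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, upper


# The fqdn values of the example tree, in clustering order.
fqdns = sorted(["A", "A:a", "A:b", "A:a:1", "A:a:2", "A:b:3", "A:b:4", "A:b:5"])

lower, upper = prefix_range("A:")
descendants = [f for f in fqdns if lower <= f < upper]
```

The two bounds map directly onto the `fqdn >= ... and fqdn < ...` condition of the OK query. The sketch does not handle a prefix whose last character is already the highest code point.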
![Page 28: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/28.jpg)
Path enumeration
// show ancestors (on the client, no query needed)
fqdn.split(":");
// show children of a
select child from pathenum where id = 'test' and fqdn = 'A:a';
// show descendants of A
select * from pathenum where id = 'test' and fqdn >= 'A:' and fqdn < 'A;';
// show siblings of a (read the child list of a's parent A)
select child from pathenum where id = 'test' and fqdn = 'A';
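The four operations can be tried out against an in-memory copy of the example rows. This Python sketch is an illustration only: the `PATHENUM` dictionary stands in for the partition `id = 'test'`, the `child` column is modeled as a Python list, and `siblings` drops the node itself from its parent's child list, which the query alone does not do.

```python
# fqdn -> child codes, mirroring the pathenum rows for id = 'test'.
PATHENUM = {
    "A": ["a", "b"],
    "A:a": ["1", "2"],
    "A:b": ["3", "4", "5"],
    "A:a:1": [], "A:a:2": [],
    "A:b:3": [], "A:b:4": [], "A:b:5": [],
}


def ancestors(fqdn):
    """Ancestor paths from the root down to the parent; needs no table access."""
    parts = fqdn.split(":")
    return [":".join(parts[:i]) for i in range(1, len(parts))]


def children(fqdn):
    """One read of the row's child column."""
    return PATHENUM[fqdn]


def descendants(fqdn):
    """Range scan: fqdn + ':' <= key < fqdn + ';'."""
    lower, upper = fqdn + ":", fqdn + ";"
    return sorted(key for key in PATHENUM if lower <= key < upper)


def siblings(fqdn):
    """Read the parent's child list and drop the node itself."""
    parent, _, code = fqdn.rpartition(":")
    if not parent:
        return []  # the root has no siblings
    return [c for c in PATHENUM[parent] if c != code]
```

Each function touches at most one row or one contiguous range, which is the "one access" property of this model.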
![Page 29: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/29.jpg)
Path enumeration
Pros & cons
Pros
● one access
Cons
● hot spot
● range slice
● complex update process
![Page 30: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/30.jpg)
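Closure table
CREATE TABLE closure_main (
  id text,
  v text,
  PRIMARY KEY (id)
);
CREATE TABLE closure_path (
  p text,
  c text,
  d int,
  PRIMARY KEY (p, d, c)
);

closure_main:

| id | v |
|---|---|
| A | A Div. |
| a | a Dept. |
| b | b Dept. |
| 1 | 1 Sec. |
| 2 | 2 Sec. |
| 3 | 3 Sec. |
| 4 | 4 Sec. |
| 5 | 5 Sec. |

closure_path (p: ancestor, c: descendant, d: distance between them):

| p | c | d |
|---|---|---|
| A | A | 0 |
| A | a | 1 |
| A | b | 1 |
| A | 1 | 2 |
| A | 2 | 2 |
| A | 3 | 2 |
| A | 4 | 2 |
| A | 5 | 2 |
| a | a | 0 |
| a | 1 | 1 |
| a | 2 | 1 |
| 1 | 1 | 0 |
| 2 | 2 | 0 |
| b | b | 0 |
| b | 3 | 1 |
| b | 4 | 1 |
| b | 5 | 1 |
| 3 | 3 | 0 |
| 4 | 4 | 0 |
| 5 | 5 | 0 |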
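The closure_path rows can be derived mechanically from a parent map: every node gets one row for itself and one for each of its ancestors. A short Python sketch, with `PARENT` and `closure_rows` as illustrative names:

```python
# Parent of each node in the example tree (the root has no parent).
PARENT = {
    "A": None,
    "a": "A", "b": "A",
    "1": "a", "2": "a",
    "3": "b", "4": "b", "5": "b",
}


def closure_rows(parent):
    """Return (p, c, d) rows: p is c itself or an ancestor of c, at distance d."""
    rows = []
    for node in parent:
        ancestor, distance = node, 0
        while ancestor is not None:  # walk up to the root
            rows.append((ancestor, node, distance))
            ancestor, distance = parent[ancestor], distance + 1
    return rows


rows = closure_rows(PARENT)
```

For the eight nodes of the example this yields the same 20 rows as the table above, though in a different order.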
![Page 31: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/31.jpg)
Closure table
closure_main and closure_path are the same tables as on the previous slide, plus an index on c:
CREATE CUSTOM INDEX fn_c ON test.closure_path (c) USING 'org.apache.cassandra.index.sasi.SASIIndex';
// show ancestors
select p from closure_path where c = '1';
select * from closure_main where id in ?; // bind the ids returned by the previous query
// show children of a
select c from closure_path where p = 'a' and d = 1;
select * from closure_main where id in ?;
// show descendants of A
select c from closure_path where p = 'A';
select * from closure_main where id in ?;
// show siblings of a
// load a's parent (the row with d = 1), which is A
select * from closure_path where c = 'a';
select c from closure_path where p = 'A' and d = 1;
select * from closure_main where id in ?;
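To see what these lookups return, the same four operations can be run over an in-memory copy of the closure_path rows. This Python sketch only mirrors the filtering logic: list comprehensions stand in for the queries, self rows (d = 0) are filtered out on the client, and the final lookup into closure_main is left out.

```python
# closure_path rows (p, c, d) of the example tree.
ROWS = [
    ("A", "A", 0), ("A", "a", 1), ("A", "b", 1),
    ("A", "1", 2), ("A", "2", 2), ("A", "3", 2), ("A", "4", 2), ("A", "5", 2),
    ("a", "a", 0), ("a", "1", 1), ("a", "2", 1),
    ("b", "b", 0), ("b", "3", 1), ("b", "4", 1), ("b", "5", 1),
    ("1", "1", 0), ("2", "2", 0), ("3", "3", 0), ("4", "4", 0), ("5", "5", 0),
]


def ancestors(node):
    # Like "where c = ?" (served by the index on c), minus the self row.
    return [p for p, c, d in ROWS if c == node and d > 0]


def children(node):
    # Like "where p = ? and d = 1".
    return [c for p, c, d in ROWS if p == node and d == 1]


def descendants(node):
    # Like "where p = ?", minus the self row.
    return [c for p, c, d in ROWS if p == node and d > 0]


def siblings(node):
    # Two steps: find the parent (the row with d = 1), then list its children.
    parents = [p for p, c, d in ROWS if c == node and d == 1]
    if not parents:
        return []  # the root has no siblings
    return [c for c in children(parents[0]) if c != node]
```

The sibling lookup shows where the extra accesses come from: one read through the index to find the parent, one read of the parent's partition, and then the lookup of the display values.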
![Page 32: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/32.jpg)
Closure table
Pros & cons
Pros
● distributed
● get access (no range slice)
Cons
● needs an index
● 2 to 3 accesses per operation
● more data
● complex update process
![Page 33: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/33.jpg)
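How does the data grow?
Assume n children per node and a tree of depth d. Each node stores one row for itself plus one row per ancestor, so the number of rows per node is proportional to d, and the closure table holds roughly d times as many rows as there are nodes.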
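The growth can be checked with a small calculation. The Python sketch below assumes a complete tree with levels 0 to d, where level k holds n^k nodes and each of them contributes k + 1 rows; `closure_size` is a made-up name.

```python
def closure_size(n, d):
    """Return (nodes, rows) for a complete tree with n children per node, levels 0..d."""
    nodes = sum(n ** k for k in range(d + 1))
    # A node at level k has k ancestors plus its own self row.
    rows = sum((k + 1) * n ** k for k in range(d + 1))
    return nodes, rows
```

For example, `closure_size(2, 2)` gives 7 nodes and 17 rows. The ratio of rows to nodes never exceeds d + 1, so the overhead grows with the depth of the tree rather than with its width.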
![Page 34: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/34.jpg)
Summarized data
![Page 35: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/35.jpg)
Aggregation of slips
Dr.: A 200
Cr.: B 50, C 150
![Page 36: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/36.jpg)
Aggregation of slips
parallel batch processing
aggregation
online streaming
![Page 37: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/37.jpg)
Requirements
● miscalculation = critical
● need parallel batch processing and online streaming processing
● need high-speed processing
![Page 38: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/38.jpg)
Requirements
All of the requirements above = Consistency!
![Page 39: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/39.jpg)
Summarized data
CREATE TABLE countup (
  id text PRIMARY KEY,
  v counter
);
UPDATE countup SET v = v + 1 WHERE id = 'test';
Use a counter...? No.
![Page 40: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/40.jpg)
Summarized data
CREATE TABLE countup (
  id text PRIMARY KEY,
  v int
);
UPDATE countup SET v = 101 WHERE id = 'test' IF v = 100;
Use UPDATE with a lightweight transaction (LWT).
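On the client side, a conditional update like this is typically wrapped in a read, compute, write-if-unchanged loop that retries when another writer got in first. The Python sketch below simulates that loop against an in-memory store. The `Store` class is a stand-in and not a Cassandra client, and treating a missing row as 0 is a simplification made for the example.

```python
import threading


class Store:
    """In-memory stand-in for the countup table; not a Cassandra client."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}

    def read(self, key):
        with self._lock:
            return self._values.get(key, 0)

    def update_if(self, key, expected, new):
        """Mimic UPDATE ... IF v = expected; return True when applied."""
        with self._lock:
            if self._values.get(key, 0) != expected:
                return False
            self._values[key] = new
            return True


def add(store, key, amount):
    """Read, compute, write conditionally; retry when another writer won."""
    while True:
        current = store.read(key)
        if store.update_if(key, current, current + amount):
            return current + amount


store = Store()


def worker():
    for _ in range(100):
        add(store, "test", 1)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Eight workers adding 1 a hundred times each always end at 800, because an update based on a stale read is rejected and retried instead of overwriting someone else's result.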
![Page 41: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/41.jpg)
What is the best?
![Page 42: Cassandraに不向きなcassandraデータモデリング基礎](https://reader031.vdocuments.mx/reader031/viewer/2022030318/5a6551277f8b9aff1a8b49c1/html5/thumbnails/42.jpg)
Thanks!