sasi, cassandra on the full text search ride - duyhai doan - codemotion milan 2016
TRANSCRIPT
@doanduyhai
SASI, Cassandra on the full text search ride DuyHai DOAN Apache Cassandra™ Evangelist Apache Zeppelin™ Commiter
MILAN 25-26 NOVEMBER 2016
@doanduyhai
Datastax
• Founded in April 2010
• We contribute a lot to Apache Cassandra™
• 400+ customers (25 of the Fortune 100), 400+ employees
• Headquarter in San Francisco Bay area
• EU headquarter in London, offices in France and Germany
• Datastax Enterprise = better Apache Cassandra + extra features
2
@doanduyhai
1 5 minutes introduction to Apache Cassandra™ 2 SASI introduction 3 SASI cluster-wide 4 SASI local read/write path 5 Query planner 6 Take away
3
@doanduyhai
Trademark Policy
4
From now on …
Cassandra ⩵ Apache Cassandra™
@doanduyhai
5 minutes introduction to Apache Cassandra™
@doanduyhai
The tokens
6
Random hash of #partition à token = hash(#p)
Hash: ] –x, x ] hash range: 264 values
x = 264/2
C*
C*
C*
C*
C* C*
C* C*
@doanduyhai
Token ranges
7
A : −x,−3x4
⎤
⎦
⎥⎥
⎤
⎦
⎥⎥
B : −3x4
,− 2x4
⎤
⎦
⎥⎥
⎤
⎦
⎥⎥
C : −2x4
,− x4
⎤
⎦
⎥⎥
⎤
⎦
⎥⎥
D : − x4
,0⎤
⎦
⎥⎥
⎤
⎦
⎥⎥
E : 0, x4
⎤
⎦
⎥⎥
⎤
⎦
⎥⎥
F : x4
,2x4
⎤
⎦
⎥⎥
⎤
⎦
⎥⎥
G : 2x4
,3x4
⎤
⎦
⎥⎥
⎤
⎦
⎥⎥
H : 3x4
,x⎤
⎦
⎥⎥
⎤
⎦
⎥⎥
H
A
E
D
B C
G F
@doanduyhai
Distributed tables
8
H
A
E
D
B C
G F
user_id1
user_id2
user_id3
user_id4
user_id5
CREATE TABLE users( user_id int, …, PRIMARY KEY(user_id)),
@doanduyhai
Distributed tables
9
H
A
E
D
B C
G F
user_id1
user_id2
user_id3
user_id4
user_id5
@doanduyhai
Coordinator node
10
Responsible for handling requests (read/write)
Every node can be coordinator• masterless • no SPOF • proxy role
H
A
E
D
B C
G F
coordinator
request
1
2 3
@doanduyhai 11
Q & A
! "
@doanduyhai
SASI introduction
@doanduyhai
What is SASI ?
13
• SSTable-Attached Secondary Index à new 2nd index impl that follows SSTable life-cycle
• Objective: provide more performant & capable 2nd index
@doanduyhai
Who created it ?
14
Open-source contribution by an engineers team
@doanduyhai
Why is it better than native 2nd index ?
15
• follow SSTable life-cycle (flush, compaction, rebuild …) à more optimized• new data-strutures • range query (<, ≤, >, ≥) possible • full text search options
@doanduyhai 16
Demo
@doanduyhai
SASI cluster-wide
@doanduyhai
Distributed index
18
On cluster level, SASI works exactly like native 2nd index
H
A
E
D
B C
G F
UK user1 user102 … user493
US user54 user483 … user938
UK user87 user176 … user987
UK user17 user409 … user787
@doanduyhai
Distributed search algorithm
19
H
A
E
D
B C
G F
coordinator
1st roundConcurrency factor = 1
@doanduyhai
Distributed search algorithm
20
H
A
E
D
B C
G F
coordinator
Not enough results ?
@doanduyhai
Distributed search algorithm
21
H
A
E
D
B C
G F
coordinator
2nd roundConcurrency factor = 2
@doanduyhai
Distributed search algorithm
22
H
A
E
D
B C
G F
coordinator
Still not enough results ?
@doanduyhai
Distributed search algorithm
23
H
A
E
D
B C
G F
coordinator
3rd roundConcurrency factor = 4
@doanduyhai
Caveat 1: non restrictive filters
24
H
A
E
D
B C
G F
coordinator
Hit all nodes
eventually L
@doanduyhai
Caveat 1 solution : always use LIMIT
25
H
A
E
D
B C
G F
coordinator
SELECT * FROM …
WHERE ... LIMIT 1000
@doanduyhai
Caveat 2: 1-to-1 index (user_email)
26
H
A
E
D
B C
G F
coordinator
Not found WHERE user_email = ‘xxx'
@doanduyhai
Caveat 2: 1-to-1 index (user_email)
27
H
A
E
D
B C
G F
coordinator
Still no result
WHERE user_email = ‘xxx'
@doanduyhai
Caveat 2: 1-to-1 index (user_email)
28
H
A
E
D
B C
G F
coordinator
At best 1 user foundAt worst 0 user found
WHERE user_email = ‘xxx'
@doanduyhai
Caveat 2 solution: materialized views
29
For 1-to-1 index/relationship, use materialized views instead
CREATE MATERIALIZED VIEW user_by_email ASSELECT * FROM usersWHERE user_id IS NOT NULL and user_email IS NOT NULLPRIMARY KEY (user_email, user_id)
But range queries ( <, >, ≤, ≥) not possible …
@doanduyhai
Caveat 3: fetch all rows for analytics use-case
30
H
A
E
D
B C
G F
coordinator
Client
@doanduyhai
Caveat 3 solution: use co-located Spark
31
H
A
E
D
B C
G F
Local index filtering in Cassandra Aggregation in Spark
Local index query
@doanduyhai
SASI local read/write path
@doanduyhai
SASI Life-cycle: in-memory
33
Commit log1
. . .
1
Commit log2
Commit logn
Memory
. . . MemTable Table1
MemTable Table2
MemTable TableN
2
Index MemTable1
Index MemTable2
. . . Index
MemTableN 3
ACK the client
@doanduyhai
Local write path data structures
34
Index mode, data type Data structure Usage PREFIX, text Guava ConcurrentRadixTree name LIKE 'John%'
CONTAINS, text Guava ConcurrentSuffixTree name LIKE ’%John%'name LIKE ’%ny’
PREFIX, other JDK ConcurrentSkipListSet age = 20age >= 20 AND age <= 30
SPARSE, other JDK ConcurrentSkipListSet age = 20age >= 20 AND age <= 30
suitable for 1-to-N index with N ≤ 5
@doanduyhai
SASI Life-cycle: flush to SSTable
35
Commit log1
. . .
1
Commit log2
Commit logn
Memory
Table1
SStable1
Table2 Table3
SStable2 SStable3 4
OnDiskIndex1
OnDiskIndex2 OnDiskIndex3
@doanduyhai
OnDiskIndex files
36
SStable1
SStable2
user_id4 FR user_id1 US user_id5 FR
user_id3 UK user_id2 DE
OnDiskIndex1
FR US
OnDiskIndex2
UK DE
B+Tree-like data structures
@doanduyhai
Local read path
37
• first, optimize query using Query Planer (see later)• then load chunks (4k) of index files from disk into memory• perform binary search to find the indexed value(s)• retrieve the offset of the partition from the original SSTable• read the original data
@doanduyhai
Query Planner
@doanduyhai
Query planner
39
• build predicates tree• predicates push-down & re-ordering• predicate fusions for != operator
@doanduyhai
Query optimization example
40
WHERE age < 100 AND fname LIKE 'p%' AND fname != 'pa%' AND age > 21
@doanduyhai
Query optimization example
41
AND is associative and commutative
@doanduyhai
Query optimization example
42
!= transformed to exclusion on range scan
@doanduyhai
Query optimization example
43
AND is associative and commutative
@doanduyhai
Take Away
@doanduyhai
Index mode caveat
Beware of disk space usage for full text search !!!Table albums with ≈ 110 000 records, 6.8Mb data size
45
@doanduyhai
SASI vs search engines
SASI vs Solr/ElasticSearch ?
• Cassandra is not a search engine !!! (database = durability) • always slower because 2 passes (SASI index read + original Cassandra data)• no scoring • no ordering (ORDER BY)• no grouping (GROUP BY) à Apache Spark for analytics
If you don’t need the above features, SASI is for you!
46
@doanduyhai
SASI sweet spots
SASI is a relevant choice if
• you need multi criteria search and you don't need ordering/grouping/scoring• you mostly need 100 to 10000 of rows for your search queries• you always know the partition keys of the rows to be searched for (this one applies to
native secondary index too)
47
@doanduyhai
SASI blind spots
SASI is a poor choice if
• you have strong SLA on search latency, for example few millisecs requirement• ordering of the search results is important for you
48
@doanduyhai 49
Q & A
! "