sasi, cassandra on the full text search ride - duyhai doan - codemotion milan 2016

@doanduyhai

SASI, Cassandra on the full text search ride DuyHai DOAN Apache Cassandra™ Evangelist Apache Zeppelin™ Commiter

MILAN 25-26 NOVEMBER 2016

@doanduyhai

Datastax

•  Founded in April 2010

•  We contribute a lot to Apache Cassandra™

•  400+ customers (25 of the Fortune 100), 400+ employees

•  Headquarter in San Francisco Bay area

•  EU headquarter in London, offices in France and Germany

•  Datastax Enterprise = better Apache Cassandra + extra features

2

@doanduyhai

1 5 minutes introduction to Apache Cassandra™ 2 SASI introduction 3 SASI cluster-wide 4 SASI local read/write path 5 Query planner 6 Take away

3

@doanduyhai

Trademark Policy

4

From now on …

Cassandra ⩵ Apache Cassandra™

@doanduyhai

5 minutes introduction to Apache Cassandra™

@doanduyhai

The tokens

6

Random hash of #partition à token = hash(#p)

Hash: ] –x, x ] hash range: 264 values

x = 264/2

C*

C*

C*

C*

C* C*

C* C*

@doanduyhai

Token ranges

7

A : −x,−3x4

⎤

⎦

⎥⎥

⎤

⎦

⎥⎥

B : −3x4

,− 2x4

⎤

⎦

⎥⎥

⎤

⎦

⎥⎥

C : −2x4

,− x4

⎤

⎦

⎥⎥

⎤

⎦

⎥⎥

D : − x4

,0⎤

⎦

⎥⎥

⎤

⎦

⎥⎥

E : 0, x4

⎤

⎦

⎥⎥

⎤

⎦

⎥⎥

F : x4

,2x4

⎤

⎦

⎥⎥

⎤

⎦

⎥⎥

G : 2x4

,3x4

⎤

⎦

⎥⎥

⎤

⎦

⎥⎥

H : 3x4

,x⎤

⎦

⎥⎥

⎤

⎦

⎥⎥

H

A

E

D

B C

G F

@doanduyhai

Distributed tables

8

H

A

E

D

B C

G F

user_id1

user_id2

user_id3

user_id4

user_id5

CREATE TABLE users( user_id int, …, PRIMARY KEY(user_id)),

@doanduyhai

Distributed tables

9

H

A

E

D

B C

G F

user_id1

user_id2

user_id3

user_id4

user_id5

@doanduyhai

Coordinator node

10

Responsible for handling requests (read/write)

Every node can be coordinator• masterless • no SPOF • proxy role

H

A

E

D

B C

G F

coordinator

request

1

2 3

@doanduyhai 11

Q & A

! "

@doanduyhai

SASI introduction

@doanduyhai

What is SASI ?

13

•  SSTable-Attached Secondary Index à new 2nd index impl that follows SSTable life-cycle

•  Objective: provide more performant & capable 2nd index

@doanduyhai

Who created it ?

14

Open-source contribution by an engineers team

@doanduyhai

Why is it better than native 2nd index ?

15

•  follow SSTable life-cycle (flush, compaction, rebuild …) à more optimized•  new data-strutures •  range query (<, ≤, >, ≥) possible •  full text search options

@doanduyhai 16

Demo

@doanduyhai

SASI cluster-wide

@doanduyhai

Distributed index

18

On cluster level, SASI works exactly like native 2nd index

H

A

E

D

B C

G F

UK user1 user102 … user493

US user54 user483 … user938



@doanduyhai

Distributed search algorithm

19

H

A

E

D

B C

G F

coordinator

1st roundConcurrency factor = 1

@doanduyhai


20

H

A

E

D

B C

G F

coordinator

Not enough results ?

@doanduyhai


21

H

A

E

D

B C

G F

coordinator

2nd roundConcurrency factor = 2

@doanduyhai


22

H

A

E

D

B C

G F

coordinator

Still not enough results ?

@doanduyhai


23

H

A

E

D

B C

G F

coordinator

3rd roundConcurrency factor = 4

@doanduyhai

Caveat 1: non restrictive filters

24

H

A

E

D

B C

G F

coordinator

Hit all nodes

eventually L

@doanduyhai

Caveat 1 solution : always use LIMIT

25

H

A

E

D

B C

G F

coordinator

SELECT * FROM …

WHERE ... LIMIT 1000

@doanduyhai

Caveat 2: 1-to-1 index (user_email)

26

H

A

E

D

B C

G F

coordinator

Not found WHERE user_email = ‘xxx'

@doanduyhai


27

H

A

E

D

B C

G F

coordinator

Still no result

WHERE user_email = ‘xxx'

@doanduyhai


28

H

A

E

D

B C

G F

coordinator

At best 1 user foundAt worst 0 user found

WHERE user_email = ‘xxx'

@doanduyhai

Caveat 2 solution: materialized views

29

For 1-to-1 index/relationship, use materialized views instead

CREATE MATERIALIZED VIEW user_by_email ASSELECT * FROM usersWHERE user_id IS NOT NULL and user_email IS NOT NULLPRIMARY KEY (user_email, user_id)

But range queries ( <, >, ≤, ≥) not possible …

@doanduyhai

Caveat 3: fetch all rows for analytics use-case

30

H

A

E

D

B C

G F

coordinator

Client

@doanduyhai

Caveat 3 solution: use co-located Spark

31

H

A

E

D

B C

G F

Local index filtering in Cassandra Aggregation in Spark

Local index query

@doanduyhai

SASI local read/write path

@doanduyhai

SASI Life-cycle: in-memory

33

Commit log1

. . .

1

Commit log2

Commit logn

Memory

. . . MemTable Table1

MemTable Table2

MemTable TableN

2

Index MemTable1

Index MemTable2

. . . Index

MemTableN 3

ACK the client

@doanduyhai

Local write path data structures

34

Index mode, data type Data structure Usage PREFIX, text Guava ConcurrentRadixTree name LIKE 'John%'

CONTAINS, text Guava ConcurrentSuffixTree name LIKE ’%John%'name LIKE ’%ny’

PREFIX, other JDK ConcurrentSkipListSet age = 20age >= 20 AND age <= 30

SPARSE, other JDK ConcurrentSkipListSet age = 20age >= 20 AND age <= 30

suitable for 1-to-N index with N ≤ 5

@doanduyhai

SASI Life-cycle: flush to SSTable

35

Commit log1

. . .

1

Commit log2

Commit logn

Memory

Table1

SStable1

Table2 Table3

SStable2 SStable3 4

OnDiskIndex1

OnDiskIndex2 OnDiskIndex3

@doanduyhai

OnDiskIndex files

36

SStable1

SStable2

user_id4 FR user_id1 US user_id5 FR

user_id3 UK user_id2 DE

OnDiskIndex1

FR US

OnDiskIndex2

UK DE

B+Tree-like data structures

@doanduyhai

Local read path

37

•  first, optimize query using Query Planer (see later)•  then load chunks (4k) of index files from disk into memory•  perform binary search to find the indexed value(s)•  retrieve the offset of the partition from the original SSTable•  read the original data

@doanduyhai

Query Planner

@doanduyhai

Query planner

39

•  build predicates tree•  predicates push-down & re-ordering•  predicate fusions for != operator

@doanduyhai

Query optimization example

40

WHERE age < 100 AND fname LIKE 'p%' AND fname != 'pa%' AND age > 21

@doanduyhai


41

AND is associative and commutative

@doanduyhai


42

!= transformed to exclusion on range scan

@doanduyhai


43

AND is associative and commutative

@doanduyhai

Take Away

@doanduyhai

Index mode caveat

Beware of disk space usage for full text search !!!Table albums with ≈ 110 000 records, 6.8Mb data size

45

@doanduyhai

SASI vs search engines

SASI vs Solr/ElasticSearch ?

•  Cassandra is not a search engine !!! (database = durability) •  always slower because 2 passes (SASI index read + original Cassandra data)•  no scoring •  no ordering (ORDER BY)•  no grouping (GROUP BY) à Apache Spark for analytics

If you don’t need the above features, SASI is for you!

46

@doanduyhai

SASI sweet spots

SASI is a relevant choice if

•  you need multi criteria search and you don't need ordering/grouping/scoring•  you mostly need 100 to 10000 of rows for your search queries•  you always know the partition keys of the rows to be searched for (this one applies to

native secondary index too)

47

@doanduyhai

SASI blind spots

SASI is a poor choice if

•  you have strong SLA on search latency, for example few millisecs requirement•  ordering of the search results is important for you

48

@doanduyhai 49

Q & A

! "

@doanduyhai 50

Thank You @doanduyhai

[email protected]

sasi, cassandra on the full text search ride - duyhai doan - codemotion milan 2016

Technology