Scylla Summit 2017: Planning Your Queries for Maximum Performance

Shlomi Livne, VP R&D, ScyllaDB

Upload: scylladb

Post on 22-Jan-2018


TRANSCRIPT

Page 1: Scylla Summit 2017: Planning Your Queries for Maximum Performance

Planning your queries for maximum performance

Shlomi Livne, VP R&D, ScyllaDB

Page 2: Scylla Summit 2017: Planning Your Queries for Maximum Performance

Shlomi Livne

Shlomi is VP of R&D at ScyllaDB. Prior to ScyllaDB, he led the research and development team at Convergin, which was acquired by Oracle.

Page 3: Scylla Summit 2017: Planning Your Queries for Maximum Performance


How Scylla executes your queries

Page 4: Scylla Summit 2017: Planning Your Queries for Maximum Performance

Cluster View

[Diagram: a client connected to a cluster of nodes numbered 1 to 8; the legend marks coordinator and replica nodes.]

Page 5: Scylla Summit 2017: Planning Your Queries for Maximum Performance

Coordinator Tasks

1. Prepare the statement
2. Single partition queries
   a. Select replicas (using cache heat info) and send query / digest requests, requesting a page of results
   b. Compare the digests; if there is a mismatch:
      i. Request data from the selected replicas
      ii. Repair the data on the replicas
   c. Return the result
3. Partition scan queries
   a. Split the request up based on the ring
   b. Send requests for data using ranges, requesting a page of results
   c. Merge the results
   d. Return the result
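The single partition path above (compare digests, and on a mismatch fetch data and repair) can be sketched roughly as follows. This is a simplified illustration, not Scylla's implementation: the row representation (column mapped to a value and a write timestamp), the use of SHA-256 as the digest, and the names `digest`, `reconcile`, and `coordinator_read` are all assumptions made for the example.

```python
import hashlib


def digest(row):
    """Hash a row deterministically. A row is {column: (value, write_timestamp)}.
    SHA-256 is an assumption; the slides do not name the digest algorithm."""
    canonical = repr(sorted(row.items())).encode()
    return hashlib.sha256(canonical).hexdigest()


def reconcile(rows):
    """Merge replica rows cell by cell, keeping the newest write per column."""
    merged = {}
    for row in rows:
        for column, (value, ts) in row.items():
            if column not in merged or ts > merged[column][1]:
                merged[column] = (value, ts)
    return merged


def coordinator_read(replicas):
    """replicas: list of dicts, one row per replica.

    The first replica answers a data request, the others answer digest requests.
    Returns (row, repaired)."""
    data = replicas[0]
    digests = [digest(r) for r in replicas[1:]]
    if all(d == digest(data) for d in digests):
        return dict(data), False        # digests match: return the data as is
    merged = reconcile(replicas)        # mismatch: request full data and merge
    for replica in replicas:            # repair: write the merged row back
        replica.clear()
        replica.update(merged)
    return merged, True
```

For example, with one replica holding `{"A": (8, 1)}` and another holding `{"A": (8, 1), "B": (7, 2)}`, the first read detects the mismatch and repairs both replicas; a second read then finds matching digests.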

Page 6: Scylla Summit 2017: Planning Your Queries for Maximum Performance

Replica Tasks

1. Receive a data/digest/range request
2. Split the request up according to shards
3. On each shard:
   a. Execute the request, merging data from memtables + cache/sstables
   b. For a data request:
      i. prepare a result and return it (compute digest if RF > 1)
   c. For a digest request:
      i. compute the digest and return it
   d. For a partition scan request:
      i. return the partition range data (do not prepare a result)

Page 7: Scylla Summit 2017: Planning Your Queries for Maximum Performance

Replica Shard Read Diagram

[Diagram: a read request on a replica shard consults the row cache, the memtable, and four sstables (each with a bloom filter, summary, index, compression info, and data) and produces a result.]

Page 8: Scylla Summit 2017: Planning Your Queries for Maximum Performance

Replica Shard Read Diagram

[Diagram: Read: P8:R1. The row cache holds P8:R1:A=8,B=7 and the memtable holds P8:R1:C=3. On disk, three of the four sstables report P8 in their bloom filters; one holds P8:R1:A=8 and another holds P8:R1:B=7.]

Page 9: Scylla Summit 2017: Planning Your Queries for Maximum Performance

Replica Shard Read Diagram

[Diagram: Read: P8:R1 with row cache P8:R1:A=8,B=7 and memtable P8:R1:C=3. Result: P8:R1 A=8,B=7,C=3.]

Page 10: Scylla Summit 2017: Planning Your Queries for Maximum Performance

Replica Shard Read Diagram

[Diagram: Read: P8:R1 with an empty row cache. The memtable holds P8:R1:C=3; the sstables hold P8:R1:A=8 and P8:R1:B=7.]

Page 11: Scylla Summit 2017: Planning Your Queries for Maximum Performance

Replica Shard Read Diagram

[Diagram: Read: P8:R1 with an empty row cache; the bloom filter stage is highlighted. The memtable holds P8:R1:C=3; the sstables hold P8:R1:A=8 and P8:R1:B=7.]

Page 12: Scylla Summit 2017: Planning Your Queries for Maximum Performance

Replica Shard Read Diagram

[Diagram: next frame of the cache-miss read of P8:R1. The memtable holds P8:R1:C=3; the sstables hold P8:R1:A=8 and P8:R1:B=7.]

Page 13: Scylla Summit 2017: Planning Your Queries for Maximum Performance

Replica Shard Read Diagram

[Diagram: next frame of the cache-miss read of P8:R1. The memtable holds P8:R1:C=3; the sstables hold P8:R1:A=8 and P8:R1:B=7.]

Page 14: Scylla Summit 2017: Planning Your Queries for Maximum Performance

Replica Shard Read Diagram

[Diagram: next frame of the cache-miss read of P8:R1. The memtable holds P8:R1:C=3; the sstables hold P8:R1:A=8 and P8:R1:B=7.]

Page 15: Scylla Summit 2017: Planning Your Queries for Maximum Performance

Replica Shard Read Diagram

[Diagram: cache-miss read of P8:R1. The sstable fragments P8:R1:A=8 and P8:R1:B=7 are merged into P8:R1:A=8,B=7; the memtable holds P8:R1:C=3.]

Page 16: Scylla Summit 2017: Planning Your Queries for Maximum Performance

Replica Shard Read Diagram

[Diagram: the merged row P8:R1:A=8,B=7 populates the row cache; combined with memtable P8:R1:C=3, the read of P8:R1 returns A=8,B=7,C=3.]

Page 17: Scylla Summit 2017: Planning Your Queries for Maximum Performance

Replica Shard Read Diagram

[Diagram: final frame. The row cache holds P8:R1:A=8,B=7, the memtable holds P8:R1:C=3, and the read of P8:R1 returns A=8,B=7,C=3.]
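The read path walked through in the diagrams can be approximated with a small sketch. It reuses the values shown on the slides (A=8 and B=7 on disk, C=3 in the memtable); the write timestamps, the dict-based row layout, and the function names `merge_sources` and `read_row` are hypothetical and only serve the illustration.

```python
def merge_sources(*sources):
    """Merge cell maps {column: (value, write_timestamp)}; the newest write wins."""
    result = {}
    for cells in sources:
        for column, (value, ts) in cells.items():
            if column not in result or ts > result[column][1]:
                result[column] = (value, ts)
    return result


def read_row(memtable, row_cache, sstables):
    """Read one row on a shard.

    Cache hit:  merge the memtable with the cached row.
    Cache miss: merge the sstable fragments, use them to populate the cache,
                then merge with the memtable.
    Returns (values, row_cache_after_read)."""
    if row_cache is None:
        row_cache = merge_sources(*sstables)   # cache miss: go to the sstables
    merged = merge_sources(memtable, row_cache)
    values = {column: value for column, (value, _) in merged.items()}
    return values, row_cache


# Cache miss for P8:R1, as in the later frames of the diagram.
values, cache = read_row(
    memtable={"C": (3, 3)},
    row_cache=None,
    sstables=[{"A": (8, 1)}, {"B": (7, 2)}],
)
```

A second call that passes the returned `cache` back in takes the cache-hit branch and does not touch the sstables.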

Page 18: Scylla Summit 2017: Planning Your Queries for Maximum Performance

Row Cache

▪ Cache stores complete row data
▪ In addition to storing existing rows, the cache stores information about the completeness of clustering ranges (continuity), so reads that fall between cached rows are not treated as misses.
▪ Cache is populated on:
  o Queries
  o Memtable flush:
    • Data is merged - to keep it up to date with new sstables written.
    • Data is inserted - in case there is no data for that partition on disk.

Page 19: Scylla Summit 2017: Planning Your Queries for Maximum Performance

Selecting Sstables

▪ Given a partition key (pk), the current set of sstables is reduced so that sstable X is included iff:
  o min_partition_key(sstable X) <= pk <= max_partition_key(sstable X)
  o bloom_filter(sstable X, pk) = True
▪ Scylla 2.0: sstables are read in parallel
▪ Scylla 2.1:
  o The reduced set of sstables is searched newest to oldest until a result can be constructed and we can prove that older sstables are not relevant.
  o Sstable read parallelism grows, starting from a single sstable
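A minimal sketch of the two selection conditions, under simplifying assumptions: partition keys are plain integers compared directly, and the bloom filter is replaced by an exact set (so it never yields false positives, which a real bloom filter can). The `SSTable` class and `select_sstables` function are hypothetical names for this example.

```python
from dataclasses import dataclass, field


@dataclass
class SSTable:
    name: str
    min_key: int
    max_key: int
    keys: set = field(default_factory=set)

    def bloom_filter_may_contain(self, pk):
        # Stand-in for a real bloom filter. An exact set has no false
        # positives; a real filter may answer True for an absent key.
        return pk in self.keys


def select_sstables(sstables, pk):
    """Keep only sstables whose key range covers pk and whose filter matches."""
    return [
        s for s in sstables
        if s.min_key <= pk <= s.max_key and s.bloom_filter_may_contain(pk)
    ]


tables = [
    SSTable("a", 1, 10, {8}),
    SSTable("b", 5, 20, {8, 12}),
    SSTable("c", 9, 30, {9}),    # key range excludes 8
    SSTable("d", 1, 30, {2}),    # range covers 8, filter rejects it
]
selected = [s.name for s in select_sstables(tables, 8)]
```

Here only sstables "a" and "b" survive the reduction for partition key 8.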

Page 20: Scylla Summit 2017: Planning Your Queries for Maximum Performance


7 Rules To Optimize your Queries

Page 21: Scylla Summit 2017: Planning Your Queries for Maximum Performance

Rule #1 - Use Prepared statements

▪ With unprepared statements, the coordinator needs to pre-process every query:
  o A lot of repetitive work that could be done only once
  o Adds overhead to the execution of each query, which directly translates to throughput and latency
▪ The driver is not able to send the request to a coordinator node that holds the data (an additional hop)
▪ tip: compare scylla_query_processor_statements_prepared to the number of executed scylla_transport_requests_served

Page 22: Scylla Summit 2017: Planning Your Queries for Maximum Performance

Sample: single Scylla server, using c-s (cassandra-stress)

Results                      Unprepared   Prepared
op rate                      13037        18704
partition rate               13037        18704
row rate                     13037        18704
latency mean                 1.5          1.1
latency median               1.3          1
latency 95th percentile      2.9          1.6
latency 99th percentile      6.2          2.5
latency 99.9th percentile    12.2         7.1
latency max                  31.1         16.9
Total partitions             100000       100000

Page 23: Scylla Summit 2017: Planning Your Queries for Maximum Performance

Rule #2 - Use Paging

▪ Paging disabled: the coordinator is forced to prepare a single result that holds all the data and send it back:
  o If the coordinator is not able to return a response (cannot allocate enough memory for the single result), an error is returned to the client
  o tip: compare scylla_transport_unpaged_queries to scylla_cql_reads to detect whether many of your read queries are unpaged

Page 24: Scylla Summit 2017: Planning Your Queries for Maximum Performance

Rule #3 - Use correct Page Size

▪ Drivers enable paging by default, with a default page_size of 5000 rows (java, python, gocql)
▪ CQL requires returning at least one result and allows returning fewer results than the page size
▪ Scylla utilizes this:
  o Scylla caps a page at ~1MB of memory, so it returns fewer rows than requested when rows are large
  o Do not use the number of returned results as an indication of whether there are more results
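The last point can be illustrated with a toy simulation. It does not use a real driver: `fetch_page` is a hypothetical stand-in for the server side that caps each page at roughly 1MB but always returns at least one row, and `fetch_all` is a client loop that stops only when no paging state is returned, never because a page came back short.

```python
def fetch_page(rows, paging_state, page_size, max_page_bytes=1_000_000):
    """Hypothetical server side: return up to page_size rows, capped by bytes,
    but always at least one row. Returns (page, next_paging_state)."""
    start = paging_state or 0
    page, used = [], 0
    for row in rows[start:start + page_size]:
        if page and used + len(row) > max_page_bytes:
            break                      # byte cap reached, page is cut short
        page.append(row)
        used += len(row)
    end = start + len(page)
    return page, (end if end < len(rows) else None)


def fetch_all(rows, page_size):
    """Client side: keep fetching until the server reports no more pages."""
    state, pages = None, []
    while True:
        page, state = fetch_page(rows, state, page_size)
        pages.append(page)
        if state is None:              # the only reliable end-of-results signal
            return pages


# 25 rows of 10^5 bytes each, requested with the default page_size of 5000.
rows = [b"x" * 100_000] * 25
page_lengths = [len(p) for p in fetch_all(rows, page_size=5000)]
```

With these inputs the pages hold 10, 10, and 5 rows. The first page is far smaller than the requested 5000 rows even though more results remain, which is why the row count cannot serve as the stop condition.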

Page 25: Scylla Summit 2017: Planning Your Queries for Maximum Performance

[Figure: two numbered callouts (1, 2) and the label "Has more pages".]

Page 26: Scylla Summit 2017: Planning Your Queries for Maximum Performance

Scylla 2.0: does the default page_size make sense?

Test: duration in milliseconds of fetching a single wide partition with 10^8 bytes, split into rows, using different page sizes

page size   10^6 rows of 100 bytes   10^5 rows of 1000 bytes   10^4 rows of 10^4 bytes   1000 rows of 10^5 bytes
10          timed out                2104.492031               331.087871                173.932543
50          5679.087615              737.148927                202.113023                168.165375
100         4034.920447              573.046783                186.384383                168.951807
500         2663.383039              415.760383                183.894015                173.015039
1000        2451.570687              395.313151                182.976511                168.427519
5000        2285.895679              400.031743                184.942591                169.345023
10000       2281.701375              399.769599                183.369727                169.738239
50000       2273.312767              396.099583                183.107583                170.000383

Page 27: Scylla Summit 2017: Planning Your Queries for Maximum Performance

C* 3.11.0: does the default page_size make sense?

Test: duration in milliseconds of fetching a single wide partition with 10^8 bytes, split into rows, using different page sizes

page size   10^6 rows of 100 bytes   10^5 rows of 1000 bytes   10^4 rows of 10^4 bytes   1000 rows of 10^5 bytes
10          timed out                4030.726143               903.872511                364.380159
50          12876.51328              1535.115263               419.430399                300.941311
100         8992.587775              1202.716671               405.274623                316.407807
500         6400.507903              907.542527                354.680831                348.651519
1000        6077.546495              874.512383                360.972287                370.409471
5000        5620.367359              791.674879                422.051839                358.612991
10000       5490.343935              793.772031                389.021695                360.447999
50000       5662.310399              913.833983                383.516671                355.467263

tip: consider changing the page size if your rows are large

Page 28: Scylla Summit 2017: Planning Your Queries for Maximum Performance

Rule #4 - Beware of Multi Partition CQL IN queries

▪ Multi-partition CQL IN queries force the coordinator node to split the query into single partition queries and aggregate the results.

Page 29: Scylla Summit 2017: Planning Your Queries for Maximum Performance

Rule #5 - Beware of Single Partition CQL IN queries

Question: Should I split the CQL IN query?

Sample:

▪ CQL: “Select * from ks.cf where pk = X and ck in (Y1, Y2, … Yn)”

Translated to:

▪ CQL:
  o “Select * from ks.cf where pk = X and ck = Y1”
  o “Select * from ks.cf where pk = X and ck = Y2”
  o ...
  o “Select * from ks.cf where pk = X and ck = Yn”
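For readers who want to try both variants, the translation above can be expressed as a small helper that builds either the single IN statement or the equivalent list of per-key statements. This is only a sketch: it produces statement text and bind parameters and does not talk to a cluster; the function names are hypothetical, and `ks.cf`, `pk`, and `ck` are the placeholder names from the slide.

```python
def in_query(pk, cks):
    """One statement using IN over the clustering keys."""
    return "SELECT * FROM ks.cf WHERE pk = ? AND ck IN ?", (pk, list(cks))


def split_in_query(pk, cks):
    """The same request as one (statement, parameters) pair per clustering key."""
    statement = "SELECT * FROM ks.cf WHERE pk = ? AND ck = ?"
    return [(statement, (pk, ck)) for ck in cks]


single = in_query("X", ["Y1", "Y2", "Y3"])
split = split_in_query("X", ["Y1", "Y2", "Y3"])
```

Each pair can be prepared once and executed with its parameters, so the two approaches can be timed against the same partition.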


Page 34: Scylla Summit 2017: Planning Your Queries for Maximum Performance

Question: Should I split the CQL IN query?

Answer: It depends on how wide your rows are

Comments:

▪ Prior to Scylla 2.0, single partition CQL IN queries performed very badly in some wide partition cases.
▪ All reported results are using Scylla 2.0

Page 35: Scylla Summit 2017: Planning Your Queries for Maximum Performance

Rule #6 - There’s a faster way to do full scans

▪ The blog post efficient-full-table-scans-with-scylla outlined an algorithm to do full scans; at a high level:
  o split the range up into small sub ranges
  o run “enough” sub ranges in parallel
▪ The follow-up blog post, How to scan 475 million partitions 12x faster using efficient full table scan, provided a sample implementation applying this
▪ Is there an even “faster” way?
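The two steps of the blog-post algorithm (split into sub ranges, run enough of them in parallel) might look like the sketch below. It is not the implementation from the blog posts. It assumes a 64-bit token range from -2^63 to 2^63-1, and `scan_range` is a hypothetical callable standing in for executing a query such as `SELECT * FROM ks.cf WHERE token(pk) > ? AND token(pk) <= ?` against a cluster.

```python
from concurrent.futures import ThreadPoolExecutor

MIN_TOKEN = -(2 ** 63)     # assumed 64-bit token range
MAX_TOKEN = 2 ** 63 - 1


def split_token_ring(n):
    """Split the full token range into n contiguous (start, end] sub ranges.

    The strict lower bound means a key hashing exactly to MIN_TOKEN would
    need an inclusive bound on the first sub range."""
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + span * i // n for i in range(n)] + [MAX_TOKEN]
    return list(zip(bounds[:-1], bounds[1:]))


def full_scan(scan_range, n_subranges, parallelism):
    """Run scan_range(start, end) over every sub range, `parallelism` at a time."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        ranges = split_token_ring(n_subranges)
        for rows in pool.map(lambda r: scan_range(*r), ranges):
            yield from rows


# Toy data source: a list of tokens instead of a real table.
tokens = [MIN_TOKEN + 1, -5, 0, 7, MAX_TOKEN]
scanned = list(full_scan(lambda lo, hi: [t for t in tokens if lo < t <= hi], 8, 3))
```

In practice the number of sub ranges and the parallelism would be tuned to the cluster; the values 8 and 3 here are arbitrary.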

Page 36: Scylla Summit 2017: Planning Your Queries for Maximum Performance

▪ Yes there is:
  o Using the token ownership of nodes in the ring, one can select ranges of tokens. Once a “range” has been processed, the next “range” can be selected based on the ownership in the ring.
  o An even more optimized solution would use the “sharding” information and aim ranges at shards on a machine, so that all cores are executing requests in parallel.

Page 37: Scylla Summit 2017: Planning Your Queries for Maximum Performance

Rule #7 - Use the tools

▪ Probabilistic tracing
▪ Slow query tracing
▪ Wireshark
▪ CQL Trace
▪ Enable client-side tracing

Page 38: Scylla Summit 2017: Planning Your Queries for Maximum Performance


THANK YOU


@ShlomiLivne

Any questions?