datastore axes choosing your scalability direction · 2019. 2. 26. · mysql – many servers each...

51
Datastore Axes Choosing your scalability direction Nicolai Plum

Upload: others

Post on 30-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Datastore Axes Choosing your scalability direction

Nicolai Plum

Page 2: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Genesis - Autonomy ● Congratulations, Developers: you have

autonomy to design and build your own product.

● Developers must tell Database Engineering ● How their product data will grow ● How its database needs will change

Page 3: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Genesis - Developers ● Help, I have responsibility for build my own

product successfully! ●  I don’t know what I need! ●  I can’t predict the future!

3

Page 4: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Genesis - Database Engineering ● Our database services should be a

well-defined product ● Clearly defined capabilities ● … and compromises

4

Page 5: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Datastore ● A system which stores and retrieves data ● Small data (in this talk), maybe very numerous ● Reliable and Durable ● Can have multiple components

● Data servers, routers/proxies, Orchestration, etc

5

Page 6: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Small systems ● Everything is fine ● New products don’t have performance and

scalability issues ● No need for compromises ● 70 years of previous computer industry

development solves the problems for you 6

Page 7: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Large systems ●  Nothing is fine ●  Performance problems ●  Scalability problems, always ●  Technology-driven design compromises are

necessary ●  Established products are large systems and have

issues ● … and users, and revenue impact

7

Page 8: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Relating to the real world (business) ●  Product users come from the real world ●  Ask the developers, product managers, scientists… ●  Ask the accountants ●  “Financial Planning”, “Management Accounting”

●  Do your own business / industry research ●  Yes, it’s a layering violation, but it will save you

●  Keep up your contacts – don’t be isolated ●  Look for fundamental controlling parameters

8

Page 9: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Relating to the real world (tech) ●  Technology also comes from the real world too … outside your control.

●  Technology capabilities: ●  Hardware metrics and performance

● … many other good talks at this conference ●  Benchmarks, testing, learning from others, use

knowledge and experience ●  Look for fundamental controlling parameters

9

Page 10: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Capacity and load

10

Page 11: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Capacity and load trouble

11

Page 12: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

CPU? But what about… ● Network ● 10Gbit and more, limited by CPU

(and Latency)

● Storage ● SSD and NAS, limited by CPU

(and Latency)

12

Page 13: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Fundamentals ● Fundamental controlling parameters ● Don’t change often ● Still have to be checked ● Can be elusive

● Don’t be misled by second-order consequences

● Find the Constraints on scalability

13

Page 14: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

All of us in this together ●  Datastore and application scalability problems are

connected ●  Core Infrastructure teams have more experience ●  Sometimes also more knowledge ●  Yet we must educate developers ● … and delegate to them ●  In the end, we all get paid from the same

enterprise revenues!

14

Page 15: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Axes of Datastore scalability

Page 16: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

16

Page 17: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Data size More… ●  Rows ●  Tables

●  Sharding ●  Data partitioning

17

Page 18: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Data size More… ●  Rows ●  Tables

●  Sharding ●  Data partitioning

Caused by ●  Business volume growth ●  Analytics ●  Logging ●  Data retention

18

Page 19: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Data size More… ●  Rows ●  Tables

●  Sharding ●  Data partitioning

Effects ●  Non-indexed queries are

impossible – less ad-hoc reporting

●  Split tables for partitioning ●  Harder to query ●  Client-side joins

●  Less CPU per unit data

19

Page 20: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

20

Page 21: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Schema growth More of ●  Columns ●  Tables ●  Relations & references ●  Data model complexity ●  Query complexity

●  Joins ●  Foreign Key constraints

21

Page 22: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Schema growth More of ●  Columns ●  Tables ●  Relations & references ●  Data model complexity ●  Query complexity

●  Joins ●  Foreign Key constraints

22

Page 23: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Schema growth More of ●  Columns ●  Tables ●  Relations & references ●  Data model complexity ●  Query complexity

●  Joins ●  Foreign Key constraints

Caused by ●  More customers,

products ●  Product complexity ●  Analytics ●  Developers

…Frameworks, ORMs …Abstraction layers

23

Page 24: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Schema growth More of ●  Columns ●  Tables ●  Relations & references ●  Data model complexity ●  Query complexity

●  Joins ●  Foreign Key constraints

Effects ●  Query optimiser stress

●  Queries go bad, need tuning

●  Indexing overhead ●  Time and space

●  FKs slow inserts ●  More queries per end-

user action 24

Page 25: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

25

Page 26: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Read query rate More… ●  Queries

…on more rows

●  More rows retrieved … from disc (or SSD)

●  More data to sort and send

26

Page 27: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Read query rate More… ●  Queries

…on more rows

●  More rows retrieved … from disc (or SSD)

●  More data to sort and send

Caused by ●  Business growth ●  Customer behaviour

changes ●  New features, interactivity,

richer website ●  Read growth disconnected

from revenue

27

Page 28: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Read query rate More… ●  Queries

…on more rows

●  More rows retrieved … from disc (or SSD)

●  More data to sort and send

Effects ●  Server CPU, IOPS

increase ●  Memory cache strain ●  Network traffic increase ●  Need more read scale-out

●  Replicas ●  Copy number

28

Page 29: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

29

Page 30: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Write query rate More… ●  Transactions ●  Logging, audit ●  Analytics ●  ETL & Data Pumps

30

Page 31: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Write query rate More… ●  Transactions ●  Logging, audit ●  Analytics ●  ETL & Data Pumps

Caused by ●  Business growth ●  Curiosity, Security,

Regulation ●  Richer customer

experience ●  Saved preferences ●  Breadcrumbs

31

Page 32: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Write query rate More… ●  Transactions ●  Logging, audit ●  Analytics ●  ETL & Data Pumps

Effects ●  Server IOPS, CPU ●  Latency increase ●  Contention, locking ●  Replication stress

32

Page 33: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Examples

33

Page 34: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Notation

34

Page 35: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Application coding ●  Client-side join and filter, divide effort client- and

server-side ●  Use an efficient data model

●  Vectorise queries, do not iterate on Database ●  Multithreaded, Asynchronous ●  Parallelise (if you have to) ● Map / reduce in your app

●  Fast client code

35

Page 36: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Simplified query support ●  Don’t support complex joins ●  Don’t support use of known datastore weaknesses ●  No Foreign Key constraints ●  Enforce good indexing

●  Covering secondary indexes ●  Discourage pointless server load

●  ORDER BY without LIMIT! ●  Intensive server side aggregate functions ●  Prefer client-side code over server-side code

●  Compromise with developers

36

Page 37: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Caching ● Much faster data access… ●  Most of the time ●  On a good day

● Bad things hide in averages Maybe that’s OK for you

37

Page 38: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

38

90% Cache Hit Rate

Page 39: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Compression ●  Storage ●  Application (JSON, text, blobs; Sereal) ●  Database (InnoDB Page) ●  Disc array (storage controller compression)

●  Network – MySQL protocol compression ●  Usually huge win, but network usually OK

●  Helps: Data size, Read query (a bit) ●  Hinders: Updates, CPU/Latency

39

Page 40: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Replication ●  More copies of data

●  MySQL – many servers each with all data, read only ●  Clusters – more nodes in cluster ●  Increased copy number

●  More effort replicating data ●  Writes: Neutral at best, bad at worst ●  Especially in clusters

●  May need separate read and write queries ●  You should have that anyway

40

Page 41: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Replication

41

Page 42: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Cluster databases (Galera, Cassandra, MySQL Group Replication)

42

Page 43: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

MySQL Cluster database

43

Page 44: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Cluster databases ●  MySQL Cluster

●  Huge read & write rate ●  Very reliable ●  Restricted data size ●  No complex schema ●  No complex queries

●  Cassandra ●  High read rate ●  Reliable ●  Huge data size ●  (almost) No schema ●  Very simple queries

44

Page 45: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Split schemas

45

Page 46: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Split Schemas ●  (SLO) Declare maximum schema size ● Move some tables out to a new schema ● Preserve locality of reference

● Much work for developers

46

Page 47: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Shard data

47

Page 48: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Shard data ●  Multiple data servers with data distributed between

them ●  Application complexity == developer work ●  Some queries much slower than others ●  Auto-sharders – Vitess, Spider

●  Limited query subset ●  Much greater operations complexity ●  Easier on developers!

●  Compromise with developers

48

Page 49: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Example: Core transaction data

49

●  Requires

Page 50: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number

Example: Core transaction data

50

●  Requires ●  Solutions? ●  Split data ●  Auto-sharder

(Vitess?) ●  Cluster

(MySQL?)