datastore axes choosing your scalability direction · 2019. 2. 26. · mysql – many servers each...
TRANSCRIPT
![Page 1: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/1.jpg)
Datastore Axes Choosing your scalability direction
Nicolai Plum
![Page 2: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/2.jpg)
Genesis - Autonomy ● Congratulations, Developers: you have
autonomy to design and build your own product.
● Developers must tell Database Engineering ● How their product data will grow ● How its database needs will change
![Page 3: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/3.jpg)
Genesis - Developers ● Help, I have responsibility for build my own
product successfully! ● I don’t know what I need! ● I can’t predict the future!
3
![Page 4: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/4.jpg)
Genesis - Database Engineering ● Our database services should be a
well-defined product ● Clearly defined capabilities ● … and compromises
4
![Page 5: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/5.jpg)
Datastore ● A system which stores and retrieves data ● Small data (in this talk), maybe very numerous ● Reliable and Durable ● Can have multiple components
● Data servers, routers/proxies, Orchestration, etc
5
![Page 6: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/6.jpg)
Small systems ● Everything is fine ● New products don’t have performance and
scalability issues ● No need for compromises ● 70 years of previous computer industry
development solves the problems for you 6
![Page 7: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/7.jpg)
Large systems ● Nothing is fine ● Performance problems ● Scalability problems, always ● Technology-driven design compromises are
necessary ● Established products are large systems and have
issues ● … and users, and revenue impact
7
![Page 8: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/8.jpg)
Relating to the real world (business) ● Product users come from the real world ● Ask the developers, product managers, scientists… ● Ask the accountants ● “Financial Planning”, “Management Accounting”
● Do your own business / industry research ● Yes, it’s a layering violation, but it will save you
● Keep up your contacts – don’t be isolated ● Look for fundamental controlling parameters
8
![Page 9: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/9.jpg)
Relating to the real world (tech) ● Technology also comes from the real world too … outside your control.
● Technology capabilities: ● Hardware metrics and performance
● … many other good talks at this conference ● Benchmarks, testing, learning from others, use
knowledge and experience ● Look for fundamental controlling parameters
9
![Page 10: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/10.jpg)
Capacity and load
10
![Page 11: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/11.jpg)
Capacity and load trouble
11
![Page 12: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/12.jpg)
CPU? But what about… ● Network ● 10Gbit and more, limited by CPU
(and Latency)
● Storage ● SSD and NAS, limited by CPU
(and Latency)
12
![Page 13: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/13.jpg)
Fundamentals ● Fundamental controlling parameters ● Don’t change often ● Still have to be checked ● Can be elusive
● Don’t be misled by second-order consequences
● Find the Constraints on scalability
13
![Page 14: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/14.jpg)
All of us in this together ● Datastore and application scalability problems are
connected ● Core Infrastructure teams have more experience ● Sometimes also more knowledge ● Yet we must educate developers ● … and delegate to them ● In the end, we all get paid from the same
enterprise revenues!
14
![Page 15: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/15.jpg)
Axes of Datastore scalability
![Page 16: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/16.jpg)
16
![Page 17: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/17.jpg)
Data size More… ● Rows ● Tables
● Sharding ● Data partitioning
17
![Page 18: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/18.jpg)
Data size More… ● Rows ● Tables
● Sharding ● Data partitioning
Caused by ● Business volume growth ● Analytics ● Logging ● Data retention
18
![Page 19: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/19.jpg)
Data size More… ● Rows ● Tables
● Sharding ● Data partitioning
Effects ● Non-indexed queries are
impossible – less ad-hoc reporting
● Split tables for partitioning ● Harder to query ● Client-side joins
● Less CPU per unit data
19
![Page 20: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/20.jpg)
20
![Page 21: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/21.jpg)
Schema growth More of ● Columns ● Tables ● Relations & references ● Data model complexity ● Query complexity
● Joins ● Foreign Key constraints
21
![Page 22: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/22.jpg)
Schema growth More of ● Columns ● Tables ● Relations & references ● Data model complexity ● Query complexity
● Joins ● Foreign Key constraints
22
![Page 23: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/23.jpg)
Schema growth More of ● Columns ● Tables ● Relations & references ● Data model complexity ● Query complexity
● Joins ● Foreign Key constraints
Caused by ● More customers,
products ● Product complexity ● Analytics ● Developers
…Frameworks, ORMs …Abstraction layers
23
![Page 24: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/24.jpg)
Schema growth More of ● Columns ● Tables ● Relations & references ● Data model complexity ● Query complexity
● Joins ● Foreign Key constraints
Effects ● Query optimiser stress
● Queries go bad, need tuning
● Indexing overhead ● Time and space
● FKs slow inserts ● More queries per end-
user action 24
![Page 25: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/25.jpg)
25
![Page 26: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/26.jpg)
Read query rate More… ● Queries
…on more rows
● More rows retrieved … from disc (or SSD)
● More data to sort and send
26
![Page 27: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/27.jpg)
Read query rate More… ● Queries
…on more rows
● More rows retrieved … from disc (or SSD)
● More data to sort and send
Caused by ● Business growth ● Customer behaviour
changes ● New features, interactivity,
richer website ● Read growth disconnected
from revenue
27
![Page 28: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/28.jpg)
Read query rate More… ● Queries
…on more rows
● More rows retrieved … from disc (or SSD)
● More data to sort and send
Effects ● Server CPU, IOPS
increase ● Memory cache strain ● Network traffic increase ● Need more read scale-out
● Replicas ● Copy number
28
![Page 29: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/29.jpg)
29
![Page 30: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/30.jpg)
Write query rate More… ● Transactions ● Logging, audit ● Analytics ● ETL & Data Pumps
30
![Page 31: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/31.jpg)
Write query rate More… ● Transactions ● Logging, audit ● Analytics ● ETL & Data Pumps
Caused by ● Business growth ● Curiosity, Security,
Regulation ● Richer customer
experience ● Saved preferences ● Breadcrumbs
31
![Page 32: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/32.jpg)
Write query rate More… ● Transactions ● Logging, audit ● Analytics ● ETL & Data Pumps
Effects ● Server IOPS, CPU ● Latency increase ● Contention, locking ● Replication stress
32
![Page 33: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/33.jpg)
Examples
33
![Page 34: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/34.jpg)
Notation
34
![Page 35: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/35.jpg)
Application coding ● Client-side join and filter, divide effort client- and
server-side ● Use an efficient data model
● Vectorise queries, do not iterate on Database ● Multithreaded, Asynchronous ● Parallelise (if you have to) ● Map / reduce in your app
● Fast client code
35
![Page 36: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/36.jpg)
Simplified query support ● Don’t support complex joins ● Don’t support use of known datastore weaknesses ● No Foreign Key constraints ● Enforce good indexing
● Covering secondary indexes ● Discourage pointless server load
● ORDER BY without LIMIT! ● Intensive server side aggregate functions ● Prefer client-side code over server-side code
● Compromise with developers
36
![Page 37: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/37.jpg)
Caching ● Much faster data access… ● Most of the time ● On a good day
● Bad things hide in averages Maybe that’s OK for you
37
![Page 38: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/38.jpg)
38
90% Cache Hit Rate
![Page 39: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/39.jpg)
Compression ● Storage ● Application (JSON, text, blobs; Sereal) ● Database (InnoDB Page) ● Disc array (storage controller compression)
● Network – MySQL protocol compression ● Usually huge win, but network usually OK
● Helps: Data size, Read query (a bit) ● Hinders: Updates, CPU/Latency
39
![Page 40: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/40.jpg)
Replication ● More copies of data
● MySQL – many servers each with all data, read only ● Clusters – more nodes in cluster ● Increased copy number
● More effort replicating data ● Writes: Neutral at best, bad at worst ● Especially in clusters
● May need separate read and write queries ● You should have that anyway
40
![Page 41: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/41.jpg)
Replication
41
![Page 42: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/42.jpg)
Cluster databases (Galera, Cassandra, MySQL Group Replication)
42
![Page 43: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/43.jpg)
MySQL Cluster database
43
![Page 44: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/44.jpg)
Cluster databases ● MySQL Cluster
● Huge read & write rate ● Very reliable ● Restricted data size ● No complex schema ● No complex queries
● Cassandra ● High read rate ● Reliable ● Huge data size ● (almost) No schema ● Very simple queries
44
![Page 45: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/45.jpg)
Split schemas
45
![Page 46: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/46.jpg)
Split Schemas ● (SLO) Declare maximum schema size ● Move some tables out to a new schema ● Preserve locality of reference
● Much work for developers
46
![Page 47: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/47.jpg)
Shard data
47
![Page 48: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/48.jpg)
Shard data ● Multiple data servers with data distributed between
them ● Application complexity == developer work ● Some queries much slower than others ● Auto-sharders – Vitess, Spider
● Limited query subset ● Much greater operations complexity ● Easier on developers!
● Compromise with developers
48
![Page 49: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/49.jpg)
Example: Core transaction data
49
● Requires
![Page 50: Datastore Axes Choosing your scalability direction · 2019. 2. 26. · MySQL – many servers each with all data, read only Clusters – more nodes in cluster Increased copy number](https://reader035.vdocuments.mx/reader035/viewer/2022071107/5fe1a2d1d41c301f1d3a8b59/html5/thumbnails/50.jpg)
Example: Core transaction data
50
● Requires ● Solutions? ● Split data ● Auto-sharder
(Vitess?) ● Cluster
(MySQL?)