self-driving databases & my pregnant wifepavlo/slides/selfdriving-tour2019.pdf · online...

Self-Driving Databases &My Pregnant Wife:The Hard Parts

@andy_pavlo

https://twitter.com/andy_pavlo

https://db.cs.cmu.edu/

I never took sex-ed.Everything I know is from a 1993 episode of Perfect Strangers.

I never took a ML course.Everything I know is from YouTube, research papers, and blogs.

Autonomous Database Systems

Remove the burden of managing DBMSs from humans.

Target Tasks:Physical Database Design

Knob Configuration Tuning

Query Plan Tuning

Capacity Planning / Hardware Provisioning

4


1970s: Self-Adaptive Systems

1990s: Self-Tuning Systems

2010s: Learned Components

5





5

https://doi.org/10.1145/509383.509385





5

https://doi.org/10.1145/509383.509385

https://dl.acm.org/citation.cfm?id=1325856

6

LocationKey

Neural Net

LearnedIndexes

B+Tree

Key

Location

6

Transaction

LocationKey

Neural Net

LearnedIndexes

LearnedScheduling

B+Tree

Key

Location

PartitioningScheme

6

Transaction

LocationKey

Neural Net

LearnedIndexes

LearnedScheduling

Un

supe

rvis

edC

lust

erin

g

B+Tree

Key

Location

6

Transaction SELECT *FROM A, B, CWHERE A.id = B.idAND A.id = C.id

A⨝ A.id=B.id

B C

A.id=C.id ⨝Histograms

Sketches

Neural NetLocationKey

Neural Net

LearnedIndexes

LearnedScheduling

LearnedPlanning

Un

supe

rvis

edC

lust

erin

g

B+Tree

Key

Location

6

Transaction SELECT *FROM A, B, CWHERE A.id = B.idAND A.id = C.id

A⨝ A.id=B.id

B C

A.id=C.id ⨝Histograms

Sketches

Neural NetLocationKey

Neural Net

LearnedIndexes

LearnedScheduling

LearnedPlanning

Un

supe

rvis

edC

lust

erin

g

B+Tree

Key

Location

https://dl.acm.org/citation.cfm?id=672349

Self-Driving DBMS

A system that configures, manages, and optimizes itself for the target database and its workload without any human input beyond the initialization parameters.

Objective Function (throughput, latency, cost, energy)

Constraints (SLOs, resources, cost)

The system is not allowed to make any change that requires a human to make a value judgement.

7

Environment

Objective

Constraints

Action

RewardState

Agent

Self-Driving DBMS Tenets

➊ Explore unknown configurations.

➋ Generalize over environments / applications / hardware.

➌ Reason about delayed consequences.

9

The Hard Parts

State Modeling

Training Data Collection

Reward Observation

10

1. STATE MODELINGSuccinctly & accurately represent the environment state.

StochasticCannot easily look inside to collect more data.

Non-StationaryPreferences and habits change from one day to the next.

EpisodicPregnancy is over after nine months.

StochasticCannot exactly model the DBMS's internals.

Non-StationaryReward of an action can change over time.

Non-EpisodicDBMS must run forever. No terminating states.

Markov Decision Process Models

Not possible to know all possible DBMS states or transition probabilities ahead of time.

Encode entire history of the DBMS in its state vector:Database Contents

Database Physical Design

Workload

Knob Configuration

Hardware Resources

15

Database Contents State

16

CREATE TABLE xxx (col1 INT PRIMARY KEY,col2 LONG NOT NULL,col3 INT DEFAULT NULL,col4 VARCHAR(8) NOT NULL

);

INSERT INTO xxxSELECT x.id,

99,(RANDOM()*1000)::int,SUBSTRING(MD5(id::text),1,1))

FROM generate_series(1, 1000) AS x (id);


16

1000,4,CREATE TABLE xxx (

col1 INT PRIMARY KEY,col2 LONG NOT NULL,col3 INT DEFAULT NULL,col4 VARCHAR(8) NOT NULL

);

INSERT INTO xxxSELECT x.id,

99,(RANDOM()*1000)::int,SUBSTRING(MD5(id::text),1,1))

FROM generate_series(1, 1000) AS x (id);

Max Size

Avg Size

Min Val

Max Val

Cardinality

⮱2,8,8,99,99,1,

⮱1,4,4,0,997,630,

⮱3,8,1,'0','f',16,

⮱1,4,4,1,1000,1000,

# of Tuples # of Attributes

Type

Table State


16



);Max Size

Avg Size

Min Val

Max Val

Cardinality

⮱2,8,8,99,99,1,

⮱1,4,4,0,997,630,

⮱3,8,1,'0','f',16,

⮱1,4,4,1,1000,1000,


Type

Table State

Add/Drop Indexes?

Change Hardware?

Alter Physical Layout?

Tune Knobs?


16



);Max Size

Avg Size

Min Val

Max Val

Cardinality

⮱2,8,8,99,99,1,

⮱1,4,4,0,997,630,

⮱3,8,1,'0','f',16,

⮱1,4,4,1,1000,1000,


Type

Table StateALTER TABLE xxx ADD COLUMN col5 INT;

Add/Drop Indexes?

Change Hardware?


Tune Knobs?


16



);

⮱2,8,8,99,99,1,

⮱1,4,4,0,997,630,

⮱3,8,1,'0','f',16,

⮱1,4,4,1,1000,1000,


Table StateALTER TABLE xxx ADD COLUMN col5 INT;

Add/Drop Indexes?

Change Hardware?


Tune Knobs?


17

SELECT x1.col1, COUNT(*)FROM xxx AS x1JOIN xxx AS x2ON x1.col1 > x2.col3

GROUP BY x1.col1;

Add Index

Agent


17


GROUP BY x1.col1;

Expected: 100

Estimated State

Add Index

Agent

Actual States


17


GROUP BY x1.col1;

Expected: 100

Actual: -10

Estimated State

Add Index

Agent

Actual States


17


GROUP BY x1.col1;

Expected: 100

Actual: -10

Estimated State

Add Index

Agent

Actual States

ANALYZE;

Knob Configuration State

18

Knob Configurations

16, 2, 4,…# Exec Threads

Query Cache Size

Parallel Query Threads

# of Background Threads

20GB, 8GB, 500MB,…Buffer Pool Size Join Buffer Size


18

Knob Configurations


Query Cache Size



20GB, 8GB, 500MB,…Buffer Pool Size Join Buffer Size32GB


18

Knob Configurations


Query Cache Size



20GB, 8GB, 500MB,…Buffer Pool Size Join Buffer Size32GB 128GB


18

Knob Configurations


Query Cache Size




CREATE BUFFERPOOL new_bpBLOCKSIZE 64PAGESIZE 8192;

64, 8192,…

32GB 128GB


18

Knob Configurations


Query Cache Size




CREATE BUFFERPOOL new_bpBLOCKSIZE 64PAGESIZE 8192;

CREATE BUFFERPOOL new_bp2 BLOCKSIZE 128PAGESIZE 4096;

64, 8192,…128, 4096,…

32GB 128GB

Potential Solutions

➊ Reduce model dimensionality to avoid variability.Examples: Hashing Kernels, Auto-Encoders, PCA

These exacerbate instability and destroy information.

➋ Store hardware resource knobs as percentages relative to the available amount.

➌ Use hierarchical models that isolate DBMS components to reduce the propagation of changes.

19

2. DATA COLLECTIONObserve the system under varying scenarios to gain experience.

Training Data AcquisitionCan only make a limited number of observations. Need

Offline AcquisitionTargeted experiments on individual components.

Online AcquisitionCollect metrics and data from the system running in production.

Offline Training Data

➊ End-to-End BenchmarksRun sample workloads using the full DBMS.

23

Logging

Storage

Execution

Planning MetricsSQL

Agent

BenchmarkDriver



➋ Component TrainersRun microbenchmarks on a subset of components.

23

Storage

MetricsSQL

Agent

BenchmarkDriver

Microbench

API



➋ Component TrainersRun microbenchmarks on a subset of components.

23

MetricsSQL

Agent

BenchmarkDriver

Microbench

APIStorage v2

Offline Collection Logging

24

0

5

10

15

20

100 10,000 1,000,000

End-to-End (YCSB)

La

ten

cy (

ms )

Log Buffer Size (bytes)

0

5

10

15

20

100 10,000 1,000,000

Microbenchmark

Log Buffer Size (bytes)

40,000 Observations 1,000 Observations

Online Training Data

Collect data from the DBMS running in production.

We must be more conservative about action exploration.Cannot allow for any perceivable effect to applications / users.

No DBMS can fully support this today (e.g., mandatory restarts).

25


26

Master Replica

Replica

Actions

Actions

Agent


26

Master Replica

Replica

Actions Actions

Actions

Agent


26

Master Replica

Replica

Failover Time?

Actions

Actions

Agent


26

Master Replica

Replica

Failover Time?

Actions

Actions

Agent

Slower

Faster


26

MasterReadsWrites

Writes

Writes

Replica

Replica

App Server

Agent


26

MasterReadsWrites

Physical Log?

SQL Statements?

Replica

ReplicaPhysical Log?

SQL Statements?

App Server

Agent


26

MasterReadsWrites

Physical Log?

SQL Statements?

Reads

Reads

Replica


SQL Statements?

App Server

Agent


26

MasterReadsWrites

Physical Log?

SQL Statements?

Replica


SQL Statements?

App Server

Agent

Reads?

Reads?


26

Master Replica

Replica

App Server

Actions

Agent

Alternative: Imitation Learning

Observe an expert tuning the DBMS and then try to train the system to mimic those decisions.

Still need to capture the state of the DBMS before each change to extrapolate why the DBA made the change.

Unlikely to work well because training data will be sparse/noisy.

27

3. REWARD OBSERVATIONHow to identify the reward of a deployed action.

Short-Term RewardsWe immediately know when something is wrong.

Long-Term RewardsDifficult to determine whether we are doing the correct things now for our child in the future.

Short-Term RewardsWe immediately know when something is wrong.

Long-Term RewardsDifficult to determine whether we are doing the correct things now for our child in the future.

https://www.legis.state.pa.us/cfdocs/legis/li/uconsCheck.cfm?yr=2014&sessInd=0&act=153

Short-Term RewardsMust immediately detect whether the DBMS violates constraints.

Long-Term RewardsIdentify long-term workload trends and how long it will take to prepare accordingly.

Deployment Costs

The DBMS must predict how much time and resources it will take to deploy each action.

Assume that the system only deploys one action at a time.

The cost of deploying actions can affect the objective function.

31

Compress Table

Agent

Variable Action Resources

Hardware Degradations

Transient hardware problems are difficult to detect and could mask the true reward of an action.

Likely more problematic in distributed / shared-disk systems.

Non-Elegant Solutions:Run periodic microbenchmarks and compare with historical data.

Train classifiers on hardware stats (e.g., SMART).

Tweak exploration policy to reconsider recent actions.

32

https://www.kaggle.com/backblaze/hard-drive-test-data

Reward Magnitudes

The importance of changes in the objective function depends on the workload context.

May need multiple policies to prefer one workload type over another in mixed environments.

33

OLTP Workload

Latency:100ms → 50ms

OLAP Workload


Reward Magnitudes



33

Mixed Workload


Latency:500ms → 250ms -50%

-50%

Reward Magnitudes



33

Mixed Workload

Latency:100ms → 50ms -50%

Other Hard Parts

Other Hard Parts

➊ Action Selection Policies

➋ Transparency / Explainability

➌ Human Interaction

35

Weren't you building a DBMS at CMU?

Self-Driving DBMS Project

→ Latch-free MVCC

→ Hybrid Storage (Row + Column)

→ LLVM Execution Engine

→ Latch-free Bw-Tree

→ Postgres Compatible

→ Open-Source (Apache)

Pelotonpelotondb.io

https://ottertune.cs.cmu.edu/

http://pelotondb.io/

New System (2019)

→ Training & Modeling as First-Class Concepts

→ Zero Restart Action Deployments

→ Latch-free MVCC

→ Native Apache Arrow Storage

→ MemSQL-style LLVM Execution Engine

→ Postgres Compatible

→ Open-Source (MIT)

40

Self-Driving DBMS Gym

Building a simulator is just as hard as building a real DBMS.

No existing DBMS has the sufficient control APIs, metrics, and infrastructure to support rapid experimentation.

We are attempting to build the first DBMS "gym" to make it easier for ourselves and others to develop new ML methods for autonomous control.

41

Final Thoughts

End-to-end reinforcement learning is unlikely to work.Will require an impossibly large amount of training data.

No existing way to model infinite action spaces.

There are engineering efforts that existing DBMSs should pursue now to support automation in the future.

42

44

END@andy_pavlo

https://twitter.com/andy_pavlo

self-driving databases & my pregnant wifepavlo/slides/selfdriving-tour2019.pdf · online...

Documents