database performance tuning artificial intelligence · 2/17/2016  · selection of optimal...

Artificial Intelligence Database Performance Tuning Roel Van de Paar Percona


Post on 23-Aug-2020


Page 1: Database Performance Tuning Artificial Intelligence · 2/17/2016  · Selection of optimal mathematical model to describe biological systems Operon prediction. Neural Networks; particularly

Artificial Intelligence Database Performance Tuning

Roel Van de Paar, Percona

Agenda

● GA: How it works, terminology, variables, example

● Database Tuning & Surrounding thoughts

● gaai

● POC

● Results

Define: GA

A Genetic Algorithm (GA) is a lightweight Artificial Intelligence (AI) evolutionary algorithm which mimics Darwin’s theory of natural evolution.

GA Terminology

Population (incl. any offspring)
↓
Chromosomes (“Individuals”)
↓
Genes (“Chromosome Length”)

GA: it’s all about the genes

2 Parents
↓
Children

Children may get:
● Genes mixed from parents (“crossover”)
● Modified (“mutated”) genes

GA Loop

Population is created ‘randomly’ (can be pre-populated)
↓
[loop] >
Population is evaluated (i.e. each individual receives a fitness value)
↓
A new population is created:
● Population can be sorted / kept or discarded in part (“selection”)
● Crossover, gene mutations etc.
↓
Possible intermediary re-eval
< [loop]
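The loop on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the gaai implementation: the function name `run_ga`, its parameters, and the uniform crossover and "keep the best half" selection are assumptions chosen for brevity.

```python
import random

def run_ga(fitness, gene_count, gene_max, pop_size=20, generations=50,
           keep=0.5, mutation_rate=0.1, seed=None):
    """Minimal GA loop: create, evaluate, select, crossover, mutate, repeat."""
    rng = random.Random(seed)
    # Population is created randomly.
    population = [[rng.randint(0, gene_max) for _ in range(gene_count)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluation: rank individuals by fitness, best first.
        population.sort(key=fitness, reverse=True)
        # Selection: keep the best part, discard the rest.
        survivors = population[:max(2, int(pop_size * keep))]
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            # Crossover: each gene comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(mother, father)]
            # Mutation: occasionally replace a gene with a new random value.
            child = [rng.randint(0, gene_max) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)

# Toy run: fitness is simply the sum of the genes.
best = run_ga(fitness=sum, gene_count=10, gene_max=1000, seed=1)
print(best, sum(best))
```

Because the survivors are carried over unchanged, the best fitness in the population never gets worse from one generation to the next in this sketch.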

GA Fitness

A fitness value is the result of a chosen fitness function

As a rather simple/limited example: FITNESS = RAND(A) + RAND(B) + RAND(C)

where A, B, C are 0-1000: highest fitness value = 3000, lowest = 0

Optimize towards a negative (lowest value = best) calculated fitness: FITNESS = -FITNESS

i.e. -3000 becomes 3000, so the lowest calculated value becomes the best fitness

Optimize towards %: 1/FITNESS or 1-(1/FITNESS) etc.

Basically; anything that can be optimized towards a best result can be GA’ed
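As a small illustration of the fitness ideas above, the sketch below shows a raw fitness built from three genes and two of the transformations mentioned (negation and 1/FITNESS). The function names are hypothetical, and the guard for a zero value is an added assumption since 1/FITNESS is undefined there.

```python
def raw_fitness(a, b, c):
    """Sum of three gene values, each assumed to lie in 0..1000."""
    return a + b + c

def minimizing(fitness_value):
    """Negate so that a maximizing GA favours the lowest raw value."""
    return -fitness_value

def inverse_score(fitness_value):
    """1/FITNESS: shrinks as the raw value grows; guarded against division by zero."""
    return 1.0 / fitness_value if fitness_value else float("inf")

highest = raw_fitness(1000, 1000, 1000)
lowest = raw_fitness(0, 0, 0)
print(highest, lowest)
# After negation, the lowest raw value ranks above the highest one.
print(minimizing(lowest) > minimizing(highest))
```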

GA Variables

GA variables are often binary (or represented in binary), where a single bit is a single gene

But they do not need to be!

One step further is variables as genes, where all variables are alike:
a=0-100 with step 1, b=0-100 with step 1, c=0-100 with step 1

The most advanced is variables that are in disparate ranges:
a=-1 to 1 with step 0.01, b=0-100 with step 0.5, etc.
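Genes with disparate ranges and steps could be described along these lines. `GeneSpec` is a hypothetical helper introduced for illustration; the two example genes reuse the ranges given on the slide.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class GeneSpec:
    name: str
    low: float
    high: float
    step: float

    def steps(self):
        """Number of discrete values the gene can take."""
        return int(round((self.high - self.low) / self.step)) + 1

    def random_value(self, rng):
        """Pick a random value on the step grid between low and high."""
        return self.low + rng.randrange(self.steps()) * self.step

# a = -1 to 1 with step 0.01, b = 0 to 100 with step 0.5 (as on the slide).
SPECS = [GeneSpec("a", -1.0, 1.0, 0.01), GeneSpec("b", 0.0, 100.0, 0.5)]

rng = random.Random(0)
chromosome = {spec.name: spec.random_value(rng) for spec in SPECS}
print(chromosome)
```

Keeping the range and step next to each gene means mutation and initial population code can stay generic while every gene still lands on a valid value.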

GA Dev TIP: GA Value-store Genes

Everyone working with GA can benefit from this hack/approach

Example: significant genes (used in fitness): a,b,c

non-significant genes (not used in fitness): d

i.e. sub-eval (think sub-total) data can be stored in another gene, where such gene is never set/updated/mutated, but only used for tracking certain calculations, results, statuses, etc.

This optimizes (though not in all cases) the number of calculations
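One possible reading of the value-store idea is sketched below, taking "never set/updated/mutated" to mean "never touched by the GA operators". Gene `d` caches a sub-total computed from `a` and `b`, so that a mutation of `c` alone does not trigger the expensive part again. All names, and the invalidation rule, are assumptions for illustration.

```python
SIGNIFICANT = ("a", "b", "c")   # genes the GA may mutate and that feed the fitness
VALUE_STORE = "d"               # bookkeeping gene: never mutated by the GA

calls = {"expensive": 0}        # counts how often the costly part really runs

def expensive_subtotal(a, b):
    """Stand-in for a costly sub-evaluation."""
    calls["expensive"] += 1
    return a * a + b * b

def evaluate(individual):
    """Fitness = cached sub-total (gene d) + c; recompute d only if missing."""
    if individual.get(VALUE_STORE) is None:
        individual[VALUE_STORE] = expensive_subtotal(individual["a"], individual["b"])
    return individual[VALUE_STORE] + individual["c"]

def mutate(individual, gene, value):
    """Mutate a significant gene; drop the cached sub-total if it is now stale."""
    if gene not in SIGNIFICANT:
        raise ValueError("value-store genes are not mutated")
    individual[gene] = value
    if gene in ("a", "b"):
        individual[VALUE_STORE] = None

individual = {"a": 2, "b": 3, "c": 1, "d": None}
print(evaluate(individual))      # computes and stores the sub-total
mutate(individual, "c", 5)
print(evaluate(individual))      # reuses the stored sub-total
print(calls["expensive"])
```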

So why do we need GA’s?

To optimize...

everything

How many persons can we fit...

GA Application Domains #1

● Bayesian inference links to particle methods in Bayesian statistics and hidden Markov chain models
● Artificial creativity
● Chemical kinetics (gas and solid phases)
● Calculation of bound states and local-density approximations
● Code-breaking, using the GA to search large solution spaces of ciphers for the one correct decryption
● Computer architecture: using GA to find out weak links in approximate computing such as lookahead
● Configuration applications, particularly physics applications of optimal molecule configurations for particular systems like C60 (buckyballs)
● Construction of facial composites of suspects by eyewitnesses in forensic science
● Data Center/Server Farm
● Distributed computer network topologies
● Electronic circuit design, known as evolvable hardware
● Feature selection for Machine Learning
● Feynman-Kac models
● File allocation for a distributed system
● Filtering and signal processing
● Finding hardware bugs
● Game theory equilibrium resolution
● Genetic Algorithm for Rule Set Production
● Scheduling applications, including job-shop scheduling and scheduling in printed circuit board assembly
● Learning robot behavior using genetic algorithms
● Image processing: Dense pixel matching
● Learning fuzzy rule base using genetic algorithms
● Molecular structure optimization (chemistry)
● Optimisation of data compression systems, for example using wavelets
● Power electronics design
● Traveling salesman problem and its applications

SOURCE: https://en.wikipedia.org/wiki/List_of_genetic_algorithm_applications

GA Application Domains #2

● Climatology: Estimation of heat flux between the atmosphere and sea ice
● Climatology: Modelling global temperature changes
● Design of water resource systems
● Groundwater monitoring networks
● Design of anti-terrorism systems
● Linguistic analysis, including grammar induction and other aspects of Natural language processing (NLP) such as word sense disambiguation
● Automated design of sophisticated trading systems in the financial sector
● Representing rational agents in economic models such as the cobweb model
● Real options valuation
● Audio watermark insertion/detection
● Airlines revenue management
● Automated design of mechatronic systems using bond graphs and genetic programming (NSF)
● Automated design = computer-automated design
● Automated design of industrial equipment using catalogs of exemplar lever patterns
● Automated design, including research on composite material design and multi-objective design of automotive components for crashworthiness, weight savings, and other characteristics
● Container loading optimization
● Control engineering
● Marketing mix analysis
● Mechanical engineering
● Mobile communications infrastructure optimization
● Plant floor layout
● Pop music record production
● Quality control
● Timetabling problems, such as designing a non-conflicting class timetable for a large university
● Vehicle routing problem
● Optimal bearing placement

GA Application Domains #3

● Computer-automated design
● Bioinformatics: Multiple Sequence Alignment
● Bioinformatics: RNA structure prediction
● Bioinformatics: Motif Discovery
● Biology and computational chemistry
● Building phylogenetic trees
● Gene expression profiling analysis
● Medicine: Clinical decision support in ophthalmology
● Computational Neuroscience: finding values for the maximal conductances of ion channels in biophysically detailed neuron models
● Protein folding and protein/ligand docking
● Selection of optimal mathematical model to describe biological systems
● Operon prediction
● Neural Networks; particularly recurrent neural networks
● Training artificial neural networks when pre-classified training examples are not readily obtainable (neuroevolution)
● Clustering, using genetic algorithms to optimize a wide range of different fit-functions
● Multidimensional systems
● Multimodal Optimization
● Multiple criteria production scheduling
● Multiple population topologies and interchange methodologies
● Mutation testing
● Parallelization of GAs/GPs including use of hierarchical decomposition of problem domains and design spaces nesting of irregular shapes using feature matching and GAs
● Rare event analysis
● Solving the machine-component grouping problem required for cellular manufacturing systems
● Stochastic optimization
● Tactical asset allocation and international equity strategies
● Wireless sensor/ad-hoc networks

Simple GA Example

@ https://github.com/Percona-QA/gaai/blob/master/ga_example/ga_example.lua

git clone https://github.com/Percona-QA/gaai.git
cd gaai/ga_example
lua ga_example.lua

Population: 100, Genes: 10, Generations: 100

This GA simply takes sum(rand(0,9999999/10), rand(idem), ... rand(n)), i.e. each gene is a random number between 0 and 9999999 divided by the number of genes, and the fitness is the sum over all genes (max fitness=9999999)

A bit of (MySQL) database tuning history

● Past: very poor defaults/templates, settings tuning a must

● Current: more optimized/increased defaults, settings tuning may still be recommended for high-use production systems

● Future: automatically adjusting settings (GA or logic based)

Past: MANUAL > Future: AUTOMATED

Automated systems are less error prone and can be optimized over time!

GA Database Tuning: a new concept / mindset

● It does not really matter which workload GA optimizes

○ i.e. there is no “right”, “wrong”, “common” or “specific” one
○ GA will be able to optimize any of them

● This is dissimilar to past performance benchmarking, which is usually tuned towards/optimized for a specific load (or set of loads)

● It matters less here how much effective % is gained using a specific set of options for a specific semi-synthetic workload

● It matters much more here how much overall improvement is seen over time as the workload changes (real production workloads)

Thoughts on Database Tuning #1

● R/O variable optimization requires a restart: not suitable for production systems

● Sysbench load is uniform/synthetic (easier to optimize), though I expect that actual user loads will achieve similar (i.e. 80%) ROIs, unless the data being processed is highly random

● Tuning various memory buffers can be complex and requires surrounding “safety” code calculations (or value ranges) to avoid OOM

● Things may change over time, for ex. the number of client connections
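The "safety" calculation around memory buffers mentioned above could look something like the following sketch. The function name `fit_to_budget`, the budget figure, and the proportional scale-down policy are assumptions; a real guard would also need to account for per-connection buffers and other memory consumers.

```python
def fit_to_budget(proposed, ram_budget_bytes):
    """Scale proposed buffer sizes down proportionally so their sum fits the budget."""
    total = sum(proposed.values())
    if total <= ram_budget_bytes:
        return dict(proposed)
    # Integer arithmetic with floor division keeps the result at or below the budget.
    return {name: size * ram_budget_bytes // total
            for name, size in proposed.items()}

GIB = 1024 ** 3
# Values a GA might propose (hypothetical), checked against a 4 GiB budget.
proposed = {"innodb_buffer_pool_size": 8 * GIB, "innodb_log_buffer_size": 1 * GIB}
safe = fit_to_budget(proposed, 4 * GIB)
print(safe, sum(safe.values()) <= 4 * GIB)
```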

Thoughts on Database Tuning #2

● It would be good to cover for special events like checkpoints (longer sample runtimes may be sufficient to cover this)

● Not all mysqld variables automatically lend themselves to “pure performance tuning” as some variables are features - setting them changes the performance, but only because the server functionality was modified also - i.e. the performance offset may be expected (credit: Laurynas Biveinis)

● Some vars require longer runtime to sample (e.g. InnoDB buffer pool)

Thoughts on Database Tuning #3

● Optimizations are system dependent!

○ For example, using tmpfs vs ssd vs slow hdd’s, fast vs slow I/O controllers, number of cpu threads, OS configuration etc.: the speed will vary differently for different systems

○ This is one of the great strengths of using GA for database tuning:
■ Optimizes per-load, i.e. load-specific var adjustments/tuning
■ Optimizes per-system, i.e. hardware/OS optimized var tuning
■ Optimizes per-moment in time, i.e. changes in any area over time
■ Optimizes across all factors combined

○ For humans, this is possible only in a (very) limited fashion, and requires a good understanding of each optimization plane.

Thoughts on Database Tuning #4

● A human may miss non-obvious areas of optimization

○ For example, if many buffers were automatically made smaller then there would be more room for other workload-specific performance-affecting buffers.

● Performance drops (usually light & short) may be seen

○ A possible fix: gradual/staged/stepped changes

■ Example: stepped changes, i.e. +100/-100 instead of random. Note: the actual change would still be random (e.g. from -100 to +100 with step 1)

■ This may also help with variables that need a larger sample duration window (needs further evaluation)
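The stepped change described above can be sketched as a bounded random delta applied to the current value, instead of a fresh random value from the whole range. The function name and the example bounds are assumptions.

```python
import random

def stepped_mutation(current, low, high, max_delta=100, step=1, rng=random):
    """Move a gene by a random delta in [-max_delta, +max_delta], clamped to its range."""
    delta = rng.randrange(-max_delta, max_delta + 1, step)
    return min(high, max(low, current + delta))

rng = random.Random(0)
value = 500  # hypothetical current setting of some variable with range 0..1000
for _ in range(5):
    value = stepped_mutation(value, low=0, high=1000, rng=rng)
    print(value)
```

Each new value stays within 100 of the previous one, which limits how far a single change can push the server away from a setting already known to work.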

Thoughts on Database Tuning #5

● Optimization towards other fitness values is possible

○ For example, with what set of options do I see the least amount of client rejects (locking etc.), network disconnects, etc.

○ These can take a smaller/secondary importance value in the fitness (though “combining incongruous fitness values” is a complex topic)

○ Another example is taking errors as a guide for what value areas to avoid for given parameters, though human smarts is better (OOM etc.)

● Each variable will be more or less optimizable by GA. For example, https://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html#sysvar_innodb_thread_concurrency would seem highly optimizable
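A simple way to give secondary measurements a smaller importance in the fitness is a weighted sum, sketched below. The weights and the function name are purely illustrative assumptions; as noted above, combining unlike measurements is a complex topic and a linear penalty is only one option.

```python
def combined_fitness(qps, client_rejects, disconnects,
                     reject_weight=10.0, disconnect_weight=10.0):
    """Primary term is QPS; rejects and disconnects act as weighted penalties."""
    return qps - reject_weight * client_rejects - disconnect_weight * disconnects

# Two hypothetical samples with the same throughput but different reject counts.
clean = combined_fitness(qps=5000.0, client_rejects=0, disconnects=0)
noisy = combined_fitness(qps=5000.0, client_rejects=12, disconnects=1)
print(clean, noisy, clean > noisy)
```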

gaai

● An advanced proof-of-concept/small framework which could easily be expanded to become a full-fledged GA performance optimizer, or could easily be adapted to use more complex/different GA algo’s etc.

● Code is GPL v2 licensed, GA code is MIT licensed

● Not connected in any way with OtterTune. They’ve done some interesting work also: https://db.cs.cmu.edu/papers/2017/p1009-van-aken.pdf

● As a POC, starts with a very poorly optimized server and tunes 13 InnoDB parameters automatically to improve performance
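For dynamic variables, a set of gene values can be turned into `SET GLOBAL` statements and applied without a restart. The sketch below only builds the statement strings and is not taken from gaai; the function name and the example values are assumptions, and it does not check constraints between variables.

```python
def to_set_global(chromosome):
    """Build one SET GLOBAL statement per integer-valued gene (dynamic variables only)."""
    statements = []
    for name, value in sorted(chromosome.items()):
        # Allow only plain identifiers so a bad gene name cannot inject SQL.
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unexpected variable name: {name!r}")
        statements.append(f"SET GLOBAL {name} = {int(value)};")
    return statements

# Hypothetical gene values for two dynamic InnoDB variables.
for statement in to_set_global({"innodb_thread_concurrency": 8,
                                "innodb_io_capacity": 2000}):
    print(statement)
```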

gaai continued

● There are likely many other variables which can be tuned by GA

● The POC is (made) hyper-fast in applying changes, but for actual production machines the pace of change can be:

1) slower
2) further controlled with sanity checks etc. (avoids major drops)

● Further GA algo optimization is possible
○ Limit or eliminate the number of re-evals
○ Use a faster/more advanced GA algorithm

POC: Start with poorly optimized server

● MYSQLD_PRECONFIG="--innodb-buffer-pool-size=5242880 --table-open-cache=1 --innodb-io-capacity=100 --innodb-io-capacity-max=100000 --innodb-thread-concurrency=1 --innodb-concurrency-tickets=1 --innodb-flush-neighbors=2 --innodb-log-write-ahead-size=512 --innodb-lru-scan-depth=100 --innodb-random-read-ahead=1 --innodb-read-ahead-threshold=0 --innodb-commit-concurrency=1 --innodb-change-buffer-max-size=0 --innodb-change-buffering=none"

POC: Genes: Tune 13 InnoDB Variables

= Approx 8.134713296270707e+36 possible combinations
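The size of the search space is the product of the number of values each gene can take. The per-variable ranges behind the figure above are not in this transcript, so the ranges in the sketch below are hypothetical and will not reproduce that number; the sketch only shows how such a count is obtained.

```python
import math

# Hypothetical (low, high, step) ranges; the deck's real ranges are not shown here.
GENE_RANGES = {
    "innodb_io_capacity": (100, 20000, 100),
    "innodb_thread_concurrency": (0, 64, 1),
    "innodb_lru_scan_depth": (100, 8192, 4),
}

def value_count(low, high, step):
    """Number of values on the grid low, low+step, ... not exceeding high."""
    return (high - low) // step + 1

def search_space_size(ranges):
    """Product of the per-gene value counts."""
    return math.prod(value_count(*bounds) for bounds in ranges.values())

print(search_space_size(GENE_RANGES))
```

Even with only three variables the count runs into the millions, which is why exhaustive testing of 13 variables is not an option.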

Sysbench Prepare/Run

● Prepare

sysbench /usr/share/sysbench/oltp_insert.lua --mysql-storage-engine=innodb --table-size=${TABLESIZE} --tables=${NROFTABLES} --mysql-db=test --mysql-user=root --db-driver=mysql --mysql-socket=${BASEDIR}/socket.sock prepare

TABLESIZE=1000000, NROFTABLES=4

● Run

sysbench /usr/share/sysbench/oltp_read_write.lua --report-interval=${1} --time=0 --events=0 --index_updates=10 --non_index_updates=10 --distinct_ranges=15 --order_ranges=15 --threads=${2} --table-size=${3} --tables=${4} --percentile=95 --verbosity=3 --mysql-db=test --mysql-user=root --db-driver=mysql --mysql-socket=${BASEDIR}/socket.sock run

$1=1 (1 SEC SAMPLING), $2=5 (5 THREADS), $3=1000000, $4=4
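With `--report-interval=1`, sysbench prints one status line per second, and the QPS figure on that line can serve as the fitness sample. The parser below is a sketch that assumes the interval-line layout of sysbench 1.0 (the sample line is illustrative, not measured); other versions may format the line differently.

```python
import re

# Matches e.g. "[ 1s ] thds: 5 tps: ... qps: 5200.74 ..." (assumed sysbench 1.0 layout).
QPS_PATTERN = re.compile(r"\[\s*(\d+)s\s*\].*?\bqps:\s*([0-9.]+)")

def parse_qps(line):
    """Return (elapsed_seconds, qps) for an interval report line, else None."""
    match = QPS_PATTERN.search(line)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))

sample = ("[ 1s ] thds: 5 tps: 258.90 qps: 5200.74 "
          "(r/w/o: 3644.41/1037.54/518.79) lat (ms,95%): 27.17 "
          "err/s: 0.00 reconn/s: 0.00")
print(parse_qps(sample))
print(parse_qps("Threads started!"))
```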

[Chart (X: QPS, Y: TIME, 5 Threads)]

[Chart (X: QPS, Y: TIME, 5 Threads)]

The future

● Query indexing GA optimization (with thanks @Peter Zaitsev)

● Overall GA, per-option GA, or rule based “smarts” can all be explored

● Variables which require larger sample duration windows; stepped?

● Tuned options expansion, more surrounding logic, solid ranges

● Far future; learning across systems, more advanced AI

● More immediate; real workload testing

Real Workload GA Optimization

If you are interested in trying GA optimized performance for your production load, we are happy to work with you!

Contacts

● Percona
○ https://www.percona.com/about-percona/contact
○ https://twitter.com/Percona
○ https://www.linkedin.com/company/percona

● Roel Van de Paar
○ [email protected]
○ https://twitter.com/RoelVandePaar
○ https://au.linkedin.com/in/roelvandepaar

Join the Percona Product Managers for Lunch!

● With Tyler Duzan, Michael Coburn, and Alexander Rubin

● Share your feedback

● Get to see the product roadmaps

Wednesday @ the reserved area in back of Gaia Restaurant

Thank You Sponsors!!

Rate My Session: Example