
Copyright © 2014, Oracle and/or its affiliates. All rights reserved.

Manyi Lu
Senior Engineering Manager
MySQL Optimizer Team, Oracle
October 1, 2014

MySQL 5.7: What’s New in the Parser and the Optimizer?

Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

MySQL Optimizer

SELECT a, b
FROM t1, t2, t3
WHERE t1.a = t2.b
  AND t2.b = t3.c
  AND t2.d > 20
  AND t2.d < 30;

[Diagram: inside the MySQL Server, the query passes through the Parser to the Optimizer. The Optimizer applies cost-based optimizations and heuristics, using the cost model, table/index info (data dictionary), and statistics (storage engines). The resulting plan joins t1, t2, and t3 with two JOIN operations, using table scan, range scan, and ref access.]

MySQL Optimizer: Design Principles

• Best out of the box performance
• Easy to use, minimum tuning needed
• When you need to understand: explain and trace
• Flexibility through optimizer switches, hints and plugins
• Fast evolving

MySQL 5.7 Parser and Optimizer Improvements

• Parser and optimizer refactoring

• Improved cost model: better record estimation for JOIN

• Improved cost model: configurable cost constants

• Query rewrite plugin

• Explain on a running query


MySQL 5.7 Optimizer Improvements

• Computed columns

• UNION ALL queries no longer use temporary tables

• Improved optimizations for queries with IN expressions

• Optimized full text search


5.7 Parser and Optimizer Refactoring

[Diagram: an SQL DML query passes through the Parser, the Resolver (semantic check, name resolution), and the Optimizer (logical transformations; cost-based optimizer: join order and access methods; plan refinement), which produces a query execution plan. Query execution runs the plan against the storage engine (InnoDB, MyISAM) and returns the query result.]

Improves readability, maintainability and stability
– Cleanly separate the parsing, optimizing, and execution stages
– Allows for easier feature additions, with lessened risk

MySQL 5.7: Parser Refactoring

• Challenge:
– Overly complex, hard to add new syntax
• Solution:
– Create an internal parse tree bottom-up
– Create an AST (Abstract Syntax Tree) from the parse tree and the user's context
– Have syntax rules that are more precisely defined and are closer to the SQL standard
– More precise error messages
– Better support for larger syntax rules in the future

[Diagram: the new Parser consists of the Lexical Scanner (lexer), the GNU Bison-generated Parser (bottom-up parsing style), and Contextualization, and produces an AST. The other components shown are the Resolver, Optimizer, Executor, and SE.]

Motivation for Changing the Cost Model

• Adapt to new hardware architectures
– SSD, larger memories, caches
• Allow storage engines to provide accurate and dynamic cost estimates
– Is the data in RAM, on SSD, or on HDD?
• More maintainable cost model implementation
– Avoid hard-coded constants
– Refactoring of existing cost model code
• Tunable/configurable
• Replace heuristics with cost-based decisions

Cost Model: Main Focus in 5.7

Address the following pain points in current cost model:
• Hard-coded cost constants
– Not possible to adjust for different hardware
• Imprecise cardinality/records per key estimates from SE
– Integer value gives too low precision
• Inaccurate record estimation for JOIN
– Too high fan out
• Hard to obtain detailed cost numbers

MySQL 5.6: Record Estimates for JOIN

• t1 JOIN t2

• Total cost = cost (access method t1) + Prefix_rows_t1 * cost (access method t2)

• Prefix_rows_t1 is records read by t1

– Overestimation if WHERE conditions apply! -> Suboptimal join order

[Diagram, without condition filtering: the access method reads records from t1, and they are joined with t2. Prefix_rows_t1 is the number of records read from t1.]

MySQL 5.7 Improved Record Estimates for JOIN

• t1 JOIN t2

• Prefix_rows_t1 takes into account the entire query condition
– More accurate record estimate -> improved JOIN order

[Diagram, with condition filter: the access method reads records from t1, and the condition filter is applied before the join with t2. Prefix_rows_t1 is the number of records passing the table conditions on t1.]

MySQL 5.7 Improved Record Estimates for JOIN

CREATE TABLE emp (
  id INTEGER NOT NULL PRIMARY KEY,
  office_id INTEGER NOT NULL,
  first_name VARCHAR(20),
  hire_date DATE NOT NULL,
  KEY office (office_id)
) ENGINE=InnoDB;

CREATE TABLE office (
  id INTEGER NOT NULL PRIMARY KEY,
  officename VARCHAR(20)
) ENGINE=InnoDB;

• 10 000 rows in the emp table
• 100 rows in the office table
• 100 rows with first_name="John" AND hire_date BETWEEN "2012-01-01" AND "2012-06-01"

MySQL 5.7 Improved Record Estimates for JOIN

SELECT office_name
FROM office JOIN employee ON office.id = employee.office
WHERE employee.name LIKE "John"
  AND hire_date BETWEEN "2014-01-01" AND "2014-06-01";

Explain for 5.6: Total Cost = cost(scan office) + 100 * cost(ref_access emp)

Table     Type    Possible keys  Key      Ref              Rows  Filtered  Extra
office    ALL     PRIMARY        NULL     NULL             100   100.00    NULL
employee  ref     office         office   office.id        99    100.00    Using where

Explain for 5.7: Total Cost = cost(scan emp) + 9991*1.23% * cost(eq_ref_access office)

Table     Type    Possible keys  Key      Ref              Rows  Filtered  Extra
employee  ALL     NULL           NULL     NULL             9991  1.23      NULL
office    eq_ref  PRIMARY        PRIMARY  employee.office  1     100.00    Using where

JOIN ORDER HAS CHANGED!
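The two total-cost formulas differ in which table is read first and in how many prefix rows feed the second table. The arithmetic can be sketched in Python as follows. The row counts (100, 9991) and the 1.23% filter come from the EXPLAIN output; the per-access cost values are hypothetical placeholders, since the slides do not give them, so the resulting totals only illustrate the shape of the calculation.

```python
def prefix_rows(rows_read, filtered_pct=100.0):
    """Rows passed on to the next table: rows read, scaled by the condition filter."""
    return rows_read * filtered_pct / 100.0


def join_cost(first_access_cost, rows_read, second_access_cost, filtered_pct=100.0):
    """Total cost = cost(access first table) + prefix rows * cost(access second table)."""
    return first_access_cost + prefix_rows(rows_read, filtered_pct) * second_access_cost


# Hypothetical unit costs, chosen only for illustration (not MySQL's real constants).
SCAN_OFFICE = 20.0       # full scan of the 100-row office table
SCAN_EMPLOYEE = 2000.0   # full scan of the ~10 000-row employee table
REF_ACCESS = 100.0       # one ref lookup into employee
EQ_REF_ACCESS = 1.0      # one primary-key lookup into office

# 5.6-style estimate: office first, no condition filtering on the prefix rows.
plan_56 = join_cost(SCAN_OFFICE, 100, REF_ACCESS)

# 5.7-style estimate: employee first, prefix rows reduced by the 1.23% filter.
plan_57 = join_cost(SCAN_EMPLOYEE, 9991, EQ_REF_ACCESS, filtered_pct=1.23)

print(f"office first:   {plan_56:.1f}")
print(f"employee first: {plan_57:.1f}")
print(f"prefix rows after filtering: {prefix_rows(9991, 1.23):.1f}")
```

With these placeholder costs the employee-first plan comes out cheaper, because only about 123 of the 9991 scanned rows reach the second table. Different cost values could reverse the outcome.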

MySQL 5.7 Improved Record Estimation for JOIN
Performance Improvements: DBT-3 (SF 10)

[Bar chart: execution time relative to 5.6, in percentage, for DBT-3 queries Q3, Q7, Q8, Q9, and Q12, comparing 5.6 and 5.7.]

5 out of 22 queries get an improved query plan

MySQL 5.7 Additional Cost Data in JSON Explain

mysql> EXPLAIN FORMAT=JSON SELECT SUM(o_totalprice) FROM orders WHERE o_orderdate BETWEEN '1994-01-01' AND '1994-12-31';
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "3118848.00"
    },
    "table": {
      "table_name": "orders",
      "access_type": "ALL",
      "possible_keys": [
        "i_o_orderdate"
      ],
      "rows_examined_per_scan": 15000000,
      "rows_produced_per_join": 4489990,
      "filtered": 29.933,
      "cost_info": {
        "read_cost": "2220850.00",
        "eval_cost": "897998.00",
        "prefix_cost": "3118848.00",
        "data_read_per_join": "582M"
      },
      "used_columns": [
        "o_totalprice",
        "o_orderDATE"
      ],
      "attached_condition": "(`dbt3`.`orders`.`o_orderDATE` between '1994-01-01' and '1994-12-31')"
    }
  }
}

• Total query cost of a query block
• Cost per table
• Cost of sorting operation
• Cost of reading data
• Cost of evaluating conditions
• Cost of prefix join
• Rows examined/produced per join
• Used columns
• Data read per join – (# of rows)*(record width) in bytes
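The cost fields in this output can be cross-checked with a few lines of Python. The sketch below parses an excerpt of the JSON shown on the slide and compares the numbers. That read_cost plus eval_cost equals prefix_cost here is an observation about this single-table example, not a general definition of the fields.

```python
import json

# Excerpt of the EXPLAIN FORMAT=JSON output from the slide.
explain_json = """
{
  "query_block": {
    "cost_info": {"query_cost": "3118848.00"},
    "table": {
      "rows_examined_per_scan": 15000000,
      "rows_produced_per_join": 4489990,
      "filtered": 29.933,
      "cost_info": {
        "read_cost": "2220850.00",
        "eval_cost": "897998.00",
        "prefix_cost": "3118848.00"
      }
    }
  }
}
"""

block = json.loads(explain_json)["query_block"]
table = block["table"]

# Cost values are reported as strings, so convert them before doing arithmetic.
cost = {name: float(value) for name, value in table["cost_info"].items()}

total = cost["read_cost"] + cost["eval_cost"]
estimated_rows = table["rows_examined_per_scan"] * table["filtered"] / 100
eval_cost_per_row = cost["eval_cost"] / table["rows_produced_per_join"]

print("read + eval matches prefix cost:", total == cost["prefix_cost"])
print("rows examined * filtered:", round(estimated_rows))
print("rows produced per join:  ", table["rows_produced_per_join"])
print("eval cost per produced row:", eval_cost_per_row)
```

The product of rows examined and the filtered percentage differs slightly from rows_produced_per_join because the displayed percentage is rounded.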

MySQL 5.7 Visual Explain in MySQL Workbench


MySQL 5.7: Why Query Rewrite Plugin?

• Problem
– Optimizer chooses a suboptimal plan
– Users can change the query plan by adding hints or rewriting the query
– However, database application code cannot be changed
• Solution: query rewrite plugin!

labs.mysql.com

MySQL 5.7: Query Rewrite Plugin

• New pre and post parse query rewrite APIs
– Users can write their own plug-ins
• Provides a post-parse query plugin
– Rewrite problematic queries without the need to make application changes
– Add hints
– Modify join order
– Many more …
• Improve problematic queries from ORMs, third party apps, etc.
• ~Zero performance overhead for queries not to be rewritten

labs.mysql.com

MySQL 5.7 How Rewrites Happen?

For query
SELECT * FROM t1 JOIN t2 ON t1.keycol = t2.keycol WHERE col1 = 42 AND col2 = 2

Pattern is:
SELECT * FROM t1 JOIN t2 ON t1.keycol = t2.keycol WHERE col1 = ? AND col2 = ?

Replacement is:
SELECT a, b, c FROM t1 STRAIGHT_JOIN t2 FORCE INDEX (col1) ON t1.keycol = t2.keycol WHERE col1 = ? AND col2 = ?

Replace parameter markers in Replacement with actual literals:
SELECT a, b, c FROM t1 STRAIGHT_JOIN t2 FORCE INDEX (col1) ON t1.keycol = t2.keycol WHERE col1 = 42 AND col2 = 2
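The substitution of literals into the replacement can be sketched in a few lines of Python. This is a toy model, not the plugin's implementation: the function names are hypothetical, matching is done on text with a regular expression instead of on the parse tree, and only integer literals are supported.

```python
import re


def normalize(sql):
    """Collapse runs of whitespace so layout differences do not matter."""
    return " ".join(sql.split())


def pattern_to_regex(pattern):
    """Turn each '?' marker in the pattern into a capture group for an integer literal."""
    pieces = [re.escape(piece) for piece in normalize(pattern).split("?")]
    return re.compile(r"(\d+)".join(pieces))


def rewrite(query, pattern, replacement):
    """Return the replacement with the query's literals filled in, or the query unchanged."""
    match = pattern_to_regex(pattern).fullmatch(normalize(query))
    if match is None:
        return query
    literals = iter(match.groups())
    # Assumes the replacement has no more '?' markers than the pattern.
    return re.sub(r"\?", lambda _: next(literals), normalize(replacement))


pattern = "SELECT * FROM t1 JOIN t2 ON t1.keycol = t2.keycol WHERE col1 = ? AND col2 = ?"
replacement = ("SELECT a, b, c FROM t1 STRAIGHT_JOIN t2 FORCE INDEX (col1) "
               "ON t1.keycol = t2.keycol WHERE col1 = ? AND col2 = ?")
query = "SELECT * FROM t1 JOIN t2 ON t1.keycol = t2.keycol WHERE col1 = 42 AND col2 = 2"

print(rewrite(query, pattern, replacement))
```

A query that does not match the pattern is returned unchanged, which mirrors the idea that only targeted queries are rewritten.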

MySQL 5.7 How Matching of Rules Happen?

Match and execute rule in three steps:
1. Hash lookup using query digest computed during parsing
– Finds patterns with same digest
2. Parse tree structure comparison
– To filter out hash collisions
– Will not detect differences in literals
3. Compare literal constants
– In practice done during rewrite
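The first two matching steps can be illustrated with a small Python sketch. Everything here is a simplification under stated assumptions: `normalize` is a hypothetical stand-in for the server's digest normalization (it only handles integer literals), SHA-256 stands in for the real digest function, and comparing normalized text stands in for the parse tree comparison. Step 3, comparing literal constants that appear in a pattern, is omitted.

```python
import hashlib
import re


def normalize(sql):
    """Collapse whitespace and replace standalone integer literals with '?'."""
    return re.sub(r"\b\d+\b", "?", " ".join(sql.split()))


def digest(normalized_sql):
    """Hash of the normalized statement text, used as the lookup key."""
    return hashlib.sha256(normalized_sql.encode("utf-8")).hexdigest()


class RuleTable:
    """In-memory rule store keyed by digest (hypothetical helper class)."""

    def __init__(self):
        self._rules = {}

    def add(self, pattern, replacement):
        normalized = normalize(pattern)
        self._rules.setdefault(digest(normalized), []).append((normalized, replacement))

    def find(self, query):
        normalized = normalize(query)
        # Step 1: hash lookup by digest.
        candidates = self._rules.get(digest(normalized), [])
        # Step 2: structural comparison to rule out digest collisions.
        for pattern, replacement in candidates:
            if pattern == normalized:
                return replacement
        return None


rules = RuleTable()
rules.add("SELECT * FROM t1 WHERE col1 = ?",
          "SELECT a FROM t1 FORCE INDEX (col1) WHERE col1 = ?")

print(rules.find("SELECT *   FROM t1 WHERE col1 = 42"))
print(rules.find("SELECT * FROM t2 WHERE col1 = 42"))
```

Queries with no matching digest cost one dictionary lookup, which is the reason unmatched queries see almost no overhead in this model.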

MySQL 5.7 Communicating with the Plugin

query_rewrite.rewrite_rules table:

Row 1
Pattern: SELECT name, department_name FROM employee JOIN department USING ( department_id ) WHERE salary > ?
Pattern_database: employees
Replacement: SELECT name, department_name FROM employee STRAIGHT_JOIN department USING ( department_id ) WHERE salary > ?
Enabled: Y
Message: NULL

Row 2
Pattern: SELECT name, department_name FROXM employee JOIN department USING ( department_id ) WHERE salary > ?
Pattern_database: employees
Replacement: SELECT name, department_name FROM employee STRAIGHT_JOIN department USING ( department_id ) WHERE salary > ?
Enabled: N
Message: Parse error in pattern: …… near …… at line 1

Row 3
Pattern: SELECT name, department_name FROM employee JOIN department USING ( department_id ) WHERE salary > ?
Pattern_database: text
Replacement: SELECT name, department_name FROXM employee STRAIGHT_JOIN department USING ( department_id ) WHERE salary > ?
Enabled: N
Message: Parse error in replacement … near … at line 1

MySQL 5.7 Query Rewrite Plug-in: Server’s POV

• Query comes in
– Plugin(s) is asked if it wants digests (it does)
• Query is parsed
• Plugin is invoked
• The plug-in may (in case of refresh of rules):
– Scan the rules table using the Rules Table Service. For each row:
  • Pattern + replacement parsed via the parser service
  • Pattern is traversed using the parser service
  • Parser service asked for string offsets of '?' in replacement
  • Parser service asked for normalized query text of pattern
  • performance_schema asked for digest
• The query is rewritten. Server raises SQL note.

1. Hash lookup using digest. High false positive rate.
2. Internal tree structure comparison. Misses literal constants.
3. Compare literal constants. In practice done during rewrite.

MySQL 5.7 Query Rewrite Plugin: Performance Impact

What is the Cost of Rewriting Queries?
• Designed for rewriting problematic queries only!
• ~Zero cost for queries not to be rewritten
– Statement digest computed for performance schema anyway
• Cost of queries to be rewritten is insignificant compared to performance gain
– Cost of generating query + reparsing: max ~5% performance overhead
– Performance gain potentially x times

MySQL 5.7 Explain on a Running Query

EXPLAIN [FORMAT=(JSON|TRADITIONAL)] [EXTENDED] FOR CONNECTION <id>;

• Shows query plan on connection <id>
• Useful for diagnostics on long-running queries
• Plan isn’t available while the query plan is under creation
• Applicable to SELECT/INSERT/DELETE/UPDATE

MySQL 5.7 Generated Columns

• Column generated from the expression
• VIRTUAL: computed when read, not stored, not indexable
• STORED: computed when inserted/updated, stored in SE, indexable
• Useful for:
– Functional index: create a stored column, add a secondary index
– Materialized cache for complex conditions
– Simplify query expression

labs.mysql.com

CREATE TABLE order_lines (
  `order` integer,
  lineno integer,
  price decimal(10,2),
  qty integer,
  sum_price decimal(10,2) GENERATED ALWAYS AS (qty * price) STORED
);

Kudos to Andrey Zhakov for his contribution!

MySQL 5.7: Avoid Creating Temporary Table for UNION ALL

SELECT * FROM table_a UNION ALL SELECT * FROM table_b;

• 5.6: Always materialize results of UNION ALL in temporary tables
• 5.7: Do not materialize in temporary tables unless used for sorting; rows are sent directly to client
• 5.7: Client will receive the first row faster, no need to wait until the last query block is finished
• 5.7: Less memory and disk consumption
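The difference between materializing and streaming can be modeled with Python generators. This is an analogy only, with hypothetical names: each generator stands in for one query block of the UNION ALL, and the log records when a block starts and finishes executing.

```python
from itertools import chain


def query_block(name, rows, log):
    """One SELECT of the UNION ALL; logs when it starts and finishes producing rows."""
    log.append(f"start {name}")
    for row in rows:
        yield row
    log.append(f"end {name}")


def union_all_materialized(blocks):
    """5.6-style model: copy every row into a temporary table before returning any."""
    temp_table = []
    for block in blocks:
        temp_table.extend(block)
    return iter(temp_table)


def union_all_streamed(blocks):
    """5.7-style model: hand rows to the client as each block produces them."""
    return chain.from_iterable(blocks)


materialized_log = []
result = union_all_materialized([
    query_block("table_a", [1, 2], materialized_log),
    query_block("table_b", [3], materialized_log),
])
first_materialized = next(result)

streamed_log = []
result = union_all_streamed([
    query_block("table_a", [1, 2], streamed_log),
    query_block("table_b", [3], streamed_log),
])
first_streamed = next(result)

print("materialized:", first_materialized, materialized_log)
print("streamed:    ", first_streamed, streamed_log)
```

In the streamed model the first row is available while only the first block has started, and no intermediate list is built.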

MySQL 5.7: Optimizations for IN Expressions

• 5.6: Certain queries with IN predicates can’t use index scans or range scans even though all the columns in the query are indexed.

• 5.6: Range optimizer ignores lists of rows

• 5.6: Query needs to be rewritten to denormalized form:
SELECT a, b FROM t1 WHERE ( a = 0 AND b = 0 ) OR ( a = 1 AND b = 1 )

• 5.7: IN queries with row value expressions executed using range scans.

• 5.7: Explain output: Index/table scans change to range scans

CREATE TABLE t1 (a INT, b INT, c INT, KEY x(a, b));

SELECT a, b FROM t1 WHERE (a, b) IN ((0, 0), (1, 1));

MySQL 5.7: Optimizations for IN Expressions

• A table has 10 000 rows, 2 match the where condition

SELECT a, b FROM t1 WHERE (a, b) IN ((0, 0), (1, 1));

Before:
*************************** 1. row ***************************
select_type: SIMPLE
table: t1
type: index
key: x
key_len: 10
ref: NULL
rows: 10 000
Extra: Using where; Using index

After:
*************************** 1. row ***************************
select_type: SIMPLE
table: t1
type: range
key: x
key_len: 10
ref: NULL
rows: 2
Extra: Using where; Using index
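The drop from 10 000 examined rows to 2 can be reproduced with a toy model in Python. The sketch treats the composite index x(a, b) as a sorted list of tuples; this is an assumption made for illustration, not how InnoDB stores an index. A full index scan visits every entry, while a range scan does one binary search per row value in the IN list.

```python
from bisect import bisect_left, bisect_right


def full_index_scan(index, wanted):
    """Visit every index entry and test it against the IN list."""
    examined = 0
    hits = []
    for key in index:
        examined += 1
        if key in wanted:
            hits.append(key)
    return hits, examined


def range_scan(index, wanted):
    """Binary-search the sorted index once per wanted (a, b) value."""
    examined = 0
    hits = []
    for key in sorted(wanted):
        lo = bisect_left(index, key)
        hi = bisect_right(index, key)
        examined += hi - lo
        hits.extend(index[lo:hi])
    return hits, examined


# 10 000 index entries, of which two match the IN list from the slide.
index = sorted((i, i) for i in range(10000))
wanted = {(0, 0), (1, 1)}

scan_hits, scan_examined = full_index_scan(index, wanted)
range_hits, range_examined = range_scan(index, wanted)

print("full index scan examined:", scan_examined)
print("range scan examined:     ", range_examined)
```

Both functions return the same two rows; only the number of entries examined differs.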

MySQL 5.7: Optimization for Full Text Search

SELECT COUNT(*) FROM innodb_table WHERE MATCH(text) AGAINST ('for the this that' in natural language mode) > 0.5;

• Recognize more situations where the 'index only' access method can be used: no need to access the base table, only the FT index
– For example, when the MATCH expression is part of a '>' expression
• 2.5 GB data
– 4X performance improvement!

MySQL 5.7: Optimization for Full Text Search

SELECT COUNT(*) FROM innodb_table WHERE MATCH(text) AGAINST ('for the this that' in natural language mode) > 0.5;

Before:
*************************** 1. row ***************************
select_type: SIMPLE
table: innodb_table
type: fulltext
key: ft_idx
key_len: 0
ref: NULL
rows: 1
Extra: Using where

After:
*************************** 1. row ***************************
select_type: SIMPLE
table: innodb_table
type: fulltext
key: ft_idx
key_len: 10
ref: const
rows: 1
Extra: Using where; Ft_hints: rank > 0.500000; Using index

MySQL 5.7: Optimize Full Text Search

SELECT COUNT(*) FROM test.wp WHERE MATCH(text) AGAINST ('+for +the +this +that' in boolean mode);

SELECT COUNT(*) FROM innodb_table WHERE MATCH(text) AGAINST ('for the this that' in natural language mode);

• Optimize performance of COUNT(*)
• Optimizer provides hints to InnoDB
• When InnoDB takes advantage of the hints:
– Do not calculate ranking
– Avoid sorting

What is on Our Roadmap?

• Improve prepared statement performance

• Continue redesigning the cost model, add histograms

• Continue optimizer refactoring

• Support functional index

• Store and query JSON documents

• Support parallel queries
