evaluation of expression in query processing
EVALUATION OF EXPRESSION MADE BY: NEEL SHAH(130110107048) Department Of Computer Engineering DATABASE MANAGEMENT SYSTEM (2130703) G.H Patel College of Engineering and Technology

Upload: neel-shah

Post on 01-Jul-2015

DESCRIPTION

This Presentation is on the topic of Evaluation of Expression under Query Processing In the area of Database Management System of Computer Engineering.

TRANSCRIPT

Page 1: Evaluation of Expression in Query Processing

EVALUATION OF EXPRESSION

MADE BY:

NEEL SHAH(130110107048)

Department Of Computer Engineering

DATABASE MANAGEMENT SYSTEM (2130703)

G.H Patel College of Engineering and Technology

Page 2: Evaluation of Expression in Query Processing

G.H Patel College of Engg and Technology, Department Of Computer Engineering 2

QUERY EVALUATION PLANS

• A QUERY EVALUATION PLAN CONSISTS OF AN EXTENDED RELATIONAL ALGEBRA TREE, WITH ADDITIONAL ANNOTATIONS AT EACH NODE INDICATING THE IMPLEMENTATION METHOD TO USE FOR EACH RELATIONAL OPERATOR.

Page 3: Evaluation of Expression in Query Processing

π_ename (on the fly)
  σ_{planeId=100 AND rating>5} (on the fly)
    ⋈ (nested loops)
      Employees (file scan)
      Maintenances (file scan)

(The annotation in parentheses at each node is the method to use.)

Sometimes it is possible to pipeline the result of one operator into another operator without creating a temporary table for the intermediate result. This saves cost. When the input to a unary operator (e.g. σ or π) is pipelined into it, we say the operator is applied on-the-fly.

Page 4: Evaluation of Expression in Query Processing

ALTERNATIVE QUERY EVALUATION PLANS

• LET'S LOOK AT TWO (NAÏVE) PLANS

Page 5: Evaluation of Expression in Query Processing

π_ename (on the fly)
  σ_{planeId=100 AND rating>5} (on the fly)
    ⋈ (nested loops join)
      Maintenances (file scan)
      Employees (file scan)

Cost for this plan: 300,000 I/Os for the join. σ and π are done on the fly; there is no I/O cost for them.

Page 6: Evaluation of Expression in Query Processing

π_ename (on the fly)
  σ_{planeId=100 AND rating>5} (on the fly)
    ⋈ (sort merge join)
      Maintenances (file scan)
      Employees (file scan)

Cost for this plan: 7,500 I/Os for the join. σ and π are done on the fly; there is no I/O cost for them.

Page 7: Evaluation of Expression in Query Processing

EVALUATION OF EXPRESSION

• THERE ARE TWO APPROACHES TO EVALUATING A QUERY EXECUTION TREE

• MATERIALIZATION

• COMPUTE THE RESULT OF AN EVALUATION PRIMITIVE AND MATERIALIZE (STORE) THE NEW RELATION ON THE DISK

• PIPELINING

• PASS ON TUPLES TO PARENT OPERATIONS EVEN WHILE AN OPERATION IS STILL BEING EXECUTED

Page 8: Evaluation of Expression in Query Processing

MATERIALIZATION

• EVALUATE ONE OPERATION AFTER ANOTHER, STARTING AT THE LEAF NODES OF THE QUERY EXPRESSION TREE

• MATERIALIZE INTERMEDIATE RESULTS IN TEMPORARY RELATIONS AND USE THOSE FOR EVALUATING OPERATIONS AT THE NEXT LEVEL

Page 9: Evaluation of Expression in Query Processing

• A MATERIALIZED EVALUATION IS ALWAYS POSSIBLE

• COSTS OF READING AND WRITING TEMPORARY RELATIONS CAN BE QUITE HIGH

• DOUBLE BUFFERING (TWO OUTPUT BUFFERS FOR EACH OPERATION) REDUCES THE WAITING TIME: ONE BUFFER IS WRITTEN TO DISK WHILE THE OTHER IS BEING FILLED
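The materialized approach can be sketched in a few lines of Python. This is only an illustration: the sample rows, the `temp_relations` dictionary (standing in for temporary relations on disk) and the `materialize` helper are assumptions made for the example, not part of the slides.

```python
# Hypothetical in-memory stand-ins for disk-resident relations.
employees = [
    {"sin": 1, "ename": "Ann", "rating": 7},
    {"sin": 2, "ename": "Bob", "rating": 3},
]
maintenances = [
    {"sin": 1, "planeId": 100},
    {"sin": 2, "planeId": 100},
    {"sin": 1, "planeId": 200},
]

temp_relations = {}  # stands in for temporary relations written to disk


def materialize(name, rows):
    """Store a complete intermediate result before the parent may start."""
    temp_relations[name] = list(rows)
    return temp_relations[name]


# Evaluate bottom-up: join first, then selection, then projection.
joined = materialize(
    "t1",
    ({**e, **m} for e in employees for m in maintenances if e["sin"] == m["sin"]),
)
selected = materialize(
    "t2", (r for r in joined if r["planeId"] == 100 and r["rating"] > 5)
)
result = materialize("t3", (r["ename"] for r in selected))
```

Every level stores its full output before the next level reads it, which is where the extra read and write cost of temporary relations comes from.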

Page 10: Evaluation of Expression in Query Processing

FOR EXAMPLE:

SELECT * FROM STAFF WHERE ID = (SELECT MAX(MANAGER) FROM ORG)

IN THIS STATEMENT, THE SUBQUERY NEEDS TO BE EVALUATED ONLY ONCE.

THIS TYPE OF SUBQUERY MUST RETURN ONLY ONE ROW. IF EVALUATING THE SUBQUERY CAUSES A CARDINALITY VIOLATION (IF IT RETURNS MORE THAN ONE ROW), AN EXCEPTION IS THROWN WHEN THE SUBQUERY IS RUN.

Page 11: Evaluation of Expression in Query Processing

SUBQUERY MATERIALIZATION IS DETECTED BEFORE OPTIMIZATION, WHICH ALLOWS THE DERBY OPTIMIZER TO SEE A MATERIALIZED SUBQUERY AS AN UNKNOWN CONSTANT VALUE. THE COMPARISON IS THEREFORE OPTIMIZABLE.

THE ORIGINAL STATEMENT IS TRANSFORMED INTO THE FOLLOWING TWO STATEMENTS:

CONSTANT = SELECT MAX(MANAGER) FROM ORG

SELECT * FROM STAFF WHERE ID = CONSTANT

THE SECOND STATEMENT IS OPTIMIZABLE.
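The two-step transformation can be imitated by hand. The sketch below uses Python's built-in `sqlite3` module rather than Derby, and the table contents are invented for illustration. It shows only the idea of evaluating the subquery once and reusing its value, not how Derby does this internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ORG (DEPT INTEGER, MANAGER INTEGER);
    CREATE TABLE STAFF (ID INTEGER PRIMARY KEY, NAME TEXT);
    INSERT INTO ORG VALUES (10, 160), (20, 210), (30, 140);
    INSERT INTO STAFF VALUES (140, 'Fraye'), (160, 'Molinare'), (210, 'Lu');
    """
)

# Step 1: evaluate the subquery exactly once and keep its single value.
(constant,) = conn.execute("SELECT MAX(MANAGER) FROM ORG").fetchone()

# Step 2: run the outer query with the materialized value as a parameter.
rows = conn.execute("SELECT * FROM STAFF WHERE ID = ?", (constant,)).fetchall()
```

Because the second statement compares `ID` with a plain value, an index on `ID` (here the primary key) can be used for the lookup.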

Page 12: Evaluation of Expression in Query Processing

PIPELINING

• PIPELINING EVALUATES MULTIPLE OPERATIONS SIMULTANEOUSLY BY PASSING RESULTS OF ONE OPERATION TO THE NEXT ONE WITHOUT STORING THE TUPLES ON THE DISK

• MUCH CHEAPER THAN MATERIALIZATION SINCE NO I/O OPERATIONS FOR TEMPORARY RELATIONS

• PIPELINING IS NOT ALWAYS POSSIBLE

• DOES NOT WORK WHEN AN OPERATION NEEDS ITS WHOLE INPUT BEFORE IT CAN PRODUCE OUTPUT, E.G. SORTING
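A small sketch of pipelining with Python generators. The operator functions, the sample rows and the `trace` list are assumptions made for this example. The trace records the order in which operators touch tuples, so the interleaving becomes visible.

```python
trace = []  # records which operator handled which tuple, in order


def scan(rows):
    for row in rows:
        trace.append(("scan", row["ename"]))
        yield row


def select(rows, predicate):
    for row in rows:
        if predicate(row):
            yield row


def project(rows, column):
    for row in rows:
        trace.append(("project", row[column]))
        yield row[column]


rows = [
    {"ename": "Ann", "planeId": 100, "rating": 7},
    {"ename": "Bob", "planeId": 100, "rating": 3},
    {"ename": "Cy", "planeId": 100, "rating": 9},
]

pipeline = project(
    select(scan(rows), lambda r: r["planeId"] == 100 and r["rating"] > 5),
    "ename",
)
result = list(pipeline)
```

The first tuple is projected before the second one is even scanned, and no intermediate relation is stored. A sort placed in the middle of such a chain would have to consume its entire input first, which is why it breaks the pipeline.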

Page 13: Evaluation of Expression in Query Processing

Page 14: Evaluation of Expression in Query Processing

• PIPELINES CAN BE EXECUTED IN A DEMAND DRIVEN OR IN A PRODUCER DRIVEN MANNER

• DEMAND DRIVEN OR LAZY PIPELINING (PULL PIPELINING)

• TOP LEVEL OPERATION REPEATEDLY REQUESTS THE NEXT TUPLE FROM ITS CHILDREN

• PRODUCER DRIVEN OR EAGER PIPELINING (PUSH PIPELINING)

• THE CHILD OPERATORS PRODUCE TUPLES EAGERLY AND PASS THEM TO THEIR PARENTS VIA A BUFFER

• IF THE BUFFER IS FULL, THE CHILD OPERATOR HAS TO WAIT UNTIL THE PARENT OPERATOR HAS CONSUMED SOME TUPLES
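The two execution styles can be contrasted in a short Python sketch. It is a simplified illustration: the integers stand in for tuples, and the `DONE` end marker and the buffer size of 2 are arbitrary choices for the example.

```python
import queue
import threading

rows = list(range(10))  # integers stand in for tuples


# Demand driven (pull): the parent asks for one tuple at a time.
def child_pull():
    for r in rows:
        yield r


pulled = []
it = child_pull()
for _ in range(3):           # the parent only demands three tuples
    pulled.append(next(it))  # the child does no work beyond what is requested

# Producer driven (push): the child fills a bounded buffer eagerly.
DONE = object()                  # end-of-stream marker (assumption of this sketch)
buffer = queue.Queue(maxsize=2)  # put() blocks while the buffer is full


def child_push():
    for r in rows:
        buffer.put(r)
    buffer.put(DONE)


producer = threading.Thread(target=child_push)
producer.start()

pushed = []
while True:
    item = buffer.get()  # consuming a tuple frees space for the child
    if item is DONE:
        break
    pushed.append(item)
producer.join()
```

In the pull version the child stops after three tuples because nobody asks for more. In the push version the child runs ahead of the parent until the bounded buffer forces it to wait.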

Page 15: Evaluation of Expression in Query Processing

• THE USE OF PIPELINING MAY HAVE AN IMPACT ON THE TYPES OF ALGORITHMS THAT CAN BE USED FOR A SPECIFIC OPERATION

• E.G. JOIN WITH A PIPELINED LEFT-HAND-SIDE INPUT

• THE LEFT RELATION IS NEVER AVAILABLE ALL AT ONCE FOR PROCESSING

• E.G. MERGE JOIN CANNOT BE USED IF THE INPUTS ARE NOT ALREADY SORTED, SINCE SORTING NEEDS THE WHOLE INPUT

• HOWEVER, WE CAN FOR EXAMPLE USE AN INDEXED NESTED-LOOP JOIN
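A minimal sketch of a join whose left input is pipelined. The dictionary `index_on_sin` is a hypothetical stand-in for a B-tree index on Employees.sin, and the sample tuples are invented for the example.

```python
from collections import defaultdict

employees = [
    {"sin": 1, "ename": "Ann", "rating": 7},
    {"sin": 2, "ename": "Bob", "rating": 3},
]

# Hypothetical stand-in for a B-tree index on Employees.sin.
index_on_sin = defaultdict(list)
for e in employees:
    index_on_sin[e["sin"]].append(e)


def pipelined_maintenances():
    """Left input arrives tuple by tuple; it is never available all at once."""
    yield {"sin": 2, "planeId": 100}
    yield {"sin": 1, "planeId": 100}
    yield {"sin": 1, "planeId": 200}


def index_nested_loops_join(left, index):
    for m in left:                         # consume one pipelined tuple
        for e in index.get(m["sin"], []):  # probe the index for matches
            yield {**e, **m}


joined = list(index_nested_loops_join(pipelined_maintenances(), index_on_sin))
```

The join never needs the left input in sorted order or in full. Each arriving tuple is matched through the index and passed on immediately.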

Page 16: Evaluation of Expression in Query Processing

AN SQL QUERY AND ITS RA EQUIV.

Employees (sin INT, ename VARCHAR(20), rating INT, age REAL)

Maintenances (sin INT, planeId INT, day DATE, descCode CHAR(10))

SELECT ename

FROM Employees NATURAL JOIN Maintenances

WHERE planeId = 100 AND rating > 5;

π_ename (σ_{planeId=100 AND rating>5} (Employees ⋈ Maintenances))

Page 17: Evaluation of Expression in Query Processing

π_ename
  σ_{planeId=100 AND rating>5}
    ⋈
      Maintenances
      Employees

RA expressions are represented by an expression tree.

An algorithm is chosen for each node in the expression tree.
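One possible way to represent such an annotated tree in code is sketched below. The `PlanNode` class and the `render` helper are hypothetical names introduced for illustration. The operators and annotations are the ones from the nested-loops plan shown earlier in the deck.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    operator: str  # relational algebra operator or relation name
    method: str    # annotation: implementation method chosen for this node
    children: List["PlanNode"] = field(default_factory=list)


def render(node, depth=0):
    """Return the plan as indented lines, one node per line."""
    lines = [f"{'  ' * depth}{node.operator} ({node.method})"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


plan = PlanNode("project ename", "on the fly", [
    PlanNode("select planeId=100 AND rating>5", "on the fly", [
        PlanNode("join", "nested loops", [
            PlanNode("Employees", "file scan"),
            PlanNode("Maintenances", "file scan"),
        ]),
    ]),
])

print("\n".join(render(plan)))
```

Swapping the `method` of the join node (for example to "sort merge") yields a different evaluation plan for the same relational algebra expression.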

Page 18: Evaluation of Expression in Query Processing

QUERY OPTIMIZATION

• SQL QUERIES ARE TRANSLATED INTO EXTENDED RELATIONAL ALGEBRA.

• QUERY EVALUATION PLANS ARE REPRESENTED AS TREES OF RELATIONAL OPERATORS, WITH LABELS IDENTIFYING THE ALGORITHM TO USE AT EACH NODE.

• THESE EXPRESSION TREES CAN BE TRANSFORMED TO "BETTER" TREES.

• ALGORITHMS FOR INDIVIDUAL OPERATORS CAN BE COMBINED IN MANY WAYS TO EVALUATE A QUERY.

• INDEXES ARE VERY IMPORTANT.

• THE PROCESS OF FINDING A GOOD EVALUATION PLAN IS CALLED QUERY OPTIMIZATION.

Page 19: Evaluation of Expression in Query Processing

CLUSTERING/NON-CLUSTERING INDEXES

• CLUSTERING INDEX: TUPLES (OF THE RELATION) WITH SAME SEARCH KEY ARE STORED TOGETHER AS CONTROLLED BY THE INDEX.

• SAME AS "PRIMARY"

• NON-CLUSTERING INDEX: TUPLES (OF THE RELATION) WITH THE SAME SEARCH KEY MAY BE SCATTERED ACROSS THE FILE; THEIR PLACEMENT IS NOT CONTROLLED BY THE INDEX.

• SAME AS "SECONDARY"

Page 20: Evaluation of Expression in Query Processing

RUNNING EXAMPLE – AIRLINE

EMPLOYEES (SIN INT, ENAME VARCHAR(20), RATING INT, AGE REAL)

MAINTENANCES (SIN INT, PLANEID INT, DAY DATE, DESCCODE CHAR(10))

• ASSUME THAT

• EACH TUPLE OF MAINTENANCES IS 40 BYTES LONG

• A BLOCK CAN HOLD 100 MAINTENANCES TUPLES (4K BLOCK)

• WE HAVE 1000 BLOCKS OF SUCH TUPLES.

• ASSUME THAT

• EACH TUPLE OF EMPLOYEES IS 50 BYTES LONG,

• A BLOCK CAN HOLD 80 EMPLOYEES TUPLES

• WE HAVE 500 BLOCKS OF SUCH TUPLES.
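The figures above can be checked with a little arithmetic. The sketch assumes a 4096-byte block for both relations. The slide states the 4K size only for Maintenances, so applying it to Employees is an assumption.

```python
BLOCK_BYTES = 4096  # "4K block" from the slide (assumed for both relations)

maint_tuple_bytes, maint_blocks = 40, 1000
emp_tuple_bytes, emp_blocks = 50, 500

# Upper bounds on tuples per block; the slide rounds these down to 100 and 80.
maint_fit = BLOCK_BYTES // maint_tuple_bytes
emp_fit = BLOCK_BYTES // emp_tuple_bytes

# Relation sizes in tuples, using the slide's rounded per-block figures.
maint_tuples = 100 * maint_blocks
emp_tuples = 80 * emp_blocks
```

These totals (100,000 Maintenances tuples and 40,000 Employees tuples) are the numbers used in the join cost estimates later in the deck.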

Page 21: Evaluation of Expression in Query Processing

ALGORITHMS FOR SELECTION

σ_{R.ATTR = VALUE} (R)

• IF NO INDEX ON R.ATTR, THEN JUST SCAN R.

• HOW MANY DISK ACCESSES IF R IS THE MAINTENANCES RELATION?

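• ON AVERAGE 1000/2 = 500 BLOCK (PAGE) READS IF ATTR IS A KEY AND THE SCAN CAN STOP AT THE FIRST MATCH; OTHERWISE ALL 1000 BLOCKS MUST BE READ.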

• IF THERE IS AN INDEX, WE TYPICALLY NEED ONLY 3 DISK ACCESSES

• THIS IS, ASSUMING A NON-CLUSTERING B-TREE WITH 3 LEVELS, WITH THE ROOT IN MAIN MEMORY.

σ_{R.ATTR < VALUE} (R)

• EVEN WHEN THERE IS A NON-CLUSTERING INDEX, IT MAY BE BETTER TO SCAN THE RELATION AND IGNORE THE INDEX. WHY?

• OF COURSE, IF WE HAVE A CLUSTERING INDEX, WE USE IT.
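A rough cost model makes the answer to the "why" concrete. The model is an assumption of this sketch, not something stated on the slide: with a non-clustering index, every matching tuple is taken to sit in a different block, so each match costs one block read after the index descent (2 I/Os for the two index levels below the in-memory root).

```python
MAINT_BLOCKS = 1000
INDEX_DESCENT_IOS = 2  # 3-level B-tree with the root held in main memory


def scan_cost(stop_at_first_match=False):
    # A full scan reads every block; stopping early halves that on average.
    return MAINT_BLOCKS // 2 if stop_at_first_match else MAINT_BLOCKS


def nonclustering_index_cost(matching_tuples):
    # Assumption: each matching tuple lives in a different block.
    return INDEX_DESCENT_IOS + matching_tuples


equality_lookup = nonclustering_index_cost(1)      # one match: the slide's 3 accesses
small_range = nonclustering_index_cost(100)        # 0.1% of the 100,000 tuples
large_range = nonclustering_index_cost(10_000)     # 10% of the 100,000 tuples
```

Under this model a range that matches 10% of the tuples costs about ten times more through the index than a plain scan of all 1000 blocks, while a very selective range is still much cheaper through the index.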

Page 22: Evaluation of Expression in Query Processing

ALGORITHMS FOR PROJECTIONS

• GIVEN A PROJECTION WE HAVE TO SCAN THE RELATION AND DROP CERTAIN FIELDS OF EACH TUPLE.

• THAT’S EASY.

• HOWEVER, IF WE NEED TO DO A SET PROJECTION (AS OPPOSED TO BAG PROJECTION) SPECIFIED WITH THE DISTINCT KEYWORD IN SQL, WE NEED TO REMOVE DUPLICATES.

• THIS IS MORE EXPENSIVE.

• USUALLY DONE BY SORTING, IN ORDER TO CO-LOCATE THE DUPLICATES AND THEN REMOVE THEM.

• CAN BE COMBINED WITH THE FINAL PASS OF SORTING.
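Sort-based duplicate elimination can be sketched as follows. The function name and sample rows are made up for the example, and the sort is an in-memory `sorted()` call rather than the external sort a DBMS would use.

```python
def project_distinct(rows, column):
    """Set projection: sort the projected values, then drop adjacent duplicates."""
    projected = sorted(row[column] for row in rows)  # sorting co-locates duplicates
    result = []
    for value in projected:
        if not result or result[-1] != value:  # compare with the previous value only
            result.append(value)
    return result


rows = [
    {"ename": "Bob"},
    {"ename": "Ann"},
    {"ename": "Bob"},
    {"ename": "Cy"},
    {"ename": "Ann"},
]

bag_projection = [row["ename"] for row in rows]  # keeps duplicates
set_projection = project_distinct(rows, "ename")
```

After sorting, a duplicate can only appear directly after its twin, so a single comparison with the previous value is enough. That is also why the removal can be folded into the last pass of the sort.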

Page 23: Evaluation of Expression in Query Processing

ALGORITHMS FOR JOINS

• JOINS ARE EXPENSIVE OPERATIONS AND VERY COMMON.

• CONSIDER THE NATURAL JOIN OF MAINTENANCES AND EMPLOYEES.

INDEX NESTED LOOPS JOIN

• SUPPOSE EMPLOYEES HAS AN INDEX (B-TREE) ON THE SIN COLUMN.

• WE CAN SCAN MAINTENANCES AND, FOR EACH TUPLE, USE THE INDEX TO PROBE EMPLOYEES FOR MATCHING TUPLES.

• ANALYSIS:

• TAKES ABOUT 3 I/OS ON AVERAGE TO RETRIEVE THE APPROPRIATE LEAF OF THE INDEX.

• FOR EACH OF THE 100,000 MAINTENANCE RECORDS WE TRY TO ACCESS THE CORRESPONDING EMPLOYEE WITH 3 I/OS.

• SO, 100,000*3 = 300,000 I/OS ON AVERAGE!!
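The estimate can be reproduced step by step. The variable names are chosen for this sketch. The last line adds the cost of reading Maintenances itself, which the slide's figure leaves out.

```python
maint_blocks = 1000
tuples_per_block = 100
ios_per_probe = 3  # index descent plus fetching the matching Employees tuple

maint_tuples = maint_blocks * tuples_per_block  # outer relation size in tuples
probe_ios = maint_tuples * ios_per_probe        # one index probe per outer tuple

# The slide counts only the probes; scanning Maintenances adds one pass over it.
total_with_scan = probe_ios + maint_blocks
```

The scan adds only 1,000 I/Os to the 300,000 spent on probes, so the probes dominate the cost either way.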

Page 24: Evaluation of Expression in Query Processing

ALGORITHMS FOR JOINS (SORT-MERGE)

SORT-MERGE

• SORT BOTH TABLES ON THE JOIN COLUMN, AND THEN SCAN THEM TO FIND MATCHES.

• ANALYSIS:

• SORT MAINTENANCES IN TWO PASSES, AND EMPLOYEES IN TWO PASSES

• COST FOR SORT IS

• 2 * 2 * 1000 = 4000 I/OS FOR MAINTENANCES AND

• 2 * 2 * 500 = 2000 I/OS FOR EMPLOYEES.

• THEN WE MERGE. THIS REQUIRES AN ADDITIONAL SCAN OF BOTH TABLES.

• THUS THE TOTAL COST IS 4000 + 2000 + 1000 + 500 = 7500 I/OS. (MUCH BETTER!!)
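Both the algorithm and its cost formula can be sketched in Python. This is an in-memory illustration with invented sample rows. The `passes=2` default mirrors the two-pass sort assumed in the analysis above.

```python
def sort_merge_join(left, right, key):
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right, then pair it
            # with every left tuple carrying the same key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            while i < len(left) and left[i][key] == lk:
                out.extend({**left[i], **r} for r in right[j:j_end])
                i += 1
            j = j_end
    return out


def sort_merge_cost(blocks_r, blocks_s, passes=2):
    # Each sort pass reads and writes every block: 2 * passes * blocks.
    sort_ios = 2 * passes * (blocks_r + blocks_s)
    merge_ios = blocks_r + blocks_s  # one more scan of both sorted tables
    return sort_ios + merge_ios


employees = [{"sin": 2, "ename": "Bob"}, {"sin": 1, "ename": "Ann"}]
maintenances = [
    {"sin": 1, "planeId": 100},
    {"sin": 2, "planeId": 100},
    {"sin": 1, "planeId": 200},
    {"sin": 3, "planeId": 300},
]

joined = sort_merge_join(employees, maintenances, "sin")
total_ios = sort_merge_cost(1000, 500)
```

With 1000 blocks of Maintenances and 500 blocks of Employees, `sort_merge_cost` returns the 7500 I/Os derived on the slide.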

Page 25: Evaluation of Expression in Query Processing

ALGORITHMS FOR JOINS (SORT-MERGE)

SO, WE HAVE:

• INDEX NESTED LOOPS JOIN: 300,000 I/OS

• SORT-MERGE JOIN: 7,500 I/OS.

• WHY BOTHER WITH INDEX NESTED LOOPS JOIN?

• WELL, “INDEX NESTED LOOPS” METHOD HAS THE NICE PROPERTY THAT IT IS INCREMENTAL.

• THE COST OF OUR EXAMPLE JOIN GROWS IN PROPORTION TO THE NUMBER OF MAINTENANCES TUPLES THAT WE PROCESS.

• THEREFORE, IF SOME ADDITIONAL SELECTION IN THE QUERY ALLOWS US TO CONSIDER ONLY A SMALL SUBSET OF MAINTENANCES TUPLES, WE CAN AVOID COMPUTING THE FULL JOIN OF MAINTENANCES AND EMPLOYEES.

Page 26: Evaluation of Expression in Query Processing

• SUPPOSE WE ONLY WANT THE RESULT OF THE JOIN FOR THE PLANE 100, AND THERE ARE VERY FEW SUCH MAINTENANCES.

• FOR EACH SUCH MAINTENANCES TUPLE, WE PROBE EMPLOYEES, AND WE ARE DONE.

• SORT-MERGE JOIN, ON THE OTHER HAND, WILL SCAN THE ENTIRE MAINTENANCES TABLE AT LEAST ONCE.

• THE COST OF THIS STEP ALONE IS LIKELY TO BE MUCH HIGHER THAN THE ENTIRE COST OF INDEX NESTED LOOPS JOIN.
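A quick calculation shows how few tuples "very few" has to mean. The sketch assumes that the matching Maintenances tuples can be located without scanning the whole table (for instance through a hypothetical index on planeId), which the slides do not spell out. The count of 20 tuples is an arbitrary example.

```python
IOS_PER_PROBE = 3
MAINT_BLOCKS = 1000


def index_nl_cost(selected_tuples):
    # One index probe into Employees per selected Maintenances tuple.
    return selected_tuples * IOS_PER_PROBE


def sort_merge_min_cost():
    # Lower bound only: a single full scan of Maintenances.
    return MAINT_BLOCKS


few = index_nl_cost(20)

# Smallest number of selected tuples for which the probes cost more
# than one scan of Maintenances.
break_even = next(
    k for k in range(1, 100_001) if index_nl_cost(k) > sort_merge_min_cost()
)
```

As long as the selection keeps fewer than `break_even` Maintenances tuples, the probes alone are cheaper than the single scan that sort-merge join cannot avoid.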

Page 27: Evaluation of Expression in Query Processing

QUERY OPTIMIZATION

• OBSERVE THAT THE CHOICE OF INDEX NESTED LOOPS JOIN IS BASED ON CONSIDERING THE QUERY AS A WHOLE, INCLUDING THE EXTRA SELECTION ON MAINTENANCES, RATHER THAN JUST THE JOIN OPERATION BY ITSELF.

• THIS LEADS US TO THE NEXT TOPIC, QUERY OPTIMIZATION, WHICH IS THE PROCESS OF FINDING A GOOD PLAN FOR AN ENTIRE QUERY.

• QUERY OPTIMIZATION IS ONE OF THE MOST IMPORTANT TASKS OF A RELATIONAL DBMS. THE OPTIMIZER GENERATES ALTERNATIVE PLANS AND CHOOSES THE PLAN WITH THE LEAST ESTIMATED COST.
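The final selection step can be illustrated with the two estimates from this deck. A real optimizer enumerates many more alternatives and computes the estimates itself. The dictionary and the `choose_plan` helper are names invented for this sketch.

```python
# Estimated I/O costs taken from the slides for the full join.
alternative_plans = {
    "index nested loops join": 300_000,
    "sort-merge join": 7_500,
}


def choose_plan(plans):
    """Return the (name, cost) pair with the least estimated cost."""
    return min(plans.items(), key=lambda item: item[1])


best_name, best_cost = choose_plan(alternative_plans)
```

With different estimates, for example after a selective condition on planeId shrinks the number of probes, the same function would pick the index nested loops plan instead.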

Page 28: Evaluation of Expression in Query Processing


Page 29: Evaluation of Expression in Query Processing

QUESTIONS??

Page 30: Evaluation of Expression in Query Processing