1 optimization recap and examples. 2 optimization introduction for every sql expression, there are...
Post on 20-Dec-2015
219 views
TRANSCRIPT
![Page 1: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different](https://reader030.vdocuments.mx/reader030/viewer/2022032704/56649d455503460f94a21d7a/html5/thumbnails/1.jpg)
1
Optimization Recap and Optimization Recap and examplesexamples
![Page 2: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different](https://reader030.vdocuments.mx/reader030/viewer/2022032704/56649d455503460f94a21d7a/html5/thumbnails/2.jpg)
2
Optimization introductionOptimization introduction
• For every SQL expression, there are
many possible ways of implementation.
• The different alternatives could result
in huge run-time differences.
• Our aim is to introduce the basic
hardware used, and optimization
principles
![Page 3: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different](https://reader030.vdocuments.mx/reader030/viewer/2022032704/56649d455503460f94a21d7a/html5/thumbnails/3.jpg)
3
Disk-Memory-CPUDisk-Memory-CPU
Delete from Sailors where
sid=90
DISK
sailors
Reserves
Main Memory
CPU
![Page 4: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different](https://reader030.vdocuments.mx/reader030/viewer/2022032704/56649d455503460f94a21d7a/html5/thumbnails/4.jpg)
4
Hardware RecapHardware Recap• The DB is kept on the Disk.
• The Disk is divided into BLOCKS
• Any processing of the information occurs in the Main Memory.
• Therefore, a block which we want to access has to be brought from the Disk to the memory, and perhaps written back.
• Blocks are read/written from/to the Disk as single units.
• The time of reading/writing a block to/from the disk is an I/O operation, and takes a lot of time.
![Page 5: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different](https://reader030.vdocuments.mx/reader030/viewer/2022032704/56649d455503460f94a21d7a/html5/thumbnails/5.jpg)
5
Hardware RecapHardware Recap
• We assume a constant time for each Disk access,
and that only disk access define the run time. • We do not consider writing to the disk
• Every table in the DB is stored as a File (on the Disk), which is a ‘bunch of Blocks’.
• We will deal with files that are ‘heap-sorted’, i.e., there is no order in the file tuples
• Every block contains many tuples, each of them has a Record ID (RID), which states its location:
(number of block, number of tuple within the block)
![Page 6: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different](https://reader030.vdocuments.mx/reader030/viewer/2022032704/56649d455503460f94a21d7a/html5/thumbnails/6.jpg)
6
SIDSNAM
E
ratin
g
ag
e
192
3
Joe832
332
1
Phil941
133
2
Boe733
122
6
Bill623
144
4
Paul121
111
2
Jim333
144
5
Vicky954
RID
Block 1
Block 2
Block 3
(1,1)
(1,2)
(1,3)
(2,1)
(2,2)
(2,3)
(3,1)
![Page 7: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different](https://reader030.vdocuments.mx/reader030/viewer/2022032704/56649d455503460f94a21d7a/html5/thumbnails/7.jpg)
7
SIDSNAM
E
ratin
g
ag
e
192
3
Joe832
332
1
Phil941
133
2
Boe733
122
6
Bill623
144
4
Paul121
111
2
Jim333
144
5
Vicky954
B blocks
t tuples
Q: What would be the cost of the following queries?
Select * from sailors
Select * from sailors where sname=‘Jim’
Select * from sailors where rating>4
Answer: B
![Page 8: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different](https://reader030.vdocuments.mx/reader030/viewer/2022032704/56649d455503460f94a21d7a/html5/thumbnails/8.jpg)
8
Indexes on filesIndexes on files
• An Index on a table is an additional file
which helps access the data fast.
• An index holds ‘data entries’ to the table file
• The index can have the structure of a B+
Tree, or a hash function.
![Page 9: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different](https://reader030.vdocuments.mx/reader030/viewer/2022032704/56649d455503460f94a21d7a/html5/thumbnails/9.jpg)
9
Tree index Tree index on sname of on sname of
sailorssailors
‘A’->’M’ B1
‘N’->’Z’ B2
‘N’->’T’ L3
‘U’->’Z’ L4
‘A’->’G’ L1
‘H’->’M’ L2
Root block
Leaf blocks
Branch blocks
‘Bill’(2,1)
‘Boe’(1,3)
‘Vicky’ (3,1)
…
‘Paul’ (2,2)
‘Phil’ (1,2)
‘Jim’(2,3)
‘Joe’(1,1)
B1
L4L3L2L1
B2
SIDSNAM
E
ratin
gage
1923Joe832
3321Phil941
1332Boe733
1226Bill623
1444Paul121
1112Jim333
1445Vicky954
![Page 10: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different](https://reader030.vdocuments.mx/reader030/viewer/2022032704/56649d455503460f94a21d7a/html5/thumbnails/10.jpg)
10
Tree indexTree index
• The tree is kept balanced
• The tree entries are always ordered
• The leaves point to the exact location of
tuples
• Getting to the leaf is typically 2-3 I/O
• Each leaf points to the next/previous leaf
• A Clustered index means that the index and
the table are ordered by the same attribute
![Page 11: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different](https://reader030.vdocuments.mx/reader030/viewer/2022032704/56649d455503460f94a21d7a/html5/thumbnails/11.jpg)
11
Tree index Tree index on sname of on sname of
sailorssailors
‘Bill’(2,1)
‘Boe’(1,3)
‘Phil’ (1,2)
…
‘Joe’ (2,2)
‘Joe’ (3,1)
‘Jim’(2,3)
‘Joe’(1,1)
B1
L4L3L2L1
B2
SIDSNAM
E
ratin
gage
1923Joe832
3321Phil941
1332Boe733
1226Bill623
1444Joe121
1112Jim333
1445Joe954
How would the following queries be processed?
Select * from sailors where sname=‘Joe’
Select * from sailors
Select * from sailors where sname>’J’
Notice: index Notice: index is not is not
clusteredclustered
![Page 12: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different](https://reader030.vdocuments.mx/reader030/viewer/2022032704/56649d455503460f94a21d7a/html5/thumbnails/12.jpg)
12
Tree index Tree index on sname of on sname of
sailorssailors
‘Bill’(1,1)
‘Boe’(1,2)
‘Phil’ (3,1)
…
‘Joe’ (2,2)
‘Joe’ (2,3)
‘Jim’(1,1)
‘Joe’(2,1)
B1
L4L3L2L1
B2
SIDSNAM
E
ratin
gage
1226Bill6 23
1332Boe733
1112Jim333
1923Joe832
1444Joe121
1445Joe954
3321Phil941
How would the following queries be processed?
Select * from sailors where sname=‘Joe’
Select * from sailors
Select * from sailors where sname>’J’
Notice: index Notice: index is clusteredis clustered
![Page 13: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different](https://reader030.vdocuments.mx/reader030/viewer/2022032704/56649d455503460f94a21d7a/html5/thumbnails/13.jpg)
13
Hash indexHash index
• Works in a similar way, but using a
hash function instead of a tree
• Works only for equality conditions
• Average of 1.2 I/O to get to the tuple
location
![Page 14: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different](https://reader030.vdocuments.mx/reader030/viewer/2022032704/56649d455503460f94a21d7a/html5/thumbnails/14.jpg)
14
Natural Join Natural Join
• We want to compute
• Naïve algorithm:
SELECT *
FROM Reserves R, Sailors S
WHERE R.sid = S.sid
Foreach tuple r in RForeach tuple s in S
if r.sid=s.sidadd r,s to result
Cost: BR+tR*BS
Running example data
tR=5000
tS=10,000
50 tuples per block
12 buffer pages
= 100+5000*200=1,000,100
![Page 15: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different](https://reader030.vdocuments.mx/reader030/viewer/2022032704/56649d455503460f94a21d7a/html5/thumbnails/15.jpg)
15
Natural Join Natural Join • We want to compute
• We have 4 optional algorithms:
1. Block Nested Loops Join
2. Index Nested Loops Join
3. Sort Merge Join
4. Hash Join
SELECT *
FROM Reserves R, Sailors S
WHERE R.sid = S.sid
This is assuming there is not enough
space in the memory for the smaller of the 2
relations+2
![Page 16: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different](https://reader030.vdocuments.mx/reader030/viewer/2022032704/56649d455503460f94a21d7a/html5/thumbnails/16.jpg)
16
Block Nested Loop JoinBlock Nested Loop Join
• Suppose there are B available blocks in the
memory, BR blocks of relation R, and BS
blocks of relations S, and BR<BS
• Until all blocks of R have been read:
– Read B-2 blocks of R
– Read all blocks of S (one by one), and write the
result
• Run time: BR + BS * ceil(BR /(B-2))= 100+200*100/10=2,100
![Page 17: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different](https://reader030.vdocuments.mx/reader030/viewer/2022032704/56649d455503460f94a21d7a/html5/thumbnails/17.jpg)
17
Index Nested LoopIndex Nested Loop
• Suppose there is an index on sid of Sailors
• Until all blocks of R have been read:– Read a block of R
– For each tuple in the block, use the index of S to locate the matching tuples in S.
• We mark the time it takes to read the tuples in S that match a single tuple in R as X.
• Run time: BR + tR*X
• If the index is clustered, X=2-4
• If it is not clustered, we evaluate X.= 100+5000*3=15,100
![Page 18: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different](https://reader030.vdocuments.mx/reader030/viewer/2022032704/56649d455503460f94a21d7a/html5/thumbnails/18.jpg)
18
• Q: So when would we typically choose
to use an index-nested loop over block-
nested?
• A: Look at the inequality…
![Page 19: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different](https://reader030.vdocuments.mx/reader030/viewer/2022032704/56649d455503460f94a21d7a/html5/thumbnails/19.jpg)
19
Sort-Merge JoinSort-Merge Join
• Sort both relations on the join column
• Join them according to the join algorithm:
sidbiddayagent
2810312/4/96Joe
2810311/3/96Frank
3110110/2/96Joe
3110212/7/96Sam
3110113/7/96Sam
581032/6/96Frank
sidsnameratingage
22dustin745
28yuppy935
31lubber855
36lubber636
44guppy535
58rusty1035
![Page 20: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different](https://reader030.vdocuments.mx/reader030/viewer/2022032704/56649d455503460f94a21d7a/html5/thumbnails/20.jpg)
20
Run time of Sort-MergeRun time of Sort-Merge
• M,N: number of blocks of the relations
• Sorting: MlogM+NlogN
• Merging: N+M if no partition is scanned twice.
• Total: MlogM+NlogN+N+M
• Especially good if one or both of the relations are already sorted.
= 100*7+200*8+100+200=2,600
![Page 21: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different](https://reader030.vdocuments.mx/reader030/viewer/2022032704/56649d455503460f94a21d7a/html5/thumbnails/21.jpg)
21
QuestionQuestion
Suppose:
tuple size= 100 bytes
number of tuples (employees)=3,000
Page size=1000 bytes
You have an unclustered index on Hobby.
You know that 50 employees collect stamps.
Would you use the index?
And for 1,000 stamp-lovers?
SELECT E.dnoFROM Employees EWHERE E.hobby=‘stamps’
![Page 22: 1 Optimization Recap and examples. 2 Optimization introduction For every SQL expression, there are many possible ways of implementation. The different](https://reader030.vdocuments.mx/reader030/viewer/2022032704/56649d455503460f94a21d7a/html5/thumbnails/22.jpg)
22
Question 2Question 2
• Length of tuples, Number of tuples– Emp: 20 bytes, 20,000 tuples
– Dept: 40 bytes, 5000 tuples
• Pages contain 4000 bytes; 12 buffer pages
• Which algorithm would you use if there is an unclustered tree index on E.eid? And clustered?
SELECT E.ename FROM Employees E, Departments DWHERE E.eid=D.eid