CSC 261/461 – Database Systems
Lecture 19
Fall 2017
CSC261,Fall2017,UR
Announcements
• CIRC:
  – CIRC is down!
  – MongoDB and Spark (mini) projects are at stake.
• Project 1 Milestone 4
  – is out
  – Due date: last day of class
    • We will check your website after that date
    • But finish early
• Due Dates:
  – Suggestions:
Due Dates
• 11/12 to 11/18
• 11/19 to 11/25 (Thanksgiving Week)
• 11/26 to 12/02
• 12/03 to 12/09:
  – Term Paper Due: 12/08
• 12/10 to 12/13 (Last Class):
  – Poster Session on: 12/11
  – Project 1 Milestone 4 is due on 12/13
• Final: December 18, 2017 at 7:15 pm
[Slide labels: MongoDB, Spark, Term Paper, Poster Session]
Topics for Today
• Query Processing (Chapter 18)
• Query Optimization (Chapter 19) on Wednesday
QUERY PROCESSING
Steps in Query Processing
• Scanning
• Parsing
• Validation
• Query Tree Creation
• Query Optimization (Query planning)
• Code generation (to execute the plan)
• Running the query code
SQL Queries
• SQL Queries are decomposed into Query blocks:
  – Select…From…Where…Group By…Having
• Translate Query blocks into Relational Algebraic expression
• Remember, SQL includes aggregate operators:
  – MIN, MAX, SUM, COUNT etc.
  – Part of the extended algebra
  – Let’s go back to Chapter 8 (Section 8.4.2)
Aggregate Functions and Grouping (Relational Algebra)
• Aggregate function: ℑ
• General form: $_{\langle \text{grouping attributes} \rangle}\, \Im\, _{\langle \text{function list} \rangle}(R)$
• Example: $_{Dno}\, \Im\, _{\text{COUNT } Ssn,\ \text{AVERAGE } Salary}(\text{EMPLOYEE})$
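To make the grouping operator concrete, here is a minimal Python sketch of what the EMPLOYEE example computes. The function name `aggregate_by_dno` and the sample rows are illustrative assumptions, not part of the lecture.

```python
from collections import defaultdict

def aggregate_by_dno(employee):
    """Group EMPLOYEE tuples by Dno; return {Dno: (COUNT Ssn, AVERAGE Salary)}."""
    groups = defaultdict(list)
    for row in employee:
        groups[row["Dno"]].append(row)
    result = {}
    for dno, rows in groups.items():
        count_ssn = sum(1 for r in rows if r["Ssn"] is not None)
        avg_salary = sum(r["Salary"] for r in rows) / len(rows)
        result[dno] = (count_ssn, avg_salary)
    return result

# Hypothetical sample data (not from the lecture).
EMPLOYEE = [
    {"Ssn": "111", "Salary": 30000, "Dno": 5},
    {"Ssn": "222", "Salary": 40000, "Dno": 5},
    {"Ssn": "333", "Salary": 25000, "Dno": 4},
]
```

Each distinct Dno value yields one output tuple, which is the role of the grouping attributes on the left of ℑ.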
Semijoin (⋉)
• $R \ltimes S = \Pi_{A_1,\dots,A_n}(R \bowtie S)$
• Where A1, …, An are the attributes in R
• Example:
  – Employee ⋉ Dependents

Students(sid, sname, gpa)
People(ssn, pname, address)

SQL:
SELECT DISTINCT sid, sname, gpa
FROM Students, People
WHERE sname = pname;

OR

SELECT DISTINCT sid, sname, gpa
FROM Students
WHERE sname IN
(SELECT pname FROM People);

RA: Students ⋉ People
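The Students ⋉ People example can be sketched in Python as follows. This is only an illustration: the function name and the sample rows are assumptions, and relations are modeled as lists of dicts.

```python
def semijoin_students_people(students, people):
    """Students ⋉ People on sname = pname: keep each student that
    matches at least one person, projected onto the Students attributes."""
    pnames = {p["pname"] for p in people}
    seen = set()
    out = []
    for s in students:
        key = (s["sid"], s["sname"], s["gpa"])
        if s["sname"] in pnames and key not in seen:   # DISTINCT
            seen.add(key)
            out.append(key)
    return out

# Hypothetical sample data (not from the lecture).
students = [
    {"sid": 1, "sname": "Ann", "gpa": 3.5},
    {"sid": 2, "sname": "Bob", "gpa": 3.1},
]
people = [
    {"ssn": "001", "pname": "Ann", "address": "A St"},
    {"ssn": "002", "pname": "Ann", "address": "B St"},
]
```

Even though two People rows match "Ann", the student appears once in the result, which is why the SQL versions need DISTINCT or IN.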
EXTERNAL SORTING
External Merge Sort
Why are Sort Algorithms Important?
• Data requested from DB in sorted order is extremely common
  – e.g., find students in increasing GPA order
• Why not just use quicksort in main memory?
  – What if we need to sort 1 TB of data with 1 GB of RAM…
A classic problem in computer science!
So how do we sort big files?
1. Split into chunks small enough to sort in memory (“runs”)
2. Merge pairs (or groups) of runs using the external merge algorithm
3. Keep merging the resulting runs (each time = a “pass”) until left with one sorted file!
2. EXTERNAL MERGE & SORT
Challenge: Merging Big Files with Small Memory
How do we efficiently merge two sorted files when both are much larger than our main memory buffer?
External Merge Algorithm
• Input: 2 sorted lists of length M and N
• Output: 1 sorted list of length M + N
• Required: At least 3 Buffer Pages
• IOs: 2(M+N)
Key (Simple) Idea
To find an element that is no larger than all elements in two lists, one only needs to compare minimum elements from each list.
If: $A_1 \le A_2 \le \dots \le A_N$ and $B_1 \le B_2 \le \dots \le B_M$
Then: $\min(A_1, B_1) \le A_i$ and $\min(A_1, B_1) \le B_j$ for $i = 1, \dots, N$ and $j = 1, \dots, M$
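A minimal Python sketch of the external merge under these rules, assuming each file is a list of sorted pages and each page holds `page_size` values. The function name, the page representation, and the IO counter are illustrative assumptions. Unlike a real system, this sketch refills an input page only once it is empty.

```python
def external_merge(f1, f2, page_size=2):
    """Merge two sorted files (lists of pages) using one input page per
    file and one output page. Returns (merged_pages, io_count)."""
    files = [f1, f2]
    next_page = [0, 0]          # index of the next page to read per file
    buf = [[], []]              # the two input buffer pages
    out_page, merged, ios = [], [], 0

    def refill(i):
        nonlocal ios
        if not buf[i] and next_page[i] < len(files[i]):
            buf[i] = list(files[i][next_page[i]])   # one page read
            next_page[i] += 1
            ios += 1

    while True:
        refill(0)
        refill(1)
        if not buf[0] and not buf[1]:
            break
        # Key idea: compare only the current minimum of each list.
        if buf[0] and (not buf[1] or buf[0][0] <= buf[1][0]):
            i = 0
        else:
            i = 1
        out_page.append(buf[i].pop(0))
        if len(out_page) == page_size:              # output page full
            merged.append(out_page)                 # one page write
            out_page = []
            ios += 1
    if out_page:                                    # flush a partial page
        merged.append(out_page)
        ios += 1
    return merged, ios
```

With two 3-page inputs the counter ends at 12, matching 2(M+N) for M = N = 3.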
External Merge Algorithm (worked example)
Input: Two sorted files
Output: One merged sorted file
[Figure sequence: pages of F1 and F2 on disk, a 3-page main-memory buffer, and the output file]
• On disk: F1 = (1,5) (7,11) (20,31); F2 = (2,22) (23,24) (25,30)
• Load the first page of each file into the buffer: (1,5) and (2,22)
• Move the smallest values to the output page: the output page holds (1,2); 5 and 22 remain in the buffer
• Write the output page (1,2) to disk
• This is all the algorithm “sees”… Which file to load a page from next?
• We know that F2 only contains values ≥ 22… so we should load from F1!
• Load (7,11) from F1; the output page fills with (5,7), leaving 11 and 22 in the buffer
• Write (5,7) to disk, then load (20,31) from F1
• And so on…
We can merge lists of arbitrary length with only 3 buffer pages.
If lists of size M and N, then Cost: 2(M+N) IOs
Each page is read once, written once
External Merge Sort Algorithm
Example:
• 3 buffer pages
• 6-page file
[Figure sequence: unsorted file on disk (orange file = unsorted), 3-page main-memory buffer, files F1 and F2]

EXTERNAL MERGE SORT (BEFORE MERGE)
1. Split into chunks small enough to sort in memory
• Unsorted pages on disk: (27,24) (3,1) (18,22) and (33,12) (55,31) (44,10)
• Load one 3-page chunk into the buffer, sort it in memory, and write it back: (10,12) (31,33) (44,55)
• And similarly for F2: (1,3) (18,22) (24,27)
• Each sorted file is called a run
2. Now just run the external merge algorithm & we’re done!
Calculating IO Cost
For 3 buffer pages, 6 page file:
1. Split into two 3-page files and sort in memory = 1 R + 1 W for each file = 2*(3 + 3) = 12 IO operations
2. Merge each pair of sorted chunks using the external merge algorithm = 2*(3 + 3) = 12 IO operations
3. Total cost = 24 IO
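The same accounting can be sketched in Python. This is a simplified illustration, not the lecture's implementation: the function name is hypothetical, the file is assumed non-empty with equal-sized pages, and `heapq.merge` stands in for the page-at-a-time external merge while the IO counter charges one read and one write per page per pass.

```python
import heapq

def external_merge_sort(pages, buffer_pages=3):
    """Sort a file given as a list of pages; return (sorted_pages, io_count)."""
    page_size = len(pages[0])
    ios = 0

    def to_pages(values):
        return [values[i:i + page_size] for i in range(0, len(values), page_size)]

    # Split & sort: read buffer_pages pages, sort in memory, write one run.
    runs = []
    for i in range(0, len(pages), buffer_pages):
        chunk = pages[i:i + buffer_pages]
        values = sorted(v for page in chunk for v in page)
        runs.append(to_pages(values))
        ios += 2 * len(chunk)                 # each page read once, written once

    # Merge passes: merge pairs of runs until a single run remains.
    while len(runs) > 1:
        merged_runs = []
        for i in range(0, len(runs), 2):
            pair = runs[i:i + 2]
            flat = [[v for page in run for v in page] for run in pair]
            merged_runs.append(to_pages(list(heapq.merge(*flat))))
        ios += 2 * sum(len(run) for run in runs)   # whole file read and written
        runs = merged_runs
    return runs[0], ios
```

On the 6-page example this reports 12 IOs for the split-and-sort step plus 12 for the single merge pass, 24 in total.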
Running External Merge Sort on Larger Files
Assume we still only have 3 buffer pages (buffer not pictured)
[Figure sequence: a 24-page file on disk, shown after each step]
1. Split into files small enough to sort in buffer… and sort (eight 3-page files). Call each of these sorted files a run
2. Now merge pairs of (sorted) files… the resulting files will be sorted! (four 6-page runs)
3. And repeat… (two 12-page runs). Call each of these steps a pass
4. And repeat! (one sorted 24-page file)
Simplified 3-page Buffer Version
Assume for simplicity that we split an N-page file into N single-page runs and sort these; then:
• First pass: Merge N/2 pairs of runs each of length 1 page
• Second pass: Merge N/4 pairs of runs each of length 2 pages
• In general, for N pages, we do $\lceil \log_2 N \rceil$ passes
  – +1 for the initial split & sort
• Each pass involves reading in & writing out all the pages = 2N IO
[Figure: Unsorted input file → Split & sort → Merge → Merge → Sorted!]
→ $2N(\lceil \log_2 N \rceil + 1)$ total IO cost!
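As a quick check of the formula, here is a worked instance for a hypothetical file of N = 8 pages (the value 8 is chosen only for illustration):

```latex
\begin{align*}
N &= 8 \text{ pages} \\
\text{merge passes} &= \lceil \log_2 8 \rceil = 3 \\
\text{total IO} &= 2N\left(\lceil \log_2 N \rceil + 1\right) = 2 \cdot 8 \cdot (3 + 1) = 64
\end{align*}
```

The runs grow from 1 page to 2, 4, and finally 8 pages, and each of the four passes (split & sort plus three merges) costs 2N = 16 IOs.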
Using B+1 buffer pages to reduce # of passes
Suppose we have B+1 buffer pages now; we can:
1. Increase length of initial runs. Sort B+1 at a time! At the beginning, we can split the N pages into runs of length B+1 and sort these in memory
2. Perform a B-way merge. On each pass, we can merge groups of B runs at a time (vs. merging pairs of runs)!
IO Cost:
• Starting with runs of length 1: $2N(\lceil \log_2 N \rceil + 1)$
• Starting with runs of length B+1: $2N(\lceil \log_2 \frac{N}{B+1} \rceil + 1)$
• Performing B-way merges: $2N(\lceil \log_B \frac{N}{B+1} \rceil + 1)$
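To compare the three cost expressions numerically, a small Python sketch follows. The function names and the sample sizes (N = 1000 pages, B = 9) are assumptions for illustration; the logarithms are computed with integer arithmetic to avoid floating-point rounding, and B-way merging assumes B ≥ 2.

```python
import math

def ceil_log(x, base):
    """Smallest integer k >= 0 with base**k >= x (exact integer arithmetic)."""
    if base < 2:
        raise ValueError("base must be at least 2")
    k, power = 0, 1
    while power < x:
        power *= base
        k += 1
    return k

def sort_io_cost(n_pages, b=None, b_way_merge=False):
    """IO cost of external merge sort for the three setups listed above.
    b=None: runs of length 1, 2-way merges.
    b set: initial runs of length B+1; b_way_merge=True merges B runs per pass."""
    if b is None:
        passes = ceil_log(n_pages, 2)
    else:
        initial_runs = math.ceil(n_pages / (b + 1))
        passes = ceil_log(initial_runs, b if b_way_merge else 2)
    return 2 * n_pages * (passes + 1)

# Hypothetical sizes: N = 1000 pages, B + 1 = 10 buffer pages.
costs = (sort_io_cost(1000), sort_io_cost(1000, b=9), sort_io_cost(1000, b=9, b_way_merge=True))
```

For these sizes the cost drops from 22000 to 16000 to 8000 IOs, because the number of passes falls from 11 to 8 to 4.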
Algorithms for Select Operation
• Read Section 18.3 (18.3.1, 18.3.2, 18.3.3, 18.3.4)
• Mostly covers searching:
  1. Linear Search
  2. Binary Search
  3. Indexing
  4. Hashing
  5. B+ Tree
• (Skip bitmap index and functional index)
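The first two search methods can be sketched as follows, assuming an in-memory list of (key, value) records as a stand-in for a file. The names and sample records are illustrative; a real DBMS would count page reads rather than compare individual records.

```python
def linear_search(records, key):
    """Scan every record; works on an unordered file."""
    return [r for r in records if r[0] == key]

def binary_search(sorted_records, key):
    """Requires the file to be ordered on the search key; returns all matches."""
    lo, hi = 0, len(sorted_records)
    while lo < hi:                       # find the first record with key >= target
        mid = (lo + hi) // 2
        if sorted_records[mid][0] < key:
            lo = mid + 1
        else:
            hi = mid
    matches = []
    while lo < len(sorted_records) and sorted_records[lo][0] == key:
        matches.append(sorted_records[lo])
        lo += 1
    return matches

# Hypothetical records, ordered on the first field.
records = [(1, "a"), (2, "b"), (2, "c"), (3, "d")]
```

Linear search touches every record, while binary search needs about log2 of the file size in probes but only applies when the file is ordered on the selection attribute.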
Algorithm for Join Operation
• The most time consuming operation
What you will learn about in this section
1. Nested Loop Join (NLJ)
2. Block Nested Loop Join (BNLJ)
3. Index Nested Loop Join (INLJ)
4. Sort-Merge Join
5. Hash Join
RECAP: Joins
Joins: Example
Example: Returns all pairs of tuples r ∈ R, s ∈ S such that r.A = s.A

SELECT R.A, B, C, D
FROM R, S
WHERE R.A = S.A

R:
A  B  C
1  0  1
2  3  4
2  5  2
3  1  1

S:
A  D
3  7
2  2
2  3

R ⋈ S:
A  B  C  D
2  3  4  2
2  3  4  3
2  5  2  2
2  5  2  3
3  1  1  7
Semantically: A Subset of the Cross Product
SELECT R.A, B, C, D
FROM R, S
WHERE R.A = S.A
Example: Returns all pairs of tuples r ∈ R, s ∈ S such that r.A = s.A
[Figure: the cross product R × S is filtered by the condition r.A = s.A to give R ⋈ S, using the R, S, and result tables of the preceding example]
… Can we actually implement a join in this way?
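Taken literally, the "cross product, then filter" reading of a join looks like this in Python. The tuples are the R and S tables from the example; the function name is an illustrative assumption.

```python
from itertools import product

R = [(1, 0, 1), (2, 3, 4), (2, 5, 2), (3, 1, 1)]   # (A, B, C)
S = [(3, 7), (2, 2), (2, 3)]                        # (A, D)

def join_via_cross_product(r_rows, s_rows):
    """R ⋈ S on A, computed as a filter over the full cross product."""
    return [(a, b, c, d)
            for (a, b, c), (s_a, d) in product(r_rows, s_rows)
            if a == s_a]
```

This examines all 4 × 3 = 12 candidate pairs to produce 5 result tuples, which hints at why enumerating the full cross product scales poorly for large relations.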
Notes
• We write 𝐑 ⋈ 𝑺 to mean join R and S by returning all tuple pairs where all shared attributes are equal
• We write 𝐑 ⋈ 𝑺 on A to mean join R and S by returning all tuple pairs where attribute(s) A are equal
• For simplicity, we’ll consider joins on two tables and with equality constraints (“equijoins”)
However, joins can merge > 2 tables, and some algorithms do support non-equality constraints!
Nested Loop Joins
Notes
• We are again considering “IO aware” algorithms: care about disk IO
• Given a relation R, let:
  – T(R) = # of tuples in R
  – P(R) = # of pages in R
• Note also that we omit ceilings in calculations… good exercise to put back in!
Recall that we read/write entire pages with disk IO
Nested Loop Join (NLJ)
Compute R ⋈ 𝑆𝑜𝑛𝐴:for r in R:
for s in S:if r[A] == s[A]:
yield (r,s)
CSC261,Fall2017,UR
![Page 61: CSC 261/461 –Database Systems Lecture 19 › courses › 261 › fall2017 › lectures › l19.pdf · MongoDB Spark Term Paper Poster Session. Topics for Today •Query Processing](https://reader030.vdocuments.mx/reader030/viewer/2022040122/5f04ab6e7e708231d40f1e89/html5/thumbnails/61.jpg)
Nested Loop Join (NLJ)
Compute R ⋈ 𝑆𝑜𝑛𝐴:for r in R:
for s in S:if r[A] == s[A]:
yield (r,s)
P(R)
1. LoopoverthetuplesinR
NotethatourIOcostisbasedonthenumberofpages loaded,notthenumberoftuples!
Cost:
CSC261,Fall2017,UR
![Page 62: CSC 261/461 –Database Systems Lecture 19 › courses › 261 › fall2017 › lectures › l19.pdf · MongoDB Spark Term Paper Poster Session. Topics for Today •Query Processing](https://reader030.vdocuments.mx/reader030/viewer/2022040122/5f04ab6e7e708231d40f1e89/html5/thumbnails/62.jpg)
Nested Loop Join (NLJ)
Compute R ⋈ 𝑆𝑜𝑛𝐴:for r in R:
for s in S:if r[A] == s[A]:
yield (r,s)
P(R)+T(R)*P(S)
HavetoreadallofSfromdiskforeverytupleinR!
1. LoopoverthetuplesinR
2. ForeverytupleinR,loopoverallthetuplesinS
Cost:
CSC261,Fall2017,UR
![Page 63: CSC 261/461 –Database Systems Lecture 19 › courses › 261 › fall2017 › lectures › l19.pdf · MongoDB Spark Term Paper Poster Session. Topics for Today •Query Processing](https://reader030.vdocuments.mx/reader030/viewer/2022040122/5f04ab6e7e708231d40f1e89/html5/thumbnails/63.jpg)
Nested Loop Join (NLJ)
Compute R ⋈ 𝑆𝑜𝑛𝐴:for r in R:
for s in S:if r[A] == s[A]:
yield (r,s)
P(R)+T(R)*P(S)
NotethatNLJcanhandlethingsotherthanequalityconstraints…justcheckintheifstatement!
1. LoopoverthetuplesinR
2. ForeverytupleinR,loopoverallthetuplesinS
3. Checkagainstjoinconditions
Cost:
CSC261,Fall2017,UR
![Page 64: CSC 261/461 –Database Systems Lecture 19 › courses › 261 › fall2017 › lectures › l19.pdf · MongoDB Spark Term Paper Poster Session. Topics for Today •Query Processing](https://reader030.vdocuments.mx/reader030/viewer/2022040122/5f04ab6e7e708231d40f1e89/html5/thumbnails/64.jpg)
Nested Loop Join (NLJ)
Compute R ⋈ S on A:
    for r in R:
        for s in S:
            if r[A] == s[A]:
                yield (r,s)
1. Loop over the tuples in R
2. For every tuple in R, loop over all the tuples in S
3. Check against join conditions
4. Write out (to page, then when page full, to disk)
Cost: $P(R) + T(R) \cdot P(S) + \text{OUT}$
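The read portion of the NLJ cost can be checked by counting page loads in a small simulation. This is a sketch under stated assumptions: each relation is modeled as a list of pages, each page is a list of dict tuples, and every page access counts as one IO. The function name `nlj_with_io` and the toy data are hypothetical, and the OUT term (writes) is not modeled.

```python
def nlj_with_io(R_pages, S_pages, key):
    """Tuple-at-a-time NLJ over paged relations; returns (results, page_reads)."""
    reads = 0
    out = []
    for r_page in R_pages:
        reads += 1                    # load one page of R: P(R) reads in total
        for r in r_page:
            for s_page in S_pages:
                reads += 1            # reload every page of S for this tuple r
                for s in s_page:
                    if r[key] == s[key]:
                        out.append((r, s))
    return out, reads


# Hypothetical data: P(R) = 2, T(R) = 3, P(S) = 3.
R_pages = [[{"A": 1}, {"A": 2}], [{"A": 3}]]
S_pages = [[{"A": 2}], [{"A": 3}], [{"A": 4}]]

results, reads = nlj_with_io(R_pages, S_pages, "A")
```

For this data the counter reaches P(R) + T(R) * P(S) = 2 + 3 * 3 = 11 page reads.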
![Page 65: CSC 261/461 –Database Systems Lecture 19 › courses › 261 › fall2017 › lectures › l19.pdf · MongoDB Spark Term Paper Poster Session. Topics for Today •Query Processing](https://reader030.vdocuments.mx/reader030/viewer/2022040122/5f04ab6e7e708231d40f1e89/html5/thumbnails/65.jpg)
Nested Loop Join (NLJ)
Compute R ⋈ S on A:
    for r in R:
        for s in S:
            if r[A] == s[A]:
                yield (r,s)
Cost: $P(R) + T(R) \cdot P(S) + \text{OUT}$
What if R (“outer”) and S (“inner”) switched?
Cost: $P(S) + T(S) \cdot P(R) + \text{OUT}$
Outer vs. inner selection makes a huge difference: the DBMS needs to know which relation is smaller!
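The effect of swapping outer and inner can be seen by plugging sizes into the two cost formulas. A rough sketch, ignoring OUT; the helper `nlj_cost` and all relation sizes are made up for illustration and are not from the lecture:

```python
def nlj_cost(pages_outer, tuples_outer, pages_inner):
    """IO cost of NLJ without OUT: P(outer) + T(outer) * P(inner)."""
    return pages_outer + tuples_outer * pages_inner


# Hypothetical sizes, chosen only for illustration.
P_R, T_R = 10, 100            # R: 10 pages, 100 tuples
P_S, T_S = 1000, 100_000      # S: 1000 pages, 100,000 tuples

cost_r_outer = nlj_cost(P_R, T_R, P_S)   # P(R) + T(R) * P(S)
cost_s_outer = nlj_cost(P_S, T_S, P_R)   # P(S) + T(S) * P(R)
```

With these assumed numbers, R as the outer relation costs about 100 thousand IOs, while S as the outer relation costs about 1 million. The dominant term is T(outer) * P(inner), so the size of the gap depends on how the tuple and page counts of the two relations compare.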
![Page 66: CSC 261/461 –Database Systems Lecture 19 › courses › 261 › fall2017 › lectures › l19.pdf · MongoDB Spark Term Paper Poster Session. Topics for Today •Query Processing](https://reader030.vdocuments.mx/reader030/viewer/2022040122/5f04ab6e7e708231d40f1e89/html5/thumbnails/66.jpg)
Block Nested Loop Join (BNLJ)
![Page 67: CSC 261/461 –Database Systems Lecture 19 › courses › 261 › fall2017 › lectures › l19.pdf · MongoDB Spark Term Paper Poster Session. Topics for Today •Query Processing](https://reader030.vdocuments.mx/reader030/viewer/2022040122/5f04ab6e7e708231d40f1e89/html5/thumbnails/67.jpg)
Block Nested Loop Join (BNLJ)
Compute R ⋈ S on A:
    for each page pr of R:
        for page ps of S:
            for each tuple r in pr:
                for each tuple s in ps:
                    if r[A] == s[A]:
                        yield (r,s)
Given 3 pages of memory
1. Load in 1 page of R at a time (leaving 1 page each free for S & output)
Cost: $P(R)$
Note: There could be some speedup here due to the fact that we’re reading in multiple pages sequentially; however, we’ll ignore this here!
![Page 68: CSC 261/461 –Database Systems Lecture 19 › courses › 261 › fall2017 › lectures › l19.pdf · MongoDB Spark Term Paper Poster Session. Topics for Today •Query Processing](https://reader030.vdocuments.mx/reader030/viewer/2022040122/5f04ab6e7e708231d40f1e89/html5/thumbnails/68.jpg)
Block Nested Loop Join (BNLJ)
Compute R ⋈ S on A:
    for each page pr of R:
        for page ps of S:
            for each tuple r in pr:
                for each tuple s in ps:
                    if r[A] == s[A]:
                        yield (r,s)
Given 3 pages of memory
1. Load in 1 page of R at a time (leaving 1 page each free for S & output)
2. For each page segment of R, load each page of S
Cost: $P(R) + P(R) \cdot P(S)$
Note: Faster to iterate over the smaller relation first!
![Page 69: CSC 261/461 –Database Systems Lecture 19 › courses › 261 › fall2017 › lectures › l19.pdf · MongoDB Spark Term Paper Poster Session. Topics for Today •Query Processing](https://reader030.vdocuments.mx/reader030/viewer/2022040122/5f04ab6e7e708231d40f1e89/html5/thumbnails/69.jpg)
Block Nested Loop Join (BNLJ)
Compute R ⋈ S on A:
    for each page pr of R:
        for page ps of S:
            for each tuple r in pr:
                for each tuple s in ps:
                    if r[A] == s[A]:
                        yield (r,s)
Given 3 pages of memory
1. Load in 1 page of R at a time (leaving 1 page each free for S & output)
2. For each page segment of R, load each page of S
3. Check against the join conditions
Cost: $P(R) + P(R) \cdot P(S)$
BNLJ can also handle non-equality constraints
![Page 70: CSC 261/461 –Database Systems Lecture 19 › courses › 261 › fall2017 › lectures › l19.pdf · MongoDB Spark Term Paper Poster Session. Topics for Today •Query Processing](https://reader030.vdocuments.mx/reader030/viewer/2022040122/5f04ab6e7e708231d40f1e89/html5/thumbnails/70.jpg)
Block Nested Loop Join (BNLJ)
Compute R ⋈ S on A:
    for each page pr of R:
        for page ps of S:
            for each tuple r in pr:
                for each tuple s in ps:
                    if r[A] == s[A]:
                        yield (r,s)
Given 3 pages of memory
1. Load 1 page of R at a time (leaving 1 page each free for S & output)
2. For each page segment of R, load each page of S
3. Check against the join conditions
4. Write out
Cost: $P(R) + P(R) \cdot P(S) + \text{OUT}$
![Page 71: CSC 261/461 –Database Systems Lecture 19 › courses › 261 › fall2017 › lectures › l19.pdf · MongoDB Spark Term Paper Poster Session. Topics for Today •Query Processing](https://reader030.vdocuments.mx/reader030/viewer/2022040122/5f04ab6e7e708231d40f1e89/html5/thumbnails/71.jpg)
Block Nested Loop Join (BNLJ) (B+1 pages of Memory)
Compute R ⋈ S on A:
    for each B-1 pages pr of R:
        for page ps of S:
            for each tuple r in pr:
                for each tuple s in ps:
                    if r[A] == s[A]:
                        yield (r,s)
Given B+1 pages of memory
1. Load in B-1 pages of R at a time (leaving 1 page each free for S & output)
Cost: $P(R)$
Note: There could be some speedup here due to the fact that we’re reading in multiple pages sequentially; however, we’ll ignore this here!
![Page 72: CSC 261/461 –Database Systems Lecture 19 › courses › 261 › fall2017 › lectures › l19.pdf · MongoDB Spark Term Paper Poster Session. Topics for Today •Query Processing](https://reader030.vdocuments.mx/reader030/viewer/2022040122/5f04ab6e7e708231d40f1e89/html5/thumbnails/72.jpg)
Block Nested Loop Join (BNLJ)
Compute R ⋈ S on A:
    for each B-1 pages pr of R:
        for page ps of S:
            for each tuple r in pr:
                for each tuple s in ps:
                    if r[A] == s[A]:
                        yield (r,s)
Given B+1 pages of memory
1. Load in B-1 pages of R at a time (leaving 1 page each free for S & output)
2. For each (B-1)-page segment of R, load each page of S
Cost: $P(R) + \frac{P(R)}{B-1} \cdot P(S)$
Note: Faster to iterate over the smaller relation first!
![Page 73: CSC 261/461 –Database Systems Lecture 19 › courses › 261 › fall2017 › lectures › l19.pdf · MongoDB Spark Term Paper Poster Session. Topics for Today •Query Processing](https://reader030.vdocuments.mx/reader030/viewer/2022040122/5f04ab6e7e708231d40f1e89/html5/thumbnails/73.jpg)
Block Nested Loop Join (BNLJ)
Compute R ⋈ S on A:
    for each B-1 pages pr of R:
        for page ps of S:
            for each tuple r in pr:
                for each tuple s in ps:
                    if r[A] == s[A]:
                        yield (r,s)
Given B+1 pages of memory
1. Load in B-1 pages of R at a time (leaving 1 page each free for S & output)
2. For each (B-1)-page segment of R, load each page of S
3. Check against the join conditions
Cost: $P(R) + \frac{P(R)}{B-1} \cdot P(S)$
BNLJ can also handle non-equality constraints
![Page 74: CSC 261/461 –Database Systems Lecture 19 › courses › 261 › fall2017 › lectures › l19.pdf · MongoDB Spark Term Paper Poster Session. Topics for Today •Query Processing](https://reader030.vdocuments.mx/reader030/viewer/2022040122/5f04ab6e7e708231d40f1e89/html5/thumbnails/74.jpg)
Block Nested Loop Join (BNLJ)
Compute R ⋈ S on A:
    for each B-1 pages pr of R:
        for page ps of S:
            for each tuple r in pr:
                for each tuple s in ps:
                    if r[A] == s[A]:
                        yield (r,s)
Given B+1 pages of memory
1. Load in B-1 pages of R at a time (leaving 1 page each free for S & output)
2. For each (B-1)-page segment of R, load each page of S
3. Check against the join conditions
4. Write out
Cost: $P(R) + \frac{P(R)}{B-1} \cdot P(S) + \text{OUT}$
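The four BNLJ steps can be sketched as a small simulation that counts page reads. The assumptions are the same as for any toy model: relations are lists of pages, pages are lists of dict tuples, one page access is one IO, and OUT is not modeled. The function name `bnlj_with_io` and the data are hypothetical.

```python
def bnlj_with_io(R_pages, S_pages, key, B):
    """Block NLJ with B+1 buffer pages: B-1 for R, 1 for S, 1 for output."""
    if B < 2:
        raise ValueError("need B >= 2 so that at least one page can hold R")
    chunk = B - 1
    reads = 0
    out = []
    for i in range(0, len(R_pages), chunk):
        block = R_pages[i:i + chunk]  # the next (B-1)-page segment of R
        reads += len(block)           # P(R) reads in total over all blocks
        for s_page in S_pages:
            reads += 1                # S is scanned once per block of R
            for r_page in block:
                for r in r_page:
                    for s in s_page:
                        if r[key] == s[key]:
                            out.append((r, s))
    return out, reads


# Hypothetical data: P(R) = 4 (two tuples per page), P(S) = 3.
R_pages = [[{"A": 2 * i}, {"A": 2 * i + 1}] for i in range(4)]
S_pages = [[{"A": 1}], [{"A": 6}], [{"A": 9}]]

res_b3, reads_b3 = bnlj_with_io(R_pages, S_pages, "A", B=3)  # 2-page blocks of R
res_b2, reads_b2 = bnlj_with_io(R_pages, S_pages, "A", B=2)  # 3 pages of memory
```

With B = 3 the count is 4 + (4 / 2) * 3 = 10 reads, and with B = 2 (the 3-page case) it is 4 + 4 * 3 = 16. When P(R) is not a multiple of B-1, the number of blocks rounds up, so the loop performs ceil(P(R) / (B-1)) scans of S.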
![Page 75: CSC 261/461 –Database Systems Lecture 19 › courses › 261 › fall2017 › lectures › l19.pdf · MongoDB Spark Term Paper Poster Session. Topics for Today •Query Processing](https://reader030.vdocuments.mx/reader030/viewer/2022040122/5f04ab6e7e708231d40f1e89/html5/thumbnails/75.jpg)
BNLJ vs. NLJ: Benefits of IO Aware
• In BNLJ, by loading larger chunks of R, we minimize the number of full disk reads of S
– We only read all of S from disk for every (B-1)-page segment of R!
– Still the full cross-product, but more done only in memory
NLJ: $P(R) + T(R) \cdot P(S) + \text{OUT}$
BNLJ: $P(R) + \frac{P(R)}{B-1} \cdot P(S) + \text{OUT}$
BNLJ is faster by roughly $\frac{(B-1)\,T(R)}{P(R)}$
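The rough speedup factor follows from dividing the dominant terms of the two costs. A short derivation, assuming the leading $P(R)$ and the OUT term are negligible next to the product terms:

```latex
\frac{\text{cost}_{\text{NLJ}}}{\text{cost}_{\text{BNLJ}}}
\approx \frac{T(R)\,P(S)}{\dfrac{P(R)}{B-1}\,P(S)}
= \frac{(B-1)\,T(R)}{P(R)}
```

Since $T(R)/P(R)$ is the number of tuples per page of R, the factor is roughly (B-1) times the tuples per page.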
![Page 76: CSC 261/461 –Database Systems Lecture 19 › courses › 261 › fall2017 › lectures › l19.pdf · MongoDB Spark Term Paper Poster Session. Topics for Today •Query Processing](https://reader030.vdocuments.mx/reader030/viewer/2022040122/5f04ab6e7e708231d40f1e89/html5/thumbnails/76.jpg)
BNLJ vs. NLJ: Benefits of IO Aware
• Example:
– R: 500 pages
– S: 1000 pages
– 100 tuples / page
– We have 12 pages of memory (B = 11)
• NLJ: Cost = 500 + 50,000*1000 = 50 Million IOs ~= 140 hours
• BNLJ: Cost = 500 + (500*1000)/10 = 50 Thousand IOs ~= 0.14 hours
A very real difference from a small change in the algorithm!
Ignoring OUT here…
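The arithmetic in this example can be reproduced in a few lines. The page counts, tuples per page, and B come from the slide; the time per IO does not, so the value of about 10 ms used below is an assumption inferred from "50 million IOs ≈ 140 hours". OUT is ignored, as on the slide.

```python
import math

PAGES_R, PAGES_S = 500, 1000
TUPLES_PER_PAGE = 100
B = 11                     # 12 pages of memory = B + 1
SECONDS_PER_IO = 0.01      # assumption: ~10 ms per page IO, not stated on the slide

tuples_r = PAGES_R * TUPLES_PER_PAGE                          # T(R) = 50,000

nlj_ios = PAGES_R + tuples_r * PAGES_S                        # P(R) + T(R) * P(S)
bnlj_ios = PAGES_R + math.ceil(PAGES_R / (B - 1)) * PAGES_S   # P(R) + P(R)/(B-1) * P(S)

nlj_hours = nlj_ios * SECONDS_PER_IO / 3600
bnlj_hours = bnlj_ios * SECONDS_PER_IO / 3600
```

Under that assumed IO time, `nlj_ios` is 50,000,500 (about 139 hours) and `bnlj_ios` is 50,500 (about 0.14 hours), which matches the rounded figures on the slide.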
![Page 77: CSC 261/461 –Database Systems Lecture 19 › courses › 261 › fall2017 › lectures › l19.pdf · MongoDB Spark Term Paper Poster Session. Topics for Today •Query Processing](https://reader030.vdocuments.mx/reader030/viewer/2022040122/5f04ab6e7e708231d40f1e89/html5/thumbnails/77.jpg)
Acknowledgement
• Some of the slides in this presentation are taken from the slides provided by the authors.
• Many of these slides are taken from the CS145 course offered by Stanford University.