fractal prefetching b + -trees: optimizing both cache and disk performance author: shimin chen,...
![Page 1: Fractal Prefetching B + -Trees: Optimizing Both Cache and Disk Performance Author: Shimin Chen, Phillip B. Gibbons, Todd C. Mowry, Gary Valentin Members:](https://reader030.vdocuments.mx/reader030/viewer/2022033100/56649ef95503460f94c0b32c/html5/thumbnails/1.jpg)
Fractal Prefetching B+-Trees:
Optimizing Both Cache and Disk Performance
Author: Shimin Chen, Phillip B. Gibbons, Todd C. Mowry, Gary Valentin
Members: Iris Zhang, Grace Yung, Kara Kwon, Jessica Wong
Outline
1. Introduction
2. Optimizing I/O Performance
   a. Searches
   b. Range Scans
3. Optimizing Cache Performance
   a. Disk-First fpB+-Trees
   b. Cache-First fpB+-Trees
4. Conclusion
Introduction
• Traditional B+-Trees
  – Optimized for I/O performance
  – Tree nodes = disk pages
• Recent types of B+-Trees
  – Optimized for CPU cache performance
  – Tree node size = one or a few cache lines
  – Introduce the concept of prefetching
Introduction (cont’d)
Figure 1: Traditional B+-Trees (labels: page control info; index entry = key and page/tuple ID)
Introduction (cont’d)
• Problem (due to the large discrepancy in optimal node sizes)
1. Disk-optimized B+-Trees suffer from poor cache performance
2. Cache-optimized B+-Trees suffer from poor disk performance
Introduction (cont’d)
• Proposal: Fractal Prefetching B+-Trees (fpB+-Trees)
1. Embed “cache-optimized” trees within “disk-optimized” trees
2. Optimize both cache and I/O performance
3. Two approaches:
   -> disk-first
   -> cache-first
Introduction (cont’d)
Figure 2: Self-similar “tree within a tree” structure
Introduction (cont’d)
• Disk-first and Cache-first
• What is done to optimize performance
• How to process operations efficiently
  – Bulkload
  – Search
  – Insertion
  – Deletion
Optimizing I/O Performance
• fpB+-Trees combine features of disk- and cache-optimized B+-Trees to achieve the best of both structures
• Consider two concepts from pB+-Trees
  – Searches: prefetching and node sizes
  – Range scans: prefetching via jump-pointer arrays
Optimizing I/O Performance (cont’d)
• Prefetching:
  – Modern database servers have multiple disks per processor
  – Goal: effectively exploit I/O parallelism
• Explicitly prefetch disk pages even when the access patterns are not sequential
Searches: Prefetching and Node Sizes (cont’d)
• For disk-resident data
  – Increase the B+-Tree node size to a multiple of the disk page size
  – Prefetch all pages of a node when accessing it
• Pages are placed on different disks so that requests can be serviced in parallel
• Result: faster searches, since the pages of a node are read in parallel instead of one after another
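A rough latency model of the idea above, under hypothetical assumptions: a node's pages are striped round-robin across `num_disks` disks, every page read costs a fixed `io_time`, and each disk serves one request at a time. `node_read_time` is an illustrative name, not part of any system; it contrasts reading a multi-page node page by page with prefetching all of its pages at once.

```python
import math

def node_read_time(pages_per_node: int, num_disks: int, io_time: float,
                   prefetch: bool) -> float:
    """Estimated time to read one B+-Tree node that spans several disk pages.

    Hypothetical model: the node's pages are striped round-robin over
    num_disks disks, each page read takes io_time, and a disk serves one
    request at a time. Seek contention between searches is ignored.
    """
    if not prefetch:
        # Page-at-a-time: each request waits for the previous one to finish.
        return pages_per_node * io_time
    # Prefetch: all page requests are issued at once and the disks work in
    # parallel, so the most loaded disk (ceil(pages / disks) reads) decides.
    return math.ceil(pages_per_node / num_disks) * io_time

# A 4-page node on a server with 4 disks and 10 ms per page read:
print(node_read_time(4, 4, 10.0, prefetch=False))  # 40.0
print(node_read_time(4, 4, 10.0, prefetch=True))   # 10.0
```

The model leaves out seek contention between concurrent searches, which is the cost weighed on the next slide.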
Searches: Prefetching and Node Sizes (cont’d)
• Problem
  – I/O latency improves for a single search, but a multi-page node requires extra seeks
  – These additional seeks may degrade overall performance (throughput)
• Conclusion: the target node size for fpB+-Trees is a single disk page
Range Scans: Prefetching via Jump-Pointer Arrays
• Range scan
  – Search for the starting key of the range, then read consecutive leaf nodes in the tree
• A jump-pointer array lets leaves be prefetched effectively
• One implementation: add sibling pointers to each node that is a parent of leaves
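A minimal sketch of a range scan that uses leaf-parent sibling pointers as a jump-pointer array, assuming each leaf parent simply stores the page IDs of its leaves. `LeafParent`, `range_scan_schedule`, and the prefetch `distance` are hypothetical; the function records the order of prefetch and read events instead of performing I/O.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LeafParent:
    leaf_pages: list[int]                  # page IDs of this node's child leaves
    next: Optional["LeafParent"] = None    # sibling pointer to the next leaf parent

def range_scan_schedule(start: LeafParent, start_idx: int,
                        distance: int) -> list[tuple[str, int]]:
    """Order of ('prefetch', page) and ('read', page) events for a range scan.

    Following the sibling pointers of the leaf parents yields the leaf page
    IDs in key order, i.e. the jump-pointer array. Prefetches stay
    `distance` leaves ahead of the leaf currently being read. This version
    simply scans to the last leaf of the tree.
    """
    pages: list[int] = []
    node, idx = start, start_idx
    while node is not None:                # walk the jump-pointer array
        pages.extend(node.leaf_pages[idx:])
        node, idx = node.next, 0
    events: list[tuple[str, int]] = [("prefetch", p) for p in pages[:distance]]
    for i, page in enumerate(pages):
        if i + distance < len(pages):      # keep the prefetch window full
            events.append(("prefetch", pages[i + distance]))
        events.append(("read", page))
    return events

p2 = LeafParent([40, 41, 42])
p1 = LeafParent([10, 11], next=p2)
print(range_scan_schedule(p1, 1, 2))
```

With `distance=2`, the first two leaf pages are prefetched before the first read, and each read then triggers one new prefetch further along the array.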
Range Scans: Prefetching via Jump-Pointer Arrays (cont’d)
Figure 3: Internal jump-pointer array (labels: tree, leaf parent)
Range Scans: Prefetching via Jump-Pointer Arrays (cont’d)
• This technique can be applied to fpB+-Trees
• Enhancement to avoid overshooting (prefetching pages past the end of the range):
  – fpB+-Trees begin by searching for both the start key and the end key, so that the page at the end of the range is known
  – This technique does not decrease throughput
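The overshooting problem can be made concrete with a small hypothetical model: `prefetched_pages` treats the jump-pointer array as a list of leaf page IDs, with `start_pos` and `end_pos` standing for the positions found by searching for the start and end keys. With the end position known, no prefetch goes past it; without it, up to `distance` extra pages are fetched for nothing.

```python
def prefetched_pages(jump_pointers: list[int], start_pos: int, end_pos: int,
                     distance: int, know_end: bool) -> list[int]:
    """Leaf pages prefetched by a scan over jump_pointers[start_pos..end_pos].

    Prefetches run `distance` positions ahead of the leaf being read. If the
    end position is known (found by searching for the end key up front),
    no prefetch is issued past it; otherwise the scan overshoots.
    """
    limit = end_pos if know_end else len(jump_pointers) - 1
    issued = [jump_pointers[p]
              for p in range(start_pos, min(start_pos + distance, limit + 1))]
    for cur in range(start_pos, end_pos + 1):   # leaves actually read
        ahead = cur + distance
        if ahead <= limit:
            issued.append(jump_pointers[ahead])
    return issued

leaves = list(range(100, 110))          # jump-pointer array of 10 leaf pages
print(prefetched_pages(leaves, 2, 5, 3, know_end=True))   # [102, 103, 104, 105]
print(prefetched_pages(leaves, 2, 5, 3, know_end=False))  # [102, 103, 104, 105, 106, 107, 108]
```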
Optimizing Cache Performance
• The search operation of B+-Trees suffers from poor cache performance
  – During a search, each page on the path to a key is visited
  – In each page, binary search is performed on a large contiguous array
  – Successive probes tend to land in different cache lines, so the search is costly in terms of cache misses
Optimizing Cache Performance (cont’d)
• Example:
  – Keys, page IDs, and tuple IDs are all 4 bytes, so an entry is 8 bytes
  – An 8KB page can hold over 1000 entries
  – A 64-byte cache line holds 8 entries
  – Suppose a page has 1023 entries (keys 1 to 1023)
  – Locating the matching entry 71 requires 10 probes with binary search:
    • 512, 256, 128, 64, 96, 80, 72, 68, 70, 71
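The probe sequence in the example can be reproduced with a plain binary search. This sketch additionally counts the distinct 64-byte cache lines the probes touch, assuming key k is stored at array index k - 1, 8 entries per line, and an array that starts on a cache-line boundary (page control info ignored). `binary_search_trace` is a hypothetical helper.

```python
def binary_search_trace(n: int, target: int, entries_per_line: int = 8):
    """Binary search over keys 1..n stored in a sorted array.

    Returns the probed keys and the number of distinct cache lines touched.
    Assumes key k sits at array index k - 1 and the array is line-aligned.
    """
    lo, hi = 1, n
    probes = []
    while lo <= hi:
        mid = (lo + hi) // 2
        probes.append(mid)
        if mid == target:
            break
        if target < mid:
            hi = mid - 1
        else:
            lo = mid + 1
    lines = {(k - 1) // entries_per_line for k in probes}
    return probes, len(lines)

probes, lines_touched = binary_search_trace(1023, 71)
print(probes)         # [512, 256, 128, 64, 96, 80, 72, 68, 70, 71]
print(lines_touched)  # 7
```

Under these assumptions, the last four probes (72, 68, 70, 71) fall in the same cache line, so the 10 probes touch 7 distinct lines; on a cold cache that is 7 misses within a single page.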
Optimizing Cache Performance (cont’d)
• The update operation of B+-Trees is costly
  – Insertion and deletion both begin with a search
  – To insert an entry into a sorted array, on average half of the page must be copied to make room for the new entry
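To make the copying cost concrete, this sketch (with a hypothetical `insert_sorted` helper) inserts into a sorted Python list and counts how many entries have to shift one slot to the right.

```python
import bisect

def insert_sorted(entries: list[int], key: int) -> int:
    """Insert key into a sorted array and return how many entries shifted."""
    pos = bisect.bisect_left(entries, key)
    moved = len(entries) - pos        # every entry at or after pos moves right
    entries.insert(pos, key)
    return moved

page = list(range(0, 2000, 2))        # 1000 sorted even keys
print(insert_sorted(page, 999))       # 500

# Averaged over all 1001 possible insert positions, half the array moves:
base = list(range(0, 2000, 2))
shifts = [insert_sorted(base.copy(), k) for k in range(-1, 2000, 2)]
print(sum(shifts) / len(shifts))      # 500.0
```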
Disk-First fpB+-Trees
• Start with disk-optimized B+-Trees
• Organize keys and pointers in each page-sized node into a cache-optimized tree
• The small cache-optimized tree in each node is called the in-page tree
  – Modeled after pB+-Trees, which were shown to have the best cache performance
Disk-First fpB+-Trees (cont’d)
Figure 4: Disk-optimized fpB+-Trees: a cache-optimized tree inside each page (label: page control info)
Disk-First fpB+-Trees (cont’d)
• In-page tree has nodes aligned on cache line boundaries
• Each node is several cache lines wide
  – When a node is visited as part of a search, all cache lines in the node are prefetched
• Wider nodes increase the fan-out and reduce the height of the in-page tree
• Result: better overall performance, since fewer levels are visited and the prefetched cache lines of a node are fetched in parallel
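One way to see the trade-off is a simple cost model in the spirit of pB+-Trees, with hypothetical parameters: a cold miss costs `t_miss` cycles, each additional line of a prefetched node adds only `t_next` cycles because the misses overlap, and every node holds 8 entries per cache line. The numbers are illustrative, not measurements.

```python
def in_page_search_cost(num_entries: int, lines_per_node: int,
                        entries_per_line: int = 8, t_miss: int = 150,
                        t_next: int = 10) -> tuple[int, int]:
    """Return (tree height, estimated cycles) for searching an in-page tree.

    Hypothetical cost model: visiting a node whose cache lines are all
    prefetched together costs one full miss (t_miss) plus t_next for each
    additional line, because the misses overlap.
    """
    fanout = lines_per_node * entries_per_line
    height, reach = 1, fanout
    while reach < num_entries:    # smallest height with fanout**height >= entries
        height += 1
        reach *= fanout
    per_node = t_miss + (lines_per_node - 1) * t_next
    return height, height * per_node

for w in (1, 2, 4, 8):
    print(w, in_page_search_cost(1000, w))
# 1 (4, 600)
# 2 (3, 480)
# 4 (2, 360)
# 8 (2, 440)
```

With these made-up latencies, 4-line nodes halve the tree height compared with 1-line nodes and give the lowest estimated cost; 8-line nodes do not reduce the height further, so the extra lines only add cost.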
Disk-First fpB+-Trees (cont’d)
• Non-leaf nodes
  – Contain pointers to other in-page nodes within the same page
  – To pack more entries into each node, use short in-page offsets instead of full pointers
• Leaf nodes
  – Contain pointers to nodes external to their in-page tree
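A sketch of why short in-page offsets pack more entries, using Python's `struct` module. The node layout (a 2-byte entry count, then 4-byte keys, then child references in a 64-byte node) is a hypothetical example, not the paper's exact format. Two-byte offsets suffice as long as pages are at most 64KB.

```python
import struct

NODE_BYTES = 64                # assumed node size: one 64-byte cache line
HEADER = struct.Struct("<H")   # hypothetical 2-byte entry count

def capacity(child_ref_bytes: int, key_bytes: int = 4) -> int:
    """Number of (key, child reference) entries that fit after the header."""
    return (NODE_BYTES - HEADER.size) // (key_bytes + child_ref_bytes)

def pack_nonleaf(keys: list[int], offsets: list[int]) -> bytes:
    """Serialize a non-leaf in-page node: 4-byte keys, 2-byte in-page offsets."""
    n = len(keys)
    if n != len(offsets) or n > capacity(2):
        raise ValueError("bad entry count")
    body = struct.pack(f"<{n}I{n}H", *keys, *offsets)
    return (HEADER.pack(n) + body).ljust(NODE_BYTES, b"\0")

def unpack_nonleaf(buf: bytes) -> tuple[list[int], list[int]]:
    """Inverse of pack_nonleaf: recover the keys and in-page offsets."""
    (n,) = HEADER.unpack_from(buf, 0)
    values = struct.unpack_from(f"<{n}I{n}H", buf, HEADER.size)
    return list(values[:n]), list(values[n:])

print(capacity(2), capacity(4))   # 10 7
```

A 64-byte node holds 10 entries with 2-byte offsets but only 7 with 4-byte pointers.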
Disk-First fpB+-Trees (cont’d)
• The optimal in-page node size is determined by memory system parameters and by key and pointer sizes
• The optimal page size is determined by I/O parameters and by disk and memory prices
• When the two sizes do not match, the in-page tree may overflow the page or underflow it (leave space unused)
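The mismatch can be illustrated by asking how tall a full in-page tree with uniform node size fits in an 8KB page. This is a simplified, hypothetical model: non-leaf in-page nodes hold 4-byte keys and 2-byte offsets, leaf in-page nodes hold 4-byte keys and 4-byte pointers, and page control info is ignored. `fit_full_in_page_tree` is an illustrative name.

```python
def fit_full_in_page_tree(page_bytes: int, node_bytes: int, key_bytes: int = 4,
                          offset_bytes: int = 2, ptr_bytes: int = 4):
    """Tallest full in-page tree with uniform node size that fits in a page.

    Returns (height, leaf entries, unused bytes), or None if not even a
    single node fits. Simplified model; page control info is ignored.
    """
    fanout = node_bytes // (key_bytes + offset_bytes)   # non-leaf fan-out
    leaf_cap = node_bytes // (key_bytes + ptr_bytes)    # entries per leaf node
    best = None
    height = 1
    while True:
        leaves = fanout ** (height - 1)
        non_leaves = sum(fanout ** i for i in range(height - 1))
        used = (leaves + non_leaves) * node_bytes
        if used > page_bytes:       # one more level no longer fits
            break
        best = (height, leaves * leaf_cap, page_bytes - used)
        height += 1
    return best

for w in (64, 128, 256):
    print(w, fit_full_in_page_tree(8192, w))
# 64 (3, 800, 1088)
# 128 (2, 336, 5376)
# 256 (1, 32, 7936)
```

In this model, 128-byte nodes underflow (5376 of 8192 bytes stay unused), while 256-byte nodes overflow: a second level would need 11008 bytes, so only a single node fits.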
Disk-First fpB+-Trees (cont’d)
Figure 5: Overflow and Underflow (labels: page control info, unused space)
Disk-First fpB+-Trees (cont’d)
Figure 6: Fitting cache-optimized trees in a page (label: page control info)
- use smaller nodes on overflow
- use larger nodes on underflow
Disk-First fpB+-Trees: Operations
• Bulkload: operations at two granularities
  – At the page granularity: follow the common B+-Tree bulkload algorithm
  – For in-page trees of non-leaf pages, pack entries into one in-page leaf node after another
  – For in-page trees of leaf pages, try to distribute entries across all in-page leaf nodes
• Maintain a linked list of all in-page leaf nodes
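The two packing policies can be sketched as follows, with in-page leaf nodes modelled as Python lists (the order of the returned lists stands in for the linked list of in-page leaf nodes). Both helper names are hypothetical. Leaving free slots in every node of a leaf page presumably makes room for later insertions.

```python
def pack_sequential(entries: list[int], node_cap: int) -> list[list[int]]:
    """Non-leaf pages: fill one in-page leaf node completely, then the next."""
    return [entries[i:i + node_cap] for i in range(0, len(entries), node_cap)]

def distribute_evenly(entries: list[int], num_nodes: int) -> list[list[int]]:
    """Leaf pages: spread entries over all in-page leaf nodes, so every
    node keeps some free slots."""
    base, extra = divmod(len(entries), num_nodes)
    nodes, start = [], 0
    for i in range(num_nodes):
        size = base + (1 if i < extra else 0)   # first `extra` nodes get one more
        nodes.append(entries[start:start + size])
        start += size
    return nodes

print(pack_sequential(list(range(1, 11)), 4))    # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
print(distribute_evenly(list(range(1, 11)), 4))  # [[1, 2, 3], [4, 5, 6], [7, 8], [9, 10]]
```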
Disk-First fpB+-Trees: Operations (cont’d)
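• Search
  – Straightforward search done at each granularity: in every page on the path, descend the page's in-page tree to find the next page (or, in a leaf page, the entry)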
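A sketch of search at both granularities, with pages and in-page nodes modelled as Python objects rather than byte ranges inside a buffer-pool page. The classes, the `descend` and `search` names, and the convention that a non-leaf key is the smallest key under its child are all assumptions made for illustration.

```python
import bisect
from dataclasses import dataclass

@dataclass
class InPageNode:
    keys: list[int]   # smallest key under each child (non-leaf) or entry keys (leaf)
    children: list    # InPageNode in non-leaf nodes; page ID or tuple ID in leaf nodes
    is_leaf: bool

@dataclass
class Page:
    root: InPageNode      # root of this page's in-page tree
    is_leaf_page: bool

def descend(node: InPageNode, key: int) -> tuple[InPageNode, int]:
    """Cache granularity: walk an in-page tree down to a leaf node and slot."""
    while True:
        i = max(bisect.bisect_right(node.keys, key) - 1, 0)
        if node.is_leaf:
            return node, i
        node = node.children[i]

def search(pages: dict[int, Page], root_page: int, key: int):
    """Page granularity: follow page IDs until a leaf page, then match key."""
    page = pages[root_page]
    while not page.is_leaf_page:
        leaf, i = descend(page.root, key)
        page = pages[leaf.children[i]]     # in-page leaf holds child page IDs
    leaf, i = descend(page.root, key)
    if leaf.keys and leaf.keys[i] == key:
        return leaf.children[i]            # tuple ID
    return None

# Two-level example: page 1 is the root page, pages 2 and 3 are leaf pages.
pages = {
    1: Page(InPageNode([0, 50], [2, 3], True), is_leaf_page=False),
    2: Page(InPageNode([0, 20], [InPageNode([5, 10], ["t5", "t10"], True),
                                 InPageNode([20, 30], ["t20", "t30"], True)],
                       False), is_leaf_page=True),
    3: Page(InPageNode([50, 60], ["t50", "t60"], True), is_leaf_page=True),
}
print(search(pages, 1, 30), search(pages, 1, 15))   # t30 None
```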
Disk-First fpB+-Trees: Operations (cont’d)
• Insertion: operations at two granularities
  – If there are empty slots in the in-page leaf node, insert the entry into the sorted array for the node
Disk-First fpB+-Trees: Operations (cont’d)
• Insertion: operations at two granularities (cont’d)
  – Otherwise, split the leaf node into two:
    a. Allocate new nodes in the same page, if there is space
    b. Otherwise, reorganize the in-page tree if the number of entries is fewer than the page's maximum fan-out
    c. Otherwise, split the page by copying half of the in-page leaf nodes to a new page, and rebuild the two in-page trees in their respective pages
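A sketch of the leaf-level insertion cases, assuming a leaf page is modelled as a list of sorted in-page leaf nodes and `max_leaves` is a hypothetical cap on how many leaf nodes fit in one page. Case b (reorganizing the in-page tree) and rebuilding the non-leaf parts of the in-page trees are left out. `insert_into_page` is an illustrative name.

```python
import bisect
from typing import Optional

def insert_into_page(leaves: list[list[int]], leaf_idx: int, key: int,
                     leaf_cap: int, max_leaves: int
                     ) -> tuple[str, Optional[list[list[int]]]]:
    """Insert key into in-page leaf node leaves[leaf_idx] of one leaf page.

    leaves holds the page's in-page leaf nodes in key order; max_leaves
    (assumed >= 2) caps how many leaf nodes fit in the page. Returns
    (case, leaves of the new page or None).
    """
    leaf = leaves[leaf_idx]
    if len(leaf) < leaf_cap:                 # empty slot: insert into sorted array
        bisect.insort(leaf, key)
        return "inserted", None
    if len(leaves) < max_leaves:             # a. split the node within this page
        bisect.insort(leaf, key)
        mid = len(leaf) // 2
        leaves.insert(leaf_idx + 1, leaf[mid:])
        del leaf[mid:]
        return "split node", None
    # c. Page full: move the second half of the leaf nodes to a new page,
    # then retry the insert in whichever page now holds the target node.
    half = len(leaves) // 2
    new_page = leaves[half:]
    del leaves[half:]
    if leaf_idx < half:
        insert_into_page(leaves, leaf_idx, key, leaf_cap, max_leaves)
    else:
        insert_into_page(new_page, leaf_idx - half, key, leaf_cap, max_leaves)
    return "split page", new_page
```

For example, with `leaf_cap=2` and `max_leaves=2`, inserting 5 into the page `[[1, 2], [3, 4]]` splits the page: `[[1, 2]]` stays and the new page becomes `[[3], [4, 5]]`.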
Disk-First fpB+-Trees: Operations (cont’d)
• Deletion
  – Search for the entry
  – Follow with a lazy deletion of the entry in a leaf node
  – Do not merge leaf nodes that become half empty
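Lazy deletion only removes the entry from the leaf's sorted arrays. A minimal sketch with a hypothetical `lazy_delete` helper, where a leaf node is a pair of parallel key and value lists:

```python
import bisect

def lazy_delete(leaf_keys: list[int], leaf_vals: list, key: int) -> bool:
    """Remove key from an in-page leaf node's sorted arrays, if present.

    The node is never merged with a sibling, even if it becomes half empty
    (or completely empty); the freed slots stay available for insertions.
    """
    i = bisect.bisect_left(leaf_keys, key)
    if i == len(leaf_keys) or leaf_keys[i] != key:
        return False                      # key not in this leaf
    del leaf_keys[i]
    del leaf_vals[i]
    return True
```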
Cache-First fpB+-Trees
• Start with cache-optimized B+-Trees
• Ignore page boundaries
• Then try to intelligently place cache-optimized nodes into disk pages
Cache-First fpB+-Trees (cont’d)
• Non-leaf node
  – Contains an array of keys and pointers
  – A pointer is a combination of a page ID and an offset in the page
    • Use the page ID to retrieve a disk page
    • Visit a node in the page by the offset
• Leaf node
  – Contains an array of keys and tuple IDs
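One possible encoding of a cache-first pointer packs the page ID and the in-page byte offset into a single integer. The 16-bit offset field and the helper names `make_ptr` and `split_ptr` are assumptions for illustration.

```python
OFFSET_BITS = 16                       # hypothetical: in-page offsets < 64KB
OFFSET_MASK = (1 << OFFSET_BITS) - 1

def make_ptr(page_id: int, offset: int) -> int:
    """Pack a page ID and an in-page byte offset into one integer pointer."""
    if not 0 <= offset <= OFFSET_MASK:
        raise ValueError("offset does not fit in OFFSET_BITS")
    return (page_id << OFFSET_BITS) | offset

def split_ptr(ptr: int) -> tuple[int, int]:
    """Recover (page ID, offset): the page ID selects the disk page to
    retrieve, the offset selects the node inside it."""
    return ptr >> OFFSET_BITS, ptr & OFFSET_MASK
```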
Cache-First fpB+-Trees: Node Placement
• Goal 1: group sibling leaf nodes together into the same page to reduce disk operations during range scans
• Approach: designate certain pages as leaf pages that contain only leaf nodes
– Leaf nodes in the same page are siblings
Cache-First fpB+-Trees: Node Placement (cont’d)
• Goal 2: group a parent node and its children together into the same page to ensure searches only need one disk operation for a parent and its child
• Problems:
  – Not possible for all nodes
  – Node size mismatch (overflow and underflow)
Cache-First fpB+-Trees: Node Placement (cont’d)
• For underflow (i.e., “not enough” children)
  – Place grandchildren, great-grandchildren, etc. onto the same page
• For overflow: two approaches
  a. Place the overflowed child into its own page as the top-level node, with its own children
  b. Store the overflowed child in special overflow pages
Cache-First fpB+-Trees: Node Placement (cont’d)
Figure 8: Cache-first fpB+-Tree design (labels: non-leaf nodes, aggressive placement, overflow pages for leaf node parents)
Cache-First fpB+-Trees: Operations
• Bulkload: leaf nodes
  – Placed consecutively in leaf pages, and linked together with sibling links
Cache-First fpB+-Trees: Operations (cont’d)
• Bulkload: non-leaf nodes
  – Determine whether there is space for the node to fit into the same page as its parent
  – If not, then
    • Allocate the node as the top-level node in a new page, or
    • If the non-leaf node is a parent of leaf nodes, place it into an overflow page
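A simplified sketch of the bulkload placement rules, assuming every node occupies one fixed-size slot in a page and that a node's children are placed before descending further. `Node`, `place`, and the page naming scheme (P for non-leaf pages, L for leaf pages, O for overflow pages) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    name: str
    children: list["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

    def is_leaf_parent(self) -> bool:
        return bool(self.children) and self.children[0].is_leaf()

def place(root: Node, slots_per_page: int) -> dict[str, tuple[str, int]]:
    """Map each node name to (page, slot) following the bulkload rules."""
    placement: dict[str, tuple[str, int]] = {}
    used: dict[str, int] = {}
    count = {"P": 0, "L": 0, "O": 0}
    filling: dict[str, Optional[str]] = {"L": None, "O": None}

    def new_page(kind: str) -> str:
        count[kind] += 1
        page = f"{kind}{count[kind]}"
        used[page] = 0
        return page

    def put(node: Node, page: str) -> None:
        placement[node.name] = (page, used[page])
        used[page] += 1

    def put_sequential(node: Node, kind: str) -> None:
        # Leaf pages and overflow pages are filled one after another.
        page = filling[kind]
        if page is None or used[page] == slots_per_page:
            page = filling[kind] = new_page(kind)
        put(node, page)

    def visit(node: Node) -> None:
        parent_page = placement[node.name][0]
        for child in node.children:
            if child.is_leaf():
                put_sequential(child, "L")       # consecutive leaf pages
            elif used[parent_page] < slots_per_page:
                put(child, parent_page)          # same page as its parent
            elif child.is_leaf_parent():
                put_sequential(child, "O")       # overflow page
            else:
                put(child, new_page("P"))        # top-level node of a new page
        for child in node.children:
            visit(child)

    put(root, new_page("P"))
    visit(root)
    return placement

tree = Node("R", [Node("A", [Node("a1"), Node("a2")]),
                  Node("B", [Node("b1"), Node("b2")])])
print(place(tree, slots_per_page=2))
```

Here B does not fit in the root's page and, being a parent of leaves, goes to an overflow page, while the four leaves fill two leaf pages in key order.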
Cache-First fpB+-Trees: Operations (cont’d)
• Search
  – Straightforward, with one thing to note
  – When proceeding from a parent to one of its children, compare the page IDs of the two nodes
  – The same page ID indicates that parent and child are on the same page
    • The child can then be accessed directly in the page without retrieving the page from the buffer manager
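A sketch of the page-ID check, assuming a toy `BufferManager` that counts fetches and pages modelled as dictionaries from byte offset to node. All names are hypothetical.

```python
import bisect
from dataclasses import dataclass

@dataclass
class CNode:
    keys: list[int]
    ptrs: list       # (page ID, offset) pairs in non-leaf nodes; tuple IDs in leaves
    is_leaf: bool

class BufferManager:
    """Toy stand-in for a buffer manager; counts page fetches."""
    def __init__(self, pages: dict[int, dict[int, CNode]]):
        self.pages = pages        # page ID -> {offset: node}
        self.fetches = 0

    def fetch(self, page_id: int) -> dict[int, CNode]:
        self.fetches += 1
        return self.pages[page_id]

def search(bm: BufferManager, root: tuple[int, int], key: int):
    page_id, offset = root
    page = bm.fetch(page_id)
    node = page[offset]
    while not node.is_leaf:
        i = max(bisect.bisect_right(node.keys, key) - 1, 0)
        child_page, child_offset = node.ptrs[i]
        if child_page != page_id:             # different page: ask buffer manager
            page_id, page = child_page, bm.fetch(child_page)
        node = page[child_offset]             # same page: direct access by offset
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.ptrs[i]
    return None

pages = {1: {0: CNode([0, 50], [(1, 64), (2, 0)], False),
             64: CNode([5, 10], ["t5", "t10"], True)},
         2: {0: CNode([50, 60], ["t50", "t60"], True)}}
bm = BufferManager(pages)
print(search(bm, (1, 0), 10), bm.fetches)   # t10 1
```

Searching for 10 stays inside page 1 and fetches one page; searching for 60 would cross into page 2 and fetch two.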
Cache-First fpB+-Trees: Operations (cont’d)
• Insertion:
  – If there are empty slots in the leaf node, simply insert the entry; otherwise, split the node into two
  – If the leaf page has space, it accommodates the new node; otherwise, the leaf page must be split
    • Move the second half of the leaf nodes to a new page
    • Update the corresponding child pointers in their parents
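A sketch of splitting a full leaf page, assuming pages are lists of leaf node names in slot order and `parent_ptr` records the (page ID, slot) each leaf's parent points to. These structures and the name `split_leaf_page` are hypothetical; the moved leaves' sibling links would need the same fix, which this sketch omits.

```python
def split_leaf_page(pages: dict[int, list[str]], page_id: int, new_page_id: int,
                    parent_ptr: dict[str, tuple[int, int]]) -> None:
    """Move the second half of a full leaf page's nodes to a new page and
    update the child pointers that parents hold for the moved nodes."""
    nodes = pages[page_id]
    half = len(nodes) // 2
    moved = nodes[half:]
    del nodes[half:]
    pages[new_page_id] = moved
    for slot, name in enumerate(moved):
        parent_ptr[name] = (new_page_id, slot)   # fix child pointer in parent
```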
Cache-First fpB+-Trees: Operations (cont’d)
• Insertion (cont’d):
  – After a leaf node split, an entry must be inserted into the parent node
  – If the parent node is full, it must be split
    • For a leaf parent node, the new node may be allocated from overflow pages
    • If further splits up the tree are needed, the new node is allocated as described in bulkload
Cache-First fpB+-Trees: Operations (cont’d)
• Deletion
  – Similar to disk-first fpB+-Trees
Conclusion
1. Problems of traditional B+-Trees
2. In optimizing I/O performance, we considered two concepts from pB+-Trees: searches and range scans
3. How disk-first and cache-first fpB+-Trees perform better than traditional B+-Trees
4. Operations (bulkload, search, insertion, deletion)