Data Structures, Haim Kaplan and Uri Zwick, November 2012, Lecture 5: B-Trees
B-Trees (with minimum degree d)
Each node holds between d−1 and 2d−1 keys
Each non-leaf node has between d and 2d children
The root is special: it has between 1 and 2d−1 keys
and between 2 and 2d children (if it is not a leaf)
All leaves are at the same depth
Node structure
r – the degree
key[0], …, key[r−2] – the keys
child[0], …, child[r−1] – the children
[Figure: a node with keys k_0, k_1, k_2, …, k_{r−3}, k_{r−2} separating the children c_0, c_1, c_2, …, c_{r−2}, c_{r−1}]
leaf – is the node a leaf?
Possibly a different representation for leaves
item[0],…item[r−2] – the associated items
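The node layout above can be sketched in Python as follows. This is only an illustrative sketch: the class name `BTreeNode`, the helper `is_valid_node`, and the choice to mark a leaf by an empty child list are assumptions, not part of the lecture.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class BTreeNode:
    """One B-tree node of degree r: r-1 keys, r-1 items and (unless a leaf) r children."""
    keys: List[Any] = field(default_factory=list)       # key[0..r-2], kept sorted
    items: List[Any] = field(default_factory=list)      # item[i] is associated with key[i]
    children: List["BTreeNode"] = field(default_factory=list)  # child[0..r-1]; empty for a leaf

    @property
    def leaf(self) -> bool:
        # Assumption of this sketch: a leaf is a node without children.
        return not self.children

    @property
    def degree(self) -> int:
        # r = number of keys + 1.
        return len(self.keys) + 1


def is_valid_node(x: BTreeNode, d: int, is_root: bool = False) -> bool:
    """Check the per-node rules for minimum degree d (hypothetical helper)."""
    min_keys = 1 if is_root else d - 1
    if not (min_keys <= len(x.keys) <= 2 * d - 1):
        return False
    if len(x.items) != len(x.keys) or list(x.keys) != sorted(x.keys):
        return False
    # A non-leaf node with r-1 keys must have exactly r children.
    return x.leaf or len(x.children) == len(x.keys) + 1
```

For example, with d = 3 a non-root node holding a single key violates the lower bound of d−1 = 2 keys, while the same node is acceptable as a root.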
The height of B-Trees
• At depth 1 we have at least 2 nodes
• At depth 2 we have at least 2d nodes
• At depth 3 we have at least 2d^2 nodes
• …
• At depth h we have at least 2d^(h−1) nodes
$$n \;\ge\; 1 + (d-1)\sum_{i=1}^{h} 2d^{\,i-1} \;=\; 2d^{h} - 1 \quad\Longrightarrow\quad h \;\le\; \log_d \frac{n+1}{2}$$
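The counting argument can be checked numerically. The sketch below sums the minimum number of keys level by level (one key in the root, then at least 2d^(i−1) nodes with d−1 keys each at depth i) and compares the result with the closed form 2d^h − 1. The function names `min_keys` and `max_height` are illustrative assumptions.

```python
def min_keys(d: int, h: int) -> int:
    """Fewest keys a B-tree of minimum degree d and height h can hold."""
    total = 1  # the root holds at least one key
    for depth in range(1, h + 1):
        nodes = 2 * d ** (depth - 1)  # at least 2d^(depth-1) nodes at this depth
        total += nodes * (d - 1)      # each of them holds at least d-1 keys
    return total


def max_height(d: int, n: int) -> int:
    """Largest h with 2d^h - 1 <= n, i.e. floor(log_d((n+1)/2)), using integers only."""
    h = 0
    while 2 * d ** (h + 1) - 1 <= n:
        h += 1
    return h
```

Here height is counted in edges, so a search visits at most `max_height(d, n) + 1` nodes.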
Search
Look for k in node x; if it is not there, look for k in the subtree of the appropriate child of x
Number of nodes accessed – log_d n
Number of operations – O(d log_d n)
Number of operations with binary search – O(log_2 d · log_d n) = O(log_2 n)
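A minimal sketch of the search, assuming a bare-bones `Node` class (an assumption of this example) with a sorted key list and an empty child list for leaves. It uses binary search inside each node, which gives O(log_2 d) comparisons per node.

```python
from bisect import bisect_left


class Node:
    """Hypothetical minimal node: sorted keys, and children (empty for a leaf)."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def search(x, k):
    """Return (node, index) for key k in the subtree of x, or None if k is absent."""
    while True:
        # Look for k in node x: first position whose key is >= k.
        i = bisect_left(x.keys, k)
        if i < len(x.keys) and x.keys[i] == k:
            return x, i
        if not x.children:
            return None  # reached a leaf without finding k
        # Look for k in the subtree of child i (one more node access).
        x = x.children[i]


# Small hand-built example tree with minimum degree d = 2.
leaves = [Node([1, 5]), Node([12, 15]), Node([25, 30])]
root = Node([10, 20], leaves)
```

With this tree, `search(root, 15)` descends into the middle leaf, while `search(root, 13)` reaches the same leaf and returns `None`.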
B-Trees vs binary search trees
• Wider and shallower
• Access fewer nodes during search
• But may take more operations
The hardware structure
[Figure: the memory hierarchy – CPU, cache, RAM, disk]
Each memory level is much larger but much slower than the one before it
Information moved in blocks
A simplified I/O model
[Figure: CPU, RAM, disk]
Each block is of size m.
Count both operations and I/O operations
Data structures in the I/O model
Linked lists and search trees behave poorly in the I/O model.
Each pointer followed may cause a disk access
Pick d such that a node fits in a block
B-trees reduce the worst-case number of I/Os
Each node (struct) is allocated contiguously.
Harder to control the disk blocks containing different nodes
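As a rough illustration of picking d so that a node fits in a block: a full node has 2d−1 keys and 2d child pointers, so d must satisfy (2d−1)·key + 2d·pointer + header ≤ block. The function name, the byte sizes, and the decision to ignore the items stored with the keys are all assumptions of this sketch.

```python
def max_min_degree(block_bytes: int, key_bytes: int, ptr_bytes: int,
                   header_bytes: int = 0) -> int:
    """Largest d such that a full node (2d-1 keys, 2d child pointers) fits in one block.

    Solving (2d-1)*key + 2d*ptr + header <= block for d gives
    d <= (block - header + key) / (2 * (key + ptr)).
    """
    return (block_bytes - header_bytes + key_bytes) // (2 * (key_bytes + ptr_bytes))


def node_bytes(d: int, key_bytes: int, ptr_bytes: int, header_bytes: int = 0) -> int:
    """Size of a full node with minimum degree d under the same assumptions."""
    return (2 * d - 1) * key_bytes + 2 * d * ptr_bytes + header_bytes
```

For a hypothetical 4096-byte block with 8-byte keys and 8-byte pointers this yields d = 128: a full node then occupies 4088 bytes, while d = 129 would need 4120 bytes.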
Number of nodes accessed (I/Os) – log_d n
Number of operations – O(d log_d n)
Number of operations with binary search – O(log_2 d · log_d n) = O(log_2 n)
Red-Black Trees vs. B-Trees
n = 2^30 ≈ 10^9
30 ≤ height of Red-Black Tree ≤ 60
Up to 60 pages read from disk
Height of B-Tree with d=1000 is only 3
Each B-Tree node resides in a block/page
Only 3 (or 4) pages read from disk
Disk access ≈ 1 millisecond (10^−3 sec)
Memory access ≈ 100 nanoseconds (10^−7 sec)
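The comparison can be worked through with the figures quoted above. The variable names are illustrative, and the count of 3 pages for the B-tree is taken directly from the slide rather than computed.

```python
n = 2 ** 30
log2_n = n.bit_length() - 1        # exactly 30, because n is a power of two

rb_min_height = log2_n             # 30
rb_max_height = 2 * log2_n         # 60: up to 60 pages read from disk
btree_pages = 3                    # pages read for a B-tree with d = 1000 (from the slide)

disk_access_ns = 1_000_000         # 1 millisecond
memory_access_ns = 100             # 100 nanoseconds

# Worst-case time spent on disk accesses for a single search, in milliseconds.
rb_worst_ms = rb_max_height * disk_access_ns // 1_000_000
btree_ms = btree_pages * disk_access_ns // 1_000_000

# How many memory accesses cost as much as one disk access.
disk_to_memory_ratio = disk_access_ns // memory_access_ns
```

Under these figures a red-black tree search may spend up to 60 ms waiting for the disk against 3 ms for the B-tree, and one disk access costs as much as 10,000 memory accesses, which is why the extra in-node operations of a B-tree are cheap by comparison.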
B-Trees – What are they good for?
• Large degree B-trees are used to represent very large disk dictionaries. The minimum degree d is chosen according to the size of a disk block.
• Smaller degree B-trees are used for internal-memory dictionaries to overcome cache-miss penalties.
• B-trees with d=2, i.e., 2-4 trees, are very similar to Red-Black trees.
Insert – Bottom up
Find the insertion point by a downward search
Insert the key in the appropriate place
If the current node is overflowing, split it
If its parent is now overflowing, split it, etc.
Disadvantages:
Need both a downward scan and an upward scan
Need to keep parents on a stack
Nodes are temporarily overflowing
Insert – Top down
While conducting the search, split full children on the search path
before descending to them!
When the appropriate leaf is reached, it is not full, so the new key may be added!
Number of I/Os – O(log_d n)
Number of operations – O(d log_d n)
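A minimal sketch of top-down insertion, assuming distinct keys and a bare-bones `Node` class (both assumptions of this example). A full child, holding 2d−1 keys, is split around its median key before the search descends into it, so the leaf that is finally reached always has room.

```python
from bisect import bisect_left


class Node:
    """Hypothetical minimal node: sorted keys, and children (empty for a leaf)."""
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []


def split_child(x, i, d):
    """Split the full child x.children[i] (2d-1 keys) around its median key."""
    y = x.children[i]
    z = Node(y.keys[d:], y.children[d:])   # upper d-1 keys (and upper d children)
    median = y.keys[d - 1]
    y.keys, y.children = y.keys[:d - 1], y.children[:d]
    x.keys.insert(i, median)               # the median moves up into the parent
    x.children.insert(i + 1, z)


def insert(root, k, d):
    """Insert k top-down and return the (possibly new) root."""
    if len(root.keys) == 2 * d - 1:        # a full root is split first; the tree grows by one level
        root = Node([], [root])
        split_child(root, 0, d)
    x = root
    while x.children:
        i = bisect_left(x.keys, k)
        if len(x.children[i].keys) == 2 * d - 1:
            split_child(x, i, d)
            if k > x.keys[i]:              # k belongs in the new right half
                i += 1
        x = x.children[i]
    x.keys.insert(bisect_left(x.keys, k), k)   # the leaf is guaranteed not to be full
    return root
```

Because every node on the path is non-full when it is entered, a split never propagates upward, and no stack of parents is needed.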
Delete (Replace with predecessor)
[Figure: example B-tree for delete(T,13)]
Delete – Top down
While conducting the search, make sure that each child descended into
contains at least d keys
How? Steal or join
Assume, at first, that the item to be deleted is in a leaf
When the item is located, it resides in a leaf containing at least d keys, so it can be removed
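Steal (rotate): if the child has d−1 keys and an adjacent sibling has at least d keys, move a key from the sibling through the parent into the child
Join: if the child and an adjacent sibling both have d−1 keys, merge them, together with the separating key from the parent, into one node with 2d−1 keys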
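The steal-or-join step can be sketched as a helper that is called on a child just before descending into it. The `Node` class and the name `ensure_d_keys` are assumptions of this example, and the sketch covers only this fix-up step, not the full deletion.

```python
class Node:
    """Hypothetical minimal node: sorted keys, and children (empty for a leaf)."""
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []


def ensure_d_keys(x, i, d):
    """Make x.children[i] hold at least d keys; return the index of the child to descend into."""
    c = x.children[i]
    if len(c.keys) >= d:
        return i
    left = x.children[i - 1] if i > 0 else None
    right = x.children[i + 1] if i + 1 < len(x.children) else None
    if left is not None and len(left.keys) >= d:
        # Steal from the left sibling: the separator moves down, left's last key moves up.
        c.keys.insert(0, x.keys[i - 1])
        x.keys[i - 1] = left.keys.pop()
        if left.children:
            c.children.insert(0, left.children.pop())
        return i
    if right is not None and len(right.keys) >= d:
        # Steal from the right sibling: the separator moves down, right's first key moves up.
        c.keys.append(x.keys[i])
        x.keys[i] = right.keys.pop(0)
        if right.children:
            c.children.append(right.children.pop(0))
        return i
    # Join: both nodes have d-1 keys; merge them around the separator into 2d-1 keys.
    if left is not None:
        i -= 1
        c, right = left, c        # merge the child into its left sibling
    c.keys += [x.keys.pop(i)] + right.keys
    c.children += right.children
    x.children.pop(i + 1)
    return i
```

If x is a root holding a single key, a join leaves it with no keys and one child. A caller would then make that child the new root, which is how the tree shrinks in height.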
Delete – Top down
What if the item to be deleted is in an internal node?
Descend as before from the root until the item to be deleted is located
Keep a pointer to the node containing the item
Carry on descending towards the successor, making sure that nodes contain at least d keys
When the successor is found, delete it from its leaf and use it to replace the item to be deleted
Deletions from B-Trees
As always, similar, but slightly more complicated than insertions
(may need to replace with successor)
Deletion is slightly simpler for B+-Trees
B-Trees vs. B+-Trees
• In a B-tree each node contains items and keys
• In a B+-tree leaves contain items and keys. Internal nodes contain keys to direct the search.
• Keys in internal nodes are either keys of existing items, or keys of items that were deleted.
• Internal nodes may contain more keys, so overall the number of items we can store increases