b+ tree & b tree extracted from garcia molina adapted by leu to follow elmasri’s definition

Post on 11-Jan-2016


B+ tree & B tree

Extracted from Garcia Molina

adapted by Leu to follow Elmasri’s Definition

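B+Tree Example n=4

[Figure: B+tree example with n=4, root at the top. Non-leaf keys: 35, 110, 130, 179, 11. Leaf keys in sequence: 3, 5, 11, 30, 35, 100, 101, 110, 120, 130, 150, 156, 179, 180, 200.]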

Sample non-leaf

Keys in node: 57 | 81 | 95

Pointers, left to right:
to keys k ≤ 57
to keys 57 < k ≤ 81
to keys 81 < k ≤ 95
to keys k > 95
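The pointer ranges of the sample non-leaf node can be sketched in a few lines of Python. This is only an illustration, assuming the convention that pointer i covers keys greater than the previous key and less than or equal to key i; the function name `child_index` is made up for this sketch.

```python
from bisect import bisect_left

def child_index(keys, k):
    """Return the index of the tree pointer to follow for search key k.

    Assumed convention: pointer i covers keys[i-1] < k <= keys[i],
    and the last pointer covers k > keys[-1].
    """
    # bisect_left returns the first position whose key is >= k,
    # which is exactly the pointer index under this convention.
    return bisect_left(keys, k)

keys = [57, 81, 95]
for k in (10, 57, 58, 81, 95, 96):
    print(k, "-> pointer", child_index(keys, k))
```

With the sample keys, 57 follows pointer 0, 58 and 81 follow pointer 1, 95 follows pointer 2, and 96 follows pointer 3.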

Sample leaf node:

Keys in node: 57 | 81 | 95

From non-leaf node (pointer into this leaf)
To record with key 57
To record with key 81
To record with key 95
To next leaf in sequence

In textbook’s notation n=3

[Figure: sample leaf and non-leaf nodes holding the keys 30 and 35.]

Size of nodes: p pointers, p - 1 keys (fixed)

Please note that here way (or order) refers to the maximum number of subtrees.

Some definitions define way as the maximum number of keys.

Don’t want nodes to be too empty

• Use at least

Non-leaf: ⌈p/2⌉ - 1 keys (so ⌈p/2⌉ tree pointers)

Leaf: ⌊p/2⌋ keys & data pointers

Full node vs. min. node, p=4

Non-leaf, full: 120 150 180
Non-leaf, min.: 30
Leaf, full: 3 5 11
Leaf, min.: 30 35

(A pointer counts even if null.)

B+tree rules: tree of order n (traditional definition)

(1) All leaves at same lowest level (balanced tree)

(2) Pointers in leaves point to records, except for the “sequence pointer”

(3) Number of pointers/keys for B+tree

                      Max ptrs   Max keys   Min ptrs    Min keys
Non-leaf (non-root)   n          n-1        ⌈n/2⌉       ⌈n/2⌉ - 1
Leaf (non-root)       n-1        n-1        ⌊n/2⌋       ⌊n/2⌋
Root                  n          n-1        1           1

(3)′ Number of pointers/keys for B+tree (Elmasri’s new definition)

p: order of the internal node; pleaf: order of the leaf node

                      Max ptrs   Max keys   Min ptrs→data   Min keys
Non-leaf (non-root)   p          p-1        ⌈p/2⌉           ⌈p/2⌉ - 1
Leaf (non-root)       pleaf      pleaf      ⌈pleaf/2⌉       ⌈pleaf/2⌉
Root                  p          p-1        1               1
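As a quick check of the occupancy rules, the bounds for given p and pleaf can be computed directly. This is a minimal sketch, assuming the minimums are ceilings as in Elmasri's definition; the function name `bplus_bounds` and the dictionary layout are invented for illustration.

```python
def ceil_half(x):
    """Integer ceiling of x / 2."""
    return (x + 1) // 2

def bplus_bounds(p, pleaf):
    """Occupancy bounds for non-root nodes of a B+tree.

    p     : order of an internal node (max tree pointers)
    pleaf : order of a leaf node (max key/data-pointer pairs)
    """
    return {
        "non-leaf": {
            "max_ptrs": p,
            "max_keys": p - 1,
            "min_ptrs": ceil_half(p),
            "min_keys": ceil_half(p) - 1,
        },
        "leaf": {
            "max_keys": pleaf,
            "min_keys": ceil_half(pleaf),
        },
    }

for kind, bounds in bplus_bounds(4, 3).items():
    print(kind, bounds)
```

For p=4 and pleaf=3, an internal node holds between 2 and 4 tree pointers and a leaf holds 2 or 3 keys.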

Tree structure

Insert into B+tree

(a) simple case: space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root
(e) Consider only maximum number of keys

When a node is too full

• Node too full (for m-way):

K_1, K_2, …, K_{⌈m/2⌉-1}, K_{⌈m/2⌉}, K_{⌈m/2⌉+1}, …, K_m

• Split into two nodes:

K_1, K_2, …, K_{⌈m/2⌉-1}, K_{⌈m/2⌉} (replaces the original node)

K_{⌈m/2⌉+1}, …, K_m (right child of the new key)

• K_{⌈m/2⌉} is replicated into the parent node
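The leaf split described on this slide can be sketched as follows. This is an illustrative sketch only: it assumes the overfull leaf holds m sorted keys, keeps the first ⌈m/2⌉ of them in the original node, and replicates the last key kept on the left into the parent; `split_leaf` is a hypothetical name.

```python
import math

def split_leaf(keys):
    """Split an overfull leaf holding m sorted keys.

    Returns (left, right, separator): left replaces the original node,
    right becomes the right child of the separator, and the separator
    is a copy of the last key in left (it stays in the leaf as well).
    """
    m = len(keys)
    j = math.ceil(m / 2)              # splitting point
    left, right = keys[:j], keys[j:]
    return left, right, left[-1]

print(split_leaf([3, 5, 7, 11]))      # ([3, 5], [7, 11], 5)
```

The example is the leaf 3 5 11 after 7 has been added: the keys 3 and 5 stay, 7 and 11 move to the new leaf, and 5 is copied upward.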

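(a) Insert key = 7, p=4

[Figure: the leaf 3 5 11 overflows when 7 is inserted and is split. Keys shown before the insert: 3 5 11, 30, 31, 11, 31. Keys shown after the insert: 3 5, 7, 5.]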

B+tree with pleaf

• Splitting point is important

• For a leaf node, the splitting point is j = ⌈(pleaf + 1)/2⌉

• For a non-leaf node, the splitting point is ⌈p/2⌉

• Refer to pages 178-180 of Elmasri’s book
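To make the two splitting points concrete, here is a small sketch. It assumes both splitting points are ceilings and that, for a non-leaf node, the key at the splitting point moves up to the parent rather than being copied. The function names are hypothetical and the sketch is not taken from the textbook.

```python
import math

def leaf_split_point(pleaf):
    """Number of entries kept in the original leaf when it overflows."""
    return math.ceil((pleaf + 1) / 2)

def split_internal(keys, children, p):
    """Split an overfull internal node holding p keys and p + 1 children."""
    j = math.ceil(p / 2)                    # splitting point for a non-leaf node
    up = keys[j - 1]                        # key j moves up to the parent
    left = (keys[:j - 1], children[:j])     # j - 1 keys, j children
    right = (keys[j:], children[j:])        # the remaining keys and children
    return left, up, right

# p = 3, pleaf = 2, as in the example that follows
print(leaf_split_point(2))                  # 2
print(split_internal([10, 20, 30], ["a", "b", "c", "d"], p=3))
```

With p=3, the overfull node 10 20 30 keeps 10 on the left, sends 20 to the parent, and leaves 30 on the right.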

Example, p =3 and Pleaf = 2

Deletion from B+tree

(a) Simple case - no example
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf

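(b) Coalesce with sibling: delete 50 (n=4)

[Figure: B+tree before and after deleting 50; the leaf that held 50 is merged with its sibling. Keys shown: 10, 30, 40, 50.]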

When to coalesce

• When the sibling has just enough keys: if the sibling has ⌈pleaf/2⌉ keys, then the combined node has

⌈pleaf/2⌉ + ⌈pleaf/2⌉ - 1 = 2·⌈pleaf/2⌉ - 1 ≤ (pleaf + 1) - 1 = pleaf

keys, which is not too big!
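The bound can also be checked by brute force. The sketch below assumes the minimum leaf occupancy is ⌈pleaf/2⌉ and that the underfull leaf holds one key fewer than that; `merged_leaf_size` is a name chosen for this example.

```python
import math

def merged_leaf_size(pleaf):
    """Keys in the node obtained by merging a minimum-size sibling
    with a leaf that has just dropped one key below the minimum."""
    minimum = math.ceil(pleaf / 2)
    return minimum + (minimum - 1)

# The merged leaf never exceeds the leaf capacity pleaf.
for pleaf in range(2, 101):
    assert merged_leaf_size(pleaf) <= pleaf

print(merged_leaf_size(3), merged_leaf_size(4))   # 3 3
```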

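(c) Redistribute keys: delete 50 (n=4)

[Figure: B+tree before and after deleting 50; a key is taken from the sibling leaf and the parent key is updated. Keys shown: 10, 20, 30, 35, 40, 50.]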

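(d) Non-leaf coalesce: delete 37 (n=4)

[Figure: B+tree before and after deleting 37. Leaf keys: 1 3 | 10 14 | 20 22 | 25 26 | 30 37 | 40 45. The coalescing propagates to the non-leaf level and produces a new root.]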

Another example

B+tree deletions in practice

– Often, coalescing is not implemented
– Too hard and not worth it!

example

A PARTS file with Part# as key field includes records with the following Part# values: 23, 65, 37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20, 24, 28, 39, 43, 47, 50, 69, 75, 8, 49, 33, 38. Suppose the search field values are inserted in the given order in a B+-tree of order p=4 and pleaf=3; show how the tree will expand and what the final tree looks like.

solution

Answer: A B+-tree of order p=4 implies that each internal node in the tree (except possibly the root) should have at least 2 keys (3 pointers) and at most 4 pointers. For pleaf=3, leaf nodes must have at least 2 keys and at most 3 keys. The figure on page 50 shows how the tree progresses as the keys are inserted. We will only show a new tree when insertion causes a split of one of the leaf nodes, and then show how the split propagates up the tree. Hence, step 1 below shows the tree after insertion of the first 3 keys 23, 65, and 37, and before inserting 60, which causes overflow and splitting. The trees given below show how the keys are inserted in order. Below, we give the keys inserted for each tree:

1: 23, 65, 37; 2: 60; 3: 46; 4: 92; 5: 48, 71; 6: 56; 7: 59, 18; 8: 21; 9: 10; 10: 74; 11: 78; 12: 15; 13: 16; 14: 20; 15: 24; 16: 28, 39; 17: 43, 47; 18: 50, 69; 19: 75; 20: 8, 49, 33, 38.

result

Deletion

Suppose the following search field values are deleted in the given order from the B+-tree of Exercise 5.11; show how the tree will shrink and show the final tree.

The deleted values are: 65, 75, 43, 18, 20, 92, 59, 37.

Solution

An important note about a delete algorithm for a B+-tree is that deleting a key value from a leaf node will result in a reorganization of the tree if:

(i) The leaf node is less than half full; in this case, we will combine it with the next leaf node (other algorithms combine it with either the next or the previous leaf node, or both).

(ii) The key value deleted is the rightmost (last) value in the leaf node, in which case its value will appear in an internal node; in this case, the key value to the left of the deleted key in the left node replaces the deleted key value in the internal node.

Delete 65, 75

Delete 43

Delete 18

Delete 20, 92, 59

Delete 37

Variation on B+tree: B-tree (no +)

• Idea:
– Avoid duplicate keys
– Have record pointers in non-leaf nodes

Node layout: K1 P1 | K2 P2 | K3 P3

Record pointers: to record with K1 | to record with K2 | to record with K3

Tree pointers: to keys < K1 | K1 < x < K2 | K2 < x < K3 | > K3

B-tree example p=3, max. subtrees

Root: 65, 125
Internal: 25, 45 | 85, 105 | 145, 165
Leaves: 10, 20 | 30, 40 | 50, 60 | 70, 80 | 90, 100 | 110, 120 | 130, 140 | 150, 160 | 170, 180

• sequence pointers not useful now! (but keep space for simplicity)

Note on inserts

• Say we insert record with key = 25

[Figure: the leaf 10 20 30 (p=4) before the insert.]

• Afterwards:

[Figure: the nodes after the insert, holding 10, 20, 25 and 30.]

So, for B-trees:

• Each node has at most p tree pointers (so at most p - 1 keys)
• Each node, except the root, has at least ⌈p/2⌉ tree pointers (so at least ⌈p/2⌉ - 1 keys)
• The root node has at least two tree pointers, unless it is the only node in the tree
• All leaf nodes are at the same level. Leaf nodes have the same structure as internal nodes except that all of their tree pointers Pi are null

Insertion Criterion

• Insert at the failure node, found by searching the tree (the leaf where the search for the key fails)

• Insert at the right place; if the node becomes too full, that is, has p keys in it, then split

• To split, take the key at position ⌈p/2⌉ as the splitting point: take K_{⌈p/2⌉} out and insert it into the parent

• Splitting may propagate to the root
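The insertion criterion can be turned into a short runnable sketch. This is an illustration under stated assumptions rather than a reference implementation: keys are distinct, p is at least 3, the splitting key is the one at position ⌈p/2⌉ (counting from 1), and the `Node` class and function names are invented for the example.

```python
import math
from bisect import bisect_left

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []      # empty list means leaf

def _insert(node, key, p):
    """Insert key below node; return (middle_key, new_right_node) on split."""
    i = bisect_left(node.keys, key)
    if not node.children:                   # failure node reached: insert here
        node.keys.insert(i, key)
    else:
        split = _insert(node.children[i], key, p)
        if split is not None:               # child split: absorb its middle key
            mid, right = split
            node.keys.insert(i, mid)
            node.children.insert(i + 1, right)
    if len(node.keys) < p:                  # still fits, nothing to propagate
        return None
    j = math.ceil(p / 2)                    # 1-based position of the splitting key
    mid = node.keys[j - 1]
    right = Node(node.keys[j:], node.children[j:])
    node.keys = node.keys[:j - 1]
    node.children = node.children[:j]
    return mid, right

def insert(root, key, p):
    """Insert key into a B-tree of order p and return the (possibly new) root."""
    split = _insert(root, key, p)
    if split is None:
        return root
    mid, right = split
    return Node([mid], [root, right])       # the split reached the root

root = Node()
for k in [8, 5, 1, 7, 3, 12, 9, 6]:
    root = insert(root, k, p=3)
print(root.keys, [c.keys for c in root.children])
# [5, 8] [[1, 3], [6, 7], [9, 12]]
```

The printed tree is what this particular sketch produces for the order-3 sequence; a different choice of splitting key would give a different but equally valid B-tree.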

example

• Build a B-tree of order p =3. The values are inserted in the order 8, 5, 1, 7, 3, 12, 9, 6

result

More example

• Try p = 5 with the following key sequence: 23, 65, 37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20, 24, 28, 39

• Note: large p implies easy solution

Solution (may be wrong!)

Root: 28
Internal: 16, 21 | 46, 65
Leaves: 10, 15 | 18, 20 | 23, 24 | 37, 39 | 48, 56, 59, 60 | 71, 74, 78, 92

Deletion in B-tree

• Deletion may make a node less than half full, that is, number of keys = ⌈p/2⌉ - 2

• Must redistribute keys (or borrow keys)

• If keys cannot be redistributed, perform coalescing

• Coalescing two nodes is ok! The number of keys in the merged node is equal to

(⌈p/2⌉ - 2) + (⌈p/2⌉ - 1) + 1 = 2·⌈p/2⌉ - 2 ≤ p - 1

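Key redistribution

[Figure: sibling nodes f and g redistribute keys through the parent; keys labelled x and y.]

Coalesce

[Figure: parent keys …, K_{i-1}, K_i, K_{i+1}, …; child f with ⌈p/2⌉ - 2 keys and its sibling g with ⌈p/2⌉ - 1 keys are merged.]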

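Delete sequence 15, 56, 10, 74; show the resulting tree

Root: 28
Internal: 16, 21 | 46, 65
Leaves: 10, 15 | 18, 20 | 23, 24 | 37, 39 | 48, 56, 59, 60 | 71, 74, 78, 92

Root: 28
Internal: 21 | 46, 65
Leaves: 10, 16, 18, 20 | 23, 24 | 37, 39 | 48, 56, 59, 60 | 71, 74, 78, 92

Then delete 56, 10, 74

Root: 21, 28, 46, 65
Leaves: 10, 16, 18, 20 | 23, 24 | 37, 39 | 48, 56, 59, 60 | 71, 74, 78, 92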

When the key is in an internal node

• Key transformation: replace the key with a proper key in the leaf nodes, then delete the key in the leaf node

Delete key 46

Root: 28
Internal: 16, 21 | 46, 65
Leaves: 10, 15 | 18, 20 | 23, 24 | 37, 39 | 48, 56, 59, 60 | 71, 74, 78, 92

Replacement key: 48

Try more! (try delete key 16)

Root: 28
Internal: 16, 21 | 48, 65
Leaves: 10, 15 | 18, 20 | 23, 24 | 37, 39 | 56, 59, 60 | 71, 74, 78, 92

Tradeoffs:

• B-trees have faster lookup than B+trees
• In B-tree, non-leaf & leaf nodes have different sizes
• B+trees preferred!

But note:

• If blocks are fixed size (due to disk and buffering restrictions), then lookup for B+tree is actually better!!

Example:

- Pointers 4 bytes

- Keys 4 bytes

- Blocks 100 bytes (just an example)

- Look at full 2 level tree

B-tree:

Root has 8 keys + 8 record pointers + 9 son pointers
= 8x4 + 8x4 + 9x4 = 100 bytes

Each of 9 sons: 12 rec. pointers (+12 keys)
= 12x(4+4) + 4 = 100 bytes

2-level B-tree, Max # records = 12x9 + 8 = 116

B+tree:

Root has 12 keys + 13 son pointers
= 12x4 + 13x4 = 100 bytes

Each of 13 sons: 12 rec. ptrs (+12 keys)
= 12x(4+4) + 4 = 100 bytes

2-level B+tree, Max # records = 13x12 = 156

So...

B+: 13 leaves, 156 records
B: 8 records in the root + 9 leaves with 108 records, Total = 116

• Conclusion:
– For fixed block size,
– B+ tree is better because it is bushier
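The arithmetic of the 100-byte example can be reproduced with a few lines of Python. This is a sketch under the slide's assumptions (4-byte pointers, 4-byte keys, one sequence pointer kept in every leaf); the function names are invented for illustration.

```python
POINTER = 4   # bytes per pointer (son, record, or sequence pointer)
KEY = 4       # bytes per key
BLOCK = 100   # bytes per block

def btree_internal_keys(block=BLOCK):
    # k keys + k record pointers + (k + 1) son pointers must fit in a block
    return (block - POINTER) // (KEY + 2 * POINTER)

def bplus_internal_keys(block=BLOCK):
    # k keys + (k + 1) son pointers must fit in a block
    return (block - POINTER) // (KEY + POINTER)

def leaf_keys(block=BLOCK):
    # k keys + k record pointers + 1 sequence pointer (kept in both trees)
    return (block - POINTER) // (KEY + POINTER)

b_root = btree_internal_keys()                       # 8 keys, 9 sons
b_records = (b_root + 1) * leaf_keys() + b_root      # 9*12 + 8
bp_root = bplus_internal_keys()                      # 12 keys, 13 sons
bp_records = (bp_root + 1) * leaf_keys()             # 13*12

print("B-tree :", b_records)    # 116
print("B+tree :", bp_records)   # 156
```

Changing `BLOCK` shows how the gap between the two trees grows with the block size.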

A more realistic example

EXAMPLE 6: To calculate the order p of a B+-tree, suppose that the search key field is V = 9 bytes long, the block size is B = 512 bytes, a record pointer is Pr = 7 bytes, and a block pointer is P = 6 bytes, as in Example 4. An internal node of the B+-tree can have up to p tree pointers and p - 1 search field values; these must fit into a single block. Hence, we have:

(p * P) + ((p - 1) * V) ≤ B
(p * 6) + ((p - 1) * 9) ≤ 512
(15 * p) ≤ 521

We can choose p to be the largest value satisfying the above inequality, which gives p = 34. This is larger than the value of 23 for the B-tree, resulting in a larger fan-out and more entries in each internal node of a B+-tree than in the corresponding B-tree. The leaf nodes of the B+-tree will have the same number of values and pointers, except that the pointers are data pointers and a next pointer. Hence, the order pleaf for the leaf nodes can be calculated as follows:

(pleaf * (Pr + V)) + P ≤ B
(pleaf * (7 + 9)) + 6 ≤ 512
(16 * pleaf) ≤ 506

It follows that each leaf node can hold up to pleaf = 31 key value/data pointer combinations, assuming that the data pointers are record pointers.
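The two inequalities of Example 6 can be solved mechanically. The sketch below simply rearranges them for p and pleaf and uses integer division; the function name `bplus_orders` is an assumption of this example.

```python
def bplus_orders(B, V, P, Pr):
    """Largest p and pleaf that fit in a block of B bytes.

    B  : block size, V : search key size,
    P  : block (tree) pointer size, Pr : record pointer size
    """
    # p*P + (p - 1)*V <= B   =>   p <= (B + V) / (P + V)
    p = (B + V) // (P + V)
    # pleaf*(Pr + V) + P <= B   =>   pleaf <= (B - P) / (Pr + V)
    pleaf = (B - P) // (Pr + V)
    return p, pleaf

print(bplus_orders(B=512, V=9, P=6, Pr=7))   # (34, 31)
```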

EXAMPLE 7: Suppose that we construct a B+-tree on the field of Example 6. To calculate the approximate number of entries of the B+-tree, we assume that each node is 69 percent full. On the average, each internal node will have 34 * 0.69 or approximately 23 pointers, and hence 22 values. Each leaf node, on the average, will hold 0.69 * pleaf = 0.69 * 31 or approximately 21 data record pointers. A B+-tree will have the following average number of entries at each level:

Root:        1 node         22 entries       23 pointers
Level 1:     23 nodes       506 entries      529 pointers
Level 2:     529 nodes      11,638 entries   12,167 pointers
Leaf level:  12,167 nodes   255,507 record pointers
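The level-by-level counts of Example 7 follow from repeated multiplication by the average fan-out. A minimal sketch, assuming the averages are rounded to the nearest integer as in the example; `bplus_levels` is a hypothetical helper.

```python
def bplus_levels(p, pleaf, fill, internal_levels):
    """Average node, entry and pointer counts per level of a B+tree."""
    fanout = round(p * fill)          # average tree pointers per internal node
    keys = fanout - 1                 # average keys per internal node
    leaf_ptrs = round(pleaf * fill)   # average record pointers per leaf
    rows, nodes = [], 1
    for _ in range(internal_levels):
        rows.append((nodes, nodes * keys, nodes * fanout))
        nodes *= fanout               # every pointer leads to one node below
    return rows, nodes, nodes * leaf_ptrs

rows, leaves, record_ptrs = bplus_levels(p=34, pleaf=31, fill=0.69, internal_levels=3)
for nodes, entries, pointers in rows:
    print(nodes, "nodes,", entries, "entries,", pointers, "pointers")
print(leaves, "leaf nodes,", record_ptrs, "record pointers")
```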

EXAMPLE 4: Suppose the search field is V = 9 bytes long, the disk block size is B = 512 bytes, a record (data) pointer is Pr = 7 bytes, and a block pointer is P = 6 bytes. Each B-tree node can have at most p tree pointers, p - 1 data pointers, and p - 1 search key field values. These must fit into a single disk block if each B-tree node is to correspond to a disk block. Hence, we must have:

(p * P) + ((p - 1) * (Pr + V)) ≤ B
(p * 6) + ((p - 1) * (7 + 9)) ≤ 512
(22 * p) ≤ 528

We can choose p to be a large value that satisfies the above inequality, which gives p = 23 (p = 24 is not chosen because of the reasons given next).
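For comparison, the B-tree inequality of Example 4 can be solved the same way. The sketch returns the largest p that satisfies the inequality, which is 24; the example itself then settles on p = 23. The function name `btree_max_order` is made up for this illustration.

```python
def btree_max_order(B, V, P, Pr):
    """Largest p with p*P + (p - 1)*(Pr + V) <= B."""
    # Rearranged: p * (P + Pr + V) <= B + Pr + V
    return (B + Pr + V) // (P + Pr + V)

p_max = btree_max_order(B=512, V=9, P=6, Pr=7)
print(p_max)          # 24, the bound from the inequality
print(p_max - 1)      # 23, the value used in the example
```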

EXAMPLE 5: Suppose that the search field of Example 4 is a nonordering key field, and we construct a B-tree on this field. Assume that each node of the B-tree is 69 percent full. Each node, on the average, will have p * 0.69 = 23 * 0.69 or approximately 16 pointers and, hence, 15 search key field values. The average fan-out fo = 16. We can start at the root and see how many values and pointers can exist, on the average, at each subsequent level:

Root:     1 node       15 entries       16 pointers
Level 1:  16 nodes     240 entries      256 pointers
Level 2:  256 nodes    3840 entries     4096 pointers
Level 3:  4096 nodes   61,440 entries

At each level, we calculated the number of entries by multiplying the total number of pointers at the previous level by 15, the average number of entries in each node. Hence, for the given block size, pointer size, and search key field size, a two-level B-tree holds 3840 + 240 + 15 = 4095 entries on the average; a three-level B-tree holds 65,535 entries on the average.
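Because every B-tree node stores entries, the totals in Example 5 are sums over all levels. A short sketch, assuming the same rounding as the example (fan-out 16, 15 entries per node); `btree_entries` is a name chosen here.

```python
def btree_entries(p, fill, levels):
    """Average total entries in a B-tree with `levels` levels below the root."""
    fanout = round(p * fill)      # average tree pointers per node
    keys = fanout - 1             # average entries per node
    total, nodes = 0, 1
    for _ in range(levels + 1):   # the root plus each level below it
        total += nodes * keys
        nodes *= fanout
    return total

print(btree_entries(p=23, fill=0.69, levels=2))   # 4095
print(btree_entries(p=23, fill=0.69, levels=3))   # 65535
```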

comparison

• A three-level B+ tree holds up to 255,507 record pointers, on average. Compare to the 65,535 entries for the corresponding B-tree in Example 5.

Outline/summary

• Conventional Indexes
• Sparse vs. dense
• Primary vs. secondary
• B trees
• B+trees vs. B-trees
• B+trees vs. indexed sequential
• Hashing schemes --> Next
