CS4432: Database Systems II B-Tree Index Structure 1

Upload: sabina-merritt

Post on 22-Dec-2015


TRANSCRIPT

Page 1: CS4432: Database Systems II B-Tree Index Structure 1

CS4432: Database Systems II

B-Tree Index Structure


Page 2: CS4432: Database Systems II B-Tree Index Structure 1

End-User View

To speed up queries, we create indexes.

Create Table R ( Id Number, name varchar2(100), …);

Select ID, name From R Where ID = 101;

Select ID, name From R Where name = 'Mike';

> Create Index R_ID on R (ID);

> Create Index R_Name on R (name);

Page 3: CS4432: Database Systems II B-Tree Index Structure 1

B-Tree Index: Why

• A multi-level index seems an efficient approach
• But:
  – How many levels to create?
  – How to grow and shrink efficiently and dynamically?
  – How to handle equality search and range search efficiently?

A B-Tree index is a specialized multi-level tree index that addresses these questions.

Page 4: CS4432: Database Systems II B-Tree Index Structure 1

What Is a B-Tree

• A disk-based, multi-level, balanced tree index structure
• Each node in the tree is a disk page, so it has to fit in one disk page
• The pointers are locations on disk, either block IDs or record IDs

[Figure: example tree. Root keys: 13, 17, 24, 30. Leaf entries: 2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*]

Page 5: CS4432: Database Systems II B-Tree Index Structure 1

What Is a B-Tree

• A disk-based, multi-level, balanced tree index structure
• Has one root (which can be the only level)
• The height of the tree grows and shrinks dynamically as needed
• We always (in search, insert, delete) start from the root and move down one level at a time

[Figure: tree levels. One root; internal nodes (can be many levels); leaf nodes (one leaf level)]

Page 6: CS4432: Database Systems II B-Tree Index Structure 1

What Is a B-Tree

• A disk-based, multi-level, balanced tree index structure
• All leaf nodes are at the same height

[Figure: two example trees, one of height 2 and one of height 3]

Page 7: CS4432: Database Systems II B-Tree Index Structure 1

B-Tree Characteristics

• Each node holds N keys & N+1 pointers
• Keys in any node are sorted (left to right)
• Each node is at least 50% full (minimum 50% occupancy)
  – Except the root, which can have as few as 1 key & 2 pointers
• No spaces in the middle
• Number of pointers in any node (even if not full) = number of keys + 1
  – The rest are Null

[Figure: example node with N = 4]
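The layout rules above (sorted keys, no gaps, one more pointer than keys) can be checked mechanically. The following is only a sketch: the fixed-size slot lists, the use of `None` for Null, and the function name `check_node_layout` are illustrative assumptions, not something the slides define. Minimum occupancy is left out on purpose, since its exact value depends on rounding.

```python
def check_node_layout(keys, pointers, n):
    """Check the layout rules for one node.

    keys has n slots and pointers has n + 1 slots; unused slots are None
    (the "Null" entries). Returns True if the node obeys the layout rules.
    """
    if len(keys) != n or len(pointers) != n + 1:
        return False
    used_keys = [k for k in keys if k is not None]
    used_ptrs = [p for p in pointers if p is not None]
    # No spaces in the middle: every used slot comes before every Null slot.
    no_gaps = (keys[:len(used_keys)] == used_keys
               and pointers[:len(used_ptrs)] == used_ptrs)
    # Keys are sorted left to right.
    is_sorted = all(a <= b for a, b in zip(used_keys, used_keys[1:]))
    # Number of pointers = number of keys + 1, even when the node is not full.
    ptr_count_ok = len(used_ptrs) == len(used_keys) + 1
    return no_gaps and is_sorted and ptr_count_ok


# A half-full node with N = 4: two keys, three pointers, the rest Null.
ok = check_node_layout([13, 17, None, None], ["p0", "p1", "p2", None, None], 4)
```
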

Page 8: CS4432: Database Systems II B-Tree Index Structure 1

B+-Tree Non-Leaf Node Structure

[Figure: a non-leaf node with keys 13, 17, 24, 30. Its pointers to the next level (these are N+1 pointers) cover, from left to right: k < 13, 13 ≤ k < 17, 17 ≤ k < 24, 24 ≤ k < 30, 30 ≤ k]
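Choosing which of the N+1 pointers to follow is a binary search over the node's keys. As a small sketch (the helper name `child_index` is made up for illustration), Python's standard `bisect_right` returns exactly the pointer position implied by the ranges in the figure:

```python
from bisect import bisect_right


def child_index(keys, k):
    """Return the index of the pointer to follow for search key k.

    Pointer 0 covers k < keys[0]; pointer i covers keys[i-1] <= k < keys[i];
    the last pointer covers k >= keys[-1].
    """
    return bisect_right(keys, k)


keys = [13, 17, 24, 30]
positions = [child_index(keys, k) for k in (5, 13, 16, 17, 29, 30, 99)]
```

A key equal to a separator (13, 17 or 30 here) goes to the pointer on the separator's right, matching the "13 ≤ k < 17" style of range.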

Page 9: CS4432: Database Systems II B-Tree Index Structure 1

B+-Tree Leaf Node Structure

[Figure: a leaf node with keys 57, 81, 95. Each key has a pointer to the record with that key (57, 81, 95), and the last pointer points to the next leaf node on the right. Above it, a non-leaf node with ranges k < 13, 13 ≤ k < 17, 17 ≤ k < 24, 24 ≤ k < 30, 30 ≤ k points down to the leaf nodes.]

Page 10: CS4432: Database Systems II B-Tree Index Structure 1

Leaf Nodes in B-Tree

Properties of a leaf node:

• For each entry i in a leaf node:
  – Pointer Pi points to a file record with search-key value Ki.
• Pn (the last pointer in the node) points to the next leaf node in search-key order

Page 11: CS4432: Database Systems II B-Tree Index Structure 1

In the textbook's notation, N = 3

[Figure: node layouts. Leaf: pointers to data records, plus a pointer to the next leaf page. Non-leaf: pointers to child B-Tree nodes.]

Page 12: CS4432: Database Systems II B-Tree Index Structure 1

Good Utilization

• B-tree nodes should not be too empty
• Each node is at least 50% full (minimum 50% occupancy)
  – Except the root, which can have as few as 1 key & 2 pointers
• Use at least:
  – Non-leaf: ⌈(n+1)/2⌉ pointers
  – Leaf: ⌊(n+1)/2⌋ pointers to data

Page 13: CS4432: Database Systems II B-Tree Index Structure 1

Number of pointers/keys for B-Tree

                      Max #Pointers   Max #Keys   Min #Pointers   Min #Keys
Non-leaf (non-root)   n+1             n           ⌈(n+1)/2⌉       ⌈(n+1)/2⌉ - 1
Leaf (non-root)       n+1             n           ⌊(n+1)/2⌋       ⌊(n+1)/2⌋
Root                  n+1             n           2               1
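To make the table concrete, here is a small sketch that evaluates it for a given n. It assumes the usual textbook rounding (ceiling of (n+1)/2 for non-leaf nodes, floor of (n+1)/2 for leaves); the function name `occupancy_bounds` and the tuple layout are illustrative choices only.

```python
def occupancy_bounds(n, kind):
    """Return (max_ptrs, max_keys, min_ptrs, min_keys) for one node kind.

    kind is "root", "non-leaf" or "leaf"; n is the maximum number of keys.
    Rounding is an assumption: ceil for non-leaf nodes, floor for leaves.
    """
    if kind == "root":
        return (n + 1, n, 2, 1)
    if kind == "non-leaf":
        min_ptrs = (n + 2) // 2      # integer form of ceil((n + 1) / 2)
        return (n + 1, n, min_ptrs, min_ptrs - 1)
    if kind == "leaf":
        min_ptrs = (n + 1) // 2      # floor((n + 1) / 2) pointers to data
        return (n + 1, n, min_ptrs, min_ptrs)
    raise ValueError(f"unknown node kind: {kind}")


# With n = 3 (the textbook's notation on the earlier slide):
bounds_n3 = {kind: occupancy_bounds(3, kind) for kind in ("non-leaf", "leaf", "root")}
```
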

Page 14: CS4432: Database Systems II B-Tree Index Structure 1

Dense vs. Sparse

• The leaf level can be either dense or sparse
  – Sparse only if the data file is sorted
• In most DBMSs, the leaf level is always dense
  – One index entry for each data record
• What about values in non-leaf levels?
  – A subset of the leaf values

How is this internal subset selected?

Page 15: CS4432: Database Systems II B-Tree Index Structure 1

B-Tree in Practice

• Assume a disk block of 4 KB = 4096 bytes
• Assume indexing integer keys (4 bytes), and each pointer is 8 bytes
• How many entries N can fit in one B-tree node?
  – 4N + 8(N+1) ≤ 4096, so N = 340
• A 3-level B-Tree (root + 1st and 2nd levels) can index how many records (on average)?
  – First level (root): 1 node [max = 340, min = 170, avg = 255]
  – 2nd level: 255 nodes [each one has 255 pointers on average]
  – 3rd level: 255^2 = 65025 nodes [leaf nodes, each has 255 pointers on average]
  – So the average number of indexed records is 255^3 ≈ 16.6 x 10^6 records
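The arithmetic on this slide can be reproduced in a few lines. This is only a sketch of the slide's own estimate; the function name `max_keys_per_node` and the variable names are illustrative, and the "average" fanout is simply the midpoint between a full node and a half-full node.

```python
def max_keys_per_node(block_size, key_size, ptr_size):
    """Largest N such that N keys and N + 1 pointers fit in one block."""
    # key_size * N + ptr_size * (N + 1) <= block_size
    return (block_size - ptr_size) // (key_size + ptr_size)


n_max = max_keys_per_node(block_size=4096, key_size=4, ptr_size=8)
n_min = n_max // 2                   # minimum 50% occupancy
n_avg = (n_max + n_min) // 2         # midpoint used as the average fanout

leaf_nodes = n_avg ** 2              # nodes on the 3rd level
indexed_records = n_avg ** 3         # average records under a 3-level tree
```

With the slide's numbers this gives N = 340, an average of 255, 65025 leaf nodes and 16,581,375 indexed records, which the slide rounds to 16.6 x 10^6.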

Page 16: CS4432: Database Systems II B-Tree Index Structure 1

B-Tree in Practice

• Usually a 3- or 4-level B-tree index is enough to index even large files
• The height of the tree is usually very small
  – Good for searching, insertion, and deletion

Page 17: CS4432: Database Systems II B-Tree Index Structure 1

B-Tree Querying


Page 18: CS4432: Database Systems II B-Tree Index Structure 1

Queries on B-Trees

• Equality Search: Find all records with a search-key value of k.

1. Start with the root node
   • Examine the keys (use binary search) until you find a pointer to follow
2. Repeat until you reach a leaf node
3. Search the keys in the leaf node for the first key = k
4. Move right until you hit a key larger than k (may move from one node to another node)
5. Follow the pointers to the data records
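The five steps can be sketched in Python as follows. Everything here is an illustrative assumption rather than the course's implementation: the `Leaf` and `Internal` classes, the helper names, and the small in-memory tree (root keys 13, 17, 24, 30 over five linked leaves). Record pointers are omitted, so the search returns the matching leaf keys, and any duplicates of k are assumed to lie in the leaf reached or to its right.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Leaf:
    keys: List[int]
    next: Optional["Leaf"] = None   # pointer to the next leaf on the right


@dataclass
class Internal:
    keys: List[int]
    children: list                  # always len(keys) + 1 child pointers


def build_example():
    """Hypothetical two-level tree used only for this sketch."""
    leaves = [Leaf([2, 3, 5, 7]), Leaf([14, 16]), Leaf([19, 20, 22]),
              Leaf([24, 27, 29]), Leaf([33, 34, 38, 39])]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return Internal([13, 17, 24, 30], leaves)


def find_leaf(node, k):
    """Steps 1-2: walk down from the root, one level at a time."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, k)]
    return node


def equality_search(root, k):
    """Steps 3-5: collect every leaf entry equal to k, moving right."""
    leaf = find_leaf(root, k)
    i = bisect_left(leaf.keys, k)          # first position with key >= k
    hits = []
    while leaf is not None:
        while i < len(leaf.keys) and leaf.keys[i] == k:
            hits.append(leaf.keys[i])      # a real index would follow the pointer
            i += 1
        if i < len(leaf.keys):             # hit a key larger than k: stop
            break
        leaf, i = leaf.next, 0             # continue in the next leaf
    return hits


root = build_example()
found = equality_search(root, 24)
missing = equality_search(root, 23)
```
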

Page 19: CS4432: Database Systems II B-Tree Index Structure 1

B-Tree: Equality Search

Find key = 0, 2, 28, 101?
• 0: N1, N2, N4
• 2: N1, N2, N4
• 28: N1, N3, N8
• 101: N1, N3, N9

[Figure: three-level tree with root N1, internal nodes N2 and N3, and leaf nodes N4 to N9]

Page 20: CS4432: Database Systems II B-Tree Index Structure 1

Queries on B-Trees

• Range Search: Find all records between [k1, k2].

1. Start with the root node and search for k1
   • Examine the keys (use binary search) until you find a pointer to follow
2. Repeat until you reach a leaf node
3. Search the keys in the leaf node for the first key = k1, or the first larger key
4. Move right as long as the key is not larger than k2 (may move from one node to another node)
5. Follow the pointers to the data records
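A range search differs from an equality search only in the stopping condition of the rightward scan. The sketch below makes the same illustrative assumptions as before: hypothetical `Leaf` and `Internal` classes, a hand-built tree, and leaf keys standing in for the record pointers.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Leaf:
    keys: List[int]
    next: Optional["Leaf"] = None   # pointer to the next leaf on the right


@dataclass
class Internal:
    keys: List[int]
    children: list                  # always len(keys) + 1 child pointers


def build_example():
    """Hypothetical two-level tree used only for this sketch."""
    leaves = [Leaf([2, 3, 5, 7]), Leaf([14, 16]), Leaf([19, 20, 22]),
              Leaf([24, 27, 29]), Leaf([33, 34, 38, 39])]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return Internal([13, 17, 24, 30], leaves)


def range_search(root, k1, k2):
    """Return all leaf keys in [k1, k2], in search-key order."""
    node = root
    while isinstance(node, Internal):      # steps 1-2: descend looking for k1
        node = node.children[bisect_right(node.keys, k1)]
    leaf, i = node, bisect_left(node.keys, k1)   # step 3: first key >= k1
    out = []
    while leaf is not None:                # step 4: move right along the leaves
        while i < len(leaf.keys):
            if leaf.keys[i] > k2:          # key larger than k2: done
                return out
            out.append(leaf.keys[i])       # step 5 would follow the pointer here
            i += 1
        leaf, i = leaf.next, 0
    return out


root = build_example()
in_range = range_search(root, 5, 20)
```

Only one root-to-leaf descent is needed; the rest of the work is a sequential walk over the linked leaves, which is why the leaf-level "next" pointers matter.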

Page 21: CS4432: Database Systems II B-Tree Index Structure 1

B-Tree: Range Search

Find key in [5, 20], [4, 16], [100, 200]?
• [5, 20]: N1, N2, N5, N6, N7
• [4, 16]: N1, N2, N4, N5, N6
• [100, 200]: N1, N3, N9

[Figure: three-level tree with root N1, internal nodes N2 and N3, and leaf nodes N4 to N9]

Page 22: CS4432: Database Systems II B-Tree Index Structure 1

B-Tree Insertion


Page 23: CS4432: Database Systems II B-Tree Index Structure 1

Inserting a Data Entry into a B-Tree

• Find correct leaf L. (Searching)
• Put data entry onto L.
  – If L has enough space, done!
  – Else, must split L (into L and a new node L2)
    • Redistribute entries evenly, copy up middle key.
    • Insert index entry pointing to L2 into parent of L.
• This can happen recursively
  – To split index node, redistribute entries evenly, but push up middle key. (Contrast with leaf splits.)
• Splits "grow" tree; root split increases height.
  – Tree growth: gets wider or one level taller at top.
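The algorithm above can be sketched as a short recursive insert. This is a minimal illustration under stated assumptions, not the course's code: a single hypothetical `Node` class serves for both leaves and index nodes, N = 4 is the maximum number of keys per node, record pointers are omitted, and the starting tree is hand-built.

```python
from bisect import bisect_right, insort
from dataclasses import dataclass
from typing import List, Optional

N = 4  # maximum number of keys per node (assumed)


@dataclass
class Node:
    keys: List[int]
    children: Optional[list] = None   # None marks a leaf
    next: Optional["Node"] = None     # right-sibling link, leaves only


def insert(root, key):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key)
    if split is None:
        return root
    sep, right = split
    return Node(keys=[sep], children=[root, right])   # root split: one level taller


def _insert(node, key):
    """Insert below node; return (separator, new right node) if node was split."""
    if node.children is None:                          # leaf
        insort(node.keys, key)
        if len(node.keys) <= N:
            return None                                # enough space, done
        mid = len(node.keys) // 2
        right = Node(keys=node.keys[mid:], next=node.next)
        node.keys, node.next = node.keys[:mid], right
        return right.keys[0], right                    # copy up: key stays in the leaf
    i = bisect_right(node.keys, key)
    split = _insert(node.children[i], key)
    if split is None:
        return None
    sep, right = split
    node.keys.insert(i, sep)                           # index entry pointing to L2
    node.children.insert(i + 1, right)
    if len(node.keys) <= N:
        return None
    mid = len(node.keys) // 2
    up = node.keys[mid]                                # push up: key leaves this node
    new_right = Node(keys=node.keys[mid + 1:], children=node.children[mid + 1:])
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return up, new_right


leaves = [Node([2, 3, 5, 7]), Node([14, 16]), Node([19, 20, 22]),
          Node([24, 27, 29]), Node([33, 34, 38, 39])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Node(keys=[13, 17, 24, 30], children=leaves)

root = insert(root, 23)   # easy case: the target leaf has free space
root = insert(root, 8)    # leaf split, then index-node split, then a new root
```

The only difference between the two split branches is what happens to the middle key: the leaf branch returns a copy of it, while the index-node branch removes it from the node before handing it to the parent.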

Page 24: CS4432: Database Systems II B-Tree Index Structure 1

Updates on B-Trees: Insertion

Insert 23

[Figure: tree before and after the insert. Index keys: 13, 17, 24, 30. Leaf entries before: 2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 28* 40* 41* 45* 77*. Afterwards 23* appears next to 22*, and no node is split.]

This is the easy case!

Page 25: CS4432: Database Systems II B-Tree Index Structure 1

Updates on B+-Trees: Insertion

Insert 8

[Figure: tree before and after the insert. The leaf 2* 3* 5* 7* is split into 2* 3* and 5* 7* 8*, and key 5 is to be inserted into the parent node (keys 13, 17, 24, 30).]

Because the insertion would overfill the leaf node, we split it into two nodes and distribute the data evenly between them. "5" is special, since it discriminates between the two new siblings, so it is copied up. We now need to insert 5 into the parent node…

Notice: 5 is copied (it still exists in the leaf)

Page 26: CS4432: Database Systems II B-Tree Index Structure 1

Updates on B+-Trees: Insertion

Insert 8

[Figure: same tree and leaf split as on the previous slide]

Notice: Splitting guarantees each node is at least 50% full

Page 27: CS4432: Database Systems II B-Tree Index Structure 1

Updates on B+-Trees: Insertion

We now need to insert 5 into the parent node…

[Figure: the parent node 13, 17, 24, 30 is split into the nodes 5, 13 and 24, 30, and key 17 moves up a level.]

Because the insertion would overfill the node, we split it into two nodes and divide its entries between them. "17" is special, since it discriminates between the two new siblings, so it is pushed up.

Notice: 17 is pushed up (it is taken out of the lower level)

Page 28: CS4432: Database Systems II B-Tree Index Structure 1

Updates on B+-Trees: Insertion

We now need to insert 5 into the parent node…

[Figure: same index-node split as on the previous slide]

Notice: Splitting guarantees each node is at least 50% full

Page 29: CS4432: Database Systems II B-Tree Index Structure 1

Updates on B+-Trees: Insertion

[Figure: tree before and after the root split. New root: 17. Second level: nodes 5, 13 and 24, 30. The leaf level is unchanged.]

The insertion of 8 has increased the height of the tree by one (this is rare).

Page 30: CS4432: Database Systems II B-Tree Index Structure 1

Updates on B+-Trees: Insertion

[Figure: same tree as on the previous slide, with new root 17 above the nodes 5, 13 and 24, 30]

The insertion of 8 has increased the height of the tree by one (this is rare).

Notice: Root has the minimum occupancy now…

Page 31: CS4432: Database Systems II B-Tree Index Structure 1

Updates on B+-Trees: Insertion

[Figure: same tree as on the previous slide, with new root 17 above the nodes 5, 13 and 24, 30]

Notice: After the split the tree is still balanced

Page 32: CS4432: Database Systems II B-Tree Index Structure 1

Leaf vs. Non-Leaf Page Split (from the previous example of inserting "8")

Leaf page split: 2* 3* 5* 7* 8* becomes 2* 3* and 5* 7* 8*. The entry to be inserted in the parent node is 5. (Note that 5 is copied up and continues to appear in the leaf.)

Non-leaf page split: 5 13 17 24 30 becomes 5 13 and 24 30. The entry to be inserted in the parent node is 17. (Note that 17 is pushed up and only appears once. Contrast this with a leaf split.)

Page 33: CS4432: Database Systems II B-Tree Index Structure 1

Back to Insertion Algorithm

• Find correct leaf L. (Searching)
• Put data entry onto L.
  – If L has enough space, done!
  – Else, must split L (into L and a new node L2)
    • Redistribute entries evenly, copy up middle key.
    • Insert index entry pointing to L2 into parent of L.
• This can happen recursively
  – To split index node, redistribute entries evenly, but push up middle key. (Contrast with leaf splits.)
• Splits "grow" tree; root split increases height.
  – Tree growth: gets wider or one level taller at top.

Page 34: CS4432: Database Systems II B-Tree Index Structure 1

Exercise

• Insert key 1?
• Insert key 50?

Page 35: CS4432: Database Systems II B-Tree Index Structure 1

B-Tree: Another Insert Example

Each node has at most 2 keys (N = 2)

Page 36: CS4432: Database Systems II B-Tree Index Structure 1

B+-tree: Another Insert Example


Page 37: CS4432: Database Systems II B-Tree Index Structure 1

B+-tree: Another Insert Example


Page 38: CS4432: Database Systems II B-Tree Index Structure 1

Insertion Done!


Page 39: CS4432: Database Systems II B-Tree Index Structure 1

B-Tree Deletion


Page 40: CS4432: Database Systems II B-Tree Index Structure 1

Deleting a Data Entry from a B+ Tree

• Start at root, find leaf L where entry belongs. (Searching)
• Remove the entry.
  – If L is still at least half-full, done!
  – If L has dropped below the minimum occupancy:
    • Try to re-distribute, borrowing from a sibling (adjacent node with the same parent as L).
    • If re-distribution fails, merge L and the sibling.
• If a merge occurred, must delete the entry (pointing to L or the sibling) from the parent of L.
• Merge could propagate to root, decreasing height.
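The leaf-level part of this algorithm (remove, then borrow or merge) can be sketched as below. It is deliberately limited and rests on assumptions: one parent node held as a plain list of keys, its leaves held as lists of keys, a minimum occupancy of 2 (N = 4), and no propagation of a merge to higher levels. The function name `delete_entry` and the returned action strings are illustrative only.

```python
from bisect import bisect_right

MIN_KEYS = 2  # minimum leaf occupancy, assuming N = 4


def delete_entry(parent_keys, leaves, key):
    """Delete key from the leaf under one parent; return the action taken."""
    i = bisect_right(parent_keys, key)       # index of the leaf that should hold key
    leaf = leaves[i]
    if key not in leaf:
        return "not found"
    leaf.remove(key)                         # remaining keys stay packed to the left
    if len(leaf) >= MIN_KEYS or len(leaves) == 1:
        return "done"
    # Try to re-distribute: borrow from a sibling that can spare an entry.
    if i + 1 < len(leaves) and len(leaves[i + 1]) > MIN_KEYS:
        leaf.append(leaves[i + 1].pop(0))
        parent_keys[i] = leaves[i + 1][0]    # new separator copied into the parent
        return "borrowed from right"
    if i > 0 and len(leaves[i - 1]) > MIN_KEYS:
        leaf.insert(0, leaves[i - 1].pop())
        parent_keys[i - 1] = leaf[0]
        return "borrowed from left"
    # Re-distribution failed: merge with a sibling and drop one parent entry.
    if i + 1 < len(leaves):
        leaf.extend(leaves.pop(i + 1))
        parent_keys.pop(i)
    else:
        leaves[i - 1].extend(leaves.pop(i))
        parent_keys.pop(i - 1)
    return "merged"                          # the parent itself may now be underfull


parent = [24, 30]
leaves = [[19, 20, 22], [24, 27, 29], [33, 34, 38, 39]]
actions = [delete_entry(parent, leaves, k) for k in (19, 20, 24)]
```

After the merge the parent is left with a single key, so a full implementation would repeat the same borrow-or-merge decision one level up; that step is not shown here.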

Page 41: CS4432: Database Systems II B-Tree Index Structure 1

Example Tree: Delete 19*

• Delete 19* is easy. Why?
  – The leaf node has more than the minimum

[Figure: root 17; second-level nodes 5, 13 and 24, 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*]

Page 42: CS4432: Database Systems II B-Tree Index Structure 1

Example Tree: After 19* was deleted

• What else did you observe?
  – Very important in practice…
• The other keys have shifted…
  – Remember: empty spaces should be at the end only

[Figure: root 17; second-level nodes 5, 13 and 24, 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 20* 22* | 24* 27* 29* | 33* 34* 38* 39*]

Page 43: CS4432: Database Systems II B-Tree Index Structure 1

Example Tree: Delete 20*

• After the deletion, the 4th node is below the minimum occupancy!
• Need to re-distribute! How?

[Figure: root 17; second-level nodes 5, 13 and 24, 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 22* | 24* 27* 29* | 33* 34* 38* 39*]

Page 44: CS4432: Database Systems II B-Tree Index Structure 1

Example Tree: Delete 20*

• Need to re-distribute! How?
  – Check either of the two siblings
  – Can you borrow from either of them without violating the minimum occupancy requirements?

[Figure: same tree as on the previous slide, with the underfull leaf 22*]

Page 45: CS4432: Database Systems II B-Tree Index Structure 1

Example Tree: Delete 20*

• Need to re-distribute! How?
  – Move 24* from its sibling into the underfull node (page).

[Figure: root 17; second-level nodes 5, 13 and 24, 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 22* 24* | 27* 29* | 33* 34* 38* 39*]

Are we done?

Page 46: CS4432: Database Systems II B-Tree Index Structure 1

Example Tree: Delete 20*

• Need to re-distribute! How?
  – Move 24* from its sibling into the underfull node (page).
  – Copy key 27 into the parent node (page), where it replaces 24.

[Figure: root 17; second-level nodes 5, 13 and 27, 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 22* 24* | 27* 29* | 33* 34* 38* 39*]

Page 47: CS4432: Database Systems II B-Tree Index Structure 1

Deleting 19* and 20* ...

• Notice how the middle key is copied up. What else have we done?

Record organization in a page matters for a B+-tree!

[Figure: root 17; second-level nodes 5, 13 and 27, 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 22* 24* | 27* 29* | 33* 34* 38* 39*]

Page 48: CS4432: Database Systems II B-Tree Index Structure 1

Deleting 24* ...

• Borrowing from a sibling will not work!

[Figure: root 17; second-level nodes 5, 13 and 27, 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 22* 24* | 27* 29* | 33* 34* 38* 39*]

Page 49: CS4432: Database Systems II B-Tree Index Structure 1

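Deleting 24* ...

• Must merge.

[Figure: the leaves 22* and 27* 29* are merged into 22* 27* 29*; their parent is left with the single key 30 over the leaves 22* 27* 29* | 33* 34* 38* 39*]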

Page 50: CS4432: Database Systems II B-Tree Index Structure 1

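Deleting 24* ...

• Must merge.

[Figure: the underfull index node (key 30, leaves 22* 27* 29* | 33* 34* 38* 39*) is merged with its sibling. Resulting tree: root 5, 13, 17, 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 22* 27* 29* | 33* 34* 38* 39*]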

Page 51: CS4432: Database Systems II B-Tree Index Structure 1

Deleting 24* ...

Notice: When merging internal nodes… you get back the key that you passed to the parent

Page 52: CS4432: Database Systems II B-Tree Index Structure 1

Deleting 24* ...

Notice: Deletion still keeps the tree balanced

Notice: The tree height may be reduced

Page 53: CS4432: Database Systems II B-Tree Index Structure 1

Back to Deletion Algorithm

• Start at root, find leaf L where entry belongs. (Searching)
• Remove the entry.
  – If L is still at least half-full, done!
  – If L has dropped below the minimum occupancy:
    • Try to re-distribute, borrowing from a sibling (adjacent node with the same parent as L).
    • If re-distribution fails, merge L and the sibling.
• If a merge occurred, must delete the entry (pointing to L or the sibling) from the parent of L.
• Merge could propagate to root, decreasing height.

Page 54: CS4432: Database Systems II B-Tree Index Structure 1

More Practical Deletion: Deleting 24* ...

[Figure: root 17; second-level nodes 5, 13 and 27, 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 22* 24* | 27* 29* | 33* 34* 38* 39*]

• When deleting 24*, allow its node to be under-utilized
• After some time, with more insertions
  – Most probably more entries will be added to this node
• If many nodes become under-utilized, re-build the index

Page 55: CS4432: Database Systems II B-Tree Index Structure 1

Cost of B-Tree Operations

What is the cost? How many I/Os to answer the query?