Data Structures

Haim Kaplan and Uri Zwick

November 2012

Lecture 5

B-Trees

A 4-node

3 keys: 10 25 42

4-way branch: key < 10, 10 < key < 25, 25 < key < 42, 42 < key

An r-node

r−1 keys: k0, k1, k2, …, kr−3, kr−2

r-way branch: c0, c1, c2, …, cr−2, cr−1

B-Trees (with minimum degree d)

Each node holds between d−1 and 2d−1 keys

Each non-leaf node has between d and 2d children

The root is special: it has between 1 and 2d−1 keys and between 2 and 2d children (if not a leaf)

All leaves are at the same depth

A 2-4 tree

B-Tree with minimal degree d=2

Root: 13

Internal nodes: 4 6 10 | 15 28

Leaves: 1 3 | 5 | 7 | 11 | 14 | 16 17 | 30 40 50

Node structure

r – the degree

key[0],…,key[r−2] – the keys

item[0],…,item[r−2] – the associated items

child[0],…,child[r−1] – the children

leaf – is the node a leaf?

Possibly a different representation for leaves
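The node fields listed above can be sketched as a small Python class. This is only an illustration: the class name `BTreeNode`, the use of Python lists, and the leaf keys 5, 17, 30 and 50 are assumptions, not part of the slides. Here `leaf` and `r` are derived from the lists rather than stored.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class BTreeNode:
    """One B-tree node (hypothetical layout mirroring the slide's fields)."""
    key: List[Any] = field(default_factory=list)            # key[0..r-2], kept sorted
    item: List[Any] = field(default_factory=list)           # item[j] is associated with key[j]
    child: List["BTreeNode"] = field(default_factory=list)  # child[0..r-1]; empty in a leaf

    @property
    def leaf(self) -> bool:
        # A node without children is a leaf.
        return not self.child

    @property
    def r(self) -> int:
        # The degree: number of keys plus one.
        return len(self.key) + 1


# The 4-node with keys 10 25 42; the four leaf children hold made-up keys.
leaves = [BTreeNode(key=[5]), BTreeNode(key=[17]), BTreeNode(key=[30]), BTreeNode(key=[50])]
four_node = BTreeNode(key=[10, 25, 42], item=["a", "b", "c"], child=leaves)
```

Deriving `leaf` and `r` avoids keeping redundant fields in sync; an on-disk layout would more likely store them explicitly.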

The height of B-Trees

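• At depth 1 we have at least 2 nodes
• At depth 2 we have at least $2d$ nodes
• At depth 3 we have at least $2d^2$ nodes
• …
• At depth h we have at least $2d^{h-1}$ nodes

Each non-root node holds at least d−1 keys, so a tree of height h with n keys satisfies

$$n \ge 1 + (d-1)\sum_{i=1}^{h} 2d^{i-1} = 2d^{h} - 1 \quad\Longrightarrow\quad h \le \log_d \frac{n+1}{2}$$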
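As a quick numerical check of the counting argument, the sketch below sums the per-depth lower bounds (at least 2d^(depth−1) nodes per depth, at least d−1 keys per non-root node) and compares the total with the closed form 2d^h − 1. The function names are illustrative only.

```python
def min_keys(d: int, h: int) -> int:
    """Fewest keys a B-tree of minimum degree d and height h can hold."""
    total = 1                              # the root holds at least one key
    for depth in range(1, h + 1):
        nodes = 2 * d ** (depth - 1)       # at least 2d^(depth-1) nodes at this depth
        total += nodes * (d - 1)           # each of them holds at least d-1 keys
    return total


def max_height(d: int, n: int) -> int:
    """Largest h with 2d^h - 1 <= n, found with exact integer arithmetic."""
    h = 0
    while 2 * d ** (h + 1) - 1 <= n:
        h += 1
    return h


# The sum agrees with the closed form 2d^h - 1 for small d and h.
agrees = all(min_keys(d, h) == 2 * d ** h - 1 for d in range(2, 8) for h in range(0, 8))
```

Integer arithmetic is used instead of a floating-point logarithm so that the bound is exact at the boundaries.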

Number of nodes accessed – $\log_d n$

Number of operations – $O(d \log_d n)$

Number of ops with binary search – $O(\log_2 d \cdot \log_d n) = O(\log_2 n)$

Look for k in node x

Look for k in the subtree of node x
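The two steps named above (look for k in node x, then in the subtree of node x) can be sketched as follows. The minimal `Node` class is an assumption, and the example tree is one reading of the 2-4 tree figure earlier in the lecture. `bisect_left` performs the binary search inside a node.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                   # sorted keys of this node
        self.children = children or []     # empty list for a leaf


def search(x, k):
    """Return (node, index) holding k in the subtree of x, or None if absent."""
    while True:
        i = bisect_left(x.keys, k)         # look for k in node x: O(log d) comparisons
        if i < len(x.keys) and x.keys[i] == k:
            return x, i
        if not x.children:
            return None                    # reached a leaf without finding k
        x = x.children[i]                  # look for k in the subtree of child i


# One reading of the 2-4 tree figure (d = 2).
root = Node([13], [
    Node([4, 6, 10], [Node([1, 3]), Node([5]), Node([7]), Node([11])]),
    Node([15, 28], [Node([14]), Node([16, 17]), Node([30, 40, 50])]),
])
hit = search(root, 17)
miss = search(root, 12)
```

Each loop iteration touches one node, so the number of iterations is what the slide counts as nodes accessed.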

B-Trees vs binary search trees

• Wider and shallower
• Access fewer nodes during search
• But may take more operations

B-Trees – What are they good for?

The hardware structure

CPU, Cache, RAM, Disk

Each memory level is much larger but much slower than the one before it

Information moved in blocks

A simplified I/O model

CPU, RAM, Disk

Each block is of size m.

Count both operations and I/O operations

Data structures in the I/O model

Linked lists and search trees behave poorly in the I/O model.

Each pointer followed may cause a disk access

Each node (struct) is allocated contiguously. Harder to control the disk blocks containing different nodes

B-trees reduce the worst case # of I/Os

Pick d such that a node fits in a block
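To make "pick d such that a node fits in a block" concrete, here is a small sketch that computes the largest usable d. It assumes a full node stores 2d−1 keys, 2d−1 items, 2d child pointers and a fixed header. All sizes in the example (4096-byte block, 8-byte fields, 16-byte header) are hypothetical and not taken from the lecture.

```python
def node_size(d: int, key_size: int, item_size: int, pointer_size: int, header_size: int) -> int:
    """Bytes used by a full node: 2d-1 keys and items, 2d child pointers, plus a header."""
    return (2 * d - 1) * (key_size + item_size) + 2 * d * pointer_size + header_size


def max_min_degree(block_size: int, key_size: int, item_size: int,
                   pointer_size: int, header_size: int = 0) -> int:
    """Largest d for which a full node still fits in one block."""
    pair = key_size + item_size
    # Solve (2d-1)*pair + 2d*pointer + header <= block for d.
    d = (block_size - header_size + pair) // (2 * (pair + pointer_size))
    if d < 2:
        raise ValueError("block too small for a B-tree node with d >= 2")
    return d


# Hypothetical sizes: 4096-byte block, 8-byte keys, items and pointers, 16-byte header.
d = max_min_degree(4096, key_size=8, item_size=8, pointer_size=8, header_size=16)
```

With these assumed sizes the result is d = 85: a full node then uses 4080 bytes, while d = 86 would need 4128.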

Number of nodes accessed (I/Os) – $\log_d n$

Number of operations – $O(d \log_d n)$

Number of ops with binary search – $O(\log_2 d \cdot \log_d n) = O(\log_2 n)$

Red-Black Trees vs. B-Trees

n = $2^{30} \approx 10^9$

30 ≤ height of Red-Black Tree ≤ 60

Up to 60 pages read from disk

Height of B-Tree with d=1000 is only 3

Each B-Tree node resides in a block/page

Only 3 (or 4) pages read from disk

Disk access – 1 millisecond ($10^{-3}$ sec)

Memory access – 100 nanoseconds ($10^{-7}$ sec)
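The numbers on this slide can be reproduced with a few lines of arithmetic. The sketch uses the standard red-black height bounds, ⌊log₂ n⌋ ≤ height ≤ 2·log₂(n+1), and the B-tree bound 2d^h − 1 ≤ n. It assumes every node visited costs one disk access, as the slide does.

```python
import math

n = 2 ** 30                     # about 10^9 keys
d = 1000                        # minimum degree of the B-tree
disk_access = 1e-3              # seconds per disk access (figure from the slide)

# Red-black tree height bounds.
rb_low = math.floor(math.log2(n))
rb_high = math.floor(2 * math.log2(n + 1))

# B-tree: largest h with 2d^h - 1 <= n; a search visits levels 0..h.
h = 0
while 2 * d ** (h + 1) - 1 <= n:
    h += 1
btree_levels = h + 1

rb_time = rb_high * disk_access          # worst case, one page per level
btree_time = btree_levels * disk_access
```

Under these assumptions the red-black tree may need up to 60 page reads (about 60 ms), while the B-tree needs 3 (about 3 ms).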

B-Trees – What are they good for?

• Large degree B-trees are used to represent very large disk dictionaries. The minimum degree d is chosen according to the size of a disk block.

• Smaller degree B-trees are used for internal-memory dictionaries to overcome cache-miss penalties.

• B-trees with d=2, i.e., 2-4 trees, are very similar to Red-Black trees.

Updates to a B-tree

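Rotate/Steal right, Rotate/Steal left

[Figure: Rotate/Steal right and Rotate/Steal left move one key between two adjacent siblings through their parent; A and B label the keys involved.]

Number of I/Os – O(1)

Number of operations – O(d)

Split, Join

[Figure: before Split the parent holds keys A B and the child between them holds d−1 keys, C, d−1 keys; after Split the parent holds A C B and the child is replaced by two nodes with d−1 keys each. Join is the reverse operation.]

Number of I/Os – O(1)

Number of operations – O(d)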

Insert

Insert(T,2)

Before: root 13; internal nodes 5 10 | 15 28; leaves 1 3 | 6 | 11 | 14 | 16 17 | 30 40 50

After: root 13; internal nodes 5 10 | 15 28; leaves 1 2 3 | 6 | 11 | 14 | 16 17 | 30 40 50

Insert(T,4)

The leaf overflows: root 13; internal nodes 5 10 | 15 28; leaves 1 2 3 4 | 6 | 11 | 14 | 16 17 | 30 40 50

Split

The leaf 1 2 3 4 splits into 1 and 3 4, and 2 moves up: root 13; internal nodes 2 5 10 | 15 28; leaves 1 | 3 4 | 6 | 11 | 14 | 16 17 | 30 40 50

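Splitting an overflowing node

[Figure: before the split the parent holds keys A B and the overflowing child holds 2d keys; after the split the parent holds A C B and the child is replaced by two nodes, one with d−1 keys and one with d keys.]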

Another insert

Insert(T,7)

Before: root 13; internal nodes 2 5 10 | 15 28; leaves 1 | 3 4 | 6 | 11 | 14 | 16 17 | 30 40 50

After: root 13; internal nodes 2 5 10 | 15 28; leaves 1 | 3 4 | 6 7 | 11 | 14 | 16 17 | 30 40 50

and another insert

Insert(T,8)

After: root 13; internal nodes 2 5 10 | 15 28; leaves 1 | 3 4 | 6 7 8 | 11 | 14 | 16 17 | 30 40 50

and the last for today

Insert(T,9)

The leaf overflows: root 13; internal nodes 2 5 10 | 15 28; leaves 1 | 3 4 | 6 7 8 9 | 11 | 14 | 16 17 | 30 40 50

Split

The leaf 6 7 8 9 splits into 6 and 8 9, and 7 moves up; the internal node now overflows: root 13; internal nodes 2 5 7 10 | 15 28; leaves 1 | 3 4 | 6 | 8 9 | 11 | 14 | 16 17 | 30 40 50

Split

The internal node 2 5 7 10 splits into 2 and 7 10, and 5 moves up to the root: root 5 13; internal nodes 2 | 7 10 | 15 28; leaves 1 | 3 4 | 6 | 8 9 | 11 | 14 | 16 17 | 30 40 50

Insert – Bottom up

Find the insertion point by a downward search

Insert the key in the appropriate place

If the current node is overflowing, split it

If its parent is now overflowing, split it, etc.

Disadvantages:

Need both a downward scan and an upward scan

Need to keep parents on a stack

Nodes are temporarily overflowing
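The bottom-up procedure can be sketched as below, under a few assumptions: keys are distinct, nodes are plain Python objects (`Node` and `BTree` are hypothetical names), and an overflowing node with 2d keys is split into d−1 keys, a median and d keys, which matches the slides' example where 1 2 3 4 becomes 1, 2 and 3 4. The usage replays the inserts from the walkthrough.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []     # empty for a leaf


class BTree:
    def __init__(self, d, root=None):
        self.d = d
        self.root = root or Node()


def insert_bottom_up(tree, k):
    d = tree.d
    # Downward scan: remember (parent, child index) pairs on a stack.
    stack, x = [], tree.root
    while x.children:
        i = bisect_left(x.keys, k)
        stack.append((x, i))
        x = x.children[i]
    x.keys.insert(bisect_left(x.keys, k), k)       # the leaf may now hold 2d keys
    # Upward scan: split while the current node is overflowing.
    while len(x.keys) > 2 * d - 1:
        mid = d - 1
        median = x.keys[mid]
        right = Node(x.keys[mid + 1:], x.children[mid + 1:])   # d keys
        x.keys, x.children = x.keys[:mid], x.children[:mid + 1]  # d-1 keys stay
        if stack:
            parent, i = stack.pop()
            parent.keys.insert(i, median)          # median moves up
            parent.children.insert(i + 1, right)
            x = parent
        else:
            tree.root = Node([median], [x, right])  # the root itself was split
            break


# The tree from the slides before Insert(T,2), then the five inserts shown.
tree = BTree(2, Node([13], [
    Node([5, 10], [Node([1, 3]), Node([6]), Node([11])]),
    Node([15, 28], [Node([14]), Node([16, 17]), Node([30, 40, 50])]),
]))
for k in [2, 4, 7, 8, 9]:
    insert_bottom_up(tree, k)
```

The explicit stack is exactly the "keep parents on a stack" cost listed among the disadvantages.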

Insert – Top down

While conducting the search, split full children on the search path before descending to them!

When the appropriate leaf is reached, it is not full, so the new key may be added!

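Split-Root(T)

[Figure: the full root T.root holds d−1 keys, C, d−1 keys; after the split, T.root is a new node holding only C, with two children of d−1 keys each.]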

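Split-Child(x,i)

[Figure: before, x holds A B and the full child x.child[i] holds d−1 keys, C, d−1 keys; after, x holds A C B and the child is replaced by two nodes of d−1 keys each. The figure marks x, key[i] and x.child[i] before and after.]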

Insert – Top down

While conducting the search, split full children on the search path before descending to them!

Number of I/Os – $O(\log_d n)$

Number of operations – $O(d \log_d n)$
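A minimal sketch of top-down insertion, with `split_child` and `split_root` playing the roles of Split-Child(x,i) and Split-Root(T). The classes, the list-based node layout, the `check` helper and the sample key sequence are assumptions made for illustration; keys are assumed distinct.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []     # empty for a leaf


class BTree:
    def __init__(self, d):
        self.d = d
        self.root = Node()


def split_child(x, i, d):
    """Split the full child x.children[i] (2d-1 keys) around its median key."""
    y = x.children[i]
    z = Node(y.keys[d:], y.children[d:])                   # right half: d-1 keys
    x.keys.insert(i, y.keys[d - 1])                        # median moves up into x
    x.children.insert(i + 1, z)
    y.keys, y.children = y.keys[:d - 1], y.children[:d]    # left half: d-1 keys


def split_root(tree):
    old = tree.root
    tree.root = Node([], [old])            # new empty root above the old one
    split_child(tree.root, 0, tree.d)


def insert_top_down(tree, k):
    d = tree.d
    if len(tree.root.keys) == 2 * d - 1:
        split_root(tree)
    x = tree.root
    while x.children:
        i = bisect_left(x.keys, k)
        if len(x.children[i].keys) == 2 * d - 1:
            split_child(x, i, d)           # split the full child before descending
            if k > x.keys[i]:
                i += 1                     # k belongs to the new right half
        x = x.children[i]
    x.keys.insert(bisect_left(x.keys, k), k)   # the leaf reached is not full


def check(x, d, is_root=True):
    """Return (in-order keys, height) of x's subtree, asserting the B-tree rules."""
    assert len(x.keys) <= 2 * d - 1 and (is_root or len(x.keys) >= d - 1)
    if not x.children:
        return list(x.keys), 0
    assert len(x.children) == len(x.keys) + 1
    out, heights = [], set()
    for j, c in enumerate(x.children):
        keys, h = check(c, d, False)
        out += keys + x.keys[j:j + 1]
        heights.add(h)
    assert len(heights) == 1               # all leaves are at the same depth
    return out, heights.pop() + 1


tree = BTree(d=2)
keys = [13, 5, 10, 15, 28, 1, 3, 6, 11, 14, 16, 17, 30, 40, 50, 2, 4, 7, 8, 9]
for k in keys:
    insert_top_down(tree, k)
inorder, height = check(tree.root, tree.d)
```

Because full nodes are split on the way down, no parent stack is needed and no node ever holds more than 2d−1 keys.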

Deletions from B-Trees

delete(T,26)

Before: root 7 15; internal nodes 3 | 10 13 | 22 28; leaves 1 2 | 4 6 | 8 9 | 11 12 | 14 | 20 | 24 26 | 30 40 50

After: root 7 15; internal nodes 3 | 10 13 | 22 28; leaves 1 2 | 4 6 | 8 9 | 11 12 | 14 | 20 | 24 | 30 40 50

Delete (Replace with predecessor)

delete(T,13)

13 is in an internal node, so it is replaced by its predecessor 12, which is then removed from its leaf: root 7 15; internal nodes 3 | 10 12 | 22 28; leaves 1 2 | 4 6 | 8 9 | 11 | 14 | 20 | 24 | 30 40 50

Delete (steal from sibling)

delete(T,24)

Removing 24 empties its leaf, so 28 moves down from the parent and 30 moves up from the sibling 30 40 50: root 7 15; internal nodes 3 | 10 12 | 22 30; leaves 1 2 | 4 6 | 8 9 | 11 | 14 | 20 | 28 | 40 50

[Figure: Rotate/Steal right and Rotate/Steal left, repeated from the slide on updates to a B-tree.]

Delete

delete(T,20)

Before: root 7 15; internal nodes 3 | 10 12 | 22 30; leaves 1 2 | 4 6 | 8 9 | 11 | 14 | 20 | 28 | 40 50

Delete (Join)

Removing 20 empties its leaf and the sibling 28 has only d−1 keys, so the empty leaf, 22 and 28 are joined: root 7 15; internal nodes 3 | 10 12 | 30; leaves 1 2 | 4 6 | 8 9 | 11 | 14 | 22 28 | 40 50

Few more..

delete(T,22)

After: root 7 15; internal nodes 3 | 10 12 | 30; leaves 1 2 | 4 6 | 8 9 | 11 | 14 | 28 | 40 50

delete(T,28)

Stealing again

Removing 28 empties its leaf, so 30 moves down from the parent and 40 moves up from the sibling 40 50: root 7 15; internal nodes 3 | 10 12 | 40; leaves 1 2 | 4 6 | 8 9 | 11 | 14 | 30 | 50

Another one

delete(T,30)

Removing 30 empties its leaf and the sibling 50 has only d−1 keys.

After Join

The empty leaf, 40 and 50 are joined, leaving their parent with no keys: root 7 15; internal nodes 3 | 10 12 | (empty); leaves 1 2 | 4 6 | 8 9 | 11 | 14 | 40 50

Now we can steal

15 moves down from the root into the empty node, 12 moves up from the sibling 10 12, and the leaf 14 moves over with it: root 7 12; internal nodes 3 | 10 | 15; leaves 1 2 | 4 6 | 8 9 | 11 | 14 | 40 50

More ?

delete(T,40)

Delete – Top down

While conducting the search, make sure that each child you descend to contains at least d keys

How? Steal or join

Assume, at first, that the item to be deleted is in a leaf

When the item is located, it resides in a leaf containing at least d keys, so it can be removed

[Figure: if the child has d−1 keys and an adjacent sibling has at least d keys, Rotate! (Steal); if the child and its sibling both have d−1 keys, Join!]

Delete – Top down

What if the item to be deleted is in an internal node?

Descend as before from the root until the item to be deleted is located

Keep a pointer to the node containing the item

Carry on descending towards the successor, making sure that nodes contain at least d keys

When the successor is found, delete it from its leaf and use it to replace the item to be deleted

Deletions from B-Trees

As always, similar, but slightly more complicated than insertions

(may need to replace with successor)

Deletion is slightly simpler for B+-Trees

B-Trees vs. B+-Trees

• In a B-tree each node contains items and keys
• In a B+-tree leaves contain items and keys. Internal nodes contain keys to direct the search.
• Keys in internal nodes are either keys of existing items, or keys of items that were deleted.
• Internal nodes may contain more keys so overall the # of items we can store increases