Cache-Oblivious B-Trees
Michael A. Bender, Erik D. Demaine, Martin Farach-Colton.
Presented by: Itai Lahan

Post on 19-Dec-2015

B-Trees – A reminder

Balanced search tree
All leaves have the same depth
All nodes have degree at most B
The root has degree at least two
All internal nodes have degree at least B/2

[R. Bayer and E. M. McCreight. 1972]

Node size: O(B)
Height: O(log_B N)

B-Trees - example

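[Figure: example B-tree. Root: 11. Internal nodes: (5 8) and (16 21). Leaves: (1 2 3 4), (6 7), (9 10), (12 13 14 15), (17 18 19 20), (22 23 24 25).]

Searching for 18

[Figure: the same tree. The search reads the root 11, then (16 21), then the leaf (17 18 19 20), where 18 is found.]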
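The search on the example tree can be sketched in a few lines. This is a minimal illustration, not the presenter's code: the `(keys, children)` tuple representation, the `leaf` helper and the `search` function are assumptions made for the example, and every node visited is counted as one block read.

```python
import bisect


def leaf(*keys):
    """Hypothetical helper: a leaf is a node with keys and no children."""
    return (list(keys), [])


# The example tree from the slide, written as (keys, children).
TREE = ([11], [
    ([5, 8], [leaf(1, 2, 3, 4), leaf(6, 7), leaf(9, 10)]),
    ([16, 21], [leaf(12, 13, 14, 15), leaf(17, 18, 19, 20),
                leaf(22, 23, 24, 25)]),
])


def search(node, key):
    """Return (found, nodes_read) for a top-down search."""
    reads = 0
    while True:
        keys, children = node
        reads += 1                           # one node = one block read
        i = bisect.bisect_left(keys, key)    # first key >= the search key
        if i < len(keys) and keys[i] == key:
            return True, reads
        if not children:                     # reached a leaf without a match
            return False, reads
        node = children[i]                   # descend into the i-th subtree


found, reads = search(TREE, 18)
```

Searching for 18 reads three nodes, one per level, which is the O(log_B N) behaviour the reminder slide refers to.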

B-Trees

Balanced search tree
Fan-out is proportional to memory block size
A single block read determines the next node
Classical two-level memory model solution

[R. Bayer and E. M. McCreight. 1972]

[Figure: a tree in which every node occupies one block of size B.]

B-Trees in multilevel memory hierarchy?

B1 > B2 > B3 > … > Bk: block sizes between memory levels

[Figure: a tree of B1-sized nodes. Each B1-sized node is in turn organized as a tree of B2-sized nodes, and each of those as a tree of B3-sized nodes.]

Cache Oblivious Algorithms

Reason about a simple two-level memory

Prove results about an unknown multilevel memory

Cache-oblivious B-tree? How do we achieve O(log_B N) without knowing B?

It's all about the memory layout…

No variable depends on hardware parameters

Van Emde Boas Layout [P. van Emde Boas, R. Kaas, and E. Zijlstra. 1977]

[Figure: a tree cut at the middle level into a top subtree A and bottom subtrees B1, …, Bk. Memory order: A, B1, B2, B3, B4, …, Bk-1, Bk.]

Recursive structures are stored contiguously

[Figure: the cut applied recursively. Each subtree i is itself split into a top part Ai and bottom parts Bi1, …, Bik. Memory order: A1, B11, …, B1k, A2, B21, …, B2k, …, Ak, Bk1, …, Bkk.]

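Van Emde Boas Layout

[Figure: a complete binary tree with 31 nodes, each labelled with its position in the van Emde Boas layout. The root is 1. Its left subtree occupies positions 2 to 16: top triple (2 3 4), then bottom triples (5 6 7), (8 9 10), (11 12 13), (14 15 16). Its right subtree occupies positions 17 to 31: top triple (17 18 19), then bottom triples (20 21 22), (23 24 25), (26 27 28), (29 30 31).]

Memory order: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31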
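The recursive numbering in the 31-node figure can be reproduced with a short sketch. The function name `veb_order` and the use of heap indices (root = 1, children of node v are 2v and 2v+1) are assumptions for illustration. The split rule, bottom subtrees whose height is the smallest power of two that is at least half the height, is also an assumption, chosen because it matches the numbering shown on the slide.

```python
def veb_order(height):
    """Heap indices of a complete binary tree, listed in van Emde Boas order."""

    def bottom_height(h):
        # Assumed split rule: smallest power of two that is >= h / 2.
        b = 1
        while 2 * b < h:
            b *= 2
        return b

    def layout(root, h):
        if h == 1:
            return [root]
        bottom = bottom_height(h)
        top = h - bottom
        order = layout(root, top)            # top subtree first
        first = root << top                  # leftmost root of a bottom subtree
        for r in range(first, first + (1 << top)):
            order.extend(layout(r, bottom))  # then each bottom subtree, left to right
        return order

    return layout(1, height)


order = veb_order(5)   # the 31-node tree
```

For height 5 the list starts 1, 2, 4, 5, 8, 16, 17: the root, then the top triple of the left subtree, then its first bottom triple, which corresponds to positions 1 to 7 in the figure. The right child of the root (heap index 3) lands at position 17.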

Van Emde Boas Layout

[Figure: a 27-node tree, each node labelled 1 to 27 with its position in the van Emde Boas layout.]

Memory order: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27

Van Emde Boas Layout

[Figure: the same 27-node tree and the array 1 2 3 … 27, with one search path marked.]

Searching with memory transfer block of size 9: 2 cache misses

Searching with memory transfer block of size 3: 4 cache misses

Searching with L2 memory transfer block of size 9 and with L1 memory transfer block of size 3: 4 L1 cache misses, 2 L2 cache misses
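The effect of the layout on cache misses can be checked numerically. The sketch below is an illustration under stated assumptions: it uses a complete binary tree of height 16 rather than the 27-node tree of the figure, heap indices for nodes, the same assumed power-of-two split rule, and hypothetical helper names (`veb_positions`, `blocks_touched`). It counts how many distinct blocks one root-to-leaf path touches, once in the van Emde Boas layout and once in a plain level-by-level layout.

```python
def veb_positions(height):
    """Map heap index (root = 1) -> position in the van Emde Boas layout."""
    order = []

    def layout(root, h):
        if h == 1:
            order.append(root)
            return
        bottom = 1
        while 2 * bottom < h:        # assumed rule: bottom height is a power of two
            bottom *= 2
        top = h - bottom
        layout(root, top)            # top subtree first
        first = root << top          # leftmost root of a bottom subtree
        for r in range(first, first + (1 << top)):
            layout(r, bottom)

    layout(1, height)
    return {node: pos for pos, node in enumerate(order)}


def blocks_touched(position, path, block_size):
    """Number of distinct blocks read while following a search path."""
    return len({position[node] // block_size for node in path})


HEIGHT = 16
path = [1 << d for d in range(HEIGHT)]       # root down to the leftmost leaf
veb = veb_positions(HEIGHT)
bfs = {node: node - 1 for node in veb}       # level-by-level layout, for comparison

veb_blocks = blocks_touched(veb, path, 15)   # 4 blocks
bfs_blocks = blocks_touched(bfs, path, 15)   # 13 blocks
```

With blocks of 15 cells this particular path reads 4 blocks in the van Emde Boas layout and 13 in the level-by-level layout. Other paths and block sizes can straddle block boundaries, so the count may be up to about twice as large, but it stays proportional to log N / log B.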

Static Cache Oblivious Tree

[Figure: the tree cut into contiguous subtrees of height about log B.]

#IOs per search? O(log N / log B) = O(log_B N)

[Prokop, 1999]
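One way to see where the bound comes from is the usual counting argument, sketched here with constants that hold only up to rounding. It assumes the recursive split is followed down to the first level at which a subtree has at most B nodes.

```latex
% Stop the recursive split at subtrees with at most B nodes.
% Such a subtree has height h' with (1/2) log B <= h' <= log B (up to rounding),
% and it is contiguous in memory, so it overlaps at most 2 blocks.
\[
\#\{\text{subtrees on a root-to-leaf path}\}
  \;\le\; \frac{\log N}{\tfrac{1}{2}\log B}
  \;=\; 2\log_B N ,
\qquad
\#\{\text{memory transfers}\}
  \;\le\; 2 \cdot 2\log_B N
  \;=\; O(\log_B N).
\]
```

The argument never uses the value of B in the layout itself, which is why the same layout works for every level of the memory hierarchy at once.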

From Static to Dynamic

[Figure: the sorted array 1 2 3 … 20.]

Static is fine, but… a dynamic B-tree should allow the addition and removal of nodes

How do we add an element efficiently to an ordered array?

Dynamic B-Tree – restating the problem

[Figure: inserting 10.5 into the sorted array 1 2 3 … 20.]

How do we add an element efficiently to an ordered array?

Trivial: O(N/B) memory transfers

Can we do better?

Contradictory goals: nodes should be packed densely for locality of reference, yet an insertion needs free space nearby

Guideline:
Distribute elements evenly in the array
Store any set of k contiguous elements x_i1, …, x_ik in a contiguous subarray of size O(k)

Dynamic B-Tree – restating the problem

[Figure: the elements x1 x2 x3 x4 x5 … xN spread over an array with gaps between them.]

General idea: an array with enough extra space between nodes for future insertions

Packed Memory Management [A. Itai, A. G. Konheim, and M. Rodeh 1981]

[Figure: an array of cells 1 … N holding x1 x2 x3 x4 x5 x6 with gaps between them; a new element x'2 is inserted next to x2.]

Weight = #Data/Space

[Figure: weights of the nested intervals around the insertion point, from smallest to largest. Before the insertion: 2/4 and 0/4 (neighbouring intervals), 2/8, 3/16, 6/32. After inserting x'2: 3/4 and 0/4, 3/8, 4/16, 7/32; the 3/4 interval is flagged (!) as too dense. After spreading x2 x'2 x3 over the two neighbouring intervals: 2/4 and 1/4.]

Size of the interval to rebalance? Thresholds for interval over/underflows?

Scanning k consecutive elements:
k elements are stored in O(k) memory cells
O(1 + k/B) memory transfers

Inserting/deleting an element:
Moves O(log^2 N) amortized consecutive cells
O(1 + log^2 N / B) amortized memory transfers
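The rebalancing idea can be made concrete with a toy packed array. This is a rough sketch under loudly stated assumptions, not the scheme from the cited paper: the class name `PackedArray`, the leaf window size of 4, and the density thresholds (1.0 for the smallest window, falling linearly to 0.5 for the whole array) are all chosen for illustration, and the insertion point is found by a linear scan to keep the code short.

```python
class PackedArray:
    """Toy packed-memory array: sorted values with gaps (None) between them."""

    def __init__(self, capacity=8, leaf=4):
        self.leaf = leaf                     # smallest window size (assumed)
        self.slots = [None] * capacity       # capacity = leaf * power of two

    def _max_density(self, size):
        # Assumed thresholds: 1.0 for a leaf window, 0.5 for the whole array,
        # linear in the window's level in between.
        levels = (len(self.slots) // self.leaf).bit_length() - 1
        level = (size // self.leaf).bit_length() - 1
        return 1.0 - 0.5 * level / levels if levels else 1.0

    def _spread(self, start, size, items):
        # Rewrite the window with its items evenly spaced.
        self.slots[start:start + size] = [None] * size
        for j, v in enumerate(items):
            self.slots[start + j * size // len(items)] = v

    def insert(self, value):
        """Insert value; return the number of cells in the rebalanced window."""
        # Slot of the first stored element >= value (linear scan for brevity).
        pos = next((i for i, v in enumerate(self.slots)
                    if v is not None and v >= value), len(self.slots) - 1)
        size = self.leaf
        while True:
            start = (pos // size) * size
            window = [v for v in self.slots[start:start + size] if v is not None]
            if (len(window) + 1) / size <= self._max_density(size):
                break                        # this window can absorb the element
            if size == len(self.slots):      # whole array too dense: double it
                self.slots.extend([None] * size)
                size = len(self.slots)
            else:
                size *= 2                    # try the enclosing window
        window.append(value)
        window.sort()
        self._spread(start, size, window)
        return size

    def items(self):
        return [v for v in self.slots if v is not None]


pa = PackedArray()
for v in [(i * 37) % 101 for i in range(101)]:   # a permutation of 0..100
    pa.insert(v)
```

Each insertion rewrites only the smallest enclosing window whose weight stays under its threshold, so most insertions touch a few consecutive cells and only occasionally a large interval. The two open questions on the slide (interval size and thresholds) are exactly the parameters this sketch fixes arbitrarily.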

Putting it all together…?

[Figure: a search tree over 1 2 3 … 20 whose nodes are stored in a packed array; inserting 8' shifts 9 and 10, and the moved nodes need their pointers fixed.]

It's still a tree!

Insertion moves O(log^2 N) amortized nodes
Requires O(log^2 N) back-pointer updates
And what about the van Emde Boas layout?

Splits and Merges

[Figure: a subtree laid out as A, B1, …, Bi, Bi+1, …, Bk. When the top part A splits into A' and A'', B1, …, Bi stay with A' and Bi+1, …, Bk go with A'', and the memory layout is rearranged to match. A merge reverses the process.]

Each update requires O(1 + log N / B) amortized memory transfers

Search: O(log_B N)
Scan: O(1 + k/B)

Putting it all together… again

[Figure: the three components combined: packed memory maintenance (x1 x2 … x6 with gaps), van Emde Boas layout (array 1 2 3 … 20), and a weight-balanced B-tree (the example tree with root 11).]

Update: O(log_B N + log^2 N)
  Search for key: O(log_B N)
  Create/delete leaf: O(1)
  Rearrange packed array: O(1 + log^2 N / B)
  Update back pointers: O(1 + log^2 N)
  Split/merge nodes: O(1 + log N / B)
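Adding up the per-step costs listed above shows which term dominates the update bound. The grouping into labelled terms is only a restatement of the slide's figures.

```latex
\begin{align*}
T_{\text{update}}
  &= \underbrace{O(\log_B N)}_{\text{search}}
   + \underbrace{O(1)}_{\text{create/delete leaf}}
   + \underbrace{O\!\left(1 + \tfrac{\log^2 N}{B}\right)}_{\text{packed array}}
   + \underbrace{O\!\left(1 + \log^2 N\right)}_{\text{back pointers}}
   + \underbrace{O\!\left(1 + \tfrac{\log N}{B}\right)}_{\text{split/merge}} \\
  &= O\!\left(\log_B N + \log^2 N\right).
\end{align*}
```

The back-pointer term is the only one not divided by B, so it is the term worth reducing.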

Saving on back pointer updates?

[Figure: the packed array of tree nodes again; inserting 8' moves neighbouring nodes, and every moved node needs its back pointer updated.]

How can we reduce back pointer updates?

Indirection

Idea: pack groups of leaves together beneath a single parent

[Figure: the leaves packed into groups of log N elements, each group beneath a single parent.]

Searching for an element?

O(log_B N) in the tree, plus O(log N / B) to scan one group of log N elements

Asymptotically the same!
But we've delayed pointer updates by O(log N)

Final build – two levels of indirection

[Figure: the tree T on top, an intermediate level L1, and a bottom level L0. L0 contains all the values.]

Search: O(log_B N)
Scan: O(1 + k/B)
Updates: O(log_B N + log^2 N / B)

The end… Presented by: Itai Lahan