Fusion Trees

Advanced Data Structures

Aris Tentes

Goal

Fixed Universe Successor Problem

We have a set of n numbers.

Each number has a length of at most log u bits (u = size of the fixed universe).

We want to perform the following operations:

1. Predecessor/Successor

2. Insertion/Deletion

in time better than O(log n)

Model

Transdichotomous RAM

Memory is composed of words.

Each word has a length of w = log u bits.

Each item we store must fit in a word.

The following operations require constant time:

1. Addition, Subtraction

2. Multiplication, Division

3. AND, OR, XOR

4. left/right Shift

5. Comparison

Main Idea

A fusion tree is a B-tree with fan-out $B = (\log n)^{1/5}$ and, therefore, has a height of $O(\log_B n) = O(\log n / \log\log n)$.

If we find a way to determine in constant time where a query fits among the B keys of a node, then we have a solution to our problem.

In the Nodes

Suppose that the keys (K) in a node are $x_0 < x_1 < \dots < x_{k-1}$. If we view them in a binary tree, then we have the following picture:

[Figure: the keys drawn as paths in a binary tree, with the branching nodes in black.]

The black nodes are the branching nodes. For k keys, there are exactly k-1 branching nodes. However, some of them may be in the same level. Thus, fewer than k bits are required to distinguish the $x_i$'s.
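As a small illustration (a sketch, not taken from the slides), the branching levels of a node can be read off adjacent keys in sorted order: two neighbouring keys split at the most significant bit in which they differ. The helper name `branching_bits` is hypothetical, and Python's `int.bit_length` stands in for a constant-time msb.

```python
def branching_bits(keys):
    """Return the sorted bit positions at which the binary trie of `keys` branches."""
    keys = sorted(keys)
    levels = set()
    for a, b in zip(keys, keys[1:]):
        # Most significant bit in which two neighbouring keys differ.
        levels.add((a ^ b).bit_length() - 1)
    return sorted(levels)


# Four keys give three branching nodes, but only two distinct levels.
example_keys = [0b0000, 0b0010, 0b1100, 0b1111]
example_levels = branching_bits(example_keys)  # [1, 3]
```

Here bit 1 separates 0000 from 0010 and 1100 from 1111, while bit 3 separates the two pairs, so two bits are enough to tell the four keys apart.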

We construct the set B(K) of the branching levels (namely, the bit positions required to distinguish the keys).

Let $B(K) = \{b_0, \dots, b_{r-1}\}$ with $b_0 < b_1 < \dots < b_{r-1}$ and $r \le k-1 = O(B)$.

Def.: PerfectSketch(x) = the bits of x extracted according to B(K), namely the bits of x at the positions $b_0, \dots, b_{r-1}$.

If we collect the perfect sketches of all k keys, then we are able to reduce the node representation to k r-bit strings.

That means that $k \cdot r = O(B^2)$ bits would be sufficient. Less than a word!
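The definition can be made concrete with a minimal sketch that extracts the bits one at a time (so it is not constant time, which is exactly why an approximation is needed). The name `perfect_sketch` and the example keys are assumptions made for illustration.

```python
def perfect_sketch(x, bits):
    """Pack the bits of x found at positions `bits` (ascending) into len(bits) bits."""
    out = 0
    for i, b in enumerate(bits):
        out |= ((x >> b) & 1) << i
    return out


keys = [0b0000, 0b0010, 0b1100, 0b1111]
bits = [1, 3]  # branching levels of these keys
sketches = [perfect_sketch(x, bits) for x in keys]  # [0, 1, 2, 3]
```

Because the keys differ from their neighbours exactly at branching levels, the sketches appear in the same order as the keys.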

However, computing PerfectSketch(x) is difficult. Therefore, we compute an approximation, called Sketch(x).

Sketch(x) contains the same bits as PerfectSketch(x), in the same order, with some extra 0's in between, but in consistent positions.

This is done by multiplying x by a number m; we will see later how to choose it.

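Firstly, we compute $x' = x \text{ AND } \sum_{i=0}^{r-1} 2^{b_i}$, leaving only the bits which correspond to B(K).

If $m = \sum_{j=0}^{r-1} 2^{m_j}$, then we observe that $x' \cdot m = \sum_{i=0}^{r-1} \sum_{j=0}^{r-1} x_{b_i} 2^{b_i + m_j}$, where $x_{b_i}$ denotes the bit of x at position $b_i$.

All we need is to find an m such that:

1. All $b_i + m_j$ are distinct (no collisions)

2. $b_0 + m_0 < b_1 + m_1 < \dots < b_{r-1} + m_{r-1}$ (to preserve order)

3. The $b_i + m_i$ are concentrated in a small range (of $r^4$ bit positions)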

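If we find such an m, then we compute

$\mathrm{Sketch}(x) = \big( (x' \cdot m) \text{ AND } \sum_{i=0}^{r-1} 2^{b_i + m_i} \big) \gg (b_0 + m_0)$,

where $x'$ is x with only the bits of B(K) kept. The result is $r^4 = O(B^4) = O(\log^{4/5} n)$ bits long.

Note that k sketches fit in a word, since $k \cdot r^4 = O(B^5) = O(\log n) = O(w)$ bits.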

Can we find such an m?

Firstly, we show how to find $m'_1, \dots, m'_r$ such that $m'_i + b_h \not\equiv m'_j + b_g \pmod{r^3}$ whenever $i \ne j$.

Suppose we have found $m'_1, \dots, m'_t$ with the desired property. We observe that $m'_{t+1} + b_h \equiv m'_i + b_g$ implies $m'_{t+1} \equiv m'_i + b_g - b_h$. Thus we can choose $m'_{t+1}$ to be the least residue not represented among the fewer than $r^3$ residues of the form $m'_i + b_g - b_h$, $1 \le i \le t$, $1 \le g, h \le r$.

Then, by adding suitable multiples of $r^3$ we obtain the final values of $m_i$.
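The greedy construction can be sketched as follows. This is a minimal illustration under stated assumptions, not the slides' exact procedure: the function names are hypothetical, Python integers are unbounded (so the product is never truncated to a word), and the final step places each $b_i + m_i$ in its own window of $r^3$ positions starting at position w, which is one possible way of "adding suitable multiples of $r^3$".

```python
def choose_shifts(bits, w):
    """Pick shifts m_i so that all sums b + m are distinct and b_i + m_i increases."""
    r = len(bits)
    mod = r ** 3
    residues = []  # the m'_i, chosen greedily modulo r^3
    for _ in range(r):
        taken = {(mp + bg - bh) % mod
                 for mp in residues for bg in bits for bh in bits}
        # Fewer than r^3 residues are excluded, so a free one always exists.
        residues.append(next(c for c in range(mod) if c not in taken))
    shifts = []
    for i, (mp, b) in enumerate(zip(residues, bits)):
        target = w + i * mod                     # start of window i (assumption)
        lift = -(-(target - mp - b) // mod) * mod  # smallest multiple reaching the window
        shifts.append(mp + lift)
    return shifts


def make_sketch(bits, w):
    """Return a function computing Sketch(x) with one AND, one multiply, one AND, one shift."""
    shifts = choose_shifts(bits, w)
    bit_mask = sum(1 << b for b in bits)
    m = sum(1 << s for s in shifts)
    out_mask = sum(1 << (b + s) for b, s in zip(bits, shifts))
    low = bits[0] + shifts[0]

    def sketch(x):
        return (((x & bit_mask) * m) & out_mask) >> low

    return sketch


keys = [0b00010110, 0b01000001, 0b01001100, 0b11100000]
bits = [3, 6, 7]  # branching levels of these keys
sketch = make_sketch(bits, 8)
sketched = [sketch(x) for x in keys]  # increasing, like the keys themselves
```

Since every $b_i + m_j$ is distinct, the multiplication produces no carries, and masking with the positions $b_i + m_i$ recovers the wanted bits in their original order.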

The set of the sketched keys of a node is denoted by S(K)

Def.: We define the sketch of an entire node as follows:

Lemma

Suppose y is an arbitrary number and $x_i$ an element of K (the set of keys). Let $b_1 < b_2 < \dots < b_r$ be the elements of B(K) and m-1 the most significant bit position in which PerfectSketch(y) and PerfectSketch($x_i$) differ.

Assume that $p > b_m$ is the most significant position in which y and $x_i$ differ.

Then rank(y) in K is uniquely determined by the interval $(b_j, b_{j+1})$ containing p and the relative order between y and $x_i$.

Using the previous lemma, we can reduce the computation of rank(y) in K to computing rank(Sketch(y)) in S(K).

Having computed rank(Sketch(y)), we have determined the predecessor Sketch($x_i$) and the successor Sketch($x_{i+1}$) of Sketch(y) in S(K).

If $x_i \le y \le x_{i+1}$, then we are done. Otherwise, of these two keys we pick the one that shares the longest prefix of most significant bits with y, and apply the previous lemma.

The rank determined by the lemma is read from a lookup table.
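The query inside one node can be sketched as below. Several things here are assumptions made for illustration: perfect sketches are used instead of the multiplication-based ones, `bisect` replaces the constant-time rank computation, and instead of a lookup table the code builds an auxiliary value e (the largest or smallest value in the subtree where y leaves the trie) and ranks its sketch. The function names are hypothetical, and rank(y) is taken to mean the number of keys less than or equal to y.

```python
from bisect import bisect_left, bisect_right


def branching_bits(keys):
    return sorted({(a ^ b).bit_length() - 1 for a, b in zip(keys, keys[1:])})


def perfect_sketch(x, bits):
    return sum(((x >> b) & 1) << i for i, b in enumerate(bits))


def node_rank(keys, y):
    """Number of keys <= y, for a sorted list of distinct non-negative keys."""
    bits = branching_bits(keys)
    sketches = [perfect_sketch(x, bits) for x in keys]
    i = bisect_right(sketches, perfect_sketch(y, bits))
    # Sketch-neighbours of y: one of them shares the longest prefix with y.
    candidates = [keys[j] for j in (i - 1, i) if 0 <= j < len(keys)]
    best = min(candidates, key=lambda x: (x ^ y).bit_length())
    if best == y:
        return bisect_right(keys, y)
    p = (best ^ y).bit_length() - 1      # most significant differing position
    prefix = (y >> (p + 1)) << (p + 1)   # common prefix, low bits cleared
    if (y >> p) & 1:
        # y is above every key below this branch: rank the largest value there.
        e = prefix | ((1 << p) - 1)
        return bisect_right(sketches, perfect_sketch(e, bits))
    # y is below every key in this branch: count keys strictly below its minimum.
    e = prefix | (1 << p)
    return bisect_left(sketches, perfect_sketch(e, bits))
```

For example, with keys 2, 6 and 12 the query y = 9 has the same rank as e = 12 counted strictly, giving rank 2.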

Finding rank(Sketch(y)) in S(K)

Firstly, we compute

Then the subtraction

And finally

Observing that

Suitable multiplication sums these ones and gives the desired rank.
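A minimal sketch of this parallel comparison, assuming the usual layout in which every sketch in the node word is preceded by a 1 bit and Sketch(y) is replicated with a 0 bit in front of each copy. The names `pack_node` and `parallel_rank`, the field width L, and the convention that the rank counts sketches strictly smaller than Sketch(y) are assumptions. In a real node the constants would be precomputed.

```python
def pack_node(sketches, L):
    """Concatenate fields '1 sketch' of L+1 bits each; sketch i sits in field i."""
    word = 0
    for i, s in enumerate(sketches):
        word |= ((1 << L) | s) << (i * (L + 1))
    return word


def parallel_rank(node_word, k, L, sy):
    """Count the sketches in node_word that are smaller than sy."""
    stride = L + 1
    ones = sum(1 << (i * stride) for i in range(k))  # 0..01 repeated k times
    diff = node_word - sy * ones          # one subtraction compares all fields
    flags = diff & (ones << L)            # leading bit survives iff sketch_i >= sy
    # Multiplying by `ones` adds all flags into the field with index k-1.
    total = (((flags >> L) * ones) >> ((k - 1) * stride)) & ((1 << stride) - 1)
    return k - total


word = pack_node([1, 4, 6, 9], 4)
ranks = [parallel_rank(word, 4, 4, s) for s in (0, 5, 9, 15)]  # [0, 2, 3, 4]
```

No borrow crosses a field boundary, because each field holds $2^L + \text{sketch}_i - \text{sketch}_y \ge 1$ after the subtraction.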

What remains is to find a way to compute, in constant time, the most significant bit in which two numbers u, v differ.

We can easily see that this problem reduces to the problem of finding the most significant bit of u XOR v.

We want to compute msb(x).

Lemma

We call a number x d-sparse if the positions of its one bits belong to a set of the form $Y = \{a + di \mid 0 \le i < d\}$. Not all these positions have to be occupied by ones.

If x is d-sparse, then there exist constants y, y' such that for z = (yx) AND y' the i-th bit of z equals the bit at position a+di of x. Namely, z is a perfect compression of x.
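One possible choice of the constants can be sketched as follows (an illustration, not necessarily the constants intended on the slide). The multiplier sends the bit at position a+di to position base+i, with base = a + (d-1)^2 chosen here so that all shifts are non-negative. The final right shift and the name `compress_sparse` are assumptions, and Python's unbounded integers hide the word-size bookkeeping.

```python
def compress_sparse(x, a, d):
    """Gather the bits of x at positions a, a+d, ..., a+(d-1)d into d adjacent bits."""
    base = a + (d - 1) ** 2
    # Shift j moves position a + d*j to position base + j.
    y = sum(1 << (base - a - (d - 1) * j) for j in range(d))
    y_mask = ((1 << d) - 1) << base
    z = (x * y) & y_mask
    return z >> base


# Positions 1, 4, 7 hold the pattern 1, 0, 1.
packed = compress_sparse((1 << 1) | (1 << 7), a=1, d=3)  # 0b101
```

The product contains the bit at a+di shifted by every j, at positions base + j + d(i-j). These are all distinct, so nothing collides, and only the terms with i = j fall under the mask.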

msb(x)

At first, consider a partitioning of the w bits of our word x into consecutive blocks of $s = \sqrt{w}$ bits. The computation is divided into two phases.

1. We find the leftmost block containing a one and we extract this block

2. We find the leftmost one in this extracted block.

First Phase

Let $C_1$ be the number which has ones precisely in the leftmost position of each block, namely $C_1 = \sum_{i=1}^{s} 2^{is-1}$, and $C_2 = \overline{C_1}$ (the bitwise complement of $C_1$).

We compute lead(x), in which the leftmost bit of each block is one iff x contains a one in this block. It is given by

$\mathrm{lead}(x) = \big( C_1 - [ (C_1 - (x \text{ AND } C_2)) \text{ AND } C_1 ] \big) \text{ OR } (x \text{ AND } C_1)$

We observe that lead(x) is d-sparse, so we can apply the previous lemma and obtain compress(x).
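A minimal sketch of lead(x), assuming a word of w = s*s bits split into s blocks of s bits, with C_1 holding a one in the leftmost position of each block and C_2 its complement within the word. The function name `lead` follows the slide; the word-size assumption is ours.

```python
def lead(x, s):
    """Set the leftmost bit of each s-bit block iff the block of x is non-zero."""
    w = s * s
    c1 = sum(1 << (i * s - 1) for i in range(1, s + 1))  # leftmost bit of each block
    c2 = ((1 << w) - 1) ^ c1                             # every other bit
    # C1 - (x AND C2): the leftmost bit of a block survives iff its low bits are all zero.
    low_bits_zero = (c1 - (x & c2)) & c1
    # Flip that test, then add the blocks whose leftmost bit was already set.
    return (c1 - low_bits_zero) | (x & c1)


flags = lead(0b0100, 2)  # only the left block contains a one: 0b1000
```

The subtraction never borrows across blocks, because each block of C_1 is at least as large as the low bits taken away from it.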

Let $P = \{2^0, \dots, 2^{b/s - 1}\}$ be the set of the first b/s powers of two, where b = w is the number of bits in a word.

We compute b' = rank(compress(x)) in P, in the same way as before.

Note that b' identifies the block number (counting from the right) of the leftmost block of x containing a one.

The position of the most significant one in lead(x) is f = sb'.

To extract the desired block, we multiply by a power of two whose exponent is determined by b, f and s, and right-justify the significant portion.

Second Phase

We want to find the position of the leftmost one in the extracted block.

As before, we do a rank computation of these s bits with the first s powers of two.

Now we have all the information needed to compute msb(x)
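The two phases can be put together in a short sketch. This only illustrates the structure under stated assumptions: the word has w = s*s bits, the block summary is built with a plain loop standing in for lead(x) and compress(x), and the rank among powers of two is a simple count rather than a parallel comparison. The function names are hypothetical.

```python
def rank_in_powers(v, count):
    """Number of powers 2^0, ..., 2^(count-1) that are <= v."""
    return sum(1 for i in range(count) if (1 << i) <= v)


def msb_two_phase(x, s):
    """Index of the most significant one of x, for 0 < x < 2^(s*s)."""
    block_mask = (1 << s) - 1
    # Phase 1: bit i of the summary is set iff block i (from the right) is non-zero.
    summary = sum(1 << i for i in range(s) if (x >> (i * s)) & block_mask)
    block_no = rank_in_powers(summary, s)      # 1-based, counted from the right
    block = (x >> ((block_no - 1) * s)) & block_mask
    # Phase 2: the same rank computation inside the extracted block.
    pos = rank_in_powers(block, s)             # 1-based position inside the block
    return (block_no - 1) * s + pos - 1


differing_bit = msb_two_phase(0b0110 ^ 0b0100, 2)  # keys differ first at bit 1
```

Applied to u XOR v, this gives the most significant bit in which two keys differ, which is what the node query needs.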

Conclusions

In the static case, the successor and predecessor problem is clearly solvable in $O(\log n / \log\log n)$ time, since this is the height of our B-tree and the computation in each node requires constant time (the data we need is precomputed).

In the dynamic case, the total time to update a node is

The amortized number of node updates per insertion/deletion in a B-tree is constant. Therefore, sorting requires $O(n \log n / \log\log n)$ time.