CS213d Data Structures and Algorithms
Binary Search Trees

Milind Sohoni
IIT Bombay and IIT Dharwad

April 11, 2017

A typical requirement

Set S ⊆ U, where U is totally ordered. Examples: strings under lexicographic order; integers under ≤.

Insert: add an element, S = S ∪ {x}.

Find: answer whether x ∈ S.

Delete: delete an element x, i.e., S = S − {x}.

Typical Implementations.

Sorted array A. If |A| = n, then insert and delete take O(n) operations, but find takes only O(log(n)) operations (binary search).
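A minimal C++ sketch of the sorted-array implementation, assuming the set is kept in ascending order in a `std::vector<int>` (the function names are hypothetical). Find is a binary search; insert and delete also binary-search for the position, but then shift every later element by one slot, which is where the O(n) comes from.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Find: binary search on the sorted array, O(log(n)) comparisons.
bool sortedFind(const std::vector<int>& A, int x) {
    return std::binary_search(A.begin(), A.end(), x);
}

// Insert: O(log(n)) to find the slot, then O(n) to shift the tail right.
void sortedInsert(std::vector<int>& A, int x) {
    auto pos = std::upper_bound(A.begin(), A.end(), x);  // after any equal copies
    A.insert(pos, x);                                    // moves every later element
}

// Delete one copy of x: O(log(n)) to locate it, then O(n) to shift the tail left.
bool sortedErase(std::vector<int>& A, int x) {
    auto pos = std::lower_bound(A.begin(), A.end(), x);
    if (pos == A.end() || *pos != x) return false;       // x not in S
    A.erase(pos);
    return true;
}
```

Duplicates are allowed here; a new copy is placed after the existing ones.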

What is O(f(n))?

The actual number of operations T(n) satisfies c1 · f(n) ≤ T(n) ≤ c2 · f(n) for some constants c1, c2 > 0 and all large enough n. Different programmers and different machines will yield different constants.

Do the constants matter? YES and NO. For small devices such as mobile phones and satellite systems, they matter. For most applications the growth rate dominates: for large n, 100 · n ≪ 5n log(n) ≪ 0.01 · n².
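How large "large" is can be worked out directly. Taking log to base 2 (an assumption; the slide does not fix the base), the two crossovers are:

```latex
\[
100\,n < 5\,n\log_2 n \iff \log_2 n > 20 \iff n > 2^{20} \approx 1.05 \times 10^{6},
\]
\[
5\,n\log_2 n < 0.01\,n^2 \iff 500\log_2 n < n,
\quad\text{which holds for all } n \ge 2^{13} = 8192 \;(500 \cdot 13 = 6500 < 8192).
\]
```

So the middle comparison only kicks in beyond roughly a million elements, while the quadratic term overtakes n log(n) already in the thousands.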

Linked lists

[Figure: a sorted linked list, head → 4 → 8 → 99]

Analysis

insert  O(n)   Travel along the list to
delete  O(n)   locate the right place.
find    O(n)   Once located, the operation is easy.

Even worse than a sorted array. Reason: the inability to access elements at random. Where are linked lists good? Addition/deletion at either end, i.e., queues and stacks.
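For comparison, a C++ sketch of a sorted singly linked list (struct and function names are hypothetical). The walk in `locate` is the O(n) part; the splice itself rewrites a single pointer, which is what "once located, the operation is easy" means.

```cpp
#include <cassert>

// A node of a singly linked list kept in ascending order.
struct ListNode {
    int data;
    ListNode* next;
};

// Find: walk from head until we reach x or pass it. O(n) pointer-chasing.
bool listFind(const ListNode* head, int x) {
    while (head != nullptr && head->data < x) head = head->next;
    return head != nullptr && head->data == x;
}

// Return the link (head itself, or some node's next field) where x belongs. O(n).
ListNode** locate(ListNode*& head, int x) {
    ListNode** link = &head;
    while (*link != nullptr && (*link)->data < x) link = &(*link)->next;
    return link;
}

// Insert: locate, then splice a new node in with one pointer update.
void listInsert(ListNode*& head, int x) {
    ListNode** link = locate(head, x);
    ListNode* fresh = new ListNode{x, *link};
    *link = fresh;
}

// Delete: locate the first node holding x, then unlink it with one pointer update.
bool listDelete(ListNode*& head, int x) {
    ListNode** link = locate(head, x);
    if (*link == nullptr || (*link)->data != x) return false;
    ListNode* dead = *link;
    *link = dead->next;
    delete dead;
    return true;
}
```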

Heaps

Analysis

insert  O(log(n))
delete  O(log(n))
find    O(n)

[Figure: a min-heap with entries, level by level, 1; 2, 3; 4, 39, 78, 5; 5]

Delete: Similar to delmin.

Find: Still hard.

The tree makes insert and delete easy, but find is hard: the opposite of the sorted array.

Is there any structure which has the advantages of both, i.e., keeps an order but retains the ease of addition/deletion?
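Why find stays hard in a heap can be seen in a small C++ sketch, assuming a min-heap stored in a `std::vector<int>` in the usual array layout, with the children of index i at 2i+1 and 2i+2 (`heapFind` is a hypothetical name). Heap order only lets us discard a subtree whose root already exceeds x; otherwise both children must be explored, so the worst case visits all n entries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Search for x in the min-heap H, starting at index i.
// Every entry below H[i] is >= H[i], so a subtree with H[i] > x can be skipped,
// but in general both children must be searched: O(n) in the worst case.
bool heapFind(const std::vector<int>& H, int x, std::size_t i = 0) {
    if (i >= H.size() || H[i] > x) return false;   // empty, or pruned by heap order
    if (H[i] == x) return true;
    return heapFind(H, x, 2 * i + 1) || heapFind(H, x, 2 * i + 2);
}
```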

BST Definition

Binary Search Tree: a binary tree with entries in U such that the in-order traversal is in ascending (non-decreasing) order.

[Figure: two binary search trees of different shapes holding the same entries; for both, Inorder=[1 2 3 3 4 5 5 6 7 8 9]]

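Note that the structure need not be a heap-tree (a complete binary tree).

Given a tree structure with n nodes and an ascending array of the same size, the location of each element is fixed: the k-th smallest element must sit at the k-th node visited in in-order.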
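The last remark can be made concrete. A C++ sketch, reusing the field names `vdata` and `cindex[0]`/`cindex[1]` from the insert code on a later slide; the struct layout and the helpers `fillInorder` and `inorder` are assumptions. Walking any shape in in-order and handing out the sorted values one by one is the only way to fill that shape so that it becomes a BST.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct vertex {
    int vdata;
    vertex* cindex[2];   // cindex[0]: left child, cindex[1]: right child
};

// Write sorted[next], sorted[next+1], ... into the nodes of T in in-order.
// In-order visits left subtree, root, right subtree, so the resulting
// in-order listing is exactly the sorted array, i.e. T becomes a BST.
void fillInorder(vertex* T, const std::vector<int>& sorted, std::size_t& next) {
    if (T == nullptr) return;
    fillInorder(T->cindex[0], sorted, next);
    T->vdata = sorted[next++];
    fillInorder(T->cindex[1], sorted, next);
}

// Collect the in-order listing of T into out.
void inorder(const vertex* T, std::vector<int>& out) {
    if (T == nullptr) return;
    inorder(T->cindex[0], out);
    out.push_back(T->vdata);
    inorder(T->cindex[1], out);
}
```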

Finding recursively

[Figure: the same two binary search trees, Inorder=[1 2 3 3 4 5 5 6 7 8 9]]

Great news: to find v, compare it with the root entry e(T).

v = e(T): done.
v < e(T): look for v in T_L; if T_L is empty, answer NO.
v > e(T): look for v in T_R; if T_R is empty, answer NO.
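The three cases translate directly into a recursive function. A C++ sketch, assuming the node layout of the insert code (`vdata`, `cindex[0]` for T_L, `cindex[1]` for T_R); `bstFind` is a hypothetical name.

```cpp
#include <cassert>

struct vertex {
    int vdata;
    vertex* cindex[2];   // cindex[0]: left subtree T_L, cindex[1]: right subtree T_R
};

// Recursive find: compare v with the root entry e(T) and descend into one side.
bool bstFind(const vertex* T, int v) {
    if (T == nullptr) return false;                      // empty subtree: answer NO
    if (v == T->vdata) return true;                      // v = e(T): done
    if (v < T->vdata) return bstFind(T->cindex[0], v);   // look in T_L
    return bstFind(T->cindex[1], v);                     // v > e(T): look in T_R
}
```

Each call does one comparison step and one pointer-chase, so the cost is bounded by the depth at which the search stops.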

How many operations does a find take for different elements (in terms of the number of elements)?

Operations: comparisons and pointer-chasing.

[Figure: the same two binary search trees, Inorder=[1 2 3 3 4 5 5 6 7 8 9]]

element  t1  t2
<1       4   4
2        3   3
4.5      4   4
5        1   1
5.5      4   2
7.5      3   5

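Find and insert

#include <cstddef>

struct vertex {
  int vdata;
  vertex *cindex[2];   // cindex[0]: left child, cindex[1]: right child
};
typedef vertex *pvertex;

// Insert val below the (non-NULL) root T; equal values go to the right.
void insert(pvertex T, int val)
{
  if (val < T->vdata) {
    if (T->cindex[0] != NULL)
      insert(T->cindex[0], val);
    else
      T->cindex[0] = new vertex{val, {NULL, NULL}};   // new leaf
  } // done with left
  else { // val >= T->vdata
    if (T->cindex[1] != NULL)
      insert(T->cindex[1], val);
    else
      T->cindex[1] = new vertex{val, {NULL, NULL}};   // new leaf
  } // done with right
}

Go down recursively till you find a NULL.

Prepare a new node and insert it as a leaf.

Note that if equal, a duplicate node is created on the right.

Operations: bounded by the height of the tree.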

Insert Example

[Figure: the BST with Inorder=[1 2 3 3 4 5 5 6 7 8 9] before and after inserting 7.5, which is added as a new leaf]

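No control on balance. See, for example, inserting the elements 7.6, 7.7, ... in turn: each follows the same path and becomes the right child of the previous one.

Balance is clearly important to keep a good relationship between n and the height.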
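The loss of balance is easy to reproduce. A C++ sketch with hypothetical helper names, using integers in place of 7.5, 7.6, 7.7, ... since only their order matters, and counting height in edges (a single node has height 0). Inserting keys in increasing order makes every new key the right child of the previous one, so n insertions give a path of height n − 1.

```cpp
#include <algorithm>
#include <cassert>

struct vertex {
    int vdata;
    vertex* cindex[2];   // cindex[0]: left child, cindex[1]: right child
};

// Same descent as the recursive insert, written as a loop; equal values go right.
void insertIter(vertex*& root, int val) {
    vertex** slot = &root;                         // the NULL pointer we will fill
    while (*slot != nullptr)
        slot = &(*slot)->cindex[val < (*slot)->vdata ? 0 : 1];
    *slot = new vertex{val, {nullptr, nullptr}};
}

// Height in edges: single node 0, empty tree -1.
int height(const vertex* T) {
    return T == nullptr ? -1 : 1 + std::max(height(T->cindex[0]), height(T->cindex[1]));
}

// Build a BST by inserting first, first+1, ..., first+n-1 in that order.
vertex* buildAscending(int first, int n) {
    vertex* root = nullptr;
    for (int i = 0; i < n; ++i) insertIter(root, first + i);
    return root;
}
```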

Locating and deleting an element

Step I: Locate x in T, as before.

Step II: Delete x from T to get T′. How T′ differs structurally from T depends on the location v ∈ T of x.

Case I: v is a leaf. Easy: T′ = T − v.

[Figure: Case I, deleting the leaf holding 4: Inorder=[1 2 3 3 4 5 5 6 7 8 9] becomes Inorder=[1 2 3 3 X 5 5 6 7 8 9]]

Note that inorder(T − v) = inorder(T) − x.

Case II

Case II: v has only one child w. Harder. Make the parent p of v point to w.

[Figure: two Case II deletions from the tree with Inorder=[1 2 3 3 4 5 5 6 7 8 9]: deleting 7 gives Inorder=[1 2 3 3 4 5 5 6 X 8 9], and deleting one copy of 5 gives Inorder=[1 2 3 3 4 X 5 6 7 8 9]]

Case II, continued

Let us look at the case when v is the right child of its parent p. The other case is similar.

In both cases, w = p_RL or w = p_RR, we make w the right child p_R of p to get T′.

For w = p_RL, we see that

$$\mathrm{inorder}_T(p) = \mathrm{inorder}(T_L)\cdot e(p)\cdot \mathrm{inorder}(T_w)\cdot e(v),$$

while for T′ we now have

$$\mathrm{inorder}_{T'}(p) = \mathrm{inorder}(T_L)\cdot e(p)\cdot \mathrm{inorder}(T_w),$$

i.e., exactly e(v) has been removed.

[Figure: before, p has left subtree T_L and right child v, whose only child w roots T_w; after, w (with T_w) is the right child of p]

The case w = p_RR is similar.

Note that inorder_T(p) is a subsequence of inorder(T), in fact a contiguous block of it, so inorder(T′) is inorder(T) with x removed.
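In code, Case II is a single pointer update. A C++ sketch, assuming the node layout of the insert code and that the caller holds a reference to the pointer inside p that points at v (`spliceOut` is a hypothetical name). The same function also handles Case I: when v is a leaf, the parent's pointer simply becomes NULL.

```cpp
#include <cassert>

struct vertex {
    int vdata;
    vertex* cindex[2];   // cindex[0]: left child, cindex[1]: right child
};

// `link` is the pointer (a child slot of p, or the root pointer) that points at v.
// v must have at most one child w; afterwards link points at w (or NULL).
void spliceOut(vertex*& link) {
    vertex* v = link;
    assert(v != nullptr && (v->cindex[0] == nullptr || v->cindex[1] == nullptr));
    link = (v->cindex[0] != nullptr) ? v->cindex[0] : v->cindex[1];  // w, or NULL
    delete v;
}
```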

Case III: v has both children. Hardest.

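[Figure: two Case III deletions from the tree with Inorder=[1 2 3 3 4 5 5 6 7 8 9], giving Inorder=[1 2 3 3 4 5 5 6 7 X 9] and Inorder=[1 2 3 X 4 5 5 6 7 8 9]; the removal of the replacement node is labelled Case I in one and Case II in the other]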

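Let w = next(v), the in-order successor of v (or prev(v), the predecessor; either is fine).

Copy y = e(w) into v. Delete location w.

For w, either Case I or Case II always applies: since v has a right child, next(v) is the leftmost node of v's right subtree and so has no left child (symmetrically, prev(v) has no right child).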
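All three cases fit in one function. A C++ sketch, assuming the node layout of the insert code; `bstErase` is a hypothetical name, and it uses w = next(v) in Case III. Working with a pointer to the link that points at a node means the final unlinking step is shared by all cases.

```cpp
#include <cassert>

struct vertex {
    int vdata;
    vertex* cindex[2];   // cindex[0]: left child, cindex[1]: right child
};

// Delete one copy of x from the BST whose root pointer is `root`.
// Returns false if x is not present.
bool bstErase(vertex*& root, int x) {
    vertex** pv = &root;                                   // Step I: locate x
    while (*pv != nullptr && (*pv)->vdata != x)
        pv = &(*pv)->cindex[x < (*pv)->vdata ? 0 : 1];
    if (*pv == nullptr) return false;
    vertex* v = *pv;
    if (v->cindex[0] != nullptr && v->cindex[1] != nullptr) {
        // Case III: w = next(v) is the leftmost node of v's right subtree,
        // so it has no left child and Case I or II applies to it.
        vertex** pw = &v->cindex[1];
        while ((*pw)->cindex[0] != nullptr) pw = &(*pw)->cindex[0];
        v->vdata = (*pw)->vdata;                           // copy y = e(w) into v
        pv = pw;                                           // now delete location w
    }
    vertex* dead = *pv;                                    // Case I or II
    *pv = (dead->cindex[0] != nullptr) ? dead->cindex[0] : dead->cindex[1];
    delete dead;
    return true;
}
```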

Imbalance

Ideal case: complete binary tree. Insert will take O(log(n)) time, but the CBT condition gets disturbed.

Worst case: a path. Insert will take O(n) time. The condition can improve with more insertions.

Heap-tree: insertion is not local if the heap-tree structure is to be maintained. What if a small number is to be inserted into a complete binary tree?

Is there a via media?
Adelson-Velski, Landis (AVL) condition

AVL tree: a tree is an AVL tree-structure iff for all nodes v, $|\mathrm{height}(v_L) - \mathrm{height}(v_R)| \le 1$.

[Figure: a node v with subtree heights hL and hR; various example tree shapes, each labelled good (satisfies the AVL condition) or bad]
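The condition can be checked bottom-up in one pass. A C++ sketch, assuming the node layout of the insert code and counting height in edges (empty subtree −1, single node 0); `avlHeight` and `isAVL` are hypothetical names, and −2 is used as a "violated somewhere below" marker.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

struct vertex {
    int vdata;
    vertex* cindex[2];   // cindex[0]: v_L, cindex[1]: v_R
};

// Height of T, or -2 if some node of T has |height(v_L) - height(v_R)| > 1.
int avlHeight(const vertex* T) {
    if (T == nullptr) return -1;
    int hL = avlHeight(T->cindex[0]);
    int hR = avlHeight(T->cindex[1]);
    if (hL == -2 || hR == -2 || std::abs(hL - hR) > 1) return -2;
    return 1 + std::max(hL, hR);
}

bool isAVL(const vertex* T) { return avlHeight(T) != -2; }
```

Only the shape matters here; the entries are never compared.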

AVL condition: maths

Convention: height counts edges, so a single vertex has height 0 and an empty subtree has height −1.

Theorem: There is a constant 1 < α < 2 so that, given any AVL tree T_n of height n, the number of vertices |T_n| in T_n obeys the relationship

$$\alpha^n \le |T_n| < 2^{n+1}.$$

Proof: That $|T_n| < 2^{n+1}$ is easy to show, since the largest number of vertices a binary tree of height n can have is $1 + 2 + \cdots + 2^n = 2^{n+1} - 1$.

We prove the lower bound by induction on n. For the base cases, $|T_0| = 1 = \alpha^0$ and $|T_1| \ge 2 > \alpha$. For n ≥ 2, since T_n is of height n, we must either have (i) one of the children, say T_L, such that height(T_L) = n − 1 and height(T_R) = n − 2, or (ii) both children of height n − 1, i.e., height(T_L) = height(T_R) = n − 1. In case (i), we have

$$|T_n| = 1 + |T_L| + |T_R| \ge \alpha^{n-1} + \alpha^{n-2}.$$

Continued...

So is there an α such that

$$|T_n| \ge \alpha^{n-1} + \alpha^{n-2} \ge \alpha^n \quad \text{for all } n?$$

In case (ii) we get:

$$|T_n| \ge 2\,\alpha^{n-1} \ge \alpha^n \quad \text{for all } n.$$

The second condition is satisfied if α ≤ 2. For the first condition, choose $\alpha = \frac{1+\sqrt{5}}{2} \approx 1.6$. Verify that $\alpha^2 = \alpha + 1$; then $\alpha^{n-1} + \alpha^{n-2} = \alpha^{n-2}(\alpha + 1) = \alpha^n$.

Proved! Thus, we have shown that $1.6^n \le |T_n| < 2^{n+1}$. In other words, an AVL tree with N vertices has height at most $\log_\alpha N \approx 1.44 \log_2 N$.
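The lower bound can also be checked numerically. A C++ sketch with hypothetical function names, counting height in edges (a single vertex has height 0): the fewest vertices an AVL tree of height h can have is N(0) = 1, N(1) = 2 and N(h) = 1 + N(h−1) + N(h−2), since the sparsest such tree is in case (i) at every level.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

const double kAlpha = (1 + std::sqrt(5.0)) / 2;   // golden ratio, about 1.618

// N[h] = fewest vertices in an AVL tree of height h, for h = 0..n.
std::vector<long long> minAvlSizes(int n) {
    std::vector<long long> N(n + 1);
    for (int h = 0; h <= n; ++h)
        N[h] = (h == 0) ? 1 : (h == 1) ? 2 : 1 + N[h - 1] + N[h - 2];
    return N;
}

// Check alpha^h <= N[h] for every h <= n (keep n <= 60 so N[h] fits in long long).
bool lowerBoundHolds(int n) {
    std::vector<long long> N = minAvlSizes(n);
    for (int h = 0; h <= n; ++h)
        if (std::pow(kAlpha, h) > static_cast<double>(N[h])) return false;
    return true;
}
```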

Conclude that...

If the AVL condition can be maintained, then insertion, find and delete can be done in O(log(n)) time.

But is this condition disturbed by insert and delete? YES.

Can it be recovered? YES: rotations.

[Figure: example trees labelled good or bad, showing how an insert or a delete can break the AVL condition]

Can we tell for which insertions and deletions we will lose the AVL condition? Some are shown. Are there any others?

At which vertex/node will this condition be lost?

Analysis: Insertion

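[Figure: a node v with left child w, where v's subtrees have heights h+1 (rooted at w) and h. An insertion below w raises w's height to h+2: an LL imbalance if the new element goes into w's left subtree, an LR imbalance if it goes into w's right subtree]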

Rotation: The LL case

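[Figure: single rotation at v, where x is the parent of v. Before: v (height h+3) has left child w (height h+2) with subtrees A (height h+1) and B (height h), and right subtree C (height h). After: w (height h+2) takes v's place under x, with left subtree A and right child v, whose subtrees are B and C]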

Vertex v is the first (lowest) node where the imbalance occurs. Vertex x is its parent, and v is drawn as the left child; it does not matter whether v is the left or the right child.

Note that the in-order listing of (T(x))_L is A w B v C and is unchanged.

The height of (T(x))_L is back to h+2, as it was before the insertion, so the effect does not percolate up.
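A C++ sketch of this single rotation, under assumptions the slides leave open: each node stores its height (edges, empty = −1), and the function returns the new subtree root so that the parent x can re-attach it. `AvlNode`, `h`, `fix` and `rotateRight` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

struct AvlNode {
    int key;
    int height;              // edges on the longest downward path; leaf = 0
    AvlNode *left, *right;
};

int h(const AvlNode* t) { return t == nullptr ? -1 : t->height; }
void fix(AvlNode* t) { t->height = 1 + std::max(h(t->left), h(t->right)); }

// LL case. Before: v(w(A, B), C).  After: w(A, v(B, C)).
// The in-order listing A w B v C is unchanged.
AvlNode* rotateRight(AvlNode* v) {
    AvlNode* w = v->left;
    v->left = w->right;      // B moves under v
    w->right = v;
    fix(v);                  // v first: it is now below w
    fix(w);
    return w;                // x must now point at w instead of v
}
```

The rotation touches a constant number of pointers and heights, which is why re-balancing after an insert costs O(constant).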

Rotation: LR case

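[Figure: the same single rotation applied to the LR case, where A has height h and B has height h+1: w ends up on top with height h+3, and the tree is now unbalanced at w]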

Does Not Work!

What is to be done? Rotate at w first, followed by a rotation at v.

What to do: Double Rotate

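[Figure: double rotation. B is expanded as z with subtrees B1 and B2 (heights h and h−1, in some order). Rotating at w and then at v puts z (height h+2) in v's place under x, with left child w (subtrees A, B1) and right child v (subtrees B2, C), both of height h+1]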
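The double rotation is two single rotations composed: rotate left at w, then rotate right at v. A self-contained C++ sketch with the same hypothetical `AvlNode` layout (heights in edges; each function returns the new subtree root).

```cpp
#include <algorithm>
#include <cassert>

struct AvlNode {
    int key;
    int height;              // edges; leaf = 0, empty = -1
    AvlNode *left, *right;
};

int h(const AvlNode* t) { return t == nullptr ? -1 : t->height; }
void fix(AvlNode* t) { t->height = 1 + std::max(h(t->left), h(t->right)); }

AvlNode* rotateRight(AvlNode* v) {       // v(w(A, B), C)  ->  w(A, v(B, C))
    AvlNode* w = v->left;
    v->left = w->right;
    w->right = v;
    fix(v); fix(w);
    return w;
}

AvlNode* rotateLeft(AvlNode* w) {        // w(A, z(B1, B2))  ->  z(w(A, B1), B2)
    AvlNode* z = w->right;
    w->right = z->left;
    z->left = w;
    fix(w); fix(z);
    return z;
}

// LR case: v(w(A, z(B1, B2)), C)  ->  z(w(A, B1), v(B2, C)).
AvlNode* rotateLeftRight(AvlNode* v) {
    v->left = rotateLeft(v->left);       // first rotate at w
    return rotateRight(v);               // then rotate at v
}
```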

Deletions

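[Figure: a deletion example with nodes a, b, c, d, e and subtrees labelled T2, T2, T3; a rotate left restores balance but lowers the height of the subtree, which has an impact above c]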

As opposed to insert, delete may require re-balancing all the way up.

Also note that if, instead of T2, T2, T3, the subtrees were T2, T3, T2, then we would require a rotate right at d.
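Putting the pieces together, a compact AVL sketch in C++ (all names hypothetical; heights in edges; duplicates go right on insert, as in the BST insert). `rebalance` handles the LL/LR cases and their mirror images. `avlErase` calls it at every node on the way back up, matching the remark that a delete may need re-balancing all the way to the root; after an insert, at most one (single or double) rotation actually fires.

```cpp
#include <algorithm>
#include <cassert>

struct AvlNode {
    int key;
    int height;                              // edges; leaf = 0, empty = -1
    AvlNode *left, *right;
};

int h(const AvlNode* t) { return t == nullptr ? -1 : t->height; }
void fix(AvlNode* t) { t->height = 1 + std::max(h(t->left), h(t->right)); }

AvlNode* rotateRight(AvlNode* v) {
    AvlNode* w = v->left;
    v->left = w->right; w->right = v;
    fix(v); fix(w);
    return w;
}
AvlNode* rotateLeft(AvlNode* v) {
    AvlNode* w = v->right;
    v->right = w->left; w->left = v;
    fix(v); fix(w);
    return w;
}

// Restore the AVL condition at v, assuming both subtrees are AVL with
// up-to-date heights differing by at most 2. Returns the new subtree root.
AvlNode* rebalance(AvlNode* v) {
    fix(v);
    if (h(v->left) - h(v->right) == 2) {              // left side too tall
        if (h(v->left->left) < h(v->left->right))     // LR: rotate at w first
            v->left = rotateLeft(v->left);
        return rotateRight(v);                        // LL (or second half of LR)
    }
    if (h(v->right) - h(v->left) == 2) {              // mirror image
        if (h(v->right->right) < h(v->right->left))
            v->right = rotateRight(v->right);
        return rotateLeft(v);
    }
    return v;
}

AvlNode* avlInsert(AvlNode* t, int key) {
    if (t == nullptr) return new AvlNode{key, 0, nullptr, nullptr};
    if (key < t->key) t->left = avlInsert(t->left, key);
    else              t->right = avlInsert(t->right, key);
    return rebalance(t);
}

// Delete one copy of key (no-op if absent); rebalance on the way back up.
AvlNode* avlErase(AvlNode* t, int key) {
    if (t == nullptr) return nullptr;
    if (key < t->key)      t->left = avlErase(t->left, key);
    else if (key > t->key) t->right = avlErase(t->right, key);
    else if (t->left == nullptr || t->right == nullptr) {   // Case I / II
        AvlNode* child = (t->left != nullptr) ? t->left : t->right;
        delete t;
        return child;
    } else {                                                // Case III: copy next(v)
        AvlNode* w = t->right;
        while (w->left != nullptr) w = w->left;
        t->key = w->key;
        t->right = avlErase(t->right, w->key);
    }
    return rebalance(t);
}
```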

AVL Trees

Operation  Time          Remarks
Find       O(log(n))     Binary search
Insert     O(log(n))     Binary search
           O(constant)   Add leaf
           O(constant)   Re-balance
Delete     O(log(n))     Binary search
           O(constant)   Case I, II or III
           O(log(n))     Re-balance

Many other variations after AVL: 2-3 Trees, Red-Black Trees.

Time complexity

Behind each programming step is an actual device which does the computing.

This usually means some mathematical operations on operands of some fixed size.

Fetching operands and depositing them in some fixed location.

It may also involve indirect addressing, where the actual address of the operand or the outcome needs a computation.

So far ...

Data structure D: sets S1, S2, etc., with some inherent relations.

Operations: which D will support.

Applications: places where D is used.

Performance: the number of (machine) operations needed to perform these operations.

Example: Queue.

Q ⊆ U, a subset of a universal set. Operations: push, pop, isempty and size.

Various relations. Foremost, between two consecutive moments when the queue is empty, elements leave first-in, first-out.

Performance depends on the implementation. Circular array: per operation, a constant number of steps.

Data-structure and addressing

There is an intimate relationship between the mechanism of access to an item, i.e., the hardware, and the relationships in the data structure.

The LHS is the actual or real operations that take place in the machine, while the RHS is the abstract operations required in the data structure.

Queue: in a linear array beginning at 0, push is constant time, while pop takes time proportional to n, the number of elements in the queue (the remaining elements are shifted to the front).

A physically circular array makes both operations take a constant number of real operations (but the constant is bigger). The implementation is done mathematically by the mod operation.
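A C++ sketch of the circular-array queue, assuming a fixed capacity chosen up front (the class name is hypothetical; the operation names follow the slide). The mod operation wraps the indices around the end of the array, so neither push nor pop ever shifts elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed-capacity FIFO queue on a circular array.
// Slots head, head+1, ..., head+count-1 (all mod capacity) hold the queue, oldest first.
class CircularQueue {
public:
    explicit CircularQueue(std::size_t capacity) : buf_(capacity) { assert(capacity > 0); }

    bool isempty() const { return count_ == 0; }
    std::size_t size() const { return count_; }

    void push(int x) {
        assert(count_ < buf_.size());                 // queue must not be full
        buf_[(head_ + count_) % buf_.size()] = x;     // wrap around with mod
        ++count_;
    }

    int pop() {
        assert(!isempty());
        int x = buf_[head_];
        head_ = (head_ + 1) % buf_.size();            // advance instead of shifting
        --count_;
        return x;
    }

private:
    std::vector<int> buf_;
    std::size_t head_ = 0, count_ = 0;
};
```

The extra `%` per operation is the "bigger constant" compared to a plain array index.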

Thanks
