
Algorithms and Data Structures I.

Lecture Notes

Tibor Ásványi
Department of Computer Science

Eötvös Loránd University, Budapest

[email protected]

September 5, 2019


Contents

1 Notations
  1.1 Time complexity of algorithms
2 Elementary Data Structures and Data Types
  2.1 Arrays, memory allocation and deallocation
  2.2 Stacks
  2.3 Queues
3 Algorithms in computing: Insertion Sort
4 Fast sorting algorithms based on the divide and conquer approach
  4.1 Merge sort
    4.1.1 The efficiency of merge sort
  4.2 Quicksort
5 Linked Lists
  5.1 One-way or singly linked lists
    5.1.1 Simple one-way lists (S1L)
    5.1.2 One-way lists with header node (H1L)
    5.1.3 One-way lists with trailer node
    5.1.4 Handling one-way lists
    5.1.5 Sorting one-way lists: Insertion sort
    5.1.6 Cyclic one-way lists
  5.2 Two-way or doubly linked lists
    5.2.1 Simple two-way lists (S2L)
    5.2.2 Cyclic two-way lists (C2L)
    5.2.3 Example programs on C2Ls
6 Trees, binary trees
  6.1 General notions
  6.2 Binary trees
  6.3 Linked representations of binary trees
  6.4 Binary tree traversals
    6.4.1 An application of traversals: the height of a binary tree
    6.4.2 Using parent pointers
  6.5 Binary search trees
  6.6 Level-continuous binary trees, and heaps
  6.7 Arithmetic representation of level-continuous binary trees
  6.8 Heaps and priority queues
  6.9 Heapsort
7 Lower bounds for sorting
  7.1 Comparison sorts and the decision tree model
  7.2 A lower bound for the worst case
8 Sorting in Linear Time
  8.1 Radix sort
  8.2 Distributing sort
  8.3 Radix-Sort on lists
  8.4 Counting sort
  8.5 Radix-Sort on arrays ([1] 8.3)
  8.6 Bucket sort
9 Hash Tables
  9.1 Direct-address tables
  9.2 Hash tables
  9.3 Collision resolution by chaining
  9.4 Good hash functions
  9.5 Open addressing
    9.5.1 Open addressing: insertion and search, without deletion
    9.5.2 Open addressing: insertion, search, and deletion
    9.5.3 Linear probing
    9.5.4 Quadratic probing
    9.5.5 Double hashing


References

[1] Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C., Introduction to Algorithms (Third Edition), The MIT Press, 2009.

[2] Cormen, Thomas H., Algorithms Unlocked, The MIT Press, 2013.

[3] Karumanchi, Narasimha, Data Structures and Algorithms Made Easy, CareerMonk Publications, 2016.

[4] Neapolitan, Richard E., Foundations of Algorithms (Fifth Edition), Jones & Bartlett Learning, 2015. ISBN 978-1-284-04919-0 (pbk.)

[5] Weiss, Mark Allen, Data Structures and Algorithm Analysis in C++ (Fourth Edition), Pearson, 2014. http://aszt.inf.elte.hu/~asvanyi/ds/DataStructuresAndAlgorithmAnalysisInCpp_2014.pdf

[6] Wirth, N., Algorithms and Data Structures, Prentice-Hall Inc., 1976, 1985, 2004. http://aszt.inf.elte.hu/~asvanyi/ds/AD.pdf

[7] Burch, Carl, B+ trees. http://aszt.inf.elte.hu/~asvanyi/ds/B+trees.zip

1 Notations

N = {0, 1, 2, 3, . . .} = the natural numbers
Z = {. . . , −3, −2, −1, 0, 1, 2, 3, . . .} = the integers
R = the real numbers
P = the positive real numbers
P0 = the nonnegative real numbers

lg n = log2 n, if n > 0; lg 0 = 0

half(n) = ⌊n/2⌋, where n ∈ N

Half(n) = ⌈n/2⌉, where n ∈ N

We use the structogram notation in our pseudocodes. In the structograms we use a UML-like notation with some C++/Java/Pascal flavour. Main points:

1. A program is made up of declarations. Each of them is represented by a structogram, but we do not care about their order.

4

Page 5: Algorithms and Data Structures I. Lecture Notesaszt.inf.elte.hu/~asvanyi/ds/AlgDs1LectureNotes.pdfAlgorithms and Data Structures I. Lecture Notes Tibor Ásványi Department of Computer

2. We can declare subprograms, classes, variables and constants.

3. The subprograms are the functions, the procedures (i.e. void functions), and the methods of the classes.

4. The while loop is the default loop; for loops are also used.

5. The operators ≠, ≥, ≤ are written with the usual mathematical notation. The assignment statements and the "is equal to" comparisons are written in Pascal style. For example, x := y assigns the value of y to x, and x = y checks their equality.

6. Only scalar parameters can be passed by value (and it is their default). Non-scalar parameters are always passed by reference. For example, an object cannot be passed by value, just by reference.

7. An assignment statement can copy just a scalar value.

8. In the definitions of classes we use a simple UML notation. The name of the class comes in the first part, the data members in the second part, and the methods, if any, in the third part. A private declaration/specification is prefixed by a − sign. A public declaration/specification is prefixed by a + sign. We use neither templates nor inheritance (nor protected members/methods).

9. We do not use libraries.

10. We do not use exception handling.

1.1 Time complexity of algorithms

Time complexity (operational complexity) of an algorithm reflects some kind of abstract time: we count the subprogram calls plus the number of loop iterations during the run of the program. (This measure is approximately proportional to the real running time of the program, and we may omit constant factors here, because they are useful only if we know the programming environment [for example, (the speed of) the hardware, the operating system, the compiler, etc.].)

We calculate the time complexity of an algorithm typically as a function of the size of the input data structure(s) (for example, the length of the input array). We distinguish MT(n) (Maximum Time), AT(n) (Average or expected Time), and mT(n) (minimum Time). Clearly MT(n) ≥ AT(n) ≥ mT(n). If MT(n) = mT(n), then we can speak of a general time complexity T(n), where T(n) = MT(n) = mT(n).

Definition 1.1 Given g : N → R, g is asymptotically positive iff there exists an N ∈ N so that g(n) > 0 for each n ≥ N.

In this chapter we suppose that f, g, h denote asymptotically positive functions of type N → R in each case, because they denote time (or space) complexity functions, and these satisfy this property. Usually it is not easy to give such a function exactly. We just make estimations.

When we make an asymptotic upper estimate g of f, we say that f is big-O g and write f ∈ O(g). Informally f ∈ O(g) means that f is at most proportional to g (f(n) ≤ d·g(n) for some d > 0, if n is large enough).

When we make an asymptotic lower estimate g of f, we say that f is Omega g and write f ∈ Ω(g). Informally f ∈ Ω(g) means that f is at least proportional to g (f(n) ≥ c·g(n) for some c > 0, if n is large enough).

When we make an asymptotic upper and lower estimate g of f, we say that f is Theta g and write f ∈ Θ(g), which means that f ∈ O(g) ∧ f ∈ Ω(g).

Definition 1.2

f ≺ g ⇐⇒ lim_{n→∞} f(n)/g(n) = 0

f ≺ g is read as "f is asymptotically less than g". It can also be written as f ∈ o(g), which means that

o(g) = { h : N → P0 | h ≺ g }

Definition 1.3

g ≻ f ⇐⇒ f ≺ g

g ≻ f is read as "g is asymptotically greater than f".

Definition 1.4

O(g) = { f | there exist positive constants d and n0 such that f(n) ≤ d·g(n) for all n ≥ n0 }

(f ∈ O(g) can be read as "f is at most proportional to g".)

Definition 1.5

Ω(g) = { f | there exist positive constants c and n0 such that f(n) ≥ c·g(n) for all n ≥ n0 }

(f ∈ Ω(g) can be read as "f is at least proportional to g".)

Definition 1.6

Θ(g) = O(g) ∩ Ω(g)

(f ∈ Θ(g) can be read as "f is proportional to g".)

Consequence 1.7
f ∈ O(g) ⇐⇒ ∃d, n0 > 0 and ψ : N → R so that lim_{n→∞} ψ(n)/g(n) = 0 and
f(n) ≤ d·g(n) + ψ(n)
for each n ≥ n0.

Consequence 1.8
f ∈ Ω(g) ⇐⇒ ∃c, n0 > 0 and ϕ : N → R so that lim_{n→∞} ϕ(n)/g(n) = 0 and
c·g(n) + ϕ(n) ≤ f(n)
for each n ≥ n0.

Consequence 1.9
f ∈ Θ(g) ⇐⇒ f ∈ O(g) ∧ f ∈ Ω(g)

Note 1.10
If f ∈ O(g), we can say that g is an asymptotic upper bound of f.
If f ∈ Ω(g), we can say that g is an asymptotic lower bound of f.
If f ∈ Θ(g), we can say that f and g are asymptotically equivalent.

Theorem 1.11

lim_{n→∞} f(n)/g(n) = 0 =⇒ f ≺ g =⇒ f ∈ O(g)

lim_{n→∞} f(n)/g(n) = c ∈ P =⇒ f ∈ Θ(g)

lim_{n→∞} f(n)/g(n) = ∞ =⇒ f ≻ g =⇒ f ∈ Ω(g)

Consequence 1.12

k ∈ N ∧ a0, a1, . . . , ak ∈ R ∧ ak > 0 =⇒ ak·n^k + ak−1·n^(k−1) + . . . + a1·n + a0 ∈ Θ(n^k)

Proof.

lim_{n→∞} (ak·n^k + ak−1·n^(k−1) + . . . + a1·n + a0) / n^k
= lim_{n→∞} ( ak·n^k/n^k + ak−1·n^(k−1)/n^k + . . . + a1·n/n^k + a0/n^k )
= lim_{n→∞} ( ak + ak−1/n + . . . + a1/n^(k−1) + a0/n^k )
= lim_{n→∞} ak + lim_{n→∞} ak−1/n + . . . + lim_{n→∞} a1/n^(k−1) + lim_{n→∞} a0/n^k
= ak + 0 + . . . + 0 + 0 = ak ∈ P
=⇒ ak·n^k + ak−1·n^(k−1) + . . . + a1·n + a0 ∈ Θ(n^k)  ∎

2 Elementary Data Structures and Data Types

A data structure (DS) is a way to store and organize data in order to facilitate access and modifications. No single data structure works well for all purposes, and so it is important to know the strengths and limitations of several of them ([1] 1.1).

A data type (DT) is a data structure + its operations.

An abstract data type (ADT) is a mathematical or informal structure + its operations described mathematically or informally.

A representation of an ADT is an appropriate DS.

An implementation of an ADT is some program code of its operations.

[1] Chapter 10; [6] 4.1 - 4.4.2; [3] 3-5

2.1 Arrays, memory allocation and deallocation

The most common data type is the array. It is a finite sequence of data. Its data elements can be accessed and updated directly and efficiently through indexing. Most programming languages support arrays, and we can access any element of an array in Θ(1) time.

In this book arrays must be declared in the following way.

A,Z : T[n]

In this example A and Z are two arrays of element type T and of size n. In our model, the array data structure is an array object containing its size and elements, and it is accessed through a so-called array pointer containing its memory address. The operations of the array can:

• read its size, written A.M and Z.M (here A.M = Z.M = n);

• access (read and write) its elements through indexing, as usual, for example A[i] and Z[j] here.

Arrays are usually indexed from 1. However, if the array identifier ends with letter z or Z, the array is indexed from zero. Therefore the elements of array A above are ⟨A[1], . . . , A[n]⟩, while the elements of array Z above are ⟨Z[0], . . . , Z[n−1]⟩.

Provided that an object or variable is created by declaring it, this object or variable is deleted automatically when the subprogram containing it finishes. Then the memory area reserved by it can be reused for other purposes.

If we want to declare array pointers, we can do it in the following way.

P,Qz : T[]

Now, given the declarations above, after the assignment statement P := Z, P and Z refer to the same array object; P[1] is identical with Z[0], and so on; P[n] is identical with Z[n−1].

Array objects can be created (i.e. allocated) dynamically, like in C++, but our array objects always contain their size, unlike in C++. For example, the statement

P := Qz := new T[m]

creates a new array object; pointers P and Qz refer to it; P.M = Qz.M = m; P[1] is identical with Qz[0], and so on; P[m] is identical with Qz[m−1].

Note that any object (and especially any array object) generated dynamically must be deleted (i.e. deallocated) explicitly when the object is not needed any more, in order to avoid memory leaks. Deletion is done like in C++:

delete P or delete Qz

suffices here. It does not delete the pointer but the pointed object. Having deleted the object, the memory area reserved by it can be reused for other purposes.

Unfortunately we cannot tell anything about the efficiency of memory allocation and deallocation in general. Sometimes these can be performed with Θ(1) time complexity, but usually they need much more time, and their efficiency is often unpredictable. Therefore we avoid overusing them and apply them only when it is really necessary.

If we want to pass an array parameter to a subprogram, the actual parameter must be the identifier of the array, and the formal parameter has to be specified as


an array pointer, and the parameter passing copies the address of the actual array object into the formal parameter.

Therefore the following two procedures are equivalent, and after the previous assignment statement (P := Qz := new T[m]) both of the procedure calls init(P, y) and init(Qz, y) have the same effect: they initialize all the elements of the newly created array object to y (provided that y : T).

init(A : T[] ; x : T)
  for i := 1 to A.M
    A[i] := x

init(Z : T[] ; x : T)
  for i := Z.M − 1 downto 0
    Z[i] := x

Given an array A, subarray A[u..v] denotes the following sequence of elements of A: ⟨A[u], . . . , A[v]⟩, where u and v are valid indexes in the array. If u > v, the subarray is empty (i.e. ⟨⟩); in this case u or v may be an invalid index.
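This array model is not built into C++, but it can be imitated there. The following minimal C++ sketch shows an array object that carries its size and is handled through an array pointer; the type ArrayT and all its member names are our own illustrative choices, not code from the notes.

#include <cstddef>
#include <iostream>

// Sketch of the notes' array model: the object stores its size M
// together with its elements and is reached through a pointer.
template <typename T>
struct ArrayT {
    std::size_t M;    // number of elements, readable as A->M
    T* data;          // the elements themselves

    explicit ArrayT(std::size_t m) : M(m), data(new T[m]) {}
    ~ArrayT() { delete[] data; }

    // 1-based indexing, as for array A in the notes
    T& operator[](std::size_t i) { return data[i - 1]; }
};

// init(A, x): set every element to x -- Theta(A.M) time
template <typename T>
void init(ArrayT<T>* A, const T& x) {
    for (std::size_t i = 1; i <= A->M; ++i) (*A)[i] = x;
}

int main() {
    ArrayT<int>* P = new ArrayT<int>(5);          // dynamic allocation
    init(P, 7);
    std::cout << P->M << ' ' << (*P)[1] << '\n';  // prints "5 7"
    delete P;                                     // explicit deallocation
}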

2.2 Stacks

A stack is a LIFO (Last-In First-Out) data storage. It can be imagined as a vertical sequence of items, similar to a tower of plates on a table. We can push a new item onto the top, and we can check or remove (i.e. pop) the topmost item.

In the following representation the elements of the stack are stored in subarray A[1..n]. Provided that n > 0, A[n] is the top of the stack. Provided that n = 0, the stack is empty.

T(n) ∈ Θ(1) for each method, because there is neither iteration nor subprogram invocation in their code. The time complexities of the constructor and of the destructor depend on those of the new and delete expressions.

Stack
  − A : T[]  // T is some known type; A.M is the max. size of the stack
  − n : 0..A.M  // n is the actual size of the stack
  + Stack(m : N) { A := new T[m] ; n := 0 }  // create an empty stack
  + push(x : T)  // push x onto the top of the stack
  + pop() : T  // remove and return the top element of the stack
  + top() : T  // return the top element of the stack
  + isFull() : B { return n = A.M }
  + isEmpty() : B { return n = 0 }
  + setEmpty() { n := 0 }  // reinitialize the stack
  + ∼Stack() { delete A }

Stack::push(x : T)
  if n < A.M
    n++
    A[n] := x
  else
    StackOverflow

Stack::pop() : T
  if n > 0
    n−−
    return A[n + 1]
  else
    StackUnderflow

Stack::top() : T
  if n > 0
    return A[n]
  else
    StackUnderflow

Example of a simple use of a stack: printing n input items in reversed order. We suppose that read(x) reads the next piece of data from the current input, and write(x) prints the value of x to the current output. T_reverse(n) ∈ Θ(n).

reverse(n : N)
  v : Stack(n)
  x : T
  while n > 0
    read(x)
    v.push(x)
    n := n − 1
  while ¬v.isEmpty()
    write(v.pop())

2.3 Queues

A queue is a FIFO (First-In First-Out) data storage. It can be imagined as a horizontal sequence of items, similar to a queue at the cashier's desk. We can add a new item to the end of the queue, and we can check or remove the first item.

In the following representation the elements of the queue are stored in

⟨Z[k], Z[(k + 1) mod Z.M], . . . , Z[(k + n − 1) mod Z.M]⟩

Provided that n > 0, Z[k] is the first element of the queue, where n is the length of the queue. Provided that n = 0, the queue is empty.

T(n) ∈ Θ(1) for each method, because there is neither iteration nor subprogram invocation in their code. The time complexities of the constructor and of the destructor depend on those of the new and delete expressions.

Queue
  − Z : T[]  // T is some known type
  − n : 0..Z.M  // n is the actual length of the queue
  − k : 0..(Z.M − 1)  // k is the starting position of the queue in array Z
  + Queue(m : N) { Z := new T[m] ; n := 0 ; k := 0 }  // create an empty queue
  + add(x : T)  // join x to the end of the queue
  + rem() : T  // remove and return the first element of the queue
  + first() : T  // return the first element of the queue
  + length() : N { return n }
  + isFull() : B { return n = Z.M }
  + isEmpty() : B { return n = 0 }
  + ∼Queue() { delete Z }
  + setEmpty() { n := 0 }  // reinitialize the queue

Queue::add(x : T)
  if n < Z.M
    Z[(k + n) mod Z.M] := x
    n++
  else
    QueueOverflow

Queue::rem() : T
  if n > 0
    n−−
    i := k
    k := (k + 1) mod Z.M
    return Z[i]
  else
    QueueUnderflow

Queue::first() : T
  if n > 0
    return Z[k]
  else
    QueueUnderflow

We described the methods of stacks and queues with simple code: we applied neither iteration nor recursion. Therefore the time complexity of each method is Θ(1). This is a fundamental requirement for any implementation of stacks and queues.

Note that with linked-list representations this constraint can be guaranteed only if MT_new, MT_delete ∈ Θ(1).
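Again, a possible C++ sketch of this circular-buffer queue follows (our illustration, not code from the notes); the element type is fixed to int and the error branches are reduced to assertions.

#include <cassert>
#include <cstddef>

class Queue {                     // bounded FIFO storage in a circular buffer
    int* Z;                       // element array of size M
    std::size_t M, n, k;          // capacity, length, index of the first element
public:
    explicit Queue(std::size_t m) : Z(new int[m]), M(m), n(0), k(0) {}
    ~Queue() { delete[] Z; }
    bool isFull()  const { return n == M; }
    bool isEmpty() const { return n == 0; }
    void add(int x) {             // join x to the end of the queue
        assert(!isFull());        // else QueueOverflow
        Z[(k + n) % M] = x;
        ++n;
    }
    int rem() {                   // remove and return the first element
        assert(!isEmpty());       // else QueueUnderflow
        --n;
        std::size_t i = k;
        k = (k + 1) % M;          // the queue now starts one slot later
        return Z[i];
    }
    int first() const { assert(!isEmpty()); return Z[k]; }
    std::size_t length() const { return n; }
};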


3 Algorithms in computing: Insertion Sort

[1] Chapter 1-3; [4] Chapter 1, 2, 7

insertionSort(A : T[])
  for i := 2 to A.M
    if A[i − 1] > A[i]
      x := A[i]
      A[i] := A[i − 1]
      j := i − 2
      while j > 0 ∧ A[j] > x
        A[j + 1] := A[j]
        j := j − 1
      A[j + 1] := x
    else
      SKIP
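For comparison, a direct C++ transcription of the structogram might look as follows (0-based indexing; our sketch, not code from the notes).

#include <cstddef>

// Insertion sort of a 0-based C++ array of length n; a direct
// transcription of the structogram above (which indexes from 1).
void insertionSort(int A[], std::size_t n) {
    for (std::size_t i = 1; i < n; ++i) {
        if (A[i - 1] > A[i]) {          // A[i] is out of place
            int x = A[i];               // lift it out
            A[i] = A[i - 1];
            std::size_t j = i - 1;      // shift greater items to the right
            while (j > 0 && A[j - 1] > x) {
                A[j] = A[j - 1];
                --j;
            }
            A[j] = x;                   // drop x into the gap
        }
    }
}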

mT_IS(n) = 1 + (n − 1) = n

MT_IS(n) = 1 + (n − 1) + Σ_{i=2}^{n} (i − 2) = n + Σ_{j=0}^{n−2} j = n + (n − 1)(n − 2)/2

MT_IS(n) = (1/2)n² − (1/2)n + 1

MT_IS(n) ≈ (1/2)n², if n is large.

Let us suppose that our computer can perform 2·10⁹ elementary operations per second. Considering the code of insertion sort above, and counting these operations as a function of n, we receive that mT(n) ≥ 8n and MT(n) > 6n² elementary operations. Counting with mT(n) ≈ 8n and MT(n) ≈ 6n², we receive the following table of running times as lower estimates:

n       mT(n) ops   time        MT(n) ops   time
1000    8·10³       4·10⁻⁶ s    6·10⁶       0.003 s
10⁶     8·10⁶       0.004 s     6·10¹²      50 min
10⁷     8·10⁷       0.04 s      6·10¹⁴      ≈ 3.5 days
10⁸     8·10⁸       0.4 s       6·10¹⁶      ≈ 347 days
10⁹     8·10⁹       4 s         6·10¹⁸      ≈ 95 years


5 2 7 1 4 6 8 3
2 5 7 1 4 6 8 3
2 5 7 1 4 6 8 3
1 2 5 7 4 6 8 3 (*)
1 2 4 5 7 6 8 3
1 2 4 5 6 7 8 3
1 2 4 5 6 7 8 3
1 2 3 4 5 6 7 8

(*) detailed (x = 4 has been lifted out; _ marks the gap shifting left):

1 2 5 7 _ 6 8 3   x = 4
1 2 5 _ 7 6 8 3   x = 4
1 2 _ 5 7 6 8 3   x = 4

Figure 1: An illustration of Insertion Sort


This means that in the worst case insertion sort is too slow to sort one million elements, and it becomes completely impractical if we try to sort huge amounts of data. Let us consider the average case:

AT_IS(n) ≈ 1 + (n − 1) + Σ_{i=2}^{n} (i − 2)/2 = n + (1/2)·Σ_{j=0}^{n−2} j
         = n + (1/2)·(n − 1)(n − 2)/2 = (1/4)n² + (1/4)n + (1/2)

This calculation shows that the expected (average) running time of insertion sort is roughly half of the time needed in the worst case, so even the expected running time of insertion sort is too long to sort one million elements, and it becomes completely impractical if we try to sort huge amounts of data. The asymptotic time complexities:

mT_IS(n) ∈ Θ(n)

AT_IS(n), MT_IS(n) ∈ Θ(n²)

Let us notice that the minimum running time is very good. There is no chance to sort elements faster than in linear time, because each piece of data must be checked. One may say that this best case is not much gain, because in this case the items are already sorted. The fact is that if the input is nearly sorted, we remain close to the best case, and insertion sort turns out to be the best choice for sorting such data.

Insertion sort is also stable: stable sorting algorithms maintain the relative order of records with equal keys (i.e. values). That is, a sorting algorithm is stable if, whenever there are two records R and S with the same key and R appears before S in the original list, R will appear before S in the sorted list. (Stability is an important property of sorting methods in some applications.)

4 Fast sorting algorithms based on the divide and conquer approach

In computer science, divide and conquer is an algorithm design paradigm based on multi-branched recursion. A divide and conquer algorithm works by recursively breaking down a problem into two or more sub-problems of the same or related type, until these become simple enough to be solved directly. The solutions to the sub-problems are then combined to give a solution to the original problem.

This divide and conquer technique is the basis of efficient algorithms for many kinds of problems, for example sorting (e.g., quicksort, merge sort).

4.1 Merge sort

5 3 1 6 8 2 4

5 3 1 6 8 2 4

5 3 1 6 8 2 4

3 1 6 8 2 4

1 3 6 8 2 4

1 3 5 2 4 6 8

1 2 3 4 5 6 8

Figure 2: An illustration of merge sort.

The divide and conquer paradigm is often used to find the optimal solution of a problem. Its basic idea is to decompose a given problem into two or more similar, but simpler, subproblems, to solve them in turn, and to compose their solutions to solve the given problem. Problems of sufficient simplicity are solved directly. For example, to sort a given list of n keys, split it into two lists of about n/2 keys each, sort each of them in turn, and interleave (i.e. merge) both results appropriately to obtain the sorted version of the given list. (See Figure 2.) This approach is known as the merge sort algorithm.

Merge sort is stable (preserving the input order of items with equal keys), and its worst-case time complexity is asymptotically optimal among comparison sorts. (See section 7 for details.)

mergeSort(A : T[])
  // sort A[1..A.M]
  ms(A, 1, A.M)

ms(A : T[] ; u, v : 1..A.M)
  // sort A[u..v]
  if u < v
    m := ⌊(u + v − 1)/2⌋
    ms(A, u, m)
    ms(A, m + 1, v)
    merge(A, u, m, v)
  else
    SKIP

Using m := ⌊(u+v−1)/2⌋, A[u..m] and A[(m+1)..v] have the same length if the length of A[u..v] is an even number; and A[u..m] is shorter by one than A[(m+1)..v] if the length of A[u..v] is an odd number; because

length(A[u..m]) = m − u + 1 = ⌊(u+v−1)/2⌋ − u + 1 = ⌊(u+v−1)/2 − u + 1⌋ = ⌊(v−u+1)/2⌋ = ⌊length(A[u..v])/2⌋

merge(A : T[] ; u, m, v : 1..A.M)
  // sorted merge of A[u..m] and A[(m+1)..v] into A[u..v]
  d := m − u
  Z : T[d + 1]
  // copy A[u..m] into Z[0..d]
  for i := u to m
    Z[i − u] := A[i]
  // sorted merge of Z[0..d] and A[(m+1)..v] into A[u..v]
  k := u  // copy into A[k]
  j := 0 ; i := m + 1  // from Z[j] or A[i]
  while i ≤ v ∧ j ≤ d
    if A[i] < Z[j]
      A[k] := A[i]
      i := i + 1
    else
      A[k] := Z[j]
      j := j + 1
    k := k + 1
  while j ≤ d
    A[k] := Z[j]
    k := k + 1 ; j := j + 1
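A possible C++ transcription of ms and merge follows (0-based inclusive bounds; our sketch, not code from the notes). The midpoint (u + v − 1)/2 is the 0-based counterpart of the notes' ⌊(u+v−1)/2⌋, so the left part is never longer than the right one, and only the left part is copied.

#include <cstddef>
#include <vector>

// Sorted merge of A[u..m] and A[m+1..v] into A[u..v].
// Only the left part is copied into the temporary vector Z.
static void merge(std::vector<int>& A, std::size_t u, std::size_t m, std::size_t v) {
    std::vector<int> Z(A.begin() + u, A.begin() + m + 1);  // Z[0..d] = A[u..m]
    std::size_t d = m - u;
    std::size_t k = u, j = 0, i = m + 1;
    while (i <= v && j <= d) {              // take the smaller front element;
        if (A[i] < Z[j]) A[k++] = A[i++];   // on equality Z[j] (left) wins -> stable
        else             A[k++] = Z[j++];
    }
    while (j <= d) A[k++] = Z[j++];         // copy the rest of Z, if any
}

static void ms(std::vector<int>& A, std::size_t u, std::size_t v) {
    if (u < v) {
        std::size_t m = (u + v - 1) / 2;    // nearly equal halves, left not longer
        ms(A, u, m);
        ms(A, m + 1, v);
        merge(A, u, m, v);
    }
}

void mergeSort(std::vector<int>& A) {       // sort A[0..A.size()-1]
    if (!A.empty()) ms(A, 0, A.size() - 1);
}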

The merge itself is performed by the second and third loops of procedure merge(A, u, m, v). The temporary array Z[0..d] is needed because the output of merging must not overwrite its input. A trivial solution copies both subarrays, but it is enough to copy only the left one, because k < i is an invariant property of the middle loop: clearly j ≤ d = m − u, and it is an invariant of this loop that from A[(m+1)..v] we have copied i − m − 1 elements into A[u..v], while from Z[0..d] we have copied j elements into it; and into A[u..v] we have copied k − u elements, which is the sum of the numbers of keys copied from the two other subarrays. Thus

k − u = i − m − 1 + j ≤ i − m − 1 + m − u
k − u ≤ i − 1 − u
k ≤ i − 1

If we finish Z[0..d] first, then j = d + 1 = m − u + 1, and so from Z[0..d] we have copied all the d + 1 = m − u + 1 elements into A[u..v], while from A[(m+1)..v] we have copied i − m − 1 elements into it. Altogether we have copied k − u elements into A[u..v]. Consequently

k − u = (m − u + 1) + (i − m − 1)
k − u = i − u
k = i

which means that the remaining elements of A[(m+1)..v] are already in place. In this case the last loop does not iterate even once, and it is not needed.

If in the middle loop we finish A[(m+1)..v] first, then the last loop copies the remainder of Z[0..d] to its final place efficiently.

The stability of merge sort is ensured in the middle loop of merge because, in case of Z[j] = A[i], Z[j] is copied into A[k], and Z[j] came earlier in the input.

4.1.1 The efficiency of merge sort

Merge sort is one of the fastest sorting algorithms, and there is not a big difference between its worst-case and best-case (i.e. maximal and minimal) time complexities. Using the notation n = A.M, we state:

MT_mergeSort(n), mT_mergeSort(n) ∈ Θ(n lg n)

Outline of the proof: Consider the main procedure of mergeSort. Clearly T_mergeSort(0) = 2. We suppose that n > 0. MT_mergeSort(n) = MT_ms(n) + 1, and mT_mergeSort(n) = mT_ms(n) + 1, where n = v − u + 1, so it suffices to calculate the asymptotic order of procedure ms(A, u, v): that of mergeSort is the same.

Let n = v − u + 1. First we show that MT_ms(n) ∈ O(n lg n). Next we justify that mT_ms(n) ∈ Ω(n lg n). Clearly MT_ms(n) ≥ mT_ms(n), and MT_ms(n), mT_ms(n) ∈ Θ(n lg n) immediately follows.

MT_ms(n) ∈ O(n lg n): The height of the recursion is ≤ lg n + 1, because at each level of the recursion we divide the subarray to be sorted into nearly equal parts. Thus the ms calls form a nearly complete binary tree. (At the leaves of this tree u = v.) Clearly MT_merge(l) ∈ O(l), where l = v − u + 1. At each level of the recursion of procedure ms, the different calls to procedures ms and merge cover array A of length n, except at the lowest two levels, where only a part of the array may be covered. Thus the time complexity of the computations done at each level of the recursion is O(n), and we have ≤ lg n + 2 levels of recursion, so MT_ms(n) ∈ O(n lg n).

mT_ms(n) ∈ Ω(n lg n): The height of the recursion is ≥ lg n, because at each level of the recursion we divide the subarray to be sorted into nearly equal parts. Clearly mT_merge(l) ∈ Ω(l), where l = v − u + 1. At each level of the recursion of procedure ms, the different calls to procedures ms and merge cover array A of length n, except at the lowest two levels, where only a part of the array may be covered. Thus the time complexity of the computations done at each level of the recursion is Ω(n), except at the lowest two levels; and we have ≥ lg n − 1 levels of recursion, excluding these lowest levels. Consequently mT_ms(n) ∈ Ω(n lg n).

4.2 Quicksort

Quicksort is a divide and conquer algorithm. Quicksort first divides a large array into two smaller sub-arrays: the low elements and the high elements. Quicksort can then recursively sort the sub-arrays.

The steps are:

• Pick an element, called a pivot, from the array.

• Partitioning: reorder the array so that all elements with values less than the pivot come before the pivot, while all elements with values greater than the pivot come after it (equal values can go either way). After this partitioning, the pivot is in its final position. This is called the partition operation.

• Recursively apply the above steps to the sub-array of elements with smaller values and separately to the sub-array of elements with greater values.

• The base case of the recursion is arrays of size zero or one, which are in order by definition, so they never need to be sorted.

The pivot selection and partitioning steps can be done in several different ways; the choice of specific implementation schemes greatly affects the algorithm's performance.

quicksort(A : T[])
  QS(A, 1, A.M)

QS(A : T[] ; p, r : N)
  if p < r
    q := partition(A, p, r)
    QS(A, p, q − 1)
    QS(A, q + 1, r)
  else
    SKIP


partition(A : T[] ; p, r : N) : N
  i := random(p, r)
  x := A[i] ; A[i] := A[r]
  i := p
  while i < r ∧ A[i] ≤ x
    i := i + 1
  if i < r
    j := i + 1
    while j < r
      if A[j] < x
        swap(A[i], A[j])
        i := i + 1
      else
        SKIP
      j := j + 1
    A[r] := A[i] ; A[i] := x
  else
    A[r] := x
  return i

Explanation of function partition: In order to illustrate the operation of function partition, let us introduce the following notation:

• A[k..m] ≤ x, iff for each l, k ≤ l ≤ m implies A[l] ≤ x

• A[k..m] ≥ x, iff for each l, k ≤ l ≤ m implies A[l] ≥ x

We suppose that subarray A[p..r] is cut into pieces and the pivot is the second 5, i.e. the 4th element of this subarray (with index p+3).

Input:
     p               r
A :  5 3 8 5 6 4 7 1

Preparations for the first loop (the pivot is saved in x, A[r] is moved into its place, and A[r] becomes undefined, marked _):
     p               r
A :  5 3 8 1 6 4 7 _     x = 5

The first loop searches for the first item greater than the pivot (x), if there is such an element.

     i=p i i         r
A :  5 3 8 1 6 4 7 _     x = 5

We have found an item greater than the pivot (8). Variable j starts at the next item. From this moment p ≤ i < j ≤ r. We have cut A[p..r] into four sections: A[p..(i−1)], A[i..(j−1)], A[j..(r−1)], and A[r]. We have the following properties: A[p..(i−1)] ≤ x ∧ A[i..(j−1)] ≥ x, A[j..(r−1)] is unknown, and A[r] is undefined. Together with p ≤ i < j ≤ r this is an invariant of the second loop. (Notice that items equal to the pivot may be both in the first and in the second section of A[p..r].) Starting the run of the second loop, it turns out that A[j] = 1 must be exchanged with A[i] = 8 in order to add it to the first section of A[p..r] (containing the items ≤ than the pivot). Based on the invariant, A[i] ≥ x, so it can go to the end of the second section (containing the items ≥ than the pivot).

     p   i j         r
A :  5 3 8 1 6 4 7 _     x = 5

Now we exchange A[i] and A[j]. Thus the length of the first section of A[p..r] (containing the items ≤ than the pivot) has been increased by 1, while its second section (containing the items ≥ than the pivot) has been moved right by one position. So we increment variables i and j by 1, in order to keep the invariant of the loop.

Now A[j] = 6 ≥ x = 5, so we add A[j] to the second section of A[p..r], i.e. we increment j by 1.

And now A[j] = 4 < x = 5, so A[j] = 4 must be exchanged with A[i] = 8.

     p     i     j   r
A :  5 3 1 8 6 4 7 _     x = 5

Now we exchange A[i] and A[j]. Thus the length of the first section of A[p..r] (containing the items ≤ than the pivot) has again been increased by 1, while its second section (containing the items ≥ than the pivot) has been moved by one position. So we increment variables i and j by 1, in order to keep the invariant of the loop.

Now A[j] = 7 ≥ x = 5, so we add A[j] to the second section of A[p..r], i.e. we increment j by 1.

And now j = r, therefore the third (unknown) section of A[p..r] has disappeared.

     p       i     j=r
A :  5 3 1 4 6 8 7 _     x = 5

Now the first two sections cover A[p..(r−1)], and the second section is not empty because of the invariant i < j. Thus the pivot can be put in between the items ≤ than it and the items ≥ than it: first we move the first element of the second section (A[i] = 6) to the end of A[p..r], then we put the pivot into its previous place.

     p       i       j=r
A :  5 3 1 4 5 8 7 6

Now the partitioning of subarray A[p..r] has been completed. We return the position of the pivot (i), in order to inform procedure QS(A, p, r) which subarrays of A[p..r] must be sorted recursively.

In the other case of function partition (when the first loop reaches i = r), the pivot is a maximum of A[p..r]. This case is trivial. Let the reader consider it.

The time complexity of function partition is linear, because the two loops together perform r−p−1 or r−p iterations.

mT_quicksort(n), AT_quicksort(n) ∈ Θ(n lg n)
MT_quicksort(n) ∈ Θ(n²)
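A C++ sketch of this randomized quicksort, following the structogram of partition (our illustration, with int elements and std::rand as the random source):

#include <cstdlib>
#include <utility>

// Randomized partition of A[p..r] (0-based, inclusive): pick a random
// pivot, park it in A[r], grow a "<= pivot" section and a ">= pivot"
// section, and finally put the pivot in between them.
static int partition(int A[], int p, int r) {
    int i = p + std::rand() % (r - p + 1);  // random pivot position
    int x = A[i];                           // the pivot
    A[i] = A[r];                            // A[r] is now free
    i = p;
    while (i < r && A[i] <= x) ++i;         // first item greater than x
    if (i < r) {
        for (int j = i + 1; j < r; ++j)
            if (A[j] < x) std::swap(A[i++], A[j]);
        A[r] = A[i];                        // first ">= x" item to the end
        A[i] = x;                           // pivot into its final place
    } else {
        A[r] = x;                           // pivot is a maximum of A[p..r]
    }
    return i;
}

static void QS(int A[], int p, int r) {
    if (p < r) {
        int q = partition(A, p, r);
        QS(A, p, q - 1);
        QS(A, q + 1, r);
    }
}

void quicksort(int A[], int n) { QS(A, 0, n - 1); }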

It is known that on short arrays insertion sort is more efficient than the fast sorting methods (merge sort, heapsort, quicksort). Thus procedure QS(A, p, r) can be optimized by switching to insertion sort on short arrays. Because during partitioning we create a lot of short subarrays, the speed-up can be significant.

QS(A : T[] ; p, r : N)
  if p + c < r
    q := partition(A, p, r)
    QS(A, p, q − 1)
    QS(A, q + 1, r)
  else
    insertionSort(A, p, r)

c ∈ N is a constant. Its optimal value depends on many factors, but usually it is between 20 and 40.
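As a sketch, the hybrid version might be written in C++ as follows; partition is the function sketched above, and insertionSortRange is our hypothetical helper that sorts the range A[p..r] in place (both names are ours, not from the notes).

// Insertion sort of the range A[p..r] (0-based, inclusive).
void insertionSortRange(int A[], int p, int r) {
    for (int i = p + 1; i <= r; ++i) {
        int x = A[i], j = i - 1;
        while (j >= p && A[j] > x) { A[j + 1] = A[j]; --j; }
        A[j + 1] = x;
    }
}

const int c = 30;   // cutoff constant; usually between 20 and 40

// Hybrid quicksort: short subarrays are finished by insertion sort.
void QS2(int A[], int p, int r) {
    if (p + c < r) {
        int q = partition(A, p, r);   // partition as sketched above
        QS2(A, p, q - 1);
        QS2(A, q + 1, r);
    } else {
        insertionSortRange(A, p, r);
    }
}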

Exercise 4.1 How can merge sort be sped up in a similar way?

Considering its expected running time, quicksort is one of the most efficient sorting methods. However, it slows down if (for example) by chance function partition always selects the maximal element of the actual subarray. (Its time complexity is Θ(n²) in this case.) The probability of such cases is low. However, if we want to avoid such cases, we can pay attention to the depth of the recursion, and switch to heapsort or merge sort when the recursion becomes too deep (for example, deeper than 2·lg n).


5 Linked Lists

One-way lists can represent finite sequences of data. They consist of zero or more elements (i.e. nodes), where each element of the list contains some data and linking information.

5.1 One-way or singly linked lists

In singly linked lists each element contains a key and a next pointer referring to the next element of the list. A pointer referring to the front of the list identifies the list. The next pointer of the last element typically contains the so-called NULL pointer¹, which is the address of no object. (With some abstraction, the nonexisting object at address NULL can be called the NO object.) In this book E1 is the element type of one-way lists.

E1
  + key : T
  . . .  // satellite data may come here
  + next : E1*
  + E1() { next := NULL }

5.1.1 Simple one-way lists (S1L)

An S1L is identified by a pointer referring to its first element, or this pointer is NULL if the list is empty.

L1 = NULL

L1 → 9 → 16 → 4 → 1 /

L1 → 25 → 9 → 16 → 4 → 1 /

L1 → 25 → 9 → 16 → 4 /

Figure 3: Pointer L1 identifies simple one-way lists at different stages of an imaginary program (/ marks the NULL pointer). In the first line the S1L is empty.

¹ Except in cyclic lists.


S1L_length(L : E1*) : N
  n := 0
  p := L
  while p ≠ NULL
    n := n + 1
    p := p→next
  return n

If n is the length of list L, then the loop iterates n times, so T_S1L_length(n) = 1 + n, therefore T_S1L_length(n) ∈ Θ(n).
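In C++ the element type and the length function can be sketched like this (our illustration; the key type is fixed to int):

#include <cstddef>

struct E1 {                  // element type of one-way lists
    int key;
    E1* next;
    E1() : next(nullptr) {}  // satellite data could follow the key
};

// Length of a simple one-way list: follow the next pointers
// until NULL -- Theta(n) time for a list of n elements.
std::size_t S1L_length(const E1* L) {
    std::size_t n = 0;
    for (const E1* p = L; p != nullptr; p = p->next) ++n;
    return n;
}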

5.1.2 One-way lists with header node (H1L)

The header node (or simply header) of a list is an extra zeroth element of the list with an undefined key. A list with a header cannot be NULL; it always contains a header. The pointer identifying the list always refers to its header.

L2 → [•] /

L2 → [•] → 4 → 8 → 2 /

L2 → [•] → 17 → 4 → 8 → 2 /

L2 → [•] → 17 → 8 → 2 /

Figure 4: Pointer L2 identifies one-way lists with header ([•] marks the header node) at different stages of an imaginary program. In the first line the H1L is empty.

H1L_length(H : E1*) : N
  return S1L_length(H→next)

T_H1L_length(n) ∈ Θ(n), where n is the length of H1L H.

5.1.3 One-way lists with trailer node

Some one-way lists contain a trailer node instead of a header. A trailer node is always at the end of such a list. The data members (key and next) of the trailer node are typically undefined, and we have two pointers identifying the list: one of them refers to its first element, and the other to its trailer node. If the list is empty, both identifier pointers refer to its trailer node.


Such a list is ideal for representing a queue. Its method add(x : T) copies x into the key of the trailer node, and joins a new trailer node to the end of the list.

5.1.4 Handling one-way lists

follow(p, q : E1*)
  q→next := p→next
  p→next := q

out_next(p : E1*) : E1*
  q := p→next
  p→next := q→next
  q→next := NULL
  return q

T_follow, T_out_next ∈ Θ(1)
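Using the E1 struct sketched above, these two Θ(1) operations can be rendered in C++ as follows (our sketch):

// Insert (*q) after (*p) in a one-way list -- Theta(1).
void follow(E1* p, E1* q) {
    q->next = p->next;
    p->next = q;
}

// Unlink and return the element following (*p) -- Theta(1).
// Assumes p->next != nullptr; E1 is the struct sketched above.
E1* out_next(E1* p) {
    E1* q = p->next;
    p->next = q->next;   // bypass (*q)
    q->next = nullptr;   // detach it from the list
    return q;
}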

cut(L : E1* ; n : N) : E1*
  p := L
  while n > 1
    n := n − 1
    p := p→next
  q := p→next
  p→next := NULL
  return q

H1L_read() : E1*
  H := v := new E1
  while not end of input
    read(x)
    v := v→next := new E1
    v→key := x
    // satellite data may be read here
  return H

T_cut(n) ∈ Θ(n)

T_H1L_read(n) ∈ Θ(n)

where n is the length of the list.

26

Page 27: Algorithms and Data Structures I. Lecture Notesaszt.inf.elte.hu/~asvanyi/ds/AlgDs1LectureNotes.pdfAlgorithms and Data Structures I. Lecture Notes Tibor Ásványi Department of Computer

5.1.5 Sorting one-way lists: Insertion sort

H1L_insertionSort(H : E1*)
  s := H→next
  if s ≠ NULL
    u := s→next
    while u ≠ NULL
      if s→key ≤ u→key
        s := u
      else
        s→next := u→next
        p := H ; q := H→next
        while q→key ≤ u→key
          p := q ; q := q→next
        u→next := q ; p→next := u
      u := s→next
  else
    SKIP

mT_IS(n) ∈ Θ(n)

AT_IS(n), MT_IS(n) ∈ Θ(n²)

where n is the length of H1L H. Clearly procedure H1L_insertionSort(H : E1*) is stable.
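A C++ sketch of this list insertion sort, again using the E1 struct from above (our illustration, not code from the notes):

// Insertion sort of a one-way list with header node H. The sorted
// prefix ends at s; any key smaller than s->key is unlinked and
// relinked at its proper place, scanning from the header.
void H1L_insertionSort(E1* H) {
    E1* s = H->next;                      // last element of the sorted prefix
    if (s != nullptr) {
        E1* u = s->next;
        while (u != nullptr) {
            if (s->key <= u->key) {
                s = u;                    // u extends the sorted prefix
            } else {
                s->next = u->next;        // unlink u
                E1* p = H;
                E1* q = H->next;          // scan stops before s (s->key > u->key)
                while (q->key <= u->key) { p = q; q = q->next; }
                u->next = q;              // relink u between p and q
                p->next = u;
            }
            u = s->next;
        }
    }
}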

5.1.6 Cyclic one-way lists

The last element of the list does not contain NULL in its next field; instead, this pointer points back to the beginning of the list.

If a cyclic one-way list does not contain a header node and it is not empty, the next field of its last element points back to its first element. If it is empty, it is represented by the NULL pointer. A pointer identifying a nonempty cyclic one-way list typically refers to its last element.

If a cyclic one-way list has a header node, the next field of its last element points back to its header node. If it is empty, the next field of its header node points to itself. Notice that the header of a cyclic list is also the trailer node of that list.

Such lists are also good choices for representing queues. Given a queue represented by a cyclic list with header, the method add(x : T) copies x into the key of the trailer/header node, and inserts a new trailer/header node into the list.


5.2 Two-way or doubly linked lists

In a two-way list each element contains a key and two pointers: a next and a prev pointer, referring to the next and previous elements of the list. A pointer referring to the front of the list identifies the list.

5.2.1 Simple two-way lists (S2L)

An S2L is identified by a pointer referring to its first element, or this pointer is NULL if the list is empty.

The prev pointer of the first element and the next pointer of the last element contain the NULL pointer.

L1 = NULL

L1 → / 9 ⇄ 16 ⇄ 4 ⇄ 1 /

L1 → / 25 ⇄ 9 ⇄ 16 ⇄ 4 ⇄ 1 /

L1 → / 25 ⇄ 9 ⇄ 16 ⇄ 1 /

Figure 5: Pointer L1 identifies simple two-way lists (S2L) at different stages of an imaginary program. Each element consists of prev, key, and next fields; / marks a NULL pointer. In the first line list L1 is initialized to empty.

Handling S2Ls is inconvenient, because all the modifications of an S2L must be done differently at the front of the list, at the end of the list, and in between. Consequently we prefer cyclic two-way lists (C2L) in general. (In hash tables, however, S2Ls are usually preferred to C2Ls.)

5.2.2 Cyclic two-way lists (C2L)

A cyclic two-way list (C2L) contains a header node (i.e. header) by default, and it is identified by a pointer referring to its header. The next field of its last element points back to its header, which is considered a zeroth element. The prev pointer of its first element points back to its header, and the prev pointer of its header points back to its last element. If a C2L is empty, both the prev and next fields of its header point to itself. Notice that the header of a cyclic list is also the trailer node of the list.

The element type of C2Ls follows.


L2 → [•]   (empty: both pointers of the header point to itself)

L2 → [•] ⇄ 9 ⇄ 16 ⇄ 4 ⇄ 1 ⇄ back to [•]

L2 → [•] ⇄ 25 ⇄ 9 ⇄ 16 ⇄ 4 ⇄ 1 ⇄ back to [•]

L2 → [•] ⇄ 25 ⇄ 9 ⇄ 16 ⇄ 4 ⇄ back to [•]

Figure 6: Pointer L2 identifies cyclic two-way lists (C2L) at different stages of an imaginary program ([•] marks the header node). In the first line list L2 is empty.

E2
  + prev, next : E2*  // refer to the previous and next neighbours, or to this node itself
  + key : T
  + E2() { prev := next := this }

Some basic operations on C2Ls follow. Notice that these are simpler than the corresponding operations on S2Ls.

precede(q, r : E2*)
  // insert (∗q) before (∗r)
  p := r→prev
  q→prev := p ; q→next := r
  p→next := r→prev := q

follow(p, q : E2*)
  // after (∗p) insert (∗q)
  r := p→next
  q→prev := p ; q→next := r
  p→next := r→prev := q

out(q : E2*)
  // remove (∗q); (∗q) becomes a singleton
  p := q→prev ; r := q→next
  p→next := r ; r→prev := p
  q→prev := q→next := q
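The corresponding C++ sketch (our illustration; the key type is fixed to int):

struct E2 {                       // element type of cyclic two-way lists
    E2 *prev, *next;
    int key;
    E2() { prev = next = this; }  // a lone node forms an empty C2L
};

// Insert (*q) before (*r) -- Theta(1)
void precede(E2* q, E2* r) {
    E2* p = r->prev;
    q->prev = p; q->next = r;
    p->next = r->prev = q;
}

// Insert (*q) after (*p) -- Theta(1)
void follow(E2* p, E2* q) {
    E2* r = p->next;
    q->prev = p; q->next = r;
    p->next = r->prev = q;
}

// Remove (*q) from its list; (*q) becomes a singleton -- Theta(1)
void out(E2* q) {
    E2* p = q->prev;
    E2* r = q->next;
    p->next = r; r->prev = p;
    q->prev = q->next = q;
}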


5.2.3 Example programs on C2Ls

build(&H : E2*)
  H := new E2
  while not end of input
    read(x)
    p := new E2
    p→key := x
    precede(p, H)

insertionSort(H : E2*)
  s := H→next ; u := s→next
  while u ≠ H
    if s→key ≤ u→key
      s := u
    else
      out(u)
      p := s→prev
      while p ≠ H ∧ p→key > u→key
        p := p→prev
      follow(p, u)
    u := s→next

T_build(n) ∈ Θ(n)

mT_IS(n) ∈ Θ(n)

AT_IS(n), MT_IS(n) ∈ Θ(n²)

where n is the length of C2L H. Clearly procedure insertionSort(H : E2*) is stable.


6 Trees, binary trees

Up till now, we worked with arrays and linked lists. They have a common property: there is a first item in the data structure, and each item has exactly one other item next to it, except the last element, which has no successor, according to the following scheme: O → O → O → O → O.

Therefore arrays and linked lists are called linear data structures. (Although in the case of cyclic lists the linear data structure is circular.) Thus arrays and linked lists can be considered representations of linear graphs.

6.1 General notions

At an abstract level, trees are also special finite graphs. They consist of nodes (i.e. vertices) and edges (i.e. arcs). If the tree is empty, it contains neither nodes nor edges. The empty binary tree is denoted by NULL. If a tree is nonempty, it always contains a root node.² Each node of the graph has zero or more immediate successor nodes, which are called its children, and it is called their parent. There is a directed edge going from the parent to each of its children. If a node has no child, it is called a leaf. The root has no parent. Each other node of the tree has exactly one parent. The non-leaf nodes are also called internal nodes.

Notice that a linear data structure can be considered a tree where each internal node has a single child. Such trees will be called linear trees.

The descendants of a node are its children and the descendants of its children. The ancestors of a node are its parent and the ancestors of its parent. The root (node) has no ancestor, but each other node of the tree is a descendant of it; thus the root is an ancestor of each other node in the tree, and the root of a tree identifies the whole tree. The leaves have no descendants.

Traditionally a tree is drawn in a reversed manner: the topmost node is the root, and the edges go downwards (so the arrows at the ends of the edges are often omitted). See Figure 7. Given a tree, it is a subtree of itself. If the tree is not empty, its other subtrees are the subtrees of the trees rooted at its children (recursively).

The size of a tree is the number of nodes in the tree. The size of tree t is denoted by n(t) or |t| (n(t) = |t|). The number of internal nodes of t is i(t). The number of leaves of t is l(t). Clearly n(t) = i(t) + l(t) for any tree. For example, n(NULL) = 0, and considering Figure 7, n(t1) = i(t1) + l(t1) = 1 + 2 = 3,

² In this chapter the trees are always rooted, directed trees. We do not consider free trees here, which are undirected connected graphs without cycles.


t1, t2, t3, t4 (tree diagrams)

Figure 7: Simple binary trees. The nodes of the trees are represented by circles. If the structure of a subtree is not important or unknown, we denote it with a triangle.

n(t2) = i(t2) + l(t2) = 1 + 1 = 2, n(t3) = i(t3) + l(t3) = 0 + 1 = 1, and n(t4) = 2 + n(t4→left), where t4→left is the unknown left subtree of t4.

We can speak of the levels of a nonempty tree. The root is at level zero (the topmost level), its children are at level one, its grandchildren at level two, and so on. Given a node at level i, its children are at level i + 1 (always in the downwards direction).

The height of a nonempty tree is equal to the level of the leaves at its lowest level. The height of the empty tree is −1. The height of tree t is denoted by h(t). For example, h(NULL) = −1, and considering Figure 7, h(t1) = 1, h(t2) = 1, h(t3) = 0, and h(t4) = 1 + max(h(t4→left), 0), where t4→left is the unknown left subtree of t4.

All hierarchical structures can be modeled by trees, for example the directory hierarchy of a computer. Typically, each node of a tree is labeled by some key, and maybe with other data.

Sources: [1] Chapter 10, 6, 12; [6] 4.4; [3] 6-7

6.2 Binary trees

Binary trees are useful, for example, for representing sets and multisets (i.e. bags), like dictionaries and priority queues.

A binary tree is a tree where each internal (i.e. non-leaf) node has at most two children. If a node has two children, they are the left child and the right child of their parent. If a node has exactly one child, then this child is either the left or the right child of its parent. This distinction is important.

If t is a nonempty binary tree (i.e. t ≠ NULL), then ∗t is the root node of the tree, t→key is the key labeling ∗t, t→left is the left subtree of


t, and t→right is the right subtree of t. If ∗t does not have a left child, t→left = NULL, i.e. the left subtree of t is empty. Similarly, if ∗t does not have a right child, t→right = NULL, i.e. the right subtree of t is empty.

If ∗p is a node of a binary tree, p is the (sub)tree rooted at ∗p; thus p→key is its key, p→left is its left subtree, and p→right is its right subtree. If ∗p has a left child, it is ∗p→left. If ∗p has a right child, it is ∗p→right. If ∗p has a parent, it is ∗p→parent. The tree rooted at the parent is p→parent. (Notice that the infix operator → binds stronger than the prefix operator ∗. For example, ∗p→left = ∗(p→left).) If ∗p does not have a parent, p→parent = NULL.

If p = NULL, all the expressions ∗p, p→key, p→left, p→right, p→parent, etc. are erroneous.

Properties 6.1

h(NULL) = −1

t ≠ NULL binary tree =⇒ h(t) = 1 + max(h(t→left), h(t→right))

A strictly binary tree is a binary tree where each internal node has two children. The following property can be proved easily by induction on i(t).

Property 6.2 Given a t ≠ NULL strictly binary tree, l(t) = i(t) + 1. Thus n(t) = 2i(t) + 1 ∧ n(t) = 2l(t) − 1.

A complete binary tree is a strictly binary tree where all the leaves are at the same level. It follows that in a complete nonempty binary tree of height h we have 2⁰ nodes at level 0, 2¹ nodes at level 1, and 2^i nodes at level i (0 ≤ i ≤ h). Therefore we get the following property.

Property 6.3 Given a t ≠ NULL complete binary tree (n = n(t), h = h(t)):
n = 2⁰ + 2¹ + . . . + 2^h = 2^(h+1) − 1

A nonempty nearly complete binary tree is a binary tree which becomes complete if we delete its lowest level. The empty tree is also a nearly complete binary tree. (An equivalent definition: a nearly complete binary tree is a binary tree which can be obtained from a complete binary tree by deleting zero or more nodes from its lowest level.) Notice that the complete binary trees are also nearly complete, according to this definition.

Because a nonempty nearly complete binary tree is complete except possibly at its lowest level h, it has 2^h − 1 nodes at levels 0, . . . , h−1, and at least 1 node and at most 2^h nodes at its lowest level. Thus n ≥ 2^h − 1 + 1 ∧ n ≤ 2^h − 1 + 2^h, where n is the size of the tree.


Properties 6.4 Given a t ≠ NULL nearly complete binary tree (n = n(t), h = h(t)). Then n ∈ 2^h .. (2^(h+1) − 1). Thus h = ⌊lg n⌋.

Now we consider the relations between the size n and the height h of a nonempty binary tree. These will be important for binary search trees, which are good representations of data sets stored in RAM.

Clearly, a nonempty binary tree of height h has the largest possible number of nodes if it is complete. Therefore from Property 6.3 we receive that for the size n of any nonempty binary tree, n < 2^(h+1). Thus lg n < lg(2^(h+1)) = h + 1,³ so ⌊lg n⌋ < h + 1. Finally ⌊lg n⌋ ≤ h.

On the other hand, a nonempty binary tree of height h has the least number of nodes if it contains just one node at each level (i.e. it is linear), and so for the size n of any nonempty binary tree, n ≥ h + 1. Thus h ≤ n − 1.

Property 6.5 Given a t ≠ NULL binary tree (n = n(t), h = h(t)):
⌊lg n⌋ ≤ h ≤ n − 1,
where h = ⌊lg n⌋ if the tree is nearly complete, and h = n − 1 iff the tree is linear (see section 6.1).

Notice that there are binary trees with h = ⌊lg n⌋, although they are not nearly complete.

6.3 Linked representations of binary trees

The empty tree is represented by a null pointer, i.e. NULL. A nonempty binary tree is identified by a pointer referring to a Node, which represents the root of the tree:

Node
  + key : T  // T is some known type
  + left, right : Node*
  + Node() { left := right := NULL }  // generate a tree of a single node
  + Node(x : T) { left := right := NULL ; key := x }

Sometimes it is useful if we also have a parent pointer in the nodes of the tree. We can see a binary tree with parent pointers in Figure 8.

³ Remember the definition of lg n given in section 1: lg n = log₂ n if n > 0, and lg 0 = 0.


(Tree diagram omitted in this text version.)

Figure 8: Binary tree with parent pointers. (We omitted the keys of the nodes here. The null pointers are represented by / signs in this figure.)

Node3
  + key : T  // T is some known type
  + left, right, parent : Node3*
  + Node3(p : Node3*) { left := right := NULL ; parent := p }
  + Node3(x : T ; p : Node3*) { left := right := NULL ; parent := p ; key := x }

6.4 Binary tree traversals

preorder(t : Node*)
  if t ≠ NULL
    process(t)
    preorder(t→left)
    preorder(t→right)
  else
    SKIP

inorder(t : Node*)
  if t ≠ NULL
    inorder(t→left)
    process(t)
    inorder(t→right)
  else
    SKIP


postorder(t : Node*)
  if t ≠ NULL
    postorder(t→left)
    postorder(t→right)
    process(t)
  else
    SKIP

levelOrder(t : Node*)
  if t ≠ NULL
    Q : Queue ; Q.add(t)
    while ¬Q.isEmpty()
      s := Q.rem()
      process(s)
      if s→left ≠ NULL
        Q.add(s→left)
      if s→right ≠ NULL
        Q.add(s→right)
  else
    SKIP

T_preorder(n), T_inorder(n), T_postorder(n), T_levelOrder(n) ∈ Θ(n), where n = n(t) and the time complexity of process(t) is Θ(1). For example, process(t) can print t→key:

process(t)
  cout << t→key << ' '
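In C++ the recursive traversals and the queue-based level-order traversal can be sketched as follows (our illustration; std::queue replaces the notes' own Queue type):

#include <iostream>
#include <queue>

struct Node {                       // linked representation of a binary tree
    int key;
    Node *left, *right;
    explicit Node(int x) : key(x), left(nullptr), right(nullptr) {}
};

void process(Node* t) { std::cout << t->key << ' '; }

void preorder(Node* t) {            // root, then left and right subtrees
    if (t != nullptr) {
        process(t);
        preorder(t->left);
        preorder(t->right);
    }
}

void levelOrder(Node* t) {          // breadth-first, using a FIFO queue
    if (t == nullptr) return;
    std::queue<Node*> Q;
    Q.push(t);
    while (!Q.empty()) {
        Node* s = Q.front(); Q.pop();
        process(s);
        if (s->left  != nullptr) Q.push(s->left);
        if (s->right != nullptr) Q.push(s->right);
    }
}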

preorder:              inorder:
        1                      4
      /   \                  /   \
     2     5                2     6
    / \   / \              / \   / \
   3   4 6   7            1   3 5   7

postorder:             level order:
        7                      1
      /   \                  /   \
     3     6                2     3
    / \   / \              / \   / \
   1   2 4   5            4   5 6   7

Figure 9: Left upper part: preorder, right upper part: inorder, left lower part: postorder, right lower part: level order traversal of binary tree t. (The node labels give the order in which the nodes are processed.)


We can slightly reduce the running times of preorder and inorder traversals if we eliminate the tail recursions. (It does not affect the time complexities.)

preorder(t : Node*)
  while t ≠ NULL
    process(t)
    preorder(t→left)
    t := t→right

inorder(t : Node*)
  while t ≠ NULL
    inorder(t→left)
    process(t)
    t := t→right

6.4.1 An application of traversals: the height of a binary tree

Postorder:

h(t : Node*) : Z
  if t ≠ NULL
    return 1 + max(h(t→left), h(t→right))
  else
    return −1

Preorder:

h(t : Node*) : Z
  max := level := −1
  preorder_h(t, level, max)
  return max

preorder_h(t : Node* ; level, &max : Z)
  while t ≠ NULL
    level := level + 1
    if level > max
      max := level
    else
      SKIP
    preorder_h(t→left, level, max)
    t := t→right


6.4.2 Using parent pointers

inorder_next(p : Node3*) : Node3*
  q := p→right
  if q ≠ ∅
    while q→left ≠ ∅
      q := q→left
  else
    q := p→parent
    while q ≠ ∅ ∧ q→left ≠ p
      p := q ; q := q→parent
  return q

MT_inorder_next(h) ∈ O(h), where h = h(t).

6.5 Binary search trees

Binary search trees (BST) are a particular type of containers: data structures that store "items" (such as numbers, names etc.) in memory. They allow fast lookup, addition and removal of items, and can be used to implement either dynamic sets of items, or lookup tables that allow finding an item by its key (e.g., finding the phone number of a person by name).

Binary search trees keep their keys in sorted order, so that lookup and other operations can use the principle of binary search: when looking for a key in a tree (or a place to insert a new key), they traverse the tree from root to leaf, making comparisons to keys stored in the nodes of the tree and deciding, on the basis of the comparison, to continue searching in the left or right subtrees. On average, this means that each comparison allows the operations to skip about half of the tree, so that each lookup, insertion or deletion takes time proportional to the logarithm of the number of items stored in the tree. This is much better than the linear time required to find items by key in an (unsorted) array, but slower than the corresponding operations on hash tables.

(The source of the text above is Wikipedia.)

A binary search tree (BST) is a binary tree whose nodes each store a key (and optionally, some associated value). The tree additionally satisfies the binary search tree property, which states that the key in each node must be greater than any key stored in the left sub-tree, and less than any key stored in the right sub-tree. (Notice that there is no duplicated key in a BST.)



inorderPrint(t : Node*)
  if t ≠ ∅
    inorderPrint(t→left)
    write(t→key)
    inorderPrint(t→right)
  else SKIP

Property 6.6 Procedure inorderPrint(t : Node*) prints the keys of binary tree t in strictly increasing order, iff binary tree t is a search tree.

search(t : Node* ; k : T) : Node*
  while t ≠ ∅ ∧ t→key ≠ k
    if k < t→key
      t := t→left
    else
      t := t→right
  return t

insert(&t : Node* ; k : T)
  if t = ∅
    t := new Node(k)
  else if k < t→key
    insert(t→left, k)
  else if k > t→key
    insert(t→right, k)
  else // k = t→key
    SKIP

min(t : Node*) : Node*
  while t→left ≠ ∅
    t := t→left
  return t

remMin(&t, &minp : Node*)
  if t→left = ∅
    minp := t
    t := minp→right
    minp→right := ∅
  else
    remMin(t→left, minp)

del(&t : Node* ; k : T)
  if t ≠ ∅
    if k < t→key
      del(t→left, k)
    else if k > t→key
      del(t→right, k)
    else // k = t→key
      delRoot(t)
  else SKIP



delRoot(&t : Node*)
  if t→left = ∅
    p := t
    t := p→right
    delete p
  else if t→right = ∅
    p := t
    t := p→left
    delete p
  else // t→left ≠ ∅ ∧ t→right ≠ ∅
    remMin(t→right, p)
    p→left := t→left ; p→right := t→right
    delete t ; t := p

MT_search(h), MT_insert(h), MT_min(h) ∈ Θ(h) and MT_remMin(h), MT_del(h), MT_delRoot(h) ∈ Θ(h), where h = h(t).
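A C++ sketch of search and insert may help in reading the structograms above (Node as in section 6.3, with int keys; the C++ reference parameter plays the role of &t):

    // Iterative BST search: returns the node holding k, or nullptr.
    Node* search(Node *t, int k) {
        while (t != nullptr && t->key != k)
            t = (k < t->key) ? t->left : t->right;
        return t;
    }

    // Recursive BST insert. Inserting into an empty subtree overwrites
    // the parent's child pointer through the reference parameter t.
    void insert(Node *&t, int k) {
        if (t == nullptr)     t = new Node(k);
        else if (k < t->key)  insert(t->left, k);
        else if (k > t->key)  insert(t->right, k);
        /* else k = t->key: duplicate keys are not stored */
    }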

6.6 Level-continuous binary trees, and heaps

A binary tree is level-continuous iff all levels of the tree, except possibly the last (deepest) one, are completely filled (i.e. it is nearly complete), and, if the last level of the tree is not complete, the nodes of that level are filled from left to right.

A binary maximum heap is a level-continuous binary tree where the key stored in each node is greater than or equal to (≥) the keys in the node's children.

A binary minimum heap is a level-continuous binary tree where the key stored in each node is less than or equal to (≤) the keys in the node's children.

In these lecture notes a heap is a binary maximum heap by default.

Priority queues are usually represented by heaps in arithmetic representation.

6.7 Arithmetic representation of level-continuous binary trees

A level-continuous binary tree of size n is typically represented by the first n elements of an array: we put the nodes of the tree into the array in level order.

Let us suppose that we use the first n elements of an array indexed from one, for example A : T[m], i.e. we use the subarray A[1..n]. Provided that n > 0, the root node of the tree is A[1]. Node A[i] is an internal node iff 2i ≤ n. Then the left child of the internal node A[i] exists, and it is A[2i]. The right child of node A[i] exists iff 2i < n. Then the right child of A[i] is A[2i+1]. Node A[j] is not the root node of the tree iff j > 1. Then its parent is A[⌊j/2⌋].



Let us suppose that we use the first n elements of an array indexed from zero, for example Z : T[m], i.e. we use the subarray Z[0..(n−1)]. Then the root node of a nonempty tree is Z[0]. Node Z[i] is an internal node iff 2i+1 < n. Then the left child of the internal node Z[i] exists, and it is Z[2i+1]. The right child of node Z[i] exists iff 2i+2 < n. Then the right child of Z[i] is Z[2i+2]. Node Z[j] is not the root node of the tree iff j > 0. Then its parent is Z[⌊(j−1)/2⌋].
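The index arithmetic of the two conventions can be collected into a few helper functions; a sketch (the validity conditions are the ones stated above, and the function names are illustrative):

    // 1-based representation A[1..n]:
    int leftChild1 (int i) { return 2 * i;       }  // valid iff 2i <= n
    int rightChild1(int i) { return 2 * i + 1;   }  // valid iff 2i + 1 <= n
    int parent1    (int j) { return j / 2;       }  // valid iff j > 1

    // 0-based representation Z[0..n-1]:
    int leftChild0 (int i) { return 2 * i + 1;   }  // valid iff 2i + 1 < n
    int rightChild0(int i) { return 2 * i + 2;   }  // valid iff 2i + 2 < n
    int parent0    (int j) { return (j - 1) / 2; }  // valid iff j > 0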

6.8 Heaps and priority queues

A priority queue is a bag (i.e. multiset). We can add a new item to it, and we can check or remove a maximal item of it.⁴ Often there is a priority function: T → some number type, where T is the element type of the priority queue, and the items are compared according to their priorities. For example, the processes of a computer are often scheduled according to their priorities.

Our representation is similar to our representation of the Stack type, and some operations are also the same, except for the names. The actual elements of the priority queue are in the subarray A[1..n], containing a (binary maximum) heap.

PrQueue
− A : T[] // T is some known type
− n : 0..A.M // n is the actual length of the priority queue
+ PrQueue(m : N) { A := new T[m] ; n := 0 } // create an empty priority queue
+ add(x : T) // insert x into the priority queue
+ remMax() : T // remove and return the maximal element of the priority queue
+ max() : T // return the maximal element of the priority queue
+ isFull() : B { return n = A.M }
+ isEmpty() : B { return n = 0 }
+ ∼PrQueue() { delete A }
+ setEmpty() { n := 0 } // reinitialize the priority queue

Provided that subarray A[1..n] is the arithmetic representation of a heap: MT_add(n) ∈ Θ(lg n), mT_add(n) ∈ Θ(1), MT_remMax(n) ∈ Θ(lg n), mT_remMax(n) ∈ Θ(1), T_max(n) ∈ Θ(1).

MT_add(n) ∈ Θ(lg n) and MT_remMax(n) ∈ Θ(lg n), because the main loops of subprograms "add" and "sink" below iterate at most h times, where h is the height of the heap and h = ⌊lg n⌋.

⁴There are also min priority queues where we can check or remove a minimal item.



PrQueue::add(x : T)
  if n < A.M
    j := n := n + 1
    A[j] := x ; i := ⌊j/2⌋
    while i > 0 ∧ A[i] < A[j]
      swap(A[i], A[j])
      j := i ; i := ⌊i/2⌋
  else
    throw PrQueueOverflow

PrQueue::max() : T
  if n > 0
    return A[1]
  else
    throw PrQueueUnderflow

PrQueue::remMax() : T
  if n > 0
    max := A[1] ; A[1] := A[n]
    n := n − 1
    sink(A, 1, n)
    return max
  else
    throw PrQueueUnderflow

sink(A : T[] ; k, n : N)
  i := k ; j := 2k ; b := true
  while j ≤ n ∧ b // A[j] is the left child
    if j < n ∧ A[j+1] > A[j]
      j := j + 1
    else SKIP
    // A[j] is the greater child
    if A[i] < A[j]
      swap(A[i], A[j])
      i := j ; j := 2j
    else
      b := false
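A C++ transcription of sink might look as follows (a sketch for int keys; the array is used 1-based, so the caller leaves A[0] unused):

    #include <utility>

    // Restore the heap property at index k of the 1-based heap A[1..n],
    // assuming both subtrees of A[k] are already heaps.
    void sink(int A[], int k, int n) {
        int i = k, j = 2 * k;
        bool b = true;
        while (j <= n && b) {               // A[j] is the left child
            if (j < n && A[j + 1] > A[j])
                ++j;                        // A[j] is the greater child
            if (A[i] < A[j]) {
                std::swap(A[i], A[j]);
                i = j; j = 2 * j;
            } else {
                b = false;                  // heap property restored
            }
        }
    }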



6.9 Heapsort

MT_heapsort(n) ∈ O(n lg n), because MT_buildMaxHeap(n) ∈ O(n lg n) and the time complexity of the main loop of heapSort is also O(n lg n), where n = A.M.

heapSort(A : T[])
  buildMaxHeap(A)
  m := A.M
  while m > 1
    swap(A[1], A[m]) ; m := m − 1
    sink(A, 1, m)

While building the heap we consider the level order of the tree, and we start from the last internal node of it. We go back in level order and sink the root node of each subtree, making a heap out of it.

The time complexity of each sinking is O(lg n), because h ∈ O(lg n) for each subtree. Thus MT_buildMaxHeap(n) ∈ O(n lg n), and the time complexity of the main loop of heapSort is also O(n lg n), where n = A.M.

buildMaxHeap(A : T[])
  for k := ⌊A.M/2⌋ downto 1
    sink(A, k, A.M)
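Putting the pieces together, a hedged C++ sketch of buildMaxHeap and heapSort (on a 1-based int array, reusing the sink function sketched above):

    // Build a maximum heap in A[1..n] by sinking each internal node,
    // going backwards in level order from the last internal node.
    void buildMaxHeap(int A[], int n) {
        for (int k = n / 2; k >= 1; --k)
            sink(A, k, n);
    }

    void heapSort(int A[], int n) {
        buildMaxHeap(A, n);
        for (int m = n; m > 1; --m) {
            std::swap(A[1], A[m]);   // move the maximum to its final place
            sink(A, 1, m - 1);       // repair the heap A[1..m-1]
        }
    }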




Figure 10: From array 〈18, 23, 51, 42, 14, 35, 97, 53, 60, 19〉 we build a maximum heap. Notice that we work on the array from the beginning to the end. The binary trees show the logical structure of the array. Finally we receive array 〈97, 60, 51, 53, 19, 35, 18, 23, 42, 14〉.


7 Lower bounds for sorting

Theorem 7.1 For any sorting algorithm, mT(n) ∈ Ω(n).

Proof. Clearly we have to check all the n items, and only a limited number of items is checked without starting a new subprogram call or loop iteration. Let this limit be k. Then

mT(n) ≥ (n − k)/k = (1/k)·n − 1  ⟹  mT(n) ∈ Ω(n). □

7.1 Comparison sorts and the decision tree model

Definition 7.2 A sorting algorithm is comparison sort iff it gains information about the sorted order of the input items only by comparing them. That is, given two items ai and aj, in order to compare them it performs one of the tests ai < aj, ai ≤ aj, ai = aj, ai ≠ aj, ai ≥ aj, or ai > aj. We may not inspect the values of the elements or gain order information about them in any other way. [1]

The sorting algorithms we have studied before, that is, insertion sort, heapsort, merge sort, and quicksort, are comparison sorts.

In this section, we assume without loss of generality that all the input elements are distinct⁵. Given this assumption, comparisons of the form ai = aj and ai ≠ aj are useless, so we can assume that no comparisons of this form are made⁶. We also note that the comparisons ai < aj, ai ≤ aj, ai ≥ aj, and ai > aj are all equivalent in that they yield identical information about the relative order of ai and aj. We therefore assume that all comparisons have the form ai ≤ aj. [1]

We can view comparison sorts abstractly in terms of decision trees. A decision tree is a strictly binary tree that represents the comparisons between elements that are performed by a particular sorting algorithm operating on an input of a given size. Control, data movement, and all other aspects of the algorithm are ignored. [1]

Let us suppose that 〈a1, a2, ..., an〉 is the input sequence to be sorted. In a decision tree, each internal node is labeled by ai ≤ aj for some elements of the input. We also annotate each leaf by a permutation of 〈a1, a2, ..., an〉. The execution of the sorting algorithm corresponds to tracing a simple path from the root of the decision tree down to a leaf. Each internal node indicates a comparison ai ≤ aj. The left subtree then dictates subsequent comparisons once we know that ai ≤ aj, and the right subtree dictates subsequent comparisons knowing that ai > aj. When we come to a leaf, the sorting algorithm has established the appropriate ordering of 〈a1, a2, ..., an〉. Because any correct sorting algorithm must be able to produce each permutation of its input, each of the n! permutations on n elements must appear as one of the leaves of the decision tree for a comparison sort to be correct. Thus, we shall consider only decision trees in which each permutation appears as a leaf of the tree. [1]

⁵In this way we restrict the set of possible inputs, and we are going to give a lower bound for the worst case of comparison sorts. Thus, if we give a lower bound for the maximum number of key comparisons (MC(n)) and for the maximum time complexity (MT(n)) on this restricted set of input sequences, it is also a lower bound for them on the whole set of input sequences, because MC(n) and MT(n) are surely ≥ on a larger set than on a smaller set.

⁶Anyway, if such comparisons are made, neither MC(n) nor MT(n) is decreased.

7.2 A lower bound for the worst case

Theorem 7.3 Any comparison sort algorithm requires MC(n) ∈ Ω(n lg n) comparisons in the worst case.

Proof. From the preceding discussion, it suffices to determine the height h = MC(n) of a decision tree in which each permutation appears as a reachable leaf. Consider a decision tree of height h with l leaves corresponding to a comparison sort on n elements. Because each of the n! permutations of the input appears as some leaf, we have n! ≤ l. Since a binary tree of height h has no more than 2^h leaves, we have n! ≤ l ≤ 2^h. [1]

Consequently

MC(n) = h ≥ lg n! = Σ_{i=1..n} lg i ≥ Σ_{i=⌈n/2⌉..n} lg i ≥ Σ_{i=⌈n/2⌉..n} lg⌈n/2⌉ ≥ ⌈n/2⌉ · lg⌈n/2⌉ ≥
≥ (n/2) · lg(n/2) = (n/2) · (lg n − lg 2) = (n/2) · (lg n − 1) = (n/2)·lg n − n/2 ∈ Ω(n lg n). □

Theorem 7.4 For any comparison sort algorithm, MT(n) ∈ Ω(n lg n).

Proof. Only a limited number of key comparisons are performed without starting a new subprogram call or loop iteration. Let this limit be k. Then

MT(n) ≥ (MC(n) − k)/k = (1/k)·MC(n) − 1  ⟹  MT(n) ∈ Ω(MC(n)).

Together with Theorem 7.3 and transitivity we receive this theorem. □

Let us notice that heapsort and merge sort are asymptotically optimal, in the sense that their O(n lg n) upper bound meets the Ω(n lg n) lower bound from Theorem 7.4. This proves that MT(n) ∈ Θ(n lg n) for both of them.



8 Sorting in Linear Time

8.1 Radix sort

The abstract code for radix sort is straightforward. We assume that each element of abstract list A is a natural number of d digits, where digit 1 is the lowest-order digit and digit d is the highest-order digit.

radix_sort(A : dDigitNumber〈〉 ; d : N)
  for i := 1 to d
    use a stable sort to sort list A on digit i

Radix sort solves the problem of sorting counterintuitively: by sorting on the least significant (i.e. first) digit in its first pass.

In the second pass it takes the result of the first pass and applies a stable sort to sort it on the second digit. It is important to apply stable sort, so that those numbers with equal second digit remain sorted according to their first digit. Consequently, after the second pass the numbers are already sorted on the first two digits (which are the two lowest-order digits).

In the third pass radix sort takes the result of the second pass and applies a stable sort to sort it on the third digit. It is important to apply stable sort, so that those numbers with equal third digit remain sorted according to their first two digits. Consequently, after the third pass the numbers are already sorted on the first three digits (which are the three lowest-order digits).

This process continues until the items have been sorted on all d digits. Remarkably, at that point the items are fully sorted on the d-digit number. Thus, only d passes through the numbers are required to sort. [1]

The sorting method used in each pass can be any stable sort, but it should run in linear time in order to maintain efficiency.

Distributing sort works well on linked lists, and counting sort on arrays. Both of them are stable and work in linear time.

8.2 Distributing sort

Distributing sort is efficient on linked lists, and a version of radix sort can be built on it.

Remember that stable sorting algorithms maintain the relative order of records with equal keys (i.e. values).

Distributing sort is an ideal auxiliary method of radix sort because of its stability and linear time complexity.



Here comes a bit more general study than needed for radix sort. When distributing sort is used for radix sort, its key function ϕ must select the appropriate digit.

The sorting problem: given abstract list L of length n with element type T, a positive integer r ∈ O(n), and a key selection function ϕ : T → 0..(r−1).

Let us sort list L with stable sort, and with linear time complexity.

distributing_sort(L : T〈〉 ; r : N ; ϕ : T → 0..(r−1))
  Z : T〈〉[r] // array of lists, i.e. bins
  for k := 0 to r−1
    let Z[k] be empty list // init the array of bins
  while L is not empty
    remove the first element x of list L
    insert x at the end of Z[ϕ(x)] // stable distribution
  for k := r−1 downto 0
    L := Z[k] + L // append Z[k] before L

Efficiency of distributing sort: the first and the last loop iterate r times, and the middle loop n times. Provided that insertion and concatenation can be performed in Θ(1) time, the time complexity of distributing sort is consequently Θ(n + r) = Θ(n), because of the natural condition r ∈ O(n).
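On std::list, splice provides exactly the Θ(1) removal and concatenation assumed above, so distributing sort can be sketched in C++ like this (phi is the key selection function; the names are illustrative):

    #include <list>
    #include <vector>
    #include <functional>

    // Stable distributing sort of L into r bins; phi maps each item
    // to a bin index in 0..r-1.
    void distributing_sort(std::list<int> &L, int r,
                           const std::function<int(int)> &phi) {
        std::vector<std::list<int>> Z(r);       // the bins, initially empty
        while (!L.empty()) {                    // stable distribution
            int bin = phi(L.front());
            Z[bin].splice(Z[bin].end(), L, L.begin());
        }
        for (int k = r - 1; k >= 0; --k)        // gather: L := Z[k] + L
            L.splice(L.begin(), Z[k]);
    }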

8.3 Radix-Sort on lists

The next example shows how radix sort operates on an abstract list of seven 3-digit numbers with base (i.e. radix) 4. In each pass we apply distributing sort.

The input list (with a symbolic notation):
L = 〈103, 232, 111, 013, 211, 002, 012〉

First pass (according to the rightmost digits of the numbers):
Z0 = 〈〉
Z1 = 〈111, 211〉
Z2 = 〈232, 002, 012〉
Z3 = 〈103, 013〉
L = 〈111, 211, 232, 002, 012, 103, 013〉

Second pass (according to the middle digits of the numbers):
Z0 = 〈002, 103〉
Z1 = 〈111, 211, 012, 013〉
Z2 = 〈〉
Z3 = 〈232〉
L = 〈002, 103, 111, 211, 012, 013, 232〉

Third pass (according to the leftmost digits of the numbers):
Z0 = 〈002, 012, 013〉
Z1 = 〈103, 111〉
Z2 = 〈211, 232〉
Z3 = 〈〉
L = 〈002, 012, 013, 103, 111, 211, 232〉

In order for radix sort to work correctly, the digit sorts must be stable. Distributing sort satisfies this requirement. If distributing sort runs in linear time (Θ(n), where n is the length of the input list), and the number of digits is a constant d, radix sort also runs in linear time: Θ(d · n) = Θ(n).

Provided that we have to sort linked lists where the keys of the list elements are d-digit natural numbers with number base r, the implementation of the algorithm above is straightforward. For example, let us suppose that L is a C2L with header, and function digit(i, r, x) can extract the ith digit of number x (where digit 1 is the lowest-order digit and digit d is the highest-order digit) in Θ(1) time.

radix_sort( L : E2* ; d, r : N )
  BinHeadZ : E2[r] // the headers of the lists representing the bins
  Bz : E2*[r] // pointers to the headers
  for i := 0 to r − 1
    Bz[i] := &BinHeadZ[i] // initialize the ith pointer
  for i := 1 to d
    distribute(L, i, Bz) // distribute L on the ith digits of keys
    gather(Bz, L) // gather from the bins back into L



distribute( L : E2* ; i : N ; Bz : E2*[] )
  while L→next ≠ L
    p := L→next ; out(p)
    precede(p, Bz[digit(i, Bz.M, p→key)])

gather( Bz : E2*[] ; L : E2* )
  for i := 0 to Bz.M − 1
    append(L, Bz[i]) // add to the end of L the elements from Bz[i]

append( L, B : E2* )
  if B→next ≠ B
    p := L→prev ; q := B→next ; r := B→prev
    p→next := q ; q→prev := p
    r→next := L ; L→prev := r
    B→next := B ; B→prev := B
  else SKIP

Clearly, T_append ∈ Θ(1), so T_gather ∈ Θ(r), where r = Bz.M. And T_distribute(n) ∈ Θ(n), where n = |L|. Thus T_radix_sort(n, d, r) ∈ Θ(r + d(n + r)). Consequently, if d is constant and r ∈ O(n), then T_radix_sort(n) ∈ Θ(n).

In a typical computer, which is a sequential random-access machine, we sometimes use radix sort to sort records of information that are keyed by multiple fields. For example, we might wish to sort dates by three keys: year, month, and day. We could run a sorting algorithm with a comparison function that, given two dates, compares years, and if there is a tie, compares months, and if another tie occurs, compares days. Alternatively, we could sort the information three times with a stable sort: first on day, next on month, and finally on year. [1]

8.4 Counting sort

While the previous version of radix sort is efficient on linked lists, counting sort can be applied efficiently to arrays, and another version of radix sort can be built on it.

Remember that stable sorting algorithms maintain the relative order of records with equal keys (i.e. values).



Counting sort is stable, and it is an ideal auxiliary method of radix sort because of its linear time complexity.

Here comes a bit more general study than needed for radix sort. When counting sort is used for radix sort, its key function ϕ must select the appropriate digit.

The sorting problem: given array A : T[n], a positive integer r ∈ O(n), and a key selection function ϕ : T → 0..(r−1).

Let us sort array A with stable sort, and with linear time complexity, so that the result is produced in array B.

counting_sort(A, B : T[] ; r : N ; ϕ : T → 0..(r−1))
  Z : N[r] // counter array
  for k := 0 to r−1
    Z[k] := 0 // init the counter array
  for i := 1 to A.M
    Z[ϕ(A[i])]++ // count the items with the given key
  for k := 1 to r−1
    Z[k] += Z[k−1] // Z[k] := the number of items with key ≤ k
  for i := A.M downto 1
    k := ϕ(A[i]) // k := the key of A[i]
    B[Z[k]] := A[i] // let A[i] be the last of the unprocessed items with key k
    Z[k]−− // the next one with key k must be put before A[i]

The first loop of the procedure above assigns zero to each element of the counting array Z.

The second loop counts the number of occurrences of key k in Z[k], for each possible key k.

The third loop sums the number of occurrences of keys ≤ k, considering each possible key k.

The number of keys ≤ 0 is the same as the number of keys = 0, so the value of Z[0] is not changed in the third loop. Considering greater keys we have that the number of keys ≤ k equals the number of keys = k plus the number of keys ≤ k−1. Thus the new value of Z[k] can be computed by adding the new value of Z[k−1] to the old value of Z[k].

The fourth loop goes over the input array in reverse direction. We put the elements of input array A into output array B: considering any key k in the input array, first we process its last occurrence. The element containing it is put into the last place reserved for keys = k, i.e. into B[Z[k]]. Next we decrease Z[k] by one, and according to this reverse direction the next occurrence of key k is going to be the immediate predecessor of the actual item, etc. Thus the elements with the same key remain in their original order, and we receive a stable sort.

The time complexity is clearly Θ(n + r). Provided that r ∈ O(n), Θ(n + r) = Θ(n), and so T(n) ∈ Θ(n).
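A C++ sketch of counting sort (0-based arrays, so B[Z[k]−1] plays the role of the 1-based B[Z[k]]; phi selects the key in 0..r−1):

    #include <vector>
    #include <functional>

    // Stable counting sort of A into B (B.size() == A.size()).
    void counting_sort(const std::vector<int> &A, std::vector<int> &B,
                       int r, const std::function<int(int)> &phi) {
        std::vector<int> Z(r, 0);                    // counter array, all zero
        for (int x : A) ++Z[phi(x)];                 // count items with each key
        for (int k = 1; k < r; ++k) Z[k] += Z[k-1];  // Z[k] := #items with key <= k
        for (int i = (int)A.size() - 1; i >= 0; --i) {
            int k = phi(A[i]);
            B[Z[k] - 1] = A[i];                      // last free place for key k
            --Z[k];                                  // next one with key k goes before it
        }
    }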

Illustration of counting sort: we suppose that we have to sort numbers of two digits with number base 4 according to their right-side digit, i.e. function ϕ selects the rightmost digit.

The input:
      1   2   3   4   5   6
A:   02  32  30  13  10  12

The changes of counter array Z [the first column reflects the first loop initializing counter array Z to zero; the next six columns reflect the second loop counting the items with key k, for each possible key; the column labeled by ∑ reflects the third loop which sums up the number of items with keys ≤ k; and the last six columns reflect the fourth loop placing each item of the input array into its place in the output array]:

Z |   | 02 32 30 13 10 12 | ∑ | 12 10 13 30 32 02
0 | 0 |        1     2    | 2 |     1     0
1 | 0 |                   | 2 |
2 | 0 |  1  2          3  | 5 |  4           3  2
3 | 0 |           1       | 6 |        5

The output:
      1   2   3   4   5   6
B:   30  10  02  32  12  13

Now we suppose that the result of the previous counting sort is to be sorted according to the left-side digits of the numbers (of number base 4), i.e. function ϕ selects the leftmost digits of the numbers.

The input:
      1   2   3   4   5   6
B:   30  10  02  32  12  13

The changes of counter array Z : N[4]:



Z |   | 30 10 02 32 12 13 | ∑ | 13 12 32 02 10 30
0 | 0 |        1          | 1 |              0
1 | 0 |     1        2  3 | 4 |  3  2        1
2 | 0 |                   | 4 |
3 | 0 |  1        2       | 6 |        5        4

The output:
      1   2   3   4   5   6
A:   02  10  12  13  30  32

The first counting sort ordered the input according to the right-side digits of the numbers. The second counting sort ordered the result of the first sort according to the left-side digits of the numbers, using a stable sort. Thus in the final result the numbers with the same left-side digits remained in order according to their right-side digits. Consequently, in the final result the numbers are already sorted according to both of their digits.

Therefore the two counting sorts illustrated above form the first and second passes of a radix sort. And our numbers have just two digits now, so we have performed a complete radix sort in this example.

8.5 Radix-Sort on arrays ([1] 8.3)

The keys of array A are d-digit natural numbers with number base r. The rightmost digit is the least significant (digit 1), and the leftmost digit is the most significant (digit d).

radix_sort(A : dDigitNumber[] ; d : N)
  for i := 1 to d
    use a stable sort to sort array A on digit i

Provided that the stable sort is counting sort, the time complexity of radix sort is Θ(d(n + r)) (n = A.M). If d is a constant and r ∈ O(n), Θ(d(n + r)) = Θ(n), i.e. T(n) ∈ Θ(n).
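For illustration, the pass structure can be sketched in C++, reusing the counting_sort sketch of section 8.4 (div = r^(i−1) selects digit i; the names are illustrative, and the keys are assumed to be nonnegative):

    // LSD radix sort of d-digit natural numbers with number base r.
    void radix_sort(std::vector<int> &A, int d, int r) {
        std::vector<int> B(A.size());
        int div = 1;                                 // r^(i-1)
        for (int i = 1; i <= d; ++i) {
            counting_sort(A, B, r,
                          [div, r](int x) { return (x / div) % r; });
            A.swap(B);                               // result of this pass
            div *= r;
        }
    }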

8.6 Bucket sort

We suppose that the items to be sorted are elements of the real interval [0; 1). This algorithm is efficient if the keys of the input are equally distributed on [0; 1). We sort the buckets with some known sorting method like insertion sort or merge sort.



bucket_sort( L : list )
  n := the length of L
  Z : list[n] // create the buckets Z[0..(n−1)]
  for j := 0 to (n−1)
    let Z[j] be empty list
  while L ≠ 〈〉
    remove the first element of list L
    insert this element according to its key k into list Z[⌊n · k⌋]
  for j := 0 to (n−1)
    sort list Z[j] nondecreasingly
  append lists Z[0], Z[1], ..., Z[n−1] in order into list L

Clearly mT(n) ∈ Θ(n). If the keys of the input are equally distributed on [0; 1), then AT(n) ∈ Θ(n). MT(n) depends on the sorting method we use when sorting lists Z[j] nondecreasingly. For example, using insertion sort MT(n) ∈ Θ(n²); using merge sort MT(n) ∈ Θ(n lg n).
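A C++ sketch of bucket sort for double keys in [0; 1) (std::list::sort is a merge sort, so MT(n) ∈ O(n lg n) with this choice):

    #include <list>
    #include <vector>

    void bucket_sort(std::list<double> &L) {
        int n = (int)L.size();
        if (n == 0) return;
        std::vector<std::list<double>> Z(n);    // the buckets Z[0..n-1]
        while (!L.empty()) {
            double k = L.front();
            int j = (int)(n * k);               // bucket index: floor(n*k)
            Z[j].splice(Z[j].end(), L, L.begin());
        }
        for (int j = 0; j < n; ++j)
            Z[j].sort();                        // sort each bucket nondecreasingly
        for (int j = 0; j < n; ++j)
            L.splice(L.end(), Z[j]);            // append buckets in order
    }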



9 Hash Tables

In everyday programming practice we often need so-called dictionaries. These are collections of records with unique keys. Their operations: (1) inserting a new record into the dictionary, (2) searching for a record identified by a key, (3) removing a record identified by a key (or localized by a previous search).

Besides AVL trees, B+ trees (see them in the next semester), and other kinds of balanced search trees, dictionaries are often represented and implemented with hash tables, provided that we would like to optimize the average running time of the operations. Using hash tables, the average running time of the operations above is the ideal Θ(1), while the maximum of it is Θ(n). (With balanced search trees the worst case and average case performance of each key-based operation is Θ(lg n).)

Notations:
m : the size of the hash table
Z[0..(m−1)] : the hash table
Z[0], Z[1], ..., Z[m−1] : the slots of the hash table
∅ : empty slot in the hash table, if we use direct-address tables or key collisions are resolved by chaining
E : the key of empty slots in case of open addressing
D : the key of deleted slots in case of open addressing
n : the number of records stored in the hash table
α = n/m : load factor
U : the universe of keys; k, k′, ki ∈ U
h : U → 0..(m−1) : hash function

We suppose that the hash table does not contain two or more records with the same key, and that h(k) can be calculated in Θ(1) time.

9.1 Direct-address tables

In case of direct-address tables we do not have hash functions. We suppose that U = 0..(m−1), where m ≥ n, but m is not too big compared to n. Z : D*[m] is the direct-address table. Its slots are pointers referring to records of type D. Each record has a key k : U and contains satellite data. The direct-address table is initialized with ∅ pointers.

D
+ k : U // k is the key
+ ... // satellite data



init( Z : D*[] )
  for i := 0 to Z.M−1
    Z[i] := ∅

search( Z : D*[] ; k : U ) : D*
  return Z[k]

insert( Z : D*[] ; p : D* ) : B
  if Z[p→k] = ∅
    Z[p→k] := p
    return true
  else
    return false

remove( Z : D*[] ; k : U ) : D*
  p := Z[k]
  Z[k] := ∅
  return p

Clearly T_init(m) ∈ Θ(m), where m = Z.M. And for the other three operations we have T ∈ Θ(1).

9.2 Hash tables

Hash function: provided that |U| >> n, direct-address tables cannot be applied, or at least applying them is wasting space. Thus we use a hash function h : U → 0..(m−1), where typically |U| >> m (the size of the universe U of keys is much greater than the size m of the hash table). The record with key k is to be stored in slot Z[h(k)] of hash table Z[0..(m−1)].

Remember that the hash table should not contain two or more records with the same key, and that h(k) must be calculated in Θ(1) time.

Function h : U → 0..(m−1) is simple uniform hashing if it distributes the keys evenly into the slots, i.e. any given key is equally likely to hash into any of the m slots, independently of where other items have hashed to. Simple uniform hashing is a general requirement about hash functions.

Collision of keys: provided that h(k1) = h(k2) for keys k1 ≠ k2, we speak of key collision. Usually |U| >> m, so key collisions probably happen, and we have to handle this situation.

For example, let us suppose that the keys are integer numbers and h(k) = k mod m. Then exactly those keys are hashed to slot s for which s = k mod m.

9.3 Collision resolution by chaining

We suppose that the slots of the hash table identify simple linked lists (S1L), that is, Z : E1*[m], where the elements of the lists contain the regular fields key and next, plus usually additional fields (satellite data). Provided that the hash function maps two or more keys to the same slot, the corresponding records are stored in the list identified by this slot.

E1
+ key : U
+ ... // satellite data may come here
+ next : E1*
+ E1() { next := ∅ }

init( Z : E1*[] )
  for i := 0 to Z.M−1
    Z[i] := ∅

search( Z : E1*[] ; k : U ) : E1*
  return searchS1L(Z[h(k)], k)

insert( Z : E1*[] ; p : E1* ) : B
  k := p→key ; s := h(k)
  if searchS1L(Z[s], k) = ∅
    p→next := Z[s]
    Z[s] := p
    return true
  else
    return false

searchS1L( q : E1* ; k : U ) : E1*
  while q ≠ ∅ ∧ q→key ≠ k
    q := q→next
  return q

remove( Z : E1*[] ; k : U ) : E1*
  s := h(k) ; p := ∅ ; q := Z[s]
  while q ≠ ∅ ∧ q→key ≠ k
    p := q ; q := q→next
  if q ≠ ∅
    if p = ∅
      Z[s] := q→next
    else
      p→next := q→next
    q→next := ∅
  else SKIP
  return q

Clearly T_init(m) ∈ Θ(m), where m = Z.M. For the other three operations mT ∈ Θ(1), MT(n) ∈ Θ(n), AT(n, m) ∈ Θ(1 + n/m).

AT(n, m) ∈ Θ(1 + n/m) is satisfied if function h : U → 0..(m−1) is simple uniform hashing, because the average length of the lists of the slots is equal to n/m = α.



Usually n/m ∈ O(1) is required. In this case AT(n, m) ∈ Θ(1) is also satisfied for insertion, search, and removal.
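A compact C++ sketch of chaining (integer keys only, satellite data omitted; ChainedHashTable and its members are illustrative names, and h is the division-method hash function of section 9.4):

    #include <vector>
    #include <forward_list>

    struct ChainedHashTable {
        std::vector<std::forward_list<int>> Z;    // one S1L per slot
        ChainedHashTable(int m) : Z(m) {}
        int h(int k) const { return k % (int)Z.size(); }

        bool search(int k) const {
            for (int x : Z[h(k)]) if (x == k) return true;
            return false;
        }
        bool insert(int k) {                      // no duplicate keys allowed
            if (search(k)) return false;
            Z[h(k)].push_front(k);                // insert at the front of the list
            return true;
        }
        bool remove(int k) {
            auto &lst = Z[h(k)];
            auto before = lst.before_begin();
            for (auto it = lst.begin(); it != lst.end(); ++it, ++before)
                if (*it == k) { lst.erase_after(before); return true; }
            return false;
        }
    };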

9.4 Good hash functions

Division method: provided that the keys are integer numbers,

h(k) = k mod m

is often a good choice, because it can be calculated simply and efficiently. And if m is a prime number not too close to a power of two, it usually distributes the keys evenly among the slots, i.e. on the integer interval 0..(m−1).

For example, if we want to resolve key collision by chaining, and we would like to store approximately 2000 records with maximum load factor α ≈ 3, then m = 701 is a good choice: 701 is a prime number which is close to 2000/3, and it is far enough from the neighboring powers of two, i.e. from 512 and 1024.

Keys in interval [0; 1): provided that the keys are evenly distributed on [0; 1), function

h(k) = ⌊k · m⌋

is also simple uniform hashing.

Multiplication method: provided that the keys are real numbers, and A ∈ (0; 1) is a constant,

h(k) = ⌊{k · A} · m⌋

is a hash function. ({x} is the fractional part of x.) It does not distribute the keys equally well with all the possible values of A.

Knuth proposes A = (√5 − 1)/2 ≈ 0.618, because it is likely to work reasonably well. Compared to the division method it has the advantage that the value of m is not critical.

Each method above supposes that the keys are numbers. If the keys are strings, the characters can be considered digits of unsigned integers with the appropriate number base. Thus the strings can be interpreted as big natural numbers.
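The division and multiplication methods can be sketched in C++ as follows (illustrative helper names, nonnegative keys assumed; A is Knuth's proposed constant):

    #include <cmath>

    int h_division(long long k, int m) {
        return (int)(k % m);                     // division method: k mod m
    }

    int h_multiplication(long long k, int m) {
        const double A = 0.6180339887498949;     // (sqrt(5) - 1) / 2
        double frac = std::fmod(k * A, 1.0);     // {k * A}: the fractional part
        return (int)(frac * m);                  // floor({k * A} * m)
    }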



9.5 Open addressing

The hash table is Z : R[m]. The records of type R are directly in the slots. Each record has a key field k : U ∪ {E, D}, where E ≠ D; E, D ∉ U are global constants in order to indicate empty (E) and deleted (D) slots.

R
+ k : U ∪ {E, D} // k is a key, or it is Empty or Deleted
+ ... // satellite data

init( Z : R[] )
  for i := 0 to Z.M−1
    Z[i].k := E

Notations for open addressing:
h : U × 0..(m−1) → 0..(m−1) : probing function
〈h(k, 0), h(k, 1), ..., h(k, m−1)〉 : potential probing sequence

The hash table does not contain duplicate keys.

The empty and deleted slots together are free slots. The other slots are occupied. Instead of a single hash function we have m hash functions now:

h(·, i) : U → 0..(m−1)    (i ∈ 0..(m−1))

In open addressing we try these in this order one after the other if needed.

9.5.1 Open addressing: insertion and search, without deletion

In many applications of dictionaries (and hash tables) we do not need deletion. Insertion and search are sufficient. In this case insertion is simpler.

Let us suppose that we want to insert record r with key k into the hash table. First we probe slot h(k, 0). If it is occupied and its key is not k, we try h(k, 1), and so on, throughout 〈h(k, 0), h(k, 1), ..., h(k, m−1)〉, until
- we find an empty slot, or
- we find an occupied slot with key k, or
- all the slots of the potential probing sequence have been considered, but we found neither an empty slot nor an occupied slot with key k.
+ If we find an empty slot, we put r into it. Otherwise insertion fails.



〈h(k, 0), h(k, 1), ..., h(k, m−1)〉 is called the potential probing sequence because during insertion or search (or deletion) only a prefix of it is actually generated. This prefix is called the actual probing sequence.

The potential probing sequence must be a permutation of 〈0, 1, ..., (m−1)〉, which means that it covers the whole hash table, i.e. it does not refer twice to the same slot.

The length of the actual probing sequence of insertion/search/deletion is i iff this operation stops at probe h(k, i−1).

When we search for the record with key k, again we follow the potential probing sequence 〈h(k, 0), h(k, 1), ..., h(k, m−1)〉.
- We stop successfully when we find an occupied slot with key k.
- The search fails if we find an empty slot, or we use up the potential probing sequence unsuccessfully.

In the ideal case we have uniform hashing: the potential probe sequence of each key is equally likely to be any of the m! permutations of 〈0, 1, ..., (m−1)〉.

Provided that
- the hash table does not contain deleted slots,
- its load factor α = n/m satisfies 0 < α < 1, and
- we have uniform hashing,
+ the expected length of an unsuccessful search / successful insertion is at most

1/(1 − α)

+ and the expected length of a successful search / unsuccessful insertion is at most

(1/α) · ln(1/(1 − α))

For example, the first result above implies that with uniform hashing, if the hash table is half full, the expected number of probes in an unsuccessful search (or in a successful insertion) is less than 2. If the hash table is 90% full, the expected number of probes is less than 10.

Similarly, the second result above implies that with uniform hashing, if the hash table is half full, the expected number of probes in a successful search (or in an unsuccessful insertion) is less than 1.387. If the hash table is 90% full, the expected number of probes is less than 2.559. [1]

9.5.2 Open addressing: insertion, search, and deletion

A successful deletion consists of a successful search for the slot Z[s] containing a given key, plus the assignment Z[s].k := D (let the slot be deleted). Z[s].k := E (let the slot be empty) would not be correct. For example, let us suppose that we inserted record r with key k into the hash table, but it could not be put into Z[h(k, 0)] because of key conflict, and we put it into Z[h(k, 1)]. And then we delete the record at Z[h(k, 0)]. If this deletion performed Z[h(k, 0)].k := E, a subsequent search for key k would stop at the empty slot Z[h(k, 0)], and it would not find record r with key k in Z[h(k, 1)]. Instead, deletion performs Z[h(k, 0)].k := D. Then the subsequent search for key k does not stop at slot Z[h(k, 0)] (because it is neither empty nor contains key k), and it finds record r with key k in Z[h(k, 1)]. (Clearly the procedures of search and deletion are not changed in spite of the presence of deleted slots.)

Thus during search we go through deleted slots, and we stop when
- we find the slot with the key we search for (successful search), or
- we find an empty slot or use up the potential probe sequence of the given key (unsuccessful search).

Insertion becomes more complex because of the presence of deleted slots. During the insertion of record r with key k, we perform a full search for k, but we also remember the first deleted slot found during the search.
- If the search is successful, then the insertion fails (because we do not allow duplicated keys).
- If the search is unsuccessful, but some deleted slot is remembered, then we put r into it.
- If the search is unsuccessful, no deleted slot is remembered, but the search stops at an empty slot, then we put r into it.
- If the search is unsuccessful, and neither a deleted nor an empty slot is found, then the insertion fails, because the hash table is full.

If we use a hash table for a long time, there may be many deleted slots and no empty slot, although the table is far from being full. This means that the unsuccessful searches will check all the slots, and also the other operations slow down. So we have to get rid of the deleted slots, for example, by rebuilding the whole table.

9.5.3 Linear probing

In this subsection and in the next two we consider three strategies in order to generate actual probing sequences, that is, the adequate prefix of 〈h(k, 0), h(k, 1), ..., h(k, m−1)〉. In each case we have a primary hash function h1 : U → 0..(m−1), where h(k, 0) = h1(k). If it is needed, starting from this slot we go on step by step throughout the slots of the hash table, according to a well-defined rule, until we find the appropriate slot, or we find that the actual operation is impossible. h1 must be a simple uniform hash function.

The simplest strategy is linear probing:

h(k, i) = (h1(k) + i) mod m    (i ∈ 0..(m−1))

It is easy to implement linear probing, but we have only m different probing sequences instead of the m! probing sequences needed for uniform hashing: given two keys k1 and k2, if h(k1, 0) = h(k2, 0), then their whole probing sequences are the same. In addition, different probing sequences tend to be linked into continuous, long runs of occupied slots, increasing the expected time of searching. This problem is called primary clustering. The longer such a cluster is, the more probable that it becomes even longer after the next insertion. For example, let us have two free slots with i occupied slots between them. Then the probability that this run's length will be increased by the next insertion is at least (i + 2)/m. And it may even be linked with another cluster. Linear probing may be selected only if the probability of key collision is extremely low.

9.5.4 Quadratic probing

h(k, i) = (h1(k) + c1·i + c2·i²) mod m    (i ∈ 0..(m−1))

where h1 : U → 0..(m−1) is the primary hash function; c1, c2 ∈ R; c2 ≠ 0. The different probing sequences are not linked together, but we have only m different probing sequences instead of the m! probing sequences needed for uniform hashing: given two keys k1 and k2, if h(k1, 0) = h(k2, 0), then their whole probing sequences are the same. This problem is called secondary clustering.

Choosing the constants of quadratic probing: in this case the potential probing sequence, i.e. 〈h(k, 0), h(k, 1), ..., h(k, m−1)〉, may have equal members, which implies that it does not cover the hash table. Therefore we must be careful about selecting the constants of quadratic probing.

For example, if size m of the hash table is a power of 2, then c1 = c2 = 1/2 is appropriate. In this case

h(k, i) = (h1(k) + (i + i²)/2) mod m    (i ∈ 0..(m−1))

Thus

(h(k, i+1) − h(k, i)) mod m = (((i+1) + (i+1)²)/2 − (i + i²)/2) mod m = (i + 1) mod m

So it is easy to compute the slots of the probing sequences recursively:

h(k, i+1) = (h(k, i) + i + 1) mod m
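For illustration, the recursive formula yields the following probe computation (a sketch; probe is an illustrative name, and m is assumed to be a power of 2):

    // h(k, i) for c1 = c2 = 1/2, computed with the recurrence
    // h(k, step) = (h(k, step-1) + step) mod m, starting from h(k, 0) = h1(k).
    int probe(int h1k, int i, int m) {
        int j = h1k % m;                 // h(k, 0)
        for (int step = 1; step <= i; ++step)
            j = (j + step) % m;          // one quadratic-probing step
        return j;
    }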

Exercise 9.1 Write the structograms of the operations of hash tables with quadratic probing (c1 = c2 = 1/2), applying the previous recursive formula.

9.5.5 Double hashing

h(k, i) = (h1(k) + i·h2(k)) mod m    (i ∈ 0..(m−1))

where h1 : U → 0..(m−1) and h2 : U → 1..(m−1) are hash functions. The probing sequence covers the hash table iff h2(k) and m are relative primes. This is satisfied, for example, if m > 1 is a power of 2 and h2(k) is an odd number for each k ∈ U, or if m is a prime number. For example, if m is a prime number (which should not be close to powers of 2) and m′ is a bit smaller (let m′ = m − 1 or m′ = m − 2), then

h1(k) = k mod m
h2(k) = 1 + (k mod m′)

is an eligible choice.

is an eligible choice.In case of double hashing for each dierent pairs of (h1(k), h2(k)) there is a

dierent probing sequence, and so we have Θ(m2) dierent probing sequences.Although the number of the probing sequences of double hashing is far

from the ideal number m! of probing sequences, its performance appears tobe very close to that of the ideal scheme of uniform hashing.

Illustration of the operations of double hashing: because h(k, i) = (h1(k) + i·h2(k)) mod m (i ∈ 0..(m−1)), we have h(k, 0) = h1(k) and h(k, i+1) = (h(k, i) + d) mod m, where d = h2(k). After calculating the place of the first probe (h1(k)), we always make a step of distance d cyclically around the table.

Example: m = 11, h1(k) = k mod 11, h2(k) = 1 + (k mod 10).

In the operations column of the next table, for each operation the first character is its code, i.e. i = insert, s = search, d = delete. Next the key of the operation comes (being neither E nor D). In this table we do not handle satellite data; we show just the keys. In the lower index of the key there is h2(k), but only if it is needed (in this plain-text table the lower index is written after an underscore, e.g. 15_6). Next we find the probing sequence between 〈〉 brackets. Finally we have a + sign for a successful, and an x sign for an unsuccessful operation. At the last element of the probing sequence we give the actual key of the slot (written into the right lower index of the slot). Thus we indicate the reason of the actual behavior. If we find some deleted slot(s) during an insertion, we write a letter D into the right lower index of the first such slot, because insertion remembers this slot. (See the details in section 9.5.2.)

In the first 11 columns of the table the cells representing empty slots are simply left empty. We write the reserving key into each cell of an occupied slot, while the cells of the deleted slots contain letter D.

In this table we start a new line after each operation, except for the last search. (At the tests and exams it is enough to start a new line only when the content of a nonempty [i.e. occupied or deleted] slot has been changed.)

 0 | 1 | 2 | 3 |  4 |  5  |  6 |  7 | 8 | 9 | 10 | operations
   |   |   |   |    |     |    |    |   |   |    | i32 〈10_E〉 +
   |   |   |   |    |     |    |    |   |   | 32 | i40 〈7_E〉 +
   |   |   |   |    |     |    | 40 |   |   | 32 | i37 〈4_E〉 +
   |   |   |   | 37 |     |    | 40 |   |   | 32 | i15_6 〈4; 10; 5_E〉 +
   |   |   |   | 37 | 15  |    | 40 |   |   | 32 | i70_1 〈4; 5; 6_E〉 +
   |   |   |   | 37 | 15  | 70 | 40 |   |   | 32 | s15_6 〈4; 10; 5_15〉 +
   |   |   |   | 37 | 15  | 70 | 40 |   |   | 32 | s104_5 〈5; 10; 4; 9_E〉 x
   |   |   |   | 37 | 15  | 70 | 40 |   |   | 32 | d15_6 〈4; 10; 5_15〉 +
   |   |   |   | 37 | D   | 70 | 40 |   |   | 32 | s70_1 〈4; 5; 6_70〉 +
   |   |   |   | 37 | D   | 70 | 40 |   |   | 32 | i70_1 〈4; 5_D; 6_70〉 x
   |   |   |   | 37 | D   | 70 | 40 |   |   | 32 | d37 〈4_37〉 +
   |   |   |   | D  | D   | 70 | 40 |   |   | 32 | i104_5 〈5_D; 10; 4; 9_E〉 +
   |   |   |   | D  | 104 | 70 | 40 |   |   | 32 | s15_6 〈4; 10; 5; 0_E〉 x

Exercise 9.2 (Programming of double hashing) Write the structograms of insertion, search, and deletion, where x is the record to be inserted, and k is the key we search for; it is also the key of the record we want to delete.

The hash table is Z[0..(m−1)], where m = Z.M.

In a search we try to find the record identified by the given key. After a successful search we return the position (index of the slot) of the appropriate record. After an unsuccessful search we return −1.

After a successful insertion we return the position of the insertion. At an unsuccessful insertion there are two cases. If there is no free place in the hash table, we return −(m + 1). If the key of the record to be inserted is found at slot j, we return −(j + 1).

In a deletion we try to find and delete the record identified by the given key. After a successful deletion we return the position (index of the slot) of the appropriate record. After an unsuccessful deletion we return −1.

Solution:

insert( Z : R[] ; x : R ) : Z
  k := x.k ; j := h1(k)
  i := 0 ; d := h2(k)
  while i < Z.M ∧ Z[j].k ∉ {E, D}
    if Z[j].k = k
      return −(j + 1)
    else SKIP
    i++
    j := (j + d) mod Z.M
  if i < Z.M
    ii := j
  else
    return −(Z.M + 1)
  while i < Z.M ∧ Z[j].k ≠ E
    if Z[j].k = k
      return −(j + 1)
    else SKIP
    i++
    j := (j + d) mod Z.M
  Z[ii] := x
  return ii

search( Z : R[] ; k : U ) : Z
  i := 0 ; j := h1(k)
  b := true ; d := h2(k)
  while b
    if Z[j].k = k
      return j
    else SKIP
    i++
    b := (Z[j].k ≠ E ∧ i < Z.M)
    j := (j + d) mod Z.M
  return −1

delete( Z : R[] ; k : U ) : Z
  j := search(Z, k)
  if j ≥ 0
    Z[j].k := D
  else SKIP
  return j
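For comparison, a C++ transcription of this solution (a sketch for nonnegative integer keys; E and D are encoded by sentinel values assumed never to occur as real keys, and OpenHashTable is an illustrative name; m is assumed to be a prime, matching m′ = m − 1 above):

    #include <vector>

    const int E = -1, D = -2;                   // empty / deleted sentinels

    struct OpenHashTable {
        std::vector<int> Z;                     // slot keys only, no satellite data
        OpenHashTable(int m) : Z(m, E) {}
        int h1(int k) const { return k % (int)Z.size(); }
        int h2(int k) const { return 1 + k % ((int)Z.size() - 1); }

        int search(int k) const {
            int j = h1(k), d = h2(k), m = (int)Z.size();
            for (int i = 0; i < m && Z[j] != E; ++i) {
                if (Z[j] == k) return j;
                j = (j + d) % m;                // deleted slots are stepped over
            }
            return -1;
        }
        int insert(int k) {
            int j = h1(k), d = h2(k), m = (int)Z.size(), i = 0, ii = -1;
            for (; i < m && Z[j] != E && Z[j] != D; ++i) {
                if (Z[j] == k) return -(j + 1); // key already present
                j = (j + d) % m;
            }
            if (i == m) return -(m + 1);        // table full
            ii = j;                             // remember the first free slot
            for (; i < m && Z[j] != E; ++i) {   // keep searching for k
                if (Z[j] == k) return -(j + 1);
                j = (j + d) % m;
            }
            Z[ii] = k;
            return ii;
        }
        int del(int k) {
            int j = search(k);
            if (j >= 0) Z[j] = D;               // mark the slot deleted, not empty
            return j;
        }
    };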
