E-Notes: Data Structure (DMIETR)


Subject : Data Structure & Program Design Sem : IV

Unit I

Data Structures

A data structure is a scheme for organizing data in the memory of a computer. Some of 

the more commonly used data structures include lists, arrays, stacks, queues, heaps, trees,

and graphs.

The way in which the data is organized affects the performance of a program for different tasks.

Computer programmers decide which data structures to use based on the nature of the data and the processes that need to be performed on that data.

Abstract data type

In computing, an abstract data type or abstract data structure is a mathematical

model for a certain class of data structures that have similar behavior, or for certain data

types of one or more programming languages that have similar semantics. An abstract

data type is defined indirectly, only by the operations that may be performed on it and by mathematical constraints on the effects (and possibly cost) of those operations [1].

For example, an abstract stack data structure could be defined by two operations: push,

that inserts some data item into the structure, and pop, that extracts an item from it; with the constraint that each pop always returns the most recently pushed item that has not

been popped yet. When analyzing the efficiency of algorithms that use stacks, one may

also specify that both operations take the same time no matter how many items have been

pushed into the stack, and that the stack uses a constant amount of storage for each element.

Abstract data types are purely theoretical entities, used (among other things) to simplify

the description of abstract algorithms, to classify and evaluate data structures, and to

formally describe the type systems of programming languages. However, an ADT may be implemented by specific data types or data structures, in many ways and in many

programming languages; or described in a formal specification language. ADTs are often

implemented as modules: the module's interface declares procedures that correspond to the ADT operations, sometimes with comments that describe the constraints. This

information hiding strategy allows the implementation of the module to be changed

without disturbing the client programs.
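As an illustration of this module idea, a stack ADT might be exposed in C through a header that declares only the operations, keeping the representation hidden in the implementation file. This is only a sketch (the names, the item type, and the functions here are illustrative assumptions, not from any particular library):

/* stack.h -- interface of a stack ADT; the representation stays hidden */
typedef int item;                       /* the element type (assumed here to be int) */
typedef struct stack Stack;             /* incomplete type: clients never see the fields */

Stack *stack_create(void);              /* make an empty stack */
void   stack_push(Stack *s, item x);    /* insert x */
item   stack_pop(Stack *s);             /* remove and return the most recently pushed item */
int    stack_empty(const Stack *s);     /* true if no items are stored */

Client programs depend only on these declarations, so the implementation (array, linked list, ...) can change without disturbing them, which is exactly the information hiding point made above.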


Complexity of Algorithms

It is very convenient to classify algorithms based on the relative amount of time

or relative amount of space they require and specify the growth of time /space

requirements as a function of the input size. Thus, we have the notions of:-

- Time Complexity: Running time of the program as a function of the size of the input
- Space Complexity: Amount of computer memory required during the program execution, as a function of the input size

Big Oh Notation

A convenient way of describing the growth rate of a function and hence the time complexity of an algorithm.

Let n be the size of the input and f(n), g(n) be positive functions of n.

DEF. Big Oh. f(n) is O(g(n)) if and only if there exists a real, positive constant C and a positive integer n0 such that

f(n) ≤ C·g(n) for all n ≥ n0

•  Note that O( g (n)) is a class of functions.

• The "Oh" notation specifies asymptotic upper bounds

• O(1) refers to constant time. O(n) indicates linear time; O(nᵏ) (k fixed) refers to polynomial time; O(log n) is called logarithmic time; O(2ⁿ) refers to exponential time, etc.

Examples

• Let f(n) = n² + n + 5. Then
  - f(n) is O(n²)
  - f(n) is O(n³)
  - f(n) is not O(n)
• Let f(n) = 3ⁿ. Then
  - f(n) is O(4ⁿ)


  - f(n) is not O(2ⁿ)

• If f1(n) is O(g1(n)) and f2(n) is O(g2(n)), then
  - f1(n) + f2(n) is O(max(g1(n), g2(n)))
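To verify the first of these claims directly from the definition (a worked check, not part of the original notes): for all n ≥ 1,

f(n) = n² + n + 5 ≤ n² + n² + 5n² = 7n²

so the constants C = 7 and n0 = 1 witness that n² + n + 5 is O(n²).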

Algorithm:-
An algorithm is a well-defined procedure for solving a problem. It is given in pseudo-code format and is independent of the programming language or the type of computer to be used. A program is an implementation of the algorithm.

The efficiency of the program depends on which data structures and algorithms are selected to solve the problem. Algorithm analysis is required to measure the efficiency of the algorithm and to see how the execution time varies with the size of the input.

The term algorithm is derived from the name of the Persian mathematician Al-Khwarizmi. An algorithm is a finite number of steps to solve a problem. An algorithm defines a general procedure which can be implemented in any programming language and run on any type of computer. Hence an algorithm is given in pseudo-code format.

Characteristics of an algorithm: -

The efficiency of an algorithm plays a major role in determining the efficiency of the

program when implemented. The choice of the data structure selected for the implementation of the algorithm also adds to the efficiency of the program.

Algorithm has certain characteristics

(i)Input:-

An algorithm must be provided with any number of input data values, or in some cases no external input is provided.

(ii)Output:-

As we go through the algorithm, step by step, processing statements will yield some result. This enables us to verify the algorithm. Hence at least one output must be provided by the

algorithm.

(iii)Definiteness:-

Each statement must be clear and distinct. This ensures that the statements are unambiguous.

(iv)Finiteness:-

The algorithm must terminate after a finite number of steps, e.g.

step 1: let a = 10
step 2: if (a > 10) then goto step 5
step 3: X = Y * Z
step 4: print X and goto step 2
step 5: stop

In the above algorithm the value of a, which controls the flow, is never changed anywhere, so the algorithm never terminates. Such statements must be avoided.


(v)Effectiveness:-

The algorithm should be effective & efficient in terms of time as well as space

requirement.

Algorithm analysis is important for the following reasons:-

Algorithm analysis is what we do before coding

1] Analysis is more reliable than experimentation or testing.

Analysis gives the performance of the algorithm for all cases of inputs, whereas testing is

done only on the specific cases of input.

2] Analysis helps to select better algorithm.

When there are different algorithms for the same task, analysis helps to pick out the most efficient algorithm.

3] Analysis predicts performance.
Analysis can be used to predict the run time of the algorithm. If the algorithm is analyzed to be very slow, then it need not be implemented at all.

4] Analysis identifies scope of improvement of the algorithm.
Analysis of an algorithm can find out which portion of the algorithm is faster and which is slower. The slower part may be modified, if possible, to reduce the execution time.

An algorithm is mainly analyzed to determine the execution time of the program and the total memory space required.

The exact execution time of a program cannot be determined, and it is not of much use anyway. This is because the execution time varies depending on various factors like the choice of programming language constructs and instructions, the computer hardware,

etc. We analyze the relative run time of an algorithm as a function of its input size. We

use the count of the abstract operations for analysis.

Operation count:-

The abstract operation count is used to analyze the algorithm. Following are a few examples of operation counts, in which we consider only the major looping constructs, not the complete program.

1. Following statements prints the first element of an array

 printf(“%d”, data[0]);

This statement executes once irrespective of the size of the input. Hence the abstract


operation count is T(n)=1.

2. The following statements print all the elements of an array

for(i=0;i<n;i++) printf("%d", data[i]);

The printf statement executes once for each of the n elements. Hence the abstract operation count is T(n) = n.

3. Print the first even number in the array

for(i=0;i<n;i++)
    if(data[i]%2==0)

return data[i];

Here abstract count depends on actual input data. If the first element itself happens to be

even, then T(n) is 1, which is the best case. If only the last element is even, then T(n) is n, which is the worst case.

The average case is the even element being anywhere in the array. 
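To make the operation count concrete, here is a small self-contained C sketch of the third example (an illustration, not code from the notes); the count of comparisons is T(n) = 1 in the best case and T(n) = n in the worst case:

#include <stdio.h>

/* returns the first even number in data[0..n-1], or -1 if none exists */
int firstEven(const int data[], int n)
{
    for (int i = 0; i < n; i++)
        if (data[i] % 2 == 0)      /* this test is the abstract operation we count */
            return data[i];
    return -1;
}

int main(void)
{
    int best[]  = {2, 3, 5, 7, 9};     /* best case: 1 test performed  */
    int worst[] = {1, 3, 5, 7, 8};     /* worst case: 5 tests performed */
    printf("%d %d\n", firstEven(best, 5), firstEven(worst, 5));
    return 0;
}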

Algorithm Analysis:-

Analysis of algorithms focuses on computation of space and time complexity.

Space can be defined in terms of the space required to store the instructions and data, whereas the time is the computer time an algorithm might require for its execution, which usually

depends on the size of the algorithm and input.

There are different types of time complexities which can be analyzed for an

algorithm:

• Best Case Time Complexity

• Worst Case Time Complexity

• Average Case Time Complexity

• Best Case Time Complexity:-

The best time complexity of an algorithm is a measure of the minimum time that

the algorithm will require for an input of size 'n'. The running time of many algorithms varies not only for inputs of different sizes but also for different inputs of the same size. For example, the running time of some sorting algorithms will depend on the ordering of the input data. Therefore, if the 'n' input data items are presented in sorted order, the algorithm only has to check that the data is already sorted, which takes the least time and corresponds to the best case time complexity of the algorithm.

Consider the following looping construct

1. Print the first even number in the array
for(i=0;i<n;i++)


    if(data[i]%2==0)
        return data[i];

Here the abstract count depends upon the actual input. If the first element itself happens to be even, then the operation count T(n) is 1, which is the best case.

• Worst Case Time Complexity:-

The worst time complexity of an algorithm is a measure of the maximum time that

the algorithm will require for an input of size 'n'. Therefore, if the various algorithms for sorting are taken into account and, say, the 'n' input data items are supplied in reverse order to a sorting algorithm, then the algorithm will require n² operations to perform the sort, which will correspond to the worst-case time complexity of the algorithm.

The worst case time complexity is useful for a number of reasons.

After knowing the worst case time complexity, we can guarantee that the algorithm will never take more than this time, and such a guarantee can be important in some time-critical software

applications.

Consider the following looping construct

1. Print the first even number in the array
for(i=0;i<n;i++)
    if(data[i]%2==0)
        return data[i];

Here the abstract count depends upon the actual input. If only the last element is even, then the operation count T(n) = n, which is the worst case.

• Average Case Time Complexity:-

The time that an algorithm will require to execute on typical input data of size 'n' is known as the average time complexity. The value obtained by averaging the running time of an algorithm over all possible inputs of size 'n' determines the average-case time complexity. This may not be considered a good measure, because we have to assume an underlying probability distribution for the inputs; if that assumption is violated in practice, then the determination of the average time complexity becomes meaningless.

Therefore the computation of the exact time taken by an algorithm for its execution is very difficult. Thus, the work done by an algorithm for the execution of an input of size 'n' defines the time analysis as a function f(n) of the input data items.

An important step in the analysis of an algorithm is identifying the abstract operation on which the algorithm is based.

Asymptotic Notation:-

BIG ‘O’ Notation

If f(n) represents the computing time of some algorithm and g(n) represents a known standard function like n, n², n log n, etc., then writing

f(n) is O(g(n))

means that f(n) is at most of the order of the function g(n). This holds only when:

|f(n)| <= C|g(n)|


where C is a constant.

When we say that the computing time of an algorithm is O(g(n)), we mean that its execution takes no more than a constant times g(n). Here n is the parameter which characterizes the input and/or output.

For example, n might be the number of inputs, or the number of outputs, or their sum, or the magnitude of one of them. If the analysis leads to the result f(n) = O(g(n)), then it means that if the algorithm is run on some input data for sufficiently large values of 'n', the resulting computation time will be less than some constant times |g(n)|.

Importance of BIG ‘O’ notation:-

Big O notation helps to determine the time as well as space complexity of the algorithms.

Using Big O notation, the time taken by an algorithm and the space required to run the algorithm can be ascertained. This information is useful to establish the prerequisites of algorithms and to develop and design efficient algorithms in terms of time and space complexity.

The Big 'O' notation has been extremely useful to classify algorithms by their performance. Developers use this notation to reach the best solution for the given problem.

Example
The Quick sort algorithm's worst case complexity is O(n²).
Bubble sort's average case complexity is O(n²).
If a developer has a choice between the two algorithms, Quick sort can be graded as the better algorithm for sorting.

O(1) complexity is constant, independent of the number of elements.
O(n)----Linear
O(n²)----Quadratic
O(n³)----Cubic
O(2ⁿ)----Exponential
O(log n)----Logarithmic

Worst Case, Average Case, and Amortized Complexity 

• Worst case Running Time: The behavior of the algorithm with respect to the worst possible case of the input instance. The worst-case running time of an algorithm is an upper bound on the running time for any input. Knowing it gives us a guarantee that the algorithm will never take any longer. There is no need to make an educated guess about the running time.
• Average case Running Time: The expected behavior when the input is randomly

drawn from a given distribution. The average-case running time of an algorithm is

an estimate of the running time for an "average" input. Computation of average-case running time entails knowing all possible input sequences, the probability

distribution of occurrence of these sequences, and the running times for the

individual sequences. Often it is assumed that all inputs of a given size are equally likely.


• Amortized Running Time: Here the time required to perform a sequence of (related) operations is averaged over all the operations performed. Amortized analysis can be used to show that the average cost of an operation is small, if one averages over a sequence of operations, even though a single operation might be expensive. Amortized analysis guarantees the average performance of each

operation in the worst case.

For example:

1. Finding the minimum element in a list of elements:
   Worst case = O(n), Average case = O(n)
2. Quick sort:
   Worst case = O(n²), Average case = O(n log n)
3. Merge sort, Heap sort:
   Worst case = O(n log n), Average case = O(n log n)
4. Bubble sort:
   Worst case = O(n²), Average case = O(n²)
5. Binary search tree (search for an element):
   Worst case = O(n), Average case = O(log n)

Big Omega and Big Theta Notations

The Ω (Big Omega) notation specifies asymptotic lower bounds.

DEF. Big Omega. f(n) is said to be Ω(g(n)) if there exist a positive real constant C and a positive integer n0 such that

f(n) ≥ C·g(n) for all n ≥ n0

An Alternative Definition: f(n) is said to be Ω(g(n)) iff there exists a positive real constant C such that


f(n) ≥ C·g(n) for infinitely many values of n.

The Θ (Big Theta) notation describes asymptotic tight bounds.

DEF. Big Theta. f(n) is Θ(g(n)) iff there exist positive real constants C1 and C2 and a positive integer n0 such that

C1·g(n) ≤ f(n) ≤ C2·g(n) for all n ≥ n0

An Example:

Let f(n) = 2n² + 4n + 10. f(n) is O(n²). For,

f(n) ≤ 3n² for all n ≥ 6

Thus, C = 3 and n0 = 6.

Also,

f(n) ≤ 4n² for all n ≥ 4

Thus, C = 4 and n0 = 4.

f(n) is also O(n³). In fact, if f(n) is O(nᵏ) for some k, it is O(nʰ) for every h > k.

f(n) is not O(n). Suppose there were a constant C such that

2n² + 4n + 10 ≤ Cn for all n ≥ n0

This can easily be seen to lead to a contradiction. Thus, we have that:

f(n) is Ω(n²) and f(n) is Θ(n²)
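To see the Ω claim explicitly (a worked check, not in the original notes): 2n² + 4n + 10 ≥ 2n² for all n ≥ 1, so f(n) is Ω(n²) with C = 2 and n0 = 1. Combining this lower bound with the upper bound above gives f(n) is Θ(n²), with C1 = 2, C2 = 3, and n0 = 6.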


Stack implementation:

Stacks help computers in unfolding their recursive jobs; they are used in converting an expression to its

 postfix form; used in Graphs to find their traversals (we have seen that); helps in non-recursive

traversal of binary trees (we'll see this) and so on....

Memory management:
Any modern computer environment uses a stack as the primary memory management model for a

running program. Whether it's native code (x86, Sun, VAX) or JVM, a stack is at the center of the run-time

environment for Java, C++, Ada, FORTRAN, etc.

The discussion of JVM in the text is consistent with NT, Solaris, VMS, Unix runtime environments.

Stacks

• Stack is a special kind of list in which all insertions and deletions occur at one

end, called the top.

• Stack ADT is a special case of the List ADT. It is also called a LIFO list or a

 pushdown list.

• Typical Stack ADT Operations:

1. makenull (S)   creates an empty stack
2. top (S)        returns the element at the top of the stack.
                  Same as retrieve (first (S), S)
3. pop (S)        deletes the top element of the stack.
                  Same as delete (first (S), S)
4. push (x, S)    inserts element x at the top of stack S.
                  Same as insert (x, first (S), S)
5. empty (S)      returns true if S is empty and false otherwise

• Stack is a natural data structure to implement subroutine or procedure calls and

recursion.

• Stack Implementation: Arrays or pointers can be used. See Figures 2.5 and 2.6; a minimal array-based sketch is also given below.
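The following is a minimal array-based sketch of the stack ADT (an illustration only, not the code behind Figure 2.5; it assumes integer elements, a fixed maximum size, and omits overflow/underflow checks):

#define MAXSTACK 100

typedef struct {
    int items[MAXSTACK];
    int top;                    /* index of the top element; -1 when the stack is empty */
} Stack;

void makenull(Stack *S)    { S->top = -1; }
int  empty(const Stack *S) { return S->top == -1; }
int  top(const Stack *S)   { return S->items[S->top]; }

/* push: insert element x at the top of stack S */
void push(int x, Stack *S)
{
    S->items[++S->top] = x;
}

/* pop: delete and return the top element of the stack */
int pop(Stack *S)
{
    return S->items[S->top--];
}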

APPLICATION OF STACK 

• Direct applications• Page-visited history in a Web browser 

• Undo sequence in a text editor 

• Chain of method calls in the Java VirtualMachine or C++ runtime environment

• Indirect applications

• Auxiliary data structure for algorithms• Component of other data structures


Figure 2.5: An array implementation for the stack ADT

Figure 2.6: A linked list implementation of the stack ADT

• Pointer Implementation of Stacks: The following code provides functions for 

implementation of stack operations using pointers. See Figures 2.7 and 2.8 for an illustration of push and pop operations on a linked stack.

#include <stdlib.h>               /* for malloc and free */

typedef int item_type;            /* the item type is assumed; int is used here */

typedef struct node_tag {
    item_type info;
    struct node_tag *next;


} node_type;

typedef struct stack_tag {
    node_type *top;
} stack_type;

stack_type stack;                 /* define a stack */
stack_type *sp = &stack;          /* pointer to the stack */
node_type *np;                    /* pointer to a node */

/* makenode allocates enough space for a new node and initializes it */
node_type *makenode(item_type item)
{
    node_type *p;

    if ((p = (node_type *) malloc(sizeof(node_type))) == NULL)
        error("exhausted memory");     /* error() is assumed to report the message and stop */
    else {
        p->info = item;
        p->next = NULL;
    }
    return p;
}

/* pushnode pushes a node onto the top of the linked stack */


void pushnode(node_type *np, stack_type *sp)
{
    if (np == NULL)
        error("attempt to push a nonexistent node");
    else {
        np->next = sp->top;
        sp->top = np;
    }
}

/* popnode removes the top node of the linked stack and returns it through np */
void popnode(node_type **np, stack_type *sp)
{
    if (sp->top == NULL)
        error("empty stack");
    else {
        *np = sp->top;
        sp->top = (*np)->next;
    }
}

Figure 2.7: Push operation in a linked stack 


Figure 2.8: Pop operation on a linked stack 

/* push - make a new node with item and push it onto the stack */
void push(item_type item, stack_type *sp)
{
    pushnode(makenode(item), sp);
}

/* pop - pop a node from the stack and return its item */
void pop(item_type *item, stack_type *sp)
{


    node_type *np;

    popnode(&np, sp);
    *item = np->info;
    free(np);
}

Queues

• A queue is a special kind of a list in which all items are inserted at one end (called the rear or the back or the tail) and deleted at the other end (called the front or

the head)

• useful in
  o simulation
  o breadth-first search in graphs
  o tree and graph algorithms

• The Queue ADT is a special case of the List ADT, with the following typical

operations:
1. makenull (Q)
2. front (Q)        retrieve (first (Q), Q)
3. enqueue (x, Q)   insert (x, end (Q), Q)
4. dequeue (Q)      delete (first (Q), Q)
5. empty (Q)
• Implementation: Pointers, circular array, circular linked list

Application of Queue

• Direct applications

- Waiting lines
- Access to shared resources (e.g., printer)


- Multiprogramming

• Indirect applications

- Auxiliary data structure for algorithms
- Component of other data structures

Priority queues

At the post office, express mail is processed and delivered before regular mail, because express mail's priority is higher. In a similar way, an operating system will let a high-

 priority user control a printer or a processor chip before a low-priority user. A data

structure, called a priority queue, is used to manage such resource usage.

A priority queue is a data structure that is meant to hold objects that require ``service''(e.g., use of a printer). The essential operations are:

• insert(priority_num, ob), which adds ob to the structure, paired with an integer  priority number. (The tradition is that the lower the priority number, the quicker 

the object will be serviced.)

• retrieve(), which removes and returns the object in the queue that has the lowest

 priority number. (If multiple objects have the lowest number, any one of the

objects may be returned.)

Here is an example, where string objects are inserted with priority numbers:

insert(8, "abc")insert(3, "def")

insert(4, "ghi")retrieve() ("def" is returned)insert(2, "jkl")retrieve() ("jkl" is returned)insert(4, "mno")retrieve() ("ghi" is returned)retrieve() ("mno" is returned)

At this point, the priority queue still holds "abc", whose low priority has prevented it

from leaving.

As usual, we require an implementation of a priority queue where insertion and retrieval

take time that is less than linear, in terms of the number of objects held in the queue.

Since there is an implicit ordering involved with insertion, we might try writing some variant of a traditional queue that sorts its elements on insertion. Alas, this will produce

greater than linear-time behavior. Since the priority numbers are not sequences of 

symbols, the spelling-tree technique is not suited for the job. We require a new approach.
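One standard approach that does give better-than-linear insertion and retrieval is a binary heap ordered on the priority numbers. The sketch below is illustrative only (not from these notes); it assumes a fixed capacity, integer priority numbers, string objects, and no error handling:

#define CAPACITY 100

typedef struct {
    int         prio[CAPACITY];   /* priority numbers; smaller means served sooner */
    const char *obj[CAPACITY];    /* the objects awaiting service */
    int         size;
} PriorityQueue;

static void swap(PriorityQueue *pq, int i, int j)
{
    int p = pq->prio[i];            const char *o = pq->obj[i];
    pq->prio[i] = pq->prio[j];      pq->obj[i]  = pq->obj[j];
    pq->prio[j] = p;                pq->obj[j]  = o;
}

/* insert: add ob with its priority number, then sift it up; O(log n) */
void pq_insert(PriorityQueue *pq, int priority_num, const char *ob)
{
    int i = pq->size++;
    pq->prio[i] = priority_num;
    pq->obj[i]  = ob;
    while (i > 0 && pq->prio[(i - 1) / 2] > pq->prio[i]) {
        swap(pq, i, (i - 1) / 2);
        i = (i - 1) / 2;
    }
}

/* retrieve: remove and return the object with the lowest priority number; O(log n) */
const char *pq_retrieve(PriorityQueue *pq)
{
    const char *result = pq->obj[0];
    int i = 0;
    pq->size--;
    pq->prio[0] = pq->prio[pq->size];    /* move the last element to the root */
    pq->obj[0]  = pq->obj[pq->size];
    for (;;) {                           /* sift the moved element down */
        int l = 2 * i + 1, r = 2 * i + 2, smallest = i;
        if (l < pq->size && pq->prio[l] < pq->prio[smallest]) smallest = l;
        if (r < pq->size && pq->prio[r] < pq->prio[smallest]) smallest = r;
        if (smallest == i) break;
        swap(pq, i, smallest);
        i = smallest;
    }
    return result;
}

A queue declared as PriorityQueue pq = {0}; starts empty, and running the example trace above through pq_insert and pq_retrieve returns the objects in the same order.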


Circular Queue Array Implementation

See Figure 2.10.

• Rear of the queue is somewhere clockwise from the front

• To enqueue an element, we move rear one position clockwise and write the

element in that position
• To dequeue, we simply move front one position clockwise

• Queue migrates in a clockwise direction as we enqueue and dequeue

• Emptiness and fullness have to be checked carefully.

Figure 2.10: Circular array implementation of a queue
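A minimal sketch of such a circular-array queue in C (illustrative only, not the code behind Figure 2.10; it assumes integer elements, a fixed capacity N, and the common convention of leaving one slot unused so that empty and full states can be distinguished):

#define N 8

typedef struct {
    int data[N];
    int front;      /* index of the slot just before the first element */
    int rear;       /* index of the last element */
} CircularQueue;

void cq_init(CircularQueue *q)        { q->front = q->rear = 0; }
int  cq_empty(const CircularQueue *q) { return q->front == q->rear; }
int  cq_full(const CircularQueue *q)  { return (q->rear + 1) % N == q->front; }

/* enqueue: move rear one position clockwise and write the element there */
int cq_enqueue(CircularQueue *q, int x)
{
    if (cq_full(q)) return 0;           /* queue full */
    q->rear = (q->rear + 1) % N;
    q->data[q->rear] = x;
    return 1;
}

/* dequeue: move front one position clockwise and return that element */
int cq_dequeue(CircularQueue *q, int *x)
{
    if (cq_empty(q)) return 0;          /* queue empty */
    q->front = (q->front + 1) % N;
    *x = q->data[q->front];
    return 1;
}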


Circular Linked List Implementation

• A linked list in which the node at the tail of the list, instead of having a null

 pointer, points back to the node at the head of the list. Thus both ends of a list can

 be accessed using a single pointer.

See Figure 2.11.

• If we implement a queue as a circularly linked list, then we need only one pointer, namely tail, to locate both the front and the back (a minimal sketch is given after the figure).

Figure 2.11: Circular linked list implementation of a queue
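The sketch below illustrates this single-tail-pointer arrangement in C (illustrative only, not from the notes; it assumes integer elements and does no error checking, so dequeueing from an empty queue is undefined):

#include <stdlib.h>

typedef struct qnode {
    int data;
    struct qnode *next;
} QNode;

/* enqueue at the back: the new node becomes the tail */
QNode *cll_enqueue(QNode *tail, int x)
{
    QNode *n = (QNode *) malloc(sizeof(QNode));
    n->data = x;
    if (tail == NULL)
        n->next = n;              /* a single node points to itself */
    else {
        n->next = tail->next;     /* tail->next is the front */
        tail->next = n;
    }
    return n;                     /* the new tail */
}

/* dequeue from the front (tail->next); returns the new tail */
QNode *cll_dequeue(QNode *tail, int *x)
{
    QNode *front = tail->next;
    *x = front->data;
    if (front == tail)            /* last remaining node */
        tail = NULL;
    else
        tail->next = front->next;
    free(front);
    return tail;
}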

Evaluating a Postfix Expression

You may be asking what a stack is good for, other than reversing a sequence of data items. One common

application is to convert an infix expression to postfix. Another is to find the value of a postfix expression.

We will not look at the conversion algorithm here, but we will examine the algorithm to evaluate a postfix

expression.

First, let's explain the terminology. An infix expression is the type that we are used to in ordinary algebra,

such as 3 + 9, which is an expression representing the sum of 3 and 9. Infix expressions place their 

(binary) operators between the two values to which they apply. In the above example, the addition operator 

was placed between the 3 and the 9.

A postfix expression, in contrast, places each operator after the two values to which it applies. ( Post means

"after", right?) The above expression would be 3 9 +, when rewritten in postfix.

Here are a few more examples in the following table. The infix form is shown on the left, and the postfix

form is given on the right.



Infix:                 Postfix:
16 / 2                 16 2 /
(2 + 14) * 5           2 14 + 5 *
2 + 14 * 5             2 14 5 * +
(6 - 2) * (5 + 4)      6 2 - 5 4 + *

Note that postfix expressions do not use parentheses for grouping; it is not needed! Infix sometimes requires parentheses to force a certain order of evaluation. For example, in the second example above,

 parentheses were needed to indicate that the addition should be done before the multiplication. Without the

 parentheses you get the third example, where the multiplication is done before the addition (using the

 precedence rules from ordinary arithmetic, or from C++, for that matter).

Arithmetic expressions like the above can of course be much longer. We could also allow other operators,

such as ^ for exponentiation, or perhaps a unary minus. A sample infix expression that uses exponentiation is 4 ^ 2, which means 4 to the second power. A unary minus is sometimes used as in the infix expression

-(4 + 2). A unary operator is one that is applied to a single value, as opposed to the typical binary

operators, which are applied to two values. We will not consider unary operators or exponentiation further 

here.

The algorithm to evaluate a postfix expression works like this: Start with an empty stack of floats. Scan the

 postfix expression from left to right. Whenever you reach a number, push it onto the stack. Whenever you

reach an operator (call it Op perhaps), pop two items, say First and Second, and then push the value

obtained using Second Op First. When you reach the end of the postfix expression, pop a value from the

stack. That value should be the correct answer, and the stack should now be empty. (If the stack is not

empty, the expression was not a correct postfix expression.)

Let's look at the postfix expression evaluation algorithm by way of example. Consider the postfix

expression 2 14 + 5 * that was mentioned above. We already know from its infix form, (2 + 14) * 5,

that the value should be 16 * 5 = 80. The following sequence of pictures depicts the operation of the

algorithm on this example. Read through the pictures from left to right.

Let's evaluate another postfix expression, say 2 10 + 9 6 - /, which is (2 + 10) / (9 - 6) in infix.

Clearly the value should work out to be 12 / 3 = 4. Trace through the algorithm by reading the following

 pictures from left to right.


When one reaches an operator in this algorithm, it is important to get the order right for the values to which

it applies. The second item popped off should go in front of the operator, while the first one popped off 

goes after the operator. You can easily see that with subtraction and division the order does matter.

A good exercise for the reader is to develop a program that repeatedly evaluates postfix expressions. In

fact, with enough work, it can be turned into a reasonable postfix calculator.
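As a starting point for that exercise, here is a minimal C sketch of the evaluation algorithm (illustrative only, not the author's program). It assumes 1-digit integer operands and the binary operators + - * /, matching the conventions of the tables below, and evaluates one hard-coded expression:

#include <stdio.h>
#include <ctype.h>

#define MAXSTACK 100

int main(void)
{
    int stack[MAXSTACK];
    int top = -1;                            /* index of the top of the stack */
    const char *postfix = "345+*2/";         /* 3*(4+5)/2 in infix */
    const char *p;

    for (p = postfix; *p != '\0'; p++) {
        if (isdigit((unsigned char)*p)) {
            stack[++top] = *p - '0';         /* operand: push its value */
        } else {
            int second = stack[top--];       /* the first value popped goes after the operator */
            int first  = stack[top--];       /* the second value popped goes before it */
            int result = 0;
            switch (*p) {
                case '+': result = first + second; break;
                case '-': result = first - second; break;
                case '*': result = first * second; break;
                case '/': result = first / second; break;
            }
            stack[++top] = result;           /* push the result back */
        }
    }
    /* a single value should remain; for "345+*2/" this prints 13 */
    printf("%d\n", stack[top]);
    return 0;
}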

Infix Expression          Equivalent Postfix Expression
3*4+5                     34*5+
3*(4+5)/2                 345+*2/
(3+4)/(5-2)               34+52-/
7-(2*3+5)*(8-4/2)         723*5+842/-*-
3-2+1                     32-1+

(Assume 1-digit integer operands and the binary operators + - * / only.)

Infix Expression Properties:

Usual precedence and associativity of operators

Parentheses used to subvert precedence

Postfix Expression Properties:

  Both operands of binary operators precede operator 

Parentheses no longer needed

Postfix Expression String Processing

Rules for processing the postfix string:

Starting from the left hand end, inspect each character of the string


1. if it's an operand – push it on the stack
2. if it's an operator – remove the top 2 operands from the stack,
   perform the indicated operation, and push the result on the stack

An Example: 3*(4+5)/2 → 345+*2/ → 13

Remaining Postfix String    int Stack (top→)    Rule Used
345+*2/                     empty
45+*2/                      3                   1
5+*2/                       3 4                 1
+*2/                        3 4 5               1
*2/                         3 9                 2
2/                          27                  2
/                           27 2                1
null                        13                  2

Infix to Postfix Conversion

Rules for converting the infix string:

  Starting from the left hand end, inspect each character of the string

1. if it's an operand – append it to the postfix string
2. if it's a '(' – push it on the stack
3. if it's an operator – if the stack is empty, push it on the stack; else pop operators of greater or equal
   precedence and append them to the postfix string, stopping when a '(' is reached, an operator of lower
   precedence is reached, or the stack is empty; then push the operator on the stack
4. if it's a ')' – pop operators off the stack, appending them to the postfix string, until a '(' is encountered,
   and pop the '(' off the stack
5. when the end of the infix string is reached – pop any remaining operators off the stack and append them
   to the postfix string

An Example: 7-(2*3+5)*(8-4/2) → 723*5+842/-*-

Remaining Infix String    char Stack    Postfix String    Rule Used
7-(2*3+5)*(8-4/2)         empty         null
-(2*3+5)*(8-4/2)          empty         7                 1
(2*3+5)*(8-4/2)           -             7                 3
2*3+5)*(8-4/2)            -(            7                 2
*3+5)*(8-4/2)             -(            72                1
3+5)*(8-4/2)              -(*           72                3
+5)*(8-4/2)               -(*           723               1
5)*(8-4/2)                -(+           723*              3
)*(8-4/2)                 -(+           723*5             1
*(8-4/2)                  -             723*5+            4
(8-4/2)                   -*            723*5+            3
8-4/2)                    -*(           723*5+            2
-4/2)                     -*(           723*5+8           1
4/2)                      -*(-          723*5+8           3
/2)                       -*(-          723*5+84          1


2)                        -*(-/         723*5+84          3
)                         -*(-/         723*5+842         1
null                      empty         723*5+842/-*-     4 & 5

Evaluating a Postfix expression

The following explains how to write a program that uses a stack to evaluate a postfix expression, e.g., 3 4 * 5 +, directly. The algorithm is:
(1) As you read the expression from left to right, push each operand on the stack (here 3 and 4) until you encounter an operator (here, *).
(2) Pop the stack twice and perform a calculation using the operator and the two operands popped (here, 3*4 or 12). Push this result (here, 12) on the stack.
(3) Next push 5 on the stack and when the "+" is encountered, pop 5 and then 12 from the stack and get 5 + 12 or 17. Then push 17 onto the stack. Since there is no more data, pop the stack and obtain 17 as the final answer. If during the program you can't pop the stack, or there is data remaining on the stack at the end, then our postfix expression is wrong.

Converting an Infix to a Postfix expression without parentheses

The algorithm is:
(1) As you read the input string from left to right, add each operand to the output string (in 3 * 4 + 5, add 3 to the output string).
(2) When you encounter the first operator (here, *), push it on the stack.
(3) Add the next operand to the output string (it is now 3 4).
(4) When you encounter the next operator (here, +), while the operator on the stack top (here, *) has a higher or equal precedence to this operator, pop it and add it to the output string (it is now 3 4 *). Push the operator (here, +) on the stack.
(5) Continue this process until you read the entire input string (in our example, the output string becomes 3 4 * 5).
(6) Pop the stack and add the operators to the output string until the stack is empty (our output becomes 3 4 * 5 +).

The pseudocode for the while loop, performed while valid is true and not eoln, is:

if operand( token ) then
    postfix := postfix + token
else if operator( token ) then
    while not stack.empty and precedence( stack.top, token ) do
        postfix := postfix + stack.pop;
    stack.push( token )
else
    valid := false


After the loop is done and valid is true then
    while not stack.empty do
        postfix := postfix + stack.pop;

The precedence is such that:

Top of stack    Precedence over
-------------------------------
'*', '/'        everything
'+', '-'        '+', '-'

Converting an Infix to a Postfix expression with parentheses.

When a '(' is encountered, it is always pushed on the stack. While it is on the stack, all incoming tokens (except the ')') have precedence over it and are pushed on the stack. When a ')' is encountered, everything on the stack down to the first '(' is popped and added to the postfix expression. Then the '(' is popped and discarded.

Let's look at (2+3)/(4*5):
(1) The '(' is pushed.
(2) The 2 is added to the output.
(3) The '+' is pushed on the stack.
(4) The 3 is added to the output; it is now 23.
(5) The ')' triggers the '+' to be popped and added to the output; it is now 23+.
(6) The '(' is popped and discarded.
(7) The '/' is pushed.
(8) The '(' is pushed.
(9) The 4 is added to the output; it is now 23+4.
(10) The '*' is pushed.
(11) The 5 is added to the output; it is now 23+45.
(12) The ')' triggers the '*' to be popped and added to the output; it is now 23+45*.
(13) The eoln is now true, so if no invalid characters have been encountered, all the tokens remaining on the stack are popped and added to the output; it is now 23+45*/.

The pseudocode for the while loop, performed when valid is true and not eoln, is:

if operand( token ) then
    postfix := postfix + token
else if operator( token ) then
    while not stack.empty and precedence( stack.top, token ) do
        postfix := postfix + stack.pop;
    if token <> ')' then
        stack.push( token )
    else
        pop stack and discard it
else
    valid := false
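Putting these rules together, here is a minimal C sketch of the full conversion with parentheses (illustrative only, not the notes' code; it assumes single-character operands, the operators + - * /, and a well-formed input):

#include <stdio.h>
#include <ctype.h>

static int prec(char op)               /* precedence: higher binds tighter */
{
    if (op == '*' || op == '/') return 2;
    if (op == '+' || op == '-') return 1;
    return 0;                          /* '(' is never popped by rule 3 */
}

void infix_to_postfix(const char *infix, char *postfix)
{
    char stack[100];
    int top = -1, out = 0;
    for (const char *p = infix; *p != '\0'; p++) {
        if (isalnum((unsigned char)*p)) {          /* rule 1: operand */
            postfix[out++] = *p;
        } else if (*p == '(') {                    /* rule 2: push '(' */
            stack[++top] = *p;
        } else if (*p == ')') {                    /* rule 4: pop down to '(' */
            while (top >= 0 && stack[top] != '(')
                postfix[out++] = stack[top--];
            top--;                                 /* discard the '(' */
        } else {                                   /* rule 3: operator */
            while (top >= 0 && prec(stack[top]) >= prec(*p))
                postfix[out++] = stack[top--];
            stack[++top] = *p;
        }
    }
    while (top >= 0)                               /* rule 5: empty the stack */
        postfix[out++] = stack[top--];
    postfix[out] = '\0';
}

int main(void)
{
    char result[100];
    infix_to_postfix("7-(2*3+5)*(8-4/2)", result);
    printf("%s\n", result);            /* prints 723*5+842/-*- */
    return 0;
}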

Algorithm evaluatePostfix(postfix)


 // Evaluates a postfix expression.

valueStack = a new empty stack

while (postfix has characters left to parse)

{

nextCharacter = next nonblank character of postfix

switch (nextCharacter){

case variable:

valueStack.push(value of the variable nextCharacter)

break 

case '+': case '-': case '*': case '/': case '^':

operandTwo = valueStack.pop()

operandOne = valueStack.pop()

result = the result of the operation in nextCharacter and its operands

operandOne and operandTwo

valueStack.push(result)

break 

default: break 

}

}

return valueStack.peek() 

Recursive Methods

• A recursive method making many recursive calls places many activation records in the program stack.
• Hence recursive methods can use much memory.
• It is possible to replace recursion with iteration by using a stack.

Using a Stack Instead of Recursion

boolean binarySearch(int first, int last, Comparable desiredItem)

{
    boolean found;
    int mid = (first + last) / 2;

    if (first > last)
        found = false;
    else if (desiredItem.equals(entry[mid]))
        found = true;
    else if (desiredItem.compareTo(entry[mid]) < 0)
        found = binarySearch(first, mid - 1, desiredItem);
    else
        found = binarySearch(mid + 1, last, desiredItem);

    return found;


} // end binarySearch 
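Because each call above makes at most one further recursive call as its last action (tail recursion), this particular method can also be rewritten as a plain loop, with no explicit stack at all. A minimal C sketch of that idea (not from the notes; it assumes an int array sorted in ascending order):

int binarySearchIter(const int entry[], int first, int last, int desiredItem)
{
    while (first <= last) {
        int mid = (first + last) / 2;
        if (desiredItem == entry[mid])
            return 1;                    /* found */
        else if (desiredItem < entry[mid])
            last = mid - 1;              /* continue in the left half */
        else
            first = mid + 1;             /* continue in the right half */
    }
    return 0;                            /* not found */
}

Methods that are not tail-recursive (for example, traversals that recurse on both subtrees of a tree) are the ones that genuinely need an explicit stack when converted to iteration.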

UNIT II

LINKED LISTS

Arrays are not efficient in dealing with problems such as:

. Joining two arrays

. Inserting an element at an arbitrary location

. Deleting an element from an arbitrary location

To overcome these problems, another data structure called the linked list can be used in programs.

 

A linked list is formed of a set of data items connected by link fields (pointers). So, each node contains: a) an info (data) part, b) a link (pointer) part

* Nodes do not have to follow each other physically in memory

* The linked list ends with a node which has "^" (nil) in the link part, showing that it is the last element of the chain.


Example:

 

The physical ordering of this linked list in the memory may be

 

To join LIST1, LIST2: modify pointer of "C" to point to "T".

 


To insert a new item after B:

1) modify pointer field of NEW to point to C

2) modify pointer field of B to point to NEW

 

To delete an item coming after B,


1) modify pointer field of B, to point to the node pointed by pointer of OLD

2) modify pointer field of OLD as ^ (not to cause problem later on)

 

Some Problems with linked lists can be listed as follows:

1) They take up extra space because of pointer fields.

2) To reach the n'th element, we have to follow the pointers of (n-1) elements sequentially. So, we can't reach the n'th element directly.

For each list, let's use an element "list head" which is simply a pointer pointing at the first entry of the list:

 

Now, assume we start with the following

memory organization:

 

then, we delete the last element (2)

then, we delete the second element (4)

we add a new element (7)


we add a new element (8)

Now we are out of memory.

 

For a more efficient organization, we shall keep the deleted entries in a "LIST OF AVAILABLE SPACES" (LAVS in short) and use them when we need new elements.

Considering the previous example with a LAVS;

 

when we delete 2 and 4

we add them into LAVS.

To get an element from LAVS:

NEW ← LAVS            (we get a new element)

LAVS ← LINK(LAVS)     (reorganize LAVS)

 

To add a new element (NEW) after node 2:

LINK(2) ← NEW
LINK(NEW) ← NIL

Insertion of a node at the beginning of a singly linked list

Algorithm:-

Algorithm InsertBegin(Head, item)

Step 1. new ← getnode(NODE)

Step 2. data(new) ← item
Step 3. next(new) ← Head

Step 4. Head ← new
Step 5. return Head

Function to insert a new node at the beginning of the linked list

Or C program or C code

 Node *insertBegin(Node * Head, int item)


/* To add an element to the beginning of the list and return the head of the list */

{

    Node *new = (Node *) malloc(sizeof(Node));
    new->data = item;
    new->next = Head;
    Head = new;
    return Head;

}

Insertion of a node at the end of a singly linked list:-

The following figure shows the steps involved in adding a node at the end of the list. Here we need to traverse the list from the beginning to the end and then add a node. The last node of the list distinguishes itself from the other nodes in that it has a NULL value in the next field. This feature is used to locate the last node, given the beginning of the list.

Algorithm: Insertion of a node at the end of a singly linked list

Algorithm InsertEnd(Head, item)
Step 1. new ← getnode(NODE)
Step 2. data(new) ← item
Step 3. next(new) ← NULL
Step 4. if Head = NULL
        then
            Head ← new
            return Head
        else
            temp ← Head
Step 5. while (next(temp) != NULL)
            temp ← next(temp)
Step 6. next(temp) ← new
Step 7. return Head

C program or C code or

Function to insert a node at the end of a singly linked list

Node *InsertEnd(Node *Head, int item)

/* To add an element to the end of the list and return the head of the list */

{
    Node *temp;
    Node *new = (Node *) malloc(sizeof(Node));

    new->data = item;
    new->next = NULL;

    if (Head == NULL)
    {
        Head = new;


        return Head;
    }
    temp = Head;
    while (temp->next != NULL)     /* loop continues until the last node is reached */
        temp = temp->next;
    temp->next = new;
    return Head;
}

Count the number of nodes in a linked list:-

Here we have to traverse the list from the beginning till the end. Every time we cross a node, the counter is incremented.
At the end of the algorithm, this count will indicate the number of nodes in the list.

Algorithm

Algorithm CountNodes(Head)
Step 1. count ← 0
Step 2. if (Head = NULL) return count
Step 3. temp ← Head
Step 4. while (temp != NULL)
            a. count ← count + 1
            b. temp ← next(temp)
Step 5. return count
Step 6. end

C program

int countNodes(Node *Head)
{
    Node *temp;
    int count = 0;

    if (Head == NULL)
        return count;
    temp = Head;
    while (temp != NULL)
    {
        count++;
        temp = temp->next;
    }
    return count;
}

Algorithm to print the elements of single linked list

Algorithm PrintElements(Head)

Step 1. if (Head = NULL) return


Step 2. temp ← Head
Step 3. while (temp != NULL)
            print data(temp)
            temp ← next(temp)
Step 4. return
Step 5. end

C code or function to print the elements of a singly linked list

void printElements(Node *Head)
{
    Node *temp;

    if (Head == NULL)
        return;
    temp = Head;
    while (temp != NULL)
    {
        printf("%d ", temp->data);
        temp = temp->next;
    }
    return;
}

Deletion of a node in single linked list

Algorithm DeleteNode(L, M)

Step 1. [If M is at the beginning]
        if L = M
        then
            L ← next(L)
            free(M)
            return L
Step 2. temp ← L
Step 3. while (next(temp) != M)
            temp ← next(temp)
Step 4. next(temp) ← next(M)
Step 5. free(M); return L
Step 6. end

C program or C code or function to delete a node in a singly linked list

Node *deleteNode(Node *L, Node *M)
{
    Node *temp;

    if (L == M)
    {
        L = L->next;
        free(M);
        return L;
    }
    temp = L;
    while (temp->next != M)


        temp = temp->next;
    temp->next = M->next;
    free(M);
    return L;
}

 

Exercises:

1) Write a function to insert a new item at the end of a list

2) Write a function to find the first occurrence of "key" in a list and delete it

3) The same, for all occurrences

4) Write a function to reverse the order of a list

5) Convert a linear list to a circular list

6) Convert a circular list to a linear list

7) Implement stack using linear list

8) Implement queue using linear list

Circular Lists

The last node points to the first

 

Whereas returning a whole list to LAVS (list of available space) takes O(n) operations with a linear list,


temp = lavs;

lavs = listhead->next;

listhead->next = temp;

listhead = NULL;

 

this takes O(1) time

 

Doubly Linked Lists

1) Easy to traverse both ways

2) Easy to delete the node which is pointed at (rather than the one following it, as in the case of singly linked lists)

 

Example: A doubly linked circular list:

 


Conventions:

*HEAD NODE: does not carry data, it simply points to the first node (NEXT) and

the last node (PREV)

* The NEXT pointer of the last node & the PREV pointer of the first node point to the HEAD NODE

So an empty list consists of just the LIST HEAD node.

 

Insertion into a doubly linked list:

 

void dinsert(Node *p, Node *q)
/* insert node p to the right of node q */
{
    p->next = q->next;
    p->prev = q;
    p->next->prev = p;
    q->next = p;
}
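Deleting the node that is pointed at is just as direct; a minimal sketch in the same style (not from the notes; it assumes the same Node type with next and prev fields used by dinsert above, and that every node, including the head node, has valid pointers):

void ddelete(Node *p)
/* remove node p from the doubly linked list that contains it */
{
    p->prev->next = p->next;
    p->next->prev = p->prev;
    free(p);        /* or return the node to the list of available space */
}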

Exercises:

Write a procedure Add(Node *p1, Node *p2) that will add/multiply two polynomials represented by doubly linked lists whose head nodes are pointed to by P1 & P2

Other linked list examples

1. consider p(x,y,z) = 2xy²z³ + 3x²yz² + 4xy³z + 5xy³z + 5x²y²


rewrite it so that the terms are ordered lexicographically, that is, x in ascending order; for equal x powers, y in ascending order; then z in ascending order

2. Write a procedure to count the no. of nodes in a one way linked list

3. Search for info 'x' and delete that node if it is found (one way)

4. Reverse the direction of the links in a one way linked circular list

Exercises:

1) Implement all methods of LinkedList

2) To implement a stack using a single way list, write the declaration and push &pop procedures

3) To implement a queue write enter and leave procedures

Unit III Trees

Tree data structures

A tree data structure is a powerful tool for organizing data objects based on keys. It is

equally useful for organizing multiple data objects in terms of hierarchical relationships (think of a ``family tree'', where the children are grouped under their parents in the tree).

Trees are usually drawn pictorially like this (again, think of a ``family tree''), where data

can be placed where the asterisks appear:

            *
           / \
          *   *
         / \  /|\
        *  * * * *
       / \
      ......

The asterisks represent nodes; the node at the top is the root , the tree's ``starting point.''

The arcs between nodes are called branches. A node that has no branches underneath it is

called a leaf. Real trees grow from their root upwards to the sky, but computer-science trees grow from the root downwards.

Here is an example of a tree of species, from zoology:


When we examine a non-leaf node, we see that the node has trees growing underneath it, and we say that the node has children subtrees. For example, the root node, ``Animal'',

has two children subtrees.

Tree structures make an excellent alternative to arrays, especially when the data stored

within them is keyed or has internal structure that allows one element to be related to, or ``saved within'' another.

Applications of Trees

1. Trees can hold objects that are sorted by their keys. The nodes are ordered so that

all keys in a node's left subtree are less than the key of the object at the node, and all keys in a node's right subtree are greater than the key of the object at the node.

Here is an example of a tree of records, where each record is stored with its

integer key in a tree node:

2.

Here, the leaves are used as ``end points'' and hold nothing.

We call such a tree an ordered tree or a search tree. The tree drawn above is ordered on the integer keys saved in the nodes.

The advantages of ordered trees over sorted arrays are:

o both insertions (and retrievals) of objects by key take on the average log₂ N time, where N is the number of objects stored.
o the tree naturally grows to hold an arbitrary, unlimited number of objects.

3. Trees can hold objects that are located by keys that are sequences. For example,

we might have some books with these Library of Congress catalog numbers:
4.     QA76    book1
5.     QA7     book2
6.     Q17     book3


7.     B1      book4
8.     Z4      book5

The books' keys are sequences, and the sequences label the branches of a tree that holds the books:

                 *
                 |
      +----------+----------+
    B |        Q |        Z |
      *          *          *
    1 |        /   \      4 |
  book4     1 /     \ A   book5
             *       *
           7 |       | 7
          book3    book2
                     | 6
                   book1

Books can be stored at nodes or leaves, and not all nodes hold a book (e.g., Q1).

This tree is called a  spelling tree, and it has the advantage that the insertion and

retrieval time of an object is related only to the length of the key.

9. A tree can represent a structured object, such as a house that must be explored by a robot or a human player in an adventure game:

   house's entrance----upper hallway----bedroom---closet---...
          |                 |              |
          |                 |              +-----private bath---...
          |                 +---study---...
          |
          lower hallway---kitchen---...
          |
          +---lounge---...

We might imagine a robot entering the house at its entrance, knowing nothing about what lies inside. The robot's data base looks like this:

house's entrance

Perhaps the robot explores the upper hallway, bedroom, and private bath. Its data base expands with the knowledge learned during the exploration:

house's entrance----upper hallway----bedroom---closet---...
                                        |
                                        +-----private bath---...

As the robot explores more and more of the house, its database, a tree, grows to include the knowledge. A tree structure is useful for holding the knowledge, because trees can grow dynamically, spawning branches and subtrees as needed.

A tree like the one above is sometimes called a  search tree. Indeed, the search

trees seen in the earlier lectures on stacks and queues also fit into this category.


18. Trees are used to represent the phrase structure of sentences, which is crucial to language processing programs. Here is the phrase-structure tree (``parse tree'') for the Java statements

    int x;
    x = 3 + y;

              STATEMENT SEQUENCE
               /              \
        DECLARATION         ASSIGNMENT
         /      \            /       \
      TYPE   VARIABLE    VARIABLE   EXPRESSION
        |       |           |       /   |   \
       int      x           x   NUMERAL + VARIABLE
                                    |         |
                                    3         y

The Java compiler checks the grammatical structure of a Java program by reading

the program's words and attempting to build the program's parse tree. If 

successfully constructed, the parse tree is used as a guide to help the Java compiler generate the byte code that one finds in the program's .class file.

31. An operating system maintains a disk's file system as a tree, where file folders act

as tree nodes:

32.

The tree structure is useful because it easily accommodates the creation and deletion of folders and files.

The tree forms listed above have varying internal structure, but all are variations on the same basic idea --- an inductive definition, which we now study in its purest form.


Binary Trees

We begin study with a form of tree whose nodes have exactly two subtrees. Here is the

inductive definition:

 A BinaryTree object is

1. A Leaf-structure, representing an empty tree; or 

2. A Node-structure, which contains:
o an object, called the ``value''

o a BinaryTree object, called the ``left subtree'';

o a BinaryTree object, called the ``right subtree'';

Here is a picture of a binary tree, where integers are saved as values at the nodes, and

leaves hold no integers at all:

Sometimes, the arcs that emanate from the nodes are labelled so that we can describe subtrees; here, each node possesses a left subtree and a right subtree.

Since trees are inductively defined, the above tree can be drawn as a layered structure:

Node----------------------------------------------------------------+
| 9                                                                  |
| Node--------------+      Node------------------------------+      |
| | 5               |      | 12                              |      |
| | Leaf    Leaf    |      | Leaf    Node---------------+    |      |
| +-----------------+      |         | 15               |    |      |
|                          |         | Leaf    Leaf     |    |      |
|                          |         +------------------+    |      |
|                          +---------------------------------+      |
+--------------------------------------------------------------------+

Another representation of the above tree is

Node( 9, Node( 5, Leaf(), Leaf() ),
         Node( 12, Leaf(),
                   Node( 15, Leaf(), Leaf() ) ) )

This looks more like Java code, and it shows the nested structure of the tree.

Processing a binary tree by recursion

There is a basic pattern for computing on binary trees; the pattern follows the inductive

definition. Here is what it looks like in equational form:


process( Leaf ) = ...some simple answer...
process( Node(val, left, right) ) = ...compute an answer from val,
                                       process(left), and process(right)...

Here is an example, which counts the number of Node-objects within a binary tree:

countNodes( Leaf ) = 0
countNodes( Node(val, left, right) ) = 1 + countNodes(left) + countNodes(right)

The intuition behind the schema is simple: To count all the nodes in a big tree, we split the task into pieces:

1. count all the nodes in the slightly smaller, left subtree;

2. count all the nodes in the slightly smaller, right subtree;

3. add together these counts, plus one, for the root.

Here is a picture of the countNodes schema:

You should apply the schema to the above example tree and calculate that it has four 

nodes.

The classes for binary trees

Here is the coding of the binary-tree data structure, based on the inductive definition seen

earlier. First, we use an abstract class to name the data type of binary tree:

package BinTree;

/** BinaryTree defines the data type of binary trees:
 *  (i) a leaf, or
 *  (ii) a node that holds a value, a left subtree, and a right subtree.
 *  Methods listed below should be overridden as needed by subclasses. */
public abstract class BinaryTree
{ /** value returns the value held within a tree node
   *  @return the value */
  public Object value()
  { throw new RuntimeException("BinaryTree error: no value"); }

  /** left returns the left subtree of this tree
   *  @return the left subtree */
  public BinaryTree left()
  { throw new RuntimeException("BinaryTree error: no left subtree"); }

  /** right returns the right subtree of this tree
   *  @return the right subtree */
  public BinaryTree right()
  { throw new RuntimeException("BinaryTree error: no right subtree"); }
}

Next, here is the coding for leaf objects:


package BinTree;

/** Leaf models a tree leaf---an empty tree */
public class Leaf extends BinaryTree
{ /** Constructor Leaf constructs the empty tree */
  public Leaf() { }
}

Next, we write the coding for constructing node objects:

package BinTree;

/** Node models a nonempty tree node, holding a value and two subtrees */
public class Node extends BinaryTree
{ private Object val;
  private BinaryTree left;
  private BinaryTree right;

  /** Constructor Node constructs the tree node
   *  @param v - the value held in the node
   *  @param l - the left subtree
   *  @param r - the right subtree */
  public Node(Object v, BinaryTree l, BinaryTree r)
  { val = v;
    left = l;
    right = r;
  }

  public Object value() { return val; }

  public BinaryTree left() { return left; }

  public BinaryTree right() { return right; }
}

Notice that we have not supplied methods to modify a tree structure after the tree is first constructed. Surprisingly, these are not crucial --- we can always build new trees from

scratch, as we will see. (These issues are addressed a bit later in this lecture.)

Here is the Java coding of the countNodes function, which was developed in the

 previous subsection:

/** countNodes returns the number of Node-objects within a tree
 *  @param t - the tree analyzed
 *  @return the number of Nodes within t */
public int countNodes(BinaryTree t)
{ int answer;
  if ( t instanceof Leaf )
  { answer = 0; }
  else // t must be a Node:
  { int left_answer = countNodes(t.left());
    int right_answer = countNodes(t.right());
    answer = 1 + left_answer + right_answer;
  }
  return answer;
}

You can find the classes for building binary trees here:

Definition of BinaryTree data type and applications that use it

In the directory, look at the BinTree directory for the basic classes that define binary trees. The Test.java application shows how to build some trees and use them as

arguments to methods. The TreeCalculator.java class contains codings of methods

that count and print trees.

Printing a tree's contents: in-order, pre-order, and post-order 

We might wish to print the contents of a tree's nodes:

printNodes( Leaf ) = nothing to print

printNodes( Node(value, left, right) ) = printNodes(left);
                                          print the value;
                                          printNodes(right)

The printing proceeds by printing the left subtree in its entirety, followed by the value at

the root, followed by the right subtree. This is called an in-order tree traversal. In Java,

/** printInOrder prints all the values held within a binary tree;
 *  the tree is traversed left-to-right _in-order_ (value printed in middle)
 *  @param t - the tree traversed and printed */
public void printInOrder(BinaryTree t)
{ if ( t instanceof Leaf )
  { }  // no value to print
  else // t must be a Node:
  { printInOrder(t.left());                        // print nodes in left subtree
    System.out.println( (t.value()).toString() );  // print node's value
    printInOrder(t.right());                       // print nodes in right subtree
  }
}

For example, the tree seen earlier would print like this:

5
9
12
15

Here is a variation on the printing:

print2( Leaf ) = nothing to print

print2( Node(value, left, right) ) = print the value;
                                      print2(left);
                                      print2(right)

This is called a pre-order traversal and would generate


9
5
12
15

for the example tree.

There is a third variation, post-order traversal, which prints a node's value after the contents of the left and right subtrees are printed. You should formulate the algorithm for post-order traversal.
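As a hint, a post-order printer follows the same recursive pattern as printInOrder; here is a minimal sketch (the method name printPostOrder is our own choice and is not part of the course classes):

/** printPostOrder prints a tree's values in post-order:
 *  left subtree first, then right subtree, then the node's value */
public void printPostOrder(BinaryTree t)
{ if ( t instanceof Leaf )
  { }  // nothing to print for an empty tree
  else
  { printPostOrder(t.left());   // print nodes in left subtree
    printPostOrder(t.right());  // print nodes in right subtree
    System.out.println( (t.value()).toString() );  // print the node's value last
  }
}

For the example tree, this would print 5, 15, 12, 9.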

Heaps

A heap is a complete binary tree that possesses the heap-order property.

A binary tree is complete if 

1. all paths in the tree have length within one of the length of the longest path.

2. all paths that have lengths less than the longest path's length are rightmost within

the tree.

Here are two examples of complete binary trees:

       a                    c
      / \                  / \
     b   c                b   a
    / \ / \              / \ / \
   d  e f  .            d  . .  .
  /\ /\ /\             /\
  .. .. ..             ..

Note that the shortest paths in the trees are the ones on the trees' right. In contrast, these trees are not complete:

       a                    c
      / \                  / \
     b   c                b   a
    / \ / \              / \ / \
   d  e f  .            .  . d  .
  /\ /\ /\                  /\
  . g .. ..                 . .
   /\
   . .

You can consider a complete tree as a tree where insertions must be added in a fixed, left-to-right order, much like laying bricks:

            1
          /   \
         2     3
        / \   / \
       4   5 6   *
      /\  /\ /\
      +.  .. ..

The leaf marked by the asterisk is the position for the next insertion, the leaf marked by

the plus symbol is the position after that, etc.


 Next, a complete binary tree has the heap-order property if, for every node, N, within the

tree, the priority number of the value held within node N is less-than-or-equal-to all priority

numbers held within N.left() and all priority numbers held within N.right().

Here is an example of a complete tree with the heap-order property, where each node

holds a priority number, object pair.

            2,r
           /    \
        7,m      3,q
        /  \     /  \
     9,p  12,k 6,w   .
     /\    /\   /\
     ..    ..   ..

Check each node---the number at a node is less than or equal to all the numbers ``below'' it in the tree. This means the smallest number must be at the root.

Although this tree is heap-ordered, it is not an ordered tree (binary search tree) --- note where 3 is positioned, say, relative to 2 and 7. This fact will be exploited to devise a fast

algorithm for insertion.

Insertion into a heap

Insertion must add a new priority number, object pair to the heap in such a way that the

resulting structure is still a heap---it is complete and heap-ordered. The algorithm is

simple and clever:

To insert num, ob into heap, h:

1. Place num, ob in a new node that replaces the leftmost leaf nearest the root of h. (This inserts num, ob into the first leaf encountered when ``reading'' the tree left-to-right, top-down. See the earlier drawings.)
2. Next, make the new node with num, ob ``bubble up'' in the tree by repeatedly swapping the node with its parent, when the parent's priority number is greater than num.

Here is an example: Say that we perform insert(1,s) into the heap drawn above. After 

Step 1 of the algorithm, we have this tree:

            2,r
           /    \
        7,m      3,q
        /  \     /  \
     9,p  12,k 6,w   1,s
     /\    /\   /\   /\
     ..    ..   ..   ..

This is a complete tree but is not heap-ordered. To restore the latter property, we must make 1,s ``bubble up'' to its appropriate position. First, we note that 1 is less than 3, the priority number of the new node's parent, so we exchange nodes:

            2,r
           /    \
        7,m      1,s
        /  \     /  \
     9,p  12,k 6,w   3,q
     /\    /\   /\   /\
     ..    ..   ..   ..

An examination of the new parent to 1,s shows that another exchange is warranted,

giving us:

            1,s
           /    \
        7,m      2,r
        /  \     /  \
     9,p  12,k 6,w   3,q
     /\    /\   /\   /\
     ..    ..   ..   ..

At this point, the tree is heap-ordered.

Retrieval from a heap

A retrieval operation always returns the object in the root node and deletes the root of the

heap-ordered tree. But this leaves two trees, which must be rebuilt into one heap. Here is the algorithm for retrieval:

If the heap is empty, this is an error. Otherwise:

1. Extract the object in the root and save it; call it ob_root.

2. Move to the root the rightmost node that is furthest from the original root. (This rebuilds the tree so that it is again complete.)
3. Say that num, ob now reside at the root. Make this node ``bubble down'' the tree by repeatedly swapping it with a child whose priority number is less than num. If both children have priority numbers that are less than num, then swap the node with the smaller-valued child.
4. Return ob_root as the result.

Here is an example: Given the tree just drawn, say that a retrieval must be done. Step 1 says that we extract object s and save it. Step 2 says to replace the root with the node that is furthest and rightmost from the root. This gives us:

            3,q
           /    \
        7,m      2,r
        /  \     /  \
     9,p  12,k 6,w   .
     /\    /\   /\
     ..    ..   ..

Step 3 says that 3,q must be exchanged with its children, as necessary, to restore heap-ordering. Here, 3,q is exchanged with 2,r, giving us:

            2,r
           /    \
        7,m      3,q
        /  \     /  \
     9,p  12,k 6,w   .
     /\    /\   /\
     ..    ..   ..


Only one exchange is needed. s is returned.
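Both operations can be coded compactly if the complete tree is stored in an array, with the children of position i at positions 2*i+1 and 2*i+2. The following is a minimal sketch under that assumption (the class name IntHeap and its fields are our own invention, not part of the course's code); smaller priority numbers mean higher priority, as in the examples above.

import java.util.ArrayList;

/** IntHeap is a minimal array-based heap: position 0 is the root and the
 *  children of position i live at positions 2*i+1 and 2*i+2, which mirrors
 *  the left-to-right, top-down numbering of a complete tree. */
public class IntHeap
{ private ArrayList<Integer> prio = new ArrayList<Integer>();  // priority numbers
  private ArrayList<Object>  obs  = new ArrayList<Object>();   // the paired objects

  /** insert places (num, ob) at the next leaf position and bubbles it up */
  public void insert(int num, Object ob)
  { prio.add(num);
    obs.add(ob);
    int i = prio.size() - 1;
    while ( i > 0 && prio.get((i - 1) / 2) > prio.get(i) )   // parent's priority is greater
    { swap(i, (i - 1) / 2);
      i = (i - 1) / 2;                                       // move up one level
    }
  }

  /** retrieve removes and returns the object paired with the smallest priority */
  public Object retrieve()
  { if ( prio.isEmpty() )
    { throw new RuntimeException("heap error: empty heap"); }
    Object ob_root = obs.get(0);
    int last = prio.size() - 1;
    swap(0, last);                 // move the deepest, rightmost node to the root
    prio.remove(last);
    obs.remove(last);
    int i = 0;                     // bubble the new root down
    while ( true )
    { int left = 2 * i + 1, right = 2 * i + 2, smallest = i;
      if ( left < prio.size() && prio.get(left) < prio.get(smallest) )   { smallest = left; }
      if ( right < prio.size() && prio.get(right) < prio.get(smallest) ) { smallest = right; }
      if ( smallest == i ) { break; }   // heap-order restored
      swap(i, smallest);
      i = smallest;
    }
    return ob_root;
  }

  private void swap(int i, int j)
  { int p = prio.get(i);   prio.set(i, prio.get(j));  prio.set(j, p);
    Object o = obs.get(i); obs.set(i, obs.get(j));    obs.set(j, o);
  }
}

For instance, after insert(2,"r"), insert(7,"m"), insert(3,"q") and insert(1,"s"), a call to retrieve() returns "s", just as in the worked example.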

Time complexity

The insertion and retrieval operations each take time on the order of log2 N, where the heap

contains N nodes. This is because only one path of the heap is traversed during the

``bubble up'' and ``bubble down'' operations.

The challenging aspect of implementing a heap structure lies in remembering which node

is the deepest, rightmost, and which leaf is the shallowest, leftmost. Both of these notions

are tied to the count of objects held in the heap: If we number the positions of the heap in binary numbering,

then the shallowest, leftmost leaf is the position that is one plus the count of objects in the heap, and the deepest, rightmost node is located at the position that is exactly the same as

the count of objects in the heap.

Since each binary numeral indicates a path from the root of the heap to the position

numbered by it,

we can easily locate the leftmost leaf and rightmost node with simple binary arithmetic,

where the calculation of the binary numeral and the path traversal based on the numeral

both take on the order of log2 N time.

Copyright © 2001 David Schmidt

Modifying a tree---building a new tree in terms of an existing one

Of course, it is always easy to build a new tree using smaller trees that we have already

constructed. For example, say that we have this tree:

BinaryTree right_subtree = new Node("c", new Leaf(), new Leaf());

It is easy to use right_subtree to build a larger tree:

BinaryTree t = new Node("a", new Leaf(), right_subtree);


Tree t holds right_subtree as its right subtree.

But the situation gets more interesting if we wish to add new ``growth'' in place of one of the leaves already embedded within tree t. For example, say that we wish to revise tree t

so that its left subtree, a leaf, is replaced by a node that holds "b".

First approach: mutable trees

It is tempting to wish for a method, say, setLeft, so that we might just say:

BinaryTree new_left_subtree = new Node("b", new Leaf(), new Leaf());
t.setLeft(new_left_subtree);

Of course, we can indeed write a setLeft method (and for that matter, a setRight

method) and add them to class Node:

public class Node extends BinaryTree
{ private Object val;
  private BinaryTree left;
  private BinaryTree right;
  ...

  public void setLeft(BinaryTree new_left)
  { left = new_left; }

  public void setRight(BinaryTree new_right)
  { right = new_right; }
}

But before we seize on this approach exclusively, we should consider another, clever way

of replacing a subpart of an existing tree.

Second approach: immutable trees

Once again, here is tree t:

BinaryTree right_subtree = new Node("c", new Leaf(), new Leaf());
BinaryTree t = new Node("a", new Leaf(), right_subtree);

Now, we build the new_left_subtree, just like before:

BinaryTree new_left_subtree = new Node("b", new Leaf(), new Leaf());

 Now we wish to alter t so that its left leaf is replaced by new_left_subtree. But we do

not use setLeft to alter  t. Instead, we rebuild the parts of  t that rest above the

new_left_subtree and reuse the other parts:

t = new Node(t.value(), new_left_subtree, t.right());

This assignment statement assigns a new tree to variable t --- the tree's root value is

exactly the root value that t previously held. Also, the tree's right subtree is exactly the

same subtree that t previously held.  But t's old left subtree, the leaf, is forgotten and 

new_left_subtree is used instead. 

There is no need for a setLeft method --- we build a new tree instead.


Comparison

The first approach, which used a setLeft method, employs mutable trees; the second

method, which rebuilds the parts of the tree that rest above the altered part, uses immutable trees.

What are the comparative aspects of the two approaches? For mutable trees:

• The program maintains only one (big) tree, because the mutation operations, setLeft and setRight, change links in the heap.

• For some activities, e.g., tree balancing (where a new root node must be installed),

the mutations can be done quickly with programming ``tricks'' that cleverly reset a

few links.

Here are the important aspects of immutable trees:

• A program can maintain multiple trees, and the trees can share each other's substructures. (This happens in text editing programs and other interactive

 programs: the state of the edited document is stored internally as a tree, and when

you make a modification, a new tree is built that shares almost all of the old tree but also contains the modification you made. If you press the ``undo'' button to

undo the modification, the ``current'' tree is forgotten and the ``earlier'' tree is used

instead. The ability to repeatedly undo is made possible by multiple trees that share huge parts of each other.)

• In Java, there is no penalty to constantly building new trees from pieces of other 

trees --- the Java garbage collector reclaims those pieces of trees that are

discarded and are unreferenced by the program.

We will study how to build both mutable and immutable trees.


Unit VI

Hash Tables

• An extremely effective and practical way of implementing dictionaries.

• O(1) time for search, insert, and delete in the average case.

• O(n) time in the worst case; by careful design, we can make the probability that more than constant time is required arbitrarily small.

• Hash Tables

o Open or External

o Closed or Internal

Open Hashing 

Let:

• U be the universe of keys:
  o integers
  o character strings
  o complex bit patterns
• B be the set of hash values (also called the buckets or bins). Let B = {0, 1,..., m - 1}, where m > 0 is a positive integer.

A hash function h : U → B associates buckets (hash values) to keys.

Two main issues:

1. Collisions

If x1 and x2 are two different keys, it is possible that h(x1) = h(x2). This is called a collision. Collision resolution is the most important issue in hash table implementations.


2. Hash Functions

Choosing a hash function that minimizes the number of collisions and also hashes uniformly is another critical issue.

Collision Resolution by Chaining 

• Put all the elements that hash to the same value in a linked list. See Figure 3.1.

Figure 3.1: Collision resolution by chaining

Example: 


See Figure 3.2. Consider the keys 0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100. Let the hash

function be:

h(x) = x % 7

Figure 3.2: Open hashing: An example

• Bucket lists
  o unsorted lists

o sorted lists (these are better)

• Insert (x, T)

Insert x at the head of list T[h(key (x))]

• Search (x, T)

Search for an element x in the list T[h(key (x))]

• Delete (x, T)


Delete x from the list T[h(key (x))]
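A minimal sketch of these three operations in Java, assuming integer keys and the setting of Figure 3.2 (m = 7, h(x) = x % 7); the class name ChainedHashTable is our own, not from the notes:

import java.util.LinkedList;

/** ChainedHashTable is a minimal open-hashing (chaining) table for int keys */
public class ChainedHashTable
{ private LinkedList<Integer>[] table;   // one bucket list per hash value

  @SuppressWarnings("unchecked")
  public ChainedHashTable(int m)
  { table = new LinkedList[m];
    for (int i = 0; i < m; i++) { table[i] = new LinkedList<Integer>(); }
  }

  private int h(int x) { return Math.floorMod(x, table.length); }   // e.g. x % 7 when m = 7

  public void insert(int x) { table[h(x)].addFirst(x); }            // insert at the head of the bucket list

  public boolean search(int x) { return table[h(x)].contains(x); }  // scan only the one bucket list

  public void delete(int x) { table[h(x)].remove(Integer.valueOf(x)); }
}

Inserting the keys 0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100 into new ChainedHashTable(7) groups them into the buckets of Figure 3.2.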

Worst case complexity of all these operations is O(n)

In the average case, the running time is O(1 + α), where

    α = n / m  is the load factor,
    n = number of elements stored, and
    m = number of hash values or buckets.

It is assumed that the hash value h(k) can be computed in O(1) time. If n is O(m), the average case complexity of these operations becomes O(1).

Closed Hashing

• All elements are stored in the hash table itself 

• Avoids pointers; only computes the sequence of slots to be examined.

• Collisions are handled by generating a sequence of rehash values.

h : U × {0, 1, 2,..., m - 1} → {0, 1, 2,..., m - 1}

• Given a key x, it has a hash value h(x,0) and a set of rehash values

h(x, 1), h(x,2), . . . , h(x, m-1)

• We require that for every key x, the probe sequence

< h(x,0), h(x, 1), h(x,2), . . . , h(x, m-1)>

 be a permutation of <0, 1, ..., m-1>.

This ensures that every hash table position is eventually considered as a slot for 

storing a record with a key value x.

Search (x, T) 

• Search will continue until you find the element x (successful search) or an empty

slot (unsuccessful search).


Delete (x, T) 

•  No delete if the search is unsuccessful.

• If the search is successful, then put the label DELETED (different from an empty slot).

Insert (x, T) 

•  No need to insert if the search is successful.

• If the search is unsuccessful, insert at the first position with a DELETED tag.


Rehashing Methods

Denote h(x, 0) by simply h(x).

1. Linear probing

   h(x, i) = (h(x) + i) mod m

2. Quadratic probing

   h(x, i) = (h(x) + C1 i + C2 i^2) mod m

   where C1 and C2 are constants.

3. Double hashing

   h(x, i) = (h(x) + i h'(x)) mod m

   where h' is a second hash function.

A Comparison of Rehashing Methods 


Linear probing      m distinct probe sequences      Primary clustering

Quadratic probing   m distinct probe sequences      No primary clustering,
                                                    but secondary clustering

Double hashing      m^2 distinct probe sequences    No primary clustering,
                                                    no secondary clustering

  An Example:

Assume linear probing with the following hashing and rehashing functions:

h(x, 0) = x % 7
h(x, i) = (h(x, 0) + i) % 7

Start with an empty table.

Insert (20, T) 0 14

Insert (30, T) 1 empty

Insert (9, T) 2 30

Insert (45, T) 3 9

Insert (14, T) 4 45

5 empty

6 20

 

Search (35, T) 0 14

Delete (9, T) 1 empty

2 30

3 deleted
4 45

5 empty

6 20

 


Search (45, T) 0 14

Search (52, T) 1 empty

Search (9, T) 2 30

Insert (45, T) 3 10
Insert (10, T) 4 45

5 empty

6 20

 

Delete (45, T) 0 14

Insert (16, T) 1 empty

2 30

3 10

4 16

5 empty

6 20
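Here is a minimal Java sketch of closed hashing with linear probing and DELETED markers, in the spirit of the example above; the class name and the negative sentinel values are our own choices, and keys are assumed to be nonnegative:

/** LinearProbingTable is a minimal closed-hashing table with linear probing */
public class LinearProbingTable
{ private static final int EMPTY = -1, DELETED = -2;   // sentinel slot contents (assumes keys >= 0)
  private int[] slots;

  public LinearProbingTable(int m)
  { slots = new int[m];
    for (int j = 0; j < m; j++) { slots[j] = EMPTY; }
  }

  private int h(int x, int i) { return (x % slots.length + i) % slots.length; }

  /** search returns the slot index holding x, or -1 if x is absent */
  public int search(int x)
  { for (int i = 0; i < slots.length; i++)
    { int j = h(x, i);
      if ( slots[j] == EMPTY ) { return -1; }   // an empty slot ends an unsuccessful search
      if ( slots[j] == x )     { return j;  }   // found x (DELETED never equals a key)
    }
    return -1;
  }

  /** delete marks x's slot DELETED, if x is present */
  public void delete(int x)
  { int j = search(x);
    if ( j != -1 ) { slots[j] = DELETED; }
  }

  /** insert stores x in the first DELETED or EMPTY slot of its probe sequence,
   *  unless x is already present */
  public void insert(int x)
  { if ( search(x) != -1 ) { return; }          // no need to insert
    for (int i = 0; i < slots.length; i++)
    { int j = h(x, i);
      if ( slots[j] == EMPTY || slots[j] == DELETED ) { slots[j] = x; return; }
    }
    throw new RuntimeException("hash table is full");
  }
}

Replaying the example (insert 20, 30, 9, 45, 14; delete 9; insert 45 and 10; delete 45; insert 16) leaves 14, 30, 10, 16, 20 in slots 0, 2, 3, 4 and 6, matching the final table.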

 Another Example:

Let m be the number of slots.

Assume:
  • every even numbered slot is occupied and every odd numbered slot is empty
  • any hash value between 0 . . . m-1 is equally likely to be generated
  • linear probing

  slots:  empty, occupied, empty, occupied, ..., empty, occupied

Expected number of probes for a successful search = 1

Expected number of probes for an unsuccessful search
  = (1/2)(1) + (1/2)(2)
  = 1.5

Hashing Functions

What is a good hash function?

• Should satisfy the simple uniform hashing property.

Let U = universe of keys

Let the hash values be 0, 1, . . . , m-1

Let us assume that each key is drawn independently from U according to a probability distribution P, i.e., for k ∈ U,

    P(k) = probability that k is drawn.

Then simple uniform hashing requires that

    the sum of P(k), over all k with h(k) = j, equals 1/m, for each j = 0, 1,..., m - 1;

that is, each bucket is equally likely to be occupied.

• Example of a hash function that satisfies simple uniform hashing property:


Suppose the keys are known to be random real numbers k  independently and

uniformly distributed in the range [0,1).

h(k) = ⌊km⌋

satisfies the simple uniform hashing property.

Qualitative information about  P  is often useful in the design process. For 

example, consider a compiler's symbol table in which the keys are arbitrary

character strings representing identifiers in a program. It is common for closely related symbols, say pt, pts, ptt, to appear in the same program. A good hash

function would minimize the chance that such variants hash to the same slot.

• A common approach is to derive a hash value in a way that is expected to be independent of any patterns that might exist in the data.

o The division method computes the hash value as the remainder when the

key is divided by a prime number. Unless that prime is somehow related to patterns in the distribution P , this method gives good results.

Division Method 

• A key is mapped into one of m slots using the function

h(k ) = k  mod  m 

• Requires only a single division, hence fast
• m should not be:
  o a power of 2, since if m = 2^p, then h(k) is just the p lowest-order bits of k
  o a power of 10, since then the hash function does not depend on all the decimal digits of k
  o 2^p - 1. If k is a character string interpreted in radix 2^p, two strings that are

identical except for a transposition of two adjacent characters will hash to

the same value.

• Good values for m 

o  primes not too close to exact powers of 2.

Multiplication Method 

There are two steps:

1. Multiply the key k by a constant A in the range 0 < A < 1 and extract the fractional

 part of kA 


2. Multiply this fractional part by m and take the floor.

   h(k) = ⌊m (kA mod 1)⌋

   where

   kA mod 1 = kA - ⌊kA⌋

   so h(k) = ⌊m (kA - ⌊kA⌋)⌋

 

• Advantage of the method is that the value of m is not critical. We typically choose

it to be a power of 2:

m = 2^p

for some integer p so that we can then easily implement the function on most computers as follows:

Suppose the word size = w. Assume that k fits into a single word. First multiply k by the w-bit integer ⌊A · 2^w⌋. The result is a 2w-bit value

r1 · 2^w + r0

where r1 is the high order word of the product and r0 is the low order word of the product. The desired p-bit hash value consists of the p most significant bits of r0.

• Works practically with any value of  A, but works better with some values than the

others. The optimal choice depends on the characteristics of the data being hashed. Knuth recommends

 A = 0.6180339887... (Golden  Ratio)
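A minimal sketch of the multiplication method, using floating point for clarity rather than the word-size trick described above (the class and field names are ours; keys are assumed nonnegative):

/** MultiplicativeHash computes h(k) = floor( m * (kA mod 1) ) */
public class MultiplicativeHash
{ private static final double A = 0.6180339887;  // Knuth's recommended constant
  private final int m;                            // number of buckets, typically m = 2^p

  public MultiplicativeHash(int m) { this.m = m; }

  public int hash(int k)
  { double frac = (k * A) % 1.0;                  // kA mod 1, the fractional part of kA
    return (int) Math.floor(m * frac);            // multiply by m and take the floor
  }
}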

Universal Hashing 

This involves choosing a hash function randomly in a way that is independent of the keys

that are actually going to be stored. We select the hash function at random from a

carefully designed class of functions.


• Let H be a finite collection of hash functions that map a given universe U of keys into the range {0, 1, 2,..., m - 1}.
• H is called universal if, for each pair of distinct keys x, y ∈ U, the number of hash functions h ∈ H for which h(x) = h(y) is precisely equal to |H| / m.
• With a function randomly chosen from H, the chance of a collision between x and y, where x ≠ y, is exactly 1/m.

Example of a universal class of hash functions:

Let table size m be prime. Decompose a key  x into r +1 bytes. (i.e., characters or fixed-

width binary strings). Thus

x = (x0, x1,..., xr).

Assume that the maximum value of a byte is less than m.

Let a = (a0, a1,..., ar ) denote a sequence of r + 1 elements chosen randomly from the set

{0, 1,..., m - 1}. Define a hash function ha by

ha(x) = ( a0 x0 + a1 x1 + ... + ar xr ) mod m

With this definition, H = {ha} can be shown to be universal. Note that it has m^(r+1) members.
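A minimal sketch of drawing one member ha of this class at random and applying it (the names are our own; m must be prime and larger than the maximum byte value):

import java.util.Random;

/** UniversalHash draws a = (a0,...,ar) at random and computes ha(x) */
public class UniversalHash
{ private final int m;
  private final int[] a;                  // the randomly chosen coefficients

  public UniversalHash(int m, int r, Random rng)
  { this.m = m;
    a = new int[r + 1];
    for (int i = 0; i <= r; i++) { a[i] = rng.nextInt(m); }
  }

  /** hash computes (a0*x0 + ... + ar*xr) mod m for a key decomposed into r+1 bytes */
  public int hash(int[] x)                // x must have length r+1
  { long sum = 0;
    for (int i = 0; i < a.length; i++) { sum += (long) a[i] * x[i]; }
    return (int) (sum % m);
  }
}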

Analysis of Closed Hashing


Load factor α = n / m

Assume uniform hashing. In this scheme, the probe sequence

< h(k , 0),..., h(k , m - 1) >

for each key k  is equally likely to be any permutation of < 0, 1,..., m - 1 >


Unit V

Graphs

A road map is an example of a graph. So is a circuit diagram, a hardware diagram, a flowchart, a UML class diagram, a work-flow chart, and a structure diagram of a

database system. Graphs are common, and it is important to know how to model them

and process them within a computer.

Definition of a graph

Precisely stated, a (directed) graph is

1. a set, N , of nodes 

2. a set, E , of pairs of elements from N , called edges.

For example, this picture of part of the flight plan of American Airlines,

can be written precisely as the graph,

N = { KC, Chi, Dal, NYC }
E = { (Dal,KC), (KC,Chi), (Chi,KC), (Chi,Dal), (Dal,Chi), (Chi,NYC) }

Each edge has a source and a target. (In the example, there is an edge whose source is Chi and whose destination is NYC, but there is no edge whose source is NYC. It is legal to

have an edge whose source and destination are the same node.)

A  path from a node A to node B is a sequence of zero or more edges that start at A,

connect together, and end at B. In the example, there is a path from Dal to NYC but there is no path from NYC to Dal.

 Notice that a tree is a special form of graph, where one node is called the root , and there

is exactly one path from the root to all the other nodes. Unlike a tree, a graph can have

multiple paths between nodes, and a graph can have cycles (loops), where a path can go from a starting node to other nodes and back to the starting node again.

Another form of graph is an undirected graph, where every edge from a node A to node B is read also as an edge from B to A. We draw an undirected graph without arrowheads

on the edges, like this:


A graph is a weighted graph if some form of label or value (called a ``weight'') is

attached to each of its edges. For example, mileages might be attached to the directed

graph seen first,

The formal representation displays each edge as a triple:

N = { KC, Chi, Dal, NYC }
E = { (Dal,600,KC), (KC,500,Chi), (Chi,500,KC), (Chi,800,Dal),
      (Dal,800,Chi), (Chi,900,NYC) }

Representing graphs in a computer

There are two standard ways of storing graphs in a computer: the adjacency matrix and

the adjacency list .

An adjacency matrix is a kind of ``mileage table'':

      Chi  Dal   KC  NYC
     +---------------------
Chi  |  0   800  500  900
Dal  | 800    0  600   -1
KC   | 500   -1    0   -1
NYC  |  -1   -1   -1    0

Here, the weights indicate the existence of an edge, and -1 (and 0!) means there is no edge.

The adjacency list organizes the graph's edges into sets, based on the edges' sources. Here

is the adjacency list for the example:


As a rule, use an adjacency matrix to store a graph where there are edges between almost

all the nodes; use an adjacency list when the graph has few edges, that is, it is  sparse, that

is, there are fewer than N * log 2 N edges, for a graph with N nodes.
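As an illustration, here is a minimal Java sketch of an adjacency-list representation for the weighted flight-plan graph; the class and method names are our own:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** WeightedDigraph stores, for each source node, the list of edges leaving it */
public class WeightedDigraph
{ /** Edge records a target node and the weight attached to the edge */
  public static class Edge
  { public final String target;
    public final int weight;
    public Edge(String target, int weight) { this.target = target; this.weight = weight; }
  }

  private Map<String, List<Edge>> adjacency = new HashMap<String, List<Edge>>();

  public void addNode(String n) { adjacency.putIfAbsent(n, new ArrayList<Edge>()); }

  public void addEdge(String source, int weight, String target)
  { addNode(source);
    addNode(target);
    adjacency.get(source).add(new Edge(target, weight));   // one entry in the source's list
  }

  public List<Edge> edgesFrom(String source) { return adjacency.get(source); }
}

The example graph would be loaded with calls such as g.addEdge("Dal", 600, "KC") and g.addEdge("Chi", 900, "NYC").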

The adjacency matrix and adjacency list are ``raw'' forms of graph and are not oriented

towards solving any particular problem. The forms of problems that one must solve are typically:

•  process/print the nodes, e.g., check if the graph is connected --- for every node, one can go to all other nodes

•  process/print the edges, e.g., find the shortest, weighted path from one node to

another 

The best way to solve such problems is to extract information from the adjacency matrix/list and construct an appropriate tree. There are two important trees built from

graphs: spanning trees and regular trees.

Spanning trees

A spanning tree is a tree that lists all the nodes in a graph but not necessarily all the

edges. Its root is the ``entry node'' into the graph. Here are spanning trees that we can generate from the example directed graph where we use Dal as the entry node. For

simplicity, we omit the weights on the edges, since spanning trees are normally used to

answer questions about the nodes.

and here is a spanning tree for entry node, NYC:

Spanning trees are good for answering questions about, or doing processing of, the nodes in a graph,

where the edges are unimportant to the answer. For example, we can use a spanning tree

to answer: Is there a path from one node to another? Is one node connected to all others?

Minimum Spanning Tree Algorithms

Definition:-

A tree is a connected graph without cycles.


Properties of Trees

° A graph is a tree if and only if there is one and only one path joining any two of its

vertices.

° A connected graph is a tree if and only if every one of its edges is a bridge.

° A connected graph is a tree if and only if it has N vertices and N - 1 edges.

Definitions:- 

° A subgraph that spans (reaches out to ) all vertices of a graph is called a spanning

subgraph.

° A subgraph that is a tree and that spans (reaches out to) all vertices of the original graph is called a spanning tree.

° Among all the spanning trees of a weighted and connected graph, the one (possibly

more) with the least total weight is called a minimum spanning tree (MST).

Kruskal's Algorithm

• Step 1 

Find the cheapest edge in the graph (if there is more than one, pick one at random). Mark it with any given colour, say red.

• Step 2 

Find the cheapest unmarked (uncoloured) edge in the graph that doesn't close a coloured or red circuit. Mark this edge red.

• Step 3 

Repeat Step 2 until you reach out to every vertex of the graph (or you have N - 1 coloured edges, where N is the number of vertices). The red edges form the

desired minimum spanning tree.


Prim's Algorithm


• Step 0 

Pick any vertex as a starting vertex. (Call it S). Mark it with any given colour, say

red.

• Step 1 

Find the nearest neighbour of S (call it P1), that is, the vertex joined to S by the cheapest edge. Mark both P1 and the edge SP1 red.

• Step 2 

Find the nearest uncoloured neighbour to the red subgraph (i.e., the closest vertex to any red vertex). Mark it and the edge connecting the vertex to the red subgraph

in red.

• Step 3 

Repeat Step 2 until all vertices are marked red. The red subgraph is a minimum

spanning tree.
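A minimal sketch of Prim's algorithm over an adjacency matrix (a weight of 0 meaning ``no edge''); the class and array names are our own. It records, for each vertex, the other endpoint of the tree edge chosen for it:

/** Prim builds a minimum spanning tree by repeatedly marking the nearest uncoloured vertex */
public class Prim
{ public static int[] minimumSpanningTree(int[][] weights, int start)
  { int n = weights.length;
    boolean[] red = new boolean[n];        // vertices already marked "red"
    int[] best = new int[n];               // cheapest known edge weight into each vertex
    int[] parent = new int[n];             // the red endpoint of that cheapest edge
    for (int v = 0; v < n; v++) { best[v] = Integer.MAX_VALUE; parent[v] = -1; }
    best[start] = 0;

    for (int step = 0; step < n; step++)
    { int u = -1;                          // pick the nearest uncoloured vertex
      for (int v = 0; v < n; v++)
      { if ( !red[v] && (u == -1 || best[v] < best[u]) ) { u = v; } }
      red[u] = true;                       // mark it (and, implicitly, the edge parent[u]-u)
      for (int v = 0; v < n; v++)          // update distances to the red subgraph
      { if ( !red[v] && weights[u][v] > 0 && weights[u][v] < best[v] )
        { best[v] = weights[u][v];
          parent[v] = u;
        }
      }
    }
    return parent;                         // parent[v] = -1 only for the starting vertex
  }
}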


Regular trees

A regular tree is a tree that generates all the paths that can be taken starting from an entry node, where along every path, no node repeats. Here is the regular tree that generates all

 paths from entry node, KC:

The point is, when we reach a node that repeats in a path, it is the same as a backwards

arc --- a loop:


Here is the regular tree that generates all paths from entry node, Chi:

Regular trees are built to answer questions about, or do processing of, the paths in a graph, where the edges and weights are important to the answer. For example, we can use a regular tree

to answer: list all the paths from one node to another; find the shortest (weighted) path

from one node to another.

We now examine two standard and classic graph problems and show how to use the trees to solve them.

Application: modelling sets by spanning trees --- UNION-FIND

For simplicity, we work with undirected graphs in this section.

A collection of graph nodes is connected  if for every node in the collection, there is a path to every other node in the collection. For this (undirected) graph:

There are two connected collections: {Tokyo, Osaka, Seoul} and {KC, Dal, Chi,

NYC}.

The collections are disjoint ``sets,'' and such sets play a crucial role in answering

questions of the form: ``Is there a path from node A to node B?'' or ``Do A and B belong to the same set?'' To answer the question, we check if A and B are in the same connected

collection.

To answer such questions, it is useful to maintain spanning trees that include all the nodes

of the graph. For the example, two spanning trees are required, one for each ``set'':

Now, to answer the question, ``can we go from A to B?'', we check if nodes A and B have the same root in the spanning tree(s) where they are saved. For example, we can check

``Can we go from Dal to NYC?'' by finding Dal and NYC and confirming that they share

the same root, KC.


Another view is that each set is named by the root of its spanning tree --- in the example,

we have the Tokyo set and the KC set. Then, the questions just asked are set-membership

questions: ``Is Dal in the KC-set''?

The previous checks of ``connectivity'' take time O(log 2 N), where  N  is the number of 

nodes in the graph (assuming that we can locate the source and destination nodes in fixed time, which we can do if we save the addresses of all the spanning-tree nodes in a hash

table or spelling tree).

The operation is called a FIND operation. There is also a UNION operation, which

corresponds to unioning together two sets:

Say that flights are added between Chi and Osaka, that is, an edge has been added

between these two nodes. How can we efficiently combine the spanning trees to reflect the new graph edge?

There is a simple solution to updating the spanning trees: draw a line between Chi's root and Osaka's:

This does a ``union'' of the two ``sets'' and is called a UNION operation.

One drawback of the simple definition of UNION is that it can create unbalanced

spanning trees --- the above tree is not height-balanced. One way to improve UNION is

to link the spanning tree with fewer nodes to the one with more nodes. For the above example, this would cause Tokyo's tree to link to KC's, which gives a slightly better

 balance.

But we can do better: the FIND operation is often improved so that it rebalances a

spanning tree after it finds a path from a node in the spanning tree to its root --- each time a FIND operation is performed, all the nodes traversed on the path to the root are attached

directly to the root. For the above example, a FIND operation on NYC would cause both

 NYC and Chi to be relinked directly to the root, Tokyo:

              Tokyo
    /     /     |     \     \
 Seoul  Osaka   KC    Chi   NYC
                |
               Dal
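Here is a minimal sketch of UNION and FIND with union-by-size and the path-compressing FIND just described; nodes are numbered 0..n-1 and the names are our own:

/** UnionFind maintains disjoint sets as parent-pointer trees */
public class UnionFind
{ private int[] parent;   // parent[v] == v means v is a root
  private int[] size;     // number of nodes in the tree rooted at v

  public UnionFind(int n)
  { parent = new int[n];
    size = new int[n];
    for (int v = 0; v < n; v++) { parent[v] = v; size[v] = 1; }
  }

  /** find returns the root of v's tree, relinking every node on the path
   *  directly to the root (path compression) */
  public int find(int v)
  { if ( parent[v] == v ) { return v; }
    parent[v] = find(parent[v]);      // attach v directly to the root
    return parent[v];
  }

  /** union links the smaller tree's root under the larger tree's root */
  public void union(int a, int b)
  { int ra = find(a), rb = find(b);
    if ( ra == rb ) { return; }                    // already in the same set
    if ( size[ra] < size[rb] ) { int t = ra; ra = rb; rb = t; }
    parent[rb] = ra;
    size[ra] += size[rb];
  }

  /** connected answers ``can we go from a to b?'' */
  public boolean connected(int a, int b) { return find(a) == find(b); }
}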

Application: finding the shortest path --- Dijkstra's algorithm

Consider again this directed, weighted graph:


A classic question that we might ask is: What is the shortest path (in terms of the

weights) from one node to another? For example, what is the shortest path from Dal to NYC?

A naive solution would enumerate all possible paths from Dal to NYC, say by using a

stack and depth-first search, starting from ``root node'' Dal. But there is a smarter, faster 

solution, due to Dijkstra, which constructs a regular tree that enumerates only the ``short''

 paths.

The key idea behind Dijkstra's algorithm is: generate a regular tree whose root is the start

node; generate all possible paths from the root, but stop a path if either 

1. the next node in the path is one that has already appeared in the path (this signifies the path has entered a loop and is not the shortest solution; it is the usual criterion for stopping a path in a regular tree)

2. the next node is the destination node (and so we calculate the weight of the path from the root to the destination and see if it is the least we have seen so far)
3. the next node is a node that we have already reached in some other path in the

tree and the weight of the other path is already less-than-or-equal-to the weight of 

the path we are just now constructing --- this means the path we are just now

constructing cannot be the shortest path

To help calculate the weights of the paths, we will annotate each node of the regular tree

with the weight of the path from the root to that node. To remember each shortest path so far from the root to any given node, we will keep an array to remember the weights.

Here is an example --- say we want the shortest path from Dal to NYC in the graph just seen. We begin with this array and initial regular tree:

             Dal   Chi    KC   NYC
           +-----+-----+-----+-----+
shortest:  |   0 |   ? |   ? |   ? |
           +-----+-----+-----+-----+

The array notes that the shortest path from Dal to Dal is 0; the other distances are undetermined at this time.

The edges from Dal generate this slightly larger regular tree and update the array to read

             Dal   Chi    KC   NYC
           +-----+-----+-----+-----+
shortest:  |   0 | 800 | 600 |   ? |
           +-----+-----+-----+-----+


Now, we must develop the tree at the nodes KC and Chi. When we try to develop KC, we find an edge that leads to Chi:

But when we calculate the weight of the path from Dal to KC to Chi, we find that it is

already larger than the weight already stored in the array as the shortest known weight

from Dal to Chi. This means there is no need to develop this path further, and we stop:

When we develop Chi, we find three edges: the first, to KC is to a node that has been

seen before. The array tells us that the path found earlier to KC had weight 600, so the current path, which has weight 800+500 = 1300, is not least and can be terminated. The next edge leads to Dal and a repeat of a node already seen on this path, so there is no

way that this path can be shortest; we terminate this path. The third edge goes from Chi to NYC, and we update the array with our discovery:

             Dal   Chi    KC   NYC
           +-----+-----+-----+-----+
shortest:  |   0 | 800 | 600 |1700 |
           +-----+-----+-----+-----+

The path from Dal to NYC is complete, and we stop this path.

At this point, all paths in the regular tree are completely developed, and we quit --- the

shortest path goes from Dal to Chi to NYC.

Finally, we note that, in the previous example, the regular tree itself need not be explicitly constructed in computer storage. Instead, we keep a second array that remembers the edges that yielded the shortest path. Here is the example repeated, with the two arrays, one that remembers the weights and one that remembers the individual edges in the shortest path that we build:

             Dal   Chi    KC   NYC
           +-----+-----+-----+-----+
shortest:  |   0 |   ? |   ? |   ? |
           +-----+-----+-----+-----+

             Dal   Chi    KC   NYC
           +-----+-----+-----+-----+
previous:  |null |   ? |   ? |   ? |
           +-----+-----+-----+-----+

After the moves from Dal:

             Dal   Chi    KC   NYC
           +-----+-----+-----+-----+
shortest:  |   0 | 800 | 600 |   ? |
           +-----+-----+-----+-----+

             Dal   Chi    KC   NYC
           +-----+-----+-----+-----+
previous:  |null | Dal | Dal |   ? |
           +-----+-----+-----+-----+

For city C, previous[C] remembers the immediate predecessor node whose edge

delivers us to city C on the shortest path. For example, previous[KC] tells us that the

shortest known path from Dal to KC ends with an edge from Dal to KC, and shortest[KC] tells us that the weight of the shortest path from Dal to KC is 600.

After the attempted moves from KC, there are no changes to the arrays. The move from

Chi to NYC gives the answer and the last edge in the shortest path:

             Dal   Chi    KC   NYC
           +-----+-----+-----+-----+
shortest:  |   0 | 800 | 600 |1700 |
           +-----+-----+-----+-----+

             Dal   Chi    KC   NYC
           +-----+-----+-----+-----+
previous:  |null | Dal | Dal | Chi |
           +-----+-----+-----+-----+

Now, we see that the shortest path from Dal to NYC must end with the edge from previous[NYC] to NYC, that is, from Chi to NYC. And, we see that the shortest path

from Dal to Chi must end with the edge, previous[Chi] to Chi.

Indeed, we can assemble the path from the edges, listed in reverse: NYC-Chi-Dal (null).
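The same bookkeeping can be expressed directly over an adjacency matrix; here is a minimal sketch that fills the shortest[] and previous[] arrays described above (-1 in the matrix means ``no edge''; the class and method names are our own):

import java.util.Arrays;

/** Dijkstra fills shortest[] (path weights) and previous[] (last edge on each shortest path) */
public class Dijkstra
{ public static int[] shortestPaths(int[][] weights, int start, int[] previous)
  { int n = weights.length;
    int[] shortest = new int[n];
    boolean[] done = new boolean[n];
    Arrays.fill(shortest, Integer.MAX_VALUE);   // the "?" entries
    Arrays.fill(previous, -1);                  // the "null" entries
    shortest[start] = 0;

    for (int step = 0; step < n; step++)
    { int u = -1;                               // choose the nearest unfinished node
      for (int v = 0; v < n; v++)
      { if ( !done[v] && shortest[v] != Integer.MAX_VALUE
             && (u == -1 || shortest[v] < shortest[u]) ) { u = v; } }
      if ( u == -1 ) { break; }                 // remaining nodes are unreachable
      done[u] = true;
      for (int v = 0; v < n; v++)               // try to extend the path through u
      { if ( weights[u][v] > 0 && !done[v]
             && shortest[u] + weights[u][v] < shortest[v] )
        { shortest[v] = shortest[u] + weights[u][v];
          previous[v] = u;                      // the shortest path to v now ends with edge u->v
        }
      }
    }
    return shortest;
  }
}

With the nodes ordered Chi, Dal, KC, NYC and start = Dal, it reports shortest[NYC] = 1700 and previous[NYC] = Chi, matching the tables above.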

8.7.4 Searching

Once an array is sorted, it becomes simpler to locate an element within it---rather than examining items one by one, from left to right, we can start searching in the middle, at approximately where the item might appear in the sorted collection. (This is what we do when we search for a word in a dictionary.) A standard searching algorithm, called binary search, exploits this idea.


Given a sorted array of integers, r, we wish to determine where a value, item, lives in r.

We start searching in the middle of  r; if item is not exactly the middle element, we

compare what we found to it: If item is less than the middle element, then we next search

the lower half of the array; if item is greater than the element, we search the upper half of 

the array. We repeat this strategy until item is found or the range of search narrows to

nothing, which means that item is not present.

The algorithm goes

Set searching = true.
Set the lower bound of the search to be 0 and the upper bound of the search to be the last index of array, r.
while ( searching && lower bound <= upper bound )
{ index = (lower bound + upper bound) / 2;
  if ( item == r[index] ) { found the item---set searching = false; }
  else if ( item < r[index] ) { reset upper bound = index-1; }
  else { reset lower bound = index+1; }
}

Figure 17 shows the method, which is a standard example of the searching pattern of iteration.

FIGURE 17: binarySearch=================================================

/** binarySearch searches for an item in a sorted array
 *  @param r - the array to be searched
 *  @param item - the desired item in array r
 *  @return the index where item resides in r; if item is not
 *  found, then return -1 */
public int binarySearch(int[] r, int item)
{ int lower = 0;
  int upper = r.length - 1;
  int index = -1;
  boolean searching = true;
  while ( searching && lower <= upper )
  // (1) searching == true implies item is in range r[lower]..r[upper],
  //     if it exists in r at all.
  // (2) searching == false implies that r[index] == item.
  { index = (lower + upper) / 2;
    if ( r[index] == item )
    { searching = false; }
    else if ( r[index] < item )
    { lower = index + 1; }
    else { upper = index - 1; }
  }
  if ( searching )
  { index = -1; }  // implies lower > upper, hence item not in r
  return index;
}

ENDFIGURE================================================================


If we searched for the item 10 in the sorted array r seen in the examples in the previous

section, the first iteration of the loop in binarySearch gives us this configuration:

The search starts exactly in the middle, and the loop examines r[2] to see if it is 10. It is

not, and since 10 is larger than 8, the value found at r[2], the search is revised as

follows:

Searching the upper half of the array, which is just two elements, moves the search to r[3], which locates the desired item.

 Notice that a linear search, that is,

int index = 0;
boolean searching = true;
while ( searching && index != r.length )
{ if ( r[index] == item )
  { searching = false; }
  else { index = index + 1; }
}

would examine four elements of the array to locate element 10. The binary search

examined just two. Binary search's speedup for larger arrays is enormous and is discussed

in the next section.

Binary search is a well-known programming challenge because it is easy to formulate incorrect versions. (Although the loop in Figure 17 is small, its invariant suggests that a lot of thought is embedded within it.) Also, small adjustments lead to fascinating variations. Here is a clever reformulation, due to N. Wirth:

int binarySearch(int[] r, int item)
{ int lower = 0;
  int upper = r.length-1;
  int index = -1;
  while ( lower <= upper )
  // (1) lower != upper+2 implies that item is in range
  //     r[lower]..r[upper], if it exists in r at all
  // (2) lower == upper+2 implies that r[index] == item
  { index = (lower + upper) / 2;
    if ( item <= r[index] )
    { upper = index - 1; }
    if ( item >= r[index] )
    { lower = index + 1; }
  }
  if ( lower != upper+2 )
  { index = -1; }
  return index;


}

This algorithm merges variable searching in Figure 17 with the lower and upper bounds

of the search so that the loop's test becomes simpler. This alters the loop invariant so that

the discovery of  item is indicated by lower == upper+2. Both searching algorithms

must terminate, because the expression, upper-lower decreases in value at each

iteration, ensuring that the loop test will eventually go false.

Exercises

1. Use the binary search method in Figure 17 on the sorted array, {1, 2, 2, 4,

6}: Ask the method to search for 6; for 2; for 3. Write execution traces for these

searches.

2. Here is a binary search method due to R. Howell:

   public int search(int[] r, int item)
   { int answer = -1;
     if ( r.length > 0 )
     { int lower = 0;
       int upper = r.length;
       while ( upper - lower > 1 )
       // item is in r[lower]..r[upper-1], if it is in r
       { int index = (lower + upper) / 2;
         if ( r[index] > item )
         { upper = index; }
         else { lower = index; }
       }
       if ( r[lower] == item ) { answer = lower; }
     }
     return answer;
   }

Explain why the invariant and the termination of the loop ensure that the method returns a correct answer. Explain why the loop must terminate. (This is not trivial because the loop makes one extra iteration before it quits.)

8.7.5 Time-Complexity Measures

The previous section stated that binary search computes its answer far faster than does linear search. We can state how much faster by doing a form of counting analysis on the

respective algorithms. The analysis will introduce us to a standard method for computing

the time complexity of an algorithm. We then apply the method to analyze the time complexity of selection sort and insertion sort.

To analyze a searching algorithm, one counts the number of elements the algorithm must

examine to find an item (or to report failure). Consider linear search: If array r has, say, N


elements, we know in the very worst case that a linear search must examine all N

elements to find the desired item or report failure. Of course, over many randomly

generated test cases, the number of elements examined will average to about N/2, but in

any case, the number of examinations is directly proportional to the array's length, and

we say that the algorithm has performance of  order N  (also known as linear ) time

complexity.

For example, a linear search of an array of 256 elements will require at most 256

examinations and 128 examinations on the average.

Because it halves its range of search at each element examination, binary search does

significantly better than linear time complexity: For example, a worst case binary search of a 256-element array makes one examination in the middle of the 256 elements, then

one examination in the middle of the remaining 128 elements, then one examination in

the middle of the remaining 64 elements, and so on---a maximum of only 9 examinations

are required!

We can state this behavior more precisely with a recursive definition. Let E(N) stand for 

the number of examinations binary search makes (in worst case) to find an item in an

array of N elements.

Here is the exact number of examinations binary search does:

E(N) = 1 + E(N/2), for N > 1
E(1) = 1

The first equation states that a search of an array with multiple elements requires an examination of the array's middle element, and assuming the desired item is not found in

the middle, a subsequent search of an array of half the length. An array of length 1

requires just one examination to terminate the search.

To simplify our analysis of the above equations, say the array's length is a power of 2, that is, N = 2^M, for some positive M. (For example, for N = 256, M is 8. Of course, not all

arrays have a length that is exactly a power of 2, but we can always pretend that an array

is ``padded'' with extra elements to make its length a power of 2.)

Here are the equations again:

E(2^M) = 1 + E(2^(M-1)), for M > 0
E(2^0) = 1

After several calculations with this definition (unfolding gives E(2^M) = 1 + E(2^(M-1)) = 2 + E(2^(M-2)) = ... = M + E(2^0) = M + 1; a proof by induction is in the Exercises), we can convince ourselves that

E(2^M) = M + 1

a remarkably small answer!

We say that the binary search algorithm has order log N  (or  logarithmic) time

complexity. (Recall that log N , or more precisely, log 2 N , is N's base-2 logarithm, that is,

the exponent, M, such that 2^M equals N. For example, log 256 is 8, and log 100 falls


between 6 and 7.) Because we started our analysis with the assumption that N = 2^M, we

conclude that

E(N) = (log N) + 1

which shows that binary search has logarithmic time complexity.

It takes only a little experimentation to see, for large values of  N, that log N is

significantly less than N itself. This is reflected in the speed of execution of binary search,

which behaves significantly better than linear search for large-sized arrays.

UNIT VI

SORTING

We need to do sorting for the following reasons :

a) By keeping a data file sorted, we can do binary search on it.

b) Doing certain operations, like matching data in two different files, becomes much

faster.

There are various methods for sorting, having different average and worst case

 behaviours:


 

                  Average          Worst
Bubble sort       O(n^2)           O(n^2)
Insertion sort    O(n^2)           O(n^2)
Shell sort        O(n (log n)^2)
Quick sort        O(n log n)       O(n^2)
Heap sort         O(n log n)
Merge sort        O(n log n)       O(n log n)

The average and worst case behaviours are given for a file having n elements (records).

1. Insertion Sort

Basic Idea:

Insert a record R into a sequence of ordered records: R1, R2,..., Ri with keys K1 <= K2 <= ... <= Ki, such that the resulting sequence of size i+1 is also ordered with respect to key values.

Algorithm Insertion_Sort (* Assume R0 has K0 = -maxint *)

void InsertionSort(Item list[], int n)
{ /* Insertion_Sort: sorts list[1..n]; list[0] is a sentinel */
  Item r;
  int i, j;
  list[0].key = -maxint;
  for (j = 2; j <= n; j++)
  {
    r = list[j];
    i = j - 1;
    while ( r.key < list[i].key )
    {
      /* move greater entries to the right */
      list[i+1] = list[i];
      i = i - 1;
    }
    list[i+1] = r;    /* insert into its place */
  }
}

 

We start with the R0, R1 sequence; here R0 is artificial. Then we insert records R2, R3,..., Rn into the sequence. Thus, the file with n records will be ordered after making n-1 insertions.

Example: Let m be maxint

 

a)

j R0 R1 R2 R3 R4 R5

--------- -- -- -- -- -- ---

-m 5 4 3 2 1

V

2 -m 4 5 3 2 1

v

3 -m 3 4 5 2 1

v

4 -m 2 3 4 5 1

v

5 -m 1 2 3 4 5

 

b)

j R0 R1 R2 R3 R4 R5


--------- -- -- -- -- -- ---

-m 12 7 5 10 2

v

2 -m 7 12 5 10 2

v3 -m 5 7 12 10 2

v

4 -m 5 7 10 12 2

v

5 -m 2 5 7 10 12

 

2. Quick Sort: (C.A.R. Hoare, 1962)

 

Given a list of keys, get the first key and find its exact place in the list. Carry the elements less than the first element to a sublist to the left, and carry the elements greater than the first element to a sublist to the right.

Example:

503 087 512 061 908 170 897 275 653 426 154 509 612 677 765 703

The two pointers i and j scan alternately from the right (looking for a key smaller than 503) and from the left (looking for a key larger than 503), triggering these exchanges:

  154 < 503   so  154 <-> 503
  503 < 512   so  503 <-> 512
  426 < 503   so  426 <-> 503
  503 < 908   so  503 <-> 908
  275 < 503   so  275 <-> 503
  503 < 897   so  503 <-> 897
  i = j       so  stop

 

Now we have


              L1                                  L2
[154 087 426 061 275 170]  503  [897 653 908 512 509 612 677 765 703]

Apply quick sort to lists L1 and L2, recursively,

[061 087] 154 [170 275 426] 503 [....

And we get the sorted list by continuing in this manner.
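A minimal Java sketch of this scheme, using the first key as the pivot as in the trace above (the class and method names are ours):

/** QuickSort places the pivot at its exact position, then sorts the two sublists */
public class QuickSort
{ public static void quickSort(int[] a, int lo, int hi)
  { if ( lo >= hi ) { return; }
    int pivot = a[lo];
    int i = lo, j = hi;
    while ( i < j )
    { while ( i < j && a[j] >= pivot ) { j--; }   // scan from the right for a smaller key
      a[i] = a[j];
      while ( i < j && a[i] <= pivot ) { i++; }   // scan from the left for a larger key
      a[j] = a[i];
    }
    a[i] = pivot;                                  // the pivot's exact place in the list
    quickSort(a, lo, i - 1);                       // sort the left sublist (smaller keys)
    quickSort(a, i + 1, hi);                       // sort the right sublist (larger keys)
  }
}

Calling quickSort(a, 0, a.length-1) on the list above places 503 at its exact position and then sorts L1 and L2 recursively.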

 

4. Radix Sort

Let's have the following 4-bit binary numbers. Assume there is no sign

bit.

1010, 0101, 1011, 0011, 0110, 0111

(10)  (5)   (11)  (3)   (6)   (7)

1. First begin with the LSB (least significant bit). Make two groups, one with all numbers that end in a "0" and the other with all numbers that end in a "1".

   0        1
  ---      ---
  1010     0101
  0110     0011
           1011
           0111

2. Now, go to the next least significant bit and, by examining the previous groups in order, form two new groups:

   0        1
  ---      ---
  0101     1010
           0110
           0011
           1011
           0111

 

3) Repeat the operation for the third bit from the right:

0 1

--- ---

1010 0101

0011 0110

1011 0111

4) Repeat it for the most significant bit:

 

   0        1         Result = 0011
  ---      ---                 0101
  0011     1010                0110
  0101     1011                0111
  0110                         1010
  0111                         1011
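One way to code the binary (LSD) radix sort walked through above is sketched below in C; the names radix_sort_bits and BITS, the fixed bucket size, and the use of unsigned 4-bit keys are assumptions made for this example. For each bit position, from least to most significant, the keys are distributed into a "0" group and a "1" group while keeping their current order, and then read back with the "0" group first.

#include <stdio.h>

#define BITS 4                           /* 4-bit keys, as in the example above */

void radix_sort_bits(unsigned a[], int n)
{
    unsigned zero[64], one[64];          /* buckets; n <= 64 assumed for this sketch */
    int b, i, nz, no;

    for (b = 0; b < BITS; b++) {         /* b = 0 is the least significant bit */
        nz = no = 0;
        for (i = 0; i < n; i++) {        /* distribute, keeping the current order */
            if ((a[i] >> b) & 1u)
                one[no++] = a[i];
            else
                zero[nz++] = a[i];
        }
        for (i = 0; i < nz; i++) a[i] = zero[i];       /* "0" group first ...   */
        for (i = 0; i < no; i++) a[nz + i] = one[i];   /* ... then the "1" group */
    }
}

int main(void)
{
    unsigned a[] = {0xA, 0x5, 0xB, 0x3, 0x6, 0x7};     /* 1010 0101 1011 0011 0110 0111 */
    int i;

    radix_sort_bits(a, 6);
    for (i = 0; i < 6; i++)
        printf("%u ", a[i]);             /* prints: 3 5 6 7 10 11 */
    printf("\n");
    return 0;
}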

 

5. Merge Sort

 

In merge sort, two already sorted files are 'merged' to obtain a third file which is the sorted combination of the two sorted input files.

- We begin by assuming we have n sorted files with size=1.

- Then, we merge these files of size=1 pairwise to obtain n/2 sorted files of size=2.


 

- Then, we merge these n/2 files of size=2 pairwise to obtain n/4 sorted files of size=4, etc.

- Until we are left with one file with size=n.

 

Example:

   13  8    2  61    53  10    46  22
    \  /     \  /     \  /      \  /
   [8 13]   [2 61]   [10 53]   [22 46]
       \      /          \      /
     [2 8 13 61]      [10 22 46 53]
              \           /
         [2 8 10 13 22 46 53 61]

 

To merge two sorted files (x[1]..x[m]) and (y[m+1]..y[n]) into a third file (z[1]..z[n]) with key1 <= key2 <= ... <= keyn, which is the sorted combination of them, the following procedure can be used:

void MERGE(int m, int n, Item x[], Item y[], Item z[])
{   /* merge x[1..m] and y[m+1..n] into z[1..n] */
    int i, j, k, p;

    i = 1;        /* i is a pointer to x */
    j = m + 1;    /* j is a pointer to y */
    k = 1;        /* k points to the next available location in z */

    while (i <= m && j <= n)
    {
        if (x[i].key <= y[j].key)
        {   /* take element from x */
            z[k] = x[i];
            i++;
        }
        else
        {   /* take element from y */
            z[k] = y[j];
            j++;
        }
        k = k + 1;    /* added one more element into z */
    }

    if (i > m)
    {   /* remaining part of y into z */
        for (p = j; p <= n; p++)
            z[k + p - j] = y[p];
    }
    else
    {   /* remaining part of x into z */
        for (p = i; p <= m; p++)
            z[k + p - i] = x[p];
    }
}
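The pairwise merging scheme described above (runs of size 1, then 2, then 4, and so on) can be driven by a short bottom-up routine. The following C sketch shows one way to do it on plain int keys; the names merge_sort and merge_runs, the 0-based indexing and the temporary buffer are choices made for this example, not part of the notes. The same scheme could just as well be driven through the MERGE procedure above by treating adjacent subranges of the array as the files x and y.

#include <stdlib.h>
#include <string.h>

/* Merge the sorted runs a[lo..mid-1] and a[mid..hi-1] using buffer tmp. */
static void merge_runs(int a[], int tmp[], int lo, int mid, int hi)
{
    int i = lo, j = mid, k = lo;

    while (i < mid && j < hi)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid) tmp[k++] = a[i++];            /* remaining part of the left run  */
    while (j < hi)  tmp[k++] = a[j++];            /* remaining part of the right run */
    memcpy(a + lo, tmp + lo, (size_t)(hi - lo) * sizeof(int));
}

void merge_sort(int a[], int n)
{
    int *tmp = malloc((size_t)n * sizeof(int));
    int width, lo;

    if (tmp == NULL) return;                      /* allocation failed: leave input unchanged */
    for (width = 1; width < n; width *= 2)        /* run sizes 1, 2, 4, ... */
        for (lo = 0; lo + width < n; lo += 2 * width) {
            int mid = lo + width;
            int hi  = (lo + 2 * width < n) ? lo + 2 * width : n;
            merge_runs(a, tmp, lo, mid, hi);
        }
    free(tmp);
}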

6. Bubble Sort

void swap(int A[], int i, int j)     /* exchange A[i] and A[j] */
{
    int t = A[i]; A[i] = A[j]; A[j] = t;
}

void Bubble_Sort(int A[], int n)     /* sorts A[1..n] */
{
    int i, j;

    for (j = 1; j <= n - 1; j++)        /* after pass j, A[1..j] holds its final values   */
        for (i = n - 1; i >= j; i--)    /* bubble the smaller of each pair toward the left */
            if (A[i+1] < A[i])
                swap(A, i, i+1);
}

HEAP SORT

With its time complexity of O(n log(n)), heapsort is optimal. It utilizes a special data structure called a heap. This data structure is explained in the following.

 

Basics

Definition: Let T = (V, E) be an almost complete binary tree with a vertex labelling a : V → M that assigns to each vertex u a label a(u) from an ordered set (M, ≤).

A vertex u ∈ V has the heap property if it has no direct descendant with a greater label, i.e.

    ∀ v ∈ V : (u, v) ∈ E  ⇒  a(u) ≥ a(v)

T is a heap if all vertices have the heap property, i.e.

    ∀ (u, v) ∈ E : a(u) ≥ a(v)

We call T a semi-heap if all vertices except possibly the root r have the heap property, i.e.

    ∀ (u, v) ∈ E, u ≠ r : a(u) ≥ a(v)

Example: 


 

Figure 1: Heap with n = 10 vertices

Observe that each leaf automatically has the heap property regardless of its label, since it has no descendants.

Heapsort

The data structure of the heapsort algorithm is a heap. The data sequence to be sorted is stored as the labels of the binary tree. As shown later, in the implementation no pointer structures are necessary to represent the tree, since an almost complete binary tree can be efficiently stored in an array.

Heapsort algorithm

The following description of heapsort refers to Figure 2 (a) - (e).


Figure 2: Retrieving the maximum element and restoring the heap


 

If the sequence to be sorted is arranged as a heap, the greatest element of the sequence can be retrieved immediately from the root (a). In order to get the next-greatest element, the rest of the elements have to be rearranged as a heap.

The rearrangement is done in the following way:

Let b be a leaf of maximum depth. Write the label of b to the root and delete leaf b (b). Now the tree is a semi-heap, since the root possibly has lost its heap property.

Making a heap from a semi-heap is simple:

Do nothing if the root already has the heap property, otherwise exchange its label with the

maximum label of its direct descendants (c). Let this descendant be v. Now possibly v has lost its

heap property. Proceed further with v, i.e. make a heap from the semi-heap rooted at v (d). This

process stops when a vertex is reached that has the heap property (e). Eventually this is the case at a leaf.

Making a heap from a semi-heap can conceptually be implemented by the following procedure downheap:

procedure downheap(v)
Input:  semi-heap with root v
Output: heap (by rearranging the vertex labels)
Method:
1. while v does not have the heap property do
   1. choose the direct descendant w with maximum label a(w)
   2. exchange a(v) and a(w)
   3. set v := w

Procedure downheap can be used to build a heap from an arbitrarily labelled tree. By proceeding bottom-up, downheap is called for all subtrees rooted at inner vertices. Since leaves are already heaps, they may be omitted.

procedure buildheap
Input:  almost complete binary tree T of depth d(T) with vertex labelling a
Output: heap (by rearranging the vertex labels)
Method:
1. for i := d(T) - 1 downto 0 do
   1. for all inner vertices v of depth d(v) = i do
      1. downheap(v)

A call of buildheap is the first step of procedure heapsort, which can now be written down as follows:

procedure heapsort
Input:  almost complete binary tree with root r and vertex labelling a
Output: vertex labels in descending order
Method:
1. buildheap
2. while r is not a leaf do
   1. output a(r)
   2. choose leaf b of maximum depth
   3. write label a(b) to r
   4. delete leaf b
   5. downheap(r)
3. output a(r)


Analysis

An almost complete binary tree with n vertices has a depth of at most log(n). Therefore,

procedure downheap requires at most log(n) steps. Procedure buildheap calls downheap for 


 

each vertex, so it requires at most n·log(n) steps. Heapsort calls buildheap once; then it calls downheap for each vertex, so altogether it requires at most 2·n·log(n) steps.

Thus, the time complexity of heapsort is T(n) ∈ O(n·log(n)). The algorithm is optimal, since the lower bound of the sorting problem is attained.

Implementation

An almost complete binary tree with n vertices and vertex labelling a can be stored most

efficiently in an array a:

• the root is stored at position 0

• the two direct descendants of a vertex at position v are stored at positions 2v+1 and 2v+2

All vertices at positions 0, ..., n/2-1 are inner nodes, all vertices n/2, ..., n-1 are leaves (integer division / ). The heap with n = 10 vertices of Figure 1 is shown in Figure 3 as an example.

Figure 3: Array representation of the heap of Figure 1
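As a small illustration of this array layout, the following C sketch computes the positions of the two direct descendants and checks whether an array of labels a[0..n-1] satisfies the heap property at every inner vertex. The helper names left_child, right_child and is_heap are mine, not part of the notes.

static int left_child(int v)  { return 2 * v + 1; }   /* first direct descendant  */
static int right_child(int v) { return 2 * v + 2; }   /* second direct descendant */

/* Return 1 if the labels a[0..n-1] form a heap, 0 otherwise. */
int is_heap(const int a[], int n)
{
    int v;
    for (v = 0; v < n / 2; v++) {                     /* only inner vertices can violate it       */
        if (a[v] < a[left_child(v)])                  /* a left child exists for every inner vertex */
            return 0;
        if (right_child(v) < n && a[v] < a[right_child(v)])
            return 0;
    }
    return 1;
}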

Using this array representation of the heap, heapsort can be implemented as an in-place sorting

algorithm. In each step of heapsort, the root label a(r ) is not output but stored at the position of 

the leaf b that is deleted in the following. Deleting leaf b means to consider just the array

elements left of b as the remaining heap.

In other words: the four steps

• output of the label of the root a(r )

• choose leaf b of maximum depth

• write label of b to the root

• delete leaf b 

are replaced by an exchange of the root label and the label of b:

• exchange the label of the root with the label of the last leaf and do not consider that leaf anymore


 

Program

The following Java class HeapSorter encapsulates the functions downheap, buildheap and heapsort. In order to sort an array b, heapsort is called with the statement HeapSorter.sort(b).

public class HeapSorter
{
    private static int[] a;
    private static int n;

    public static void sort(int[] a0)
    {
        a = a0;
        n = a.length;
        heapsort();
    }

    private static void heapsort()
    {
        buildheap();
        while (n > 1)
        {
            n--;
            exchange(0, n);
            downheap(0);
        }
    }

    private static void buildheap()
    {
        for (int v = n/2 - 1; v >= 0; v--)
            downheap(v);
    }

    private static void downheap(int v)
    {
        int w = 2*v + 1;                  // first descendant of v
        while (w < n)
        {
            if (w + 1 < n)                // is there a second descendant?
                if (a[w+1] > a[w]) w++;
            // w is the descendant of v with maximum label

            if (a[v] >= a[w]) return;     // v has heap property
            // otherwise
            exchange(v, w);               // exchange labels of v and w
            v = w;                        // continue
            w = 2*v + 1;
        }
    }

    private static void exchange(int i, int j)
    {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}  // end class HeapSorter