
Sorting Operations - Comparisons and Data Movement

Cmput 115 - Lecture 8

Department of Computing Science

University of Alberta

©Duane Szafron 2000
Some code in this lecture is based on code from the book:

Java Structures by Duane A. Bailey or the companion structure package

Revised 27/07/03

About This Lecture

- In this lecture we will learn about the two basic operations performed by sort algorithms: comparison and element movement.

- We will also learn how to express them as a number of access operations.

Outline

- Comparing Elements
- Moving Elements

The Sort Problem

- Given a collection, with elements that can be compared, put the elements in increasing or decreasing order.

  index:  0  1  2  3  4  5  6  7  8
  before: 60 30 10 20 40 90 70 80 50
  after:  10 20 30 40 50 60 70 80 90

Operations

- Given a collection, with elements that can be compared, put the elements in increasing or decreasing order.
- We must perform two operations to sort a collection:
  - compare elements
  - move elements
- The time to perform each of these two operations, and the number of times we perform each operation, is critical to the time it takes to sort a collection.

Comparing Primitive Values

- The sorting algorithms we will consider are based on comparing individual elements.
- If the elements are primitive values, we can use the < operator to compare them.
- If the elements are objects, we cannot use <.
- However, Java has an interface called Comparable (in the java.lang package) that defines the method compareTo(Object object), which can be used to compare objects from classes implementing the Comparable interface.


Comparing Objects - compareTo()

- compareTo(Object object)
  - If the receiver is "less" than the argument, it returns a negative int.
  - If the receiver is "equal" to the argument, it returns zero.
  - If the receiver is "greater" than the argument, it returns a positive int.
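As a concrete illustration of these rules, here is a minimal class that implements Comparable in the pre-generics style used in this lecture. The class name Account and its balance field are invented for the example, not from the slides.

```java
// A hypothetical class whose instances can be sorted by our algorithms.
public class Account implements Comparable {
    private int balance;

    public Account(int balance) {
        this.balance = balance;
    }

    // Negative if receiver < argument, zero if equal,
    // positive if receiver > argument.
    public int compareTo(Object object) {
        Account other = (Account) object;
        return this.balance - other.balance;
    }
}
```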

Designing Classes for Sorting

- If we write our sorting algorithms using the compareTo() method, we can apply them to collections that hold objects from any classes that implement the Comparable interface.
- For example, the String and Integer classes implement Comparable.
- Any class designer who wants the objects in a class to be sortable using our algorithms only needs to make the class implement the Comparable interface.

The Time for Comparing Values

- The time to actually compare two primitive values is small (one virtual machine instruction).
- The comparison time for values is dominated by the time it takes to access the two elements being compared.
- For example, the two array accesses take much more time than the actual comparison in the code: if (data[i] < data[j])
- Therefore, a comparison of primitive values "costs" two data accesses.

The Time for Comparing Objects 1

- It takes longer to compare two Java objects than two Java primitive values.
- To compare objects, the compareTo() method must access not only the object, but its internal state as well.
- For example, the next slide shows the Java source code from the library (java.lang.String) to compare two Strings.
  - The details are not important; just notice that it requires many accesses to the array that each String uses to store its chars.
- The important point is that comparing objects can "cost" many data accesses.

The Time for Comparing Objects 2

public int compareTo(String anotherString) {
    int len1 = this.count;
    int len2 = anotherString.count;
    int n = Math.min(len1, len2);
    char v1[] = this.value;
    char v2[] = anotherString.value;
    int i = this.offset;
    int j = anotherString.offset;
    while (n-- != 0) {
        char c1 = v1[i++];
        char c2 = v2[j++];
        if (c1 != c2) {
            return c1 - c2;
        }
    }
    return len1 - len2;
}

Moving Elements

- Besides comparing elements, the only other operation that is essential to sorting is moving elements.
- The exact code for moving elements depends on the type of collection and the pattern of element movement, but it consists of a series of data accesses.
- One common form of element movement is an exchange, which is done using a single temporary variable and three assignments.
- This process usually involves four container accesses and two local variable accesses.
- Since the local variable accesses often get mapped to registers or cache memory, we won't count them.

Exchange Algorithm (1)

- The first two assignments of an exchange of c[i] (data a) and c[j] (data b), using a temporary variable:

  temp = c[i];
  c[i] = c[j];

Exchange Algorithm (2)

- The final assignment completes the exchange:

  c[j] = temp;
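The three assignments above can be collected into a swap helper like the sketch below. The driver code and the int-array version are assumptions added for illustration; the comments count the four container accesses the lecture attributes to an exchange.

```java
public class SwapDemo {
    // Exchange c[i] and c[j] using one temporary variable and three
    // assignments: 4 container accesses plus 2 local variable accesses
    // (the local accesses are not counted, as in the lecture).
    static void swap(int[] c, int i, int j) {
        int temp = c[i];   // access 1: read c[i]
        c[i] = c[j];       // accesses 2 and 3: read c[j], write c[i]
        c[j] = temp;       // access 4: write c[j]
    }

    public static void main(String[] args) {
        int[] c = {60, 30};
        swap(c, 0, 1);
        System.out.println(c[0] + " " + c[1]);  // prints "30 60"
    }
}
```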

Comparison and Movement Times

- To predict the total time for an algorithm, we can add the accesses used for comparison and the accesses used for movement in the algorithm.
- If the container holds primitive values, each comparison requires two accesses; but if the container holds objects, the number of accesses may be harder to compute.
- If the algorithm uses exchanges, each exchange requires four accesses; but if the algorithm uses a different style of data movement, the number of accesses may be harder to compute.

Sorting

- We will look at these sorting algorithms:
  - Selection sort
  - Insertion sort
  - Merge sort
  - Quick sort

Sorting - Selection Sort

Cmput 115 - Lecture 9

Department of Computing Science

University of Alberta

©Duane Szafron 2000
Some code in this lecture is based on code from the book:
Java Structures by Duane A. Bailey or the companion structure package

Revised 27/07/03

About This Lecture

- In this lecture we will review the sorting algorithm called Selection Sort, which was presented in CMPUT 114.
- We will analyze the time and space complexity of a standard implementation for sorting arrays.

Outline

- Selection Sort Algorithm
- Selection Sort - Implementation for Arrays
- Time and Space Complexity of Selection Sort

Selection Sort Algorithm

Input:
  anArray - array of Comparable objects
  n - sort the elements in positions 0…(n-1)
Output: a sorted array [0..(n-1)]

Idea: First pick the largest item and store it at the end, then pick the 2nd largest item and put it 2nd from the end, …

Algorithm:
For last = (n-1), (n-2), …, 1
  - Find the largest element in positions 0…last
  - Exchange it with the element at index last

Selection Sort Code - Arrays

public static void selectionSort(Comparable anArray[], int n) {
    // pre: 0 <= n <= anArray.length
    // post: values in anArray positions 0..(n-1)
    //       are in ascending order

    int maxIndex;  // index of largest object
    int last;      // index of last unsorted element

    for (last = n-1; last > 0; last--) {
        maxIndex = getMaxIndex(anArray, last);
        swap(anArray, last, maxIndex);
    }   // we could check to see if maxIndex != last
}       // and only swap in this case.

code based on Bailey pg. 82

Method - getMaxIndex() - Arrays

public static int getMaxIndex(Comparable anArray[], int last) {
    // pre: 0 <= last < anArray.length
    // post: return the index of the max value in
    //       positions 0..last of the given array

    int maxIndex;  // index of largest object
    int index;     // index of inspected element

    maxIndex = last;
    for (index = last-1; index >= 0; index--) {
        if (anArray[index].compareTo(anArray[maxIndex]) > 0)
            maxIndex = index;
    }
    return maxIndex;
}

code based on Bailey pg. 82

Counting Comparison Accesses

- How many comparison accesses are required for a selection sort of n elements?
- All the comparisons are done in getMaxIndex.
- getMaxIndex's loop does one comparison per iteration, and iterates "last" times, so each call to getMaxIndex does "last" comparisons.
- The loop body in the sort method is executed (n-1) times with "last" taking on the values n-1, n-2, …, 1. getMaxIndex is called once with each value of "last".
- The total number of comparisons is:
  (n-1) + (n-2) + … + 1 = (1 + 2 + … + n) - n = [n(n+1)/2] - n
- Since each comparison requires two accesses, there are:
  n(n+1) - 2n = n^2 - n = O(n^2) comparison accesses.
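The comparison count can be checked empirically. The harness below is an assumption added for illustration (primitive ints and a global counter rather than the lecture's Comparable version): for n = 9 it confirms the total of [n(n+1)/2] - n = 36 comparisons.

```java
public class SelectionCount {
    static int comparisons = 0;  // global comparison counter (added)

    // As in the lecture's getMaxIndex, but for ints, counting comparisons.
    static int getMaxIndex(int[] a, int last) {
        int maxIndex = last;
        for (int index = last - 1; index >= 0; index--) {
            comparisons++;  // one comparison per loop iteration
            if (a[index] > a[maxIndex]) {
                maxIndex = index;
            }
        }
        return maxIndex;
    }

    static void selectionSort(int[] a, int n) {
        for (int last = n - 1; last > 0; last--) {
            int maxIndex = getMaxIndex(a, last);
            int temp = a[last]; a[last] = a[maxIndex]; a[maxIndex] = temp;
        }
    }

    public static void main(String[] args) {
        int[] a = {60, 30, 10, 20, 40, 90, 70, 80, 50};
        selectionSort(a, a.length);
        // (n-1) + (n-2) + ... + 1 = 36 for n = 9
        System.out.println(comparisons);  // prints 36
    }
}
```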

Selection Sort Code (recursion)

public static void selectionSort(Comparable anArray[], int n) {
    // pre: 0 <= n <= anArray.length
    // post: objects in anArray positions 0..(n-1)
    //       are in ascending order

    int maxIndex;  // index of largest object
    if (n > 1) {
        maxIndex = getMaxIndex(anArray, n-1);
        swap(anArray, n-1, maxIndex);
        selectionSort(anArray, n-1);
    }
}

Counting Move Accesses

- How many move accesses are required for a selection sort of n elements?
- The only time we do a move is in a reference exchange (swap), which requires 4 accesses.
- The sort method executes swap() once on each iteration of its loop. The loop iterates n-1 times.
- The total number of move accesses is: 4*(n-1) = O(n).
- Since the number of comparison accesses is O(n^2), the move accesses are insignificant.
- In total the code does O(n^2) accesses.

Time Complexity of Selection Sort

- The number of comparisons and moves is independent of the data (the initial order of elements doesn't matter).
- Therefore, the best, average and worst case time complexities of Selection Sort are all O(n^2).

Space Complexity of Selection Sort

- Besides the collection itself, the only extra storage for this sort is the single temp reference used in the swap method.
- Therefore, the space complexity of Selection Sort is O(n): the collection itself dominates, and the extra storage is O(1).

Sorting - Insertion Sort

Cmput 115 - Lecture 10

Department of Computing Science

University of Alberta

©Duane Szafron 2000
Some code in this lecture is based on code from the book:
Java Structures by Duane A. Bailey or the companion structure package

Revised 27/07/03

About This Lecture

- In this lecture we will learn about a sorting algorithm called the Insertion Sort.
- We will study its implementation and its time and space complexity.

Outline

- The Insertion Sort Algorithm
- Insertion Sort - Arrays
- Time and Space Complexity of Insertion Sort

Insertion Sort Algorithm

Input:
  anArray - array of Comparable objects
  n - sort the elements in positions 0…(n-1)
Output: a sorted array [0..(n-1)]

Idea: (1) Sort the first 1 object, then the first 2 objects, then the first 3 objects, …, then all n objects.
(2) At each step, insert the (k+1)st object into an appropriate position among the first k+1 positions.

Algorithm:
FOR (k = 1; k <= n-1; k++) DO
  - find the appropriate position for anArray[k] between 0..k
  - insert anArray[k] into anArray[0..k]

Insertion Sort Algorithm 1

- The lower part of the collection is sorted and the higher part is unsorted.

  index: 0  1  2  3  4  5  6  7  8
  60 30 10 20 40 90 70 80 50
  30 60 10 20 40 90 70 80 50

- Insert the first element of the unsorted part into the correct place in the sorted part.

Insertion Sort Algorithm 2

- Successive states of the array (indices 0-8), each after inserting the next unsorted element:

  30 60 10 20 40 90 70 80 50
  10 30 60 20 40 90 70 80 50
  10 20 30 60 40 90 70 80 50
  10 20 30 40 60 90 70 80 50
  10 20 30 40 60 90 70 80 50
  10 20 30 40 60 70 90 80 50
  10 20 30 40 60 70 80 90 50
  10 20 30 40 50 60 70 80 90

Insertion Sort Code - Arrays

public static void insertionSort(Comparable anArray[], int size) {
    // pre: 0 <= size <= anArray.length
    // post: values in anArray are in ascending order

    int index;  // index of start of unsorted part

    for (index = 1; index < size; index++) {
        moveElementAt(anArray, index);
    }
}

code based on Bailey pg. 83

Moving Elements in Insertion Sort

- The Insertion Sort does not use an exchange operation.
- When an element is inserted into the ordered part of the collection, it is not just exchanged with another element.
- Several elements must be "moved":

  index: 0  1  2  3
  10 30 60 20  becomes  10 20 30 60

Multiple Element Exchanges

- The naïve approach is to just keep exchanging the new element with its left neighbor until it is in the right location:

  index: 0  1  2  3
  10 30 60 20  ->  10 30 20 60  ->  10 20 30 60

- Every exchange costs four access operations.
- If we move the new element two places to the left, this costs 2*4 = 8 access operations.

Method - moveElementAt() - Arrays

public static void moveElementAt(Comparable anArray[], int last) {
    // pre: 0 <= last < anArray.length and anArray in
    //      ascending order from 0 to last-1
    // post: anArray in ascending order from 0 to last

    while ((last > 0) &&
           (anArray[last].compareTo(anArray[last-1]) < 0)) {
        swap(anArray, last, last-1);
        last--;
    }
}

code based on Bailey pg. 83

Avoiding Multiple Exchanges

- We can insert the new element in the correct place with fewer accessing operations - only 6 accesses!

  index: 0  1  2  3
  10 30 60 20

  1. move = anArray[3];
  2. anArray[3] = anArray[2];
  3. anArray[2] = anArray[1];
  4. anArray[1] = move;

- In general, if an element is moved p places it only takes (2*p + 2) access operations, not the (4*p) access operations required by p exchanges.
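The four numbered steps above can be generalized into a shift-based insert. The sketch below uses primitive ints and an added array-access counter (both assumptions, not lecture code); on the example array it performs exactly the 6 accesses claimed for a move of p = 2 places.

```java
public class ShiftInsert {
    static int accesses = 0;  // counts array reads/writes only (added)

    // Insert a[last] into its place among a[0..last-1] (already sorted)
    // by shifting elements right, using one temporary variable.
    static void moveElementAt(int[] a, int last) {
        int move = a[last];                    // 1 access: read
        accesses++;
        while (last > 0 && move < a[last - 1]) {
            a[last] = a[last - 1];             // 2 accesses: read + write
            accesses += 2;
            last--;
        }
        a[last] = move;                        // 1 access: write
        accesses++;
    }

    public static void main(String[] args) {
        int[] a = {10, 30, 60, 20};
        moveElementAt(a, 3);                   // moves 20 two places left
        System.out.println(accesses);          // prints 6, i.e. 2*2 + 2
    }
}
```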

Recall Element Insertion in a Vector

- This operation is similar to inserting a new element in a Vector.
- Each existing element was "moved" to the right before inserting the new element in its correct location.

Recall Vector Insertion Code

public void insertElementAt(Object object, int index) {
    // pre: 0 <= index <= size()
    // post: inserts the given object at the given index,
    //       moving elements from index to size()-1 to the right

    int i;

    this.ensureCapacity(this.elementCount + 1);
    for (i = this.elementCount; i > index; i--)
        this.elementData[i] = this.elementData[i-1];

    this.elementData[index] = object;
    this.elementCount++;
}

code based on Bailey pg. 39

Differences from Element Insertion

- In Vector element insertion:
  - We have a reference to the new element.
  - We know the index location for the new element.
- In the Insertion Sort:
  - We don't have a reference to the new element, only an index in the array where the new element is currently located.
  - We don't know the index location for the new element. We need to find the index by comparing the new element with the elements in the collection from right to left.

Method - moveElementAt() - Arrays

public static void moveElementAt(Comparable anArray[], int last) {
    // pre: 0 <= last < anArray.length and anArray in
    //      ascending order from 0 to last-1
    // post: anArray in ascending order from 0 to last

    Comparable move;  // a reference to the element being moved

    move = anArray[last];
    while ((last > 0) && (move.compareTo(anArray[last-1]) < 0)) {
        anArray[last] = anArray[last-1];
        last--;
    }
    anArray[last] = move;
}

code based on Bailey pg. 83

Counting Comparisons

- How many comparison operations are required for an insertion sort of an n-element collection?
- The sort method calls moveElementAt() in a loop for the indexes i = 1, 2, …, n-1:

  for (index = 1; index < size; index++) {
      moveElementAt(anArray, index);
  }

- Each time moveElementAt() is executed for some argument last, it does a comparison in a loop for some of the indexes last, last-1, …, 1:

  while ((last > 0) && (move.compareTo(anArray[last-1]) < 0)) {
      anArray[last] = anArray[last-1];
      last--;
  }

Comparisons - Best Case

- In the best case there is 1 comparison per call, since the first comparison terminates the loop:

  while ((last > 0) && (move.compareTo(anArray[last-1]) < 0)) {
      anArray[last] = anArray[last-1];
      last--;
  }

  move(a, 1)  move(a, 2)  ...  move(a, n-1)
      1           1                  1

- The total number of comparisons is: (n-1) * 1 = n - 1 = O(n)

Comparisons - Worst Case

- In the worst case there are "last" comparisons per call, since the loop is not terminated until last == 0:

  while ((last > 0) && (move.compareTo(anArray[last-1]) < 0)) {
      anArray[last] = anArray[last-1];
      last--;
  }

  move(a, 1)  move(a, 2)  ...  move(a, n-1)
      1           2                 n-1

- The total number of comparisons is: 1 + 2 + … + (n-1) = [(n-1)*n] / 2 = O(n^2)

Comparisons - Average Case 1

- In the average case it is equally probable that the number of comparisons is any number between 1 and "last" (inclusive) for each call:

  while ((last > 0) && (move.compareTo(anArray[last-1]) < 0)) {
      anArray[last] = anArray[last-1];
      last--;
  }

  move(a, 1)  move(a, 2)  ...  move(a, n-1)
   1 to 1      1 to 2          1 to n-1

- Note that the average for each call is:
  (1 + 2 + … + last)/last = [last*(last+1)]/[2*last] = (last+1)/2

Comparisons - Average Case 2

- In the average case, the total number of comparisons is:

  (1+1)/2 + (2+1)/2 + … + ((n-1)+1)/2
  = 1/2 + 1/2 + 2/2 + 1/2 + … + (n-1)/2 + 1/2
  = [1 + 2 + … + (n-1)]*(1/2) + (n-1)*(1/2)
  = [(n-1)*n/2]*(1/2) + (n-1)*(1/2)
  = [(n-1)*n]*(1/4) + 2*(n-1)*(1/4)
  = [(n-1)*n + 2*(n-1)]*(1/4)
  = [(n-1)*(n+2)]*(1/4) = O(n^2)

Counting Moves

- How many move operations are required for an insertion sort of an n-element collection?
- The sort method calls moveElementAt() in a loop for the indexes k = 1, 2, …, n-1.
- Every time the method is called, the element is moved one place for each successful comparison:

  while ((last > 0) && (move.compareTo(anArray[last-1]) < 0)) {
      anArray[last] = anArray[last-1];
      last--;
  }

- There is one move operation for each comparison, so the best, worst and average number of moves is the same as the best, worst and average number of comparisons.

Counting Accesses

- Each comparison requires 2 accesses.
- Each move requires 2 accesses.
- Each time moveElementAt() is called, there are two other accesses, one before the while loop and one after.
- Since moveElementAt() is called n-1 times, there are 2*(n-1) = O(n) of these extra accesses.
- Therefore, the "order" of the best, worst and average number of accesses is the same as the "order" of the best, worst and average number of comparisons.

Time Complexity of Insertion Sort

- Best case: O(n) accesses.
- Worst case: O(n^2) accesses.
- Average case: O(n^2) accesses.
- Note: this means that for nearly sorted collections, insertion sort is better than selection sort, even though in the average and worst cases they are the same: O(n^2).

Space Complexity of Insertion Sort

- Besides the collection itself, the only extra storage for this sort is the single temp reference used in the move element method.
- Therefore, the space complexity of Insertion Sort is O(n): the collection itself dominates, and the extra storage is O(1).

Sorting - Merge Sort

Cmput 115 - Lecture 11

Department of Computing Science

University of Alberta

©Duane Szafron 2000
Some code in this lecture is based on code from the book:
Java Structures by Duane A. Bailey or the companion structure package

Revised 27/07/03

About This Lecture

- In this lecture we will learn about a sorting algorithm called the Merge Sort.
- We will study its implementation and its time and space complexity.

Outline

- Merge: combining two sorted arrays
- Merge algorithm
- Time and Space complexity of Merge
- The Merge Sort Algorithm
- Merge Sort - Arrays
- Time and Space Complexity of Merge Sort

Merging Two Sorted Arrays

- Merge is an operation that combines two sorted arrays together into one:

  10 40 60  merged with  50 70 80 90
  gives
  10 40 50 60 70 80 90

Merge Sort Algorithm

Input:
  anArray - array of Comparable objects
  n - sort the elements in positions 0…(n-1)
Output: a sorted array [0..(n-1)]

Idea: (1) split the given array into two halves,
(2) sort each half separately,
(3) merge the two sorted arrays into one sorted array.

Algorithm:
(1) sort anArray[0..n/2-1],
(2) sort anArray[n/2..n-1],
(3) merge the two arrays into one sorted array.

Merge Algorithm - initial version

- For now, assume the result is to be placed in a separate array called result, which has already been allocated.
- The two given arrays are called front and back (the reason for these names will be clear later).
- front and back are in increasing order.
- For the complexity analysis, the size of the input, n, is the sum n_front + n_back.

Merge Algorithm

- For each array, keep track of the current position (initially 0).
- REPEAT until all the elements of one of the given arrays have been copied into result:
  - Compare the current elements of front and back.
  - Copy the smaller into the current position of result (break ties however you like).
  - Increment the current position of result and of the array that was copied from.
- Copy all the remaining elements of the other given array into result.

Merge Example (1)

- front = [10 40 60], back = [50 70 80 90]; current positions start at 0.
- Compare current elements; copy the smaller; update current positions: result = [10]

Merge Example (2)

- Compare current elements; copy the smaller; update current positions: result = [10 40]
- Compare current elements; copy the smaller; update current positions: result = [10 40 50]

Merge Example (3)

- Compare current elements; copy the smaller; update current positions: result = [10 40 50 60]
- front is exhausted; copy the rest of the elements from the other array: result = [10 40 50 60 70 80 90]

Merge Code - version 1 (1)

private static void merge(int[] front, int[] back,
                          int[] result, int first, int last) {
    // pre: all positions in front and back are sorted,
    //      result is allocated,
    //      (last-first+1) == (front.length + back.length)
    // post: positions first to last in result contain one copy
    //       of each element in front and back in sorted order.

    int f = 0;      // front index
    int b = 0;      // back index
    int i = first;  // index in result
    while ((f < front.length) && (b < back.length)) {
        if (front[f] < back[b]) {
            result[i] = front[f];
            i++; f++;
        } else {
            result[i] = back[b];
            i++; b++;
        }
    }

Merge Code - version 1 (2)

    // copy remaining elements into result
    while (f < front.length) {
        result[i] = front[f];
        i++; f++;
    }
    while (b < back.length) {
        result[i] = back[b];
        i++; b++;
    }
}

Merge - complexity

- Every element in front and back is copied exactly once. Each copy is two accesses, so the total number of accesses due to copying is 2n.
- The number of comparisons could be as small as min(n_front, n_back) or as large as (n-1). Each comparison is two accesses.
- In the worst case the total number of accesses is 2n + 2(n-1) = O(n).
- In the best case the total number of accesses is 2n + 2*min(n_front, n_back) = O(n).
- The average case is between the worst and best case and is therefore also O(n).
- Memory required: 2n = O(n).
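The version-1 merge can be exercised on the lecture's example arrays. The driver class below is a sketch added for illustration; the merge body follows the slide's code.

```java
public class MergeDemo {
    // Merge two sorted int arrays into result[first..last],
    // as in version 1 of the lecture's merge.
    static void merge(int[] front, int[] back, int[] result,
                      int first, int last) {
        int f = 0, b = 0, i = first;
        while (f < front.length && b < back.length) {
            if (front[f] < back[b]) { result[i++] = front[f++]; }
            else                    { result[i++] = back[b++]; }
        }
        // copy remaining elements into result
        while (f < front.length) { result[i++] = front[f++]; }
        while (b < back.length)  { result[i++] = back[b++]; }
    }

    public static void main(String[] args) {
        int[] front = {10, 40, 60};
        int[] back = {50, 70, 80, 90};
        int[] result = new int[7];
        merge(front, back, result, 0, 6);
        System.out.println(java.util.Arrays.toString(result));
        // prints [10, 40, 50, 60, 70, 80, 90]
    }
}
```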

Merge Sort Algorithm

- Merge Sort sorts a given array (anArray) into increasing order as follows:
- Split anArray into two non-empty parts any way you like. For example:
  front = the first n/2 elements in anArray
  back = the remaining elements in anArray
- Sort front and back by recursively calling MergeSort with each one.
- Now you have two sorted arrays containing all the elements from the original array. Use merge to combine them, and put the result in anArray.

Merge Sort - (1) Split

  original: 40 60 10 90 50 80 70   (indices 0-6)
  front:    40 60 10               (indices 0-2)
  back:     90 50 80 70            (indices 0-3)

Merge Sort - (2) recursively sort front

  mergesort(front):  40 60 10  ->  10 40 60

Merge Sort - (3) recursively sort back

  mergesort(back):  90 50 80 70  ->  50 70 80 90

Merge Sort - (4) merge

  merge(10 40 60, 50 70 80 90)  ->  10 40 50 60 70 80 90

  Original array: 40 60 10 90 50 80 70
  Final result:   10 40 50 60 70 80 90

Merge Sort Algorithm - summary

  Original array:             40 60 10 90 50 80 70
  Split:                      40 60 10 | 90 50 80 70
  Recursively sort each part: 10 40 60 | 50 70 80 90
  Merge (final result):       10 40 50 60 70 80 90

MergeSort Code - version 1

public static void mergesort(int[] anArray, int first, int last) {
    // pre: last < anArray.length
    // post: anArray positions first to last are in increasing order
    int size = (last - first) + 1;
    if (size > 1) {
        int frontsize = size / 2;
        int backsize = size - frontsize;
        int[] front = new int[frontsize];
        int[] back = new int[backsize];
        int i;
        for (i = 0; i < frontsize; i++) { front[i] = anArray[first+i]; }
        for (i = 0; i < backsize; i++) { back[i] = anArray[first+frontsize+i]; }
        mergesort(front, 0, frontsize-1);
        mergesort(back, 0, backsize-1);
        merge(front, back, anArray, first, last);
    }
}

MergeSort Call Graph (n=7)

- Each box represents one invocation of the mergesort method, labelled first-last:

  0-6
    0-2
      0-0
      1-2
        1-1
        2-2
    3-6
      3-4
        3-3
        4-4
      5-6
        5-5
        6-6

- How many levels are there, in general, if the array is divided in half each time?

MergeSort Call Graph (general)

- Number of positions to sort in each box: n at the root; then n/2, n/2; then n/4 four times; …; down to 1 in each leaf.
- Suppose n = 2^k. How many levels are there? What value is in each box at level j? How many boxes are on level j?

MergeSort - complexity analysis (1)

- Each invocation of mergesort on p array positions does the following:
  - Copies all p positions once (# accesses = O(p))
  - Calls merge (# accesses = O(p))
- Observe that p is the same for all invocations at the same level; therefore the total # of accesses at a given level j is O((# invocations at level j) * p_j).

MergeSort - complexity analysis (2)

- The total # of accesses at level j is
  O((# invocations at level j) * p_j) = O(2^j * (n / 2^j)) = O(n)
- In other words, the total # of accesses at each level is the same, O(n).
- The total # of accesses for the entire mergesort is the sum of the accesses for all the levels. Since the accesses at every level are the same - O(n) - this is
  (# levels) * O(n) = O(log(n)) * O(n) = O(n*log(n))
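The level-by-level sum can be written out explicitly, restating the argument above (assuming n = 2^k, so that there are k + 1 levels numbered j = 0 … log2(n)):

```latex
\sum_{j=0}^{\log_2 n} \underbrace{2^j}_{\text{boxes at level } j}
\cdot O\!\left(\frac{n}{2^j}\right)
\;=\; \sum_{j=0}^{\log_2 n} O(n)
\;=\; (\log_2 n + 1)\,O(n)
\;=\; O(n \log n)
```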

Time Complexity of Merge Sort

- Best case: O(n log(n))
- Worst case: O(n log(n))
- Average case: O(n log(n))
- Note that the insertion sort is actually a better sort than the merge sort if the original collection is almost sorted.

Space Complexity of Merge Sort (1)

- In any recursive method, space is required for the stack frames created by the recursive calls.
- The maximum amount of memory required for this purpose is (size of the stack frame) * (depth of recursion).
- The size of the stack frame is a constant, and for mergesort the depth of recursion (the number of levels) is O(log(n)).
- The memory required for the stack frames is therefore O(log(n)).

Space Complexity of Merge Sort (2)

- Besides the given array, there are two temporary arrays allocated in each invocation whose total size is the same as the number of positions to be sorted: at level j this is p_j = n/2^j.
- This space is allocated before the recursive calls are made and is still needed after the recursive calls have returned; therefore the maximum total amount of space allocated is the sum of n/2^j for j = 0 … log(n).
- This sum is O(n) - it is a little less than 2*n.
- Therefore, the space complexity of Merge Sort is O(n), but doubling the collection storage may sometimes be a problem.

Making mergesort faster

- Although we cannot improve the big-O complexity of mergesort, we can make it faster in practice by doing two things:
  - Reducing the amount of copying
  - Allocating temporary storage once at the very outset
- We will make these improvements in 2 steps.

Reducing copying - back

- The back array is easy to eliminate. We just use the back portion of anArray in its place.
- The only significant change in the code is to the merge method, which now must be told where the "back" of anArray begins.
- We can also eliminate from merge the final loop which copies values from back into the final positions of anArray, since these will already be in the correct place in anArray.

81

MergeSort Code – version 2 (1)

public static void mergesort(int[] anArray, int first, int last) {
  // pre: last < anArray.length
  // post: anArray positions first to last are in increasing order
  int size = (last - first) + 1;
  if (size > 1) {
    int frontsize = size / 2;
    int[] front = new int[frontsize];
    int i;
    for (i = 0; i < frontsize; i++) { front[i] = anArray[first + i]; }
    // version 1's back array, and the loop that filled it, are deleted:
    // the back portion of anArray is used in its place
    mergesort(front, 0, frontsize - 1);

82

MergeSort Code – version 2 (2)

    // version 1's  mergesort(back, 0, backsize-1)  becomes:
    int backstart = first + frontsize;
    mergesort(anArray, backstart, last);
    // version 1's  merge(front, back, anArray, first, last)  becomes:
    merge(front, anArray, first, backstart, last);
  }
}

83

Merge Code – version 2 (1)

private static void merge(int[] front, int[] anArray,
                          int first, int backstart, int last) {
  int f = 0;          // front index
  int b = backstart;  // back index
  int i = first;      // index in result
  while ((f < front.length) && (b <= last)) {
    if (front[f] < anArray[b]) {
      anArray[i] = front[f];
      i++; f++;
    } else {
      anArray[i] = anArray[b];  // i <= b ALWAYS AT THIS POINT
      i++; b++;
    }
  }

84

Merge Code – version 2 (2)

  // copy remaining elements into result (anArray)
  while (f < front.length) {
    anArray[i] = front[f];
    i++; f++;
  }
  // version 1's final loop, which copied back into anArray, is deleted:
  // any elements remaining in the back portion are already in place
}


85

Improving efficiency – front (1)
• front is as easy to eliminate as back in the mergesort method. We just use the front portion of anArray in its place.
• But the merge method must make a copy of the front portion of anArray before merging begins.
• This does not reduce copying at all, but it moves the temporary storage into the merge method, which means it is allocated AFTER the recursive calls and therefore less memory is needed in total.

86

Improving efficiency – f r ont (2)

• In addition, instead of allocating the storage each time merge is called, we can allocate it once, before the first call to mergesort is made, and pass this extra array on all calls.
• This saves the time it takes to allocate memory and garbage collect it, which in the previous versions was done once for every invocation.
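Putting the two improvements together gives a "version 3" of mergesort. The slides stop short of showing its code, so the following is a sketch of what it might look like (the class and method layout are mine, not from the lecture): a single temporary array is allocated once in sort and passed down, and merge copies only the front half into it.

```java
// Sketch of mergesort "version 3": one temp array allocated up front,
// and only the front half copied before each merge.
public class MergeSortV3 {
    public static void sort(int[] a) {
        int[] temp = new int[a.length / 2 + 1]; // allocated once, reused by every merge
        mergesort(a, 0, a.length - 1, temp);
    }

    private static void mergesort(int[] a, int first, int last, int[] temp) {
        int size = (last - first) + 1;
        if (size <= 1) return;
        int backstart = first + size / 2;       // back half of a is used in place
        mergesort(a, first, backstart - 1, temp);
        mergesort(a, backstart, last, temp);
        merge(a, first, backstart, last, temp);
    }

    private static void merge(int[] a, int first, int backstart, int last, int[] temp) {
        int frontsize = backstart - first;
        for (int k = 0; k < frontsize; k++) temp[k] = a[first + k]; // copy front only
        int f = 0, b = backstart, i = first;
        while (f < frontsize && b <= last) {
            if (temp[f] < a[b]) a[i++] = temp[f++];
            else                a[i++] = a[b++];
        }
        while (f < frontsize) a[i++] = temp[f++];
        // remaining back elements are already in their final place
    }
}
```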

Sorting - Quick Sort

Cmput 115 - Lecture 12

Department of Computing Science

University of Alberta

©Duane Szafron 2000
Some code in this lecture is based on code from the book:

Java Structures by Duane A. Bailey or the companion structure package

Revised 27/07/03

88

About This Lecture

• In this lecture we will learn about a sorting algorithm called the Quick Sort.
• We will study its implementation and its time and space complexity.

89

Outline

• The Quick Sort Algorithm
• Time and Space Complexity of Quick Sort

90

Quicksort Algorithm – initial version

• As we did with Mergesort, we will first give a simple version of Quicksort and then make efficiency improvements.
• Quicksort can be seen as a variation of Mergesort in which front and back are defined in a different way.

Page 16: 3 4 Outline The Sort Problem

16

91

Merge Sort Algorithm - reminder
• Merge Sort sorts a given array (anArray) into increasing order as follows:
• Split anArray into two non-empty parts any way you like. For example
  front = the first n/2 elements in anArray
  back = the remaining elements in anArray
• Sort front and back by recursively calling MergeSort with each one.
• Now you have two sorted arrays containing all the elements from the original array. Use merge to combine them, put the result in anArray.

92

Quicksort Algorithm
• Partition anArray into two non-empty parts.
  Pick any value in the array, pivot.
  small = the elements in anArray < pivot
  large = the elements in anArray > pivot
  Place pivot in either part, so as to make sure neither part is empty.
• Sort small and large by recursively calling Quicksort with each one.
• You could use merge to combine them, but because you know the elements in small are smaller than the elements in large you can simply concatenate small and large, and put the result into anArray.
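As a concrete illustration of this recipe, here is a sketch in Java (the class name QuickSortV1 and the use of lists are mine; also, where the slides place the pivot inside one of the parts, this sketch keeps the pivot out of both parts, which sidesteps the non-empty-part issue entirely):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative first Quicksort: partition into small/large temporary lists,
// sort each recursively, then concatenate small + pivot + large.
public class QuickSortV1 {
    public static List<Integer> sort(List<Integer> a) {
        if (a.size() <= 1) return new ArrayList<>(a);
        int pivot = a.get(0);
        List<Integer> small = new ArrayList<>();
        List<Integer> large = new ArrayList<>();
        for (int i = 1; i < a.size(); i++) {   // pivot itself is excluded
            if (a.get(i) <= pivot) small.add(a.get(i));
            else                   large.add(a.get(i));
        }
        List<Integer> result = sort(small);
        result.add(pivot);                     // concatenate: no merge needed,
        result.addAll(sort(large));            // everything in small <= pivot <= large
        return result;
    }
}
```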

93

Quicksort – (1) Partition

50 60 40 90 10 80 70
 0  1  2  3  4  5  6

Pivot: 50

small: 40 10 50        large: 60 90 80 70
        0  1  2                0  1  2  3

94

Quicksort – (2) recursively sort small

50 60 40 90 10 80 70        (original array)
 0  1  2  3  4  5  6

quicksort(small):
40 10 50   →   10 40 50
 0  1  2        0  1  2

large: 60 90 80 70
        0  1  2  3

95

Quicksort – (3) recursively sort large

50 60 40 90 10 80 70        (original array)
 0  1  2  3  4  5  6

small (sorted): 10 40 50
                 0  1  2

quicksort(large):
60 90 80 70   →   60 70 80 90
 0  1  2  3        0  1  2  3

96

Quicksort – (4) concatenate

Original array:
50 60 40 90 10 80 70
 0  1  2  3  4  5  6

Sorted parts:
10 40 50        60 70 80 90
 0  1  2         0  1  2  3

concatenate

Final result:
10 40 50 60 70 80 90
 0  1  2  3  4  5  6


97

Quicksort Algorithm – summary

Original array:
50 60 40 90 10 80 70
 0  1  2  3  4  5  6

Pivot: 50

40 10 50        60 90 80 70
 0  1  2         0  1  2  3

Recursively sort each part:
10 40 50        60 70 80 90
 0  1  2         0  1  2  3

concatenate

Final result:
10 40 50 60 70 80 90
 0  1  2  3  4  5  6

98

Merge Sort Algorithm
Input:
  anArray – array of Comparable objects
  n – sort the elements in positions 0…(n-1)
Output: a sorted array [0..(n-1)]

Idea: (1) split the given array into two halves,
      (2) sort each half separately,
      (3) merge the two sorted arrays into one sorted array

Algorithm:
  (1) sort anArray[0..n/2-1]
  (2) sort anArray[n/2..n-1]
  (3) merge the two arrays into one sorted array

99

Quick Sort Algorithm
Input:
  anArray – array of Comparable objects
  n – sort the elements in positions 0…(n-1)
Output: a sorted array [0..(n-1)]

Algorithm:
  (1) partition the given array into two arrays such that all objects in the first are less than all objects in the second
  (2) sort the first using quick sort
  (3) sort the second using quick sort
  (4) concatenate the two arrays into one sorted array

100

Quicksort - Time Complexity (best)
• Like mergesort, a single invocation of quicksort on an array of size p has complexity O(p):
  – p comparisons = 2*p accesses
  – 2*p moves (copying) = 4*p accesses
• Best case: every pivot chosen by quicksort partitions the array into equal-sized parts. In this case quicksort has the same big-O complexity as mergesort – O(n*log(n))

101

Quicksort Call Graph – best case

# of elements to sort (p), level by level (each part splits in half):

n
n/2   n/2
n/4   n/4   n/4   n/4
…
1   1   1   1   …   1   1

102

Quicksort - Worst Case

• Worst case: the pivot chosen is the largest or smallest value in the array. Partition creates one part of size 1 (containing only the pivot), the other of size p-1.


103

Quicksort Call Graph – worst case

# of elements to sort (p), level by level (one part always has size 1):

n
1   n-1
    1   n-2
        1   …
                1

104

Quicksort Time Complexity - Worst Case

• There are n-1 invocations of Quicksort (not counting base cases) with arrays of size p = n, (n-1), … 2
• Since each of these does O(p) accesses, the total number of accesses is
  O(n) + O(n-1) + … + O(1) = O(n²)
• Ironically, the worst case occurs when the list is sorted (or nearly sorted)!
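This linear recursion depth on sorted input can be checked directly. The sketch below is illustrative (the calls counter is my addition, for measurement); it uses a leftmost-pivot partition in the style of the one at the end of this lecture. On an already-sorted array of n elements it makes exactly n-1 non-trivial invocations, on parts of sizes n, n-1, …, 2:

```java
// Demonstrates the Quicksort worst case: on sorted input with a leftmost
// pivot, each partition peels off one element, giving n-1 invocations
// (linear depth), which is what makes the total work O(n^2).
public class WorstCaseDemo {
    static int calls = 0;   // counts non-trivial quicksort invocations

    static void quicksort(int[] a, int first, int last) {
        if (first >= last) return;          // base case: size 0 or 1
        calls++;
        int p = partition(a, first, last);
        quicksort(a, first, p - 1);
        quicksort(a, p + 1, last);
    }

    // Leftmost-pivot, in-place partition (Bailey-style).
    static int partition(int[] a, int left, int right) {
        while (true) {
            while (left < right && a[left] < a[right]) right--;
            if (left < right) { int t = a[left]; a[left] = a[right]; a[right] = t; left++; }
            else return left;
            while (left < right && a[left] < a[right]) left++;
            if (left < right) { int t = a[left]; a[left] = a[right]; a[right] = t; right--; }
            else return right;
        }
    }
}
```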

105

Comparisons and Accesses - Average Case

• The average case must be between the best case and the worst case, but since the best case is O(n log(n)) and the worst case is O(n²), some analysis is necessary to find the answer.
• Analysis yields a complex recurrence relation.
• On average, the elements are in random order after each partition, so about half should be smaller than the pivot and about half should be larger; the average case is therefore more like the best case.
• The average case number of comparisons turns out to be approximately 1.386*n*log(n) - 2.846*n
• Therefore, the average case time complexity is O(n log(n)).

106

Time Complexity of Quick Sort

• Best case O(n log(n))
• Worst case O(n²)
• Average case O(n log(n))
• Note that quick sort is inferior to insertion sort and merge sort if the list is sorted, nearly sorted, or reverse sorted.

107

Quicksort - Space Complexity (1)
• The memory needed for the original array and the stack frame memory is
  n + O(depth of the call graph)
• The version we are analyzing has two temporary arrays (small and large) in each invocation of quicksort. If we make an extra scan of the array to determine their exact sizes, the additional memory for these is p.
• Because these are allocated before the recursive calls are made and needed after the calls return, the maximum total memory allocated is the sum of the memory needed on a path in the call graph.

108

Quicksort - Space Complexity (2)
• In the best case the maximum memory allocated is
  n + O(log n) + n + (n/2) + (n/4) + … + 2 = O(n)
• In the worst case the maximum memory allocated is
  n + O(n) + n + (n-1) + (n-2) + … + 2 = O(n²)
• It is possible to reduce this to O(n). This is a very important improvement, even if the time remains O(n²).
• The key idea is to re-arrange the code so that small and large are not used in the recursive calls or needed after the recursive calls.


109

Quicksort Algorithm – version 1
• Partition anArray into small and large
• quicksort(small)
• quicksort(large)
• Copy small into anArray positions 0…(smallsize-1)
• Copy large into anArray positions smallsize…(n-1)

Worst case space complexity: O(n²)

110

Quicksort Algorithm – version 2
• Partition anArray into small and large
• Copy small into anArray positions 0…(smallsize-1)
• Copy large into anArray positions smallsize…(n-1)
• quicksort(anArray, 0, smallsize-1)
• quicksort(anArray, smallsize, n-1)

Worst case space complexity: O(n)

111

Quicksort – (1) Partition

50 60 40 90 10 80 70
 0  1  2  3  4  5  6

Pivot: 50

small: 40 10 50        large: 60 90 80 70
        0  1  2                0  1  2  3

112

Quicksort – (2) copy back

50 60 40 90 10 80 70        (original array)
 0  1  2  3  4  5  6

small: 40 10 50        large: 60 90 80 70
        0  1  2                0  1  2  3

copy into original array:
40 10 50 60 90 80 70
 0  1  2  3  4  5  6

113

Quicksort – (3) recursively sort front

50 60 40 90 10 80 70        (original array)
 0  1  2  3  4  5  6

small: 40 10 50        large: 60 90 80 70
        0  1  2                0  1  2  3

copy into original array:
40 10 50 60 90 80 70
 0  1  2  3  4  5  6

quicksort front part:
10 40 50 60 90 80 70
 0  1  2  3  4  5  6

114

Quicksort – (4) recursively sort back

50 60 40 90 10 80 70        (original array)
 0  1  2  3  4  5  6

small: 40 10 50        large: 60 90 80 70
        0  1  2                0  1  2  3

copy into original array:
40 10 50 60 90 80 70
 0  1  2  3  4  5  6

quicksort back part:
10 40 50 60 70 80 90
 0  1  2  3  4  5  6


115

Eliminating small and large
• In version 2 of Quicksort, small and large are used as temporary storage in the course of re-arranging the values in anArray.
• It is possible to re-arrange the values in anArray so that:
  – pivot is in its final position (pivotIndex)
  – All values in positions < pivotIndex are smaller than pivot
  – All values in positions > pivotIndex are greater than pivot
  using only one temporary variable.
• Still O(n) moves, but half as many as in version 2.

116

Quicksort Algorithm – version 3
• Partition anArray in-place so that the pivot is in its correct final position, pivotIndex, all smaller values are to its left, and all larger values are to its right.
• quicksort(anArray, first, pivotIndex-1)
• quicksort(anArray, pivotIndex+1, n-1)
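Assembled with an in-place partition like the Bailey-based one given at the end of this lecture, version 3 might look like the following sketch (the driver and swap helper are written out by me):

```java
// "Version 3" Quicksort sketch: in-place partition, no temporary arrays.
public class QuickSortV3 {
    public static void quicksort(int[] a, int first, int last) {
        if (first >= last) return;              // base case: size 0 or 1
        int pivotIndex = partition(a, first, last);
        quicksort(a, first, pivotIndex - 1);    // smaller values
        quicksort(a, pivotIndex + 1, last);     // larger values
    }

    // In-place partition: the leftmost element is the pivot; returns its
    // final index. Smaller values end up to its left, larger to its right.
    private static int partition(int[] a, int left, int right) {
        while (true) {
            while (left < right && a[left] < a[right]) right--;
            if (left < right) { swap(a, left, right); left++; }
            else return left;
            while (left < right && a[left] < a[right]) left++;
            if (left < right) { swap(a, left, right); right--; }
            else return right;
        }
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```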

117

In-place Partition Algorithm (1)
• Our goal is to move one element, the pivot, to its correct final position so that all elements to the left of it are smaller than it and all elements to the right of it are larger than it.
• We will call this operation partition().
• We select the left element as the pivot.

60 30 10 20 40 90 70 80 50
 0  1  2  3  4  5  6  7  8
 l                       r
 p

118

In-place Partition Algorithm (2)
• Find the rightmost element that is smaller than the pivot element.

60 30 10 20 40 90 70 80 50
 0  1  2  3  4  5  6  7  8
 l                       r
 p

• Exchange the elements and increment the left.

50 30 10 20 40 90 70 80 60
 0  1  2  3  4  5  6  7  8
    l                    r
                         p

119

In-place Partition Algorithm (3)
• Find the leftmost element that is larger than the pivot element.

50 30 10 20 40 90 70 80 60
 0  1  2  3  4  5  6  7  8
    l                    r
                         p

• Exchange the elements and decrement the right.

50 30 10 20 40 60 70 80 90
 0  1  2  3  4  5  6  7  8
                l     r
                p

120

In-place Partition Algorithm (4)
• Find the rightmost element that is smaller than the pivot element.
• Since the right passes the left, there is no such element and the pivot is in its final location.

50 30 10 20 40 60 70 80 90
 0  1  2  3  4  5  6  7  8
                p


121

In-place Partition - Arrays

private static int partition(int anArray[], int left, int right) {
  // pre: left <= right
  // post: data[left] is in the correct sort location; its
  // location is returned. All elements to the left of this location
  // are smaller than pivot, all elements to the right are larger.
  while (true) {
    while ((left < right) && (anArray[left] < anArray[right]))
      right--;
    if (left < right) {
      swap(anArray, left, right);
      left++;
    } else return left;
    while ((left < right) && (anArray[left] < anArray[right]))
      left++;
    if (left < right) {
      swap(anArray, left, right);
      right--;
    } else return right;
  }
}                                        code based on Bailey pg. 89