recursive sorting: quicksort and its complexity. sorting card players all know how to sort … first...

Recursive sorting: Quicksort and its Complexity

Sorting• Card players all know how to sort …

• First card is already sorted

• With all the rest,Scan back from the end until you find the first card larger

than the new one,Move all the lower ones up one slot insert it

Q

2

9

A

K

10

J

2

2

9

Sorting - Insertion sort

• Complexity• For each card

• Scan O(n)

• Shift up O(n)

• Insert O(1)

• Total O(n)• First card requires O(1), second O(2), …

• For n cards operations O(n2) ii=1

n

Sorting - Insertion sort

• Complexity• For each card

• Scan O(n) O(log n)

• Shift up O(n)

• Insert O(1)

• Total O(n)• First card requires O(1), second O(2), …

• For n cards operations O(n2) ii=1

n

Unchanged!Because the

shift up operationstill requires O(n)

time

Use binary search!

Insertion Sort - Implementation

• A challenge for you• The code in the notes (and on the Web) has an error• First person to email a correct version

gets up to 2 extra marks added to their final mark if that would move them up a grade!

• ie if you had x8% or x9%, it goes to (x+1)0%

• To qualify, you need to point out the error in the original, as well as supply a corrected version!

Sorting - Bubble

• From the first element• Exchange pairs if they’re out of order

• Last one must now be the largest• Repeat from the first to n-1• Stop when you have only one element to check

Bubble Sort - Analysis/* Bubble sort for integers */

#define SWAP(a,b) { int t; t=a; a=b; b=t; }

void bubble( int a[], int n ) {

int i, j;

for(i=0;i<n;i++) { /* n passes thru the array */

/* From start to the end of unsorted part */

for(j=1;j<(n-i);j++) {

/* If adjacent items out of order, swap */

if( a[j-1]>a[j] ) SWAP(a[j-1],a[j]);

}

}

}

O(1) statement




int i, j;



for(j=1;j<(n-i);j++) {


if( a[j-1]>a[j] ) SWAP(a[j-1],a[j]);

}

}

}

Inner loopn-1, n-2, n-3, … , 1 iterations

O(1) statement




int i, j;



for(j=1;j<(n-i);j++) {


if( a[j-1]>a[j] ) SWAP(a[j-1],a[j]);

}

}

}

Outer loop n iterations




int i, j;



for(j=1;j<(n-i);j++) {


if( a[j-1]>a[j] ) SWAP(a[j-1],a[j]);

}

}

}

Overall

ii=n-1

1=

n(n+1)2

= O(n2)

n outer loop iterations inner loop iteration count

Sorting - Simple

• Bubble sort • O(n2)

• Very simple code

• Insertion sort• Slightly better than bubble sort

• Fewer comparisons• Also O(n2)

• But HeapSort is O(n log n)

• Where would you use bubble or insertion sort?

Simple Sorts• Bubble Sort or Insertion Sort

• Use when n is small

• Simple code compensates for low efficiency!

n^2 and n log n

0

500

1000

1500

2000

2500

0 10 20 30 40 50 60

n

Tim

e n log n

n 2̂

Recursive Array ProgrammingRecursive Array Programming

Recursive function definitions assume that a function works for a smaller value.

With arrays, “a smaller value” means a shorter array, i.e., a subarray, contiguous elements from the original array

We’ll define a recursive function over an array by using the same function over a subarray, and a base case

Subscripts will mark the lower and upper bounds of the subarrays

Subarrays

myArray

[0] [1] [2] [3] [4] [5] [6] [7] [8] [9]

Base case

1 through 92 through 9


5 through 9

6 through 9


9 through 9

Example: Recursively find sum of array elements, A[lo] to A[hi]

• Assume sum( ) properly returns sum of elements for a smaller array of doubles

• Then we could write:

double sum (double[ ] A, int lo, int hi) {return ( A[lo] + sum(A, lo + 1, hi) );

}

• But we’re not done; what’s the base case?

Base Case is when Subarray is Empty, hi is less than lo

double sum (double[ ] A, int lo, int hi) {if (hi < lo)

return 0.0;else

return ( A[lo] + sum(A, lo + 1, hi) );}

• Yes, we could have defined this using(hi == lo) as the base case…

Recursive Sorting Algorithms

• We can use this same idea of recursive functions over subarrays to rewrite our sorting algorithms

• Let’s see how this works for selection sort, insertion sort, and then some new sorting algorithms

More Recursive Sorting: QuicksortMore Recursive Sorting: Quicksort

• Quicksort is an O(n2) algorithm in the worst case, but its running time is usually proportional to n log2 n;it is the method of choice for many sorting jobs

• We’ll first look at the intuition behind the algorithm: take an array

V O N I C AE Rarrange it so the small values are on the left half and all the big values are on the right half:

V O NI C AE R

Quicksort IntuitionQuicksort Intuition

• Then, do it again:

VO NIC A E Rand again:

VONICA E RThe divide-and-conquer strategy will, in general, take log2n steps to move the A into position. The intuition is that, doing this for n elements, the whole algorithm will take O(n log2 n) steps.

What Quicksort is really doingWhat Quicksort is really doing

1. Partition array a into smaller elements and larger ones – smaller ones from a[0]…a[m-1] (not necessarily in order), larger ones in positions a[m+1]…a[length-1] (not necessarily in order), and the “middle” element in a[m]. So a:

5 3 11 7192712 18might be partitioned (with m=4) as: 5 3 117 192712 18

pivotsmaller than pivot larger than pivot

quicksort(), next 2 steps

2. Recursively sort a[0]…a[m-1]. Our a becomes:

53 117 192712 18

3. Recursively sort a[m+1]…a[length-1]. Our a becomes:

53 117 19271218

pivot

pivot

quicksort( )

private void quicksort(double[] a, int lo, int hi) {int m;

if (hi > lo+1) { // there are at least 3 elements// so sort recursively

m = partition(a, lo, hi);quicksort(a, lo, m-1);quicksort(a, m+1, hi);

}else // the base case…

…}

The base case…

• We have the base case in this recursion when the subarray a[lo]…a[hi] contains zero elements (lo==hi+1), one element (lo==hi), or two elements (lo==hi-1).

• With no elements, we ignore it• With one element, it’s already sorted• With two elements, we just (might) swap them:

// 0, 1, or 2 elements, so sort directlyif (hi == lo+1 && a[lo] > a[hi])

swap(a, lo, hi);

quicksort( )




}else // 0, 1, or 2 elements, so sort directly

if (hi == lo+1 && a[lo] > a[hi])swap(a, lo, hi);

}

The outside world’s version

• As usual, we overload quicksort( ) and provide a public version that “hides” the last two arguments:

public void quicksort(double[] a) {quicksort(a, 0, a.length-1);

}

Now, all we need is partition( )

int partition(double[] a, int lo, int hi) {// Choose middle element among a[lo]…a[hi],// and move other elements so that a[lo]…a[m-1]// are all less than a[m] and a[m+1]…a[hi] are// all greater than a[m]//// m is returned to the caller

…

}

Are you feeling lucky?• Now, this would work great if we knew the median value in

the array segment, and could choose it as the pivot (i.e., know which small values go left and which large values go right). But we can’t know the median value without actually sorting the array!

• So instead, we somehow pick a pivot and hope that it is near median.

• If the pivot is the worst choice (each time), the algorithm becomes O(n2). If we are roughly dividing the subarrays in half each time, we get an O(nlog2n) algorithm.

How to pick the median value

• There are many techniques for doing this. In practice, one good way of choosing the pivot is to take the median of three elements, specifically a[lo+1], a[(lo+hi)/2], and a[hi]

• We’ll choose their median using the method medianLocation( )

medianLocation( )

int medianLocation(double[] a,int i, int j, int

k) {if (a[i] <= a[j])

if (a[j] <= a[k])return j;

else if (a[i] <= a[k])return k;

else return i;else // a[j] < a[i]

if (a[i] <= a[k])return i;

else if (a[j] <= a[k])return k;

else return j;}

The partitioning process

• From the three red elements, we choose the median, a[hi]

8 1 2 6 34 9 10 5 7lo+1 hi(lo+hi)/2

• We swap the median with a[lo], and start the partitioning process on the rest:7 1 2 6 34 9 10 5 8

lo+1 hi(lo+hi)/2


• We shuffle around elements from a[lo+1] to a[hi] so that all elements less than the pivot (a[lo]) appear to all the elements greater than the pivot

• Until we are done, we have no way of knowing how many elements are less and how many elements are greater

7 1 2 6 104 5 3 8 9lo+1 hi(lo+hi)/2 m


• m is the largest subscript that contains a value less than the pivot

• We have discovered (in our example) that m = 6

• We then swap a[m] with a[lo], placing the pivot in its rightful position, in a[m], then continue to sort the left and right subarrays recursively

7 1 2 6 104 5 3 8 9lo+1 hi(lo+hi)/2 m

6 1 2 7 104 5 3 8 9lo+1 hi(lo+hi)/2 mlo

pivot

How partition( ) works

• We will use a 3-argument method:int partition(double[ ] a, int lo, int hi)

and a 4-argument method:int shuffle(double[ ] a, int lo,

int hi, double pivot)

• The 3-argument “partition” moves the pivot element into a[lo], calls the 4-argument method “shuffle” to shuffle the elements in the subarray a[lo+1]…a[hi], then swaps a[lo] into a[m] and returns m

3-argument partition( )

int partition(double[] a, int lo, int hi) {// Choose middle element among a[lo]…a[hi],// and move other elements so that a[lo]…a[m-1]// are all less than a[m] and a[m+1]…a[hi] are// all greater than a[m]//// m is returned to the callerswap(a, lo, medianLocation(a, lo+1, hi, (lo+hi)/2));int m = shuffle(a, lo+1, hi, a[lo]);swap(a, lo, m);return m;

}

How the 4-argument shuffle( ) works

• The 4-argument method does the main work, recursively calling itself on subarrays

• We of course assume shuffle( ) works on any smaller array…

• If the first element of the array is less than or equal to pivot, it’s already in the right place, just call shuffle( ) recursively on the rest:

if (a[lo] <= pivot) // a[lo] in correct half

return shuffle(a, lo+1, hi, pivot);

(this is the “current” lo, not the lo of the whole array)


• If a[lo] > pivot, then a[lo] belongs in the upper half of the subarray, and we swap it with a[hi]

• We still don’t know where the new a[lo] value should go, so we shuffle recursively on the subarray that includes a[lo] but not a[hi]

if (a[lo] <= pivot) // a[lo] in correct half

return shuffle(a, lo+1, hi, pivot);else { // a[lo] in wrong half

swap(a, lo, hi);return shuffle(a, lo, hi-1, pivot);

}


• The base case is the one element subarray, when lo==hi. Is the one element “small” or “large”?

• If it is small (less than the pivot), then it is at the middle point m (and we can swap it with the pivot)

• Otherwise, it is just above the middle point (and we want the pivot swapped with the element just below it)

4-argument shuffle( )

int shuffle(double[] a, int lo, int hi, double pivot) {if (hi == lo)

if (a[lo] < pivot)return lo;

elsereturn lo-1;

else if (a[lo] <= pivot) // a[lo] in correct halfreturn shuffle(a, lo+1, hi, pivot);

else { // a[lo] in wrong halfswap(a, lo, hi);return shuffle(a, lo, hi-1, pivot);

}}

Example of partition

• The starting array:

4 2 15 3 6lo hi

• Choose the median from among:

4 2 15 3 6lo hi(lo+hi)/2lo+1

• Swap the median with a[lo]:

3 2 15 4 6pivot


• Now, shuffle the subarray (not counting pivot); a is now the new subarray…

3 2 15 4 6lo hi

• a[lo] > pivot, so swap it with a[hi], and continue with the shuffle:

3 2 51 4 6lo hi

pivot

pivot


• a[lo] is now less than pivot, so we leave it and continue with the shuffle:

3 2 51 4 6lo hi

• Now a[lo] is greater than pivot, so we swap it with a[hi] and continue with the shuffle:

3 2 51 6 4lo hi

pivot

pivot


• a[lo] is again greater than pivot, so we swap it with a[hi] and continue with the shuffle:

3 6 51 2 4lo hipivot

• a[lo] is less than the pivot, so lo (i.e., index 2) is returned by the 4-argument shuffle( ); the 3-argument partition( ) then swaps the pivot and the middle element, also returning index 2

3 6 512 4pivot

Now we’re ready to recursively quicksort the left and right subarrays

quicksort( )




}else // 0, 1, or 2 elements, so sort directly

if (hi == lo+1 && A[lo] > A[hi])swap(a, lo, hi);

}

Performance of QuickSort

• Quicksort:

• Best case: O(nlog2n)

• Worst case: O(n2) – when the pivot is always the second-largest or second-smallest element (since medianLocation won’t let us choose the smallest or largest)

• Average case over all possible arrangements of n array elements: O(nlog2n)

Quicksort

• Efficient sorting algorithm• Discovered by C.A.R. Hoare

• Example of Divide and Conquer algorithm• Two phases

• Partition phase• Divides the work into half

• Sort phase• Conquers the halves!

Quicksort

• Partition• Choose a pivot• Find the position for the pivot so that

• all elements to the left are less• all elements to the right are greater

< pivot > pivotpivot

Quicksort

• Conquer• Apply the same algorithm to each half

< pivot > pivot

pivot< p’ p’ > p’ < p” p” > p”

Quicksort

• Implementation

quicksort( void *a, int low, int high ) { int pivot; /* Termination condition! */ if ( high > low ) { pivot = partition( a, low, high ); quicksort( a, low, pivot-1 ); quicksort( a, pivot+1, high ); } }

Divide

Conquer

Quicksort - Partition

int partition( int *a, int low, int high ) { int left, right; int pivot_item; pivot_item = a[low]; pivot = left = low; right = high; while ( left < right ) { /* Move left while item < pivot */ while( a[left] <= pivot_item ) left++; /* Move right while item > pivot */ while( a[right] >= pivot_item ) right--; if ( left < right ) SWAP(a,left,right); } /* right is final position for the pivot */ a[low] = a[right]; a[right] = pivot_item; return right; }

This exampleuses int’s

to keep thingssimple!

23 12 15 38 42 18 36 29 27

low high

Any item will do as the pivot,choose the leftmost one!


int partition( int *a, int low, int high ) { int left, right; int pivot_item; pivot_item = a[low]; pivot = left = low; right = high; while ( left < right ) { /* Move left while item < pivot */ while( a[left] <= pivot_item ) left++; /* Move right while item > pivot */ while( a[right] >= pivot_item ) right--; if ( left < right ) SWAP(a,left,right); } /* right is final position for the pivot */ a[low] = a[right]; a[right] = pivot_item; return right; }

Set left and right markers

23 12 15 38 42 18 36 29 27

low highpivot: 23

left right


int partition( int *a, int low, int high ) { int left, right; int pivot_item; pivot_item = a[low]; pivot = left = low; right = high;

while ( left < right ) { /* Move left while item < pivot */ while( a[left] <= pivot_item ) left++; /* Move right while item > pivot */ while( a[right] >= pivot_item ) right--; if ( left < right ) SWAP(a,left,right); } /* right is final position for the pivot */ a[low] = a[right]; a[right] = pivot_item; return right; }

Move the markers until they cross over

23 12 15 38 42 18 36 29 27

low highpivot: 23

left right



while ( left < right ) { /* Move left while item < pivot */ while( a[left] <= pivot_item ) left++; /* Move right while item > pivot */ while( a[right] >= pivot_item ) right--; if ( left < right ) SWAP(a,left,right); } /* right is final position for the pivot */ a[low] = a[right]; a[right] = pivot_item; return right; }

Move the left pointer whileit points to items <= pivot

23 12 15 38 42 18 36 29 27

low highpivot: 23

left right Move right similarly



while ( left < right ) { /* Move left while item < pivot */

while( a[left] <= pivot_item ) left++; /* Move right while item > pivot */

while( a[right] >= pivot_item ) right--; if ( left < right ) SWAP(a,left,right); } /* right is final position for the pivot */ a[low] = a[right]; a[right] = pivot_item; return right; }

Swap the two itemson the wrong side of the pivot

23 12 15 38 42 18 36 29 27

low highpivot: 23

left right






left and right have swapped over,

so stop

23 12 15 18 42 38 36 29 27

low highpivot: 23

leftright






Finally, swap the pivotand right

23 12 15 18 42 38 36 29 27

low highpivot: 23

leftright






Return the positionof the pivot

18 12 15 23 42 38 36 29 27

low high

pivot: 23right

Quicksort - Conquer

pivot

18 12 15 23 42 38 36 29 27pivot: 23

Recursivelysort left half

Recursivelysort right half

Quicksort - Analysis

• Partition• Check every item once O(n)

• Conquer• Divide data in half O(log2n)

• Total• Product O(n log n)

• Quicksort is generally faster• Fewer comparisons

• Details later (and assignment 2!)

• But there’s a catch …………….

Quicksort - The truth!

• What happens if we use quicksorton data that’s already sorted(or nearly sorted)

• We’d certainly expect it to perform well!


• Sorted data

1 2 3 4 5 6 7 8 9

pivot

< pivot

?

> pivot


• Sorted data• Each partition

produces• a problem of size 0• and one of size n-1!

• Number of partitions?

1 2 3 4 5 6 7 8 9

> pivot

2 3 4 5 6 7 8 9

> pivot

pivot

pivot


• Sorted data• Each partition

produces• a problem of size 0• and one of size n-1!

• Number of partitions?• n each needing time O(n)

• Total nO(n) or O(n2)

? Quicksort is as bad as bubble or insertion sort

1 2 3 4 5 6 7 8 9

> pivot

2 3 4 5 6 7 8 9

> pivot

pivot

pivot


• Quicksort’s O(n log n) behaviour• Depends on the partitions being nearly equal there are O( log n ) of them

• On average, this will nearly be the case and quicksort is generally O(n log n)

• Can we do anything to ensure O(n log n) time?

• In general, no• But we can improve our chances!!

Quicksort - Choice of the pivot

• Any pivot will work …• Choose a different pivot …

• so that the partitions are equal• then we will see O(n log n) time

1 2 3 4 5 6 7 8 9

pivot

< pivot > pivot

Quicksort - Median-of-3 pivot

• Take 3 positions and choose the median• say … First, middle, last

median is 5 perfect division of sorted data every time! O(n log n) time

Since sorted (or nearly sorted) data is common,median-of-3 is a good strategy

• especially if you think your data may be sorted!

1 2 3 4 5 6 7 8 9

Quicksort - Random pivot

• Choose a pivot randomly• Different position for every partition On average, sorted data is divided evenly O(n log n) time

• Key requirement• Pivot choice must take O(1) time

Quicksort - Guaranteed O(n log n)?

• Never!!• Any pivot selection strategy

could lead to O(n2) time

• Here median-of-3 chooses 2 One partition of 1 and

• One partition of 7

• Next it chooses 4 One of 1 and

• One of 5

1 4 9 6 2 5 7 8 3

1 2 4 9 6 5 7 8 3

Key Points

• Sorting• Bubble, Insert

• O(n2) sorts• Simple code• May run faster for small n,

n ~10 (system dependent)• Quick Sort

• Divide and conquer• O(n log n)

Key Points

• Quick Sort• O(n log n) but ….

• Can be O(n2)

• Depends on pivot selection• Median-of-3• Random pivot

• Better but not guaranteed

Quicksort - Why bother?

• Use Heapsort instead?• Quicksort is generally faster

• Fewer comparisons and exchanges

• Some empirical data

n Quick Heap InsertComp Exch Comp Exch Comp Exch

100 712 148 2842 581 2595 899200 1682 328 9736 9736 10307 3503500 5102 919 53113 4042 62746 21083

Quicksort - Why bother?

• Reporting data• Normalisation works when you have a hypothesis to work

with!

n Quick Heap Insert nlogn n^2

Comp Exch Norm Comp Exch Norm Comp Exch Norm Norm

100 712 148 0.74 2842 581 2.91 2595 899 4.50 0.09200 1682 328 0.71 9736 1366 2.97 10307 3503 7.61 0.09500 5102 919 0.68 53113 4042 3.00 62746 21083 15.62 0.08

Divide by n log n

Divide by n2

recursive sorting: quicksort and its complexity. sorting card players all know how to sort … first...

Documents