Copyright © 2003-2011 Curt Hill
Sorting
Ordering an array

Considered Topics
• Simple sort schemes
  – Algorithms with some code
• More complicated sort schemes
• Performance considerations
  – Time to sort in terms of the number of records N
  – How the number of compares and moves relate to the size of N

Selection Sort
• Basic idea
• Scan the entire array
• Find the smallest element
• Move to the top
• Remove the top from further consideration
• Repeat until entire array is sorted
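
How it works
[Figure: the array 8 3 1 9 14 2 6 before and after one pass. The least element (1) is exchanged with the top element (8), giving 1 3 8 9 14 2 6. The top slot joins the sorted part of the array; the rest is the unsorted part.]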

Code
void sort(int ar[], int size){
  int temp;                       // index of the least element found so far
  for(int i=0;i<size-1;i++){
    temp = i;
    for (int j=i+1;j<size;j++)
      if(ar[temp]>ar[j])
        temp = j;
    if(temp!=i) {                 // move the least element to the top
      int val = ar[i];
      ar[i] = ar[temp];
      ar[temp] = val;
    } // swap
  } // outer for
}

Best and Worst Cases
• This is an unusual algorithm in that the best and worst case are almost the same
• Best case
  – Already sorted
  – No moves are needed
  – All the compares are still done
• Worst case
  – Inversely sorted
  – Same number of compares
  – At most N-1 moves

How it performs
• The first element is compared with all the other elements
  – N-1 compares
• The second element is compared with the remaining elements
  – N-2 compares
• Compares: (N-1)+(N-2)+…+1 = N(N-1)/2 ≈ N^2/2
• Moves: at most N-1
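
The counts above can be checked with an instrumented version of the sort. This is only a sketch: the `Counts` struct and the name `countingSelectionSort` are made up for the illustration, and a `std::vector` stands in for the raw array.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Tallies of the work done by one sort (hypothetical helper type).
struct Counts {
    std::size_t compares = 0;
    std::size_t exchanges = 0;
};

// Selection sort, instrumented to count compares and exchanges.
Counts countingSelectionSort(std::vector<int>& ar) {
    Counts c;
    const std::size_t size = ar.size();
    for (std::size_t i = 0; i + 1 < size; ++i) {
        std::size_t least = i;                 // index of least element so far
        for (std::size_t j = i + 1; j < size; ++j) {
            ++c.compares;
            if (ar[least] > ar[j]) least = j;
        }
        if (least != i) {                      // move the least element to the top
            std::swap(ar[i], ar[least]);
            ++c.exchanges;
        }
    }
    return c;
}
```

For the seven-element example array, the compare count is 6+5+4+3+2+1 = 21 no matter how the input is ordered; only the number of exchanges varies.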

Comparing running times
• Mostly we are not concerned with many of the little issues of this analysis
  – It is N-1 instead of N
  – There is a factor of N^2/2 instead of N^2
• When we have two different factors we always take the most expensive
  – N^2 compares instead of N moves
• Thus selection sort is O(N^2)

Common Os
• Constant time O(c) or O(1)
  – Hashing is constant time
• Logarithmic time O(log2 N)
  – Binary and tree searches
• Linear time O(N)
  – File scans, bad searches
• N log N, O(N log2 N) (no other name)
  – Good sorts
• N Squared O(N^2)
  – Bad sorts
• Polynomial O(N^x)
  – Expensive but doable
• Exponential O(e^N)
  – Intractable
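
To get a feel for how far apart these classes are, the sketch below tabulates N log2 N against N^2 for a few sizes. The function names are invented for this illustration, and the numbers are operation counts in the abstract, not measured times.

```cpp
#include <cmath>
#include <cstdio>

// Work estimate for a "good sort".
double nLog2N(double n) { return n * std::log2(n); }

// Work estimate for a "bad sort".
double nSquared(double n) { return n * n; }

// Prints one row per power of ten from 10 to 1,000,000.
void printGrowthTable() {
    std::printf("%10s %14s %16s\n", "N", "N log2 N", "N^2");
    for (double n = 10; n <= 1e6; n *= 10) {
        std::printf("%10.0f %14.0f %16.0f\n", n, nLog2N(n), nSquared(n));
    }
}
```

Calling `printGrowthTable()` shows the gap widening with each row: multiplying N by ten multiplies the N^2 column by one hundred, while the N log2 N column grows only slightly faster than N itself.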

Bubble Sort
• Basic idea
• Start at top
• Compare adjacent elements
• Exchange if out of order
• Repeat until a pass has no exchanges

First Pass
[Figure: successive states of the array 8 3 1 9 14 2 6 during the first pass.]
• Small items bubble up slowly
  – One element per pass
• Large items sink quickly
  – Keep descending until they find a larger item or hit bottom

Code
void sort (int ar[], int size){
  bool swapped;
  do {
    swapped = false;
    for(int j = 0;j<size-1;j++)
      if(ar[j] > ar[j+1]){        // out of order: larger item above smaller
        int temp = ar[j];
        ar[j] = ar[j+1];
        ar[j+1] = temp;
        swapped = true;
      } // if
  } // do
  while(swapped);
}

How it performs
• Bubble sort makes many moves but always a short distance
• It also does many redundant compares
• O(N^2)
• Big O notation makes this look comparable with selection sort
  – In practice it is usually much worse
  – You have to be creative to make a worse sort

Best and Worst Cases
• Best case
  – Already sorted
  – One pass through does no exchanges and quits
• Worst case
  – Inversely sorted
  – The smallest only moves up one slot per pass
  – N-1 passes
  – An array that is sorted except that the smallest element sits in the last slot is almost as bad

Bubble Again
• Consider two symmetric cases, sorted with one exception: largest or smallest as far away as possible
  – One takes two passes, the other N-1
• The problem is the direction of the scan
  – Items going in that direction move fast
  – Items going the other direction move slowly
• This suggests a fix
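
The asymmetry is easy to observe by counting passes. The sketch below is an ascending bubble sort that returns the number of passes it made; the name `bubblePasses` is invented here, and the count includes the final pass that finds nothing to exchange.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Bubble sort (ascending) that reports how many passes it needed,
// counting the last pass in which no exchange happened.
std::size_t bubblePasses(std::vector<int>& ar) {
    std::size_t passes = 0;
    bool swapped;
    do {
        swapped = false;
        ++passes;
        for (std::size_t j = 0; j + 1 < ar.size(); ++j) {
            if (ar[j] > ar[j + 1]) {           // out of order
                std::swap(ar[j], ar[j + 1]);
                swapped = true;
            }
        }
    } while (swapped);
    return passes;
}
```

With five elements, `{5, 1, 2, 3, 4}` (largest at the top) finishes in 2 passes, while `{2, 3, 4, 5, 1}` (smallest at the bottom) needs 4 exchanging passes plus the confirming one, 5 in total.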

Shaker Sort
• Basic idea is same as bubble sort
• Scan top to bottom in odd passes
• Scan bottom to top in even passes
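
A minimal sketch of the alternating scans follows. The slides give no code for this sort, so the function name `shakerSort` and the shrinking `lo`/`hi` bounds are assumptions of this example rather than the author's version.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Shaker sort: bubble passes that alternate direction.
void shakerSort(std::vector<int>& ar) {
    if (ar.size() < 2) return;
    std::size_t lo = 0;                 // first slot not yet known to be final
    std::size_t hi = ar.size() - 1;     // last slot not yet known to be final
    bool swapped = true;
    while (swapped && lo < hi) {
        swapped = false;
        // Odd pass, top to bottom: the largest item sinks to slot hi.
        for (std::size_t j = lo; j < hi; ++j) {
            if (ar[j] > ar[j + 1]) {
                std::swap(ar[j], ar[j + 1]);
                swapped = true;
            }
        }
        --hi;
        if (!swapped) break;            // nothing moved, so already sorted
        swapped = false;
        // Even pass, bottom to top: the smallest item rises to slot lo.
        for (std::size_t j = hi; j > lo; --j) {
            if (ar[j - 1] > ar[j]) {
                std::swap(ar[j - 1], ar[j]);
                swapped = true;
            }
        }
        ++lo;
    }
}
```

Narrowing the scanned range after each pass is a common refinement; the plain description above would work without it, only with more redundant compares.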

First and Second Passes
[Figure: the array 8 3 1 9 14 2 6 across two passes. The first pass goes top to bottom; the second pass goes bottom to top.]

How it performs
• Insignificantly different from bubble sort
• The bad cases it fixes occur very infrequently
• The extra work to handle them complicates every run
• O(N^2)

The Previous Problems
• The problem with both of these is the short distance things are moved
• They usually move in the right direction but seldom far enough
• One fix is to compare non-adjacent elements
• How?

Shell Sort
• Start with a gap g, where 1 < g < N
• Do a sort pass comparing elements separated by the gap and exchanging if needed
• Decrease the gap in each pass
  – Do not divide size by 2
• When the gap is one it is a bubble sort, but most of the large distance moving has been done
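
The sketch below follows the exchange-based description above: bubble-style passes over elements a gap apart, with a shrinking gap. Two things are assumptions of this example: the gap sequence 1, 4, 13, 40, ... (each gap is three times the previous plus one) and repeating passes at each gap until no exchange occurs. Most textbook Shell sorts use a gapped insertion sort instead.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Shell sort using gapped exchange passes.
void shellSort(std::vector<int>& ar) {
    const std::size_t n = ar.size();
    std::size_t gap = 1;
    while (gap < n / 3) gap = 3 * gap + 1;    // largest gap in 1, 4, 13, 40, ...
    for (;; gap /= 3) {                       // integer division steps back down the sequence
        bool swapped;
        do {                                  // passes at this gap until none exchanges
            swapped = false;
            for (std::size_t j = 0; j + gap < n; ++j) {
                if (ar[j] > ar[j + gap]) {
                    std::swap(ar[j], ar[j + gap]);
                    swapped = true;
                }
            }
        } while (swapped);
        if (gap == 1) break;                  // the final stage is a plain bubble sort
    }
}
```

Because the last stage is an ordinary bubble sort run to completion, the result is sorted whatever gap sequence is chosen; the earlier stages only reduce how much work that last stage has left.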

First Pass
[Figure: the array 8 3 1 9 14 2 6 during the first pass, with Gap = 3.]
• First: 8 and 1 exchanged
• Third: 14 and 2 exchanged
• Fourth: 6 and 14 exchanged

How it performs
• The analysis is extremely difficult
• Empirically it is about O(N^1.25)
• This makes it better than bubble or selection for all but insignificant table sizes
• The break even point between O(N^1.25) and O(N log2 N) is about size = 65000 (at N = 65536, N^0.25 = log2 N = 16); however, the constant factor on Shell is large, so the break even point is much smaller
• Still inferior to the N log N sorts for large tables

Insertion Sort
• Partition the array into two pieces
  – The first one and all the rest
• The first part of the array is already sorted
• Remove the first unsorted item
• Insert into the correct location of the sorted part

How it works
[Figure: the array 8 3 1 9 14 2 6 split into a sorted part (8) and an unsorted part. The first unsorted item, 3, is removed and then inserted into its correct location in the sorted part.]

How it performs
• Best case is sorted
• Worst case is inversely sorted
• Yet another N^2
• Moves N-1
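
The slides show no code for insertion sort, so the following is only a sketch of the remove-and-insert step described above; the function name `insertionSort` is chosen for this example.

```cpp
#include <cstddef>
#include <vector>

// Insertion sort: grow the sorted part one item at a time.
void insertionSort(std::vector<int>& ar) {
    for (std::size_t i = 1; i < ar.size(); ++i) {
        const int item = ar[i];            // remove the first unsorted item
        std::size_t j = i;
        while (j > 0 && ar[j - 1] > item) {
            ar[j] = ar[j - 1];             // shift larger items down one slot
            --j;
        }
        ar[j] = item;                      // insert into the correct location
    }
}
```

On an already sorted array the inner loop never runs, which is why that is the best case; on an inversely sorted array every item is shifted past the whole sorted part.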

Merge Sort
• Merge increasingly larger sorted runs into a single much larger run
• Start with runs of 1
• Merge two runs into a temporary area
• Copy it back
Pass 1
83
19
14
26
Start: Each element is a run of 1
83
1
9
142
6
Run 1
Run 3
Run 2
Run 4
End of pass 1: runs of 2
Run 1Run 2Run 3Run 4Run 5Run 6Run 7

Pass 2
[Figure: the runs of 2 (Run 1 through Run 4) are merged into runs of 4 (Run 1 and Run 2).]

Important points on Merge
• An item in a run is never compared with another element of the same run
• It generalizes to files nicely
• Requires extra copy space equal to the size of longest run
• In the last pass that is the entire array
• First of the O(N log2 N) sorts
• The insertion process generates many moves

Quick Sort
• A complicated but very fast sort
  – Usually the best of the in-memory sorts
• Never compares two items twice
• Always moves things in the right direction
• Usually moves them a relatively long distance

Algorithm I
• The first item is called the pivot
  – It will end up as the middle element
• From the top look for an item that is larger than the pivot
• From the bottom look for an item that is smaller than the pivot
• The two items are respectively in the wrong "half" of the table
  – Recall the pivot will be the middle item
• Exchange the two

Algorithm II
• When the searches collide, move the pivot there
• Now have three partitions:
  – Lower: sort it by itself
  – Pivot: nothing more needs to be done
  – Higher: sort it by itself
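
Algorithm I and II together can be sketched as follows. The slides give no code for quick sort, so the names `quickSort`, `top` and `bottom`, and the inclusive `first`/`last` bounds are assumptions of this example.

```cpp
#include <utility>
#include <vector>

// Sorts ar[first..last], both bounds inclusive, using the first item as pivot.
void quickSort(std::vector<int>& ar, int first, int last) {
    if (first >= last) return;                  // zero or one item: nothing to do
    const int pivot = ar[first];
    int top = first + 1;
    int bottom = last;
    while (true) {
        while (top <= bottom && ar[top] <= pivot) ++top;         // look for a larger item
        while (top <= bottom && ar[bottom] >= pivot) --bottom;   // look for a smaller item
        if (top > bottom) break;                // the searches collided
        std::swap(ar[top], ar[bottom]);         // both were in the wrong "half"
    }
    std::swap(ar[first], ar[bottom]);           // move the pivot to the collision point
    quickSort(ar, first, bottom - 1);           // lower partition
    quickSort(ar, bottom + 1, last);            // higher partition
}

// Convenience wrapper for a whole vector.
void quickSort(std::vector<int>& ar) {
    quickSort(ar, 0, static_cast<int>(ar.size()) - 1);
}
```

On the example array 8 3 1 9 14 2 6 the first partitioning exchanges 9 with 6 and then 14 with 2, after which the pivot 8 is exchanged with 2, leaving 2 3 1 6 | 8 | 14 9.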

Quick Sort
[Figure: partitioning the array 8 3 1 9 14 2 6. Start: the pivot is 8 and both searches start looking. Each time a pair is found it is exchanged (1st exchange, 2nd exchange). When the searches collide, the pivot is exchanged into place. Done: three partitions.]

Performance
• A pair of distinct keys is never compared twice
• The trick is partitioning the array into two separate pieces that never interact again
• (½ N)^2 + (½ N)^2 < N^2
  – 20^2 = 400
  – 10^2 + 10^2 = 200
• O(N log2 N)

More on Performance
• A happy accident is that the pivot may be placed in a CPU register
  – It is the only value compared to the entire array
  – This makes it free and quick to access
• Notice the recursive nature of this algorithm
  – The array is partitioned into two pieces
  – These may be different sizes, and the sort is recursively invoked on each of them

Best and Worst Case
• It does better on an unsorted file than on a sorted one
  – Counter-intuitive
• The worst case is the sorted or inversely sorted file
  – The chosen pivot divides the table into two, not three, partitions
  – O(N^2) in this case

Improvements 1
• The worst case makes one think about choosing a different pivot
• Any searching for a pivot will slow the average process with a search
• The case of a sorted array to be sorted is extremely unlikely
  – For 10 elements there are 10! = 3628800 orderings
  – 2 chances in 3628800 for it to be already sorted or inversely sorted

Improvements 2
• The partitioning scheme is complicated enough that it does worse than simple sorts in very small arrays: 6-12 entries
  – Recursion to sort a table of length 3 is wasteful in memory and CPU cycles
• The only real improvement is to use a simpler sort when the partition size gets small
  – If the partition is small just use a simple N^2 sort
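
One way to picture this improvement is a quick sort that hands small partitions to insertion sort. Everything named here is hypothetical: `kCutoff`, `insertionSortRange` and `hybridQuickSort` are invented for this sketch, and the cutoff of 10 is simply a value inside the 6-12 range mentioned above, not a tuned figure.

```cpp
#include <utility>
#include <vector>

const int kCutoff = 10;   // hypothetical threshold for "small" partitions

// Simple N^2 sort applied to ar[first..last], bounds inclusive.
void insertionSortRange(std::vector<int>& ar, int first, int last) {
    for (int i = first + 1; i <= last; ++i) {
        const int item = ar[i];
        int j = i;
        while (j > first && ar[j - 1] > item) {
            ar[j] = ar[j - 1];
            --j;
        }
        ar[j] = item;
    }
}

// Quick sort that stops recursing once a partition is small.
void hybridQuickSort(std::vector<int>& ar, int first, int last) {
    if (last - first + 1 <= kCutoff) {          // small (or empty) partition
        insertionSortRange(ar, first, last);
        return;
    }
    const int pivot = ar[first];
    int top = first + 1;
    int bottom = last;
    while (true) {
        while (top <= bottom && ar[top] <= pivot) ++top;
        while (top <= bottom && ar[bottom] >= pivot) --bottom;
        if (top > bottom) break;
        std::swap(ar[top], ar[bottom]);
    }
    std::swap(ar[first], ar[bottom]);           // pivot into its final slot
    hybridQuickSort(ar, first, bottom - 1);
    hybridQuickSort(ar, bottom + 1, last);
}
```

A call such as `hybridQuickSort(v, 0, static_cast<int>(v.size()) - 1)` sorts the whole vector; the best cutoff depends on the machine and would have to be measured.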

Two more thoughts
• Virtual memory can disrupt sorting when pieces of the array are paged out
  – True for any sort
  – If possible fix the pages in memory
• Quick sort could use threads
  – Spawn a thread for one of the partitions if it were of sufficient size
  – Would need to be large to make thread overhead worthwhile
Heap Sort• Builds a binary tree in the array• The positions of the left and right
sub-trees are implicit rather than needing pointers
• Also O(N log2 N) sort• Rather complicated• Will not be shown
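
Although the full algorithm is not developed here, the implicit positions are easy to illustrate. The index helpers below assume the usual 0-based array layout and are named for this sketch only; the sort itself leans on the C++ standard library's `std::make_heap` and `std::sort_heap` rather than a hand-written heap.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Implicit tree positions in a 0-based array: no pointers are stored.
std::size_t leftChild(std::size_t i)  { return 2 * i + 1; }
std::size_t rightChild(std::size_t i) { return 2 * i + 2; }
std::size_t parent(std::size_t i)     { return (i - 1) / 2; }   // requires i > 0

// Heap sort assembled from the standard library's heap operations.
void heapSort(std::vector<int>& ar) {
    std::make_heap(ar.begin(), ar.end());   // arrange the array as a max-heap
    std::sort_heap(ar.begin(), ar.end());   // turn the heap into ascending order
}
```

Both steps work in place on the array, which is why no extra copy space is needed.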

Heap Sort Performance
• Slowest of the O(N log2 N) sorts
• Advantages:
  – Does not need the recursion of quick sort
  – Does not need the extra space of merge sort
  – Worst case is still O(N log2 N), unlike quick sort

Summary
• Several sorts with varying performance:
  – N^2: Selection, Bubble, Shaker, Insertion
  – N^1.25: Shell
  – N log2 N: Merge, Quick, Heap