Sorting Algorithms
Motivation
Example: Phone Book Searching
If the phone book was in random order, we would probably never use the phone!
  Let’s say ½ second per entry
  There are 70,000 households in Ilam
  Worst case: 70,000 × ½ second = 35,000 seconds, about 10 hrs to find a phone number!
  Best time: ½ second
  Average time is about 5 hrs
Motivation
The phone book is sorted:
  Jump directly to the letter of the alphabet we are interested in
  Scan quickly to find the first two letters that are really close to the name we are interested in
  Flip whole pages at a time if not close enough
The Big Idea
Take a set of N randomly ordered pieces of data a_j and rearrange the data such that, for all j (j >= 0 and j < N-1), a_j R a_{j+1} holds for relational operator R:
  a_0 R a_1 R a_2 R … R a_j R … R a_{N-2} R a_{N-1}
If R is <=, we are doing an ascending sort: each consecutive item in the list is at least as large as the previous one
If R is >=, we are doing a descending sort: items get smaller (or stay equal) as we move down the list
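The condition above can be made concrete with a small check that walks adjacent pairs and tests R on each one. This is only a sketch: the helper name isOrdered and the choice of std::vector<int> are assumptions made for illustration, not something the slides define.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Returns true when every adjacent pair satisfies the relation r,
// i.e. a[0] R a[1] R ... R a[N-1].
// isOrdered is a hypothetical helper name, not taken from the slides.
template <typename Relation>
bool isOrdered(const std::vector<int>& a, Relation r)
{
    for (std::size_t j = 0; j + 1 < a.size(); j++)
    {
        if (!r(a[j], a[j + 1])) return false;
    }
    return true;
}
```

Passing std::less_equal<int>() as r tests for ascending order, and std::greater_equal<int>() tests for descending order.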
Queue Example: Radix Sort
Also called bin sort:
  Repeatedly shuffle data into small bins
  Collect data from bins into new deck
  Repeat until sorted
Appropriate method of shuffling and collecting?
  For integers, key is to shuffle data into bins on a per digit basis, starting with the rightmost (ones digit)
  Collect in order, from bin 0 to bin 9, and left to right within a bin
Radix Sort: Ones Digit
Data: 459 254 472 534 649 239 432 654 477
  Bin 0
  Bin 1
  Bin 2  472 432
  Bin 3
  Bin 4  254 534 654
  Bin 5
  Bin 6
  Bin 7  477
  Bin 8
  Bin 9  459 649 239
After Call: 472 432 254 534 654 477 459 649 239
Radix Sort: Tens Digit
Data: 472 432 254 534 654 477 459 649 239
  Bin 0
  Bin 1
  Bin 2
  Bin 3  432 534 239
  Bin 4  649
  Bin 5  254 654 459
  Bin 6
  Bin 7  472 477
  Bin 8
  Bin 9
After Call: 432 534 239 649 254 654 459 472 477
Radix Sort: Hundreds Digit
Data: 432 534 239 649 254 654 459 472 477
  Bin 0
  Bin 1
  Bin 2  239 254
  Bin 3
  Bin 4  432 459 472 477
  Bin 5  534
  Bin 6  649 654
  Bin 7
  Bin 8
  Bin 9
Final Sorted Data: 239 254 432 459 472 477 534 649 654
Radix Sort Algorithm
Begin with current digit as one’s digit
While there is still a digit on which to classify
{
    For each number in the master list,
        Add that number to the appropriate sublist keyed on the current digit
    For each sublist from 0 to 9
        For each number in the sublist
            Remove the number from the sublist and append to a new master list
    Advance the current digit one place to the left.
}
Radix Sort and Queues
Each list (the master list (all items) and bins (per digit)) needs to be first in, first out ordered: perfect for a queue.
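The algorithm and the queue observation above can be sketched roughly as follows. The function name radixSort and the maxDigits parameter are assumptions introduced for this example. The sketch also assumes non-negative integers and a small maxDigits, so that the divisor stays within the range of int.

```cpp
#include <queue>
#include <vector>

// Sorts non-negative integers in ascending order, one decimal digit at a time.
// radixSort and maxDigits are illustrative names, not taken from the slides.
std::vector<int> radixSort(const std::vector<int>& input, int maxDigits)
{
    std::queue<int> master;
    for (int value : input) master.push(value);

    int divisor = 1; // 1 selects the ones digit, 10 the tens digit, and so on
    for (int digit = 0; digit < maxDigits; digit++)
    {
        std::vector<std::queue<int>> bins(10);

        // Distribute: the front of the master queue goes to the back of its bin.
        while (!master.empty())
        {
            int value = master.front();
            master.pop();
            bins[(value / divisor) % 10].push(value);
        }

        // Collect: bin 0 first, bin 9 last, first in first out within a bin.
        for (std::queue<int>& bin : bins)
        {
            while (!bin.empty())
            {
                master.push(bin.front());
                bin.pop();
            }
        }
        divisor *= 10; // advance the current digit one place to the left
    }

    std::vector<int> sorted;
    while (!master.empty())
    {
        sorted.push_back(master.front());
        master.pop();
    }
    return sorted;
}
```

Calling radixSort on the slide data with maxDigits = 3 runs the same three passes as the ones, tens and hundreds slides.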
A Quick Tangent
How fast have the sorts you’ve seen before worked?
  Bubble, Insertion, Selection: O(n^2)
We will see sorts that are better, and in fact optimal for general sorting algorithms:
  Merge/Quicksort: O(n log n) (for Quicksort, on average)
How fast is radix sort?
Analysis of Radix Sort
Let n be the number of items to sort
Outer loop control is on maximum length of input numbers in digits (let this be d)
For every digit,
  Assign each number to sort to a group (n operations)
  Pull each number back into the master list (n operations)
Overall running time: 2 * n * d => O(d * n), which is O(n) when d is a fixed constant
Analysis of Radix Sort
O(n log n) is optimal for general sorting algorithms
Radix sort is O(n)? How does that work?
Radix sort is not a general sorting algorithm: it can’t sort arbitrary information. Rectangle objects, Automobile objects, etc. are no good.
  Can sort items that can be broken into constituent pieces and whose pieces can be ordered
  Integers (digits), Strings (characters)
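To illustrate the "Strings (characters)" case, here is a minimal sketch of the same bin-and-collect idea applied to strings. It assumes every string has exactly the same length and contains only lowercase letters 'a' to 'z'. The name radixSortStrings and both of those restrictions are choices made for this example, not something stated in the slides.

```cpp
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// Sorts equal-length lowercase strings, processing the rightmost character first.
// Assumes every string has exactly `length` characters, all in 'a'..'z'.
std::vector<std::string> radixSortStrings(const std::vector<std::string>& input,
                                          std::size_t length)
{
    std::vector<std::string> master = input;
    for (std::size_t pos = length; pos-- > 0; )
    {
        // One first in, first out bin per letter.
        std::vector<std::queue<std::string>> bins(26);
        for (const std::string& s : master)
        {
            bins[static_cast<std::size_t>(s[pos] - 'a')].push(s);
        }

        // Collect from bin 'a' to bin 'z', preserving order within a bin.
        master.clear();
        for (std::queue<std::string>& bin : bins)
        {
            while (!bin.empty())
            {
                master.push_back(bin.front());
                bin.pop();
            }
        }
    }
    return master;
}
```

The pieces here are characters rather than digits, but the structure is the same: one pass per position, each pass costing about 2 * n operations.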
Sorting Algorithms
What does sorting really require?
  Compare pieces of data at different positions
  Swap the data at those positions until order is correct
Before: 20 3 18 9 5
After:  3 5 9 18 20
Selection Sort
#include <utility>  // std::swap
// Returns the index of the smallest element in a[first..last-1].
int minimumIndex(int* a, int first, int last)
{
    int minIndex = first;
    for (int j = first + 1; j < last; j++)
    {
        if (a[j] < a[minIndex]) minIndex = j;
    }
    return minIndex;
}
// Sorts a[0..size-1] in ascending order.
void selectionSort(int* a, int size)
{
    for (int k = 0; k < size - 1; k++)
    {
        int index = minimumIndex(a, k, size);
        std::swap(a[k], a[index]);
    }
}
Selection Sort
What is selection sort doing?
  Repeatedly
    Finding smallest element by searching through the unsorted part of the list
    Swapping it into the front of the list
    Moving “front of list” forward by 1
Selection Sort Step Through
Start:                                                20  3 18  9  5
k = 0: minimumIndex(a, 0, 5) = 1, swap(a[0], a[1]):    3 20 18  9  5
k = 1: minimumIndex(a, 1, 5) = 4, swap(a[1], a[4]):    3  5 18  9 20
k = 2: minimumIndex(a, 2, 5) = 3, swap(a[2], a[3]):    3  5  9 18 20
k = 3: minimumIndex(a, 3, 5) = 3, swap(a[3], a[3]):    3  5  9 18 20
k = 4 = size-1: Done!
Cost of Selection Sort
How many times through outer loop?
  Iteration is for k = 0 to < (N-1) => N-1 times
How many comparisons in minimumIndex?
  Depends on outer loop. Consider 5 elements:
    K = 0  j = 1, 2, 3, 4
    K = 1  j = 2, 3, 4
    K = 2  j = 3, 4
    K = 3  j = 4
  Total comparisons is equal to 4 + 3 + 2 + 1, which is (N-1) + (N-2) + (N-3) + … + 1
What is that sum?
Cost of Selection Sort
(N-1) + (N-2) + (N-3) + … + 3 + 2 + 1
Pair the largest remaining term with the smallest: [(N-1) + 1] + [(N-2) + 2] + [(N-3) + 3] + …
= N + N + N + … => repeated addition of N
How many repeated additions?
  There were N-1 total starting terms to add, and we grouped every 2 together: (N-1)/2, approximately N/2, repeated additions
=> N * (N-1)/2, approximately N * N/2 = O(N^2) comparisons
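The sum can be checked empirically by instrumenting the sort. The sketch below is a hypothetical variant of selection sort that inlines the minimum search and counts element comparisons. The name countSelectionComparisons is made up for this example. For N items it should report N * (N-1) / 2 comparisons regardless of the input order.

```cpp
#include <utility>

// Selection sort that also returns how many element comparisons it made.
// countSelectionComparisons is a hypothetical name used only for this sketch.
long countSelectionComparisons(int* a, int size)
{
    long comparisons = 0;
    for (int k = 0; k < size - 1; k++)
    {
        int minIndex = k;
        for (int j = k + 1; j < size; j++)
        {
            comparisons++; // one comparison of a[j] against the current minimum
            if (a[j] < a[minIndex]) minIndex = j;
        }
        std::swap(a[k], a[minIndex]);
    }
    return comparisons;
}
```

For the five-element example 20 3 18 9 5, this makes 4 + 3 + 2 + 1 = 10 comparisons, which matches 5 * 4 / 2.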
Insertion Sort
// Sorts a[0..size-1] in ascending order.
void insertionSort(int* a, int size)
{
    for (int k = 1; k < size; k++)
    {
        int temp = a[k];   // new item to place into the sorted part a[0..k-1]
        int position = k;
        // Shift larger items right by 1 until the slot for temp is found.
        while (position > 0 && a[position-1] > temp)
        {
            a[position] = a[position-1];
            position--;
        }
        a[position] = temp;
    }
}
Insertion Sort
List of size 1 (first element) is already sorted
Repeatedly
  Chooses new item to place in list (a[k])
  Starting at back of the sorted part of the list, if new item is less than item at current position, shift current data right by 1.
  Repeat shifting until new item is not less than thing in front of it.
  Insert the new item
Insertion Sort Step Through
Array positions: A[0] A[1] A[2] A[3] A[4]
Start: 20 3 18 9 5 (single card list already sorted)
Move 3 left until hits something smaller: 3 20 18 9 5 (now two sorted)
Move 18 left until hits something smaller: 3 18 20 9 5 (now three sorted)
Move 9 left until hits something smaller: 3 9 18 20 5 (now four sorted)
Move 5 left until hits something smaller: 3 5 9 18 20 (now all five sorted)
Done
Cost of Insertion Sort
Outer loop
  K = 1 to < size: 1, 2, 3, 4 => N-1 times
Inner loop
  Worst case: Compare against all items in list (inserting new smallest thing)
    K = 1, 1 step (position = k = 1, while position > 0)
    K = 2, 2 steps [position = 2, 1]
    K = 3, 3 steps [position = 3, 2, 1]
    K = 4, 4 steps [position = 4, 3, 2, 1]
Again, worst case total comparisons is equal to sum of i from 1 to N-1, which is O(N^2)
Cost of Swaps
Selection Sort:
void selectionSort(int* a, int size)
{
    for (int k = 0; k < size - 1; k++)
    {
        int index = minimumIndex(a, k, size);
        swap(a[k], a[index]);   // std::swap from <utility>
    }
}
One swap each time through the outer loop (N-1 in total), for O(N) swaps
Cost of Swaps
Insertion Sort:
void insertionSort(int* a, int size)
{
    for (int k = 1; k < size; k++)
    {
        int temp = a[k];
        int position = k;
        while (position > 0 && a[position-1] > temp)
        {
            a[position] = a[position-1];
            position--;
        }
        a[position] = temp;
    }
}
Do a shift almost every time we do a compare, so O(n^2) shifts
Shifts are faster than swaps (1 step vs 3 steps)
Are we doing few enough of them to make up the difference?
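One way to explore that question is to count the data movements directly. The two functions below are instrumented copies of the sorts, written only for this comparison. The names countSelectionSwaps and countInsertionShifts are assumptions, and the counters measure swaps and shifts, not running time.

```cpp
#include <utility>

// Selection sort that returns the number of swaps performed.
// Both counters here are illustrative instrumentation, not part of the slides' code.
int countSelectionSwaps(int* a, int size)
{
    int swaps = 0;
    for (int k = 0; k < size - 1; k++)
    {
        int minIndex = k;
        for (int j = k + 1; j < size; j++)
        {
            if (a[j] < a[minIndex]) minIndex = j;
        }
        std::swap(a[k], a[minIndex]);
        swaps++; // exactly one swap per pass of the outer loop
    }
    return swaps;
}

// Insertion sort that returns the number of single-element shifts performed.
int countInsertionShifts(int* a, int size)
{
    int shifts = 0;
    for (int k = 1; k < size; k++)
    {
        int temp = a[k];
        int position = k;
        while (position > 0 && a[position - 1] > temp)
        {
            a[position] = a[position - 1];
            position--;
            shifts++; // one shift per item moved right
        }
        a[position] = temp;
    }
    return shifts;
}
```

On the example 20 3 18 9 5, selection sort makes 4 swaps and insertion sort makes 7 shifts. On reversed input of five items, insertion sort makes 10 shifts, the worst case for that size.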
Another Issue - Memory
Space requirements for each sort?
  All of these sorts require the space to hold the array - O(N)
  Require temp variable for swaps
  Require a handful of counters
Can all be done “in place”, so equivalent in terms of memory costs
Not all sorts can be done in place though!
Which O(n^2) Sort to Use?
Insertion sort is the winner:
  Worst case requires all comparisons
  Most cases don’t (jump out of while loop early)
Selection sort uses for loops, which go all the way through each time
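The early exit can be seen by counting how often insertion sort evaluates a[position-1] > temp. The function below is a sketch written for this purpose. The name countInsertionComparisons is hypothetical, and the while loop is restructured slightly so each comparison can be counted.

```cpp
// Insertion sort that returns how many times a[position-1] > temp was evaluated.
// countInsertionComparisons is a hypothetical name used only for this sketch.
long countInsertionComparisons(int* a, int size)
{
    long comparisons = 0;
    for (int k = 1; k < size; k++)
    {
        int temp = a[k];
        int position = k;
        while (position > 0)
        {
            comparisons++;
            if (!(a[position - 1] > temp)) break; // early exit: slot found
            a[position] = a[position - 1];
            position--;
        }
        a[position] = temp;
    }
    return comparisons;
}
```

On already sorted input of five items this makes only 4 comparisons (one per new item), while reversed input of five items needs all 10. Selection sort makes 10 comparisons on five items in either case.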
Tradeoffs
Given random data, when is it more efficient to:
  Just search
  versus
  Insertion Sort and search
Assume Z searches
Search on random data: Z * O(n)
Sort and binary search: O(n^2) + Z * log2(n)
Tradeoffs
Just searching is at least as cheap as sorting first when:
  Z * n <= n^2 + (Z * log2(n))
  Z * n - Z * log2(n) <= n^2
  Z * (n - log2(n)) <= n^2
  Z <= n^2 / (n - log2(n))
For large n, log2(n) is dwarfed by n in (n - log2(n))
  Z <= n^2 / n
  Z <= n (approximately)
So sorting first only pays off when more than about n searches are expected
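The break-even point can be computed for a concrete n. The helpers below are a rough sketch under the same simplifications as the derivation: constant factors are ignored, a linear search costs n, a binary search costs log2(n), and insertion sort costs n^2. All three function names are made up for this example.

```cpp
#include <cmath>

// Estimated cost of Z linear searches over n unsorted items.
double costJustSearch(double n, double z)
{
    return z * n;
}

// Estimated cost of one insertion sort (n^2) plus Z binary searches (log2 n each).
double costSortThenSearch(double n, double z)
{
    return n * n + z * std::log2(n);
}

// Largest Z for which just searching is no more expensive: n^2 / (n - log2 n).
double breakEvenSearches(double n)
{
    return n * n / (n - std::log2(n));
}
```

For n = 1024, breakEvenSearches gives roughly 1034, which is close to n as the approximation predicts. With 1000 searches, just searching is the cheaper estimate, and with 2000 searches, sorting first is.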
Improving Sorts
Better sorting algorithms rely on divide and conquer (recursion)
  Find an efficient technique for splitting data
  Sort the splits separately
  Find an efficient technique for merging the data
We’ll see two examples
  One does most of its work splitting
  One does most of its work merging