TRANSCRIPT
Sorting
HKOI Training Team (Advanced)
2006-01-21
What is sorting?
Given: A list of n elements: A1,A2,…,An
Re-arrange the elements to make them follow a particular order, e.g.
Ascending Order: A1 ≤ A2 ≤ … ≤ An
Descending Order: A1 ≥ A2 ≥ … ≥ An
We will talk about sorting in ascending order only
Why is sorting needed?
Some algorithms work only when the data is sorted, e.g. binary search
Better presentation of data
Often required by problem setters, to reduce the workload in judging
Why learn Sorting Algorithms?
C++ STL already provides a sort() function
Unfortunately, there is no such implementation for Pascal
This is a minor point, though
Why learn Sorting Algorithms?
Most importantly, OI problems do not directly ask for sorting, but their solutions may be closely linked with sorting algorithms
In most cases, C++ STL sort() is useless. You still need to write your own “sort”
So… it is important to understand the idea behind each algorithm, and also their strengths and weaknesses
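As an aside, using the STL sort() mentioned above takes very little code; a minimal sketch (the function name stl_sort_demo is ours, not from the slides):

```cpp
#include <algorithm>
#include <vector>

// Sorting a list with the STL sort() mentioned above.
// std::sort runs in O(n lg n) on average; note that it is NOT stable
// (use std::stable_sort when the order of equal keys matters).
std::vector<int> stl_sort_demo(std::vector<int> a) {
    std::sort(a.begin(), a.end());  // ascending order by default
    return a;
}
```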
Some Sorting Algorithms…
Bubble Sort, Insertion Sort, Selection Sort, Shell Sort, Heap Sort, Merge Sort, Quick Sort, Counting Sort, Radix Sort
How many of them do you know?
Bubble, Insertion, Selection…
Simple, in terms of idea and implementation
Unfortunately, they are inefficient: O(n²) – not good if N is large
The algorithms taught today are far more efficient than these
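For reference, a minimal Insertion Sort sketch (the slides assume it is already known); Shell Sort below builds directly on it:

```cpp
#include <vector>

// Insertion Sort: O(n^2) in general, but close to O(n) when the list is
// almost sorted -- the property Shell Sort exploits below.
void insertion_sort(std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        int key = a[i];
        std::size_t j = i;
        // shift larger elements one slot to the right
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];
            --j;
        }
        a[j] = key;  // insert the element into its place
    }
}
```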
Shell Sort
Named after its inventor, Donald Shell
Observation: Insertion Sort is very efficient when
n is small
the list is almost sorted
Shell Sort
Divide the list into k non-contiguous segments
Elements in each segment are k elements apart
In the beginning, choose a large k so that all segments contain a few elements (e.g. k=n/2)
Sort each segment with Insertion Sort
Example list (n=10, k=5): 2 1 4 7 4 8 3 6 4 7
Shell Sort
Definition: A list is said to be “k-sorted” when A[i] ≤ A[i+k] for 1 ≤ i ≤ n-k
Now the list is 5-sorted
2 1 4 4 4 8 3 6 7 7
Shell Sort
After each pass, reduce k (e.g. by half)
Although the number of elements in each segment increases, the segments are usually mostly sorted
Sort each segment with Insertion Sort again
2 1 4 7 4 8 3 6 4 7
(The slides animate the insertion steps for each segment here.)
Shell Sort
Finally, k is reduced to 1
The list looks mostly sorted
Perform a 1-sort, i.e. the ordinary Insertion Sort, giving 1 2 3 4 4 4 6 7 7 8
Shell Sort – Worse than Ins. Sort?
In Shell Sort, we still have to perform an Insertion Sort at the end
A lot of operations are done before that final Insertion Sort
Isn’t it worse than Insertion Sort?
Shell Sort – Worse than Ins. Sort?
The final Insertion Sort is more efficient than before
All sorting operations before the final one are done efficiently
k-sorts compare far-apart elements
Elements “move” faster, reducing the amount of movement and comparison
Shell Sort – Increment Sequence
In our example, k starts at n/2 and halves in each pass, until it reaches 1, i.e. {n/2, n/4, n/8, …, 1}
This is called the “Shell sequence”
In a good increment sequence, all numbers should be relatively prime to each other
Hibbard’s Sequence: {2^m − 1, 2^(m−1) − 1, …, 7, 3, 1}
Shell Sort – Analysis
Average complexity: O(n^1.5)
Worst case of Shell Sort with the Shell sequence: O(n²)
When will it happen?
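The passes described above can be sketched in C++ with the Shell sequence {n/2, n/4, …, 1}:

```cpp
#include <vector>

// Shell Sort with the Shell sequence {n/2, n/4, ..., 1}. Each pass is an
// insertion sort over elements k apart, so after the pass the list is
// k-sorted; the final pass (k = 1) is the ordinary Insertion Sort.
void shell_sort(std::vector<int>& a) {
    int n = (int)a.size();
    for (int k = n / 2; k >= 1; k /= 2) {  // increment sequence
        for (int i = k; i < n; ++i) {      // insertion sort with gap k
            int key = a[i];
            int j = i;
            while (j >= k && a[j - k] > key) {
                a[j] = a[j - k];
                j -= k;
            }
            a[j] = key;
        }
    }
}
```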
Heap Sort
In Selection Sort, we scan the entire list to search for the maximum, which takes O(n) time
Is there a better way to get the maximum?
With the help of a heap, we may reduce the searching time to O(lg n)
Heap Sort – Build Heap
1. Create a Heap with the list
2 8 5 7 1 4 (drawn as a binary tree: root 2; children 8 and 5; then 7, 1, 4)
Heap Sort
2. Pick the maximum, restore the heap property
Heap Sort
3. Repeat step 2 until heap is empty
Heap Sort – Analysis
Complexity: O(n lg n)
Not a stable sort
Difficult to implement
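A sketch of the steps above in C++: build a max-heap inside the array, then repeatedly swap the maximum (root) to the end and restore the heap property:

```cpp
#include <utility>
#include <vector>

// Restore the heap property at node i of the heap a[0..n-1].
static void sift_down(std::vector<int>& a, int i, int n) {
    while (2 * i + 1 < n) {
        int c = 2 * i + 1;                      // left child
        if (c + 1 < n && a[c + 1] > a[c]) ++c;  // pick the larger child
        if (a[i] >= a[c]) break;                // heap property holds
        std::swap(a[i], a[c]);
        i = c;
    }
}

void heap_sort(std::vector<int>& a) {
    int n = (int)a.size();
    for (int i = n / 2 - 1; i >= 0; --i)        // 1. build the heap
        sift_down(a, i, n);
    for (int end = n - 1; end > 0; --end) {     // 2.-3. pick maximum,
        std::swap(a[0], a[end]);                //       restore, repeat
        sift_down(a, 0, end);
    }
}
```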
Merging
Given two sorted lists, merge them to form a new sorted list
A naïve approach: append the second list to the first list, then sort them
Slow, takes O(n lg n) time
Is there a better way?
Merging
We make use of a property of sorted lists: the first element is always the minimum
What does that imply?
An additional array is needed to store the temporary merged list
Pick the smallest of the un-inserted numbers and append it to the merged list
Merging
List A: 1 3 7 9
List B: 2 3 6
Temp (merged result): 1 2 3 3 6 7 9
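The merging procedure above, sketched in C++ (the function name merge_lists is ours):

```cpp
#include <vector>

// Merge two sorted lists in O(n) time: repeatedly take the smaller of the
// two front elements and append it to the temporary array.
std::vector<int> merge_lists(const std::vector<int>& a,
                             const std::vector<int>& b) {
    std::vector<int> temp;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] <= b[j]) temp.push_back(a[i++]);  // <= keeps ties stable
        else              temp.push_back(b[j++]);
    }
    while (i < a.size()) temp.push_back(a[i++]);   // copy the leftovers
    while (j < b.size()) temp.push_back(b[j++]);
    return temp;
}
```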
Merge Sort
Merge Sort follows the divide-and-conquer approach
Divide: Divide the n-element sequence into two (n/2)-element subsequences
Conquer: Sort the two subsequences recursively
Combine: Merge the two sorted subsequences to produce the answer
Merge Sort
1. Divide the list into two
2. Call Merge Sort recursively to sort the two subsequences
Merge Sort
Example: 2 8 5 7 1 4 is divided into 2 8 5 and 7 1 4
After the recursive calls, the halves become 2 5 8 and 1 4 7
3. Merge the lists (into a temporary array): 1 2 4 5 7 8
4. Move the elements back to the list
Merge Sort – Analysis
Complexity: O(n lg n)
Stable sort
What is a stable sort?
Not an “in-place” sort, i.e. additional memory is required
Easy to implement; no knowledge of other data structures needed
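Putting divide, conquer and combine together, a C++ sketch (the temporary array is allocated per merge for clarity; a shared buffer would be faster):

```cpp
#include <vector>

// Merge Sort on a[lo..hi-1], following the divide-and-conquer steps above.
void merge_sort(std::vector<int>& a, int lo, int hi) {
    if (hi - lo <= 1) return;
    int mid = (lo + hi) / 2;
    merge_sort(a, lo, mid);                   // conquer: sort both halves
    merge_sort(a, mid, hi);
    std::vector<int> temp;                    // combine: merge the halves
    int i = lo, j = mid;
    while (i < mid && j < hi)
        temp.push_back(a[i] <= a[j] ? a[i++] : a[j++]);
    while (i < mid) temp.push_back(a[i++]);
    while (j < hi)  temp.push_back(a[j++]);
    for (int k = 0; k < (int)temp.size(); ++k)
        a[lo + k] = temp[k];                  // move the elements back
}
```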
Stable Sort
What is a stable sort?
The name of a sorting algorithm
A sorting algorithm that has stable performance over all distributions of elements, i.e. Best ≈ Average ≈ Worst
A sorting algorithm that preserves the original order of duplicated keys
Stable Sort
Original List: 1 3a 5 3b 4 2
Stable Sort: 1 2 3a 3b 4 5
Un-stable Sort: 1 2 3b 3a 4 5
Stable Sort
Which sorting algorithms is/are stable?
Stable: Bubble Sort, Merge Sort, Insertion Sort
Un-stable: Selection Sort, Shell Sort, Heap Sort
Stable Sort
In our previous example, what is the difference between 3a and 3b?
When will stable sort be more useful?
Sorting records
Multiple keys
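One way to sort records on multiple keys with a stable sort, sketched with std::stable_sort (the Record type and its fields are made up for this demo): sort by the secondary key first, then stable-sort by the primary key, so ties in the primary key keep their secondary-key order.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical contest record: primary key = score, secondary key = name.
struct Record { std::string name; int score; };

std::vector<Record> rank(std::vector<Record> v) {
    // 1. sort by the secondary key (name, ascending)
    std::stable_sort(v.begin(), v.end(),
        [](const Record& x, const Record& y) { return x.name < y.name; });
    // 2. stable-sort by the primary key (score, descending);
    //    equal scores keep their name order from step 1
    std::stable_sort(v.begin(), v.end(),
        [](const Record& x, const Record& y) { return x.score > y.score; });
    return v;
}
```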
Quick Sort
Quick Sort also uses the divide-and-conquer approach
Divide: Divide the list into two by partitioning
Conquer: Sort the two lists by calling Quick Sort recursively
Combine: Combine the two sorted lists
Quick Sort – Partitioning
Given: A list and a “pivot” (usually an element in the list)
Re-arrange the elements so that
Elements on the left-hand side of the pivot are less than the pivot, and
Elements on the right-hand side of the pivot are greater than or equal to the pivot
[ < pivot | pivot | ≥ pivot ]
Quick Sort – Partitioning
e.g. Take the first element as pivot
Swap all pairs of elements that meet the following criteria:
The left one is greater than or equal to the pivot
The right one is smaller than the pivot
Finally, swap the pivot with A[hi]
Example: 4 6 7 0 9 3 9 4, with pivot 4; pointers lo and hi scan inwards, comparing each element with the pivot
Quick Sort
After partitioning, the pivot 4 separates the elements smaller than it (0 and 3) from those greater than or equal to it (6, 7, 9, 9, 4)
Apply Quick Sort on both sublists
Quick Sort – Analysis
Complexity
Best: O(n lg n)
Worst: O(n²)
Average: O(n lg n)
When will the worst case happen?
How to avoid the worst case?
In-place sort
Not a stable sort
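A C++ sketch of Quick Sort with the first element as pivot; note this uses a simple Lomuto-style partition, not the exact pairwise-swap scheme shown on the slides:

```cpp
#include <utility>
#include <vector>

// Quick Sort on a[lo..hi] (inclusive), first element as pivot.
void quick_sort(std::vector<int>& a, int lo, int hi) {
    if (lo >= hi) return;
    int pivot = a[lo];
    int mid = lo;                     // a[lo+1..mid] holds elements < pivot
    for (int i = lo + 1; i <= hi; ++i)
        if (a[i] < pivot) std::swap(a[++mid], a[i]);
    std::swap(a[lo], a[mid]);         // place pivot between the two parts
    quick_sort(a, lo, mid - 1);       // conquer both sides recursively
    quick_sort(a, mid + 1, hi);
}
```

A mostly-sorted input makes every partition maximally unbalanced with this pivot choice, which is exactly the worst case asked about above.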
Counting Sort
Consider the following list of numbers:
5, 4, 2, 1, 4, 3, 4, 2, 5, 1, 4, 5, 3, 2, 3, 5, 5
Range of numbers = [1,5]
We may count the occurrence of each number:
Value: 1 2 3 4 5
Count: 2 3 3 4 5
Counting Sort (1)
With the frequency table, we can reconstruct the list in ascending order
Value: 1 2 3 4 5
Count: 2 3 3 4 5
1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5
Counting Sort (1)
Can we sort records with this counting sort?
Is this sort stable?
Counting Sort (2)
An alternative way: use cumulative frequency table and a temporary array
Given the following “records”: 3 2 1 2 2 3
Value: 1 2 3
Frequency: 1 3 2
Cumulative: 1 4 6
Counting Sort (2)
Scanning the records 3 2 1 2 2 3 and placing each at the slot given by the cumulative table yields the sorted output 1 2 2 2 3 3
Counting Sort – Analysis
Complexity: O(n+k), where k is the range of numbers
Not an in-place sort
Stable sort (Method 2)
Cannot be applied to data with wide ranges
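Method 2 above (cumulative frequency table, then a right-to-left placement scan), sketched in C++:

```cpp
#include <vector>

// Stable Counting Sort for keys in [1, k]: build the cumulative frequency
// table, then scan the input right-to-left, placing each record at the slot
// given by its cumulative count and decrementing the count.
std::vector<int> counting_sort(const std::vector<int>& a, int k) {
    std::vector<int> count(k + 1, 0), out(a.size());
    for (int x : a) ++count[x];                          // frequency table
    for (int v = 2; v <= k; ++v) count[v] += count[v - 1];  // cumulative
    for (int i = (int)a.size() - 1; i >= 0; --i)
        out[--count[a[i]]] = a[i];       // right-to-left keeps it stable
    return out;
}
```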
Radix Sort
Counting Sort requires a “frequency table”
The size of frequency table depends on the range of elements
If the range is large (e.g. 32-bit), it may be infeasible, if not impossible, to create such a table
Radix Sort
We may consider an integer as a “record of digits”, where each digit is a key
Significance of keys decreases from left to right
e.g. the number 123 consists of 3 digits
Leftmost digit: 1 (most significant)
Middle digit: 2
Rightmost digit: 3 (least significant)
Radix Sort
Now, the problem becomes a multi-key record sorting problem
Sort the records on the least significant key with a stable sort
Repeat with the 2nd least significant key, 3rd least significant key, and so on
Radix Sort
For all keys in these “records”, the range is [0,9] – a narrow range
We apply Counting Sort to do the sorting here
Radix Sort
Original List: 101 097 141 110 997 733
Radix Sort
Sort on the least significant digit
110 101 141 733 097 997
Radix Sort
Sort on the 2nd least significant digit
101 110 733 141 097 997
Radix Sort
Lastly, the most significant digit
097 101 110 141 733 997
Radix Sort – Analysis
Complexity: O(dn), where d is the number of digits
Not an in-place sort
Stable sort
Can we run Radix Sort on
Real numbers?
Strings?
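A C++ sketch combining the ideas above: one stable Counting Sort pass per decimal digit, least significant digit first, matching the 3-digit example:

```cpp
#include <vector>

// LSD Radix Sort for non-negative integers with up to `digits` decimal
// digits. Each pass is a stable Counting Sort on one digit (range [0,9]).
std::vector<int> radix_sort(std::vector<int> a, int digits) {
    int div = 1;
    for (int d = 0; d < digits; ++d, div *= 10) {
        std::vector<int> count(10, 0), out(a.size());
        for (int x : a) ++count[(x / div) % 10];             // frequency
        for (int v = 1; v < 10; ++v) count[v] += count[v - 1];  // cumulative
        for (int i = (int)a.size() - 1; i >= 0; --i)
            out[--count[(a[i] / div) % 10]] = a[i];          // stable placement
        a = out;
    }
    return a;
}
```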
Choosing Sorting Algorithms
List size
Data distribution
Data type
Availability of additional memory
Cost of swapping/assignment
Choosing Sorting Algorithms
List Size
If N is small, any sorting algorithm will do
If N is large (e.g. ≥5000), O(n²) algorithms may not finish the job within the time limit
Data Distribution
If the list is mostly sorted, running Quick Sort with the “first element” pivot is extremely painful
Insertion Sort, on the other hand, is very efficient in this situation
Choosing Sorting Algorithms
Data Type
It is difficult to apply Counting Sort and Radix Sort to real numbers or any other data types that cannot be converted to integers
Availability of Additional Memory
Merge Sort, Counting Sort, and Radix Sort require additional memory
Choosing Sorting Algorithms
Cost of Swapping/Assignment
Moving large records may be very time-consuming
Selection Sort takes at most (n−1) swap operations
Alternatively, swap pointers to the records (i.e. swap the records logically rather than physically)