linear sorts
Linear Sorts
Counting sort
Bucket sort
Radix sort
Linear Sorts
• We will study algorithms that do not depend only on comparing whole keys to be sorted.
• Counting sort
• Bucket sort
• Radix sort
Counting sort
• Assumptions:
– n records
– Each record contains a key and data
– All keys are in the range of 1 to k
• Space
– The unsorted list is stored in A, the sorted list will be stored in an additional array B
– Uses an additional array C of size k
Counting sort
• Main idea:
1. For each key value i, i = 1,…,k, count the number of times the key occurs in the unsorted input array A. Store the results in an auxiliary array, C.
2. Use these counts to compute the offsets. offset_i is used to calculate the location where the record with key value i will be stored in the sorted output list B. The offset_i value is the location where the last record with key i is stored.
• When would you use counting sort?
• How much memory is needed?
Counting Sort
Counting-Sort(A, B, k)
1. for i ← 1 to k
2.     do C[i] ← 0
3. for j ← 1 to length[A]
4.     do C[A[j]] ← C[A[j]] + 1
5. for i ← 2 to k
6.     do C[i] ← C[i] + C[i-1]
7. for j ← length[A] downto 1
8.     do B[C[A[j]]] ← A[j]
9.        C[A[j]] ← C[A[j]] - 1
Input: A[1..n], A[j] ∈ {1, 2, ..., k}
Output: B[1..n], sorted
Uses C[1..k], auxiliary storage
Adapted from Cormen, Leiserson, Rivest
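The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `counting_sort` is an assumption, and because Python lists are 0-based, the output position is shifted by one relative to B[1..n].

```python
def counting_sort(a, k):
    """Return a sorted copy of a, whose items are integers in 1..k."""
    count = [0] * (k + 1)           # count[0] is unused; mirrors C[1..k]
    for key in a:                   # lines 3-4: count each key value
        count[key] += 1
    for i in range(2, k + 1):       # lines 5-6: running totals (offsets)
        count[i] += count[i - 1]
    out = [None] * len(a)
    for key in reversed(a):         # lines 7-9: place from the back
        out[count[key] - 1] = key   # -1 because Python lists are 0-based
        count[key] -= 1
    return out


print(counting_sort([4, 3, 1, 4, 4, 3], 4))
```

The call uses a six-element list with k = 4 as a small check.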
A      4  3  1  4  4  3
index  1  2  3  4  5  6
k = 4, length = 6
Counting-Sort(A, B, k)
1. for i ← 1 to k
2.     do C[i] ← 0
3. for j ← 1 to length[A]
4.     do C[A[j]] ← C[A[j]] + 1
5. for i ← 2 to k
6.     do C[i] ← C[i] + C[i-1]
C after lines 1-2:  0  0  0  0
C after lines 3-4:  1  0  2  3
C after lines 5-6:  1  1  3  6
7. for j ← length[A] downto 1
8.     do B[C[A[j]]] ← A[j]
9.        C[A[j]] ← C[A[j]] - 1
A      4  3  1  4  4  3
index  1  2  3  4  5  6
B      _  _  _  _  _  _
index  1  2  3  4  5  6
C      1  1  3  6
Regions of B: position 1 holds key 1, positions 2-3 hold key 3, positions 4-6 hold key 4
Counting sort
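Original list A (position, key, name):
 1  3 Clinton
 2  4 Smith
 3  1 Xu
 4  2 Adams
 5  3 Dunn
 6  4 Yi
 7  2 Baum
 8  1 Fu
 9  3 Gold
10  1 Lu
11  1 Land
C (one entry per key):
key             1  2  3  4
initial         0  0  0  0
final counts    4  2  3  2
"offsets"       4  6  9  11
After the last three records (Land, Lu, Gold) have been placed, the offsets read (4)(3)2, 6, (9)8, 11, where values in parentheses have been replaced.
B (sort buckets) at that point (position, key, name):
 3  1 Lu
 4  1 Land
 9  3 Gold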
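The record example above can be reproduced with a short sketch that sorts (key, name) pairs instead of bare keys. The function name `counting_sort_records` and the tuple layout are assumptions made for illustration; the steps are the same count, offset, and back-to-front placement as in Counting-Sort.

```python
def counting_sort_records(records, k):
    """Stable counting sort of (key, data) pairs with keys in 1..k."""
    count = [0] * (k + 1)
    for key, _ in records:
        count[key] += 1
    for i in range(2, k + 1):
        count[i] += count[i - 1]        # count[i] = last slot for key i
    out = [None] * len(records)
    for rec in reversed(records):       # back to front keeps ties in order
        out[count[rec[0]] - 1] = rec
        count[rec[0]] -= 1
    return out


people = [(3, "Clinton"), (4, "Smith"), (1, "Xu"), (2, "Adams"),
          (3, "Dunn"), (4, "Yi"), (2, "Baum"), (1, "Fu"),
          (3, "Gold"), (1, "Lu"), (1, "Land")]
result = counting_sort_records(people, 4)
for position, (key, name) in enumerate(result, start=1):
    print(position, key, name)
```

Because the last loop walks the input from the back, records with equal keys keep their original relative order, which is what makes the sort stable.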
Analysis:
• O(k + n) time
– What if k = O(n)?
• But sorting takes Ω(n lg n)?
• Requires k + n extra storage.
• This is a stable sort: it preserves the original order of equal keys.
• Clearly no good for sorting 32-bit values.
Bucket sort
• Keys are distributed uniformly in interval [0, 1)
• The records are distributed into n buckets
• The buckets are sorted using one of the well-known sorts
• Finally the buckets are combined
Bucket sort
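Input A (positions 1-10): .78 .17 .39 .26 .72 .94 .21 .12 .23 .68
Step 1: distribute (buckets 0-9, "/" marks the end of a list)
0: /
1: .12 → .17 /
2: .23 → .21 → .26 /
3: .39 /
4: /
5: /
6: .68 /
7: .72 → .78 /
8: /
9: .94 /
Step 2: sorted
0: /
1: .12 → .17 /
2: .21 → .23 → .26 /
3: .39 /
4: /
5: /
6: .68 /
7: .72 → .78 /
8: /
9: .94 /
Step 3: combine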
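The three steps can be sketched in Python as below. This is an illustrative sketch under stated assumptions: the name `bucket_sort` is hypothetical, each bucket is a plain Python list rather than a linked list, and Python's built-in `list.sort` stands in for "one of the well-known sorts".

```python
def bucket_sort(keys):
    """Sort floats in [0, 1) by distributing them into len(keys) buckets."""
    n = len(keys)
    buckets = [[] for _ in range(n)]
    for x in keys:
        # Step 1: distribute; min() guards against rounding up to n
        buckets[min(n - 1, int(n * x))].append(x)
    for b in buckets:
        b.sort()                                # Step 2: sort each bucket
    return [x for b in buckets for x in b]      # Step 3: combine


data = [.78, .17, .39, .26, .72, .94, .21, .12, .23, .68]
print(bucket_sort(data))
```

With ten keys there are ten buckets, so a key x lands in bucket int(10 * x), matching the bucket numbers 0-9 in the example.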
Analysis
• p = 1/n, the probability that a key goes to bucket i.
• Expected size of a bucket is np = n · (1/n) = 1.
• The expected time to sort one bucket is Θ(1).
• Overall expected time is Θ(n).
How did IBM get rich originally?
• In the early 1900s IBM produced punched card readers for census tabulation.
• Cards are 80 columns with 12 places for punches per column. Only 10 places are needed for decimal digits.
– Picture of punch card.
• Sorters had 12 bins.
• Key idea: sort the least significant digit first.
A punched card
Card punching machine
IBM card punching machine
Hollerith’s tabulating machines
• As the cards were fed through a "tabulating machine," pins passed through the positions where holes were punched, completing an electrical circuit that registered a value.
• The 1880 census in the U.S. took seven years to complete.
• With Hollerith's "tabulating machines," the 1890 census took the Census Bureau six weeks.
Card sorting machine
IBM’s card sorting machine
Radix sort
• Main idea
– Break key into "digit" representation: key = i_d, i_(d-1), …, i_2, i_1
– A "digit" can be a number in any base, a character, etc.
• Radix sort:
for i = 1 to d
    sort "digit" i using a stable sort
• Analysis: Θ(d · (stable sort time)), where d is the number of "digits"
Radix sort
• Which stable sort?
– Since the range of values of a digit is small, the best stable sort to use is Counting Sort.
– When counting sort is used, the time complexity is Θ(d (n + k)), where k is the range of a "digit".
• When k = O(n), this is Θ(d n)
Radix sort- with decimal digits
Position       1    2    3    4    5    6    7    8
Input list     178  139  326  572  294  321  910  368
After digit 1  910  321  572  294  326  178  368  139
After digit 2  910  321  326  139  368  572  178  294
Sorted list    139  178  294  321  326  368  572  910
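The decimal example can be traced with a small sketch of least-significant-digit radix sort that uses a counting sort for each digit. The names `counting_pass` and `radix_sort` are assumptions made for illustration, and digits are numbered from 0 in the code (digit 0 is the ones digit).

```python
def counting_pass(keys, digit, base=10):
    """Stable counting sort of keys on one digit (0 = least significant)."""
    count = [0] * base
    for x in keys:
        count[(x // base**digit) % base] += 1
    for v in range(1, base):
        count[v] += count[v - 1]        # count[v] = keys with digit <= v
    out = [0] * len(keys)
    for x in reversed(keys):            # back to front keeps the pass stable
        v = (x // base**digit) % base
        count[v] -= 1
        out[count[v]] = x
    return out


def radix_sort(keys, d, base=10):
    """LSD radix sort; returns the list as it stands after each of d passes."""
    passes = []
    for digit in range(d):
        keys = counting_pass(keys, digit, base)
        passes.append(keys)
    return passes


passes = radix_sort([178, 139, 326, 572, 294, 321, 910, 368], d=3)
for p in passes:
    print(p)
```

Each printed row corresponds to one pass over a digit, and the last row is the sorted list.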
Radix sort with unstable digit sort
Position                  1   2
Input list                17  13
After sorting on digit 1  13  17
After sorting on digit 2  17  13
List not sorted, since the digit sort is unstable and both second digits are equal to 1.
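One way to see the failure in code is to make the digit sort unstable on purpose. In the sketch below (the names `digit_pass` and `radix_two_digits` are assumptions for illustration), the only difference between the stable and unstable variants is the direction of the final placement scan: scanning front to back while decrementing offsets reverses the order of equal digits.

```python
def digit_pass(keys, digit, stable=True):
    """Counting sort on one decimal digit; unstable if scanned forward."""
    count = [0] * 10
    for x in keys:
        count[(x // 10**digit) % 10] += 1
    for v in range(1, 10):
        count[v] += count[v - 1]
    out = [0] * len(keys)
    scan = reversed(keys) if stable else keys   # forward scan reverses ties
    for x in scan:
        v = (x // 10**digit) % 10
        count[v] -= 1
        out[count[v]] = x
    return out


def radix_two_digits(keys, stable):
    """Radix sort of two-digit numbers, least significant digit first."""
    for digit in range(2):
        keys = digit_pass(keys, digit, stable)
    return keys


print(radix_two_digits([17, 13], stable=True))
print(radix_two_digits([17, 13], stable=False))
```

With the unstable pass, the second pass sees two equal digits and undoes the order established by the first pass.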
Is Quicksort stable?
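• Note that the data is sorted by key
• Since the sort is unstable, it cannot be used for radix sort
Each record is written as key then data, so 51 means key 5, data 1.
Position                   1   2   3
Input                      51  55  48
After partition of 0 to 2  48  55  51
After partition of 1 to 2  48  55  51
The two records with key 5 end up in reverse order.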
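The rows above can be reproduced with a short sketch. The slide does not say which partition scheme it uses; the code below assumes the Lomuto scheme with the last element as pivot and 0-based indices, and the names `partition` and `quicksort` are illustrative. Only the key (the first tuple field) is compared.

```python
def partition(a, lo, hi):
    """Lomuto partition on keys; a[hi] is the pivot (an assumed scheme)."""
    pivot_key = a[hi][0]
    i = lo - 1
    for j in range(lo, hi):
        if a[j][0] <= pivot_key:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[hi] = a[hi], a[i + 1]   # move the pivot into place
    return i + 1


def quicksort(a, lo, hi):
    """Sort (key, data) records in place by key."""
    if lo < hi:
        q = partition(a, lo, hi)
        quicksort(a, lo, q - 1)
        quicksort(a, q + 1, hi)


records = [(5, 1), (5, 5), (4, 8)]
quicksort(records, 0, 2)
print(records)
```

The first partition swaps the pivot past both key-5 records, which is where their relative order is lost.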
Is Heapsort stable?
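• Note that the data is sorted by key
• Since the sort is unstable, it cannot be used for radix sort
Each record is written as key then data, so 51 means key 5, data 1.
Position                             1   2
Input                                51  55
Complete binary tree, and max heap:  51 at the root, 55 as its child
Heap after swap                      55  51
Sorted                               55  51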
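A minimal heapsort sketch shows the same effect on the two records above. The names `sift_down` and `heapsort` are assumptions for illustration, the heap is stored in a 0-based list, and only the key (the first tuple field) is compared.

```python
def sift_down(a, root, end):
    """Restore the max-heap property (by key) for the heap a[0:end]."""
    while True:
        child = 2 * root + 1
        if child >= end:
            return
        if child + 1 < end and a[child + 1][0] > a[child][0]:
            child += 1                      # pick the larger child
        if a[root][0] >= a[child][0]:
            return
        a[root], a[child] = a[child], a[root]
        root = child


def heapsort(a):
    """Sort (key, data) records in place by key."""
    n = len(a)
    for root in range(n // 2 - 1, -1, -1):  # build the max heap
        sift_down(a, root, n)
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]         # move the max to its final slot
        sift_down(a, 0, end)


records = [(5, 1), (5, 5)]
heapsort(records)
print(records)
```

The input is already a max heap because both keys are equal, so the single swap of root and last element is what reverses the two records.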
Example
Sort 1 million 64-bit numbers.
We could use an in-place comparison sort, which would run in Θ(n lg n) in the average case. lg 1,000,000 ≈ 20 passes over the data.
We can treat a 64-bit number as a 4-digit, radix-2^16 number. So d = 4, k = 2^16, n = 1,000,000.
Θ(d (n + k)) = Θ(4 (2^16 + n)). This takes 4 * 2 passes over the data.
16 bits d3 | 16 bits d2 | 16 bits d1 | 16 bits d0
64-bit number = d3 (2^16)^3 + d2 (2^16)^2 + d1 (2^16)^1 + d0 (2^16)^0
Adapted from Cormen, Leiserson, Rivest
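The digit decomposition can be sketched with shifts and masks. The names `digits16` and `radix_sort_u64` are assumptions for illustration, and the keys are assumed to be unsigned 64-bit integers. Each of the four rounds reads the data twice (once to count, once to place), which is one way to read the "4 * 2 passes" figure. The placement step here uses start offsets with a front-to-back scan, a stable variant that differs from the back-to-front loop in Counting-Sort.

```python
MASK = (1 << 16) - 1            # k = 2**16 possible values per digit


def digits16(x):
    """Split an unsigned 64-bit integer into [d0, d1, d2, d3]."""
    return [(x >> (16 * i)) & MASK for i in range(4)]


def radix_sort_u64(keys):
    """LSD radix sort with d = 4 counting-sort rounds of radix 2**16."""
    for i in range(4):
        shift = 16 * i
        count = [0] * (MASK + 1)
        for x in keys:                      # first read of the data: count
            count[(x >> shift) & MASK] += 1
        total = 0
        for v in range(MASK + 1):           # turn counts into start offsets
            count[v], total = total, total + count[v]
        out = [0] * len(keys)
        for x in keys:                      # second read of the data: place
            v = (x >> shift) & MASK
            out[count[v]] = x
            count[v] += 1
        keys = out
    return keys


x = 0x0123456789ABCDEF
print([hex(d) for d in digits16(x)])
sample = [2**63 + 5, 7, 2**48, 2**16, 65535, 0, 2**64 - 1, 123456789012345]
print(radix_sort_u64(sample))
```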