Bucket & Radix Sorts

Upload: dina-hubbard

Post on 04-Jan-2016

Bucket & Radix Sorts

Efficient Sorts

• QuickSort: O(n log n) average, O(n^2) worst case
• MergeSort: O(n log n)

Coincidence?

Comparisons

• N! possible orderings of N items
  – Represent as decision tree
• A binary decision tree that can reach N! states has depth at least log2(N!)

Comparisons

min height ≥ log2(N!)

= log2(1 * 2 * 3 * … * N)

= log2(1) + log2(2) + … + log2(N)

≥ log2(N/2) + … + log2(N)    (just drop the first half of the terms)

≥ (N/2) log2(N/2)    (each of the remaining N/2 terms is ≥ log2(N/2))

= (N/2)(log2(N) - log2(2)) = (N/2)(log2(N) - 1)

= (N log2(N) - N)/2

= Ω(N log N)    (Omega: asymptotic lower bound)
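The chain of inequalities above can be spot-checked numerically. The following is a minimal sketch, not part of the original slides; the helper names `log2_factorial` and `half_bound` are made up for illustration.

```python
import math

def log2_factorial(n):
    """log2(n!) computed as log2(1) + log2(2) + ... + log2(n)."""
    return sum(math.log2(i) for i in range(1, n + 1))

def half_bound(n):
    """The (n/2) * log2(n/2) bound obtained by dropping the first half of the terms."""
    return (n / 2) * math.log2(n / 2)

# Print the lower bound, the exact value, and n*log2(n) side by side.
for n in (4, 16, 1024):
    print(n, half_bound(n), log2_factorial(n), n * math.log2(n))
```

For each n, the middle column (the minimum decision tree height) sits between the (n/2) log2(n/2) bound and n log2(n), which is what the Ω(N log N) claim says.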

Efficient Sorts To Date

• QuickSort: O(n log n) average, O(n^2) worst case (comparison based)
• MergeSort: O(n log n) (comparison based)

Coincidence?
• Comparison sorts can never beat n log n

Sorting with Buckets

• Ever used a sorting stick?

Bucket Sort

Bucket Sort:
  For each item                            O(n), n = num items
    Pick correct bucket and insert item    O(1)
  Make new List                            O(1)
  For each bucket                          O(k), k = num buckets
    Add contents to List                   O(1)
  Return List                              O(1)
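The pseudocode above could be written in Python roughly as follows. This is a sketch under assumptions: the function name `bucket_sort` and the `bucket_of` callback (which maps an item to a bucket index from 0 to `num_buckets - 1`) are illustrative, not from the slides.

```python
def bucket_sort(items, num_buckets, bucket_of):
    """Distribute items into buckets, then concatenate the buckets in order.

    bucket_of(item) must return an index in range(num_buckets).
    Items sharing a bucket keep their original relative order.
    """
    buckets = [[] for _ in range(num_buckets)]
    # For each item: pick the correct bucket and insert the item.
    for item in items:
        buckets[bucket_of(item)].append(item)
    # Make a new list, then add each bucket's contents to it.
    result = []
    for bucket in buckets:
        result.extend(bucket)
    return result

# One bucket per possible value gives a perfect sort.
print(bucket_sort([3, 1, 2, 1, 0], 4, lambda x: x))
```

With fewer buckets than distinct values (for example `bucket_of=lambda x: x // 10`), the output is only grouped by bucket, which is the granularity limit mentioned on the next slide.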

Bucket Sort

• Bucket Sort: O(n + k)
  – Sort granularity limited by buckets
• Perfect sort: k = range of values
  – Sort 30,000 integers perfectly
    • Bucket: 30,000 + 4,000,000,000 (4 billion buckets to represent all ints)
    • vs. n log n: 30,000 log 30,000 ≈ 450,000

Bucket Sort

• Bucket Sort: O(n + k)
  – Sort granularity limited by buckets
• Perfect sort: k = range of values
  – Efficient if k is small in relation to n
    • Sort 4 million people in OR by Zip Code
      – Bucket: n = 4,000,000, k = 1,000 (fewer than 1,000 zips in OR); time ~4,001,000
      – N log N: 4,000,000 * log(4,000,000) ~ 4,000,000 * 22 = 88,000,000
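The two comparisons above can be reproduced with a few lines of arithmetic. This is a rough sketch that counts abstract "steps" only (n + k for bucket sort, n * log2(n) for a comparison sort); the function names are illustrative, and constant factors and memory cost are ignored.

```python
import math

def bucket_cost(n, k):
    """Step count for bucket sort: one pass over n items plus one pass over k buckets."""
    return n + k

def comparison_cost(n):
    """Step count for an n log n comparison sort, using log base 2."""
    return n * math.log2(n)

# 30,000 arbitrary 32-bit integers: one bucket per possible int value.
print(bucket_cost(30_000, 2**32), round(comparison_cost(30_000)))

# 4,000,000 people bucketed by roughly 1,000 zip codes.
print(bucket_cost(4_000_000, 1_000), round(comparison_cost(4_000_000)))
```

In the first case the bucket count dominates and the comparison sort wins; in the second, k is tiny next to n and bucket sort wins.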

Sort

• Sorting a real big pile alphabetically
  – Sort A-Z
  – Set aside each pile
  – Sort the A's by second letter
    • Then B's, C's…
  – Then take AA's
    • Sort by third letter…

Radix Sort

• Radix: Base
• Radix Sort
  – Sort digital data
  – Bucket sort based on each digit successively

Radix Sort

• MSD: Most Significant Digit
• MSD Radix Sort
  – Partition list based on first digit
  – Recursively sort on next digit
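One way to sketch the "partition on first digit, recurse on next digit" idea is on strings, treating each character as a digit. This is an illustrative version, not the slides' implementation; it assumes every character has a code point below 256, and the name `msd_radix_sort` is made up here.

```python
RADIX = 256  # assumption: all characters have code points below 256

def msd_radix_sort(words, pos=0):
    """Sort strings by bucketing on the character at index pos, then recursing."""
    if len(words) <= 1:
        return list(words)
    # Keys with no character at this position are finished and sort first.
    finished = [w for w in words if len(w) == pos]
    buckets = [[] for _ in range(RADIX)]
    for w in words:
        if len(w) > pos:
            buckets[ord(w[pos])].append(w)
    result = finished
    for bucket in buckets:
        if bucket:
            # Only keys that share a prefix are examined at the next position.
            result.extend(msd_radix_sort(bucket, pos + 1))
    return result

print(msd_radix_sort(["bob", "al", "alice", "bo", "carl", "a"]))
```

A bucket holding a single key is returned without looking at its remaining characters, and keys of different lengths need no padding.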

MSD Advantages

• May not examine all keys

• Works on variable lengths

• Little extra space

How does it work?

• Radix Exchange
  – MSD radix sort
  – Partition like QuickSort, but swap on misplaced digits
  – Unstable

http://www.cse.hut.fi/en/research/SVG/TRAKLA2/exercises/TrueRecursiveRadixExchangeSort-25.html
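A minimal sketch of radix exchange, assuming non-negative integers and binary digits (bits): the list is partitioned in place on the current bit, much like a QuickSort partition, and each half is then sorted on the next lower bit. The function name and parameters are illustrative and not taken from the linked exercise.

```python
def radix_exchange_sort(a, lo=0, hi=None, bit=None):
    """In-place MSD radix sort on bits, for non-negative integers. Unstable."""
    if hi is None:
        hi = len(a) - 1
    if bit is None:
        # Start from the highest bit set in the largest key.
        bit = max(a, default=0).bit_length() - 1
    if lo >= hi or bit < 0:
        return
    mask = 1 << bit
    i, j = lo, hi
    while i <= j:
        if (a[i] & mask) == 0:
            i += 1                      # a[i] already belongs in the 0 half
        elif a[j] & mask:
            j -= 1                      # a[j] already belongs in the 1 half
        else:
            a[i], a[j] = a[j], a[i]     # both misplaced: swap them
            i += 1
            j -= 1
    # a[lo..j] has the bit clear, a[i..hi] has it set.
    radix_exchange_sort(a, lo, j, bit - 1)
    radix_exchange_sort(a, i, hi, bit - 1)

values = [5, 3, 9, 0, 3, 12, 7]
radix_exchange_sort(values)
print(values)
```

The swaps move items across long distances without regard to their original order, which is why this variant is unstable; in exchange it needs no temp array.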

LSD Radix

• LSD: Least Significant Digit
• Work from smallest digit to largest
  – Hard with variable lengths
  – Stable!

How does it work?

• Iterative Radix Sort
  – Buckets used as counters
  – Goal is to find starting/ending point of each value

How does it work?

• Iterative Radix Sort
  – Buckets used as counters
  – Check each item, add one to appropriate bucket
  – Compute cumulative totals from buckets
  – Place each item in temp array
    • Use bucket value as index
    • Decrement counter as we go

http://www.cs.usfca.edu/~galles/visualization/RadixSort.html
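The counting steps listed above can be sketched as follows, assuming non-negative integers and a configurable base. The function name `lsd_radix_sort` is illustrative; the linked visualization may differ in details.

```python
def lsd_radix_sort(nums, base=10):
    """LSD radix sort for non-negative integers using counting buckets. Stable."""
    result = list(nums)
    if not result:
        return result
    largest = max(result)
    place = 1  # 1, base, base**2, ...: the digit currently being sorted on
    while largest // place > 0:
        # Buckets used as counters: count how many items have each digit.
        counts = [0] * base
        for x in result:
            counts[(x // place) % base] += 1
        # Cumulative totals: counts[d] becomes the end position of digit d's block.
        for d in range(1, base):
            counts[d] += counts[d - 1]
        # Place items into a temp array, walking backwards to stay stable,
        # decrementing the counter to get each item's index.
        temp = [0] * len(result)
        for x in reversed(result):
            d = (x // place) % base
            counts[d] -= 1
            temp[counts[d]] = x
        result = temp
        place *= base
    return result

print(lsd_radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
```

Walking the input backwards while decrementing is what keeps equal digits in their previous relative order, and that stability is what makes the digit-by-digit passes add up to a full sort.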

So it wins?

• RadixSort: O(R*N) where R = num digits
  – Num digits is constant…
    • O(N)!!

So it wins?

• RadixSort: O(R*N) where R = num digits
  – Num digits isn't constant in general
    • If M distinct values and base k: R = log_k(M)
    • O(R * N) = O(log_k(M) * N)
• Only better than n log n in specific situations where the range of distinct values is known and the number of digits, log_k(M), is less than log N
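To make R = log_k(M) concrete, the sketch below counts how many base-k digits are needed to distinguish M values and compares R with log2(N). The helper name `digits_needed` is hypothetical, and integer arithmetic is used to avoid floating-point rounding in the logarithm.

```python
import math

def digits_needed(m, k):
    """Smallest R such that k**R >= m, i.e. ceil(log_k(m)) for m >= 1."""
    r, capacity = 0, 1
    while capacity < m:
        capacity *= k
        r += 1
    return r

# 32-bit keys (M = 2**32) sorted one byte at a time (k = 256).
r = digits_needed(2**32, 256)
n = 4_000_000
print(r, math.log2(n))
```

Here R is 4 passes against a log2(N) of about 22, so radix sort does less work per item; with a small N or very long keys the comparison goes the other way.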

Radix Summary

• For specific problems, runs in linear time

• Always depends on particulars of data
  – No general RadixSort algorithm

• Right tool for big jobs on specific sets of data