LECTURE 25: BUCKET SORT & RADIX SORT
CSC 213 – Large Scale Programming

Upload: jasper-goodman

Post on 19-Jan-2016


Bucket-Sort

Buckets, B, is array of Sequence
Sorts Collection, C, in two phases:

1. Remove each element v from C & add to B[v]

2. Move elements from each bucket back to C



Bucket-Sort Algorithm

Algorithm bucketSort(Sequence<Integer> C)
  B = new Sequence<Integer>[10]  // & instantiate each Sequence
  // Phase 1
  for each element v in C
    B[v].addLast(v)  // Assumes each number in C between 0 & 9
  endfor
  // Phase 2
  loc = 0
  for each Sequence b in B
    for each element v in b
      C.set(loc, v)
      loc += 1
    endfor
  endfor
  return C
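
The pseudocode above could be sketched in Java roughly as follows. This is a minimal illustration, not the course's implementation: it assumes an int[] in place of the Sequence ADT, and the class name BucketSortSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BucketSortSketch {
    /** Sorts c in place and returns it; every value is assumed to lie in 0..9. */
    static int[] bucketSort(int[] c) {
        // One bucket per legal value; each bucket keeps insertion order.
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            buckets.add(new ArrayList<>());
        }
        // Phase 1: drop each value into the bucket it indexes.
        for (int v : c) {
            buckets.get(v).add(v);
        }
        // Phase 2: copy the buckets back, lowest index first.
        int loc = 0;
        for (List<Integer> b : buckets) {
            for (int v : b) {
                c[loc++] = v;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        int[] data = {7, 1, 3, 7, 3, 7, 0};
        System.out.println(Arrays.toString(bucketSort(data)));  // prints [0, 1, 3, 3, 7, 7, 7]
    }
}
```

A value outside 0..9 makes buckets.get(v) throw an IndexOutOfBoundsException, which is the "legal indices" requirement in practice.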


Bucket Sort Properties

For this to work, values must be legal indices
  Only non-negative integers can be used
Sorting occurs without comparing objects
A sort of this type is called a stable sort
  Preserves relative ordering of objects with same value
  (BUBBLE-SORT & MERGE-SORT are other stable sorts)


Bucket Sort Extensions

Use Comparator for BUCKET-SORT
  Get index for v using compare(v, null)
Comparator for booleans could return
  0 when v is false
  1 when v is true
Comparator for US states could return
  Annual per capita consumption of Jello
  Consumption of Jello overall, in cubic feet
  State’s ranking by population
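
One way to picture this extension in Java is sketched below. It swaps the slide's compare(v, null) trick for a ToIntFunction that maps each element to its bucket index; that substitution, the bucketCount parameter, and the class name KeyedBucketSort are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

final class KeyedBucketSort {
    /** Stable bucket sort; indexOf must return a value in 0..bucketCount-1. */
    static <T> List<T> bucketSort(List<T> c, int bucketCount, ToIntFunction<T> indexOf) {
        List<List<T>> buckets = new ArrayList<>();
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
        for (T v : c) {
            // Appending keeps elements with equal indices in their original order.
            buckets.get(indexOf.applyAsInt(v)).add(v);
        }
        List<T> sorted = new ArrayList<>(c.size());
        for (List<T> b : buckets) {
            sorted.addAll(b);
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<Boolean> flags = Arrays.asList(true, false, true, false, false);
        // false maps to bucket 0 and true to bucket 1, as on the slide.
        System.out.println(bucketSort(flags, 2, v -> v ? 1 : 0));
        // prints [false, false, false, true, true]
    }
}
```

Because elements are only ever appended to a bucket, two elements with the same index come out in the order they went in.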


Bucket Sort Extensions

State’s ranking by population
  1 California
  2 Texas
  3 New York
  4 Florida
  5 Illinois
  6 Pennsylvania
  7 Ohio
  8 Michigan
  9 Georgia


Bucket Sort Extensions

Extended BUCKET-SORT works with many types
  Limited set of data needed for this to work
  Need way to enumerate values of the set
  ("enumerate" is a subtle hint)
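
Reading the hint as pointing at Java's enum types (an interpretation, not something the slides state), a sketch might use each constant's ordinal() as its bucket index. The Size enum and the class name EnumBucketSort are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class EnumBucketSort {
    // Hypothetical enum; declaration order defines the sort order.
    enum Size { SMALL, MEDIUM, LARGE }

    /** Stable bucket sort of enum constants using ordinal() as the bucket index. */
    static <E extends Enum<E>> List<E> bucketSort(List<E> c, Class<E> type) {
        // The enum enumerates its own values, so the bucket count is known up front.
        int bucketCount = type.getEnumConstants().length;
        List<List<E>> buckets = new ArrayList<>();
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
        for (E v : c) {
            buckets.get(v.ordinal()).add(v);
        }
        List<E> sorted = new ArrayList<>(c.size());
        for (List<E> b : buckets) {
            sorted.addAll(b);
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<Size> sizes = Arrays.asList(Size.LARGE, Size.SMALL, Size.MEDIUM, Size.SMALL);
        System.out.println(bucketSort(sizes, Size.class));  // prints [SMALL, SMALL, MEDIUM, LARGE]
    }
}
```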


d-Tuples

Combination of d values such as (k1, k2, …, kd)
  ki is ith dimension of the tuple
A point (x, y, z) is 3-tuple
  x is 1st dimension’s value
  Value of 2nd dimension is y
  z is 3rd dimension’s value


Lexicographic Order

Assume a & b are both d-tuples
  a = (a1, a2, …, ad)
  b = (b1, b2, …, bd)
Can say a < b if and only if
  a1 < b1 OR
  a1 = b1 && (a2, …, ad) < (b2, …, bd)
Order these 2-tuples using previous definition
  (3 4) (7 8) (3 2) (1 4) (4 8)
Sorted:
  (1 4) (3 2) (3 4) (4 8) (7 8)
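
The recursive definition unrolls into a loop: the first dimension where two tuples differ decides the order. Here is a small sketch, assuming tuples are stored as int[] of equal length d; the class name Lexicographic is hypothetical.

```java
import java.util.Arrays;

final class Lexicographic {
    /**
     * Returns a negative number, zero, or a positive number when a comes
     * before, ties with, or comes after b. Assumes a and b have the same length.
     */
    static int compare(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                // First differing dimension decides the order.
                return Integer.compare(a[i], b[i]);
            }
        }
        return 0;  // every dimension matched
    }

    public static void main(String[] args) {
        int[][] tuples = {{3, 4}, {7, 8}, {3, 2}, {1, 4}, {4, 8}};
        Arrays.sort(tuples, Lexicographic::compare);
        System.out.println(Arrays.deepToString(tuples));
        // prints [[1, 4], [3, 2], [3, 4], [4, 8], [7, 8]]
    }
}
```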


Radix-Sort

Very fast sort for data expressed as d-tuple
  Cheats to win; faster than the n log n lower bound on comparison-based sorting, since it never compares elements
  Sort performed using d calls to bucket sort
  Sorts least to most important dimension of tuple
Luckily lots of data are d-tuples
  String is d-tuple of char
    “L E T T E R S”
    “L I N G E R S”


Digits of an int can be used for sorting, also
  1 0 0 1 3 7 2 9
  1 0 0 9 2 2 1 0


Radix-Sort For Integers

Represent int as a d-tuple of digits:
  62₁₀ = 111110₂
  04₁₀ = 000100₂
Decimal digits need 10 buckets to use for sorting
Ordering using their bits needs 2 buckets
O(d∙n) time needed to run RADIX-SORT
  d is length of longest element in input
  In most cases value of d is constant (d = 31 bits for a non-negative int)
  Radix sort takes O(n) time, ignoring constant
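
To make the d-tuple view of an int concrete, the sketch below splits a non-negative value into d digits in a chosen radix. The helper toTuple and the class name DigitTuple are hypothetical names introduced for illustration.

```java
import java.util.Arrays;

final class DigitTuple {
    /**
     * Digits of a non-negative value in the given radix, most significant
     * first, left-padded with zeros to exactly d places.
     */
    static int[] toTuple(int value, int radix, int d) {
        int[] digits = new int[d];
        for (int i = d - 1; i >= 0; i--) {
            digits[i] = value % radix;  // peel off the least significant digit
            value /= radix;
        }
        return digits;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toTuple(62, 10, 2)));  // prints [6, 2]
        System.out.println(Arrays.toString(toTuple(62, 2, 6)));   // prints [1, 1, 1, 1, 1, 0]
        System.out.println(Arrays.toString(toTuple(4, 2, 6)));    // prints [0, 0, 0, 1, 0, 0]
    }
}
```

The radix fixes the number of buckets per pass (10 for decimal digits, 2 for bits), while d fixes the number of passes.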

Radix-Sort In Action

List of 4-bit integers sorted using RADIX-SORT (bit 0 is the rightmost, least significant bit)

Input   After bit 0   After bit 1   After bit 2   After bit 3
1001    0010          1001          1001          0001
0010    1110          1101          0001          0010
1101    1001          0001          0010          1001
0001    1101          0010          1101          1101
1110    0001          1110          1110          1110


Radix-Sort

Algorithm radixSort(Sequence<Integer> C)
  // Works from least to most significant value
  for bit = 0 to 30
    C = bucketSort(C, bit)  // Sort C using the specified bit
  endfor
  return C

What is big-Oh complexity for Radix-Sort?
  Call in loop uses each element twice: O(n)
  Loop repeats once per digit to complete sort: * O(1) when the number of digits is constant
  Total: O(n)
  If the loop repeats O(log n) times instead (?), total is O(n log n)
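
A Java sketch of the loop above follows, under the assumptions that the input is an int[] of non-negative values and that each pass is a stable two-bucket sort keyed on one bit. The class name RadixSortSketch and the two-list bucket representation are illustrative choices, not the course's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RadixSortSketch {
    /** Stable two-bucket pass keyed on a single bit of each value. */
    static int[] bucketSort(int[] c, int bit) {
        List<Integer> zeros = new ArrayList<>();
        List<Integer> ones = new ArrayList<>();
        for (int v : c) {
            if (((v >> bit) & 1) == 0) {
                zeros.add(v);
            } else {
                ones.add(v);
            }
        }
        // Copy back: all values with a 0 in this bit, then all with a 1.
        int loc = 0;
        for (int v : zeros) {
            c[loc++] = v;
        }
        for (int v : ones) {
            c[loc++] = v;
        }
        return c;
    }

    /** Sorts non-negative ints; bit 31, the sign bit, is never examined. */
    static int[] radixSort(int[] c) {
        // Least significant bit first, so later passes preserve earlier work.
        for (int bit = 0; bit <= 30; bit++) {
            c = bucketSort(c, bit);
        }
        return c;
    }

    public static void main(String[] args) {
        int[] data = {0b1001, 0b0010, 0b1101, 0b0001, 0b1110};
        System.out.println(Arrays.toString(radixSort(data)));  // prints [1, 2, 9, 13, 14]
    }
}
```

Each pass touches every element twice (once into a bucket, once back out), and the loop always runs 31 times, which is where the O(n) total for fixed-width ints comes from.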


For Next Lecture

Review requirements for program #2
  1st Preliminary deadline is Monday
  Spend time working on this: design saves coding
Reading on Graph ADT for Wednesday
  Note: these have nothing to do with bar charts
  What are mathematical graphs?
  Why are they the basis of everything in CS?