Page 1: Sorting and Lower Bounds

Sorting and Lower Bounds

15-211 Fundamental Data Structures and Algorithms

Peter Lee

February 25, 2003

Page 2: Sorting and Lower Bounds

Announcements

Quiz #2 available today
Open until Wednesday midnight

Midterm exam next Tuesday
Tuesday, March 4, 2003, in class

Review session in Thursday’s class

Homework #4 is out
You should finish Part 1 this week!

Reading: Chapter 8

Page 3: Sorting and Lower Bounds

Recap

Page 4: Sorting and Lower Bounds

Naïve sorting algorithms

Bubble sort.


105 47 13 99 30 222

47 105 13 99 30 222

13 47 105 99 30 222

13 47 99 105 30 222

13 30 47 99 105 222

105 47 13 99 30 222

Insertion sort.
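The two naïve sorts above can be sketched in a few lines of Python (an illustrative sketch, not the lecture's own code):

```python
def bubble_sort(a):
    """Bubble sort: repeatedly swap adjacent out-of-order pairs.
    After pass k, the k largest elements are in final position."""
    a = list(a)
    for end in range(len(a) - 1, 0, -1):
        for i in range(end):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

def insertion_sort(a):
    """Insertion sort: grow a sorted prefix, inserting each new element."""
    a = list(a)
    for i in range(1, len(a)):
        x, j = a[i], i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]   # shift larger elements right
            j -= 1
        a[j + 1] = x
    return a
```

Both are O(N²) in the worst case; insertion sort is the one worth keeping, since it is fast on small or nearly-sorted inputs.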

Page 5: Sorting and Lower Bounds

Heapsort

Build heap: O(N)

DeleteMin until empty: O(N log N)

Total worst case: O(N log N)
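A minimal heapsort sketch using Python's standard heapq module (illustrative; heapify is the O(N) build-heap step):

```python
import heapq

def heapsort(a):
    h = list(a)
    heapq.heapify(h)                 # build heap: O(N)
    # N successive deleteMins: O(N log N) total
    return [heapq.heappop(h) for _ in range(len(h))]
```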

Page 6: Sorting and Lower Bounds

Shellsort

Example with increment sequence 3, 1.

105 47 13 99 30 222

99 47 13 105 30 222

99 30 13 105 47 222

99 30 13 105 47 222

30 99 13 105 47 222

30 13 99 105 47 222

...

Several inverted pairs fixed in one exchange.
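A sketch of shellsort using the gap sequence (3, 1) from the example (any sequence ending in 1 is correct; this one is only illustrative):

```python
def shellsort(a, gaps=(3, 1)):
    a = list(a)
    for gap in gaps:
        # Gap-insertion sort: elements gap apart form interleaved
        # subsequences, each sorted by insertion.
        for i in range(gap, len(a)):
            x, j = a[i], i
            while j >= gap and a[j - gap] > x:
                a[j] = a[j - gap]
                j -= gap
            a[j] = x
    return a
```

The final gap of 1 is a plain insertion sort, but by then the array is nearly sorted, which is exactly where insertion sort is fast.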

Page 7: Sorting and Lower Bounds

Divide-and-conquer

Page 8: Sorting and Lower Bounds

Divide-and-conquer

Page 9: Sorting and Lower Bounds

Analysis of recursive sorting

Suppose it takes time T(N) to sort N elements.

Suppose also it takes time N to combine the two sorted arrays.

Then:

T(1) = 1

T(N) = 2T(N/2) + N, for N>1

Solving for T gives the running time for the recursive sorting algorithm.

Page 10: Sorting and Lower Bounds

Divide-and-Conquer Theorem

Theorem: Let a, b, c > 0.

The recurrence relation

T(1) = b

T(N) = aT(N/c) + bN

for any N that is a power of c

has upper-bound solutions

T(N) = O(N) if a < c

T(N) = O(N log N) if a = c

T(N) = O(N^(log_c a)) if a > c

For recursive sorting, a = 2, b = 1, c = 2, so the a = c case gives O(N log N).

Page 11: Sorting and Lower Bounds

Exact solutions

It is sometimes possible to derive closed-form solutions to recurrence relations.

Several methods exist for doing this.

Telescoping-sum method

Repeated-substitution method
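As a worked example, the telescoping-sum method applied to the recursive-sorting recurrence T(1) = 1, T(N) = 2T(N/2) + N (for N a power of 2):

```latex
% Divide both sides by N, then write the same identity at each scale:
\begin{align*}
\frac{T(N)}{N}     &= \frac{T(N/2)}{N/2} + 1 \\
\frac{T(N/2)}{N/2} &= \frac{T(N/4)}{N/4} + 1 \\
                   &\;\;\vdots \\
\frac{T(2)}{2}     &= \frac{T(1)}{1} + 1
\end{align*}
% Summing all the lines, the intermediate terms cancel (telescope), leaving
% T(N)/N = T(1)/1 + \log_2 N, so T(N) = N + N\log_2 N = O(N \log N).
```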

Page 12: Sorting and Lower Bounds

Mergesort

Mergesort is the most basic recursive sorting algorithm.

Divide array in halves A and B.

Recursively mergesort each half.

Combine A and B by successively looking at the first elements of A and B and moving the smaller one to the result array.

Note: Be careful to avoid creating lots of result arrays.
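A sketch of mergesort in Python. For clarity it allocates fresh lists at each level, which is exactly the inefficiency to avoid in practice; an index-based version with one auxiliary array avoids it:

```python
def mergesort(a):
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left, right = mergesort(a[:mid]), mergesort(a[mid:])
    # Merge: repeatedly move the smaller front element to the result.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out
```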

Page 13: Sorting and Lower Bounds

Mergesort


Use simple indexes to perform the split.

Use a single extra array to hold each intermediate result.

Page 14: Sorting and Lower Bounds

Analysis of mergesort

Mergesort generates almost exactly the same recurrence relations shown before.

T(1) = 1

T(N) = 2T(N/2) + N - 1, for N>1

Thus, mergesort is O(N log N).

Page 15: Sorting and Lower Bounds

Comparison-based sorting

Recall that these are all examples of comparison-based sorting algorithms:

• Items are stored in an array.

• Can be moved around in the array.

• Can compare any two array elements.

Comparison has 3 possible outcomes:

< = >

Page 16: Sorting and Lower Bounds

Non-comparison-based sorting

If we can do more than just compare pairs of elements, we can sometimes sort more quickly

Two simple examples are bucket sort and radix sort

Page 17: Sorting and Lower Bounds

Bucket Sort

Page 18: Sorting and Lower Bounds

Bucket sort

In addition to comparing pairs of elements, we require these additional restrictions:

all elements are non-negative integers

all elements are less than a predetermined maximum value

Page 19: Sorting and Lower Bounds

Bucket sort

Input: 1 3 3 1 2

Dropped into buckets labeled 1, 2, 3, then read out in order: 1 1 2 3 3

Page 20: Sorting and Lower Bounds

Bucket sort characteristics

Runs in O(N) time.

Easy to implement each bucket as a linked list.

Is stable:

If two elements (A,B) are equal with respect to sorting, and they appear in the input in order (A,B), then they remain in the same order in the output.
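A sketch of a stable bucket sort with list-based buckets, keyed by a caller-supplied function so the stability property is visible (the helper name and signature are illustrative):

```python
def bucket_sort(items, key, max_key):
    """Stable bucket sort: key(it) must be an int in [0, max_key)."""
    buckets = [[] for _ in range(max_key)]
    for it in items:                  # one O(N) pass: append preserves
        buckets[key(it)].append(it)   # input order within each bucket
    return [it for b in buckets for it in b]
```

Run on the keys 1 3 3 1 2 from the earlier example, equal keys keep their input order, which is the stability property defined above.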

Page 21: Sorting and Lower Bounds

Radix Sort

Page 22: Sorting and Lower Bounds

Radix sort

Another sorting algorithm that goes beyond comparison is radix sort.

Input (decimal): 2 0 5 1 7 3 4 6

Input (binary): 010 000 101 001 111 011 100 110

After sorting on bit 0 (least significant): 010 000 100 110 101 001 111 011

After sorting on bit 1: 000 100 101 001 010 110 111 011

After sorting on bit 2 (most significant): 000 001 010 011 100 101 110 111

Output (decimal): 0 1 2 3 4 5 6 7

Each sorting step must be stable.

Page 23: Sorting and Lower Bounds

Radix sort characteristics

Each sorting step can be performed via bucket sort, and is thus O(N).

If the numbers are all b bits long, then there are b sorting steps.

Hence, radix sort is O(bN).

Also, radix sort can be implemented in-place (just like quicksort).
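A sketch of LSD radix sort on b-bit non-negative integers; each pass is a stable two-bucket sort on one bit (this simple version allocates new lists and is not the in-place variant mentioned above):

```python
def radix_sort(a, bits):
    for bit in range(bits):                      # least significant bit first
        zeros = [x for x in a if not (x >> bit) & 1]
        ones  = [x for x in a if (x >> bit) & 1]
        a = zeros + ones                         # stable: order within each
    return a                                     # bucket is preserved
```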

Page 24: Sorting and Lower Bounds

Not just for binary numbers

Radix sort can be used for decimal numbers and alphanumeric strings.

032 224 016 015 031 169 123 252 (input)

031 032 252 123 224 015 016 169 (after sorting on the last digit)

015 016 123 224 031 032 252 169 (after sorting on the middle digit)

015 016 031 032 123 169 224 252 (after sorting on the first digit)

Page 25: Sorting and Lower Bounds

Why comparison-based?

Bucket and radix sort are much faster than any comparison-based sorting algorithm

Unfortunately, we can’t always live with the restrictions imposed by these algorithms

In such cases, comparison-based sorting algorithms give us general solutions

Page 26: Sorting and Lower Bounds

Back to Quick Sort

Page 27: Sorting and Lower Bounds

Review: Quicksort algorithm

If array A has 1 (or 0) elements, then done.

Choose a pivot element x from A.

Divide A-{x} into two arrays:

B = {y ∈ A−{x} | y ≤ x}

C = {y ∈ A−{x} | y ≥ x}

Quicksort arrays B and C.

Result is B+{x}+C.
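The algorithm above, transcribed into Python almost literally (a sketch: it allocates new lists, and it sends duplicates of the pivot to B, one of the choices examined later):

```python
def quicksort(a):
    if len(a) <= 1:               # 0 or 1 elements: done
        return list(a)
    x = a[0]                      # pivot (naive choice; see later slides)
    B = [y for y in a[1:] if y <= x]
    C = [y for y in a[1:] if y > x]
    return quicksort(B) + [x] + quicksort(C)
```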

Page 28: Sorting and Lower Bounds

Implementation issues

Quick sort can be very fast in practice, but this depends on careful coding

Three major issues:

1. doing quicksort in-place

2. picking the right pivot

3. avoiding quicksort on small arrays

Page 29: Sorting and Lower Bounds

1. Doing quicksort in place

85 24 63 50 17 31 96 45

85 24 63 45 17 31 96 50

L R

85 24 63 45 17 31 96 50

L R

31 24 63 45 17 85 96 50

L R

Page 30: Sorting and Lower Bounds

1. Doing quicksort in place

31 24 63 45 17 85 96 50

L R

31 24 17 45 63 85 96 50

R L

31 24 17 45 50 85 96 63


Page 31: Sorting and Lower Bounds

2. Picking the pivot

In real life, inputs to a sorting routine are often partially sorted

why does this happen?

So, picking the first or last element to be the pivot is usually a bad choice

One common strategy is to pick the middle element

this is an OK strategy

Page 32: Sorting and Lower Bounds

2. Picking the pivot

A more sophisticated approach is to use random sampling

think about opinion polls

For example, the median-of-three strategy:

take the median of the first, middle, and last elements to be the pivot

Page 33: Sorting and Lower Bounds

3. Avoiding small arrays

While quicksort is extremely fast for large arrays, experimentation shows that it performs less well on small arrays

For small enough arrays, a simpler method such as insertion sort works better

The exact cutoff depends on the language and machine, but usually is somewhere between 10 and 30 elements
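The three issues combine into a sketch like the following (the cutoff value 10 and the helper names are illustrative; the partition follows the L/R pointer scans shown in the examples):

```python
CUTOFF = 10   # assumed small-array threshold; tune per language/machine

def _insertion_sort(a, lo, hi):
    """Insertion sort on a[lo..hi], used for small subarrays."""
    for i in range(lo + 1, hi + 1):
        x, j = a[i], i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x

def _median_of_three(a, lo, hi):
    """Order a[lo], a[mid], a[hi]; stash the median pivot at hi-1."""
    mid = (lo + hi) // 2
    if a[mid] < a[lo]: a[lo], a[mid] = a[mid], a[lo]
    if a[hi] < a[lo]:  a[lo], a[hi] = a[hi], a[lo]
    if a[hi] < a[mid]: a[mid], a[hi] = a[hi], a[mid]
    a[mid], a[hi - 1] = a[hi - 1], a[mid]
    return a[hi - 1]

def quicksort(a, lo=0, hi=None):
    """In-place quicksort with median-of-three and small-array cutoff."""
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= CUTOFF:
        _insertion_sort(a, lo, hi)
        return
    pivot = _median_of_three(a, lo, hi)
    i, j = lo, hi - 1
    while True:
        i += 1
        while a[i] < pivot: i += 1   # L scans right
        j -= 1
        while a[j] > pivot: j -= 1   # R scans left
        if i >= j:                   # pointers crossed: partition done
            break
        a[i], a[j] = a[j], a[i]
    a[i], a[hi - 1] = a[hi - 1], a[i]   # restore pivot between the halves
    quicksort(a, lo, i - 1)
    quicksort(a, i + 1, hi)
```

Note that both scans stop on elements equal to the pivot, one of the four possibilities the quiz below asks about.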

Page 34: Sorting and Lower Bounds

Putting it all together

85 24 63 50 17 31 96 45

85 24 63 45 17 31 96 50

L R

85 24 63 45 17 31 96 50

L R

31 24 63 45 17 85 96 50

L R

Page 35: Sorting and Lower Bounds

Putting it all together

31 24 63 45 17 85 96 50

L R

31 24 17 45 63 85 96 50

R L

31 24 17 45 50 85 96 63


Page 36: Sorting and Lower Bounds

A complication!

What should happen if we encounter an element that is equal to the pivot?

Four possibilities:

L stops, R keeps going

R stops, L keeps going

L and R stop

L and R keep going

Page 37: Sorting and Lower Bounds

Quiz Break

Page 38: Sorting and Lower Bounds

Red-green quiz

What should happen if we encounter an element that is equal to the pivot?

Four possibilities:

L stops, R keeps going

R stops, L keeps going

L and R stop

L and R keep going

Explain why your choice is the only reasonable one

Page 39: Sorting and Lower Bounds

Quick Sort Analysis

Page 40: Sorting and Lower Bounds

Worst-case behavior

[Figure: sorting 105 47 13 17 30 222 5 19 with consistently bad pivots (13, then 17, …). Each partition puts nearly all the elements on one side, so the recursion tree degenerates to depth O(N), and with O(N) work per level the total is O(N²).]

Page 41: Sorting and Lower Bounds

Best-case analysis

In the best case, the pivot is always the median element.

In that case, the splits are always “down the middle”.

Hence, same behavior as mergesort.

That is, O(N log N).

Page 42: Sorting and Lower Bounds

Average-case analysis

Consider the quicksort tree:

105 47 13 17 30 222 5 19

Pivot 19 splits the array into 5 17 13 and 47 30 222 105.

At the next level, pivot 13 splits 5 17 13 into 5 and 17, while pivot 47 splits 47 30 222 105 into 30 and 222 105.

Finally, 222 105 is split into 105 and 222.

Page 43: Sorting and Lower Bounds

Average-case analysis

The time spent at each level of the tree is O(N).

So, on average, how many levels? That is, what is the expected height of the tree?

If on average there are O(log N) levels, then quicksort is O(N log N) on average.

Page 44: Sorting and Lower Bounds

Expected height of qsort tree

Assume that pivot is chosen randomly.

When is a pivot “good”? “Bad”?

5 13 17 19 30 47 105 222

A “good” pivot is one from the middle half of the sorted order (here 17, 19, 30, 47), so the probability of a good pivot is 0.5.

After a good pivot, each child is at most 3/4 the size of its parent.

Page 45: Sorting and Lower Bounds

Expected height of qsort tree

So, if we descend k levels in the tree, each time being lucky enough to pick a “good” pivot, the maximum size of the k-th child is:

N(3/4)(3/4) … (3/4) (k times) = N(3/4)^k

This shrinks to 1 after about log_{4/3} N good pivots. But on average only half of the pivots are good, so the expected height is about

2 log_{4/3} N = O(log N)

Page 46: Sorting and Lower Bounds

Summary of quicksort

A fast sorting algorithm in practice.

Can be implemented in-place.

But it is O(N²) in the worst case.

O(N log N) average-case performance.

Page 47: Sorting and Lower Bounds

Lower Bound for the Sorting Problem

Page 48: Sorting and Lower Bounds

How fast can we sort?

We have seen several sorting algorithms with O(N log N) running time.

In fact, Ω(N log N) is a general lower bound for the sorting problem.

A proof appears in Weiss.

Informally…

Page 49: Sorting and Lower Bounds

Upper and lower bounds

[Figure: a plot of T(N) against N for large N. The curve T(N) lies below c·f(N), so T(N) = O(f(N)); it lies above d·g(N), so T(N) = Ω(g(N)).]

Page 50: Sorting and Lower Bounds

Decision tree for sorting

[Figure: decision tree for sorting three elements a, b, c. The root compares a and b; each internal node compares two elements (b vs. c, a vs. c, …); each leaf is one of the 3! = 6 orderings a<b<c, a<c<b, b<a<c, b<c<a, c<a<b, c<b<a.]

N! leaves.

So, the tree has height at least log(N!).

log(N!) = Θ(N log N).
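The bound log(N!) = Θ(N log N) can be checked directly:

```latex
% Upper bound: each of the N factors is at most N.
\log(N!) \;=\; \sum_{i=1}^{N} \log i \;\le\; N \log N
% Lower bound: the top N/2 factors are each at least N/2.
\log(N!) \;\ge\; \sum_{i=N/2}^{N} \log i \;\ge\; \frac{N}{2}\log\frac{N}{2} \;=\; \Omega(N \log N)
% Hence \log(N!) = \Theta(N \log N).
```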

Page 51: Sorting and Lower Bounds

Summary on sorting bound

If we are restricted to comparisons on pairs of elements, then the general lower bound for sorting is Ω(N log N).

A decision tree is a representation of the possible comparisons required to solve a problem.

Page 52: Sorting and Lower Bounds

External Sorting

Page 53: Sorting and Lower Bounds

External sorting

In many real-world situations, the amount of data to be sorted is much more than can be stored in memory

So, it is important in some cases to use algorithms that work well when sorting data stored externally

See tomorrow’s recitation…

Page 54: Sorting and Lower Bounds

World’s Fastest Sorters

Page 55: Sorting and Lower Bounds

Sorting competitions

There are several world-wide sorting competitions

Unix CoSort has achieved 1GB in under one minute, on a single Alpha

Berkeley’s NOW-sort sorted 8.4GB of disk data in under one minute, using a network of 95 workstations

Sandia Labs was able to sort 1TB of data in under 50 minutes, using a 144-node multiprocessor machine

