Sorting and Lower bounds
15-211 Fundamental Data Structures and Algorithms
Ananda Guna
January 27, 2005
Recap
Sorting Comparison
We can categorize sorting algorithms into two major classes
Fast Sorts versus Slow Sorts
O(N log₂ N) versus O(N²)
Slow sorts are easy to code and sufficient when the amount of data is small.

N         N²                N log₂ N
10        100               33
100       10,000            664
1,000     1,000,000         9,966
10,000    100,000,000       132,877
100,000   10,000,000,000    1,660,964
Basic Sorting Algorithms
Bubble Sort: fix flips until no more flips
Insertion Sort: insert a[i] into the sorted subarray a[0…i-1]
Advantages: simple to implement
Good for small data sets
Disadvantages: O(n²) algorithms
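For concreteness, here is a minimal insertion sort sketch in Java (an illustration of the idea above, not the course's reference code):

void insertionSort(int[] a) {
    for (int i = 1; i < a.length; i++) {
        int key = a[i];   // element to insert into the sorted prefix a[0…i-1]
        int j = i - 1;
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];   // shift larger elements one slot right
            j--;
        }
        a[j + 1] = key;   // drop the element into its sorted position
    }
}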
Recursive Sorting Algorithms
QuickSort
Average case – O(n log n)
Worst Case – O(n²)
Merge Sort
All cases O(n log n)
Extra Memory – O(n)
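A minimal merge sort sketch in Java (names are assumptions of this sketch; tmp is the O(n) extra memory noted above):

void mergeSort(int[] a, int[] tmp, int left, int right) {
    if (left >= right) return;            // 0 or 1 elements: already sorted
    int mid = (left + right) / 2;
    mergeSort(a, tmp, left, mid);         // sort left half
    mergeSort(a, tmp, mid + 1, right);    // sort right half
    int i = left, j = mid + 1, k = left;  // merge the two sorted halves
    while (i <= mid && j <= right)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i <= mid)   tmp[k++] = a[i++];
    while (j <= right) tmp[k++] = a[j++];
    for (k = left; k <= right; k++) a[k] = tmp[k];
}

Call it as mergeSort(a, new int[a.length], 0, a.length - 1).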
Comparison Chart
[Chart comparing the running times of bubble sort, insertion sort, merge sort, and quicksort on inputs that are almost sorted, in reverse order, in random order, and all equal.]
Analysis of recursive sorting
Suppose it takes time T(N) to sort N elements.
Suppose also it takes time N to combine the two sorted arrays.
Then:
T(1) = 1
T(N) = 2T(N/2) + N, for N>1
Solving for T gives the running time for the recursive sorting algorithm.
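One way to see the solution (a standard telescoping argument, assuming N is a power of 2): divide both sides by N to get
T(N)/N = T(N/2)/(N/2) + 1
Applying the same step log₂ N times:
T(N)/N = T(1)/1 + log₂ N
so T(N) = N log₂ N + N = O(N log N).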
QuickSort Example
Sort the following using qsort: 105 47 13 17 30 222 5 19
Quicksort implementation
Implementation issues
Quick sort can be very fast in practice, but this depends on careful coding
Three major issues:
1. dividing the array in-place
2. picking the right pivot
3. avoiding quicksort on small arrays
2. Picking the pivot
In real life, inputs to a sorting routine are often not completely random
So, picking the first or last element to be the pivot is usually a bad choice
One common strategy is to pick the middle element
this is an OK strategy
2. Picking the pivot
A more sophisticated approach is to use random sampling
think about opinion polls
For example, the median-of-three strategy:
take the median of the first, middle, and last elements to be the pivot
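A sketch of a median-of-three helper (hypothetical names, not the lecture's code); it orders the first, middle, and last elements in place and returns the index of the median:

int medianOfThree(int[] a, int left, int right) {
    int mid = (left + right) / 2;
    // order a[left] <= a[mid] <= a[right]; the median lands at mid
    if (a[left] > a[mid])   { int t = a[left]; a[left] = a[mid];   a[mid] = t; }
    if (a[left] > a[right]) { int t = a[left]; a[left] = a[right]; a[right] = t; }
    if (a[mid] > a[right])  { int t = a[mid];  a[mid] = a[right];  a[right] = t; }
    return mid;
}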
3. Avoiding small arrays
While quicksort is extremely fast for large arrays, experimentation shows that it performs less well on small arrays
For small enough arrays, a simpler method such as insertion sort works better
The exact cutoff depends on the language and machine, but usually is somewhere between 10 and 30 elements
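The cutoff is easy to graft onto the driver; a sketch with an assumed cutoff of 15 (any value in the 10–30 range; partition is the routine shown later in these slides):

static final int CUTOFF = 15;  // tunable; usually between 10 and 30

void quicksortWithCutoff(int[] a, int left, int right) {
    if (right - left + 1 <= CUTOFF) {
        // small subarray: plain insertion sort on a[left…right]
        for (int i = left + 1; i <= right; i++) {
            int key = a[i], j = i - 1;
            while (j >= left && a[j] > key) { a[j + 1] = a[j]; j--; }
            a[j + 1] = key;
        }
    } else {
        int i = partition(a, left, right);
        quicksortWithCutoff(a, left, i - 1);
        quicksortWithCutoff(a, i + 1, right);
    }
}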
A complication!
What should happen if we encounter an element that is equal to the pivot?
Four possibilities:
L stops, R keeps going (right list longer)
R stops, L keeps going (left list longer)
L and R stop (lists equal)
L and R keep going (left list longer)
Quick Sort Algorithm
Partitioning Step: choose a pivot element, say a = v[j], and determine its final position in the sorted array:
a > v[i] for all i < j
a < v[i] for all i > j
Recursive Step: perform the above step on the left array and the right array.
An early look at quicksort code (incomplete)
void quicksort(int[] A, int left, int right) {
    if (right > left) {
        int i = partition(A, left, right);  // pivot ends up at index i
        quicksort(A, left, i - 1);          // sort elements left of the pivot
        quicksort(A, i + 1, right);         // sort elements right of the pivot
    }
}
Quick Sort Code ctd..
// Suppose that the pivot is p.
// partition(): rearrange A into 2 sublists,
// S1 = { x ∈ A | x < p } and S2 = { x ∈ A | x > p }
int partition(int[] A, int left, int right) {
    if (A[left] > A[right]) swap(A, left, right); // ensure A[left] <= A[right]
    int pivot = A[left];
    int i = left;
    int j = right + 1;
    do {
        do { ++i; } while (A[i] < pivot);  // scan right for an element >= pivot
        do { --j; } while (A[j] > pivot);  // scan left for an element <= pivot
        if (i < j) swap(A, i, j);
    } while (i < j);
    swap(A, j, left);  // move the pivot into its final position
    return j;          // j is the position of the pivot after rearrangement
}

void swap(int[] A, int i, int j) { int t = A[i]; A[i] = A[j]; A[j] = t; }
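For example, on the sample array from this lecture:

int[] A = {105, 47, 13, 17, 30, 222, 5, 19};
quicksort(A, 0, A.length - 1);
// A is now {5, 13, 17, 19, 30, 47, 105, 222}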
Quick Sort Analysis
Worst-case behavior
105 47 13 17 30 222 5 19    (pivot 5: only one element removed)
47 13 17 30 222 19 105      (pivot 13)
47 105 17 30 222 19         (pivot 17)
47 105 19 30 222            (pivot 19)
…
If we always pick the smallest (or largest) possible pivot, each partition removes only one element,
so sorting takes O(n²) steps.
Best-case analysis
In the best case, the pivot is always the median element.
In that case, the splits are always “down the middle”.
Hence, same behavior as mergesort.
That is, O(N log N).
Average-case analysis
Consider the quicksort tree:
[Diagram: the quicksort tree for 105 47 13 17 30 222 5 19 — each node partitions its sublist around a pivot; the sublists on any one level hold fewer than N elements in total.]
Average-case analysis
At each level of the tree, there are fewer than N nodes.
So, the time spent at each level is O(N).
On average, how many levels? That is, what is the expected height of the tree?
If on average there are O(log N) levels, then quicksort is O(N log N) on average.
Expected height of qsort tree
Assume that the pivot is chosen randomly, and that ½ of the pivots are "good" and ½ are "bad".
When is a pivot "good"? "Bad"?
5 13 17 19 30 47 105 222
A pivot is "good" when it falls in the middle half of the values (here: 17, 19, 30, or 47), so the probability of a good pivot is 0.5.
After a good pivot, each partition is at most 3/4 the size of the original array.
Expected height of qsort tree
So, if we descend k levels in the tree, each time being lucky enough to pick a “good” pivot, the maximum size of the kth child is:
N(3/4)(3/4)…(3/4)  (k times)
= N(3/4)^k
But on average, only half of the pivots will be good, so
the kth child has size at most N(3/4)^(k/2)
Expected height of qsort tree
But, if the kth child is a leaf, then
N(3/4)^(k/2) = 1
Solving: (4/3)^(k/2) = N, so k/2 = log_(4/3) N. Thus, the expected height is
k = 2 log_(4/3) N = O(log N)
Summary of quicksort
A fast sorting algorithm in practice.
Can be implemented in-place.
But is O(N²) in the worst case.
O(N log N) average-case performance.
Shell Sort
Shellsort
Shellsort, like bubble sort and insertion sort, is based on performing exchanges on inverted pairs.
Start by picking a decrement sequence h_k, h_(k-1), …, h_1, where h_1 = 1 and h_i > h_(i-1) for each i.
Start with h_k and exchange each pair of inverted array elements that are h_k elements apart.
Continue with h_(k-1), …, h_1.
Shellsort
Example with sequence 3, 1.
105 47 13 99 30 222    (gap 3)
99 47 13 105 30 222    (swapped 105 and 99)
99 30 13 105 47 222    (swapped 47 and 30)
99 30 13 105 47 222    (13 and 222 already in order)
30 99 13 105 47 222    (gap 1: swapped 99 and 30)
30 13 99 105 47 222    (swapped 99 and 13)
...
Several inverted pairs fixed in one exchange.
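A shellsort sketch in Java using the halving sequence N/2, N/4, …, 1 (one common choice, an assumption of this sketch); each pass is a gap-h insertion sort, which fixes inverted pairs h apart:

void shellSort(int[] a) {
    for (int h = a.length / 2; h >= 1; h /= 2) {   // decrement sequence
        for (int i = h; i < a.length; i++) {       // gap-h insertion sort
            int key = a[i], j = i - h;
            while (j >= 0 && a[j] > key) {         // inverted pair h apart
                a[j + h] = a[j];
                j -= h;
            }
            a[j + h] = key;
        }
    }
}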
Shellsort characteristics
The running time of shellsort depends on the decrement sequence chosen.
h_k = N/2, h_(i-1) = h_i / 2:
Worst case O(N²).
Let h_k = 2^i − 1 for the largest 2^i − 1 < N, and h_(k-1) = 2^(i-1) − 1.
Example: 15, 7, 3, 1.
Worst case O(N^(3/2)).
Other sequences achieve O(N^(4/3)).
Non-Comparison based Sorting
Non-comparison-based sorting
If we can do more than just compare pairs of elements, we can sometimes sort more quickly
Two simple examples are bucket sort and radix sort
Bucket Sort
Bucket sort
In addition to comparing pairs of elements, we impose these restrictions:
all elements are non-negative integers
all elements are less than a predetermined maximum value
Elements are usually keys paired with other data
Bucket sort
[Diagram: the input 1 3 3 1 2 is scattered into buckets labeled 1, 2, 3, then gathered in bucket order to give 1 1 2 3 3.]
Bucket sort characteristics
Runs in O(N) time.
Easy to implement each bucket as a linked list.
Is stable:
If two elements (A,B) are equal with respect to sorting, and they appear in the input in order (A,B), then they remain in the same order in the output.
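A minimal stable bucket sort sketch (bucket-per-value lists, as suggested above; maxValue is an assumed parameter bounding the keys):

import java.util.ArrayList;

int[] bucketSort(int[] a, int maxValue) {
    ArrayList<ArrayList<Integer>> buckets = new ArrayList<>();
    for (int v = 0; v <= maxValue; v++)
        buckets.add(new ArrayList<>());        // one bucket per key value
    for (int x : a)
        buckets.get(x).add(x);                 // scatter, preserving input order
    int[] out = new int[a.length];
    int k = 0;
    for (ArrayList<Integer> b : buckets)       // gather in key order: stable
        for (int x : b) out[k++] = x;
    return out;
}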
Radix Sort
Radix sort
If the integers are in a larger range, do a bucket sort on each digit.
Start by sorting on the low-order digit using a STABLE bucket sort.
Then sort on the next-lowest digit, and so on.
Radix sort
Example:
Input (3-bit numbers):   010 000 101 001 111 011 100 110   (= 2 0 5 1 7 3 4 6)
After sorting on bit 0:  010 000 100 110 101 001 111 011   (= 2 0 4 6 5 1 7 3)
After sorting on bit 1:  000 100 101 001 010 110 111 011   (= 0 4 5 1 2 6 7 3)
After sorting on bit 2:  000 001 010 011 100 101 110 111   (= 0 1 2 3 4 5 6 7)
Each sorting step must be stable.
Radix sort characteristics
Each sorting step can be performed via bucket sort, and is thus O(N).
If the numbers are all b bits long, then there are b sorting steps.
Hence, radix sort is O(bN).
Also, radix sort can be implemented in-place (just like quicksort).
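A radix sort sketch for non-negative ints, one stable two-bucket pass per bit, low-order bit first (this version uses an O(N) scratch array per pass rather than working in place):

int[] radixSort(int[] a) {
    int[] cur = a.clone();
    for (int bit = 0; bit < 31; bit++) {       // bit 31 is the sign bit, 0 here
        int[] next = new int[cur.length];
        int k = 0;
        for (int x : cur)                      // stable: 0-bits first, in order
            if (((x >> bit) & 1) == 0) next[k++] = x;
        for (int x : cur)                      // then 1-bits, in order
            if (((x >> bit) & 1) == 1) next[k++] = x;
        cur = next;
    }
    return cur;
}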
Not just for binary numbers
Radix sort can be used for decimal numbers and alphanumeric strings.
Input:                032 224 016 015 031 169 123 252
After ones digit:     031 032 252 123 224 015 016 169
After tens digit:     015 016 123 224 031 032 252 169
After hundreds digit: 015 016 031 032 123 169 224 252
Why comparison-based?
Bucket and radix sort are much faster than any comparison-based sorting algorithm
Unfortunately, we can’t always live with the restrictions imposed by these algorithms
In such cases, comparison-based sorting algorithms give us general solutions
Lower Bound for the Sorting Problem
How fast can we sort?
We have seen several sorting algorithms with O(N log N) running time.
In fact, Ω(N log N) is a general lower bound for comparison-based sorting.
A proof appears in Weiss.
Informally…
Upper and lower bounds
[Graph: a running time curve T(N) bounded above by c·f(N) and below by d·g(N), illustrating T(N) = O(f(N)) and T(N) = Ω(g(N)).]
Decision tree for sorting
[Diagram: the decision tree for sorting three elements a, b, c. The root compares a and b; each internal node performs one comparison (a vs b, b vs c, or a vs c); each of the 3! = 6 leaves is one ordering: a<b<c, a<c<b, b<a<c, b<c<a, c<a<b, c<b<a.]
N! leaves.
A binary tree with N! leaves has height at least log₂(N!), so the tree has height Ω(log(N!)).
log(N!) = Ω(N log N), since N! ≥ (N/2)^(N/2) and so log(N!) ≥ (N/2) log(N/2).
Summary on sorting bound
If we are restricted to comparisons on pairs of elements, then the general lower bound for sorting is Ω(N log N).
A decision tree is a representation of the possible comparisons required to solve a problem.
Quickselect – finding median
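The slide gives no details; a minimal quickselect sketch (reusing partition from the quicksort code above) finds the kth smallest element, and hence the median at k = N/2:

int quickselect(int[] a, int left, int right, int k) {
    if (left == right) return a[left];       // one element left: done
    int p = partition(a, left, right);       // pivot lands at final index p
    if (k == p)      return a[k];
    else if (k < p)  return quickselect(a, left, p - 1, k);
    else             return quickselect(a, p + 1, right, k);
}

Each call recurses into only one side of the partition, which is why the expected running time is O(N) rather than O(N log N).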
World’s Fastest Sorters
Sorting competitions
There are several world-wide sorting competitions.
Unix CoSort has achieved 1GB in under one minute, on a single Alpha (http://www.cosort.com).
Berkeley’s NOW-sort sorted 8.4GB of disk data in under one minute, using a network of 95 workstations http://now.cs.berkeley.edu/
Sandia Labs was able to sort 1TB of data in under 50 minutes, using a 144-node multiprocessor machine