# Sorting and Lower Bounds

Post on 14-Jan-2016


## Description

Sorting and Lower Bounds. 15-211 Fundamental Data Structures and Algorithms. Peter Lee, February 25, 2003. Announcements: Quiz #2 available today, open until Wednesday midnight; midterm exam next Tuesday, March 4, 2003, in class.

## Transcript

• Sorting and Lower Bounds. 15-211 Fundamental Data Structures and Algorithms. Peter Lee, February 25, 2003.

• Announcements: Quiz #2 available today, open until Wednesday midnight. Midterm exam next Tuesday: Tuesday, March 4, 2003, in class. Review session in Thursday's class. Homework #4 is out; you should finish Part 1 this week! Reading: Chapter 8.

• Recap

• Naïve sorting algorithms: bubble sort, insertion sort.

• Heapsort: build heap, O(N); deleteMin until empty, O(N log N). Total worst case: O(N log N).
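
The two phases can be sketched in Python with the standard-library `heapq` module (a minimal sketch using a min-heap; the function name is my own, not the course's code):

```python
import heapq

def heapsort(items):
    """Heapsort in two phases: build a heap in O(N), then
    delete the minimum N times at O(log N) each: O(N log N) total."""
    heap = list(items)
    heapq.heapify(heap)   # phase 1: build heap, O(N)
    # phase 2: N deleteMins; each pop re-heapifies in O(log N)
    return [heapq.heappop(heap) for _ in range(len(heap))]
```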

• Shellsort: example with gap sequence 3, 1.
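
A minimal sketch of shellsort in Python, assuming the gap sequence 3, 1 from the slide (the final gap of 1 is a plain insertion sort, which guarantees a sorted result):

```python
def shellsort(a, gaps=(3, 1)):
    """Shellsort: for each gap, insertion-sort the elements
    that are `gap` positions apart."""
    a = list(a)
    for gap in gaps:
        for i in range(gap, len(a)):
            x = a[i]
            j = i
            # shift larger elements in this gap-chain to the right
            while j >= gap and a[j - gap] > x:
                a[j] = a[j - gap]
                j -= gap
            a[j] = x
    return a
```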

• Divide-and-conquer

• Analysis of recursive sorting. Suppose it takes time T(N) to sort N elements, and time N to combine the two sorted arrays. Then: T(1) = 1; T(N) = 2T(N/2) + N, for N > 1. Solving for T gives the running time of the recursive sorting algorithm.

• Divide-and-Conquer Theorem. Theorem: let a, b, c > 0. The recurrence relation T(1) = b; T(N) = aT(N/c) + bN, for any N that is a power of c, has upper-bound solutions: T(N) = O(N) if a < c; T(N) = O(N log N) if a = c; T(N) = O(N^(log_c a)) if a > c.

• Exact solutions. It is sometimes possible to derive closed-form solutions to recurrence relations. Several methods exist for doing this: the telescoping-sum method and the repeated-substitution method.
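
As an illustration, repeated substitution applied to the recurrence T(N) = 2T(N/2) + N, assuming N = 2^k:

```latex
\begin{aligned}
T(N) &= 2\,T(N/2) + N \\
     &= 4\,T(N/4) + 2N \\
     &= 8\,T(N/8) + 3N \\
     &\;\;\vdots \\
     &= 2^k\,T(N/2^k) + kN
      = N\,T(1) + N\log_2 N
      = O(N \log N).
\end{aligned}
```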

• Mergesort. Mergesort is the most basic recursive sorting algorithm. Divide the array into halves A and B. Recursively mergesort each half. Combine A and B by repeatedly comparing the first elements of A and B and moving the smaller one to the result array. Note: be careful to avoid creating lots of result arrays.

• Mergesort implementation: use simple indexes to perform the split, and use a single extra array to hold each intermediate result.
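
A sketch of this scheme in Python: index-based splits and a single shared temporary array (the names are mine, not the course's code):

```python
def mergesort(a):
    """Mergesort a list in place, using one shared temporary array
    and (lo, hi) index pairs instead of sublist copies."""
    tmp = [None] * len(a)

    def sort(lo, hi):                     # sorts a[lo:hi]
        if hi - lo <= 1:
            return
        mid = (lo + hi) // 2
        sort(lo, mid)
        sort(mid, hi)
        # merge a[lo:mid] and a[mid:hi] into tmp, then copy back
        i, j, k = lo, mid, lo
        while i < mid and j < hi:
            if a[i] <= a[j]:              # <= keeps the sort stable
                tmp[k] = a[i]; i += 1
            else:
                tmp[k] = a[j]; j += 1
            k += 1
        while i < mid:
            tmp[k] = a[i]; i += 1; k += 1
        while j < hi:
            tmp[k] = a[j]; j += 1; k += 1
        a[lo:hi] = tmp[lo:hi]

    sort(0, len(a))
    return a
```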

• Analysis of mergesort. Mergesort generates almost exactly the recurrence relations shown before: T(1) = 1; T(N) = 2T(N/2) + N - 1, for N > 1. Thus, mergesort is O(N log N).

• Comparison-based sorting. Recall that these are all examples of comparison-based sorting algorithms: items are stored in an array; items can be moved around in the array; any two array elements can be compared. A comparison has 3 possible outcomes: <, =, >.

• Non-comparison-based sorting. If we can do more than just compare pairs of elements, we can sometimes sort more quickly. Two simple examples are bucket sort and radix sort.

• Bucket Sort

• Bucket sort. In addition to comparing pairs of elements, we require these additional restrictions: all elements are non-negative integers, and all elements are less than a predetermined maximum value.

• Bucket sort: worked example (slide figure).

• Bucket sort characteristics. Runs in O(N) time. It is easy to implement each bucket as a linked list. It is stable: if two elements (A, B) are equal with respect to sorting, and they appear in the input in order (A, B), then they remain in the same order in the output.
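
A sketch under the restrictions above (non-negative integers below a known maximum); each bucket is a Python list rather than a linked list:

```python
def bucket_sort(a, max_value):
    """Bucket sort for non-negative integers < max_value.
    Stable: appending preserves the input order of equal keys.
    Runs in O(N + max_value) time."""
    buckets = [[] for _ in range(max_value)]
    for x in a:                 # distribute: O(N)
        buckets[x].append(x)
    out = []
    for b in buckets:           # collect in key order
        out.extend(b)
    return out
```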

• Radix sort. Another sorting algorithm that goes beyond comparison is radix sort. Example: the 3-bit values 010, 000, 101, 001, 111, 011, 100, 110 (decimal 2, 0, 5, 1, 7, 3, 4, 6), shown in array positions 0-7, are sorted one bit position at a time. Each sorting step must be stable.

• Radix sort characteristics. Each sorting step can be performed via bucket sort, and is thus O(N). If the numbers are all b bits long, then there are b sorting steps. Hence, radix sort is O(bN). Also, radix sort can be implemented in-place (just like quicksort).
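
A sketch of least-significant-digit radix sort on b-bit integers, where each pass is a stable two-bucket split on one bit (this version is not in-place, unlike the variant mentioned on the slide):

```python
def radix_sort(a, bits):
    """LSD radix sort for non-negative integers of at most `bits` bits.
    Each pass is a stable bucket split on one bit: O(N) per pass,
    O(bN) overall."""
    for bit in range(bits):
        zeros = [x for x in a if not (x >> bit) & 1]
        ones = [x for x in a if (x >> bit) & 1]
        a = zeros + ones        # stable: relative order preserved
    return a
```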

• Not just for binary numbers. Radix sort can be used for decimal numbers and alphanumeric strings. Example inputs: 032, 224, 016, 015, 031, 169, 123, 252.

• Why comparison-based? Bucket and radix sort are much faster than any comparison-based sorting algorithm. Unfortunately, we can't always live with the restrictions imposed by these algorithms. In such cases, comparison-based sorting algorithms give us general solutions.

• Back to Quick Sort

• Review: Quicksort algorithm. If array A has 1 (or 0) elements, then done. Choose a pivot element x from A. Divide A-{x} into two arrays: B = {y ∈ A-{x} | y ≤ x} and C = {y ∈ A-{x} | y ≥ x}. Quicksort arrays B and C. The result is B + {x} + C.
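
A direct, non-in-place transcription of this description (one liberty taken: elements equal to the pivot go into B only, to avoid duplicating them across both arrays):

```python
def quicksort(a):
    """Recursive quicksort, following the slide's B/C description.
    Not in-place; the pivot choice here (first element) is arbitrary."""
    if len(a) <= 1:
        return a
    x = a[0]                                  # pivot
    rest = a[1:]
    B = [y for y in rest if y <= x]           # ties go into B
    C = [y for y in rest if y > x]
    return quicksort(B) + [x] + quicksort(C)
```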

• Implementation issues. Quicksort can be very fast in practice, but this depends on careful coding. Three major issues: doing quicksort in-place; picking the right pivot; avoiding quicksort on small arrays.

• 1. Doing quicksort in place. Example: 85 24 63 50 17 31 96 45 becomes 85 24 63 45 17 31 96 50.

• 1. Doing quicksort in place. After partitioning: 31 24 17 45 50 85 96 63.

• 2. Picking the pivot. In real life, inputs to a sorting routine are often partially sorted (why does this happen?). So picking the first or last element as the pivot is usually a bad choice. One common strategy is to pick the middle element; this is an OK strategy.

• 2. Picking the pivot. A more sophisticated approach is to use random sampling (think about opinion polls). For example, the median-of-three strategy: take the median of the first, middle, and last elements as the pivot.

• 3. Avoiding small arrays. While quicksort is extremely fast for large arrays, experimentation shows that it performs less well on small arrays. For small enough arrays, a simpler method such as insertion sort works better. The exact cutoff depends on the language and machine, but is usually somewhere between 10 and 30 elements.
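
The three issues can be combined into one sketch: an in-place partition, a median-of-three pivot, and an insertion-sort cutoff. The cutoff value of 10 is an assumption within the 10-30 range quoted above, and all names are mine:

```python
CUTOFF = 10  # assumed; slides say the right value is usually 10-30

def insertion_sort(a, lo, hi):
    """Insertion sort of a[lo..hi] inclusive, for small subarrays."""
    for i in range(lo + 1, hi + 1):
        x, j = a[i], i
        while j > lo and a[j - 1] > x:
            a[j] = a[j - 1]; j -= 1
        a[j] = x

def median_of_three(a, lo, hi):
    """Sort a[lo], a[mid], a[hi]; stash the median (pivot) at hi-1."""
    mid = (lo + hi) // 2
    if a[mid] < a[lo]: a[lo], a[mid] = a[mid], a[lo]
    if a[hi] < a[lo]: a[lo], a[hi] = a[hi], a[lo]
    if a[hi] < a[mid]: a[mid], a[hi] = a[hi], a[mid]
    a[mid], a[hi - 1] = a[hi - 1], a[mid]
    return a[hi - 1]

def quicksort(a, lo=0, hi=None):
    """In-place quicksort with median-of-three pivot and small-array
    cutoff; a[lo] and a[hi-1] act as sentinels for the scans."""
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= CUTOFF:           # 3. small arrays: insertion sort
        insertion_sort(a, lo, hi)
        return
    pivot = median_of_three(a, lo, hi)  # 2. median-of-three pivot
    i, j = lo, hi - 1                   # 1. in-place partition
    while True:
        i += 1
        while a[i] < pivot: i += 1      # both scans stop on equal keys
        j -= 1
        while a[j] > pivot: j -= 1
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]
    a[i], a[hi - 1] = a[hi - 1], a[i]   # restore pivot to its place
    quicksort(a, lo, i - 1)
    quicksort(a, i + 1, hi)
```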

• Putting it all together. Example: 85 24 63 50 17 31 96 45 becomes 85 24 63 45 17 31 96 50.

• Putting it all together. After partitioning: 31 24 17 45 50 85 96 63.

• A complication! What should happen if we encounter an element that is equal to the pivot? Four possibilities: L stops, R keeps going; R stops, L keeps going; L and R stop; L and R keep going.

• Quiz Break

• Red-green quiz. What should happen if we encounter an element that is equal to the pivot? Four possibilities: L stops, R keeps going; R stops, L keeps going; L and R stop; L and R keep going. Explain why your choice is the only reasonable one.

• Quick Sort Analysis

• Worst-case behavior. Example: on an already-sorted input (5, 13, 17, 19), a bad pivot choice makes every split maximally unbalanced.

• Best-case analysis. In the best case, the pivot is always the median element, so the splits are always down the middle. Hence, the same behavior as mergesort: O(N log N).

• Average-case analysisConsider the quicksort tree:

• Average-case analysis. The time spent at each level of the tree is O(N). So, on average, how many levels? That is, what is the expected height of the tree? If on average there are O(log N) levels, then quicksort is O(N log N) on average.

• Expected height of qsort tree. Assume that the pivot is chosen randomly. When is a pivot good? Bad? The probability of a good pivot is 0.5. After a good pivot, each child is at most 3/4 the size of its parent.

• Expected height of qsort tree. So, if we descend k levels in the tree, each time being lucky enough to pick a good pivot, the maximum size of the k-th child is N(3/4)(3/4)...(3/4) (k times) = N(3/4)^k. But on average only half of the pivots will be good, so the expected height is about 2 log_{4/3} N = O(log N).

• Summary of quicksort. A fast sorting algorithm in practice. Can be implemented in-place. But is O(N^2) in the worst case. O(N log N) average-case performance.

• Lower Bound for the Sorting Problem

• How fast can we sort? We have seen several sorting algorithms with O(N log N) running time. In fact, Ω(N log N) is a general lower bound for the sorting problem. A proof appears in Weiss. Informally:

• Upper and lower bounds. T(N) = O(f(N)) means T(N) ≤ c·f(N) for some constant c and all large N; T(N) = Ω(g(N)) means T(N) ≥ d·g(N) for some constant d and all large N. (The slide plots the curves c·f(N), T(N), and d·g(N).)

• Decision tree for sorting. The tree has N! leaves, so it has height at least log(N!), and log(N!) = Θ(N log N).
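
The estimate log(N!) = Θ(N log N) follows by bounding the sum of logarithms, keeping only the top half of the terms for the lower bound:

```latex
\log_2(N!) = \sum_{i=1}^{N} \log_2 i
\;\ge\; \sum_{i=N/2}^{N} \log_2 i
\;\ge\; \frac{N}{2}\,\log_2\frac{N}{2} = \Omega(N \log N),
\qquad
\log_2(N!) \le N \log_2 N = O(N \log N).
```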

• Summary on sorting bound. If we are restricted to comparisons on pairs of elements, then the general lower bound for sorting is Ω(N log N). A decision tree is a representation of the possible comparisons required to solve a problem.

• External Sorting

• External sorting. In many real-world situations, the amount of data to be sorted is much more than can be stored in memory. So it is important in some cases to use algorithms that work well when sorting data stored externally. See tomorrow's recitation.

• World's Fastest Sorters

• Sorting competitions. There are several world-wide sorting competitions. Unix CoSort has achieved 1 GB in under one minute, on a single Alpha. Berkeley's NOW-sort sorted 8.4 GB of disk data in under one minute, using a network of 95 workstations. Sandia Labs was able to sort 1 TB of data in under 50 minutes, using a 144-node multiprocessor machine.