CS 3343: Analysis of Algorithms
Lecture 14: Order Statistics
Order statistics
• The ith order statistic of a set of n elements is the ith smallest element
• The minimum is thus the 1st order statistic
• The maximum is the nth order statistic
• The median is the order statistic at position about n/2: the ((n+1)/2)th when n is odd
• If n is even, there are 2 medians: the (n/2)th and the (n/2 + 1)th order statistics
• How can we calculate order statistics?
• What is the running time?
Order statistics – selection problem
• Select the ith smallest of n elements
• Naive algorithm: sort, then return the ith element.
  – Worst-case running time Θ(n log n) using merge sort or heapsort (not quicksort, whose worst case is Θ(n²)).
• We will show:
  – A practical randomized algorithm with Θ(n) expected running time
  – A cool algorithm of theoretical interest only, with Θ(n) worst-case running time
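The naive approach can be sketched in a few lines of Python. The function name select_by_sorting is made up for this illustration; it relies on Python's built-in sorted, which runs in O(n log n) time in the worst case.

```python
def select_by_sorting(items, i):
    """Return the i-th smallest element (i is 1-indexed) by sorting first."""
    if not 1 <= i <= len(items):
        raise ValueError("i must be between 1 and len(items)")
    # sorted() returns a new list in ascending order; the i-th order
    # statistic then sits at index i - 1.
    return sorted(items)[i - 1]


data = [7, 10, 5, 8, 11, 3, 2, 13]
print(select_by_sorting(data, 1))          # minimum
print(select_by_sorting(data, len(data)))  # maximum
```

This is the baseline the two selection algorithms in this lecture improve on: sorting does more work than needed, since it orders all n elements to answer a question about one of them.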
Recall: Quicksort
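• The function Partition gives us the rank k of the pivot
• If we are lucky, k = i. Done!
• If not, we at least get a smaller subarray to work with
  – k > i: the ith smallest is in the left subarray
  – k < i: the ith smallest is in the right subarray, where it is the (i – k)th smallest
• Divide and conquer
  – If we are lucky, k is close to n/2, or the desired element is in the smaller subarray
  – If unlucky, the desired element is in the larger subarray (possible size n – 1)
[Figure: array A[p . . q] after Partition, with elements ≤ x to the left of the pivot x at index r and elements ≥ x to its right; k = r – p + 1 is the rank of x.]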
Randomized divide-and-conquer algorithm
RAND-SELECT(A, p, q, i)    ⊳ ith smallest of A[p . . q]
    if p = q and i > 1 then error!
    r ← RAND-PARTITION(A, p, q)
    k ← r – p + 1    ⊳ k = rank(A[r])
    if i = k then return A[r]
    if i < k
        then return RAND-SELECT(A, p, r – 1, i)
        else return RAND-SELECT(A, r + 1, q, i – k)
[Figure: A[p . . q] after partitioning, with elements ≤ A[r] in positions p . . r – 1 and elements ≥ A[r] in positions r + 1 . . q; k = r – p + 1.]
Randomized Partition
• Randomly choose an element as the pivot
  – Every time we need to do a partition, throw a die to decide which element to use as the pivot
  – Each of the n elements has probability 1/n of being selected

Rand-Partition(A, p, q) {
    d = random();                        // random number in [0, 1)
    index = p + floor((q - p + 1) * d);  // p <= index <= q
    swap(A[p], A[index]);                // move the chosen pivot to the front
    return Partition(A, p, q);           // partition around A[p]; return the pivot's final index
}
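The two routines above can be put together as a runnable Python sketch. The slides do not spell out Partition, so the partition function below is an assumption: a simple scheme that uses the first element as the pivot and returns the pivot's final index. Indices p and q are 0-based here, while i stays 1-based as in the pseudocode.

```python
import random


def partition(a, p, q):
    """Partition a[p..q] around the pivot a[p]; return the pivot's final index.

    Assumed scheme (not given in the slides): elements <= pivot are swapped
    toward the front, then the pivot is placed between the two parts.
    """
    x = a[p]
    i = p
    for j in range(p + 1, q + 1):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]
    return i


def rand_partition(a, p, q):
    """Pick a uniformly random pivot in a[p..q], then partition around it."""
    index = random.randint(p, q)  # randint is inclusive on both ends
    a[p], a[index] = a[index], a[p]
    return partition(a, p, q)


def rand_select(a, p, q, i):
    """Return the i-th smallest element of a[p..q] (i is 1-indexed).

    Rearranges a in place.
    """
    if not 1 <= i <= q - p + 1:
        raise ValueError("i out of range for a[p..q]")
    r = rand_partition(a, p, q)
    k = r - p + 1  # rank of the pivot within a[p..q]
    if i == k:
        return a[r]
    if i < k:
        return rand_select(a, p, r - 1, i)
    return rand_select(a, r + 1, q, i - k)


data = [7, 10, 5, 8, 11, 3, 2, 13]
print(rand_select(list(data), 0, len(data) - 1, 6))
```

The call passes a copy of data because the function reorders its input. The recursion is kept to mirror the pseudocode; a loop would avoid deep recursion in an unlucky run.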
Example
Select the i = 6th smallest:
    7 10 5 8 11 3 2 13        pivot = 7
Partition:
    3 2 5 7 11 8 10 13        k = 4
Select the 6 – 4 = 2nd smallest recursively.

Complete example: select the 6th smallest element.
    7 10 5 8 11 3 2 13        i = 6
    3 2 5 7 11 8 10 13        k = 4, so continue in the right part with i = 6 – 4 = 2
    10 8 11 13                right part 11 8 10 13 after partitioning around 11: k = 3, i = 2 < k
    8 10                      left part 10 8 after partitioning around 10: k = 2, i = 2 = k
    10                        the 6th smallest element
Note: here we always used the first element as the pivot to do the partition (instead of Rand-Partition).
Intuition for analysis
Lucky:
    T(n) = T(9n/10) + Θ(n)
    n^(log_{10/9} 1) = n^0 = 1, so CASE 3 of the master theorem applies
    T(n) = Θ(n)
Unlucky:
    T(n) = T(n – 1) + Θ(n)    (arithmetic series)
    T(n) = Θ(n²)
    Worse than sorting!
(All our analyses today assume that all elements are distinct.)
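Unrolling the two recurrences shows where the bounds come from. This is a sketch that writes the Θ(n) term as cn for a hypothetical constant c > 0 and ignores rounding:

```latex
\begin{align*}
\text{Lucky:}\quad T(n) &\le cn + c\,\tfrac{9}{10}\,n + c\left(\tfrac{9}{10}\right)^{2} n + \cdots
  \le cn \sum_{j=0}^{\infty} \left(\tfrac{9}{10}\right)^{j} = 10\,cn = O(n) \\
\text{Unlucky:}\quad T(n) &= cn + c(n-1) + \cdots + c \cdot 1
  = c\,\frac{n(n+1)}{2} = \Theta(n^{2})
\end{align*}
```

The lucky case is a geometric series dominated by its first term; the unlucky case is an arithmetic series, which is what makes it quadratic.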
Running time of randomized selection
• For upper bound, assume ith element always falls in larger side of partition
• The expected running time is an average of all cases
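\[
T(n) \le
\begin{cases}
T(\max(0,\, n-1)) + n & \text{if } 0 : n-1 \text{ split,} \\
T(\max(1,\, n-2)) + n & \text{if } 1 : n-2 \text{ split,} \\
\quad\vdots \\
T(\max(n-1,\, 0)) + n & \text{if } n-1 : 0 \text{ split.}
\end{cases}
\]

Expectation:

\[
T(n) \le \frac{1}{n} \sum_{k=0}^{n-1} T(\max(k,\, n-k-1)) + n
\]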
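Substitution method
Want to show T(n) = O(n), so we need to prove T(n) ≤ cn for n > n0.
Assume: T(k) ≤ ck for all k < n.

\[
\begin{aligned}
T(n) &\le \frac{1}{n} \sum_{k=0}^{n-1} T(\max(k,\, n-k-1)) + n \\
     &\le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} T(k) + n \\
     &\le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} ck + n \\
     &\le \frac{2c}{n} \cdot \frac{3n^2}{8} + n
        && \text{since } \sum_{k=\lfloor n/2 \rfloor}^{n-1} k \le \frac{3n^2}{8} \\
     &= \frac{3}{4}\,cn + n = cn - \left(\frac{cn}{4} - n\right) \\
     &\le cn && \text{if } c \ge 4
\end{aligned}
\]

Therefore, T(n) = O(n).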
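The bound T(n) ≤ 4n can also be checked numerically. The sketch below treats the expectation recurrence as an equality and assumes the base value T(0) = 0, which the slides do not state; the function name expected_time_bound is made up for this illustration.

```python
def expected_time_bound(n_max):
    """Tabulate T(n) = n + (1/n) * sum_{k=0}^{n-1} T(max(k, n-k-1)).

    Assumption: T(0) = 0. Returns the list T[0..n_max].
    """
    T = [0.0] * (n_max + 1)
    for n in range(1, n_max + 1):
        total = sum(T[max(k, n - k - 1)] for k in range(n))
        T[n] = n + total / n
    return T


T = expected_time_bound(300)
# The substitution proof with c = 4 predicts T(n) <= 4n for every n >= 1.
assert all(T[n] <= 4 * n for n in range(1, 301))
print(T[300] / 300)  # ratio T(n)/n for the largest n computed
```

A check like this does not replace the induction, but it is a quick way to catch an arithmetic slip in the constant.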
Summary of randomized selection
• Works fast: linear expected time.
• Excellent algorithm in practice.
• But the worst case is very bad: Θ(n²).

Q. Is there an algorithm that runs in linear time in the worst case?
A. Yes, due to Blum, Floyd, Pratt, Rivest, and Tarjan [1973].
IDEA: Generate a good pivot recursively.
Worst-case linear-time selection
SELECT(i, n)
1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
2. Recursively SELECT the median x of the ⌊n/5⌋ group medians to be the pivot.
3. Partition around the pivot x. Let k = rank(x).
4. if i = k then return x
   elseif i < k
       then recursively SELECT the ith smallest element in the lower part
       else recursively SELECT the (i – k)th smallest element in the upper part
(Step 4 is the same as in RAND-SELECT.)
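The four steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the lecture's implementation: it builds new lists instead of partitioning in place, it assumes all elements are distinct, it handles inputs of at most 5 elements by sorting directly, and leftover elements beyond the last full group of 5 do not contribute a group median.

```python
def select(items, i):
    """Return the i-th smallest (1-indexed) of a list of distinct values."""
    n = len(items)
    if not 1 <= i <= n:
        raise ValueError("i out of range")
    if n <= 5:
        # Base case (an assumption of this sketch): solve small inputs by rote.
        return sorted(items)[i - 1]

    # Step 1: split into floor(n/5) full groups of 5 and take each median.
    groups = [items[j:j + 5] for j in range(0, 5 * (n // 5), 5)]
    medians = [sorted(g)[2] for g in groups]

    # Step 2: recursively select the median x of the group medians.
    x = select(medians, (len(medians) + 1) // 2)

    # Step 3: partition around x; k is the rank of x.
    lower = [v for v in items if v < x]
    upper = [v for v in items if v > x]
    k = len(lower) + 1

    # Step 4: return x or recurse into the part that contains the answer.
    if i == k:
        return x
    if i < k:
        return select(lower, i)
    return select(upper, i - k)


print(select([7, 10, 5, 8, 11, 3, 2, 13], 6))
```

Building the lower and upper lists costs linear time and space per call, which keeps the sketch short; an in-place partition would follow the pseudocode more closely.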
Choosing the pivot
1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
2. Recursively SELECT the median x of the ⌊n/5⌋ group medians to be the pivot.
[Figure: the n elements arranged in columns of 5, each column ordered from lesser to greater, with the group medians in the middle row and the pivot x marked.]
Analysis
(Assume all elements are distinct.)
[Figure: the same arrangement of groups, ordered from lesser to greater, with the pivot x marked.]
• At least half the group medians are ≤ x, which is at least ⌊⌊n/5⌋/2⌋ = ⌊n/10⌋ group medians.
• Therefore, at least 3⌊n/10⌋ elements are ≤ x, since each such median has 2 smaller elements in its own group.
• Similarly, at least 3⌊n/10⌋ elements are ≥ x.
• At least 3⌊n/10⌋ elements are ≤ x ⇒ at most n – 3⌊n/10⌋ elements are ≥ x
• At least 3⌊n/10⌋ elements are ≥ x ⇒ at most n – 3⌊n/10⌋ elements are ≤ x
• The recursive call to SELECT in Step 4 is executed recursively on at most n – 3⌊n/10⌋ elements.
Need "at most" for worst-case runtime.
[Figure: the n elements in sorted order, with 3⌊n/10⌋ elements marked at each end; the possible positions for the pivot lie between them.]
• Use the fact that ⌊a/b⌋ > a/b – 1
• n – 3⌊n/10⌋ < n – 3(n/10 – 1) = 7n/10 + 3 ≤ 3n/4 if n ≥ 60
• The recursive call to SELECT in Step 4 is executed recursively on at most 7n/10 + 3 elements.
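Both inequalities are easy to spot-check with integer arithmetic. The helper name max_recursive_size is made up for this sketch; the comparisons are multiplied through by 10 and by 4 to avoid fractions.

```python
def max_recursive_size(n):
    """Upper bound n - 3*floor(n/10) on the size of the Step 4 subproblem."""
    return n - 3 * (n // 10)


# n - 3*floor(n/10) < 7n/10 + 3, written as 10 * lhs < 7n + 30.
assert all(10 * max_recursive_size(n) < 7 * n + 30 for n in range(1, 10001))

# 7n/10 + 3 <= 3n/4, written as 4 * (7n + 30) <= 30n; holds from n = 60 on.
assert all(4 * (7 * n + 30) <= 30 * n for n in range(60, 10001))

# The threshold is tight: the second inequality fails at n = 59.
assert not 4 * (7 * 59 + 30) <= 30 * 59

print(max_recursive_size(100))
```

The second inequality simplifies to 3 ≤ n/20, which is where the condition n ≥ 60 comes from.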
Developing the recurrence
SELECT(i, n)                                                                      T(n)
1. Divide the n elements into groups of 5. Find the median
   of each 5-element group by rote.                                               Θ(n)
2. Recursively SELECT the median x of the ⌊n/5⌋ group medians
   to be the pivot.                                                               T(n/5)
3. Partition around the pivot x. Let k = rank(x).                                 Θ(n)
4. if i = k then return x
   elseif i < k
       then recursively SELECT the ith smallest element in the lower part
       else recursively SELECT the (i – k)th smallest element in the upper part   T(7n/10 + 3)

T(n) = T(n/5) + T(7n/10 + 3) + Θ(n)
Solving the recurrence
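Assumption: T(k) ≤ ck for all k < n

\[
\begin{aligned}
T(n) &\le c\,(n/5) + c\,(7n/10 + 3) + n \\
     &\le cn/5 + 3cn/4 + n && \text{if } n \ge 60 \\
     &= 19cn/20 + n \\
     &= cn - (cn/20 - n) \\
     &\le cn && \text{if } c \ge 20 \text{ and } n \ge 60
\end{aligned}
\]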
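The claim T(n) ≤ 20n can be spot-checked by tabulating the recurrence. Two choices in this sketch are assumptions, not part of the slides: the Θ(n) term is taken to be exactly n, and the base case is T(n) = n for n < 60. Floors are applied so the arguments are integers.

```python
def worst_case_bound(n_max):
    """Tabulate T(n) = T(floor(n/5)) + T(floor(7n/10) + 3) + n for n >= 60.

    Assumed base case: T(n) = n for n < 60. Returns the list T[0..n_max].
    """
    T = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        if n < 60:
            T[n] = n
        else:
            # For n >= 60 both arguments are smaller than n, so they are
            # already in the table.
            T[n] = T[n // 5] + T[(7 * n) // 10 + 3] + n
    return T


T = worst_case_bound(20000)
assert all(T[n] <= 20 * n for n in range(1, 20001))
print(T[20000] / 20000)  # ratio T(n)/n for the largest n computed
```

With a different constant in the Θ(n) term or a different base case, the constant c in the bound changes, but the linear shape of the bound does not.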
Conclusions
• Since the work at each level of recursion is basically a constant fraction (19/20) smaller, the work per level is a geometric series dominated by the linear work at the root.
• In practice, this algorithm runs slowly, because the constant in front of n is large.
• The randomized algorithm is far more practical.
Exercise: Try to divide into groups of 3 or 7.
Exercise: Think about an application in sorting.