7. Sorting and Order-Statistics - University of Regina


7. Sorting and Order-Statistics

• 7.1 Introduction
• 7.2 Sorting methods and analysis
  – Insertion sort
  – Heapsort
  – Mergesort
  – Quicksort
  – Bucketsort and radix sort
• 7.3 A general lower bound for sorting
• 7.4 External sorting
• 7.5 Order statistics

Malek Mouhoub, CS340 Fall 2002


7.1 Introduction

The sorting problem is defined as follows:

Input: a sequence of n elements $(a_1, a_2, \ldots, a_n)$.

Output: a permutation $(a'_1, a'_2, \ldots, a'_n)$ of the initial sequence, sorted according to an ordering relation $\le$: $a'_1 \le a'_2 \le \cdots \le a'_n$.

Example: (8,1,6,3,6,4) → Sorting Algorithm → (1,3,4,6,6,8)


7.2 Sorting methods

Insertion sort: $O(N^2)$ in the worst case.

Heapsort: $O(N \log N)$ in the worst case.

Divide-and-conquer algorithms:
  Mergesort: $O(N \log N)$, but does not sort in place.
  Quicksort: $O(N^2)$ in the worst case but $O(N \log N)$ in the average case.

When extra information is available:
• Bucketsort: elements are positive integers smaller than $M$: $O(M + N)$.


Insertion Sort

• Efficient for a small number of values.
• The intuition behind this algorithm is the way card players sort a hand of cards (in Bridge or Tarot).
  – We generally start with an empty left hand; each time we take a card, we place it in the correct position by comparing it with the cards already in the hand.
• Consists of $N - 1$ passes. For each pass $p$ ($1 \le p \le N - 1$), insertion sort ensures that the elements in positions 0 through $p$ are in sorted order.
• Best case: presorted elements. $O(N)$
• Worst case: elements in reverse order. $O(N^2)$
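The passes described above can be sketched in C++ as follows. This is a minimal illustration, not code from the slides; the function name `insertionSort` and the single-argument interface are assumptions made for the example.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sorts a in increasing order.
// After pass p, the elements in positions 0 through p are sorted.
template <class Comp>
void insertionSort(std::vector<Comp> &a)
{
    for (std::size_t p = 1; p < a.size(); ++p) {
        Comp tmp = std::move(a[p]);      // the "card" being inserted
        std::size_t j = p;
        // Shift every larger element one cell to the right.
        for (; j > 0 && tmp < a[j - 1]; --j)
            a[j] = std::move(a[j - 1]);
        a[j] = std::move(tmp);
    }
}
```

On presorted input the inner loop never shifts anything, which gives the linear best case; on reverse-ordered input pass p shifts p elements, which gives the quadratic worst case.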


Heapsort

1st Method

1. Build a binary heap ($O(N)$).
2. Perform $N$ deleteMin operations, copying each deleted element into a second array, and then copy the array back ($O(N \log N)$).

⇒ Wasted space: an extra array is needed.


2nd Method

• Avoid using a second array: after each deleteMin, the cell that was last in the heap can be used to store the element that was just deleted.
• After the last deleteMin the array will contain the elements in decreasing order.
• We can change the ordering property (max heap) if we want the elements in increasing order.
• $O(N \log N)$ time complexity. Why?
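The second method can be sketched as below, using a max heap so that the result is in increasing order. This is an illustrative sketch rather than the course's code: the names `percolateDown` and `heapsort` and the 0-based heap layout (children of cell i at 2i+1 and 2i+2) are assumptions.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Restore the max-heap property for the subtree rooted at i,
// looking only at the first n cells of a.
template <class Comp>
void percolateDown(std::vector<Comp> &a, std::size_t i, std::size_t n)
{
    Comp tmp = std::move(a[i]);
    for (std::size_t child; 2 * i + 1 < n; i = child) {
        child = 2 * i + 1;                        // left child
        if (child + 1 < n && a[child] < a[child + 1])
            ++child;                              // right child is larger
        if (tmp < a[child])
            a[i] = std::move(a[child]);           // move the child up
        else
            break;
    }
    a[i] = std::move(tmp);
}

template <class Comp>
void heapsort(std::vector<Comp> &a)
{
    const std::size_t n = a.size();
    for (std::size_t i = n / 2; i-- > 0; )        // buildHeap: O(N)
        percolateDown(a, i, n);
    for (std::size_t last = n; last-- > 1; ) {    // N - 1 deleteMax operations
        std::swap(a[0], a[last]);                 // max goes into the freed cell
        percolateDown(a, 0, last);                // O(log N) each
    }
}
```

The build phase is linear and each of the N - 1 deleteMax steps costs at most logarithmic time, which is where the overall bound comes from.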


[Figure: a max heap drawn as a tree and as an array with cells indexed 0 to 10. Before: 97 53 59 26 41 58 31. After the first deleteMax: 59 53 58 26 41 31, with 97 stored in the cell that was last in the heap.]


Mergesort

Recursive algorithm:

• If $N = 1$, there is only one element to sort.
• Otherwise, recursively mergesort the first half and the second half. Merge the two sorted halves using the merging algorithm.
• Merging two sorted lists can be done in one pass through the input, if the output is put in a third list. At most $N - 1$ comparisons are made.
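A compact sketch of the recursive algorithm is given below. It is only an illustration under stated assumptions: the names `mergeRuns` and `mergeSort` are hypothetical, and the "third list" is a temporary vector `tmp` of the same size as the input, which requires the element type to be default-constructible.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Merge the sorted runs a[left..center] and a[center+1..right] through tmp.
template <class Comp>
void mergeRuns(std::vector<Comp> &a, std::vector<Comp> &tmp,
               std::size_t left, std::size_t center, std::size_t right)
{
    std::size_t i = left, j = center + 1, k = left;
    while (i <= center && j <= right) {           // one comparison per output
        if (a[j] < a[i]) tmp[k++] = std::move(a[j++]);
        else             tmp[k++] = std::move(a[i++]);
    }
    while (i <= center) tmp[k++] = std::move(a[i++]);   // copy leftovers
    while (j <= right)  tmp[k++] = std::move(a[j++]);
    for (k = left; k <= right; ++k)
        a[k] = std::move(tmp[k]);
}

template <class Comp>
void mergeSort(std::vector<Comp> &a, std::vector<Comp> &tmp,
               std::size_t left, std::size_t right)
{
    if (left >= right) return;                    // one element: nothing to do
    std::size_t center = left + (right - left) / 2;
    mergeSort(a, tmp, left, center);
    mergeSort(a, tmp, center + 1, right);
    mergeRuns(a, tmp, left, center, right);
}

template <class Comp>
void mergeSort(std::vector<Comp> &a)
{
    if (a.empty()) return;
    std::vector<Comp> tmp(a.size());              // extra array: not in place
    mergeSort(a, tmp, 0, a.size() - 1);
}
```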


Analysis of Mergesort

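[Figure: recursion tree for mergesort. The root has size N, its two children N/2, the next level four subproblems of size N/4, and so on down to N leaves of size 1; the tree has log N levels.]

T(N) = 2T(N/2) + cN

Dividing by N and telescoping:

T(N)/N = T(N/2)/(N/2) + c
T(N/2)/(N/2) = T(N/4)/(N/4) + c
T(N/4)/(N/4) = T(N/8)/(N/8) + c
...
T(2)/2 = T(1)/1 + c

Adding up these equations:

T(N)/N = T(1)/1 + c log N
T(N) = cN log N + N = O(N log N)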


The master method

The master method provides a "cookbook" method for solving recurrences of the form

$T(n) = a\,T(n/b) + f(n)$

where $a \ge 1$ and $b > 1$ are constants and $f(n)$ is an asymptotically positive function.

The master theorem

1. If $f(n) = O(n^{\log_b a - \epsilon})$ for some constant $\epsilon > 0$, then $T(n) = \Theta(n^{\log_b a})$.
2. If $f(n) = \Theta(n^{\log_b a})$, then $T(n) = \Theta(n^{\log_b a} \log n)$.
3. If $f(n) = \Omega(n^{\log_b a + \epsilon})$ for some constant $\epsilon > 0$, and if $a f(n/b) \le c f(n)$ for some constant $c < 1$ and all sufficiently large $n$, then $T(n) = \Theta(f(n))$.
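As a worked example, the theorem can be applied to the mergesort recurrence from the previous slide. The constant $c$ is the one used there; nothing else is assumed.

```latex
T(n) = 2\,T(n/2) + cn
\quad\Rightarrow\quad a = 2,\; b = 2,\; f(n) = cn .

n^{\log_b a} = n^{\log_2 2} = n ,
\qquad f(n) = cn = \Theta\!\left(n^{\log_b a}\right) .

\text{Case 2 applies:}\quad
T(n) = \Theta\!\left(n^{\log_b a} \log n\right) = \Theta(n \log n) .
```

This agrees with the bound obtained by telescoping.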


Quicksort

• The Basic Algorithm.
• Quicksort Implementation.
• Quicksort Routines.
• Analysis of Quicksort.


The Basic Algorithm

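Given an array $A[p \,..\, r]$, Quicksort works as follows:

Divide: the array $A[p \,..\, r]$ is divided into two subarrays $A[p \,..\, q-1]$ and $A[q+1 \,..\, r]$, separated by the pivot $A[q]$.

Conquer: the two subarrays are recursively sorted. The recursion stops when a subarray has at most one element.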


QUICKSORT(A, p, r)
1  select a pivot element in A[p .. r]
2  q ← PARTITION(A, p, r, pivot)
3  QUICKSORT(A, p, q − 1)
4  QUICKSORT(A, q + 1, r)

[Figure 1: Quicksort steps illustrated by example. Input set: 13, 81, 92, 43, 31, 65, 57, 26, 75, 0. Select pivot: 65. Partition: small elements 13, 43, 31, 57, 26, 0 and large elements 81, 92, 75. Quicksort small, quicksort large.]

Quicksort Implementation

• Picking the Pivot.
• Partitioning Strategy.


Picking the Pivot

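• A wrong way: choose the first element as the pivot (this gives quadratic time on presorted or reverse-ordered input).
• A safe maneuver: choose the pivot randomly.
• Median-of-Three Partitioning.

Example:

8 1 4 9 6 3 5 2 7 0

The pivot is 6, the median of the left (8), center (6) and right (0) elements.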

Partitioning Strategy

[Figure: partitioning A[p ... r] = 8 1 4 9 0 3 5 2 7 6, with the pivot 6 stored in the last cell; i scans from the left and j from the right.
  1st step:  8 1 4 9 0 3 5 2 7 6  (i stops at 8, j stops at 2)
  1st swap:  2 1 4 9 0 3 5 8 7 6
  2nd swap:  2 1 4 5 0 3 9 8 7 6  (i and j have crossed)
  Last swap: 2 1 4 5 0 3 6 8 7 9  (the pivot is swapped with the element at i)
  Result: A[p ... i-1] holds the small elements, A[i+1 ... r] the large elements.]

Quicksort Routines

• Use median-of-three partitioning.
• Cutoff: use insertion sort for small subarrays (N = 10).


#include <utility>
#include <vector>
using namespace std;

// Return the median of left, center, and right.
// Order these three and hide the pivot at position right - 1.
template <class Comp>
const Comp & median3(vector<Comp> &a, int left, int right)
{
    int center = (left + right) / 2;
    if (a[center] < a[left])
        swap(a[left], a[center]);
    if (a[right] < a[left])
        swap(a[left], a[right]);
    if (a[right] < a[center])
        swap(a[center], a[right]);

    swap(a[center], a[right - 1]);  // Place pivot at position right - 1
    return a[right - 1];
}

[Figure: median-of-three partitioning traced on the example array A[p ... r]. Panels: 1st swap, 2nd swap, 3rd swap, Last swap, Result. Markers: left, center, right, i, j.]

template <class Comp>
void quicksort(vector<Comp> &a, int left, int right)
{
/* 1*/  if (left + 10 <= right) {
/* 2*/      Comp pivot = median3(a, left, right);
/* 3*/      int i = left, j = right - 1;
/* 4*/      for ( ; ; ) {
/* 5*/          while (a[++i] < pivot) { }
/* 6*/          while (pivot < a[--j]) { }
/* 7*/          if (i < j)
/* 8*/              swap(a[i], a[j]);
                else
/* 9*/              break;
            }
/*10*/      swap(a[i], a[right - 1]);    // Restore pivot
/*11*/      quicksort(a, left, i - 1);   // Sort small elements
/*12*/      quicksort(a, i + 1, right);  // Sort large elements
        }
        else  // Do an insertion sort on the subarray
/*13*/      insertionSort(a, left, right);
}

Wrong way of coding. Why?

/* 3*/  int i = left + 1, j = right - 2;
/* 4*/  for ( ; ; ) {
/* 5*/      while (a[i] < pivot) i++;
/* 6*/      while (pivot < a[j]) j--;
/* 7*/      if (i < j)
/* 8*/          swap(a[i], a[j]);
            else
/* 9*/          break;
        }

(If a[i] = a[j] = pivot, neither index advances and the loop swaps the same two elements forever.)

Analysis of Quicksort

[Figure: an array of N elements split by the pivot into a part of size i and a part of size N-i-1.]

T(N) = T(i) + T(N-i-1) + cN

Assumptions:

• Random pivot.
• No cutoff for small arrays.
• T(0) = T(1) = 1.


Worst-case Analysis

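[Figure: the pivot is always an extreme element, so the subproblem sizes are N, N-1, N-2, ..., 2, 1.]

T(N) = T(N-1) + cN
T(N-1) = T(N-2) + c(N-1)
T(N-2) = T(N-3) + c(N-2)
...
T(2) = T(1) + c(2)

Adding up these equations:

T(N) = T(1) + c Σ_{i=2}^{N} i
T(N) = 1 + c (N - 1)(N + 2)/2 = O(N^2)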


Best Case Analysis

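[Figure: the pivot always splits the array in the middle, giving the same recursion tree as mergesort: sizes N, N/2, N/4, ..., 1 over log N levels.]

T(N) = 2T(N/2) + cN

Dividing by N and telescoping:

T(N)/N = T(N/2)/(N/2) + c
T(N/2)/(N/2) = T(N/4)/(N/4) + c
T(N/4)/(N/4) = T(N/8)/(N/8) + c
...
T(2)/2 = T(1)/1 + c

Adding up these equations:

T(N)/N = T(1)/1 + c log N
T(N) = cN log N + N = O(N log N)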

Average-Case Analysis

Assumption:

• The possible sizes of the subarrays have the same probability (1/N, where N is the number of elements of the array).

$T(N) = T(i) + T(N-i-1) + cN$  (1)

On average:

$T(i) = T(N-i-1) = \frac{1}{N} \sum_{j=0}^{N-1} T(j)$  (2)

$T(N) = \frac{2}{N} \sum_{j=0}^{N-1} T(j) + cN$  (3)

$N\,T(N) = 2 \sum_{j=0}^{N-1} T(j) + cN^2$  (4)

To remove the summation we telescope with one more equation:

$(N-1)\,T(N-1) = 2 \sum_{j=0}^{N-2} T(j) + c(N-1)^2$  (5)

(4) − (5) yields:

$N\,T(N) - (N-1)\,T(N-1) = 2T(N-1) + 2cN - c$

Dropping the insignificant term $-c$ and rearranging:

$N\,T(N) = (N+1)\,T(N-1) + 2cN$

$\frac{T(N)}{N+1} = \frac{T(N-1)}{N} + \frac{2c}{N+1}$

$\frac{T(N-1)}{N} = \frac{T(N-2)}{N-1} + \frac{2c}{N}$

$\frac{T(N-2)}{N-1} = \frac{T(N-3)}{N-2} + \frac{2c}{N-1}$

. . .

$\frac{T(2)}{3} = \frac{T(1)}{2} + \frac{2c}{3}$

Adding up these equations:

$\frac{T(N)}{N+1} = \frac{T(1)}{2} + 2c \sum_{i=3}^{N+1} \frac{1}{i}$

$\frac{T(N)}{N+1} = O(\log N)$

$T(N) = O(N \log N)$

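[Figure: recursion tree for quicksort when every partition produces a 9-to-1 split. Level sizes: N; N/10 and 9N/10; N/100, 9N/100, 9N/100 and 81N/100; then 81N/1000, 729N/1000, and so on down to subproblems of size 1. The shallowest leaf is at depth log_10 N and the deepest at depth log_{10/9} N. Each level costs N down to depth log_10 N and at most N below it, so the total is O(N log N).]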

Bucketsort

• General sorting algorithms using only comparisons require $\Omega(N \log N)$ time in the worst case.
• In some special cases it is possible to sort in linear time.
• If the input $A_1, A_2, \ldots, A_N$ consists of only positive integers smaller than $M$, bucket sort can be applied.

1. Keep an array called count, of size $M$ ($M$ buckets), which is initialized to all 0s.
2. When $A_i$ is read, increment count[$A_i$] by 1.
3. After all the input is read, scan the count array, printing out a representation of the sorted list.
4. The algorithm takes $O(M + N)$. If $M$ is $O(N)$, then the total is $O(N)$.
5. Useful algorithm when the input is only small integers.
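Steps 1 to 3 can be sketched as follows. The function name `bucketSort` and the choice to return the sorted list as a new vector (instead of printing it) are assumptions made for the example; the code also assumes every input value v satisfies 0 <= v < M.

```cpp
#include <cstddef>
#include <vector>

// Sorts integers from the range [0, M) in O(M + N) time.
// Assumption: every value v in input satisfies 0 <= v < M.
std::vector<int> bucketSort(const std::vector<int> &input, int M)
{
    // Step 1: one counter per possible value, all initialized to 0.
    std::vector<std::size_t> count(static_cast<std::size_t>(M), 0);

    // Step 2: count the occurrences of each value.
    for (int v : input)
        ++count[static_cast<std::size_t>(v)];

    // Step 3: scan the counters in order and emit each value count[v] times.
    std::vector<int> sorted;
    sorted.reserve(input.size());
    for (int v = 0; v < M; ++v)
        sorted.insert(sorted.end(), count[static_cast<std::size_t>(v)], v);
    return sorted;
}
```

No two elements are ever compared with each other, which is why the comparison lower bound does not apply here.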


Radix sort

• Input: the keys are all nonnegative integers in base 10 with the same number of digits.
• 2 ways to sort the keys:
  – Method 1: sort on the most significant digit first (leftmost digit first). The ith step of the method distributes the keys into distinct piles based on the value of the ith digit from the left.
    ⇒ A variable number of piles is required.
  – Method 2: sort on the least significant digit first. We can use 10 piles (one for each decimal digit).

⇒ $O(N)$ in the best case but $O(N^2)$ in the worst case.
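Method 2 can be sketched as below. This is a minimal illustration, not the course's code: the name `radixSort`, the use of `unsigned` keys, and the `digits` parameter (the common number of decimal digits, assumed to be at most 9 so that the divisor fits in 32 bits) are assumptions.

```cpp
#include <array>
#include <vector>

// Least-significant-digit radix sort for nonnegative integers
// that have at most `digits` decimal digits (assumed <= 9).
void radixSort(std::vector<unsigned> &keys, int digits)
{
    unsigned divisor = 1;
    for (int d = 0; d < digits; ++d, divisor *= 10) {
        // One pile per decimal digit.
        std::array<std::vector<unsigned>, 10> piles;
        // Distribute: keys keep their relative order inside a pile.
        for (unsigned key : keys)
            piles[(key / divisor) % 10].push_back(key);
        // Collect the piles in order 0, 1, ..., 9.
        keys.clear();
        for (const auto &pile : piles)
            keys.insert(keys.end(), pile.begin(), pile.end());
    }
}
```

Each pass must preserve the order produced by the previous passes; appending to the back of a pile and collecting piles front to back does exactly that.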


7.3 A general lower bound for sorting

Prove that any algorithm for sorting that uses only comparisons requires

• $\Omega(N \log N)$ comparisons in the worst case
  ⇒ Mergesort and heapsort are optimal to within a constant factor
• and $\Omega(N \log N)$ comparisons in the average case
  ⇒ Quicksort is optimal on average to within a constant factor


Decision Trees

• A decision tree is an abstraction used to prove lower bounds.
• Every algorithm that sorts by using only comparisons can be represented by a decision tree.
• The number of comparisons used by the sorting algorithm in the worst case is equal to the depth of the deepest leaf.

Lemma 1 Let T be a binary tree of depth d. Then T has at most $2^d$ leaves.

Lemma 2 A binary tree with L leaves must have depth at least $\lceil \log L \rceil$.

Theorem 1 Any sorting algorithm that uses only comparisons between elements requires at least $\lceil \log(N!) \rceil$ comparisons in the worst case.

Theorem 2 Any sorting algorithm that uses only comparisons between elements requires $\Omega(N \log N)$ comparisons.
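The step from Theorem 1 to Theorem 2 can be made explicit with a short calculation. The derivation below is the standard one and takes logarithms in base 2; it is added here as an illustration.

```latex
\log(N!) = \sum_{i=1}^{N} \log i
        \;\ge\; \sum_{i=\lceil N/2 \rceil}^{N} \log i
        \;\ge\; \frac{N}{2} \log \frac{N}{2}
        \;=\; \frac{N}{2} \log N - \frac{N}{2}
        \;=\; \Omega(N \log N)
```

A decision tree for sorting N elements needs at least N! leaves (one per permutation), so by Lemma 2 its depth is at least $\lceil \log(N!) \rceil$, which the calculation shows is $\Omega(N \log N)$.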


7.4 External Sorting

• Most internal sorting algorithms take advantage of the fact that memory is directly addressable
  ⇒ comparing elements is done in a constant number of time units.
• This is not the case if the data is on tape or on a disk.


Model for external sorting

• Sort data stored on tape.
• We assume that at least 3 tape drives are available (otherwise any sorting algorithm will require $\Omega(N^2)$ tape accesses).


The simple algorithm

• Algorithm based on the merge sort principle.
• 4 tapes are used: 2 input and 2 output tapes.
• First step: read M records at a time from the input tape (M is the number of records the main memory can hold), sort the records internally and write the sorted records on one of the output tapes. Read M other records, sort them and write the sorted records on the other tape. Repeat the process until all records are processed.
• Each set of sorted records is called a run.
• The algorithm will require $\lceil \log(N/M) \rceil$ passes, plus the initial run-constructing pass.


Multi-way Merge

• Use 2k tapes: k input tapes and k output tapes.
• The algorithm will require $\lceil \log_k(N/M) \rceil$ passes, plus the initial run-constructing pass.
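The pass counts for the simple algorithm (k = 2) and the multi-way merge can be checked with a small helper that simulates how the number of runs shrinks. The function name `mergePasses` is hypothetical, and the sketch only counts passes; it does not move any data.

```cpp
#include <cstddef>

// Number of merge passes needed after the initial run-constructing pass,
// for N records, a memory that holds M records, and a k-way merge (k >= 2).
std::size_t mergePasses(std::size_t N, std::size_t M, std::size_t k)
{
    std::size_t runs = (N + M - 1) / M;   // ceil(N / M) initial runs
    std::size_t passes = 0;
    while (runs > 1) {
        runs = (runs + k - 1) / k;        // each pass merges groups of k runs
        ++passes;
    }
    return passes;
}
```

For example, 13 records with M = 3 give 5 initial runs, which a 2-way merge reduces to 3, 2, and 1 runs in three passes, while a 3-way merge needs two passes.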


7.5 Order Statistics

• The ith order statistic of a set of n elements is the ith smallest element.
  – The minimum of a set of elements is the first order statistic.
  – The maximum is the nth order statistic.
  – The median is the element in the middle of a sorted list of elements.
• The selection problem consists in selecting the ith order statistic from a set of n distinct numbers.

The selection Problem

• Algorithm 1A: read the elements into an array and sort them, returning the appropriate element.
  ⇒ Assuming a simple sorting algorithm, the running time is $O(N^2)$ ($O(N \log N)$ if mergesort or heapsort is used).
• Algorithm 1B: find the kth largest element
  1. Read k elements into an array and sort them. The smallest of these is in the kth position.
  2. Process the remaining elements one by one. As an element arrives, it is compared with the kth element in the array. If it is larger, then the kth element is removed, and the new element is placed in the correct place among the remaining $k - 1$ elements.
  ⇒ $O(N \cdot k)$ running time. Why?
• If $k = \lceil N/2 \rceil$ then both algorithms are $O(N^2)$. The kth element is known as the median in this case.
• The following algorithms run in $O(N \log N)$ in the extreme case of $k = \lceil N/2 \rceil$.

Algorithm 6A

• Algorithm for finding the kth smallest element
  1. Read $N$ elements into an array.
  2. Apply the buildHeap algorithm to this array.
  3. Perform $k$ deleteMin operations. The last element extracted from the heap is the answer.
• Complexity: $O(N + k \log N)$ in the worst case.
  – If $k = O(N / \log N)$: $O(N)$
  – If $k = \lceil N/2 \rceil$: $\Theta(N \log N)$
  – For large values of $k$: $O(k \log N)$
  – If $k = N$: $O(N \log N)$ (idea of the heapsort).
• By changing the heap-order property, we can solve the problem of finding the kth largest element.

Algorithm 6B

• Find the kth largest element
  1. Same idea as algorithm 1B.
  2. At any point in time, maintain a set $S$ of the $k$ largest elements.
  3. After the first $k$ elements are read, when a new element is read it is compared with the kth largest element, which we denote by $S_k$ ($S_k$ is the smallest element in $S$).
     – If the new element is larger, then it replaces $S_k$ in $S$.
  4. At the end of the input, we find the smallest element in $S$ and return it as the answer.
  ⇒ $O(k + (N - k) \log k) = O(N \log k)$ in the worst case. Why?
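The idea of Algorithm 6B can be sketched with the standard library's heap adapter. This is an illustration under assumptions: the name `kthLargest` is hypothetical, the set S is kept in a `std::priority_queue` configured as a min-heap so that its top is the smallest element of S, and the caller is assumed to pass 1 <= k <= input.size(). Building S by k successive insertions costs O(k log k) rather than the O(k) of buildHeap, which does not change the O(N log k) total.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Returns the kth largest element of input.
// Assumption: 1 <= k <= input.size().
template <class Comp>
Comp kthLargest(const std::vector<Comp> &input, std::size_t k)
{
    // Min-heap S of the k largest elements seen so far; S.top() is S_k.
    std::priority_queue<Comp, std::vector<Comp>, std::greater<Comp>> S;
    for (const Comp &x : input) {
        if (S.size() < k) {
            S.push(x);                // still reading the first k elements
        } else if (S.top() < x) {     // new element is larger than S_k
            S.pop();                  // remove S_k: O(log k)
            S.push(x);                // insert the new element: O(log k)
        }
    }
    return S.top();                   // smallest element of S
}
```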


Using quick sort for Selection

QUICKSELECT(A, p, r, k)
1  select a pivot element in A[p .. r]
2  q ← PARTITION(A, p, r, pivot)
3  If (k < q) then
4     QUICKSELECT(A, p, q − 1, k)
5  Else If (k > q) QUICKSELECT(A, q + 1, r, k)
6  Else return A[q]

$O(N^2)$ in the worst case but $O(N)$ in the average case.

// Returns the kth smallest element of a (k is 1-based and counts from
// the start of the whole array, so the answer ends up in a[k - 1]).
template <class Comp>
Comp quickSelect(vector<Comp> &a, int left, int right, int k)
{
    if (left + 10 <= right) {
        Comp pivot = median3(a, left, right);

        // Begin partitioning
        int i = left, j = right - 1;
        for ( ; ; ) {
            while (a[++i] < pivot) { }
            while (pivot < a[--j]) { }
            if (i < j)
                swap(a[i], a[j]);
            else
                break;
        }
        swap(a[i], a[right - 1]);  // Restore pivot

        // Recurse only into the part that contains the kth element
        if (k <= i)
            return quickSelect(a, left, i - 1, k);
        else if (k > i + 1)
            return quickSelect(a, i + 1, right, k);
        else
            return a[i];           // The pivot is the kth smallest element
    }

    // Do an insertion sort on the subarray
    insertionSort(a, left, right);
    return a[k - 1];
}

Selection in expected linear time

RANDOMIZED-SELECT(A, p, r, i)
1  if p = r
2     then return A[p]
3  q ← RANDOMIZED-PARTITION(A, p, r)
4  k ← q − p + 1
5  If i ≤ k
6     then return RANDOMIZED-SELECT(A, p, q, i)
7     else return RANDOMIZED-SELECT(A, q + 1, r, i − k)
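The pseudocode can be turned into C++ roughly as follows. This is a sketch under stated assumptions, not the course's implementation: the partition routine is a Hoare-style partition that returns q with p <= q < r, so that both sides are nonempty, and the function names, the `int` indices, and the `std::mt19937` generator passed by reference are choices made for the example.

```cpp
#include <random>
#include <utility>
#include <vector>

// Hoare-style partition of a[p..r] (requires p < r) around a random pivot.
// Returns q with p <= q < r such that every element of a[p..q]
// is <= every element of a[q+1..r].
template <class Comp>
int randomizedPartition(std::vector<Comp> &a, int p, int r, std::mt19937 &gen)
{
    std::uniform_int_distribution<int> pick(p, r);
    std::swap(a[p], a[pick(gen)]);     // move a random element to the front
    Comp x = a[p];                     // pivot value
    int i = p - 1, j = r + 1;
    for ( ; ; ) {
        do { --j; } while (x < a[j]);  // scan from the right
        do { ++i; } while (a[i] < x);  // scan from the left
        if (i < j)
            std::swap(a[i], a[j]);
        else
            return j;
    }
}

// Returns the ith smallest element (i is 1-based) of a[p..r].
template <class Comp>
Comp randomizedSelect(std::vector<Comp> &a, int p, int r, int i,
                      std::mt19937 &gen)
{
    if (p == r)
        return a[p];
    int q = randomizedPartition(a, p, r, gen);
    int k = q - p + 1;                 // number of elements on the low side
    if (i <= k)
        return randomizedSelect(a, p, q, i, gen);
    return randomizedSelect(a, q + 1, r, i - k, gen);
}
```

Unlike quicksort, only one side of the partition is processed, which is what brings the expected running time down to linear.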


Selection in average-case linear time

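RANDOMIZED-PARTITION produces a partition whose low side has 1 element with probability $2/n$ and $i$ elements with probability $1/n$ for $i = 2, 3, \ldots, n-1$.

$T(n) \le \frac{1}{n} \left( T(\max(1, n-1)) + \sum_{k=1}^{n-1} T(\max(k, n-k)) \right) + O(n)$

$\le \frac{1}{n} \left( T(n-1) + 2 \sum_{k=\lceil n/2 \rceil}^{n-1} T(k) \right) + O(n)$

$= \frac{2}{n} \sum_{k=\lceil n/2 \rceil}^{n-1} T(k) + O(n)$

The last step holds because $T(n-1) = O(n^2)$ in the worst case, so the term $\frac{1}{n} T(n-1)$ is absorbed by $O(n)$.

The recurrence can be solved by substitution (assuming that $T(n) \le cn$ for some constant $c$): $T(n) \le cn$, so $T(n) = O(n)$.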

Selection in worst-case linear time

Idea of the Select algorithm: guarantee a good split when the array is partitioned.

1. Divide the $n$ elements of the input array into $\lfloor n/5 \rfloor$ groups of 5 elements each and at most one group made up of the remaining $n \bmod 5$ elements.
2. Find the median of each of the $\lceil n/5 \rceil$ groups by insertion sorting the elements of each group and taking its middle element.
3. Use Select recursively to find the median $x$ of the $\lceil n/5 \rceil$ medians found in step 2.
4. Partition the input array around the median-of-medians $x$ using a modified version of the Partition procedure. Let $k$ be the number of elements on the low side of the partition, so that $n - k$ is the number of elements on the high side.
5. Use Select recursively to find the ith smallest element on the low side if $i \le k$, or the $(i - k)$th smallest element on the high side if $i > k$.

Analysis of the Select algorithm

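The number of elements greater than $x$ is at least:

$3 \left( \left\lceil \frac{1}{2} \left\lceil \frac{n}{5} \right\rceil \right\rceil - 2 \right) \ge \frac{3n}{10} - 6$

• if $n \le 80$ then $T(n) = \Theta(1)$
• if $n > 80$ then $T(n) \le T(\lceil n/5 \rceil) + T(7n/10 + 6) + O(n)$

The recurrence can be solved by substitution (assuming that $T(n) \le cn$ for some constant $c$):

$T(n) \le cn$, so $T(n) = O(n)$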
