7. sorting and order-statistics - university of regina
7. Sorting and Order-Statistics
• 7.1 Introduction.
• 7.2 Sorting methods & analysis.
– Insertion Sort.
– Heapsort.
– Mergesort.
– Quicksort.
– Bucketsort and Radix sort.
• 7.3 A general lower bound for sorting
• 7.4 External Sorting
• 7.5 Order statistics
Malek Mouhoub, CS340 Fall 2002 1
7.1 Introduction
The sorting problem is the following:
Input: a sequence of $n$ elements $\langle a_1, a_2, \ldots, a_n \rangle$.
Output: a permutation $\langle a'_1, a'_2, \ldots, a'_n \rangle$ of the initial sequence,
sorted according to an ordering relation $\le$: $a'_1 \le a'_2 \le \cdots \le a'_n$.
Example:
(8,1,6,3,6,4) → Sorting Algorithm → (1,3,4,6,6,8)
7.2 Sorting methods
Insertion sort: $O(N^2)$ in the worst case.
Heapsort: $O(N \log N)$ in the worst case.
Divide-and-conquer algorithms:
Mergesort: $O(N \log N)$, but does not sort in place.
Quicksort: $O(N^2)$ in the worst case but $O(N \log N)$ in the
average case.
When extra information is available:
• Bucketsort: the elements are positive integers smaller than $M$:
$O(M + N)$
Insertion Sort
• Efficient for a small number of values.
• The intuition behind this algorithm is the way card players
sort a hand of cards (in Bridge or Tarot).
– We generally start with an empty left hand; each time we take
a card, we place it in the right position by comparing it with
the cards already in the hand.
• Consists of $N - 1$ passes. For each pass $p$ ($1 \le p \le N - 1$),
insertion sort ensures that the elements in positions 0 through $p$
are in sorted order.
• Best case: presorted elements. $O(N)$
• Worst case: elements in reverse order. $O(N^2)$
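The passes described above can be sketched in C++ as follows. This is a minimal illustration, assuming the elements live in a `std::vector` and support `operator<`; the whole-vector signature is an assumption made here and differs from the three-argument `insertionSort(a, left, right)` that the quicksort routine in these slides calls.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sorts a in increasing order. After pass p, a[0..p] is sorted.
template <class Comparable>
void insertionSort(std::vector<Comparable>& a) {
    for (std::size_t p = 1; p < a.size(); ++p) {
        Comparable tmp = std::move(a[p]);
        std::size_t j = p;
        // Shift every larger element of the sorted prefix one cell right.
        for (; j > 0 && tmp < a[j - 1]; --j)
            a[j] = std::move(a[j - 1]);
        a[j] = std::move(tmp);
    }
}
```

On presorted input the inner loop stops immediately on every pass, which is where the linear best case comes from; on reversed input it runs p times in pass p.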
Heapsort
1st Method
1. Build a binary heap ($O(N)$).
2. Perform $N$ deleteMin operations, copying each deleted element into
a second array, and then copy that array back ($O(N \log N)$).
⇒ Waste of space: an extra array is needed.
Heapsort
2nd Method
• Avoid using a second array: after each deleteMin, the cell
that was last in the heap can be used to store the element that
was just deleted.
• After the last deleteMin, the array will contain the elements
in decreasing order.
• We can change the ordering property (max heap) if we want the
elements in increasing order.
• $O(N \log N)$ time complexity. Why?
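The second method, with a max heap so that the result is in increasing order, might look like the sketch below. It assumes a 0-based heap stored directly in the vector; the helper name `percolateDown` and the index arithmetic are choices made for this illustration, not the course's code.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Restores the max-heap property for the subtree rooted at i,
// looking only at the first n cells of a (0-based heap layout).
template <class Comparable>
void percolateDown(std::vector<Comparable>& a, std::size_t i, std::size_t n) {
    Comparable tmp = std::move(a[i]);
    for (std::size_t child; 2 * i + 1 < n; i = child) {
        child = 2 * i + 1;
        if (child + 1 < n && a[child] < a[child + 1]) ++child;  // larger child
        if (tmp < a[child]) a[i] = std::move(a[child]);
        else break;
    }
    a[i] = std::move(tmp);
}

template <class Comparable>
void heapsort(std::vector<Comparable>& a) {
    const std::size_t n = a.size();
    for (std::size_t i = n / 2; i-- > 0;)        // buildHeap: O(N)
        percolateDown(a, i, n);
    for (std::size_t last = n; last-- > 1;) {    // N - 1 deleteMax operations
        std::swap(a[0], a[last]);                // maximum goes into the freed cell
        percolateDown(a, 0, last);               // O(log N) each
    }
}
```

Each deleteMax costs one percolate down of at most log N levels, and there are N - 1 of them after a linear buildHeap.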
Heapsort
[Figure: a max-heap stored in an array, 97 53 59 26 41 58 31, and the same array after the first deleteMax: the heap shrinks to 59 53 58 26 41 31 and 97 is stored in the freed last cell.]
Mergesort
Recursive algorithm:
• If $N = 1$, there is only one element to sort.
• Otherwise, recursively mergesort the first half and the second
half. Merge together the two sorted halves using the merging
algorithm.
• Merging two sorted lists can be done in one pass through the
input, if the output is put in a third list. At most $N - 1$
comparisons are made.
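A possible C++ rendering of the recursive algorithm and of the one-pass merge is sketched below. The names `mergeRuns` and `mergeSort` and the shared temporary vector are assumptions of this illustration; the temporary vector plays the role of the "third list".

```cpp
#include <cstddef>
#include <vector>

// Merges the sorted runs a[left..mid] and a[mid+1..right] through tmp.
template <class Comparable>
void mergeRuns(std::vector<Comparable>& a, std::vector<Comparable>& tmp,
               std::size_t left, std::size_t mid, std::size_t right) {
    std::size_t i = left, j = mid + 1, k = left;
    while (i <= mid && j <= right)
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];  // ties take the left run
    while (i <= mid) tmp[k++] = a[i++];
    while (j <= right) tmp[k++] = a[j++];
    for (k = left; k <= right; ++k) a[k] = tmp[k];   // copy the merged run back
}

template <class Comparable>
void mergeSort(std::vector<Comparable>& a, std::vector<Comparable>& tmp,
               std::size_t left, std::size_t right) {
    if (left >= right) return;                       // one element: nothing to do
    const std::size_t mid = left + (right - left) / 2;
    mergeSort(a, tmp, left, mid);
    mergeSort(a, tmp, mid + 1, right);
    mergeRuns(a, tmp, left, mid, right);
}

template <class Comparable>
void mergeSort(std::vector<Comparable>& a) {
    if (a.size() < 2) return;
    std::vector<Comparable> tmp(a.size());           // the extra (third) list
    mergeSort(a, tmp, 0, a.size() - 1);
}
```

The extra vector is why mergesort does not sort in place.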
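Analysis of Mergesort
The recursion tree has $\log N$ levels (subproblems of size $N$, $N/2$, $N/4$, ..., $1$), and the recurrence is
$$T(N) = 2T(N/2) + cN, \qquad T(1) = 1$$
Dividing by $N$ and telescoping:
$$\frac{T(N)}{N} = \frac{T(N/2)}{N/2} + c$$
$$\frac{T(N/2)}{N/2} = \frac{T(N/4)}{N/4} + c$$
$$\frac{T(N/4)}{N/4} = \frac{T(N/8)}{N/8} + c$$
$$\vdots$$
$$\frac{T(2)}{2} = \frac{T(1)}{1} + c$$
Adding up all the equations:
$$\frac{T(N)}{N} = \frac{T(1)}{1} + c \log N$$
$$T(N) = cN \log N + N = O(N \log N)$$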
The master method
The master method provides a “cookbook” method for solving recurrences
of the form
$$T(n) = a\,T(n/b) + f(n)$$
where $a \ge 1$ and $b > 1$ are constants and $f(n)$ is an asymptotically
positive function.
The master theorem
1. If $f(n) = O(n^{\log_b a - \epsilon})$ for some constant $\epsilon > 0$, then $T(n) = \Theta(n^{\log_b a})$.
2. If $f(n) = \Theta(n^{\log_b a})$, then $T(n) = \Theta(n^{\log_b a} \log n)$.
3. If $f(n) = \Omega(n^{\log_b a + \epsilon})$ for some constant $\epsilon > 0$, and if $a f(n/b) \le c f(n)$ for
some constant $c < 1$ and all sufficiently large $n$, then $T(n) = \Theta(f(n))$.
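As a quick check of the theorem, the mergesort recurrence $T(N) = 2T(N/2) + cN$ from the previous slide can be matched against it. The worked lines below are an illustration added here, using the theorem's symbols $a$, $b$ and $f(n)$:

```latex
T(n) = 2T(n/2) + cn
\quad\Rightarrow\quad a = 2,\; b = 2,\; f(n) = cn,\;
n^{\log_b a} = n^{\log_2 2} = n

f(n) = \Theta(n^{\log_b a})
\quad\Rightarrow\quad \text{case 2 applies, so } T(n) = \Theta(n \log n)
```

This agrees with the $O(N \log N)$ bound obtained by telescoping.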
Quicksort
• The Basic Algorithm.
• Quicksort Implementation.
• Quicksort Routines.
• Analysis of Quicksort.
Quicksort
The Basic Algorithm
Given an array $A[p..r]$, Quicksort works as follows:
Divide: the array $A[p..r]$ is divided into two nonempty subarrays
$A[p..q-1]$ and $A[q+1..r]$.
Conquer: the two subarrays are recursively sorted.
Quicksort
The Basic Algorithm
QUICKSORT(A, p, r)
1 choose a pivot element from A[p..r]
2 q ← PARTITION(A, p, r, pivot)
3 QUICKSORT(A, p, q − 1)
4 QUICKSORT(A, q + 1, r)
[Figure 1: Quicksort steps illustrated by example. The input 13, 81, 92, 43, 31, 65, 57, 26, 75, 0 goes through four stages: select pivot (65), partition, quicksort small, quicksort large.]
Quicksort Implementation
• Picking the Pivot.
• Partitioning Strategy.
Quicksort Implementation
Picking the Pivot
• A wrong way: choose the first element as the pivot.
• A safe maneuver: choose the pivot randomly.
• Median-of-Three Partitioning: use the median of the left, center and right elements.
Example:
8 1 4 9 6 3 5 2 7 0
The pivot is 6 (the median of 8, 6 and 0).
Partitioning Strategy
[Figure: partitioning A[p..r] around the pivot 6 with two indices i and j.
1st step: 8 1 4 9 0 3 5 2 7 6
1st swap: 2 1 4 9 0 3 5 8 7 6
2nd swap: 2 1 4 5 0 3 9 8 7 6
Last swap (pivot restored): 2 1 4 5 0 3 6 8 7 9
Result: A[p..i−1], pivot, A[i+1..r].]
Quicksort Implementation
Quicksort Routines
• Use median-of-three partitioning.
• Cutoff: use insertion sort for small subarrays (N = 10).
Quicksort Implementation
#include <utility>   // swap
#include <vector>
using namespace std;

// Order a[left], a[center] and a[right], then hide the pivot (their median)
// at position right - 1 and return it.
template <class Comp>
const Comp & median3(vector<Comp> &a, int left, int right)
{
    int center = (left + right) / 2;
    if (a[center] < a[left])
        swap(a[left], a[center]);
    if (a[right] < a[left])
        swap(a[left], a[right]);
    if (a[right] < a[center])
        swap(a[center], a[right]);
    swap(a[center], a[right - 1]);   // Place pivot at position right - 1
    return a[right - 1];
}
[Figure: the median3 routine traced on A[p..r], showing the 1st, 2nd, 3rd and last swaps with the positions left, center and right marked, and the resulting starting positions of i and j.]
template <class Comp>
void quicksort(vector<Comp> &a, int left, int right)
{
/* 1*/  if (left + 10 <= right) {
/* 2*/      Comp pivot = median3(a, left, right);
/* 3*/      int i = left, j = right - 1;
/* 4*/      for ( ; ; ) {
/* 5*/          while (a[++i] < pivot) {}
/* 6*/          while (pivot < a[--j]) {}
/* 7*/          if (i < j)
/* 8*/              swap(a[i], a[j]);
                else
/* 9*/              break;
            }
/*10*/      swap(a[i], a[right - 1]);    // Restore pivot
/*11*/      quicksort(a, left, i - 1);   // Sort small elements
/*12*/      quicksort(a, i + 1, right);  // Sort large elements
        }
        else  // Do an insertion sort on the subarray
/*13*/      insertionSort(a, left, right);
}
Wrong way of coding. Why?
/* 3*/      int i = left + 1, j = right - 2;
/* 4*/      for ( ; ; ) {
/* 5*/          while (a[i] < pivot) i++;
/* 6*/          while (pivot < a[j]) j--;
/* 7*/          if (i < j)
/* 8*/              swap(a[i], a[j]);
                else
/* 9*/              break;
            }
Analysis of Quicksort
The pivot splits an array of $N$ elements into parts of sizes $i$ and $N - i - 1$:
$$T(N) = T(i) + T(N-i-1) + cN$$
Assumptions:
• Random pivot.
• No cutoff for small arrays.
• $T(0) = T(1) = 1$
Analysis of Quicksort
Worst-case Analysis
The pivot is always an extreme element, so the subproblem sizes are $N$, $N-1$, $N-2$, ..., $2$, $1$:
$$T(N) = T(N-1) + cN$$
$$T(N-1) = T(N-2) + c(N-1)$$
$$T(N-2) = T(N-3) + c(N-2)$$
$$\vdots$$
$$T(2) = T(1) + c(2)$$
Adding up all the equations:
$$T(N) = T(1) + c \sum_{i=2}^{N} i$$
$$T(N) = 1 + c\,(N-1)(N+2)/2 = O(N^2)$$
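Analysis of Quicksort
Best Case Analysis
The pivot always falls in the middle, so the recursion tree has $\log N$ levels (subproblems of size $N$, $N/2$, $N/4$, ..., $1$):
$$T(N) = 2T(N/2) + cN$$
Dividing by $N$ and telescoping:
$$\frac{T(N)}{N} = \frac{T(N/2)}{N/2} + c$$
$$\frac{T(N/2)}{N/2} = \frac{T(N/4)}{N/4} + c$$
$$\frac{T(N/4)}{N/4} = \frac{T(N/8)}{N/8} + c$$
$$\vdots$$
$$\frac{T(2)}{2} = \frac{T(1)}{1} + c$$
Adding up all the equations:
$$\frac{T(N)}{N} = \frac{T(1)}{1} + c \log N$$
$$T(N) = cN \log N + N = O(N \log N)$$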
Average-Case Analysis
Assumption:
• The possible sizes of the subarrays have the same probability ($1/N$, where $N$ is the number of elements of the array).
$$T(N) = T(i) + T(N-i-1) + cN \qquad (1)$$
$$T(i) = T(N-i-1) = \frac{1}{N} \sum_{j=0}^{N-1} T(j) \qquad (2)$$
$$T(N) = \frac{2}{N} \sum_{j=0}^{N-1} T(j) + cN \qquad (3)$$
$$N\,T(N) = 2 \sum_{j=0}^{N-1} T(j) + cN^2 \qquad (4)$$
To remove the summation, we telescope with one more equation:
$$(N-1)\,T(N-1) = 2 \sum_{j=0}^{N-2} T(j) + c(N-1)^2 \qquad (5)$$
(4) − (5) yields:
$$N\,T(N) - (N-1)\,T(N-1) = 2T(N-1) + 2cN - c$$
Dropping the insignificant $-c$ and rearranging:
$$N\,T(N) = (N+1)\,T(N-1) + 2cN$$
$$\frac{T(N)}{N+1} = \frac{T(N-1)}{N} + \frac{2c}{N+1}$$
$$\frac{T(N-1)}{N} = \frac{T(N-2)}{N-1} + \frac{2c}{N}$$
$$\frac{T(N-2)}{N-1} = \frac{T(N-3)}{N-2} + \frac{2c}{N-1}$$
$$\vdots$$
$$\frac{T(2)}{3} = \frac{T(1)}{2} + \frac{2c}{3}$$
Adding up all the equations:
$$\frac{T(N)}{N+1} = \frac{T(1)}{2} + 2c \sum_{i=3}^{N+1} \frac{1}{i} = O(\log N)$$
$$T(N) = O(N \log N)$$
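[Figure: recursion tree for a 1/10 : 9/10 split at every level (N; N/10, 9N/10; N/100, 9N/100, 9N/100, 81N/100; ...). Each level costs at most N; the shallowest leaf is at depth $\log_{10} N$ and the deepest at depth $\log_{10/9} N$, so the total is O(N log N).]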
Bucketsort
• General sorting algorithms using only comparisons require $\Omega(N \log N)$ time
in the worst case.
• In some special cases it is possible to sort in linear time.
• If the input $A_1, A_2, \ldots, A_N$ consists of only positive integers smaller than
$M$, bucket sort can be applied.
1. Keep an array called count, of size $M$ ($M$ buckets), which is initialized to
all 0s.
2. When $A_i$ is read, increment count[$A_i$] by 1.
3. After all the input is read, scan the count array, printing out a
representation of the sorted list.
4. The algorithm takes $O(M + N)$. If $M$ is $O(N)$, then the total is $O(N)$.
5. Useful algorithm when the input is only small integers.
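Steps 1 to 3 can be sketched as follows. The function name `bucketSort`, the choice of `int` keys and the returned vector (instead of printing) are assumptions of this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sorts positive integers that are all smaller than m in O(m + n) time.
std::vector<int> bucketSort(const std::vector<int>& input, int m) {
    std::vector<std::size_t> count(static_cast<std::size_t>(m), 0);  // m buckets, all 0
    for (int x : input) {
        assert(x > 0 && x < m);                     // precondition from the slide
        ++count[static_cast<std::size_t>(x)];       // step 2
    }
    std::vector<int> sorted;
    sorted.reserve(input.size());
    for (int value = 0; value < m; ++value)         // step 3: scan the count array
        sorted.insert(sorted.end(), count[static_cast<std::size_t>(value)], value);
    return sorted;
}
```

No element is ever compared with another, which is why the comparison lower bound does not apply.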
Radix sort
• Input: the keys are all nonnegative integers in base 10 with the
same number of digits.
• 2 ways to sort the keys:
– Method 1: Sort on the most significant digit first (leftmost digit first).
The ith step of the method consists in distributing the keys into
distinct piles based on the value of the ith digit from the left.
⇒ a variable number of piles is required.
– Method 2: Sort on the least significant digit first. We can use 10
piles (one for each decimal digit).
⇒ $O(N)$ in the best case but $O(N^2)$ in the worst case.
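Method 2 (least significant digit first, 10 piles) might be sketched like this. The function name `radixSort`, the `unsigned` key type and the explicit `digits` parameter are assumptions of this illustration.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// LSD radix sort for nonnegative base-10 keys with at most `digits` digits.
void radixSort(std::vector<unsigned>& keys, int digits) {
    unsigned divisor = 1;
    for (int pass = 0; pass < digits; ++pass) {
        std::array<std::vector<unsigned>, 10> piles;     // one pile per decimal digit
        for (unsigned key : keys)
            piles[(key / divisor) % 10].push_back(key);  // keeps arrival order (stable)
        std::size_t k = 0;
        for (const auto& pile : piles)                   // collect piles 0..9 in order
            for (unsigned key : pile) keys[k++] = key;
        divisor *= 10;
    }
}
```

The method only works because each distribution pass is stable: keys that agree on the current digit keep the order established by the earlier, less significant passes.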
7.3 A general lower bound for sorting
Prove that any algorithm for sorting that uses only comparisons
requires
• $\Omega(N \log N)$ comparisons in the worst case
⇒ Merge sort and Heap sort are optimal to within a constant
factor
• and $\Omega(N \log N)$ comparisons in the average case
⇒ quick sort is optimal on average to within a constant factor
Decision Trees
• A decision tree is an abstraction used to prove lower bounds.
• Every algorithm that sorts by using only comparisons can be
represented by a decision tree.
• The number of comparisons used by the sorting algorithm in the worst case is equal to
the depth of the deepest leaf.
Lemma 1 Let T be a binary tree of depth d. Then T has at most $2^d$ leaves.
Lemma 2 A binary tree with L leaves must have depth at least $\lceil \log L \rceil$.
Theorem 1 Any sorting algorithm that uses only comparisons between elements
requires at least $\lceil \log(N!) \rceil$ comparisons in the worst case.
Theorem 2 Any sorting algorithm that uses only comparisons between elements
requires $\Omega(N \log N)$ comparisons.
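The step from Theorem 1 to Theorem 2 is a short calculation. The derivation below is one standard way to fill it in (base-2 logarithms assumed); it is added here as an illustration and is not taken from the slides:

```latex
\log(N!) = \sum_{i=1}^{N} \log i
\;\ge\; \sum_{i=\lceil N/2 \rceil}^{N} \log i
\;\ge\; \frac{N}{2} \log \frac{N}{2}
\;=\; \frac{N}{2}\log N - \frac{N}{2}
\;=\; \Omega(N \log N)
```

The decision tree of any comparison sort has at least $N!$ leaves, one per permutation, so by Lemma 2 its depth is at least $\lceil \log(N!) \rceil$.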
7.4 External Sorting
• Most of the internal sorting algorithms take advantage of the
fact that memory is directly addressable
⇒ comparing elements takes a constant number of time
units.
• This is not the case if the data is on tape or on a disk.
Model for external sorting
• Sort data stored on tape.
• We assume that at least 3 tape drives are available (otherwise
any sorting algorithm will require $\Omega(N^2)$ tape accesses).
The simple algorithm
• Algorithm based on the merge sort principle.
• 4 tapes are used: 2 input and 2 output tapes.
• First step: read M records (M is the number of records the
main memory can hold) at a time from the input tape, sort the
records internally and write the sorted records on one of the
output tapes. Read M other records, sort them and write the
sorted records on the other tape. Repeat the process until all
records are processed.
• Each set of sorted records is called a run.
• Next steps: merge the runs pairwise into runs twice as long, alternating
the destination tape, and repeat until a single run remains.
• The algorithm will require $\lceil \log_2(N/M) \rceil$ passes, plus the initial run-constructing pass.
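The pass count can be checked with a small in-memory simulation. This is only a sketch: the vectors stand in for tapes, and the names `Run` and `externalSort` are assumptions of this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

using Run = std::vector<int>;

// Builds sorted runs of m records, then merges runs pairwise until one is left.
// Returns the number of merge passes (run construction is not counted).
int externalSort(std::vector<int>& records, std::size_t m) {
    assert(m > 0);
    std::vector<Run> runs;
    for (std::size_t i = 0; i < records.size(); i += m) {
        Run run(records.begin() + i,
                records.begin() + std::min(i + m, records.size()));
        std::sort(run.begin(), run.end());           // internal sort of one run
        runs.push_back(std::move(run));
    }
    int passes = 0;
    while (runs.size() > 1) {                        // one iteration = one pass
        std::vector<Run> next;
        for (std::size_t i = 0; i < runs.size(); i += 2) {
            if (i + 1 == runs.size()) {              // odd run out: carried over
                next.push_back(std::move(runs[i]));
                break;
            }
            Run merged;
            std::merge(runs[i].begin(), runs[i].end(),
                       runs[i + 1].begin(), runs[i + 1].end(),
                       std::back_inserter(merged));
            next.push_back(std::move(merged));
        }
        runs = std::move(next);
        ++passes;
    }
    if (!runs.empty()) records = std::move(runs.front());
    return passes;
}
```

With N = 13 records and M = 3 there are 5 initial runs, which shrink to 3, 2 and then 1, for a total of 3 merge passes.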
Multi-way Merge
• Use 2k tapes: k input tapes and k output tapes.
• The algorithm will require $\lceil \log_k(N/M) \rceil$ passes.
7.5 Order Statistics
• The ith order statistic of a set of n elements is the ith smallest
element.
– The minimum of a set of elements is the first order statistic.
– The maximum is the nth order statistic.
– The median is the element in the middle of a sorted list of
elements.
• The selection problem consists in selecting the ith order
statistic from a set of n distinct numbers.
The Selection Problem
• Algorithm 1A: read the elements into an array and sort them,
returning the appropriate element.
⇒ assuming a simple sorting algorithm, the running time is $O(N^2)$
($O(N \log N)$ if mergesort or heapsort is used).
• Algorithm 1B: find the kth largest element
1. Read $k$ elements into an array and sort them (in decreasing order). The smallest of
these is in the kth position.
2. Process the remaining elements one by one. As an element
arrives, it is compared with the kth element in the array. If it is
larger, then the kth element is removed, and the new element
is placed in the correct place among the remaining $k - 1$
elements.
⇒ $O(N \cdot k)$ running time. Why?
• If $k = \lceil N/2 \rceil$ then both algorithms are $O(N^2)$. $k$ is known as
the median in this case.
• The following algorithms run in $O(N \log N)$ in the extreme
case of $k = \lceil N/2 \rceil$.
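Algorithm 1B can be sketched as below. The function name `kthLargest1B` and the use of a vector as input (rather than reading elements one at a time) are assumptions of this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Algorithm 1B sketch: returns the kth largest element of input (1 <= k <= N).
int kthLargest1B(const std::vector<int>& input, std::size_t k) {
    assert(k >= 1 && k <= input.size());
    // Step 1: sort the first k elements in decreasing order, so the
    // smallest of them sits in position k - 1.
    std::vector<int> top(input.begin(), input.begin() + k);
    std::sort(top.begin(), top.end(), std::greater<int>());
    // Step 2: process the remaining elements one by one.
    for (std::size_t n = k; n < input.size(); ++n) {
        const int x = input[n];
        if (x > top[k - 1]) {
            // Drop the kth element and shift smaller ones right: O(k) work.
            std::size_t j = k - 1;
            for (; j > 0 && top[j - 1] < x; --j) top[j] = top[j - 1];
            top[j] = x;
        }
    }
    return top[k - 1];
}
```

Each of the N - k later elements may trigger up to k shifts, which is where the O(N · k) bound comes from.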
Algorithm 6A
• Algorithm for finding the kth smallest element
1. Read $N$ elements into an array.
2. Apply the buildHeap algorithm to this array.
3. Perform $k$ deleteMin operations. The last element extracted
from the heap is the answer.
• Complexity: $O(N + k \log N)$ in the worst case.
– If $k = O(N / \log N)$: $O(N)$
– If $k = \lceil N/2 \rceil$: $\Theta(N \log N)$
– For large values of $k$: $O(k \log N)$
– If $k = N$: $O(N \log N)$ (idea of the heapsort).
• By changing the heap-order property, we will solve
the problem of finding the kth largest element.
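A compact sketch of Algorithm 6A using the standard library heap is shown below. The name `kthSmallest6A` is an assumption of this illustration; `std::priority_queue` with `std::greater` serves as the min-heap, and its range constructor plays the role of buildHeap.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Algorithm 6A sketch: kth smallest via buildHeap plus k deleteMin operations.
int kthSmallest6A(const std::vector<int>& input, std::size_t k) {
    assert(k >= 1 && k <= input.size());
    // The range constructor heapifies all N elements at once.
    std::priority_queue<int, std::vector<int>, std::greater<int>>
        heap(input.begin(), input.end());
    for (std::size_t n = 1; n < k; ++n)
        heap.pop();                    // the first k - 1 deleteMin operations
    return heap.top();                 // the kth minimum is now at the root
}
```

Reading the root instead of popping it for the kth time does not change the answer.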
Algorithm 6B
• Find the kth largest element
1. Same idea as algorithm 1B.
2. At any point in time, maintain a set $S$ of the $k$ largest elements.
3. After the first $k$ elements are read, when a new element is read
it is compared with the kth largest element, which we denote
by $S_k$ ($S_k$ is the smallest element in $S$).
– If the new element is larger, then it replaces $S_k$ in $S$.
4. At the end of the input, we find the smallest element in $S$
and return it as the answer.
⇒ $O(k + (N - k) \log k) = O(N \log k)$ in the
worst case. Why?
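One way to realize the set S is a min-heap of size k, so that $S_k$ is always at the root. The sketch below assumes that representation; the name `kthLargest6B` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Algorithm 6B sketch: kth largest using a min-heap S of the k largest seen so far.
int kthLargest6B(const std::vector<int>& input, std::size_t k) {
    assert(k >= 1 && k <= input.size());
    // buildHeap on the first k elements: O(k).
    std::priority_queue<int, std::vector<int>, std::greater<int>>
        s(input.begin(), input.begin() + k);
    for (std::size_t n = k; n < input.size(); ++n) {
        if (input[n] > s.top()) {      // compare with S_k, the root of the heap
            s.pop();                   // remove S_k: O(log k)
            s.push(input[n]);          // insert the new element: O(log k)
        }
    }
    return s.top();                    // smallest element of S
}
```

Each of the N - k remaining elements costs a constant-time test plus, at worst, one deleteMin and one insert on a heap of size k.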
Using quick sort for Selection
QUICKSELECT(A, p, r, k)
1 choose a pivot element from A[p..r]
2 q ← PARTITION(A, p, r, pivot)
3 If (k < q) then
4     QUICKSELECT(A, p, q − 1, k)
5 Else If (k > q) QUICKSELECT(A, q + 1, r, k)
6 Else return A[q] (the pivot)
$O(N^2)$ in the worst case but $O(N)$ in the average case.
// Returns the kth smallest element of a (k is 1-based and counts positions in
// the whole array), looking only inside a[left..right].
template <class Comp>
Comp quickSelect(vector<Comp> &a, int left, int right, int k)
{
    if (left + 10 <= right) {
        Comp pivot = median3(a, left, right);
        // Begin partitioning
        int i = left, j = right - 1;
        for ( ; ; ) {
            while (a[++i] < pivot) {}
            while (pivot < a[--j]) {}
            if (i < j)
                swap(a[i], a[j]);
            else
                break;
        }
        swap(a[i], a[right - 1]);   // Restore pivot
        if (k <= i)
            return quickSelect(a, left, i - 1, k);
        else if (k > i + 1)
            return quickSelect(a, i + 1, right, k);
        else
            return a[i];            // k == i + 1: the pivot is the answer
    }
    // Do an insertion sort on the subarray
    insertionSort(a, left, right);
    return a[k - 1];
}
Selection in expected linear time
RANDOMIZED-SELECT(A, p, r, i)
1 if p = r
2     then return A[p]
3 q ← RANDOMIZED-PARTITION(A, p, r)
4 k ← q − p + 1
5 if i ≤ k
6     then return RANDOMIZED-SELECT(A, p, q, i)
7     else return RANDOMIZED-SELECT(A, q + 1, r, i − k)
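The pseudocode above translates to C++ roughly as follows. The slide does not show RANDOMIZED-PARTITION, so the Hoare-style partition below (which returns q with p ≤ q < r and leaves the pivot inside one of the two sides) is an assumption chosen to match the recursive calls on A[p..q] and A[q+1..r]. The names `hoarePartition` and `randomizedSelect` are hypothetical.

```cpp
#include <random>
#include <utility>
#include <vector>

// Hoare-style partition of a[p..r] (p < r) around x = a[p]. Returns q with
// p <= q < r such that every element of a[p..q] is <= every element of a[q+1..r].
inline int hoarePartition(std::vector<int>& a, int p, int r) {
    const int x = a[p];
    int i = p - 1, j = r + 1;
    for (;;) {
        do { --j; } while (a[j] > x);
        do { ++i; } while (a[i] < x);
        if (i < j) std::swap(a[i], a[j]);
        else return j;
    }
}

// Returns the ith smallest element (i is 1-based) of a[p..r].
inline int randomizedSelect(std::vector<int>& a, int p, int r, int i,
                            std::mt19937& rng) {
    if (p == r) return a[p];
    std::uniform_int_distribution<int> pick(p, r);
    std::swap(a[p], a[pick(rng)]);            // random pivot moved to the front
    const int q = hoarePartition(a, p, r);
    const int k = q - p + 1;                  // size of the low side
    if (i <= k) return randomizedSelect(a, p, q, i, rng);
    return randomizedSelect(a, q + 1, r, i - k, rng);
}
```

Only one side of the partition is explored, which is why the expected cost is linear while the worst case stays quadratic.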
Selection in average-case linear time
RANDOMIZED-PARTITION produces a partition whose low side has 1 element with
probability $2/n$ and $i$ elements with probability $1/n$ for $i = 2, 3, \ldots, n - 1$.
$$T(n) \le \frac{1}{n}\left( T(\max(1, n-1)) + \sum_{k=1}^{n-1} T(\max(k, n-k)) \right) + O(n)$$
$$\le \frac{1}{n}\left( T(n-1) + 2 \sum_{k=\lceil n/2 \rceil}^{n-1} T(k) \right) + O(n)$$
$$= \frac{2}{n} \sum_{k=\lceil n/2 \rceil}^{n-1} T(k) + O(n)$$
The recurrence can be solved by substitution (assuming that $T(n) \le cn$ for some constant
$c$): $T(n) \le cn \Rightarrow T(n) = O(n)$
Selection in worst-case linear time
Idea of the Select algorithm: guarantee a good split when
the array is partitioned.
1. Divide the $n$ elements of the input array into $\lfloor n/5 \rfloor$
groups of 5 elements each and at most one group made
up of the remaining $n \bmod 5$ elements.
2. Find the median of each of the $\lceil n/5 \rceil$ groups by insertion
sorting the elements of each group and taking its middle
element.
3. Use Select recursively to find the median
of the $\lceil n/5 \rceil$ medians found in step 2.
4. Partition the input array around the median-of-medians $x$
using a modified version of the Partition procedure. Let $k$
be the number of elements on the low side of the partition,
so that $n - k$ is the number of elements on the high side.
5. Use Select recursively to find the ith smallest element on
the low side if $i \le k$, or the $(i - k)$th smallest element
on the high side if $i > k$.
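The five steps can be sketched in C++ as follows. This is an illustration under simplifying assumptions: it copies elements into new vectors instead of partitioning in place, uses `std::sort` on each group of 5 where the slide uses insertion sort, and splits three ways so that duplicates are handled. The name `linearSelect` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the ith smallest element (i is 1-based) of v.
// v is taken by value, so the caller's data is left untouched.
int linearSelect(std::vector<int> v, std::size_t i) {
    if (v.size() <= 5) {                           // small input: sort directly
        std::sort(v.begin(), v.end());
        return v[i - 1];
    }
    // Steps 1-2: median of each group of (at most) 5 elements.
    std::vector<int> medians;
    for (std::size_t g = 0; g < v.size(); g += 5) {
        const std::size_t end = std::min(g + 5, v.size());
        std::sort(v.begin() + g, v.begin() + end); // stands in for insertion sort
        medians.push_back(v[g + (end - g - 1) / 2]);
    }
    // Step 3: median of the medians, found recursively.
    const int x = linearSelect(medians, (medians.size() + 1) / 2);
    // Step 4: split around x.
    std::vector<int> low, high;
    std::size_t equal = 0;
    for (int e : v) {
        if (e < x) low.push_back(e);
        else if (x < e) high.push_back(e);
        else ++equal;
    }
    // Step 5: recurse only on the side that contains the answer.
    if (i <= low.size()) return linearSelect(low, i);
    if (i <= low.size() + equal) return x;
    return linearSelect(high, i - low.size() - equal);
}
```

Because x itself is never passed down, both recursive calls work on strictly smaller inputs.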
Analysis of the Select algorithm
The number of elements greater than the median-of-medians $x$ is at least:
$$3\left(\left\lceil \frac{1}{2}\left\lceil \frac{n}{5} \right\rceil \right\rceil - 2\right) \ge \frac{3n}{10} - 6$$
• if $n \le 80$ then $T(n) = \Theta(1)$
• if $n > 80$ then $T(n) \le T(\lceil n/5 \rceil) + T(7n/10 + 6) + O(n)$
The recurrence can be solved by substitution (assuming that
$T(n) \le cn$ for some constant $c$):
$T(n) \le cn \Rightarrow T(n) = O(n)$