NUMBER OF SYMBOL COMPARISONS
IN QUICKSORT
Brigitte Vallée
(CNRS and Université de Caen, France)
Joint work with Julien Clément, Jim Fill and Philippe Flajolet
Plan of the talk.
– Presentation of the study
– Statement of the results
– The general model of source
– The main steps of the method
– Sketch of the proof
– What is a tamed source?
The classical framework for sorting.
The main sorting algorithms or searching algorithms
e.g., QuickSort, BST-Search, InsertionSort,...
deal with $n$ (distinct) keys $U_1, U_2, \dots, U_n$ of the same ordered set Ω.
They perform comparisons and exchanges between keys.
The unit cost is the key–comparison.
The behaviour of the algorithm (with respect to key–comparisons)
only depends on the relative order between the keys.
It is sufficient to restrict to the case when Ω = [1..n].
The input set is then $\mathfrak{S}_n$, the set of permutations of [1..n], with uniform probability.
Then, the analysis of all these algorithms is very well known,
with respect to the number of key–comparisons performed
in the worst-case, or in the average case.
Here, realistic analysis of the QuickSort algorithm
QuickSort(n, A): sorts the array A
    Choose a pivot;
    (k, A−, A+) := Partition(A);
    QuickSort(k − 1, A−);
    QuickSort(n − k, A+).
Mean number $K_n$ of key–comparisons: $K_n \sim 2n \log n$.
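A minimal Python sketch of the scheme above, assuming the pivot is the first key and that Partition compares the pivot once with every other key (both are illustrative choices that the slide does not fix). The names `quicksort_count` and `mean_key_comparisons` are hypothetical. The second function averages over all inputs of size n, which is only feasible for small n.

```python
from fractions import Fraction
from itertools import permutations


def quicksort_count(keys):
    """Sort distinct keys; return (sorted_list, number_of_key_comparisons)."""
    if len(keys) <= 1:
        return list(keys), 0
    pivot, rest = keys[0], keys[1:]
    # The two comprehensions stand in for a single partition pass:
    # the pivot is counted as compared once with each other key.
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x > pivot]
    left, c_left = quicksort_count(smaller)
    right, c_right = quicksort_count(larger)
    return left + [pivot] + right, len(rest) + c_left + c_right


def mean_key_comparisons(n):
    """Exact mean K_n over the n! permutations of [1..n] (small n only)."""
    perms = list(permutations(range(1, n + 1)))
    total = sum(quicksort_count(list(p))[1] for p in perms)
    return Fraction(total, len(perms))
```

For small n the exact mean can be compared with the classical closed form $2(n+1)H_n - 4n$, whose leading term is the $2n \log n$ quoted above.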
A more realistic framework for sorting.
Keys are viewed as words. The domain Ω of keys is a subset of $\Sigma^\infty$,
the set of infinite words on some ordered alphabet Σ.
The words are compared with respect to the lexicographic order.
The realistic unit cost is now the symbol–comparison.
The realistic cost of the comparison between two words A and B,
$A = a_1 a_2 a_3 \dots a_i \dots$ and $B = b_1 b_2 b_3 \dots b_i \dots$,
equals $k + 1$, where $k$ is the length of their longest common prefix,
$k := \max\{i : \forall j \le i,\ a_j = b_j\}$, called the coincidence.
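As a small illustration of this cost, here is a Python sketch assuming the words are given as finite prefixes that differ somewhere. The function names `coincidence` and `symbol_cost` are hypothetical.

```python
def coincidence(a, b):
    """Length of the longest common prefix of the words a and b."""
    k = 0
    for x, y in zip(a, b):
        if x != y:
            break
        k += 1
    return k


def symbol_cost(a, b):
    """Symbol-comparisons used to compare two distinct words: coincidence + 1."""
    return coincidence(a, b) + 1
```

The extra unit accounts for the comparison of the first pair of symbols that differ.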
We are interested in this new cost for each algorithm:
the number of symbol–comparisons, and its mean value $S_n$ (for $n$ words).
How does $S_n$ compare to $K_n$? That is the question.
It was initially asked by Sedgewick in 2000,
in order to also compare with text algorithms based on tries.
An example. Sixteen words drawn from the memoryless source $p(a) = 1/3$, $p(b) = 2/3$.
We keep the prefixes of length 12.
A = abbbbbaaabab B = abbbbbbaabaa C = baabbbabbbba
D = bbbababbbaab E = bbabbaababbb F = abbbbbbbbabb
G = bbaabbabbaba H = ababbbabbbab I = bbbaabbbbbbb
J = abaabbbbaabb K = bbbabbbbbbaa L = aaaabbabaaba
M = bbbaaabbbbbb N = abbbbbbabbaa O = abbabababbbb
P = bbabbbaaaabb
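The two costs can be counted side by side on these sixteen prefixes. The sketch below assumes a first-element pivot, which is an illustrative choice (the slides do not fix the pivot rule), and the name `quicksort_costs` is hypothetical. It relies on the words being distinct and of equal length.

```python
WORDS = [
    "abbbbbaaabab", "abbbbbbaabaa", "baabbbabbbba", "bbbababbbaab",
    "bbabbaababbb", "abbbbbbbbabb", "bbaabbabbaba", "ababbbabbbab",
    "bbbaabbbbbbb", "abaabbbbaabb", "bbbabbbbbbaa", "aaaabbabaaba",
    "bbbaaabbbbbb", "abbbbbbabbaa", "abbabababbbb", "bbabbbaaaabb",
]


def quicksort_costs(words):
    """Return (sorted_words, key_comparisons, symbol_comparisons)."""
    if len(words) <= 1:
        return list(words), 0, 0
    pivot, rest = words[0], words[1:]
    keys = symbols = 0
    smaller, larger = [], []
    for w in rest:
        # Scan to the first differing symbol; k is the coincidence.
        k = 0
        while w[k] == pivot[k]:
            k += 1
        keys += 1            # one key-comparison ...
        symbols += k + 1     # ... costs k + 1 symbol-comparisons
        (smaller if w[k] < pivot[k] else larger).append(w)
    left, k_left, s_left = quicksort_costs(smaller)
    right, k_right, s_right = quicksort_costs(larger)
    return (left + [pivot] + right,
            keys + k_left + k_right,
            symbols + s_left + s_right)
```

Calling `quicksort_costs(WORDS)` returns the sorted list with both counts, so the ratio of symbol-comparisons to key-comparisons can be read off directly for this sample.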
Case of QuickSort(n) [CFFV 08]
Theorem. For any tamed source, the mean number $S_n$ of symbol comparisons used by QuickSort(n) satisfies
$$S_n \sim \frac{1}{h_S}\, n \log^2 n,$$
and involves the entropy $h_S$ of the source $S$, defined as
$$h_S := \lim_{k \to \infty} \left[ -\frac{1}{k} \sum_{w \in \Sigma^k} p_w \log p_w \right],$$
where $p_w$ is the probability that a word begins with prefix $w$.
Compared to $K_n \sim 2n \log n$, there is an extra factor equal to $\frac{1}{2 h_S} \log n$.
Compared to $T_n \sim \frac{1}{h_S}\, n \log n$, there is an extra factor of $\log n$.
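To make the entropy definition concrete, the following Python sketch evaluates the bracketed quantity at a finite depth k for a memoryless source, where $p_w$ is the product of the symbol probabilities, and computes the two leading terms quoted above. Natural logarithms are assumed throughout, and the function names are hypothetical.

```python
from itertools import product
from math import log, prod


def entropy_at_depth(probs, k):
    """-(1/k) * sum of p_w log p_w over words w of length k (memoryless source)."""
    total = 0.0
    for w in product(range(len(probs)), repeat=k):
        p_w = prod(probs[i] for i in w)   # memoryless: p_w factorises
        total += p_w * log(p_w)
    return -total / k


def leading_terms(n, h):
    """Leading asymptotic terms (K_n, S_n) as quoted in the theorem."""
    key = 2 * n * log(n)            # K_n ~ 2 n log n
    symbol = n * log(n) ** 2 / h    # S_n ~ (1/h) n log^2 n
    return key, symbol
```

For a memoryless source the finite-depth value does not depend on k, so depth 1 already gives $h_S = -\sum_i p_i \log p_i$. The ratio of the two leading terms is $\log n / (2 h_S)$.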
The (general) model of source
A general source S produces infinite words on an ordered alphabet Σ.
For $w \in \Sigma^\star$, $p_w$ := probability that a word begins with the prefix $w$.
Define
$$a_w := \sum_{\substack{w',\ |w'| = |w| \\ w' < w,\ p_{w'} \neq 0}} p_{w'}, \qquad b_w := \sum_{\substack{w',\ |w'| = |w| \\ w' \le w,\ p_{w'} \neq 0}} p_{w'}.$$
Then: $\forall u \in [0, 1],\ \forall k \ge 1,\ \exists w = M_k(u) \in \Sigma^k$ such that $u \in [a_w, b_w[$.
Example.
$p_0 = 1/3$, $p_1 = 2/3$ ⇒ $M_1(1/2) = 1$, $M_1(1/4) = 0$.
$p_{00} = 1/12$, $p_{01} = 3/12$, $p_{10} = 1/2$, $p_{11} = 1/6$ ⇒ $M_2(1/2) = 10$, $M_2(1/4) = 01$.
If $p_w \to 0$ for $|w| \to \infty$, the sequences $(a_w)$ and $(b_w)$ are adjacent.
They define an infinite word $M(u) := \lim_{k \to \infty} M_k(u)$.
Then, the source is alternatively defined by a mapping $M : [0, 1] \to \Sigma^\infty$.
Fundamental interval $[a_w, b_w] := \{u : M(u) \text{ begins with prefix } w\}$.
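The example can be replayed with exact rational arithmetic. This Python sketch tabulates only the prefix probabilities given on the slide (depths 1 and 2). The table `P` and the function names are hypothetical, and the alphabet order 0 < 1 is assumed.

```python
from fractions import Fraction as F

# Prefix probabilities from the example (alphabet {0, 1} with 0 < 1).
P = {
    "0": F(1, 3), "1": F(2, 3),
    "00": F(1, 12), "01": F(3, 12), "10": F(1, 2), "11": F(1, 6),
}


def fundamental_interval(w, p=P):
    """Return (a_w, b_w): a_w sums p_v over same-length prefixes v < w."""
    a = sum((p[v] for v in p if len(v) == len(w) and v < w), F(0))
    return a, a + p[w]


def M_k(u, k, p=P):
    """Prefix of length k emitted for the parameter u in [0, 1)."""
    for w in sorted(v for v in p if len(v) == k):
        a, b = fundamental_interval(w, p)
        if a <= u < b:
            return w
    raise ValueError("u is outside [0, 1) or depth k is not tabulated")
```

For instance, `fundamental_interval("10")` is the interval from 1/3 to 5/6, which contains 1/2, in agreement with $M_2(1/2) = 10$.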
Natural instances of sources: Dynamical sources
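With a shift map $T : I \to I$ and an encoding map $\tau : I \to \Sigma$,
the emitted word is $M(x) = (\tau x, \tau T x, \tau T^2 x, \dots, \tau T^k x, \dots)$.
[Figure: the orbit $x, Tx, T^2 x, T^3 x$ on the graph of a shift map.]
A dynamical system, with $\Sigma = \{a, b, c\}$ and a word $M(x) = (c, b, a, c, \dots)$.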
Memoryless sources or Markov chains
= dynamical sources with affine branches.
[Figure: two plots on the unit square, horizontal axis x from 0.0 to 1.0.]
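A memoryless source can be written as a dynamical source in a few lines. The sketch below assumes the unit interval is cut into consecutive intervals of lengths $p_i$, one per symbol in alphabet order, with the shift map sending each interval affinely onto [0, 1). The function name and the argument layout are hypothetical.

```python
from fractions import Fraction as F


def memoryless_word(x, probs, length):
    """First `length` symbols of M(x) for x in [0, 1).

    probs: list of (symbol, probability) pairs in alphabet order.
    """
    word = []
    for _ in range(length):
        left = F(0)
        for symbol, p in probs:
            if x < left + p:
                word.append(symbol)   # encoding map: which interval holds x
                x = (x - left) / p    # shift map: affine branch onto [0, 1)
                break
            left += p
        else:
            raise ValueError("x must lie in [0, 1)")
    return "".join(word)
```

With `probs = [("a", F(1, 3)), ("b", F(2, 3))]`, the point x = 1/2 falls in the second interval, so the word starts with b, and the orbit continues from T(x) = 1/4.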
The dynamical framework leads to more general sources.
The curvature of branches entails correlation between symbols.
Example: the Continued Fraction source.
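The slides name this source without spelling out its maps. The sketch below assumes the usual choice, the Gauss map $T(x) = 1/x - \lfloor 1/x \rfloor$ with encoding $\tau(x) = \lfloor 1/x \rfloor$, so that the emitted symbols are the continued fraction digits of x. The function name is hypothetical. Exact rationals are used, for which the word stops when the orbit reaches 0; an irrational x would emit an infinite word.

```python
from fractions import Fraction as F


def continued_fraction_word(x, length):
    """First symbols (at most `length`) emitted from x in (0, 1)."""
    word = []
    while x != 0 and len(word) < length:
        inv = 1 / x
        digit = inv.numerator // inv.denominator   # encoding: floor(1/x)
        word.append(digit)
        x = inv - digit                            # shift: fractional part of 1/x
    return word
```

Here the branches of T are not affine, which is the curvature mentioned above.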
Fundamental intervals and fundamental triangles.
[Figure: two plots on the unit square, horizontal axis x from 0.0 to 1.0.]
Three main steps for the analysis
of the mean number $S_n$ of symbol comparisons
(A) The Poisson model $P_Z$ does not deal with a fixed number $n$ of keys.
The number $N$ of keys is now a random variable which follows a Poisson
law of parameter $Z$.
We first obtain nice expressions for $S(Z)$.
(B) It is then possible to return to the model where the number of keys
is fixed. We obtain a nice exact formula for $S_n$,
from which it is not easy to obtain the asymptotics.
(C) Then, the Rice formula provides the asymptotics of $S_n$ ($n \to \infty$), as
soon as the source is “tamed”.
(A) Dealing with the Poisson Model.
In the $P_Z$ model, the number $N$ of keys follows the Poisson law
$$\Pr[N = n] = e^{-Z} \frac{Z^n}{n!},$$
and the mean number $S(Z)$ of symbol comparisons for QuickSort is
$$S(Z) = \int_{\mathcal{T}} [\gamma(u, t) + 1]\, \pi(u, t)\, du\, dt,$$
where $\mathcal{T} := \{(u, t) : 0 \le u \le t \le 1\}$ is the unit triangle,
$\gamma(u, t)$ := coincidence between $M(u)$ and $M(t)$,
$\pi(u, t)\, du\, dt$ := mean number of key-comparisons between $M(u')$
and $M(t')$ with $u' \in [u, u + du]$ and $t' \in [t - dt, t]$.
First step in the Poisson model: the coincidence $\gamma(u, t)$.
An (easy) alternative expression for $S(Z)$:
$$S(Z) = \int_{\mathcal{T}} [\gamma(u, t) + 1]\, \pi(u, t)\, du\, dt = \sum_{w \in \Sigma^\star} \int_{\mathcal{T}_w} \pi(u, t)\, du\, dt.$$
It involves the fundamental triangles $\mathcal{T}_w := \{(u, t) : a_w \le u \le t \le b_w\}$
and separates the roles of the source and the algorithm.
It is then sufficient to
– study the key–probability $\pi(u, t)$ of the algorithm (the second step);
– take its integral on each fundamental triangle (the third step).
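The alternative expression can be checked pointwise. Writing $\mathcal{T}_w$ for the triangle built on the fundamental interval $[a_w, b_w]$, the words $M(u)$ and $M(t)$ share exactly the prefixes of lengths $0, 1, \dots, \gamma(u, t)$, which gives the following short derivation (a sketch that ignores the boundaries of the intervals, which have measure zero):

```latex
\gamma(u,t) + 1
  = \#\{\, w \in \Sigma^\star : M(u) \text{ and } M(t) \text{ both begin with } w \,\}
  = \sum_{w \in \Sigma^\star} \mathbf{1}_{\mathcal{T}_w}(u,t),
\qquad\text{hence}\qquad
\int_{\mathcal{T}} [\gamma(u,t) + 1]\, \pi(u,t)\, du\, dt
  = \sum_{w \in \Sigma^\star} \int_{\mathcal{T}_w} \pi(u,t)\, du\, dt .
```

The source enters only through the triangles $\mathcal{T}_w$, and the algorithm only through the density $\pi$.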
Study of the key probability π(u, t) of the algorithm (I)
$\pi(u, t)\, du\, dt$ := mean number of key-comparisons between $M(u')$
and $M(t')$ with $u' \in [u, u + du]$ and $t' \in [t - dt, t]$.
Case of QuickSort. $M(u)$ and $M(t)$ are compared
iff the first pivot chosen among the keys $M(v)$, $v \in [u, t]$, is $M(u)$ or $M(t)$.
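In the classical fixed-n setting this pivot rule says that the keys of ranks i < j are compared with probability 2/(j − i + 1), since each of the j − i + 1 keys between them (inclusive) is equally likely to be the first pivot. The Python sketch below checks this exactly by enumerating all inputs, assuming a first-element pivot and an order-preserving partition (illustrative choices). The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def compared_pairs(keys):
    """Set of (smaller, larger) pairs compared by first-element-pivot QuickSort."""
    if len(keys) <= 1:
        return set()
    pivot, rest = keys[0], keys[1:]
    pairs = {(min(pivot, x), max(pivot, x)) for x in rest}
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x > pivot]
    return pairs | compared_pairs(smaller) | compared_pairs(larger)


def comparison_probability(n, i, j):
    """Exact probability that ranks i < j are compared, over all n! inputs."""
    perms = list(permutations(range(1, n + 1)))
    hits = sum((i, j) in compared_pairs(list(p)) for p in perms)
    return Fraction(hits, len(perms))
```

With N keys strictly between the two ranks, the probability reads 2/(2 + N), which is the quantity averaged in the Poisson model.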
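$$\pi(u, t)\, du\, dt = Z\, du \cdot Z\, dt \cdot E\!\left[ \frac{2}{2 + N_{[u,t]}} \right].$$
Here, $N_{[u,t]}$ is the number of words $M(v)$ with $v \in [u, t]$.
It follows a Poisson law of parameter $Z(t - u)$.
Then: $\pi(u, t) = 2 Z^2 f_1(Z(t - u))$ with $f_1(\theta) := \theta^{-2} [e^{-\theta} - 1 + \theta]$.
Finally:
$$S(Z) = 2 Z^2 \sum_{w \in \Sigma^\star} \int_{\mathcal{T}_w} f_1(Z(t - u))\, du\, dt.$$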
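The step from the expectation to $f_1$ rests on the identity $E[2/(2 + N)] = 2 f_1(\theta)$ for N following a Poisson law of parameter θ. A quick numerical check in Python, truncating the Poisson series after a fixed number of terms (the truncation length and the function names are illustrative assumptions):

```python
from math import exp


def f1(theta):
    """f1(theta) = theta^-2 * (exp(-theta) - 1 + theta)."""
    return (exp(-theta) - 1 + theta) / theta ** 2


def mean_pivot_probability(theta, terms=200):
    """E[2 / (2 + N)] for N ~ Poisson(theta), by truncating the series."""
    weight = exp(-theta)            # Pr[N = 0]
    total = 0.0
    for n in range(terms):
        total += weight * 2 / (2 + n)
        weight *= theta / (n + 1)   # Pr[N = n + 1] from Pr[N = n]
    return total
```

The two functions agree up to rounding for moderate θ. For very small θ the formula for `f1` suffers from cancellation, so a series expansion would be preferable there.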
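(B) Return to the model where n is fixed. With the expansion of $f_1$, the mean value
$$S(Z) = \sum_{k=2}^{\infty} (-1)^k\, \varpi(-k)\, \frac{Z^k}{k!}$$
is expressed with a series $\varpi(s)$ of Dirichlet type,
which depends both on the algorithm and the source:
$$\varpi(s) = 2 \sum_{w\in\Sigma^\star} \int_{T_w} (t-u)^{-(s+2)}\,du\,dt$$
$$\int_{T_w} (t-u)^{-(s+2)}\,du\,dt = \frac{p_w^{-s}}{s(s+1)} \implies \varpi(s) = 2\,\frac{\Lambda(s)}{s(s+1)}$$
where $\Lambda(s) := \sum_{w\in\Sigma^\star} p_w^{-s}$ is the Dirichlet series of probabilities.
Since $\dfrac{S_n}{n!} = [Z^n]\left(e^Z \cdot S(Z)\right)$, there is an exact formula for $S_n$:
$$S_n = \sum_{k=2}^{n} (-1)^k \binom{n}{k}\, \varpi(-k) = 2 \sum_{k=2}^{n} (-1)^k \binom{n}{k}\, \frac{\Lambda(-k)}{k(k-1)}.$$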
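For a concrete source the exact formula can be evaluated directly. The sketch below assumes a memoryless source with symbol probabilities `probs`, for which Λ(−k) = Σ_w p_w^k is a geometric series equal to 1 / (1 − Σ_i p_i^k) when k ≥ 2. The function names are hypothetical, and exact rationals are used because the alternating sum cancels heavily.

```python
from fractions import Fraction
from math import comb

def lambda_at_minus_k(probs, k):
    """Λ(-k) = Σ_w p_w^k = 1 / (1 - Σ_i p_i^k) for a memoryless source, k >= 2."""
    return 1 / (1 - sum(p**k for p in probs))

def mean_symbol_comparisons(n, probs):
    """S_n = 2 Σ_{k=2..n} (-1)^k C(n,k) Λ(-k) / (k (k-1)), computed exactly."""
    total = Fraction(0)
    for k in range(2, n + 1):
        total += (-1)**k * comb(n, k) * lambda_at_minus_k(probs, k) / (k * (k - 1))
    return 2 * total

unbiased_binary = [Fraction(1, 2), Fraction(1, 2)]  # assumed example source
print(mean_symbol_comparisons(2, unbiased_binary))   # 2
print(mean_symbol_comparisons(3, unbiased_binary))   # 50/9
```

The value S_2 = 2 matches a direct count: two random binary words agree on their first j symbols with probability 2^(-j), so their single comparison reads 1 + 1/2 + 1/4 + ... = 2 symbols on average.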
![Page 55: NUMBER OF SYMBOL COMPARISONS IN QUICKSORTsteiner/num09/vallee.pdf · It is sufficient to restrict to the case when Ω = [1..n]. The input set is then S n, with uniform probability](https://reader035.vdocuments.mx/reader035/viewer/2022071218/6050c4f6c8f23e16a07f8749/html5/thumbnails/55.jpg)
(C) Using Rice formula
As soon as $\varpi(s)$ is “weakly tamed” in $\Re(s) < \sigma_0$ with $\sigma_0 > -2$,
the residue formula transforms the sum into an integral:
$$S_n = \sum_{k=2}^{n} (-1)^k \binom{n}{k}\, \varpi(-k) = \frac{1}{2i\pi} \int_{d-i\infty}^{d+i\infty} \varpi(s)\, \frac{n!}{s(s+1)\cdots(s+n)}\,ds,$$
with $-2 < d < \min(-1, \sigma_0)$.
Where are the leftmost singularities of $\varpi(s)$?
Recall: $\varpi(s) = \dfrac{2\Lambda(s)}{s(s+1)}$, where $\Lambda(s) := \sum_{w\in\Sigma^\star} p_w^{-s}$ always has a singularity at $s = -1$.
What type of singularity? Is it the dominant singularity?
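The link between the sum and the integral rests on the residues of the kernel n!/(s(s+1)...(s+n)): at s = −k the residue is (−1)^k C(n, k). The sketch below checks this identity exactly for a small n; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def kernel_residue(n, k):
    """Residue at s = -k of n! / (s (s+1) ... (s+n)).

    The pole is simple, so drop the factor (s + k) and evaluate the rest at s = -k.
    """
    product = Fraction(1)
    for j in range(n + 1):
        if j != k:
            product *= (j - k)
    return Fraction(factorial(n)) / product

n = 6
print([kernel_residue(n, k) for k in range(n + 1)])
print([(-1)**k * comb(n, k) for k in range(n + 1)])
```

Summing the residues of the integrand at s = −2, ..., −n, which all lie to the left of the line ℜ(s) = d, therefore reproduces the alternating binomial sum.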
![Page 56: NUMBER OF SYMBOL COMPARISONS IN QUICKSORTsteiner/num09/vallee.pdf · It is sufficient to restrict to the case when Ω = [1..n]. The input set is then S n, with uniform probability](https://reader035.vdocuments.mx/reader035/viewer/2022071218/6050c4f6c8f23e16a07f8749/html5/thumbnails/56.jpg)
Plan of the talk.
– Presentation of the study
– Statement of the results
– The general model of source
– The main steps of the method
– Sketch of the proof
– What is a tamed source?
![Page 57: NUMBER OF SYMBOL COMPARISONS IN QUICKSORTsteiner/num09/vallee.pdf · It is sufficient to restrict to the case when Ω = [1..n]. The input set is then S n, with uniform probability](https://reader035.vdocuments.mx/reader035/viewer/2022071218/6050c4f6c8f23e16a07f8749/html5/thumbnails/57.jpg)
What can be expected about Λ(s)?
– For any source, $\Lambda(s)$ has a singularity at $s = -1$.
– For a tamed source S, the dominant singularity of $\Lambda(s)$ is located
at $s = -1$; it is a simple pole, whose residue equals $-1/h_S$.
– In this case, there is a triple pole at $s = -1$ for
$$\frac{\varpi(s)}{s+1} = \frac{2\Lambda(s)}{s(s+1)^2}, \qquad\text{and}\qquad \frac{\varpi(s)}{s+1} \sim \frac{2}{h_S}\,\frac{1}{(s+1)^3} \quad (s \to -1).$$
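How a triple pole produces a log² n term can be sketched with a local expansion at s = −1 + ε. This is a heuristic computation, not the talk's proof: it assumes that Λ(s) behaves like −1/(h_S (s+1)) near s = −1 (consistent with the equivalent above), that lower-order terms of ϖ(s) only contribute terms of order n log n, and that the integral remaining after the shift is of smaller order. Here H_{n−1} denotes the harmonic number.

```latex
% Kernel near s = -1 + \varepsilon, with H_{n-1} = 1 + 1/2 + \dots + 1/(n-1)
\frac{n!}{s(s+1)\cdots(s+n)}
  = -\frac{n}{\varepsilon}\,\frac{1}{1-\varepsilon}
    \prod_{j=1}^{n-1}\Bigl(1+\frac{\varepsilon}{j}\Bigr)^{-1}
  = -\frac{n}{\varepsilon}\Bigl(1 - \varepsilon\,(H_{n-1}-1)
    + \tfrac{\varepsilon^{2}}{2}\,H_{n-1}^{2}\,(1+o(1))\Bigr),
\qquad
\varpi(s) \sim \frac{2}{h_S\,\varepsilon^{2}} .

% Leading part of the coefficient of 1/\varepsilon in the product
\operatorname{Res}_{s=-1}\Bigl[\varpi(s)\,\frac{n!}{s(s+1)\cdots(s+n)}\Bigr]
  \sim -\frac{2n}{h_S}\cdot\frac{H_{n-1}^{2}}{2}
  \sim -\frac{1}{h_S}\,n\log^{2} n .

% Moving the vertical line to the right across s = -1 subtracts this residue
S_n \sim \frac{1}{h_S}\,n\log^{2} n .
```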
![Page 60: NUMBER OF SYMBOL COMPARISONS IN QUICKSORTsteiner/num09/vallee.pdf · It is sufficient to restrict to the case when Ω = [1..n]. The input set is then S n, with uniform probability](https://reader035.vdocuments.mx/reader035/viewer/2022071218/6050c4f6c8f23e16a07f8749/html5/thumbnails/60.jpg)
To shift the integral to the right, past $d = -1$,
other, more subtle, properties of $\Lambda(s)$ are needed on $\Re s \ge -1$.
Different behaviours of $\Lambda(s)$ for $\Re s \ge -1$ where one can shift past $d = -1$.
In the colored domains, $\Lambda(s)$ is meromorphic and of polynomial growth for $|s| \to \infty$.
For dynamical sources, we provide sufficient conditions
(of geometric or arithmetic type) under which these behaviours hold.
For a memoryless source, they depend on the approximability of the ratios $\log p_i / \log p_j$.
![Page 63: NUMBER OF SYMBOL COMPARISONS IN QUICKSORTsteiner/num09/vallee.pdf · It is sufficient to restrict to the case when Ω = [1..n]. The input set is then S n, with uniform probability](https://reader035.vdocuments.mx/reader035/viewer/2022071218/6050c4f6c8f23e16a07f8749/html5/thumbnails/63.jpg)
Conclusions.
If the source S is tamed, then $S_n \sim \dfrac{1}{h_S}\, n \log^2 n$. We are done!
– QuickSort must be compared to the algorithm which uses tries for
sorting n words. We studied its mean cost $T_n$ in 2001.
For a tamed source, $T_n \sim \dfrac{1}{h_S}\, n \log n$.
– Our methods apply to all the QuickSelect algorithms, and the hypotheses
needed for the source are different (less strong).
They are related to properties of $\Lambda(s)$ for $\Re s < -1$.
– It is easy to adapt our results to intermittent sources, which emit
“long” sequences of the same symbol. In this case,
$S_n = \Theta(n \log^3 n)$, $T_n = \Theta(n \log^2 n)$.
![Page 67: NUMBER OF SYMBOL COMPARISONS IN QUICKSORTsteiner/num09/vallee.pdf · It is sufficient to restrict to the case when Ω = [1..n]. The input set is then S n, with uniform probability](https://reader035.vdocuments.mx/reader035/viewer/2022071218/6050c4f6c8f23e16a07f8749/html5/thumbnails/67.jpg)
Open problems.
— What about the distribution of the average search cost in a BST?
Is it asymptotically normal?
We know that this is the case if one counts the number of key-comparisons.
We already know that, for a tamed source,
the average depth of a trie is asymptotically normal (Cesaratto-V, ’07).
Long-term research projects...
— Revisit the complexity results of the main classical algorithms,
and take into account the number of symbol-comparisons...
instead of the number of key-comparisons.
— Provide a sharp “analytic” classification of sources:
Transfer geometric properties of sources into analytical properties of Λ(s)
![Page 70: NUMBER OF SYMBOL COMPARISONS IN QUICKSORTsteiner/num09/vallee.pdf · It is sufficient to restrict to the case when Ω = [1..n]. The input set is then S n, with uniform probability](https://reader035.vdocuments.mx/reader035/viewer/2022071218/6050c4f6c8f23e16a07f8749/html5/thumbnails/70.jpg)
Natural instances of sources: Dynamical sources
With a shift map $T : I \to I$ and an encoding map $\tau : I \to \Sigma$,
the emitted word is $M(x) = (\tau x, \tau T x, \tau T^2 x, \ldots, \tau T^k x, \ldots)$
[Figure: the points $x$, $Tx$, $T^2x$, $T^3x$.]
A dynamical system, with $\Sigma = \{a, b, c\}$ and a word $M(x) = (c, b, a, c, \ldots)$.
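The definition of M(x) translates almost literally into code. In the sketch below, `emit_word` is a hypothetical helper taking the shift T and the encoding τ as functions; the binary doubling map used as the example source is an assumption chosen for simplicity, not the three-letter system of the figure.

```python
from fractions import Fraction
from math import floor

def emit_word(x, shift, encode, length):
    """Return the first `length` symbols of M(x) = (τx, τTx, τT²x, ...)."""
    symbols = []
    for _ in range(length):
        symbols.append(encode(x))
        x = shift(x)
    return symbols

# Assumed example source: the doubling map on [0, 1), which emits binary digits.
def binary_shift(x):
    return 2 * x - floor(2 * x)

def binary_encode(x):
    return floor(2 * x)

# 5/16 = 0.0101 in binary, so the word starts 0, 1, 0, 1, 0, 0.
print(emit_word(Fraction(5, 16), binary_shift, binary_encode, 6))
```

Exact rationals are used so that repeated application of the shift does not accumulate rounding error.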
![Page 72: NUMBER OF SYMBOL COMPARISONS IN QUICKSORTsteiner/num09/vallee.pdf · It is sufficient to restrict to the case when Ω = [1..n]. The input set is then S n, with uniform probability](https://reader035.vdocuments.mx/reader035/viewer/2022071218/6050c4f6c8f23e16a07f8749/html5/thumbnails/72.jpg)
Memoryless sources or Markov chains = dynamical sources with affine branches.
[Figure: two plots on the unit square, horizontal axis x, both axes graduated from 0.0 to 1.0.]
![Page 73: NUMBER OF SYMBOL COMPARISONS IN QUICKSORTsteiner/num09/vallee.pdf · It is sufficient to restrict to the case when Ω = [1..n]. The input set is then S n, with uniform probability](https://reader035.vdocuments.mx/reader035/viewer/2022071218/6050c4f6c8f23e16a07f8749/html5/thumbnails/73.jpg)
The dynamical framework leads to more general sources.
The curvature of branches entails correlation between symbols.
Example: the Continued Fraction source.
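As an illustration, the continued fraction source can be simulated with the usual Gauss map. The slide does not spell out the maps, so the definitions T(x) = 1/x − ⌊1/x⌋ and τ(x) = ⌊1/x⌋ below are the standard ones, stated here as an assumption, and the function names are hypothetical.

```python
from fractions import Fraction
from math import floor

def gauss_shift(x):
    """T(x) = 1/x - floor(1/x), the Gauss map on (0, 1]."""
    inverse = 1 / x
    return inverse - floor(inverse)

def gauss_encode(x):
    """τ(x) = floor(1/x), the next continued fraction digit."""
    return floor(1 / x)

def continued_fraction_word(x, length):
    """First symbols emitted for x in (0, 1); stops early if the orbit reaches 0."""
    digits = []
    while len(digits) < length and x != 0:
        digits.append(gauss_encode(x))
        x = gauss_shift(x)
    return digits

# 7/16 = 1 / (2 + 1 / (3 + 1/2))
print(continued_fraction_word(Fraction(7, 16), 10))
```

The branches of this map are not affine, so successive symbols are not independent, unlike in the memoryless case.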
![Page 75: NUMBER OF SYMBOL COMPARISONS IN QUICKSORTsteiner/num09/vallee.pdf · It is sufficient to restrict to the case when Ω = [1..n]. The input set is then S n, with uniform probability](https://reader035.vdocuments.mx/reader035/viewer/2022071218/6050c4f6c8f23e16a07f8749/html5/thumbnails/75.jpg)
Fundamental intervals and fundamental triangles.
[Figure: two plots on the unit square, horizontal axis x, both axes graduated from 0.0 to 1.0.]
![Page 76: NUMBER OF SYMBOL COMPARISONS IN QUICKSORTsteiner/num09/vallee.pdf · It is sufficient to restrict to the case when Ω = [1..n]. The input set is then S n, with uniform probability](https://reader035.vdocuments.mx/reader035/viewer/2022071218/6050c4f6c8f23e16a07f8749/html5/thumbnails/76.jpg)
Case of QuickMin(n), QuickMax(n), [CFFV 08]
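Theorem 2. For any weakly tamed source, the mean numbers
of symbol comparisons used by QuickMin(n) and QuickMax(n),
$$T_n^{(-)} \sim c_S^{(-)}\, n \qquad\text{and}\qquad T_n^{(+)} \sim c_S^{(+)}\, n,$$
involve the constants $c_S^{(\varepsilon)}$, which depend on the probabilities $p_w$ and $p_w^{(\varepsilon)}$ ($\varepsilon = \pm$):
$$c_S^{(\varepsilon)} := \sum_{w\in\Sigma^\star} p_w \left[ 1 - \frac{p_w^{(\varepsilon)}}{p_w} \log\left( 1 + \frac{p_w}{p_w^{(\varepsilon)}} \right) \right].$$
Here $p_w^{(-)}$, $p_w^{(+)}$, $p_w$ are the probabilities that a word begins with a prefix $w'$,
with $|w'| = |w|$ and $w' < w$ or $w' > w$ or $w' = w$.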
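The three probabilities attached to a prefix w are easy to compute for simple sources. The sketch below assumes a memoryless source whose symbols are the integers 0, 1, ..., r−1 in alphabetical order, with `probs[a]` the probability of symbol a; the function name is hypothetical.

```python
from fractions import Fraction

def prefix_probabilities(word, probs):
    """Return (p_minus, p_w, p_plus) for `word` under a memoryless source.

    p_minus, p_plus: probability that a word starts with a prefix of the same
    length that is smaller, respectively larger, than `word`.
    """
    p_w = Fraction(1)      # probability of the prefix read so far
    p_minus = Fraction(0)  # mass of same-length prefixes that are smaller
    for symbol in word:
        # Prefixes that agree so far and then branch to a smaller symbol.
        p_minus += p_w * sum(probs[:symbol], Fraction(0))
        p_w *= probs[symbol]
    p_plus = 1 - p_minus - p_w
    return p_minus, p_w, p_plus

half = Fraction(1, 2)
# The prefix "10" of an unbiased binary source covers the interval [1/2, 3/4].
print(prefix_probabilities((1, 0), [half, half]))
```

Geometrically, the prefix w corresponds to a fundamental interval of length p_w, with mass p_minus to its left and p_plus to its right.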
![Page 77: NUMBER OF SYMBOL COMPARISONS IN QUICKSORTsteiner/num09/vallee.pdf · It is sufficient to restrict to the case when Ω = [1..n]. The input set is then S n, with uniform probability](https://reader035.vdocuments.mx/reader035/viewer/2022071218/6050c4f6c8f23e16a07f8749/html5/thumbnails/77.jpg)
Case of QuickRand(n) [CFFV 08]
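Theorem 3. For any weakly tamed source, the mean number of symbol
comparisons used by QuickRand(n) (randomized wrt rank) satisfies
$T_n \sim c_S\, n$, with
$$c_S = \sum_{w\in\Sigma^\star} p_w^2 \left[ 2 + \frac{1}{p_w} + \sum_{\varepsilon=\pm} \left[ \log\left( 1 + \frac{p_w^{(\varepsilon)}}{p_w} \right) - \left( \frac{p_w^{(\varepsilon)}}{p_w} \right)^{2} \log\left( 1 + \frac{p_w}{p_w^{(\varepsilon)}} \right) \right] \right].$$
Here $p_w^{(-)}$, $p_w^{(+)}$, $p_w$ are the probabilities that a word begins with a prefix $w'$,
with $|w'| = |w|$ and $w' < w$ or $w' > w$ or $w' = w$.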
![Page 78: NUMBER OF SYMBOL COMPARISONS IN QUICKSORTsteiner/num09/vallee.pdf · It is sufficient to restrict to the case when Ω = [1..n]. The input set is then S n, with uniform probability](https://reader035.vdocuments.mx/reader035/viewer/2022071218/6050c4f6c8f23e16a07f8749/html5/thumbnails/78.jpg)
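Case of QuickQuant$_\alpha$(n) [CFFV 09]. Work still in progress.
Theorem 4. For any weakly tamed source, the mean number of symbol
comparisons used by QuickQuant$_\alpha$(n) satisfies $q_n^{(\alpha)} \sim \rho_S(\alpha)\, n$ with
$$\rho_S(\alpha) = 2 \sum_{w\in S(\alpha)} \left[ p_w + p_w \log p_w - p_w^{(\alpha,+)} \log p_w^{(\alpha,+)} - p_w^{(\alpha,-)} \log p_w^{(\alpha,-)} \right]$$
$$\qquad + 2 \sum_{w\in R(\alpha)} p_w \left[ 1 + \left( \frac{p_w^{(\alpha,-)}}{p_w} - 1 \right) \log\left( 1 - \frac{p_w}{p_w^{(\alpha,-)}} \right) \right]$$
$$\qquad + 2 \sum_{w\in L(\alpha)} p_w \left[ 1 + \left( \frac{p_w^{(\alpha,+)}}{p_w} - 1 \right) \log\left( 1 - \frac{p_w}{p_w^{(\alpha,+)}} \right) \right].$$
Three parts depending on the position of the probabilities $p_w^{(\varepsilon)}$ wrt $\alpha$:
$w \in L(\alpha)$ iff $p_w^{(+)} \ge 1 - \alpha$, $w \in R(\alpha)$ iff $p_w^{(-)} \ge \alpha$,
$w \in S(\alpha)$ iff $p_w^{(-)} < \alpha < 1 - p_w^{(+)}$,
$p_w^{(\alpha,+)} = 1 - \alpha - p_w^{(+)}$, $p_w^{(\alpha,-)} = \alpha - p_w^{(-)}$.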
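The three-way split of prefixes can be written as a small classifier. The sketch below takes the probabilities of a prefix w as inputs and returns which of L(α), R(α), S(α) it belongs to, together with the shifted quantities; the function name and the convention of passing `p_minus` and `p_plus` explicitly are assumptions of this example.

```python
from fractions import Fraction

def classify_prefix(p_minus, p_plus, alpha):
    """Return (part, p_alpha_minus, p_alpha_plus) for a prefix w.

    part is "L" if p_plus >= 1 - alpha, "R" if p_minus >= alpha,
    and "S" if p_minus < alpha < 1 - p_plus.
    """
    p_alpha_minus = alpha - p_minus
    p_alpha_plus = 1 - alpha - p_plus
    if p_plus >= 1 - alpha:
        part = "L"
    elif p_minus >= alpha:
        part = "R"
    else:
        part = "S"
    return part, p_alpha_minus, p_alpha_plus

# Prefix with p_minus = 1/2, p_w = 1/4, p_plus = 1/4 (the interval [1/2, 3/4]).
for alpha in (Fraction(1, 4), Fraction(5, 8), Fraction(9, 10)):
    print(alpha, classify_prefix(Fraction(1, 2), Fraction(1, 4), alpha))
```

For a prefix with positive probability the three cases are mutually exclusive, and in the S case the two shifted quantities are positive and sum to p_w.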