kolmogorov-smirnov and the others - unigeobseyer/cafstat/paltani_060315.pdf · correct usage of...
How to compare distributions?
or rather
Has my sample been drawn from this distribution?
or even
Kolmogorov-Smirnov and the others...
Stéphane Paltani
ISDC/Observatoire de Genève
The Problem... (I)
Sample of observed values (random variable): {Xi}, i=1,...,N
Expected distribution: f(x), Xmin ≤ x ≤ Xmax
→ Is my {Xi}, i=1,...,N sample a probable outcome from a draw of N values
from a distribution f(X), Xmin ≤ X≤ Xmax ?
Example... (I)
An X-ray detector collects photons at specific times: {Xi}≡{ti}, i=1,...,N
Is the source constant ?
Expected distribution: f(t) = 1/(tmax − tmin), tmin ≤ t ≤ tmax
The Problem... (II)
Sample of observed values (random variable): {Xi}, i=1,...,N
Sample of observed values (random variable): {Yj}, j=1,...,M
→ Are the {Xi}, i=1,...,N distributed the same way as the {Yj}, j=1,...,M ?
Example... (II)
We measure the amplitude of variability of different types of active galactic nuclei.
Seyfert 1 galaxies: {σiS}, i=1,...,N
QSOs: {σjQ}, j=1,...,M
BL Lacs: {σkB}, k=1,...,P
Do these different types of objects have different variability properties ?
Cumulative Distributions
For continuous distributions, we can easily compare visually how well two distributions agree by building the cumulatives of these distributions. The cumulative of a distribution is simply its integral:
$$F(x) = \int_{X_{\min}}^{x} f(y)\,dy$$
The cumulative of a sample can be calculated in a similar way, as a sum of Heaviside functions H, normalised by N so that it runs from 0 to 1 like F:
$$C_{\{X_i\}}(x) = \frac{1}{N}\sum_{i=1}^{N} H(x - X_i)$$
Different approaches can then be followed to obtain a quantitative assessment.
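A minimal sketch of the two cumulatives, assuming the sample cumulative is normalised by N and a uniform f(x) as in Example (I). The names `sample_cumulative` and `uniform_cumulative` and the four arrival times are illustrative and do not come from the slides.

```python
from bisect import bisect_right

def sample_cumulative(sample):
    """Return C(x): the fraction of sample values that are <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def C(x):
        # bisect_right counts the values <= x, i.e. sum_i H(x - X_i) with H(0) = 1
        return bisect_right(ordered, x) / n

    return C

def uniform_cumulative(x_min, x_max):
    """Return F(x) for a uniform density f(x) = 1 / (x_max - x_min)."""
    def F(x):
        if x <= x_min:
            return 0.0
        if x >= x_max:
            return 1.0
        return (x - x_min) / (x_max - x_min)

    return F

# Hypothetical photon arrival times on [0, 1]
times = [0.1, 0.4, 0.5, 0.9]
C = sample_cumulative(times)
F = uniform_cumulative(0.0, 1.0)
print(C(0.45), F(0.45))  # 0.5 0.45
```

Plotting C and F on the same axes gives the visual comparison described on this slide.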
Kolmogorov-Smirnov Statistic
The simplest one: we determine the maximum separation between the two cumulatives:
$$D = \max_{X_{\min} \le x \le X_{\max}} \left| C_{\{X_i\}}(x) - F(x - X_{\min}) \right|$$
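The maximum separation can be sketched as follows, assuming a continuous model cumulative passed in as a function. Because the sample cumulative is a staircase, the gap is checked on both sides of each step. The name `ks_statistic` and the arrival times are illustrative.

```python
def ks_statistic(sample, cdf):
    """Maximum distance between the sample cumulative and a model cumulative."""
    ordered = sorted(sample)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered):
        f = cdf(x)
        # The sample cumulative jumps from i/n to (i+1)/n at x,
        # so the largest gap can occur on either side of the step.
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d

# Hypothetical photon arrival times, tested against a uniform model on [0, 1]
times = [0.1, 0.4, 0.5, 0.9]
D = ks_statistic(times, lambda t: t)
print(round(D, 3))  # 0.25
```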
Virtues of Kolmogorov-Smirnov (I)
It is possible to estimate the expected distribution of D in the case of a uniform distribution:
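$$P(\lambda > D) = 2\sum_{j=1}^{\infty} (-1)^{j-1} e^{-2 j^2 \lambda^2}, \qquad \lambda = \left(\sqrt{N} + 0.12 + \frac{0.11}{\sqrt{N}}\right) D$$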
P(λ>D) is a one-sided probability distribution, so the smaller D, the larger the probability
If one compares two samples, their variances must be added, and N is replaced by the effective number of points:
$$N_e = \frac{N \cdot M}{N + M}$$
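A sketch of how the series and the effective sample size could be evaluated. The truncation at 100 terms and the shortcut for very small λ (where the series converges slowly and the probability is 1 to high accuracy) are assumptions of this sketch, and the function names are illustrative.

```python
import math

def ks_probability(d, n_eff, terms=100):
    """P(lambda > D) from the alternating series, with the small-N correction."""
    root = math.sqrt(n_eff)
    lam = (root + 0.12 + 0.11 / root) * d
    if lam < 0.2:
        # Assumed cut-off: below it the probability equals 1 to high accuracy
        return 1.0
    total = 0.0
    for j in range(1, terms + 1):
        total += (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
    # Clamp against rounding errors in the truncated series
    return min(1.0, max(0.0, 2.0 * total))

def effective_size(n, m):
    """Effective number of points when comparing samples of sizes n and m."""
    return n * m / (n + m)

# One sample of 4 points, then two samples of 40 and 60 points (hypothetical)
p_one_sample = ks_probability(0.25, 4)
p_two_samples = ks_probability(0.25, effective_size(40, 60))
```

The same D is far less surprising for 4 points than for two samples of several tens of points, which is why the result depends on N (or N_e) as well as on D.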
Virtues of Kolmogorov-Smirnov (II)
D is preserved under any transformation x → y = ψ(x), where ψ(x) is an arbitrary strictly monotonic function
→ P(λ>D) does not depend on the shape of the underlying distributions !
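The invariance can be checked numerically. This sketch assumes the transformation ψ(x) = exp(x) applied to a uniform model on [0, 1], whose transformed cumulative is then ln(y); the sample values are illustrative.

```python
import math

def ks_statistic(sample, cdf):
    """Maximum distance between the sample cumulative and a model cumulative."""
    ordered = sorted(sample)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered):
        f = cdf(x)
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d

sample = [0.1, 0.4, 0.5, 0.9]

# Original variable: uniform model on [0, 1], F(x) = x
d_x = ks_statistic(sample, lambda x: x)

# Transformed variable y = exp(x): the model cumulative becomes F(y) = ln(y)
d_y = ks_statistic([math.exp(x) for x in sample], lambda y: math.log(y))

print(abs(d_x - d_y) < 1e-12)  # True
```

Only the ordering of the values matters, so any strictly increasing ψ gives the same D as long as the model is transformed with it.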
Correct usage of Kolmogorov-Smirnov
● Decide on a threshold α, for instance 5%
● If P(λ>D)>α, then one cannot exclude that the sample has been drawn from the given distribution
● One can never be sure that the sample has actually been drawn from the given distribution
DO NOT SAY:
The probability that the sample has been drawn from this distribution is X%
SAY:
If P(λ>D)>α: We cannot exclude that the sample has been drawn from this distribution
If P(λ>D)<α: The probability of obtaining such a high KS value is only X%
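The decision rule above can be written as a small helper that only ever produces one of the two permitted statements. The function name `report` and the default threshold of 5% are illustrative choices.

```python
def report(p_value, alpha=0.05):
    """Phrase the outcome of a KS test without over-interpreting it."""
    if p_value > alpha:
        return "We cannot exclude that the sample has been drawn from this distribution"
    return f"The probability of obtaining such a high KS value is only {100 * p_value:.1f}%"

print(report(0.93))
print(report(0.012))
```

Neither branch states a probability that the hypothesis itself is true, which is the point of the slide.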
Caveats (I)
No correlation in the sample is allowed: Sample must be in the Poisson regime
There is a general problem among non-frequentists (the Bayesians) with the notion of null hypothesis: Such frequentist tests implicitly specify classes of alternative hypotheses and exclude others...
Furthermore, one may reject the hypothesis if the model parameters are slightly wrong
Caveats (II)
In cases where the model has parameters, if one estimates them using the same data, one cannot use the KS statistic anymore!
→ Use Monte Carlo simulations
→ For Gaussian distribution, use Shapiro-Wilk normality test
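A sketch of the Monte Carlo route, assuming for illustration an exponential model whose mean is estimated from the data. The key step is that the parameter is re-estimated on every simulated sample, exactly as it was for the real one. The model choice, function names, seed and number of simulations are all assumptions of this sketch.

```python
import math
import random

def ks_statistic(sample, cdf):
    """Maximum distance between the sample cumulative and a model cumulative."""
    ordered = sorted(sample)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered):
        f = cdf(x)
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d

def fitted_exponential_cdf(sample):
    """Exponential cumulative whose mean is estimated from the sample itself."""
    mean = sum(sample) / len(sample)
    return lambda x: 1.0 - math.exp(-x / mean)

def monte_carlo_p_value(sample, n_sim=1000, seed=0):
    """Fraction of simulated samples whose D is at least the observed one."""
    rng = random.Random(seed)
    n = len(sample)
    mean = sum(sample) / n
    d_obs = ks_statistic(sample, fitted_exponential_cdf(sample))
    exceed = 0
    for _ in range(n_sim):
        # Draw a sample of the same size from the fitted model
        fake = [rng.expovariate(1.0 / mean) for _ in range(n)]
        # Refit the parameter on the simulated sample before computing D
        d_sim = ks_statistic(fake, fitted_exponential_cdf(fake))
        if d_sim >= d_obs:
            exceed += 1
    return exceed / n_sim
```

The returned fraction plays the role of P(λ>D), but it is calibrated for the fitted model instead of relying on the analytical expression.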
Caveats (III)
The sensitivity of the Kolmogorov-Smirnov test is not constant between Xmin and Xmax, since the variance of C{Xi} is proportional to F(x-Xmin)·(1-F(x-Xmin)), which reaches a maximum for F(x-Xmin) = 0.5
[Figure: two panels, 80 photons/1000 added in the tails and 80 photons/1000 added in the center]
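The proportionality to F(1−F) follows from a short argument, assuming the N values are independent and C is normalised to run from 0 to 1 (F is shorthand for F(x−Xmin)). At a fixed x, each value falls below x with probability F, so the number of values below x is binomial:

$$N\,C_{\{X_i\}}(x) \sim \mathrm{Binomial}(N, F) \quad\Rightarrow\quad \mathrm{Var}\left[C_{\{X_i\}}(x)\right] = \frac{F\,(1 - F)}{N}$$

The variance is largest at F = 0.5, where it equals 1/(4N), and vanishes at both ends of the range, so a given excess of photons produces different deviations in D depending on where it occurs.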
Non-uniformity Correction
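Anderson-Darling's statistics:
$$D^{*} = \max_{X_{\min} \le x \le X_{\max}} \frac{\left| C_{\{X_i\}}(x) - F(x - X_{\min}) \right|}{\sqrt{F(x - X_{\min})\,\left(1 - F(x - X_{\min})\right)}}$$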
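Or:
$$D^{**} = \int_{X_{\min} \le x \le X_{\max}} \frac{\left| C_{\{X_i\}}(x) - F(x - X_{\min}) \right|}{\sqrt{F(x - X_{\min})\,\left(1 - F(x - X_{\min})\right)}}\,dF$$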
One can think of other ones...
However, none of them provides a simple (or even workable) expression for
P(λ>D*(*))...
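Kuiper's Statistics
The simplest modification to KS is the best one!
$$V = D_{+} + D_{-} = \max_{X_{\min} \le x \le X_{\max}} \left[ C_{\{X_i\}}(x) - F(x - X_{\min}) \right] + \max_{X_{\min} \le x \le X_{\max}} \left[ F(x - X_{\min}) - C_{\{X_i\}}(x) \right]$$
And:
$$P(\lambda > V) = 2\sum_{j=1}^{\infty} \left(4 j^2 \lambda^2 - 1\right) e^{-2 j^2 \lambda^2}, \qquad \lambda = \left(\sqrt{N} + 0.155 + \frac{0.24}{\sqrt{N}}\right) V$$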
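A sketch of Kuiper's V and its tail probability, assuming a continuous model cumulative passed in as a function. The truncation at 100 terms and the shortcut for small λ are assumptions of this sketch, and the function names and arrival times are illustrative.

```python
import math

def kuiper_statistic(sample, cdf):
    """V = D+ + D-: largest excess of the sample cumulative above and below the model."""
    ordered = sorted(sample)
    n = len(ordered)
    d_plus = 0.0
    d_minus = 0.0
    for i, x in enumerate(ordered):
        f = cdf(x)
        d_plus = max(d_plus, (i + 1) / n - f)   # C above F, just after the step
        d_minus = max(d_minus, f - i / n)       # F above C, just before the step
    return d_plus + d_minus

def kuiper_probability(v, n_eff, terms=100):
    """P(lambda > V) from the series, with the small-N correction."""
    root = math.sqrt(n_eff)
    lam = (root + 0.155 + 0.24 / root) * v
    if lam < 0.4:
        # Assumed cut-off: below it the probability is 1 to within about 1e-3
        return 1.0
    total = 0.0
    for j in range(1, terms + 1):
        a = 2.0 * j * j * lam * lam
        total += (2.0 * a - 1.0) * math.exp(-a)  # (4 j^2 lam^2 - 1) exp(-2 j^2 lam^2)
    return min(1.0, max(0.0, 2.0 * total))

# Hypothetical photon arrival times, tested against a uniform model on [0, 1]
times = [0.1, 0.4, 0.5, 0.9]
V = kuiper_statistic(times, lambda t: t)
print(round(V, 3))  # 0.4
```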
Kuiper on a Circle
If the distribution is defined over a periodic domain, Kuiper's statistic does not depend on the choice of the origin, contrary to KS.
[Figure: two panels, 60 photons/1000 added in the tails and 60 photons/1000 added in the center]
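The independence from the origin can be illustrated with a few phases on [0, 1) and a uniform model. Moving the origin by 0.3 changes the KS value but leaves Kuiper's V unchanged. The phases, the shift and the function names are illustrative.

```python
def cumulative_gaps(phases):
    """Return (D+, D-) between the phase cumulative and a uniform model on [0, 1)."""
    ordered = sorted(phases)
    n = len(ordered)
    d_plus = max((i + 1) / n - p for i, p in enumerate(ordered))
    d_minus = max(p - i / n for i, p in enumerate(ordered))
    return d_plus, d_minus

def ks(phases):
    """KS statistic: the larger of the two one-sided gaps."""
    return max(cumulative_gaps(phases))

def kuiper(phases):
    """Kuiper statistic: the sum of the two one-sided gaps."""
    return sum(cumulative_gaps(phases))

phases = [0.1, 0.4, 0.5, 0.9]
shifted = [(p + 0.3) % 1.0 for p in phases]   # same events, origin moved by 0.3

print(round(ks(phases), 3), round(ks(shifted), 3))          # 0.25 0.2
print(round(kuiper(phases), 3), round(kuiper(shifted), 3))  # 0.4 0.4
```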
Application of Kuiper's statistic to the search for periodic signals
The standard test is the Rayleigh test, which is nothing else than the Fourier power spectrum for signals expressed in the form:
$$S(t) = \sum_i \delta(t - t_i)$$
[Figure: two panels, Rayleigh test and Kuiper test, phase axis from 0 to 1]
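For a sum of delta functions the Fourier transform reduces to a sum of complex exponentials over the arrival times, so the Rayleigh power at one trial frequency can be sketched as below. The 2/N normalisation is one common convention and is an assumption here, as are the function name and the test times.

```python
import math

def rayleigh_power(times, frequency):
    """Power of S(t) = sum_i delta(t - t_i) at one trial frequency."""
    c = sum(math.cos(2.0 * math.pi * frequency * t) for t in times)
    s = sum(math.sin(2.0 * math.pi * frequency * t) for t in times)
    # Assumed normalisation: 2/N, so that N perfectly aligned events give 2N
    return 2.0 * (c * c + s * s) / len(times)

# Events that all share the same phase at frequency 1: maximal power
aligned = [0.0, 1.0, 2.0, 3.0]
# Events spread evenly in phase at frequency 1: power cancels out
spread = [0.0, 0.25, 0.5, 0.75]

power_aligned = rayleigh_power(aligned, 1.0)
power_spread = rayleigh_power(spread, 1.0)
```

Scanning `frequency` over a grid of trial values gives the power spectrum. The Kuiper alternative instead folds the times at each trial period and compares the phases with the expected phase distribution.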
Interrupted Observations
But X-ray observations are often interrupted because of various instrumental problems,
so the expected distribution is not uniform!
The Kuiper test can very naturally take into account the non-uniformity of the phase exposure map
[Figure panels: “good time intervals”; phase exposure map; expected “uniform” distribution]
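One way to build the expected distribution is to fold the good time intervals at the trial period and accumulate the exposure per phase bin. This is a binned approximation under stated assumptions, not necessarily the construction used for the figures: GTIs are given as (start, stop) pairs, and the function names and bin count are illustrative.

```python
import math

def phase_exposure(gtis, period, nbins):
    """Fraction of the total exposure falling in each phase bin."""
    width = period / nbins            # duration of one phase bin
    bins = [0.0] * nbins
    for start, stop in gtis:
        first = math.floor(start / width)
        last = math.floor(stop / width)
        for k in range(first, last + 1):
            # Overlap between the GTI and the k-th bin along the time axis
            lo = max(start, k * width)
            hi = min(stop, (k + 1) * width)
            if hi > lo:
                bins[k % nbins] += hi - lo
    total = sum(bins)
    return [b / total for b in bins]

def expected_cumulative(exposure):
    """Cumulative of the binned exposure, evaluated at the right edge of each bin."""
    out, running = [], 0.0
    for e in exposure:
        running += e
        out.append(running)
    return out

# Hypothetical case: the source is only observed during the first half of each cycle
exposure = phase_exposure([(0.0, 0.5), (1.0, 1.5)], period=1.0, nbins=4)
print(exposure)                       # [0.5, 0.5, 0.0, 0.0]
print(expected_cumulative(exposure))  # [0.5, 1.0, 1.0, 1.0]
```

The resulting cumulative replaces the straight line of the uniform case as the model against which the folded photon phases are compared.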
Examples with simulated data
[Figure panels: continuous case; real GTIs; real GTIs with signal]
Examples with real data
[Figure panels: EX Hya; UW Pic]
Ad nauseam: Kuiper for the fanatical... (I)
α,β :
Ad nauseam: Kuiper for the fanatical... (II)
Tail probabilities
Expected
Best
Asymptotic equation, instead of analytical expression
References
● Any text book in statistics...
● Press W. et al., Numerical Recipes in C, F77, ..., Sect. 14
● Kuiper N.H., 1962, Proc. Koninklijke Nederlandse Akad. Wetenschappen A 63, 38
● Stephens M.A., 1965, Biometrika 70, 11
● Paltani S., 2004, A&A 420, 789