![Page 1: HY 439 Presented by: George Fortetsanakishy439/project2011/project3/...Poisson process Random process that describes the timestamps of various events • Telephone call arrivals •](https://reader033.vdocuments.mx/reader033/viewer/2022042104/5e819933e6fdc2286467dcc0/html5/thumbnails/1.jpg)
Statistical analysis using matlab
HY 439
Presented by: George Fortetsanakis
Roadmap
• Probability distributions
• Statistical estimation
• Fitting data to probability distributions
Continuous distributions
• Continuous random variable X takes values in subset of real numbers D⊆R
• X corresponds to measurement of some property, e.g., length, weight
• The probability of X taking any specific value is zero: $P(X = x) = 0$
• Instead, talk about the probability of X lying in a given interval:
$P(x_1 \le X \le x_2) = P(X \in [x_1, x_2])$
$P(X \le x) = P(X \in (-\infty, x])$
Probability density function (pdf)
• Continuous function p(x) defined for each x∈D
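• Probability of X lying in interval I⊆D computed by integral:
$P(X \in I) = \int_{x \in I} p(x)\,dx$
• Examples:
$P(x_1 \le X \le x_2) = P(X \in [x_1, x_2]) = \int_{x_1}^{x_2} p(u)\,du$
$P(X \le x) = P(X \in (-\infty, x]) = \int_{-\infty}^{x} p(u)\,du$
• Important property:
$P(X \in D) = \int_{x \in D} p(x)\,dx = 1$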
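As a quick numerical check of the pdf properties on this slide, the sketch below integrates a density with MATLAB's `integral`. The standard Gaussian density and the interval endpoints are assumptions chosen only for illustration.

```matlab
% Standard Gaussian pdf (assumed example; any valid pdf would do)
p = @(x) exp(-x.^2 / 2) / sqrt(2*pi);

total = integral(p, -Inf, Inf);   % P(X in D), should be close to 1
x1 = -1; x2 = 2;                  % hypothetical interval endpoints
P12 = integral(p, x1, x2);        % P(x1 <= X <= x2)
Pleft = integral(p, -Inf, x2);    % P(X <= x2)

fprintf('total = %.4f, P12 = %.4f, Pleft = %.4f\n', total, P12, Pleft);
```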
Cumulative distribution function (cdf)
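• For each x∈D defines the probability $P(X \le x)$:
$F(x) = P(X \le x) = P(X \in (-\infty, x]) = \int_{-\infty}^{x} p(u)\,du$
Important properties:
• $F(-\infty) = 0$
• $F(+\infty) = 1$
• $P(x_1 \le X \le x_2) = F(x_2) - F(x_1)$
Complementary cumulative distribution function (ccdf)
$G(x) = P(X > x) = 1 - P(X \le x) = 1 - F(x)$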
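The cdf properties can be checked numerically. The sketch below assumes a standard Gaussian, whose cdf can be written with `erf`, and compares the interval probability obtained from the cdf with the one obtained by integrating the pdf. The endpoints are hypothetical.

```matlab
% Standard Gaussian cdf written with erf (assumed example distribution)
F = @(x) 0.5 * (1 + erf(x / sqrt(2)));
G = @(x) 1 - F(x);                       % ccdf
p = @(x) exp(-x.^2 / 2) / sqrt(2*pi);    % matching pdf

x1 = -1; x2 = 2;                         % hypothetical interval endpoints
viaCdf = F(x2) - F(x1);                  % P(x1 <= X <= x2) from the cdf
viaPdf = integral(p, x1, x2);            % same probability from the pdf

fprintf('F(x2)-F(x1) = %.6f, integral = %.6f, G(x2) = %.6f\n', ...
        viaCdf, viaPdf, G(x2));
```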
Exponential distribution
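Probability density function: $p(x) = \lambda e^{-\lambda x}$, $x \ge 0$
Cumulative distribution function: $F(x) = 1 - e^{-\lambda x}$, $x \ge 0$
Memoryless property: $P(T > \tau + t \mid T > \tau) = P(T > t)$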
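The memoryless property can be illustrated by simulation. This is a minimal sketch: the rate `lambda` and the times `tau` and `t` are hypothetical values, and the samples are drawn by inverse-transform sampling so that no toolbox function is needed.

```matlab
lambda = 0.5;                         % hypothetical rate
tau = 1; t = 2;                       % hypothetical elapsed and additional time
T = -log(rand(1e6, 1)) / lambda;      % exponential samples by inverse transform

lhs = mean(T(T > tau) > tau + t);     % estimate of P(T > tau + t | T > tau)
rhs = mean(T > t);                    % estimate of P(T > t)
exact = exp(-lambda * t);             % P(T > t) from the exponential ccdf

fprintf('conditional = %.4f, unconditional = %.4f, exact = %.4f\n', ...
        lhs, rhs, exact);
```

Both estimates should agree with each other up to sampling noise.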
Poisson process
Random process that describes the timestamps of various events
• Telephone call arrivals
• Packet arrivals on a router
Time between two consecutive arrivals follows exponential distribution
Time intervals t1, t2, t3, … are drawn from exponential distribution
[Figure: timeline of Arrival 1 through Arrival 7, with consecutive arrivals separated by the intervals t1, t2, …, t6]
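A Poisson arrival sequence like the one in the timeline can be sketched by accumulating exponential inter-arrival times. The rate `lambda` and the choice of placing the first arrival at time 0 are assumptions made for illustration.

```matlab
lambda = 3;                                    % hypothetical arrival rate (arrivals per time unit)
nArrivals = 7;                                 % arrivals 1..7, as in the timeline
gaps = -log(rand(nArrivals - 1, 1)) / lambda;  % inter-arrival times t1..t6
arrivals = [0; cumsum(gaps)];                  % arrival 1 placed at time 0 (assumption)

disp(arrivals');                               % timestamps of the 7 arrivals
```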
Roadmap
• Probability distributions
• Statistical estimation
• Fitting data to probability distributions
Basic statistics
Suppose a set of measurements x = [x1 x2 … xn]
• Estimation of mean value (MATLAB: m = mean(x);):
$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i$
• Estimation of standard deviation (MATLAB: s = std(x);):
$\hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \hat{\mu})^2}{n - 1}}$
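The two estimators can be written out directly and compared with the built-in `mean` and `std`. The measurement vector below is a hypothetical example.

```matlab
x = [2 4 4 4 5 5 7 9];                              % hypothetical measurements
n = numel(x);

mu_hat = sum(x) / n;                                % estimate of the mean
sigma_hat = sqrt(sum((x - mu_hat).^2) / (n - 1));   % estimate of the standard deviation

fprintf('mean: %.4f (built-in %.4f)\n', mu_hat, mean(x));
fprintf('std:  %.4f (built-in %.4f)\n', sigma_hat, std(x));
```

`std(x)` divides by n - 1 by default, which matches the formula on this slide.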
Estimate pdf
• Suppose dataset x = [x1 x2 … xk]
• Can we estimate the pdf that values in x follow?
Produce histogram
Step 1
• Divide sampling space into a number of bins
• Measure the number of samples in each bin
[Figure: samples scattered along the x axis from -4 to 4; four bins of width 2 contain 3, 5, 6, and 2 samples]
Step 2
• E = total area under histogram plot = 2*3 + 2*5 + 2*6 +2*2 = 32
• Normalize y axis by dividing by E
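[Figure: two histograms over x from -4 to 4; before normalization the y axis shows the frequency per bin (3, 5, 6, 2); after normalization it shows P(x) per bin (3/32, 5/32, 6/32, 2/32)]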
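The normalization arithmetic of this step can be reproduced in a few lines. The variable names are illustrative; the counts and bin width come from the example.

```matlab
counts = [3 5 6 2];            % samples per bin, from the example
width = 2;                     % every bin spans 2 units on the x axis

E = sum(counts * width);       % total area under the histogram: 32
density = counts / E;          % normalized heights: [3 5 6 2] / 32
area = sum(density * width);   % area after normalization: 1

fprintf('E = %d, area after normalization = %.2f\n', E, area);
```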
Matlab code
function produce_histogram(x, bins)
% Input parameters
% x = [x1; x2; … xn]: a column vector containing the data x1, x2, …, xn.
% bins = [b1; b2; … bk]: a vector that divides the sampling space into bins
% centered around the points b1, b2, …, bk.
figure; % Create a new figure
[f, y] = hist(x, bins); % Count the data points that fall in each bin
bar(y, f/trapz(y, f), 1); % Plot the histogram, normalized to (approximately) unit area
xlabel('x'); % Label the x axis
ylabel('p(x)'); % Label the y axis
end
Histogram examples
[Figure: 2×2 grid of histograms; rows: 1000 samples and 10000 samples; columns: bin spacing 0.1 and bin spacing 0.05]
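A grid like the one in the figure could be produced along the following lines. This is only a sketch: the slides do not say which distribution the samples come from, so standard normal samples and bin centers from -4 to 4 are assumed here.

```matlab
sizes = [1000 10000];          % samples per row of the grid
spacings = [0.1 0.05];         % bin spacing per column
areas = zeros(2, 2);           % area under each normalized histogram

figure;
for r = 1:2
    x = randn(sizes(r), 1);    % assumed source: standard normal samples
    for c = 1:2
        bins = -4:spacings(c):4;               % bin centers
        [f, y] = hist(x, bins);                % counts per bin
        h = f / trapz(y, f);                   % normalized heights
        areas(r, c) = sum(h) * spacings(c);    % total bar area, close to 1
        subplot(2, 2, (r - 1) * 2 + c);
        bar(y, h, 1);
        title(sprintf('%d samples, bin spacing %.2f', sizes(r), spacings(c)));
    end
end
```

More samples give a smoother estimate, while a smaller bin spacing makes the estimate noisier for the same number of samples.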
Empirical cdf
How can we estimate the cdf that values in x follow?
Use matlab function ecdf(x)
Empirical cdf estimated with 300 samples from normal distribution
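To see what an empirical cdf computes, the sketch below builds one by sorting the samples, without calling `ecdf`. Standard normal samples are assumed, and the sample size follows the figure caption.

```matlab
x = randn(300, 1);                    % 300 samples from a normal distribution
xs = sort(x);                         % sample values in increasing order
Fhat = (1:numel(xs))' / numel(xs);    % fraction of samples <= each sorted value

figure;
stairs(xs, Fhat);                     % step plot of the empirical cdf
xlabel('x');
ylabel('F(x)');
```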
Percentiles
• Values of a variable below which a certain percentage of observations fall
• The 80th percentile is the value below which 80% of observations fall
[Figure: plot with the 80th percentile marked]
Estimate percentiles
Percentiles in matlab: p = prctile(x, y);
• y takes values in interval [0 100]
• 80th percentile: p = prctile(x, 80);
Median: the 50th percentile
• med = prctile(x, 50); or
• med = median(x);
Why does the median differ from the mean?
• Suppose dataset x = [1 100 100]: mean = 201/3 = 67, median = 100; the single small value pulls the mean down but leaves the median unchanged
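The calls on this slide can be tried on the small dataset from the last bullet. The second part uses hypothetical normal samples to check that about 80% of the observations fall at or below the 80th percentile.

```matlab
x = [1 100 100];
m = mean(x);                 % mean of the dataset
med = median(x);             % median of the dataset
p50 = prctile(x, 50);        % 50th percentile, equal to the median

y = randn(1000, 1);          % hypothetical samples
p80 = prctile(y, 80);        % 80th percentile
frac = mean(y <= p80);       % fraction of samples at or below it

fprintf('mean = %g, median = %g, 50th percentile = %g\n', m, med, p50);
fprintf('fraction of samples below the 80th percentile = %.3f\n', frac);
```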
Roadmap
• Elements of probability theory
• Probability distributions
• Statistical estimation
• Fitting data to probability distributions
Problem definition
Dataset D={x1, x2, …, xk} collected from an experiment
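Families of distributions: $S = \{P_1(x \mid \theta_1),\ P_2(x \mid \theta_2),\ \ldots,\ P_N(x \mid \theta_N)\}$
• Gaussian: $\theta_i = (\mu, \sigma)$
• Exponential: $\theta_i = \lambda$
• Generalized Pareto: $\theta_i$ holds three parameters (shape, scale, location)
Which family of distributions best describes the dataset D?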
Step 1: Maximum likelihood estimation
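• For each family i determine the parameter $\theta_i^*$ that best fits the data
• Maximize the likelihood of obtaining the data with respect to $\theta_i$:
$$
\begin{aligned}
\theta_i^* &= \arg\max_{\theta_i}\ p(D \mid \theta_i) && \text{(likelihood function)} \\
&= \arg\max_{\theta_i}\ p(x_1, x_2, \ldots, x_k \mid \theta_i) \\
&= \arg\max_{\theta_i}\ \prod_{j=1}^{k} p(x_j \mid \theta_i) && \text{(due to independence of samples)} \\
&= \arg\max_{\theta_i}\ \sum_{j=1}^{k} \ln p(x_j \mid \theta_i)
\end{aligned}
$$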
Example: exponential distribution
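• Probability density function: $p(x \mid \lambda) = \lambda e^{-\lambda x}$
• Define the log-likelihood function:
$l(\lambda) = \sum_{i=1}^{k} \ln(\lambda e^{-\lambda x_i}) = \sum_{i=1}^{k} \ln(\lambda) - \sum_{i=1}^{k} \lambda x_i = k \ln(\lambda) - \lambda \sum_{i=1}^{k} x_i$
• Set derivative equal to 0 to find maximum:
$\frac{dl(\lambda)}{d\lambda} = 0 \Rightarrow \frac{k}{\lambda} - \sum_{i=1}^{k} x_i = 0 \Rightarrow \lambda^* = \frac{k}{\sum_{i=1}^{k} x_i}$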
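The closed-form estimate from this slide can be compared with a direct numerical maximization of the log-likelihood. This is a sketch under assumptions: the data are simulated with a hypothetical rate `lambda_true`, and the search interval passed to `fminbnd` is an arbitrary choice.

```matlab
lambda_true = 2;                               % hypothetical rate used to simulate data
x = -log(rand(5000, 1)) / lambda_true;         % simulated exponential samples
k = numel(x);

lambda_star = k / sum(x);                      % closed-form MLE: k / sum of the samples

% Numerical check: minimize the negative log-likelihood -l(lambda)
negLogLik = @(lam) -(k * log(lam) - lam * sum(x));
lambda_num = fminbnd(negLogLik, 1e-6, 100);    % search interval is an assumption

fprintf('closed form: %.4f, numerical: %.4f\n', lambda_star, lambda_num);
```

The closed-form estimate is simply the reciprocal of the sample mean, `1/mean(x)`.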
Reformulated question
After MLE: instead of families we have specific distributions
$P_1(x \mid \theta_1^*),\ P_2(x \mid \theta_2^*),\ \ldots,\ P_N(x \mid \theta_N^*)$
Which distribution best describes the data?
Choose the most appropriate distribution based on:
• Q-Q plots
• Kullback–Leibler divergence
Method of Q-Q plots
Checks how well a probability distribution describes the data
Algorithm
1. Draw random datasets Y0, Y1, Y2, …, YM from distribution $P_i(x \mid \theta_i^*)$
2. Compute percentiles of these datasets at predefined set of points
3. Compute percentiles of experimental dataset D at the same points
4. Plot percentiles of Y0 against percentiles of each of Y1, Y2, …, YM
5. Plot percentiles of Y0 against percentiles of dataset D
If the plot of step 5 lies inside the area defined by the plots of step 4, the distribution $P_i(x \mid \theta_i^*)$ describes the data well
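The five steps can be sketched for a fitted exponential distribution as follows. Everything specific here is an assumption for illustration: the dataset D is simulated, the number of datasets M and the percentile points are arbitrary, and the envelope is taken as the pointwise minimum and maximum over Y1..YM.

```matlab
lambda = 2;                              % hypothetical rate used to simulate D
D = -log(rand(500, 1)) / lambda;         % stand-in for the experimental dataset
lambda_star = numel(D) / sum(D);         % MLE of the rate for the exponential family
M = 100;                                 % number of reference datasets Y1..YM
pts = 5:5:95;                            % predefined percentile points

% Step 1: draw Y0, Y1, ..., YM (one dataset per column)
Y = -log(rand(numel(D), M + 1)) / lambda_star;

% Steps 2-3: percentiles of every synthetic dataset and of D
qY = prctile(Y, pts);                    % numel(pts)-by-(M+1); column j+1 holds Yj
qD = prctile(D, pts);
qD = qD(:);                              % force a column vector
q0 = qY(:, 1);                           % percentiles of Y0

% Envelope spanned by the curves of step 4
lo = min(qY(:, 2:end), [], 2);
hi = max(qY(:, 2:end), [], 2);
inside = all(qD >= lo & qD <= hi);       % true if the curve of D stays in the envelope

% Steps 4-5: plot everything against the percentiles of Y0
figure; hold on;
plot(q0, qY(:, 2:end), 'Color', [0.8 0.8 0.8]);
plot(q0, lo, 'k--', q0, hi, 'k--');
plot(q0, qD, 'b', 'LineWidth', 1.5);
xlabel('Percentiles of Y0');
ylabel('Percentiles of Y1..YM and D');
```

Because D is simulated from the same family that is fitted, the blue curve would usually stay inside the envelope in this sketch.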
Plot percentiles of Y0 vs. percentiles of Y1
Plot percentiles of Y0 vs. percentiles of Y2
Plot percentiles of Y0 vs. percentiles of Y100
Construct envelope
Plot percentiles of Y0 vs. percentiles of D
Good fitting: The blue curve of original percentiles lies in the envelope
Plot percentiles of Y0 vs. percentiles of D
Bad fitting: The blue curve of original percentiles lies outside the envelope
Method of Kullback–Leibler divergence
Non-symmetric measure of the difference between distributions P and Q
Discrete distributions:
$D_{KL}(P \| Q) = \sum_{i=1}^{N} p(i) \log \frac{p(i)}{q(i)}$
Continuous distributions:
$D_{KL}(P \| Q) = \int_{-\infty}^{\infty} p(x) \log \frac{p(x)}{q(x)}\,dx$
Algorithm
1. Discretize the empirical pdf of the dataset D
2. Discretize all distributions $P_1(x \mid \theta_1^*),\ P_2(x \mid \theta_2^*),\ \ldots,\ P_N(x \mid \theta_N^*)$
3. Compute the KL divergence between each theoretical distribution and dataset D
4. Choose the distribution with the lowest KL divergence
[Figure: samples on the x axis from -4 to 4 in four bins with 3, 5, 6, and 2 samples; discretized distribution at bin centers -3, -1, 1, 3 with probabilities 3/16, 5/16, 6/16, 2/16]
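The algorithm can be sketched on the four-bin example from the figure. The two candidate Gaussians and their parameters are hypothetical stand-ins for fitted distributions, and the direction D_KL(P || Q), with P the empirical distribution, is an assumption, since the slide does not state it.

```matlab
edges = -4:2:4;                              % bin edges from the example
p = [3 5 6 2] / 16;                          % discretized empirical distribution of D

% Candidate Gaussians (hypothetical parameters standing in for fitted ones)
mu = [0 2];
sigma = [2 1];
Phi = @(z) 0.5 * (1 + erf(z / sqrt(2)));     % standard normal cdf

kl = zeros(size(mu));
for c = 1:numel(mu)
    q = diff(Phi((edges - mu(c)) / sigma(c)));   % probability mass in each bin
    q = q / sum(q);                              % renormalize over the four bins
    kl(c) = sum(p .* log(p ./ q));               % discrete KL divergence D_KL(P || Q)
end

[~, best] = min(kl);                         % candidate with the lowest divergence
fprintf('KL divergences: %s, best candidate: %d\n', mat2str(kl, 4), best);
```

Empty bins in the empirical distribution would need special handling (a term with p(i) = 0 contributes 0), which this example does not require.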
Online material
http://www.csd.uoc.gr/~hy439/schedule09.html
• Tutorials Statistics
Cross correlation
xcorr(x, y): estimates the cross correlation between two time series x and y
$R_{xy}(m) = E[x_{n+m}\, y_n] = E[x_n\, y_{n-m}]$
The larger the absolute value of the cross correlation, the stronger the correlation between the two series
[Figure: two example plots; white noise: no correlation; output of IIR filter: some correlation]
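The contrast in the figure can be approximated with a sample estimate of the expectation in the definition. This sketch makes several assumptions: it correlates each signal with itself, it uses a first-order IIR filter with a hypothetical coefficient, and it computes the estimate by hand instead of calling `xcorr`.

```matlab
n = 20000;
w = randn(n, 1);                         % white noise
y = filter(1, [1 -0.9], w);              % output of a first-order IIR filter (assumed coefficients)

% Sample estimate of E[a_{n+m} b_n] for a lag m >= 0
R = @(a, b, m) mean(a(1+m:end) .* b(1:end-m));

m = 5;                                   % hypothetical lag
rw = R(w, w, m) / R(w, w, 0);            % normalized correlation of the noise at lag m
ry = R(y, y, m) / R(y, y, 0);            % normalized correlation of the filter output at lag m

fprintf('white noise: %.3f, IIR output: %.3f\n', rw, ry);
```

The white noise value should be close to zero, while the filter output remains clearly correlated with its own past.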