Source: folk.uio.no/erikadl/FYS4550/are/Lectures_Statistics_H16.pdf
LECTURE NOTES
FYS 4550/FYS9550 -
EXPERIMENTAL HIGH ENERGY PHYSICS
AUTUMN 2016
STATISTICS
A. STRANDLIE
NTNU AT GJØVIK
AND
UNIVERSITY OF OSLO
Statistics
• Statistics is about making inference about a statistical model, given
a set of data or measurements
– Parameters of a distribution
– Parameters describing the kinematics of a particle after a collision
• Position and momentum at some reference surface
– Parameters describing an interaction vertex (position, refined estimates
of particle momenta)
• Will consider two issues
– Parameter estimation
– Hypothesis tests and confidence intervals
Statistics
• Parameter estimation
• We want to estimate the unknown value of a parameter θ.
• An estimator is a function of the data which aims to estimate the
value of θ as closely as possible.
• General estimator properties
– Consistency
– Bias
– Efficiency
– Robustness
• A consistent estimator is an estimator whose estimate $\hat{\theta}$ converges to the true
value of θ when the amount of data increases (formally, in the limit
of an infinite amount of data).
Statistics
• The bias b of an estimator is given as

  $b = E[\hat{\theta}] - \theta$

• Since the estimator is a function of the data, it is itself a random
variable with its own distribution.
• The expectation value $E[\hat{\theta}]$ can be interpreted as the mean value of
the estimate over a very large number of hypothetical, identical
experiments.
• Obviously, unbiased (i.e. b = 0) estimators are desirable.
Statistics
• The efficiency of an estimator is the inverse of the ratio of its
variance to the minimum possible value.
• The minimum possible value is given by the Rao-Cramér-Fréchet
lower bound

  $\sigma^2_{\min} = \frac{\left(1 + \frac{\partial b}{\partial \theta}\right)^2}{I(\theta)}$

  where I(θ) is the Fisher information:

  $I(\theta) = E\left[\left(\frac{\partial}{\partial\theta}\sum_i \ln f(x_i;\theta)\right)^2\right]$
Statistics
• The sum is over all the data, which are assumed independent and to follow
the pdf f(x; θ).
• The expression of the lower bound is valid for all estimators with the same
bias function b(θ) (for unbiased estimators b(θ) vanishes).
• If the variance of the estimator happens to be equal to the Cramer-Rao-
Frechet lower bound, it is called a minimum variance lower bound estimator
or a (fully) efficient estimator.
• Different estimators of the same parameter can also be compared by
looking at the ratios of the efficiencies. One then talks about relative
efficiencies.
• Robustness is the (qualitative) degree of insensitivity of the estimator to
deviations in the assumed pdf of the data
– e.g. noise in the data not properly taken into account
– wrong data
– etc
Statistics
• Common estimators for the mean and variance (often called the
sample mean and the sample variance) are:

  $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad s^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2$

• The variances of these are:

  $V[\bar{x}] = \frac{\sigma^2}{N}, \qquad V[s^2] = \frac{1}{N}\left(m_4 - \frac{N-3}{N-1}\,\sigma^4\right)$

  where $m_4$ is the fourth central moment of the distribution.
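The sample-mean variance $V[\bar{x}] = \sigma^2/N$ can be checked with a quick toy Monte Carlo; the numbers below (N, σ, number of pseudo-experiments) are made up for illustration and are not from the lecture:

```python
import random

# Toy check of V[x_bar] = sigma^2 / N: draw many samples of size N from a
# Gaussian and compare the spread of the sample means with the prediction.
random.seed(1)
N, sigma, n_exp = 100, 2.0, 5000

means = []
for _ in range(n_exp):
    data = [random.gauss(0.0, sigma) for _ in range(N)]
    x_bar = sum(data) / N            # sample mean of one pseudo-experiment
    means.append(x_bar)

# empirical variance of the sample mean over the pseudo-experiments
mean_of_means = sum(means) / n_exp
emp_var = sum((m - mean_of_means) ** 2 for m in means) / n_exp

print(emp_var, sigma**2 / N)  # the two numbers should be close
```

With these toy values the prediction is σ²/N = 4/100 = 0.04, and the empirical variance agrees within the Monte Carlo uncertainty.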
Statistics
• For variables which obey the Gaussian distribution, this yields for
large N

  $\mathrm{std}(s) \approx \frac{\sigma}{\sqrt{2N}}$

• For Gaussian variables the sample mean is a fully efficient
estimator.
• If the different measurements used in the calculation of the sample
mean have different variances, a better estimator of the mean is the
weighted sample mean:

  $\bar{x} = \frac{\sum_i x_i/\sigma_i^2}{\sum_i 1/\sigma_i^2}$
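A minimal sketch of the weighted sample mean with weights $w_i = 1/\sigma_i^2$; the measurements and uncertainties below are made up for illustration:

```python
# Weighted sample mean for measurements with different variances,
# with weights w_i = 1 / sigma_i^2 (toy numbers).
xs     = [10.2, 9.8, 10.5]   # measurements
sigmas = [0.1, 0.2, 0.4]     # their standard deviations

weights = [1.0 / s**2 for s in sigmas]
x_bar = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
var_x_bar = 1.0 / sum(weights)   # variance of the weighted mean

print(x_bar, var_x_bar)
```

Note how the most precise measurement (σ = 0.1) dominates the result, as it should.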
Statistics
• The method of maximum likelihood:
• Assume that we have N independent measurements all obeying the
pdf f(x;θ), where θ is a parameter vector consisting of n different
parameters to be estimated.
• The maximum likelihood estimate is the value of the parameter
vector θ which maximizes the likelihood function

  $L(\boldsymbol{\theta}) = \prod_{i=1}^{N} f(x_i;\boldsymbol{\theta})$

• Since the natural logarithm is a monotonically increasing function,
ln L and L attain their maximum for the same value of θ.
Statistics
• Therefore the maximum likelihood estimate can be found by solving the likelihood equations

  $\frac{\partial \ln L}{\partial \theta_i} = 0$

  for all i = 1, …, n.
• ML estimators are asymptotically (i.e. for large amounts of data) unbiased and fully efficient
– Therefore very popular
• An estimate of the inverse of the covariance matrix of an ML estimate is

  $(\widehat{V^{-1}})_{ij} = -\frac{\partial^2 \ln L}{\partial \theta_i \partial \theta_j}$

evaluated at the estimated value of θ.
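As an illustration of the likelihood equations, here is a sketch for an exponential pdf $f(t;\tau) = (1/\tau)e^{-t/\tau}$ (my choice of example, not from the slides), where the ML estimate has a closed form: solving $\partial \ln L/\partial\tau = 0$ gives $\hat{\tau} = \bar{t}$, and the second derivative of ln L gives the variance estimate $\hat{\tau}^2/N$:

```python
import math
import random

# ML estimation sketch for f(t; tau) = (1/tau) exp(-t/tau).
# ln L = -N ln(tau) - sum(t_i)/tau, so the likelihood equation gives
# tau_hat = mean(t), and -d^2(ln L)/d(tau)^2 at tau_hat gives V = tau_hat^2/N.
random.seed(2)
tau_true, N = 2.0, 10000
data = [random.expovariate(1.0 / tau_true) for _ in range(N)]

tau_hat = sum(data) / N        # solves d(ln L)/d(tau) = 0
var_hat = tau_hat**2 / N       # inverse of -d^2(ln L)/d(tau)^2 at tau_hat

print(tau_hat, math.sqrt(var_hat))
```

The estimate comes out close to the true lifetime, with an uncertainty of roughly $\tau/\sqrt{N}$, illustrating the asymptotic efficiency claimed above.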
Statistics
• The method of least squares.
• Simplest possible example: estimating the parameters of a straight
line (intercept and tangent of inclination angle) given a set of
measurements.
[Figure: scatter of measurements with the fitted straight line]
Statistics
• Least-squares approach: minimizing the sum of squared distances S
between the line and the N measurements,

  $S = \sum_{i=1}^{N} \frac{\left(y_i - (a x_i + b)\right)^2}{\sigma_i^2}$

  with respect to the parameters of the line (i.e. a and b); here $\sigma_i^2$ is the variance of the measurement error.
• This cost function or objective function S can be written in a more
compact way by using matrix notation:

  $S = (\mathbf{y} - H\boldsymbol{\theta})^T V^{-1} (\mathbf{y} - H\boldsymbol{\theta})$
Statistics
• Here y is a vector of measurements, θ is a vector of the parameters
a and b, V is the (diagonal) covariance matrix of the measurements
(with the individual variances on the main diagonal), and H
is given by

  $H = \begin{pmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{pmatrix}$

• Taking the derivative of S with respect to θ, setting it to zero and
solving for θ yields the least-squares solution to the problem.
Statistics
• The result is:

  $\hat{\boldsymbol{\theta}} = (H^T V^{-1} H)^{-1} H^T V^{-1} \mathbf{y}$

• The covariance matrix of the estimated parameters is:

  $\mathrm{cov}(\hat{\boldsymbol{\theta}}) = (H^T V^{-1} H)^{-1}$

  and the covariance matrix of the estimated positions $\hat{\mathbf{y}} = H\hat{\boldsymbol{\theta}}$ is

  $\mathrm{cov}(\hat{\mathbf{y}}) = H (H^T V^{-1} H)^{-1} H^T$
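The closed-form solution $\hat{\boldsymbol{\theta}} = (H^T V^{-1} H)^{-1} H^T V^{-1} \mathbf{y}$ can be evaluated directly for a toy line fit (the data points and uncertainties below are made up); with two parameters everything reduces to a 2×2 system, so no matrix library is needed:

```python
# Linear least-squares line fit y = a*x + b in matrix form, written out
# for the 2x2 case: A = H^T V^-1 H, c = H^T V^-1 y, theta_hat = A^-1 c.
xs     = [0.0, 1.0, 2.0, 3.0]
ys     = [1.1, 2.9, 5.2, 7.0]        # roughly y = 2x + 1 (toy data)
sigmas = [0.1, 0.1, 0.1, 0.1]

w = [1.0 / s**2 for s in sigmas]     # diagonal entries of V^-1

# Build A (2x2) and c (2-vector) with H_i = (x_i, 1).
Axx = sum(wi * x * x for wi, x in zip(w, xs))
Ax1 = sum(wi * x for wi, x in zip(w, xs))
A11 = sum(w)
cx  = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
c1  = sum(wi * y for wi, y in zip(w, ys))

det = Axx * A11 - Ax1 * Ax1
a_hat = ( A11 * cx - Ax1 * c1) / det   # slope
b_hat = (-Ax1 * cx + Axx * c1) / det   # intercept
cov_aa, cov_bb = A11 / det, Axx / det  # diagonal of cov(theta) = A^-1

print(a_hat, b_hat, cov_aa, cov_bb)
```

For these toy numbers the fit gives exactly a = 2.0 and b = 1.05, and the parameter variances come straight from the inverse of $H^T V^{-1} H$.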
Statistics
[Figure: simulating 10000 lines; histogram of the estimated intercept. What is the true value of the intercept?]
Statistics
[Figure: simulating 10000 lines; histogram of the estimated tangent of the inclination angle. What is the true value?]
Statistics
Histograms of normalized residuals of the estimated parameters: for each fitted line and each estimated parameter, the quantity (estimated parameter − true parameter)/(standard deviation of the parameter) is entered into the histogram. If everything is OK with the fitting procedure, these histograms should have mean 0 and standard deviation 1.
[Figure: residual histograms with mean = −0.0189, std = 1.0038 and mean = 0.0157, std = 1.0011]
Statistics
• Least-squares estimation is for instance used in track fitting in high-
energy physics experiments.
• Track fitting is basically the same task as the line fit example:
estimating a set of parameters describing a particle track through a
tracking detector, given a set of measurements created by the
particle.
• In the general case the track model is not a straight line but rather a
helix (homogeneous magnetic field) or some other trajectory
obeying the equations of motion in an inhomogeneous magnetic
field.
• The principles of the fitting procedure, however, are largely the
same.
Statistics
• As long as there is a linear relationship between the parameters and the
measurements, the least-squares method is linear.
• If this relationship is a non-linear function F(θ), the problem is said to be of a
non-linear least-squares type:

  $S = (\mathbf{y} - F(\boldsymbol{\theta}))^T V^{-1} (\mathbf{y} - F(\boldsymbol{\theta}))$

• There exists no direct solution to this problem, and one has to resort to an
iterative approach (Gauss-Newton):
– Start out with an initial guess of θ, linearize the function F around the initial guess by
a Taylor expansion and solve the resulting linear least-squares problem
– Use the estimated value of θ as a new expansion point for F and repeat the step
above
– Iterate until convergence (i.e. until θ changes less than a specified value from
one iteration to the next)
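The Gauss-Newton steps above can be sketched for a one-parameter toy model; the exponential model $F_i(\theta) = e^{\theta x_i}$ and the numbers below are assumptions for illustration, not from the slides:

```python
import math

# Gauss-Newton sketch for the non-linear model F_i(theta) = exp(theta * x_i).
# Each iteration linearizes F around the current theta (Jacobian J) and
# solves the resulting one-parameter linear least-squares problem.
xs = [0.1, 0.5, 1.0, 1.5]
ys = [math.exp(0.7 * x) for x in xs]   # noise-free toy data, theta_true = 0.7
sigma2 = 0.01                          # common measurement variance

theta = 0.0                            # initial guess
for _ in range(50):
    F = [math.exp(theta * x) for x in xs]
    J = [x * f for x, f in zip(xs, F)]          # dF_i / dtheta
    num = sum(j * (y - f) / sigma2 for j, y, f in zip(J, ys, F))
    den = sum(j * j / sigma2 for j in J)
    step = num / den                   # linearized least-squares solution
    theta += step                      # new expansion point
    if abs(step) < 1e-12:              # iterate until convergence
        break

print(theta)
```

Because the toy data are noise-free, the iteration converges to the true value 0.7; with real data it would converge to the least-squares estimate instead.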
Statistics
• Relationship between maximum likelihood and least-squares:
• Consider a set of independent measurements y with mean values
F(x;θ).
• If these measurements follow a Gaussian distribution, the log-
likelihood function is basically

  $-2\ln L(\boldsymbol{\theta}) = \sum_{i=1}^{N} \frac{\left(y_i - F(x_i;\boldsymbol{\theta})\right)^2}{\sigma_i^2}$

  plus some terms which do not depend on θ.
• Maximizing the log-likelihood function is in this case equivalent to
minimizing the least-squares objective function.
Statistics
• Confidence intervals and hypothesis tests.
• Confidence intervals:
– Given a set of measurements of a parameter, calculate an interval such that one can be e.g. 95 % sure that the true value of the parameter lies within it
– Such an interval is called a 95 % confidence interval for the parameter
• Example: collect N measurements believed to come from a Gaussian distribution with unknown mean value μ and known standard deviation σ. Use the sample mean to calculate a 100(1−α) % confidence interval for μ.
• From earlier: the sample mean is an unbiased estimator of μ with standard deviation σ/sqrt(N).
• For large enough N, the quantity

  $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{N}}$

  is distributed according to the standard normal distribution (mean value 0, standard deviation 1).
Statistics
• Therefore:

  $P\left(-z_{\alpha/2} \le \frac{\bar{X}-\mu}{\sigma/\sqrt{N}} \le z_{\alpha/2}\right) = 1-\alpha$

  $P\left(-z_{\alpha/2}\,\sigma/\sqrt{N} \le \bar{X}-\mu \le z_{\alpha/2}\,\sigma/\sqrt{N}\right) = 1-\alpha$

  $P\left(-\bar{X}-z_{\alpha/2}\,\sigma/\sqrt{N} \le -\mu \le -\bar{X}+z_{\alpha/2}\,\sigma/\sqrt{N}\right) = 1-\alpha$

  $P\left(\bar{X}-z_{\alpha/2}\,\sigma/\sqrt{N} \le \mu \le \bar{X}+z_{\alpha/2}\,\sigma/\sqrt{N}\right) = 1-\alpha$

• In words, there is a probability 1−α that the true mean is in the interval

  $\left(\bar{X}-z_{\alpha/2}\,\sigma/\sqrt{N},\; \bar{X}+z_{\alpha/2}\,\sigma/\sqrt{N}\right)$

• This interval is therefore a 100(1−α) % confidence interval for μ.
• Such intervals are highly relevant in physics analysis.
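The interval $\bar{X} \pm z_{\alpha/2}\,\sigma/\sqrt{N}$ can be computed directly; a sketch with toy numbers, using the standard-normal quantile $z_{0.025} \approx 1.96$ for a 95 % interval:

```python
import math

# 95% confidence interval for the mean of Gaussian data with known sigma:
# x_bar +/- z_{alpha/2} * sigma / sqrt(N)  (toy numbers).
x_bar, sigma, N = 10.0, 2.0, 100
z = 1.96                               # standard-normal quantile for alpha = 0.05

half_width = z * sigma / math.sqrt(N)
lo, hi = x_bar - half_width, x_bar + half_width
print(lo, hi)  # (9.608, 10.392)
```

Note that the half-width shrinks like $1/\sqrt{N}$: quadrupling the number of measurements halves the interval.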
Statistics
• Hypothesis tests:
• A hypothesis is a statement about the distribution of a vector x of
data.
• Similar to the previous example:
– given a set of N measurements, test whether the measurements
come from a normal distribution with a certain expectation value μ or
not.
– define a test statistic, i.e. the quantity to be used in the evaluation of the
hypothesis. Here: the sample mean.
– define the significance level of the test, i.e. the probability that the
hypothesis will be discarded even though it is true.
– determine the critical region of the test statistic, i.e. interval(s) of values
of the test statistic which will lead to the rejection of the hypothesis
Statistics
• We then state two competing hypotheses:
– A null hypothesis, stating that the expectation value is equal to a given
value
– An alternative hypothesis, stating that the expectation value is not equal
to the given value
• Mathematically:

  $H_0: \mu = \mu_0, \qquad H_1: \mu \ne \mu_0$

• Test statistic:

  $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{N}}$
Statistics
[Figure: standard normal pdf with the two tails beyond $-z_{\alpha/2}$ and $z_{\alpha/2}$ shaded]
The probability of being in the shaded area is α; the shaded area is therefore the critical region of Z for significance level α.
Obtain a value of the test statistic from the test data by calculating the sample mean and transforming to Z. Use the actual value of Z to determine whether the null hypothesis is rejected or not.
Statistics
• Alternatively: perform the test by calculating the so-called p-value of the test statistic.
• Given the actual value of the test statistic, what is the area below the pdf for the range of values of the test statistic starting from the actual one and extending to all values further away from the value defined by the null hypothesis? This area defines the p-value.
– For the current example this corresponds to adding two integrals of the pdf of the test statistic (because this is a so-called two-sided test):
• one from minus infinity to minus the absolute value of the actual value of the test statistic
• another from the absolute value of the actual value of the test statistic to plus infinity
• For a one-sided test one would stick to one integral of the abovementioned type.
• If the p-value is less than the significance level, discard the null hypothesis; if not, don't discard it.
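The two-sided p-value recipe above can be sketched as follows (toy numbers; the standard-normal cdf is built from `math.erf`, and the two tail integrals collapse to one term by symmetry):

```python
import math

# Two-sided z-test sketch: p-value = P(|Z| >= |z_obs|) under H0,
# i.e. the two tail integrals of the standard normal pdf.
def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

x_bar, mu0, sigma, N = 0.5, 0.0, 1.0, 16   # toy numbers
z = (x_bar - mu0) / (sigma / math.sqrt(N)) # test statistic
p_value = 2.0 * (1.0 - phi(abs(z)))        # sum of the two tail areas

alpha = 0.05
reject = p_value < alpha
print(z, p_value, reject)
```

Here z = 2.0, giving a p-value of about 0.046, so at significance level 0.05 the null hypothesis is (narrowly) discarded.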
Statistics
• p-values can be used in so-called goodness-of-fit tests.
• In such tests one frequently uses a test statistic which is assumed to
be chisquare distributed
– Is a measurement in a tracking detector compatible with belonging to a
particle track defined by a set of other measurements?
– Is a histogram with a set of entries in different bins compatible with an
expected histogram (defined by an underlying assumption of the
distribution)?
– Are the residual distributions of the estimated parameters compatible with the
estimated covariance matrix of the parameters?
• If one can calculate many independent values of the test statistic,
the following procedure is often applied:
– Calculate the p-value of the test statistic each time the test statistic is
calculated
Statistics
– The p-value itself is also a random variable, and it can be shown that it
is distributed according to a uniform distribution if the test statistic
originates from the expected (chisquare) distribution.
– Create a histogram with the various p-values as entries and see
whether it looks reasonably flat
• NB! With only one calculated p-value, the null hypothesis can be
rejected but never confirmed!
• With many calculated p-values (as immediately above) the null
hypothesis can also (to a certain extent) be confirmed!
• Example: line fit (as before)
• For each fitted line, calculate the following chisquare:

  $\chi^2 = (\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})^T \left[\mathrm{cov}(\hat{\boldsymbol{\theta}})\right]^{-1} (\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})$
Statistics
• Here θ is the true value of the parameter vector.
• For each value of the chisquare, calculate the corresponding p-value
– Integral of chisquare distribution from the value of the chisquare to
infinity
• Given in tables or in standard computer programs (CERNLIB, CLHEP,
MATLAB,….)
• Fill up a histogram with the p-values and make a plot:
[Figure: histogram of p-values. Reasonably flat, seems OK.]
What we really test here is that the estimated parameters are unbiased estimates of the true parameters, distributed according to a Gaussian with a covariance matrix as obtained in the estimate!
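The flatness check above can be sketched with a toy Monte Carlo. For the two fitted line parameters the chisquare has 2 degrees of freedom, for which the upper-tail p-value happens to have the closed form $e^{-\chi^2/2}$ (this closed form and the toy sampling are my additions, not from the slides):

```python
import math
import random

# Toy version of the p-value flatness check: sample a genuine 2-dof
# chi-square (sum of two squared standard normals), convert each value to
# its upper-tail p-value exp(-chi2/2), and check the p-values look uniform.
random.seed(3)
p_values = []
for _ in range(10000):
    chi2 = random.gauss(0, 1)**2 + random.gauss(0, 1)**2  # 2-dof chi-square
    p_values.append(math.exp(-chi2 / 2.0))                # upper-tail p-value

mean_p = sum(p_values) / len(p_values)
print(mean_p)  # a flat (uniform) p-value histogram has mean near 0.5
```

If the fit were biased or its covariance matrix wrong, the chisquare would no longer follow the 2-dof distribution and the p-value histogram would pile up near 0 or 1 instead of being flat.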