1 probability and statistics what is probability? what is statistics?
Post on 29-Dec-2015
328 Views
Preview:
TRANSCRIPT
1
Probability and Statistics
What is probability?What is statistics?
2
Probability and StatisticsProbability
Formally defined using a set of axioms Seeks to determine the likelihood that a given
event or observation or measurement will or has happened
What is the probability of throwing a 7 using two dice?
Statistics Used to analyze the frequency of past events Uses a given sample of data to assess a
probabilistic model’s validity or determine values of its parameters
After observing several throws of two dice, can I determine whether or not they are loaded
Also depends on what we mean by probability
3
Probability and StatisticsWe perform an experiment to collect a
number of top quarks How do we extract the best value for its
mass? What is the uncertainty of our best value? Is our experiment internally consistent? Is this value consistent with a given
theory, which itself may contain uncertainties?
Is this value consistent with other measurements of the top quark mass?
4
Probability and StatisticsCDF “discovery” announced
4/11/2011
5
Probability and Statistics
6
Probability and Statistics
Pentaquark search - how can this occur?
2003 – 6.8 effect 2005 – no effect
7
ProbabilityLet the sample space S be the space of
all possible outcomes of an experimentLet x be a possible outcome
Then P(x found in [x,x+dx]) = f(x)dx f(x) is called the probability density function
(pdf) It may be called f(x;) since the pdf could
depend on one or more parameters Often we will want to determine from a set of
measurements Of course x must be somewhere so
1
dxxf
8
ProbabilityDefinitions of mean and variance are
given in terms of expectation values
2222
xExExExV
dxxxfxE
9
ProbabilityDefinitions of covariance and correlation
coefficient
0,cov so and
then and
t then independeny x,if
,cov
,cov
yx
dxdyx,yxyfx,yE
yfxfx,yf
yx
xyEyxEyxV
yx
yx
yxxy
yxyxxy
10
ProbabilityError propagation
n
jiij
xji
ii
x
n
i i
jiij
n
Vx
y
x
yyxyE
yxyE
xx
yyxy
xxVxE
xxxxxy
1,
22
1
21
find we
expanding TSthen
,cov and and
,..., ables with variConsider
11
ProbabilityThis gives the familiar error propagation
formulas for sums (or differences) and products (or ratio)
21
2122
22
21
21
2
2
21
2122
21
2
21
222
,cov2
for and
,cov2
for find we
Using
xx
xx
xxy
xxy
xx
xxy
yEyEyV
y
y
y
12
Uniform DistributionLet
What is the position resolution of a silicon or multiwire proportional chamber with detection elements of space x?
2
2
2
12
11
2
1
2
1
otherwise
for
0
1),;(
dxxxV
xExExV
dxx
dxxxfxE
βxαxf
13
Binomial DistributionConsider N independent experiments
(Bernoulli trials)Let the outcome of each be pass or failLet the probability of pass = p
nNn
nNn
ppnNn
NpNnf
N
nNn
N
pp
1 !!
!),;(
timeaat n themgrouping objects, habledistinguis
for spermuation !!
! are But there
1 successesn ofy Probabilit
14
PermutationsQuick review
n
N
nN-n
N
N-n
n
n
NN
!!
! is spermuation unique ofnumber
toleads nspermutatio irrelevant for these accounting Thus
typesecond theof elements remaining for the Ditto
situation same the tolead elements theof !
so ishableindistingu are first type theof elements But
habledistinguiselement each considers This
! elementsfor nspermutatio ofNumber
15
Binomial DistributionFor the mean and variance we
obtain (using small tricks)
And note with the binomial theorem that
pNpnEnEnV
NppNnnfnEN
n
1
,;
22
0
N
n
NnNnN
n
ppppn
NpNnf
00
111),;(
16
Binomial Distribution
Binomial pdf
17
Binomial Distribution
Examples Coin flip (p=1/2) Dice throw (p=1/6) Branching ratio of nuclear and
particle decays (p=Br) Detector or trigger efficiencies (pass
or not pass) Blood group B or not blood group B
18
Binomial Distribution It’s baseball season! What is the
probability of a 0.300 hitter getting 4 hits in one game?
hitter 0.300 afor hits 0.841.2Expect
84.07.03.04
2.13.04
0081.07.03.0!0!4
!43.0,4;4
3.0,4
04
nV
nE
f
pN
19
Poisson DistributionConsider when
vnV
vnE
en
vvnf
vNpnE
p
N
vn
2
finds one and!
;
is pdf PoissonThe
0
20
Poisson Distribution
pNn
evep
n
NpNnf
eepN
Npp
nNnNpnNpp
NnN
N
ppnNn
NpNnf
vnvn
n
vNpnN
nN
n
nNn
small and largefor !!
,;
...!2
11
...!2
111
largeN for !
!
1!!
!,;
22
2
21
Poisson DistributionPoisson pdf
22
Poisson DistributionExamples
Particles detected from radioactive decays Sum of two Poisson processes is a Poisson process
Particles detected from scattering of a beam on target with cross section
Cosmic rays observed in a time interval t Number of entries in a histogram bin when
data is accumulated over a fixed time interval
Number of Prussian soldiers kicked to death by horses
Infant mortality QC/failure rate predictions
23
Poisson DistributionLet
271.0!2
2)2;2(
271.0!1
2)2;1(
135.0!0
2)2;0(
and
/2 Then
10210
11Let
atoms 10Let
22
21
20
2012
20
eP
eP
eP
sNpv
/s~yτ
p
N~
24
Gaussian DistributionGaussian distribution
Important because of the central limit theorem For n independent variables x1,x2,…,xN that
are distributed according to any pdf, then the sum y=∑xi will have a pdf that approaches a Gaussian for large N
Examples are almost any measurement error (energy resolution, position resolution, …)
ii
ii
yV
yE
25
Gaussian DistributionThe familiar Gaussian pdf is
2
2
2
2 2exp
2
1,;
xV
xE
xxf
26
Gaussian Distribution Some useful properties of the Gaussian
distribution are P(x in range ±) = 0.683 P(x in range ±2) = 0.9555 P(x in range ±3) = 0.9973 P(x outside range ±3) = 0.0027 P(x outside range ±5) = 5.7x10-7
P(x in range ±0.6745) = 0.5
27
2 DistributionChi-square distribution
d.o.f. withondistributi thefollows
,,t independen
for that is pdf thisof usefulness The
2
zE
freedom of degrees ofnumber theis ,...2,1
02/2
1;
2
12
2
2
2/12/2/
n
xz
xn
nzV
n
n
zezn
nzf
n
i i
ii
iii
znn
28
2 Distribution
29
Probability
30
ProbabilityProbability can be defined in terms of
Kolmogorov axioms The probability is a real-valued function
defined on subsets A,B,… in sample space S
This means the probability is a measure in which the measure of the entire sample space is 1
1
,0 If
0 in subset every For
SP
BPAPBAPBA
AS, PA
31
ProbabilityWe further define the conditional
probability P(A|B) read P(A) given B
Bayes’ theorem
BP
BAPBAP
|
BP
APABPBAP
ABPBAP
||
Using
32
ProbabilityFor disjoint Ai
Usually one treats the Ai as outcomes of a repeatable experiment
iii
iii
APABP
APABPBAP
then
APABPBP
|
||
|
33
ProbabilityUsually one treats the Ai as outcomes of
a repeatable experiment Then P(A) is usually assigned a value equal to
the limiting frequency of occurrence of A
Called frequentist statisticsBut Ai could also be interpreted as
hypotheses, each of which is true or false Then P(A) represents the degree of belief that
hypothesis A is true Called Bayesian statistics
n
AAP
n lim
34
Bayes’ TheoremSuppose in the general population
P(disease) = 0.001 P(no disease) = 0.999
Suppose there is a test to check for the disease P(+, disease) = 0.98 P(-, disease) = 0.02
But also P(+, no disease) = 0.03 P(-, no disease) = 0.97
You are tested for the disease and it comes back +. Should you be worried?
35
Bayes’ TheoremApply Bayes’ theorem
3.2% of people testing positive have the disease
Your degree of belief about having the disease is 3.2%
032.0999.003.0001.098.0
001.098.0),(
) () ,()(),(
)(),(),(
diseaseP
diseasenoPdiseasenoPdiseasePdiseaseP
diseasePdiseasePdiseaseP
36
Bayes’ Theorem Is athlete A guilty of drug doping?Assume a population of athletes in this sport
P(drug) = 0.005 P(no drug) = 0.995
Suppose there is a test to check for the drug P(+, drug) = 0.99 P(-, drug) = 0.01
But also P(+, no drug) = 0.004 P(-, no drug) = 0.996
The athlete is tested positive. Is he/she involved in drug doping?
37
Bayes’ TheoremApply Bayes’ theorem
???
45.0005.099.0995.0004.0
995.0004.0), (
)(),() () ,(
) () ,(), (
and
55.0995.0004.0005.099.0
005.099.0),(
) () ,()(),(
)(),(),(
drugnoP
drugPdrugPdrugnoPdrugnoP
drugnoPdrugnoPdrugnoP
drugP
drugnoPdrugnoPdrugPdrugP
drugPdrugPdrugP
38
Binomial DistributionCalculating efficiencies
Usually use instead of p
efficiency theonerror theis This
/111
estimateour use can but we trueknow thet don' We
12
NnnNNN
NnV
N
n
N
nENnE
39
Binomial DistributionBut there is a problem
If n=0, (’) = 0 If n=N, (’) = 0
Actually we went wrong in assuming the best estimate for is n/N We should really have used the most
probable value of given n and NA proper treatment uses Bayes’
theorem but lucky for us (in HEP) the solution is implemented in ROOT h_num->Sumw2() h_den->Sumw2() h_eff->Divide(h_num,h_den,1.0,1.0,”B”)
top related