error analysis lectures


TRANSCRIPT

  • Slide 1/47

    Introduction to Error Analysis, Part 1: The Basics

    Andrei Gritsan, based on lectures by Petar Maksimovic

    February 1, 2010

    Overview

    Definitions
    Reporting results and rounding
    Accuracy vs. precision; systematic vs. statistical errors
    Parent distribution
    Mean and standard deviation
    Gaussian probability distribution
    What a 1σ error means

  • Slide 2/47

    Definitions
    x_true: true value of the quantity x we measure
    x_i: observed value
    error on x_i: difference between the observed and true value, x_i − x_true
    All measurements have errors; the true value is unattainable
    seek the best estimate of the true value
    seek the best estimate of the true error on x_i

  • Slide 3/47

    One view on reporting measurements (from the book)
    keep only one digit of precision on the error; everything else is noise
    Example: 410.5163819 → 4 × 10²
    exception: when the first digit is 1, keep two
    Example: 17538 → 1.7 × 10⁴
    round off the final value of the measurement to the significant digits of the error
    Example: 87654 ± 345 kg → (876 ± 3) × 10² kg
    rounding rules:
    6 and above: round up
    4 and below: round down
    5: round to the nearest even digit (keep the preceding digit if it is even, round up if it is odd)
    (reason: reduces systematic errors in rounding)

  • Slide 4/47

    A different view on rounding
    From the Particle Data Group (authority in particle physics):
    http://pdg.lbl.gov/2009/reviews/rpp2009-rev-rpp-intro.pdf
    if the three highest-order digits of the error lie between 100 and 354, round to two significant digits
    Example: 87654 ± 345 kg → (876.5 ± 3.5) × 10² kg
    if they lie between 355 and 949, round to one significant digit
    Example: 87654 ± 365 kg → (877 ± 4) × 10² kg
    if they lie between 950 and 999, round up to 1000 and keep two significant digits
    Example: 87654 ± 950 kg → (87.7 ± 1.0) × 10³ kg

    Bottom line:
    Use a consistent approach to rounding which is sound and accepted in the field of study; use common sense after all
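A minimal Python sketch of the PDG convention (the helper names are mine, not from the slides; half-way cases are rounded away from zero to match the examples above):

```python
import math

def half_up(x):
    """Round half away from zero, so 34.5 -> 35 (matches the slide's examples)."""
    return math.floor(x + 0.5)

def pdg_round(value, error):
    """Round (value, error) using the PDG rule driven by the three highest-order digits of the error."""
    exponent = math.floor(math.log10(error))        # decimal position of the error's leading digit
    top3 = half_up(error / 10**(exponent - 2))      # three highest-order digits, 100..999
    if top3 >= 950:                                  # 950-999: error becomes 1000,
        top3, exponent = 100, exponent + 1           # i.e. its leading digit moves up one place
    keep = 2 if top3 <= 354 else 1                   # 100-354 -> two digits, 355-949 -> one digit
    scale = 10.0 ** (exponent - keep + 1)            # size of the last digit that is kept
    return half_up(value / scale) * scale, half_up(error / scale) * scale

print(pdg_round(87654, 345))   # (87650.0, 350.0)  ->  (876.5 ± 3.5) x 10^2
print(pdg_round(87654, 365))   # (87700.0, 400.0)  ->  (877 ± 4) x 10^2
print(pdg_round(87654, 950))   # (87700.0, 1000.0) ->  (87.7 ± 1.0) x 10^3
```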

  • Slide 5/47

    Accuracy vs. precision
    Accuracy: how close to the true value
    Precision: how well the result is determined (regardless of the true value); a measure of reproducibility

    Example: true value μ = 30
    x = 23 ± 2: precise, but inaccurate; uncorrected biases (large systematic error)
    x = 28 ± 7: accurate, but imprecise; subsequent measurements will scatter around μ = 30 but cover the true value in most cases (large statistical (random) error)

    an experiment should be both accurate and precise

  • Slide 6/47

    Statistical vs. systematic errors
    Statistical (random) errors:
    describe by how much subsequent measurements scatter around the common average value
    if limited by instrumental error, use a better apparatus
    if limited by statistical fluctuations, make more measurements

    Systematic errors:
    all measurements biased in a common way
    harder to detect: faulty calibrations, wrong model, bias by observer
    also hard to determine (no unique recipe); estimated from analysis of experimental conditions and techniques
    may be correlated

  • Slide 7/47

    Parent distribution (assume no systematic errors for now)
    parent distribution: the probability distribution of results if the number of measurements N → ∞
    however, we make only a limited number of measurements: we observe only a sample of the parent dist., a sample distribution
    the prob. distribution of our measurements only approaches the parent dist. as N → ∞
    use the observed distribution to infer the parameters of the parent distribution, e.g., the sample mean → the true value μ as N → ∞

    Notation
    Greek: parameters of the parent distribution
    Roman: experimental estimates of the params of the parent dist.

  • Slide 8/47

    Mean, median, mode
    Mean of the experimental (sample) dist:
    $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$
    ... of the parent dist:
    $\mu = \lim_{N\to\infty} \frac{1}{N}\sum x_i$
    mean = centroid = average
    Median: splits the sample into two equal parts
    Mode: most likely value (highest probability density)

  • Slide 9/47

    Variance
    Deviation: $d_i \equiv x_i - \mu$, for a single measurement
    Average deviation: $\langle x_i - \mu \rangle = 0$ by definition; one could use $\langle |x_i - \mu| \rangle$, but absolute values are hard to deal with analytically
    Variance: instead, use the mean of the deviations squared:
    $\sigma^2 \equiv \langle (x_i - \mu)^2 \rangle = \langle x^2 \rangle - \mu^2$
    $\sigma^2 = \lim_{N\to\infty} \frac{1}{N}\sum x_i^2 \;-\; \mu^2$
    (mean of the squares minus the square of the mean)

  • Slide 10/47

    Standard deviation
    Standard deviation: root mean square of the deviations:
    $\sigma = \sqrt{\langle x^2 \rangle - \mu^2}$
    associated with the 2nd moment of the $x_i$ distribution
    Sample variance: replace $\mu$ by $\bar{x}$:
    $s^2 \equiv \frac{1}{N-1}\sum (x_i - \bar{x})^2$
    N − 1 instead of N because $\bar{x}$ is obtained from the same data sample and not independently
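A quick numerical illustration with NumPy (the data values are made up); `ddof=1` gives the N − 1 denominator of the sample variance:

```python
import numpy as np

x = np.array([23.1, 24.8, 22.9, 25.3, 23.7, 24.1])   # made-up measurements

mean = x.mean()          # sample mean, best estimate of the true value
s = x.std(ddof=1)        # sample standard deviation, with N - 1 in the denominator
print(mean, s)
```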

  • Slide 11/47

    So what are we after?
    We want μ. The best estimate of μ is the sample mean: μ ≈ x̄
    The best estimate of the error on each x_i (i.e., of σ) is the square root of the sample variance: σ ≈ s = √(s²)

    Weighted averages
    P(x_i): a discrete probability distribution
    replace (1/N) Σ x_i with Σ P(x_i) x_i and (1/N) Σ x_i² with Σ P(x_i) x_i²; by definition, the formulae for μ and σ² are otherwise unchanged

  • Slide 12/47

    Gaussian probability distribution
    unquestionably the most useful in statistical analysis
    a limiting case of the Binomial and Poisson distributions (which are more fundamental; see next week)
    seems to describe the distributions of random observations for a large number of physical measurements
    so pervasive that all results of measurements are always classified as gaussian or non-gaussian (even on Wall Street)

  • Slide 13/47

    Meet the Gaussian
    probability density function: random variable x, parameters center μ and width σ:
    $p_G(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$
    Differential probability: the probability to observe a value in [x, x + dx] is
    $dP_G(x;\mu,\sigma) = p_G(x;\mu,\sigma)\,dx$

  • Slide 14/47

    Standard Gaussian distribution: replace (x − μ)/σ with a new variable z:
    $p_G(z)\,dz = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right) dz$
    we get a Gaussian centered at 0 with a width of 1. All computers calculate the Standard Gaussian first, and then stretch and shift it to make $p_G(x;\mu,\sigma)$

    Mean and standard deviation: by straight application of the definitions:
    mean = μ (the center), standard deviation = σ (the width)
    This is what makes the Gaussian so convenient!

  • Slide 15/47

    Interpretation of Gaussian errors
    we measured x = x0 ± σ0; what does that tell us?
    the Standard Gaussian covers 0.683 of its area between −1.0 and +1.0
    the true value of x is contained in the interval [x0 − σ0, x0 + σ0] 68.3% of the time!
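That coverage number is easy to verify with SciPy (a sketch; after the z transformation any Gaussian gives the same result):

```python
from scipy.stats import norm

# area of the standard Gaussian between z = -1 and z = +1
print(norm.cdf(1.0) - norm.cdf(-1.0))    # ~0.6827, the familiar 68.3%

# the same idea for 2 and 3 sigma intervals
print(norm.cdf(2.0) - norm.cdf(-2.0))    # ~0.9545
print(norm.cdf(3.0) - norm.cdf(-3.0))    # ~0.9973
```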

  • Slide 16/47

    The Gaussian distribution coverage

  • Slide 17/47

    Introduction to Error Analysis, Part 2: Fitting

    Overview

    principle of Maximum Likelihood
    minimizing χ²
    linear regression
    fitting of an arbitrary curve

  • Slide 18/47

    Likelihood
    observed N data points from a parent population; assume a Gaussian parent distribution (mean μ, std. deviation σ)
    probability to observe x_i, given the true μ, σ:
    $P_i(x_i|\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x_i-\mu}{\sigma}\right)^2\right]$
    the probability to have measured μ in this single measurement, given the observed x_i and σ_i, is called the likelihood:
    $P_i(\mu|x_i,\sigma_i) = \frac{1}{\sigma_i\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x_i-\mu}{\sigma_i}\right)^2\right]$
    for N observations, the total likelihood is
    $\mathcal{L} \equiv P(\mu) = \prod_{i=1}^{N} P_i(\mu)$

  • Slide 19/47

    Principle of Maximum Likelihood
    maximizing P(μ) gives the best estimate of μ
    (the most likely population from which the data might have come is assumed to be the correct one)
    for Gaussian individual probability distributions (σ_i = const = σ):
    $\mathcal{L} = P(\mu) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{N} \exp\left[-\frac{1}{2}\sum\left(\frac{x_i-\mu}{\sigma}\right)^2\right]$
    maximizing the likelihood = minimizing the argument of the exponential:
    $\chi^2 \equiv \sum\left(\frac{x_i-\mu}{\sigma}\right)^2$

  • Slide 20/47

    Example: calculating the mean; cross-checking...
    $\frac{d\chi^2}{d\mu} = \frac{d}{d\mu}\sum\left(\frac{x_i-\mu}{\sigma}\right)^2 = 0$
    the derivative is linear:
    $\sum (x_i - \mu) = 0 \;\Rightarrow\; \mu = \bar{x} = \frac{1}{N}\sum x_i$
    The mean really is the best estimate of the measured quantity.
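A small numerical cross-check of this result (the data and σ below are made up; scipy.optimize minimizes the χ² defined above):

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([10.2, 9.7, 10.5, 9.9, 10.1])   # made-up measurements with a common sigma
sigma = 0.3

chi2 = lambda mu: np.sum(((x - mu) / sigma) ** 2)

best = minimize_scalar(chi2)
print(best.x, x.mean())   # the chi^2-minimizing mu coincides with the sample mean
```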

  • Slide 21/47

    Linear regression
    simplest case: linear functional dependence
    measurements y_i, model (prediction) y = f(x) = a + bx
    at each point, y(x_i) = a + b x_i (special case: f(x_i) = const = a = μ)
    minimize
    $\chi^2(a,b) = \sum\left(\frac{y_i - f(x_i)}{\sigma_i}\right)^2$
    conditions for a minimum in the 2-dim parameter space:
    $\frac{\partial}{\partial a}\chi^2(a,b) = 0, \qquad \frac{\partial}{\partial b}\chi^2(a,b) = 0$
    can be solved analytically, but don't do it in real life (e.g. see p. 105 in Bevington)

  • Slide 22/47

    Familiar example from day one: a linear fit
    A program (e.g. ROOT) will do the minimization of χ² for you
    [Figure: "example analysis" scatter plot, x label [units] vs. y label [units], with the fitted line through the points]
    The program will give you the answer for a and b

  • Slide 23/47

    Fitting with an arbitrary curve
    a set of measurement pairs (x_i, y_i ± σ_i)
    (note: no errors on x_i!)
    the theoretical model (prediction) may depend on several parameters {a_i} and doesn't have to be linear: y = f(x; a0, a1, a2, ...)
    identical approach: minimize the total χ²
    $\chi^2(\{a_i\}) = \sum\left(\frac{y_i - f(x_i; a_0, a_1, a_2, \ldots)}{\sigma_i}\right)^2$
    the minimization proceeds numerically
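A sketch of such a numerical fit with SciPy's curve_fit (the data are made up; `absolute_sigma=True` treats the σ_i as absolute errors, and the parameter errors come from the covariance matrix):

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    """Straight line y = a + b*x; any parametric f(x; a0, a1, ...) works the same way."""
    return a + b * x

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])          # made-up measurements
sy = np.full_like(y, 0.2)                         # errors on y only (none on x)

popt, pcov = curve_fit(model, x, y, sigma=sy, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))                     # one-sigma parameter errors

print(popt, perr)   # best-fit (a, b) and their uncertainties
```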

  • Slide 24/47

    Fitting data points with errors on both x and y
    x_i ± σ_{x_i}, y_i ± σ_{y_i}
    Each term in the χ² sum gets a correction from the σ_{x_i} contribution:
    $\left(\frac{y_i - f(x_i)}{\sigma_{y_i}}\right)^2 \;\to\; \frac{\left(y_i - f(x_i)\right)^2}{\sigma_{y_i}^2 + \left(\frac{f(x_i+\sigma_{x_i}) - f(x_i-\sigma_{x_i})}{2}\right)^2}$
    [Figure: sketch of y(x) showing how the error on x, via f(x ± σ_x), contributes to the y χ²]
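A sketch of one such χ² term in Python (the helper name is mine; the spread of f over ±σ_x plays the role of the x-error contribution in the formula above):

```python
def chi2_term(x, y, sx, sy, f):
    """One term of the chi^2 sum for a point with errors on both x and y."""
    x_spread = (f(x + sx) - f(x - sx)) / 2.0          # effect of sigma_x propagated through f
    return (y - f(x)) ** 2 / (sy ** 2 + x_spread ** 2)

# example with a straight line y = 1 + 2x
print(chi2_term(3.0, 7.5, sx=0.1, sy=0.3, f=lambda x: 1.0 + 2.0 * x))
```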

  • Slide 25/47

    Behavior of the χ² function near the minimum
    when N is large, χ²(a0, a1, a2, ...) becomes quadratic in each parameter near the minimum:
    $\chi^2 = \frac{(a_j - a_j')^2}{\sigma_j^2} + C$
    known as the parabolic approximation
    C tells us about the goodness of the overall fit (a function of all uncertainties and of the other {a_k}, k ≠ j)
    a_j = a_j' ± σ_j corresponds to Δχ² = 1: valid in all cases!
    the parabolic error is the curvature at the minimum:
    $\frac{\partial^2 \chi^2}{\partial a_j^2} = \frac{2}{\sigma_j^2}$

  • Slide 26/47

    χ² shapes near the minimum: examples
    [Figure: two example χ²(a) curves, one labeled "better errors, worse fit" and one labeled "asymmetric errors, better fit"]

  • Slide 27/47

    Two methods for obtaining the error
    1. from the curvature:
    $\sigma_j^2 = 2\left(\frac{\partial^2\chi^2}{\partial a_j^2}\right)^{-1}$
    2. scan each parameter around the minimum, while the others are fixed, until Δχ² = 1 is reached

    method #1 is much faster to calculate
    method #2 is more generic and works even when the shape of χ² near the minimum is not exactly parabolic
    the scan of Δχ² = 1 defines a so-called one-sigma contour. It contains the truth with 68.3% probability (assuming Gaussian errors)
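A sketch of method #2 for a one-parameter model (made-up data; brentq finds where Δχ² crosses 1 on either side of the minimum):

```python
import numpy as np
from scipy.optimize import brentq, minimize_scalar

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])   # made-up measurements
sy = 0.2

def chi2(b):
    """chi^2 for the one-parameter model y = b*x (any other parameters held fixed)."""
    return np.sum(((y - b * x) / sy) ** 2)

fit = minimize_scalar(chi2)
b0, chi2_min = fit.x, fit.fun

delta = lambda b: chi2(b) - chi2_min - 1.0     # zero where delta chi^2 = 1
b_lo = brentq(delta, b0 - 1.0, b0)             # crossing below the minimum
b_hi = brentq(delta, b0, b0 + 1.0)             # crossing above the minimum
print(b0, b0 - b_lo, b_hi - b0)                # central value and its lower/upper errors
```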

  • Slide 28/47

    What to remember
    In the end the fit will be done for you by the program
    you supply the data, e.g. (x_i, y_i)
    and the fit model, e.g. y = f(x; a1, a2, ...)
    the program returns a1 = â1 ± σ1, a2 = â2 ± σ2, ... and the plot with the model line through the points
    You need to understand what is done
    In more complex cases you may need to go deep into the code

  • Slide 29/47

    Introduction to Error Analysis, Part 3: Combining measurements

    Overview

    propagation of errors
    covariance
    weighted average and its error
    error on the sample mean and sample standard deviation

  • Slide 30/47

    Propagation of Errors
    x is a known function of u, v, ...: x = f(u, v, ...)
    assume that the most probable value for x is $\bar{x} = f(\bar{u}, \bar{v}, \ldots)$
    $\bar{x}$ is the mean of the $x_i = f(u_i, v_i, \ldots)$
    by the definition of variance:
    $\sigma_x^2 = \lim_{N\to\infty} \frac{1}{N}\sum (x_i - \bar{x})^2$
    expand $(x_i - \bar{x})$ in a Taylor series:
    $x_i - \bar{x} \approx (u_i - \bar{u})\frac{\partial x}{\partial u} + (v_i - \bar{v})\frac{\partial x}{\partial v} + \cdots$

  • Slide 31/47

    Variance of x
    $\sigma_x^2 \approx \lim_{N\to\infty} \frac{1}{N}\sum \left[(u_i - \bar{u})\frac{\partial x}{\partial u} + (v_i - \bar{v})\frac{\partial x}{\partial v} + \cdots\right]^2$
    $\approx \lim_{N\to\infty} \frac{1}{N}\sum \left[(u_i - \bar{u})^2\left(\frac{\partial x}{\partial u}\right)^2 + (v_i - \bar{v})^2\left(\frac{\partial x}{\partial v}\right)^2 + 2(u_i - \bar{u})(v_i - \bar{v})\frac{\partial x}{\partial u}\frac{\partial x}{\partial v} + \cdots\right]$
    $= \sigma_u^2\left(\frac{\partial x}{\partial u}\right)^2 + \sigma_v^2\left(\frac{\partial x}{\partial v}\right)^2 + 2\sigma_{uv}^2\,\frac{\partial x}{\partial u}\frac{\partial x}{\partial v} + \cdots$

    This is the error propagation equation.
    $\sigma_{uv}^2$ is the COvariance. It describes the correlation between the errors on u and v.
    For uncorrelated errors, $\sigma_{uv}^2 \to 0$.

  • Slide 32/47

    Examples
    x = u + a, where a = const. Thus ∂x/∂u = 1 and
    $\sigma_x = \sigma_u$
    x = au + bv, where a, b = const:
    $\sigma_x^2 = a^2\sigma_u^2 + b^2\sigma_v^2 + 2ab\,\sigma_{uv}^2$
    the correlation can be negative, i.e. $\sigma_{uv}^2 < 0$
    if an error on u is counterbalanced by a proportional error on v, σ_x can get very small!

  • Slide 33/47

    More examples
    x = auv:
    $\frac{\partial x}{\partial u} = av, \qquad \frac{\partial x}{\partial v} = au$
    $\sigma_x^2 = (av\,\sigma_u)^2 + (au\,\sigma_v)^2 + 2a^2uv\,\sigma_{uv}^2$
    $\frac{\sigma_x^2}{x^2} = \frac{\sigma_u^2}{u^2} + \frac{\sigma_v^2}{v^2} + 2\,\frac{\sigma_{uv}^2}{uv}$
    x = au/v:
    $\frac{\sigma_x^2}{x^2} = \frac{\sigma_u^2}{u^2} + \frac{\sigma_v^2}{v^2} - 2\,\frac{\sigma_{uv}^2}{uv}$
    etc., etc.
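A quick Monte Carlo cross-check of the x = auv case with uncorrelated errors (all numbers made up):

```python
import numpy as np

rng = np.random.default_rng(1)
a, u, v = 2.0, 10.0, 5.0
su, sv = 0.2, 0.3                       # uncorrelated errors on u and v

us = rng.normal(u, su, 1_000_000)       # simulate many measurements of u and v
vs = rng.normal(v, sv, 1_000_000)
xs = a * us * vs

rel_mc = xs.std() / xs.mean()           # observed relative error on x
rel_prop = np.hypot(su / u, sv / v)     # propagated: sqrt((su/u)^2 + (sv/v)^2)
print(rel_mc, rel_prop)                 # the two agree closely
```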

  • Slide 34/47

    Weighted average
    From part #2: calculation of the mean μ by minimizing
    $\chi^2 = \sum\left(\frac{x_i - \mu}{\sigma_i}\right)^2$
    minimum at dχ²/dμ = 0, but now σ_i ≠ const:
    $0 = \sum \frac{x_i - \mu}{\sigma_i^2} = \sum \frac{x_i}{\sigma_i^2} - \mu \sum \frac{1}{\sigma_i^2}$
    the so-called weighted average is
    $\mu = \frac{\sum x_i/\sigma_i^2}{\sum 1/\sigma_i^2}$
    each measurement is weighted by 1/σ_i²!

  • Slide 35/47

    Error on the weighted average
    N points contribute to the weighted average; straight application of the error propagation equation:
    $\sigma_\mu^2 = \sum_i \sigma_i^2 \left(\frac{\partial \mu}{\partial x_i}\right)^2$
    $\frac{\partial \mu}{\partial x_i} = \frac{\partial}{\partial x_i}\,\frac{\sum_j x_j/\sigma_j^2}{\sum_k 1/\sigma_k^2} = \frac{1/\sigma_i^2}{\sum_k 1/\sigma_k^2}$
    putting both together:
    $\sigma_\mu^2 = \sum_i \sigma_i^2 \left[\frac{1/\sigma_i^2}{\sum_k 1/\sigma_k^2}\right]^2 = \frac{1}{\sum_k 1/\sigma_k^2}$
    i.e. $\frac{1}{\sigma_\mu^2} = \sum_k \frac{1}{\sigma_k^2}$

  • Slide 36/47

    Example of a weighted average
    x1 = 25.0 ± 1.0, x2 = 20.0 ± 5.0
    error:
    $\sigma^2 = \frac{1}{1/1^2 + 1/5^2} = \frac{25}{26} \approx 0.96, \qquad \sigma \approx 1.0$
    weighted average:
    $\bar{x} = \sigma^2\left(\frac{25}{1^2} + \frac{20}{5^2}\right) = \frac{25}{26}\cdot\frac{25\cdot 25 + 20\cdot 1}{25} = 24.8$
    result: x = 24.8 ± 1.0
    moral: x2 practically doesn't matter!
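The same numbers in a few lines of NumPy (a generic helper, not from the slides; it reproduces 24.8 ± 1.0):

```python
import numpy as np

def weighted_average(values, errors):
    """Weighted average with weights 1/sigma_i^2 and error 1/sqrt(sum of weights)."""
    w = 1.0 / np.asarray(errors, dtype=float) ** 2
    mean = np.sum(w * np.asarray(values)) / np.sum(w)
    return mean, 1.0 / np.sqrt(np.sum(w))

print(weighted_average([25.0, 20.0], [1.0, 5.0]))   # -> (24.81, 0.98), i.e. 24.8 ± 1.0
```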

  • Slide 37/47

    Error on the mean
    N measurements from the same parent population (μ, σ)
    from part #1: the sample mean and sample standard deviation are the best estimators of the parent population
    but: more measurements still give the same σ: our knowledge of the shape of the parent population improves, and thus of the original true error σ on each point
    but how well do we know the true value (i.e. μ)?
    if the N points come from the same population with σ:
    $\frac{1}{\sigma_\mu^2} = \sum \frac{1}{\sigma^2} = \frac{N}{\sigma^2} \;\Rightarrow\; \sigma_\mu = \frac{\sigma}{\sqrt{N}} \approx \frac{s}{\sqrt{N}}$
    Standard deviation of the mean, or standard error.
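In code (same made-up data as in the earlier NumPy sketch):

```python
import numpy as np

x = np.array([23.1, 24.8, 22.9, 25.3, 23.7, 24.1])   # made-up measurements

s = x.std(ddof=1)             # best estimate of sigma for a single measurement
sem = s / np.sqrt(len(x))     # standard error: the uncertainty on the mean itself
print(x.mean(), sem)
```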

  • Slide 38/47

    Example: Lightness vs. Lycopene content, scatter plot
    [Figure: scatter plot of Lycopene content vs. Lightness]

  • Slide 39/47

    Example: Lightness vs. Lycopene content: RMS as Error
    [Figure: Lycopene content vs. Lightness with the RMS (spread option) used as error bars]
    The points don't scatter enough: the error bars are too large!

  • Slide 40/47

    Example: Lightness vs. Lycopene content: Error on Mean
    [Figure: Lycopene content vs. Lightness with the error on the mean used as error bars]
    This looks much better!

  • Slide 41/47

    Introduction to Error Analysis, Part 4: dealing with non-Gaussian cases

    Overview

    binomial p.d.f.
    Poisson p.d.f.

  • Slide 42/47

    Binomial probability density function
    a random process with exactly two outcomes (Bernoulli process)
    the probability for one outcome (success) is p
    probability for exactly r successes (0 ≤ r ≤ N) in N independent trials
    the order of successes and failures doesn't matter
    binomial p.d.f.:
    $f(r; N, p) = \frac{N!}{r!\,(N-r)!}\, p^r (1-p)^{N-r}$
    mean: Np        variance: Np(1 − p)
    if r and s are binomially distributed with (N_r, p) and (N_s, p), then t = r + s is distributed with (N_r + N_s, p)
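A quick check of these properties with SciPy (the numbers are arbitrary):

```python
from scipy.stats import binom

N, p = 20, 0.3
dist = binom(N, p)

print(dist.pmf(6))                      # probability of exactly r = 6 successes in N = 20 trials
print(dist.mean(), N * p)               # mean = Np
print(dist.var(), N * p * (1 - p))      # variance = Np(1 - p)
```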

  • Slide 43/47

    Examples of binomial probability
    The binomial distribution always shows up when the data exhibit binary properties:
    an event passes or fails
    efficiency (an important experimental parameter), defined as ε = N_pass/N_total
    particles in a sample are positive or negative

  • Slide 44/47

    Poisson probability density function
    the probability of finding exactly n events in a given interval of x (e.g., space and time)
    events are independent of each other and of x
    average rate of λ per interval
    Poisson p.d.f. (λ > 0):
    $f(n; \lambda) = \frac{\lambda^n e^{-\lambda}}{n!}$
    mean: λ        variance: λ
    a limiting case of the binomial for many events with low probability:
    p → 0, N → ∞ while Np = λ
    the Poisson approaches a Gaussian for large λ
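Both limits are easy to check numerically with SciPy (λ = 5 matches the comparison on the last slide; the other numbers are arbitrary):

```python
import numpy as np
from scipy.stats import binom, norm, poisson

lam = 5.0
print(poisson(lam).mean(), poisson(lam).var())   # mean = variance = lambda

# binomial with p -> 0, N -> infinity at fixed Np approaches the Poisson
print(binom(10000, lam / 10000).pmf(3), poisson(lam).pmf(3))

# for large lambda the Poisson approaches a Gaussian with mu = lambda, sigma = sqrt(lambda)
lam_big = 100.0
print(poisson(lam_big).pmf(90), norm(lam_big, np.sqrt(lam_big)).pdf(90))
```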

  • Slide 45/47

    Examples of Poisson probability
    Shows up in counting measurements with a small number of events:
    number of watermelons with circumference c ∈ [19.5, 20.5] in.
    nuclear spectroscopy in the tails of a distribution (e.g. high channel number)
    the Rutherford experiment

  • Slide 46/47

    Fitting a histogram with Poisson-distributed content
    Poisson data require special treatment in terms of fitting!
    histogram: the i-th channel contains n_i entries
    for large n_i, P(n_i) is Gaussian, and the Poisson error is approximated by $\sqrt{n_i}$
    (WARNING: this is what ROOT uses by default!)
    Gaussian p.d.f. → symmetric errors → equal probability to fluctuate up or down
    a χ²-minimizing fit (the true value) is then equally likely to be above and below the data!

  • Slide 47/47

    Comparing Poisson and Gaussian p.d.f.
    [Figure: left panel, Prob(5|x) vs. the expected number of events x; right panel, cumulative probability vs. x]
    5 observed events
    Dashed: Gaussian at 5 with σ = √5
    Solid: Poisson with λ = 5
    Left: probability density functions (note: the Gaussian extends below x = 0!)
    Right: confidence integrals (p.d.f. integrated from 5)