error analysis statistics


Upload: tarun-gehlot

Post on 01-Nov-2014


Page 1: Error analysis   statistics

Slide 1

Error Analysis - Statistics

• Accuracy and Precision

• Individual Measurement Uncertainty

– Distribution of Data

– Means, Variance and Standard Deviation

– Confidence Interval

• Uncertainty of Quantity Calculated from Several Measurements

– Error Propagation

• Least Squares Fitting of Data

Slide 2

Accuracy and Precision

• Accuracy: Closeness of the data (sample) to the “true value.”

• Precision: Closeness of the grouping of the data (sample) around some central value.

Slide 3

Accuracy and Precision

• Inaccurate & Imprecise

• Precise but Inaccurate

[Figures: one plot per case, relative frequency vs. X value, with the true value marked on the X axis.]

Slide 4

Accuracy and Precision

• Accurate but Imprecise

• Precise and Accurate

[Figures: one plot per case, relative frequency vs. X value, with the true value marked on the X axis.]

Slide 5

Accuracy and Precision

Q: How do we quantify the concept of accuracy and precision? -- How do we characterize the error that occurred in our measurement?

Slide 6

Individual Measurement Statistics

• Take N measurements: $X_1, \ldots, X_N$

• Calculate mean and standard deviation:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} X_i \qquad S_x^2 = \frac{1}{N}\sum_{i=1}^{N}\left(X_i - \mu_x\right)^2$$

• What should we use as the “best value” and uncertainty so that we can say we are Q% confident that the true value lies in the interval $x_{best} \pm \Delta x$?

• Need to know how the data is distributed.
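As a rough illustration of the mean and standard deviation calculation, the sketch below uses a handful of made-up readings (the `readings` list is hypothetical, not from the slides). Because the true mean is normally unknown, it substitutes the sampled mean and divides by N - 1, which is the adjustment the deck itself describes on Slide 16.

```python
import math

def mean_and_sample_std(values):
    """Return (sampled mean, sampled standard deviation with N - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean = sum(values) / n
    # Sum of squared deviations about the sampled mean, one degree of freedom removed.
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return mean, math.sqrt(variance)

# Hypothetical measurements, for illustration only.
readings = [9.8, 10.1, 10.0, 9.9, 10.2]
x_bar, s_x = mean_and_sample_std(readings)
print(f"mean = {x_bar:.3f}, S_x = {s_x:.3f}")
```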

Slide 7

Population and Sample

• Parent Population: The set of all possible measurements.

• Sample: A subset of the population, i.e. the measurements actually made.

[Figure: a bag of marbles (population) and a handful of marbles taken from the bag (sample).]

Slide 8

Histogram (Sample Based)

• Histogram: A plot of the number of times a given value occurred.

• Relative Frequency: A plot of the relative number of times a given value occurred.

[Figure: Histogram. Number of measurements vs. X value (bin), bins from 30 to 80.]

[Figure: Relative Frequency Plot. Relative frequency vs. X value (bin), bins from 30 to 80.]
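The two plots differ only by a normalization, which a short sketch can make concrete. The data values, bin width and starting bin below are hypothetical choices for illustration; the function names are likewise not from the slides.

```python
from collections import Counter

def histogram(values, bin_width, start):
    """Count how many values fall in each bin; keys are the bins' lower edges."""
    counts = Counter()
    for v in values:
        index = int((v - start) // bin_width)
        counts[start + index * bin_width] += 1
    return dict(sorted(counts.items()))

def relative_frequency(counts):
    """Divide each bin count by the total number of measurements."""
    total = sum(counts.values())
    return {edge: c / total for edge, c in counts.items()}

# Hypothetical measurements, for illustration only.
data = [52, 47, 55, 58, 61, 49, 53, 56, 44, 57]
counts = histogram(data, bin_width=5, start=30)
rel = relative_frequency(counts)
for edge in counts:
    print(f"[{edge}, {edge + 5}): count = {counts[edge]}, relative = {rel[edge]:.2f}")
```

The relative frequencies always sum to 1, which is what lets them approach a probability density as the number of measurements grows.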

Slide 9

Probability Distribution (Population Based)

• Probability Density Function (pdf) ($p(x)$)

– Describes the probability distribution of all possible measures of x.

– Limiting case of the relative frequency.

[Figure: Probability Density Function. Probability per unit change in x vs. x value (bin), bins from 30 to 80.]

• Probability Distribution Function ($P(x)$)

– The Probability Distribution Function is the integral of the pdf, i.e.

$$P(x) = \int_{-\infty}^{x} p(x')\,dx' = P[X \le x] \quad \text{(probability that } X \le x\text{)}$$

Q: Plot the probability distribution function vs x.

Q: What is the maximum value of P(x)?

Slide 10

Probability Density Function

$$\int_{-\infty}^{\infty} p(x)\,dx = 1$$

– The probability that a measurement X takes a value in $(-\infty, \infty)$ is 1.

– Every pdf satisfies the above property.

Q: Given a pdf, how would one find the probability that a measurement is between A and B?

Ex: [Expression for p(x), involving A, B and an exponential, is not legible in this transcript] is a probability density function. Find the relationship between A and B.

Hint: $\int_{0}^{\infty} e^{-a x^2}\,dx = \frac{1}{2}\sqrt{\frac{\pi}{a}}$

Slide 11

Common Statistical Distributions

• Gaussian (Normal) Distribution

$$p(x) = \frac{1}{\sigma_x\sqrt{2\pi}}\; e^{-\frac{(x-\mu_x)^2}{2\sigma_x^2}}$$

where:
x = measured value
$\mu_x$ = true (mean) value
$\sigma_x$ = standard deviation
$\sigma_x^2$ = variance

[Figure: bell-shaped curve of p(x) vs. x value.]

Q: What are the two parameters that define a Gaussian distribution?

Q: How would one calculate the probability of a Gaussian distribution between $x_1$ and $x_2$? (See Chapter 4, Appendix A)

Slide 12

Common Statistical Distributions

• Uniform Distribution

$$p(x) = \begin{cases} \dfrac{1}{x_2 - x_1} & x_1 \le x \le x_2 \\ 0 & \text{otherwise} \end{cases}$$

where:
x = measured value
$x_1$ = lower limit
$x_2$ = upper limit

[Figure: rectangular curve of p(x) vs. x value.]

Q: Why do $x_1$ and $x_2$ also define the magnitude of the uniform distribution PDF?

Slide 13

Common Statistical Distributions

Ex: A voltage measurement has a Gaussian distribution with mean 3.4 [V] and a standard deviation of 0.4 [V]. Using Chapter 4, Appendix A, calculate the probability that a measurement is between:

(a) [2.98, 3.82] [V]

(b) [2.4, 4.02] [V]

Ex: The quantization error of an ADC has a uniform distribution in the quantization interval Q. What is the probability that the actual input voltage is within Q/8 of the estimated input voltage?
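Both exercises reduce to integrating a pdf between two limits. The sketch below is one way to check a table lookup numerically, using Python's `statistics.NormalDist` in place of the Appendix A table. For the ADC case it assumes the quantization error is uniform on [-Q/2, Q/2] about the estimated voltage, which is an interpretation of the exercise rather than something the slide states; the function names are illustrative.

```python
from statistics import NormalDist

def gaussian_prob(lo, hi, mean, std):
    """P[lo <= X <= hi] for a Gaussian, as a difference of cumulative values."""
    dist = NormalDist(mu=mean, sigma=std)
    return dist.cdf(hi) - dist.cdf(lo)

def uniform_prob(lo, hi, x1, x2):
    """P[lo <= X <= hi] for a uniform pdf of height 1/(x2 - x1) on [x1, x2]."""
    overlap = max(0.0, min(hi, x2) - max(lo, x1))
    return overlap / (x2 - x1)

p_a = gaussian_prob(2.98, 3.82, mean=3.4, std=0.4)
p_b = gaussian_prob(2.40, 4.02, mean=3.4, std=0.4)

# Assumption: error uniform on [-Q/2, Q/2]; Q is set to 1 since only ratios matter.
Q = 1.0
p_adc = uniform_prob(-Q / 8, Q / 8, -Q / 2, Q / 2)

print(f"(a) {p_a:.4f}  (b) {p_b:.4f}  ADC within Q/8: {p_adc:.2f}")
```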

Slide 14

Statistical Analysis

• Standard Deviation ($\sigma_x$ and $S_x$)

– Characterizes the typical deviation of measurements from the mean and the width of the Gaussian distribution (bell curve).

– Smaller $\sigma_x$ implies better ______________.

– Population Based

$$\sigma_x = \left[\int_{-\infty}^{\infty} \left(x - \mu_x\right)^2 p(x)\,dx\right]^{1/2}$$

– Sample Based (N samples)

$$S_x = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(X_i - \mu_x\right)^2}$$

Q: Often we do not know $\mu_x$, how should we calculate $S_x$?

Slide 15

Statistical Analysis

• Standard Deviation ($\sigma_x$ and $S_x$) (cont.)

$$\mu_x - Z\sigma_x \le x \le \mu_x + Z\sigma_x$$

Common Name for          Error Level in     % That the Deviation from     Odds That the
"Error" Level            Terms of σ         the Mean is Smaller           Deviation is Greater
Standard Deviation       ±1σ                68.3                          about 1 in 3
"Two-Sigma Error"        ±2σ                95                            1 in 20
"Three-Sigma Error"      ±3σ                99.7                          1 in 370
"Four-Sigma Error"       ±4σ                99.994                        1 in 16,000
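The percentages in the table follow from integrating the Gaussian pdf over $\mu_x \pm Z\sigma_x$, which equals $\operatorname{erf}(Z/\sqrt{2})$. A minimal sketch to regenerate them (the function name `prob_within` is illustrative):

```python
import math

def prob_within(z):
    """Probability that a Gaussian value lies within z standard deviations of the mean."""
    return math.erf(z / math.sqrt(2))

for z in (1, 2, 3, 4):
    p = prob_within(z)
    print(f"{z} sigma: {100 * p:.3f}% inside, about 1 in {1 / (1 - p):,.0f} outside")
```

The exact two-sigma figure is closer to 95.4%; the rounded "95" and "1 in 20" in the table correspond more precisely to about 1.96 standard deviations.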

Slide 16

Statistical Analysis

• Sampled Mean $\bar{x}$ is the best estimate of $\mu_x$.

$$\mu_x = E[X] = \int_{-\infty}^{\infty} x\,p(x)\,dx \qquad \text{Best Estimate: } \bar{x} = \frac{1}{N}\sum_{i=1}^{N} X_i$$

• Sampled Standard Deviation ($S_x$)

– Use when $\sigma_x$ is not available. When $\mu_x$ is not known, use $\bar{x}$ and reduce by one degree of freedom ($N - 1$).

$$S_x = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(X_i - \mu_x\right)^2} \qquad \text{When } \mu_x \text{ not known: } S_x = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \bar{x}\right)^2}$$

Q: If the sampled mean is only an estimate of the “true mean” $\mu_x$, how do we characterize its error?

Q: If we take another set of samples, will we get a different sampled mean?

Q: If we take many more sample sets, what will be the statistics of the set of sampled means?

Slide 17

Statistical Analysis

Ex: The inlet pressure of a steam generator was measured 100 times during a 12 hour period. The specified inlet pressure is 4.00 MPa, with 0.7% allowable fluctuation. The measured data is summarized in the following table:

Pressure (P) (MPa)    Number of Results (m)
3.970                 1
3.980                 3
3.990                 12
4.000                 25
4.010                 33
4.020                 17
4.030                 6
4.040                 2
4.050                 1

(1) Calculate the mean, variance and standard deviation.

(2) Given the data, what pressure range will contain 95% of the data?
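One way to work this exercise is to weight each pressure by its count. The sketch below does that with the table's data; it assumes the N - 1 form of the sampled variance and, for part (2), assumes the data are roughly Gaussian so that about 95% lie within 1.96 standard deviations of the mean. Both are modelling choices, not something the slide prescribes.

```python
import math

# Pressure (MPa) -> number of results, copied from the table.
pressure_counts = {
    3.970: 1, 3.980: 3, 3.990: 12, 4.000: 25, 4.010: 33,
    4.020: 17, 4.030: 6, 4.040: 2, 4.050: 1,
}

n = sum(pressure_counts.values())
mean = sum(p * m for p, m in pressure_counts.items()) / n
# Sampled variance with N - 1 (assumption: true mean unknown).
variance = sum(m * (p - mean) ** 2 for p, m in pressure_counts.items()) / (n - 1)
std = math.sqrt(variance)

# Assumption: Gaussian data, so about 95% lie within 1.96 standard deviations.
low, high = mean - 1.96 * std, mean + 1.96 * std

print(f"N = {n}, mean = {mean:.4f} MPa, variance = {variance:.3e} MPa^2, std = {std:.4f} MPa")
print(f"approx. 95% range: {low:.3f} to {high:.3f} MPa")
```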

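Slide 18

Confidence Interval

• Sampled Mean Statistics

– If N is large, $\bar{x}$ will also have a Gaussian distribution. (Central Limit Theorem)

– Mean of $\bar{x}$:

$$\mu_{\bar{x}} = E[\bar{x}] = \mu_x$$

$\bar{x}$ is an unbiased estimate.

– Standard Deviation of $\bar{x}$:

$$\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{N}}$$

$\sigma_{\bar{x}}$ is the best estimate of the error in estimating $\mu_x$.

[Figure: probability density curves for the measurements, $p(x)$, and for the sampled mean, $p(\bar{x})$.]

Q: Since we don’t know $\sigma_x$, how would we calculate $\sigma_{\bar{x}}$?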

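Slide 19

Confidence Interval

• For Large Samples ( N > 60 ), Q% of all the sampled means will lie in the interval

$$\mu_x - z_Q\frac{\sigma_x}{\sqrt{N}} \le \bar{x} \le \mu_x + z_Q\frac{\sigma_x}{\sqrt{N}}$$

Equivalently,

$$\bar{x} - z_Q\frac{\sigma_x}{\sqrt{N}} \le \mu_x \le \bar{x} + z_Q\frac{\sigma_x}{\sqrt{N}}$$

is the Q% Confidence Interval.

When $\sigma_x$ is unknown, $S_x$ will be a reasonable approximation.

[Figure: $p(\bar{x})$ vs. $\bar{x}$, with the interval from $\mu_x - z_Q\sigma_{\bar{x}}$ to $\mu_x + z_Q\sigma_{\bar{x}}$ marked.]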

Slide 20

Confidence Interval

Ex: 64 acceleration measurements were taken during an experiment. The estimated mean and standard deviation of the measurements were 3.15 m/s² and 0.4 m/s².

(1) Find the 98% confidence interval for the true mean.

(2) How confident are you that the true mean will be in the range from 2.85 to 3.45 m/s²?
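A sketch of how this large-sample example might be checked numerically, using `statistics.NormalDist` in place of a z-table and taking $S_x$ as the approximation for $\sigma_x$. The function names are illustrative, not from the course notes.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(x_bar, s_x, n, confidence):
    """Two-sided large-sample interval: x_bar +/- z_Q * S_x / sqrt(N)."""
    z_q = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z_q * s_x / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

def confidence_for_half_width(half_width, s_x, n):
    """Confidence level Q (as a fraction) for a given +/- half width about x_bar."""
    z = half_width / (s_x / math.sqrt(n))
    return 2 * NormalDist().cdf(z) - 1

lo, hi = mean_confidence_interval(3.15, 0.4, 64, confidence=0.98)
conf = confidence_for_half_width(0.30, 0.4, 64)

print(f"98% CI: {lo:.3f} to {hi:.3f} m/s^2")
print(f"confidence for 2.85 to 3.45 m/s^2: {100 * conf:.6f}%")
```

With N = 64 the standard deviation of the mean is 0.05 m/s², so the range in part (2) spans six of those deviations on each side.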

Slide 21

Confidence Interval

• For Small Samples ( N < 60 ), the Q% Confidence Interval can be calculated using the Student-T distribution, which is similar to the normal distribution but depends on N.

– With Q% confidence, the true mean $\mu_x$ will lie in the following interval about any sampled mean:

$$\bar{x} - t_{\nu,Q}\,S_{\bar{x}} \le \mu_x \le \bar{x} + t_{\nu,Q}\,S_{\bar{x}} \qquad \text{(Q\% confidence interval)}$$

where

$$S_{\bar{x}} = \frac{S_x}{\sqrt{N}}, \qquad \nu = N - 1$$

$t_{\nu,Q}$ is defined in class notes Chapter 4, Appendix B.

Slide 22

Confidence Interval

Ex: A simple postal scale is supplied with ½, 1, 2, and 4 oz brass weights. For quality check, 14 of the 1 oz weights were measured on a precision scale. The results, in oz, are as follows:

1.08  1.03  0.96  0.95  1.04
1.01  0.98  0.99  1.05  1.08
0.97  1.00  0.98  1.01

Based on this sample, and assuming that the parent population of the weights is normally distributed, what is the 95% confidence interval for the “true” weight of the 1 oz brass weights?
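A minimal sketch of the small-sample calculation for this data set. The Python standard library has no Student-t quantile function, so the value 2.160 (two-sided 95%, 13 degrees of freedom) is hard-coded from a standard t-table; it is assumed, not verified here, to match the Appendix B entry.

```python
import math
from statistics import mean, stdev

weights = [1.08, 1.03, 0.96, 0.95, 1.04,
           1.01, 0.98, 0.99, 1.05, 1.08,
           0.97, 1.00, 0.98, 1.01]

# Assumption: two-sided 95% Student-t value for nu = N - 1 = 13, from a standard table.
T_NU13_Q95 = 2.160

n = len(weights)
x_bar = mean(weights)
s_x = stdev(weights)              # sampled standard deviation (N - 1 form)
s_x_bar = s_x / math.sqrt(n)      # standard deviation of the sampled mean
ci = (x_bar - T_NU13_Q95 * s_x_bar, x_bar + T_NU13_Q95 * s_x_bar)

print(f"mean = {x_bar:.4f} oz, S_x = {s_x:.4f} oz")
print(f"95% CI: {ci[0]:.4f} to {ci[1]:.4f} oz")
```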

Slide 23

Propagation of Error

How do errors propagate through calculations?

Q: If you measured the diameter (D) and height (h) of a cylindrical container, how would the measurement error affect your estimation of the volume ( $V = \pi D^2 h/4$ )?

Q: What is the uncertainty in calculating the kinetic energy ( $mv^2/2$ ) given the uncertainties in the measurements of mass (m) and velocity (v)?

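Slide 24

Propagation of Error

• A Simple Example

Suppose that y is related to two independent quantities $X_1$ and $X_2$ through

$$y = C_1 X_1 + C_2 X_2 = f(X_1, X_2)$$

To relate the changes in y to the uncertainties in $X_1$ and $X_2$, we need to find $dy = g(dX_1, dX_2)$:

$$dy = \frac{\partial f}{\partial X_1}\,dX_1 + \frac{\partial f}{\partial X_2}\,dX_2$$

The magnitude of dy is the expected change in y due to the uncertainties in $x_1$ and $x_2$:

$$\sigma_y = \sqrt{\left(\frac{\partial f}{\partial X_1}\sigma_{x_1}\right)^2 + \left(\frac{\partial f}{\partial X_2}\sigma_{x_2}\right)^2} = \sqrt{C_1^2\,\sigma_{x_1}^2 + C_2^2\,\sigma_{x_2}^2}$$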

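Slide 25

Propagation of Error

• General Formula

Suppose that y is related to n independent measured variables $\{X_1, X_2, \ldots, X_n\}$ by a functional representation:

$$y = f(X_1, X_2, \ldots, X_n)$$

Given the uncertainties of the X’s around some operating points:

$$\bar{x}_1 \pm \sigma_{x_1},\ \bar{x}_2 \pm \sigma_{x_2},\ \ldots,\ \bar{x}_n \pm \sigma_{x_n}$$

The expected value of y and its uncertainty $\sigma_y$ are:

$$\bar{y} = f(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$$

$$\sigma_y = \sqrt{\left(\frac{\partial f}{\partial X_1}\sigma_{x_1}\right)^2 + \left(\frac{\partial f}{\partial X_2}\sigma_{x_2}\right)^2 + \cdots + \left(\frac{\partial f}{\partial X_n}\sigma_{x_n}\right)^2}\ \Bigg|_{(\bar{x}_1, \ldots, \bar{x}_n)}$$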
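The general formula can be applied numerically when the partial derivatives are awkward to write out. The sketch below estimates each partial derivative with a central finite difference (a swapped-in numerical technique, not part of the slides) and applies it to the cylinder volume question from earlier in the deck. The diameter, height and uncertainty values are hypothetical.

```python
import math

def propagate_uncertainty(f, means, sigmas, rel_step=1e-6):
    """Return (f at the operating point, sigma_y) for independent inputs.

    Partial derivatives are approximated by central differences.
    """
    y = f(*means)
    total = 0.0
    for i, (x, sigma) in enumerate(zip(means, sigmas)):
        h = rel_step * max(abs(x), 1.0)
        up = list(means)
        down = list(means)
        up[i] = x + h
        down[i] = x - h
        dfdx = (f(*up) - f(*down)) / (2 * h)
        total += (dfdx * sigma) ** 2
    return y, math.sqrt(total)

def cylinder_volume(d, h):
    return math.pi * d ** 2 * h / 4

# Hypothetical operating point: D = 0.10 m +/- 0.001 m, h = 0.20 m +/- 0.002 m.
v, sigma_v = propagate_uncertainty(cylinder_volume, [0.10, 0.20], [0.001, 0.002])
print(f"V = {v:.6e} m^3, sigma_V = {sigma_v:.2e} m^3 ({100 * sigma_v / v:.2f}%)")
```

Because V depends on D squared, a 1% uncertainty in D contributes twice as much relative uncertainty as a 1% uncertainty in h.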

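Slide 26

Propagation of Error

• Proof:

Assume that the variability in measurement y is caused by k independent zero-mean error sources: $e_1, e_2, \ldots, e_k$. Then,

$$(y - y_{true})^2 = (e_1 + e_2 + \cdots + e_k)^2 = e_1^2 + e_2^2 + \cdots + e_k^2 + 2e_1e_2 + 2e_1e_3 + \cdots$$

$$E[(y - y_{true})^2] = E[e_1^2 + e_2^2 + \cdots + e_k^2 + 2e_1e_2 + 2e_1e_3 + \cdots] = E[e_1^2 + e_2^2 + \cdots + e_k^2]$$

The cross terms drop out because the error sources are independent and zero-mean, so $E[e_i e_j] = E[e_i]\,E[e_j] = 0$ for $i \ne j$.

$$\sigma_y = \sqrt{E[e_1^2] + E[e_2^2] + \cdots + E[e_k^2]} = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_k^2}$$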

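Slide 27

Propagation of Error

• Example (Standard Deviation of Sampled Mean)

Given

$$\bar{x} = \frac{1}{N}\left(X_1 + X_2 + X_3 + \cdots + X_N\right)$$

Use the general formula for error propagation, with $\partial\bar{x}/\partial X_i = 1/N$ and $\sigma_{x_i} = \sigma_x$ for every measurement:

$$\sigma_{\bar{x}} = \sqrt{\left(\frac{\partial \bar{x}}{\partial X_1}\sigma_{x_1}\right)^2 + \left(\frac{\partial \bar{x}}{\partial X_2}\sigma_{x_2}\right)^2 + \left(\frac{\partial \bar{x}}{\partial X_3}\sigma_{x_3}\right)^2 + \cdots + \left(\frac{\partial \bar{x}}{\partial X_N}\sigma_{x_N}\right)^2} = \sqrt{N\left(\frac{\sigma_x}{N}\right)^2} = \frac{\sigma_x}{\sqrt{N}}$$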
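The $\sigma_x/\sqrt{N}$ result can also be checked by simulation: draw many sample sets, compute each sampled mean, and look at the spread of those means. The population parameters, sample size and number of trials below are arbitrary illustrative choices.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

MU_X, SIGMA_X = 10.0, 2.0   # hypothetical population parameters
N, TRIALS = 25, 4000        # samples per set, number of sample sets

sample_means = [
    statistics.fmean(random.gauss(MU_X, SIGMA_X) for _ in range(N))
    for _ in range(TRIALS)
]

observed = statistics.stdev(sample_means)   # spread of the sampled means
predicted = SIGMA_X / math.sqrt(N)          # sigma_x / sqrt(N)

print(f"observed = {observed:.3f}, predicted = {predicted:.3f}")
```

The two numbers should agree to within a few percent, with the gap shrinking as the number of trials grows.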

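Slide 28

Propagation of Error

Ex: What is the uncertainty in calculating the kinetic energy ( $mv^2/2$ ) given the uncertainties in the measurements of mass (m) and velocity (v)?

$$\sigma_{KE} = \sqrt{\left(\frac{\partial KE}{\partial m}\sigma_m\right)^2 + \left(\frac{\partial KE}{\partial v}\sigma_v\right)^2} = \sqrt{\left(\frac{1}{2}mv^2\,\frac{\sigma_m}{m}\right)^2 + \left(mv^2\,\frac{\sigma_v}{v}\right)^2} = \frac{1}{2}mv^2\sqrt{\left(\frac{\sigma_m}{m}\right)^2 + \left(\frac{2\sigma_v}{v}\right)^2}$$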
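A quick numerical check of the kinetic energy result, comparing the partial-derivative form with the relative-uncertainty form. The mass, velocity and uncertainty values are hypothetical.

```python
import math

def kinetic_energy_uncertainty(m, v, sigma_m, sigma_v):
    """Return (KE, sigma_KE) using dKE/dm = v^2 / 2 and dKE/dv = m * v."""
    ke = 0.5 * m * v ** 2
    d_ke_dm = 0.5 * v ** 2
    d_ke_dv = m * v
    sigma_ke = math.sqrt((d_ke_dm * sigma_m) ** 2 + (d_ke_dv * sigma_v) ** 2)
    return ke, sigma_ke

# Hypothetical values: m = 2.0 kg +/- 0.02 kg, v = 3.0 m/s +/- 0.03 m/s.
ke, sigma_ke = kinetic_energy_uncertainty(2.0, 3.0, 0.02, 0.03)

# Same result from the relative form: KE * sqrt((sigma_m/m)^2 + (2*sigma_v/v)^2).
relative = math.sqrt((0.02 / 2.0) ** 2 + (2 * 0.03 / 3.0) ** 2)

print(f"KE = {ke:.2f} J, sigma_KE = {sigma_ke:.4f} J, relative form = {ke * relative:.4f} J")
```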

Slide 29

Least Squares Fitting of Data

• Best Linear Fit

– How do we characterize “BEST”?

Fit a linear model (relation)

$$\hat{y}_i = a_o + a_1 x_i$$

to N pairs of $[x_i, y_i]$ measurements.

Given $x_i$, the error between the estimated output $\hat{y}_i$ and the measured output $y_i$ is:

$$n_i = y_i - \hat{y}_i$$

The “BEST” fit is the model that minimizes the sum of the ___________ of the error:

$$\min \sum_{i=1}^{N} n_i^2 = \min \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2 \qquad \text{(Least Square Error)}$$

[Figure: Output Y vs. Input X, showing the measured outputs $y_i$ and the best linear fit $y_{est}$.]

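Slide 30

Least Squares Fitting of Data

Let

$$J = \sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2 = \sum_{i=1}^{N}\left(y_i - a_o - a_1 x_i\right)^2$$

The two independent variables are?

Q: What are we trying to solve?

Minimize J: Find $a_o$ and $a_1$ such that $dJ = 0$:

$$\frac{\partial J}{\partial a_o} = 0 \;\Rightarrow\; -2\sum_{i=1}^{N}\left(y_i - a_o - a_1 x_i\right) = 0$$

$$\frac{\partial J}{\partial a_1} = 0 \;\Rightarrow\; -2\sum_{i=1}^{N} x_i\left(y_i - a_o - a_1 x_i\right) = 0$$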

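Slide 31

Least Squares Fitting of Data

Rewrite the last two equations as two simultaneous equations for $a_o$ and $a_1$:

$$a_o N + a_1 \sum x_i = \sum y_i$$

$$a_o \sum x_i + a_1 \sum x_i^2 = \sum x_i y_i$$

or, in matrix form,

$$\begin{bmatrix} N & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix} \begin{bmatrix} a_o \\ a_1 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}$$

Solving:

$$a_o = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{\Delta}, \qquad a_1 = \frac{N\sum x_i y_i - \sum x_i \sum y_i}{\Delta}$$

where

$$\Delta = N\sum x_i^2 - \left(\sum x_i\right)^2$$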

Slide 32

Least Squares Fitting of Data

• Summary: Given N pairs of input/output measurements $[x_i, y_i]$, the best linear Least Squares model from input $x_i$ to output $y_i$ is:

$$\hat{y}_i = a_o + a_1 x_i$$

where

$$a_o = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{\Delta}, \qquad a_1 = \frac{N\sum x_i y_i - \sum x_i \sum y_i}{\Delta} \qquad \text{and} \qquad \Delta = N\sum x_i^2 - \left(\sum x_i\right)^2$$

• The process of minimizing squared error can be used for fitting nonlinear models and in many engineering applications.

• The same result can also be derived from a probability distribution point of view (see Course Notes, Ch. 4 - Maximum Likelihood Estimation).

Q: Given a theoretical model $y = a_o + a_2 x^2$, what are the Least Squares estimates for $a_o$ and $a_2$?
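The closed-form expressions for $a_o$ and $a_1$ translate directly into a few sums. A minimal sketch, with hypothetical data points and an illustrative function name:

```python
def linear_least_squares(xs, ys):
    """Return (a_o, a_1) minimizing the sum of (y_i - a_o - a_1 * x_i)^2."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    delta = n * sum_xx - sum_x ** 2
    if delta == 0:
        raise ValueError("all x values are identical; the fit is undefined")
    a_o = (sum_xx * sum_y - sum_x * sum_xy) / delta
    a_1 = (n * sum_xy - sum_x * sum_y) / delta
    return a_o, a_1

# Hypothetical measurements, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]
a_o, a_1 = linear_least_squares(xs, ys)
print(f"y_est = {a_o:.3f} + {a_1:.3f} x")
```

Since the derivation only requires the model to be linear in its coefficients, one plausible way to approach a model in $x^2$ is to pass the squared inputs, for example `linear_least_squares([x * x for x in xs], ys)`.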

Slide 33

Least Squares Fitting of Data

• Variance of the fit:

$$\sigma_n^2 = \frac{1}{N-2}\sum_{i=1}^{N}\left(y_i - a_o - a_1 x_i\right)^2$$

• Variance of the measurements in y: $\sigma_y^2$

• Assume measurements in x are precise.

• Correlation coefficient:

$$R^2 = 1 - \frac{\sigma_n^2}{\sigma_y^2}$$

is a measure of how well the model explains the data. $R^2 = 1$ implies that the linear model fits the data perfectly.
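A sketch of how the variance of the fit and $R^2$ might be evaluated for a fitted line. The data and the coefficients are hypothetical (the coefficients are assumed to come from a least squares fit of that same data), and $\sigma_y^2$ is estimated here with the N - 1 sampled variance, which is an assumption since the slide does not spell out that estimate.

```python
def fit_quality(xs, ys, a_o, a_1):
    """Return (variance of the fit, variance of y, R^2) for the line a_o + a_1 * x."""
    n = len(xs)
    residual_ss = sum((y - a_o - a_1 * x) ** 2 for x, y in zip(xs, ys))
    var_fit = residual_ss / (n - 2)          # two fitted parameters remove two degrees of freedom
    y_mean = sum(ys) / n
    var_y = sum((y - y_mean) ** 2 for y in ys) / (n - 1)   # assumed estimate of sigma_y^2
    return var_fit, var_y, 1 - var_fit / var_y

# Hypothetical data and coefficients, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]
a_o, a_1 = 1.1, 1.96

var_fit, var_y, r_squared = fit_quality(xs, ys, a_o, a_1)
print(f"variance of fit = {var_fit:.4f}, variance of y = {var_y:.3f}, R^2 = {r_squared:.4f}")
```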