error analysis statistics

Error Analysis - Statistics

• Accuracy and Precision

• Individual Measurement Uncertainty
– Distribution of Data
– Means, Variance and Standard Deviation
– Confidence Interval

• Uncertainty of Quantity Calculated from Several Measurements
– Error Propagation

• Least Squares Fitting of Data

Accuracy and Precision

• Accuracy: Closeness of the data (sample) to the “true value.”

• Precision: Closeness of the grouping of the data (sample) around some central value.


[Figure: four plots of Relative Frequency vs. X Value, each with the True Value marked. Panels: Inaccurate & Imprecise; Precise but Inaccurate; Accurate but Imprecise; Precise and Accurate.]


Q: How do we quantify the concepts of accuracy and precision? That is, how do we characterize the error that occurred in our measurement?

Individual Measurement Statistics

• Take N measurements: X1, . . . , XN

• Calculate mean and standard deviation:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} X_i, \qquad S_x^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \bar{x}\right)^2$$

• What to use as the “best value” and uncertainty so we can say we are Q% confident that the true value lies in the interval $x_{best} \pm \Delta x$?

• Need to know how data is distributed.
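The mean and standard deviation of a set of N measurements can be sketched in a few lines of Python. This is a minimal illustration that assumes the N − 1 (sample) form of the standard deviation; the `readings` list is hypothetical data, not taken from the slides.

```python
import math

def sample_mean(values):
    """Arithmetic mean of the measurements."""
    return sum(values) / len(values)

def sample_std(values):
    """Sample standard deviation, dividing by N - 1 (assumed form)."""
    n = len(values)
    mean = sample_mean(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

# Hypothetical measurements, for illustration only.
readings = [9.8, 10.1, 10.0, 9.9, 10.2]
print(f"mean = {sample_mean(readings):.3f}, S_x = {sample_std(readings):.3f}")
```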


Population and Sample

• Parent Population: The set of all possible measurements.

• Sample: A subset of the population; the measurements actually made.

[Figure: Population = bag of marbles; Samples = handful of marbles from the bag.]

Histogram (Sample Based)

• Histogram
– A plot of the number of times a given value occurred.

• Relative Frequency
– A plot of the relative number of times a given value occurred.

[Figure: Histogram; Number of Measurements vs. X Value (Bin), bins from 30 to 80 in steps of 5.]

[Figure: Relative Frequency Plot; Relative Frequency vs. X Value (Bin), bins from 30 to 80 in steps of 5.]
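As a rough sketch of how a histogram and a relative frequency plot are built from raw data, the Python below counts values per bin and then divides by the total count. The bin width of 5 starting at 30 mirrors the figure axes; the `data` values and the function names are hypothetical.

```python
from collections import Counter

def histogram(values, bin_width, start):
    """Count the values in each bin [start + k*w, start + (k+1)*w)."""
    counts = Counter()
    for v in values:
        k = int((v - start) // bin_width)
        counts[start + k * bin_width] += 1
    return dict(sorted(counts.items()))

def relative_frequency(counts):
    """Divide each bin count by the total number of measurements."""
    total = sum(counts.values())
    return {bin_start: c / total for bin_start, c in counts.items()}

# Hypothetical measurements, for illustration only.
data = [42, 47, 51, 53, 55, 56, 58, 61, 64, 72]
counts = histogram(data, bin_width=5, start=30)
rel = relative_frequency(counts)
print(counts)
print(rel)
```

The relative frequencies always sum to 1, which is what lets them approach a probability density as the number of measurements grows.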

Probability Distribution (Population Based)

• Probability Density Function (pdf) (p(x))
– Describes the probability distribution of all possible measures of x.
– Limiting case of the relative frequency.

• Probability Distribution Function (P(x))
– Probability Distribution Function is the integral of the pdf, i.e. the probability that $X \le x$:

$$P(x) = P[X \le x] = \int_{-\infty}^{x} p(x')\,dx'$$

[Figure: Probability Density Function; probability per unit change in x vs. x Value (Bin), bins from 30 to 80 in steps of 5.]

Q: Plot the probability distribution function vs x.

Q: What is the maximum value of P(x)?

Probability Density Function

– The probability that a measurement X takes a value between $(-\infty, \infty)$ is 1:

$$\int_{-\infty}^{\infty} p(x)\,dx = 1$$

– Every pdf satisfies the above property.

Q: Given a pdf, how would one find the probability that a measurement is between A and B?

Ex: p(x) = … is a probability density function. Find the relationship between A and B.

Hint: $\int_{0}^{\infty} e^{-a x^2}\,dx = \frac{1}{2}\sqrt{\frac{\pi}{a}}$
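The probability that a measurement falls between two limits is the integral of the pdf between them. The sketch below estimates that integral with the trapezoidal rule. It assumes a simple triangular pdf, p(x) = 2x on [0, 1], chosen only because its integrals are easy to check by hand; it is not the pdf from the exercise above.

```python
def probability_between(pdf, low, high, steps=10_000):
    """Trapezoidal estimate of the integral of pdf from low to high."""
    h = (high - low) / steps
    total = 0.5 * (pdf(low) + pdf(high))
    for k in range(1, steps):
        total += pdf(low + k * h)
    return total * h

def triangular_pdf(x):
    """Assumed illustrative pdf: p(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

# Normalization check: the pdf integrates to 1 over its whole range.
print(probability_between(triangular_pdf, 0.0, 1.0))
# Probability of a measurement between 0.2 and 0.5 (0.5^2 - 0.2^2 = 0.21).
print(probability_between(triangular_pdf, 0.2, 0.5))
```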

Common Statistical Distributions

• Gaussian (Normal) Distribution

$$p(x) = \frac{1}{\sigma_x\sqrt{2\pi}}\; e^{-\frac{(x-\mu_x)^2}{2\sigma_x^2}}$$

where:
x = measured value
$\mu_x$ = true (mean) value
$\sigma_x$ = standard deviation
$\sigma_x^2$ = variance

[Figure: p(x) vs. x Value for the Gaussian distribution.]

Q: What are the two parameters that define a Gaussian distribution?

Q: How would one calculate the probability of a Gaussian distribution between x1 and x2? (See Chapter 4, Appendix A)

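• Uniform Distribution

$$p(x) = \begin{cases} \dfrac{1}{x_2 - x_1}, & x_1 \le x \le x_2 \\ 0, & \text{otherwise} \end{cases}$$

where:
x = measured value
x1 = lower limit
x2 = upper limit

[Figure: p(x) vs. x Value for the uniform distribution.]

Q: Why do x1 and x2 also define the magnitude of the uniform distribution PDF?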


Ex: A voltage measurement has a Gaussian distribution with mean 3.4 [V] and a standard deviation of 0.4 [V]. Using Chapter 4, Appendix A, calculate the probability that a measurement is between:
(a) [2.98, 3.82] [V]
(b) [2.4, 4.02] [V]

Ex: The quantization error of an ADC has a uniform distribution in the quantization interval Q. What is the probability that the actual input voltage is within Q/8 of the estimated input voltage?
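Both examples can be cross-checked numerically instead of with the appendix table. The sketch below uses `statistics.NormalDist` from the Python standard library for the Gaussian case. For the ADC case it assumes the quantization error is uniform on an interval of width Q centered on zero; that centering is an assumption, not something the slide states.

```python
from statistics import NormalDist

def gaussian_interval_probability(mean, std, low, high):
    """P(low <= X <= high) for a Gaussian with the given mean and std."""
    dist = NormalDist(mu=mean, sigma=std)
    return dist.cdf(high) - dist.cdf(low)

def uniform_within_probability(interval_width, half_range):
    """P(|error| <= half_range) for an error uniform on an interval of the
    given width centered on zero (assumed centering)."""
    return min(1.0, 2.0 * half_range / interval_width)

p_a = gaussian_interval_probability(3.4, 0.4, 2.98, 3.82)
p_b = gaussian_interval_probability(3.4, 0.4, 2.4, 4.02)
p_q = uniform_within_probability(1.0, 1.0 / 8.0)  # Q normalized to 1
print(f"(a) {p_a:.4f}  (b) {p_b:.4f}  ADC within Q/8: {p_q:.2f}")
```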

Statistical Analysis

• Standard Deviation ($\sigma_x$ and $S_x$)
– Characterize the typical deviation of measurements from the mean and the width of the Gaussian distribution (bell curve).
– Smaller $\sigma_x$ implies better ______________.

– Population Based

$$\sigma_x^2 = \int_{-\infty}^{\infty} (x - \mu_x)^2\, p(x)\,dx$$

– Sample Based (N samples)

$$S_x^2 = \frac{1}{N}\sum_{i=1}^{N}\left(X_i - \mu_x\right)^2$$

Q: Often we do not know $\mu_x$; how should we calculate $S_x$?

• Standard Deviation ($\sigma_x$ and $S_x$) (cont.)

$$\mu_x - Z\sigma_x \le x \le \mu_x + Z\sigma_x$$

Common Name for        Error Level in     % That the Deviation from    Odds That the
"Error" Level          Terms of σ         the Mean is Smaller          Deviation is Greater
Standard Deviation     ±σ                 68.3                         about 1 in 3
"Two-Sigma Error"      ±2σ                95                           1 in 20
"Three-Sigma Error"    ±3σ                99.7                         1 in 370
"Four-Sigma Error"     ±4σ                99.994                       1 in 16,000
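The percentages in the table follow from integrating the Gaussian pdf over ±Z standard deviations, which reduces to the error function. A minimal sketch using only the standard library (the function name `coverage` is illustrative):

```python
import math

def coverage(z):
    """Fraction of a Gaussian population within z standard deviations
    of the mean: erf(z / sqrt(2))."""
    return math.erf(z / math.sqrt(2.0))

for z in (1, 2, 3, 4):
    inside = coverage(z)
    odds = 1.0 / (1.0 - inside)
    print(f"{z} sigma: {100 * inside:.4f}% inside, about 1 in {odds:.0f} outside")
```

The two-sigma entry comes out slightly above 95%; an interval of exactly 95% corresponds to about 1.96 standard deviations, so the table's values are rounded.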

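Best Estimate

• Sampled Mean $\bar{x}$ is the best estimate of $\mu_x$.

$$\mu_x = E[X] = \int_{-\infty}^{\infty} x\, p(x)\,dx, \qquad \bar{x} = \frac{1}{N}\sum_{i=1}^{N} X_i$$

• Sampled Standard Deviation ($S_x$)
– Use $\bar{x}$ when $\mu_x$ is not available, and reduce by one degree of freedom.

$$S_x^2 = \frac{1}{N}\sum_{i=1}^{N}\left(X_i - \mu_x\right)^2 \quad\xrightarrow{\ \text{when } \mu_x \text{ not known}\ }\quad S_x^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \bar{x}\right)^2$$

where N − 1 is the number of degrees of freedom.

Q: If the sampled mean $\bar{x}$ is only an estimate of the “true mean” $\mu_x$, how do we characterize its error?

Q: If we take another set of samples, will we get a different sampled mean?

Q: If we take many more sample sets, what will be the statistics of the set of sampled means?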


Ex: The inlet pressure of a steam generator was measured 100 times during a 12 hour period. The specified inlet pressure is 4.00 MPa, with 0.7% allowable fluctuation. The measured data is summarized in the following table:

Pressure (P) (MPa)    Number of Results (m)
3.970                 1
3.980                 3
3.990                 12
4.000                 25
4.010                 33
4.020                 17
4.030                 6
4.040                 2
4.050                 1

(1) Calculate the mean, variance and standard deviation.
(2) Given the data, what pressure range will contain 95% of the data?
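One way to work this example is to weight each pressure value by its count. The sketch below does that in Python with the table values from the exercise. It assumes the N − 1 form of the sample variance and, for part (2), a Gaussian population so that 95% of the data lies within about ±1.96 standard deviations; both are assumptions rather than the slide's stated method.

```python
import math

# Pressure (MPa) -> number of results, from the table in the exercise.
pressure_counts = {
    3.970: 1, 3.980: 3, 3.990: 12, 4.000: 25, 4.010: 33,
    4.020: 17, 4.030: 6, 4.040: 2, 4.050: 1,
}

def frequency_stats(table):
    """Mean, variance and standard deviation of data given as value -> count."""
    n = sum(table.values())
    mean = sum(value * count for value, count in table.items()) / n
    # Sample variance with N - 1 in the denominator (assumed form).
    variance = sum(count * (value - mean) ** 2
                   for value, count in table.items()) / (n - 1)
    return n, mean, variance, math.sqrt(variance)

n, mean, variance, std = frequency_stats(pressure_counts)
# Assumed Gaussian population: about 95% lies within +/- 1.96 std.
low, high = mean - 1.96 * std, mean + 1.96 * std
print(f"N = {n}, mean = {mean:.4f} MPa, variance = {variance:.3e} MPa^2, "
      f"std = {std:.4f} MPa")
print(f"95% range: {low:.3f} to {high:.3f} MPa")
```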

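Confidence Interval

• Sampled Mean Statistics
– If N is large, $\bar{x}$ will also have a Gaussian distribution. (Central Limit Theorem)

– Mean of $\bar{x}$:

$$E[\bar{x}] = \mu_x$$

$\bar{x}$ is an unbiased estimate.

– Standard Deviation of $\bar{x}$:

$$\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{N}}$$

$\sigma_{\bar{x}}$ is the best estimate of the error in estimating $\mu_x$.

Q: Since we don’t know $\sigma_x$, how would we calculate $\sigma_{\bar{x}}$?

[Figure: probability density functions p(x) and p($\bar{x}$) plotted against x.]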
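The behavior of the sampled mean can be checked with a small simulation: draw many sample sets, compute each set's mean, and compare the spread of those means with the population standard deviation divided by √N. The sketch below assumes a uniform(0, 1) parent population (standard deviation 1/√12) purely for convenience; the seed, the sample size and the number of sets are arbitrary choices.

```python
import random
import statistics

def sample_means(n_per_sample, n_sets, rng):
    """Mean of each of n_sets independent samples of size n_per_sample."""
    means = []
    for _ in range(n_sets):
        sample = [rng.random() for _ in range(n_per_sample)]
        means.append(sum(sample) / n_per_sample)
    return means

rng = random.Random(0)          # fixed seed so the run is repeatable
N = 25                          # measurements per sample set
means = sample_means(N, 4000, rng)

sigma_x = 1.0 / 12 ** 0.5       # std of the assumed uniform(0, 1) population
predicted = sigma_x / N ** 0.5  # sigma_x / sqrt(N)
observed = statistics.stdev(means)
print(f"predicted std of the mean: {predicted:.4f}, observed: {observed:.4f}")
```

The parent population here is not Gaussian, yet a histogram of `means` is close to bell-shaped, which is the point of the Central Limit Theorem.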

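Confidence Interval

• For Large Samples ( N > 60 ), Q% of all the sampled means will lie in the interval

$$\mu_x - z_Q\,\sigma_{\bar{x}} \le \bar{x} \le \mu_x + z_Q\,\sigma_{\bar{x}}, \qquad \sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{N}}$$

Equivalently,

$$\bar{x} - z_Q\,\frac{\sigma_x}{\sqrt{N}} \le \mu_x \le \bar{x} + z_Q\,\frac{\sigma_x}{\sqrt{N}}$$

is the Q% Confidence Interval.

When $\sigma_x$ is unknown, $S_x$ will be a reasonable approximation.

[Figure: p($\bar{x}$) with the interval from $\mu_x - z_Q\sigma_{\bar{x}}$ to $\mu_x + z_Q\sigma_{\bar{x}}$ marked.]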

Confidence Interval

Ex: 64 acceleration measurements were taken during an experiment. The estimated mean and standard deviation of the measurements were 3.15 m/s² and 0.4 m/s².
(1) Find the 98% confidence interval for the true mean.
(2) How confident are you that the true mean will be in the range from 2.85 to 3.45 m/s²?
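A numerical sketch of this example, using the large-sample (Gaussian) interval and `statistics.NormalDist` from the standard library. It assumes the sample standard deviation can stand in for the population value because N = 64 is large; the function names are illustrative.

```python
import math
from statistics import NormalDist

def large_sample_ci(mean, std, n, confidence):
    """Two-sided confidence interval for the true mean (Gaussian, large N)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * std / math.sqrt(n)
    return mean - half_width, mean + half_width

def confidence_for_half_width(std, n, half_width):
    """Confidence that the true mean lies within +/- half_width of the
    sampled mean."""
    z = half_width / (std / math.sqrt(n))
    return 2.0 * NormalDist().cdf(z) - 1.0

low, high = large_sample_ci(3.15, 0.4, 64, 0.98)
conf = confidence_for_half_width(0.4, 64, 0.30)
print(f"98% CI: {low:.3f} to {high:.3f} m/s^2")
print(f"confidence for 2.85 to 3.45 m/s^2: {100 * conf:.7f}%")
```

The range in part (2) is six standard deviations of the mean wide on each side (0.30 / 0.05), so the confidence is effectively 100%.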

Confidence Interval

• For Small Samples ( N < 60 ), the Q% Confidence Interval can be calculated using the Student-t distribution, which is similar to the normal distribution but depends on N.

– With Q% confidence, the true mean $\mu_x$ will lie in the following interval about any sampled mean (the Q% confidence interval):

$$\bar{x} - t_{\nu,Q}\,S_{\bar{x}} \le \mu_x \le \bar{x} + t_{\nu,Q}\,S_{\bar{x}}$$

where

$$S_{\bar{x}} = \frac{S_x}{\sqrt{N}}, \qquad \nu = N - 1$$

$t_{\nu,Q}$ is defined in class notes Chapter 4, Appendix B.

Confidence Interval

Ex: A simple postal scale is supplied with ½, 1, 2, and 4 oz brass weights. For quality check, 14 of the 1 oz weights were measured on a precision scale. The results, in oz, are as follows:

1.08 1.03 0.96 0.95 1.04
1.01 0.98 0.99 1.05 1.08
0.97 1.00 0.98 1.01

Based on this sample, and assuming that the parent population of the weight is normally distributed, what is the 95% confidence interval for the “true” weight of the 1 oz brass weights?
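A sketch of the small-sample calculation for this data set. The Python standard library has no Student-t quantile function, so the critical value is hard-coded: 2.160 is the usual two-sided 95% table value for 13 degrees of freedom. Treat that constant as an assumption to be checked against Appendix B.

```python
import math
import statistics

# The 14 measured weights (oz) from the example.
weights = [1.08, 1.03, 0.96, 0.95, 1.04,
           1.01, 0.98, 0.99, 1.05, 1.08,
           0.97, 1.00, 0.98, 1.01]

# Assumed table value: two-sided 95% Student-t for nu = N - 1 = 13.
T_13_95 = 2.160

n = len(weights)
mean = sum(weights) / n
s_x = statistics.stdev(weights)           # sample std, N - 1 denominator
half_width = T_13_95 * s_x / math.sqrt(n)
low, high = mean - half_width, mean + half_width
print(f"mean = {mean:.4f} oz, S_x = {s_x:.4f} oz")
print(f"95% CI: {low:.4f} to {high:.4f} oz")
```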

Propagation of Error

Q: If you measured the diameter (D) and height (h) of a cylindrical container, how would the measurement error affect your estimation of the volume ( V = πD²h/4 )?

Q: What is the uncertainty in calculating the kinetic energy ( mv²/2 ) given the uncertainties in the measurements of mass (m) and velocity (v)?

How do errors propagate through calculations?

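• A Simple Example
Suppose that y is related to two independent quantities X1 and X2 through

$$y = C_1 X_1 + C_2 X_2 = f(X_1, X_2)$$

To relate the changes in y to the uncertainties in X1 and X2, we need to find dy = g(dX1, dX2):

$$dy = \frac{\partial f}{\partial X_1}\,dX_1 + \frac{\partial f}{\partial X_2}\,dX_2$$

The magnitude of dy is the expected change in y due to the uncertainties in X1 and X2:

$$\delta y = \sqrt{\left(\frac{\partial f}{\partial X_1}\,\delta x_1\right)^2 + \left(\frac{\partial f}{\partial X_2}\,\delta x_2\right)^2} = \sqrt{C_1^2\,\delta x_1^2 + C_2^2\,\delta x_2^2}$$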

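• General Formula
Suppose that y is related to n independent measured variables {X1, X2, …, Xn} by a functional representation:

$$y = f(X_1, X_2, \ldots, X_n)$$

Given the uncertainties of the X’s around some operating points:

$$\bar{x}_1 \pm \delta x_1,\; \bar{x}_2 \pm \delta x_2,\; \ldots,\; \bar{x}_n \pm \delta x_n$$

The expected value $\bar{y}$ and its uncertainty $\delta y$ are:

$$\bar{y} = f(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$$

$$\delta y = \sqrt{\left(\frac{\partial f}{\partial X_1}\,\delta x_1\right)^2 + \left(\frac{\partial f}{\partial X_2}\,\delta x_2\right)^2 + \cdots + \left(\frac{\partial f}{\partial X_n}\,\delta x_n\right)^2}$$

with the partial derivatives evaluated at $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$.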
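The general formula can be sketched numerically by estimating each partial derivative with a central difference at the operating point and combining the terms in root-sum-square fashion. The example applies it to the cylinder volume V = πD²h/4 from the earlier question. The function names, the step size and the numerical values of D, h and their uncertainties are all hypothetical.

```python
import math

def propagate(f, means, deltas, rel_step=1e-6):
    """Return (y, dy): f at the operating point and its root-sum-square
    uncertainty, with partial derivatives from central differences."""
    y = f(*means)
    total = 0.0
    for i, (x, dx) in enumerate(zip(means, deltas)):
        h = rel_step * (abs(x) if x != 0 else 1.0)
        up = list(means)
        down = list(means)
        up[i] = x + h
        down[i] = x - h
        dfdx = (f(*up) - f(*down)) / (2.0 * h)  # estimate of df/dX_i
        total += (dfdx * dx) ** 2
    return y, math.sqrt(total)

def cylinder_volume(d, h):
    return math.pi * d ** 2 * h / 4.0

# Hypothetical measurements: D = 0.10 +/- 0.001 m, h = 0.20 +/- 0.002 m.
volume, d_volume = propagate(cylinder_volume, [0.10, 0.20], [0.001, 0.002])
print(f"V = {volume:.6e} m^3 +/- {d_volume:.2e} m^3 "
      f"({100 * d_volume / volume:.2f}%)")
```

With 1% uncertainty in both D and h, the diameter term dominates because V depends on D squared: the relative uncertainty in V is √(2² + 1²) × 1% ≈ 2.24%.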


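• Proof:
Assume that the variability in measurement y is caused by k independent zero-mean error sources: e1, e2, . . . , ek. Then,

$$(y - y_{true})^2 = (e_1 + e_2 + \cdots + e_k)^2 = e_1^2 + e_2^2 + \cdots + e_k^2 + 2e_1e_2 + 2e_1e_3 + \cdots$$

$$E[(y - y_{true})^2] = E[e_1^2 + e_2^2 + \cdots + e_k^2 + 2e_1e_2 + 2e_1e_3 + \cdots] = E[e_1^2 + e_2^2 + \cdots + e_k^2]$$

The cross terms drop out because the error sources are independent and zero-mean, so $E[e_i e_j] = E[e_i]\,E[e_j] = 0$ for $i \ne j$. Therefore,

$$\sigma_y^2 = E[e_1^2] + E[e_2^2] + \cdots + E[e_k^2] = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_k^2$$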


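• Example (Standard Deviation of Sampled Mean)
Given

$$\bar{x} = \frac{1}{N}\left(X_1 + X_2 + X_3 + \cdots + X_N\right)$$

Use the general formula for error propagation:

$$\sigma_{\bar{x}} = \sqrt{\left(\frac{\partial \bar{x}}{\partial X_1}\,\sigma_{x_1}\right)^2 + \left(\frac{\partial \bar{x}}{\partial X_2}\,\sigma_{x_2}\right)^2 + \left(\frac{\partial \bar{x}}{\partial X_3}\,\sigma_{x_3}\right)^2 + \cdots + \left(\frac{\partial \bar{x}}{\partial X_N}\,\sigma_{x_N}\right)^2} = \sqrt{N\left(\frac{\sigma_x}{N}\right)^2} = \frac{\sigma_x}{\sqrt{N}}$$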


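Ex: What is the uncertainty in calculating the kinetic energy ( mv²/2 ) given the uncertainties in the measurements of mass (m) and velocity (v)?

$$\delta KE = \sqrt{\left(\frac{\partial KE}{\partial m}\,\delta m\right)^2 + \left(\frac{\partial KE}{\partial v}\,\delta v\right)^2} = \sqrt{\left(\frac{1}{2}v^2\,\delta m\right)^2 + \left(m v\,\delta v\right)^2} = \frac{1}{2}mv^2\sqrt{\left(\frac{\delta m}{m}\right)^2 + \left(\frac{2\,\delta v}{v}\right)^2}$$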
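A quick numerical check of the kinetic energy result: the sketch below evaluates the absolute uncertainty from the two partial-derivative terms and confirms it matches the relative form, in which the velocity term is doubled because KE depends on v squared. The mass, velocity and uncertainty values are hypothetical.

```python
import math

def kinetic_energy_uncertainty(m, dm, v, dv):
    """Return (KE, dKE) for KE = m*v^2/2 with uncertainties dm and dv."""
    ke = 0.5 * m * v ** 2
    term_m = 0.5 * v ** 2 * dm   # (dKE/dm) * dm
    term_v = m * v * dv          # (dKE/dv) * dv
    return ke, math.sqrt(term_m ** 2 + term_v ** 2)

# Hypothetical values: m = 2.0 +/- 0.02 kg, v = 3.0 +/- 0.03 m/s (1% each).
ke, d_ke = kinetic_energy_uncertainty(2.0, 0.02, 3.0, 0.03)
relative = math.sqrt((0.02 / 2.0) ** 2 + (2 * 0.03 / 3.0) ** 2)
print(f"KE = {ke:.3f} J +/- {d_ke:.3f} J ({100 * relative:.2f}%)")
```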

Least Squares Fitting of Data

• Best Linear Fit
– How do we characterize “BEST”?

Fit a linear model (relation)

$$\hat{y}_i = a_o + a_1 x_i$$

to N pairs of [xi, yi] measurements.

Given xi, the error between the estimated output $\hat{y}_i$ and the measured output yi is:

$$n_i = y_i - \hat{y}_i$$

The “BEST” fit is the model that minimizes the sum of the ___________ of the error:

$$\min \sum_{i=1}^{N} n_i^2 = \min \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2$$

[Figure: Output Y vs. Input X, showing the measured outputs yi and the best linear fit yest.]

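Least Square Error

Let

$$J = \sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2 = \sum_{i=1}^{N}\left(y_i - a_o - a_1 x_i\right)^2$$

The two independent variables are?

Q: What are we trying to solve?

Minimize J: Find $a_o$ and $a_1$ such that $dJ = 0$:

$$\frac{\partial J}{\partial a_o} = 0 \;\Rightarrow\; -2\sum_{i=1}^{N}\left(y_i - a_o - a_1 x_i\right) = 0$$

$$\frac{\partial J}{\partial a_1} = 0 \;\Rightarrow\; -2\sum_{i=1}^{N} x_i\left(y_i - a_o - a_1 x_i\right) = 0$$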


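Rewrite the last two equations as two simultaneous equations for $a_o$ and $a_1$:

$$a_o N + a_1 \sum x_i = \sum y_i$$

$$a_o \sum x_i + a_1 \sum x_i^2 = \sum x_i y_i$$

or, in matrix form,

$$\begin{bmatrix} N & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix} \begin{bmatrix} a_o \\ a_1 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}$$

Solving gives

$$a_o = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{\Delta}, \qquad a_1 = \frac{N\sum x_i y_i - \sum x_i \sum y_i}{\Delta}$$

where

$$\Delta = N\sum x_i^2 - \left(\sum x_i\right)^2$$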

• Summary: Given N pairs of input/output measurements [xi, yi], the best linear Least Squares model from input xi to output yi is:

$$\hat{y}_i = a_o + a_1 x_i$$

where

$$a_o = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{\Delta}, \qquad a_1 = \frac{N\sum x_i y_i - \sum x_i \sum y_i}{\Delta}$$

and

$$\Delta = N\sum x_i^2 - \left(\sum x_i\right)^2$$

• The process of minimizing squared error can be used for fitting nonlinear models and in many engineering applications.

• The same result can also be derived from a probability distribution point of view (see Course Notes, Ch. 4 - Maximum Likelihood Estimation).

Q: Given a theoretical model y = ao + a2 x², what are the Least Squares estimates for ao & a2?
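The closed-form least squares estimates translate directly into code. The sketch below accumulates the four sums and applies the formulas for the intercept and slope; the function name and the `xs`, `ys` data are hypothetical.

```python
def linear_least_squares(xs, ys):
    """Return (a0, a1) minimizing the sum of (y_i - a0 - a1*x_i)^2."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    delta = n * sum_xx - sum_x ** 2          # Delta = N*Sxx - (Sx)^2
    a0 = (sum_xx * sum_y - sum_x * sum_xy) / delta
    a1 = (n * sum_xy - sum_x * sum_y) / delta
    return a0, a1

# Hypothetical input/output measurements, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]
a0, a1 = linear_least_squares(xs, ys)
print(f"y_est = {a0:.3f} + {a1:.3f} x")
```

The denominator is zero only when all xi are identical, in which case no unique line exists; a production version would guard against that.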


• Variance of the fit:

$$\sigma_n^2 = \frac{1}{N-2}\sum_{i=1}^{N}\left(y_i - a_o - a_1 x_i\right)^2$$

• Variance of the measurements in y: $\sigma_y^2$

• Assume measurements in x are precise.

• Correlation coefficient:

$$R^2 = 1 - \frac{\sigma_n^2}{\sigma_y^2}$$

is a measure of how well the model explains the data. R² = 1 implies that the linear model fits the data perfectly.
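A sketch of how the fit quality might be computed for a fitted line. It assumes the variance of the fit uses N − 2 in the denominator (two fitted parameters) and that the variance of the y measurements is estimated as the sample variance with N − 1; the second choice is an assumption, since the slides do not spell out how that variance is obtained. The data and the coefficients passed in are hypothetical.

```python
def fit_quality(xs, ys, a0, a1):
    """Return (variance of fit, variance of y, R^2) for y_est = a0 + a1*x."""
    n = len(xs)
    residual_ss = sum((y - a0 - a1 * x) ** 2 for x, y in zip(xs, ys))
    var_fit = residual_ss / (n - 2)     # two parameters were fitted
    y_mean = sum(ys) / n
    # Assumed estimator: sample variance of y with N - 1 in the denominator.
    var_y = sum((y - y_mean) ** 2 for y in ys) / (n - 1)
    return var_fit, var_y, 1.0 - var_fit / var_y

# Hypothetical data and the least squares line fitted to it.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]
var_fit, var_y, r_squared = fit_quality(xs, ys, a0=1.1, a1=1.96)
print(f"var_fit = {var_fit:.4f}, var_y = {var_y:.3f}, R^2 = {r_squared:.4f}")
```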
