
CM3215 Statistics 5: Linear Regression and LINEST (Faith A. Morrison)

1/26/2016


CM3215

Fundamentals of Chemical Engineering Laboratory

Professor Faith Morrison

Department of Chemical Engineering
Michigan Technological University

© Faith A. Morrison, Michigan Tech U.


Typing Equations in MS Word 2010: https://www.youtube.com/watch?v=ceNp9meHTmY


Where are we in our discussion of error analysis?

Let’s revisit:



Summary: Error Analysis with Real Numbers

• To understand the accuracy of our numbers, we need to determine a confidence interval:

answer $= \bar{x} \pm 2e_s$ with 95.0% confidence

• The standard error $e_s$ for a measured quantity is the largest of: the value determined by replicates, $s/\sqrt{n}$; or the value from an estimate of reading error, (reading error)$/\sqrt{3}$; or the value from an estimate of calibration error, (max error)$/2$.

• Standard error for derived quantities (arrived at from equations) is obtained through error propagation, which is a combination of variances.

For replicate data with $n < 7$, replace "2" with $t_{0.025,\,n-1}$.

From Lecture 4—Error Propagation:


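Error Propagation

For a function $f(x_1, x_2, x_3, \ldots)$, we use an analysis based on the Taylor series expansion of a nonlinear function.

Taylor series:

$$f(x_1, x_2, x_3) = f(\bar{x}_1, \bar{x}_2, \bar{x}_3) + \frac{\partial f}{\partial x_1}(x_1 - \bar{x}_1) + \frac{\partial f}{\partial x_2}(x_2 - \bar{x}_2) + \frac{\partial f}{\partial x_3}(x_3 - \bar{x}_3) + \text{(higher order terms)}$$

A calculation of the function $f(x_1, x_2, x_3)$ from uncertain values of $x_1, x_2, x_3$ is a random variable of mean $\bar{f}$ and variance $\sigma_f^2$:

$$\sigma_f^2 = \left(\frac{\partial f}{\partial x_1}\right)^2 \sigma_{x_1}^2 + \left(\frac{\partial f}{\partial x_2}\right)^2 \sigma_{x_2}^2 + \left(\frac{\partial f}{\partial x_3}\right)^2 \sigma_{x_3}^2 + \text{(covariance terms, if the } x_i \text{ are correlated)} + \text{(higher order terms)}$$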



Neglecting the covariance terms and the higher order terms, the variance of $f(x_1, x_2, x_3)$ calculated from uncertain values of $x_1, x_2, x_3$ is

$$\sigma_f^2 \approx \left(\frac{\partial f}{\partial x_1}\right)^2 \sigma_{x_1}^2 + \left(\frac{\partial f}{\partial x_2}\right)^2 \sigma_{x_2}^2 + \left(\frac{\partial f}{\partial x_3}\right)^2 \sigma_{x_3}^2$$

Note: covariance terms are not always zero or small, but they often are. For now, this is fine.


Worksheet for error propagation: www.chem.mtu.edu/~fmorriso/cm3215/ErrorPropagationWorksheet.pdf



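Example 1: What is the uncertainty (95% confidence interval) in $\rho_{BF}$ as determined in the lab?

$$\rho_{BF} = \frac{M_F - M_E}{V_{pyc}}, \qquad M_F = 30.800\ \text{g}, \quad M_E = 13.410\ \text{g}, \quad V_{pyc} = 10.00\ \text{ml}$$

$$\frac{\partial \rho_{BF}}{\partial M_F} = \frac{1}{V_{pyc}}, \qquad \frac{\partial \rho_{BF}}{\partial M_E} = -\frac{1}{V_{pyc}}, \qquad \frac{\partial \rho_{BF}}{\partial V_{pyc}} = -\frac{M_F - M_E}{V_{pyc}^2}$$

$$e_{M_F} = e_{M_E} = 5.8 \times 10^{-5}\ \text{g} \;\;(e^2 = 3.3 \times 10^{-9}\ \text{g}^2), \qquad e_{V_{pyc}} = 0.02\ \text{ml}$$

$$e_s^2 = 1.21 \times 10^{-5}\ \text{g}^2/\text{ml}^2, \qquad e_s = 0.0035\ \text{g/ml}$$

$$\rho_{BF} = 1.739 \pm 0.007\ \text{g/ml}$$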



Error propagation worksheet for Example 1:

f(x1, x2, x3) = ρ_BF = 1.739 g/ml; 2e_s = 0.007 g/ml

| x_i | quantity | value | df/dx_i | (df/dx_i)^2 | e_xi | e_xi^2 | (df/dx_i)^2 e_xi^2 |
| x1 | M_F | 30.800 g | 0.10 | 0.010 | 5.8E-05 | 3.3E-09 | 3.33E-11 g^2/ml^2 |
| x2 | M_E | 13.410 g | -0.10 | 0.010 | 5.8E-05 | 3.3E-09 | 3.33E-11 g^2/ml^2 |
| x3 | V_pyc | 10.000 ml | -0.174 | 0.0302 | 0.02 | 4.0E-04 | 1.210E-05 g^2/ml^2 |

e_s^2 = 1.21E-05 g^2/ml^2 (sum of the last column)

e_s = 0.0035 g/ml

Excel is an excellent tool for error propagation
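The same worksheet arithmetic can be sketched outside Excel. The Python below is only an illustration, not part of the course materials: the helper name `propagate` is hypothetical, the density formula $(M_F - M_E)/V_{pyc}$ is inferred from the worksheet's derivative column, and covariance terms are neglected as on the error propagation slide.

```python
import math

def propagate(partials, std_errors):
    """Combine variances: e_s = sqrt(sum((df/dx_i)^2 * e_xi^2)).

    Covariance terms are neglected (an assumption, as in the worksheet).
    """
    return math.sqrt(sum((d * e) ** 2 for d, e in zip(partials, std_errors)))

# Worksheet values for Example 1
M_F, M_E, V_pyc = 30.800, 13.410, 10.000   # g, g, ml
e_M, e_V = 5.8e-5, 0.02                    # standard errors in g and ml

rho = (M_F - M_E) / V_pyc                  # density, g/ml
partials = [
    1.0 / V_pyc,                           # d(rho)/d(M_F)
    -1.0 / V_pyc,                          # d(rho)/d(M_E)
    -(M_F - M_E) / V_pyc ** 2,             # d(rho)/d(V_pyc)
]
e_s = propagate(partials, [e_M, e_M, e_V])

print(f"rho = {rho:.3f} +/- {2 * e_s:.3f} g/ml")   # prints: rho = 1.739 +/- 0.007 g/ml
```

The volume term dominates the sum, which is what the last column of the worksheet shows as well.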




Summary: Error Analysis with Real Numbers

• To understand the accuracy of our numbers, we need to determine a confidence interval: answer $= \bar{x} \pm 2e_s$ with 95.0% confidence. For replicate data with $n < 7$, replace "2" with $t_{0.025,\,n-1}$.

• Standard error for derived quantities (arrived at from equations) is obtained through error propagation, which is a combination of variances.

• Replication improves the estimation of the mean. The answer from replicates is more reliable than single values (if there are no systematic errors).

• The weighting values indicate the impact of individual errors on the final value.

• The prediction interval of the next value of $x$ should encompass 95% of all measured values.

• The standard error for a measured quantity is the sum, in quadrature, of: the value determined by replicates, the estimate of reading error, and the estimate of calibration error.

• Estimates for (particularly those obtained through ) may need to be re‐evaluated, if unreasonably narrow confidence intervals are identified.

95% PI: or if



Now, how do we determine uncertainty from numbers that we obtain as parameters in a curve-fit?





Uncertainty in Least Squares Curve Fitting: Excel’s LINEST


Reference: www.chem.mtu.edu/~fmorriso/cm3215/UncertaintySlopeInterceptOfLeastSquaresFit.pdf

1. Quick start: Replicate error
2. Reading Error
3. Calibration Error
4. Error Propagation
5. Least Squares Curve Fitting


Ordinary, Least Squares, Linear Regression

Question: For a dataset of $n$ data pairs $(x_i, y_i)$, $i = 1, 2, \ldots, n$, that is expected to show a linear relationship between $x$ and $y$, what are the parameters $m$ and $b$ of the equation for the line?

$$y = mx + b, \qquad m = \text{slope}, \quad b = \text{intercept}$$




Solution:

• Assume you know the $x_i$ with certainty ("ordinary" least squares)

• Guess a line, $\hat{y} = mx + b$

• Create a measure of the error between the guess and the data (the error measure should always be positive, so square it): $(y_i - \hat{y}_i)^2$

• Add these individual error measures to calculate a sum of squared errors, $SS_E \equiv \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

• Use calculus (derivatives) to find the values of $m$ and $b$ that result in the least sum of squared error.


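Result: $y = mx + b$, with $m$ = slope and $b$ = intercept

Least squares slope:

$$m = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$$

Least squares intercept:

$$b = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$$

In Excel: SLOPE(y-range, x-range), INTERCEPT(y-range, x-range)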


The slope $m$ and intercept $b$ are calculated from the data pairs $(x_i, y_i)$ with the least squares formulas (in Excel: SLOPE(y-range, x-range) and INTERCEPT(y-range, x-range)).

These are the formulas used in Excel trendlines.
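As a cross-check on SLOPE and INTERCEPT, the ordinary least squares sums can be evaluated directly. This is a minimal sketch: the function name `least_squares_line` and the five data pairs are made up for illustration and are not course data.

```python
def least_squares_line(x, y):
    """Ordinary least squares slope m and intercept b for y = m*x + b."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    denom = n * sum_xx - sum_x ** 2          # shared denominator of both formulas
    m = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_xx * sum_y - sum_x * sum_xy) / denom
    return m, b

# Hypothetical data, roughly linear
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

m, b = least_squares_line(x, y)
print(f"slope = {m:.3f}, intercept = {b:.3f}")
```

Pasting the same five pairs into a spreadsheet and calling SLOPE and INTERCEPT should reproduce these two numbers.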


But what are the error limits on $m$ and $b$?

slope $= m \pm\ ?$
intercept $= b \pm\ ?$

Answer:

slope $= m \pm 2s_m$
intercept $= b \pm 2s_b$

But what is $s_m$?

(Later we will correct the "2" for small $n$.)




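Error limits on $m$: Ordinary, Least Squares, Linear Regression

Answer: apply error propagation to the formula for the slope, treating $m$ as a function of the data:

$$m = m(x_1, y_1, x_2, y_2, \ldots, x_n, y_n) = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$$

$$s_m^2 = \left(\frac{\partial m}{\partial y_1}\right)^2 s_{y_1}^2 + \left(\frac{\partial m}{\partial y_2}\right)^2 s_{y_2}^2 + \cdots + \left(\frac{\partial m}{\partial y_n}\right)^2 s_{y_n}^2$$

Only the $y_i$ are variables; we assumed we knew the $x_i$ with certainty.

Assume that the variances of the $y_i$ are the same for all $x_i$: $s_{y_i}^2 = s_{y,x}^2$.

($s_{y,x}$ is the standard deviation of $y$ at a given value of $x$.)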


The variance of $y$, given $x$ (the variance about the mean value of $y$ at a given $x$):

$$s_{y,x}^2 \equiv \frac{1}{n-2} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2$$

(This formula comes from the definition of variance.)

Here $\hat{y}_i = m x_i + b$ is the best value of $y$ at a given $x_i$, calculated from the fit.

($s_{y,x}$ is the standard deviation of $y$ at a given value of $x$; ordinary least squares assumes it is constant.)

In Excel:
• $s_{y,x}$ = STEYX(y-range, x-range), or
• use LINEST


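What are the error limits on $m$?

Answer:

slope $= m \pm 2s_m$, with

$$s_m^2 = \frac{s_{y,x}^2}{SS_{xx}}, \qquad SS_{xx} \equiv \sum_{i=1}^{n} (x_i - \bar{x})^2$$

for $n - 2 < 6$: slope $= m \pm t_{0.025,\,n-2}\, s_m$

In Excel:
• $s_m$ = STEYX(y-range, x-range)/SQRT(DEVSQ(x-range)), or
• use LINEST

(This is the final result of the algebra indicated on the error propagation slide.)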
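For readers working outside Excel, the slope uncertainty can be sketched as below. The function name, the sample data, and the small lookup table of two-tailed 95% Student's t values are choices made here for illustration; the comments mark which quantities correspond to STEYX and DEVSQ.

```python
import math

# Two-tailed 95% Student's t values, t_{0.025, nu}, for small degrees of freedom nu
T_025 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}

def slope_uncertainty(x, y):
    """Return (m, s_yx, s_m, half_width) for an ordinary least squares line."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    ss_xx = sum((xi - xbar) ** 2 for xi in x)             # Excel: DEVSQ(x-range)
    # Equivalent form of the least squares slope and intercept
    m = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ss_xx
    b = ybar - m * xbar
    sse = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))                       # Excel: STEYX(y-range, x-range)
    s_m = s_yx / math.sqrt(ss_xx)
    factor = T_025.get(n - 2, 2.0)                        # use "2" once n - 2 >= 6
    return m, s_yx, s_m, factor * s_m

# Hypothetical data, n = 5, so n - 2 = 3 and the t value replaces "2"
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

m, s_yx, s_m, half_width = slope_uncertainty(x, y)
print(f"slope = {m:.3f} +/- {half_width:.3f} (95% CI)")
```

With only five points the t value (3.182) widens the interval noticeably compared with the large-n factor of 2.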



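What are the error limits on $b$?

Answer: solve the same way, with error propagation on the formula for $b$:

$$b = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}, \qquad s_b^2 = \sum_{i=1}^{n} \left(\frac{\partial b}{\partial y_i}\right)^2 s_{y,x}^2$$

intercept $= b \pm 2s_b$, with

$$s_b^2 = s_{y,x}^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{SS_{xx}}\right), \qquad SS_{xx} \equiv \sum_{i=1}^{n} (x_i - \bar{x})^2$$

for $n - 2 < 6$: intercept $= b \pm t_{0.025,\,n-2}\, s_b$

In Excel:
• Calculate $s_b$ from STEYX(y-range, x-range) and DEVSQ(x-range) and the formula above, or
• use LINEST

(This is the final result of the algebra indicated on the error propagation slide.)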
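The algebra behind the intercept result can be sketched in a few lines. This is one possible route, not necessarily the one taken in the handout: it assumes the equivalent forms $m = \sum (x_i - \bar{x}) y_i / SS_{xx}$ and $b = \bar{y} - m\bar{x}$, with $SS_{xx} \equiv \sum (x_i - \bar{x})^2$ (the quantity DEVSQ(x-range) returns) and a constant variance $s_{y,x}^2$ for every $y_i$.

```latex
\begin{align*}
\frac{\partial m}{\partial y_i} &= \frac{x_i - \bar{x}}{SS_{xx}},
&
\frac{\partial b}{\partial y_i} &= \frac{1}{n} - \bar{x}\,\frac{x_i - \bar{x}}{SS_{xx}} \\[4pt]
s_b^2 &= s_{y,x}^2 \sum_{i=1}^{n} \left(\frac{\partial b}{\partial y_i}\right)^2
&&= s_{y,x}^2 \left[\frac{1}{n}
   - \frac{2\bar{x}}{n\,SS_{xx}} \sum_{i=1}^{n} (x_i - \bar{x})
   + \frac{\bar{x}^2}{SS_{xx}^2} \sum_{i=1}^{n} (x_i - \bar{x})^2 \right] \\[4pt]
&&&= s_{y,x}^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{SS_{xx}}\right)
\end{align*}
```

The middle term vanishes because $\sum (x_i - \bar{x}) = 0$; the same derivative $\partial m / \partial y_i$ gives $s_m^2 = s_{y,x}^2 / SS_{xx}$.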


For instructions on how to use Microsoft Excel's LINEST function, see the handout on the web (the appendix has some derivations, if you're interested):

www.chem.mtu.edu/~fmorriso/cm3215/UncertaintySlopeInterceptOfLeastSquaresFit.pdf



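Error limits on $\hat{y}_0$: Ordinary, Least Squares, Linear Regression

What are the error limits on a value of $y$ obtained from the equation $y = mx + b$?

At a chosen $x_0$: $\hat{y}_0 = m x_0 + b$, with error limits $\hat{y}_0 \pm 2 s_{\hat{y}_0}$. What is $s_{\hat{y}_0}$?

Answer: propagate the error through $\hat{y}_0 = m x_0 + b$. The chosen $x_0$ carries no error, and $\partial \hat{y}_0 / \partial m = x_0$, $\partial \hat{y}_0 / \partial b = 1$.

But $m$ and $b$ are not independent (both are calculated from the $(x_i, y_i)$), so the covariance term must be kept:

$$s_{\hat{y}_0}^2 = \left(\frac{\partial \hat{y}_0}{\partial m}\right)^2 s_m^2 + \left(\frac{\partial \hat{y}_0}{\partial b}\right)^2 s_b^2 + 2\, \frac{\partial \hat{y}_0}{\partial m} \frac{\partial \hat{y}_0}{\partial b}\, \mathrm{Cov}(m, b)$$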
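To see how the covariance term changes the answer, the propagation can be carried through explicitly. The sketch below relies on standard ordinary least squares results that are stated here as assumptions rather than taken from the slides: $s_m^2 = s_{y,x}^2 / SS_{xx}$, $s_b^2 = s_{y,x}^2 (1/n + \bar{x}^2 / SS_{xx})$, and $\mathrm{Cov}(m, b) = -\bar{x}\, s_{y,x}^2 / SS_{xx}$, with $SS_{xx} \equiv \sum (x_i - \bar{x})^2$.

```latex
\begin{align*}
s_{\hat{y}_0}^2
 &= x_0^2\, s_m^2 + s_b^2 + 2 x_0\, \mathrm{Cov}(m, b) \\[4pt]
 &= s_{y,x}^2 \left[ \frac{x_0^2}{SS_{xx}} + \frac{1}{n} + \frac{\bar{x}^2}{SS_{xx}}
      - \frac{2 x_0 \bar{x}}{SS_{xx}} \right] \\[4pt]
 &= s_{y,x}^2 \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_{xx}} \right]
\end{align*}
```

Because the covariance is negative whenever $\bar{x} > 0$, dropping it would overstate the uncertainty near the center of the data; the full result is smallest at $x_0 = \bar{x}$.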



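What are the error limits on a value of $y$ obtained from the equation $y = mx + b$?

Answer: at $x_0$, $\hat{y}_0 = (m x_0 + b) \pm 2 s_{\hat{y}_0}$, with

$$s_{\hat{y}_0}^2 = s_{y,x}^2 \left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_{xx}}\right)$$

In Excel:
• $s_{y,x}$ = STEYX(y-range, x-range)
• $SS_{xx}$ = DEVSQ(x-range)
• $\bar{x}$ = AVERAGE(x-range)

for $n - 2 < 6$, replace "2" with $t_{0.025,\,n-2}$

(This is the final result of the algebra indicated on previous slide; see Appendix B of the handout.)

Use this for error limits on values obtained from the fit.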
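A short sketch of the confidence interval on a fitted value follows. It assumes the standard ordinary least squares expression $s_{\hat{y}_0}^2 = s_{y,x}^2 \left(1/n + (x_0 - \bar{x})^2 / SS_{xx}\right)$; the function name `fit_value_std_error` and the data are hypothetical.

```python
import math

def fit_value_std_error(x, y, x0):
    """Return (y0_hat, s_y0_hat): the fitted value at x0 and its standard error."""
    n = len(x)
    xbar = sum(x) / n                                     # Excel: AVERAGE(x-range)
    ybar = sum(y) / n
    ss_xx = sum((xi - xbar) ** 2 for xi in x)             # Excel: DEVSQ(x-range)
    m = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ss_xx
    b = ybar - m * xbar
    sse = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))                       # Excel: STEYX(y-range, x-range)
    s_y0_hat = s_yx * math.sqrt(1.0 / n + (x0 - xbar) ** 2 / ss_xx)
    return m * x0 + b, s_y0_hat

# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

y_mid, s_mid = fit_value_std_error(x, y, 3.0)   # at the mean of the x data
y_end, s_end = fit_value_std_error(x, y, 5.0)   # at the edge of the x data
print(f"x0 = 3: {y_mid:.2f}, standard error {s_mid:.4f}")
print(f"x0 = 5: {y_end:.2f}, standard error {s_end:.4f}")
```

The standard error is smallest at the mean of the x data and grows toward the ends, which is why CI bands around a fitted line flare outward.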


What are the error limits on a predicted next experimental value of $y$?

At $x_0$, we predict a new measurement of $y$ will fall in the prediction interval:

$$y_{0,n+1} = (m x_0 + b) \pm 2 s_{y_0,n+1}$$

Answer: solve with the same approach as we have been using: write the equation to calculate the quantity, then propagate the error. (See Appendix B of the handout.)

$$s_{y_0,n+1}^2 = s_{y,x}^2 \left(1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_{xx}}\right), \qquad SS_{xx} \equiv \sum_{i=1}^{n} (x_i - \bar{x})^2$$

for $n - 2 < 6$, replace "2" with $t_{0.025,\,n-2}$
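The difference between the two interval types comes down to one extra "1" under the square root. The sketch below takes the summary statistics as inputs; the function names and the numerical values (which stand in for STEYX, COUNT, AVERAGE, and DEVSQ results) are hypothetical, and the formulas are the standard ordinary least squares expressions.

```python
import math

def fit_std_error(s_yx, n, xbar, ss_xx, x0):
    """Standard error of the fitted value at x0 (used for the CI on the fit)."""
    return s_yx * math.sqrt(1.0 / n + (x0 - xbar) ** 2 / ss_xx)

def prediction_std_error(s_yx, n, xbar, ss_xx, x0):
    """Standard error of a next measured y at x0 (used for the PI)."""
    return s_yx * math.sqrt(1.0 + 1.0 / n + (x0 - xbar) ** 2 / ss_xx)

# Hypothetical summary statistics from a five-point fit
s_yx, n, xbar, ss_xx = 0.18886, 5, 3.0, 10.0
x0 = 3.0

s_fit = fit_std_error(s_yx, n, xbar, ss_xx, x0)
s_pred = prediction_std_error(s_yx, n, xbar, ss_xx, x0)
print(f"CI standard error: {s_fit:.4f}")
print(f"PI standard error: {s_pred:.4f}")
```

As n grows, the CI term shrinks toward zero while the PI term approaches $s_{y,x}$, so the PI never becomes narrower than the scatter of the data.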




[Figure: Aqueous Sugar Solutions, 20°C, 2014 (CM3215 Fall 2014 data). Density, g/ml, versus wt % sugar, showing the data, the trendline, the ±95% CI, and the ±95% PI.]

Prediction interval (PI) of data: notice that 95% of the data points fall within the PI; that's what it means to be a PI. The next data point likely will fall here too.

Confidence interval (CI) for values from the fit: for large $n$, the values of $\hat{y}$ at each $x$ are well predicted (the CI is narrow).


Note: if your data are replicates (data taken repeatedly at chosen $x$ values), do not pre-average the $y$-data and follow up with a least-squares curve fit. Instead, use all the replicates as individual values, and let LINEST find the least squared error.


Summary: Uncertainty in Ordinary, Least Squares, Linear Regression

• The Ordinary Least Squares Linear Regression method provides the equations needed to obtain the model parameters $m$ (slope) and $b$ (intercept) of $y = mx + b$.

• The equations for the parameters may be used with error propagation to obtain the variances associated with the parameters, $s_m^2$ and $s_b^2$.

• 95% confidence intervals on the parameters are constructed with $\pm 2s$ for large $n$. For $n - 2 < 6$, the 95% CI is constructed as $\pm t_{0.025,\,n-2}\, s$.

• We can construct 95% CI on the best values of $y$ at a chosen $x_0$. These CI are used for error range on the fit.

• We can construct 95% prediction intervals (PI) on a next value of $y$ at a chosen $x_0$; use these to evaluate the next experimental point acquired.


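Excel Summary: Uncertainty in Ordinary, Least Squares, Linear Regression

• AVERAGE(range)
• VAR.S(range)
• STDEV.S(range)
• COUNT(range)
• DEVSQ(x-range)
• SLOPE(y-range, x-range)
• INTERCEPT(y-range, x-range)
• $s_{y,x}$ = STEYX(y-range, x-range)
• LINEST (see handout)
• LOGEST (look it up)

With $\bar{x}$ = AVERAGE(x-range), $n$ = COUNT(x-range), and $SS_{xx}$ = DEVSQ(x-range):

• $s_m^2 = s_{y,x}^2 / SS_{xx}$
• $s_b^2 = s_{y,x}^2 \left(1/n + \bar{x}^2 / SS_{xx}\right)$
• $s_{\hat{y}_0}^2 = s_{y,x}^2 \left(1/n + (x_0 - \bar{x})^2 / SS_{xx}\right)$: use for CI error bars on $y$-values obtained from a fit
• $s_{y_0,n+1}^2 = s_{y,x}^2 \left(1 + 1/n + (x_0 - \bar{x})^2 / SS_{xx}\right)$: use for PI of next measured value of $y$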


Excel Handy List: Uncertainty in Ordinary, Least Squares, Linear Regression

• TREND(known-y's, known-x's, new-x's) for $x$ and $y$ related by $y = mx + b$

• GROWTH(known-y's, known-x's, new-x's) for $x$ and $y$ related by $y = b\,m^x$


One final piece of advice: Uncertainty in Ordinary, Least Squares, Linear Regression

Often, you can transform your data to make it linear, allowing you to use linear regression. For example, if you know the $y$-data vary as the square root of the $x$-data, then $y$ versus $\sqrt{x}$ will be linear. If data plotted with log-log scaling (using scatterplot) look quadratic, then $\log y$ versus $\log x$ will be quadratic, and we can use trendline to obtain a fit:

$$\log y = a_2 (\log x)^2 + a_1 \log x + a_0$$

Transforming data can greatly broaden our ability to fit empirical models to data.
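A small sketch of the square-root case: transform the $x$-data first, then apply the ordinary least squares formulas to the transformed variable. The data below are synthetic, generated from $y = 3\sqrt{x} + 1$ purely for illustration, and the function name is hypothetical.

```python
import math

def least_squares_line(u, y):
    """Ordinary least squares slope and intercept for y = m*u + b."""
    n = len(u)
    sum_u = sum(u)
    sum_y = sum(y)
    sum_uy = sum(ui * yi for ui, yi in zip(u, y))
    sum_uu = sum(ui * ui for ui in u)
    denom = n * sum_uu - sum_u ** 2
    m = (n * sum_uy - sum_u * sum_y) / denom
    b = (sum_uu * sum_y - sum_u * sum_uy) / denom
    return m, b

# Synthetic data that follow y = 3*sqrt(x) + 1 exactly
x = [1.0, 4.0, 9.0, 16.0]
y = [4.0, 7.0, 10.0, 13.0]

u = [math.sqrt(xi) for xi in x]     # transformed variable: u = sqrt(x)
m, b = least_squares_line(u, y)     # y is linear in u, so linear regression applies
print(f"y = {m:.2f} * sqrt(x) + {b:.2f}")
```

In a spreadsheet the equivalent step is to add a column of SQRT(x) values and pass that column as the x-range.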




Done!




Comment on Curve Fitting: Coefficient of Determination, $R^2$

Which data set has a larger $R^2$?

[Figure: two data sets plotted as y-data versus x-data, each with a linear trendline.]

• y = 2.00832x + 6.42864, R² = 0.86297
• y = 0.0117x + 6.8857, R² = 0.0053


From page 6:

$R^2$ is a measure of the comparison of the hypothesized linear relationship $y = mx + b$ and the relationship $y =$ constant (horizontal line). So, if it is a horizontal line, $R^2$ will be zero.
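The comparison can be made concrete with the usual definition $R^2 = 1 - SS_E / SS_T$, where $SS_E$ is the squared error about the fitted line and $SS_T$ is the squared error about the horizontal line $y = \bar{y}$. That definition is assumed here (the slide's own formula is not reproduced above), and both data sets are invented for illustration.

```python
def r_squared(x, y):
    """Coefficient of determination for an ordinary least squares line."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    ss_xx = sum((xi - xbar) ** 2 for xi in x)
    m = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ss_xx
    b = ybar - m * xbar
    ss_e = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))   # error about the line
    ss_t = sum((yi - ybar) ** 2 for yi in y)                       # error about y = constant
    return 1.0 - ss_e / ss_t

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_trend = [2.1, 3.9, 6.2, 7.8, 10.1]   # strong linear trend
y_flat = [2.0, 1.0, 3.0, 1.0, 2.0]     # scatter with zero least squares slope

r2_trend = r_squared(x, y_trend)
r2_flat = r_squared(x, y_flat)
print(f"R^2 (trend) = {r2_trend:.4f}")
print(f"R^2 (flat)  = {r2_flat:.4f}")
```

A low $R^2$ here says only that the line explains no more than a constant would; it says nothing about how small the scatter is.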


Which is the correct fit?

[Figure: the same y-data versus x-data plotted twice, once with a linear trendline and once with a fourth-order polynomial trendline.]

Linear trendline: $y = 2.005x + 6.6572$, $R^2 = 0.9458$

Fourth-order polynomial trendline: $y = -0.0106x^4 + 0.2386x^3 - 1.7052x^2 + 6.1052x + 4.4855$, $R^2 = 0.9625$

• It depends on the error bars.
• It is likely that the linear fit is a "truer" relationship to be used for interpolation.
