
Class meeting #7, Wednesday, Sept 16th

GEEN 1300 Introduction to Engineering Computing

Spreadsheet Problem Solving: general linear regression (polynomial models, multilinear models) using Data Analysis Regression and Trendline; nonlinear regression using Solver

[Figure: Viscosity of Water versus Temperature; Viscosity (cP) vs. Temperature (degF)]

Note: Section Test on Excel, Monday 9/29, 7-9 p.m., MATH 100

Homework #4 is posted, due next Wednesday

Example from last class

[Figure: CO2 Emissions for the US, 1989-2000; CO2 Emissions (MMT Carbon) vs. Year]

Use the Data Analysis ToolPak

Recall that if Data Analysis does not appear on the Data ribbon, you will need to check Analysis ToolPak in the Add-ins dialog box [if it's not there, you will have to go back to Microsoft Office/Excel set-up]

Initial, empty Regression dialog box

Regression dialog box set up for our problem

Checking Residuals will also give us model predictions
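The intercept, slope, predicted values, and residuals that the Regression tool reports for a straight-line model all come from ordinary least squares. A minimal Python sketch of that calculation, assuming hypothetical sample data (not the CO2 data from class); `fit_line` is an illustrative name, not an Excel function:

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for the model y = a + b*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sums of cross-products and squared deviations about the means
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


# Hypothetical sample data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(x, y)
predicted = [a + b * xi for xi in x]                   # model predictions
residuals = [yi - pi for yi, pi in zip(y, predicted)]  # measured minus predicted
print(f"intercept = {a:.3f}, slope = {b:.3f}")  # intercept = 0.050, slope = 1.990
```

For a model with an intercept, the least-squares residuals always sum to zero, which is a quick sanity check on a residual column.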

Initial (poorly formatted) Regression output display [on a new worksheet]

Adjust some column widths and fix up the display for appropriate significant figures

Final Display of Regression Output

[tons of info, most of which you will not understand for a couple of years]

used to judge goodness of fit

intercept and slope values

used to judge whether terms “belong” in the model

add to data graph for visual comparison with model

Judging Goodness of Fit

correlation coefficient: if close to +1 or -1, indicates strong correlation between x and y [something we already know from the original graph!]

coefficient of determination (R²): percentage of the variability in y that's accounted for by the model

adjusted R²: an adjustment to R² that penalizes the value for using a model with too many terms

standard error: gives an idea of how far off the model predictions will be

Adjusted R² or Standard Error can be used to compare different models and choose which fits best. The higher the value of Adjusted R², the better; the lower the value of Standard Error, the better.

Judging whether terms belong in the model

P-values estimate the probability that a coefficient estimate this far from zero could arise by chance if the true value of the coefficient were zero

P-values that are quite small, like these, indicate that there is little question about the significance of the term coefficients. In our case here, that means that both the intercept term and the slope term belong in the model.

A P-value of 5% (0.05) or greater causes suspicion that the coefficient may not be significant and that the term should probably be dropped from the model.
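Each P-value comes from a t statistic (the coefficient divided by its standard error) compared against Student's t distribution with n - k - 1 degrees of freedom. A minimal pure-Python sketch for the straight-line case, assuming hypothetical data; the helper names are illustrative. Because the standard library has no t distribution, the two-sided tail probability is computed with the standard continued-fraction evaluation of the regularized incomplete beta function (the approach described in Numerical Recipes):

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd continued-fraction coefficients for this m
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_p_value(t, dof):
    """Two-sided P-value for a t statistic with dof degrees of freedom."""
    return _reg_inc_beta(dof / 2.0, 0.5, dof / (dof + t * t))


# Hypothetical data, straight-line model (k = 1 predictor)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
x_mean, y_mean = sum(x) / n, sum(y) / n
sxx = sum((xi - x_mean) ** 2 for xi in x)
slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
intercept = y_mean - slope * x_mean
sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
s2 = sse / (n - 2)                                  # residual variance, n - k - 1 = n - 2
se_slope = math.sqrt(s2 / sxx)
se_intercept = math.sqrt(s2 * (1.0 / n + x_mean ** 2 / sxx))
p_intercept = t_p_value(intercept / se_intercept, n - 2)
p_slope = t_p_value(slope / se_slope, n - 2)
print(f"intercept P-value = {p_intercept:.3g}, slope P-value = {p_slope:.3g}")
```

With this made-up data the slope P-value is far below 0.05, while the intercept P-value is well above it, which by the rule of thumb above would suggest trying the model without an intercept.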

The Data Analysis Regression tool appears much more complicated and involved than the shortcut Trendline tool, so . . .

Why use Data Analysis Regression?

1) It provides more information that lets us judge the goodness of fit and significance of model terms

2) It can handle model forms that cannot be handled by Trendline

So, generally, when using Excel, we prefer the Data Analysis Regression tool over Trendline

but Trendline is still quite good for “quick and dirty” looks at the data

Learn to use both!

More complicated models

Polynomial models: $y = a + bx + cx^2 + dx^3$

General linear models: $y = a\,f_1(x) + b\,f_2(x) + c\,f_3(x) + d\,f_4(x)$

Note: it is called linear regression, even when there are nonlinear terms in x, because the terms are linear in the model parameters a, b, c, etc.

Examples: the polynomial models above; $y = a + b\,\frac{1}{x} + c\ln x$
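Because the model is linear in a, b, c, the same least-squares machinery works for any set of basis functions: each $f_j(x)$ becomes one column of the design matrix, much like adding computed columns to the X-input range. A minimal sketch that solves the normal equations $(X^TX)\,c = X^Ty$ by Gaussian elimination, assuming synthetic noise-free data generated from $y = 1 + 2/x + 3\ln x$; the function names are illustrative:

```python
import math


def solve(A, rhs):
    """Solve A c = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    c = [0.0] * n
    for r in range(n - 1, -1, -1):                 # back substitution
        s = sum(M[r][k] * c[k] for k in range(r + 1, n))
        c[r] = (M[r][n] - s) / M[r][r]
    return c


def linear_least_squares(basis, xs, ys):
    """Fit y = sum(coef_j * f_j(x)) by solving the normal equations."""
    X = [[f(x) for f in basis] for x in xs]        # one column per basis function
    m = len(basis)
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(m)] for i in range(m)]
    Xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(m)]
    return solve(XtX, Xty)


basis = [lambda x: 1.0, lambda x: 1.0 / x, lambda x: math.log(x)]
xs = [0.5, 1.0, 2.0, 3.0, 5.0, 8.0]
ys = [1.0 + 2.0 / x + 3.0 * math.log(x) for x in xs]  # synthetic, noise-free
a, b, c = linear_least_squares(basis, xs, ys)
print(f"a = {a:.6f}, b = {b:.6f}, c = {c:.6f}")
```

The normal equations are compact but square the conditioning of the problem; for badly scaled columns (for example high powers of a large x), a QR-based solver is more robust.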

Multilinear models: $y = a\,f_1(x_1, x_2, \ldots) + b\,f_2(x_1, x_2, \ldots) + c\,f_3(x_1, x_2, \ldots) + \cdots$

Examples: $y = a + b\,x_1 + c\,x_2 + d\,x_1 x_2$

$y = a\,e^{x_1/x_2}$

Nonlinear models

Transformable to linear: $y = a\,e^{bx}$, so $\ln y = \ln a + b\,x$, which is straight-line regression!

Not transformable to linear: $P = 10^{A - \frac{B}{T + C}}$; even $\log P = A - \frac{B}{T + C}$ is still nonlinear in C

We can use the Data Analysis Regression tool for everything except the nonlinear models that can't be transformed into linear. For those, we can use the Solver.
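Fitting the transformed model is ordinary straight-line regression of ln y on x, after which a = e^(intercept) and b = slope. A minimal sketch, assuming synthetic noise-free data generated from y = 2.5 e^(-0.4x); `fit_exponential` is an illustrative name:

```python
import math


def fit_exponential(x, y):
    """Fit y = a*exp(b*x) by straight-line regression of ln(y) on x (requires y > 0)."""
    ln_y = [math.log(yi) for yi in y]
    n = len(x)
    x_mean = sum(x) / n
    ly_mean = sum(ln_y) / n
    b = (sum((xi - x_mean) * (li - ly_mean) for xi, li in zip(x, ln_y))
         / sum((xi - x_mean) ** 2 for xi in x))
    ln_a = ly_mean - b * x_mean          # intercept of the transformed line
    return math.exp(ln_a), b


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.5 * math.exp(-0.4 * xi) for xi in x]  # synthetic, noise-free
a, b = fit_exponential(x, y)
print(f"a = {a:.4f}, b = {b:.4f}")
```

This minimizes squared error in ln y rather than in y, so with noisy data the result can differ somewhat from a direct nonlinear fit of y = a e^(bx) with the Solver.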

Example: polynomial regression

[Figure: Viscosity (cP) vs. Temperature (degF); curvature evident]

Setting up for polynomial fits

Select these for a quadratic model, etc.

Data Analysis Regression tool

check Labels because headings are included in selections for Y and X

check Residuals
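Setting up a polynomial fit amounts to adding x², x³, ... columns next to x and selecting the first p of them as the X-input range, then comparing orders by adjusted R² and standard error. A minimal sketch of that loop, assuming synthetic data from a cubic (not the viscosity data) and using the normal equations for brevity; `poly_fit` is an illustrative name:

```python
def solve(A, rhs):
    """Solve A c = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    c = [0.0] * n
    for r in range(n - 1, -1, -1):
        c[r] = (M[r][n] - sum(M[r][k] * c[k] for k in range(r + 1, n))) / M[r][r]
    return c


def poly_fit(x, y, order):
    """Least-squares polynomial of the given order; returns (coefs, adj_r2, std_error)."""
    X = [[xi ** p for p in range(order + 1)] for xi in x]  # columns 1, x, x^2, ...
    m = order + 1
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(m)] for i in range(m)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(m)]
    coefs = solve(XtX, Xty)
    pred = [sum(c * v for c, v in zip(coefs, r)) for r in X]
    n = len(y)
    y_mean = sum(y) / n
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    sst = sum((yi - y_mean) ** 2 for yi in y)
    dof = n - order - 1
    adj_r2 = 1.0 - (sse / dof) / (sst / (n - 1))
    return coefs, adj_r2, (sse / dof) ** 0.5


x = [0.5 * i for i in range(10)]                                   # 0.0 ... 4.5
y = [1.0 + 0.5 * xi - 0.2 * xi ** 2 + 0.03 * xi ** 3 for xi in x]  # synthetic cubic

results = {}
for order in (1, 2, 3):
    coefs, adj_r2, se = poly_fit(x, y, order)
    results[order] = (coefs, adj_r2, se)
    print(f"order {order}: adj R^2 = {adj_r2:.6f}, standard error = {se:.3g}")
```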

Quadratic model regression results

model performance: adj R²

copy to graph

model coefficients

Quadratic model really doesn’t “capture” behavior of data

[Figure: Viscosity (cP) vs. Temperature (degF), data and quadratic model]

Plot of residuals vs. temperature looks systematic, showing that the model is inadequate

[Figure: residuals vs. temperature for the quadratic model]

Continue with fits of cubic, 4th- & 5th-order polynomials

Summary of results

P-values for the model coefficients are given in the Intercept through x⁵ columns.

| Model order | Adj R² | Standard error | Intercept | x | x² | x³ | x⁴ | x⁵ |
|---|---|---|---|---|---|---|---|---|
| 2 | 98.05% | 0.0663 | 9.98E-11 | 1.05E-07 | 4.07E-06 | | | |
| 3 | 99.80% | 0.0210 | 2.56E-12 | 4.62E-09 | 3.50E-07 | 5.49E-06 | | |
| 4 | 99.98% | 0.0075 | 2.01E-12 | 4.19E-09 | 3.71E-07 | 6.47E-06 | 4.47E-05 | |
| 5 | 99.99% | 0.0039 | 6.77E-11 | 1.02E-07 | 7.72E-06 | 1.17E-04 | 6.97E-04 | 2.34E-03 |

Looks like 5th-order offers the best performance, but the improvement is marginal over 4th-order, so choose 4th-order.

Resulting model (Visc in cP, T in degF):

$\text{Visc} = 3.161 - 0.05699\,T + 5.023\times10^{-4}\,T^2 - 2.162\times10^{-6}\,T^3 + 3.593\times10^{-9}\,T^4$
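Once the coefficients are known, the model is cheap to evaluate anywhere in the temperature range of the data. A minimal sketch using Horner's rule with the 4th-order coefficients above (the function name `viscosity` is illustrative); extrapolating outside the measured temperature range is risky for any polynomial fit:

```python
# Coefficients of the 4th-order fit, lowest power first (Visc in cP, T in degF)
COEFS = [3.161, -0.05699, 5.023e-4, -2.162e-6, 3.593e-9]


def viscosity(T):
    """Evaluate the 4th-order model with Horner's rule: (((c4*T + c3)*T + c2)*T + c1)*T + c0."""
    result = 0.0
    for c in reversed(COEFS):
        result = result * T + c
    return result


for T in (60.0, 100.0, 150.0):
    print(f"T = {T:5.1f} degF -> viscosity = {viscosity(T):.4f} cP")
```

At T = 100 degF the model gives about 0.682 cP. Horner's rule needs only four multiplications and four additions and avoids computing large powers of T separately.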

[Figure: Viscosity (cP) vs. Temperature (degF)]

[Figure: residuals vs. temperature]

Residuals are still somewhat patterned with temperature, but much, much smaller

Using Trendline, instead of Data Analysis Regression

Set for Polynomial, Order: 4

Display equation on chart

[Figure: Viscosity (cP) vs. Temperature (degF), with Trendline]

y = 3.593E-09x^4 - 2.162E-06x^3 + 5.023E-04x^2 - 5.699E-02x + 3.161E+00

Precautions on polynomial fitting

Try to use the lowest-order model that gives a good fit.

Higher-order models will have “wiggles” between data points that will cause prediction errors.

In fact, an (n-1)th-order polynomial will provide a perfect fit to the n data points, but it will usually do bizarre things in between the data points.
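The perfect-fit claim is easy to check: with n points there is exactly one polynomial of degree n-1 through all of them, and it can be evaluated in Lagrange form without solving for its coefficients. A minimal sketch, assuming hypothetical, roughly linear data with a little scatter; printing the interpolant between the points lets you look for the wiggles:

```python
def lagrange_eval(xs, ys, t):
    """Evaluate the unique degree-(n-1) polynomial through the n points (xs, ys) at t."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (t - xj) / (xi - xj)  # basis polynomial: 1 at xi, 0 at other xj
        total += term
    return total


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0.0, 1.1, 1.9, 3.2, 3.9, 5.1, 5.9, 7.0]  # hypothetical, nearly linear data

# Zero residuals at every data point: a "perfect" fit
residuals = [yi - lagrange_eval(xs, ys, xi) for xi, yi in zip(xs, ys)]

# Between points, especially near the ends, the interpolant can stray from the trend
for t in (0.5, 3.5, 6.5):
    print(f"x = {t}: interpolant = {lagrange_eval(xs, ys, t):.3f}")
```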

Example: multi-linear regression

Model 1: $y = a + b\,x_1 + c\,x_2$

X-input range includes two independent variables: $x_1$ and $x_2$

Model 2: $y = b\,x_1 + c\,x_2$

A high P-value for the intercept in Model 1 suggests Model 2, without an intercept, but there is a significant loss in adj R²

Multilinear Model Performance

[Figure: Predicted y vs. Measured y, Model 1 and Model 2]

Model performance isn't that great for either model, and Model 1 doesn't appear dramatically better than Model 2

Note: for multi-linear models, we plot Predicted vs. Measured y. A perfect model would place points directly on the 45-degree line.
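A minimal sketch of the two multilinear models, assuming hypothetical noise-free data generated with a nonzero intercept. Model 1 includes a column of ones in the design matrix; Model 2 drops it (in Excel's Regression dialog box this corresponds to the Constant is Zero option). The (measured, predicted) pairs are the points of the Predicted vs. Measured plot; the function names are illustrative:

```python
def solve(A, rhs):
    """Solve A c = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    c = [0.0] * n
    for r in range(n - 1, -1, -1):
        c[r] = (M[r][n] - sum(M[r][k] * c[k] for k in range(r + 1, n))) / M[r][r]
    return c


def fit(rows, y):
    """Least-squares coefficients for y ~ rows . coef via the normal equations."""
    m = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(m)]
    return solve(XtX, Xty)


def predict(rows, coef):
    return [sum(c * v for c, v in zip(coef, r)) for r in rows]


x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [2.0 + 1.5 * u - 0.5 * v for u, v in zip(x1, x2)]  # hypothetical data

rows1 = [[1.0, u, v] for u, v in zip(x1, x2)]  # Model 1: y = a + b*x1 + c*x2
rows2 = [[u, v] for u, v in zip(x1, x2)]       # Model 2: y = b*x1 + c*x2
coef1, coef2 = fit(rows1, y), fit(rows2, y)

pairs1 = list(zip(y, predict(rows1, coef1)))   # (measured, predicted), Model 1
pairs2 = list(zip(y, predict(rows2, coef2)))   # (measured, predicted), Model 2
sse1 = sum((m - p) ** 2 for m, p in pairs1)
sse2 = sum((m - p) ** 2 for m, p in pairs2)
print(f"Model 1 SSE = {sse1:.3g}, Model 2 SSE = {sse2:.3g}")
```

Here the data really do have an intercept, so forcing it to zero leaves Model 2's points off the 45-degree line.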

Nonlinear Regression

Fitting the parameters of the van der Waals equation of state: data for SO2

$P = \frac{RT}{\hat{V} - b} - \frac{a}{\hat{V}^2}$

Find the values of a and b that give the best predictions for P, when compared to the measured values of P

Strategy for Nonlinear Regression

1) estimate initial values for a and b

2) compute predicted P's using the data for $\hat{V}$ and T

3) compute errors between predicted P’s and measured P’s

4) sum the squares of these errors to compute SSE

5) have the Solver minimize SSE by adjusting the values of a and b
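The five steps can be mirrored outside Excel. Excel's Solver uses a gradient-based method (GRG Nonlinear) by default; this sketch swaps in a simpler strategy that works here because P is linear in a: for each trial b, scanned over 0 <= b < 0.9 times the smallest $\hat{V}$ (which also enforces the b >= 0 constraint), the SSE-minimizing a has a closed form, and the (a, b) pair with the smallest SSE is kept. The data are synthetic, generated from illustrative values a = 0.68 Pa·m⁶/mol² and b = 5.6×10⁻⁵ m³/mol at assumed temperatures and molar volumes; they are not the SO2 data from class, and all function names are hypothetical:

```python
R = 8.314  # gas constant, J/(mol*K)


def vdw_pressure(a, b, T, V):
    """van der Waals: P = R*T/(V - b) - a/V**2 (a = b = 0 gives the ideal gas law)."""
    return R * T / (V - b) - a / V ** 2


def sse(a, b, data):
    """Steps 2-4: predicted P, errors vs. measured P, sum of squared errors."""
    return sum((P - vdw_pressure(a, b, T, V)) ** 2 for T, V, P in data)


def best_a(b, data):
    """For fixed b, residual = z + a*u with z = P - RT/(V - b), u = 1/V**2; minimize over a."""
    num = den = 0.0
    for T, V, P in data:
        z = P - R * T / (V - b)
        u = 1.0 / V ** 2
        num += z * u
        den += u * u
    return -num / den


def fit_vdw(data, steps=5000):
    """Step 5 substitute: scan b in [0, 0.9*min V] and keep the lowest-SSE (a, b, SSE)."""
    b_max = 0.9 * min(V for _, V, _ in data)
    best = None
    for i in range(steps + 1):
        b = b_max * i / steps
        a = best_a(b, data)
        s = sse(a, b, data)
        if best is None or s < best[2]:
            best = (a, b, s)
    return best


# Synthetic data (T in K, V in m^3/mol, P in Pa) from illustrative a, b
a_true, b_true = 0.68, 5.6e-5
data = [(T, V, vdw_pressure(a_true, b_true, T, V))
        for T in (350.0, 400.0, 450.0)
        for V in (2e-3, 1e-3, 6e-4, 4e-4, 3e-4)]

a_fit, b_fit, sse_fit = fit_vdw(data)
print(f"a = {a_fit:.4f}, b = {b_fit:.3e}, SSE = {sse_fit:.3g}")
print(f"ideal-gas SSE (a = b = 0) = {sse(0.0, 0.0, data):.3g}")
```

A brute-force scan needs no initial guess (step 1), but it only works because there is a single nonlinear parameter; with several nonlinear parameters, a Solver-style iterative optimizer starting from good initial estimates is the practical choice.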

Basic data

Calculated pressure by both ideal gas law and van der Waals

Sum of squares of this column

Ideal Gas Calculation

Sum of Squares Calculation

van der Waals Calculation

Error Calculation

Setting up Solver Parameters

SSE as Target Cell, Minimize, by adjusting a and b, with b >= 0 constraint

Results

Fit of van der Waals Eqn for SO2 and Comparison to Ideal Gas Law

[Figure: Predicted Pressure (Pa) vs. Measured Pressure (Pa), van der Waals and Ideal Gas]

Note departure of ideal gas predictions at higher pressures
