Regression Statistics 2014

Upload: claude-maxwell

Post on 03-Jan-2016


Page 1: 2014. Engineers often: Regress data  Analysis  Fit to theory  Data reduction Use the regression of others  Antoine Equation  DIPPR We need to be

Regression Statistics

2014


Engineers and Regression

Engineers often:

• Regress data
  • Analysis
  • Fit to theory
  • Data reduction
• Use the regression of others
  • Antoine Equation
  • DIPPR

[Figure: scatter plot of Y versus X.]

We need to be able to report uncertainties associated with regression.

• Do the data fit the model?
• What are the errors in the prediction?
• What are the errors in the parameters?


Linear Regression

There are two classes of regressions:

• Linear
• Non-linear

“Linear” refers to the parameters, not the functional dependence of the independent variable

You can use the Mathcad function “linfit” on linear equations


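Quiz

1. $y = ax^2 + bx + c$
2. $y = ae^{bx}$
3. $y = a + \frac{b}{T} + \frac{c}{T^3} + \frac{d}{T^4} + \frac{e}{T^5}$
4. $y = \exp\left(A - \frac{B}{T + C}\right)$
5. $y = mx + b$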


Straight Line Model

[Figure: measured "Y" data plotted against "X" data, with the fitted straight line ("Y" predicted). Labels mark the slope, the intercept, and a residual (error).]


Straight Line Model

slope      0.92291455
intercept  0.516173934

X_i          Y_i            Ŷ_i               e_i = Y_i - Ŷ_i
("X" data)   ("Y" data)     ("Y" predicted)   (residual, error)
 1            2.749032178    1.439088483       1.309943694
 2            3.719910224    2.362003033       1.357907192
 3            0.925995017    3.284917582      -2.35892257
 4            2.623482686    4.207832132      -1.58434945
 5            6.539797342    5.130746681       1.409050661
 6            6.779909177    6.053661231       0.726247946
 7            4.946150401    6.976575781      -2.03042538
 8            9.674178069    7.89949033        1.774687739
 9            7.61959821     8.82240488       -1.20280667
10            7.650020996    9.745319429      -2.09529843
11           11.514         10.66823398        0.845766021
12           13.18285068    11.59114853        1.591702152
13           13.28173635    12.51406308        0.767673275
14           13.60444592    13.43697763        0.16746829
15           12.79535218    14.35989218       -1.56454
16           17.82374778    15.28280673        2.540941056
17           14.55068379    16.20572128       -1.65503748

SSE (sum squared error) = 42.76608602

• sum squared error
• mean squared error
• Number of fitted parameters: 2 for a two-parameter model
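As a rough check of the numbers on this slide, the sketch below refits the tabulated data with ordinary least squares and recomputes the sum squared error. The mean squared error is taken here as SSE / (n - p), with p the number of fitted parameters; that definition is an assumption, since the slide's own equations did not survive extraction. The helper name `fit_line` is illustrative only.

```python
# Data from the straight-line example: X = 1..17, Y as tabulated on the slide.
x = list(range(1, 18))
y = [2.749032178, 3.719910224, 0.925995017, 2.623482686, 6.539797342,
     6.779909177, 4.946150401, 9.674178069, 7.61959821, 7.650020996,
     11.514, 13.18285068, 13.28173635, 13.60444592, 12.79535218,
     17.82374778, 14.55068379]


def fit_line(x, y):
    """Ordinary least-squares (slope, intercept) for y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


slope, intercept = fit_line(x, y)

# Residual e_i = measured Y_i minus predicted Y_i.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)

p = 2                      # fitted parameters: slope and intercept
mse = sse / (len(x) - p)   # assumed definition: SSE / (n - p)

print(f"slope = {slope:.6f}, intercept = {intercept:.6f}")
print(f"SSE = {sse:.5f}, MSE = {mse:.5f}")
```

The fitted slope and intercept should agree with the trendline reported for this data set (about 0.9229 x + 0.5162), and the SSE with the tabulated 42.766.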


The R² Statistic

A useful statistic but not definitive.

• Tells you how well the data fit the model.
• It does not tell you if the model is correct.
• How much of the distribution of the data about the mean is described by the model.
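The last point can be made concrete with the usual definition R² = 1 - SSE/SST, where SST is the sum of squared deviations of Y about its mean. That formula is the standard textbook one and is assumed here, because the slide's equation was not preserved. The sketch applies it to the 17-point straight-line example from the earlier slide.

```python
# Straight-line example data: X = 1..17, Y as tabulated earlier in the deck.
x = list(range(1, 18))
y = [2.749032178, 3.719910224, 0.925995017, 2.623482686, 6.539797342,
     6.779909177, 4.946150401, 9.674178069, 7.61959821, 7.650020996,
     11.514, 13.18285068, 13.28173635, 13.60444592, 12.79535218,
     17.82374778, 14.55068379]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares straight line.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# SSE: scatter of the data about the fitted line.
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
# SST: scatter of the data about the mean of Y.
sst = sum((yi - y_bar) ** 2 for yi in y)

r_squared = 1.0 - sse / sst   # assumed (standard) definition of R^2
print(f"R^2 = {r_squared:.6f}")
```

The result should match the R² of about 0.8904 that the Excel trendline reports for this data set.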


Problems with R²

[Figure: four scatter plots of Y versus X, each with a linear trendline. The four data sets look very different but give nearly identical fits:]

• f(x) = 0.500090909090909 x + 3.00009090909091, R² = 0.666542459508775
• f(x) = 0.499909090909091 x + 3.00172727272727, R² = 0.666707256898465
• f(x) = 0.499727272727273 x + 3.00245454545455, R² = 0.666324041066559
• f(x) = 0.5 x + 3.00090909090909, R² = 0.666242033727484
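The four trendlines on this slide match the well-known Anscombe's quartet. The slides do not name the data set, so treating the panels as Anscombe's data is an assumption. Under that assumption, a short sketch shows that four very different data sets produce almost the same slope, intercept, and R². The function name `fit_with_r2` is illustrative.

```python
# Assumption: the four panels are Anscombe's quartet (not named in the slides).
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                  7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                   6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                    6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_with_r2(x, y):
    """Return (slope, intercept, R^2) of the least-squares straight line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return slope, intercept, 1.0 - sse / sst


results = {name: fit_with_r2(x, y) for name, (x, y) in ANSCOMBE.items()}
for name, (slope, intercept, r2) in results.items():
    print(f"set {name}: f(x) = {slope:.4f} x + {intercept:.4f}, R^2 = {r2:.4f}")
```

Every set comes out near f(x) = 0.5 x + 3.0 with R² close to 0.667, which is why R² alone cannot tell a good model from a poor one.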


Residuals (e_i) should be normally distributed

[Figure: each of the four data sets shown as a scatter plot of Y versus X with its linear trendline, next to a plot of the residuals e for points 1 to 11.]
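A plot of the residuals is the main tool here, but a crude numerical screen can point to the same problem. The sketch below scales each residual by s = sqrt(SSE / (n - 2)) and flags points beyond two standard deviations. Both the two-sigma threshold and the use of Anscombe's data sets I and III are assumptions made for illustration; neither is stated in the slides, and this screen is not a formal normality test.

```python
import math

# Assumption: data sets I and III of Anscombe's quartet stand in for two panels.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y_SET_I = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
Y_SET_III = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]


def standardized_residuals(x, y):
    """Residuals of a least-squares line divided by s = sqrt(SSE / (n - 2))."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    e = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(ei ** 2 for ei in e) / (n - 2))
    return [ei / s for ei in e]


def flag_large_residuals(x, y, limit=2.0):
    """Indices of points whose standardized residual exceeds the limit."""
    return [i for i, z in enumerate(standardized_residuals(x, y))
            if abs(z) > limit]


flags_i = flag_large_residuals(X, Y_SET_I)
flags_iii = flag_large_residuals(X, Y_SET_III)
print("set I flagged points:", flags_i)
print("set III flagged points:", flags_iii)
```

Set I has no flagged points, while set III flags the single point at x = 13 that pulls the line away from the other ten.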


Other Questions About Fit

How well does the line fit the data at each point (not just the mean)?

In what range should the data lie (are there outliers)?

What are the confidence intervals on the slope and intercept?


Confidence Intervals from Example

[Figure: two of the example data sets, each plotted with the fitted line, the confidence interval, and the prediction band. Both axes run from 0 to 20.]

Coefficient values ± 95% Confidence Interval
a = 3.0001 ± 2.54
b = 0.50009 ± 0.267

Coefficient values ± 95% Confidence Interval
a = 3.0017 ± 2.54
b = 0.49991 ± 0.267

• Green is the confidence interval: How well does the model fit the data?
• Red is the prediction band: Where should the data fall? Can I throw out any points?


Statistics on the Slope/Intercept

When you fit data to a straight line, the slope and intercept are only estimates of the true slope and intercept.

(1 - α)·100% Confidence Intervals

Standard Errors

(slope)

(intercept)
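The equations for this slide were lost in extraction. The sketch below uses the standard textbook forms, which may differ in notation from the original: SE(slope) = s / sqrt(Sxx) and SE(intercept) = s · sqrt(1/n + x̄² / Sxx), with s² = SSE / (n - 2) and interval half-width t · SE. It is applied to Anscombe's data set I, on the assumption that this is the first example data set in the deck. The critical value 2.262 is the two-sided 95% t value for 9 degrees of freedom, taken from a t-table.

```python
import math

# Assumption: Anscombe's data set I stands in for the first example data set.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

T_CRIT = 2.262  # t-table value for alpha = 0.05 (two-sided), n - 2 = 9 d.o.f.


def slope_intercept_ci(x, y, t_crit):
    """Least-squares fit with standard errors and CI half-widths."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))                        # std. error of the fit
    se_slope = s / math.sqrt(sxx)
    se_intercept = s * math.sqrt(1.0 / n + x_bar ** 2 / sxx)
    return {
        "slope": slope,
        "intercept": intercept,
        "slope_half_width": t_crit * se_slope,
        "intercept_half_width": t_crit * se_intercept,
    }


ci = slope_intercept_ci(x, y, T_CRIT)
print(f"a = {ci['intercept']:.4f} ± {ci['intercept_half_width']:.2f}")
print(f"b = {ci['slope']:.5f} ± {ci['slope_half_width']:.3f}")
```

Under these assumptions the output agrees with the coefficient values reported for the example, a = 3.0001 ± 2.54 and b = 0.50009 ± 0.267.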


Confidence Interval on the Prediction & Expected Range of Data

[Figure: Y versus X for the straight-line example, showing the data, the Excel trendline, the least squares fit, the 95% CI for the prediction, and the expected range of data (95%). Trendline: f(x) = 0.922914602046951 x + 0.516173934846863, R² = 0.890424511795483.]

Confidence Interval on the Prediction

Given a specific value for x, x₀, what is the error in the predicted value ŷ₀?

Prediction Band: Expected Range of Data

If I take more data, where will the data fall? (Where will y₀ be found if measured at x₀?)
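The two intervals can be sketched with the standard textbook formulas, assumed here because the slide's equations were not preserved. For a point x0, the confidence interval half-width is t · s · sqrt(1/n + (x0 - x̄)² / Sxx), and the prediction band half-width is t · s · sqrt(1 + 1/n + (x0 - x̄)² / Sxx). The sketch uses the 17-point straight-line example; 2.131 is the two-sided 95% t value for 15 degrees of freedom from a t-table, and the function name `interval_half_widths` is illustrative.

```python
import math

# Straight-line example data: X = 1..17, Y as tabulated earlier in the deck.
x = list(range(1, 18))
y = [2.749032178, 3.719910224, 0.925995017, 2.623482686, 6.539797342,
     6.779909177, 4.946150401, 9.674178069, 7.61959821, 7.650020996,
     11.514, 13.18285068, 13.28173635, 13.60444592, 12.79535218,
     17.82374778, 14.55068379]

T_CRIT = 2.131  # t-table value for alpha = 0.05 (two-sided), n - 2 = 15 d.o.f.

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))


def interval_half_widths(x0):
    """Return (predicted y, CI half-width, prediction-band half-width) at x0."""
    y0_hat = intercept + slope * x0
    leverage = 1.0 / n + (x0 - x_bar) ** 2 / sxx
    ci_half = T_CRIT * s * math.sqrt(leverage)         # error in the line
    pb_half = T_CRIT * s * math.sqrt(1.0 + leverage)   # where new data fall
    return y0_hat, ci_half, pb_half


for x0 in (1.0, 9.0, 17.0):
    y0_hat, ci_half, pb_half = interval_half_widths(x0)
    print(f"x0 = {x0:4.1f}: y = {y0_hat:6.3f}, CI ± {ci_half:.3f}, band ± {pb_half:.3f}")
```

Both intervals are narrowest at the mean of X and widen toward the ends of the data, and the prediction band is always the wider of the two.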


Confidence Interval vs. Prediction Band

CONFIDENCE INTERVAL

• “Mean Prediction Band”
• Shows possible errors in where the line will go
• Interval narrows with increasing number of data points
• See equation on previous slide

PREDICTION BAND

• “Single (another) Point Prediction Band”
• Shows where the data should lie
• Interval does not narrow much with increasing number of data points
• See equation on previous slide
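The different behavior with sample size follows from the square-root terms in the standard formulas (assumed here, since the slide's equations are missing). At the mean of X, the confidence interval scales as sqrt(1/n) while the prediction band scales as sqrt(1 + 1/n). The sketch compares the two factors and deliberately ignores the smaller changes in s and in the t value as n grows.

```python
import math


def ci_factor(n):
    """Confidence-interval width factor at x0 = mean(x): sqrt(1/n)."""
    return math.sqrt(1.0 / n)


def pb_factor(n):
    """Prediction-band width factor at x0 = mean(x): sqrt(1 + 1/n)."""
    return math.sqrt(1.0 + 1.0 / n)


for n in (10, 100, 1000):
    print(f"n = {n:5d}: CI factor = {ci_factor(n):.4f}, band factor = {pb_factor(n):.4f}")
```

Going from 10 to 1000 points shrinks the confidence interval factor tenfold, while the prediction band factor only falls from about 1.05 to just above 1, because the scatter of a single new measurement does not go away with more data.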


Good News: IGOR!

Igor has the equations already programmed and will

• Plot the confidence intervals
• Plot the prediction bands
• Print the confidence intervals on the slope and intercept

Assignment


Answers

[Figure: Distance (cm) versus Time (s), showing the data, curve fit, confidence intervals, and prediction bands.]

Coefficient values ± 95% Confidence Interval
a = 30.504 ± 0.766
b = 2.1468 ± 0.143

[Figure: Cp (kJ kg⁻¹ K⁻¹) versus Temperature (K), showing the data, curve fit, confidence intervals, and prediction bands.]

Coefficient values ± 95% Confidence Interval
K0 = 1.0015 ± 0.0509
K1 = 7.2738e-005 ± 0.000169
K2 = 9.4048e-008 ± 1.29e-007


Example Using Excel


Generalized Linear Regression

Linear regression can be written in matrix form.

 X     Y
21   186
24   214
32   288
47   425
50   455
59   539
68   622
74   675
62   562
50   453
41   370
30   274

Straight Line Model

Quadratic Model
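The matrix equations for the two models did not survive extraction. A minimal sketch of the usual formulation follows, assuming NumPy is available: each model is written as Y = X b, where the columns of the design matrix X are the basis functions (1 and x for the straight line; 1, x and x² for the quadratic), and b solves the normal equations (XᵀX) b = XᵀY. The variable and function names are illustrative.

```python
import numpy as np

# X and Y data from the slide.
x = np.array([21, 24, 32, 47, 50, 59, 68, 74, 62, 50, 41, 30], dtype=float)
y = np.array([186, 214, 288, 425, 455, 539, 622, 675, 562, 453, 370, 274],
             dtype=float)


def least_squares(design, y):
    """Solve the normal equations (X^T X) b = X^T y for the parameters b."""
    return np.linalg.solve(design.T @ design, design.T @ y)


# Straight-line model: y = b0 + b1 * x
X_line = np.column_stack([np.ones_like(x), x])
# Quadratic model: y = b0 + b1 * x + b2 * x^2 (still linear in the parameters)
X_quad = np.column_stack([np.ones_like(x), x, x ** 2])

b_line = least_squares(X_line, y)
b_quad = least_squares(X_quad, y)

sse_line = float(np.sum((y - X_line @ b_line) ** 2))
sse_quad = float(np.sum((y - X_quad @ b_quad) ** 2))

print("straight line parameters:", b_line, "SSE:", sse_line)
print("quadratic parameters:   ", b_quad, "SSE:", sse_quad)
```

The same `least_squares` helper handles both models because only the design matrix changes, which is the practical meaning of "linear in the parameters". For badly conditioned design matrices, `np.linalg.lstsq` is generally a safer choice than forming XᵀX explicitly.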


Statistics with Matrices

Parameter Confidence Intervals

Predicted Variable Confidence Intervals

Standard Error of bi is the square root of the i-th diagonal term of the matrix
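The matrix referred to here was lost in extraction. In the standard formulation, which is assumed in the sketch below, it is the parameter covariance matrix MSE · (XᵀX)⁻¹ with MSE = SSE / (n - p). The example uses the 12-point X, Y data from the Generalized Linear Regression slide with a straight-line model and NumPy; 2.228 is the two-sided 95% t value for 10 degrees of freedom from a t-table.

```python
import numpy as np

# X and Y data from the Generalized Linear Regression slide.
x = np.array([21, 24, 32, 47, 50, 59, 68, 74, 62, 50, 41, 30], dtype=float)
y = np.array([186, 214, 288, 425, 455, 539, 622, 675, 562, 453, 370, 274],
             dtype=float)

T_CRIT = 2.228  # t-table value for alpha = 0.05 (two-sided), n - p = 10 d.o.f.

X = np.column_stack([np.ones_like(x), x])   # straight-line design matrix
n, p = X.shape

xtx_inv = np.linalg.inv(X.T @ X)
b = xtx_inv @ X.T @ y                       # fitted parameters
residuals = y - X @ b
mse = float(residuals @ residuals) / (n - p)

# Assumed form of the lost matrix: parameter covariance = MSE * (X^T X)^-1.
cov_b = mse * xtx_inv
se_b = np.sqrt(np.diag(cov_b))              # standard error of each b_i
ci_half_width = T_CRIT * se_b               # 95% CI half-widths

for name, value, half in zip(("intercept", "slope"), b, ci_half_width):
    print(f"{name} = {value:.4f} ± {half:.4f}")
```

For a straight line the diagonal terms reduce to the familiar scalar formulas, so the matrix result can be checked against s / sqrt(Sxx) for the slope.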