Diploma in Statistics Introduction to Regression Lecture 2.2 1 Introduction to Regression Lecture 2.2 1. Review of Lecture 2.1 Homework Multiple regression Job times case study 2. Job times continued residual analysis model fitting and testing 3. Model fitting and testing procedure 4. t-tests 5. Analysis of Variance

Upload: brian-watkins

Post on 11-Jan-2016


Diploma in Statistics
Introduction to Regression

Lecture 2.2

1. Review of Lecture 2.1

– Homework
– Multiple regression
– Job times case study

2. Job times continued

– residual analysis
– model fitting and testing

3. Model fitting and testing procedure

4. t-tests

5. Analysis of Variance


Update: Accessing data files

• Access the data in mstuart's get folder:

– in ISS Public Access labs, click Start, then Network Shortcuts, open Get
– on your own computer with TCD network access, navigate to Ntserver-usr / get
– once in get, type ms, open mstuart, Diploma Reg, Excel Data,

or

• Access the data on the Diploma web page at https://www.scss.tcd.ie:453/courses/dipstats/Local/ST7002_0809.php

• Open the relevant Excel file and copy the data


Homework 2.1.1

The shelf life of packaged foods depends on many factors. Dry cereal (such as corn flakes) is considered to be a moisture-sensitive product, with the shelf life determined primarily by moisture. In a study of the shelf life of one brand of cereal, packets of cereal were stored in controlled conditions (23°C and 50% relative humidity) for a range of times, and moisture content was measured. The results were as follows.

Draw a scatter diagram. Comment. What action is suggested? Why?

Storage Time (days)   0    3    6    8    10   13   16   20   24   27   30   34   37   41
Moisture Content      2.8  3.0  3.1  3.2  3.4  3.4  3.5  3.1  3.8  4.0  4.1  4.3  4.4  4.9


Draw a scatter diagram. Comment. What action is suggested? Why?

2 exceptional cases; delete and investigate

[Figure: Scatterplot of Moisture Content vs Storage Time]


Following appropriate action, the following regression was computed.

The regression equation is
Moisture = 2.86 + 0.0417 Storage

Predictor   Coef       SE Coef    T        P
Constant    2.86122    0.02488    115.01   0.000
Storage     0.041660   0.001177   35.40    0.000

S = 0.0493475

Calculate a 95% confidence interval for the daily change in moisture content; show details.

$$\hat{\beta} \pm 2\,\mathrm{SE}(\hat{\beta}) = 0.04166 \pm 0.00235 = (0.03931,\ 0.04401)$$
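As a check on the arithmetic, here is a minimal sketch that refits the line and recomputes the interval from the raw data. It assumes that the "appropriate action" was deleting the two cases at storage times 20 and 41 days (the lecture does not name them), and it uses the rough rule of estimate plus or minus two standard errors.

```python
import math

# Storage time (days) and moisture content from Homework 2.1.1.
storage = [0, 3, 6, 8, 10, 13, 16, 20, 24, 27, 30, 34, 37, 41]
moisture = [2.8, 3.0, 3.1, 3.2, 3.4, 3.4, 3.5, 3.1, 3.8, 4.0, 4.1, 4.3, 4.4, 4.9]

# Assumption: the two exceptional cases are those at 20 and 41 days.
exceptional = {20, 41}
pairs = [(x, y) for x, y in zip(storage, moisture) if x not in exceptional]
n = len(pairs)

x_bar = sum(x for x, _ in pairs) / n
y_bar = sum(y for _, y in pairs) / n
s_xx = sum((x - x_bar) ** 2 for x, _ in pairs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Residual standard deviation s, on n - 2 degrees of freedom.
rss = sum((y - intercept - slope * x) ** 2 for x, y in pairs)
s = math.sqrt(rss / (n - 2))
se_slope = s / math.sqrt(s_xx)

# Approximate 95% interval: estimate plus or minus 2 standard errors.
ci = (slope - 2 * se_slope, slope + 2 * se_slope)
print(f"slope = {slope:.6f}, intercept = {intercept:.5f}, s = {s:.5f}")
print(f"SE(slope) = {se_slope:.6f}, CI = ({ci[0]:.5f}, {ci[1]:.5f})")
```

With these two cases removed, the slope, intercept and S agree with the regression output quoted above, which supports the assumption about which cases were deleted.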


Was the action you suggested on studying the scatter diagram in part (a) justified? Explain.

Predict the moisture content of a packet of cereal stored under these conditions for 5 weeks; calculate a prediction interval.

What would be the effect on your interval of not taking the action you suggested on studying the scatter diagram? Why?

Taste tests indicate that this brand of cereal is unacceptably soggy when the moisture content exceeds 4. Based on your prediction interval, do you think that a box of cereal that has been on the shelf for 5 weeks will be acceptable? Explain.

What about 4 weeks? 3 weeks? What is acceptable?
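One way to approach the prediction questions is sketched below. It takes the coefficients and S from the regression output and uses the rough rule of prediction plus or minus 2S. This is an assumption: it ignores the uncertainty in the estimated line itself, so the exact interval would be slightly wider. The threshold of 4 is the taste-test limit from the question.

```python
# Coefficients and S copied from the regression output in the lecture.
intercept, slope, s = 2.86122, 0.041660, 0.0493475
limit = 4.0  # moisture content above which the cereal is judged soggy


def prediction_interval(days):
    """Return (prediction, lower, upper) using the rough rule prediction ± 2S."""
    fit = intercept + slope * days
    return fit, fit - 2 * s, fit + 2 * s


results = {}
for weeks in (5, 4, 3):
    fit, lower, upper = prediction_interval(7 * weeks)
    if lower > limit:
        verdict = "above the limit"
    elif upper < limit:
        verdict = "below the limit"
    else:
        verdict = "interval straddles the limit"
    results[weeks] = (fit, lower, upper, verdict)
    print(f"{weeks} weeks: {fit:.2f} ({lower:.2f}, {upper:.2f}) {verdict}")
```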



Example 5: A production prediction problem

Erie Metal Products: The problem

Metal products fabrication:

customers order varying quantities of products of varying complexity;

customers demand accurate and precise order delivery times.


Erie Metal Products: The data

Table 8.1 Times, in hours, to complete jobs with varying numbers of units, numbers of operations per unit and priority status (normal or rushed)

Order    Jobtime   Units   Operations   Normal (0) or
number   (hours)           per unit     Rushed (1)?
  1      153       100      6           0
  2      192        35     11           0
  3      162       127      7           1
  4      240        64     12           0
  5      339       600      5           1
  6      185        14     16           1
  7      235        96     11           1
  8      506       257     13           0
  9      260        21      9           1
 10      161        39      8           0
 11      835       426     14           0
 12      586       843      6           0
 13      444       391      8           0
 14      240        84     13           1
 15      303       235      9           1
 16      775       520     12           0
 17      136        76      8           1
 18      271       139     11           1
 19      385       165     14           1
 20      451       304     10           0


The multiple linear regression model

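$$\text{Jobtime} = \alpha + \beta_{\text{Units}} \times \text{Units} + \beta_{\text{Ops}} \times \text{Ops} + \beta_{\text{T\_Ops}} \times \text{T\_Ops} + \beta_{\text{Rushed}} \times \text{Rushed} + \varepsilon$$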


Model parameters

The regression coefficients:

$\beta_{\text{Units}}$, $\beta_{\text{Ops}}$, $\beta_{\text{T\_Ops}}$, $\beta_{\text{Rushed}}$

The "uncertainty" parameter:

$\sigma$, the standard deviation of $\varepsilon$


Regression of Jobtime on other variables

Predictor   Coef      SE Coef   T       P
Constant    77.24     44.76     1.73    0.105
Units       -0.1507   0.1121    -1.34   0.199
Ops         7.152     4.305     1.66    0.117
T_Ops       0.11460   0.01322   8.67    0.000
Rushed      -24.94    19.11     -1.31   0.211

S = 37.4612
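The lecture's output comes from Minitab. As a hedged illustration of what the least squares fit involves, the sketch below solves the normal equations for the Table 8.1 data in plain Python. It assumes that T_Ops is the product Units × Ops (consistent with the value 189 = 21 × 9 used for job 9 later in the lecture). The helper names `design_row`, `solve` and `least_squares` are hypothetical. If the table has been transcribed correctly, the printed coefficients and S should be close to the Minitab values.

```python
import math

# Table 8.1: (jobtime, units, operations per unit, rushed) for jobs 1 to 20.
jobs = [
    (153, 100, 6, 0), (192, 35, 11, 0), (162, 127, 7, 1), (240, 64, 12, 0),
    (339, 600, 5, 1), (185, 14, 16, 1), (235, 96, 11, 1), (506, 257, 13, 0),
    (260, 21, 9, 1), (161, 39, 8, 0), (835, 426, 14, 0), (586, 843, 6, 0),
    (444, 391, 8, 0), (240, 84, 13, 1), (303, 235, 9, 1), (775, 520, 12, 0),
    (136, 76, 8, 1), (271, 139, 11, 1), (385, 165, 14, 1), (451, 304, 10, 0),
]


def design_row(units, ops, rushed):
    # Assumption: T_Ops (total operations) is Units x Ops.
    return [1.0, units, ops, units * ops, rushed]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def least_squares(rows, y):
    """Least squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


X = [design_row(u, o, r) for _, u, o, r in jobs]
y = [t for t, _, _, _ in jobs]
coef = least_squares(X, y)

# Residuals and the residual standard deviation S on n - p degrees of freedom.
residuals = [yi - sum(b * v for b, v in zip(coef, row)) for row, yi in zip(X, y)]
s = math.sqrt(sum(e * e for e in residuals) / (len(y) - len(coef)))

for name, b in zip(["Constant", "Units", "Ops", "T_Ops", "Rushed"], coef):
    print(f"{name:8s} {b:10.4f}")
print(f"S = {s:.4f}")
```

Solving the normal equations directly is the simplest method to show, not the most numerically robust; statistical packages typically use a QR decomposition instead.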


Homework

Predict job times for

small (U=100, O=5),

medium (U=300, O=10) and

large (U=500, O=15) jobs,

both normal and rushed.

Present the results in tabular form.


Homework Solution
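A minimal sketch of how the homework predictions can be computed from the first-fit coefficients. It assumes T_Ops = Units × Ops and rounds to the nearest hour; the `predict` helper and the table layout are illustrative choices, not taken from the lecture.

```python
# Coefficients copied from the first-fit regression output.
coef = {"Constant": 77.24, "Units": -0.1507, "Ops": 7.152,
        "T_Ops": 0.11460, "Rushed": -24.94}


def predict(units, ops, rushed):
    # Assumption: T_Ops is the product Units x Ops.
    return (coef["Constant"] + coef["Units"] * units + coef["Ops"] * ops
            + coef["T_Ops"] * units * ops + coef["Rushed"] * rushed)


sizes = {"Small": (100, 5), "Medium": (300, 10), "Large": (500, 15)}
table = {
    status: {size: round(predict(u, o, rushed)) for size, (u, o) in sizes.items()}
    for status, rushed in (("Normal", 0), ("Rushed", 1))
}

# Print the predictions in tabular form, as the homework asks.
print(f"{'':8s}" + "".join(f"{size:>8s}" for size in sizes))
for status, row in table.items():
    print(f"{status:8s}" + "".join(f"{row[size]:8d}" for size in sizes))
```

The rounded values agree with the "Original" rows of Table 8.3 later in the lecture.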


Are these predictions useful?

What is S?

What is 2S?

When will my order arrive?

NEXT

Diagnostics; analysis of residuals



Checking model fit

Assumptions:

explanatory variables are adequate

error term ($\varepsilon$):

variation is Normal

variation is stable

Check via residuals

Response = Fit + Residual


Regression diagnostics

• The diagnostic plot, 'deleted' residuals vs fitted values

– checking for homogeneity of error

• The Normal residual plot,

– checking the Normal model


Residuals

Job 9, a rushed job with 21 units and 9 operations per unit, took 260 hours to complete.

Prediction

Jobtime = 77 – 0.15 × 21 + 7.1 × 9 + 0.11 × 189 – 25

= 135,

Residual = 260 – 135 = 125
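The same calculation can be sketched in code, here using the coefficients at the precision given in the first-fit regression output rather than the rounded values shown above. The variable names are illustrative.

```python
# First-fit coefficients at the precision reported in the regression output.
constant, b_units, b_ops, b_tops, b_rushed = 77.24, -0.1507, 7.152, 0.11460, -24.94

units, ops, rushed, observed = 21, 9, 1, 260  # job 9 from Table 8.1
t_ops = units * ops                           # 189, as used in the prediction above

fitted = (constant + b_units * units + b_ops * ops
          + b_tops * t_ops + b_rushed * rushed)
residual = observed - fitted  # Response = Fit + Residual
print(f"fitted = {fitted:.1f}, residual = {residual:.1f}")
```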


Deleted residuals

Job 9, a rushed job with 21 units and 9 operations per unit, took 260 hours to complete.

Deleted prediction, regression with case 9 deleted:

Jobtime = 42 – 0.08 × 21 + 10 × 9 + 0.11 × 189 - 38

= 113,

DeletedResidual = 260 – 113 = 147

Standardised deleted residual ≈ DR / s = 147 / 14

= 10.5
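A short sketch of the deleted-residual arithmetic, using the coefficients and S from the regression with case 9 deleted at the precision of the Minitab output. Dividing by S follows the approximation used above; an exact standardised deleted residual would divide by a slightly larger standard error that also reflects the uncertainty in the fitted value.

```python
# Coefficients and S from the regression with case 9 deleted.
constant, b_units, b_ops, b_tops, b_rushed = 41.72, -0.08349, 10.022, 0.110016, -38.217
s_deleted = 13.7952

units, ops, rushed, observed = 21, 9, 1, 260  # job 9

deleted_fit = (constant + b_units * units + b_ops * ops
               + b_tops * units * ops + b_rushed * rushed)
deleted_residual = observed - deleted_fit
ratio = deleted_residual / s_deleted  # approximate standardised deleted residual

print(f"deleted fit = {deleted_fit:.1f}")
print(f"deleted residual = {deleted_residual:.1f}")
print(f"deleted residual / s = {ratio:.1f}")
```

With unrounded inputs the ratio comes out a little above the 10.5 obtained from 147 / 14; either way the residual is about ten standard deviations from zero.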


Deleted residuals

• Residual

– observed – fitted

• Standardised Residual

– using an estimate of $\sigma$ based on current data

• Standardised Deleted Residual

– calculated from data with suspect case deleted

– $\sigma$ estimated from data with suspect case deleted
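To make the definitions concrete, the sketch below computes approximate standardised deleted residuals for a simple straight-line fit: each case is left out in turn, the line and s are re-estimated from the remaining cases, and the left-out case's residual is divided by that s. The data are hypothetical, invented for illustration with one exceptional case (a large Y at a small X); the function names are also illustrative. As in the lecture's approximation, the leverage adjustment to the standard error is omitted.

```python
import math


def fit_line(points):
    """Least squares line through (x, y) points; returns (intercept, slope, s)."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    s_xx = sum((x - x_bar) ** 2 for x, _ in points)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / s_xx
    intercept = y_bar - slope * x_bar
    rss = sum((y - intercept - slope * x) ** 2 for x, y in points)
    return intercept, slope, math.sqrt(rss / (n - 2))


def deleted_residual_ratios(points):
    """For each case: (observed - fit without the case) / s without the case."""
    ratios = []
    for i, (x, y) in enumerate(points):
        rest = points[:i] + points[i + 1:]
        intercept, slope, s = fit_line(rest)
        ratios.append((y - (intercept + slope * x)) / s)
    return ratios


# Hypothetical data: roughly y = x, plus one exceptional case at the end.
points = [(-2.0, -2.1), (-1.5, -1.4), (-1.0, -1.1), (-0.5, -0.4), (0.0, 0.1),
          (0.5, 0.4), (1.0, 1.1), (1.5, 1.4), (2.0, 2.1), (-1.2, 2.0)]

ratios = deleted_residual_ratios(points)
worst = max(range(len(points)), key=lambda i: abs(ratios[i]))
for i, r in enumerate(ratios):
    print(f"case {i + 1:2d}: {r:7.2f}")
print(f"most exceptional case: {worst + 1}")
```

Because the exceptional case inflates s whenever it is left in the fit, only its own deleted residual is large; this is why deleting the suspect case before estimating $\sigma$ makes it stand out so clearly.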


The Diagnostic Plot

[Figure: diagnostic plot of deleted residuals against fitted values]


Scatterplot of artificial data with a highly exceptional case

NB: exceptionally large Y value corresponds to small X value


Scatter plot and diagnostic plot for artificial data

[Figure: two panels. Scatter plot of Y against X; diagnostic plot of deleted residuals against fitted values.]


Normal plot of residuals

[Figure: Normal plot of deleted residuals against Normal scores]


Statistical Analysis, Section 8.4

Iterating the analysis

• Revising the fit

– revised prediction formula

– revised diagnostics

• A further iteration


Revised fit, case 9 deleted

The regression equation is
Jobtime = 41.7 - 0.0835 Units + 10.0 Ops + 0.110 T_Ops - 38.2 Rushed

19 cases used, 1 cases contain missing values

Predictor   Coef       SE Coef    T       P
Constant    41.72      16.87      2.47    0.027
Units       -0.08349   0.04186    -1.99   0.066
Ops         10.022     1.612      6.22    0.000
T_Ops       0.110016   0.004891   22.49   0.000
Rushed      -38.217    7.166      -5.33   0.000

S = 13.7952


Revised fit: Exercise

Predict job times for small (U=100, O=5),

medium (U=300, O=10) and

large (U=500, O=15) jobs,

normal and rushed.


Revised predictions

Table 8.3 Original and revised predicted job times for small, medium and large jobs

                    Small   Medium   Large
Normal   Original   155     447      969
         Revised    138     447      975
Rushed   Original   130     422      944
         Revised    100     409      937


Recall scatter plot for artificial data

[Figure: scatter plot of Y against X]


Revised diagnostics, case 9 deleted

[Figure: two panels. Normal plot of deleted residuals against Normal scores; diagnostic plot of deleted residuals against fitted values.]


Revised fit, cases 9, 11, 16 deleted

The regression equation is
Jobtime = 44.2 - 0.0693 Units + 9.83 Ops + 0.108 T_Ops - 38.0 Rushed

17 cases used, 3 cases contain missing values

Predictor   Coef       SE Coef    T       P
Constant    44.216     9.080      4.87    0.000
Units       -0.06931   0.02853    -2.43   0.032
Ops         9.8286     0.8873     11.08   0.000
T_Ops       0.107795   0.004114   26.20   0.000
Rushed      -37.960    3.857      -9.84   0.000

S = 7.41272


Revised diagnostics, cases 9, 11, 16 deleted

[Figure: two panels. Normal plot of deleted residuals against Normal scores; diagnostic plot of deleted residuals against fitted values.]


Coefficient estimates from three fits

Coefficient    Constant   Units   Ops   T_Ops   Rushed

Original fit   77         -0.15   7.2   0.11    -25
Revised fit    42         -0.08   10    0.11    -38
Final fit      44         -0.07   9.8   0.11    -38

Final s.e.     9          0.03    0.9   0.004   4


Homework 2.2.1

Extend the table of predictions for small, medium and large jobs to include predictions based on the final fit.

Compare and contrast.
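One possible way to organise this calculation is sketched below: the coefficients of all three fits are taken from the regression outputs above, and the same prediction function is applied to each. It assumes T_Ops = Units × Ops; the dictionary layout and the `predict` helper are illustrative only.

```python
# (Constant, Units, Ops, T_Ops, Rushed) for each fit, from the regression outputs.
fits = {
    "Original": (77.24, -0.1507, 7.152, 0.11460, -24.94),
    "Revised": (41.72, -0.08349, 10.022, 0.110016, -38.217),
    "Final": (44.216, -0.06931, 9.8286, 0.107795, -37.960),
}
sizes = {"Small": (100, 5), "Medium": (300, 10), "Large": (500, 15)}


def predict(coef, units, ops, rushed):
    constant, b_units, b_ops, b_tops, b_rushed = coef
    # Assumption: T_Ops is the product Units x Ops.
    return (constant + b_units * units + b_ops * ops
            + b_tops * units * ops + b_rushed * rushed)


predictions = {
    (status, fit): [round(predict(coef, u, o, rushed)) for u, o in sizes.values()]
    for status, rushed in (("Normal", 0), ("Rushed", 1))
    for fit, coef in fits.items()
}

print(f"{'':18s}" + "".join(f"{size:>8s}" for size in sizes))
for (status, fit), row in predictions.items():
    print(f"{status:8s}{fit:10s}" + "".join(f"{value:8d}" for value in row))
```

The Original and Revised rows can be checked against Table 8.3 before reading off the Final rows.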



The model fitting and testing procedure

• Step 1: Initial data analysis:

• Step 2: Least squares fit and interpretation:

• Step 3: Diagnostic analysis of residuals:

• Step 4: Iterate fit and check:


Step 1: Initial data analysis

• standard single variable summaries

– to determine extent of variation

– possible exceptional values;

• scatter plot matrix

– to view pairwise relationships between the response and the explanatory variables

and

– to view pairwise relationships between the explanatory variables themselves.


Step 2: Least squares fit and interpretation

• calculate the best fitting regression coefficients

– check meaningfulness and statistical significance;

• calculate s

– check its usefulness for prediction

– its usefulness relative to alternative estimates of standard deviation.


Step 3: Diagnostic analysis of residuals

• diagnostic plot

– check for exceptional residuals or patterns of residuals,

– possible explanations in terms of the fitted values;

• Normal plot

– check for exceptional residuals or non-linear patterns in the residuals


Step 4: Iterate fit and check

• determine cases for deletion

– repeat steps 2 and 3 until checks are passed.


Homework 2.2.2

You have been asked to comment, as a statistical consultant, on a prediction formula for forecasting job completion times prepared by a former employee. The formula is, effectively, the one derived from the first fit discussed above. Write a report for management. Your report should refer to

(i) the practical usefulness of the employee's prediction formula, from a customer's perspective,

(ii) the significance of the exceptional cases from the customer's and management's perspectives, and

(iii) your recommended formula, with its relative advantages.



t-tests

First fit

The regression equation is
Jobtime = 77.2 - 0.151 Units + 7.15 Ops + 0.115 T_Ops - 24.9 Rushed

Predictor   Coef      SE Coef   T       P
Constant    77.24     44.76     1.73    0.105
Units       -0.1507   0.1121    -1.34   0.199
Ops         7.152     4.305     1.66    0.117
T_Ops       0.11460   0.01322   8.67    0.000
Rushed      -24.94    19.11     -1.31   0.211

S = 37.4612


Revised fit, case 9 deleted

The regression equation is
Jobtime = 41.7 - 0.0835 Units + 10.0 Ops + 0.110 T_Ops - 38.2 Rushed

19 cases used, 1 cases contain missing values

Predictor   Coef       SE Coef    T       P
Constant    41.72      16.87      2.47    0.027
Units       -0.08349   0.04186    -1.99   0.066
Ops         10.022     1.612      6.22    0.000
T_Ops       0.110016   0.004891   22.49   0.000
Rushed      -38.217    7.166      -5.33   0.000

S = 13.7952


Revised fit, cases 9, 11, 16 deleted

The regression equation is
Jobtime = 44.2 - 0.0693 Units + 9.83 Ops + 0.108 T_Ops - 38.0 Rushed

17 cases used, 3 cases contain missing values

Predictor   Coef       SE Coef    T       P
Constant    44.216     9.080      4.87    0.000
Units       -0.06931   0.02853    -2.43   0.032
Ops         9.8286     0.8873     11.08   0.000
T_Ops       0.107795   0.004114   26.20   0.000
Rushed      -37.960    3.857      -9.84   0.000

S = 7.41272


Homework 2.2.3

Make a table of the t values and corresponding s values for the three regressions.

Compare, contrast and explain.
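Each t value in the outputs is the coefficient divided by its standard error, so the table can be assembled as in the sketch below. The numbers are copied from the three regression outputs; the data layout is an illustrative choice.

```python
# (Coef, SE Coef) for each term, plus S, copied from the three regression outputs.
fits = {
    "First fit": (37.4612, {
        "Constant": (77.24, 44.76), "Units": (-0.1507, 0.1121),
        "Ops": (7.152, 4.305), "T_Ops": (0.11460, 0.01322),
        "Rushed": (-24.94, 19.11)}),
    "Case 9 deleted": (13.7952, {
        "Constant": (41.72, 16.87), "Units": (-0.08349, 0.04186),
        "Ops": (10.022, 1.612), "T_Ops": (0.110016, 0.004891),
        "Rushed": (-38.217, 7.166)}),
    "Cases 9, 11, 16 deleted": (7.41272, {
        "Constant": (44.216, 9.080), "Units": (-0.06931, 0.02853),
        "Ops": (9.8286, 0.8873), "T_Ops": (0.107795, 0.004114),
        "Rushed": (-37.960, 3.857)}),
}

# t = Coef / SE Coef for every term in every fit.
t_values = {
    fit: {term: coef / se for term, (coef, se) in terms.items()}
    for fit, (_, terms) in fits.items()
}

terms = ["Constant", "Units", "Ops", "T_Ops", "Rushed"]
print(f"{'Fit':26s}{'s':>8s}" + "".join(f"{t:>10s}" for t in terms))
for fit, (s, _) in fits.items():
    print(f"{fit:26s}{s:8.2f}" + "".join(f"{t_values[fit][t]:10.2f}" for t in terms))
```

Laying the three fits side by side makes it easy to see how the standard errors, and hence the t values, change as s falls.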



Analysis of Variance

S = 7.41272   R-Sq = 99.8%   R-Sq(adj) = 99.7%

Analysis of Variance

Source           DF   SS       MS      F         P
Regression        4   299165   74791   1361.12   0.000
Residual Error   12      659      55
Total            16   299824

Residual Mean Square = $s^2$: check!


Analysis of Variance

Regression Sum of Squares measures explained variation

Residual Sum of Squares measures unexplained (chance) variation

Total Variation = Explained + Unexplained

Check it!

$$R^2 = \frac{\text{Explained}}{\text{Total}} \times 100\%$$


F = MS(Reg) / MS(Res)

with 4 and 12 degrees of freedom.

Check it! Check F tables.
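The "check it" prompts can be worked through as in the sketch below, starting from the sums of squares and degrees of freedom in the Analysis of Variance table. Because the table entries are rounded, the results should agree with the printed output only approximately. The adjusted R-squared formula used here is the standard one (one minus the ratio of residual mean square to total mean square); the lecture itself does not state it.

```python
# Entries copied from the Analysis of Variance table (final fit).
ss_reg, ss_res, ss_total = 299165, 659, 299824
df_reg, df_res, df_total = 4, 12, 16
s = 7.41272

# Total Variation = Explained + Unexplained
decomposition_holds = (ss_reg + ss_res == ss_total)

ms_reg = ss_reg / df_reg
ms_res = ss_res / df_res        # Residual Mean Square, to compare with s squared
f_ratio = ms_reg / ms_res       # F = MS(Reg) / MS(Res), on 4 and 12 df

r_sq = ss_reg / ss_total                                  # Explained / Total
r_sq_adj = 1 - (ss_res / df_res) / (ss_total / df_total)  # standard adjusted form

print(f"Explained + Unexplained = Total: {decomposition_holds}")
print(f"MS(Res) = {ms_res:.2f}, s squared = {s ** 2:.2f}")
print(f"F = {f_ratio:.1f}")
print(f"R-Sq = {100 * r_sq:.1f}%, R-Sq(adj) = {100 * r_sq_adj:.1f}%")
```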


Reduction in Prediction Error

No fit prediction error: $s_{\text{No fit}} = s_Y = 202$

1st fit prediction error: $s_{\text{1st fit}} = 37.5$, less by factor of 5.4

2nd fit prediction error: $s_{\text{2nd fit}} = 13.8$, less by factor of 2.7

3rd fit prediction error: $s_{\text{3rd fit}} = 7.4$, less by factor of 1.9
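The reduction factors are simply ratios of successive values of s, as this small sketch shows; the values are those quoted above.

```python
# Prediction error (s) at each stage, as quoted in the lecture.
s_values = [("No fit", 202.0), ("1st fit", 37.5), ("2nd fit", 13.8), ("3rd fit", 7.4)]

# Each factor is the previous s divided by the current s.
factors = {
    name: previous / current
    for (_, previous), (name, current) in zip(s_values, s_values[1:])
}
overall = s_values[0][1] / s_values[-1][1]

for name, factor in factors.items():
    print(f"{name}: less by factor of {factor:.1f}")
print(f"overall reduction: factor of {overall:.1f}")
```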


Reading

SA §§ 8.2 - 8.6, § 1.6

Extra Notes: Degrees of Freedom

$R^2$ and Adjusted $R^2$

(Further Interpretation of the Correlation Coefficient)