NG BB 37 Multiple Regression

UNCLASSIFIED / FOUO

National Guard Black Belt Training, Module 37: Multiple Regression

Upload: leanleadersorg

Post on 14-Jan-2015


TRANSCRIPT

Page 1: NG BB 37 Multiple Regression


National Guard Black Belt Training

Module 37

Multiple Regression

Page 2: NG BB 37 Multiple Regression


CPI Roadmap – Analyze

Note: Activities and tools vary by project. Lists provided here are not necessarily all-inclusive.

TOOLS

• Value Stream Analysis

• Process Constraint ID

• Takt Time Analysis

• Cause and Effect Analysis

• Brainstorming

• 5 Whys

• Affinity Diagram

• Pareto

• Cause and Effect Matrix

• FMEA

• Hypothesis Tests

• ANOVA

• Chi Square

• Simple and Multiple Regression

ACTIVITIES

• Identify Potential Root Causes

• Reduce List of Potential Root Causes

• Confirm Root Cause to Output Relationship

• Estimate Impact of Root Causes on Key Outputs

• Prioritize Root Causes

• Complete Analyze Tollgate

8-STEP PROCESS

1. Validate the Problem

2. Identify Performance Gaps

3. Set Improvement Targets

4. Determine Root Cause

5. Develop Counter-Measures

6. See Counter-Measures Through

7. Confirm Results & Process

8. Standardize Successful Processes

Define, Measure, Analyze, Improve, Control

Page 3: NG BB 37 Multiple Regression


Learning Objectives

Understand how to identify correlation with multiple variables

Learn how to create a mathematical model for the effect of multiple inputs on an output variable

Understand and identify multicollinearity

Understand how to use best subsets to identify the best model

Examine unusual observations to learn more about the data

Page 4: NG BB 37 Multiple Regression


Multiple Regression

In Simple Linear Regression, we had:

$Y = B_0 + B_1 X$

In Multiple Linear Regression, we have:

$Y = B_0 + B_1 X_1 + B_2 X_2 + B_3 X_3$

We’d like to identify which, if any, of the predictor variables are useful in predicting Y

[Figure: inputs X1, X2, X3, X4, and X5 feeding the output Y, with Y = f(X)]
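To make the model form concrete, here is a minimal sketch of a least-squares fit of Y = B0 + B1X1 + B2X2 in plain Python. The data values and the helper names `solve` and `fit_ols` are hypothetical, made up for illustration; they are not the course data set, and this is not a description of how Minitab computes its output.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augment the matrix so each row operation acts on both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(xs, y):
    """Return [B0, B1, ...] that minimize squared error, via the normal equations."""
    rows = [[1.0] + list(r) for r in xs]  # the leading 1.0 is the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data generated exactly from Y = 2 + 3*X1 - 1*X2 (no noise).
xs = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 4)]
y = [2 + 3 * x1 - 1 * x2 for x1, x2 in xs]

coefs = fit_ols(xs, y)
print("B0, B1, B2 =", [round(c, 6) for c in coefs])
```

Because the made-up data contain no noise, the fit should recover the generating coefficients; with real process data the estimates carry standard errors, which is what the Minitab output later in the module reports.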

Page 5: NG BB 37 Multiple Regression


When Should I Use Multiple Regression?

The tool depends on the data type. Regression is typically used with a continuous input and a continuous response but may also be used with count or categorical inputs and outputs.

                              Independent Variable (X)
                              Continuous             Attribute
Dependent      Continuous     Regression             ANOVA
Variable (Y)   Attribute      Logistic Regression    Chi-Square (χ²) Test

Page 6: NG BB 37 Multiple Regression


Basic Steps for Regression Modeling

Step 1: Process Flowchart, SIPOC
Objective: To identify KPIVs and KPOVs
Key question: Which KPIVs will significantly improve which KPOVs?

Step 2: Scatter Plot, Histogram
Objective: To visualize the data
Key question: Does it look like there is a C&E relationship?

Step 3: Correlation, Test Hypothesis
Objective: To qualify the C&E relationship (Strength, % Variability, P-value)
Key question: How strong is the C&E relationship?

Step 4: Regression Analysis
Objective: To quantify the C&E relationship (Method of Least Squares)
Key question: What is the prediction equation?

Step 5: Residual Analysis
Objective: To validate the model selected
Key question: Is there anything suspicious with the model selected?

KPIV = Key Process Input Variables

KPOV = Key Process Output Variables

Page 7: NG BB 37 Multiple Regression


Example: Production Plant

A chemical engineer is investigating the amount of silver required in the high-volume production of contact switches for a new Army radio. Although only a small amount of silver is deposited on the switches, a larger amount is wasted through a multiple-step process. She has collected data and would like to develop a prediction model. (Data set: A-06 Production Plant)

Step 1: The variables identified as KPIVs are given below:

X1 = Average temperature of rinse bath (degrees C)

X2 = Speed of reel that feeds the switches through the line (inches/min)

X3 = Thickness of silver deposit (angstroms)

X4 = Water consumed (gallons per day)

Y = Amount of silver consumed (pounds/day)

Source: Applied Regression Analysis, Draper and Smith

What questions would you ask about this data?

Page 8: NG BB 37 Multiple Regression


Step 2: Visualize the Data!

Data file: A-06 Production Plant.mtw

Select Graph>Matrix Plot

Page 9: NG BB 37 Multiple Regression


Looking for relationships between variables...

Step 2: Visualize the Data!

This dialog box comes up first

Select Matrix of Plots – Simple, since we have only one (Y) variable and no groups

Click on OK to go to the next dialog box

Page 10: NG BB 37 Multiple Regression


Double click on all of the variables you want to include in the Matrix, to place them in the Graph variables box

Select Matrix Options to move on to the next dialog box

Step 2: Visualize the Data!

Page 11: NG BB 37 Multiple Regression


Select Lower left to place all the graph labels to the lower left of the boxes

Click on OK here and on the previous dialog box to get the matrix

Step 2: Visualize the Data!

Page 12: NG BB 37 Multiple Regression


Correlation Table

There appear to be some relationships between certain variables and the response.

[Figure: Matrix Plot of Temp, Speed, Thickness, Water, Amt of Ag. Amt of Ag is the Response Variable (Y).]

Is this good or bad?

Page 13: NG BB 37 Multiple Regression


Quantify the Relationships Between Variables

Select Stat>Basic Statistics> Correlation

Step 3: Quantify the relationship

Page 14: NG BB 37 Multiple Regression


Double click on all of the variables you want to include, to place them in the Variables box

Check to display p-values (default setting)

Click on OK to get the Correlation Matrix in your Session Window

Evaluating coefficients of correlation among predictors...

Correlation Matrix

Page 15: NG BB 37 Multiple Regression


Predictor variable pairwise correlations larger than .5-.7 are signs of trouble ... Multicollinearity. We will explain more shortly.

The TOP number in each pair is the Pearson Coefficient of Correlation (r-Value), while the BOTTOM number is the p-Value

Correlation Matrix
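As a rough illustration of the pairwise screen described above, the following sketch computes Pearson r for every pair of predictors and flags pairs above a threshold. The three short columns are hypothetical numbers, not the A-06 Production Plant data, and the sketch does not compute the p-values that Minitab prints beneath each r.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical predictor columns, invented for illustration only.
predictors = {
    "Speed": [10, 11, 12, 13, 14],
    "Thickness": [19.1, 19.4, 20.2, 20.4, 21.0],
    "Water": [160, 150, 170, 155, 165],
}

THRESHOLD = 0.7  # upper end of the 0.5 to 0.7 warning range quoted above

flagged = []
for a, b in combinations(predictors, 2):
    r = pearson_r(predictors[a], predictors[b])
    print(f"{a} vs {b}: r = {r:.3f}")
    if abs(r) > THRESHOLD:
        flagged.append((a, b))

print("Possible multicollinearity:", flagged)
```

With these made-up numbers only the Speed and Thickness pair exceeds the threshold; a flagged pair is a prompt to check VIFs, not a verdict by itself.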

Page 16: NG BB 37 Multiple Regression


Finding the Regression Equation...

Select: Stat>Regression>Regression

Step 4: Develop a prediction model

Page 17: NG BB 37 Multiple Regression


Double click on C5 Amt of Ag and place it in the Response: variable box, then double click on all the variables you want to place in the Predictors: box.

Select Options to go to next dialog box.

Finding the Regression Equation... (Cont.)

Page 18: NG BB 37 Multiple Regression


In this dialog box, the only thing you have to do is check Variance inflation factors

Click on OK here and on the previous dialog box to get the regression analysis in your Session Window

Finding the Regression Equation... (Cont.)

Page 19: NG BB 37 Multiple Regression


Regression Equation

Minitab displays the following regression equation:

Amt of Ag = 5.7 - 0.0156 Temp + 0.239 Speed + 0.44 Thickness + 0.0449 Water

Predictor    Coef       SE Coef    T       P       VIF
Constant     5.72       10.83      0.53    0.607
Temp         -0.01558   0.02616    -0.60   0.563   1.276
Speed        0.2393     0.2644     0.90    0.383   10.997
Thickness    0.443      1.033      0.43    0.675   11.671
Water        0.04495    0.01481    3.04    0.010   1.731

S = 0.412748   R-Sq = 80.9%   R-Sq(adj) = 74.5%

The P-values indicate whether a particular predictor is significant in the presence of the other predictors in the model

This model explains 80.9% of the response variability

R-Sq(adj) adjusts R-Sq for the degrees of freedom used by the predictors, so variables that have no real value do not inflate it. It should be used when comparing models
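As a worked check on the reported output, the sketch below recomputes R-Sq(adj) from R-Sq and one T value from Coef and SE Coef. It uses the standard textbook formulas rather than anything stated in the slides, and it assumes n = 17 observations, the N shown on the residual normal probability plot in this module. The function name `adjusted_r2` is made up for this example.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors (constant not counted in k)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 17  # assumed from the "N 17" legend on the residual plot

# Full model: four predictors, R-Sq = 80.9%.
adj_full = adjusted_r2(0.809, n, k=4)
print(f"R-Sq(adj) = {100 * adj_full:.1f}%")

# T is the coefficient divided by its standard error (Water row of the output).
t_water = 0.04495 / 0.01481
print(f"T for Water = {t_water:.2f}")
```

Both results agree with the Session Window values (74.5% and 3.04). Other rows may differ in the last digit because the printed coefficients are themselves rounded.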

Page 20: NG BB 37 Multiple Regression


Interpreting P-values

The P columns give the significance level for each term in the model

Typically, if a P value is less than or equal to 0.05, the variable is considered significant (i.e., null hypothesis is rejected)

If a P value is greater than 0.10, the term is typically removed from the model. If the P value falls in the gray region between 0.05 and 0.10, a practitioner might leave the term in the model
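The decision rule above can be written as a small helper. This is only a sketch: the function name `classify` and the label strings are hypothetical, and the P values are the ones reported for the full Production Plant model.

```python
def classify(p, keep_at_or_below=0.05, drop_above=0.10):
    """Apply the 0.05 / 0.10 rule of thumb to a single P value."""
    if p <= keep_at_or_below:
        return "significant"
    if p > drop_above:
        return "candidate for removal"
    return "gray region"


# P values from the full-model regression output.
p_values = {"Temp": 0.563, "Speed": 0.383, "Thickness": 0.675, "Water": 0.010}

decisions = {name: classify(p) for name, p in p_values.items()}
for name, decision in decisions.items():
    print(f"{name}: P = {p_values[name]:.3f} -> {decision}")
```

Only Water passes on its own here, yet Speed is retained in the reduced model later in the module, which is a reminder that P values in the presence of multicollinearity should not be read one term at a time.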

Page 21: NG BB 37 Multiple Regression


Variance Inflation Factor

Regression output in Minitab’s Session Window: the same equation and coefficient table as on the Regression Equation slide, with these VIF values:

Temp 1.276, Speed 10.997, Thickness 11.671, Water 1.731

High VIF values are signs of trouble (VIF > 10)

Page 22: NG BB 37 Multiple Regression


Problems with Several Predictor Variables

Sometimes the Xs are correlated (dependent). This condition is known as Multicollinearity

Multicollinearity can cause problems (sometimes severe)

Estimates of the coefficients are affected (unstable, inflated variances)

Difficulty isolating the effects of each X

Coefficients depend on which Xs are included in the model

High multicollinearity inflates the standard error estimates, which increases the P values

In cases of extreme multicollinearity, Minitab will drop one term and notify you

Page 23: NG BB 37 Multiple Regression


Graphical Representation of Multicollinearity

[Figure: overlapping regions labeled Total Variation in Y, Variation Explained by X1, and Variation Explained by X2]

• Overlap represents correlation

• X1 and X2 are both correlated with Y

• X1 and X2 are highly correlated

• If X1 is in the model, we don’t need X2, and vice versa

Page 24: NG BB 37 Multiple Regression


Assessing the Degree of Multicollinearity

We use a metric called Variance Inflation Factor (VIF):

$\mathrm{VIF}_i = \dfrac{1}{1 - R_i^2}$

Where:

$R_i^2$ is the $R^2$ value you get when you regress $X_i$ against the other X’s

A large $R_i^2$ suggests that a variable is redundant

Rule of Thumb:

$R_i^2 > 0.9$ is a cause for concern (high degree of collinearity) (VIF > 10)

$0.8 < R_i^2 < 0.9$ (moderate degree of collinearity) (VIF > 5)

For the Production Plant data, Minitab gives us:

Predictor    VIF
Temp         1.276
Speed        10.997
Thickness    11.671
Water        1.731

Select Stat>Regression>Regression>Options>Display variance inflation factors

Two VIFs are a bit large, but in this case, with an R-Sq of 80.9%, some multicollinearity can be tolerated
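The VIF definition can be inverted to see how much of each predictor is explained by the others. The sketch below does that for the four reported VIFs; the function names are made up for this example, and the only inputs are the values Minitab reported.

```python
def vif_from_r2(r2):
    """VIF for a predictor whose regression on the other predictors has R-squared r2."""
    return 1.0 / (1.0 - r2)


def r2_from_vif(vif):
    """Invert the VIF definition to recover the implied R-squared."""
    return 1.0 - 1.0 / vif


# VIFs reported by Minitab for the Production Plant data.
vifs = {"Temp": 1.276, "Speed": 10.997, "Thickness": 11.671, "Water": 1.731}

implied_r2 = {name: r2_from_vif(v) for name, v in vifs.items()}
for name, r2 in implied_r2.items():
    print(f"{name}: VIF = {vifs[name]:.3f} -> implied R_i^2 = {r2:.3f}")

# The rule-of-thumb boundaries line up: R_i^2 = 0.8 gives VIF 5, and 0.9 gives VIF 10.
print(vif_from_r2(0.8), vif_from_r2(0.9))
```

Speed and Thickness each have an implied R_i^2 just above 0.9, so roughly 91% of the variation in either one is explained by the remaining predictors.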

Page 25: NG BB 37 Multiple Regression


Some Cautions About the Coefficients

Remember the prediction equation obtained earlier:

Amt of Ag = 5.7 - 0.0156 Temp + 0.239 Speed + 0.44 Thickness + 0.0449 Water

Relative importance of predictors cannot be determined from the size of their coefficients:

The coefficients are scale dependent

The coefficients are influenced by correlation among the predictor variables

If a high degree of multicollinearity exists, even the signs of the coefficients may be misleading
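A quick way to see scale dependence is to change the units of one predictor. The sketch below re-expresses the Water coefficient per liter instead of per gallon; the daily water figure is a hypothetical value chosen only for illustration.

```python
GALLONS_TO_LITERS = 3.785411784  # liters in one US liquid gallon

coef_per_gallon = 0.0449                         # Water coefficient from the equation
coef_per_liter = coef_per_gallon / GALLONS_TO_LITERS

water_gallons = 160.0                            # hypothetical daily consumption
water_liters = water_gallons * GALLONS_TO_LITERS

# The contribution of Water to the predicted Amt of Ag is the same either way.
contribution_gallons = coef_per_gallon * water_gallons
contribution_liters = coef_per_liter * water_liters

print(f"per gallon: {coef_per_gallon:.4f}, per liter: {coef_per_liter:.4f}")
print(f"contribution: {contribution_gallons:.3f} vs {contribution_liters:.3f}")
```

The coefficient shrinks by a factor of about 3.8 while the prediction is unchanged, so coefficient size by itself says nothing about which input matters more.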

Page 26: NG BB 37 Multiple Regression


Residual Analysis

Select Stat> Regression> Regression

Step 5: Validate the selected model

Is there anything suspicious with this model?

Page 27: NG BB 37 Multiple Regression


Double click on C5 Amt of Ag and place it in the Response variable box, then double click on all the variables you want to place in the Predictors box

Select Graphs to go to next dialog box

Residual Analysis (Cont.)

Page 28: NG BB 37 Multiple Regression


Select Four in one to get all four Residual plots on one graph, or you can pick and choose the plots you want

Click on OK here and on the previous dialog box to get the Residual plots

Residual Analysis (Cont.)

Page 29: NG BB 37 Multiple Regression


[Figure: Residual Plots for Amt of Ag, four-in-one layout: Normal Probability Plot (Percent vs. Residual; N = 17, AD = 0.249, P-Value = 0.705), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), and Versus Order (Residual vs. Observation Order)]

Not too bad overall…

If you want to see the value for any observation, just hold your cursor over that point

Residual Analysis (Cont.)

Page 30: NG BB 37 Multiple Regression


How to Address Multicollinearity

Eliminate one or more input variables

We’ll look at a technique called Best Subsets Regression

Collect additional data

Use process knowledge to determine the principal relationship

Use DOE to further assess the multicollinearity

If neither is significant, then eliminate both from the analysis

Page 31: NG BB 37 Multiple Regression


Best Subsets Regression

Rather than relying on the p-values alone, the computer looks at all possible combinations of variables and prints the resulting model characteristics

Statistics like adjusted R-Sq and MSError will improve as important model terms are added, then worsen as “junk” terms are added to the model
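The idea of fitting every combination of predictors can be sketched in a few lines. The example below is an illustration under stated assumptions, not Minitab's Best Subsets algorithm: the data are hypothetical, the helper names `solve` and `r_squared` are made up, and it reports only R-Sq and adjusted R-Sq for each subset.

```python
from itertools import combinations


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(columns, y):
    """R-squared of a least-squares fit of y on the given columns plus a constant."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


# Hypothetical data, invented for illustration only.
data = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [2, 1, 4, 3, 6, 5, 8, 7],
    "x3": [5, 3, 8, 1, 7, 2, 6, 4],
}
y = [3.1, 3.9, 7.2, 6.8, 11.1, 10.2, 14.9, 14.1]
n = len(y)

results = []
for k in range(1, len(data) + 1):
    for names in combinations(data, k):
        r2 = r_squared([data[name] for name in names], y)
        adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
        results.append((names, r2, adj))

for names, r2, adj in results:
    print(f"{len(names)} vars {names}: R-Sq = {100 * r2:.1f}%, R-Sq(adj) = {100 * adj:.1f}%")
```

R-Sq can only rise as terms are added, which is why the comparison across subset sizes relies on adjusted R-Sq and the other statistics discussed next.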

Page 32: NG BB 37 Multiple Regression


Best Subsets Regression Considerations

Objective: We want to select a model with predictive accuracy and minimum multicollinearity

Seek compromise between:

Overfitting (including model terms with only marginal, or no, contribution)

Underfitting (ignoring or deleting relatively important model terms)

What are some problems with overfitting?

What are some problems with underfitting?

[Figure labels: overfit, underfit]

Page 33: NG BB 37 Multiple Regression


Best Subsets Regression

Evaluating Candidate Models

Four things to look at when evaluating candidate models:

1. R2 (large R2 is desired, although R2 increases as we add more predictors to the model, so this should only be used for comparing models with the same number of terms)

2. Adjusted R2 (large is desired)

3. Mallows Cp statistic (small Cp desired, close to the number of terms in the model)

4. s (the estimate of the standard deviation around the regression)

Generally, the best three models are selected and checked for significance of all factors and residual assumptions

Page 34: NG BB 37 Multiple Regression


More on the Mallows C-p Statistic

In practice, the number of parameters needed in the model is the number at which the Mallows’ C-p statistic reaches its minimum

Rule of Thumb:

We want C-p small and close to the number of input variables
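For readers who want to see where the C-p values come from, the sketch below recomputes three of them from the S column of the Best Subsets output for the Production Plant data in this module. It uses the usual textbook definition, C-p = SSE_p / MSE_full - (n - 2p), where p counts the constant as well as the predictors; that definition, the assumed n = 17, and the function name `mallows_cp` are not stated in the slides.

```python
def mallows_cp(s_subset, s_full, n, p):
    """Mallows C-p from reported S values; p counts the constant plus the predictors."""
    sse_subset = s_subset ** 2 * (n - p)   # S^2 is SSE divided by (n - p)
    mse_full = s_full ** 2                 # error variance estimate from the full model
    return sse_subset / mse_full - (n - 2 * p)


n = 17            # assumed from the "N 17" legend on the residual plot
s_full = 0.41275  # S of the four-predictor model

cp_one = mallows_cp(0.50387, s_full, n, p=2)   # best one-predictor model
cp_two = mallows_cp(0.39047, s_full, n, p=3)   # best two-predictor model
cp_full = mallows_cp(0.41275, s_full, n, p=5)  # full model

print(round(cp_one, 1), round(cp_two, 1), round(cp_full, 1))
```

These round to 9.4, 1.5, and 5.0, matching the Minitab table. The full model always has C-p equal to its own parameter count, so it is the smaller models whose C-p values carry information.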

Page 35: NG BB 37 Multiple Regression


Best Subsets Regression

Select Stat> Regression>Best Subsets

Minitab data set: Production Plant

Page 36: NG BB 37 Multiple Regression


Best Subsets Regression (Cont.)

Enter Response variable

Enter Predictor variables (Input Variables)

Click on OK to get analysis in Session Window

Page 37: NG BB 37 Multiple Regression


Best Subsets Regression (Cont.)

Best Subsets Regression: Amt of Ag versus Temp, Speed, Thickness, Water

Response is Amt of Ag

Vars   R-Sq   R-Sq(adj)   Mallows Cp   S         Predictors marked (of Temp, Speed, Thickness, Water)
1      64.4   62.0        9.4          0.50387   X
1      62.3   59.8        10.7         0.51836   X
2      80.0   77.2        1.5          0.39047   X X
2      78.8   75.8        2.3          0.40200   X X
3      80.6   76.1        3.2          0.39959   X X X
3      80.3   75.8        3.4          0.40237   X X X
4      80.9   74.5        5.0          0.41275   X X X X

Which model(s) are the best candidates?

Page 38: NG BB 37 Multiple Regression


R-Sq: Look for the highest value when comparing models with the same number of input variables

Best Subsets Regression (Cont.)

Page 39: NG BB 37 Multiple Regression


R-Sq (adj): Look for the highest value when comparing models with different numbers of input variables

Best Subsets Regression (Cont.)

Page 40: NG BB 37 Multiple Regression


Cp: Look for models where Cp is small and close to the number of input variables in the model

Best Subsets Regression (Cont.)

Page 41: NG BB 37 Multiple Regression


S: We want S, the estimate of the standard deviation about the regression, to be as small as possible

Best Subsets Regression (Cont.)

Page 42: NG BB 37 Multiple Regression


Once the Candidate Models Are Identified

Evaluate the candidate models under a “microscope”

Outliers

High leverage

Influential observations

Residuals

Prediction quality

Once a model has been selected, find the new regression equation

Test its predictive capability for observations NOT originally used in the modeling

Page 43: NG BB 37 Multiple Regression


Regression with Reduced Model

We select the best model with two variables, Speed & Water, and run Minitab again to obtain the new regression equation:

Select Stat>Regression>Regression

Page 44: NG BB 37 Multiple Regression


Regression with Reduced Model (Cont.)

Enter Amt of Ag as the Response

Enter only Speed and Water as Predictors

Click on OK to get analysis in Session Window

Page 45: NG BB 37 Multiple Regression


Regression with Reduced Model (Cont.)

Session window of Minitab yields the following regression equation for the reduced model:

Amt of Ag = 9.92 + 0.357 Speed + 0.0425 Water

Predictor   Coef      SE Coef   T      P
Constant    9.919     1.694     5.86   0.000
Speed       0.35689   0.08544   4.18   0.001
Water       0.04253   0.01206   3.53   0.003

S = 0.3905   R-Sq = 80.0%   R-Sq(adj) = 77.2%

…to compare with the previous model:

Amt of Ag = 5.7 - 0.0156 Temp + 0.239 Speed + 0.44 Thickness + 0.0449 Water

Predictor   Coef       SE Coef   T       P
Constant    5.72       10.83     0.53    0.607
H2O Temp    -0.01558   0.02616   -0.60   0.563
Speed       0.2393     0.2644    0.90    0.383
Thick.      0.443      1.033     0.43    0.675
Water       0.04495    0.01481   3.04    0.010

S = 0.4127   R-Sq = 80.9%   R-Sq(adj) = 74.5%
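Once the reduced model is chosen, using it is a matter of plugging values into the equation. The sketch below wraps the reduced-model coefficients in a function; the function name and the input values are hypothetical and chosen only to show the arithmetic.

```python
def predict_silver(speed, water):
    """Predicted Amt of Ag (pounds/day) from the reduced Speed and Water model."""
    return 9.919 + 0.35689 * speed + 0.04253 * water


# Hypothetical operating point: reel speed in inches/min, water in gallons/day.
speed, water = 11.5, 150.0
prediction = predict_silver(speed, water)
print(f"Predicted Amt of Ag at speed={speed}, water={water}: {prediction:.2f} pounds/day")
```

Predictions are only trustworthy inside the range of Speed and Water values that were present in the data used to fit the model.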

Page 46: NG BB 37 Multiple Regression


Session window of Minitab also gives us the following output:

Unusual Observations

Obs   Speed   Amt of Ag   Fit       SE Fit   Residual   St Resid
3     11.5    21.0000     20.3784   0.2477   0.6216     2.06R

R denotes an observation with a large standardized residual

An unusual observation here means one with a large standardized residual

Let’s see what would happen if we eliminated such an observation from our collected data!

Unusual Observations
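The standardized residual for observation 3 can be reproduced from the other numbers in its row. The sketch below assumes the standard relationships SE Fit = S·sqrt(h) and St Resid = Residual / (S·sqrt(1 - h)), where h is the leverage; these formulas come from regression textbooks, not from the slides, and S = 0.3905 is taken from the reduced-model output.

```python
import math

s = 0.3905          # S from the reduced (Speed and Water) model
se_fit = 0.2477     # SE Fit for observation 3
residual = 0.6216   # Residual for observation 3

# Leverage implied by SE Fit, then the standardized residual.
leverage = (se_fit / s) ** 2
std_resid = residual / (s * math.sqrt(1 - leverage))

# Leverage cutoff 2p/n, assuming p = 3 coefficients (constant, Speed, Water) and n = 17.
cutoff = 2 * 3 / 17

print(f"leverage = {leverage:.3f}, cutoff 2p/n = {cutoff:.3f}")
print(f"standardized residual = {std_resid:.2f}")
```

The result rounds to 2.06, matching the Minitab row, and the implied leverage of about 0.40 sits above the 2p/n guideline of about 0.35 under the stated assumptions, which helps explain why dropping this one point changes the coefficients noticeably.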

Page 47: NG BB 37 Multiple Regression


Impact of the Unusual Observation

Without the Unusual Observation, the Session window of Minitab yields the following regression equation:

Amt of Ag = 8.61 + 0.237 Speed + 0.0577 Water

Predictor   Coef      SE Coef   T      P
Constant    8.610     1.567     5.49   0.000
Speed       0.23698   0.08960   2.64   0.020
Water       0.05775   0.01226   4.71   0.000

S = 0.3383   R-Sq = 85.0%   R-Sq(adj) = 82.7%

…to compare with the regression equation of our previous reduced model:

Amt of Ag = 9.92 + 0.357 Speed + 0.0425 Water

Predictor   Coef      SE Coef   T      P
Constant    9.919     1.694     5.86   0.000
Speed       0.35689   0.08544   4.18   0.001
Water       0.04253   0.01206   3.53   0.003

S = 0.3905   R-Sq = 80.0%   R-Sq(adj) = 77.2%

R-Sq goes up a little because we’ve gotten rid of “noise” in the model

Page 48: NG BB 37 Multiple Regression


Takeaways

Regression analysis can be used with historical data as well as data from designed experiments to build prediction models

Care must be exercised when using historical data

Correlation does not imply a cause and effect relationship

There may be serious problems with multicollinearity and high leverage observations

There are several diagnostic tools available to evaluate regression models:

Fit: R2, adjusted R2, Cp, S

Unusual observations: residual plots, leverage, CooksD

Multicollinearity: VIFs (Variance Inflation Factors)

Page 49: NG BB 37 Multiple Regression


Considerations in Regression

Set goals before doing the analysis (what do you want to learn, how well do you need to predict, etc.).

Gather enough observations to adequately measure error and check the model assumptions.

Make sure that the sample of data is representative of the population.

Excessive measurement error of the inputs (Xs) creates uncertainty in the estimated coefficients, predictions, etc.

Be sure to collect data on all potentially important explanatory variables.

Page 50: NG BB 37 Multiple Regression


Regression Checklist

Scatterplots (Y vs. X)

Histograms and/or Boxplots of Ys and Xs

Coefficients

Significance (p < .05 - .10)

R2 and adjusted R2

S

Residuals (no obvious pattern)

Unusual Y values (standardized residuals > 2)

Unusual X values (leverage > 2p/n)

Overfitting vs. underfitting (C-p close to the number of input variables in model)

Multicollinearity (VIF > 5-10)

Page 51: NG BB 37 Multiple Regression


What other comments or questions do you have?

Page 52: NG BB 37 Multiple Regression


References

Neter, Wasserman, and Kutner, Applied Linear Regression Models, Irwin, 1989

Draper and Smith, Applied Regression Analysis, Wiley, 1981

Schulman, Robert S., Statistics in Plain English, Chapman and Hall, 1992

Gunst and Mason, Regression Analysis and its Application, Marcel Dekker, 1980

Myers, Raymond H., Classical and Modern Regression with Applications, Duxbury, 1990

Dielman, Applied Regression Analysis for Business and Economics, Duxbury, 1991

Hosmer and Lemeshow, Applied Logistic Regression, Wiley, 1989

Iglewicz and Hoaglin, How to Detect and Handle Outliers, ASQ Press

Crocker, Douglas C., How to use Regression Analysis in Quality Control, ASQ Press

Page 53: NG BB 37 Multiple Regression


National Guard Black Belt Training

APPENDIX

Additional Exercises

Anthony’s Pizza

Customer Satisfaction

A Study of Supervisor Performance

Page 54: NG BB 37 Multiple Regression


Additional Practice Example:

Anthony’s Pizza

We have received Voice of the Customer feedback telling us that customers are dissatisfied if we cannot accurately predict the time of their pizza delivery when it is beyond the 30 minute target

We would like to develop a model so that when the customer calls, we can accurately predict delivery time

Page 55: NG BB 37 Multiple Regression


Additional Practice Example:

Six Sigma Pizza

Our Minitab data can be found in the file Multiple Regression - Pizza.mpj

Based on the data that we have collected, we are going to study the effects of total pizzas ordered, defects, and incorrect order on delivery time

Page 56: NG BB 37 Multiple Regression


Additional Practice Exercise:

Customer Satisfaction

Bob Black Belt would like to get a better understanding of the customer satisfaction data

Use the data provided in the Minitab file A-06 Customer Satisfaction Data.mtw to create a Regression Model to predict Overall Satisfaction

Each row of data is a monthly average of how customers rated the services on a scale of 1-10. For example, in January, the average of customer ratings for Staff Responsiveness was 7.9.

Page 57: NG BB 37 Multiple Regression


Additional Practice Exercise:

Customer Satisfaction (Cont.)

Consider Staff Responsiveness, Check-out Speed, Frequent Guest Program, and Problems Resolved as possible inputs that could be used to predict Overall Satisfaction.

First, study correlation with a Matrix Plot and Correlation Table

Next, create the initial Regression Model

Find the best combination of inputs with Best Subsets

Finally, run the reduced Regression Model

Page 58: NG BB 37 Multiple Regression


Additional Practice Exercise:

A Study of Supervisor Performance

A recent survey of clerical employees in a large financial organization included questions related to employee satisfaction with their supervisors. The company was interested in any relationships between specific supervisor characteristics and overall satisfaction with supervisors as perceived by the employees:

Y = Overall rating of the job being done by the supervisor

X1 = Handles employee complaints

X2 = Does not allow special privileges

X3 = Provides opportunity to learn new things

X4 = Raises based on performance

X5 = Too critical of poor performance

X6 = Rate of advancing to better jobs (employee’s perception of their own advancement rate)

Source: Regression Analysis by Example, Chatterjee and Price

Page 59: NG BB 37 Multiple Regression


Additional Practice Exercise:

A Study of Supervisor Performance

The survey responses were on a scale of 1-5

For purposes of analysis, a score of 1 or 2 was considered “favorable”, while a score of 3, 4, or 5 was considered “unfavorable”

Data was collected from 30 departments, selected randomly from the organization. Each department had approximately 35 employees with one supervisor

For each department, the data was aggregated, and the value recorded was the percent favorable for each item

Data file is A-06 Attitude.mtw

Questions:

Can we predict the overall supervisor rating using this data?

What variable(s) have the strongest correlation with the supervisor rating?

Are there any unusual observations?

Comments on the data?