Chapter 12: Linear Regression 1

Upload: david-wells

Post on 16-Jan-2016


Page 1: Chapter 12: Linear Regression 1. Introduction Regression analysis and Analysis of variance are the two most widely used statistical procedures. Regression

Chapter 12:

Linear Regression


Introduction

• Regression analysis and Analysis of variance are the two most widely used statistical procedures.

• Regression analysis is used for:
– Description
– Prediction
– Estimation


12.1 Simple Linear Regression

• In (univariate) regression, there is always a single “dependent” variable and one or more “independent” variables.
– Example: the number of non-conforming units depends on the amount of time devoted to maintaining control charts.
• “Simple” denotes that a single independent variable is being used.
• “Linear” refers to the parameters, not to the independent variables.

$Y = \beta_0 + \beta_1 X + \varepsilon$ (12.1)

(12.2)


12.1 Simple Linear Regression

• $Y = \beta_0 + \beta_1 X$ is the general form of the equation for a straight line.
• $\varepsilon$ indicates that there is not an exact relationship between X and Y.
• Regression analysis is not used for variables that have an exact linear relationship.
• $\beta_0$ and $\beta_1$ are generally unknown and must be estimated.
• $\varepsilon$ is generally thought of as an error term.
• Let Y denote the number of non-conforming units produced each month, and let X represent the amount of time devoted to using QC charts each month.


Table 12.1 Quality Improvement Data

Month       Time Devoted to Quality Improvement (hours)   Number of Non-conforming Units
January     56                                            20
February    58                                            19
March       55                                            20
April       62                                            16
May         63                                            15
June        68                                            14
July        66                                            15
August      68                                            13
September   70                                            10
October     67                                            13
November    72                                             9
December    74                                             8


Figure 12.1 Scatter Plot


Figure 12.1a Scatter Plot


12.1 Simple Linear Regression

• Regression equation: a line through the center of the points that minimizes the sum of the squares of the deviations from each point to the line (the method of least squares).

• $\sum (Y - \hat{Y})^2$ is to be minimized, where $\hat{Y} = \hat\beta_0 + \hat\beta_1 X$.

• Round-off error
• Prediction equation
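The least-squares estimates can be sketched in a few lines of Python. This is a minimal illustration, not the author's procedure: the function name `least_squares` is made up for the example, and the December time is taken as 74 hours, the value that appears in the observation listing of Section 12.2 and that reproduces the reported coefficients.

```python
def least_squares(x, y):
    """Return (b0, b1) minimizing sum((y - b0 - b1*x)**2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sums of cross-products and squares
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx           # slope estimate
    b0 = y_bar - b1 * x_bar    # intercept estimate
    return b0, b1


# Table 12.1 data (December time assumed to be 74 hours)
hours = [56, 58, 55, 62, 63, 68, 66, 68, 70, 67, 72, 74]
nonconforming = [20, 19, 20, 16, 15, 14, 15, 13, 10, 13, 9, 8]

b0, b1 = least_squares(hours, nonconforming)
print(f"Y-hat = {b0:.3f} + ({b1:.5f}) X")
```

Carrying more digits than the rounded equation Y = 55.9 - 0.641 X matters when the fitted values are computed by hand, which is presumably the round-off concern noted above.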


12.1 Simple Linear Regression

The regression equation is
Y = 55.9 - 0.641 X

Predictor   Coef       SE Coef   T        P
Constant    55.923     2.824     19.80    0.000
X           -0.64067   0.04332   -14.79   0.000

S = 0.888854   R-Sq = 95.6%   R-Sq(adj) = 95.2%

Analysis of Variance

Source           DF   SS       MS       F        P
Regression        1   172.77   172.77   218.67   0.000
Residual Error   10     7.90     0.79
Total            11   180.67
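The quantities in this output can be reproduced from the raw data with the standard sums-of-squares decomposition. The sketch below is an illustration under that assumption; the variable names are chosen for the example and the December time is taken as 74 hours.

```python
import math

# Table 12.1 data (December time assumed to be 74 hours)
x = [56, 58, 55, 62, 63, 68, 66, 68, 70, 67, 72, 74]
y = [20, 19, 20, 16, 15, 14, 15, 13, 10, 13, 9, 8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = s_xy / s_xx
ss_total = s_yy                       # Total SS, n - 1 d.f.
ss_regression = b1 * s_xy             # Regression SS, 1 d.f.
ss_error = ss_total - ss_regression   # Residual Error SS, n - 2 d.f.
ms_error = ss_error / (n - 2)
f_stat = ss_regression / ms_error
s = math.sqrt(ms_error)               # "S" in the output

# Standard errors of the slope and the intercept
se_b1 = s / math.sqrt(s_xx)
se_b0 = s * math.sqrt(sum(xi ** 2 for xi in x) / (n * s_xx))

print(f"SSR={ss_regression:.2f} SSE={ss_error:.2f} F={f_stat:.2f} S={s:.6f}")
print(f"SE(b0)={se_b0:.3f} SE(b1)={se_b1:.5f}")
```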


12.1 Simple Linear Regression

• Prediction: the prediction equation should only be used for values of X within the range of the data, or slightly outside it.

• Description:
– A decrease of 0.64 non-conforming units for every additional hour devoted to quality improvement


12.2 Worth of the Prediction Equation

Obs   X      Y        Fit      SE Fit   Residual   St Resid
1     56.0   20.000   20.046   0.464    -0.046     -0.06
2     58.0   19.000   18.765   0.395     0.235      0.30
3     55.0   20.000   20.687   0.500    -0.687     -0.93
4     62.0   16.000   16.202   0.286    -0.202     -0.24
5     63.0   15.000   15.561   0.270    -0.561     -0.66
6     68.0   14.000   12.358   0.289     1.642      1.95
7     66.0   15.000   13.639   0.261     1.361      1.60
8     68.0   13.000   12.358   0.289     0.642      0.76
9     70.0   10.000   11.077   0.338    -1.077     -1.31
10    67.0   13.000   12.999   0.272     0.001      0.00
11    72.0    9.000    9.795   0.400    -0.795     -1.00
12    74.0    8.000    8.514   0.470    -0.514     -0.68
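The columns of this listing can be recomputed from the data. The sketch below assumes the usual simple-regression formulas: the standard error of a fitted value is s times the square root of the leverage, and a standardized residual is the residual divided by s times the square root of one minus the leverage. Variable names are illustrative.

```python
import math

x = [56, 58, 55, 62, 63, 68, 66, 68, 70, 67, 72, 74]
y = [20, 19, 20, 16, 15, 14, 15, 13, 10, 13, 9, 8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

# Residual standard deviation with n - 2 degrees of freedom
s = math.sqrt(sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))

rows = []
for xi, yi in zip(x, y):
    fit = b0 + b1 * xi
    leverage = 1 / n + (xi - x_bar) ** 2 / s_xx
    se_fit = s * math.sqrt(leverage)
    resid = yi - fit
    st_resid = resid / (s * math.sqrt(1 - leverage))
    rows.append((xi, yi, fit, se_fit, resid, st_resid))

for obs, row in enumerate(rows, start=1):
    print(obs, *(f"{value:8.3f}" for value in row))
```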


12.2 Worth of the Prediction Equation

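• Pure error: data points with the same X but different Y values constitute pure error, since the regression line cannot be vertical.

• Measure of the worth of the prediction equation:

$R^2 = \dfrac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2}$ (12.4)

• Since $\sum (Y - \bar{Y})^2 = \sum (\hat{Y} - \bar{Y})^2 + \sum (Y - \hat{Y})^2$, it follows that $0 \le R^2 \le 1$.

• If $\hat\beta_1 = 0$ (no relationship between X and Y), $R^2 = 0$.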
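As a worked check, and assuming the usual definition of $R^2$ as the regression sum of squares divided by the total sum of squares, the values in the analysis of variance table of Section 12.1 give the reported R-Sq and R-Sq(adj), with n = 12 observations:

```latex
R^2 = \frac{SS_{\text{Regression}}}{SS_{\text{Total}}}
    = \frac{172.77}{180.67} \approx 0.956,
\qquad
R^2_{\text{adj}} = 1 - \frac{SS_{\text{Error}}/(n-2)}{SS_{\text{Total}}/(n-1)}
                 = 1 - \frac{7.90/10}{180.67/11} \approx 0.952
```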


12.3 Assumptions

• The true relationship between X and Y can be adequately represented by the model

$Y = \beta_0 + \beta_1 X + \varepsilon$ (12.1)

• The errors should be independent.
• The errors are approximately normally distributed.


12.4 Checking Assumptions through Residual Plots

• The residuals should be plotted against:
– X or $\hat{Y}$
– Time
– Any other variable

• In a satisfactory residual plot:
– All points are close to the midline
– The points form a tight cluster that can be enclosed in a rectangle

• If there are residual outliers, investigate them.
• If the error variance increases or decreases, this problem can be remedied by a transformation of X.
• If the residuals take the form of a parabola, an $X^2$ term would probably be needed.


12.5 Confidence Intervals

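• Assumption: normality of the error terms. If the assumption is doubtful, alternatives are:
– Robust regression
– Non-parametric regression

• Confidence interval for $\beta_1$: $\hat\beta_1 \pm t_{\alpha/2,\,n-2}\; s_{\hat\beta_1}$

• Confidence interval for $\beta_0$: $\hat\beta_0 \pm t_{\alpha/2,\,n-2}\; s_{\hat\beta_0}$

where $s_{\hat\beta_1} = s/\sqrt{S_{xx}}$, $s_{\hat\beta_0} = s\sqrt{\sum X^2/(n\,S_{xx})}$, $S_{xx} = \sum (X - \bar{X})^2$, and $s = \sqrt{MSE}$.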


12.5 Hypothesis Test

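• Hypothesis test for $\beta_1$: $H_0\colon \beta_1 = 0$ versus $H_1\colon \beta_1 \neq 0$

$t = \hat\beta_1 / s_{\hat\beta_1}$, with $n - 2$ degrees of freedom

where $s_{\hat\beta_1} = s/\sqrt{S_{xx}}$ and $S_{xx} = \sum (X - \bar{X})^2$.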
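As a worked check, assuming the test statistic is the slope estimate divided by its standard error, the Coef and SE Coef values for X in the Section 12.1 output reproduce the reported T value. The critical value 2.228 is the tabulated t value for 10 degrees of freedom at the two-sided 5% level.

```latex
t = \frac{\hat\beta_1}{s_{\hat\beta_1}} = \frac{-0.64067}{0.04332} \approx -14.79,
\qquad
|t| = 14.79 > t_{0.025,\,10} = 2.228
\;\Rightarrow\; \text{reject } H_0\colon \beta_1 = 0
```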


12.6 Prediction Interval for Y

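• A $100(1-\alpha)\%$ prediction interval for Y at $X = x_0$:

$\hat{Y}_0 \pm t_{\alpha/2,\,n-2}\; s\sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{X})^2}{S_{xx}}}$

where $s = \sqrt{MSE}$ and $S_{xx} = \sum (X - \bar{X})^2$.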
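A minimal sketch of a 95% prediction interval for a new observation, assuming the standard simple-regression form (fitted value plus or minus t times s times the square root of 1 + 1/n + (x0 - x-bar)^2 / Sxx). The value x0 = 65 hours is a hypothetical input chosen inside the data range, the critical value 2.228 is the tabulated t value for 10 degrees of freedom, and the function name is illustrative.

```python
import math

T_CRIT_95_DF10 = 2.228  # assumed: tabulated t value, 10 d.f., upper-tail area 0.025

hours = [56, 58, 55, 62, 63, 68, 66, 68, 70, 67, 72, 74]
nonconforming = [20, 19, 20, 16, 15, 14, 15, 13, 10, 13, 9, 8]


def prediction_interval(x, y, x0, t_crit):
    """Return (lower, fit, upper) for a single new Y observed at x0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    fit = b0 + b1 * x0
    # The leading 1 accounts for the variability of the new observation itself
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)
    return fit - half_width, fit, fit + half_width


lower, fit, upper = prediction_interval(hours, nonconforming, 65, T_CRIT_95_DF10)
print(f"fit = {fit:.2f}, 95% PI = ({lower:.2f}, {upper:.2f})")
```

The interval is narrowest when x0 equals the mean of X and widens as x0 moves away from it, which is one reason to avoid predicting far outside the data range.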


12.7 Regression Control Chart


• To monitor the dependent variable using a control chart approach

• The center line is

• Control Limits for

Where and

(12.5)

(12.6)


12.8 Cause-Selecting Control Chart

• The general idea is to distinguish quality problems that occur at one stage in a process from problems that occur at a previous processing step.

• Let Y be the output from the second step and let X denote the output from the first step. The relationship between X and Y would be modeled.


12.9 Linear, Nonlinear, and Nonparametric Profiles


• Profile refers to the quality of a process or product being characterized by a (Linear, Nonlinear, or Nonparametric) relationship between a response variable and one or more explanatory variables.

• One possible approach is to monitor each parameter in the model with a Shewhart chart.
– The independent variables must be fixed
– Control chart for $R^2$


12.10 Inverse Regression


• An important application of simple linear regression for quality improvement is in the area of calibration.

• Assume two measuring tools are available: one is quite accurate but expensive to use, and the other is less expensive but also less accurate. If the measurements obtained from the two devices are highly correlated, then the measurement that would have been made using the expensive device could be predicted fairly well from the measurement made using the less expensive device.

• Let
Y = measurement from the less expensive device
X = measurement from the accurate device


12.10 Inverse Regression

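Classical estimation approach
• First, regress Y on X to obtain $\hat{Y} = \hat\beta_0 + \hat\beta_1 X$.
• Solve for X: $X = (\hat{Y} - \hat\beta_0)/\hat\beta_1$.
• For a known value of Y, $Y_c$, the equation is $\hat{X}_c = (Y_c - \hat\beta_0)/\hat\beta_1$.

Inverse regression (X is regressed on Y)
• $\hat{X}_c^* = \hat\beta_0^* + \hat\beta_1^* Y_c$

• $\hat{X}_c = \hat{X}_c^*$ if X and Y were perfectly correlated.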


12.10 Inverse Regression: Example

Classical estimation approach
• First, regress Y on X, to obtain

Inverse regression (X is regressed on Y)

• At Y X

Y     X
2.3   2.4
2.5   2.6
2.4   2.5
2.8   2.9
2.9   3.0
2.6   2.7
2.4   2.5
2.2   2.3
2.1   2.2
2.7   2.7
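The two calibration approaches can be compared on these ten pairs. The sketch below is an illustration only: it fits both regressions by least squares and evaluates them at Y = 2.6, a hypothetical reading chosen for the example because the value used on the slide is not recoverable from this transcript. Function and variable names are made up.

```python
def fit_line(u, v):
    """Least-squares intercept and slope for regressing v on u."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    slope = s_uv / s_uu
    return v_bar - slope * u_bar, slope


# Y: less expensive device, X: accurate device (data from the table above)
y = [2.3, 2.5, 2.4, 2.8, 2.9, 2.6, 2.4, 2.2, 2.1, 2.7]
x = [2.4, 2.6, 2.5, 2.9, 3.0, 2.7, 2.5, 2.3, 2.2, 2.7]

y_c = 2.6  # hypothetical reading from the less expensive device

# Classical approach: regress Y on X, then solve the fitted line for X
b0, b1 = fit_line(x, y)
x_classical = (y_c - b0) / b1

# Inverse regression: regress X on Y and predict X directly
a0, a1 = fit_line(y, x)
x_inverse = a0 + a1 * y_c

print(f"Y on X: Y-hat = {b0:.5f} + {b1:.5f} X  ->  X-hat = {x_classical:.4f}")
print(f"X on Y: X-hat = {a0:.5f} + {a1:.5f} Y  ->  X-hat = {x_inverse:.4f}")
```

The two estimates are close but not identical here because the correlation between X and Y is high but not perfect.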


12.11 Multiple Linear Regression

• In multiple regression, there is more than one “independent” variable.


12.12 Issues in Multiple Regression
12.12.1 Variable Selection

• $R^2$ will virtually always increase when additional variables are added to a prediction equation.

• $\mathrm{Var}(\hat{Y})$ increases when new regressors are added.
• A commonly used statistic for determining the number of parameters is $C_p$:

$C_p = \dfrac{SSE_p}{\hat\sigma^2_{\text{full}}} - n + 2p$

where p is the number of parameters in the model, $SSE_p$ is the residual sum of squares for that model, $\hat\sigma^2_{\text{full}}$ is the estimate of the error variance using all the available regressors, and n is the number of observations.

• The idea is to look hard at those prediction equations for which Cp is small and close to p.
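A small sketch of the statistic, assuming the standard Mallows form $C_p = SSE_p/\hat\sigma^2_{\text{full}} - n + 2p$. The numbers are hypothetical and serve only to show the arithmetic; they are not from the chapter. The function name `mallows_cp` is illustrative.

```python
def mallows_cp(sse_p, sigma2_full, n, p):
    """Mallows' Cp for a candidate model with p parameters."""
    return sse_p / sigma2_full - n + 2 * p


# Hypothetical example: 20 observations, full model with 5 parameters
n = 20
p_full, sse_full = 5, 30.0
sigma2_full = sse_full / (n - p_full)   # error variance from the full model

cp_full = mallows_cp(sse_full, sigma2_full, n, p_full)   # equals p_full by construction
cp_candidate = mallows_cp(36.0, sigma2_full, n, 3)       # 3-parameter candidate model

print(cp_full, cp_candidate)
```

By construction the full model always has $C_p$ equal to its own number of parameters, so the statistic is informative only for comparing the smaller candidate models.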


12.12.3 Multicollinear Data

• Problems occur when at least two of the regressors are related in some manner.

• Solutions:
– Discard one or more of the variables causing the multicollinearity
– Use ridge regression

12.12.4 Residual Plots

• Residual plots are used extensively in multiple regression for checking on the model assumptions

• The residuals should generally be plotted against $\hat{Y}$, each of the regressors, time, and any potential regressor.

12.12.6 Transformations

• A regression model can often be improved by transforming one or more of the regressors, and possibly the dependent variable as well.

• Transformation can also often be used to transform a nonlinear regression model into a linear one.

• For example, can be transformed into a linear model
