Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto
- Slide 1
Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto
- Slide 2
Types of Correlation: positive correlation, negative correlation, no correlation
- Slide 3
Simple linear regression describes the linear relationship between a predictor variable, plotted on the x-axis, and a response variable, plotted on the y-axis.
Independent variable (X), dependent variable (Y)
- Slide 4
[Plot: Y against X]
- Slide 5
[Plot: Y against X]
- Slide 6
[Plot: Y against X]
- Slide 7
[Plot: Y against X]
- Slide 8
Fitting data to a linear model (equation labels: intercept, slope, residuals)
- Slide 9
How to fit data to a linear model? The Ordinary Least Squares (OLS) method
- Slide 10
Least Squares Regression
Residual ($e_i$) = $y_i - \hat{y}_i$
Sum of squares of residuals = $SSE = \sum_{i=1}^{n} e_i^2$
Model line: $\hat{y} = b_0 + b_1 x$; we must find values of $b_0$ and $b_1$ that minimise $SSE$.
- Slide 11
Regression Coefficients
- Slide 12
Required Statistics
- Slide 13
Descriptive Statistics
- Slide 14
Regression Statistics
- Slide 15
[Diagram: Y] Variance to be explained by predictors (SST)
- Slide 16
[Diagram: Y and X1] Variance NOT explained by X1 (SSE); variance explained by X1 (SSR)
- Slide 17
Regression Statistics
- Slide 18
Coefficient of determination, used to judge the adequacy of the regression model
- Slide 19
Regression Statistics
Correlation measures the strength of the linear association between two variables.
- Slide 20
Regression Statistics
Standard error for the regression model
- Slide 21
ANOVA
Source | df | SS | MS | F | P-value
Regression | 1 | SSR | SSR / df | MSR / MSE | P(F)
Residual | n-2 | SSE | SSE / df | |
Total | n-1 | SST | | |
If P(F) < α, then we get significantly better prediction of Y from the regression model than by just predicting the mean of Y.
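The least-squares fit and the ANOVA quantities from the slides above can be sketched in a few lines of plain Python. This is a minimal illustration, not the lecturer's own code: the function name `fit_slr`, the symbols `b0` and `b1`, and the five data points are assumptions made up for the example. The P-value column is left out because it needs an F distribution, which the standard library does not provide.

```python
def fit_slr(x, y):
    """Fit y = b0 + b1*x by ordinary least squares (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sums of squares and cross-products.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx              # slope
    b0 = y_bar - b1 * x_bar     # intercept
    y_hat = [b0 + b1 * xi for xi in x]
    # ANOVA decomposition: SST = SSR + SSE.
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    msr = ssr / 1               # regression df = 1
    mse = sse / (n - 2)         # residual df = n - 2
    return {
        "b0": b0, "b1": b1,
        "sst": sst, "ssr": ssr, "sse": sse,
        "r2": ssr / sst,        # coefficient of determination
        "std_err": mse ** 0.5,  # standard error of the regression
        "f": msr / mse,         # F statistic for significance of regression
    }


# Illustrative data only (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = fit_slr(x, y)
print("slope:", round(fit["b1"], 4), "intercept:", round(fit["b0"], 4))
print("R^2:", round(fit["r2"], 4), "F:", round(fit["f"], 1))
```

To turn the F statistic into P(F), a statistics library is needed; with SciPy installed, `scipy.stats.f.sf(F, 1, n - 2)` would be one option.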
ANOVA to test significance of regression
- Slide 22
Hypothesis Tests for Regression Coefficients
- Slide 23
Hypothesis Tests for Regression Coefficients
- Slide 24
Confidence Interval on Regression Coefficients
- Slide 25
Hypothesis Tests on Regression Coefficients
- Slide 26
Confidence Interval on Regression Coefficients
Confidence interval for the intercept
- Slide 27
Hypothesis Test for the Correlation Coefficient
We would reject the null hypothesis if [equation]
- Slide 28
Diagnostic Tests for Regressions
Expected distribution of residuals for a linear model with normal distribution of residuals (errors).
- Slide 29
Diagnostic Tests for Regressions
Residuals for a non-linear fit
- Slide 30
Diagnostic Tests for Regressions
Residuals for a quadratic function or polynomial
- Slide 31
Diagnostic Tests for Regressions
Residuals are not homogeneous (increasing in variance)
- Slide 32
Regression: important points
1. Ensure that the range of values sampled for the predictor variable is large enough to capture the full range of responses by the response variable.
- Slide 33
[Two plots: Y against X]
- Slide 34
Regression: important points
2. Ensure that the distribution of predictor values is approximately uniform within the sampled range.
- Slide 35
[Two plots: Y against X]
- Slide 36
Assumptions of Regression
1. The linear model correctly describes the functional relationship between X and Y.
- Slide 37
Assumptions of Regression
1. The linear model correctly describes the functional relationship between X and Y.
[Plot: Y against X]
- Slide 38
Assumptions of Regression
2. The X variable is measured without error.
[Plot: Y against X]
- Slide 39
Assumptions of Regression
3. For any given value of X, the sampled Y values are independent.
4. Residuals (errors) are normally distributed.
5. Variances are constant along the regression line.
- Slide 40
Multiple Linear Regression (MLR)
- Slide 41
The linear model with a single predictor variable X can easily be extended to two or more predictor variables.
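The hypothesis tests and confidence intervals on the slope and intercept named in the slides above could look roughly like this for the single-predictor case. It is a sketch under stated assumptions: the transcript does not preserve the slide formulas, so the code uses the standard textbook standard errors; the function name `coefficient_inference` and the data are invented for illustration; and the critical value `t_crit` must be supplied by the caller from a t-table (3.182 is the two-sided 5% value for 3 degrees of freedom).

```python
import math


def coefficient_inference(x, y, t_crit):
    """t statistics and confidence intervals for b0 and b1 (hypothetical helper).

    t_crit is the two-sided critical t value for n - 2 degrees of freedom,
    looked up by the caller; the standard library has no t distribution.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - 2)
    # Standard errors of the slope and the intercept.
    se_b1 = math.sqrt(mse / sxx)
    se_b0 = math.sqrt(mse * (1.0 / n + x_bar ** 2 / sxx))
    return {
        "t_b1": b1 / se_b1,  # tests H0: slope = 0
        "t_b0": b0 / se_b0,  # tests H0: intercept = 0
        "ci_b1": (b1 - t_crit * se_b1, b1 + t_crit * se_b1),
        "ci_b0": (b0 - t_crit * se_b0, b0 + t_crit * se_b0),
        "residuals": residuals,  # plot these against x for the diagnostic checks
    }


# Illustrative data only (not from the slides); n = 5, so df = 3.
out = coefficient_inference([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1], t_crit=3.182)
print("slope significant:", abs(out["t_b1"]) > 3.182)
print("intercept significant:", abs(out["t_b0"]) > 3.182)
```

A coefficient is judged significant when the absolute t statistic exceeds `t_crit`, which is equivalent to its confidence interval excluding zero. With one predictor, the squared slope t statistic equals the ANOVA F statistic.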
- Slide 42
[Diagram: Y, X1 and X2] Variance NOT explained by X1 and X2; unique variance explained by X1; unique variance explained by X2; common variance explained by X1 and X2
- Slide 43
[Diagram: Y, X1 and X2] A good model
- Slide 44
Partial Regression Coefficients
Partial regression coefficients (slopes): the regression coefficient of X after controlling for the influence of the other variables on both X and Y (that is, holding all other predictors constant).
(equation labels: intercept, residuals)
- Slide 45
The matrix algebra of Ordinary Least Squares
Predicted values: $\hat{Y} = Xb$
Residuals: $e = Y - \hat{Y}$
Intercept and slopes: $b = (X'X)^{-1}X'Y$
- Slide 46
Regression Statistics
How good is our model?
- Slide 47
Regression Statistics
Coefficient of determination, used to judge the adequacy of the regression model
- Slide 48
Regression Statistics
Adjusted R² values are not biased!
n = sample size, k = number of independent variables
- Slide 49
Regression Statistics
Standard error for the regression model
- Slide 50
ANOVA
Source | df | SS | MS | F | P-value
Regression | k | SSR | SSR / df | MSR / MSE | P(F)
Residual | n-k-1 | SSE | SSE / df | |
Total | n-1 | SST | | |
If P(F) < α, then we get significantly better prediction of Y from the regression model than by just predicting the mean of Y.
ANOVA to test significance of regression (at least one!)
- Slide 51
Hypothesis Tests for Regression Coefficients
- Slide 52
Hypothesis Tests for Regression Coefficients
- Slide 53
Confidence Interval on Regression Coefficients
- Slide 54
- Slide 55
- Slide 56
- Slide 57
- Slide 58
Diagnostic Tests for Regressions
Expected distribution of residuals for a linear model with normal distribution of residuals (errors).
- Slide 59
Standardized Residuals
- Slide 60
Model Selection
Avoiding predictors (Xs) that do not contribute significantly to model prediction
- Slide 61
  - Forward selection: the best predictor variables are entered, one by one.
  - Backward elimination: the worst predictor variables are eliminated, one by one.
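The matrix form of ordinary least squares and the adjusted R² from the multiple-regression slides above can be sketched without any external library. This is an illustrative sketch only: the transcript does not preserve the slide formulas, so the code assumes the standard normal equations (X'X)b = X'Y and the usual adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1). The helper names `solve_linear_system` and `fit_mlr` and the data are made up for the example, and the solver does not guard against a singular X'X.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(rows, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk (hypothetical helper)."""
    X = [[1.0] + list(r) for r in rows]  # leading 1 gives the intercept
    p = len(X[0])
    xtx = [[sum(xi[a] * xi[b] for xi in X) for b in range(p)] for a in range(p)]
    xty = [sum(xi[a] * yi for xi, yi in zip(X, y)) for a in range(p)]
    coef = solve_linear_system(xtx, xty)
    y_hat = [sum(bj * xj for bj, xj in zip(coef, xi)) for xi in X]
    n, k = len(y), p - 1
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return {"coef": coef, "r2": r2, "adj_r2": adj_r2, "sse": sse}


# Illustrative data only: roughly y = 1 + 2*x1 + 3*x2 plus small noise.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
y = [9.1, 7.9, 19.2, 17.8, 29.1, 27.9]
model = fit_mlr(rows, y)
print("coefficients:", [round(c, 3) for c in model["coef"]])
print("R^2:", round(model["r2"], 4), "adjusted R^2:", round(model["adj_r2"], 4))
```

Adjusted R² never exceeds R², and the gap widens as k grows relative to n, which is why it is the more useful figure when comparing models with different numbers of predictors.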
Model Selection
- Slide 62
Forward Selection
- Slide 63
Backward Elimination
- Slide 64
Model Selection: The General Case
Reject H0 if: [equation]
- Slide 65
Multicollinearity
The degree of correlation between Xs.
A high degree of multicollinearity produces unacceptable uncertainty (large variance) in regression coefficient estimates (i.e., large sampling variation).
Estimates of the slopes are imprecise, and even the signs of the coefficients may be misleading.
t-tests may fail to reveal significant factors.
- Slide 66
Scatter Plot
- Slide 67
Multicollinearity
If the F-test for significance of regression is significant, but the tests on the individual regression coefficients are not, multicollinearity may be present.
Variance Inflation Factors (VIFs) are very useful measures of multicollinearity. If any VIF exceeds 5, multicollinearity is a problem.
- Slide 68
Model Evaluation
Prediction Error Sum of Squares (PRESS, leave-one-out)
- Slide 69
Thank You!
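The two closing ideas, variance inflation factors and the leave-one-out prediction error sum of squares, can be illustrated with a short plain-Python sketch. The assumptions are worth stating: the VIF function covers only the two-predictor special case, where VIF = 1 / (1 - r²) with r the correlation between the two predictors; PRESS is shown for a single-predictor line that is refitted n times; and all function names and data are invented for the example.

```python
import math


def pearson_r(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def press(x, y):
    """Leave-one-out PRESS: refit without point i, then predict point i."""
    total = 0.0
    for i in range(len(x)):
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total


# Illustrative data only (not from the slides).
x1 = [1, 2, 3, 4, 5]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]  # nearly a copy of x1
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print("VIF above 5:", vif_two_predictors(x1, x2) > 5)
print("PRESS:", round(press(x1, y), 4))
```

PRESS is never smaller than the ordinary residual sum of squares of the full fit, because each point is predicted by a model that did not see it. A PRESS far above SSE suggests the model leans heavily on individual points.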