Transcript
• 1. Simple Linear Regression Models. Hongwei Zhang, Performance Evaluation. http://www.cs.wayne.edu/~hzhang. Acknowledgement: this lecture is partially based on the slides of Dr. Raj Jain. "Statistics is the art of lying by means of figures." --- Dr. Wilhelm Stekhel
• 2. Simple linear regression models. Response variable: the variable being estimated. Predictor variables: variables used to predict the response; also called predictors or factors. Regression model: predicts a response for a given set of predictor variables. Linear regression models: the response is a linear function of the predictors. Simple linear regression models: only one predictor.
• 3. Outline: Definition of a Good Model; Estimation of Model Parameters; Allocation of Variation; Standard Deviation of Errors; Confidence Intervals for Regression Parameters; Confidence Intervals for Predictions; Visual Tests for Verifying Regression Assumptions
• 5. Definition of a good model?
• 6. Good models (contd.). Regression models attempt to minimize the distance, measured vertically, between the observation point and the model line (or curve). The length of this line segment is called the residual, the modeling error, or simply the error. The negative and positive errors should cancel out => zero overall error. Many lines satisfy this criterion. Choose the line that minimizes the sum of squares of the errors.
• 7. Good models (contd.). Formally, ŷ = b0 + b1 x, where ŷ is the predicted response when the predictor variable is x. The parameters b0 and b1 are fixed regression parameters to be determined from the data. Given n observation pairs {(x1, y1), ..., (xn, yn)}, the estimated response for the i-th observation is ŷi = b0 + b1 xi. The error is ei = yi − ŷi.
• 8. Good models (contd.). The best linear model minimizes the sum of squared errors (SSE): SSE = Σ ei², subject to the constraint that the overall mean error is zero: (1/n) Σ ei = 0. This is equivalent to the unconstrained minimization of the variance of errors (Exercise 14.1).
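The criterion above can be checked numerically. A minimal Python sketch (using a small hypothetical data set, not one from the lecture) shows that among lines with zero overall error, the least-squares line has the smallest SSE:

```python
def sse(xs, ys, b0, b1):
    """Sum of squared errors of the line yhat = b0 + b1*x over the data."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data set, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 12]

# Two lines through the centroid (x̄, ȳ) = (3, 7.2); both have zero mean error.
print(sse(xs, ys, 0.6, 2.2))   # the least-squares line: smaller SSE
print(sse(xs, ys, 1.2, 2.0))   # another zero-mean-error line: larger SSE
```

Both candidate lines pass through (x̄, ȳ), so their errors sum to zero; only the squared-error criterion distinguishes them.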
• 9. OutlineDefinition of a Good ModelEstimation of Model parametersAllocation of VariationAllocation of VariationStandard deviation of ErrorsConfidence Intervals for Regression ParametersConfidence Intervals for PredictionsVisual Tests for verifying Regression Assumption
• 10. Estimation of model parameters. The regression parameters that give minimum error variance are: b1 = (Σ xi yi − n x̄ ȳ) / (Σ xi² − n x̄²) and b0 = ȳ − b1 x̄, where x̄ = (1/n) Σ xi and ȳ = (1/n) Σ yi.
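These formulas translate directly into code. The sketch below uses a hypothetical data set (not the textbook's) purely to exercise the computation:

```python
def fit_line(xs, ys):
    """Least-squares estimates of b0 and b1 in yhat = b0 + b1*x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar   # Σxy − n·x̄·ȳ
    sxx = sum(x * x for x in xs) - n * xbar * xbar               # Σx² − n·x̄²
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# Hypothetical data: roughly y = 2x + 1 with a little noise.
b0, b1 = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 12])
print(b0, b1)   # roughly 0.6 and 2.2
```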
• 11. Example 14.1
• 12. Example (contd.)
• 13. Example (contd.)
• 14. Derivation of regression parameters?
• 15. Derivation (contd.)
• 16. Derivation (contd.)
• 17. Least Squares Regression vs. Least Absolute Deviations Regression?
Least Squares Regression: not very robust to outliers; simple analytical solution; stable solution; always one unique solution.
Least Absolute Deviations Regression: robust to outliers; no analytical solving method (have to use an iterative, computation-intensive method); unstable solution; possibly multiple solutions.
The unstable property of the method of least absolute deviations means that, for a small horizontal adjustment of a data point, the regression line may jump a large amount. In contrast, the least squares solution is stable in that, for any small horizontal adjustment of a data point, the regression line will always move only slightly, i.e., continuously.
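The robustness contrast can be illustrated numerically. The sketch below fits both criteria to a hypothetical data set containing one gross outlier; it relies on the property that, for simple regression, an optimal least-absolute-deviations line can always be chosen to pass through at least two of the data points, so for small n a brute-force search over point pairs suffices:

```python
from itertools import combinations

def least_squares(xs, ys):
    """Analytical least-squares fit: returns (b0, b1)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b1 = (sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar) / \
         (sum(x * x for x in xs) - n * xbar * xbar)
    return ybar - b1 * xbar, b1

def least_absolute_deviations(xs, ys):
    """Exact LAD fit for small n by enumerating lines through point pairs."""
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue
        b1 = (y2 - y1) / (x2 - x1)
        b0 = y1 - b1 * x1
        sad = sum(abs(y - (b0 + b1 * x)) for x, y in zip(xs, ys))
        if best is None or sad < best[0]:
            best = (sad, b0, b1)
    return best[1], best[2]

# Hypothetical data: y = 2x with one gross outlier at x = 5.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 30]
print(least_squares(xs, ys))               # slope pulled far from 2 by the outlier
print(least_absolute_deviations(xs, ys))   # recovers the y = 2x trend
```

Note that for realistic data sizes LAD is solved iteratively (e.g., via linear programming), as the slide says; the pair enumeration here is only a small-n illustration.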
• 18. Outline: Definition of a Good Model; Estimation of Model Parameters; Allocation of Variation; Standard Deviation of Errors; Confidence Intervals for Regression Parameters; Confidence Intervals for Predictions; Visual Tests for Verifying Regression Assumptions
• 19. Allocation of variation
• 20. Allocation of variation (contd.). The sum of squared errors without regression would be: SST = Σ (yi − ȳ)². This is called the total sum of squares (SST). It is a measure of y's variability and is called the variation of y. SST can be computed as follows: SST = SSY − SS0, where SSY = Σ yi² is the sum of squares of y, and SS0 = n ȳ² is the sum of squares of ȳ.
• 21. Allocation of variation (contd.). SSE: variation not explained by the regression.
• 22. Allocation of variation (contd.)
• 23. Example. For the disk I/O-CPU time data of Example 14.1: the regression explains 97% of the variation in CPU time.
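The allocation above can be computed in a few lines. The sketch below uses the hypothetical data fitted earlier (b0 = 0.6, b1 = 2.2), not the textbook's disk I/O data; the last return value is the fraction of variation explained (the coefficient of determination):

```python
def allocation_of_variation(xs, ys, b0, b1):
    """Split the total variation of y into explained and unexplained parts."""
    n = len(ys)
    ybar = sum(ys) / n
    ssy = sum(y * y for y in ys)        # SSY: sum of squares of y
    ss0 = n * ybar * ybar               # SS0: n·ȳ²
    sst = ssy - ss0                     # SST = SSY − SS0, total variation
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ssr = sst - sse                     # variation explained by the regression
    return sst, sse, ssr, ssr / sst

sst, sse, ssr, r2 = allocation_of_variation([1, 2, 3, 4, 5],
                                            [3, 5, 7, 9, 12], 0.6, 2.2)
print(r2)   # fraction of variation explained by the regression
```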
• 24. Outline: Definition of a Good Model; Estimation of Model Parameters; Allocation of Variation; Standard Deviation of Errors; Confidence Intervals for Regression Parameters; Confidence Intervals for Predictions; Visual Tests for Verifying Regression Assumptions
• 25. Standard deviation of errors. Since errors are obtained after calculating two regression parameters from the data, the errors have n−2 degrees of freedom. SSE/(n−2) is called the mean squared error (MSE). The standard deviation of errors is the square root of MSE. Note: SSY has n degrees of freedom since it is obtained from n independent observations without estimating any parameters. SS0 has just one degree of freedom since it can be computed simply from ȳ. SST has n−1 degrees of freedom, since one parameter (ȳ) must be calculated from the data before SST can be computed.
• 26. Standard deviation of errors (contd.). SSR, which is the difference between SST and SSE, has the remaining one degree of freedom. Overall, SST = SSR + SSE, with degrees of freedom n−1 = 1 + (n−2). Notice that the degrees of freedom add just the way the sums of squares do.
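A minimal sketch of the MSE and error standard deviation, again on the hypothetical data fitted earlier:

```python
import math

def error_stddev(xs, ys, b0, b1):
    """Standard deviation of errors: sqrt(MSE), with MSE = SSE/(n-2)."""
    n = len(xs)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    mse = sse / (n - 2)   # errors lose 2 degrees of freedom to b0 and b1
    return math.sqrt(mse)

print(error_stddev([1, 2, 3, 4, 5], [3, 5, 7, 9, 12], 0.6, 2.2))
```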
• 27. Example
• 28. Outline: Definition of a Good Model; Estimation of Model Parameters; Allocation of Variation; Standard Deviation of Errors; Confidence Intervals for Regression Parameters; Confidence Intervals for Predictions; Visual Tests for Verifying Regression Assumptions
• 29. CIs for regression parameters. Regression coefficients b0 and b1 are estimates from a single sample of size n => 1) they are random; 2) using another sample, the estimates may be different. If β0 and β1 are the true parameters of the population (i.e., y = β0 + β1 x), then the computed coefficients b0 and b1 are estimates of β0 and β1, respectively. Sample standard deviations of b0 and b1: sb0 = se [1/n + x̄²/(Σ xi² − n x̄²)]^(1/2) and sb1 = se / (Σ xi² − n x̄²)^(1/2), where se is the standard deviation of errors.
• 30. CIs for regression parameters (contd.). The 100(1−α)% confidence intervals for b0 and b1 can be computed using t[1−α/2; n−2], the 1−α/2 quantile of a t-variate with n−2 degrees of freedom. The confidence intervals are: b0 ∓ t[1−α/2; n−2] sb0 and b1 ∓ t[1−α/2; n−2] sb1. If a confidence interval includes zero, then the regression parameter cannot be considered different from zero at the 100(1−α)% confidence level.
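A sketch of the interval computation, on the hypothetical data fitted earlier. The quantile t[1−α/2; n−2] is passed in from a t-table (for example, 2.353 for a 90% interval with 3 degrees of freedom):

```python
import math

def param_conf_intervals(xs, ys, b0, b1, t_quantile):
    """Confidence intervals for b0 and b1; t_quantile = t[1-a/2; n-2]."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum(x * x for x in xs) - n * xbar * xbar
    se = math.sqrt(sum((y - (b0 + b1 * x)) ** 2
                       for x, y in zip(xs, ys)) / (n - 2))
    sb0 = se * math.sqrt(1 / n + xbar ** 2 / sxx)   # std dev of b0
    sb1 = se / math.sqrt(sxx)                       # std dev of b1
    return ((b0 - t_quantile * sb0, b0 + t_quantile * sb0),
            (b1 - t_quantile * sb1, b1 + t_quantile * sb1))

ci_b0, ci_b1 = param_conf_intervals([1, 2, 3, 4, 5], [3, 5, 7, 9, 12],
                                    0.6, 2.2, 2.353)
print(ci_b0)   # includes zero: intercept not significant here
print(ci_b1)   # excludes zero: slope is significant
```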
• 31. Example
• 32. Example (contd.). The 0.95-quantile of a t-variate with 5 degrees of freedom is 2.015 => the 90% confidence interval for b0 is: Since the confidence interval includes zero, the hypothesis that this parameter is zero cannot be rejected at the 0.10 significance level => b0 is essentially zero. => The 90% confidence interval for b1 is: Since the confidence interval does not include zero, the slope b1 is significantly different from zero at this confidence level.
• 33. Case study 14.1: remote procedure call
• 34. Case study (contd.)
• 35. Case study (contd.)
• 36. Case study (contd.). The best linear models are shown; the regressions explain 81% and 75% of the variation, respectively. Does ARGUS take a larger time per byte as well as a larger set-up time per call than UNIX?
• 37. Case study (contd.)? The intervals for the intercepts overlap while those of the slopes do not. => The set-up times are not significantly different in the two systems, while the per-byte times (slopes) are different.
• 38. Outline: Definition of a Good Model; Estimation of Model Parameters; Allocation of Variation; Standard Deviation of Errors; Confidence Intervals for Regression Parameters; Confidence Intervals for Predictions; Visual Tests for Verifying Regression Assumptions
• 39. CI for predictions
• 40. CI for predictions (contd.)
• 41. CI for predictions (contd.). The standard deviation of the prediction is minimal at the center of the measured range (i.e., when x = x̄); the goodness of the prediction decreases as we move away from the center.
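This behavior can be seen numerically. The sketch below computes the standard deviation of a prediction at a point xp, following the usual form se·[1/m + 1/n + (xp − x̄)²/(Σx² − n x̄²)]^(1/2) for the mean of m future observations, on the hypothetical data fitted earlier (the CI itself would then use t[1−α/2; n−2] as before):

```python
import math

def prediction_stddev(xs, ys, b0, b1, xp, m=1):
    """Std dev of the mean of m future observations predicted at xp."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum(x * x for x in xs) - n * xbar * xbar
    se = math.sqrt(sum((y - (b0 + b1 * x)) ** 2
                       for x, y in zip(xs, ys)) / (n - 2))
    return se * math.sqrt(1 / m + 1 / n + (xp - xbar) ** 2 / sxx)

data_x, data_y = [1, 2, 3, 4, 5], [3, 5, 7, 9, 12]
print(prediction_stddev(data_x, data_y, 0.6, 2.2, 3.0))  # at x̄: smallest
print(prediction_stddev(data_x, data_y, 0.6, 2.2, 5.0))  # away from x̄: larger
```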
• 42. Example
• 43. Example (contd.)
• 44. Example (contd.)
• 45. Outline: Definition of a Good Model; Estimation of Model Parameters; Allocation of Variation; Standard Deviation of Errors; Confidence Intervals for Regression Parameters; Confidence Intervals for Predictions; Visual Tests for Verifying Regression Assumptions
• 46. Visual tests for regression assumptions. Regression assumptions: The true relationship between the response variable y and the predictor variable x is linear. The predictor variable x is non-stochastic and is measured without any error. The model errors are statistically independent. The errors are normally distributed with zero mean and a constant standard deviation.
• 47. Visual test for linear relationship
• 48. Visual test for independent errors. Any trend would imply the dependence of errors on the predictor variable => use a curvilinear model or a transformation. In practice, dependence can be proven, yet independence cannot.
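Before such a plot can be drawn, the (predicted, error) pairs have to be computed. A minimal sketch on the hypothetical data fitted earlier (plotting itself, e.g. with matplotlib, is left out); any visible trend in the scatter of these pairs would suggest a curvilinear model or a transformation:

```python
def residual_plot_points(xs, ys, b0, b1):
    """(predicted, error) pairs to scatter-plot when checking for trends."""
    pts = []
    for x, y in zip(xs, ys):
        yhat = b0 + b1 * x        # predicted response
        pts.append((yhat, y - yhat))
    return pts

for yhat, err in residual_plot_points([1, 2, 3, 4, 5], [3, 5, 7, 9, 12],
                                      0.6, 2.2):
    print(yhat, err)
```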
• 49. Visual test for independent errors (contd.). Any trend would imply that other factors (such as environmental conditions or side effects) should be considered in the modeling.
• 50. Visual test for normal distribution of errors?
• 51. Visual test for constant standard deviationof errors
• 52. Example
• 53. Another example: RPC performance
• 54. Summary: Definition of a Good Model; Estimation of Model Parameters; Allocation of Variation; Standard Deviation of Errors; Confidence Intervals for Regression Parameters & Predictions; Visual Tests for Verifying Regression Assumptions
• 55. Homework #5. 1. (100 points)