# Basic Statistics: Linear Regression

Posted on 18-Jan-2016

• Basic Statistics: Linear Regression

• X and Y: Simple Linear Regression

• Predicting Y from X

Recall that when we looked at scatter plots in our discussion of correlation, we showed in general terms how to estimate Y given a value for X when the correlation was not perfect. We will now look at how to use our knowledge of the correlation to predict a value for Y when we know a value for X.

• [Figure: Scatter Plot of Y and X. Horizontal axis: Variable X (low to high); vertical axis: Variable Y (low to high). The green line shows our prediction, or regression, line, with a label marking the estimated Y value.]

• Prediction Equation

The green line in the previous slide showed us our prediction line. We will use the mathematical formula for a straight line as the method for predicting a value for Y when we know the value for X. The process is called Linear Regression because, in this class, we will only deal with relationships that can be fitted by a straight line. The general formula for a straight line is:

$$Y = a + bX$$

• The Prediction Equation

$$\hat{Y} = a_y + b_y X$$

$a_y$ = the intercept, or where the prediction line crosses the Y-axis (the value of Y when X = 0)

$b_y$ = the regression coefficient, which indicates the amount of change in Y when the value of X increases by one unit

• A Simple Example

Suppose that a club charges a flat \$25 to use its facilities. It also charges a \$10 fee per hour for using the tennis courts. Now, assume that you want to play tennis for 2 hours at this club. How much would you have to pay?

Total cost = \$25 + (2)(\$10) = \$25 + \$20 = \$45 for two hours of tennis

• Linking the Simple Example to Regression

Total cost = \$25 + (2)(\$10) = \$25 + \$20 = \$45 for two hours of tennis

In our example:

\$25 is $a_y$, the intercept. Even if we didn't play any tennis (X = 0), it would cost \$25 to use the club.

\$10 is $b_y$, the regression coefficient (it costs \$10 for each hour of tennis played).

In this case we predicted how much it would cost (Y) when we knew how long we wanted to play tennis (X).
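The tennis example can be written as a straight-line function, which makes the roles of the intercept and the regression coefficient explicit. This is only an illustrative sketch; the function name `club_cost` and its parameter names are invented here and do not come from the slides.

```python
def club_cost(hours, flat_fee=25.0, hourly_rate=10.0):
    """Predicted cost (Y) for a given number of hours of tennis (X).

    flat_fee plays the role of the intercept (a_y): the cost when hours = 0.
    hourly_rate plays the role of the regression coefficient (b_y):
    the change in cost for each additional hour played.
    """
    return flat_fee + hourly_rate * hours


print(club_cost(2))  # 45.0, matching the $45 worked out above
print(club_cost(0))  # 25.0, the intercept: no tennis still costs the flat fee
```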

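• Formulae for Sums of Squares

These were introduced in our discussion of correlation.

$$SS_X = \sum X^2 - \frac{(\sum X)^2}{n}$$

$$SS_Y = \sum Y^2 - \frac{(\sum Y)^2}{n}$$

$$SS_{XY} = \sum XY - \frac{(\sum X)(\sum Y)}{n}$$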

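• Calculating the Regression Coefficient (b)

$$b = \frac{SS_{XY}}{SS_X}$$

or, written out in terms of the raw sums,

$$b = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \dfrac{(\sum X)^2}{n}}$$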

• Calculating the Intercept (a)

$$a = \bar{Y} - b\bar{X}$$

You will notice that you must calculate the regression coefficient (b) before you can calculate the intercept (a), since the calculation of a uses b.
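The order of the calculation (sums of squares first, then b, then a) can be sketched in a few lines of Python. The helper name `fit_line` is hypothetical, and the small data set below is simply the tennis-club costs for 0 to 3 hours, chosen because the answer (a = 25, b = 10) is already known.

```python
def fit_line(xs, ys):
    """Return (a, b) for the prediction line Y-hat = a + b * X."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)

    # Sum of squares for X and sum of cross-products of X and Y
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n

    b = ss_xy / ss_x                 # regression coefficient comes first
    a = sum_y / n - b * (sum_x / n)  # intercept uses b, so it comes second
    return a, b


# Hours played and the corresponding club cost in dollars
a, b = fit_line([0, 1, 2, 3], [25, 35, 45, 55])
print(a, b)  # 25.0 10.0
```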

• An Example

From our earlier example, suppose that our college statistics professor is interested in predicting how many errors students might make on the mid-term examination based on how many hours they studied. Specifically, the professor wants to know how many errors a student might make if the student studied for 5 hours.

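• The Stats Professor's Data

X = hours studied, Y = errors made on the mid-term.

| Student | X | Y | X² | Y² | XY |
|---|---|---|---|---|---|
| 1 | 4 | 15 | 16 | 225 | 60 |
| 2 | 4 | 12 | 16 | 144 | 48 |
| 3 | 5 | 9 | 25 | 81 | 45 |
| 4 | 6 | 10 | 36 | 100 | 60 |
| 5 | 7 | 8 | 49 | 64 | 56 |
| 6 | 7 | 4 | 49 | 16 | 28 |
| 7 | 7 | 6 | 49 | 36 | 42 |
| 8 | 9 | 2 | 81 | 4 | 18 |
| 9 | 9 | 4 | 81 | 16 | 36 |
| 10 | 12 | 3 | 144 | 9 | 36 |
| Total | ΣX = 70 | ΣY = 73 | ΣX² = 546 | ΣY² = 695 | ΣXY = 429 |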

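• The Resulting Sums of Squares

Using the totals from the data table (n = 10, ΣX = 70, ΣY = 73, ΣX² = 546, ΣY² = 695, ΣXY = 429):

$$SS_X = 546 - \frac{70^2}{10} = 546 - 490 = 56$$

$$SS_Y = 695 - \frac{73^2}{10} = 695 - 532.9 = 162.1$$

$$SS_{XY} = 429 - \frac{(70)(73)}{10} = 429 - 511 = -82$$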
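As a check on the arithmetic, the three sums of squares can be recomputed directly from the ten (X, Y) pairs in the professor's data. This is a minimal sketch; the variable names are chosen here for illustration.

```python
# Hours studied (X) and errors on the mid-term (Y) for the ten students
hours = [4, 4, 5, 6, 7, 7, 7, 9, 9, 12]
errors = [15, 12, 9, 10, 8, 4, 6, 2, 4, 3]
n = len(hours)

# Each line follows the pattern: raw sum minus (product of totals) / n
ss_x = sum(x * x for x in hours) - sum(hours) ** 2 / n                        # 546 - 490
ss_y = sum(y * y for y in errors) - sum(errors) ** 2 / n                      # 695 - 532.9
ss_xy = sum(x * y for x, y in zip(hours, errors)) - sum(hours) * sum(errors) / n  # 429 - 511

print(round(ss_x, 1), round(ss_y, 1), round(ss_xy, 1))  # 56.0 162.1 -82.0
```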

• Calculating the Regression Coefficient (b)

$$b = \frac{SS_{XY}}{SS_X} = \frac{-82}{56} = -1.46$$

This can be interpreted as the change in the value of Y (in our case, errors made on the mid-term) for a unit change in X, or for us, each additional hour studied! Thus, study for another hour and make 1.46 fewer mistakes (on average!).

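• Calculating the Intercept (a)

$$a = \bar{Y} - b\bar{X} = 7.3 - (-1.46)(7) = 7.3 + 10.25 = 17.55$$

Here $\bar{Y} = 73/10 = 7.3$ and $\bar{X} = 70/10 = 7$. The product 10.25 uses the unrounded value of b (−82/56 ≈ −1.4643).

Therefore, our prediction equation is

$$\hat{Y} = 17.55 + (-1.46)(X)$$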

• Using Our Prediction Equation

$$\hat{Y} = 17.55 + (-1.46)(X)$$

If the professor wanted to predict the number of errors a student might make if the student had studied for 5 hours, then we would substitute 5 for X in the above equation and obtain:

$$\hat{Y} = 17.55 + (-1.46)(5) = 17.55 + (-7.3) = 10.25$$

Thus, the professor would predict 10.25 errors for a student who had studied for 5 hours.
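The same steps can be sketched in Python, starting from the sums of squares and the two means. The function name `predict_errors` is made up for this illustration. Note that the code keeps b unrounded, so its prediction for 5 hours comes out at about 10.23 rather than the 10.25 obtained by hand with b rounded to −1.46.

```python
# Quantities taken from the worked example
ss_x, ss_xy = 56.0, -82.0
mean_x, mean_y = 7.0, 7.3

b = ss_xy / ss_x            # regression coefficient, about -1.46
a = mean_y - b * mean_x     # intercept, about 17.55


def predict_errors(hours_studied):
    """Predicted number of mid-term errors for a given number of study hours."""
    return a + b * hours_studied


print(round(b, 2), round(a, 2))
print(round(predict_errors(5), 2))  # about 10.23 with unrounded b
```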

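• Measuring Prediction Errors: The Standard Error of the Estimate

Since we know that the estimate is not exact, as statisticians we must report how much error we believe is in our estimate. The formula is:

$$\text{Standard Error of the Estimate} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}$$

OR

$$\text{Standard Error of the Estimate} = \sqrt{\frac{(1 - r^2)\,SS_Y}{n - 2}}$$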

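• Calculating the Standard Error of the Estimate

$$\text{Standard Error of the Estimate} = \sqrt{\frac{(1 - .74)(162.1)}{8}} = 2.29$$

Here $r^2 = .74$ is the squared correlation between X and Y for these data, $SS_Y = 162.1$, and $n - 2 = 10 - 2 = 8$.

Thus, when we estimated 10.25 errors, we also would report that the Standard Error of the Estimate is 2.29.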
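The standard error calculation can be reproduced from the three sums of squares. In this sketch, r² is computed as SS_XY² / (SS_X · SS_Y), the usual squared-correlation formula; that step is an assumption here, since the calculation above shows only the resulting value of .74.

```python
import math

# Sums of squares from the worked example, with n = 10 students
ss_x, ss_y, ss_xy, n = 56.0, 162.1, -82.0, 10

# Squared correlation between X and Y (about .74)
r_squared = ss_xy ** 2 / (ss_x * ss_y)

# Standard error of the estimate: sqrt((1 - r^2) * SS_Y / (n - 2))
std_error = math.sqrt((1 - r_squared) * ss_y / (n - 2))

print(round(r_squared, 2), round(std_error, 2))  # about 0.74 and 2.29
```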

• Summarizing Prediction Equations

The existence of a relationship between two variables allows us to use that knowledge to make predictions.

The prediction based on our equation will result in less error in prediction than using the mean of the dependent variable.

Two sums of squares (one for X, and one for the cross-products of X and Y) are required to calculate the regression coefficient and, through it, the intercept.
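The claim that the prediction equation produces less error than the mean of Y can be checked on the professor's data by comparing the squared prediction errors under each approach. This is a rough sketch with variable names chosen for illustration; it refits the line from the raw data so that it stands on its own.

```python
# Hours studied (X) and errors on the mid-term (Y) for the ten students
hours = [4, 4, 5, 6, 7, 7, 7, 9, 9, 12]
errors = [15, 12, 9, 10, 8, 4, 6, 2, 4, 3]
n = len(hours)
mean_x, mean_y = sum(hours) / n, sum(errors) / n

# Fit the prediction line: b first, then a
ss_x = sum((x - mean_x) ** 2 for x in hours)
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, errors))
b = ss_xy / ss_x
a = mean_y - b * mean_x

# Squared error when every student is predicted to score the mean of Y
sse_mean = sum((y - mean_y) ** 2 for y in errors)

# Squared error when each student is predicted from the regression line
sse_line = sum((y - (a + b * x)) ** 2 for x, y in zip(hours, errors))

print(round(sse_mean, 1), round(sse_line, 1))  # about 162.1 versus about 42.0
```

The first figure equals SS_Y; the second is the numerator of the standard error of the estimate before dividing by n − 2.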