corelation and regression

Upload: jaydev-chakraborty

Post on 03-Jun-2018

TRANSCRIPT

  • 8/12/2019 Corelation and Regression

    1/33

    Elementary Statistics, Larson/Farber

    9 Correlation and Regression

  • Correlation

    Section 9.1

  • Correlation

    What type of relationship exists between the two variables, and is the correlation significant?

    A relationship between two variables, where x is the explanatory (independent) variable and y is the response (dependent) variable:

    x = Hours of Training, y = Number of Accidents
    x = Shoe Size, y = Height
    x = Cigarettes smoked per day, y = Lung Capacity
    x = Score on SAT, y = Grade Point Average
    x = Height, y = IQ

  • Scatter Plots and Types of Correlation

    Negative correlation: as x increases, y decreases.

    x = hours of training, y = number of accidents

    [Scatter plot: Accidents (0 to 60) versus Hours of Training (0 to 20)]

  • Scatter Plots and Types of Correlation

    Positive correlation: as x increases, y increases.

    x = SAT score, y = GPA

    [Scatter plot: GPA (1.50 to 4.00) versus Math SAT (300 to 800)]

  • Scatter Plots and Types of Correlation

    No linear correlation

    x = height, y = IQ

    [Scatter plot: IQ (80 to 160) versus Height (60 to 80)]

  • Correlation Coefficient

    A measure of the strength and direction of a linear relationship between two variables.

    The range of r is from −1 to 1.

    If r is close to 1, there is a strong positive correlation.
    If r is close to −1, there is a strong negative correlation.
    If r is close to 0, there is no linear correlation.

    [Number line from −1 through 0 to 1]

  • Application

    Absences x    Final Grade y
    8             78
    2             92
    5             90
    12            58
    15            43
    9             74
    6             81

    [Scatter plot: Final Grade (40 to 95) versus Absences (0 to 16)]

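  • Computation of r

    i    x     y     xy      x²     y²
    1    8     78    624     64     6084
    2    2     92    184     4      8464
    3    5     90    450     25     8100
    4    12    58    696     144    3364
    5    15    43    645     225    1849
    6    9     74    666     81     5476
    7    6     81    486     36     6561
    Σ    57    516   3751    579    39898

    $r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$

    $r = \frac{7(3751) - (57)(516)}{\sqrt{7(579) - 57^2}\,\sqrt{7(39898) - 516^2}} = \frac{-3155}{\sqrt{804}\,\sqrt{13030}} \approx -0.975$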
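    As a cross-check of the arithmetic above, here is a minimal Python sketch that recomputes the column sums from the seven (absences, final grade) pairs and applies the same computational formula. The function name `correlation_coefficient` is our own, chosen for illustration.

```python
import math

# Absences (x) and final grades (y) for the seven students in the example
x = [8, 2, 5, 12, 15, 9, 6]
y = [78, 92, 90, 58, 43, 74, 81]


def correlation_coefficient(xs, ys):
    """Pearson's r via the computational (sums) formula used on the slide."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


r = correlation_coefficient(x, y)
print(round(r, 3))  # -0.975
```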

  • Hypothesis Test for Significance

    r is the correlation coefficient for the sample. The correlation coefficient for the population is ρ (rho).

    For a two-tailed test for significance:
    $H_0: \rho = 0$ (The correlation is not significant.)
    $H_a: \rho \neq 0$ (The correlation is significant.)

    For left-tailed and right-tailed tests of negative or positive significance:
    $H_0: \rho \geq 0$, $H_a: \rho < 0$ (left tail) and $H_0: \rho \leq 0$, $H_a: \rho > 0$ (right tail)

    The sampling distribution for r is a t-distribution with n − 2 d.f.

    Standardized test statistic: $t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}}$

  • Test of Significance

    You found the correlation between the number of times absent and a final grade to be r = −0.975. There were seven pairs of data. Test the significance of this correlation. Use α = 0.01.

    1. Write the null and alternative hypotheses.
       $H_0: \rho = 0$ (The correlation is not significant.)
       $H_a: \rho \neq 0$ (The correlation is significant.)

    2. State the level of significance.
       α = 0.01

    3. Identify the sampling distribution.
       A t-distribution with 5 degrees of freedom

  • Test of Significance (continued)

    4. Find the critical values.
       −t0 = −4.032 and t0 = 4.032

    5. Find the rejection regions.
       t < −4.032 and t > 4.032

    6. Find the test statistic.
       $t = \frac{-0.975}{\sqrt{\frac{1 - (-0.975)^2}{7 - 2}}} \approx -9.811$

  • Test of Significance (continued)

    7. Make your decision.
       t = −9.811 falls in the rejection region t < −4.032. Reject the null hypothesis.

    8. Interpret your decision.
       There is a significant correlation between the number of times absent and final grades.
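    The eight steps above can be condensed into a short Python sketch of the two-tailed test. It assumes the critical value 4.032 is read from a t-table, as in step 4, rather than computed; `correlation_t_statistic` is a hypothetical helper name.

```python
import math

# Values from the example: sample r, number of data pairs, significance level
r = -0.975
n = 7
alpha = 0.01  # kept for reference; the critical value below already reflects it

# Two-tailed critical value for alpha = 0.01 and n - 2 = 5 d.f., read from a t-table
t_critical = 4.032


def correlation_t_statistic(r, n):
    """Standardized test statistic t = r / sqrt((1 - r^2) / (n - 2))."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))


t = correlation_t_statistic(r, n)
# Two-tailed test: reject H0 when t < -t0 or t > t0
reject_null = abs(t) > t_critical
print(f"t = {t:.2f}, reject H0: {reject_null}")  # t = -9.81, reject H0: True
```

    The slide rounds r to −0.975 before computing t. With the unrounded r the statistic is about −9.76, which matches the T value for x in the Minitab output; the decision is the same either way.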

  • Linear Regression

    Section 9.2

  • The Line of Regression

    Once you know there is a significant linear correlation, you can write an equation describing the relationship between the x and y variables. This equation is called the line of regression or least squares line.

    The equation of a line may be written as y = mx + b, where m is the slope of the line and b is the y-intercept.

    The line of regression is: $\hat{y} = mx + b$

    The slope m is: $m = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$

    The y-intercept is: $b = \bar{y} - m\bar{x} = \frac{\sum y}{n} - m\frac{\sum x}{n}$
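    The slope and intercept formulas translate directly into Python. This is a sketch assuming the data arrive as two plain lists of equal length; `least_squares_line` is a name we made up for the example.

```python
def least_squares_line(xs, ys):
    """Return (m, b) for the least squares line y-hat = m*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * sum_x / n  # b = y-bar - m * x-bar
    return m, b


# Absences (x) and final grades (y) from the example in Section 9.1
x = [8, 2, 5, 12, 15, 9, 6]
y = [78, 92, 90, 58, 43, 74, 81]
m, b = least_squares_line(x, y)
print(f"y-hat = {m:.3f}x + {b:.3f}")  # y-hat = -3.924x + 105.668
```

    With the unrounded slope the intercept comes out near 105.668, matching the Minitab output; the value 105.667 results from substituting the rounded slope −3.924.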

  • Residuals

    [Figure: scatter plot of revenue (180 to 260) versus ad dollars (1.5 to 3.0) with the regression line]

    $(x_i, y_i)$ = a data point
    $(x_i, \hat{y}_i)$ = a point on the line with the same x-value
    $y_i - \hat{y}_i$ = a residual

  • Calculate m and b

    Write the equation of the line of regression with x = number of absences and y = final grade.

    Using the column sums from the computation of r (n = 7, Σx = 57, Σy = 516, Σxy = 3751, Σx² = 579, Σy² = 39898):

    $m = \frac{7(3751) - (57)(516)}{7(579) - 57^2} = \frac{-3155}{804} \approx -3.924$

    $b = \frac{516}{7} - (-3.924)\frac{57}{7} \approx 105.667$

    The line of regression is: $\hat{y} = -3.924x + 105.667$

  • The Line of Regression

    m = −3.924 and b = 105.667

    The line of regression is: $\hat{y} = -3.924x + 105.667$

    [Scatter plot: Final Grade (40 to 95) versus Absences (0 to 16), with the regression line]

    Note that the point $(\bar{x}, \bar{y}) = (8.143, 73.714)$ is on the line.

  • Predicting y Values

    The regression line can be used to predict values of y for values of x falling within the range of the data.

    The regression equation for number of times absent and final grade is: $\hat{y} = -3.924x + 105.667$

    Use this equation to predict the expected grade for a student with (a) 3 absences and (b) 12 absences.

    (a) $\hat{y} = -3.924(3) + 105.667 = 93.895$
    (b) $\hat{y} = -3.924(12) + 105.667 = 58.579$
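    A small Python sketch of prediction with the rounded equation. The range check reflects the rule that predictions should only be made for x values within the range of the data (here 2 to 15 absences); raising a `ValueError` outside that range is our own design choice, not something the text prescribes, and `predict_grade` is a hypothetical name.

```python
# Regression equation from the slides (coefficients rounded as shown there)
SLOPE = -3.924
INTERCEPT = 105.667
X_MIN, X_MAX = 2, 15  # observed range of absences in the sample


def predict_grade(absences):
    """Predict a final grade, refusing to extrapolate outside the data range."""
    if not X_MIN <= absences <= X_MAX:
        raise ValueError("x is outside the range of the data; prediction not reliable")
    return SLOPE * absences + INTERCEPT


print(round(predict_grade(3), 3))   # 93.895
print(round(predict_grade(12), 3))  # 58.579
```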

  • Measures of Regression and Correlation

    Section 9.3

  • The Coefficient of Determination

    The coefficient of determination, r², is the ratio of the explained variation in y to the total variation in y.

    The correlation coefficient of number of times absent and final grade is r = −0.975. The coefficient of determination is $r^2 = (-0.975)^2 \approx 0.9506$.

    Interpretation: About 95% of the variation in final grades can be explained by the number of times a student is absent. The other 5% is unexplained and can be due to sampling error or other variables, such as intelligence, amount of time studied, etc.
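    The definition of r² as explained variation over total variation can be checked numerically. The sketch below fits the line with unrounded coefficients (using the deviation form of the slope, which is algebraically equivalent to the formula in Section 9.2) and compares the two sums of squares.

```python
# Absences (x) and final grades (y) from the example
x = [8, 2, 5, 12, 15, 9, 6]
y = [78, 92, 90, 58, 43, 74, 81]
n = len(x)

# Least squares line with unrounded coefficients (deviation form of the slope)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
m = s_xy / s_xx
b = y_bar - m * x_bar
y_hat = [m * xi + b for xi in x]

explained = sum((p - y_bar) ** 2 for p in y_hat)  # variation explained by the line
total = sum((v - y_bar) ** 2 for v in y)          # total variation in y
r_squared = explained / total
print(round(r_squared, 3))  # 0.95
```

    The result, about 0.950, differs slightly from 0.9506 only because the slide squares the rounded value r = −0.975.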

  • The Standard Error of Estimate

    The standard error of estimate, $s_e$, is the standard deviation of the observed $y_i$ values about the predicted $\hat{y}$ value.

    $s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$

  • The Standard Error of Estimate

    Calculate $\hat{y}_i$ for each $x_i$.

    i    x     y     ŷ         (y − ŷ)²
    1    8     78    74.275    13.8756
    2    2     92    97.819    33.8608
    3    5     90    86.047    15.6262
    4    12    58    58.579    0.3352
    5    15    43    46.807    14.4932
    6    9     74    70.351    13.3152
    7    6     81    82.123    1.2611

    Σ(y − ŷ)² = 92.767

    $s_e = \sqrt{\frac{92.767}{7 - 2}} \approx 4.307$
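    The residual table and $s_e$ can be reproduced in a few lines of Python. This sketch uses the rounded coefficients −3.924 and 105.667, as the table does, so the sums agree to three decimals.

```python
import math

# Absences (x) and final grades (y) from the example
x = [8, 2, 5, 12, 15, 9, 6]
y = [78, 92, 90, 58, 43, 74, 81]
m, b = -3.924, 105.667  # rounded coefficients, as used in the table

y_hat = [m * xi + b for xi in x]                       # predicted values
sse = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))  # sum of squared residuals
s_e = math.sqrt(sse / (len(x) - 2))                    # n - 2 degrees of freedom
print(round(sse, 3), round(s_e, 3))  # 92.767 4.307
```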

  • Prediction Intervals

    Given a specific linear regression equation and $x_0$, a specific value of x, a c-prediction interval for y is:

    $\hat{y} - E < y < \hat{y} + E$

    where

    $E = t_c s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x^2 - (\sum x)^2}}$

    Use a t-distribution with n − 2 degrees of freedom.

    The point estimate is $\hat{y}$, and E is the maximum error of estimate.

  • Application

    Construct a 90% prediction interval for a final grade when a student has been absent 6 times.

    1. Find the point estimate:
       $\hat{y} = -3.924(6) + 105.667 = 82.123$

    The point (6, 82.123) is the point on the regression line with x-coordinate 6.

  • Application

    Construct a 90% prediction interval for a final grade when a student has been absent 6 times.

    2. Find E:
       $E = (2.015)(4.307)\sqrt{1 + \frac{1}{7} + \frac{7(6 - 8.143)^2}{7(579) - 57^2}} \approx 9.438$

    At the 90% level of confidence, the maximum error of estimate is 9.438.

  • Application

    Construct a 90% prediction interval for a final grade when a student has been absent 6 times.

    3. Find the endpoints.
       $\hat{y} - E = 82.123 - 9.438 = 72.685$
       $\hat{y} + E = 82.123 + 9.438 = 91.561$
       $72.685 < y < 91.561$

    When x = 6, the 90% prediction interval is from 72.685 to 91.561.
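    Steps 1 to 3 can be combined into one Python sketch. It assumes the rounded values from the slides (the regression coefficients and $s_e = 4.307$) and takes $t_c = 2.015$ from a t-table for c = 0.90 with 5 d.f.; nothing here computes t-quantiles.

```python
import math

# Absences from the example, plus the sums the formula for E needs
x = [8, 2, 5, 12, 15, 9, 6]
n = len(x)
sum_x = sum(x)
sum_x2 = sum(xi * xi for xi in x)
x_bar = sum_x / n

m, b = -3.924, 105.667  # regression line from the slides
s_e = 4.307             # standard error of estimate from the slides
t_c = 2.015             # t-table value for c = 0.90 with n - 2 = 5 d.f. (not computed here)
x0 = 6                  # absences for the student in the example

y_hat = m * x0 + b      # point estimate
E = t_c * s_e * math.sqrt(1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2))
lower, upper = y_hat - E, y_hat + E
print(f"{lower:.2f} < y < {upper:.2f}")  # 72.68 < y < 91.56
```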

  • Minitab Output

    Regression Analysis

    The regression equation is
    y = 106 − 3.92x

    Predictor    Coef       StDev     T        P
    Constant     105.668    3.655     28.91    0.000
    x            −3.9241    0.4019    −9.76    0.000

    S = 4.307    R-Sq = 95.0%    R-Sq(adj) = 94.0%

  • Multiple Regression

    Section 9.4

  • More Explanatory Variables

    Absence    IQ     Grade
    8          115    78
    2          135    92
    5          126    90
    12         110    58
    15         105    43
    9          120    74
    6          125    81

  • Minitab Output

    Regression Analysis

    The regression equation is
    Grade = 52.7 − 2.65 absence + 0.357 IQ

    Predictor    Coef      StDev     T        P
    Constant     52.720    86.110    0.61     0.573
    Absence      −2.652    2.111     −1.26    0.277
    IQ           0.357     0.580     0.62     0.571

    S = 4.603    R-Sq = 95.4%    R-Sq(adj) = 93.2%

  • Interpretation

    The regression equation is
    Grade = 52.7 − 2.65 absence + 0.357 IQ

    When the other variables are 0, the predicted grade is 52.7.

    If IQ is held constant, each additional absence decreases the predicted grade by 2.65 points.

    If the number of absences is held constant and IQ increases by one point, the predicted grade increases by 0.357 points.

  • Predicting the Response Variable

    The regression equation is
    Grade = 52.7 − 2.65 absence + 0.357 IQ

    Use the regression equation to predict a grade when a student is absent 5 times and has an IQ of 125.

    Grade = 52.7 − 2.65(5) + 0.357(125) = 84.075 (about 84)

    Use the regression equation to predict a grade when a student is absent 9 times and has an IQ of 120.

    Grade = 52.7 − 2.65(9) + 0.357(120) = 71.69 (about 72)
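    Evaluating the multiple regression equation is just a weighted sum of the explanatory variables. A minimal Python sketch with the rounded Minitab coefficients, applied to the two students above; `predict_grade` is a hypothetical name.

```python
# Multiple regression equation from the Minitab output (rounded coefficients)
B0, B_ABSENCE, B_IQ = 52.7, -2.65, 0.357


def predict_grade(absences, iq):
    """Predicted grade = 52.7 - 2.65 * absences + 0.357 * IQ."""
    return B0 + B_ABSENCE * absences + B_IQ * iq


print(round(predict_grade(5, 125), 3))  # 84.075
print(round(predict_grade(9, 120), 3))  # 71.69
```

    Holding IQ fixed and adding one absence changes the prediction by exactly the absence coefficient, −2.65, which is the interpretation given for the equation.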