Introduction to Statistics
Correlation
Chapter 15
Apr 29-May 4, 2010
Classes #28-29
Correlation
Chapter 15:
– Correlation pp. 466-485
– Not responsible for remainder of the chapter
Correlation
A statistical technique that is used to measure and describe a relationship between two variables
– For example:
  GPA and TDs scored
  Statistics exam scores and amount of time spent studying
Notation
A correlation requires two scores for each individual
– One score from each of the two variables
– They are normally identified as X and Y
Three characteristics of the relationship between X and Y are being measured…
The direction of the relationship
– Positive or negative (the sign of the correlation)
The form of the relationship
– Usually linear form
The strength or consistency of the relationship
– Perfect correlation = ±1.00; no consistency would be 0.00
– Therefore, the magnitude of a correlation measures the degree of relationship between two variables on a scale from 0.00 to 1.00, and the coefficient itself ranges from -1.00 to +1.00.
Assumptions
There are 3 main assumptions…
– 1. The dependent and independent variables are normally distributed. We can check this by looking at the histograms for the two variables
– 2. The relationship between X and Y is linear. We can check this by looking at the scattergram
– 3. The relationship is homoscedastic. We can check homoscedasticity by looking at the scattergram and observing that the data points form a “roughly symmetrical, cigar-shaped pattern” about the regression line.
If the above 3 assumptions have been met, then we can use correlation and test r for significance
Pearson r
The most commonly used correlation
Measures the degree of straight-line relationship
Computation:
$r = \dfrac{SP}{\sqrt{(SS_X)(SS_Y)}}$
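As a minimal sketch of this computation, the function below applies the computational formulas for SP, SS_X and SS_Y that the worked examples use, with the square root in the denominator. The function name `pearson_r` and the toy data are illustrative assumptions, not part of the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson r as SP / sqrt(SS_X * SS_Y), using the computational formulas."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    # SS = sum of squared scores minus (sum of scores)^2 / n
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_y = sum(y * y for y in ys) - sum_y ** 2 / n
    # SP = sum of cross-products minus (sum X)(sum Y) / n
    sp = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    # Note: if either variable has no variability, SS is 0 and r is undefined
    # (this sketch would raise ZeroDivisionError in that case).
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical toy data: one perfectly increasing and one perfectly decreasing line.
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(r_up, r_down)
```

The two toy calls illustrate the extremes of the scale: a perfect positive and a perfect negative straight-line relationship.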
Example 1
A researcher predicts that there is a high correlation between scores on the stats final exam (100 pts max) and scores on the university’s exit exam for graduating seniors (330 pts max)
Example 1
X     Y      X²       Y²        XY
30    160    900      25,600    4,800
38    180    1,444    32,400    6,840
52    180    2,704    32,400    9,360
90    210    8,100    44,100    18,900
95    240    9,025    57,600    22,800
∑X = 305   ∑Y = 970   ∑X² = 22,173   ∑Y² = 192,100   ∑XY = 62,700
Example 1
$SS_X = \sum X^2 - \dfrac{(\sum X)^2}{n} = 22{,}173 - \dfrac{305^2}{5}$
$= 22{,}173 - \dfrac{93{,}025}{5} = 22{,}173 - 18{,}605 = 3{,}568$
$SS_Y = \sum Y^2 - \dfrac{(\sum Y)^2}{n} = 192{,}100 - \dfrac{970^2}{5}$
$= 192{,}100 - \dfrac{940{,}900}{5} = 192{,}100 - 188{,}180 = 3{,}920$
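The sum-of-squares arithmetic for Example 1 can be re-checked with a short sketch. It computes SS with the computational formula and, for comparison, with the definitional form (the sum of squared deviations from the mean), which should agree. The helper names are illustrative assumptions.

```python
def ss_computational(scores):
    """SS = sum(X^2) - (sum X)^2 / n."""
    n = len(scores)
    return sum(s * s for s in scores) - sum(scores) ** 2 / n

def ss_definitional(scores):
    """SS = sum of squared deviations from the mean."""
    mean = sum(scores) / len(scores)
    return sum((s - mean) ** 2 for s in scores)

x = [30, 38, 52, 90, 95]        # stats final exam scores (Example 1)
y = [160, 180, 180, 210, 240]   # exit exam scores (Example 1)

ss_x = ss_computational(x)
ss_y = ss_computational(y)
print(ss_x, ss_y)
print(ss_definitional(x), ss_definitional(y))
```

Both forms give the same values; the computational form is simply easier to work by hand from the column totals.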
Example 1
$SP = \sum XY - \dfrac{(\sum X)(\sum Y)}{n} = 62{,}700 - \dfrac{(305)(970)}{5}$
$= 62{,}700 - \dfrac{295{,}850}{5} = 62{,}700 - 59{,}170$
$= 3{,}530$
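A similar sketch re-checks the sum of products for Example 1. The function name `sum_of_products` is an illustrative assumption; the formula is the computational one used on the slide.

```python
def sum_of_products(xs, ys):
    """SP = sum(XY) - (sum X)(sum Y) / n."""
    n = len(xs)
    return sum(a * b for a, b in zip(xs, ys)) - sum(xs) * sum(ys) / n

x = [30, 38, 52, 90, 95]        # stats final exam scores (Example 1)
y = [160, 180, 180, 210, 240]   # exit exam scores (Example 1)

sp = sum_of_products(x, y)
print(sp)
```

Unlike SS, SP can be negative: its sign gives the direction of the relationship, and here it is positive.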
Example 1
$r = \dfrac{SP}{\sqrt{(SS_X)(SS_Y)}}$
$= \dfrac{3{,}530}{\sqrt{(3{,}568)(3{,}920)}}$
$= \dfrac{3{,}530}{\sqrt{13{,}986{,}560}}$
$= \dfrac{3{,}530}{3{,}739.861}$
$= .944$
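The final step can be reproduced in a few lines, starting from the three quantities already computed for Example 1. This is only an arithmetic check; the variable names are illustrative.

```python
import math

# Values taken from the Example 1 slides.
sp, ss_x, ss_y = 3530, 3568, 3920

product = ss_x * ss_y              # (SS_X)(SS_Y)
denominator = math.sqrt(product)   # square root of the product
r = sp / denominator

print(product, round(denominator, 3), round(r, 3))
```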
Pearson Correlation: “Rule of Thumb”
If r = ±1.00   Perfect correlation
+.70 to +.99   Very strong positive relationship
+.40 to +.69   Strong positive relationship
+.30 to +.39   Moderate positive relationship
+.20 to +.29   Weak positive relationship
+.01 to +.19   No or negligible relationship
-.01 to -.19   No or negligible relationship
-.20 to -.29   Weak negative relationship
-.30 to -.39   Moderate negative relationship
-.40 to -.69   Strong negative relationship
-.70 to -.99   Very strong negative relationship
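The rule of thumb can be sketched as a small lookup function. The function name `describe_r` is hypothetical. Two details are assumptions beyond the table: values that fall between the listed bands (such as .195) are assigned to the lower band, and r = 0 is reported as no relationship.

```python
def describe_r(r):
    """Label a Pearson r using the rule-of-thumb bands from the slide."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and +1.00")
    size = abs(r)
    if size == 0:
        return "no relationship"          # assumption: 0.00 is not in the table
    direction = "positive" if r > 0 else "negative"
    if size == 1.0:
        return f"perfect {direction} correlation"
    if size >= 0.70:
        strength = "very strong"
    elif size >= 0.40:
        strength = "strong"
    elif size >= 0.30:
        strength = "moderate"
    elif size >= 0.20:
        strength = "weak"
    else:
        strength = "no or negligible"
    return f"{strength} {direction} relationship"

for value in (0.944, 0.32, -0.12):
    print(value, "->", describe_r(value))
```

The thresholds are applied to the absolute value of r, so the sign only affects the direction word, which mirrors the symmetric layout of the table.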
Example 1: Interpretation
An r of 0.944 indicates an extremely strong relationship between scores on the stats final exam and scores on the exit exam. As scores on the stats final go up so too do scores on the exit exam.
– But we are not finished with the interpretation
See next slide
Interpretation (Continued)
Coefficient of Determination (r²)
The value r² is called the coefficient of determination because it measures the proportion of variability in one variable that can be determined from the relationship with the other variable
– For example:
  A correlation of r = .944 means that r² = .891 (or 89.1%) of the variability in the Y scores can be predicted from the relationship with the X scores
Coefficient of Determination (r²) and Interpret:
The coefficient of determination is r² = .891. Scores on the stats final exam, by themselves, account for 89.1% of the variation in the exit exam scores.
$r^2 = (r)^2 = (.944)^2 = .891$
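As a quick check, r² for Example 1 can be computed two ways: by squaring the rounded r, and directly as SP² / (SS_X · SS_Y) without rounding r first. The sketch below assumes the SP and SS values from the Example 1 slides; both routes should agree to three decimals.

```python
# Values taken from the Example 1 slides.
sp, ss_x, ss_y = 3530, 3568, 3920

r_squared_direct = sp ** 2 / (ss_x * ss_y)   # no intermediate rounding
r_squared_from_r = 0.944 ** 2                # squaring the rounded r

print(round(r_squared_direct, 3), round(r_squared_from_r, 3))
print(f"{r_squared_direct:.1%} of the variability in Y is accounted for by X")
```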
Example 2
A researcher predicts that there is a high correlation between years of education and voter turnout
– She chooses Alamosa, Boston, Chicago, Detroit, and NYC to test her theory
Example 2
The scores on each variable are displayed in table format:
– Y = % Turnout
– X = Years of Education
City       X      Y
Alamosa    11.9   55
Boston     12.1   60
Chicago    12.7   65
Detroit    12.8   68
NYC        13.0   70
Scatterplot
The relationship between X and Y is linear.
Make a Computational Table
X      Y    X²       Y²     XY
11.9   55   141.61   3025   654.5
12.1   60   146.41   3600   726
12.7   65   161.29   4225   825.5
12.8   68   163.84   4624   870.4
13.0   70   169      4900   910
∑X = 62.5   ∑Y = 318   ∑X² = 782.15   ∑Y² = 20374   ∑XY = 3986.4
$\bar{X} = \sum X / N = 62.5 / 5 = 12.5$
$\bar{Y} = \sum Y / N = 318 / 5 = 63.6$
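The computational table can be rebuilt programmatically as a sanity check on the column totals and means. This is a sketch using the Example 2 data; the variable names are illustrative, and totals are rounded when printed because decimal inputs such as 11.9 are not stored exactly in binary floating point.

```python
x = [11.9, 12.1, 12.7, 12.8, 13.0]   # years of education
y = [55, 60, 65, 68, 70]             # % turnout

# One row per city: X, Y, X^2, Y^2, XY
rows = [(xi, yi, xi * xi, yi * yi, xi * yi) for xi, yi in zip(x, y)]

# Column totals: sum X, sum Y, sum X^2, sum Y^2, sum XY
totals = [sum(column) for column in zip(*rows)]

n = len(x)
mean_x = totals[0] / n
mean_y = totals[1] / n

for row in rows:
    print("  ".join(f"{value:9.2f}" for value in row))
print("  ".join(f"{value:9.2f}" for value in totals))
print(round(mean_x, 2), round(mean_y, 2))
```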
Example 2
$SS_X = \sum X^2 - \dfrac{(\sum X)^2}{n} = 782.15 - \dfrac{62.5^2}{5}$
$= 782.15 - \dfrac{3906.25}{5} = 782.15 - 781.25 = 0.9$
$SS_Y = \sum Y^2 - \dfrac{(\sum Y)^2}{n} = 20374 - \dfrac{318^2}{5}$
$= 20374 - \dfrac{101124}{5} = 20374 - 20224.80 = 149.20$
Example 2
$SP = \sum XY - \dfrac{(\sum X)(\sum Y)}{n} = 3986.40 - \dfrac{(62.5)(318)}{5}$
$= 3986.40 - \dfrac{19875}{5} = 3986.40 - 3975.00$
$= 11.40$
Example 2: Find Pearson r
$r = \dfrac{SP}{\sqrt{(SS_X)(SS_Y)}}$
$= \dfrac{11.4}{\sqrt{(0.9)(149.2)}}$
$= \dfrac{11.4}{\sqrt{134.28}}$
$= \dfrac{11.4}{11.588}$
$= .984$
Example 2: Interpretation
An r of 0.984 indicates an extremely strong relationship between years of education and voter turnout for these five cities. As level of education increases, % turnout increases.
– But we are not finished with the interpretation
See next slide
Coefficient of Determination (r²) and Interpret:
The coefficient of determination is r² = .968. Education, by itself, accounts for 96.8% of the variation in voter turnout.
$r^2 = (r)^2 = (.984)^2 = .968$
Pearson’s r
Had the relationship between % college educated and turnout been r = .32:
– This relationship would have been positive and weak to moderate.
Had the relationship between % college educated and turnout been r = -.12:
– This relationship would have been negative and very weak (negligible under the rule of thumb).
Hypothesis Testing with Pearson
We can have a two-tailed hypothesis:
$H_0: \rho = 0.0$
$H_1: \rho \neq 0.0$
We can have a one-tailed hypothesis:
$H_0: \rho = 0.0$
$H_1: \rho < 0.0$ (or $\rho > 0.0$)
Note that ρ (rho) is the population parameter, while r is the sample statistic
Find $r_{critical}$
See Table B.6 (page 537)
– You need to know the alpha level
– You need to know the sample size
– See that we always will use: df = n-2
Find $r_{calculated}$
See previous slides for formulas
Make your decision…
If $|r_{calculated}| < r_{critical}$, then retain $H_0$
If $|r_{calculated}| > r_{critical}$, then reject $H_0$
(For a two-tailed test, compare the absolute value of $r_{calculated}$; for a one-tailed test, the sign of r must also match the predicted direction.)
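The decision rule can be sketched in code under stated assumptions: a two-tailed test at alpha = .05, and critical t values copied from a standard t table for df = 1 to 5 only. The critical r is derived from the critical t through the identity r = t / sqrt(t² + df), so the results should match a printed critical-value table for r up to rounding. The names `r_critical` and `decide` are hypothetical.

```python
import math

# Assumption: two-tailed critical t values at alpha = .05 from a standard t table.
T_CRITICAL_05_TWO_TAILED = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}

def r_critical(df):
    """Critical r for a two-tailed test at alpha = .05, derived from critical t."""
    t = T_CRITICAL_05_TWO_TAILED[df]
    return t / math.sqrt(t * t + df)

def decide(r_calculated, n):
    """Two-tailed decision: compare |r calculated| with r critical at df = n - 2."""
    df = n - 2
    if abs(r_calculated) > r_critical(df):
        return "reject H0"
    return "retain H0"

# Both worked examples use n = 5 pairs, so df = 3.
print(round(r_critical(3), 3))
print("Example 1:", decide(0.944, 5))
print("Example 2:", decide(0.984, 5))
```

With only five pairs of scores the critical value is large, so even a moderate sample correlation would not be significant at this sample size.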
Always include a brief summary of your results:
Was it positive or negative?
Was it significant?
Explain the correlation
Explain the variation
– Coefficient of Determination (r²)
Credits
http://campus.houghton.edu/orgs/psychology/stat15b.ppt#267,2,Review
http://publish.uwo.ca/~pakvis/Interval.ppt#276,17,Practical Example using Healey P. 418 Problem 15.1