introduction to statistics

Introduction to Statistics: Correlation, Chapter 15. Apr 29-May 4, 2010. Classes #28-29

Upload: murphy-rice

Post on 30-Dec-2015



TRANSCRIPT

Page 1: Introduction to Statistics

Introduction to Statistics

Correlation
Chapter 15

Apr 29-May 4, 2010
Classes #28-29

Page 2: Introduction to Statistics

Correlation

Chapter 15:
– Correlation pp. 466-485
– Not responsible for remainder of the chapter

Page 3: Introduction to Statistics

Correlation

A statistical technique that is used to measure and describe a relationship between two variables
– For example:
  GPA and TDs scored
  Statistics exam scores and amount of time spent studying

Page 4: Introduction to Statistics

Notation

A correlation requires two scores for each individual
– One score from each of the two variables
– They are normally identified as X and Y

Page 5: Introduction to Statistics

Three characteristics of the relationship between X and Y are being measured…

The direction of the relationship
– Positive or negative

The form of the relationship
– Usually linear form

The strength or consistency of the relationship
– Perfect correlation = 1.00 (either +1.00 or -1.00); no consistency would be 0.00
– Therefore, a correlation measures the degree of relationship between two variables on a scale from 0.00 to 1.00 in absolute value. The sign carries the direction, so r itself ranges from -1.00 to +1.00.

Page 6: Introduction to Statistics

Assumptions

There are 3 main assumptions…

– 1. The dependent and independent variables are normally distributed. We can check this by looking at the histograms for the two variables
– 2. The relationship between X and Y is linear. We can check this by looking at the scattergram
– 3. The relationship is homoscedastic. We can check homoscedasticity by looking at the scattergram and observing that the data points form a “roughly symmetrical, cigar-shaped pattern” about the regression line.

If the above 3 assumptions have been met, then we can use correlation and test r for significance

Page 7: Introduction to Statistics

Pearson r

The most commonly used correlation

Measures the degree of straight-line relationship

Computation:

$r = \dfrac{SP}{\sqrt{(SS_X)(SS_Y)}}$
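The computation can be sketched in Python. This is a minimal illustration rather than part of the original slides: the function name `pearson_r` and the three-point toy data are assumptions chosen only to show the formula at work.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from the computational formulas for SS_X, SS_Y, and SP."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n            # sum of squares for X
    ss_y = sum(y * y for y in ys) - sum_y ** 2 / n            # sum of squares for Y
    sp = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n  # sum of products
    # Note: fails with ZeroDivisionError if either variable has no variability.
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical toy data: Y rises (or falls) in a perfect straight line with X.
r_up = pearson_r([1, 2, 3], [2, 4, 6])
r_down = pearson_r([1, 2, 3], [6, 4, 2])
print(r_up, r_down)
```

The two calls give the extreme cases: a perfect positive and a perfect negative straight-line relationship.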

Page 8: Introduction to Statistics

Example 1

A researcher predicts that there is a high correlation between scores on the stats final exam (100 pts max) and scores on the university’s exit exam for graduating seniors (330 pts max)

Page 9: Introduction to Statistics

Example 1

        X       Y       X²        Y²        XY
       30     160      900    25,600     4,800
       38     180    1,444    32,400     6,840
       52     180    2,704    32,400     9,360
       90     210    8,100    44,100    18,900
       95     240    9,025    57,600    22,800
Σ     305     970   22,173   192,100    62,700

The last row holds the column sums: ΣX, ΣY, ΣX², ΣY², and ΣXY.
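The table for Example 1 can be rebuilt with a few lines of Python. This is only a sketch; the variable names (`xs`, `ys`, `sums`) are assumptions made for the illustration.

```python
xs = [30, 38, 52, 90, 95]        # stats final exam scores (X)
ys = [160, 180, 180, 210, 240]   # exit exam scores (Y)

# One derived column per formula term.
x_sq = [x * x for x in xs]
y_sq = [y * y for y in ys]
xy = [x * y for x, y in zip(xs, ys)]

# Column sums: these are the only values the SS and SP formulas need.
sums = {
    "X": sum(xs),
    "Y": sum(ys),
    "X2": sum(x_sq),
    "Y2": sum(y_sq),
    "XY": sum(xy),
}

for row in zip(xs, ys, x_sq, y_sq, xy):
    print("{:>6} {:>6} {:>8} {:>8} {:>8}".format(*row))
print(sums)
```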

Page 10: Introduction to Statistics

Example 1Example 1

Example 1

$SS_X = \sum X^2 - \dfrac{(\sum X)^2}{n} = 22{,}173 - \dfrac{305^2}{5} = 22{,}173 - \dfrac{93{,}025}{5} = 22{,}173 - 18{,}605 = 3{,}568$

$SS_Y = \sum Y^2 - \dfrac{(\sum Y)^2}{n} = 192{,}100 - \dfrac{970^2}{5} = 192{,}100 - \dfrac{940{,}900}{5} = 192{,}100 - 188{,}180 = 3{,}920$
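The sum of squares is also the sum of squared deviations from the mean. The sketch below checks that this definitional form and the computational form agree for the Example 1 data; the function names are assumptions made for the illustration.

```python
xs = [30, 38, 52, 90, 95]        # X scores from Example 1
ys = [160, 180, 180, 210, 240]   # Y scores from Example 1

def ss_computational(values):
    """SS = sum of squared scores minus (sum of scores)^2 / n."""
    return sum(v * v for v in values) - sum(values) ** 2 / len(values)

def ss_definitional(values):
    """SS = sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

ss_x = ss_computational(xs)
ss_y = ss_computational(ys)
print(ss_x, ss_definitional(xs))
print(ss_y, ss_definitional(ys))
```

The computational form is convenient by hand because it needs only column totals; the definitional form makes it clearer what SS measures.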

Page 11: Introduction to Statistics

Example 1Example 1

Example 1

$SP = \sum XY - \dfrac{(\sum X)(\sum Y)}{n} = 62{,}700 - \dfrac{(305)(970)}{5} = 62{,}700 - \dfrac{295{,}850}{5} = 62{,}700 - 59{,}170 = 3{,}530$
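For readers who wonder where the computational formula for SP comes from, here is a short derivation from the definitional form (the sum of products of deviations). The symbols $M_X$ and $M_Y$ for the two means are notation assumed here, not taken from the slides.

```latex
\begin{align*}
SP &= \sum (X - M_X)(Y - M_Y) \\
   &= \sum XY - M_Y \sum X - M_X \sum Y + n M_X M_Y \\
   &= \sum XY - \frac{(\sum X)(\sum Y)}{n} - \frac{(\sum X)(\sum Y)}{n} + \frac{(\sum X)(\sum Y)}{n} \\
   &= \sum XY - \frac{(\sum X)(\sum Y)}{n}
\end{align*}
```

As a check on Example 1, with $M_X = 61$ and $M_Y = 194$ the deviation products are 1,054 + 322 + 126 + 464 + 1,564 = 3,530, the same value as the computational formula gives.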

Page 12: Introduction to Statistics

Example 1Example 1

Example 1

$r = \dfrac{SP}{\sqrt{(SS_X)(SS_Y)}}$

$= \dfrac{3{,}530}{\sqrt{(3{,}568)(3{,}920)}}$

$= \dfrac{3{,}530}{\sqrt{13{,}986{,}560}}$

$= \dfrac{3{,}530}{3{,}739.861}$

$= .944$
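The last step is easy to verify numerically. A minimal sketch, starting from the SS and SP values computed on the earlier slides (variable names are assumptions):

```python
import math

ss_x, ss_y, sp = 3568, 3920, 3530   # values from the Example 1 slides

denominator = math.sqrt(ss_x * ss_y)  # square root of the product of the two SS values
r = sp / denominator

print(round(denominator, 3), round(r, 3))
```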

Page 13: Introduction to Statistics

Pearson Correlation: “Rule of Thumb”

If r =
  +1.00 or -1.00   Perfect correlation
  +.70 to +.99     Very strong positive relationship
  +.40 to +.69     Strong positive relationship
  +.30 to +.39     Moderate positive relationship
  +.20 to +.29     Weak positive relationship
  +.01 to +.19     No or negligible relationship
  -.01 to -.19     No or negligible relationship
  -.20 to -.29     Weak negative relationship
  -.30 to -.39     Moderate negative relationship
  -.40 to -.69     Strong negative relationship
  -.70 to -.99     Very strong negative relationship
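The rule of thumb amounts to a lookup on the size of r plus its sign. The sketch below is one possible encoding; the function name `describe_r` is hypothetical, and treating each band's lower bound as a cutoff (so that a value such as .695 falls in the "strong" band) is an assumption, since the slide lists only two-decimal values.

```python
def describe_r(r):
    """Label a correlation using the rule-of-thumb bands from the slide."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and +1.00")
    size = abs(r)
    if size == 1.0:
        return "perfect correlation"
    if size < 0.20:
        return "no or negligible relationship"
    if size >= 0.70:
        strength = "very strong"
    elif size >= 0.40:
        strength = "strong"
    elif size >= 0.30:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} relationship"

for value in (0.944, 0.32, -0.12, -0.55):
    print(value, "->", describe_r(value))
```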

Page 14: Introduction to Statistics

Example 1: Interpretation

An r of 0.944 indicates an extremely strong relationship between scores on the stats final exam and scores on the exit exam. As scores on the stats final go up, so too do scores on the exit exam.
– But we are not finished with the interpretation
  See next slide

Page 15: Introduction to Statistics

Interpretation (Continued)
Coefficient of Determination ($r^2$)

The value $r^2$ is called the coefficient of determination because it measures the proportion of variability in one variable that can be determined from the relationship with the other variable
– For example:
  A correlation of r = .944 means that $r^2$ = .891 (or 89.1%) of the variability in the Y scores can be predicted from the relationship with the X scores

Page 16: Introduction to Statistics

Coefficient of Determination ($r^2$) and Interpret:

The coefficient of determination is $r^2$ = .891. Scores on the stats final exam, by themselves, account for 89.1% of the variation in the exit exam scores.

$r^2 = (.944)^2 = .891$
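One way to see what "accounts for 89.1% of the variation" means is to fit the least-squares line to the Example 1 data and compare the leftover (residual) variability with the total variability in Y. The sketch below does that. The regression line goes slightly beyond the assigned pages, so treat it as a supplementary illustration; all variable names are assumptions.

```python
xs = [30, 38, 52, 90, 95]        # stats final exam scores (X)
ys = [160, 180, 180, 210, 240]   # exit exam scores (Y)
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n
ss_x = sum((x - mean_x) ** 2 for x in xs)
ss_y = sum((y - mean_y) ** 2 for y in ys)
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

r_squared = sp ** 2 / (ss_x * ss_y)

# Least-squares line: predicted Y = intercept + slope * X
slope = sp / ss_x
intercept = mean_y - slope * mean_x
ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Share of the variability in Y that the line predicts.
explained = 1 - ss_residual / ss_y

print(round(r_squared, 3), round(explained, 3))
```

Both quantities come out the same, which is why $r^2$ can be read as the proportion of variability in Y that is predictable from X.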

Page 17: Introduction to Statistics

Example 2

A researcher predicts that there is a high correlation between years of education and voter turnout
– She chooses Alamosa, Boston, Chicago, Detroit, and NYC to test her theory

Page 18: Introduction to Statistics

Example 2

The scores on each variable are displayed in table format:
– Y = % Turnout
– X = Years of Education

City         X      Y
Alamosa   11.9     55
Boston    12.1     60
Chicago   12.7     65
Detroit   12.8     68
NYC       13.0     70

Page 19: Introduction to Statistics

Scatterplot

The relationship between X and Y is linear.

Page 20: Introduction to Statistics

Make a Computational Table

        X      Y       X²       Y²       XY
     11.9     55   141.61     3025    654.5
     12.1     60   146.41     3600    726
     12.7     65   161.29     4225    825.5
     12.8     68   163.84     4624    870.4
     13.0     70   169        4900    910
Σ    62.5    318   782.15    20374   3986.4

$\bar{X} = \sum X / N = 62.5 / 5 = 12.5$

$\bar{Y} = \sum Y / N = 318 / 5 = 63.6$

Page 21: Introduction to Statistics

Example 2Example 2

Example 2

$SS_X = \sum X^2 - \dfrac{(\sum X)^2}{n} = 782.15 - \dfrac{62.5^2}{5} = 782.15 - \dfrac{3906.25}{5} = 782.15 - 781.25 = 0.9$

$SS_Y = \sum Y^2 - \dfrac{(\sum Y)^2}{n} = 20374 - \dfrac{318^2}{5} = 20374 - \dfrac{101124}{5} = 20374 - 20224.80 = 149.20$

Page 22: Introduction to Statistics

Example 2Example 2

Example 2

$SP = \sum XY - \dfrac{(\sum X)(\sum Y)}{n} = 3986.40 - \dfrac{(62.5)(318)}{5} = 3986.40 - \dfrac{19875}{5} = 3986.40 - 3975.00 = 11.40$

Page 23: Introduction to Statistics

Example 2: Find Pearson Example 2: Find Pearson rr

Example 2: Find Pearson r

$r = \dfrac{SP}{\sqrt{(SS_X)(SS_Y)}}$

$= \dfrac{11.4}{\sqrt{(0.9)(149.2)}}$

$= \dfrac{11.4}{\sqrt{134.28}}$

$= \dfrac{11.4}{11.588}$

$= .984$
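The whole of Example 2 can be reproduced from the raw city data in one pass. This is a sketch under the same computational formulas; the dictionary layout and variable names are assumptions made for the illustration.

```python
import math

# (years of education X, % turnout Y) for each city in Example 2
cities = {
    "Alamosa": (11.9, 55),
    "Boston": (12.1, 60),
    "Chicago": (12.7, 65),
    "Detroit": (12.8, 68),
    "NYC": (13.0, 70),
}
xs = [x for x, _ in cities.values()]
ys = [y for _, y in cities.values()]
n = len(xs)

ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n
ss_y = sum(y * y for y in ys) - sum(ys) ** 2 / n
sp = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
r = sp / math.sqrt(ss_x * ss_y)

print(round(ss_x, 2), round(ss_y, 2), round(sp, 2), round(r, 3))
```

Because SS_X here is the small difference of two nearly equal numbers (782.15 and 781.25), the intermediate values carry tiny floating-point error, which is why the sketch rounds before printing.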

Page 24: Introduction to Statistics

Example 2: Interpretation

An r of 0.984 indicates an extremely strong relationship between years of education and voter turnout for these five cities. As level of education increases, % turnout increases.
– But we are not finished with the interpretation
  See next slide

Page 25: Introduction to Statistics

Coefficient of Determination ($r^2$) and Interpret:

The coefficient of determination is $r^2$ = .968. Education, by itself, accounts for 96.8% of the variation in voter turnout.

$r^2 = (.984)^2 = .968$

Page 26: Introduction to Statistics

Pearson’s r

Had the correlation between % college educated and turnout been r = .32:
– This relationship would have been positive and weak to moderate.

Had the correlation between % college educated and turnout been r = -.12:
– This relationship would have been negative and weak.

Page 27: Introduction to Statistics

Hypothesis Testing with Pearson

We can have a two-tailed hypothesis:
  H0: ρ = 0.0
  H1: ρ ≠ 0.0

We can have a one-tailed hypothesis:
  H0: ρ = 0.0
  H1: ρ < 0.0 (or ρ > 0.0)

Note that ρ (rho) is the population parameter, while r is the sample statistic

Page 28: Introduction to Statistics

Find r critical

See Table B.6 (page 537)
– You need to know the alpha level
– You need to know the sample size
– See that we always will use: df = n-2
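Critical values of r are tied to the t distribution with df = n-2 through $r_{critical} = t_{critical} / \sqrt{t_{critical}^2 + df}$. The sketch below uses that relationship. The function name is hypothetical, and the t value 3.182 (two-tailed, alpha = .05, df = 3) is taken from a standard t table as an assumption; it is not a value quoted from the slides or from Table B.6.

```python
import math

def r_critical_from_t(t_critical, n):
    """Convert a t critical value (df = n - 2) into the matching r critical value."""
    df = n - 2
    if df < 1:
        raise ValueError("need at least 3 pairs of scores")
    return t_critical / math.sqrt(t_critical ** 2 + df)

# Assumed inputs: n = 5 pairs (as in the examples), two-tailed alpha = .05,
# so df = 3 and t critical = 3.182 from a standard t table.
r_crit = r_critical_from_t(3.182, n=5)
print(round(r_crit, 3))
```

With so few pairs of scores the critical value is high, so only a very large sample correlation would count as significant.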

Page 29: Introduction to Statistics

Find r calculated

See previous slides for formulas

Page 30: Introduction to Statistics

Make your decision…

If |r calculated| < r critical, then Retain H0

If |r calculated| > r critical, then Reject H0

(For a two-tailed test, compare the absolute value of r calculated with r critical.)
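The decision rule can be written as a small function. This sketch covers the two-tailed case only. The function name `decide` is hypothetical, the critical value .878 (df = 3, alpha = .05, two-tailed) is assumed from a standard table of critical values for r, and retaining H0 when the two values are exactly equal is also an assumption, since the slide does not cover that case.

```python
def decide(r_calculated, r_critical):
    """Two-tailed decision: compare the size of r with the table value."""
    if abs(r_calculated) > r_critical:
        return "Reject H0"
    return "Retain H0"   # also covers exact equality (assumption)

R_CRITICAL = 0.878   # assumed table value for n = 5 (df = 3), alpha = .05, two-tailed

decision_example_2 = decide(0.984, R_CRITICAL)   # r from Example 2
decision_weak = decide(-0.12, R_CRITICAL)        # hypothetical weak negative r
print(decision_example_2, "|", decision_weak)
```

Under these assumed values, the Example 2 correlation is significant even with only five cities, while a correlation of -.12 from a sample that small is not.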

Page 31: Introduction to Statistics

Always include a brief summary of your results:

Was it positive or negative?

Was it significant?

Explain the correlation

Explain the variation
– Coefficient of Determination ($r^2$)

Page 32: Introduction to Statistics

Credits

http://campus.houghton.edu/orgs/psychology/stat15b.ppt#267,2,Review

http://publish.uwo.ca/~pakvis/Interval.ppt#276,17,Practical Example using Healey P. 418 Problem 15.1