Statistics One
Lecture 19 Chi-square tests
Two segments
• Chi-square goodness of fit
• Chi-square test of independence
Lecture 19 ~ Segment 1
Chi-square goodness of fit
Chi-square tests
• All of the analyses covered thus far in the course have assumed that the outcome variable is a normally distributed continuous variable
  – Interval variable
  – Ratio variable
Chi-square tests
• What if the outcome variable is categorical?
  – For example, nominal variables
    • Diagnosis (positive, negative)
    • Verdict (guilty, innocent)
    • Vote (candidate A, candidate B, candidate C)
Chi-square tests
• Chi-square goodness of fit statistic
• Chi-square test of independence
• Both can be used in either experimental or correlational research
Chi-square tests
• Chi-square goodness of fit statistic
  – Determines how well a distribution of proportions “fits” an expected distribution
  – In election polls, is there a statistically significant difference in voter preference among candidates?
Chi-square tests
• Chi-square test of independence
  – Determines whether there is a relationship between two categorical variables
  – In election polls, is there a relationship between voter gender and preference among candidates?
Chi-square goodness of fit
• New York City mayoral election
  – Assume a small poll was conducted (N = 60)
  – Do you intend to vote for:
    • Christine Quinn
    • Joseph Lhota
    • Other
Chi-square goodness of fit
| Quinn | Lhota | Other |
|-------|-------|-------|
| 23    | 12    | 25    |
Chi-square goodness of fit
• Null hypothesis
  – Equal proportions
• Alternative hypothesis
  – Unequal proportions
Chi-square goodness of fit
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
O = Observed
E = Expected
df = # of categories – 1
p-value depends on $\chi^2$ and df
Chi-square goodness of fit
To estimate effect size: Cramér’s V (or Phi)
$$\phi_c = \sqrt{\frac{\chi^2}{N(k - 1)}}$$
N = sample size
k = # of categories
Chi-square goodness of fit
|              | Quinn | Lhota | Other |
|--------------|-------|-------|-------|
| Expected (E) | 20    | 20    | 20    |
| Observed (O) | 23    | 12    | 25    |
Chi-square goodness of fit
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
df = # of categories – 1
Chi-square goodness of fit
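|       | O  | E  | (O – E) | (O – E)² | (O – E)² / E |
|-------|----|----|---------|----------|--------------|
| Quinn | 23 | 20 | 3       | 9        | 0.45         |
| Lhota | 12 | 20 | -8      | 64       | 3.20         |
| Other | 25 | 20 | 5       | 25       | 1.25         |
| Total | 60 | 60 | 0       | 98       | 4.90         |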
Chi-square goodness of fit
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
$\chi^2$ = 4.90, df = 2, p = .09
∴ Retain the null hypothesis and conclude that the slight preferences observed here are not statistically significant
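The calculation above can be reproduced by hand in R. This is a minimal sketch using the poll counts from the slides; the variable names (`observed`, `expected`, `chi_sq`, `deg_free`, `p_value`) are illustrative choices, not part of the lecture.

```r
# Observed poll counts from the slides (N = 60)
observed <- c(Quinn = 23, Lhota = 12, Other = 25)

# Expected counts under the null hypothesis of equal proportions
expected <- rep(sum(observed) / length(observed), length(observed))

# Chi-square statistic: sum of (O - E)^2 / E over the categories
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: number of categories minus 1
deg_free <- length(observed) - 1

# Upper-tail p-value from the chi-square distribution
p_value <- pchisq(chi_sq, df = deg_free, lower.tail = FALSE)

print(round(c(chi_sq = chi_sq, df = deg_free, p = p_value), 2))
```

The p-value is roughly .086, which rounds to the .09 reported on the slide.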
Chi-square goodness of fit
To estimate effect size: Cramér’s V (or Phi)
$$\phi_c = \sqrt{\frac{\chi^2}{N(k - 1)}} = \sqrt{\frac{4.90}{60(3 - 1)}} = 0.20$$
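The effect-size formula translates directly into a small helper. This is a sketch only; `cramers_v` is a hypothetical function name introduced here, not a base R function.

```r
# Cramer's V from a chi-square statistic, sample size n, and k
# (for a goodness-of-fit test, k is the number of categories)
cramers_v <- function(chi_sq, n, k) {
  sqrt(chi_sq / (n * (k - 1)))
}

# Values from the goodness-of-fit example: chi-square = 4.90, N = 60, k = 3
v_gof <- cramers_v(chi_sq = 4.90, n = 60, k = 3)
print(round(v_gof, 2))  # 0.2
```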
Dataframe in R (Election)

| Voter.ID | Candidate | Gender |
|----------|-----------|--------|
| 1        | Quinn     | M      |
| 2        | Quinn     | F      |
| 3        | Other     | F      |
| 4        | Lhota     | M      |
| 5        | Other     | M      |
| …        | …         | …      |
Chi-square goodness of fit in R
> Observed <- table(Election$Candidate)  # counts per candidate
> chisq.test(Observed)                   # goodness of fit against equal proportions
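The original `Election` data file is not reproduced in the slides, so the sketch below rebuilds a stand-in data frame from the reported counts (23, 12, 25). This is an assumption made for illustration; the row order will not match the real file. Note that the assignment operator is `<-`: writing `<--` would parse as assigning the negated table.

```r
# Stand-in for the lecture's Election data frame, rebuilt from the
# reported counts (assumption: the real file is not available here)
Election <- data.frame(
  Voter.ID  = 1:60,
  Candidate = rep(c("Quinn", "Lhota", "Other"), times = c(23, 12, 25))
)

# One-way table of counts per candidate
Observed <- table(Election$Candidate)

# With a one-way table and no other arguments, chisq.test() runs a
# goodness-of-fit test against equal proportions
result <- chisq.test(Observed)
print(result)
```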
Segment summary
• Chi-square tests are used when outcome and predictor variables are all categorical
• Chi-square goodness of fit is an NHST
• Cramér’s V estimates effect size
END SEGMENT
Lecture 19 ~ Segment 2
Chi-square test of independence
Chi-square test of independence
• Determines whether there is a relationship between two categorical variables
– In election polls, is there a relationship between voter gender and preference among candidates?
Chi-square test of independence
• New York City mayoral election
  – Assume a small poll was conducted (N = 200)
  – More males than females (males n = 140, females n = 60)
  – Do you intend to vote for:
    • Christine Quinn
    • Joseph Lhota
    • Other
Chi-square test of independence
|        | Quinn | Lhota | Other |
|--------|-------|-------|-------|
| Female | 40    | 10    | 10    |
| Male   | 90    | 40    | 10    |
Chi-square test of independence
• Null hypothesis
  – There is no relationship between voter gender and voter preference
• Alternative hypothesis
  – There is a relationship between voter gender and voter preference
Chi-square test of independence
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
df = (# of rows – 1) × (# of columns – 1)
p-value depends on $\chi^2$ and df
Chi-square test of independence
To estimate effect size: Cramér’s V (or Phi)
$$\phi_c = \sqrt{\frac{\chi^2}{N(k - 1)}}$$
N = sample size
k = # of rows or # of columns (whichever is less)
Chi-square test of independence
• Compute the expected frequencies
  – The proportion of male and female voters for each candidate should be the same as the overall voter preference rates
Chi-square test of independence
• Compute the expected frequencies
$$E = \frac{R}{N} \times C$$
E: expected frequency
R: # of entries in the cell’s row
N: total # of entries
C: # of entries in the cell’s column
Chi-square test of independence
|         | Quinn | Lhota | Other | Sum (R) |
|---------|-------|-------|-------|---------|
| Female  | 40    | 10    | 10    | 60      |
| Male    | 90    | 40    | 10    | 140     |
| Sum (C) | 130   | 50    | 20    | 200     |
Chi-square test of independence
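E = (R/N)*C

|         | Quinn              | Lhota             | Other             | Sum (R) |
|---------|--------------------|-------------------|-------------------|---------|
| Female  | (60/200)*130 = 39  | (60/200)*50 = 15  | (60/200)*20 = 6   | 60      |
| Male    | (140/200)*130 = 91 | (140/200)*50 = 35 | (140/200)*20 = 14 | 140     |
| Sum (C) | 130                | 50                | 20                | 200     |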
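The expected frequencies can be computed for every cell at once. In this sketch the outer product of the row and column totals, divided by N, applies E = (R/N)*C to the whole table; the object names are illustrative.

```r
# Observed counts from the slides: gender (rows) by candidate (columns)
observed <- matrix(
  c(40, 10, 10,
    90, 40, 10),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("Female", "Male"),
                  Candidate = c("Quinn", "Lhota", "Other"))
)

# E = (R / N) * C for every cell: outer product of the row totals and
# column totals, divided by the grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
print(expected)
```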
Chi-square test of independence
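|           | O   | E   | (O – E) | (O – E)² | (O – E)² / E |
|-----------|-----|-----|---------|----------|--------------|
| F / Quinn | 40  | 39  | 1       | 1        | 0.03         |
| F / Lhota | 10  | 15  | -5      | 25       | 1.67         |
| F / Other | 10  | 6   | 4       | 16       | 2.67         |
| M / Quinn | 90  | 91  | -1      | 1        | 0.01         |
| M / Lhota | 40  | 35  | 5       | 25       | 0.71         |
| M / Other | 10  | 14  | -4      | 16       | 1.14         |
| Sum       | 200 | 200 | 0       | 84       | 6.23         |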
Chi-square test of independence
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
$\chi^2$ = 6.23, df = 2, p = .04
∴ Reject the null hypothesis and conclude that there is a significant relationship between voter gender and voter preference
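The test of independence can be checked both by hand and with `chisq.test()`. This is a sketch using the counts from the slides, with illustrative variable names. For tables larger than 2-by-2, `chisq.test()` applies no continuity correction, so the two results should agree.

```r
# Observed counts from the slides: gender (rows) by candidate (columns)
observed <- matrix(
  c(40, 10, 10,
    90, 40, 10),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("Female", "Male"),
                  Candidate = c("Quinn", "Lhota", "Other"))
)

# By hand: expected counts, chi-square statistic, df, and p-value
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
chi_sq   <- sum((observed - expected)^2 / expected)
deg_free <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value  <- pchisq(chi_sq, df = deg_free, lower.tail = FALSE)

# Built-in test on the same table, for comparison
builtin <- chisq.test(observed)

print(round(c(chi_sq = chi_sq, df = deg_free, p = p_value), 2))
print(builtin)
```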
Chi-square test of independence
To estimate effect size: Cramér’s V (or Phi)
$$\phi_c = \sqrt{\frac{\chi^2}{N(k - 1)}} = \sqrt{\frac{6.23}{200(2 - 1)}} = .18$$
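For a contingency table, k is the smaller of the number of rows and columns, so the effect size can be computed straight from the table. In this sketch, `cramers_v_table` is a hypothetical helper name, not a base R function.

```r
# Cramer's V for a two-way table; k = min(# rows, # columns)
cramers_v_table <- function(tab) {
  # correct = FALSE keeps the plain Pearson statistic even for 2-by-2 tables
  chi_sq <- unname(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)
  k <- min(dim(tab))
  sqrt(chi_sq / (n * (k - 1)))
}

# Gender (rows) by candidate (columns) counts from the slides
observed <- matrix(c(40, 10, 10,
                     90, 40, 10),
                   nrow = 2, byrow = TRUE)

v_indep <- cramers_v_table(observed)
print(round(v_indep, 2))  # 0.18
```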
Dataframe in R (Election)

| Voter.ID | Candidate | Gender |
|----------|-----------|--------|
| 1        | Quinn     | M      |
| 2        | Quinn     | F      |
| 3        | Other     | F      |
| 4        | Lhota     | M      |
| 5        | Other     | M      |
| …        | …         | …      |
Chi-square test in R
> Observed = table(Election$Candidate, Election$Gender)
> chisq.test(Observed)
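As before, the real `Election` file is not included in the slides. The sketch below expands the six reported cell counts into one row per voter (an assumption for illustration, so the row order differs from the slide), then cross-tabulates and tests as in the lecture code.

```r
# Reported cell counts (N = 200), one row per gender/candidate combination
counts <- data.frame(
  Gender    = rep(c("F", "M"), each = 3),
  Candidate = rep(c("Quinn", "Lhota", "Other"), times = 2),
  n         = c(40, 10, 10, 90, 40, 10)
)

# Expand to one row per voter (stand-in for the lecture's data frame)
Election <- counts[rep(seq_len(nrow(counts)), counts$n),
                   c("Candidate", "Gender")]
Election$Voter.ID <- seq_len(nrow(Election))

# Cross-tabulate candidate by gender and run the test of independence
Observed <- table(Election$Candidate, Election$Gender)
result <- chisq.test(Observed)
print(Observed)
print(result)
```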
Assumptions
• Adequate expected cell counts
  – A common rule is 5 or more in all cells of a 2-by-2 table, 5 or more in 80% of cells in larger tables, and no cells with zero.
  – When this assumption is not met, Fisher’s exact test, a non-parametric test, is recommended.
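One way to apply this rule of thumb is to check the expected counts before choosing a test. In the sketch below, `expected_counts_ok` is a hypothetical helper and the `small` table is made-up illustration data, not from the lecture; `fisher.test()` is the base R function for Fisher’s exact test.

```r
# TRUE when the expected counts satisfy the rule of thumb:
# 2-by-2 tables need every expected count >= 5; larger tables need
# at least 80% of expected counts >= 5 and none equal to zero
expected_counts_ok <- function(tab) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  if (all(dim(tab) == 2)) {
    all(expected >= 5)
  } else {
    mean(expected >= 5) >= 0.8 && all(expected > 0)
  }
}

# Poll data from the slides: the smallest expected count is 6
poll <- matrix(c(40, 10, 10, 90, 40, 10), nrow = 2, byrow = TRUE)
poll_ok <- expected_counts_ok(poll)

# Hypothetical small 2-by-2 table (made-up counts, for illustration only)
small <- matrix(c(3, 1, 2, 6), nrow = 2, byrow = TRUE)
small_ok <- expected_counts_ok(small)

# Fall back to Fisher's exact test when the rule is not met
fisher_p <- if (!small_ok) fisher.test(small)$p.value else NA
print(c(poll_ok = poll_ok, small_ok = small_ok))
print(fisher_p)
```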
Assumptions
• Independence
  – The observations are assumed to be independent of each other.
  – This means chi-squared cannot be used to test correlated data (like matched pairs or panel data).
  – In such cases McNemar’s test of dependent proportions is recommended.
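For paired data, base R provides `mcnemar.test()`. The counts below are hypothetical (the lecture gives no paired example): imagine the same 60 voters asked before and after a debate whether they support a candidate.

```r
# Hypothetical paired responses (made-up counts, for illustration only):
# rows = answer before the debate, columns = answer after
paired <- matrix(
  c(30, 12,
     5, 13),
  nrow = 2, byrow = TRUE,
  dimnames = list(Before = c("Yes", "No"), After = c("Yes", "No"))
)

# McNemar's test uses only the discordant cells (Yes->No and No->Yes);
# a continuity correction is applied by default
result <- mcnemar.test(paired)
print(result)
```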
Segment summary
• Chi-square tests are used when outcome and predictor variables are all categorical
• Chi-square test of independence is an NHST
• Cramér’s V estimates effect size
• Assumptions
  – Adequate expected cell counts
  – Independence
END SEGMENT
END LECTURE 19