Chapter 17
Statistical Inference For Frequency Data
I Three Applications of Pearson's $\chi^2$
Testing goodness of fit
Testing independence
Testing equality of proportions
A. Testing Goodness of Fit
1. Statistical hypotheses
H0: OPop 1 = EPop 1, . . . , OPop k = EPop k
H1: OPop j ≠ EPop j for some j and j′
2. Randomization Plan
One random sample of n elements
Each element is classified in terms of
membership in one of k mutually exclusive
categories
B. Testing Independence
1. Statistical hypotheses
H0: p(A and B) = p(A)p(B)
H1: p(A and B) ≠ p(A)p(B)
2. Randomization Plan
One random sample of n elements
Each element is classified in terms of
two variables, denoted by A and B, where
each variable has two or more categories.
C. Testing Equality of Proportions
1. Statistical hypotheses
H0: p1 = p2 = . . . = pc
H1: pj ≠ pj′ for some j and j′
2. Randomization Plan
c random samples, where c ≥ 2
For each sample, elements are classified in
terms of membership in one of r = 2 mutually
exclusive categories
II Testing Goodness of Fit
A. Chi-Square Distribution
[Figure: chi-square density $f(\chi^2)$ plotted against $\chi^2$ for df = 1, df = 2, df = 6, and df = 10]
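The curves in the figure follow the chi-square density, $f(x) = x^{\nu/2 - 1} e^{-x/2} / \left(2^{\nu/2}\,\Gamma(\nu/2)\right)$ for $x > 0$. The sketch below evaluates that density with the Python standard library for the four df values in the figure. The function name `chi_square_pdf` and the evaluation points are illustrative choices, not part of the slides.

```python
import math

def chi_square_pdf(x: float, df: int) -> float:
    """Chi-square density with df degrees of freedom, evaluated at x."""
    if x <= 0:
        return 0.0  # this sketch only evaluates the density for x > 0
    half = df / 2.0
    return x ** (half - 1.0) * math.exp(-x / 2.0) / (2.0 ** half * math.gamma(half))

# Evaluate each curve at a few illustrative points on the horizontal axis.
for df in (1, 2, 6, 10):
    values = [round(chi_square_pdf(x, df), 4) for x in (1.0, 4.0, 8.0)]
    print(f"df = {df:2d}: f(1), f(4), f(8) = {values}")
```

For df > 2 the density peaks at df – 2, which is why the curves shift to the right and flatten as df grows.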
B. Pearson’s chi-square statistic
$$\chi^2 = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j}$$
1. Oj and Ej denote, respectively, the observed and expected frequencies in the jth category. k denotes the number of categories.
2. The critical value of chi square is $\chi^2_{\alpha,\nu}$ with $\nu$ = k – 1 degrees of freedom.
C. Grade-Distribution Example
1. Is the distribution of grades for summer-school
students in a statistics class different from that for
the fall and spring semesters?
         Fall and Spring    Summer
Grade    Proportion         Obs. frequency
A         .12               15
B         .23               21
C         .47               30
D         .13                6
F         .05                0
Total    1.00               72
2. The statistical hypotheses are
H0: OPop 1 = EPop 1, . . . , OPop 5 = EPop 5
H1: OPop j ≠ EPop j for some j and j′
3. Pearson's chi-square statistic is
$$\chi^2 = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j}$$
4. The critical value of chi square for $\alpha$ = .05, k = 5 categories, and $\nu$ = 5 – 1 = 4 degrees of freedom is
$$\chi^2_{.05,\,4} = 9.488$$
Table 1. Computation of Pearson's Chi-Square for n = 72 Summer-School Students
(1)      (2)    (3)    (4)                 (5)        (6)
Grade    Oj     pj     npj = Ej            Oj – Ej    (Oj – Ej)²/Ej
A        15     .12    72(.12) =  8.6       6.4       4.763
B        21     .23    72(.23) = 16.6       4.4       1.166
C        30     .47    72(.47) = 33.8      –3.8       0.427
D         6     .13    72(.13) =  9.4      –3.4       1.230
F         0     .05    72(.05) =  3.6      –3.6       3.600
Total    72    1.00              72.0                 $\chi^2$ = 11.186*
*p < .025
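The arithmetic in Table 1 can be reproduced with a short standard-library Python sketch. The variable and function names are illustrative. The table rounds each expected frequency to one decimal before squaring, so the sketch computes the statistic both with rounded and with unrounded expected frequencies; the two results differ slightly.

```python
# Observed summer frequencies and fall/spring proportions from the grade example.
observed = {"A": 15, "B": 21, "C": 30, "D": 6, "F": 0}
proportions = {"A": 0.12, "B": 0.23, "C": 0.47, "D": 0.13, "F": 0.05}

def pearson_chi_square(obs, exp):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

grades = list(observed)
n = sum(observed.values())                        # total sample size
obs = [observed[g] for g in grades]
exact_expected = [n * proportions[g] for g in grades]
rounded_expected = [round(e, 1) for e in exact_expected]  # as printed in Table 1

chi2_rounded = pearson_chi_square(obs, rounded_expected)
chi2_exact = pearson_chi_square(obs, exact_expected)
critical_value = 9.488                            # chi-square for alpha = .05, df = 4

print(f"n = {n}")
print(f"chi-square with rounded E: {chi2_rounded:.3f}")
print(f"chi-square with exact E:   {chi2_exact:.3f}")
print(f"exceeds critical value:    {chi2_exact > critical_value}")
```

With either version of the expected frequencies the statistic exceeds the critical value of 9.488, so the decision about H0 is the same.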
5. When e parameters of a theoretical distribution must be estimated, the degrees of freedom are k – 1 – e.
D. Practical Significance
1. Cohen's w
$$\hat{w} = \sqrt{\sum_{j=1}^{k} \frac{(\hat{p}_j - p_j)^2}{p_j}}$$
where $\hat{p}_j$ and $p_j$ denote, respectively, the observed and expected proportions in the jth category.
2. Simpler equivalent formula for Cohen's $\hat{w}$
$$\hat{w} = \sqrt{\frac{\chi^2}{n}} = \sqrt{\frac{11.186}{72}} = 0.394$$
3. Cohen's guidelines for interpreting $\hat{w}$
0.1 is a small effect
0.3 is a medium effect
0.5 is a large effect
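The equivalence of the two formulas for Cohen's w can be checked numerically. The sketch below uses the grade-distribution data and the Python standard library; the variable names are illustrative, and the expected frequencies are left unrounded, so the result can differ in the third decimal from a value based on the rounded chi-square in Table 1.

```python
import math

observed = [15, 21, 30, 6, 0]                  # summer grades A, B, C, D, F
expected_props = [0.12, 0.23, 0.47, 0.13, 0.05]  # fall and spring proportions
n = sum(observed)

# Definition: observed versus expected proportions in each category.
observed_props = [o / n for o in observed]
w_from_props = math.sqrt(
    sum((p_obs - p_exp) ** 2 / p_exp
        for p_obs, p_exp in zip(observed_props, expected_props))
)

# Shortcut: square root of chi-square divided by n.
chi2 = sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, expected_props))
w_from_chi2 = math.sqrt(chi2 / n)

print(f"w from proportions: {w_from_props:.3f}")
print(f"w from chi-square:  {w_from_chi2:.3f}")
```

The two values agree because each term (O – E)²/E equals n times the corresponding term in the proportion formula, so dividing chi-square by n recovers the sum under the square root.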
E. Yates’ Correction
1. When $\nu$ = 1, Yates' correction can be applied to make the sampling distribution of the test statistic for Oj – Ej, which is discrete, better approximate the chi-square distribution.
$$\chi^2 = \sum_{j=1}^{k} \frac{(|O_j - E_j| - 0.5)^2}{E_j}$$
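A small Python sketch can show how the correction changes the statistic. The data here are hypothetical and are not from the slides: 100 observations in k = 2 categories (so df = 1) with 60 and 40 observed against 50 and 50 expected. The function name and its `yates` flag are illustrative.

```python
def chi_square(observed, expected, yates=False):
    """Pearson chi-square; with yates=True, subtract 0.5 from each |O - E|."""
    correction = 0.5 if yates else 0.0
    return sum((abs(o - e) - correction) ** 2 / e
               for o, e in zip(observed, expected))

# Hypothetical two-category data (assumption for illustration only).
observed = [60, 40]
expected = [50, 50]

uncorrected = chi_square(observed, expected)
corrected = chi_square(observed, expected, yates=True)

print(f"uncorrected chi-square: {uncorrected:.2f}")
print(f"Yates-corrected:        {corrected:.2f}")
```

For these hypothetical counts the corrected statistic is smaller than the uncorrected one, which illustrates that the correction makes the test more conservative.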
F. Assumptions of the Goodness-of-Fit Test
1. Every observation is assigned to one and only
one category.
2. The observations are independent.
3. If $\nu$ = 1, every expected frequency should be at least 10. If $\nu$ > 1, every expected frequency should be at least 5.
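The expected-frequency rule in assumption 3 is easy to check mechanically. The sketch below is one possible helper; the function name is hypothetical. It is applied to the expected frequencies listed in Table 1.

```python
def meets_expected_frequency_rule(expected, df):
    """True if every expected frequency meets the rule of thumb:
    at least 10 when df = 1, at least 5 when df > 1."""
    minimum = 10 if df == 1 else 5
    return all(e >= minimum for e in expected)

# Expected frequencies from Table 1 (grades A, B, C, D, F), df = 5 - 1 = 4.
table1_expected = [8.6, 16.6, 33.8, 9.4, 3.6]
rule_ok = meets_expected_frequency_rule(table1_expected, df=4)
print(f"Table 1 meets the rule: {rule_ok}")
```

For Table 1 the check returns False because the expected frequency for grade F (3.6) is below 5, so the chi-square approximation for that example should be read with some caution.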
III Testing Independence
A. Statistical Hypotheses
H0: p(A and B) = p(A)p(B)
H1: p(A and B) ≠ p(A)p(B)
B. Chi-Square Statistic for an r × c Contingency Table with i = 1, . . . , r Rows and j = 1, . . . , c Columns
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
C. Computational Example: Is Success on an Employment-Test Item Independent of Gender?
               Observed                   Expected
               b1      b2                 b1      b2
               Fail    Pass    Total      Fail    Pass
a1 Men          84      18     102        88.9    13.1
a2 Women        93       8     101        88.1    12.9
Total          177      26     203
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 4.299^{*}, \qquad \chi^2_{.05,\,1} = 3.841$$
D. Computation of expected frequencies
1. A and B are statistically independent if
p(ai and bj) = p(ai)p(bj)
2. Expected frequency, $E_{a_i \text{ and } b_j}$, for the cell in row i and column j
$$E_{a_i \text{ and } b_j} = n\,p(a_i)\,p(b_j) = n\left(\frac{n_{a_i}}{n}\right)\left(\frac{n_{b_j}}{n}\right) = \frac{n_{a_i}\, n_{b_j}}{n}$$
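$$\begin{aligned}
E_{a_1 \text{ and } b_1} &= (n_{a_1} n_{b_1})/n = (102)(177)/203 = 88.9 \\
E_{a_1 \text{ and } b_2} &= (n_{a_1} n_{b_2})/n = (102)(26)/203 = 13.1 \\
E_{a_2 \text{ and } b_1} &= (n_{a_2} n_{b_1})/n = (101)(177)/203 = 88.1 \\
E_{a_2 \text{ and } b_2} &= (n_{a_2} n_{b_2})/n = (101)(26)/203 = 12.9
\end{aligned}$$
          Observed                Expected
          b1      b2     Total    b1      b2
a1         84      18    102      88.9    13.1
a2         93       8    101      88.1    12.9
Total     177      26    203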
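The expected frequencies and the chi-square statistic for the employment-test table can be reproduced with a short standard-library Python sketch. The function names are illustrative; the expected frequencies are kept unrounded inside the computation and rounded only for display.

```python
# Observed frequencies: rows a1 (men), a2 (women); columns b1 (fail), b2 (pass).
observed = [[84, 18],
            [93, 8]]

def expected_frequencies(table):
    """E for each cell = (row total)(column total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_chi_square(table):
    """Sum of (O - E)^2 / E over all cells of the table."""
    expected = expected_frequencies(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

expected = expected_frequencies(observed)
chi2 = pearson_chi_square(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)

for label, row in zip(("a1", "a2"), expected):
    print(label, [round(e, 1) for e in row])
print(f"chi-square = {chi2:.3f}, df = {df}")
```

Because the statistic exceeds the critical value of 3.841 for df = 1, the hypothesis of independence is rejected at the .05 level.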
E. Degrees of Freedom for an r × c Contingency Table
df = k – 1 – e
= rc – 1 – [(r – 1) + (c – 1)]
= rc – 1 – r + 1 – c + 1
= rc – r – c + 1
= (r – 1)(c – 1)
= (2 – 1)(2 – 1) = 1
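F. Strength of Association and Practical Significance
1. Cramér's V
$$V = \sqrt{\frac{\text{observed}}{\text{maximum}}} = \sqrt{\frac{\chi^2 / n}{s - 1}} = \sqrt{\frac{\chi^2}{n(s - 1)}}$$
where s is the smaller of the number of rows and columns.
$$V = \sqrt{\frac{\chi^2}{n(s - 1)}} = \sqrt{\frac{4.299}{203(2 - 1)}} = 0.146$$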
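2. Practical significance, Cohen's ŵ
$$\hat{w} = \sqrt{\sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(\hat{p}_{ij} - p_{ij})^2}{p_{ij}}} = \sqrt{\frac{\chi^2}{n}} = 0.146$$
3. For a contingency table, an alternative formula for $\hat{w}$ is
$$\hat{w} = V\sqrt{s - 1} = 0.146\sqrt{2 - 1} = 0.146$$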
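The relationship between Cramér's V and Cohen's w can be sketched in a few lines of standard-library Python. The function names are illustrative; the inputs are the chi-square value (4.299) and sample size (203) reported for the employment-test example.

```python
import math

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V = sqrt(chi2 / (n (s - 1))), s = smaller of rows and columns."""
    s = min(n_rows, n_cols)
    return math.sqrt(chi2 / (n * (s - 1)))

def cohens_w(chi2, n):
    """Cohen's w = sqrt(chi2 / n)."""
    return math.sqrt(chi2 / n)

chi2, n, n_rows, n_cols = 4.299, 203, 2, 2
v = cramers_v(chi2, n, n_rows, n_cols)
w = cohens_w(chi2, n)
w_from_v = v * math.sqrt(min(n_rows, n_cols) - 1)  # alternative formula

print(f"V = {v:.3f}, w = {w:.3f}, V * sqrt(s - 1) = {w_from_v:.3f}")
```

For a 2 × 2 table s – 1 = 1, so V and w coincide; they differ only when both variables have more than two categories.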
G. Three-By-Three Contingency Table
1. Motivation and education of conscientious
objectors during WWII
                 College    High School    Grade School    Total
Coward              12          25              35            72
Partly Coward       19          23              30            72
Not Coward          71          56              24           151
Total              102         104              89           295
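$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 36.681^{*}, \qquad \chi^2_{.05,\,4} = 9.488$$
$$\nu = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4$$
2. Strength of Association, Cramér's V
$$V = \sqrt{\frac{\chi^2}{n(s - 1)}} = \sqrt{\frac{36.681}{295(3 - 1)}} = 0.249$$
3. Practical significance
$$\hat{w} = V\sqrt{s - 1} = 0.249\sqrt{3 - 1} = 0.352$$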
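The three-by-three example can be worked end to end in standard-library Python: chi-square from the observed table, then degrees of freedom, Cramér's V, and Cohen's w. The function name is illustrative. Because the sketch carries full precision throughout, the last decimal of w can differ from a value computed from V rounded to three places.

```python
import math

# Rows: Coward, Partly Coward, Not Coward.
# Columns: College, High School, Grade School.
observed = [[12, 25, 35],
            [19, 23, 30],
            [71, 56, 24]]

def contingency_chi_square(table):
    """Return (chi-square, df, n) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency
            chi2 += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, n

chi2, df, n = contingency_chi_square(observed)
s = min(len(observed), len(observed[0]))
v = math.sqrt(chi2 / (n * (s - 1)))   # Cramér's V
w = math.sqrt(chi2 / n)               # Cohen's w, equal to V * sqrt(s - 1)

print(f"chi-square = {chi2:.3f}, df = {df}, n = {n}")
print(f"V = {v:.3f}, w = {w:.3f}")
```

By Cohen's guidelines a w between 0.3 and 0.5 falls between a medium and a large effect.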
H. Assumptions of the Independence Test
1. Every observation is assigned to one and only
one cell of the contingency table.
2. The observations are independent.
3. If $\nu$ = 1, every expected frequency should be at least 10. If $\nu$ > 1, every expected frequency should be at least 5.
IV Testing Equality of c ≥ 2 Proportions
A. Statistical Hypotheses
H0: p1 = p2 = . . . = pc
H1: pj ≠ pj′ for some j and j′
1. Computational example: three samples of n = 100
residents of nursing homes were surveyed.
Variable A was age heterogeneity in the home;
variable B was resident satisfaction.
Table 2. Nursing Home Data
                               Age Heterogeneity
                       Low b1        Medium b2     High b3
Satisfied a1           O = 56        O = 58        O = 38
                       E = 50.67     E = 50.67     E = 50.67
Not Satisfied a2       O = 44        O = 42        O = 62
                       E = 49.33     E = 49.33     E = 49.33
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 9.708^{*}, \qquad \chi^2_{.05,\,2} = 5.991$$
$$\nu = (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2$$
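The nursing-home computation can be sketched in standard-library Python. The observed counts below take 62 not-satisfied residents in the high-heterogeneity sample, so that each of the three samples has n = 100 and the expected frequencies match those in Table 2 (50.67 and 49.33). Variable names are illustrative.

```python
# Rows: a1 satisfied, a2 not satisfied.
# Columns: b1 low, b2 medium, b3 high age heterogeneity.
observed = [[56, 58, 38],
            [44, 42, 62]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]   # the three sample sizes
n = sum(row_totals)

# Sample proportion of satisfied residents in each home (p1, p2, p3).
satisfied_props = [o / c for o, c in zip(observed[0], col_totals)]

# Expected frequencies under H0: p1 = p2 = p3.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 5.991                              # alpha = .05, df = 2
reject_h0 = chi2 > critical_value

print(f"sample proportions satisfied: {satisfied_props}")
print(f"chi-square = {chi2:.3f}, df = {df}, reject H0: {reject_h0}")
```

The arithmetic is the same as for the test of independence; what differs is the sampling plan, with c separate random samples instead of one sample classified on two variables.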
B. Assumptions of the Equality of Proportions Test
1. Every observation is assigned to one and only
one cell of the contingency table.
2. The observations are independent.
3. If $\nu$ = 1, every expected frequency should be at least 10. If $\nu$ > 1, every expected frequency should be at least 5.
C. Test of Homogeneity of Proportions
1. Extension of the test of equality of
proportions when variable A has r > 2 rows
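2. Statistical hypotheses
$$H_0: \begin{aligned}
p_{a_1|b_1} &= p_{a_1|b_2} = \cdots = p_{a_1|b_c} \\
p_{a_2|b_1} &= p_{a_2|b_2} = \cdots = p_{a_2|b_c} \\
&\;\;\vdots \\
p_{a_r|b_1} &= p_{a_r|b_2} = \cdots = p_{a_r|b_c}
\end{aligned}$$
$$H_1: p_{a_i|b_j} \ne p_{a_i|b_{j'}} \text{ for columns } j \text{ and } j' \text{ in at least one row}$$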