ch2. contingency tables 2 - kocwcontents.kocw.net/kocw/document/2015/gachon/kimnamhyoung1/3.pdf–...

Ch2. Contingency Tables_2

Namhyoung Kim

Dept. of Applied Statistics

Gachon University

[email protected]

1

2.3 The Odds Ratio

• For a probability of success 𝜋𝜋, 𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜 = 𝜋𝜋/(1 − 𝜋𝜋) =prob. of success/prob. of failure • The odds are nonnegative

𝜋𝜋 =𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜

𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜 + 1

• In 2x2 tables, 𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜1 = 𝜋𝜋1/(1 − 𝜋𝜋1) and 𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜2 =𝜋𝜋2/(1 − 𝜋𝜋2)

• The odds ratio 𝜃𝜃: another measure of association

𝜃𝜃 =𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜

=𝜋𝜋1/(1 − 𝜋𝜋1)𝜋𝜋2/(1 − 𝜋𝜋2)

2

Properties of the Odds Ratio

3

• The odds ratio can equal any nonnegative number.

• When X and Y are independent, 𝜋𝜋1 = 𝜋𝜋2 odds1=odds2 and 𝜃𝜃 = 1

• When 𝜃𝜃 >1, the odds of success are higher in row 1 than in row 2.

• Values of 𝜃𝜃 father from 1.0 in a given direction represent stronger association.


4

• When one value is the inverse of the other represent the same strength of association, but in opposite direction 𝜃𝜃=0.25 is equivalent to 𝜃𝜃=1/0.25=4

• The odds ratio does not change value when the table orientation reverses – it is unnecessary to identify one

classification as a response variable in order to estimate 𝜃𝜃 (cf. the relative risk requires this)


5

• When both variables are response variables 𝜃𝜃 = 𝜋𝜋11/𝜋𝜋12

𝜋𝜋21/𝜋𝜋22= 𝜋𝜋11𝜋𝜋22

𝜋𝜋12𝜋𝜋21

• The odds ratio is also called the cross-product ratio.

• The sample odds ratio

𝜃𝜃� =𝑝𝑝1/(1 − 𝑝𝑝1)𝑝𝑝2/(1 − 𝑝𝑝2)

=𝑛𝑛11/𝑛𝑛12𝑛𝑛21/𝑛𝑛22

=𝑛𝑛11𝑛𝑛22𝑛𝑛12𝑛𝑛21

• This is the ML estimator of 𝜃𝜃

Example: Odds Ratio for Aspirin Use and Heart Attacks

• For the physicians taking placebo, the estimated odds of MI : n11/n12=189/10845=0.0174

• For those taking aspirin : 104/10933=0.0095 • The sample odds ratio �̂�𝜃 =0.0174/0.0095=1.832 The estimated

odds were 83% higher for the placebo group

6

Inference for Odds Ratios and Log Odds Ratios

• Unless the sample size is extremely large, the sampling distribution of the odds ratio is highly skewed. (positive skew, skewed to the right)

• Because of this skewness, use an alternative but equivalent measure log(𝜃𝜃)

• independence corresponds to log(𝜃𝜃)=0 • The log odds ratio is symmetric about

zero

7


• Its approximating normal dist. has a mean of log(𝜃𝜃) and a SE

𝑆𝑆𝑆𝑆 =1𝑛𝑛11

+1𝑛𝑛12

+1𝑛𝑛21

+1𝑛𝑛22

• C.I. for log(𝜃𝜃) log 𝜃𝜃� ± 𝑧𝑧𝛼𝛼

2(𝑆𝑆𝑆𝑆)

• Exponentiating endpoints of this C.I. yields one for 𝜃𝜃

8


• For Table 2.3, log(1.832)=0.605

• 𝑆𝑆𝑆𝑆 = 1189

+ 110933

+ 1104

+ 110845

= 0.𝑜𝑜3

• a 95% C.I. for log𝜃𝜃 equals 0.605 ±1.96(0.123) or (0.365,0.846) • the corresponding C.I. for 𝜃𝜃 is [exp(0.365), exp(0.846)]=(1.44, 2.33)

9


• The sample odds ratio 𝜃𝜃� equals 0 or ∞ if any 𝑛𝑛𝑖𝑖𝑖𝑖=0, and it is undefined if both entries in a row or column are zero.

• The slightly amended estimator

𝜃𝜃� =(𝑛𝑛11 + 0.5)(𝑛𝑛22 + 0.5)(𝑛𝑛12 + 0.5)(𝑛𝑛21 + 0.5)

10

Relationship Between Odds Ratio and Relative Risk

• Odds ratio= 𝑝𝑝1/(1−𝑝𝑝1)𝑝𝑝2/(1−𝑝𝑝2)

= 𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅 𝑟𝑟𝑅𝑅𝑜𝑜𝑟𝑟 × (1−𝑝𝑝2)(1−𝑝𝑝1)

• When 𝑝𝑝1 and 𝑝𝑝2 are both close to zero, the fraction in the last term of this expression equals approximately 1.0 odds ratio and relative risk take similar values

• For Table 2.3, the sample odds ratio of 1.83 is similar to the sample relative risk of 1.82

• In such a case, an odds ratio of 1.83 does mean that 𝑝𝑝1 is approximately 1.83 times 𝑝𝑝2

11

The Odds Ratio Applies in Case-Control Studies

• The marginal dist. of MI is fixed by the sampling design. (each case was matched with two control patients)

• The outcome measured for each subject is whether she was a smoker

• The study, which uses a retrospective design to look into the past, is called a case-control study – common in health-related applications

12

The Odds Ratio Applies in Case-Control Studies

• estimate the conditional distribution of smoking status, given MI status. – for women suffering MI, 172/262=0.656 – for women who had not suffered MI,

173/519=0.333 • the sample odds ratio is [0.656/(1-

0.656)]/[0.333/(1-0.333)]=(172x346)/(173x90)=3.8 • if we expect P(Y=1|X) to be small, then the

sample odds ratio as a rough indication of the relative risk women who had ever smoked were about four times as likely to suffer MI as women who had never smoked.

13

Types of Observational Studies

• retrospective design(후향적 연구설계) – case-control study

• prospective design(전향적 연구설계) – cohort study – clinical trials

• cross-sectional design(횡단연구설계)

• Observational study – case-control, cohort, and cross-sectional design

• Experimental study – a clinical trial

14

2.4 Chi-Squared Tests of Independence

• Consider the null hypothesis (H0) that cell probabilities equal certain fixed value {𝜋𝜋𝑖𝑖𝑖𝑖}

• For a sample size n with cell counts {𝑛𝑛𝑖𝑖𝑖𝑖}, the values {𝜇𝜇𝑖𝑖𝑖𝑖 = 𝑛𝑛𝜋𝜋𝑖𝑖𝑖𝑖} are expected frequencies.

• To judge whether the data contradict H0, we compare {𝑛𝑛𝑖𝑖𝑖𝑖} to {𝜇𝜇𝑖𝑖𝑖𝑖}

• The larger the differences {𝑛𝑛𝑖𝑖𝑖𝑖 − 𝜇𝜇𝑖𝑖𝑖𝑖}, the stronger the evidence against H0.

15

Pearson Statistics and the Chi-squared Distribution

• The Pearson chi-squared statistic for testing H0

• 𝑋𝑋2 = ∑ (𝑛𝑛𝑖𝑖𝑖𝑖−𝜇𝜇𝑖𝑖𝑖𝑖)2

𝜇𝜇𝑖𝑖𝑖𝑖

• This statistic takes its minimum value of zero when all 𝑛𝑛𝑖𝑖𝑖𝑖 = 𝜇𝜇𝑖𝑖𝑖𝑖

• For a fixed sample size, greater differences {𝑛𝑛𝑖𝑖𝑖𝑖 − 𝜇𝜇𝑖𝑖𝑖𝑖} produce larger 𝑋𝑋2 values and stronger evidence against H0

16


• The 𝑋𝑋2 statistic has approximately a chi-squared distribution, for large n.

17


• The chi-squared approximation improves as {𝜇𝜇𝑖𝑖𝑖𝑖} increase, and {𝜇𝜇𝑖𝑖𝑖𝑖 ≥5} is usually sufficient

• The chi-squared dist. is concentrated over nonnegative values.

• It has mean equal to its degrees of freedom(df), and its standard deviation equals (𝑜𝑜𝑜𝑑𝑑)

• The distribution is skewed to the right, but it becomes more bell-shaped(normal) as df increases.

• the df value equals the difference between the number of parameters in the alternative hypothesis and in the null hypothesis.

18

Likelihood-Ratio Statistic • likelihood function: the probability of the data, viewed

as a function of the parameter once the data are observed

• The likelihood-ratio method for significance tests test statistics uses the ratio of the maximized likelihoods :

−𝑜 log𝑚𝑚𝑅𝑅𝑚𝑚𝑅𝑅𝑚𝑚𝑚𝑚𝑚𝑚 𝑅𝑅𝑅𝑅𝑟𝑟𝑅𝑅𝑅𝑅𝑅𝑅𝑙𝑜𝑜𝑜𝑜𝑜𝑜 𝑤𝑤𝑙𝑅𝑅𝑛𝑛 𝑝𝑝𝑅𝑅𝑟𝑟𝑅𝑅𝑚𝑚𝑅𝑅𝑅𝑅𝑅𝑅𝑟𝑟𝑜𝑜 𝑜𝑜𝑅𝑅𝑅𝑅𝑅𝑅𝑜𝑜𝑑𝑑𝑠𝑠 𝐻𝐻0

𝑚𝑚𝑅𝑅𝑚𝑚𝑅𝑅𝑚𝑚𝑚𝑚𝑚𝑚 𝑅𝑅𝑅𝑅𝑟𝑟𝑅𝑅𝑅𝑅𝑅𝑅𝑙𝑜𝑜𝑜𝑜𝑜𝑜 𝑤𝑤𝑙𝑅𝑅𝑛𝑛 𝑝𝑝𝑅𝑅𝑟𝑟𝑅𝑅𝑚𝑚𝑅𝑅𝑅𝑅𝑅𝑅𝑟𝑟𝑜𝑜 𝑅𝑅𝑟𝑟𝑅𝑅 𝑚𝑚𝑛𝑛𝑟𝑟𝑅𝑅𝑜𝑜𝑅𝑅𝑟𝑟𝑅𝑅𝑢𝑢𝑅𝑅𝑅𝑅𝑜𝑜

• For two-way contingency tables with the multinomial dist., the likelihood-ratio statistic simplifies to

𝐺𝐺2 = 𝑜∑𝑛𝑛𝑖𝑖𝑖𝑖log (𝑛𝑛𝑖𝑖𝑖𝑖𝜇𝜇𝑖𝑖𝑖𝑖

)

• This statistic is called the likelihood-ratio chi-squared statistic.

19

Tests of Independence

• The null hypothesis of statistical independence is

H0 : 𝜋𝜋𝑖𝑖𝑖𝑖 = 𝜋𝜋𝑖𝑖+𝜋𝜋+𝑖𝑖 for all i and j • the expected frequency 𝜇𝜇𝑖𝑖𝑖𝑖 = 𝑛𝑛𝜋𝜋𝑖𝑖𝑖𝑖 =𝑛𝑛𝜋𝜋𝑖𝑖+𝜋𝜋+𝑖𝑖

• estimated expected frequencies �̂�𝜇𝑖𝑖𝑖𝑖 = 𝑛𝑛𝑝𝑝𝑖𝑖+𝑝𝑝+𝑖𝑖 = 𝑛𝑛

𝑛𝑛𝑖𝑖+𝑛𝑛

𝑛𝑛+𝑖𝑖𝑛𝑛

=𝑛𝑛𝑖𝑖+𝑛𝑛+𝑖𝑖𝑛𝑛

20

Tests of Independence

• For testing independence in IxJ contingency tables, the Pearson and likelihood-ratio statistics equal

• 𝑋𝑋2 = ∑ (𝑛𝑛𝑖𝑖𝑖𝑖−𝜇𝜇�𝑖𝑖𝑖𝑖)2

𝜇𝜇�𝑖𝑖𝑖𝑖,𝐺𝐺2 = 𝑜∑𝑛𝑛𝑖𝑖𝑖𝑖log (

𝑛𝑛𝑖𝑖𝑖𝑖𝜇𝜇�𝑖𝑖𝑖𝑖

)

• Their large-sample chi-squared dist. have df=(I-1)(J-1)

21

Example: Gender Gap in Political Affiliation

22

Example: Gender Gap in Political Affiliation

23

Residuals for Cells in a Contingency Table

• For the test of independence, a useful cell residual is

𝑛𝑛𝑖𝑖𝑖𝑖 − �̂�𝜇𝑖𝑖𝑖𝑖�̂�𝜇𝑖𝑖𝑖𝑖(1 − 𝑝𝑝𝑖𝑖+)(1 − 𝑝𝑝+𝑖𝑖)

• The ratio is called a standardized residual. • When H0 is true, each standardized

residual has a large-sample standard normal distribution.

24

• Positive residuals for female Democrats and male Republicans more female Democrats and male Republicans than the hypothesis of independence predicts

Residuals for Cells in a Contingency Table

25

Partitioning Chi-Squared

• One chi-squared statistic with df1 + a separate, independent, chi-squared statistic with df2 = a chi-squared distribution with df1+df2 – For example, suppose we have two 2x3

tables, then the sum of the 𝑋𝑋2 or 𝐺𝐺2 values from the two tables is a chi-squared statistic with df=2+2=4

26

Partitioning Chi-Squared

• Chi-squared statistics having df>1 can be broken into components with fewer degrees of freedom. – For testing independence in 2xJ tables,

df=(J-1) and a chi-squared statistic can partition into J-1 components

27

Comments About Chi-Squared Tests

• limitations – merely indicate the degree of evidence for

an association – require large samples – treat both classifications as nominal

28

Ch2. Contingency Tables_22.3 The Odds RatioProperties of the Odds RatioProperties of the Odds RatioProperties of the Odds Ratio Example: Odds Ratio for Aspirin Use and Heart AttacksInference for Odds Ratios and Log Odds RatiosInference for Odds Ratios and Log Odds RatiosInference for Odds Ratios and Log Odds RatiosInference for Odds Ratios and Log Odds RatiosRelationship Between Odds Ratio and Relative RiskThe Odds Ratio Applies in Case-Control StudiesThe Odds Ratio Applies in Case-Control StudiesTypes of Observational Studies2.4 Chi-Squared Tests of IndependencePearson Statistics and the Chi-squared DistributionPearson Statistics and the Chi-squared DistributionPearson Statistics and the Chi-squared DistributionLikelihood-Ratio StatisticTests of IndependenceTests of IndependenceExample: Gender Gap in Political AffiliationExample: Gender Gap in Political AffiliationResiduals for Cells in a Contingency TableResiduals for Cells in a Contingency TablePartitioning Chi-SquaredPartitioning Chi-SquaredComments About Chi-Squared Tests

ch2. contingency tables 2 - kocwcontents.kocw.net/kocw/document/2015/gachon/kimnamhyoung1/3.pdf–...

Documents