ch2. contingency tables 2 - kocwcontents.kocw.net/kocw/document/2015/gachon/kimnamhyoung1/3.pdfโ...
TRANSCRIPT
-
Ch2. Contingency Tables_2
Namhyoung Kim
Dept. of Applied Statistics
Gachon University
1
-
2.3 The Odds Ratio
โข For a probability of success ๐๐, ๐๐๐๐๐๐๐๐ = ๐๐/(1 โ ๐๐) =prob. of success/prob. of failure โข The odds are nonnegative
๐๐ =๐๐๐๐๐๐๐๐
๐๐๐๐๐๐๐๐ + 1
โข In 2x2 tables, ๐๐๐๐๐๐๐๐1 = ๐๐1/(1 โ ๐๐1) and ๐๐๐๐๐๐๐๐2 =๐๐2/(1 โ ๐๐2)
โข The odds ratio ๐๐: another measure of association
๐๐ =๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐๐
=๐๐1/(1 โ ๐๐1)๐๐2/(1 โ ๐๐2)
2
-
Properties of the Odds Ratio
3
โข The odds ratio can equal any nonnegative number.
โข When X and Y are independent, ๐๐1 = ๐๐2 odds1=odds2 and ๐๐ = 1
โข When ๐๐ >1, the odds of success are higher in row 1 than in row 2.
โข Values of ๐๐ father from 1.0 in a given direction represent stronger association.
-
Properties of the Odds Ratio
4
โข When one value is the inverse of the other represent the same strength of association, but in opposite direction ๐๐=0.25 is equivalent to ๐๐=1/0.25=4
โข The odds ratio does not change value when the table orientation reverses โ it is unnecessary to identify one
classification as a response variable in order to estimate ๐๐ (cf. the relative risk requires this)
-
Properties of the Odds Ratio
5
โข When both variables are response variables ๐๐ = ๐๐11/๐๐12
๐๐21/๐๐22= ๐๐11๐๐22
๐๐12๐๐21
โข The odds ratio is also called the cross-product ratio.
โข The sample odds ratio
๐๐๏ฟฝ =๐๐1/(1 โ ๐๐1)๐๐2/(1 โ ๐๐2)
=๐๐11/๐๐12๐๐21/๐๐22
=๐๐11๐๐22๐๐12๐๐21
โข This is the ML estimator of ๐๐
-
Example: Odds Ratio for Aspirin Use and Heart Attacks
โข For the physicians taking placebo, the estimated odds of MI : n11/n12=189/10845=0.0174
โข For those taking aspirin : 104/10933=0.0095 โข The sample odds ratio ๏ฟฝฬ๏ฟฝ๐ =0.0174/0.0095=1.832 The estimated
odds were 83% higher for the placebo group
6
-
Inference for Odds Ratios and Log Odds Ratios
โข Unless the sample size is extremely large, the sampling distribution of the odds ratio is highly skewed. (positive skew, skewed to the right)
โข Because of this skewness, use an alternative but equivalent measure log(๐๐)
โข independence corresponds to log(๐๐)=0 โข The log odds ratio is symmetric about
zero
7
-
Inference for Odds Ratios and Log Odds Ratios
โข Its approximating normal dist. has a mean of log(๐๐) and a SE
๐๐๐๐ =1๐๐11
+1๐๐12
+1๐๐21
+1๐๐22
โข C.I. for log(๐๐) log ๐๐๏ฟฝ ยฑ ๐ง๐ง๐ผ๐ผ
2(๐๐๐๐)
โข Exponentiating endpoints of this C.I. yields one for ๐๐
8
-
Inference for Odds Ratios and Log Odds Ratios
โข For Table 2.3, log(1.832)=0.605
โข ๐๐๐๐ = 1189
+ 110933
+ 1104
+ 110845
= 0.๐๐3
โข a 95% C.I. for log๐๐ equals 0.605 ยฑ1.96(0.123) or (0.365,0.846) โข the corresponding C.I. for ๐๐ is [exp(0.365), exp(0.846)]=(1.44, 2.33)
9
-
Inference for Odds Ratios and Log Odds Ratios
โข The sample odds ratio ๐๐๏ฟฝ equals 0 or โ if any ๐๐๐๐๐๐=0, and it is undefined if both entries in a row or column are zero.
โข The slightly amended estimator
๐๐๏ฟฝ =(๐๐11 + 0.5)(๐๐22 + 0.5)(๐๐12 + 0.5)(๐๐21 + 0.5)
10
-
Relationship Between Odds Ratio and Relative Risk
โข Odds ratio= ๐๐1/(1โ๐๐1)๐๐2/(1โ๐๐2)
= ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐๐๐ ๐ ๐๐๐๐ ร (1โ๐๐2)(1โ๐๐1)
โข When ๐๐1 and ๐๐2 are both close to zero, the fraction in the last term of this expression equals approximately 1.0 odds ratio and relative risk take similar values
โข For Table 2.3, the sample odds ratio of 1.83 is similar to the sample relative risk of 1.82
โข In such a case, an odds ratio of 1.83 does mean that ๐๐1 is approximately 1.83 times ๐๐2
11
-
The Odds Ratio Applies in Case-Control Studies
โข The marginal dist. of MI is fixed by the sampling design. (each case was matched with two control patients)
โข The outcome measured for each subject is whether she was a smoker
โข The study, which uses a retrospective design to look into the past, is called a case-control study โ common in health-related applications
12
-
The Odds Ratio Applies in Case-Control Studies
โข estimate the conditional distribution of smoking status, given MI status. โ for women suffering MI, 172/262=0.656 โ for women who had not suffered MI,
173/519=0.333 โข the sample odds ratio is [0.656/(1-
0.656)]/[0.333/(1-0.333)]=(172x346)/(173x90)=3.8 โข if we expect P(Y=1|X) to be small, then the
sample odds ratio as a rough indication of the relative risk women who had ever smoked were about four times as likely to suffer MI as women who had never smoked.
13
-
Types of Observational Studies
โข retrospective design(ํํฅ์ ์ฐ๊ตฌ์ค๊ณ) โ case-control study
โข prospective design(์ ํฅ์ ์ฐ๊ตฌ์ค๊ณ) โ cohort study โ clinical trials
โข cross-sectional design(ํก๋จ์ฐ๊ตฌ์ค๊ณ)
โข Observational study โ case-control, cohort, and cross-sectional design
โข Experimental study โ a clinical trial
14
-
2.4 Chi-Squared Tests of Independence
โข Consider the null hypothesis (H0) that cell probabilities equal certain fixed value {๐๐๐๐๐๐}
โข For a sample size n with cell counts {๐๐๐๐๐๐}, the values {๐๐๐๐๐๐ = ๐๐๐๐๐๐๐๐} are expected frequencies.
โข To judge whether the data contradict H0, we compare {๐๐๐๐๐๐} to {๐๐๐๐๐๐}
โข The larger the differences {๐๐๐๐๐๐ โ ๐๐๐๐๐๐}, the stronger the evidence against H0.
15
-
Pearson Statistics and the Chi-squared Distribution
โข The Pearson chi-squared statistic for testing H0
โข ๐๐2 = โ (๐๐๐๐๐๐โ๐๐๐๐๐๐)2
๐๐๐๐๐๐
โข This statistic takes its minimum value of zero when all ๐๐๐๐๐๐ = ๐๐๐๐๐๐
โข For a fixed sample size, greater differences {๐๐๐๐๐๐ โ ๐๐๐๐๐๐} produce larger ๐๐2 values and stronger evidence against H0
16
-
Pearson Statistics and the Chi-squared Distribution
โข The ๐๐2 statistic has approximately a chi-squared distribution, for large n.
17
-
Pearson Statistics and the Chi-squared Distribution
โข The chi-squared approximation improves as {๐๐๐๐๐๐} increase, and {๐๐๐๐๐๐ โฅ5} is usually sufficient
โข The chi-squared dist. is concentrated over nonnegative values.
โข It has mean equal to its degrees of freedom(df), and its standard deviation equals (๐๐๐๐๐)
โข The distribution is skewed to the right, but it becomes more bell-shaped(normal) as df increases.
โข the df value equals the difference between the number of parameters in the alternative hypothesis and in the null hypothesis.
18
-
Likelihood-Ratio Statistic โข likelihood function: the probability of the data, viewed
as a function of the parameter once the data are observed
โข The likelihood-ratio method for significance tests test statistics uses the ratio of the maximized likelihoods :
โ๐ log๐๐๐ ๐ ๐๐๐ ๐ ๐๐๐๐๐๐ ๐ ๐ ๐ ๐ ๐๐๐ ๐ ๐ ๐ ๐ ๐ ๐๐๐๐๐๐๐ ๐ค๐ค๐๐ ๐ ๐๐ ๐๐๐ ๐ ๐๐๐ ๐ ๐๐๐ ๐ ๐ ๐ ๐ ๐ ๐๐๐๐ ๐๐๐ ๐ ๐ ๐ ๐ ๐ ๐๐๐๐๐ ๐ ๐ป๐ป0
๐๐๐ ๐ ๐๐๐ ๐ ๐๐๐๐๐๐ ๐ ๐ ๐ ๐ ๐๐๐ ๐ ๐ ๐ ๐ ๐ ๐๐๐๐๐๐๐ ๐ค๐ค๐๐ ๐ ๐๐ ๐๐๐ ๐ ๐๐๐ ๐ ๐๐๐ ๐ ๐ ๐ ๐ ๐ ๐๐๐๐ ๐ ๐ ๐๐๐ ๐ ๐๐๐๐๐๐๐ ๐ ๐๐๐ ๐ ๐๐๐ ๐ ๐ข๐ข๐ ๐ ๐ ๐ ๐๐
โข For two-way contingency tables with the multinomial dist., the likelihood-ratio statistic simplifies to
๐บ๐บ2 = ๐โ๐๐๐๐๐๐log (๐๐๐๐๐๐๐๐๐๐๐๐
)
โข This statistic is called the likelihood-ratio chi-squared statistic.
19
-
Tests of Independence
โข The null hypothesis of statistical independence is
H0 : ๐๐๐๐๐๐ = ๐๐๐๐+๐๐+๐๐ for all i and j โข the expected frequency ๐๐๐๐๐๐ = ๐๐๐๐๐๐๐๐ =๐๐๐๐๐๐+๐๐+๐๐
โข estimated expected frequencies ๏ฟฝฬ๏ฟฝ๐๐๐๐๐ = ๐๐๐๐๐๐+๐๐+๐๐ = ๐๐
๐๐๐๐+๐๐
๐๐+๐๐๐๐
=๐๐๐๐+๐๐+๐๐๐๐
20
-
Tests of Independence
โข For testing independence in IxJ contingency tables, the Pearson and likelihood-ratio statistics equal
โข ๐๐2 = โ (๐๐๐๐๐๐โ๐๐๏ฟฝ๐๐๐๐)2
๐๐๏ฟฝ๐๐๐๐,๐บ๐บ2 = ๐โ๐๐๐๐๐๐log (
๐๐๐๐๐๐๐๐๏ฟฝ๐๐๐๐
)
โข Their large-sample chi-squared dist. have df=(I-1)(J-1)
21
-
Example: Gender Gap in Political Affiliation
22
-
Example: Gender Gap in Political Affiliation
23
-
Residuals for Cells in a Contingency Table
โข For the test of independence, a useful cell residual is
๐๐๐๐๐๐ โ ๏ฟฝฬ๏ฟฝ๐๐๐๐๐๏ฟฝฬ๏ฟฝ๐๐๐๐๐(1 โ ๐๐๐๐+)(1 โ ๐๐+๐๐)
โข The ratio is called a standardized residual. โข When H0 is true, each standardized
residual has a large-sample standard normal distribution.
24
-
โข Positive residuals for female Democrats and male Republicans more female Democrats and male Republicans than the hypothesis of independence predicts
Residuals for Cells in a Contingency Table
25
-
Partitioning Chi-Squared
โข One chi-squared statistic with df1 + a separate, independent, chi-squared statistic with df2 = a chi-squared distribution with df1+df2 โ For example, suppose we have two 2x3
tables, then the sum of the ๐๐2 or ๐บ๐บ2 values from the two tables is a chi-squared statistic with df=2+2=4
26
-
Partitioning Chi-Squared
โข Chi-squared statistics having df>1 can be broken into components with fewer degrees of freedom. โ For testing independence in 2xJ tables,
df=(J-1) and a chi-squared statistic can partition into J-1 components
27
-
Comments About Chi-Squared Tests
โข limitations โ merely indicate the degree of evidence for
an association โ require large samples โ treat both classifications as nominal
28
Ch2. Contingency Tables_22.3 The Odds RatioProperties of the Odds RatioProperties of the Odds RatioProperties of the Odds Ratio Example: Odds Ratio for Aspirin Use and Heart AttacksInference for Odds Ratios and Log Odds RatiosInference for Odds Ratios and Log Odds RatiosInference for Odds Ratios and Log Odds RatiosInference for Odds Ratios and Log Odds RatiosRelationship Between Odds Ratio and Relative RiskThe Odds Ratio Applies in Case-Control StudiesThe Odds Ratio Applies in Case-Control StudiesTypes of Observational Studies2.4 Chi-Squared Tests of IndependencePearson Statistics and the Chi-squared DistributionPearson Statistics and the Chi-squared DistributionPearson Statistics and the Chi-squared DistributionLikelihood-Ratio StatisticTests of IndependenceTests of IndependenceExample: Gender Gap in Political AffiliationExample: Gender Gap in Political AffiliationResiduals for Cells in a Contingency TableResiduals for Cells in a Contingency TablePartitioning Chi-SquaredPartitioning Chi-SquaredComments About Chi-Squared Tests