f2 chi-square dist’nhosting03.snu.ac.kr/~hokim/int/2017/chap_11.pdf · 2017-05-26 ·...
Chapter 11. The χ² distribution and the analysis of frequencies
2017/5/23
11.1 Mathematical properties of the χ² distribution
• Uses of χ²: tests of goodness-of-fit, tests of independence, tests of homogeneity
• Definition of χ²:
$$Z_1, \dots, Z_n \sim \text{independent } N(0,1) \;\Longrightarrow\; \sum_{i=1}^{n} Z_i^2 \sim \chi^2_n$$
e.g. $Z_i = (Y_i - \mu)/\sigma$ with $Y_i \sim N(\mu, \sigma^2)$
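A quick numerical check of the definition may help here. This is only a sketch, not part of the original slides: for n = 1 it uses the identity P(Z² ≤ z²) = P(|Z| ≤ z), and for n = 5 it simulates sums of squared standard normals. The seed, the choice n = 5, and the number of replications are arbitrary assumptions.

```r
# n = 1: the chi-square(1) cdf at z^2 equals the two-sided normal probability
z <- qnorm(0.975)                  # about 1.96
p_chisq <- pchisq(z^2, df = 1)     # P(chi-square_1 <= z^2)
p_norm  <- pnorm(z) - pnorm(-z)    # P(|Z| <= z)

# n = 5: sums of squared N(0,1) draws should average close to n
set.seed(1)                        # arbitrary seed, for reproducibility only
n <- 5
sums <- replicate(10000, sum(rnorm(n)^2))
mean(sums)                         # close to 5, the mean of chi-square_5
```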
11.2 Goodness-of-fit tests
• Do our data agree with a hypothesized distribution (normal, binomial, Poisson, etc.)?
H0: the data are normally distributed vs. H1: not H0
Cholesterol level (mg/dl)   Number of subjects (freq)
1-5 2
6-10 2
11-15 7
16-20 19
21-25 4
26-30 6
31-35 3
36-40 4
•Ex 11.2.1(Normal dist’n)
$$X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \sim \chi^2_{k-r}$$
$O_i$: observed frequency, $E_i$: expected frequency
$r$: (# restrictions) + (# parameters estimated)
$\bar{x} = 10.53$, $s = 8.62$
The z value for $X = 11$ is $z = (11 - 10.53)/8.62 = 0.05$
The z value for $X = 16$ is $z = (16 - 10.53)/8.62 = 0.63$
Interval     Standardized lower limit (z)   Expected relative freq.   Expected freq.
< 1                                         0.13                       6.32
1~5.9        −1.11                          0.17                       7.76
6~10.9       −0.53                          0.22                      10.44
11~15.9       0.05                          0.22                      10.12
16~20.9       0.63                          0.15                       7.08
21~25.9       1.21                          0.08                       3.57
26~30.9       1.79                          0.03                       1.30
31~35.9       2.37                          0.01                       0.34
36~40.9       2.95                          0.00                       0.06
≥ 41         > 3.53                         0.00                       0.01
(26 and above, pooled expected freq.: 1.71)
Interval                  Observed (O_i)   Expected (E_i)   (O_i − E_i)²/E_i
< 1                        0                6.32              6.11
1-5                        2                7.76              2.76
6-10                       2               10.44              6.50
11-15                      7               10.12              1.14
16-20                     19                7.08             16.08
21-25                      4                3.57              0.01
26-30                      6                1.30
31-35                      3                0.34
36-40                      4                0.06
> 40                       0                0.01
(26 and above, pooled)    13                1.71             74.54
Total                     47               47               107.14
$$X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = 107.14 > 9.49 = \texttt{qchisq(0.05,4,lower=F)}$$
(k = 7 classes after pooling, r = 1 restriction + 2 estimated parameters, so df = 7 − 3 = 4)
• Reject $H_0$ (normally distributed) -> not normally distributed.
• The chi-square approximation is good when the expected frequencies $E_i$ are large enough (> 10). When $E_i < 5$, merge cells and re-categorize so that the expected frequencies are large enough.
• Whether the parameters are known or estimated affects the degrees of freedom.
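The expected frequencies in Ex 11.2.1 can be sketched in R as follows. The sketch assumes the fitted values x̄ = 10.53 and s = 8.62 and the pooled class boundaries from the slide; because it works from unrounded normal probabilities, the recomputed statistic is not expected to match the slide's 107.14 digit for digit, and only the decision is compared.

```r
xbar <- 10.53; s <- 8.62; n <- 47

# Class boundaries after pooling the sparse upper classes (26 and above)
breaks   <- c(-Inf, 1, 6, 11, 16, 21, 26, Inf)
observed <- c(0, 2, 2, 7, 19, 4, 13)

# Expected frequency = n * P(lower <= X < upper) under N(xbar, s^2)
expected <- n * diff(pnorm((breaks - xbar) / s))

stat <- sum((observed - expected)^2 / expected)
df   <- length(observed) - 3      # r = 1 restriction + 2 estimated parameters
crit <- qchisq(0.05, df, lower.tail = FALSE)
stat > crit                       # TRUE means normality is rejected
```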
Ex 11.2.2 Binomial distribution
Each of 100 doctors drew a sample of 25 patients and asked which pain reliever they preferred, the new drug or the old one. Test at significance level 0.005 whether the data can be regarded as coming from a binomial distribution.
H0: the data come from a binomial distribution (goodness-of-fit test).
TABLE 11.2.4
Number of patients out of 25     Number of doctors        Total number of patients preferring
preferring new pain reliever     reporting this number    new pain reliever, by doctor
0                                  5                         0
1                                  6                         6
2                                  8                        16
3                                 10                        30
4                                 10                        40
5                                 15                        75
6                                 17                       102
7                                 10                        70
8                                 10                        80
9                                  9                        81
10 or more                         0                         0
Total                            100                       500
Expected frequency under the binomial distribution = expected relative frequency × total
$$P(X = x) = \binom{25}{x} p^x (1-p)^{25-x}, \quad x = 0, 1, 2, \dots, 25$$
$$\hat{p} = 500/2500 = 0.2$$
$$X^2 = \frac{(11 - 2.74)^2}{2.74} + \cdots + \frac{(0 - 1.73)^2}{1.73} = 47.624 \sim \chi^2_{10-2}$$
Significant -> reject H0 (data ~ binomial dist'n)
TABLE 12.3.5 Calculations for Example 12.3.2
Number of patients out of 25     Number of doctors    Expected relative    Expected
preferring new pain reliever     (observed, O_i)      frequency            frequency (E_i)
0                                  5                  0.0038                 0.38
1                                  6                  0.0236                 2.36
2                                  8                  0.0708                 7.08
3                                 10                  0.1358                13.58
4                                 10                  0.1867                18.67
5                                 15                  0.1960                19.60
6                                 17                  0.1633                16.33
7                                 10                  0.1109                11.09
8                                 10                  0.0623                 6.23
9                                  9                  0.0295                 2.95
10 or more                         0                  0.0173                 1.73
Total                            100                  1.0000               100.00
(0 and 1 pooled: observed 11, expected 2.74)
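A minimal R sketch of Ex 11.2.2, assuming the counts of Table 11.2.4. It estimates p from the data, takes the expected frequencies from `dbinom`, and pools x = 0 and x = 1 as on the slide. Since the slide works with rounded expected frequencies, the statistic here should agree with 47.624 only approximately.

```r
observed <- c(5, 6, 8, 10, 10, 15, 17, 10, 10, 9, 0)  # doctors reporting x = 0..9 and ">= 10"
x_total  <- sum(0:9 * observed[1:10])                  # patients preferring the new drug (500)
p_hat    <- x_total / (100 * 25)                       # estimated p

probs    <- dbinom(0:9, size = 25, prob = p_hat)
probs    <- c(probs, 1 - sum(probs))                   # last class: P(X >= 10)
expected <- 100 * probs

# Pool x = 0 and x = 1 because their expected frequencies are small
O <- c(sum(observed[1:2]), observed[3:11])
E <- c(sum(expected[1:2]), expected[3:11])

stat    <- sum((O - E)^2 / E)
df      <- length(O) - 2        # 1 restriction + 1 estimated parameter
p_value <- pchisq(stat, df, lower.tail = FALSE)
```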
Ex 11.2.3 Poisson distribution
Expected relative frequency under the Poisson distribution:
$$P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0, 1, 2, \dots$$
$\lambda = 3$: known
H0: the mean number of emergency patients per day at the hospital is 3 (Poisson with λ = 3).
Number of emergency admissions in a day    Number of days this number occurred
0                                            5
1                                           14
2                                           15
3                                           23
4                                           16
5                                            9
6                                            3
7                                            3
8                                            1
9                                            1
10 or more                                   0
Total                                       90
$$X^2 = \frac{(5 - 4.50)^2}{4.50} + \cdots + \frac{(2 - 1.08)^2}{1.08} = 3.664 < 15.507 = \chi^2_{9-1}(0.95)$$
Admissions             Days (O_i)   Expected relative freq.   Expected freq. (E_i)   (O_i − E_i)²/E_i
0                       5           0.050                      4.50                   0.056
1                      14           0.149                     13.41                   0.026
2                      15           0.224                     20.16                   1.321
3                      23           0.224                     20.16                   0.4
4                      16           0.168                     15.12                   0.051
5                       9           0.101                      9.09                   0.001
6                       3           0.050                      4.50                   0.5
7                       3           0.022                      1.98                   0.525
8                       1           0.008                      0.72
9                       1           0.003                      0.27
10 or more              0           0.001                      0.09
(8 and above, pooled)   2                                      1.08                   0.784
Total                  90           1.000                     90.00                   3.664
Do not reject H0 => there is no evidence that the data are not from a Poisson distribution with λ = 3.
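The calculation table for Ex 11.2.3 can be reproduced with a small helper. This is a sketch that takes the slide's rounded expected frequencies as given; the function name `gof_stat` is a hypothetical helper introduced only for this illustration.

```r
# Pearson goodness-of-fit statistic: sum of (O - E)^2 / E over the classes
gof_stat <- function(O, E) sum((O - E)^2 / E)

# Classes 0..7 plus the pooled class "8 or more" (values from the slide's table)
O <- c(5, 14, 15, 23, 16, 9, 3, 3, 2)
E <- c(4.50, 13.41, 20.16, 20.16, 15.12, 9.09, 4.50, 1.98, 1.08)

stat <- gof_stat(O, E)
df   <- length(O) - 1          # lambda is known, so only the total is restricted
crit <- qchisq(0.95, df)       # upper 5% critical value of chi-square with 8 df
stat < crit                    # TRUE: do not reject H0
```

Because λ is known rather than estimated, only one degree of freedom is lost, in contrast to the binomial example where estimating p costs a second one.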
Ex 11.2.4 Uniform distribution
Expected frequency under the uniform distribution = 200/5 = 40
$$X^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} = 97.15 > 13.277 = \texttt{qchisq(0.01,4,lower=F)}$$
p-value = pchisq(97.15,4,lower=F) = 3.98 × 10^{-20}
Period     Number of flu patients
2005.12     62
2006.01     84
2006.02     17
2006.03     16
2006.04     21
Total      200
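For the uniform case, `chisq.test` applied to a vector of counts tests equal cell probabilities by default, so the example can be sketched in a few lines. The vector name `flu` is just an illustrative choice.

```r
flu <- c("2005.12" = 62, "2006.01" = 84, "2006.02" = 17,
         "2006.03" = 16, "2006.04" = 21)

res <- chisq.test(flu)   # default null hypothesis: all five periods equally likely
res$expected             # 40 for every period
res$statistic            # 97.15
res$parameter            # df = 4
res$p.value              # far below 0.01
```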
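11.3 Tests of independence
• Contingency table: $N_{ij}$ is the count in row $i$ and column $j$ (e.g. $i = 1, 2, 3$; $j = 1, 2, 3, 4$)
$$N_{i\cdot} = \sum_j N_{ij}, \qquad N_{\cdot j} = \sum_i N_{ij}, \qquad N_{\cdot\cdot} = \sum_i \sum_j N_{ij}$$
$$O_{ij} = N_{ij}, \qquad E_{ij} = N_{\cdot\cdot} \cdot \frac{N_{i\cdot}}{N_{\cdot\cdot}} \cdot \frac{N_{\cdot j}}{N_{\cdot\cdot}} = \frac{N_{i\cdot}\, N_{\cdot j}}{N_{\cdot\cdot}}$$
$$X^2 = \sum_i \sum_j \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2_{(r-1)(c-1)}, \qquad r = \#\text{rows}, \; c = \#\text{columns}$$
$H_0$: the two variables are independent.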
EX 11.3.1
TABLE 12.4.4 Observed and Expected Frequencies for Example 12.4.1
         Relapse
trt      Yes              No                 Total
A        294 (77.26)       921 (1137.74)      1215
B         98 (188.21)     2862 (2771.79)      2960
C         50 (198.00)     3064 (2916.00)      3114
D        203 (181.53)     2652 (2673.47)      2855
Total    645              9499               10144
$$X^2 = \sum_i \sum_j \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{(294 - 77.26)^2}{77.26} + \cdots + \frac{(2652 - 2673.47)^2}{2673.47} \approx 816.4 \sim \chi^2_{(4-1)(2-1)}$$
p-value ≈ 0: not independent at all -> strongly associated
> data <- as.table(cbind(c(294,98,50,203), c(921,2862,3064,2652)))
> dimnames(data) <- list(trt=c("A","B","C","D"), re=c("Y","N"))
> chisq.test(data)

        Pearson's Chi-squared test

data:  data
X-squared = 816.4101, df = 3, p-value < 2.2e-16

data re;
input trt $ re $ count @@;
cards;
A Y 294 A N 921
B Y 98 B N 2862
C Y 50 C N 3064
D Y 203 D N 2652
;
proc freq data=re;
weight count;
tables trt*re / measures chisq;
run;
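To make the link between the formula E_ij = N_i. N_.j / N.. and the test output explicit, the expected counts and the statistic can be computed by hand. A sketch using the treatment-by-relapse counts of EX 11.3.1; the object names are illustrative.

```r
relapse <- matrix(c(294, 98, 50, 203,
                    921, 2862, 3064, 2652),
                  ncol = 2,
                  dimnames = list(trt = c("A", "B", "C", "D"),
                                  relapse = c("Yes", "No")))

# E_ij = (row total i) * (column total j) / grand total
expected <- outer(rowSums(relapse), colSums(relapse)) / sum(relapse)

stat <- sum((relapse - expected)^2 / expected)
df   <- (nrow(relapse) - 1) * (ncol(relapse) - 1)
round(expected, 2)       # compare with the values in parentheses in the table
```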
• Small expected frequencies: not a problem if cells with expected frequency below 5 make up no more than 20% of all cells and the minimum expected frequency is at least 1.
• 2×2 tables: do not use the χ² test if n < 20, or if 20 < n < 49 and one or more cells have expected frequency below 5.
• Yates' adjustment (continuity correction): be sure to read about it!
2×2 table
                          Drinking experience
Smoking                   Yes        No         Total
Has smoked (Y)            131 (a)    52 (b)     183
Has never smoked (N)       14 (c)    36 (d)      50
Total                     145        88         233
$$X^2 = \frac{n(ad - bc)^2}{(a+c)(b+d)(a+b)(c+d)} = \frac{233\,(131 \cdot 36 - 52 \cdot 14)^2}{(145)(88)(183)(50)} = 31.7391$$
-> Reject H0: independence
-> They are associated.
Yates' correction
$$X^2_{\text{corrected}} = \frac{n(|ad - bc| - 0.5n)^2}{(a+c)(b+d)(a+b)(c+d)} = \frac{233\,(|131 \cdot 36 - 52 \cdot 14| - 0.5 \cdot 233)^2}{(145)(88)(183)(50)} = 29.9118$$
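The shortcut formulas for a 2×2 table can be checked against `chisq.test`, which applies Yates' correction to 2×2 tables unless `correct = FALSE` is given. A sketch with the smoking and drinking counts; the object names are illustrative.

```r
tab <- matrix(c(131, 14, 52, 36), nrow = 2,
              dimnames = list(smoking = c("Y", "N"), drinking = c("Yes", "No")))

n     <- sum(tab)
cross <- tab[1, 1] * tab[2, 2] - tab[1, 2] * tab[2, 1]   # ad - bc
denom <- prod(rowSums(tab)) * prod(colSums(tab))         # (a+b)(c+d)(a+c)(b+d)

x2       <- n * cross^2 / denom                          # uncorrected statistic
x2_yates <- n * (abs(cross) - 0.5 * n)^2 / denom         # with Yates' correction

x2_r       <- unname(chisq.test(tab, correct = FALSE)$statistic)
x2_yates_r <- unname(chisq.test(tab)$statistic)          # correction is the default
c(x2, x2_r, x2_yates, x2_yates_r)
```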
11.4 Homogeneity test
• Independence test vs. homogeneity test
• Homogeneity test: samples are drawn independently from each population. Do they have the same distribution?
• Independence test: one sample is drawn from a single population. The row and column totals are not fixed in advance but arise by chance.
• H0: patients whose schizophrenia began before age 18 and patients whose schizophrenia began after age 18 have the same distribution of family history (the two groups have the same distribution).
• Cannot reject H0 -> same population!
Family history    Onset before 18    Onset after 18    Total
A                  28                 35                 63
B                  19                 38                 57
C                  41                 44                 85
D                  53                 60                113
Total             141                177                318
> data <- as.table(cbind(c(28,19,41,53), c(35,38,44,60)))
> dimnames(data) <- list(trt=c("A","B","C","D"), re=c("Early","Later"))
> chisq.test(data)

        Pearson's Chi-squared test

data:  data
X-squared = 3.6216, df = 3, p-value = 0.3053
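• 2×2 table
① χ² test
$$X^2 = \frac{n(ad-bc)^2}{(a+c)(b+d)(a+b)(c+d)} = \frac{220\,(60 \cdot 72 - 40 \cdot 48)^2}{108 \cdot 112 \cdot 100 \cdot 120} = 8.7302 > 3.841 \quad \therefore \text{Reject } H_0$$
∴ The probabilities of having the disease in the two groups are significantly different.
② Comparing two probabilities
$$H_0: p_1 = p_2 \quad \text{vs.} \quad H_a: p_1 \neq p_2$$
$$\hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}, \qquad Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n_1} + \dfrac{\hat{p}(1-\hat{p})}{n_2}}}$$
e.g.
$$\hat{p} = \frac{100 \times 0.60 + 120 \times 0.40}{100 + 120} = 0.4909$$
$$Z = \frac{0.60 - 0.40}{\sqrt{\dfrac{0.4909 \times 0.5091}{100} + \dfrac{0.4909 \times 0.5091}{120}}} = 2.95469 > 1.96 \quad \text{(significant)}$$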
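The two approaches give the same answer because Z² equals the uncorrected χ² statistic for a 2×2 table. A short sketch with the counts implied by the example (60 of 100 and 48 of 120); `prop.test` with `correct = FALSE` is used for comparison.

```r
x <- c(60, 48)       # numbers with the disease in the two groups
n <- c(100, 120)     # group sizes

p_hat  <- x / n                      # 0.60 and 0.40
p_pool <- sum(x) / sum(n)            # pooled estimate under H0
z      <- (p_hat[1] - p_hat[2]) / sqrt(p_pool * (1 - p_pool) * sum(1 / n))

chisq_stat <- unname(prop.test(x, n, correct = FALSE)$statistic)
c(z = z, z_squared = z^2, chisq = chisq_stat)
```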
• debatea.sas
* File : debatea.sas ;
options ls=70 ps=55 nodate nonumber ;
data one;
  input id school gender compare argue research reason speak ;
  if school in (3,5,6,8) ;
  label id='Survey Number'
        school='High School'
        compare='How Debate Compares to Other Classes'
        argue='Argumentation'
        research='Research'
        reason='Reasoning'
        speak='Speaking' ;
cards;
1 6 1 1 1 1 1 1
108 7 1 1 1 1 1 2
56 3 1 1 1 1 1 1
... (rows omitted)
70 6 1 1 1 1 1 1
69 6 2 1 1 1 1 1
;
run;
proc freq data=one;
  tables school*compare / chisq expected ;
  title 'Comparing Schools in the Debate Survey';
run;
proc freq data=one;
  tables school*compare / exact ;
  title 'Comparing Schools in the Debate Survey';
run;
data respire;
input treat $ outcome $ count ;
cards;
test f 40
test u 20
placebo f 16
placebo u 48
;
proc freq;
weight count;
tables treat*outcome/chisq;
run;
The SAS System
The FREQ Procedure
Table of treat by outcome
treat     outcome
Frequency|
Percent  |
Row Pct  |
Col Pct  |f       |u       |  Total
---------+--------+--------+
placebo  |     16 |     48 |     64
         |  12.90 |  38.71 |  51.61
         |  25.00 |  75.00 |
         |  28.57 |  70.59 |
---------+--------+--------+
test     |     40 |     20 |     60
         |  32.26 |  16.13 |  48.39
         |  66.67 |  33.33 |
         |  71.43 |  29.41 |
---------+--------+--------+
Total          56       68      124
            45.16    54.84   100.00
Statistics for Table of treat by outcome
Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1     21.7087    <.0001
Likelihood Ratio Chi-Square    1     22.3768    <.0001
Continuity Adj. Chi-Square     1     20.0589    <.0001
Mantel-Haenszel Chi-Square     1     21.5336    <.0001
Phi Coefficient                      -0.4184
Contingency Coefficient               0.3860
Cramer's V                           -0.4184
Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)        16
Left-sided Pr <= F       2.838E-06
Right-sided Pr >= F         1.0000
Table Probability (P)    2.397E-06
Two-sided Pr <= P        4.754E-06
Sample Size = 124
11.5 Fisher's exact test
data severe;
input treat $ outcome $ count ;
cards;
Test f 10
Test u 2
Control f 2
Control u 4
;
proc freq order=data;
tables treat*outcome / chisq nocol;
weight count;
run;
The SAS System
The FREQ Procedure
Table of treat by outcome
treat     outcome
Frequency|
Percent  |
Row Pct  |f       |u       |  Total
---------+--------+--------+
Test     |     10 |      2 |     12
         |  55.56 |  11.11 |  66.67
         |  83.33 |  16.67 |
---------+--------+--------+
Control  |      2 |      4 |      6
         |  11.11 |  22.22 |  33.33
         |  33.33 |  66.67 |
---------+--------+--------+
Total          12        6       18
            66.67    33.33   100.00
Statistics for Table of treat by outcome
Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1      4.5000    0.0339
Likelihood Ratio Chi-Square    1      4.4629    0.0346
Continuity Adj. Chi-Square     1      2.5313    0.1116
Mantel-Haenszel Chi-Square     1      4.2500    0.0393
Phi Coefficient                       0.5000
Contingency Coefficient               0.4472
Cramer's V                            0.5000
WARNING: 75% of the cells have expected counts less than 5. Chi-Square may not be a valid test.
Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)        10
Left-sided Pr <= F          0.9961
Right-sided Pr >= F         0.0573
Table Probability (P)       0.0533
Two-sided Pr <= P           0.1070
Sample Size = 18
Exact test: table probabilities
Table cell
(1,1)   (1,2)   (2,1)   (2,2)   Probability
12      0       0       6       .0001
11      1       1       5       .0039
10      2       2       4       .0533
9       3       3       3       .2370
8       4       4       2       .4000
7       5       5       1       .2560
6       6       6       0       .0498
• One-tailed p-value: p = 0.0533 + 0.0039 + 0.0001 = 0.0573
• Two-tailed p-value: p = 0.0533 + 0.0039 + 0.0001 + 0.0498 = 0.1071
H0: the two variables are independent (homogeneous) vs. H1: not H0
> fisher.test(matrix(c(7,3,5,6),2,2),alternative='greater')
Fisher's Exact Test for Count Data
data: matrix(c(7, 3, 5, 6), 2, 2)
p-value = 0.2449
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
0.4512625 Inf
sample estimates:
odds ratio
2.661251
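The table probabilities for the Test/Control example (10, 2, 2, 4) come from the hypergeometric distribution with all margins fixed, so they can be sketched with `dhyper`. The two-sided p-value below sums the probabilities of all tables that are no more likely than the observed one, which is the rule `fisher.test` uses; the small tolerance factor is an assumption made here to guard against floating-point ties.

```r
x     <- 6:12                                  # possible values of cell (1,1)
probs <- dhyper(x, m = 12, n = 6, k = 12)      # 12 "f", 6 "u", 12 subjects in the Test row
round(setNames(probs, x), 4)                   # compare with the table probabilities

p_one <- sum(probs[x >= 10])                   # observed table (10) or more extreme
p_two <- sum(probs[probs <= probs[x == 10] * (1 + 1e-7)])

tab <- matrix(c(10, 2, 2, 4), 2, 2)
p_one_r <- fisher.test(tab, alternative = "greater")$p.value
p_two_r <- fisher.test(tab)$p.value
c(p_one, p_one_r, p_two, p_two_r)
```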
11.6 Relative risk, odds ratio, and the Mantel-Haenszel statistic
• Prospective study: select two groups according to exposure status and follow them over time (fix # exposed and # unexposed, then follow).
• Retrospective study: select the sample according to the outcome variable and investigate the past (fix # patients and # controls, then look back).
Relative risk (RR)
$$RR = \frac{a/(a+b)}{c/(c+d)}$$
                Disease
Risk factor     Yes (Y)    No (N)    Total
Exposed         a          b         a + b
Unexposed       c          d         c + d
Total           a + c      b + d     n
data preg;
input smoke $ preg $ count @@;
cards;
smoke early 26 smoke abnormal 380
nonsmoke early 178 nonsmoke abnormal 3386
;
proc freq order=data;
weight count;
tables smoke*preg / measures chisq;
run;
Odds ratio (OR)
$$OR = \frac{a/b}{c/d} = \frac{ad}{bc}$$
(a, b, c, d as in the 2×2 table of disease by risk factor used for the relative risk)
Odds = p/(1−p)
Odds for exposed: $\dfrac{a/(a+b)}{1 - a/(a+b)} = \dfrac{a/(a+b)}{b/(a+b)} = \dfrac{a}{b}$
Odds for unexposed: $\dfrac{c/(c+d)}{1 - c/(c+d)} = \dfrac{c/(c+d)}{d/(c+d)} = \dfrac{c}{d}$
Odds ratio $= \dfrac{a/b}{c/d} = \dfrac{ad}{bc}$
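The two definitions can be placed side by side on the counts from the `preg` data step. This is only a sketch: it assumes that the first column ("early") plays the role of the outcome of interest, which the slides do not state explicitly, and the object names are illustrative.

```r
# Rows: exposure (smoke, nonsmoke); columns: outcome (early, abnormal)
tab <- matrix(c(26, 178, 380, 3386), nrow = 2,
              dimnames = list(smoke = c("smoke", "nonsmoke"),
                              preg  = c("early", "abnormal")))

a  <- tab[1, 1]; b <- tab[1, 2]    # exposed row
cc <- tab[2, 1]; d <- tab[2, 2]    # unexposed row

rr <- (a / (a + b)) / (cc / (cc + d))   # ratio of the two risks
or <- (a * d) / (b * cc)                # ratio of the two odds
c(RR = rr, OR = or)
```

With an outcome this uncommon in both rows, the two measures land close to each other, which is the "rare disease" approximation mentioned in the slides.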
* Odds Ratio and Relative Risk
• Risk is the preferable measure because it is a probability.
• Why odds?
– OR is often a good approximation to the RR. (for a rare disease)
– Sometimes OR is either all we can estimate (case-control studies) or
– It is the most convenient to calculate (logistic regression analysis)
* Sun protection during childhood and cutaneous melanoma
Sun protection? Cases Controls total
Yes 99 132 231
No 303 290 593
total 402 422 824
RR is meaningful when the samples are randomly selected. In case-control study, we have a sample stratified by case-control status.
Proportion of melanoma=402/824=0.49
Woodward (Epidemiology: study design and data analysis)
f1, f2: sampling fractions of cases and controls
               (a) Population values                (b) Expected values in the sample
               Diseased   Not diseased   Total      Cases      Controls    Total
Exposed        A          B              A+B        f1A        f2B         f1A+f2B
Not exposed    C          D              C+D        f1C        f2D         f1C+f2D
Total          A+C        B+D            N          f1(A+C)    f2(B+D)     n
$$RR = \frac{A/(A+B)}{C/(C+D)}$$
Risk for the exposed: $f_1A/(f_1A + f_2B) \neq A/(A+B)$
Risk for the unexposed: $f_1C/(f_1C + f_2D) \neq C/(C+D)$
$$RR_{(b)} = \frac{f_1A\,(f_1C + f_2D)}{f_1C\,(f_1A + f_2B)} \neq \frac{A\,(C+D)}{C\,(A+B)} = RR_{(a)}$$
$$OR = \frac{(f_1A)(f_2D)}{(f_2B)(f_1C)} = \frac{AD}{BC}$$
But
Odds for the exposed: $f_1A/f_2B \neq A/B$
Odds for the unexposed: $f_1C/f_2D \neq C/D$
We can use a case-control study to estimate the OR, but not the risk, the RR, or the odds.
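A small numerical illustration of this argument, under stated assumptions: the population counts A, B, C, D and the sampling fractions f1, f2 below are hypothetical values chosen only for the demonstration and do not come from the slides or from Woodward.

```r
# Hypothetical population counts (illustrative only)
A <- 100; B <- 9900     # exposed: diseased, not diseased
C <- 50;  D <- 9950     # not exposed: diseased, not diseased
f1 <- 1.0               # sampling fraction of cases (hypothetical)
f2 <- 0.01              # sampling fraction of controls (hypothetical)

rr_pop <- (A / (A + B)) / (C / (C + D))
or_pop <- (A * D) / (B * C)

# Expected case-control sample: cases scaled by f1, controls scaled by f2
a <- f1 * A; b <- f2 * B; cc <- f1 * C; d <- f2 * D
rr_cc <- (a / (a + b)) / (cc / (cc + d))   # distorted by the sampling fractions
or_cc <- (a * d) / (b * cc)                # f1 and f2 cancel

c(rr_pop = rr_pop, rr_cc = rr_cc, or_pop = or_pop, or_cc = or_cc)
```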
• Measures of association (Boston University):
http://sphweb.bumc.bu.edu/otlt/MPH-Modules/EP/EP713_Association/EP713_Association8.html
[Mantel-Haenszel statistic] Association after adjusting for the effect of a confounding factor
• Compute the expected values and variances in each of several tables, then combine them to obtain a combined effect.
• Before combining, a homogeneity test is recommended to judge whether combining is reasonable.
Sets of 2×2 tables
data ca;
input gender $ ECG $ disease $ count;
datalines;
female <0.1 yes 4
female <0.1 no 11
female >=0.1 yes 8
female >=0.1 no 10
male <0.1 yes 9
male <0.1 no 9
male >=0.1 yes 21
male >=0.1 no 6
;
options ls=75 nonumber nodate;
proc freq;
weight count ;
tables gender*disease /nocol nopct chisq ;
tables gender*ECG*disease /nocol nopct cmh chisq measures;
run;
The FREQ Procedure
Table of gender by disease
gender disease
Frequency|
Row Pct |no |yes | Total
---------+--------+--------+
female | 21 | 12 | 33
| 63.64 | 36.36 |
---------+--------+--------+
male | 15 | 30 | 45
| 33.33 | 66.67 |
---------+--------+--------+
Total 36 42 78
Statistics for Table of gender by disease
Statistic DF Value Prob
------------------------------------------------------
Chi-Square 1 7.0346 0.0080
Likelihood Ratio Chi-Square 1 7.1209 0.0076
Continuity Adj. Chi-Square 1 5.8681 0.0154
Mantel-Haenszel Chi-Square 1 6.9444 0.0084
Phi Coefficient 0.3003
Contingency Coefficient 0.2876
Cramer's V 0.3003
Table 1 of ECG by disease
Controlling for gender=female
ECG disease
Frequency|
Row Pct |no |yes | Total
---------+--------+--------+
<0.1 | 11 | 4 | 15
| 73.33 | 26.67 |
---------+--------+--------+
>=0.1 | 10 | 8 | 18
| 55.56 | 44.44 |
---------+--------+--------+
Total 21 12 33
Statistics for Table 1 of ECG by disease
Controlling for gender=female
Statistic DF Value Prob
------------------------------------------------------
Chi-Square 1 1.1175 0.2905
Likelihood Ratio Chi-Square 1 1.1337 0.2870
Continuity Adj. Chi-Square 1 0.4813 0.4879
Mantel-Haenszel Chi-Square 1 1.0836 0.2979
Phi Coefficient 0.1840
Contingency Coefficient 0.1810
Cramer's V 0.1840
Estimates of the Relative Risk (Row1/Row2)
Type of Study                   Value       95% Confidence Limits
-----------------------------------------------------------------
Case-Control (Odds Ratio)      2.2000       0.5036       9.6107
Cohort (Col1 Risk)             1.3200       0.7897       2.2063
Cohort (Col2 Risk)             0.6000       0.2240       1.6073
Sample Size = 33
Table 2 of ECG by disease
Controlling for gender=male
ECG disease
Frequency|
Row Pct |no |yes | Total
---------+--------+--------+
<0.1 | 9 | 9 | 18
| 50.00 | 50.00 |
---------+--------+--------+
>=0.1 | 6 | 21 | 27
| 22.22 | 77.78 |
---------+--------+--------+
Total 15 30 45
Statistics for Table 2 of ECG by disease
Controlling for gender=male
Statistic DF Value Prob
------------------------------------------------------
Chi-Square 1 3.7500 0.0528
Likelihood Ratio Chi-Square 1 3.7288 0.0535
Continuity Adj. Chi-Square 1 2.6042 0.1066
Mantel-Haenszel Chi-Square 1 3.6667 0.0555
Phi Coefficient 0.2887
Contingency Coefficient 0.2774
Cramer's V 0.2887
Estimates of the Relative Risk (Row1/Row2)
Type of Study                   Value       95% Confidence Limits
-----------------------------------------------------------------
Case-Control (Odds Ratio)      3.5000       0.9587      12.7775
Cohort (Col1 Risk)             2.2500       0.9680       5.2298
Cohort (Col2 Risk)             0.6429       0.3883       1.0642
Sample Size = 45
Summary Statistics for ECG by disease
Controlling for gender
Cochran-Mantel-Haenszel Statistics (Based on Table Scores)
Statistic Alternative Hypothesis DF Value Prob
---------------------------------------------------------------
1 Nonzero Correlation 1 4.5026 0.0338
2 Row Mean Scores Differ 1 4.5026 0.0338
3 General Association 1 4.5026 0.0338
Estimates of the Common Relative Risk (Row1/Row2)
Type of Study Method Value 95% Confidence Limits
-------------------------------------------------------------------------
Case-Control Mantel-Haenszel 2.8467 1.0765 7.5279
(Odds Ratio) Logit 2.8593 1.0807 7.5650
Cohort Mantel-Haenszel 1.6414 1.0410 2.5879
(Col1 Risk) Logit 1.5249 0.9833 2.3647
Cohort Mantel-Haenszel 0.6299 0.3980 0.9969
(Col2 Risk) Logit 0.6337 0.4046 0.9926
Breslow-Day Test for
Homogeneity of the Odds Ratios
------------------------------
Chi-Square        0.2155
DF                     1
Pr > ChiSq        0.6425
Total Sample Size = 78
The ORs are homogeneous (Breslow-Day p = 0.6425), so the CMH estimate of the common effect is meaningful. Otherwise, it is not.
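The Cochran-Mantel-Haenszel statistic and the Mantel-Haenszel common odds ratio for this first ECG example can be sketched in R. The hand computation follows the "expected values and variances per table, then combine" recipe; `mantelhaen.test` is called with `correct = FALSE` on the assumption that the SAS value is reported without a continuity correction.

```r
# Rows = ECG (<0.1, >=0.1); columns = disease (no, yes); strata = gender
tab <- array(c(11, 10, 4, 8,      # female
               9, 6, 9, 21),      # male
             dim = c(2, 2, 2),
             dimnames = list(ECG = c("<0.1", ">=0.1"),
                             disease = c("no", "yes"),
                             gender = c("female", "male")))

n  <- apply(tab, 3, sum)                          # stratum sizes
a  <- tab[1, 1, ]                                 # cell (1,1) in each stratum
r1 <- tab[1, 1, ] + tab[1, 2, ]; r2 <- n - r1     # row totals
c1 <- tab[1, 1, ] + tab[2, 1, ]; c2 <- n - c1     # column totals

# Combine observed-minus-expected and hypergeometric variances over strata
cmh   <- sum(a - r1 * c1 / n)^2 / sum(r1 * r2 * c1 * c2 / (n^2 * (n - 1)))
or_mh <- sum(tab[1, 1, ] * tab[2, 2, ] / n) / sum(tab[1, 2, ] * tab[2, 1, ] / n)

res <- mantelhaen.test(tab, correct = FALSE)
c(cmh, unname(res$statistic), or_mh, unname(res$estimate))
```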
data ca;
input gender $ ECG $ disease $ count;
datalines;
female <0.1 yes 8
female <0.1 no 11
female >=0.1 yes 8
female >=0.1 no 20
male <0.1 yes 9
male <0.1 no 9
male >=0.1 yes 30
male >=0.1 no 6
;
options ls=75 nonumber nodate;
proc freq;
weight count ;
tables gender*ECG*disease /nocol nopct cmh measures;
run;
Table 1 of ECG by disease
Controlling for gender=female
ECG disease
Frequency|
Row Pct |no |yes | Total
---------+--------+--------+
<0.1 | 11 | 8 | 19
| 57.89 | 42.11 |
---------+--------+--------+
>=0.1 | 20 | 8 | 28
| 71.43 | 28.57 |
---------+--------+--------+
Total 31 16 47
Estimates of the Relative Risk (Row1/Row2)
Type of Study Value 95% Confidence Limits
-----------------------------------------------------------------
Case-Control (Odds Ratio) 0.5500 0.1615 1.8731
Cohort (Col1 Risk) 0.8105 0.5171 1.2703
Cohort (Col2 Risk) 1.4737 0.6701 3.2407
Sample Size = 47
Table 2 of ECG by disease
Controlling for gender=male
ECG disease
Frequency|
Row Pct |no |yes | Total
---------+--------+--------+
<0.1 | 9 | 9 | 18
| 50.00 | 50.00 |
---------+--------+--------+
>=0.1 | 6 | 30 | 36
| 16.67 | 83.33 |
---------+--------+--------+
Total 15 39 54
Estimates of the Relative Risk (Row1/Row2)
Type of Study Value 95% Confidence Limits
-----------------------------------------------------------------
Case-Control (Odds Ratio) 5.0000 1.3992 17.8677
Cohort (Col1 Risk) 3.0000 1.2641 7.1198
Cohort (Col2 Risk) 0.6000 0.3696 0.9740
Sample Size = 54
Summary Statistics for ECG by disease
Controlling for gender
Cochran-Mantel-Haenszel Statistics (Based on Table Scores)
Statistic Alternative Hypothesis DF Value Prob
---------------------------------------------------------------
1 Nonzero Correlation 1 1.2063 0.2721
2 Row Mean Scores Differ 1 1.2063 0.2721
3 General Association 1 1.2063 0.2721
Estimates of the Common Relative Risk (Row1/Row2)
Type of Study Method Value 95% Confidence Limits
-------------------------------------------------------------------------
Case-Control Mantel-Haenszel 1.5604 0.6767 3.5982
(Odds Ratio) Logit 1.5893 0.6572 3.8433
Cohort Mantel-Haenszel 1.2447 0.8393 1.8460
(Col1 Risk) Logit 1.0708 0.7187 1.5954
Cohort Mantel-Haenszel 0.8135 0.5412 1.2227
(Col2 Risk) Logit 0.7677 0.5081 1.1600
Breslow-Day Test for
Homogeneity of the Odds Ratios
------------------------------
Chi-Square 6.2128
DF 1
Pr > ChiSq 0.0127
* McNemar Test : Matched pairs
data one;
input hus_resp $ wif_resp $ no ;
datalines;
yes yes 20
yes no 5
no yes 10
no no 10
;run;
proc freq ;
tables hus_resp*wif_resp / agree ;
weight no ;
run;
We do not reject "H0: the approval rates of husband and wife are the same".
Kappa = 1: perfect agreement; Kappa > 0.8: excellent agreement; Kappa > 0.4: moderate agreement.
The confidence interval for kappa does not include 0, so we reject the null hypothesis K = 0 at the 5% significance level.
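The same matched-pairs table can be analysed in R. A sketch under two assumptions: the McNemar statistic is computed without continuity correction (`correct = FALSE`), and kappa is the simple unweighted kappa computed by hand; the name `kappa_hat` is introduced here for illustration.

```r
tab <- matrix(c(20, 10, 5, 10), nrow = 2,
              dimnames = list(husband = c("yes", "no"), wife = c("yes", "no")))

# McNemar's test uses only the discordant pairs: (b - c)^2 / (b + c)
mc <- mcnemar.test(tab, correct = FALSE)
mc$statistic          # (5 - 10)^2 / (5 + 10)
mc$p.value

# Unweighted kappa: observed agreement compared with chance agreement
n  <- sum(tab)
po <- sum(diag(tab)) / n
pe <- sum(rowSums(tab) * colSums(tab)) / n^2
kappa_hat <- (po - pe) / (1 - pe)
kappa_hat
```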
# Chisq test by R filename: chisq.r
data <- matrix(c(25, 5, 15, 15), ncol=2, byrow=T)
data
data2 <- matrix(c(16, 11, 3, 21, 8, 1), ncol=2, byrow=T)
data2
chisq.test(data)
chisq.test(data2)
fisher.test(data2)
data <- matrix(c(6, 2, 8, 4), ncol=2, byrow=T)
data
mcnemar.test(data)
## From Agresti(2007) p.39
M <- as.table(rbind(c(762, 327, 468), c(484,239,477)))
dimnames(M) <- list(gender=c("M","F"),
party=c("Democrat","Independent", "Republican"))
M
colSums(M)
rowSums(M)
cbind(M,rowSums(M))
rbind(M,colSums(M))
prop.table(M, margin=2)*100
prop.table(M, margin=1)*100
(Xsq <- chisq.test(M)) # Prints test summary
Xsq$observed # observed counts (same as M)
Xsq$expected # expected counts under the null
Xsq$residuals # Pearson residuals
sum((Xsq$residuals)**2)
1-pchisq(sum((Xsq$residuals)**2),
(ncol(M)-1)*(nrow(M)-1))