χ² (Chi-square) distribution · hosting03.snu.ac.kr/~hokim/int/2017/chap_11.pdf · 2017-05-26


Chapter 11 χ² Distribution and the Analysis of Frequencies

2017/5/23

11.1 Mathematical properties of the χ² distribution

• Uses of χ²: tests of goodness-of-fit, tests of independence, tests of homogeneity

Definition of χ²:

$$Z_1, Z_2, \dots, Z_n \overset{\text{indep.}}{\sim} N(0,1) \;\Longrightarrow\; \sum_{i=1}^{n} Z_i^2 \sim \chi^2_n$$

e.g. if $Y_i \sim N(\mu, \sigma^2)$ independently, then $Z_i = (Y_i - \mu)/\sigma \sim N(0,1)$ and $\sum_{i=1}^{n} Z_i^2 \sim \chi^2_n$.
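The definition can be checked numerically. The minimal R sketch below simulates many sums of n squared standard normal draws and compares them with the χ²(n) distribution. The choice n = 5, the seed, and the number of replications are arbitrary assumptions for illustration. The simulated mean and variance should be close to n and 2n, and an upper-tail frequency should be close to `pchisq()`.

```r
# Sketch: sum of n squared independent N(0,1) draws, compared with chi-square(n)
set.seed(1)                  # arbitrary seed, only for reproducibility
n <- 5                       # degrees of freedom (illustrative choice)
reps <- 100000               # number of simulated sums

z <- matrix(rnorm(n * reps), nrow = reps, ncol = n)
y <- rowSums(z^2)            # each row gives one draw of Z_1^2 + ... + Z_n^2

# Theory for chi-square(n): mean n, variance 2n
c(mean = mean(y), var = var(y))

# Empirical upper-tail probability vs pchisq(); 11.07 is roughly the 95th percentile for df = 5
c(empirical = mean(y > 11.07),
  theoretical = pchisq(11.07, df = n, lower.tail = FALSE))
```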

11.2 Goodness-of-fit tests

• Do our data agree with a hypothesized theoretical distribution (normal, binomial, Poisson, etc.)?

H0: the data are normally distributed vs. H1: not H0

Cholesterol level (mg/dl) | Number of subjects (freq)
1-5 | 2
6-10 | 2
11-15 | 7
16-20 | 19
21-25 | 4
26-30 | 6
31-35 | 3
36-40 | 4
Total | 47

• Ex 11.2.1 (Normal distribution)

$$X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \sim \chi^2_{k-r}$$

where $O_i$ is the observed frequency, $E_i$ is the expected frequency, k is the number of classes, and r is the number of restrictions plus the number of parameters estimated.

Sample statistics: $\bar{x} = 10.53$, $s = 8.62$

z value corresponding to X = 11: $z = \dfrac{11 - 10.53}{8.62} = 0.05$

z value corresponding to X = 16: $z = \dfrac{16 - 10.53}{8.62} = 0.63$

Interval | Standardized interval (z at lower bound) | Expected relative frequency | Expected frequency
< 1 | < −1.11 | 0.13 | 6.32
1-5.9 | −1.11 | 0.17 | 7.76
6-10.9 | −0.53 | 0.22 | 10.44
11-15.9 | 0.05 | 0.22 | 10.12
16-20.9 | 0.63 | 0.15 | 7.08
21-25.9 | 1.21 | 0.08 | 3.57
26-30.9 | 1.79 | 0.03 | 1.30
31-35.9 | 2.37 | 0.01 | 0.34
36-40.9 | 2.95 | 0.00 | 0.06
≥ 41 | > 3.53 | 0.00 | 0.01

The expected frequencies of the last four intervals (26 and above) sum to only 1.30 + 0.34 + 0.06 + 0.01 = 1.71, so these intervals are combined into one cell.

Interval | Observed (O_i) | Expected (E_i) | (O_i − E_i)²/E_i
< 1 | 0 | 6.32 | 6.32
1-5 | 2 | 7.76 | 4.28
6-10 | 2 | 10.44 | 6.82
11-15 | 7 | 10.12 | 0.96
16-20 | 19 | 7.08 | 20.07
21-25 | 4 | 3.57 | 0.05
26-30 | 6 | 1.30 |
31-35 | 3 | 0.34 |
36-40 | 4 | 0.06 |
> 40 | 0 | 0.01 |
26 and above (combined) | 13 | 1.71 | 74.54
Total | 47 | 47 | 113.04

After combining, there are k = 7 cells and r = 1 + 2 = 3 (the fixed total plus the estimated μ and σ), so df = 7 − 3 = 4:

$$X^2 = \sum_{i=1}^{k}\frac{(O_i - E_i)^2}{E_i} = 113.04 > 9.49$$

where 9.49 = qchisq(0.05,4,lower=F).
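Below is a minimal R sketch of Ex 11.2.1. It assumes the class boundaries 1, 6, 11, ..., 41 and the sample mean and SD given on the slide. It derives each $E_i$ from `pnorm()`, merges the sparse upper classes, and compares X² with the 5% critical value. The sketch uses unrounded expected frequencies, so the statistic can differ slightly in the decimals from a hand calculation based on rounded $E_i$.

```r
# Ex 11.2.1 sketch: chi-square goodness-of-fit test for normality
obs <- c(0, 2, 2, 7, 19, 4, 6, 3, 4, 0)  # classes <1, 1-5, 6-10, ..., 36-40, >=41
n <- sum(obs)                            # 47
xbar <- 10.53                            # estimated mean (from the slide)
s <- 8.62                                # estimated SD (from the slide)

# Standardize the class boundaries 1, 6, ..., 41 and take normal probabilities
cuts <- seq(1, 41, by = 5)
probs <- diff(c(0, pnorm((cuts - xbar) / s), 1))
expected <- n * probs
round(expected, 2)

# Merge the last four classes (26 and above), whose expected counts are tiny
obs_c <- c(obs[1:6], sum(obs[7:10]))
exp_c <- c(expected[1:6], sum(expected[7:10]))

x2 <- sum((obs_c - exp_c)^2 / exp_c)
dof <- length(obs_c) - 3                 # r = 1 restriction + 2 estimated parameters
crit <- qchisq(0.05, dof, lower.tail = FALSE)
c(statistic = x2, df = dof, critical = crit)
```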

• Reject H0 (normally distributed): the data are not normally distributed.

• The χ² approximation is good only when the expected frequencies $E_i$ are large enough (> 10). When $E_i$ < 5, combine adjacent cells (re-categorize the levels) so that each cell has a sufficiently large expected frequency.

• If the parameters are known rather than estimated from the data, they are not counted in r, so the degrees of freedom change.

EX 11.2.2 Binomial distribution. Each of 100 doctors drew a sample of 25 patients, and the patients were asked which pain reliever they preferred: the new one or the old one. Test at significance level 0.005 whether the data can be said to follow a binomial distribution.

H0: the data follow a binomial distribution (goodness-of-fit test).

TABLE 11.2.4
Number of patients out of 25 preferring new pain reliever | Number of doctors reporting this number | Total number of patients preferring new pain reliever, by doctor
0 | 5 | 0
1 | 6 | 6
2 | 8 | 16
3 | 10 | 30
4 | 10 | 40
5 | 15 | 75
6 | 17 | 102
7 | 10 | 70
8 | 10 | 80
9 | 9 | 81
10 or more | 0 | 0
Total | 100 | 500

Under the binomial assumption, expected frequency = expected relative frequency (probability) × total.

$$P(X = x) = \binom{25}{x} p^{x} (1-p)^{25-x}, \quad x = 0, 1, 2, \dots, 25$$

$$\hat{p} = \frac{500}{2500} = 0.2$$

(500 of the 100 × 25 = 2500 sampled patients preferred the new pain reliever.)

$$X^2 = \frac{(11 - 2.74)^2}{2.74} + \cdots + \frac{(0 - 1.73)^2}{1.73} = 47.624$$

Cells x = 0 and x = 1 are combined (O = 5 + 6 = 11, E = 0.38 + 2.36 = 2.74), which leaves k = 10 cells. With r = 2 (the fixed total plus the estimated p), df = 10 − 2 = 8, and the critical value is $\chi^2_{0.995,\,8} = 21.955$.

The result is significant, so we reject H0: the data do not follow a binomial distribution.

TABLE 12.3.5 Calculations for Example 12.3.2
Number of patients out of 25 preferring new pain reliever | Number of doctors (O_i) | Expected relative frequency | Expected frequency (E_i)
0 | 5 | 0.0038 | 0.38
1 | 6 | 0.0236 | 2.36
2 | 8 | 0.0708 | 7.08
3 | 10 | 0.1358 | 13.58
4 | 10 | 0.1867 | 18.67
5 | 15 | 0.1960 | 19.60
6 | 17 | 0.1633 | 16.33
7 | 10 | 0.1109 | 11.09
8 | 10 | 0.0623 | 6.23
9 | 9 | 0.0295 | 2.95
10 or more | 0 | 0.0173 | 1.73
Total | 100 | 1.0000 | 100.00
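The binomial goodness-of-fit calculation can be sketched in R using `dbinom()` for the cell probabilities and `pbinom()` for the "10 or more" tail. It combines x = 0 and 1 as the table does. The statistic comes out at about 47.7 rather than exactly 47.624, because the table rounds the expected relative frequencies to four decimals. This is an illustration under those assumptions, not the author's code.

```r
# Ex 11.2.2 sketch: chi-square goodness-of-fit test for a binomial(25, p) model
x <- 0:9
doctors <- c(5, 6, 8, 10, 10, 15, 17, 10, 10, 9, 0)  # last entry: "10 or more"
n_doc <- sum(doctors)                                # 100 doctors

p_hat <- sum(x * doctors[1:10]) / (25 * n_doc)       # 500 / 2500 = 0.2
probs <- c(dbinom(x, size = 25, prob = p_hat),
           pbinom(9, size = 25, prob = p_hat, lower.tail = FALSE))  # P(X >= 10)
expected <- n_doc * probs

# Combine x = 0 and x = 1 into one cell, as in the table
obs_c <- c(sum(doctors[1:2]), doctors[3:11])
exp_c <- c(sum(expected[1:2]), expected[3:11])

x2 <- sum((obs_c - exp_c)^2 / exp_c)
dof <- length(obs_c) - 2                             # 1 restriction + 1 estimated parameter
crit <- qchisq(0.005, dof, lower.tail = FALSE)       # alpha = 0.005
c(statistic = x2, df = dof, critical = crit)
```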

Ex 11.2.3 Poisson distribution

Expected relative frequency under the Poisson assumption:

$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, \dots$$

λ = 3 (known)

H0: the number of emergency admissions per day at the hospital follows a Poisson distribution with mean λ = 3.

Number of emergency admissions in a day | Number of days this number of admissions occurred
0 | 5
1 | 14
2 | 15
3 | 23
4 | 16
5 | 9
6 | 3
7 | 3
8 | 1
9 | 1
10 or more | 0
Total | 90

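$$X^2 = \frac{(5-4.50)^2}{4.50} + \cdots + \frac{(2-1.08)^2}{1.08} = 3.664 < 15.507 = \chi^2_{0.95,\,8}$$

The classes 8, 9, and 10 or more are combined (O = 1 + 1 + 0 = 2, E = 0.72 + 0.27 + 0.09 = 1.08), which leaves k = 9 cells. Because λ is known, r = 1 and df = 9 − 1 = 8.

Emergency admissions | Number of days (O_i) | Expected relative frequency | Expected frequency (E_i) | (O_i − E_i)²/E_i
0 | 5 | 0.050 | 4.50 | 0.056
1 | 14 | 0.149 | 13.41 | 0.026
2 | 15 | 0.224 | 20.16 | 1.321
3 | 23 | 0.224 | 20.16 | 0.400
4 | 16 | 0.168 | 15.12 | 0.051
5 | 9 | 0.101 | 9.09 | 0.001
6 | 3 | 0.050 | 4.50 | 0.500
7 | 3 | 0.022 | 1.98 | 0.525
8 | 1 | 0.008 | 0.72 |
9 | 1 | 0.003 | 0.27 |
10 or more | 0 | 0.001 | 0.09 |
8 or more (combined) | 2 | | 1.08 | 0.784
Total | 90 | 1.000 | 90.00 | 3.664

Do not reject H0: there is no evidence that the data fail to follow a Poisson distribution with λ = 3.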
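The Poisson example can be sketched the same way. It uses `dpois()` and `ppois()` with the hypothesized λ = 3. No parameter is estimated, so only one degree of freedom is lost. The sketch uses unrounded probabilities, so it gives about 3.75 instead of the table's 3.664, which is based on expected relative frequencies rounded to three decimals. The conclusion is the same.

```r
# Ex 11.2.3 sketch: goodness-of-fit test for Poisson(lambda = 3), lambda known
days <- c(5, 14, 15, 23, 16, 9, 3, 3, 1, 1, 0)   # admissions 0..9 and "10 or more"
n <- sum(days)                                   # 90 days
lambda <- 3                                      # given by H0, not estimated

probs <- c(dpois(0:9, lambda), ppois(9, lambda, lower.tail = FALSE))
expected <- n * probs

# Combine 8, 9 and "10 or more" into one cell, as in the table
obs_c <- c(days[1:8], sum(days[9:11]))
exp_c <- c(expected[1:8], sum(expected[9:11]))

x2 <- sum((obs_c - exp_c)^2 / exp_c)
dof <- length(obs_c) - 1                         # r = 1: only the total is fixed
crit <- qchisq(0.05, dof, lower.tail = FALSE)
c(statistic = x2, df = dof, critical = crit)
```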

Ex 11.2.4 Uniform distribution

Expected frequency under the uniform distribution = 200/5 = 40

$X^2 = \sum_{i=1}^{5} \dfrac{(O_i - E_i)^2}{E_i} = 97.15 > 13.277$ = qchisq(0.01,4,lower=F), with df = 5 − 1 = 4.

p-value = pchisq(97.15,4,lower=F) = 3.98 × 10⁻²⁰

Period | Number of influenza patients
2005.12 | 62
2006.01 | 84
2006.02 | 17
2006.03 | 16
2006.04 | 21
Total | 200
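In R, the uniform case is a plain call to `chisq.test()` on a vector of counts. When `p` is not supplied, the function assumes equal cell probabilities. The sketch below is meant to show that the hand calculation above matches: E = 40 in every month, X² = 97.15, and df = 4.

```r
flu <- c("2005.12" = 62, "2006.01" = 84, "2006.02" = 17,
         "2006.03" = 16, "2006.04" = 21)
res <- chisq.test(flu)   # default p: equal probabilities 1/5 for each month
res$expected             # 40 in every month
res$statistic            # X-squared = 97.15
res$parameter            # df = 5 - 1 = 4
res$p.value              # about 3.98e-20
```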

11.3 Tests of independence

• Contingency table

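For an r × c contingency table, let $O_{ij} = N_{ij}$ be the observed counts, $N_{i.}$ the row totals, $N_{.j}$ the column totals, and $N_{..}$ the grand total (e.g., i = 1, 2, 3 and j = 1, 2, 3, 4 for a 3 × 4 table).

Under H0, the probability of cell (i, j) is estimated by $\hat{p}_{ij} = \dfrac{N_{i.}}{N_{..}} \cdot \dfrac{N_{.j}}{N_{..}}$, so

$$E_{ij} = N_{..}\,\hat{p}_{ij} = \frac{N_{i.}\,N_{.j}}{N_{..}}$$

$$X^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2_{(\#\text{row}-1)(\#\text{col}-1)}$$

H0: the two variables are independent.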

EX 11.3.1

TABLE 12.4.4 Observed and Expected Frequencies for Example 12.4.1 (expected frequencies in parentheses)
Treatment | Relapse: Yes | Relapse: No | Total
A | 294 (77.26) | 921 (1137.74) | 1215
B | 98 (188.21) | 2862 (2771.79) | 2960
C | 50 (198.00) | 3064 (2916.00) | 3114
D | 203 (181.53) | 2652 (2673.47) | 2855
Total | 645 | 9499 | 10144

$$X^2 = \sum_{i}\sum_{j}\frac{(O_{ij}-E_{ij})^2}{E_{ij}} = \frac{(294-77.26)^2}{77.26} + \cdots + \frac{(2652-2673.47)^2}{2673.47} \approx 816.4 \sim \chi^2_{(4-1)(2-1)} = \chi^2_{3}$$

p-value ≈ 0: reject H0. There is very strong evidence that treatment and relapse are associated.

> data <- as.table(cbind(c(294, 98, 50, 203), c(921, 2862, 3064, 2652)))
> dimnames(data) <- list(trt = c("A", "B", "C", "D"), re = c("Y", "N"))
> chisq.test(data)

        Pearson's Chi-squared test

data:  data
X-squared = 816.4101, df = 3, p-value < 2.2e-16
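The `chisq.test()` result can be reproduced directly from the formulas $E_{ij} = N_{i.}N_{.j}/N_{..}$ and $X^2 = \sum (O_{ij}-E_{ij})^2/E_{ij}$. The short R sketch below does this, using `outer()` to build the table of expected counts from the margins. The variable names are illustrative choices.

```r
obs <- matrix(c(294, 98, 50, 203, 921, 2862, 3064, 2652), ncol = 2,
              dimnames = list(trt = c("A", "B", "C", "D"),
                              relapse = c("Yes", "No")))

# E_ij = (row total i) x (column total j) / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
round(expected, 2)

x2 <- sum((obs - expected)^2 / expected)
dof <- (nrow(obs) - 1) * (ncol(obs) - 1)      # (4 - 1)(2 - 1) = 3
p_value <- pchisq(x2, dof, lower.tail = FALSE)
c(statistic = x2, df = dof, p = p_value)
```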

data re;
  input trt $ re $ count @@;
  cards;
A Y 294 A N 921 B Y 98 B N 2862
C Y 50 C N 3064 D Y 203 D N 2652
;
proc freq data=re;
  weight count;
  tables trt*re / measures chisq;
run;

• Small expected frequencies: the χ² test is acceptable if no more than 20% of the cells have an expected frequency below 5 and the minimum expected frequency is at least 1.

• 2×2 table: do not use the χ² test if n < 20, or if 20 < n < 40 and one or more cells have an expected frequency below 5.

• Yates' continuity correction: be sure to read about it!
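The first rule of thumb is easy to check mechanically. The helper `check_expected()` below is hypothetical (not a base R function). It computes the expected counts of a two-way table and reports the percentage of cells below 5 and the smallest expected count. It is applied to the Test/Control table used for Fisher's exact test in 11.5, for which SAS warns that 75% of the cells have expected counts below 5.

```r
# Hypothetical helper: check the "<= 20% of cells below 5, minimum >= 1" rule
check_expected <- function(tab) {
  e <- outer(rowSums(tab), colSums(tab)) / sum(tab)   # expected counts under H0
  list(
    pct_below_5 = 100 * mean(e < 5),                  # percent of cells with E < 5
    min_expected = min(e),
    ok = mean(e < 5) <= 0.20 && min(e) >= 1
  )
}

# Test/Control by f/u table from 11.5: rows (10, 2) and (2, 4)
severe <- matrix(c(10, 2, 2, 4), nrow = 2)
res <- check_expected(severe)
res   # 75% of cells below 5, minimum expected count 2, rule not satisfied
```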

2×2 table

$$X^2 = \frac{n(ad - bc)^2}{(a+c)(b+d)(a+b)(c+d)}$$

$$X^2 = \frac{233\,(131 \cdot 36 - 52 \cdot 14)^2}{(145)(88)(183)(50)} = 31.7391$$

→ Reject H0 (independence): smoking and drinking are associated.

Yates' correction

$$X^2_{\text{corrected}} = \frac{n\left(|ad - bc| - 0.5n\right)^2}{(a+c)(b+d)(a+b)(c+d)}$$

$$X^2_{\text{corrected}} = \frac{233\,(|131 \cdot 36 - 52 \cdot 14| - 0.5 \cdot 233)^2}{(145)(88)(183)(50)} = 29.9118$$

Smoking \ Drinking | Yes | No | Total
Ever smoked (Y) | 131 | 52 | 183
Never smoked (N) | 14 | 36 | 50
Total | 145 | 88 | 233
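Both 2×2 shortcut formulas can be checked against `chisq.test()` in R. Its `correct` argument switches Yates' continuity correction on or off for 2×2 tables. The sketch names the cells n11, n12, n21, n22, corresponding to a, b, c, d in the smoking-by-drinking table above.

```r
# Smoking (rows) by drinking (columns): a = n11, b = n12, c = n21, d = n22
n11 <- 131; n12 <- 52
n21 <- 14;  n22 <- 36
n <- n11 + n12 + n21 + n22                                   # 233
margins <- (n11 + n21) * (n12 + n22) * (n11 + n12) * (n21 + n22)

x2 <- n * (n11 * n22 - n12 * n21)^2 / margins                # shortcut formula
x2_yates <- n * (abs(n11 * n22 - n12 * n21) - 0.5 * n)^2 / margins

# Cross-check: chisq.test() applies Yates' correction to 2x2 tables when correct = TRUE
tab <- matrix(c(n11, n21, n12, n22), nrow = 2)
c(x2, chisq.test(tab, correct = FALSE)$statistic)
c(x2_yates, chisq.test(tab, correct = TRUE)$statistic)
```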

11.4 Homogeneity test

• Homogeneity test: samples are drawn independently from each of two or more populations. Do the samples have the same distribution?

• Independence test: a single sample is drawn from one population. The row and column totals are not controlled in advance; they arise by chance.

• Independence test vs. homogeneity test: the expected frequencies, χ² statistic, and degrees of freedom are computed in the same way. The two tests differ in the sampling design and in how H0 is stated.

• H0: the family history of patients whose schizophrenia began before age 18 is the same as that of patients whose schizophrenia began after age 18 (the two groups have the same distribution).

• We cannot reject H0: there is no evidence that the distribution of family history differs between the two groups.

Family history | Onset before 18 | Onset after 18 | Total
A | 28 | 35 | 63
B | 19 | 38 | 57
C | 41 | 44 | 85
D | 53 | 60 | 113
Total | 141 | 177 | 318

> data <- as.table(cbind(c(28, 19, 41, 53), c(35, 38, 44, 60)))
> dimnames(data) <- list(history = c("A", "B", "C", "D"), onset = c("Early", "Later"))
> chisq.test(data)

        Pearson's Chi-squared test

data:  data
X-squared = 3.6216, df = 3, p-value = 0.3053

• 2×2 table

① χ² test

Here a = 60 and b = 40 (group 1, $n_1 = 100$), and c = 48 and d = 72 (group 2, $n_2 = 120$).

$$X^2 = \frac{n(ad-bc)^2}{(a+c)(b+d)(a+b)(c+d)} = \frac{220\,(60 \cdot 72 - 40 \cdot 48)^2}{(108)(112)(100)(120)} = 8.7302 > 3.841 = \chi^2_{0.95,\,1}$$

∴ Reject H0: the probabilities of having the disease differ significantly between the two groups.

② Comparing two probabilities

$H_0: p_1 = p_2$ vs. $H_a: p_1 \ne p_2$

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\bar{p}(1-\bar{p})}{n_1} + \dfrac{\bar{p}(1-\bar{p})}{n_2}}}, \qquad \bar{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$$

e.g. $\bar{p} = \dfrac{100(0.60) + 120(0.40)}{100 + 120} = 0.4909$

$$Z = \frac{0.60 - 0.40}{\sqrt{\dfrac{(0.4909)(0.5091)}{100} + \dfrac{(0.4909)(0.5091)}{120}}} = 2.95469 > 1.96 \;\Rightarrow\; \text{significant}$$

Note that $Z^2 = 2.95469^2 = 8.7302$, which equals the χ² statistic in ①. For a 2×2 table, the two tests are equivalent.
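The R sketch below computes the pooled two-proportion Z statistic from the counts 60/100 and 48/120 given above. It then confirms that Z² equals the uncorrected χ² statistic reported by `prop.test(..., correct = FALSE)`, which is the equivalence between ① and ②.

```r
x <- c(60, 48)      # number with the disease in each group (a and c)
n <- c(100, 120)    # group sizes n1 and n2

p_hat <- x / n                               # 0.60, 0.40
p_bar <- sum(x) / sum(n)                     # pooled estimate, 108/220 = 0.4909
se <- sqrt(p_bar * (1 - p_bar) * (1 / n[1] + 1 / n[2]))
z <- (p_hat[1] - p_hat[2]) / se              # about 2.95469

# Z^2 equals the uncorrected chi-square statistic of the 2x2 table (8.7302)
z^2
prop.test(x, n, correct = FALSE)$statistic
```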

• debatea.sas

* File : debatea.sas ;
options ls=70 ps=55 nodate nonumber;
data one;
  input id school gender compare argue research reason speak;
  if school in (3,5,6,8);
  label id='Survey Number'
        school='High School'
        compare='How Debate Compares to Other Classes'
        argue='Argumentation'
        research='Research'
        reason='Reasoning'
        speak='Speaking';
  cards;
1 6 1 1 1 1 1 1
108 7 1 1 1 1 1 2
56 3 1 1 1 1 1 1
... (remaining data lines omitted)
70 6 1 1 1 1 1 1
69 6 2 1 1 1 1 1
;
run;

proc freq data=one;
  tables school*compare / chisq expected;
  title 'Comparing Schools in the Debate Survey';
run;

proc freq data=one;
  tables school*compare / exact;
  title 'Comparing Schools in the Debate Survey';
run;

data respire;
  input treat $ outcome $ count;
  cards;
test f 40
test u 20
placebo f 16
placebo u 48
;
proc freq data=respire;
  weight count;
  tables treat*outcome / chisq;
run;

The SAS System

The FREQ Procedure

Table of treat by outcome

treat     outcome

Frequency |
Percent   |
Row Pct   |
Col Pct   |f       |u       |  Total
----------+--------+--------+
placebo   |     16 |     48 |     64
          |  12.90 |  38.71 |  51.61
          |  25.00 |  75.00 |
          |  28.57 |  70.59 |
----------+--------+--------+
test      |     40 |     20 |     60
          |  32.26 |  16.13 |  48.39
          |  66.67 |  33.33 |
          |  71.43 |  29.41 |
----------+--------+--------+
Total           56       68      124
             45.16    54.84   100.00

Statistics for Table of treat by outcome

Statistic                        DF     Value     Prob
------------------------------------------------------
Chi-Square                        1   21.7087   <.0001
Likelihood Ratio Chi-Square       1   22.3768   <.0001
Continuity Adj. Chi-Square        1   20.0589   <.0001
Mantel-Haenszel Chi-Square        1   21.5336   <.0001
Phi Coefficient                       -0.4184
Contingency Coefficient                0.3860
Cramer's V                            -0.4184

Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)          16
Left-sided Pr <= F         2.838E-06
Right-sided Pr >= F           1.0000

Table Probability (P)      2.397E-06
Two-sided Pr <= P          4.754E-06

Sample Size = 124
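The main lines of this SAS output can be cross-checked in R, assuming the same row and column order (placebo before test, f before u). The sketch obtains the Pearson and continuity-corrected χ² from `chisq.test()`, and computes the Mantel-Haenszel χ² for a single 2×2 table as (n − 1)/n times the Pearson statistic. The two-sided Fisher p-value comes from `fisher.test()`.

```r
# rows: placebo, test; columns: f, u (same order as the SAS table)
respire <- matrix(c(16, 40, 48, 20), nrow = 2,
                  dimnames = list(treat = c("placebo", "test"),
                                  outcome = c("f", "u")))
n <- sum(respire)                                               # 124

x2 <- unname(chisq.test(respire, correct = FALSE)$statistic)    # Pearson
x2_cc <- unname(chisq.test(respire, correct = TRUE)$statistic)  # Yates-corrected
mh <- (n - 1) / n * x2                                          # Mantel-Haenszel, one table
fisher_p <- fisher.test(respire)$p.value                        # two-sided exact p-value
c(chisq = x2, continuity = x2_cc, mantel_haenszel = mh, fisher = fisher_p)
```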

11.5 Fisher's exact test

data severe;
  input treat $ outcome $ count;
  cards;
Test f 10
Test u 2
Control f 2
Control u 4
;
proc freq order=data;
  tables treat*outcome / chisq nocol;
  weight count;
run;

The SAS System

The FREQ Procedure

Table of treat by outcome

treat     outcome

Frequency |
Percent   |
Row Pct   |f       |u       |  Total
----------+--------+--------+
Test      |     10 |      2 |     12
          |  55.56 |  11.11 |  66.67
          |  83.33 |  16.67 |
----------+--------+--------+
Control   |      2 |      4 |      6
          |  11.11 |  22.22 |  33.33
          |  33.33 |  66.67 |
----------+--------+--------+
Total           12        6       18
             66.67    33.33   100.00

Statistics for Table of treat by outcome

Statistic                        DF     Value     Prob
------------------------------------------------------
Chi-Square                        1    4.5000   0.0339
Likelihood Ratio Chi-Square       1    4.4629   0.0346
Continuity Adj. Chi-Square        1    2.5313   0.1116
Mantel-Haenszel Chi-Square        1    4.2500   0.0393
Phi Coefficient                        0.5000
Contingency Coefficient                0.4472
Cramer's V                             0.5000

WARNING: 75% of the cells have expected counts less than 5. Chi-Square may not be a valid test.

Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)          10
Left-sided Pr <= F            0.9961
Right-sided Pr >= F           0.0573

Table Probability (P)         0.0533
Two-sided Pr <= P             0.1070

Sample Size = 18

Exact Test

(1,1) | (1,2) | (2,1) | (2,2) | Table probability
12 | 0 | 0 | 6 | .0001
11 | 1 | 1 | 5 | .0039
10 | 2 | 2 | 4 | .0533
9 | 3 | 3 | 3 | .2370
8 | 4 | 4 | 2 | .4000
7 | 5 | 5 | 1 | .2560
6 | 6 | 6 | 0 | .0498

Table probabilities

• One-tailed p-value (the observed table plus the more extreme tables in the same direction):
p = 0.0533 + 0.0039 + 0.0001 = 0.0573

• Two-tailed p-value (every table whose probability is no larger than that of the observed table, 0.0533):
p = 0.0533 + 0.0039 + 0.0001 + 0.0498 = 0.1071

H0: the two variables are independent (homogeneous) vs. H1: not H0
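With both margins fixed, the (1,1) cell follows a hypergeometric distribution. The table of probabilities above can therefore be generated with `dhyper()`. The R sketch below treats column "f" (total 12) as the "white balls" and row "Test" (total 12) as the draws, then sums the tail and two-sided probabilities as described. It compares the result with `fisher.test()`, which uses the same "probability no larger than observed" rule, with a small relative tolerance for ties.

```r
# Margins: row totals 12 (Test), 6 (Control); column totals 12 (f), 6 (u)
a <- 6:12                                  # possible values of the (1,1) cell
probs <- dhyper(a, m = 12, n = 6, k = 12)  # hypergeometric probabilities
round(setNames(probs, a), 4)

p_obs <- dhyper(10, m = 12, n = 6, k = 12)           # observed table, 0.0533
p_one <- sum(probs[a >= 10])                         # one-sided, 0.0573
p_two <- sum(probs[probs <= p_obs * (1 + 1e-7)])     # two-sided, about 0.1070
c(one_sided = p_one, two_sided = p_two)

fisher.test(matrix(c(10, 2, 2, 4), nrow = 2))$p.value  # same two-sided p-value
```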

> fisher.test(matrix(c(7, 3, 5, 6), 2, 2), alternative = 'greater')

        Fisher's Exact Test for Count Data

data:  matrix(c(7, 3, 5, 6), 2, 2)
p-value = 0.2449
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
 0.4512625       Inf
sample estimates:
odds ratio
  2.661251

11.6 Relative risk, odds ratio, and the Mantel-Haenszel statistic

• Prospective study: select two groups according to exposure status (fix the numbers of exposed and unexposed subjects), then follow them over time.

• Retrospective study: select the sample according to the outcome (fix the numbers of cases and controls), then investigate the past.

Relative risk (RR)

$$RR = \frac{a/(a+b)}{c/(c+d)}$$

Risk factor \ Disease | Yes | No | Total
Exposed | a | b | a + b
Unexposed | c | d | c + d
Total | a + c | b + d | n

data preg;
  input smoke $ preg $ count @@;
  cards;
smoke early 26 smoke abnormal 380
nonsmoke early 178 nonsmoke abnormal 3386
;
proc freq order=data;
  weight count;
  tables smoke*preg / measures chisq;
run;

Odds ratio (OR)

$$OR = \frac{a/b}{c/d} = \frac{ad}{bc}$$

Odds = p/(1 − p), with a, b, c, d as in the RR table.

Odds for the exposed: $\dfrac{a/(a+b)}{1 - a/(a+b)} = \dfrac{a/(a+b)}{b/(a+b)} = \dfrac{a}{b}$

Odds for the unexposed: $\dfrac{c/(c+d)}{1 - c/(c+d)} = \dfrac{c/(c+d)}{d/(c+d)} = \dfrac{c}{d}$

Odds ratio $= \dfrac{a/b}{c/d} = \dfrac{ad}{bc}$
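A minimal R sketch of both measures is shown below. The helper `rr_or()` is hypothetical and simply codes $RR = [a/(a+b)]/[c/(c+d)]$ and $OR = ad/bc$. It is applied to the `preg` SAS data from the RR slide. Treating the `early` column as the outcome of interest is an assumption made for illustration. Because that outcome is fairly uncommon (about 5% of subjects), the OR (about 1.30) comes out close to the RR (about 1.28).

```r
# Hypothetical helper: RR and OR from the cells a (n11), b (n12), c (n21), d (n22)
rr_or <- function(n11, n12, n21, n22) {
  c(RR = (n11 / (n11 + n12)) / (n21 / (n21 + n22)),
    OR = (n11 * n22) / (n12 * n21))
}

# 'preg' data: exposed = smoke, unexposed = nonsmoke;
# assumption: the 'early' column is treated as the outcome of interest
res <- rr_or(n11 = 26, n12 = 380, n21 = 178, n22 = 3386)
res
```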

* Odds ratio and relative risk

• Risk is the preferable measure because it is a probability.

• Why use odds, then?

– The OR is often a good approximation to the RR when the disease is rare.

– Sometimes the OR is all we can estimate (case-control studies), or

– it is the most convenient measure to calculate (logistic regression analysis).

* Sun protection during childhood and cutaneous melanoma

Sun protection? | Cases | Controls | Total
Yes | 99 | 132 | 231
No | 303 | 290 | 593
Total | 402 | 422 | 824

The RR is meaningful only when the sample is drawn at random from the population. In a case-control study, the sample is stratified by case-control status. Here the "proportion with melanoma" is 402/824 = 0.49, which reflects how many cases and controls the investigators chose to sample, not the risk in the population.

Woodward (Epidemiology: Study Design and Data Analysis)

f1, f2: sampling fractions of cases and controls

(a) Population values and (b) expected values in the sample:

Exposure | (a) Diseased | (a) Not diseased | (a) Total | (b) Cases | (b) Controls | (b) Total
Exposed | A | B | A+B | f1A | f2B | f1A+f2B
Not exposed | C | D | C+D | f1C | f2D | f1C+f2D
Total | A+C | B+D | N | f1(A+C) | f2(B+D) | n

$$RR = \frac{A/(A+B)}{C/(C+D)}$$

Risk for the exposed: $\dfrac{f_1A}{f_1A + f_2B} \ne \dfrac{A}{A+B}$

Risk for the unexposed: $\dfrac{f_1C}{f_1C + f_2D} \ne \dfrac{C}{C+D}$

(In general the sample risks differ from the population risks; they agree only if f1 = f2.)

$$RR_{(a)} = \frac{A/(A+B)}{C/(C+D)}$$

$$RR_{(b)} = \frac{f_1A/(f_1A + f_2B)}{f_1C/(f_1C + f_2D)} = \frac{f_1A\,(f_1C + f_2D)}{f_1C\,(f_1A + f_2B)} \ne \frac{A\,(C+D)}{C\,(A+B)} = RR_{(a)}$$

The RR computed from the case-control sample (b) does not, in general, equal the population RR (a).

Odds for the exposed: $\dfrac{f_1A}{f_2B} \ne \dfrac{A}{B}$

Odds for the unexposed: $\dfrac{f_1C}{f_2D} \ne \dfrac{C}{D}$

But

$$OR_{(b)} = \frac{(f_1A)(f_2D)}{(f_2B)(f_1C)} = \frac{AD}{BC} = OR_{(a)}$$

We can use a case-control study to estimate the OR, but not the risk, the RR, or the odds.
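The invariance argument can be illustrated numerically. The R sketch below takes the melanoma case-control table above and multiplies the case column by f1 and the control column by f2, using `sweep()`. The values f1 = 0.5 and f2 = 3 are arbitrary, hypothetical sampling fractions. The OR is unchanged by this rescaling, but the naive "RR" computed from the sample rows changes, which is why only the OR is estimable from a case-control design.

```r
# Melanoma table: rows = sun protection (Yes, No), columns = cases, controls
cc_tab <- matrix(c(99, 303, 132, 290), nrow = 2,
                 dimnames = list(protect = c("Yes", "No"),
                                 group = c("cases", "controls")))

odds_ratio <- function(t) (t[1, 1] * t[2, 2]) / (t[1, 2] * t[2, 1])
naive_rr   <- function(t) (t[1, 1] / sum(t[1, ])) / (t[2, 1] / sum(t[2, ]))

# Multiply cases by f1 and controls by f2 (hypothetical sampling fractions)
f1 <- 0.5; f2 <- 3
rescaled <- sweep(cc_tab, 2, c(f1, f2), `*`)

c(OR = odds_ratio(cc_tab), OR_rescaled = odds_ratio(rescaled))  # identical
c(RR = naive_rr(cc_tab), RR_rescaled = naive_rr(rescaled))      # different
```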

• Measures of association (Boston University): http://sphweb.bumc.bu.edu/otlt/MPH-Modules/EP/EP713_Association/EP713_Association8.html

[Mantel-Haenszel statistic] The association after adjusting for a confounding factor

• Calculate the expected values and variances in each of several tables, then combine them to obtain the combined (adjusted) effect.

• Before combining, a homogeneity test is recommended to judge whether combining the tables is reasonable.

Sets of 2×2 tables

data ca;
  input gender $ ECG $ disease $ count;
  datalines;
female <0.1 yes 4
female <0.1 no 11
female >=0.1 yes 8
female >=0.1 no 10
male <0.1 yes 9
male <0.1 no 9
male >=0.1 yes 21
male >=0.1 no 6
;
options ls=75 nonumber nodate;
proc freq;
  weight count;
  tables gender*disease / nocol nopct chisq;
  tables gender*ECG*disease / nocol nopct cmh chisq measures;
run;


The FREQ Procedure

Table of gender by disease

gender disease

Frequency|

Row Pct |no |yes | Total

---------+--------+--------+

female | 21 | 12 | 33

| 63.64 | 36.36 |

---------+--------+--------+

male | 15 | 30 | 45

| 33.33 | 66.67 |

---------+--------+--------+

Total 36 42 78

Statistics for Table of gender by disease

Statistic DF Value Prob

------------------------------------------------------

Chi-Square 1 7.0346 0.0080

Likelihood Ratio Chi-Square 1 7.1209 0.0076

Continuity Adj. Chi-Square 1 5.8681 0.0154

Mantel-Haenszel Chi-Square 1 6.9444 0.0084

Phi Coefficient 0.3003

Contingency Coefficient 0.2876

Cramer's V 0.3003


Table 1 of ECG by disease

Controlling for gender=female

ECG disease

Frequency|

Row Pct |no |yes | Total

---------+--------+--------+

<0.1 | 11 | 4 | 15

| 73.33 | 26.67 |

---------+--------+--------+

>=0.1 | 10 | 8 | 18

| 55.56 | 44.44 |

---------+--------+--------+

Total 21 12 33

Statistics for Table 1 of ECG by disease

Controlling for gender=female

Statistic DF Value Prob

------------------------------------------------------

Chi-Square 1 1.1175 0.2905

Likelihood Ratio Chi-Square 1 1.1337 0.2870

Continuity Adj. Chi-Square 1 0.4813 0.4879

Mantel-Haenszel Chi-Square 1 1.0836 0.2979

Phi Coefficient 0.1840

Contingency Coefficient 0.1810

Cramer's V 0.1840

Estimates of the Relative Risk (Row1/Row2)

Type of Study                   Value     95% Confidence Limits
-----------------------------------------------------------------
Case-Control (Odds Ratio)      2.2000     0.5036     9.6107
Cohort (Col1 Risk)             1.3200     0.7897     2.2063
Cohort (Col2 Risk)             0.6000     0.2240     1.6073

Sample Size = 33


Table 2 of ECG by disease

Controlling for gender=male

ECG disease

Frequency|

Row Pct |no |yes | Total

---------+--------+--------+

<0.1 | 9 | 9 | 18

| 50.00 | 50.00 |

---------+--------+--------+

>=0.1 | 6 | 21 | 27

| 22.22 | 77.78 |

---------+--------+--------+

Total 15 30 45

Statistics for Table 2 of ECG by disease

Controlling for gender=male

Statistic DF Value Prob

------------------------------------------------------

Chi-Square 1 3.7500 0.0528

Likelihood Ratio Chi-Square 1 3.7288 0.0535

Continuity Adj. Chi-Square 1 2.6042 0.1066

Mantel-Haenszel Chi-Square 1 3.6667 0.0555

Phi Coefficient 0.2887

Contingency Coefficient 0.2774

Cramer's V 0.2887

Estimates of the Relative Risk (Row1/Row2)

Type of Study                   Value     95% Confidence Limits
-----------------------------------------------------------------
Case-Control (Odds Ratio)      3.5000     0.9587    12.7775
Cohort (Col1 Risk)             2.2500     0.9680     5.2298
Cohort (Col2 Risk)             0.6429     0.3883     1.0642

Sample Size = 45


Summary Statistics for ECG by disease

Controlling for gender

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

Statistic Alternative Hypothesis DF Value Prob

---------------------------------------------------------------

1 Nonzero Correlation 1 4.5026 0.0338

2 Row Mean Scores Differ 1 4.5026 0.0338

3 General Association 1 4.5026 0.0338

Estimates of the Common Relative Risk (Row1/Row2)

Type of Study Method Value 95% Confidence Limits

-------------------------------------------------------------------------

Case-Control Mantel-Haenszel 2.8467 1.0765 7.5279

(Odds Ratio) Logit 2.8593 1.0807 7.5650

Cohort Mantel-Haenszel 1.6414 1.0410 2.5879

(Col1 Risk) Logit 1.5249 0.9833 2.3647

Cohort Mantel-Haenszel 0.6299 0.3980 0.9969

(Col2 Risk) Logit 0.6337 0.4046 0.9926

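Breslow-Day Test for
Homogeneity of the Odds Ratios
------------------------------
Chi-Square        0.2155
DF                     1
Pr > ChiSq        0.6425

Total Sample Size = 78

The Breslow-Day test does not reject homogeneity (p = 0.6425). The odds ratios are homogeneous across the strata, so the CMH estimate of the common effect is meaningful. If they were heterogeneous, it would not be.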
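The R counterpart of the CMH analysis is `mantelhaen.test()` applied to a 2 × 2 × K array. The sketch below builds the ECG × disease × gender array from the `ca` data, using the same row and column order as the SAS tables (ECG "<0.1" first, disease "no" first). With `correct = FALSE`, the statistic should correspond to the SAS CMH value (about 4.50), and `$estimate` gives the Mantel-Haenszel common odds ratio (about 2.85). This sketch does not compute the Breslow-Day homogeneity test.

```r
tab <- array(c(11, 10, 4, 8,    # female: rows ECG (<0.1, >=0.1), columns disease (no, yes)
               9, 6, 9, 21),    # male
             dim = c(2, 2, 2),
             dimnames = list(ECG = c("<0.1", ">=0.1"),
                             disease = c("no", "yes"),
                             gender = c("female", "male")))

# Stratum-specific odds ratios: 2.2 (female), 3.5 (male)
apply(tab, 3, function(t) (t[1, 1] * t[2, 2]) / (t[1, 2] * t[2, 1]))

# CMH test without continuity correction
cmh <- mantelhaen.test(tab, correct = FALSE)
cmh$statistic   # about 4.5026
cmh$estimate    # Mantel-Haenszel common odds ratio, about 2.8467
```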


data ca;

input gender $ ECG $ disease $ count;

datalines;

female <0.1 yes 8

female <0.1 no 11

female >=0.1 yes 8

female >=0.1 no 20

male <0.1 yes 9

male <0.1 no 9

male >=0.1 yes 30

male >=0.1 no 6

;

options ls=75 nonumber nodate;

proc freq;

weight count ;

tables gender*ECG*disease /nocol nopct cmh measures;

run;


Table 1 of ECG by disease

Controlling for gender=female

ECG disease

Frequency|

Row Pct |no |yes | Total

---------+--------+--------+

<0.1 | 11 | 8 | 19

| 57.89 | 42.11 |

---------+--------+--------+

>=0.1 | 20 | 8 | 28

| 71.43 | 28.57 |

---------+--------+--------+

Total 31 16 47

Estimates of the Relative Risk (Row1/Row2)

Type of Study Value 95% Confidence Limits

-----------------------------------------------------------------

Case-Control (Odds Ratio) 0.5500 0.1615 1.8731

Cohort (Col1 Risk) 0.8105 0.5171 1.2703

Cohort (Col2 Risk) 1.4737 0.6701 3.2407

Sample Size = 47


Table 2 of ECG by disease

Controlling for gender=male

ECG disease

Frequency|

Row Pct |no |yes | Total

---------+--------+--------+

<0.1 | 9 | 9 | 18

| 50.00 | 50.00 |

---------+--------+--------+

>=0.1 | 6 | 30 | 36

| 16.67 | 83.33 |

---------+--------+--------+

Total 15 39 54

Estimates of the Relative Risk (Row1/Row2)

Type of Study Value 95% Confidence Limits

-----------------------------------------------------------------

Case-Control (Odds Ratio) 5.0000 1.3992 17.8677

Cohort (Col1 Risk) 3.0000 1.2641 7.1198

Cohort (Col2 Risk) 0.6000 0.3696 0.9740

Sample Size = 54


Summary Statistics for ECG by disease

Controlling for gender

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

Statistic Alternative Hypothesis DF Value Prob

---------------------------------------------------------------

1 Nonzero Correlation 1 1.2063 0.2721

2 Row Mean Scores Differ 1 1.2063 0.2721

3 General Association 1 1.2063 0.2721

Estimates of the Common Relative Risk (Row1/Row2)

Type of Study Method Value 95% Confidence Limits

-------------------------------------------------------------------------

Case-Control Mantel-Haenszel 1.5604 0.6767 3.5982

(Odds Ratio) Logit 1.5893 0.6572 3.8433

Cohort Mantel-Haenszel 1.2447 0.8393 1.8460

(Col1 Risk) Logit 1.0708 0.7187 1.5954

Cohort Mantel-Haenszel 0.8135 0.5412 1.2227

(Col2 Risk) Logit 0.7677 0.5081 1.1600

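Breslow-Day Test for
Homogeneity of the Odds Ratios
------------------------------
Chi-Square        6.2128
DF                     1
Pr > ChiSq        0.0127

Here the Breslow-Day test rejects homogeneity (p = 0.0127). The odds ratios differ between women (0.55) and men (5.00), so the common Mantel-Haenszel estimate (1.56) is not a meaningful summary of the association.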


* McNemar Test : Matched pairs

data one;
  input hus_resp $ wif_resp $ no;
  datalines;
yes yes 20
yes no 5
no yes 10
no no 10
;
run;

proc freq;
  tables hus_resp*wif_resp / agree;
  weight no;
run;

We do not reject "H0: the approval rates of husbands and wives are the same".
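For matched pairs, McNemar's test uses only the discordant pairs (yes/no = 5 and no/yes = 10). The R sketch below runs `mcnemar.test()` with `correct = FALSE`, which gives the uncorrected statistic (b − c)²/(b + c). It also computes a simple Cohen's kappa by hand from observed and chance agreement. The variable names are illustrative assumptions, and the sketch does not compute the confidence interval for kappa.

```r
# rows = husband (yes, no), columns = wife (yes, no)
resp <- matrix(c(20, 10, 5, 10), nrow = 2,
               dimnames = list(husband = c("yes", "no"), wife = c("yes", "no")))

# McNemar: (5 - 10)^2 / (5 + 10) = 1.6667
mc <- mcnemar.test(resp, correct = FALSE)
mc$statistic
mc$p.value      # about 0.197, so equal approval rates are not rejected

# Cohen's kappa from observed and chance agreement
n <- sum(resp)
p_o <- sum(diag(resp)) / n                       # observed agreement, 30/45
p_e <- sum(rowSums(resp) * colSums(resp)) / n^2  # agreement expected by chance
kappa_hat <- (p_o - p_e) / (1 - p_e)             # 4/13, about 0.31
kappa_hat
```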

The confidence interval does not include 0, so we reject the null hypothesis κ = 0 at the 95% confidence level.

• κ = 1: perfect agreement; κ > 0.8: excellent agreement; κ > 0.4: moderate agreement

# Chisq test by R    filename: chisq.r
data <- matrix(c(25, 5, 15, 15), ncol = 2, byrow = TRUE)
data
data2 <- matrix(c(16, 11, 3, 21, 8, 1), ncol = 2, byrow = TRUE)
data2
chisq.test(data)
chisq.test(data2)
fisher.test(data2)

data <- matrix(c(6, 2, 8, 4), ncol = 2, byrow = TRUE)
data
mcnemar.test(data)

## From Agresti (2007) p.39
M <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(M) <- list(gender = c("M", "F"),
                    party = c("Democrat", "Independent", "Republican"))
M
colSums(M)
rowSums(M)
cbind(M, rowSums(M))
rbind(M, colSums(M))
prop.table(M, margin = 2) * 100    # column percentages
prop.table(M, margin = 1) * 100    # row percentages

(Xsq <- chisq.test(M))   # Prints test summary
Xsq$observed             # observed counts (same as M)
Xsq$expected             # expected counts under the null
Xsq$residuals            # Pearson residuals
sum((Xsq$residuals)^2)   # equals the X-squared statistic
1 - pchisq(sum((Xsq$residuals)^2),
           (ncol(M) - 1) * (nrow(M) - 1))   # p-value