Chapter 11: The χ² (chi-square) distribution and the analysis of frequencies

Source: Seoul National University, hosting03.snu.ac.kr/~hokim/int/2019/chap_11.pdf (2019-05-27)

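11.1 Mathematical properties of the χ² distribution

• Uses of χ²
  - Tests of goodness-of-fit
  - Tests of independence
  - Tests of homogeneity

• Definition of χ²

$$Z_1, \dots, Z_n \sim \text{independent } N(0, 1)$$
$$Z_i^2 \sim \chi^2_1, \qquad \sum_{i=1}^{n} Z_i^2 \sim \chi^2_n$$

e.g. $Z = (Y - \mu)/\sigma$ with $Y \sim N(\mu, \sigma^2)$.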

11.2 Goodness-of-fit tests

• Do our data agree with a hypothesized theoretical distribution (normal, binomial, Poisson, etc.)?

• Example 11.2.1 (normal distribution)

H0: the data follow a normal distribution vs. H1: not H0

Cholesterol level (mg/dl)   Number of subjects
1-5.9                        2
6-10.9                       2
11-15.9                      7
16-20.9                     19
21-25.9                      4
26-30.9                      6
31-35.9                      3
36-40.9                      4

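$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \sim \chi^2_{k-r}$$

O_i: observed frequency; E_i: expected frequency
r: number of restrictions (such as $\sum E_i = \sum O_i$) plus the number of parameters estimated

• Sample mean, computed from the class midpoints:

$$\bar{x} = \frac{3.5 \cdot 2 + 8.5 \cdot 2 + \cdots + 38.5 \cdot 4}{2 + 2 + \cdots + 4} = 21.05319$$

• Sample variance and standard deviation:

$$s^2 = \frac{(3.5 - 21.05319)^2 \cdot 2 + (8.5 - 21.05319)^2 \cdot 2 + \cdots + (38.5 - 21.05319)^2 \cdot 4}{(2 + 2 + \cdots + 4) - 1} = 75.9482$$

$$s = \sqrt{75.9482} = 8.71483$$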

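Interval    Standardized lower limit   Expected relative frequency   Expected frequency
< 1                                    0.01069                       0.50265
1-5.9       -2.30104                   0.03136                       1.4740
6-10.9      -1.72731                   0.08228                       3.8672
11-15.9     -1.15357                   0.15667                       7.3637
16-20.9     -0.57984                   0.21655                       10.178
21-25.9     -0.00610                   0.21729                       10.213
26-30.9      0.56763                   0.15828                       7.4393
31-35.9      1.14137                   0.08370                       3.9337
36-40.9      1.71510                   0.03212                       1.5096
≥ 41         2.28884                   0.01104                       0.51909

Each expected relative frequency is the standard normal probability between consecutive standardized limits: P(Z < -2.30104), P(-2.30 < Z < -1.73), ..., P(1.72 < Z < 2.29), P(Z > 2.29).

Cells with small expected frequencies are pooled: (< 1) with (1-5.9), giving E = 1.9766, and (36-40.9) with (≥ 41), giving E = 2.0287.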

$$\sum_i \frac{(O_i - E_i)^2}{E_i} = 14.762 > \texttt{qchisq(0.05, 5, lower.tail=F)} = 11.071$$

-> Reject H0: data ~ normal.

Constraints ($\sum E_i = \sum O_i$, $\mu = \bar{X}$, $\sigma = s$) = 3 -> df = 8 - 3 = 5

Interval              Observed (O_i)   Expected (E_i)                (O_i - E_i)^2 / E_i
< 1 and 1-5.9         0 + 2 = 2        0.50265 + 1.4740 = 1.9766     2.7608 × 10^-4
6-10.9                2                3.8672                        0.90156
11-15.9               7                7.3637                        0.01796
16-20.9               19               10.178                        7.6466
21-25.9               4                10.213                        3.7794
26-30.9               6                7.4393                        0.27848
31-35.9               3                3.9337                        0.22162
36-40.9 and ≥ 41      4 + 0 = 4        1.5096 + 0.51909 = 2.0287     1.9156
Total                 47               47                            14.762
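The grouped-data calculation in Example 11.2.1 can be sketched in Python with the standard library only. This is an illustration and not part of the original slides: the variable names are made up for the sketch, and the critical value 11.071 is simply the one quoted in the example.

```python
from statistics import NormalDist

# Observed counts for the classes 1-5.9, 6-10.9, ..., 36-40.9 (Example 11.2.1).
lower_limits = [1, 6, 11, 16, 21, 26, 31, 36]
observed = [2, 2, 7, 19, 4, 6, 3, 4]
midpoints = [lo + 2.5 for lo in lower_limits]   # 3.5, 8.5, ..., 38.5

n = sum(observed)
mean = sum(m * o for m, o in zip(midpoints, observed)) / n
var = sum(o * (m - mean) ** 2 for m, o in zip(midpoints, observed)) / (n - 1)
sd = var ** 0.5

# Fitted normal. The tails (< 1 and >= 41) are pooled into the first and
# last class, so the cell boundaries are 6, 11, ..., 36.
fitted = NormalDist(mean, sd)
cdf = [0.0] + [fitted.cdf(c) for c in lower_limits[1:]] + [1.0]
expected = [n * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 3      # constraints: total, estimated mean, estimated sd
critical = 11.071           # qchisq(0.05, 5, lower.tail=F), quoted from the example
reject = chi2 > critical
print(round(mean, 5), round(sd, 5), round(chi2, 3), df, reject)
```

Pooling the tails before computing the statistic mirrors the table in the example, where the two outermost cells have expected frequencies well below 1.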

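Example 11.2.2 Binomial distribution

H0: the data follow a binomial distribution (goodness-of-fit test)

Patients preferring the new drug (per doctor)   Number of doctors   Number of patients
0                                               5                   0
1                                               6                   6
2                                               8                   16
3                                               10                  30
4                                               10                  40
5                                               15                  75
6                                               17                  102
7                                               10                  70
8                                               10                  80
9                                               9                   81
10 or more                                      0                   0
Total                                           100                 500

Under the binomial assumption, expected frequency = expected relative frequency × total.

$$P(X = x) = \binom{25}{x} p^x (1 - p)^{25 - x}, \qquad x = 0, 1, 2, \dots, 25$$

$$\hat{p} = \frac{500}{2500} = 0.2$$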

Patients preferring the new drug (per doctor)   Number of doctors (O_i)   Expected relative frequency   Expected frequency (E_i)
0                                               5                         0.00378                       0.37779
1                                               6                         0.02361                       2.3612
2                                               8                         0.07083                       7.0836
3                                               10                        0.13577                       13.577
4                                               10                        0.18668                       18.668
5                                               15                        0.19602                       19.602
6                                               17                        0.16335                       16.335
7                                               10                        0.11084                       11.084
8                                               10                        0.06235                       6.2349
9                                               9                         0.02944                       2.9442
10 or more                                      0                         0.01733                       1.7332
Total                                           100                       1.0000                        100.00

The cells for 0 and 1 are pooled: O = 5 + 6 = 11, E = 0.37779 + 2.3612 = 2.7390.

$$\chi^2 = \frac{(11 - 2.7390)^2}{2.7390} + \frac{(8 - 7.0836)^2}{7.0836} + \cdots + \frac{(0 - 1.7332)^2}{1.7332} = 47.678 > \texttt{qchisq(0.005, 8, lower.tail=F)} = 21.955$$

We reject H0: data ~ binomial.

df = 10 - 2 = 8; constraints: $\sum E_i = \sum O_i$ and $p = \hat{p}$.
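The binomial fit in Example 11.2.2 can be reproduced with a short Python sketch. The names are illustrative only; the 25 trials per doctor and the estimate 500/2500 are taken from the example, and pooling the first two cells follows the table.

```python
from math import comb

n_trials, p_hat = 25, 500 / 2500      # 25 patients per doctor; p estimated from the data
doctors = [5, 6, 8, 10, 10, 15, 17, 10, 10, 9, 0]   # x = 0..9 and "10 or more"
total = sum(doctors)

# Binomial probabilities for x = 0..9; the last cell is the upper tail P(X >= 10).
pmf = [comb(n_trials, x) * p_hat**x * (1 - p_hat) ** (n_trials - x) for x in range(10)]
probs = pmf + [1 - sum(pmf)]
expected = [total * p for p in probs]

# Pool x = 0 and x = 1 because their expected counts are small.
obs_pooled = [doctors[0] + doctors[1]] + doctors[2:]
exp_pooled = [expected[0] + expected[1]] + expected[2:]

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
df = len(obs_pooled) - 2              # constraints: total and the estimated p
print(round(chi2, 3), df)
```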

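Example 11.2.3 Poisson distribution

H0: the daily number of emergency patients at the hospital follows a Poisson distribution

Expected relative frequency under the Poisson distribution:

$$P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}, \qquad x = 0, 1, 2, \dots, \qquad \lambda = 3 \text{ (known)}$$

Emergency patients per day   Number of days
0                            5
1                            14
2                            15
3                            23
4                            16
5                            9
6                            3
7                            3
8                            1
9                            1
10 or more                   0
Total                        90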

Emergency patients   Number of days (O_i)   Expected relative frequency   Expected frequency (E_i)   (O_i - E_i)^2 / E_i
0                    5                      0.04979                       4.4808                     0.06015
1                    14                     0.14936                       13.443                     0.02312
2                    15                     0.22404                       20.164                     1.32240
3                    23                     0.22404                       20.164                     0.39895
4                    16                     0.16803                       15.123                     0.05088
5                    9                      0.10082                       9.0737                     0.00060
6                    3                      0.05041                       4.5368                     0.52060
7                    3                      0.02160                       1.9444                     0.57313
8                    1                      0.00810                       0.72914
9                    1                      0.00270                       0.24305
10 or more           0                      0.00110                       0.09922
8 and above (pooled) 2                                                    1.0714                     0.80482
Total                90                     1.000                         90.00                      3.755

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} = \frac{(5 - 4.4808)^2}{4.4808} + \cdots + \frac{(2 - 1.0714)^2}{1.0714} = 3.755 < \chi^2(0.95,\ df = 9 - 1 = 8) = 15.507$$

We cannot reject H0: data ~ Poisson.

With the expected frequencies rounded to two decimals (4.50, ..., 1.08) the statistic is 3.664, which is also below 15.507.
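A minimal Python sketch of the Poisson fit in Example 11.2.3 follows. The names are illustrative; the rate 3 is the known value given in the example and 15.507 is the critical value quoted there.

```python
from math import exp, factorial

lam = 3.0                                        # known rate from the example
days = [5, 14, 15, 23, 16, 9, 3, 3, 1, 1, 0]     # x = 0..9 and "10 or more"
total = sum(days)

# Poisson probabilities for x = 0..9; the last cell is the upper tail P(X >= 10).
pmf = [exp(-lam) * lam**x / factorial(x) for x in range(10)]
probs = pmf + [1 - sum(pmf)]
expected = [total * p for p in probs]

# Pool x = 8, 9 and "10 or more" into a single cell.
obs_pooled = days[:8] + [sum(days[8:])]
exp_pooled = expected[:8] + [sum(expected[8:])]

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
df = len(obs_pooled) - 1          # only the total is constrained; lambda is known
critical = 15.507                 # chi-square 0.95 quantile for 8 df, quoted from the example
print(round(chi2, 3), df, chi2 < critical)
```

Because the rate is given rather than estimated, only one degree of freedom is lost, in contrast to the binomial example where the success probability was estimated.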

11.3 Tests of independence

• Contingency table

                             Second categorical variable
First categorical variable   1      2      3      ...   c      Total
1                            n_11   n_12   n_13   ...   n_1c   n_1.
2                            n_21   n_22   n_23   ...   n_2c   n_2.
3                            n_31   n_32   n_33   ...   n_3c   n_3.
...                          ...    ...    ...    ...   ...    ...
r                            n_r1   n_r2   n_r3   ...   n_rc   n_r.
Total                        n_.1   n_.2   n_.3   ...   n_.c   n

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2, \qquad df = (r - 1)(c - 1), \qquad E_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n}$$

• Example 11.3.1

Expected frequencies are shown in parentheses.

Treatment   Relapse: Yes     Relapse: No        Total
A           294 (77.255)     921 (1137.745)     1215
B           98 (188.210)     2862 (2771.790)    2960
C           50 (198.002)     3064 (2915.998)    3114
D           203 (181.533)    2652 (2673.467)    2855
Total       645              9499               10144

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(294 - 77.255)^2}{77.255} + \frac{(921 - 1137.745)^2}{1137.745} + \cdots = 816.41 > \chi^2(0.95,\ df = 3) = 7.815$$

df = (r - 1)(c - 1) = (4 - 1)(2 - 1) = 3

Reject H0 (treatment and relapse are independent) -> they are not independent.

R:

> data <- as.table(cbind(c(294, 98, 50, 203), c(921, 2862, 3064, 2652)))
> dimnames(data) <- list(trt = c("A", "B", "C", "D"), re = c("Y", "N"))
> data
   re
trt   Y    N
  A 294  921
  B  98 2862
  C  50 3064
  D 203 2652
> chisq.test(data)

        Pearson's Chi-squared test

data:  data
X-squared = 816.41, df = 3, p-value < 2.2e-16

SAS:

data re;
  input trt $ re $ count;
  cards;
A Y 294
A N 921
B Y 98
B N 2862
C Y 50
C N 3064
D Y 203
D N 2652
;
proc freq data=re;
  weight count;
  tables trt*re / measures chisq;
run;
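For readers who want to see the arithmetic behind the R and SAS output, here is a minimal Python sketch that computes the expected frequencies and the Pearson statistic for Example 11.3.1 directly from the row and column totals. The variable names are illustrative.

```python
# Rows: treatments A-D; columns: relapse yes / no (Example 11.3.1).
table = [[294, 921], [98, 2862], [50, 3064], [203, 2652]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# E_ij = (row total i) * (column total j) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(table) - 1) * (len(table[0]) - 1)
print(round(expected[0][0], 3), round(chi2, 2), df)
```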

• Small expected frequencies: if the cells with an expected frequency below 5 make up no more than 20% of all cells and the smallest expected frequency is at least 1, the χ² approximation is not a problem.

• 2×2 tables: the χ² test is not valid if n < 20, or if 20 < n < 49 and one or more cells have an expected frequency below 5.

• 2×2 table

                   Second criterion
First criterion    1        2        Total
1                  a        b        a + b
2                  c        d        c + d
Total              a + c    b + d    n

$$\chi^2 = \frac{n\,(ad - bc)^2}{(a + c)(b + d)(a + b)(c + d)}$$

Smoking and drinking:

         Yes    No    Total
Yes      131    52    183
No       14     36    50
Total    145    88    233

$$\chi^2 = \frac{233\,(131 \cdot 36 - 52 \cdot 14)^2}{145 \cdot 88 \cdot 183 \cdot 50} = 31.7391 \gg 1.96^2$$

Strong evidence to reject H0: smoking and drinking are independent.

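② Comparing two probabilities

$$H_0: p_1 = p_2 \quad \text{vs.} \quad H_a: p_1 \ne p_2$$

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n_1} + \dfrac{\bar{p}(1 - \bar{p})}{n_2}}}, \qquad \bar{p} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$$

e.g. $n_1 = 100$, $\hat{p}_1 = 0.60$, $n_2 = 120$, $\hat{p}_2 = 0.40$:

$$\bar{p} = \frac{0.60 \cdot 100 + 0.40 \cdot 120}{100 + 120} = 0.4909$$

$$Z = \frac{0.60 - 0.40}{\sqrt{\dfrac{0.4909 \cdot 0.5091}{100} + \dfrac{0.4909 \cdot 0.5091}{120}}} = 2.95469 > 1.96 \quad \text{(significant)}$$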

Yates' correction

• $$\chi^2_{\text{corrected}} = \frac{n\,(|ad - bc| - 0.5n)^2}{(a + c)(b + d)(a + b)(c + d)}$$

• $$\chi^2 = \frac{233\,(|131 \cdot 36 - 52 \cdot 14| - 0.5 \cdot 233)^2}{145 \cdot 88 \cdot 183 \cdot 50} = 29.9118$$
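The 2×2 shortcut formula, with and without Yates' correction, fits in one small Python function. This is a sketch with a hypothetical function name (`chi2_2x2`); clamping the corrected difference at zero is a common safeguard added here and does not come from the slides.

```python
def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]] via the shortcut formula."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction; clamped at 0 so the correction never flips the sign.
        diff = max(diff - 0.5 * n, 0.0)
    return n * diff**2 / ((a + c) * (b + d) * (a + b) * (c + d))

# Smoking and drinking table from the example.
plain = chi2_2x2(131, 52, 14, 36)
corrected = chi2_2x2(131, 52, 14, 36, yates=True)
print(round(plain, 4), round(corrected, 4))
```

With n = 233 the correction changes the statistic only slightly, and both values are far above 1.96 squared.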

11.4 Tests of homogeneity

Independence test vs. homogeneity test:

• Homogeneity test: samples are drawn independently from separate populations, and the question is whether their distributions are the same (that is, whether the samples could have come from one population).

• Independence test: a single sample is drawn from one population; the row and column totals are not fixed by design but arise by chance.

• Example 11.4.1

H0: patient groups with onset age <= 18 and onset age > 18 have the same distribution of family history.

Family history   <= 18   > 18   Total
A                28      35     63
B                19      38     57
C                41      44     85
D                53      60     113
Total            141     177    318

> data <- as.table(cbind(c(28, 19, 41, 53), c(35, 38, 44, 60)))
> dimnames(data) <- list(trt = c("A", "B", "C", "D"), re = c("Early", "Later"))
> data
   re
trt Early Later
  A    28    35
  B    19    38
  C    41    44
  D    53    60
> chisq.test(data)

        Pearson's Chi-squared test

data:  data
X-squared = 3.6216, df = 3, p-value = 0.3053

-> Do not reject H0.

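Homogeneity test and the test of two proportions

$$H_0: p_1 = p_2 \quad \text{vs.} \quad H_A: p_1 \ne p_2, \qquad n_1 = 100,\ \hat{p}_1 = 0.60,\ n_2 = 120,\ \hat{p}_2 = 0.40$$

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)_0}{\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n_1} + \dfrac{\bar{p}(1 - \bar{p})}{n_2}}}$$

$$\bar{p} = \frac{0.60 \cdot 100 + 0.40 \cdot 120}{100 + 120} = \frac{108}{220} = 0.49091$$

$$z = \frac{0.60 - 0.40}{\sqrt{\dfrac{0.49091 \cdot 0.50909}{100} + \dfrac{0.49091 \cdot 0.50909}{120}}} = 2.95468$$

The same data as a 2×2 table:

                Characteristic
Sample    1      2      Total
1         60     40     100
2         48     72     120
Total     108    112    220

$$\chi^2 = \frac{220\,[60 \cdot 72 - 40 \cdot 48]^2}{108 \cdot 112 \cdot 100 \cdot 120} = 8.7302$$

-> Reject H0. Note that $z^2 = 2.95468^2 = 8.7302 = \chi^2$.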
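The link between the pooled two-proportion z-test and the 2×2 chi-square statistic can be checked numerically. The sketch below uses the counts from the example; the variable names are illustrative.

```python
from math import sqrt

n1, x1 = 100, 60        # sample 1: 60 of 100 have the characteristic
n2, x2 = 120, 48        # sample 2: 48 of 120 have the characteristic

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                     # pooled proportion under H0
z = (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

# The same data as a 2x2 table [[a, b], [c, d]].
a, b, c, d = x1, n1 - x1, x2, n2 - x2
n = a + b + c + d
chi2 = n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))

print(round(z, 4), round(z * z, 4), round(chi2, 4))
```

The squared z statistic and the uncorrected chi-square statistic agree, which is why the homogeneity test for two groups and the two-sided test of two proportions lead to the same decision.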

Fisher's exact test

data severe;
  input treat $ outcome $ count;
  cards;
Test f 10
Test u 2
Control f 2
Control u 4
;
proc freq order=data;
  tables treat*outcome / chisq nocol;
  weight count;
run;

The SAS System

The FREQ Procedure

Table of treat by outcome

treat      outcome
Frequency |
Percent   |
Row Pct   | f      | u      | Total
----------+--------+--------+
Test      |     10 |      2 |     12
          |  55.56 |  11.11 |  66.67
          |  83.33 |  16.67 |
----------+--------+--------+
Control   |      2 |      4 |      6
          |  11.11 |  22.22 |  33.33
          |  33.33 |  66.67 |
----------+--------+--------+
Total           12        6       18
             66.67    33.33   100.00

Statistics for Table of treat by outcome

Statistic                      DF    Value     Prob
---------------------------------------------------
Chi-Square                      1    4.5000    0.0339
Likelihood Ratio Chi-Square     1    4.4629    0.0346
Continuity Adj. Chi-Square      1    2.5313    0.1116
Mantel-Haenszel Chi-Square      1    4.2500    0.0393
Phi Coefficient                      0.5000
Contingency Coefficient              0.4472
Cramer's V                           0.5000

WARNING: 75% of the cells have expected counts less than 5. Chi-Square may not be a valid test.

Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)        10
Left-sided Pr <= F          0.9961
Right-sided Pr >= F         0.0573

Table Probability (P)       0.0533
Two-sided Pr <= P           0.1070

Sample Size = 18

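Exact test: table probabilities

H0: the two variables are independent (homogeneous) vs. H1: not H0

Table cell
(1,1)   (1,2)   (2,1)   (2,2)   Prob
12      0       0       6       .0001
11      1       1       5       .0039
10      2       2       4       .0533
9       3       3       3       .2370
8       4       4       2       .4000
7       5       5       1       .2560
6       6       6       0       .0498

Probability of the observed table:

$$P = \frac{12!\ 12!\ 6!\ 6!}{10!\ 2!\ 2!\ 4!\ 18!} = 0.0533$$

• One-tailed p-value: p = 0.0533 + 0.0039 + 0.0001 = 0.0573

• Two-tailed p-value (all tables no more probable than the observed one): p = 0.0533 + 0.0039 + 0.0001 + 0.0498 = 0.1071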
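The table probabilities and both p-values can be reproduced from the hypergeometric distribution. The Python sketch below uses only `math.comb`; the helper name `table_prob` is hypothetical, and the small relative tolerance in the two-tailed sum is an implementation detail added to guard against floating-point ties.

```python
from math import comb

def table_prob(a, row1, col1, n):
    """Hypergeometric probability that cell (1,1) equals a, given fixed margins."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)

row1, col1, n = 12, 12, 18      # Test row total, 'f' column total, sample size
a_obs = 10                      # observed (1,1) cell

lo, hi = max(0, row1 + col1 - n), min(row1, col1)
probs = {a: table_prob(a, row1, col1, n) for a in range(lo, hi + 1)}

p_obs = probs[a_obs]
one_tailed = sum(p for a, p in probs.items() if a >= a_obs)
# Two-tailed: add up every table that is no more probable than the observed one.
two_tailed = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))

print(round(p_obs, 4), round(one_tailed, 4), round(two_tailed, 4))
```

The unrounded two-tailed sum is about 0.1070, matching the SAS output; the 0.1071 in the hand calculation comes from adding the rounded table probabilities.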

> fisher.test(matrix(c(7, 3, 5, 6), 2, 2), alternative = "greater")

        Fisher's Exact Test for Count Data

data:  matrix(c(7, 3, 5, 6), 2, 2)
p-value = 0.2449
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
 0.4512625       Inf
sample estimates:
odds ratio
  2.661251

> matrix(c(7, 3, 5, 6), 2, 2)
     [,1] [,2]
[1,]    7    5
[2,]    3    6

McNemar test: matched pairs

data one;
  input hus_resp $ wif_resp $ no;
  datalines;
yes yes 20
yes no 5
no yes 10
no no 10
;
run;
proc freq;
  tables hus_resp*wif_resp / agree;
  weight no;
run;

We do not reject "H0: the approval rates of husbands and wives are the same".

Kappa = 1: perfect agreement; Kappa > 0.8: excellent agreement; Kappa > 0.4: moderate agreement.

The confidence interval for kappa does not include 0, so we reject the null hypothesis K = 0 at the 95% confidence level.
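The slides show only the SAS call, so the following Python sketch spells out the two quantities that the `agree` option reports for this table. It assumes the standard textbook definitions, which are not written out in the source: the McNemar statistic (b - c)^2 / (b + c) on the discordant pairs, and Cohen's kappa as (observed agreement - chance agreement) / (1 - chance agreement). Variable names are illustrative.

```python
from math import erfc, sqrt

# Rows: husband (yes, no); columns: wife (yes, no).
table = [[20, 5], [10, 10]]
n = sum(map(sum, table))

# McNemar statistic uses only the discordant pairs.
b, c = table[0][1], table[1][0]
mcnemar = (b - c) ** 2 / (b + c)
p_value = erfc(sqrt(mcnemar / 2))     # upper tail of chi-square with 1 df

# Cohen's kappa: agreement beyond what the margins alone would produce.
p_observed = (table[0][0] + table[1][1]) / n
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
p_chance = sum(r * k for r, k in zip(row_totals, col_totals)) / n**2
kappa = (p_observed - p_chance) / (1 - p_chance)

print(round(mcnemar, 4), round(p_value, 4), round(kappa, 4))
```

The McNemar p-value is well above 0.05, in line with the conclusion that equal approval rates cannot be rejected, while kappa is positive but falls short of the 0.4 threshold quoted for moderate agreement.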

11.6 Relative risk, odds ratio, and Mantel-Haenszel statistics

• Observational study
• Prospective study
• Retrospective study

Relative risk

                Disease
Risk factor     Yes      No       Total
Yes             a        b        a + b
No              c        d        c + d
Total           a + c    b + d    n

• $$RR = \frac{a / (a + b)}{c / (c + d)}$$

• $$\text{s.e.}(\ln RR) = \sqrt{\frac{1}{a} + \frac{1}{c} - \frac{1}{a + b} - \frac{1}{c + d}}$$

• Confidence interval on the log scale: $$\ln RR \pm z_{1 - \alpha/2} \cdot \text{s.e.}(\ln RR)$$

• Back-transformed: $$e^{\ln RR \,\pm\, z_{1 - \alpha/2} \cdot \text{s.e.}(\ln RR)} = RR \cdot e^{\pm z_{1 - \alpha/2} \cdot \text{s.e.}(\ln RR)}$$

Example 11.6.1

Table 11.6.2

            Disease progress
Smoking     Yes     No      Total
Yes         26      380     406
No          178     3386    3564
Total       204     3766    3970

$$RR = \frac{26 / 406}{178 / 3564} = \frac{0.064}{0.050} = 1.28$$

$$1.2822 \cdot \exp\!\left(-1.96 \sqrt{\frac{1}{26} + \frac{1}{178} - \frac{1}{406} - \frac{1}{3564}}\right) = 0.861$$

$$1.2822 \cdot \exp\!\left(1.96 \sqrt{\frac{1}{26} + \frac{1}{178} - \frac{1}{406} - \frac{1}{3564}}\right) = 1.910$$

The 95% CI (0.861, 1.910) includes 1.

data preg;
  input smoke $ preg $ count;
  cards;
smoke early 26
smoke abnormal 380
nonsmoke early 178
nonsmoke abnormal 3386
;
proc freq order=data;
  weight count;
  tables smoke*preg / measures chisq;
run;
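The relative risk and its 95% confidence interval for Example 11.6.1 can be checked with a few lines of Python. This is a sketch with illustrative names; it uses the usual standard error of ln RR, sqrt(1/a - 1/(a+b) + 1/c - 1/(c+d)), which reproduces the limits 0.861 and 1.910 quoted in the example.

```python
from math import exp, sqrt

a, b = 26, 380        # smokers: disease progress yes / no
c, d = 178, 3386      # non-smokers: disease progress yes / no

rr = (a / (a + b)) / (c / (c + d))
se_log_rr = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))

z = 1.96              # two-sided 95% normal quantile used in the example
lower = rr * exp(-z * se_log_rr)
upper = rr * exp(z * se_log_rr)

print(round(rr, 4), round(lower, 3), round(upper, 3))
```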

Odds ratio

                Sample
Risk factor     Case     Control   Total
Yes             a        b         a + b
No              c        d         c + d
Total           a + c    b + d     n

• Odds = p / (1 - p)

• Odds in the case group: [a / (a + c)] / [c / (a + c)] = a / c

• Odds in the control group: [b / (b + d)] / [d / (b + d)] = b / d

• $$OR = \frac{a / c}{b / d} = \frac{ad}{bc}$$

• $$\text{s.e.}(\ln OR) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$

• Confidence interval on the log scale: $$\ln OR \pm z_{1 - \alpha/2} \cdot \text{s.e.}(\ln OR)$$

• Back-transformed: $$e^{\ln OR \,\pm\, z_{1 - \alpha/2} \cdot \text{s.e.}(\ln OR)} = OR \cdot e^{\pm z_{1 - \alpha/2} \cdot \text{s.e.}(\ln OR)}$$

Example 11.6.2

Table 11.6.4

            Obesity
Smoking     Yes     No      Total
Yes         52      352     404
No          78      3486    3564
Total       130     3838    3968

$$OR = \frac{52 \cdot 3486}{352 \cdot 78} = 6.60$$

$$6.6023 \cdot \exp\!\left(-1.96 \sqrt{\frac{1}{52} + \frac{1}{352} + \frac{1}{78} + \frac{1}{3486}}\right) = 4.571$$

$$6.6023 \cdot \exp\!\left(1.96 \sqrt{\frac{1}{52} + \frac{1}{352} + \frac{1}{78} + \frac{1}{3486}}\right) = 9.536$$

data obe;
  input smoke $ case $ count;
  cards;
smoke case 52
smoke control 352
nonsmoke case 78
nonsmoke control 3486
;
proc freq data=obe;
  weight count;
  tables smoke*case / measures;
run;
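The odds ratio and its 95% confidence interval for Example 11.6.2 follow the same pattern as the relative risk, with the standard error sqrt(1/a + 1/b + 1/c + 1/d). A minimal Python sketch with illustrative names:

```python
from math import exp, sqrt

a, b = 52, 352        # smokers: obese / not obese
c, d = 78, 3486       # non-smokers: obese / not obese

odds_ratio = (a * d) / (b * c)
se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)

z = 1.96              # two-sided 95% normal quantile used in the example
lower = odds_ratio * exp(-z * se_log_or)
upper = odds_ratio * exp(z * se_log_or)

print(round(odds_ratio, 4), round(lower, 3), round(upper, 3))
```

Here the whole interval lies above 1, unlike the relative-risk interval in Example 11.6.1.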

Mantel-Haenszel statistic

The confounding variable defines k strata. For stratum i:

                Sample
Risk factor     Case         Control      Total
Exposed         a_i          b_i          a_i + b_i
Not exposed     c_i          d_i          c_i + d_i
Total           a_i + c_i    b_i + d_i    n_i

Expected frequency in stratum i:

$$e_i = \frac{(a_i + b_i)(a_i + c_i)}{n_i}$$

$$v_i = \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$$

$$\chi^2_{MH} = \frac{\left(\sum_{i=1}^{k} a_i - \sum_{i=1}^{k} e_i\right)^2}{\sum_{i=1}^{k} v_i} \sim \chi^2_1$$

Common odds ratio:

$$OR_{MH} = \frac{\sum_{i=1}^{k} (a_i d_i / n_i)}{\sum_{i=1}^{k} (b_i c_i / n_i)}$$

Example 11.6.3

Age <= 55

Risk factor    OCAD patients   Control   Total
Exposed        21              11        32
Unexposed      16              6         22
Total          37              17        54

Age >= 56

Risk factor    OCAD patients   Control   Total
Exposed        50              14        64
Unexposed      18              6         24
Total          68              20        88

data one;
  input age risk case n;
  cards;
1 1 1 21
1 1 2 11
1 2 1 16
1 2 2 6
2 1 1 50
2 1 2 14
2 2 1 18
2 2 2 6
;
run;
proc freq;
  tables age*risk*case / measures cmh;
  weight n;
run;
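The Mantel-Haenszel formulas can be applied to the two age strata of Example 11.6.3 directly. The Python sketch below accumulates the per-stratum terms; the variable names are illustrative, and the numerical results are computed from the formulas rather than quoted from the slides, which show only the SAS call.

```python
# Each stratum is (a, b, c, d): exposed cases, exposed controls,
# unexposed cases, unexposed controls.
strata = [
    (21, 11, 16, 6),    # age <= 55
    (50, 14, 18, 6),    # age >= 56
]

sum_a = sum_e = sum_v = or_num = or_den = 0.0
for a, b, c, d in strata:
    n = a + b + c + d
    sum_a += a
    sum_e += (a + b) * (a + c) / n                       # expected a_i under H0
    sum_v += (a + b) * (c + d) * (a + c) * (b + d) / (n**2 * (n - 1))
    or_num += a * d / n
    or_den += b * c / n

chi2_mh = (sum_a - sum_e) ** 2 / sum_v    # compare with chi-square, 1 df
or_mh = or_num / or_den                   # common (Mantel-Haenszel) odds ratio

print(round(chi2_mh, 4), round(or_mh, 4))
```

With these counts the statistic is far below the 1-df critical value of 3.84 and the common odds ratio is close to 1, so the stratified data show no evidence of an association between the risk factor and OCAD.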

Page 2: F2 Chi-square dist’n - Seoul National Universityhosting03.snu.ac.kr/~hokim/int/2019/chap_11.pdf · 2019-05-27 · 이항분포의가정하에서기대도수=기대상대도수*총합

111 분포의 수리적 특징

bull 의 응용 (Usage)적합도 검정(Tests of Goodness-of-Fit)독립성 검정(Tests of Independence)동질성 검정(Tests of Homogeneity)

2

2

1

2 2

1

2

~ independent (01)

~

~ ( )

nn

i ni

i

Z Z N

Z

Yie g Z Y N

i

의 정의 (definition)

2

112 적합도 검정(Goodness-of- fit)

bull 우리의 data가 가설상의 분포(정규분포 이항분포 포아슨 분포 등)와 일치하는가

bull Data = theoretical distribution (normal binomial Poisson etc)

H0 정규분포를 따른다 vs H1 not H0

콜레스테롤 수치(mgdl) 대상자 수1-59 2

6-109 2

11-159 7

16-209 19

21-259 4

26-309 6

31-359 3

36-409 4

bull보기 1121(Normal distrsquon)

2

2 2

1

~k

i i

k ri i

O E

E

Oi E

i O Ei i

관측치(observed) 기대치(expected)

r 제약조건 ( )+추정하는 모수의 개수

restriction parameters estimated

interval expected rel freq expected freq

bull 119909 =35∙2+85∙2+⋯+385∙4

2+2+⋯+4= 2105319

bull 1199042 =35minus2105319 2∙2+ 85minus2105319 2∙2+⋯+ 385minus2105319 2∙4

2+2+⋯+4minus1

= 759482

119904 = 759482 = 871483

계급구간(interval)

표준화된 계급구간(standardized interval)

상대도수의 기대치(relative frequency)

기대도수(expected frequency)

lt1 001069 0502651~ 59 minus230104 003136 14740

6~109minus172731 008228 38672

11~159minus115357 015667 73637

16~209minus057984 021655 10178

21~259minus000610 021729 10213

26~309056763 015828 74393

31~359 114137 008370 3933736~409 171510 003212 15096

ge41 228884 001104 051909

19766

202877

P(Zltminus230104)

P(-230ltZltminus173)

P(172ltZlt229)

P(Zgt229)

119874119894 minus 1198641198942119864119894 =14762 gt qchisq(0055lowertail=F)= 11071

-gt Reject Ho data ~ normal

constraints ( 119864119894 = 119874119894 120583 = 119883 120590 = 119904)=3 -gt df=8minus3=5

계급구간 관측도수(119926119946) 기대도수(119916119946)119926119946 minus119916119946

120784119916119946

lt 1 0 050265 2760810-4

1-59 2 147406-109 2 38672

090156

11-159 7 73637001796

16-209 19 10178 7646621-259 4 10213 3779426-309 6 74393 02784831-359 3 39337 022162

36-4094 15096 19156

ge 41 0 051909Total 47 47 14762

4

197662

20287

보기 1122 이항분포 (binomial distrsquon)

H0 자료는 이항분포를 따른다 (적합도검정)

각 의사별 신약을 선호하는 환자의 수 의사의 수 환자의 수

0 5 0

1 6 6

2 8 16

3 10 30

4 10 40

5 15 75

6 17 102

7 10 70

8 10 80

9 9 81

10 이상 0 0

합 100 500

이항분포의 가정하에서 기대도수=기대상대도수총합

Expected freq under binomial distrsquon=probtotal

2525

( ) (1 ) 012 25

ˆ 500 2500 02

x xP X x p p xx

p

각 의사별 신약을 선호하는 환자의 수 의사의 수(119926119946) 기대 상대도수 기대도수(119916119946)

0 5 000378 037779

1 6 002361 23612

2 8 007083 70836

3 10 013577 13577

4 10 018668 18668

5 15 019602 19602

6 17 016335 16335

7 10 011084 11084

8 10 006235 62349

9 9 002944 29442

10 이상 0 001733 17332

합계 100 10000 10000

11 27390

1205682 =11minus27390 2

27390+8minus70836 2

70836+⋯+

0minus17332 2

17332= 47678 gt qchisq(00058lowertail=F)= 21955

We reject Ho Data ~ Binomial

df= 10 minus 2 = 8 constraints 119864119894 = 119874119894 119901 = 119901

예제 1123 포아슨분포 (Poisson distrsquon)

포아슨분포의 가정 하에서 상대도수의 기대치

Expected relative freq under Poisson distrsquon

(X ) 012

xeP x x

x

=3 known

H0 병원의 하루 응급환자의 수는 포아송 분포를 따른다

일일 응급환자 수 날짜 수

0 5

1 14

2 15

3 23

4 16

5 9

6 3

7 3

8 1

9 1

10 이상 0

합계 90

응급환자수 날짜 수(119926119946) 기대 상대도수 기대도수(119916119946)119926119946 minus 119916119946

120784

119916119946

0 5 004979 44808 006015

1 14 014936 13443 002312

2 15 022404 20164 132240

3 23 022404 20164 039895

4 16 016803 15123 005088

5 9 010082 90737 000060

6 3 005041 45368 052060

7 3 002160 19444 057313

8 1 000810 072914

0804829 1 000270 024305

10 이상 0 000110 009922

합계 90 1000 9000 3755

107142

1205682 = 119874119894minus119864119894

2

119864119894=5minus44808 2

44808+⋯+

2minus10714 2

10714= 3755 lt 1198832(095 119889119891 = 9 minus 1 = 8) = 15507

We cannot reject Ho Data ~ Poisson

2 22 2

9 1

(5 450) (2 108)3664 15557 (095)

450 108

113 독립성검정Tests of independence

bull 분할표(contingency table)

1205682 =

119894=1

119903

119895=1

119888119874119894119895 minus 119864119894119895

2

119864119894119895~1205942 119889119891 = 119903 minus 1 119888 minus 1 119864119894119895=

119899119894 ∙ 119899119895119899

두 번째 범주형 변수 첫 번째 범주형 변수

120783 120784 120785 ⋯ 119940 합계

120783 11989911 11989912 11989913 ⋯ 1198991119888 1198991

120784 11989921 11989922 11989923 ⋯ 1198992119888 1198992

120785 11989931 11989932 11989933 ⋯ 1198993119888 1198993

⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮

119955 1198991199031 1198991199032 1198991199033 ⋯ 119899119903119888 119899119903

합계 1198991 1198992 1198993 ⋯ 119899119888 119899

bull예제 1131

치료방법(treatment)

재발여부(relapse)합계(total)

Yes No

A 294 (77255) 921 (1137745) 1215

B 98 (188210) 2862 (2771790) 2960

C 50 (198002) 3064 (2915998) 3114

D 203 (181533) 2652 (2673467) 2855

합계 645 9499 10144

1205682 = 119874 minus 119864 2

119864

=294 minus 77255 2

77255+921 minus 1137745 2

1137745+⋯ = 81641

gt 1198832(095 119889119891 = 3) = 7815df= 119903 minus 1 119888 minus 1 = 4 minus 1 2 minus 1 = 3

Reject (Ho treatment and relapse are independent ) -gt They are not independent

gt datalt-astable(cbind(c(2949850203)c(921286230642652)))

gt dimnames(data)lt-list(trt=c(ABCD)re=c(YN))

gt data

re

trt Y N

A 294 921

B 98 2862

C 50 3064

D 203 2652

gt chisqtest(data)

Pearsons Chi-squared test

data data

X-squared = 81641 df = 3 p-value lt 22e-16

data reinput trt $ re $ count cardsA Y 294 A N 921 B Y 98 B N 2862C Y 50 C N 3064D Y 203 D N 2652proc freq data=reweight counttables trtremeasures chisqrun

bull 작은 기대도수 (small expected freq)기대치 5미만의 cell수가 전체 20를 넘지 않으며 최소기대치가 1이상이면 무관하다 (If min gt1 and cells lt5 are less than 20 then not a problem)

bull 2Ⅹ2 분할표 (table)nlt20 or 20ltnlt49 그리고 기대도수 5이하 일경우에는 -test를 하지 말라

-test is not valid if nlt20 or (20ltnlt49) and expected freq of one or more cells lt 5

2

2

bull2Ⅹ2 table

1205682 =233(131∙36minus52∙14)2

145∙88∙183∙50= 317391 gtgt 1962

Strong evidence to reject (HoSmoking and drinking are independent)

두번째분류기준

첫번째 분류기준

120783 120784 합계

120783 119886 119887 119886 + 119887

120784 119888 119889 119888 + 119889

합계 119886 + 119888 119887 + 119889 119899

SmokingDrinking

Yes No total

Yes 131 52 183

No 14 36 50

Total 145 88 233

②두 집단의 확률에 대한 비교

(Comparing two probabilities)

1 1 2

0 1 2 1 2

1 1 2

1 2

ˆ100 60 120 040

60 100 40 12004909

100 120

060 040295469 196 significant

4903 5091 4903 5091

100 120

ˆ ( )

(1 ) (1 )

a

n p n

p

Z

H p p H p p

p p pZ

p p p p

n n

e g

2p

2p

Yates adjustment (보정)

bull 120568corrected2 =

119899( 119886119889minus119887119888 minus05119899)2

(119886+119888)(119887+119889)(119886+119887)(119888+119889)

bull 1205682 =233( 131∙36minus52∙14 minus05∙233)2

145∙88∙183∙50= 299118

114 동질성 검정 (homogeneity test)

bull 동질성 검정 각 각의 모집단에서 독립적으로 뽑은 표본들의 분포가 서로 동질의 것인가

bull Homogeneity test Are two samples selected from one population

bull 독립성 검정 한 모집단에서 표본 추출 행과 열의 합계는 조절이 아니고 우연히 나타난다

bull Independent test selected from a population Marginal totals are randomly determined

bull 독립성 검정 vs 동질성 검정

bull Independent test vs homogeneity test

bull예제 1141

bull 가설 Patient groups with on-set age lt=18 and age gt 18 have same distributions of family history

gt datalt-astable(cbind(c(28194153)c(35384460)))

gt dimnames(data)lt-list(trt=c(ABCD)re=c(EarlyLater))

gt chisqtest(data)

Pearsons Chi-squared test

data data

X-squared = 36216 df = 3 p-value = 03053

-gt Do not reject Ho

0H

Family History lt=18 gt 18 Total

A 28 35 63

B 19 38 57

C 41 44 85

D 53 60 113

합계 141 177 318

gt datare

trt Early LaterA 28 35B 19 38C 41 44D 53 60

동질성 검정과 모비율 검정

1198670 1199011 = 1199012 119907119904 119867119860 ∶ 1199011 ne 1199012 1198991 = 100 1199011 = 060 1198992 = 120 1199012 = 040

119911 = 1199011minus 1199012 minus( 1199011minus 1199012)0 119901(1minus 119901)

1198991+ 119901(1minus 119901)

1198992

119901 =060∙100+040∙120

100+120=108

220= 049091

119911 =060 minus 040

049091 ∙ 050909100 +

049091 ∙ 050909120

= 295468

1205682 =220 ∙ [60 ∙ 72 minus 40 ∙ 48]2

108 ∙ 112 ∙ 100 ∙ 120= 87302

-gt Reject Ho

표본특성

1 2 합계1 60 40 100

2 48 72 120

합계 108 112 220

data severe

input treat $ outcome $ count

cards

Test f 10

Test u 2

Control f 2

Control u 4

proc freq order=data

tables treatoutcome chisq nocol

weight count

run

Fisherrsquos Exact Test

SAS 시스템

FREQ 프로시저

treat outcome 교차표

treat outcome

빈도|백분율|

행 백분율|f |u | 총합-----------+--------+--------+

Test | 10 | 2 | 12

| 5556 | 1111 | 6667

| 8333 | 1667 |

-----------+--------+--------+

Control | 2 | 4 | 6

| 1111 | 2222 | 3333

| 3333 | 6667 |

-----------+--------+--------+

총합 12 6 18

6667 3333 10000

treat outcome 테이블에 대한 통계량

통계량 자유도 값 확률값----------------------------------------------------------카이제곱 1 45000 00339우도비 카이제곱 1 44629 00346연속성 수정 카이제곱 1 25313 01116Mantel-Haenszel 카이제곱 1 42500 00393파이 계수 05000분할 계수 04472크래머의 V 05000

경고 셀들의 75가 5보다 작은 기대도수를 가지고 있습니다카이제곱 검정은 올바르지 않을 수 있습니다

Fisher의 정확 검정----------------------------(11) 셀 빈도(F) 10하단측 p값 Pr lt= F 09961상단측 p값 Pr gt= F 00573

테이블 확률 (P) 00533양측 p값 Pr lt= P 01070

표본 크기 = 18

Exact Test

Table Cell

(11) (12) (21) (22) Prob

12 0 0 6 0001

11 1 1 5 0039

10 2 2 4 0533

9 3 3 3 2370

8 4 4 2 4000

7 5 5 1 2560

6 6 6 0 0498

=12 12 6 6

10 2 2 4 18

Table Probabilities

bull One-tailed p-value

bull Two-tailed p-value

00533 00039 00001 00573p

00533 00039 00001 00498 01071p

H0 두 변수는 서로 독립(동질)이다 vs H1 not H0

gt fishertest(matrix(c(7356)22)alternative=greater)

Fishers Exact Test for Count Data

data matrix(c(7 3 5 6) 2 2)

p-value = 02449

alternative hypothesis true odds ratio is greater than 1

95 percent confidence interval

04512625 Inf

sample estimates

odds ratio

2661251

gt matrix(c(7356)22)

[1] [2]

[1] 7 5

[2] 3 6

McNemar Test Matched pairs

data one

input hus_resp $ wif_resp $ no

datalines

yes yes 20

yes no 5

no yes 10

no no 10

run

proc freq

tables hus_respwif_resp agree

weight no

run

ldquoHo husband and wife 의 approval rates는 같다rdquo를 기각하지 못함

We do not reject ldquoHo approval rates of husband and wife are the samerdquo

신뢰구간이 0을 포함하지 않으므로 K=0 이라는 귀무가설을 95 신뢰수준에서 기각한다

Kappa=1 gtgt perfect agreement Kappa gt 08 gtgt excellent agreement Kappa gt 04 gtgt moderate agreement

CI does not include 0 -gt we reject the null hypo of K=0 by 95 confidence level

116 Relative risk odds ratio and Mantel-Haenszel statistics

bull 관찰연구 (observational study)

bull 전향적 연구 (prospective study)

bull 후향적 연구 (retrospective study)

상대위험도 (Relative risk)Disease

Risk O X

O 119886 119887 119886 + 119887

X 119888 119889 119888 + 119889

119886 + 119888 119887 + 119889 119899

bull 119877119877 =119886

119886+119887119888

119888+119889

bull s e ln 119877119877 =1

119886+1

119888minus

1

119886+119887+

1

119888+119889

bull ln 119877119877 plusmn 1199111minus1205722 ∙ s e ln 119877119877

bull 119890ln 119877119877 plusmn119911

1minus1205722∙se ln 119877119877

= 119877119877 ∙ 119890^ plusmn1199111minus1205722 ∙ s e ln 119877119877

예제 1161 Relative risk odds ratio and Mantel-Haenszel statistics

119877119877 =26406

1783564=0064

0050= 128

12822 ∙ 119890^ minus196 ∙1

26+

1

178minus

1

406+

1

3564= 0861

12822 ∙ 119890^ 196 ∙1

26+

1

178minus

1

406+

1

3564= 1910

95 CI includes 1

Smoking Disease progress

Yes NoYes 26 380 406

No 178 3386 3564

204 3766 3970

표1162

data preg

input smoke $ preg $ count

cards

smoke early 26

smoke abnormal 380

nonsmoke early 178

nonsmoke abnormal 3386

proc freq order=data

weight count

tables smokepregmeasures chisq

run

오즈비 (Odds Ratio)sample

Risk case control

O 119886 119887 119886 + 119887

X 119888 119889 119888 + 119889

119886 + 119888 119887 + 119889 119899

bull Odds = p(1-p)

bull 환자집단 오즈 [119886(119886 + 119888)][119888(119886 + 119888)] = 119886119888

bull 정상집단 오즈 [119887(119887 + 119889)][119889(119887 + 119889)] = 119887119889

bull 119874119877 =119886

119888119887

119889

=119886119889

119887119888s e ln 119874119877 =

1

119886+1

119887+1

119888+1

119889

CI ln 119874119877 plusmn 1199111minus1205722 ∙ s e ln 119874119877

bull 119890ln 119874119877 plusmn119911

1minus1205722∙se ln OR

=

= 119874119877 ∙ 119890^ plusmn1199111minus1205722 ∙ s e ln 119874119877

예제 1162

119874119877 =52 ∙ 3486

352 ∙ 78= 660

66023 ∙ 119890^ minus196 ∙1

52+

1

352+1

78+

1

3486= 4571

66023 ∙ 119890^ 196 ∙1

52+

1

352+1

78+

1

3486= 9536

Smoking

Obesity

Yes No

Yes 52 352 404

No 78 3486 3564

130 3838 3968

표1164

data obe

input smoke $ case $ count

cards

smoke case 52

smoke control 352

nonsmoke case 78

nonsmoke control 3486

proc freq data=obe

weight count

tables smokecasemeasures

run

Mantel-Haenszel 통계량

교란변수가 k개의 층

층 i의 기대도수 119890119894=119886119894+119887119894 119886119894+119888119894

119899119894

119907119894 =119886119894 + 119887119894 119888119894 + 119889119894 119886119894 + 119888119894 119887119894 + 119889119894

1198991198942(119899119894 minus 1)

1205941198721198672 =

( 119894=1119896 119886119894 minus 119894=1

119896 119890119894)2

119894=1119896 119907119894

~12059412 119874119877119872119867 =

119894=1119896 (119886119894119889119894119899119894)

119894=1119896 (119887119894119888119894119899119894)

공통오즈비

Risk

Strata =i

case control

Exposed 119886119894 119887119894 119886119894 + 119887119894

Not 119888119894 119889119894 119888119894 + 119889119894

119886119894 + 119888119894 119887119894 + 119889119894 119899119894

예제 1163

Age lt= 55

Risk OCAD patients control Total

Exposed 21 11 32

Unexposed 16 6 22

합계 37 17 54

Age gt= 56

Risk OCAD 환자 정상 합계

Exposed 50 14 64

Unexposed 18 6 24

68 20 88

data one

input age risk case n

cards

1 1 1 21

1 1 2 11

1 2 1 16

1 2 2 6

2 1 1 50

2 1 2 14

2 2 1 18

2 2 2 6

run

proc freq

tables ageriskcasemeasures CMH

weight n

run

Page 3: F2 Chi-square dist’n - Seoul National Universityhosting03.snu.ac.kr/~hokim/int/2019/chap_11.pdf · 2019-05-27 · 이항분포의가정하에서기대도수=기대상대도수*총합

112 적합도 검정(Goodness-of- fit)

bull 우리의 data가 가설상의 분포(정규분포 이항분포 포아슨 분포 등)와 일치하는가

bull Data = theoretical distribution (normal binomial Poisson etc)

H0 정규분포를 따른다 vs H1 not H0

콜레스테롤 수치(mgdl) 대상자 수1-59 2

6-109 2

11-159 7

16-209 19

21-259 4

26-309 6

31-359 3

36-409 4

bull보기 1121(Normal distrsquon)

2

2 2

1

~k

i i

k ri i

O E

E

Oi E

i O Ei i

관측치(observed) 기대치(expected)

r 제약조건 ( )+추정하는 모수의 개수

restriction parameters estimated

interval expected rel freq expected freq

bull 119909 =35∙2+85∙2+⋯+385∙4

2+2+⋯+4= 2105319

bull 1199042 =35minus2105319 2∙2+ 85minus2105319 2∙2+⋯+ 385minus2105319 2∙4

2+2+⋯+4minus1

= 759482

119904 = 759482 = 871483

계급구간(interval)

표준화된 계급구간(standardized interval)

상대도수의 기대치(relative frequency)

기대도수(expected frequency)

lt1 001069 0502651~ 59 minus230104 003136 14740

6~109minus172731 008228 38672

11~159minus115357 015667 73637

16~209minus057984 021655 10178

21~259minus000610 021729 10213

26~309056763 015828 74393

31~359 114137 008370 3933736~409 171510 003212 15096

ge41 228884 001104 051909

19766

202877

P(Zltminus230104)

P(-230ltZltminus173)

P(172ltZlt229)

P(Zgt229)

119874119894 minus 1198641198942119864119894 =14762 gt qchisq(0055lowertail=F)= 11071

-gt Reject Ho data ~ normal

constraints ( 119864119894 = 119874119894 120583 = 119883 120590 = 119904)=3 -gt df=8minus3=5

계급구간 관측도수(119926119946) 기대도수(119916119946)119926119946 minus119916119946

120784119916119946

lt 1 0 050265 2760810-4

1-59 2 147406-109 2 38672

090156

11-159 7 73637001796

16-209 19 10178 7646621-259 4 10213 3779426-309 6 74393 02784831-359 3 39337 022162

36-4094 15096 19156

ge 41 0 051909Total 47 47 14762

4

197662

20287

보기 1122 이항분포 (binomial distrsquon)

H0 자료는 이항분포를 따른다 (적합도검정)

각 의사별 신약을 선호하는 환자의 수 의사의 수 환자의 수

0 5 0

1 6 6

2 8 16

3 10 30

4 10 40

5 15 75

6 17 102

7 10 70

8 10 80

9 9 81

10 이상 0 0

합 100 500

이항분포의 가정하에서 기대도수=기대상대도수총합

Expected freq under binomial distrsquon=probtotal

2525

( ) (1 ) 012 25

ˆ 500 2500 02

x xP X x p p xx

p

각 의사별 신약을 선호하는 환자의 수 의사의 수(119926119946) 기대 상대도수 기대도수(119916119946)

0 5 000378 037779

1 6 002361 23612

2 8 007083 70836

3 10 013577 13577

4 10 018668 18668

5 15 019602 19602

6 17 016335 16335

7 10 011084 11084

8 10 006235 62349

9 9 002944 29442

10 이상 0 001733 17332

합계 100 10000 10000

11 27390

1205682 =11minus27390 2

27390+8minus70836 2

70836+⋯+

0minus17332 2

17332= 47678 gt qchisq(00058lowertail=F)= 21955

We reject Ho Data ~ Binomial

df= 10 minus 2 = 8 constraints 119864119894 = 119874119894 119901 = 119901

예제 1123 포아슨분포 (Poisson distrsquon)

포아슨분포의 가정 하에서 상대도수의 기대치

Expected relative freq under Poisson distrsquon

(X ) 012

xeP x x

x

=3 known

H0 병원의 하루 응급환자의 수는 포아송 분포를 따른다

일일 응급환자 수 날짜 수

0 5

1 14

2 15

3 23

4 16

5 9

6 3

7 3

8 1

9 1

10 이상 0

합계 90

응급환자수 날짜 수(119926119946) 기대 상대도수 기대도수(119916119946)119926119946 minus 119916119946

120784

119916119946

0 5 004979 44808 006015

1 14 014936 13443 002312

2 15 022404 20164 132240

3 23 022404 20164 039895

4 16 016803 15123 005088

5 9 010082 90737 000060

6 3 005041 45368 052060

7 3 002160 19444 057313

8 1 000810 072914

0804829 1 000270 024305

10 이상 0 000110 009922

합계 90 1000 9000 3755

107142

1205682 = 119874119894minus119864119894

2

119864119894=5minus44808 2

44808+⋯+

2minus10714 2

10714= 3755 lt 1198832(095 119889119891 = 9 minus 1 = 8) = 15507

We cannot reject Ho Data ~ Poisson

2 22 2

9 1

(5 450) (2 108)3664 15557 (095)

450 108

11.3 Tests of independence

• Contingency table

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij}-E_{ij})^2}{E_{ij}} \sim \chi^2\big(df = (r-1)(c-1)\big), \qquad E_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n}$$

Rows: first categorical variable. Columns: second categorical variable.

          1       2       3       ⋯    c       Total
1         n_11    n_12    n_13    ⋯    n_1c    n_1·
2         n_21    n_22    n_23    ⋯    n_2c    n_2·
3         n_31    n_32    n_33    ⋯    n_3c    n_3·
⋮         ⋮       ⋮       ⋮       ⋮    ⋮       ⋮
r         n_r1    n_r2    n_r3    ⋯    n_rc    n_r·
Total     n_·1    n_·2    n_·3    ⋯    n_·c    n

• Example 11.3.1

Observed frequencies, with expected frequencies in parentheses:

Treatment    Relapse: Yes       Relapse: No         Total
A            294  (77.255)       921 (1137.745)      1215
B             98 (188.210)      2862 (2771.790)      2960
C             50 (198.002)      3064 (2915.998)      3114
D            203 (181.533)      2652 (2673.467)      2855
Total        645                9499                10144

$$\chi^2 = \sum \frac{(O-E)^2}{E} = \frac{(294-77.255)^2}{77.255} + \frac{(921-1137.745)^2}{1137.745} + \cdots = 816.41 > \chi^2_{0.95}(df = 3) = 7.815$$

df = (r − 1)(c − 1) = (4 − 1)(2 − 1) = 3

Reject H0 (treatment and relapse are independent) -> they are not independent.

> data<-as.table(cbind(c(294,98,50,203),c(921,2862,3064,2652)))
> dimnames(data)<-list(trt=c("A","B","C","D"),re=c("Y","N"))
> data
   re
trt   Y    N
  A 294  921
  B  98 2862
  C  50 3064
  D 203 2652
> chisq.test(data)

        Pearson's Chi-squared test

data:  data
X-squared = 816.41, df = 3, p-value < 2.2e-16

data re;
input trt $ re $ count;
cards;
A Y 294
A N 921
B Y 98
B N 2862
C Y 50
C N 3064
D Y 203
D N 2652
;
proc freq data=re;
weight count;
tables trt*re / measures chisq;
run;
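The expected frequencies shown in parentheses in Example 11.3.1 follow from the margins alone. The sketch below recomputes them with `outer()` and rebuilds the statistic by hand; the object names are illustrative and not part of the original slides.

```r
tab <- matrix(c(294, 98, 50, 203,
                921, 2862, 3064, 2652),
              ncol = 2,
              dimnames = list(trt = c("A", "B", "C", "D"), re = c("Y", "N")))

n        <- sum(tab)
expected <- outer(rowSums(tab), colSums(tab)) / n   # E_ij = n_i. * n_.j / n

chisq_stat <- sum((tab - expected)^2 / expected)
df         <- (nrow(tab) - 1) * (ncol(tab) - 1)
critical   <- qchisq(0.95, df)

print(round(expected, 3))
cat("chi-square =", round(chisq_stat, 2), " df =", df, "\n")
```

For tables larger than 2×2, `chisq.test()` applies no continuity correction, so the hand computation and the built-in statistic should agree.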

• Small expected frequencies: the χ² approximation is not a problem as long as no more than 20% of the cells have an expected frequency below 5 and the minimum expected frequency is at least 1.

• 2×2 table: the χ² test is not valid if n < 20, or if 20 < n < 49 and one or more cells have an expected frequency below 5.
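The rule for small expected frequencies is easy to check mechanically. The helper below is hypothetical (its name and return format are not from the slides); it is applied to the 18-observation Test/Control table used in the Fisher's exact test example of this chapter.

```r
# Hypothetical helper: reports how the expected frequencies of a two-way table
# compare with the "at most 20% below 5, minimum at least 1" rule.
check_expected <- function(tab) {
  expected      <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  share_below_5 <- mean(expected < 5)     # fraction of cells with E < 5
  list(expected      = expected,
       share_below_5 = share_below_5,
       min_expected  = min(expected),
       rule_ok       = share_below_5 <= 0.20 && min(expected) >= 1)
}

# 2x2 table with n = 18 (Test/Control by outcome f/u), filled column by column.
small <- matrix(c(10, 2, 2, 4), nrow = 2)
res   <- check_expected(small)

print(res$expected)
cat("share of cells with E < 5:", res$share_below_5, " rule satisfied:", res$rule_ok, "\n")
```

For this table three of the four expected frequencies fall below 5, which matches the 75% warning in the SAS output for the same data.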

• 2×2 table

First criterion \ Second criterion    1        2        Total
1                                     a        b        a + b
2                                     c        d        c + d
Total                                 a + c    b + d    n

Smoking \ Drinking    Yes    No    Total
Yes                   131    52    183
No                     14    36     50
Total                 145    88    233

$$\chi^2 = \frac{n(ad-bc)^2}{(a+c)(b+d)(a+b)(c+d)} = \frac{233\,(131\cdot 36 - 52\cdot 14)^2}{145\cdot 88\cdot 183\cdot 50} = 31.7391 \gg 1.96^2$$

Strong evidence to reject H0: smoking and drinking are independent.

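② Comparing two probabilities

$$H_0: p_1 = p_2 \quad \text{vs} \quad H_a: p_1 \ne p_2$$

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\bar{p}(1-\bar{p})}{n_1} + \dfrac{\bar{p}(1-\bar{p})}{n_2}}}, \qquad \bar{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$$

e.g. $n_1 = 100$, $\hat{p}_1 = 0.60$, $n_2 = 120$, $\hat{p}_2 = 0.40$:

$$\bar{p} = \frac{0.60\cdot 100 + 0.40\cdot 120}{100 + 120} = 0.4909$$

$$Z = \frac{0.60 - 0.40}{\sqrt{\dfrac{0.4909\cdot 0.5091}{100} + \dfrac{0.4909\cdot 0.5091}{120}}} = 2.95469 > 1.96 \quad \text{(significant)}$$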

Yates adjustment (continuity correction)

• $$\chi^2_{corrected} = \frac{n\,(|ad-bc| - 0.5n)^2}{(a+c)(b+d)(a+b)(c+d)}$$

• $$\chi^2 = \frac{233\,(|131\cdot 36 - 52\cdot 14| - 0.5\cdot 233)^2}{145\cdot 88\cdot 183\cdot 50} = 29.9118$$
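Both 2×2 shortcut formulas, with and without the Yates adjustment, can be checked against `chisq.test()`. This is a sketch with illustrative names; `c2` is used for cell c only to avoid masking R's `c()` function.

```r
tab <- matrix(c(131, 14, 52, 36), nrow = 2,
              dimnames = list(smoking = c("Yes", "No"), drinking = c("Yes", "No")))
a <- tab[1, 1]; b <- tab[1, 2]; c2 <- tab[2, 1]; d <- tab[2, 2]
n <- sum(tab)
denom <- (a + c2) * (b + d) * (a + b) * (c2 + d)

chisq_plain <- n * (a * d - b * c2)^2 / denom
chisq_yates <- n * (abs(a * d - b * c2) - 0.5 * n)^2 / denom

# Cross-check against the built-in test, without and with continuity correction.
builtin_plain <- unname(chisq.test(tab, correct = FALSE)$statistic)
builtin_yates <- unname(chisq.test(tab, correct = TRUE)$statistic)

cat("plain:", round(chisq_plain, 4), " Yates:", round(chisq_yates, 4), "\n")
```

The correction always shrinks the statistic, so it makes the test more conservative; here the conclusion does not change.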

11.4 Tests of homogeneity

• Homogeneity test: samples are drawn independently from each population. Are their distributions the same, i.e. could the samples have come from one population?

• Independence test: a single sample is drawn from one population. The row and column totals are not fixed by design; they arise by chance.

• Independence test vs. homogeneity test

• Example 11.4.1

• Hypothesis: patient groups with onset age <= 18 and onset age > 18 have the same distribution of family history.

Family history    <= 18    > 18    Total
A                  28       35       63
B                  19       38       57
C                  41       44       85
D                  53       60      113
Total             141      177      318

> data<-as.table(cbind(c(28,19,41,53),c(35,38,44,60)))
> dimnames(data)<-list(trt=c("A","B","C","D"),re=c("Early","Later"))
> data
   re
trt Early Later
  A    28    35
  B    19    38
  C    41    44
  D    53    60
> chisq.test(data)

        Pearson's Chi-squared test

data:  data
X-squared = 3.6216, df = 3, p-value = 0.3053

-> Do not reject H0

Homogeneity test and test of two population proportions

$$H_0: p_1 = p_2 \quad \text{vs} \quad H_A: p_1 \ne p_2, \qquad n_1 = 100,\ \hat{p}_1 = 0.60,\ n_2 = 120,\ \hat{p}_2 = 0.40$$

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)_0}{\sqrt{\dfrac{\bar{p}(1-\bar{p})}{n_1} + \dfrac{\bar{p}(1-\bar{p})}{n_2}}}$$

$$\bar{p} = \frac{0.60\cdot 100 + 0.40\cdot 120}{100 + 120} = \frac{108}{220} = 0.49091$$

$$z = \frac{0.60 - 0.40}{\sqrt{\dfrac{0.49091\cdot 0.50909}{100} + \dfrac{0.49091\cdot 0.50909}{120}}} = 2.95468$$

$$\chi^2 = \frac{220\cdot[60\cdot 72 - 40\cdot 48]^2}{108\cdot 112\cdot 100\cdot 120} = 8.7302$$

-> Reject H0

Sample \ Characteristic    1      2      Total
1                          60     40     100
2                          48     72     120
Total                     108    112     220
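The two statistics above are linked by z² = χ², which a short R sketch makes visible. The names are illustrative; `chisq.test()` is called without continuity correction so that it matches the hand formula.

```r
n1 <- 100; p1 <- 0.60
n2 <- 120; p2 <- 0.40

p_bar <- (p1 * n1 + p2 * n2) / (n1 + n2)                 # pooled proportion
z     <- (p1 - p2) / sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))

tab   <- matrix(c(60, 48, 40, 72), nrow = 2)             # rows: samples 1 and 2
chisq <- unname(chisq.test(tab, correct = FALSE)$statistic)

print(c(z = z, z_squared = z^2, chisq = chisq))
```

The same statistic is also returned by `prop.test(c(60, 48), c(100, 120), correct = FALSE)`, which treats the problem as a comparison of two proportions.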

Fisher's Exact Test

data severe;
input treat $ outcome $ count;
cards;
Test f 10
Test u 2
Control f 2
Control u 4
;
proc freq order=data;
tables treat*outcome / chisq nocol;
weight count;
run;

The SAS System
The FREQ Procedure

Table of treat by outcome

treat      outcome
Frequency |
Percent   |
Row Pct   |f       |u       |  Total
----------+--------+--------+
Test      |     10 |      2 |     12
          |  55.56 |  11.11 |  66.67
          |  83.33 |  16.67 |
----------+--------+--------+
Control   |      2 |      4 |      6
          |  11.11 |  22.22 |  33.33
          |  33.33 |  66.67 |
----------+--------+--------+
Total           12        6       18
             66.67    33.33   100.00

Statistics for Table of treat by outcome

Statistic                      DF     Value     Prob
-----------------------------------------------------
Chi-Square                      1    4.5000   0.0339
Likelihood Ratio Chi-Square     1    4.4629   0.0346
Continuity Adj. Chi-Square      1    2.5313   0.1116
Mantel-Haenszel Chi-Square      1    4.2500   0.0393
Phi Coefficient                      0.5000
Contingency Coefficient              0.4472
Cramer's V                           0.5000

WARNING: 75% of the cells have expected counts less than 5. Chi-Square may not be a valid test.

Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)        10
Left-sided Pr <= F          0.9961
Right-sided Pr >= F         0.0573

Table Probability (P)       0.0533
Two-sided Pr <= P           0.1070

Sample Size = 18

Exact Test: table probabilities

Table cell
(1,1)   (1,2)   (2,1)   (2,2)   Prob
12       0       0       6      .0001
11       1       1       5      .0039
10       2       2       4      .0533
 9       3       3       3      .2370
 8       4       4       2      .4000
 7       5       5       1      .2560
 6       6       6       0      .0498

Probability of the observed table (10, 2, 2, 4):

$$P = \frac{12!\,12!\,6!\,6!}{10!\,2!\,2!\,4!\,18!} = 0.0533$$

• One-tailed p-value: p = 0.0533 + 0.0039 + 0.0001 = 0.0573

• Two-tailed p-value: p = 0.0533 + 0.0039 + 0.0001 + 0.0498 = 0.1071

H0: the two variables are independent (homogeneous) vs H1: not H0
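With all margins fixed, cell (1,1) follows a hypergeometric distribution, so the table probabilities and both p-values can be reproduced with `dhyper()`. A sketch under that setup, with illustrative variable names:

```r
# Margins of the observed table: row totals 12 and 6, column totals 12 and 6.
# With the margins fixed, cell (1,1) can range from 6 to 12.
x11  <- 12:6
prob <- dhyper(x11, m = 12, n = 6, k = 12)   # P(cell (1,1) = x11)
print(data.frame(cell11 = x11, prob = round(prob, 4)))

p_obs     <- dhyper(10, 12, 6, 12)                  # probability of the observed table
one_sided <- sum(prob[x11 >= 10])                   # tables at least as extreme on one side
two_sided <- sum(prob[prob <= p_obs * (1 + 1e-7)])  # tables no more probable than the observed one

# Built-in two-sided test on the same table (filled column by column).
builtin <- fisher.test(matrix(c(10, 2, 2, 4), nrow = 2))$p.value
```

The small factor `1 + 1e-7` only guards against floating-point ties when comparing probabilities. Summing the unrounded probabilities gives 0.1070 for the two-sided value, which is why it differs in the last digit from the sum of the rounded table entries.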

> fisher.test(matrix(c(7,3,5,6),2,2),alternative="greater")

        Fisher's Exact Test for Count Data

data:  matrix(c(7, 3, 5, 6), 2, 2)
p-value = 0.2449
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
 0.4512625       Inf
sample estimates:
odds ratio
  2.661251

> matrix(c(7,3,5,6),2,2)
     [,1] [,2]
[1,]    7    5
[2,]    3    6

McNemar test: matched pairs

data one;
input hus_resp $ wif_resp $ no;
datalines;
yes yes 20
yes no 5
no yes 10
no no 10
;
run;
proc freq;
tables hus_resp*wif_resp / agree;
weight no;
run;

We do not reject "H0: the approval rates of husband and wife are the same".

Kappa = 1: perfect agreement; Kappa > 0.8: excellent agreement; Kappa > 0.4: moderate agreement.

The confidence interval for kappa does not include 0, so we reject the null hypothesis K = 0 at the 95% confidence level.
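As a rough R counterpart to the SAS `agree` option, the sketch below runs McNemar's test on the same husband/wife table and computes Cohen's kappa by hand. The names are illustrative, the test is run without continuity correction (an assumption about what the SAS output reports), and the kappa confidence interval is not reproduced here.

```r
# Rows: husband's response, columns: wife's response (filled column by column).
tab <- matrix(c(20, 10, 5, 10), nrow = 2,
              dimnames = list(husband = c("yes", "no"), wife = c("yes", "no")))

# McNemar's test uses only the discordant pairs (5 and 10).
mc <- mcnemar.test(tab, correct = FALSE)

# Cohen's kappa from observed agreement (po) and chance agreement (pe).
n     <- sum(tab)
po    <- sum(diag(tab)) / n
pe    <- sum(rowSums(tab) * colSums(tab)) / n^2
kappa <- (po - pe) / (1 - pe)

cat("McNemar statistic =", round(unname(mc$statistic), 4),
    " p-value =", round(mc$p.value, 4),
    " kappa =", round(kappa, 4), "\n")
```

The McNemar statistic depends only on the off-diagonal counts, (5 − 10)²/(5 + 10), while kappa uses the diagonal, so the two answer different questions about the same table.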

11.6 Relative risk, odds ratio, and Mantel-Haenszel statistics

• Observational study

• Prospective study

• Retrospective study

Relative risk

Risk \ Disease    O        X        Total
O                 a        b        a + b
X                 c        d        c + d
Total             a + c    b + d    n

(O = present, X = absent)

• $$RR = \frac{a/(a+b)}{c/(c+d)}$$

• $$s.e.(\ln RR) = \sqrt{\frac{1}{a} + \frac{1}{c} - \frac{1}{a+b} - \frac{1}{c+d}}$$

• Confidence interval on the log scale: $\ln RR \pm z_{1-\alpha/2}\cdot s.e.(\ln RR)$

• Confidence interval for RR: $$e^{\ln RR \,\pm\, z_{1-\alpha/2}\cdot s.e.(\ln RR)} = RR\cdot e^{\pm z_{1-\alpha/2}\cdot s.e.(\ln RR)}$$

Example 11.6.1

Table 11.6.2

Smoking \ Disease progress    Yes     No      Total
Yes                            26     380      406
No                            178    3386     3564
Total                         204    3766     3970

$$RR = \frac{26/406}{178/3564} = \frac{0.064}{0.050} = 1.28$$

Lower limit: $$1.2822\cdot e^{-1.96\sqrt{\frac{1}{26} + \frac{1}{178} - \frac{1}{406} - \frac{1}{3564}}} = 0.861$$

Upper limit: $$1.2822\cdot e^{1.96\sqrt{\frac{1}{26} + \frac{1}{178} - \frac{1}{406} - \frac{1}{3564}}} = 1.910$$

The 95% CI includes 1.

data preg;
input smoke $ preg $ count;
cards;
smoke early 26
smoke abnormal 380
nonsmoke early 178
nonsmoke abnormal 3386
;
proc freq order=data;
weight count;
tables smoke*preg / measures chisq;
run;
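The relative risk and its 95% confidence interval for Example 11.6.1 can be recomputed directly from the four cell counts. This is a sketch with illustrative names; it uses `qnorm(0.975)` where the slides use the rounded value 1.96, and `c2` stands for cell c to avoid masking R's `c()`.

```r
a  <- 26;  b <- 380     # smokers:     disease progress yes / no
c2 <- 178; d <- 3386    # non-smokers: disease progress yes / no

rr    <- (a / (a + b)) / (c2 / (c2 + d))
se_ln <- sqrt(1 / a - 1 / (a + b) + 1 / c2 - 1 / (c2 + d))   # s.e. of ln(RR)
z     <- qnorm(0.975)
ci    <- rr * exp(c(-1, 1) * z * se_ln)

cat("RR =", round(rr, 4), " 95% CI = (", round(ci[1], 3), ",", round(ci[2], 3), ")\n")
```

Building the interval on the log scale and exponentiating keeps both limits positive and makes the interval asymmetric around RR, as expected for a ratio.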

Odds ratio

Risk \ Sample    Case     Control    Total
O                a        b          a + b
X                c        d          c + d
Total            a + c    b + d      n

• Odds = p/(1 − p)

• Odds in the case group: [a/(a + c)]/[c/(a + c)] = a/c

• Odds in the control group: [b/(b + d)]/[d/(b + d)] = b/d

• $$OR = \frac{a/c}{b/d} = \frac{ad}{bc}, \qquad s.e.(\ln OR) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$

• Confidence interval on the log scale: $\ln OR \pm z_{1-\alpha/2}\cdot s.e.(\ln OR)$

• Confidence interval for OR: $$e^{\ln OR \,\pm\, z_{1-\alpha/2}\cdot s.e.(\ln OR)} = OR\cdot e^{\pm z_{1-\alpha/2}\cdot s.e.(\ln OR)}$$

Example 11.6.2

Table 11.6.4

Smoking \ Obesity    Yes     No      Total
Yes                   52     352      404
No                    78    3486     3564
Total                130    3838     3968

$$OR = \frac{52\cdot 3486}{352\cdot 78} = 6.60$$

Lower limit: $$6.6023\cdot e^{-1.96\sqrt{\frac{1}{52} + \frac{1}{352} + \frac{1}{78} + \frac{1}{3486}}} = 4.571$$

Upper limit: $$6.6023\cdot e^{1.96\sqrt{\frac{1}{52} + \frac{1}{352} + \frac{1}{78} + \frac{1}{3486}}} = 9.536$$

data obe;
input smoke $ case $ count;
cards;
smoke case 52
smoke control 352
nonsmoke case 78
nonsmoke control 3486
;
proc freq data=obe;
weight count;
tables smoke*case / measures;
run;
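The odds ratio and its 95% confidence interval for Example 11.6.2 follow the same pattern as the relative risk. A sketch with illustrative names, again using `qnorm(0.975)` in place of the rounded 1.96:

```r
a  <- 52; b <- 352      # smokers:     case / control
c2 <- 78; d <- 3486     # non-smokers: case / control

odds_ratio <- (a * d) / (b * c2)
se_ln      <- sqrt(1 / a + 1 / b + 1 / c2 + 1 / d)   # s.e. of ln(OR)
ci         <- odds_ratio * exp(c(-1, 1) * qnorm(0.975) * se_ln)

cat("OR =", round(odds_ratio, 4),
    " 95% CI = (", round(ci[1], 3), ",", round(ci[2], 3), ")\n")
```

Here the whole interval lies above 1, in contrast to the relative risk interval of Example 11.6.1.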

Mantel-Haenszel statistic

The confounding variable defines k strata.

Stratum i:

Risk           Case         Control      Total
Exposed        a_i          b_i          a_i + b_i
Not exposed    c_i          d_i          c_i + d_i
Total          a_i + c_i    b_i + d_i    n_i

Expected frequency in stratum i: $$e_i = \frac{(a_i+b_i)(a_i+c_i)}{n_i}$$

$$v_i = \frac{(a_i+b_i)(c_i+d_i)(a_i+c_i)(b_i+d_i)}{n_i^2\,(n_i-1)}$$

$$\chi^2_{MH} = \frac{\left(\sum_{i=1}^{k} a_i - \sum_{i=1}^{k} e_i\right)^2}{\sum_{i=1}^{k} v_i} \sim \chi^2_1$$

Common odds ratio: $$OR_{MH} = \frac{\sum_{i=1}^{k} (a_i d_i / n_i)}{\sum_{i=1}^{k} (b_i c_i / n_i)}$$

Example 11.6.3

Age <= 55

Risk \ OCAD    Patients    Control    Total
Exposed        21          11         32
Unexposed      16           6         22
Total          37          17         54

Age >= 56

Risk \ OCAD    Patients    Control    Total
Exposed        50          14         64
Unexposed      18           6         24
Total          68          20         88

data one;
input age risk case n;
cards;
1 1 1 21
1 1 2 11
1 2 1 16
1 2 2 6
2 1 1 50
2 1 2 14
2 2 1 18
2 2 2 6
;
run;
proc freq;
tables age*risk*case / measures cmh;
weight n;
run;
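The Mantel-Haenszel formulas can be applied to the two age strata of Example 11.6.3 by hand and compared with R's `mantelhaen.test()`. This is a sketch: the object names are illustrative, the continuity correction is switched off so that the built-in statistic matches the formula above, and the slides do not report numeric results for this example.

```r
# Strata: 1 = age <= 55, 2 = age >= 56. Each 2x2 slice is filled column by column:
# (exposed case, unexposed case, exposed control, unexposed control).
tabs <- array(c(21, 16, 11, 6,
                50, 18, 14, 6),
              dim = c(2, 2, 2),
              dimnames = list(risk = c("exposed", "unexposed"),
                              ocad = c("case", "control"),
                              age  = c("<=55", ">=56")))

a  <- tabs[1, 1, ]; b <- tabs[1, 2, ]
c2 <- tabs[2, 1, ]; d <- tabs[2, 2, ]
n  <- a + b + c2 + d

e <- (a + b) * (a + c2) / n                                      # expected a_i per stratum
v <- (a + b) * (c2 + d) * (a + c2) * (b + d) / (n^2 * (n - 1))   # variance of a_i per stratum

chisq_mh <- (sum(a) - sum(e))^2 / sum(v)
or_mh    <- sum(a * d / n) / sum(b * c2 / n)

# Cross-check with the built-in test (continuity correction switched off).
builtin <- mantelhaen.test(tabs, correct = FALSE)

cat("MH chi-square =", round(chisq_mh, 4), " common OR =", round(or_mh, 4), "\n")
```

Stratifying by age lets the exposure effect be pooled across strata without mixing the two age groups into a single 2×2 table.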



No 78 3486 3564

130 3838 3968

표1164

data obe

input smoke $ case $ count

cards

smoke case 52

smoke control 352

nonsmoke case 78

nonsmoke control 3486

proc freq data=obe

weight count

tables smokecasemeasures

run

Mantel-Haenszel 통계량

교란변수가 k개의 층

층 i의 기대도수 119890119894=119886119894+119887119894 119886119894+119888119894

119899119894

119907119894 =119886119894 + 119887119894 119888119894 + 119889119894 119886119894 + 119888119894 119887119894 + 119889119894

1198991198942(119899119894 minus 1)

1205941198721198672 =

( 119894=1119896 119886119894 minus 119894=1

119896 119890119894)2

119894=1119896 119907119894

~12059412 119874119877119872119867 =

119894=1119896 (119886119894119889119894119899119894)

119894=1119896 (119887119894119888119894119899119894)

공통오즈비

Risk

Strata =i

case control

Exposed 119886119894 119887119894 119886119894 + 119887119894

Not 119888119894 119889119894 119888119894 + 119889119894

119886119894 + 119888119894 119887119894 + 119889119894 119899119894

예제 1163

Age lt= 55

Risk OCAD patients control Total

Exposed 21 11 32

Unexposed 16 6 22

합계 37 17 54

Age gt= 56

Risk OCAD 환자 정상 합계

Exposed 50 14 64

Unexposed 18 6 24

68 20 88

data one

input age risk case n

cards

1 1 1 21

1 1 2 11

1 2 1 16

1 2 2 6

2 1 1 50

2 1 2 14

2 2 1 18

2 2 2 6

run

proc freq

tables ageriskcasemeasures CMH

weight n

run


$\chi^2 = \sum_i (O_i - E_i)^2 / E_i = 14.762$ > qchisq(0.05, 5, lower.tail=F) = 11.071

-> Reject H0: data ~ normal

Constraints ($\sum E_i = \sum O_i$, $\mu = \bar{X}$, $\sigma = s$) = 3 -> df = 8 - 3 = 5

Class interval   Observed (O_i)   Expected (E_i)   (O_i - E_i)^2 / E_i
< 1                    0             0.50265
1-5.9                  2             1.4740         2.7608 x 10^-4   (pooled with "< 1": O = 2, E = 1.9766)
6-10.9                 2             3.8672         0.90156
11-15.9                7             7.3637         0.01796
16-20.9               19            10.178          7.6466
21-25.9                4            10.213          3.7794
26-30.9                6             7.4393         0.27848
31-35.9                3             3.9337         0.22162
36-40.9                4             1.5096         1.9156           (pooled with ">= 41": O = 4, E = 2.0287)
>= 41                  0             0.51909
Total                 47            47             14.762
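The pooled statistic can be checked with a few lines of R. This is a minimal sketch that simply re-enters the pooled observed and expected frequencies from the table; the variable names (`obs`, `expd`, `chi2`) are illustrative and not part of the original slides.

```r
# Pooled observed and expected frequencies
# (first two and last two class intervals combined, 8 cells in total).
obs  <- c(2, 2, 7, 19, 4, 6, 3, 4)
expd <- c(1.9766, 3.8672, 7.3637, 10.178, 10.213, 7.4393, 3.9337, 2.0287)

# Pearson statistic: sum of (O - E)^2 / E over the pooled cells
chi2 <- sum((obs - expd)^2 / expd)

# Three constraints (total, mean, sd) leave 8 - 3 = 5 degrees of freedom
df      <- length(obs) - 3
crit    <- qchisq(0.05, df, lower.tail = FALSE)
p_value <- pchisq(chi2, df, lower.tail = FALSE)

cat("chi2 =", round(chi2, 3), " critical value =", round(crit, 3), "\n")
```

Because `chi2` exceeds `crit`, the sketch reproduces the decision to reject normality.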

Example 11.2.2: Binomial distribution

H0: the data follow a binomial distribution (goodness-of-fit test)

Patients preferring the new drug (per doctor)   Number of doctors   Number of patients
0                                                      5                   0
1                                                      6                   6
2                                                      8                  16
3                                                     10                  30
4                                                     10                  40
5                                                     15                  75
6                                                     17                 102
7                                                     10                  70
8                                                     10                  80
9                                                      9                  81
10 or more                                             0                   0
Total                                                100                 500

Expected frequency under the binomial distribution = expected relative frequency x total

$P(X = x) = \binom{25}{x} p^x (1 - p)^{25 - x}, \quad x = 0, 1, 2, \ldots, 25$

$\hat{p} = 500 / 2500 = 0.2$

Patients preferring the new drug (per doctor)   Doctors (O_i)   Expected relative frequency   Expected frequency (E_i)
0                                                    5              0.00378                      0.37779
1                                                    6              0.02361                      2.3612
2                                                    8              0.07083                      7.0836
3                                                   10              0.13577                     13.577
4                                                   10              0.18668                     18.668
5                                                   15              0.19602                     19.602
6                                                   17              0.16335                     16.335
7                                                   10              0.11084                     11.084
8                                                   10              0.06235                      6.2349
9                                                    9              0.02944                      2.9442
10 or more                                           0              0.01733                      1.7332
Total                                              100              1.0000                     100.00

The categories 0 and 1 are pooled: O = 11, E = 2.7390.

$\chi^2 = \dfrac{(11 - 2.7390)^2}{2.7390} + \dfrac{(8 - 7.0836)^2}{7.0836} + \cdots + \dfrac{(0 - 1.7332)^2}{1.7332} = 47.678$ > qchisq(0.005, 8, lower.tail=F) = 21.955

We reject H0: Data ~ Binomial

df = 10 - 2 = 8; constraints: $\sum E_i = \sum O_i$, $p = \hat{p}$
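The expected frequencies and the test statistic of this example can be reproduced in R with `dbinom` and `pbinom`. The sketch below assumes 25 patients per doctor (as implied by 2500 = 100 x 25) and pools the categories 0 and 1 as on the slide; all variable names are illustrative.

```r
doctors <- c(5, 6, 8, 10, 10, 15, 17, 10, 10, 9, 0)  # x = 0..9 and "10 or more"
n_trial <- 25                                         # patients per doctor

# Estimate p from the data: 500 preferring patients out of 100 * 25
p_hat <- sum(0:9 * doctors[1:10]) / (n_trial * sum(doctors))

# Expected relative frequencies; the last class collects P(X >= 10)
prob <- c(dbinom(0:9, n_trial, p_hat),
          pbinom(9, n_trial, p_hat, lower.tail = FALSE))
expd <- sum(doctors) * prob

# Pool x = 0 and x = 1, leaving 10 cells
obs_pooled <- c(sum(doctors[1:2]), doctors[3:11])
exp_pooled <- c(sum(expd[1:2]), expd[3:11])

chi2 <- sum((obs_pooled - exp_pooled)^2 / exp_pooled)
df   <- length(obs_pooled) - 2   # constraints: total and estimated p
crit <- qchisq(0.005, df, lower.tail = FALSE)

cat("p_hat =", p_hat, " chi2 =", round(chi2, 3), " critical value =", round(crit, 3), "\n")
```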

Example 11.2.3: Poisson distribution

H0: the number of emergency patients per day at the hospital follows a Poisson distribution

Expected relative frequency under the Poisson distribution:

$P(X = x) = \dfrac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$, with $\lambda = 3$ (known)

Emergency patients per day   Number of days
0                                  5
1                                 14
2                                 15
3                                 23
4                                 16
5                                  9
6                                  3
7                                  3
8                                  1
9                                  1
10 or more                         0
Total                             90

Emergency patients   Days (O_i)   Expected relative frequency   Expected frequency (E_i)   (O_i - E_i)^2 / E_i
0                        5            0.04979                      4.4808                    0.06015
1                       14            0.14936                     13.443                     0.02312
2                       15            0.22404                     20.164                     1.32240
3                       23            0.22404                     20.164                     0.39895
4                       16            0.16803                     15.123                     0.05088
5                        9            0.10082                      9.0737                    0.00060
6                        3            0.05041                      4.5368                    0.52060
7                        3            0.02160                      1.9444                    0.57313
8                        1            0.00810                      0.72914
9                        1            0.00270                      0.24305                   0.80482   (8, 9 and "10 or more" pooled: O = 2, E = 1.0714)
10 or more               0            0.00110                      0.09922
Total                   90            1.000                       90.00                      3.755

$\chi^2 = \sum_i \dfrac{(O_i - E_i)^2}{E_i} = \dfrac{(5 - 4.4808)^2}{4.4808} + \cdots + \dfrac{(2 - 1.0714)^2}{1.0714} = 3.755 < \chi^2(0.95,\ df = 9 - 1 = 8) = 15.507$

We cannot reject H0: Data ~ Poisson

With the expected frequencies 4.50, ..., 1.08 the conclusion is the same:

$\chi^2_{9-1} = \dfrac{(5 - 4.50)^2}{4.50} + \cdots + \dfrac{(2 - 1.08)^2}{1.08} = 3.664 < \chi^2_{9-1}(0.95) = 15.507$
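The Poisson fit can be recomputed in R with `dpois` and `ppois`. The sketch below uses the known rate of 3 from the example and pools the classes 8, 9 and "10 or more" as in the table; the variable names are illustrative.

```r
days   <- c(5, 14, 15, 23, 16, 9, 3, 3, 1, 1, 0)  # x = 0..9 and "10 or more"
lambda <- 3                                        # known, not estimated

# Expected relative frequencies; the last class collects P(X >= 10)
prob <- c(dpois(0:9, lambda), ppois(9, lambda, lower.tail = FALSE))
expd <- sum(days) * prob

# Pool the three sparse upper classes, leaving 9 cells
obs_pooled <- c(days[1:8], sum(days[9:11]))
exp_pooled <- c(expd[1:8], sum(expd[9:11]))

chi2 <- sum((obs_pooled - exp_pooled)^2 / exp_pooled)
df   <- length(obs_pooled) - 1   # only the total is constrained
crit <- qchisq(0.95, df)

cat("chi2 =", round(chi2, 3), " critical value =", round(crit, 3), "\n")
```

Since no parameter is estimated from the data, only one degree of freedom is lost, which is why `df` is 8 here.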

11.3 Tests of independence

• Contingency table

$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2\big(df = (r - 1)(c - 1)\big), \qquad E_{ij} = \dfrac{n_{i\cdot}\, n_{\cdot j}}{n}$

                              Second categorical variable
First categorical variable    1        2        3        ...   c        Total
1                             n_11     n_12     n_13     ...   n_1c     n_1.
2                             n_21     n_22     n_23     ...   n_2c     n_2.
3                             n_31     n_32     n_33     ...   n_3c     n_3.
...                           ...      ...      ...      ...   ...      ...
r                             n_r1     n_r2     n_r3     ...   n_rc     n_r.
Total                         n_.1     n_.2     n_.3     ...   n_.c     n

• Example 11.3.1

Observed counts, with expected counts in parentheses:

              Relapse
Treatment     Yes              No                   Total
A             294 (77.255)     921 (1137.745)        1215
B              98 (188.210)   2862 (2771.790)        2960
C              50 (198.002)   3064 (2915.998)        3114
D             203 (181.533)   2652 (2673.467)        2855
Total         645             9499                  10144

$\chi^2 = \sum \dfrac{(O - E)^2}{E} = \dfrac{(294 - 77.255)^2}{77.255} + \dfrac{(921 - 1137.745)^2}{1137.745} + \cdots = 816.41 > \chi^2(0.95,\ df = 3) = 7.815$

df = (r - 1)(c - 1) = (4 - 1)(2 - 1) = 3

Reject H0 (treatment and relapse are independent) -> They are not independent

> data<-as.table(cbind(c(294,98,50,203),c(921,2862,3064,2652)))
> dimnames(data)<-list(trt=c("A","B","C","D"),re=c("Y","N"))
> data
   re
trt   Y    N
  A 294  921
  B  98 2862
  C  50 3064
  D 203 2652
> chisq.test(data)

	Pearson's Chi-squared test

data:  data
X-squared = 816.41, df = 3, p-value < 2.2e-16

data re;
  input trt $ re $ count @@;  /* @@ reads several observations from each data line */
  cards;
A Y 294  A N 921  B Y 98  B N 2862
C Y 50  C N 3064  D Y 203  D N 2652
;
proc freq data=re;
  weight count;
  tables trt*re / measures chisq;
run;
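To make the link between the formula $E_{ij} = n_{i\cdot} n_{\cdot j} / n$ and the output of `chisq.test` explicit, the expected counts can be built by hand from the margins. This is a small sketch on the treatment-by-relapse counts of Example 11.3.1; the object names (`tab`, `expd`, `fit`) are illustrative.

```r
tab <- matrix(c(294, 98, 50, 203, 921, 2862, 3064, 2652), ncol = 2,
              dimnames = list(trt = c("A", "B", "C", "D"), re = c("Y", "N")))

# Expected counts under independence: (row total x column total) / n
expd <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson statistic and its degrees of freedom
chi2 <- sum((tab - expd)^2 / expd)
df   <- (nrow(tab) - 1) * (ncol(tab) - 1)

# chisq.test applies no continuity correction to tables larger than 2 x 2,
# so its statistic should agree with the hand computation
fit <- chisq.test(tab)

print(round(expd, 3))
cat("chi2 =", round(chi2, 2), " df =", df, "\n")
```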

• Small expected frequencies: the $\chi^2$ approximation is acceptable if no more than 20% of the cells have an expected frequency below 5 and the smallest expected frequency is at least 1.

• 2x2 table: the $\chi^2$-test is not valid if n < 20, or if 20 < n < 49 and one or more cells have an expected frequency below 5.
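The rule of thumb for general tables can be checked mechanically. The helper below, `expected_check`, is a hypothetical function written for this illustration (it is not from the slides or from any package); it reports the share of cells with expected frequency below 5 and the smallest expected frequency. It is applied to the treatment-by-relapse table of Example 11.3.1 and to the 18-subject table used in the Fisher's exact test example of this chapter.

```r
# Hypothetical helper: summarise the expected counts of a two-way table
expected_check <- function(tab) {
  expd <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  c(share_below_5 = mean(expd < 5), min_expected = min(expd))
}

relapse <- matrix(c(294, 98, 50, 203, 921, 2862, 3064, 2652), ncol = 2)
small   <- matrix(c(10, 2, 2, 4), ncol = 2)   # 18 subjects in total

r_relapse <- expected_check(relapse)
r_small   <- expected_check(small)

print(r_relapse)
print(r_small)
```

For the small table, three of the four expected counts (8, 4, 4, 2) fall below 5, so an exact test is the safer choice there.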

• 2x2 table

                                  Second classification criterion
First classification criterion    1        2        Total
1                                 a        b        a + b
2                                 c        d        c + d
Total                             a + c    b + d    n

$\chi^2 = \dfrac{n(ad - bc)^2}{(a + c)(b + d)(a + b)(c + d)}$

Smoking \ Drinking   Yes    No    Total
Yes                  131    52    183
No                    14    36     50
Total                145    88    233

$\chi^2 = \dfrac{233\,(131 \cdot 36 - 52 \cdot 14)^2}{145 \cdot 88 \cdot 183 \cdot 50} = 31.7391 \gg 1.96^2$

Strong evidence to reject H0 (smoking and drinking are independent)

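② Comparing two probabilities

$H_0: p_1 = p_2$ vs $H_a: p_1 \ne p_2$

$Z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n_1} + \dfrac{\bar{p}(1 - \bar{p})}{n_2}}}, \qquad \bar{p} = \dfrac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$

e.g. $n_1 = 100$, $\hat{p}_1 = 0.60$, $n_2 = 120$, $\hat{p}_2 = 0.40$

$\bar{p} = \dfrac{0.60(100) + 0.40(120)}{100 + 120} = 0.4909$

$Z = \dfrac{0.60 - 0.40}{\sqrt{\dfrac{(0.4909)(0.5091)}{100} + \dfrac{(0.4909)(0.5091)}{120}}} = 2.95469 > 1.96$ -> significant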

Yates adjustment (continuity correction)

• $\chi^2_{\text{corrected}} = \dfrac{n\,(|ad - bc| - 0.5n)^2}{(a + c)(b + d)(a + b)(c + d)}$

• $\chi^2 = \dfrac{233\,(|131 \cdot 36 - 52 \cdot 14| - 0.5 \cdot 233)^2}{145 \cdot 88 \cdot 183 \cdot 50} = 29.9118$
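Both the uncorrected 2x2 shortcut and the Yates-corrected version can be verified against `chisq.test`, whose `correct` argument switches the continuity correction on and off. The sketch below re-enters the smoking and drinking counts; the cell names `a`, `b`, `cc`, `d` follow the generic 2x2 layout (with `cc` used instead of `c` to avoid masking R's `c()`).

```r
tab <- matrix(c(131, 14, 52, 36), nrow = 2)   # filled column by column
a <- tab[1, 1]; b <- tab[1, 2]; cc <- tab[2, 1]; d <- tab[2, 2]
n <- sum(tab)

denom <- (a + cc) * (b + d) * (a + b) * (cc + d)

# Shortcut formula without and with the Yates correction
chi2_plain <- n * (a * d - b * cc)^2 / denom
chi2_yates <- n * (abs(a * d - b * cc) - n / 2)^2 / denom

# Same quantities from chisq.test
fit_plain <- chisq.test(tab, correct = FALSE)
fit_yates <- chisq.test(tab, correct = TRUE)

cat("uncorrected:", round(chi2_plain, 4), " Yates:", round(chi2_yates, 4), "\n")
```

The correction always shrinks the statistic, so it makes the test slightly more conservative.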

11.4 Homogeneity test

• Homogeneity test: samples are drawn independently from separate populations; do they have the same distribution?

• Independence test: one sample is drawn from a single population; the row and column totals are not controlled but arise by chance.

• Independence test vs homogeneity test

• Example 11.4.1

• Hypothesis (H0): patient groups with onset age <= 18 and onset age > 18 have the same distribution of family history

Family history   <= 18   > 18   Total
A                  28      35     63
B                  19      38     57
C                  41      44     85
D                  53      60    113
Total             141     177    318

> data<-as.table(cbind(c(28,19,41,53),c(35,38,44,60)))
> dimnames(data)<-list(trt=c("A","B","C","D"),re=c("Early","Later"))
> data
   re
trt Early Later
  A    28    35
  B    19    38
  C    41    44
  D    53    60
> chisq.test(data)

	Pearson's Chi-squared test

data:  data
X-squared = 3.6216, df = 3, p-value = 0.3053

-> Do not reject H0

Homogeneity test and the test of two proportions

$H_0: p_1 = p_2$ vs $H_A: p_1 \ne p_2$; $n_1 = 100$, $\hat{p}_1 = 0.60$, $n_2 = 120$, $\hat{p}_2 = 0.40$

$z = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)_0}{\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n_1} + \dfrac{\bar{p}(1 - \bar{p})}{n_2}}}$

$\bar{p} = \dfrac{0.60 \cdot 100 + 0.40 \cdot 120}{100 + 120} = \dfrac{108}{220} = 0.49091$

$z = \dfrac{0.60 - 0.40}{\sqrt{\dfrac{0.49091 \cdot 0.50909}{100} + \dfrac{0.49091 \cdot 0.50909}{120}}} = 2.95468$

$\chi^2 = \dfrac{220 \cdot [60 \cdot 72 - 40 \cdot 48]^2}{108 \cdot 112 \cdot 100 \cdot 120} = 8.7302$

-> Reject H0

          Characteristic
Sample    1      2      Total
1         60     40     100
2         48     72     120
Total    108    112     220
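The two computations above give the same test, because for a 2x2 table the squared z statistic equals the uncorrected chi-square statistic. A short R sketch makes this concrete; it uses `prop.test` with `correct = FALSE`, and the variable names are illustrative.

```r
x <- c(60, 48)     # subjects with characteristic 1 in samples 1 and 2
n <- c(100, 120)   # sample sizes

p_hat <- x / n
p_bar <- sum(x) / sum(n)   # pooled proportion under H0

# z statistic with the pooled standard error
z <- (p_hat[1] - p_hat[2]) / sqrt(p_bar * (1 - p_bar) * sum(1 / n))

# Uncorrected chi-square statistic for the same 2 x 2 table
chi2 <- unname(prop.test(x, n, correct = FALSE)$statistic)

cat("z =", round(z, 5), " z^2 =", round(z^2, 4), " chi2 =", round(chi2, 4), "\n")
```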

Fisher's Exact Test

data severe;
  input treat $ outcome $ count;
  cards;
Test f 10
Test u 2
Control f 2
Control u 4
;
proc freq order=data;
  tables treat*outcome / chisq nocol;
  weight count;
run;

The SAS System

The FREQ Procedure

Table of treat by outcome

treat      outcome

Frequency |
Percent   |
Row Pct   |f       |u       |  Total
----------+--------+--------+
Test      |     10 |      2 |     12
          |  55.56 |  11.11 |  66.67
          |  83.33 |  16.67 |
----------+--------+--------+
Control   |      2 |      4 |      6
          |  11.11 |  22.22 |  33.33
          |  33.33 |  66.67 |
----------+--------+--------+
Total           12        6       18
             66.67    33.33   100.00

Statistics for Table of treat by outcome

Statistic                      DF     Value      Prob
------------------------------------------------------
Chi-Square                      1    4.5000    0.0339
Likelihood Ratio Chi-Square     1    4.4629    0.0346
Continuity Adj. Chi-Square      1    2.5313    0.1116
Mantel-Haenszel Chi-Square      1    4.2500    0.0393
Phi Coefficient                      0.5000
Contingency Coefficient              0.4472
Cramer's V                           0.5000

WARNING: 75% of the cells have expected counts less than 5. Chi-Square may not be a valid test.

Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)        10
Left-sided Pr <= F          0.9961
Right-sided Pr >= F         0.0573

Table Probability (P)       0.0533
Two-sided Pr <= P           0.1070

Sample Size = 18

Table Cell

(11) (12) (21) (22) Prob

12 0 0 6 0001

11 1 1 5 0039

10 2 2 4 0533

9 3 3 3 2370

8 4 4 2 4000

7 5 5 1 2560

6 6 6 0 0498

=12 12 6 6

10 2 2 4 18

Table Probabilities

bull One-tailed p-value

bull Two-tailed p-value

00533 00039 00001 00573p

00533 00039 00001 00498 01071p
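With all margins fixed, cell (1,1) follows a hypergeometric distribution, so the table probabilities and both p-values can be reproduced with `dhyper`. The sketch below assumes the margins of the observed table (row totals 12 and 6, first column total 12); the variable names are illustrative, and the small tolerance factor mirrors the way `fisher.test` compares probabilities.

```r
a <- 6:12   # feasible values of cell (1,1) given the margins

# P(cell (1,1) = a): 12 "Test" and 6 "Control" subjects, 12 with outcome f
prob <- dhyper(a, m = 12, n = 6, k = 12)
p_obs <- prob[a == 10]

# One-sided: tables at least as extreme in the observed direction
p_one_sided <- sum(prob[a >= 10])

# Two-sided: all tables no more probable than the observed one
p_two_sided <- sum(prob[prob <= p_obs * (1 + 1e-7)])

ft <- fisher.test(matrix(c(10, 2, 2, 4), 2, 2))

print(round(setNames(prob, a), 4))
cat("one-sided =", round(p_one_sided, 4), " two-sided =", round(p_two_sided, 4), "\n")
```

The two-sided value from this sketch is 1987/18564, about 0.1070, which matches the SAS output; the 0.1071 on the slide comes from adding the rounded table probabilities.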

H0: the two variables are independent (homogeneous) vs H1: not H0

> fisher.test(matrix(c(7,3,5,6),2,2),alternative="greater")

	Fisher's Exact Test for Count Data

data:  matrix(c(7, 3, 5, 6), 2, 2)
p-value = 0.2449
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
 0.4512625       Inf
sample estimates:
odds ratio
  2.661251

> matrix(c(7,3,5,6),2,2)
     [,1] [,2]
[1,]    7    5
[2,]    3    6

McNemar Test: matched pairs

data one;
  input hus_resp $ wif_resp $ no;
  datalines;
yes yes 20
yes no 5
no yes 10
no no 10
;
run;
proc freq;
  tables hus_resp*wif_resp / agree;
  weight no;
run;

We do not reject "H0: the approval rates of husband and wife are the same".

Kappa = 1: perfect agreement; Kappa > 0.8: excellent agreement; Kappa > 0.4: moderate agreement.

The confidence interval for kappa does not include 0, so we reject the null hypothesis K = 0 at the 95% confidence level.
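The McNemar statistic depends only on the discordant pairs (5 and 10 here), and the kappa point estimate compares observed with chance agreement. The R sketch below recomputes both from the husband and wife counts; it uses the uncorrected statistic and does not attempt to reproduce the confidence interval for kappa reported by SAS. Object names are illustrative.

```r
tab <- matrix(c(20, 10, 5, 10), nrow = 2,
              dimnames = list(husband = c("yes", "no"), wife = c("yes", "no")))

# McNemar statistic from the discordant pairs only
n_yn <- tab["yes", "no"]   # husband yes, wife no
n_ny <- tab["no", "yes"]   # husband no, wife yes
mcnemar_stat <- (n_yn - n_ny)^2 / (n_yn + n_ny)
p_value <- pchisq(mcnemar_stat, df = 1, lower.tail = FALSE)

# Cohen's kappa: observed vs chance-expected agreement
n <- sum(tab)
p_agree  <- sum(diag(tab)) / n
p_chance <- sum(rowSums(tab) * colSums(tab)) / n^2
kappa_hat <- (p_agree - p_chance) / (1 - p_chance)

fit <- mcnemar.test(tab, correct = FALSE)

cat("McNemar =", round(mcnemar_stat, 4), " p =", round(p_value, 4),
    " kappa =", round(kappa_hat, 4), "\n")
```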

11.6 Relative risk, odds ratio and Mantel-Haenszel statistics

• Observational study

• Prospective study

• Retrospective study

Relative risk

               Disease
Risk factor    Yes      No       Total
Yes            a        b        a + b
No             c        d        c + d
Total          a + c    b + d    n

• $RR = \dfrac{a / (a + b)}{c / (c + d)}$

• $\mathrm{s.e.}(\ln RR) = \sqrt{\dfrac{1}{a} + \dfrac{1}{c} - \dfrac{1}{a + b} - \dfrac{1}{c + d}}$

• CI for $\ln RR$: $\ln RR \pm z_{1 - \alpha/2} \cdot \mathrm{s.e.}(\ln RR)$

• CI for $RR$: $e^{\ln RR \pm z_{1 - \alpha/2} \cdot \mathrm{s.e.}(\ln RR)} = RR \cdot e^{\pm z_{1 - \alpha/2} \cdot \mathrm{s.e.}(\ln RR)}$

Example 11.6.1

Table 11.6.2

            Disease progress
Smoking     Yes      No       Total
Yes          26      380       406
No          178     3386      3564
Total       204     3766      3970

$RR = \dfrac{26 / 406}{178 / 3564} = \dfrac{0.064}{0.050} = 1.28$

Lower limit: $1.2822 \cdot e^{-1.96 \sqrt{\frac{1}{26} + \frac{1}{178} - \frac{1}{406} - \frac{1}{3564}}} = 0.861$

Upper limit: $1.2822 \cdot e^{+1.96 \sqrt{\frac{1}{26} + \frac{1}{178} - \frac{1}{406} - \frac{1}{3564}}} = 1.910$

The 95% CI includes 1.

data preg;
  input smoke $ preg $ count;
  cards;
smoke early 26
smoke abnormal 380
nonsmoke early 178
nonsmoke abnormal 3386
;
proc freq order=data;
  weight count;
  tables smoke*preg / measures chisq;
run;
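The relative risk and its confidence interval from Example 11.6.1 can be recomputed directly from the four cell counts. This is a minimal R sketch using the standard error formula for $\ln RR$; the names `a`, `b`, `cc`, `d` follow the generic table (`cc` stands for cell c), and `qnorm(0.975)` replaces the rounded 1.96.

```r
a  <- 26;  b <- 380    # smokers: outcome yes / no
cc <- 178; d <- 3386   # non-smokers: outcome yes / no

# Relative risk: risk among smokers divided by risk among non-smokers
rr <- (a / (a + b)) / (cc / (cc + d))

# Standard error of ln(RR) and a 95% interval on the log scale
se_log_rr <- sqrt(1 / a + 1 / cc - 1 / (a + b) - 1 / (cc + d))
z  <- qnorm(0.975)
ci <- rr * exp(c(-1, 1) * z * se_log_rr)

cat("RR =", round(rr, 4), " 95% CI = (", round(ci[1], 3), ",", round(ci[2], 3), ")\n")
```

Because the interval contains 1, the data do not show a significant difference in risk between the two groups.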

Odds ratio

               Sample
Risk factor    Case     Control   Total
Yes            a        b         a + b
No             c        d         c + d
Total          a + c    b + d     n

• Odds $= p / (1 - p)$

• Odds among cases: $\dfrac{a / (a + c)}{c / (a + c)} = \dfrac{a}{c}$

• Odds among controls: $\dfrac{b / (b + d)}{d / (b + d)} = \dfrac{b}{d}$

• $OR = \dfrac{a / c}{b / d} = \dfrac{ad}{bc}, \qquad \mathrm{s.e.}(\ln OR) = \sqrt{\dfrac{1}{a} + \dfrac{1}{b} + \dfrac{1}{c} + \dfrac{1}{d}}$

• CI for $\ln OR$: $\ln OR \pm z_{1 - \alpha/2} \cdot \mathrm{s.e.}(\ln OR)$

• CI for $OR$: $e^{\ln OR \pm z_{1 - \alpha/2} \cdot \mathrm{s.e.}(\ln OR)} = OR \cdot e^{\pm z_{1 - \alpha/2} \cdot \mathrm{s.e.}(\ln OR)}$

Example 11.6.2

Table 11.6.4

            Obesity
Smoking     Yes      No       Total
Yes          52      352       404
No           78     3486      3564
Total       130     3838      3968

$OR = \dfrac{52 \cdot 3486}{352 \cdot 78} = 6.60$

Lower limit: $6.6023 \cdot e^{-1.96 \sqrt{\frac{1}{52} + \frac{1}{352} + \frac{1}{78} + \frac{1}{3486}}} = 4.571$

Upper limit: $6.6023 \cdot e^{+1.96 \sqrt{\frac{1}{52} + \frac{1}{352} + \frac{1}{78} + \frac{1}{3486}}} = 9.536$

data obe;
  input smoke $ case $ count;
  cards;
smoke case 52
smoke control 352
nonsmoke case 78
nonsmoke control 3486
;
/* order=data keeps "smoke" as the first row, so the odds ratio is
   reported as 6.60 rather than its reciprocal */
proc freq data=obe order=data;
  weight count;
  tables smoke*case / measures;
run;
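The odds ratio and its interval from Example 11.6.2 follow the same pattern as the relative risk, with a different standard error. The sketch below wraps the calculation in a small function, `odds_ratio_ci`, which is a hypothetical helper written for this illustration and not part of base R or the slides.

```r
# Hypothetical helper: odds ratio with a Wald interval on the log scale
odds_ratio_ci <- function(a, b, cc, d, level = 0.95) {
  or <- (a * d) / (b * cc)
  se_log_or <- sqrt(1 / a + 1 / b + 1 / cc + 1 / d)
  z <- qnorm(1 - (1 - level) / 2)
  c(or = or,
    lower = or * exp(-z * se_log_or),
    upper = or * exp(z * se_log_or))
}

# Smokers: 52 cases, 352 controls; non-smokers: 78 cases, 3486 controls
res <- odds_ratio_ci(a = 52, b = 352, cc = 78, d = 3486)
print(round(res, 3))
```

Here the whole interval lies above 1, in contrast to the relative risk example.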

Mantel-Haenszel statistic

The confounding variable defines k strata.

Stratum i:

               Sample
Risk factor    Case         Control      Total
Exposed        a_i          b_i          a_i + b_i
Not exposed    c_i          d_i          c_i + d_i
Total          a_i + c_i    b_i + d_i    n_i

Expected count in stratum i: $e_i = \dfrac{(a_i + b_i)(a_i + c_i)}{n_i}$

$v_i = \dfrac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$

$\chi^2_{MH} = \dfrac{\left(\sum_{i=1}^{k} a_i - \sum_{i=1}^{k} e_i\right)^2}{\sum_{i=1}^{k} v_i} \sim \chi^2_1$

Common odds ratio: $OR_{MH} = \dfrac{\sum_{i=1}^{k} (a_i d_i / n_i)}{\sum_{i=1}^{k} (b_i c_i / n_i)}$

Example 11.6.3

Age <= 55

Risk factor    OCAD patients   Control   Total
Exposed             21            11       32
Unexposed           16             6       22
Total               37            17       54

Age >= 56

Risk factor    OCAD patients   Control   Total
Exposed             50            14       64
Unexposed           18             6       24
Total               68            20       88

data one;
  input age risk case n;
  cards;
1 1 1 21
1 1 2 11
1 2 1 16
1 2 2 6
2 1 1 50
2 1 2 14
2 2 1 18
2 2 2 6
;
run;
proc freq;
  tables age*risk*case / measures CMH;
  weight n;
run;
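The Mantel-Haenszel formulas can be applied to the two age strata of Example 11.6.3 and compared with R's `mantelhaen.test`. The sketch below is an illustration under the assumption that no continuity correction is wanted (`correct = FALSE`), so that the built-in statistic matches the formula given above; the variable names are illustrative.

```r
# 2 x 2 x 2 array: rows = exposed/unexposed, columns = case/control, layers = age strata
strata <- array(c(21, 16, 11, 6,     # age <= 55
                  50, 18, 14, 6),    # age >= 56
                dim = c(2, 2, 2))

a  <- strata[1, 1, ]; b <- strata[1, 2, ]
cc <- strata[2, 1, ]; d <- strata[2, 2, ]
n  <- a + b + cc + d

# Expected value and variance of a_i under H0, per stratum
e <- (a + b) * (a + cc) / n
v <- (a + b) * (cc + d) * (a + cc) * (b + d) / (n^2 * (n - 1))

chi2_mh <- (sum(a) - sum(e))^2 / sum(v)
or_mh   <- sum(a * d / n) / sum(b * cc / n)

fit <- mantelhaen.test(strata, correct = FALSE)

cat("chi2_MH =", round(chi2_mh, 4), " OR_MH =", round(or_mh, 4), "\n")
```

A common odds ratio close to 1 together with a very small statistic indicates no association between the risk factor and OCAD after adjusting for age.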


Page 8: F2 Chi-square dist’n - Seoul National Universityhosting03.snu.ac.kr/~hokim/int/2019/chap_11.pdf · 2019-05-27 · 이항분포의가정하에서기대도수=기대상대도수*총합

이항분포의 가정하에서 기대도수=기대상대도수총합

Expected freq under binomial distrsquon=probtotal

2525

( ) (1 ) 012 25

ˆ 500 2500 02

x xP X x p p xx

p

각 의사별 신약을 선호하는 환자의 수 의사의 수(119926119946) 기대 상대도수 기대도수(119916119946)

0 5 000378 037779

1 6 002361 23612

2 8 007083 70836

3 10 013577 13577

4 10 018668 18668

5 15 019602 19602

6 17 016335 16335

7 10 011084 11084

8 10 006235 62349

9 9 002944 29442

10 이상 0 001733 17332

합계 100 10000 10000

11 27390

1205682 =11minus27390 2

27390+8minus70836 2

70836+⋯+

0minus17332 2

17332= 47678 gt qchisq(00058lowertail=F)= 21955

We reject Ho Data ~ Binomial

df= 10 minus 2 = 8 constraints 119864119894 = 119874119894 119901 = 119901

예제 1123 포아슨분포 (Poisson distrsquon)

포아슨분포의 가정 하에서 상대도수의 기대치

Expected relative freq under Poisson distrsquon

(X ) 012

xeP x x

x

=3 known

H0 병원의 하루 응급환자의 수는 포아송 분포를 따른다

일일 응급환자 수 날짜 수

0 5

1 14

2 15

3 23

4 16

5 9

6 3

7 3

8 1

9 1

10 이상 0

합계 90

응급환자수 날짜 수(119926119946) 기대 상대도수 기대도수(119916119946)119926119946 minus 119916119946

120784

119916119946

0 5 004979 44808 006015

1 14 014936 13443 002312

2 15 022404 20164 132240

3 23 022404 20164 039895

4 16 016803 15123 005088

5 9 010082 90737 000060

6 3 005041 45368 052060

7 3 002160 19444 057313

8 1 000810 072914

0804829 1 000270 024305

10 이상 0 000110 009922

합계 90 1000 9000 3755

107142

1205682 = 119874119894minus119864119894

2

119864119894=5minus44808 2

44808+⋯+

2minus10714 2

10714= 3755 lt 1198832(095 119889119891 = 9 minus 1 = 8) = 15507

We cannot reject Ho Data ~ Poisson

2 22 2

9 1

(5 450) (2 108)3664 15557 (095)

450 108

113 독립성검정Tests of independence

bull 분할표(contingency table)

1205682 =

119894=1

119903

119895=1

119888119874119894119895 minus 119864119894119895

2

119864119894119895~1205942 119889119891 = 119903 minus 1 119888 minus 1 119864119894119895=

119899119894 ∙ 119899119895119899

두 번째 범주형 변수 첫 번째 범주형 변수

120783 120784 120785 ⋯ 119940 합계

120783 11989911 11989912 11989913 ⋯ 1198991119888 1198991

120784 11989921 11989922 11989923 ⋯ 1198992119888 1198992

120785 11989931 11989932 11989933 ⋯ 1198993119888 1198993

⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮

119955 1198991199031 1198991199032 1198991199033 ⋯ 119899119903119888 119899119903

합계 1198991 1198992 1198993 ⋯ 119899119888 119899

bull예제 1131

치료방법(treatment)

재발여부(relapse)합계(total)

Yes No

A 294 (77255) 921 (1137745) 1215

B 98 (188210) 2862 (2771790) 2960

C 50 (198002) 3064 (2915998) 3114

D 203 (181533) 2652 (2673467) 2855

합계 645 9499 10144

1205682 = 119874 minus 119864 2

119864

=294 minus 77255 2

77255+921 minus 1137745 2

1137745+⋯ = 81641

gt 1198832(095 119889119891 = 3) = 7815df= 119903 minus 1 119888 minus 1 = 4 minus 1 2 minus 1 = 3

Reject (Ho treatment and relapse are independent ) -gt They are not independent

gt datalt-astable(cbind(c(2949850203)c(921286230642652)))

gt dimnames(data)lt-list(trt=c(ABCD)re=c(YN))

gt data

re

trt Y N

A 294 921

B 98 2862

C 50 3064

D 203 2652

gt chisqtest(data)

Pearsons Chi-squared test

data data

X-squared = 81641 df = 3 p-value lt 22e-16

data reinput trt $ re $ count cardsA Y 294 A N 921 B Y 98 B N 2862C Y 50 C N 3064D Y 203 D N 2652proc freq data=reweight counttables trtremeasures chisqrun

bull 작은 기대도수 (small expected freq)기대치 5미만의 cell수가 전체 20를 넘지 않으며 최소기대치가 1이상이면 무관하다 (If min gt1 and cells lt5 are less than 20 then not a problem)

bull 2Ⅹ2 분할표 (table)nlt20 or 20ltnlt49 그리고 기대도수 5이하 일경우에는 -test를 하지 말라

-test is not valid if nlt20 or (20ltnlt49) and expected freq of one or more cells lt 5

2

2

bull2Ⅹ2 table

1205682 =233(131∙36minus52∙14)2

145∙88∙183∙50= 317391 gtgt 1962

Strong evidence to reject (HoSmoking and drinking are independent)

두번째분류기준

첫번째 분류기준

120783 120784 합계

120783 119886 119887 119886 + 119887

120784 119888 119889 119888 + 119889

합계 119886 + 119888 119887 + 119889 119899

SmokingDrinking

Yes No total

Yes 131 52 183

No 14 36 50

Total 145 88 233

②두 집단의 확률에 대한 비교

(Comparing two probabilities)

1 1 2

0 1 2 1 2

1 1 2

1 2

ˆ100 60 120 040

60 100 40 12004909

100 120

060 040295469 196 significant

4903 5091 4903 5091

100 120

ˆ ( )

(1 ) (1 )

a

n p n

p

Z

H p p H p p

p p pZ

p p p p

n n

e g

2p

2p

Yates adjustment (continuity correction)

• $$\chi^2_{\text{corrected}} = \frac{n\,(|ad - bc| - 0.5n)^2}{(a+c)(b+d)(a+b)(c+d)}$$

• $$\chi^2_{\text{corrected}} = \frac{233\,(|131 \cdot 36 - 52 \cdot 14| - 0.5 \cdot 233)^2}{145 \cdot 88 \cdot 183 \cdot 50} = 29.9118$$
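To see how much the Yates correction changes the result, the corrected and uncorrected statistics can be computed side by side. This is a hedged Python sketch; the function names are illustrative, and the p-value helper relies on the fact that a χ² variable with 1 degree of freedom is a squared standard normal.

```python
import math

# 2x2 chi-square with an optional Yates continuity correction.

def chi_square_2x2(a, b, c, d, yates=False):
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff -= 0.5 * n  # continuity correction
    return n * diff ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))


def p_value_df1(x):
    """Upper-tail probability of chi-square with 1 df: P(|Z| > sqrt(x))."""
    return math.erfc(math.sqrt(x / 2))


uncorrected = chi_square_2x2(131, 52, 14, 36)
corrected = chi_square_2x2(131, 52, 14, 36, yates=True)
print(round(uncorrected, 4), round(corrected, 4))
print(p_value_df1(corrected))
```

The correction always shrinks the statistic, so it makes the test slightly more conservative.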

11.4 Homogeneity test

• Homogeneity test: samples are drawn independently from separate populations. Do they have the same distribution, that is, could they have come from one population?

• Independence test: a single sample is drawn from one population. The row and column (marginal) totals are not fixed in advance but arise by chance.

• Independence test vs. homogeneity test

• Example 11.4.1

• Hypothesis H0: patient groups with onset age <= 18 and onset age > 18 have the same distribution of family history.

Family history   <= 18   > 18   Total
A                28      35     63
B                19      38     57
C                41      44     85
D                53      60     113
Total            141     177    318

> data <- as.table(cbind(c(28, 19, 41, 53), c(35, 38, 44, 60)))
> dimnames(data) <- list(trt = c("A", "B", "C", "D"), re = c("Early", "Later"))
> data
   re
trt Early Later
  A    28    35
  B    19    38
  C    41    44
  D    53    60
> chisq.test(data)

        Pearson's Chi-squared test

data:  data
X-squared = 3.6216, df = 3, p-value = 0.3053

-> Do not reject H0
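The statistic and p-value of Example 11.4.1 can be checked without a statistics package. The Python sketch below uses a closed-form upper tail that holds only for 3 degrees of freedom; the helper names are illustrative.

```python
import math

# Homogeneity chi-square for Example 11.4.1, with a p-value valid for df = 3 only.

def chi_square_statistic(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(table, row_totals)
        for o, c in zip(row, col_totals)
    )


def chi_square_sf_df3(x):
    """P(X > x) for chi-square with 3 df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Family history (A-D) by onset age (<= 18, > 18).
family_history = [[28, 35], [19, 38], [41, 44], [53, 60]]
statistic = chi_square_statistic(family_history)
p_value = chi_square_sf_df3(statistic)
print(round(statistic, 4), round(p_value, 4))
```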

Homogeneity test and test of two proportions

Sample   Characteristic 1   Characteristic 2   Total
1        60                 40                 100
2        48                 72                 120
Total    108                112                220

$$H_0: p_1 = p_2 \quad \text{vs} \quad H_A: p_1 \ne p_2, \qquad n_1 = 100,\ \hat p_1 = 0.60,\ n_2 = 120,\ \hat p_2 = 0.40$$

$$z = \frac{(\hat p_1 - \hat p_2) - (p_1 - p_2)_0}{\sqrt{\dfrac{\hat p(1 - \hat p)}{n_1} + \dfrac{\hat p(1 - \hat p)}{n_2}}}$$

$$\hat p = \frac{0.60 \cdot 100 + 0.40 \cdot 120}{100 + 120} = \frac{108}{220} = 0.49091$$

$$z = \frac{0.60 - 0.40}{\sqrt{\dfrac{0.49091 \cdot 0.50909}{100} + \dfrac{0.49091 \cdot 0.50909}{120}}} = 2.95468$$

$$\chi^2 = \frac{220 \cdot [60 \cdot 72 - 40 \cdot 48]^2}{108 \cdot 112 \cdot 100 \cdot 120} = 8.7302$$

-> Reject H0. The two tests agree: z² = 2.95468² ≈ 8.730 = χ².
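The link between the two-proportion z test and the 2×2 χ² test can be verified numerically. A minimal Python sketch using the counts from the table above (variable names are illustrative):

```python
import math

# Pooled two-proportion z test versus the 2x2 chi-square on the same data.
n1, x1 = 100, 60   # sample 1: size and count with characteristic 1
n2, x2 = 120, 48   # sample 2: size and count with characteristic 1

p1, p2 = x1 / n1, x2 / n2
pooled = (x1 + x2) / (n1 + n2)
z = (p1 - p2) / math.sqrt(pooled * (1 - pooled) / n1 + pooled * (1 - pooled) / n2)

a, b, c, d = x1, n1 - x1, x2, n2 - x2
n = n1 + n2
chi2 = n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))

print(round(z, 5), round(z ** 2, 4), round(chi2, 4))
```

The squared z statistic equals the uncorrected χ² statistic, which is why the two procedures always lead to the same decision for a two-sided alternative.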

Fisher's Exact Test

data severe;
  input treat $ outcome $ count;
  cards;
Test f 10
Test u 2
Control f 2
Control u 4
;
proc freq order=data;
  tables treat*outcome / chisq nocol;
  weight count;
run;

The SAS System

The FREQ Procedure

Table of treat by outcome

treat      outcome
Frequency |
Percent   |
Row Pct   | f      | u      | Total
----------+--------+--------+
Test      |     10 |      2 |     12
          |  55.56 |  11.11 |  66.67
          |  83.33 |  16.67 |
----------+--------+--------+
Control   |      2 |      4 |      6
          |  11.11 |  22.22 |  33.33
          |  33.33 |  66.67 |
----------+--------+--------+
Total           12        6       18
             66.67    33.33   100.00

Statistics for Table of treat by outcome

Statistic                       DF     Value     Prob
-----------------------------------------------------
Chi-Square                       1    4.5000   0.0339
Likelihood Ratio Chi-Square      1    4.4629   0.0346
Continuity Adj. Chi-Square       1    2.5313   0.1116
Mantel-Haenszel Chi-Square       1    4.2500   0.0393
Phi Coefficient                       0.5000
Contingency Coefficient               0.4472
Cramer's V                            0.5000

WARNING: 75% of the cells have expected counts less than 5. Chi-Square may not be a valid test.

Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)        10
Left-sided Pr <= F          0.9961
Right-sided Pr >= F         0.0573

Table Probability (P)       0.0533
Two-sided Pr <= P           0.1070

Sample Size = 18

Exact test: table probabilities

Cell (1,1)   Cell (1,2)   Cell (2,1)   Cell (2,2)   Prob
12           0            0            6            0.0001
11           1            1            5            0.0039
10           2            2            4            0.0533
9            3            3            3            0.2370
8            4            4            2            0.4000
7            5            5            1            0.2560
6            6            6            0            0.0498

Probability of the observed table:

$$P = \frac{12!\,12!\,6!\,6!}{10!\,2!\,2!\,4!\,18!} = 0.0533$$

• One-tailed p-value: p = 0.0533 + 0.0039 + 0.0001 = 0.0573

• Two-tailed p-value: p = 0.0533 + 0.0039 + 0.0001 + 0.0498 = 0.1071
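The table probabilities above are hypergeometric probabilities with all margins fixed, so they can be enumerated directly. This Python sketch assumes the two-tailed p-value is the sum of all table probabilities no larger than that of the observed table, which matches the sums shown above; the names are illustrative.

```python
from math import comb

# Fisher's exact test for the 2x2 table with rows (10, 2) and (2, 4).

def table_probability(a, row1, row2, col1):
    """Hypergeometric probability that cell (1,1) equals a, margins fixed."""
    return comb(row1, a) * comb(row2, col1 - a) / comb(row1 + row2, col1)


row1, row2, col1 = 12, 6, 12   # row totals and first column total
observed_a = 10

lowest = max(0, col1 - row2)   # smallest feasible value of cell (1,1)
highest = min(row1, col1)      # largest feasible value of cell (1,1)
probs = {a: table_probability(a, row1, row2, col1) for a in range(lowest, highest + 1)}

p_observed = probs[observed_a]
one_tailed = sum(p for a, p in probs.items() if a >= observed_a)
# Small relative tolerance guards against floating-point ties.
two_tailed = sum(p for p in probs.values() if p <= p_observed * (1 + 1e-9))

for a in sorted(probs, reverse=True):
    print(a, round(probs[a], 4))
print(round(one_tailed, 4), round(two_tailed, 4))
```

Summing unrounded probabilities gives 0.1070 for the two-tailed value, while adding the four rounded entries by hand gives 0.1071.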

H0: the two variables are independent (homogeneous) vs H1: not H0

> fisher.test(matrix(c(7, 3, 5, 6), 2, 2), alternative = "greater")

        Fisher's Exact Test for Count Data

data:  matrix(c(7, 3, 5, 6), 2, 2)
p-value = 0.2449
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
 0.4512625       Inf
sample estimates:
odds ratio
  2.661251

> matrix(c(7, 3, 5, 6), 2, 2)
     [,1] [,2]
[1,]    7    5
[2,]    3    6

McNemar test: matched pairs

data one;
  input hus_resp $ wif_resp $ no;
  datalines;
yes yes 20
yes no 5
no yes 10
no no 10
;
run;

proc freq;
  tables hus_resp*wif_resp / agree;
  weight no;
run;

McNemar test: we do not reject "H0: the approval rates of husband and wife are the same".

Kappa: the confidence interval does not include 0, so we reject the null hypothesis K = 0 at the 95% confidence level.

Kappa = 1: perfect agreement; Kappa > 0.8: excellent agreement; Kappa > 0.4: moderate agreement.
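The slides report only the conclusions of the AGREE output, so the point estimates can be recomputed from the data lines as a check. This Python sketch assumes the uncorrected McNemar statistic (b − c)²/(b + c) and the usual Cohen's kappa; it does not reproduce the kappa confidence interval, and all names are illustrative.

```python
import math

# Husband (rows: yes, no) by wife (columns: yes, no) responses for 45 couples.
table = [[20, 5], [10, 10]]

# McNemar uses only the discordant pairs.
discordant_yes_no = table[0][1]
discordant_no_yes = table[1][0]
mcnemar = (discordant_yes_no - discordant_no_yes) ** 2 / (discordant_yes_no + discordant_no_yes)
p_value = math.erfc(math.sqrt(mcnemar / 2))  # upper tail of chi-square, 1 df

# Cohen's kappa: observed agreement versus agreement expected by chance.
n = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
p_agree = (table[0][0] + table[1][1]) / n
p_chance = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
kappa = (p_agree - p_chance) / (1 - p_chance)

print(round(mcnemar, 4), round(p_value, 4), round(kappa, 4))
```

The p-value is above 0.05, in line with the decision not to reject equal approval rates.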

11.6 Relative risk, odds ratio, and Mantel-Haenszel statistics

• Observational study

• Prospective study

• Retrospective study

Relative risk (O = present, X = absent)

                 Disease
Risk factor      O        X        Total
O                a        b        a + b
X                c        d        c + d
Total            a + c    b + d    n

• $$RR = \frac{a/(a+b)}{c/(c+d)}$$

• $$\text{s.e.}(\ln RR) = \sqrt{\frac{1}{a} + \frac{1}{c} - \frac{1}{a+b} - \frac{1}{c+d}}$$

• Confidence interval for ln RR: $$\ln RR \pm z_{1-\alpha/2} \cdot \text{s.e.}(\ln RR)$$

• Confidence interval for RR: $$e^{\ln RR \,\pm\, z_{1-\alpha/2} \cdot \text{s.e.}(\ln RR)} = RR \cdot e^{\pm z_{1-\alpha/2} \cdot \text{s.e.}(\ln RR)}$$

Example 11.6.1

Smoking   Disease progress: Yes   Disease progress: No   Total
Yes       26                      380                    406
No        178                     3386                   3564
Total     204                     3766                   3970

$$RR = \frac{26/406}{178/3564} = \frac{0.064}{0.050} = 1.28$$

$$\text{Lower limit: } 1.2822 \cdot e^{-1.96 \sqrt{\frac{1}{26} + \frac{1}{178} - \frac{1}{406} - \frac{1}{3564}}} = 0.861$$

$$\text{Upper limit: } 1.2822 \cdot e^{1.96 \sqrt{\frac{1}{26} + \frac{1}{178} - \frac{1}{406} - \frac{1}{3564}}} = 1.910$$

The 95% CI (0.861, 1.910) includes 1.
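The relative risk and its confidence limits in Example 11.6.1 follow directly from the formulas for RR and s.e.(ln RR). A minimal Python sketch, with `relative_risk_ci` as an illustrative name and z = 1.96 for a 95% interval:

```python
import math

# Relative risk with a log-scale confidence interval for a 2x2 table.

def relative_risk_ci(a, b, c, d, z=1.96):
    """a, b: exposed with/without disease; c, d: unexposed with/without disease."""
    rr = (a / (a + b)) / (c / (c + d))
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return rr, rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr)


rr, lower, upper = relative_risk_ci(26, 380, 178, 3386)
print(round(rr, 4), round(lower, 3), round(upper, 3))
print(lower < 1 < upper)  # the interval includes 1
```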

Table 11.6.2

data preg;
  input smoke $ preg $ count;
  cards;
smoke early 26
smoke abnormal 380
nonsmoke early 178
nonsmoke abnormal 3386
;
proc freq order=data;
  weight count;
  tables smoke*preg / measures chisq;
run;

Odds ratio (O = present, X = absent)

                 Sample
Risk factor      Case     Control   Total
O                a        b         a + b
X                c        d         c + d
Total            a + c    b + d     n

• Odds = p/(1 − p)

• Odds among cases: [a/(a + c)] / [c/(a + c)] = a/c

• Odds among controls: [b/(b + d)] / [d/(b + d)] = b/d

• $$OR = \frac{a/c}{b/d} = \frac{ad}{bc}, \qquad \text{s.e.}(\ln OR) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$

• Confidence interval for ln OR: $$\ln OR \pm z_{1-\alpha/2} \cdot \text{s.e.}(\ln OR)$$

• Confidence interval for OR: $$e^{\ln OR \,\pm\, z_{1-\alpha/2} \cdot \text{s.e.}(\ln OR)} = OR \cdot e^{\pm z_{1-\alpha/2} \cdot \text{s.e.}(\ln OR)}$$

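Example 11.6.2

Smoking   Obesity: Yes   Obesity: No   Total
Yes       52             352           404
No        78             3486          3564
Total     130            3838          3968

$$OR = \frac{52 \cdot 3486}{352 \cdot 78} = 6.60$$

$$\text{Lower limit: } 6.6023 \cdot e^{-1.96 \sqrt{\frac{1}{52} + \frac{1}{352} + \frac{1}{78} + \frac{1}{3486}}} = 4.571$$

$$\text{Upper limit: } 6.6023 \cdot e^{1.96 \sqrt{\frac{1}{52} + \frac{1}{352} + \frac{1}{78} + \frac{1}{3486}}} = 9.536$$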
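The odds ratio calculation in Example 11.6.2 can be sketched the same way. The function name `odds_ratio_ci` is illustrative and z = 1.96 corresponds to a 95% interval:

```python
import math

# Odds ratio with a log-scale confidence interval for a 2x2 table.

def odds_ratio_ci(a, b, c, d, z=1.96):
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, odds_ratio * math.exp(-z * se_log_or), odds_ratio * math.exp(z * se_log_or)


odds_ratio, lower, upper = odds_ratio_ci(52, 352, 78, 3486)
print(round(odds_ratio, 4), round(lower, 3), round(upper, 3))
```

Here the whole interval lies above 1, unlike the relative risk interval in Example 11.6.1.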

Table 11.6.4

data obe;
  input smoke $ case $ count;
  cards;
smoke case 52
smoke control 352
nonsmoke case 78
nonsmoke control 3486
;
proc freq data=obe;
  weight count;
  tables smoke*case / measures;
run;

Mantel-Haenszel statistic

The confounding variable defines k strata. Table for stratum i:

Risk factor   Case        Control     Total
Exposed       a_i         b_i         a_i + b_i
Not exposed   c_i         d_i         c_i + d_i
Total         a_i + c_i   b_i + d_i   n_i

Expected frequency in stratum i:

$$e_i = \frac{(a_i + b_i)(a_i + c_i)}{n_i}$$

$$v_i = \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$$

$$\chi^2_{MH} = \frac{\left(\sum_{i=1}^{k} a_i - \sum_{i=1}^{k} e_i\right)^2}{\sum_{i=1}^{k} v_i} \sim \chi^2_1$$

Common odds ratio:

$$OR_{MH} = \frac{\sum_{i=1}^{k} (a_i d_i / n_i)}{\sum_{i=1}^{k} (b_i c_i / n_i)}$$

Example 11.6.3

Age <= 55

Risk factor   OCAD patients   Control   Total
Exposed       21              11        32
Unexposed     16              6         22
Total         37              17        54

Age >= 56

Risk factor   OCAD patients   Control   Total
Exposed       50              14        64
Unexposed     18              6         24
Total         68              20        88

data one;
  input age risk case n;
  cards;
1 1 1 21
1 1 2 11
1 2 1 16
1 2 2 6
2 1 1 50
2 1 2 14
2 2 1 18
2 2 2 6
;
run;

proc freq;
  tables age*risk*case / measures CMH;
  weight n;
run;
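The slides do not show the output for Example 11.6.3, so the sketch below simply applies the Mantel-Haenszel formulas to the two age strata in Python. The function name and tuple layout are assumptions for illustration, and no continuity correction is applied.

```python
# Mantel-Haenszel chi-square and common odds ratio across strata.
# Each stratum is (a, b, c, d): exposed cases, exposed controls,
# unexposed cases, unexposed controls.

def mantel_haenszel(strata):
    sum_a = sum_e = sum_v = numerator = denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        sum_a += a
        sum_e += (a + b) * (a + c) / n                      # expected a_i
        sum_v += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
        numerator += a * d / n
        denominator += b * c / n
    chi2_mh = (sum_a - sum_e) ** 2 / sum_v
    or_mh = numerator / denominator
    return chi2_mh, or_mh


strata = [
    (21, 11, 16, 6),   # age <= 55
    (50, 14, 18, 6),   # age >= 56
]
chi2_mh, or_mh = mantel_haenszel(strata)
print(round(chi2_mh, 4), round(or_mh, 4))
```

The statistic is compared with a χ² distribution with 1 degree of freedom, whose 5% critical value is 3.841.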


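Example 11.2.3: Poisson distribution

Expected relative frequency under the Poisson distribution:

$$P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$$

λ = 3 (known)

H0: the daily number of emergency patients at the hospital follows a Poisson distribution.

Emergency patients per day   Number of days
0                            5
1                            14
2                            15
3                            23
4                            16
5                            9
6                            3
7                            3
8                            1
9                            1
10 or more                   0
Total                        90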

Emergency patients   Days (O_i)   Expected rel. freq.   Expected freq. (E_i)   (O_i − E_i)²/E_i
0                    5            0.04979               4.4808                 0.06015
1                    14           0.14936               13.443                 0.02312
2                    15           0.22404               20.164                 1.32240
3                    23           0.22404               20.164                 0.39895
4                    16           0.16803               15.123                 0.05088
5                    9            0.10082               9.0737                 0.00060
6                    3            0.05041               4.5368                 0.52060
7                    3            0.02160               1.9444                 0.57313
8                    1            0.00810               0.72914
9                    1            0.00270               0.24305
10 or more           0            0.00110               0.09922
Total                90           1.000                 90.00                  3.755

The last three classes (8, 9, 10 or more) are pooled: O = 2, E = 1.07142, contribution 0.80482.

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = \frac{(5 - 4.4808)^2}{4.4808} + \cdots + \frac{(2 - 1.0714)^2}{1.0714} = 3.755 < \chi^2_{0.95}(df = 9 - 1 = 8) = 15.507$$

We cannot reject H0: data ~ Poisson.

With rounded expected frequencies (4.50, …, 1.08):

$$\chi^2_{9-1} = \frac{(5 - 4.50)^2}{4.50} + \cdots + \frac{(2 - 1.08)^2}{1.08} = 3.664 < \chi^2_{0.95}(8) = 15.507$$
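The goodness-of-fit computation for the Poisson example, including the pooling of the last three classes, can be reproduced as follows. This is a Python sketch under the stated assumption λ = 3; the variable names are illustrative and the critical value 15.507 is the one quoted above.

```python
import math

# Chi-square goodness of fit to Poisson(lambda = 3) for daily emergency patients.
lam = 3.0
observed = [5, 14, 15, 23, 16, 9, 3, 3, 1, 1, 0]   # x = 0..9, then "10 or more"
n = sum(observed)

# P(X = x) for x = 0..9, and the remaining tail mass for "10 or more".
probs = [math.exp(-lam) * lam ** x / math.factorial(x) for x in range(10)]
probs.append(1.0 - sum(probs))
expected = [n * p for p in probs]

# Pool classes 8, 9 and "10 or more" into one class, as in the table.
obs_pooled = observed[:8] + [sum(observed[8:])]
exp_pooled = expected[:8] + [sum(expected[8:])]

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
df = len(obs_pooled) - 1   # no parameter estimated, since lambda is known

CRITICAL_095_DF8 = 15.507
print(round(chi2, 3), df, chi2 < CRITICAL_095_DF8)
```

If λ were estimated from the data instead of being known, one more degree of freedom would be lost.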

11.3 Tests of independence

• Contingency table

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2\big(df = (r-1)(c-1)\big), \qquad E_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n}$$

                              Second categorical variable
First categorical variable    1       2       3       ⋯   c       Total
1                             n_11    n_12    n_13    ⋯   n_1c    n_1.
2                             n_21    n_22    n_23    ⋯   n_2c    n_2.
3                             n_31    n_32    n_33    ⋯   n_3c    n_3.
⋮                             ⋮       ⋮       ⋮       ⋮   ⋮       ⋮
r                             n_r1    n_r2    n_r3    ⋯   n_rc    n_r.
Total                         n_.1    n_.2    n_.3    ⋯   n_.c    n



Page 11: F2 Chi-square dist’n - Seoul National Universityhosting03.snu.ac.kr/~hokim/int/2019/chap_11.pdf · 2019-05-27 · 이항분포의가정하에서기대도수=기대상대도수*총합

113 독립성검정Tests of independence

bull 분할표(contingency table)

1205682 =

119894=1

119903

119895=1

119888119874119894119895 minus 119864119894119895

2

119864119894119895~1205942 119889119891 = 119903 minus 1 119888 minus 1 119864119894119895=

119899119894 ∙ 119899119895119899

두 번째 범주형 변수 첫 번째 범주형 변수

120783 120784 120785 ⋯ 119940 합계

120783 11989911 11989912 11989913 ⋯ 1198991119888 1198991

120784 11989921 11989922 11989923 ⋯ 1198992119888 1198992

120785 11989931 11989932 11989933 ⋯ 1198993119888 1198993

⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮

119955 1198991199031 1198991199032 1198991199033 ⋯ 119899119903119888 119899119903

합계 1198991 1198992 1198993 ⋯ 119899119888 119899

bull예제 1131

치료방법(treatment)

재발여부(relapse)합계(total)

Yes No

A 294 (77255) 921 (1137745) 1215

B 98 (188210) 2862 (2771790) 2960

C 50 (198002) 3064 (2915998) 3114

D 203 (181533) 2652 (2673467) 2855

합계 645 9499 10144

1205682 = 119874 minus 119864 2

119864

=294 minus 77255 2

77255+921 minus 1137745 2

1137745+⋯ = 81641

gt 1198832(095 119889119891 = 3) = 7815df= 119903 minus 1 119888 minus 1 = 4 minus 1 2 minus 1 = 3

Reject (Ho treatment and relapse are independent ) -gt They are not independent

gt datalt-astable(cbind(c(2949850203)c(921286230642652)))

gt dimnames(data)lt-list(trt=c(ABCD)re=c(YN))

gt data

re

trt Y N

A 294 921

B 98 2862

C 50 3064

D 203 2652

gt chisqtest(data)

Pearsons Chi-squared test

data data

X-squared = 81641 df = 3 p-value lt 22e-16

data reinput trt $ re $ count cardsA Y 294 A N 921 B Y 98 B N 2862C Y 50 C N 3064D Y 203 D N 2652proc freq data=reweight counttables trtremeasures chisqrun

bull 작은 기대도수 (small expected freq)기대치 5미만의 cell수가 전체 20를 넘지 않으며 최소기대치가 1이상이면 무관하다 (If min gt1 and cells lt5 are less than 20 then not a problem)

bull 2Ⅹ2 분할표 (table)nlt20 or 20ltnlt49 그리고 기대도수 5이하 일경우에는 -test를 하지 말라

-test is not valid if nlt20 or (20ltnlt49) and expected freq of one or more cells lt 5

2

2

bull2Ⅹ2 table

1205682 =233(131∙36minus52∙14)2

145∙88∙183∙50= 317391 gtgt 1962

Strong evidence to reject (HoSmoking and drinking are independent)

두번째분류기준

첫번째 분류기준

120783 120784 합계

120783 119886 119887 119886 + 119887

120784 119888 119889 119888 + 119889

합계 119886 + 119888 119887 + 119889 119899

SmokingDrinking

Yes No total

Yes 131 52 183

No 14 36 50

Total 145 88 233

②두 집단의 확률에 대한 비교

(Comparing two probabilities)

1 1 2

0 1 2 1 2

1 1 2

1 2

ˆ100 60 120 040

60 100 40 12004909

100 120

060 040295469 196 significant

4903 5091 4903 5091

100 120

ˆ ( )

(1 ) (1 )

a

n p n

p

Z

H p p H p p

p p pZ

p p p p

n n

e g

2p

2p

Yates adjustment (보정)

bull 120568corrected2 =

119899( 119886119889minus119887119888 minus05119899)2

(119886+119888)(119887+119889)(119886+119887)(119888+119889)

bull 1205682 =233( 131∙36minus52∙14 minus05∙233)2

145∙88∙183∙50= 299118

11.4 Tests of homogeneity

• Homogeneity test: samples are drawn independently from each population. Do they follow the same distribution, that is, could they have been selected from one population?

• Independence test: a single sample is drawn from one population. The row and column totals are not controlled; they are determined by chance.

• Test of independence vs. test of homogeneity

• Example 11.4.1

• H0: patient groups with onset age ≤ 18 and onset age > 18 have the same distribution of family history.

Family history    ≤ 18    > 18    Total
A                   28      35       63
B                   19      38       57
C                   41      44       85
D                   53      60      113
Total              141     177      318

> data <- as.table(cbind(c(28, 19, 41, 53), c(35, 38, 44, 60)))
> dimnames(data) <- list(trt = c("A", "B", "C", "D"), re = c("Early", "Later"))
> data
   re
trt Early Later
  A    28    35
  B    19    38
  C    41    44
  D    53    60
> chisq.test(data)

	Pearson's Chi-squared test

data:  data
X-squared = 3.6216, df = 3, p-value = 0.3053

-> Do not reject H0

Homogeneity test and the test of two population proportions

\[ H_0: p_1 = p_2 \quad \text{vs.} \quad H_A: p_1 \ne p_2, \qquad n_1 = 100,\ \hat p_1 = 0.60,\ n_2 = 120,\ \hat p_2 = 0.40 \]

\[ z = \frac{(\hat p_1 - \hat p_2) - (p_1 - p_2)_0}{\sqrt{\dfrac{\bar p(1-\bar p)}{n_1} + \dfrac{\bar p(1-\bar p)}{n_2}}} \]

\[ \bar p = \frac{0.60 \cdot 100 + 0.40 \cdot 120}{100 + 120} = \frac{108}{220} = 0.49091 \]

\[ z = \frac{0.60 - 0.40}{\sqrt{\dfrac{0.49091 \cdot 0.50909}{100} + \dfrac{0.49091 \cdot 0.50909}{120}}} = 2.95468 \]

\[ \chi^2 = \frac{220\,[60 \cdot 72 - 40 \cdot 48]^2}{108 \cdot 112 \cdot 100 \cdot 120} = 8.7302 = z^2 \]

-> Reject H0

Sample \ Characteristic    1      2      Total
1                          60     40     100
2                          48     72     120
Total                     108    112     220
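The link between the two tests (the χ² statistic of the 2×2 table equals the square of the pooled z statistic) can be verified numerically. This is a small sketch with illustrative variable names, using the counts from the table above.

```r
# Successes and sample sizes for the two samples
x <- c(60, 48)
n <- c(100, 120)

p_hat <- x / n                 # 0.60 and 0.40
p_pool <- sum(x) / sum(n)      # pooled proportion, 108 / 220

# Pooled two-sample z statistic under H0: p1 = p2
z <- (p_hat[1] - p_hat[2]) / sqrt(p_pool * (1 - p_pool) * sum(1 / n))

# Pearson chi-square on the corresponding 2x2 table, no continuity correction
tab <- cbind(success = x, failure = n - x)
chisq <- unname(chisq.test(tab, correct = FALSE)$statistic)

round(c(z = z, z_squared = z^2, chisq = chisq), 5)
```

Because the two statistics are equivalent, the two-sided z test and the χ² test with 1 degree of freedom give the same p-value.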

Fisher's Exact Test

data severe;
  input treat $ outcome $ count;
  cards;
Test f 10
Test u 2
Control f 2
Control u 4
;
proc freq order=data;
  tables treat*outcome / chisq nocol;
  weight count;
run;

The SAS System

The FREQ Procedure

Table of treat by outcome

treat      outcome

Frequency|
Percent  |
Row Pct  |f       |u       |  Total
---------+--------+--------+
Test     |     10 |      2 |     12
         |  55.56 |  11.11 |  66.67
         |  83.33 |  16.67 |
---------+--------+--------+
Control  |      2 |      4 |      6
         |  11.11 |  22.22 |  33.33
         |  33.33 |  66.67 |
---------+--------+--------+
Total          12        6       18
            66.67    33.33   100.00

Statistics for Table of treat by outcome

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1      4.5000    0.0339
Likelihood Ratio Chi-Square    1      4.4629    0.0346
Continuity Adj. Chi-Square     1      2.5313    0.1116
Mantel-Haenszel Chi-Square     1      4.2500    0.0393
Phi Coefficient                       0.5000
Contingency Coefficient               0.4472
Cramer's V                            0.5000

WARNING: 75% of the cells have expected counts less than 5. Chi-Square may not be a valid test.

Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)        10
Left-sided Pr <= F          0.9961
Right-sided Pr >= F         0.0573

Table Probability (P)       0.0533
Two-sided Pr <= P           0.1070

Sample Size = 18
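The warning in the output can be reproduced by computing the expected frequencies directly. The sketch below is in R rather than SAS and uses illustrative names; it applies the rule E = (row total × column total) / n to the treat by outcome table.

```r
# Treat (rows) by outcome (columns) counts from the SAS example
tab <- matrix(c(10, 2, 2, 4), nrow = 2,
              dimnames = list(treat = c("Test", "Control"),
                              outcome = c("f", "u")))

# Expected frequency of each cell under independence
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

share_small <- mean(expected < 5)   # proportion of cells with E < 5
min_expected <- min(expected)       # smallest expected frequency

expected
c(share_small = share_small, min_expected = min_expected)
```

Three of the four expected frequencies (4, 4 and 2) are below 5, which is the 75% reported in the warning and the reason an exact test is preferred here.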

Exact Test: table probabilities

Table cell
(1,1)    (1,2)    (2,1)    (2,2)    Prob
12       0        0        6        .0001
11       1        1        5        .0039
10       2        2        4        .0533
 9       3        3        3        .2370
 8       4        4        2        .4000
 7       5        5        1        .2560
 6       6        6        0        .0498

Probability of the observed table:

\[ P = \frac{12!\,12!\,6!\,6!}{10!\,2!\,2!\,4!\,18!} = 0.0533 \]

• One-tailed p-value: $p = 0.0533 + 0.0039 + 0.0001 = 0.0573$

• Two-tailed p-value: $p = 0.0533 + 0.0039 + 0.0001 + 0.0498 = 0.1071$
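The table probabilities above are hypergeometric probabilities with all margins fixed, so they can be regenerated with `dhyper`. This is a sketch with illustrative names; the two-tailed rule used here (sum every table whose probability does not exceed that of the observed table) is the one that matches the figures above and R's `fisher.test`.

```r
# Possible values of cell (1,1) given margins: rows 12 and 6, columns 12 and 6
a_values <- 6:12

# P(cell (1,1) = a): 12 subjects with outcome f, 6 with outcome u,
# and 12 subjects in the Test row
probs <- dhyper(a_values, m = 12, n = 6, k = 12)
names(probs) <- a_values

p_obs <- probs["10"]                               # observed table
p_one_tailed <- sum(probs[a_values >= 10])         # tables at least as extreme
p_two_tailed <- sum(probs[probs <= p_obs * (1 + 1e-7)])

# Built-in exact test on the same table, for comparison
ft <- fisher.test(matrix(c(10, 2, 2, 4), nrow = 2))

round(probs, 4)
round(c(one_tailed = p_one_tailed, two_tailed = p_two_tailed,
        fisher_test = ft$p.value), 4)
```

The small difference between 0.1071 above and 0.1070 in the SAS output comes from summing rounded table probabilities by hand.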

H0: the two variables are independent (homogeneous) vs. H1: not H0

> fisher.test(matrix(c(7, 3, 5, 6), 2, 2), alternative = "greater")

	Fisher's Exact Test for Count Data

data:  matrix(c(7, 3, 5, 6), 2, 2)
p-value = 0.2449
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
 0.4512625       Inf
sample estimates:
odds ratio 
  2.661251 

> matrix(c(7, 3, 5, 6), 2, 2)
     [,1] [,2]
[1,]    7    5
[2,]    3    6

McNemar Test: matched pairs

data one;
  input hus_resp $ wif_resp $ no;
  datalines;
yes yes 20
yes no 5
no yes 10
no no 10
;
run;

proc freq;
  tables hus_resp*wif_resp / agree;
  weight no;
run;

We do not reject "H0: the approval rates of husband and wife are the same."

Kappa = 1: perfect agreement; Kappa > 0.8: excellent agreement; Kappa > 0.4: moderate agreement.

The 95% confidence interval for kappa does not include 0, so we reject the null hypothesis κ = 0 at the 95% confidence level.
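The slides run this analysis only through the SAS `agree` option and do not print the formulas. As a hedged sketch, the standard definitions can be evaluated in R: the McNemar statistic (b − c)²/(b + c) on the discordant pairs, assuming no continuity correction, and Cohen's kappa (p_o − p_e)/(1 − p_e). Variable names are illustrative, and `c2` stands for the lower-left discordant cell.

```r
# Husband (rows) by wife (columns) responses from the SAS data step
tab <- matrix(c(20, 10, 5, 10), nrow = 2,
              dimnames = list(husband = c("yes", "no"),
                              wife = c("yes", "no")))
n <- sum(tab)

# McNemar statistic uses only the discordant pairs
b <- tab[1, 2]    # husband yes, wife no
c2 <- tab[2, 1]   # husband no, wife yes
mcnemar_stat <- (b - c2)^2 / (b + c2)
mcnemar_p <- pchisq(mcnemar_stat, df = 1, lower.tail = FALSE)

# Built-in version without continuity correction, for comparison
mc <- mcnemar.test(tab, correct = FALSE)

# Cohen's kappa: observed agreement vs. agreement expected by chance
p_o <- sum(diag(tab)) / n
p_e <- sum(rowSums(tab) * colSums(tab)) / n^2
kappa <- (p_o - p_e) / (1 - p_e)

round(c(mcnemar = mcnemar_stat, p_value = mcnemar_p, kappa = kappa), 4)
```

The p-value is well above 0.05, which is consistent with not rejecting equal approval rates.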

11.6 Relative risk, odds ratio, and Mantel-Haenszel statistics

• Observational study

• Prospective study

• Retrospective study

Relative risk

Risk \ Disease    Yes (O)    No (X)    Total
Yes (O)           a          b         a + b
No (X)            c          d         c + d
Total             a + c      b + d     n

• $RR = \dfrac{a/(a+b)}{c/(c+d)}$

• $\mathrm{s.e.}(\ln RR) = \sqrt{\dfrac{1}{a} - \dfrac{1}{a+b} + \dfrac{1}{c} - \dfrac{1}{c+d}}$

• Confidence interval for ln RR: $\ln RR \pm z_{1-\alpha/2} \cdot \mathrm{s.e.}(\ln RR)$

• Confidence interval for RR: $e^{\ln RR \,\pm\, z_{1-\alpha/2} \cdot \mathrm{s.e.}(\ln RR)} = RR \cdot e^{\pm z_{1-\alpha/2} \cdot \mathrm{s.e.}(\ln RR)}$

Example 11.6.1

Table 11.6.2

Smoking \ Disease progress    Yes     No      Total
Yes                            26     380      406
No                            178    3386     3564
Total                         204    3766     3970

\[ RR = \frac{26/406}{178/3564} = \frac{0.064}{0.050} = 1.28 \]

\[ 1.2822 \cdot \exp\!\left(-1.96 \cdot \sqrt{\frac{1}{26} - \frac{1}{406} + \frac{1}{178} - \frac{1}{3564}}\right) = 0.861 \]

\[ 1.2822 \cdot \exp\!\left(1.96 \cdot \sqrt{\frac{1}{26} - \frac{1}{406} + \frac{1}{178} - \frac{1}{3564}}\right) = 1.910 \]

The 95% CI (0.861, 1.910) includes 1.
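The relative risk and its interval for the smoking table can be computed directly from the four cell counts. This is a minimal R sketch with illustrative names (`c2` stands for cell c); it uses `qnorm(0.975)` rather than the rounded 1.96, so the last digit may differ slightly from a hand calculation.

```r
# Cell counts: smokers (a, b) and non-smokers (c2, d); first column = disease progress
a <- 26;   b <- 380
c2 <- 178; d <- 3386

# Relative risk: risk among smokers divided by risk among non-smokers
rr <- (a / (a + b)) / (c2 / (c2 + d))

# Standard error of ln(RR)
se_ln_rr <- sqrt(1 / a - 1 / (a + b) + 1 / c2 - 1 / (c2 + d))

# 95% confidence interval, back-transformed from the log scale
ci <- rr * exp(c(-1, 1) * qnorm(0.975) * se_ln_rr)

round(c(RR = rr, lower = ci[1], upper = ci[2]), 3)
```

The interval reproduces 0.861 to 1.910 only when the two 1/(row total) terms are both subtracted inside the square root.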

data preg;
  input smoke $ preg $ count;
  cards;
smoke early 26
smoke abnormal 380
nonsmoke early 178
nonsmoke abnormal 3386
;
proc freq order=data;
  weight count;
  tables smoke*preg / measures chisq;
run;

Odds ratio

Risk \ Sample    Case     Control    Total
Yes (O)          a        b          a + b
No (X)           c        d          c + d
Total            a + c    b + d      n

• Odds $= p/(1-p)$

• Odds of exposure in the case group: $\dfrac{a/(a+c)}{c/(a+c)} = \dfrac{a}{c}$

• Odds of exposure in the control group: $\dfrac{b/(b+d)}{d/(b+d)} = \dfrac{b}{d}$

• $OR = \dfrac{a/c}{b/d} = \dfrac{ad}{bc}$

• $\mathrm{s.e.}(\ln OR) = \sqrt{\dfrac{1}{a} + \dfrac{1}{b} + \dfrac{1}{c} + \dfrac{1}{d}}$

• Confidence interval for ln OR: $\ln OR \pm z_{1-\alpha/2} \cdot \mathrm{s.e.}(\ln OR)$

• Confidence interval for OR: $e^{\ln OR \,\pm\, z_{1-\alpha/2} \cdot \mathrm{s.e.}(\ln OR)} = OR \cdot e^{\pm z_{1-\alpha/2} \cdot \mathrm{s.e.}(\ln OR)}$

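Example 11.6.2

Table 11.6.4

Smoking \ Obesity    Yes     No      Total
Yes                   52     352      404
No                    78    3486     3564
Total                130    3838     3968

\[ OR = \frac{52 \cdot 3486}{352 \cdot 78} = 6.60 \]

\[ 6.6023 \cdot \exp\!\left(-1.96 \cdot \sqrt{\frac{1}{52} + \frac{1}{352} + \frac{1}{78} + \frac{1}{3486}}\right) = 4.571 \]

\[ 6.6023 \cdot \exp\!\left(1.96 \cdot \sqrt{\frac{1}{52} + \frac{1}{352} + \frac{1}{78} + \frac{1}{3486}}\right) = 9.536 \]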
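The odds ratio and its interval follow the same pattern as the relative risk. A short R sketch, with illustrative names (`c2` stands for cell c) and `qnorm(0.975)` in place of the rounded 1.96:

```r
# Cell counts: smokers (a, b) and non-smokers (c2, d); first column = cases
a <- 52;  b <- 352
c2 <- 78; d <- 3486

# Odds ratio as the cross-product ratio ad / bc
odds_ratio <- (a * d) / (b * c2)

# Standard error of ln(OR): square root of the sum of reciprocal cell counts
se_ln_or <- sqrt(1 / a + 1 / b + 1 / c2 + 1 / d)

# 95% confidence interval, back-transformed from the log scale
ci <- odds_ratio * exp(c(-1, 1) * qnorm(0.975) * se_ln_or)

round(c(OR = odds_ratio, lower = ci[1], upper = ci[2]), 3)
```

Since the whole interval lies above 1, the odds of obesity differ between smokers and non-smokers at the 5% level.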

data obe;
  input smoke $ case $ count;
  cards;
smoke case 52
smoke control 352
nonsmoke case 78
nonsmoke control 3486
;
proc freq data=obe;
  weight count;
  tables smoke*case / measures;
run;

Mantel-Haenszel statistic

The confounding variable defines k strata.

Stratum i:

Risk           Case         Control      Total
Exposed        a_i          b_i          a_i + b_i
Not exposed    c_i          d_i          c_i + d_i
Total          a_i + c_i    b_i + d_i    n_i

Expected frequency in stratum i:

\[ e_i = \frac{(a_i + b_i)(a_i + c_i)}{n_i} \]

\[ v_i = \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2\,(n_i - 1)} \]

\[ \chi^2_{MH} = \frac{\left(\sum_{i=1}^{k} a_i - \sum_{i=1}^{k} e_i\right)^2}{\sum_{i=1}^{k} v_i} \sim \chi^2_1 \]

Common odds ratio:

\[ OR_{MH} = \frac{\sum_{i=1}^{k} (a_i d_i / n_i)}{\sum_{i=1}^{k} (b_i c_i / n_i)} \]

Example 11.6.3

Age ≤ 55

Risk         OCAD patients    Control    Total
Exposed      21               11         32
Unexposed    16                6         22
Total        37               17         54

Age ≥ 56

Risk         OCAD patients    Control    Total
Exposed      50               14         64
Unexposed    18                6         24
Total        68               20         88
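The Mantel-Haenszel formulas can be applied to the two age strata of this example in R. The sketch below uses illustrative names (`c2` stands for cell c) and compares the hand computation with `mantelhaen.test` with the continuity correction turned off, which is an assumption made here so that both follow the uncorrected formula.

```r
# 2 x 2 x k array: risk (rows) by status (columns) by age stratum
strata <- array(c(21, 16, 11, 6,
                  50, 18, 14, 6),
                dim = c(2, 2, 2),
                dimnames = list(risk = c("Exposed", "Unexposed"),
                                status = c("Case", "Control"),
                                age = c("<=55", ">=56")))

# Cell counts per stratum (each is a vector of length k = 2)
a <- strata[1, 1, ]; b <- strata[1, 2, ]
c2 <- strata[2, 1, ]; d <- strata[2, 2, ]
n <- a + b + c2 + d

e <- (a + b) * (a + c2) / n                                    # expected a_i
v <- (a + b) * (c2 + d) * (a + c2) * (b + d) / (n^2 * (n - 1)) # variance of a_i

chisq_mh <- (sum(a) - sum(e))^2 / sum(v)      # Mantel-Haenszel chi-square, 1 df
or_mh <- sum(a * d / n) / sum(b * c2 / n)     # common odds ratio

# Built-in test without continuity correction, for comparison
mh <- mantelhaen.test(strata, correct = FALSE)

round(c(chisq_mh = chisq_mh, or_mh = or_mh), 4)
```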

data one;
  input age risk case n;
  cards;
1 1 1 21
1 1 2 11
1 2 1 16
1 2 2 6
2 1 1 50
2 1 2 14
2 2 1 18
2 2 2 6
;
run;

proc freq;
  tables age*risk*case / measures cmh;
  weight n;
run;

• Example 11.3.1

Observed frequencies, with expected frequencies in parentheses:

Treatment    Relapse: Yes     Relapse: No        Total
A            294 (77.255)      921 (1137.745)     1215
B             98 (188.210)    2862 (2771.790)     2960
C             50 (198.002)    3064 (2915.998)     3114
D            203 (181.533)    2652 (2673.467)     2855
Total        645              9499               10144

\[ \chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(294 - 77.255)^2}{77.255} + \frac{(921 - 1137.745)^2}{1137.745} + \cdots = 816.41 > \chi^2(0.95,\ df = 3) = 7.815 \]

\[ df = (r - 1)(c - 1) = (4 - 1)(2 - 1) = 3 \]

Reject H0 (treatment and relapse are independent): treatment and relapse are not independent.
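The expected frequencies, the test statistic and the critical value for this example can be rebuilt step by step in R. This is a sketch with illustrative names; it applies E = (row total × column total) / n to every cell instead of calling `chisq.test`.

```r
# Observed counts: treatment (rows) by relapse (columns)
obs <- matrix(c(294, 98, 50, 203,
                921, 2862, 3064, 2652), ncol = 2,
              dimnames = list(trt = c("A", "B", "C", "D"),
                              re = c("Y", "N")))

# Expected counts under independence
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson chi-square statistic and its degrees of freedom
chisq <- sum((obs - expected)^2 / expected)
df <- (nrow(obs) - 1) * (ncol(obs) - 1)

# Upper 5% critical value of the chi-square distribution
critical <- qchisq(0.95, df)

round(expected, 3)
round(c(chisq = chisq, df = df, critical = critical), 3)
```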

Page 14: F2 Chi-square dist’n - Seoul National Universityhosting03.snu.ac.kr/~hokim/int/2019/chap_11.pdf · 2019-05-27 · 이항분포의가정하에서기대도수=기대상대도수*총합

bull 작은 기대도수 (small expected freq)기대치 5미만의 cell수가 전체 20를 넘지 않으며 최소기대치가 1이상이면 무관하다 (If min gt1 and cells lt5 are less than 20 then not a problem)

bull 2Ⅹ2 분할표 (table)nlt20 or 20ltnlt49 그리고 기대도수 5이하 일경우에는 -test를 하지 말라

-test is not valid if nlt20 or (20ltnlt49) and expected freq of one or more cells lt 5

2

2

bull2Ⅹ2 table

1205682 =233(131∙36minus52∙14)2

145∙88∙183∙50= 317391 gtgt 1962

Strong evidence to reject (HoSmoking and drinking are independent)

두번째분류기준

첫번째 분류기준

120783 120784 합계

120783 119886 119887 119886 + 119887

120784 119888 119889 119888 + 119889

합계 119886 + 119888 119887 + 119889 119899

SmokingDrinking

Yes No total

Yes 131 52 183

No 14 36 50

Total 145 88 233

②두 집단의 확률에 대한 비교

(Comparing two probabilities)

1 1 2

0 1 2 1 2

1 1 2

1 2

ˆ100 60 120 040

60 100 40 12004909

100 120

060 040295469 196 significant

4903 5091 4903 5091

100 120

ˆ ( )

(1 ) (1 )

a

n p n

p

Z

H p p H p p

p p pZ

p p p p

n n

e g

2p

2p

Yates adjustment (보정)

bull 120568corrected2 =

119899( 119886119889minus119887119888 minus05119899)2

(119886+119888)(119887+119889)(119886+119887)(119888+119889)

bull 1205682 =233( 131∙36minus52∙14 minus05∙233)2

145∙88∙183∙50= 299118

114 동질성 검정 (homogeneity test)

bull 동질성 검정 각 각의 모집단에서 독립적으로 뽑은 표본들의 분포가 서로 동질의 것인가

bull Homogeneity test Are two samples selected from one population

bull 독립성 검정 한 모집단에서 표본 추출 행과 열의 합계는 조절이 아니고 우연히 나타난다

bull Independent test selected from a population Marginal totals are randomly determined

bull 독립성 검정 vs 동질성 검정

bull Independent test vs homogeneity test

bull예제 1141

bull 가설 Patient groups with on-set age lt=18 and age gt 18 have same distributions of family history

gt datalt-astable(cbind(c(28194153)c(35384460)))

gt dimnames(data)lt-list(trt=c(ABCD)re=c(EarlyLater))

gt chisqtest(data)

Pearsons Chi-squared test

data data

X-squared = 36216 df = 3 p-value = 03053

-gt Do not reject Ho

0H

Family History lt=18 gt 18 Total

A 28 35 63

B 19 38 57

C 41 44 85

D 53 60 113

합계 141 177 318

gt datare

trt Early LaterA 28 35B 19 38C 41 44D 53 60

동질성 검정과 모비율 검정

1198670 1199011 = 1199012 119907119904 119867119860 ∶ 1199011 ne 1199012 1198991 = 100 1199011 = 060 1198992 = 120 1199012 = 040

119911 = 1199011minus 1199012 minus( 1199011minus 1199012)0 119901(1minus 119901)

1198991+ 119901(1minus 119901)

1198992

119901 =060∙100+040∙120

100+120=108

220= 049091

119911 =060 minus 040

049091 ∙ 050909100 +

049091 ∙ 050909120

= 295468

1205682 =220 ∙ [60 ∙ 72 minus 40 ∙ 48]2

108 ∙ 112 ∙ 100 ∙ 120= 87302

-gt Reject Ho

표본특성

1 2 합계1 60 40 100

2 48 72 120

합계 108 112 220

data severe

input treat $ outcome $ count

cards

Test f 10

Test u 2

Control f 2

Control u 4

proc freq order=data

tables treatoutcome chisq nocol

weight count

run

Fisherrsquos Exact Test

SAS 시스템

FREQ 프로시저

treat outcome 교차표

treat outcome

빈도|백분율|

행 백분율|f |u | 총합-----------+--------+--------+

Test | 10 | 2 | 12

| 5556 | 1111 | 6667

| 8333 | 1667 |

-----------+--------+--------+

Control | 2 | 4 | 6

| 1111 | 2222 | 3333

| 3333 | 6667 |

-----------+--------+--------+

총합 12 6 18

6667 3333 10000

treat outcome 테이블에 대한 통계량

통계량 자유도 값 확률값----------------------------------------------------------카이제곱 1 45000 00339우도비 카이제곱 1 44629 00346연속성 수정 카이제곱 1 25313 01116Mantel-Haenszel 카이제곱 1 42500 00393파이 계수 05000분할 계수 04472크래머의 V 05000

경고 셀들의 75가 5보다 작은 기대도수를 가지고 있습니다카이제곱 검정은 올바르지 않을 수 있습니다

Fisher의 정확 검정----------------------------(11) 셀 빈도(F) 10하단측 p값 Pr lt= F 09961상단측 p값 Pr gt= F 00573

테이블 확률 (P) 00533양측 p값 Pr lt= P 01070

표본 크기 = 18

Exact Test

Table Cell

(11) (12) (21) (22) Prob

12 0 0 6 0001

11 1 1 5 0039

10 2 2 4 0533

9 3 3 3 2370

8 4 4 2 4000

7 5 5 1 2560

6 6 6 0 0498

=12 12 6 6

10 2 2 4 18

Table Probabilities

bull One-tailed p-value

bull Two-tailed p-value

00533 00039 00001 00573p

00533 00039 00001 00498 01071p

H0 두 변수는 서로 독립(동질)이다 vs H1 not H0

gt fishertest(matrix(c(7356)22)alternative=greater)

Fishers Exact Test for Count Data

data matrix(c(7 3 5 6) 2 2)

p-value = 02449

alternative hypothesis true odds ratio is greater than 1

95 percent confidence interval

04512625 Inf

sample estimates

odds ratio

2661251

gt matrix(c(7356)22)

[1] [2]

[1] 7 5

[2] 3 6

McNemar Test Matched pairs

data one

input hus_resp $ wif_resp $ no

datalines

yes yes 20

yes no 5

no yes 10

no no 10

run

proc freq

tables hus_respwif_resp agree

weight no

run

ldquoHo husband and wife 의 approval rates는 같다rdquo를 기각하지 못함

We do not reject ldquoHo approval rates of husband and wife are the samerdquo

신뢰구간이 0을 포함하지 않으므로 K=0 이라는 귀무가설을 95 신뢰수준에서 기각한다

Kappa=1 gtgt perfect agreement Kappa gt 08 gtgt excellent agreement Kappa gt 04 gtgt moderate agreement

CI does not include 0 -gt we reject the null hypo of K=0 by 95 confidence level

116 Relative risk odds ratio and Mantel-Haenszel statistics

bull 관찰연구 (observational study)

bull 전향적 연구 (prospective study)

bull 후향적 연구 (retrospective study)

상대위험도 (Relative risk)Disease

Risk O X

O 119886 119887 119886 + 119887

X 119888 119889 119888 + 119889

119886 + 119888 119887 + 119889 119899

bull 119877119877 =119886

119886+119887119888

119888+119889

bull s e ln 119877119877 =1

119886+1

119888minus

1

119886+119887+

1

119888+119889

bull ln 119877119877 plusmn 1199111minus1205722 ∙ s e ln 119877119877

bull 119890ln 119877119877 plusmn119911

1minus1205722∙se ln 119877119877

= 119877119877 ∙ 119890^ plusmn1199111minus1205722 ∙ s e ln 119877119877

예제 1161 Relative risk odds ratio and Mantel-Haenszel statistics

119877119877 =26406

1783564=0064

0050= 128

12822 ∙ 119890^ minus196 ∙1

26+

1

178minus

1

406+

1

3564= 0861

12822 ∙ 119890^ 196 ∙1

26+

1

178minus

1

406+

1

3564= 1910

95 CI includes 1

Smoking Disease progress

Yes NoYes 26 380 406

No 178 3386 3564

204 3766 3970

표1162

data preg

input smoke $ preg $ count

cards

smoke early 26

smoke abnormal 380

nonsmoke early 178

nonsmoke abnormal 3386

proc freq order=data

weight count

tables smokepregmeasures chisq

run

오즈비 (Odds Ratio)sample

Risk case control

O 119886 119887 119886 + 119887

X 119888 119889 119888 + 119889

119886 + 119888 119887 + 119889 119899

bull Odds = p(1-p)

bull 환자집단 오즈 [119886(119886 + 119888)][119888(119886 + 119888)] = 119886119888

bull 정상집단 오즈 [119887(119887 + 119889)][119889(119887 + 119889)] = 119887119889

bull 119874119877 =119886

119888119887

119889

=119886119889

119887119888s e ln 119874119877 =

1

119886+1

119887+1

119888+1

119889

CI ln 119874119877 plusmn 1199111minus1205722 ∙ s e ln 119874119877

bull 119890ln 119874119877 plusmn119911

1minus1205722∙se ln OR

=

= 119874119877 ∙ 119890^ plusmn1199111minus1205722 ∙ s e ln 119874119877

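Example 11.6.2

$OR = \dfrac{52 \cdot 3486}{352 \cdot 78} = 6.60$

Lower limit: $6.6023 \cdot e^{-1.96\sqrt{\frac{1}{52} + \frac{1}{352} + \frac{1}{78} + \frac{1}{3486}}} = 4.571$

Upper limit: $6.6023 \cdot e^{1.96\sqrt{\frac{1}{52} + \frac{1}{352} + \frac{1}{78} + \frac{1}{3486}}} = 9.536$

Table 11.6.4

             Obesity
Smoking   Yes      No       Total
Yes       52       352      404
No        78       3486     3564
Total     130      3838     3968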

data obe;
  input smoke $ case $ count;
  cards;
smoke case 52
smoke control 352
nonsmoke case 78
nonsmoke control 3486
;
proc freq data=obe;
  weight count;
  tables smoke*case / measures;
run;
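As a cross-check of Example 11.6.2, the odds ratio and its log-scale confidence interval can be computed directly from the four cell counts. This is a small illustrative sketch; `odds_ratio_ci` is a hypothetical helper name, not something defined in the slides.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/(bc) with a Wald confidence interval built on ln(OR).

    The function name and the z default are illustrative assumptions.
    """
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = odds_ratio * math.exp(-z * se)
    upper = odds_ratio * math.exp(z * se)
    return odds_ratio, lower, upper

# Counts from Table 11.6.4 (smoking by obesity)
odds_ratio, lower, upper = odds_ratio_ci(52, 352, 78, 3486)
print(f"OR = {odds_ratio:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Because the whole interval lies above 1, these data suggest a positive association between smoking and obesity.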

Mantel-Haenszel statistic

The confounding variable defines k strata. Table for stratum i:

             Sample
Risk      Case         Control      Total
Exposed   a_i          b_i          a_i + b_i
Not       c_i          d_i          c_i + d_i
Total     a_i + c_i    b_i + d_i    n_i

• Expected frequency in stratum i: $e_i = \dfrac{(a_i + b_i)(a_i + c_i)}{n_i}$

• $v_i = \dfrac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$

• $\chi^2_{MH} = \dfrac{\left(\sum_{i=1}^{k} a_i - \sum_{i=1}^{k} e_i\right)^2}{\sum_{i=1}^{k} v_i} \sim \chi^2_1$

• Common odds ratio: $OR_{MH} = \dfrac{\sum_{i=1}^{k} (a_i d_i / n_i)}{\sum_{i=1}^{k} (b_i c_i / n_i)}$

Example 11.6.3

Age <= 55

Risk         OCAD patients   Control   Total
Exposed      21              11        32
Unexposed    16              6         22
Total        37              17        54

Age >= 56

Risk         OCAD patients   Control   Total
Exposed      50              14        64
Unexposed    18              6         24
Total        68              20        88

data one;
  input age risk case n;
  cards;
1 1 1 21
1 1 2 11
1 2 1 16
1 2 2 6
2 1 1 50
2 1 2 14
2 2 1 18
2 2 2 6
;
run;
proc freq;
  tables age*risk*case / measures cmh;
  weight n;
run;
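The Mantel-Haenszel formulas above can be applied to the two age strata of Example 11.6.3 by hand. The sketch below is an illustration in plain Python; the function name `mantel_haenszel` and the tuple layout `(a, b, c, d)` are assumptions, and no continuity correction is applied, matching the formula as written on the slide.

```python
def mantel_haenszel(strata):
    """Mantel-Haenszel chi-square and common odds ratio.

    strata: list of (a, b, c, d) tuples, one 2x2 table per stratum
    (a hypothetical layout chosen for this sketch).
    """
    sum_a = sum_e = sum_v = numerator = denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        sum_a += a
        sum_e += (a + b) * (a + c) / n                      # expected count e_i
        sum_v += ((a + b) * (c + d) * (a + c) * (b + d)
                  / (n ** 2 * (n - 1)))                     # variance v_i
        numerator += a * d / n
        denominator += b * c / n
    chi2_mh = (sum_a - sum_e) ** 2 / sum_v
    or_mh = numerator / denominator
    return chi2_mh, or_mh

# Strata from Example 11.6.3: age <= 55 and age >= 56
strata = [(21, 11, 16, 6), (50, 14, 18, 6)]
chi2_mh, or_mh = mantel_haenszel(strata)
print(f"chi2_MH = {chi2_mh:.4f}, OR_MH = {or_mh:.4f}")
```

For these counts the statistic is about 0.024, far below the 5% critical value 3.84 of the chi-square distribution with 1 degree of freedom, and the common odds ratio is about 0.94.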


• 2×2 table

                      Second criterion
First criterion    1        2        Total
1                  a        b        a + b
2                  c        d        c + d
Total              a + c    b + d    n

Smoking / Drinking    Yes      No       Total
Yes                   131      52       183
No                    14       36       50
Total                 145      88       233

$\chi^2 = \dfrac{233(131 \cdot 36 - 52 \cdot 14)^2}{145 \cdot 88 \cdot 183 \cdot 50} = 31.7391 \gg 1.96^2$

Strong evidence to reject H0: smoking and drinking are independent.

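② Comparing two probabilities

$H_0: p_1 = p_2$ vs $H_a: p_1 \ne p_2$

e.g. $n_1 = 100$, $\hat{p}_1 = 0.60$, $n_2 = 120$, $\hat{p}_2 = 0.40$

$\bar{p} = \dfrac{0.60 \cdot 100 + 0.40 \cdot 120}{100 + 120} = 0.4909$

$Z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\bar{p}(1-\bar{p})}{n_1} + \dfrac{\bar{p}(1-\bar{p})}{n_2}}} = \dfrac{0.60 - 0.40}{\sqrt{\dfrac{0.4909 \cdot 0.5091}{100} + \dfrac{0.4909 \cdot 0.5091}{120}}} = 2.9547 > 1.96$ (significant)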

Yates adjustment (continuity correction)

• $\chi^2_{corrected} = \dfrac{n(|ad - bc| - 0.5n)^2}{(a+c)(b+d)(a+b)(c+d)}$

• $\chi^2 = \dfrac{233(|131 \cdot 36 - 52 \cdot 14| - 0.5 \cdot 233)^2}{145 \cdot 88 \cdot 183 \cdot 50} = 29.9118$
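The uncorrected and Yates-corrected statistics for the smoking/drinking table can be compared side by side. This is a minimal sketch, assuming a hypothetical helper `chi_square_2x2`; the guard that keeps the corrected difference from going below zero is an addition of this sketch, not something stated on the slide.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi-square for a 2x2 table, optionally with Yates correction.

    The helper name is illustrative. The max(..., 0) guard is an assumption
    added here so the correction never overshoots past zero.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - 0.5 * n, 0.0)
    return n * diff ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))

# Smoking by drinking table: 131, 52 / 14, 36
plain = chi_square_2x2(131, 52, 14, 36)
corrected = chi_square_2x2(131, 52, 14, 36, yates=True)
print(f"chi2 = {plain:.4f}, Yates-corrected chi2 = {corrected:.4f}")
```

The correction always shrinks the statistic, so it makes the test slightly more conservative; here both values remain far above 3.84.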

11.4 Homogeneity test

• Homogeneity test: samples are drawn independently from each population. Are their distributions homogeneous, i.e. could the samples have been selected from one population?

• Independence test: a single sample is drawn from one population. The row and column (marginal) totals are not controlled but determined by chance.

• Independence test vs homogeneity test

• Example 11.4.1

• Hypothesis H0: patient groups with onset age <= 18 and onset age > 18 have the same distribution of family history

> data <- as.table(cbind(c(28, 19, 41, 53), c(35, 38, 44, 60)))
> dimnames(data) <- list(trt = c("A", "B", "C", "D"), re = c("Early", "Later"))
> chisq.test(data)

        Pearson's Chi-squared test

data:  data
X-squared = 3.6216, df = 3, p-value = 0.3053

-> Do not reject H0

Family history   <= 18    > 18     Total
A                28       35       63
B                19       38       57
C                41       44       85
D                53       60       113
Total            141      177      318

> data
   re
trt Early Later
  A    28    35
  B    19    38
  C    41    44
  D    53    60
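To see where the statistic reported for Example 11.4.1 comes from, the expected counts and the Pearson sum can be computed directly. The sketch below is illustrative only: `chi_square_rxc` and `chi2_sf_df3` are hypothetical names, and the tail-probability helper uses a closed form that is valid only for exactly 3 degrees of freedom.

```python
import math

def chi_square_rxc(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

# Family history (A-D) by onset age (<= 18, > 18)
table = [[28, 35], [19, 38], [41, 44], [53, 60]]
stat, df = chi_square_rxc(table)
p_value = chi2_sf_df3(stat)
print(f"X-squared = {stat:.4f}, df = {df}, p-value = {p_value:.4f}")
```

The computation is the same whether the table is read as a homogeneity test or an independence test; only the sampling scheme and the wording of H0 differ.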

Homogeneity test and test of two population proportions

$H_0: p_1 = p_2$ vs $H_A: p_1 \ne p_2$, with $n_1 = 100$, $\hat{p}_1 = 0.60$, $n_2 = 120$, $\hat{p}_2 = 0.40$

$z = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)_0}{\sqrt{\dfrac{\bar{p}(1-\bar{p})}{n_1} + \dfrac{\bar{p}(1-\bar{p})}{n_2}}}$

$\bar{p} = \dfrac{0.60 \cdot 100 + 0.40 \cdot 120}{100 + 120} = \dfrac{108}{220} = 0.49091$

$z = \dfrac{0.60 - 0.40}{\sqrt{\dfrac{0.49091 \cdot 0.50909}{100} + \dfrac{0.49091 \cdot 0.50909}{120}}} = 2.95468$

$\chi^2 = \dfrac{220 \cdot [60 \cdot 72 - 40 \cdot 48]^2}{108 \cdot 112 \cdot 100 \cdot 120} = 8.7302$

The two tests agree: $z^2 = 2.95468^2 \approx 8.730 = \chi^2$.

-> Reject H0

             Characteristic
Sample    1        2        Total
1         60       40       100
2         48       72       120
Total     108      112      220
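The equivalence between the pooled two-proportion z test and the 2×2 chi-square test can be verified numerically. This is a short sketch under the assumption of a hypothetical helper `two_proportion_z` that takes success counts rather than proportions.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 = p2 (helper name is illustrative)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Sample 1: 60 of 100, sample 2: 48 of 120
z = two_proportion_z(60, 100, 48, 120)

# Pearson chi-square for the same data arranged as the table 60, 40 / 48, 72
chi2 = 220 * (60 * 72 - 40 * 48) ** 2 / (108 * 112 * 100 * 120)

# Two-sided normal p-value, which equals the chi-square (1 df) p-value
p_two_sided = math.erfc(abs(z) / math.sqrt(2))
print(f"z = {z:.5f}, z^2 = {z * z:.4f}, chi2 = {chi2:.4f}, p = {p_two_sided:.4f}")
```

Since a squared standard normal variable follows a chi-square distribution with 1 degree of freedom, the two-sided z test and the uncorrected chi-square test always give the same p-value for a 2×2 table.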

Fisher's Exact Test

data severe;
  input treat $ outcome $ count;
  cards;
Test f 10
Test u 2
Control f 2
Control u 4
;
proc freq order=data;
  tables treat*outcome / chisq nocol;
  weight count;
run;

The SAS System

The FREQ Procedure

Table of treat by outcome

treat     outcome

Frequency|
Percent  |
Row Pct  |f       |u       |  Total
---------+--------+--------+
Test     |     10 |      2 |     12
         |  55.56 |  11.11 |  66.67
         |  83.33 |  16.67 |
---------+--------+--------+
Control  |      2 |      4 |      6
         |  11.11 |  22.22 |  33.33
         |  33.33 |  66.67 |
---------+--------+--------+
Total          12        6       18
            66.67    33.33   100.00

Statistics for Table of treat by outcome

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1      4.5000    0.0339
Likelihood Ratio Chi-Square    1      4.4629    0.0346
Continuity Adj. Chi-Square     1      2.5313    0.1116
Mantel-Haenszel Chi-Square     1      4.2500    0.0393
Phi Coefficient                       0.5000
Contingency Coefficient               0.4472
Cramer's V                            0.5000

WARNING: 75% of the cells have expected counts less than 5.
         Chi-Square may not be a valid test.

Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)        10
Left-sided Pr <= F          0.9961
Right-sided Pr >= F         0.0573

Table Probability (P)       0.0533
Two-sided Pr <= P           0.1070

Sample Size = 18

Exact Test

H0: the two variables are independent (homogeneous) vs H1: not H0

Table Probabilities

          Table Cell
(1,1)   (1,2)   (2,1)   (2,2)   Prob
12      0       0       6       0.0001
11      1       1       5       0.0039
10      2       2       4       0.0533
9       3       3       3       0.2370
8       4       4       2       0.4000
7       5       5       1       0.2560
6       6       6       0       0.0498

Probability of the observed table: $P = \dfrac{12!\,12!\,6!\,6!}{10!\,2!\,2!\,4!\,18!} = 0.0533$

• One-tailed p-value: $p = 0.0533 + 0.0039 + 0.0001 = 0.0573$

• Two-tailed p-value: $p = 0.0533 + 0.0039 + 0.0001 + 0.0498 = 0.1071$
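The table probabilities listed above follow from the hypergeometric distribution with all margins fixed. The following sketch enumerates them with Python's standard library; `table_probability` is a hypothetical helper name, and the small relative tolerance in the two-tailed sum is an assumption added to guard against floating-point ties.

```python
from math import comb

def table_probability(x, row1, row2, col1):
    """Hypergeometric probability that cell (1,1) equals x, margins fixed."""
    return comb(row1, x) * comb(row2, col1 - x) / comb(row1 + row2, col1)

row1, row2, col1 = 12, 6, 12      # margins of the Test/Control table
observed = 10                     # observed count in cell (1,1)

lowest = max(0, col1 - row2)      # smallest feasible value of cell (1,1)
highest = min(row1, col1)         # largest feasible value of cell (1,1)
probs = {x: table_probability(x, row1, row2, col1)
         for x in range(lowest, highest + 1)}

p_obs = probs[observed]
# One-tailed: tables at least as extreme in the observed direction
one_tailed = sum(p for x, p in probs.items() if x >= observed)
# Two-tailed: all tables no more probable than the observed one
two_tailed = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))

for x in sorted(probs, reverse=True):
    print(x, f"{probs[x]:.4f}")
print(f"one-tailed p = {one_tailed:.4f}, two-tailed p = {two_tailed:.4f}")
```

The slide's two-tailed value 0.1071 is the sum of the rounded table probabilities; summing the unrounded probabilities gives the 0.1070 reported by SAS.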

> fisher.test(matrix(c(7, 3, 5, 6), 2, 2), alternative = "greater")

        Fisher's Exact Test for Count Data

data:  matrix(c(7, 3, 5, 6), 2, 2)
p-value = 0.2449
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
 0.4512625       Inf
sample estimates:
odds ratio
  2.661251

> matrix(c(7, 3, 5, 6), 2, 2)
     [,1] [,2]
[1,]    7    5
[2,]    3    6

McNemar Test: Matched pairs

data one;
  input hus_resp $ wif_resp $ no;
  datalines;
yes yes 20
yes no 5
no yes 10
no no 10
;
run;
proc freq;
  tables hus_resp*wif_resp / agree;
  weight no;
run;

We do not reject "H0: the approval rates of husband and wife are the same".

Kappa = 1 >> perfect agreement; Kappa > 0.8 >> excellent agreement; Kappa > 0.4 >> moderate agreement

CI does not include 0 -> we reject the null hypothesis K = 0 at the 95% confidence level
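The two quantities behind these conclusions can be sketched from the husband/wife counts. This is an illustration only: the function names `mcnemar` and `cohen_kappa` are assumptions, the McNemar statistic is computed without a continuity correction, and only the point estimate of kappa is shown (the confidence interval quoted above comes from the SAS output and is not recomputed here).

```python
import math

def mcnemar(b, c):
    """McNemar chi-square (1 df, no continuity correction) from the two
    discordant counts b (yes/no) and c (no/yes)."""
    stat = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(stat / 2))   # chi-square upper tail, 1 df
    return stat, p_value

def cohen_kappa(table):
    """Simple kappa coefficient for a square agreement table."""
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(len(table))) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)

# Rows: husband yes/no, columns: wife yes/no
table = [[20, 5], [10, 10]]
stat, p_value = mcnemar(table[0][1], table[1][0])
kappa = cohen_kappa(table)
print(f"McNemar chi2 = {stat:.4f}, p = {p_value:.4f}, kappa = {kappa:.4f}")
```

McNemar's test uses only the discordant pairs (5 and 10), while kappa measures how far the agreement on the diagonal (20 and 10) exceeds what chance alone would produce.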

116 Relative risk odds ratio and Mantel-Haenszel statistics

bull 관찰연구 (observational study)

bull 전향적 연구 (prospective study)

bull 후향적 연구 (retrospective study)

상대위험도 (Relative risk)Disease

Risk O X

O 119886 119887 119886 + 119887

X 119888 119889 119888 + 119889

119886 + 119888 119887 + 119889 119899

bull 119877119877 =119886

119886+119887119888

119888+119889

bull s e ln 119877119877 =1

119886+1

119888minus

1

119886+119887+

1

119888+119889

bull ln 119877119877 plusmn 1199111minus1205722 ∙ s e ln 119877119877

bull 119890ln 119877119877 plusmn119911

1minus1205722∙se ln 119877119877

= 119877119877 ∙ 119890^ plusmn1199111minus1205722 ∙ s e ln 119877119877

예제 1161 Relative risk odds ratio and Mantel-Haenszel statistics

119877119877 =26406

1783564=0064

0050= 128

12822 ∙ 119890^ minus196 ∙1

26+

1

178minus

1

406+

1

3564= 0861

12822 ∙ 119890^ 196 ∙1

26+

1

178minus

1

406+

1

3564= 1910

95 CI includes 1

Smoking Disease progress

Yes NoYes 26 380 406

No 178 3386 3564

204 3766 3970

표1162

data preg

input smoke $ preg $ count

cards

smoke early 26

smoke abnormal 380

nonsmoke early 178

nonsmoke abnormal 3386

proc freq order=data

weight count

tables smokepregmeasures chisq

run

오즈비 (Odds Ratio)sample

Risk case control

O 119886 119887 119886 + 119887

X 119888 119889 119888 + 119889

• Example 11.4.1

• Hypothesis H0: patient groups with onset age <= 18 and onset age > 18 have the same distribution of family history.

Family history    <= 18    > 18    Total
A                    28      35       63
B                    19      38       57
C                    41      44       85
D                    53      60      113
Total               141     177      318

> data <- as.table(cbind(c(28, 19, 41, 53), c(35, 38, 44, 60)))
> dimnames(data) <- list(trt = c("A", "B", "C", "D"), re = c("Early", "Later"))
> data
   re
trt Early Later
  A    28    35
  B    19    38
  C    41    44
  D    53    60
> chisq.test(data)

        Pearson's Chi-squared test

data:  data
X-squared = 3.6216, df = 3, p-value = 0.3053

-> Do not reject H0
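The statistic in Example 11.4.1 can be reproduced by hand from the observed and expected counts. The following is a minimal Python sketch for illustration only (the slides themselves use R); the function names `pearson_chi_square` and `chi2_sf_df3` are hypothetical, and the p-value formula used is the closed form that holds only for 3 degrees of freedom.

```python
import math

# Observed counts from Example 11.4.1: rows are family-history groups A-D,
# columns are onset age <= 18 ("Early") and > 18 ("Later").
observed = [[28, 35], [19, 38], [41, 44], [53, 60]]


def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under homogeneity: row total * column total / n.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (obs - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


x2, df = pearson_chi_square(observed)
p_value = chi2_sf_df3(x2)
print(f"X-squared = {x2:.4f}, df = {df}, p-value = {p_value:.4f}")
```

The printed values should agree with the `chisq.test` output up to rounding.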

Test of homogeneity and test of two proportions

$H_0: p_1 = p_2$ vs $H_A: p_1 \ne p_2$, with $n_1 = 100$, $\hat{p}_1 = 0.60$, $n_2 = 120$, $\hat{p}_2 = 0.40$

$z = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)_0}{\sqrt{\dfrac{\bar{p}(1-\bar{p})}{n_1} + \dfrac{\bar{p}(1-\bar{p})}{n_2}}}$

$\bar{p} = \dfrac{0.60 \cdot 100 + 0.40 \cdot 120}{100 + 120} = \dfrac{108}{220} = 0.49091$

$z = \dfrac{0.60 - 0.40}{\sqrt{\dfrac{0.49091 \cdot 0.50909}{100} + \dfrac{0.49091 \cdot 0.50909}{120}}} = 2.95468$

$X^2 = \dfrac{220 \cdot [60 \cdot 72 - 40 \cdot 48]^2}{108 \cdot 112 \cdot 100 \cdot 120} = 8.7302$

-> Reject H0 (the two tests agree: $z^2 = 2.95468^2 \approx 8.730 = X^2$)

                 Characteristic
Sample           1       2     Total
1               60      40       100
2               48      72       120
Total          108     112       220
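The arithmetic above can be checked numerically. This is a small Python sketch for illustration (the variable names are assumptions, not from the slides); it computes the pooled two-proportion z statistic and the 2x2 chi-square statistic from the same data and shows that the square of one equals the other.

```python
import math

# Sample sizes and sample proportions from the slide.
n1, p1_hat = 100, 0.60
n2, p2_hat = 120, 0.40

# Pooled proportion under H0: p1 = p2.
p_bar = (p1_hat * n1 + p2_hat * n2) / (n1 + n2)
se = math.sqrt(p_bar * (1 - p_bar) / n1 + p_bar * (1 - p_bar) / n2)
z = (p1_hat - p2_hat) / se

# The same data as a 2x2 table: rows are samples, columns are characteristics.
a, b = 60, 40
c, d = 48, 72
n = a + b + c + d
x2 = n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))

print(f"p_bar = {p_bar:.5f}, z = {z:.5f}, z^2 = {z * z:.4f}, X^2 = {x2:.4f}")
```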

Fisher's Exact Test

data severe;
  input treat $ outcome $ count;
  cards;
Test f 10
Test u 2
Control f 2
Control u 4
;
/* chisq on a 2x2 table also prints Fisher's exact test; nocol hides column percentages */
proc freq order=data;
  tables treat*outcome / chisq nocol;
  weight count;
run;

The SAS System

The FREQ Procedure

Table of treat by outcome

treat      outcome

Frequency|
Percent  |
Row Pct  |f       |u       |  Total
---------+--------+--------+
Test     |     10 |      2 |     12
         |  55.56 |  11.11 |  66.67
         |  83.33 |  16.67 |
---------+--------+--------+
Control  |      2 |      4 |      6
         |  11.11 |  22.22 |  33.33
         |  33.33 |  66.67 |
---------+--------+--------+
Total          12        6       18
            66.67    33.33   100.00

Statistics for Table of treat by outcome

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1      4.5000    0.0339
Likelihood Ratio Chi-Square    1      4.4629    0.0346
Continuity Adj. Chi-Square     1      2.5313    0.1116
Mantel-Haenszel Chi-Square     1      4.2500    0.0393
Phi Coefficient                       0.5000
Contingency Coefficient               0.4472
Cramer's V                            0.5000

WARNING: 75% of the cells have expected counts less than 5. Chi-Square may not be a valid test.

Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)        10
Left-sided Pr <= F          0.9961
Right-sided Pr >= F         0.0573

Table Probability (P)       0.0533
Two-sided Pr <= P           0.1070

Sample Size = 18

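Exact Test: Table Probabilities

        Table cell
(1,1)  (1,2)  (2,1)  (2,2)    Prob
  12      0      0      6   0.0001
  11      1      1      5   0.0039
  10      2      2      4   0.0533
   9      3      3      3   0.2370
   8      4      4      2   0.4000
   7      5      5      1   0.2560
   6      6      6      0   0.0498

Probability of the observed table: $P = \dfrac{12!\,12!\,6!\,6!}{10!\,2!\,2!\,4!\,18!} = 0.0533$

• One-tailed p-value: $p = 0.0533 + 0.0039 + 0.0001 = 0.0573$

• Two-tailed p-value: $p = 0.0533 + 0.0039 + 0.0001 + 0.0498 = 0.1071$ (SAS reports 0.1070 because it sums the unrounded probabilities)

H0: the two variables are independent (homogeneous) vs H1: not H0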

> matrix(c(7, 3, 5, 6), 2, 2)
     [,1] [,2]
[1,]    7    5
[2,]    3    6
> fisher.test(matrix(c(7, 3, 5, 6), 2, 2), alternative = "greater")

        Fisher's Exact Test for Count Data

data:  matrix(c(7, 3, 5, 6), 2, 2)
p-value = 0.2449
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
 0.4512625       Inf
sample estimates:
odds ratio
  2.661251
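Both exact tests above come from the hypergeometric distribution of cell (1,1) with all margins held fixed. A minimal Python sketch, for illustration only: the function name `fisher_exact_2x2` is hypothetical, and the two-sided rule used here (sum every table no more probable than the observed one) is the one that matches the "Two-sided Pr <= P" line of the SAS output.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Exact p-values for the 2x2 table [[a, b], [c, d]] with fixed margins.

    Returns (left, right, two_sided); the tails refer to cell (1,1).
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Hypergeometric probability of each possible value x of cell (1,1).
    prob = {x: comb(row1, x) * comb(row2, col1 - x) / denom
            for x in range(lo, hi + 1)}
    p_obs = prob[a]
    left = sum(p for x, p in prob.items() if x <= a)
    right = sum(p for x, p in prob.items() if x >= a)
    # Two-sided: add up every table that is no more likely than the observed
    # one (the small factor guards against floating-point ties).
    two_sided = sum(p for p in prob.values() if p <= p_obs * (1 + 1e-9))
    return left, right, two_sided


# Test/Control by f/u table from the SAS example.
left, right, two_sided = fisher_exact_2x2(10, 2, 2, 4)
print(f"left = {left:.4f}, right = {right:.4f}, two-sided = {two_sided:.4f}")

# Table from the R example; alternative = "greater" is the right tail.
_, right_r, _ = fisher_exact_2x2(7, 5, 3, 6)
print(f"one-sided (greater) p-value = {right_r:.4f}")
```

Note that R's `matrix(c(7, 3, 5, 6), 2, 2)` fills column by column, so the table passed in the last call is rows (7, 5) and (3, 6).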

McNemar Test (matched pairs)

data one;
  input hus_resp $ wif_resp $ no;
  datalines;
yes yes 20
yes no 5
no yes 10
no no 10
;
run;
/* agree requests McNemar's test and the kappa coefficient */
proc freq;
  tables hus_resp*wif_resp / agree;
  weight no;
run;

McNemar's test: we do not reject "H0: the approval rates of husband and wife are the same".

Kappa: the confidence interval does not include 0, so we reject the null hypothesis kappa = 0 at the 95% confidence level.

Kappa = 1: perfect agreement; Kappa > 0.8: excellent agreement; Kappa > 0.4: moderate agreement.
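The slide shows only the SAS call, not the formulas. As a hedged illustration, the sketch below applies the standard textbook definitions (an assumption, since they are not written on the slide): the McNemar statistic uses only the discordant pairs, and Cohen's kappa compares observed agreement with chance agreement. Variable names are hypothetical.

```python
import math

# Husband (rows) by wife (columns) responses from the SAS data step above.
yes_yes, yes_no, no_yes, no_no = 20, 5, 10, 10
n = yes_yes + yes_no + no_yes + no_no

# McNemar statistic: only the discordant pairs matter; 1 df under H0.
mcnemar = (yes_no - no_yes) ** 2 / (yes_no + no_yes)
# For 1 df the chi-square upper tail equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(mcnemar / 2))

# Cohen's kappa: observed agreement corrected for chance agreement.
p_observed = (yes_yes + no_no) / n
p_expected = ((yes_yes + yes_no) * (yes_yes + no_yes)
              + (no_yes + no_no) * (yes_no + no_no)) / n ** 2
kappa = (p_observed - p_expected) / (1 - p_expected)

print(f"McNemar = {mcnemar:.4f}, p-value = {p_value:.4f}, kappa = {kappa:.4f}")
```

A p-value well above 0.05 is consistent with the slide's conclusion that equality of the two approval rates is not rejected.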

11.6 Relative risk, odds ratio, and Mantel-Haenszel statistics

• Observational study

• Prospective study

• Retrospective study

Relative risk

               Disease
Risk          O        X
O             a        b      a + b
X             c        d      c + d
            a + c    b + d      n

(O = present, X = absent)

• $RR = \dfrac{a/(a+b)}{c/(c+d)}$

• $\mathrm{s.e.}(\ln RR) = \sqrt{\dfrac{1}{a} + \dfrac{1}{c} - \dfrac{1}{a+b} - \dfrac{1}{c+d}}$

• CI for $\ln RR$: $\ln RR \pm z_{1-\alpha/2} \cdot \mathrm{s.e.}(\ln RR)$

• CI for $RR$: $e^{\ln RR \pm z_{1-\alpha/2} \cdot \mathrm{s.e.}(\ln RR)} = RR \cdot e^{\pm z_{1-\alpha/2} \cdot \mathrm{s.e.}(\ln RR)}$

Example 11.6.1: Relative risk

Table 11.6.2

               Disease progress
Smoking        Yes       No     Total
Yes             26      380       406
No             178     3386      3564
Total          204     3766      3970

$RR = \dfrac{26/406}{178/3564} = \dfrac{0.064}{0.050} = 1.28$

Lower limit: $1.2822 \cdot e^{-1.96 \sqrt{\frac{1}{26} + \frac{1}{178} - \frac{1}{406} - \frac{1}{3564}}} = 0.861$

Upper limit: $1.2822 \cdot e^{1.96 \sqrt{\frac{1}{26} + \frac{1}{178} - \frac{1}{406} - \frac{1}{3564}}} = 1.910$

The 95% CI (0.861, 1.910) includes 1.

data preg;
  input smoke $ preg $ count;
  cards;
smoke early 26
smoke abnormal 380
nonsmoke early 178
nonsmoke abnormal 3386
;
/* measures prints the relative risk and odds ratio with confidence limits */
proc freq order=data;
  weight count;
  tables smoke*preg / measures chisq;
run;
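The relative risk and its interval in Example 11.6.1 can be recomputed directly from the four cell counts. A minimal Python sketch for illustration; the function name `relative_risk_ci` and the default `z = 1.96` (95% confidence) are assumptions made for this example.

```python
import math


def relative_risk_ci(a, b, c, d, z=1.96):
    """Relative risk and confidence limits for the 2x2 table [[a, b], [c, d]].

    Rows are risk factor present/absent, columns are disease present/absent.
    """
    rr = (a / (a + b)) / (c / (c + d))
    # Standard error of ln(RR).
    se = math.sqrt(1 / a + 1 / c - 1 / (a + b) - 1 / (c + d))
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)


# Smoking by disease progress counts from Table 11.6.2.
rr, lower, upper = relative_risk_ci(26, 380, 178, 3386)
print(f"RR = {rr:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Because the interval contains 1, the data do not show a significant difference in risk between smokers and non-smokers.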

Odds ratio

               Sample
Risk        case    control
O             a        b      a + b
X             c        d      c + d
            a + c    b + d      n

• Odds $= p/(1-p)$

• Odds in the case group: $\dfrac{a/(a+c)}{c/(a+c)} = \dfrac{a}{c}$

• Odds in the control group: $\dfrac{b/(b+d)}{d/(b+d)} = \dfrac{b}{d}$

• $OR = \dfrac{a/c}{b/d} = \dfrac{ad}{bc}$

• $\mathrm{s.e.}(\ln OR) = \sqrt{\dfrac{1}{a} + \dfrac{1}{b} + \dfrac{1}{c} + \dfrac{1}{d}}$

• CI for $\ln OR$: $\ln OR \pm z_{1-\alpha/2} \cdot \mathrm{s.e.}(\ln OR)$

• CI for $OR$: $e^{\ln OR \pm z_{1-\alpha/2} \cdot \mathrm{s.e.}(\ln OR)} = OR \cdot e^{\pm z_{1-\alpha/2} \cdot \mathrm{s.e.}(\ln OR)}$

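Example 11.6.2

Table 11.6.4

               Obesity
Smoking        Yes       No     Total
Yes             52      352       404
No              78     3486      3564
Total          130     3838      3968

$OR = \dfrac{52 \cdot 3486}{352 \cdot 78} = 6.60$

Lower limit: $6.6023 \cdot e^{-1.96 \sqrt{\frac{1}{52} + \frac{1}{352} + \frac{1}{78} + \frac{1}{3486}}} = 4.571$

Upper limit: $6.6023 \cdot e^{1.96 \sqrt{\frac{1}{52} + \frac{1}{352} + \frac{1}{78} + \frac{1}{3486}}} = 9.536$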

data obe;
  input smoke $ case $ count;
  cards;
smoke case 52
smoke control 352
nonsmoke case 78
nonsmoke control 3486
;
proc freq data=obe;
  weight count;
  tables smoke*case / measures;
run;
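The odds ratio and its interval in Example 11.6.2 follow the same pattern as the relative risk. A minimal Python sketch for illustration; the function name `odds_ratio_ci` and the default `z = 1.96` are assumptions made for this example.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and confidence limits for the 2x2 table [[a, b], [c, d]]."""
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR): square root of the sum of reciprocal counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, odds_ratio * math.exp(-z * se), odds_ratio * math.exp(z * se)


# Smoking by obesity counts from Table 11.6.4.
odds_ratio, lower, upper = odds_ratio_ci(52, 352, 78, 3486)
print(f"OR = {odds_ratio:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Here the whole interval lies above 1, unlike the relative-risk interval in Example 11.6.1.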

Mantel-Haenszel statistic

The confounding variable has k strata. For stratum i:

Risk          case        control
Exposed       $a_i$       $b_i$       $a_i + b_i$
Not           $c_i$       $d_i$       $c_i + d_i$
              $a_i + c_i$ $b_i + d_i$ $n_i$

Expected frequency in stratum i: $e_i = \dfrac{(a_i + b_i)(a_i + c_i)}{n_i}$

$v_i = \dfrac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$

$\chi^2_{MH} = \dfrac{\left(\sum_{i=1}^{k} a_i - \sum_{i=1}^{k} e_i\right)^2}{\sum_{i=1}^{k} v_i} \sim \chi^2_1$

Common odds ratio: $OR_{MH} = \dfrac{\sum_{i=1}^{k} (a_i d_i / n_i)}{\sum_{i=1}^{k} (b_i c_i / n_i)}$

Example 11.6.3

Age <= 55
Risk          OCAD patients    Control    Total
Exposed                  21         11       32
Unexposed                16          6       22
Total                    37         17       54

Age >= 56
Risk          OCAD patients    Control    Total
Exposed                  50         14       64
Unexposed                18          6       24
Total                    68         20       88

data one;
  input age risk case n;
  cards;
1 1 1 21
1 1 2 11
1 2 1 16
1 2 2 6
2 1 1 50
2 1 2 14
2 2 1 18
2 2 2 6
;
run;
/* age is the stratifying variable; cmh requests the Mantel-Haenszel statistics */
proc freq;
  tables age*risk*case / measures cmh;
  weight n;
run;
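The slides give the Mantel-Haenszel formulas and the SAS call but not the numerical result for Example 11.6.3. The Python sketch below, for illustration only, applies the formulas above to the two age strata; the function name `mantel_haenszel` is hypothetical, and no continuity correction is applied.

```python
def mantel_haenszel(strata):
    """Mantel-Haenszel chi-square and common odds ratio.

    strata is a list of (a, b, c, d) tuples, one 2x2 table per stratum, with
    a = exposed cases, b = exposed controls, c = unexposed cases,
    d = unexposed controls.
    """
    sum_a = sum_e = sum_v = numerator = denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        sum_a += a
        # Expected count and variance of cell a_i under no association.
        sum_e += (a + b) * (a + c) / n
        sum_v += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
        numerator += a * d / n
        denominator += b * c / n
    chi2_mh = (sum_a - sum_e) ** 2 / sum_v
    return chi2_mh, numerator / denominator


# Strata from Example 11.6.3: age <= 55 and age >= 56.
strata = [(21, 11, 16, 6), (50, 14, 18, 6)]
chi2_mh, or_mh = mantel_haenszel(strata)
print(f"chi-square MH = {chi2_mh:.4f}, common OR = {or_mh:.4f}")
```

The statistic is compared with a chi-square distribution with 1 degree of freedom, whose 5% critical value is 3.841.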

Page 24: F2 Chi-square dist’n - Seoul National Universityhosting03.snu.ac.kr/~hokim/int/2019/chap_11.pdf · 2019-05-27 · 이항분포의가정하에서기대도수=기대상대도수*총합

Exact Test

Table Cell

(11) (12) (21) (22) Prob

12 0 0 6 0001

11 1 1 5 0039

10 2 2 4 0533

9 3 3 3 2370

8 4 4 2 4000

7 5 5 1 2560

6 6 6 0 0498

=12 12 6 6

10 2 2 4 18

Table Probabilities

bull One-tailed p-value

bull Two-tailed p-value

00533 00039 00001 00573p

00533 00039 00001 00498 01071p

H0 두 변수는 서로 독립(동질)이다 vs H1 not H0

gt fishertest(matrix(c(7356)22)alternative=greater)

Fishers Exact Test for Count Data

data matrix(c(7 3 5 6) 2 2)

p-value = 02449

alternative hypothesis true odds ratio is greater than 1

95 percent confidence interval

04512625 Inf

sample estimates

odds ratio

2661251

gt matrix(c(7356)22)

[1] [2]

[1] 7 5

[2] 3 6

McNemar Test Matched pairs

data one

input hus_resp $ wif_resp $ no

datalines

yes yes 20

yes no 5

no yes 10

no no 10

run

proc freq

tables hus_respwif_resp agree

weight no

run

ldquoHo husband and wife 의 approval rates는 같다rdquo를 기각하지 못함

We do not reject ldquoHo approval rates of husband and wife are the samerdquo

신뢰구간이 0을 포함하지 않으므로 K=0 이라는 귀무가설을 95 신뢰수준에서 기각한다

Kappa=1 gtgt perfect agreement Kappa gt 08 gtgt excellent agreement Kappa gt 04 gtgt moderate agreement

CI does not include 0 -gt we reject the null hypo of K=0 by 95 confidence level

116 Relative risk odds ratio and Mantel-Haenszel statistics

bull 관찰연구 (observational study)

bull 전향적 연구 (prospective study)

bull 후향적 연구 (retrospective study)

Relative risk

                 Disease
Risk       O        X        Total
O          a        b        a + b
X          c        d        c + d
Total    a + c    b + d        n

• $RR = \dfrac{a/(a+b)}{c/(c+d)}$

• $\mathrm{s.e.}(\ln RR) = \sqrt{\dfrac{1}{a} + \dfrac{1}{c} - \dfrac{1}{a+b} - \dfrac{1}{c+d}}$

• CI for $\ln RR$: $\ln RR \pm z_{1-\alpha/2} \cdot \mathrm{s.e.}(\ln RR)$

• CI for $RR$: $e^{\ln RR \pm z_{1-\alpha/2} \cdot \mathrm{s.e.}(\ln RR)} = RR \cdot e^{\pm z_{1-\alpha/2} \cdot \mathrm{s.e.}(\ln RR)}$

Example 11.6.1

$RR = \dfrac{26/406}{178/3564} = \dfrac{0.064}{0.050} = 1.28$

Lower limit: $1.2822 \cdot e^{-1.96 \sqrt{\frac{1}{26} + \frac{1}{178} - \frac{1}{406} - \frac{1}{3564}}} = 0.861$

Upper limit: $1.2822 \cdot e^{1.96 \sqrt{\frac{1}{26} + \frac{1}{178} - \frac{1}{406} - \frac{1}{3564}}} = 1.910$

The 95% CI (0.861, 1.910) includes 1.
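The relative risk and its 95% confidence interval for this example can be verified numerically. This is a minimal sketch using the standard log-scale standard error sqrt(1/a - 1/(a+b) + 1/c - 1/(c+d)) and z = 1.96; the function name `relative_risk_ci` is an illustrative choice.

```python
from math import exp, sqrt

def relative_risk_ci(a, b, c, d, z=1.96):
    """Relative risk with a log-scale Wald confidence interval.

    a, b: exposed with / without the outcome
    c, d: unexposed with / without the outcome
    """
    rr = (a / (a + b)) / (c / (c + d))
    se_log = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return rr, rr * exp(-z * se_log), rr * exp(z * se_log)

# Smokers: 26 of 406; non-smokers: 178 of 3564.
rr, lower, upper = relative_risk_ci(26, 380, 178, 3386)
print(f"RR = {rr:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```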

               Disease progress
Smoking      Yes       No      Total
Yes           26      380        406
No           178     3386       3564
Total        204     3766       3970

Table 11.6.2

data preg;
  input smoke $ preg $ count;
  cards;
smoke early 26
smoke abnormal 380
nonsmoke early 178
nonsmoke abnormal 3386
;

proc freq order=data;
  weight count;
  tables smoke*preg / measures chisq;
run;

Odds ratio

                 Sample
Risk      case    control    Total
O          a        b        a + b
X          c        d        c + d
Total    a + c    b + d        n

• Odds $= p/(1-p)$

• Odds in the case group: $\dfrac{a/(a+c)}{c/(a+c)} = \dfrac{a}{c}$

• Odds in the control group: $\dfrac{b/(b+d)}{d/(b+d)} = \dfrac{b}{d}$

• $OR = \dfrac{a/c}{b/d} = \dfrac{ad}{bc}$

• $\mathrm{s.e.}(\ln OR) = \sqrt{\dfrac{1}{a} + \dfrac{1}{b} + \dfrac{1}{c} + \dfrac{1}{d}}$

• CI for $\ln OR$: $\ln OR \pm z_{1-\alpha/2} \cdot \mathrm{s.e.}(\ln OR)$

• CI for $OR$: $e^{\ln OR \pm z_{1-\alpha/2} \cdot \mathrm{s.e.}(\ln OR)} = OR \cdot e^{\pm z_{1-\alpha/2} \cdot \mathrm{s.e.}(\ln OR)}$

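Example 11.6.2

$OR = \dfrac{52 \cdot 3486}{352 \cdot 78} = 6.60$

Lower limit: $6.6023 \cdot e^{-1.96 \sqrt{\frac{1}{52} + \frac{1}{352} + \frac{1}{78} + \frac{1}{3486}}} = 4.571$

Upper limit: $6.6023 \cdot e^{1.96 \sqrt{\frac{1}{52} + \frac{1}{352} + \frac{1}{78} + \frac{1}{3486}}} = 9.536$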
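The same check can be made for the odds ratio. The sketch below uses the log-scale standard error sqrt(1/a + 1/b + 1/c + 1/d) with z = 1.96; the function name `odds_ratio_ci` is illustrative.

```python
from math import exp, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with a log-scale Wald confidence interval."""
    odds_ratio = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, odds_ratio * exp(-z * se_log), odds_ratio * exp(z * se_log)

# Smokers: 52 cases, 352 controls; non-smokers: 78 cases, 3486 controls.
odds_ratio, lower, upper = odds_ratio_ci(52, 352, 78, 3486)
print(f"OR = {odds_ratio:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Here the whole interval lies above 1, in contrast to the relative-risk example.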

                  Obesity
Smoking      Yes       No      Total
Yes           52      352        404
No            78     3486       3564
Total        130     3838       3968

Table 11.6.4

data obe;
  input smoke $ case $ count;
  cards;
smoke case 52
smoke control 352
nonsmoke case 78
nonsmoke control 3486
;

proc freq data=obe;
  weight count;
  tables smoke*case / measures;
run;

Mantel-Haenszel statistic

The confounding variable has k strata.

Stratum i:

Risk         case       control      Total
Exposed      a_i        b_i          a_i + b_i
Not          c_i        d_i          c_i + d_i
Total      a_i + c_i  b_i + d_i      n_i

Expected frequency in stratum i: $e_i = \dfrac{(a_i + b_i)(a_i + c_i)}{n_i}$

$v_i = \dfrac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$

$\chi^2_{MH} = \dfrac{\left(\sum_{i=1}^{k} a_i - \sum_{i=1}^{k} e_i\right)^2}{\sum_{i=1}^{k} v_i} \sim \chi^2_1$

Common odds ratio: $OR_{MH} = \dfrac{\sum_{i=1}^{k} (a_i d_i / n_i)}{\sum_{i=1}^{k} (b_i c_i / n_i)}$

Example 11.6.3

Age <= 55

Risk         OCAD patients   Control   Total
Exposed           21            11       32
Unexposed         16             6       22
Total             37            17       54

Age >= 56

Risk         OCAD patients   Control   Total
Exposed           50            14       64
Unexposed         18             6       24
Total             68            20       88

data one;
  input age risk case n;
  cards;
1 1 1 21
1 1 2 11
1 2 1 16
1 2 2 6
2 1 1 50
2 1 2 14
2 2 1 18
2 2 2 6
;
run;

proc freq;
  tables age*risk*case / measures CMH;
  weight n;
run;
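The Mantel-Haenszel statistic and common odds ratio for the two age strata can also be computed directly from the formulas for e_i, v_i, the chi-square statistic and OR_MH. This is a sketch without a continuity correction, matching the formula as written on the slide; the function name `mantel_haenszel` and the tuple layout (a, b, c, d) are illustrative assumptions.

```python
def mantel_haenszel(strata):
    """Mantel-Haenszel chi-square (1 df, uncorrected) and common odds ratio.

    strata: iterable of (a, b, c, d) tuples, one 2x2 table per stratum, with
    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    sum_a = sum_e = sum_v = numerator = denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        sum_a += a
        sum_e += (a + b) * (a + c) / n  # expected count of cell a
        sum_v += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
        numerator += a * d / n
        denominator += b * c / n
    chi2 = (sum_a - sum_e) ** 2 / sum_v
    return chi2, numerator / denominator

# Age <= 55 and age >= 56 strata from the example.
strata = [(21, 11, 16, 6), (50, 14, 18, 6)]
chi2_mh, or_mh = mantel_haenszel(strata)
print(f"MH chi-square = {chi2_mh:.4f}, common OR = {or_mh:.4f}")
```

A chi-square value this small (far below the 5% critical value of 3.84 for 1 degree of freedom) goes with a common odds ratio close to 1.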

Page 32: F2 Chi-square dist’n - Seoul National Universityhosting03.snu.ac.kr/~hokim/int/2019/chap_11.pdf · 2019-05-27 · 이항분포의가정하에서기대도수=기대상대도수*총합

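Example 11.6.1: Relative risk, odds ratio, and Mantel-Haenszel statistics

$\widehat{RR} = \dfrac{26/406}{178/3564} = \dfrac{0.064}{0.050} = 1.28$

95% confidence interval for RR:

Lower limit: $1.2822 \cdot \exp\left(-1.96\sqrt{\tfrac{1}{26} - \tfrac{1}{406} + \tfrac{1}{178} - \tfrac{1}{3564}}\right) = 0.861$

Upper limit: $1.2822 \cdot \exp\left(1.96\sqrt{\tfrac{1}{26} - \tfrac{1}{406} + \tfrac{1}{178} - \tfrac{1}{3564}}\right) = 1.910$

The 95% CI includes 1, so the relative risk is not significantly different from 1 at the 5% level.

Table 11.6.2

              Disease progress
Smoking       Yes      No     Total
Yes            26     380       406
No            178    3386      3564
Total         204    3766      3970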

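data preg;
  input smoke $ preg $ count;
  cards;
smoke early 26
smoke abnormal 380
nonsmoke early 178
nonsmoke abnormal 3386
;
run;

/* order=data keeps rows and columns in input order, so the smoke/early cell comes first */
proc freq data=preg order=data;
  weight count;                        /* each input line is a cell count */
  tables smoke*preg / measures chisq;  /* relative risk, odds ratio, chi-square test */
run;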
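The relative-risk calculation above can be cross-checked without SAS. The following is a minimal Python sketch, not part of the original slides: the function name `relative_risk_ci` and its argument layout are illustrative assumptions, and the standard error is the usual log-scale (Wald) formula that reproduces the limits quoted in the example.

```python
import math

def relative_risk_ci(a, n1, c, n2, z=1.96):
    """Relative risk of group 1 versus group 2 with a log-scale Wald CI.

    a, n1: number of events and group size in the exposed group
    c, n2: number of events and group size in the unexposed group
    z:     standard normal quantile (1.96 for a 95% interval)
    """
    rr = (a / n1) / (c / n2)
    # Standard error of ln(RR): sqrt(1/a - 1/n1 + 1/c - 1/n2)
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    lower = rr * math.exp(-z * se_log_rr)
    upper = rr * math.exp(z * se_log_rr)
    return rr, lower, upper

# Counts from Table 11.6.2: 26 of 406 smokers, 178 of 3564 non-smokers
rr, lower, upper = relative_risk_ci(26, 406, 178, 3564)
print(f"RR = {rr:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The interval is built on the log scale because ln(RR) is closer to normally distributed than RR itself; exponentiating the two limits gives an interval that is asymmetric around the point estimate.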

Odds ratio (sample)

Risk      Case      Control    Total
O         a         b          a + b
X         c         d          c + d
Total     a + c     b + d      n

(O = risk factor present, X = risk factor absent)

• Odds $= p/(1-p)$

• Odds in the case group: $\dfrac{a/(a+c)}{c/(a+c)} = \dfrac{a}{c}$

• Odds in the control group: $\dfrac{b/(b+d)}{d/(b+d)} = \dfrac{b}{d}$

• $\widehat{OR} = \dfrac{a/c}{b/d} = \dfrac{ad}{bc}$, with $\widehat{se}(\ln \widehat{OR}) = \sqrt{\dfrac{1}{a} + \dfrac{1}{b} + \dfrac{1}{c} + \dfrac{1}{d}}$

• CI for $\ln OR$: $\ln \widehat{OR} \pm z_{1-\alpha/2} \cdot \widehat{se}(\ln \widehat{OR})$

• CI for $OR$: $e^{\ln \widehat{OR} \pm z_{1-\alpha/2} \cdot \widehat{se}(\ln \widehat{OR})} = \widehat{OR} \cdot e^{\pm z_{1-\alpha/2} \cdot \widehat{se}(\ln \widehat{OR})}$

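Example 11.6.2

$\widehat{OR} = \dfrac{52 \cdot 3486}{352 \cdot 78} = 6.60$

95% confidence interval for OR:

Lower limit: $6.6023 \cdot \exp\left(-1.96\sqrt{\tfrac{1}{52} + \tfrac{1}{352} + \tfrac{1}{78} + \tfrac{1}{3486}}\right) = 4.571$

Upper limit: $6.6023 \cdot \exp\left(1.96\sqrt{\tfrac{1}{52} + \tfrac{1}{352} + \tfrac{1}{78} + \tfrac{1}{3486}}\right) = 9.536$

Table 11.6.4

              Obesity
Smoking       Yes      No     Total
Yes            52     352       404
No             78    3486      3564
Total         130    3838      3968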

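data obe;
  input smoke $ case $ count;
  cards;
smoke case 52
smoke control 352
nonsmoke case 78
nonsmoke control 3486
;
run;

/* order=data keeps the smoke/case cell first, so the odds ratio matches the hand calculation */
proc freq data=obe order=data;
  weight count;                   /* each input line is a cell count */
  tables smoke*case / measures;   /* odds ratio and relative risks with confidence limits */
run;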
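The odds-ratio interval in Example 11.6.2 can be reproduced from the four cell counts alone. This is a minimal Python sketch, not part of the original slides: the function name `odds_ratio_ci` is an illustrative assumption, and the cells follow the a, b, c, d layout of the odds-ratio table (exposed cases, exposed controls, unexposed cases, unexposed controls).

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Sample odds ratio ad/bc with a log-scale Wald confidence interval."""
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR): sqrt(1/a + 1/b + 1/c + 1/d)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = odds_ratio * math.exp(-z * se_log_or)
    upper = odds_ratio * math.exp(z * se_log_or)
    return odds_ratio, lower, upper

# Counts from Table 11.6.4: smoking (rows) by obesity (columns)
or_hat, or_lower, or_upper = odds_ratio_ci(52, 352, 78, 3486)
print(f"OR = {or_hat:.4f}, 95% CI = ({or_lower:.3f}, {or_upper:.3f})")
```

Here the whole interval lies above 1, in contrast to the relative-risk interval of Example 11.6.1.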

Mantel-Haenszel statistic

The confounding variable has k strata. The 2×2 table for stratum i:

Risk          Case         Control      Total
Exposed       a_i          b_i          a_i + b_i
Not exposed   c_i          d_i          c_i + d_i
Total         a_i + c_i    b_i + d_i    n_i

Expected frequency in stratum i: $e_i = \dfrac{(a_i + b_i)(a_i + c_i)}{n_i}$

Variance of $a_i$ in stratum i: $v_i = \dfrac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$

Test statistic: $\chi^2_{MH} = \dfrac{\left(\sum_{i=1}^{k} a_i - \sum_{i=1}^{k} e_i\right)^2}{\sum_{i=1}^{k} v_i} \sim \chi^2_1$

Common odds ratio: $\widehat{OR}_{MH} = \dfrac{\sum_{i=1}^{k} (a_i d_i / n_i)}{\sum_{i=1}^{k} (b_i c_i / n_i)}$

Example 11.6.3

Age ≤ 55

Risk         OCAD patients   Control   Total
Exposed            21           11       32
Unexposed          16            6       22
Total              37           17       54

Age ≥ 56

Risk         OCAD patients   Control   Total
Exposed            50           14       64
Unexposed          18            6       24
Total              68           20       88

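data one;
  /* age: 1 = 55 or younger, 2 = 56 or older; risk: 1 = exposed, 2 = unexposed */
  /* case: 1 = OCAD patient, 2 = control; n = cell count */
  input age risk case n;
  cards;
1 1 1 21
1 1 2 11
1 2 1 16
1 2 2 6
2 1 1 50
2 1 2 14
2 2 1 18
2 2 2 6
;
run;

proc freq data=one;
  tables age*risk*case / measures cmh;  /* age is the stratification variable */
  weight n;
run;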
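The Mantel-Haenszel formulas can be applied by hand to the two age strata of Example 11.6.3. This is a minimal Python sketch, not part of the original slides: the function name `mantel_haenszel` and the tuple layout (a, b, c, d) are illustrative assumptions, no continuity correction is applied, and the p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)), which holds for one degree of freedom only.

```python
import math

def mantel_haenszel(strata):
    """Mantel-Haenszel chi-square, its p-value (1 df), and the common odds ratio.

    strata: list of (a, b, c, d) tuples, one 2x2 table per stratum, with
            a = exposed cases, b = exposed controls,
            c = unexposed cases, d = unexposed controls.
    """
    sum_a = sum_e = sum_v = 0.0
    or_num = or_den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        sum_a += a
        sum_e += (a + b) * (a + c) / n                                     # e_i
        sum_v += (a + b) * (c + d) * (a + c) * (b + d) / (n**2 * (n - 1))  # v_i
        or_num += a * d / n
        or_den += b * c / n
    chi2 = (sum_a - sum_e) ** 2 / sum_v
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value, or_num / or_den

# Example 11.6.3: age 55 or younger, then age 56 or older
strata = [(21, 11, 16, 6), (50, 14, 18, 6)]
chi2, p_value, or_mh = mantel_haenszel(strata)
print(f"chi2_MH = {chi2:.4f}, p = {p_value:.3f}, OR_MH = {or_mh:.4f}")
```

Summing the numerator and denominator of the odds ratio across strata before dividing, rather than averaging the stratum-specific odds ratios, is what makes the estimate a single age-adjusted value.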
