f2 chi-square dist’nhosting03.snu.ac.kr/~hokim/int/2017/chap_11.pdf · 2017-05-26 ·...

Chapter 11: The χ² (chi-square) distribution and the analysis of frequencies

2017/5/23

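11.1 Mathematical properties of the χ² distribution

• Uses of the χ² distribution: tests of goodness-of-fit, tests of independence, tests of homogeneity

• Definition of the χ² distribution

$$Z_1, \dots, Z_n \sim \text{independent } N(0,1) \;\Longrightarrow\; \sum_{i=1}^{n} Z_i^2 \sim \chi^2_n$$

e.g. $Z_i = \frac{Y_i - \mu}{\sigma}$, where $Y_i \sim N(\mu, \sigma^2)$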

11.2 Goodness-of-fit tests

• Does our data agree with a hypothesized theoretical distribution (normal, binomial, Poisson, etc.)?

• Ex 11.2.1 (Normal dist’n)

H0: the data are normally distributed vs. H1: not H0

Cholesterol level (mg/dl)   Number of subjects (freq)
 1-5                          2
 6-10                         2
11-15                         7
16-20                        19
21-25                         4
26-30                         6
31-35                         3
36-40                         4

$$X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \sim \chi^2_{k-r}$$

$O_i$: observed frequency, $E_i$: expected frequency

$r$: (# restrictions) + (# parameters estimated)

$\bar{x} = 10.53$, $s = 8.62$

The $z$ value for $X = 11$ is $z = \frac{11 - 10.53}{8.62} = 0.05$

The $z$ value for $X = 16$ is $z = \frac{16 - 10.53}{8.62} = 0.63$


Interval    Standardized interval   Expected relative   Expected
            (z at lower limit)      frequency           frequency
< 1         < −1.11                 0.13                 6.32
1~5.9       −1.11                   0.17                 7.76
6~10.9      −0.53                   0.22                10.44
11~15.9      0.05                   0.22                10.12
16~20.9      0.63                   0.15                 7.08
21~25.9      1.21                   0.08                 3.57
26~30.9      1.79                   0.03                 1.30
31~35.9      2.37                   0.01                 0.34
36~40.9      2.95                   0.00                 0.06
≥ 41        > 3.53                  0.00                 0.01

The last four intervals are merged: expected frequency 1.30 + 0.34 + 0.06 + 0.01 = 1.71
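The standardized limits and expected frequencies in this table can be sketched in R as follows. This is only an illustration: the variable names are made up here, and n = 47 is taken from the total of the observed frequencies.

```r
# Sketch: expected frequencies under N(xbar, s^2) for Ex 11.2.1.
# Variable names are illustrative assumptions, not from the slides.
xbar <- 10.53   # sample mean
s    <- 8.62    # sample standard deviation
n    <- 47      # total number of subjects

lower <- c(1, 6, 11, 16, 21, 26, 31, 36, 41)  # lower class limits
z <- (lower - xbar) / s                        # standardized lower limits

# Probability of each class: (-Inf, 1), [1, 6), ..., [41, Inf)
p <- diff(pnorm(c(-Inf, z, Inf)))
E <- n * p                                     # expected frequencies

print(round(z, 2))
print(round(E, 2))
```

Small differences from the table are possible because the slide values are rounded.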

Interval   Observed (O_i)   Expected (E_i)   (O_i − E_i)²/E_i
< 1          0                6.32              6.11
1-5          2                7.76              2.76
6-10         2               10.44              6.50
11-15        7               10.12              1.14
16-20       19                7.08             16.08
21-25        4                3.57              0.01
26-30        6                1.30
31-35        3                0.34
36-40        4                0.06
> 40         0                0.01
(26-30 to > 40 merged: O = 13, E = 1.71)      74.54
Total       47               47               107.14

$$X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = 107.14 > 9.49 = \texttt{qchisq(0.05,4,lower=F)}$$

• Reject $H_0$ (normally distributed) -> Not normally distributed.

• The chi-square approximation is good when the expected frequencies are large enough ($E_i > 10$). When $E_i < 5$, merge cells and re-categorize so that the expected frequencies become large enough.

• Whether the parameters are known or estimated affects the degrees of freedom.
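A minimal sketch of the decision step in R. It assumes that the degrees of freedom 4 arise as k − r with k = 7 cells after merging and r = 1 restriction (the fixed total) + 2 estimated parameters (mean and standard deviation); that decomposition is an interpretation of the formula above, not something the slide spells out.

```r
# Sketch: critical value and p-value for the goodness-of-fit test.
k  <- 7        # cells left after merging the last four intervals (assumed)
r  <- 1 + 2    # 1 restriction + 2 estimated parameters (assumed)
df <- k - r

X2   <- 107.14                                  # statistic reported on the slide
crit <- qchisq(0.05, df, lower.tail = FALSE)    # upper 5% point of chi-square(df)
p_value <- pchisq(X2, df, lower.tail = FALSE)

reject <- X2 > crit
print(c(df = df, crit = round(crit, 2)))
print(reject)
```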

Ex 11.2.2 Binomial dist’n

Each of 100 doctors drew a sample of 25 patients and asked each patient which pain reliever was preferred, the new drug or the old one. Test at significance level 0.005 whether the data can be said to follow a binomial distribution.

H0: the data come from a binomial distribution (goodness-of-fit test).

TABLE 11.2.4

(1) Number of patients out of 25 preferring the new pain reliever
(2) Number of doctors reporting this number
(3) Total number of patients preferring the new pain reliever, by doctor

(1)           (2)    (3)
0               5      0
1               6      6
2               8     16
3              10     30
4              10     40
5              15     75
6              17    102
7              10     70
8              10     80
9               9     81
10 or more      0      0
Total         100    500

Expected frequency under the binomial dist’n = expected relative frequency (prob) × total

$$P(X = x) = \binom{25}{x} p^x (1-p)^{25-x}, \quad x = 0, 1, 2, \dots, 25$$

$$\hat{p} = 500/2500 = 0.2$$

$$X^2 = \frac{(11 - 2.74)^2}{2.74} + \cdots + \frac{(0 - 1.73)^2}{1.73} = 47.624 \sim \chi^2_{10-2}$$

Significant -> reject H0 (data ~ binomial dist’n)

TABLE 12.3.5 Calculations for Example 12.3.2

Number of patients out     Number of    Expected    Expected
of 25 preferring new       doctors      relative    frequency
pain reliever              (observed)   frequency
0                            5          0.0038        0.38
1                            6          0.0236        2.36
2                            8          0.0708        7.08
3                           10          0.1358       13.58
4                           10          0.1867       18.67
5                           15          0.1960       19.60
6                           17          0.1633       16.33
7                           10          0.1109       11.09
8                           10          0.0623        6.23
9                            9          0.0295        2.95
10 or more                   0          0.0173        1.73
Total                      100          1.0000      100.00

The first two cells are merged: observed 5 + 6 = 11, expected 0.38 + 2.36 = 2.74
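The calculation for this example can be sketched in R as below. The expected frequencies are recomputed with `dbinom` rather than copied from the table, so the statistic differs slightly from the slide value of 47.624 because of rounding. All variable names are illustrative.

```r
# Sketch: binomial goodness-of-fit for Ex 11.2.2 (names are illustrative).
O <- c(5, 6, 8, 10, 10, 15, 17, 10, 10, 9, 0)   # doctors reporting 0, 1, ..., 9, 10+
p_hat <- 500 / 2500                              # estimated success probability

# Expected relative frequencies for 0..9 and "10 or more"
p <- c(dbinom(0:9, size = 25, prob = p_hat),
       pbinom(9, size = 25, prob = p_hat, lower.tail = FALSE))
E <- 100 * p

# Merge the first two cells, as on the slide (11 observed, about 2.74 expected)
O_m <- c(sum(O[1:2]), O[-(1:2)])
E_m <- c(sum(E[1:2]), E[-(1:2)])

X2   <- sum((O_m - E_m)^2 / E_m)
df   <- length(O_m) - 2                          # 10 cells - 1 restriction - 1 estimated parameter
crit <- qchisq(0.005, df, lower.tail = FALSE)    # significance level 0.005

print(round(c(X2 = X2, df = df, crit = crit), 3))
```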

Ex 11.2.3 Poisson dist’n

Expected relative frequency under the Poisson dist’n:

$$P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \dots$$

$\lambda = 3$: known

H0: the mean number of emergency patients per day at the hospital is 3.

Number of emergency       Number of days this number of
admissions in a day       emergency admissions occurred
0                           5
1                          14
2                          15
3                          23
4                          16
5                           9
6                           3
7                           3
8                           1
9                           1
10 or more                  0
Total                      90

$$X^2 = \frac{(5 - 4.50)^2}{4.50} + \cdots + \frac{(2 - 1.08)^2}{1.08} = 3.664 < 15.507 = \chi^2_{9-1}(0.95)$$

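Emergency     Number of      Expected relative   Expected           (O_i − E_i)²/E_i
admissions    days (O_i)     frequency           frequency (E_i)
0               5            0.050                4.50              0.056
1              14            0.149               13.41              0.026
2              15            0.224               20.16              1.321
3              23            0.224               20.16              0.4
4              16            0.168               15.12              0.051
5               9            0.101                9.09              0.001
6               3            0.050                4.50              0.5
7               3            0.022                1.98              0.525
8               1            0.008                0.72
9               1            0.003                0.27
10 or more      0            0.001                0.09
(8 to 10 or more merged: O = 2, E = 1.08)                          0.784
Total          90            1.000               90.00              3.664

Do not reject H0 => there is no evidence that the data do not follow a Poisson distribution with $\lambda = 3$.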
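As a rough check, the statistic can be reproduced in R from the rounded expected relative frequencies in the table. The variable names are illustrative; `dpois` is used only to confirm that the tabulated probabilities are the Poisson(3) probabilities up to rounding.

```r
# Sketch: Poisson goodness-of-fit for Ex 11.2.3 with lambda = 3 known.
O <- c(5, 14, 15, 23, 16, 9, 3, 3, 1, 1, 0)      # days with 0, 1, ..., 9, 10+ admissions
p <- c(0.050, 0.149, 0.224, 0.224, 0.168, 0.101,
       0.050, 0.022, 0.008, 0.003, 0.001)        # rounded probabilities from the table
p_exact <- dpois(0:9, lambda = 3)                # unrounded probabilities for 0..9

E <- sum(O) * p                                  # expected frequencies (total = 90)

# Merge the last three cells (8, 9, 10 or more), as on the slide
O_m <- c(O[1:8], sum(O[9:11]))
E_m <- c(E[1:8], sum(E[9:11]))

X2   <- sum((O_m - E_m)^2 / E_m)
df   <- length(O_m) - 1                          # no parameter estimated, lambda is known
crit <- qchisq(0.95, df)

print(round(c(X2 = X2, df = df, crit = crit), 3))
```

Because lambda is given rather than estimated, only the restriction on the total is subtracted from the number of cells.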

Ex 11.2.4 Uniform dist’n

Period     Number of flu patients
2005.12     62
2006.01     84
2006.02     17
2006.03     16
2006.04     21
Total      200

Expected frequency under the uniform dist’n = 200/5 = 40

$$X^2 = \sum \frac{(O_i - E_i)^2}{E_i} = 97.15 > 13.277 = \texttt{qchisq(0.01,4,lower=F)}$$

$p$-value = pchisq(97.15,4,lower=F) = $3.98 \times 10^{-20}$
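A short sketch of the same test in R. When `chisq.test` is given a single vector of counts and no `p` argument, it tests against equal cell probabilities, which is the uniform hypothesis here. The vector name `flu` is illustrative.

```r
# Sketch: goodness-of-fit to a uniform distribution for Ex 11.2.4.
flu <- c("2005.12" = 62, "2006.01" = 84, "2006.02" = 17,
         "2006.03" = 16, "2006.04" = 21)

E  <- rep(sum(flu) / length(flu), length(flu))   # 200 / 5 = 40 in every cell
X2 <- sum((flu - E)^2 / E)                       # statistic computed by hand

res <- chisq.test(flu)                           # default: equal probabilities
print(X2)
print(res)
```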

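11.3 Tests of independence

• Contingency table with cell counts $N_{ij}$, e.g. $i = 1, 2, 3$; $j = 1, 2, 3, 4$

$H_0$: the two variables are independent.

$$E_{ij} = N_{..} \cdot \frac{N_{i.}}{N_{..}} \cdot \frac{N_{.j}}{N_{..}} = \frac{N_{i.} N_{.j}}{N_{..}}, \qquad O_{ij} = N_{ij}$$

$$X^2 = \sum_{i} \sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2_{(r-1)(c-1)}, \qquad r = \#\text{rows},\ c = \#\text{cols}$$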

EX 11.3.1

TABLE 12.4.4 Observed and Expected Frequencies for Example 12.4.1 (expected frequencies in parentheses)

         Relapse
trt      Yes              No                  Total
A        294 (77.26)       921 (1137.74)       1215
B         98 (188.21)     2862 (2771.79)       2960
C         50 (198.00)     3064 (2916.00)       3114
D        203 (181.53)     2652 (2673.47)       2855
Total    645              9499                10144

$$X^2 = \sum_{i} \sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{(294 - 77.26)^2}{77.26} + \cdots + \frac{(2652 - 2673.47)^2}{2673.47} \approx 816.41 \sim \chi^2_{(4-1)(2-1)}$$

p-value ≈ 0: the two variables are not independent at all -> strongly associated

> data<-as.table(cbind(c(294,98,50,203),c(921,2862,3064,2652)))
> dimnames(data)<-list(trt=c("A","B","C","D"),re=c("Y","N"))
> chisq.test(data)

        Pearson's Chi-squared test

data:  data
X-squared = 816.4101, df = 3, p-value < 2.2e-16

data re;
input trt $ re $ count @@;
cards;
A Y 294 A N 921
B Y 98 B N 2862
C Y 50 C N 3064
D Y 203 D N 2652
;
proc freq data=re;
weight count;
tables trt*re/measures chisq;
run;

• Small expected frequencies: not a problem if fewer than 20% of the cells have an expected frequency below 5 and the minimum expected frequency is at least 1.

• 2×2 table: do not use the χ²-test if n < 20, or if 20 < n < 49 and one or more cells have an expected frequency below 5!!

• Yates adjustment (continuity correction): be sure to read about it!!

2×2 table

                        Drinking experience
Smoking experience      Yes     No     Total
Yes (Y)                 131     52     183
No (N)                   14     36      50
Total                   145     88     233

$$X^2 = \frac{n(ad - bc)^2}{(a+c)(b+d)(a+b)(c+d)}$$

$$X^2 = \frac{233\,(131 \cdot 36 - 52 \cdot 14)^2}{145 \cdot 88 \cdot 183 \cdot 50} = 31.7391$$

-> Reject H0: independence

-> They are associated.

Yates’ correction

$$X^2_{\text{corrected}} = \frac{n(|ad - bc| - 0.5n)^2}{(a+c)(b+d)(a+b)(c+d)}$$

$$X^2_{\text{corrected}} = \frac{233\,(|131 \cdot 36 - 52 \cdot 14| - 0.5 \cdot 233)^2}{(145)(88)(183)(50)} = 29.9118$$
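The two 2×2 formulas can be checked against `chisq.test` in R, where `correct = FALSE` gives the uncorrected statistic and `correct = TRUE` (the default for 2×2 tables) applies Yates’ continuity correction. This is a sketch with illustrative variable names.

```r
# Sketch: shortcut formula and Yates' correction for the smoking x drinking table.
tab <- matrix(c(131, 52,
                 14, 36), nrow = 2, byrow = TRUE,
              dimnames = list(smoking = c("Y", "N"), drinking = c("Y", "N")))

a <- tab[1, 1]; b <- tab[1, 2]; cc <- tab[2, 1]; d <- tab[2, 2]
n <- sum(tab)
denom <- (a + cc) * (b + d) * (a + b) * (cc + d)

X2       <- n * (a * d - b * cc)^2 / denom                   # uncorrected
X2_yates <- n * (abs(a * d - b * cc) - 0.5 * n)^2 / denom    # Yates-corrected

res_plain <- chisq.test(tab, correct = FALSE)
res_yates <- chisq.test(tab, correct = TRUE)

print(round(c(X2 = X2, X2_yates = X2_yates), 4))
```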

11.4 Homogeneity test

• Homogeneity test: samples are drawn independently from each population. Do the samples have the same distribution, i.e. could they have come from one population?

• Independence test: a single sample is drawn from one population. The row and column (marginal) totals are not controlled but arise by chance.

• Independence test vs. homogeneity test

• $H_0$: the family history is the same for patients whose schizophrenia began before age 18 and for patients whose schizophrenia began after age 18. (The two groups have the same distribution.)

• Cannot reject H0 -> Same population!

Family history   Onset before 18   Onset after 18   Total
A                 28                35               63
B                 19                38               57
C                 41                44               85
D                 53                60              113
Total            141               177              318

> data<-as.table(cbind(c(28,19,41,53),c(35,38,44,60)))
> dimnames(data)<-list(trt=c("A","B","C","D"),re=c("Early","Later"))
> chisq.test(data)

        Pearson's Chi-squared test

data:  data
X-squared = 3.6216, df = 3, p-value = 0.3053

• 2×2 table

① χ² test

$$X^2 = \frac{n(ad - bc)^2}{(a+c)(b+d)(a+b)(c+d)} = \frac{220\,(60 \cdot 72 - 40 \cdot 48)^2}{108 \cdot 112 \cdot 100 \cdot 120} = 8.7302 > 3.841$$

∴ Reject $H_0$

∴ The probabilities of having the disease in the two groups are significantly different.

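② Comparing two probabilities

$$H_0: p_1 = p_2 \quad \text{vs.} \quad H_a: p_1 \neq p_2$$

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n_1} + \dfrac{\hat{p}(1-\hat{p})}{n_2}}}, \qquad \hat{p} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$$

e.g. $n_1 = 100$, $\hat{p}_1 = 0.60$, $n_2 = 120$, $\hat{p}_2 = 0.40$:

$$\hat{p} = \frac{0.60 \cdot 100 + 0.40 \cdot 120}{100 + 120} = 0.4909$$

$$Z = \frac{0.60 - 0.40}{\sqrt{\dfrac{0.4909 \cdot 0.5091}{100} + \dfrac{0.4909 \cdot 0.5091}{120}}} = 2.95469 > 1.96 \quad \text{significant}$$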
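The link between the two approaches can be sketched in R: for a 2×2 table, the pooled two-proportion Z statistic squared equals the uncorrected χ² statistic. The counts 60 and 48 are the numbers of events implied by 0.60 × 100 and 0.40 × 120; variable names are illustrative.

```r
# Sketch: pooled two-proportion z-test and its chi-square equivalent.
x <- c(60, 48)     # events in group 1 and group 2
n <- c(100, 120)   # group sizes

p_hat  <- x / n                 # 0.60 and 0.40
p_pool <- sum(x) / sum(n)       # pooled estimate under H0

z <- (p_hat[1] - p_hat[2]) /
  sqrt(p_pool * (1 - p_pool) * (1 / n[1] + 1 / n[2]))

res <- prop.test(x, n, correct = FALSE)   # chi-square form, no continuity correction

print(round(c(p_pool = p_pool, z = z, z_squared = z^2), 4))
print(res$statistic)
```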

• debatea.sas

* File : debatea.sas ;
options ls=70 ps=55 nodate nonumber ;
data one;
input id school gender compare argue research reason speak ;
if school in (3,5,6,8) ;
label id='Survey Number'
      school='High School'
      compare='How Debate Compares to Other Classes'
      argue='Argumentation'
      research='Research'
      reason='Reasoning'
      speak='Speaking' ;
cards;
1 6 1 1 1 1 1 1
108 7 1 1 1 1 1 2
56 3 1 1 1 1 1 1
... (rows omitted)
70 6 1 1 1 1 1 1
69 6 2 1 1 1 1 1
;
run;
proc freq data=one;
tables school*compare/chisq expected ;
title 'Comparing Schools in the Debate Survey';
run;
proc freq data=one;
tables school*compare/exact ;
title 'Comparing Schools in the Debate Survey';
run;

data respire;
input treat $ outcome $ count ;
cards;
test f 40
test u 20
placebo f 16
placebo u 48
;
proc freq;
weight count;
tables treat*outcome/chisq;
run;

The SAS System

The FREQ Procedure

Table of treat by outcome

treat     outcome

Frequency|
Percent  |
Row Pct  |
Col Pct  |f       |u       |  Total
---------+--------+--------+
placebo  |     16 |     48 |     64
         |  12.90 |  38.71 |  51.61
         |  25.00 |  75.00 |
         |  28.57 |  70.59 |
---------+--------+--------+
test     |     40 |     20 |     60
         |  32.26 |  16.13 |  48.39
         |  66.67 |  33.33 |
         |  71.43 |  29.41 |
---------+--------+--------+
Total          56       68      124
            45.16    54.84   100.00

Statistics for Table of treat by outcome

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1     21.7087    <.0001
Likelihood Ratio Chi-Square    1     22.3768    <.0001
Continuity Adj. Chi-Square     1     20.0589    <.0001
Mantel-Haenszel Chi-Square     1     21.5336    <.0001
Phi Coefficient                      -0.4184
Contingency Coefficient               0.3860
Cramer's V                           -0.4184

Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)        16
Left-sided Pr <= F       2.838E-06
Right-sided Pr >= F         1.0000

Table Probability (P)    2.397E-06
Two-sided Pr <= P        4.754E-06

Sample Size = 124

11.5 Fisher’s exact test

data severe;
input treat $ outcome $ count ;
cards;
Test f 10
Test u 2
Control f 2
Control u 4
;
proc freq order=data;
tables treat*outcome / chisq nocol;
weight count;
run;

The SAS System

The FREQ Procedure

Table of treat by outcome

treat     outcome

Frequency|
Percent  |
Row Pct  |f       |u       |  Total
---------+--------+--------+
Test     |     10 |      2 |     12
         |  55.56 |  11.11 |  66.67
         |  83.33 |  16.67 |
---------+--------+--------+
Control  |      2 |      4 |      6
         |  11.11 |  22.22 |  33.33
         |  33.33 |  66.67 |
---------+--------+--------+
Total          12        6       18
            66.67    33.33   100.00

Statistics for Table of treat by outcome

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1      4.5000    0.0339
Likelihood Ratio Chi-Square    1      4.4629    0.0346
Continuity Adj. Chi-Square     1      2.5313    0.1116
Mantel-Haenszel Chi-Square     1      4.2500    0.0393
Phi Coefficient                       0.5000
Contingency Coefficient               0.4472
Cramer's V                            0.5000

WARNING: 75% of the cells have expected counts less than 5. Chi-Square may not be a valid test.

Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)        10
Left-sided Pr <= F          0.9961
Right-sided Pr >= F         0.0573

Table Probability (P)       0.0533
Two-sided Pr <= P           0.1070

Sample Size = 18

Exact Test: Table Probabilities

Table cell
(1,1)   (1,2)   (2,1)   (2,2)   Probability
12      0       0       6       .0001
11      1       1       5       .0039
10      2       2       4       .0533
 9      3       3       3       .2370
 8      4       4       2       .4000
 7      5       5       1       .2560
 6      6       6       0       .0498

• One-tailed p-value: $p = 0.0533 + 0.0039 + 0.0001 = 0.0573$

• Two-tailed p-value: $p = 0.0533 + 0.0039 + 0.0001 + 0.0498 = 0.1071$

H0: the two variables are independent (homogeneous) vs. H1: not H0
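The table probabilities above are hypergeometric probabilities for the (1,1) cell with all margins fixed. A minimal sketch in R, with illustrative variable names, that rebuilds them and compares the two-sided p-value with `fisher.test`:

```r
# Sketch: exact table probabilities for the Test/Control example.
tab <- matrix(c(10, 2,
                 2, 4), nrow = 2, byrow = TRUE,
              dimnames = list(treat = c("Test", "Control"),
                              outcome = c("f", "u")))

# Margins: 12 "f" and 6 "u" overall, 12 subjects in the Test row.
cell11 <- 6:12                                   # possible values of the (1,1) cell
prob   <- dhyper(cell11, m = 12, n = 6, k = 12)  # probability of each table
names(prob) <- cell11

p_obs <- prob[["10"]]                            # probability of the observed table
p_one <- sum(prob[cell11 >= 10])                 # tables at least as extreme (one tail)
p_two <- sum(prob[prob <= p_obs * (1 + 1e-7)])   # tables no more likely than observed

res <- fisher.test(tab)
print(round(prob, 4))
print(round(c(one_tailed = p_one, two_tailed = p_two), 4))
```

The two-sided value computed from unrounded probabilities is 0.1070, as in the SAS output; the 0.1071 on the slide comes from summing the rounded table probabilities.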

> fisher.test(matrix(c(7,3,5,6),2,2),alternative='greater')

        Fisher's Exact Test for Count Data

data:  matrix(c(7, 3, 5, 6), 2, 2)
p-value = 0.2449
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
 0.4512625       Inf
sample estimates:
odds ratio
  2.661251

11.6 Relative risk, odds ratio, and the Mantel-Haenszel statistic

• Prospective study: select two groups according to exposure status and observe them over time (fix # exposed and # unexposed, then follow).

• Retrospective study: select the sample according to the outcome variable and investigate the past (fix # patients and # controls, then investigate the past).

Relative risk

$$RR = \frac{a/(a+b)}{c/(c+d)}$$

                  Disease
Risk factor       Yes (Y)   No (N)   Total
Exposed           a         b        a + b
Unexposed         c         d        c + d
Total             a + c     b + d    n

data preg;
input smoke $ preg $ count @@;
cards;
smoke early 26 smoke abnormal 380
nonsmoke early 178 nonsmoke abnormal 3386
;
proc freq order=data;
weight count;
tables smoke*preg/measures chisq;
run;

Odds ratio

(a, b, c, d are the cells of the 2×2 table of risk factor by disease shown above.)

Odds = p/(1-p)

Odds for exposed: $\dfrac{a/(a+b)}{1 - a/(a+b)} = \dfrac{a/(a+b)}{b/(a+b)} = \dfrac{a}{b}$

Odds for unexposed: $\dfrac{c/(c+d)}{1 - c/(c+d)} = \dfrac{c/(c+d)}{d/(c+d)} = \dfrac{c}{d}$

$$OR = \frac{a/b}{c/d} = \frac{ad}{bc}$$
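As a small illustration, RR and OR can be computed in R from the four cell counts. The helper `risk_measures` is a hypothetical name introduced here, and the example treats the first row (`smoke`) as exposed and the first column (`early`) as the outcome of interest in the `preg` data from the SAS listing; that reading of the data is an assumption.

```r
# Sketch: relative risk and odds ratio from a 2x2 table (a, b / c, d).
# risk_measures() is a hypothetical helper, not part of any package.
risk_measures <- function(a, b, cc, d) {
  risk_exposed   <- a / (a + b)
  risk_unexposed <- cc / (cc + d)
  c(RR = risk_exposed / risk_unexposed,   # ratio of risks
    OR = (a * d) / (b * cc))              # ratio of odds, (a/b) / (c/d)
}

# Counts from the preg data: smoke 26 / 380, nonsmoke 178 / 3386
m <- risk_measures(26, 380, 178, 3386)
print(round(m, 4))
```

With an outcome this uncommon (about 5% to 6% in both rows), the two measures come out close to each other.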

* Odds Ratio and Relative Risk

• Risk is the preferable measure because it is a probability.

• Why odds?

– OR is often a good approximation to the RR. (for a rare disease)

– Sometimes OR is either all we can estimate (case-control studies) or

– It is the most convenient to calculate (logistic regression analysis)

* Sun protection during childhood and cutaneous melanoma

Sun protection?   Cases   Controls   Total
Yes                 99      132       231
No                 303      290       593
Total              402      422       824

RR is meaningful when the sample is randomly selected from the population. In a case-control study, the sample is stratified by case-control status, so the proportion of melanoma in the sample, 402/824 = 0.49, is fixed by the design and does not estimate the risk.

Woodward (Epidemiology-study design and data analysis)

f1, f2: sampling fractions of cases and controls

(a) Population values and (b) expected values in the sample

               (a) Population                        (b) Sample (expected)
               Diseased   Not diseased   Total       Cases      Controls   Total
Exposed        A          B              A+B         f1A        f2B        f1A+f2B
Not exposed    C          D              C+D         f1C        f2D        f1C+f2D
Total          A+C        B+D            N           f1(A+C)    f2(B+D)    n

$$RR = \frac{A/(A+B)}{C/(C+D)}$$

Risk for the exposed: $f_1 A/(f_1 A + f_2 B) \neq A/(A+B)$

Risk for the unexposed: $f_1 C/(f_1 C + f_2 D) \neq C/(C+D)$

$$RR_{(b)} = \frac{f_1 A\,(f_1 C + f_2 D)}{f_1 C\,(f_1 A + f_2 B)} \neq \frac{A\,(C+D)}{C\,(A+B)} = RR_{(a)}$$

Odds for the exposed: $f_1 A / f_2 B \neq A/B$

Odds for the unexposed: $f_1 C / f_2 D \neq C/D$

But

$$OR = \frac{(f_1 A)(f_2 D)}{(f_2 B)(f_1 C)} = \frac{AD}{BC}$$

We can use a case-control study to estimate the OR, but not the risk, the RR, or the odds.
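A numerical sketch of this argument in R. The population counts and the sampling fractions below are made-up values chosen only for illustration; they do not come from the slides or from Woodward.

```r
# Sketch: OR survives case-control sampling, RR does not.
# Hypothetical population counts (illustrative only)
A <- 100; B <- 900    # exposed: diseased, not diseased
C <- 50;  D <- 950    # not exposed: diseased, not diseased

f1 <- 1.0             # sampling fraction of cases (assumed)
f2 <- 0.1             # sampling fraction of controls (assumed)

# (a) population values
rr_pop <- (A / (A + B)) / (C / (C + D))
or_pop <- (A * D) / (B * C)

# (b) expected values in the case-control sample
a <- f1 * A; b <- f2 * B; cc <- f1 * C; d <- f2 * D
rr_smp <- (a / (a + b)) / (cc / (cc + d))
or_smp <- (a * d) / (b * cc)

print(round(c(rr_pop = rr_pop, rr_smp = rr_smp,
              or_pop = or_pop, or_smp = or_smp), 4))
```

The sampling fractions cancel in the odds ratio, so `or_smp` equals `or_pop`, while the "relative risk" computed from the sample depends on f1 and f2.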

• Measures of association (Boston University): http://sphweb.bumc.bu.edu/otlt/MPH-Modules/EP/EP713_Association/EP713_Association8.html

[Mantel-Haenszel statistic] Association after adjusting for the effect of a confounding factor

• Calculate the expected values and variances in each of the several tables, then combine them to obtain a combined effect.

• Before combining, a homogeneity test is recommended to judge whether combining the tables is reasonable.

Sets of 2*2 tables

data ca;
input gender $ ECG $ disease $ count;
datalines;
female <0.1 yes 4
female <0.1 no 11
female >=0.1 yes 8
female >=0.1 no 10
male <0.1 yes 9
male <0.1 no 9
male >=0.1 yes 21
male >=0.1 no 6
;
options ls=75 nonumber nodate;
proc freq;
weight count ;
tables gender*disease /nocol nopct chisq ;
tables gender*ECG*disease /nocol nopct cmh chisq measures;
run;

The FREQ Procedure

Table of gender by disease

gender disease

Frequency|

Row Pct |no |yes | Total

---------+--------+--------+

female | 21 | 12 | 33

| 63.64 | 36.36 |

---------+--------+--------+

male | 15 | 30 | 45

| 33.33 | 66.67 |

---------+--------+--------+

Total 36 42 78

Statistics for Table of gender by disease

Statistic DF Value Prob

------------------------------------------------------

Chi-Square 1 7.0346 0.0080

Likelihood Ratio Chi-Square 1 7.1209 0.0076

Continuity Adj. Chi-Square 1 5.8681 0.0154

Mantel-Haenszel Chi-Square 1 6.9444 0.0084

Phi Coefficient 0.3003

Contingency Coefficient 0.2876

Cramer's V 0.3003

Table 1 of ECG by disease

Controlling for gender=female

ECG disease

Frequency|

Row Pct |no |yes | Total
---------+--------+--------+

<0.1 | 11 | 4 | 15

| 73.33 | 26.67 |

---------+--------+--------+

>=0.1 | 10 | 8 | 18

| 55.56 | 44.44 |

---------+--------+--------+

Total 21 12 33

Statistics for Table 1 of ECG by disease



------------------------------------------------------

Chi-Square 1 1.1175 0.2905






Cramer's V 0.1840

Estimates of the Relative Risk (Row1/Row2)

Type of Study                   Value     95% Confidence Limits
-----------------------------------------------------------------
Case-Control (Odds Ratio)      2.2000     0.5036     9.6107
Cohort (Col1 Risk)             1.3200     0.7897     2.2063
Cohort (Col2 Risk)             0.6000     0.2240     1.6073

Sample Size = 33

Table 2 of ECG by disease

Controlling for gender=male

ECG disease

Frequency|

Row Pct |no |yes | Total
---------+--------+--------+

<0.1 | 9 | 9 | 18

| 50.00 | 50.00 |

---------+--------+--------+

>=0.1 | 6 | 21 | 27

| 22.22 | 77.78 |

---------+--------+--------+

Total 15 30 45

Statistics for Table 2 of ECG by disease



------------------------------------------------------

Chi-Square 1 3.7500 0.0528






Cramer's V 0.2887


Estimates of the Relative Risk (Row1/Row2)

Type of Study                   Value     95% Confidence Limits
-----------------------------------------------------------------
Case-Control (Odds Ratio)      3.5000     0.9587    12.7775
Cohort (Col1 Risk)             2.2500     0.9680     5.2298
Cohort (Col2 Risk)             0.6429     0.3883     1.0642

Sample Size = 45

Summary Statistics for ECG by disease

Controlling for gender

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

Statistic Alternative Hypothesis DF Value Prob

---------------------------------------------------------------

1 Nonzero Correlation 1 4.5026 0.0338

2 Row Mean Scores Differ 1 4.5026 0.0338

3 General Association 1 4.5026 0.0338

Estimates of the Common Relative Risk (Row1/Row2)

Type of Study Method Value 95% Confidence Limits

-------------------------------------------------------------------------

Case-Control Mantel-Haenszel 2.8467 1.0765 7.5279

(Odds Ratio) Logit 2.8593 1.0807 7.5650

Cohort Mantel-Haenszel 1.6414 1.0410 2.5879

(Col1 Risk) Logit 1.5249 0.9833 2.3647


(Col2 Risk) Logit 0.6337 0.4046 0.9926

Breslow-Day Test for
Homogeneity of the Odds Ratios
------------------------------
Chi-Square        0.2155
DF                     1
Pr > ChiSq        0.6425

Total Sample Size = 78

The ORs are homogeneous (Breslow-Day p = 0.6425), so the CMH estimate of the common effect is meaningful. Otherwise, it is not.
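For comparison, the CMH statistic and the Mantel-Haenszel common odds ratio for this first data set can be sketched in base R with `mantelhaen.test`. `correct = FALSE` is used so that no continuity correction is applied. Base R has no Breslow-Day test, so that part of the SAS output is not reproduced here. Variable names are illustrative.

```r
# Sketch: Cochran-Mantel-Haenszel analysis of ECG x disease, stratified by gender.
tab <- array(c(11, 10, 4, 8,     # female: column "no" (11, 10), then "yes" (4, 8)
                9,  6, 9, 21),   # male:   column "no" (9, 6),  then "yes" (9, 21)
             dim = c(2, 2, 2),
             dimnames = list(ECG = c("<0.1", ">=0.1"),
                             disease = c("no", "yes"),
                             gender = c("female", "male")))

res <- mantelhaen.test(tab, correct = FALSE)

# Mantel-Haenszel common odds ratio computed by hand
a <- tab[1, 1, ]; b <- tab[1, 2, ]; cc <- tab[2, 1, ]; d <- tab[2, 2, ]
n <- apply(tab, 3, sum)                       # stratum sizes: 33 and 45
or_mh <- sum(a * d / n) / sum(b * cc / n)

print(res)
print(round(or_mh, 4))
```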

data ca;
input gender $ ECG $ disease $ count;
datalines;
female <0.1 yes 8
female <0.1 no 11
female >=0.1 yes 8
female >=0.1 no 20
male <0.1 yes 9
male <0.1 no 9
male >=0.1 yes 30
male >=0.1 no 6
;
options ls=75 nonumber nodate;
proc freq;
weight count ;
tables gender*ECG*disease /nocol nopct cmh measures;
run;

Table 1 of ECG by disease

Controlling for gender=female

ECG disease

Frequency|

Row Pct |no |yes | Total
---------+--------+--------+

<0.1 | 11 | 8 | 19

| 57.89 | 42.11 |

---------+--------+--------+

>=0.1 | 20 | 8 | 28

| 71.43 | 28.57 |

---------+--------+--------+

Total 31 16 47


Type of Study Value 95% Confidence Limits

-----------------------------------------------------------------

Case-Control (Odds Ratio) 0.5500 0.1615 1.8731

Cohort (Col1 Risk) 0.8105 0.5171 1.2703

Cohort (Col2 Risk) 1.4737 0.6701 3.2407

Sample Size = 47

Table 2 of ECG by disease

Controlling for gender=male

ECG disease

Frequency|

Row Pct |no |yes | Total
---------+--------+--------+

<0.1 | 9 | 9 | 18

| 50.00 | 50.00 |

---------+--------+--------+

>=0.1 | 6 | 30 | 36

| 16.67 | 83.33 |

---------+--------+--------+

Total 15 39 54


Type of Study Value 95% Confidence Limits

-----------------------------------------------------------------

Case-Control (Odds Ratio) 5.0000 1.3992 17.8677

Cohort (Col1 Risk) 3.0000 1.2641 7.1198

Cohort (Col2 Risk) 0.6000 0.3696 0.9740

Sample Size = 54

Summary Statistics for ECG by disease

Controlling for gender

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

Statistic Alternative Hypothesis DF Value Prob

---------------------------------------------------------------

1 Nonzero Correlation 1 1.2063 0.2721

2 Row Mean Scores Differ 1 1.2063 0.2721

3 General Association 1 1.2063 0.2721

Estimates of the Common Relative Risk (Row1/Row2)

Type of Study Method Value 95% Confidence Limits

-------------------------------------------------------------------------

Case-Control Mantel-Haenszel 1.5604 0.6767 3.5982

(Odds Ratio) Logit 1.5893 0.6572 3.8433


(Col1 Risk) Logit 1.0708 0.7187 1.5954


(Col2 Risk) Logit 0.7677 0.5081 1.1600

Breslow-Day Test for

Homogeneity of the Odds Ratios

------------------------------

Chi-Square 6.2128

DF 1

Pr > ChiSq 0.0127

* McNemar Test : Matched pairs

data one;
input hus_resp $ wif_resp $ no ;
datalines;
yes yes 20
yes no 5
no yes 10
no no 10
;
run;
proc freq ;
tables hus_resp*wif_resp / agree ;
weight no ;
run;

We do not reject “H0: the approval rates of husband and wife are the same”.

Kappa = 1 >> perfect agreement, Kappa > 0.8 >> excellent agreement, Kappa > 0.4 >> moderate agreement

The CI for kappa does not include 0 -> we reject the null hypothesis K = 0 at the 95% confidence level.
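The McNemar statistic and the kappa point estimate for the husband/wife table can be sketched in R as follows. `correct = FALSE` requests the statistic without continuity correction, (b − c)²/(b + c); kappa is computed by hand from observed and chance agreement. The confidence interval for kappa is not computed here, and the variable names are illustrative.

```r
# Sketch: McNemar test and Cohen's kappa for the matched husband/wife responses.
tab <- matrix(c(20,  5,
                10, 10), nrow = 2, byrow = TRUE,
              dimnames = list(husband = c("yes", "no"), wife = c("yes", "no")))

# McNemar: only the discordant pairs (yes/no and no/yes) matter
b  <- tab[1, 2]
cc <- tab[2, 1]
X2 <- (b - cc)^2 / (b + cc)
res <- mcnemar.test(tab, correct = FALSE)

# Kappa: agreement beyond chance
n  <- sum(tab)
po <- sum(diag(tab)) / n                          # observed agreement
pe <- sum(rowSums(tab) * colSums(tab)) / n^2      # agreement expected by chance
kappa <- (po - pe) / (1 - pe)

print(res)
print(round(c(po = po, pe = pe, kappa = kappa), 4))
```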

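# Chi-square tests in R          filename: chisq.r

# 2x2 and 3x2 tables of counts, filled row by row
data <- matrix(c(25, 5, 15, 15), ncol=2, byrow=TRUE)
data
data2 <- matrix(c(16, 11, 3, 21, 8, 1), ncol=2, byrow=TRUE)
data2

chisq.test(data)     # Pearson chi-square test (Yates-corrected for a 2x2 table)
chisq.test(data2)    # may warn when expected counts are small
fisher.test(data2)   # exact test as an alternative

# McNemar test for matched pairs
data <- matrix(c(6, 2, 8, 4), ncol=2, byrow=TRUE)
data
mcnemar.test(data)

## From Agresti(2007) p.39
M <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(M) <- list(gender=c("M","F"),
                    party=c("Democrat","Independent", "Republican"))
M
colSums(M)
rowSums(M)
cbind(M, rowSums(M))          # append row totals
rbind(M, colSums(M))          # append column totals
prop.table(M, margin=2)*100   # column percentages

(Xsq <- chisq.test(M))  # Prints test summary
Xsq$observed    # observed counts (same as M)
Xsq$expected    # expected counts under the null
Xsq$residuals   # Pearson residuals
sum(Xsq$residuals^2)    # sum of squared residuals = chi-square statistic
# p-value from the upper tail with (r-1)(c-1) degrees of freedom
pchisq(sum(Xsq$residuals^2), (ncol(M)-1)*(nrow(M)-1), lower.tail=FALSE)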
