Multiple Logistic Regression
Assistant Professor Nikom Thanomsieng
Department of Biostatistics and Demography
Faculty of Public Health, Khon Kaen University
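Logistic function
$f(z) = \dfrac{1}{1+e^{-z}}, \quad -\infty < z < +\infty$
$f(-\infty) = \dfrac{1}{1+e^{\infty}} = 0, \quad f(0) = \dfrac{1}{2}, \quad f(+\infty) = \dfrac{1}{1+e^{-\infty}} = 1$
[Figure: S-shaped logistic curve; horizontal axis Z, vertical axis from 0 to 1]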
Fitting Multiple Logistic Regression
Analyzes the relationship between two or more independent variables and a dependent variable.
Dependent variable (dependent, outcome, response) = discrete (two possible values)
Independent variables (independent, predictor, explanatory) = continuous or categorical (categorical --> dummy variables)
Outcome <-- predictor, predictor, predictor ...
Multiple Logistic Regression
Example: analyzing the relationship between age, race, weight gain, smoking, etc. and the occurrence of low birth weight.
LBW: 0 = >=2500 g, 1 = <2500 g
Age (years)
Race: 1 = white, 2 = black, 3 = other
Lwt = mother's weight at last menstrual period
Smk: 1 = yes, 0 = no
...
FTV = number of physician visits during the first trimester
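Logistic regression analysis
The relationship is written in logit form as follows:
$\hat{y}(x) = \ln\left(\dfrac{\hat{p}}{1-\hat{p}}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$
$\hat{p} = \dfrac{e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p}}{1 + e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p}}$
Here $\hat{p}(x_i)$ is the probability that the event occurs, and $x_1, \dots, x_p$ are the independent variables in the model.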
Logit model with a polychotomous variable
Convert the polychotomous variable into dummy variables:
$\hat{y} = \beta_0 + \beta_1 x_1 + \dots + \sum_{l=1}^{k_j-1} \beta_{jl} D_{jl} + \dots + \beta_p x_p$
Example: a variable with k levels yields k-1 dummy variables (k = number of levels or groups).
variable   dummy variables
           D1   D2
code=1     0    0
code=2     1    0
code=3     0    1
Example: the race variable (white, black, other) is converted into dummy variables as follows:
Race     D1   D2
White    0    0
Black    1    0
Other    0    1
$\hat{y} = \beta_0 + \beta_1(age) + \beta_2(lwt) + \beta_3(race_{B}) + \beta_4(race_{others}) + \beta_5(ftv)$
In Stata, specify: xi: logit low age lwt i.race ftv
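The reference-cell coding in the tables above can be sketched in Python. This is only an illustration: the helper name `dummy_code` and its `levels` argument are assumptions made for this sketch, not part of Stata's `xi` prefix.

```python
def dummy_code(level, levels=(1, 2, 3)):
    """Return the k-1 reference-cell dummy variables for one observation.

    The first entry of `levels` is the reference group (all dummies are 0).
    """
    if level not in levels:
        raise ValueError(f"unknown level: {level!r}")
    # One indicator per non-reference level: D1 for levels[1], D2 for levels[2], ...
    return tuple(int(level == other) for other in levels[1:])


# race: 1 = white (reference), 2 = black, 3 = other
for race, label in [(1, "white"), (2, "black"), (3, "other")]:
    print(label, dummy_code(race))
```

With k = 3 levels the function returns k-1 = 2 indicators, and the reference level is the one whose indicators are all zero.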
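Multiple logistic regression of low birth weight on age, lwt, race, and ftv
$\hat{y} = \ln\left(\dfrac{\hat{p}}{1-\hat{p}}\right) = \hat\beta_0 + \hat\beta_1\,\text{age} + \hat\beta_2\,\text{lwt} + \hat\beta_3\,\text{\_Irace\_2} + \hat\beta_4\,\text{\_Irace\_3} + \hat\beta_5\,\text{ftv}$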
. xi: logit low age lwt i.race ftv, nolog
i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)

Logistic regression                               Number of obs   =        189
                                                  LR chi2(5)      =      12.10
                                                  Prob > chi2     =     0.0335
Log likelihood = -111.28645                       Pseudo R2       =     0.0516

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   -.023823   .0337295    -0.71   0.480    -.0899317    .0422857
         lwt |  -.0142446   .0065407    -2.18   0.029    -.0270641   -.0014251
    _Irace_2 |   1.003898   .4978579     2.02   0.044     .0281143    1.979681
    _Irace_3 |   .4331084   .3622397     1.20   0.232    -.2768684    1.143085
         ftv |  -.0493083   .1672386    -0.29   0.768    -.3770899    .2784733
       _cons |   1.295366   1.071439     1.21   0.227    -.8046157    3.395347
------------------------------------------------------------------------------
. list id low age lwt _Irace_2 _Irace_3 ftv phat

     +--------------------------------------------------------------+
     |  id   low   age   lwt   _Irace_2   _Irace_3   ftv       phat |
     |--------------------------------------------------------------|
  1. |   4     1    28   120          0          1     0   .3434579 |
  2. |  10     1    29   130          0          0     2   .2065388 |
  3. |  11     1    34   187          1          0     0   .2360498 |
  4. |  13     1    25   105          0          1     0   .4102857 |
  5. |  15     1    25    85          0          1     0   .4805368 |
     |--------------------------------------------------------------|
 ...
186. | 223     0    35   170          0          0     1   .1182268 |
187. | 224     0    19   120          0          0     0   .2959572 |
188. | 225     0    24   116          0          0     1   .2732751 |
189. | 226     0    45   123          0          0     1   .1710699 |
     +--------------------------------------------------------------+
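$\hat{p} = \dfrac{e^{\hat\beta_0 + \hat\beta_1\,\text{age} + \hat\beta_2\,\text{lwt} + \hat\beta_3\,\text{\_Irace\_2} + \hat\beta_4\,\text{\_Irace\_3} + \hat\beta_5\,\text{ftv}}}{1 + e^{\hat\beta_0 + \hat\beta_1\,\text{age} + \hat\beta_2\,\text{lwt} + \hat\beta_3\,\text{\_Irace\_2} + \hat\beta_4\,\text{\_Irace\_3} + \hat\beta_5\,\text{ftv}}}$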
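As a check on the formula for the predicted probability, the sketch below recomputes `phat` for two of the listed observations from the printed coefficients. The function name `predicted_probability` is an assumption of this sketch; because the coefficients are rounded as printed, the results agree with the Stata listing only to about four decimals.

```python
import math

# Coefficients copied from the Stata output of
# xi: logit low age lwt i.race ftv
COEF = {
    "_cons": 1.295366,
    "age": -0.023823,
    "lwt": -0.0142446,
    "_Irace_2": 1.003898,
    "_Irace_3": 0.4331084,
    "ftv": -0.0493083,
}


def predicted_probability(age, lwt, race_2, race_3, ftv):
    """Return p-hat = e^y / (1 + e^y), where y is the fitted logit."""
    logit = (COEF["_cons"] + COEF["age"] * age + COEF["lwt"] * lwt
             + COEF["_Irace_2"] * race_2 + COEF["_Irace_3"] * race_3
             + COEF["ftv"] * ftv)
    return math.exp(logit) / (1.0 + math.exp(logit))


# id 4: age 28, lwt 120, race = other, ftv 0 (listed phat .3434579)
p_id4 = predicted_probability(age=28, lwt=120, race_2=0, race_3=1, ftv=0)
# id 10: age 29, lwt 130, race = white, ftv 2 (listed phat .2065388)
p_id10 = predicted_probability(age=29, lwt=130, race_2=0, race_3=0, ftv=2)
print(round(p_id4, 4), round(p_id10, 4))
```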
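Fitting the model in logistic regression analysis
- Coefficients are estimated by maximum likelihood
For further study:
Generalized Linear Model:
- Random component (family): binomial
- Link function: logit
- Systematic component: $x_1, x_2, \dots, x_p$
The linear model is therefore written as
$g(\mu) = \ln\left(\dfrac{\mu}{1-\mu}\right) = \ln\left(\dfrac{p}{1-p}\right)$
$\hat{y}(x) = \ln\left(\dfrac{\hat{p}}{1-\hat{p}}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$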
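Testing the statistical significance of the model
- Use the likelihood ratio test (G) to compare the constant-only model with the fitted model
- Test the significance of each variable with the Wald test
$G = -2\ln\left[\dfrac{\text{likelihood without the variable}}{\text{likelihood with the variable}}\right]$
$Z_j = \dfrac{\hat\beta_j}{\widehat{se}(\hat\beta_j)}$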
Testing the statistical significance of the model
- Use the likelihood ratio test (G) to compare the constant-only model with the fitted model, as follows:
. xi: logit low age lwt i.race ftv
i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)

Iteration 0:   log likelihood =   -117.336
Iteration 1:   log likelihood = -111.41656
Iteration 2:   log likelihood = -111.28677
Iteration 3:   log likelihood = -111.28645

Logit estimates                                   Number of obs   =        189
                                                  LR chi2(5)      =      12.10
                                                  Prob > chi2     =     0.0335
Log likelihood = -111.28645                       Pseudo R2       =     0.0516

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   -.023823   .0337295    -0.71   0.480    -.0899317    .0422857
         lwt |  -.0142446   .0065407    -2.18   0.029    -.0270641   -.0014251
    _Irace_2 |   1.003898   .4978579     2.02   0.044     .0281143    1.979681
    _Irace_3 |   .4331084   .3622397     1.20   0.232    -.2768684    1.143085
         ftv |  -.0493083   .1672386    -0.29   0.768    -.3770899    .2784733
       _cons |   1.295366   1.071439     1.21   0.227    -.8046157    3.395347
------------------------------------------------------------------------------
G = -2[(-117.336) - (-111.28645)] = 12.099
Here -117.336 is the log likelihood at Iteration 0 (constant-only model) and -111.28645 is the log likelihood of the fitted model; G matches the reported LR chi2(5) = 12.10 with Prob > chi2 = 0.0335.
This shows that at least one variable has a coefficient different from 0.
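The G statistic and its p-value can be reproduced without Stata. The sketch below assumes integer degrees of freedom and uses the closed-form finite sums for the chi-square upper tail; the helper name `chi2_sf` is an assumption of this sketch, not a Stata or library function.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2))
    #         + sqrt(2x/pi) * exp(-x/2) * sum_{k>=1} x^(k-1) / (1*3*...*(2k-1))
    term, total = 1.0, 0.0
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= x / (2 * k - 1)
        total += term
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total)


ll_constant_only = -117.336   # log likelihood at Iteration 0
ll_fitted = -111.28645        # log likelihood of the fitted model
G = -2.0 * (ll_constant_only - ll_fitted)
p_value = chi2_sf(G, df=5)    # 5 coefficients are tested jointly
print(round(G, 3), round(p_value, 4))
```

The same function with `df=1` gives the p-value for a test of a single added variable.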
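Testing the statistical significance of the model
- Use the likelihood ratio test (G) to compare nested models, for example Model 1 and Model 2
Model 1:
$\ln\left(\dfrac{p}{1-p}\right) = \beta_0 + \beta_1(age) + \beta_2(lwt) + \beta_3(race_{B}) + \beta_4(race_{others})$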
. use "H:\Hosmer_logistic\alr_data_Hosmer\logistic\lwt_2556.dta", clear

. xi: logit low age lwt i.race
i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)

Iteration 0:   log likelihood =   -117.336
…
Iteration 3:   log likelihood = -111.33032

Logistic regression                               Number of obs   =        189
                                                  LR chi2(4)      =      12.01
                                                  Prob > chi2     =     0.0173
Log likelihood = -111.33032                       Pseudo R2       =     0.0512

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0255238    .033252    -0.77   0.443    -.0906966     .039649
         lwt |  -.0143532   .0065228    -2.20   0.028    -.0271377   -.0015688
    _Irace_2 |   1.003822   .4980135     2.02   0.044     .0277335     1.97991
    _Irace_3 |   .4434608   .3602569     1.23   0.218    -.2626298    1.149551
       _cons |   1.306741   1.069782     1.22   0.222    -.7899926    3.403475
------------------------------------------------------------------------------

. est store m1
Model 2:
$\ln\left(\dfrac{p}{1-p}\right) = \beta_0 + \beta_1(age) + \beta_2(lwt) + \beta_3(race_{B}) + \beta_4(race_{others}) + \beta_5(ftv)$
. xi: logit low age lwt i.race ftv
i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)

Iteration 0:   log likelihood =   -117.336
Iteration 1:   log likelihood = -111.41656
Iteration 2:   log likelihood = -111.28677
Iteration 3:   log likelihood = -111.28645

Logistic regression                               Number of obs   =        189
                                                  LR chi2(5)      =      12.10
                                                  Prob > chi2     =     0.0335
Log likelihood = -111.28645                       Pseudo R2       =     0.0516

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   -.023823   .0337295    -0.71   0.480    -.0899317    .0422857
         lwt |  -.0142446   .0065407    -2.18   0.029    -.0270641   -.0014251
    _Irace_2 |   1.003898   .4978579     2.02   0.044     .0281143    1.979681
    _Irace_3 |   .4331084   .3622397     1.20   0.232    -.2768684    1.143085
         ftv |  -.0493083   .1672386    -0.29   0.768    -.3770899    .2784733
       _cons |   1.295366   1.071439     1.21   0.227    -.8046157    3.395347
------------------------------------------------------------------------------

. est store m2

. lrtest m1 m2

Likelihood-ratio test                                 LR chi2(1)  =      0.09
(Assumption: m1 nested in m2)                         Prob > chi2 =    0.7671
. di -2*((-111.33032)-(-111.28645))
.08774
. di chiprob(1,.08774)
.76707018
G = -2 ln(likelihood without the variable / likelihood with the variable) = -2[(log likelihood without the variable) - (log likelihood with the variable)]
Significance of each individual variable by the Wald test
$Z_j = \dfrac{\hat\beta_j}{\widehat{se}(\hat\beta_j)}$
------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   -.023823   .0337295    -0.71   0.480    -.0899317    .0422857
         lwt |  -.0142446   .0065407    -2.18   0.029    -.0270641   -.0014251
    _Irace_2 |   1.003898   .4978579     2.02   0.044     .0281143    1.979681
    _Irace_3 |   .4331084   .3622397     1.20   0.232    -.2768684    1.143085
         ftv |  -.0493083   .1672386    -0.29   0.768    -.3770899    .2784733
       _cons |   1.295366   1.071439     1.21   0.227    -.8046157    3.395347
------------------------------------------------------------------------------
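The z and P>|z| columns of the table follow directly from the Wald formula. A minimal sketch, assuming a two-sided test against the standard normal distribution (the function name `wald_test` is an assumption of this sketch):

```python
import math


def wald_test(coef, se):
    """Return (z, two-sided p-value) for H0: coefficient = 0."""
    z = coef / se
    # 2 * P(Z > |z|) for a standard normal Z, written with erfc
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# (coefficient, standard error) pairs copied from the table above
rows = {
    "age": (-0.023823, 0.0337295),
    "lwt": (-0.0142446, 0.0065407),
    "_Irace_2": (1.003898, 0.4978579),
    "_Irace_3": (0.4331084, 0.3622397),
    "ftv": (-0.0493083, 0.1672386),
}
for name, (b, se) in rows.items():
    z, p = wald_test(b, se)
    print(f"{name:9s} z = {z:6.2f}   p = {p:.3f}")
```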
Confidence Interval Estimation
- Estimate the confidence interval of a coefficient
$100(1-\alpha)\%\ \text{CI of } \beta_i = \hat\beta_i \pm Z_{\alpha/2}\,\widehat{se}(\hat\beta_i)$
. xi: logit low lwt i.race
i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)
…
Logistic regression                               Number of obs   =        189
                                                  LR chi2(3)      =      11.41
                                                  Prob > chi2     =     0.0097
Log likelihood = -111.62955                       Pseudo R2       =     0.0486

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lwt |  -.0152231   .0064393    -2.36   0.018    -.0278439   -.0026023
    _Irace_2 |   1.081066   .4880512     2.22   0.027     .1245034    2.037629
    _Irace_3 |   .4806033   .3566733     1.35   0.178    -.2184636     1.17967
       _cons |   .8057535   .8451625     0.95   0.340    -.8507345    2.462241
------------------------------------------------------------------------------
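The interval for a coefficient can be checked by hand from the Coef. and Std. Err. columns. The sketch below assumes the normal-based interval stated in the formula; `coef_ci` is a name chosen for this illustration.

```python
from statistics import NormalDist


def coef_ci(coef, se, level=0.95):
    """Normal-based confidence interval: coef +/- z_(alpha/2) * se."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return coef - z * se, coef + z * se


# lwt in the model `xi: logit low lwt i.race`
lwt_low, lwt_high = coef_ci(-0.0152231, 0.0064393)
print(round(lwt_low, 7), round(lwt_high, 7))
```

The result should agree with the [95% Conf. Interval] columns for lwt up to rounding of the printed inputs.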
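Estimating individual predicted probabilities and their confidence intervals
(Individual predicted probability & confidence interval estimation)
- Estimate the variance of the logit
$\widehat{Var}[\hat{y}(x)] = \sum_{i=0}^{p} x_i^2\,\widehat{Var}(\hat\beta_i) + \sum_{i=0}^{p}\sum_{j=i+1}^{p} 2\,x_i x_j\,\widehat{Cov}(\hat\beta_i, \hat\beta_j), \quad x_0 = 1$
$100(1-\alpha)\%\ \text{CI of } \hat{y}(x) = \hat{y}(x) \pm Z_{\alpha/2}\,\widehat{se}[\hat{y}(x)]$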
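Example: computing the variance of the logit when lwt = 150 and race = white
$\widehat{Var}[\hat{y}(lwt=150, race=white)] = \widehat{Var}(\hat\beta_0) + (lwt)^2\,\widehat{Var}(\hat\beta_1) + (race_{black})^2\,\widehat{Var}(\hat\beta_2) + (race_{other})^2\,\widehat{Var}(\hat\beta_3)$
$\quad + 2(lwt)\,\widehat{Cov}(\hat\beta_0,\hat\beta_1) + 2(race_{black})\,\widehat{Cov}(\hat\beta_0,\hat\beta_2) + 2(race_{other})\,\widehat{Cov}(\hat\beta_0,\hat\beta_3)$
$\quad + 2(lwt)(race_{black})\,\widehat{Cov}(\hat\beta_1,\hat\beta_2) + 2(lwt)(race_{other})\,\widehat{Cov}(\hat\beta_1,\hat\beta_3) + 2(race_{black})(race_{other})\,\widehat{Cov}(\hat\beta_2,\hat\beta_3)$
Here $\hat\beta_0$ is the constant, $\hat\beta_1$ the coefficient of lwt, $\hat\beta_2$ of _Irace_2 (black), and $\hat\beta_3$ of _Irace_3 (other); for a white mother both race dummies are 0.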
. di .71429959 + (150^2)*(.00004146) + (0^2)*(.23819397) + (0^2)*(.12721584) + 2*150*(-.00521365) + 2*0*(.02260223) + 2*0*( -.1034968) + 2*0*(-.00064703) + 2*0*(.00035585) + 2*0*0*(.05320001)
.08305459
. vce

Covariance matrix of coefficients of logit model

        e(V) |        lwt    _Irace_2    _Irace_3       _cons
-------------+------------------------------------------------
         lwt |  .00004146
    _Irace_2 | -.00064703   .23819397
    _Irace_3 |  .00035585   .05320001   .12721584
       _cons | -.00521365   .02260223   -.1034968   .71429959
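The variance of the logit is the quadratic form x'Vx, which expands to exactly the sum of variance and covariance terms used in the hand calculation. A minimal sketch with the covariance matrix above, reordered so that the constant comes first (the function name `logit_variance` is an assumption of this sketch):

```python
# Covariance matrix e(V), symmetric, in the order: _cons, lwt, _Irace_2, _Irace_3
V = [
    [0.71429959, -0.00521365, 0.02260223, -0.1034968],
    [-0.00521365, 0.00004146, -0.00064703, 0.00035585],
    [0.02260223, -0.00064703, 0.23819397, 0.05320001],
    [-0.1034968, 0.00035585, 0.05320001, 0.12721584],
]

# Covariate pattern: constant = 1, lwt = 150, race = white (both dummies 0)
x = [1.0, 150.0, 0.0, 0.0]


def logit_variance(x, V):
    """Quadratic form x' V x = sum_i sum_j x_i * x_j * V[i][j]."""
    n = len(x)
    return sum(x[i] * x[j] * V[i][j] for i in range(n) for j in range(n))


var_logit = logit_variance(x, V)
print(round(var_logit, 8))
```

Each off-diagonal entry is visited twice in the double sum, which is where the factor 2 in front of the covariance terms comes from.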
. di (-.0152231*150)+(1.081066*0)+(.4806033*0) + .8057535
-1.4777115
. di exp(-1.4777115)/(1+exp(-1.4777115))
.18577333
. prvalue, x(lwt=150 _Irace_2=0 _Irace_3=0)

logit: Predictions for low

Confidence intervals by delta method

                                95% Conf. Interval
  Pr(y=1|x):          0.1858   [ 0.1003,    0.2713]
  Pr(y=0|x):          0.8142   [ 0.7287,    0.8997]

         lwt  _Irace_2  _Irace_3
x=       150         0         0
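$\hat{p} = \dfrac{e^{\alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p}}{1 + e^{\alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p}}$
$\hat{p} = \dfrac{e^{\alpha + \beta_1\,\text{age} + \beta_2\,\text{lwt} + \beta_3\,\text{\_Irace\_2} + \beta_4\,\text{\_Irace\_3} + \beta_5\,\text{ftv}}}{1 + e^{\alpha + \beta_1\,\text{age} + \beta_2\,\text{lwt} + \beta_3\,\text{\_Irace\_2} + \beta_4\,\text{\_Irace\_3} + \beta_5\,\text{ftv}}}$
Probability of a low-birth-weight infant when lwt = 150 and race = white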
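Confidence Interval Estimation
- Estimate the confidence interval of p
$100(1-\alpha)\%\ \text{CI of the true logit} = \hat{y} \pm Z_{\alpha/2}\,\widehat{se}(\hat{y}), \quad \hat{y} = \ln\left(\dfrac{\hat{p}}{1-\hat{p}}\right)$
$100(1-\alpha)\%\ \text{CI of } p = \dfrac{e^{\hat{y} \pm Z_{\alpha/2}\,\widehat{se}(\hat{y})}}{1 + e^{\hat{y} \pm Z_{\alpha/2}\,\widehat{se}(\hat{y})}}$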
. do "I:\cat2011\95ci_p_logit.do"
. di (exp(-1.4777115-((abs(invnormal(0.025)))*sqrt(.08305459))))/(1+(exp(-1.4777115-((abs(invnormal(0.025)))*sqrt(.08305459)))))
.11480659
. di (exp(-1.4777115+((abs(invnormal(0.025)))*sqrt(.08305459))))/(1+(exp(-1.4777115+((abs(invnormal(0.025)))*sqrt(.08305459)))))
.28641379
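The two `di` commands above compute the interval on the logit scale and then transform its end points back to probabilities. The same steps can be sketched in Python; `inv_logit` is a helper name assumed for this sketch.

```python
import math
from statistics import NormalDist


def inv_logit(y):
    """Map a logit back to a probability: e^y / (1 + e^y)."""
    return 1.0 / (1.0 + math.exp(-y))


logit_hat = -1.4777115            # fitted logit at lwt = 150, race = white
se_logit = math.sqrt(0.08305459)  # square root of the variance of the logit
z = NormalDist().inv_cdf(0.975)   # same value as abs(invnormal(0.025)) in Stata

p_hat = inv_logit(logit_hat)
ci_low = inv_logit(logit_hat - z * se_logit)
ci_high = inv_logit(logit_hat + z * se_logit)
print(round(p_hat, 4), round(ci_low, 4), round(ci_high, 4))
```

This interval is not symmetric around the point estimate and always stays inside (0, 1), unlike the delta-method interval reported by `prvalue`.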
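Interpretation of the fitted model: odds ratio
- Dichotomous variable: 2 levels or 2 groups
Two independent variables: x1 coded 0, 1, with x2 held at a fixed value (that is, adjusted for x2)
$a = P[y=1 \mid x_1=1, x_2] = \dfrac{e^{\alpha+\beta_1(1)+\beta_2 x_2}}{1+e^{\alpha+\beta_1(1)+\beta_2 x_2}} = \dfrac{e^{\alpha+\beta_1+\beta_2 x_2}}{1+e^{\alpha+\beta_1+\beta_2 x_2}}$
$b = P[y=0 \mid x_1=1, x_2] = 1 - \Pr[y=1 \mid x_1=1, x_2] = \dfrac{1}{1+e^{\alpha+\beta_1+\beta_2 x_2}}$
$c = P[y=1 \mid x_1=0, x_2] = \dfrac{e^{\alpha+\beta_1(0)+\beta_2 x_2}}{1+e^{\alpha+\beta_1(0)+\beta_2 x_2}} = \dfrac{e^{\alpha+\beta_2 x_2}}{1+e^{\alpha+\beta_2 x_2}}$
$d = P[y=0 \mid x_1=0, x_2] = 1 - \Pr[y=1 \mid x_1=0, x_2] = \dfrac{1}{1+e^{\alpha+\beta_2 x_2}}$
Arranged as a 2 x 2 table:
            y = 1   y = 0
x1 = 1      a       b
x1 = 0      c       d
$OR = \dfrac{ad}{bc} = \dfrac{e^{\alpha+\beta_1+\beta_2 x_2}}{e^{\alpha+\beta_2 x_2}} = e^{(\alpha+\beta_1+\beta_2 x_2)-(\alpha+\beta_2 x_2)} = e^{\beta_1}$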
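Example: in a multiple logistic regression with smoke and age, we want to interpret the odds ratio for smoke adjusted for age.
$a = P[low=1 \mid smoke=1, age] = \dfrac{e^{\alpha+\beta_1(1)+\beta_2\,age}}{1+e^{\alpha+\beta_1(1)+\beta_2\,age}} = \dfrac{e^{\alpha+\beta_1+\beta_2\,age}}{1+e^{\alpha+\beta_1+\beta_2\,age}}$
$b = P[low=0 \mid smoke=1, age] = 1 - \Pr[low=1 \mid smoke=1, age] = \dfrac{1}{1+e^{\alpha+\beta_1+\beta_2\,age}}$
$c = P[low=1 \mid smoke=0, age] = \dfrac{e^{\alpha+\beta_1(0)+\beta_2\,age}}{1+e^{\alpha+\beta_1(0)+\beta_2\,age}} = \dfrac{e^{\alpha+\beta_2\,age}}{1+e^{\alpha+\beta_2\,age}}$
$d = P[low=0 \mid smoke=0, age] = 1 - \Pr[low=1 \mid smoke=0, age] = \dfrac{1}{1+e^{\alpha+\beta_2\,age}}$
Arranged as a 2 x 2 table:
              low = 1   low = 0
smoke = 1     a         b
smoke = 0     c         d
$OR = \dfrac{ad}{bc} = \dfrac{e^{\alpha+\beta_1+\beta_2\,age}}{e^{\alpha+\beta_2\,age}} = e^{(\alpha+\beta_1+\beta_2\,age)-(\alpha+\beta_2\,age)} = e^{\beta_1}$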
The odds ratio computed from the logistic regression equation is therefore called the adjusted odds ratio.
Example: with smoke = 1 as the variable of interest
- age is the control variable
- age takes the same value in each group being compared
$OR_i = e^{\beta_i}$ (adjusted OR)
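Computing the odds ratio from the logistic regression equation
- It measures the strength of association
- The value obtained is controlled for the effects of all other variables in the model and is called the adjusted odds ratio
Dependent variable D <-- Exposure (E), Control (C), Control (C), Control ...
Meaning of the odds ratio from the logistic regression equation
- After controlling for the factors Ci, those exposed to factor E have odds of event D that are OR times the odds of those not exposed to E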
. logit low smoke age, or

Iteration 0:   log likelihood =   -117.336
Iteration 1:   log likelihood = -113.66733
Iteration 2:   log likelihood = -113.63815
Iteration 3:   log likelihood = -113.63815

Logistic regression                               Number of obs   =        189
                                                  LR chi2(2)      =       7.40
                                                  Prob > chi2     =     0.0248
Log likelihood = -113.63815                       Pseudo R2       =     0.0315

------------------------------------------------------------------------------
         low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       smoke |   1.997405    .642777     2.15   0.032     1.063027    3.753081
         age |   .9514394   .0304194    -1.56   0.119     .8936481    1.012968
------------------------------------------------------------------------------
Meaning of the odds ratio from the logistic regression equation
- After controlling for age, mothers who smoke have 1.997 times the odds of a low-birth-weight infant compared with mothers who do not smoke
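The result OR = ad/bc = e^β1 means the adjusted odds ratio for smoke is the same at every age. A numerical check, assuming slopes recovered from the reported odds ratios and a purely hypothetical intercept (the output with the `or` option does not show the constant):

```python
import math

# Slopes recovered from the odds ratios reported by `logit low smoke age, or`
B_SMOKE = math.log(1.997405)
B_AGE = math.log(0.9514394)
ALPHA = 0.06  # hypothetical intercept, chosen only for illustration


def prob_low(smoke, age):
    """P[low = 1 | smoke, age] under the logistic model."""
    y = ALPHA + B_SMOKE * smoke + B_AGE * age
    return math.exp(y) / (1.0 + math.exp(y))


def odds_ratio_smoke(age):
    """ad / bc from the 2 x 2 table of probabilities at a fixed age."""
    a = prob_low(1, age)
    b = 1.0 - a
    c = prob_low(0, age)
    d = 1.0 - c
    return (a * d) / (b * c)


for age in (20, 30, 40):
    print(age, round(odds_ratio_smoke(age), 6))
```

Changing `ALPHA` or the age alters the four probabilities but leaves the ratio ad/bc at e^β1, which is the cancellation shown in the derivation.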
Polychotomous variables
- Independent variables with more than 2 levels or groups
- Create k-1 dummy variables
Example: a variable with k levels yields k-1 dummy variables (k = number of levels or groups)
level/group code   dummy variables
                   D1   D2
code=1             0    0     (reference cell)
code=2             1    0
code=3             0    1
Comparisons: code=2 vs code=1, and code=3 vs code=1
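Three independent variables: x1 coded 0, 1, with x2 and x3 held at fixed values (that is, adjusted for x2 and x3)
$a = P[y=1 \mid x_1=1, x_2, x_3] = \dfrac{e^{\alpha+\beta_1(1)+\beta_2 x_2+\beta_3 x_3}}{1+e^{\alpha+\beta_1(1)+\beta_2 x_2+\beta_3 x_3}} = \dfrac{e^{\alpha+\beta_1+\beta_2 x_2+\beta_3 x_3}}{1+e^{\alpha+\beta_1+\beta_2 x_2+\beta_3 x_3}}$
$b = P[y=0 \mid x_1=1, x_2, x_3] = 1 - P[y=1 \mid x_1=1, x_2, x_3] = \dfrac{1}{1+e^{\alpha+\beta_1+\beta_2 x_2+\beta_3 x_3}}$
$c = P[y=1 \mid x_1=0, x_2, x_3] = \dfrac{e^{\alpha+\beta_1(0)+\beta_2 x_2+\beta_3 x_3}}{1+e^{\alpha+\beta_1(0)+\beta_2 x_2+\beta_3 x_3}} = \dfrac{e^{\alpha+\beta_2 x_2+\beta_3 x_3}}{1+e^{\alpha+\beta_2 x_2+\beta_3 x_3}}$
$d = P[y=0 \mid x_1=0, x_2, x_3] = 1 - P[y=1 \mid x_1=0, x_2, x_3] = \dfrac{1}{1+e^{\alpha+\beta_2 x_2+\beta_3 x_3}}$
Arranged as a 2 x 2 table:
            y = 1   y = 0
x1 = 1      a       b
x1 = 0      c       d
$OR = \dfrac{ad}{bc} = \dfrac{e^{\alpha+\beta_1+\beta_2 x_2+\beta_3 x_3}}{e^{\alpha+\beta_2 x_2+\beta_3 x_3}} = e^{(\alpha+\beta_1+\beta_2 x_2+\beta_3 x_3)-(\alpha+\beta_2 x_2+\beta_3 x_3)} = e^{\beta_1}$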
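Example: in a multiple logistic regression with age, lwt, i.race, and ftv, we want to interpret the odds ratio for _Irace_2, that is, adjusted for age, lwt, _Irace_3, and ftv. Here $\beta_1$ denotes the coefficient of _Irace_2.
$a = P[y=1 \mid \text{\_Irace\_2}=1, age, lwt, \text{\_Irace\_3}, ftv] = \dfrac{e^{\alpha+\beta_1(1)+\beta_2\,age+\beta_3\,lwt+\beta_4\,\text{\_Irace\_3}+\beta_5\,ftv}}{1+e^{\alpha+\beta_1(1)+\beta_2\,age+\beta_3\,lwt+\beta_4\,\text{\_Irace\_3}+\beta_5\,ftv}}$
$b = P[y=0 \mid \text{\_Irace\_2}=1, age, lwt, \text{\_Irace\_3}, ftv] = 1 - P[y=1 \mid \text{\_Irace\_2}=1, age, lwt, \text{\_Irace\_3}, ftv] = \dfrac{1}{1+e^{\alpha+\beta_1+\beta_2\,age+\beta_3\,lwt+\beta_4\,\text{\_Irace\_3}+\beta_5\,ftv}}$
$c = P[y=1 \mid \text{\_Irace\_2}=0, age, lwt, \text{\_Irace\_3}, ftv] = \dfrac{e^{\alpha+\beta_1(0)+\beta_2\,age+\beta_3\,lwt+\beta_4\,\text{\_Irace\_3}+\beta_5\,ftv}}{1+e^{\alpha+\beta_1(0)+\beta_2\,age+\beta_3\,lwt+\beta_4\,\text{\_Irace\_3}+\beta_5\,ftv}} = \dfrac{e^{\alpha+\beta_2\,age+\beta_3\,lwt+\beta_4\,\text{\_Irace\_3}+\beta_5\,ftv}}{1+e^{\alpha+\beta_2\,age+\beta_3\,lwt+\beta_4\,\text{\_Irace\_3}+\beta_5\,ftv}}$
$d = P[y=0 \mid \text{\_Irace\_2}=0, age, lwt, \text{\_Irace\_3}, ftv] = 1 - P[y=1 \mid \text{\_Irace\_2}=0, age, lwt, \text{\_Irace\_3}, ftv] = \dfrac{1}{1+e^{\alpha+\beta_2\,age+\beta_3\,lwt+\beta_4\,\text{\_Irace\_3}+\beta_5\,ftv}}$
Arranged as a 2 x 2 table:
                y = 1   y = 0
_Irace_2 = 1    a       b
_Irace_2 = 0    c       d
$OR = \dfrac{ad}{bc} = \dfrac{e^{\alpha+\beta_1+\beta_2\,age+\beta_3\,lwt+\beta_4\,\text{\_Irace\_3}+\beta_5\,ftv}}{e^{\alpha+\beta_2\,age+\beta_3\,lwt+\beta_4\,\text{\_Irace\_3}+\beta_5\,ftv}} = e^{\beta_1}$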
The odds ratio computed from the logistic regression equation is therefore called the adjusted odds ratio.
Example: with _Irace_2 (black) as the variable of interest
- age, lwt, _Irace_3 (other), and ftv are the control variables
- age, lwt, _Irace_3 (other), and ftv take the same values in each group being compared
$OR_i = e^{\beta_i}$ (adjusted OR)
. xi: logit low age lwt i.race ftv, or
i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)

Iteration 0:   log likelihood =   -117.336
Iteration 1:   log likelihood = -111.41656
Iteration 2:   log likelihood = -111.28677
Iteration 3:   log likelihood = -111.28645

Logit estimates                                   Number of obs   =        189
                                                  LR chi2(5)      =      12.10
                                                  Prob > chi2     =     0.0335
Log likelihood = -111.28645                       Pseudo R2       =     0.0516

------------------------------------------------------------------------------
         low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .9764586   .0329355    -0.71   0.480     .9139936    1.043193
         lwt |   .9858564   .0064482    -2.18   0.029     .9732989    .9985759
    _Irace_2 |   2.728898   1.358603     2.02   0.044     1.028513    7.240436
    _Irace_3 |   1.542043   .5585894     1.20   0.232     .7581543     3.13643
         ftv |   .9518876   .1591923    -0.29   0.768     .6858544    1.321111
------------------------------------------------------------------------------
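The odds ratio table is the coefficient table after exponentiation: each odds ratio is e^β, and each confidence limit is the exponential of the corresponding limit on the coefficient scale. A short sketch using the coefficient-scale values printed earlier for the same model (the helper name `to_odds_ratio` is an assumption of this sketch):

```python
import math

# (coefficient, lower 95% limit, upper 95% limit) on the log-odds scale
coef_rows = {
    "age": (-0.023823, -0.0899317, 0.0422857),
    "lwt": (-0.0142446, -0.0270641, -0.0014251),
    "_Irace_2": (1.003898, 0.0281143, 1.979681),
    "_Irace_3": (0.4331084, -0.2768684, 1.143085),
    "ftv": (-0.0493083, -0.3770899, 0.2784733),
}


def to_odds_ratio(coef, low, high):
    """Exponentiate a coefficient and its confidence limits."""
    return math.exp(coef), math.exp(low), math.exp(high)


for name, row in coef_rows.items():
    odds_ratio, or_low, or_high = to_odds_ratio(*row)
    print(f"{name:9s} OR = {odds_ratio:.4f}   95% CI = ({or_low:.4f}, {or_high:.4f})")
```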
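Interpreting the odds ratio: continuous variables
- A change of 1 unit is often not clinically interesting, for example age increasing by 1 year or blood pressure increasing by 1 mmHg
- The change considered should be 5, 10, ... units
- Conversely, when x ranges over 0-1 units, a change of 1 unit is too large; an increase of 0.01 may be more appropriate
- The odds ratio for a change of c units in a continuous variable is computed as follows:
$OR(c) = e^{c\hat\beta_1}$
$95\%\ \text{CI of } OR(c) = e^{c\hat\beta_1 \pm Z_{\alpha/2}\,c\,\widehat{se}(\hat\beta_1)}$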
Interpreting the odds ratio: change in odds or percent change

. listcoef

logit (N=189): Factor Change in Odds

  Odds of: 1 vs 0

----------------------------------------------------------------------
         low |        b         z     P>|z|      e^b   e^bStdX    SDofX
-------------+--------------------------------------------------------
         age | -0.02382    -0.706     0.480   0.9765    0.8814   5.2987
         lwt | -0.01424    -2.178     0.029   0.9859    0.6469  30.5794
    _Irace_2 |  1.00390     2.016     0.044   2.7289    1.4144   0.3454
    _Irace_3 |  0.43311     1.196     0.232   1.5420    1.2309   0.4796
         ftv | -0.04931    -0.295     0.768   0.9519    0.9491   1.0593
----------------------------------------------------------------------

. listcoef, percent

logit (N=189): Percentage Change in Odds

  Odds of: 1 vs 0

----------------------------------------------------------------------
         low |        b         z     P>|z|        %     %StdX    SDofX
-------------+--------------------------------------------------------
         age | -0.02382    -0.706     0.480     -2.4     -11.9   5.2987
         lwt | -0.01424    -2.178     0.029     -1.4     -35.3  30.5794
    _Irace_2 |  1.00390     2.016     0.044    172.9      41.4   0.3454
    _Irace_3 |  0.43311     1.196     0.232     54.2      23.1   0.4796
         ftv | -0.04931    -0.295     0.768     -4.8      -5.1   1.0593
----------------------------------------------------------------------
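The `listcoef` columns can be recomputed from b and SDofX: e^b is the factor change in odds per unit, e^bStdX is the factor change per standard deviation, and the percent columns are 100(e^b - 1). A minimal sketch, with function names chosen for this illustration and coefficients taken at the precision of the earlier coefficient table:

```python
import math


def factor_change(b, delta=1.0):
    """Factor change in the odds for an increase of `delta` units in x."""
    return math.exp(b * delta)


def percent_change(b, delta=1.0):
    """Percentage change in the odds for an increase of `delta` units in x."""
    return 100.0 * (factor_change(b, delta) - 1.0)


# (coefficient b, standard deviation of x) for each variable
rows = {
    "age": (-0.023823, 5.2987),
    "lwt": (-0.0142446, 30.5794),
    "_Irace_2": (1.003898, 0.3454),
    "_Irace_3": (0.4331084, 0.4796),
    "ftv": (-0.0493083, 1.0593),
}
for name, (b, sd) in rows.items():
    print(f"{name:9s} %: {percent_change(b):7.1f}   %StdX: {percent_change(b, sd):7.1f}")
```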
Modeling Strategy: Two goals of mathematical modeling
(1) To obtain a valid estimate of the relationship between an explanatory variable and the response variable
(2) To obtain a good predictive model
Different strategies for different goals
- Prediction goal -> use computer algorithms: forward selection, backward elimination, stepwise, all possible subsets
- Validity goal -> for etiologic research, standard computer algorithms are not appropriate, because the roles that variables play, such as confounders and effect modifiers, need special attention
Model-Building Strategy Guidelines
Variable selection: the "most parsimonious model"
- Minimizes the number of variables in the model
- The model is more likely to be numerically stable
- The model is more easily generalized
Steps of purposeful selection for a logistic regression model
* Called "purposeful selection" (Hosmer & Lemeshow, 2000, 2013)
1. A careful univariable analysis of each independent variable.
2. Fit the multivariable model containing all covariates identified for inclusion in step 1.
3. Fit the smaller (reduced) model and compare the values of the estimated coefficients in the smaller model to their respective values from the larger model. Be concerned about any variable whose coefficient has changed markedly in magnitude (> 20%).
- Any variable whose coefficient has changed markedly in magnitude should be added back into the model, as it is important in the sense of providing a needed adjustment of the effect of the variables that remain in the model.
Steps of purposeful selection for a logistic regression model
- Cycle through steps 2 and 3 until it appears that all of the important variables are included in the model and those excluded are clinically and/or statistically unimportant.
- Hosmer et al. use the "delta-beta-hat-percent" as a measure of the change in magnitude of the coefficients and suggest that a change > 20% is important:
$\Delta\hat\beta\% = 100 \times \dfrac{\hat\beta_{reduce} - \hat\beta_{full}}{\hat\beta_{full}}$
where $\hat\beta_{reduce}$ = the coefficient from the smaller model and $\hat\beta_{full}$ = the coefficient from the larger model.
- Any variable with p value > 0.25 is removed from the model.
- However, a variable with p value > 0.25 may still be kept in the model, for example when it is found to be an important control variable or when there are other reasons that make it necessary to retain it.
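As an illustration of the delta-beta-hat-percent, the sketch below compares the coefficients of the low birth weight model without ftv (the smaller model, stored as m1) with those of the model with ftv (the larger model, stored as m2), using the values printed in the Stata output. The function name is an assumption of this sketch.

```python
def delta_beta_hat_percent(b_reduced, b_full):
    """Percent change in a coefficient when moving from the full to the reduced model."""
    return 100.0 * (b_reduced - b_full) / b_full


# variable: (coefficient in reduced model m1, coefficient in full model m2)
coefficients = {
    "age": (-0.0255238, -0.023823),
    "lwt": (-0.0143532, -0.0142446),
    "_Irace_2": (1.003822, 1.003898),
    "_Irace_3": (0.4434608, 0.4331084),
}

changes = {name: delta_beta_hat_percent(reduced, full)
           for name, (reduced, full) in coefficients.items()}
for name, change in changes.items():
    flag = "check" if abs(change) > 20.0 else "ok"
    print(f"{name:9s} {change:8.2f}%  {flag}")
```

In this example every change is well below 20%, so removing ftv does not appear to disturb the remaining coefficients.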
Steps of purposeful selection for a logistic regression model
4. Add each variable not selected in Step 1 to the model obtained at the
conclusion of cycling through Step 2 and Step 3, one at a time, and check
its significance either by the Wald statistic p-value or the partial likelihood
ratio test,
if it is a categorical variable with more than two levels. This step is
vital for identifying variables that, by themselves, are not significantly
related to the outcome but make an important contribution in the presence
of other variables. We refer to the model at the end of Step 4 as the
preliminary main effects model.
Steps of purposeful selection for a logistic regression model
5. Examine more closely the variables in the model. The question of the appropriate categories for categorical variables should have been addressed during the univariable analysis in Step 1. For each continuous variable in this model we must check the assumption that the logit increases/decreases linearly as a function of the covariate. We refer to the model at the end of Step 5 as the main effects model.
6. Once we have the main effects model, check for interactions among the variables in the model.
7. Before any model becomes the final model, assess its adequacy and check its fit.
Example: analysis of data from the University of Massachusetts AIDS Research Unit (UMARU) Impact Study (UIS)
Variable   Description
id         ID number
age        Age at enrollment
beck       Beck depression score
ivhx       IV drug use history (1 = never, 2 = previous, 3 = recent)
ndrugtx    Number of prior drug treatments
race       Subject's race (0 = white, 1 = other)
treat      Treatment randomization (0 = short, 1 = long)
site       Treatment site (0 = A, 1 = B)
dfree      Returned to drug use (1 = remained drug free, 0 = otherwise)
1. A careful univariable analysis of each independent variable
- Fit a univariable logistic regression (y = 0, 1) with each independent variable in turn
- Nominal and ordinal variables: analyze with univariable logistic regression and examine the Wald test and likelihood ratio test, or analyze the contingency table with the likelihood ratio chi-square or Pearson chi-square
- Continuous variables: analyze with univariable logistic regression and examine the Wald test and likelihood ratio test, or analyze with the t-test
- The variable is important (clinically or biologically meaningful) or there is a rationale for including it
- Univariable (crude) analysis: the variable has p-value < .25 (Hosmer & Lemeshow 2000: p.95)
. logit dfree

Iteration 0:   log likelihood = -326.86446

Logistic regression                               Number of obs   =        575
                                                  LR chi2(0)      =      -0.00
                                                  Prob > chi2     =          .
Log likelihood = -326.86446                       Pseudo R2       =    -0.0000

------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |  -1.068691    .095599   -11.18   0.000    -1.256061     -.88132
------------------------------------------------------------------------------

. estimates store A

. logit dfree age

Iteration 0:   log likelihood = -326.86446
Iteration 1:   log likelihood = -326.16602
Iteration 2:   log likelihood = -326.16544

Logistic regression                               Number of obs   =        575
                                                  LR chi2(1)      =       1.40
                                                  Prob > chi2     =     0.2371
Log likelihood = -326.16544                       Pseudo R2       =     0.0021

------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0181723    .015344     1.18   0.236    -.0119014     .048246
       _cons |  -1.660226   .5110844    -3.25   0.001    -2.661933   -.6585194
------------------------------------------------------------------------------
. logit dfree age, or
...
Logistic regression                               Number of obs   =        575
                                                  LR chi2(1)      =       1.40
                                                  Prob > chi2     =     0.2371
Log likelihood = -326.16544                       Pseudo R2       =     0.0021

------------------------------------------------------------------------------
       dfree | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   1.018338   .0156254     1.18   0.236     .9881691    1.049429
------------------------------------------------------------------------------

. estimates store B

. lrtest A B

Likelihood-ratio test                                 LR chi2(1)  =      1.40
(Assumption: A nested in B)                           Prob > chi2 =    0.2371

. lincom 10*age, or

 ( 1)  10 age = 0

------------------------------------------------------------------------------
       dfree | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   1.199282    .184018     1.18   0.236      .887795    1.620055
------------------------------------------------------------------------------
*** odds ratio for a 10 year increase in AGE
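The `lincom 10*age, or` result follows from OR(c) = e^(c·β) with a normal-based interval on the coefficient scale. A sketch using the age coefficient and standard error from the univariable model; the function name is an assumption, and the bounds are ordered correctly only for c > 0.

```python
import math
from statistics import NormalDist


def odds_ratio_for_change(coef, se, c, level=0.95):
    """Odds ratio and confidence interval for a change of c units (c > 0) in x."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    estimate = math.exp(c * coef)
    low = math.exp(c * coef - z * c * se)
    high = math.exp(c * coef + z * c * se)
    return estimate, low, high


# age in `logit dfree age`: coefficient .0181723, standard error .015344
or10, or10_low, or10_high = odds_ratio_for_change(0.0181723, 0.015344, c=10)
print(round(or10, 4), round(or10_low, 4), round(or10_high, 4))
```

The z statistic and p-value are unchanged by the choice of c, because both the coefficient and its standard error are multiplied by the same constant.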
. logit dfree beck
…
Logistic regression                               Number of obs   =        575
                                                  LR chi2(1)      =       0.64
                                                  Prob > chi2     =     0.4250
Log likelihood = -326.54621                       Pseudo R2       =     0.0010

------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        beck |   -.008225   .0103428    -0.80   0.426    -.0284965    .0120464
       _cons |  -.9272829   .2003166    -4.63   0.000    -1.319896   -.5346696
------------------------------------------------------------------------------

. estimates store C

. lrtest A C

Likelihood-ratio test                                 LR chi2(1)  =      0.64
(Assumption: A nested in C)                           Prob > chi2 =    0.4250

. logit dfree beck, or
…
Logistic regression                               Number of obs   =        575
                                                  LR chi2(1)      =       0.64
                                                  Prob > chi2     =     0.4250
Log likelihood = -326.54621                       Pseudo R2       =     0.0010

------------------------------------------------------------------------------
       dfree | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        beck |   .9918087    .010258    -0.80   0.426     .9719057    1.012119
------------------------------------------------------------------------------

. lincom 5*beck, or

 ( 1)  5 beck = 0

------------------------------------------------------------------------------
       dfree | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |    .959709   .0496302    -0.80   0.426     .8672027    1.062083
------------------------------------------------------------------------------
*** odds ratio for a 5 point increase in BECK
. logit dfree ndrugtx
…
Logistic regression                               Number of obs   =        575
                                                  LR chi2(1)      =      11.84
                                                  Prob > chi2     =     0.0006
Log likelihood = -320.94485                       Pseudo R2       =     0.0181

------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     ndrugtx |  -.0749582    .024681    -3.04   0.002     -.123332   -.0265844
       _cons |  -.7677805    .130326    -5.89   0.000    -1.023215   -.5123462
------------------------------------------------------------------------------

. logit dfree ndrugtx, or
...
Logistic regression                               Number of obs   =        575
                                                  LR chi2(1)      =      11.84
                                                  Prob > chi2     =     0.0006
Log likelihood = -320.94485                       Pseudo R2       =     0.0181

------------------------------------------------------------------------------
       dfree | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     ndrugtx |   .9277822   .0228986    -3.04   0.002     .8839701    .9737658
------------------------------------------------------------------------------

. estimates store D

. lrtest A D

Likelihood-ratio test                                 LR chi2(1)  =     11.84
(Assumption: A nested in D)                           Prob > chi2 =    0.0006
. xi: logit dfree i.ivhx
i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)
…
Logistic regression                               Number of obs   =        575
                                                  LR chi2(2)      =      13.35
                                                  Prob > chi2     =     0.0013
Log likelihood = -320.18821                       Pseudo R2       =     0.0204

------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Iivhx_2 |  -.4810199   .2657063    -1.81   0.070    -1.001795    .0397548
    _Iivhx_3 |  -.7748382   .2165765    -3.58   0.000     -1.19932   -.3503561
       _cons |  -.6797242   .1417395    -4.80   0.000    -.9575285   -.4019198
------------------------------------------------------------------------------

. xi: logit dfree i.ivhx, or
i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)
...
Logistic regression                               Number of obs   =        575
                                                  LR chi2(2)      =      13.35
                                                  Prob > chi2     =     0.0013
Log likelihood = -320.18821                       Pseudo R2       =     0.0204

------------------------------------------------------------------------------
       dfree | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Iivhx_2 |   .6181526    .164247    -1.81   0.070     .3672198    1.040556
    _Iivhx_3 |   .4607783   .0997937    -3.58   0.000      .301399    .7044372
------------------------------------------------------------------------------

. estimates store E

. lrtest A E

Likelihood-ratio test                                 LR chi2(2)  =     13.35
(Assumption: A nested in E)                           Prob > chi2 =    0.0013
. logit dfree race
...
Logistic regression                               Number of obs   =        575
                                                  LR chi2(1)      =       4.62
                                                  Prob > chi2     =     0.0315
Log likelihood = -324.55269                       Pseudo R2       =     0.0071

------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        race |   .4591026   .2109763     2.18   0.030     .0455967    .8726085
       _cons |  -1.193922   .1141504   -10.46   0.000    -1.417653   -.9701919
------------------------------------------------------------------------------

. logit dfree race, or
...
Logistic regression                               Number of obs   =        575
                                                  LR chi2(1)      =       4.62
                                                  Prob > chi2     =     0.0315
Log likelihood = -324.55269                       Pseudo R2       =     0.0071

------------------------------------------------------------------------------
       dfree | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        race |   1.582653   .3339022     2.18   0.030     1.046652    2.393145
------------------------------------------------------------------------------

. estimates store F

. lrtest A F

Likelihood-ratio test                                 LR chi2(1)  =      4.62
(Assumption: A nested in F)                           Prob > chi2 =    0.0315
. logit dfree treat
...
Logistic regression                               Number of obs   =        575
                                                  LR chi2(1)      =       5.18
                                                  Prob > chi2     =     0.0229
Log likelihood = -324.27534                       Pseudo R2       =     0.0079

------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |    .437162   .1930633     2.26   0.024     .0587649    .8155591
       _cons |  -1.297816    .143296    -9.06   0.000    -1.578671   -1.016961
------------------------------------------------------------------------------

. logit dfree treat, or
...
Logistic regression                               Number of obs   =        575
                                                  LR chi2(1)      =       5.18
                                                  Prob > chi2     =     0.0229
Log likelihood = -324.27534                       Pseudo R2       =     0.0079

------------------------------------------------------------------------------
       dfree | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |   1.548307   .2989212     2.26   0.024     1.060526    2.260439
------------------------------------------------------------------------------

. estimates store G

. lrtest A G

Likelihood-ratio test                                 LR chi2(1)  =      5.18
(Assumption: A nested in G)                           Prob > chi2 =    0.0229
. logit dfree site
...
Logistic regression                               Number of obs   =        575
                                                  LR chi2(1)      =       1.67
                                                  Prob > chi2     =     0.1968
Log likelihood = -326.0315                        Pseudo R2       =     0.0025
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        site |   .2642236   .2034167     1.30   0.194    -.1344658     .662913
       _cons |   -1.15268   .1170732    -9.85   0.000    -1.382139   -.9232202
------------------------------------------------------------------------------
. logit dfree site, or
...
Logistic regression                               Number of obs   =        575
                                                  LR chi2(1)      =       1.67
                                                  Prob > chi2     =     0.1968
Log likelihood = -326.0315                        Pseudo R2       =     0.0025
------------------------------------------------------------------------------
       dfree | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        site |   1.302419   .2649338     1.30   0.194     .8741828    1.940437
------------------------------------------------------------------------------
. estimates store H
. lrtest A H
Likelihood-ratio test                                 LR chi2(1)  =       1.67
(Assumption: A nested in H)                           Prob > chi2 =     0.1968
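The `lrtest` results above can be re-derived from the printed log likelihoods. The sketch below is a minimal Python check, assuming that model A is the constant-only model and that its log likelihood is the "Iteration 0" value -326.86446 shown in the later iteration logs of this deck; `LL_NULL` and `lr_test_1df` are names introduced here for illustration only.

```python
import math

# Assumed constant-only (model A) log likelihood: the iteration-0 value in the logs
LL_NULL = -326.86446

def lr_test_1df(ll_model, ll_null=LL_NULL):
    """Likelihood-ratio statistic G and its chi-square(1) p value."""
    g = 2.0 * (ll_model - ll_null)
    p = math.erfc(math.sqrt(g / 2.0))  # upper tail of chi-square with 1 df
    return g, p

log_likelihoods = {"race": -324.55269, "treat": -324.27534, "site": -326.0315}
results = {name: lr_test_1df(ll) for name, ll in log_likelihoods.items()}
for name, (g, p) in results.items():
    print(f"{name}: G = {g:.2f}, p = {p:.4f}")
```

The closed form `erfc(sqrt(G/2))` only holds for 1 degree of freedom, which is the case for each single-variable model here.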
Table: simple logistic regression analyses

Variable   Coefficient   SE       OR      95% CI        Likelihood ratio   p value
age           0.018      0.0153   1.018   0.99, 1.05         1.40          0.2371
beck         -0.008      0.0103   0.992   0.97, 1.01         0.64          0.4250
ndrugtx      -0.075      0.0247   0.928   0.88, 0.97        11.84          0.0006
ivhx2        -0.481      0.2657   0.618   0.37, 1.04        13.35          0.0013
ivhx3        -0.775      0.2166   0.460   0.30, 0.70
race          0.459      0.2109   1.583   1.05, 2.39         4.62          0.0315
treat         0.437      0.1931   1.548   1.06, 2.26         5.18          0.0229
site          0.264      0.2034   1.302   0.87, 1.94         1.67          0.197
The variable beck has a p value of 0.426, so beck is dropped from the analysis.
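- Any variable with a p value > 0.25 is removed from the model, but first consider how strongly it influences the other variables. This is one way of assessing a "confounder": look at how much the coefficients change, using the "delta-beta-hat-percent".
- However, a variable with a p value > 0.25 may still be kept in the model, for example when it is known to be an important control factor or there are other reasons why it must be retained.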
2. Fit of the multivariable model
$$\Delta\hat{\beta}\% = \frac{\hat{\beta}_{\text{reduce}} - \hat{\beta}_{\text{full}}}{\hat{\beta}_{\text{full}}} \times 100$$
. use "K:\hosmer_data\logistic\uis.dta", clear
. xi:logit dfree age ndrugtx i.ivhx race treat site
i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)
Iteration 0:   log likelihood = -326.86446
Iteration 1:   log likelihood = -310.17928
Iteration 2:   log likelihood = -309.62871
Iteration 3:   log likelihood = -309.62413
Iteration 4:   log likelihood = -309.62413
Logistic regression                               Number of obs   =        575
                                                  LR chi2(7)      =      34.48
                                                  Prob > chi2     =     0.0000
Log likelihood = -309.62413                       Pseudo R2       =     0.0527
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0503708   .0173224     2.91   0.004     .0164196     .084322
     ndrugtx |  -.0615121   .0256311    -2.40   0.016    -.1117481   -.0112761
    _Iivhx_2 |  -.6033296   .2872511    -2.10   0.036    -1.166331   -.0403278
    _Iivhx_3 |   -.732722    .252329    -2.90   0.004    -1.227278   -.2381662
        race |   .2261295   .2233399     1.01   0.311    -.2116087    .6638677
       treat |   .4425031   .1992909     2.22   0.026     .0519002    .8331061
        site |   .1485845   .2172121     0.68   0.494    -.2771434    .5743125
       _cons |  -2.405405   .5548058    -4.34   0.000    -3.492805   -1.318006
------------------------------------------------------------------------------
- Examine the p value of the Wald statistic for every variable.
- The variable race has a p value of 0.311.
- The variable site has a p value of 0.494.
- Because previous studies have shown race to be an important factor to control for, and site is the randomization variable for the study sites, both variables are kept in the model even though their p values are > 0.25.
* The p values here come from the Wald statistic. When the number of observations in each outcome group is inadequate for the number of variables in the model, the recommended statistic is the likelihood ratio.
- To see how strongly a removed variable influences (confounds) the other variables, examine how much the coefficients change.
- Hosmer et al. call this the "delta-beta-hat-percent" and suggest that a change of >20% is significant.
- Purposeful selection uses the "delta-beta-hat-percent".
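$$\text{Change in coefficient } (\%) = \frac{\hat{\beta}_{\text{reduced model}} - \hat{\beta}_{\text{full model}}}{\hat{\beta}_{\text{full model}}} \times 100$$

$$\text{Change in odds ratio } (\%) = \frac{EE_{\text{reduced model}} - EE_{\text{full model}}}{EE_{\text{full model}}} \times 100$$

(EE = effect estimate)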
* Kleinbaum, Kupper & Morgenstern (1982), Greenland (1989) and Mickey & Greenland (1989) recommend using the change in effect estimates such as the odds ratio; a change of 10% suggests that the variable influences the main-effect variable.
. xi:logit dfree age ndrugtx i.ivhx race treat
i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)
...
Logistic regression                               Number of obs   =        575
                                                  LR chi2(6)      =      34.02
                                                  Prob > chi2     =     0.0000
Log likelihood = -309.8567                        Pseudo R2       =     0.0520
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0509605    .017309     2.94   0.003     .0170354    .0848856
     ndrugtx |  -.0631998   .0256525    -2.46   0.014    -.1134778   -.0129219
    _Iivhx_2 |  -.5928725   .2864333    -2.07   0.038    -1.154272   -.0314735
    _Iivhx_3 |  -.7600441   .2489941    -3.05   0.002    -1.248064   -.2720245
        race |   .2081089    .221453     0.94   0.347    -.2259309    .6421488
       treat |    .438959   .1991429     2.20   0.028     .0486461     .829272
       _cons |  -2.355786   .5501049    -4.28   0.000    -3.433972     -1.2776
------------------------------------------------------------------------------
- Here site is removed as an example.
- (This is only a worked calculation, because site is an important variable.)
Variables    Full model   Reduce model   Delta beta hat (%)
age            0.05037       0.05096          1.17072
ndrugtx       -0.06151      -0.06320          2.74369
_Iivhx_2      -0.60333      -0.59287         -1.73323
_Iivhx_3      -0.73272      -0.76004          3.72885
race           0.22613       0.20811         -7.96915
site           0.14859          -                -
treat          0.44250       0.43896         -0.80092
$$\text{Delta beta hat } (\%) = \frac{\hat{\beta}_{\text{reduced model}} - \hat{\beta}_{\text{full model}}}{\hat{\beta}_{\text{full model}}} \times 100$$
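The delta-beta-hat percentages tabulated above can be recomputed directly from the two Stata fits (full model with site, reduced model without it). This is a minimal Python sketch; the dictionary and function names are illustrative, and the 20% cut-off is the one quoted from Hosmer et al. in this deck.

```python
# Coefficients copied from the full (with site) and reduced (without site) fits
full = {
    "age": 0.0503708, "ndrugtx": -0.0615121, "_Iivhx_2": -0.6033296,
    "_Iivhx_3": -0.732722, "race": 0.2261295, "treat": 0.4425031,
}
reduced = {
    "age": 0.0509605, "ndrugtx": -0.0631998, "_Iivhx_2": -0.5928725,
    "_Iivhx_3": -0.7600441, "race": 0.2081089, "treat": 0.438959,
}

def delta_beta_hat_percent(b_full, b_reduced):
    """Percentage change in a coefficient when a variable is dropped."""
    return 100.0 * (b_reduced - b_full) / b_full

changes = {name: delta_beta_hat_percent(full[name], reduced[name]) for name in full}
flagged = [name for name, pct in changes.items() if abs(pct) > 20.0]

for name, pct in changes.items():
    print(f"{name:10s} {pct:9.5f}")
print("variables changing by more than 20%:", flagged)
```

No coefficient moves by more than 20% here, so by this criterion alone site would not be flagged as a confounder.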
- When the change is < 20%, the variable can be removed.

. xi:logit dfree age ndrugtx i.ivhx race treat site beck
i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)
Iteration 0:   log likelihood = -326.86446
Iteration 1:   log likelihood = -310.17972
Iteration 2:   log likelihood = -309.62533
Iteration 3:   log likelihood = -309.6238
Iteration 4:   log likelihood = -309.6238
Logistic regression                               Number of obs   =        575
                                                  LR chi2(8)      =      34.48
                                                  Prob > chi2     =     0.0000
Log likelihood = -309.6238                        Pseudo R2       =     0.0527
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0504143   .0174058     2.90   0.004     .0162995     .084529
     ndrugtx |  -.0615329   .0256457    -2.40   0.016    -.1117975   -.0112682
    _Iivhx_2 |  -.6036962   .2875987    -2.10   0.036    -1.167379   -.0400131
    _Iivhx_3 |  -.7336591   .2549904    -2.88   0.004    -1.233431   -.2338871
        race |   .2260262   .2233692     1.01   0.312    -.2117694    .6638218
       treat |   .4424802   .1992933     2.22   0.026     .0518725     .833088
        site |   .1489209   .2176073     0.68   0.494    -.2775816    .5754234
        beck |   .0002759   .0107983     0.03   0.980    -.0208883    .0214402
       _cons |  -2.411128   .5983465    -4.03   0.000    -3.583866   -1.238391
------------------------------------------------------------------------------
4: Add each variable not selected in Step 1 to the model
Checking linearity in the logit for continuous variables
Methods:
Smoothed scatter plots
Plot of the smoothed logit against the continuous variable
Design variables
Plot of the coefficients against the continuous variable, with the continuous variable split into 4 groups by quartiles
Fractional polynomials
Spline functions
5. Examine more closely the variables in the model
. do "G:\hosmer_data\logistic\plot_smooth_logit_age.do"
. lowess dfree age, gen(var3) logit nodraw
. graph twoway line var3 age, sort xlabel(20(10)50 56)
-Plot Smoothed logit and continuous variable
- Plot of the coefficients against the continuous variable, with the continuous variable split into 4 groups by quartiles
. xtile age1 = age, nq(4)
. tabstat age, statistics(median) by(age1) columns(variables)

Summary statistics: p50
  by categories of: age1 (4 quantiles of age)

    age1 |       age
---------+----------
       1 |        25
       2 |        30
       3 |        35
       4 |        40
---------+----------
   Total |        32
--------------------
. xi: logit dfree i.age1 ndrugtx i.ivhx race treat site
...
Logistic regression                               Number of obs   =        575
                                                  LR chi2(9)      =      34.69
                                                  Prob > chi2     =     0.0001
Log likelihood = -309.52103                       Pseudo R2       =     0.0531
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Iage1_2 |   -.165864   .2909137    -0.57   0.569    -.7360444    .4043163
    _Iage1_3 |   .4693399     .27066     1.73   0.083    -.0611439    .9998237
    _Iage1_4 |    .595771   .3124964     1.91   0.057    -.0167108    1.208253
     ndrugtx |  -.0587551   .0254688    -2.31   0.021     -.108673   -.0088371
    _Iivhx_2 |  -.5545193   .2853626    -1.94   0.052     -1.11382    .0047811
    _Iivhx_3 |  -.6725536   .2518601    -2.67   0.008     -1.16619   -.1789169
        race |   .2787172   .2238499     1.25   0.213    -.1600205    .7174549
       treat |   .4430577   .2000427     2.21   0.027     .0509812    .8351343
        site |   .1582001   .2188293     0.72   0.470    -.2706974    .5870976
       _cons |  -1.054837   .2705875    -3.90   0.000    -1.585179   -.5244956
------------------------------------------------------------------------------
. clear
. input age coef
           age       coef
  1. 25 0
  2. 30 -.165864
  3. 35 .4693399
  4. 40 .595771
  5. end
. graph twoway scatter coef age, connect(l) ylabel(-.25(.25).75) xlabel(20(10)50) yline(0)
Fractional polynomial analysis
- Fractional polynomial modelling builds a model between an outcome variable and independent variables measured on a continuous or ordinal scale; it was proposed by Royston & Altman (1994).
- When a variable is not linear, in other words when the model relationship is non-linear, the variable is transformed by raising it to some power.
- The resulting models have names such as the first-order fractional polynomial (FP1), and so on.
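Fractional polynomial analysis
- Each x is transformed to x^p, with the best-fitting power p chosen from -2, -1, -0.5, 0, 0.5, 1, 2, 3.
- When p = 0, x^p is taken to be log x, so this group (FP1) has 8 possible transformations.
- A second-order fractional polynomial (FP2) transforms x using the best-fitting pair of powers (p1, p2) from the same set. This group has 36 possible pairs (28 pairs of distinct powers plus 8 repeated powers), which together with the 8 FP1 models gives the 44 models that Stata's fracpoly reports fitting.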
Powers of first- and second-order fractional polynomials

 FP1   |               FP2
 power |   power       power       power
   p   |  p1    p2    p1    p2    p1    p2
  -2   |  -2    -2    -1     1     0     2
  -1   |  -2    -1    -1     2     0     3
  -0.5 |  -2   -0.5   -1     3    0.5   0.5
   0   |  -2     0   -0.5  -0.5   0.5    1
   0.5 |  -2    0.5  -0.5    0    0.5    2
   1   |  -2     1   -0.5   0.5   0.5    3
   2   |  -2     2   -0.5    1     1     1
   3   |  -2     3   -0.5    2     1     2
       |  -1    -1   -0.5    3     1     3
       |  -1   -0.5    0     0     2     2
       |  -1     0     0    0.5    2     3
       |  -1    0.5    0     1     3     3

First order (FP1): 8 powers
Second order (FP2): 36 power pairs
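The candidate power sets in the table above can be enumerated mechanically. The sketch below is an illustration in Python, assuming that FP2 pairs are unordered and that repeated powers such as (-2, -2) are allowed, as the table shows; the variable names are hypothetical.

```python
from itertools import combinations_with_replacement

# The eight candidate powers; power 0 stands for log(x)
POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

fp1_models = [(p,) for p in POWERS]
fp2_models = list(combinations_with_replacement(POWERS, 2))

print(len(fp1_models), "FP1 models")
print(len(fp2_models), "FP2 models")
print(len(fp1_models) + len(fp2_models), "models in total")
```

The total agrees with the "44 models fit" message that fracpoly prints for `degree(2)`, and the FP1 count with the "8 models fit" message for `degree(1)`.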
Transforming continuous variables for the fractional polynomial method: scaling and/or centering
- When a continuous variable is not linear in the logit, it can be transformed for fractional polynomial modelling in several ways, for example by scaling and/or by centering.
Fractional polynomial method: scaling
- Scaling can be done in several ways. Stata, for example, defines the scale as follows:
$$\text{lrange} = \log_{10}[\max(x) - \min(x)]$$
$$\text{scale} = 10^{\,\text{sign}(\text{lrange}) \cdot \text{int}(|\text{lrange}|)}$$
$$x^{*} = x / \text{scale}$$
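The scaling rule above can be written as a small function. This is a sketch in Python rather than Stata's own implementation; `fp_scale` is a hypothetical name, and the age range 20 to 56 is an assumption read off the axis labels of the plotting commands in this deck, not a value reported by Stata.

```python
import math

def fp_scale(x_min, x_max):
    """Scale factor 10^(sign(lrange) * int(|lrange|)) from the range of x."""
    lrange = math.log10(x_max - x_min)
    sign = (lrange > 0) - (lrange < 0)
    return 10.0 ** (sign * int(abs(lrange)))

# Assumed age range of roughly 20 to 56 years
age_scale = fp_scale(20, 56)
print(age_scale)

# A variable with a very narrow range is scaled up instead of down
narrow_scale = fp_scale(0.0, 0.05)
print(narrow_scale)
```

With the assumed age range the scale is 10, which is consistent with the `X = age/10` note in the fracpoly output.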
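Fractional polynomial method: centering
- The transformed variable $x_1$ is denoted $x_1^{*}$.
- For an FP1 transformation, the model
$$\hat{y}_i = \beta_0 + \beta_1 x_{1i}$$
becomes
$$\hat{y}_i = \beta_0^{*} + \beta_1^{*}\left(x_{1i}^{*p} - \bar{x}_1^{*p}\right)$$
where
$$\bar{x}_1^{*} = \frac{1}{n}\sum_{i=1}^{n} x_{1i}^{*}$$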
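The centering constants that appear in the `gen double` lines of the fracpoly output in this deck can be checked against this definition. The sketch below assumes that the constants `32.3826087` and `4.542608696` printed by Stata are the sample means of age and ndrugtx; `fp_term` is a hypothetical helper.

```python
import math

def fp_term(x, power):
    """Fractional polynomial term: x**power, with power 0 meaning log(x)."""
    return math.log(x) if power == 0 else x ** power

mean_age = 32.3826087       # assumed sample mean of age (from "age-32.3826087")
mean_ndrugtx = 4.542608696  # assumed sample mean of ndrugtx

x_age = mean_age / 10               # scaling used by Stata: X = age/10
x_ndrug = (mean_ndrugtx + 1) / 10   # scaling used by Stata: X = (ndrugtx+1)/10

center_age_cubed = fp_term(x_age, 3)     # compare with X^3-33.95748331
center_age_inv_sq = fp_term(x_age, -2)   # compare with X^-2-.0953622163
center_ndrug_inv = fp_term(x_ndrug, -1)  # compare with X^-1-1.804204581

print(center_age_cubed, center_age_inv_sq, center_ndrug_inv)
```

Each centered term is therefore the transformed value minus the same transformation evaluated at the scaled mean.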
Fractional polynomial analysis
- Models are chosen by the difference in deviance between the models being compared, as follows.
- The deviance difference is approximately chi-square distributed with df = df(model 2) - df(model 1).
G(1, p1) = -2{L(1) - L(p1)}   (df = 1)
G(p1, (p1, p2)) = -2{L(p1) - L(p1, p2)}   (df = 2)
- However, fractional polynomial models are hard to interpret. One remedy is to group the continuous variable appropriately, guided by theory and previous research. Cut points such as the median or quartiles call for caution, because grouping a continuous variable can lead to erroneous conclusions (Heinzl H., 2000; Royston P., Altman D.G., Sauerbrei W., 2006).
Fractional polynomial
. use "H:\hosmer_data\logistic\uis.dta", clear
. xi:fracpoly logit dfree age ndrugtx i.ivhx race treat site, degree(2) compare
i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)
-> gen double Indru__1 = ndrugtx-4.542608696 if e(sample)
........
-> gen double Iage__1 = X^-2-.0953622163 if e(sample)
-> gen double Iage__2 = X^3-33.95748331 if e(sample)
   (where: X = age/10)
Iteration 0:   log likelihood = -326.86446
…
Iteration 4:   log likelihood = -309.38436
Logistic regression                               Number of obs   =        575
                                                  LR chi2(8)      =      34.96
                                                  Prob > chi2     =     0.0000
Log likelihood = -309.38436                       Pseudo R2       =     0.0535
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Iage__1 |  -1.538626   4.575934    -0.34   0.737    -10.50729     7.43004
     Iage__2 |   .0116581   .0080977     1.44   0.150    -.0042132    .0275293
    Indru__1 |  -.0620596   .0257223    -2.41   0.016    -.1124744   -.0116447
    _Iivhx_2 |  -.6057376   .2881578    -2.10   0.036    -1.170517   -.0409587
    _Iivhx_3 |  -.7263554   .2525832    -2.88   0.004    -1.221409   -.2313014
        race |   .2282107    .224089     1.02   0.308    -.2109957    .6674171
       treat |   .4392589   .1996983     2.20   0.028     .0478573    .8306604
        site |   .1459101    .217491     0.67   0.502    -.2803644    .5721846
       _cons |  -1.082342   .2416317    -4.48   0.000    -1.555931   -.6087524
------------------------------------------------------------------------------
Deviance: 618.77. Best powers of age among 44 models fit: -2 3.
1. With power = 1, i.e. age entered linearly: the model with age is compared with the model without age (p-value = .003; df = 1-0).
2. Comparing linear age with (age^-2 and age^3): not significant (Dev. dif. = 619.248-618.769 = 0.480; p-value = 0.923; df = 4-1).
3. Comparing age^3 with (age^-2 and age^3): not significant (Dev. dif. = 618.882-618.769 = 0.113; p-value = 0.945; df = 4-2).
m = 1: first order (FP1); m = 2: second order (FP2)
Fractional polynomial model comparisons:
---------------------------------------------------------------
age             df       Deviance     Dev. dif.   P (*)   Powers
---------------------------------------------------------------
Not in model     0        627.801       9.032     0.060
Linear           1        619.248       0.480     0.923   1
m = 1            2        618.882       0.114     0.945   3
m = 2            4        618.769          --     --      -2 3
---------------------------------------------------------------
(*) P-value from deviance difference comparing reported model with m = 2 model
. di chiprob(4-1,619.248-618.769)
.9234802
. di chiprob(4-2,618.882-618.769)
.94506648
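The `chiprob` values above are upper-tail chi-square probabilities of the deviance differences. As a cross-check outside Stata, the sketch below uses the closed-form series for integer degrees of freedom; `chi2_sf` is a name introduced here and is not a Stata or library function.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        series = sum(half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * series
    # Odd df: erfc(sqrt(x/2)) plus half-integer gamma terms
    series = sum(half ** (k - 0.5) / math.gamma(k + 0.5)
                 for k in range(1, (df + 1) // 2))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * series

p_linear_vs_fp2 = chi2_sf(619.248 - 618.769, 4 - 1)  # linear vs m = 2
p_fp1_vs_fp2 = chi2_sf(618.882 - 618.769, 4 - 2)     # m = 1 vs m = 2
p_linear_vs_fp1 = chi2_sf(619.248 - 618.882, 2 - 1)  # linear vs m = 1

print(p_linear_vs_fp2, p_fp1_vs_fp2, p_linear_vs_fp1)
```

None of the three comparisons is close to significance, which is why the linear term for age is retained.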
Judging whether a fractional polynomial model is better than the linear model
G(1, (p1, p2)) = -2{L(1) - L(p1, p2)} = 619.248 - 618.769 = 0.480; p-value = 0.923
Choose the linear model.
G(1, p1) = -2{L(1) - L(p1)} = 619.248 - 618.882 = 0.366; p-value = 0.545
STATA10
First order m=1 (FP1) Second order m=2 (FP2)
. use "I:\hosmer_data\logistic\uis.dta", clear
. xi:fracpoly logit dfree age ndrugtx i.ivhx race treat site, degree(1) compare
i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)
-> gen double Indru__1 = ndrugtx-4.542608696 if e(sample)
-> gen double Iage__1 = X^3-33.95748331 if e(sample)
   (where: X = age/10)
Iteration 0:   log likelihood = -326.86446
...
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Iage__1 |   .0138939   .0046486     2.99   0.003     .0047829     .023005
    Indru__1 |  -.0620649   .0257325    -2.41   0.016    -.1124997   -.0116301
    _Iivhx_2 |  -.5960999   .2868616    -2.08   0.038    -1.158338   -.0338615
    _Iivhx_3 |   -.714141   .2499592    -2.86   0.004    -1.204052     -.22423
        race |   .2355037   .2230028     1.06   0.291    -.2015736    .6725811
       treat |   .4348659   .1992503     2.18   0.029     .0443425    .8253893
        site |   .1436801   .2173756     0.66   0.509    -.2823683    .5697285
       _cons |  -1.113293   .2236989    -4.98   0.000    -1.551734   -.6748509
------------------------------------------------------------------------------
Deviance: 618.88. Best powers of age among 8 models fit: 3.
Fractional polynomial model comparisons:
---------------------------------------------------------------
age             df       Deviance     Dev. dif.   P (*)   Powers
---------------------------------------------------------------
Not in model     0        627.801       8.918     0.012
Linear           1        619.248       0.366     0.545   1
m = 1            2        618.882          --     --      3
---------------------------------------------------------------
(*) P-value from deviance difference comparing reported model with m = 1 model
. di chiprob(2-1,619.248-618.882)
.54519273
. xi:fracpoly logit dfree age ndrugtx i.ivhx race treat site, degree(2) compare
i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)
-> gen double Indru__1 = ndrugtx-4.542608696 if e(sample)
........
-> gen double Iage__1 = X^-2-.0953622163 if e(sample)
-> gen double Iage__2 = X^3-33.95748331 if e(sample)
   (where: X = age/10)
Iteration 0:   log likelihood = -326.86446
Iteration 1:   log likelihood = -309.95259
Iteration 2:   log likelihood = -309.38924
Iteration 3:   log likelihood = -309.38436
Iteration 4:   log likelihood = -309.38436
Logistic regression                               Number of obs   =        575
                                                  LR chi2(8)      =      34.96
                                                  Prob > chi2     =     0.0000
Log likelihood = -309.38436                       Pseudo R2       =     0.0535
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Iage__1 |  -1.538626   4.575934    -0.34   0.737    -10.50729     7.43004
     Iage__2 |   .0116581   .0080977     1.44   0.150    -.0042132    .0275293
    Indru__1 |  -.0620596   .0257223    -2.41   0.016    -.1124744   -.0116447
    _Iivhx_2 |  -.6057376   .2881578    -2.10   0.036    -1.170517   -.0409587
    _Iivhx_3 |  -.7263554   .2525832    -2.88   0.004    -1.221409   -.2313014
        race |   .2282107    .224089     1.02   0.308    -.2109957    .6674171
       treat |   .4392589   .1996983     2.20   0.028     .0478573    .8306604
        site |   .1459101    .217491     0.67   0.502    -.2803644    .5721846
       _cons |  -1.082342   .2416317    -4.48   0.000    -1.555931   -.6087524
------------------------------------------------------------------------------
Deviance: 618.77. Best powers of age among 44 models fit: -2 3.
STATA10
Fractional polynomial model comparisons:
---------------------------------------------------------------
age             df       Deviance     Dev. dif.   P (*)   Powers
---------------------------------------------------------------
Not in model     0        627.801       9.032     0.060
Linear           1        619.248       0.480     0.923   1
m = 1            2        618.882       0.114     0.945   3
m = 2            4        618.769          --     --      -2 3
---------------------------------------------------------------
(*) P-value from deviance difference comparing reported model with m = 2 model
Variable ndrugtx
. lowess dfree ndrugtx , gen(var2) logit nodraw
. graph twoway line var2 ndrugtx, sort xlabel(20(10)50 56)
. xi:fracpoly logit dfree ndrugtx age i.ivhx race treat site, degree(2) compare
i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)
-> gen double Iage__1 = age-32.3826087 if e(sample)
........
-> gen double Indru__1 = X^-1-1.804204581 if e(sample)
-> gen double Indru__2 = X^-1*ln(X)+1.064696882 if e(sample)
   (where: X = (ndrugtx+1)/10)
Iteration 0:   log likelihood = -326.86446
Iteration 1:   log likelihood = -307.22312
Iteration 2:   log likelihood = -306.72663
Iteration 3:   log likelihood = -306.72558
Logistic regression                               Number of obs   =        575
                                                  LR chi2(8)      =      40.28
                                                  Prob > chi2     =     0.0000
Log likelihood = -306.72558                       Pseudo R2       =     0.0616
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    Indru__1 |    .981453   .2888474     3.40   0.001     .4153226    1.547583
    Indru__2 |   .3611251   .1098589     3.29   0.001     .1458057    .5764445
     Iage__1 |   .0544455   .0174877     3.11   0.002     .0201703    .0887208
    _Iivhx_2 |  -.6088269   .2911064    -2.09   0.036    -1.179385   -.0382689
    _Iivhx_3 |  -.7238122   .2555643    -2.83   0.005    -1.224709   -.2229154
        race |   .2477026   .2242152     1.10   0.269    -.1917512    .6871564
       treat |   .4223666    .200365     2.11   0.035     .0296584    .8150748
        site |   .1732142   .2209758     0.78   0.433    -.2598905    .6063189
       _cons |  -1.164471   .2454818    -4.74   0.000    -1.645607   -.6833356
------------------------------------------------------------------------------
Deviance: 613.45. Best powers of ndrugtx among 44 models fit: -1 -1.
Variable ndrugtx
Fractional polynomial model comparisons:
---------------------------------------------------------------
ndrugtx         df       Deviance     Dev. dif.   P (*)   Powers
---------------------------------------------------------------
Not in model     0        626.176      12.725     0.013
Linear           1        619.248       5.797     0.122   1
m = 1            2        618.818       5.367     0.068   .5
m = 2            4        613.451          --     --      -1 -1
---------------------------------------------------------------
(*) P-value from deviance difference comparing reported model with m = 2 model

Fractional polynomial model comparisons:
---------------------------------------------------------------
ndrugtx         df       Deviance     Gain     P(term)   Powers
---------------------------------------------------------------
Not in model     0        626.176       --       --
Linear           1        619.248      0.000    0.008     1
m = 1            2        618.818      0.430    0.512     .5
m = 2            4        613.451      5.797    0.068     -1 -1
---------------------------------------------------------------
G(1, (p1, p2)) = -2{L(1) - L(p1, p2)}
G = 619.248 - 613.451 = 5.797; p-value = 0.122
Choose the linear model?
STATA 8
. xi:mfp logit dfree ndrugtx age i.ivhx race treat site
i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)

Deviance for model with all terms untransformed = 619.248, 575 observations

Variable     Model (vs.)   Deviance  Dev diff.   P      Powers   (vs.)
----------------------------------------------------------------------
age          lin.   FP2     619.248     0.480  0.923    1        -2 3
             Final          619.248                     1

[_Iivhx_3 included with 1 df in model]

ndrugtx      lin.   FP2     619.248     5.797  0.122    1        -1 -1
             Final          619.248                     1

[treat included with 1 df in model]
[_Iivhx_2 included with 1 df in model]
[race included with 1 df in model]
[site included with 1 df in model]
Using the multivariable fractional polynomial command (mfp)
Fractional polynomial fitting algorithm converged after 1 cycle.

Transformations of covariates:

-> gen double Indru__1 = ndrugtx-4.542608696 if e(sample)
-> gen double Iage__1 = age-32.3826087 if e(sample)

Final multivariable fractional polynomial model for dfree
--------------------------------------------------------------------
    Variable |    -----Initial-----          -----Final-----
             |   df     Select   Alpha    Status    df    Powers
-------------+------------------------------------------------------
     ndrugtx |    4     1.0000   0.0500     in      1     1
         age |    4     1.0000   0.0500     in      1     1
    _Iivhx_2 |    1     1.0000   0.0500     in      1     1
    _Iivhx_3 |    1     1.0000   0.0500     in      1     1
        race |    1     1.0000   0.0500     in      1     1
       treat |    1     1.0000   0.0500     in      1     1
        site |    1     1.0000   0.0500     in      1     1
--------------------------------------------------------------------

Logistic regression                               Number of obs   =        575
                                                  LR chi2(7)      =      34.48
                                                  Prob > chi2     =     0.0000
Log likelihood = -309.62413                       Pseudo R2       =     0.0527
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    Indru__1 |  -.0615121   .0256311    -2.40   0.016    -.1117481   -.0112761
     Iage__1 |   .0503708   .0173224     2.91   0.004     .0164196     .084322
    _Iivhx_2 |  -.6033296   .2872511    -2.10   0.036    -1.166331   -.0403278
    _Iivhx_3 |   -.732722    .252329    -2.90   0.004    -1.227278   -.2381662
        race |   .2261295   .2233399     1.01   0.311    -.2116087    .6638677
       treat |   .4425031   .1992909     2.22   0.026     .0519002    .8331061
        site |   .1485845   .2172121     0.68   0.494    -.2771434    .5743125
       _cons |  -1.053693   .2264488    -4.65   0.000    -1.497524   -.6098613
------------------------------------------------------------------------------
Deviance: 619.248.
6. Check for interactions among the variables in the model.
- When the model contains a higher-order interaction term, every variable that makes up that term must also be in the model at the lower orders. Such a model is called a "Hierarchically Well-Formulated (HWF) Model".
- For example, with a third-order term:
logit P(X) = x1 + x2 + x3 + x1*x2 + x1*x3 + x2*x3 + x1*x2*x3
logit P(X) = x1 + x2 + x3 + x2*x3 + x1*x2*x3 (incorrect)
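The hierarchy rule can be checked mechanically: every lower-order combination of the variables in an interaction term must itself be a term of the model. The following Python sketch illustrates this on the two models above; the function name and the tuple representation of terms are choices made for this example.

```python
from itertools import combinations

def is_hierarchically_well_formulated(terms):
    """Return True if every lower-order component of each term is in the model."""
    model = {frozenset(term) for term in terms}
    for term in model:
        for size in range(1, len(term)):
            for lower in combinations(term, size):
                if frozenset(lower) not in model:
                    return False
    return True

correct_model = [("x1",), ("x2",), ("x3",),
                 ("x1", "x2"), ("x1", "x3"), ("x2", "x3"),
                 ("x1", "x2", "x3")]
incorrect_model = [("x1",), ("x2",), ("x3",),
                   ("x2", "x3"),
                   ("x1", "x2", "x3")]

print(is_hierarchically_well_formulated(correct_model))
print(is_hierarchically_well_formulated(incorrect_model))
```

The second model fails because it contains x1*x2*x3 without the two-way terms x1*x2 and x1*x3.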
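Interaction assessment
- Analyse using the Wald statistic or the likelihood ratio test.
$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2$$
$$H_0: \beta_{12} = 0; \quad H_1: \beta_{12} \neq 0$$
$$\text{Wald test: } Z = \frac{\hat{\beta}_{12}}{se(\hat{\beta}_{12})}$$
$$\text{LR test: } LR = (-2\ln L_{\text{reduced}}) - (-2\ln L_{\text{full}}) = -2(\ln L_{\text{reduced}} - \ln L_{\text{full}})$$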
. gen tage = treat*age
. logit dfree treat age tage
Iteration 0:   log likelihood = -326.86446
Iteration 1:   log likelihood = -322.31165
Iteration 2:   log likelihood = -322.26464
Iteration 3:   log likelihood = -322.26464
Logistic regression                               Number of obs   =        575
                                                  LR chi2(3)      =       9.20
                                                  Prob > chi2     =     0.0268
Log likelihood = -322.26464                       Pseudo R2       =     0.0141
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |  -1.123388   1.042136    -1.08   0.281    -3.165936    .9191606
         age |  -.0077915   .0238604    -0.33   0.744    -.0545571    .0389741
        tage |   .0480969   .0314183     1.53   0.126    -.0134819    .1096756
       _cons |  -1.043996   .7884888    -1.32   0.185    -2.589406    .5014138
------------------------------------------------------------------------------
. qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3
. di "Log likelihood = " e(ll)
Log likelihood = -306.72558
. estimates store A
. logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 age_ndrugfp1 age_ndrugfp2, nolog
Logistic regression                               Number of obs   =        575
                                                  LR chi2(10)     =      48.07
                                                  Prob > chi2     =     0.0000
Log likelihood = -302.83141                       Pseudo R2       =     0.0735
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .1330456   .0699532     1.90   0.057    -.0040603    .2701514
    ndrugfp1 |   2.113401   1.542559     1.37   0.171    -.9099592    5.136761
    ndrugfp2 |   .6035234   .5850525     1.03   0.302    -.5431584    1.750205
        race |   .3071997   .2277417     1.35   0.177    -.1391657    .7535652
       treat |    .398666    .201887     1.97   0.048     .0029748    .7943572
        site |   .1678239   .2226292     0.75   0.451    -.2685213    .6041691
    _Iivhx_2 |  -.5460554   .2947045    -1.85   0.064    -1.123666    .0315548
    _Iivhx_3 |  -.7156675   .2607849    -2.74   0.006    -1.226796   -.2045385
age_ndrugfp1 |  -.0285781   .0445662    -0.64   0.521    -.1159261      .05877
age_ndrugfp2 |  -.0050124    .017133    -0.29   0.770    -.0385924    .0285675
       _cons |  -7.251346   2.516295    -2.88   0.004    -12.18319   -2.319499
------------------------------------------------------------------------------
. di "Log likelihood = " e(ll)
Log likelihood = -302.83141
. estimates store A1
. lrtest A A1
Likelihood-ratio test                                 LR chi2(2)  =       7.79
(Assumption: A nested in A1)                          Prob > chi2 =     0.0204
. qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 ageivhx2 ageivhx3
. di "Log likelihood = " e(ll)
Log likelihood = -306.35593
. estimates store A2
. lrtest A A2
Likelihood-ratio test                                 LR chi2(2)  =       0.74
(Assumption: A nested in A2)                          Prob > chi2 =     0.6910
...
. qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 agetreat
. di "Log likelihood = " e(ll)
Log likelihood = -305.34312
. estimates store A3
. lrtest A A3
Likelihood-ratio test                                 LR chi2(1)  =       2.76
(Assumption: A nested in A3)                          Prob > chi2 =     0.0964
. qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 racesite
. di "Log likelihood = " e(ll)
Log likelihood = -302.45334
. estimates store A4
. lrtest A A4
Likelihood-ratio test                                 LR chi2(1)  =       8.54
(Assumption: A nested in A4)                          Prob > chi2 =     0.0035
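As a worked check of the race x site test, the likelihood-ratio statistic can be recomputed by hand from the two log likelihoods printed in the session (model A without the interaction, model A4 with it). The notation $\ln L$ for the log likelihood is an assumption of this sketch.

```latex
\begin{aligned}
G &= -2\left[\ln L_{A} - \ln L_{A4}\right] \\
  &= -2\left[(-306.72558) - (-302.45334)\right] \\
  &= 8.54, \qquad df = 1, \quad p = 0.0035
\end{aligned}
```

The same arithmetic applied to model A1 gives $-2[(-306.72558) - (-302.83141)] = 7.79$ on 2 degrees of freedom, matching the `lrtest A A1` output.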
interaction             Log likelihood      G      df    P value
main effect model         -309.62413
age x ndrugtx*            -302.83141      7.79      2     0.0204
age x _Iivhx*             -306.35593      0.74      2     0.6910
age x race                -306.6269       0.20      1     0.6569
age x treat               -305.34312      2.76      1     0.0964
age x site                -305.92657      1.60      1     0.2062
ndrugtx* x _Iivhx*        -304.00917      5.43      4     0.2457
ndrugtx* x site           -306.72386      .003      2     0.9983
ndrugtx* x treat          -305.25796      2.94      2     0.2305
ndrugtx* x race           -304.65412      4.14      2     0.1260

interaction             Log likelihood      G      df    P value
race x treat              -306.70871      0.94      1     0.3315
race x site               -302.45334      8.54      1     0.0035
treat x _Iivhx*           -306.70513      0.04      2     0.9798
treat x site              -306.70871      0.03      1     0.8543
race x _Iivhx*            -305.83605      1.78      2     0.4108
site x _Iivhx*            -306.29100      0.87      2     0.6485

ndrugtx*: ndrugfp1 = ((ndrugtx+1)/10)^(-1); ndrugfp2 = ndrugfp1*log((ndrugtx+1)/10)
Interaction terms are considered for entry into the model using their p-values at the 0.10 significance level. The terms that qualify are age x ndrugtx*, age x treat and race x site.
. logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 age_ndrugfp1 age_ndrugfp2 agetreat racesite
…
Logistic regression                               Number of obs   =        575
                                                  LR chi2(12)     =      58.56
                                                  Prob > chi2     =     0.0000
Log likelihood = -297.58223                       Pseudo R2       =     0.0896
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .1226443   .0745964     1.64   0.100    -.0235621    .2688506
    ndrugfp1 |    2.48771   1.596674     1.56   0.119    -.6417143    5.617134
    ndrugfp2 |   .7448356   .6043055     1.23   0.218    -.4395815    1.929253
        race |   .6996389   .2667644     2.62   0.009     .1767903    1.222488
       treat |  -1.274051    1.07897    -1.18   0.238    -3.388793     .840692
        site |   .4977606   .2563434     1.94   0.052    -.0046632    1.000184
    _Iivhx_2 |  -.6243653   .2996152    -2.08   0.037      -1.2116   -.0371303
    _Iivhx_3 |  -.6905352   .2627414    -2.63   0.009    -1.205499   -.1755716
age_ndrugfp1 |  -.0387742    .046132    -0.84   0.401    -.1291912    .0516428
age_ndrugfp2 |  -.0090046    .017702    -0.51   0.611    -.0436998    .0256906
    agetreat |   .0520701   .0324382     1.61   0.108    -.0115076    .1156477
    racesite |  -1.416875   .5318186    -2.66   0.008    -2.459221   -.3745301
       _cons |  -7.141237   2.666019    -2.68   0.007    -12.36654   -1.915936
------------------------------------------------------------------------------
The variable age x ndrugfp2 has a Wald statistic of -0.51 and the largest p value (0.611), so it is removed from the model and the model is refitted.
. logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 age_ndrugfp1 agetreat racesite
...
Logistic regression                               Number of obs   =        575
                                                  LR chi2(11)     =      58.31
                                                  Prob > chi2     =     0.0000
Log likelihood = -297.71139                       Pseudo R2       =     0.0892
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0889238   .0339956     2.62   0.009     .0222937    .1555539
    ndrugfp1 |   1.705601   .4106322     4.15   0.000     .9007769    2.510426
    ndrugfp2 |   .4440587   .1175928     3.78   0.000      .213581    .6745364
        race |   .6869266    .265402     2.59   0.010     .1667483    1.207105
       treat |  -1.252787   1.080874    -1.16   0.246    -3.371262    .8656875
        site |   .4903829   .2560081     1.92   0.055    -.0113838    .9921497
    _Iivhx_2 |  -.6299072   .2994363    -2.10   0.035    -1.216792   -.0430227
    _Iivhx_3 |   -.694879    .262544    -2.65   0.008    -1.209456   -.1803021
age_ndrugfp1 |  -.0155328   .0060924    -2.55   0.011    -.0274737   -.0035918
    agetreat |   .0515973   .0325362     1.59   0.113    -.0121726    .1153672
    racesite |  -1.401606   .5309161    -2.64   0.008    -2.442183   -.3610301
       _cons |  -5.976921   1.338859    -4.46   0.000    -8.601036   -3.352807
------------------------------------------------------------------------------
The variable agetreat has a Wald statistic of 1.59 and a p value of 0.113, the largest among the interaction terms in this model, so it is removed from the model and the model is refitted.
. logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 age_ndrugfp1 racesite
…
Logistic regression                               Number of obs   =        575
                                                  LR chi2(10)     =      55.77
                                                  Prob > chi2     =     0.0000
Log likelihood = -298.98146                       Pseudo R2       =     0.0853
------------------------------------------------------------------------------
       dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .1166385   .0288749     4.04   0.000     .0600446    .1732323
    ndrugfp1 |   1.669035    .407152     4.10   0.000      .871032    2.467038
    ndrugfp2 |   .4336886   .1169052     3.71   0.000     .2045586    .6628185
        race |   .6841068   .2641355     2.59   0.010     .1664107    1.201803
       treat |   .4349255   .2037596     2.13   0.033      .035564     .834287
        site |    .516201   .2548881     2.03   0.043     .0166295    1.015773
    _Iivhx_2 |  -.6346307   .2987192    -2.12   0.034    -1.220109   -.0491518
    _Iivhx_3 |  -.7049475   .2615805    -2.69   0.007    -1.217636   -.1922591
age_ndrugfp1 |  -.0152697   .0060268    -2.53   0.011    -.0270819   -.0034575
    racesite |  -1.429457   .5297806    -2.70   0.007    -2.467808   -.3911062
       _cons |  -6.843864   1.219316    -5.61   0.000     -9.23368   -4.454048
------------------------------------------------------------------------------
The variable age x ndrugfp1 has a Wald statistic of -2.53 and a p value of 0.011, the largest among the interaction terms, but this is below 0.10, so it is kept in the model.
Cautions in logistic regression analysis
- High correlation among the independent variables (collinearity, multicollinearity) makes the coefficients unstable. Remedies: ridge logistic regression, dropping variables, or constructing new variables.
- Multiple testing
- Influential observations (outliers)
- Problem of perfect or complete separation
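Collinearity* (Thai: ภาวะร่วมเส้นตรง)
The independent variables are highly correlated with one another
(r2 > 0.90; r > 0.95; Kleinbaum, Muller, Nizam, 1998, p. 241).
Adding or removing a variable changes the coefficients
in magnitude and/or sign.
R2 is high, yet the statistical tests of the individual coefficients
are not significant.
Standard errors are inflated, which lowers test statistics such as t and z
and widens the confidence intervals of the coefficients.
*Thai term from the Royal Institute Dictionary of Mathematics Terms (พจนานุกรมศัพท์คณิตศาสตร์ ฉบับราชบัณฑิตยสถาน), B.E. 2552 (2009)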
. twocat 98 1 1 98
. logit y x1 x2 x3
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .2995896   1.429618     0.21   0.834    -2.502411     3.10159
          x2 |  -.0143819   1.429593    -0.01   0.992    -2.816334     2.78757
          x3 |   .3139715   .2886275     1.09   0.277    -.2517281    .8796711
       _cons |  -.3670144   .2425088    -1.51   0.130    -.8423228    .1082941
------------------------------------------------------------------------------

. logit y x1 x3
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .2854983   .2861025     1.00   0.318    -.2752523    .8462489
          x3 |   .3136786   .2871556     1.09   0.275     -.249136    .8764931
       _cons |  -.3670266   .2425058    -1.51   0.130    -.8423293    .1082761
------------------------------------------------------------------------------

. logit y x2 x3
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |    .279187   .2860666     0.98   0.329    -.2814933    .8398672
          x3 |   .3079032   .2871195     1.07   0.284    -.2548407    .8706471
       _cons |  -.3612278   .2408548    -1.50   0.134    -.8332944    .1108389
------------------------------------------------------------------------------
. corr x1 x2 x3
(obs=198)

             |       x1       x2       x3
-------------+---------------------------
          x1 |   1.0000
          x2 |   0.9798   1.0000
          x3 |   0.0000   0.0203   1.0000

. collin x1 x2 x3

  Collinearity Diagnostics

                        SQRT                   R-
  Variable      VIF     VIF    Tolerance    Squared
----------------------------------------------------
        x1     25.25    5.03    0.0396      0.9604
        x2     25.26    5.03    0.0396      0.9604
        x3      1.01    1.01    0.9897      0.0103
----------------------------------------------------
  Mean VIF     17.17

                           Cond
        Eigenval          Index
---------------------------------
    1     3.0422          1.0000
    2     0.6908          2.0985
    3     0.2570          3.4408
    4     0.0100         17.4460
---------------------------------
 Condition Number        17.4460
 Eigenvalues & Cond Index computed from scaled raw sscp (w/ intercept)
 Det(correlation matrix)    0.0396
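The VIF and tolerance columns for x1, x2, x3 can be recovered, to rounding, from the printed correlation matrix alone, because the VIF of the i-th predictor equals the i-th diagonal element of the inverse correlation matrix. The following is a minimal Python sketch, not part of the Stata session: the helper `invert` is a hypothetical hand-written Gauss-Jordan routine, and the inputs are the correlations rounded to four decimals, so small differences in the last digit are expected.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity.
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Correlations among x1, x2, x3 as printed by `corr` (rounded to 4 decimals).
R = [[1.0,    0.9798, 0.0000],
     [0.9798, 1.0,    0.0203],
     [0.0000, 0.0203, 1.0]]

R_inv = invert(R)
vif = [R_inv[i][i] for i in range(3)]   # VIF_i = i-th diagonal element of R^-1
tolerance = [1.0 / v for v in vif]      # tolerance_i = 1 / VIF_i

for name, v, t in zip(["x1", "x2", "x3"], vif, tolerance):
    print(f"{name}: VIF = {v:.2f}, tolerance = {t:.4f}")
```

This also makes the pattern in the three logit fits visible: x1 and x2 carry almost the same information, so their standard errors are inflated by roughly sqrt(25) = 5 when both are in the model.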
Checking for collinearity or multicollinearity
Pearson Correlation (informal method)
- Examine the Pearson correlations among all the variables and look for
  variables that are highly correlated with other variables.

. corr age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 age_ndrugfp1 racesite
(obs=575)

             |      age ndrugfp1 ndrugfp2     race    treat     site _Iivhx_2 _Iivhx_3 age_nd~1 racesite
-------------+------------------------------------------------------------------------------------------
         age |   1.0000
    ndrugfp1 |  -0.1836   1.0000
    ndrugfp2 |   0.1601  -0.9916   1.0000
        race |   0.0139   0.0874  -0.0821   1.0000
       treat |  -0.0446   0.0251  -0.0204   0.0791   1.0000
        site |  -0.0287   0.1923  -0.1926  -0.0795  -0.0230   1.0000
    _Iivhx_2 |   0.1063  -0.0551   0.0567  -0.0152   0.0513   0.1623   1.0000
    _Iivhx_3 |   0.2674  -0.3045   0.2843  -0.1806  -0.0695  -0.2292  -0.4138   1.0000
age_ndrugfp1 |   0.0462   0.9546  -0.9475   0.1080   0.0108   0.1833  -0.0134  -0.2506   1.0000
    racesite |   0.0430   0.1831  -0.1834   0.4384   0.0522   0.3849  -0.0303  -0.1295   0.2055   1.0000
Variance Inflation Factors (VIF: formal method)
Look for VIF > 10 and
a mean VIF greater than 1; these indicate a multicollinearity problem.

. collin age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 age_ndrugfp1 racesite

  Collinearity Diagnostics

                        SQRT                   R-
  Variable      VIF     VIF    Tolerance    Squared
----------------------------------------------------
       age      2.64    1.63    0.3782      0.6218
  ndrugfp1    105.68   10.28    0.0095      0.9905
  ndrugfp2     63.77    7.99    0.0157      0.9843
      race      1.43    1.20    0.6969      0.3031
     treat      1.02    1.01    0.9831      0.0169
      site      1.41    1.19    0.7090      0.2910
  _Iivhx_2      1.39    1.18    0.7201      0.2799
  _Iivhx_3      1.65    1.28    0.6061      0.3939
age_ndrugfp1   27.55    5.25    0.0363      0.9637
  racesite      1.64    1.28    0.6109      0.3891
----------------------------------------------------
  Mean VIF     20.82
- Generalized variance inflation factor (GVIF)
How is the VIF computed?

   r     r2     VIF
  .1    0.01    1.01
  .2    0.04    1.04
  .3    0.09    1.10
  .4    0.16    1.19
  .5    0.25    1.33
  .6    0.36    1.56
  .7    0.49    1.96
  .8    0.64    2.78
  .9    0.81    5.26
  .91   0.83    5.82
  .92   0.85    6.51
  .93   0.86    7.40
  .94   0.88    8.59
  .95   0.90   10.26
  .96   0.92   12.76
  .97   0.94   16.92
  .98   0.96   25.25
  .99   0.98   50.25
  1     1.00     .

[Figure: relationship between VIF and the correlation r; the value r = .95 is labelled]
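The table can be reproduced with one line of arithmetic. The sketch below is illustrative Python; `vif_from_r` is a hypothetical helper name. It assumes the simplest case, in which a predictor's only linear relationship is a correlation r with one other predictor, so that VIF = 1 / (1 - r^2).

```python
def vif_from_r(r):
    """VIF when a predictor's only linear relationship is a correlation r with one other predictor."""
    return 1.0 / (1.0 - r * r)


# Reprint a few rows of the table; r = 1 is left out because the VIF is undefined there.
for r in (0.5, 0.9, 0.95, 0.98, 0.99):
    print(f"r = {r:.2f}   r2 = {r * r:.2f}   VIF = {vif_from_r(r):.2f}")
```

The VIF stays small until r is close to 0.9 and then rises steeply, which is why the common cut-off VIF > 10 corresponds to roughly r > 0.95.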
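The variance inflation factor method
- The VIF measures how much the estimated variance of a coefficient is
  inflated compared with the case in which the independent variables
  have no linear relationship with one another.

$$\mathrm{VIF}_i = \left(1 - R_i^2\right)^{-1} = \frac{1}{1 - R_i^2}
\qquad\text{and}\qquad
\overline{\mathrm{VIF}} = \frac{\sum_{i=1}^{p-1} \mathrm{VIF}_i}{p-1}$$

$$\mathrm{tolerance}_i = 1 - R_i^2$$

Here $R_i^2$ is the R-squared from regressing the i-th independent variable on the other independent variables, and $p-1$ is the number of independent variables.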
Indications of multicollinearity from variance inflation factors*
- VIF > 10 is an indication of multicollinearity
- The mean VIF provides information about the severity of the
  multicollinearity
- A mean VIF > 1 is indicative of serious multicollinearity
  problems
*Neter, Wasserman, Kutner (1987, p. 392);
Marquardt (1970); Belsley, Kuh & Welsch (1980)
- Tolerance < 0.20 or 0.10 and/or VIF > 5 or 10 and above (O’Brien, 2007)
Stata
collin [varlist…]
estat vif        variance inflation factors for the
                 independent variables
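Applied to the ten-variable model, these rules of thumb can be sketched as follows. This is illustrative Python, not Stata; the VIF values are copied from the `collin` output shown earlier in these slides, and the cut-off of 10 is the one quoted on this slide. Tolerance and R-squared are derived as 1 / VIF and 1 - tolerance.

```python
# VIFs copied from the `collin` output for the ten-variable model.
vif = {
    "age": 2.64, "ndrugfp1": 105.68, "ndrugfp2": 63.77, "race": 1.43,
    "treat": 1.02, "site": 1.41, "_Iivhx_2": 1.39, "_Iivhx_3": 1.65,
    "age_ndrugfp1": 27.55, "racesite": 1.64,
}

mean_vif = sum(vif.values()) / len(vif)                       # average over the predictors
tolerance = {name: 1.0 / v for name, v in vif.items()}        # tolerance = 1 / VIF
r_squared = {name: 1.0 - t for name, t in tolerance.items()}  # R2 = 1 - tolerance
flagged = [name for name, v in vif.items() if v > 10]         # rule of thumb: VIF > 10

print(f"Mean VIF = {mean_vif:.2f}")
print("VIF > 10:", ", ".join(flagged))
```

The flagged variables are the two fractional-polynomial terms for ndrug and the interaction built from one of them, which are correlated by construction.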
Condition Index & Variance Decomposition Proportion
The condition index (CI) and the variance decomposition
proportion (VDP) are computed from the eigenvalues of the
correlation matrix of the independent variables. The condition
index for the k-th eigenvalue $\lambda_k$ is

$$\mathrm{CI}_k = \sqrt{\lambda_{\max} / \lambda_k}$$

and its largest value, $\sqrt{\lambda_{\max} / \lambda_{\min}}$, is the condition number.
A condition index of 10-30 indicates that collinearity is present.
A condition index > 30 indicates a collinearity problem.
A condition index > 100 indicates very severe collinearity.
(Belsley, 1991a)
Between 10 and 30 there is moderate to strong multicollinearity, and
if it exceeds 30 there is severe multicollinearity (Gujarati, 2002).
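As a small worked check, the condition indexes in the three-variable `collin x1 x2 x3` example can be recomputed from its printed eigenvalues. The sketch is illustrative Python; the function names `condition_indexes` and `grade` are hypothetical, and the cut-offs are the 10 / 30 / 100 values quoted above. The eigenvalues are rounded to four decimals, so the last digits differ slightly from Stata's.

```python
import math


def condition_indexes(eigenvalues):
    """Return sqrt(lambda_max / lambda_k) for every eigenvalue."""
    lam_max = max(eigenvalues)
    return [math.sqrt(lam_max / lam) for lam in eigenvalues]


def grade(ci):
    """Label a condition index with the 10 / 30 / 100 cut-offs quoted on the slide."""
    if ci > 100:
        return "very severe collinearity"
    if ci > 30:
        return "collinearity problem"
    if ci >= 10:
        return "collinearity present"
    return "no indication"


# Eigenvalues printed by `collin x1 x2 x3` (rounded to 4 decimals).
eigenvalues = [3.0422, 0.6908, 0.2570, 0.0100]
ci = condition_indexes(eigenvalues)
condition_number = max(ci)

for lam, c in zip(eigenvalues, ci):
    print(f"eigenvalue {lam:.4f}  condition index {c:8.4f}  {grade(c)}")
```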
Condition Index & Variance Decomposition Proportion
The variance decomposition proportion was proposed by
Belsley et al. (1980) and Belsley (1991a).
Look for VDPs greater than 0.5.
The VDP is the proportion of each coefficient's variance that is
associated with each principal component, that is, a decomposition of
the coefficient variance for each dimension:

$$p_{jk} = \frac{V_{jk}^2 / \lambda_k}{\mathrm{VIF}_j}$$

where $V_{jk}$ is the j-th element of the k-th eigenvector and $\lambda_k$ is the k-th eigenvalue.
(Fox, 1984)
. collin age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 age_ndrugfp1 racesite

  Collinearity Diagnostics

                        SQRT                   R-
  Variable      VIF     VIF    Tolerance    Squared
----------------------------------------------------
       age      2.64    1.63    0.3782      0.6218
  ndrugfp1    105.68   10.28    0.0095      0.9905
  ndrugfp2     63.77    7.99    0.0157      0.9843
      race      1.43    1.20    0.6969      0.3031
     treat      1.02    1.01    0.9831      0.0169
      site      1.41    1.19    0.7090      0.2910
  _Iivhx_2      1.39    1.18    0.7201      0.2799
  _Iivhx_3      1.65    1.28    0.6061      0.3939
age_ndrugfp1   27.55    5.25    0.0363      0.9637
  racesite      1.64    1.28    0.6109      0.3891
----------------------------------------------------
  Mean VIF     20.82

                           Cond
        Eigenval          Index
---------------------------------
    1     5.9439          1.0000
    2     1.2749          2.1592
    3     1.0679          2.3592
    4     1.0129          2.4224
    5     0.7402          2.8338
    6     0.4588          3.5995
    7     0.3110          4.3716
    8     0.1469          6.3620
    9     0.0320         13.6289
    10    0.0094         25.1490
    11    0.0021         52.8408
---------------------------------
 Condition Number        52.8408
 Eigenvalues & Cond Index computed from scaled raw sscp (w/ intercept)
 Det(correlation matrix)    0.0002
- Check the condition indexes and the variance decomposition proportions
- Flag a CI greater than 30 together with VDPs greater than or equal to .5
. coldiag2 age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 age_ndrugfp1, force w(5)

Condition number using scaled variables = 52.18

Condition Indexes and Variance-Decomposition Proportions

      condition
          index  _cons    age  ndr~1  nd~p2   race  treat   site  _Ii~2  _Ii~3  age~1
   1       1.00   0.00   0.00   0.00   0.00   0.01   0.01   0.01   0.00   0.00   0.00
   2       2.22   0.00   0.00   0.00   0.00   0.00   0.02   0.01   0.00   0.12   0.00
   3       2.38   0.00   0.00   0.00   0.00   0.00   0.01   0.05   0.37   0.03   0.00
   4       2.70   0.00   0.00   0.00   0.00   0.64   0.01   0.14   0.00   0.03   0.00
   5       3.23   0.00   0.00   0.00   0.00   0.20   0.05   0.69   0.14   0.00   0.00
   6       3.56   0.00   0.00   0.00   0.00   0.03   0.80   0.03   0.12   0.06   0.00
   7       6.12   0.02   0.02   0.00   0.00   0.12   0.08   0.06   0.32   0.70   0.00
   8      13.34   0.07   0.06   0.01   0.02   0.00   0.02   0.00   0.03   0.02   0.21
   9      24.83   0.10   0.43   0.03   0.35   0.00   0.00   0.00   0.00   0.03   0.30
  10      52.18   0.81   0.49   0.96   0.62   0.00   0.00   0.00   0.00   0.01   0.49

. prnt_cx, force w(5)

Condition Indexes and Variance-Decomposition Proportions

      condition
          index  _cons    age  ndr~1  nd~p2   race  treat   site  _Ii~2  _Ii~3  age~1
   1       1.00      .      .      .      .      .      .      .      .      .      .
   2       2.22      .      .      .      .      .      .      .      .      .      .
   3       2.38      .      .      .      .      .      .      .   0.37      .      .
   4       2.70      .      .      .      .   0.64      .      .      .      .      .
   5       3.23      .      .      .      .      .      .   0.69      .      .      .
   6       3.56      .      .      .      .      .   0.80      .      .      .      .
   7       6.12      .      .      .      .      .      .      .   0.32   0.70      .
   8      13.34      .      .      .      .      .      .      .      .      .      .
   9      24.83      .   0.43      .   0.35      .      .      .      .      .   0.30
  10      52.18   0.81   0.49   0.96   0.62      .      .      .      .      .   0.49

Variance-Decomposition Proportions less than .3 have been printed as "."
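The screening rule stated before this output (CI greater than 30 together with VDP of at least .5) can be applied mechanically to the coldiag2 table. The sketch below is illustrative Python; `screen` is a hypothetical helper, the abbreviated column names are expanded to the variable names given in the command, and the numbers are copied from the table.

```python
variables = ["_cons", "age", "ndrugfp1", "ndrugfp2", "race", "treat",
             "site", "_Iivhx_2", "_Iivhx_3", "age_ndrugfp1"]

# (condition index, variance-decomposition proportions) copied from the coldiag2 table.
rows = [
    (1.00,  [0.00, 0.00, 0.00, 0.00, 0.01, 0.01, 0.01, 0.00, 0.00, 0.00]),
    (2.22,  [0.00, 0.00, 0.00, 0.00, 0.00, 0.02, 0.01, 0.00, 0.12, 0.00]),
    (2.38,  [0.00, 0.00, 0.00, 0.00, 0.00, 0.01, 0.05, 0.37, 0.03, 0.00]),
    (2.70,  [0.00, 0.00, 0.00, 0.00, 0.64, 0.01, 0.14, 0.00, 0.03, 0.00]),
    (3.23,  [0.00, 0.00, 0.00, 0.00, 0.20, 0.05, 0.69, 0.14, 0.00, 0.00]),
    (3.56,  [0.00, 0.00, 0.00, 0.00, 0.03, 0.80, 0.03, 0.12, 0.06, 0.00]),
    (6.12,  [0.02, 0.02, 0.00, 0.00, 0.12, 0.08, 0.06, 0.32, 0.70, 0.00]),
    (13.34, [0.07, 0.06, 0.01, 0.02, 0.00, 0.02, 0.00, 0.03, 0.02, 0.21]),
    (24.83, [0.10, 0.43, 0.03, 0.35, 0.00, 0.00, 0.00, 0.00, 0.03, 0.30]),
    (52.18, [0.81, 0.49, 0.96, 0.62, 0.00, 0.00, 0.00, 0.00, 0.01, 0.49]),
]


def screen(rows, names, ci_cut=30.0, vdp_cut=0.5):
    """Map each condition index above ci_cut to the variables whose VDP is at least vdp_cut."""
    flagged = {}
    for ci, vdps in rows:
        if ci > ci_cut:
            flagged[ci] = [name for name, v in zip(names, vdps) if v >= vdp_cut]
    return flagged


result = screen(rows, variables)
for ci, names in result.items():
    print(f"CI = {ci}: {', '.join(names)}")
```

Only the last dimension exceeds 30, and the variables loading on it at .5 or more include ndrugfp1 and ndrugfp2, the same pair singled out by the VIFs.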
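Computing the odds ratio when there is an interaction
The calculation has three steps: 1) write the model containing the interaction term separately for each level of the risk factor, 2) take the difference between the two models, 3) exponentiate the value obtained in step 2.
Let f be the risk factor and x a covariate. The logit is written as

$$g(f, x) = \beta_0 + \beta_1 f + \beta_2 x + \beta_3 f x$$

1. Write two logit models, one for each level of the risk factor:

$$g(f_1, x) = \beta_0 + \beta_1 f_1 + \beta_2 x + \beta_3 f_1 x$$

$$g(f_0, x) = \beta_0 + \beta_1 f_0 + \beta_2 x + \beta_3 f_0 x$$

2. Take the difference of the two models, which is the log odds ratio:

$$\begin{aligned}
\ln \mathrm{OR}(F = f_1, F = f_0, x) &= g(f_1, x) - g(f_0, x) \\
&= (\beta_0 + \beta_1 f_1 + \beta_2 x + \beta_3 f_1 x) - (\beta_0 + \beta_1 f_0 + \beta_2 x + \beta_3 f_0 x) \\
&= \beta_1 (f_1 - f_0) + \beta_3 x (f_1 - f_0)
\end{aligned}$$

3. Exponentiate the value obtained in step 2:

$$\mathrm{OR} = \exp\left[\beta_1 (f_1 - f_0) + \beta_3 x (f_1 - f_0)\right]$$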
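Computing the odds ratio when there is an interaction
With two independent variables:

$$g(f, x) = \hat{y} = \ln\frac{p}{1 - p} = \beta_0 + \beta_1 f + \beta_2 x + \beta_3 f x$$

$$\mathrm{OR} = \exp\left[\beta_1 (f_1 - f_0) + \beta_3 x (f_1 - f_0)\right]$$

F = f_1 = 1: risk factor present; f_0 = 0: reference; x = covariate
Confidence interval
The limits for the log odds ratio are

$$\hat\beta_1 + \hat\beta_3 x \pm z_{1-\alpha/2}\, \widehat{SE}\left[\ln \widehat{OR}(F = 1, F = 0, X = x)\right]$$

with

$$\widehat{Var}\left[\ln \widehat{OR}(F = 1, F = 0, X = x)\right] = \widehat{Var}(\hat\beta_1) + x^2\, \widehat{Var}(\hat\beta_3) + 2x\, \widehat{Cov}(\hat\beta_1, \hat\beta_3)$$

Exponentiating the two limits gives the confidence interval for the odds ratio.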
. logit low lwd age lwd_age

Iteration 0:   log likelihood =   -117.336
Iteration 1:   log likelihood = -110.71804
Iteration 2:   log likelihood = -110.57024
Iteration 3:   log likelihood = -110.56997

Logistic regression                               Number of obs   =        189
                                                  LR chi2(3)      =      13.53
                                                  Prob > chi2     =     0.0036
Log likelihood = -110.56997                       Pseudo R2       =     0.0577
------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lwd |  -1.944089   1.724804    -1.13   0.260    -5.324643    1.436465
         age |  -.0795722   .0396343    -2.01   0.045     -.157254   -.0018904
     lwd_age |   .1321967   .0756982     1.75   0.081    -.0161691    .2805626
       _cons |   .7744952   .9100949     0.85   0.395    -1.009258    2.558248
------------------------------------------------------------------------------
. di exp((-1.944)+(.132*15)) /* odds ratio for lwd=1 vs lwd=0 at age 15 years */
1.0366558
. di exp((-1.944)+(.132*20)) /* odds ratio for lwd=1 vs lwd=0 at age 20 years */
2.0057138
Example variables: lwd (1 = lwt < 110 pounds, 0 = otherwise),
age = continuous (years)
. estat vce

Covariance matrix of coefficients of logit model

        e(V) |        lwd         age     lwd_age       _cons
-------------+------------------------------------------------
         lwd |   2.974949
         age |  .03526621   .00157088
     lwd_age | -.12760349  -.00157088   .00573022
       _cons | -.82827277  -.03526621   .03526621   .82827277
. lincom lwd + 15*lwd_age , or

 ( 1)  lwd + 15 lwd_age = 0
------------------------------------------------------------------------------
         low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   1.039627   .6865828     0.06   0.953      .284927     3.79334
------------------------------------------------------------------------------

. lincom lwd + 20*lwd_age , or

 ( 1)  lwd + 20 lwd_age = 0
------------------------------------------------------------------------------
         low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   2.013443     .81264     1.73   0.083     .9128263    4.441098
------------------------------------------------------------------------------
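The two `lincom` results can be reproduced by hand from the coefficient table and the `estat vce` matrix, using the odds-ratio and variance formulas for a model with an interaction. The sketch below is illustrative Python, not Stata; the function name `or_with_interaction` is hypothetical, and it assumes f_1 = 1 and f_0 = 0 with a Wald interval based on the normal quantile.

```python
import math
from statistics import NormalDist


def or_with_interaction(b1, b3, var_b1, var_b3, cov_b13, x, level=0.95):
    """Odds ratio for F=1 vs F=0 at covariate value x, with its Wald confidence interval."""
    log_or = b1 + b3 * x
    # Var[ln OR] = Var(b1) + x^2 Var(b3) + 2 x Cov(b1, b3)
    se = math.sqrt(var_b1 + x ** 2 * var_b3 + 2 * x * cov_b13)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Estimates copied from the logit output and the `estat vce` matrix.
b_lwd, b_lwd_age = -1.944089, 0.1321967
v_lwd, v_lwd_age, cov_lwd_lwd_age = 2.974949, 0.00573022, -0.12760349

or15 = or_with_interaction(b_lwd, b_lwd_age, v_lwd, v_lwd_age, cov_lwd_lwd_age, x=15)
or20 = or_with_interaction(b_lwd, b_lwd_age, v_lwd, v_lwd_age, cov_lwd_lwd_age, x=20)

for age, (odds_ratio, lower, upper) in ((15, or15), (20, or20)):
    print(f"age {age}: OR = {odds_ratio:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

The negative covariance between the lwd and lwd_age coefficients is what makes the interval at age 20 much narrower than the large standard error of lwd alone would suggest.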