logistic multiple 2557 up - kku web hosting


Post on 27-Jun-2019


  • 1

    Multiple Logistic Regression

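    Logistic function

    $$f(z) = \frac{1}{1 + e^{-z}}$$

    $$f(-\infty) = \frac{1}{1 + e^{\infty}} = 0, \qquad f(0) = \frac{1}{1 + e^{0}} = \frac{1}{2}, \qquad f(+\infty) = \frac{1}{1 + e^{-\infty}} = 1$$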
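As a quick numerical check of the limits of the logistic function, here is a minimal Python sketch. The function name `logistic` and the test points are illustrative choices, not part of the original slides.

```python
import math

def logistic(z: float) -> float:
    """Logistic function f(z) = 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)          # for negative z, e^z is small and safe to compute
    return ez / (1.0 + ez)

# Large negative z approaches 0, z = 0 gives 1/2, large positive z approaches 1.
for z in (-50.0, 0.0, 50.0):
    print(z, logistic(z))
```

The two branches are algebraically identical; splitting them only keeps `math.exp` from overflowing for large negative `z`.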

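    Fitting Multiple Logistic Regression

    Dependent (outcome, response) variable = discrete (two possible values)

    Independent (predictor, explanatory) variables = continuous or categorical (categorical --> dummy variables)

    One outcome variable, several predictors: predictor, predictor, predictor, ...

    Multiple Logistic Regression example: low birth weight (LBW), coded 0 = birth weight >= 2500 g, 1 = birth weight < 2500 g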

  • 2

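    Multiple Logistic Regression: low birth weight (low) on age, lwt, race, ftv

    $$y = \ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1\,\mathrm{age} + \beta_2\,\mathrm{lwt} + \beta_3\,\mathrm{\_Irace\_2} + \beta_4\,\mathrm{\_Irace\_3} + \beta_5\,\mathrm{ftv}$$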

    . xi: logit low age lwt i.race ftv, nolog
    i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)

    Logistic regression                               Number of obs   =        189
                                                      LR chi2(5)      =      12.10
                                                      Prob > chi2     =     0.0335
    Log likelihood = -111.28645                       Pseudo R2       =     0.0516

    ------------------------------------------------------------------------------
             low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   -.023823   .0337295    -0.71   0.480    -.0899317    .0422857
             lwt |  -.0142446   .0065407    -2.18   0.029    -.0270641   -.0014251
        _Irace_2 |   1.003898   .4978579     2.02   0.044     .0281143    1.979681
        _Irace_3 |   .4331084   .3622397     1.20   0.232    -.2768684    1.143085
             ftv |  -.0493083   .1672386    -0.29   0.768    -.3770899    .2784733
           _cons |   1.295366   1.071439     1.21   0.227    -.8046157    3.395347
    ------------------------------------------------------------------------------

    . list id low age lwt _Irace_2 _Irace_3 ftv phat

         +--------------------------------------------------------------+
         |  id   low   age   lwt   _Irace_2   _Irace_3   ftv       phat |
         |--------------------------------------------------------------|
      1. |   4     1    28   120          0          1     0   .3434579 |
      2. |  10     1    29   130          0          0     2   .2065388 |
      3. |  11     1    34   187          1          0     0   .2360498 |
      4. |  13     1    25   105          0          1     0   .4102857 |
      5. |  15     1    25    85          0          1     0   .4805368 |
         |--------------------------------------------------------------|
         ...
    186. | 223     0    35   170          0          0     1   .1182268 |
    187. | 224     0    19   120          0          0     0   .2959572 |
    188. | 225     0    24   116          0          0     1   .2732751 |
    189. | 226     0    45   123          0          0     1   .1710699 |
         +--------------------------------------------------------------+

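    $$p = \frac{e^{\beta_0 + \beta_1\,\mathrm{age} + \beta_2\,\mathrm{lwt} + \beta_3\,\mathrm{\_Irace\_2} + \beta_4\,\mathrm{\_Irace\_3} + \beta_5\,\mathrm{ftv}}}{1 + e^{\beta_0 + \beta_1\,\mathrm{age} + \beta_2\,\mathrm{lwt} + \beta_3\,\mathrm{\_Irace\_2} + \beta_4\,\mathrm{\_Irace\_3} + \beta_5\,\mathrm{ftv}}}$$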

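    Fit Model: Logistic Regression
    - Coefficients are estimated by Maximum Likelihood

    Generalized Linear Model:
    - Random component or Family: binomial
    - Link Function: logit
    - Systematic component: x1, x2, ..., xp

    $$g(p) = \ln\left(\frac{p}{1-p}\right)$$

    $$y = g(p) = \ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$$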

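    Testing the Model
    - Likelihood ratio test (G): compares the constant-only model with the fitted model
    - Wald Test

    $$G = -2\ln\left[\frac{\text{likelihood without the variable}}{\text{likelihood with the variable}}\right]$$

    $$Z = \frac{\hat\beta_j}{\widehat{se}(\hat\beta_j)}$$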

    Testing the Model
    - Likelihood ratio test (G): constant-only model versus fitted model

    . xi: logit low age lwt i.race ftv
    i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)

    Iteration 0:   log likelihood =   -117.336
    Iteration 1:   log likelihood = -111.41656
    Iteration 2:   log likelihood = -111.28677
    Iteration 3:   log likelihood = -111.28645

    Logit estimates                                   Number of obs   =        189
                                                      LR chi2(5)      =      12.10
                                                      Prob > chi2     =     0.0335
    Log likelihood = -111.28645                       Pseudo R2       =     0.0516

    Iteration 0 is the constant-only model (log likelihood -117.336); Iteration 3 is the fitted model (log likelihood -111.28645).

    $$G = -2[(-117.336) - (-111.28645)] = 12.099$$
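The overall likelihood ratio test can be re-checked outside Stata. The sketch below is an illustration in Python using only the standard library; the helper `chi2_sf` is a hypothetical name, and it uses the closed-form chi-square tail for integer degrees of freedom rather than any Stata routine.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2-1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_j x^(j-1/2) / (1*3*...*(2j-1))
    total, term = 0.0, math.sqrt(x)
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total

ll_constant_only = -117.336     # Iteration 0 in the Stata log
ll_fitted = -111.28645          # final iteration
G = -2 * (ll_constant_only - ll_fitted)
p_value = chi2_sf(G, df=5)      # 5 estimated slopes: age, lwt, _Irace_2, _Irace_3, ftv
print(round(G, 3), round(p_value, 4))
```

The degrees of freedom equal the number of coefficients added to the constant-only model, which is why Stata labels the statistic `LR chi2(5)`.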

  • 3

    Testing the Model
    - Likelihood ratio test (G): Model 1 versus Model 2

    Model 1:

    $$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(\mathrm{age}) + \beta_2(\mathrm{lwt}) + \beta_3(\mathrm{race}_{B}) + \beta_4(\mathrm{race}_{others})$$

    . use "H:\Hosmer_logistic\alr_data_Hosmer\logistic\lwt_2556.dta", clear

    . xi: logit low age lwt i.race
    i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)

    Iteration 0:   log likelihood =   -117.336
    Iteration 3:   log likelihood = -111.33032

    Logistic regression                               Number of obs   =        189
                                                      LR chi2(4)      =      12.01
                                                      Prob > chi2     =     0.0173
    Log likelihood = -111.33032                       Pseudo R2       =     0.0512

    ------------------------------------------------------------------------------
             low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |  -.0255238    .033252    -0.77   0.443    -.0906966     .039649
             lwt |  -.0143532   .0065228    -2.20   0.028    -.0271377   -.0015688
        _Irace_2 |   1.003822   .4980135     2.02   0.044     .0277335     1.97991
        _Irace_3 |   .4434608   .3602569     1.23   0.218    -.2626298    1.149551
           _cons |   1.306741   1.069782     1.22   0.222    -.7899926    3.403475
    ------------------------------------------------------------------------------

    . est store m1

    Model 2:

    $$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(\mathrm{age}) + \beta_2(\mathrm{lwt}) + \beta_3(\mathrm{race}_{B}) + \beta_4(\mathrm{race}_{others}) + \beta_5(\mathrm{ftv})$$

    . xi: logit low age lwt i.race ftv
    i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)

    Iteration 0:   log likelihood =   -117.336
    Iteration 1:   log likelihood = -111.41656
    Iteration 2:   log likelihood = -111.28677
    Iteration 3:   log likelihood = -111.28645

    Logistic regression                               Number of obs   =        189
                                                      LR chi2(5)      =      12.10
                                                      Prob > chi2     =     0.0335
    Log likelihood = -111.28645                       Pseudo R2       =     0.0516

    ------------------------------------------------------------------------------
             low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   -.023823   .0337295    -0.71   0.480    -.0899317    .0422857
             lwt |  -.0142446   .0065407    -2.18   0.029    -.0270641   -.0014251
        _Irace_2 |   1.003898   .4978579     2.02   0.044     .0281143    1.979681
        _Irace_3 |   .4331084   .3622397     1.20   0.232    -.2768684    1.143085
             ftv |  -.0493083   .1672386    -0.29   0.768    -.3770899    .2784733
           _cons |   1.295366   1.071439     1.21   0.227    -.8046157    3.395347
    ------------------------------------------------------------------------------

    . est store m2

    . lrtest m1 m2

    Likelihood-ratio test                                  LR chi2(1)  =      0.09
    (Assumption: m1 nested in m2)                          Prob > chi2 =    0.7671

    . di -2*((-111.33032)-(-111.28645))
    .08774

    . di chiprob(1,.08774)
    .76707018

    $$G = -2\ln\left[\frac{\text{likelihood without the variable}}{\text{likelihood with the variable}}\right] = -2\,[\ln L(\text{without the variable}) - \ln L(\text{with the variable})]$$
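The `lrtest m1 m2` result for dropping ftv can be reproduced by hand. This is a small Python sketch, not the Stata implementation; it relies on the identity that a chi-square variable with 1 degree of freedom is a squared standard normal, so the tail probability is `erfc(sqrt(G/2))`.

```python
import math

ll_m1 = -111.33032   # model without ftv (age, lwt, race)
ll_m2 = -111.28645   # model with ftv

# G = -2 * [ln L(without the variable) - ln L(with the variable)]
G_ftv = -2 * (ll_m1 - ll_m2)

# One coefficient (ftv) is tested, so G is referred to chi-square with 1 df.
p_ftv = math.erfc(math.sqrt(G_ftv / 2))
print(round(G_ftv, 5), round(p_ftv, 4))
```

A p-value this large suggests that ftv adds little once age, lwt and race are in the model, which agrees with its Wald test in the coefficient table.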

    Wald Test

    $$Z = \frac{\hat\beta_j}{\widehat{se}(\hat\beta_j)}$$

    ------------------------------------------------------------------------------
             low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   -.023823   .0337295    -0.71   0.480    -.0899317    .0422857
             lwt |  -.0142446   .0065407    -2.18   0.029    -.0270641   -.0014251
        _Irace_2 |   1.003898   .4978579     2.02   0.044     .0281143    1.979681
        _Irace_3 |   .4331084   .3622397     1.20   0.232    -.2768684    1.143085
             ftv |  -.0493083   .1672386    -0.29   0.768    -.3770899    .2784733
           _cons |   1.295366   1.071439     1.21   0.227    -.8046157    3.395347
    ------------------------------------------------------------------------------
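The z and P>|z| columns follow directly from the Wald formula. The following Python sketch (function name `wald_test` is an illustrative choice) recomputes them for two rows of the table, assuming a two-sided test against the standard normal distribution.

```python
from statistics import NormalDist

def wald_test(coef: float, se: float) -> tuple[float, float]:
    """Return the Wald z statistic and its two-sided p-value."""
    z = coef / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Coefficients and standard errors copied from the Stata table.
z_lwt, p_lwt = wald_test(-0.0142446, 0.0065407)
z_age, p_age = wald_test(-0.023823, 0.0337295)
print(f"lwt: z = {z_lwt:.2f}, p = {p_lwt:.3f}")
print(f"age: z = {z_age:.2f}, p = {p_age:.3f}")
```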

    Confidence Interval Estimation
    - Estimate the confidence interval of a coefficient

    $$100(1-\alpha)\%\ \text{CI of } \beta_i: \quad \hat\beta_i \pm Z_{1-\alpha/2}\,\widehat{se}(\hat\beta_i)$$

    . xi: logit low lwt i.race
    i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)

    Logistic regression                               Number of obs   =        189
                                                      LR chi2(3)      =      11.41
                                                      Prob > chi2     =     0.0097
    Log likelihood = -111.62955                       Pseudo R2       =     0.0486

    ------------------------------------------------------------------------------
             low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             lwt |  -.0152231   .0064393    -2.36   0.018    -.0278439   -.0026023
        _Irace_2 |   1.081066   .4880512     2.22   0.027     .1245034    2.037629
        _Irace_3 |   .4806033   .3566733     1.35   0.178    -.2184636     1.17967
           _cons |   .8057535   .8451625     0.95   0.340    -.8507345    2.462241
    ------------------------------------------------------------------------------
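The interval columns can be verified with the formula above. A minimal Python sketch follows; `coefficient_ci` is a hypothetical helper name, and the normal quantile comes from the standard library rather than from Stata.

```python
from statistics import NormalDist

def coefficient_ci(coef: float, se: float, level: float = 0.95) -> tuple[float, float]:
    """Wald confidence interval: coef +/- z_{1-alpha/2} * se."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return coef - z * se, coef + z * se

# lwt row of the model `logit low lwt _Irace_2 _Irace_3`
lwt_low, lwt_high = coefficient_ci(-0.0152231, 0.0064393)
print(f"{lwt_low:.7f} {lwt_high:.7f}")
```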

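    Individual Predicted Probability & Confidence Interval Estimation
    - Estimate the variance of the logit, $\hat g(x) = \sum_{i=0}^{p}\hat\beta_i x_i$ with $x_0 = 1$

    $$\widehat{Var}[\hat g(x)] = \sum_{i=0}^{p} x_i^2\,\widehat{Var}(\hat\beta_i) + \sum_{i=0}^{p}\sum_{j=i+1}^{p} 2\,x_i x_j\,\widehat{Cov}(\hat\beta_i, \hat\beta_j)$$

    $$100(1-\alpha)\%\ \text{CI of the logit}: \quad \hat g(x) \pm Z_{1-\alpha/2}\,\widehat{se}[\hat g(x)]$$

    Example: lwt=150, race=White

    $$\begin{aligned}
    \widehat{Var}[\hat g(\mathrm{lwt}=150, \mathrm{race}=\mathrm{white})] = {} & \widehat{Var}(\hat\beta_0) + (\mathrm{lwt})^2\,\widehat{Var}(\hat\beta_1) + (\mathrm{race}_{black})^2\,\widehat{Var}(\hat\beta_2) + (\mathrm{race}_{other})^2\,\widehat{Var}(\hat\beta_3) \\
    & + 2(\mathrm{lwt})\,\widehat{Cov}(\hat\beta_0,\hat\beta_1) + 2(\mathrm{race}_{black})\,\widehat{Cov}(\hat\beta_0,\hat\beta_2) + 2(\mathrm{race}_{other})\,\widehat{Cov}(\hat\beta_0,\hat\beta_3) \\
    & + 2(\mathrm{lwt})(\mathrm{race}_{black})\,\widehat{Cov}(\hat\beta_1,\hat\beta_2) + 2(\mathrm{lwt})(\mathrm{race}_{other})\,\widehat{Cov}(\hat\beta_1,\hat\beta_3) \\
    & + 2(\mathrm{race}_{black})(\mathrm{race}_{other})\,\widehat{Cov}(\hat\beta_2,\hat\beta_3)
    \end{aligned}$$

    . di .71429959 + (150^2)*(.00004146) + (0^2)*(.23819397) + (0^2)*(.12721584) + 2*150*(-.00521365) + 2*0*(.02260223) + 2*0*( -.1034968) + 2*0*(-.00064703) + 2*0*(.00035585) + 2*0*0*(.05320001)
    .08305459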
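The long `di` expression is the quadratic form x'Vx written out term by term. The sketch below does the same calculation in Python; the variances and covariances are the ones that appear in the `di` command, while the variable ordering and the helper name `logit_variance` are illustrative assumptions.

```python
# Lower triangle of the coefficient covariance matrix, ordered
# lwt, _Irace_2, _Irace_3, _cons (values copied from the `di` command).
lower = [
    [0.00004146],
    [-0.00064703, 0.23819397],
    [0.00035585, 0.05320001, 0.12721584],
    [-0.00521365, 0.02260223, -0.1034968, 0.71429959],
]
# Mirror the lower triangle to obtain the full symmetric matrix.
V = [[lower[max(i, j)][min(i, j)] for j in range(4)] for i in range(4)]

def logit_variance(x, V):
    """Var[g(x)] = sum_i sum_j x_i x_j Cov(b_i, b_j), i.e. the quadratic form x'Vx."""
    n = len(x)
    return sum(x[i] * x[j] * V[i][j] for i in range(n) for j in range(n))

x = [150, 0, 0, 1]   # lwt = 150, _Irace_2 = 0, _Irace_3 = 0 (white), constant = 1
var_logit = logit_variance(x, V)
print(round(var_logit, 8))
```

The double sum visits every off-diagonal pair twice, which is where the factor 2 in front of each covariance term comes from.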

  • 4

    . xi: logit low lwt i.race, nolog
    i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)

    Logistic regression                               Number of obs   =        189
                                                      LR chi2(3)      =      11.41
                                                      Prob > chi2     =     0.0097
    Log likelihood = -111.62955                       Pseudo R2       =     0.0486

    ------------------------------------------------------------------------------
             low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             lwt |  -.0152231   .0064393    -2.36   0.018    -.0278439   -.0026023
        _Irace_2 |   1.081066   .4880512     2.22   0.027     .1245034    2.037629
        _Irace_3 |   .4806033   .3566733     1.35   0.178    -.2184636     1.17967
           _cons |   .8057535   .8451625     0.95   0.340    -.8507345    2.462241
    ------------------------------------------------------------------------------

    . vce

    Covariance matrix of coefficients of logit model

            e(V) |        lwt    _Irace_2    _Irace_3       _cons
    -------------+------------------------------------------------
             lwt |  .00004146
        _Irace_2 | -.00064703   .23819397
        _Irace_3 |  .00035585   .05320001   .12721584
           _cons | -.00521365   .02260223   -.1034968   .71429959

    . di (-.0152231*150)+(1.081066*0)+(.4806033*0) + .8057535
    -1.4777115

    . di exp(-1.4777115)/(1+exp(-1.4777115))
    .18577333

    . prvalue, x(lwt=150 _Irace_2=0 _Irace_3=0)

    logit: Predictions for low

    Confidence intervals by delta method

                                    95% Conf. Interval
      Pr(y=1|x):          0.1858   [ 0.1003,    0.2713]
      Pr(y=0|x):          0.8142   [ 0.7287,    0.8997]

               lwt  _Irace_2  _Irace_3
    x=         150         0         0

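    $$p = \frac{e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p}}{1 + e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p}}$$

    $$p = \frac{e^{\beta_0 + \beta_1\,\mathrm{age} + \beta_2\,\mathrm{lwt} + \beta_3\,\mathrm{\_Irace\_2} + \beta_4\,\mathrm{\_Irace\_3} + \beta_5\,\mathrm{ftv}}}{1 + e^{\beta_0 + \beta_1\,\mathrm{age} + \beta_2\,\mathrm{lwt} + \beta_3\,\mathrm{\_Irace\_2} + \beta_4\,\mathrm{\_Irace\_3} + \beta_5\,\mathrm{ftv}}}$$

    Example: lwt=150, race=White

    Confidence Interval Estimation
    - Estimate the confidence interval of p, where $\hat g(x)$ is the estimated logit

    $$100(1-\alpha)\%\ \text{CI of the true logit}: \quad \hat g(x) \pm Z_{1-\alpha/2}\,\widehat{se}[\hat g(x)]$$

    $$100(1-\alpha)\%\ \text{CI of } p: \quad \frac{e^{\hat g(x) \pm Z_{1-\alpha/2}\,\widehat{se}[\hat g(x)]}}{1 + e^{\hat g(x) \pm Z_{1-\alpha/2}\,\widehat{se}[\hat g(x)]}}$$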

    . do "I:\cat2011\95ci_p_logit.do"

    . di (exp(-1.4777115-((abs(invnormal(0.025)))*sqrt(.08305459))))/(1+(exp(-1.4777115-((abs(invnormal(0.025)))*sqrt(.08305459)))))

    .11480659

    . di (exp(-1.4777115+((abs(invnormal(0.025)))*sqrt(.08305459))))/(1+(exp(-1.4777115+((abs(invnormal(0.025)))*sqrt(.08305459)))))

    .28641379
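The two `di` commands back-transform the limits of the logit interval to the probability scale. A compact Python sketch of the same steps is given here; the function names are illustrative, and the inputs are the logit (-1.4777115) and its variance (.08305459) computed earlier in the session.

```python
import math
from statistics import NormalDist

def logistic(g: float) -> float:
    """Inverse logit: p = e^g / (1 + e^g)."""
    return 1.0 / (1.0 + math.exp(-g))

def probability_ci(logit: float, var_logit: float, level: float = 0.95):
    """Point estimate and CI for p, built on the logit scale and then back-transformed."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * math.sqrt(var_logit)
    return logistic(logit), logistic(logit - half_width), logistic(logit + half_width)

p_hat, p_low, p_high = probability_ci(-1.4777115, 0.08305459)
print(f"p = {p_hat:.4f}, 95% CI = ({p_low:.4f}, {p_high:.4f})")
```

This interval is not symmetric around the point estimate and always stays inside (0, 1). It differs from the `prvalue` interval shown earlier because `prvalue` reports a delta-method interval computed on the probability scale.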

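    Interpretation of the fitted model: odds ratio
    - Dichotomous independent variable: 2 x 2 table

    Two independent variables: x1 coded 0, 1, at a fixed value of x2 (or adjusted for x2)

    $$a = P[y=1 \mid x_1=1, x_2] = \frac{e^{\beta_0 + \beta_1(1) + \beta_2 x_2}}{1 + e^{\beta_0 + \beta_1(1) + \beta_2 x_2}}$$

    $$b = P[y=0 \mid x_1=1, x_2] = 1 - \Pr[y=1 \mid x_1=1, x_2] = \frac{1}{1 + e^{\beta_0 + \beta_1(1) + \beta_2 x_2}}$$

    $$c = P[y=1 \mid x_1=0, x_2] = \frac{e^{\beta_0 + \beta_1(0) + \beta_2 x_2}}{1 + e^{\beta_0 + \beta_1(0) + \beta_2 x_2}} = \frac{e^{\beta_0 + \beta_2 x_2}}{1 + e^{\beta_0 + \beta_2 x_2}}$$

    $$d = P[y=0 \mid x_1=0, x_2] = 1 - \Pr[y=1 \mid x_1=0, x_2] = \frac{1}{1 + e^{\beta_0 + \beta_2 x_2}}$$

                  y = 1    y = 0
        x1 = 1      a        b
        x1 = 0      c        d

    $$OR = \frac{ad}{bc} = \frac{e^{\beta_0 + \beta_1 + \beta_2 x_2}}{e^{\beta_0 + \beta_2 x_2}} = e^{\beta_1}$$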

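    Multiple logistic regression with smoke and age: odds ratio of smoke, adjusted for age

    $$a = P[\mathrm{low}=1 \mid \mathrm{smoke}=1, \mathrm{age}] = \frac{e^{\beta_0 + \beta_1(1) + \beta_2\,\mathrm{age}}}{1 + e^{\beta_0 + \beta_1(1) + \beta_2\,\mathrm{age}}}$$

    $$b = P[\mathrm{low}=0 \mid \mathrm{smoke}=1, \mathrm{age}] = 1 - \Pr[\mathrm{low}=1 \mid \mathrm{smoke}=1, \mathrm{age}] = \frac{1}{1 + e^{\beta_0 + \beta_1(1) + \beta_2\,\mathrm{age}}}$$

    $$c = P[\mathrm{low}=1 \mid \mathrm{smoke}=0, \mathrm{age}] = \frac{e^{\beta_0 + \beta_1(0) + \beta_2\,\mathrm{age}}}{1 + e^{\beta_0 + \beta_1(0) + \beta_2\,\mathrm{age}}} = \frac{e^{\beta_0 + \beta_2\,\mathrm{age}}}{1 + e^{\beta_0 + \beta_2\,\mathrm{age}}}$$

    $$d = P[\mathrm{low}=0 \mid \mathrm{smoke}=0, \mathrm{age}] = 1 - \Pr[\mathrm{low}=1 \mid \mathrm{smoke}=0, \mathrm{age}] = \frac{1}{1 + e^{\beta_0 + \beta_2\,\mathrm{age}}}$$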

  • 5

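    With a = P[low=1 | smoke=1, age], b = P[low=0 | smoke=1, age] = 1 - Pr[low=1 | smoke=1, age], c = P[low=1 | smoke=0, age] and d = P[low=0 | smoke=0, age] = 1 - Pr[low=1 | smoke=0, age]:

                     low = 1    low = 0
        smoke = 1       a          b
        smoke = 0       c          d

    $$OR = \frac{ad}{bc} = \frac{\dfrac{e^{\beta_0 + \beta_1 + \beta_2\,\mathrm{age}}}{1 + e^{\beta_0 + \beta_1 + \beta_2\,\mathrm{age}}} \cdot \dfrac{1}{1 + e^{\beta_0 + \beta_2\,\mathrm{age}}}}{\dfrac{1}{1 + e^{\beta_0 + \beta_1 + \beta_2\,\mathrm{age}}} \cdot \dfrac{e^{\beta_0 + \beta_2\,\mathrm{age}}}{1 + e^{\beta_0 + \beta_2\,\mathrm{age}}}} = \frac{e^{\beta_0 + \beta_1 + \beta_2\,\mathrm{age}}}{e^{\beta_0 + \beta_2\,\mathrm{age}}} = e^{\beta_1}$$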

    Odds ratio from logistic regression
    - Adjusted odds ratio: the odds ratio for smoke=1 versus smoke=0 at a fixed age, i.e. adjusted for age

    $$OR_{adjusted} = e^{\beta_i}$$

    Odds ratio from logistic regression
    - Adjusted odds ratio: for a disease D, an exposure (E) and control variables (C), the odds ratio from logistic regression describes the association between E and D adjusted for the control variables Ci

    . logit low smoke age, or

    Iteration 0:   log likelihood =   -117.336
    Iteration 1:   log likelihood = -113.66733
    Iteration 2:   log likelihood = -113.63815
    Iteration 3:   log likelihood = -113.63815

    Logistic regression                               Number of obs   =        189
                                                      LR chi2(2)      =       7.40
                                                      Prob > chi2     =     0.0248
    Log likelihood = -113.63815                       Pseudo R2       =     0.0315

    ------------------------------------------------------------------------------
             low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           smoke |   1.997405    .642777     2.15   0.032     1.063027    3.753081
             age |   .9514394   .0304194    -1.56   0.119     .8936481    1.012968
    ------------------------------------------------------------------------------

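    Odds ratio from logistic regression
    - Adjusted for age, the odds of low birth weight among smokers are 1.997 times the odds among non-smokers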

    Polychotomous independent variable
    - More than 2 levels
    - Coded with dummy variables: a variable with k levels needs k-1 dummy variables

        level/group    dummy variable
                        D1    D2
        code=1           0     0
        code=2           1     0
        code=3           0     1

    Reference Cell: code=1

    Comparisons: code=2 VS code=1, code=3 VS code=1
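The reference-cell coding in the table can be sketched in a few lines of Python. The function `dummy_code` is a hypothetical helper written for illustration; it mirrors what `xi: ... i.race` does in Stata when the first level is omitted, but it is not Stata's implementation.

```python
def dummy_code(value, levels, reference):
    """Return k-1 indicator values for `value`, omitting the reference level."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1 if value == level else 0 for level in levels if level != reference]

levels = [1, 2, 3]          # k = 3 levels, so k-1 = 2 dummy variables
for code in levels:
    d1, d2 = dummy_code(code, levels, reference=1)
    print(f"code={code}: D1={d1} D2={d2}")
```

Each dummy coefficient then compares its level with the reference level, which is why the reference level is the row with all zeros.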

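    Three independent variables
    - x1 coded 0, 1
    - at fixed values of x2, x3; or adjusted for x2, x3

    $$a = P[y=1 \mid x_1=1, x_2, x_3] = \frac{e^{\beta_0 + \beta_1(1) + \beta_2 x_2 + \beta_3 x_3}}{1 + e^{\beta_0 + \beta_1(1) + \beta_2 x_2 + \beta_3 x_3}}$$

    $$b = P[y=0 \mid x_1=1, x_2, x_3] = 1 - P[y=1 \mid x_1=1, x_2, x_3] = \frac{1}{1 + e^{\beta_0 + \beta_1(1) + \beta_2 x_2 + \beta_3 x_3}}$$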

  • 6

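    $$c = P[y=1 \mid x_1=0, x_2, x_3] = \frac{e^{\beta_0 + \beta_1(0) + \beta_2 x_2 + \beta_3 x_3}}{1 + e^{\beta_0 + \beta_1(0) + \beta_2 x_2 + \beta_3 x_3}} = \frac{e^{\beta_0 + \beta_2 x_2 + \beta_3 x_3}}{1 + e^{\beta_0 + \beta_2 x_2 + \beta_3 x_3}}$$

    $$d = P[y=0 \mid x_1=0, x_2, x_3] = 1 - P[y=1 \mid x_1=0, x_2, x_3] = \frac{1}{1 + e^{\beta_0 + \beta_2 x_2 + \beta_3 x_3}}$$

                  y = 1    y = 0
        x1 = 1      a        b
        x1 = 0      c        d

    Here a and b are the probabilities of y=1 and y=0 when x1=1, at the same fixed x2 and x3.

    $$OR = \frac{ad}{bc} = \frac{e^{\beta_0 + \beta_1 + \beta_2 x_2 + \beta_3 x_3}}{e^{\beta_0 + \beta_2 x_2 + \beta_3 x_3}} = e^{\beta_1}$$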

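    Multiple logistic regression with age, lwt, i.race (_Irace_2, _Irace_3), ftv: odds ratio of _Irace_2, adjusted for age, lwt, _Irace_3 and ftv. On this slide β1 is the coefficient of _Irace_2.

    $$a = P[y=1 \mid \mathrm{\_Irace\_2}=1, \mathrm{age}, \mathrm{lwt}, \mathrm{\_Irace\_3}, \mathrm{ftv}] = \frac{e^{\beta_0 + \beta_1(1) + \beta_2\,\mathrm{age} + \beta_3\,\mathrm{lwt} + \beta_4\,\mathrm{\_Irace\_3} + \beta_5\,\mathrm{ftv}}}{1 + e^{\beta_0 + \beta_1(1) + \beta_2\,\mathrm{age} + \beta_3\,\mathrm{lwt} + \beta_4\,\mathrm{\_Irace\_3} + \beta_5\,\mathrm{ftv}}}$$

    $$b = P[y=0 \mid \mathrm{\_Irace\_2}=1, \mathrm{age}, \mathrm{lwt}, \mathrm{\_Irace\_3}, \mathrm{ftv}] = 1 - P[y=1 \mid \mathrm{\_Irace\_2}=1, \mathrm{age}, \mathrm{lwt}, \mathrm{\_Irace\_3}, \mathrm{ftv}] = \frac{1}{1 + e^{\beta_0 + \beta_1(1) + \beta_2\,\mathrm{age} + \beta_3\,\mathrm{lwt} + \beta_4\,\mathrm{\_Irace\_3} + \beta_5\,\mathrm{ftv}}}$$

    $$c = P[y=1 \mid \mathrm{\_Irace\_2}=0, \mathrm{age}, \mathrm{lwt}, \mathrm{\_Irace\_3}, \mathrm{ftv}] = \frac{e^{\beta_0 + \beta_1(0) + \beta_2\,\mathrm{age} + \beta_3\,\mathrm{lwt} + \beta_4\,\mathrm{\_Irace\_3} + \beta_5\,\mathrm{ftv}}}{1 + e^{\beta_0 + \beta_1(0) + \beta_2\,\mathrm{age} + \beta_3\,\mathrm{lwt} + \beta_4\,\mathrm{\_Irace\_3} + \beta_5\,\mathrm{ftv}}} = \frac{e^{\beta_0 + \beta_2\,\mathrm{age} + \beta_3\,\mathrm{lwt} + \beta_4\,\mathrm{\_Irace\_3} + \beta_5\,\mathrm{ftv}}}{1 + e^{\beta_0 + \beta_2\,\mathrm{age} + \beta_3\,\mathrm{lwt} + \beta_4\,\mathrm{\_Irace\_3} + \beta_5\,\mathrm{ftv}}}$$

    $$d = P[y=0 \mid \mathrm{\_Irace\_2}=0, \mathrm{age}, \mathrm{lwt}, \mathrm{\_Irace\_3}, \mathrm{ftv}] = 1 - P[y=1 \mid \mathrm{\_Irace\_2}=0, \mathrm{age}, \mathrm{lwt}, \mathrm{\_Irace\_3}, \mathrm{ftv}] = \frac{1}{1 + e^{\beta_0 + \beta_2\,\mathrm{age} + \beta_3\,\mathrm{lwt} + \beta_4\,\mathrm{\_Irace\_3} + \beta_5\,\mathrm{ftv}}}$$

                        y = 1    y = 0
        _Irace_2 = 1      a        b
        _Irace_2 = 0      c        d

    $$OR = \frac{ad}{bc} = \frac{e^{\beta_0 + \beta_1 + \beta_2\,\mathrm{age} + \beta_3\,\mathrm{lwt} + \beta_4\,\mathrm{\_Irace\_3} + \beta_5\,\mathrm{ftv}}}{e^{\beta_0 + \beta_2\,\mathrm{age} + \beta_3\,\mathrm{lwt} + \beta_4\,\mathrm{\_Irace\_3} + \beta_5\,\mathrm{ftv}}} = e^{\beta_1}$$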

  • 7

    Odds ratio from logistic regression
    - Adjusted odds ratio: the odds ratio for _Irace_2 (black versus white) at fixed values of age, lwt, _Irace_3 (other) and ftv, i.e. adjusted for age, lwt, _Irace_3 (other) and ftv

    $$OR_{adjusted} = e^{\beta_i}$$

    . xi: logit low age lwt i.race ftv, or
    i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)

    Iteration 0:   log likelihood =   -117.336
    Iteration 1:   log likelihood = -111.41656
    Iteration 2:   log likelihood = -111.28677
    Iteration 3:   log likelihood = -111.28645

    Logit estimates                                   Number of obs   =        189
                                                      LR chi2(5)      =      12.10
                                                      Prob > chi2     =     0.0335
    Log likelihood = -111.28645                       Pseudo R2       =     0.0516

    ------------------------------------------------------------------------------
             low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .9764586   .0329355    -0.71   0.480     .9139936    1.043193
             lwt |   .9858564   .0064482    -2.18   0.029     .9732989    .9985759
        _Irace_2 |   2.728898   1.358603     2.02   0.044     1.028513    7.240436
        _Irace_3 |   1.542043   .5585894     1.20   0.232     .7581543     3.13643
             ftv |   .9518876   .1591923    -0.29   0.768     .6858544    1.321111
    ------------------------------------------------------------------------------
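The odds-ratio table is the coefficient table passed through the exponential function. The Python sketch below checks this using the coefficients and 95% limits from the earlier `xi: logit low age lwt i.race ftv` output; the dictionary layout and function name are illustrative.

```python
import math

# (coefficient, lower 95% limit, upper 95% limit) on the logit scale
logit_rows = {
    "age":      (-0.023823,  -0.0899317, 0.0422857),
    "lwt":      (-0.0142446, -0.0270641, -0.0014251),
    "_Irace_2": (1.003898,    0.0281143, 1.979681),
    "_Irace_3": (0.4331084,  -0.2768684, 1.143085),
    "ftv":      (-0.0493083, -0.3770899, 0.2784733),
}

def to_odds_ratio(row):
    """Exponentiate a (coef, lower, upper) triple to the odds-ratio scale."""
    return tuple(math.exp(v) for v in row)

odds_ratios = {name: to_odds_ratio(row) for name, row in logit_rows.items()}
for name, (or_, low, high) in odds_ratios.items():
    print(f"{name:9s} OR = {or_:.6f}   95% CI = ({low:.6f}, {high:.6f})")
```

The z statistics and p-values are unchanged by the `or` option, because the test is still carried out on the coefficient scale.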

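    Odds ratio for a continuous independent variable
    - A change of 1 unit is often not clinically interesting, e.g. a 1 year increase in age or a 1 mm Hg increase in blood pressure
    - A change of 5 or 10 units may be more meaningful
    - Conversely, if x ranges from 0 to 1, a change of 1 is too large and a change of 0.01 may be more realistic
    - Odds ratio for a change of c units:

    $$OR(c) = e^{c\beta_1}$$

    $$95\%\ \text{CI of } OR(c) = \exp\left[c\hat\beta_1 \pm Z_{1-\alpha/2}\; c\; \widehat{se}(\hat\beta_1)\right]$$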
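A short Python sketch of the c-unit odds ratio follows. The function name and the choice of c = 10 for lwt are illustrative assumptions; the coefficient and standard error are those of lwt in the full model.

```python
import math
from statistics import NormalDist

def odds_ratio_for_change(beta: float, se: float, c: float, level: float = 0.95):
    """OR(c) = exp(c*beta) with CI exp(c*beta -/+ z * |c| * se)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * abs(c) * se
    return (math.exp(c * beta),
            math.exp(c * beta - half_width),
            math.exp(c * beta + half_width))

# lwt in the model `logit low age lwt _Irace_2 _Irace_3 ftv`, change of c = 10 units
or_10, low_10, high_10 = odds_ratio_for_change(-0.0142446, 0.0065407, 10)
print(f"OR(10) = {or_10:.4f}, 95% CI = ({low_10:.4f}, {high_10:.4f})")
```

In Stata the same quantity is obtained with `lincom`, for example `lincom 10*lwt, or`.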

    Odds ratio: factor change or percentage change in odds

    . listcoef

    logit (N=189): Factor Change in Odds

      Odds of: 1 vs 0

    ----------------------------------------------------------------------
             low |      b         z     P>|z|    e^b    e^bStdX      SDofX
    -------------+--------------------------------------------------------
             age |  -0.02382   -0.706   0.480   0.9765   0.8814     5.2987
             lwt |  -0.01424   -2.178   0.029   0.9859   0.6469    30.5794
        _Irace_2 |   1.00390    2.016   0.044   2.7289   1.4144     0.3454
        _Irace_3 |   0.43311    1.196   0.232   1.5420   1.2309     0.4796
             ftv |  -0.04931   -0.295   0.768   0.9519   0.9491     1.0593
    ----------------------------------------------------------------------

    . listcoef, percent

    logit (N=189): Percentage Change in Odds

      Odds of: 1 vs 0

    ----------------------------------------------------------------------
             low |      b         z     P>|z|      %      %StdX      SDofX
    -------------+--------------------------------------------------------
             age |  -0.02382   -0.706   0.480     -2.4    -11.9     5.2987
             lwt |  -0.01424   -2.178   0.029     -1.4    -35.3    30.5794
        _Irace_2 |   1.00390    2.016   0.044    172.9     41.4     0.3454
        _Irace_3 |   0.43311    1.196   0.232     54.2     23.1     0.4796
             ftv |  -0.04931   -0.295   0.768     -4.8     -5.1     1.0593
    ----------------------------------------------------------------------
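The percentage columns can be reproduced from the coefficients: the % column is 100(e^b - 1), and %StdX uses a change of one standard deviation of x instead of one unit. The Python sketch below assumes exactly that reading of the `listcoef` output; coefficients are taken at full precision from the logit table, SDs from the listcoef table.

```python
import math

def percent_change_in_odds(b: float, delta: float = 1.0) -> float:
    """Percentage change in the odds for an increase of `delta` in x."""
    return 100.0 * (math.exp(b * delta) - 1.0)

rows = {   # name: (coefficient b, standard deviation of x)
    "age":      (-0.023823, 5.2987),
    "lwt":      (-0.0142446, 30.5794),
    "_Irace_2": (1.003898, 0.3454),
    "_Irace_3": (0.4331084, 0.4796),
    "ftv":      (-0.0493083, 1.0593),
}
for name, (b, sd) in rows.items():
    per_unit = percent_change_in_odds(b)
    per_sd = percent_change_in_odds(b, sd)
    print(f"{name:9s} % = {per_unit:6.1f}   %StdX = {per_sd:6.1f}")
```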

    Modeling Strategy: two goals of mathematical modeling
    (1) To obtain a valid estimate of the relationship between an explanatory variable and the response variable
    (2) To obtain a good predictive model

    Different strategies for different goals
    - Prediction goal -> use computer algorithms: forward selection, backward elimination, stepwise, all possible subsets
    - Validity goal -> for etiologic research, standard computer algorithms are not appropriate because they ignore the roles that variables play, such as confounders and effect modifiers

    Model Building Strategies: Guidelines for Variable Selection
    Most parsimonious model
    - Minimizes the number of variables in the model
    - Is more likely to be numerically stable
    - Is more easily generalized

    Logistic regression model: purposeful selection (Hosmer & Lemeshow, 2000, 2013)
    1. A careful univariable analysis of each independent variable.
    2. Fit the multivariable model containing all covariates identified for inclusion in step 1.
    3. Following the fit of the smaller model (reduced model), compare the values of the estimated coefficients in the smaller model to their respective values from the larger model. Be concerned about any variable whose coefficient has changed markedly in magnitude (> 20%).
    - Any variable whose coefficient has changed markedly in magnitude should be added back into the model, as it is important in the sense of providing a needed adjustment of the effect of the variables that remain in the model.

  • 8

    Logistic regression model
    - Cycle through steps 2 and 3 until it appears that all of the important variables are included in the model and those excluded are clinically and/or statistically unimportant.
    - Hosmer et al. use the "delta-beta-hat-percent" as a measure of the change in magnitude of the coefficients, and suggest that a change of more than 20% is significant:

    $$\Delta\hat\beta\% = 100 \times \frac{\hat\beta_{reduce} - \hat\beta_{Full}}{\hat\beta_{Full}}$$

    where $\hat\beta_{reduce}$ = the coefficient from the smaller model and $\hat\beta_{Full}$ = the coefficient from the larger model.

    - p value > 0.25
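As an illustration of the delta-beta-hat-percent, the Python sketch below compares two models fitted earlier in this transcript: the larger model `low age lwt i.race` and the smaller model `low lwt i.race` (age removed). Pairing these two models is an illustrative choice made here, not a step taken in the slides.

```python
def delta_beta_hat_percent(reduced: float, full: float) -> float:
    """100 * (coefficient in smaller model - coefficient in larger model) / larger-model coefficient."""
    return 100.0 * (reduced - full) / full

# name: (coefficient in the smaller model, coefficient in the larger model)
coefficients = {
    "lwt":      (-0.0152231, -0.0143532),
    "_Irace_2": (1.081066, 1.003822),
    "_Irace_3": (0.4806033, 0.4434608),
}
changes = {name: delta_beta_hat_percent(r, f) for name, (r, f) in coefficients.items()}
for name, pct in changes.items():
    flag = "changed markedly" if abs(pct) > 20 else "stable"
    print(f"{name:9s} {pct:6.2f}%  ({flag})")
```

Under the 20% guideline, none of the remaining coefficients changes markedly when age is dropped from this model.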

    Logistic regression model
    4. Add each variable not selected in Step 1 to the model obtained at the conclusion of cycling through Step 2 and Step 3, one at a time, and check its significance either by the Wald statistic p-value or, if it is a categorical variable with more than two levels, by the partial likelihood ratio test. This step is vital for identifying variables that, by themselves, are not significantly related to the outcome but make an important contribution in the presence of other variables. We refer to the model at the end of Step 4 as the preliminary main effects model.

    Logistic regression model
    5. Examine more closely the variables in the model. The question of the appropriate categories for categorical variables should have been addressed during the univariable analysis in Step 1. For each continuous variable in this model we must check the assumption that the logit increases/decreases linearly as a function of the covariate. We refer to the model at the end of Step 5 as the main effects model.
    6. Once we have the main effects model, check for interactions among the variables in the model.
    7. Before any model becomes the final model, assess its adequacy and check its fit.

    University of Massachusetts AIDS Research Unit (UMARU) Impact Study (UIS)

        Variable   Description
        id         ID number
        age        Age at enrollment
        beck       Beck Depression Score
        ivhx       IV drug use history (1=never 2=previous 3=recent)
        ndrugtx    Number of prior drug treatments
        race       Subject's race (0=white 1=other)
        treat      Treatment randomization (0=short 1=long)
        site       Treatment site (0=A, 1=B)
        dfree      Returned to drug use (1=remained 0=otherwise)

    1. A careful univariable analysis of each independent variable
    - Univariable logistic regression (y=0,1)
    - Nominal or ordinal scale: univariable logistic regression (Wald test, likelihood ratio test), likelihood ratio chi-square, Pearson chi-square
    - Continuous: univariable logistic regression (Wald test, likelihood ratio test), t-test
    - Clinically or biologically meaningful variables
    - Univariable analysis: crude analysis, p-value

    ...
                                                      Prob > chi2     =          .
    Log likelihood = -326.86446                       Pseudo R2       =    -0.0000

    ------------------------------------------------------------------------------
           dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _cons |  -1.068691    .095599   -11.18   0.000    -1.256061     -.88132
    ------------------------------------------------------------------------------

    . estimates store A

    . logit dfree age

    Iteration 0:   log likelihood = -326.86446
    Iteration 1:   log likelihood = -326.16602
    Iteration 2:   log likelihood = -326.16544

    Logistic regression                               Number of obs   =        575
                                                      LR chi2(1)      =       1.40
                                                      Prob > chi2     =     0.2371
    Log likelihood = -326.16544                       Pseudo R2       =     0.0021

    ------------------------------------------------------------------------------
           dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .0181723    .015344     1.18   0.236    -.0119014     .048246
           _cons |  -1.660226   .5110844    -3.25   0.001    -2.661933   -.6585194
    ------------------------------------------------------------------------------

  • 9

    . logit dfree age, or

    ...
    Logistic regression                               Number of obs   =        575
                                                      LR chi2(1)      =       1.40
                                                      Prob > chi2     =     0.2371
    Log likelihood = -326.16544                       Pseudo R2       =     0.0021

    ------------------------------------------------------------------------------
           dfree | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   1.018338   .0156254     1.18   0.236     .9881691    1.049429
    ------------------------------------------------------------------------------

    . estimates store B

    . lrtest A B

    Likelihood-ratio test                                  LR chi2(1)  =      1.40
    (Assumption: A nested in B)                            Prob > chi2 =    0.2371

    . lincom 10*age, or

     ( 1)  10 age = 0

    ------------------------------------------------------------------------------
           dfree | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             (1) |   1.199282    .184018     1.18   0.236      .887795    1.620055
    ------------------------------------------------------------------------------

    *** odds ratio for a 10 year increase in age

    . logit dfree beck

    Logistic regression                               Number of obs   =        575
                                                      LR chi2(1)      =       0.64
                                                      Prob > chi2     =     0.4250
    Log likelihood = -326.54621                       Pseudo R2       =     0.0010

    ------------------------------------------------------------------------------
           dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            beck |   -.008225   .0103428    -0.80   0.426    -.0284965    .0120464
           _cons |  -.9272829   .2003166    -4.63   0.000    -1.319896   -.5346696
    ------------------------------------------------------------------------------

    . estimates store C

    . lrtest A C

    Likelihood-ratio test                                  LR chi2(1)  =      0.64
    (Assumption: A nested in C)                            Prob > chi2 =    0.4250

    . logit dfree beck, or

    Logistic regression                               Number of obs   =        575
                                                      LR chi2(1)      =       0.64
                                                      Prob > chi2     =     0.4250
    Log likelihood = -326.54621                       Pseudo R2       =     0.0010

    ------------------------------------------------------------------------------
           dfree | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            beck |   .9918087    .010258    -0.80   0.426     .9719057    1.012119
    ------------------------------------------------------------------------------

    . lincom 5*beck, or

     ( 1)  5 beck = 0

    ------------------------------------------------------------------------------
           dfree | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             (1) |    .959709   .0496302    -0.80   0.426     .8672027    1.062083
    ------------------------------------------------------------------------------

    *** odds ratio for a 5 point increase in BECK

    . logit dfree ndrugtx

    Logistic regression                               Number of obs   =        575
                                                      LR chi2(1)      =      11.84
                                                      Prob > chi2     =     0.0006
    Log likelihood = -320.94485                       Pseudo R2       =     0.0181

    ------------------------------------------------------------------------------
           dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         ndrugtx |  -.0749582    .024681    -3.04   0.002     -.123332   -.0265844
           _cons |  -.7677805    .130326    -5.89   0.000    -1.023215   -.5123462
    ------------------------------------------------------------------------------

    . logit dfree ndrugtx, or

    ...
    Logistic regression                               Number of obs   =        575
                                                      LR chi2(1)      =      11.84
                                                      Prob > chi2     =     0.0006
    Log likelihood = -320.94485                       Pseudo R2       =     0.0181

    ------------------------------------------------------------------------------
           dfree | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         ndrugtx |   .9277822   .0228986    -3.04   0.002     .8839701    .9737658
    ------------------------------------------------------------------------------

    . estimates store D

    . lrtest A D

    Likelihood-ratio test                                  LR chi2(1)  =     11.84
    (Assumption: A nested in D)                            Prob > chi2 =    0.0006

    . xi:logit dfree i.ivhx
    i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)

    Logistic regression                               Number of obs   =        575
                                                      LR chi2(2)      =      13.35
                                                      Prob > chi2     =     0.0013
    Log likelihood = -320.18821                       Pseudo R2       =     0.0204

    ------------------------------------------------------------------------------
           dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        _Iivhx_2 |  -.4810199   .2657063    -1.81   0.070    -1.001795    .0397548
        _Iivhx_3 |  -.7748382   .2165765    -3.58   0.000     -1.19932   -.3503561
           _cons |  -.6797242   .1417395    -4.80   0.000    -.9575285   -.4019198
    ------------------------------------------------------------------------------

    . xi:logit dfree i.ivhx, or
    i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)

    ...
    Logistic regression                               Number of obs   =        575
                                                      LR chi2(2)      =      13.35
                                                      Prob > chi2     =     0.0013
    Log likelihood = -320.18821                       Pseudo R2       =     0.0204

    ------------------------------------------------------------------------------
           dfree | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        _Iivhx_2 |   .6181526    .164247    -1.81   0.070     .3672198    1.040556
        _Iivhx_3 |   .4607783   .0997937    -3.58   0.000      .301399    .7044372
    ------------------------------------------------------------------------------

    . estimates store E

    . lrtest A E

    Likelihood-ratio test                                  LR chi2(2)  =     13.35
    (Assumption: A nested in E)                            Prob > chi2 =    0.0013

    . logit dfree race

    ...
    Logistic regression                               Number of obs   =        575
                                                      LR chi2(1)      =       4.62
                                                      Prob > chi2     =     0.0315
    Log likelihood = -324.55269                       Pseudo R2       =     0.0071

    ------------------------------------------------------------------------------
           dfree |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            race |   .4591026   .2109763     2.18   0.030     .0455967    .8726085
           _cons |  -1.193922   .1141504   -10.46   0.000    -1.417653   -.9701919
    ------------------------------------------------------------------------------

    . logit dfree race, or

    ...
    Logistic regression                               Number of obs   =        575
                                                      LR chi2(1)      =       4.62
                                                      Prob > chi2     =     0.0315
    Log likelihood = -324.55269                       Pseudo R2       =     0.0071

    ------------------------------------------------------------------------------
           dfree | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            race |   1.582653   .3339022     2.18   0.030     1.046652    2.393145
    ------------------------------------------------------------------------------

    . estimates store F

    . lrtest A F

    Likelihood-ratio test                                  LR chi2(1)  =      4.62
    (Assumption: A nested in F)                            Prob > chi2 =    0.0315

  • 10

    . logit dfree treat
    ...
    Logistic regression                               Number of obs   =        575
                                                      LR chi2(1)      =       5.18
                                                      Prob > chi2     =     0.0229
    Log likelihood = -324.27534                       Pseudo R2       =     0.0079
    ------------------------------------------------------------------------------
           dfree |      Coef.  Std. Err.        z   P>|z|   [95% Conf.   Interval]
    -------------+----------------------------------------------------------------
           treat |    .437162   .1930633     2.26   0.024     .0587649    .8155591
           _cons |  -1.297816    .143296    -9.06   0.000    -1.578671   -1.016961
    ------------------------------------------------------------------------------

    . logit dfree treat, or
    ...
    Logistic regression                               Number of obs   =        575
                                                      LR chi2(1)      =       5.18
                                                      Prob > chi2     =     0.0229
    Log likelihood = -324.27534                       Pseudo R2       =     0.0079
    ------------------------------------------------------------------------------
           dfree | Odds Ratio  Std. Err.        z   P>|z|   [95% Conf.   Interval]
    -------------+----------------------------------------------------------------
           treat |   1.548307   .2989212     2.26   0.024     1.060526    2.260439
    ------------------------------------------------------------------------------

    . estimates store G

    . lrtest A G

    Likelihood-ratio test                                 LR chi2(1)  =       5.18
    (Assumption: A nested in G)                           Prob > chi2 =     0.0229
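The `lrtest` statistic can be reproduced by hand from two log likelihoods. The sketch below assumes that stored model A is the intercept-only model, whose log likelihood equals the "Iteration 0" value (-326.86446) printed in the iteration logs of this dataset; that identification is an assumption, since model A is stored outside this section. The helper `lr_test_1df` is hypothetical.

```python
import math

def lr_test_1df(ll_reduced, ll_full):
    """Likelihood-ratio statistic G and its chi-square p-value for 1 df."""
    g = -2.0 * (ll_reduced - ll_full)
    # For 1 degree of freedom, P(chi2 > g) = erfc(sqrt(g / 2)).
    return g, math.erfc(math.sqrt(g / 2.0))

LL_NULL = -326.86446   # assumed: intercept-only model A (iteration 0 value)
LL_TREAT = -324.27534  # model G: dfree regressed on treat

g, p = lr_test_1df(LL_NULL, LL_TREAT)
print(f"G = {g:.2f}, p = {p:.4f}")
```

With one added parameter the test has 1 degree of freedom, which is why the closed form with `erfc` is enough here.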

    . logit dfree site
    ...
    Logistic regression                               Number of obs   =        575
                                                      LR chi2(1)      =       1.67
                                                      Prob > chi2     =     0.1968
    Log likelihood =  -326.0315                       Pseudo R2       =     0.0025
    ------------------------------------------------------------------------------
           dfree |      Coef.  Std. Err.        z   P>|z|   [95% Conf.   Interval]
    -------------+----------------------------------------------------------------
            site |   .2642236   .2034167     1.30   0.194    -.1344658     .662913
           _cons |   -1.15268   .1170732    -9.85   0.000    -1.382139   -.9232202
    ------------------------------------------------------------------------------

    . logit dfree site, or
    ...
    Logistic regression                               Number of obs   =        575
                                                      LR chi2(1)      =       1.67
                                                      Prob > chi2     =     0.1968
    Log likelihood =  -326.0315                       Pseudo R2       =     0.0025
    ------------------------------------------------------------------------------
           dfree | Odds Ratio  Std. Err.        z   P>|z|   [95% Conf.   Interval]
    -------------+----------------------------------------------------------------
            site |   1.302419   .2649338     1.30   0.194     .8741828    1.940437
    ------------------------------------------------------------------------------

    . estimates store H

    . lrtest A H

    Likelihood-ratio test                                 LR chi2(1)  =       1.67
    (Assumption: A nested in H)                           Prob > chi2 =     0.1968

    Simple logistic regression

    Variable     Coef.      SE       OR      95% CI        Likelihood ratio   p value
    age          0.018    0.0153    1.018    0.99, 1.05          1.40          0.2371
    beck        -0.008    0.0103    0.992    0.97, 1.01          0.64          0.4250
    ndrgtx      -0.075    0.0247    0.928    0.88, 0.97         11.84          0.0006
    ivhx2       -0.481    0.2657    0.618    0.37, 1.04         13.35          0.0013
    ivhx3       -0.775    0.2166    0.460    0.30, 0.70
    race         0.459    0.2109    1.583    1.05, 2.39          4.62          0.0315
    treat        0.437    0.1931    1.548    1.06, 2.26          5.18          0.0229
    site         0.264    0.2034    1.302    0.87, 1.94          1.67          0.197

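    beck has p value 0.426 (> 0.25), so beck is not entered in the initial multivariable model.

    - Variables with p value > 0.25 are checked with "delta-beta-hat-percent" before being dropped.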

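    2. Fit of the multivariable model

    $$\Delta\hat{\beta}\% = \frac{\hat{\beta}_{\text{reduced}} - \hat{\beta}_{\text{full}}}{\hat{\beta}_{\text{full}}} \times 100$$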

    . use "K:\hosmer_data\logistic\uis.dta", clear

    . xi:logit dfree age ndrugtx i.ivhx race treat site
    i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)
    Iteration 0:   log likelihood = -326.86446
    Iteration 1:   log likelihood = -310.17928
    Iteration 2:   log likelihood = -309.62871
    Iteration 3:   log likelihood = -309.62413
    Iteration 4:   log likelihood = -309.62413
    Logistic regression                               Number of obs   =        575
                                                      LR chi2(7)      =      34.48
                                                      Prob > chi2     =     0.0000
    Log likelihood = -309.62413                       Pseudo R2       =     0.0527
    ------------------------------------------------------------------------------
           dfree |      Coef.  Std. Err.        z   P>|z|   [95% Conf.   Interval]
    -------------+----------------------------------------------------------------
             age |   .0503708   .0173224     2.91   0.004     .0164196     .084322
         ndrugtx |  -.0615121   .0256311    -2.40   0.016    -.1117481   -.0112761
        _Iivhx_2 |  -.6033296   .2872511    -2.10   0.036    -1.166331   -.0403278
        _Iivhx_3 |   -.732722    .252329    -2.90   0.004    -1.227278   -.2381662
            race |   .2261295   .2233399     1.01   0.311    -.2116087    .6638677
           treat |   .4425031   .1992909     2.22   0.026     .0519002    .8331061
            site |   .1485845   .2172121     0.68   0.494    -.2771434    .5743125
           _cons |  -2.405405   .5548058    -4.34   0.000    -3.492805   -1.318006
    ------------------------------------------------------------------------------

    - Wald p values: race p value 0.311; site p value 0.494; both race and site have p value > 0.25.

    * p values can be taken from the Wald test or from the likelihood ratio test.

  • 11

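    - Confounding (confound)

    - Hosmer et al.: "delta-beta-hat-percent", suggest a significant change >20%.

    - Purposeful selection: "delta-beta-hat-percent"

    $$\text{Change in coefficient}\ (\%) = \frac{\hat{\beta}_{\text{reduced model}} - \hat{\beta}_{\text{full model}}}{\hat{\beta}_{\text{full model}}} \times 100$$

    $$\text{Change in odds ratio}\ (\%) = \frac{EE_{\text{reduced model}} - EE_{\text{full model}}}{EE_{\text{full model}}} \times 100$$

    where EE is the effect estimate (here the odds ratio).

    * Kleinbaum, Kupper, Morgenstern (1982); Greenland (1989); Mickey & Greenland (1989): change in effect estimates (odds ratio) of 10%.

    Main effect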

    . xi:logit dfree age ndrugtx i.ivhx race treat
    i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)
    ...
    Logistic regression                               Number of obs   =        575
                                                      LR chi2(6)      =      34.02
                                                      Prob > chi2     =     0.0000
    Log likelihood =  -309.8567                       Pseudo R2       =     0.0520
    ------------------------------------------------------------------------------
           dfree |      Coef.  Std. Err.        z   P>|z|   [95% Conf.   Interval]
    -------------+----------------------------------------------------------------
             age |   .0509605    .017309     2.94   0.003     .0170354    .0848856
         ndrugtx |  -.0631998   .0256525    -2.46   0.014    -.1134778   -.0129219
        _Iivhx_2 |  -.5928725   .2864333    -2.07   0.038    -1.154272   -.0314735
        _Iivhx_3 |  -.7600441   .2489941    -3.05   0.002    -1.248064   -.2720245
            race |   .2081089    .221453     0.94   0.347    -.2259309    .6421488
           treat |    .438959   .1991429     2.20   0.028     .0486461     .829272
           _cons |  -2.355786   .5501049    -4.28   0.000    -3.433972     -1.2776
    ------------------------------------------------------------------------------

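    - Model without site (site removed from the full model)

    Variables     Full model   Reduced model   Delta beta hat (%)
    age              0.05037         0.05096              1.17072
    ndrugtx         -0.06151        -0.06320              2.74369
    _Iivhx_2        -0.60333        -0.59287             -1.73323
    _Iivhx_3        -0.73272        -0.76004              3.72885
    race             0.22613         0.20811             -7.96915
    site             0.14859               -                    -
    treat            0.44250         0.43896             -0.80092

    $$\text{Delta beta hat}\ (\%) = \frac{\hat{\beta}_{\text{reduced model}} - \hat{\beta}_{\text{full model}}}{\hat{\beta}_{\text{full model}}} \times 100$$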
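The delta-beta-hat-percent column can be recomputed directly from the two sets of coefficients. A short Python sketch, using the coefficients of the full model (with site) and the reduced model (without site) from the Stata output; the dictionary layout and the function name are illustrative choices, and the 20% cut-off is the one attributed to Hosmer et al. in these slides:

```python
def delta_beta_hat_percent(full, reduced):
    """Percent change in each coefficient when moving from the full to the reduced model."""
    return {name: 100.0 * (reduced[name] - full[name]) / full[name]
            for name in reduced}

# Coefficients copied from the two logit outputs (full model includes site).
full = {"age": 0.0503708, "ndrugtx": -0.0615121, "_Iivhx_2": -0.6033296,
        "_Iivhx_3": -0.732722, "race": 0.2261295, "treat": 0.4425031,
        "site": 0.1485845}
reduced = {"age": 0.0509605, "ndrugtx": -0.0631998, "_Iivhx_2": -0.5928725,
           "_Iivhx_3": -0.7600441, "race": 0.2081089, "treat": 0.438959}

delta = delta_beta_hat_percent(full, reduced)
# Variables whose coefficient moves by more than 20% would suggest confounding by site.
flagged = [name for name, pct in delta.items() if abs(pct) > 20.0]

for name, pct in delta.items():
    print(f"{name:10s} {pct:9.5f}")
print("flagged:", flagged)
```

No coefficient changes by more than 20% here, which is consistent with dropping site from the model.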

    ...
                                                      Prob > chi2     =     0.0000
    Log likelihood =  -309.6238                       Pseudo R2       =     0.0527
    ------------------------------------------------------------------------------
           dfree |      Coef.  Std. Err.        z   P>|z|   [95% Conf.   Interval]
    -------------+----------------------------------------------------------------
             age |   .0504143   .0174058     2.90   0.004     .0162995     .084529
         ndrugtx |  -.0615329   .0256457    -2.40   0.016    -.1117975   -.0112682
        _Iivhx_2 |  -.6036962   .2875987    -2.10   0.036    -1.167379   -.0400131
        _Iivhx_3 |  -.7336591   .2549904    -2.88   0.004    -1.233431   -.2338871
            race |   .2260262   .2233692     1.01   0.312    -.2117694    .6638218
           treat |   .4424802   .1992933     2.22   0.026     .0518725     .833088
            site |   .1489209   .2176073     0.68   0.494    -.2775816    .5754234
            beck |   .0002759   .0107983     0.03   0.980    -.0208883    .0214402
           _cons |  -2.411128   .5983465    -4.03   0.000    -3.583866   -1.238391
    ------------------------------------------------------------------------------

    4: Add each variable not selected in Step 1 to the model

    Checking linearity in the logit for continuous variables:
    - Smoothed scatter plots: plot the smoothed logit against the continuous variable
    - Design variables: plot the coefficients against the continuous variable grouped into 4 quartiles
    - Fractional polynomials
    - Spline function

    5. Examine more closely the variables in the model.

    . do "G:\hosmer_data\logistic\plot_smooth_logit_age.do"
    . lowess dfree age, gen(var3) logit nodraw
    . graph twoway line var3 age, sort xlabel(20(10)50 56)

    - Plot of the smoothed logit against the continuous variable

  • 12

    - Plot of the coefficients against the continuous variable grouped into 4 quartiles

    . xtile age1 = age, nq(4)
    . tabstat age, statistics(median) by(age1) columns(variables)

    Summary statistics: p50
      by categories of: age1 (4 quantiles of age)

        age1 |       age
    ---------+----------
           1 |        25
           2 |        30
           3 |        35
           4 |        40
    ---------+----------
       Total |        32
    --------------------

    . xi: logit dfree i.age1 ndrugtx i.ivhx race treat site
    ...
    Logistic regression                               Number of obs   =        575
                                                      LR chi2(9)      =      34.69
                                                      Prob > chi2     =     0.0001
    Log likelihood = -309.52103                       Pseudo R2       =     0.0531
    ------------------------------------------------------------------------------
           dfree |      Coef.  Std. Err.        z   P>|z|   [95% Conf.   Interval]
    -------------+----------------------------------------------------------------
        _Iage1_2 |   -.165864   .2909137    -0.57   0.569    -.7360444    .4043163
        _Iage1_3 |   .4693399     .27066     1.73   0.083    -.0611439    .9998237
        _Iage1_4 |    .595771   .3124964     1.91   0.057    -.0167108    1.208253
         ndrugtx |  -.0587551   .0254688    -2.31   0.021     -.108673   -.0088371
        _Iivhx_2 |  -.5545193   .2853626    -1.94   0.052     -1.11382    .0047811
        _Iivhx_3 |  -.6725536   .2518601    -2.67   0.008     -1.16619   -.1789169
            race |   .2787172   .2238499     1.25   0.213    -.1600205    .7174549
           treat |   .4430577   .2000427     2.21   0.027     .0509812    .8351343
            site |   .1582001   .2188293     0.72   0.470    -.2706974    .5870976
           _cons |  -1.054837   .2705875    -3.90   0.000    -1.585179   -.5244956
    ------------------------------------------------------------------------------

    . clear

    . input age coef
               age       coef
      1. 25 0
      2. 30 -.165864
      3. 35 .4693399
      4. 40 .595771
      5. end

    . graph twoway scatter coef age, connect(l) ylabel(-.25(.25).75) xlabel(20(10)50) yline(0)

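    Fractional polynomial
    - Fractional polynomials for relating a continuous variable to the outcome: Royston & Altman (1994)
    - Used to check linearity and to model non-linearity through a power transformation
    - First-order fractional polynomial: fp1, etc.

    Fractional polynomial
    - x is transformed to x^p, with p chosen from -2, -1, -0.5, 0, 0.5, 1, 2, 3
    - p = 0 means x^p is taken as log x; this gives 8 first-order models
    - Second-order fractional polynomial (fp2): two terms x^p1 and x^p2, with p1 and p2 from the same set; when p1 = p2 = p the terms are x^p and x^p log x

    Power of first- and second-order fractional polynomial

    FP1 power p:  -2, -1, -0.5, 0, 0.5, 1, 2, 3

    FP2 powers (p1, p2):
    p1 = -2:    p2 = -2, -1, -0.5, 0, 0.5, 1, 2, 3
    p1 = -1:    p2 = -1, -0.5, 0, 0.5, 1, 2, 3
    p1 = -0.5:  p2 = -0.5, 0, 0.5, 1, 2, 3
    p1 = 0:     p2 = 0, 0.5, 1, 2, 3
    p1 = 0.5:   p2 = 0.5, 1, 2, 3
    p1 = 1:     p2 = 1, 2, 3
    p1 = 2:     p2 = 2, 3
    p1 = 3:     p2 = 3

    First order (FP1): 8 powers. Second order (FP2): 36 pairs of powers (72 power values in the table), so 8 + 36 = 44 models in total.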
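The candidate models can be enumerated mechanically. The Python sketch below lists the FP1 powers and the FP2 power pairs from the set given above, and builds the transformed terms for a positive value x. The repeated-power rule (x^p and x^p ln x) follows the `X^-1*ln(X)` term that Stata generates for powers -1 -1 in the fracpoly output of this deck; the function name `fp_terms` is hypothetical.

```python
import math
from itertools import combinations_with_replacement

FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

# FP1: one power; FP2: unordered pairs, repeated powers allowed.
fp1_models = [(p,) for p in FP_POWERS]
fp2_models = list(combinations_with_replacement(FP_POWERS, 2))

def fp_terms(x, powers):
    """Fractional polynomial terms for a positive x.

    p = 0 stands for ln(x); a repeated power multiplies the previous term by ln(x).
    """
    terms, prev = [], None
    for p in powers:
        if p == prev:
            terms.append(terms[-1] * math.log(x))
        else:
            terms.append(math.log(x) if p == 0 else x ** p)
        prev = p
    return terms

print(len(fp1_models), len(fp2_models), len(fp1_models) + len(fp2_models))
print(fp_terms(2.0, (-1, -1)))
```

The count of 8 + 36 = 44 agrees with the "among 44 models fit" message that Stata prints for `degree(2)`.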

  • 13

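    Fractional polynomial: scaling and centering
    - Before the fractional polynomial transformation, the variable is scaled and centered.

    Scaling (Stata):

    $$\text{lrange} = \log_{10}[\max(x) - \min(x)], \qquad \text{scale} = 10^{\operatorname{sign}(\text{lrange}) \cdot \operatorname{int}(|\text{lrange}|)}, \qquad x^{*} = x / \text{scale}$$

    Centering, FP1:

    $$y_i = \beta_0 + \beta_1 x_i \quad\longrightarrow\quad y_i = \beta_0^{*} + \beta_1^{*}\left(x_i^{*\,p_1} - \bar{x}^{*\,p_1}\right), \qquad \bar{x}^{*} = \frac{1}{n}\sum_{i=1}^{n} x_i^{*}$$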
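The scaling rule and the centering constants can be checked numerically. In the Python sketch below, the age range 20 to 56 is an assumption taken from the axis labels `xlabel(20(10)50 56)` used in the plots, and the mean age 32.3826087 is the centering value Stata prints for this dataset (`age-32.3826087`). The function name `fp_scale` is hypothetical.

```python
import math

def fp_scale(x_min, x_max):
    """Scale factor 10^(sign(lrange) * int(|lrange|)), lrange = log10(max - min)."""
    lrange = math.log10(x_max - x_min)
    sign = (lrange > 0) - (lrange < 0)
    return 10.0 ** (sign * int(abs(lrange)))

# Assumed range of age (20 to 56) and the mean age reported by Stata.
scale = fp_scale(20, 56)
x_bar = 32.3826087 / scale

# Centering constants for the two FP2 terms of age (powers -2 and 3).
center_fpm2 = x_bar ** -2
center_fp3 = x_bar ** 3

print(scale, round(center_fpm2, 10), round(center_fp3, 8))
```

These values correspond to the generated variables `Iage__1 = X^-2-.0953622163` and `Iage__2 = X^3-33.95748331` with `X = age/10` in the fracpoly output.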

    Fractional polynomial: deviance
    - Models are compared by the difference in deviance, which follows a chi-square distribution with df = df(model 2) - df(model 1)

    $$G(1, p_1) = -2\{L(1) - L(p_1)\} \qquad (df = 1)$$

    $$G(p_1, (p_1, p_2)) = -2\{L(p_1) - L(p_1, p_2)\} \qquad (df = 2)$$

    - Fractional polynomials as an alternative to cut points (median, quartile): Heinzl H., 2000; Royston P., Altman D.G., Sauerbrei W., 2006

    Fractional polynomial

    . use "H:\hosmer_data\logistic\uis.dta", clear

    . xi:fracpoly logit dfree age ndrugtx i.ivhx race treat site, degree(2) compare
    i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)
    -> gen double Indru__1 = ndrugtx-4.542608696 if e(sample)
    ........
    -> gen double Iage__1 = X^-2-.0953622163 if e(sample)
    -> gen double Iage__2 = X^3-33.95748331 if e(sample)
       (where: X = age/10)
    Iteration 0:   log likelihood = -326.86446
    Iteration 4:   log likelihood = -309.38436
    Logistic regression                               Number of obs   =        575
                                                      LR chi2(8)      =      34.96
                                                      Prob > chi2     =     0.0000
    Log likelihood = -309.38436                       Pseudo R2       =     0.0535
    ------------------------------------------------------------------------------
           dfree |      Coef.  Std. Err.        z   P>|z|   [95% Conf.   Interval]
    -------------+----------------------------------------------------------------
         Iage__1 |  -1.538626   4.575934    -0.34   0.737    -10.50729     7.43004
         Iage__2 |   .0116581   .0080977     1.44   0.150    -.0042132    .0275293
        Indru__1 |  -.0620596   .0257223    -2.41   0.016    -.1124744   -.0116447
        _Iivhx_2 |  -.6057376   .2881578    -2.10   0.036    -1.170517   -.0409587
        _Iivhx_3 |  -.7263554   .2525832    -2.88   0.004    -1.221409   -.2313014
            race |   .2282107    .224089     1.02   0.308    -.2109957    .6674171
           treat |   .4392589   .1996983     2.20   0.028     .0478573    .8306604
            site |   .1459101    .217491     0.67   0.502    -.2803644    .5721846
           _cons |  -1.082342   .2416317    -4.48   0.000    -1.555931   -.6087524
    ------------------------------------------------------------------------------
    Deviance: 618.77. Best powers of age among 44 models fit: -2 3.

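    1. The power=1 model (age entered as linear) is better than the model without age (p-value=.003; df=1-0).

    2. The FP2 model for age (age^-2, age^3) is not significantly better than linear age (Dev. dif.=619.248-618.769=0.480; p-value=0.923; df=4-1).

    3. The FP2 model (age^-2, age^3) is not significantly better than the FP1 model (age^3) (Dev. dif.=618.882-618.769=0.113, shown as 0.114 in the Stata table; p-value=0.945; df=4-2).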

    First order (FP1), second order (FP2)

    Fractional polynomial model comparisons:
    ---------------------------------------------------------------
    age             df       Deviance     Dev. dif.   P (*)   Powers
    ---------------------------------------------------------------
    Not in model     0        627.801       9.032     0.060
    Linear           1        619.248       0.480     0.923   1
    m = 1            2        618.882       0.114     0.945   3
    m = 2            4        618.769          --        --   -2 3
    ---------------------------------------------------------------
    (*) P-value from deviance difference comparing reported model with m = 2 model

    . di chiprob(4-1,619.248-618.769)
    .9234802

    . di chiprob(4-2,618.882-618.769)
    .94506648
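The `chiprob` values can be reproduced outside Stata. The sketch below is a plain-Python upper-tail chi-square probability for integer degrees of freedom, written with the standard closed-form series; it is offered as a cross-check under that assumption, not as Stata's own implementation. The function name `chi2_sf` is hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability P(chi2_df > x) for integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        q, a = math.exp(-y), 1.0               # df = 2
    else:
        q, a = math.erfc(math.sqrt(y)), 0.5    # df = 1
    # Each extra 2 df adds the term y^a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Deviances from the fracpoly comparison table for age.
dev_linear, dev_fp1, dev_fp2 = 619.248, 618.882, 618.769

p_linear_vs_fp2 = chi2_sf(dev_linear - dev_fp2, 4 - 1)
p_fp1_vs_fp2 = chi2_sf(dev_fp1 - dev_fp2, 4 - 2)
p_linear_vs_fp1 = chi2_sf(dev_linear - dev_fp1, 2 - 1)

print(p_linear_vs_fp2, p_fp1_vs_fp2, p_linear_vs_fp1)
```

All three p-values are far above 0.05, so neither FP1 nor FP2 improves on linear age.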

  • 14

    Linear model versus fractional polynomial (FP2)

    G(1,(p1, p2)) = -2{L(1) - L(p1, p2)} = 619.248 - 618.769 = 0.480; p-value = 0.923

    STATA10

    Fractional polynomial model comparisons:
    ---------------------------------------------------------------
    age             df       Deviance     Dev. dif.   P (*)   Powers
    ---------------------------------------------------------------
    Not in model     0        627.801       9.032     0.060
    Linear           1        619.248       0.480     0.923   1
    m = 1            2        618.882       0.114     0.945   3
    m = 2            4        618.769          --        --   -2 3
    ---------------------------------------------------------------
    (*) P-value from deviance difference comparing reported model with m = 2 model

    Linear model versus first-order fractional polynomial (FP1)

    G(1,p1) = -2{L(1) - L(p1)} = 619.248 - 618.882 = .366; p-value = .545

    First order m=1 (FP1), second order m=2 (FP2)

    . use "I:\hosmer_data\logistic\uis.dta", clear

    . xi:fracpoly logit dfree age ndrugtx i.ivhx race treat site, degree(1) compare
    i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)
    -> gen double Indru__1 = ndrugtx-4.542608696 if e(sample)
    -> gen double Iage__1 = X^3-33.95748331 if e(sample)
       (where: X = age/10)
    Iteration 0:   log likelihood = -326.86446
    ...
    ------------------------------------------------------------------------------
           dfree |      Coef.  Std. Err.        z   P>|z|   [95% Conf.   Interval]
    -------------+----------------------------------------------------------------
         Iage__1 |   .0138939   .0046486     2.99   0.003     .0047829     .023005
        Indru__1 |  -.0620649   .0257325    -2.41   0.016    -.1124997   -.0116301
        _Iivhx_2 |  -.5960999   .2868616    -2.08   0.038    -1.158338   -.0338615
        _Iivhx_3 |   -.714141   .2499592    -2.86   0.004    -1.204052     -.22423
            race |   .2355037   .2230028     1.06   0.291    -.2015736    .6725811
           treat |   .4348659   .1992503     2.18   0.029     .0443425    .8253893
            site |   .1436801   .2173756     0.66   0.509    -.2823683    .5697285
           _cons |  -1.113293   .2236989    -4.98   0.000    -1.551734   -.6748509
    ------------------------------------------------------------------------------
    Deviance: 618.88. Best powers of age among 8 models fit: 3.

    Fractional polynomial model comparisons:
    ---------------------------------------------------------------
    age             df       Deviance     Dev. dif.   P (*)   Powers
    ---------------------------------------------------------------
    Not in model     0        627.801       8.918     0.012
    Linear           1        619.248       0.366     0.545   1
    m = 1            2        618.882          --        --   3
    ---------------------------------------------------------------
    (*) P-value from deviance difference comparing reported model with m = 1 model

    . di chiprob(2-1,619.248-618.882)
    .54519273


    ndrugtx

    . lowess dfree ndrugtx , gen(var2) logit nodraw
    . graph twoway line var2 ndrugtx, sort xlabel(20(10)50 56)

    . xi:fracpoly logit dfree ndrugtx age i.ivhx race treat site, degree(2) compare
    i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)
    -> gen double Iage__1 = age-32.3826087 if e(sample)
    ........
    -> gen double Indru__1 = X^-1-1.804204581 if e(sample)
    -> gen double Indru__2 = X^-1*ln(X)+1.064696882 if e(sample)
       (where: X = (ndrugtx+1)/10)
    Iteration 0:   log likelihood = -326.86446
    Iteration 1:   log likelihood = -307.22312
    Iteration 2:   log likelihood = -306.72663
    Iteration 3:   log likelihood = -306.72558
    Logistic regression                               Number of obs   =        575
                                                      LR chi2(8)      =      40.28
                                                      Prob > chi2     =     0.0000
    Log likelihood = -306.72558                       Pseudo R2       =     0.0616
    ------------------------------------------------------------------------------
           dfree |      Coef.  Std. Err.        z   P>|z|   [95% Conf.   Interval]
    -------------+----------------------------------------------------------------
        Indru__1 |    .981453   .2888474     3.40   0.001     .4153226    1.547583
        Indru__2 |   .3611251   .1098589     3.29   0.001     .1458057    .5764445
         Iage__1 |   .0544455   .0174877     3.11   0.002     .0201703    .0887208
        _Iivhx_2 |  -.6088269   .2911064    -2.09   0.036    -1.179385   -.0382689
        _Iivhx_3 |  -.7238122   .2555643    -2.83   0.005    -1.224709   -.2229154
            race |   .2477026   .2242152     1.10   0.269    -.1917512    .6871564
           treat |   .4223666    .200365     2.11   0.035     .0296584    .8150748
            site |   .1732142   .2209758     0.78   0.433    -.2598905    .6063189
           _cons |  -1.164471   .2454818    -4.74   0.000    -1.645607   -.6833356
    ------------------------------------------------------------------------------
    Deviance: 613.45. Best powers of ndrugtx among 44 models fit: -1 -1.

  • 15

    Fractional polynomial model comparisons:
    ---------------------------------------------------------------
    ndrugtx         df       Deviance     Dev. dif.   P (*)   Powers
    ---------------------------------------------------------------
    Not in model     0        626.176      12.725     0.013
    Linear           1        619.248       5.797     0.122   1
    m = 1            2        618.818       5.367     0.068   .5
    m = 2            4        613.451          --        --   -1 -1
    ---------------------------------------------------------------
    (*) P-value from deviance difference comparing reported model with m = 2 model

    STATA 8

    Fractional polynomial model comparisons:
    ---------------------------------------------------------------
    ndrugtx         df       Deviance       Gain    P(term)   Powers
    ---------------------------------------------------------------
    Not in model     0        626.176         --        --
    Linear           1        619.248       0.000     0.008   1
    m = 1            2        618.818       0.430     0.512   .5
    m = 2            4        613.451       5.797     0.068   -1 -1
    ---------------------------------------------------------------

    Linear model?

    G(1,(p1, p2)) = -2{L(1) - L(p1, p2)}
    G = 619.248 - 613.451 = 5.797; p-value = 0.122

    Multivariable Fractional Polynomial (mfp)

    . xi:mfp logit dfree ndrugtx age i.ivhx race treat site
    i.ivhx            _Iivhx_1-3          (naturally coded; _Iivhx_1 omitted)

    Deviance for model with all terms untransformed = 619.248, 575 observations

    Variable     Model (vs.)   Deviance   Dev diff.     P      Powers   (vs.)
    ----------------------------------------------------------------------
    age          lin.   FP2     619.248      0.480    0.923    1        -2 3
                 Final          619.248                        1

    [_Iivhx_3 included with 1 df in model]

    ndrugtx      lin.   FP2     619.248      5.797    0.122    1        -1 -1
                 Final          619.248                        1

    [treat included with 1 df in model]

    [_Iivhx_2 included with 1 df in model]

    [race included with 1 df in model]

    [site included with 1 df in model]

    Fractional polynomial fitting algorithm converged after 1 cycle.

    Transformations of covariates:

    -> gen double Indru__1 = ndrugtx-4.542608696 if e(sample)
    -> gen double Iage__1 = age-32.3826087 if e(sample)

    Final multivariable fractional polynomial model for dfree
    --------------------------------------------------------------------
        Variable |    -----Initial-----          -----Final-----
                 |   df     Select   Alpha    Status    df    Powers
    -------------+------------------------------------------------------
         ndrugtx |    4     1.0000   0.0500     in      1     1
             age |    4     1.0000   0.0500     in      1     1
        _Iivhx_2 |    1     1.0000   0.0500     in      1     1
        _Iivhx_3 |    1     1.0000   0.0500     in      1     1
            race |    1     1.0000   0.0500     in      1     1
           treat |    1     1.0000   0.0500     in      1     1
            site |    1     1.0000   0.0500     in      1     1
    --------------------------------------------------------------------

    Logistic regression                               Number of obs   =        575
                                                      LR chi2(7)      =      34.48
                                                      Prob > chi2     =     0.0000
    Log likelihood = -309.62413                       Pseudo R2       =     0.0527
    ------------------------------------------------------------------------------
           dfree |      Coef.  Std. Err.        z   P>|z|   [95% Conf.   Interval]
    -------------+----------------------------------------------------------------
        Indru__1 |  -.0615121   .0256311    -2.40   0.016    -.1117481   -.0112761
         Iage__1 |   .0503708   .0173224     2.91   0.004     .0164196     .084322
        _Iivhx_2 |  -.6033296   .2872511    -2.10   0.036    -1.166331   -.0403278
        _Iivhx_3 |   -.732722    .252329    -2.90   0.004    -1.227278   -.2381662
            race |   .2261295   .2233399     1.01   0.311    -.2116087    .6638677
           treat |   .4425031   .1992909     2.22   0.026     .0519002    .8331061
            site |   .1485845   .2172121     0.68   0.494    -.2771434    .5743125
           _cons |  -1.053693   .2264488    -4.65   0.000    -1.497524   -.6098613
    ------------------------------------------------------------------------------
    Deviance: 619.248.

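    6. Check for interactions among the variables in the model.
    - Interaction terms are assessed from the highest order downwards, keeping a Hierarchically Well-Formulated model (HWF)

    - Third-order term, hierarchically well-formulated:
      logit P(X) = x1 + x2 + x3 + x1*x2 + x1*x3 + x2*x3 + x1*x2*x3

    - Not hierarchically well-formulated (x1*x2 and x1*x3 are missing):
      logit P(X) = x1 + x2 + x3 + x2*x3 + x1*x2*x3

    Interaction assessment
    - Wald statistics, likelihood ratio test

    $$\operatorname{logit} P(X) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2$$

    $$H_0: \beta_{12} = 0; \qquad H_1: \beta_{12} \neq 0$$

    $$\text{Wald test:}\quad Z = \frac{\hat{\beta}_{ij}}{se(\hat{\beta}_{ij})}$$

    $$\text{LR test:}\quad LR = (-2\ln L_{\text{reduced}}) - (-2\ln L_{\text{full}}) = -2\ln\left(\frac{L_{\text{reduced}}}{L_{\text{full}}}\right)$$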

  • 16

    . gen tage = treat*age

    . logit dfree treat age tage

    Iteration 0:   log likelihood = -326.86446
    Iteration 1:   log likelihood = -322.31165
    Iteration 2:   log likelihood = -322.26464
    Iteration 3:   log likelihood = -322.26464

    Logistic regression                               Number of obs   =        575
                                                      LR chi2(3)      =       9.20
                                                      Prob > chi2     =     0.0268
    Log likelihood = -322.26464                       Pseudo R2       =     0.0141
    ------------------------------------------------------------------------------
           dfree |      Coef.  Std. Err.        z   P>|z|   [95% Conf.   Interval]
    -------------+----------------------------------------------------------------
           treat |  -1.123388   1.042136    -1.08   0.281    -3.165936    .9191606
             age |  -.0077915   .0238604    -0.33   0.744    -.0545571    .0389741
            tage |   .0480969   .0314183     1.53   0.126    -.0134819    .1096756
           _cons |  -1.043996   .7884888    -1.32   0.185    -2.589406    .5014138
    ------------------------------------------------------------------------------
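The Wald statistic for the interaction term is the coefficient divided by its standard error, referred to a standard normal distribution. A small Python sketch using the `tage` row of the output; the helper name `wald_test` is illustrative:

```python
import math

def wald_test(coef, se):
    """Wald z statistic and two-sided p-value under a standard normal reference."""
    z = coef / se
    # Two-sided normal tail: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Coefficient and standard error of the treat-by-age interaction (tage).
z, p = wald_test(0.0480969, 0.0314183)
print(f"z = {z:.2f}, p = {p:.3f}")
```

The result matches the z and P>|z| columns that Stata reports for tage.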

    . qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3

    . di "Log likelihood = " e(ll)
    Log likelihood = -306.72558

    . estimates store A

    . logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 age_ndrugfp1 age_ndrugfp2, nolog

    Logistic regression                               Number of obs   =        575
                                                      LR chi2(10)     =      48.07
                                                      Prob > chi2     =     0.0000
    Log likelihood = -302.83141                       Pseudo R2       =     0.0735
    ------------------------------------------------------------------------------
           dfree |      Coef.  Std. Err.        z   P>|z|   [95% Conf.   Interval]
    -------------+----------------------------------------------------------------
             age |   .1330456   .0699532     1.90   0.057    -.0040603    .2701514
        ndrugfp1 |   2.113401   1.542559     1.37   0.171    -.9099592    5.136761
        ndrugfp2 |   .6035234   .5850525     1.03   0.302    -.5431584    1.750205
            race |   .3071997   .2277417     1.35   0.177    -.1391657    .7535652
           treat |    .398666    .201887     1.97   0.048     .0029748    .7943572
            site |   .1678239   .2226292     0.75   0.451    -.2685213    .6041691
        _Iivhx_2 |  -.5460554   .2947045    -1.85   0.064    -1.123666    .0315548
        _Iivhx_3 |  -.7156675   .2607849    -2.74   0.006    -1.226796   -.2045385
    age_ndrugfp1 |  -.0285781   .0445662    -0.64   0.521    -.1159261      .05877
    age_ndrugfp2 |  -.0050124    .017133    -0.29   0.770    -.0385924    .0285675
           _cons |  -7.251346   2.516295    -2.88   0.004    -12.18319   -2.319499
    ------------------------------------------------------------------------------

    . di "Log likelihood = " e(ll)
    Log likelihood = -302.83141

    . estimates store A1

    . lrtest A A1

    Likelihood-ratio test                                 LR chi2(2)  =       7.79
    (Assumption: A nested in A1)                          Prob > chi2 =     0.0204

    . qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 ageivhx2 ageivhx3

    . di "Log likelihood = " e(ll)
    Log likelihood = -306.35593

    . estimates store A2

    . lrtest A A2

    Likelihood-ratio test                                 LR chi2(2)  =       0.74
    (Assumption: A nested in A2)                          Prob > chi2 =     0.6910
    ...

    . qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 agetreat

    . di "Log likelihood = " e(ll)
    Log likelihood = -305.34312

    . estimates store A3

    . lrtest A A3

    Likelihood-ratio test                                 LR chi2(1)  =       2.76
    (Assumption: A nested in A3)                          Prob > chi2 =     0.0964

    . qui logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 racesite

    . di "Log likelihood = " e(ll)
    Log likelihood = -302.45334

    . estimates store A4

    . lrtest A A4

    Likelihood-ratio test                                 LR chi2(1)  =       8.54
    (Assumption: A nested in A4)                          Prob > chi2 =     0.0035

    Interaction            Log likelihood        G     df    P value
    Main effect                -309.62413
    age x ndrugx*              -302.83141     7.79      2     0.0204
    age x _Iivhx*              -306.35593     0.74      2     0.6910
    age x race                  -306.6269     0.20      1     0.6569
    age x treat                -305.34312     2.76      1     0.0964
    age x site                 -305.92657     1.60      1     0.2062
    ndrugx* x Iivhx*           -304.00917     5.43      4     0.2457
    ndrugtx* x site            -306.72386     .003      2     0.9983
    ndrugtx* x treat           -305.25796     2.94      2     0.2305
    ndrugtx* x race            -304.65412     4.14      2     0.1260
    race x treat               -306.70871     0.94      1     0.3315
    race x site                -302.45334     8.54      1     0.0035
    treat x _Iivhx*            -306.70513     0.04      2     0.9798
    treat x site               -306.70871     0.03      1     0.8543
    race x _ivhx*              -305.83605     1.78      2     0.4108
    Site x _ivhx*              -306.29100     0.87      2     0.6485

    * ndrugtx*: ndrugfp1 = ((ndrugtx+1)/10)^(-1); ndrugfp2 = ndrugfp1*log((ndrugtx+1)/10)

    G is computed against stored model A (main effects with ndrugfp1 and ndrugfp2, log likelihood -306.72558).

    Interactions with p-value < 0.10: age x ndrugx*, age x treat, race x site
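The transformed and product variables used in these models can be built as follows. The definitions of `ndrugfp1` and `ndrugfp2` are the ones given in the footnote; defining `age_ndrugfp1`, `age_ndrugfp2`, `agetreat` and `racesite` as simple products is an assumption based on the variable names and on the earlier `gen tage = treat*age`. This Python sketch uses hypothetical function names and is not the author's do-file.

```python
import math

def ndrug_fp_terms(ndrugtx):
    """FP2 terms for ndrugtx with powers (-1, -1), using X = (ndrugtx + 1) / 10."""
    x = (ndrugtx + 1) / 10.0
    fp1 = x ** -1
    fp2 = fp1 * math.log(x)
    return fp1, fp2

def interaction_terms(age, ndrugtx, treat, race, site):
    """Product terms named after the variables in the logit commands (assumed products)."""
    fp1, fp2 = ndrug_fp_terms(ndrugtx)
    return {
        "age_ndrugfp1": age * fp1,
        "age_ndrugfp2": age * fp2,
        "agetreat": age * treat,
        "racesite": race * site,
    }

# Hypothetical subject: age 30, 9 previous drug treatments, treat=1, race=0, site=1.
terms = interaction_terms(age=30, ndrugtx=9, treat=1, race=0, site=1)
print(ndrug_fp_terms(9), terms)
```

The `+1` shift keeps X positive when ndrugtx is 0, which the log and negative powers require.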

    . logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 age_ndrugfp1 age_ndrugfp2 agetreat racesite

    Logistic regression                               Number of obs   =        575
                                                      LR chi2(12)     =      58.56
                                                      Prob > chi2     =     0.0000
    Log likelihood = -297.58223                       Pseudo R2       =     0.0896
    ------------------------------------------------------------------------------
           dfree |      Coef.  Std. Err.        z   P>|z|   [95% Conf.   Interval]
    -------------+----------------------------------------------------------------
             age |   .1226443   .0745964     1.64   0.100    -.0235621    .2688506
        ndrugfp1 |    2.48771   1.596674     1.56   0.119    -.6417143    5.617134
        ndrugfp2 |   .7448356   .6043055     1.23   0.218    -.4395815    1.929253
            race |   .6996389   .2667644     2.62   0.009     .1767903    1.222488
           treat |  -1.274051    1.07897    -1.18   0.238    -3.388793     .840692
            site |   .4977606   .2563434     1.94   0.052    -.0046632    1.000184
        _Iivhx_2 |  -.6243653   .2996152    -2.08   0.037      -1.2116   -.0371303
        _Iivhx_3 |  -.6905352   .2627414    -2.63   0.009    -1.205499   -.1755716
    age_ndrugfp1 |  -.0387742    .046132    -0.84   0.401    -.1291912    .0516428
    age_ndrugfp2 |  -.0090046    .017702    -0.51   0.611    -.0436998    .0256906
        agetreat |   .0520701   .0324382     1.61   0.108    -.0115076    .1156477
        racesite |  -1.416875   .5318186    -2.66   0.008    -2.459221   -.3745301
           _cons |  -7.141237   2.666019    -2.68   0.007    -12.36654   -1.915936
    ------------------------------------------------------------------------------

    age x ndrugfp2: Wald z = -0.51, p value = 0.611

  • 17

    . logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 age_ndrugfp1 agetreat racesite
    ...
    Logistic regression                               Number of obs   =        575
                                                      LR chi2(11)     =      58.31
                                                      Prob > chi2     =     0.0000
    Log likelihood = -297.71139                       Pseudo R2       =     0.0892
    ------------------------------------------------------------------------------
           dfree |      Coef.  Std. Err.        z   P>|z|   [95% Conf.   Interval]
    -------------+----------------------------------------------------------------
             age |   .0889238   .0339956     2.62   0.009     .0222937    .1555539
        ndrugfp1 |   1.705601   .4106322     4.15   0.000     .9007769    2.510426
        ndrugfp2 |   .4440587   .1175928     3.78   0.000      .213581    .6745364
            race |   .6869266    .265402     2.59   0.010     .1667483    1.207105
           treat |  -1.252787   1.080874    -1.16   0.246    -3.371262    .8656875
            site |   .4903829   .2560081     1.92   0.055    -.0113838    .9921497
        _Iivhx_2 |  -.6299072   .2994363    -2.10   0.035    -1.216792   -.0430227
        _Iivhx_3 |   -.694879    .262544    -2.65   0.008    -1.209456   -.1803021
    age_ndrugfp1 |  -.0155328   .0060924    -2.55   0.011    -.0274737   -.0035918
        agetreat |   .0515973   .0325362     1.59   0.113    -.0121726    .1153672
        racesite |  -1.401606   .5309161    -2.64   0.008    -2.442183   -.3610301
           _cons |  -5.976921   1.338859    -4.46   0.000    -8.601036   -3.352807
    ------------------------------------------------------------------------------

    agetreat: Wald z = 1.59, p value = 0.113

    . logit dfree age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 age_ndrugfp1 racesite

    Logistic regression                               Number of obs   =        575
                                                      LR chi2(10)     =      55.77
                                                      Prob > chi2     =     0.0000
    Log likelihood = -298.98146                       Pseudo R2       =     0.0853
    ------------------------------------------------------------------------------
           dfree |      Coef.  Std. Err.        z   P>|z|   [95% Conf.   Interval]
    -------------+----------------------------------------------------------------
             age |   .1166385   .0288749     4.04   0.000     .0600446    .1732323
        ndrugfp1 |   1.669035    .407152     4.10   0.000      .871032    2.467038
        ndrugfp2 |   .4336886   .1169052     3.71   0.000     .2045586    .6628185
            race |   .6841068   .2641355     2.59   0.010     .1664107    1.201803
           treat |   .4349255   .2037596     2.13   0.033      .035564     .834287
            site |    .516201   .2548881     2.03   0.043     .0166295    1.015773
        _Iivhx_2 |  -.6346307   .2987192    -2.12   0.034    -1.220109   -.0491518
        _Iivhx_3 |  -.7049475   .2615805    -2.69   0.007    -1.217636   -.1922591
    age_ndrugfp1 |  -.0152697   .0060268    -2.53   0.011    -.0270819   -.0034575
        racesite |  -1.429457   .5297806    -2.70   0.007    -2.467808   -.3911062
           _cons |  -6.843864   1.219316    -5.61   0.000     -9.23368   -4.454048
    ------------------------------------------------------------------------------

    age x ndrugfp1: Wald z = -2.53, p value = 0.011; the interaction is significant at the 0.10 level.

    Problems in logistic regression
    - Collinearity (collinearity, multi-collinearity), which affects the coefficients; one remedy is ridge logistic regression
    - Multiple testing
    - Influential observations (outliers)
    - Problem of perfect or complete separation

    * Collinearity: (r2 > 0.90; r > 0.95; Kleinbaum, Muller, Nizam; 1998, 241)

    / R2

    Standard error t, z

    * , 2552

    . twocat 98 1 1 98

    . logit y x1 x2 x3------------------------------------------------------------------------------

    y | Coef. Std. Err. z P>|z| [95% Conf. Interval]-------------+----------------------------------------------------------------

    x1 | .2995896 1.429618 0.21 0.834 -2.502411 3.10159x2 | -.0143819 1.429593 -0.01 0.992 -2.816334 2.78757x3 | .3139715 .2886275 1.09 0.277 -.2517281 .8796711

    _cons | -.3670144 .2425088 -1.51 0.130 -.8423228 .1082941------------------------------------------------------------------------------

    . logit y x1 x3------------------------------------------------------------------------------

    y | Coef. Std. Err. z P>|z| [95% Conf. Interval]-------------+----------------------------------------------------------------

    x1 | .2854983 .2861025 1.00 0.318 -.2752523 .8462489x3 | .3136786 .2871556 1.09 0.275 -.249136 .8764931

    _cons | -.3670266 .2425058 -1.51 0.130 -.8423293 .1082761------------------------------------------------------------------------------

    . logit y x2 x3
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              x2 |    .279187   .2860666     0.98   0.329    -.2814933    .8398672
              x3 |   .3079032   .2871195     1.07   0.284    -.2548407    .8706471
           _cons |  -.3612278   .2408548    -1.50   0.134    -.8332944    .1108389
    ------------------------------------------------------------------------------

    . corr x1 x2 x3
    (obs=198)

                 |       x1       x2       x3
    -------------+---------------------------
              x1 |   1.0000
              x2 |   0.9798   1.0000
              x3 |   0.0000   0.0203   1.0000

    . collin x1 x2 x3
    Collinearity Diagnostics

                            SQRT                   R-
      Variable      VIF     VIF    Tolerance    Squared
    ----------------------------------------------------
            x1     25.25    5.03    0.0396      0.9604
            x2     25.26    5.03    0.0396      0.9604
            x3      1.01    1.01    0.9897      0.0103
    ----------------------------------------------------
      Mean VIF     17.17

                               Cond
            Eigenval          Index
    ---------------------------------
        1     3.0422          1.0000
        2     0.6908          2.0985
        3     0.2570          3.4408
        4     0.0100         17.4460
    ---------------------------------
     Condition Number        17.4460
     Eigenvalues & Cond Index computed from scaled raw sscp (w/ intercept)
     Det(correlation matrix)    0.0396

  • 18

    Checking collinearity and multicollinearity

    Pearson correlation (informal method)
    - Pearson correlation

    . corr age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 age_ndrugfp1 racesite
    (obs=575)

                 |      age ndrugfp1 ndrugfp2     race    treat     site _Iivhx_2 _Iivhx_3 age_nd~1 racesite
    -------------+------------------------------------------------------------------------------------------
             age |   1.0000
        ndrugfp1 |  -0.1836   1.0000
        ndrugfp2 |   0.1601  -0.9916   1.0000
            race |   0.0139   0.0874  -0.0821   1.0000
           treat |  -0.0446   0.0251  -0.0204   0.0791   1.0000
            site |  -0.0287   0.1923  -0.1926  -0.0795  -0.0230   1.0000
        _Iivhx_2 |   0.1063  -0.0551   0.0567  -0.0152   0.0513   0.1623   1.0000
        _Iivhx_3 |   0.2674  -0.3045   0.2843  -0.1806  -0.0695  -0.2292  -0.4138   1.0000
    age_ndrugfp1 |   0.0462   0.9546  -0.9475   0.1080   0.0108   0.1833  -0.0134  -0.2506   1.0000
        racesite |   0.0430   0.1831  -0.1834   0.4384   0.0522   0.3849  -0.0303  -0.1295   0.2055   1.0000

    Variance inflation factors (VIF: formal method)
    A VIF > 10, or a mean VIF considerably greater than 1, indicates multicollinearity.

    . collin age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 age_ndrugfp1 racesite
    Collinearity Diagnostics

                            SQRT                   R-
      Variable      VIF     VIF    Tolerance    Squared
    ----------------------------------------------------
           age      2.64    1.63    0.3782      0.6218
      ndrugfp1    105.68   10.28    0.0095      0.9905
      ndrugfp2     63.77    7.99    0.0157      0.9843
          race      1.43    1.20    0.6969      0.3031
         treat      1.02    1.01    0.9831      0.0169
          site      1.41    1.19    0.7090      0.2910
      _Iivhx_2      1.39    1.18    0.7201      0.2799
      _Iivhx_3      1.65    1.28    0.6061      0.3939
  age_ndrugfp1     27.55    5.25    0.0363      0.9637
      racesite      1.64    1.28    0.6109      0.3891
    ----------------------------------------------------
      Mean VIF     20.82

    - Generalized variance inflation factor (GVIF)

    How large is the VIF at a given correlation?

        r     r2      VIF
       .1    0.01     1.01
       .2    0.04     1.04
       .3    0.09     1.10
       .4    0.16     1.19
       .5    0.25     1.33
       .6    0.36     1.56
       .7    0.49     1.96
       .8    0.64     2.78
       .9    0.81     5.26
       .91   0.83     5.82
       .92   0.85     6.51
       .93   0.86     7.40
       .94   0.88     8.59
       .95   0.90    10.26
       .96   0.92    12.76
       .97   0.94    16.92
       .98   0.96    25.25
       .99   0.98    50.25
      1      1.00      .

    [Figure: VIF vs correlation, with r = .95 marked]

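    Variance inflation factors
    - The VIF measures how much the variance of an estimated coefficient is inflated by collinearity.

    $$\mathrm{VIF}_i = \frac{1}{1 - R_i^2}$$

    $$\mathrm{tolerance}_i = 1 - R_i^2 = \frac{1}{\mathrm{VIF}_i}$$

    Here $R_i^2$ is the R-squared from regressing predictor $i$ on the other predictors.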

    Indication of multicollinearity: variance inflation factors*
    - A VIF > 10 is an indication of multicollinearity.
    - The mean VIF provides information about the severity of the multicollinearity.
    - A mean VIF considerably greater than 1 is indicative of serious multicollinearity problems.
    - Tolerance (1/VIF): the cutoffs correspond to a VIF of 5 or 10+ (O'Brien, 2007).

    *Neter, Wasserman, Kutner (1987, p. 392); Marquardt (1970); Belsley, Kuh & Welsch (1980)

    Stata
    - collin [varlist]
    - estat vif: variance inflation factors for the independent variables
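The VIF and tolerance rules above can be checked by hand. The following is a minimal Python sketch (not Stata, and not the `collin` implementation); the function names and the `REPORTED_VIF` dictionary are hypothetical, and the numbers are copied from the `collin` output shown earlier.

```python
def vif_from_r2(r_squared):
    """VIF = 1 / (1 - R^2), where R^2 comes from regressing one predictor on the others."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R-squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)


def tolerance_from_r2(r_squared):
    """Tolerance = 1 - R^2 = 1 / VIF."""
    return 1.0 - r_squared


def summarize_vif(vifs, cutoff=10.0):
    """Return the mean VIF and the names of variables whose VIF exceeds the cutoff."""
    mean_vif = sum(vifs.values()) / len(vifs)
    flagged = [name for name, value in vifs.items() if value > cutoff]
    return mean_vif, flagged


# VIF column copied from the 10-variable collin output (hypothetical container name).
REPORTED_VIF = {
    "age": 2.64, "ndrugfp1": 105.68, "ndrugfp2": 63.77, "race": 1.43,
    "treat": 1.02, "site": 1.41, "_Iivhx_2": 1.39, "_Iivhx_3": 1.65,
    "age_ndrugfp1": 27.55, "racesite": 1.64,
}

if __name__ == "__main__":
    print(round(vif_from_r2(0.9604), 2))   # R-squared of x1 in the 3-variable example
    print(summarize_vif(REPORTED_VIF))
```

With the 10 percent rule of thumb, the flagged variables are `ndrugfp1`, `ndrugfp2`, and `age_ndrugfp1`, which is expected because the last one is a product term built from the first.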

    Condition Index & Variance Decomposition Proportion

    Both the condition index (CI) and the variance decomposition proportion (VDP) are based on eigenvalues.

    Cutoffs cited for the condition index: 10-30, > 30, and > 100 (Belsley, 1991a).

    If the condition index is between 10 and 30, there is moderate to strong multicollinearity, and if it exceeds 30 there is severe multicollinearity (Gujarati, 2002).

    $$k = \frac{\text{Max eigenvalue}}{\text{Min eigenvalue}}, \qquad \mathrm{CI} = \sqrt{k}$$
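As a rough check of the eigenvalue formula, the Python sketch below recomputes the "Cond Index" column of the three-variable `collin` output from its printed eigenvalues. The function name is hypothetical, and it assumes each condition index is the square root of the largest eigenvalue divided by that eigenvalue; the largest index is the value `collin` labels "Condition Number". Because the printed eigenvalues are rounded to four decimals, the results agree with Stata only approximately.

```python
import math


def condition_indices(eigenvalues):
    """Condition index per eigenvalue: sqrt(largest eigenvalue / eigenvalue)."""
    if any(ev <= 0 for ev in eigenvalues):
        raise ValueError("eigenvalues must be positive")
    largest = max(eigenvalues)
    return [math.sqrt(largest / ev) for ev in eigenvalues]


# Eigenvalues printed by `collin x1 x2 x3` (rounded to four decimals).
EIGENVALUES = [3.0422, 0.6908, 0.2570, 0.0100]

if __name__ == "__main__":
    for rank, ci in enumerate(condition_indices(EIGENVALUES), start=1):
        print(rank, round(ci, 2))
```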

  • 19

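    Variance Decomposition Proportion (VDP)

    Belsley et al. (1980) and Belsley (1991a) proposed calculating the proportions of variance of each coefficient associated with each principal component, that is, a decomposition of the coefficient variance for each dimension. A VDP of 0.5 is used as the cutoff.

    $$p_{jk} = \frac{V_{jk}^2}{\lambda_k \, \mathrm{VIF}_j}$$

    where $V_{jk}$ is the element for coefficient $j$ on dimension $k$ and $\lambda_k$ is the eigenvalue of that dimension (Fox, 1984).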

    . collin age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 age_ndrugfp1 racesite
    Collinearity Diagnostics

                            SQRT                   R-
      Variable      VIF     VIF    Tolerance    Squared
    ----------------------------------------------------
           age      2.64    1.63    0.3782      0.6218
      ndrugfp1    105.68   10.28    0.0095      0.9905
      ndrugfp2     63.77    7.99    0.0157      0.9843
          race      1.43    1.20    0.6969      0.3031
         treat      1.02    1.01    0.9831      0.0169
          site      1.41    1.19    0.7090      0.2910
      _Iivhx_2      1.39    1.18    0.7201      0.2799
      _Iivhx_3      1.65    1.28    0.6061      0.3939
  age_ndrugfp1     27.55    5.25    0.0363      0.9637
      racesite      1.64    1.28    0.6109      0.3891
    ----------------------------------------------------
      Mean VIF     20.82

                               Cond
            Eigenval          Index
    ---------------------------------
        1     5.9439          1.0000
        2     1.2749          2.1592
        3     1.0679          2.3592
        4     1.0129          2.4224
        5     0.7402          2.8338
        6     0.4588          3.5995
        7     0.3110          4.3716
        8     0.1469          6.3620
        9     0.0320         13.6289
       10     0.0094         25.1490
       11     0.0021         52.8408
    ---------------------------------
     Condition Number        52.8408
     Eigenvalues & Cond Index computed from scaled raw sscp (w/ intercept)
     Det(correlation matrix)    0.0002

    - Condition index / variance decomposition proportion
    - Look for dimensions with a CI above 30 on which variables have a VDP above .5.

    . coldiag2 age ndrugfp1 ndrugfp2 race treat site _Iivhx_2 _Iivhx_3 age_ndrugfp1, force w(5)

    Condition number using scaled variables = 52.18

    Condition Indexes and Variance-Decomposition Proportions

        condition
            index  _cons   age  ndr~1 nd~p2  race  treat  site  _Ii~2 _Ii~3 age~1
     1       1.00   0.00  0.00  0.00  0.00  0.01  0.01  0.01  0.00  0.00  0.00
     2       2.22   0.00  0.00  0.00  0.00  0.00  0.02  0.01  0.00  0.12  0.00
     3       2.38   0.00  0.00  0.00  0.00  0.00  0.01  0.05  0.37  0.03  0.00
     4       2.70   0.00  0.00  0.00  0.00  0.64  0.01  0.14  0.00  0.03  0.00
     5       3.23   0.00  0.00  0.00  0.00  0.20  0.05  0.69  0.14  0.00  0.00
     6       3.56   0.00  0.00  0.00  0.00  0.03  0.80  0.03  0.12  0.06  0.00
     7       6.12   0.02  0.02  0.00  0.00  0.12  0.08  0.06  0.32  0.70  0.00
     8      13.34   0.07  0.06  0.01  0.02  0.00  0.02  0.00  0.03  0.02  0.21
     9      24.83   0.10  0.43  0.03  0.35  0.00  0.00  0.00  0.00  0.03  0.30
    10      52.18   0.81  0.49  0.96  0.62  0.00  0.00  0.00  0.00  0.01  0.49

    . prnt_cx, force w(5)

    Condition Indexes and Variance-Decomposition Proportions

        condition
            index  _cons   age  ndr~1 nd~p2  race  treat  site  _Ii~2 _Ii~3 age~1
     1       1.00     .     .     .     .     .     .     .     .     .     .
     2       2.22     .     .     .     .     .     .     .     .     .     .
     3       2.38     .     .     .     .     .     .     .   0.37    .     .
     4       2.70     .     .     .     .   0.64    .     .     .     .     .
     5       3.23     .     .     .     .     .     .   0.69    .     .     .
     6       3.56     .     .     .     .     .   0.80    .     .     .     .
     7       6.12     .     .     .     .     .     .     .   0.32  0.70    .
     8      13.34     .     .     .     .     .     .     .     .     .     .
     9      24.83     .   0.43    .   0.35    .     .     .     .     .   0.30
    10      52.18   0.81  0.49  0.96  0.62    .     .     .     .     .   0.49

    Variance-Decomposition Proportions less than .3 have been printed as "."
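To make the CI 30 / VDP .5 screening concrete, here is a small Python sketch applied to the last three rows of the `coldiag2` table above. The function and variable names are hypothetical, and the requirement that at least two terms exceed the VDP cutoff on the same dimension is an assumption taken from the usual reading of Belsley's rule, not something stated on the slide.

```python
# Column order follows the coldiag2 varlist, with abbreviated names spelled out.
COLUMNS = ["_cons", "age", "ndrugfp1", "ndrugfp2", "race",
           "treat", "site", "_Iivhx_2", "_Iivhx_3", "age_ndrugfp1"]

# (condition index, VDP row) for dimensions 8 to 10 of the coldiag2 output.
DIMENSIONS = [
    (13.34, [0.07, 0.06, 0.01, 0.02, 0.00, 0.02, 0.00, 0.03, 0.02, 0.21]),
    (24.83, [0.10, 0.43, 0.03, 0.35, 0.00, 0.00, 0.00, 0.00, 0.03, 0.30]),
    (52.18, [0.81, 0.49, 0.96, 0.62, 0.00, 0.00, 0.00, 0.00, 0.01, 0.49]),
]


def flag_dimensions(dimensions, columns, ci_cutoff=30.0, vdp_cutoff=0.5, min_terms=2):
    """Return (condition index, term names) for dimensions that look problematic.

    A dimension is flagged when its condition index is at least ci_cutoff and
    at least min_terms terms have a VDP above vdp_cutoff (assumed rule).
    """
    flagged = []
    for ci, vdps in dimensions:
        if ci < ci_cutoff:
            continue
        names = [name for name, vdp in zip(columns, vdps) if vdp > vdp_cutoff]
        if len(names) >= min_terms:
            flagged.append((ci, names))
    return flagged


if __name__ == "__main__":
    print(flag_dimensions(DIMENSIONS, COLUMNS))
```

Only the dimension with condition index 52.18 is flagged, and it ties the constant to the two `ndrugfp` terms, which matches the very high VIFs of those two variables.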

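    Estimating the odds ratio when there is an interaction:
    1. Write the logit at each of the 2 levels of the risk factor being compared.
    2. Take the difference between the two logits, which is the log odds ratio.
    3. Exponentiate the difference from step 2.

    With risk factor f and covariate x, the logit is

    $$g(f, x) = \beta_0 + \beta_1 f + \beta_2 x + \beta_3 f x$$

    At the two levels $f_1$ and $f_0$ of the risk factor:

    $$g(f_1, x) = \beta_0 + \beta_1 f_1 + \beta_2 x + \beta_3 f_1 x$$

    $$g(f_0, x) = \beta_0 + \beta_1 f_0 + \beta_2 x + \beta_3 f_0 x$$

    $$\ln OR(F = f_1, F = f_0, x) = g(f_1, x) - g(f_0, x)$$

    $$= (\beta_0 + \beta_1 f_1 + \beta_2 x + \beta_3 f_1 x) - (\beta_0 + \beta_1 f_0 + \beta_2 x + \beta_3 f_0 x)$$

    $$= \beta_1 (f_1 - f_0) + \beta_3 x (f_1 - f_0)$$

    $$OR = \exp\left[\beta_1 (f_1 - f_0) + \beta_3 x (f_1 - f_0)\right]$$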

  • 20

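    Odds ratio with interaction: 2 independent variables

    $$OR = \exp\left[\beta_1 (f_1 - f_0) + \beta_3 x (f_1 - f_0)\right]$$

    $$g(f, x) = \ln\frac{p}{1 - p} = \beta_0 + \beta_1 f + \beta_2 x + \beta_3 f x$$

    F = f1 = 1 is the risk-factor level, F = f0 = 0 is the reference level, and x is the covariate, so that $OR = \exp(\beta_1 + \beta_3 x)$.

    Confidence interval

    $$\exp\left[\hat\beta_1 + \hat\beta_3 x \pm z_{1-\alpha/2}\, \widehat{SE}\left(\ln \widehat{OR}(F = 1, F = 0, X = x)\right)\right]$$

    $$\widehat{Var}\left[\ln \widehat{OR}(F = 1, F = 0, X = x)\right] = \widehat{Var}(\hat\beta_1) + x^2\, \widehat{Var}(\hat\beta_3) + 2x\, \widehat{Cov}(\hat\beta_1, \hat\beta_3)$$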

    . logit low lwd age lwd_age
    Iteration 0:   log likelihood =   -117.336
    Iteration 1:   log likelihood = -110.71804
    Iteration 2:   log likelihood = -110.57024
    Iteration 3:   log likelihood = -110.56997

    Logistic regression                               Number of obs   =        189
                                                      LR chi2(3)      =      13.53
                                                      Prob > chi2     =     0.0036
    Log likelihood = -110.56997                       Pseudo R2       =     0.0577
    ------------------------------------------------------------------------------
             low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             lwd |  -1.944089   1.724804    -1.13   0.260    -5.324643    1.436465
             age |  -.0795722   .0396343    -2.01   0.045     -.157254   -.0018904
         lwd_age |   .1321967   .0756982     1.75   0.081    -.0161691    .2805626
           _cons |   .7744952   .9100949     0.85   0.395    -1.009258    2.558248
    ------------------------------------------------------------------------------

    . di exp((-1.944)+(.132*15))   /* age (x) = 15, lwd = 1 */
    1.0366558

    . di exp((-1.944)+(.132*20))   /* age (x) = 20, lwd = 1 */
    2.0057138

    lwd: 1 = lwt < 110 pounds, 0 = otherwise; age: continuous (years)

    . estat vce
    Covariance matrix of coefficients of logit model

            e(V) |        lwd         age     lwd_age       _cons
    -------------+------------------------------------------------
             lwd |   2.974949
             age |  .03526621   .00157088
         lwd_age | -.12760349  -.00157088   .00573022
           _cons | -.82827277  -.03526621   .03526621   .82827277

    . lincom lwd + 15*lwd_age , or

     ( 1)  lwd + 15 lwd_age = 0
    ------------------------------------------------------------------------------
             low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             (1) |   1.039627   .6865828     0.06   0.953      .284927     3.79334
    ------------------------------------------------------------------------------

    . lincom lwd + 20*lwd_age , or

     ( 1)  lwd + 20 lwd_age = 0
    ------------------------------------------------------------------------------
             low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             (1) |   2.013443     .81264     1.73   0.083     .9128263    4.441098
    ------------------------------------------------------------------------------
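The `lincom` results can be reproduced by hand from the coefficients and the `estat vce` covariance matrix. The Python sketch below (not Stata) is one way to do that, assuming the odds ratio for lwd at a given age is exp(b_lwd + age * b_lwd_age) and that its log has variance Var(b_lwd) + age^2 Var(b_lwd_age) + 2 age Cov(b_lwd, b_lwd_age). The function name and constants are hypothetical; the numbers are copied from the output above.

```python
import math
from statistics import NormalDist


def interaction_odds_ratio(b1, b3, var_b1, var_b3, cov_b1_b3, x, level=0.95):
    """Odds ratio and Wald confidence interval for the risk factor at covariate value x.

    log OR = b1 + b3 * x
    Var(log OR) = Var(b1) + x^2 * Var(b3) + 2 * x * Cov(b1, b3)
    """
    log_or = b1 + b3 * x
    variance = var_b1 + x ** 2 * var_b3 + 2 * x * cov_b1_b3
    se = math.sqrt(variance)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # 1.96 for a 95% interval
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Coefficients from `logit low lwd age lwd_age` and entries of `estat vce`.
B_LWD, B_LWD_AGE = -1.944089, 0.1321967
VAR_LWD, VAR_LWD_AGE, COV_LWD_LWD_AGE = 2.974949, 0.00573022, -0.12760349

if __name__ == "__main__":
    for age in (15, 20):
        estimate, lower, upper = interaction_odds_ratio(
            B_LWD, B_LWD_AGE, VAR_LWD, VAR_LWD_AGE, COV_LWD_LWD_AGE, age)
        print(age, round(estimate, 3), round(lower, 3), round(upper, 3))
```

The results should agree with the two `lincom` tables up to rounding of the inputs. The earlier `di exp(...)` lines give slightly different odds ratios (1.037 and 2.006) only because they use coefficients rounded to three decimals.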