6953 p8 dadlogistic regression 2013
-
Prepared by:
Assoc. Prof. Dr Bahaman Abu Samah
Department of Professional Development and Continuing Education
Faculty of Educational Studies
Universiti Putra Malaysia
Serdang
-
Logistic regression is an alternative to multiple linear regression
Used to predict a dichotomous (two-category) outcome variable
from a set of categorical or continuous predictor variables
Used because a dichotomous outcome variable violates the
assumption of linearity in ordinary linear regression
Logistic regression emphasizes the probability of a particular
outcome for each case
-
Logistic regression estimates the probability of having one
outcome or another based on the best linear combination of
predictors, using maximum-likelihood estimation
Probability of Y is calculated based on the following formula:
$$P(Y) = \hat{Y} = \frac{e^{u}}{1 + e^{u}} \qquad \text{(Formula 1)}$$
where
$u = b_0 + b_1X_1 + b_2X_2 + \dots + b_pX_p$
$P$ = probability
$e$ = the base of natural logarithms ($\approx 2.718$)
-
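With one predictor variable, the formula will be:
$$P(Y) = \hat{Y}_i = \frac{e^{b_0 + b_1X_1}}{1 + e^{b_0 + b_1X_1}}$$
With multiple predictor variables (p), the formula will be:
$$P(Y) = \hat{Y}_i = \frac{e^{b_0 + b_1X_1 + b_2X_2 + \dots + b_pX_p}}{1 + e^{b_0 + b_1X_1 + b_2X_2 + \dots + b_pX_p}}$$
where
$u = b_0 + b_1X_1 + b_2X_2 + \dots + b_pX_p$
$P$ = probability
$e$ = the base of natural logarithms ($\approx 2.718$)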
-
The value resulting from the above computation (a probability)
ranges between 0 and 1
:: A value close to 0 means Y is very unlikely to occur
:: A value close to 1 means Y is very likely to occur
Example: 1 = Pass, 0 = Fail
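The probability formula can be sketched in a few lines of Python. This is only an illustration: the function name, the coefficients, and the predictor values are hypothetical and do not come from the slides.

```python
import math

def logistic_probability(intercept, coefs, xs):
    """Return P(Y) = e^u / (1 + e^u), with u = b0 + b1*X1 + ... + bp*Xp."""
    u = intercept + sum(b * x for b, x in zip(coefs, xs))
    return math.exp(u) / (1.0 + math.exp(u))

# Hypothetical coefficients and predictor values, for illustration only.
p = logistic_probability(-2.0, [0.5, 1.0], [1.0, 2.0])
print(round(p, 3))  # always lies strictly between 0 and 1
```

A value of u equal to 0 gives a probability of exactly .5; larger positive values of u push the probability toward 1.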
-
1. Predict a dichotomous outcome variable from a set of
categorical or continuous predictor variables. Logistic
regression calculates the probability of success over the
probability of failure (the odds); the result is presented
as an odds ratio
2. Determine relationships among constructs
-
DV: Dichotomous, coded as 1 and 0
IV: Continuous or categorical
-
Can outcome be predicted from a set of predictor
variables?
Which predictor variables predict the outcome?
How strong is the relationship between outcome and
the predictor variables?
-
Assessing Model Fit
Assessing the Predictor
Relationship between Predictors and Outcome
Odds Ratio
Classification of Cases
-
Use the observed and predicted value of the outcome to assess the
fit of the model.
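The statistic used to measure the fit of the model is called the
log-likelihood:
$$\text{Log-likelihood} = \sum_{i=1}^{N} \left[ Y_i \ln(\hat{Y}_i) + (1 - Y_i) \ln(1 - \hat{Y}_i) \right] \qquad \text{(Formula 2)}$$
where $Y_i$ is the observed outcome (0 or 1) and $\hat{Y}_i$ is the predicted
probability for case i
The log-likelihood sums the logs of the probabilities associated with
the predicted and actual outcomes
This log-likelihood statistic is comparable to the residual sum of squares
(SSE) in multiple regression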
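As a rough sketch, Formula 2 translates directly into Python. The observed outcomes and predicted probabilities below are hypothetical values chosen only to show the arithmetic.

```python
import math

def log_likelihood(observed, predicted):
    """Formula 2: sum of Y*ln(Yhat) + (1 - Y)*ln(1 - Yhat) over all cases."""
    return sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(observed, predicted)
    )

# Hypothetical outcomes and predicted probabilities, for illustration only.
y_obs = [1, 0, 1, 1]
y_hat = [0.8, 0.3, 0.6, 0.9]
ll = log_likelihood(y_obs, y_hat)
print(round(ll, 3))
```

The result is always negative or zero; values closer to zero indicate predictions that agree more closely with the observed outcomes.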
-
Log-likelihood will be calculated for two different models (bigger and
smaller)
The two models are compared by computing the difference in their
log-likelihoods using chi-square ($\chi^2$):
$$\chi^2 = 2\,[LL(B) - LL(0)] \qquad \text{(Formula 3)}$$
LL(B) is the log-likelihood for the bigger model, which includes all the
predictors
LL(0) is the log-likelihood for the smaller model, which includes only the
intercept
Degrees of freedom (df) = $k_B - k_0$, where k is the number of parameters
in each model
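The model comparison can be sketched as follows. The log-likelihood values and parameter counts are illustrative inputs, and `chi2_sf` is a small helper written for this sketch (a closed-form upper-tail probability for integer df), not part of SPSS or any library.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        return math.exp(-half) * sum(
            half ** j / math.factorial(j) for j in range(df // 2)
        )
    extra = sum(
        half ** (j - 0.5) / math.gamma(j + 0.5)
        for j in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * extra

def model_chi_square(ll_b, ll_0, k_b, k_0):
    """Formula 3 with df = k_B - k_0; returns (chi-square, df, p-value)."""
    chi2 = 2.0 * (ll_b - ll_0)
    df = k_b - k_0
    return chi2, df, chi2_sf(chi2, df)

# Illustrative inputs: a 4-parameter model against an intercept-only model.
chi2, df, p_value = model_chi_square(-8.740, -10.095, 4, 1)
print(round(chi2, 3), df, round(p_value, 3))
```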
-
Test the null hypothesis $H_0: \beta_i = 0$
Test the individual contribution of predictor variables using the
Wald statistic
The Wald statistic is comparable to the t-test in multiple regression
The Wald statistic is the squared ratio of the unstandardized logistic
coefficient to its standard error:
$$\text{Wald} = \left( \frac{b}{SE(b)} \right)^2$$
The Wald statistic and its corresponding p-value are
part of the SPSS output in the "Variables in the Equation" table.
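A minimal sketch of the Wald computation follows. The coefficient and standard error are hypothetical, and the p-value assumes the statistic is referred to a chi-square distribution with 1 degree of freedom.

```python
import math

def wald_test(b, se):
    """Return the Wald statistic (b / SE)^2 and its p-value on 1 df."""
    wald = (b / se) ** 2
    # For 1 df, the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value

# Hypothetical coefficient and standard error, for illustration only.
wald, p_value = wald_test(1.2, 0.5)
print(round(wald, 2), round(p_value, 3))
```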
-
A number of statistics can be used as measures of association
between predictors and outcome
The measures include:
1. R-statistic
2. Cox and Snell $R^2_{CS}$
3. Nagelkerke $R^2_N$
4. Hosmer and Lemeshow $R^2_L$
-
R-statistic is comparable to the multiple correlation coefficient
Formula:
$$R = \pm\sqrt{\frac{\text{Wald} - (2 \times df)}{-2LL(0)}} \qquad \text{(Formula 4)}$$
R-statistic ranges from -1 to +1
A positive value means that as the predictor increases, the likelihood of
the outcome occurring increases; a negative value means the likelihood
decreases
-
$R^2_{CS}$ is comparable to $R^2$ in multiple linear regression
The value is displayed in SPSS Logistic Regression
Formula:
$$R^2_{CS} = 1 - e^{-\frac{2}{n}\,[LL(B) - LL(0)]} \qquad \text{(Formula 5)}$$
However, the value of $R^2_{CS}$ never reaches its theoretical
maximum of 1
-
Nagelkerke suggested an amendment to the earlier $R^2_{CS}$
The value is displayed in SPSS Logistic Regression
Formula:
$$R^2_N = \frac{R^2_{CS}}{1 - e^{\frac{2\,LL(0)}{n}}} \qquad \text{(Formula 6)}$$
-
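Formula to calculate $R^2_L$ (Hosmer and Lemeshow), the model chi-square
divided by $-2LL(0)$:
$$R^2_L = \frac{\chi^2}{-2LL(0)} = \frac{2\,[LL(B) - LL(0)]}{-2LL(0)} \qquad \text{(Formula 7)}$$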
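The three $R^2$ measures (Formulas 5 to 7) can be computed together from the two log-likelihoods and the sample size. The sketch below uses illustrative input values; the function name and the dictionary keys are made up for this example.

```python
import math

def pseudo_r2(ll_b, ll_0, n):
    """Cox & Snell, Nagelkerke, and Hosmer & Lemeshow R-squared measures."""
    chi2 = 2.0 * (ll_b - ll_0)                        # model chi-square
    r2_cs = 1.0 - math.exp(-chi2 / n)                 # Formula 5
    r2_n = r2_cs / (1.0 - math.exp(2.0 * ll_0 / n))   # Formula 6
    r2_l = chi2 / (-2.0 * ll_0)                       # Formula 7
    return {"cox_snell": r2_cs, "nagelkerke": r2_n, "hosmer_lemeshow": r2_l}

# Illustrative log-likelihoods and sample size.
r2 = pseudo_r2(-8.740, -10.095, 15)
for name, value in r2.items():
    print(name, round(value, 3))
```

Because the Nagelkerke denominator is less than 1, $R^2_N$ is always at least as large as $R^2_{CS}$.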
-
Odds ratio is an indicator of the change in odds (likelihood) resulting
from a unit change in the predictor
The odds ratio is the increase (or decrease if the ratio is less than 1) in
odds of being in one outcome category when the predictor increases
by one unit.
It is similar to b-coefficient but is easier to interpret (it does not involve
logarithmic transformation)
The odds of an event occurring are defined as the probability of the
event occurring divided by the probability of the event not occurring:
$$\text{Odds} = \frac{P(\text{event})}{P(\text{no event})} \qquad \text{(Formula 8)}$$
-
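The coefficients (b) are the natural logs of the odds ratios, thus the
odds ratio can be calculated using the following formula:
$$\text{odds ratio} = e^{b} \qquad \text{(Formula 9)}$$
Odds ratio indicates the change in odds resulting from a unit
change in the predictor
Odds ratio > 1: the odds of the outcome increase as the predictor increases
Odds ratio < 1: the odds of the outcome decrease as the predictor increases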
-
X is income (in RM1,000), used to predict home ownership (1 = Yes,
0 = No)
If b = 1.25:
$$\text{odds ratio} = e^{1.25} = 3.49$$
A 1-unit increase in income (RM1,000) multiplies the odds
of home ownership by 3.49
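The income example can be checked numerically. In the sketch below only b = 1.25 comes from the example; the intercept and the two income values are hypothetical, chosen to show that a one-unit increase multiplies the odds by $e^b$ regardless of the starting point.

```python
import math

def odds(p):
    """Formula 8: P(event) / P(no event)."""
    return p / (1.0 - p)

def probability(b0, b, x):
    """One-predictor logistic model: e^u / (1 + e^u) with u = b0 + b*x."""
    u = b0 + b * x
    return math.exp(u) / (1.0 + math.exp(u))

b = 1.25     # coefficient from the income example
b0 = -3.0    # hypothetical intercept, for illustration only

odds_ratio = math.exp(b)  # Formula 9
ratio_from_odds = odds(probability(b0, b, 3.0)) / odds(probability(b0, b, 2.0))
print(round(odds_ratio, 2), round(ratio_from_odds, 2))
```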
-
One method of assessing the success of a model is to evaluate its
ability to predict correctly the outcome
The cut-off value for classification is .50: a case with a predicted
probability of greater than .5 is classified into the outcome coded 1
SPSS provides:
1.
2.
3. Overall percentage
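A classification summary with the .50 cut-off can be sketched as below. The observed outcomes and predicted probabilities are hypothetical, and the function is an illustration of the idea rather than a reproduction of the SPSS table.

```python
def classification_summary(observed, predicted, cutoff=0.5):
    """Percentage of cases classified correctly, per outcome and overall."""
    correct = {0: 0, 1: 0}
    total = {0: 0, 1: 0}
    for y, p in zip(observed, predicted):
        y_pred = 1 if p > cutoff else 0   # probability above cut-off -> 1
        total[y] += 1
        correct[y] += int(y_pred == y)
    per_group = {k: 100.0 * correct[k] / total[k] for k in total if total[k]}
    overall = 100.0 * sum(correct.values()) / sum(total.values())
    return per_group, overall

# Hypothetical outcomes and predicted probabilities, for illustration only.
y_obs = [1, 0, 1, 1, 0, 0]
y_hat = [0.8, 0.3, 0.4, 0.9, 0.6, 0.2]
per_group, overall = classification_summary(y_obs, y_hat)
print(per_group, round(overall, 1))
```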
-
1. Enter
All variables entered simultaneously
2. Sequential/Hierarchical
Variables entered in blocks
Blocks should be based on past research or theory being
tested
3. Stepwise
Variables entered on the basis of statistical criteria (relative
contribution to predict outcome)
Should be employed only for exploratory analysis
-
(From Tabachnick)
The following data set
include three variables:
1. FALL
0 - Not falling
1 - Falling
2. DIFFICULTY
Rated on 1 to 3 scale
3. SEASON
1 - autumn
2 - winter
3 - spring
Data set:
Fall  Difficulty  Season
1     3           1
1     1           1
0     1           3
1     2           3
1     3           2
0     2           2
0     1           2
1     3           1
1     2           3
1     2           1
0     2           2
0     2           3
1     3           2
1     2           2
0     3           1
Data: Logistic Regression Tabachnick SKI
-
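$$\text{Prob(Fall)} = \hat{Y}_i = \frac{e^{-1.776 + (1.010)(DIFF) + (0.928)(SEAS1) + (-0.418)(SEAS2)}}{1 + e^{-1.776 + (1.010)(DIFF) + (0.928)(SEAS1) + (-0.418)(SEAS2)}} \qquad \text{(Formula 1)}$$
$$\text{Log-likelihood} = \sum_{i=1}^{N} \left[ Y_i \ln(\hat{Y}_i) + (1 - Y_i) \ln(1 - \hat{Y}_i) \right] \qquad \text{(Formula 2)}$$
$$\chi^2 = 2\,[LL(B) - LL(0)] \qquad \text{(Formula 3)}$$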
Excel Computation
../Logistic Regression/6953 Logistic Regression Tabachnick.xlsx -
Table 1: Logistic Regression Analysis of Falling on a Ski Run as a
Function of Difficulty of Run and Season
Variables     B        Wald Test   p      Odds ratio
Constant      -1.776   0.88        .347   .169
Difficulty    1.010    1.27        .259   2.747
Season(1)     .927     0.34        .560   2.527
Season(2)     -.418    0.09        .763   .658
Note: $R^2$ = .165 (Cox & Snell), .227 (Nagelkerke)
Model $\chi^2$(3) = 2.710, p = .439
May want to also report CI for Odds ratio
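The model chi-square and Cox & Snell value in Table 1 can be reproduced from the data set and the fitted equation with a short Python sketch. It assumes that Season(1) and Season(2) are 0/1 indicators for seasons 1 and 2, with season 3 as the reference category; the slides do not state the coding, so treat that as an assumption.

```python
import math

# (fall, difficulty, season) rows from the ski data set.
DATA = [
    (1, 3, 1), (1, 1, 1), (0, 1, 3), (1, 2, 3), (1, 3, 2),
    (0, 2, 2), (0, 1, 2), (1, 3, 1), (1, 2, 3), (1, 2, 1),
    (0, 2, 2), (0, 2, 3), (1, 3, 2), (1, 2, 2), (0, 3, 1),
]

# Coefficients from the fitted equation for Prob(Fall).
B0, B_DIFF, B_SEAS1, B_SEAS2 = -1.776, 1.010, 0.928, -0.418

def predicted_probability(difficulty, season):
    # Assumed coding: indicators for seasons 1 and 2, season 3 as reference.
    u = (B0 + B_DIFF * difficulty
         + B_SEAS1 * (season == 1) + B_SEAS2 * (season == 2))
    return math.exp(u) / (1.0 + math.exp(u))

def log_likelihood(probabilities):
    """Formula 2 applied to the observed FALL values."""
    return sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for (y, _, _), p in zip(DATA, probabilities)
    )

n = len(DATA)
# An intercept-only model predicts the observed proportion for every case.
base_rate = sum(y for y, _, _ in DATA) / n
ll_0 = log_likelihood([base_rate] * n)
ll_b = log_likelihood([predicted_probability(d, s) for _, d, s in DATA])

chi2 = 2.0 * (ll_b - ll_0)           # Formula 3
r2_cs = 1.0 - math.exp(-chi2 / n)    # Formula 5
print(round(ll_0, 3), round(ll_b, 3), round(chi2, 2), round(r2_cs, 3))
```

Small differences from the table in the third decimal place are to be expected, because the coefficients used here are rounded.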
-
(Adapted from Andy Field)
Variable Label/Value
PERFORM Performance in Subject
0 No
1 Yes
INTEREST Interest in the Subject
0 No
1 Yes
AGE Age in years
Data: Logistic Regression PERFORM
-
Table 2: Logistic Regression Analysis of Performance
as a Function of Interest and Age
Variables     B        Wald Test   p      Odds ratio
Constant
Interest
Age
Note: $R^2$ = ___ (Cox & Snell), ___ (Nagelkerke)
Model $\chi^2$(_) = _____, p = ___
-
(From Tabachnick)
Variable Label/Value
WorkStatus Work status
1 Working
2 Housewives
Children Presence of children
0 No
1 Yes
Control Locus of control
AttMar Attitudes toward current marriage
AttHouse Attitudes toward housework
AttRole Attitudes toward role of women
Age Age group
Educ Years of education
Data: Logistic Regression Tabachnick WORK STATUS
-
Table 3: Logistic Regression Analysis of Work Status as a Function of
Attitudinal Variables
Variables                          B     Wald Test   p     Odds ratio
Constant
Locus of control
Attitude towards marital status
Attitude towards role of women
Attitude towards housework
Note: $R^2$ = ___ (Cox & Snell), ___ (Nagelkerke)
Model $\chi^2$(_) = _____, p = ___
-
Table 3: Logistic Regression Analysis of Work Status as a Function of
Attitudinal Variables and Children
Variables                          B     Wald Test   p     Odds ratio
Constant
Presence of children
Locus of control
Attitude towards marital status
Attitude towards role of women
Attitude towards housework
Note: $R^2$ = ___ (Cox & Snell), ___ (Nagelkerke)
Model $\chi^2$(_) = _____, p = ___