
Post on 28-Dec-2015

Generalized Linear Model (GZLM):

Overview

Dependent Variables

Continuous

Discrete

• Dichotomous
• Polytomous
• Ordinal
• Count

Continuous Variables

Quantitative variables that can take on any value within the limits of the variable

Continuous Variables (cont’d)

Distance, time, or length

Infinite number of possible divisions between any two values, at least theoretically

“Only love can be divided endlessly and still not diminish” (Anne Morrow Lindbergh)

More than 11 ordered values

Scores on standardized scales, such as those that measure parenting attitudes, depression, family functioning, and children’s behavioral problems

Discrete Variables

Finite number of indivisible values; cannot take on all possible values within the limits of the variable

• Dichotomous
• Polytomous
• Ordinal
• Count

Dichotomous Variables

Two categories used to indicate whether an event has occurred or some characteristic is present

Sometimes called binary or binomial variables

“To be or not to be, that is the question” (William Shakespeare, “Hamlet”)

Dichotomous DVs

Placed in foster care or not

Diagnosed with a disease or not

Abused or not

Pregnant or not

Service provided or not

Polytomous Variables

Three or more unordered categories

Categories mutually exclusive and exhaustive

Sometimes called multicategorical or multinomial variables

“Inanimate objects can be classified scientifically into three major categories; those that don't work, those that break down and those that get lost” (Russell Baker)

Polytomous DVs

Reason for leaving welfare: marriage, stable employment, move to another state, incarceration, or death

Status of foster home application: licensed to foster, discontinued application process prior to licensure, or rejected for licensure

Changes in living arrangements of the elderly: newly co-residing with their children, no longer co-residing, or residing in institutions

Ordinal Variables

Three or more ordered categories

Sometimes called ordered categorical variables or ordered polytomous variables

“Good, better, best; never let it rest till your good is better and your better is best” (Anonymous)

Ordinal DVs

Job satisfaction: very dissatisfied, somewhat dissatisfied, neutral, somewhat satisfied, or very satisfied

Severity of child abuse injury: none, mild, moderate, or severe

Willingness to foster children with emotional or behavioral problems: least acceptable, willing to discuss, or most acceptable

Count Variables

Number of times a particular event occurs to each case, usually within a given:

• Time period (e.g., number of hospital visits per year)
• Population size (e.g., number of registered sex offenders per 100,000 population), or
• Geographical area (e.g., number of divorces per county or state)

Whole numbers that can range from 0 through $+\infty$

Count Variables (cont’d)

“Now I've got heartaches by the number, Troubles by the score, Every day you love me less, Each day I love you more” (Ray Price)

Count DVs

Number of hospital visits, outpatient visits, services used, divorces, arrests, criminal offenses, symptoms, placements, children fostered, children adopted

General Linear Model (GLM) (selected models)

Continuous DV

Linear Regression

ANOVA

t-test

Generalized Linear Model (GZLM) (selected regression models)

GZLM

Continuous DV: Linear Regression

Dichotomous DV: Binary Logistic Regression

Polytomous DV: Multinomial Logistic Regression

Ordinal DV: Ordinal Logistic Regression

Count DV: Poisson or Negative Binomial Regression

Generalized How?

DV continuous or discrete

Normal or non-normal error distributions

Constant or non-constant variance

Provides a unifying framework for analyzing an entire class of regression models

GLM & GZLM Similarities

IVs are combined in a linear fashion ($\alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$);

a slope is estimated for each IV;

each slope has an accompanying test of statistical significance and confidence interval;

each slope indicates the IV’s independent contribution to the explanation or prediction of the DV;

GLM & GZLM Similarities (cont’d)

the sign of each slope indicates the direction of the relationship;

IVs can be any level of measurement;

the same methods are used for coding categorical IVs (e.g., dummy coding);

IVs can be entered simultaneously, sequentially, or using other methods;

product terms can be used to test interactions;

GLM & GZLM Similarities (cont’d)

powered terms (e.g., the square of an IV) can be used to test curvilinearity;

overall model fit can be tested, as can incremental improvement in a model brought about by the addition or deletion of IVs (nested models); and

residuals, leverage values, Cook’s D, and other indices are used to diagnose model problems.
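As a rough illustration of the coding ideas in this list, the Python sketch below builds the predictor columns that dummy coding, a powered term, and product terms would contribute for one case. The function names (`dummy_code`, `design_row`) and the category labels are hypothetical and do not come from the slides.

```python
def dummy_code(value, categories, reference):
    """Return one 0/1 indicator for each non-reference category."""
    return [1 if value == c else 0 for c in categories if c != reference]


def design_row(x, group, categories, reference):
    """Build predictors for one case: x, x squared, dummies, and x-by-dummy products."""
    dummies = dummy_code(group, categories, reference)
    products = [x * d for d in dummies]   # product terms test interactions
    return [x, x ** 2] + dummies + products  # x ** 2 tests curvilinearity


# Hypothetical case: quantitative IV x = 2.0, categorical IV with levels a, b, c
row = design_row(2.0, "b", ["a", "b", "c"], reference="a")
print(row)
```

The same columns would be entered into either a GLM or a GZLM; only the estimation method and the link differ.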

Common Assumptions

Correct model specification

Variables measured without error

Independent errors

No perfect multicollinearity

Correct Model Specification

Have you included relevant IVs?

Have you excluded irrelevant IVs?

Do the IVs that you have included have linear or non-linear relationships with your DV (or some function of your DV, as discussed below)?

Are one or more of your IVs moderated by other IVs (i.e., are there interaction effects)?

Variables Measured without Error

Limitation of regression models, given that most often our variables contain some measurement error

Independent Errors

Non-independent errors can be a result of study design, e.g.:

– Clustered data, which occur when data are collected from groups
– Temporally linked data, which occur when data are collected repeatedly over time from the same people or groups

Non-independent errors can lead to incorrect significance tests and confidence intervals

Independent Errors (cont’d)

Examples of when this might not be true:

Effect of parenting practices on behavioral problems of children, with reports of parenting practices and behavioral problems collected from both parents in two-parent families

Effect of parenting practices on behavioral problems of children, with information collected about behavioral problems for two or more children per family

Effects of leader behaviors on group cohesion in small groups, with information collected about leader behaviors and group cohesion from all members of each group

No Perfect Multicollinearity

Perfect multicollinearity exists when an IV is predicted perfectly by a linear combination of the remaining IVs

Typically quantified by “tolerance” or “variance inflation factor” (VIF) (1/tolerance)

Even high levels of multicollinearity may pose problems (e.g., tolerance < .20 or especially < .10)
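A minimal sketch of how tolerance and VIF relate, assuming a model with exactly two IVs. In that special case the tolerance of either IV is 1 minus the squared correlation between the two; with more IVs, the squared multiple correlation from regressing each IV on all the others is needed instead. The data values and function names are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def tolerance_two_ivs(x1, x2):
    """Tolerance when the model has exactly two IVs: 1 - r squared."""
    return 1 - pearson_r(x1, x2) ** 2


def vif(tolerance):
    """Variance inflation factor is the reciprocal of tolerance."""
    return 1 / tolerance


# Hypothetical IV scores for five cases
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
tol = tolerance_two_ivs(x1, x2)
print(f"tolerance = {tol:.2f}, VIF = {vif(tol):.2f}")
```

Under this relationship, the tolerance cutoffs of .20 and .10 mentioned above correspond to VIF values of 5 and 10.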

Estimating Parameters (e.g., $\beta$)

GLM

Ordinary Least Squares (OLS) estimation

• Estimates minimize the sum of the squared differences between observed and estimated values of the DV

http://www.ruf.rice.edu/~lane/stat_sim/reg_by_eye/index.html

GZLM

Maximum Likelihood (ML) estimation

• Estimates have the greatest likelihood (i.e., the maximum likelihood) of generating the observed sample data if model assumptions are true
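To make the maximum likelihood idea concrete, here is a small sketch that assumes an intercept-only Poisson model and a made-up set of counts (both are assumptions for illustration, not taken from the slides). It evaluates the Poisson log-likelihood over a grid of candidate means and keeps the candidate with the highest value. Statistical software uses iterative algorithms rather than a grid search; the grid is used here only because it is easy to read.

```python
import math


def poisson_log_likelihood(mu, counts):
    """Log-likelihood of the counts if each one is Poisson with mean mu."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in counts)


counts = [0, 1, 1, 2, 3]                       # hypothetical observed counts
candidates = [i / 100 for i in range(1, 501)]  # candidate means 0.01 to 5.00

# The ML estimate is the candidate under which the data are most likely
best_mu = max(candidates, key=lambda mu: poisson_log_likelihood(mu, counts))
print(f"ML estimate of the mean: {best_mu:.2f}")
print(f"Sample mean:             {sum(counts) / len(counts):.2f}")
```

For this simple model the ML estimate coincides with the sample mean, which gives a quick check on the search.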

Testing Hypotheses

Overall and nested models ($\beta_1 = \beta_2 = \dots = \beta_k = 0$)

GLM

• $F$

GZLM

• Likelihood ratio $\chi^2$

Individual slopes ($\beta = 0$)

GLM

• $t$

GZLM

• Wald $\chi^2$ or likelihood ratio $\chi^2$
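The two GZLM test statistics can be sketched as follows. The formulas are the standard textbook definitions rather than anything stated on the slides: the likelihood ratio statistic is -2 times the difference between the log-likelihoods of the reduced and full models, and the Wald statistic for one slope is the squared ratio of the estimate to its standard error. All input values below are hypothetical, and the p-value helper applies only to tests with 1 degree of freedom.

```python
import math


def likelihood_ratio_chi2(ll_reduced, ll_full):
    """LR chi-square comparing a reduced (nested) model to a fuller model."""
    return -2 * (ll_reduced - ll_full)


def wald_chi2(estimate, standard_error):
    """Wald chi-square for a single slope (1 degree of freedom)."""
    return (estimate / standard_error) ** 2


def chi2_p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical log-likelihoods, slope, and standard error
lr = likelihood_ratio_chi2(ll_reduced=-105.0, ll_full=-100.0)
wald = wald_chi2(estimate=0.5, standard_error=0.25)
print(f"LR chi-square   = {lr:.2f}")
print(f"Wald chi-square = {wald:.2f}, p = {chi2_p_value_df1(wald):.3f}")
```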

Estimating DV with GLM

Three ways of expressing the same thing…

$\mu = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$

$\mu = \eta$

• Assumed linear relationship

$\mu$ = Greek letter mu; estimated mean value of DV

$\eta$ = Greek letter eta; linear predictor

Estimating DV with Poisson Regression

$\ln(\mu) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$

$\ln(\mu) = \eta$

• Assumed linear relationship

Single (Quantitative) IV Example

DV = number of foster children adopted

IV = perceived responsibility for parenting (scale scores transformed to z-scores)

N = 285 foster mothers

Do foster mothers who feel a greater responsibility to parent foster children adopt more foster children?

Poisson Model

$\ln(\mu) = \alpha + \beta X$

Log of estimated mean count: $\ln(\mu) = .018 + (.185)(X)$

Log of mean number of children adopted

Does not have intuitive or substantive meaning

Mathematical Functions

Function: $\sqrt{4} = 2$

Inverse (reverse) function: $2^2 = 4$

Mathematical Functions (cont’d)

Function: $\ln(\mu)$, natural logarithm of $\mu$

“Link function”

Inverse (reverse) function: $\exp(\eta)$, exponential of $\eta$

• $e^x$ on calculator
• $\exp(x)$ in SPSS and Excel

“Inverse link function”
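A brief sketch of the function and inverse-function pairs, using Python's standard `math` module. The value 0.203 is used only as an example value for the linear predictor.

```python
import math

# Square root and squaring undo each other
assert math.sqrt(4) == 2.0
assert 2 ** 2 == 4

# The natural log (link) and exp (inverse link) undo each other as well
eta = 0.203               # example value of the linear predictor
mu = math.exp(eta)        # inverse link: back to the scale of the DV
eta_again = math.log(mu)  # link: back to the log scale

print(f"exp({eta}) = {mu:.3f}")
print(f"ln({mu:.3f}) = {eta_again:.3f}")
```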

Link Function

$\ln(\mu)$, log of estimated mean count

Connects (i.e., links) mean value of DV to linear combination of IVs

Transforms relationship between $\mu$ and $\eta$ so relationship is linear

Different GZLM models use different links

Does not have intuitive or substantive meaning

Inverse (Reverse) Link Function

Three ways of expressing the same thing…

$\mu = \exp(\alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k)$

$\mu = \exp(\eta)$

$\mu = e^{\eta}$

Values of $\mu$ represent values of the DV with intuitive and substantive meaning, e.g., mean number of children adopted

Estimated Mean DV

$\ln(\mu) = .018 + (.185)(X)$

X = 0: $.018 + (.185)(0) = .018$; $e^{.018} = 1.018$; M = 1.02 children adopted

X = 1: $.018 + (.185)(1) = .203$; $e^{.203} = 1.225$; M = 1.23 children adopted
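The two calculations above can be reproduced with a few lines of Python. The intercept (.018) and slope (.185) are the values from the slides; the function names are hypothetical.

```python
import math

ALPHA = 0.018  # intercept from the slides
BETA = 0.185   # slope for standardized parenting responsibility


def linear_predictor(x):
    """Log of the estimated mean count for a given z-score x."""
    return ALPHA + BETA * x


def estimated_mean(x):
    """Apply the inverse link (exp) to get the estimated mean count."""
    return math.exp(linear_predictor(x))


for x in (0, 1):
    print(f"X = {x}: ln(mean) = {linear_predictor(x):.3f}, "
          f"mean = {estimated_mean(x):.2f} children adopted")
```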

Examples of Exponentiation

$e^{0} = 1.00$

$e^{.50} = 1.65$

$e^{1.00} = 2.72$

Problem

For discrete DVs the relationship between the DV ($\mu$) and the linear predictor ($\eta$) is non-linear

$\eta = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$

• Non-linear relationship with $\mu$

A one-unit increase in an IV may be associated with a different amount of change in the mean DV, depending on the initial value of the IV

Example Non-linear Relationship

[Figure: mean number of children (vertical axis, 0.00 to 2.00) plotted against standardized parenting responsibility (horizontal axis, -3 to 3)]

Standardized Parenting Responsibility:    -3     -2     -1      0      1      2      3

Mean Number of Children:                0.58   0.70   0.85   1.02   1.23   1.47   1.77
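The plotted values can be regenerated from the fitted Poisson equation, which also shows why the relationship is non-linear on the scale of the mean. This is a sketch using the intercept and slope from the slides; the variable names are hypothetical.

```python
import math

ALPHA = 0.018  # intercept from the slides
BETA = 0.185   # slope from the slides


def estimated_mean(x):
    """Estimated mean number of children adopted at z-score x."""
    return math.exp(ALPHA + BETA * x)


xs = list(range(-3, 4))
means = [estimated_mean(x) for x in xs]
for x, m in zip(xs, means):
    print(f"X = {x:>2}: M = {m:.2f}")

# Change in the mean for each one-unit increase in X
changes = [b - a for a, b in zip(means, means[1:])]
# Ratio of each mean to the one before it
ratios = [b / a for a, b in zip(means, means[1:])]
print("changes:", [round(c, 2) for c in changes])
print("ratios: ", [round(r, 3) for r in ratios])
```

The additive change grows as X increases, while the ratio between adjacent means stays constant at exp(.185), about 1.20.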

Solution

Linear relationship between a linear combination of one or more IVs and some function of the DV

Example Linear Relationship

[Figure: ln(mean number of children) (vertical axis, -0.60 to 0.80) plotted against standardized parenting responsibility (horizontal axis, -3 to 3)]

Standardized Parenting Responsibility:    -3     -2     -1      0      1      2      3

ln(Mean Number of Children):           -0.54  -0.35  -0.17   0.02   0.20   0.39   0.57
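On the log scale the same model is a straight line, which a short check can confirm. The sketch below uses the intercept and slope from the slides and hypothetical variable names; every one-unit step in X changes the log of the mean by the same amount, the slope.

```python
ALPHA = 0.018  # intercept from the slides
BETA = 0.185   # slope from the slides

xs = list(range(-3, 4))
log_means = [ALPHA + BETA * x for x in xs]  # ln(mean) is the linear predictor

for x, lm in zip(xs, log_means):
    print(f"X = {x:>2}: ln(M) = {lm:.2f}")

# Each one-unit increase in X adds the same amount on the log scale
steps = [b - a for a, b in zip(log_means, log_means[1:])]
print("steps:", [round(s, 3) for s in steps])
```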
