Class 6 Qualitative Dependent Variable Models SKEMA Ph.D programme 2010-2011 Lionel Nesta Observatoire Français des Conjonctures Economiques [email protected]

Upload: peregrine-hopkins

Post on 25-Dec-2015


Page 1: Class 6 Qualitative Dependent Variable Models SKEMA Ph.D programme 2010-2011 Lionel Nesta Observatoire Français des Conjonctures Economiques Lionel.nesta@ofce.sciences-po.fr

Class 6

Qualitative Dependent Variable Models

SKEMA Ph.D programme 2010-2011

Lionel Nesta

Observatoire Français des Conjonctures Economiques

Lionel.nesta@ofce.sciences-po.fr


Structure of the class

1. The linear probability model

2. Maximum likelihood estimations

3. Binary logit models and some other models

4. Multinomial models

5. Ordered multinomial models

6. Count data models


The Linear Probability Model


The linear probability model

When the dependent variable is binary (0/1; for example, Y=1 if the firm innovates, 0 otherwise), OLS is called the linear probability model.

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$$

How should one interpret βj? Provided that OLS4, E(u|X)=0, holds true, then:

$$E(Y \mid X) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$

The linear probability model

Y follows a Bernoulli distribution with expected value P. This model is called the linear probability model because its expected value, conditional on X and written E(Y|X), can be interpreted as the conditional probability that Y occurs given the values of X.

$$E(Y \mid X) = \Pr(Y=1 \mid X)$$

$$1 - E(Y \mid X) = \Pr(Y=0 \mid X)$$

β measures the variation of the probability of success for a one-unit variation of X (ΔX=1):

$$\beta = \frac{\Delta E(Y \mid X)}{\Delta X} = \frac{\Delta \Pr(Y=1 \mid X)}{\Delta X}$$

Limits of the linear probability model (1)

Non normality of errors

OLS6: The error term is independent of all RHS variables and follows a normal distribution with zero mean and variance σ²:

$$u \sim \text{Normal}(0, \sigma^2)$$

Since the errors are the complement to unity of the conditional probability, they take only two values for a given X. They follow a Bernoulli distribution, not a normal distribution.

Limits of the linear probability model (1)

Non normality of errors

[Figure: density of the OLS residuals. x-axis: Residuals; y-axis: Density]

Limits of the linear probability model (2)

Heteroskedastic errors

OLS5: The variance of the error term, u, conditional on the RHS variables, is the same for all values of the RHS variables:

$$\text{Var}(u \mid x_1, x_2, \dots, x_k) = \sigma^2$$

The error term is itself distributed Bernoulli, and its variance depends on X. Hence it is heteroskedastic:

$$\text{Var}(u) = P(1-P) = E(Y \mid X)\,\bigl(1 - E(Y \mid X)\bigr)$$
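As a small numerical illustration of why the variance P(1-P) cannot be constant, here is a minimal Python sketch. It is not part of the original slides, and the probabilities are illustrative values chosen for the example.

```python
# Conditional variance of a Bernoulli outcome: Var(u | X) = P (1 - P),
# where P = E(Y | X). The probabilities below are illustrative, not course data.
def bernoulli_variance(p: float) -> float:
    """Variance of a 0/1 variable that equals 1 with probability p."""
    return p * (1.0 - p)


probabilities = [0.1, 0.3, 0.5, 0.7, 0.9]
variances = [bernoulli_variance(p) for p in probabilities]

for p, v in zip(probabilities, variances):
    print(f"P = {p:.1f}  ->  Var(u | X) = {v:.2f}")
```

The variance peaks at P = 0.5 and shrinks towards the extremes, so any regressor that moves P also moves the error variance.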

Limits of the linear probability model (2)

Heteroskedastic errors

[Figure: residuals plotted against fitted values. x-axis: Fitted values; y-axis: Residuals]

Limits of the linear probability model (3)

Fallacious predictions

By definition, a probability always lies in the unit interval [0;1]:

$$0 \le E(Y \mid X) \le 1$$

But OLS does not guarantee this condition:

Predictions may lie outside the bounds [0;1].

The marginal effect is constant, since P = E(Y|X) grows linearly with X. This is not very realistic (e.g. the probability of giving birth conditional on the number of children already born).
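The out-of-range problem can be reproduced with a few lines of Python. This is only a sketch on hypothetical toy data (the `x` and `y` lists are invented for the illustration, not taken from the course dataset), using the closed-form OLS formulas for a single regressor.

```python
# Toy data (hypothetical): x is a firm characteristic, y = 1 if the firm innovates.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form OLS with one regressor:
# slope = Sxy / Sxx, intercept = y_bar - slope * x_bar.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Fitted "probabilities" of the linear probability model.
fitted = [intercept + slope * xi for xi in x]
out_of_range = [round(f, 3) for f in fitted if f < 0 or f > 1]

print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("fitted values outside [0;1]:", out_of_range)
```

On this toy sample the smallest and largest values of x receive a fitted probability below 0 and above 1 respectively, and the slope is the same at every level of x.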

Limits of the linear probability model (3)

Fallacious predictions

[Figure: density of the fitted values, some of which exceed 1. x-axis: Fitted values; y-axis: Density]

Limits of the linear probability model (4)

A downward bias in the coefficient of determination R²

Observed values are either 1 or 0, whereas predictions should lie between 0 and 1: [0;1].

When predicted values are compared with observed values, the goodness of fit as assessed by the R² is systematically low.

Limits of the linear probability model (4)

[Figure: innovation dummy plotted against fitted values, showing fallacious predictions which lower the R². x-axis: Fitted values; y-axis: Dummy innovation]

Limits of the linear probability model (4)

1. Non normality of errors: $u \sim \text{Normal}(0, \sigma^2)$ is violated

2. Heteroskedastic errors: $\text{Var}(u \mid x_1, x_2, \dots, x_k) = \sigma^2$ is violated

3. Fallacious predictions: $0 \le E(Y \mid X) \le 1$ is not guaranteed

4. A downward bias in the R²

Overcoming the limits of the LPM

1. Non normality of errors: increase sample size

2. Heteroskedastic errors: use robust estimators

3. Fallacious predictions: perform non-linear or constrained regressions

4. A downward bias in the R²: do not use it as a measure of goodness of fit

Persistent use of LPM

Although it has limits, the LPM is still used:

1. In the process of data exploration (early stages of the research).

2. As a good indicator of the marginal effect for the representative observation (at the mean).

3. When dealing with very large samples, where least squares can overcome the complications imposed by maximum likelihood techniques: time of computation, endogeneity and panel data problems.


The LOGIT Model

Probability, odds and logit

We need to explain the occurrence of an event: the LHS variable takes two values: y ∈ {0;1}.

In fact, we need to explain the probability of occurrence of the event, conditional on X: P(Y=y | X) ∈ [0;1].

OLS estimations are not adequate, because predictions can lie outside the interval [0;1].

We need to transform a real number, say z ∈ ]-∞;+∞[, into P(Y=y | X) ∈ [0;1].

The logistic transformation links a real number z ∈ ]-∞;+∞[ to P(Y=y | X) ∈ [0;1]. It is also called the link function.

The logit link function

$$\frac{e^{z}}{1+e^{z}} \quad \text{is called the logit link function}$$

Let us make sure that the transformation of z lies between 0 and 1:

$$z \in \left]-\infty;+\infty\right[ \;\Rightarrow\; e^{z} \in \left]0;+\infty\right[ \;\Rightarrow\; \frac{e^{z}}{1+e^{z}} \in \left]0;1\right[ \quad \text{since } e^{z} < 1 + e^{z}$$
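The same check can be done numerically. The sketch below is an illustration added here, not part of the slides; the function name `logistic` and the grid of z values are assumptions made for the example.

```python
import math


def logistic(z: float) -> float:
    """Map a real number z to e^z / (1 + e^z), a value strictly inside (0, 1)."""
    # Two algebraically equivalent branches, chosen so that exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


for z in (-10, -2, 0, 2, 10):
    print(f"z = {z:>3}  ->  P = {logistic(z):.4f}")
```

Whatever the sign or size of z, the result stays between 0 and 1, and z = 0 maps to a probability of one half.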

The logit model

Hence the probability of any event to occur is:

$$P(y=1 \mid z) = \frac{e^{z}}{1+e^{z}}$$

$$P(y=0 \mid z) = 1 - P(y=1 \mid z) = 1 - \frac{e^{z}}{1+e^{z}} = \frac{1}{1+e^{z}}$$

But what is z?

The odds ratio

$$\frac{P}{1-P} = \frac{e^{z}/(1+e^{z})}{1/(1+e^{z})} = e^{z} \quad\Rightarrow\quad \ln\left(\frac{P}{1-P}\right) = z$$

The odds ratio (or simply the odds) is defined as the ratio of the probability to its complement. Taking the log yields z. Hence z is the log transform of the odds ratio.

This has two important characteristics:

1. z ∈ ]-∞;+∞[ and P(Y=1) ∈ [0;1]

2. The probability is not linear in z (the plot linking z with P(Y=1) is not a straight line)

Probability, odds and logit

| P(Y=1) | Odds p(y=1)/(1-p(y=1)) | Odds (decimal) | Ln(odds) |
|---|---|---|---|
| 0.01 | 1/99 | 0.01 | -4.60 |
| 0.03 | 3/97 | 0.03 | -3.48 |
| 0.05 | 5/95 | 0.05 | -2.94 |
| 0.20 | 20/80 | 0.25 | -1.39 |
| 0.30 | 30/70 | 0.43 | -0.85 |
| 0.40 | 40/60 | 0.67 | -0.41 |
| 0.50 | 50/50 | 1.00 | 0.00 |
| 0.60 | 60/40 | 1.50 | 0.41 |
| 0.70 | 70/30 | 2.33 | 0.85 |
| 0.80 | 80/20 | 4.00 | 1.39 |
| 0.95 | 95/5 | 19.0 | 2.94 |
| 0.97 | 97/3 | 32.3 | 3.48 |
| 0.99 | 99/1 | 99.0 | 4.60 |
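The table can be regenerated from the two definitions above. The following Python sketch is an added illustration (the helper names `odds` and `log_odds` are chosen for the example) and prints a subset of the rows.

```python
import math


def odds(p: float) -> float:
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1.0 - p)


def log_odds(p: float) -> float:
    """Logit of p: the natural log of the odds."""
    return math.log(odds(p))


print("P(Y=1)    odds  ln(odds)")
for p in (0.01, 0.20, 0.50, 0.80, 0.99):
    print(f"{p:6.2f}  {odds(p):6.2f}  {log_odds(p):8.2f}")
```

Probabilities that are mirror images around 0.5 (such as 0.20 and 0.80) give log odds of equal size and opposite sign.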

The logit transformation

The preceding table matches levels of probability with the odds ratio.

The probability varies between 0 and 1. The odds vary between 0 and +∞. The log of the odds varies between -∞ and +∞.

Notice that the distribution of the log of the odds is symmetrical.

Logistic probability density distribution

[Figure: logistic probability density. x-axis: Log (Odds ratio), from -10 to 10; y-axis: Density]

“The probability is not linear in z”

[Figure: S-shaped curve of P(y=1 | z) against z. x-axis: z, from -4 to 4; y-axis: P(y=1 | z), from 0 to 1]

The logit link function

The whole trick that can overcome the OLS problem is then to posit:

$$z = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k = X\beta$$

$$\text{Hence } \frac{e^{z}}{1+e^{z}} \text{ is rewritten } \frac{e^{X\beta}}{1+e^{X\beta}}$$

But how can we estimate the above equation knowing that we do not observe z?

Maximum likelihood estimations

OLS cannot be of much help. We will use Maximum Likelihood Estimation (MLE) instead.

MLE is an alternative to OLS. It consists of finding the parameter values that are most consistent with the data we have.

In statistics, the likelihood is defined as the joint probability of observing a given sample, given the parameters involved in the generating function.

One way to distinguish between OLS and MLE is as follows:

OLS adapts the model to the data you have: you only have one model derived from your data.

MLE instead supposes there is an infinity of models, and chooses the model most likely to explain your data.

Likelihood functions

Let us assume that you have a sample of n random observations. Let f(yi) be the probability that yi = 1 or yi = 0. The joint probability of observing the n values of yi is given by the likelihood function:

$$f(y_1, y_2, \dots, y_n) = \prod_{i=1}^{n} f(y_i)$$

We need to specify the function f(.). It comes from the empirical discrete distribution of an event that can have only two outcomes: a success (yi = 1) or a failure (yi = 0). This is the binomial distribution. Hence:

$$f(y_i) = \binom{n}{k} p^{k} (1-p)^{n-k} = \binom{1}{y_i} p^{y_i} (1-p)^{1-y_i} \quad\Rightarrow\quad f(y_i) = p^{y_i} (1-p)^{1-y_i}$$
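To make the idea of "the most likely parameter" concrete, the sketch below evaluates the Bernoulli likelihood of a small sample over a grid of candidate values of p. It is an illustration only: the sample of ten observations is hypothetical and the grid search is used purely for exposition.

```python
def bernoulli_likelihood(sample, p):
    """Joint probability of a 0/1 sample: product of p^y * (1 - p)^(1 - y)."""
    likelihood = 1.0
    for y in sample:
        likelihood *= p ** y * (1.0 - p) ** (1 - y)
    return likelihood


# Hypothetical sample: 7 successes out of 10 observations.
sample = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]

# Candidate values of p on a grid from 0.01 to 0.99.
grid = [k / 100 for k in range(1, 100)]
best_p = max(grid, key=lambda p: bernoulli_likelihood(sample, p))

print("share of successes in the sample:", sum(sample) / len(sample))
print("p with the highest likelihood:   ", best_p)
```

The candidate with the highest likelihood coincides with the sample share of successes, which is what maximum likelihood delivers when there are no regressors.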

Likelihood functions

Knowing p (as the logit), having defined f(.), we come up with the likelihood function:

$$L(y) = \prod_{i=1}^{n} f(y_i) = \prod_{i=1}^{n} p^{y_i} (1-p)^{1-y_i}$$

$$L(y, z) = \prod_{i=1}^{n} f(y_i, z) = \prod_{i=1}^{n} \left(\frac{e^{z}}{1+e^{z}}\right)^{y_i} \left(\frac{1}{1+e^{z}}\right)^{1-y_i}$$

$$L(y, x, \beta) = \prod_{i=1}^{n} f(y_i, X, \beta) = \prod_{i=1}^{n} \left(\frac{e^{X\beta}}{1+e^{X\beta}}\right)^{y_i} \left(\frac{1}{1+e^{X\beta}}\right)^{1-y_i}$$

Log likelihood (LL) functions

The log transform of the likelihood function (the log likelihood) is much easier to manipulate, and is written:

$$LL(y, z) = \sum_{i=1}^{n} y_i z - \sum_{i=1}^{n} \ln\left(1+e^{z}\right)$$

$$LL(y, x, \beta) = \sum_{i=1}^{n} y_i X\beta - \sum_{i=1}^{n} \ln\left(1+e^{X\beta}\right)$$

$$LL(y, x, \beta) = \sum_{i=1}^{n} \left[\, y_i X\beta - \ln\left(1+e^{X\beta}\right) \right]$$
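The log likelihood above translates directly into code. The sketch below is an added illustration: the function names and the five observations (with their index values z) are hypothetical, and the second function exists only to confirm that the log of the product form gives the same number.

```python
import math


def logit_log_likelihood(y, z):
    """LL = sum over i of [ y_i * z_i - ln(1 + e^{z_i}) ]."""
    return sum(yi * zi - math.log1p(math.exp(zi)) for yi, zi in zip(y, z))


def logit_likelihood(y, z):
    """Product form: prod of p_i^{y_i} * (1 - p_i)^{1 - y_i}, p_i = e^{z_i} / (1 + e^{z_i})."""
    likelihood = 1.0
    for yi, zi in zip(y, z):
        p = math.exp(zi) / (1.0 + math.exp(zi))
        likelihood *= p ** yi * (1.0 - p) ** (1 - yi)
    return likelihood


# Hypothetical outcomes and index values (z_i stands for X_i * beta).
y = [1, 0, 1, 1, 0]
z = [0.5, -1.0, 2.0, 0.0, 0.3]

ll = logit_log_likelihood(y, z)
print("log likelihood:           ", round(ll, 6))
print("log of product likelihood:", round(math.log(logit_likelihood(y, z)), 6))
```

Working with the sum rather than the product avoids multiplying many numbers smaller than one, which quickly underflows on real samples.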

Maximum likelihood estimations

The LL function can yield an infinity of values for the parameters β.

Given the functional form of f(.) and the n observations at hand, which values of parameters β maximize the likelihood of my sample?

In other words, what are the most likely values of my unknown parameters β given the sample I have?

Maximum likelihood estimations

$$\frac{\partial LL}{\partial \beta} = \sum_{i=1}^{n} (y_i - \Lambda_i)\, x_i = 0 \quad \text{where} \quad \Lambda_i = \frac{e^{z}}{1+e^{z}}$$

$$\frac{\partial^2 LL}{\partial \beta\, \partial \beta'} = -\sum_{i=1}^{n} \Lambda_i (1 - \Lambda_i)\, x_i x_i'$$

However, there is no analytical solution to this non-linear problem. Instead, we rely on an optimization algorithm (Newton-Raphson).

The LL is globally concave and has a maximum. The gradient is used to compute the parameters of interest, and the Hessian is used to compute the variance-covariance matrix.

You need to imagine that the computer is going to generate all possible values of β, and is going to compute a likelihood value for each (vector of) values, to then choose the (vector of) β such that the likelihood is highest.
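A minimal Newton-Raphson sketch for the simplest case, a logit with a constant only, is given below. It is an added illustration, not the routine Stata runs internally. It uses the sample counts quoted in these slides (352 innovators, 79 non-innovators); with a single parameter the gradient and Hessian are scalars.

```python
import math

# Sample counts quoted in the slides: 352 firms innovate, 79 do not.
n_success, n_failure = 352, 79
n = n_success + n_failure


def log_likelihood(b: float) -> float:
    """Constant-only logit: LL = sum_i [ y_i * b - ln(1 + e^b) ]."""
    return n_success * b - n * math.log1p(math.exp(b))


beta = 0.0  # starting value
for iteration in range(25):
    p = 1.0 / (1.0 + math.exp(-beta))
    gradient = n_success - n * p       # sum_i (y_i - Lambda_i)
    hessian = -n * p * (1.0 - p)       # -sum_i Lambda_i (1 - Lambda_i)
    step = gradient / hessian
    beta -= step                       # Newton-Raphson update
    if abs(step) < 1e-10:
        break

# Variance of the estimate: inverse of minus the Hessian at the optimum.
p_hat = 1.0 / (1.0 + math.exp(-beta))
std_err = math.sqrt(1.0 / (n * p_hat * (1.0 - p_hat)))

print("constant:      ", round(beta, 6))
print("std. error:    ", round(std_err, 7))
print("log likelihood:", round(log_likelihood(beta), 5))
```

The algorithm converges in a handful of iterations to the log of the odds, ln(352/79), and the standard error comes from the Hessian, as described above.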

Example: Binary Dependent Variable

We want to explore the factors affecting the probability of being a successful innovator (inno = 1): Why?

352 (81.7%) innovate and 79 (18.3%) do not.

The odds of carrying out a successful innovation are about 4 against 1 (as 352/79 = 4.46).

The log of the odds is 1.494 (z = 1.494).

For the sample (and the population?) of firms, the probability of being innovative is more than four times higher than the probability of NOT being innovative.

Logistic Regression with STATA

Stata command: logit

logit y x1 x2 x3 … xk [if] [weight] [, options]

Options

noconstant: estimates the model without the constant

robust: estimates robust variances, also in case of heteroskedasticity

if: selects the observations we want to include in the analysis

weight: weights the different observations

Logistic Regression with STATA

Let's start and run a constant-only model: logit inno

. logit inno

Iteration 0:   log likelihood = -205.30803

Logistic regression                      Number of obs   =        431
                                         LR chi2(0)      =       0.00
                                         Prob > chi2     =          .
Log likelihood = -205.30803              Pseudo R2       =     0.0000

        inno |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
       _cons |   1.494183   .1244955   12.00   0.000     1.250177     1.73819

The upper block reports the goodness of fit. The lower block reports the parameter estimates, standard errors and z values.

Interpretation of Coefficients

What does this simple model tell us?

Remember that we need to use the logit formula to transform the logit into a probability:

$$P(Y=1 \mid X) = \frac{e^{X\beta}}{1+e^{X\beta}}$$

Interpretation of Coefficients

The constant 1.494 must be interpreted as the log of the odds ratio.

Using the logit link function, the average probability to innovate is

dis exp(_b[_cons])/(1+exp(_b[_cons]))

$$P = \frac{e^{1.494}}{1+e^{1.494}} = 0.817$$

We find exactly the empirical sample value: 81.7%

Interpretation of Coefficients

A positive coefficient indicates that the probability of innovation success increases with the corresponding explanatory variable.

A negative coefficient implies that the probability to innovate decreases with the corresponding explanatory variable.

Warning! One of the problems encountered in interpreting probabilities is their non-linearity: the probabilities do not vary in the same way according to the level of the regressors.

This is the reason why it is common practice to calculate the probability of the event occurring at the average point of the sample.

Interpretation of Coefficients

Let's run the more complete model: logit inno lrdi lassets spe biotech

. logit inno lrdi lassets spe biotech

Iteration 0:   log likelihood = -205.30803
Iteration 1:   log likelihood = -167.71312
Iteration 2:   log likelihood = -163.57746
Iteration 3:   log likelihood = -163.45376
Iteration 4:   log likelihood = -163.45352

Logistic regression                      Number of obs   =        431
                                         LR chi2(4)      =      83.71
                                         Prob > chi2     =     0.0000
Log likelihood = -163.45352              Pseudo R2       =     0.2039

        inno |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        lrdi |   .7527497   .2110683    3.57   0.000     .3390634    1.166436
     lassets |    .997085   .1368534    7.29   0.000     .7288574    1.265313
         spe |   .4252844   .4204924    1.01   0.312    -.3988654    1.249434
     biotech |   3.799953    .577509    6.58   0.000     2.668056     4.93185
       _cons |  -11.63447   1.937191   -6.01   0.000    -15.43129   -7.837643

Interpretation of Coefficients

$$P = \frac{e^{-11.63 + 0.75\,\text{rdi} + 0.99\,\text{lassets} + 0.43\,\text{spe} + 3.79\,\text{biotech}}}{1 + e^{-11.63 + 0.75\,\text{rdi} + 0.99\,\text{lassets} + 0.43\,\text{spe} + 3.79\,\text{biotech}}}$$

Using the sample mean values of rdi, lassets, spe and biotech, we compute the conditional probability:

$$P = \frac{e^{-11.63 + 0.75\,\overline{\text{rdi}} + 0.99\,\overline{\text{lassets}} + 0.43\,\overline{\text{spe}} + 3.79\,\overline{\text{biotech}}}}{1 + e^{-11.63 + 0.75\,\overline{\text{rdi}} + 0.99\,\overline{\text{lassets}} + 0.43\,\overline{\text{spe}} + 3.79\,\overline{\text{biotech}}}} = \frac{e^{1.953}}{1+e^{1.953}} = 0.8758$$

Marginal Effects

It is often useful to know the marginal effect of a regressor on the probability that the event (innovation) occurs.

As the probability is a non-linear function of the explanatory variables, the change in probability due to a change in one of the explanatory variables is not identical if the other variables are at the average, median or first quartile, etc. level.

prvalue provides the predicted probabilities of a logit model (or any other):

prvalue

prvalue , x(lassets=10) rest(mean)

prvalue , x(lassets=11) rest(mean)

prvalue , x(lassets=12) rest(mean)

prvalue , x(lassets=10) rest(median)

prvalue , x(lassets=11) rest(median)

prvalue , x(lassets=12) rest(median)
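For a continuous regressor, the slope of the logit probability is β·P·(1-P). This derivative is a standard property of the logistic function rather than something stated in the slides, so the Python sketch below should be read as an added illustration. It takes the estimated coefficients and the index value at the sample means (1.953) from the Stata output and computation reported earlier. The dummy variable biotech is left out because a derivative is a poor guide for a 0/1 regressor.

```python
import math

# Logit coefficients of the continuous regressors, from the Stata output above.
coefficients = {"lrdi": 0.7527497, "lassets": 0.997085, "spe": 0.4252844}

# Value of the index X*beta at the sample means, as computed in the slides.
z_at_mean = 1.953

# Predicted probability at the sample means.
p_at_mean = 1.0 / (1.0 + math.exp(-z_at_mean))

# Marginal effect at the mean: dP/dx_j = beta_j * P * (1 - P).
marginal_effects = {
    name: beta * p_at_mean * (1.0 - p_at_mean)
    for name, beta in coefficients.items()
}

print("P at the sample means:", round(p_at_mean, 4))
for name, effect in marginal_effects.items():
    print(f"marginal effect of {name}: {effect:.3f}")
```

These derivative-based effects are close to, but not identical with, effects computed for a discrete one-unit change, because the probability curve bends over a full unit of x.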

Marginal Effects

prchange provides the marginal effect of each of the explanatory variables for most of the variations one may want to consider.

prchange [varlist] [if] [in range] ,x(variables_and_values) rest(stat) fromto

prchange

prchange, fromto

prchange , fromto x(size=10.5) rest(mean)

Goodness of Fit Measures

In ML estimations, there is no such measure as the R².

But the log likelihood can be used to assess the goodness of fit. Note the following:

The higher the number of observations, the lower the joint probability, and the more the LL goes towards -∞.

Given the number of observations, the better the fit, the higher the LL (since it is always negative, the closer to zero it is).

The philosophy is to compare two models by looking at their LL values. One is meant to be the constrained model, the other one is the unconstrained model.

Goodness of Fit Measures

A model is said to be constrained when the observer sets the parameters associated with some variables to zero.

A model is said to be unconstrained when the observer releases this assumption and allows the parameters associated with these variables to be different from zero.

For example, we can compare two models, one with no explanatory variables, one with all our explanatory variables. The one with no explanatory variables implicitly assumes that all parameters are equal to zero. Hence it is the constrained model, because we (implicitly) constrain the parameters to be nil.

The likelihood ratio test (LR test)

The most used measure of goodness of fit in ML estimations is the likelihood ratio. The likelihood ratio statistic is twice the difference between the log likelihood of the unconstrained model and that of the constrained model. This statistic is distributed χ², with degrees of freedom equal to the number of constraints.

If the difference in the LL values is (not) large, it is because the set of explanatory variables brings in (in)significant information. The null hypothesis H0 is that the model brings no significant information, tested as follows:

$$LR = 2\left[\ln L_{unc} - \ln L_{c}\right]$$

High LR values will lead the observer to reject hypothesis H0 and accept the alternative hypothesis Ha that the set of explanatory variables does significantly explain the outcome.

The McFadden Pseudo R2

We also use the McFadden Pseudo R² (1973). Its interpretation is analogous to the OLS R². However, it is biased downward and generally remains low.

The pseudo-R² also compares the unconstrained model with the constrained model, and lies between 0 and 1.

$$\text{Pseudo } R^2_{MF} = \frac{\ln L_{c} - \ln L_{unc}}{\ln L_{c}} = 1 - \frac{\ln L_{unc}}{\ln L_{c}}$$

Goodness of Fit Measures

Constrained model

. logit inno

Iteration 0:   log likelihood = -205.30803

Logistic regression                      Number of obs   =        431
                                         LR chi2(0)      =       0.00
                                         Prob > chi2     =          .
Log likelihood = -205.30803              Pseudo R2       =     0.0000

        inno |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
       _cons |   1.494183   .1244955   12.00   0.000     1.250177     1.73819

Unconstrained model

. logit inno lrdi lassets spe biotech, nolog

Logistic regression                      Number of obs   =        431
                                         LR chi2(4)      =      83.71
                                         Prob > chi2     =     0.0000
Log likelihood = -163.45352              Pseudo R2       =     0.2039

        inno |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        lrdi |   .7527497   .2110683    3.57   0.000     .3390634    1.166436
     lassets |    .997085   .1368534    7.29   0.000     .7288574    1.265313
         spe |   .4252844   .4204924    1.01   0.312    -.3988654    1.249434
     biotech |   3.799953    .577509    6.58   0.000     2.668056     4.93185
       _cons |  -11.63447   1.937191   -6.01   0.000    -15.43129   -7.837643

$$LR = 2\left[\ln L_{unc} - \ln L_{c}\right] = 2 \times \left[-163.45352 - (-205.30803)\right] = 83.71$$

$$\text{Ps. } R^2_{MF} = 1 - \frac{\ln L_{unc}}{\ln L_{c}} = 1 - \frac{163.5}{205.3} = 0.204$$
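Both statistics can be recomputed from the two log likelihoods reported by Stata. The Python sketch below is an added illustration; the helper `chi2_sf_even_df` is a hypothetical name, and it relies on the closed form of the χ² tail probability that holds only for an even number of degrees of freedom (4 here).

```python
import math

# Log likelihoods reported in the Stata output above.
ll_constrained = -205.30803     # constant-only model
ll_unconstrained = -163.45352   # model with lrdi, lassets, spe, biotech
degrees_of_freedom = 4          # number of parameters set to zero under H0

# LR = 2 [ln L_unc - ln L_c]
lr_statistic = 2.0 * (ll_unconstrained - ll_constrained)

# McFadden pseudo R2 = 1 - ln L_unc / ln L_c
pseudo_r2 = 1.0 - ll_unconstrained / ll_constrained


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(chi2_df > x) via the closed form that is valid for even df only."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


p_value = chi2_sf_even_df(lr_statistic, degrees_of_freedom)

print("LR chi2(4):", round(lr_statistic, 2))
print("Pseudo R2: ", round(pseudo_r2, 4))
print("Prob > chi2 is below 0.0001:", p_value < 1e-4)
```

The results agree with the LR chi2(4) and Pseudo R2 lines of the Stata header.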

Other usage of the LR test

The LR test can also be generalized to compare any two models, the constrained one being nested in the unconstrained one.

Any variable which is added to a model can be tested for its explanatory power as follows:

logit [constrained model]

est store [name1]

logit [unconstrained model]

est store [name2]

lrtest name2 name1

Goodness of Fit Measures

LR test on the added variable (biotech)

. logit inno lrdi lassets spe, nolog

Logistic regression                      Number of obs   =        431
                                         LR chi2(3)      =      26.93
                                         Prob > chi2     =     0.0000
Log likelihood = -191.84522              Pseudo R2       =     0.0656

        inno |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        lrdi |   .9275668   .1979951    4.68   0.000     .5395037     1.31563
     lassets |   .3032756   .0792032    3.83   0.000     .1480402    .4585111
         spe |   .3739987   .3800765    0.98   0.325    -.3709376    1.118935
       _cons |  -.4703812   .9313494   -0.51   0.614    -2.295793     1.35503

. est store model1

. logit inno lrdi lassets spe biotech, nolog

Logistic regression                      Number of obs   =        431
                                         LR chi2(4)      =      83.71
                                         Prob > chi2     =     0.0000
Log likelihood = -163.45352              Pseudo R2       =     0.2039

        inno |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        lrdi |   .7527497   .2110683    3.57   0.000     .3390634    1.166436
     lassets |    .997085   .1368534    7.29   0.000     .7288574    1.265313
         spe |   .4252844   .4204924    1.01   0.312    -.3988654    1.249434
     biotech |   3.799953    .577509    6.58   0.000     2.668056     4.93185
       _cons |  -11.63447   1.937191   -6.01   0.000    -15.43129   -7.837643

. est store model2

. lrtest model2 model1

Likelihood-ratio test                    LR chi2(1)  =      56.78
(Assumption: model1 nested in model2)    Prob > chi2 =     0.0000

$$LR = 2\left[\ln L_{unc} - \ln L_{c}\right] = 2 \times \left[-163.45352 - (-191.84522)\right] = 56.78$$
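The lrtest result for the single added variable can be checked by hand. In the added Python sketch below, the p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)), which holds for one degree of freedom only; the variable names are chosen for the illustration.

```python
import math

# Log likelihoods of the two nested models reported above.
ll_model1 = -191.84522   # constrained: lrdi, lassets, spe
ll_model2 = -163.45352   # unconstrained: adds biotech

# LR statistic for the one added variable (1 degree of freedom).
lr_biotech = 2.0 * (ll_model2 - ll_model1)

# For 1 degree of freedom: P(chi2_1 > x) = erfc(sqrt(x / 2)).
p_value_biotech = math.erfc(math.sqrt(lr_biotech / 2.0))

print("LR chi2(1):", round(lr_biotech, 2))
print("Prob > chi2 is below 0.0001:", p_value_biotech < 1e-4)
```

The statistic matches the LR chi2(1) value printed by lrtest, so the restriction that the biotech coefficient is zero is rejected.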

Quality of predictions

Lastly, one can compare the quality of the prediction with the observed outcome variable (dummy variable).

One must assume that when the probability is higher than 0.5, the prediction is that the event will occur (most likely).

One can then compare how good the prediction is with respect to the actual outcome variable.

STATA does this for us:

estat class

Quality of predictions

. estat class

Logistic model for inno

              -------- True --------
Classified |         D            ~D  |      Total
     +     |       337            51  |        388
     -     |        15            28  |         43
   Total   |       352            79  |        431

Classified + if predicted Pr(D) >= .5
True D defined as inno != 0

Sensitivity                     Pr( +| D)   95.74%
Specificity                     Pr( -|~D)   35.44%
Positive predictive value       Pr( D| +)   86.86%
Negative predictive value       Pr(~D| -)   65.12%

False + rate for true ~D        Pr( +|~D)   64.56%
False - rate for true D         Pr( -| D)    4.26%
False + rate for classified +   Pr(~D| +)   13.14%
False - rate for classified -   Pr( D| -)   34.88%

Correctly classified                        84.69%
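The summary rates follow mechanically from the four cells of the classification table. The Python sketch below is an added illustration that recomputes them from the counts reported by estat class; the variable names are chosen for the example.

```python
# Cells of the classification table reported by estat class.
true_positive = 337    # classified +, firm innovates (D)
false_positive = 51    # classified +, firm does not innovate (~D)
false_negative = 15    # classified -, firm innovates (D)
true_negative = 28     # classified -, firm does not innovate (~D)

total = true_positive + false_positive + false_negative + true_negative

sensitivity = true_positive / (true_positive + false_negative)       # Pr(+ | D)
specificity = true_negative / (true_negative + false_positive)       # Pr(- | ~D)
positive_pred_value = true_positive / (true_positive + false_positive)   # Pr(D | +)
negative_pred_value = true_negative / (true_negative + false_negative)   # Pr(~D | -)
correctly_classified = (true_positive + true_negative) / total

for label, value in [
    ("Sensitivity", sensitivity),
    ("Specificity", specificity),
    ("Positive predictive value", positive_pred_value),
    ("Negative predictive value", negative_pred_value),
    ("Correctly classified", correctly_classified),
]:
    print(f"{label:<28}{100 * value:6.2f}%")
```

The high overall rate hides a low specificity: with more than 80% of firms innovating, predicting "innovates" for almost everyone already classifies most observations correctly.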

Other Binary Choice models

The Logit model is only one way of modeling binary choice models.

The Probit model is another way of modeling binary choice models. It is actually more used than the logit model and assumes a normal distribution (not a logistic one) for the z values.

The complementary log-log model is used where the occurrence of the event is very rare, with the distribution of z being asymmetric.

Other Binary Choice models

Probit model

$$\Pr(Y=1 \mid X) = \int_{-\infty}^{X\beta} \frac{e^{-z^2/2}}{\sqrt{2\pi}}\, dz = \int_{-\infty}^{X\beta} \phi(t)\, dt = \Phi(X\beta)$$

Complementary log-log model

$$\Pr(Y=1 \mid X) = 1 - \exp\left[-\exp(X\beta)\right]$$
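The three transformations can be compared side by side. The Python sketch below is an added illustration using only the standard library; the function names are chosen for the example, and the normal CDF is written with the error function.

```python
import math


def logit_cdf(z: float) -> float:
    """Logistic CDF: e^z / (1 + e^z)."""
    return 1.0 / (1.0 + math.exp(-z))


def probit_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), expressed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cloglog_cdf(z: float) -> float:
    """Complementary log-log: 1 - exp(-exp(z))."""
    return 1.0 - math.exp(-math.exp(z))


print("    z   logit  probit  cloglog")
for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"{z:5.1f}  {logit_cdf(z):6.3f}  {probit_cdf(z):6.3f}  {cloglog_cdf(z):7.3f}")
```

Logit and probit are symmetric around z = 0, where both give 0.5, while the complementary log-log curve is not: at z = 0 it already gives about 0.63.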

Likelihood functions and Stata commands

Logit:

$$L(y, x, \beta) = \prod_{i=1}^{n} f(y_i, x, \beta) = \prod_{i=1}^{n} \left(\frac{e^{X\beta}}{1+e^{X\beta}}\right)^{y_i} \left(\frac{1}{1+e^{X\beta}}\right)^{1-y_i}$$

Probit:

$$L(y, x, \beta) = \prod_{i=1}^{n} f(y_i, x, \beta) = \prod_{i=1}^{n} \Phi(X\beta)^{y_i} \left[1 - \Phi(X\beta)\right]^{1-y_i}$$

Log-log comp:

$$L(y, x, \beta) = \prod_{i=1}^{n} f(y_i, x, \beta) = \prod_{i=1}^{n} \left[1 - \exp(-\exp(X\beta))\right]^{y_i} \left[\exp(-\exp(X\beta))\right]^{1-y_i}$$

Example

logit inno rdi lassets spe pharma

probit inno rdi lassets spe pharma

cloglog inno rdi lassets spe pharma

Probability Density Functions

[Figure: probability density functions of the Probit, Logit and Complementary log log transformations. x-axis: x, from -4 to 4; y-axis: y, from 0 to .4]

Cumulative Distribution Functions

[Figure: cumulative distribution functions of the Probit, Logit and Complementary log log transformations. x-axis: x, from -4 to 4; y-axis: y, from 0 to 1]

Comparison of models

| | OLS | Logit | Probit | C log-log |
|---|---|---|---|---|
| Ln(R&D intensity) | 0.110 | 0.752 | 0.422 | 0.354 |
| | [3.90]*** | [3.57]*** | [3.46]*** | [3.13]*** |
| ln(Assets) | 0.125 | 0.997 | 0.564 | 0.493 |
| | [8.58]*** | [7.29]*** | [7.53]*** | [7.19]*** |
| Spe | 0.056 | 0.425 | 0.224 | 0.151 |
| | [1.11] | [1.01] | [0.98] | [0.76] |
| Biotech Dummy | 0.442 | 3.799 | 2.120 | 1.817 |
| | [7.49]*** | [6.58]*** | [6.77]*** | [6.51]*** |
| Constant | -0.843 | -11.634 | -6.576 | -6.086 |
| | [3.91]** | [6.01]*** | [6.12]*** | [6.08]*** |
| Observations | 431 | 431 | 431 | 431 |

Absolute t value in brackets (OLS); z value for the other models.

* 10%, ** 5%, *** 1%

Page 58: Class 6 Qualitative Dependent Variable Models SKEMA Ph.D programme 2010-2011 Lionel Nesta Observatoire Français des Conjonctures Economiques Lionel.nesta@ofce.sciences-po.fr

Comparison of marginal effects

                     OLS      Logit    Probit   C log-log
Ln(R&D intensity)    0.110    0.082    0.090    0.098
ln(Assets)           0.125    0.110    0.121    0.136
Specialisation       0.056    0.046    0.047    0.042
Biotech Dummy        0.442    0.368    0.374    0.379

For the logit, probit and cloglog models, marginal effects have been computed for a one-unit variation (around the mean) of the variable at stake, holding all other variables at their sample mean values.
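To make the notion of a marginal effect concrete, here is a minimal sketch for the logit case. It shows both the derivative-based effect, β·p·(1−p), and a one-unit discrete change centred on the evaluation point. The index value `xb_mean` is a hypothetical assumption, since the sample means are not reported on the slides, so the printed numbers are not meant to reproduce the table.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit_marginal_effect(beta_j, xb):
    """Derivative of P(Y=1|X) with respect to x_j at index xb: beta_j * p * (1 - p)."""
    p = logistic(xb)
    return beta_j * p * (1.0 - p)

def logit_discrete_change(beta_j, xb):
    """Change in P(Y=1|X) for a one-unit move in x_j centred on xb."""
    return logistic(xb + 0.5 * beta_j) - logistic(xb - 0.5 * beta_j)

beta_assets = 0.997   # logit coefficient on ln(Assets) in the comparison of models
xb_mean = -2.0        # hypothetical value of the index at the sample means

print(round(logit_marginal_effect(beta_assets, xb_mean), 3))
print(round(logit_discrete_change(beta_assets, xb_mean), 3))
```

The derivative is largest at an index of zero, where it equals β/4, which is why logit coefficients are much larger than the corresponding marginal effects.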

Multinomial LOGIT Models


Multinomial models

Let us now focus on the case where the dependent variable has several outcomes (i.e. is multinomial). For example, innovative firms may need to collaborate with other organizations. One can code this type of interaction as follows:

- Collaborate with a university (modality 1)
- Collaborate with large incumbent firms (modality 2)
- Collaborate with SMEs (modality 3)
- Do it alone (modality 4)

Or, when studying firm survival:

- Survival (modality 1)
- Liquidation (modality 2)
- Mergers & acquisitions (modality 3)

Multinomial models

One could first perform three logistic regressions as follows:

$$\ln\left[ \frac{P(Y=1 \mid X)}{1 - P(Y=1 \mid X)} \right] = \beta_0^{(1)} + \beta_1^{(1)} x_1 + \dots + \beta_m^{(1)} x_m$$

$$\ln\left[ \frac{P(Y=2 \mid X)}{1 - P(Y=2 \mid X)} \right] = \beta_0^{(2)} + \beta_1^{(2)} x_1 + \dots + \beta_m^{(2)} x_m$$

$$\ln\left[ \frac{P(Y=3 \mid X)}{1 - P(Y=3 \mid X)} \right] = \beta_0^{(3)} + \beta_1^{(3)} x_1 + \dots + \beta_m^{(3)} x_m$$

where 1 = survival, 2 = liquidation, 3 = M&A.

1. Open the file mlogit.dta
2. Estimate, for each type of outcome, the conditional probability of the event for the representative firm, using:
   - time (log_time)
   - size (log_labour)
   - firm age (entry_age)
   - spin-out (spin_out)
   - cohort (cohort_*)

The need for multinomial models

Estimating the three logistic regressions separately,

$$\ln\left[ \frac{P(Y=1 \mid X)}{1 - P(Y=1 \mid X)} \right] = \beta_0^{(1)} + \beta_1^{(1)} x_1 + \dots + \beta_m^{(1)} x_m$$

$$\ln\left[ \frac{P(Y=2 \mid X)}{1 - P(Y=2 \mid X)} \right] = \beta_0^{(2)} + \beta_1^{(2)} x_1 + \dots + \beta_m^{(2)} x_m$$

$$\ln\left[ \frac{P(Y=3 \mid X)}{1 - P(Y=3 \mid X)} \right] = \beta_0^{(3)} + \beta_1^{(3)} x_1 + \dots + \beta_m^{(3)} x_m$$

yields the following predicted probabilities:

$$P(Y=1 \mid X) = 0.8771$$

$$P(Y=2 \mid X) = 0.0398$$

$$P(Y=3 \mid X) = 0.0679$$

$$\sum_{k} P(Y=k \mid X) = 0.9848 \neq 1$$
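The arithmetic behind this point can be checked directly. The short sketch below simply adds the three predicted probabilities reported on the slide; the variable names are illustrative.

```python
# Predicted probabilities from three separately estimated binary logits
# (values reported on the slide for the representative firm).
p_survival = 0.8771
p_liquidation = 0.0398
p_merger = 0.0679

total = p_survival + p_liquidation + p_merger
print(round(total, 4))          # 0.9848
print(abs(total - 1.0) < 1e-9)  # False: the probabilities do not sum to one
```

Nothing in three separate regressions forces the fitted probabilities to be coherent across outcomes, which motivates a model that estimates them jointly.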

Multinomial models

First, the sum of all conditional probabilities should add up to unity:

$$\sum_{j=0}^{k} P(Y=j \mid X) = 1$$

Second, for k outcomes, we need to estimate only (k-1) modalities. Hence:

$$P(Y=0 \mid X) = 1 - \sum_{j=1}^{k} P(Y=j \mid X)$$

Multinomial logit models

Third, the multinomial model is a simultaneous (as opposed to sequential) estimation model comparing the odds of each modality with respect to all others. With three outcomes, we have:

$$\ln\left[ \frac{P(Y=1 \mid X)}{P(Y=0 \mid X)} \right] = \beta_0^{(1|0)} + \beta_1^{(1|0)} x_1 + \dots + \beta_m^{(1|0)} x_m$$

$$\ln\left[ \frac{P(Y=2 \mid X)}{P(Y=0 \mid X)} \right] = \beta_0^{(2|0)} + \beta_1^{(2|0)} x_1 + \dots + \beta_m^{(2|0)} x_m$$

$$\ln\left[ \frac{P(Y=1 \mid X)}{P(Y=2 \mid X)} \right] = \beta_0^{(1|2)} + \beta_1^{(1|2)} x_1 + \dots + \beta_m^{(1|2)} x_m$$

Multinomial logit models

Note that there is redundancy, since:

$$\ln\left[ \frac{P(Y=1 \mid X)}{P(Y=0 \mid X)} \right] - \ln\left[ \frac{P(Y=2 \mid X)}{P(Y=0 \mid X)} \right] = \ln\left[ \frac{P(Y=1 \mid X)}{P(Y=2 \mid X)} \right]$$

With

$$\ln\left[ \frac{P(Y=1 \mid X)}{P(Y=0 \mid X)} \right] = x\beta^{(1|0)}; \quad \ln\left[ \frac{P(Y=2 \mid X)}{P(Y=0 \mid X)} \right] = x\beta^{(2|0)}; \quad \ln\left[ \frac{P(Y=1 \mid X)}{P(Y=2 \mid X)} \right] = x\beta^{(1|2)}$$

this implies:

$$x\beta^{(1|0)} - x\beta^{(2|0)} = x\beta^{(1|2)}$$

Fourth, the multinomial logit model estimates (k - 1) outcomes with the following constraint:

$$\beta^{(1|0)} - \beta^{(2|0)} = \beta^{(1|2)}$$
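The redundancy can be verified numerically. The sketch below assumes the usual multinomial logit probabilities with outcome 0 as base and two hypothetical index values; it checks that the log odds of outcome 1 against outcome 2 equal the difference of the two indices measured against the base.

```python
import math

# Hypothetical linear indices x*beta for outcomes 1 and 2, relative to base 0.
xb_1_vs_0 = 0.7
xb_2_vs_0 = -0.4

denom = 1.0 + math.exp(xb_1_vs_0) + math.exp(xb_2_vs_0)
p0 = 1.0 / denom
p1 = math.exp(xb_1_vs_0) / denom
p2 = math.exp(xb_2_vs_0) / denom

log_odds_1_vs_2 = math.log(p1 / p2)
print(round(log_odds_1_vs_2, 6))
print(round(xb_1_vs_0 - xb_2_vs_0, 6))
```

Both printed values coincide, so the third equation carries no information beyond the first two.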

Multinomial logit models

With k outcomes, the probability of occurrence of event j reads:

$$P(Y=j \mid X) = \frac{e^{x\beta^{(j|0)}}}{\sum_{l=0}^{k} e^{x\beta^{(l|0)}}}$$

By convention, outcome 0 is the base outcome.

Multinomial logit models

Note that, for all j:

$$x\beta^{(j|j)} = \ln\left[ \frac{P(Y=j \mid X)}{P(Y=j \mid X)} \right] = \ln(1) = 0$$

In particular $x\beta^{(0|0)} = 0$, so that $e^{x\beta^{(0|0)}} = 1$ and the probabilities become:

$$P(Y=j \mid X) = \frac{e^{x\beta^{(j|0)}}}{\sum_{l=0}^{k} e^{x\beta^{(l|0)}}} = \frac{e^{x\beta^{(j|0)}}}{1 + \sum_{l=1}^{k} e^{x\beta^{(l|0)}}}$$

$$P(Y=0 \mid X) = \frac{1}{1 + \sum_{l=1}^{k} e^{x\beta^{(l|0)}}}$$
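As an illustration of these formulas, the following sketch computes the full set of outcome probabilities from the indices measured against the base outcome. The function name and the index values are hypothetical.

```python
import math

def mlogit_probabilities(xb):
    """Return [P(Y=0|X), P(Y=1|X), ..., P(Y=k|X)].

    xb lists the indices x*beta^(j|0) for j = 1..k; the base outcome 0
    has index 0 because beta^(0|0) = 0.
    """
    exp_terms = [1.0] + [math.exp(v) for v in xb]   # e^0 = 1 for the base
    denom = sum(exp_terms)
    return [t / denom for t in exp_terms]

# Hypothetical indices for liquidation and M&A relative to survival.
probs = mlogit_probabilities([-3.0, -2.5])
print([round(p, 4) for p in probs])
print(round(sum(probs), 10))
```

By construction the probabilities sum to one, which is exactly what the three separate logistic regressions could not guarantee.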

Binomial logit as multinomial logit

Let us rewrite the probability of the event Y=1:

$$P(Y=1 \mid X) = \frac{e^{x\beta}}{1 + e^{x\beta}}$$

$$P(Y=1 \mid X) = \frac{e^{x\beta^{(1|0)}}}{1 + e^{x\beta^{(1|0)}}} = \frac{e^{x\beta^{(1|0)}}}{e^{x\beta^{(0|0)}} + e^{x\beta^{(1|0)}}} = \frac{e^{x\beta^{(1|0)}}}{\sum_{k=0,1} e^{x\beta^{(k|0)}}}$$

The binomial logit is a special case of the multinomial logit where only two outcomes are being analyzed.

Likelihood functions

Let us assume that you have a sample of n random observations. Let f(y_i) be the probability that y_i = j. The joint probability of observing the n values of y_i is given by the likelihood function:

$$f(y_1, y_2, \dots, y_n) = \prod_{i=1}^{n} f(y_i)$$

We need to specify the function f(.). It comes from the empirical discrete distribution of an event that can have several outcomes. This is the multinomial distribution. Hence:

$$f(y_i) = p_0^{dY_i^0} \times p_1^{dY_i^1} \times \dots \times p_j^{dY_i^j} \times \dots \times p_k^{dY_i^k} = \prod_{j \in K} p_j^{dY_i^j}$$

where $dY_i^j = 1$ if $y_i = j$ and 0 otherwise.

The maximum likelihood function

The likelihood function reads:

$$L(y) = \prod_{i=1}^{n} f(y_i) = \prod_{i=1}^{n} \prod_{j=0}^{k} p_j^{dY_i^j}$$

$$L(y) = \prod_{i=1}^{n} f\left(y_i, x_i, \beta^{(j|0)}\right) = \prod_{i=1}^{n} \left[ \left( \frac{1}{1 + \sum_{l=1}^{k} e^{x_i\beta^{(l|0)}}} \right)^{dY_i^0} \prod_{j=1}^{k} \left( \frac{e^{x_i\beta^{(j|0)}}}{1 + \sum_{l=1}^{k} e^{x_i\beta^{(l|0)}}} \right)^{dY_i^j} \right]$$

The maximum likelihood function

The log transform of the likelihood yields:

$$LL(y, x, \beta) = \sum_{i=1}^{n} \left[ dY_i^0 \ln\left( \frac{1}{1 + \sum_{l=1}^{k} e^{x_i\beta^{(l|0)}}} \right) + \sum_{j=1}^{k} dY_i^j \ln\left( \frac{e^{x_i\beta^{(j|0)}}}{1 + \sum_{l=1}^{k} e^{x_i\beta^{(l|0)}}} \right) \right]$$

$$LL(y, x, \beta) = \sum_{i=1}^{n} \left[ -dY_i^0 \ln\left( 1 + \sum_{l=1}^{k} e^{x_i\beta^{(l|0)}} \right) + \sum_{j=1}^{k} dY_i^j \left( x_i\beta^{(j|0)} - \ln\left( 1 + \sum_{l=1}^{k} e^{x_i\beta^{(l|0)}} \right) \right) \right]$$

Because exactly one indicator $dY_i^j$ equals 1 for each observation, this simplifies to:

$$LL(y, x, \beta) = \sum_{i=1}^{n} \left[ \sum_{j=1}^{k} dY_i^j \, x_i\beta^{(j|0)} - \ln\left( 1 + \sum_{l=1}^{k} e^{x_i\beta^{(l|0)}} \right) \right]$$
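A minimal sketch of this log likelihood is given below, assuming outcome 0 is the base and that the indices for the other outcomes are already computed. The data are hypothetical; an actual estimation would maximise this function over the coefficient vectors.

```python
import math

def mlogit_loglik(y, xb):
    """Multinomial logit log likelihood.

    y[i] is the observed outcome (0 = base) and xb[i][j-1] is the index
    x_i*beta^(j|0) for outcome j = 1..k.
    """
    total = 0.0
    for yi, row in zip(y, xb):
        log_denom = math.log(1.0 + sum(math.exp(v) for v in row))
        index = 0.0 if yi == 0 else row[yi - 1]   # the base outcome has index 0
        total += index - log_denom
    return total

# Hypothetical sample: three firms, outcomes in {0, 1, 2}.
y = [0, 2, 1]
xb = [[-1.0, -2.0], [0.2, 0.9], [0.5, -0.3]]
print(round(mlogit_loglik(y, xb), 4))
```

Each observation contributes the index of its observed outcome minus the log of the common denominator, so the denominator enters once per observation.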

Multinomial logit models

Stata Instruction : mlogit

mlogit y x1 x2 x3 … xk [if] [weight] [, options]

Options : noconstant : omits the constant

robust : controls for heteroskedasticity

if : select observations

weight : weights observations


Multinomial logit models

use mlogit.dta, clear
mlogit type_exit log_time log_labour entry_age entry_spin cohort_*

The Stata output reports:

- the base outcome, chosen by Stata as the outcome with the highest empirical frequency
- goodness of fit
- parameter estimates, standard errors and z values

Interpretation of coefficients

The interpretation of coefficients always refers to the base category.

Does the probability of being bought out decrease over time?

No! Relative to survival, the probability of being bought out decreases over time.

Interpretation of coefficients

The interpretation of coefficients always refers to the base category.

Is the probability of being bought out lower for spin-offs?

No! Relative to survival, the probability of being bought out is lower for spin-offs.

Interpretation of coefficients

Relative to liquidation, the probability of being bought out is higher for spin-offs:

$$\beta^{(1|0)} - \beta^{(2|0)} = \beta^{(1|2)} \qquad \beta^{(2|0)} - \beta^{(1|0)} = \beta^{(2|1)}$$

lincom [boughtout]entry_spin - [death]entry_spin

Changing base outcome

mcross provides other estimates by changing the base outcome.

Mind the new base outcome!

Being bought out relative to liquidation: relative to liquidation, the probability of being bought out is higher for spin-offs.

Changing base outcome

mcross provides other estimates by changing the base outcome.

And we observe the same results as before.

Independence of irrelevant alternatives - IIA

The model assumes that the odds between each pair of outcomes are independent of all other alternatives. In other words, the other alternatives are irrelevant.

From a statistical viewpoint, this is tantamount to assuming independence of the error terms across pairs of alternatives.

A simple way to test the IIA property is to estimate the model after removing one modality (the restricted model), and to compare its parameters with those of the complete model:

- If IIA holds, the parameters should not change significantly
- If IIA does not hold, the parameters should change significantly

Independence of irrelevant alternatives - IIA

H0: The IIA property is valid

H1: The IIA property is not valid

$$H = \left( \hat{\beta}_R - \hat{\beta}_C^{*} \right)' \left[ \mathrm{var}(\hat{\beta}_R) - \mathrm{var}(\hat{\beta}_C^{*}) \right]^{-1} \left( \hat{\beta}_R - \hat{\beta}_C^{*} \right)$$

The H statistic (H stands for Hausman) follows a χ² distribution with M degrees of freedom (M being the number of parameters).
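The quadratic form can be illustrated for the two-parameter case, where the matrix inverse can be written out by hand. Everything below is a sketch under stated assumptions: the estimates and covariance matrices are made-up numbers, and the p-value uses the closed-form χ² survival function that holds for two degrees of freedom only.

```python
import math

def hausman_2x2(b_r, b_c, v_r, v_c):
    """Hausman statistic for two parameters.

    b_r, b_c: coefficient pairs from the restricted and complete models.
    v_r, v_c: their 2x2 covariance matrices as nested lists.
    """
    d = [b_r[0] - b_c[0], b_r[1] - b_c[1]]
    m = [[v_r[i][j] - v_c[i][j] for j in range(2)] for i in range(2)]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    inv = [[m[1][1] / det, -m[0][1] / det],
           [-m[1][0] / det, m[0][0] / det]]
    return sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))

# Hypothetical estimates, for illustration only.
b_restricted = [0.50, -0.20]
b_complete = [0.45, -0.10]
v_restricted = [[0.010, 0.001], [0.001, 0.008]]
v_complete = [[0.006, 0.000], [0.000, 0.005]]

h = hausman_2x2(b_restricted, b_complete, v_restricted, v_complete)
p_value = math.exp(-h / 2.0)   # chi-square survival function with 2 d.o.f.
print(round(h, 3), round(p_value, 3))
```

A small p-value would lead to rejecting H0, i.e. to concluding that the IIA property does not hold.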

STATA application: the IIA test

H0: The IIA property is valid

H1: The IIA property is not valid

mlogtest, hausman

Omitted variable


Application of the IIA test

mlogtest, hausman

We compare the parameters of the model “liquidation relative to bought-out”, estimated simultaneously with “survival relative to bought-out”, with the parameters of the model “liquidation relative to bought-out” estimated without “survival relative to bought-out”.

H0: The IIA property is valid

H1: The IIA property is not valid

Application of the IIA test

mlogtest, hausman

The conclusion is that the outcome survival significantly alters the choice between liquidation and bought-out.

In fact, for a company, being bought out must be seen as a way to remain active at the cost of losing control over economic decisions, notably investment.

H0: The IIA property is valid

H1: The IIA property is not valid

Ordered Multinomial LOGIT Models


Ordered multinomial models

Let us now concentrate on the case where the dependent variable is a discrete integer which indicates an intensity. Opinion surveys make extensive use of such so-called Likert scales:

- Obstacles to innovation (scale of 1 to 5)
- Intensity of collaboration (scale of 1 to 5)
- Marketing surveys (Dislike (1) to Like (7))
- Student grades
- Opinion tests
- Etc.

Ordered multinomial models

Such variables depict a vertical (quantitative) scale, so that one can think of them as describing the interval in which an unobserved latent variable y* lies:

$$\begin{aligned}
y_i &= 1 \quad \text{if } y_i^{*} \le \alpha_1 \\
y_i &= 2 \quad \text{if } \alpha_1 < y_i^{*} \le \alpha_2 \\
y_i &= 3 \quad \text{if } \alpha_2 < y_i^{*} \le \alpha_3 \\
&\;\;\vdots \\
y_i &= k \quad \text{if } y_i^{*} > \alpha_{k-1}
\end{aligned}$$

where the α_j are unknown bounds to be estimated.

Ordered multinomial models

We assume that the latent variable y* is a linear combination of the set of all explanatory variables:

$$y_i^{*} = x_i\beta + u_i$$

where u_i follows a cumulative distribution function F(.). The probability of each outcome y (y ≠ y*) is then given by the cdf F(.). Let us look at the probability that y = 1, taking F to be the logistic cdf Λ:

$$\begin{aligned}
P(y_i = 1) &= P(y_i^{*} \le \alpha_1) \\
&= P(x_i\beta + u_i \le \alpha_1) \\
&= P(u_i \le \alpha_1 - x_i\beta) \\
&= \Lambda(\alpha_1 - x_i\beta) = \frac{e^{\alpha_1 - x_i\beta}}{1 + e^{\alpha_1 - x_i\beta}}
\end{aligned}$$

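Ordered multinomial models

The probability that y = 2 is:

$$\begin{aligned}
P(y_i = 2) &= P(y_i^{*} \le \alpha_2) - P(y_i^{*} \le \alpha_1) \\
&= \Lambda(\alpha_2 - x_i\beta) - \Lambda(\alpha_1 - x_i\beta) = \frac{e^{\alpha_2 - x_i\beta}}{1 + e^{\alpha_2 - x_i\beta}} - \frac{e^{\alpha_1 - x_i\beta}}{1 + e^{\alpha_1 - x_i\beta}}
\end{aligned}$$

Altogether we have:

$$\begin{aligned}
P(Y = 1) &= \Lambda(\alpha_1 - x_i\beta) \\
P(Y = 2) &= \Lambda(\alpha_2 - x_i\beta) - \Lambda(\alpha_1 - x_i\beta) \\
P(Y = 3) &= \Lambda(\alpha_3 - x_i\beta) - \Lambda(\alpha_2 - x_i\beta) \\
&\;\;\vdots \\
P(Y = k) &= 1 - \Lambda(\alpha_{k-1} - x_i\beta)
\end{aligned}$$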
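These probabilities are differences of the logistic cdf evaluated at consecutive cutoffs, which the sketch below makes explicit. The function name, the index and the cutoff values are hypothetical.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def ologit_probabilities(xb, cutoffs):
    """Return [P(Y=1), ..., P(Y=k)] for index xb and ordered cutoffs alpha_1..alpha_{k-1}."""
    cdf = [0.0] + [logistic(a - xb) for a in cutoffs] + [1.0]
    return [cdf[j] - cdf[j - 1] for j in range(1, len(cdf))]

# Hypothetical index and cutoffs for a four-point scale.
probs = ologit_probabilities(xb=0.3, cutoffs=[-1.0, 0.5, 2.0])
print([round(p, 4) for p in probs])
print(round(sum(probs), 10))
```

Padding the list of cdf values with 0 and 1 handles the first and last categories without special cases.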

Probability in an ordered model

[Figure: density of u_i (vertical axis from 0 to 0.45), partitioned by the cutoffs α_1 − x_iβ, α_2 − x_iβ, α_3 − x_iβ, …, α_{k−1} − x_iβ into the regions y=1, y=2, y=3, …, y=k.]

The likelihood function

The likelihood function is:

$$L(y, x, \beta, \alpha) = \prod_{i=1}^{n} \prod_{j=1}^{k} \left[ F(\alpha_j - x_i\beta) - F(\alpha_{j-1} - x_i\beta) \right]^{dY_i^j}$$

with

$$F(\alpha_0 - x_i\beta) = 0 \quad \text{and} \quad F(\alpha_k - x_i\beta) = 1$$

The likelihood function

If u_i follows a logistic distribution, the (log) likelihood function reads:

$$L(y, x, \beta, \alpha) = \prod_{i=1}^{n} \prod_{j=1}^{k} \left[ \frac{e^{\alpha_j - x_i\beta}}{1 + e^{\alpha_j - x_i\beta}} - \frac{e^{\alpha_{j-1} - x_i\beta}}{1 + e^{\alpha_{j-1} - x_i\beta}} \right]^{dY_i^j}$$

and so

$$LL(y, x, \beta, \alpha) = \sum_{i=1}^{n} \sum_{j=1}^{k} dY_i^j \ln\left[ \frac{e^{\alpha_j - x_i\beta}}{1 + e^{\alpha_j - x_i\beta}} - \frac{e^{\alpha_{j-1} - x_i\beta}}{1 + e^{\alpha_{j-1} - x_i\beta}} \right]$$

Ordered multinomial logit models

Stata Instruction : ologit

ologit y x1 x2 x3 … xk [if] [weight] [, options]

Options : noconstant : omits the constant

robust : controls for heteroskedasticity

if : select observations

weight : weights observations


Ordered multinomial models

use est_var_qual.dta, clear
ologit innovativeness size rdi spe biotech

The Stata output reports:

- goodness of fit
- estimated parameters
- cutoff points

Interpretation of coefficients

A positive (negative) sign indicates a positive (negative) relationship between the independent variable and the order (or rank).

How does one interpret the cutoff values? The model is:

$$Score_i = x_i\beta + u_i$$

What is then the probability that Y = 1, P(Y = 1)? It is the probability that the score is below the first cutoff point:

$$\begin{aligned}
P(y_i = 1) &= P(x_i\beta + u_i \le \alpha_1) \\
&= P(270.5 + u_i \le 268.6) \\
&= P(u_i \le -1.9) = \frac{e^{-1.95}}{1 + e^{-1.95}} = .1245
\end{aligned}$$

Interpretation of coefficients

What is the probability that Y = 2, P(Y = 2)?

$$P(Y = 2) = F(\alpha_2 - x_i\beta) - F(\alpha_1 - x_i\beta)$$

The first cumulative probability uses the first cutoff, as before:

$$P(y_i \le 1) = P(270.5 + u_i \le 268.6) = P(u_i \le -1.9) = .1245$$

The second uses the second cutoff:

$$P(y_i \le 2) = P(x_i\beta + u_i \le \alpha_2) = P(270.5 + u_i \le 269.3) = P(u_i \le -1.2) = .2321$$

Hence:

$$P(Y = 2) = .2321 - .1245 = .1076$$
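The worked example can be reproduced with a few lines of arithmetic. The sketch below evaluates the logistic cdf at the exponent shown on the slide (−1.95) and then takes the difference between the two cumulative probabilities reported there; nothing else is assumed.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# First cutoff: the slide evaluates the logistic cdf at -1.95.
p_y1 = logistic(-1.95)
print(round(p_y1, 3))            # 0.125

# Cumulative probabilities reported on the slide.
p_le_1 = 0.1245
p_le_2 = 0.2321
p_y2 = p_le_2 - p_le_1
print(round(p_y2, 4))            # 0.1076
```

The same subtraction applied to successive cutoffs gives the probability of every intermediate category.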

STATA computation of predicted probabilities

prvalue computes the predicted probabilities.

Count Data Models Part 1

The Poisson Model


Count data models

Let us now focus on outcomes that count the number of occurrences of a given event, for example the number of innovations, patents or inventions.

Again, OLS fails to meet the constraint that the prediction must be nil or positive. To explain count variables, we assume that the dependent variable follows a Poisson distribution.

Poisson models

Let Y be a random count variable. The probability that Y is equal to the integer y_i is given by the Poisson probability distribution:

$$P(Y = y_i) = \frac{e^{-\lambda_i} \lambda_i^{y_i}}{y_i!}, \quad y_i = 0, 1, 2, \dots$$

with

$$E(Y) = \mathrm{var}(Y) = \lambda_i$$

To introduce the set of explanatory variables in the model, we condition λ_i and impose the following log-linear form:

$$\lambda_i = e^{x_i\beta} \quad \Longleftrightarrow \quad \ln \lambda_i = x_i\beta$$
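The equality of mean and variance can be checked numerically from the probability function. In the sketch below the index value is hypothetical and the infinite support is truncated at 200, which is far enough into the tail for the chosen λ.

```python
import math

def poisson_pmf(y, lam):
    """P(Y = y) = exp(-lam) * lam**y / y!  (computed in logs for stability)."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

# Hypothetical index x*beta; the log-linear form gives lambda = exp(x*beta).
xb = 1.2
lam = math.exp(xb)

support = range(200)
total = sum(poisson_pmf(y, lam) for y in support)
mean = sum(y * poisson_pmf(y, lam) for y in support)
var = sum((y - mean) ** 2 * poisson_pmf(y, lam) for y in support)
print(round(total, 6), round(mean, 4), round(var, 4), round(lam, 4))
```

The exponential link guarantees that λ is positive for any value of the index, which is the property OLS lacks.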

Poisson distributions

[Figure: Poisson probability distributions over counts 0 to 20 for lambda = 0.8, 1.5, 2.9 and 10.5 (vertical axis from 0 to 0.5).]

The likelihood function

The (log) likelihood function reads:

$$L(y, x, \beta) = \prod_{i=1}^{n} \frac{e^{-\lambda_i} \lambda_i^{y_i}}{y_i!}$$

and so

$$LL(y, x, \beta) = \sum_{i=1}^{n} \left[ y_i x_i\beta - e^{x_i\beta} - \ln(y_i!) \right]$$
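As a minimal sketch of how this log likelihood is maximised, consider an intercept-only model, for which the maximum is known in closed form: the estimated intercept is the log of the sample mean. The counts below are hypothetical, and the Newton-Raphson loop is a simplified stand-in for the optimiser used by statistical software.

```python
import math

def poisson_loglik(y, xb):
    """LL = sum_i [ y_i * xb_i - exp(xb_i) - ln(y_i!) ]."""
    return sum(yi * z - math.exp(z) - math.lgamma(yi + 1) for yi, z in zip(y, xb))

# Hypothetical patent counts; intercept-only model, so xb_i = b0 for every firm.
y = [0, 2, 1, 7, 3, 0, 5]

b0 = 0.0
for _ in range(50):                      # Newton-Raphson on b0
    gradient = sum(yi - math.exp(b0) for yi in y)
    hessian = -len(y) * math.exp(b0)
    b0 -= gradient / hessian

print(round(b0, 6), round(math.log(sum(y) / len(y)), 6))
print(round(poisson_loglik(y, [b0] * len(y)), 4))
```

With regressors the same iteration applies, with the gradient and Hessian becoming a vector and a matrix.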

Poisson models

Stata Instruction : poisson

poisson y x1 x2 x3 … xk [if] [weight] [, options]

Options : noconstant : omits the constant

robust : controls for heteroskedasticity

if : select observations

weight : weights observations


Poisson models

use est_var_qual.dta, clear

. poisson patent lrdi lassets spe biotech

Iteration 0:   log likelihood = -3549.9316
Iteration 1:   log likelihood = -3549.8433
Iteration 2:   log likelihood = -3549.8433

Poisson regression                    Number of obs =     431
                                      LR chi2(4)    = 1766.11
                                      Prob > chi2   =  0.0000
Log likelihood = -3549.8433           Pseudo R2     =  0.1992

patent     Coef.       Std. Err.   z        P>|z|   [95% Conf. Interval]
lrdi       .699484     .0326467    21.43    0.000   .6354977    .7634704
lassets    .4705428    .0133588    35.22    0.000   .4443601    .4967256
spe        .7623891    .0441729    17.26    0.000   .6758118    .8489664
biotech    1.073271    .051571     20.81    0.000   .9721939    1.174349
_cons      -3.12651    .1971912    -15.86   0.000   -3.512997   -2.740022

The header of the output reports the goodness of fit; the table reports the estimated parameters.

Interpretation of coefficients

If variables are entered in logs, one can interpret the coefficients as elasticities:

$$\ln \lambda_i = x_i\beta \;\Rightarrow\; \beta = \frac{\partial \ln \lambda}{\partial x} = \frac{\partial \lambda}{\partial x} \frac{1}{\lambda}; \qquad \text{with } x = \ln X, \quad \beta = \frac{\partial \ln \lambda}{\partial \ln X} = \frac{\partial \lambda}{\partial X} \frac{X}{\lambda}$$

A 1% increase in firm size is associated with a .47% increase in the expected number of patents (lassets: .4705428).

Interpretation of coefficients

If variables are entered in logs, one can interpret the coefficients as elasticities.

A 1% increase in R&D investment is associated with a .70% increase in the expected number of patents (lrdi: .699484).

Interpretation of coefficients

If variables are not entered in logs, the interpretation changes:

$$100 \times (e^{\beta} - 1)$$

A one-point rise in the degree of specialisation is associated with a 114% increase in the expected number of patents (spe: .7623891).

Interpretation of coefficients

For dummy variables, the interpretation follows the same transformation, 100 × (e^β − 1).

Biotechnology firms have an expected number of patents which is 192% higher than that of pharmaceutical companies (biotech: 1.073271).
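The transformation 100 × (e^β − 1) is easy to apply to the estimated Poisson coefficients. The sketch below uses the coefficients on spe and biotech from the Poisson output; the helper name is illustrative.

```python
import math

def percent_change(beta):
    """Percentage change in the expected count for a one-unit increase: 100*(e^beta - 1)."""
    return 100.0 * (math.exp(beta) - 1.0)

# Poisson coefficients from the estimation output.
print(round(percent_change(0.7623891), 1))   # spe
print(round(percent_change(1.073271), 1))    # biotech
```

For small coefficients the transformation is close to 100 × β, but for coefficients of this size the two differ substantially, so the exponential form should be used.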

Interpretation of coefficients

All variables are very significant … but … the Poisson model imposes:

$$E(Y) = \mathrm{var}(Y)$$

Variable    Obs    Mean        Std. Dev.    Min    Max
patent      431    10.83295    17.622       0      202

Here the sample variance (17.622² ≈ 310.5) is far larger than the sample mean (10.83).
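A quick, informal check of the equidispersion assumption compares the sample variance with the sample mean. The sketch below uses the summary statistics reported for `patent`; it is a descriptive diagnostic on the unconditional moments, not a formal test.

```python
# Summary statistics for `patent` reported on the slide.
mean = 10.83295
std_dev = 17.622

variance = std_dev ** 2
dispersion_ratio = variance / mean   # equals 1 under the Poisson assumption
print(round(variance, 1), round(dispersion_ratio, 1))
```

A ratio far above one suggests overdispersion, although the Poisson assumption strictly concerns the variance conditional on the regressors.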

Count Data Models Part 2

Negative Binomial Models

Negative binomial models

Generally, the Poisson model is not valid, due to the presence of overdispersion in the data. This violates the assumption of equality between the mean and the variance of the dependent variable implied by the Poisson model.

The negative binomial model treats this problem by adding to the log-linear form an unobserved heterogeneity term u_i:

$$\ln \nu_i = \ln \lambda_i + \ln u_i = x_i\beta + \varepsilon_i, \qquad \varepsilon_i = \ln u_i$$

$$P(Y = y_i \mid u_i) = \frac{e^{-\lambda_i u_i} (\lambda_i u_i)^{y_i}}{y_i!}$$

Negative binomial models

The density of y_i is obtained by integrating over the density of u_i:

$$f(Y = y_i \mid x_i) = \int_0^{\infty} \frac{e^{-\lambda_i u_i} (\lambda_i u_i)^{y_i}}{y_i!} \, g(u_i) \, du_i \quad \text{with} \quad g(u_i) = \frac{\theta^{\theta}}{\Gamma(\theta)} e^{-\theta u_i} u_i^{\theta - 1}$$

Assuming that u_i is Gamma distributed with mean 1, the density of y_i reads:

$$f(Y = y_i \mid x_i) = \frac{\Gamma(\theta + y_i)}{\Gamma(y_i + 1)\,\Gamma(\theta)} \left( \frac{\lambda_i}{\lambda_i + \theta} \right)^{y_i} \left( \frac{\theta}{\lambda_i + \theta} \right)^{\theta}$$

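Likelihood Functions

$$L(y, x, \beta, \theta) = \prod_{i=1}^{n} \frac{\Gamma(\theta + y_i)}{\Gamma(y_i + 1)\,\Gamma(\theta)} \left( \frac{\lambda_i}{\lambda_i + \theta} \right)^{y_i} \left( \frac{\theta}{\lambda_i + \theta} \right)^{\theta}$$

$$LL(y, x, \beta, \theta) = \sum_{i=1}^{n} \left[ y_i x_i\beta - (y_i + \theta) \ln\left( e^{x_i\beta} + \theta \right) + \theta \ln \theta + \ln \frac{\Gamma(\theta + y_i)}{\Gamma(y_i + 1)\,\Gamma(\theta)} \right]$$

where α = 1/θ is the overdispersion parameter.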
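The role of the overdispersion parameter can be seen by computing the moments of the negative binomial distribution numerically. The sketch below assumes the common mean-dispersion (NB2) parameterisation, in which the variance equals λ + αλ² with α = 1/θ; the values of λ and α are hypothetical and the support is truncated at 2000.

```python
import math

def negbin_pmf(y, lam, alpha):
    """NB2 probability of count y with mean lam and overdispersion alpha (theta = 1/alpha)."""
    theta = 1.0 / alpha
    log_p = (math.lgamma(theta + y) - math.lgamma(y + 1) - math.lgamma(theta)
             + y * math.log(lam / (lam + theta))
             + theta * math.log(theta / (lam + theta)))
    return math.exp(log_p)

lam, alpha = 3.0, 1.4       # hypothetical mean and overdispersion
support = range(2000)
total = sum(negbin_pmf(y, lam, alpha) for y in support)
mean = sum(y * negbin_pmf(y, lam, alpha) for y in support)
var = sum((y - mean) ** 2 * negbin_pmf(y, lam, alpha) for y in support)
print(round(total, 6), round(mean, 4), round(var, 4))
```

As α approaches zero the variance approaches the mean, and the model collapses to the Poisson case.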

Negative binomial models

Stata Instruction: nbreg

nbreg y x1 x2 x3 … xk [if] [weight] [, options]

Options : noconstant : omits the constant

robust : controls for heteroskedasticity

if : select observations

weight : weights observations


Negative binomial models

use est_var_qual.dta, clear
nbreg patent lrdi lassets spe biotech

Negative binomial regression          Number of obs =     431
                                      LR chi2(4)    =  122.62
Dispersion     = mean                 Prob > chi2   =  0.0000
Log likelihood = -1374.339            Pseudo R2     =  0.0427

patent     Coef.       Std. Err.   z        P>|z|   [95% Conf. Interval]
lrdi       .7823229    .1121344    6.98     0.000   .5625434    1.002102
lassets    .6167106    .0600372    10.27    0.000   .4990399    .7343813
spe        .8390795    .1950958    4.30     0.000   .4566988    1.22146
biotech    1.515035    .2384884    6.35     0.000   1.047606    1.982464
_cons      -5.179659   .8629357    -6.00    0.000   -6.870982   -3.488337
/lnalpha   .323967     .0759342                     .1751387    .4727953
alpha      1.382602    .1049868                     1.191411    1.604473

Likelihood-ratio test of alpha=0: chibar2(01) = 4351.01   Prob>=chibar2 = 0.000

The output reports the goodness of fit, the estimated parameters, the overdispersion parameter (alpha) and the overdispersion test.

Interpretation of coefficients

If variables are entered in logs, one can still interpret the coefficients as elasticities.

A 1% increase in firm size is associated with a .62% increase in the expected number of patents (lassets: .6167106).

Interpretation of coefficients

If variables are entered in logs, one can still interpret the coefficients as elasticities.

A 1% increase in R&D investment is associated with a .78% increase in the expected number of patents (lrdi: .7823229).

Interpretation of coefficients

If variables are not entered in logs, the interpretation changes:

$$100 \times (e^{\beta} - 1)$$

A one-point rise in the degree of specialisation is associated with a 131% increase in the expected number of patents (spe: .8390795).

Interpretation of coefficients

For dummy variables, the interpretation follows the same transformation:

$$100 \times (e^{\beta} - 1)$$

Biotechnology firms have an expected number of patents which is 355% higher than that of pharmaceutical companies (biotech: 1.515035).

Overdispersion test

We use the LR test to compare the negative binomial model with the Poisson model:

$$LR = 2 \left( \ln L_{NBREG} - \ln L_{PRM} \right) = 2 \times \left( -1374.34 - (-3549.84) \right) = 2 \times 2175.50 = 4351.01$$

The results indicate that the probability of wrongly rejecting H0 (H0: alpha = 0) is almost nil. Hence there is overdispersion in the data and, as a consequence, one should use the negative binomial model.
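The LR statistic can be recomputed from the log likelihoods printed in the Poisson and negative binomial outputs. The sketch below also computes a p-value under the assumption, indicated by Stata's chibar2(01) label, that the reference distribution is an equal mixture of a point mass at zero and a χ² with one degree of freedom.

```python
import math

ll_poisson = -3549.8433   # log likelihood of the Poisson model
ll_negbin = -1374.339     # log likelihood of the negative binomial model

lr = 2.0 * (ll_negbin - ll_poisson)

# alpha = 0 lies on the boundary of the parameter space, so the statistic is
# compared with a 50:50 mixture of a point mass at 0 and a chi-square(1).
p_value = 0.5 * math.erfc(math.sqrt(lr / 2.0))
print(round(lr, 2), p_value)
```

The statistic matches the value reported at the bottom of the nbreg output, and the p-value is numerically indistinguishable from zero.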

Larger standard errors and lower z values

Variable      Poisson     NegBin
patent
  lrdi        0.699       0.782
              21.43       6.98
  lassets     0.471       0.617
              35.22       10.27
  spe         0.762       0.839
              17.26       4.30
  biotech     1.073       1.515
              20.81       6.35
  _cons       -3.127      -5.180
              -15.86      -6.00
lnalpha
  _cons                   0.324
                          4.27
Statistics
  alpha                   1.383
  N           431         431

legend: b/t

Extensions


- ML estimators: all models can be extended to a panel context to take full account of unobserved heterogeneity
  - Fixed effects
  - Random effects
- Heckman models
  - Selection bias
  - Two equations, one on the probability to be observed
- Survival models
  - Discrete time (complementary log-log, logit)
  - Continuous time (Cox model)