Class 6
Qualitative Dependent Variable Models
SKEMA Ph.D programme 2010-2011
Lionel Nesta
Observatoire Français des Conjonctures Economiques
Structure of the class
1. The linear probability model
2. Maximum likelihood estimations
3. Binary logit models and some other models
4. Multinomial models
5. Ordered multinomial models
6. Count data models
The Linear Probability Model
The linear probability model
When the dependent variable is binary (0/1, for example, Y=1 if the firm innovates, 0 otherwise), OLS is called the linear probability model.
$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$
How should one interpret βj? Provided that OLS4 – E(u|X)=0 – holds true, then:
$E(Y \mid X) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
Y follows a Bernoulli distribution with expected value P. This model is called the linear probability model because its expected value, conditional on X, and written E(Y|X), can be interpreted as the conditional probability of the occurrence of Y given values of X.
$E(Y \mid X) = \Pr(Y=1 \mid X)$
$1 - E(Y \mid X) = \Pr(Y=0 \mid X)$
β measures the variation of the probability of success for a one-unit variation of X (ΔX=1)
$\beta = \frac{\partial E(Y \mid X)}{\partial X} = \frac{\partial \Pr(Y=1 \mid X)}{\partial X}$
The linear probability model
Non normality of errors
OLS6 : The error term is independent of all RHS and follows a normal distribution with zero mean and variance σ²
Since the error term equals either 1 − P or −P (the complement to unity of the conditional probability), it follows a Bernoulli-type distribution, not a normal distribution:

$u \nsim \text{Normal}(0, \sigma^2)$
Limits of the linear probability model (1)
Non normality of errors
[Figure: density of the residuals (horizontal axis: residuals from -1 to .5); the distribution is visibly non-normal]
Limits of the linear probability model (2)
Heteroskedastic errors
OLS5 : The variance of the error term, u, conditional on RHS, is the same for all values of RHS
The error term is itself distributed Bernoulli, and its variance depends on X. Hence it is heteroskedastic
$\text{Var}(u \mid x_1, x_2, \ldots, x_k) = \sigma^2$ fails, because

$\text{Var}(u) = P(1-P) = E(Y \mid X)\,\big(1 - E(Y \mid X)\big)$, which depends on X.
Limits of the linear probability model (2)
Heteroskedastic errors
[Figure: residuals plotted against fitted values (.4 to 1.2), showing the systematic pattern typical of heteroskedasticity]
Limits of the linear probability model (3)
Fallacious predictions
By definition, a probability is always in the unit interval [0;1]
But OLS does not guarantee this condition: predictions may lie outside the bounds [0;1]. Moreover, the marginal effect is constant, since P = E(Y|X) grows linearly with X. This is not very realistic (e.g. the probability of giving birth conditional on the number of children already born).

$0 \le E(Y \mid X) \le 1$
Limits of the linear probability model (3)
Fallacious predictions
[Figure: density of the fitted values (.4 to 1.2); part of the distribution lies above 1]
Fallacious predictions
Limits of the linear probability model (4)
A downward bias in the coefficient of determination R²
Observed values are 1 or 0, whereas predictions should lie between 0 and 1: [0;1].
Comparing predicted with observed values, the goodness of fit as assessed by the R² is systematically low.
Limits of the linear probability model (4)
Fallacious predictions

[Figure: the observed innovation dummy (0/1) plotted against the fitted values (.4 to 1.2)]
Fallacious predictions which lower the R2
Limits of the linear probability model (4)
1. Non-normality of errors: $u \nsim \text{Normal}(0, \sigma^2)$
2. Heteroskedastic errors: $\text{Var}(u \mid x_1, x_2, \ldots, x_k) \neq \sigma^2$
3. Fallacious predictions: $0 \le E(Y \mid X) \le 1$ is not guaranteed
4. A downward bias in the R²
Overcoming the limits of the LPM
1. Non-normality of errors: increase sample size
2. Heteroskedastic errors: use robust estimators (see the sketch below)
3. Fallacious predictions: perform non-linear or constrained regressions
4. A downward bias in the R²: do not use it as a measure of goodness of fit
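As a quick illustration, a minimal Stata sketch of the LPM, assuming the inno, lrdi, lassets, spe and biotech variables used later in this class:

* LPM: OLS on a binary dependent variable, with robust standard errors
regress inno lrdi lassets spe biotech, robust
* fitted values, and a count of the fallacious predictions outside [0;1]
predict p_lpm, xb
count if p_lpm < 0 | p_lpm > 1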
Persistent use of LPM
Although it has limits, the LPM is still used
1. In the process of data exploration (early stages of the research)
2. It is a good indicator of the marginal effect of the representative observation (at the mean)
3. When dealing with very large samples, least squares can overcome the complications imposed by maximum likelihood techniques: time of computation, endogeneity and panel data problems.
The LOGIT Model
Probability, odds and logit

We need to explain the occurrence of an event: the LHS variable takes two values: y = {0;1}.
In fact, we need to explain the probability of occurrence of the event, conditional on X: P(Y=y | X) ∈ [0;1].
OLS estimations are not adequate, because predictions can lie outside the interval [0;1].
We need a transformation that maps a real number z ∈ ]-∞;+∞[ into P(Y=y | X) ∈ [0;1].
The logistic transformation links a real number z ∈ ]-∞;+∞[ to P(Y=y | X) ∈ [0;1]. It is also called the link function.
The logit link function
$\Lambda(z) = \frac{e^z}{1+e^z}$ is called the logit link function.

Let us make sure that the transformation of z lies between 0 and 1: for $z \in \left]-\infty;+\infty\right[$ we have $e^z \in \left]0;+\infty\right[$, and since $e^z < 1 + e^z$, it follows that $\frac{e^z}{1+e^z} \in \left]0;1\right[$.
The logit model
Hence the probability of any event to occur is:

$P(y=1 \mid z) = \frac{e^z}{1+e^z}$

$P(y=0 \mid z) = 1 - P(y=1 \mid z) = 1 - \frac{e^z}{1+e^z} = \frac{1}{1+e^z}$
But what is z?
$\frac{P}{1-P} = \frac{e^z/(1+e^z)}{1/(1+e^z)} = e^z \quad\Rightarrow\quad \ln\!\left(\frac{P}{1-P}\right) = z$
The odds ratio is defined as the ratio of the probability and its complement. Taking the log yields z. Hence z is the log transform of the odds ratio.
This has two important characteristics:
1. $z \in \left]-\infty;+\infty\right[$ and $P(Y=1) \in [0;1]$
2. The probability is not linear in z (the plot linking z with P(Y=1) is not a straight line)
The odds ratio
Probability, odds and logit

P(Y=1)   Odds p/(1-p)   Odds value   Ln(odds)
0.01     1/99           0.01         -4.60
0.03     3/97           0.03         -3.48
0.05     5/95           0.05         -2.94
0.20     20/80          0.25         -1.39
0.30     30/70          0.43         -0.85
0.40     40/60          0.67         -0.41
0.50     50/50          1.00          0.00
0.60     60/40          1.50          0.41
0.70     70/30          2.33          0.85
0.80     80/20          4.00          1.39
0.95     95/5           19.0          2.94
0.97     97/3           32.3          3.48
0.99     99/1           99.0          4.60
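Any row of this table can be checked directly in Stata; for instance, for P(y=1) = 0.80:

display 0.80/(1 - 0.80)        // odds = 4.00
display ln(0.80/(1 - 0.80))    // log odds = 1.39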
The logit transformation
The preceding table matches levels of probability with the odds ratio.
The probability varies between 0 and 1; the odds varies between 0 and +∞; the log of the odds varies between -∞ and +∞.
Notice that the distribution of the log of the odds is symmetrical.
Logistic probability density distribution
[Figure: logistic probability density of the log odds ratio (horizontal axis from -10 to 10), symmetric around zero]
“The probability is not linear in z”
[Figure: P(y=1|z) against z (from -4 to 4): the S-shaped logistic curve, not a straight line]
The logit link function
The whole trick that can overcome the OLS problem is then to posit:

$z = X\beta = \beta_1 x_1 + \cdots + \beta_k x_k$

Hence $\frac{e^z}{1+e^z}$ is rewritten as $\frac{e^{X\beta}}{1+e^{X\beta}}$.
But how can we estimate the above equation knowing that we do not observe z?
Maximum likelihood estimations

OLS can be of no help here. We will use Maximum Likelihood Estimation (MLE) instead.
MLE is an alternative to OLS. It consists of finding the parameter values which are most consistent with the data we have.
In Statistics, the likelihood is defined as the joint probability to observe a given sample, given the parameters involved in the generating function.
One way to distinguish between OLS and MLE is as follows:
OLS adapts the model to the data you have: you only have one model derived from your data. MLE instead supposes there is an infinity of models, and chooses the model most likely to explain your data.
Let us assume that you have a sample of n random observations. Let f(yi ) be the probability that yi = 1 or yi = 0. The joint probability to observe jointly n values of yi is given by the likelihood function:
$f(y_1, y_2, \ldots, y_n) = \prod_{i=1}^{n} f(y_i)$
We need to specify the function f(.). It comes from the empirical discrete distribution of an event that can have only two outcomes: a success (yi = 1) or a failure (yi = 0). This is the binomial distribution. Hence:
$f(y_i) = \binom{n}{k}\, p^k (1-p)^{n-k}$, which for a single observation reduces to $f(y_i) = p^{y_i} (1-p)^{1-y_i}$
Likelihood functions
Knowing p (as the logit), having defined f(.), we come up with the likelihood function:
$L(y) = \prod_{i=1}^{n} f(y_i) = \prod_{i=1}^{n} p^{y_i} (1-p)^{1-y_i}$

$L(y, z) = \prod_{i=1}^{n} f(y_i, z_i) = \prod_{i=1}^{n} \left(\frac{e^{z_i}}{1+e^{z_i}}\right)^{y_i} \left(\frac{1}{1+e^{z_i}}\right)^{1-y_i}$

$L(y, x, \beta) = \prod_{i=1}^{n} f(y_i, X_i\beta) = \prod_{i=1}^{n} \left(\frac{e^{X_i\beta}}{1+e^{X_i\beta}}\right)^{y_i} \left(\frac{1}{1+e^{X_i\beta}}\right)^{1-y_i}$
The log transform of the likelihood function (the log likelihood) is much easier to manipulate, and is written:
$LL(y, z) = \sum_{i=1}^{n} y_i z_i - \sum_{i=1}^{n} \ln\left(1+e^{z_i}\right)$

$LL(y, x, \beta) = \sum_{i=1}^{n} y_i X_i\beta - \sum_{i=1}^{n} \ln\left(1+e^{X_i\beta}\right)$
Log likelihood (LL) functions
The LL function can yield an infinity of values for the parameters β.
Given the functional form of f(.) and the n observations at hand, which values of parameters β maximize the likelihood of my sample?
In other words, what are the most likely values of my unknown parameters β given the sample I have?
Maximum likelihood estimations
$\frac{\partial LL}{\partial \beta} = \sum_{i=1}^{n} \left( y_i - \Lambda_i \right) x_i = 0, \quad \text{where } \Lambda_i = \frac{e^{X_i\beta}}{1+e^{X_i\beta}}$

$\frac{\partial^2 LL}{\partial \beta\, \partial \beta'} = - \sum_{i=1}^{n} \Lambda_i \left( 1 - \Lambda_i \right) x_i x_i'$
However, there is no analytical solution to this non-linear problem. Instead, we rely on an optimization algorithm (Newton-Raphson).
The LL is globally concave and has a maximum. The gradient is used to compute the parameters of interest, and the Hessian is used to compute the variance-covariance matrix.
Maximum likelihood estimations
You need to imagine that the computer is going to generate all possible values of β, compute a likelihood value for each (vector of) values, and then choose the (vector of) β such that the likelihood is highest.
Example: Binary Dependent Variable
We want to explore the factors affecting the probability of being a successful innovator (inno = 1).
352 (81.7%) innovate and 79 (18.3%) do not.
The odds of carrying out a successful innovation are about 4 to 1 (as 352/79 = 4.45).
The log of the odds is 1.494 (z = 1.494)
For the sample (and the population?) of firms the probability of being innovative is four times higher than the probability of NOT being innovative
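These figures are easily reproduced in Stata from the raw counts:

display 352/79        // odds of innovating = 4.456
display ln(352/79)    // log odds: z = 1.494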
Instruction Stata : logit
logit y x1 x2 x3 … xk [if] [weight] [, options]
Options
noconstant : estimates the model without the constant
robust : estimates robust variances, also in case of heteroscedasticity
if : it allows to select the observations we want to include in the analysis
weight : it allows to weight different observations
Logistic Regression with STATA
Let's start and run a constant-only model: logit inno

. logit inno

Iteration 0:   log likelihood = -205.30803

Logistic regression                           Number of obs   =        431
                                              LR chi2(0)      =       0.00
                                              Prob > chi2     =          .
Log likelihood = -205.30803                   Pseudo R2       =     0.0000

        inno |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
       _cons |   1.494183   .1244955    12.00   0.000     1.250177     1.73819
Goodness of fit
Parameter estimates, Standard errors and z values
Logistic Regression with STATA
What does this simple model tell us ?
Remember that we need to use the logit formula to transform the logit into a probability :
$P(Y=1 \mid X) = \frac{e^{X\beta}}{1+e^{X\beta}}$
Interpretation of Coefficients
The constant 1.494 must be interpreted as the log of the odds ratio.
Using the logit link function, the average probability to innovate is
dis exp(_b[_cons])/(1+exp(_b[_cons]))
We find exactly the empirical sample value, 81.7%:

$P = \frac{e^{1.494}}{1+e^{1.494}} = 0.817$
Interpretation of Coefficients
A positive coefficient indicates that the probability of innovation success increases with the corresponding explanatory variable.
A negative coefficient implies that the probability to innovate decreases with the corresponding explanatory variable.
Warning! One of the problems encountered in interpreting probabilities is their non-linearity: the probabilities do not vary in the same way according to the level of the regressors.
This is why, in practice, one usually computes the probability of the event occurring at the average point of the sample.
Interpretation of Coefficients
Let's run the more complete model: logit inno lrdi lassets spe biotech

. logit inno lrdi lassets spe biotech

Iteration 0:   log likelihood = -205.30803
Iteration 1:   log likelihood = -167.71312
Iteration 2:   log likelihood = -163.57746
Iteration 3:   log likelihood = -163.45376
Iteration 4:   log likelihood = -163.45352

Logistic regression                           Number of obs   =        431
                                              LR chi2(4)      =      83.71
                                              Prob > chi2     =     0.0000
Log likelihood = -163.45352                   Pseudo R2       =     0.2039

        inno |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        lrdi |   .7527497   .2110683     3.57   0.000     .3390634    1.166436
     lassets |    .997085   .1368534     7.29   0.000     .7288574    1.265313
         spe |   .4252844   .4204924     1.01   0.312    -.3988654    1.249434
     biotech |   3.799953    .577509     6.58   0.000     2.668056     4.93185
       _cons |  -11.63447   1.937191    -6.01   0.000    -15.43129   -7.837643
Using the sample mean values of lrdi, lassets, spe and biotech, we compute the conditional probability:

$P = \frac{e^{-11.63 + 0.75\,\overline{lrdi} + 0.99\,\overline{lassets} + 0.43\,\overline{spe} + 3.79\,\overline{biotech}}}{1 + e^{-11.63 + 0.75\,\overline{lrdi} + 0.99\,\overline{lassets} + 0.43\,\overline{spe} + 3.79\,\overline{biotech}}} = \frac{e^{1.953}}{1+e^{1.953}} = 0.8758$
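A sketch of the same computation in Stata, using the fact that for a linear index the mean of Xb equals Xb evaluated at the sample means:

quietly logit inno lrdi lassets spe biotech
predict z_hat, xb                           // linear index z = Xb for each firm
quietly summarize z_hat                     // its mean is 1.953
display exp(r(mean))/(1 + exp(r(mean)))     // 0.8758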
Interpretation of Coefficients
It is often useful to know the marginal effect of a regressor on the probability that the event occurs (innovation).
As the probability is a non-linear function of explanatory variables, the change in probability due to a change in one of the explanatory variables is not identical if the other variables are at the average, median or first quartile, etc. level.
prvalue provides the predicted probabilities of a logit model (or any other):

prvalue
prvalue, x(lassets=10) rest(mean)
prvalue, x(lassets=11) rest(mean)
prvalue, x(lassets=12) rest(mean)
prvalue, x(lassets=10) rest(median)
prvalue, x(lassets=11) rest(median)
prvalue, x(lassets=12) rest(median)
Marginal Effects
prchange provides the marginal effect of each explanatory variable, for a variety of changes in its values:
prchange [varlist] [if] [in range] ,x(variables_and_values) rest(stat) fromto
prchange
prchange, fromto
prchange , fromto x(size=10.5) rest(mean)
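Note that prvalue and prchange come from Long and Freese's user-written SPost package. If it is not installed, a rough built-in alternative for marginal effects at the sample means is mfx:

mfx, at(mean)    // marginal effects of all regressors at the sample means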
Marginal Effects
Goodness of Fit Measures
In ML estimations, there is no such measure as the R2
But the log likelihood measure can be used to assess the goodness of fit. Note the following:
The higher the number of observations, the lower the joint probability, and the more the LL measure goes towards -∞.
Given the number of observations, the better the fit, the higher the LL measure (since it is always negative, the closer to zero it is).
The philosophy is to compare two models looking at their LL values. One is meant to be the constrained model, the other one is the unconstrained model.
Goodness of Fit Measures
A model is said to be constrained when the observer sets the parameters associated with some variables to zero.
A model is said to be unconstrained when the observer releases this assumption and allows the parameters associated with some variables to be different from zero.
For example, we can compare two models: one with no explanatory variables, and one with all our explanatory variables. The one with no explanatory variables implicitly assumes that all parameters are equal to zero. Hence it is the constrained model, because we (implicitly) constrain the parameters to be nil.
The likelihood ratio test (LR test)

The most used measure of goodness of fit in ML estimations is the likelihood ratio. It compares the LL values of the unconstrained and the constrained models, and the resulting statistic is distributed χ².
If the difference in the LL values is (not) important, it is because the set of explanatory variables brings in (in)significant information. The null hypothesis H0 is that the model brings no significant information.
High LR values will lead the observer to reject H0 and accept the alternative hypothesis Ha that the set of explanatory variables does significantly explain the outcome.
$LR = 2\left(\ln L_{unc} - \ln L_{c}\right)$
The McFadden Pseudo R2
We also use the McFadden pseudo-R² (1973). Its interpretation is analogous to the OLS R²; however, it is biased downward and generally remains low.
The pseudo-R² also rests on the comparison of the unconstrained and the constrained log likelihoods, and lies between 0 and 1:

$\text{Pseudo-}R^2_{MF} = \frac{\ln L_{c} - \ln L_{unc}}{\ln L_{c}} = 1 - \frac{\ln L_{unc}}{\ln L_{c}}$
Goodness of Fit Measures
Constrained model: . logit inno (log likelihood = -205.30803, Pseudo R2 = 0.0000)

Unconstrained model: . logit inno lrdi lassets spe biotech, nolog (log likelihood = -163.45352, Pseudo R2 = 0.2039)

(Both outputs are shown in full above.)
$LR = 2\left(\ln L_{unc} - \ln L_{c}\right) = 2 \times \left(-163.5 + 205.3\right) = 83.7$

$\text{Pseudo-}R^2_{MF} = 1 - \frac{\ln L_{unc}}{\ln L_{c}} = 1 - \frac{-163.5}{-205.3} = 0.204$
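A sketch of the same computations from Stata's stored results, where e(ll) holds the log likelihood of the last estimation:

quietly logit inno
scalar ll_c = e(ll)
quietly logit inno lrdi lassets spe biotech
scalar ll_u = e(ll)
display 2*(ll_u - ll_c)    // LR statistic = 83.7
display 1 - ll_u/ll_c      // McFadden pseudo-R2 = 0.204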
Other usage of the LR test
The LR test can also be generalized to compare any two models, the constrained one being nested in the unconstrained one.
Any variable which is added to a model can be tested for its explanatory power as follows:

logit [constrained model]
est store [name1]
logit [unconstrained model]
est store [name2]
lrtest [name2] [name1]
Goodness of Fit Measures
LR test on the added variable (biotech)
$LR = 2\left(\ln L_{unc} - \ln L_{c}\right) = 2 \times \left(-163.5 + 191.8\right) = 56.8$
. logit inno lrdi lassets spe, nolog

Logistic regression                           Number of obs   =        431
                                              LR chi2(3)      =      26.93
                                              Prob > chi2     =     0.0000
Log likelihood = -191.84522                   Pseudo R2       =     0.0656

        inno |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        lrdi |   .9275668   .1979951     4.68   0.000     .5395037     1.31563
     lassets |   .3032756   .0792032     3.83   0.000     .1480402    .4585111
         spe |   .3739987   .3800765     0.98   0.325    -.3709376    1.118935
       _cons |  -.4703812   .9313494    -0.51   0.614    -2.295793     1.35503

. est store model1

. logit inno lrdi lassets spe biotech, nolog
(output as above: log likelihood = -163.45352)

. est store model2

. lrtest model2 model1

Likelihood-ratio test                         LR chi2(1)  =      56.78
(Assumption: model1 nested in model2)         Prob > chi2 =     0.0000
Quality of predictions
Lastly, one can compare the quality of the predictions with the observed outcome variable (dummy variable).
One must assume that when the predicted probability is higher than 0.5, the prediction is that the event will occur (most likely outcome).
One can then compare how good the predictions are relative to the actual outcome variable.
STATA does this for us:
estat class
Quality of predictions
. estat class

Logistic model for inno

              -------- True --------
Classified  |      D         ~D     |   Total
------------+-----------------------+--------
     +      |    337         51     |     388
     -      |     15         28     |      43
------------+-----------------------+--------
   Total    |    352         79     |     431

Classified + if predicted Pr(D) >= .5
True D defined as inno != 0

Sensitivity                    Pr( +| D)   95.74%
Specificity                    Pr( -|~D)   35.44%
Positive predictive value      Pr( D| +)   86.86%
Negative predictive value      Pr(~D| -)   65.12%
False + rate for true ~D       Pr( +|~D)   64.56%
False - rate for true D        Pr( -| D)    4.26%
False + rate for classified +  Pr(~D| +)   13.14%
False - rate for classified -  Pr( D| -)   34.88%
Correctly classified                       84.69%
The logit model is only one way of modeling binary choice models.
The probit model is another way; it is actually more widely used than the logit model and assumes a normal distribution (not a logistic one) for the z values.
The complementary log-log model is used when the occurrence of the event is very rare, the distribution of z being asymmetric.
Other Binary Choice models
Probit model:

$\Pr(Y=1 \mid X) = \int_{-\infty}^{X\beta} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz = \Phi(X\beta)$

Complementary log-log model:

$\Pr(Y=1 \mid X) = 1 - \exp\left(-\exp(X\beta)\right)$
Likelihood functions and Stata commands
Logit: $L(y, x, \beta) = \prod_{i=1}^{n} f(y_i, x_i, \beta) = \prod_{i=1}^{n} \left(\frac{e^{X_i\beta}}{1+e^{X_i\beta}}\right)^{y_i} \left(\frac{1}{1+e^{X_i\beta}}\right)^{1-y_i}$

Probit: $L(y, x, \beta) = \prod_{i=1}^{n} f(y_i, x_i, \beta) = \prod_{i=1}^{n} \Phi(X_i\beta)^{y_i} \left[1-\Phi(X_i\beta)\right]^{1-y_i}$

Complementary log-log: $L(y, x, \beta) = \prod_{i=1}^{n} f(y_i, x_i, \beta) = \prod_{i=1}^{n} \left[1-\exp(-\exp(X_i\beta))\right]^{y_i} \left[\exp(-\exp(X_i\beta))\right]^{1-y_i}$
Example:
logit inno rdi lassets spe pharma
probit inno rdi lassets spe pharma
cloglog inno rdi lassets spe pharma
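A sketch for collecting the three sets of estimates side by side with Stata's built-in estimates table (the OLS column of the comparison below would come from regress):

quietly logit inno rdi lassets spe pharma
estimates store m_logit
quietly probit inno rdi lassets spe pharma
estimates store m_probit
quietly cloglog inno rdi lassets spe pharma
estimates store m_cloglog
estimates table m_logit m_probit m_cloglog, b(%9.3f) t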
Probability Density Functions
[Figure: probability density functions of the probit, logit and complementary log-log transformations (x from -4 to 4)]
Cumulative Distribution Functions
[Figure: cumulative distribution functions of the probit, logit and complementary log-log transformations (x from -4 to 4)]
Comparison of models

                    OLS        Logit      Probit     C log-log
Ln(R&D intensity)   0.110      0.752      0.422      0.354
                    [3.90]***  [3.57]***  [3.46]***  [3.13]***
ln(Assets)          0.125      0.997      0.564      0.493
                    [8.58]***  [7.29]***  [7.53]***  [7.19]***
Spe                 0.056      0.425      0.224      0.151
                    [1.11]     [1.01]     [0.98]     [0.76]
Biotech Dummy       0.442      3.799      2.120      1.817
                    [7.49]***  [6.58]***  [6.77]***  [6.51]***
Constant            -0.843     -11.634    -6.576     -6.086
                    [3.91]**   [6.01]***  [6.12]***  [6.08]***
Observations        431        431        431        431
Absolute t values in brackets (OLS), z values for other models.
* significant at 10%, ** 5%, *** 1%
Comparison of marginal effects

                    OLS      Logit    Probit   C log-log
Ln(R&D intensity)   0.110    0.082    0.090    0.098
ln(Assets)          0.125    0.110    0.121    0.136
Specialisation      0.056    0.046    0.047    0.042
Biotech Dummy       0.442    0.368    0.374    0.379
For all models logit, probit and cloglog, marginal effects have been computed for a one-unit variation (around the mean) of the variable at stake, holding all other variables at the sample mean values.
Multinomial LOGIT Models
Multinomial models

Let us now focus on the case where the dependent variable has several outcomes (or is multinomial). For example, innovative firms may need to collaborate with other organizations. One can code this type of interaction as follows:
Collaborate with universities (modality 1); Collaborate with large incumbent firms (modality 2); Collaborate with SMEs (modality 3); Do it alone (modality 4).
Or, studying firm survival: Survival (modality 1); Liquidation (modality 2); Mergers & acquisitions (modality 3).
One could first perform three logistic regressions as follows:

$\ln\frac{P(Y=1 \mid X)}{1-P(Y=1 \mid X)} = \beta_0^{(1)} + \beta_1^{(1)} x_1 + \cdots + \beta_m^{(1)} x_m$

$\ln\frac{P(Y=2 \mid X)}{1-P(Y=2 \mid X)} = \beta_0^{(2)} + \beta_1^{(2)} x_1 + \cdots + \beta_m^{(2)} x_m$

$\ln\frac{P(Y=3 \mid X)}{1-P(Y=3 \mid X)} = \beta_0^{(3)} + \beta_1^{(3)} x_1 + \cdots + \beta_m^{(3)} x_m$

where 1 = survival, 2 = liquidation, 3 = M&A.

Exercise:
1. Open the file mlogit.dta
2. Estimate for each type of outcome the conditional probability of the event for the representative firm, using: time (log_time), size (log_labour), firm age (entry_age), spin out (spin_out), cohort (cohort_*); see the sketch below.
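A sketch of the corresponding Stata session, using the estimation command that appears later in this class:

use mlogit.dta, clear
mlogit type_exit log_time log_labour entry_age entry_spin cohort_*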
Multinomial models
Estimating the three logits above separately, the predicted probabilities for the representative firm are:

$P(Y=1 \mid X) = 0.8771$
$P(Y=2 \mid X) = 0.0398$
$P(Y=3 \mid X) = 0.0679$

$\sum_k P(Y=k \mid X) = 0.9848 \neq 1$
The need for multinomial models
First, the sum of all conditional probabilities should add up to unity:

$\sum_{j=0}^{k} P(Y=j \mid X) = 1$

Second, for k outcomes we only need to estimate (k - 1) modalities, since the remaining probability follows from the others:

$P(Y=0 \mid X) = 1 - \sum_{j \neq 0} P(Y=j \mid X)$
Multinomial models
Third, the multinomial model is a simultaneous (as opposed to sequential) estimation model comparing the odds of each modality with respect to all others. With three outcomes, we have:

$\ln\frac{P(Y=1 \mid X)}{P(Y=0 \mid X)} = \beta_0^{(1|0)} + \beta_1^{(1|0)} x_1 + \cdots + \beta_m^{(1|0)} x_m$

$\ln\frac{P(Y=2 \mid X)}{P(Y=0 \mid X)} = \beta_0^{(2|0)} + \beta_1^{(2|0)} x_1 + \cdots + \beta_m^{(2|0)} x_m$

$\ln\frac{P(Y=1 \mid X)}{P(Y=2 \mid X)} = \beta_0^{(1|2)} + \beta_1^{(1|2)} x_1 + \cdots + \beta_m^{(1|2)} x_m$
Multinomial logit models
Note that there is redundancy, since:

$\ln\frac{P(Y=1 \mid X)}{P(Y=0 \mid X)} = \ln\frac{P(Y=1 \mid X)}{P(Y=2 \mid X)} + \ln\frac{P(Y=2 \mid X)}{P(Y=0 \mid X)}$

With $\ln\frac{P(Y=1 \mid X)}{P(Y=0 \mid X)} = x\beta^{(1|0)}$, $\ln\frac{P(Y=2 \mid X)}{P(Y=0 \mid X)} = x\beta^{(2|0)}$ and $\ln\frac{P(Y=1 \mid X)}{P(Y=2 \mid X)} = x\beta^{(1|2)}$, it follows that

$x\beta^{(1|0)} = x\beta^{(1|2)} + x\beta^{(2|0)} \quad\Rightarrow\quad \beta^{(1|0)} = \beta^{(1|2)} + \beta^{(2|0)}$
Fourth, the multinomial logit model estimates the (k - 1) outcomes subject to the following constraint.

Multinomial logit models

With k outcomes, the probability of occurrence of event j reads:

$P(Y=j \mid X) = \frac{e^{x\beta^{(j|0)}}}{\sum_{j=0}^{k} e^{x\beta^{(j|0)}}}$

By convention, outcome 0 is the base outcome.
Multinomial logit models
Note that for the base outcome,

$x\beta^{(j|j)} = \ln\frac{P(Y=j \mid X)}{P(Y=j \mid X)} = \ln(1) = 0, \quad \text{hence } \beta^{(j|j)} = 0 \ \forall j$

so that

$P(Y=0 \mid X) = \frac{1}{1 + \sum_{j=1}^{k} e^{x\beta^{(j|0)}}}$

$P(Y=j \mid X) = \frac{e^{x\beta^{(j|0)}}}{1 + \sum_{j=1}^{k} e^{x\beta^{(j|0)}}} = \frac{e^{x\beta^{(j|0)}}}{\sum_{j=0}^{k} e^{x\beta^{(j|0)}}}$
Multinomial logit models
Binomial logit as multinomial logit

Let us rewrite the probability of the event Y=1. The binomial logit is a special case of the multinomial logit where only two outcomes are being analyzed:

$P(Y=1 \mid X) = \frac{e^{x\beta^{(1|0)}}}{1 + e^{x\beta^{(1|0)}}} = \frac{e^{x\beta^{(1|0)}}}{e^{x\beta^{(0|0)}} + e^{x\beta^{(1|0)}}} = \frac{e^{x\beta^{(1|0)}}}{\sum_{k \in \{0;1\}} e^{x\beta^{(k|0)}}}$
Let us assume that you have a sample of n random observations. Let f(yi) be the probability that yi = j. The joint probability to observe jointly the n values of yi is given by the likelihood function:

$f(y_1, y_2, \ldots, y_n) = \prod_{i=1}^{n} f(y_i)$

We need to specify the function f(.). It comes from the empirical discrete distribution of an event that can have several outcomes. This is the multinomial distribution. Hence:

$f(y_i) = p_0^{d_{i0}}\, p_1^{d_{i1}} \cdots p_k^{d_{ik}} = \prod_{j=0}^{k} p_j^{d_{ij}}, \quad \text{where } d_{ij} = 1 \text{ if } y_i = j \text{ and } 0 \text{ otherwise}$
Likelihood functions
The maximum likelihood function

The maximum likelihood function reads:

$L(y) = \prod_{i=1}^{n} f(y_i) = \prod_{i=1}^{n} \prod_{j=0}^{k} p_j^{d_{ij}}$

With the logit specification of the probabilities, this becomes:

$L(y, x, \beta) = \prod_{i=1}^{n} \prod_{j=1}^{k} \left(\frac{e^{x_i\beta^{(j|0)}}}{1+\sum_{j'=1}^{k} e^{x_i\beta^{(j'|0)}}}\right)^{d_{ij}} \left(\frac{1}{1+\sum_{j'=1}^{k} e^{x_i\beta^{(j'|0)}}}\right)^{d_{i0}}$
The maximum likelihood function

The log transform of the likelihood yields:

$LL(y, x, \beta) = \sum_{i=1}^{n} \sum_{j=1}^{k} d_{ij}\, x_i\beta^{(j|0)} - \sum_{i=1}^{n} \ln\left(1 + \sum_{j=1}^{k} e^{x_i\beta^{(j|0)}}\right)$
Multinomial logit models
Stata Instruction : mlogit
mlogit y x1 x2 x3 … xk [if] [weight] [, options]
Options : noconstant : omits the constant
robust : controls for heteroskedasticity
if : select observations
weight : weights observations
use mlogit.dta, clear
mlogit type_exit log_time log_labour entry_age entry_spin cohort_*
Base outcome, chosen by STATA, with the highest empirical frequency
Goodness of fit
Parameter estimates, Standard errors and z values
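After mlogit, the predicted probability of each outcome can be recovered with predict; the three variable names below are hypothetical, one per exit type:

predict p_survival p_liquidation p_boughtout
* unlike the three separate logits, these probabilities sum to one
summarize p_survival p_liquidation p_boughtout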
Multinomial logit models
Interpretation of coefficients

The interpretation of coefficients always refers to the base category.

Does the probability of being bought out decrease over time? No! Relative to survival, the probability of being bought out decreases over time.
Is the probability of being bought out lower for spinoffs? No! Relative to survival, the probability of being bought out is lower for spinoffs.
Interpretation of coefficients
Relative to liquidation, the probability of being bought out is higher for spinoffs. Since $\beta^{(1|2)} = \beta^{(1|0)} - \beta^{(2|0)}$, this contrast can be tested directly:

lincom [boughtout]entry_spin - [death]entry_spin
Changing base outcome

mcross provides other estimates by changing the base outcome. Mind the new base outcome! Looking at being bought out relative to liquidation: relative to liquidation, the probability of being bought out is higher for spinoffs. And we observe the same results as before.
Independence of irrelevant alternatives (IIA)

The model assumes that each pair of outcomes is independent from all other alternatives. In other words, other alternatives are irrelevant.
From a statistical viewpoint, this is tantamount to assuming independence of the error terms across pairs of alternatives.
A simple way to test the IIA property is to estimate the model leaving out one modality (the restricted model), and to compare the parameters with those of the complete model:
If IIA holds, the parameters should not change significantly.
If IIA does not hold, the parameters should change significantly.

H0: The IIA property is valid
H1: The IIA property is not valid

$H = \left(\hat\beta_R - \hat\beta_C\right)' \left[\widehat{\text{Var}}\left(\hat\beta_R\right) - \widehat{\text{Var}}\left(\hat\beta_C\right)\right]^{-1} \left(\hat\beta_R - \hat\beta_C\right)$

The H statistic (H stands for Hausman) follows a χ² distribution with M degrees of freedom (M being the number of parameters).
STATA application: the IIA test
H0: The IIA property is valid
H1: The IIA property is not valid
mlogtest, hausman
Application of the IIA test

mlogtest, hausman

We compare the parameters of the model "liquidation relative to bought-out", estimated simultaneously with "survival relative to bought-out", with the parameters of the same model estimated without "survival relative to bought-out".
H0: The IIA property is valid
H1: The IIA property is not valid
Application of the IIA test

mlogtest, hausman

The conclusion is that the outcome survival significantly alters the choice between liquidation and bought-out.
In fact, for a company, being bought out must be seen as a way to remain active, at the cost of losing control over economic decisions, notably investment.
H0: The IIA property is valid
H1: The IIA property is not valid
Ordered Multinomial LOGIT Models
Ordered multinomial models
Let us now concentrate on the case where the dependent variable is a discrete integer which indicates an intensity. Opinion surveys make extensive use of such so-called Likert scales:
Obstacles to innovation (scale from 1 to 5); Intensity of collaboration (scale from 1 to 5); Marketing surveys (Dislike (1) - Like (7)); Student grades; Opinion tests; etc.
Ordered multinomial models
Such variables depict a vertical, quantitative scale, so that one can think of them as describing the interval in which an unobserved latent variable y* lies:

$y = 1 \text{ if } y^* \le \alpha_1$
$y = 2 \text{ if } \alpha_1 < y^* \le \alpha_2$
$y = 3 \text{ if } \alpha_2 < y^* \le \alpha_3$
$\vdots$
$y = k \text{ if } y^* > \alpha_{k-1}$

where the αj are unknown bounds to be estimated.
Ordered multinomial models
We assume that the latent variable y* is a linear combination of the set of all explanatory variables:

$y_i^* = x_i\beta + u_i$

where ui follows a cumulative distribution function F(.). The probabilities of each observed occurrence y (not y*) then follow the cdf F(.). Let us look at the probability that y = 1:
$P(y=1) = P(y^* \le \alpha_1)$
$P(y=1) = P(x_i\beta + u_i \le \alpha_1)$
$P(y=1) = P(u_i \le \alpha_1 - x_i\beta)$
$P(y=1) = F(\alpha_1 - x_i\beta) = \frac{e^{\alpha_1 - x_i\beta}}{1 + e^{\alpha_1 - x_i\beta}}$ (for the logistic case)
Ordered multinomial models
The probability that y = 2 is:

$P(y=2) = P(\alpha_1 < y^* \le \alpha_2) = F(\alpha_2 - x_i\beta) - F(\alpha_1 - x_i\beta) = \frac{e^{\alpha_2 - x_i\beta}}{1+e^{\alpha_2 - x_i\beta}} - \frac{e^{\alpha_1 - x_i\beta}}{1+e^{\alpha_1 - x_i\beta}}$

Altogether we have:

$P(Y=1) = F(\alpha_1 - x_i\beta)$
$P(Y=2) = F(\alpha_2 - x_i\beta) - F(\alpha_1 - x_i\beta)$
$P(Y=3) = F(\alpha_3 - x_i\beta) - F(\alpha_2 - x_i\beta)$
$\vdots$
$P(Y=k) = 1 - F(\alpha_{k-1} - x_i\beta)$
Probability in an ordered model

[Figure: density of ui partitioned by the cutoffs α1 - xiβ, α2 - xiβ, α3 - xiβ, ..., αk-1 - xiβ into the regions y=1, y=2, y=3, ..., y=k]
The likelihood function
The likelihood function is:

$L(\beta, \alpha) = \prod_{i=1}^{n} \prod_{j=1}^{k} \left[F(\alpha_j - x_i\beta) - F(\alpha_{j-1} - x_i\beta)\right]^{d_{ij}}$

with $F(\alpha_0 - x_i\beta) = 0$ and $F(\alpha_k - x_i\beta) = 1$.

If ui follows a logistic distribution, the likelihood and log likelihood functions read:

$L(\beta, \alpha) = \prod_{i=1}^{n} \prod_{j=1}^{k} \left[\frac{e^{\alpha_j - x_i\beta}}{1+e^{\alpha_j - x_i\beta}} - \frac{e^{\alpha_{j-1} - x_i\beta}}{1+e^{\alpha_{j-1} - x_i\beta}}\right]^{d_{ij}}$

$LL(\beta, \alpha) = \sum_{i=1}^{n} \sum_{j=1}^{k} d_{ij} \ln\left[\frac{e^{\alpha_j - x_i\beta}}{1+e^{\alpha_j - x_i\beta}} - \frac{e^{\alpha_{j-1} - x_i\beta}}{1+e^{\alpha_{j-1} - x_i\beta}}\right]$
Ordered multinomial logit models
Stata Instruction : ologit
ologit y x1 x2 x3 … xk [if] [weight] [, options]
Options : noconstant : omits the constant
robust : controls for heteroskedasticity
if : select observations
weight : weights observations
Ordered multinomial models

use est_var_qual.dta, clear
ologit innovativeness size rdi spe biotech
Goodness of fit
Estimated parameters
Cutoff points
Interpretation of coefficients
A positive (negative) sign indicates a positive (negative) relationship between the independent variable and the order (or rank).

How does one interpret the cutoff values? The model is: $\text{Score}_i = x_i\beta + u_i$. What, then, is the probability that Y = 1, i.e. the probability that the score is inferior to the first cutoff point?

$P(y=1) = P(x_i\beta + u_i \le \alpha_1) = P(270.5 + u_i \le 268.6) = P(u_i \le -1.95)$

$P(y=1) = \frac{e^{-1.95}}{1+e^{-1.95}} = 0.1245$
What is the probability that Y = 2: P(Y = 2)?

$P(Y=2) = F(\alpha_2 - x_i\beta) - F(\alpha_1 - x_i\beta)$

$P(y \le 2) = P(x_i\beta + u_i \le \alpha_2) = P(270.5 + u_i \le 269.3) = P(u_i \le -1.2) = \frac{e^{-1.2}}{1+e^{-1.2}} = 0.2321$

$P(Y=2) = 0.2321 - 0.1245 = 0.1076$
Interpretation of coefficients
STATA computation of predicted probabilities

prvalue computes the predicted probabilities.
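A sketch of the same computation with Stata's built-in predict, assuming innovativeness has three ordered categories (the number of new variable names must match the number of categories):

quietly ologit innovativeness size rdi spe biotech
predict pr1 pr2 pr3    // one predicted probability per category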
Count Data Models Part 1
The Poisson Model
Count data models
Let us now focus on outcome counting the number of
occurrences a given event. Analyzing the number of
innovations, the number patents, of invention.
Again OLS fails to meet the constrain that the prediction
must be nil or positive. To explain count variables, we
assume that the dependent variable follows a Poisson
distribution.
Poisson models
Let Y be a random count variable. The probability that Y be equal to the integer yi is given by the Poisson probability density distribution:

$P(Y = y_i) = \frac{e^{-\lambda_i}\, \lambda_i^{y_i}}{y_i!}, \quad y_i = 0, 1, 2, \ldots$

with $E(Y) = \text{var}(Y) = \lambda$

To introduce the set of explanatory variables in the model, we condition λi and impose the following log linear form:

$\lambda_i = e^{x_i\beta} \quad\Leftrightarrow\quad \ln \lambda_i = x_i\beta$
Poisson distributions
[Figure: Poisson probability distributions over counts 0 to 20 for λ = 0.8, 1.5, 2.9 and 10.5]
The likelihood function

The (log) likelihood function reads:

$L(\lambda) = \prod_{i=1}^{n} \frac{e^{-\lambda_i}\, \lambda_i^{y_i}}{y_i!}$

$LL(y, x, \beta) = \sum_{i=1}^{n} \left[ y_i x_i\beta - e^{x_i\beta} - \ln(y_i!) \right]$
Poisson models
Stata Instruction : poisson
poisson y x1 x2 x3 … xk [if] [weight] [, options]
Options : noconstant : omits the constant
robust : controls for heteroskedasticity
if : select observations
weight : weights observations
Poisson models

use est_var_qual.dta, clear
poisson patent lrdi lassets spe biotech
Estimated parameters
Goodness of fit
. poisson patent lrdi lassets spe biotech

Iteration 0:   log likelihood = -3549.9316
Iteration 1:   log likelihood = -3549.8433
Iteration 2:   log likelihood = -3549.8433

Poisson regression                            Number of obs   =        431
                                              LR chi2(4)      =    1766.11
                                              Prob > chi2     =     0.0000
Log likelihood = -3549.8433                   Pseudo R2       =     0.1992

      patent |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        lrdi |    .699484   .0326467    21.43   0.000     .6354977    .7634704
     lassets |   .4705428   .0133588    35.22   0.000     .4443601    .4967256
         spe |   .7623891   .0441729    17.26   0.000     .6758118    .8489664
     biotech |   1.073271    .051571    20.81   0.000     .9721939    1.174349
       _cons |   -3.12651   .1971912   -15.86   0.000    -3.512997   -2.740022
Interpretation of coefficients
If variables are entered in log, one can interpret the coefficients as elasticities:

$\ln \lambda_i = \beta \ln x_i \quad\Rightarrow\quad \beta = \frac{\partial \ln \lambda}{\partial \ln x} = \frac{\partial \lambda / \lambda}{\partial x / x}$

A one % increase in firm size is associated with a .47% increase in the expected number of patents.
Interpretation of coefficients

Similarly, a one % increase in R&D investment is associated with a .69% increase in the expected number of patents.
Interpretation of coefficients

If variables are not entered in log, the interpretation changes: the effect of a one-unit change is 100 × (eβ - 1) percent. A one-point rise in the degree of specialisation is thus associated with a roughly 114% increase in the expected number of patents.
Interpretation of coefficients

For dummy variables, the interpretation changes slightly: biotechnology firms have an expected number of patents which is about 192% higher than pharmaceutical companies.
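Both transformations can be checked by hand from the stored coefficients:

quietly poisson patent lrdi lassets spe biotech
display 100*(exp(_b[spe]) - 1)        // about 114, for a one-point rise in spe
display 100*(exp(_b[biotech]) - 1)    // about 192, biotech versus pharma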
All variables are very significant… but is the Poisson assumption $E(Y) = \text{var}(Y)$ satisfied?

. summarize patent

    Variable |    Obs        Mean    Std. Dev.   Min   Max
      patent |    431    10.83295      17.622      0   202

The standard deviation (17.6) is well above the mean (10.8), which hints at overdispersion.
Count Data Models Part 2Negative Binomial Models
Negative binomial models
Generally, the Poisson model is not valid, due to the presence of overdispersion in the data. This violates the equality of the mean and variance of the dependent variable implied by the Poisson model.
The negative binomial model treats this problem by adding an unobserved heterogeneity term ui to the log linear form:

$\ln \nu_i = \ln \lambda_i + \ln u_i = x_i\beta + \ln u_i$

$P(Y = y_i) = \frac{e^{-\lambda_i u_i}\, \left(\lambda_i u_i\right)^{y_i}}{y_i!}$
Negative binomial models

The density of yi is obtained by integrating out ui:

$f(y_i \mid x_i) = \int_0^{\infty} f(y_i \mid x_i, u_i)\, g(u_i)\, du_i, \quad \text{with } f(y_i \mid x_i, u_i) = \frac{e^{-\lambda_i u_i} (\lambda_i u_i)^{y_i}}{y_i!}$

Assuming that ui is distributed Gamma with mean 1, the density of yi reads:

$f(y_i \mid x_i) = \frac{\Gamma(y_i + \alpha^{-1})}{\Gamma(y_i + 1)\, \Gamma(\alpha^{-1})} \left(\frac{\alpha^{-1}}{\alpha^{-1} + \lambda_i}\right)^{\alpha^{-1}} \left(\frac{\lambda_i}{\alpha^{-1} + \lambda_i}\right)^{y_i}$
Likelihood Functions
$L(y, \lambda, \alpha) = \prod_{i=1}^{n} \frac{\Gamma(y_i + \alpha^{-1})}{\Gamma(y_i + 1)\, \Gamma(\alpha^{-1})} \left(\frac{\alpha^{-1}}{\alpha^{-1} + \lambda_i}\right)^{\alpha^{-1}} \left(\frac{\lambda_i}{\alpha^{-1} + \lambda_i}\right)^{y_i}$

$LL(y, x, \beta, \alpha) = \sum_{i=1}^{n} \left[ \ln\frac{\Gamma(y_i + \alpha^{-1})}{\Gamma(y_i + 1)\, \Gamma(\alpha^{-1})} + y_i \ln\alpha + y_i x_i\beta - \left(y_i + \alpha^{-1}\right) \ln\left(1 + \alpha e^{x_i\beta}\right) \right]$

where α is the overdispersion parameter.
Negative binomial models
Stata Instruction: nbreg
nbreg y x1 x2 x3 … xk [if] [weight] [, options]
Options : noconstant : omits the constant
robust : controls for heteroskedasticity
if : select observations
weight : weights observations
Negative binomial models

use est_var_qual.dta, clear
nbreg patent lrdi lassets spe biotech
Goodness of fit
Estimated parameters
Overdispersion parameter
Overdispersion test
. nbreg patent lrdi lassets spe biotech

Negative binomial regression                  Number of obs   =        431
                                              LR chi2(4)      =     122.62
Dispersion     = mean                         Prob > chi2     =     0.0000
Log likelihood = -1374.339                    Pseudo R2       =     0.0427

      patent |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        lrdi |   .7823229   .1121344     6.98   0.000     .5625434    1.002102
     lassets |   .6167106   .0600372    10.27   0.000     .4990399    .7343813
         spe |   .8390795   .1950958     4.30   0.000     .4566988     1.22146
     biotech |   1.515035   .2384884     6.35   0.000     1.047606    1.982464
       _cons |  -5.179659   .8629357    -6.00   0.000    -6.870982   -3.488337
    /lnalpha |    .323967   .0759342                      .1751387    .4727953
       alpha |   1.382602   .1049868                      1.191411    1.604473

Likelihood-ratio test of alpha=0: chibar2(01) = 4351.01  Prob>=chibar2 = 0.000
Interpretation of coefficients
If variables are entered in log, one can still interpret the coefficients as elasticities
A one % increase in firm size is associated with a .61% increase in the expected number of patents
Interpretation of coefficients
If variables are entered in log, one can still interpret the coefficients as elasticities
A one % increase in R&D investment is associated with a .78% increase in the expected number of patents
Interpretation of coefficients
If variables are not entered in log, the interpretation changes
100 × (eβ – 1)
A one-point rise in the degree of specialisation is associated with a roughly 131% increase in the expected number of patents
Interpretation of coefficients
For dummy variables, the interpretation follows the same transformation
100 × (eβ – 1)
Biotechnology firms have an expected number of patents which is about 355% higher than pharmaceutical companies.
Overdispersion test
We use the LR test to compare the negative binomial model with the Poisson model:

$LR = 2\left(\ln L_{NBREG} - \ln L_{PRM}\right) = 2 \times \left(-1374.3 + 3549.8\right) = 4351$

The results indicate that the probability of rejecting H0 wrongly is almost nil (H0: α = 0). Hence there is overdispersion in the data, and as a consequence one should use the negative binomial model.
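A sketch reproducing this statistic from the two stored log likelihoods:

quietly poisson patent lrdi lassets spe biotech
scalar ll_prm = e(ll)
quietly nbreg patent lrdi lassets spe biotech
scalar ll_nb = e(ll)
display 2*(ll_nb - ll_prm)    // 4351.01, matching the chibar2(01) reported by nbreg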
Compared with the Poisson model, the negative binomial has larger standard errors and lower z values:

Variable        Poisson             NegBin
lrdi            0.699   (21.43)     0.782   (6.98)
lassets         0.471   (35.22)     0.617   (10.27)
spe             0.762   (17.26)     0.839   (4.30)
biotech         1.073   (20.81)     1.515   (6.35)
_cons           -3.127  (-15.86)    -5.180  (-6.00)
lnalpha _cons                       0.324   (4.27)
alpha                               1.383
N               431                 431

legend: b/t (z statistics in parentheses)
Extensions
ML estimators: all models can be extended to a panel context to take full account of unobserved heterogeneity (fixed effects, random effects; see the sketch below).
Heckman models: selection bias; two equations, one of which models the probability of being observed.
Survival models: discrete time (complementary log-log, logit); continuous time (Cox model).
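As a closing sketch, the panel extensions above in Stata, with hypothetical firm_id and year identifiers:

xtset firm_id year
xtlogit inno lrdi lassets spe, fe    // fixed effects (conditional logit)
xtlogit inno lrdi lassets spe, re    // random effects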