Vorlesung Generalized Linear Regressionmodels Antonia Rom


Page 1: Vorlesung  Generalized Linear  Regressionmodels

Vorlesung Generalized Linear Regressionmodels

Antonia Rom

Page 2: Vorlesung  Generalized Linear  Regressionmodels


Chapter 4 - Modeling of Binary Data

- Introduction
  - What is important in modeling?
  - Problems, obstacles

- 4.1 Maximum Likelihood Estimation
  - What is ML estimation?
  - Single binary response
  - Grouped data
  - Asymptotic properties
  - Existence of ML estimates
  - Estimation conditioned on predictor values

Page 3: Vorlesung  Generalized Linear  Regressionmodels


Introduction

A generalized linear model consists of:

- a probability distribution from the exponential family

- a linear predictor η = Xβ

- a link function g, the inverse of the response function h, so that E(Y) = h(η) and g(E(Y)) = η

Binary regression model

h is a fully specified function.

In this chapter the logit model is used. In this case h is the logistic distribution function

$$\pi(x) = P(y = 1 \mid x) = h(\eta) = h(x^\top \beta)$$

(linear predictor $\eta = x^\top \beta$, response function $h$, response probability $\pi(x)$)
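Written out, the logistic distribution function and the resulting model are

$$h(\eta) = \frac{\exp(\eta)}{1 + \exp(\eta)}, \qquad \pi(x) = \frac{\exp(x^\top \beta)}{1 + \exp(x^\top \beta)}.$$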

Page 4: Vorlesung  Generalized Linear  Regressionmodels


The link function is the inverse of the response function, g = h⁻¹.

It determines the functional form of the response probabilities.

The linear predictor determines which variables are included and in what form they affect the response. The unknown parameters β can be estimated by maximum likelihood.

Maximum likelihood estimation requires an iterative algorithm.

-> Linear predictors can contain polynomial terms of continuous variables, dummy variables, and interaction effects, as in the sketch below.
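A minimal sketch in Python with statsmodels: the data frame and its columns (`y`, `age`, `income`, `region`) are invented for illustration, not taken from the lecture.

```python
# Fit a logit model with a polynomial term, dummy coding, and an interaction.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "age": rng.uniform(20, 70, n),
    "income": rng.uniform(1, 5, n),
    "region": rng.choice(["north", "south"], n),
})
eta = -4 + 0.08 * df["age"] + 0.3 * df["income"]   # true linear predictor
df["y"] = rng.binomial(1, 1 / (1 + np.exp(-eta)))  # logistic response

# I(age**2): polynomial term, C(region): dummy coding, age:income: interaction
model = smf.logit("y ~ age + I(age**2) + C(region) + age:income", data=df)
result = model.fit()   # iterative (Newton-type) maximization of the log-likelihood
print(result.summary())
```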

Care should be taken when specifying the constituents of the model, such as the linear predictor!

Introduction

Page 5: Vorlesung  Generalized Linear  Regressionmodels


• Discrepancy between data and model: does the fit of the model support the inferences drawn from it?

• Relevance of variables and form of the linear predictor: which variables should be included, and how?

• Explanatory power of the covariates

• Prognostic power of the model

• Choice of link function: which link function fits the data well and has a simple interpretation?

These aspects are not independent: the model should provide an appropriate approximation with a simple predictor, and the specification determines the goodness of fit. The choice of linear predictor aims at finding an adequate form for the covariates and at reducing the variable set, while the explanatory value aims at quantifying the effect of the covariates within the model.

The first section is about estimation: maximum likelihood estimation!!

Introduction

Page 6: Vorlesung  Generalized Linear  Regressionmodels


Maximum-Likelihood Estimation

The basic principle is to construct the likelihood of the unknown parameters for the sample data: which parameter values (e.g. mean, variance) make the observed sample most likely? The distribution has to be known!

The likelihood represents the joint probability or probability density of the observed data, considered as a function of the unknown parameters.

What does this mean in practice?

Example:

Two MP3 players, exactly the same, shuffle mode only, no display!

One with 5 songs, one with 20 songs.

Each MP3 player contains your favorite song.

Unfortunately you mixed them up. So you take one, turn it on, and your favorite song is played.

If you had to bet, which one would you choose? The one with 5 songs!!! (On that player the observed event has probability 1/5; on the other, only 1/20. Choosing the 5-song player maximizes the likelihood of what you just heard.)

Page 7: Vorlesung  Generalized Linear  Regressionmodels


An event A has happened. One wants to draw inferences about an underlying quantity B (e.g. a particular parameter). To do so, one considers the conditional probability of A for every possible estimate $\hat b_i$ of B, assuming $\hat b_i$ is true. The value of $\hat b_i$ for which $P(A \mid \hat b_i)$ is maximal is the best estimate of b.

The conditional probability $P(A \mid \hat b_i)$ is evaluated for the given event A. $P(A \mid \hat b_i)$ is also called $L(\hat b_i)$, the likelihood of $\hat b_i$.

The ML estimator is the value for which the likelihood is maximal, hence the name maximum likelihood.

Maximum-Likelihood Estimation

Page 8: Vorlesung  Generalized Linear  Regressionmodels


If there are too many candidates $\hat b_i$, it is not possible to calculate every $L(\hat b_i)$. In this case a likelihood function is built and the maximum is found by solving $L'(\hat b) = 0$.

Fact (1): The likelihood $L(x)$ is not the probability that the event x happens, but the conditional probability of the already observed event y, given x:

$L(x) = P(y \mid x)$

Fact (2): The likelihoods do not sum to 1.
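A tiny numeric illustration of "build the likelihood function and maximize it", on synthetic coin-flip data assumed only for this sketch:

```python
# Synthetic Bernoulli data: 7 successes in 10 trials (made up for illustration).
import numpy as np

y = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])

def log_likelihood(p):
    # log L(p) = sum_i [ y_i * log p + (1 - y_i) * log(1 - p) ]
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Too many candidate values to tabulate by hand, so scan a fine grid;
# analytically, solving L'(p) = 0 gives the sample mean.
grid = np.linspace(0.01, 0.99, 99)
p_hat = grid[np.argmax([log_likelihood(p) for p in grid])]
print(p_hat, y.mean())   # both ~0.7
```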

Maximum-Likelihood Estimation

Page 9: Vorlesung  Generalized Linear  Regressionmodels


The likelihood for the response is given by
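(in standard notation, with $\pi_i = P(y_i = 1 \mid x_i) = h(x_i^\top \beta)$)

$$L(\beta) = \prod_{i=1}^{n} \pi_i^{\,y_i}\,(1 - \pi_i)^{\,1 - y_i}$$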

Observations are considered independent. The maximum likelihood estimate of β is the value $\hat\beta$ that maximizes the likelihood.

L values can get very small, so the log-likelihood $l(\beta) = \log L(\beta)$ is used instead.

The value $\hat\beta$ can be obtained by solving the system of equations $\partial l(\beta) / \partial \beta = 0$.

The derivatives form the so-called score function $s(\beta)$.
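For the logit model the score function takes the standard form

$$s(\beta) = \frac{\partial l(\beta)}{\partial \beta} = \sum_{i=1}^{n} x_i\,\bigl(y_i - \pi_i\bigr).$$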

(The system is solved iteratively.)

MLE – Definition: Single binary response

Page 10: Vorlesung  Generalized Linear  Regressionmodels


MLE – Definition: grouped data – binomially distributed responses

Several independent binary responses are observed at each distinct predictor value.

π is assumed to depend on x only; the mean is assumed to be the same for all binary observations at that value.

The model has the form $\tilde y_i \sim B(n_i, \pi_i)$ with $\pi_i = h(x_i^\top \beta)$, where $\tilde y_i$ counts the successes among the $n_i$ binary observations at $x_i$, for $i = 1, \dots, g$ distinct predictor values.

For the collection of binary variables the likelihood has the binomial form

$$L_{bin}(\beta) = \prod_{i=1}^{g} \binom{n_i}{\tilde y_i}\, \pi_i^{\,\tilde y_i}\,(1 - \pi_i)^{\,n_i - \tilde y_i}.$$

The likelihood for the number of successes, $L_{bin}(\beta)$, and the binary-observation likelihood $L(\beta)$ differ only in the binomial factor $\binom{n_i}{\tilde y_i}$, which is irrelevant during maximization because it does not depend on β. Therefore the log-likelihood is, up to that constant,

$$l(\beta) = \sum_{i=1}^{g} \bigl[\tilde y_i \log \pi_i + (n_i - \tilde y_i) \log(1 - \pi_i)\bigr].$$

The score function of the logit model is:
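(in the grouped-data notation above)

$$s(\beta) = \sum_{i=1}^{g} x_i\,\bigl(\tilde y_i - n_i \pi_i\bigr)$$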

Page 11: Vorlesung  Generalized Linear  Regressionmodels


MLE – Definition: Asymptotic Properties

The MLE has some favorable properties.

The ML estimator exists and is unique asymptotically.

It is consistent and asymptotically normally distributed.

It is asymptotically efficient.

Consistency

The likelihood is a smooth function that behaves in a nice way, and its maximum is attained at a unique point.

As the two functions $l_n$ (the sample log-likelihood) and $l$ (its limit) get closer, the points at which their maxima are attained should also get closer, which means exactly that $\|\hat\beta_n - \beta\| \to 0$.

Page 12: Vorlesung  Generalized Linear  Regressionmodels


MLE – Definition: Asymptotic Properties

Asymptotic normality:

The estimator not only converges to the unknown parameter, it also converges fast enough: $\sqrt{n}\,(\hat\beta - \beta)$ has a limiting normal distribution.

In ML theory the asymptotic variance of the estimator is determined by the information matrix, also called the Fisher matrix, $F(\beta)$; asymptotically, $\hat\beta \sim N(\beta, F(\beta)^{-1})$.

For binary data (logit model):

$$F(\beta) = \sum_{i=1}^{n} x_i x_i^\top\, \pi_i (1 - \pi_i)$$

For grouped data:

$$F(\beta) = \sum_{i=1}^{g} n_i\, x_i x_i^\top\, \pi_i (1 - \pi_i)$$
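As a practical aside, continuing the hypothetical statsmodels sketch from the linear-predictor slide, the fitted `result` object exposes an estimate of the inverse Fisher matrix directly:

```python
import numpy as np

# `result` is the fitted logit model from the earlier sketch (assumption).
cov = result.cov_params()      # estimated asymptotic covariance, F(beta)^{-1}
se = np.sqrt(np.diag(cov))     # asymptotic standard errors of beta-hat
print(se)
```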

Page 13: Vorlesung  Generalized Linear  Regressionmodels


MLE – Definition: Existence of maximum Likelihood Estimates

For a finite sample size it may happen that ML estimates do not exist.

ML estimates do not exist when the data set shows complete separation.

ML estimates may not exist when the data set shows quasi-complete separation.

ML estimates do exist when the data set shows overlap.

ML estimates do exist when the data set shows linear dependency.

In short, ML estimates exist if there is no hyperplane that separates the 0 and the 1 responses (see the sketch below).
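A small sketch of complete separation on toy data, assumed only for illustration: y equals 1 exactly when x > 0, so a point on the x-axis separates the responses and the ML estimate does not exist.

```python
import numpy as np
import statsmodels.api as sm

x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = (x > 0).astype(float)      # complete separation at x = 0
X = sm.add_constant(x)

# Depending on the statsmodels version, fitting either raises a
# perfect-separation error or warns while the slope estimate diverges
# (the log-likelihood keeps increasing as the slope grows without bound).
try:
    print(sm.Logit(y, X).fit(disp=0).params)
except Exception as err:
    print(type(err).__name__, ":", err)
```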

Page 14: Vorlesung  Generalized Linear  Regressionmodels


MLE – Definition: Estimation conditioned on Predictor Values

Sometimes samples are drawn conditional on the response y.

In such stratified samples one observes x values given y=1 and x values given y=0.

A common case is case-control studies in biomedicine, where y=1 (cases) and y=0 (controls).

(This is called choice-based sampling in econometrics.)

Let us consider the simplest case of a binary predictor, with y ∈ {0,1} and x ∈ {0,1}, and with

$$\gamma = \frac{P(y=1 \mid x=1)\,/\,P(y=0 \mid x=1)}{P(y=1 \mid x=0)\,/\,P(y=0 \mid x=0)}$$

the odds ratio, which captures the association between y and x.

This parameter of association is the same under both sampling schemes; in the original logit model it equals $e^{\beta}$, with β the coefficient to be estimated.

Page 15: Vorlesung  Generalized Linear  Regressionmodels


This approach might be motivated by the specific structure of the logit model.

We go back to Chapter 2.2.2, the derivation of the binary logit model, where the predictors are assumed to be normally distributed. Let $f(x \mid y = r)$ denote the density of x given y = r, and let $p_r = P(y = r)$ denote the marginal probability, for r ∈ {0, 1}.

From Bayes' theorem it follows that

$$P(y=1 \mid x) = \frac{p_1\, f(x \mid y=1)}{p_1\, f(x \mid y=1) + p_0\, f(x \mid y=0)}.$$

Therefore

$$\log \frac{P(y=1 \mid x)}{P(y=0 \mid x)} = \log \frac{p_1}{p_0} + \log \frac{f(x \mid y=1)}{f(x \mid y=0)}$$

holds.

This shows that a logit model holds if the log density ratio has a linear form containing $x^\top \beta$, and only the intercept depends on the marginal probabilities.

The important point is that the marginals determine only the intercept!

MLE – Definition: Estimation conditioned on Predictor Values

Page 16: Vorlesung  Generalized Linear  Regressionmodels


The likelihood for given y differs from the likelihood given the predictors.

By using Bayes' theorem, $f(x \mid y)\,P(y) = P(y \mid x)\,f(x)$, one obtains for the log-likelihood conditional on y a decomposition into three parts:

- a term equivalent to the conditional log-likelihood of the logit model,

- the marginal distribution of x (which can be maximized by the empirical distribution),

- the marginal distribution of y (which is fixed by the sampling design).

MLE – Definition: Estimation conditioned on Predictor Values

Page 17: Vorlesung  Generalized Linear  Regressionmodels


General binary model: link function and linear predictor. Care should be taken when specifying these constituents!

Maximum Likelihood Estimation

The basic principle is to construct the likelihood of the unknown parameters for the sample data!

MLE can cope with difficult and complicated linear predictors (interactions, dummy variables, etc.); it is an iterative algorithm.

Properties of MLE

It is consistent and asymptotically normally distributed.

It is asymptotically efficient (Fisher matrix).

Maximum likelihood estimates might not exist; they do exist when the data set has overlap or linear dependency.

Depending on the sampling design, ML estimation can also be conditional on the response y.

Summary

Page 18: Vorlesung  Generalized Linear  Regressionmodels


Thank you for your attention!

Page 19: Vorlesung  Generalized Linear  Regressionmodels


Note the subtle difference: we were interested in the probability function because it gives the probabilities of occurrence of realizations for given parameters θ. With the likelihood function, we take the sample as given and are interested in the unknown parameter θ that makes the realization of the given sample "most likely"!


Page 22: Vorlesung  Generalized Linear  Regressionmodels


MLE - Example

Page 23: Vorlesung  Generalized Linear  Regressionmodels


Suppose Bryant's points scored in a game are Y = M + ε, where M is your prediction and ε is a random error. You can now calculate the probability of Bryant scoring the amounts he actually scored.

Basic principle of MLE: construct the likelihood of the unknown parameters for the sample data!!!

Let f(ε) denote the density function for ε. (Recall that the density function is like a probability function, and that the density for a normal variable is a bell curve with its maximum at ε=0.)

Given the prediction M and the density function, you can compute the probability of Bryant scoring any particular point total Y. This is given by the formula f(Y-M) = f(ε).

- For example, if you believe that M=32, then the probability that Bryant scores 35 is given by f(35-32) = f(3).
- If σ=6, for example, then examination of the normal table reveals f(3) = .08.

Assume that Bryant's scoring in one game is independent of what he scored in the prior game.
- Recall that the probability of two independent events occurring is just the product of the probabilities that each occurs.
- It follows that the probability, or likelihood, of Bryant scoring exactly 33, 22, 25, 40, and 30 points is just the product of the probabilities of his getting each of these scores.

Given any prediction M, you can write the likelihood score as:

$$L = f(33-M) \cdot f(22-M) \cdot f(25-M) \cdot f(40-M) \cdot f(30-M).$$
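A numeric sketch of this example (σ = 6 taken from the next slide's text; scipy's normal density stands in for the table values):

```python
import numpy as np
from scipy.stats import norm

scores = np.array([33, 22, 25, 40, 30])

def likelihood(M, sigma=6.0):
    # L(M) = f(33-M) * f(22-M) * f(25-M) * f(40-M) * f(30-M)
    return np.prod(norm.pdf(scores - M, scale=sigma))

grid = np.linspace(10, 50, 401)
M_hat = grid[np.argmax([likelihood(M) for M in grid])]
print(M_hat, scores.mean())   # ML estimate coincides with the sample mean, 30
```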

MLE - Example

Page 24: Vorlesung  Generalized Linear  Regressionmodels


You want to find the "maximum likelihood estimator" (MLE) of M! This is the value of M that maximizes L.

- Intuitively, you know that the MLE of M would not be 15 or 50 or some number far from his typical scoring output. It is almost impossible that a player who is predicted to score 15 points per game would actually score 33, 22, 25, 40, and 30. In fact, if M = 15 and σ = 6, then
L = f(33-15) · f(22-15) · f(25-15) · f(40-15) · f(30-15) = f(18) · f(7) · f(10) · f(25) · f(15) < .0000001

But 32 might be a good candidate to be the MLE. Someone predicted to score 32 points per game has a reasonable chance of scoring 33, 22, 25, 40, and 30.

- In this case, L = f(1) · f(-10) · f(-7) · f(8) · f(-2) ≈ .00005
- It turns out that the MLE of M is given by the mean of the realized values of Y. That is, M = 30, and L = .00014.
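As a one-line check that the mean maximizes L under normal errors:

$$\frac{d}{dM} \log L = \sum_{i} \frac{Y_i - M}{\sigma^2} = 0 \;\Longrightarrow\; \hat M = \bar Y = \frac{33 + 22 + 25 + 40 + 30}{5} = 30.$$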

MLE - Example
