AN INTRODUCTION TO DISCRETE CHOICE MODELLING

Tony Fowkes, Visiting Reader, Institute for Transport Studies, University of Leeds

Internal Seminar, ITS, 07/04/16

Upload: institute-for-transport-studies-its

Post on 14-Feb-2017


Page 1: An Introduction to Discrete Choice Modelling

AN INTRODUCTION TO DISCRETE CHOICE MODELLING

Tony Fowkes, Visiting Reader

Institute for Transport Studies, University of Leeds

Internal Seminar, ITS, 07/04/16

Page 2: An Introduction to Discrete Choice Modelling

WHAT DO YOU THINK OF BRITISH TV?

• How good are the BBC channels? – Think of a number!

Page 3: An Introduction to Discrete Choice Modelling

WHAT DO YOU THINK OF BRITISH TV?

Specifically, how ‘satisfied’ are you with the BBC channels (BBC1, BBC2 & BBC4)?

We will be dealing with comparisons, so any number will do for now. Write down 100 if you can think of nothing better.

Page 4: An Introduction to Discrete Choice Modelling

WHAT DO YOU THINK OF BRITISH TV?

• Relative to the number you gave for the BBC channels, how good do you think the ITV offering is (ITV1 – ITV4)?

If you think one is twice as good as another, you might give it twice the number.

Be guided by how often you watch ITV channels as against BBC channels.

Page 5: An Introduction to Discrete Choice Modelling

WHAT DO YOU THINK OF BRITISH TV?

• Now give me a third number for how good you think all the other channels are.

Page 6: An Introduction to Discrete Choice Modelling

WHAT DO YOU THINK OF BRITISH TV?

• Lastly, taking the total time you spend watching all channels in a typical week as 100%, please write down the 3 percentages of time you typically spend watching each of the channel groups.

You do not need to be too exact, and if you don’t watch TV in a typical week, choose a non-typical one.

Page 7: An Introduction to Discrete Choice Modelling

So, we have been able to measure shares (also known as proportions, probabilities and, if multiplied by 100, percentages).

But we want to model the shares, so that we understand how they vary from one person to another, and over time as things change. That will allow us to make predictions.

Page 8: An Introduction to Discrete Choice Modelling

HOW MIGHT WE RELATE THE VIEWING % FIGURES TO THE SATISFACTION NUMBERS?

• Each person will have used a different scale (unknown to the analyst) when selecting their satisfaction numbers, but we might still try to predict (FOR EACH PERSON) the proportion of time they spend watching each of the 3 groups of channels.

Page 9: An Introduction to Discrete Choice Modelling

A SHARE MODEL

The simplest way of looking at this problem is to try to form a simple ‘share model’. Let $H_i$ denote the hours spent watching channel group i, $S_i$ the satisfaction with channel group i, and $P_i$ the share of channel group i in the total hours watched. Then:

$$P_{\mathrm{BBC}} = \frac{H_{\mathrm{BBC}}}{H_{\mathrm{BBC}} + H_{\mathrm{ITV}} + H_{\mathrm{ELSE}}}$$

Page 10: An Introduction to Discrete Choice Modelling

A SHARE MODEL

If hours watched are proportional to Satisfaction, then:

$$P_{\mathrm{BBC}} = \frac{S_{\mathrm{BBC}}}{S_{\mathrm{BBC}} + S_{\mathrm{ITV}} + S_{\mathrm{ELSE}}}$$

BUT – is Usage always proportional to Satisfaction?

Page 11: An Introduction to Discrete Choice Modelling

CONSIDER YOUR JOURNEY HOME FROM THE UNIVERSITY

• If you had the choice of two alternative routes, one of which is three times as good as the other, would you ever willingly choose the worse route?

• Taking route 1 as the worse route: $P_1 = S_1/(S_1+S_2) = 100/(100+300) = 0.25$

Seems like we need a better share model.

Page 12: An Introduction to Discrete Choice Modelling

TRY USING EXPONENTIALS

$$P_1 = \frac{\exp(S_1)}{\exp(S_1) + \exp(S_2)} = \frac{1}{1 + \exp(200)} \approx 1.4 \times 10^{-87}$$

Rather too extreme, but we can define a Utility (U) as a function of the S values,

e.g. $U = \theta S$. Let θ = 0.05 (just to try it):

$$P_1 = \frac{\exp(5)}{\exp(5) + \exp(15)} \approx 0.00005$$

By changing θ we can get sensible P's.
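The arithmetic on this slide can be checked with a short Python sketch. It compares the linear share model from the previous slide with the exponential form for the route example ($S_1 = 100$ for the worse route, $S_2 = 300$). It uses θ = 1 (the raw exponential), θ = 0.05 and, purely as an extra illustration, θ = 0.01. The function names `linear_share` and `exp_share` are hypothetical and were chosen for this sketch.

```python
import math

def linear_share(s_own, s_all):
    """Linear share model: P_i = S_i / sum_k S_k."""
    return s_own / sum(s_all)

def exp_share(s_own, s_all, theta):
    """Exponential share model: P_i = exp(theta*S_i) / sum_k exp(theta*S_k)."""
    # Subtracting the largest utility leaves the ratio unchanged but
    # prevents overflow in math.exp when utilities are large.
    u_max = max(theta * s for s in s_all)
    numerator = math.exp(theta * s_own - u_max)
    denominator = sum(math.exp(theta * s - u_max) for s in s_all)
    return numerator / denominator

S = [100, 300]  # S1 = worse route, S2 = route three times as good

print(f"linear: P1 = {linear_share(S[0], S):.2f}")
for theta in (1.0, 0.05, 0.01):
    print(f"theta = {theta}: P1 = {exp_share(S[0], S, theta):.2g}")
```

With θ = 1 the worse route is essentially never chosen, and θ = 0.05 still gives a very small share. θ = 0.01 gives about 0.12, which shows how the scale of θ controls how extreme the predicted shares are.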

Page 13: An Introduction to Discrete Choice Modelling

BACK TO THE TV EXAMPLE

If you had given $S_1 = 100$ (BBC), $S_2 = 80$ (ITV) and $S_3 = 160$ (ELSE), then with θ = 0.01 (just as an example),

$$P_{\mathrm{BBC}} = \frac{\exp(1)}{\exp(1) + \exp(0.8) + \exp(1.6)} = 0.27$$

$P_{\mathrm{ITV}} = 0.22$, $P_{\mathrm{ELSE}} = 0.50$
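The three shares above can be checked with a minimal Python sketch. Following this slide, it assumes that utility is simply θ times the satisfaction score. The helper name `mnl_shares` is hypothetical.

```python
import math

def mnl_shares(satisfaction, theta):
    """Logit shares with utility U_i = theta * S_i for each alternative."""
    utilities = {name: theta * s for name, s in satisfaction.items()}
    u_max = max(utilities.values())  # subtract the maximum to keep exp() stable
    weights = {name: math.exp(u - u_max) for name, u in utilities.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

S = {"BBC": 100, "ITV": 80, "ELSE": 160}
shares = mnl_shares(S, theta=0.01)
for name, p in shares.items():
    print(f"{name}: {p:.2f}")
# BBC: 0.27
# ITV: 0.22
# ELSE: 0.50
```

The three printed shares sum to 0.99 only because of rounding. The unrounded shares sum to exactly 1.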

Page 14: An Introduction to Discrete Choice Modelling

THE SCALE FACTOR

We call θ the SCALE FACTOR, and it is a crucial parameter that has to be estimated when calibrating a Discrete Choice forecasting model.

The scale factor determines the relative weight we give to the deterministic part of the model compared to everything else (the unknown residual or ‘error’ term).

Page 15: An Introduction to Discrete Choice Modelling

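The Scale Factor Problem

In logit models, utility consists of 2 parts:

U = Deterministic part + Random error

$$U = \Omega V + \varepsilon$$

where Ω ‘scales’ the expression we use for V to the scale of the random error.

Suppose $V = \beta_0 + \beta_1 X_1 + \beta_2 X_2$.

Then $\Omega V = \Omega\beta_0 + \Omega\beta_1 X_1 + \Omega\beta_2 X_2$,

and so the modelled coefficients are estimates of $\Omega\beta_0$, $\Omega\beta_1$ and $\Omega\beta_2$, not of the β's separately from Ω.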

Page 16: An Introduction to Discrete Choice Modelling

Why does the scale factor problem matter?

• For attribute valuation, such as ‘value of time’, it doesn’t matter since the scale factors cancel

• For mode choice forecasting it does matter, unless the errors are the correct size. This may well be the case for revealed preference (RP) data, but will not be the case for stated preference (SP) data, where the errors are likely to be greater than real errors because of the hypothetical nature of the experiment. Larger errors mean a smaller Ω, so the formula for P will overstate small probabilities and understate the probability of the dominant mode.
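Both bullets can be illustrated with a small Python sketch. It assumes hypothetical underlying coefficients on cost (per £) and time (per hour), together with two values of Ω. A larger Ω stands for data with smaller errors, as might be the case for RP; a smaller Ω stands for larger errors, as is likely for SP. Estimation can only recover Ω times each coefficient. The ratio (a value of time) comes out the same in both cases, but the forecast probability of the more attractive alternative is lower when Ω is smaller. All numbers and the name `binary_logit_p1` are illustrative.

```python
import math

def binary_logit_p1(v1, v2):
    """P1 = exp(V1) / (exp(V1) + exp(V2)), written as 1 / (1 + exp(V2 - V1))."""
    return 1.0 / (1.0 + math.exp(v2 - v1))

# Hypothetical underlying coefficients: per pound of cost, per hour of time.
beta_cost, beta_time = -0.2, -2.0

results = {}
for label, omega in (("smaller errors", 2.0), ("larger errors", 0.5)):
    # Estimation can only identify Omega * beta, not beta itself.
    b_cost, b_time = omega * beta_cost, omega * beta_time
    vot = b_time / b_cost  # Omega cancels: 10 (pounds per hour) in both cases
    # Alternative 1: £30 and 2 hours; alternative 2: £15 and 4 hours.
    v1 = b_cost * 30 + b_time * 2
    v2 = b_cost * 15 + b_time * 4
    results[label] = (vot, binary_logit_p1(v1, v2))
    print(f"{label}: VOT = {vot:.2f}, P1 = {results[label][1]:.3f}")
```

This prints a VOT of 10.00 in both cases. P1 is 0.881 with the larger Ω and 0.622 with the smaller one.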

Page 17: An Introduction to Discrete Choice Modelling

Probability P varies with Ω

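$$P_i = \frac{\exp(\Omega V_i)}{\sum_{k=1}^{K} \exp(\Omega V_k)}$$

As Ω → 0, $P_i \to 1/K$ (for K alternatives), i.e. complete ignorance: a toss of a coin.

As Ω increases, the deterministic part of the model explains more of what is going on, which is good.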
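This limiting behaviour can be seen in a short Python sketch. It uses the utilities 1.0, 0.8 and 1.6 from the TV example as illustrative V values, so Ω = 1 reproduces those shares. `logit_probs` is a hypothetical helper name.

```python
import math

def logit_probs(v, omega):
    """P_i = exp(omega*V_i) / sum_k exp(omega*V_k)."""
    scaled = [omega * vi for vi in v]
    m = max(scaled)  # subtract the maximum to avoid overflow at large omega
    w = [math.exp(s - m) for s in scaled]
    total = sum(w)
    return [wi / total for wi in w]

V = [1.0, 0.8, 1.6]  # illustrative deterministic utilities, K = 3
for omega in (0.001, 1.0, 10.0, 100.0):
    print(omega, [round(p, 3) for p in logit_probs(V, omega)])
```

A very small Ω gives shares close to 1/3 each. A large Ω puts almost all the probability on the alternative with the highest V, here the third.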

Page 18: An Introduction to Discrete Choice Modelling

How can the Binary Logit model be derived?

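$$P_1 = \Pr(U_1 > U_2) = \Pr(\Omega V_1 + \varepsilon_1 > \Omega V_2 + \varepsilon_2)$$

Given $\varepsilon_1 = h$, alternative 1 is chosen when $\varepsilon_2 < h + \Omega V_1 - \Omega V_2$.

Assume the ε's are independently and identically distributed with a (standard) Gumbel distribution:

Cumulative: $F(\varepsilon) = \exp(-\exp(-\varepsilon))$

Density: $f(\varepsilon) = dF(\varepsilon)/d\varepsilon = \exp(-\varepsilon)\exp(-\exp(-\varepsilon))$

Then

$$P_1 = \int_{-\infty}^{\infty} f(h)\, F(h + \Omega V_1 - \Omega V_2)\, dh$$

which on substitution gives

$$P_1 = \int_{-\infty}^{\infty} \exp(-h)\exp(-\exp(-h))\exp(-\exp(-h + \Omega V_2 - \Omega V_1))\, dh$$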

Page 19: An Introduction to Discrete Choice Modelling

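which, on combining the last two exponentials and substituting $t = \exp(-h)$ (so that $dt = -\exp(-h)\,dh$), gives:

$$P_1 = \int_0^{\infty} \exp\left(-t\left[1 + \exp(\Omega V_2 - \Omega V_1)\right]\right) dt = \frac{1}{1 + \exp(\Omega V_2 - \Omega V_1)}$$

or

$$P_1 = \frac{\exp(\Omega V_1)}{\exp(\Omega V_1) + \exp(\Omega V_2)}$$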

which is the Binary Logit model.
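The result can also be checked numerically. This Python sketch assumes independent standard Gumbel errors, as in the derivation. It integrates, over the value h of $\varepsilon_1$, the density f(h) times the probability F(h + ΩV1 − ΩV2) that $\varepsilon_2$ falls below h + ΩV1 − ΩV2. The trapezoidal rule is applied over a finite range, outside which the integrand is negligible, and the result is compared with the closed-form binary logit. The range, the step count and the function names are arbitrary choices for this sketch.

```python
import math

def gumbel_pdf(e):
    """Standard Gumbel density f(e) = exp(-e) * exp(-exp(-e))."""
    return math.exp(-e) * math.exp(-math.exp(-e))

def gumbel_cdf(e):
    """Standard Gumbel cumulative distribution F(e) = exp(-exp(-e))."""
    return math.exp(-math.exp(-e))

def p1_by_integration(dv, lo=-10.0, hi=40.0, n=20_000):
    """Trapezoidal estimate of P1 = integral of f(h) * F(h + dv) dh.

    h is the value of epsilon_1 and dv = Omega*V1 - Omega*V2; alternative 1
    is chosen when epsilon_2 < h + dv. The integrand is negligible outside
    [lo, hi] for moderate dv.
    """
    step = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        h = lo + i * step
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * gumbel_pdf(h) * gumbel_cdf(h + dv)
    return total * step

for dv in (-2.0, 0.0, 1.0, 3.0):
    closed_form = 1.0 / (1.0 + math.exp(-dv))  # 1 / (1 + exp(Omega*V2 - Omega*V1))
    print(f"dv = {dv:+.1f}: integral = {p1_by_integration(dv):.6f}, logit = {closed_form:.6f}")
```

The two columns should agree to the printed precision. If they did not, that would point to an error in the derivation or in the assumed error distribution.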

Page 20: An Introduction to Discrete Choice Modelling

Multinomial Logit Model (MNL)

• This brings us back to where we started, a three-way choice of TV channels. For more than 2 choices we use a Multinomial Logit model:

$$P_1 = \frac{\exp(\Omega V_1)}{\exp(\Omega V_1) + \exp(\Omega V_2) + \dots}$$

Page 21: An Introduction to Discrete Choice Modelling

Problem with the MNL model

• A theoretical, and sometimes important, problem with MNL is the Red Bus – Blue Bus problem, which arises from the Independence of Irrelevant Alternatives (IIA) property.

• This can be avoided by using various Nested Logits, Mixed Logit, Cascetta’s C-Logit, or Fowkes & Toner’s Flat Logit.
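The Red Bus/Blue Bus problem can be seen in a few lines of Python. In this textbook-style illustration (the utilities are hypothetical), car and red bus have equal utility, so each has a share of 0.5. Adding a blue bus identical to the red bus, MNL gives each of the three a share of 1/3, because the ratio of any two MNL probabilities does not depend on the other alternatives. Intuitively, the blue bus should mostly take passengers from the red bus and leave the car share close to 0.5.

```python
import math

def mnl(utilities):
    """MNL probabilities from a dict of deterministic utilities (Omega*V)."""
    m = max(utilities.values())
    w = {k: math.exp(u - m) for k, u in utilities.items()}
    total = sum(w.values())
    return {k: x / total for k, x in w.items()}

before = mnl({"car": 0.0, "red bus": 0.0})
after = mnl({"car": 0.0, "red bus": 0.0, "blue bus": 0.0})

print(before)  # car and red bus 0.5 each
print(after)   # car, red bus and blue bus 1/3 each
# IIA: the car/red bus ratio is 1 in both cases, so the blue bus draws
# from car and red bus in proportion to their shares.
print(before["car"] / before["red bus"], after["car"] / after["red bus"])
```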

Page 22: An Introduction to Discrete Choice Modelling

THE DETERMINISTIC PART

Here we seek to model Utility.

The current terminology we use is to regard the 3 channel groups as 3 ALTERNATIVES, each described by a set of ATTRIBUTES, each set to a particular LEVEL.

Page 23: An Introduction to Discrete Choice Modelling

Examples of ALTERNATIVES, ATTRIBUTES and ATTRIBUTE LEVELS

Our Alternatives are BBC, ITV and ELSE. Important ATTRIBUTES might be:

(i) Availability
(ii) Cost
(iii) Variety of programmes
(iv) Quality of programmes

Page 24: An Introduction to Discrete Choice Modelling

Possible attribute LEVELS for Availability might be:

a) Freeview
b) Satellite
c) High Definition
d) On Demand

Page 25: An Introduction to Discrete Choice Modelling

Possible attribute LEVELS for Variety might be:

(a) Very good choice
(b) Good choice
(c) Average
(d) Poor range of programmes
(e) Very limited range of programmes
(f) Only phone-in shows

Page 26: An Introduction to Discrete Choice Modelling

Possible attribute LEVELS for Quality might be:

(a) International top quality
(b) Not bad for a national network
(c) Has occasional good programmes
(d) Only repeats
(e) Only phone-in shows
(f) Ant ‘n’ Dec
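One common way to turn alternatives, attributes and levels into a deterministic utility is to add up a ‘part-worth’ for the level of each attribute. The Python sketch below does this for the three channel groups and then applies the logit formula. Every part-worth value, and the level assigned to each alternative, is hypothetical. Cost is omitted to keep the example short, and any scale factor is taken as already absorbed into the part-worths.

```python
import math

# Hypothetical part-worths (utility contributions) of some attribute levels
# from the slides; real values would have to be estimated.
PART_WORTHS = {
    "availability": {"Freeview": 0.0, "Satellite": -0.3},
    "variety": {"Very good choice": 0.8, "Good choice": 0.5, "Average": 0.0},
    "quality": {
        "International top quality": 1.0,
        "Not bad for a national network": 0.4,
        "Has occasional good programmes": 0.1,
    },
}

def utility(levels):
    """Deterministic utility V as the sum of the part-worths of each level."""
    return sum(PART_WORTHS[attribute][level] for attribute, level in levels.items())

# Hypothetical descriptions of the three alternatives.
alternatives = {
    "BBC": {"availability": "Freeview", "variety": "Very good choice",
            "quality": "International top quality"},
    "ITV": {"availability": "Freeview", "variety": "Good choice",
            "quality": "Not bad for a national network"},
    "ELSE": {"availability": "Satellite", "variety": "Average",
             "quality": "Has occasional good programmes"},
}

V = {name: utility(levels) for name, levels in alternatives.items()}
denominator = sum(math.exp(v) for v in V.values())
P = {name: math.exp(v) / denominator for name, v in V.items()}
for name in alternatives:
    print(f"{name}: V = {V[name]:.2f}, P = {P[name]:.2f}")
```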

Page 27: An Introduction to Discrete Choice Modelling

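Transport Applications

In Transport there are many occasions where we model Alternatives by their Generalised Cost, GC:

e.g. $GC = \alpha C + \beta T$, where C is the money cost and T the travel time

Or, more generally, with n different types of time $T_1, \dots, T_n$,

$$GC = \alpha C + \beta_1 T_1 + \beta_2 T_2 + \dots + \beta_n T_n$$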
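As a minimal sketch of the general formula, the Python code below computes GC with two types of time. The coefficient values are hypothetical. They are given negative signs so that a larger (less negative) GC means a more attractive alternative, which matches the binary logit formulas in this seminar, where $P_1$ increases with $GC_1$. The fares and in-vehicle times are those of the train/coach example in the Revealed Preference section; the walk/wait times are made up for illustration.

```python
# Hypothetical coefficients: ALPHA per pound of money cost, BETAS per hour
# of each type of time. Negative, so a larger (less negative) GC means a
# more attractive alternative.
ALPHA = -0.1
BETAS = {"in-vehicle": -0.8, "walk/wait": -1.6}

def generalised_cost(cost, times):
    """GC = ALPHA*C + sum over n of BETAS[n]*T_n, with times in hours."""
    return ALPHA * cost + sum(BETAS[kind] * hours for kind, hours in times.items())

# Fares and in-vehicle times from the train/coach example; walk/wait times
# are illustrative only.
gc_train = generalised_cost(30.0, {"in-vehicle": 2.0, "walk/wait": 0.5})
gc_coach = generalised_cost(15.0, {"in-vehicle": 4.0, "walk/wait": 0.25})
print(f"GC train = {gc_train:.2f}, GC coach = {gc_coach:.2f}")
```

With these made-up numbers the coach comes out slightly more attractive (−5.10 against −5.40). Different coefficients could easily reverse that.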

Page 28: An Introduction to Discrete Choice Modelling

Excerpt from A Gray (1977)

“For the UK, the generalised cost concept was perhaps invented by Quarmby in the famous 1967 article about modal choice, based on some earlier work by Warner (1962) in the United States. In Quarmby’s article the concept was described as ‘disutility’ and referred to a linear combination of the time and money costs of a journey”.

Page 29: An Introduction to Discrete Choice Modelling

VALUE OF TIME

In passing we note that the RATIO OF the coefficient of the nth type of time ($T_n$) TO the coefficient of cost is called the value of the nth type of time, i.e.

$$\mathrm{VOT}(n) = \beta_n / \alpha$$
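Here is a small Python sketch of the ratio, using hypothetical estimated coefficients. Cost is measured in pence and times in minutes, so the raw ratio is in pence per minute and is then converted to pounds per hour. The values are illustrative only.

```python
# Hypothetical estimated coefficients: alpha on cost measured in pence,
# betas on times measured in minutes.
alpha = -0.05
betas = {"in-vehicle time": -0.10, "walk time": -0.20, "wait time": -0.25}

vot = {}
for kind, beta_n in betas.items():
    pence_per_minute = beta_n / alpha           # VOT(n) = beta_n / alpha
    vot[kind] = pence_per_minute * 60 / 100     # convert to pounds per hour
    print(f"{kind}: {pence_per_minute:.1f} p/min = £{vot[kind]:.2f} per hour")
```

This gives £1.20, £2.40 and £3.00 per hour respectively. The signs cancel in the ratio, and so does any common scale factor.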

This has kept some of us employed for a good part of our working lives.

Page 30: An Introduction to Discrete Choice Modelling

WHAT IS THE VALUE OF TIME?

It is just the exchange rate (for a person, a sample, or a population) between money and spending extra time in an activity. It has 2 parts.

There is always something we can do with time, so the Resource VOT is always positive.

Usually more important is the (dis)utility of the activity concerned. For most activities, reducing the time spent on them has a negative utility, but in transport it is mostly positive.

Page 31: An Introduction to Discrete Choice Modelling

Binary Choice

Let us estimate a model for 2 Alternatives: 1 & 2 (just 2, so we say “Binary”)

Suppose the Alternatives only differ in terms of measured Generalised Cost.

We need to observe P1, the proportion choosing Alternative 1 for various levels of difference in GC between the Alternatives.

Page 32: An Introduction to Discrete Choice Modelling

The Binary Logit Model

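• A linear expression for $P_1$ is not satisfactory ($P_1$ has to lie between zero and one, but a linear expression can take any value).

• A linear expression for $\ln(P_1/(1-P_1))$ seems much more satisfactory, since the log-odds can also take any value.

Put this “logit” (or ‘log-odds’) equal to the difference in Generalised Cost, $GC_1 - GC_2$. For a higher cost or time to make an alternative less likely to be chosen, the coefficients α and β in GC must then be negative.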

Page 33: An Introduction to Discrete Choice Modelling

Equation for the Binary Logit Model

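$$\ln\left(\frac{P_1}{1-P_1}\right) = GC_1 - GC_2$$

$$\frac{P_1}{1-P_1} = \exp(GC_1 - GC_2)$$

$$P_1 = \exp(GC_1 - GC_2) - P_1 \exp(GC_1 - GC_2)$$

$$P_1\left(1 + \exp(GC_1 - GC_2)\right) = \exp(GC_1 - GC_2)$$

$$P_1 = \frac{\exp(GC_1 - GC_2)}{1 + \exp(GC_1 - GC_2)}$$

and, multiplying top and bottom by $\exp(GC_2)$,

$$P_1 = \frac{\exp(GC_1)}{\exp(GC_1) + \exp(GC_2)}$$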
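The slides' approach of observing $P_1$ at various GC differences, and making the log-odds linear, suggests a simple way to estimate a coefficient from grouped data. Compute the observed log-odds for each group, then fit a straight line through the origin by least squares. The Python sketch below does this on hypothetical, noise-free data in which the alternatives differ only in cost. The data are generated from an assumed cost coefficient α = −0.25, which the fit recovers. This least-squares-on-log-odds method is a swapped-in illustration, not necessarily how the author estimates models. Maximum likelihood is the usual approach, and it also copes with groups whose observed share is exactly 0 or 1, where the log-odds is undefined.

```python
import math

def log_odds(p):
    """Logit (log-odds) ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def fit_through_origin(xs, ys):
    """Least squares slope b for y = b * x (no intercept, as on the slide)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Hypothetical grouped data: cost difference C1 - C2 in pounds (times equal)
# and the share choosing alternative 1, generated from alpha = -0.25.
cost_diff = [-8.0, -4.0, 0.0, 4.0, 8.0]
observed_share = [1.0 / (1.0 + math.exp(0.25 * d)) for d in cost_diff]

alpha_hat = fit_through_origin(cost_diff, [log_odds(p) for p in observed_share])
print(f"estimated cost coefficient: {alpha_hat:.3f}")  # -0.250
```

With real survey shares the points would scatter around the line, and groups with few observations would ideally get less weight.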

Page 34: An Introduction to Discrete Choice Modelling

Excerpt from D McFadden (2001)

“In 1965, a graduate student asked me how she might analyze her thesis data in freeway routing choices by the California Department of Highways. This led me to consider the problem of economic choice among discrete alternatives. The problem was to devise a computationally tractable model of economic decision making that yielded choice probabilities for each alternative in a finite feasible set. It was natural to think of highway department decision-makers as maximizing preferences that varied from one bureaucrat to another.

Page 35: An Introduction to Discrete Choice Modelling

“I drew on a classical psychological study of perception, Thurstone’s Law of comparative Judgment. In this theory, the perceived level of a stimulus equals its objective level plus a random error. The probability that one object is judged higher than a second is the probability that this alternative has the higher perceived stimulus. When the perceived stimuli are interpreted as levels of satisfaction, or utility, this can be interpreted as a model for economic choice in which utility levels are random, and observed choices pick out the alternative that has the highest realized utility level. This connection was made in the 1950’s by the economist Jacob Marschak, who called this the random utility maximization hypothesis, abbreviated to RUM.

Page 36: An Introduction to Discrete Choice Modelling

“Another psychologist I relied on was Duncan Luce, who in 1959 introduced an axiom that simplified experimental collection of psychological choice data by allowing choice probabilities for many alternatives to be inferred from choices between pairs of alternatives. Marschak showed that choice probabilities satisfying Luce’s axiom were consistent with the RUM hypothesis. I proposed an econometric version of the Luce model in which the utilities of alternatives depended on their measured attributes, such as construction cost, route length, and areas of parklands and open space taken. I called this a conditional or multinomial logit model, and developed a computer program to estimate it.”

Page 37: An Introduction to Discrete Choice Modelling

DALY-ZACHARY-WILLIAMS THEOREM

Andrew Daly & Stan Zachary (1976) and Huw Williams (1977) added significantly to Discrete Choice theory, in particular by providing a set of conditions that Generalised Extreme Value models need to meet in order to be consistent with random utility maximisation.

Williams also related the concept of Consumer Surplus to Discrete Choice Model parameters.

Page 38: An Introduction to Discrete Choice Modelling

Revealed Preference Analysis: Key References

1. P Samuelson (1938). Economica. Observing a consumer to have chosen one alternative and, by so doing, have rejected a second alternative.

2. K Lancaster (1966). Journal of Political Economy. Utility for a commodity determined by the characteristics of that commodity. Then a small step to modelling utility as a sum of ‘part-worths’ of these characteristics individually.

3. D McFadden (1974). In: Zarembka (ed), Frontiers in Econometrics. ‘Conditional Logit Analysis of Qualitative Choice Behaviour’

Page 39: An Introduction to Discrete Choice Modelling

Revealed Preference Data

TRAVELLERS ARE OBSERVED TO CHOOSE AN OPTION (HAVING CERTAIN CHARACTERISTICS) IN PREFERENCE TO ANOTHER OPTION (HAVING OTHER CHARACTERISTICS)

e.g. Traveller chooses train with cost £30 and travel time 2 hours in preference to coach costing £15 and taking 4 hours.

EITHER Requires ‘Engineering’ data on costs, times, etc. (possibly from fare manuals, timetables or modelled)

OR Requires traveller to report the costs and times of both the chosen and rejected modes.
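A single observed choice like the train/coach example reveals only a bound on the traveller's value of time. The traveller paid £15 more to save 2 hours, so if only cost and time matter, the trade-off is linear and there is no random error, their VOT is at least £7.50 per hour. The Python sketch below encodes that reasoning. The function name and the returned labels are hypothetical. Because real choices also contain the error term, many such observations are needed in practice.

```python
def implied_vot_bound(chosen, rejected):
    """Bound on value of time revealed by choosing `chosen` over `rejected`.

    Each alternative is (cost_in_pounds, time_in_hours). Assumes only cost and
    time matter, with a linear trade-off and no random error. Returns
    ("at least", v) if the dearer, faster option was chosen, ("at most", v)
    if the cheaper, slower option was chosen, and ("no trade-off", None)
    otherwise (one option is no worse on both attributes).
    """
    extra_cost = chosen[0] - rejected[0]
    time_saved = rejected[1] - chosen[1]
    if extra_cost > 0 and time_saved > 0:
        return ("at least", extra_cost / time_saved)
    if extra_cost < 0 and time_saved < 0:
        return ("at most", extra_cost / time_saved)
    return ("no trade-off", None)

train = (30.0, 2.0)
coach = (15.0, 4.0)
print(implied_vot_bound(train, coach))  # ('at least', 7.5)
```

The "no trade-off" case corresponds to a dominated choice, which carries no information about the value of time.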

Page 40: An Introduction to Discrete Choice Modelling

Problems with Revealed Preference Data

– Self-justification bias in reported data
– Many choices ‘dominated’
– Cost and time differences between modes may be correlated
– Habit/inertia effects
– Respondent may not be able to give satisfactory data about the alternative mode

Generally need very large samples

Page 41: An Introduction to Discrete Choice Modelling

Transfer Price Data

TRAVELLERS ARE ASKED DIRECTLY FOR A MEASURE OF UTILITY DIFFERENCE BETWEEN TWO TRAVEL ALTERNATIVES

by questions such as:

‘How much would the cost of your chosen alternative have to rise in order for you to switch to your rejected alternative?’

Page 42: An Introduction to Discrete Choice Modelling

Problems with Transfer Price Data

– Policy response bias
– Unconstrained response bias
– Self-justification bias
– Requires data about the rejected alternative, which may only be known very inexactly
– Respondent may not understand or be able to relate to the question

Page 43: An Introduction to Discrete Choice Modelling

Stated Preference Data

TRAVELLERS ARE PRESENTED WITH A SET OF HYPOTHETICAL TRAVEL CHOICES, EACH WITH ITS OWN CHARACTERISTICS (e.g. Cost, Travel time, etc), AND ASKED TO

– MAKE A CHOICE
– RANK ALTERNATIVES
– RATE ALTERNATIVES

THE CRUCIAL REQUIREMENT IS THAT THE ABOVE INCORPORATE IMPLICIT TRADE-OFFS

Page 44: An Introduction to Discrete Choice Modelling

Advantages of Stated Preference

– Can represent situations that do not yet exist
– No problem of reporting error/bias
– Can ‘design in’ interesting trade-offs
– Can ensure low correlation between characteristic differences
– Can ask ‘many’ choices of each individual
– Avoids requirement for ‘confidential’ information

Page 45: An Introduction to Discrete Choice Modelling

Problems with Stated Preference Data

– Response not rooted in an actual choice
– Questions may be difficult to understand
– Respondents may refuse to ‘play games’
– Relatively unimportant characteristics may be ignored
– Design is (very?) difficult
– Scale factor problem