choice models - personal.psu.edusegmentation in choice models using latent class analysis basic...


© Arvind Rangaswamy (2017) All Rights Reserved

January 31, 2017

Choice Models

Arvind Rangaswamy [email protected]

www.arvind.info



Topics for Today

Discussion of Guadagni and Little (1983) paper

Discussion of the logit choice model, with emphasis on estimation

Brief discussion of other papers



The Logit Model (Popularized in Marketing by Guadagni and Little 1983)

The objective of the model is to predict the probabilities that an individual will choose each of several alternatives. Instead of having two stages, the first to model utility, and the second to translate utility into choice probabilities, we directly model choice probabilities in one stage.

The probabilities lie between 0 and 1, and sum to 1 across the choice alternatives.

The Logit model is consistent with the proposition that customers pick the choice alternative that offers them the highest utility on a purchase occasion, but the utility has a random component that varies from one purchase occasion to the next.

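This model is subject to the independence of irrelevant alternatives (IIA) property: the ratio of the choice probabilities of any two alternatives does not depend on the other alternatives in the choice set.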



The (Conditional Multinomial) Logit Model

On each choice occasion, the (unobserved) utility that customer i gets from alternative k is given by:

$$U_k^i = V_k^i + \varepsilon_k^i \qquad \ldots (1)$$

where $\varepsilon_k^i$ is the random component of the customer’s utility. One option is to assume that $\varepsilon_k^i$ is distributed independent Gumbel (i.e., type 1 extreme value).

Utility ($U_k^i$) is the sum of an observable term ($V_k^i = \sum_j \beta_{jk} X_{jk}^i$) and an unobservable term ($\varepsilon_k^i$), making $U_k^i$ unobservable or latent.

$V_k^i$ is the intrinsic value or “attractiveness” (view it as inferred preference or utility value) of alternative k to customer i. $X_{jk}^i$ is the observed or measured value of variable j (a characteristic of customer i, such as age, or of choice alternative k) when customer i makes a choice/purchase.

$\beta_{jk}$ is the importance weight associated with variable j for alternative k. If the effect of a variable is common to all alternatives (e.g., age of customer), then we can use the notation $\beta_j$.
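The link between the random utility in (1) and logit choice probabilities can be checked by simulation. The sketch below is only illustrative: the utility values, the number of draws, and the function names are assumptions, not taken from the slides. It draws independent Gumbel errors, picks the alternative with the highest $U_k^i$, and compares the resulting choice shares with the closed-form logit probabilities $e^{V_k^i} / \sum_m e^{V_m^i}$.

```python
import numpy as np

def logit_probs(v):
    """Closed-form logit probabilities for a vector of systematic utilities V."""
    e = np.exp(v - np.max(v))  # subtracting the max avoids overflow
    return e / e.sum()

def simulate_choice_shares(v, n_draws=200_000, seed=0):
    """Share of draws in which each alternative has the highest U = V + eps."""
    rng = np.random.default_rng(seed)
    # Independent standard Gumbel (type 1 extreme value) errors
    eps = rng.gumbel(loc=0.0, scale=1.0, size=(n_draws, len(v)))
    winners = np.argmax(v + eps, axis=1)
    return np.bincount(winners, minlength=len(v)) / n_draws

v = np.array([1.0, 0.5, -0.2])  # hypothetical V values for three alternatives
shares = simulate_choice_shares(v)
probs = logit_probs(v)
print(np.round(shares, 3), np.round(probs, 3))
```

With enough draws the simulated shares should sit close to the closed-form probabilities, which is what makes the Gumbel assumption convenient.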



Potential Probability Models for $\varepsilon_k^i$

Normal

Extreme value

Gompertz



Types of Logit Models

Binary Logit model (Logistic model)

Ordinal Logit model (Choices ordered)

McFadden Conditional Logit model (Choices not ordered, differences in characteristics of alternatives influence choices)

Nested Logit model

Mixed Logit model (with random coefficients)



Mathematical Specification of the Conditional Logit Model

Customer i chooses the product which offers the highest utility, i.e., the probability of choosing alternative k is:

$$P_k^i = P(U_k^i \ge U_m^i \text{ for all } m \text{ in the choice set}) \qquad \ldots (2)$$

Then if $\varepsilon_k^i$ are distributed extreme value, individual i’s probability of choosing brand 1 or choice alternative 1 ($P_1^i$) is given by:

$$P_1^i = \frac{e^{V_1^i}}{\sum_{k \in C} e^{V_k^i}} \qquad \ldots (3)$$

where C is the choice set. Similar equations can be specified for the probabilities that customer i will choose other alternatives. That is, the logit model is a sequence of equations, not just one equation.

In the “Aggregate Logit model,” $\beta_j$ is the same for all individuals.
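The IIA property mentioned earlier follows directly from the form of (3): the ratio of two choice probabilities depends only on the two utilities involved. A minimal sketch, with made-up utility values and a hypothetical helper `logit_probs`, compares the ratio $P_1^i / P_2^i$ before and after an alternative is dropped from the choice set.

```python
import numpy as np

def logit_probs(v):
    """Logit probabilities as in equation (3) for systematic utilities v."""
    e = np.exp(v - np.max(v))
    return e / e.sum()

v_full = np.array([1.0, 0.5, -0.2, 0.3])  # hypothetical V for four alternatives
p_full = logit_probs(v_full)
p_reduced = logit_probs(v_full[:3])       # alternative 4 removed from the choice set

# Ratio of the probabilities of alternatives 1 and 2 in each choice set
ratio_full = p_full[0] / p_full[1]
ratio_reduced = p_reduced[0] / p_reduced[1]
print(ratio_full, ratio_reduced)
```

Both ratios equal $e^{V_1^i - V_2^i}$, so removing (or adding) a third alternative draws share proportionally from the remaining ones.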



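Example with Four Alternatives and One Independent Variable

We can think of the Logit model for this application as generating one coefficient estimate ($\beta_1$), three alternative-specific constants ($\alpha_k$’s) and four equations (Note $\alpha_1$ is set to 0. Why?):

$$P_1^i = Prob(Y_1^i = 1) = \frac{e^{\beta_1 x_1^i}}{e^{\beta_1 x_1^i} + e^{\alpha_2 + \beta_1 x_2^i} + e^{\alpha_3 + \beta_1 x_3^i} + e^{\alpha_4 + \beta_1 x_4^i}}$$

$$P_2^i = Prob(Y_2^i = 1) = \frac{e^{\alpha_2 + \beta_1 x_2^i}}{e^{\beta_1 x_1^i} + e^{\alpha_2 + \beta_1 x_2^i} + e^{\alpha_3 + \beta_1 x_3^i} + e^{\alpha_4 + \beta_1 x_4^i}}$$

$$P_3^i = Prob(Y_3^i = 1) = \frac{e^{\alpha_3 + \beta_1 x_3^i}}{e^{\beta_1 x_1^i} + e^{\alpha_2 + \beta_1 x_2^i} + e^{\alpha_3 + \beta_1 x_3^i} + e^{\alpha_4 + \beta_1 x_4^i}}$$

$$P_4^i = Prob(Y_4^i = 1) = \frac{e^{\alpha_4 + \beta_1 x_4^i}}{e^{\beta_1 x_1^i} + e^{\alpha_2 + \beta_1 x_2^i} + e^{\alpha_3 + \beta_1 x_3^i} + e^{\alpha_4 + \beta_1 x_4^i}}$$

where

$$Y_k^i = \begin{cases} 1 & \text{if consumer } i \text{ chooses alternative } k \\ 0 & \text{if consumer } i \text{ does not choose alternative } k \end{cases}$$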
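One way to explore the "Why?" about $\alpha_1 = 0$ is numerical. The sketch below uses invented values for $x_k^i$, $\beta_1$ and the constants (none come from the slides) and shows that adding the same amount to every alternative-specific constant leaves all four probabilities unchanged, so only differences between constants can be identified from choice data.

```python
import numpy as np

def choice_probs(alpha, beta1, x):
    """Four-alternative logit with constants alpha_k and one common slope beta1."""
    v = alpha + beta1 * x
    e = np.exp(v - v.max())
    return e / e.sum()

x = np.array([2.0, 1.5, 3.0, 2.5])       # hypothetical x_k^i (e.g., prices)
beta1 = -0.8                             # hypothetical coefficient
alpha = np.array([0.0, 0.4, -0.3, 0.1])  # alpha_1 fixed at 0

p = choice_probs(alpha, beta1, x)
# Shift every constant by the same amount: the probabilities do not move
p_shifted = choice_probs(alpha + 5.0, beta1, x)
print(np.round(p, 4), np.round(p_shifted, 4))
```

Fixing one constant (here $\alpha_1$) at zero simply picks one representative from each family of equivalent parameter vectors.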



An Important Property of the Logit Model

Marginal impact of variable j (e.g., price) associated with alternative k:

$$\frac{\partial P_k^i}{\partial X_{jk}^i} = \beta_{jk} P_k^i (1 - P_k^i)$$

[Figure: marginal impact $\partial P_k^i / \partial X_{jk}^i$ (vertical axis, Low to High) plotted against the probability of individual i choosing alternative k, $P_k^i$ (horizontal axis, 0.0 to 1.0).]

The marginal impact is highest when the customer is “sitting on the fence,” i.e., when $P_k^i$ = 0.5 (for linear utility function).

Question: Is this a good property to have?
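The marginal-impact formula can be verified with a finite-difference check. This is a sketch under simple assumptions: a single variable with a common coefficient, three alternatives, and made-up numbers. It compares the numerical derivative of $P_k^i$ with $\beta P_k^i (1 - P_k^i)$, and confirms on a grid that $P(1 - P)$ peaks at $P = 0.5$.

```python
import numpy as np

def logit_probs(v):
    e = np.exp(v - np.max(v))
    return e / e.sum()

beta = 0.7                      # hypothetical coefficient
x = np.array([1.0, 0.2, -0.5])  # hypothetical X value for each alternative
k = 0                           # alternative whose own variable is perturbed

def p_k(x_k_value):
    """Probability of alternative k when its own X is set to x_k_value."""
    x_new = x.copy()
    x_new[k] = x_k_value
    return logit_probs(beta * x_new)[k]

h = 1e-6
numeric = (p_k(x[k] + h) - p_k(x[k] - h)) / (2 * h)  # central difference
p = logit_probs(beta * x)[k]
analytic = beta * p * (1 - p)

# P(1 - P) over a grid of probabilities: the maximum is at P = 0.5
grid = np.linspace(0.01, 0.99, 99)
p_at_max = grid[np.argmax(grid * (1 - grid))]
print(numeric, analytic, p_at_max)
```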



Another Implication: Cross Elasticity

Cross elasticity is the percent change in the probability of choice alternative k when an observed variable j relating to another alternative h changes:

$$\frac{\partial P_k^i}{\partial X_{jh}^i} \cdot \frac{X_{jh}^i}{P_k^i} = -\beta_{jh} X_{jh}^i P_h^i \qquad \ldots (4)$$

The value on the right is the same for any alternative k.

Question: Is this a good property to have?
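Equation (4) can be checked numerically. In the sketch below, the coefficient, the variable values, and the choice of which alternative is perturbed are all illustrative assumptions. It perturbs the variable of alternative h, computes the elasticity of every other alternative's probability, and compares each with $-\beta X_{jh}^i P_h^i$.

```python
import numpy as np

def logit_probs(v):
    e = np.exp(v - np.max(v))
    return e / e.sum()

beta = -1.2                         # hypothetical (price-like) coefficient
x = np.array([2.0, 1.5, 3.0, 2.5])  # hypothetical X for four alternatives
h_alt = 1                           # index of the alternative whose X changes

p0 = logit_probs(beta * x)

step = 1e-6
x_up, x_dn = x.copy(), x.copy()
x_up[h_alt] += step
x_dn[h_alt] -= step
dp = (logit_probs(beta * x_up) - logit_probs(beta * x_dn)) / (2 * step)

elasticity = dp * x[h_alt] / p0        # elasticity of every P_k w.r.t. X_h
cross = np.delete(elasticity, h_alt)   # keep only k != h
analytic = -beta * x[h_alt] * p0[h_alt]
print(np.round(cross, 6), round(analytic, 6))
```

All entries of `cross` coincide, which is the uniform cross-elasticity implied by IIA.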



Maximum Likelihood Estimation (MLE) of Logit Parameters

Let

$$Y_k^i = \begin{cases} 1 & \text{if consumer } i \text{ chooses alternative } k \\ 0 & \text{if consumer } i \text{ does not choose alternative } k \end{cases}$$

Then $P(Y_k^i = 1)$ is the probability that $U_k^i \ge U_m^i$ for all $m \ne k$ in the choice set. That is, $P(Y_k^i = 1) = P(U_k^i \ge U_m^i)$.

Now consider the likelihood of any random sample of N observations (individuals). This likelihood is the product of the likelihoods of the individual observations (Why?):

$$L(\beta_1, \beta_2, \ldots, \beta_J) = \prod_{i=1}^{N} \prod_{k \in C} P(Y_k^i = 1)^{Y_k^i}$$

C is the choice set. For notational simplicity, I have dropped subscript k from $\beta$.

Substitute for $P(Y_k^i = 1)$ from (3).



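Estimating $\beta$

$$L = \prod_{i=1}^{N} \prod_{k \in C} \left[ \frac{e^{\sum_j \beta_j X_{jk}^i}}{\sum_{m \in C} e^{\sum_j \beta_j X_{jm}^i}} \right]^{Y_k^i}$$

$$Ln(L) = \sum_{i=1}^{N} \sum_{k \in C} Y_k^i \left[ \sum_j \beta_j X_{jk}^i - Ln\left( \sum_{m \in C} e^{\sum_j \beta_j X_{jm}^i} \right) \right]$$

$$\frac{\partial Ln(L)}{\partial \beta_j} = \sum_{i=1}^{N} \sum_{k \in C} (Y_k^i - P_k^i) X_{jk}^i, \quad \text{for } j = 1, 2, \ldots, J$$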
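The log-likelihood and its gradient translate almost line by line into code. The following is a minimal sketch, not the estimation routine used in any of the papers discussed: the data are simulated, the sample size and true coefficients are invented, and Newton-Raphson is just one possible optimizer. `X` has shape (N, K, J) for N individuals, K alternatives and J variables, and `Y` is the matrix of 0/1 choice indicators $Y_k^i$.

```python
import numpy as np

def probs(beta, X):
    """Logit probabilities P_k^i for every individual; result has shape (N, K)."""
    v = X @ beta
    v = v - v.max(axis=1, keepdims=True)
    e = np.exp(v)
    return e / e.sum(axis=1, keepdims=True)

def log_likelihood(beta, X, Y):
    """Ln(L) = sum_i sum_k Y_k^i * Ln(P_k^i)."""
    return float(np.sum(Y * np.log(probs(beta, X))))

def gradient(beta, X, Y):
    """dLn(L)/dbeta_j = sum_i sum_k (Y_k^i - P_k^i) X_jk^i."""
    return np.einsum("nk,nkj->j", Y - probs(beta, X), X)

def fit_newton(X, Y, n_iter=25):
    """Newton-Raphson on the log-likelihood, starting from beta = 0."""
    beta = np.zeros(X.shape[2])
    for _ in range(n_iter):
        P = probs(beta, X)
        xbar = np.einsum("nk,nkj->nj", P, X)   # probability-weighted mean of X
        dev = X - xbar[:, None, :]
        hess = -np.einsum("nk,nkj,nkl->jl", P, dev, dev)
        beta = beta - np.linalg.solve(hess, gradient(beta, X, Y))
    return beta

# Simulated data (all values below are illustrative assumptions)
rng = np.random.default_rng(0)
N, K, J = 2000, 4, 2
X = rng.normal(size=(N, K, J))
beta_true = np.array([1.0, -0.5])
choices = np.array([rng.choice(K, p=p) for p in probs(beta_true, X)])
Y = np.eye(K)[choices]                         # one-hot choice indicators

beta_hat = fit_newton(X, Y)
print(np.round(beta_hat, 3), np.round(gradient(beta_hat, X, Y), 8))
```

At the estimate the gradient is numerically zero, which is the set of J first-order conditions in J unknowns.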



Estimating $\beta$’s

MLE maximizes the likelihood of obtaining the realized sample, as a function of model parameters.

To maximize Ln(L), set $\partial Ln(L) / \partial \beta_j = 0$ for j = 1, 2, …, J. This gives J equations in J unknowns. If a solution exists, it can be shown to be unique under fairly general conditions.

The MLE estimator of $\beta$ is consistent, asymptotically Normal, and asymptotically efficient.

$\beta$’s can be interpreted akin to regression coefficients.



What’s the Big Deal? (McFadden won a Nobel prize for this line of work)

This is a theoretically defensible model to characterize discrete choices based on a random utility framework (people’s preferences are not directly observable – at least to the modeler – and can vary from situation to situation).

Developed MLE (Maximum Likelihood Estimation) for estimating the coefficients of the model. MLE maximizes the probabilities of the actual choices occurring given the estimated parameters.

Results in well-established statistical tests for determining the adequacy of the estimated model.

Has seen an exploding number of applications in different fields.

Discussion of Guadagni and Little (1983)


Overview of Data and Model

Ground coffee purchase and store contexts from store and panel data. Four Kansas City supermarkets for 78 weeks.

5 brands × 2 sizes (small and large); eliminating 2 brand-sizes with very small market share leaves 8 brand-sizes in total.

Variables – those with brand-specific effects (brand constants) and those with effects common to all alternatives.

Accounts for customer heterogeneity using loyalty variables.

[Table: estimated models, starting from a null model, with a fit measure for each model. Labels recovered from the table: $X_{0k}^i$ ($X_{01}^i$ through $X_{07}^i$); effects on utility specific to brands; effects on utility common to all brands.]



A Closer Look at the Loyalty Variable (Accounts for Heterogeneity)

$$X_{6k}^i(n) = \alpha_b X_{6k}^i(n-1) + (1 - \alpha_b) \begin{cases} 1 & \text{if consumer } i \text{ bought alternative } k \text{ at purchase occasion } (n-1) \\ 0 & \text{otherwise} \end{cases}$$

$\alpha_b$ is a “carryover” constant, expected to be about 0.75.

$X_{6k}^i$ is normalized to sum to 1 across brands.

A considerable amount of later research has scrutinized the loyalty variable, and proposed alternatives. In an extensive simulation conducted by Abramson et al. (JMR 2000), a model with the Guadagni and Little loyalty specification, which allows for choice set effects (i.e., consideration sets), performed the best. It is possible that the loyalty effects will be overstated when consumer heterogeneity is ignored, although simulations conducted by Abramson et al. show that underspecified (discrete) heterogeneity induces bias primarily in preference coefficients (the alternative-specific constants in the model).
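The loyalty recursion is an exponentially weighted average of past purchases, which is easy to trace by hand or in code. The sketch below is illustrative only: the purchase history is invented, and starting every brand at an equal loyalty of 1/(number of brands) is an assumption made here for simplicity, not necessarily the initialization used by Guadagni and Little.

```python
import numpy as np

def loyalty_path(purchases, n_brands, alpha_b=0.75):
    """Loyalty values entering each purchase occasion.

    purchases: sequence of brand indices bought on successive occasions.
    Row n of the result is the loyalty vector after the first n purchases.
    """
    # Assumed starting point: equal loyalty to every brand (sums to 1)
    loyalty = np.full(n_brands, 1.0 / n_brands)
    path = [loyalty.copy()]
    for brand in purchases:
        bought = np.zeros(n_brands)
        bought[brand] = 1.0  # indicator of the brand bought on this occasion
        loyalty = alpha_b * loyalty + (1.0 - alpha_b) * bought
        path.append(loyalty.copy())
    return np.array(path)

# Hypothetical history: brand 0, brand 0, brand 2, brand 0
path = loyalty_path([0, 0, 2, 0], n_brands=3)
print(np.round(path, 4))
```

Because the starting vector sums to 1 and each update is a convex combination with a one-hot vector, the loyalty values keep summing to 1 across brands on every occasion.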



Predictive Validation of Models



Three Short-term Market Response Simulations



Segmentation in Choice Models Using Latent Class Analysis

Basic Idea: The population of customers consists of several segments, and the values of the variables of interest (e.g., Gender, Past purchases) are imperfect indicators of the segment to which a customer belongs.

Operationally, this means that the weights ($\beta_j$’s) in the Logit model differ across segments, but the segments are unknown (latent) and have to be inferred from the data.

$$(P_k^i \mid i \in \text{segment } s) = \frac{e^{\sum_j \beta_{js} X_{jk}^i}}{\sum_{m} e^{\sum_j \beta_{js} X_{jm}^i}}$$
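A small numerical sketch may help make the latent class idea concrete. Everything specific here is an assumption for illustration: two segments, three alternatives, two variables, the coefficient values, and the segment shares (written `sizes`; the slides do not give a symbol for them). The code computes the segment-conditional probabilities from the formula above, mixes them by segment share to get the unconditional choice probabilities, and then applies Bayes' rule to get the probability that a customer belongs to each segment given an observed choice.

```python
import numpy as np

def segment_probs(betas, X):
    """Choice probabilities conditional on segment; result has shape (S, K).

    betas: (S, J) segment-specific weights, X: (K, J) variables per alternative.
    """
    v = betas @ X.T
    v = v - v.max(axis=1, keepdims=True)
    e = np.exp(v)
    return e / e.sum(axis=1, keepdims=True)

X = np.array([[1.0, 2.0], [0.5, 1.0], [0.0, 3.0]])  # hypothetical, K = 3, J = 2
betas = np.array([[1.5, -0.2], [-0.5, 0.8]])        # hypothetical, S = 2
sizes = np.array([0.6, 0.4])                        # assumed segment shares

p_given_s = segment_probs(betas, X)
p_marginal = sizes @ p_given_s       # unconditional probabilities, shape (K,)

chosen = 0                           # suppose the customer chose alternative 1
posterior = sizes * p_given_s[:, chosen] / p_marginal[chosen]
print(np.round(p_marginal, 4), np.round(posterior, 4))
```

The posterior step is what "inferring the segment from the data" amounts to for a single observed choice; in estimation the segment weights and shares are fitted jointly.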



Other Ways to Model Heterogeneity in Logit Models

Allow for the Logit parameters to be distributed over the population according to a known (i.e., specifiable, but with unknown parameters) distribution, and estimate the parameters of the distribution through MLE.

Incorporate observable characteristics of individuals, i.e., demographics, in the specification of the preference function ($V_k^i$) -- does not work too well.

Hierarchical Bayes estimation (A topic that will require a separate class session).
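The first approach listed above, letting the Logit parameters follow a distribution over the population, generally has no closed-form choice probability, so the probability is usually approximated by averaging over random draws. The sketch below assumes independent normal coefficients with made-up means and standard deviations; the distribution, the values, and the function name are illustrative assumptions rather than a specification from the slides.

```python
import numpy as np

def simulated_mixed_logit_probs(X, mean, sd, n_draws=5000, seed=0):
    """Average logit probabilities over draws of beta ~ Normal(mean, sd^2).

    X: (K, J) variables per alternative; mean, sd: (J,) arrays.
    """
    rng = np.random.default_rng(seed)
    betas = mean + sd * rng.standard_normal((n_draws, len(mean)))  # (R, J)
    v = betas @ X.T                                                # (R, K)
    v = v - v.max(axis=1, keepdims=True)
    e = np.exp(v)
    return (e / e.sum(axis=1, keepdims=True)).mean(axis=0)

X = np.array([[1.0, 2.0], [0.5, 1.0], [0.0, 3.0]])  # hypothetical data
mean = np.array([1.0, -0.5])                        # assumed population means
sd = np.array([0.8, 0.3])                           # assumed population spreads

p_mixed = simulated_mixed_logit_probs(X, mean, sd)
p_plain = simulated_mixed_logit_probs(X, mean, np.zeros(2))  # no heterogeneity
print(np.round(p_mixed, 4), np.round(p_plain, 4))
```

With zero spread the simulated probabilities collapse to the ordinary logit probabilities at the mean coefficients. Plugging simulated probabilities of this kind into the likelihood is the idea behind simulated maximum likelihood.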



Some Current Topics in This Area

Estimating differentiated product demand systems with aggregate data (e.g., BLP model).

Dynamic choice models that account for time-based preferences.

Simulated maximum likelihood and Bayesian estimation.



Modeling Consideration

Roberts and Lattin (1992): Two-stage model: (1) Consideration, and (2) Choice. Cost is used as a threshold for being in or out of consideration.

Wu and Rangaswamy (2003): Uses two-stage model, but based on “fuzziness” in which all alternatives are considered to some degree. Incorporates decision process in understanding consideration.



Current Work on Consideration (with Daniel Ringel, Bernd Skiera, and Yifan Zhang)

Exploring factors that expand or maintain consideration sets during decision process.

Detailed online data on consumer browsing behaviors.
