Predicting Preference Reversals via Gaussian Process Uncertainty Aversion

Rikiya Takahashi1 Tetsuro Morimura2

1SmartNews, [email protected]

2IBM Research - [email protected]

May 10, 2015

AISTATS 2015 Predicting Preference Reversals via Gaussian Process Uncertainty Aversion

Discrete Choice Modelling

Goal: predict prob. of choosing an option from a choice set.

Why solve this problem?

For business: brand positioning among competitors

For business: sales promotion (yet involving some abuse)

To deeply understand how humans make decisions

Random Utility Theory

Each human is a maximizer of random utility.

$$i\text{'s choice from } S_i = \arg\max_{j \in S_i} \Big[ \underbrace{f_i(v_j)}_{\text{mean utility}} + \underbrace{\varepsilon_{ij}}_{\text{random noise}} \Big]$$

$S_i$: choice set for $i$, $v_j$: vector of $j$'s attributes, $f_i$: $i$'s mean utility function

Assuming independence among every option's attractiveness

For both mean and noise: (e.g., logit (McFadden, 1980))

For only mean: (e.g., nested logit (Williams, 1977))
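A minimal sketch of the logit case, assuming i.i.d. standard Gumbel noise on every option: simulating the arg max of (mean utility + noise) should reproduce the closed-form multinomial logit shares. The mean utilities in `MEAN_UTILS` and all function names are hypothetical and chosen only for illustration.

```python
import math
import random


def logit_probs(mean_utils):
    """Closed-form multinomial logit probabilities (softmax of mean utilities)."""
    top = max(mean_utils)
    weights = [math.exp(u - top) for u in mean_utils]
    total = sum(weights)
    return [w / total for w in weights]


def sample_gumbel(rng):
    """Draw standard Gumbel noise as -log(E) with E ~ Exponential(1)."""
    e = rng.expovariate(1.0)
    while e == 0.0:  # guard against log(0)
        e = rng.expovariate(1.0)
    return -math.log(e)


def simulate_choice_shares(mean_utils, n_draws=50000, seed=0):
    """Empirical share of each option when choosing arg max of utility + noise."""
    rng = random.Random(seed)
    counts = [0] * len(mean_utils)
    for _ in range(n_draws):
        noisy = [u + sample_gumbel(rng) for u in mean_utils]
        counts[noisy.index(max(noisy))] += 1
    return [c / n_draws for c in counts]


MEAN_UTILS = [1.0, 0.5, 0.0]  # hypothetical f_i(v_j) for three options
exact = logit_probs(MEAN_UTILS)
simulated = simulate_choice_shares(MEAN_UTILS)
print(exact)
print(simulated)
```

Because each option's probability depends on the other options only through the normalizing sum, the ratio between any two options' shares is the same in every choice set, which is the independence property discussed on this slide.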


Why Has Random Utility Theory Been Used?

Voices from friends (machine learners & econometricians)

1 Rationality of independence assumption

Attributes of unchosen options are irrelevant to the chosen option's benefit.

"I bought a diamond. This is the best. It's ridiculous to think that other dirty stones affected my final choice."

2 Computational practicality

Unless scoring each option, how to decide the best one?

Formalizing the data likelihood is straightforward and easy.


Complexity of Real Human Choice

An example of choosing a PC (Kivetz et al., 2004)

Each subject chooses 1 option from a choice set

            A     B     C     D     E
CPU [MHz]   250   300   350   400   450
Mem. [MB]   192   160   128    96    64

Choice Set   #subjects (per option, in listed order)
{A, B, C}    36:176:144
{B, C, D}    56:177:115
{C, D, E}    94:181:109

Can random utility theory still explain the preference reversals?

B ≻ C or C ≻ B?
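The reversal can be checked directly from the counts on this slide. The sketch below only converts the reported counts into choice shares; the helper names (`choice_shares`, `prefers`) are hypothetical.

```python
# Observed counts from the PC example (Kivetz et al., 2004), as listed on the slide.
PC_COUNTS = {
    ("A", "B", "C"): (36, 176, 144),
    ("B", "C", "D"): (56, 177, 115),
    ("C", "D", "E"): (94, 181, 109),
}


def choice_shares(options, counts):
    """Fraction of subjects choosing each option within one choice set."""
    total = sum(counts)
    return {option: count / total for option, count in zip(options, counts)}


SHARES = {opts: choice_shares(opts, cnts) for opts, cnts in PC_COUNTS.items()}


def prefers(choice_set, first, second):
    """True if `first` has a larger share than `second` in `choice_set`."""
    return SHARES[choice_set][first] > SHARES[choice_set][second]


for opts, share in SHARES.items():
    print(opts, {o: round(s, 3) for o, s in share.items()})

print("B over C in {A,B,C}:", prefers(("A", "B", "C"), "B", "C"))
print("C over B in {B,C,D}:", prefers(("B", "C", "D"), "C", "B"))
```

In {A, B, C}, B takes 176/356 of the choices against 144/356 for C, while in {B, C, D}, B drops to 56/348 against 177/348 for C. A model with a fixed utility per option and independent noise cannot produce both orderings.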


Agenda

1 Introduction of the Goal and Issues

2 Irrational Context Effects

    Similarity Effect
    Attraction Effect
    Compromise Effect
    Prior Work

3 Proposing a Bayesian Model of Mental Conflict

4 Numerical Studies

5 Conclusion


Similarity Effect (Tversky, 1972)

Top-share choice can change due to correlated utilities.

E.g., one color from {Blue, Red} or {Violet, Blue, Red}?
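A small simulation can illustrate the color example. This is only a sketch under assumed numbers: Gaussian utilities, a hypothetical correlation `CORR` between Blue and Violet, and a hypothetical mean advantage `BLUE_EDGE` for the bluish colors. None of these values come from the slides.

```python
import random

CORR = 0.9       # hypothetical correlation between Blue and Violet utilities
BLUE_EDGE = 0.2  # hypothetical mean-utility advantage of Blue and Violet over Red


def simulate_shares(names, n_draws=20000, seed=0):
    """Choice shares when each draw picks the option with the largest utility."""
    rng = random.Random(seed)
    wins = {name: 0 for name in names}
    for _ in range(n_draws):
        shared = rng.gauss(0.0, 1.0)  # taste component shared by Blue and Violet
        utils = {}
        for name in names:
            own = rng.gauss(0.0, 1.0)  # option-specific component
            if name in ("Blue", "Violet"):
                utils[name] = BLUE_EDGE + CORR ** 0.5 * shared + (1 - CORR) ** 0.5 * own
            else:
                utils[name] = own
        wins[max(utils, key=utils.get)] += 1
    return {name: wins[name] / n_draws for name in names}


pair = simulate_shares(["Blue", "Red"])
triple = simulate_shares(["Violet", "Blue", "Red"])
print(pair)
print(triple)
```

Under these assumptions Blue leads Red in the pair, but once Violet is added the two correlated colors split nearly the same set of draws, and Red becomes the top-share option.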


Attraction Effect (Huber et al., 1982)

Introduction of an absolutely-inferior option A− (= decoy) causes an irregular increase of option A's attractiveness.

Despite the natural guess that a decoy never affects the choice.

If D ≻ A, then D ≻ A ≻ A−.

If A ≻ D, then A is superior to both A− and D.


Compromise Effect (Simonson, 1989)

Moderate options within each choice set are preferred.

Different from a non-linear utility function involving diminishing returns (e.g., $\sqrt{\text{inexpensiveness}} + \sqrt{\text{quality}}$).


Positioning of Our Work in Literature

Sim.: similarity, Attr.: attraction, Com.: compromise

Model     Sim.  Attr.  Com.  Mechanism                      Predict. for Test Set  Likelihood Maximization
SPM       OK    NG     NG    correlation                    OK                     MCMC
MDFT      OK    OK     OK    dominance & indifference       OK                     MCMC
PD        OK    OK     OK    nonlinear pairwise comparison  OK                     MCMC
MMLM      OK    NG     OK    none                           OK                     Non-convex
NLM       OK    NG     NG    hierarchy                      NG                     Non-convex
BSY       OK    OK     OK    Bayesian                       OK                     MCMC
LCA       OK    OK     OK    loss aversion                  OK                     MCMC
MLBA      OK    OK     OK    nonlinear accumulation         OK                     Non-convex
Proposed  OK    NG     OK    Bayesian                       OK                     Convex

SPM: Structured Probit Model (Yai, 1997; Dotson et al., 2009)
MDFT: Multialternative Decision Field Theory (Roe et al., 2001)
PD: Proportional Difference Model (Gonzalez-Vallejo, 2002)
MMLM: Mixed Multinomial Logit Model (McFadden and Train, 2000)
NLM: Nested Logit Models (Williams, 1977; Wen and Koppelman, 2001)
BSY: Bayesian Model of Shenoy and Yu (2013)
LCA: Leaky Competing Accumulator Model (Usher and McClelland, 2004)
MLBA: Multiattribute Linear Ballistic Accumulator Model (Trueblood, 2014)


Agenda

1 Introduction of the Goal and Issues

2 Irrational Context Effects

3 Proposing a Bayesian Model of Mental Conflict

    Utility Estimation as Dual Personality
    Irrationality by Bayesian Shrinkage
    Convex Optimization when using Posterior Mean

4 Numerical Studies

5 Conclusion


Utility Estimation as Dual Personality

How about regarding utilities as samples in statistics?

Assumption 1: Utility function is partially disclosed to DMS.

1 UC computes the sample value of every option's utility, and sends only these samples to DMS.

2 DMS statistically estimates the utility function.


Mental Conflict as Bayesian Shrinkage

Assumption 2: DMS does Bayesian shrinkage estimation.

$i \in \{1, \ldots, n\}$: context, $y_i \in \{1, \ldots, m[i]\}$: final choice

$X_i \triangleq (x_{i1} \in \mathbb{R}^{d_X}, \ldots, x_{im[i]})^\top$: features of $m[i]$ options

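Objective Data: values of random utilities

$$v_i \triangleq (v_{i1}, \ldots, v_{im[i]})^\top \sim \mathcal{N}\left(\mu_i, \sigma^2 I_{m[i]}\right), \qquad v_{ij} = b + w_\phi^\top \phi(x_{ij})$$

$\mu_i \in \mathbb{R}^{m[i]}$: vec. of the true mean utility, $\sigma^2$: noise level, $b$: bias term, $\phi: \mathbb{R}^{d_X} \to \mathbb{R}^{d_\phi}$: mapping function, $w_\phi$: vec. of coefficients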

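Subjective Prior: choice-set-dependent Gaussian process

$$\mu_i \sim \mathcal{N}\left(0_{m[i]}, \sigma^2 K(X_i)\right) \quad \text{s.t.} \quad K(X_i) = \left(K(x_{ij}, x_{ij'})\right) \in \mathbb{R}^{m[i] \times m[i]}$$

$\mu_i \in \mathbb{R}^{m[i]}$: vec. of random utilities, $K(\cdot, \cdot)$: similarity between options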

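Final choice: based on (posterior mean $u^*_i$ + i.i.d. noise) as

$$u^*_i = K(X_i)\left(I_{m[i]} + K(X_i)\right)^{-1}\left(b 1_{m[i]} + \Phi_i w_\phi\right),$$

$$y_i = \arg\max_j \left(u^*_{ij} + \varepsilon_{ij}\right) \quad \text{where } \forall j\ \varepsilon_{ij} \sim \text{Gumbel}.$$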
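One way to see where the posterior mean comes from is standard Gaussian conjugacy. The following is a sketch under the slide's own likelihood and prior, writing the vector of utility samples as $v_i = b 1_{m[i]} + \Phi_i w_\phi$:

```latex
\begin{aligned}
p(\mu_i \mid v_i) &\propto \mathcal{N}\!\left(v_i;\ \mu_i,\ \sigma^2 I_{m[i]}\right)\,
                          \mathcal{N}\!\left(\mu_i;\ 0_{m[i]},\ \sigma^2 K(X_i)\right), \\
\mathbb{E}[\mu_i \mid v_i] &= \sigma^2 K(X_i)\left(\sigma^2 K(X_i) + \sigma^2 I_{m[i]}\right)^{-1} v_i \\
                           &= K(X_i)\left(I_{m[i]} + K(X_i)\right)^{-1} v_i .
\end{aligned}
```

Because the prior covariance and the noise covariance share the same scale $\sigma^2$, the noise level cancels and the posterior mean depends only on the kernel matrix and the utility samples.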


Irrationality by Bayesian Shrinkage

Implication of (1): similarity-dependent discounting

$$u^*_i = \underbrace{K(X_i)\left(I_{m[i]} + K(X_i)\right)^{-1}}_{\text{shrinkage factor}} \underbrace{\left(b 1_{m[i]} + \Phi_i w_\phi\right)}_{\text{vec. of utility samples}}. \tag{1}$$

Under the RBF kernel $K(x, x') = \exp(-\gamma \|x - x'\|^2)$, an option dissimilar to others involves high uncertainty.

Strongly shrunk into prior mean 0.

Context effects as Bayesian uncertainty aversion
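The shrinkage in (1) can be sketched numerically. The example below assumes four options A to D placed on the line X1 = 5 − X2, an RBF kernel with a hypothetical width `gamma = 0.5`, and identical utility samples of 1.0 for every option; these numbers are illustrative assumptions, not values taken from the slides.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """RBF similarity matrix K(x, x') = exp(-gamma * ||x - x'||^2)."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)


def posterior_mean(X, v, gamma=0.5):
    """Shrunk evaluation u* = K (I + K)^{-1} v for one choice set."""
    K = rbf_kernel(X, gamma)
    return K @ np.linalg.solve(np.eye(len(X)) + K, v)


# Hypothetical options on the trade-off line X1 = 5 - X2.
OPTIONS = {name: np.array([x1, 5.0 - x1]) for name, x1 in zip("ABCD", [1.0, 2.0, 3.0, 4.0])}


def evaluate(names, sample=1.0):
    """Final evaluation of each option when all utility samples equal `sample`."""
    X = np.stack([OPTIONS[name] for name in names])
    v = np.full(len(names), sample)
    return dict(zip(names, posterior_mean(X, v)))


abc = evaluate("ABC")
bcd = evaluate("BCD")
print(abc)
print(bcd)
```

Even though every option receives the same utility sample, the middle option of each set is the most similar to its neighbors and is shrunk the least. B is therefore evaluated above C in {A, B, C}, and C above B in {B, C, D}, which is the compromise-effect pattern.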

[Figure: two panels plotting Final Evaluation against X1 = (5 − X2). One panel shows options A−, A, and D under choice sets {A, D} and {A, A−, D}; the other shows options A, B, C, and D under choice sets {A, B, C} and {B, C, D}.]


Convex Optimization when using Posterior Mean

Global fitting of the parameters using data $(X_i, y_i)_{i=1}^{n}$

Fix the mapping and similarity functions during updates.

Shrinkage factor $H_i \triangleq K(X_i)\left(I_{m[i]} + K(X_i)\right)^{-1}$ is constant!

Obtaining a MAP estimate is convex w.r.t. $(b, w_\phi)$.

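$$\max_{b, w_\phi} \sum_{i=1}^{n} \ell\Big(\underbrace{b H_i 1_{m[i]} + H_i \Phi_i w_\phi}_{\text{context-specific } H_i \text{ is multiplied}},\ y_i\Big) - \frac{c}{2}\|w_\phi\|^2$$

Exploiting the log-concavity of multinomial logit

$$\ell(u^*_i, y_i) \triangleq \log \frac{\exp(u^*_{i y_i})}{\sum_{j'=1}^{m[i]} \exp(u^*_{ij'})}$$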
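A minimal sketch of this fitting step on the PC counts from the earlier slide. Several choices here are assumptions made only for illustration: the mapping φ is taken as the standardized raw attributes, the RBF width `GAMMA` and the regularization weight `C_REG` are hypothetical, and plain gradient ascent stands in for whatever solver was actually used.

```python
import numpy as np

# PC attributes (CPU [MHz], memory [MB]) and choice counts from the slides.
ATTRS = np.array([[250, 192], [300, 160], [350, 128], [400, 96], [450, 64]], dtype=float)
DATA = [([0, 1, 2], [36, 176, 144]),
        ([1, 2, 3], [56, 177, 115]),
        ([2, 3, 4], [94, 181, 109])]
GAMMA = 1.0   # hypothetical RBF width
C_REG = 0.01  # hypothetical L2 regularization weight c

PHI = (ATTRS - ATTRS.mean(axis=0)) / ATTRS.std(axis=0)  # assumed mapping phi


def shrinkage(X, gamma=GAMMA):
    """Constant shrinkage factor H = K (I + K)^{-1} for one choice set."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-gamma * sq_dist)
    return K @ np.linalg.inv(np.eye(len(X)) + K)


def log_softmax(u):
    u = u - u.max()
    return u - np.log(np.exp(u).sum())


# Per-context design matrix A_i = H_i [1, Phi_i], so that u*_i = A_i @ (b, w).
DESIGNS = []
for idx, counts in DATA:
    ones_and_phi = np.hstack([np.ones((len(idx), 1)), PHI[idx]])
    DESIGNS.append((shrinkage(PHI[idx]) @ ones_and_phi, np.array(counts, dtype=float)))
N_TOTAL = sum(counts.sum() for _, counts in DESIGNS)


def objective_and_grad(theta):
    """Penalized log-likelihood and its gradient; theta = (b, w_1, w_2)."""
    value = -0.5 * C_REG * theta[1:] @ theta[1:]
    grad = np.concatenate([[0.0], -C_REG * theta[1:]])
    for A, counts in DESIGNS:
        log_p = log_softmax(A @ theta)
        value += counts @ log_p
        grad += A.T @ (counts - counts.sum() * np.exp(log_p))
    return value, grad


def fit(steps=5000, learning_rate=0.1):
    """Gradient ascent on the concave objective, starting from zero."""
    theta = np.zeros(3)
    for _ in range(steps):
        _, grad = objective_and_grad(theta)
        theta += learning_rate * grad / N_TOTAL
    return theta


theta_hat = fit()
print("b, w:", theta_hat)
print("objective:", objective_and_grad(theta_hat)[0])
```

Since each $H_i$ is fixed, the evaluation vector is linear in $(b, w_\phi)$ and the objective is a sum of log-softmax terms minus a quadratic penalty, so any local maximum found this way is global.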


Agenda

1 Introduction of the Goal and Issues

2 Irrational Context Effects

3 Proposing a Bayesian Model of Mental Conflict

4 Numerical Studies

5 Conclusion


Experimental Settings

Evaluates accuracy & log-likelihood for real choice data.

Dataset #1: PC ($n = 1{,}088$, $d_X = 2$)

Dataset #2: SP ($n = 972$, $d_X = 2$)

Subjects are asked to choose a speaker.

              A     B     C     D     E
Power [Watt]  50    75    100   125   150
Price [USD]   100   130   160   190   220

Choice Set   #subjects (per option, in listed order)
{A, B, C}    45:135:145
{B, C, D}    58:137:111
{C, D, E}    95:155:91

Dataset #3: SM ($n = 10{,}719$, $d_X = 23$)

SwissMetro dataset (Antonini et al., 2007)

Subjects are asked to choose one transportation option, either from {train, car, SwissMetro} or {train, SwissMetro}.

Attributes of each option: cost, travel time, headway, seat type, and type of transportation.


Cross-Validation Performances

High predictability in addition to the interpretable mechanism.

For SP, successfully detected a combination of compromise effect & prioritization of power.

1st best for PC & SP.

2nd best for higher-dimensional SM: slightly worse than the highly expressive nonparametric version of mixed multinomial logit (McFadden and Train, 2000).

[Figure: Average Log-Likelihood by dataset (PC, SP, SM) for LinLogit, NpLogit, LinMix, NpMix, and GPUA.]

[Figure: Classification Accuracy by dataset (PC, SP, SM) for LinLogit, NpLogit, LinMix, NpMix, and GPUA.]

[Figure: Evaluation against Price [USD] for options A to E, showing Obj. Eval. and the evaluations under choice sets {A, B, C}, {B, C, D}, and {C, D, E}.]


Conclusion

Introduced a simple & interpretable Bayesian choice model.

Bayesian shrinkage involving mental conflict

Irrational choice-set-dependent Gaussian process prior

Uncertainty aversion as a cause of context effects

Accurate prediction when absolute preference and compromise effect are mixed.


Future Directions

More active Bayesianism for realistic human models

Integration with other Bayesian discrete choice models (e.g., (Shenoy and Yu, 2013))

Explaining attraction effect

Current limitation: decoy gets high share due to symmetric similarity to target option.

Extension to time-series decision making models

E.g., emulating how humans play multi-armed bandits (Zhang and Yu, 2013)

Choice-set optimization avoiding irrational context effects

News channel = set of news articles

Diversified item recommendation (Ziegler et al., 2005)

Via linear submodular bandits (Yue and Guestrin, 2011)


References I

Antonini, G., Gioia, C., Frejinger, E., and Themans, M. (2007). Swissmetro: description of the data. http://biogeme.epfl.ch/swissmetro/examples.html.

Dotson, J. P., Lenk, P., Brazell, J., Otter, T., Maceachern, S. N., and Allenby, G. M. (2009). A probit model with structured covariance for similarity effects and source of volume calculations. http://ssrn.com/abstract=1396232.

Gonzalez-Vallejo, C. (2002). Making trade-offs: A probabilistic and context-sensitive model of choice behavior. Psychological Review, 109:137–154.

Huber, J., Payne, J. W., and Puto, C. (1982). Adding asymmetrically dominated alternatives: Violations of regularity and the similarity hypothesis. Journal of Consumer Research, 9:90–98.


References II

Kivetz, R., Netzer, O., and Srinivasan, V. S. (2004). Alternative models for capturing the compromise effect. Journal of Marketing Research, 41(3):237–257.

McFadden, D. and Train, K. (2000). Mixed MNL models for discrete response. Journal of Applied Econometrics, 15:447–470.

McFadden, D. L. (1980). Econometric models of probabilistic choice among products. Journal of Business, 53(3):13–29.

Roe, R. M., Busemeyer, J. R., and Townsend, J. T. (2001). Multialternative decision field theory: A dynamic connectionist model of decision making. Psychological Review, 108:370–392.

Shenoy, P. and Yu, A. J. (2013). A rational account of contextual effects in preference choice: What makes for a bargain? In Proceedings of the Cognitive Science Society Conference.


References III

Simonson, I. (1989). Choice based on reasons: The case of attraction and compromise effects. Journal of Consumer Research, 16:158–174.

Trueblood, J. S. (2014). The multiattribute linear ballistic accumulator model of context effects in multialternative choice. Psychological Review, 121(2):179–205.

Tversky, A. (1972). Elimination by aspects: A theory of choice. Psychological Review, 79:281–299.

Usher, M. and McClelland, J. L. (2004). Loss aversion and inhibition in dynamical models of multialternative choice. Psychological Review, 111:757–769.

Wen, C.-H. and Koppelman, F. (2001). The generalized nested logit model. Transportation Research Part B, 35:627–641.


References IV

Williams, H. (1977). On the formulation of travel demand models and economic evaluation measures of user benefit. Environment and Planning A, 9(3):285–344.

Yai, T. (1997). Multinomial probit with structured covariance for route choice behavior. Transportation Research Part B: Methodological, 31(3):195–207.

Yue, Y. and Guestrin, C. (2011). Linear submodular bandits and their application to diversified retrieval. In Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., and Weinberger, K., editors, Advances in Neural Information Processing Systems 24, pages 2483–2491.


References V

Zhang, S. and Yu, A. J. (2013). Forgetful Bayes and myopic planning: Human learning and decision-making in a bandit setting. In Burges, C., Bottou, L., Welling, M., Ghahramani, Z., and Weinberger, K., editors, Advances in Neural Information Processing Systems 26, pages 2607–2615. Curran Associates, Inc.

Ziegler, C.-N., McNee, S. M., Konstan, J. A., and Lausen, G. (2005). Improving recommendation lists through topic diversification. In Proceedings of the 14th international conference on World Wide Web (WWW 2005), pages 22–32. ACM.

