Dynamic Assortment Planning Under Popular Discrete Choice Models
Xi Chen
Stern School of Business, New York University


Post on 27-Mar-2022


Xi Chen
Collaborators
Yuan Zhou
Assortment Optimization
Select a subset of substitutable items to maximize expected revenue
• Example: recommendation in online retailing
Key in assortment optimization: understand a customer’s purchase behavior
Discrete choice model: $p(i \mid S) = \Pr[\text{customer selects item } i \text{ when set } S \text{ is offered}]$
Popular choice models:
• Nested Logit (NL) (Williams (1977))
2000 Nobel Prize in Economics to Prof. McFadden for: “development of theory and methods for analyzing discrete choice”
MNL: Multinomial Logit Models
Multinomial logit (MNL) model:
• There are in total $N$ available items in the pool
• Each item $i$ has a mean utility/preference score: $v_1, \ldots, v_N$
• Choice probability: let $u_i = \exp(v_i)$; then $p(i \mid S) = \dfrac{u_i}{1 + \sum_{j \in S} u_j}$
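As a minimal sketch of the MNL choice rule (not code from the talk), the following Python function computes purchase probabilities for an offered set. The function name, the dictionary layout, and the convention that key `0` stands for the no-purchase option with weight 1 are assumptions made for illustration.

```python
import math

def mnl_choice_probs(mean_utility, assortment):
    """Purchase probabilities under MNL for the offered items.

    mean_utility: dict mapping item id (1..N) to its mean utility.
    assortment:   iterable of offered item ids.
    Returns a dict of probabilities; key 0 is the no-purchase option.
    """
    # Preference weight of each offered item: exp(mean utility).
    weight = {i: math.exp(mean_utility[i]) for i in assortment}
    # The leading 1.0 is the no-purchase weight (utility normalized to 0).
    denom = 1.0 + sum(weight.values())
    probs = {i: w / denom for i, w in weight.items()}
    probs[0] = 1.0 / denom
    return probs

# Item 3 is in the pool but not offered, so it gets no probability mass.
probs = mnl_choice_probs({1: 0.0, 2: 0.0, 3: 1.5}, assortment=[1, 2])
```

Because only offered items enter the denominator, adding an item to the assortment lowers the purchase probability of every other offered item, which is the substitution effect the slides refer to.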
Nested Logit Models
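Nested logit model:
• $M$ nests with $N$ items in each nest
• Assortment $S = (S_1, \ldots, S_M)$, where $S_i$ is the set of items offered in nest $i$
• Choice probability for item $j$ in nest $i$: $\Pr[\text{nest } i \text{ selected} \mid S] \cdot \Pr[\text{item } j \text{ selected} \mid \text{nest } i, S_i]$
• Figure: example nests (coats, suits) and the no-purchase option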
For a product, rich feature information is usually available
• Static features: brand, texture, color
• Evolving features: price, historical selling information, news related to the product
Customer’s utility/preference of a product also changes over time (seasonal product)
Incorporate contextual/feature information and obtain a dynamic contextual choice model
Static Assortment Optimization
Static assortment optimization: given the choice model $p(\cdot \mid S)$, how to choose the best set of items?
$S^* = \arg\max_{S \subseteq [N]} R(S)$, where $R(S) = \sum_{i \in S} r_i \, p(i \mid S)$ is the expected revenue when offering $S$ and $r_i \in [0,1]$ is the revenue of item $i$
Combinatorial optimization problem
• MNL model: Talluri et al. (2004), Gallego et al. (2004)
• Nested logit and extensions: Li and Rusmevichientong (2014), Gallego and Topaloglu (2014), Davis et al. (2014), Li et al. (2015)
However, in practice, choice probabilities are unknown and need to be learned
Choice probabilities
• Fashion products
• Parameters to be actively learned from experiments
Millions of candidate items (e.g., online advertising)
• Cannot afford to estimate parameters for every item
Need a dynamic assortment optimization framework
Dynamic Assortment Planning
Dynamic assortment optimization (Caro and Gallien (07), Rusmevichientong et al. (10), Saure et al. (13), Agrawal et al. (17, 18), Cheung & Simchi-Levi (17))
• Selling horizon $T$ without inventory constraint
• At each time $t = 1, 2, 3, \ldots, T$, offer an assortment $S_t \subseteq [N]$ to an arriving customer
• Observe the item $i_t \in S_t \cup \{0\}$ purchased by the customer (0 denotes no purchase)
Dynamic policies that simultaneously learn the underlying choice probability $p(\cdot \mid S)$ and make the decision on the assortment
Objective: $\max \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}[R(S_t)]$, where $R(S_t) = \sum_{i \in S_t} r_i \Pr[i_t = i \mid S_t]$ is the expected revenue when offering $S_t$ at time $t$
Equivalent to minimizing the regret: the gap from the optimal expected revenue $R(S^*)$ (with known utility parameters)
$\mathrm{Reg}_T = \frac{1}{T} \sum_{t=1}^{T} \left( R(S^*) - \mathbb{E}[R(S_t)] \right)$
Goal: a computationally efficient policy such that
• Sublinear regret: $\mathrm{Reg}_T \to 0$ as $T \to +\infty$
• Optimal diminishing rate of $\mathrm{Reg}_T$ in terms of $T$ and other parameters
Related Works and Main Results
Dynamic assortment planning: a popular area in the past ten years
• Caro and Gallien (07) and Rusmevichientong et al. (10)
Un-capacitated MNL: a nearly optimal regret up to a log-factor in $T$
• Fundamentally different from the capacitated case: no dependence on $N$

Choice model | Upper bound | Lower bound | Note
Un-capacitated MNL | $\tilde O(\sqrt{1/T})$ | $\Omega(\sqrt{1/T})$ | Chen, Wang, Zhou (NIPS’ 18), arXiv:1805.04785
Capacitated MNL with constraint size $K$ ($K \le N/4$) | $\tilde O(\sqrt{N/T})$ | $\Omega(\sqrt{N/T})$ | Upper: Saure et al. (13), Agrawal et al. (17, 18); Lower: Chen and Wang (17)
Related Works and Main Results
Nested logit: $M$: # of nests, $N$: # of items within each nest
• An improved policy in the dependence on $N$ (when $N \ge T^{1/3}$) but suboptimal in $T$ ($T^{-1/3}$)

Choice model | Upper bound | Lower bound | Note
Nested logit | $\tilde O(\sqrt{MN/T})$ ($N \le T^{1/3}$) | $\Omega(\sqrt{M/T})$ | Chen, Wang, Zhou (18), arXiv:1806.10410
Contextual MNL with changing utility & feature information | $\tilde O(d/\sqrt{T})$ ($d$: dimension) | $\Omega(d/(K\sqrt{T}))$ | Cheung & Simchi-Levi (17)
Personalized assortment planning with inventory constraints
• Customer’s choice behavior is known, but the arriving sequence can be adversarially chosen
• Worst-case analysis: competitive ratio
• Golrezaei, Nazerzadeh, Rusmevichientong (14), Gallego et al. (16), Chen, Ma, Simchi-Levi, Xin (17), Ma and Simchi-Levi (17), Goyal et al. (18), among others
Key Message
One should avoid direct estimation of the utility parameters of each item
• It will incur a large regret
Learning and decision-making cannot be separated; they need to be integrated
Explore the intrinsic structure of the underlying problem
• Using off-the-shelf learning tools (e.g., bandit algorithms) is usually not optimal (data are usually not i.i.d.)
MNL Models
Regret bound: assumption-free and independent of the number of items $N$
How to obtain a regret independent of $N$?
Policy Design: Revenue-ordered Structure
Revenue-ordered structure of the optimal assortment
• Items ordered by their revenues: $r_1 \ge r_2 \ge \cdots \ge r_N$
• Optimal assortment: $S^* \in \{\{1\}, \{1,2\}, \ldots, \{1, \ldots, N\}, \emptyset\}$: Talluri et al. (2004), Gallego et al. (2004)
• Reduces the search space from $2^N$ to $N$
Level-set assortment:
• For any $\theta \in [0,1]$, define the level set $L(\theta) = \{i \in [N] : r_i \ge \theta\}$ that consists of all items with revenue $\ge \theta$
• There exists a threshold $\theta^*$ such that $S^* = L(\theta^*)$
Instead of estimating each utility $u_i$, search for the optimal revenue threshold $\theta^*$
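To make the revenue-ordered structure concrete, here is a small Python sketch (an illustration, not the authors' code) that compares a search over level sets with brute force over all subsets under MNL with known weights. The function names and the toy revenues and weights are hypothetical; the no-purchase weight is assumed to be 1.

```python
from itertools import combinations

def expected_revenue(assortment, revenue, weight):
    """Expected revenue R(S) under MNL with no-purchase weight 1."""
    denom = 1.0 + sum(weight[i] for i in assortment)
    return sum(revenue[i] * weight[i] for i in assortment) / denom

def best_level_set(revenue, weight):
    """Search only level sets {i : r_i >= theta}, one per distinct revenue."""
    best_set, best_rev = (), 0.0  # the empty assortment earns 0
    for theta in sorted(set(revenue.values()), reverse=True):
        level = tuple(i for i in revenue if revenue[i] >= theta)
        rev = expected_revenue(level, revenue, weight)
        if rev > best_rev:
            best_set, best_rev = level, rev
    return best_set, best_rev

def best_subset_brute_force(revenue, weight):
    """Check all 2^N subsets; only feasible for tiny N."""
    items = list(revenue)
    candidates = (c for k in range(len(items) + 1)
                  for c in combinations(items, k))
    return max((expected_revenue(c, revenue, weight), c) for c in candidates)

# Toy instance (made-up numbers): revenues r_i and weights u_i = exp(v_i).
revenue = {1: 0.9, 2: 0.6, 3: 0.3}
weight = {1: 0.5, 2: 1.0, 3: 2.0}
level_set, level_rev = best_level_set(revenue, weight)
brute_rev, brute_set = best_subset_brute_force(revenue, weight)
```

On this toy instance both searches pick items 1 and 2, and the optimal revenue itself acts as the threshold: the items with revenue at least `level_rev` are exactly the optimal assortment.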
Policy Design: Potential Function
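Potential function $F(\theta) = R(L(\theta))$, where $L(\theta)$ is the level set at threshold $\theta$
• Expected revenue when offering the level set $S = L(\theta)$
Searching for $\theta^*$ such that $F(\theta^*) = \theta^*$
• The function $F$ involves unknown utility parameters
Properties of $F$:
• Piecewise linear
• Unique intersection at $\theta^*$: $F(\theta^*) = \theta^*$
• Left of $\theta^*$: $F(\theta) > \theta$; right of $\theta^*$: $F(\theta) < \theta$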
Policy Design: Estimate Function Value
Estimate the value $F(\theta) = R(L(\theta))$ by repeatedly offering the level-set assortment $L(\theta)$
• Offer $L(\theta)$ for $n$ times and collect the average revenue as $\hat F(\theta)$
• By a standard concentration inequality: $|\hat F(\theta) - F(\theta)| \lesssim \sqrt{1/n}$
• Build an $O(\sqrt{1/n})$ confidence interval on $F(\theta)$
Special stochastic zeroth-order optimization, but with knowledge of the function shape
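The estimation step can be sketched in Python as follows. This is an illustrative simulation rather than the talk's implementation: customers are simulated from an MNL model with made-up weights, and the confidence radius uses Hoeffding's inequality for revenues in [0, 1] with a failure probability `delta`, which is one possible choice of "standard concentration inequality". All names are hypothetical.

```python
import math
import random

def sample_revenue(level_set, revenue, weight, rng):
    """Simulate one customer under MNL; return the revenue collected."""
    items = list(level_set)
    # Position 0 is the no-purchase option with weight 1.
    weights = [1.0] + [weight[i] for i in items]
    choice = rng.choices([None] + items, weights=weights, k=1)[0]
    return 0.0 if choice is None else revenue[choice]

def estimate_potential(theta, revenue, weight, n_offers, delta, rng):
    """Offer the level set at theta n_offers times; return (mean, radius)."""
    level_set = [i for i in revenue if revenue[i] >= theta]
    total = sum(sample_revenue(level_set, revenue, weight, rng)
                for _ in range(n_offers))
    mean = total / n_offers
    # Hoeffding radius for an average of n_offers values in [0, 1].
    radius = math.sqrt(math.log(2.0 / delta) / (2.0 * n_offers))
    return mean, radius

revenue = {1: 0.9, 2: 0.6, 3: 0.3}   # toy revenues (assumed)
weight = {1: 0.5, 2: 1.0, 3: 2.0}    # toy MNL weights (assumed, hidden from the learner)
mean, radius = estimate_potential(0.5, revenue, weight, n_offers=20000,
                                  delta=0.001, rng=random.Random(0))
```

The estimator never touches the individual weights: it only averages observed revenue, which is why the number of items does not enter the width of the confidence interval.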
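Policy Design: Tri-section Search
Confidence band of length $O(\sqrt{1/n})$; the policy uses the upper confidence band (UCB) of $F(\cdot)$
Maintain an interval $[a_\tau, b_\tau]$ containing $\theta^*$, starting from $[0, 1]$, with trisection points $x_\tau = (2a_\tau + b_\tau)/3$ and $y_\tau = (a_\tau + 2b_\tau)/3$
For $\tau = 1, 2, \ldots$
• Adaptively query $F(y_\tau)$ for $n_\tau = \ln(T)/(b_\tau - a_\tau)^2$ times (more exploration as time goes on)
• If UCB of $F(y_\tau) < y_\tau$ (implies $y_\tau \ge \theta^*$), move the right endpoint $b_{\tau+1} \leftarrow y_\tau$ and update $x_{\tau+1}$ and $y_{\tau+1}$
• Else if UCB of $F(y_\tau) > y_\tau$, move the left endpoint $a_{\tau+1} \leftarrow x_\tau$ and update $x_{\tau+1}$ and $y_{\tau+1}$
(each step shrinks the interval to 2/3 of its previous length)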
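A hedged Python sketch of a trisection search of this kind is given below. It is a simplified reading of the slides, not the authors' algorithm: it queries only the upper trisection point, uses a Hoeffding radius with an assumed constant (18) chosen so that the radius is at most one sixth of the interval length, and stops once the next round would exceed the horizon. The simulator, names, and toy parameters are all hypothetical.

```python
import math
import random

def sample_revenue(theta, revenue, weight, rng):
    """Offer the level set {i : r_i >= theta} to one simulated MNL customer."""
    items = [i for i in revenue if revenue[i] >= theta]
    weights = [1.0] + [weight[i] for i in items]  # position 0: no purchase
    choice = rng.choices([None] + items, weights=weights, k=1)[0]
    return 0.0 if choice is None else revenue[choice]

def trisection_search(revenue, weight, horizon, rng):
    """Shrink [a, b] around the threshold theta* with F(theta*) = theta*."""
    a, b = 0.0, 1.0
    used = 0
    log_term = math.log(2.0 * horizon ** 2)  # assumed confidence level
    while True:
        x, y = (2 * a + b) / 3, (a + 2 * b) / 3
        # More samples as the interval shrinks; gives radius <= (b - a) / 6.
        n = math.ceil(18.0 * log_term / (b - a) ** 2)
        if used + n > horizon:
            break
        mean = sum(sample_revenue(y, revenue, weight, rng)
                   for _ in range(n)) / n
        radius = math.sqrt(log_term / (2.0 * n))
        used += n
        if mean + radius < y:
            b = y  # UCB of F(y) is below y, so theta* <= y
        else:
            a = x  # F(y) is close to y, so theta* >= x
    return a, b

revenue = {1: 0.9, 2: 0.6, 3: 0.3}   # toy revenues (assumed)
weight = {1: 0.5, 2: 1.0, 3: 2.0}    # toy MNL weights (assumed, hidden from the learner)
a, b = trisection_search(revenue, weight, horizon=200000, rng=random.Random(0))
```

On this toy instance the optimal threshold is 0.42, and the returned interval should contain it with high probability. In a full policy the remaining periods would be spent offering a level set inside the final interval; that exploitation step is omitted here.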
Regret bound: assumption-free and independent of the number of items $N$
The $\log T$ factor can be improved to $\log\log T$
Nested Logit Models
$M$ nests (indexed by $i$) and $N$ items (indexed by $j$) within each nest
• Revenue parameters $r_{ij} \in [0,1]$: known
• Bounded utility parameters $u_{ij}$ (nonnegative and bounded above): unknown
• Nest correlation parameters $\gamma_i \in [0,1]$: unknown
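Nest-level choice: $\Pr[i_t = i \mid S] = \dfrac{V_i(S_i)^{\gamma_i}}{1 + \sum_{i'=1}^{M} V_{i'}(S_{i'})^{\gamma_{i'}}}$, where $V_i(S_i) = \sum_{j \in S_i} u_{ij}$
Item-level choice: $\Pr[j_t = j \mid i_t = i, S] = \dfrac{u_{ij}}{\sum_{j' \in S_i} u_{ij'}}$
Expected revenue: $R(S) = \sum_{i=1}^{M} \Pr[i_t = i \mid S] \sum_{j \in S_i} r_{ij} \Pr[j_t = j \mid i_t = i, S] = \dfrac{\sum_{i=1}^{M} R_i(S_i) V_i(S_i)^{\gamma_i}}{1 + \sum_{i=1}^{M} V_i(S_i)^{\gamma_i}}$
Nest-level aggregate revenue: $R_i(S_i) = \dfrac{\sum_{j \in S_i} r_{ij} u_{ij}}{\sum_{j \in S_i} u_{ij}}$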
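The nested logit revenue can be computed with a short Python function, sketched here under the standard two-stage form (nest weight $V_i(S_i)^{\gamma_i}$ with $V_i(S_i)$ the sum of offered utilities, and no-purchase weight 1). The function name, list-of-lists layout, and toy numbers are assumptions for illustration.

```python
def nested_logit_revenue(assortment, revenue, utility, gamma):
    """Expected revenue of a nested logit assortment.

    assortment[i]: list of item indices offered in nest i.
    revenue[i][j], utility[i][j]: revenue and utility of item j in nest i.
    gamma[i]: correlation parameter of nest i.
    """
    numer, denom = 0.0, 1.0  # denom starts at the no-purchase weight
    for i, items in enumerate(assortment):
        if not items:
            continue  # an empty nest contributes nothing
        total = sum(utility[i][j] for j in items)             # V_i(S_i)
        nest_rev = sum(revenue[i][j] * utility[i][j]
                       for j in items) / total                # R_i(S_i)
        nest_weight = total ** gamma[i]                       # V_i(S_i)^gamma_i
        numer += nest_rev * nest_weight
        denom += nest_weight
    return numer / denom

# Toy instance with two nests (made-up numbers).
revenue = [[0.9, 0.6], [0.3]]
utility = [[0.5, 1.0], [2.0]]
gamma = [0.5, 1.0]
value = nested_logit_revenue([[0, 1], [0]], revenue, utility, gamma)
```

With a single nest and `gamma = [1.0]` the expression reduces to the MNL revenue, which is a quick sanity check on the formula.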
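Similar structure to the revenue of MNL, but with aggregated utilities and revenues
Key idea behind our policy
• Avoid estimating the utility $u_{ij}$ of each single item and the correlation parameter $\gamma_i$ of each nest: () parameters
• Since the expected revenue is $R(S) = \dfrac{\sum_{i=1}^{M} R_i(S_i) V_i(S_i)^{\gamma_i}}{1 + \sum_{i=1}^{M} V_i(S_i)^{\gamma_i}}$, directly estimate the nest-level aggregate revenue $R_i(S_i)$ and the nest-level aggregate utility $V_i(S_i)^{\gamma_i}$, with $V_i(S_i) = \sum_{j \in S_i} u_{ij}$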
In total, () parameters
Policy Design: Revenue-ordered Assortment
Revenue-ordered assortment
• The level set for nest $i$: $L_i(\theta_i) = \{j \in \text{nest } i : r_{ij} \ge \theta_i\}$, which consists of all items in nest $i$ with revenue $\ge \theta_i$
Optimality of level-set assortments (Davis et al., 14; Li et al., 15):
• There exist thresholds $\theta_1^*, \ldots, \theta_M^*$ such that the level-set assortment $S^* = (L_1(\theta_1^*), L_2(\theta_2^*), \ldots, L_M(\theta_M^*))$ is optimal in expected revenue
• It suffices to consider level sets only
Policy Design: Epoch based Exploration
Epoch based exploration motivated by Agrawal et al. (17) for capacitated MNL
For a chosen assortment $S = (S_1, \ldots, S_M)$: offer it repeatedly until a no-purchase occurs
Policy Design: UCB Approach
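For every level set $S_i = L_i(\theta_i)$ in nest $i$, construct
• $\bar V_i(\theta_i)$: upper confidence bound (UCB) of the nest-level utility
• $\bar R_i(\theta_i)$: upper confidence bound of the nest-level revenue
UCB-based greedy assortment planning:
$S = (L_1(\theta_1), \ldots, L_M(\theta_M))$ with $(\theta_1, \ldots, \theta_M) \in \arg\max_{\theta_1, \ldots, \theta_M} \dfrac{\sum_{i=1}^{M} \bar R_i(\theta_i) \bar V_i(\theta_i)}{1 + \sum_{i=1}^{M} \bar V_i(\theta_i)}$ (an upper bound of the expected revenue)
• Efficient binary search algorithm (each $\theta_i$ is optimized separately)
Further improvement in the dependence on $N$ is possible via a discretization technique (see arXiv:1806.10410)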
Contextual information under changing environments
• At each time $t$, observe the $d$-dimensional feature $x_{it} \in \mathbb{R}^d$ of each item $i \in [N]$
• E.g., size, brand, price, seasonal information
• Utility of item $i$ at time $t$: $v_{it} = \langle x_{it}, \theta^* \rangle$
• $\theta^*$: underlying parameter to be learned
• Models choice behavior of new products
Dynamic contextual MNL
Dynamic Contextual MNL
Key challenge: utility parameters of the same assortment are changing over time
Additional assortment cardinality constraint ($|S_t| \le K$)
Main Results
Independent of the number of items $N$ (the $\tilde O$ notation only hides logarithmic factors)
There is still a gap between the upper and lower bounds, but the cardinality size $K$ is usually small
Policy Overview
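Stage 1 (exploration for the first $T_0$ periods)
• Randomly select assortments of size one and compute the MLE
• Obtain a good “pilot” estimator $\hat\theta_0$
Stage 2 (UCB): for $t = T_0 + 1$ to $T$
• Compute a local MLE using the likelihood of the contextual MNL: $\hat\theta_t \in \arg\max_{\|\theta - \hat\theta_0\| \le \tau} \sum_{t' < t} \log \Pr_\theta(i_{t'} \mid S_{t'})$
• Compute the confidence interval (CI) and select the assortment: $S_t \in \arg\max_{|S| \le K} \hat R_t(S) + \mathrm{CI}_t(S)$, where $\hat R_t(S)$ is the estimate of the revenue of $S$ given $\hat\theta_t$ and $\mathrm{CI}_t(S)$ is the length of the confidence interval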
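The likelihood that the MLE steps maximize can be sketched in Python as below. This only evaluates the contextual MNL log-likelihood for a candidate parameter; the optimizer, the local constraint around the pilot estimator, and the confidence interval are not shown. The function names and the data layout (a history of offered feature dictionaries and chosen items, with `0` for no purchase) are assumptions.

```python
import math

def contextual_choice_probs(features, theta):
    """Choice probabilities when item i has utility <x_i, theta>.

    features: dict mapping offered item id to its feature vector.
    Returns a dict of probabilities; key 0 is the no-purchase option.
    """
    weights = {
        i: math.exp(sum(x_k * t_k for x_k, t_k in zip(x, theta)))
        for i, x in features.items()
    }
    denom = 1.0 + sum(weights.values())  # 1.0 is the no-purchase weight
    probs = {i: w / denom for i, w in weights.items()}
    probs[0] = 1.0 / denom
    return probs

def log_likelihood(history, theta):
    """Sum of log-probabilities of the observed choices under theta."""
    return sum(
        math.log(contextual_choice_probs(features, theta)[choice])
        for features, choice in history
    )

# Two toy periods (made-up features): item 1 bought, then no purchase.
history = [
    ({1: [1.0, 0.0], 2: [0.0, 1.0]}, 1),
    ({1: [1.0, 0.0], 3: [1.0, 1.0]}, 0),
]
ll_zero = log_likelihood(history, [0.0, 0.0])
```

Because features change from period to period, the same item can carry a different utility each time it is offered, which is why the likelihood is written per period rather than per assortment.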
Experimental Results: MNL
Compare with the following algorithms ( = 500, 20 independent runs):
• Upper confidence bound (UCB) algorithm (S. Agrawal et al., 18)
• Thompson sampling algorithm (S. Agrawal et al., 17)
• Golden ratio search algorithm (GRS, Rusmevichientong et al., 10)
(1) Better than GRS and UCB
(2) When is small, Thompson sampling is better and when is large, our method is better
(1) Much better than GRS and UCB
(2) When is small, Thompson sampling is better and when is large, our method is better
Conclusion
MNL (https://arxiv.org/abs/1805.04785)
Leveraging the structure to avoid directly estimating individual item utilities
Why binary search does not work
If the whole interval is above the line, the middle point $\le \theta^*$
If the whole interval is below the line, the middle point $\ge \theta^*$
But what if the interval intersects the line?