TRANSCRIPT
Learning to Rank with Click Models: From Online Algorithms to Offline Evaluations
Shuai LI
The Chinese University of Hong Kong
Shuai LI (CUHK) Learning to Rank 1 / 53
Outline
1 Motivation
2 Background
3 Problem Definition – Online
4 Click Models
  Cascade Model (CM) – ICML'2016, AAAI'2018, IJCAI'2019
  Dependent Click Model – A co-authored work
  Position-Based Model
  General Click Models – A co-authored work, ICML'2019
5 Offline Evaluations – KDD’2018
6 Conclusions
Motivation – Learning to Rank
Amazon, YouTube, Facebook, Netflix, Taobao
Background – Multi-armed Bandit Problem
A special case of reinforcement learning
There are L arms
Each arm a has an unknown reward distribution with unknown mean αa
The best arm is a* = argmax_a α_a
Background – Multi-armed Bandit Setting
At each time t:
  The learning agent selects one arm a_t
  Observes the reward X_{a_t, t}

The objective is to minimize the regret in T rounds:

  R(T) = T α* − E[ Σ_{t=1}^T α_{a_t} ]

Balance the trade-off between exploitation and exploration:
  Exploitation: select arms that yield good results so far
  Exploration: select arms that have not been tried much before
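As a quick illustration (not from the slides), the regret definition above can be checked in a short simulation; the Bernoulli arms, their means, and the round-robin policy here are all made up for the example.

```python
import random

def run(policy, means, T, seed=0):
    """Simulate T rounds of a Bernoulli bandit and return the (pseudo-)regret R(T)."""
    rng = random.Random(seed)
    best = max(means)          # alpha*: mean of the best arm
    pulled = 0.0               # running sum of the means of the arms actually played
    for t in range(T):
        a = policy(t)          # index of the arm a_t selected at time t
        _reward = 1 if rng.random() < means[a] else 0  # X_{a_t, t} (drawn but unused:
        pulled += means[a]     # the regret is defined via the means alpha_{a_t})
    return T * best - pulled   # R(T) = T*alpha* - sum_t alpha_{a_t}

means = [0.2, 0.5, 0.9]
uniform = lambda t: t % len(means)   # pure exploration: round-robin over the arms
print(run(uniform, means, 300))      # roughly 110: each suboptimal arm played 100 times
```

A non-adaptive policy like round-robin incurs regret linear in T, which is exactly what the exploration/exploitation trade-off is meant to avoid.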
Background – Upper Confidence Bound
UCB (Upper Confidence Bound) [ACF’02]
UCB policy: select

  a_t = argmax_a  α̂_{a,t} + √( 3 ln(t) / (2 T_a(t)) )

where
  α̂_{a,t} is the empirical mean of arm a at time t — Exploitation
  T_a(t) is the number of times arm a has been played so far — Exploration

Gap-dependent bound O((L/Δ) log(T)), where Δ = min_{a: α_a < α*} (α* − α_a); matches the lower bound
Gap-free bound O(√(L T log(T))), tight up to a factor of √(log(T))
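A minimal sketch of this UCB policy on Bernoulli arms (the arm means are illustrative; the bonus term is the √(3 ln t / (2 T_a(t))) from the slide):

```python
import math, random

def ucb(means, T, seed=0):
    """Run the UCB policy for T rounds and return the accumulated pseudo-regret."""
    rng = random.Random(seed)
    L = len(means)
    counts = [0] * L      # T_a(t): number of times arm a has been played
    sums = [0.0] * L      # running reward sums, for the empirical means
    best = max(means)
    regret = 0.0
    for t in range(1, T + 1):
        if t <= L:
            a = t - 1     # play each arm once to initialise the estimates
        else:             # a_t = argmax_a  alpha_hat_{a,t} + sqrt(3 ln t / (2 T_a(t)))
            a = max(range(L), key=lambda i: sums[i] / counts[i]
                    + math.sqrt(3 * math.log(t) / (2 * counts[i])))
        reward = 1 if rng.random() < means[a] else 0
        counts[a] += 1
        sums[a] += reward
        regret += best - means[a]
    return regret

print(ucb([0.2, 0.5, 0.9], 2000))  # sub-linear: grows like O((L/Delta) log T)
```

Compared with the round-robin baseline, the bonus shrinks as 1/√T_a(t), so arms with confidently low estimates stop being played and the regret grows only logarithmically.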
Online Learning to Rank
There are L items
Each item a has an unknown attractiveness α(a)

There are K positions

At time t:
  The learning agent selects a list of items A_t = (a_1^t, . . . , a_K^t)
  Receives the click feedback C_t ∈ {0, 1}^K

The objective is to minimize the regret over T rounds:

  R(T) = T r(A*) − E[ Σ_{t=1}^T r(A_t) ]

where
  r(A) is the reward of list A
  A* = (1, 2, . . . , K), assuming the arms are ordered by α(1) ≥ α(2) ≥ · · · ≥ α(L)
Click Models
Click models describe how users interact with a list of items

Cascade Model (CM)
  Assumes the user scans the list from position 1 to position K, clicks at the first satisfying item, and stops
  At most 1 click
  r(A) = 1 − Π_{k=1}^K (1 − α(a_k)) = OR(α(a_1), . . . , α(a_K))
  The meaning of received feedback (0, 0, 1, 0, 0): ✗ ✗ ✓ ? ? — positions 1–2 were examined but not clicked, position 3 was clicked, positions 4–5 were never examined

                 Click Model   Regret
  [KSWA, 2015]   CM            O((L/Δ) log(T))
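Both the cascade behaviour and the reward r(A) fit in a few lines; the attractiveness values below are illustrative, not from the talk.

```python
import random

def cascade_click(alphas, seed=0):
    """Simulate one user under CM: scan positions 1..K, click the first
    satisfying item, and stop (so there is at most one click)."""
    rng = random.Random(seed)
    clicks = [0] * len(alphas)
    for k, a in enumerate(alphas):   # alphas[k] = alpha(a_k), item at position k+1
        if rng.random() < a:
            clicks[k] = 1            # first satisfying item: click and leave
            break
    return clicks

def reward(alphas):
    """r(A) = 1 - prod_k (1 - alpha(a_k)): the probability of at least one click."""
    p = 1.0
    for a in alphas:
        p *= 1 - a
    return 1 - p

print(reward([0.5, 0.5]))  # -> 0.75
```

The product form makes the OR interpretation concrete: the list fails only if every item independently fails to attract a click.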
Contextual Bandit Setting
Contexts
  User profiles, search keywords
  Important for search and recommendations

Assume each item a is represented by a feature vector x_{t,a} ∈ R^d

Assume the attractiveness of item a is

  α_t(a) = θ^T x_{t,a}

for a fixed but unknown weight vector θ

When the x_{t,a} are one-hot representations and θ = (α(1), . . . , α(L)), this reduces to the multi-armed bandit setting.
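A two-line check of that reduction (the attractiveness values are made up): with one-hot features, the linear score θ^T x_{t,a} recovers exactly the per-arm mean α(a).

```python
alphas = [0.9, 0.5, 0.2]   # illustrative attractiveness values alpha(1..L)
theta = alphas             # theta stacks the per-arm means

def one_hot(a, L):
    """x_{t,a} = e_a: the a-th standard basis vector."""
    return [1.0 if i == a else 0.0 for i in range(L)]

def score(theta, x):
    """theta^T x."""
    return sum(w * xi for w, xi in zip(theta, x))

for a in range(3):
    print(score(theta, one_hot(a, 3)))  # recovers alpha(a): 0.9, 0.5, 0.2
```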
Contextual Combinatorial Cascading Bandits [LWZC, ICML'2016] – Algorithm

C^3-UCB Algorithm
  Initialization: θ̂ = 0 ∈ R^d, V = λI ∈ R^{d×d}, b = 0 ∈ R^d
  For time t = 1, 2, . . .
    Obtain items {x_{t,a}}_{a∈E} ⊂ R^d
    With high probability ‖θ̂ − θ‖_V ≤ β_t, thus with high probability
      α_t(a) ∈ θ̂^T x_{t,a} ± β_t ‖x_{t,a}‖_{V^{-1}}
    Select the list A_t by the UCBs of the arms, U_t(a) = θ̂^T x_{t,a} + β_t ‖x_{t,a}‖_{V^{-1}}
    Receive feedback C_t ∈ {0, 1}^K
    Compute the stopping position K_t = min({k : C_t(k) = 1} ∪ {K}) and update
      V ← V + Σ_{k=1}^{K_t} x_{t,a_k^t} x_{t,a_k^t}^T,   b ← b + Σ_{k=1}^{K_t} x_{t,a_k^t} C_t(k),   θ̂ = V^{-1} b
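One round of the C^3-UCB update can be sketched in pure Python for d = 2. This is only a sketch under simplifying assumptions: β is treated as a fixed constant rather than the time-dependent confidence radius β_t, and the feature vectors and clicks are made up for the demo.

```python
import math

lam, beta = 1.0, 0.5                  # lambda and a fixed stand-in for beta_t
V = [[lam, 0.0], [0.0, lam]]          # V = lambda * I
b = [0.0, 0.0]

def inv2(M):
    """Inverse of a 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def mat_vec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def ucb_score(x, theta_hat, V_inv):
    """U_t(a) = theta_hat^T x + beta * ||x||_{V^-1}."""
    mean = theta_hat[0] * x[0] + theta_hat[1] * x[1]
    Vx = mat_vec(V_inv, x)
    width = math.sqrt(Vx[0] * x[0] + Vx[1] * x[1])
    return mean + beta * width

# Feedback on the K_t observed positions (everything up to the first click):
# pairs (x_{t,a_k}, C_t(k)); both feature vectors and clicks are hypothetical.
observed = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
for x, c in observed:                 # rank-one updates of V and b
    for i in range(2):
        for j in range(2):
            V[i][j] += x[i] * x[j]
        b[i] += x[i] * c

theta_hat = mat_vec(inv2(V), b)       # theta_hat = V^{-1} b
print(theta_hat)                      # -> [0.0, 0.5]
print(ucb_score([0.0, 1.0], theta_hat, inv2(V)))
```

Note the cascade-specific detail: only the positions up to the first click update V and b, since positions after the stopping point were never examined by the user.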
Contextual Combinatorial Cascading Bandits [LWZC, ICML'2016] – Results

We prove a regret bound

  R(T) = O( (d/p*) √(TK) ln(T) )

Experimental results: [Figure: cumulative regret vs. time t on Synthetic Data and on Network 1221, comparing C3-UCB (ours) with CombCascade]
Summary on Bandits with Click Models
                    Context   Click Model   Regret
[KSWA, 2015]        –         CM            O((L/Δ) log(T))
[LWZC, ICML'2016]   Linear    CM            O((d/p*) √(TK) log(T))
Online Clustering of Contextual Cascading Bandits [LZ, AAAI'2018]

Finds a clustering over the users while recommending

The attractiveness function is generalized linear (GL)

Improves the regret results
Experiments: [Figure: cumulative regret vs. time t on two datasets, comparing CLUB-cascade (ours) with C3-UCB/CascadeLinUCB]
                    Context   Click Model   Regret
[KSWA, 2015]        –         CM            O((L/Δ) log(T))
[LWZC, ICML'2016]   Linear    CM            O((d/p*) √(TK) log(T))
[LZ, AAAI'2018]     GL        CM            O(d √(TK) log(T))
Improved Algorithm on Clustering Bandits [LCLL, IJCAI'2019]

Arbitrary frequency distribution over users (compared to a uniform distribution)

Prove a regret bound that is free of the minimal frequency over users:

  R(T) = O( d √(mT) ln(T) + ( 1/(γ^2 p) + n_u/(γ^2 λ_x^3) ) ln(T) )

compared to

  R(T) = O( d √(mT) ln(T) + ( 1/(p_min γ^2 λ_x^3) ) ln(T) )

where n_u is the number of users and m is the number of clusters
Experiments: [Figure: regret vs. time t on Synthetic, MovieLens, and Yelp, comparing ours with CLUB, LinUCB-One, and LinUCB-Ind]
Dependent Click Model (DCM)
Allow multiple clicks
Assumes there is a probability ofsatisfaction after each click
r(A) = 1−∏K
k=1(1− α(ak)γk)
γk : satisfaction probability after clickon position k
The meaning of received feedback(0, 1, 0, 1, 0)
✗ no click
✓ click, not satisfied
✗ no click
✓ click, satisfied?
?
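As a sanity check on the DCM reward r(A) above, here is a minimal Python sketch of the expected reward and of click simulation; the attraction probabilities `alpha` and satisfaction probabilities `gamma` below are made-up illustrative values, not numbers from the talk.

```python
import random

def dcm_expected_reward(alpha, gamma):
    """r(A) = 1 - prod_{k=1}^{K} (1 - alpha(a_k) * gamma_k): the probability
    that the user is satisfied by at least one item in the list."""
    no_satisfaction = 1.0
    for a, g in zip(alpha, gamma):
        no_satisfaction *= 1.0 - a * g
    return 1.0 - no_satisfaction

def dcm_sample_clicks(alpha, gamma, rng=random):
    """Simulate one session: scan top-down, click each examined item with
    its attraction probability, and stop once satisfied by a click."""
    clicks = []
    for a, g in zip(alpha, gamma):
        clicked = int(rng.random() < a)
        clicks.append(clicked)
        if clicked and rng.random() < g:  # satisfied, stops examining
            break
    return clicks

alpha = [0.5, 0.4, 0.3]  # hypothetical attraction probabilities
gamma = [0.9, 0.7, 0.5]  # hypothetical satisfaction probabilities
print(round(dcm_expected_reward(alpha, gamma), 4))  # 0.6634
```

The ambiguity in the feedback above corresponds to the early `break`: positions after a satisfying click are never examined, so their zeros are uninformative.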
                     Context  Click Model  Regret
[KSWA, 2015]         –        CM           O((L/∆) log(T))
[LWZC, ICML’2016]    Linear   CM           O((d/p∗)√(TK) log(T))
[LZ, AAAI’2018]      GL       CM           O(d√(TK) log(T))
[KKSW, 2016]         –        DCM          O((L/∆) log(T))
[LLZ, COCOON’2018]   GL       DCM          O(dK√(TK) log(T))
Position-Based Model (PBM)
Most popular model in industry
Assumes the user click probability on an item a at position k can be factored into β_k · α(a)
β_k is the position bias; usually β_1 ≥ β_2 ≥ · · · ≥ β_K
r(A) = ∑_{k=1}^{K} β_k α(a_k)
The meaning of received feedback (0, 1, 0, 1, 0)
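The PBM reward above is a simple weighted sum, which a minimal Python sketch can illustrate; the position biases `beta` and attraction weights `alpha` are made-up values. The sketch also checks that with decreasing biases the best list orders items by decreasing attractiveness (a rearrangement-inequality argument, not a claim from the talk).

```python
from itertools import permutations

def pbm_expected_clicks(beta, alpha):
    """Expected number of clicks under PBM: r(A) = sum_k beta_k * alpha(a_k)."""
    return sum(b * a for b, a in zip(beta, alpha))

beta = [1.0, 0.6, 0.3]   # hypothetical position biases (decreasing)
alpha = [0.2, 0.5, 0.4]  # hypothetical attractiveness of the placed items

print(round(pbm_expected_clicks(beta, alpha), 2))  # 0.62

# Brute-force check: the best ordering sorts items by decreasing alpha.
best = max(permutations(alpha), key=lambda p: pbm_expected_clicks(beta, p))
print(best)  # (0.5, 0.4, 0.2)
```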
Summary on Bandits with Click Models
                     Context  Click Model  Regret
[KSWA, 2015]         –        CM           O((L/∆) log(T))
[LWZC, ICML’2016]    Linear   CM           O((d/p∗)√(TK) log(T))
[LZ, AAAI’2018]      GL       CM           O(d√(TK) log(T))
[KKSW, 2016]         –        DCM          O((L/∆) log(T))
[LLZ, COCOON’2018]   GL       DCM          O(dK√(TK) log(T))
[LVC, 2016]          –        PBM with β   O((L/∆) log(T))
General Click Models
Common observations for click models
The click-through rate (CTR) of list A at position k can be factored into
CTR(A, k) = χ(A, k) α(a_k)
χ(A, k) is the examination probability of list A at position k
E.g. χ(A, k) = ∏_{i=1}^{k−1} (1 − α(a_i)) in the Cascade Model, and χ(A, k) = β_k in the Position-Based Model
Difficulties with General Click Models
χ depends on both the click model and the list
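The two example examination probabilities named above can be sketched directly in Python; the `alpha` values are hypothetical, and positions are 1-indexed as on the slide.

```python
def chi_cascade(alpha, k):
    """Examination probability of position k (1-indexed) in the Cascade
    Model: the user reaches position k only if no earlier item was clicked."""
    p = 1.0
    for a in alpha[:k - 1]:
        p *= 1.0 - a
    return p

def chi_pbm(beta, k):
    """Examination probability of position k in the Position-Based Model:
    just the position bias beta_k."""
    return beta[k - 1]

def ctr(chi, attraction):
    """CTR(A, k) = chi(A, k) * alpha(a_k)."""
    return chi * attraction

alpha = [0.5, 0.4, 0.3]
print(round(chi_cascade(alpha, 3), 4))                 # (1-0.5)*(1-0.4) = 0.3
print(round(ctr(chi_cascade(alpha, 3), alpha[2]), 4))  # 0.3 * 0.3 = 0.09
```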
Summary on Bandits with Click Models
                     Context  Click Model  Regret
[KSWA, 2015]         –        CM           O((L/∆) log(T))
[LWZC, ICML’2016]    Linear   CM           O((d/p∗)√(TK) log(T))
[LZ, AAAI’2018]      GL       CM           O(d√(TK) log(T))
[KKSW, 2016]         –        DCM          O((L/∆) log(T))
[LLZ, COCOON’2018]   GL       DCM          O(dK√(TK) log(T))
[LVC, 2016]          –        PBM with β   O((L/∆) log(T))
[ZTGKSW, 2017]       –        General      O((K³L/∆) log(T))
[LKLS, NIPS’2018]    –        General      O((KL/∆) log(T)), O(√(K³LT log(T))); lower bound Ω(√(KLT))
Online Learning to Rank with Features [LLS, ICML’2019] – Preparation
Recall
Each item a is represented by a feature vector x_a ∈ R^d
The attractiveness of item a is α(a) = θ^⊤ x_a
We propose an algorithm called RecurRank (Recursive Ranking)
G-optimal design
Minimize the covariance of the least-squares estimator over a finite set X = {x_1, . . . , x_n} ⊂ R^d
For any distribution π : X → [0, 1], let Q(π) = ∑_{x∈X} π(x) x x^⊤
By the Kiefer–Wolfowitz theorem there exists a π, called the G-optimal design, that maximizes det(Q(π)), or equivalently satisfies max_{x∈X} ‖x‖²_{Q(π)†} ≤ d
John’s theorem implies that π may be chosen so that |{x : π(x) > 0}| ≤ d(d + 3)/2
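A small numpy-based sketch of the G-optimal design objective g(π) = max_x ‖x‖²_{Q(π)†}. The design X used here (the standard basis in R^d, with uniform weights, which is optimal by symmetry) is an illustrative assumption; Kiefer–Wolfowitz says the optimal value equals d.

```python
import numpy as np

def design_matrix(X, pi):
    """Q(pi) = sum_x pi(x) * x x^T over the finite design X."""
    return sum(p * np.outer(x, x) for x, p in zip(X, pi))

def g_value(X, pi):
    """g(pi) = max_{x in X} ||x||^2_{Q(pi)^dagger}. The G-optimal design
    minimizes this quantity; at the optimum it equals d."""
    Q_dagger = np.linalg.pinv(design_matrix(X, pi))
    return max(float(x @ Q_dagger @ x) for x in X)

d = 4
X = [np.eye(d)[i] for i in range(d)]  # standard basis e_1, ..., e_d
uniform = [1.0 / d] * d               # uniform weights: optimal by symmetry
print(round(g_value(X, uniform), 6))  # 4.0, i.e. exactly d

skewed = [0.7, 0.1, 0.1, 0.1]         # a non-optimal design does worse
print(g_value(X, skewed) > d)         # True
```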
Online Learning to Rank with Features [LLS, ICML’2019] – Algorithm
RecurRank Algorithm
Each instantiation is called with three arguments:
1 A phase number ℓ ∈ {1, 2, . . .};
2 An ordered tuple of items A = (a_1, a_2, . . . , a_n);
3 A tuple of positions K = (k, . . . , k + m − 1) with m ≤ n.
The algorithm is first called with ℓ = 1, a random order over all items {1, . . . , L}, and K = (1, . . . , K)
Find a G-optimal design π = Gopt(A). Then compute
T(a) = ⌈ (d π(a) / (2∆_ℓ²)) log(|A|/δ_ℓ) ⌉,  ∆_ℓ = 2^{−ℓ}
Hope to satisfy |α̂(a) − α(a)| ≤ ∆_ℓ for any a ∈ A by the end of this instantiation
This instantiation runs for ∑_{a∈A} T(a) rounds
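A sketch of the per-phase pull count T(a). The formula is reconstructed from the garbled slide, so the exact constants and the argument of the logarithm (|A|/δ_ℓ) should be checked against the paper; the input numbers below are made-up.

```python
import math

def phase_width(ell):
    """Per-phase accuracy target: Delta_ell = 2^{-ell}."""
    return 2.0 ** (-ell)

def num_pulls(d, pi_a, n_items, ell, conf):
    """T(a) = ceil( d * pi(a) / (2 * Delta_ell^2) * log(|A| / delta_ell) ),
    where `conf` plays the role of the confidence parameter delta_ell."""
    delta = phase_width(ell)
    return math.ceil(d * pi_a / (2.0 * delta ** 2) * math.log(n_items / conf))

# Hypothetical numbers: d = 5 features, pi(a) = 0.2, |A| = 10 items,
# phase ell = 2 (so Delta = 1/4), confidence parameter 0.01.
print(num_pulls(5, 0.2, 10, 2, 0.01))  # 56
```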
Online Learning to Rank with Features [LLS, ICML’2019] – Algorithm (Continued)
RecurRank Algorithm (Continued)
Select each item a ∈ A exactly T(a) times at position k, and put the first m − 1 items in A \ {a} at the remaining positions k + 1, . . . , k + m − 1
    first position — exploration
    remaining positions — exploitation
    only the first position has the same examination probability χ for all lists
E.g. suppose we have computed T(a_3) = 100; then it puts (a_3, a_1, a_2, a_4, . . . , a_m) on positions (k, . . . , k + m − 1) for 100 rounds
Compute θ̂ using only the feedback from the first position k, and rank items in decreasing order of the estimated attractiveness
α̂(a_1) ≥ α̂(a_2) ≥ α̂(a_3) ≥ · · · ≥ α̂(a_n)
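The placement rule in the example can be sketched as a one-liner; the item names are placeholders.

```python
def place_for_exploration(items, a, m):
    """Place the explored item a at the block's first position, then fill
    the remaining m-1 positions with the first m-1 other items in order."""
    rest = [x for x in items if x != a]
    return [a] + rest[:m - 1]

# Matches the slide's example: exploring a3 in a block of m = 4 positions.
items = ["a1", "a2", "a3", "a4", "a5"]
print(place_for_exploration(items, "a3", 4))  # ['a3', 'a1', 'a2', 'a4']
```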
Online Learning to Rank with Features [LLS, ICML’2019] – Algorithm (Continued)
RecurRank Algorithm (Continued)
Eliminate bad arms a_{n′+1}, . . . , a_n if
α̂(a_1) ≥ · · · ≥ α̂(a_m) ≥ · · · ≥ α̂(a_{n′}) ≥ α̂(a_{n′+1}) ≥ · · · ≥ α̂(a_n), where the gap α̂(a_{n′}) − α̂(a_{n′+1}) is at least 2∆_ℓ
Split the partition at each consecutive gap larger than 2∆_ℓ:
α̂(a_1) ≥ · · · ≥ α̂(a_{k_1}) | α̂(a_{k_1+1}) ≥ · · · ≥ α̂(a_{k_2}) | α̂(a_{k_2+1}) ≥ · · · ≥ α̂(a_{n′})
with a gap of at least 2∆_ℓ across each bar, and split the positions accordingly:
(k, . . . , k + k_1 − 1) | (k + k_1, . . . , k + k_2 − 1) | (k + k_2, . . . , k + m − 1)
Call the refined partitions with phase ℓ + 1
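The gap-based split can be sketched in a few lines of Python; the estimated attractiveness values below are made-up, sorted in decreasing order as the algorithm maintains them.

```python
def split_by_gaps(sorted_estimates, delta):
    """Split estimates (sorted decreasing) into consecutive blocks wherever
    two neighbours differ by at least 2*delta; each block then becomes a
    separate recursive instantiation in the next phase."""
    blocks, current = [], [sorted_estimates[0]]
    for prev, cur in zip(sorted_estimates, sorted_estimates[1:]):
        if prev - cur >= 2 * delta:  # a certified gap: start a new block
            blocks.append(current)
            current = []
        current.append(cur)
    blocks.append(current)
    return blocks

est = [0.9, 0.85, 0.5, 0.45, 0.1]  # hypothetical estimates, Delta_ell = 0.1
print(split_by_gaps(est, 0.1))     # [[0.9, 0.85], [0.5, 0.45], [0.1]]
```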
Online Learning to Rank with Features [LLS, ICML’2019] – Results
Regret bound
R(T) = O(K √(dT log(LT)))
Experiments (figure): regret vs. time t over 200k rounds, comparing RecurRank (ours), C3-UCB, and TopRank under (a) CM and (b) PBM.
Summary on Bandits with Click Models
                     Context  Click Model  Regret
[KSWA, 2015]         –        CM           O((L/∆) log(T))
[LWZC, ICML’2016]    Linear   CM           O((d/p∗)√(TK) log(T))
[LZ, AAAI’2018]      GL       CM           O(d√(TK) log(T))
[KKSW, 2016]         –        DCM          O((L/∆) log(T))
[LLZ, COCOON’2018]   GL       DCM          O(dK√(TK) log(T))
[LVC, 2016]          –        PBM with β   O((L/∆) log(T))
[ZTGKSW, 2017]       –        General      O((K³L/∆) log(T))
[LKLS, NIPS’2018]    –        General      O((KL/∆) log(T)), O(√(K³LT log(T))); lower bound Ω(√(KLT))
[LLS, ICML’2019]     Linear   General      O(K√(dT log(LT)))
Offline Evaluations
Motivation
Can we estimate the expected number of clicks of new policies without directly deploying them?
Offline Evaluation!
Objective:
To design statistically efficient estimators, based on a logged dataset, for any ranking policy
Challenge:
The number of different lists is exponential in K
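To see why the exponential blow-up bites, consider a naive list-level importance-sampling estimator that only uses logged impressions whose entire displayed list matches the target list. This is a minimal sketch under assumptions: the log format `(list_shown, clicks)` and the uniform logging policy are illustrative, not from the thesis.

```python
def list_ips(logs, target_list, logging_prob):
    """List-level importance-sampling estimate of the expected number of
    clicks for `target_list`. Each log record is (list_shown, clicks), a
    hypothetical format. Only records whose whole list equals the target
    contribute -- and with L items in K slots there are L!/(L-K)! distinct
    lists, so most target lists never appear in the logs at all."""
    total = 0.0
    for shown, clicks in logs:
        if tuple(shown) == tuple(target_list):
            total += sum(clicks) / logging_prob
    return total / len(logs)

# Toy logs: a uniform logging policy over two lists of K = 2 items,
# so each list is shown with probability 0.5.
logs = [((0, 1), (1, 0)), ((1, 0), (0, 1))]
print(list_ips(logs, (0, 1), 0.5))  # 1.0
```

With realistic K the target list is almost never in the logs and the estimate degenerates to zero, which is exactly the coverage problem the structured estimators below are designed to avoid.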
Offline Evaluation of Ranking Policies with Click Models [LAKMVW, KDD’2018] – Results
We design estimators for different click models
Item-Position, Random, Rank-Based, Position-Based, Document-Based
We prove that our estimators
are unbiased in a larger class of policies
have lower bias
for finding the best policy, have better theoretical guarantees
than the existing unstructured estimators, under the corresponding click model assumptions
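As one illustration of how click-model structure helps, here is a sketch of an item-position style estimator: it averages observed clicks per (item, position) pair rather than per whole list, so pairs observed in many different lists all contribute. The log format is an assumption for illustration and this is not the paper's exact estimator.

```python
from collections import defaultdict

def item_position_estimate(logs, target_list):
    """Estimate the expected clicks of `target_list` by averaging the
    observed click rate of each (item, position) pair in the logs, then
    summing those averages over the target list's placements. Pairs are
    shared across many logged lists, so coverage needs grow polynomially
    rather than exponentially in the list length K."""
    clicks = defaultdict(float)
    shows = defaultdict(int)
    for shown, obs in logs:
        for k, (item, c) in enumerate(zip(shown, obs)):
            shows[(item, k)] += 1
            clicks[(item, k)] += c
    return sum(clicks[(a, k)] / shows[(a, k)]
               for k, a in enumerate(target_list) if shows[(a, k)] > 0)

# Toy logs with the hypothetical (list_shown, clicks) record format.
logs = [((0, 1), (1, 0)), ((1, 0), (1, 1))]
print(item_position_estimate(logs, (0, 1)))  # 1.0
```

The trade-off is the usual one from the results above: the structured estimator is only unbiased when the assumed click model (here, clicks depending on item and position alone) actually holds.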
Offline Evaluation of Ranking Policies with Click Models [LAKMVW, KDD’2018] – Experiments
Experiments – 100 most frequent queries in Yandex dataset
[Figure: RMSE vs. number of logged samples M (10^0 to 10^5), comparing the RCTR, Item, IP, PBM, and List estimators; panels (a) K = 2, (b) K = 3, (c) K = 10, each over the 100 most frequent queries]
Outline
1 Motivation
2 Background
3 Problem Definition – Online
4 Click Models
    Cascade Model (CM) – ICML’2016, AAAI’2018, IJCAI’2019
    Dependent Click Model – A co-authored work
    Position-Based Model
    General Click Models – A co-authored work, ICML’2019
5 Offline Evaluations – KDD’2018
6 Conclusions
Conclusions
Context + Cascade model (CM) / Dependent click model (DCM)
Online clustering of bandits + Cascade model (CM)
Improved algorithm on clustering of bandits
Context + General click model
Offline evaluation of ranking policies with click models
Publications
First-author papers in thesis – in the order of thesis
1 Shuai Li, Baoxiang Wang, Shengyu Zhang, Wei Chen, Contextual Combinatorial Cascading Bandits, ICML, 2016
2 Shuai Li, Shengyu Zhang, Online Clustering of Contextual Cascading Bandits, AAAI, 2018
3 Shuai Li, Wei Chen, S. Li, Kwong-Sak Leung, Improved Algorithm on Clustering of Bandits, IJCAI, 2019
4 Shuai Li, Tor Lattimore, Csaba Szepesvari, Online Learning to Rank with Features, ICML, 2019
5 Shuai Li, Yasin Abbasi-Yadkori, Branislav Kveton, S. Muthukrishnan, Vishwa Vinay, and Zheng Wen, Offline Evaluation of Ranking Policies with Click Models, KDD, 2018
Publications
Mentioned co-authored papers
6 Weiwen Liu, Shuai Li, Shengyu Zhang, Contextual Dependent Click Bandit Algorithm for Web Recommendation, COCOON, 2018
7 Tor Lattimore, Branislav Kveton, Shuai Li, Csaba Szepesvari, TopRank: A Practical Algorithm for Online Stochastic Ranking, NeurIPS, 2018
Other co-authored papers
8 Pengfei Liu, Hongjian Li, Shuai Li, Kwong-Sak Leung, Improving Prediction of Phenotypic Drug Response on Cancer Cell Lines Using Deep Convolutional Network, BMC Bioinformatics, 2019
9 Ran Wang, Shuai Li, Man-Hon Wong, and Kwong-Sak Leung, Drug-Protein-Disease Association Prediction and Drug Repositioning Based on Tensor Decomposition, BIBM, 2018
10 Pengfei Liu, Shuai Li, Weiying Yi, Kwong-Sak Leung, A Hybrid Distributed Framework for SNP Selections, PDPTA, 2016
Publications
In submission
11 Shuai Li, Wei Chen, Zheng Wen, Kwong-Sak Leung, Stochastic Online Learning with Probabilistic Feedback Graph
12 Shuai Li, Kwong-Sak Leung, Generalized Clustering Bandits
13 Shuai Li, Tong Yu, Ole Mengshoel, Kwong-Sak Leung, Online Semi-Supervised Learning with Large Margin Separation
14 Xiaojin Zhang, Shuai Li, Shengyu Zhang, Contextual Combinatorial Conservative Bandits
15 Pengfei Liu, Shuai Li, Kwong-Sak Leung, The Recovery of Stochastic Differential Equations with Genetic Programming and Kullback-Leibler Divergence
Thank you!
&
Questions?
References I
P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235–256, 2002.
S. Katariya, B. Kveton, C. Szepesvari, and Z. Wen. DCM bandits: Learning to rank with multiple clicks. In International Conference on Machine Learning, pages 1215–1224, 2016.
B. Kveton, C. Szepesvari, Z. Wen, and A. Ashkan. Cascading bandits: Learning to rank in the cascade model. In International Conference on Machine Learning, pages 767–776, 2015.
References II
P. Lagree, C. Vernade, and O. Cappe. Multiple-play bandits in the position-based model. In Advances in Neural Information Processing Systems, pages 1597–1605, 2016.
T. Lattimore, B. Kveton, Li, Shuai, and C. Szepesvari. TopRank: A practical algorithm for online stochastic ranking. In The Conference on Neural Information Processing Systems, 2018.
W. Liu, Li, Shuai, and S. Zhang. Contextual dependent click bandit algorithm for web recommendation. In International Computing and Combinatorics Conference, pages 39–50. Springer, 2018.
References III
Li, Shuai, Y. Abbasi-Yadkori, B. Kveton, S. Muthukrishnan, V. Vinay, and Z. Wen. Offline evaluation of ranking policies with click models. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2018.
Li, Shuai, W. Chen, S. Li, and K.-S. Leung. Improved algorithm on online clustering of bandits. In International Joint Conference on Artificial Intelligence (IJCAI), 2019.
Li, Shuai, T. Lattimore, and C. Szepesvari. Online learning to rank with features. In International Conference on Machine Learning (ICML), 2019.
References IV
Li, Shuai, B. Wang, S. Zhang, and W. Chen. Contextual combinatorial cascading bandits. In International Conference on Machine Learning, pages 1245–1253, 2016.
Li, Shuai and S. Zhang. Online clustering of contextual cascading bandits. In The AAAI Conference on Artificial Intelligence, 2018.
M. Zoghi, T. Tunys, M. Ghavamzadeh, B. Kveton, C. Szepesvari, and Z. Wen. Online learning to rank in stochastic click models. In International Conference on Machine Learning, pages 4199–4208, 2017.
References V
S. Zong, H. Ni, K. Sung, N. R. Ke, Z. Wen, and B. Kveton. Cascading bandits for large-scale recommendation problems. In Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, pages 835–844. AUAI Press, 2016.
A Key Proof Step for CLUB-cascade (Improving C3-UCB)
$$
\begin{aligned}
\mathbb{E}_t\!\left[R(A_t, y_t)\right]
&= \mathbb{E}_t\!\left[\left(1-\prod_{k=1}^{K}\bigl(1-y_t(x^*_{t,k})\bigr)\right)-\left(1-\prod_{k=1}^{K}\bigl(1-y_t(x_{t,k})\bigr)\right)\right]\\
&= \mathbb{E}_t\!\left[\prod_{k=1}^{K}\bigl(1-y_t(x_{t,k})\bigr)-\prod_{k=1}^{K}\bigl(1-y_t(x^*_{t,k})\bigr)\right]\\
&= \mathbb{E}_t\!\left[\sum_{k=1}^{K}\left(\prod_{\ell=1}^{k-1}\bigl(1-y_t(x_{t,\ell})\bigr)\right)\Bigl[\bigl(1-y_t(x_{t,k})\bigr)-\bigl(1-y_t(x^*_{t,k})\bigr)\Bigr]\left(\prod_{\ell=k+1}^{K}\bigl(1-y_t(x^*_{t,\ell})\bigr)\right)\right]\\
&\le \mathbb{E}_t\!\left[\sum_{k=1}^{K}\left(\prod_{\ell=1}^{k-1}\bigl(1-y_t(x_{t,\ell})\bigr)\right)\bigl[y_t(x^*_{t,k})-y_t(x_{t,k})\bigr]\right]\\
&= \mathbb{E}_t\!\left[\sum_{k=1}^{K_t}\bigl[y_t(x^*_{t,k})-y_t(x_{t,k})\bigr]\right]
\end{aligned}
$$
Shuai LI (CUHK) Learning to Rank 52 / 53
![Page 103: Learning to Rank with Click Models: From Online Algorithms ... · Learning to Rank with Click Models: From Online Algorithms to O ine Evaluations Shuai LI The Chinese University of](https://reader030.vdocuments.mx/reader030/viewer/2022040402/5e7ecc124d595109277e076d/html5/thumbnails/103.jpg)
Proof Sketch for RecurRank
Use (ℓ, i) to denote the i-th call of RecurRank at phase ℓ, with item set A_ℓi and position set K_ℓi

Prove that, with high probability, for any (ℓ, i):
    a*_k ∈ A_ℓi if k ∈ K_ℓi
    |θ_ℓi^⊤ x_a − χ_ℓi θ_*^⊤ x_a| ≤ ∆_ℓ, where χ_ℓi is the examination probability of the optimal list at the first position in K_ℓi

In the (ℓ, i)-th call, if item a is put at position k, then
    χ_ℓi (α(a*_k) − α(a)) ≤ 8|K_ℓi|∆_ℓ if k is the first position in K_ℓi
    χ_ℓi (α(a*_k) − α(a)) ≤ 4∆_ℓ if k is a remaining position
    thus this part contributes O(|K_ℓi|∆_ℓ) regret