online learning to rank with features - 2020 conference12-11-00)-12-12-0… · online learning to...

17
Online Learning to Rank with Features Authors: Shuai Li, Tor Lattimore, Csaba Szepesvári The Chinese University of Hong Kong DeepMind University of Alberta

Upload: others

Post on 13-Aug-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Online Learning to Rank with Features - 2020 Conference12-11-00)-12-12-0… · Online Learning to Rank with Features Authors:ShuaiLi,TorLattimore,CsabaSzepesvári TheChineseUniversityofHongKong

Online Learning to Rank with Features

Authors: Shuai Li, Tor Lattimore, Csaba Szepesvári

The Chinese University of Hong KongDeepMindUniversity of Alberta

Page 2: Online Learning to Rank with Features - 2020 Conference12-11-00)-12-12-0… · Online Learning to Rank with Features Authors:ShuaiLi,TorLattimore,CsabaSzepesvári TheChineseUniversityofHongKong

Learning to Rank

Amazon, YouTube, Facebook, Netflix, Taobao

1

Page 3: Online Learning to Rank with Features - 2020 Conference12-11-00)-12-12-0… · Online Learning to Rank with Features Authors:ShuaiLi,TorLattimore,CsabaSzepesvári TheChineseUniversityofHongKong

Online Learning to Rank

• There are L items and K ≤ L positions• At each time t = 1, 2, . . .,

• Choose an ordered list At = (at1, . . . , atK)• Show the user the list• Receive click feedback Ct1, . . . , CtK ∈ {0, 1}, per position

• Objective: Maximize the expected number of clicks

E

[ T∑t=1

K∑k=1

Ctk

]

2

Page 4: Online Learning to Rank with Features - 2020 Conference12-11-00)-12-12-0… · Online Learning to Rank with Features Authors:ShuaiLi,TorLattimore,CsabaSzepesvári TheChineseUniversityofHongKong

Click Models

• Click models describe how users interact with itemlists

• Cascade Model (CM)• Assumes the user checks the list from position 1 toposition K, clicks at the first satisfying item andstops

• Dependent Click Model (DCM)• Further assumes there is a satisfaction probabilityafter click

• Position-Based Model (PBM)• Assumes the user click probability on an item a ofposition k can be factored into item attractivenessand position bias

• Generic model• Make as few assumptions as possible about theclick model

7

7

7

3

Page 5: Online Learning to Rank with Features - 2020 Conference12-11-00)-12-12-0… · Online Learning to Rank with Features Authors:ShuaiLi,TorLattimore,CsabaSzepesvári TheChineseUniversityofHongKong

RecurRank

• Each item a is represented by a feature vector xa ∈ Rd

• The attractiveness of item a is α(a) = θ⊤xa• Click probability factors: Pt (Cti = 1) = α(ati)χ(At, i) where χ isthe examination probability, which satisfies reasonableassumptions

• RecurRank (Recursive Ranking)• For each phase ℓ

• Use first position for exploration• Use remaining positions for exploitation, rank best items first

• Split items and positions when the phase ends• Recursively call the algorithm with increased phase

4

Page 6: Online Learning to Rank with Features - 2020 Conference12-11-00)-12-12-0… · Online Learning to Rank with Features Authors:ShuaiLi,TorLattimore,CsabaSzepesvári TheChineseUniversityofHongKong

Example

1

8

ℓ = 1A||︷︸︸︷a1

···

a8···a50︸︷︷︸

t

Instance 1

1

1

3

ℓ = 2A||︷︸︸︷a1a2a3︸︷︷︸

ℓ = 2A||

4

8

︷︸︸︷a4···a8···a25︸︷︷︸

Instance 2

Instance 3

t1

1

3

ℓ = 3A||︷︸︸︷a1a2a3︸︷︷︸

Instance 4

t2

ℓ = 3A||

45

︷︸︸︷a4a5︸︷︷︸

ℓ = 3A||

6

8

︷︸︸︷a6a7a8···a12︸︷︷︸

Instance 5

Instance 6t3

· · ·

· · ·

· · ·

5

Page 7: Online Learning to Rank with Features - 2020 Conference12-11-00)-12-12-0… · Online Learning to Rank with Features Authors:ShuaiLi,TorLattimore,CsabaSzepesvári TheChineseUniversityofHongKong

Example

1

8

ℓ = 1A||︷︸︸︷a1

···

a8···a50︸︷︷︸

t

Instance 1

1

1

3

ℓ = 2A||︷︸︸︷a1a2a3︸︷︷︸

ℓ = 2A||

4

8

︷︸︸︷a4···a8···a25︸︷︷︸

Instance 2

Instance 3

t1

1

3

ℓ = 3A||︷︸︸︷a1a2a3︸︷︷︸

Instance 4

t2

ℓ = 3A||

45

︷︸︸︷a4a5︸︷︷︸

ℓ = 3A||

6

8

︷︸︸︷a6a7a8···a12︸︷︷︸

Instance 5

Instance 6t3

· · ·

· · ·

· · ·

5

Page 8: Online Learning to Rank with Features - 2020 Conference12-11-00)-12-12-0… · Online Learning to Rank with Features Authors:ShuaiLi,TorLattimore,CsabaSzepesvári TheChineseUniversityofHongKong

Example

1

8

ℓ = 1A||︷︸︸︷a1

···

a8···a50︸︷︷︸

t

Instance 1

1

1

3

ℓ = 2A||︷︸︸︷a1a2a3︸︷︷︸

ℓ = 2A||

4

8

︷︸︸︷a4···a8···a25︸︷︷︸

Instance 2

Instance 3

t1

1

3

ℓ = 3A||︷︸︸︷a1a2a3︸︷︷︸

Instance 4

t2

ℓ = 3A||

45

︷︸︸︷a4a5︸︷︷︸

ℓ = 3A||

6

8

︷︸︸︷a6a7a8···a12︸︷︷︸

Instance 5

Instance 6t3

· · ·

· · ·

· · ·

5

Page 9: Online Learning to Rank with Features - 2020 Conference12-11-00)-12-12-0… · Online Learning to Rank with Features Authors:ShuaiLi,TorLattimore,CsabaSzepesvári TheChineseUniversityofHongKong

Example

1

8

ℓ = 1A||︷︸︸︷a1

···

a8···a50︸︷︷︸

t

Instance 1

1

1

3

ℓ = 2A||︷︸︸︷a1a2a3︸︷︷︸

ℓ = 2A||

4

8

︷︸︸︷a4···a8···a25︸︷︷︸

Instance 2

Instance 3

t1

1

3

ℓ = 3A||︷︸︸︷a1a2a3︸︷︷︸

Instance 4

t2

ℓ = 3A||

45

︷︸︸︷a4a5︸︷︷︸

ℓ = 3A||

6

8

︷︸︸︷a6a7a8···a12︸︷︷︸

Instance 5

Instance 6t3

· · ·

· · ·

· · ·

5

Page 10: Online Learning to Rank with Features - 2020 Conference12-11-00)-12-12-0… · Online Learning to Rank with Features Authors:ShuaiLi,TorLattimore,CsabaSzepesvári TheChineseUniversityofHongKong

Example

1

8

ℓ = 1A||︷︸︸︷a1

···

a8···a50︸︷︷︸

t

Instance 1

1

1

3

ℓ = 2A||︷︸︸︷a1a2a3︸︷︷︸

ℓ = 2A||

4

8

︷︸︸︷a4···a8···a25︸︷︷︸

Instance 2

Instance 3

t1

1

3

ℓ = 3A||︷︸︸︷a1a2a3︸︷︷︸

Instance 4

t2

ℓ = 3A||

45

︷︸︸︷a4a5︸︷︷︸

ℓ = 3A||

6

8

︷︸︸︷a6a7a8···a12︸︷︷︸

Instance 5

Instance 6t3

· · ·

· · ·

· · ·

5

Page 11: Online Learning to Rank with Features - 2020 Conference12-11-00)-12-12-0… · Online Learning to Rank with Features Authors:ShuaiLi,TorLattimore,CsabaSzepesvári TheChineseUniversityofHongKong

Results

• Regret bound

R(T) = O(K√dT log(LT))

• Improves over existing bound O(√

K3LT log(T))

0 50k 100k 150k 200kTime t

10−4

10−3

10−2

10−1

100

101

102

Regr

et

(a) CM

0 500k 1m 1.5m 2mTime t

0

100k

200k

300k

400k

500k

600k

700k

Regr

et

(b) PBM

—RecurRank —CascadeLinUCB —TopRank

6

Page 12: Online Learning to Rank with Features - 2020 Conference12-11-00)-12-12-0… · Online Learning to Rank with Features Authors:ShuaiLi,TorLattimore,CsabaSzepesvári TheChineseUniversityofHongKong

Results

• Regret bound

R(T) = O(K√dT log(LT))

• Improves over existing bound O(√

K3LT log(T))

0 50k 100k 150k 200kTime t

10−4

10−3

10−2

10−1

100

101

102

Regr

et

(a) CM

0 500k 1m 1.5m 2mTime t

0

100k

200k

300k

400k

500k

600k

700k

Regr

et

(b) PBM

—RecurRank —CascadeLinUCB —TopRank

6

Page 13: Online Learning to Rank with Features - 2020 Conference12-11-00)-12-12-0… · Online Learning to Rank with Features Authors:ShuaiLi,TorLattimore,CsabaSzepesvári TheChineseUniversityofHongKong

Thank you!

6

Page 14: Online Learning to Rank with Features - 2020 Conference12-11-00)-12-12-0… · Online Learning to Rank with Features Authors:ShuaiLi,TorLattimore,CsabaSzepesvári TheChineseUniversityofHongKong

References i

Sumeet Katariya, Branislav Kveton, Csaba Szepesvari, and ZhengWen.Dcm bandits: Learning to rank with multiple clicks.In International Conference on Machine Learning, pages1215–1224, 2016.Branislav Kveton, Csaba Szepesvari, Zheng Wen, and Azin Ashkan.

Cascading bandits: Learning to rank in the cascade model.In International Conference on Machine Learning, pages 767–776,2015.Paul Lagrée, Claire Vernade, and Olivier Cappe.Multiple-play bandits in the position-based model.In Advances in Neural Information Processing Systems, pages1597–1605, 2016.

7

Page 15: Online Learning to Rank with Features - 2020 Conference12-11-00)-12-12-0… · Online Learning to Rank with Features Authors:ShuaiLi,TorLattimore,CsabaSzepesvári TheChineseUniversityofHongKong

References ii

Tor Lattimore, Branislav Kveton, Shuai Li, and Csaba Szepesvari.Toprank: A practical algorithm for online stochastic ranking.In The Conference on Neural Information Processing Systems,2018.Shuai Li, Tor Lattimore, and Csaba Szepesvári.Online learning to rank with features.arXiv preprint arXiv:1810.02567, 2018.

Shuai Li, Baoxiang Wang, Shengyu Zhang, and Wei Chen.Contextual combinatorial cascading bandits.In International Conference on Machine Learning, pages1245–1253, 2016.Shuai Li and Shengyu Zhang.Online clustering of contextual cascading bandits.In The AAAI Conference on Artificial Intelligence, 2018.

8

Page 16: Online Learning to Rank with Features - 2020 Conference12-11-00)-12-12-0… · Online Learning to Rank with Features Authors:ShuaiLi,TorLattimore,CsabaSzepesvári TheChineseUniversityofHongKong

References iii

Weiwen Liu, Shuai Li, and Shengyu Zhang.Contextual dependent click bandit algorithm for webrecommendation.In International Computing and Combinatorics Conference,pages 39–50. Springer, 2018.

Masrour Zoghi, Tomas Tunys, Mohammad Ghavamzadeh,Branislav Kveton, Csaba Szepesvari, and Zheng Wen.Online learning to rank in stochastic click models.In International Conference on Machine Learning, pages4199–4208, 2017.

9

Page 17: Online Learning to Rank with Features - 2020 Conference12-11-00)-12-12-0… · Online Learning to Rank with Features Authors:ShuaiLi,TorLattimore,CsabaSzepesvári TheChineseUniversityofHongKong

References iv

Shi Zong, Hao Ni, Kenny Sung, Nan Rosemary Ke, Zheng Wen, andBranislav Kveton.Cascading bandits for large-scale recommendation problems.In Proceedings of the Thirty-Second Conference on Uncertainty inArtificial Intelligence, pages 835–844. AUAI Press, 2016.

10