
Advances of Deep & Reinforcement Learning on Recommender Systems

Weinan Zhang
Shanghai Jiao Tong University

http://wnzhang.net

Jan. 06, 2020 at Tsinghua University

Content

• A brief review of recommender system road map

• Deep learning for recommender systems

• Deep reinforcement learning for recommender systems

• Summary

Road Map of Recommendation Technique

2000-2006 Neighborhood based collaborative filtering

2007-2009 Matrix factorization and variants

2010-2015 Factorization machine and variants

2015-2017 Deep neural networks for user behavior prediction

2017-2019 Deep reinforcement learning for decision making

Matrix Factorization Techniques

Koren, Yehuda, Robert Bell, and Chris Volinsky. "Matrix factorization techniques for recommender systems." Computer 42.8 (2009).

$$\hat{r}_{u,i} = \mu + b_u + b_i + p_u^\top q_i$$

• $\mu$: global bias
• $b_u$: user bias
• $b_i$: item bias
• $p_u^\top q_i$: user-item interaction
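As a minimal sketch of the biased matrix factorization prediction above, the four terms can be summed directly. The function name and all numbers are hypothetical, chosen only for illustration.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def predict_rating(mu, b_u, b_i, p_u, q_i):
    """Biased matrix factorization: r_hat = mu + b_u + b_i + p_u^T q_i."""
    return mu + b_u + b_i + dot(p_u, q_i)

# Hypothetical values, not taken from the talk.
mu = 3.5               # global bias (e.g. the average rating)
b_u, b_i = 0.2, -0.1   # user bias and item bias
p_u = [0.5, 1.0]       # user latent factors
q_i = [1.0, 0.5]       # item latent factors

r_hat = predict_rating(mu, b_u, b_i, p_u, q_i)
```

Only the last term depends on the user and the item jointly; the three bias terms absorb effects that do not need latent factors.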

Factorization Machine

• Incorporate all possible information for recommender systems
• One-hot encoding for each discrete (categorical) field
• One real-value feature for each continuous field
• All features are with latent factors
• A more general regression model

Steffen Rendle. Factorization Machines. ICDM 2010. (10-year Best Paper)
http://www.ismll.uni-hildesheim.de/pub/pdfs/Rendle2010FM.pdf
Open source: http://www.libfm.org/
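The standard second-order factorization machine scores an input as w0 + Σ_i w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j. The sketch below assumes that form and checks the usual linear-time rewrite of the pairwise term against a brute-force loop. The feature layout and all parameter values are hypothetical.

```python
def fm_predict(x, w0, w, V):
    """Second-order FM score in O(k * n) time.

    x: feature values, w0: global bias, w: linear weights,
    V: one latent vector (length k) per feature.
    Uses sum_{i<j} <v_i, v_j> x_i x_j
       = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2].
    """
    linear = sum(wi * xi for wi, xi in zip(w, x))
    k = len(V[0])
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise

def fm_predict_naive(x, w0, w, V):
    """Same score, enumerating every feature pair explicitly (for checking)."""
    n = len(x)
    score = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for i in range(n):
        for j in range(i + 1, n):
            score += sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
    return score

# Hypothetical input: two one-hot categorical fields (2 values each)
# followed by one continuous field.
x = [1.0, 0.0, 0.0, 1.0, 0.5]
w0 = 0.1
w = [0.2, -0.1, 0.3, 0.05, 0.4]
V = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5], [-0.2, 0.4], [0.6, 0.1]]

fast = fm_predict(x, w0, w, V)
slow = fm_predict_naive(x, w0, w, V)
```

Because every feature carries a latent vector, an interaction weight exists for feature pairs that never co-occurred in training, which is what makes the model usable on sparse one-hot data.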

Factorization Machine is a Neural Network: A New Perspective





Factorization-machine Neural Networks (FNN)

[Factorization Machine Initialized]

Weinan Zhang et al. Deep Learning over Multi-Field Categorical Data: A Case Study on User Response Prediction. ECIR 2016

But factorization machine is still different from common additive neural networks!

Product operation

Product Operations as Feature Interactions

Yanru Qu, Weinan Zhang et al. Product-based Neural Networks for User Response Prediction. ICDM 2016

Product-based Neural Network (PNN)

• Blue Pi nodes are product operators

[Figure: PNN architecture. Feature 1 ... Feature N → Embedding Layer (Embed 1 ... Embed N) → Product Layer (P1, P2, ..., Pi) → Fully Connected Layers → Prediction]

Yanru Qu, Weinan Zhang et al. Product-based Neural Networks for User Response Prediction. ICDM 2016
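A minimal sketch of the layer order in the PNN slide, assuming the inner product as the product operator and a single hidden layer. The weights are random and untrained, and all names and sizes are hypothetical rather than the authors' implementation.

```python
import math
import random
from itertools import combinations

def product_layer(embeddings):
    """Return the embeddings flattened, followed by all pairwise inner products."""
    linear_part = [v for emb in embeddings for v in emb]
    product_part = [
        sum(a * b for a, b in zip(ei, ej))
        for ei, ej in combinations(embeddings, 2)
    ]
    return linear_part + product_part

def dense(x, W, b, activation):
    """Fully connected layer: activation(W x + b)."""
    return [activation(sum(w * v for w, v in zip(row, x)) + bias)
            for row, bias in zip(W, b)]

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical embeddings for three fields (dimension 2).
embeddings = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
features = product_layer(embeddings)   # 6 embedding values + 3 pair products

rng = random.Random(0)                 # random, untrained weights
W1 = [[rng.uniform(-1, 1) for _ in features] for _ in range(4)]
W2 = [[rng.uniform(-1, 1) for _ in range(4)]]
hidden = dense(features, W1, [0.0] * 4, relu)
prediction = dense(hidden, W2, [0.0], sigmoid)[0]
```

The product layer is the only non-additive step: everything after it is an ordinary multilayer perceptron.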

DeepFM

Huifeng Guo, Ruiming Tang and Xiuqiang He et al. DeepFM: A Factorization-Machine based Neural Network for CTR Prediction. IJCAI 2017.

Attentional Factorization Machines

• Basic idea: reweight each field-pair interaction with an attention network

Jun Xiao et al. Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks. IJCAI 2017.

element-wise product of two vectors
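The reweighting idea can be sketched as follows: each pair's element-wise product is scored by a small attention network, the scores are normalized with a softmax, and the products are summed with those weights. The one-hidden-layer ReLU attention network and every parameter value here are assumptions for illustration.

```python
import math
from itertools import combinations

def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def afm_pooling(embeddings, W, b, h):
    """Attention-weighted sum of element-wise products over all embedding pairs."""
    interactions = [[a * c for a, c in zip(ei, ej)]
                    for ei, ej in combinations(embeddings, 2)]
    scores = []
    for inter in interactions:
        # Attention network: score = h^T ReLU(W * interaction + b)
        hidden = [max(0.0, sum(w * v for w, v in zip(row, inter)) + bias)
                  for row, bias in zip(W, b)]
        scores.append(sum(hv * hd for hv, hd in zip(h, hidden)))
    weights = softmax(scores)
    dim = len(embeddings[0])
    pooled = [sum(wt * inter[d] for wt, inter in zip(weights, interactions))
              for d in range(dim)]
    return weights, pooled

# Hypothetical embeddings and attention parameters.
embeddings = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
h = [1.0, 1.0]
weights, pooled = afm_pooling(embeddings, W, b, h)
```

With uniform weights this pooling reduces to the plain sum of pairwise products, so the attention network only changes how much each pair contributes.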

Field-aware Interaction

• In FFM, this interaction is implemented with field-aware embeddings
• Substitute with unified embedding and “field-aware parameter”

[Figure: two interaction units. F1 * Kernel * F2; F1, F2 → FC layer → Score]

• Network-in-Network (NFM, PIN)
  • Generalize interaction to any functions with sub-networks
• Kernel Interaction (KFM, KPNN)
  • Use different kernels to project interactions separately

Yanru Qu, Weinan Zhang, Ruiming Tang, Xiuqiang He et al. Product-based Neural Networks for User Response Prediction over Multi-field Categorical Data. TOIS 2018.
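One way to read the kernel interaction is as a bilinear form f1ᵀ K f2 with a separate kernel matrix K per field pair. The sketch below assumes that reading; the kernel values are hypothetical.

```python
def kernel_product(f1, kernel, f2):
    """Bilinear interaction f1^T K f2 for one field pair."""
    return sum(f1[i] * kernel[i][j] * f2[j]
               for i in range(len(f1)) for j in range(len(f2)))

f1, f2 = [1.0, 2.0], [3.0, 4.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
pair_kernel = [[0.5, 1.0], [0.0, 2.0]]   # hypothetical kernel for this field pair

plain = kernel_product(f1, identity, f2)         # reduces to the inner product
projected = kernel_product(f1, pair_kernel, f2)  # pair-specific interaction
```

With the identity kernel the result is the ordinary inner product used by FM, so the kernel is the "field-aware parameter" that lets a single unified embedding interact differently with each other field.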

Product-network In Network (PIN)

• We can design various sub-networks to explore the interaction patterns between two fields

[Figure: PIN architecture. Feature 1 ... Feature N → Embedding Layer (Embed 1 ... Embed N) → Sub-net 1, Sub-net 2, ..., Sub-net i → Fully Connected Layers → Prediction. Sub-net detail: F1, F2 and F1*F2 → FC layer → Hidden State]
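The sub-net detail in the figure can be sketched as a single fully connected layer over the concatenation of the two field embeddings and their element-wise product. The ReLU activation, layer size, and weights are assumptions for illustration.

```python
def pair_subnet(f1, f2, W, b):
    """FC layer with ReLU over the concatenation [f1, f2, f1*f2]."""
    product = [a * c for a, c in zip(f1, f2)]
    inputs = f1 + f2 + product
    return [max(0.0, sum(w * v for w, v in zip(row, inputs)) + bias)
            for row, bias in zip(W, b)]

# Hypothetical embeddings and weights (2 hidden units, 6 inputs).
f1, f2 = [1.0, 2.0], [0.5, -1.0]
W = [[0.1] * 6, [-0.1] * 6]
b = [0.0, 0.0]
hidden_state = pair_subnet(f1, f2, W, b)
```

Each field pair gets its own sub-net, and the hidden states of all pairs are concatenated before the shared fully connected layers.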


Public Data Experiment Performance


PIN achieved the best performance on well-recognized benchmarks and Huawei’s private dataset.





From PCs to Mobiles

• Users only provide feedback on the recommended items, which depend on the current recommendation algorithm

• Learning from interactions with users

Reinforcement Learning

• At each step t, the agent
  • Receives observation O_t
  • Executes action A_t
  • Receives scalar reward R_t

• The environment
  • Receives action A_t
  • Emits observation O_{t+1}
  • Emits scalar reward R_{t+1}

• t increments at environment step

[Figure: agent-environment interaction loop]

Learning from interaction: Given the current situation, what to do next in order to maximize utility?
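The agent-environment protocol can be sketched as a simple loop. The toy environment (a user who clicks each item with a fixed probability) and the random agent are hypothetical stand-ins, not part of the talk.

```python
import random

class ToyRecEnv:
    """Hypothetical environment: the user clicks item i with a fixed probability."""
    def __init__(self, click_probs, seed=0):
        self.click_probs = click_probs
        self.rng = random.Random(seed)

    def reset(self):
        return None  # initial observation: nothing has been shown yet

    def step(self, action):
        reward = 1.0 if self.rng.random() < self.click_probs[action] else 0.0
        observation = action  # next observation: the item just shown
        return observation, reward

class RandomAgent:
    """Picks an item uniformly at random, ignoring the observation."""
    def __init__(self, n_actions, seed=0):
        self.n_actions = n_actions
        self.rng = random.Random(seed)

    def act(self, observation):
        return self.rng.randrange(self.n_actions)

def run_episode(env, agent, n_steps):
    observation = env.reset()
    total_reward = 0.0
    for t in range(n_steps):
        action = agent.act(observation)         # agent executes A_t
        observation, reward = env.step(action)  # env emits O_{t+1} and R_{t+1}
        total_reward += reward
    return total_reward

total = run_episode(ToyRecEnv([0.1, 0.5, 0.9]), RandomAgent(3), n_steps=20)
```

A learning agent would replace `RandomAgent.act` with a policy that is updated from the observed rewards, which is what the methods in the following slides do.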

Deep RL for Recommender Systems

• Methodologies

• Policy-based solutions
  • Policy gradient
  • Deep deterministic policy gradient / actor-critic

• Value-based solutions
  • Deep Q-learning

Policy-based works

• Reinforcement Learning to Rank with Markov Decision Process. SIGIR 2017. (ICT)

• Deep Reinforcement Learning in Large Discrete Action Spaces. arXiv 2015. (DeepMind)

• Deep Reinforcement Learning for Whole-Chain Recommendations. WSDM 2020. (JD)

• Large-scale Interactive Recommendation with Tree-structured Policy Gradient. AAAI 2019. (SJTU)

Reinforcement Learning to Rank with Markov Decision Process

• State: the state s_t is defined as a pair [t, X_t], where X_t is the set of remaining documents to rank

• Action: a_t ∈ A(s_t) selects a document x_{m(a_t)} ∈ X_t for ranking position t+1

• Transition:

• Reward:

• Model how to select the next item in the ranking list as an MDP

Zeng Wei et al. Reinforcement Learning to Rank with Markov Decision Process. SIGIR 2017.

State

Action



REINFORCE Policy Gradient
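A minimal sketch of REINFORCE on the ranking MDP above, assuming a linear softmax policy over the remaining documents. The reward is not recoverable from the slide text, so a DCG-style gain is used here purely as an assumption; all names and data are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample_ranking(w, docs, rng):
    """Roll out one episode: pick a document for each position from those remaining."""
    remaining = list(range(len(docs)))   # X_t, stored as document indices
    trajectory = []                      # (remaining documents at step t, chosen document)
    while remaining:
        probs = softmax([dot(w, docs[d]) for d in remaining])
        choice = rng.choices(remaining, weights=probs)[0]
        trajectory.append((list(remaining), choice))
        remaining.remove(choice)
    return trajectory

def reinforce_update(w, docs, labels, trajectory, lr):
    """One REINFORCE step: w += lr * G_t * grad log pi(a_t | s_t), summed over t."""
    # Assumed reward: DCG-style gain of the document placed at position t.
    rewards = [(2 ** labels[d] - 1) / math.log2(t + 2)
               for t, (_, d) in enumerate(trajectory)]
    new_w = list(w)
    for t, (remaining, chosen) in enumerate(trajectory):
        G = sum(rewards[t:])             # undiscounted return from step t
        probs = softmax([dot(w, docs[d]) for d in remaining])
        for k in range(len(w)):
            expected = sum(p * docs[d][k] for p, d in zip(probs, remaining))
            new_w[k] += lr * G * (docs[chosen][k] - expected)
    return new_w

# Hypothetical query with two documents; document 0 is the relevant one.
docs = [[1.0, 0.0], [0.0, 1.0]]
labels = [1, 0]
rng = random.Random(0)
w = [0.0, 0.0]
for _ in range(200):
    w = reinforce_update(w, docs, labels, sample_ranking(w, docs, rng), lr=0.1)
```

The gradient of the log-probability for a linear softmax policy is the chosen document's features minus their expectation under the policy, which is what the inner loop computes.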



Ranking accuracies on MQ2007 dataset

Deep Reinforcement Learning in Large Discrete Action Spaces

• Algorithm

Gabriel Dulac-Arnold et al. Deep Reinforcement Learning in Large Discrete Action Spaces. arXiv:1512.07679, 2015.

Argmax on KNN
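The "argmax on KNN" step can be sketched as: the actor outputs a continuous proto-action, the k nearest item embeddings are retrieved, and the critic's Q-value picks the best of those k. The embeddings, Q-values, and function name below are hypothetical, and the state is held fixed for brevity.

```python
def knn_argmax(proto_action, item_embeddings, q_value, k):
    """Return the index of the highest-Q item among the k nearest to the proto-action."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    neighbours = sorted(
        range(len(item_embeddings)),
        key=lambda i: sq_dist(proto_action, item_embeddings[i]),
    )[:k]
    return max(neighbours, key=q_value)

# Hypothetical item embeddings and critic values for one fixed state.
item_embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
q_table = {0: 0.2, 1: 0.1, 2: 0.7, 3: 0.9}
proto_action = [0.9, 0.1]

best_k3 = knn_argmax(proto_action, item_embeddings, q_table.get, k=3)
best_k1 = knn_argmax(proto_action, item_embeddings, q_table.get, k=1)
```

Only k items are evaluated by the critic, so the cost no longer grows with the full action set. Item 3 has the highest Q-value here but is never considered because it lies outside the neighbourhood, which is the trade-off k controls.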

Deep Reinforcement Learning in Large Discrete Action Spaces

• Experiment performance

Gabriel Dulac-Arnold et al. Deep Reinforcement Learning in Large Discrete Action Spaces. arXiv:1512.07679, 2015.

TPGR: Tree Policy Gradient RecSys for Handling Large-Scale Discrete Actions

• There are a large number of candidate items as actions to take
  • Cause very large computational complexity
  • No previous literature on this topic
  • No previous application with such a setting

• TPGR solution: building hierarchical item structure for sequential decision making

Haokun Chen et al. Large-scale Interactive Recommendation with Tree-structured Policy Gradient. AAAI 2019.

TPGR: Tree Policy Gradient RecSys for Handling Large-Scale Discrete Actions

• Item correlation is based on current policy
• Policy (and value function) can be regarded as a table

          Action 1  Action 2  Action 3  Action 4
State 1   0.1       0.3       0.2       0.4
State 2   0.4       0.3       0.1       0.2
State 3   0.1       0.1       0.3       0.5
State 4   0.4       0.2       0.2       0.2
State 5   0.2       0.3       0.3       0.2
State 6   0.1       0.1       0.6       0.2

• Based on such a table, we can cluster the items into a hierarchy

Haokun Chen et al. Large-scale Interactive Recommendation with Tree-structured Policy Gradient. AAAI 2019.
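Once items sit at the leaves of a hierarchy, choosing an item becomes a sequence of small decisions from root to leaf. The sketch below assumes a balanced tree with a fixed branching factor and a per-node distribution over children. The numbers are hypothetical and were picked so that the leaf probabilities reproduce the "State 1" row of the table.

```python
import random

def sample_item(node_policies, branching, depth, rng):
    """Walk from the root to a leaf; return (item index, probability of that path)."""
    node, prob = 0, 1.0                      # node index within its level
    for level in range(depth):
        probs = node_policies[level][node]   # distribution over this node's children
        child = rng.choices(range(branching), weights=probs)[0]
        prob *= probs[child]
        node = node * branching + child
    return node, prob

def item_probability(node_policies, branching, depth, item):
    """Probability of reaching `item`: product of the choices along its path."""
    digits = []
    for _ in range(depth):                   # decode the path, last choice first
        digits.append(item % branching)
        item //= branching
    digits.reverse()
    node, prob = 0, 1.0
    for level, child in enumerate(digits):
        prob *= node_policies[level][node][child]
        node = node * branching + child
    return prob

# Hypothetical tree: depth 2, branching 2, so 4 leaf items.
node_policies = [
    [[0.4, 0.6]],                            # root
    [[0.25, 0.75], [1.0 / 3.0, 2.0 / 3.0]],  # its two children
]
leaf_probs = [item_probability(node_policies, 2, 2, i) for i in range(4)]
item, path_prob = sample_item(node_policies, 2, 2, random.Random(0))
```

Each decision scores only `branching` children, so picking an item costs `depth * branching` evaluations instead of `branching ** depth`.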

Model-based RL for RecSys

• Motivations: model-free deep RL methods
  • Consume huge amounts of data (low sample efficiency)
  • Suffer from sparse positive feedback (sparse reward)

Xiangyu Zhao et al. Deep Reinforcement Learning for Whole-Chain Recommendations. WSDM 2020.

Actor Critic

• Estimate the value of an action in different scenarios
  • Entrance/detail page

[Figure: user behaviors per page. Entrance page: skip, click, leave. Item detail page: skip, click, leave]

Build predictive models to estimate user behaviors: skip/click/leave

Xiangyu Zhao et al. Deep Reinforcement Learning for Whole-Chain Recommendations. WSDM 2020.

Critic network

Deep RL for Recommender Systems

• Methodologies

• Policy-based solutions
  • Policy gradient
  • Deep deterministic policy gradient

• Value-based solutions
  • Deep Q-learning

DQN-based works

• Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning. KDD 2018. (JD)

• DRN: A Deep Reinforcement Learning Framework for News Recommendation. WWW 2018. (MSR)

• Neural Network based Reinforcement Learning for Real-time Pushing on Text Stream. SIGIR 2017. (PolyU)

• Interactive Recommender System via Knowledge Graph-enhanced Reinforcement Learning. 2020. (SJTU)

Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning

feed positive input and negative input separately

Xiangyu Zhao et al. Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning. KDD 2018.

Problem: how to effectively represent the user state?

Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning

maximizing the difference of Q-values between enemy items

enemy items

Xiangyu Zhao et al. Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning. KDD 2018.

DRN: A Deep Reinforcement Learning Framework for News Recommendation

• How to train an RL policy online and offline?

Guanjie Zheng et al. DRN: A Deep Reinforcement Learning Framework for News Recommendation. WWW 2018.

DRN: A Deep Reinforcement Learning Framework for News Recommendation

• Use user and item features to represent Q(s,a)

Dueling Q-network
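A generic dueling decomposition splits Q(s, a) into a state value and an action advantage. The sketch below uses the common mean-subtracted form with linear functions of user (state) and item (action) features. This is an assumption for illustration and may differ from the exact DRN architecture; all weights and features are hypothetical.

```python
def linear(weights, features):
    return sum(w * f for w, f in zip(weights, features))

def dueling_q_values(user_feats, candidate_item_feats, w_value, w_adv):
    """Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    value = linear(w_value, user_feats)                  # V(s): state features only
    advantages = [linear(w_adv, user_feats + item)       # A(s, a): state + action features
                  for item in candidate_item_feats]
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

# Hypothetical user features, three candidate items, and weights.
user_feats = [1.0, 0.0]
candidate_item_feats = [[1.0], [0.0], [0.5]]
w_value = [2.0, 1.0]
w_adv = [0.1, 0.1, 1.0]

q_values = dueling_q_values(user_feats, candidate_item_feats, w_value, w_adv)
best_item = q_values.index(max(q_values))
```

Subtracting the mean advantage keeps the two streams identifiable: the average Q-value over the candidates equals V(s).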


Exploration by Dueling Bandit Gradient Descent

• Trial-and-update learning (somewhat like evolutionary search)
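The trial-and-update idea can be sketched as: perturb the current parameters, compare the perturbed model against the current one, and move toward the perturbation only if it wins. The multiplicative perturbation, the update rule, and the `feedback` function (a stand-in for comparing user feedback on the two models) are simplifying assumptions.

```python
import random

def dbgd_step(weights, feedback, alpha, eta, rng):
    """One trial-and-update step: perturb, compare, and move toward the winner."""
    delta = [alpha * rng.uniform(-1.0, 1.0) * w for w in weights]
    explore = [w + d for w, d in zip(weights, delta)]
    if feedback(explore) > feedback(weights):
        return [w + eta * d for w, d in zip(weights, delta)]
    return list(weights)   # keep the current parameters unchanged

# Hypothetical feedback: higher when the weights are closer to a hidden target.
target = [1.0, -2.0, 0.5]

def feedback(weights):
    return -sum((w - t) ** 2 for w, t in zip(weights, target))

rng = random.Random(0)
initial = [0.5, -0.5, 2.0]
weights = list(initial)
for _ in range(300):
    weights = dbgd_step(weights, feedback, alpha=0.2, eta=0.5, rng=rng)
```

No gradient of the feedback is needed, only a pairwise comparison, which is why the slide likens it to evolutionary search.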


DRN: A Deep Reinforcement Learning Framework for News Recommendation


Neural Network based Reinforcement Learning for Real-time Pushing on Text Stream

• Observations: previous interactions

• States: Hidden layer of an LSTM

• Actions: push or not

Haihui Tan et al. Neural Network based Reinforcement Learning for Real-time Pushing on Text Stream. SIGIR 2017.

KGQR: Leveraging Knowledge Graphs for Better Item & State Representation

Sijin Zhou et al. Interactive Recommender System via Knowledge Graph-enhanced Reinforcement Learning. In submission 2020.

Summary of Current RL Solutions for Rec.

• State: Weak user profile representation
• Action: Unable to handle large-scale discrete action spaces well
• Learning: Off-policy model-free RL to avoid data bias and user modeling
• System: Lack of online experiments or long time tuning; online/offline learning combination
• Data efficiency is quite low

• Modeling user dynamics would be a promising direction
• Efficient state/action representation





Summary: Road Map of Recommendation Technique

2000-2006 Neighborhood based collaborative filtering

2007-2009 Matrix factorization and variants

2010-2015 Factorization machine and variants

2015-2017 Deep neural networks for user behavior prediction

2017-2019 Deep reinforcement learning for decision making

Design neural nets to automatically capture complex interaction patterns in user-item data

Design RL settings for sequential recommendation decision making; train policies in an effective way

Thank You! Questions?

Dr. Weinan Zhang
Assistant Professor
APEX Data & Knowledge Management Lab
John Hopcroft Center for Computer Science
Shanghai Jiao Tong University

Know more about me at http://wnzhang.net
