![Page 1: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/1.jpg)
Reinforcement Learning for NLP
Caiming Xiong
Salesforce Research
CS224N/Ling284
![Page 2: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/2.jpg)
Outline
Introduction to Reinforcement Learning
Policy-based Deep RL
Value-based Deep RL
Examples of RL for NLP
![Page 3: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/3.jpg)
Many Faces of RL
By David Silver
![Page 4: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/4.jpg)
What is RL?
● RL is a general-purpose framework for sequential decision-making
● Usually described as an agent interacting with an unknown environment
● Goal: select actions to maximize future cumulative reward
Agent → Environment: Action 𝑎
Environment → Agent: Reward 𝑟, Observation 𝑜
![Page 5: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/5.jpg)
Motor Control
● Observations: images from camera, joint angles
● Actions: joint torques
● Rewards: navigate to target location, serve and protect humans
![Page 6: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/6.jpg)
Business Management
● Observations: current inventory levels and sales history
● Actions: number of units of each product to purchase
● Rewards: future profit
Similarly, there are also resource allocation and routing problems.
![Page 7: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/7.jpg)
Games
![Page 8: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/8.jpg)
State
● Experience is a sequence of observations, actions, rewards
● The state is a summary of experience
![Page 9: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/9.jpg)
RL Agent
Major components:
● Policy: agent’s behavior function
● Value function: how good each state and/or action is
● Model: agent’s prediction/representation of the environment
![Page 10: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/10.jpg)
Policy
A function that maps from state to action:
● Deterministic policy: $a = \pi(s)$
● Stochastic policy: $\pi(a \mid s) = P[a \mid s]$
![Page 11: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/11.jpg)
Value Function
● Q-value function gives expected future total reward
○ from state and action (𝑠, 𝑎)
○ under policy 𝜋
○ with discount factor 𝛾 ∈ (0,1)
○ Shows how good the current policy is
● Value functions can be defined using the Bellman equation
● Bellman backup operator
$$B^\pi Q(s, a) = \mathbb{E}_{s', a'}\left[r + \gamma Q^\pi(s', a') \mid s, a\right]$$
![Page 12: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/12.jpg)
Value Function
● For the optimal Q-value function $Q^*(s, a) = \max_\pi Q^\pi(s, a)$, the policy function is deterministic and the Bellman equation becomes:
$$B^\pi Q(s, a) = \mathbb{E}_{s'}\left[r + \gamma \max_{a'} Q^\pi(s', a') \mid s, a\right]$$
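To make the max-backup concrete, here is a minimal sketch of repeatedly applying it on a tiny table. The two-state MDP, its rewards, and the names `TRANSITIONS`, `GAMMA`, and `q_value_iteration` are all hypothetical, and transitions are assumed deterministic so the expectation over 𝑠′ is trivial.

```python
# Hypothetical 2-state, 2-action MDP with deterministic transitions.
# TRANSITIONS[state][action] = (reward, next_state)
TRANSITIONS = {
    0: {0: (0.0, 0), 1: (1.0, 1)},
    1: {0: (0.0, 0), 1: (2.0, 1)},
}
GAMMA = 0.9  # discount factor in (0, 1)


def bellman_optimality_backup(q):
    """One sweep of Q(s, a) <- r + gamma * max_a' Q(s', a')."""
    new_q = {}
    for s, actions in TRANSITIONS.items():
        new_q[s] = {}
        for a, (r, s_next) in actions.items():
            new_q[s][a] = r + GAMMA * max(q[s_next].values())
    return new_q


def q_value_iteration(num_sweeps=500):
    """Start from all-zero Q-values and apply the backup repeatedly."""
    q = {s: {a: 0.0 for a in acts} for s, acts in TRANSITIONS.items()}
    for _ in range(num_sweeps):
        q = bellman_optimality_backup(q)
    return q


q_star = q_value_iteration()
# The greedy (deterministic) policy picks the highest-valued action per state.
greedy_policy = {s: max(q_star[s], key=q_star[s].get) for s in q_star}
```

In this toy problem the greedy policy always takes action 1, which illustrates why the optimal policy can be read off deterministically from the Q-table.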
![Page 13: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/13.jpg)
What is Deep RL?
● Use a deep neural network to approximate
○ Policy
○ Value function
○ Model
● Optimized by SGD
![Page 14: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/14.jpg)
Approaches
● Policy-based Deep RL
● Value-based Deep RL
● Model-based Deep RL
![Page 15: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/15.jpg)
Deep Policy Network
● Represent the policy by a deep neural network that solves
$$\max_\theta \; \mathbb{E}_{a \sim p(a \mid \theta, s)}\left[r(a) \mid \theta, s\right]$$
● Ideas: given a bunch of trajectories,
○ Make the good trajectories/actions more probable
○ Push the actions towards good actions
![Page 16: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/16.jpg)
Policy Gradient
How to make high-reward actions more likely:
![Page 17: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/17.jpg)
● Let's say that 𝑟(𝑎) measures how good the sample is.
● Moving in the direction of the gradient pushes up the probability of the sample, in proportion to how good it is.
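The standard score-function identity behind this idea is that the gradient of E[𝑟(𝑎)] equals E[𝑟(𝑎) ∇ log p(𝑎|𝜃)]. A minimal sketch on a three-armed bandit with a softmax policy follows; the reward values, learning rate, and function names are hypothetical, and no baseline is used.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(rewards, steps=2000, lr=0.1, seed=0):
    """Single-sample policy-gradient ascent on a softmax policy over arms."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choices(range(len(rewards)), weights=probs)[0]
        r = rewards[a]  # r(a): how good the sampled action is
        for k in range(len(theta)):
            # d/d theta_k of log softmax(theta)[a] = 1[k == a] - probs[k]
            grad_log_p = (1.0 if k == a else 0.0) - probs[k]
            theta[k] += lr * r * grad_log_p
    return softmax(theta)


# Hypothetical per-arm rewards; arm 1 is the best one.
final_probs = reinforce_bandit([0.1, 1.0, 0.3])
```

Each update raises the log-probability of the sampled arm by an amount proportional to its reward, so probability mass drifts toward the highest-reward arm.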
![Page 18: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/18.jpg)
Deep Q-Learning
● Represent the value function by a Q-network
![Page 19: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/19.jpg)
Deep Q-Learning
● Optimal Q-values should obey the Bellman equation
● Treat the right-hand side as the target; given (𝑠, 𝑎, 𝑟, 𝑠′), optimize the MSE loss via SGD:
● Converges to the optimal Q when using a table lookup representation
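For reference, the two relations these bullets point to can be written as below. This is the standard Q-learning form, written here under the assumption that the Q-network has weights $w$ (a symbol not used elsewhere in the slides).

```latex
% Bellman optimality equation obeyed by the optimal Q-values
Q^*(s, a) = \mathbb{E}_{s'}\left[ r + \gamma \max_{a'} Q^*(s', a') \mid s, a \right]

% Squared-error loss for one transition (s, a, r, s'), with the
% right-hand side treated as the regression target
L(w) = \left( r + \gamma \max_{a'} Q(s', a'; w) - Q(s, a; w) \right)^2
```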
![Page 20: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/20.jpg)
Deep Q-Learning
But diverges using neural networks due to:
● Correlations between samples● Non-stationary targets
![Page 21: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/21.jpg)
Deep Q-Learning
Experience Replay: to remove correlations, build a data-set from the agent's own experience
● Sample experiences from the data-set and apply the update
● To deal with non-stationarity, the target parameters are held fixed
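A minimal sketch of these two tricks, assuming a tabular stand-in for the frozen target network. The class `ReplayBuffer`, the function `td_targets`, and the `(s, a, r, s_next, done)` tuple layout are illustrative choices, not taken from the slides.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_targets(batch, target_q, gamma=0.99):
    """Compute r + gamma * max_a' Q_target(s', a') for each transition.

    target_q maps a state to a list of action values and plays the role of
    the frozen target parameters: it is not updated while the online
    Q-function is being trained on these targets.
    """
    return [
        r if done else r + gamma * max(target_q[s_next])
        for (_, _, r, s_next, done) in batch
    ]
```

In a full training loop the frozen copy would be refreshed from the online parameters only every so often, which keeps the regression targets stationary between refreshes.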
![Page 22: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/22.jpg)
Deep Q-Learning in Atari
Network architecture and hyperparameters fixed across all games
By David Silver
![Page 23: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/23.jpg)
By David Silver
![Page 24: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/24.jpg)
If you want to know more about RL, suggested reading:
Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. Second Edition, in progress. MIT Press, Cambridge, MA, 2017.
![Page 25: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/25.jpg)
RL in NLP
● Article summarization
● Question answering
● Dialogue generation
● Dialogue System
● Knowledge-based QA
● Machine Translation
● Text generation
![Page 27: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/27.jpg)
Article Summarization
Text summarization is the process of automatically generating natural language summaries from an input document while retaining the important points.
• extractive summarization
• abstractive summarization
![Page 28: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/28.jpg)
A Deep Reinforced Model for Abstractive Summarization
Paulus et al.
Let $x = \{x_1, x_2, \ldots, x_n\}$ be the sequence of input (article) tokens and $y = \{y_1, y_2, \ldots, y_{n'}\}$ the sequence of output (summary) tokens.
Figure labels: copying a word, generating a word
![Page 29: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/29.jpg)
The maximum-likelihood training objective:
Training uses the teacher forcing algorithm.
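The equation for this slide did not survive extraction. As a hedged reconstruction, the usual teacher-forcing objective minimizes the negative log-likelihood of each ground-truth token given the preceding ground-truth tokens; the symbols $L_{ml}$ and $y^*$ (the reference summary) are assumed notation.

```latex
L_{ml} = -\sum_{t=1}^{n'} \log p\left(y^*_t \mid y^*_1, \ldots, y^*_{t-1}, x\right)
```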
![Page 30: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/30.jpg)
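There is a discrepancy between training and test performance because of:
• exposure bias (at test time the model conditions on its own outputs rather than on ground-truth tokens)
• potentially valid summaries (many summaries that differ from the reference can still be correct)
• metric difference (training maximizes likelihood, while evaluation uses a discrete metric)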
![Page 31: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/31.jpg)
Using the reinforcement learning framework, learn a policy that maximizes a specific discrete metric.
Action: $u_t \in \{\text{copy}, \text{generate}\}$ and word $y_t^s$
State: hidden states of the encoder and previous outputs
Reward: ROUGE score
where
$$p(y_t^s \mid y_1^s, \ldots, y_{t-1}^s, x) = p(u_t = \text{copy})\, p(y_t^s \mid y_1^s, \ldots, y_{t-1}^s, x, u_t = \text{copy}) + p(u_t = \text{generate})\, p(y_t^s \mid y_1^s, \ldots, y_{t-1}^s, x, u_t = \text{generate})$$
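The training loss on this slide was lost in extraction. One common way to train such a policy is a self-critical policy gradient, sketched below under the assumption that the baseline reward comes from a greedily decoded summary; the function name and arguments are hypothetical.

```python
def self_critical_loss(sample_log_probs, sample_reward, baseline_reward):
    """Loss for one sampled summary y^s.

    sample_log_probs: log p(y_t^s | y_1^s, ..., y_{t-1}^s, x) for each step t
    sample_reward:    reward (e.g. ROUGE) of the sampled summary
    baseline_reward:  reward of a baseline summary (assumed: greedy decoding)
    """
    # Minimizing this raises the log-likelihood of the sample when it beats
    # the baseline and lowers it when it does worse.
    return (baseline_reward - sample_reward) * sum(sample_log_probs)


# Hypothetical numbers: the sampled summary scores 0.5, the baseline 0.3.
loss = self_critical_loss([-1.0, -2.0], sample_reward=0.5, baseline_reward=0.3)
```

Subtracting a baseline reward leaves the expected gradient unchanged while reducing its variance, which is why it is a common choice for sequence-level rewards.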
![Page 32: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/32.jpg)
![Page 33: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/33.jpg)
Human readability scores on a random subset of the CNN/Daily Mail test dataset
![Page 35: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/35.jpg)
Text Question Answering
Example from the SQuAD dataset
![Page 36: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/36.jpg)
Text Question Answering
Inputs: passage P and question Q
● Encoder layers (one for P, one for Q): LSTM, GRU
● Attention layer: self-attention, bi-attention, coattention
● Decoder pointer: LSTM + MLP, GRU + MLP
● Loss function layer: cross entropy
![Page 37: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/37.jpg)
DCN+: MIXED OBJECTIVE AND DEEP RESIDUAL COATTENTION FOR QUESTION ANSWERING
Limitations of the cross-entropy loss:
P: “Some believe that the Golden State Warriors team of 2017 is one of the greatest teams in NBA history,…”
Q: “which team is considered to be one of the greatest teams in NBA history”
GT: “the Golden State Warriors team of 2017”
Ans1: “Warriors”
Ans2: “history”
Cross entropy treats both answers as equally wrong, even though Ans1 overlaps with the ground truth and Ans2 does not.
Xiong et al.
![Page 38: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/38.jpg)
To address this, we introduce the F1 score as an extra objective, combined with the traditional cross-entropy loss:
Not necessary for variable length.
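To see why F1 separates the two candidate answers from the Warriors example, here is a minimal sketch of a word-overlap F1. Whitespace tokenization with lowercasing is a simplifying assumption, and `token_f1` is an illustrative name.

```python
from collections import Counter


def token_f1(prediction, ground_truth):
    """Word-overlap F1 between a predicted span and the ground-truth span."""
    pred_tokens = prediction.lower().split()
    gt_tokens = ground_truth.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both spans.
    overlap = sum((Counter(pred_tokens) & Counter(gt_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)


gt = "the Golden State Warriors team of 2017"
f1_ans1 = token_f1("Warriors", gt)  # one of seven ground-truth words matched
f1_ans2 = token_f1("history", gt)   # no overlap at all
```

Here “Warriors” gets partial credit (precision 1, recall 1/7) while “history” gets none, a distinction that exact-position cross entropy cannot make.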
![Page 40: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/40.jpg)
Deep Reinforcement Learning for Dialogue Generation
Li et al.
Goal: generate responses for conversational agents.
The LSTM sequence-to-sequence (SEQ2SEQ) model is one type of neural generation model that maximizes the probability of generating a response given the previous dialogue turn. However, SEQ2SEQ models
• tend to generate highly generic responses
• can get stuck in an infinite loop of repetitive responses
![Page 41: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/41.jpg)
To solve these problems, the model needs to:
• integrate developer-defined rewards that better mimic the true goal of chatbot development
• model the long-term influence of a generated response in an ongoing dialogue
![Page 42: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/42.jpg)
Definitions:
Action: a generated utterance. The action space is infinite since arbitrary-length sequences can be generated.
State: the previous two dialogue turns $[p_i, q_i]$.
Reward: ease of answering, information flow, and semantic coherence.
![Page 43: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/43.jpg)
● Ease of answering: avoid utterances that are likely to be followed by a dull response.
𝑆 is a list of dull responses such as “I don’t know what you are talking about”, “I have no idea”, etc.
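The reward formula on this slide was lost in extraction. A plausible reconstruction is the negative, length-normalized log-likelihood of the dull responses given the action 𝑎; the symbols $N_S$ (size of the list 𝑆) and $N_s$ (number of tokens in response $s$) are assumed notation.

```latex
r_1 = -\frac{1}{N_S} \sum_{s \in S} \frac{1}{N_s} \log p_{\text{seq2seq}}(s \mid a)
```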
![Page 44: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/44.jpg)
● Information Flow: penalize semantic similarity between consecutive turns from the same agent.
Here $h_{p_i}$ and $h_{p_{i+1}}$ denote representations obtained from the encoder for two consecutive turns $p_i$ and $p_{i+1}$.
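One way to penalize similarity between consecutive turns is the negative log of the cosine similarity of their encoder representations. The sketch below assumes that form; the clipping constant `eps` is an added safeguard for non-positive similarities, and the vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def information_flow_reward(h_prev, h_next, eps=1e-8):
    """Assumed form: r_2 = -log cos(h_prev, h_next), clipped below at eps."""
    cos = cosine_similarity(h_prev, h_next)
    return -math.log(max(cos, eps))


# Identical representations earn no reward; less similar ones earn more.
r_same = information_flow_reward([1.0, 0.0], [1.0, 0.0])
r_diff = information_flow_reward([1.0, 0.0], [1.0, 1.0])
```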
![Page 45: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/45.jpg)
● Semantic Coherence: avoid situations in which the generated replies are highly rewarded but are ungrammatical or incoherent
● The final reward for action 𝑎 is a weighted sum of the three rewards
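Written out, the weighted sum has the form below, where $r_1$, $r_2$, $r_3$ stand for ease of answering, information flow, and semantic coherence. The weight symbols $\lambda_1, \lambda_2, \lambda_3$ and the constraint that they sum to 1 are assumptions about notation; no specific values are implied.

```latex
r(a, [p_i, q_i]) = \lambda_1 r_1 + \lambda_2 r_2 + \lambda_3 r_3,
\qquad \lambda_1 + \lambda_2 + \lambda_3 = 1
```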
![Page 46: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/46.jpg)
● Simulate two agents taking turns to explore the state-action space and learn a policy
○ Supervised learning for Seq2Seq models
○ Mutual Information for pretraining policy model
○ Dialogue Simulation between Two Agents
![Page 48: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/48.jpg)
● Mutual information between the previous sequence 𝑆 and the response 𝑇
● MMI objective
𝜆: controls the penalization of generic responses
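The objective itself did not survive extraction. A commonly used form of the maximum mutual information (MMI) objective, given here as a hedged reconstruction, subtracts a weighted language-model score so that responses that are likely regardless of the input are penalized.

```latex
\hat{T} = \arg\max_{T} \left\{ \log p(T \mid S) - \lambda \log p(T) \right\}
```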
![Page 49: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/49.jpg)
Taking 𝑆 to be $(q_i, p_i)$ and 𝑇 to be 𝑎, we have:
![Page 51: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/51.jpg)
○ Dialogue Simulation between Two Agents
• Using the simulated turns and rewards, maximize the expected future reward.
• Training trick: curriculum learning
![Page 52: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/52.jpg)
![Page 53: ReinforcementLearning for NLP · Examples of RL for NLP. Many Faces of RL By David Silver. What is RL? RL is a general-purpose framework for sequential decision-making Usually describe](https://reader033.vdocuments.mx/reader033/viewer/2022041717/5e4c26de5aa7a902a819cef5/html5/thumbnails/53.jpg)
Summary
● Introduction to Reinforcement Learning
● Deep Policy Learning
● Deep Q-Learning
● Applications to NLP
○ Article summarization
○ Question answering
○ Dialogue generation