www.hhu.de
USER SIMULATION FOR DIALOGUE SYSTEMS
Hsien-Chin Lin, 22 Nov 2019
Why do we need a simulated user (SU)?
[Figure: dialogue system pipeline (Natural Language Understanding, Belief Tracking, Policy, Natural Language Generation); diagram labels: agent, environment, reward]
Why do we need a simulated user (SU)?
§ RL needs many interactions to learn a policy
§ Learning from real users
  § costly
  § time-consuming
§ Learning from data
  § collecting interactable data is not easy
§ Learning from an SU
For training
Why do we need a simulated user (SU)?
§ Human evaluation
  § costly and time-consuming
  § hard to reproduce
§ Automatic evaluation
  § success rate, rewards, ...
  § NLG metrics: not consistent with human evaluation
§ Evaluation with an SU is easy to reproduce and allows cross-model comparison
For evaluation
Different kinds of user simulation
§ Granularity
  § Semantic level
  § Natural language level
    § template, retrieval, generation
§ Methodology
  § n-gram: bi-gram, graph model, Bayesian model, HMM, ...
  § rule-based: agenda-based
  § data-driven: Seq2Seq, inverse RL, adversarial model, ...
Summarize SU in different aspects
Previous studies
§ N-gram
§ Graph-based
§ Agenda-based
non-DL approaches
Previous studies
§ Bi-gram model $P(a_u \mid a_m)$: the probability of a user action $a_u$ given the latest system action $a_m$
  § only looks at the latest system action
  § cannot produce coherent user behaviour
  § the SU may produce illogical behaviour if the user goal changes
§ Looking at a longer history
  § incorporate the user goal into the user state
  § HMM (Cuayáhuitl et al. 2005), Bayesian model (Pietquin and Dutoit 2009), ...
N-grams SU (Eckert et al. 1997)
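A bi-gram SU of this kind can be sketched in a few lines: count how often each user act follows each system act, then sample the next user act from that conditional distribution. This is only a minimal illustration; the toy corpus, the act names, and the function names are invented here and do not come from Eckert et al.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy corpus of (system act, user act) pairs; not from the slides.
CORPUS = [
    ("request(food)", "inform(food)"),
    ("request(food)", "inform(food)"),
    ("request(food)", "negate()"),
    ("confirm(area)", "affirm()"),
]

def train_bigram(pairs):
    """Estimate P(user_act | system_act) by relative frequency."""
    counts = defaultdict(Counter)
    for system_act, user_act in pairs:
        counts[system_act][user_act] += 1
    model = {}
    for system_act, user_counts in counts.items():
        total = sum(user_counts.values())
        model[system_act] = {a: c / total for a, c in user_counts.items()}
    return model

def sample_user_act(model, system_act, rng=random):
    """Sample a user act conditioned only on the latest system act."""
    dist = model[system_act]
    acts = list(dist)
    weights = [dist[a] for a in acts]
    return rng.choices(acts, weights=weights, k=1)[0]

model = train_bigram(CORPUS)
```

Because the sampler sees only the latest system act, nothing ties consecutive user turns to a common goal, which is the coherence problem listed on this slide.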
Previous studies
§ Models all possible dialogue paths in a network
§ Needs extensive domain knowledge
§ Not practical for complex domains
Graph-based SU (Scheffler and Young, 2000)
§ The user state $S$ is described as an agenda $A$ and a goal $G$
§ Example: [figure]
§ The probabilities can be learned from a corpus or set manually
Rule-based SU
Agenda-based approach (Schatzmann et al. 2007)
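To make the agenda idea concrete, here is a toy sketch in which the user state is a goal plus a stack of pending user acts. The class name, the act strings, and the single hand-written update rule are assumptions for illustration. In the actual agenda-based model the push and pop behaviour is governed by probabilities that are learned from a corpus or set manually.

```python
class AgendaUser:
    """Toy agenda-based user: the state is a goal plus a stack of pending acts."""

    def __init__(self, constraints, requests):
        self.goal = {"constraints": dict(constraints), "requests": list(requests)}
        # The last list element is the top of the agenda stack.
        self.agenda = ["bye()"]
        self.agenda += [f"request({slot})" for slot in reversed(requests)]
        self.agenda += [
            f"inform({s}={v})" for s, v in reversed(list(constraints.items()))
        ]

    def receive(self, system_act):
        """Hypothetical rule: answer a system request by moving the inform act on top."""
        if system_act.startswith("request("):
            slot = system_act[len("request("):-1]
            if slot in self.goal["constraints"]:
                act = f"inform({slot}={self.goal['constraints'][slot]})"
                if act in self.agenda:
                    self.agenda.remove(act)
                self.agenda.append(act)

    def respond(self, n=1):
        """Pop the top n acts as the next user turn (n is fixed here, not sampled)."""
        turn = []
        for _ in range(min(n, len(self.agenda))):
            turn.append(self.agenda.pop())
        return turn

user = AgendaUser({"food": "thai", "area": "north"}, ["phone"])
first_turn = user.respond()
user.receive("request(area)")
second_turn = user.respond(n=2)
```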
Summary of these models
§ Inability to take the dialogue history into account
§ Rigid structure needed to ensure coherent user behavior
§ A lot of manual effort needed to design the rules
§ Domain-dependent
These models suffer from...
Data-driven SU
§ Semantic to semantic
§ Agenda-based combined with Seq2Seq
§ Semantic to utterance
§ Hierarchical Seq2Seq
§ Comparison of different settings
Seq2Seq models
Seq2Seq SU
§ Uniformly sample a goal $G = (C, R)$
  § $C$: constraints, e.g. food type, price range, ...
  § $R$: requests, e.g. name, address, ...
§ The context $c_t$ is the concatenation of
  § $a_{m,t}$: recent machine acts
  § $inconsist_t$: inconsistency
  § $const_t$: constraints status
  § $req_t$: requests status
semantic level (El Asri et al., 2016)
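The concatenation of the four context parts can be sketched as follows. The vocabularies, the binary encoding, and the exact meaning given to each status vector are assumptions made for illustration; the feature layout used by El Asri et al. may differ.

```python
# Hypothetical vocabularies; the real DSTC2 feature layout may differ.
MACHINE_ACTS = ["request", "inform", "confirm", "offer"]
CONSTRAINT_SLOTS = ["food", "pricerange", "area"]
REQUEST_SLOTS = ["name", "address", "phone"]

def one_hot_set(items, vocab):
    """Binary vector marking which vocabulary entries are present."""
    return [1 if v in items else 0 for v in vocab]

def build_context(machine_acts, inconsistent_slots, informed_slots, pending_requests):
    """Concatenate the four parts of the context vector for one turn."""
    a_mt = one_hot_set(machine_acts, MACHINE_ACTS)                   # recent machine acts
    inconsist_t = one_hot_set(inconsistent_slots, CONSTRAINT_SLOTS)  # slots contradicting the goal
    const_t = one_hot_set(informed_slots, CONSTRAINT_SLOTS)          # assumed: constraints already informed
    req_t = one_hot_set(pending_requests, REQUEST_SLOTS)             # assumed: requests still open
    return a_mt + inconsist_t + const_t + req_t

c_t = build_context(
    machine_acts={"request"},
    inconsistent_slots=set(),
    informed_slots={"food"},
    pending_requests={"name", "phone"},
)
```

The sequence of such vectors over the turns of a dialogue is what the encoder consumes.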
Seq2Seq SU
Example of the context vector
Seq2Seq SU
§ Dataset: DSTC2, DSTC3
§ Baselines
  § Bi-gram, agenda-based
§ Sequence-to-one: outputs a probability distribution over a predefined set of compound acts (size: 54)
§ Measurement
  § F-score, with e.g. $\mathrm{precision} = \dfrac{\#\text{ of correctly predicted dialog acts}}{\#\text{ of predicted dialog acts}}$
Experiment
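The measurement can be illustrated with a small scoring function over sets of dialogue acts. Only the precision formula appears on the slide; recall and the harmonic-mean F-score below follow the standard definitions, and the per-turn set comparison and the example acts are assumptions.

```python
def precision_recall_f1(predicted_acts, reference_acts):
    """Score one turn; both arguments are collections of dialogue-act strings."""
    predicted, reference = set(predicted_acts), set(reference_acts)
    correct = len(predicted & reference)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical turn: the simulator predicts two acts, the corpus user produced three.
p, r, f = precision_recall_f1(
    {"inform(food=thai)", "request(phone)"},
    {"inform(food=thai)", "request(address)", "request(phone)"},
)
```

In this example every predicted act is correct, so precision is perfect while recall is penalised for the missed `request(address)`.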
Seq2Seq SU
§ Average F-score over 50 runs
§ Seq2One is slightly better than Seq2Seq because it is an easier task
§ Seq2Seq scales better (the number of possible acts might grow)
§ Recall is relatively low on larger action spaces (54 acts in DSTC2, 94 in DSTC3)
Result
Seq2Seq SU
§ Use the agenda-based model for planning
§ If the dialogue act can be found in the templates, use the template
§ Otherwise, use the Seq2Seq model for NLG
Agenda-based model combined with a Seq2Seq model (Li et al. 2017)
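The template-first, model-second realisation step can be sketched as a simple dispatch. The template table and the `seq2seq_nlg` stand-in are hypothetical placeholders, not the templates or the trained generator from Li et al.

```python
# Hypothetical templates keyed by dialogue act name.
TEMPLATES = {
    "request_ticket": "Can I get a ticket?",
    "inform_city": "I am in {city}.",
}

def seq2seq_nlg(act, slots):
    """Stand-in for a trained Seq2Seq generator (assumption: it returns a string)."""
    return f"<generated utterance for {act} {sorted(slots.items())}>"

def realize(act, slots):
    """Use a template when one exists, otherwise fall back to the model."""
    template = TEMPLATES.get(act)
    if template is not None:
        return template.format(**slots)
    return seq2seq_nlg(act, slots)

utterance = realize("inform_city", {"city": "Seattle"})
fallback = realize("thanks", {})
```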
Seq2Seq SU
§ System structure
  § The Goal Generator and Feature Extractor are set up as in (El Asri et al., 2016)
  § The input sequence is the Feature History
  § The output sequence is the User Utterance
Semantic to Utterance (Kreyssig et al. 2018)
Seq2Seq SU
§ Beam search is often used to generate a sequence with RNNs
§ Keep the $n$ beams with the highest probability $P(w_t w_{t-1} \dots w_0 \mid \mathbf{p})$
§ To make the output non-deterministic, sample $n$ words per beam from the probability distribution
Generate non-deterministic result
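The sampling variant of beam search described above can be sketched as follows. The fixed toy distribution stands in for an RNN decoder's softmax output, and the function names, end-of-sequence token, and default sizes are assumptions for illustration rather than the configuration used by Kreyssig et al.

```python
import math
import random

def toy_next_word_probs(prefix):
    """Hypothetical stand-in for an RNN decoder's softmax over the next word."""
    return {"i": 0.4, "want": 0.3, "food": 0.2, "</s>": 0.1}

def sampled_beam_search(next_probs, n=2, max_len=4, rng=None):
    """Beam search that samples n continuations per beam instead of taking the top n."""
    rng = rng or random.Random()
    beams = [((), 0.0)]  # (word tuple, log-probability of the sequence)
    for _ in range(max_len):
        candidates = {}
        for words, logp in beams:
            if words and words[-1] == "</s>":
                candidates[words] = logp  # finished beam is carried over unchanged
                continue
            probs = next_probs(words)
            vocab = list(probs)
            sampled = rng.choices(vocab, weights=[probs[w] for w in vocab], k=n)
            for w in set(sampled):
                candidates[words + (w,)] = logp + math.log(probs[w])
        # Keep the n candidates with the highest sequence probability.
        beams = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)[:n]
    return beams

beams = sampled_beam_search(toy_next_word_probs, n=2, max_len=4, rng=random.Random(0))
```

Running the search with different seeds yields different utterances for the same input, which is the non-determinism the slide asks for.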
Seq2Seq SU
§ The policy trained with the NUS (Neural User Simulator) performs well on both SUs
§ Overfitting: the policy that performed best on the NUS was not the one that performed best on the ABUS (agenda-based user simulator)
Experiments – Cross-Model Evaluation
Seq2Seq SU
§ For all five NUS seeds, performance was better with less data
§ This behavior was not observed for the policies trained with the ABUS
Experiments – Cross-Model Evaluation
Seq2Seq SU
§ The NUS performs better
§ Overfitting is also observed: the best-performing policy was the one that performed best on the other user simulator
Experiments – human Evaluation
Seq2Seq SU
§ Generating natural language requires less labelling than generating semantic responses
§ NUS excelled on both evaluation tasks
Discussion
Seq2Seq SU
§ An end-to-end hierarchical Seq2Seq approach
§ Without any feature extraction or external state-tracking annotations
  § Encode the user goal: $h^C = \mathrm{Enc}(e^C; \theta_C)$
  § Encode each system turn: $h_i^S = \mathrm{Enc}(e^{S_i}; \theta_S)$
  § Encode the dialogue history: $h_0^D = h^C$, $h_t^D = \mathrm{Enc}(\{h_i^S\}_{i=1}^{t}; \theta_D)$
§ $L_{crossent}$: cross-entropy error between the candidate and the correct user sequence
Hierarchical User Simulator (HUS) (Gür et al. 2018)
Seq2Seq SU
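§ The output of HUS is deterministic
§ Add a Gaussian distribution generator
  § Sample $z_x \sim N(z \mid \mu_x, \Sigma_x)$, where $\mu_x = W_\mu h_{t-1}^D + b_\mu$ and $\Sigma_x = W_\Sigma h_{t-1}^D + b_\Sigma$
§ The decoder is initialized with $\tilde{h}_t^D = \mathrm{FC}(h_t^D; z_x)$
§ A KL divergence between the prior and posterior distributions, $L_{var} = \alpha\, \mathrm{KL}\big(N(z \mid \mu_x, \Sigma_x) \,\|\, N(z \mid \mu_y, \Sigma_y)\big)$, is added to make sure the behavior stays consistent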
Variational HUS (VHUS)
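The two variational ingredients, sampling a latent vector and penalising the KL divergence between two Gaussians, can be sketched in plain Python. This assumes diagonal covariances passed as vectors of variances, which is a simplification chosen here; the slides do not state how the covariance is parameterised, and the function names are hypothetical.

```python
import math
import random

def sample_diag_gaussian(mu, var, rng):
    """Reparameterised sample z = mu + sqrt(var) * eps with eps ~ N(0, 1)."""
    return [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mu, var)]

def kl_diag_gaussians(mu_x, var_x, mu_y, var_y):
    """Closed-form KL( N(mu_x, var_x) || N(mu_y, var_y) ) for diagonal covariances."""
    total = 0.0
    for mx, vx, my, vy in zip(mu_x, var_x, mu_y, var_y):
        total += 0.5 * (math.log(vy / vx) + (vx + (mx - my) ** 2) / vy - 1.0)
    return total

rng = random.Random(0)
z = sample_diag_gaussian([0.0, 1.0], [1.0, 0.5], rng)
alpha = 0.1  # hypothetical weight of the variational loss term
l_var = alpha * kl_diag_gaussians([0.0], [1.0], [1.0], [1.0])
```

The KL term is zero when the two distributions coincide and grows as their means or variances drift apart.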
Seq2Seq SU
§ Problem: dialogues become long when user turns diverge from the initial user goal
§ Initialize the history encoder with zeros, then $\tilde{h}_t^D = \mathrm{FC}(h_t^D; h^C)$
§ Minimize the divergence between the user goal and the user turn tokens
Goal Regularization (VHUSReg)
$L_{reg} = \|b_t^{u} - \mathrm{BOW}(C)\| + \|b_t^{D} - \mathrm{BOW}(U_t)\| + \|b_t^{S} - \mathrm{BOW}(S_t)\|$
Seq2Seq SU
§ SL
  § Supervised end-to-end policy
  § Maps user utterances to system actions
§ The RL policy outperformed SL
  § Especially on EM: the SL policy may get stuck in local minima and fail to recover some of the slot-value pairs
§ RL is more robust, even with a weaker SU
Experiment results
Seq2Seq SU
§ The dialogues are converted to natural language with templates
§ All SUs get better scores and smaller standard deviations
Human evaluation
Seq2Seq SU
§ Compare different settings
  § Policy: agenda-based and model-based
  § NLG: template, retrieval, and generation
  § Evaluation: direct and indirect
Comparison between different settings (Shi et al. 2019)
Seq2Seq SU
§ Perplexity (PPL), vocabulary size, and utterance length are used to measure NLG quality
§ Retrieval-based models have the largest vocabulary
§ Retrieval-based models generate the longest sentences, but the end-to-end model also does well
§ Although PPL is highest for retrieval-based models, they also have the largest vocabulary and the longest utterances
Automatic direct evaluation
Seq2Seq SU
§ Fluency: templates score best, since they are written by humans
§ Coherence: agenda-based is in general better than model-based
§ Goal adherence: infusing the goal is more difficult for End2End
§ Diversity: retrieval-based is good at diversity but not as good at fluency
Template-based models win on fluency but lack diversity; generation-based models suffer from generic responses
Human direct evaluation
Seq2Seq SU
§ Model-based converges faster: it captures the major paths instead of exploring all possible paths
§ Retrieval-based converges more slowly because of its larger vocabulary
Automatic indirect evaluation
Seq2Seq SU
§ A system that can handle more language variation does better on solved ratio
§ Efficiency does not always correlate with dialog length (AgenG and SLE)
§ Satisfaction is related not only to solved ratio but also to efficiency and latency
§ Naturalness is related to solved ratio (overall performance)
Human indirect evaluations
Seq2Seq SU
§ Agenda-based with retrieval-based NLG has the best performance
  This result agrees with the human evaluation
§ More types of SU give a better-quality evaluation
  User SLT prefers SLT (0.975) over AgenG (0.965), but overall AgenG is better
§ The diagonal is usually the highest: an RL policy does not generalize over all kinds of users
Cross model evaluation
Seq2Seq
§ Model-based performs relatively worse
§ Model-based does not explore all possible paths (Act6)
Discussion
Seq2Seq SU
§ Generative models may suffer from generic outputs
§ We can get a better policy with an SU that produces more diverse output
§ The policy of the SU needs to explore all possibilities
Summary
Inverse RL
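§ The SU can be viewed as an MDP without a known reward function, $\{S, A, P, \gamma\} \setminus R$
§ Reward function: $R_\theta(s, a) = \theta^T \phi(s, a) = \sum_{i=1}^{k} \theta_i \phi_i(s, a)$
§ Q-function: $Q^\pi(s, a) = E\left[\sum_{i=0}^{\infty} \gamma^i r_i \mid s_0 = s, a_0 = a\right]$
§ $Q^\pi(s, a) = E\left[\sum_{i=0}^{\infty} \gamma^i \theta^T \phi(s_i, a_i) \mid s_0 = s, a_0 = a\right] = \theta^T \mu^\pi(s, a)$
§ The feature expectation $\mu^\pi(s, a)$ can be modeled as the discounted measure of features according to the state visitation frequency. Given $m$ trajectories ($H_i$ is the length of the $i^{th}$ trajectory), it can be estimated as:
$$\mu^\pi(s, a) = \frac{1}{m} \sum_{i=0}^{m} \sum_{t=0}^{H_i} \gamma^t \phi(s_t^i, a_t^i)$$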
Inverse RL (Chandramohan et al., 2011)
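The empirical feature expectation and the linear reward can be sketched as below. The trajectories, the two indicator features, and the discount factor are invented for illustration; the sketch covers only the estimation step and not the full IRL loop, which additionally runs RL in an inner loop to update the reward weights.

```python
def feature_expectation(trajectories, phi, gamma=0.95):
    """Average discounted feature counts over m trajectories.

    trajectories: list of [(state, action), ...]
    phi: function (state, action) -> list of floats
    """
    m = len(trajectories)
    mu = None
    for trajectory in trajectories:
        for t, (s, a) in enumerate(trajectory):
            features = phi(s, a)
            if mu is None:
                mu = [0.0] * len(features)
            for k, f in enumerate(features):
                mu[k] += (gamma ** t) * f / m
    return mu

def linear_reward(theta, features):
    """Dot product theta^T phi(s, a); applied to mu it gives theta^T mu."""
    return sum(w * f for w, f in zip(theta, features))

# Hypothetical indicator features: [user asked, user informed].
def phi(state, action):
    return [1.0 if action == "ask" else 0.0, 1.0 if action == "inform" else 0.0]

trajectories = [
    [("s0", "ask"), ("s1", "inform")],
    [("s0", "inform")],
]
mu = feature_expectation(trajectories, phi, gamma=0.5)
q_estimate = linear_reward([1.0, 2.0], mu)
```

Because the reward is linear in the features, the value of a policy reduces to a dot product between the weights and its feature expectation, so matching the expert's feature expectation is the target of the learning loop.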
IRL
Algorithm
IRL
§ We can train an MDP SU from a fixed corpus
§ In the paper, only a simple experiment was conducted
§ The computational cost is high (RL in the inner loop)
Summary
Collaboration SU
§ Collaboration-based SU utilizes the similarity between different users to predict the user’s next action
§ Label propagation: train a simple classification model on a part of the data to label the entire dataset
§ Easy to incorporate external knowledge, e.g. user profile to pre-filter the act candidates
§ Runs very fast
Collaboration-based (Didericksen et al. 2017)
Machine to Machine
§ Build a dialogue system by M2M and crowdsourcing
§ Collecting data with a Wizard-of-Oz setup may suffer from
  § Not covering all interactions
  § Unfitting dialogues (too simplistic or too convoluted)
  § Needing more effort to filter errors
Build a Conversational Agent Overnight (Shah et al. 2018)
Machine to Machine
§ Outlines are easier to generate
§ Don’t need to generate complex and diverse language
Generating outline via self-play
Conclusion
The rule-based methods
✓ More controllable
✓ Generate all possible paths
- Domain-dependent
- Not scalable
- Labor-intensive
The model-based methods
✓ Learn user behaviour from a corpus
✓ Less labor effort
✓ Adapt to new domains more easily
- Focus on the main paths, not all paths
- Incoherent goals
Conclusion
§ Generate more varied outputs and more human-like behaviour
§ Persona for SU
§ Error models: ASR, ambiguity, ...
§ How to use IRL and adversarial training for SU?
§ Self-training via machine-to-machine interaction
What’s next?
Reference
§ User modeling for spoken dialogue system evaluation. Wieland Eckert, Esther Levin, and Roberto Pieraccini, 1997
§ Human-Computer Dialogue Simulation Using Hidden Markov Models. Heriberto Cuayáhuitl, Steve Renals, Oliver Lemon, and Hiroshi Shimodaira, 2005
§ Training Bayesian networks for realistic man-machine spoken dialogue simulation. Olivier Pietquin, Stéphane Rossignol, and Michel Ianotto, 2009
§ Probabilistic simulation of human-machine dialogues. Konrad Scheffler and Steve Young, 2000
§ Agenda-Based User Simulation for Bootstrapping a POMDP Dialogue System. Jost Schatzmann, Blaise Thomson, Karl Weilhammer, Hui Ye, and Steve Young, 2007
§ A Sequence-to-Sequence Model for User Simulation in Spoken Dialogue Systems. Layla El Asri, Jing He, and Kaheer Suleman, 2016
§ A User Simulator for Task-Completion Dialogues. Xiujun Li, Zachary C. Lipton, Bhuwan Dhingra, Lihong Li, Jianfeng Gao, and Yun-Nung Chen, 2017
§ Neural User Simulation for Corpus-based Policy Optimisation for Spoken Dialogue Systems. F. Kreyssig, I. Casanueva, P. Budzianowski, and M. Gašić, 2018
§ User Modeling for Task Oriented Dialogues. Izzeddin Gur, Dilek Hakkani-Tur, Gokhan Tur, and Pararth Shah, 2018
§ How to Build User Simulators to Train RL-based Dialog Systems. Weiyan Shi, Kun Qian, Xuewei Wang, and Zhou Yu, 2019
§ User Simulation in Dialogue Systems using Inverse Reinforcement Learning. Senthilkumar Chandramohan, Matthieu Geist, Fabrice Lefèvre, and Olivier Pietquin, 2011
§ Collaboration-based User Simulation for Goal-oriented Dialog Systems. Devin Didericksen, Oleg Rokhlenko, Kevin Small, Li Zhou, and Jared Kramer, 2017
§ Building a Conversational Agent Overnight with Dialogue Self-Play. Pararth Shah, Dilek Hakkani-Tür, Gokhan Tür, Abhinav Rastogi, Ankur Bapna, Neha Nayak, and Larry Heck, 2018