Statistical Learning from Dialogues for Intelligent Assistants

DR. YUN-NUNG (VIVIAN) CHEN, HTTP://VIVIANCHEN.IDV.TW

Statistical Learning from Dialogues for Intelligent Assistants

"Sorry, I didn't get that!"

"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Uploaded by yun-nung-vivian-chen on 15-Apr-2017 (Category: Technology)

TRANSCRIPT

Page 1: Statistical Learning from Dialogues for Intelligent Assistants

1

DR. YUN-NUNG (VIVIAN) CHEN, HTTP://VIVIANCHEN.IDV.TW

Statistical Learning from Dialogues for Intelligent Assistants

"Sorry, I didn't get that!"


2

My Background: Yun-Nung (Vivian) Chen 陳縕儂, http://vivianchen.idv.tw

National Taiwan University: 2005 Freshman, 2009 B.S., 2011 M.S.

Carnegie Mellon University: 2015 Ph.D. (spoken dialogue systems, language understanding, user modeling, speech summarization, key term extraction, spoken term detection)

Microsoft Research: 2016 Postdoc


3

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


4

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


5

Apple Siri

(2011)

Google Now

(2012)

Microsoft Cortana

(2014)

Amazon Alexa/Echo

(2014)

https://www.apple.com/ios/siri/, https://www.google.com/landing/now/, http://www.windowsphone.com/en-us/how-to/wp8/cortana/meet-cortana, http://www.amazon.com/oc/echo

Facebook M

(2015)

What are Intelligent Assistants?


6

Why do we need them? Daily Life Usage

Weather, Schedule, Transportation, Restaurant Seeking


7

Why do we need them? Get things done

E.g., set up alarm/reminder, take note

Easy access to structured data, services and apps

E.g., find docs/photos/restaurants

Assist your daily schedule and routine

E.g., commute alerts to/from work

Be more productive in managing your work and personal life


8

Why do companies care? Global Digital Statistics (January 2015)

Global Population: 7.21B

Active Internet Users: 3.01B

Active Social Media Accounts: 2.08B

Active Unique Mobile Users: 3.65B

As devices evolve, the more natural and convenient input is speech.


9

Personal Intelligent Architecture

Reactive Assistance

ASR, LU, Dialog, LG, TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data Back-end Data

Bases Services and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "restaurant suggestions", "call taxi"


10

Personal Intelligent Architecture

Reactive Assistance

ASR, LU, Dialog, LG, TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data Back-end Data

Bases Services and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"


11

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


12

Spoken dialogue systems are intelligent agents that help users finish tasks more efficiently via spoken interactions.

Spoken dialogue systems are being incorporated into various devices (smart phones, smart TVs, in-car navigation systems, etc.).

Good SDSs help users organize and access information conveniently.

Spoken Dialogue System (SDS)

JARVIS – Iron Man's Personal Assistant; Baymax – Personal Healthcare Companion


13

Baymax is capable of maintaining a good spoken dialogue system and learning new knowledge for better understanding of and interaction with people.

What is Baymax's intelligence? (Big Hero 6 -- video content owned and licensed by Disney Entertainment, Marvel Entertainment, LLC, etc.)



14

ASR: Automatic Speech Recognition; SLU: Spoken Language Understanding; DM: Dialogue Management; NLG: Natural Language Generation

SDS Architecture

Pipeline: ASR → SLU → DM → NLG, grounded in a domain ontology (SLU and domain knowledge: the current bottleneck)
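As a rough illustration of how these stages chain together, here is a minimal sketch in which every component is a hypothetical stub (the function bodies are invented placeholders, not the statistical models the talk describes):

```python
# Minimal sketch of the ASR -> SLU -> DM -> NLG pipeline; every component is a
# stub standing in for a statistical model, purely for illustration.
def asr(audio):
    """Speech recognition: audio -> text (stubbed as a lookup)."""
    return audio["transcript"]

def slu(text):
    """Language understanding: text -> semantic slots (stubbed keyword match)."""
    return {"price": "cheap"} if "cheap" in text else {}

def dm(slots):
    """Dialogue management: slots -> system action."""
    return ("recommend", slots) if slots else ("clarify", {})

def nlg(action):
    """Generation: system action -> natural-language response."""
    act, _slots = action
    return "Here are some cheap options." if act == "recommend" else "Sorry, I didn't get that!"

response = nlg(dm(slu(asr({"transcript": "can i have a cheap restaurant"}))))
print(response)  # Here are some cheap options.
```

Each stage consumes only the previous stage's output, which is why a weak SLU stage bottlenecks everything downstream.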


15

Interaction Example – User:

Intelligent Agent – Q: How does a dialogue system process this request?

Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which one do you want to choose? I can help you go there.

find a cheap eating place for taiwanese food


16

SDS Process – Available Domain Ontology

find a cheap eating place for taiwanese foodUser

[Domain ontology graph: seeking –PREP_FOR→ target; price –AMOD→ target; food –NN→ target]

Organized Domain Knowledge – Intelligent Agent


17

SDS Process – Available Domain Ontology

User

[Domain ontology graph: seeking –PREP_FOR→ target; price –AMOD→ target; food –NN→ target]

Organized Domain Knowledge – Intelligent Agent

Ontology Induction (semantic slot)

find a cheap eating place for taiwanese food


18

SDS Process – Available Domain Ontology

User

[Domain ontology graph: seeking –PREP_FOR→ target; price –AMOD→ target; food –NN→ target]

Organized Domain Knowledge – Intelligent Agent

Ontology Induction (semantic slot)

Structure Learning (inter-slot relation)

find a cheap eating place for taiwanese food


19

SDS Process – Spoken Language Understanding (SLU)

User

[Domain ontology graph: seeking –PREP_FOR→ target; price –AMOD→ target; food –NN→ target] – Intelligent Agent

seeking="find", target="eating place", price="cheap", food="taiwanese"

find a cheap eating place for taiwanese food
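The slot filling shown above can be mimicked with a toy lexicon-based decoder (a hand-written sketch for illustration only; the thesis learns this mapping statistically rather than from a fixed lexicon):

```python
# Toy slot decoder: look up known phrases in a hand-written lexicon and emit
# slot=value pairs, reproducing the slide's example output.
LEXICON = {
    "find": ("seeking", "find"),
    "cheap": ("price", "cheap"),
    "eating place": ("target", "eating place"),
    "taiwanese": ("food", "taiwanese"),
}

def decode(utterance):
    """Return {slot: value} for every lexicon phrase present in the utterance."""
    return {slot: value for phrase, (slot, value) in LEXICON.items()
            if phrase in utterance}

print(decode("find a cheap eating place for taiwanese food"))
```

The hard part, of course, is acquiring the slot inventory and the phrase-to-slot mapping automatically, which is exactly what the following sections address.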


20

find a cheap eating place for taiwanese food

SDS Process – Spoken Language Understanding (SLU)

User

[Domain ontology graph: seeking –PREP_FOR→ target; price –AMOD→ target; food –NN→ target] – Intelligent Agent

seeking="find", target="eating place", price="cheap", food="taiwanese"

Semantic Decoding


21

find a cheap eating place for taiwanese food

SDS Process – Dialogue Management (DM)

User

[Domain ontology graph: seeking –PREP_FOR→ target; price –AMOD→ target; food –NN→ target]

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese" – Intelligent Agent
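The DM's SELECT-style query can be pictured as a filter over a back-end table (a hedged sketch: the restaurant rows and the `select` helper below are invented for illustration):

```python
# Hedged sketch: turn decoded slot values into a filter over a toy back-end
# table, analogous to the slide's "SELECT restaurant" query.
RESTAURANTS = [
    {"name": "Din Tai Fung", "price": "cheap", "food": "taiwanese"},
    {"name": "Boiling Point", "price": "cheap", "food": "taiwanese"},
    {"name": "Some Steakhouse", "price": "expensive", "food": "steak"},
]

def select(table, **constraints):
    """Return rows whose fields satisfy every slot constraint."""
    return [row for row in table
            if all(row.get(k) == v for k, v in constraints.items())]

hits = select(RESTAURANTS, price="cheap", food="taiwanese")
print([r["name"] for r in hits])  # ['Din Tai Fung', 'Boiling Point']
```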


22

find a cheap eating place for taiwanese food

SDS Process – Dialogue Management (DM)

User

[Domain ontology graph: seeking –PREP_FOR→ target; price –AMOD→ target; food –NN→ target]

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese" – Intelligent Agent

Surface Form Derivation (natural language)


23

SDS Process – Dialogue Management (DM)

User

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Din Tai Fung, Boiling Point

Predicted intent: navigation

Intelligent Agent

find a cheap eating place for taiwanese food


24

SDS Process – Dialogue Management (DM)

User

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Din Tai Fung, Boiling Point

Predicted intent: navigation

Intelligent Agent

Intent Prediction

find a cheap eating place for taiwanese food


25

SDS Process – Natural Language Generation (NLG)

User

Intelligent Agent

Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which one do you want to choose? I can help you go there. (navigation)

find a cheap eating place for taiwanese food


26

Required Knowledge

[Domain ontology graph: seeking –PREP_FOR→ target; price –AMOD→ target; food –NN→ target]

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Predicted intent: navigation

User

Required Domain-Specific Information

find a cheap eating place for taiwanese food


27

Challenges for SDS: an SDS in a new domain requires

1) a hand-crafted domain ontology; 2) utterances labelled with semantic representations; 3) an SLU component for mapping utterances into semantic representations

Manual work results in high cost, long development time, and poor scalability of system development.

The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, in order to handle open-domain requests.

seeking="find", target="eating place", price="cheap", food="asian food"

find a cheap eating place for asian food

fully unsupervised

Prior Focus


28

Contributions

[Domain ontology graph: seeking –PREP_FOR→ target; price –AMOD→ target; food –NN→ target]

SELECT restaurant: restaurant.price="cheap", restaurant.food="asian food"

Predicted intent: navigation

find a cheap eating place for taiwanese food – User

Ontology Induction (semantic slot)

Structure Learning (inter-slot relation)

Surface Form Derivation (natural language)

Semantic Decoding

Intent Prediction


29

Contributions – User

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

find a cheap eating place for taiwanese food


30

Ontology Induction Structure Learning Surface Form Derivation

Semantic Decoding Intent Prediction

Contributions – User

Knowledge Acquisition SLU Modeling

find a cheap eating place for taiwanese food


31

Knowledge Acquisition: 1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

Restaurant Asking

Conversations

[Induced ontology graph: seeking, target, price, food, quantity, linked by PREP_FOR, NN, and AMOD dependencies]

Organized Domain Knowledge

Unlabelled Collection

Knowledge Acquisition

Knowledge Acquisition Ontology Induction Structure Learning Surface Form Derivation


32

SLU Modeling: 2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

Organized Domain

Knowledge

price="cheap", target="restaurant", intent=navigation

SLU Modeling

SLU Component

"can i have a cheap restaurant"

SLU Modeling Semantic Decoding Intent Prediction


33

SDS Architecture – Contributions

Pipeline: ASR → SLU → DM → NLG, grounded in a domain ontology

Knowledge Acquisition SLU Modeling

current bottleneck


34

SDS Flowchart

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling


35

SDS Flowchart – Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling


36

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


37

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[System diagram: "can I have a cheap restaurant" → Frame-Semantic Parsing over an unlabeled collection → Ontology Induction (semantic KG) and Structure Learning (lexical and semantic KGs) → Feature Model (Fw, Fs) × Knowledge Graph Propagation Model (word relation model Rw, slot relation model Rs) → MF-SLU (SLU modeling by matrix factorization) → semantic representation, e.g., target="restaurant", price="cheap"]


38

Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory; words/phrases can be represented as frames. In "low fat milk", "milk" evokes the "food" frame and "low fat" fills the descriptor frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences.


39

Ontology Induction [ASRU'13, SLT'14a]

can i have a cheap restaurant

Frames evoked: capability, expensiveness (good slot candidate), locale_by_use (good slot candidate)

1st Issue: differentiate domain-specific frames from generic frames for SDSs

Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.

Best Student Paper Award


40

[Matrix illustration. Rows: train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food", plus test utterance "show me a list of cheap restaurants", all passed through frame-semantic parsing. Columns: word observations (cheap, restaurant, food) and slot candidates (expensiveness, locale_by_use, food). Observed entries are 1; the test row carries estimated slot probabilities (e.g., .97, .95).]

Ontology Induction [ASRU'13, SLT'14a] – Best Student Paper Award

Idea: increase the weights of domain-specific slots and decrease the weights of others.


41

1st Issue: how to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model – Assumption: domain-specific words/slots have more dependencies to each other.

[Word Relation Model and Slot Relation Model: the word relation matrix and slot relation matrix multiply the word/slot observation matrix, feeding slot induction]

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.

[Knowledge graph over slots (locale_by_use, food, expensiveness, seeking, relational_quantity, desiring, capability) for train utterances "i would like a cheap restaurant", "find a restaurant with chinese food" and test utterance "show me a list of cheap restaurants"]
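The propagation step can be sketched with toy numbers (the relation matrix below is invented for illustration; the real model derives it from the lexical and semantic knowledge graphs):

```python
# Score propagation sketch: multiplying observation scores by a row-normalized
# relation matrix lets a slot that is strongly connected to observed slots
# receive a high score even when it is not observed itself.
import numpy as np

# Relation strengths among 3 slot candidates (toy values).
R = np.array([[1.0, 0.8, 0.0],
              [0.8, 1.0, 0.1],
              [0.0, 0.1, 1.0]])
R = R / R.sum(axis=1, keepdims=True)      # row-normalize

scores = np.array([1.0, 0.0, 1.0])        # slot 1 is unobserved
propagated = R @ scores                   # neighbors pass their scores along

# Slot 1 now inherits a substantial score from its well-connected neighbors.
print(propagated)
```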


42

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[System diagram: "can I have a cheap restaurant" → Frame-Semantic Parsing over an unlabeled collection → Ontology Induction (semantic KG) and Structure Learning (lexical and semantic KGs) → Feature Model (Fw, Fs) × Knowledge Graph Propagation Model (word relation model Rw, slot relation model Rs) → MF-SLU (SLU modeling by matrix factorization) → semantic representation, e.g., target="restaurant", price="cheap"]


43

Knowledge Graph Construction: syntactic dependency parsing on utterances

[Dependency parse of "can i have a cheap restaurant" (ccomp, nsubj, dobj, det, amod) with evoked frames capability, expensiveness, locale_by_use; a word-based lexical knowledge graph is built over the words and a slot-based semantic knowledge graph over the slots]
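A word-based lexical KG of this kind can be assembled directly from the dependency edges (the edge list below transcribes the slide's parse by hand rather than running a parser):

```python
# Build an undirected word-word graph from the dependency edges of
# "can i have a cheap restaurant".
from collections import defaultdict

dep_edges = [
    ("have", "can", "ccomp"),
    ("have", "i", "nsubj"),
    ("have", "restaurant", "dobj"),
    ("restaurant", "a", "det"),
    ("restaurant", "cheap", "amod"),
]

graph = defaultdict(set)
for head, dependent, _rel in dep_edges:
    graph[head].add(dependent)   # undirected: connect both ways
    graph[dependent].add(head)

print(sorted(graph["restaurant"]))  # ['a', 'cheap', 'have']
```

The slot-based graph is analogous, with the frames evoked by the words as its nodes.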


44

Edge Weight Measurement – Slot/Word Embeddings Training (Levy and Goldberg, 2014)

[Dependency-based word embeddings (e.g., for "can", "have") and dependency-based slot embeddings (e.g., for expensiveness, capability) are trained from the dependency-parsed utterances]

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.


45

Edge Weight Measurement: compute edge weights to represent relation importance

Slot-to-slot semantic relation: similarity between slot embeddings

Slot-to-slot dependency relation: dependency score between slot embeddings

Word-to-word semantic relation: similarity between word embeddings

Word-to-word dependency relation: dependency score between word embeddings

[Graph with word nodes w1–w7 and slot nodes s1–s3, whose edges combine the weighted relations above]
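The similarity-based edge weights can be sketched as cosine similarity between embeddings (the 3-d vectors below are toy stand-ins for trained dependency-based embeddings, not real model output):

```python
# Edge weight sketch: cosine similarity between slot embeddings; semantically
# close slots get a heavier edge than unrelated ones.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

emb = {  # toy embeddings, invented for illustration
    "expensiveness": np.array([0.9, 0.1, 0.0]),
    "pricerange":    np.array([0.8, 0.2, 0.1]),
    "capability":    np.array([0.0, 0.1, 0.9]),
}

w_close = cosine(emb["expensiveness"], emb["pricerange"])
w_far = cosine(emb["expensiveness"], emb["capability"])
print(w_close > w_far)  # True
```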


46

[Knowledge Graph Propagation Model: the word relation matrix Rw and slot relation matrix Rs, derived from the knowledge graphs, multiply the observation matrix for slot induction]

Structure information is integrated to make the self-training data more reliable


47

[Ontology induction and structure learning feed the SLU matrices Fw, Fs; in the matrix, the test utterance "show me a list of cheap restaurants" carries hidden semantics beyond its observed words]

2nd Issue unobserved semantics may benefit understanding

Semantic Decoding [ACL-IJCNLP'15]


48

Reasoning with Matrix Factorization

[Feature Model + Knowledge Graph Propagation Model: the observation matrix (1s for observed words/slots) is completed with estimated probabilities (e.g., .97, .90, .95, .85) for unobserved cells]

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which can model hidden semantics and is more robust to noisy data.
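The low-rank completion idea can be sketched with a truncated SVD on a toy observation matrix (illustrative only; the thesis trains the factors with BPR rather than computing an SVD):

```python
# Low-rank matrix completion sketch: a rank-1 reconstruction of a partially
# observed utterance-by-feature matrix fills in a plausible value for the
# unobserved cell in the last row.
import numpy as np

M = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0]])   # last row: first feature observed, second hidden

U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 1                              # keep only the dominant latent dimension
M_hat = (U[:, :k] * s[:k]) @ Vt[:k]

# The hidden cell M_hat[2, 1] now has a nonzero estimate, inherited from the
# rows that share the same latent pattern.
print(M_hat[2, 1] > 0)
```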


49

2nd Issue: how to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and words/slots, respectively.

The product of the two matrices fills in the probability of the hidden semantics.

[Observation matrix of size |U| × (|W|+|S|) ≈ (|U| × d) × (d × (|W|+|S|)): the rank-d factors fill unobserved cells with estimated probabilities]

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.


50

Bayesian Personalized Ranking for MF: model implicit feedback

Do not treat unobserved facts as negative samples (true or false); give observed facts higher scores than unobserved facts.

Objective: for each utterance u, maximize Σ ln σ( f(x⁺) − f(x⁻) ) over pairs of an observed fact x⁺ and an unobserved fact x⁻.

The objective is to learn a set of well-ranked semantic slots per utterance.
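One SGD step of this objective can be sketched as follows (toy dimensions, learning rate, and initialization; a sketch after Rendle et al., 2009, not the exact thesis implementation):

```python
# BPR-MF sketch: for utterance u, push the score of an observed item i above
# that of an unobserved item j by ascending ln sigma(f(u,i) - f(u,j)).
import numpy as np

rng = np.random.default_rng(0)
d = 4
U = rng.normal(scale=0.1, size=(3, d))   # utterance latent factors
V = rng.normal(scale=0.1, size=(5, d))   # word/slot latent factors

def bpr_step(u, i, j, lr=0.1, reg=0.01):
    x = U[u] @ (V[i] - V[j])             # current score margin f(u,i) - f(u,j)
    g = 1.0 / (1.0 + np.exp(x))          # sigma(-x): gradient of ln sigma(x)
    u_vec = U[u].copy()
    U[u] += lr * (g * (V[i] - V[j]) - reg * U[u])
    V[i] += lr * (g * u_vec - reg * V[i])
    V[j] += lr * (-g * u_vec - reg * V[j])

margin_before = U[0] @ (V[1] - V[2])
for _ in range(200):
    bpr_step(0, 1, 2)                    # item 1 observed, item 2 unobserved
margin_after = U[0] @ (V[1] - V[2])
print(margin_after > margin_before)      # True: observed fact now outranks
```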


51

[Ontology induction and structure learning produce the SLU matrices Fw, Fs; given the test utterance "show me a list of cheap restaurants", the completed matrix carries slot probabilities (e.g., .97, .90, .95, .85)]

Matrix Factorization SLU (MF-SLU) can estimate probabilities for slot candidates given test utterances.


52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[System diagram: "can I have a cheap restaurant" → Frame-Semantic Parsing over an unlabeled collection → Ontology Induction (semantic KG) and Structure Learning (lexical and semantic KGs) → Feature Model (Fw, Fs) × Knowledge Graph Propagation Model (word relation model Rw, slot relation model Rs) → MF-SLU (SLU modeling by matrix factorization) → semantic representation, e.g., target="restaurant", price="cheap"]

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).


53

Experimental Setup – Dataset: Cambridge University SLU Corpus

Restaurant recommendation (WER = 37%); 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type

Metric: MAP of all estimated slot probabilities over all utterances, using a mapping table between induced and reference slots.

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
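The MAP metric can be sketched as follows (the rankings and slot names below are toy values for illustration):

```python
# Mean average precision over per-utterance slot rankings: each utterance has
# slot candidates ranked by estimated probability plus a reference slot set;
# AP averages precision at each rank where a reference slot appears.
def average_precision(ranked, relevant):
    hits, total = 0, 0.0
    for rank, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(examples):
    return sum(average_precision(r, rel) for r, rel in examples) / len(examples)

examples = [
    (["food", "pricerange", "area"], {"food", "pricerange"}),  # AP = 1.0
    (["area", "food"], {"food"}),                              # AP = 0.5
]
print(mean_average_precision(examples))  # 0.75
```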


54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8


55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

The MF-SLU effectively models implicit information to decode semantics, and the structure information further improves the results.

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
Proposed MF-SLU: Feature Model | 37.6 | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)

The result is significantly better than the MLR baseline with p < 0.05 in a t-test.


56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

In the integrated structure information, both semantic and dependency relations are useful for understanding.

Approach | ASR | Transcripts
Feature Model | 37.6 | 45.3
Feature + Knowledge Graph Propagation: Semantic | 41.4 | 51.6
Feature + Knowledge Graph Propagation: Dependency | 41.6 | 49.0
Feature + Knowledge Graph Propagation: All | 43.5 (+15.7%) | 53.4 (+17.9%)

The result is significantly better than the MLR baseline with p < 0.05 in a t-test.


Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs.

The reference ontology is annotated with the most frequent syntactic dependencies.

[Induced ontology: locale_by_use linked to food, expensiveness, seeking, relational_quantity, and desiring via PREP_FOR, NN, AMOD, and DOBJ relations; reference ontology: type linked to food, pricerange, area, and task via DOBJ, AMOD, and PREP_IN]

The automatically learned domain ontology aligns well with the reference one

57

The data-driven ontology is more objective, while the expert-annotated one is more subjective.

58

Contributions of Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.


59

Low- and High-Level Understanding: semantic concepts for individual utterances do not consider high-level semantics (user intents).

The follow-up behaviors usually correspond to user intents

price="cheap", target="restaurant"

SLU Model

"can i have a cheap restaurant"

intent=navigation

restaurant="legume", time="tonight"

SLU Model

"i plan to dine in legume tonight"

intent=reservation


60

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart – Intent Prediction


61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app

Output the apps supporting the required functionality

Intent Identification: popular domains in Google Play

"please dial a phone call to alex" → Skype, Hangout, etc.

Intent Prediction of Mobile Apps [SLT'14c]

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.


63

Input: single-turn request

Output: apps that are able to support the required functionality

Intent Prediction – Single-Turn Request

[Feature-enriched matrix. Rows: app descriptions retrieved by IR as app candidates (Gmail: "... your email calendar contacts ...", Outlook: "... check and send emails msgs ..."), self-train utterances, and the test utterance "i would like to contact alex". Columns: word observations (contact, message, email), enriched semantics (communication), and intended apps (Gmail, Outlook, Skype). Observed entries are 1, and MF fills estimated probabilities (e.g., .90, .85, .97, .95).]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents


64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

Challenge: language ambiguity, from 1) user preference and 2) app-level contexts

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

"send to vivian": Email? Message? (Communication)

Idea: behavioral patterns in history (e.g., the previous turn) can help intent prediction.


65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

[Feature-enriched matrix. Train dialogues with lexical features (photo, check, camera, tell, send) and behavior history, paired with intended apps: "take this photo" → CAMERA, "tell vivian this is me in the lab" → IM, "check my grades on website" → CHROME, "send an email to professor" → EMAIL. Test dialogue: "take a photo of this" → CAMERA, "send it to alice" → IM. Observed entries are 1, and MF fills estimated probabilities (e.g., .85, .70, .95, .80, .55).]

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction


66

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):
Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / – | 26.1 / –
(LM: LM-based IR model, unsupervised)

Multi-Turn Interaction, Mean Average Precision (MAP):
Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / – | 55.5 / –
(MLR: multinomial logistic regression, supervised)


67

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):
Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)

Multi-Turn Interaction, Mean Average Precision (MAP):
Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.


68

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):
Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 / – | 33.3 / –
Word + Type-Embedding-Based Semantics | 31.5 / – | 32.9 / –

Multi-Turn Interaction, Mean Average Precision (MAP):
Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 / – | 56.6 / –

Semantic enrichment provides rich cues to improve performance.

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

69

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 / 34.2 (+6.8%) | 33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics | 31.5 / 32.2 (+2.1%) | 32.9 / 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 / 55.7 (+3.3%) | 56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.

70

(Flowchart: Knowledge Acquisition covers Ontology Induction and Structure Learning; SLU Modeling covers Semantic Decoding and Intent Prediction.)

Contributions of Intent Prediction

The feature-enriched MF-SLU for intent prediction is able to:
1) unify knowledge at different levels,
2) learn inference relations between various features, and
3) create personalized models by leveraging contextual behaviors.

71

Personal Intelligent Architecture

Reactive Assistance: ASR, LU, Dialog, LG, TTS
Proactive Assistance: Inferences, User Modeling, Suggestions
Data: Back-end Databases, Services, and Client Signals
Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience: "call taxi"

72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

73

Conclusions

This work shows the feasibility of and potential for improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge and allows systems to consider implicit semantics for better understanding:
- better semantic representations for individual utterances;
- better high-level intent prediction about follow-up behaviors.

74

Future Work

Apply the proposed technology to domain discovery: identify domains that current systems do not cover but that users are interested in, to guide which domains to develop next.

Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition.

75

Towards Unsupervised Deep Learning

(Figure: a convolutional network that maps the word sequence x = w1, w2, …, wd of an utterance U through word vectors lw, a convolutional layer lc (convolution matrix Wc), and a pooling operation into an utterance vector lf; a knowledge graph propagation layer lp (matrix Wp) and a semantic projection matrix Ws then produce the semantic layer y, yielding relation scores R(U, S1), …, R(U, Sn) and posterior probabilities P(S1 | U), …, P(Sn | U) over slot candidates S1, …, Sn.)

Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.
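A hypothetical numerical sketch of that observation (shapes and weights invented for illustration): the MF score for an (utterance, slot) pair is a dot product of two learned vectors, i.e. a single linear layer, and inserting a nonlinear hidden layer on the utterance side yields a deeper model in the spirit of the architecture above:

```python
import numpy as np

rng = np.random.default_rng(1)
n_utt, n_slot, d, h = 4, 6, 3, 5         # hypothetical sizes

utt_emb = rng.normal(size=(n_utt, d))    # utterance latent vectors
slot_emb = rng.normal(size=(n_slot, d))  # slot latent vectors

# One-layer view of MF: the prediction matrix is a single bilinear product.
mf_scores = utt_emb @ slot_emb.T

# Deeper variant: a nonlinear hidden layer transforms the utterance side
# before the final dot product with the slot vectors.
W1 = rng.normal(size=(d, h))
W2 = rng.normal(size=(h, d))
deep_scores = np.tanh(utt_emb @ W1) @ W2 @ slot_emb.T

print(mf_scores.shape, deep_scores.shape)  # -> (4, 6) (4, 6)
```

Both variants score every slot candidate for every utterance; the deeper one simply learns a richer utterance representation before the same final dot product.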

76

Take Home Message

Big data is available without annotations. The challenge: how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI: mapping language to action, e.g. understanding voice commands to control music, lights, etc., or teaching the assistant to let friends in via face recognition.

Unsupervised or weakly-supervised methods will be the future trend.

Deep language understanding is an emerging field.

77

Q & A: Thanks for your attention!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.

  • Statistical Learning from Dialogues for Intelligent Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants?
  • Why do we need them?
  • Why do we need them? (2)
  • Why do companies care?
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymax's intelligence?
  • SDS Architecture
  • Interaction Example
  • SDS Process – Available Domain Ontology
  • SDS Process – Available Domain Ontology (2)
  • SDS Process – Available Domain Ontology (3)
  • SDS Process – Spoken Language Understanding (SLU)
  • SDS Process – Spoken Language Understanding (SLU) (2)
  • SDS Process – Dialogue Management (DM)
  • SDS Process – Dialogue Management (DM) (2)
  • SDS Process – Dialogue Management (DM) (3)
  • SDS Process – Dialogue Management (DM) (4)
  • SDS Process – Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture – Contributions
  • SDS Flowchart
  • SDS Flowchart – Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLP'15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRU'13, SLT'14a]
  • Ontology Induction [ASRU'13, SLT'14a] (2)
  • 1st Issue: How to adapt generic slots to a domain-specific setting
  • Semantic Decoding [ACL-IJCNLP'15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLP'15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue: How to model the unobserved hidden semantics? Matrix Factorization
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLP'15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding: Quality of Semantics Estimation
  • Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
  • Experiments of Semantic Decoding: Effectiveness of Relations
  • Experiments for Structure Learning: Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart – Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLT'14c]
  • Intent Prediction – Single-Turn Request
  • Intent Prediction – Multi-Turn Interaction [ICMI'15]
  • Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q & A
Page 2: Statistical Learning from Dialogues for Intelligent Assistants

2

My Background Yun-Nung (Vivian) Chen 陳縕儂 httpvivianchenidvtw

National Taiwan University

2009

BS

2005

Freshman

2011

MS

2015

PhD

Carnegie Mellon University

spoken dialogue systemlanguage understanding

user modeling

speech summarizationkey term extraction

spoken term detection

Microsoft Research

2016

Postdoc

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

3

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

4

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

5

Apple Siri

(2011)

Google Now

(2012)

Microsoft Cortana

(2014)

Amazon AlexaEcho

(2014)

httpswwwapplecomiossirihttpswwwgooglecomlandingnowhttpwwwwindowsphonecomen-ushow-towp8cortanameet-cortanahttpwwwamazoncomocecho

Facebook M

(2015)

What are Intelligent Assistants

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

6

Why do we need them Daily Life Usage

Weather Schedule Transportation Restaurant Seeking

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

7

Why do we need them Get things done

Eg set up alarmreminder take note Easy access to structured data services and apps

Eg find docsphotosrestaurants Assist your daily schedule and routine

Eg commute alerts tofrom work Be more productive in managing your work and personal life

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

8

Why do companies care Global Digital Statistics (2015 January)

Global Population

721B

Active Internet Users

301B

Active Social Media Accounts

208B

Active Unique Mobile Users

365B

The more natural and convenient input of the devices evolves towards speech

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

9

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data Back-end Data

Bases Services and Client Signals

DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)

User Experienceldquorestaurant suggestionsrdquoldquocall taxirdquo

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

10

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data Back-end Data

Bases Services and Client Signals

DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)

User Experienceldquocall taxirdquo

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

11

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

12

Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions

Spoken dialogue systems are being incorporated into various devices (smart-phones smart TVs in-car navigating system etc)

Good SDSs assist users to organize and access information conveniently

Spoken Dialogue System (SDS)

JARVIS ndash Iron Manrsquos Personal Assistant Baymax ndash Personal Healthcare Companion

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

13

Baymax is capable of maintaining a good spoken dialogue system and learning new knowledge for better understanding and interacting with people

What is Baymaxrsquos intelligenceBig Hero 6 -- Video content owned and licensed by Disney Entertainment Marvel Entertainment LLC etc

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

>

14

ASR Automatic Speech Recognition SLU Spoken Language Understanding DM Dialogue Management NLG Natural Language Generation

SDS Architecture

DomainDMASR SLU

NLG

current bottleneck

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

15

Interaction ExampleUser

Intelligent Agent Q How does a dialogue system process this request

Cheap Taiwanese eating places include Din Tai Fung Boiling Point etc What do you want to choose I can help you go there

find a cheap eating place for taiwanese food

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

16

SDS Process ndash Available Domain Ontology

find a cheap eating place for taiwanese foodUser

target

foodprice AMODNN

seeking PREP_FOR

Organized Domain KnowledgeIntelligent

Agent

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

17

SDS Process ndash Available Domain Ontology

User

target

foodprice AMODNN

seeking PREP_FOR

Organized Domain KnowledgeIntelligent

Agent

Ontology Induction(semantic slot)

find a cheap eating place for taiwanese food

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

18

SDS Process ndash Available Domain Ontology

User

target

foodprice AMODNN

seeking PREP_FOR

Organized Domain KnowledgeIntelligent

Agent

Ontology Induction(semantic slot)

Structure Learning(inter-slot relation)

find a cheap eating place for taiwanese food

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

19

SDS Process ndash Spoken Language Understanding (SLU)

User

target

foodprice AMODNN

seeking PREP_FORIntelligent

Agent

seeking=ldquofindrdquotarget=ldquoeating placerdquoprice=ldquocheaprdquofood=ldquotaiwaneserdquo

find a cheap eating place for taiwanese food

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

20

find a cheap eating place for taiwanese food

SDS Process ndash Spoken Language Understanding (SLU)

User

target

foodprice AMODNN

seeking PREP_FORIntelligent

Agent

seeking=ldquofindrdquotarget=ldquoeating placerdquoprice=ldquocheaprdquofood=ldquotaiwaneserdquo

Semantic Decoding

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

21

find a cheap eating place for taiwanese food

SDS Process ndash Dialogue Management (DM)

User

target

foodprice AMODNN

seeking PREP_FORSELECT restaurant restaurantprice=ldquocheaprdquo restaurantfood=ldquotaiwaneserdquoIntelligent

Agent

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

22

find a cheap eating place for taiwanese food

SDS Process ndash Dialogue Management (DM)

User

target

foodprice AMODNN

seeking PREP_FORSELECT restaurant restaurantprice=ldquocheaprdquo restaurantfood=ldquotaiwaneserdquoIntelligent

Agent

Surface Form Derivation(natural language)

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

23

SDS Process ndash Dialogue Management (DM)

User

SELECT restaurant restaurantprice=ldquocheaprdquo restaurantfood=ldquotaiwaneserdquo

Din Tai FungBoiling Point

Predicted intent navigation

Intelligent Agent

find a cheap eating place for taiwanese food

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

24

SDS Process ndash Dialogue Management (DM)

User

SELECT restaurant restaurantprice=ldquocheaprdquo restaurantfood=ldquotaiwaneserdquo

Din Tai FungBoiling Point

Predicted intent navigation

Intelligent Agent

Intent Prediction

find a cheap eating place for taiwanese food

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

25

SDS Process ndash Natural Language Generation (NLG)

User

Intelligent Agent

Cheap Taiwanese eating places include Din Tai Fung Boiling Point etc What do you want to choose I can help you go there (navigation)

find a cheap eating place for taiwanese food

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

26

Required Knowledge

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant restaurantprice=ldquocheaprdquo restaurantfood=ldquotaiwaneserdquo

Predicted intent navigation

User

Required Domain-Specific Information

find a cheap eating place for taiwanese food

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

27

Challenges for SDS An SDS in a new domain requires

1) A hand-crafted domain ontology2) Utterances labelled with semantic representations3) An SLU component for mapping utterances into semantic representations

Manual work results in high cost long duration and poor scalability of system development

The goal is to enable an SDS to 1) automatically infer domain knowledge and then to 2) create the data for SLU modelingin order to handle the open-domain requests

seeking=ldquofindrdquotarget=ldquoeating placerdquoprice=ldquocheaprdquofood=ldquoasian foodrdquo

find a cheap eating place for asian food

fully unsupervised

Prior Focus

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

28

Contributions

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant restaurantprice=ldquocheaprdquo restaurantfood=ldquoasian foodrdquo

Predicted intent navigation

find a cheap eating place for taiwanese foodUser

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

(natural language)

(inter-slot relation)

(semantic slot)

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

29

ContributionsUser

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

find a cheap eating place for taiwanese food

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

30

Ontology Induction Structure Learning Surface Form Derivation

Semantic Decoding Intent Prediction

ContributionsUser

Knowledge Acquisition SLU Modeling

find a cheap eating place for taiwanese food

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

31

Knowledge Acquisition1) Given unlabelled conversations how can a system automatically

induce and organize domain-specific concepts

Restaurant Asking

Conversations

target

foodprice

seeking

quantity

PREP_FOR

PREP_FOR

NN AMOD

AMODAMOD

Organized Domain Knowledge

Unlabelled Collection

Knowledge Acquisition

Knowledge Acquisition Ontology Induction Structure Learning Surface Form Derivation

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

32

SLU Modeling2) With the automatically acquired knowledge how can a system

understand utterance semantics and user intents

Organized Domain

Knowledge

price=ldquocheaprdquo target=ldquorestaurantrdquointent=navigation

SLU Modeling

SLU Component

ldquocan i have a cheap restaurantrdquo

SLU Modeling Semantic Decoding Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

33

SDS Architecture ndash Contributions

DomainDMASR SLU

NLG

Knowledge Acquisition SLU Modeling

current bottleneck

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

34

SDS Flowchart

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

35

SDS Flowchart ndash Semantic Decoding

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

36

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

37

Semantic Decoding [ACL-IJCNLPrsquo15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015

SLU Model

target=ldquorestaurantrdquoprice=ldquocheaprdquo

ldquocan I have a cheap restaurantrdquoFrame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology InductionFw Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

38

[Baker et al 1998 Das et al 2014]Frame-Semantic Parsing

FrameNet [Baker et al 1998] a linguistically semantic resource based on the frame-semantics theory wordsphrases can be represented as frames ldquolow fat milkrdquo ldquomilkrdquo evokes the ldquofoodrdquo frame

ldquolow fatrdquo fills the descriptor frame element

SEMAFOR [Das et al 2014] a state-of-the-art frame-semantics parser trained on manually annotated

FrameNet sentences

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

39

Ontology Induction [ASRUrsquo13 SLTrsquo14a]

can i have a cheap restaurant

Frame capability

Frame expensiveness

Frame locale by use

1st Issue differentiate domain-specific frames from generic frames for SDSs

GoodGood

Das et al Frame-semantic parsing in Proc of Computational Linguistics 2014

slot candidate

Best Student Paper Award

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

40

1

Utterance 1i would like a cheap restaurant Train

hellip hellip

hellip

cheap restaurant foodexpensiveness

1

locale_by_use

11

find a restaurant with chinese foodUtterance 2

1 1

food

1 1

1 Test

1 97 95

Frame Semantic Parsing

show me a list of cheap restaurantsTest Utterance

Word Observation Slot Candidate

Ontology Induction [ASRUrsquo13 SLTrsquo14a]Best Student Paper Award

Idea increase weights of domain-specific slots and decrease weights of others

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

41

1st Issue How to adapt generic slots to a domain-specific setting

Knowledge Graph Propagation Model Assumption domain-specific wordsslots have more dependencies to each other

Word Relation Model Slot Relation Model

word relation matrix

slot relation matrix

times

1

Word Observation Slot CandidateTrain

cheap restaurant foodexpensiveness

1

locale_by_use

11

1 1

food

1 1

1 Test

1

1

Slot Induction

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph so that domain-specific wordsslots have higher scores after matrix multiplication

i like

1 1

capability

1

locale_by_use

food expensiveness

seeking

relational_quantitydesiring

Utterance 1i would like a cheap restaurant

hellip hellip

find a restaurant with chinese foodUtterance 2

show me a list of cheap restaurantsTest Utterance

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

42

Semantic Decoding [ACL-IJCNLPrsquo15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015

SLU Model

target=ldquorestaurantrdquoprice=ldquocheaprdquo

ldquocan I have a cheap restaurantrdquoFrame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology InductionFw Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

43

Knowledge Graph Construction Syntactic dependency parsing on utterances

ccomp

amoddobjnsubj det

can i have a cheap restaurantcapability expensiveness locale_by_use

Word-based lexical knowledge graph

Slot-based semantic knowledge graph

restaurantcan

have

i

acheap

w

w

capabilitylocale_by_use expensiveness

s

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

44

Dependency-based word embeddings

Dependency-based slot embeddings

Edge Weight MeasurementSlotWord Embeddings Training (Levy and Goldberg 2014)

can = have =

expensiveness = capability =

can i have a cheap restaurant

ccomp

amoddobjnsubj det

have acapability expensiveness locale_by_use

ccomp

amoddobjnsubj det

Levy and Goldberg Dependency-Based Word Embeddings in Proc of ACL 2014

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

45

Edge Weight Measurement Compute edge weights to represent relation importance

Slot-to-slot semantic relation similarity between slot embeddings Slot-to-slot dependency relation dependency score between slot embeddings Word-to-word semantic relation similarity between word embeddings Word-to-word dependency relation dependency score between word embeddings

+

+

w1

w2

w3

w4

w5

w6

w7

s2

s1 s3

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

46

Word Relation Model Slot Relation Model

word relation matrix

slot relation matrix

times

1

Word Observation Slot Candidate

Train

cheap restaurant foodexpensiveness

1

locale_by_use

11

1 1

food

1 1

1 Test1

1

Slot Induction

Knowledge Graph Propagation Model119877119908

119878119863

119877119904119878119863

Structure information is integrated to make the self-training data more reliable

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

47

Ontology Induction

SLUFw Fs

Structure Learning

times

1

Utterance 1i would like a cheap restaurant

Word Observation Slot Candidate

Train

hellip

cheap restaurant foodexpensiveness

1

locale_by_use

11

find a restaurant with chinese foodUtterance 2

1 1

food

1 1

1

Test1 9790 9585

Ontology Induction

show me a list of cheap restaurantsTest Utterance hidden semantics

2nd Issue unobserved semantics may benefit understanding

Semantic Decoding [ACL-IJCNLPrsquo15]

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

48

Reasoning with Matrix Factorization

Word Relation Model Slot Relation Model

word relation matrix

slot relation matrix

times

1

Word Observation Slot Candidate

Train

cheap restaurant foodexpensiveness

1

locale_by_use

11

1 1

food

1 1

1 Test1

1

9790 9585

93 929805 05

Slot Induction

Feature Model + Knowledge Graph Propagation Model

119877119908119878119863

119877119904119878119863

Idea MF completes a partially-missing matrix based on a low-rank latent semantics assumption which is able to model hidden semantics and more robust to noisy data

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

49

2nd Issue How to model the unobserved hidden semantics

Matrix Factorization (MF) (Rendle et al 2009)

The decomposed matrices represent latent semantics for utterances and wordsslots respectively

The product of two matrices fills the probability of hidden semantics

1

Word Observation Slot Candidate

Train

cheap restaurant foodexpensiveness

1

locale_by_use

11

1 1

food

1 1

1 Test

1

1

9790 9585

93 929805 05

|119932|

|119934|+|119930|

asymp|119932|times119941 119941times (|119934|+|119930|)times

Rendle et al ldquoBPR Bayesian Personalized Ranking from Implicit Feedback in Proc of UAI 2009

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

50

Bayesian Personalized Ranking for MF Model implicit feedback

not treat unobserved facts as negative samples (true or false) give observed facts higher scores than unobserved facts

Objective

1

119891 +iquest iquest119891 minus119891 minus

The objective is to learn a set of well-ranked semantic slots per utterance

119906119909

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

51

Ontology Induction

SLUFw Fs

Structure Learning

times

1

Utterance 1i would like a cheap restaurant

Word Observation Slot Candidate

Train

hellip

cheap restaurant foodexpensiveness

1

locale_by_use

11

find a restaurant with chinese foodUtterance 2

1 1

food

1 1

1

Test1 9790 9585

Ontology Induction

show me a list of cheap restaurantsTest Utterance

Matrix Factorization SLU (MF-SLU)

MF-SLU can estimate probabilities for slot candidates given test utterances

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

52

Semantic Decoding [ACL-IJCNLPrsquo15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015

SLU Model

target=ldquorestaurantrdquoprice=ldquocheaprdquo

ldquocan I have a cheap restaurantrdquoFrame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology InductionFw Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation

Idea utilize the acquired knowledge to decode utterance semantics (fully unsupervised)

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

53

Experimental Setup Dataset Cambridge University SLU Corpus

Restaurant recommendation (WER = 37) 2166 dialogues 15453 utterances dialogue slot addr area food name phone postcode price range task type

Metric MAP of all estimated slot probabilities over all utterancesThe mapping table between induced and reference slots

Henderson et al Discriminative spoken language understanding using word confusion networks in Proc of SLT 2012

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

54

Experiments of Semantic DecodingQuality of Semantics Estimation

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

Approach ASR TranscriptsBaseline

SLUSupport Vector Machine 325 366

Multinomial Logistic Regression 340 388

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

The MF-SLU effectively models implicit information to decode semantics

The structure information further improves the results

Approach                            ASR             Transcripts
Baseline SLU:
  Support Vector Machine            32.5            36.6
  Multinomial Logistic Regression   34.0            38.8
Proposed MF-SLU:
  Feature Model                     37.6            45.3
  Feature Model +
    Knowledge Graph Propagation     43.5 (+27.9%)   53.4 (+37.6%)

The result is significantly better than the MLR baseline with p < 0.05 in a t-test


56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

Within the integrated structure information, both semantic and dependency relations are useful for understanding

Approach                                  ASR             Transcripts
Feature Model                             37.6            45.3
Feature + Knowledge Graph Propagation:
  Semantic                                41.4            51.6
  Dependency                              41.6            49.0
  All                                     43.5 (+15.7%)   53.4 (+17.9%)

The result is significantly better than the MLR baseline with p < 0.05 in a t-test


Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs

The reference ontology with the most frequent syntactic dependencies

[Figure: induced ontology aligned with the reference: slots locale_by_use, food, expensiveness, seeking, relational_quantity, desiring, type, area, task, and pricerange connected by dependency relations PREP_FOR, PREP_IN, NN, AMOD, and DOBJ]

The automatically learned domain ontology aligns well with the reference one

57

The data-driven ontology is more objective, while the expert-annotated one is more subjective

58

Contributions of Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding


59

Low- and High-Level Understanding: semantic concepts for individual utterances do not consider high-level semantics (user intents)

The follow-up behaviors usually correspond to user intents

price="cheap", target="restaurant"

SLU Model

"can i have a cheap restaurant"

intent=navigation

restaurant="legume", time="tonight"

SLU Model

"i plan to dine in legume tonight"

intent=reservation


60

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart – Intent Prediction


61

Outline Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances for making requests about launching an app

Output: the apps supporting the required functionality

Intent Identification: popular domains in Google Play

please dial a phone call to alex

Skype, Hangout, etc.

Intent Prediction of Mobile Apps [SLT'14c]

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.


63

Input: single-turn request

Output: apps that are able to support the required functionality

Intent Prediction – Single-Turn Request

[Figure: feature-enriched MF matrix: training rows come from retrieved app descriptions (Gmail: "... check and send emails, msgs ...", Outlook: "... your email, calendar, contacts ...") and self-train utterances; the test utterance "i would like to contact alex" has word observations (contact, email, message, ...) enriched with semantic features (communication, 0.90); reasoning with the feature-enriched MF estimates scores for the intended apps (Gmail, Outlook, Skype)]
The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents


64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

Challenge: language ambiguity; 1) user preference, 2) app-level contexts

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

[Figure: the request "send to vivian" is ambiguous between Email and Message (Communication); the previous turn helps disambiguate]

Idea: behavioral patterns in history can help intent prediction


65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

[Figure: training dialogues ("take this photo" → CAMERA; "tell vivian this is me in the lab" → IM; "check my grades on websites" → CHROME; "send an email to professor" → EMAIL) provide lexical features (photo, check, camera, tell, send, email, ...) and behavior-history features (null, camera, chrome, email); for the test dialogue "take a photo of this" → CAMERA, "send it to alice" → IM, reasoning with the feature-enriched MF estimates scores for the intended apps]

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction


66

Single-Turn Request: Mean Average Precision (MAP)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix     ASR LM   ASR MF-SLU   Transcripts LM   Transcripts MF-SLU
Word Observation   25.1     -            26.1             -

Feature Matrix     ASR MLR   ASR MF-SLU   Transcripts MLR   Transcripts MF-SLU
Word Observation   52.1      -            55.5              -

LM-Based IR Model (unsupervised)

Multinomial Logistic Regression (supervised)
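The unsupervised LM-based IR baseline above can be sketched as query-likelihood retrieval over app descriptions; the descriptions and the smoothing constant below are made up for illustration:

```python
import math
from collections import Counter

# Hypothetical app descriptions (stand-ins for Google Play text).
apps = {
    "Gmail":  "check and send emails and messages".split(),
    "Camera": "take photos and record video".split(),
}
vocab = {w for words in apps.values() for w in words}

def score(query, words, alpha=0.1):
    """Log query-likelihood of the request under an additively
    smoothed unigram language model of one description."""
    counts, n = Counter(words), len(words)
    v = len(vocab)
    return sum(math.log((counts[w] + alpha) / (n + alpha * v))
               for w in query)

query = "send an email to professor".split()
ranked = sorted(apps, key=lambda a: score(query, apps[a]), reverse=True)
print(ranked[0])  # "Gmail": its description shares "send" with the query
```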

Experiments for Intent Prediction


67

Single-Turn Request: Mean Average Precision (MAP)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix     ASR LM   ASR MF-SLU      Transcripts LM   Transcripts MF-SLU
Word Observation   25.1     29.2 (+16.2%)   26.1             30.4 (+16.4%)

Feature Matrix     ASR MLR   ASR MF-SLU     Transcripts MLR   Transcripts MF-SLU
Word Observation   52.1      52.7 (+1.2%)   55.5              55.4 (-0.2%)

Modeling hidden semantics helps intent prediction especially for noisy data

Experiments for Intent Prediction


68

Single-Turn Request: Mean Average Precision (MAP)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix                          ASR LM   ASR MF-SLU      Transcripts LM   Transcripts MF-SLU
Word Observation                        25.1     29.2 (+16.2%)   26.1             30.4 (+16.4%)
Word + Embedding-Based Semantics        32.0     -               33.3             -
Word + Type-Embedding-Based Semantics   31.5     -               32.9             -

Feature Matrix               ASR MLR   ASR MF-SLU     Transcripts MLR   Transcripts MF-SLU
Word Observation             52.1      52.7 (+1.2%)   55.5              55.4 (-0.2%)
Word + Behavioral Patterns   53.9      -              56.6              -

Semantic enrichment provides rich cues to improve performance

Experiments for Intent Prediction


69

Single-Turn Request: Mean Average Precision (MAP)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix                          ASR LM   ASR MF-SLU      Transcripts LM   Transcripts MF-SLU
Word Observation                        25.1     29.2 (+16.2%)   26.1             30.4 (+16.4%)
Word + Embedding-Based Semantics        32.0     34.2 (+6.8%)    33.3             33.3 (-0.2%)
Word + Type-Embedding-Based Semantics   31.5     32.2 (+2.1%)    32.9             34.0 (+3.4%)

Feature Matrix               ASR MLR   ASR MF-SLU     Transcripts MLR   Transcripts MF-SLU
Word Observation             52.1      52.7 (+1.2%)   55.5              55.4 (-0.2%)
Word + Behavioral Patterns   53.9      55.7 (+3.3%)   56.6              57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics

Experiments for Intent Prediction


70

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction: the Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors


71

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data: Back-end Databases, Services, and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"


72

Outline Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


73

Conclusions: The work shows the feasibility and the potential for improving generalization, maintenance efficiency, and scalability of SDSs

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances; better high-level intent prediction about follow-up behaviors


74

Future Work

Apply the proposed technology to domain discovery: domains not covered by the current systems but that users are interested in can guide the next developed domains

Improve the proposed approach by handling uncertainty: recognition errors from ASR, and unreliable knowledge from knowledge acquisition, within SLU modeling


75

[Figure: towards a deeper model: the utterance's word sequence x (w1, w2, ..., wd) is embedded into word vectors lw, passed through a convolutional layer lc (convolution matrix Wc) and a pooling operation to form an utterance vector lf; with slot vectors lf, a semantic projection matrix Ws (semantic layer y), and a knowledge graph propagation layer lp (propagation matrix Wp), the model produces semantic relations R(U, S1), ..., R(U, Sn) and posterior probabilities P(S1 | U), ..., P(Sn | U) over slot candidates S1, ..., Sn]

Towards Unsupervised Deep Learning


Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning

76

Take Home Message

Available big data w/o annotations

Challenge: how to acquire and organize important knowledge, and further utilize it for applications

Language understanding for AI

Language → action: understand voice to control music, lights, etc.; teach the assistant to let friends in by face recognition, etc.


Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A. THANKS FOR YOUR ATTENTION!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.


  • Statistical Learning from Dialogues for Intelligent Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants
  • Why do we need them
  • Why do we need them (2)
  • Why do companies care
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymax's intelligence
  • SDS Architecture
  • Interaction Example
  • SDS Process – Available Domain Ontology
  • SDS Process – Available Domain Ontology (2)
  • SDS Process – Available Domain Ontology (3)
  • SDS Process – Spoken Language Understanding (SLU)
  • SDS Process – Spoken Language Understanding (SLU) (2)
  • SDS Process – Dialogue Management (DM)
  • SDS Process – Dialogue Management (DM) (2)
  • SDS Process – Dialogue Management (DM) (3)
  • SDS Process – Dialogue Management (DM) (4)
  • SDS Process – Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture – Contributions
  • SDS Flowchart
  • SDS Flowchart – Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLP'15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRU'13, SLT'14a]
  • Ontology Induction [ASRU'13, SLT'14a] (2)
  • 1st Issue: How to adapt generic slots to a domain-specific sett
  • Semantic Decoding [ACL-IJCNLP'15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement: Slot/Word Embeddings Training (Levy and
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLP'15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue: How to model the unobserved hidden semantics: Matrix
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLP'15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding: Quality of Semantics Estimatio
  • Experiments of Semantic Decoding: Quality of Semantics Estimatio (2)
  • Experiments of Semantic Decoding: Effectiveness of Relations
  • Experiments for Structure Learning: Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart – Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLT'14c]
  • Intent Prediction – Single-Turn Request
  • Intent Prediction – Multi-Turn Interaction [ICMI'15]
  • Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q & A
Page 3: Statistical Learning from Dialogues for Intelligent Assistants

3

Outline Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


4

Outline Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


5

Apple Siri

(2011)

Google Now

(2012)

Microsoft Cortana

(2014)

Amazon AlexaEcho

(2014)

https://www.apple.com/ios/siri/  https://www.google.com/landing/now/  http://www.windowsphone.com/en-us/how-to/wp8/cortana/meet-cortana  http://www.amazon.com/oc/echo

Facebook M

(2015)

What are Intelligent Assistants


6

Why do we need them? Daily Life Usage

Weather, Schedule, Transportation, Restaurant Seeking


7

Why do we need them? Get things done

E.g., set up alarm/reminder, take note

Easy access to structured data, services, and apps

E.g., find docs/photos/restaurants

Assist your daily schedule and routine

E.g., commute alerts to/from work

Be more productive in managing your work and personal life


8

Why do companies care? Global Digital Statistics (January 2015)

Global Population: 7.21B

Active Internet Users: 3.01B

Active Social Media Accounts: 2.08B

Active Unique Mobile Users: 3.65B

The more natural and convenient input of the devices evolves towards speech


9

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data: Back-end Databases, Services, and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "restaurant suggestions", "call taxi"


10

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data: Back-end Databases, Services, and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"


11

Outline Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


12

Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions

Spoken dialogue systems are being incorporated into various devices (smart-phones, smart TVs, in-car navigation systems, etc.)

Good SDSs assist users to organize and access information conveniently

Spoken Dialogue System (SDS)

JARVIS – Iron Man's Personal Assistant; Baymax – Personal Healthcare Companion


13

Baymax is capable of maintaining a good spoken dialogue system and learning new knowledge for better understanding and interacting with people

What is Baymax's intelligence? (Big Hero 6: video content owned and licensed by Disney Entertainment, Marvel Entertainment LLC, etc.)



14

ASR: Automatic Speech Recognition; SLU: Spoken Language Understanding; DM: Dialogue Management; NLG: Natural Language Generation

SDS Architecture

[Figure: SDS pipeline (ASR → SLU → DM → NLG, backed by domain knowledge); SLU is the current bottleneck]


15

Interaction Example

User: find a cheap eating place for taiwanese food

Intelligent Agent: Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there.

Q: How does a dialogue system process this request?


16

SDS Process – Available Domain Ontology

User: find a cheap eating place for taiwanese food

[Figure: organized domain knowledge held by the intelligent agent: slots seeking, target, price, and food linked by PREP_FOR, AMOD, and NN dependency relations]


17

SDS Process – Available Domain Ontology

User: find a cheap eating place for taiwanese food

[Figure: organized domain knowledge held by the intelligent agent: slots seeking, target, price, and food linked by PREP_FOR, AMOD, and NN dependency relations]

Ontology Induction (semantic slot)


18

SDS Process – Available Domain Ontology

User: find a cheap eating place for taiwanese food

[Figure: organized domain knowledge held by the intelligent agent: slots seeking, target, price, and food linked by PREP_FOR, AMOD, and NN dependency relations]

Ontology Induction (semantic slot)

Structure Learning (inter-slot relation)


19

SDS Process – Spoken Language Understanding (SLU)

User: find a cheap eating place for taiwanese food

[Figure: domain ontology with slots seeking, target, price, food]

Intelligent Agent: seeking="find", target="eating place", price="cheap", food="taiwanese"


20

SDS Process – Spoken Language Understanding (SLU)

User: find a cheap eating place for taiwanese food

[Figure: domain ontology with slots seeking, target, price, food]

Intelligent Agent: seeking="find", target="eating place", price="cheap", food="taiwanese"

Semantic Decoding


21

SDS Process – Dialogue Management (DM)

User: find a cheap eating place for taiwanese food

[Figure: domain ontology with slots seeking, target, price, food]

Intelligent Agent: SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }


22

SDS Process – Dialogue Management (DM)

User: find a cheap eating place for taiwanese food

[Figure: domain ontology with slots seeking, target, price, food]

Intelligent Agent: SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Surface Form Derivation (natural language)


23

SDS Process – Dialogue Management (DM)

User: find a cheap eating place for taiwanese food

Intelligent Agent: SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" } → Din Tai Fung, Boiling Point

Predicted intent: navigation


24

SDS Process – Dialogue Management (DM)

User: find a cheap eating place for taiwanese food

Intelligent Agent: SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" } → Din Tai Fung, Boiling Point

Predicted intent: navigation

Intent Prediction


25

SDS Process – Natural Language Generation (NLG)

User: find a cheap eating place for taiwanese food

Intelligent Agent: Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there. (navigation)


26

Required Knowledge

User: find a cheap eating place for taiwanese food

Required domain-specific information:

Domain ontology (slots seeking, target, price, food with PREP_FOR, AMOD, NN relations)

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Predicted intent: navigation


27

Challenges for SDS

An SDS in a new domain requires: 1) a hand-crafted domain ontology, 2) utterances labelled with semantic representations, and 3) an SLU component for mapping utterances into semantic representations

Manual work results in high cost, long duration, and poor scalability of system development

The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, in order to handle open-domain requests

seeking="find", target="eating place", price="cheap", food="asian food"

find a cheap eating place for asian food

fully unsupervised

Prior Focus


28

Contributions

User: find a cheap eating place for taiwanese food

[Figure: contributions mapped onto the pipeline: Ontology Induction (semantic slot), Structure Learning (inter-slot relation), Surface Form Derivation (natural language), Semantic Decoding, and Intent Prediction, producing the domain ontology, the query SELECT restaurant { restaurant.price="cheap", restaurant.food="asian food" }, and the predicted intent: navigation]


29

Contributions

User: find a cheap eating place for taiwanese food

Ontology Induction, Structure Learning, Surface Form Derivation, Semantic Decoding, Intent Prediction


30

Contributions

User: find a cheap eating place for taiwanese food

Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation

SLU Modeling: Semantic Decoding, Intent Prediction


31

Knowledge Acquisition

1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

[Figure: from an unlabelled collection of restaurant-asking conversations to organized domain knowledge: slots seeking, target, price, food, and quantity linked by PREP_FOR, NN, and AMOD relations]

Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation


32

SLU Modeling

2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

Organized Domain Knowledge + SLU Component: "can i have a cheap restaurant" → price="cheap", target="restaurant", intent=navigation

SLU Modeling: Semantic Decoding, Intent Prediction


33

SDS Architecture – Contributions

[Figure: SDS pipeline (ASR → SLU → DM → NLG, backed by domain knowledge), annotated with Knowledge Acquisition and SLU Modeling; SLU is the current bottleneck]


34

SDS Flowchart

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling


35

SDS Flowchart – Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling


36

Outline Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


37

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[Figure: framework: frame-semantic parsing over an unlabeled collection feeds Ontology Induction (feature model: word matrix Fw, slot matrix Fs) and Structure Learning over lexical/semantic knowledge graphs (knowledge graph propagation model: word relation matrix Rw, slot relation matrix Rs); their product drives MF-SLU (SLU modeling by matrix factorization), which maps "can I have a cheap restaurant" to the semantic representation target="restaurant", price="cheap"]


38

Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistically semantic resource based on the frame-semantics theory; words/phrases can be represented as frames. E.g., in "low fat milk", "milk" evokes the "food" frame, and "low fat" fills the descriptor frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser, trained on manually annotated FrameNet sentences
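A frame-semantic parse of the running example can be pictured as the following hand-written, SEMAFOR-style (but heavily simplified) structure, whose evoked frames become slot candidates; the dictionary layout is an assumption for illustration, not SEMAFOR's actual output format:

```python
# Simplified frame-semantic parse of "can i have a cheap restaurant".
parse = [
    {"target": "can",        "frame": "capability",    "elements": {}},
    {"target": "cheap",      "frame": "expensiveness",
     "elements": {"item": "restaurant"}},
    {"target": "restaurant", "frame": "locale_by_use", "elements": {}},
]

# Each evoked frame is treated as a slot candidate for ontology induction.
slot_candidates = [p["frame"] for p in parse]
print(slot_candidates)  # ['capability', 'expensiveness', 'locale_by_use']
```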


39

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

"can i have a cheap restaurant" → evoked frames (slot candidates): capability, expensiveness, locale_by_use

1st Issue: differentiate domain-specific frames from generic frames for SDSs

Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.


40

[Figure: word observation / slot candidate matrix from frame-semantic parsing: training utterances "i would like a cheap restaurant" (words: cheap, restaurant; slots: expensiveness, locale_by_use) and "find a restaurant with chinese food" (words: restaurant, food; slots: locale_by_use, food); the test utterance "show me a list of cheap restaurants" receives estimated slot probabilities (e.g., 0.97, 0.95)]

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

Idea: increase weights of domain-specific slots and decrease weights of others


41

1st Issue: How to adapt generic slots to a domain-specific setting

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other

Word Relation Model (word relation matrix) × Slot Relation Model (slot relation matrix)

[Figure: the word observation / slot candidate matrix (train utterances "i would like a cheap restaurant", "find a restaurant with chinese food"; test utterance "show me a list of cheap restaurants"; slot candidates capability, locale_by_use, food, expensiveness, seeking, relational_quantity, desiring) is multiplied by the word and slot relation matrices for slot induction]

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication


42

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[Diagram: SLU model. "can I have a cheap restaurant" → frame-semantic parsing over an unlabeled collection → ontology induction builds the feature model (Fw, Fs) from a semantic KG, and structure learning builds the knowledge graph propagation model from a lexical KG (word relation model, Rw) and a semantic KG (slot relation model, Rs) → MF-SLU (SLU modeling by matrix factorization) → semantic representation, e.g., target="restaurant", price="cheap"]


43

Knowledge Graph Construction: syntactic dependency parsing on utterances

[Figure: dependency parse of "can i have a cheap restaurant" (nsubj, ccomp, dobj, det, amod) with the frames capability, expensiveness, and locale_by_use; the parse induces a word-based lexical knowledge graph over the words and a slot-based semantic knowledge graph over the slots]


44

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

[Figure: dependency-based word embeddings (e.g., vectors for "can" and "have") and dependency-based slot embeddings (e.g., vectors for expensiveness and capability) are trained from the dependency-parsed utterance "can i have a cheap restaurant"]

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
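A minimal sketch of how dependency-based (word, context) training pairs differ from window-based ones; the parse triples and relation labels below are illustrative, not actual parser output:

```python
# Hypothetical dependency triples (head, dependent, relation) for
# "can i have a cheap restaurant".
parse = [
    ("have", "can", "ccomp"),
    ("have", "i", "nsubj"),
    ("have", "restaurant", "dobj"),
    ("restaurant", "a", "det"),
    ("restaurant", "cheap", "amod"),
]

def dependency_contexts(parse):
    """Build (word, context) pairs in the style of Levy & Goldberg:
    each word is paired with its syntactic neighbor annotated with the
    relation label, plus the inverse relation for the dependent."""
    pairs = []
    for head, dep, rel in parse:
        pairs.append((head, f"{dep}/{rel}"))
        pairs.append((dep, f"{head}/{rel}-1"))
    return pairs

pairs = dependency_contexts(parse)
```

These pairs replace the linear-window contexts of skip-gram, so words with similar syntactic roles end up with similar embeddings.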


45

Edge Weight Measurement: compute edge weights to represent relation importance

Slot-to-slot semantic relation: similarity between slot embeddings
Slot-to-slot dependency relation: dependency score between slot embeddings
Word-to-word semantic relation: similarity between word embeddings
Word-to-word dependency relation: dependency score between word embeddings
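A semantic edge weight of this kind is typically a cosine similarity between embeddings; the 4-d vectors below are made-up values for illustration:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical slot embeddings (real ones are trained as on the
# previous slide).
expensiveness = np.array([0.9, 0.1, 0.2, 0.0])
locale_by_use = np.array([0.8, 0.3, 0.1, 0.1])

# Slot-to-slot semantic edge weight in the knowledge graph.
weight = cosine(expensiveness, locale_by_use)
```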

[Figure: knowledge graph with word nodes w1–w7 and slot nodes s1–s3; edge weights combine the semantic and dependency relation scores]


46

Knowledge Graph Propagation Model

[Figure: the word-observation / slot-candidate training matrix is multiplied by the word relation matrix R_w and the slot relation matrix R_s for slot induction]

Structure information is integrated to make the self-training data more reliable


47

[Figure: ontology induction feeds the SLU feature model (Fw, Fs) together with structure learning; training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" populate the word-observation / slot-candidate matrix, while the test utterance "show me a list of cheap restaurants" carries hidden semantics]

2nd Issue: unobserved semantics may benefit understanding

Semantic Decoding [ACL-IJCNLP'15]


48

Reasoning with Matrix Factorization

Feature Model + Knowledge Graph Propagation Model

[Figure: the word-observation / slot-candidate matrix, multiplied by the relation matrices R_w and R_s, is completed with estimated slot probabilities (e.g., 0.97, 0.90, 0.95, 0.85) for slot induction]

Idea: MF completes a partially-missing matrix under a low-rank latent-semantics assumption, which models hidden semantics and is more robust to noisy data.


49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and words/slots, respectively.

The product of the two matrices fills in the probability of the hidden semantics.
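The completion idea can be sketched with a tiny squared-loss factorization — a toy stand-in with hypothetical data, not the actual model (which optimizes the BPR objective on the next slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy utterance-by-(word/slot) matrix: 1.0 = observed fact,
# np.nan = unobserved cell to be filled in (hypothetical data).
F = np.array([[1.0,    1.0,    np.nan],
              [1.0,    np.nan, 1.0],
              [np.nan, 1.0,    1.0]])
mask = ~np.isnan(F)

d = 2  # latent dimension
U = rng.normal(scale=0.1, size=(F.shape[0], d))  # utterance factors
V = rng.normal(scale=0.1, size=(F.shape[1], d))  # word/slot factors

# Gradient descent on the observed cells only.
for _ in range(2000):
    E = np.where(mask, F - U @ V.T, 0.0)  # error on observed cells
    U += 0.05 * (E @ V)
    V += 0.05 * (E.T @ U)

# The low-rank product fills in scores for the unobserved cells.
completed = U @ V.T
```

Only the observed cells constrain the factors, yet the rank-d product assigns a score to every cell — the "hidden semantics" the slide refers to.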

[Figure: the |U| × (|W|+|S|) word-observation / slot-candidate matrix is approximated by the product of a |U| × d matrix and a d × (|W|+|S|) matrix, filling in estimated probabilities (e.g., 0.97, 0.90, 0.95, 0.85) for the test cells]

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.


50

Bayesian Personalized Ranking for MF: model implicit feedback

Do not treat unobserved facts as negative samples (true or false); instead, give observed facts higher scores than unobserved facts.

Objective: for each utterance u_x, maximize Σ ln σ(f⁺ − f⁻) over pairs of an observed fact f⁺ and an unobserved fact f⁻.

The objective is to learn a set of well-ranked semantic slots per utterance.
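A minimal SGD sketch of the BPR idea, pushing observed (utterance, slot) facts above unobserved ones; the slot names and all numbers are hypothetical:

```python
import math
import random

random.seed(0)

d = 4  # latent dimension
u = [random.gauss(0, 0.1) for _ in range(d)]           # utterance factors
slots = {s: [random.gauss(0, 0.1) for _ in range(d)]   # slot factors
         for s in ["expensiveness", "locale_by_use", "food", "capability"]}

observed = ["expensiveness", "locale_by_use"]   # f+ facts
unobserved = ["food", "capability"]             # f- facts

def score(u, v):
    return sum(a * b for a, b in zip(u, v))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# BPR-style SGD: maximize ln sigmoid(f+ - f-) over sampled pairs.
lr = 0.1
for _ in range(200):
    pos = slots[random.choice(observed)]
    neg = slots[random.choice(unobserved)]
    g = 1.0 - sigmoid(score(u, pos) - score(u, neg))   # gradient scale
    for k in range(d):
        u[k] += lr * g * (pos[k] - neg[k])
        pos[k] += lr * g * u[k]
        neg[k] -= lr * g * u[k]
```

After a few hundred updates the observed slots outrank the unobserved ones for this utterance, without ever labeling the unobserved facts as hard negatives.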


51

[Figure: ontology induction feeds the SLU feature model (Fw, Fs) together with structure learning; given the test utterance "show me a list of cheap restaurants", the trained model estimates slot probabilities (e.g., 0.97, 0.90, 0.95, 0.85)]

Matrix Factorization SLU (MF-SLU)

MF-SLU can estimate probabilities for slot candidates given test utterances


52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[Diagram: SLU model. "can I have a cheap restaurant" → frame-semantic parsing over an unlabeled collection → ontology induction builds the feature model (Fw, Fs) from a semantic KG, and structure learning builds the knowledge graph propagation model from a lexical KG (word relation model, Rw) and a semantic KG (slot relation model, Rs) → MF-SLU (SLU modeling by matrix factorization) → semantic representation, e.g., target="restaurant", price="cheap"]

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)


53

Experimental Setup

Dataset: Cambridge University SLU Corpus — restaurant recommendation (WER = 37%); 2,166 dialogues; 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type

Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
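The MAP metric averages the per-utterance average precision of the ranked slot list; a small sketch of the computation (the example slots are chosen for illustration):

```python
def average_precision(ranked, relevant):
    """AP of one ranked slot list against the reference slot set."""
    hits, total = 0, 0.0
    for i, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

# Slots ranked by estimated probability for one utterance, scored
# against the reference annotation (hypothetical example).
ranked = ["expensiveness", "capability", "locale_by_use"]
relevant = {"expensiveness", "locale_by_use"}
ap = average_precision(ranked, relevant)  # (1/1 + 2/3) / 2 = 5/6

# MAP is the mean of AP over all utterances.
```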


54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                            ASR    Transcripts
Baseline SLU
  Support Vector Machine            32.5   36.6
  Multinomial Logistic Regression   34.0   38.8


55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                  ASR             Transcripts
Baseline SLU
  Support Vector Machine                  32.5            36.6
  Multinomial Logistic Regression         34.0            38.8
Proposed MF-SLU
  Feature Model                           37.6            45.3
  Feature Model + Knowledge
    Graph Propagation                     43.5 (+27.9%)   53.4 (+37.6%)

The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results.

The result is significantly better than the MLR baseline (p < 0.05, t-test).


56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                  ASR             Transcripts
Feature Model                             37.6            45.3
Feature + Knowledge Graph Propagation
  Semantic                                41.4            51.6
  Dependency                              41.6            49.0
  All                                     43.5 (+15.7%)   53.4 (+17.9%)

In the integrated structure information, both semantic and dependency relations are useful for understanding.

The result is significantly better than the MLR baseline (p < 0.05, t-test).


Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs

The reference ontology with the most frequent syntactic dependencies:

[Figure: the induced ontology connects locale_by_use, food, expensiveness, seeking, relational_quantity, and desiring via PREP_FOR, NN, AMOD, and DOBJ dependencies; the reference ontology connects type, food, pricerange, area, and task via DOBJ, AMOD, and PREP_IN dependencies]

The automatically learned domain ontology aligns well with the reference one.

57

The data-driven ontology is more objective, while the expert-annotated one is more subjective.

58

Contributions of Semantic Decoding

[Diagram: knowledge acquisition (ontology induction, structure learning) and SLU modeling (semantic decoding, intent prediction)]

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.


59

Low- and High-Level Understanding

Semantic concepts for individual utterances do not consider high-level semantics (user intents); the follow-up behaviors usually correspond to user intents.

"can i have a cheap restaurant" → SLU Model → price="cheap", target="restaurant"; intent=navigation
"i plan to dine in legume tonight" → SLU Model → restaurant="legume", time="tonight"; intent=reservation


60

SDS Flowchart – Intent Prediction

[Diagram: knowledge acquisition (ontology induction, structure learning) and SLU modeling (semantic decoding, intent prediction)]


61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances for making requests about launching an app

Output: the apps supporting the required functionality

Intent Identification: popular domains in Google Play

"please dial a phone call to alex" → Skype, Hangout, etc.

Intent Prediction of Mobile Apps [SLT'14c]

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.


63

Intent Prediction – Single-Turn Request

Input: single-turn request

Output: apps that are able to support the required functionality

[Figure: reasoning with feature-enriched MF for single-turn requests. The test utterance "i would like to contact alex" is enriched with the semantic feature "communication" (0.90); IR retrieves app candidates from app descriptions (e.g., Outlook: "... your email calendar contacts ...", Gmail: "... check and send emails msgs ..."); self-trained utterances and word observations ("contact", "message", "email") link to the intended apps (Gmail, Outlook, Skype), and the MF model fills in scores (e.g., 0.90, 0.85, 0.97, 0.95)]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents


64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

Challenge: language ambiguity — 1) user preference; 2) app-level contexts

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

"send to vivian" — Email vs. Message (Communication)? The previous turn helps disambiguate.

Idea: behavioral patterns in history can help intent prediction


65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

[Figure: reasoning with feature-enriched MF for multi-turn interaction. A training dialogue ("take this photo" → CAMERA; "tell vivian this is me in the lab" → IM; "check my grades on websites" → CHROME; "send an email to professor" → EMAIL) populates lexical features ("photo", "check", "camera", "tell", "send"), behavior-history features (null, camera, chrome, email), and intended apps; for the test dialogue ("take a photo of this", "send it to alice" → CAMERA, IM) the MF model fills in scores (e.g., 0.85, 0.70, 0.95, 0.80, 0.55)]

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction


66

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix     | ASR: LM / MF-SLU | Transcripts: LM / MF-SLU
Word Observation   | 25.1 / –         | 26.1 / –

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix     | ASR: MLR / MF-SLU | Transcripts: MLR / MF-SLU
Word Observation   | 52.1 / –          | 55.5 / –

LM: LM-based IR model (unsupervised); MLR: multinomial logistic regression (supervised)


67

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix     | ASR: LM / MF-SLU     | Transcripts: LM / MF-SLU
Word Observation   | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix     | ASR: MLR / MF-SLU   | Transcripts: MLR / MF-SLU
Word Observation   | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.


68

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix                         | ASR: LM / MF-SLU     | Transcripts: LM / MF-SLU
Word Observation                       | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics       | 32.0 / –             | 33.3 / –
Word + Type-Embedding-Based Semantics  | 31.5 / –             | 32.9 / –

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix              | ASR: MLR / MF-SLU   | Transcripts: MLR / MF-SLU
Word Observation            | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns  | 53.9 / –            | 56.6 / –

Semantic enrichment provides rich cues to improve performance.


69

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix                         | ASR: LM / MF-SLU     | Transcripts: LM / MF-SLU
Word Observation                       | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics       | 32.0 / 34.2 (+6.8%)  | 33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics  | 31.5 / 32.2 (+2.1%)  | 32.9 / 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix              | ASR: MLR / MF-SLU   | Transcripts: MLR / MF-SLU
Word Observation            | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns  | 53.9 / 55.7 (+3.3%) | 56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.


70

Contributions of Intent Prediction

[Diagram: knowledge acquisition (ontology induction, structure learning) and SLU modeling (semantic decoding, intent prediction)]

Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.


71

Personal Intelligent Architecture

Reactive Assistance: ASR, LU, Dialog, LG, TTS (e.g., "call taxi")

Proactive Assistance: Inferences, User Modeling, Suggestions

Data: Back-end Data Bases, Services, and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps) — User Experience


72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


73

Conclusions

The work shows the feasibility and the potential of improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.


74

Future Work

Apply the proposed technology to domain discovery: find domains that are not covered by the current systems but that users are interested in, to guide the next developed domains.

Improve the proposed approach by handling uncertainty: recognition errors from ASR, and unreliable knowledge from knowledge acquisition, in SLU modeling.


75

Towards Unsupervised Deep Learning

[Diagram: a convolutional architecture for SLU — word sequence x (w1, w2, ..., wd) → word vectors lw → convolutional layer lc (convolution matrix Wc) → pooling operation → utterance vector lf and slot vectors lf → knowledge graph propagation layer lp (propagation matrix Wp) → semantic layer y (semantic projection matrix Ws), producing posterior probabilities P(S1 | U), P(S2 | U), ..., P(Sn | U) and semantic relations R(U, S1), ..., R(U, Sn) for utterance U and slot candidates S1..Sn]


Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.

76

Take Home Message

Available: big data w/o annotations. Challenge: how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI: language → action — understand voice commands to control music, lights, etc.; teach the system to let friends in by face recognition, etc.


Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A — THANKS FOR YOUR ATTENTION

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.


Page 4: Statistical Learning from Dialogues for Intelligent Assistants

4

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


5

Apple Siri

(2011)

Google Now

(2012)

Microsoft Cortana

(2014)

Amazon Alexa/Echo

(2014)

https://www.apple.com/ios/siri/
https://www.google.com/landing/now/
http://www.windowsphone.com/en-us/how-to/wp8/cortana/meet-cortana
http://www.amazon.com/oc/echo

Facebook M

(2015)

What are Intelligent Assistants?


6

Why do we need them? Daily Life Usage

Weather, Schedule, Transportation, Restaurant Seeking


7

Why do we need them?

Get things done: e.g., set up an alarm/reminder, take a note

Easy access to structured data, services, and apps: e.g., find docs/photos/restaurants

Assist your daily schedule and routine: e.g., commute alerts to/from work

Be more productive in managing your work and personal life


8

Why do companies care? Global Digital Statistics (January 2015):

Global Population: 7.21B
Active Internet Users: 3.01B
Active Social Media Accounts: 2.08B
Active Unique Mobile Users: 3.65B

Device input is evolving towards speech, the more natural and convenient modality.


9

Personal Intelligent Architecture

Reactive Assistance: ASR, LU, Dialog, LG, TTS (e.g., "restaurant suggestions", "call taxi")

Proactive Assistance: Inferences, User Modeling, Suggestions

Data: Back-end Data Bases, Services, and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps) — User Experience


10

Personal Intelligent Architecture

Reactive Assistance: ASR, LU, Dialog, LG, TTS (e.g., "call taxi")

Proactive Assistance: Inferences, User Modeling, Suggestions

Data: Back-end Data Bases, Services, and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps) — User Experience


11

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


12

Spoken Dialogue System (SDS)

Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions.

Spoken dialogue systems are being incorporated into various devices (smartphones, smart TVs, in-car navigation systems, etc.).

Good SDSs assist users to organize and access information conveniently.

JARVIS – Iron Man's Personal Assistant; Baymax – Personal Healthcare Companion


13

What is Baymax's intelligence?

Baymax is capable of maintaining a good spoken dialogue system and learning new knowledge for better understanding and interacting with people.

(Big Hero 6 — video content owned and licensed by Disney Entertainment, Marvel Entertainment LLC, etc.)



14

SDS Architecture

ASR: Automatic Speech Recognition; SLU: Spoken Language Understanding; DM: Dialogue Management; NLG: Natural Language Generation

[Diagram: ASR → SLU → DM → NLG over a domain knowledge base; SLU is the current bottleneck]


15

Interaction Example

User: find a cheap eating place for taiwanese food

Intelligent Agent: Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there.

Q: How does a dialogue system process this request?


16

SDS Process – Available Domain Ontology

User: find a cheap eating place for taiwanese food

[Diagram: organized domain knowledge — the slots seeking, target, price, and food connected by PREP_FOR, AMOD, and NN relations]


17

SDS Process – Available Domain Ontology

User: find a cheap eating place for taiwanese food

[Diagram: organized domain knowledge — the slots seeking, target, price, and food connected by PREP_FOR, AMOD, and NN relations]

Ontology Induction (semantic slot)


18

SDS Process – Available Domain Ontology

User: find a cheap eating place for taiwanese food

[Diagram: organized domain knowledge — the slots seeking, target, price, and food connected by PREP_FOR, AMOD, and NN relations]

Ontology Induction (semantic slot)
Structure Learning (inter-slot relation)


19

SDS Process – Spoken Language Understanding (SLU)

User: find a cheap eating place for taiwanese food

seeking="find", target="eating place", price="cheap", food="taiwanese"


20

SDS Process – Spoken Language Understanding (SLU)

User: find a cheap eating place for taiwanese food

seeking="find", target="eating place", price="cheap", food="taiwanese"

Semantic Decoding


21

SDS Process – Dialogue Management (DM)

User: find a cheap eating place for taiwanese food

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }


22

SDS Process – Dialogue Management (DM)

User: find a cheap eating place for taiwanese food

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Surface Form Derivation (natural language)


23

SDS Process – Dialogue Management (DM)

User: find a cheap eating place for taiwanese food

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" } → Din Tai Fung, Boiling Point

Predicted intent: navigation


24

SDS Process – Dialogue Management (DM)

User: find a cheap eating place for taiwanese food

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" } → Din Tai Fung, Boiling Point

Predicted intent: navigation

Intent Prediction


25

SDS Process ndash Natural Language Generation (NLG)

User

Intelligent Agent

Cheap Taiwanese eating places include Din Tai Fung Boiling Point etc What do you want to choose I can help you go there (navigation)

find a cheap eating place for taiwanese food
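The NLG step above can be pictured as template realization over the DM's results. A minimal sketch, assuming a fixed template (the actual system's templates are not shown in the talk):

```python
def generate(results: list, intent: str) -> str:
    """Template-based NLG: realize DB results plus a follow-up offer."""
    names = ", ".join(results)
    reply = (f"Cheap Taiwanese eating places include {names}, etc. "
             "Which do you want to choose?")
    if intent == "navigation":
        reply += " I can help you go there."
    return reply

reply = generate(["Din Tai Fung", "Boiling Point"], "navigation")
```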

26

Required Knowledge

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Predicted intent: navigation

User

Required Domain-Specific Information

find a cheap eating place for taiwanese food

27

Challenges for SDS: An SDS in a new domain requires

1) a hand-crafted domain ontology, 2) utterances labelled with semantic representations, and 3) an SLU component for mapping utterances into semantic representations.

Manual work results in high cost, long duration, and poor scalability of system development.

The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, in order to handle open-domain requests.

seeking="find", target="eating place", price="cheap", food="asian food"

find a cheap eating place for asian food

fully unsupervised

Prior Focus

28

Contributions

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant: restaurant.price="cheap", restaurant.food="asian food"

Predicted intent: navigation

User: find a cheap eating place for taiwanese food

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

(natural language)

(inter-slot relation)

(semantic slot)

29

Contributions
User

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

find a cheap eating place for taiwanese food

30

Ontology Induction Structure Learning Surface Form Derivation

Semantic Decoding Intent Prediction

Contributions
User

Knowledge Acquisition SLU Modeling

find a cheap eating place for taiwanese food

31

Knowledge Acquisition: 1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

Restaurant Asking

Conversations

target

foodprice

seeking

quantity

PREP_FOR

PREP_FOR

NN AMOD

AMODAMOD

Organized Domain Knowledge

Unlabelled Collection

Knowledge Acquisition

Knowledge Acquisition Ontology Induction Structure Learning Surface Form Derivation

32

SLU Modeling: 2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

Organized Domain

Knowledge

price="cheap", target="restaurant", intent=navigation

SLU Modeling

SLU Component

"can i have a cheap restaurant"

SLU Modeling Semantic Decoding Intent Prediction

33

SDS Architecture – Contributions

ASR | SLU | DM | NLG | Domain

NLG

Knowledge Acquisition SLU Modeling

current bottleneck

34

SDS Flowchart

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

35

SDS Flowchart – Semantic Decoding

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

36

Outline Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

37

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant" → Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology InductionFw Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation

38

[Baker et al., 1998; Das et al., 2014] Frame-Semantic Parsing

FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory; words/phrases can be represented as frames. E.g., in "low fat milk", "milk" evokes the "food" frame, and "low fat" fills the descriptor frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser, trained on manually annotated FrameNet sentences.

39

Ontology Induction [ASRU'13, SLT'14a]

can i have a cheap restaurant

Frame capability

Frame expensiveness

Frame locale by use

1st Issue: differentiate domain-specific frames from generic frames for SDSs


Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.

slot candidate

Best Student Paper Award

40

1

Utterance 1: i would like a cheap restaurant (Train)

hellip hellip

hellip

cheap restaurant foodexpensiveness

1

locale_by_use

11

Utterance 2: find a restaurant with chinese food

1 1

food

1 1

1 Test

1 .97 .95

Frame Semantic Parsing

Test Utterance: show me a list of cheap restaurants

Word Observation Slot Candidate

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

Idea increase weights of domain-specific slots and decrease weights of others

41

1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.

Word Relation Model Slot Relation Model

word relation matrix

slot relation matrix

times

1

Word Observation Slot CandidateTrain

cheap restaurant foodexpensiveness

1

locale_by_use

11

1 1

food

1 1

1 Test

1

1

Slot Induction

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.

i like

1 1

capability

1

locale_by_use

food expensiveness

seeking

relational_quantitydesiring

Utterance 1: i would like a cheap restaurant

hellip hellip

Utterance 2: find a restaurant with chinese food

Test Utterance: show me a list of cheap restaurants
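One propagation step can be sketched as a matrix product: the utterance-by-slot feature matrix times a slot relation matrix. All values below are invented for illustration; only the mechanism matches the slide:

```python
import numpy as np

# Toy feature matrix: rows = utterances, columns = slot candidates.
slots = ["expensiveness", "locale_by_use", "food", "capability"]
F = np.array([
    [1.0, 1.0, 0.0, 0.0],   # "i would like a cheap restaurant"
    [0.0, 1.0, 1.0, 0.0],   # "find a restaurant with chinese food"
])

# Slot relation matrix R: entry (i, j) is the knowledge-graph edge weight
# between slots i and j (made-up numbers; domain slots are densely linked).
R = np.array([
    [1.0, 0.8, 0.3, 0.1],
    [0.8, 1.0, 0.7, 0.1],
    [0.3, 0.7, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
])

# One propagation step: each slot inherits scores from its neighbors, so
# densely connected (domain-specific) slots gain weight, e.g. "food" for
# utterance 1 becomes nonzero while generic "capability" stays low.
F_prop = F @ R
```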

42

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant" → Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology InductionFw Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation

43

Knowledge Graph Construction Syntactic dependency parsing on utterances

ccomp

amoddobjnsubj det

can i have a cheap restaurant → capability, expensiveness, locale_by_use

Word-based lexical knowledge graph

Slot-based semantic knowledge graph

Word nodes (w): can, i, have, a, cheap, restaurant

Slot nodes (s): capability, locale_by_use, expensiveness

44

Dependency-based word embeddings

Dependency-based slot embeddings

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

can = have =

expensiveness = capability =

can i have a cheap restaurant

ccomp

amoddobjnsubj det

have acapability expensiveness locale_by_use

ccomp

amoddobjnsubj det

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.

45

Edge Weight Measurement: compute edge weights to represent relation importance

Slot-to-slot semantic relation: similarity between slot embeddings. Slot-to-slot dependency relation: dependency score between slot embeddings. Word-to-word semantic relation: similarity between word embeddings. Word-to-word dependency relation: dependency score between word embeddings.

+

+

w1

w2

w3

w4

w5

w6

w7

s2

s1 s3
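The semantic edge weights above are similarities between trained embeddings. A hedged sketch with invented 3-d vectors standing in for the dependency-based embeddings (real vectors come from training on the corpus):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors, used as an edge weight."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Invented toy embeddings; only their relative geometry is meaningful here.
expensiveness = np.array([0.9, 0.1, 0.2])
food          = np.array([0.8, 0.3, 0.1])
capability    = np.array([0.1, 0.9, 0.8])

# Domain slots ("expensiveness", "food") end up with a heavier edge between
# them than either has to the generic "capability" slot.
w_domain  = cosine(expensiveness, food)
w_generic = cosine(expensiveness, capability)
```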

46

Word Relation Model Slot Relation Model

word relation matrix

slot relation matrix

times

1

Word Observation Slot Candidate

Train

cheap restaurant foodexpensiveness

1

locale_by_use

11

1 1

food

1 1

1 Test1

1

Slot Induction

Knowledge Graph Propagation Model: R_w^SD (word relation matrix) and R_s^SD (slot relation matrix)

Structure information is integrated to make the self-training data more reliable

47

Ontology Induction

SLUFw Fs

Structure Learning

times

1

Utterance 1: i would like a cheap restaurant

Word Observation Slot Candidate

Train

hellip

cheap restaurant foodexpensiveness

1

locale_by_use

11

Utterance 2: find a restaurant with chinese food

1 1

food

1 1

1

Test1 9790 9585

Ontology Induction

Test Utterance: show me a list of cheap restaurants (hidden semantics)

2nd Issue: unobserved semantics may benefit understanding

Semantic Decoding [ACL-IJCNLP'15]

48

Reasoning with Matrix Factorization

Word Relation Model Slot Relation Model

word relation matrix

slot relation matrix

times

1

Word Observation Slot Candidate

Train

cheap restaurant foodexpensiveness

1

locale_by_use

11

1 1

food

1 1

1 Test1

1

9790 9585

93 929805 05

Slot Induction

Feature Model + Knowledge Graph Propagation Model

R_w^SD (word relation matrix)

R_s^SD (slot relation matrix)

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.
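Low-rank completion can be sketched with a truncated SVD standing in for the learned factorization (the thesis trains the factors with BPR rather than SVD; the matrix values here are toy):

```python
import numpy as np

# Observed utterance-by-(word+slot) matrix; 0 entries are unobserved.
M = np.array([
    [1.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 1.0],
    [1.0, 0.0, 1.0, 1.0],
])

# Rank-2 factorization: U holds latent utterance semantics, V latent
# word/slot semantics; their product fills in the missing entries.
u, s, vt = np.linalg.svd(M, full_matrices=False)
d = 2
U = u[:, :d] * s[:d]          # |U| x d
V = vt[:d, :]                 # d x (|W|+|S|)
M_hat = U @ V                 # reconstructed matrix with estimated hidden entries
```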

49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and words/slots, respectively.

The product of the two matrices fills in the probabilities of hidden semantics.

1

Word Observation Slot Candidate

Train

cheap restaurant foodexpensiveness

1

locale_by_use

11

1 1

food

1 1

1 Test

1

1

9790 9585

93 929805 05

The |U| × (|W|+|S|) matrix is approximated by the product of two low-rank matrices: (|U| × d) times (d × (|W|+|S|)).

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.

50

Bayesian Personalized Ranking for MF: model implicit feedback

Do not treat unobserved facts as negative samples (true or false); give observed facts higher scores than unobserved facts.

Objective

1

119891 +iquest iquest119891 minus119891 minus

The objective is to learn a set of well-ranked semantic slots per utterance

119906119909
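The BPR objective can be sketched as gradient ascent on ln σ(f⁺ − f⁻) for a latent utterance vector and a positive/negative slot vector pair. Dimensions, learning rate, and the in-place update order below are illustrative choices, not the thesis's exact training setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def bpr_step(u, v_pos, v_neg, lr=0.05):
    """One BPR update: push the observed slot's score f+ above f-."""
    x = u @ v_pos - u @ v_neg        # margin f+ - f-
    g = 1.0 / (1.0 + np.exp(x))      # sigma(-x), the gradient scale
    u += lr * g * (v_pos - v_neg)    # all three updates increase the margin
    v_pos += lr * g * u
    v_neg -= lr * g * u

# Random latent vectors: utterance u, observed slot v_pos, unobserved v_neg.
u, v_pos, v_neg = rng.normal(size=(3, 4))
before = u @ v_pos - u @ v_neg
for _ in range(200):
    bpr_step(u, v_pos, v_neg)
after = u @ v_pos - u @ v_neg      # margin grows: observed slot ranked higher
```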

51

Ontology Induction

SLUFw Fs

Structure Learning

times

1

Utterance 1: i would like a cheap restaurant

Word Observation Slot Candidate

Train

hellip

cheap restaurant foodexpensiveness

1

locale_by_use

11

Utterance 2: find a restaurant with chinese food

1 1

food

1 1

1

Test1 9790 9585

Ontology Induction

Test Utterance: show me a list of cheap restaurants

Matrix Factorization SLU (MF-SLU)

MF-SLU can estimate probabilities for slot candidates given test utterances

52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant" → Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology InductionFw Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)

53

Experimental Setup. Dataset: Cambridge University SLU Corpus

Restaurant recommendation (WER = 37%); 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type

Metric: MAP of all estimated slot probabilities over all utterances. The mapping table between induced and reference slots is used for evaluation.

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
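The MAP metric used throughout the experiments can be computed as the mean of per-utterance average precision over the ranked slot lists. A minimal sketch with made-up slot names:

```python
def average_precision(ranked, relevant):
    """AP of one utterance's ranked slot list against its reference slots."""
    hits, score = 0, 0.0
    for i, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            score += hits / i        # precision at each relevant rank
    return score / max(len(relevant), 1)

def mean_average_precision(all_ranked, all_relevant):
    """MAP: mean of per-utterance average precision."""
    aps = [average_precision(r, g) for r, g in zip(all_ranked, all_relevant)]
    return sum(aps) / len(aps)

map_score = mean_average_precision(
    [["price", "food", "area"], ["food", "phone"]],   # ranked slot outputs
    [{"price", "food"}, {"phone"}],                   # reference slots
)
# map_score == 0.75  (AP = 1.0 and 0.5 for the two utterances)
```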

54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8

55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

The MF-SLU effectively models implicit information to decode semantics

The structure information further improves the results

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
Proposed MF-SLU: Feature Model | 37.6 | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)

The result is significantly better than the MLR baseline with p < 0.05 in a t-test.

56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

In the integrated structure information, both semantic and dependency relations are useful for understanding.

Approach | ASR | Transcripts
Feature Model | 37.6 | 45.3
Feature + Knowledge Graph Propagation: Semantic | 41.4 | 51.6
Feature + Knowledge Graph Propagation: Dependency | 41.6 | 49.0
Feature + Knowledge Graph Propagation: All | 43.5 (+15.7%) | 53.4 (+17.9%)

The result is significantly better than the MLR baseline with p < 0.05 in a t-test.

Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs

The reference ontology with the most frequent syntactic dependencies

locale_by_use

food expensiveness

seeking

relational_quantity

PREP_FOR

PREP_FOR

NN AMOD

AMOD

AMODdesiring

DOBJ

type

food pricerange

DOBJ

AMOD AMOD

AMOD

taskarea

PREP_IN

The automatically learned domain ontology aligns well with the reference one

57

The data-driven ontology is more objective, while the expert-annotated one is more subjective.

58

Contributions of Semantic Decoding

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.

59

Low- and High-Level Understanding Semantic concepts for individual utterances do not consider high-level semantics (user intents)

The follow-up behaviors usually correspond to user intents

price="cheap", target="restaurant"

SLU Model

"can i have a cheap restaurant"

intent=navigation

restaurant="legume", time="tonight"

SLU Model

"i plan to dine in legume tonight"

intent=reservation

60

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart – Intent Prediction

61

Outline Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app

Output: the apps supporting the required functionality

Intent Identification: popular domains in Google Play

please dial a phone call to alex

Skype, Hangout, etc.

Intent Prediction of Mobile Apps [SLT'14c]

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.

63

Input: single-turn request

Output: apps that are able to support the required functionality

Intent Prediction – Single-Turn Request

1

Enriched Semantics

communication

90

1

1

Utterance 1 i would like to contact alex

Word Observation Intended App

hellip hellip

contact, message, email | Gmail, Outlook, Skype

Test

90

Reasoning with Feature-Enriched MF

Train

… your email, calendar, contacts …

… check and send emails, msgs …

Outlook

Gmail

IR for app candidates

App Desc

Self-Train Utterance

Test Utterance

1

1

1

1

1

1

1

1 1

1

1 90 85 97 95

Feature Enrichment

Utterance 1: i would like to contact alex …

1

1

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents

64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

Challenge: language ambiguity. 1) User preference; 2) App-level contexts

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

send to vivianvs

Email MessageCommunication

Idea Behavioral patterns in history can help intent prediction

previous turn

65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

1

Lexical features: photo, check, camera, IM, tell | Intended App

take this photo; tell vivian this is me in the lab

CAMERA

IMTrainDialogue

check my grades on website; send an email to professor

hellip

CHROME

EMAIL

send

Behavior History

null camera

85

take a photo of this; send it to alice

CAMERA

IM

hellip

email

1

1

1 1

1

1 70

chrome

1

1

1

1

1

1

chrome email

11

1

1

95

80 55

User Utterance | Intended App

Reasoning with Feature-Enriched MF

Test Dialogue

take a photo of this; send it to alice …

Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction

66

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature Matrix | ASR: LM | Transcripts: LM
Word Observation | 25.1 | 26.1

Feature Matrix | ASR: MLR | Transcripts: MLR
Word Observation | 52.1 | 55.5

LM-Based IR Model (unsupervised)

Multinomial Logistic Regression (supervised)

Experiments for Intent Prediction

67

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature Matrix | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1 | 29.2 (+16.2%) | 26.1 | 30.4 (+16.4%)

Feature Matrix | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1 | 52.7 (+1.2%) | 55.5 | 55.4 (−0.2%)

Modeling hidden semantics helps intent prediction especially for noisy data

Experiments for Intent Prediction

68

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature Matrix | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1 | 29.2 (+16.2%) | 26.1 | 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 | | 33.3 |
Word + Type-Embedding-Based Semantics | 31.5 | | 32.9 |

Feature Matrix | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1 | 52.7 (+1.2%) | 55.5 | 55.4 (−0.2%)
Word + Behavioral Patterns | 53.9 | | 56.6 |

Semantic enrichment provides rich cues to improve performance

Experiments for Intent Prediction

69

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature Matrix | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1 | 29.2 (+16.2%) | 26.1 | 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 | 34.2 (+6.8%) | 33.3 | 33.3 (−0.2%)
Word + Type-Embedding-Based Semantics | 31.5 | 32.2 (+2.1%) | 32.9 | 34.0 (+3.4%)

Feature Matrix | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1 | 52.7 (+1.2%) | 55.5 | 55.4 (−0.2%)
Word + Behavioral Patterns | 53.9 | 55.7 (+3.3%) | 56.6 | 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics

Experiments for Intent Prediction

70

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction: Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.

71

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data Back-end Data

Bases Services and Client Signals

DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)

User Experienceldquocall taxirdquo

72

Outline Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

73

Conclusions: The work shows the feasibility and the potential of improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors

74

Future Work: Apply the proposed technology to domain discovery

Discover domains not covered by the current systems but that users are interested in, to guide the next developed domains

Improve the proposed approach by handling uncertainty: recognition errors (ASR → SLU modeling) and unreliable knowledge (knowledge acquisition)

75

Architecture sketch (bottom-up): Word Sequence x = w1 w2 … wd → Word Vector lw → Convolutional Layer lc (Convolution Matrix Wc) → Pooling Operation → Utterance Vector lf → Knowledge Graph Propagation Layer lp (Knowledge Graph Propagation Matrix Wp) → Semantic Projection Matrix Ws → Semantic Layer y. Each slot candidate S1, S2, …, Sn (with its Slot Vector lf) is scored by a semantic relation R(U, S1), …, R(U, Sn), giving posterior probabilities P(S1 | U), …, P(Sn | U) for utterance U.

Towards Unsupervised Deep Learning

Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.

76

Take Home Message: big data is available without annotations

Challenge: how to acquire and organize important knowledge, and further utilize it for applications

Language understanding for AI

Language → action: understand voice to control music, lights, etc.; teach the system to let friends in by face recognition, etc.

Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A. Thanks for your attention!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.

Page 5: Statistical Learning from Dialogues for Intelligent Assistants

5

Apple Siri

(2011)

Google Now

(2012)

Microsoft Cortana

(2014)

Amazon AlexaEcho

(2014)

https://www.apple.com/ios/siri/
https://www.google.com/landing/now/
http://www.windowsphone.com/en-us/how-to/wp8/cortana/meet-cortana
http://www.amazon.com/oc/echo

Facebook M

(2015)

What are Intelligent Assistants?

6

Why do we need them? Daily Life Usage

Weather, Schedule, Transportation, Restaurant Seeking


7

Why do we need them? Get things done
e.g., set up an alarm/reminder, take notes

Easy access to structured data, services, and apps
e.g., find docs/photos/restaurants

Assist your daily schedule and routine
e.g., commute alerts to/from work

Be more productive in managing your work and personal life


8

Why do companies care? Global Digital Statistics (January 2015)

Global Population: 7.21B

Active Internet Users: 3.01B

Active Social Media Accounts: 2.08B

Active Unique Mobile Users: 3.65B

Device input is evolving towards speech, the more natural and convenient modality.


9

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data: Back-end Data Bases, Services, and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "restaurant suggestions", "call taxi"


10

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data: Back-end Data Bases, Services, and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"


11

Outline: Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work


12

Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions

Spoken dialogue systems are being incorporated into various devices (smartphones, smart TVs, in-car navigation systems, etc.)

Good SDSs assist users to organize and access information conveniently

Spoken Dialogue System (SDS)

JARVIS – Iron Man's Personal Assistant; Baymax – Personal Healthcare Companion


13

Baymax is capable of maintaining a good spoken dialogue system and learning new knowledge for better understanding and interacting with people

What is Baymax's intelligence? (Big Hero 6 -- video content owned and licensed by Disney Entertainment, Marvel Entertainment, LLC, etc.)



14

ASR: Automatic Speech Recognition; SLU: Spoken Language Understanding; DM: Dialogue Management; NLG: Natural Language Generation

SDS Architecture

ASR → SLU → DM (Domain) → NLG

current bottleneck


15

Interaction Example

User: find a cheap eating place for taiwanese food

Intelligent Agent: Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which one do you want to choose? I can help you go there.

Q: How does a dialogue system process this request?


16

SDS Process ndash Available Domain Ontology

User: find a cheap eating place for taiwanese food

target

foodprice AMODNN

seeking PREP_FOR

Organized Domain Knowledge

Intelligent Agent


17

SDS Process ndash Available Domain Ontology

User

target

foodprice AMODNN

seeking PREP_FOR

Organized Domain Knowledge

Intelligent Agent

Ontology Induction(semantic slot)

find a cheap eating place for taiwanese food


18

SDS Process ndash Available Domain Ontology

User

target

foodprice AMODNN

seeking PREP_FOR

Organized Domain Knowledge

Intelligent Agent

Ontology Induction(semantic slot)

Structure Learning(inter-slot relation)

find a cheap eating place for taiwanese food


19

SDS Process ndash Spoken Language Understanding (SLU)

User

target

foodprice AMODNN

seeking PREP_FOR

Intelligent Agent

seeking="find", target="eating place", price="cheap", food="taiwanese"

find a cheap eating place for taiwanese food


20

find a cheap eating place for taiwanese food

SDS Process ndash Spoken Language Understanding (SLU)

User

target

foodprice AMODNN

seeking PREP_FOR

Intelligent Agent

seeking="find", target="eating place", price="cheap", food="taiwanese"

Semantic Decoding


21

find a cheap eating place for taiwanese food

SDS Process ndash Dialogue Management (DM)

User

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Intelligent Agent


22

find a cheap eating place for taiwanese food

SDS Process ndash Dialogue Management (DM)

User

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Intelligent Agent

Surface Form Derivation(natural language)


23

SDS Process ndash Dialogue Management (DM)

User

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Din Tai Fung, Boiling Point

Predicted intent: navigation

Intelligent Agent

find a cheap eating place for taiwanese food


24

SDS Process ndash Dialogue Management (DM)

User

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Din Tai Fung, Boiling Point

Predicted intent: navigation

Intelligent Agent

Intent Prediction

find a cheap eating place for taiwanese food


25

SDS Process ndash Natural Language Generation (NLG)

User

Intelligent Agent

Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which one do you want to choose? I can help you go there. (navigation)

find a cheap eating place for taiwanese food


26

Required Knowledge

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Predicted intent: navigation

User

Required Domain-Specific Information

find a cheap eating place for taiwanese food


27

Challenges for SDS An SDS in a new domain requires

1) A hand-crafted domain ontology
2) Utterances labelled with semantic representations
3) An SLU component for mapping utterances into semantic representations

Manual work results in high cost, long duration, and poor scalability of system development.

The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, in order to handle open-domain requests.

seeking="find", target="eating place", price="cheap", food="asian food"

find a cheap eating place for asian food

fully unsupervised

Prior Focus


28

Contributions

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant: restaurant.price="cheap", restaurant.food="asian food"

Predicted intent: navigation

User: find a cheap eating place for taiwanese food

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

(natural language)

(inter-slot relation)

(semantic slot)


29

Contributions

User

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

find a cheap eating place for taiwanese food


30

Ontology Induction Structure Learning Surface Form Derivation

Semantic Decoding Intent Prediction

Contributions

User

Knowledge Acquisition SLU Modeling

find a cheap eating place for taiwanese food


31

Knowledge Acquisition: 1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

Restaurant Asking

Conversations

target

foodprice

seeking

quantity

PREP_FOR

PREP_FOR

NN AMOD

AMODAMOD

Organized Domain Knowledge

Unlabelled Collection

Knowledge Acquisition

Knowledge Acquisition Ontology Induction Structure Learning Surface Form Derivation


32

SLU Modeling: 2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

Organized Domain

Knowledge

price="cheap", target="restaurant", intent=navigation

SLU Modeling

SLU Component

"can i have a cheap restaurant"

SLU Modeling Semantic Decoding Intent Prediction


33

SDS Architecture ndash Contributions

ASR → SLU → DM (Domain) → NLG

Knowledge Acquisition SLU Modeling

current bottleneck


34

SDS Flowchart

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling


35

SDS Flowchart ndash Semantic Decoding

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling


36

Outline: Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work


37

Semantic Decoding [ACL-IJCNLP'15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant"

Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology Induction: Fw, Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation


38

Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistically motivated semantic resource based on frame-semantics theory; words/phrases can be represented as frames. For "low fat milk", "milk" evokes the "food" frame and "low fat" fills the descriptor frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences.


39

Ontology Induction [ASRU'13, SLT'14a]

can i have a cheap restaurant

Frame capability

Frame expensiveness

Frame locale by use

1st Issue: differentiate domain-specific frames from generic frames for SDSs


Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.

slot candidate

Best Student Paper Award


40

[Figure: word observation / slot candidate matrix built by frame-semantic parsing. Train Utterance 1: "i would like a cheap restaurant"; Train Utterance 2: "find a restaurant with chinese food"; slot candidates include expensiveness, locale_by_use, food. Test Utterance: "show me a list of cheap restaurants".]

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

Idea: increase weights of domain-specific slots and decrease weights of others


41

1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.

[Figure: the word relation matrix (Word Relation Model) and slot relation matrix (Slot Relation Model) multiply the word observation / slot candidate matrix for slot induction over the train and test utterances.]

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
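As an illustration of this propagation idea, here is a minimal sketch; the node names, initial scores, and edge weights are invented for the example (real weights come from the trained embeddings described later):

```python
# Toy sketch of one score-propagation step over a lexical knowledge graph.
def propagate(scores, edges, alpha=0.5):
    """Each node keeps (1 - alpha) of its own score and receives alpha times
    the weighted average of its neighbors' scores."""
    new = {}
    for node, s in scores.items():
        nbrs = edges.get(node, {})
        total = sum(nbrs.values())
        nbr = sum(w * scores[n] for n, w in nbrs.items()) / total if total else 0.0
        new[node] = (1 - alpha) * s + alpha * nbr
    return new

# Domain words ("cheap", "restaurant") are connected and reinforce each other;
# the generic word "like" has no domain neighbors, so its score decays.
scores = {"cheap": 0.8, "restaurant": 0.9, "like": 0.7}
edges = {"cheap": {"restaurant": 1.0}, "restaurant": {"cheap": 1.0}}
scores = propagate(scores, edges)
```

After one step, the connected domain words stay high while the isolated generic word drops, which is the effect the relation matrices produce in the full model.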

[Figure: slot graph over capability, locale_by_use, food, expensiveness, seeking, relational_quantity, desiring. Train Utterance 1: "i would like a cheap restaurant"; Train Utterance 2: "find a restaurant with chinese food". Test Utterance: "show me a list of cheap restaurants".]


42

Semantic Decoding [ACL-IJCNLP'15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant"

Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology Induction: Fw, Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation


43

Knowledge Graph Construction: syntactic dependency parsing on utterances

ccomp

amoddobjnsubj det

can i have a cheap restaurant → capability, expensiveness, locale_by_use

Word-based lexical knowledge graph

Slot-based semantic knowledge graph

[Figure: word-based lexical KG nodes (can, i, have, a, cheap, restaurant) and slot-based semantic KG nodes (capability, locale_by_use, expensiveness).]


44

Dependency-based word embeddings

Dependency-based slot embeddings

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

can = have =

expensiveness = capability =

can i have a cheap restaurant

ccomp

amoddobjnsubj det

have acapability expensiveness locale_by_use

ccomp

amoddobjnsubj det

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
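A toy sketch of the (word, context) pairs that such dependency-based embeddings are trained on; the parse below is an invented rendering of the example utterance:

```python
# Dependency-based embedding contexts (Levy & Goldberg, 2014): a word's
# contexts are its syntactic neighbors labelled with the relation,
# rather than a linear word window.
deps = [  # (head, relation, dependent) for "can i have a cheap restaurant"
    ("have", "ccomp", "can"),
    ("have", "nsubj", "i"),
    ("have", "dobj", "restaurant"),
    ("restaurant", "det", "a"),
    ("restaurant", "amod", "cheap"),
]

def contexts(parse):
    """Emit (word, context) training pairs: the head sees dependent/rel,
    and the dependent sees head/rel-1 (the inverse relation)."""
    pairs = []
    for head, rel, dep in parse:
        pairs.append((head, dep + "/" + rel))
        pairs.append((dep, head + "/" + rel + "-1"))
    return pairs

pairs = contexts(deps)
```

These pairs replace the bag-of-words window in word2vec, so syntactically related words (e.g. "cheap" and "restaurant") end up with similar embeddings.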


45

Edge Weight Measurement: compute edge weights to represent relation importance

Slot-to-slot semantic relation: similarity between slot embeddings
Slot-to-slot dependency relation: dependency score between slot embeddings
Word-to-word semantic relation: similarity between word embeddings
Word-to-word dependency relation: dependency score between word embeddings
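For the semantic relations, the edge weight can be computed as cosine similarity between embeddings; a minimal sketch with made-up 3-dimensional vectors (real embeddings are the trained dependency-based ones):

```python
import math

# Hypothetical 3-d slot embeddings, invented for illustration.
emb = {
    "expensiveness": [0.9, 0.2, 0.1],
    "food":          [0.8, 0.3, 0.2],
    "capability":    [0.1, 0.9, 0.8],
}

def cosine(u, v):
    """Semantic edge weight: cosine similarity between two embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Related domain slots get a heavy edge; unrelated ones a light edge.
w_related = cosine(emb["expensiveness"], emb["food"])
w_unrelated = cosine(emb["expensiveness"], emb["capability"])
```

The resulting weights populate the relation matrices, so propagation flows mostly between semantically close nodes.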

+

+

w1

w2

w3

w4

w5

w6

w7

s2

s1 s3


46

[Figure: the word relation matrix Rw^SD and slot relation matrix Rs^SD of the Knowledge Graph Propagation Model multiply the word observation / slot candidate matrix for slot induction.]

Structure information is integrated to make the self-training data more reliable


47

Ontology Induction → SLU (Fw, Fs) ← Structure Learning

[Figure: word observation / slot candidate matrix. Train Utterance 1: "i would like a cheap restaurant"; Train Utterance 2: "find a restaurant with chinese food". Test Utterance: "show me a list of cheap restaurants" (hidden semantics).]

2nd Issue: unobserved semantics may benefit understanding

Semantic Decoding [ACL-IJCNLP'15]


48

Reasoning with Matrix Factorization

[Figure: Feature Model plus Knowledge Graph Propagation Model; observed cells (1) and MF-estimated scores (e.g., .97, .90, .95, .85, .93, .92, .98, with .05 for irrelevant slots) over the word observation / slot candidate matrix for slot induction.]

Feature Model + Knowledge Graph Propagation Model (Rw^SD, Rs^SD)

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which can model hidden semantics and is more robust to noisy data.


49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and wordsslots respectively

The product of two matrices fills the probability of hidden semantics

[Figure: the |U| × (|W|+|S|) matrix with observed 1s and MF-estimated scores (.97, .90, .95, .85, .93, .92, .98, .05) for the unobserved cells.]

|U| × (|W|+|S|) ≈ (|U| × d) · (d × (|W|+|S|))
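A toy sketch of this completion, factorizing a small observed matrix with plain SGD; the matrix, rank d, learning rate, and iteration count are all illustrative:

```python
import random

random.seed(0)

# Toy utterance-by-(word+slot) matrix; 1 = observed, None = unobserved.
# Rows share structure, so a low-rank model can score the missing cells.
M = [[1, 1, None],
     [1, 1, 1],
     [None, 1, 1]]
d = 2  # latent dimension (illustrative)
U = [[random.random() for _ in range(d)] for _ in range(len(M))]
V = [[random.random() for _ in range(d)] for _ in range(len(M[0]))]

def pred(i, j):
    """Score of cell (i, j) as the inner product of latent vectors."""
    return sum(U[i][k] * V[j][k] for k in range(d))

# SGD on squared error over the observed cells only.
for _ in range(500):
    for i, row in enumerate(M):
        for j, x in enumerate(row):
            if x is None:
                continue
            e = pred(i, j) - x
            for k in range(d):
                u, v = U[i][k], V[j][k]
                U[i][k] -= 0.05 * e * v
                V[j][k] -= 0.05 * e * u

# The unobserved cell (0, 2) now receives a probability-like score
# induced purely by the shared low-rank structure.
missing = pred(0, 2)
```

The full model replaces this squared loss with the BPR ranking objective described next, but the low-rank completion mechanism is the same.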

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.


50

Bayesian Personalized Ranking for MF: model implicit feedback

Do not treat unobserved facts as negative samples (true or false); give observed facts higher scores than unobserved facts.

Objective: for each utterance u, maximize the sum of ln σ(f+ - f-) over pairs where f+ scores an observed slot and f- scores an unobserved one.

The objective is to learn a set of well-ranked semantic slots per utterance.
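A minimal sketch of this pairwise objective; the slot names and scores are invented, and a real implementation optimizes model parameters by stochastic gradient ascent on this quantity:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bpr_objective(scores, observed, unobserved):
    """BPR-style objective for one utterance: sum of ln sigmoid(f+ - f-)
    over pairs of an observed slot (f+) and an unobserved slot (f-).
    Closer to 0 means observed slots are ranked above unobserved ones."""
    return sum(
        math.log(sigmoid(scores[p] - scores[n]))
        for p in observed
        for n in unobserved
    )

# Hypothetical slot scores for "i would like a cheap restaurant".
good = {"expensiveness": 2.0, "locale_by_use": 1.5, "capability": -1.0}
bad = {"expensiveness": -1.0, "locale_by_use": -1.5, "capability": 2.0}
obs, unobs = ["expensiveness", "locale_by_use"], ["capability"]
```

A scoring that ranks the observed slots above the unobserved one yields a higher objective than one that ranks them below, which is exactly what the learner rewards.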


51

Ontology Induction → SLU (Fw, Fs) ← Structure Learning

[Figure: word observation / slot candidate matrix. Train Utterance 1: "i would like a cheap restaurant"; Train Utterance 2: "find a restaurant with chinese food". Test Utterance: "show me a list of cheap restaurants" with estimated slot probabilities.]

Matrix Factorization SLU (MF-SLU)

MF-SLU can estimate probabilities for slot candidates given test utterances


52

Semantic Decoding [ACL-IJCNLP'15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant"

Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology Induction: Fw, Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)


53

Experimental Setup. Dataset: Cambridge University SLU Corpus

Restaurant recommendation (WER = 37%); 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type

Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
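The MAP metric used throughout these experiments can be sketched as follows; the ranked slot lists and gold sets below are hypothetical:

```python
def average_precision(ranked, relevant):
    """AP for one utterance: mean of precision@k taken at each relevant slot."""
    hits, total = 0, 0.0
    for k, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(results):
    """MAP over utterances; each item is (ranked slot list, gold slot set)."""
    return sum(average_precision(r, g) for r, g in results) / len(results)

# Hypothetical ranked slot lists against gold annotations.
results = [
    (["pricerange", "food", "area"], {"pricerange", "food"}),  # AP = 1.0
    (["task", "pricerange", "food"], {"pricerange"}),          # AP = 0.5
]
score = mean_average_precision(results)
```

Ranking every gold slot at the top gives AP 1.0, so MAP rewards models that push the correct slots above the distractors rather than just classifying each slot independently.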


54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8


55

Experiments of Semantic Decoding: Quality of Semantics Estimation (cont.)

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

The MF-SLU effectively models implicit information to decode semantics

The structure information further improves the results

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
Proposed MF-SLU: Feature Model | 37.6 | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)

The result is significantly better than the MLR baseline with p < 0.05 in a t-test.


56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

With the integrated structure information, both semantic and dependency relations are useful for understanding.

Approach | ASR | Transcripts
Feature Model | 37.6 | 45.3
Feature + Knowledge Graph Propagation: Semantic | 41.4 | 51.6
Feature + Knowledge Graph Propagation: Dependency | 41.6 | 49.0
Feature + Knowledge Graph Propagation: All | 43.5 (+15.7%) | 53.4 (+17.9%)

The result is significantly better than the MLR baseline with p < 0.05 in a t-test.


Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs

The reference ontology with the most frequent syntactic dependencies

[Figure: induced ontology (seeking, desiring, locale_by_use, food, expensiveness, relational_quantity; relations PREP_FOR, NN, AMOD, DOBJ) alongside the reference ontology (type, task, area, food, pricerange; relations DOBJ, AMOD, PREP_IN).]

The automatically learned domain ontology aligns well with the reference one

57

The data-driven ontology is more objective, while the expert-annotated one is more subjective.

58

Contributions of Semantic Decoding

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.


59

Low- and High-Level Understanding: semantic concepts for individual utterances do not consider high-level semantics (user intents).

The follow-up behaviors usually correspond to user intents

price="cheap", target="restaurant"

SLU Model

"can i have a cheap restaurant"

intent=navigation

restaurant="legume", time="tonight"

SLU Model

"i plan to dine in legume tonight"

intent=reservation


60

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart – Intent Prediction


61

Outline: Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work


62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input spoken utterances for making requests about launching an app

Output the apps supporting the required functionality

Intent Identification: popular domains in Google Play

please dial a phone call to alex

Skype Hangout etc

Intent Prediction of Mobile Apps [SLT'14c]

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.


63

Input single-turn request

Output apps that are able to support the required functionality

Intent Prediction – Single-Turn Request

[Figure: feature-enriched MF for single-turn requests. The test utterance "i would like to contact alex" is enriched with the semantic feature "communication"; IR over app descriptions (e.g., Outlook: "... your email calendar contacts ...", Gmail: "... check and send emails msgs ...") retrieves app candidates (Gmail, Outlook, Skype); the matrix relates word observations to intended apps with estimated scores.]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents


64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input multi-turn interaction

Output apps the user plans to launch

Challenge: language ambiguity. 1) User preference; 2) App-level contexts

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com.

"send to vivian" → Email / Message (Communication)

Idea: behavioral patterns in the history (previous turns) can help intent prediction.


65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input multi-turn interaction

Output apps the user plans to launch

[Figure: feature-enriched MF for multi-turn interaction. Train dialogues pair utterances with intended apps (e.g., "take this photo" → CAMERA, "tell vivian this is me in the lab" → IM; "check my grades on websites" → CHROME, "send an email to professor" → EMAIL), along with behavior-history features (camera, chrome, email); the test dialogue "take a photo of this / send it to alice" is scored over app candidates.]

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com.

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction


66

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Single-Turn: Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / - | 26.1 / -

Multi-Turn: Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / - | 55.5 / -

LM-Based IR Model (unsupervised); Multinomial Logistic Regression (supervised)

Experiments for Intent Prediction


67

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Single-Turn: Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)

Multi-Turn: Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.

Experiments for Intent Prediction


68

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Single-Turn: Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 / - | 33.3 / -
Word + Type-Embedding-Based Semantics | 31.5 / - | 32.9 / -

Multi-Turn: Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 / - | 56.6 / -

Semantic enrichment provides rich cues to improve performance

Experiments for Intent Prediction


69

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Single-Turn: Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 / 34.2 (+6.8%) | 33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics | 31.5 / 32.2 (+2.1%) | 32.9 / 34.0 (+3.4%)

Multi-Turn: Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 / 55.7 (+3.3%) | 56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics

Experiments for Intent Prediction


70

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction: the Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.


71

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data: Back-end Data Bases, Services, and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"


72

Outline: Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work


73

Conclusions

The work shows the feasibility of and the potential for improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.


74

Future Work

Apply the proposed technology to domain discovery: find domains not covered by the current systems but that users are interested in, in order to guide the next developed domains.

Improve the proposed approach by handling uncertainty: recognition errors from ASR, and unreliable knowledge from knowledge acquisition.


75

(Figure: model architecture - a word sequence x is mapped to word vectors l_w, a convolutional layer l_c (convolution matrix W_c), a pooling operation, and an utterance vector l_f; together with slot vectors l_f, a semantic projection matrix W_s produces the semantic layer y, and a knowledge graph propagation matrix W_p produces the layer l_p, yielding relation scores R(U, S_1), ..., R(U, S_n) and posterior probabilities P(S_1 | U), ..., P(S_n | U) over slot candidates S_1, ..., S_n.)

Towards Unsupervised Deep Learning


Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.

76

Take Home Message

Big data is available without annotations.

Challenge: how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI - from language to action: understand voice to control music, lights, etc.; teach the system to let friends in by face recognition, etc.


Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A

THANKS FOR YOUR ATTENTION!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)

• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of the NIPS SLU Workshop, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.


6

Why do we need them? Daily Life Usage

Weather, Schedule, Transportation, Restaurant Seeking


7

Why do we need them?

Get things done, e.g., set up alarm/reminder, take note.

Easy access to structured data, services and apps, e.g., find docs/photos/restaurants.

Assist your daily schedule and routine, e.g., commute alerts to/from work.

Be more productive in managing your work and personal life.


8

Why do companies care? Global Digital Statistics (January 2015)

Global Population: 7.21B

Active Internet Users: 3.01B

Active Social Media Accounts: 2.08B

Active Unique Mobile Users: 3.65B

As devices evolve, the more natural and convenient input is speech.


9

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data: Data Bases, Back-end Services and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "restaurant suggestions", "call taxi"


10

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data: Data Bases, Back-end Services and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"


11

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant - Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


12

Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions

Spoken dialogue systems are being incorporated into various devices (smart phones, smart TVs, in-car navigation systems, etc.)

Good SDSs assist users to organize and access information conveniently

Spoken Dialogue System (SDS)

JARVIS - Iron Man's Personal Assistant; Baymax - Personal Healthcare Companion


13

Baymax is capable of maintaining a good spoken dialogue system and learning new knowledge for better understanding and interacting with people.

What is Baymax's intelligence? (Big Hero 6 - video content owned and licensed by Disney Entertainment, Marvel Entertainment LLC, etc.)


14

ASR: Automatic Speech Recognition; SLU: Spoken Language Understanding; DM: Dialogue Management; NLG: Natural Language Generation

SDS Architecture

(Pipeline: ASR → SLU → DM → NLG, supported by Domain knowledge; SLU is the current bottleneck.)


15

Interaction Example

User: "find a cheap eating place for taiwanese food"

Intelligent Agent: "Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there."

Q: How does a dialogue system process this request?


16

SDS Process - Available Domain Ontology

User: "find a cheap eating place for taiwanese food"

target

foodprice AMODNN

seeking PREP_FOR

Organized Domain KnowledgeIntelligent

Agent


17

SDS Process - Available Domain Ontology

User

target

foodprice AMODNN

seeking PREP_FOR

Organized Domain KnowledgeIntelligent

Agent

Ontology Induction(semantic slot)

find a cheap eating place for taiwanese food


18

SDS Process - Available Domain Ontology

User

target

foodprice AMODNN

seeking PREP_FOR

Organized Domain KnowledgeIntelligent

Agent

Ontology Induction(semantic slot)

Structure Learning(inter-slot relation)

find a cheap eating place for taiwanese food


19

SDS Process - Spoken Language Understanding (SLU)

User

target

foodprice AMODNN

seeking PREP_FOR

Intelligent Agent

seeking="find", target="eating place", price="cheap", food="taiwanese"

find a cheap eating place for taiwanese food


20

find a cheap eating place for taiwanese food

SDS Process - Spoken Language Understanding (SLU)

User

target

foodprice AMODNN

seeking PREP_FOR

Intelligent Agent

seeking="find", target="eating place", price="cheap", food="taiwanese"

Semantic Decoding


21

find a cheap eating place for taiwanese food

SDS Process - Dialogue Management (DM)

User

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Intelligent Agent


22

find a cheap eating place for taiwanese food

SDS Process - Dialogue Management (DM)

User

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Intelligent Agent

Surface Form Derivation(natural language)


23

SDS Process - Dialogue Management (DM)

User

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Din Tai FungBoiling Point

Predicted intent navigation

Intelligent Agent

find a cheap eating place for taiwanese food


24

SDS Process - Dialogue Management (DM)

User

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Din Tai FungBoiling Point

Predicted intent navigation

Intelligent Agent

Intent Prediction

find a cheap eating place for taiwanese food


25

SDS Process - Natural Language Generation (NLG)

User

Intelligent Agent

"Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there." (navigation)

find a cheap eating place for taiwanese food


26

Required Knowledge

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Predicted intent navigation

User

Required Domain-Specific Information

find a cheap eating place for taiwanese food


27

Challenges for SDS: an SDS in a new domain requires

1) a hand-crafted domain ontology,
2) utterances labelled with semantic representations,
3) an SLU component for mapping utterances into semantic representations.

Manual work results in high cost, long duration, and poor scalability of system development.

The goal is to enable an SDS to 1) automatically infer domain knowledge and then to 2) create the data for SLU modeling, in order to handle open-domain requests.

seeking="find", target="eating place", price="cheap", food="asian food"

find a cheap eating place for asian food

fully unsupervised

Prior Focus


28

Contributions

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant: restaurant.price="cheap", restaurant.food="asian food"

Predicted intent navigation

User: "find a cheap eating place for taiwanese food"

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

(natural language)

(inter-slot relation)

(semantic slot)


29

Contributions

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

User: "find a cheap eating place for taiwanese food"


30

Ontology Induction Structure Learning Surface Form Derivation

Semantic Decoding Intent Prediction

Contributions

Knowledge Acquisition SLU Modeling

User: "find a cheap eating place for taiwanese food"


31

Knowledge Acquisition: 1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

(Figure: from an unlabelled collection of restaurant-asking conversations, knowledge acquisition produces organized domain knowledge - slots such as target, food, price, seeking, and quantity connected by PREP_FOR, NN, and AMOD relations.)

Knowledge Acquisition

Knowledge Acquisition Ontology Induction Structure Learning Surface Form Derivation


32

SLU Modeling: 2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

Organized Domain Knowledge

price="cheap", target="restaurant", intent=navigation

SLU Modeling

SLU Component

"can i have a cheap restaurant"

SLU Modeling Semantic Decoding Intent Prediction


33

SDS Architecture ndash Contributions

(Pipeline: ASR → SLU → DM → NLG, supported by Domain knowledge; the proposed Knowledge Acquisition and SLU Modeling address the current bottleneck.)


34

SDS Flowchart

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling


35

SDS Flowchart - Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling


36

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant - Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


37

Semantic Decoding [ACL-IJCNLP'15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant"

Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology Induction → Fw, Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU: SLU Modeling by Matrix Factorization

Semantic Representation


38

Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistically semantic resource based on the frame-semantics theory, where words/phrases can be represented as frames. E.g., in "low fat milk", "milk" evokes the "food" frame and "low fat" fills the "descriptor" frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser, trained on manually annotated FrameNet sentences.


39

Ontology Induction [ASRU'13, SLT'14a]

can i have a cheap restaurant

Frame capability

Frame expensiveness

Frame locale by use

1st Issue: differentiate domain-specific frames (good slot candidates) from generic frames for SDSs

Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.

slot candidate

Best Student Paper Award


40

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

(Figure: a word-observation / slot-candidate matrix. Training utterances, e.g., Utterance 1 "i would like a cheap restaurant" and Utterance 2 "find a restaurant with chinese food", give binary observations for words such as "cheap", "restaurant", "food" and slot candidates such as expensiveness, locale_by_use, food; for the test utterance "show me a list of cheap restaurants", the model estimates slot probabilities.)

Idea: increase the weights of domain-specific slots and decrease the weights of others.


41

1st Issue: how to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.

(Figure: the word-observation / slot-candidate training matrix is multiplied by a word relation matrix and a slot relation matrix for slot induction.)

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
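A toy sketch of this propagation idea (slot names and relation weights below are illustrative, not taken from the talk): multiplying the score vector by an unnormalized relation matrix lets densely connected, domain-specific slots accumulate score from their neighbors, while an isolated generic slot does not.

```python
import numpy as np

slots = ["locale_by_use", "expensiveness", "food", "capability"]
scores = np.array([0.5, 0.5, 0.5, 0.5])  # initial slot-candidate scores

# relation weights: the three domain slots are densely connected,
# while the generic slot "capability" is mostly isolated
R = np.array([
    [1.0, 0.8, 0.9, 0.1],
    [0.8, 1.0, 0.7, 0.1],
    [0.9, 0.7, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
])

propagated = R @ scores  # one propagation step
# domain-specific slots now outscore the generic one
```

After one multiplication, the connected slots end up with roughly twice the score of the isolated generic slot, which is the intuition the slide's matrices encode.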



42

Semantic Decoding [ACL-IJCNLP'15]


43

Knowledge Graph Construction: syntactic dependency parsing on utterances

(Parse of "can i have a cheap restaurant" with dependencies ccomp, nsubj, dobj, det, and amod; the words evoke the frames capability, expensiveness, and locale_by_use.)

(Figure: the word-based lexical knowledge graph connects "can", "i", "have", "a", "cheap", "restaurant"; the slot-based semantic knowledge graph connects capability, locale_by_use, and expensiveness.)


44

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

Dependency-based word embeddings (e.g., for "can", "have") and dependency-based slot embeddings (e.g., for expensiveness, capability) are trained on the dependency-parsed utterance "can i have a cheap restaurant".

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.


45

Edge Weight Measurement: compute edge weights to represent relation importance.

Slot-to-slot semantic relation: similarity between slot embeddings.
Slot-to-slot dependency relation: dependency score between slot embeddings.
Word-to-word semantic relation: similarity between word embeddings.
Word-to-word dependency relation: dependency score between word embeddings.
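For the semantic relations, a natural weight is cosine similarity between embeddings; the sketch below uses toy vectors (not the dependency-based embeddings actually trained in the talk) to show how a related domain-slot pair receives a heavier edge than a domain/generic pair.

```python
import numpy as np

def cosine(a, b):
    # cosine similarity between two embedding vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# toy embedding vectors, for illustration only
expensiveness = np.array([0.9, 0.1, 0.3])
locale_by_use = np.array([0.8, 0.2, 0.4])
capability = np.array([0.1, 0.9, 0.0])

w_domain = cosine(expensiveness, locale_by_use)   # related domain slots
w_generic = cosine(expensiveness, capability)     # domain vs. generic slot
# the related pair gets the heavier edge weight in the knowledge graph
```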

(Figure: a weighted word-level graph over nodes w1-w7 and a weighted slot-level graph over nodes s1-s3.)


46

(Figure: in the knowledge graph propagation model, the word-observation / slot-candidate matrix is multiplied by the word relation matrix R_w and the slot relation matrix R_s for slot induction.)

Structure information is integrated to make the self-training data more reliable


47

(Figure: ontology induction produces the feature matrices Fw and Fs; with structure learning, the SLU matrix is built from training utterances such as "i would like a cheap restaurant" and "find a restaurant with chinese food"; for the test utterance "show me a list of cheap restaurants", some semantics remain hidden and unobserved.)

2nd Issue: unobserved semantics may benefit understanding.

Semantic Decoding [ACL-IJCNLP'15]


48

Reasoning with Matrix Factorization

(Figure: the feature model is combined with the knowledge graph propagation model (relation matrices R_w, R_s); matrix factorization fills in estimated probabilities for the hidden cells of the word-observation / slot-candidate matrix.)

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which models hidden semantics and is more robust to noisy data.


49

2nd Issue: how to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and wordsslots respectively

The product of two matrices fills the probability of hidden semantics

(Figure: the |U| × (|W|+|S|) word/slot observation matrix is approximated by the product of a |U| × d matrix and a d × (|W|+|S|) matrix; the product fills in the probabilities of hidden semantics.)

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
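A minimal least-squares sketch of the low-rank completion idea (toy data; note the talk's actual model is trained with the BPR ranking objective, not squared error): a |U| × (|W|+|S|) matrix is approximated by the product of a |U| × d factor and a d × (|W|+|S|) factor, and the product assigns scores to every cell, including the unobserved ones.

```python
import numpy as np

rng = np.random.default_rng(0)
M = np.array([
    [1.0, 0.0, 1.0, 0.0],   # utterance 1: word/slot indicators
    [0.0, 1.0, 0.0, 1.0],   # utterance 2
    [1.0, 0.0, 1.0, 0.0],   # utterance 3
])
d = 2                        # latent dimension
U = rng.normal(scale=0.1, size=(3, d))
V = rng.normal(scale=0.1, size=(d, 4))

for _ in range(500):         # plain gradient descent on squared error
    err = M - U @ V
    U = U + 0.1 * err @ V.T
    V = V + 0.1 * U.T @ err

approx = U @ V               # dense matrix: every cell now has a score
```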


50

Bayesian Personalized Ranking for MF: model implicit feedback

Do not treat unobserved facts as negative samples (true or false); instead, give observed facts higher scores than unobserved facts.

Objective: for each utterance u, maximize the sum of ln σ( f(u, x⁺) − f(u, x⁻) ) over observed facts x⁺ and unobserved facts x⁻.

The objective is to learn a set of well-ranked semantic slots per utterance.
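A hedged sketch of the BPR idea (Rendle et al., 2009) on toy scores: rather than labeling the unobserved fact as negative, repeatedly push the observed fact's score above the unobserved one by ascending the gradient of ln sigmoid(f_plus − f_minus). Scores and learning rate are illustrative values.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

f_plus, f_minus = 0.2, 0.4   # observed vs. unobserved fact scores (toy)
lr = 0.5
for _ in range(50):
    g = 1.0 - sigmoid(f_plus - f_minus)  # gradient of ln sigmoid(diff)
    f_plus += lr * g                     # raise the observed fact
    f_minus -= lr * g                    # lower the unobserved fact

# the observed fact is now ranked above the unobserved one
```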


51

(Figure: ontology induction and structure learning produce the training matrix; after factorization, the test utterance "show me a list of cheap restaurants" receives estimated slot probabilities.)

Matrix Factorization SLU (MF-SLU)

MF-SLU can estimate probabilities for slot candidates given test utterances


52

Semantic Decoding [ACL-IJCNLP'15]

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).


53

Experimental Setup

Dataset: Cambridge University SLU Corpus - restaurant recommendation (WER = 37%); 2,166 dialogues; 15,453 utterances; slots: addr, area, food, name, phone, postcode, price range, task, type.

Metric: MAP of all estimated slot probabilities over all utterances, with a mapping table between induced and reference slots.

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
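The evaluation metric can be sketched as average precision of a per-utterance ranked slot list against the gold slots, averaged over utterances (MAP); the ranking and gold labels below are made up for illustration.

```python
def average_precision(ranked, gold):
    # precision-at-k averaged over the ranks where a gold slot appears
    hits, total = 0, 0.0
    for k, slot in enumerate(ranked, start=1):
        if slot in gold:
            hits += 1
            total += hits / k
    return total / max(len(gold), 1)

ranked = ["food", "pricerange", "area", "phone"]  # slots sorted by estimated probability
gold = {"food", "area"}                           # reference slots for this utterance
ap = average_precision(ranked, gold)              # (1/1 + 2/3) / 2
```

MAP is then the mean of these per-utterance AP values over the whole corpus.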


54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Baseline SLU Approach              | ASR   | Transcripts
Support Vector Machine             | 32.5  | 36.6
Multinomial Logistic Regression    | 34.0  | 38.8


55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

The MF-SLU effectively models implicit information to decode semantics

The structure information further improves the results

Approach                                                      | ASR            | Transcripts
Baseline SLU: Support Vector Machine                          | 32.5           | 36.6
Baseline SLU: Multinomial Logistic Regression                 | 34.0           | 38.8
Proposed MF-SLU: Feature Model                                | 37.6           | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation  | 43.5 (+27.9%)  | 53.4 (+37.6%)

(The result is significantly better than the MLR baseline, with p < 0.05 in a t-test.)


56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

In the integrated structure information both semantic and dependency relations are useful for understanding

Approach                                            | ASR            | Transcripts
Feature Model                                       | 37.6           | 45.3
Feature + Knowledge Graph Propagation: Semantic     | 41.4           | 51.6
Feature + Knowledge Graph Propagation: Dependency   | 41.6           | 49.0
Feature + Knowledge Graph Propagation: All          | 43.5 (+15.7%)  | 53.4 (+17.9%)

(The result is significantly better than the MLR baseline, with p < 0.05 in a t-test.)


Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs

The reference ontology with the most frequent syntactic dependencies

(Figure: the induced ontology - seeking, desiring, locale_by_use, food, expensiveness, and relational_quantity connected by PREP_FOR, NN, AMOD, and DOBJ dependencies - alongside the reference ontology of type, task, area, food, and pricerange connected by DOBJ, AMOD, and PREP_IN dependencies.)

The automatically learned domain ontology aligns well with the reference one


The data-driven one is more objective while expert-annotated one is more subjective

58

Contributions of Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge

MF-SLU for Semantic Decoding is able to:
1) unify the automatically acquired knowledge,
2) adapt to a domain-specific setting,
3) and then allow systems to model implicit semantics for better understanding.


59

Low- and High-Level Understanding: semantic concepts for individual utterances do not consider high-level semantics (user intents).

The follow-up behaviors usually correspond to user intents

SLU Model: "can i have a cheap restaurant" → price="cheap", target="restaurant", intent=navigation

SLU Model: "i plan to dine in legume tonight" → restaurant="legume", time="tonight", intent=reservation


60

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart - Intent Prediction


61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant - Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input spoken utterances for making requests about launching an app

Output the apps supporting the required functionality

Intent Identification popular domains in Google Play

please dial a phone call to alex

Skype Hangout etc

Intent Prediction of Mobile Apps [SLTrsquo14c]

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.

63

Intent Prediction – Single-Turn Request

Input: single-turn request
Output: apps that are able to support the required functionality

[Figure: reasoning with feature-enriched MF. Rows are app descriptions retrieved by IR for app candidates (Outlook: "... your email calendar contacts ...", Gmail: "... check and send emails msgs ..."), self-train utterances, and the test utterance "i would like to contact alex"; columns are word observations (e.g., contact, message, email), enriched semantic features (e.g., communication), and intended apps (Gmail, Outlook, Skype); observed cells are 1, and missing cells are estimated as probabilities (e.g., .90, .85, .97, .95).]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents

64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch
Challenge: language ambiguity; cues for resolving it: 1) user preference, 2) app-level contexts

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

[Figure: "send to vivian" is ambiguous between Email and Message (both Communication); the previous turn disambiguates it.]

Idea: behavioral patterns in history can help intent prediction.

65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch

[Figure: reasoning with feature-enriched MF. Rows are train dialogues ("take this photo / tell vivian this is me in the lab" → CAMERA, IM; "check my grades on website / send an email to professor" → CHROME, EMAIL) and the test dialogue ("take a photo of this / send it to alice" → CAMERA, IM); columns are lexical features (photo, check, camera, tell, send), behavior-history features (null, camera, chrome, email), and intended apps; observed cells are 1, and missing intents are estimated as probabilities (e.g., .85, .70, .95, .80, .55).]

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
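As a concrete (hypothetical) picture of how such a feature-enriched row could be assembled: lexical features, behavior-history features, and train-time intent labels are just binary columns of one matrix. The vocabulary and app names below are illustrative only, not the corpus's actual feature set.

```python
# Hypothetical sketch of one row of a feature-enriched matrix:
# binary lexical features, behavior-history features (apps launched
# in previous turns), and, at training time, intended-app labels.
# LEXICON and APPS are illustrative, not the real feature set.
LEXICON = ["photo", "tell", "send", "check", "email"]
APPS = ["CAMERA", "IM", "CHROME", "EMAIL"]

def encode_turn(words, history_apps, intended_apps=None):
    row = {f"w:{w}": 1 for w in words if w in LEXICON}
    row.update({f"b:{a}": 1 for a in history_apps})  # app-level context
    if intended_apps is not None:  # observed only at training time
        row.update({f"app:{a}": 1 for a in intended_apps})
    return row

train_row = encode_turn(["tell", "vivian"], ["CAMERA"], ["IM"])
test_row = encode_turn(["send", "it", "to", "alice"], ["CAMERA"])
print(train_row)
print(test_row)
```

MF then scores the missing `app:*` columns of the test row, instead of requiring explicit intent labels.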

66

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix    | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation  |  25.1   |     --      |      26.1       |        --

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix    | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation  |   52.1   |     --      |       55.5       |        --

LM: LM-based IR model (unsupervised); MLR: multinomial logistic regression (supervised)

Experiments for Intent Prediction

67

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix    | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation  |  25.1   | 29.2 (+16.2%) |      26.1       |   30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix    | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation  |   52.1   | 52.7 (+1.2%) |       55.5       |   55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.

Experiments for Intent Prediction

68

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                         | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation                       |  25.1   | 29.2 (+16.2%) |      26.1       |   30.4 (+16.4%)
Word + Embedding-Based Semantics       |  32.0   |     --        |      33.3       |        --
Word + Type-Embedding-Based Semantics  |  31.5   |     --        |      32.9       |        --

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix              | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation            |   52.1   | 52.7 (+1.2%) |       55.5       |   55.4 (-0.2%)
Word + Behavioral Patterns  |   53.9   |     --       |       56.6       |        --

Semantic enrichment provides rich cues to improve performance.

Experiments for Intent Prediction

69

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                         | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation                       |  25.1   | 29.2 (+16.2%) |      26.1       |   30.4 (+16.4%)
Word + Embedding-Based Semantics       |  32.0   | 34.2 (+6.8%)  |      33.3       |   33.3 (-0.2%)
Word + Type-Embedding-Based Semantics  |  31.5   | 32.2 (+2.1%)  |      32.9       |   34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix              | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation            |   52.1   | 52.7 (+1.2%) |       55.5       |   55.4 (-0.2%)
Word + Behavioral Patterns  |   53.9   | 55.7 (+3.3%) |       56.6       |   57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.

Experiments for Intent Prediction

70

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction: the Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.

71

Personal Intelligent Architecture

Reactive Assistance: ASR, LU, Dialog, LG, TTS
Proactive Assistance: Inferences, User Modeling, Suggestions
Data: back-end data bases, services, and client signals
Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience: "call taxi"

72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

73

Conclusions: The work shows the feasibility and the potential for improving the generalization, maintenance efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances; better high-level intent prediction about follow-up behaviors.

74

Future Work

Apply the proposed technology to domain discovery: find domains not covered by the current systems but that users are interested in, to guide the next developed domains.

Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition, both of which affect SLU modeling.

75

Towards Unsupervised Deep Learning

[Figure: a deep slot-ranking network. The word sequence x = w1, w2, ..., wd is mapped to word vectors lw; a convolutional layer lc (convolution matrix Wc) with a pooling operation produces an utterance vector lf; together with slot vectors lf, a knowledge graph propagation layer lp (propagation matrix Wp) and a semantic projection matrix Ws yield the semantic layer y, giving relevance scores R(U, S1), ..., R(U, Sn) and posterior probabilities P(S1 | U), ..., P(Sn | U) over slot candidates S1, ..., Sn for utterance U.]

Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.

76

Take Home Message

Available big data w/o annotations. Challenge: how to acquire and organize important knowledge and further utilize it for applications.

Language understanding for AI: language → action, e.g., understand voice to control music, lights, etc.; teach the assistant, e.g., to let friends in by face recognition.

Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A. Thanks for your attention!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.

Page 7: Statistical Learning from Dialogues for Intelligent Assistants

7

Why do we need them?

Get things done, e.g., set up alarm/reminder, take notes
Easy access to structured data, services, and apps, e.g., find docs/photos/restaurants
Assist your daily schedule and routine, e.g., commute alerts to/from work
Be more productive in managing your work and personal life

8

Why do companies care? Global Digital Statistics (January 2015):

Global Population: 7.21B
Active Internet Users: 3.01B
Active Social Media Accounts: 2.08B
Active Unique Mobile Users: 3.65B

The more natural and convenient input of the devices evolves towards speech.

9

Personal Intelligent Architecture

Reactive Assistance: ASR, LU, Dialog, LG, TTS
Proactive Assistance: Inferences, User Modeling, Suggestions
Data: back-end data bases, services, and client signals
Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience: "restaurant suggestions", "call taxi"

10

Personal Intelligent Architecture

Reactive Assistance: ASR, LU, Dialog, LG, TTS
Proactive Assistance: Inferences, User Modeling, Suggestions
Data: back-end data bases, services, and client signals
Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience: "call taxi"

11

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

12

Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions

Spoken dialogue systems are being incorporated into various devices (smart phones, smart TVs, in-car navigation systems, etc.).

Good SDSs assist users to organize and access information conveniently

Spoken Dialogue System (SDS)

JARVIS – Iron Man's Personal Assistant; Baymax – Personal Healthcare Companion

13

Baymax is capable of maintaining a good spoken dialogue system and learning new knowledge for better understanding and interacting with people

What is Baymax's intelligence? (Big Hero 6: video content owned and licensed by Disney Entertainment, Marvel Entertainment LLC, etc.)

14

SDS Architecture

ASR: Automatic Speech Recognition; SLU: Spoken Language Understanding; DM: Dialogue Management; NLG: Natural Language Generation

[Figure: pipeline ASR → SLU → DM (with Domain knowledge) → NLG; SLU and the domain knowledge are the current bottleneck.]

15

Interaction Example

User: find a cheap eating place for taiwanese food
Intelligent Agent: Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there.

Q: How does a dialogue system process this request?

16

SDS Process – Available Domain Ontology

User: find a cheap eating place for taiwanese food

[Ontology graph: seeking (PREP_FOR), price (AMOD), food (NN) → target]

Intelligent Agent: Organized Domain Knowledge

17

SDS Process – Available Domain Ontology

User: find a cheap eating place for taiwanese food

[Ontology graph: seeking (PREP_FOR), price (AMOD), food (NN) → target]

Intelligent Agent: Organized Domain Knowledge

Ontology Induction (semantic slot)

18

SDS Process – Available Domain Ontology

User: find a cheap eating place for taiwanese food

[Ontology graph: seeking (PREP_FOR), price (AMOD), food (NN) → target]

Intelligent Agent: Organized Domain Knowledge

Ontology Induction (semantic slot)
Structure Learning (inter-slot relation)

19

SDS Process – Spoken Language Understanding (SLU)

User: find a cheap eating place for taiwanese food

[Ontology graph: seeking (PREP_FOR), price (AMOD), food (NN) → target]

Intelligent Agent: seeking="find", target="eating place", price="cheap", food="taiwanese"

20

SDS Process – Spoken Language Understanding (SLU)

User: find a cheap eating place for taiwanese food

[Ontology graph: seeking (PREP_FOR), price (AMOD), food (NN) → target]

Intelligent Agent: seeking="find", target="eating place", price="cheap", food="taiwanese"

Semantic Decoding

21

SDS Process – Dialogue Management (DM)

User: find a cheap eating place for taiwanese food

[Ontology graph: seeking (PREP_FOR), price (AMOD), food (NN) → target]

Intelligent Agent: SELECT restaurant {restaurant.price="cheap", restaurant.food="taiwanese"}

22

SDS Process – Dialogue Management (DM)

User: find a cheap eating place for taiwanese food

[Ontology graph: seeking (PREP_FOR), price (AMOD), food (NN) → target]

Intelligent Agent: SELECT restaurant {restaurant.price="cheap", restaurant.food="taiwanese"}

Surface Form Derivation (natural language)

23

SDS Process – Dialogue Management (DM)

User: find a cheap eating place for taiwanese food

Intelligent Agent: SELECT restaurant {restaurant.price="cheap", restaurant.food="taiwanese"} → Din Tai Fung, Boiling Point

Predicted intent: navigation

24

SDS Process – Dialogue Management (DM)

User: find a cheap eating place for taiwanese food

Intelligent Agent: SELECT restaurant {restaurant.price="cheap", restaurant.food="taiwanese"} → Din Tai Fung, Boiling Point

Predicted intent: navigation

Intent Prediction

25

SDS Process – Natural Language Generation (NLG)

User: find a cheap eating place for taiwanese food

Intelligent Agent: Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there. (navigation)

26

Required Knowledge

User: find a cheap eating place for taiwanese food

Required Domain-Specific Information:
[Ontology graph: seeking (PREP_FOR), price (AMOD), food (NN) → target]
SELECT restaurant {restaurant.price="cheap", restaurant.food="taiwanese"}
Predicted intent: navigation

27

Challenges for SDS: An SDS in a new domain requires
1) a hand-crafted domain ontology,
2) utterances labelled with semantic representations,
3) an SLU component for mapping utterances into semantic representations.

Manual work results in high cost, long duration, and poor scalability of system development.

The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, in order to handle open-domain requests.

Example: "find a cheap eating place for asian food" → seeking="find", target="eating place", price="cheap", food="asian food"
[Figure labels: fully unsupervised; Prior Focus]

28

Contributions

User: find a cheap eating place for taiwanese food

Ontology Induction (semantic slot)
Structure Learning (inter-slot relation)
Surface Form Derivation (natural language)
Semantic Decoding
Intent Prediction

[Figure: each contribution is attached to the knowledge it produces: the ontology graph, the query SELECT restaurant {restaurant.price="cheap", restaurant.food="asian food"}, and the predicted navigation intent.]

29

Contributions

User: find a cheap eating place for taiwanese food

Ontology Induction, Structure Learning, Surface Form Derivation, Semantic Decoding, Intent Prediction

30

Contributions

User: find a cheap eating place for taiwanese food

Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation
SLU Modeling: Semantic Decoding, Intent Prediction

31

Knowledge Acquisition: 1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

[Figure: an unlabelled collection of restaurant-asking conversations passes through knowledge acquisition to produce organized domain knowledge: an ontology graph linking seeking and quantity (PREP_FOR) and price and food (AMOD, NN) to the target slot.]

Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation

32

SLU Modeling: 2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

[Figure: "can i have a cheap restaurant" plus the organized domain knowledge feed the SLU component, which outputs price="cheap", target="restaurant", intent=navigation.]

SLU Modeling: Semantic Decoding, Intent Prediction

33

SDS Architecture – Contributions

[Figure: pipeline ASR → SLU → DM (with Domain knowledge) → NLG; Knowledge Acquisition and SLU Modeling address the current bottleneck.]

34

SDS Flowchart

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

35

SDS Flowchart – Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

36

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

37

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances
Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[Figure: "can I have a cheap restaurant" → SLU Model → target="restaurant", price="cheap". Frame-semantic parsing over an unlabeled collection feeds ontology induction, which builds the feature model (Fw, Fs); lexical and semantic knowledge graphs feed structure learning, which builds the knowledge graph propagation model (word relation model Rw, slot relation model Rs); their product drives MF-SLU, SLU modeling by matrix factorization, producing the semantic representation.]

38

Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistically-motivated semantic resource based on frame-semantics theory; words/phrases can be represented as frames. Example: in "low fat milk", "milk" evokes the "food" frame and "low fat" fills the descriptor frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser, trained on manually annotated FrameNet sentences.

39

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

[Figure: frame-semantic parse of "can i have a cheap restaurant" yields the frames capability, expensiveness, and locale_by_use as slot candidates; expensiveness and locale_by_use are good domain-specific candidates.]

1st Issue: differentiate domain-specific frames from generic frames for SDSs.

Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.

40

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

[Figure: word observation / slot candidate matrix. Rows are the train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants"; columns are words (cheap, restaurant, food) and slot candidates (expensiveness, locale_by_use, food); observed cells are 1, and the test row receives estimated probabilities (e.g., .97, .95) from frame-semantic parsing.]

Idea: increase the weights of domain-specific slots and decrease the weights of others.

41

1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.

[Figure: for slot induction, the word observation / slot candidate matrix is multiplied by a word relation matrix (word relation model) and a slot relation matrix (slot relation model).]

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
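A toy numeric illustration of this propagation step (the matrix values and the row normalization are assumptions for the sketch, not the learned weights):

```python
import numpy as np

# Multiplying an observation matrix by a (row-normalized) relation
# matrix lets each node pass its score to graph neighbors, so densely
# connected domain-specific slots end up with higher scores.
F = np.array([[1.0, 0.0, 1.0],   # utterance 1: slots s1, s3 observed
              [0.0, 1.0, 1.0]])  # utterance 2: slots s2, s3 observed
# Slot-to-slot relation weights (s1 and s3 are strongly related).
R = np.array([[1.0, 0.1, 0.8],
              [0.1, 1.0, 0.2],
              [0.8, 0.2, 1.0]])
R = R / R.sum(axis=1, keepdims=True)  # row-normalize the relations
propagated = F @ R                    # scores after propagation
print(propagated.round(2))
```

Because s1 and s3 are strongly connected, observing either one raises the other's score, which is the neighbor-propagation effect described above.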

[Figure: knowledge graph connecting words (i, like, ...) and slots (capability, locale_by_use, food, expensiveness, seeking, relational_quantity, desiring) from the train utterances ("i would like a cheap restaurant", "find a restaurant with chinese food") and the test utterance ("show me a list of cheap restaurants").]

42

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances
Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[Figure: "can I have a cheap restaurant" → SLU Model → target="restaurant", price="cheap". Frame-semantic parsing over an unlabeled collection feeds ontology induction, which builds the feature model (Fw, Fs); lexical and semantic knowledge graphs feed structure learning, which builds the knowledge graph propagation model (word relation model Rw, slot relation model Rs); their product drives MF-SLU, SLU modeling by matrix factorization, producing the semantic representation.]

43

Knowledge Graph Construction: syntactic dependency parsing on utterances

[Figure: the dependency parse of "can i have a cheap restaurant" (nsubj, ccomp, det, amod, dobj), with the frames capability, expensiveness, and locale_by_use, yields a word-based lexical knowledge graph over {can, i, have, a, cheap, restaurant} and a slot-based semantic knowledge graph over {capability, expensiveness, locale_by_use}.]

44

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

[Figure: the dependency-parsed utterance "can i have a cheap restaurant" provides dependency contexts for training dependency-based word embeddings (e.g., vectors for "can" and "have") and dependency-based slot embeddings (e.g., vectors for expensiveness and capability).]

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.

45

Edge Weight Measurement: compute edge weights to represent relation importance

Slot-to-slot semantic relation: similarity between slot embeddings
Slot-to-slot dependency relation: dependency score between slot embeddings
Word-to-word semantic relation: similarity between word embeddings
Word-to-word dependency relation: dependency score between word embeddings

[Figure: a graph with word nodes w1-w7 and slot nodes s1-s3 whose edge weights combine the semantic and dependency relation scores.]
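As a small sketch of the semantic-relation weights (the embedding values below are assumed, not the trained dependency-based embeddings), the edge weight between two embeddings can be computed as cosine similarity:

```python
import numpy as np

# Cosine similarity between two embeddings as a semantic-relation
# edge weight; the 4-d vectors here are illustrative only.
def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

expensiveness = np.array([0.9, 0.1, 0.4, 0.2])  # hypothetical slot vector
locale_by_use = np.array([0.8, 0.2, 0.5, 0.1])  # hypothetical slot vector
weight = cosine(expensiveness, locale_by_use)
print(round(weight, 3))
```

Highly similar embeddings give weights near 1, so strongly related slots get heavy edges in the knowledge graph.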

46

Knowledge Graph Propagation Model

[Figure: the word observation / slot candidate matrix (train and test utterances over the words cheap, restaurant, food and the slots expensiveness, locale_by_use, food) is multiplied by the word relation matrix R_w^{SD} and the slot relation matrix R_s^{SD} for slot induction.]

Structure information is integrated to make the self-training data more reliable.

47

Semantic Decoding [ACL-IJCNLP'15]

[Figure: ontology induction fills the word observation / slot candidate matrix (Fw, Fs) for the train utterances ("i would like a cheap restaurant", "find a restaurant with chinese food") and the test utterance ("show me a list of cheap restaurants"); structure learning multiplies in the relation matrices; the test row carries estimated probabilities (e.g., .97, .90, .95, .85), but hidden semantics remain unobserved.]

2nd Issue: unobserved semantics may benefit understanding.

48

Reasoning with Matrix Factorization: Feature Model + Knowledge Graph Propagation Model

[Figure: the feature matrix, multiplied by R_w^{SD} and R_s^{SD} for slot induction; matrix factorization fills the missing cells of the test row with probabilities (e.g., .97, .90, .95, .85, .93, .92, .98, .05).]

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.

49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)
The decomposed matrices represent latent semantics for utterances and words/slots, respectively.
The product of the two matrices fills in the probability of the hidden semantics:

M_{|U|×(|W|+|S|)} ≈ U_{|U|×d} × V_{d×(|W|+|S|)}

[Figure: the word observation / slot candidate matrix with observed 1s and MF-estimated probabilities for the missing cells.]

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
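A minimal sketch of this low-rank completion idea (illustrative values; the thesis learns the factors with BPR rather than a plain SVD):

```python
import numpy as np

# Decompose the sparse observation matrix M (|U| x (|W|+|S|)) into
# U (|U| x d) and V (d x (|W|+|S|)); the product U @ V assigns
# scores to the unobserved (zero) cells. Values are illustrative.
M = np.array([[1.0, 1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0, 1.0],
              [0.0, 1.0, 0.0, 0.0]])  # test row with missing cells
d = 2  # latent dimension
u, s, vt = np.linalg.svd(M, full_matrices=False)
U = u[:, :d] * s[:d]  # |U| x d utterance factors
V = vt[:d, :]         # d x (|W|+|S|) word/slot factors
M_hat = U @ V         # filled-in score matrix
print(M_hat.round(2))
```

The rank-d product assigns nonzero scores to cells that were 0 in M, which is how hidden semantics receive probabilities.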

50

Bayesian Personalized Ranking for MF: model implicit feedback
- do not treat unobserved facts as negative samples (i.e., false)
- give observed facts higher scores than unobserved facts

Objective: for each utterance u_x, rank every observed fact f+ above every unobserved fact f-, i.e., maximize the sum of ln sigmoid(theta(f+) - theta(f-)) over all such pairs. The objective is to learn a set of well-ranked semantic slots per utterance u_x.
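A minimal sketch of a BPR-style stochastic update, assuming a toy set of observed (utterance, feature) facts; the latent sizes, learning rate, and data are illustrative, not the paper's:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_utt, n_feat, d = 4, 6, 3
U = rng.normal(scale=0.1, size=(n_utt, d))      # utterance latent vectors
V = rng.normal(scale=0.1, size=(n_feat, d))     # word/slot latent vectors

# Observed (utterance, feature) facts; everything else is merely unobserved.
observed = {(0, 1), (0, 2), (1, 0), (2, 3), (3, 5)}
obs_list = sorted(observed)

lr, reg = 0.05, 0.01
for _ in range(3000):
    u, fp = obs_list[rng.integers(len(obs_list))]   # positive fact f+
    fn = int(rng.integers(n_feat))                  # sampled candidate f-
    if (u, fn) in observed:
        continue
    x = U[u] @ V[fp] - U[u] @ V[fn]                 # score difference f+ vs f-
    g = sigmoid(-x)                                 # gradient of -ln sigmoid(x)
    u_vec = U[u].copy()
    U[u] += lr * (g * (V[fp] - V[fn]) - reg * U[u])
    V[fp] += lr * (g * u_vec - reg * V[fp])
    V[fn] += lr * (-g * u_vec - reg * V[fn])
```

After training, observed facts score higher on average than unobserved ones, which is exactly the ranking the objective asks for.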


51

Matrix Factorization SLU (MF-SLU)

[Figure: the feature matrix from Ontology Induction (F_w, F_s) and Structure Learning is factorized; train utterances (Utterance 1 "i would like a cheap restaurant", Utterance 2 "find a restaurant with chinese food") fill the observed word/slot cells, and the test utterance "show me a list of cheap restaurants" receives estimated slot probabilities (e.g., .97, .90, .95, .85).]

MF-SLU can estimate probabilities of slot candidates given test utterances.


52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[Figure: framework: the SLU Model maps "can I have a cheap restaurant" to target="restaurant", price="cheap"; Frame-Semantic Parsing over an unlabeled collection feeds Ontology Induction (feature model F_w, F_s over a semantic KG) and Structure Learning (knowledge graph propagation model with a Word Relation Model R_w over a lexical KG and a Slot Relation Model R_s over a semantic KG); MF-SLU performs SLU modeling by matrix factorization to produce the semantic representation.]

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).


53

Experimental Setup

Dataset: Cambridge University SLU Corpus
- restaurant recommendation (WER = 37%); 2,166 dialogues; 15,453 utterances
- dialogue slots: addr, area, food, name, phone, postcode, price range, task, type

Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.


54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                      | ASR  | Transcripts
Baseline SLU: Support Vector Machine          | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
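The MAP metric used throughout these tables can be sketched as follows; the ranked slot lists and reference sets below are hypothetical examples, not corpus data:

```python
def average_precision(ranked, relevant):
    """AP of one ranked slot list against the reference slots."""
    hits, score = 0, 0.0
    for i, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            score += hits / i
    return score / max(len(relevant), 1)

def mean_average_precision(per_utterance):
    return sum(average_precision(r, rel) for r, rel in per_utterance) / len(per_utterance)

# Toy example: two utterances with slots ranked by estimated probability.
data = [
    (["expensiveness", "food", "locale_by_use"], {"expensiveness", "locale_by_use"}),
    (["food", "expensiveness", "locale_by_use"], {"food"}),
]
print(mean_average_precision(data))  # ≈ 0.917
```

Ranking a relevant slot lower (as "locale_by_use" is in the first utterance) directly lowers the score, so MAP rewards well-ordered slot probabilities, not just correct sets.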


55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                                     | ASR           | Transcripts
Baseline SLU: Support Vector Machine                         | 32.5          | 36.6
Baseline SLU: Multinomial Logistic Regression                | 34.0          | 38.8
Proposed MF-SLU: Feature Model                               | 37.6          | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)

The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results.

*: the result is significantly better than the MLR baseline with p < 0.05 in a t-test.
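The significance claim rests on a paired t-test over per-utterance scores; a stdlib-only sketch with hypothetical AP scores (not the paper's numbers) looks like:

```python
import math
import statistics

def paired_t(a, b):
    """t statistic and degrees of freedom for a paired t-test."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    return mean / (sd / math.sqrt(n)), n - 1

# Hypothetical per-utterance AP scores for the proposed model vs the MLR baseline.
proposed = [0.90, 0.80, 1.00, 0.70, 0.95, 0.85]
baseline = [0.70, 0.75, 0.80, 0.60, 0.90, 0.70]
t, df = paired_t(proposed, baseline)
print(t, df)
```

With df = 5, the two-tailed critical value at p = 0.05 is about 2.571, so any |t| above that would be reported as significant.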


56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                     | ASR           | Transcripts
Feature Model                                | 37.6          | 45.3
Feature + Knowledge Graph Prop.: Semantic    | 41.4          | 51.6
Feature + Knowledge Graph Prop.: Dependency  | 41.6          | 49.0
Feature + Knowledge Graph Prop.: All         | 43.5 (+15.7%) | 53.4 (+17.9%)

In the integrated structure information, both semantic and dependency relations are useful for understanding.

*: the result is significantly better than the MLR baseline with p < 0.05 in a t-test.


Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs.

The reference ontology with the most frequent syntactic dependencies:

[Figure: induced ontology: slots seeking, desiring, locale_by_use, food, expensiveness, relational_quantity connected by PREP_FOR, NN, AMOD, and DOBJ dependencies, aligned against the reference slots type, food, pricerange, task, and area with DOBJ, AMOD, and PREP_IN dependencies.]

The automatically learned domain ontology aligns well with the reference one.

57

The data-driven ontology is more objective, while the expert-annotated one is more subjective.

58

Contributions of Semantic Decoding

[Figure: flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) and SLU Modeling (Semantic Decoding, Intent Prediction).]

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.

MF-SLU for Semantic Decoding is able to:
1) unify the automatically acquired knowledge,
2) adapt to a domain-specific setting,
3) and then allow systems to model implicit semantics for better understanding.


59

Low- and High-Level Understanding

Semantic concepts for individual utterances do not consider high-level semantics (user intents). The follow-up behaviors usually correspond to user intents.

"can i have a cheap restaurant" -> SLU Model -> price="cheap", target="restaurant"; intent=navigation
"i plan to dine in legume tonight" -> SLU Model -> restaurant="legume", time="tonight"; intent=reservation


60

SDS Flowchart – Intent Prediction

[Figure: flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) and SLU Modeling (Semantic Decoding, Intent Prediction), with Intent Prediction highlighted.]


61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


62

Intent Prediction of Mobile Apps [SLT'14c]
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality

Intent identification: popular domains in Google Play

"please dial a phone call to alex" -> Skype, Hangout, etc.

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.


63

Intent Prediction – Single-Turn Request

Input: single-turn request
Output: apps that are able to support the required functionality

[Figure: feature-enriched MF: Utterance 1 "i would like to contact alex" has word observations (contact, message, email) and intended-app candidates (Gmail, Outlook, Skype); IR over Google Play app descriptions (Outlook: "... your email calendar contacts ...", Gmail: "... check and send emails msgs ...") retrieves app candidates and self-train utterances; feature enrichment adds type semantics such as communication (.90); test utterances receive estimated app probabilities (e.g., .90, .85, .97, .95).]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
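Feature enrichment amounts to widening the observation matrix with extra semantic columns before factorization; a toy sketch with a made-up vocabulary and type scores (not the paper's features):

```python
import numpy as np

# Word-observation features for three utterances (hypothetical vocabulary).
words = ["contact", "email", "photo"]
X_word = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]], dtype=float)

# Enriched type semantics, e.g. inferred from embeddings or knowledge bases:
# soft scores tying "contact"/"email" to the communication type (illustrative).
types = ["communication", "media"]
X_type = np.array([[0.9, 0.0],
                   [0.9, 0.0],
                   [0.0, 0.8]])

# Feature-enriched matrix = concatenation of word and type observations;
# MF then runs on this wider matrix exactly as on the plain word matrix.
X = np.concatenate([X_word, X_type], axis=1)
print(X.shape)  # (3, 5)
```

Because the first two utterances now share the communication column, the factorization can relate them even though they share no surface words.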


64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch

Challenge: language ambiguity
1) User preference
2) App-level contexts

Idea: behavioral patterns in history can help intent prediction; e.g., an ambiguous "send to vivian" maps to Email/Message (Communication) given the previous turn.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com


65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch

[Figure: feature-enriched MF over dialogues: train dialogues pair user utterances with intended apps ("take this photo" -> CAMERA; "tell vivian this is me in the lab" -> IM; "check my grades on websites" -> CHROME; "send an email to professor" -> EMAIL), with lexical features (photo, check, camera, tell, send) and behavior-history features (null, camera, chrome, email); the test dialogue "take a photo of this" / "send it to alice" receives estimated app probabilities (e.g., .85, .70, .95, .80, .55) for CAMERA and IM.]

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com


66

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix   | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / -          | 26.1 / -

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix   | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / -           | 55.5 / -

LM: LM-based IR model (unsupervised); MLR: multinomial logistic regression (supervised)


67

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix   | ASR (LM / MF-SLU)    | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix   | ASR (MLR / MF-SLU)  | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.


68

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix                        | ASR (LM / MF-SLU)    | Transcripts (LM / MF-SLU)
Word Observation                      | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics      | 32.0 / -             | 33.3 / -
Word + Type-Embedding-Based Semantics | 31.5 / -             | 32.9 / -

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix             | ASR (MLR / MF-SLU)  | Transcripts (MLR / MF-SLU)
Word Observation           | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 / -            | 56.6 / -

Semantic enrichment provides rich cues to improve performance.


69

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix                        | ASR (LM / MF-SLU)    | Transcripts (LM / MF-SLU)
Word Observation                      | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics      | 32.0 / 34.2 (+6.8%)  | 33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics | 31.5 / 32.2 (+2.1%)  | 32.9 / 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix             | ASR (MLR / MF-SLU)  | Transcripts (MLR / MF-SLU)
Word Observation           | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 / 55.7 (+3.3%) | 56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.


70

Contributions of Intent Prediction

[Figure: flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) and SLU Modeling (Semantic Decoding, Intent Prediction), with Intent Prediction highlighted.]

Feature-enriched MF-SLU for Intent Prediction is able to:
1) unify the knowledge at different levels,
2) learn inference relations between various features,
3) and create personalized models by leveraging contextual behaviors.


71

Personal Intelligent Architecture

[Figure: Reactive Assistance (ASR, LU, Dialog, LG, TTS); Proactive Assistance (inferences, user modeling, suggestions); Data (back-end data bases, services, and client signals); device/service end-points (phone, PC, Xbox, web browser, messaging apps); user experience, e.g., "call taxi".]


72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


73

Conclusions

The work shows the feasibility and the potential for improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:
- better semantic representations for individual utterances
- better high-level intent prediction about follow-up behaviors


74

Future Work

Apply the proposed technology to domain discovery: find domains that are not covered by the current systems but that users are interested in, to guide which domains to develop next.

Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition both propagate into SLU modeling.


75

Towards Unsupervised Deep Learning

[Figure: deep architecture: a word sequence x (w1, w2, ..., wd) with word vectors l_w passes through a convolution matrix W_c (convolutional layer l_c) and a pooling operation to form the utterance vector l_f and slot vectors l_f for slot candidates S1, S2, ..., Sn; a knowledge graph propagation matrix W_p (layer l_p) and a semantic projection matrix W_s (semantic layer y) produce semantic relations R(U, S1), ..., R(U, Sn) and posterior probabilities P(S1 | U), ..., P(Sn | U).]

Treating MF as a one-layer neural net, we can add more layers to the model towards unsupervised deep learning.

76

Take Home Message

Big data is available without annotations. The challenge is how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI: language -> action
- understand voice commands to control music, lights, etc.
- teach the system, e.g., to let friends in by face recognition

Unsupervised or weakly-supervised methods will be the future trend.

Deep language understanding is an emerging field.

77

Q & A
THANKS FOR YOUR ATTENTION!

- Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
- Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
- Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
- Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
- Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
- Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
- Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.

Page 8: Statistical Learning from Dialogues for Intelligent Assistants

8

Why do companies care? Global Digital Statistics (January 2015):

- Global Population: 7.21B
- Active Internet Users: 3.01B
- Active Social Media Accounts: 2.08B
- Active Unique Mobile Users: 3.65B

The more natural and convenient input for these devices evolves towards speech.


9

Personal Intelligent Architecture

[Figure: Reactive Assistance (ASR, LU, Dialog, LG, TTS); Proactive Assistance (inferences, user modeling, suggestions); Data (back-end data bases, services, and client signals); device/service end-points (phone, PC, Xbox, web browser, messaging apps); user experience, e.g., "restaurant suggestions", "call taxi".]


10

Personal Intelligent Architecture

[Figure: Reactive Assistance (ASR, LU, Dialog, LG, TTS) highlighted; Proactive Assistance (inferences, user modeling, suggestions); Data (back-end data bases, services, and client signals); device/service end-points (phone, PC, Xbox, web browser, messaging apps); user experience, e.g., "call taxi".]


11

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


12

Spoken Dialogue System (SDS)

Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions.

Spoken dialogue systems are being incorporated into various devices (smart phones, smart TVs, in-car navigation systems, etc.).

Good SDSs assist users to organize and access information conveniently.

JARVIS – Iron Man's personal assistant; Baymax – personal healthcare companion


13

Baymax is capable of maintaining a good spoken dialogue system and learning new knowledge for better understanding of and interaction with people.

What is Baymax's intelligence? (Big Hero 6 -- video content owned and licensed by Disney Entertainment, Marvel Entertainment LLC, etc.)

14

SDS Architecture

ASR: Automatic Speech Recognition
SLU: Spoken Language Understanding
DM: Dialogue Management
NLG: Natural Language Generation

[Figure: pipeline ASR -> SLU -> DM (with Domain knowledge) -> NLG; SLU is the current bottleneck.]


15

Interaction Example

User: find a cheap eating place for taiwanese food
Intelligent Agent: Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which do you want to choose? I can help you go there.

Q: How does a dialogue system process this request?


16

SDS Process – Available Domain Ontology

User: find a cheap eating place for taiwanese food

[Figure: organized domain knowledge: slots seeking, target, price, food linked by dependency relations (PREP_FOR, AMOD, NN).]


17

SDS Process – Available Domain Ontology

User: find a cheap eating place for taiwanese food

[Figure: organized domain knowledge: slots seeking, target, price, food linked by dependency relations; Ontology Induction derives the semantic slots.]


18

SDS Process – Available Domain Ontology

User: find a cheap eating place for taiwanese food

[Figure: organized domain knowledge: slots seeking, target, price, food linked by dependency relations; Ontology Induction derives the semantic slots and Structure Learning derives the inter-slot relations.]


19

SDS Process – Spoken Language Understanding (SLU)

User: find a cheap eating place for taiwanese food

SLU output: seeking="find", target="eating place", price="cheap", food="taiwanese"


20

SDS Process – Spoken Language Understanding (SLU)

User: find a cheap eating place for taiwanese food

Semantic Decoding yields: seeking="find", target="eating place", price="cheap", food="taiwanese"


21

SDS Process – Dialogue Management (DM)

User: find a cheap eating place for taiwanese food

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }


22

SDS Process – Dialogue Management (DM)

User: find a cheap eating place for taiwanese food

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Surface Form Derivation maps the ontology to natural language.


23

SDS Process – Dialogue Management (DM)

User: find a cheap eating place for taiwanese food

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" } -> Din Tai Fung, Boiling Point

Predicted intent: navigation


24

SDS Process – Dialogue Management (DM)

User: find a cheap eating place for taiwanese food

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" } -> Din Tai Fung, Boiling Point

Intent Prediction gives the predicted intent: navigation


25

SDS Process – Natural Language Generation (NLG)

User: find a cheap eating place for taiwanese food

Intelligent Agent: Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which do you want to choose? I can help you go there. (navigation)


26

Required Knowledge

User: find a cheap eating place for taiwanese food

Required domain-specific information:
- the domain ontology (seeking, target, price, food with PREP_FOR, AMOD, NN relations)
- the semantic frame: SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }
- the predicted intent: navigation


27

Challenges for SDS

An SDS in a new domain requires:
1) a hand-crafted domain ontology
2) utterances labelled with semantic representations
3) an SLU component for mapping utterances into semantic representations

Manual work results in high cost, long duration, and poor scalability of system development.

The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, in order to handle open-domain requests (fully unsupervised, in contrast to the prior focus).

Example: "find a cheap eating place for asian food" -> seeking="find", target="eating place", price="cheap", food="asian food"


28

Contributions

User: find a cheap eating place for taiwanese food

[Figure: the pipeline annotated with the five contributions: Ontology Induction (semantic slot), Structure Learning (inter-slot relation), Surface Form Derivation (natural language), Semantic Decoding (e.g., SELECT restaurant { restaurant.price="cheap", restaurant.food="asian food" }), and Intent Prediction (predicted intent: navigation).]


29

Contributions

User: find a cheap eating place for taiwanese food

Ontology Induction, Structure Learning, Surface Form Derivation, Semantic Decoding, Intent Prediction


30

Contributions

User: find a cheap eating place for taiwanese food

Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation
SLU Modeling: Semantic Decoding, Intent Prediction


31

Knowledge Acquisition

1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

[Figure: an unlabelled collection of restaurant-asking conversations is turned into organized domain knowledge via Knowledge Acquisition: slots seeking, target, price, food, quantity connected by PREP_FOR, NN, and AMOD relations.]

Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation


32

SLU Modeling

2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

[Figure: the organized domain knowledge feeds the SLU component, which maps "can i have a cheap restaurant" to price="cheap", target="restaurant", intent=navigation.]

SLU Modeling: Semantic Decoding, Intent Prediction


33

SDS Architecture – Contributions

[Figure: pipeline ASR -> SLU -> DM (with Domain knowledge) -> NLG; Knowledge Acquisition and SLU Modeling target the current bottleneck at SLU and the domain knowledge.]


34

SDS Flowchart

[Figure: flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) and SLU Modeling (Semantic Decoding, Intent Prediction).]


35

SDS Flowchart – Semantic Decoding

[Figure: flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) and SLU Modeling (Semantic Decoding, Intent Prediction), with Semantic Decoding highlighted.]


36

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


37

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

[Figure: framework: the SLU Model maps "can I have a cheap restaurant" to target="restaurant", price="cheap"; Frame-Semantic Parsing over an unlabeled collection feeds Ontology Induction (feature model F_w, F_s over a semantic KG) and Structure Learning (knowledge graph propagation model with a Word Relation Model R_w over a lexical KG and a Slot Relation Model R_s over a semantic KG); MF-SLU performs SLU modeling by matrix factorization to produce the semantic representation.]

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.


38

Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistically semantic resource based on frame-semantics theory; words/phrases can be represented as frames. For "low fat milk", "milk" evokes the "food" frame and "low fat" fills the descriptor frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser, trained on manually annotated FrameNet sentences.


39

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

"can i have a cheap restaurant" parses into Frame: capability, Frame: expensiveness (good slot candidate), and Frame: locale_by_use (good slot candidate).

1st Issue: differentiate domain-specific frames from generic frames for SDSs.

Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.


40

1

Utterance 1i would like a cheap restaurant Train

hellip hellip

hellip

cheap restaurant foodexpensiveness

1

locale_by_use

11

find a restaurant with chinese foodUtterance 2

1 1

food

1 1

1 Test

1 97 95

Frame Semantic Parsing

show me a list of cheap restaurantsTest Utterance

Word Observation Slot Candidate

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

Idea: increase weights of domain-specific slots and decrease weights of others


41

1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other

Word Relation Model Slot Relation Model

word relation matrix

slot relation matrix

times

1

Word Observation Slot CandidateTrain

cheap restaurant foodexpensiveness

1

locale_by_use

11

1 1

food

1 1

1 Test

1

1

Slot Induction

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication
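The propagation step can be sketched numerically; the adjacency weights and initial scores below are made up for illustration, not taken from the actual relation matrices:

```python
import numpy as np

# Hypothetical slot-relation matrix: entry (i, j) is the edge weight
# between slot i and slot j in the semantic knowledge graph.
# Domain-specific slots (rows 0-1) share strong edges; the generic
# slot (row 2) is only weakly connected.
R = np.array([
    [1.0, 0.8, 0.1],   # locale_by_use
    [0.8, 1.0, 0.2],   # expensiveness
    [0.1, 0.2, 1.0],   # capability (generic)
])

# Uniform initial slot scores observed in an utterance.
scores = np.array([1.0, 1.0, 1.0])

# One propagation step: multiplying by the relation matrix lets each
# slot collect score from its neighbors, so strongly-connected
# domain-specific slots end up ranked above the generic one.
propagated = R @ scores
print(propagated)
```

With these toy weights, the two domain-specific slots receive higher propagated scores than the generic slot, which is the effect the model relies on.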

i like

1 1

capability

1

locale_by_use

food expensiveness

seeking

relational_quantitydesiring

Utterance 1: "i would like a cheap restaurant"

… …

Utterance 2: "find a restaurant with chinese food"

Test Utterance: "show me a list of cheap restaurants"


42

Semantic Decoding [ACL-IJCNLP'15]



43

Knowledge Graph Construction: syntactic dependency parsing on utterances

ccomp

amoddobjnsubj det

"can i have a cheap restaurant" (capability, expensiveness, locale_by_use)

Word-based lexical knowledge graph

Slot-based semantic knowledge graph

(Diagrams: word-based lexical knowledge graph with nodes can, i, have, a, cheap, restaurant connected by word edges w; slot-based semantic knowledge graph with nodes capability, locale_by_use, expensiveness connected by slot edges s)


44

Dependency-based word embeddings

Dependency-based slot embeddings

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

can = have =

expensiveness = capability =

can i have a cheap restaurant

ccomp

amoddobjnsubj det

have acapability expensiveness locale_by_use

ccomp

amoddobjnsubj det

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014
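A sketch of how dependency-based contexts are derived, in the spirit of Levy and Goldberg (2014): each arc in the parse yields a (word, context) training pair in both directions. The parse triples below are hand-written for this example:

```python
# Hand-written (head, relation, dependent) triples for the utterance
# "can i have a cheap restaurant" (illustrative, not parser output).
parse = [
    ("have", "ccomp", "can"),
    ("have", "nsubj", "i"),
    ("have", "dobj", "restaurant"),
    ("restaurant", "det", "a"),
    ("restaurant", "amod", "cheap"),
]

def dependency_contexts(triples):
    """Each arc yields two pairs: the head sees the dependent through
    the relation, and the dependent sees the head through its inverse."""
    pairs = []
    for head, rel, dep in triples:
        pairs.append((head, f"{dep}/{rel}"))
        pairs.append((dep, f"{head}/{rel}-1"))
    return pairs

pairs = dependency_contexts(parse)
print(pairs[:2])  # [('have', 'can/ccomp'), ('can', 'have/ccomp-1')]
```

These (word, context) pairs replace the linear bag-of-words contexts of standard skip-gram training, which is what makes the resulting embeddings dependency-based.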


45

Edge Weight Measurement: compute edge weights to represent relation importance

Slot-to-slot semantic relation: similarity between slot embeddings
Slot-to-slot dependency relation: dependency score between slot embeddings
Word-to-word semantic relation: similarity between word embeddings
Word-to-word dependency relation: dependency score between word embeddings
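The semantic-relation weight can be illustrated as a cosine similarity between embedding vectors; the 3-d vectors below are invented for illustration, not trained embeddings:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy slot embeddings (hypothetical values).
emb = {
    "expensiveness": np.array([0.9, 0.1, 0.0]),
    "food":          np.array([0.8, 0.3, 0.1]),
    "capability":    np.array([0.0, 0.2, 0.9]),
}

# Semantic-relation edge weight = similarity between slot embeddings:
# related domain slots get a heavy edge, unrelated ones a light edge.
w_domain = cosine(emb["expensiveness"], emb["food"])
w_generic = cosine(emb["expensiveness"], emb["capability"])
print(round(w_domain, 2), round(w_generic, 2))
```

The same computation, applied to word embeddings instead of slot embeddings, gives the word-to-word semantic weights.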

(Diagram: word nodes w1–w7 and slot nodes s1–s3 connected by weighted semantic and dependency edges)


46

Word Relation Model Slot Relation Model

word relation matrix

slot relation matrix

times

1

Word Observation Slot Candidate

Train

cheap restaurant foodexpensiveness

1

locale_by_use

11

1 1

food

1 1

1 Test1

1

Slot Induction

Knowledge Graph Propagation Model: R_w^(SD), R_s^(SD)

Structure information is integrated to make the self-training data more reliable


47

Ontology Induction

SLUFw Fs

Structure Learning

times

1

Utterance 1: "i would like a cheap restaurant"

Word Observation Slot Candidate

Train

…

cheap restaurant foodexpensiveness

1

locale_by_use

11

Utterance 2: "find a restaurant with chinese food"

1 1

food

1 1

1

Test1 9790 9585

Ontology Induction

Test Utterance: "show me a list of cheap restaurants" (hidden semantics)

2nd Issue: unobserved semantics may benefit understanding

Semantic Decoding [ACL-IJCNLP'15]


48

Reasoning with Matrix Factorization

Word Relation Model Slot Relation Model

word relation matrix

slot relation matrix

times

1

Word Observation Slot Candidate

Train

cheap restaurant foodexpensiveness

1

locale_by_use

11

1 1

food

1 1

1 Test1

1

9790 9585

93 929805 05

Slot Induction

Feature Model + Knowledge Graph Propagation Model

R_w^(SD)

R_s^(SD)

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which can model hidden semantics and is more robust to noisy data


49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and words/slots, respectively.

The product of the two matrices fills in the probabilities of the hidden semantics.

1

Word Observation Slot Candidate

Train

cheap restaurant foodexpensiveness

1

locale_by_use

11

1 1

food

1 1

1 Test

1

1

9790 9585

93 929805 05

M(|U| × (|W|+|S|)) ≈ U(|U| × d) × V(d × (|W|+|S|)), where d is the latent dimension
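A minimal sketch of low-rank completion, using a truncated SVD in place of the learned factorization; the toy observation matrix below is illustrative, not corpus data:

```python
import numpy as np

# Toy utterance-by-(word+slot) matrix: observed facts are 1, everything
# else 0. Row 0 resembles row 1 but is missing the last slot.
M = np.array([
    [1., 1., 1., 0.],
    [1., 1., 1., 1.],
    [0., 1., 0., 1.],
])

d = 2  # latent dimension
U_svd, s, Vt = np.linalg.svd(M, full_matrices=False)
M_hat = U_svd[:, :d] * s[:d] @ Vt[:d, :]  # rank-d reconstruction

# The unobserved cell (0, 3) gets a positive score, pulled up by the
# similar utterance in row 1: the "hidden semantics" effect.
print(round(float(M_hat[0, 3]), 2))
```

The low-rank product assigns non-zero probability mass to unobserved cells, which is exactly how the factorization models hidden semantics.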

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009


50

Bayesian Personalized Ranking for MF: model implicit feedback

Do not treat unobserved facts as negative samples (true or false); give observed facts higher scores than unobserved facts.

Objective: for each utterance u, rank every observed fact f⁺ above every unobserved fact f⁻, i.e. maximize Σ ln σ(f⁺ − f⁻)

The objective is to learn a set of well-ranked semantic slots per utterance.
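A toy sketch of the BPR update (hypothetical matrix sizes and learning rate; the full model also includes the feature and knowledge-graph-propagation matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                              # latent dimension (toy)
U = rng.normal(0.0, 0.1, (3, d))   # utterance latent factors
V = rng.normal(0.0, 0.1, (5, d))   # word/slot latent factors

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bpr_step(u, pos, neg, lr=0.1):
    """One SGD step on ln sigmoid(f+ - f-) for a (u, pos, neg) triple:
    pushes the observed fact's score above the unobserved fact's."""
    u_vec = U[u].copy()
    f_diff = u_vec @ V[pos] - u_vec @ V[neg]
    g = sigmoid(-f_diff)           # gradient weight; large when misranked
    U[u] += lr * g * (V[pos] - V[neg])
    V[pos] += lr * g * u_vec
    V[neg] -= lr * g * u_vec

for _ in range(50):
    bpr_step(0, pos=1, neg=2)

# The observed fact now outranks the unobserved one for utterance 0.
print(U[0] @ V[1] > U[0] @ V[2])
```

Each step strictly widens the ranking margin for the sampled triple, so observed facts end up scored above unobserved ones without ever labeling the latter as false.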


51

Ontology Induction

SLUFw Fs

Structure Learning

times

1

Utterance 1: "i would like a cheap restaurant"

Word Observation Slot Candidate

Train

…

cheap restaurant foodexpensiveness

1

locale_by_use

11

Utterance 2: "find a restaurant with chinese food"

1 1

food

1 1

1

Test1 9790 9585

Ontology Induction

Test Utterance: "show me a list of cheap restaurants"

Matrix Factorization SLU (MF-SLU)

MF-SLU can estimate probabilities for slot candidates given test utterances


52

Semantic Decoding [ACL-IJCNLP'15]


Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)


53

Experimental Setup. Dataset: Cambridge University SLU Corpus

Restaurant recommendation (WER = 37%); 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type

Metric: MAP of all estimated slot probabilities over all utterances (evaluated via a mapping table between induced and reference slots)

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012
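The MAP metric can be sketched as follows; the ranked slot lists and gold slot sets below are hypothetical:

```python
def average_precision(ranked_slots, gold):
    """AP for one utterance: mean of precision@k at each correct hit."""
    hits, precisions = 0, []
    for k, slot in enumerate(ranked_slots, start=1):
        if slot in gold:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(gold) if gold else 0.0

# Hypothetical ranked slot candidates and gold slots for two utterances.
utterances = [
    (["pricerange", "food", "area"], {"pricerange", "food"}),
    (["task", "food", "phone"], {"food"}),
]
ap_values = [average_precision(ranked, gold) for ranked, gold in utterances]
map_score = sum(ap_values) / len(ap_values)
print(round(map_score, 3))
```

MAP averages the per-utterance AP, so a model is rewarded for ranking the correct slots near the top of its probability list rather than for hard classification decisions.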


54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

Approach                          | ASR   | Transcripts
Baseline SLU:
  Support Vector Machine          | 32.5% | 36.6%
  Multinomial Logistic Regression | 34.0% | 38.8%


55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

The MF-SLU effectively models implicit information to decode semantics

The structure information further improves the results

Approach                                      | ASR            | Transcripts
Baseline SLU:
  Support Vector Machine                      | 32.5%          | 36.6%
  Multinomial Logistic Regression             | 34.0%          | 38.8%
Proposed MF-SLU:
  Feature Model                               | 37.6%          | 45.3%
  Feature Model + Knowledge Graph Propagation | 43.5% (+27.9%) | 53.4% (+37.6%)

The results are significantly better than the MLR baseline (p < 0.05, t-test)


56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

When structure information is integrated, both semantic and dependency relations are useful for understanding

Approach                                | ASR            | Transcripts
Feature Model                           | 37.6%          | 45.3%
Feature + Knowledge Graph Propagation:
  Semantic                              | 41.4%          | 51.6%
  Dependency                            | 41.6%          | 49.0%
  All                                   | 43.5% (+15.7%) | 53.4% (+17.9%)

The results are significantly better than the MLR baseline (p < 0.05, t-test)


Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs

The reference ontology with the most frequent syntactic dependencies

(Induced ontology graph: slots locale_by_use, food, expensiveness, seeking, relational_quantity, desiring linked by PREP_FOR, NN, AMOD, and DOBJ relations; reference ontology graph: slots type, food, pricerange, area, task linked by DOBJ, AMOD, and PREP_IN relations)

The automatically learned domain ontology aligns well with the reference one

57

The data-driven ontology is more objective, while the expert-annotated one is more subjective

58

Contributions of Semantic Decoding

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, 3) and then allow systems to model implicit semantics for better understanding.


59

Low- and High-Level Understanding: semantic concepts for individual utterances do not consider high-level semantics (user intents)

The follow-up behaviors usually correspond to user intents

price="cheap", target="restaurant"

SLU Model

"can i have a cheap restaurant"

intent=navigation

restaurant="legume", time="tonight"

SLU Model

"i plan to dine in legume tonight"

intent=reservation


60

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart – Intent Prediction


61

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input spoken utterances for making requests about launching an app

Output the apps supporting the required functionality

Intent Identification: popular domains in Google Play

please dial a phone call to alex

Skype, Hangouts, etc.

Intent Prediction of Mobile Apps [SLT'14c]

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014


63

Input single-turn request

Output apps that are able to support the required functionality

Intent Prediction ndash Single-Turn Request

1

Enriched Semantics

communication

90

1

1

Utterance 1: "i would like to contact alex"

Word Observation Intended App

… …

contact, message, email | Gmail, Outlook, Skype

Test

90

Reasoning with Feature-Enriched MF

Train

"… your email calendar contacts …"

"… check and send emails msgs …"

Outlook

Gmail

IR for app candidates

App Desc

Self-Train Utterance

Test Utterance

1

1

1

1

1

1

1

1 1

1

1 90 85 97 95

FeatureEnrichment

Utterance 1: "i would like to contact alex" …

1

1

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents


64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input multi-turn interaction

Output apps the user plans to launch

Challenge: language ambiguity. 1) User preference; 2) App-level contexts

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

send to vivianvs

Email MessageCommunication

Idea: behavioral patterns in history can help intent prediction

previous turn


65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input multi-turn interaction

Output apps the user plans to launch

1

Lexical: photo, check, camera, IM, tell | Intended App

"take this photo" / "tell vivian this is me in the lab"

CAMERA

IM (Train Dialogue)

"check my grades on website" / "send an email to professor"

…

CHROME

EMAIL

send

Behavior History

null camera

85

"take a photo of this" / "send it to alice"

CAMERA

IM

…

email

1

1

1 1

1

1 70

chrome

1

1

1

1

1

1

chrome email

11

1

1

95

80 55

User UtteranceIntended

App

Reasoning with Feature-Enriched MF

Test Dialogue

"take a photo of this" / "send it to alice" …

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction


66

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix   | ASR (LM) | Transcripts (LM)
Word Observation | 25.1%    | 26.1%

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix   | ASR (MLR) | Transcripts (MLR)
Word Observation | 52.1%     | 55.5%

LM: LM-based IR model (unsupervised); MLR: Multinomial Logistic Regression (supervised)

Experiments for Intent Prediction


67

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix   | ASR: LM / MF-SLU       | Transcripts: LM / MF-SLU
Word Observation | 25.1% / 29.2% (+16.2%) | 26.1% / 30.4% (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix   | ASR: MLR / MF-SLU     | Transcripts: MLR / MF-SLU
Word Observation | 52.1% / 52.7% (+1.2%) | 55.5% / 55.4% (-0.2%)

Modeling hidden semantics helps intent prediction especially for noisy data

Experiments for Intent Prediction


68

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                        | ASR: LM / MF-SLU       | Transcripts: LM / MF-SLU
Word Observation                      | 25.1% / 29.2% (+16.2%) | 26.1% / 30.4% (+16.4%)
Word + Embedding-Based Semantics      | 32.0% / –              | 33.3% / –
Word + Type-Embedding-Based Semantics | 31.5% / –              | 32.9% / –

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix             | ASR: MLR / MF-SLU     | Transcripts: MLR / MF-SLU
Word Observation           | 52.1% / 52.7% (+1.2%) | 55.5% / 55.4% (-0.2%)
Word + Behavioral Patterns | 53.9% / –             | 56.6% / –

Semantic enrichment provides rich cues to improve performance

Experiments for Intent Prediction


69

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                        | ASR: LM / MF-SLU       | Transcripts: LM / MF-SLU
Word Observation                      | 25.1% / 29.2% (+16.2%) | 26.1% / 30.4% (+16.4%)
Word + Embedding-Based Semantics      | 32.0% / 34.2% (+6.8%)  | 33.3% / 33.3% (-0.2%)
Word + Type-Embedding-Based Semantics | 31.5% / 32.2% (+2.1%)  | 32.9% / 34.0% (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix             | ASR: MLR / MF-SLU     | Transcripts: MLR / MF-SLU
Word Observation           | 52.1% / 52.7% (+1.2%) | 55.5% / 55.4% (-0.2%)
Word + Behavioral Patterns | 53.9% / 55.7% (+3.3%) | 56.6% / 57.7% (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics

Experiments for Intent Prediction


70

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction: the Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, 3) and create personalized models by leveraging contextual behaviors.


71

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data: Back-end Data Bases, Services, and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"


72

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


73

Conclusions: The work shows the feasibility of and the potential for improving the generalization, maintenance efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors


74

Future Work: Apply the proposed technology to domain discovery

Domains not covered by the current systems but of interest to users can guide which domains to develop next.

Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from Knowledge Acquisition, both of which feed into SLU Modeling.


75

(Model architecture: word sequence x = w1 w2 … wd → word vectors lw → convolutional layer lc (convolution matrix Wc) → pooling → utterance vector lf; slot candidates S1, S2, …, Sn with slot vectors lf; knowledge graph propagation layer lp (propagation matrix Wp) → semantic projection matrix Ws → semantic layer y, yielding semantic relations R(U, S1), …, R(U, Sn) and posterior probabilities P(S1 | U), …, P(Sn | U))

Towards Unsupervised Deep Learning


Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.
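A conceptual sketch of this reading (random weights and invented sizes, not the architecture above): the MF scoring step is one linear layer with a low-rank weight matrix, and going deep means inserting a nonlinearity and further layers before the semantic scores:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random((1, 10))          # toy bag-of-words utterance vector

# MF as a one-layer linear net: scores = x @ W, with W = U @ V low-rank.
U = rng.normal(size=(10, 3))     # latent dimension d = 3
V = rng.normal(size=(3, 6))      # 6 slot candidates
shallow_scores = x @ (U @ V)

# Towards deep: add a hidden layer with a nonlinearity before scoring.
W1 = rng.normal(size=(10, 8))
W2 = rng.normal(size=(8, 6))
hidden = np.tanh(x @ W1)
deep_scores = hidden @ W2
print(shallow_scores.shape, deep_scores.shape)
```

Both versions map the same utterance features to the same slot-score space; the deep variant simply composes more transformations before the final semantic layer.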

76

Take Home Message: Big data is available without annotations.

Challenge: how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI: language → action, e.g. understand voice commands to control music, lights, etc.; teach the system to let friends in by face recognition, etc.


Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A. Thanks for your attention!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.



9

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data: Back-end Data Bases, Services, and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "restaurant suggestions", "call taxi"


10

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data: Back-end Data Bases, Services, and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"


11

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


12

Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions

Spoken dialogue systems are being incorporated into various devices (smart phones, smart TVs, in-car navigation systems, etc.)

Good SDSs assist users to organize and access information conveniently

Spoken Dialogue System (SDS)

JARVIS – Iron Man's Personal Assistant; Baymax – Personal Healthcare Companion


13

Baymax is capable of maintaining a good spoken dialogue system and learning new knowledge for better understanding and interacting with people.

What is Baymax's intelligence? (Big Hero 6: video content owned and licensed by Disney Entertainment, Marvel Entertainment LLC, etc.)



14

ASR: Automatic Speech Recognition; SLU: Spoken Language Understanding; DM: Dialogue Management; NLG: Natural Language Generation

SDS Architecture

ASR → SLU → DM → NLG (with Domain knowledge)

NLG

current bottleneck


15

Interaction Example

User: find a cheap eating place for taiwanese food

Intelligent Agent: Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there.

Q: How does a dialogue system process this request?


16

SDS Process ndash Available Domain Ontology

User: find a cheap eating place for taiwanese food

target

foodprice AMODNN

seeking PREP_FOR

Organized Domain KnowledgeIntelligent

Agent


17

SDS Process ndash Available Domain Ontology

User

target

foodprice AMODNN

seeking PREP_FOR

Organized Domain KnowledgeIntelligent

Agent

Ontology Induction (semantic slot)

find a cheap eating place for taiwanese food


18

SDS Process ndash Available Domain Ontology

User

target

foodprice AMODNN

seeking PREP_FOR

Organized Domain KnowledgeIntelligent

Agent

Ontology Induction (semantic slot)

Structure Learning (inter-slot relation)

find a cheap eating place for taiwanese food


19

SDS Process ndash Spoken Language Understanding (SLU)

User

target

foodprice AMODNN

seeking PREP_FORIntelligent

Agent

seeking="find", target="eating place", price="cheap", food="taiwanese"

find a cheap eating place for taiwanese food


20

find a cheap eating place for taiwanese food

SDS Process ndash Spoken Language Understanding (SLU)

User

target

foodprice AMODNN

seeking PREP_FORIntelligent

Agent

seeking="find", target="eating place", price="cheap", food="taiwanese"

Semantic Decoding


21

find a cheap eating place for taiwanese food

SDS Process ndash Dialogue Management (DM)

User

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Intelligent Agent


22

find a cheap eating place for taiwanese food

SDS Process ndash Dialogue Management (DM)

User

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Intelligent Agent

Surface Form Derivation (natural language)


23

SDS Process ndash Dialogue Management (DM)

User

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Din Tai Fung, Boiling Point

Predicted intent: navigation

Intelligent Agent

find a cheap eating place for taiwanese food


24

SDS Process ndash Dialogue Management (DM)

User

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Din Tai Fung, Boiling Point

Predicted intent: navigation

Intelligent Agent

Intent Prediction

find a cheap eating place for taiwanese food


25

SDS Process ndash Natural Language Generation (NLG)

User

Intelligent Agent

Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there. (navigation)

find a cheap eating place for taiwanese food


26

Required Knowledge

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Predicted intent: navigation

User

Required Domain-Specific Information

find a cheap eating place for taiwanese food


27

Challenges for SDS An SDS in a new domain requires

1) A hand-crafted domain ontology; 2) Utterances labelled with semantic representations; 3) An SLU component for mapping utterances into semantic representations

Manual work results in high cost long duration and poor scalability of system development

The goal is to enable an SDS to 1) automatically infer domain knowledge and then to 2) create the data for SLU modelingin order to handle the open-domain requests

seeking="find", target="eating place", price="cheap", food="asian food"

find a cheap eating place for asian food

fully unsupervised

Prior Focus


28

Contributions

target

food, price (NN, AMOD)

seeking (PREP_FOR)

SELECT restaurant: restaurant.price="cheap", restaurant.food="asian food"

Predicted intent: navigation

find a cheap eating place for taiwanese food

User

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

(natural language)

(inter-slot relation)

(semantic slot)


29

Contributions

User

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

find a cheap eating place for taiwanese food


30

Ontology Induction Structure Learning Surface Form Derivation

Semantic Decoding Intent Prediction

Contributions

User

Knowledge Acquisition SLU Modeling

find a cheap eating place for taiwanese food


31

Knowledge Acquisition

1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

Restaurant Asking

Conversations

[Figure: organized domain knowledge graph with slots seeking, target, quantity, food, and price connected by PREP_FOR, NN, and AMOD dependencies.]

Organized Domain Knowledge

Unlabelled Collection

Knowledge Acquisition

Knowledge Acquisition Ontology Induction Structure Learning Surface Form Derivation


32

SLU Modeling

2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

Organized Domain

Knowledge

price="cheap", target="restaurant", intent=navigation

SLU Modeling

SLU Component

"can i have a cheap restaurant"

SLU Modeling Semantic Decoding Intent Prediction


33

SDS Architecture – Contributions

[Pipeline: ASR → SLU → DM → NLG, backed by the Domain ontology]

Knowledge Acquisition SLU Modeling

current bottleneck


34

SDS Flowchart

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling


35

SDS Flowchart – Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling


36

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


37

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015.

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant"

Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology Induction: Fw, Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

×

Semantic KG

MF-SLU: SLU Modeling by Matrix Factorization

Semantic Representation


38

Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory; words/phrases can be represented as frames. In "low fat milk", "milk" evokes the "food" frame and "low fat" fills the descriptor frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences.


39

Ontology Induction [ASRU'13, SLT'14a]

can i have a cheap restaurant

Frame capability

Frame expensiveness

Frame locale by use

1st Issue: differentiate domain-specific frames (e.g., expensiveness, locale_by_use) from generic frames (e.g., capability) for SDSs

Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.

slot candidate

Best Student Paper Award


40

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

[Figure: word-observation / slot-candidate matrix. Train: Utterance 1 "i would like a cheap restaurant" (words cheap, restaurant; slots expensiveness, locale_by_use) and Utterance 2 "find a restaurant with chinese food" (words restaurant, food; slots locale_by_use, food). Test utterance "show me a list of cheap restaurants": observed words plus induced slot weights (.97, .95) from frame-semantic parsing.]

Idea: increase the weights of domain-specific slots and decrease the weights of others


41

1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other

[Figure: the train/test word-observation and slot-candidate matrix is multiplied by a word relation matrix and a slot relation matrix for slot induction.]

Slot Induction

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication
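The propagation idea can be sketched in a few lines. The graph, edge weights, and the interpolation scheme below are toy assumptions for illustration, not the learned matrices from the thesis.

```python
# Toy sketch of knowledge-graph score propagation: each node's score is
# interpolated with a weighted average of its neighbors' scores, so densely
# connected domain slots keep high weights while isolated generic slots decay.

def propagate(init, R, alpha=0.5, iterations=10):
    """init: node -> initial weight; R: node -> {neighbor: edge weight}."""
    s = dict(init)
    for _ in range(iterations):
        nxt = {}
        for node in init:
            neigh = R.get(node, {})
            total = sum(neigh.values())
            prop = sum(w / total * s[m] for m, w in neigh.items()) if total else 0.0
            nxt[node] = (1 - alpha) * init[node] + alpha * prop
        s = nxt
    return s

# Domain slots are densely connected; the generic slot (capability) is isolated.
R = {
    "expensiveness": {"locale_by_use": 1.0, "food": 1.0},
    "locale_by_use": {"expensiveness": 1.0, "food": 1.0},
    "food": {"expensiveness": 1.0, "locale_by_use": 1.0},
    "capability": {},
}
init = {slot: 1.0 for slot in R}
scores = propagate(init, R)
# Connected domain slots keep score 1.0; the isolated generic slot decays to 0.5.
```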

[Figure: knowledge graph over words (i, like, ...) and slots (capability, locale_by_use, food, expensiveness, seeking, relational_quantity, desiring) built from the train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants".]


42

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015.

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant"

Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology Induction: Fw, Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

×

Semantic KG

MF-SLU: SLU Modeling by Matrix Factorization

Semantic Representation


43

Knowledge Graph Construction: syntactic dependency parsing on utterances

"can i have a cheap restaurant" (dependencies: nsubj, ccomp, det, amod, dobj; evoked frames: capability, expensiveness, locale_by_use)

Word-based lexical knowledge graph: nodes can, i, have, a, cheap, restaurant (edges w)

Slot-based semantic knowledge graph: nodes capability, locale_by_use, expensiveness (edges s)
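As an illustration, the two graphs can be built from dependency triples. The triples and the word-to-frame mapping below follow the slide's example utterance; the adjacency-set representation is an implementation choice, not the thesis' data structure.

```python
# Sketch: project one dependency-parsed utterance into the word-based lexical
# KG and the slot-based semantic KG.

from collections import defaultdict

# (head, relation, dependent) triples for "can i have a cheap restaurant"
deps = [
    ("have", "ccomp", "can"),
    ("have", "nsubj", "i"),
    ("have", "dobj", "restaurant"),
    ("restaurant", "det", "a"),
    ("restaurant", "amod", "cheap"),
]

# Frames evoked by words, from the frame-semantic parse
word2slot = {"can": "capability", "cheap": "expensiveness",
             "restaurant": "locale_by_use"}

word_graph = defaultdict(set)    # word-based lexical knowledge graph
slot_graph = defaultdict(set)    # slot-based semantic knowledge graph
for head, rel, dep in deps:
    word_graph[head].add(dep)
    word_graph[dep].add(head)
    hs, ds = word2slot.get(head), word2slot.get(dep)
    if hs and ds and hs != ds:   # project word edges onto their slots
        slot_graph[hs].add(ds)
        slot_graph[ds].add(hs)
```

Note how the generic slot capability ends up isolated: no dependency edge links "can" to another frame-evoking word, which is exactly the cue the propagation model exploits.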


44

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

[Figure: dependency-based word embeddings (e.g., vectors for "can", "have") and dependency-based slot embeddings (e.g., vectors for expensiveness, capability) are trained from the dependency-parsed utterance "can i have a cheap restaurant" (nsubj, ccomp, det, amod, dobj) and its frames (capability, expensiveness, locale_by_use).]

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL 2014.


45

Edge Weight Measurement: compute edge weights to represent relation importance

Slot-to-slot semantic relation: similarity between slot embeddings
Slot-to-slot dependency relation: dependency score between slot embeddings
Word-to-word semantic relation: similarity between word embeddings
Word-to-word dependency relation: dependency score between word embeddings

[Figure: a word graph (w1–w7) and a slot graph (s1–s3); each edge weight combines the semantic and dependency scores.]
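A sketch of the semantic-relation weight: the weight of an edge is the similarity between the two embeddings. The 3-d vectors below are made-up stand-ins for trained dependency-based embeddings.

```python
# Toy edge-weight computation: cosine similarity between slot embeddings.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

emb = {
    "expensiveness": [0.9, 0.1, 0.0],
    "food":          [0.8, 0.2, 0.1],   # close to expensiveness (both domain slots)
    "capability":    [0.0, 0.1, 0.9],   # far from the domain slots
}

w_domain = cosine(emb["expensiveness"], emb["food"])
w_generic = cosine(emb["expensiveness"], emb["capability"])
# The edge between two domain slots gets a much larger weight than the edge
# to a generic slot, which is what drives the propagation step.
```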


46

Knowledge Graph Propagation Model

[Figure: the word-observation / slot-candidate training matrix is multiplied by the word relation matrix R_w^(SD) and the slot relation matrix R_s^(SD) for slot induction.]

Structure information is integrated to make the self-training data more reliable


47

Semantic Decoding [ACL-IJCNLP'15]

[Figure: the word-observation / slot-candidate matrix. Train: Utterance 1 "i would like a cheap restaurant" (cheap, restaurant; expensiveness, locale_by_use) and Utterance 2 "find a restaurant with chinese food" (restaurant, food; locale_by_use, food). Test utterance "show me a list of cheap restaurants": the observed words get induced slot weights (.97/.90, .95/.85), while the hidden semantics remain unobserved.]

2nd Issue: unobserved semantics may benefit understanding


48

Reasoning with Matrix Factorization

[Figure: feature model plus knowledge graph propagation model. The matrix now contains the induced test-utterance weights (.97/.90, .95/.85), and MF fills the remaining cells with estimated probabilities (e.g., .93, .92, .98, .05).]

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which makes it able to model hidden semantics and more robust to noisy data


49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and words/slots respectively

The product of the two matrices fills in the probability of the hidden semantics
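A toy numpy sketch of the completion idea. It assumes plain squared-error gradient descent on the observed cells only (the thesis optimizes a BPR objective instead), and the matrix mirrors the running example with the test utterance's slot cells hidden.

```python
# Toy matrix completion: factor the observed cells of the
# utterance-by-(word+slot) matrix into low-rank factors U and V; their
# product then scores the hidden slot cells of the test utterance.
import numpy as np

rng = np.random.default_rng(0)

# Columns: words [cheap, restaurant] + slots [expensiveness, locale_by_use]
M = np.array([
    [1., 1., 1., 1.],   # train: "i would like a cheap restaurant"
    [0., 1., 0., 1.],   # train: "find a restaurant" (simplified)
    [1., 1., 0., 0.],   # test: "show me a list of cheap restaurants"
])
observed = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 0, 0],       # the test utterance's slot cells are hidden
], dtype=bool)

d, lr = 2, 0.05
U = rng.normal(scale=0.1, size=(3, d))
V = rng.normal(scale=0.1, size=(4, d))
for _ in range(3000):
    err = (U @ V.T - M) * observed          # error on observed cells only
    U, V = U - lr * err @ V, V - lr * err.T @ U

M_hat = U @ V.T
# M_hat[2, 2] and M_hat[2, 3] now estimate the hidden slot probabilities,
# rising toward 1 because the test utterance shares its word pattern with
# the first training utterance.
```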

[Figure: the completed matrix, in which MF fills the hidden cells of the test utterance with estimated probabilities (e.g., .97, .90, .95, .85, .93, .92, .98, .05).]

|U| × (|W|+|S|) ≈ (|U| × d) · (d × (|W|+|S|))

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI 2009.


50

Bayesian Personalized Ranking for MF: model implicit feedback

do not treat unobserved facts as negative samples (true or false); give observed facts higher scores than unobserved facts

Objective: for each utterance u, maximize the sum of ln σ(f+ - f-) over pairs of an observed fact f+ and an unobserved fact f-

The objective is to learn a set of well-ranked semantic slots per utterance
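The ranking objective can be illustrated with two scalar scores. In the real model the gradient flows into the MF latent factors; the names f_pos/f_neg and the learning rate are illustrative.

```python
# Scalar illustration of the BPR ranking objective ln sigma(f_pos - f_neg):
# gradient ascent pushes the observed fact's score above the unobserved one.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

f_pos, f_neg = 0.0, 0.0            # scores of an observed / unobserved fact
lr = 0.5
for _ in range(50):
    g = 1.0 - sigmoid(f_pos - f_neg)   # d/df ln sigma(f_pos - f_neg)
    f_pos += lr * g
    f_neg -= lr * g
# f_pos now ranks well above f_neg, without ever calling f_neg "false".
```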


51

[Figure: ontology induction and structure learning feed the feature model (Fw, Fs) and relation models; the trained model scores the slot candidates of the test utterance "show me a list of cheap restaurants" (expensiveness .97/.90, locale_by_use .95/.85), given the train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food".]

Matrix Factorization SLU (MF-SLU)

MF-SLU can estimate probabilities for slot candidates given test utterances


52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015.

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant"

Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology Induction: Fw, Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

×

Semantic KG

MF-SLU: SLU Modeling by Matrix Factorization

Semantic Representation

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)


53

Experimental Setup

Dataset: Cambridge University SLU Corpus

Restaurant recommendation (WER = 37%); 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type

Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT 2012.
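For reference, the metric can be sketched as follows; the rankings below are toy examples, not corpus results.

```python
# Mean average precision (MAP) over per-utterance slot rankings: for each
# utterance, average the precision at every rank where a reference slot
# appears, then average across utterances.

def average_precision(ranked, relevant):
    hits, score = 0, 0.0
    for i, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            score += hits / i      # precision at this relevant rank
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(rankings):
    return sum(average_precision(r, rel) for r, rel in rankings) / len(rankings)

rankings = [
    # (slots ranked by estimated probability, reference slots)
    (["food", "pricerange", "area"], {"food", "pricerange"}),
    (["area", "food"], {"food"}),
]
map_score = mean_average_precision(rankings)   # 0.75 for this toy example
```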


54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

Approach (MAP, %): ASR / Transcripts
Baseline SLU - Support Vector Machine: 32.5 / 36.6
Baseline SLU - Multinomial Logistic Regression: 34.0 / 38.8


55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

The MF-SLU effectively models implicit information to decode semantics

The structure information further improves the results

Approach (MAP, %): ASR / Transcripts
Baseline SLU - Support Vector Machine: 32.5 / 36.6
Baseline SLU - Multinomial Logistic Regression: 34.0 / 38.8
Proposed MF-SLU - Feature Model: 37.6 / 45.3
Proposed MF-SLU - Feature Model + Knowledge Graph Propagation: 43.5* (+27.9%) / 53.4* (+37.6%)

*: the result is significantly better than the MLR baseline with p < 0.05 in a t-test


56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

In the integrated structure information, both semantic and dependency relations are useful for understanding

Approach (MAP, %): ASR / Transcripts
Feature Model: 37.6 / 45.3
Feature + Knowledge Graph Propagation - Semantic: 41.4 / 51.6
Feature + Knowledge Graph Propagation - Dependency: 41.6 / 49.0
Feature + Knowledge Graph Propagation - All: 43.5* (+15.7%) / 53.4* (+17.9%)

*: the result is significantly better than the MLR baseline with p < 0.05 in a t-test


Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs

[Figure: the induced ontology (seeking, desiring, locale_by_use, food, expensiveness, relational_quantity linked by PREP_FOR, NN, AMOD, and DOBJ edges) beside the reference ontology (type, task, area, food, pricerange linked by DOBJ, AMOD, and PREP_IN edges) annotated with the most frequent syntactic dependencies.]

The automatically learned domain ontology aligns well with the reference one


The data-driven ontology is more objective, while the expert-annotated one is more subjective

58

Contributions of Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge

MF-SLU for Semantic Decoding is able to:
1) unify the automatically acquired knowledge
2) adapt to a domain-specific setting
3) and then allow systems to model implicit semantics for better understanding


59

Low- and High-Level Understanding

Semantic concepts for individual utterances do not consider high-level semantics (user intents)

The follow-up behaviors usually correspond to user intents

"can i have a cheap restaurant" → SLU Model → price="cheap", target="restaurant"; intent=navigation

"i plan to dine in legume tonight" → SLU Model → restaurant="legume", time="tonight"; intent=reservation


60

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart – Intent Prediction


61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances for making requests about launching an app

Output: the apps supporting the required functionality

Intent Identification: popular domains in Google Play

please dial a phone call to alex

Skype, Hangout, etc.

Intent Prediction of Mobile Apps [SLT'14c]

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT 2014.


63

Input: single-turn request

Output: apps that are able to support the required functionality

Intent Prediction – Single-Turn Request

[Figure: feature-enriched matrix for reasoning with MF. The test utterance "i would like to contact alex" is enriched with the semantic feature communication (.90); IR over app descriptions retrieves candidates (Outlook: "... your email calendar contacts ...", Gmail: "... check and send emails msgs ..."), which serve as self-train utterances; word observations (contact, message, email) and intended apps (Gmail, Outlook, Skype) fill the train/test matrix.]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
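The unsupervised retrieval step that supplies the app candidates can be sketched with a smoothed unigram language model over toy app descriptions; the app names and texts below are illustrative, not the real Google Play data.

```python
# Sketch of LM-based IR: rank apps by the log-probability of the request
# under an add-alpha smoothed unigram language model of each description.
import math
from collections import Counter

apps = {
    "Outlook": "check your email calendar contacts",
    "Gmail":   "send and read email messages",
    "Camera":  "take photos and record video",
}

def lm_score(query, doc, alpha=0.1):
    """Log-probability of the query under the doc's add-alpha unigram LM."""
    counts = Counter(doc.split())
    total = sum(counts.values())
    vocab = len(counts) + 1                  # +1 for unseen words
    return sum(math.log((counts[w] + alpha) / (total + alpha * vocab))
               for w in query.split())

query = "i would like to contact alex by email"
ranking = sorted(apps, key=lambda a: lm_score(query, apps[a]), reverse=True)
# The communication apps outrank the camera app for this request.
```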


64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

Challenge: language ambiguity; 1) user preference, 2) app-level contexts

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015. Data available at http://AppDialogue.com

send to vivian

Email, Message (Communication)

Idea: behavioral patterns in history can help intent prediction

previous turn


65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

[Figure: feature-enriched matrix for multi-turn interaction. Train dialogues pair user utterances with intended apps and behavior history ("take this photo" → CAMERA; "tell vivian this is me in the lab" → IM; "check my grades on websites" → CHROME; "send an email to professor" → EMAIL; history features: null, camera, chrome, email). The test dialogue "take a photo of this" / "send it to alice" is scored against intended apps (e.g., CAMERA .85, IM .70/.95/.80/.55) by reasoning with feature-enriched MF.]

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015. Data available at http://AppDialogue.com

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction


66

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP, %)
Word Observation: LM 25.1 (ASR), 26.1 (Transcripts)
LM: LM-based IR model (unsupervised)

Multi-Turn Interaction: Mean Average Precision (MAP, %)
Word Observation: MLR 52.1 (ASR), 55.5 (Transcripts)
MLR: Multinomial Logistic Regression (supervised)


67

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP, %)
Word Observation: LM 25.1 vs. MF-SLU 29.2 (+16.2%) on ASR; LM 26.1 vs. MF-SLU 30.4 (+16.4%) on Transcripts

Multi-Turn Interaction: Mean Average Precision (MAP, %)
Word Observation: MLR 52.1 vs. MF-SLU 52.7 (+1.2%) on ASR; MLR 55.5 vs. MF-SLU 55.4 (-0.2%) on Transcripts

Modeling hidden semantics helps intent prediction, especially for noisy data


68

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP, %)
Word Observation: LM 25.1 vs. MF-SLU 29.2 (+16.2%) on ASR; LM 26.1 vs. MF-SLU 30.4 (+16.4%) on Transcripts
Word + Embedding-Based Semantics: LM 32.0 (ASR), 33.3 (Transcripts)
Word + Type-Embedding-Based Semantics: LM 31.5 (ASR), 32.9 (Transcripts)

Multi-Turn Interaction: Mean Average Precision (MAP, %)
Word Observation: MLR 52.1 vs. MF-SLU 52.7 (+1.2%) on ASR; MLR 55.5 vs. MF-SLU 55.4 (-0.2%) on Transcripts
Word + Behavioral Patterns: MLR 53.9 (ASR), 56.6 (Transcripts)

Semantic enrichment provides rich cues to improve performance


69

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP, %)
Word Observation: LM 25.1 vs. MF-SLU 29.2 (+16.2%) on ASR; LM 26.1 vs. MF-SLU 30.4 (+16.4%) on Transcripts
Word + Embedding-Based Semantics: LM 32.0 vs. MF-SLU 34.2 (+6.8%) on ASR; LM 33.3 vs. MF-SLU 33.3 (-0.2%) on Transcripts
Word + Type-Embedding-Based Semantics: LM 31.5 vs. MF-SLU 32.2 (+2.1%) on ASR; LM 32.9 vs. MF-SLU 34.0 (+3.4%) on Transcripts

Multi-Turn Interaction: Mean Average Precision (MAP, %)
Word Observation: MLR 52.1 vs. MF-SLU 52.7 (+1.2%) on ASR; MLR 55.5 vs. MF-SLU 55.4 (-0.2%) on Transcripts
Word + Behavioral Patterns: MLR 53.9 vs. MF-SLU 55.7 (+3.3%) on ASR; MLR 56.6 vs. MF-SLU 57.7 (+1.9%) on Transcripts

Intent prediction can benefit from both hidden information and low-level semantics


70

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction

Feature-Enriched MF-SLU for Intent Prediction is able to:
1) unify the knowledge at different levels
2) learn inference relations between various features
3) and create personalized models by leveraging contextual behaviors


71

Personal Intelligent Architecture

Reactive Assistance: ASR, LU, Dialog, LG, TTS

Proactive Assistance: Inferences, User Modeling, Suggestions

Data: Back-end Data Bases, Services, and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"


72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


73

Conclusions

The work shows the feasibility and the potential for improving the generalization, maintenance, efficiency, and scalability of SDSs

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances
Better high-level intent prediction about follow-up behaviors


74

Future Work

Apply the proposed technology to domain discovery: domains not covered by the current systems but that users are interested in can guide the next domains to develop

Improve the proposed approach by handling uncertainty: recognition errors from ASR in SLU modeling, and unreliable knowledge in knowledge acquisition


75

Towards Unsupervised Deep Learning

[Figure: a convolutional architecture. The word sequence x (w1, w2, ..., wd) passes through word vectors lw, a convolution matrix Wc and convolutional layer lc, and a pooling operation into an utterance vector lf; a semantic projection matrix Ws yields the semantic layer y, and a knowledge graph propagation matrix Wp yields the propagation layer lp, producing posterior probabilities P(S1 | U), ..., P(Sn | U) and semantic relations R(U, S1), ..., R(U, Sn) over the slot candidates S1, ..., Sn.]


Treating MF as a one-layer neural net, we can add more layers to the model towards unsupervised deep learning

76

Take Home Message

Available big data w/o annotations. Challenge: how to acquire and organize important knowledge and further utilize it for applications

Language understanding for AI: language → action; understand voice to control music, lights, etc.; teach the system to let friends in by face recognition, etc.


Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A
THANKS FOR YOUR ATTENTION

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU 2013 (Best Student Paper Award).
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP 2016.


Page 10: Statistical Learning from Dialogues for Intelligent Assistants

10

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data Back-end Data

Bases Services and Client Signals

DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)

User Experienceldquocall taxirdquo

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

11

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

12

Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions

Spoken dialogue systems are being incorporated into various devices (smart-phones smart TVs in-car navigating system etc)

Good SDSs assist users to organize and access information conveniently

Spoken Dialogue System (SDS)

JARVIS ndash Iron Manrsquos Personal Assistant Baymax ndash Personal Healthcare Companion


Baymax is capable of maintaining a good spoken dialogue system and learning new knowledge for better understanding of and interaction with people.

What is Baymax's intelligence? (Big Hero 6: video content owned and licensed by Disney Entertainment, Marvel Entertainment LLC, etc.)


ASR: Automatic Speech Recognition; SLU: Spoken Language Understanding; DM: Dialogue Management; NLG: Natural Language Generation

SDS Architecture

ASR → SLU → DM → NLG (with domain knowledge)

current bottleneck


Interaction Example

User: find a cheap eating place for taiwanese food

Intelligent Agent: Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which do you want to choose? I can help you go there.

Q: How does a dialogue system process this request?


SDS Process ndash Available Domain Ontology

find a cheap eating place for taiwanese foodUser

(domain ontology graph: seeking --PREP_FOR--> target; price --AMOD--> target; food --NN--> target)

Organized Domain Knowledge

Intelligent Agent


SDS Process ndash Available Domain Ontology

User

(domain ontology graph: seeking --PREP_FOR--> target; price --AMOD--> target; food --NN--> target)

Organized Domain Knowledge

Intelligent Agent

Ontology Induction (semantic slot)

find a cheap eating place for taiwanese food


SDS Process ndash Available Domain Ontology

User

(domain ontology graph: seeking --PREP_FOR--> target; price --AMOD--> target; food --NN--> target)

Organized Domain Knowledge

Intelligent Agent

Ontology Induction (semantic slot)

Structure Learning (inter-slot relation)

find a cheap eating place for taiwanese food


SDS Process ndash Spoken Language Understanding (SLU)

User

(domain ontology graph: seeking --PREP_FOR--> target; price --AMOD--> target; food --NN--> target)

Intelligent Agent

seeking="find", target="eating place", price="cheap", food="taiwanese"

find a cheap eating place for taiwanese food


find a cheap eating place for taiwanese food

SDS Process ndash Spoken Language Understanding (SLU)

User

(domain ontology graph: seeking --PREP_FOR--> target; price --AMOD--> target; food --NN--> target)

Intelligent Agent

seeking="find", target="eating place", price="cheap", food="taiwanese"

Semantic Decoding


find a cheap eating place for taiwanese food

SDS Process ndash Dialogue Management (DM)

User

(domain ontology graph: seeking --PREP_FOR--> target; price --AMOD--> target; food --NN--> target)

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Intelligent Agent


find a cheap eating place for taiwanese food

SDS Process ndash Dialogue Management (DM)

User

(domain ontology graph: seeking --PREP_FOR--> target; price --AMOD--> target; food --NN--> target)

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Intelligent Agent

Surface Form Derivation (natural language)


SDS Process ndash Dialogue Management (DM)

User

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Din Tai Fung, Boiling Point

Predicted intent: navigation

Intelligent Agent

find a cheap eating place for taiwanese food


SDS Process ndash Dialogue Management (DM)

User

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Din Tai Fung, Boiling Point

Predicted intent: navigation

Intelligent Agent

Intent Prediction

find a cheap eating place for taiwanese food


SDS Process ndash Natural Language Generation (NLG)

User

Intelligent Agent

Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which do you want to choose? I can help you go there. (navigation)

find a cheap eating place for taiwanese food


Required Knowledge

(domain ontology graph: seeking --PREP_FOR--> target; price --AMOD--> target; food --NN--> target)

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Predicted intent: navigation

User

Required Domain-Specific Information

find a cheap eating place for taiwanese food


Challenges for SDS: An SDS in a new domain requires

1) a hand-crafted domain ontology, 2) utterances labelled with semantic representations, and 3) an SLU component for mapping utterances into semantic representations

Manual work results in high cost, long duration, and poor scalability of system development

The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, in order to handle open-domain requests

seeking="find", target="eating place", price="cheap", food="asian food"

find a cheap eating place for asian food

fully unsupervised

Prior Focus


Contributions

(domain ontology graph: seeking --PREP_FOR--> target; price --AMOD--> target; food --NN--> target)

SELECT restaurant: restaurant.price="cheap", restaurant.food="asian food"

Predicted intent: navigation

User: find a cheap eating place for taiwanese food

Ontology Induction (semantic slot)

Structure Learning (inter-slot relation)

Surface Form Derivation (natural language)

Semantic Decoding

Intent Prediction


Contributions

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

User: find a cheap eating place for taiwanese food


Ontology Induction Structure Learning Surface Form Derivation

Semantic Decoding Intent Prediction

Contributions

Knowledge Acquisition SLU Modeling

User: find a cheap eating place for taiwanese food


Knowledge Acquisition: 1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

Restaurant-asking conversations (unlabelled collection) → Organized Domain Knowledge (a graph over seeking, target, price, food, and quantity, connected by PREP_FOR, NN, and AMOD edges)

Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation


SLU Modeling: 2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

"can i have a cheap restaurant" + Organized Domain Knowledge → SLU Component → price="cheap", target="restaurant", intent=navigation

SLU Modeling: Semantic Decoding, Intent Prediction


SDS Architecture ndash Contributions

ASR → SLU → DM → NLG (with domain knowledge)

Knowledge Acquisition SLU Modeling

current bottleneck


SDS Flowchart

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling


SDS Flowchart ndash Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling


Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015

(model diagram: an unlabeled collection is frame-semantically parsed; Ontology Induction over a semantic KG builds the feature model (Fw, Fs); Knowledge Graph Propagation combines a Word Relation Model (Rw, lexical KG) and a Slot Relation Model (Rs, semantic KG) from Structure Learning; MF-SLU performs SLU modeling by matrix factorization to produce the semantic representation, e.g. "can I have a cheap restaurant" → target="restaurant", price="cheap")


Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistically semantic resource based on frame-semantics theory; words/phrases can be represented as frames. E.g., in "low fat milk", "milk" evokes the "food" frame and "low fat" fills the descriptor frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences
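The "low fat milk" example above can be mirrored in a toy sketch. Everything here (the one-entry mini-lexicon, the `frame_parse` helper, treating preceding modifiers as the descriptor element) is illustrative only and is not the SEMAFOR API:

```python
def frame_parse(phrase):
    """Toy frame-semantic analysis: map a phrase to the frame its
    target word evokes and the frame element filled by its modifiers."""
    # Hypothetical mini-lexicon in the spirit of FrameNet's "food" frame.
    lexicon = {"milk": "food"}
    words = phrase.split()
    analyses = []
    for i, w in enumerate(words):
        if w in lexicon:
            analyses.append({
                "target": w,                        # frame-evoking word
                "frame": lexicon[w],                # evoked frame
                # preceding modifiers fill the "descriptor" frame element
                "descriptor": " ".join(words[:i]) or None,
            })
    return analyses

print(frame_parse("low fat milk"))
```

A real parser would cover thousands of frames and resolve ambiguity statistically; the point is only the shape of the output.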


Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

can i have a cheap restaurant → frames (slot candidates): capability, expensiveness, locale_by_use

1st Issue: differentiate domain-specific frames from generic frames for SDSs

Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014


Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

(matrix illustration: rows are utterances, columns are word observations and slot candidates from frame-semantic parsing; train utterance 1 "i would like a cheap restaurant" marks cheap, restaurant, expensiveness, locale_by_use; train utterance 2 "find a restaurant with chinese food" marks restaurant, food; the test utterance "show me a list of cheap restaurants" receives estimated slot probabilities, e.g. 0.97 for expensiveness and 0.95 for locale_by_use)

Idea: increase weights of domain-specific slots and decrease weights of others


1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other

(illustration: the word observation / slot candidate matrix is multiplied by a word relation matrix and a slot relation matrix built from the knowledge graph of words and slots such as capability, locale_by_use, food, expensiveness, seeking, desiring, relational_quantity)

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication
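A minimal sketch of that propagation idea, assuming a simple random-walk mixing scheme rather than the talk's exact formulation (the `alpha` restart weight and the row-normalization are assumptions):

```python
def propagate(scores, adj, alpha=0.9, iters=50):
    """Random-walk style score propagation over a knowledge graph:
    each node mixes its original score with scores flowing in from
    its neighbors, so well-connected (domain-specific) nodes stay
    high while isolated (generic) nodes decay."""
    n = len(scores)
    norm = []
    for row in adj:                      # row-normalize outgoing weights
        s = sum(row)
        norm.append([w / s if s else 0.0 for w in row])
    cur = list(scores)
    for _ in range(iters):
        cur = [(1 - alpha) * scores[i]
               + alpha * sum(norm[j][i] * cur[j] for j in range(n))
               for i in range(n)]
    return cur

# Two mutually connected domain slots vs. one isolated generic slot:
print(propagate([1.0, 1.0, 1.0], [[0, 1, 0], [1, 0, 0], [0, 0, 0]]))
```

The connected pair keeps its score while the isolated node falls toward `(1 - alpha)` of its initial value, which is exactly the reweighting the slide describes.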


Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015

(model diagram: an unlabeled collection is frame-semantically parsed; Ontology Induction over a semantic KG builds the feature model (Fw, Fs); Knowledge Graph Propagation combines a Word Relation Model (Rw, lexical KG) and a Slot Relation Model (Rs, semantic KG) from Structure Learning; MF-SLU performs SLU modeling by matrix factorization to produce the semantic representation, e.g. "can I have a cheap restaurant" → target="restaurant", price="cheap")


Knowledge Graph Construction: syntactic dependency parsing on utterances

can i have a cheap restaurant (dependencies: nsubj, ccomp, dobj, det, amod; evoked frames: capability, expensiveness, locale_by_use)

Word-based lexical knowledge graph: word nodes (can, i, have, a, cheap, restaurant) linked by dependencies

Slot-based semantic knowledge graph: slot nodes (capability, locale_by_use, expensiveness)
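The construction of the two graphs can be sketched as follows; the dependency edges and the word-to-slot mapping are hand-written for the example utterance, not produced by a real parser:

```python
def build_graphs(dep_edges, word2slot):
    """Build the word-based lexical graph over surface words, and the
    slot-based semantic graph over the slots those words evoke."""
    lexical, semantic = {}, {}
    def link(graph, a, b):
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    for head, rel, dep in dep_edges:
        link(lexical, head, dep)
        # connect two slots whenever their evoking words are connected
        if head in word2slot and dep in word2slot:
            link(semantic, word2slot[head], word2slot[dep])
    return lexical, semantic

# "can i have a cheap restaurant" (relations are illustrative)
edges = [("have", "ccomp", "can"), ("have", "nsubj", "i"),
         ("have", "dobj", "restaurant"), ("restaurant", "det", "a"),
         ("restaurant", "amod", "cheap")]
slots = {"can": "capability", "cheap": "expensiveness",
         "restaurant": "locale_by_use"}
lex, sem = build_graphs(edges, slots)
```

In the full system the graphs aggregate edges over the whole unlabeled corpus rather than a single utterance.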


Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

Dependency-based word embeddings and dependency-based slot embeddings are trained from the dependency-parsed utterances (e.g. vectors for can, have, expensiveness, capability)

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014


Edge Weight Measurement: compute edge weights to represent relation importance

Slot-to-slot semantic relation: similarity between slot embeddings
Slot-to-slot dependency relation: dependency score between slot embeddings
Word-to-word semantic relation: similarity between word embeddings
Word-to-word dependency relation: dependency score between word embeddings

(graph illustration: word nodes w1–w7 and slot nodes s1–s3 connected by weighted edges)
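A sketch of how such edge weights might be computed. The `edge_weight` combination and its mixing coefficient `lam` are assumptions for illustration, not the talk's exact scoring function:

```python
import math

def cosine(a, b):
    """Semantic relation weight: cosine similarity of two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def edge_weight(emb_a, emb_b, dep_score, lam=0.5):
    """Mix the semantic relation (embedding similarity) with the
    dependency relation (a dependency score) into one edge weight."""
    return lam * cosine(emb_a, emb_b) + (1 - lam) * dep_score
```

With `lam=0.5`, two identical embeddings and a zero dependency score give an edge weight of 0.5, i.e. the semantic relation alone carries half of the maximum weight.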


Knowledge Graph Propagation Model (word relation matrix R_w^(SD), slot relation matrix R_s^(SD))

(illustration: the word observation / slot candidate training matrix, with slot induction, is multiplied by the word relation matrix and the slot relation matrix)

Structure information is integrated to make the self-training data more reliable


Semantic Decoding [ACL-IJCNLP'15]

(illustration: Ontology Induction supplies the feature matrices Fw, Fs and Structure Learning supplies the relation matrices to the SLU model; the test utterance "show me a list of cheap restaurants" carries hidden semantics that are not directly observed)

2nd Issue: unobserved semantics may benefit understanding


Reasoning with Matrix Factorization

Feature Model + Knowledge Graph Propagation Model (word relation matrix R_w^(SD), slot relation matrix R_s^(SD))

(illustration: after factorization, missing cells of the word observation / slot candidate matrix are filled with high probabilities, e.g. 0.97, 0.95, 0.93, for likely slots and low ones, e.g. 0.05, for unlikely slots)

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data


2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and words/slots respectively

The product of the two matrices fills in the probability of hidden semantics:

matrix of size |U| × (|W|+|S|) ≈ (|U| × d) × (d × (|W|+|S|))

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009
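The low-rank completion idea can be sketched with plain SGD on the observed cells; this is a generic matrix factorization toy (hyperparameters and the squared-error loss are assumptions), not the exact MF-SLU model:

```python
import random

def mf_complete(M, d=2, steps=4000, lr=0.05, reg=0.01, seed=7):
    """Complete a partially observed matrix (None = unobserved) with a
    rank-d factorization M ~ U V^T, trained by SGD on observed cells."""
    rng = random.Random(seed)
    n, m = len(M), len(M[0])
    U = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(n)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(m)]
    obs = [(i, j, M[i][j]) for i in range(n)
           for j in range(m) if M[i][j] is not None]
    for _ in range(steps):
        i, j, r = obs[rng.randrange(len(obs))]
        err = r - sum(U[i][k] * V[j][k] for k in range(d))
        for k in range(d):
            u, v = U[i][k], V[j][k]
            U[i][k] += lr * (err * v - reg * u)   # gradient step on U
            V[j][k] += lr * (err * u - reg * v)   # gradient step on V
    return [[sum(U[i][k] * V[j][k] for k in range(d))
             for j in range(m)] for i in range(n)]

# rows = utterances, columns = words + slots; None marks a hidden slot
M = [[1, 1, 1, None],   # same words as row 1, hidden slot unobserved
     [1, 1, 1, 1],      # both slots observed
     [0, 0, 0, 0]]      # unrelated utterance
done = mf_complete(M)
```

Because rows 0 and 1 share the same observed pattern, their latent vectors converge to similar points, so the factorization fills the `None` cell with a high score: exactly the "hidden semantics" completion the slide describes.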


Bayesian Personalized Ranking for MF: model implicit feedback

Do not treat unobserved facts as negative samples (true or false); give observed facts higher scores than unobserved facts

Objective: for each utterance u, maximize Σ ln σ( score(u, f⁺) − score(u, f⁻) ) over observed facts f⁺ and unobserved facts f⁻

The objective is to learn a set of well-ranked semantic slots per utterance
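A minimal BPR sketch under the same ranking objective; the sampling scheme, dimensionality, and learning rate are assumptions:

```python
import math
import random

def bpr_train(observed, n_slots, d=2, steps=4000, lr=0.1, seed=3):
    """BPR for a toy MF: unobserved facts are not negatives; instead,
    for each utterance u an observed slot f+ is pushed to score higher
    than a sampled unobserved slot f-, by stochastic ascent on
    sum ln sigmoid(score(u, f+) - score(u, f-))."""
    rng = random.Random(seed)
    utts = sorted(observed)
    U = {u: [rng.gauss(0.0, 0.1) for _ in range(d)] for u in utts}
    V = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(n_slots)]
    for _ in range(steps):
        u = utts[rng.randrange(len(utts))]
        pos = rng.choice(sorted(observed[u]))     # observed fact f+
        neg = rng.randrange(n_slots)              # sampled candidate f-
        if neg in observed[u]:
            continue
        x = sum(U[u][k] * (V[pos][k] - V[neg][k]) for k in range(d))
        g = 1.0 / (1.0 + math.exp(x))             # sigma(-x), the gradient factor
        for k in range(d):
            uk = U[u][k]
            U[u][k] += lr * g * (V[pos][k] - V[neg][k])
            V[pos][k] += lr * g * uk
            V[neg][k] -= lr * g * uk
    return lambda u, s: sum(U[u][k] * V[s][k] for k in range(d))

score = bpr_train({0: {0}, 1: {1}}, n_slots=3)
```

After training, each utterance ranks its observed slot above the never-observed slot 2, without ever having labeled slot 2 as false.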


Matrix Factorization SLU (MF-SLU)

(illustration: Ontology Induction and Structure Learning feed the feature matrices Fw, Fs and relation matrices into the factorization; for the test utterance "show me a list of cheap restaurants", slot probabilities such as 0.97 and 0.95 are estimated)

MF-SLU can estimate probabilities for slot candidates given test utterances


Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015

(model diagram: frame-semantic parsing, Ontology Induction (Fw, Fs), and Knowledge Graph Propagation (Rw, Rs from the lexical and semantic KGs) feed MF-SLU, which produces the semantic representation, e.g. "can I have a cheap restaurant" → target="restaurant", price="cheap")

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)


Experimental Setup. Dataset: Cambridge University SLU Corpus

Restaurant recommendation domain (WER = 37%); 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, pricerange, task, type

Metric: MAP of all estimated slot probabilities over all utterances (using the mapping table between induced and reference slots)

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012
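The MAP metric itself is standard and can be computed as follows (the slot names in the example are illustrative):

```python
def average_precision(ranked, relevant):
    """AP of one ranked list: mean of precision@k at each relevant hit."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(queries):
    """MAP: average of per-utterance APs, where each query is a pair
    (ranked slot list, set of reference slots)."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)

# one utterance whose ranked slots are scored against reference slots
ap = average_precision(["pricerange", "food", "area"], {"pricerange", "area"})
```

Here the relevant slots sit at ranks 1 and 3, so AP = (1/1 + 2/3) / 2 = 5/6.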


Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

Approach                                      | ASR  | Transcripts
Baseline SLU: Support Vector Machine          | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8


Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

Approach                                                     | ASR            | Transcripts
Baseline SLU: Support Vector Machine                         | 32.5           | 36.6
Baseline SLU: Multinomial Logistic Regression                | 34.0           | 38.8
Proposed MF-SLU: Feature Model                               | 37.6           | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%)  | 53.4 (+37.6%)

The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results

* significantly better than the MLR baseline with p < 0.05 in a t-test


Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

Approach                                        | ASR           | Transcripts
Feature Model                                   | 37.6          | 45.3
Feature + Knowledge Graph Propagation: Semantic | 41.4          | 51.6
Feature + Knowledge Graph Propagation: Dependency | 41.6        | 49.0
Feature + Knowledge Graph Propagation: All      | 43.5 (+15.7%) | 53.4 (+17.9%)

In the integrated structure information, both semantic and dependency relations are useful for understanding

* significantly better than the MLR baseline with p < 0.05 in a t-test


Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs

(reference ontology annotated with the most frequent syntactic dependencies: induced slots seeking, locale_by_use, food, expensiveness, desiring, relational_quantity linked by PREP_FOR, NN, AMOD, DOBJ; reference slots type, food, pricerange, area, task linked by DOBJ, AMOD, PREP_IN)

The automatically learned domain ontology aligns well with the reference one

The data-driven ontology is more objective, while the expert-annotated one is more subjective


Contributions of Semantic Decoding

Ontology Induction → Structure Learning → Semantic Decoding → Intent Prediction (Knowledge Acquisition / SLU Modeling)

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding


Low- and High-Level Understanding: semantic concepts for individual utterances do not consider high-level semantics (user intents)

The follow-up behaviors usually correspond to user intents

"can i have a cheap restaurant" → SLU Model → price="cheap", target="restaurant", intent=navigation

"i plan to dine in legume tonight" → SLU Model → restaurant="legume", time="tonight", intent=reservation


Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart ndash Intent Prediction


Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


Intent Prediction of Mobile Apps [SLT'14c] [Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances for making requests about launching an app

Output: the apps supporting the required functionality

Intent identification over popular domains in Google Play

"please dial a phone call to alex" → Skype, Hangout, etc.

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014


Intent Prediction – Single-Turn Request

Input: single-turn request

Output: apps that are able to support the required functionality

(illustration: for the test utterance "i would like to contact alex", IR retrieves app candidates from app descriptions (e.g. Outlook: "... your email calendar contacts ..."; Gmail: "... check and send emails msgs ..."); feature enrichment adds semantics such as "communication"; reasoning with feature-enriched MF then scores intended apps such as Gmail, Outlook, and Skype)

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents


Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

Challenge: language ambiguity. 1) User preference, 2) App-level contexts

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

"send to vivian" (given the previous turn) → Email? Message? Communication?

Idea: behavioral patterns in history can help intent prediction


Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

(illustration: training dialogues pair user utterances with intended apps, e.g. "take this photo" → CAMERA, "tell vivian this is me in the lab" → IM, "check my grades on websites" → CHROME, "send an email to professor" → EMAIL; lexical features and behavior history are combined, and reasoning with feature-enriched MF scores intended apps for the test dialogue "take a photo of this" / "send it to alice" → CAMERA, IM)

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction


Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix   | ASR (LM) | ASR (MF-SLU) | Transcripts (LM) | Transcripts (MF-SLU)
Word Observation | 25.1     |              | 26.1             |

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix   | ASR (MLR) | ASR (MF-SLU) | Transcripts (MLR) | Transcripts (MF-SLU)
Word Observation | 52.1      |              | 55.5              |

LM: language-model-based IR (unsupervised); MLR: multinomial logistic regression (supervised)


Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix   | ASR (LM) | ASR (MF-SLU)  | Transcripts (LM) | Transcripts (MF-SLU)
Word Observation | 25.1     | 29.2 (+16.2%) | 26.1             | 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix   | ASR (MLR) | ASR (MF-SLU) | Transcripts (MLR) | Transcripts (MF-SLU)
Word Observation | 52.1      | 52.7 (+1.2%) | 55.5              | 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data


Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                        | ASR (LM) | ASR (MF-SLU)  | Transcripts (LM) | Transcripts (MF-SLU)
Word Observation                      | 25.1     | 29.2 (+16.2%) | 26.1             | 30.4 (+16.4%)
Word + Embedding-Based Semantics      | 32.0     |               | 33.3             |
Word + Type-Embedding-Based Semantics | 31.5     |               | 32.9             |

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix             | ASR (MLR) | ASR (MF-SLU) | Transcripts (MLR) | Transcripts (MF-SLU)
Word Observation           | 52.1      | 52.7 (+1.2%) | 55.5              | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9      |              | 56.6              |

Semantic enrichment provides rich cues to improve performance


Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                        | ASR (LM) | ASR (MF-SLU)  | Transcripts (LM) | Transcripts (MF-SLU)
Word Observation                      | 25.1     | 29.2 (+16.2%) | 26.1             | 30.4 (+16.4%)
Word + Embedding-Based Semantics      | 32.0     | 34.2 (+6.8%)  | 33.3             | 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics | 31.5     | 32.2 (+2.1%)  | 32.9             | 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix             | ASR (MLR) | ASR (MF-SLU) | Transcripts (MLR) | Transcripts (MF-SLU)
Word Observation           | 52.1      | 52.7 (+1.2%) | 55.5              | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9      | 55.7 (+3.3%) | 56.6              | 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics


Contributions of Intent Prediction

Ontology Induction → Structure Learning → Semantic Decoding → Intent Prediction (Knowledge Acquisition / SLU Modeling)

Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors


Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data Bases: Back-end Data, Services and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"


Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


Conclusions: This work shows the feasibility of, and the potential for, improving the generalization, maintenance, efficiency, and scalability of SDSs

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances; better high-level intent prediction about follow-up behaviors


Future Work: Apply the proposed technology to domain discovery

Discover domains not covered by current systems but that users are interested in, to guide which domains to develop next

Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition, both of which affect SLU modeling


Towards Unsupervised Deep Learning

(model illustration: a word sequence x = w1, w2, ..., wd passes through a convolutional layer lc (convolution matrix Wc) and a pooling operation into an utterance vector lf; a knowledge graph propagation layer lp (matrix Wp) and a semantic projection matrix Ws yield the semantic layer y, giving posterior probabilities P(S1 | U), ..., P(Sn | U) and semantic relations R(U, S1), ..., R(U, Sn) over slot candidates)

Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning


Take Home Message: Big data is available without annotations

Challenge: how to acquire and organize important knowledge, and further utilize it for applications

Language understanding for AI: language → action. Understand voice commands to control music, lights, etc.; teach the system to let friends in by face recognition, etc.


Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field


Q & A. Thanks for your attention!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013 (Best Student Paper Award)

• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016

11

Outline Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

12

Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions

Spoken dialogue systems are being incorporated into various devices (smartphones, smart TVs, in-car navigation systems, etc.)

Good SDSs assist users to organize and access information conveniently

Spoken Dialogue System (SDS)

JARVIS – Iron Man's Personal Assistant; Baymax – Personal Healthcare Companion

13

Baymax is capable of maintaining a good spoken dialogue system and learning new knowledge for better understanding and interacting with people

What is Baymax's intelligence? (Big Hero 6 — video content owned and licensed by Disney Entertainment, Marvel Entertainment LLC, etc.)

14

ASR: Automatic Speech Recognition; SLU: Spoken Language Understanding; DM: Dialogue Management; NLG: Natural Language Generation

SDS Architecture

ASR → SLU → DM → NLG (with Domain knowledge)

current bottleneck

15

Interaction Example

User

Intelligent Agent. Q: How does a dialogue system process this request?

Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which one do you want to choose? I can help you go there.

find a cheap eating place for taiwanese food

16

SDS Process ndash Available Domain Ontology

find a cheap eating place for taiwanese food

User

(Ontology graph: seeking —PREP_FOR→ target; price —AMOD→ target; food —NN→ target)

Organized Domain Knowledge

Intelligent Agent

17

SDS Process ndash Available Domain Ontology

User

target

foodprice AMODNN

seeking PREP_FOR

Organized Domain Knowledge

Intelligent Agent

Ontology Induction (semantic slot)

find a cheap eating place for taiwanese food

18

SDS Process ndash Available Domain Ontology

User

target

foodprice AMODNN

seeking PREP_FOR

Organized Domain Knowledge

Intelligent Agent

Ontology Induction (semantic slot)

Structure Learning (inter-slot relation)

find a cheap eating place for taiwanese food

19

SDS Process ndash Spoken Language Understanding (SLU)

User

target

foodprice AMODNN

seeking PREP_FOR

Intelligent Agent

seeking="find", target="eating place", price="cheap", food="taiwanese"

find a cheap eating place for taiwanese food

20

find a cheap eating place for taiwanese food

SDS Process ndash Spoken Language Understanding (SLU)

User

target

foodprice AMODNN

seeking PREP_FOR

Intelligent Agent

seeking="find", target="eating place", price="cheap", food="taiwanese"

Semantic Decoding

21

find a cheap eating place for taiwanese food

SDS Process ndash Dialogue Management (DM)

User

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Intelligent Agent

22

find a cheap eating place for taiwanese food

SDS Process ndash Dialogue Management (DM)

User

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Intelligent Agent

Surface Form Derivation (natural language)

23

SDS Process ndash Dialogue Management (DM)

User

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Din Tai Fung, Boiling Point

Predicted intent: navigation

Intelligent Agent

find a cheap eating place for taiwanese food

24

SDS Process ndash Dialogue Management (DM)

User

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Din Tai Fung, Boiling Point

Predicted intent: navigation

Intelligent Agent

Intent Prediction

find a cheap eating place for taiwanese food

25

SDS Process ndash Natural Language Generation (NLG)

User

Intelligent Agent

Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which one do you want to choose? I can help you go there. (navigation)

find a cheap eating place for taiwanese food

26

Required Knowledge

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Predicted intent: navigation

User

Required Domain-Specific Information

find a cheap eating place for taiwanese food

27

Challenges for SDS: An SDS in a new domain requires:

1) a hand-crafted domain ontology
2) utterances labelled with semantic representations
3) an SLU component for mapping utterances into semantic representations

Manual work results in high cost, long duration, and poor scalability of system development.

The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, in order to handle open-domain requests

seeking="find", target="eating place", price="cheap", food="asian food"

find a cheap eating place for asian food

fully unsupervised

Prior Focus

28

Contributions

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant: restaurant.price="cheap", restaurant.food="asian food"

Predicted intent: navigation

find a cheap eating place for taiwanese food

User

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

(natural language)

(inter-slot relation)

(semantic slot)

29

Contributions

User

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

find a cheap eating place for taiwanese food

30

Ontology Induction Structure Learning Surface Form Derivation

Semantic Decoding Intent Prediction

Contributions

User

Knowledge Acquisition SLU Modeling

find a cheap eating place for taiwanese food

31

Knowledge Acquisition: 1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

Restaurant Asking

Conversations

target

foodprice

seeking

quantity

PREP_FOR

PREP_FOR

NN AMOD

AMODAMOD

Organized Domain Knowledge

Unlabelled Collection

Knowledge Acquisition

Knowledge Acquisition Ontology Induction Structure Learning Surface Form Derivation

32

SLU Modeling: 2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

Organized Domain

Knowledge

price="cheap", target="restaurant", intent=navigation

SLU Modeling

SLU Component

"can i have a cheap restaurant"

SLU Modeling Semantic Decoding Intent Prediction

33

SDS Architecture ndash Contributions

DomainDMASR SLU

NLG

Knowledge Acquisition SLU Modeling

current bottleneck

34

SDS Flowchart

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

35

SDS Flowchart ndash Semantic Decoding

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

36

Outline Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

37

Semantic Decoding [ACL-IJCNLP'15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant"

Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology Induction: F_w, F_s

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation

38

[Baker et al., 1998; Das et al., 2014] Frame-Semantic Parsing

FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory; words/phrases can be represented as frames. E.g., in "low fat milk", "milk" evokes the "food" frame, and "low fat" fills the descriptor frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences

39

Ontology Induction [ASRU'13, SLT'14a]

can i have a cheap restaurant

Frame capability

Frame expensiveness

Frame locale by use

1st Issue: differentiate domain-specific frames from generic frames for SDSs

Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014

slot candidate

Best Student Paper Award

40

(Matrix illustration — rows: train utterances "i would like a cheap restaurant", "find a restaurant with chinese food" and test utterance "show me a list of cheap restaurants"; columns: word observations (cheap, restaurant, food) and slot candidates from frame-semantic parsing (expensiveness, locale_by_use, food); 1 marks observed entries, while values such as .97 and .95 are estimated slot probabilities.)

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

Idea: increase weights of domain-specific slots and decrease weights of others

41

1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other

Word Relation Model; Slot Relation Model

(Matrix illustration: the word relation matrix and the slot relation matrix are multiplied with the train/test word-observation and slot-candidate matrix for slot induction.)

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication

(Knowledge graph illustration: slot nodes — capability, locale_by_use, food, expensiveness, seeking, relational_quantity, desiring — connected over the train utterances "i would like a cheap restaurant", "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants".)
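The propagation idea above can be sketched as a single matrix multiplication: a row-normalized relation matrix spreads scores toward well-connected (domain-specific) nodes. The toy slot graph and its weights below are illustrative, not values from the paper.

```python
import numpy as np

# Toy slot graph: rows/cols = slot candidates; entry = relation strength.
slots = ["locale_by_use", "expensiveness", "food", "capability"]
R = np.array([
    [0.0, 0.8, 0.9, 0.1],   # locale_by_use: tightly linked to domain slots
    [0.8, 0.0, 0.7, 0.1],
    [0.9, 0.7, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],   # capability (generic): weakly connected
])

# Row-normalize so each node distributes its score among its neighbors.
R_norm = R / R.sum(axis=1, keepdims=True)

# Initial scores from frame-semantic parsing (uniform here for simplicity).
scores = np.ones(len(slots)) / len(slots)

# One propagation step: well-connected domain-specific nodes gain weight.
propagated = scores @ R_norm
for name, s in sorted(zip(slots, propagated), key=lambda x: -x[1]):
    print(f"{name}: {s:.3f}")
```

After one step the generic "capability" slot ends up with the lowest score, matching the slide's intuition that generic frames receive less support from the domain graph.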

42

Semantic Decoding [ACL-IJCNLP'15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant"

Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology Induction: F_w, F_s

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation

43

Knowledge Graph Construction Syntactic dependency parsing on utterances

can i have a cheap restaurant — dependencies: ccomp, nsubj, dobj, det, amod; evoked frames: capability, expensiveness, locale_by_use

Word-based lexical knowledge graph: word nodes (can, i, have, a, cheap, restaurant) with word–word edges

Slot-based semantic knowledge graph: slot nodes (capability, locale_by_use, expensiveness) with slot–slot edges
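A minimal sketch of building the word-based lexical knowledge graph from dependency triples. The triples are hard-coded from the slide's example parse; a real system would obtain them from a syntactic dependency parser.

```python
from collections import defaultdict

# Dependency triples for "can i have a cheap restaurant"
# (hard-coded here; normally produced by a dependency parser).
deps = [
    ("have", "ccomp", "can"),
    ("have", "nsubj", "i"),
    ("have", "dobj", "restaurant"),
    ("restaurant", "amod", "cheap"),
    ("restaurant", "det", "a"),
]

# Word-based lexical knowledge graph: undirected, relation-typed edges.
graph = defaultdict(set)
for head, rel, dep in deps:
    graph[head].add((dep, rel))
    graph[dep].add((head, rel))

print(sorted(graph["restaurant"]))
```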

44

Dependency-based word embeddings

Dependency-based slot embeddings

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

(Illustration: dependency-based word embeddings for "can", "have", … and dependency-based slot embeddings for "expensiveness", "capability", … trained from the dependency contexts of "can i have a cheap restaurant".)

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014

45

Edge Weight Measurement: compute edge weights to represent relation importance

Slot-to-slot semantic relation: similarity between slot embeddings
Slot-to-slot dependency relation: dependency score between slot embeddings
Word-to-word semantic relation: similarity between word embeddings
Word-to-word dependency relation: dependency score between word embeddings

(Graph illustration: word nodes w1–w7 and slot nodes s1–s3, with edge weights combining the semantic and dependency relations above.)
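The semantic edge weight is just cosine similarity between the two nodes' embeddings. The embeddings below are random stand-ins to show the computation; real ones would be trained as in Levy and Goldberg (2014).

```python
import numpy as np

# Random stand-in embeddings (real ones: dependency-based, pre-trained).
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=8) for w in ["cheap", "expensive", "restaurant"]}

def cosine(u, v):
    """Cosine similarity — the semantic edge weight between two nodes."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

w = cosine(emb["cheap"], emb["expensive"])
print(f"semantic edge weight cheap-expensive: {w:.3f}")
```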

46

Word Relation Model Slot Relation Model

(Matrix illustration: the word relation matrix and the slot relation matrix are multiplied with the train/test word-observation and slot-candidate matrix for slot induction.)

Knowledge Graph Propagation Model: R_w^SD, R_s^SD

Structure information is integrated to make the self-training data more reliable

47

Ontology Induction

SLU: F_w, F_s

Structure Learning

(Matrix illustration: ontology induction over the train utterances "i would like a cheap restaurant", "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants"; induced slot probabilities such as .97/.90 and .95/.85 fill the test row — hidden semantics.)

2nd Issue: unobserved semantics may benefit understanding

Semantic Decoding [ACL-IJCNLPrsquo15]

48

Reasoning with Matrix Factorization

(Matrix illustration: the feature model — word observations and slot candidates over train/test utterances — combined with the knowledge graph propagation model R_w^SD, R_s^SD; MF fills unobserved entries with estimated probabilities.)

Slot Induction

Feature Model + Knowledge Graph Propagation Model

Idea: MF completes a partially-observed matrix under a low-rank latent-semantics assumption, which is able to model hidden semantics and is more robust to noisy data

49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al 2009)

The decomposed matrices represent latent semantics for utterances and wordsslots respectively

The product of two matrices fills the probability of hidden semantics

(Matrix illustration: the |U| × (|W|+|S|) observation matrix over train/test utterances, with unobserved test entries filled by estimated probabilities.)

M (|U| × (|W|+|S|)) ≈ U (|U| × d) × V (d × (|W|+|S|))

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009
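A minimal numpy sketch of this low-rank completion idea, using a truncated SVD on a toy utterance × (word+slot) matrix (not the paper's data): the rank-d reconstruction assigns nonzero scores to slots that were unobserved in the test utterance.

```python
import numpy as np

# Toy utterance x (word + slot) matrix: 1 = observed, 0 = unobserved.
# Columns: cheap, restaurant, food | expensiveness, locale_by_use, food-slot
M = np.array([
    [1, 1, 0, 1, 1, 0],   # train: "i would like a cheap restaurant"
    [0, 1, 1, 0, 1, 1],   # train: "find a restaurant with chinese food"
    [1, 1, 0, 0, 0, 0],   # test:  "show me a list of cheap restaurants"
], dtype=float)

# Rank-d truncated SVD: M ~ U_d Sigma_d V_d^T fills in hidden semantics.
d = 2
U, S, Vt = np.linalg.svd(M, full_matrices=False)
M_hat = U[:, :d] @ np.diag(S[:d]) @ Vt[:d, :]

# Unobserved slot entries of the test utterance now get nonzero scores.
print(np.round(M_hat[2], 2))
```

The test row resembles the first train utterance, so its unobserved "expensiveness" and "locale_by_use" columns receive clearly positive scores after completion.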

50

Bayesian Personalized Ranking for MF: model implicit feedback

do not treat unobserved facts as negative samples (true or false); give observed facts higher scores than unobserved facts

Objective: maximize Σ over utterances u_x and pairs (f⁺, f⁻) of ln σ( p(f⁺ | u_x) − p(f⁻ | u_x) ), so each observed fact f⁺ outranks each unobserved fact f⁻.

The objective is to learn a set of well-ranked semantic slots per utterance u_x.
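The pairwise objective above can be sketched as a loss function: the negative log-likelihood of ranking an observed slot above an unobserved one. Toy scores only; in MF-SLU these scores come from the factorized matrices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# BPR treats unobserved slots as *less preferred*, not as false:
# an observed slot f+ should merely outrank an unobserved slot f-.
def bpr_loss(score_pos, score_neg):
    """Negative log-likelihood of ranking f+ above f- (to be minimized)."""
    return -np.log(sigmoid(score_pos - score_neg))

print(bpr_loss(2.0, -1.0))  # observed ranked above unobserved -> small loss
print(bpr_loss(-1.0, 2.0))  # mis-ranked pair -> large loss
```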

51

Ontology Induction

SLU: F_w, F_s

Structure Learning

(Matrix illustration: ontology induction over the train utterances "i would like a cheap restaurant", "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants"; MF fills slot probabilities such as .97/.90 and .95/.85 for the test row.)

Matrix Factorization SLU (MF-SLU)

MF-SLU can estimate probabilities for slot candidates given test utterances

52

Semantic Decoding [ACL-IJCNLP'15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant"

Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology Induction: F_w, F_s

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)

53

Experimental Setup — Dataset: Cambridge University SLU Corpus

Restaurant recommendation (WER = 37%); 2166 dialogues, 15453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type

Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012
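The MAP metric can be sketched as the mean of per-utterance average precision over the ranked slot list. Toy ranking and reference set below, not corpus values.

```python
def average_precision(ranked, relevant):
    """AP of one ranked slot list against the reference slot set."""
    hits, score = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            score += hits / k   # precision at each relevant rank
    return score / max(len(relevant), 1)

# MAP = mean AP over all utterances (single toy utterance shown here).
ranking = ["food", "pricerange", "area"]   # slots ranked by probability
reference = {"food", "area"}               # gold slots for the utterance
print(round(average_precision(ranking, reference), 3))
```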

54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8

55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

The MF-SLU effectively models implicit information to decode semantics

The structure information further improves the results

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
Proposed MF-SLU: Feature Model | 37.6 | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)

the result is significantly better than the MLR with p < 0.05 in t-test

56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

In the integrated structure information, both semantic and dependency relations are useful for understanding

Approach | ASR | Transcripts
Feature Model | 37.6 | 45.3
Feature + Knowledge Graph Propagation: Semantic | 41.4 | 51.6
Feature + Knowledge Graph Propagation: Dependency | 41.6 | 49.0
Feature + Knowledge Graph Propagation: All | 43.5 (+15.7%) | 53.4 (+17.9%)

the result is significantly better than the MLR with p < 0.05 in t-test

Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs

The reference ontology with the most frequent syntactic dependencies

(Learned ontology: locale_by_use connected to food, expensiveness, seeking, relational_quantity, and desiring via PREP_FOR, NN, AMOD, and DOBJ relations. Reference ontology: type connected to food, pricerange, area, and task via DOBJ, AMOD, and PREP_IN dependencies.)

The automatically learned domain ontology aligns well with the reference one

57

The data-driven ontology is more objective, while the expert-annotated one is more subjective

58

Contributions of Semantic Decoding

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, 3) and then allow systems to model implicit semantics for better understanding.

59

Low- and High-Level Understanding: Semantic concepts for individual utterances do not consider high-level semantics (user intents)

The follow-up behaviors usually correspond to user intents

price="cheap", target="restaurant"

SLU Model

"can i have a cheap restaurant"

intent=navigation

restaurant="legume", time="tonight"

SLU Model

"i plan to dine in legume tonight"

intent=reservation

60

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart ndash Intent Prediction

61

Outline Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input spoken utterances for making requests about launching an app

Output the apps supporting the required functionality

Intent Identification: popular domains in Google Play

please dial a phone call to alex

Skype Hangout etc

Intent Prediction of Mobile Apps [SLT'14c]

Chen and Rudnicky Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings in Proc of SLT 2014

63

Input single-turn request

Output apps that are able to support the required functionality

Intent Prediction – Single-Turn Request

(Matrix illustration: the test utterance "i would like to contact alex" over word observations (contact, message, email) and intended apps (Gmail, Outlook, Skype). App descriptions — e.g., Outlook's "your email calendar contacts" and Gmail's "check and send emails msgs" — are retrieved as self-train utterances (IR for app candidates), and feature enrichment adds the semantic class "communication" with weight .90; other entries show estimated probabilities such as .85, .97, .95. Reasoning with feature-enriched MF.)

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
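The "IR for app candidates" step can be sketched as scoring each app description by word overlap with the request and taking the top-ranked apps as self-training candidates. The descriptions below are toy stand-ins; the real system retrieves from Google Play app descriptions.

```python
# Toy app descriptions (stand-ins for retrieved Google Play descriptions).
descriptions = {
    "Gmail":   "check and send emails and messages",
    "Outlook": "your email calendar contacts",
    "Camera":  "take photos and record video",
}
utterance = "i would like to contact alex by email"

def overlap(utt, desc):
    """Naive IR score: number of shared word types."""
    return len(set(utt.split()) & set(desc.split()))

# Rank apps by overlap with the spoken request.
ranked = sorted(descriptions, key=lambda a: -overlap(utterance, descriptions[a]))
print(ranked[0])
```

A real system would use an LM-based retrieval model rather than raw word overlap, but the pipeline shape — request in, ranked app candidates out — is the same.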

64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input multi-turn interaction

Output apps the user plans to launch

Challenge: language ambiguity — 1) user preference, 2) app-level contexts

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

send to vivianvs

Email / Message (Communication)

Idea: behavioral patterns in history can help intent prediction

previous turn

65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input multi-turn interaction

Output apps the user plans to launch

(Matrix illustration: train dialogues pair user utterances with intended apps — "take this photo" → CAMERA, "tell vivian this is me in the lab" → IM, "check my grades on websites" → CHROME, "send an email to professor" → EMAIL — over lexical features (photo, check, camera, tell, send, …) and behavior-history features (null, camera, chrome, email). Estimated probabilities such as .85, .70, .95, .80, .55 fill the test dialogue "take a photo of this", "send it to alice". Reasoning with feature-enriched MF.)

Reasoning with Feature-Enriched MF

Test Dialogue

take a photo of thissend it to alicehellip

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction

66

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / – | 26.1 / –

Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / – | 55.5 / –

LM-Based IR Model (unsupervised)

Multinomial Logistic Regression (supervised)

Experiments for Intent Prediction

67

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)

Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction especially for noisy data

Experiments for Intent Prediction

68

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 / – | 33.3 / –
Word + Type-Embedding-Based Semantics | 31.5 / – | 32.9 / –

Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 / – | 56.6 / –

Semantic enrichment provides rich cues to improve performance

Experiments for Intent Prediction

69

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 / 34.2 (+6.8%) | 33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics | 31.5 / 32.2 (+2.1%) | 32.9 / 34.0 (+3.4%)

Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 / 55.7 (+3.3%) | 56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics

Experiments for Intent Prediction
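The MAP metric used in these tables can be reproduced in a few lines; a minimal sketch with hypothetical rankings and gold labels (the app names are illustrative, not from the evaluation data):

```python
def average_precision(ranked, relevant):
    """AP for one utterance: mean precision@k over the relevant hits."""
    hits, precisions = 0, []
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, gold):
    """MAP: mean of per-utterance average precisions."""
    return sum(average_precision(r, g) for r, g in zip(rankings, gold)) / len(rankings)

# hypothetical ranked app lists and gold intended apps
rankings = [["skype", "gmail", "camera"], ["camera", "im"]]
gold = [{"gmail"}, {"camera", "im"}]
print(mean_average_precision(rankings, gold))  # 0.75
```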


70

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction

Feature-Enriched MF-SLU for Intent Prediction is able to:
1) unify the knowledge at different levels
2) learn inference relations between various features
3) create personalized models by leveraging contextual behaviors


71

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data: Back-end Data Bases, Services, and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"


72

Outline

 Intelligent Assistant: What are they? Why do we need them? Why do companies care?

 Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

 Semantic Decoding

 Intent Prediction

 Conclusions & Future Work


73

Conclusions: The work shows the feasibility of, and the potential for, improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances; better high-level intent prediction of follow-up behaviors.


74

Future Work

 Apply the proposed technology to domain discovery: find domains not covered by the current systems but of interest to users, to guide which domains to develop next.

 Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition.


75

Towards Unsupervised Deep Learning

[Architecture diagram: word sequence x (w1, w2, …, wd) feeds a convolutional layer lc (convolution matrix Wc) and a pooling operation to form an utterance vector lf; slot vectors lf for slot candidates S1, …, Sn pass through a knowledge graph propagation layer lp (matrix Wp) and a semantic projection matrix Ws into the semantic layer y, yielding semantic relations R(U, S1), …, R(U, Sn) and posterior probabilities P(S1 | U), …, P(Sn | U).]


Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning
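The one-layer view can be made concrete; a minimal numpy sketch with toy dimensions (all sizes and weights are assumptions) contrasting the single linear MF layer with a slightly deeper variant:

```python
import numpy as np

rng = np.random.default_rng(0)
n_utt, n_feat, d = 4, 6, 3           # toy sizes: utterances, words+slots, latent dim

# MF as a one-layer linear net: the score matrix is a single linear map U @ V
U = rng.normal(size=(n_utt, d))      # latent utterance vectors
V = rng.normal(size=(d, n_feat))     # latent word/slot vectors
shallow = U @ V

# adding a layer: insert a hidden nonlinearity between the two factors
W1 = rng.normal(size=(d, d))
deep = np.tanh(U @ W1) @ V

print(shallow.shape, deep.shape)     # both (4, 6)
```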

76

Take Home Message

 Available big data w/o annotations. Challenge: how to acquire and organize important knowledge, and further utilize it for applications.

 Language understanding for AI: language → action, e.g., understand voice commands to control music, lights, etc.; teach the assistant to let friends in by face recognition, etc.


Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A: THANKS FOR YOUR ATTENTION

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.


  • Statistical Learning from Dialogues for Intelligent Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants
  • Why do we need them
  • Why do we need them (2)
  • Why do companies care
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymax's intelligence
  • SDS Architecture
  • Interaction Example
  • SDS Process – Available Domain Ontology
  • SDS Process – Available Domain Ontology (2)
  • SDS Process – Available Domain Ontology (3)
  • SDS Process – Spoken Language Understanding (SLU)
  • SDS Process – Spoken Language Understanding (SLU) (2)
  • SDS Process – Dialogue Management (DM)
  • SDS Process – Dialogue Management (DM) (2)
  • SDS Process – Dialogue Management (DM) (3)
  • SDS Process – Dialogue Management (DM) (4)
  • SDS Process – Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture – Contributions
  • SDS Flowchart
  • SDS Flowchart – Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLP'15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRU'13, SLT'14a]
  • Ontology Induction [ASRU'13, SLT'14a] (2)
  • 1st Issue: How to adapt generic slots to a domain-specific setting
  • Semantic Decoding [ACL-IJCNLP'15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLP'15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue: How to model the unobserved hidden semantics (Matrix Factorization)
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLP'15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding: Quality of Semantics Estimation
  • Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
  • Experiments of Semantic Decoding: Effectiveness of Relations
  • Experiments for Structure Learning: Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart – Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLT'14c]
  • Intent Prediction – Single-Turn Request
  • Intent Prediction – Multi-Turn Interaction [ICMI'15]
  • Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q & A
Page 12: Statistical Learning from Dialogues for Intelligent Assistants

12

Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions

Spoken dialogue systems are being incorporated into various devices (smartphones, smart TVs, in-car navigation systems, etc.)

Good SDSs assist users to organize and access information conveniently

Spoken Dialogue System (SDS)

JARVIS – Iron Man's Personal Assistant; Baymax – Personal Healthcare Companion


13

Baymax is capable of maintaining a good spoken dialogue system and learning new knowledge for better understanding and interacting with people.

What is Baymax's intelligence? (Big Hero 6: video content owned and licensed by Disney Entertainment, Marvel Entertainment LLC, etc.)



14

ASR: Automatic Speech Recognition; SLU: Spoken Language Understanding; DM: Dialogue Management; NLG: Natural Language Generation

SDS Architecture

ASR → SLU → DM → NLG, grounded in Domain knowledge

current bottleneck: SLU


15

Interaction Example

User: find a cheap eating place for taiwanese food

Intelligent Agent: Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there.

Q: How does a dialogue system process this request?


16

SDS Process ndash Available Domain Ontology

User: find a cheap eating place for taiwanese food

target

foodprice AMODNN

seeking PREP_FOR

Organized Domain Knowledge

Intelligent Agent


17

SDS Process ndash Available Domain Ontology

User

target

foodprice AMODNN

seeking PREP_FOR

Organized Domain Knowledge

Intelligent Agent

Ontology Induction(semantic slot)

find a cheap eating place for taiwanese food


18

SDS Process ndash Available Domain Ontology

User

target

foodprice AMODNN

seeking PREP_FOR

Organized Domain Knowledge

Intelligent Agent

Ontology Induction(semantic slot)

Structure Learning(inter-slot relation)

find a cheap eating place for taiwanese food


19

SDS Process ndash Spoken Language Understanding (SLU)

User

target

foodprice AMODNN

seeking PREP_FOR

Intelligent Agent

seeking="find", target="eating place", price="cheap", food="taiwanese"

find a cheap eating place for taiwanese food


20

find a cheap eating place for taiwanese food

SDS Process ndash Spoken Language Understanding (SLU)

User

target

foodprice AMODNN

seeking PREP_FOR

Intelligent Agent

seeking="find", target="eating place", price="cheap", food="taiwanese"

Semantic Decoding


21

find a cheap eating place for taiwanese food

SDS Process ndash Dialogue Management (DM)

User

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant {restaurant.price="cheap", restaurant.food="taiwanese"}

Intelligent Agent


22

find a cheap eating place for taiwanese food

SDS Process ndash Dialogue Management (DM)

User

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant {restaurant.price="cheap", restaurant.food="taiwanese"}

Intelligent Agent

Surface Form Derivation(natural language)


23

SDS Process ndash Dialogue Management (DM)

User

SELECT restaurant {restaurant.price="cheap", restaurant.food="taiwanese"}

Din Tai FungBoiling Point

Predicted intent navigation

Intelligent Agent

find a cheap eating place for taiwanese food


24

SDS Process ndash Dialogue Management (DM)

User

SELECT restaurant {restaurant.price="cheap", restaurant.food="taiwanese"}

Din Tai FungBoiling Point

Predicted intent navigation

Intelligent Agent

Intent Prediction

find a cheap eating place for taiwanese food


25

SDS Process ndash Natural Language Generation (NLG)

User

Intelligent Agent

Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there. (navigation)

find a cheap eating place for taiwanese food


26

Required Knowledge

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant {restaurant.price="cheap", restaurant.food="taiwanese"}

Predicted intent navigation

User

Required Domain-Specific Information

find a cheap eating place for taiwanese food


27

Challenges for SDS: An SDS in a new domain requires
1) a hand-crafted domain ontology
2) utterances labelled with semantic representations
3) an SLU component for mapping utterances into semantic representations

Manual work results in high cost, long duration, and poor scalability of system development.

The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, in order to handle open-domain requests.

seeking="find", target="eating place", price="cheap", food="asian food"

find a cheap eating place for asian food

fully unsupervised

Prior Focus


28

Contributions

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant {restaurant.price="cheap", restaurant.food="asian food"}

Predicted intent navigation

User: find a cheap eating place for taiwanese food

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

(natural language)

(inter-slot relation)

(semantic slot)


29

Contributions

User

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

find a cheap eating place for taiwanese food


30

Ontology Induction Structure Learning Surface Form Derivation

Semantic Decoding Intent Prediction

Contributions

User

Knowledge Acquisition SLU Modeling

find a cheap eating place for taiwanese food


31

Knowledge Acquisition: 1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

Restaurant Asking

Conversations

target

foodprice

seeking

quantity

PREP_FOR

PREP_FOR

NN AMOD

AMODAMOD

Organized Domain Knowledge

Unlabelled Collection

Knowledge Acquisition

Knowledge Acquisition Ontology Induction Structure Learning Surface Form Derivation


32

SLU Modeling: 2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

Organized Domain

Knowledge

price="cheap", target="restaurant", intent=navigation

SLU Modeling

SLU Component

"can i have a cheap restaurant"

SLU Modeling Semantic Decoding Intent Prediction


33

SDS Architecture ndash Contributions

ASR → SLU → DM → NLG, grounded in Domain knowledge

Knowledge Acquisition & SLU Modeling address the current bottleneck (SLU)


34

SDS Flowchart

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling


35

SDS Flowchart ndash Semantic Decoding

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling


36

Outline

 Intelligent Assistant: What are they? Why do we need them? Why do companies care?

 Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

 Semantic Decoding

 Intent Prediction

 Conclusions & Future Work


37

Semantic Decoding [ACL-IJCNLP'15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant"

Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology InductionFw Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation


38

Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

 FrameNet [Baker et al., 1998]: a linguistically semantic resource based on the frame-semantics theory; words/phrases can be represented as frames. In "low fat milk", "milk" evokes the "food" frame, and "low fat" fills the descriptor frame element.

 SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences.


39

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

can i have a cheap restaurant
Frame: capability | Frame: expensiveness (good slot candidate) | Frame: locale_by_use (good slot candidate)

1st Issue: differentiate domain-specific frames from generic frames for SDSs

Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.


40

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

[Figure: word-observation/slot-candidate matrix from frame-semantic parsing. Rows: train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food", and test utterance "show me a list of cheap restaurants". Columns: word observations (cheap, restaurant, food, …) and slot candidates (expensiveness, locale_by_use, food, …), with observed 1s and estimated test probabilities (e.g., 0.97, 0.95).]

Idea: increase weights of domain-specific slots and decrease weights of others.


41

1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.

[Figure: the word-observation/slot-candidate matrix is multiplied by a word relation matrix (Word Relation Model) and a slot relation matrix (Slot Relation Model) for slot induction; slot candidates include locale_by_use, food, expensiveness, capability, seeking, desiring, relational_quantity.]

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.


42

Semantic Decoding [ACL-IJCNLP'15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant"

Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology InductionFw Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation


43

Knowledge Graph Construction Syntactic dependency parsing on utterances

ccomp

amoddobjnsubj det

can i have a cheap restaurantcapability expensiveness locale_by_use

Word-based lexical knowledge graph

Slot-based semantic knowledge graph

[Figure: word-based lexical knowledge graph over can, i, have, a, cheap, restaurant; slot-based semantic knowledge graph over capability, locale_by_use, expensiveness.]

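The two graphs can be sketched from one parsed utterance; the dependency arcs and the word-to-slot mapping below are assumptions for illustration, and the projection onto slots is simplified to arcs whose endpoints both carry a slot:

```python
# Assumed dependency arcs for "can i have a cheap restaurant"
arcs = [("have", "ccomp", "can"), ("have", "nsubj", "i"),
        ("have", "dobj", "restaurant"), ("restaurant", "det", "a"),
        ("restaurant", "amod", "cheap")]

# words that evoke slot candidates (from frame-semantic parsing)
slot_of = {"can": "capability", "cheap": "expensiveness",
           "restaurant": "locale_by_use"}

word_graph, slot_graph = set(), set()
for head, rel, dep in arcs:
    word_graph.add((head, dep))                 # lexical knowledge graph edge
    if head in slot_of and dep in slot_of:      # project the arc onto slots
        slot_graph.add((slot_of[head], slot_of[dep]))

print(sorted(slot_graph))  # [('locale_by_use', 'expensiveness')]
```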

44

Dependency-based word embeddings

Dependency-based slot embeddings

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

can = have =

expensiveness = capability =

[Figure: dependency-parsed utterance "can i have a cheap restaurant" (ccomp, nsubj, dobj, det, amod), with contexts extracted around words and around the evoked slots capability, expensiveness, locale_by_use.]

Levy and Goldberg Dependency-Based Word Embeddings in Proc of ACL 2014
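The dependency-based contexts behind these embeddings pair each word with its syntactic neighbors; a minimal sketch (parse triples assumed) of how each arc yields two training pairs, following Levy and Goldberg's scheme:

```python
# Assumed dependency triples for "can i have a cheap restaurant"
triples = [("have", "ccomp", "can"), ("have", "nsubj", "i"),
           ("have", "dobj", "restaurant"), ("restaurant", "det", "a"),
           ("restaurant", "amod", "cheap")]

def dep_contexts(triples):
    """Each arc (head, rel, dep) yields two (word, context) pairs."""
    pairs = []
    for head, rel, dep in triples:
        pairs.append((head, f"{dep}/{rel}"))    # head sees its dependent
        pairs.append((dep, f"{head}/{rel}-1"))  # dependent sees its head (inverse)
    return pairs

pairs = dep_contexts(triples)
print(("restaurant", "cheap/amod") in pairs)    # True
print(("cheap", "restaurant/amod-1") in pairs)  # True
```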


45

Edge Weight Measurement: compute edge weights to represent relation importance.

 Slot-to-slot semantic relation: similarity between slot embeddings
 Slot-to-slot dependency relation: dependency score between slot embeddings
 Word-to-word semantic relation: similarity between word embeddings
 Word-to-word dependency relation: dependency score between word embeddings

[Figure: lexical knowledge graph over words w1–w7 and semantic knowledge graph over slots s1–s3, with edge weights combining the semantic and dependency relations.]

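A semantic edge weight of this kind is just the cosine of two embeddings; a sketch with made-up 3-d slot vectors:

```python
import math

def cosine(u, v):
    """Semantic edge weight: cosine similarity of two embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# made-up 3-d slot embeddings (illustrative only)
expensiveness = [0.9, 0.1, 0.2]
pricerange    = [0.8, 0.2, 0.1]
capability    = [0.1, 0.9, 0.3]

# near-synonymous slots get a heavier edge than unrelated ones
print(cosine(expensiveness, pricerange) > cosine(expensiveness, capability))  # True
```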

46

[Figure: the word relation matrix R_w and slot relation matrix R_s, derived from the structure-learned semantic knowledge graph (S^D), multiply the word-observation/slot-candidate training matrix for slot induction.]

Structure information is integrated to make the self-training data more reliable
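The propagation itself is a matrix product; a toy numpy sketch (vocabulary and relation weights assumed) where an unobserved word receives score from its neighbors:

```python
import numpy as np

words = ["cheap", "restaurant", "food"]
F = np.array([[1.0, 1.0, 0.0]])           # one utterance observes "cheap restaurant"

R = np.array([[1.0, 0.8, 0.1],            # assumed word-word relation weights
              [0.8, 1.0, 0.6],
              [0.1, 0.6, 1.0]])
R = R / R.sum(axis=1, keepdims=True)      # row-normalize the relation matrix

propagated = F @ R                        # neighbors pass score to "food"
print(dict(zip(words, propagated[0].round(3))))
```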


47

[Figure: ontology induction supplies feature matrices Fw and Fs to the SLU model, together with structure learning; for the test utterance "show me a list of cheap restaurants", observed words yield slot estimates (e.g., 0.97, 0.90, 0.95, 0.85), while some semantics remain hidden.]

2nd Issue: unobserved semantics may benefit understanding.

Semantic Decoding [ACL-IJCNLP'15]


48

Reasoning with Matrix Factorization

[Figure: feature model + knowledge graph propagation model over the word-observation/slot-candidate matrix; MF fills both observed and hidden cells with probabilities (e.g., 0.97, 0.90, 0.95, 0.85, 0.93, 0.92, 0.98, 0.05).]

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.


49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and wordsslots respectively

The product of two matrices fills the probability of hidden semantics

[Figure: the |U| × (|W|+|S|) utterance-by-(word+slot) matrix with observed 1s and estimated probabilities (e.g., 0.97, 0.90, 0.95, 0.85, 0.93, 0.92, 0.98, 0.05).]

The matrix of size |U| × (|W|+|S|) is approximated by the product of two low-rank matrices: (|U| × d) × (d × (|W|+|S|)).

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
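The low-rank completion can be sketched with plain SGD on the observed cells; the matrix, latent dimension, and learning rate below are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Partially observed utterance-by-(word+slot) matrix; 0 means unobserved
M = np.array([[1, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 1, 0]], dtype=float)

d = 2                                     # assumed latent dimension
U = rng.normal(scale=0.1, size=(3, d))    # latent utterance vectors
V = rng.normal(scale=0.1, size=(d, 4))    # latent word/slot vectors

# plain SGD on the observed (value 1) cells only
for _ in range(2000):
    for i, j in zip(*np.nonzero(M)):
        err = M[i, j] - U[i] @ V[:, j]
        U[i] += 0.05 * err * V[:, j]
        V[:, j] += 0.05 * err * U[i]

completed = U @ V                         # the product fills the unobserved cells
print(completed.round(2))
```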


50

Bayesian Personalized Ranking for MF: model implicit feedback

 Do not treat unobserved facts as negative samples (true or false)
 Give observed facts higher scores than unobserved facts

Objective: for each utterance x, maximize the sum of ln σ(f⁺ − f⁻) over every observed fact f⁺ and unobserved fact f⁻, so that observed facts are ranked above unobserved ones.

The objective is to learn a set of well-ranked semantic slots per utterance.

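A sketch of the ranking objective (the scores and observed facts are hypothetical); a ranking that puts observed slots on top yields a higher objective than an inverted one:

```python
import math

def bpr_objective(scores, observed):
    """Sum of ln sigma(f_plus - f_minus) over observed/unobserved score pairs."""
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    pos = [s for f, s in scores.items() if f in observed]
    neg = [s for f, s in scores.items() if f not in observed]
    return sum(math.log(sigmoid(p - n)) for p in pos for n in neg)

# hypothetical slot scores for one utterance
scores = {"expensiveness": 2.0, "locale_by_use": 1.5, "capability": -0.5}

good = bpr_objective(scores, {"expensiveness", "locale_by_use"})
bad = bpr_objective(scores, {"capability"})   # observed fact ranked below
print(good > bad)  # True
```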

51

[Figure: ontology induction (feature matrices Fw, Fs) and structure learning feed the factorized model; for the test utterance "show me a list of cheap restaurants", slot probabilities (e.g., 0.97, 0.90, 0.95, 0.85) are estimated from train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food".]

Matrix Factorization SLU (MF-SLU)

MF-SLU can estimate probabilities for slot candidates given test utterances


52

Semantic Decoding [ACL-IJCNLP'15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant"

Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology InductionFw Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).


53

Experimental Setup

 Dataset: Cambridge University SLU Corpus. Restaurant recommendation (WER = 37%); 2166 dialogues; 15453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type.

 Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots.

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.


54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

Approach                                      | ASR  | Transcripts
Baseline SLU: Support Vector Machine          | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8


55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

The MF-SLU effectively models implicit information to decode semantics

The structure information further improves the results

Approach                                                     | ASR           | Transcripts
Baseline SLU: Support Vector Machine                         | 32.5          | 36.6
Baseline SLU: Multinomial Logistic Regression                | 34.0          | 38.8
Proposed MF-SLU: Feature Model                               | 37.6          | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)

The result is significantly better than the MLR baseline (p < 0.05, t-test).


56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

In the integrated structure information, both semantic and dependency relations are useful for understanding.

Approach                                    | ASR           | Transcripts
Feature Model                               | 37.6          | 45.3
Feature Model + KG Propagation (Semantic)   | 41.4          | 51.6
Feature Model + KG Propagation (Dependency) | 41.6          | 49.0
Feature Model + KG Propagation (All)        | 43.5 (+15.7%) | 53.4 (+17.9%)

The result is significantly better than the MLR baseline (p < 0.05, t-test).


Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs

The reference ontology with the most frequent syntactic dependencies

[Figure: the induced ontology connects locale_by_use, food, expensiveness, seeking, relational_quantity, and desiring via PREP_FOR, PREP_IN, NN, AMOD, and DOBJ relations; the reference ontology connects type, food, pricerange, task, and area via the most frequent syntactic dependencies.]

The automatically learned domain ontology aligns well with the reference one

57

The data-driven one is more objective while expert-annotated one is more subjective

58

Contributions of Semantic Decoding

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge

MF-SLU for Semantic Decoding is able to:
1) unify the automatically acquired knowledge
2) adapt to a domain-specific setting
3) allow systems to model implicit semantics for better understanding


59

Low- and High-Level Understanding: semantic concepts for individual utterances do not consider high-level semantics (user intents).

The follow-up behaviors usually correspond to user intents

price="cheap", target="restaurant"

SLU Model

"can i have a cheap restaurant"

intent=navigation

restaurant="legume", time="tonight"

SLU Model

"i plan to dine in legume tonight"

intent=reservation


60

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart ndash Intent Prediction


61

Outline

 Intelligent Assistant: What are they? Why do we need them? Why do companies care?

 Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

 Semantic Decoding

 Intent Prediction

 Conclusions & Future Work


62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input spoken utterances for making requests about launching an app

Output the apps supporting the required functionality

Intent Identification: popular domains in Google Play

please dial a phone call to alex

Skype, Hangout, etc.

Intent Prediction of Mobile Apps [SLT'14c]

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.


63

Input single-turn request

Output apps that are able to support the required functionality

Intent Prediction – Single-Turn Request

[Figure: reasoning with the feature-enriched MF. An IR step retrieves app-description candidates (e.g. Outlook: "your email calendar contacts"; Gmail: "check and send emails, msgs"); matrix rows are app descriptions, self-train utterances, and test utterances (e.g. "i would like to contact alex"), and columns are word observations (contact, message, email), enriched semantic features (e.g. communication), and intended apps (Gmail, Outlook, Skype); feature enrichment and MF fill in scores (e.g. 0.90, 0.85, 0.97, 0.95) for the test utterance]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents

64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

Challenge: language ambiguity. Disambiguating cues: 1) user preference, 2) app-level contexts.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

send to vivianvs

Email / Message / Communication

Idea: behavioral patterns in history can help intent prediction.

previous turn

65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

[Figure: reasoning with the feature-enriched MF. Train dialogues (e.g. "take this photo / tell vivian this is me in the lab" → CAMERA, IM; "check my grades on website / send an email to professor" → CHROME, EMAIL) and a test dialogue ("take a photo of this / send it to alice" → CAMERA, IM) are matrix rows; columns combine lexical features (photo, check, camera, tell, send, email, chrome), behavior history (the previously launched app), and intended apps; MF fills in scores (e.g. 0.85, 0.70, 0.95, 0.80, 0.55)]

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction

66

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature Matrix        ASR (LM / MF-SLU)   Transcripts (LM / MF-SLU)
Word Observation      25.1 / –            26.1 / –

Feature Matrix        ASR (MLR / MF-SLU)  Transcripts (MLR / MF-SLU)
Word Observation      52.1 / –            55.5 / –

LM-Based IR Model (unsupervised)

Multinomial Logistic Regression (supervised)

Experiments for Intent Prediction
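As an illustration of the unsupervised LM-based IR baseline above, here is a minimal sketch of a unigram query-likelihood model with add-one smoothing; the app descriptions and query are made up, and this is not the exact system used in the experiments:

```python
from collections import Counter

# Toy app descriptions (hypothetical, for illustration only).
apps = {
    "Outlook": "your email calendar contacts",
    "Camera":  "take photos and record video",
}

def lm_score(query, doc, vocab_size):
    """Unigram query likelihood P(query | doc) with add-one smoothing."""
    counts = Counter(doc.split())
    total = sum(counts.values())
    score = 1.0
    for w in query.split():
        score *= (counts[w] + 1) / (total + vocab_size)
    return score

vocab_size = len({w for d in apps.values() for w in d.split()})
query = "send an email"
ranked = sorted(apps, key=lambda a: lm_score(query, apps[a], vocab_size),
                reverse=True)
# ranked[0] is the app whose description best explains the request.
```

Because "email" appears in the Outlook description, Outlook is ranked first for this query without any labelled training data, which is what makes this baseline fully unsupervised.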

67

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature Matrix        ASR (LM / MF-SLU)        Transcripts (LM / MF-SLU)
Word Observation      25.1 / 29.2 (+16.2%)     26.1 / 30.4 (+16.4%)

Feature Matrix        ASR (MLR / MF-SLU)       Transcripts (MLR / MF-SLU)
Word Observation      52.1 / 52.7 (+1.2%)      55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.

Experiments for Intent Prediction

68

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature Matrix                         ASR (LM / MF-SLU)      Transcripts (LM / MF-SLU)
Word Observation                       25.1 / 29.2 (+16.2%)   26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics       32.0 / –               33.3 / –
Word + Type-Embedding-Based Semantics  31.5 / –               32.9 / –

Feature Matrix               ASR (MLR / MF-SLU)     Transcripts (MLR / MF-SLU)
Word Observation             52.1 / 52.7 (+1.2%)    55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns   53.9 / –               56.6 / –

Semantic enrichment provides rich cues to improve performance

Experiments for Intent Prediction

69

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature Matrix                         ASR (LM / MF-SLU)      Transcripts (LM / MF-SLU)
Word Observation                       25.1 / 29.2 (+16.2%)   26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics       32.0 / 34.2 (+6.8%)    33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics  31.5 / 32.2 (+2.1%)    32.9 / 34.0 (+3.4%)

Feature Matrix               ASR (MLR / MF-SLU)     Transcripts (MLR / MF-SLU)
Word Observation             52.1 / 52.7 (+1.2%)    55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns   53.9 / 55.7 (+3.3%)    56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics

Experiments for Intent Prediction

70

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction

The feature-enriched MF-SLU for intent prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.

71

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data: Back-end Data Bases, Services and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"

72

Outline Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

73

Conclusions: the work shows the feasibility of, and the potential for, improving generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances; better high-level intent prediction about follow-up behaviors.

74

Future Work: apply the proposed technology to domain discovery, i.e. find domains not covered by current systems that users are nonetheless interested in, to guide which domains to develop next.

Improve the proposed approach by handling uncertainty: recognition errors from ASR (for SLU modeling) and unreliable knowledge (for knowledge acquisition).

75

[Figure: a convolutional architecture for slot ranking: word sequence x = w1 … wd → word vectors lw → convolutional layer lc (convolution matrix Wc) → pooling operation → utterance vector lf → knowledge graph propagation layer lp (propagation matrix Wp) → semantic layer y (semantic projection matrix Ws), yielding relation scores R(U, S1) … R(U, Sn) and posterior probabilities P(S1 | U) … P(Sn | U) over slot candidates S1 … Sn]

Towards Unsupervised Deep Learning

Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.

76

Take-Home Message: big data is available, but without annotations.

Challenge: how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI

Language and action: understand voice to control music, lights, etc.; teach the system to let friends in by face recognition, etc.

Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A

THANKS FOR YOUR ATTENTION

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)

• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.

Page 13: Statistical Learning from Dialogues for Intelligent Assistants

13

Baymax is capable of maintaining a good spoken dialogue system and learning new knowledge for better understanding and interacting with people

What is Baymax's intelligence? (Big Hero 6: video content owned and licensed by Disney Entertainment, Marvel Entertainment LLC, etc.)

14

ASR: Automatic Speech Recognition; SLU: Spoken Language Understanding; DM: Dialogue Management; NLG: Natural Language Generation

SDS Architecture

[Pipeline: ASR → SLU → DM → NLG, with Domain knowledge]

current bottleneck

15

Interaction Example

User

Intelligent Agent. Q: How does a dialogue system process this request?

Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which do you want to choose? I can help you go there.

find a cheap eating place for taiwanese food

16

SDS Process ndash Available Domain Ontology

find a cheap eating place for taiwanese food

User

[Domain ontology graph: target linked to food (NN), price (AMOD), and seeking (PREP_FOR)]

Organized Domain Knowledge

Intelligent Agent

17

SDS Process ndash Available Domain Ontology

User

[Domain ontology graph: target linked to food (NN), price (AMOD), and seeking (PREP_FOR)]

Organized Domain Knowledge

Intelligent Agent

Ontology Induction (semantic slot)

find a cheap eating place for taiwanese food

18

SDS Process ndash Available Domain Ontology

User

[Domain ontology graph: target linked to food (NN), price (AMOD), and seeking (PREP_FOR)]

Organized Domain Knowledge

Intelligent Agent

Ontology Induction (semantic slot)

Structure Learning (inter-slot relation)

find a cheap eating place for taiwanese food

19

SDS Process ndash Spoken Language Understanding (SLU)

User

[Domain ontology graph: target linked to food (NN), price (AMOD), and seeking (PREP_FOR)]

Intelligent Agent

seeking="find", target="eating place", price="cheap", food="taiwanese"

find a cheap eating place for taiwanese food

20

find a cheap eating place for taiwanese food

SDS Process ndash Spoken Language Understanding (SLU)

User

[Domain ontology graph: target linked to food (NN), price (AMOD), and seeking (PREP_FOR)]

Intelligent Agent

seeking="find", target="eating place", price="cheap", food="taiwanese"

Semantic Decoding

21

find a cheap eating place for taiwanese food

SDS Process ndash Dialogue Management (DM)

User

[Domain ontology graph: target linked to food (NN), price (AMOD), and seeking (PREP_FOR)]

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Intelligent Agent

22

find a cheap eating place for taiwanese food

SDS Process ndash Dialogue Management (DM)

User

[Domain ontology graph: target linked to food (NN), price (AMOD), and seeking (PREP_FOR)]

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Intelligent Agent

Surface Form Derivation (natural language)

23

SDS Process ndash Dialogue Management (DM)

User

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Din Tai Fung, Boiling Point

Predicted intent: navigation

Intelligent Agent

find a cheap eating place for taiwanese food

24

SDS Process ndash Dialogue Management (DM)

User

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Din Tai Fung, Boiling Point

Predicted intent: navigation

Intelligent Agent

Intent Prediction

find a cheap eating place for taiwanese food

25

SDS Process ndash Natural Language Generation (NLG)

User

Intelligent Agent

Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which do you want to choose? I can help you go there. (navigation)

find a cheap eating place for taiwanese food

26

Required Knowledge

[Domain ontology graph: target linked to food (NN), price (AMOD), and seeking (PREP_FOR)]

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Predicted intent: navigation

User

Required Domain-Specific Information

find a cheap eating place for taiwanese food

27

Challenges for SDS: an SDS in a new domain requires

1) a hand-crafted domain ontology, 2) utterances labelled with semantic representations, and 3) an SLU component for mapping utterances into semantic representations.

Manual work results in high cost, long duration, and poor scalability of system development.

The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, in order to handle open-domain requests.

seeking="find", target="eating place", price="cheap", food="asian food"

find a cheap eating place for asian food

fully unsupervised

Prior Focus

28

Contributions

[Domain ontology graph: target linked to food (NN), price (AMOD), and seeking (PREP_FOR)]

SELECT restaurant { restaurant.price="cheap", restaurant.food="asian food" }

Predicted intent: navigation

find a cheap eating place for taiwanese food

User

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

(natural language)

(inter-slot relation)

(semantic slot)

29

Contributions

User

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

find a cheap eating place for taiwanese food

30

Ontology Induction Structure Learning Surface Form Derivation

Semantic Decoding Intent Prediction

Contributions

User

Knowledge Acquisition SLU Modeling

find a cheap eating place for taiwanese food

31

Knowledge Acquisition: 1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

Restaurant-Asking Conversations

[Ontology graph: target linked to food (NN), price (AMOD), seeking (PREP_FOR), and quantity (PREP_FOR, AMOD)]

Organized Domain Knowledge

Unlabelled Collection

Knowledge Acquisition

Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation

32

SLU Modeling: 2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

Organized Domain

Knowledge

price="cheap", target="restaurant", intent=navigation

SLU Modeling

SLU Component

"can i have a cheap restaurant"

SLU Modeling: Semantic Decoding, Intent Prediction

33

SDS Architecture – Contributions

[Pipeline: ASR → SLU → DM → NLG, with Domain knowledge]

Knowledge Acquisition SLU Modeling

current bottleneck

34

SDS Flowchart

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

35

SDS Flowchart – Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

36

Outline Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

37

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant"

Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology Induction: Fw, Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

×

Semantic KG

MF-SLU: SLU Modeling by Matrix Factorization

Semantic Representation

38

Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistically semantic resource based on the frame-semantics theory; words/phrases can be represented as frames. E.g., in "low fat milk", "milk" evokes the "food" frame and "low fat" fills the descriptor frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences.

39

Ontology Induction [ASRU'13, SLT'14a]

can i have a cheap restaurant

Frame: capability

Frame: expensiveness

Frame: locale_by_use

1st Issue: differentiate domain-specific frames from generic frames for SDSs


Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.

slot candidate

Best Student Paper Award

40

[Matrix: word observations and slot candidates (expensiveness, locale_by_use, food) from frame-semantic parsing of the train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants", with estimated slot probabilities (e.g. 0.97, 0.95)]

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

Idea: increase weights of domain-specific slots and decrease weights of others.

41

1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.

[Figure: the word relation matrix and slot relation matrix are multiplied into the word observation / slot candidate matrix (slot induction) for train and test utterances]

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
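A toy sketch of that propagation step (the relation matrix below is made up; the actual matrices are built from the embedding-based edge weights described later):

```python
import numpy as np

# Toy slot relation matrix: slots 0 and 1 are strongly related,
# slot 2 is only weakly connected to the rest of the graph.
R = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.1],
              [0.1, 0.1, 1.0]])
R = R / R.sum(axis=1, keepdims=True)   # row-normalize edge weights

x = np.array([1.0, 0.0, 1.0])          # observed slot candidates in an utterance
y = x @ R                              # propagated scores

# Slot 1 was unobserved (x[1] == 0) but inherits score from its
# strongly connected neighbor, slot 0.
```

This is the effect matrix multiplication with the relation matrix has: well-connected (domain-specific) nodes accumulate score from their neighbors, while weakly connected (generic) nodes receive little support.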

[Knowledge graph over slots (capability, locale_by_use, food, expensiveness, seeking, relational_quantity, desiring) linking the train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" to the test utterance "show me a list of cheap restaurants"]

42

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant"

Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology Induction: Fw, Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

×

Semantic KG

MF-SLU: SLU Modeling by Matrix Factorization

Semantic Representation

43

Knowledge Graph Construction: syntactic dependency parsing on utterances

[Dependency parse of "can i have a cheap restaurant" (nsubj, ccomp, dobj, det, amod), with evoked frames capability, expensiveness, locale_by_use]

Word-based lexical knowledge graph

Slot-based semantic knowledge graph

[Graphs: word nodes (can, i, have, a, cheap, restaurant) linked by word-word edges; slot nodes (capability, locale_by_use, expensiveness) linked by slot-slot edges]

44

Dependency-based word embeddings

Dependency-based slot embeddings

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

[Figure: dependency-based embeddings are trained from the parsed utterance "can i have a cheap restaurant", yielding vectors for words (e.g. can, have) and for slots (e.g. expensiveness, capability)]

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.

45

Edge Weight Measurement: compute edge weights to represent relation importance

Slot-to-slot semantic relation: similarity between slot embeddings. Slot-to-slot dependency relation: dependency score between slot embeddings. Word-to-word semantic relation: similarity between word embeddings. Word-to-word dependency relation: dependency score between word embeddings.

[Figure: the lexical knowledge graph over words w1–w7 is combined with the semantic knowledge graph over slots s1–s3]

46

[Figure: knowledge graph propagation model: the word relation matrix R_w^(SD) and the slot relation matrix R_s^(SD) (SD = semantic + dependency relations) are multiplied into the word observation / slot candidate matrix for slot induction]

Structure information is integrated to make the self-training data more reliable

47

[Figure: ontology induction feeds the feature matrices Fw and Fs into the SLU model, combined with structure learning; train utterances ("i would like a cheap restaurant", "find a restaurant with chinese food") and the test utterance "show me a list of cheap restaurants" are represented by word observations and slot candidates with estimated probabilities (e.g. 0.97, 0.90, 0.95, 0.85), plus hidden semantics]

2nd Issue: unobserved semantics may benefit understanding.

Semantic Decoding [ACL-IJCNLP'15]

48

Reasoning with Matrix Factorization

[Figure: feature model + knowledge graph propagation model: the relation matrices R_w^(SD) and R_s^(SD) are multiplied into the word observation / slot candidate matrix, and MF fills in probabilities (e.g. 0.97, 0.90, 0.95, 0.85, 0.93, 0.92, 0.98, 0.05) for unobserved cells]

Feature Model + Knowledge Graph Propagation Model

Idea: MF completes a partially observed matrix under a low-rank latent-semantics assumption, which models hidden semantics and is more robust to noisy data.

49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and wordsslots respectively

The product of the two matrices fills in the probabilities of hidden semantics.

[Matrix: word observations and slot candidates for train and test utterances, with MF-estimated probabilities filling the unobserved cells]

The |U| × (|W| + |S|) matrix is approximated by the product of an |U| × d matrix and a d × (|W| + |S|) matrix, where U is the set of utterances, W the words, S the slots, and d the latent dimension.

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
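As a toy illustration of this low-rank completion (not the paper's implementation; the matrix, latent dimension d, learning rate, and regularizer are all made up), a few lines of NumPy can recover an unobserved cell from the latent factors:

```python
import numpy as np

# Toy utterance-by-feature matrix; rows 0 and 2 share the same pattern.
M = np.array([[1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
observed = np.ones_like(M, dtype=bool)
observed[2, 2] = False          # pretend this fact was never observed

rng = np.random.default_rng(0)
d = 2                           # latent dimension
U = rng.normal(scale=0.1, size=(M.shape[0], d))   # utterance factors
V = rng.normal(scale=0.1, size=(M.shape[1], d))   # word/slot factors

lr, reg = 0.05, 0.01
for _ in range(1000):           # SGD on observed cells only
    for i, j in zip(*np.nonzero(observed)):
        err = M[i, j] - U[i] @ V[j]
        ui = U[i].copy()
        U[i] += lr * (err * V[j] - reg * U[i])
        V[j] += lr * (err * ui - reg * V[j])

M_hat = U @ V.T                 # completed matrix
# M_hat[2, 2] comes out high because row 2 resembles row 0.
```

The held-out cell is filled in purely from the shared low-rank structure, which is exactly the mechanism the MF-SLU uses to score hidden semantics.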

50

Bayesian Personalized Ranking for MF: model implicit feedback; do not treat unobserved facts as negative samples (true or false), but give observed facts higher scores than unobserved facts.

Objective: for each utterance x, score observed facts f⁺ above unobserved facts f⁻ by maximizing Σ ln σ(f⁺ − f⁻) over pairs of observed and unobserved facts.

The objective is to learn a set of well-ranked semantic slots per utterance.
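A minimal sketch of the pairwise update implied by that objective (toy vectors, step size, and iteration count are assumptions; Rendle et al.'s BPR-Opt additionally regularizes the factors and samples negatives):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
d = 4
u = rng.normal(size=d)        # latent vector of an utterance
s_pos = rng.normal(size=d)    # latent vector of an observed slot (f+)
s_neg = rng.normal(size=d)    # latent vector of a sampled unobserved slot (f-)

lr = 0.1
for _ in range(200):          # gradient ascent on ln sigmoid(f+ - f-)
    diff = u @ s_pos - u @ s_neg
    g = sigmoid(-diff)        # derivative of ln sigmoid(diff) w.r.t. diff
    s_pos += lr * g * u
    s_neg -= lr * g * u
    u += lr * g * (s_pos - s_neg)

# After training, the observed fact outranks the unobserved one.
```

Note that only the ranking f⁺ > f⁻ is optimized; the unobserved slot is never forced toward a score of zero, which is the point of treating the data as implicit feedback.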

51

[Figure: ontology induction feeds the feature matrices Fw and Fs into the SLU model, combined with structure learning; for the test utterance "show me a list of cheap restaurants", the MF-SLU estimates slot probabilities (e.g. 0.97, 0.90, 0.95, 0.85)]

Matrix Factorization SLU (MF-SLU)

MF-SLU can estimate probabilities for slot candidates given test utterances

52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant"

Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology Induction: Fw, Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

×

Semantic KG

MF-SLU: SLU Modeling by Matrix Factorization

Semantic Representation

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).

53

Experimental Setup. Dataset: Cambridge University SLU Corpus

Restaurant recommendation (WER = 37%); 2166 dialogues, 15453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type.

Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots.

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
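A minimal sketch of the MAP metric over per-utterance ranked slot lists (the slot names and the one-utterance example are illustrative, not corpus data):

```python
def average_precision(ranked, relevant):
    """AP of one utterance's ranked slot list against its reference slots."""
    hits, total = 0, 0.0
    for k, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            total += hits / k       # precision at each relevant rank
    return total / max(len(relevant), 1)

def mean_average_precision(ranked_lists, relevant_sets):
    aps = [average_precision(r, s) for r, s in zip(ranked_lists, relevant_sets)]
    return sum(aps) / len(aps)

# One utterance: slots ranked by estimated probability vs. reference slots.
map_score = mean_average_precision(
    [["expensiveness", "locale_by_use", "food"]],
    [{"expensiveness", "food"}],
)
```

Here the two relevant slots sit at ranks 1 and 3, so the AP is (1/1 + 2/3) / 2; averaging APs over all utterances gives the MAP reported in the tables that follow.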

54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

Approach                                         ASR    Transcripts
Baseline SLU: Support Vector Machine             32.5   36.6
Baseline SLU: Multinomial Logistic Regression    34.0   38.8

55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

The MF-SLU effectively models implicit information to decode semantics

The structure information further improves the results

Approach                                                        ASR             Transcripts
Baseline SLU: Support Vector Machine                            32.5            36.6
Baseline SLU: Multinomial Logistic Regression                   34.0            38.8
Proposed MF-SLU: Feature Model                                  37.6            45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation    43.5 (+27.9%)   53.4 (+37.6%)

the result is significantly better than the MLR with p lt 005 in t-test


56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

Approach                                   ASR            Transcripts
Feature Model                              37.6           45.3
Feature + KG Propagation: Semantic         41.4           51.6
Feature + KG Propagation: Dependency       41.6           49.0
Feature + KG Propagation: All              43.5 (+15.7)   53.4 (+17.9)

In the integrated structure information, both semantic and dependency relations are useful for understanding.

The result is significantly better than the MLR baseline (p < 0.05, t-test).


Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs.

The reference ontology is annotated with the most frequent syntactic dependencies.

[Figure: learned ontology graph linking locale_by_use, food, expensiveness, seeking, relational_quantity, desiring, type, price range, task, and area via PREP_FOR, PREP_IN, NN, AMOD, and DOBJ dependencies]

The automatically learned domain ontology aligns well with the reference one

57

The data-driven ontology is more objective, while the expert-annotated one is more subjective.

58

Contributions of Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.


59

Low- and High-Level Understanding: Semantic concepts for individual utterances do not consider high-level semantics (user intents).

The follow-up behaviors usually correspond to user intents

price="cheap", target="restaurant"

SLU Model

"can i have a cheap restaurant"

intent=navigation

restaurant="legume", time="tonight"

SLU Model

"i plan to dine in legume tonight"

intent=reservation


60

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart – Intent Prediction


61

Outline: Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances for making requests about launching an app

Output: the apps supporting the required functionality

Intent Identification: popular domains in Google Play

"please dial a phone call to alex" → Skype, Hangout, etc.

Intent Prediction of Mobile Apps [SLT'14c]

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.


63

Input: single-turn request

Output: apps that are able to support the required functionality

Intent Prediction – Single-Turn Request

[Figure: feature-enriched MF for a single-turn request. Test utterance "i would like to contact alex" with word observations (contact, email, message) and intended apps (Gmail, Outlook, Skype); IR retrieves app-description candidates ("... your email calendar contacts ..." for Outlook, "... check and send emails msgs ..." for Gmail) and self-train utterances to fill the training matrix; feature enrichment adds semantics such as "communication"]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents


64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

Challenge: language ambiguity, addressed by 1) user preference and 2) app-level contexts

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

[Figure: "send to vivian" in the previous turn is ambiguous between Email and Message (Communication)]

Idea: behavioral patterns in history can help intent prediction


65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

[Figure: feature-enriched MF for multi-turn interaction. Train dialogues pair utterances with intended apps ("take this photo" → CAMERA; "tell vivian this is me in the lab" → IM; "check my grades on websites" → CHROME; "send an email to professor" → EMAIL); lexical features (photo, check, tell, send) and behavior-history features (null, camera, chrome, email) feed the matrix; the test dialogue "take a photo of this / send it to alice" is scored by reasoning with the feature-enriched MF]

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction


66

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix      ASR (LM / MF-SLU)    Transcripts (LM / MF-SLU)
Word Observation    25.1 / –             26.1 / –

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix      ASR (MLR / MF-SLU)   Transcripts (MLR / MF-SLU)
Word Observation    52.1 / –             55.5 / –

LM: LM-based IR model (unsupervised); MLR: multinomial logistic regression (supervised)


67

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix      ASR (LM / MF-SLU)       Transcripts (LM / MF-SLU)
Word Observation    25.1 / 29.2 (+16.2)     26.1 / 30.4 (+16.4)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix      ASR (MLR / MF-SLU)      Transcripts (MLR / MF-SLU)
Word Observation    52.1 / 52.7 (+1.2)      55.5 / 55.4 (-0.2)

Modeling hidden semantics helps intent prediction, especially for noisy data.


68

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                           ASR (LM / MF-SLU)       Transcripts (LM / MF-SLU)
Word Observation                         25.1 / 29.2 (+16.2)     26.1 / 30.4 (+16.4)
Word + Embedding-Based Semantics         32.0 / –                33.3 / –
Word + Type-Embedding-Based Semantics    31.5 / –                32.9 / –

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix                ASR (MLR / MF-SLU)      Transcripts (MLR / MF-SLU)
Word Observation              52.1 / 52.7 (+1.2)      55.5 / 55.4 (-0.2)
Word + Behavioral Patterns    53.9 / –                56.6 / –

Semantic enrichment provides rich cues to improve performance.


69

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                           ASR (LM / MF-SLU)       Transcripts (LM / MF-SLU)
Word Observation                         25.1 / 29.2 (+16.2)     26.1 / 30.4 (+16.4)
Word + Embedding-Based Semantics         32.0 / 34.2 (+6.8)      33.3 / 33.3 (-0.2)
Word + Type-Embedding-Based Semantics    31.5 / 32.2 (+2.1)      32.9 / 34.0 (+3.4)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix                ASR (MLR / MF-SLU)      Transcripts (MLR / MF-SLU)
Word Observation              52.1 / 52.7 (+1.2)      55.5 / 55.4 (-0.2)
Word + Behavioral Patterns    53.9 / 55.7 (+3.3)      56.6 / 57.7 (+1.9)

Intent prediction can benefit from both hidden information and low-level semantics.


70

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction

The feature-enriched MF-SLU for intent prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.


71

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences, User Modeling, Suggestions

Data: Back-end Data Bases, Services, and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"


72

Outline: Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


73

Conclusions: The work shows the feasibility and the potential for improving generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances; better high-level intent prediction about follow-up behaviors.


74

Future Work

Apply the proposed technology to domain discovery: find domains not covered by the current systems but that users are interested in, to guide the next developed domains.

Improve the proposed approach by handling uncertainty: recognition errors from ASR (for SLU modeling) and unreliable knowledge (for knowledge acquisition).


75

[Figure: word sequence x (w1 ... wd) → word vector lw → convolutional layer lc (convolution matrix Wc) → pooling operation → utterance vector lf → knowledge graph propagation layer lp (propagation matrix Wp) → semantic projection matrix Ws → semantic layer y, which estimates the semantic relations R(U, S1) ... R(U, Sn) and posterior probabilities P(S1 | U) ... P(Sn | U) over slot candidates, each with a slot vector lf]

Towards Unsupervised Deep Learning


Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.

76

Take Home Message: Available big data w/o annotations

Challenge: how to acquire and organize important knowledge, and further utilize it for applications

Language understanding for AI: language → action; understand voice to control music, lights, etc.; teach the system to let friends in by face recognition, etc.


Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A — THANKS FOR YOUR ATTENTION

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.


Page 14: Statistical Learning from Dialogues for Intelligent Assistants

14

ASR: Automatic Speech Recognition; SLU: Spoken Language Understanding; DM: Dialogue Management; NLG: Natural Language Generation

SDS Architecture

[Figure: pipeline ASR → SLU (current bottleneck) → DM → NLG, with Domain knowledge]


15

Interaction Example

User: find a cheap eating place for taiwanese food

Intelligent Agent: Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there.

Q: How does a dialogue system process this request?


16

SDS Process – Available Domain Ontology

User: find a cheap eating place for taiwanese food

[Figure: ontology graph linking seeking (PREP_FOR), target, price (AMOD), and food (NN)]

Intelligent Agent: Organized Domain Knowledge


17

SDS Process – Available Domain Ontology

User: find a cheap eating place for taiwanese food

[Figure: ontology graph linking seeking (PREP_FOR), target, price (AMOD), and food (NN)]

Intelligent Agent: Organized Domain Knowledge

Ontology Induction (semantic slot)


18

SDS Process – Available Domain Ontology

User: find a cheap eating place for taiwanese food

[Figure: ontology graph linking seeking (PREP_FOR), target, price (AMOD), and food (NN)]

Intelligent Agent: Organized Domain Knowledge

Ontology Induction (semantic slot)

Structure Learning (inter-slot relation)


19

SDS Process – Spoken Language Understanding (SLU)

User: find a cheap eating place for taiwanese food

Intelligent Agent: seeking="find", target="eating place", price="cheap", food="taiwanese"


20

SDS Process – Spoken Language Understanding (SLU)

User: find a cheap eating place for taiwanese food

Intelligent Agent: seeking="find", target="eating place", price="cheap", food="taiwanese"

Semantic Decoding


21

SDS Process – Dialogue Management (DM)

User: find a cheap eating place for taiwanese food

Intelligent Agent: SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }


22

SDS Process – Dialogue Management (DM)

User: find a cheap eating place for taiwanese food

Intelligent Agent: SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Surface Form Derivation (natural language)


23

SDS Process – Dialogue Management (DM)

User: find a cheap eating place for taiwanese food

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Intelligent Agent: Din Tai Fung, Boiling Point

Predicted intent: navigation


24

SDS Process – Dialogue Management (DM)

User: find a cheap eating place for taiwanese food

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Intelligent Agent: Din Tai Fung, Boiling Point

Predicted intent: navigation

Intent Prediction


25

SDS Process – Natural Language Generation (NLG)

User: find a cheap eating place for taiwanese food

Intelligent Agent: Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there. (navigation)


26

Required Knowledge

User: find a cheap eating place for taiwanese food

[Figure: ontology graph linking seeking (PREP_FOR), target, price (AMOD), and food (NN)]

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Predicted intent: navigation

Required Domain-Specific Information


27

Challenges for SDS: An SDS in a new domain requires

1) a hand-crafted domain ontology, 2) utterances labelled with semantic representations, and 3) an SLU component for mapping utterances into semantic representations.

Manual work results in high cost, long duration, and poor scalability of system development.

The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, in order to handle open-domain requests (fully unsupervised).

seeking="find", target="eating place", price="cheap", food="asian food"

"find a cheap eating place for asian food"

Prior Focus


28

Contributions

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant { restaurant.price="cheap", restaurant.food="asian food" }

Predicted intent: navigation

User: find a cheap eating place for taiwanese food

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

(natural language)

(inter-slot relation)

(semantic slot)


29

Contributions

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

User: find a cheap eating place for taiwanese food


30

Ontology Induction Structure Learning Surface Form Derivation

Semantic Decoding Intent Prediction

Contributions

Knowledge Acquisition SLU Modeling

User: find a cheap eating place for taiwanese food


31

Knowledge Acquisition: 1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

[Figure: restaurant-asking conversations (unlabelled collection) → Knowledge Acquisition → organized domain knowledge: an ontology linking seeking, target, food, price, and quantity via PREP_FOR, NN, and AMOD]

Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation


32

SLU Modeling: 2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

Organized Domain Knowledge

SLU Component (SLU Modeling): "can i have a cheap restaurant" → price="cheap", target="restaurant", intent=navigation

SLU Modeling: Semantic Decoding, Intent Prediction


33

SDS Architecture – Contributions

[Figure: pipeline ASR → SLU (current bottleneck) → DM → NLG, with Domain knowledge; Knowledge Acquisition and SLU Modeling target the bottleneck]


34

SDS Flowchart

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling


35

SDS Flowchart – Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling


36

Outline: Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


37

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[Figure: SLU framework. Frame-semantic parsing on an unlabeled collection feeds Ontology Induction (feature matrices Fw, Fs; semantic KG) and Structure Learning (relation matrices Rw, Rs; word and slot relation models over lexical and semantic KGs); the feature model × knowledge graph propagation model is factorized by the MF-SLU to produce semantic representations, e.g., "can I have a cheap restaurant" → target="restaurant", price="cheap"]


38

Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory; words/phrases can be represented as frames. In "low fat milk", "milk" evokes the "food" frame, and "low fat" fills the descriptor frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences.


39

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

"can i have a cheap restaurant" evokes the frames capability, expensiveness, and locale_by_use (slot candidates)

1st Issue: differentiate domain-specific frames from generic frames for SDSs

Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.


40

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

[Figure: word-observation / slot-candidate matrix after frame-semantic parsing. Train utterances "i would like a cheap restaurant" (cheap, restaurant → expensiveness, locale_by_use) and "find a restaurant with chinese food" (restaurant, food → locale_by_use, food); test utterance "show me a list of cheap restaurants"]

Idea: increase weights of domain-specific slots and decrease weights of others


41

1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.

Word Relation Model (word relation matrix Rw) × Slot Relation Model (slot relation matrix Rs)

[Figure: word-observation / slot-candidate matrix for train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and test utterance "show me a list of cheap restaurants", multiplied by the relation matrices for slot induction]

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.


42

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[Figure: SLU framework. Frame-semantic parsing on an unlabeled collection feeds Ontology Induction (feature matrices Fw, Fs; semantic KG) and Structure Learning (relation matrices Rw, Rs; word and slot relation models over lexical and semantic KGs); the feature model × knowledge graph propagation model is factorized by the MF-SLU to produce semantic representations, e.g., "can I have a cheap restaurant" → target="restaurant", price="cheap"]


43

Knowledge Graph Construction: syntactic dependency parsing on utterances

"can i have a cheap restaurant" (nsubj, ccomp, det, amod, dobj) evokes capability, expensiveness, and locale_by_use

Word-based lexical knowledge graph: nodes can, i, have, a, cheap, restaurant

Slot-based semantic knowledge graph: nodes capability, locale_by_use, expensiveness


44

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

Dependency-based word embeddings (e.g., for "can", "have") and dependency-based slot embeddings (e.g., for expensiveness, capability) are trained on the dependency-parsed utterance "can i have a cheap restaurant" (nsubj, ccomp, det, amod, dobj).

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.


45

Edge Weight Measurement: compute edge weights to represent relation importance

Slot-to-slot semantic relation: similarity between slot embeddings
Slot-to-slot dependency relation: dependency score between slot embeddings
Word-to-word semantic relation: similarity between word embeddings
Word-to-word dependency relation: dependency score between word embeddings

[Figure: combined word-level (w1–w7) and slot-level (s1–s3) knowledge graphs]

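As a rough illustration of the semantic edge weights above, the slot-to-slot (or word-to-word) semantic relation can be scored as the cosine similarity of two embeddings. The vectors below are made-up toy values, not trained dependency-based embeddings:

```python
import numpy as np

def cosine(a, b):
    """Semantic relation weight: cosine similarity of two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# hypothetical slot embeddings (illustrative values only)
expensiveness = np.array([0.9, 0.1, 0.3])
locale_by_use = np.array([0.8, 0.2, 0.35])

# edge weight between the two slot nodes in the semantic knowledge graph
edge_weight = cosine(expensiveness, locale_by_use)
```

Dependency-relation weights would be computed analogously from dependency scores rather than raw similarity.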

46

Knowledge Graph Propagation Model: Word Relation Model (word relation matrix Rw) × Slot Relation Model (slot relation matrix Rs)

[Figure: word-observation / slot-candidate matrix (train observations cheap, restaurant, food → expensiveness, locale_by_use, food, plus a test utterance) multiplied by Rw and Rs for slot induction]

Structure information is integrated to make the self-training data more reliable.

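The propagation step amounts to multiplying the observation matrix by a relation matrix, so observed slots pass part of their score to related slots. A toy sketch with a hypothetical 3-slot relation matrix (not the learned matrices from the experiments):

```python
import numpy as np

# one utterance, observed with slot 0 only
F = np.array([[1.0, 0.0, 0.0]])

# hypothetical slot relation matrix: slot 0 strongly related to slot 1,
# unrelated to slot 2 (self-weights on the diagonal)
Rs = np.array([[1.0, 0.8, 0.0],
               [0.8, 1.0, 0.0],
               [0.0, 0.0, 1.0]])

# after propagation, slot 1 receives part of slot 0's score
propagated = F @ Rs
```

Here a domain-specific slot connected to many observed neighbors ends up with a higher score than an isolated generic slot, which is the intuition behind the propagation model.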

47

Semantic Decoding [ACL-IJCNLP'15]

2nd Issue: unobserved hidden semantics may benefit understanding

[Figure: Ontology Induction produces the feature matrices Fw, Fs for SLU, combined with Structure Learning; train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food"; the test utterance "show me a list of cheap restaurants" contains hidden semantics not directly observed]


48

Reasoning with Matrix Factorization

Feature Model + Knowledge Graph Propagation Model: the word and slot relation matrices Rw and Rs are combined with the word-observation / slot-candidate matrix for slot induction.

[Figure: the completed matrix fills probabilities (e.g., .97, .90, .95, .85, .93, .92, .98) for cells that were never observed]

Idea: MF completes a partially-missing matrix under a low-rank latent semantics assumption, which models hidden semantics and is more robust to noisy data.


49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and words/slots, respectively.

The product of the two matrices fills in the probability of hidden semantics:

|U| × (|W| + |S|) ≈ (|U| × d) (d × (|W| + |S|))

[Figure: word-observation / slot-candidate matrix with filled probabilities for train and test utterances]

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
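The low-rank decomposition can be sketched with plain gradient descent in numpy (a toy illustration with a made-up 2×5 observation matrix; the model in the talk is instead trained with the BPR objective on a much larger matrix):

```python
import numpy as np

# Toy utterance-by-(word + slot) observation matrix: 1 = observed, 0 = hidden.
M = np.array([
    [1., 1., 0., 1., 0.],
    [0., 1., 1., 0., 1.],
])

rng = np.random.default_rng(0)
d = 2                                              # latent dimension
U = rng.normal(scale=0.1, size=(M.shape[0], d))    # utterance factors
V = rng.normal(scale=0.1, size=(M.shape[1], d))    # word/slot factors

# Minimize the squared reconstruction error ||M - U V^T||^2.
for _ in range(8000):
    E = M - U @ V.T
    U += 0.05 * (E @ V)
    V += 0.05 * (E.T @ U)

# The product of the two low-rank matrices fills every cell with a score;
# with many utterances sharing latent structure, unobserved cells receive
# nonzero probabilities for plausible hidden semantics.
M_hat = U @ V.T
```

The shapes mirror the slide's equation: M is |U| × (|W|+|S|), U is |U| × d, and V^T is d × (|W|+|S|).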


50

Bayesian Personalized Ranking for MF

Model implicit feedback: do not treat unobserved facts as negative samples (true or false); instead, give observed facts higher scores than unobserved facts.

Objective: for each utterance u_x, maximize the sum of ln σ(f⁺ − f⁻) over pairs of an observed fact f⁺ and an unobserved fact f⁻.

The objective is to learn a set of well-ranked semantic slots per utterance.
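A minimal BPR-style training loop might look as follows (toy data and hyperparameters are invented; the real model ranks slot candidates per utterance over the full feature matrix):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
n_utt, n_slot, d, lr = 4, 6, 3, 0.05
U = rng.normal(scale=0.1, size=(n_utt, d))    # utterance factors
S = rng.normal(scale=0.1, size=(n_slot, d))   # slot factors

# Observed (utterance, slot) facts; remaining slots are merely
# unobserved, not negative.
observed = {0: [1, 2], 1: [0], 2: [3, 4], 3: [5]}

for _ in range(4000):
    u = int(rng.integers(n_utt))
    pos = int(rng.choice(observed[u]))        # f+: an observed fact
    neg = int(rng.integers(n_slot))           # f-: an unobserved fact
    while neg in observed[u]:
        neg = int(rng.integers(n_slot))
    # Gradient ascent on ln sigmoid(f+ - f-): push observed above unobserved.
    g = sigmoid(-(U[u] @ (S[pos] - S[neg])))
    u_vec = U[u].copy()
    U[u] += lr * g * (S[pos] - S[neg])
    S[pos] += lr * g * u_vec
    S[neg] -= lr * g * u_vec

scores = U @ S.T   # observed slots now rank above unobserved ones
```

Note that the unobserved slot only receives a relatively lower score, never a hard negative label, which matches the implicit-feedback assumption on the slide.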


51

[Figure: the full pipeline — Ontology Induction feeds word observations and slot candidates (Fw, Fs) into the factorized matrix via Structure Learning; training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" provide observed cells, and for the test utterance "show me a list of cheap restaurants" slot probabilities (e.g. .97, .90, .95, .85) are estimated]

Matrix Factorization SLU (MF-SLU)

MF-SLU can estimate probabilities for slot candidates given test utterances


52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[Figure: system flowchart — frame-semantic parsing over an unlabeled collection feeds Ontology Induction (Fw, Fs) into the Feature Model; semantic and lexical knowledge graphs feed the Word and Slot Relation Models (Rw, Rs) of the Knowledge Graph Propagation Model via Structure Learning; MF-SLU (SLU modeling by matrix factorization) decodes "can I have a cheap restaurant" into the semantic representation target="restaurant", price="cheap"]

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).


53

Experimental Setup

Dataset: Cambridge University SLU Corpus
Restaurant recommendation (WER = 37%); 2,166 dialogues; 15,453 utterances
Dialogue slots: addr, area, food, name, phone, postcode, price range, task, type

Metric: MAP of all estimated slot probabilities over all utterances
[Table: the mapping between induced slots and reference slots]

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
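The MAP metric over per-utterance slot rankings can be computed as below (a generic sketch of the metric, not the corpus's official scoring script; the example slots are illustrative):

```python
def average_precision(ranked, relevant):
    """AP of one ranked slot list against the reference slot set."""
    hits, score = 0, 0.0
    for i, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            score += hits / i
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, references):
    """Mean of per-utterance APs over the whole corpus."""
    aps = [average_precision(r, ref) for r, ref in zip(rankings, references)]
    return sum(aps) / len(aps)

# e.g. two utterances with estimated slot rankings vs. reference slots
rankings = [["food", "pricerange", "area"], ["area", "food"]]
references = [{"food", "pricerange"}, {"food"}]
print(mean_average_precision(rankings, references))  # (1.0 + 0.5) / 2 = 0.75
```

Each utterance's slot candidates are sorted by estimated probability and scored against its reference slot set, then the per-utterance average precisions are averaged.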


54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                      | ASR  | Transcripts
Baseline SLU: Support Vector Machine          | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8


55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                                     | ASR           | Transcripts
Baseline SLU: Support Vector Machine                         | 32.5          | 36.6
Baseline SLU: Multinomial Logistic Regression                | 34.0          | 38.8
Proposed MF-SLU: Feature Model                               | 37.6          | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)

The MF-SLU effectively models implicit information to decode semantics.

The structure information further improves the results.

The marked results are significantly better than the MLR baseline (p < 0.05, t-test).


56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                          | ASR           | Transcripts
Feature Model                                     | 37.6          | 45.3
Feature + Knowledge Graph Propagation: Semantic   | 41.4          | 51.6
Feature + Knowledge Graph Propagation: Dependency | 41.6          | 49.0
Feature + Knowledge Graph Propagation: All        | 43.5 (+15.7%) | 53.4 (+17.9%)

In the integrated structure information, both semantic and dependency relations are useful for understanding.

The marked results are significantly better than the MLR baseline (p < 0.05, t-test).


Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs.

[Figure: the induced ontology (slots such as locale_by_use, food, expensiveness, seeking, relational_quantity, desiring, linked by PREP_FOR, NN, AMOD, and DOBJ edges) beside the reference ontology with the most frequent syntactic dependencies (type, food, pricerange, task, area, linked by DOBJ, AMOD, and PREP_IN edges)]

The automatically learned domain ontology aligns well with the reference one

57

The data-driven ontology is more objective, while the expert-annotated one is more subjective.

58

Contributions of Semantic Decoding

[Figure: flowchart — Ontology Induction and Structure Learning (Knowledge Acquisition); Semantic Decoding and Intent Prediction (SLU Modeling)]

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.


59

Low- and High-Level Understanding

Semantic concepts for individual utterances do not consider high-level semantics (user intents). The follow-up behaviors usually correspond to user intents.

"can i have a cheap restaurant" → SLU Model → price="cheap", target="restaurant"; intent=navigation

"i plan to dine in legume tonight" → SLU Model → restaurant="legume", time="tonight"; intent=reservation


60

SDS Flowchart – Intent Prediction

[Figure: flowchart — Ontology Induction and Structure Learning (Knowledge Acquisition); Semantic Decoding and Intent Prediction (SLU Modeling), with Intent Prediction highlighted]


61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app

Output: the apps supporting the required functionality

Intent Identification: popular domains in Google Play

"please dial a phone call to alex" → Skype, Hangout, etc.

Intent Prediction of Mobile Apps [SLT'14c]

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.


63

Intent Prediction – Single-Turn Request

Input: single-turn request

Output: apps that are able to support the required functionality

[Figure: reasoning with feature-enriched MF — the test utterance "i would like to contact alex" is enriched with the semantic feature "communication" (weight .90); word observations (contact, message, email) and intended apps (Gmail, Outlook, Skype) form the matrix; app descriptions retrieved by IR (e.g. Outlook "...your email calendar contacts...", Gmail "...check and send emails, msgs...") provide self-training utterances, and test cells hold estimated probabilities (e.g. .90, .85, .97, .95)]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents


64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

Challenge: language ambiguity — 1) user preference; 2) app-level contexts

"send to vivian" → Email? Message? Communication (previous turn)

Idea: Behavioral patterns in history can help intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com


65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

[Figure: reasoning with feature-enriched MF — training dialogues pair user utterances with intended apps ("take this photo" → CAMERA; "tell vivian this is me in the lab" → IM; "check my grades on website" → CHROME; "send an email to professor" → EMAIL), using lexical features (photo, check, camera, tell, send, email) and behavior-history features (null, camera, chrome, email); for the test dialogue "take a photo of this" / "send it to alice", app probabilities (e.g. .85, .70, .95, .80, .55) are estimated]

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
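Schematically, the enriched matrix rows concatenate lexical, behavioral, and app-label blocks before factorization (the vocabularies and app names below are hypothetical, loosely mirroring the example dialogue):

```python
import numpy as np

vocab = ["photo", "check", "camera", "tell", "send", "email"]
behavior = ["null", "camera", "chrome", "email"]   # previously used app
apps = ["CAMERA", "IM", "CHROME", "EMAIL"]         # intended-app labels

def featurize(utt_words, prev_app, intended_app=None):
    """One matrix row: word block + behavior-history block + app block."""
    row = np.zeros(len(vocab) + len(behavior) + len(apps))
    for w in utt_words:
        row[vocab.index(w)] = 1
    row[len(vocab) + behavior.index(prev_app)] = 1
    if intended_app is not None:                   # known at training time only
        row[len(vocab) + len(behavior) + apps.index(intended_app)] = 1
    return row

# Training turns observe the intended app; the test turn leaves the app
# block empty, and the factorized model fills it with estimated probabilities.
X_train = np.stack([
    featurize(["photo"], "null", "CAMERA"),
    featurize(["tell"], "camera", "IM"),
])
x_test = featurize(["send"], "camera")
```

The behavior-history block is what lets the model personalize: the same words ("send") score different apps depending on which app was launched in the previous turn.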


66

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix   | ASR (LM) | ASR (MF-SLU) | Transcripts (LM) | Transcripts (MF-SLU)
Word Observation | 25.1     |              | 26.1             |

LM: LM-based IR model (unsupervised)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix   | ASR (MLR) | ASR (MF-SLU) | Transcripts (MLR) | Transcripts (MF-SLU)
Word Observation | 52.1      |              | 55.5              |

MLR: Multinomial Logistic Regression (supervised)


67

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix   | ASR (LM) | ASR (MF-SLU)  | Transcripts (LM) | Transcripts (MF-SLU)
Word Observation | 25.1     | 29.2 (+16.2%) | 26.1             | 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix   | ASR (MLR) | ASR (MF-SLU) | Transcripts (MLR) | Transcripts (MF-SLU)
Word Observation | 52.1      | 52.7 (+1.2%) | 55.5              | 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.


68

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                        | ASR (LM) | ASR (MF-SLU)  | Transcripts (LM) | Transcripts (MF-SLU)
Word Observation                      | 25.1     | 29.2 (+16.2%) | 26.1             | 30.4 (+16.4%)
Word + Embedding-Based Semantics      | 32.0     |               | 33.3             |
Word + Type-Embedding-Based Semantics | 31.5     |               | 32.9             |

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix             | ASR (MLR) | ASR (MF-SLU) | Transcripts (MLR) | Transcripts (MF-SLU)
Word Observation           | 52.1      | 52.7 (+1.2%) | 55.5              | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9      |              | 56.6              |

Semantic enrichment provides rich cues to improve performance.


69

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                        | ASR (LM) | ASR (MF-SLU)  | Transcripts (LM) | Transcripts (MF-SLU)
Word Observation                      | 25.1     | 29.2 (+16.2%) | 26.1             | 30.4 (+16.4%)
Word + Embedding-Based Semantics      | 32.0     | 34.2 (+6.8%)  | 33.3             | 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics | 31.5     | 32.2 (+2.1%)  | 32.9             | 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix             | ASR (MLR) | ASR (MF-SLU) | Transcripts (MLR) | Transcripts (MF-SLU)
Word Observation           | 52.1      | 52.7 (+1.2%) | 55.5              | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9      | 55.7 (+3.3%) | 56.6              | 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.


70

Contributions of Intent Prediction

[Figure: flowchart — Ontology Induction and Structure Learning (Knowledge Acquisition); Semantic Decoding and Intent Prediction (SLU Modeling), with Intent Prediction highlighted]

Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.


71

Personal Intelligent Architecture

[Figure: Reactive Assistance (ASR, LU, Dialog, LG, TTS) and Proactive Assistance (Inferences, User Modeling, Suggestions) built on a data back-end (data bases, services, and client signals), serving device/service end-points (phone, PC, Xbox, web browser, messaging apps); user experience: "call taxi"]


72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


73

Conclusions

The work shows the feasibility and the potential of improving the generalization, maintenance efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.


74

Future Work

Apply the proposed technology to domain discovery: find domains not covered by the current systems but of interest to users, and use them to guide the next domains to develop.

Improve the proposed approach by handling uncertainty: recognition errors from ASR, and unreliable knowledge in knowledge acquisition and SLU modeling.


75

Towards Unsupervised Deep Learning

[Figure: a convolutional architecture — word sequence x (w1, w2, …, wd) → word vectors lw → convolutional layer lc (convolution matrix Wc) → pooling operation → utterance and slot vectors lf → knowledge graph propagation layer lp (propagation matrix Wp) → semantic layer y (semantic projection matrix Ws), producing semantic relation scores R(U, S1), …, R(U, Sn) and posterior probabilities P(S1 | U), …, P(Sn | U) over slot candidates S1, …, Sn]


Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.
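This view can be sketched directly: the MF score is a single bilinear layer, and a nonlinearity plus extra weight matrices deepen it (the shapes and random values below are arbitrary; this only illustrates the architectural claim, not the proposed network):

```python
import numpy as np

rng = np.random.default_rng(2)
n_utt, n_slot, d, h = 3, 4, 5, 8

U = rng.normal(size=(n_utt, d))    # utterance latent vectors
S = rng.normal(size=(n_slot, d))   # slot latent vectors

# MF as a one-layer network: score(u, s) = u . s
shallow_scores = U @ S.T

# Adding a hidden layer with a nonlinearity turns it into a deeper scorer.
W1 = rng.normal(size=(d, h)) * 0.1
W2 = rng.normal(size=(h, d)) * 0.1
deep_scores = np.tanh(U @ W1) @ W2 @ S.T
```

Both scorers produce the same utterance-by-slot score matrix; the deep variant simply interposes learned nonlinear transformations between the latent vectors and the final dot product.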

76

Take Home Message

Big data is available w/o annotations; the challenge is how to acquire and organize important knowledge and further utilize it for applications.

Language understanding for AI: from language to action, e.g. understand voice to control music, lights, etc., or teach the system to let friends in by face recognition.


Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A

THANKS FOR YOUR ATTENTION

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.


  • Statistical Learning from Dialogues for Intelligence Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants
  • Why do we need them
  • Why do we need them (2)
  • Why do companies care
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymaxrsquos intelligence
  • SDS Architecture
  • Interaction Example
  • SDS Process ndash Available Domain Ontology
  • SDS Process ndash Available Domain Ontology (2)
  • SDS Process ndash Available Domain Ontology (3)
  • SDS Process ndash Spoken Language Understanding (SLU)
  • SDS Process ndash Spoken Language Understanding (SLU) (2)
  • SDS Process ndash Dialogue Management (DM)
  • SDS Process ndash Dialogue Management (DM) (2)
  • SDS Process ndash Dialogue Management (DM) (3)
  • SDS Process ndash Dialogue Management (DM) (4)
  • SDS Process ndash Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture ndash Contributions
  • SDS Flowchart
  • SDS Flowchart ndash Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLPrsquo15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a]
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
  • 1st Issue How to adapt generic slots to a domain-specific sett
  • Semantic Decoding [ACL-IJCNLPrsquo15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement SlotWord Embeddings Training (Levy and
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLPrsquo15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue How to model the unobserved hidden semantics Matrix
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLPrsquo15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding Quality of Semantics Estimatio
  • Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
  • Experiments of Semantic Decoding Effectiveness of Relations
  • Experiments for Structure Learning Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart ndash Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLTrsquo14c]
  • Intent Prediction ndash Single-Turn Request
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q amp A
Page 15: Statistical Learning from Dialogues for Intelligent Assistants

15

Interaction ExampleUser

Intelligent Agent Q How does a dialogue system process this request

Cheap Taiwanese eating places include Din Tai Fung Boiling Point etc What do you want to choose I can help you go there

find a cheap eating place for taiwanese food

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

16

SDS Process ndash Available Domain Ontology

find a cheap eating place for taiwanese foodUser

target

foodprice AMODNN

seeking PREP_FOR

Organized Domain KnowledgeIntelligent

Agent

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

17

SDS Process ndash Available Domain Ontology

User

target

foodprice AMODNN

seeking PREP_FOR

Organized Domain KnowledgeIntelligent

Agent

Ontology Induction(semantic slot)

find a cheap eating place for taiwanese food

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

18

SDS Process ndash Available Domain Ontology

User

target

foodprice AMODNN

seeking PREP_FOR

Organized Domain KnowledgeIntelligent

Agent

Ontology Induction(semantic slot)

Structure Learning(inter-slot relation)

find a cheap eating place for taiwanese food

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

19

SDS Process ndash Spoken Language Understanding (SLU)

User

target

foodprice AMODNN

seeking PREP_FORIntelligent

Agent

seeking=ldquofindrdquotarget=ldquoeating placerdquoprice=ldquocheaprdquofood=ldquotaiwaneserdquo

find a cheap eating place for taiwanese food

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

20

find a cheap eating place for taiwanese food

SDS Process ndash Spoken Language Understanding (SLU)

User

target

foodprice AMODNN

seeking PREP_FORIntelligent

Agent

seeking=ldquofindrdquotarget=ldquoeating placerdquoprice=ldquocheaprdquofood=ldquotaiwaneserdquo

Semantic Decoding

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

21

find a cheap eating place for taiwanese food

SDS Process ndash Dialogue Management (DM)

User

target

foodprice AMODNN

seeking PREP_FORSELECT restaurant restaurantprice=ldquocheaprdquo restaurantfood=ldquotaiwaneserdquoIntelligent

Agent

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

22

find a cheap eating place for taiwanese food

SDS Process ndash Dialogue Management (DM)

User

target

foodprice AMODNN

seeking PREP_FORSELECT restaurant restaurantprice=ldquocheaprdquo restaurantfood=ldquotaiwaneserdquoIntelligent

Agent

Surface Form Derivation(natural language)

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

23

SDS Process ndash Dialogue Management (DM)

User

SELECT restaurant restaurantprice=ldquocheaprdquo restaurantfood=ldquotaiwaneserdquo

Din Tai FungBoiling Point

Predicted intent navigation

Intelligent Agent

find a cheap eating place for taiwanese food

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

24

SDS Process ndash Dialogue Management (DM)

User

SELECT restaurant restaurantprice=ldquocheaprdquo restaurantfood=ldquotaiwaneserdquo

Din Tai FungBoiling Point

Predicted intent navigation

Intelligent Agent

Intent Prediction

find a cheap eating place for taiwanese food

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

25

SDS Process ndash Natural Language Generation (NLG)

User

Intelligent Agent

Cheap Taiwanese eating places include Din Tai Fung Boiling Point etc What do you want to choose I can help you go there (navigation)

find a cheap eating place for taiwanese food

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

26

Required Knowledge

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant restaurantprice=ldquocheaprdquo restaurantfood=ldquotaiwaneserdquo

Predicted intent navigation

User

Required Domain-Specific Information

find a cheap eating place for taiwanese food

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

27

Challenges for SDS An SDS in a new domain requires

1) A hand-crafted domain ontology2) Utterances labelled with semantic representations3) An SLU component for mapping utterances into semantic representations

Manual work results in high cost long duration and poor scalability of system development

The goal is to enable an SDS to 1) automatically infer domain knowledge and then to 2) create the data for SLU modelingin order to handle the open-domain requests

seeking=ldquofindrdquotarget=ldquoeating placerdquoprice=ldquocheaprdquofood=ldquoasian foodrdquo

find a cheap eating place for asian food

fully unsupervised

Prior Focus

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

28

Contributions

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant restaurantprice=ldquocheaprdquo restaurantfood=ldquoasian foodrdquo

Predicted intent navigation

find a cheap eating place for taiwanese foodUser

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

(natural language)

(inter-slot relation)

(semantic slot)

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

29

ContributionsUser

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

find a cheap eating place for taiwanese food

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

30

Ontology Induction Structure Learning Surface Form Derivation

Semantic Decoding Intent Prediction

ContributionsUser

Knowledge Acquisition SLU Modeling

find a cheap eating place for taiwanese food

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

31

Knowledge Acquisition1) Given unlabelled conversations how can a system automatically

induce and organize domain-specific concepts

Restaurant Asking

Conversations

target

foodprice

seeking

quantity

PREP_FOR

PREP_FOR

NN AMOD

AMODAMOD

Organized Domain Knowledge

Unlabelled Collection

Knowledge Acquisition

Knowledge Acquisition Ontology Induction Structure Learning Surface Form Derivation

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

32

SLU Modeling2) With the automatically acquired knowledge how can a system

understand utterance semantics and user intents

Organized Domain

Knowledge

price=ldquocheaprdquo target=ldquorestaurantrdquointent=navigation

SLU Modeling

SLU Component

ldquocan i have a cheap restaurantrdquo

SLU Modeling Semantic Decoding Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

33

SDS Architecture ndash Contributions

DomainDMASR SLU

NLG

Knowledge Acquisition SLU Modeling

current bottleneck

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

34

SDS Flowchart

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

35

SDS Flowchart ndash Semantic Decoding

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

36

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

37

Semantic Decoding [ACL-IJCNLPrsquo15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015

SLU Model

target=ldquorestaurantrdquoprice=ldquocheaprdquo

ldquocan I have a cheap restaurantrdquoFrame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology InductionFw Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

38

[Baker et al 1998 Das et al 2014]Frame-Semantic Parsing

FrameNet [Baker et al 1998] a linguistically semantic resource based on the frame-semantics theory wordsphrases can be represented as frames ldquolow fat milkrdquo ldquomilkrdquo evokes the ldquofoodrdquo frame

ldquolow fatrdquo fills the descriptor frame element

SEMAFOR [Das et al 2014] a state-of-the-art frame-semantics parser trained on manually annotated

FrameNet sentences

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

39

Ontology Induction [ASRUrsquo13 SLTrsquo14a]

can i have a cheap restaurant

Frame capability

Frame expensiveness

Frame locale by use

1st Issue differentiate domain-specific frames from generic frames for SDSs

GoodGood

Das et al Frame-semantic parsing in Proc of Computational Linguistics 2014

slot candidate

Best Student Paper Award

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

40

1

Utterance 1i would like a cheap restaurant Train

hellip hellip

hellip

cheap restaurant foodexpensiveness

1

locale_by_use

11

find a restaurant with chinese foodUtterance 2

1 1

food

1 1

1 Test

1 97 95

Frame Semantic Parsing

show me a list of cheap restaurantsTest Utterance

Word Observation Slot Candidate

Ontology Induction [ASRUrsquo13 SLTrsquo14a]Best Student Paper Award

Idea increase weights of domain-specific slots and decrease weights of others

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

41

1st Issue How to adapt generic slots to a domain-specific setting

Knowledge Graph Propagation Model Assumption domain-specific wordsslots have more dependencies to each other

Word Relation Model Slot Relation Model

word relation matrix

slot relation matrix

times

1

Word Observation Slot CandidateTrain

cheap restaurant foodexpensiveness

1

locale_by_use

11

1 1

food

1 1

1 Test

1

1

Slot Induction

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph so that domain-specific wordsslots have higher scores after matrix multiplication

i like

1 1

capability

1

locale_by_use

food expensiveness

seeking

relational_quantitydesiring

Utterance 1i would like a cheap restaurant

hellip hellip

find a restaurant with chinese foodUtterance 2

show me a list of cheap restaurantsTest Utterance

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

42

Semantic Decoding [ACL-IJCNLPrsquo15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015

SLU Model

target=ldquorestaurantrdquoprice=ldquocheaprdquo

ldquocan I have a cheap restaurantrdquoFrame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology InductionFw Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

43

Knowledge Graph Construction Syntactic dependency parsing on utterances

ccomp

amoddobjnsubj det

can i have a cheap restaurantcapability expensiveness locale_by_use

Word-based lexical knowledge graph

Slot-based semantic knowledge graph

[Figure: the word-based lexical knowledge graph connects can, i, have, a, cheap, restaurant; the slot-based semantic knowledge graph connects capability, locale_by_use, expensiveness.]

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

[Figure: dependency-based word embeddings (e.g., for "can", "have") and dependency-based slot embeddings (e.g., for expensiveness, capability) are trained from the dependency-parsed utterance "can i have a cheap restaurant" (ccomp, nsubj, dobj, det, amod arcs; slots capability, expensiveness, locale_by_use).]

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL 2014
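As a rough illustration of the dependency-based contexts these embeddings are trained on, the snippet below extracts relation-labelled neighbors (with the "-1" marker for inverse relations, as in Levy and Goldberg); the edge list is a hand-written stand-in for real parser output:

```python
# Dependency-based contexts: each word's contexts are its dependency
# neighbors labelled with the relation, not a linear window.
# Hand-written parse of "can i have a cheap restaurant" (as on the slide).
edges = [
    ("have", "can", "ccomp"), ("have", "i", "nsubj"),
    ("have", "restaurant", "dobj"), ("restaurant", "a", "det"),
    ("restaurant", "cheap", "amod"),
]

def contexts(word):
    out = []
    for head, dep, rel in edges:
        if head == word:
            out.append(f"{dep}/{rel}")       # word governs dep via rel
        if dep == word:
            out.append(f"{head}/{rel}-1")    # inverse relation marker
    return out

print(contexts("restaurant"))  # → ['have/dobj-1', 'a/det', 'cheap/amod']
```

These (word, context) pairs would then be fed to a skip-gram style trainer to obtain the word and slot embeddings.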

Edge Weight Measurement: compute edge weights to represent relation importance

Slot-to-slot semantic relation: similarity between slot embeddings
Slot-to-slot dependency relation: dependency score between slot embeddings
Word-to-word semantic relation: similarity between word embeddings
Word-to-word dependency relation: dependency score between word embeddings
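The semantic edge weights come down to similarity between embedding vectors; a minimal sketch with cosine similarity on toy 3-d vectors (the real vectors come from the trained dependency-based embeddings above):

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity: the semantic edge weight between two nodes.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 3-d embeddings standing in for dependency-based word vectors.
cheap = np.array([0.9, 0.1, 0.2])
expensive = np.array([0.8, 0.2, 0.1])
restaurant = np.array([0.1, 0.9, 0.3])

# Semantically related words get a heavier edge in the knowledge graph.
assert cosine(cheap, expensive) > cosine(cheap, restaurant)
```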

[Figure: word nodes w1–w7 and slot nodes s1–s3 connected by weighted semantic and dependency edges, whose weights are summed per relation type.]

[Figure: the word observation / slot candidate matrix over train and test utterances is multiplied by the word relation matrix R_w^(SD) (word relation model) and the slot relation matrix R_s^(SD) (slot relation model) for slot induction.]

Structure information is integrated to make the self-training data more reliable

[Figure: ontology induction and structure learning feed the word observation / slot candidate matrix built from Utterance 1 "i would like a cheap restaurant" and Utterance 2 "find a restaurant with chinese food"; for the test utterance "show me a list of cheap restaurants" the model estimates slot probabilities (e.g., .97, .90, .95, .85), including hidden semantics.]

2nd Issue: unobserved semantics may benefit understanding

Semantic Decoding [ACL-IJCNLP'15]

Reasoning with Matrix Factorization

[Figure: the feature model (word observation / slot candidate matrix over train and test utterances) is combined with the knowledge graph propagation model (word relation matrix R_w^(SD) and slot relation matrix R_s^(SD)); unobserved cells receive estimated probabilities (e.g., .97, .90, .95, .85, .93, .92, .98) for slot induction.]

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and wordsslots respectively

The product of two matrices fills the probability of hidden semantics

[Figure: the word observation / slot candidate matrix over train and test utterances, with unobserved cells filled by estimated probabilities (e.g., .97, .90, .95, .85, .93, .92, .98, .05, .05).]

The |U| × (|W|+|S|) matrix is approximated as (|U| × d) × (d × (|W|+|S|)), where U is the set of utterances, W the words, S the slots, and d the latent dimension.

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI 2009
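A minimal sketch of the low-rank idea, using truncated SVD on a toy matrix (the actual model is learned with BPR rather than SVD, and the values below are illustrative):

```python
import numpy as np

# Toy utterance-by-feature matrix: rows are utterances, columns are observed
# words/slots (1 = observed, 0 = unobserved). Values are illustrative only.
M = np.array([
    [1.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 0.0, 1.0, 0.0],
])
d = 2  # latent dimension

# Rank-d factorization via truncated SVD: M is approximated by the product of
# an |utterances| x d matrix and a d x |features| matrix, so unobserved cells
# receive real-valued scores standing in for hidden semantics.
u, s, vt = np.linalg.svd(M, full_matrices=False)
M_hat = (u[:, :d] * s[:d]) @ vt[:d, :]
print(M_hat.round(2))
```

The two factors are exactly the |U| × d and d × (|W|+|S|) latent matrices in the formula above.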

Bayesian Personalized Ranking for MF: model implicit feedback
1) do not treat unobserved facts as negative samples (true or false)
2) give observed facts higher scores than unobserved facts

Objective: for each utterance, maximize ln σ(f⁺ − f⁻), i.e., the score f⁺ of an observed fact should exceed the score f⁻ of an unobserved fact

The objective is to learn a set of well-ranked semantic slots per utterance
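A minimal sketch of the pairwise ranking loss (the negative of the BPR log-likelihood above), with hypothetical scores:

```python
import numpy as np

def bpr_loss(f_pos, f_neg):
    # Negative log-sigmoid of the score margin: minimizing this pushes
    # observed facts (f_pos) above unobserved ones (f_neg).
    return -np.log(1.0 / (1.0 + np.exp(-(f_pos - f_neg))))

# A well-ranked pair incurs less loss than a mis-ranked one.
assert bpr_loss(2.0, 0.5) < bpr_loss(0.5, 2.0)
```

In training, the gradient of this loss updates the latent utterance and word/slot factors for each sampled (observed, unobserved) pair.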

[Figure: with the induced ontology and learned structure, the MF model estimates slot probabilities (e.g., .97, .90, .95, .85) for the test utterance "show me a list of cheap restaurants".]

Matrix Factorization SLU (MF-SLU)

MF-SLU can estimate probabilities for slot candidates given test utterances

Semantic Decoding [ACL-IJCNLP'15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant" → Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology InductionFw Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)

Experimental Setup. Dataset: Cambridge University SLU Corpus

Restaurant recommendation (WER = 37%), 2166 dialogues, 15453 utterances; dialogue slots: addr, area, food, name, phone, postcode, pricerange, task, type

Metric: MAP of all estimated slot probabilities over all utterances. The mapping table between induced and reference slots is used for evaluation

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT 2012
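The MAP metric can be sketched as the mean of per-utterance average precisions; the slot names and rankings below are toy values, not corpus results:

```python
def average_precision(ranked, relevant):
    """AP for one utterance: ranked slot list vs. the reference slot set."""
    hits, score = 0, 0.0
    for i, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            score += hits / i  # precision at each relevant rank
    return score / max(len(relevant), 1)

# MAP is the mean of per-utterance APs (toy example).
ap1 = average_precision(["price", "food", "area"], {"price", "area"})
ap2 = average_precision(["food", "price"], {"price"})
map_score = (ap1 + ap2) / 2
print(round(map_score, 3))  # → 0.667
```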

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

Approach: ASR / Transcripts
Baseline SLU, Support Vector Machine: 32.5 / 36.6
Baseline SLU, Multinomial Logistic Regression: 34.0 / 38.8

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

The MF-SLU effectively models implicit information to decode semantics

The structure information further improves the results

Approach: ASR / Transcripts
Baseline SLU, Support Vector Machine: 32.5 / 36.6
Baseline SLU, Multinomial Logistic Regression: 34.0 / 38.8
Proposed MF-SLU, Feature Model: 37.6 / 45.3
Proposed MF-SLU, Feature Model + Knowledge Graph Propagation: 43.5 (+27.9%) / 53.4 (+37.6%)

the result is significantly better than the MLR baseline with p < 0.05 in a t-test

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

In the integrated structure information both semantic and dependency relations are useful for understanding

Approach: ASR / Transcripts
Feature Model: 37.6 / 45.3
Feature + Knowledge Graph Propagation (Semantic relations): 41.4 / 51.6
Feature + Knowledge Graph Propagation (Dependency relations): 41.6 / 49.0
Feature + Knowledge Graph Propagation (All relations): 43.5 (+15.7%) / 53.4 (+17.9%)

the result is significantly better than the MLR baseline with p < 0.05 in a t-test

Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs

The reference ontology with the most frequent syntactic dependencies

[Figure: the automatically learned ontology (locale_by_use, food, expensiveness, seeking, relational_quantity, desiring, with AMOD, NN, DOBJ and PREP_FOR edges) side by side with the reference ontology annotated with the most frequent syntactic dependencies (type, food, pricerange, task, area, with AMOD, DOBJ and PREP_IN edges).]

The automatically learned domain ontology aligns well with the reference one

The data-driven one is more objective while expert-annotated one is more subjective

Contributions of Semantic Decoding

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding

Low- and High-Level Understanding: semantic concepts for individual utterances do not consider high-level semantics (user intents)

The follow-up behaviors usually correspond to user intents

price="cheap", target="restaurant"

SLU Model

"can i have a cheap restaurant"

intent=navigation

restaurant="legume", time="tonight"

SLU Model

"i plan to dine in legume tonight"

intent=reservation

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart – Intent Prediction

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app

Output: the apps supporting the required functionality

Intent Identification: popular domains in Google Play

"please dial a phone call to alex"

Skype, Hangout, etc.

Intent Prediction of Mobile Apps [SLT'14c]

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT 2014

Input: single-turn request

Output: apps that are able to support the required functionality

Intent Prediction – Single-Turn Request

[Figure: for the utterance "i would like to contact alex", IR retrieves app candidates from app descriptions (Outlook: "... your email calendar contacts ..."; Gmail: "... check and send emails msgs ..."); feature enrichment adds semantics (communication, .90), and reasoning with feature-enriched MF fills the word observation / intended app matrix (contact, message, email; Gmail, Outlook, Skype) with estimated probabilities (e.g., .90, .85, .97, .95) over self-train and test utterances.]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
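The IR step that proposes app candidates can be sketched as retrieval by lexical overlap with app descriptions; the app names and description strings below are illustrative stand-ins for a real app-store index, and the actual system uses an LM-based IR model rather than raw overlap:

```python
# Toy retrieval of app candidates by lexical overlap with app descriptions.
apps = {
    "Gmail": "check and send emails msgs",
    "Outlook": "your email calendar contacts",
    "Camera": "take photos and record video",
}

def rank_apps(utterance):
    # Score each app by the number of shared words with the request.
    words = set(utterance.lower().split())
    scores = {app: len(words & set(desc.split())) for app, desc in apps.items()}
    return sorted(scores, key=scores.get, reverse=True)

print(rank_apps("check and send an email")[0])  # → Gmail
```

The retrieved candidates then become columns of the matrix that the feature-enriched MF reasons over.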

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: the apps the user plans to launch

Challenge: language ambiguity, 1) user preference, 2) app-level contexts

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015. Data available at http://AppDialogue.com

send to vivian

Email / Message (Communication)

Idea: behavioral patterns in history can help intent prediction

previous turn

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: the apps the user plans to launch

[Figure: training dialogues pair utterances with intended apps and behavior history, e.g., "take this photo" (CAMERA) then "tell vivian this is me in the lab" (IM), and "check my grades on websites" (CHROME) then "send an email to professor" (EMAIL); for the test dialogue "take a photo of this" then "send it to alice" (CAMERA, IM), reasoning with feature-enriched MF over lexical features (photo, check, camera, tell, send, email, chrome) and behavior history estimates app probabilities (e.g., .85, .70, .95, .80, .55).]

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Single-turn, Feature Matrix: ASR (LM), Transcripts (LM)
Word Observation: 25.1, 26.1

Multi-turn, Feature Matrix: ASR (MLR), Transcripts (MLR)
Word Observation: 52.1, 55.5

LM-Based IR Model (unsupervised)

Multinomial Logistic Regression (supervised)

Experiments for Intent Prediction

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Single-turn, Feature Matrix: ASR (LM / MF-SLU), Transcripts (LM / MF-SLU)
Word Observation: 25.1 / 29.2 (+16.2%), 26.1 / 30.4 (+16.4%)

Multi-turn, Feature Matrix: ASR (MLR / MF-SLU), Transcripts (MLR / MF-SLU)
Word Observation: 52.1 / 52.7 (+1.2%), 55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction especially for noisy data

Experiments for Intent Prediction

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Single-turn, Feature Matrix: ASR (LM / MF-SLU), Transcripts (LM / MF-SLU)
Word Observation: 25.1 / 29.2 (+16.2%), 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics: 32.0 / -, 33.3 / -
Word + Type-Embedding-Based Semantics: 31.5 / -, 32.9 / -

Multi-turn, Feature Matrix: ASR (MLR / MF-SLU), Transcripts (MLR / MF-SLU)
Word Observation: 52.1 / 52.7 (+1.2%), 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns: 53.9 / -, 56.6 / -

Semantic enrichment provides rich cues to improve performance

Experiments for Intent Prediction

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Single-turn, Feature Matrix: ASR (LM / MF-SLU), Transcripts (LM / MF-SLU)
Word Observation: 25.1 / 29.2 (+16.2%), 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics: 32.0 / 34.2 (+6.8%), 33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics: 31.5 / 32.2 (+2.1%), 32.9 / 34.0 (+3.4%)

Multi-turn, Feature Matrix: ASR (MLR / MF-SLU), Transcripts (MLR / MF-SLU)
Word Observation: 52.1 / 52.7 (+1.2%), 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns: 53.9 / 55.7 (+3.3%), 56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics

Experiments for Intent Prediction

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction: Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Back-end Data Bases, Services and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

Conclusions: the work shows the feasibility and the potential for improving generalization, maintenance, efficiency and scalability of SDSs

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors

Future Work: apply the proposed technology to domain discovery, i.e., domains not covered by the current systems but that users are interested in, to guide the next developed domains

Improve the proposed approach by handling uncertainty: recognition errors from ASR (for SLU modeling) and unreliable knowledge (for knowledge acquisition)

Towards Unsupervised Deep Learning

[Figure: a convolutional architecture: a word sequence x (w1, w2, ..., wd) passes through word vectors lw, a convolutional layer lc (convolution matrix Wc), pooling into an utterance vector lf, a knowledge graph propagation layer lp (matrix Wp), and a semantic projection matrix Ws into the semantic layer y, producing semantic relations R(U, S1), ..., R(U, Sn) and posterior probabilities P(S1 | U), ..., P(Sn | U) over slot candidates S1, ..., Sn.]

Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning

Take Home Message: available big data w/o annotations

Challenge: how to acquire and organize important knowledge and further utilize it for applications

Language understanding for AI: language → action, e.g., understand voice commands to control music, lights, etc.; teach the system to let friends in by face recognition, etc.

Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

Q & A. THANKS FOR YOUR ATTENTION!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU 2013 (Best Student Paper Award)

• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT 2015
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT 2014
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," extended abstract of NIPS-SLU 2015
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP 2016

SDS Process – Available Domain Ontology

User: "find a cheap eating place for taiwanese food"

target

foodprice AMODNN

seeking PREP_FOR

Organized Domain Knowledge

Intelligent Agent

SDS Process – Available Domain Ontology

User

target

foodprice AMODNN

seeking PREP_FOR

Organized Domain Knowledge

Intelligent Agent

Ontology Induction (semantic slot)

find a cheap eating place for taiwanese food

SDS Process – Available Domain Ontology

User

target

foodprice AMODNN

seeking PREP_FOR

Organized Domain Knowledge

Intelligent Agent

Ontology Induction (semantic slot)

Structure Learning (inter-slot relation)

find a cheap eating place for taiwanese food

SDS Process – Spoken Language Understanding (SLU)

User

target

foodprice AMODNN

seeking PREP_FOR

Intelligent Agent

seeking="find", target="eating place", price="cheap", food="taiwanese"

find a cheap eating place for taiwanese food

find a cheap eating place for taiwanese food

SDS Process – Spoken Language Understanding (SLU)

User

target

foodprice AMODNN

seeking PREP_FOR

Intelligent Agent

seeking="find", target="eating place", price="cheap", food="taiwanese"

Semantic Decoding

find a cheap eating place for taiwanese food

SDS Process – Dialogue Management (DM)

User

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant restaurant.price="cheap" restaurant.food="taiwanese"

Intelligent Agent

find a cheap eating place for taiwanese food

SDS Process – Dialogue Management (DM)

User

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant restaurant.price="cheap" restaurant.food="taiwanese"

Intelligent Agent

Surface Form Derivation (natural language)

SDS Process – Dialogue Management (DM)

User

SELECT restaurant restaurant.price="cheap" restaurant.food="taiwanese"

Din Tai Fung, Boiling Point

Predicted intent: navigation

Intelligent Agent

find a cheap eating place for taiwanese food

SDS Process – Dialogue Management (DM)

User

SELECT restaurant restaurant.price="cheap" restaurant.food="taiwanese"

Din Tai Fung, Boiling Point

Predicted intent: navigation

Intelligent Agent

Intent Prediction

find a cheap eating place for taiwanese food

SDS Process – Natural Language Generation (NLG)

User

Intelligent Agent

Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which one do you want to choose? I can help you go there (navigation).

find a cheap eating place for taiwanese food

Required Knowledge

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant restaurant.price="cheap" restaurant.food="taiwanese"

Predicted intent navigation

User

Required Domain-Specific Information

find a cheap eating place for taiwanese food

Challenges for SDS: an SDS in a new domain requires 1) a hand-crafted domain ontology, 2) utterances labelled with semantic representations, and 3) an SLU component for mapping utterances into semantic representations

Manual work results in high cost, long duration and poor scalability of system development

The goal is to enable an SDS to 1) automatically infer domain knowledge and then to 2) create the data for SLU modeling, in order to handle open-domain requests

seeking="find", target="eating place", price="cheap", food="asian food"

find a cheap eating place for asian food

fully unsupervised

Prior Focus

Contributions

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant restaurant.price="cheap" restaurant.food="asian food"

Predicted intent navigation

find a cheap eating place for taiwanese foodUser

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

(natural language)

(inter-slot relation)

(semantic slot)

ContributionsUser

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

find a cheap eating place for taiwanese food

Ontology Induction Structure Learning Surface Form Derivation

Semantic Decoding Intent Prediction

ContributionsUser

Knowledge Acquisition SLU Modeling

find a cheap eating place for taiwanese food

Knowledge Acquisition: 1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

[Figure: an unlabelled collection of restaurant-asking conversations is organized into domain knowledge, a graph over seeking, target, food, price and quantity with PREP_FOR, NN and AMOD edges.]

Knowledge Acquisition

Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation

SLU Modeling: 2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

Organized Domain

Knowledge

price="cheap", target="restaurant", intent=navigation

SLU Modeling

SLU Component

"can i have a cheap restaurant"

SLU Modeling: Semantic Decoding, Intent Prediction

SDS Architecture – Contributions

[Figure: pipeline of ASR, SLU, DM and NLG built on the domain ontology.]

Knowledge Acquisition SLU Modeling

current bottleneck

SDS Flowchart

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart – Semantic Decoding

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

Semantic Decoding [ACL-IJCNLP'15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant" → Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology InductionFw Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation

Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistically-motivated semantic resource based on the frame-semantics theory; words/phrases can be represented as frames, e.g., in "low fat milk", "milk" evokes the "food" frame and "low fat" fills the descriptor frame element

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser, trained on manually annotated FrameNet sentences

Ontology Induction [ASRU'13, SLT'14a]

can i have a cheap restaurant

Frame capability

Frame expensiveness

Frame locale by use

1st Issue differentiate domain-specific frames from generic frames for SDSs

GoodGood

Das et al Frame-semantic parsing in Proc of Computational Linguistics 2014

slot candidate

Best Student Paper Award

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

40

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

[Matrix figure: the train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants" form a Word Observation / Slot Candidate matrix with binary entries over words (cheap, restaurant, food, …) and slot candidates (expensiveness, locale_by_use, food), plus estimated test probabilities (e.g. .97, .95).]

Idea: increase weights of domain-specific slots and decrease weights of others

41

1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.

[Figure: the word relation matrix (Word Relation Model) and the slot relation matrix (Slot Relation Model) multiply the Word Observation / Slot Candidate training matrix for slot induction.]

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
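To make the propagation step concrete, here is a minimal sketch: one multiplication of an observed word vector by a word relation matrix. The vocabulary, edge weights, and matrix values are made up for illustration, not the paper's learned matrices.

```python
import numpy as np

# Toy vocabulary: two domain-specific words that are linked in the
# lexical knowledge graph, plus one generic word with no strong neighbors.
words = ["cheap", "restaurant", "like"]

# Hypothetical word relation matrix R_w (made-up edge weights):
# "cheap" and "restaurant" reinforce each other; "like" is isolated.
R_w = np.array([
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# Observed word vector for one utterance (each word appears once).
f = np.array([1.0, 1.0, 1.0])

# One propagation step: connected domain-specific words accumulate
# weight from their neighbors, so their scores rise above "like".
scores = f @ R_w
print(dict(zip(words, scores)))
```

After the multiplication, "cheap" and "restaurant" score 1.9 while the isolated generic word stays at 1.0, which is the effect the slide describes.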

[Knowledge graph figure: slot nodes capability, locale_by_use, food, expensiveness, seeking, relational_quantity, and desiring linked over the train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants".]

42

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[Architecture figure: Frame-Semantic Parsing over an unlabeled collection ("can I have a cheap restaurant") feeds Ontology Induction (feature matrices Fw, Fs → Feature Model) and Structure Learning (relation matrices Rw, Rs from the lexical and semantic KGs → Word and Slot Relation Models of the Knowledge Graph Propagation Model); MF-SLU (SLU Modeling by Matrix Factorization) then produces the semantic representation, e.g. target="restaurant", price="cheap".]

43

Knowledge Graph Construction: syntactic dependency parsing on utterances

[Parse figure: "can i have a cheap restaurant" with dependency arcs ccomp, nsubj, dobj, amod, det and frames capability, expensiveness, locale_by_use.]

Word-based lexical knowledge graph: word nodes (can, i, have, a, cheap, restaurant) with word–word edges.

Slot-based semantic knowledge graph: slot nodes (capability, locale_by_use, expensiveness) with slot–slot edges.

44

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

[Figure: dependency-based word embeddings (e.g. for "can", "have") and dependency-based slot embeddings (e.g. for expensiveness, capability) are trained from the parsed utterance "can i have a cheap restaurant" (arcs ccomp, nsubj, dobj, amod, det).]

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.

45

Edge Weight Measurement: compute edge weights to represent relation importance

Slot-to-slot semantic relation: similarity between slot embeddings
Slot-to-slot dependency relation: dependency score between slot embeddings
Word-to-word semantic relation: similarity between word embeddings
Word-to-word dependency relation: dependency score between word embeddings
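As an illustration, a semantic-relation edge weight can be taken as the cosine similarity between two embeddings. The 4-dimensional vectors below are invented for the sketch; real dependency-based embeddings are learned from the parsed corpus.

```python
import numpy as np

def cosine(u, v):
    """Semantic relation weight: cosine similarity of two embeddings."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical 4-dimensional embeddings for three slots.
expensiveness = np.array([0.8, 0.1, 0.3, 0.0])
pricerange    = np.array([0.7, 0.2, 0.4, 0.1])
capability    = np.array([0.0, 0.9, 0.0, 0.8])

# Domain-related slots get a heavier edge than unrelated ones.
w_related   = cosine(expensiveness, pricerange)
w_unrelated = cosine(expensiveness, capability)
assert w_related > w_unrelated
```

The same computation applies to word–word edges; dependency-relation weights use a dependency score between the embeddings instead of plain cosine.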

[Graph figure: word nodes w1–w7 and slot nodes s1–s3, with edge weights summing the semantic and dependency relation scores.]

46

[Figure: the Knowledge Graph Propagation Model multiplies the word relation matrix Rw^(SD) and the slot relation matrix Rs^(SD) with the Word Observation / Slot Candidate training matrix for slot induction.]

Structure information is integrated to make the self-training data more reliable.

47

[Figure: Ontology Induction provides Fw, Fs to SLU together with Structure Learning; in the Word Observation / Slot Candidate matrix, the test utterance "show me a list of cheap restaurants" carries hidden semantics, with estimated slot probabilities (e.g. .97, .90, .95, .85).]

2nd Issue: unobserved semantics may benefit understanding

Semantic Decoding [ACL-IJCNLP'15]

48

Reasoning with Matrix Factorization

[Figure: Feature Model + Knowledge Graph Propagation Model (Rw^(SD), Rs^(SD)); matrix factorization fills the unobserved cells of the Word Observation / Slot Candidate matrix with estimated probabilities (e.g. .97, .90, .95, .85, .93, .92, .98, .05).]

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.

49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and words/slots, respectively.

The product of the two matrices fills in the probabilities of the hidden semantics.

[Figure: the |U| × (|W|+|S|) utterance-by-feature matrix, with MF-estimated probabilities filling the unobserved cells.]

|U| × (|W|+|S|)  ≈  (|U| × d) · (d × (|W|+|S|))

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
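A minimal sketch of the completion idea follows: plain SGD on squared error over observed cells only, so the low-rank structure fills the unobserved cell. The toy matrix, latent dimension, and learning rate are assumptions; the paper actually trains with the BPR objective described on the next slide.

```python
import numpy as np

rng = np.random.default_rng(0)

# Utterance-by-feature matrix (words + slot candidates); 0.0 = unobserved.
M = np.array([
    [1.0, 1.0, 1.0, 0.0],   # last cell: slot present but not observed
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
])
observed = M > 0

d = 2  # latent dimension
U = rng.normal(scale=0.1, size=(3, d))   # utterance factors
V = rng.normal(scale=0.1, size=(4, d))   # word/slot factors

# SGD on squared error, updating only from observed cells.
for _ in range(2000):
    for i, j in zip(*np.nonzero(observed)):
        err = M[i, j] - U[i] @ V[j]
        U[i], V[j] = U[i] + 0.05 * err * V[j], V[j] + 0.05 * err * U[i]

# The product of the factors fills in the missing cell.
completed = U @ V.T
print(round(float(completed[0, 3]), 2))
```

Because the observed pattern is effectively rank one, the filled-in cell lands near 1, which is exactly the "hidden semantics" behavior the slide claims.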

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

50

Bayesian Personalized Ranking for MF: model implicit feedback

do not treat unobserved facts as negative samples (true or false); give observed facts higher scores than unobserved facts

Objective: for each utterance u, maximize the sum of ln σ(f⁺ − f⁻), ranking each observed fact f⁺ above the unobserved facts f⁻

The objective is to learn a set of well-ranked semantic slots per utterance
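The pairwise ranking idea can be sketched as follows; the scores are hypothetical, whereas the real model scores each fact with the factorized matrices.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bpr_term(f_pos, f_neg):
    """One BPR objective term: ln σ(f⁺ − f⁻). It is maximized when the
    observed fact f⁺ is scored well above the unobserved fact f⁻."""
    return math.log(sigmoid(f_pos - f_neg))

# A well-ranked pair contributes a near-zero term; a mis-ranked pair
# contributes a strongly negative one, pushing the model to reorder.
good = bpr_term(2.0, -1.0)
bad  = bpr_term(-1.0, 2.0)
assert good > bad
```

Summing such terms over all (observed, unobserved) pairs of an utterance yields the well-ranked slot list the slide describes, without ever labeling unobserved facts as false.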


51

[Figure: Ontology Induction and Structure Learning feed SLU (Fw, Fs); for the test utterance "show me a list of cheap restaurants", the matrix shows estimated slot probabilities (e.g. .97, .90, .95, .85).]

Matrix Factorization SLU (MF-SLU)

MF-SLU can estimate probabilities for slot candidates given test utterances.

52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[Architecture figure: Frame-Semantic Parsing over an unlabeled collection ("can I have a cheap restaurant") feeds Ontology Induction (feature matrices Fw, Fs → Feature Model) and Structure Learning (relation matrices Rw, Rs from the lexical and semantic KGs → Word and Slot Relation Models of the Knowledge Graph Propagation Model); MF-SLU (SLU Modeling by Matrix Factorization) then produces the semantic representation, e.g. target="restaurant", price="cheap".]

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)

53

Experimental Setup

Dataset: Cambridge University SLU Corpus, restaurant recommendation (WER = 37%); 2,166 dialogues; 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type

Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
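For reference, MAP over utterances can be computed as below. The slot names are toy examples; the actual evaluation additionally maps induced slots onto reference slots first.

```python
def average_precision(ranked, relevant):
    """AP of one utterance's ranked slot list against its reference slots."""
    hits, score = 0, 0.0
    for rank, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            score += hits / rank
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(all_ranked, all_relevant):
    """MAP: mean of the per-utterance average precisions."""
    aps = [average_precision(r, g) for r, g in zip(all_ranked, all_relevant)]
    return sum(aps) / len(aps)

# Hypothetical case: perfect ranking for one utterance, partial for another.
map_score = mean_average_precision(
    [["price range", "food"], ["area", "food"]],
    [{"price range", "food"}, {"food"}],
)
print(round(map_score, 2))  # → 0.75
```

The first utterance scores AP = 1.0 (both reference slots ranked on top) and the second AP = 0.5 (the single reference slot appears at rank 2), giving MAP = 0.75.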


54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8

55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

The MF-SLU effectively models implicit information to decode semantics.

The structure information further improves the results.

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
Proposed MF-SLU: Feature Model | 37.6 | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)

The result is significantly better than the MLR baseline with p < 0.05 in a t-test.

56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

In the integrated structure information, both semantic and dependency relations are useful for understanding.

Approach | ASR | Transcripts
Feature Model | 37.6 | 45.3
Feature + Knowledge Graph Propagation: Semantic | 41.4 | 51.6
Feature + Knowledge Graph Propagation: Dependency | 41.6 | 49.0
Feature + Knowledge Graph Propagation: All | 43.5 (+15.7%) | 53.4 (+17.9%)

The result is significantly better than the MLR baseline with p < 0.05 in a t-test.

Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs.

The reference ontology with the most frequent syntactic dependencies:

[Ontology figures: the induced ontology links seeking, desiring, locale_by_use, food, expensiveness, and relational_quantity via PREP_FOR, NN, AMOD, and DOBJ edges; the reference ontology links task, type, area, food, and pricerange via DOBJ, AMOD, and PREP_IN edges.]

The automatically learned domain ontology aligns well with the reference one.

57

The data-driven ontology is more objective, while the expert-annotated one is more subjective.

58

Contributions of Semantic Decoding

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) and SLU Modeling (Semantic Decoding, Intent Prediction).]

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.

MF-SLU for Semantic Decoding is able to: 1) unify the automatically acquired knowledge; 2) adapt to a domain-specific setting; 3) and then allow systems to model implicit semantics for better understanding.

59

Low- and High-Level Understanding

Semantic concepts for individual utterances do not consider high-level semantics (user intents).

The follow-up behaviors usually correspond to user intents.

"can i have a cheap restaurant" → SLU Model → price="cheap", target="restaurant"; intent=navigation

"i plan to dine in legume tonight" → SLU Model → restaurant="legume", time="tonight"; intent=reservation

60

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) and SLU Modeling (Semantic Decoding, Intent Prediction).]

SDS Flowchart – Intent Prediction

61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances for making requests about launching an app

Output: the apps supporting the required functionality

Intent Identification: popular domains in Google Play

"please dial a phone call to alex" → Skype, Hangout, etc.

Intent Prediction of Mobile Apps [SLT'14c]

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.

63

Input: single-turn request

Output: apps that are able to support the required functionality

Intent Prediction – Single-Turn Request

[Figure: reasoning with feature-enriched MF for the utterance "i would like to contact alex". IR retrieves app candidates from app descriptions ("… your email calendar contacts …" → Outlook; "… check and send emails msgs …" → Gmail) to build self-train utterances; word observations (contact, message, email, …), enriched semantics (communication, 0.90), and intended apps (Gmail, Outlook, Skype) form the train/test matrix with estimated probabilities (e.g. .90, .85, .97, .95).]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
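A sketch of the unification step: the word-observation block, the enriched-semantics block, and the intended-app block are concatenated column-wise into one feature matrix before factorization. All feature names and values here are made up for illustration.

```python
import numpy as np

# Word observations for two hypothetical utterances
# (columns: "contact", "alex", "photo").
word_feats = np.array([
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# Enriched semantic types (columns: "communication", "camera"),
# e.g. inferred with weight 0.9/0.8 rather than observed directly.
enriched_feats = np.array([
    [0.9, 0.0],
    [0.0, 0.8],
])

# Intended apps (columns: Skype, Camera).
intended_app = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
])

# One matrix unifies the knowledge sources; MF is then run on it,
# learning inference relations across all three blocks at once.
F = np.hstack([word_feats, enriched_feats, intended_app])
print(F.shape)  # (2, 7)
```

Because every knowledge source becomes a column block of the same matrix, the factorization can propagate evidence from words through enriched types to app predictions.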


64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

Challenge: language ambiguity. 1) User preference; 2) App-level contexts.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

[Figure: the utterance "send to vivian" from the previous turn maps to Email / Message (Communication).]

Idea: behavioral patterns in history can help intent prediction.

65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

[Figure: train dialogues pair user utterances with intended apps ("take this photo" → CAMERA; "tell vivian this is me in the lab" → IM; "check my grades on websites" → CHROME; "send an email to professor" → EMAIL); lexical features (photo, check, camera, tell, send, …) and behavior history (null, camera, chrome, email) form the feature-enriched matrix; for the test dialogue "take a photo of this / send it to alice", the model estimates app probabilities (e.g. .85, .95, .80, .70, .55).]

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.

66

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / – | 26.1 / –

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / – | 55.5 / –

LM: LM-based IR model (unsupervised); MLR: Multinomial Logistic Regression (supervised)

67

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.

68

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 / – | 33.3 / –
Word + Type-Embedding-Based Semantics | 31.5 / – | 32.9 / –

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 / – | 56.6 / –

Semantic enrichment provides rich cues to improve performance.

69

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 / 34.2 (+6.8%) | 33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics | 31.5 / 32.2 (+2.1%) | 32.9 / 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 / 55.7 (+3.3%) | 56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.

70

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) and SLU Modeling (Semantic Decoding, Intent Prediction).]

Contributions of Intent Prediction

Feature-Enriched MF-SLU for Intent Prediction is able to: 1) unify the knowledge at different levels; 2) learn inference relations between various features; 3) and create personalized models by leveraging contextual behaviors.

71

Personal Intelligent Architecture

Reactive Assistance: ASR, LU, Dialog, LG, TTS

Proactive Assistance: Inferences, User Modeling, Suggestions

Data: Back-end Data Bases, Services, and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"

72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

73

Conclusions

The work shows the feasibility and the potential for improving generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:

Better semantic representations for individual utterances
Better high-level intent prediction about follow-up behaviors

74

Future Work

Apply the proposed technology to domain discovery: domains not covered by the current systems but that users are interested in, to guide the next developed domains.

Improve the proposed approach by handling the uncertainty:

[Figure: recognition errors from ASR feed SLU Modeling; unreliable knowledge feeds Knowledge Acquisition.]

75

Towards Unsupervised Deep Learning

[Architecture figure: word sequence x (w1, w2, …, wd) → word vectors lw → convolutional layer lc (convolution matrix Wc) → pooling → utterance vector lf and slot vectors lf → knowledge graph propagation layer lp (propagation matrix Wp) → semantic layer y (semantic projection matrix Ws), yielding relation scores R(U, S1), …, R(U, Sn) and posterior probabilities P(S1 | U), …, P(Sn | U) over slot candidates S1, …, Sn for utterance U.]

Treating MF as a one-layer neural net, we can add more layers in the model towards unsupervised deep learning.

76

Take-Home Message

Available: big data without annotations. Challenge: how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI: language → action (understand voice to control music, lights, etc.; teach the system to let friends in by face recognition, etc.)

Unsupervised or weakly-supervised methods will be the future trend.

Deep language understanding is an emerging field.

Q & A. Thanks for your attention!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.


17

SDS Process – Available Domain Ontology

User: "find a cheap eating place for taiwanese food"

[Ontology graph: nodes seeking, target, price, food with edges PREP_FOR, AMOD, NN.]

Organized Domain Knowledge

Intelligent Agent

Ontology Induction (semantic slot)

18

SDS Process – Available Domain Ontology

User: "find a cheap eating place for taiwanese food"

[Ontology graph: nodes seeking, target, price, food with edges PREP_FOR, AMOD, NN.]

Organized Domain Knowledge

Intelligent Agent

Ontology Induction (semantic slot)

Structure Learning (inter-slot relation)

19

SDS Process – Spoken Language Understanding (SLU)

User: "find a cheap eating place for taiwanese food"

[Ontology graph: nodes seeking, target, price, food with edges PREP_FOR, AMOD, NN.]

Intelligent Agent

seeking="find", target="eating place", price="cheap", food="taiwanese"

20

SDS Process – Spoken Language Understanding (SLU)

User: "find a cheap eating place for taiwanese food"

[Ontology graph: nodes seeking, target, price, food with edges PREP_FOR, AMOD, NN.]

Intelligent Agent

seeking="find", target="eating place", price="cheap", food="taiwanese"

Semantic Decoding

21

SDS Process – Dialogue Management (DM)

User: "find a cheap eating place for taiwanese food"

[Ontology graph: nodes seeking, target, price, food with edges PREP_FOR, AMOD, NN.]

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Intelligent Agent

22

SDS Process – Dialogue Management (DM)

User: "find a cheap eating place for taiwanese food"

[Ontology graph: nodes seeking, target, price, food with edges PREP_FOR, AMOD, NN.]

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Intelligent Agent

Surface Form Derivation (natural language)

23

SDS Process – Dialogue Management (DM)

User: "find a cheap eating place for taiwanese food"

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Din Tai Fung, Boiling Point

Predicted intent: navigation

Intelligent Agent

24

SDS Process – Dialogue Management (DM)

User: "find a cheap eating place for taiwanese food"

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Din Tai Fung, Boiling Point

Predicted intent: navigation

Intelligent Agent

Intent Prediction

25

SDS Process – Natural Language Generation (NLG)

User: "find a cheap eating place for taiwanese food"

Intelligent Agent: "Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there (navigation)."

26

Required Knowledge

[Ontology graph: nodes seeking, target, price, food with edges PREP_FOR, AMOD, NN.]

SELECT restaurant: restaurant.price="cheap", restaurant.food="taiwanese"

Predicted intent: navigation

User: "find a cheap eating place for taiwanese food"

Required Domain-Specific Information

27

Challenges for SDS

An SDS in a new domain requires:
1) a hand-crafted domain ontology
2) utterances labelled with semantic representations
3) an SLU component for mapping utterances into semantic representations

Manual work results in high cost, long duration, and poor scalability of system development.

The goal is to enable an SDS to 1) automatically infer domain knowledge and then to 2) create the data for SLU modeling, in order to handle open-domain requests.

seeking="find", target="eating place", price="cheap", food="asian food"

"find a cheap eating place for asian food"

fully unsupervised

Prior Focus

28

Contributions

User: "find a cheap eating place for taiwanese food"

[Ontology graph: nodes seeking, target, price, food with edges PREP_FOR, AMOD, NN.]

SELECT restaurant: restaurant.price="cheap", restaurant.food="asian food"

Predicted intent: navigation

Ontology Induction (semantic slot)

Structure Learning (inter-slot relation)

Surface Form Derivation (natural language)

Semantic Decoding

Intent Prediction

29

Contributions

User: "find a cheap eating place for taiwanese food"

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

30

Contributions

User: "find a cheap eating place for taiwanese food"

Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation

SLU Modeling: Semantic Decoding, Intent Prediction

31

Knowledge Acquisition

1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

[Figure: an unlabelled collection of restaurant-asking conversations is turned, via Knowledge Acquisition, into organized domain knowledge: an ontology with nodes target, price, food, seeking, quantity and edges PREP_FOR, NN, AMOD.]

Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation

32

SLU Modeling2) With the automatically acquired knowledge how can a system

understand utterance semantics and user intents

Organized Domain

Knowledge

price=ldquocheaprdquo target=ldquorestaurantrdquointent=navigation

SLU Modeling

SLU Component

"can i have a cheap restaurant"

SLU Modeling: Semantic Decoding, Intent Prediction

33

SDS Architecture – Contributions

[Pipeline: ASR → SLU → DM → NLG, with domain knowledge]

Knowledge Acquisition SLU Modeling

current bottleneck

34

SDS Flowchart

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

35

SDS Flowchart – Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

36

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

37

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[Architecture diagram: frame-semantic parsing of the unlabeled collection feeds Ontology Induction (feature models Fw, Fs); lexical and semantic knowledge graphs feed Structure Learning (word/slot relation models Rw, Rs); together they form MF-SLU, SLU modeling by matrix factorization, which maps "can I have a cheap restaurant" to the semantic representation target="restaurant", price="cheap"]

38

Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory, where words/phrases can be represented as frames; in "low fat milk", "milk" evokes the "food" frame and "low fat" fills the "descriptor" frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences.
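As a toy illustration, a frame-semantic parse can be represented as frames with an evoking span and filled frame elements. The lookup rules and dict layout below are invented for this one example and are not SEMAFOR's actual output format:

```python
# Toy lookup-based "parser" for the running FrameNet example (illustration
# only, not SEMAFOR's real API or output format).
def parse_frames(utterance):
    frames = []
    if "milk" in utterance:
        frame = {"frame": "food", "evoked_by": "milk", "elements": {}}
        if "low fat" in utterance:
            frame["elements"]["descriptor"] = "low fat"
        frames.append(frame)
    return frames

parsed = parse_frames("low fat milk")
# "milk" evokes the "food" frame; "low fat" fills its "descriptor" element.
```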

39

Ontology Induction [ASRU'13, SLT'14a]

can i have a cheap restaurant

Frame: capability

Frame: expensiveness

Frame: locale_by_use

1st Issue: differentiate domain-specific frames from generic frames for SDSs

Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.

slot candidate

Best Student Paper Award

40

[Matrix illustration: training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants" form rows; word observations (cheap, restaurant, food) and slot candidates (expensiveness, locale_by_use, food) form columns filled by frame-semantic parsing, with estimated test scores (e.g., .97, .95)]

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

Idea: increase weights of domain-specific slots and decrease weights of others

41

1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.

[Matrix illustration: word relation matrix × word/slot observation matrix × slot relation matrix; training rows contain observed words (cheap, restaurant) and slot candidates (expensiveness, locale_by_use, food), and the test row is initially sparse]

Slot Induction

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
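The propagation step can be sketched in a few lines of pure Python. The slot names follow the slides, but the relation weights are made up for illustration:

```python
# Toy symmetric slot relation matrix; the three domain-specific slots
# (expensiveness, locale_by_use, food) interconnect strongly, while the
# generic "capability" slot is weakly connected.
slots = ["expensiveness", "locale_by_use", "food", "capability"]
R = [
    [1.0, 0.8, 0.7, 0.1],
    [0.8, 1.0, 0.9, 0.1],
    [0.7, 0.9, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
]

def propagate(R, s):
    # One propagation step: each node accumulates relation-weighted scores
    # from its neighbors (a plain matrix-vector multiplication).
    return [sum(w * sj for w, sj in zip(row, s)) for row in R]

scores = propagate(R, [1.0, 1.0, 1.0, 1.0])  # undifferentiated initial scores
# After one step the domain-specific slots outscore the generic slot.
```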

[Slot knowledge graph over the corpus (Utterance 1: "i would like a cheap restaurant"; Utterance 2: "find a restaurant with chinese food"; test utterance: "show me a list of cheap restaurants") with slot nodes capability, locale_by_use, food, expensiveness, seeking, desiring, relational_quantity]

42

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[Architecture diagram: frame-semantic parsing of the unlabeled collection feeds Ontology Induction (feature models Fw, Fs); lexical and semantic knowledge graphs feed Structure Learning (word/slot relation models Rw, Rs); together they form MF-SLU, SLU modeling by matrix factorization, which maps "can I have a cheap restaurant" to the semantic representation target="restaurant", price="cheap"]

43

Knowledge Graph Construction: syntactic dependency parsing on utterances

[Dependency parse of "can i have a cheap restaurant" (nsubj, ccomp, dobj, det, amod), with evoked frames capability, expensiveness, locale_by_use]

Word-based lexical knowledge graph

Slot-based semantic knowledge graph

[Word-based graph nodes: can, i, have, a, cheap, restaurant (word edges w); slot-based graph nodes: capability, locale_by_use, expensiveness (slot edges s)]

44

Dependency-based word embeddings

Dependency-based slot embeddings

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

[Dependency-based embeddings are trained from the parsed utterance "can i have a cheap restaurant": word vectors (e.g., for "can", "have") from the word-level parse, and slot vectors (e.g., for "expensiveness", "capability") from the slot-level parse]

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
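The dependency contexts such embeddings are trained on can be sketched as follows. The parse triples are hand-written for the running example; a real system would obtain them from a dependency parser:

```python
# Hand-written (head, relation, dependent) triples for
# "can i have a cheap restaurant" (illustration only).
triples = [
    ("have", "nsubj", "i"),
    ("have", "dobj", "restaurant"),
    ("restaurant", "det", "a"),
    ("restaurant", "amod", "cheap"),
]

def dependency_contexts(triples):
    # Each word's contexts are its syntactic neighbors annotated with the
    # relation ("-1" marks the inverse direction), in the style of
    # Levy and Goldberg (2014).
    contexts = {}
    for head, rel, dep in triples:
        contexts.setdefault(head, []).append(f"{dep}/{rel}")
        contexts.setdefault(dep, []).append(f"{head}/{rel}-1")
    return contexts

ctx = dependency_contexts(triples)
# "restaurant" gets contexts like "cheap/amod" and "have/dobj-1"; a
# word2vec-style model is then trained on these (word, context) pairs.
```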

45

Edge Weight Measurement: compute edge weights to represent relation importance.

Slot-to-slot semantic relation: similarity between slot embeddings
Slot-to-slot dependency relation: dependency score between slot embeddings
Word-to-word semantic relation: similarity between word embeddings
Word-to-word dependency relation: dependency score between word embeddings
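A minimal sketch of the semantic edge weight: cosine similarity between embeddings. The 3-dimensional vectors below are invented stand-ins for real slot embeddings:

```python
import math

# Toy 3-d vectors stand in for trained slot embeddings (assumption for this
# sketch); a semantic edge is weighted by the cosine similarity between them.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

emb = {
    "expensiveness": [0.9, 0.1, 0.2],
    "food": [0.8, 0.2, 0.3],
    "capability": [0.1, 0.9, 0.1],
}
w_domain = cosine(emb["expensiveness"], emb["food"])         # related slots
w_generic = cosine(emb["expensiveness"], emb["capability"])  # unrelated slots
```

Edges between related, domain-specific slots thus receive higher weights than edges to generic slots.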

[Graph: word nodes w1–w7 and slot nodes s1–s3 connected by weighted edges]

46

[Slot-induction matrix flanked by the word relation matrix R_w^(SD) and slot relation matrix R_s^(SD) of the Knowledge Graph Propagation Model]

Structure information is integrated to make the self-training data more reliable

47

[Recap: Ontology Induction (Fw, Fs) and Structure Learning feed SLU; the matrix now contains estimated test probabilities (e.g., .97, .90, .95, .85) for "show me a list of cheap restaurants", whose hidden semantics are not directly observed]

2nd Issue: unobserved semantics may benefit understanding

Semantic Decoding [ACL-IJCNLP'15]

48

Reasoning with Matrix Factorization

[Matrix illustration: the Feature Model plus Knowledge Graph Propagation Model (R_w^(SD), R_s^(SD)) fill previously missing cells with estimated probabilities (e.g., .97, .90, .95, .85, .93, .92, .98)]

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.
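The completion idea can be sketched with a tiny rank-2 factorization trained by SGD on the observed cells only. The matrix below is a toy, not the paper's actual model:

```python
import random

# Toy low-rank completion sketch: factor an observed matrix M ~= U x V and
# read off an unobserved cell from the learned factors.
random.seed(0)
M = [
    [1.0, 1.0, None],  # None marks an unobserved (hidden) cell
    [1.0, 1.0, 1.0],
    [None, 1.0, 1.0],
]
d = 2  # latent dimension
U = [[random.random() for _ in range(d)] for _ in range(3)]
V = [[random.random() for _ in range(d)] for _ in range(3)]

def pred(i, j):
    return sum(U[i][k] * V[j][k] for k in range(d))

lr = 0.1
for _ in range(500):  # SGD sweeps over the observed cells only
    for i in range(3):
        for j in range(3):
            if M[i][j] is None:
                continue
            err = M[i][j] - pred(i, j)
            for k in range(d):
                u, v = U[i][k], V[j][k]
                U[i][k] += lr * err * v
                V[j][k] += lr * err * u

hidden = pred(0, 2)  # the low-rank structure fills in the missing cell
```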

49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and words/slots, respectively.

The product of the two matrices fills in the probabilities of hidden semantics.

[Matrix illustration: the completed matrix with observed 1s and estimated probabilities (e.g., .97, .90, .95, .85, .93, .92, .98, .05, .05)]

The |U| × (|W|+|S|) matrix is approximated as the product of a |U| × d matrix and a d × (|W|+|S|) matrix, where U is the set of utterances, W the words, S the slots, and d the latent dimension.

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.

50

Bayesian Personalized Ranking for MF: model implicit feedback.

Do not treat unobserved facts as negative samples (true or false); give observed facts higher scores than unobserved facts.

Objective: for each utterance u_x, maximize Σ ln σ(θ(f⁺) − θ(f⁻)) over pairs of an observed fact f⁺ and an unobserved fact f⁻, so that f⁺ is ranked above f⁻.

The objective is to learn a set of well-ranked semantic slots per utterance.
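A scalar sketch of the BPR update, with one observed fact f+ and one unobserved fact f− (the real model ranks many slot candidates per utterance with MF scores):

```python
import math

# Toy BPR setup: scalar scores for one observed and one unobserved fact;
# gradient ascent on ln sigmoid(theta(f+) - theta(f-)).
theta = {"f_pos": 0.0, "f_neg": 0.0}

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

lr = 0.5
for _ in range(100):
    diff = theta["f_pos"] - theta["f_neg"]
    grad = 1.0 - sigmoid(diff)   # derivative of ln sigmoid(diff) w.r.t. diff
    theta["f_pos"] += lr * grad  # push the observed fact up
    theta["f_neg"] -= lr * grad  # push the unobserved fact down

ranked_correctly = theta["f_pos"] > theta["f_neg"]
```

Note that the loss only separates the pair; it never forces the unobserved fact to be exactly 0, which matches the implicit-feedback assumption.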

51

[Recap: with Ontology Induction (Fw, Fs) and Structure Learning integrated, the factorized matrix yields slot probabilities (e.g., .97, .90, .95, .85) for the test utterance "show me a list of cheap restaurants"]

Matrix Factorization SLU (MF-SLU)

MF-SLU can estimate probabilities for slot candidates given test utterances

52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[Architecture diagram: frame-semantic parsing of the unlabeled collection feeds Ontology Induction (feature models Fw, Fs); lexical and semantic knowledge graphs feed Structure Learning (word/slot relation models Rw, Rs); together they form MF-SLU, SLU modeling by matrix factorization, which maps "can I have a cheap restaurant" to the semantic representation target="restaurant", price="cheap"]

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)

53

Experimental Setup

Dataset: Cambridge University SLU Corpus, restaurant recommendation (WER = 37%); 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type.

Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots.

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
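The MAP metric used in the following experiments can be computed as below (toy rankings and relevance sets, purely for illustration):

```python
# Mean average precision over per-utterance slot rankings.
def average_precision(ranked, relevant):
    hits, score = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at each relevant hit
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(rankings):
    return sum(average_precision(r, rel) for r, rel in rankings) / len(rankings)

rankings = [  # (ranked slot list, set of reference slots) per utterance
    (["food", "area", "pricerange"], {"food", "pricerange"}),  # AP = (1 + 2/3) / 2
    (["type", "addr"], {"addr"}),                              # AP = 1/2
]
map_score = mean_average_precision(rankings)  # (5/6 + 1/2) / 2 = 2/3
```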

54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

Approach                                        ASR    Transcripts
Baseline SLU: Support Vector Machine            32.5   36.6
Baseline SLU: Multinomial Logistic Regression   34.0   38.8

55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results.

Approach                                          ASR            Transcripts
Baseline SLU: Support Vector Machine              32.5           36.6
Baseline SLU: Multinomial Logistic Regression     34.0           38.8
Proposed MF-SLU: Feature Model                    37.6           45.3
Proposed MF-SLU: Feature Model + KG Propagation   43.5 (+27.9%)  53.4 (+37.6%)

The results are significantly better than the MLR baseline (p < 0.05, t-test).

56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

In the integrated structure information, both semantic and dependency relations are useful for understanding.

Approach                               ASR            Transcripts
Feature Model                          37.6           45.3
Feature + KG Propagation: Semantic     41.4           51.6
Feature + KG Propagation: Dependency   41.6           49.0
Feature + KG Propagation: All          43.5 (+15.7%)  53.4 (+17.9%)

The results are significantly better than the MLR baseline (p < 0.05, t-test).

Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs

The reference ontology with the most frequent syntactic dependencies

[Learned vs. reference ontologies: the learned graph links locale_by_use to food, expensiveness, seeking, desiring, and relational_quantity via AMOD, NN, DOBJ, and PREP_FOR relations; the reference graph links type to food, pricerange, area, and task via AMOD, DOBJ, and PREP_IN dependencies]

The automatically learned domain ontology aligns well with the reference one

57

The data-driven ontology is more objective, while the expert-annotated one is more subjective.

58

Contributions of Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) then allow systems to model implicit semantics for better understanding.

59

Low- and High-Level Understanding

Semantic concepts for individual utterances do not consider high-level semantics (user intents).

The follow-up behaviors usually correspond to user intents

SLU Model: "can i have a cheap restaurant" → price="cheap", target="restaurant", intent=navigation

SLU Model: "i plan to dine in legume tonight" → restaurant="legume", time="tonight", intent=reservation

60

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart – Intent Prediction

61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app

Output: the apps supporting the required functionality

Intent Identification: popular domains in Google Play

please dial a phone call to alex

Skype, Hangout, etc.

Intent Prediction of Mobile Apps [SLT'14c]

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.

63

Input: single-turn request

Output: apps that are able to support the required functionality

Intent Prediction – Single-Turn Request

[Matrix illustration (Reasoning with Feature-Enriched MF): the train utterance "i would like to contact alex" has word observations (contact, email, message) and intended apps (Gmail, Outlook, Skype) as columns; app descriptions retrieved by IR (Outlook: "…your email calendar contacts…", Gmail: "…check and send emails msgs…") provide self-train utterances; feature enrichment adds semantics such as "communication" (.90) for the test utterance]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents

64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

Challenge: language ambiguity. 1) User preference 2) App-level contexts

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

"send to vivian": Email vs. Message (Communication)

Idea: behavioral patterns in history (e.g., the previous turn) can help intent prediction

65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

[Matrix illustration (Reasoning with Feature-Enriched MF): train dialogues pair utterances with intended apps ("take this photo" → CAMERA; "tell vivian this is me in the lab" → IM; "check my grades on websites" → CHROME; "send an email to professor" → EMAIL), with lexical features (photo, check, send, tell) and behavior-history features (null, camera, chrome, email); for the test dialogue "take a photo of this" / "send it to alice", the model fills intended-app probabilities (e.g., .85, .70, .95)]

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
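A hypothetical sketch of one feature-enriched matrix row, combining lexical features with a behavior-history feature. The vocabulary and app inventory are invented for illustration:

```python
# Invented vocabulary and app inventory for illustration only.
vocab = ["photo", "send", "grades", "email"]
history_apps = ["camera", "chrome", "email"]

def feature_row(words, previous_app):
    # Lexical features from the current utterance plus a behavioral feature
    # for the app launched in the previous turn.
    lexical = [1 if w in words else 0 for w in vocab]
    behavior = [1 if app == previous_app else 0 for app in history_apps]
    return lexical + behavior

# Second turn of the test dialogue: "send it to alice", after using the camera.
row = feature_row({"send", "it", "to", "alice"}, "camera")
# row == [0, 1, 0, 0, 1, 0, 0]
```

Rows like this, stacked over many dialogues, form the matrix that MF completes with intended-app probabilities.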

66

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix     ASR (LM / MF-SLU)    Transcripts (LM / MF-SLU)
Word Observation   25.1 / –             26.1 / –

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix     ASR (MLR / MF-SLU)   Transcripts (MLR / MF-SLU)
Word Observation   52.1 / –             55.5 / –

LM: LM-based IR model (unsupervised); MLR: Multinomial Logistic Regression (supervised)

Experiments for Intent Prediction

67

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix     ASR (LM / MF-SLU)      Transcripts (LM / MF-SLU)
Word Observation   25.1 / 29.2 (+16.2%)   26.1 / 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix     ASR (MLR / MF-SLU)     Transcripts (MLR / MF-SLU)
Word Observation   52.1 / 52.7 (+1.2%)    55.5 / 55.4 (−0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.

Experiments for Intent Prediction

68

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                          ASR (LM / MF-SLU)      Transcripts (LM / MF-SLU)
Word Observation                        25.1 / 29.2 (+16.2%)   26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics        32.0 / –               33.3 / –
Word + Type-Embedding-Based Semantics   31.5 / –               32.9 / –

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix               ASR (MLR / MF-SLU)    Transcripts (MLR / MF-SLU)
Word Observation             52.1 / 52.7 (+1.2%)   55.5 / 55.4 (−0.2%)
Word + Behavioral Patterns   53.9 / –              56.6 / –

Semantic enrichment provides rich cues to improve performance.

Experiments for Intent Prediction

69

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                          ASR (LM / MF-SLU)      Transcripts (LM / MF-SLU)
Word Observation                        25.1 / 29.2 (+16.2%)   26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics        32.0 / 34.2 (+6.8%)    33.3 / 33.3 (−0.2%)
Word + Type-Embedding-Based Semantics   31.5 / 32.2 (+2.1%)    32.9 / 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix               ASR (MLR / MF-SLU)    Transcripts (MLR / MF-SLU)
Word Observation             52.1 / 52.7 (+1.2%)   55.5 / 55.4 (−0.2%)
Word + Behavioral Patterns   53.9 / 55.7 (+3.3%)   56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.

Experiments for Intent Prediction

70

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction

Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.

71

Personal Intelligent Architecture

Reactive Assistance

ASR → LU → Dialog → LG → TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data Bases; Back-end Data Services and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"

72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

73

Conclusions

The work shows the feasibility and the potential of improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances

Better high-level intent prediction about follow-up behaviors

74

Future Work

Apply the proposed technology to domain discovery: domains not covered by the current systems but of interest to users can guide which domains to develop next.

Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition both propagate into SLU modeling.

75

[Network diagram: word sequence x → word vectors lw → convolutional layer lc (convolution matrix Wc) → pooling → utterance vector lf → knowledge graph propagation layer lp (matrix Wp) → semantic layer y (semantic projection matrix Ws), producing relation scores R(U, S1) … R(U, Sn) and posterior probabilities P(S1 | U) … P(Sn | U) over slot candidates S1 … Sn]

Towards Unsupervised Deep Learning

Treating MF as a one-layer neural net, we can add more layers to the model towards unsupervised deep learning.

76

Take Home Message

Available: big data w/o annotations

Challenge: how to acquire and organize important knowledge, and further utilize it for applications

Language understanding for AI

Language → action: understand voice to control music, lights, etc.; teach the system to let friends in by face recognition, etc.

Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A. Thanks for your attention!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.

  • Statistical Learning from Dialogues for Intelligent Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants
  • Why do we need them
  • Why do we need them (2)
  • Why do companies care
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymax's intelligence
  • SDS Architecture
  • Interaction Example
  • SDS Process – Available Domain Ontology
  • SDS Process – Available Domain Ontology (2)
  • SDS Process – Available Domain Ontology (3)
  • SDS Process – Spoken Language Understanding (SLU)
  • SDS Process – Spoken Language Understanding (SLU) (2)
  • SDS Process – Dialogue Management (DM)
  • SDS Process – Dialogue Management (DM) (2)
  • SDS Process – Dialogue Management (DM) (3)
  • SDS Process – Dialogue Management (DM) (4)
  • SDS Process – Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture ndash Contributions
  • SDS Flowchart
  • SDS Flowchart ndash Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLP'15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRU'13, SLT'14a]
  • Ontology Induction [ASRU'13, SLT'14a] (2)
  • 1st Issue: How to adapt generic slots to a domain-specific setting
  • Semantic Decoding [ACL-IJCNLP'15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLP'15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue: How to model the unobserved hidden semantics; Matrix Factorization (MF)
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLP'15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding: Quality of Semantics Estimation
  • Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
  • Experiments of Semantic Decoding: Effectiveness of Relations
  • Experiments for Structure Learning: Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart – Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLT'14c]
  • Intent Prediction – Single-Turn Request
  • Intent Prediction – Multi-Turn Interaction [ICMI'15]
  • Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q & A
Page 18: Statistical Learning from Dialogues for Intelligent Assistants

18

SDS Process – Available Domain Ontology

User

[Ontology fragment: slots seeking, target, food, price linked by PREP_FOR, AMOD, and NN relations]

Organized Domain Knowledge

Intelligent Agent

Ontology Induction (semantic slot)

Structure Learning (inter-slot relation)

find a cheap eating place for taiwanese food

19

SDS Process – Spoken Language Understanding (SLU)

User

[Ontology fragment: slots seeking, target, food, price linked by PREP_FOR, AMOD, and NN relations]

Intelligent Agent

seeking="find", target="eating place", price="cheap", food="taiwanese"

find a cheap eating place for taiwanese food

20

find a cheap eating place for taiwanese food

SDS Process – Spoken Language Understanding (SLU)

User

[Ontology fragment: slots seeking, target, food, price linked by PREP_FOR, AMOD, and NN relations]

Intelligent Agent

seeking="find", target="eating place", price="cheap", food="taiwanese"

Semantic Decoding

21

find a cheap eating place for taiwanese food

SDS Process – Dialogue Management (DM)

User

[Ontology fragment: slots seeking, target, food, price linked by PREP_FOR, AMOD, and NN relations]

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Intelligent Agent

22

find a cheap eating place for taiwanese food

SDS Process – Dialogue Management (DM)

User

[Ontology fragment: slots seeking, target, food, price linked by PREP_FOR, AMOD, and NN relations]

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Intelligent Agent

Surface Form Derivation (natural language)

23

SDS Process – Dialogue Management (DM)

User

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Din Tai Fung, Boiling Point

Predicted intent: navigation

Intelligent Agent

find a cheap eating place for taiwanese food

24

SDS Process – Dialogue Management (DM)

User

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Din Tai Fung, Boiling Point

Predicted intent: navigation

Intelligent Agent

Intent Prediction

find a cheap eating place for taiwanese food

25

SDS Process – Natural Language Generation (NLG)

User

Intelligent Agent

"Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which do you want to choose? I can help you go there." (navigation)

find a cheap eating place for taiwanese food

26

Required Knowledge

[Ontology fragment: slots seeking, target, food, price linked by PREP_FOR, AMOD, and NN relations]

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Predicted intent: navigation

User

Required Domain-Specific Information

find a cheap eating place for taiwanese food

27

Challenges for SDS An SDS in a new domain requires

1) A hand-crafted domain ontology2) Utterances labelled with semantic representations3) An SLU component for mapping utterances into semantic representations

Manual work results in high cost, long duration, and poor scalability of system development.

The goal is to enable an SDS to 1) automatically infer domain knowledge and then to 2) create the data for SLU modeling, in order to handle open-domain requests.

seeking="find", target="eating place", price="cheap", food="asian food"

find a cheap eating place for asian food

fully unsupervised

Prior Focus


28

Contributions

target

food   price   AMOD   NN

seeking PREP_FOR

SELECT restaurant restaurant.price="cheap" restaurant.food="asian food"

Predicted intent navigation

User: find a cheap eating place for taiwanese food

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

(natural language)

(inter-slot relation)

(semantic slot)


29

Contributions

User

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

find a cheap eating place for taiwanese food


30

Ontology Induction Structure Learning Surface Form Derivation

Semantic Decoding Intent Prediction

Contributions

User

Knowledge Acquisition SLU Modeling

find a cheap eating place for taiwanese food


31

Knowledge Acquisition

1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

Restaurant Asking Conversations

target

food   price

seeking

quantity

PREP_FOR

PREP_FOR

NN AMOD

AMOD   AMOD

Organized Domain Knowledge

Unlabelled Collection

Knowledge Acquisition

Knowledge Acquisition Ontology Induction Structure Learning Surface Form Derivation


32

SLU Modeling

2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

Organized Domain

Knowledge

price="cheap", target="restaurant", intent=navigation

SLU Modeling

SLU Component

"can i have a cheap restaurant"

SLU Modeling Semantic Decoding Intent Prediction


33

SDS Architecture – Contributions

ASR | SLU | DM | NLG | Domain

Knowledge Acquisition SLU Modeling

current bottleneck


34

SDS Flowchart

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling


35

SDS Flowchart ndash Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling


36

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


37

Semantic Decoding [ACL-IJCNLP'15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015.

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant"

Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology InductionFw Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

×

Semantic KG

MF-SLU: SLU Modeling by Matrix Factorization

Semantic Representation


38

Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory; words/phrases can be represented as frames. E.g., in "low fat milk", "milk" evokes the "food" frame and "low fat" fills the descriptor frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences.


39

Ontology Induction [ASRU'13, SLT'14a]

can i have a cheap restaurant

Frame capability

Frame expensiveness

Frame locale by use

1st Issue: differentiate domain-specific frames from generic frames for SDSs

Good   Good

Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.

slot candidate

Best Student Paper Award


40

[Matrix figure: rows are utterances, columns are word observations (cheap, restaurant, food, …) and slot candidates (expensiveness, locale_by_use, food) from frame-semantic parsing, marked 1 when observed]
Train: Utterance 1 "i would like a cheap restaurant"; Utterance 2 "find a restaurant with chinese food"
Test Utterance: "show me a list of cheap restaurants"

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

Idea: increase weights of domain-specific slots and decrease weights of others.

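The word/slot observation matrix sketched above can be assembled from unlabelled utterances as follows (a toy sketch; the slot candidates are hard-coded here, whereas the deck obtains them from a frame-semantic parser such as SEMAFOR):

```python
import numpy as np

utterances = ["i would like a cheap restaurant",
              "find a restaurant with chinese food"]
# Slot candidates per utterance, as a frame-semantic parser might emit them.
slots = [["expensiveness", "locale_by_use"], ["locale_by_use", "food"]]

vocab = sorted({w for u in utterances for w in u.split()})
# Prefix slot columns so a slot named "food" stays distinct from the word "food".
slot_cols = sorted({"slot:" + s for ss in slots for s in ss})
columns = vocab + slot_cols

# Binary observation matrix: one row per utterance, one column per feature.
M = np.zeros((len(utterances), len(columns)))
for i, (u, ss) in enumerate(zip(utterances, slots)):
    for w in u.split():
        M[i, columns.index(w)] = 1.0
    for s in ss:
        M[i, columns.index("slot:" + s)] = 1.0
```

Later slides reweight and complete exactly this kind of matrix via propagation and matrix factorization.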

41

1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.

Word Relation Model Slot Relation Model

word relation matrix

slot relation matrix

×

[Matrix figure: word/slot observation matrix for the train and test utterances]

Slot Induction

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.

[Graph figure: word nodes (i, like, …) and slot nodes (capability, locale_by_use, food, expensiveness, seeking, relational_quantity, desiring) propagating scores]
Train: Utterance 1 "i would like a cheap restaurant"; Utterance 2 "find a restaurant with chinese food"
Test Utterance: "show me a list of cheap restaurants"

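The propagation idea can be illustrated with a toy relation matrix (the 3-slot graph below is hypothetical; in the deck, R comes from the induced knowledge graph):

```python
import numpy as np

# Toy slot relation (adjacency) matrix over 3 slot candidates.
R = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
R = R / R.sum(axis=1, keepdims=True)   # row-normalize the relation matrix

scores = np.array([1.0, 1.0, 0.0])     # observed slot indicators
propagated = scores @ R                # neighbors receive score mass
```

Slot 2 is unobserved but connected to both observed slots, so it ends up with the highest propagated score, which is exactly how well-connected, domain-specific slots get boosted.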

42

Semantic Decoding [ACL-IJCNLP'15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015.

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant"

Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology InductionFw Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

×

Semantic KG

MF-SLU: SLU Modeling by Matrix Factorization

Semantic Representation


43

Knowledge Graph Construction: syntactic dependency parsing on utterances

[Parse figure: "can i have a cheap restaurant" with dependencies ccomp, nsubj, dobj, det, amod; evoked slots: capability, expensiveness, locale_by_use]

Word-based lexical knowledge graph

Slot-based semantic knowledge graph

[Graph figures: word nodes (can, i, have, a, cheap, restaurant) in the lexical graph and slot nodes (capability, locale_by_use, expensiveness) in the semantic graph]

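Building the word-based lexical graph from a dependency parse can be sketched as follows (the parse triples are hard-coded here; a real system would obtain them from a syntactic parser):

```python
from collections import defaultdict

# (head, dependent, relation) triples for "can i have a cheap restaurant",
# hard-coded to stand in for a parser's output.
parse = [("have", "can", "ccomp"), ("have", "i", "nsubj"),
         ("have", "restaurant", "dobj"), ("restaurant", "a", "det"),
         ("restaurant", "cheap", "amod")]

# Undirected lexical knowledge graph: words linked by dependency edges.
graph = defaultdict(set)
for head, dep, rel in parse:
    graph[head].add(dep)
    graph[dep].add(head)
```

The slot-based semantic graph is built the same way, after replacing each word with the slot its frame evokes (e.g. cheap → expensiveness, restaurant → locale_by_use).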

44

Dependency-based word embeddings

Dependency-based slot embeddings

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

[Figure: dependency-based embeddings trained for words (can, have, …) and for slots (expensiveness, capability, …) from parses like "can i have a cheap restaurant" (ccomp, nsubj, dobj, det, amod)]

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL 2014.


45

Edge Weight Measurement: compute edge weights to represent relation importance.

Slot-to-slot semantic relation: similarity between slot embeddings
Slot-to-slot dependency relation: dependency score between slot embeddings
Word-to-word semantic relation: similarity between word embeddings
Word-to-word dependency relation: dependency score between word embeddings

[Graph figure: word nodes w1–w7 and slot nodes s1–s3 connected by weighted edges]

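The semantic edge weights can be sketched as cosine similarity between embeddings (the vectors below are random stand-ins for the trained dependency-based embeddings):

```python
import numpy as np

# Random stand-ins for trained word/slot embeddings (50-dim).
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in ["cheap", "expensive", "restaurant"]}

def cosine(u, v):
    """Semantic-relation edge weight between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

w = cosine(emb["cheap"], emb["expensive"])
```

With real embeddings, related pairs (cheap/expensive) score high and unrelated pairs score near zero; the dependency-relation weights are computed analogously with a dependency score in place of cosine.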

46

Word Relation Model Slot Relation Model

word relation matrix

slot relation matrix

×

[Matrix figure: word/slot observation matrix for the train and test utterances]

Slot Induction

Knowledge Graph Propagation Model: R_w^(SD), R_s^(SD)

Structure information is integrated to make the self-training data more reliable


47

Ontology Induction

SLU: Fw, Fs

Structure Learning

×

[Matrix figure: after propagation, the test utterance receives high scores (e.g. .97, .95) for domain-specific slots, including hidden semantics that were not directly observed]
Train: Utterance 1 "i would like a cheap restaurant"; Utterance 2 "find a restaurant with chinese food"
Test Utterance: "show me a list of cheap restaurants" (hidden semantics)

2nd Issue: unobserved semantics may benefit understanding.

Semantic Decoding [ACL-IJCNLP'15]


48

Reasoning with Matrix Factorization

Word Relation Model Slot Relation Model

word relation matrix

slot relation matrix

×

[Matrix figure: observed 1s plus MF-estimated probabilities for the unobserved cells (e.g. .97, .95 for domain-specific slots; low values like .05 elsewhere)]

Slot Induction

Feature Model + Knowledge Graph Propagation Model

R_w^(SD), R_s^(SD)

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.


49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and wordsslots respectively

The product of two matrices fills the probability of hidden semantics

[Matrix figure: observed 1s plus MF-estimated probabilities filling the unobserved cells]

|U| × (|W|+|S|)  ≈  (|U| × d) · (d × (|W|+|S|))

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI 2009.


50

Bayesian Personalized Ranking for MF: model implicit feedback

do not treat unobserved facts as negative samples (true or false); give observed facts higher scores than unobserved facts

Objective: learn a set of well-ranked semantic slots per utterance u, by maximizing the log-likelihood that each observed slot f⁺ scores higher than each unobserved slot f⁻:

max Σ_u Σ_{f⁺, f⁻} ln σ( f⁺ − f⁻ )

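A single BPR update can be sketched as follows (a minimal toy with one utterance and two slot candidates; the factor shapes and learning rate are illustrative, not the paper's settings):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_step(U_f, V_f, u, f_pos, f_neg, lr=0.05):
    """One SGD step: raise the score of observed slot f_pos over unobserved f_neg."""
    diff = V_f[:, f_pos] - V_f[:, f_neg]
    g = 1.0 - sigmoid(U_f[u] @ diff)   # gradient of ln sigmoid(x_pos - x_neg)
    U_f[u] += lr * g * diff
    V_f[:, f_pos] += lr * g * U_f[u]
    V_f[:, f_neg] -= lr * g * U_f[u]

# Toy usage: utterance 0 observed slot 0, unobserved slot 1.
U_f = np.full((1, 2), 0.1)     # utterance latent vectors (rows)
V_f = np.zeros((2, 2))         # slot latent vectors (columns)
for _ in range(50):
    bpr_step(U_f, V_f, u=0, f_pos=0, f_neg=1)
gap = U_f[0] @ V_f[:, 0] - U_f[0] @ V_f[:, 1]
```

After training, `gap` is positive: observed slots outrank unobserved ones without ever labelling the unobserved ones as false.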

51

Ontology Induction

SLU: Fw, Fs

Structure Learning

×

[Matrix figure: given the test utterance, MF-SLU fills in slot probabilities (e.g. .97, .95) using the induced ontology]
Train: Utterance 1 "i would like a cheap restaurant"; Utterance 2 "find a restaurant with chinese food"
Test Utterance: "show me a list of cheap restaurants"

Matrix Factorization SLU (MF-SLU)

MF-SLU can estimate probabilities for slot candidates given test utterances


52

Semantic Decoding [ACL-IJCNLP'15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015.

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant"

Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology InductionFw Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

×

Semantic KG

MF-SLU: SLU Modeling by Matrix Factorization

Semantic Representation

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).


53

Experimental Setup

Dataset: Cambridge University SLU Corpus — restaurant recommendation (WER = 37%); 2,166 dialogues; 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type

Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT 2012.

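The evaluation metric can be sketched directly: average precision of each utterance's ranked slot list against its reference slots, then the mean over utterances (a standard MAP implementation, not the deck's exact evaluation script):

```python
def average_precision(ranked, relevant):
    """AP of one ranked slot list against the set of reference slots."""
    hits, score = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            score += hits / k          # precision at each hit position
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(all_ranked, all_relevant):
    aps = [average_precision(r, rel) for r, rel in zip(all_ranked, all_relevant)]
    return sum(aps) / len(aps)

# e.g. one utterance ranked [price, food, area] with references {price, area}
ap = average_precision(["price", "food", "area"], {"price", "area"})
```

Because MF-SLU outputs a probability for every slot candidate, ranking by those probabilities and scoring with MAP measures the whole estimated distribution rather than a single hard decision.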

54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8


55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

The MF-SLU effectively models implicit information to decode semantics

The structure information further improves the results

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
Proposed MF-SLU: Feature Model | 37.6 | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)

the result is significantly better than the MLR with p < 0.05 in a t-test


56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

With the integrated structure information, both semantic and dependency relations are useful for understanding.

Approach | ASR | Transcripts
Feature Model | 37.6 | 45.3
Feature + Knowledge Graph Propagation: Semantic | 41.4 | 51.6
Feature + Knowledge Graph Propagation: Dependency | 41.6 | 49.0
Feature + Knowledge Graph Propagation: All | 43.5 (+15.7%) | 53.4 (+17.9%)

the result is significantly better than the MLR with p < 0.05 in a t-test


Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs

The reference ontology with the most frequent syntactic dependencies

[Ontology figures: the induced ontology (slots locale_by_use, food, expensiveness, seeking, relational_quantity, desiring linked by AMOD, NN, DOBJ, PREP_FOR edges) alongside the reference ontology (slots type, food, pricerange, task, area linked by AMOD, DOBJ, PREP_IN edges)]

The automatically learned domain ontology aligns well with the reference one

57

The data-driven one is more objective, while the expert-annotated one is more subjective.

58

Contributions of Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.


59

Low- and High-Level Understanding

Semantic concepts for individual utterances do not consider high-level semantics (user intents).

The follow-up behaviors usually correspond to user intents

price="cheap", target="restaurant"

SLU Model

"can i have a cheap restaurant"

intent=navigation

restaurant="legume", time="tonight"

SLU Model

"i plan to dine in legume tonight"

intent=reservation


60

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart – Intent Prediction


61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input spoken utterances for making requests about launching an app

Output the apps supporting the required functionality

Intent Identification: popular domains in Google Play

please dial a phone call to alex

Skype, Hangout, etc.

Intent Prediction of Mobile Apps [SLT'14c]

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT 2014.


63

Input single-turn request

Output apps that are able to support the required functionality

Intent Prediction – Single-Turn Request

[Matrix figure: reasoning with feature-enriched MF. Rows: app descriptions retrieved by IR for app candidates (e.g. Outlook "… your email calendar contacts …", Gmail "… check and send emails msgs …"), self-train utterances, and the test utterance "i would like to contact alex". Columns: word observations (contact, message, email, …), enriched semantics (communication, …), and intended apps (Gmail, Outlook, Skype, …); MF fills in estimated scores (e.g. .90).]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents


64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input multi-turn interaction

Output apps the user plans to launch

Challenge: language ambiguity
1) User preference
2) App-level contexts

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015. Data available at http://AppDialogue.com

send to vivianvs

Email, Message (Communication)

Idea: behavioral patterns in history can help intent prediction.

previous turn


65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input multi-turn interaction

Output apps the user plans to launch

[Matrix figure: reasoning with feature-enriched MF over dialogues. Rows: train dialogues ("take this photo" / "tell vivian this is me in the lab" → CAMERA, IM; "check my grades on website" / "send an email to professor" → CHROME, EMAIL) and the test dialogue ("take a photo of this" / "send it to alice" → CAMERA, IM). Columns: lexical features (photo, check, camera, tell, send, …), behavior history (null, camera, chrome, email, …), and intended apps; MF fills in estimated scores (e.g. .85, .95).]

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015. Data available at http://AppDialogue.com

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction


66

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix | ASR: LM / MF-SLU | Transcripts: LM / MF-SLU
Word Observation | 25.1 / - | 26.1 / -

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix | ASR: MLR / MF-SLU | Transcripts: MLR / MF-SLU
Word Observation | 52.1 / - | 55.5 / -

LM: LM-based IR model (unsupervised); MLR: Multinomial Logistic Regression (supervised)

Experiments for Intent Prediction


67

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix | ASR: LM / MF-SLU | Transcripts: LM / MF-SLU
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix | ASR: MLR / MF-SLU | Transcripts: MLR / MF-SLU
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.

Experiments for Intent Prediction


68

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix | ASR: LM / MF-SLU | Transcripts: LM / MF-SLU
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 / - | 33.3 / -
Word + Type-Embedding-Based Semantics | 31.5 / - | 32.9 / -

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix | ASR: MLR / MF-SLU | Transcripts: MLR / MF-SLU
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 / - | 56.6 / -

Semantic enrichment provides rich cues to improve performance

Experiments for Intent Prediction


69

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix | ASR: LM / MF-SLU | Transcripts: LM / MF-SLU
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 / 34.2 (+6.8%) | 33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics | 31.5 / 32.2 (+2.1%) | 32.9 / 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix | ASR: MLR / MF-SLU | Transcripts: MLR / MF-SLU
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 / 55.7 (+3.3%) | 56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics

Experiments for Intent Prediction


70

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction

Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.


71

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences, User Modeling, Suggestions

Data: Back-end Data Bases, Services, and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"


72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


73

Conclusions

The work shows the feasibility of, and the potential for, improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances; better high-level intent prediction about follow-up behaviors.


74

Future Work

Apply the proposed technology to domain discovery: find domains not covered by the current systems but that users are interested in, to guide which domains to develop next.

Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable acquired knowledge.


75

[Architecture figure: word sequence x (w1, w2, …, wd) → word vectors lw → convolutional layer lc (matrix Wc) → pooling → utterance and slot vectors lf → knowledge graph propagation layer lp (matrix Wp) → semantic projection (matrix Ws) → semantic layer y, yielding posterior probabilities P(S1|U), …, P(Sn|U) and semantic relations R(U, S1), …, R(U, Sn) over slot candidates S1, …, Sn]

Towards Unsupervised Deep Learning


Treating MF as a one-layer neural net, we can add more layers to the model, towards unsupervised deep learning.

76

Take Home Message

Available: big data without annotations. Challenge: how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI — language to action: understand voice to control music, lights, etc.; teach the system to let friends in by face recognition, etc.


Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A
THANKS FOR YOUR ATTENTION!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP 2016.


Page 19: Statistical Learning from Dialogues for Intelligent Assistants

19

SDS Process – Spoken Language Understanding (SLU)

User

target

food   price   AMOD   NN

seeking   PREP_FOR

Intelligent Agent

seeking="find", target="eating place", price="cheap", food="taiwanese"

find a cheap eating place for taiwanese food


20

find a cheap eating place for taiwanese food

SDS Process – Spoken Language Understanding (SLU)

User

target

food   price   AMOD   NN

seeking   PREP_FOR

Intelligent Agent

seeking="find", target="eating place", price="cheap", food="taiwanese"

Semantic Decoding


21

find a cheap eating place for taiwanese food

SDS Process – Dialogue Management (DM)

User

target

food   price   AMOD   NN

seeking   PREP_FOR

SELECT restaurant restaurant.price="cheap" restaurant.food="taiwanese"

Intelligent Agent


22

SDS Process – Dialogue Management (DM): Surface Form Derivation (natural language)

User: "find a cheap eating place for taiwanese food"

Intelligent Agent: SELECT restaurant restaurant.price="cheap" restaurant.food="taiwanese"


23

SDS Process – Dialogue Management (DM)

User: "find a cheap eating place for taiwanese food"

SELECT restaurant restaurant.price="cheap" restaurant.food="taiwanese" -> Din Tai Fung, Boiling Point

Predicted intent: navigation


24

SDS Process – Dialogue Management (DM): Intent Prediction

User: "find a cheap eating place for taiwanese food"

SELECT restaurant restaurant.price="cheap" restaurant.food="taiwanese" -> Din Tai Fung, Boiling Point

Predicted intent: navigation


25

SDS Process – Natural Language Generation (NLG)

User: "find a cheap eating place for taiwanese food"

Intelligent Agent: "Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which do you want to choose? I can help you go there (navigation)."

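A minimal template-based NLG sketch for the system response above; the template wording and function name are assumptions, chosen to reproduce the slide's example.

```python
# Template-based NLG sketch (template text is an assumption for illustration).
def generate(price: str, food: str, places: list) -> str:
    return (f"{price.capitalize()} {food.capitalize()} eating places include "
            + ", ".join(places) + ". Which do you want to choose?")

print(generate("cheap", "taiwanese", ["Din Tai Fung", "Boiling Point"]))
# Cheap Taiwanese eating places include Din Tai Fung, Boiling Point. Which do you want to choose?
```
Production systems typically rank multiple templates or use statistical generation; a fixed template is enough to show where NLG sits in the pipeline.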

26

Required Knowledge

Domain ontology: seeking -PREP_FOR-> target; price -AMOD-> target; food -NN-> target

SELECT restaurant restaurant.price="cheap" restaurant.food="taiwanese"

Predicted intent: navigation

User: "find a cheap eating place for taiwanese food"

Required domain-specific information


27

Challenges for SDS: An SDS in a new domain requires
1) a hand-crafted domain ontology,
2) utterances labelled with semantic representations, and
3) an SLU component for mapping utterances into semantic representations.

Manual work results in high cost, long duration, and poor scalability of system development.

The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, in order to handle open-domain requests in a fully unsupervised way (in contrast to the prior focus on labelled, in-domain data).

Example: "find a cheap eating place for asian food" -> seeking="find", target="eating place", price="cheap", food="asian food"


28

Contributions

Domain ontology: seeking -PREP_FOR-> target; price -AMOD-> target; food -NN-> target

SELECT restaurant restaurant.price="cheap" restaurant.food="asian food"

Predicted intent: navigation

User: "find a cheap eating place for taiwanese food"

Ontology Induction (semantic slot)
Structure Learning (inter-slot relation)
Surface Form Derivation (natural language)
Semantic Decoding
Intent Prediction


29

Contributions

User: "find a cheap eating place for taiwanese food"

Ontology Induction, Structure Learning, Surface Form Derivation, Semantic Decoding, Intent Prediction


30

Contributions

Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation
SLU Modeling: Semantic Decoding, Intent Prediction

User: "find a cheap eating place for taiwanese food"


31

Knowledge Acquisition

1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

Unlabelled collection of restaurant-asking conversations -> Knowledge Acquisition -> organized domain knowledge (seeking, target, quantity, price, food connected by PREP_FOR, NN, and AMOD dependencies)

Knowledge Acquisition = Ontology Induction + Structure Learning + Surface Form Derivation


32

SLU Modeling

2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

"can i have a cheap restaurant" -> SLU component (organized domain knowledge + SLU modeling) -> price="cheap", target="restaurant", intent=navigation

SLU Modeling = Semantic Decoding + Intent Prediction


33

SDS Architecture – Contributions

ASR -> SLU -> DM (with domain knowledge) -> NLG; the SLU and domain-knowledge components are the current bottleneck, addressed here by Knowledge Acquisition and SLU Modeling.


34

SDS Flowchart

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling


35

SDS Flowchart – Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling


36

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


37

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances
Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015.

SLU Model: "can I have a cheap restaurant" -> target="restaurant", price="cheap"

Pipeline: an unlabeled collection is frame-semantically parsed; Ontology Induction builds the feature matrices F_w, F_s (Feature Model); Structure Learning builds the relation matrices R_w (Word Relation Model, from a lexical KG) and R_s (Slot Relation Model, from a semantic KG) for the Knowledge Graph Propagation Model; MF-SLU (SLU modeling by matrix factorization) produces the semantic representation.


38

Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory, in which words/phrases can be represented as frames. In "low fat milk", "milk" evokes the "food" frame and "low fat" fills the "descriptor" frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences.


39

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

"can i have a cheap restaurant" evokes the frames capability, expensiveness, and locale_by_use; each frame is a slot candidate, but only expensiveness and locale_by_use are good, domain-specific candidates.

1st issue: differentiate domain-specific frames from generic frames for SDSs.

Das et al., "Frame-semantic parsing," Computational Linguistics, 2014.


40

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

Frame-semantic parsing fills a word-observation / slot-candidate matrix: train utterances ("i would like a cheap restaurant", "find a restaurant with chinese food") and the test utterance ("show me a list of cheap restaurants") get binary entries for observed words (cheap, restaurant, food, ...) and slot candidates (expensiveness, locale_by_use, food); slot probabilities (e.g., .97, .95) are then estimated for the test utterance.

Idea: increase the weights of domain-specific slots and decrease the weights of others.


41

1st issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.

The word-observation / slot-candidate matrix (train: "i would like a cheap restaurant", "find a restaurant with chinese food"; test: "show me a list of cheap restaurants"; slot candidates: capability, locale_by_use, food, expensiveness, seeking, relational_quantity, desiring) is multiplied by a word relation matrix (Word Relation Model) and a slot relation matrix (Slot Relation Model) for slot induction.

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.

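The propagation step can be sketched numerically: multiplying a score vector by a slot relation matrix boosts slots that are strongly connected to observed ones. The relation weights below are toy assumptions, not learned values.

```python
# Score propagation sketch: observation scores times a slot relation matrix.
# Toy relation weights are assumptions; rows/columns are 3 slot candidates.
obs = [0.0, 1.0, 1.0]                 # test-utterance scores over 3 slots
R = [[1.0, 0.1, 0.0],
     [0.1, 1.0, 0.8],
     [0.0, 0.8, 1.0]]                 # symmetric slot relation matrix

# propagated[j] = sum_i obs[i] * R[i][j]
propagated = [sum(obs[i] * R[i][j] for i in range(3)) for j in range(3)]
print(propagated)  # [0.1, 1.8, 1.8]
```
Slots related to the two observed ones (including the unobserved first slot) receive nonzero mass after multiplication, which is exactly the propagation effect described above.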

42

Semantic Decoding [ACL-IJCNLPrsquo15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015

SLU Model

target=ldquorestaurantrdquoprice=ldquocheaprdquo

ldquocan I have a cheap restaurantrdquoFrame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology InductionFw Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation


43

Knowledge Graph Construction

Syntactic dependency parsing on utterances: "can i have a cheap restaurant" (frames: capability, expensiveness, locale_by_use; dependencies: ccomp, nsubj, dobj, det, amod).

Word-based lexical knowledge graph: word nodes can, i, have, a, cheap, restaurant.

Slot-based semantic knowledge graph: slot nodes capability, locale_by_use, expensiveness.

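Building the word-based lexical knowledge graph from dependency edges can be sketched as an adjacency map. The edge list follows the parse of "can i have a cheap restaurant" shown above; exact head/dependent pairs are an assumption for illustration.

```python
# Word-based lexical KG from dependency edges of the example utterance.
# EDGES are (head, dependent) pairs; the graph is treated as undirected.
EDGES = [("have", "can"), ("have", "i"), ("have", "restaurant"),
         ("restaurant", "a"), ("restaurant", "cheap")]

graph: dict[str, set] = {}
for head, dep in EDGES:
    graph.setdefault(head, set()).add(dep)
    graph.setdefault(dep, set()).add(head)

print(sorted(graph["restaurant"]))  # ['a', 'cheap', 'have']
```
The slot-based semantic KG is built the same way, with frames (capability, locale_by_use, expensiveness) as nodes instead of words.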

44

Edge Weight Measurement – Slot/Word Embeddings Training (Levy and Goldberg, 2014)

Dependency-based word embeddings (e.g., for "can", "have") and dependency-based slot embeddings (e.g., for expensiveness, capability) are trained from the dependency-parsed utterances such as "can i have a cheap restaurant" (ccomp, nsubj, dobj, det, amod).

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL 2014.


45

Edge Weight Measurement

Compute edge weights to represent relation importance:
- slot-to-slot semantic relation: similarity between slot embeddings
- slot-to-slot dependency relation: dependency score between slot embeddings
- word-to-word semantic relation: similarity between word embeddings
- word-to-word dependency relation: dependency score between word embeddings

(Combined graph over word nodes w1-w7 and slot nodes s1-s3.)

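The similarity-based edge weights above are typically cosine similarities between embeddings; a minimal sketch with toy 3-dimensional vectors (the vectors themselves are assumptions, real ones come from the dependency-based embedding training):

```python
import math

# Edge weight as cosine similarity between two embeddings (toy vectors).
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

expensiveness = [1.0, 0.0, 1.0]   # hypothetical slot embedding
food = [1.0, 1.0, 0.0]            # hypothetical slot embedding
print(round(cosine(expensiveness, food), 3))  # 0.5
```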

46

Knowledge Graph Propagation Model

The word relation matrix R_w^(SD) (Word Relation Model) and the slot relation matrix R_s^(SD) (Slot Relation Model) multiply the word-observation / slot-candidate training matrix for slot induction.

Structure information is integrated to make the self-training data more reliable.


47

Semantic Decoding [ACL-IJCNLP'15]

Ontology Induction and Structure Learning feed the SLU feature matrices F_w, F_s. In the word-observation / slot-candidate matrix, the test utterance "show me a list of cheap restaurants" carries hidden semantics: some relevant slots are never directly observed.

2nd issue: unobserved semantics may benefit understanding.


48

Reasoning with Matrix Factorization

The Feature Model plus the Knowledge Graph Propagation Model (with relation matrices R_w^(SD) and R_s^(SD)) performs slot induction; estimated probabilities (e.g., .97, .95, .93, .98) fill the unobserved cells of the word-observation / slot-candidate matrix.

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.


49

2nd issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and for words/slots, respectively; the product of the two matrices fills in the probability of the hidden semantics.

The |U| x (|W|+|S|) observation matrix is approximated by the product of a |U| x d matrix and a d x (|W|+|S|) matrix.

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI 2009.

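The low-rank factorization can be sketched with a truncated SVD: the observation matrix is split into |U| x d and d x (|W|+|S|) factors whose product scores every cell, observed or not. The toy matrix below is an assumption for illustration.

```python
import numpy as np

# Low-rank intuition behind MF: factor the observation matrix into
# (|U| x d) and (d x (|W|+|S|)) pieces; their product scores all cells.
M = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])        # toy utterance x (word+slot) matrix

U, s, Vt = np.linalg.svd(M)
d = 2                                  # latent dimension
left = U[:, :d] * s[:d]                # |U| x d factor
right = Vt[:d, :]                      # d x (|W|+|S|) factor
approx = left @ right                  # completed matrix
print(approx.shape)  # (3, 3)
```
Here M happens to have rank 2, so the d=2 product reconstructs it exactly; with real, partially-missing data the product instead generalizes graded scores to the empty cells. The actual model is trained with the BPR objective rather than SVD.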

50

Bayesian Personalized Ranking for MF

Model implicit feedback: do not treat unobserved facts as negative samples (true or false); instead, give observed facts higher scores than unobserved facts.

Objective: for each utterance u_x, learn a set of well-ranked semantic slots, scoring each observed slot f+ above the unobserved slots f-.

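The pairwise objective can be sketched as the negative log-sigmoid of the score margin between an observed and an unobserved slot; minimizing it pushes observed slots above unobserved ones without labelling the latter as negatives. The numbers below are illustrative.

```python
import math

# BPR-style pairwise loss: -ln sigma(f_pos - f_neg) = ln(1 + exp(-(f_pos - f_neg))).
def bpr_loss(f_pos: float, f_neg: float) -> float:
    return math.log(1.0 + math.exp(-(f_pos - f_neg)))

# A larger margin between observed and unobserved scores means a smaller loss.
print(round(bpr_loss(2.0, 0.0), 3), round(bpr_loss(0.5, 0.0), 3))  # 0.127 0.474
```
In training, the model parameters are updated by stochastic gradient steps on this loss over sampled (utterance, observed slot, unobserved slot) triples.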

51

Matrix Factorization SLU (MF-SLU)

Ontology Induction and Structure Learning feed the SLU feature matrices F_w, F_s; given a test utterance such as "show me a list of cheap restaurants", MF-SLU can estimate probabilities for its slot candidates.


52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances
Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015.

SLU Model: "can I have a cheap restaurant" -> target="restaurant", price="cheap"

Pipeline: an unlabeled collection is frame-semantically parsed; Ontology Induction builds the feature matrices F_w, F_s (Feature Model); Structure Learning builds the relation matrices R_w (Word Relation Model, from a lexical KG) and R_s (Slot Relation Model, from a semantic KG) for the Knowledge Graph Propagation Model; MF-SLU (SLU modeling by matrix factorization) produces the semantic representation.

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).


53

Experimental Setup

Dataset: Cambridge University SLU Corpus – restaurant recommendation (WER = 37%); 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type.

Metric: MAP of all estimated slot probabilities over all utterances, using a mapping table between induced and reference slots.

Henderson et al., "Discriminative spoken language understanding using word confusion networks," in Proc. of SLT 2012.

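The MAP metric used here averages, over utterances, the average precision of the ranked slot list against the reference slots. A minimal sketch (the ranked list and reference set are toy assumptions):

```python
# Average precision of one ranked slot list against reference slots;
# MAP is the mean of this value over all utterances.
def average_precision(ranked: list, relevant: set) -> float:
    hits, score = 0, 0.0
    for i, slot in enumerate(ranked, 1):
        if slot in relevant:
            hits += 1
            score += hits / i
    return score / len(relevant) if relevant else 0.0

ap = average_precision(["price", "area", "food"], {"price", "food"})
print(round(ap, 3))  # 0.833, i.e. (1/1 + 2/3) / 2
```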

54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus; metric: MAP of all estimated slot probabilities for all utterances.

Approach                                        ASR    Transcripts
Baseline SLU: Support Vector Machine            32.5   36.6
Baseline SLU: Multinomial Logistic Regression   34.0   38.8


55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus; metric: MAP of all estimated slot probabilities for all utterances.

Approach                                        ASR             Transcripts
Baseline SLU: Support Vector Machine            32.5            36.6
Baseline SLU: Multinomial Logistic Regression   34.0            38.8
Proposed MF-SLU: Feature Model                  37.6            45.3
Proposed MF-SLU: Feature Model +
  Knowledge Graph Propagation                   43.5 (+27.9%)   53.4 (+37.6%)

The MF-SLU effectively models implicit information to decode semantics, and the structure information further improves the results. (The result is significantly better than the MLR baseline with p < 0.05 in a t-test.)


56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus; metric: MAP of all estimated slot probabilities for all utterances.

Approach                                        ASR             Transcripts
Feature Model                                   37.6            45.3
Feature + Knowledge Graph Propagation
  Semantic relations                            41.4            51.6
  Dependency relations                          41.6            49.0
  All relations                                 43.5 (+15.7%)   53.4 (+17.9%)

In the integrated structure information, both semantic and dependency relations are useful for understanding. (The result is significantly better than the MLR baseline with p < 0.05 in a t-test.)


Experiments for Structure Learning: Relation Discovery Analysis

Discovered inter-slot relations connect important slot pairs. Induced ontology: locale_by_use linked to food, expensiveness, seeking, relational_quantity, and desiring via PREP_FOR, NN, AMOD, and DOBJ dependencies. Reference ontology (annotated with the most frequent syntactic dependencies): type linked to food, price range, area, and task via DOBJ, AMOD, and PREP_IN.

The automatically learned domain ontology aligns well with the reference one.

57

The data-driven ontology is more objective, while the expert-annotated one is more subjective.

58

Contributions of Semantic Decoding

Knowledge Acquisition (Ontology Induction, Structure Learning) and SLU Modeling (Semantic Decoding, Intent Prediction).

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt it to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.


59

Low- and High-Level Understanding

Semantic concepts for individual utterances do not capture high-level semantics (user intents); the follow-up behaviors usually correspond to user intents.

"can i have a cheap restaurant" -> SLU Model -> price="cheap", target="restaurant", intent=navigation

"i plan to dine in legume tonight" -> SLU Model -> restaurant="legume", time="tonight", intent=reservation


60

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart – Intent Prediction


61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


62

Intent Prediction of Mobile Apps [SLT'14c]
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality

Intent identification over popular domains in Google Play, e.g., "please dial a phone call to alex" -> Skype, Hangout, etc.

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT 2014.


63

Intent Prediction – Single-Turn Request

Input: a single-turn request
Output: apps that are able to support the required functionality

Reasoning with feature-enriched MF: the word-observation / intended-app matrix combines the test utterance (e.g., "i would like to contact alex"), self-train utterances, and app descriptions retrieved as candidates by IR (e.g., Outlook: "... your email calendar contacts ...", Gmail: "... check and send emails msgs ..."); feature enrichment adds semantics such as "communication" (contact, message, email), and app scores (e.g., .90, .85, .97, .95) are estimated for Gmail, Outlook, Skype.

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.

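The IR step that retrieves app candidates can be sketched as word-overlap scoring between the request and app descriptions. The descriptions and app names below are toy assumptions, not the actual Google Play data.

```python
# IR sketch: rank app candidates by word overlap with the request.
# Toy app descriptions are assumptions for illustration.
APPS = {
    "Outlook": "your email calendar contacts",
    "Gmail": "check and send emails msgs",
    "Camera": "take photos and videos",
}

def candidates(utterance: str) -> list:
    words = set(utterance.split())
    scores = {app: len(words & set(desc.split())) for app, desc in APPS.items()}
    # sorted() is stable, so ties keep the original insertion order
    return [app for app, s in sorted(scores.items(), key=lambda x: -x[1]) if s > 0]

print(candidates("check my email"))  # ['Outlook', 'Gmail']
```
The real pipeline uses a language-model-based IR model; overlap counting only illustrates how candidates enter the MF matrix.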

64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch

Challenge: language ambiguity from 1) user preference and 2) app-level contexts; e.g., "send to vivian" after a previous turn could mean Email or Message (Communication).

Idea: behavioral patterns in history can help intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015. Data available at http://AppDialogue.com.


65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch

Reasoning with feature-enriched MF: train dialogues pair lexical features, intended apps, and behavior history, e.g., "take this photo" -> CAMERA then "tell vivian this is me in the lab" -> IM, and "check my grades on websites" -> CHROME then "send an email to professor" -> EMAIL. For the test dialogue ("take a photo of this", "send it to alice"), app scores (e.g., .85, .70, .95, .80, .55) are estimated from the lexical features and the behavioral history.

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015. Data available at http://AppDialogue.com.

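The behavioral-pattern idea can be sketched as bigram statistics over launched apps: which app tends to follow which. The history below is a toy assumption mirroring the slide's example dialogues.

```python
from collections import Counter

# Behavioral-pattern sketch: count (previous app, next app) transitions
# from toy dialogue history and use them to bias intent prediction.
history = [("CAMERA", "IM"), ("CHROME", "EMAIL"), ("CAMERA", "IM")]
bigrams = Counter(history)

def next_app(prev: str):
    """Most frequent app launched after `prev`, or None if unseen."""
    follow = {nxt: c for (p, nxt), c in bigrams.items() if p == prev}
    return max(follow, key=follow.get) if follow else None

print(next_app("CAMERA"))  # IM
```
In the actual model these transition features are added as extra columns of the MF matrix rather than used as a standalone predictor.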

66

Experiments for Intent Prediction

Single-turn request, mean average precision (MAP); the LM-based IR model is the unsupervised baseline:

Feature matrix       ASR: LM / MF-SLU    Transcripts: LM / MF-SLU
Word observation     25.1 / -            26.1 / -

Multi-turn interaction, mean average precision (MAP); multinomial logistic regression (MLR) is the supervised baseline:

Feature matrix       ASR: MLR / MF-SLU   Transcripts: MLR / MF-SLU
Word observation     52.1 / -            55.5 / -


67

Experiments for Intent Prediction

Single-turn request (MAP):

Feature matrix       ASR: LM / MF-SLU        Transcripts: LM / MF-SLU
Word observation     25.1 / 29.2 (+16.2%)    26.1 / 30.4 (+16.4%)

Multi-turn interaction (MAP):

Feature matrix       ASR: MLR / MF-SLU       Transcripts: MLR / MF-SLU
Word observation     52.1 / 52.7 (+1.2%)     55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.


68

Experiments for Intent Prediction

Single-turn request (MAP):

Feature matrix                            ASR: LM / MF-SLU        Transcripts: LM / MF-SLU
Word observation                          25.1 / 29.2 (+16.2%)    26.1 / 30.4 (+16.4%)
Word + embedding-based semantics          32.0 / -                33.3 / -
Word + type-embedding-based semantics     31.5 / -                32.9 / -

Multi-turn interaction (MAP):

Feature matrix                            ASR: MLR / MF-SLU       Transcripts: MLR / MF-SLU
Word observation                          52.1 / 52.7 (+1.2%)     55.5 / 55.4 (-0.2%)
Word + behavioral patterns                53.9 / -                56.6 / -

Semantic enrichment provides rich cues to improve performance.


69

Experiments for Intent Prediction

Single-turn request (MAP):

Feature matrix                            ASR: LM / MF-SLU        Transcripts: LM / MF-SLU
Word observation                          25.1 / 29.2 (+16.2%)    26.1 / 30.4 (+16.4%)
Word + embedding-based semantics          32.0 / 34.2 (+6.8%)     33.3 / 33.3 (-0.2%)
Word + type-embedding-based semantics     31.5 / 32.2 (+2.1%)     32.9 / 34.0 (+3.4%)

Multi-turn interaction (MAP):

Feature matrix                            ASR: MLR / MF-SLU       Transcripts: MLR / MF-SLU
Word observation                          52.1 / 52.7 (+1.2%)     55.5 / 55.4 (-0.2%)
Word + behavioral patterns                53.9 / 55.7 (+3.3%)     56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.


70

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction

The feature-enriched MF-SLU for intent prediction is able to 1) unify knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.


71

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data: back-end databases, services, and client signals

Device/service end-points (phone, PC, Xbox, web browser, messaging apps)

User experience: "call taxi"


72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


73

Conclusions

This work shows the feasibility and the potential of improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.


74

Future Work

Apply the proposed technology to domain discovery: find domains not covered by current systems that users are nevertheless interested in, and use them to guide which domains to develop next.

Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition both propagate into SLU modeling.


75

Towards Unsupervised Deep Learning

Architecture sketch: a word sequence x (w1, w2, ..., wd) is mapped to word vectors l_w; a convolutional layer l_c (convolution matrix W_c) and a pooling operation produce an utterance vector l_f; slot vectors for candidates S1, ..., Sn yield semantic relation scores R(U, S1), ..., R(U, Sn); a knowledge graph propagation layer l_p (matrix W_p) and a semantic projection matrix W_s form the semantic layer y, giving posterior probabilities P(S1 | U), ..., P(Sn | U).


Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.

76

Take Home Message

Big data is available without annotations; the challenge is how to acquire and organize important knowledge and further utilize it for applications.

Language understanding maps language to action for AI: understand voice commands to control music, lights, etc., and teach the system, e.g., to let friends in by face recognition.


Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A – THANKS FOR YOUR ATTENTION

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU 2013 (Best Student Paper Award).
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," extended abstract, NIPS-SLU 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP 2016.

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

  • Statistical Learning from Dialogues for Intelligence Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants
  • Why do we need them
  • Why do we need them (2)
  • Why do companies care
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
Page 20: Statistical Learning from Dialogues for Intelligent Assistants

20

find a cheap eating place for taiwanese food

SDS Process ndash Spoken Language Understanding (SLU)

User

target

foodprice AMODNN

seeking PREP_FORIntelligent

Agent

seeking="find", target="eating place", price="cheap", food="taiwanese"

Semantic Decoding

"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

21

find a cheap eating place for taiwanese food

SDS Process ndash Dialogue Management (DM)

User

target

foodprice AMODNN

seeking PREP_FOR
SELECT restaurant, restaurant.price="cheap", restaurant.food="taiwanese"
Intelligent

Agent


22

find a cheap eating place for taiwanese food

SDS Process ndash Dialogue Management (DM)

User

target

foodprice AMODNN

seeking PREP_FOR
SELECT restaurant, restaurant.price="cheap", restaurant.food="taiwanese"
Intelligent

Agent

Surface Form Derivation(natural language)


23

SDS Process ndash Dialogue Management (DM)

User

SELECT restaurant, restaurant.price="cheap", restaurant.food="taiwanese"

Din Tai Fung, Boiling Point

Predicted intent navigation

Intelligent Agent

find a cheap eating place for taiwanese food


24

SDS Process ndash Dialogue Management (DM)

User

SELECT restaurant, restaurant.price="cheap", restaurant.food="taiwanese"

Din Tai Fung, Boiling Point

Predicted intent navigation

Intelligent Agent

Intent Prediction

find a cheap eating place for taiwanese food


25

SDS Process ndash Natural Language Generation (NLG)

User

Intelligent Agent

Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which one do you want to choose? I can help you go there (navigation).

find a cheap eating place for taiwanese food


26

Required Knowledge

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant, restaurant.price="cheap", restaurant.food="taiwanese"

Predicted intent navigation

User

Required Domain-Specific Information

find a cheap eating place for taiwanese food


27

Challenges for SDS: An SDS in a new domain requires
1) a hand-crafted domain ontology
2) utterances labelled with semantic representations
3) an SLU component for mapping utterances into semantic representations

Manual work results in high cost, long duration, and poor scalability of system development.

The goal is to enable an SDS to 1) automatically infer domain knowledge and then to 2) create the data for SLU modeling, in order to handle open-domain requests.

seeking="find", target="eating place", price="cheap", food="asian food"

find a cheap eating place for asian food

fully unsupervised

Prior Focus


28

Contributions

target

foodprice AMODNN

seeking PREP_FOR

SELECT restaurant, restaurant.price="cheap", restaurant.food="asian food"

Predicted intent navigation

find a cheap eating place for taiwanese food
User

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

(natural language)

(inter-slot relation)

(semantic slot)


29

Contributions
User

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

find a cheap eating place for taiwanese food


30

Ontology Induction Structure Learning Surface Form Derivation

Semantic Decoding Intent Prediction

Contributions
User

Knowledge Acquisition SLU Modeling

find a cheap eating place for taiwanese food


31

Knowledge Acquisition: 1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

Restaurant Asking

Conversations

target

foodprice

seeking

quantity

PREP_FOR

PREP_FOR

NN AMOD

AMODAMOD

Organized Domain Knowledge

Unlabelled Collection

Knowledge Acquisition

Knowledge Acquisition Ontology Induction Structure Learning Surface Form Derivation


32

SLU Modeling: 2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

Organized Domain

Knowledge

price="cheap", target="restaurant", intent=navigation

SLU Modeling

SLU Component

"can i have a cheap restaurant"

SLU Modeling Semantic Decoding Intent Prediction


33

SDS Architecture – Contributions

ASR → SLU → DM → NLG (with domain knowledge)

Knowledge Acquisition SLU Modeling

current bottleneck


34

SDS Flowchart

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling


35

SDS Flowchart – Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling


36

Outline: Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


37

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant"
Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology Induction (Fw, Fs)

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation


38

Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame semantics theory; words/phrases can be represented as frames. E.g., in "low fat milk", "milk" evokes the "food" frame, and "low fat" fills the descriptor frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences.


39

Ontology Induction [ASRU'13, SLT'14a]

can i have a cheap restaurant

Frame capability

Frame expensiveness

Frame locale by use

1st Issue: differentiate domain-specific frames from generic frames for SDSs


Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.

slot candidate

Best Student Paper Award


40

[Figure: word-observation / slot-candidate matrix built by frame-semantic parsing. Train — Utterance 1: "i would like a cheap restaurant"; Utterance 2: "find a restaurant with chinese food". Test utterance: "show me a list of cheap restaurants". Columns cover word observations (cheap, restaurant, food, …) and slot candidates (expensiveness, locale_by_use, food, …); training cells hold binary observations, and the test row holds estimated slot probabilities (e.g., .97, .95).]

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

Idea: increase the weights of domain-specific slots and decrease the weights of the others.


41

1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.

Word Relation Model (word relation matrix) × Slot Relation Model (slot relation matrix)

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.

[Figure: during slot induction, the word-observation / slot-candidate matrix for the training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants" is multiplied by relation matrices over words (i, like, cheap, restaurant, food, …) and slots (capability, locale_by_use, food, expensiveness, seeking, desiring, relational_quantity).]
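A propagation step of this kind can be sketched numerically: multiplying observation scores by a relation matrix lets strongly connected (domain-specific) nodes reinforce each other. A minimal sketch with a hypothetical three-slot relation matrix (the slot names follow the slides; the matrix values are made up for illustration):

```python
import numpy as np

# Hypothetical slot relation (similarity) matrix over three slot candidates:
# locale_by_use and expensiveness are strongly related (domain-specific),
# while capability is only weakly connected (generic).
slots = ["locale_by_use", "expensiveness", "capability"]
R_s = np.array([
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.1],
    [0.1, 0.1, 1.0],
])

# Initial observation scores: every slot candidate observed once.
f = np.array([1.0, 1.0, 1.0])

# One propagation step: each node accumulates its neighbors' scores.
f_prop = R_s @ f

scores = dict(zip(slots, f_prop))
# Domain-specific slots now outrank the generic one.
print(scores)
```

After the multiplication, `locale_by_use` and `expensiveness` score 2.0 while `capability` scores only 1.2, which is the re-weighting effect the slide describes.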


42

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant"
Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology Induction (Fw, Fs)

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation


43

Knowledge Graph Construction: syntactic dependency parsing on utterances.

[Figure: dependency parse of "can i have a cheap restaurant" (ccomp, nsubj, dobj, det, amod), where the words evoke the frames capability, expensiveness, and locale_by_use. From the parses we build a word-based lexical knowledge graph (nodes: can, i, have, a, cheap, restaurant) and a slot-based semantic knowledge graph (nodes: capability, locale_by_use, expensiveness).]
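The two-graph construction can be sketched as follows. The dependency edges and frame labels are hard-coded from the slide's example; a real pipeline would obtain them from a dependency parser and a frame-semantic parser such as SEMAFOR.

```python
# Build the word-based lexical KG and slot-based semantic KG from one
# parsed utterance (edges and evoked frames taken from the slide's example).
utterance = "can i have a cheap restaurant"

# (head, dependent, relation) edges of the dependency parse.
dep_edges = [
    ("have", "can", "ccomp"),
    ("have", "i", "nsubj"),
    ("have", "restaurant", "dobj"),
    ("restaurant", "a", "det"),
    ("restaurant", "cheap", "amod"),
]

# Words that evoke frames (slot candidates).
word_to_slot = {"can": "capability", "cheap": "expensiveness",
                "restaurant": "locale_by_use"}

# Word-based lexical knowledge graph: words linked by dependencies.
word_graph = {w: set() for w in utterance.split()}
for head, dep, _ in dep_edges:
    word_graph[head].add(dep)
    word_graph[dep].add(head)

# Slot-based semantic knowledge graph: slots linked when their evoking
# words are directly connected by a dependency edge.
slot_graph = {s: set() for s in word_to_slot.values()}
for head, dep, _ in dep_edges:
    if head in word_to_slot and dep in word_to_slot:
        s1, s2 = word_to_slot[head], word_to_slot[dep]
        slot_graph[s1].add(s2)
        slot_graph[s2].add(s1)

print(sorted(word_graph["restaurant"]))  # neighbors of "restaurant"
print(slot_graph["locale_by_use"])       # slots linked to locale_by_use
```

In this toy example only the amod edge connects two frame-evoking words, so locale_by_use and expensiveness become neighbors in the slot graph while capability stays isolated.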


44

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

[Figure: dependency-based word embeddings (e.g., vectors for "can", "have") and dependency-based slot embeddings (e.g., vectors for "expensiveness", "capability") are trained from the dependency-parsed utterance "can i have a cheap restaurant" (ccomp, nsubj, dobj, det, amod).]

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.


45

Edge Weight Measurement: compute edge weights to represent relation importance.
- Slot-to-slot semantic relation: similarity between slot embeddings
- Slot-to-slot dependency relation: dependency score between slot embeddings
- Word-to-word semantic relation: similarity between word embeddings
- Word-to-word dependency relation: dependency score between word embeddings

[Figure: word-level graph (w1–w7) and slot-level graph (s1–s3), whose edge weights combine the semantic and dependency relation scores.]
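Combining the two relation types can be sketched with cosine similarity over embeddings plus a dependency score. The embedding values and dependency scores below are invented for illustration; in the system they come from the dependency-based embedding training above and from corpus dependency statistics.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy dependency-based slot embeddings (made-up numbers).
emb = {
    "expensiveness": [0.9, 0.1, 0.3],
    "locale_by_use": [0.8, 0.2, 0.4],
    "capability":    [0.1, 0.9, 0.1],
}

# Hypothetical dependency scores between slot pairs (e.g., normalized
# counts of observed dependencies between their evoking words).
dep_score = {("expensiveness", "locale_by_use"): 0.7,
             ("expensiveness", "capability"): 0.1}

def edge_weight(s1, s2):
    # Edge weight = semantic relation (embedding similarity)
    #             + dependency relation (dependency score).
    return cosine(emb[s1], emb[s2]) + dep_score[(s1, s2)]

w_domain = edge_weight("expensiveness", "locale_by_use")
w_generic = edge_weight("expensiveness", "capability")
```

Domain-related slot pairs end up with heavier edges, which is what drives the propagation step toward domain-specific slots.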


46

Knowledge Graph Propagation Model: R_w (word relation matrix) and R_s (slot relation matrix)

[Figure: during slot induction, the training word-observation / slot-candidate matrix is multiplied by the word relation matrix and the slot relation matrix.]

Structure information is integrated to make the self-training data more reliable.


47

Ontology Induction

SLUFw Fs

Structure Learning

times

1

Utterance 1i would like a cheap restaurant

Word Observation Slot Candidate

Train

hellip

cheap restaurant foodexpensiveness

1

locale_by_use

11

find a restaurant with chinese foodUtterance 2

1 1

food

1 1

1

Test1 9790 9585

Ontology Induction

show me a list of cheap restaurantsTest Utterance hidden semantics

2nd Issue unobserved semantics may benefit understanding

Semantic Decoding [ACL-IJCNLP'15]


48

Reasoning with Matrix Factorization: Feature Model + Knowledge Graph Propagation Model

[Figure: the feature matrix (word observations and slot candidates over train/test utterances) is combined with the word relation matrix R_w and the slot relation matrix R_s; after factorization, previously empty cells receive estimated probabilities (e.g., .97, .90, .95, .85, .93, .92, .98, .05).]

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.


49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and words/slots, respectively. The product of the two matrices fills in the probabilities of the hidden semantics.

[Figure: the |U| × (|W|+|S|) word-observation / slot-candidate matrix is approximated by the product of a |U| × d matrix and a d × (|W|+|S|) matrix, filling the empty cells with estimated probabilities (e.g., .97, .90, .95, .85, .93, .92, .98, .05).]

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
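The low-rank completion idea can be illustrated with a plain truncated SVD (note: a simplification — the actual model is trained with the BPR objective described next, not with SVD). The toy matrix below has two words and two slot candidates per utterance, with one slot cell left unobserved:

```python
import numpy as np

# Word-observation / slot-candidate matrix: rows are utterances, columns
# are features (2 words + 2 slot candidates, toy data). The last
# utterance's slot cell is unobserved (0), although the utterance looks
# like the first two.
M = np.array([
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],   # hidden semantics: column 2 missing here
])

# Rank-d approximation: decompose into |U| x d and d x (|W|+|S|) factors,
# then multiply them back to fill the missing cells.
d = 1
U, s, Vt = np.linalg.svd(M, full_matrices=False)
M_hat = (U[:, :d] * s[:d]) @ Vt[:d, :]

hidden_score = M_hat[2, 2]   # the hidden slot now gets a positive score
```

Because the third utterance shares its latent pattern with the first two, the reconstruction assigns its missing slot a clearly positive score, which is exactly how MF "fills the probability of hidden semantics".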


50

Bayesian Personalized Ranking for MF: model implicit feedback
- do not treat unobserved facts as negative samples (true or false)
- give observed facts higher scores than unobserved facts

Objective: for each utterance u_x, maximize Σ ln σ(f⁺ − f⁻), so that the score f⁺ of each observed fact is ranked above the score f⁻ of each unobserved fact.

The objective is to learn a set of well-ranked semantic slots per utterance.
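A minimal BPR-style update can be sketched as gradient ascent on ln σ(f⁺ − f⁻) for one utterance and one observed/unobserved fact pair. Dimensions and data are toy values; the full model optimizes this over all utterances, words, and slot candidates.

```python
import math
import random

random.seed(0)

# Latent vectors for one utterance, one observed fact (slot_pos) and one
# unobserved fact (slot_neg); toy dimensionality d = 4.
d = 4
theta = {name: [random.uniform(-0.1, 0.1) for _ in range(d)]
         for name in ("utt", "slot_pos", "slot_neg")}

def score(u, v):
    return sum(a * b for a, b in zip(u, v))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

lr = 0.1
for _ in range(200):
    u, vp, vn = theta["utt"], theta["slot_pos"], theta["slot_neg"]
    x = score(u, vp) - score(u, vn)        # f+ - f-
    g = 1.0 - sigmoid(x)                   # d/dx of ln(sigmoid(x))
    for k in range(d):                     # gradient ascent step
        uk = u[k]
        u[k] += lr * g * (vp[k] - vn[k])
        vp[k] += lr * g * uk
        vn[k] -= lr * g * uk

f_pos = score(theta["utt"], theta["slot_pos"])
f_neg = score(theta["utt"], theta["slot_neg"])
```

After training, the observed fact is ranked above the unobserved one (f_pos > f_neg) without ever labeling the unobserved fact as false, which is the point of the implicit-feedback formulation. Regularization terms are omitted for brevity.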


51

[Figure: with ontology induction and structure learning feeding the factorized feature model, the trained MF-SLU fills the test row of the word-observation / slot-candidate matrix with slot probabilities (e.g., .97, .90, .95, .85) for the test utterance "show me a list of cheap restaurants"; train utterances: "i would like a cheap restaurant", "find a restaurant with chinese food".]

Matrix Factorization SLU (MF-SLU)

MF-SLU can estimate probabilities for slot candidates given test utterances


52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant"
Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology Induction (Fw, Fs)

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).


53

Experimental Setup. Dataset: Cambridge University SLU Corpus
- Restaurant recommendation (WER = 37%); 2,166 dialogues; 15,453 utterances
- dialogue slots: addr, area, food, name, phone, postcode, price range, task, type

Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots.

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
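The MAP metric used throughout these experiments can be computed as below (a generic sketch; the slot names and rankings are illustrative, not from the corpus):

```python
def average_precision(ranked, relevant):
    """AP for one utterance: slot candidates ranked by estimated
    probability, scored against the reference slot set."""
    hits, precisions = 0, []
    for i, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(relevant)

def mean_average_precision(rankings):
    """MAP over all utterances; each item is (ranked candidates, reference set)."""
    return sum(average_precision(r, rel) for r, rel in rankings) / len(rankings)

# Toy example: slots ranked by estimated probability for two utterances.
rankings = [
    (["food", "area", "pricerange"], {"food", "pricerange"}),
    (["addr", "phone"], {"addr"}),
]
map_score = mean_average_precision(rankings)
```

For the first utterance AP = (1/1 + 2/3) / 2 = 5/6; for the second AP = 1; so MAP = 11/12.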


54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus. Metric: MAP of all estimated slot probabilities for all utterances.

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8


55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus. Metric: MAP of all estimated slot probabilities for all utterances.

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
Proposed MF-SLU: Feature Model | 37.6 | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)

The MF-SLU effectively models implicit information to decode semantics, and the structure information further improves the results. (The result is significantly better than the MLR baseline with p < 0.05 in a t-test.)


56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus. Metric: MAP of all estimated slot probabilities for all utterances.

Approach | ASR | Transcripts
Feature Model | 37.6 | 45.3
Feature + Knowledge Graph Propagation: Semantic | 41.4 | 51.6
Feature + Knowledge Graph Propagation: Dependency | 41.6 | 49.0
Feature + Knowledge Graph Propagation: All | 43.5 (+15.7%) | 53.4 (+17.9%)

In the integrated structure information, both semantic and dependency relations are useful for understanding. (The result is significantly better than the MLR baseline with p < 0.05 in a t-test.)


Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs.

[Figure: the induced ontology (locale_by_use, food, expensiveness, seeking, desiring, relational_quantity, linked by PREP_FOR, NN, AMOD, and DOBJ edges) compared with the reference ontology annotated with the most frequent syntactic dependencies (type, food, price range, area, task, linked by DOBJ, AMOD, and PREP_IN edges).]

The automatically learned domain ontology aligns well with the reference one. The data-driven ontology is more objective, while the expert-annotated one is more subjective.

58

Contributions of Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.


59

Low- and High-Level Understanding: semantic concepts for individual utterances do not consider high-level semantics (user intents).

The follow-up behaviors usually correspond to user intents

price="cheap", target="restaurant"

SLU Model

"can i have a cheap restaurant"

intent=navigation

restaurant="legume", time="tonight"

SLU Model

"i plan to dine in legume tonight"

intent=reservation


60

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart – Intent Prediction


61

Outline: Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances for making requests about launching an app

Output: the apps supporting the required functionality

Intent Identification: popular domains in Google Play

please dial a phone call to alex

Skype, Hangout, etc.

Intent Prediction of Mobile Apps [SLT'14c]

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.


63

Input: single-turn request

Output: apps that are able to support the required functionality

Intent Prediction – Single-Turn Request

[Figure: Reasoning with Feature-Enriched MF. IR retrieves app candidates from app descriptions (e.g., Outlook: "… your email, calendar, contacts …"; Gmail: "… check and send emails, msgs …"). Self-train utterances and the test utterance "i would like to contact alex" form rows; columns hold word observations (contact, message, email, …), enriched semantic features (communication), and intended apps (Gmail, Outlook, Skype), with estimated probabilities (e.g., .90, .85, .97, .95) filled for the test row.]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.


64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

Challenge: language ambiguity. 1) User preference; 2) app-level contexts.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

[Figure: for the utterance "send to vivian", the app-level context from the previous turn disambiguates between Email and Message (Communication).]

Idea: behavioral patterns in history can help intent prediction.


65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

[Figure: Reasoning with Feature-Enriched MF over dialogues. Train dialogues pair utterances with intended apps, e.g., "take this photo" → CAMERA, "tell vivian this is me in the lab" → IM; "check my grades on websites" → CHROME, "send an email to professor" → EMAIL. The feature matrix combines lexical features (photo, check, camera, tell, send, …) with behavior-history features (null, camera, chrome, email). For the test dialogue "take a photo of this" / "send it to alice", the model fills intended-app probabilities (e.g., CAMERA .85, IM .70).]

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.


66

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / – | 26.1 / –
(LM: LM-based IR model, unsupervised)

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / – | 55.5 / –
(MLR: multinomial logistic regression, supervised)


67

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.


68

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 / – | 33.3 / –
Word + Type-Embedding-Based Semantics | 31.5 / – | 32.9 / –

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 / – | 56.6 / –

Semantic enrichment provides rich cues to improve performance.


69

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 / 34.2 (+6.8%) | 33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics | 31.5 / 32.2 (+2.1%) | 32.9 / 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 / 55.7 (+3.3%) | 56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.


70

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction: Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.


71

Personal Intelligent Architecture

Reactive Assistance: ASR, LU, Dialog, LG, TTS

Proactive Assistance: Inferences, User Modeling, Suggestions

Data: Data Bases, Back-end Data Services and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"


72

Outline: Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


73

Conclusions: the work shows the feasibility and the potential of improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:
- better semantic representations for individual utterances
- better high-level intent prediction about follow-up behaviors


74

Future Work: apply the proposed technology to domain discovery
- find domains that current systems do not cover but users are interested in, to guide which domains to develop next

Improve the proposed approach by handling the uncertainty:
- ASR → SLU modeling: recognition errors
- knowledge acquisition → SLU modeling: unreliable knowledge


75

Towards Unsupervised Deep Learning

[Figure: a convolutional architecture over the word sequence x = w1, w2, …, wd — word vectors lw, a convolutional layer lc (matrix Wc), a pooling operation into an utterance vector lf, a semantic projection matrix Ws into the semantic layer y, and a knowledge graph propagation layer lp (matrix Wp) producing relation scores R(U, S1), …, R(U, Sn) and posterior probabilities P(S1 | U), …, P(Sn | U) over slot candidates S1, …, Sn.]


Treating MF as a one-layer neural net, we can add more layers to the model towards unsupervised deep learning.

76

Take Home Message: big data is available without annotations.

Challenge: how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI turns language into action: understand voice to control music, lights, etc.; teach the system to let friends in by face recognition.


Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A: THANKS FOR YOUR ATTENTION

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.


  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q amp A
Page 21: Statistical Learning from Dialogues for Intelligent Assistants

21

find a cheap eating place for taiwanese food

SDS Process – Dialogue Management (DM)

User

[Diagram: slots "seeking", "target", "price", "food" connected by PREP_FOR, AMOD, and NN dependency edges]

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Intelligent Agent

"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

22

find a cheap eating place for taiwanese food

SDS Process – Dialogue Management (DM)

User

[Diagram: slots "seeking", "target", "price", "food" connected by PREP_FOR, AMOD, and NN dependency edges]

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Intelligent Agent

Surface Form Derivation (natural language)

23

SDS Process – Dialogue Management (DM)

User

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Din Tai Fung, Boiling Point

Predicted intent: navigation

Intelligent Agent

find a cheap eating place for taiwanese food

24

SDS Process – Dialogue Management (DM)

User

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Din Tai Fung, Boiling Point

Predicted intent: navigation

Intelligent Agent

Intent Prediction

find a cheap eating place for taiwanese food

25

SDS Process – Natural Language Generation (NLG)

User

Intelligent Agent

Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which one do you want to choose? I can help you go there (navigation).

find a cheap eating place for taiwanese food

26

Required Knowledge

[Diagram: slots "seeking", "target", "price", "food" connected by PREP_FOR, AMOD, and NN dependency edges]

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Predicted intent: navigation

User

Required Domain-Specific Information

find a cheap eating place for taiwanese food

27

Challenges for SDS: an SDS in a new domain requires

1) a hand-crafted domain ontology
2) utterances labelled with semantic representations
3) an SLU component for mapping utterances into semantic representations

Manual work results in high cost, long duration, and poor scalability of system development.

The goal is to enable an SDS to 1) automatically infer domain knowledge and then to 2) create the data for SLU modeling, in order to handle open-domain requests.

seeking="find", target="eating place", price="cheap", food="asian food"

find a cheap eating place for asian food

fully unsupervised

Prior Focus

28

Contributions

[Diagram: slots "seeking", "target", "price", "food" connected by PREP_FOR, AMOD, and NN dependency edges]

SELECT restaurant { restaurant.price="cheap", restaurant.food="asian food" }

Predicted intent: navigation

User: find a cheap eating place for taiwanese food

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

(natural language)

(inter-slot relation)

(semantic slot)

29

Contributions

User

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

find a cheap eating place for taiwanese food

30

Ontology Induction Structure Learning Surface Form Derivation

Semantic Decoding Intent Prediction

Contributions

User

Knowledge Acquisition SLU Modeling

find a cheap eating place for taiwanese food

31

Knowledge Acquisition

1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

Restaurant Asking

Conversations

[Diagram: induced ontology with slots "seeking", "target", "quantity", "price", "food" connected by PREP_FOR, NN, and AMOD edges]

Organized Domain Knowledge

Unlabelled Collection

Knowledge Acquisition

Knowledge Acquisition Ontology Induction Structure Learning Surface Form Derivation

32

SLU Modeling

2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

Organized Domain

Knowledge

price="cheap", target="restaurant", intent=navigation

SLU Modeling

SLU Component

"can i have a cheap restaurant"

SLU Modeling Semantic Decoding Intent Prediction

33

SDS Architecture – Contributions

[Diagram: SDS pipeline (ASR, SLU, DM, NLG) with the domain ontology]

Knowledge Acquisition SLU Modeling

current bottleneck

34

SDS Flowchart

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

35

SDS Flowchart – Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

36

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

37

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[Figure: an unlabeled collection is frame-semantically parsed; Ontology Induction yields the feature model (Fw, Fs) and Structure Learning yields the knowledge graph propagation model (Rw: word relation model over a lexical KG; Rs: slot relation model over a semantic KG); MF-SLU (SLU modeling by matrix factorization) produces the semantic representation, e.g. "can I have a cheap restaurant" → target="restaurant", price="cheap"]

38

Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory; words/phrases can be represented as frames. In "low fat milk", "milk" evokes the "food" frame,

and "low fat" fills the "descriptor" frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences.
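To make the frame/frame-element structure concrete, here is an illustrative sketch of how a SEMAFOR-style parse of "low fat milk" could be represented as plain data. The field names and the `slot_candidates` helper are assumptions for illustration, not SEMAFOR's actual output format.

```python
# Hypothetical frame-semantic parse of "low fat milk" as a plain dict.
parse = {
    "tokens": ["low", "fat", "milk"],
    "frames": [
        {
            "frame": "food",                          # frame evoked by the target word
            "target": {"text": "milk", "span": (2, 3)},
            "elements": [                             # frame elements filled by other spans
                {"name": "descriptor", "text": "low fat", "span": (0, 2)},
            ],
        }
    ],
}

def slot_candidates(parse):
    """Collect evoked frames as slot candidates (the ontology-induction view)."""
    return [f["frame"] for f in parse["frames"]]

print(slot_candidates(parse))  # ['food']
```

In the ontology-induction step described next, each evoked frame becomes a slot candidate whose weight is then adapted to the domain.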

39

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

can i have a cheap restaurant

Frame: capability
Frame: expensiveness (good slot candidate)
Frame: locale_by_use (good slot candidate)

1st Issue: differentiate domain-specific frames from generic frames for SDSs

Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.

40

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

[Figure: word-observation / slot-candidate matrix built by frame-semantic parsing; training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" have binary entries for words ("cheap", "restaurant", "food") and slot candidates ("expensiveness", "locale_by_use", "food"); the test utterance "show me a list of cheap restaurants" receives estimated slot scores such as .97 and .95]

Idea: increase weights of domain-specific slots and decrease weights of others

41

1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model
Assumption: domain-specific words/slots have more dependencies to each other.

Word Relation Model / Slot Relation Model

[Figure: the training word-observation / slot-candidate matrix is multiplied by a word relation matrix and a slot relation matrix for slot induction]

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.

[Figure: slot knowledge graph with nodes "locale_by_use", "food", "expensiveness", "seeking", "desiring", "relational_quantity", and the generic "capability"; training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food"; test utterance "show me a list of cheap restaurants"]
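The propagation idea above can be sketched in a few lines. This is a minimal illustration with assumed toy weights, not the paper's exact formulation: slots that are well connected to other domain-specific slots get boosted by one multiplication with a relation matrix, while weakly connected generic slots do not.

```python
import numpy as np

slots = ["locale_by_use", "expensiveness", "food", "capability"]
# Row-normalized slot-to-slot relation weights (assumed toy values).
R = np.array([
    [0.0, 0.5, 0.5, 0.0],   # locale_by_use relates to expensiveness, food
    [0.5, 0.0, 0.5, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],   # generic "capability" is weakly connected
])

scores = np.array([1.0, 1.0, 1.0, 1.0])   # initial slot scores
propagated = scores + R @ scores          # one propagation step
print(propagated)  # domain-specific slots rise above "capability"
```

After one step, the three interconnected domain slots score 2.0 while the isolated generic slot stays at 1.0, which is the effect the slide describes.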

42

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[Figure: an unlabeled collection is frame-semantically parsed; Ontology Induction yields the feature model (Fw, Fs) and Structure Learning yields the knowledge graph propagation model (Rw: word relation model over a lexical KG; Rs: slot relation model over a semantic KG); MF-SLU (SLU modeling by matrix factorization) produces the semantic representation, e.g. "can I have a cheap restaurant" → target="restaurant", price="cheap"]

43

Knowledge Graph Construction: syntactic dependency parsing on utterances

[Figure: dependency parse of "can i have a cheap restaurant" (ccomp, nsubj, dobj, det, amod), with "can" mapped to capability, "cheap" to expensiveness, and "restaurant" to locale_by_use]

Word-based lexical knowledge graph
[Figure: word nodes "can", "i", "have", "a", "cheap", "restaurant" connected by dependency edges]

Slot-based semantic knowledge graph
[Figure: slot nodes "capability", "locale_by_use", "expensiveness" connected by induced edges]

44

Dependency-based word embeddings (e.g. vectors for "can", "have")

Dependency-based slot embeddings (e.g. vectors for "expensiveness", "capability")

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

[Figure: dependency parses of "can i have a cheap restaurant" at the word level and at the slot level provide (target, context) pairs for embedding training]

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
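The key difference from window-based embeddings is the context definition: each word's contexts are its syntactic neighbors labeled with the dependency relation, rather than its linear neighbors. A minimal sketch, using an assumed toy parse (not a real parser):

```python
def dependency_contexts(edges):
    """edges: (head, relation, dependent) triples -> {word: [contexts]}"""
    ctx = {}
    for head, rel, dep in edges:
        ctx.setdefault(head, []).append(f"{dep}/{rel}")
        ctx.setdefault(dep, []).append(f"{head}/{rel}-1")  # inverse relation
    return ctx

# Toy dependency parse of "i have a cheap restaurant" (assumed edges).
parse = [("have", "nsubj", "i"),
         ("have", "dobj", "restaurant"),
         ("restaurant", "amod", "cheap"),
         ("restaurant", "det", "a")]
contexts = dependency_contexts(parse)
print(contexts["restaurant"])  # ['have/dobj-1', 'cheap/amod', 'a/det']
```

These (word, context) pairs would then be fed to a skip-gram-style trainer; the same construction applies at the slot level by replacing words with their evoked frames.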

45

Edge Weight Measurement: compute edge weights to represent relation importance

Slot-to-slot semantic relation: similarity between slot embeddings
Slot-to-slot dependency relation: dependency score between slot embeddings
Word-to-word semantic relation: similarity between word embeddings
Word-to-word dependency relation: dependency score between word embeddings

[Figure: word graph (w1..w7) and slot graph (s1, s2, s3) whose edge weights combine the relation scores above]

46

Knowledge Graph Propagation Model

Word Relation Model / Slot Relation Model

[Figure: the word-observation / slot-candidate training matrix is multiplied by the word relation matrix (Rw) and the slot relation matrix (Rs) for slot induction]

Structure information is integrated to make the self-training data more reliable

47

[Figure: Ontology Induction (Fw, Fs) and Structure Learning feed the SLU model; training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" are parsed, and the test utterance "show me a list of cheap restaurants" receives slot scores such as .97, .90, .95, .85; its hidden semantics remain unobserved]

2nd Issue: unobserved semantics may benefit understanding

Semantic Decoding [ACL-IJCNLP'15]

48

Reasoning with Matrix Factorization

Feature Model + Knowledge Graph Propagation Model

[Figure: the combined word-observation / slot-candidate matrix, weighted by the word and slot relation models (Rw, Rs), is factorized; test cells receive probabilities such as .97, .90, .95, .85, .93, .92, .98, .05]

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.

49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and words/slots, respectively.

The product of the two matrices fills in the probability of hidden semantics.

[Figure: the |U| × (|W|+|S|) observation matrix is approximated by the product of a |U| × d matrix and a d × (|W|+|S|) matrix]

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
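The low-rank completion idea can be sketched with plain SGD on a toy matrix. This is an assumed illustration, not the paper's training procedure: observed 1-entries of the utterance-by-(word+slot) matrix are fit by `U @ V.T`, and the learned product then fills in scores for unobserved cells.

```python
import numpy as np

rng = np.random.default_rng(0)
n_utt, n_feat, d = 4, 6, 2                 # |U|, |W|+|S|, latent dimension
observed = [(0, 0), (0, 4), (1, 1), (1, 5), (2, 0), (2, 4), (3, 1)]

U = rng.normal(scale=0.1, size=(n_utt, d))  # latent utterance vectors
V = rng.normal(scale=0.1, size=(n_feat, d)) # latent word/slot vectors
lr = 0.1
for _ in range(200):                        # fit observed cells to 1
    for i, j in observed:
        err = 1.0 - U[i] @ V[j]
        U[i] += lr * err * V[j]
        V[j] += lr * err * U[i]

M = U @ V.T                                 # completed matrix
print(round(float(M[0, 0]), 2), "hidden cell:", round(float(M[3, 5]), 2))
```

Observed cells end up close to 1, while unobserved cells like `M[3, 5]` receive scores induced by the shared low-rank structure, which is exactly the "hidden semantics" effect on the slide.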

50

Bayesian Personalized Ranking for MF: model implicit feedback

Do not treat unobserved facts as negative samples (true or false); give observed facts higher scores than unobserved facts.

Objective: maximize the log-likelihood that observed facts outrank unobserved ones,

max Σ ln σ( f⁺ − f⁻ )

The objective is to learn a set of well-ranked semantic slots per utterance u_x.
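A minimal BPR-style sketch with an assumed toy setup: for each utterance, an observed slot f⁺ is repeatedly pushed to outrank a sampled unobserved slot f⁻ by ascending the gradient of ln σ(f⁺ − f⁻), instead of treating f⁻ as a hard negative.

```python
import math
import random

random.seed(0)
n_slots, d, lr = 4, 3, 0.1
u = [0.01] * d                                   # latent utterance vector
V = [[random.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(n_slots)]
observed = {0, 2}                                # slots seen with this utterance

def score(u, v):
    return sum(a * b for a, b in zip(u, v))

for _ in range(500):
    pos = random.choice(sorted(observed))
    neg = random.choice([j for j in range(n_slots) if j not in observed])
    x = score(u, V[pos]) - score(u, V[neg])
    g = 1.0 / (1.0 + math.exp(x))                # sigma(-x), the BPR gradient weight
    for k in range(d):                           # gradient ascent on the ranking
        u[k] += lr * g * (V[pos][k] - V[neg][k])
        V[pos][k] += lr * g * u[k]
        V[neg][k] -= lr * g * u[k]

ranked = sorted(range(n_slots), key=lambda j: -score(u, V[j]))
print(ranked[:2])  # observed slots should be ranked on top
```

After training, the observed slots outrank the unobserved ones for this utterance, which is the "well-ranked semantic slots" objective stated above.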

51

[Figure: Ontology Induction and Structure Learning feed the factorized SLU model; for the test utterance "show me a list of cheap restaurants", slot candidates receive estimated probabilities such as .97, .90, .95, .85]

Matrix Factorization SLU (MF-SLU)

MF-SLU can estimate probabilities for slot candidates given test utterances

52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[Figure: an unlabeled collection is frame-semantically parsed; Ontology Induction yields the feature model (Fw, Fs) and Structure Learning yields the knowledge graph propagation model (Rw: word relation model over a lexical KG; Rs: slot relation model over a semantic KG); MF-SLU (SLU modeling by matrix factorization) produces the semantic representation, e.g. "can I have a cheap restaurant" → target="restaurant", price="cheap"]

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)

53

Experimental Setup

Dataset: Cambridge University SLU Corpus
Restaurant recommendation (WER = 37%), 2166 dialogues, 15453 utterances
Dialogue slots: addr, area, food, name, phone, postcode, price range, task, type

Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
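For readers unfamiliar with the metric, here is the standard MAP computation on assumed toy data: average precision of each utterance's ranked slot list, then the mean across utterances.

```python
def average_precision(ranked, relevant):
    """AP of one ranked slot list against a set of reference slots."""
    hits, precisions = 0, []
    for k, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(per_utterance):
    return sum(average_precision(r, rel) for r, rel in per_utterance) / len(per_utterance)

data = [  # (slots ranked by estimated probability, reference slots) -- toy values
    (["food", "pricerange", "area"], {"food", "pricerange"}),
    (["area", "food"], {"food"}),
]
print(mean_average_precision(data))  # (1.0 + 0.5) / 2 = 0.75
```

In the experiments that follow, this score is computed over all estimated slot probabilities for all utterances.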

54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

Approach (MAP on ASR / Transcripts)
Baseline SLU: Support Vector Machine, 32.5 / 36.6
Baseline SLU: Multinomial Logistic Regression, 34.0 / 38.8

55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

The MF-SLU effectively models implicit information to decode semantics.

The structure information further improves the results.

Approach (MAP on ASR / Transcripts)
Baseline SLU: Support Vector Machine, 32.5 / 36.6
Baseline SLU: Multinomial Logistic Regression, 34.0 / 38.8
Proposed MF-SLU: Feature Model, 37.6 / 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation, 43.5 (+27.9%) / 53.4 (+37.6%)

The result is significantly better than the MLR with p < 0.05 in t-test.

56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

In the integrated structure information, both semantic and dependency relations are useful for understanding.

Approach (MAP on ASR / Transcripts)
Feature Model, 37.6 / 45.3
Feature + Knowledge Graph Propagation: Semantic, 41.4 / 51.6
Feature + Knowledge Graph Propagation: Dependency, 41.6 / 49.0
Feature + Knowledge Graph Propagation: All, 43.5 (+15.7%) / 53.4 (+17.9%)

The result is significantly better than the MLR with p < 0.05 in t-test.

Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs.

The reference ontology with the most frequent syntactic dependencies:

[Figure: learned ontology (locale_by_use, food, expensiveness, seeking, desiring, relational_quantity, linked by PREP_FOR, NN, AMOD, DOBJ edges) alongside the reference ontology (type, food, pricerange, task, area, linked by DOBJ, AMOD, PREP_IN edges)]

The automatically learned domain ontology aligns well with the reference one

57

The data-driven one is more objective, while the expert-annotated one is more subjective.

58

Contributions of Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.

MF-SLU for Semantic Decoding is able to
1) unify the automatically acquired knowledge,
2) adapt to a domain-specific setting,
3) and then allow systems to model implicit semantics for better understanding.

59

Low- and High-Level Understanding

Semantic concepts for individual utterances do not consider high-level semantics (user intents).

The follow-up behaviors usually correspond to user intents.

price="cheap", target="restaurant"

SLU Model

"can i have a cheap restaurant"

intent=navigation

restaurant="legume", time="tonight"

SLU Model

"i plan to dine in legume tonight"

intent=reservation

60

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart – Intent Prediction

61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances for making requests about launching an app

Output: the apps supporting the required functionality

Intent Identification: popular domains in Google Play

please dial a phone call to alex

Skype, Hangout, etc.

Intent Prediction of Mobile Apps [SLT'14c]

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.

63

Input: single-turn request

Output: apps that are able to support the required functionality

Intent Prediction – Single-Turn Request

[Figure: feature-enriched MF over word observations (e.g. "contact", "message", "email") and intended apps (Gmail, Outlook, Skype); the test utterance "i would like to contact alex" is enriched with the semantic feature "communication" (.90); app candidates are retrieved by IR over app descriptions (Outlook: "your email calendar contacts", Gmail: "check and send emails msgs"), which also serve as self-train utterances]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents

64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input multi-turn interaction

Output apps the user plans to launch

Challenge: language ambiguity. 1) User preference 2) App-level contexts

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

send to vivianvs

Email, Message (Communication)

Idea Behavioral patterns in history can help intent prediction

previous turn

65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

[Figure: feature-enriched MF over lexical features, intended apps, and behavior history; training dialogues include "take this photo" / "tell vivian this is me in the lab" (CAMERA, IM) and "check my grades on websites" / "send an email to professor" (CHROME, EMAIL); for the test dialogue "take a photo of this" / "send it to alice", the model scores CAMERA and IM highly (e.g. .95, .85)]

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction

66

Single-Turn Request, Mean Average Precision (MAP)
Feature Matrix: Word Observation; LM 25.1 (ASR), 26.1 (Transcripts)

Multi-Turn Interaction, Mean Average Precision (MAP)
Feature Matrix: Word Observation; MLR 52.1 (ASR), 55.5 (Transcripts)

LM-Based IR Model (unsupervised); Multinomial Logistic Regression (supervised)

Experiments for Intent Prediction

67

Single-Turn Request, Mean Average Precision (MAP)
Word Observation: LM 25.1 → MF-SLU 29.2 (+16.2%) on ASR; LM 26.1 → MF-SLU 30.4 (+16.4%) on Transcripts

Multi-Turn Interaction, Mean Average Precision (MAP)
Word Observation: MLR 52.1 → MF-SLU 52.7 (+1.2%) on ASR; MLR 55.5 → MF-SLU 55.4 (-0.2%) on Transcripts

Modeling hidden semantics helps intent prediction, especially for noisy data.

Experiments for Intent Prediction

68

Single-Turn Request, Mean Average Precision (MAP)
Word Observation: LM 25.1 → MF-SLU 29.2 (+16.2%) on ASR; LM 26.1 → MF-SLU 30.4 (+16.4%) on Transcripts
Word + Embedding-Based Semantics: LM 32.0 (ASR), 33.3 (Transcripts)
Word + Type-Embedding-Based Semantics: LM 31.5 (ASR), 32.9 (Transcripts)

Multi-Turn Interaction, Mean Average Precision (MAP)
Word Observation: MLR 52.1 → MF-SLU 52.7 (+1.2%) on ASR; MLR 55.5 → MF-SLU 55.4 (-0.2%) on Transcripts
Word + Behavioral Patterns: MLR 53.9 (ASR), 56.6 (Transcripts)

Semantic enrichment provides rich cues to improve performance

Experiments for Intent Prediction

69

Single-Turn Request, Mean Average Precision (MAP)
Word Observation: LM 25.1 → MF-SLU 29.2 (+16.2%) on ASR; 26.1 → 30.4 (+16.4%) on Transcripts
Word + Embedding-Based Semantics: LM 32.0 → MF-SLU 34.2 (+6.8%) on ASR; 33.3 → 33.3 (-0.2%) on Transcripts
Word + Type-Embedding-Based Semantics: LM 31.5 → MF-SLU 32.2 (+2.1%) on ASR; 32.9 → 34.0 (+3.4%) on Transcripts

Multi-Turn Interaction, Mean Average Precision (MAP)
Word Observation: MLR 52.1 → MF-SLU 52.7 (+1.2%) on ASR; 55.5 → 55.4 (-0.2%) on Transcripts
Word + Behavioral Patterns: MLR 53.9 → MF-SLU 55.7 (+3.3%) on ASR; 56.6 → 57.7 (+1.9%) on Transcripts

Intent prediction can benefit from both hidden information and low-level semantics

Experiments for Intent Prediction

70

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction

Feature-Enriched MF-SLU for Intent Prediction is able to
1) unify the knowledge at different levels,
2) learn inference relations between various features,
3) and create personalized models by leveraging contextual behaviors.

71

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data: Back-end Data Bases, Services, and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"

72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

73

Conclusions

The work shows the feasibility and the potential for improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances
Better high-level intent prediction about follow-up behaviors

74

Future Work

Apply the proposed technology to domain discovery: domains not covered by the current systems but that users are interested in can guide the next domains to develop.

Improve the proposed approach by handling the uncertainty:

[Figure: ASR recognition errors affect SLU modeling; unreliable knowledge affects knowledge acquisition]

75

Towards Unsupervised Deep Learning

[Figure: deep architecture for SLU: word sequence x (w1, w2, ..., wd) → word vectors lw → convolutional layer lc (convolution matrix Wc) → pooling operation → utterance vector lf and slot vectors lf → knowledge graph propagation layer lp (propagation matrix Wp) → semantic layer y (semantic projection matrix Ws), producing relation scores R(U, S1), ..., R(U, Sn) and posterior probabilities P(S1 | U), ..., P(Sn | U) over slot candidates S1, ..., Sn for utterance U]

Treating MF as a one-layer neural net, we can add more layers to the model towards unsupervised deep learning.

76

Take Home Message

Big data is available without annotations. The challenge is how to acquire and organize important knowledge and further utilize it for applications.

Language understanding for AI maps language to action: understand voice to control music, lights, etc.; teach the system to let friends in by face recognition, etc.

Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A. Thanks for your attention!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

  • Statistical Learning from Dialogues for Intelligence Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants
  • Why do we need them
  • Why do we need them (2)
  • Why do companies care
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymaxrsquos intelligence
  • SDS Architecture
  • Interaction Example
  • SDS Process ndash Available Domain Ontology
  • SDS Process ndash Available Domain Ontology (2)
  • SDS Process ndash Available Domain Ontology (3)
  • SDS Process ndash Spoken Language Understanding (SLU)
  • SDS Process ndash Spoken Language Understanding (SLU) (2)
  • SDS Process ndash Dialogue Management (DM)
  • SDS Process ndash Dialogue Management (DM) (2)
  • SDS Process ndash Dialogue Management (DM) (3)
  • SDS Process ndash Dialogue Management (DM) (4)
  • SDS Process ndash Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture ndash Contributions
  • SDS Flowchart
  • SDS Flowchart ndash Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLPrsquo15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a]
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
  • 1st Issue How to adapt generic slots to a domain-specific sett
  • Semantic Decoding [ACL-IJCNLPrsquo15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement SlotWord Embeddings Training (Levy and
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLPrsquo15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue How to model the unobserved hidden semantics Matrix
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLPrsquo15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding Quality of Semantics Estimatio
  • Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
  • Experiments of Semantic Decoding Effectiveness of Relations
  • Experiments for Structure Learning Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart ndash Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLTrsquo14c]
  • Intent Prediction ndash Single-Turn Request
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q amp A
Page 22: Statistical Learning from Dialogues for Intelligent Assistants

22

SDS Process – Dialogue Management (DM)

User: "find a cheap eating place for taiwanese food"

(Semantic graph: seeking –PREP_FOR→ target; price –AMOD→ food, NN)

Intelligent Agent: SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Surface Form Derivation (natural language)

"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

23

SDS Process – Dialogue Management (DM)

User: "find a cheap eating place for taiwanese food"

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }
→ Din Tai Fung, Boiling Point, …

Predicted intent: navigation

Intelligent Agent

24

SDS Process – Dialogue Management (DM)

User: "find a cheap eating place for taiwanese food"

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }
→ Din Tai Fung, Boiling Point, …

Intent Prediction → predicted intent: navigation

Intelligent Agent

25

SDS Process – Natural Language Generation (NLG)

User: "find a cheap eating place for taiwanese food"

Intelligent Agent: "Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which one do you want to choose? I can help you go there. (navigation)"

26

Required Knowledge

User: "find a cheap eating place for taiwanese food"

(Semantic graph: seeking –PREP_FOR→ target; price –AMOD→ food, NN)

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Predicted intent: navigation

Required Domain-Specific Information

27

Challenges for SDS

An SDS in a new domain requires:
1) a hand-crafted domain ontology
2) utterances labelled with semantic representations
3) an SLU component for mapping utterances into semantic representations

Manual work results in high cost, long development time, and poor scalability of system development.

The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, in order to handle open-domain requests, fully unsupervised.

Example: "find a cheap eating place for asian food" → seeking="find", target="eating place", price="cheap", food="asian food"

28

Contributions

User: "find a cheap eating place for taiwanese food"

Ontology Induction (semantic slot) → Structure Learning (inter-slot relation) → Surface Form Derivation (natural language) → Semantic Decoding → Intent Prediction

SELECT restaurant { restaurant.price="cheap", restaurant.food="asian food" }; predicted intent: navigation

29

Contributions

User: "find a cheap eating place for taiwanese food"

Ontology Induction · Structure Learning · Surface Form Derivation · Semantic Decoding · Intent Prediction

30

Contributions

Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation
SLU Modeling: Semantic Decoding, Intent Prediction

User: "find a cheap eating place for taiwanese food"

31

Knowledge Acquisition

1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

Unlabelled Collection (restaurant-asking conversations) → Knowledge Acquisition → Organized Domain Knowledge (slots such as seeking, target, food, price, quantity, linked by relations such as PREP_FOR, NN, AMOD)

Knowledge Acquisition = Ontology Induction + Structure Learning + Surface Form Derivation

32

SLU Modeling

2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

"can i have a cheap restaurant" → SLU Component (with Organized Domain Knowledge) → price="cheap", target="restaurant", intent=navigation

SLU Modeling = Semantic Decoding + Intent Prediction

33

SDS Architecture – Contributions

(Pipeline: ASR → SLU, with the domain ontology → DM → NLG. Knowledge Acquisition and SLU Modeling address the current bottleneck, the SLU component.)

34

SDS Flowchart

Knowledge Acquisition: Ontology Induction, Structure Learning
SLU Modeling: Semantic Decoding, Intent Prediction

35

SDS Flowchart – Semantic Decoding

Knowledge Acquisition: Ontology Induction, Structure Learning
SLU Modeling: Semantic Decoding, Intent Prediction

36

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

37

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances
Output: semantic concepts included in each individual utterance

"can I have a cheap restaurant" → SLU Model → target="restaurant", price="cheap"

(System diagram: Unlabeled Collection → Frame-Semantic Parsing → Ontology Induction over a semantic KG → Feature Model (Fw, Fs); Structure Learning over a lexical KG and a semantic KG → Knowledge Graph Propagation Model with word relation model Rw and slot relation model Rs; MF-SLU: SLU modeling by matrix factorization → semantic representation.)

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

38

Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistically-principled semantic resource based on frame-semantics theory; words and phrases can be represented as frames. Example: in "low fat milk", "milk" evokes the "food" frame, and "low fat" fills the descriptor frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser, trained on manually annotated FrameNet sentences.

39

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

"can i have a cheap restaurant" → evoked frames: capability, expensiveness, locale_by_use

Each evoked frame is a slot candidate, but only some are good domain-specific slots (expensiveness, locale_by_use), while others are generic (capability).

1st issue: differentiate domain-specific frames from generic frames for SDSs.

Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.

40

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

(Matrix illustration: rows are utterances, columns are word observations and slot candidates. Train: Utterance 1 "i would like a cheap restaurant" observes cheap, restaurant; expensiveness, locale_by_use. Utterance 2 "find a restaurant with chinese food" observes restaurant, food; locale_by_use, food. Test: "show me a list of cheap restaurants", where frame-semantic parsing yields estimated slot probabilities such as .97 and .95.)

Idea: increase the weights of domain-specific slots and decrease the weights of the others.

41

1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.

(Illustration: the word relation matrix and the slot relation matrix multiply the word-observation / slot-candidate matrix for slot induction. Slot candidates include locale_by_use, food, expensiveness, seeking, relational_quantity, desiring, capability. Train utterances: "i would like a cheap restaurant", "find a restaurant with chinese food"; test utterance: "show me a list of cheap restaurants".)

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
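A minimal numpy sketch of this propagation step, with an invented 3-slot relation matrix (the real weights come from the learned knowledge graph):

```python
import numpy as np

# Toy relation matrix over 3 slot candidates; entry [i, j] is the edge
# weight between slots i and j in the semantic knowledge graph.
# Domain-specific slots (0, 1) are strongly connected; slot 2 is generic.
R = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.1],
              [0.1, 0.1, 1.0]])

x = np.array([1.0, 1.0, 1.0])   # initial (uniform) slot scores
propagated = R @ x               # one propagation step

# Well-connected domain slots end up with higher scores than the generic one.
assert propagated[0] > propagated[2] and propagated[1] > propagated[2]
```

The multiplication rewards nodes with many strong neighbors, which is exactly the "domain-specific words/slots have more dependencies to each other" assumption above.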

42

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances
Output: semantic concepts included in each individual utterance

"can I have a cheap restaurant" → SLU Model → target="restaurant", price="cheap"

(System diagram: Unlabeled Collection → Frame-Semantic Parsing → Ontology Induction over a semantic KG → Feature Model (Fw, Fs); Structure Learning over a lexical KG and a semantic KG → Knowledge Graph Propagation Model with word relation model Rw and slot relation model Rs; MF-SLU: SLU modeling by matrix factorization → semantic representation.)

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

43

Knowledge Graph Construction

Syntactic dependency parsing on utterances, e.g. "can i have a cheap restaurant" (frames: capability, expensiveness, locale_by_use) with dependencies ccomp, nsubj, dobj, det, amod.

Two graphs are built:
- a word-based lexical knowledge graph (nodes: can, i, have, a, cheap, restaurant)
- a slot-based semantic knowledge graph (nodes: capability, locale_by_use, expensiveness)
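A sketch of the word-graph construction, using a hand-written dependency edge list for the example utterance (a real system would obtain these edges from a syntactic parser):

```python
from collections import defaultdict

# Hypothetical dependency parse of "can i have a cheap restaurant",
# written out by hand for illustration: (head, dependent, relation).
dep_edges = [
    ("have", "can", "ccomp"),
    ("have", "i", "nsubj"),
    ("have", "restaurant", "dobj"),
    ("restaurant", "a", "det"),
    ("restaurant", "cheap", "amod"),
]

graph = defaultdict(set)
for head, dep, _rel in dep_edges:
    graph[head].add(dep)
    graph[dep].add(head)   # the lexical graph is undirected

print(sorted(graph["restaurant"]))  # → ['a', 'cheap', 'have']
```

The slot-based graph is built the same way, with the frames evoked by the words (capability, expensiveness, locale_by_use) as nodes.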

44

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

Dependency-based word embeddings: each word (e.g. "can", "have") is represented by a vector trained from its dependency contexts in "can i have a cheap restaurant" (ccomp, nsubj, dobj, det, amod).

Dependency-based slot embeddings: each slot (e.g. "expensiveness", "capability") is embedded the same way, from the frame-annotated dependency parse.

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.

45

Edge Weight Measurement

Compute edge weights to represent relation importance:
- Slot-to-slot semantic relation: similarity between slot embeddings
- Slot-to-slot dependency relation: dependency score between slot embeddings
- Word-to-word semantic relation: similarity between word embeddings
- Word-to-word dependency relation: dependency score between word embeddings

(Illustration: a lexical graph over words w1–w7 combined with a semantic graph over slots s1–s3.)
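For instance, the semantic edge weight between two slots can be computed as the cosine similarity of their embeddings; the vectors below are made-up 4-d stand-ins for trained dependency-based embeddings:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity, used as a semantic edge weight."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical embeddings; real ones come from dependency-based training.
emb = {
    "expensiveness": np.array([0.9, 0.1, 0.0, 0.2]),
    "food":          np.array([0.8, 0.2, 0.1, 0.3]),
    "capability":    np.array([0.0, 0.9, 0.8, 0.1]),
}

w_semantic = cosine(emb["expensiveness"], emb["food"])
# Two domain slots should be closer than a domain slot and a generic one.
assert w_semantic > cosine(emb["expensiveness"], emb["capability"])
```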

46

Knowledge Graph Propagation Model

(Illustration: the word relation matrix Rw and the slot relation matrix Rs multiply the word-observation / slot-candidate matrix for slot induction.)

Structure information is integrated to make the self-training data more reliable.

47

Semantic Decoding [ACL-IJCNLP'15]

(Illustration: Ontology Induction produces the feature matrices Fw and Fs; Structure Learning supplies the relation matrices. Train: "i would like a cheap restaurant", "find a restaurant with chinese food". Test: "show me a list of cheap restaurants", with estimated slot probabilities such as .97, .90, .95, .85, and with hidden semantics left unobserved.)

2nd issue: unobserved (hidden) semantics may benefit understanding.

48

Feature Model + Knowledge Graph Propagation Model: Reasoning with Matrix Factorization

(Illustration: the word/slot relation matrices propagate evidence, and matrix factorization fills the missing cells of the observation matrix with estimated probabilities.)

Idea: MF completes a partially-missing matrix under a low-rank latent-semantics assumption, which is able to model hidden semantics and is more robust to noisy data.

49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The utterance-by-feature matrix is decomposed as |U| × (|W|+|S|) ≈ (|U| × d) · (d × (|W|+|S|)); the decomposed matrices represent latent semantics for utterances and for words/slots, respectively. The product of the two matrices fills in the probability of hidden semantics.

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
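The low-rank completion can be sketched with plain numpy gradient steps; the 3×4 matrix below is a toy stand-in for the real feature matrices, with 1 meaning "observed" and 0 "unobserved":

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy utterance-by-feature matrix over [cheap, restaurant,
# expensiveness, locale_by_use]. Row 2 is a test utterance whose
# slot cells are hidden (unobserved, not negative).
M = np.array([[1.0, 1.0, 1.0, 1.0],
              [1.0, 0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0, 0.0]])
mask = M == 1.0                          # fit only the observed cells

d = 2                                    # latent dimension
U = rng.normal(scale=0.1, size=(3, d))   # latent semantics of utterances
V = rng.normal(scale=0.1, size=(4, d))   # latent semantics of words/slots

for _ in range(2000):                    # gradient steps on squared error
    err = (M - U @ V.T) * mask
    U += 0.1 * (err @ V - 0.01 * U)
    V += 0.1 * (err.T @ U - 0.01 * V)

scores = U @ V.T        # dense matrix: hidden cells now carry scores
print(scores[2, 2:])    # estimated slot scores for the test utterance
```

The hidden cells receive whatever the low-rank structure implies; since the test row shares its word pattern with utterance 1, its slot scores are pulled up rather than treated as false.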

50

Bayesian Personalized Ranking for MF

Model implicit feedback:
- do not treat unobserved facts as negative samples (true or false)
- give observed facts higher scores than unobserved facts

Objective: for each utterance x, rank every observed fact f⁺ above every unobserved fact f⁻, i.e. maximize Σ ln σ(score(x, f⁺) − score(x, f⁻)) over the sampled triples.

The objective is to learn a set of well-ranked semantic slots per utterance.
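The pairwise update can be sketched as SGD on a single (utterance, f⁺, f⁻) triple with toy random vectors (the real model iterates over all sampled triples, with regularization):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
d = 4
u  = rng.normal(size=d)   # latent vector for utterance x
fp = rng.normal(size=d)   # factor of an observed ("positive") slot f+
fn = rng.normal(size=d)   # factor of an unobserved slot f-

lr = 0.1
for _ in range(200):
    diff = u @ fp - u @ fn   # BPR wants this score difference to be large
    g = sigmoid(-diff)       # gradient weight of the log-sigmoid loss
    u  += lr * g * (fp - fn)
    fp += lr * g * u
    fn -= lr * g * u

# The observed slot is now ranked above the unobserved one,
# without ever treating the unobserved cell as a negative label.
assert u @ fp > u @ fn
```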

51

Matrix Factorization SLU (MF-SLU)

(Illustration: Ontology Induction provides Fw and Fs; Structure Learning provides the relation matrices. Train: "i would like a cheap restaurant", "find a restaurant with chinese food"; test: "show me a list of cheap restaurants", with estimated probabilities such as .97, .90, .95, .85.)

MF-SLU can estimate probabilities for slot candidates given test utterances.

52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances
Output: semantic concepts included in each individual utterance

"can I have a cheap restaurant" → SLU Model → target="restaurant", price="cheap"

(System diagram: Unlabeled Collection → Frame-Semantic Parsing → Ontology Induction over a semantic KG → Feature Model (Fw, Fs); Structure Learning over a lexical KG and a semantic KG → Knowledge Graph Propagation Model with word relation model Rw and slot relation model Rs; MF-SLU: SLU modeling by matrix factorization → semantic representation.)

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

53

Experimental Setup

Dataset: Cambridge University SLU corpus, restaurant recommendation (WER = 37%); 2,166 dialogues, 15,453 utterances; slots: addr, area, food, name, phone, postcode, price range, task, type.

Metric: MAP of all estimated slot probabilities over all utterances, with a mapping table between induced and reference slots.

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
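As a concrete reading of the metric, MAP averages per-utterance average precision of the ranked slot list against the reference slots; the rankings below are hypothetical, not from the corpus:

```python
def average_precision(ranked, relevant):
    """AP of one utterance's ranked slot list against its reference slots."""
    hits, score = 0, 0.0
    for i, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            score += hits / i
    return score / max(len(relevant), 1)

# Two hypothetical utterances with their induced rankings.
aps = [
    average_precision(["price", "food", "area"], {"price", "food"}),
    average_precision(["task", "price"], {"price"}),
]
map_score = sum(aps) / len(aps)
print(map_score)  # → 0.75
```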

54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU corpus. Metric: MAP of all estimated slot probabilities for all utterances.

Approach                                         ASR    Transcripts
Baseline SLU: Support Vector Machine             32.5   36.6
Baseline SLU: Multinomial Logistic Regression    34.0   38.8

55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU corpus. Metric: MAP of all estimated slot probabilities for all utterances.

Approach                                         ASR            Transcripts
Baseline SLU: Support Vector Machine             32.5           36.6
Baseline SLU: Multinomial Logistic Regression    34.0           38.8
Proposed MF-SLU: Feature Model                   37.6           45.3
Proposed MF-SLU: Feature Model
  + Knowledge Graph Propagation                  43.5 (+27.9%)  53.4 (+37.6%)

The MF-SLU effectively models implicit information to decode semantics, and the structure information further improves the results. (Marked results are significantly better than the MLR baseline, p < 0.05, t-test.)

56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU corpus. Metric: MAP of all estimated slot probabilities for all utterances.

Approach                                   ASR            Transcripts
Feature Model                              37.6           45.3
Feature + KG Propagation: Semantic         41.4           51.6
Feature + KG Propagation: Dependency       41.6           49.0
Feature + KG Propagation: All              43.5 (+15.7%)  53.4 (+17.9%)

In the integrated structure information, both semantic and dependency relations are useful for understanding. (Marked results are significantly better than the MLR baseline, p < 0.05, t-test.)

Experiments for Structure Learning: Relation Discovery Analysis

The discovered inter-slot relations connect important slot pairs (e.g. locale_by_use, food, expensiveness, seeking, relational_quantity, desiring, linked by PREP_FOR, NN, AMOD, DOBJ).

The reference ontology pairs the most frequent syntactic dependencies with slots such as type, food, price range, area, task (via DOBJ, AMOD, PREP_IN).

The automatically learned domain ontology aligns well with the reference one.

The data-driven ontology is more objective, while the expert-annotated one is more subjective.

58

Contributions of Semantic Decoding

(Flowchart: Knowledge Acquisition = Ontology Induction + Structure Learning; SLU Modeling = Semantic Decoding + Intent Prediction.)

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.

59

Low- and High-Level Understanding

Semantic concepts for individual utterances do not capture high-level semantics (user intents); the follow-up behaviors usually correspond to user intents.

"can i have a cheap restaurant" → SLU Model → price="cheap", target="restaurant"; intent=navigation
"i plan to dine in legume tonight" → SLU Model → restaurant="legume", time="tonight"; intent=reservation

60

SDS Flowchart – Intent Prediction

Knowledge Acquisition: Ontology Induction, Structure Learning
SLU Modeling: Semantic Decoding, Intent Prediction

61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

62

Intent Prediction of Mobile Apps [SLT'14c] [Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality

Intent identification covers popular domains in Google Play, e.g. "please dial a phone call to alex" → Skype, Hangout, etc.

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.

63

Intent Prediction – Single-Turn Request

Input: a single-turn request
Output: apps that are able to support the required functionality

(Illustration of reasoning with feature-enriched MF: the test utterance "i would like to contact alex" is enriched with the semantic feature "communication" (weight .90); IR over app descriptions, e.g. Outlook "your email calendar contacts…" and Gmail "check and send emails msgs…", retrieves app candidates for self-training; the matrix relates word observations such as contact, message, email to intended apps such as Gmail, Outlook, Skype.)

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.

64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: a multi-turn interaction
Output: apps the user plans to launch

Challenge: language ambiguity, resolved via 1) user preference and 2) app-level contexts. Example: "send to vivian" could mean Email or Message (Communication), depending on the previous turn.

Idea: behavioral patterns in the interaction history can help intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com.

65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: a multi-turn interaction
Output: apps the user plans to launch

(Illustration of reasoning with feature-enriched MF: train dialogues pair utterances with intended apps, e.g. "take this photo" → CAMERA, "tell vivian this is me in the lab" → IM, "check my grades on websites" → CHROME, "send an email to professor" → EMAIL, together with behavior-history features recording the previously launched app (null, camera, chrome, email). For the test dialogue "take a photo of this / send it to alice" the model estimates app probabilities such as .85 for CAMERA and .70 for IM.)

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com.

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
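One way to read "feature-enriched" is that each matrix row concatenates lexical observations with behavioral context; a toy featurizer with an invented vocabulary and app set:

```python
# Sketch: one row of the feature-enriched matrix for a multi-turn
# utterance, concatenating lexical features with the previously
# launched app. Vocabulary and app list are toy examples.
words = ["send", "photo", "take"]
apps  = ["CAMERA", "IM", "EMAIL", "CHROME"]

def featurize(utterance, previous_app):
    lex = [1 if w in utterance.split() else 0 for w in words]
    ctx = [1 if a == previous_app else 0 for a in apps]
    return lex + ctx   # one row: lexical + behavioral features

row = featurize("send it to alice", previous_app="CAMERA")
assert row == [1, 0, 0, 1, 0, 0, 0]
```

The behavioral columns let the factorization learn, for example, that "send" after CAMERA usually means IM rather than EMAIL.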

66

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP), LM-based IR model (unsupervised):
Feature Matrix      ASR: LM   Transcripts: LM
Word Observation    25.1      26.1

Multi-Turn Interaction, Mean Average Precision (MAP), Multinomial Logistic Regression (supervised):
Feature Matrix      ASR: MLR  Transcripts: MLR
Word Observation    52.1      55.5

67

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):
Feature Matrix      ASR: LM / MF-SLU       Transcripts: LM / MF-SLU
Word Observation    25.1 / 29.2 (+16.2%)   26.1 / 30.4 (+16.4%)

Multi-Turn Interaction, Mean Average Precision (MAP):
Feature Matrix      ASR: MLR / MF-SLU      Transcripts: MLR / MF-SLU
Word Observation    52.1 / 52.7 (+1.2%)    55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.

68

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):
Feature Matrix                           ASR: LM / MF-SLU       Transcripts: LM / MF-SLU
Word Observation                         25.1 / 29.2 (+16.2%)   26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics         32.0 / –               33.3 / –
Word + Type-Embedding-Based Semantics    31.5 / –               32.9 / –

Multi-Turn Interaction, Mean Average Precision (MAP):
Feature Matrix                ASR: MLR / MF-SLU     Transcripts: MLR / MF-SLU
Word Observation              52.1 / 52.7 (+1.2%)   55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns    53.9 / –              56.6 / –

Semantic enrichment provides rich cues to improve performance.

69

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):
Feature Matrix                           ASR: LM / MF-SLU       Transcripts: LM / MF-SLU
Word Observation                         25.1 / 29.2 (+16.2%)   26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics         32.0 / 34.2 (+6.8%)    33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics    31.5 / 32.2 (+2.1%)    32.9 / 34.0 (+3.4%)

Multi-Turn Interaction, Mean Average Precision (MAP):
Feature Matrix                ASR: MLR / MF-SLU     Transcripts: MLR / MF-SLU
Word Observation              52.1 / 52.7 (+1.2%)   55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns    53.9 / 55.7 (+3.3%)   56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.

70

Contributions of Intent Prediction

(Flowchart: Knowledge Acquisition = Ontology Induction + Structure Learning; SLU Modeling = Semantic Decoding + Intent Prediction.)

Feature-enriched MF-SLU for intent prediction is able to 1) unify knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.

71

Personal Intelligent Architecture

Reactive Assistance: ASR, LU, Dialog, LG, TTS (e.g. "call taxi")
Proactive Assistance: Inferences, User Modeling, Suggestions
Data Back-end: Databases, Services, and Client Signals
Device/Service End-points: Phone, PC, Xbox, Web Browser, Messaging Apps
User Experience

72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

73

Conclusions

This work shows the feasibility of, and the potential for, improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge and allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.

74

Future Work

Apply the proposed technology to domain discovery: identify domains not covered by current systems that users are nevertheless interested in, and use them to guide which domains to develop next.

Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition, both of which affect SLU modeling.

75

Towards Unsupervised Deep Learning

(Architecture sketch: a word sequence w1, w2, …, wd is embedded into word vectors lw, passed through a convolutional layer lc (convolution matrix Wc) and a pooling operation to form an utterance vector lf; a knowledge graph propagation layer lp (propagation matrix Wp) and a semantic projection matrix Ws map it to the semantic layer y, which scores each slot candidate R(U, S1), …, R(U, Sn) to produce the posterior probabilities P(S1 | U), …, P(Sn | U).)

Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.
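An illustrative numpy sketch of the contrast: the one-layer (MF-style) linear map versus the same map with an extra nonlinear hidden layer (shapes and weights are arbitrary placeholders, not the talk's trained model):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 10))            # utterance feature rows

# One-layer view of MF: a single linear map from features to slot scores.
W1 = rng.normal(scale=0.1, size=(10, 8))
shallow = x @ W1

# "Adding more layers": insert a nonlinear hidden layer between them.
Wh = rng.normal(scale=0.1, size=(10, 16))
W2 = rng.normal(scale=0.1, size=(16, 8))
deep = np.maximum(0.0, x @ Wh) @ W2     # ReLU hidden layer

assert shallow.shape == deep.shape == (5, 8)
```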

76

Take-Home Message

Big data is available, but without annotations. The challenge is how to acquire and organize important knowledge, and how to further utilize it for applications.

Language understanding for AI maps language to action: understanding voice commands to control music, lights, etc., or being taught to let friends in by face recognition.

Unsupervised or weakly-supervised methods will be the future trend, and deep language understanding is an emerging field.

77

Q & A — THANKS FOR YOUR ATTENTION

References:
• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.

Page 23: Statistical Learning from Dialogues for Intelligent Assistants

23

SDS Process – Dialogue Management (DM)

User

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Din Tai Fung, Boiling Point

Predicted intent navigation

Intelligent Agent

find a cheap eating place for taiwanese food

24

SDS Process – Dialogue Management (DM)

User

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Din Tai Fung, Boiling Point

Predicted intent navigation

Intelligent Agent

Intent Prediction

find a cheap eating place for taiwanese food

25

SDS Process – Natural Language Generation (NLG)

User

Intelligent Agent

Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which one do you want to choose? I can help you go there (navigation).

find a cheap eating place for taiwanese food

26

Required Knowledge

[Ontology fragment: nodes seeking, target, food, price; edges PREP_FOR, NN, AMOD]

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Predicted intent navigation

User

Required Domain-Specific Information

find a cheap eating place for taiwanese food

27

Challenges for SDS An SDS in a new domain requires

1) A hand-crafted domain ontology
2) Utterances labelled with semantic representations
3) An SLU component for mapping utterances into semantic representations

Manual work results in high cost, long duration, and poor scalability of system development

The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, in order to handle open-domain requests

seeking="find", target="eating place", price="cheap", food="asian food"

find a cheap eating place for asian food

fully unsupervised

Prior Focus

28

Contributions

[Ontology fragment: nodes seeking, target, food, price; edges PREP_FOR, NN, AMOD]

SELECT restaurant { restaurant.price="cheap", restaurant.food="asian food" }

Predicted intent navigation

User: find a cheap eating place for taiwanese food

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

(natural language)

(inter-slot relation)

(semantic slot)

29

Contributions

User

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

find a cheap eating place for taiwanese food

30

Ontology Induction Structure Learning Surface Form Derivation

Semantic Decoding Intent Prediction

Contributions

User

Knowledge Acquisition SLU Modeling

find a cheap eating place for taiwanese food

31

Knowledge Acquisition
1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

Restaurant-Asking Conversations

[Ontology graph: nodes seeking, target, food, price, quantity; edges PREP_FOR, NN, AMOD]

Organized Domain Knowledge

Unlabelled Collection

Knowledge Acquisition

Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation

32

SLU Modeling
2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

Organized Domain Knowledge

price="cheap", target="restaurant", intent=navigation

SLU Modeling

SLU Component

"can i have a cheap restaurant"

SLU Modeling: Semantic Decoding, Intent Prediction

33

SDS Architecture ndash Contributions

[SDS pipeline: ASR → SLU → DM → NLG, with Domain ontology]

Knowledge Acquisition SLU Modeling

current bottleneck

34

SDS Flowchart

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

35

SDS Flowchart ndash Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

36

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

37

Semantic Decoding [ACL-IJCNLP'15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant"
Frame-Semantic Parsing

[Flowchart: Unlabeled Collection → Frame-Semantic Parsing → Ontology Induction (feature models Fw, Fs) + Structure Learning over lexical/semantic knowledge graphs (relation models Rw, Rs) → MF-SLU (SLU modeling by matrix factorization) → Semantic Representation]

38

Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistically semantic resource based on frame-semantics theory; words/phrases can be represented as frames. E.g., in "low fat milk", "milk" evokes the "food" frame and "low fat" fills the descriptor frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences.

39

Ontology Induction [ASRU'13, SLT'14a]

can i have a cheap restaurant

Frame: capability
Frame: expensiveness
Frame: locale_by_use

1st Issue: differentiate domain-specific frames from generic frames for SDSs

Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.

slot candidate

Best Student Paper Award

40

[Figure: utterance-by-feature matrix with word observations (cheap, restaurant, food) and slot candidates (expensiveness, locale_by_use, food); train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" have observed 1s, and the test utterance "show me a list of cheap restaurants" receives estimated slot probabilities (e.g. .97, .95) after frame-semantic parsing]

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

Idea: increase weights of domain-specific slots and decrease weights of others

41

1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.

[Figure: word relation matrix Rw × utterance-feature matrix × slot relation matrix Rs for slot induction over the train/test word and slot-candidate cells]

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.

[Semantic knowledge graph: nodes locale_by_use, food, expensiveness, capability, seeking, desiring, relational_quantity; train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food"; test utterance "show me a list of cheap restaurants"]
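The propagation step can be sketched with a toy slot relation matrix (all numbers below are hypothetical, chosen only to illustrate the mechanism):

```python
import numpy as np

# Toy scores for three induced slot candidates; a generic frame-semantic
# parser initially treats them alike.
slots = ["expensiveness", "locale_by_use", "capability"]
score = np.array([1.0, 1.0, 1.0])

# Hypothetical row-normalized slot relation matrix R_s: the two
# domain-specific slots point to each other, "capability" stays isolated.
R = np.array([
    [0.6, 0.4, 0.0],
    [0.5, 0.5, 0.0],
    [0.1, 0.1, 0.8],
])

# One propagation step: each node passes its score along weighted edges,
# so well-connected domain slots accumulate higher scores.
propagated = score @ R
```

After one multiplication the domain-specific slots (`expensiveness`, `locale_by_use`) outrank the generic `capability` slot, which is the effect the slide describes.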

42

Semantic Decoding [ACL-IJCNLP'15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant"
Frame-Semantic Parsing

[Flowchart: Unlabeled Collection → Frame-Semantic Parsing → Ontology Induction (feature models Fw, Fs) + Structure Learning over lexical/semantic knowledge graphs (relation models Rw, Rs) → MF-SLU (SLU modeling by matrix factorization) → Semantic Representation]

43

Knowledge Graph Construction: syntactic dependency parsing on utterances

[Example: "can i have a cheap restaurant" with dependencies ccomp, nsubj, dobj, det, amod; "can" evokes capability, "cheap" expensiveness, and "restaurant" locale_by_use. The parses yield a word-based lexical knowledge graph (nodes can, i, have, a, cheap, restaurant) and a slot-based semantic knowledge graph (nodes capability, expensiveness, locale_by_use)]

44

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

Dependency-based word embeddings (e.g. for "can", "have") and dependency-based slot embeddings (e.g. for expensiveness, capability) are trained from the dependency-parsed utterance "can i have a cheap restaurant" (ccomp, nsubj, dobj, det, amod).

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
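The training data for such embeddings are (word, dependency-context) pairs rather than linear-window pairs. A toy extraction in that style is sketched below; the arcs are an illustrative parse of the slide's example sentence, not the parser's exact output:

```python
# Toy (word, dependency-context) pair extraction in the style of
# Levy & Goldberg (2014); arcs are an illustrative parse of
# "can i have a cheap restaurant".
arcs = [                        # (head, dependent, relation)
    ("can", "have", "ccomp"),
    ("have", "i", "nsubj"),
    ("have", "restaurant", "dobj"),
    ("restaurant", "a", "det"),
    ("restaurant", "cheap", "amod"),
]

pairs = []
for head, dep, rel in arcs:
    pairs.append((head, f"{dep}/{rel}"))     # context as seen from the head
    pairs.append((dep, f"{head}/{rel}-1"))   # inverse context from the dependent

# ("cheap", "restaurant/amod-1") ties "cheap" to the noun it modifies,
# the kind of context that clusters expensiveness-like words together.
```

These pairs then replace the usual bag-of-words contexts in a skip-gram trainer.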

45

Edge Weight Measurement: compute edge weights to represent relation importance

Slot-to-slot semantic relation: similarity between slot embeddings
Slot-to-slot dependency relation: dependency score between slot embeddings
Word-to-word semantic relation: similarity between word embeddings
Word-to-word dependency relation: dependency score between word embeddings

[Graph: word nodes w1–w7 and slot nodes s1–s3 connected by weighted semantic and dependency edges]
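For the similarity-based edges, a minimal sketch is shown below; the embedding values are made up purely for illustration:

```python
import numpy as np

def cosine(a, b):
    """Similarity between two embedding vectors, used as an edge weight."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 3-d slot embeddings standing in for the dependency-based
# embeddings trained in the previous step.
emb = {
    "expensiveness": np.array([0.9, 0.1, 0.2]),
    "food":          np.array([0.8, 0.3, 0.1]),
    "capability":    np.array([0.1, 0.9, 0.7]),
}

# Domain-specific slots should receive a heavier edge between them than
# either does with a generic slot.
w_domain  = cosine(emb["expensiveness"], emb["food"])
w_generic = cosine(emb["expensiveness"], emb["capability"])
```

The dependency-score edges are computed analogously, with a dependency-based score replacing the cosine.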

46

[Figure: word relation matrix Rw × word-slot feature matrix × slot relation matrix Rs for slot induction over the train/test cells]

Structure information is integrated to make the self-training data more reliable

47

[Figure: Ontology Induction feeds feature matrices Fw, Fs into SLU with Structure Learning; train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" have observed cells, while the test utterance "show me a list of cheap restaurants" carries hidden semantics whose slot cells are unobserved]

2nd Issue: unobserved semantics may benefit understanding

Semantic Decoding [ACL-IJCNLP'15]

48

Reasoning with Matrix Factorization

[Figure: feature model + knowledge graph propagation model; word relation matrix Rw × word-slot matrix × slot relation matrix Rs, with the test row's missing cells filled by estimated probabilities (e.g. .97, .90, .95, .85, .93, .92, .98, .05)]

Idea: MF completes a partially-missing matrix under a low-rank latent semantics assumption, which models hidden semantics and is more robust to noisy data.

49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al 2009)

The decomposed matrices represent latent semantics for utterances and wordsslots respectively

The product of two matrices fills the probability of hidden semantics

[Figure: the utterance-by-(word + slot) matrix with observed train cells and estimated test cells (e.g. .97, .90, .95, .85, .93, .92, .98, .05)]

|U| × (|W|+|S|) matrix ≈ (|U| × d matrix) × (d × (|W|+|S|) matrix)

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
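As a toy illustration of the decomposition above, the sketch below fits a rank-d model to the observed cells of a small utterance-by-feature matrix and lets the product fill in the missing ones (all data are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy utterance x (word + slot) matrix: 1/0 = observed, NaN = unobserved.
# The last row is a "test" utterance whose slot cells are hidden.
M = np.array([
    [1, 0, 1, np.nan, 1, np.nan],
    [0, 1, np.nan, 1, np.nan, 1],
    [1, 0, np.nan, np.nan, np.nan, np.nan],
], dtype=float)

n_u, n_f = M.shape
d = 2                                     # latent dimension
U = 0.1 * rng.standard_normal((n_u, d))   # |U| x d latent utterance factors
V = 0.1 * rng.standard_normal((n_f, d))   # (|W|+|S|) x d latent feature factors

lr, reg = 0.1, 0.01
obs = [(i, j) for i in range(n_u) for j in range(n_f) if not np.isnan(M[i, j])]

for _ in range(5000):                     # SGD over observed cells only
    i, j = obs[rng.integers(len(obs))]
    err = M[i, j] - U[i] @ V[j]
    ui = U[i].copy()
    U[i] += lr * (err * V[j] - reg * ui)
    V[j] += lr * (err * ui - reg * V[j])

completed = U @ V.T   # every cell, including hidden ones, now has a score
```

The hidden slot cells of the test row now carry scores inferred from utterances with similar latent factors, which is the "filling the probability of hidden semantics" step on the slide.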

50

Bayesian Personalized Ranking for MF: model implicit feedback

Do not treat unobserved facts as negative samples (true or false); give observed facts higher scores than unobserved facts.

Objective: for each utterance u, maximize Σ ln σ(f⁺ − f⁻), where f⁺ is the model score of an observed slot x and f⁻ the score of an unobserved slot.

The objective is to learn a set of well-ranked semantic slots per utterance.
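A minimal sketch of the pairwise update for one utterance vector and four slot-candidate vectors (all values synthetic): each step samples an observed slot f⁺ and an unobserved slot f⁻ and ascends ln σ(f⁺ − f⁻).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d = 2
u = 0.1 * rng.standard_normal(d)         # latent vector of one utterance
S = 0.1 * rng.standard_normal((4, d))    # latent vectors of 4 slot candidates
observed, unobserved = [0, 1], [2, 3]    # slots seen / not seen with u

lr = 0.1
for _ in range(1000):
    p = observed[rng.integers(2)]        # f+ : a slot observed with u
    n = unobserved[rng.integers(2)]      # f- : an unobserved slot
    x = u @ S[p] - u @ S[n]              # f+ - f-
    g = sigmoid(-x)                      # gradient scale of ln sigmoid(x)
    up = u.copy()
    u    += lr * g * (S[p] - S[n])       # push u toward observed slots
    S[p] += lr * g * up
    S[n] -= lr * g * up

scores = S @ u   # observed slots should now outrank unobserved ones
```

Note that the unobserved slots are never forced to score zero; they are only ranked below the observed ones, which is the point of the implicit-feedback treatment.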

51

Ontology Induction

[Figure: Ontology Induction provides Fw, Fs to SLU with Structure Learning; the MF model estimates slot probabilities (e.g. .97, .90, .95, .85) for the test utterance "show me a list of cheap restaurants"]

Matrix Factorization SLU (MF-SLU)

MF-SLU can estimate probabilities for slot candidates given test utterances

52

Semantic Decoding [ACL-IJCNLP'15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant"
Frame-Semantic Parsing

[Flowchart: Unlabeled Collection → Frame-Semantic Parsing → Ontology Induction (feature models Fw, Fs) + Structure Learning over lexical/semantic knowledge graphs (relation models Rw, Rs) → MF-SLU (SLU modeling by matrix factorization) → Semantic Representation]

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)

53

Experimental Setup

Dataset: Cambridge University SLU Corpus (restaurant recommendation; WER = 37%); 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, pricerange, task, type

Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
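The MAP metric can be computed as below; `results` pairs a ranked slot list with the reference slots for each utterance (the example rankings are hypothetical):

```python
def average_precision(ranked, relevant):
    """AP for one utterance: slots ranked by estimated probability."""
    hits, precisions = 0, []
    for k, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            precisions.append(hits / k)   # precision at each relevant hit
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(results):
    """MAP over (ranked_slots, reference_slots) pairs, one per utterance."""
    return sum(average_precision(r, rel) for r, rel in results) / len(results)

# Hypothetical rankings for two utterances
results = [
    (["expensiveness", "food", "capability"], {"expensiveness", "food"}),
    (["capability", "food"], {"food"}),
]
```

Here the first utterance ranks both reference slots at the top (AP = 1.0), the second ranks its single reference slot second (AP = 0.5), giving MAP = 0.75.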

54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8

55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
Proposed MF-SLU: Feature Model | 37.6 | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)

The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results. The result is significantly better than the MLR baseline with p < 0.05 in a t-test.

56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach | ASR | Transcripts
Feature Model | 37.6 | 45.3
Feature + Knowledge Graph Propagation (Semantic) | 41.4 | 51.6
Feature + Knowledge Graph Propagation (Dependency) | 41.6 | 49.0
Feature + Knowledge Graph Propagation (All) | 43.5 (+15.7%) | 53.4 (+17.9%)

In the integrated structure information, both semantic and dependency relations are useful for understanding. The result is significantly better than the MLR baseline with p < 0.05 in a t-test.

Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs.

The reference ontology with the most frequent syntactic dependencies:

[Figures: induced ontology (nodes locale_by_use, food, expensiveness, seeking, desiring, relational_quantity; edges PREP_FOR, NN, AMOD, DOBJ) vs. reference ontology (nodes type, food, pricerange, area, task; edges DOBJ, AMOD, PREP_IN)]

The automatically learned domain ontology aligns well with the reference one

57

The data-driven ontology is more objective, while the expert-annotated one is more subjective.

58

Contributions of Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge.

MF-SLU for Semantic Decoding is able to
1) unify the automatically acquired knowledge,
2) adapt to a domain-specific setting,
3) and then allow systems to model implicit semantics for better understanding.

59

Low- and High-Level Understanding: semantic concepts for individual utterances do not consider high-level semantics (user intents).

The follow-up behaviors usually correspond to user intents

price="cheap", target="restaurant"

SLU Model

"can i have a cheap restaurant"

intent=navigation

restaurant="legume", time="tonight"

SLU Model

"i plan to dine in legume tonight"

intent=reservation

60

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart ndash Intent Prediction

61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input spoken utterances for making requests about launching an app

Output the apps supporting the required functionality

Intent Identification popular domains in Google Play

please dial a phone call to alex

Skype Hangout etc

Intent Prediction of Mobile Apps [SLT'14c]

Chen and Rudnicky Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings in Proc of SLT 2014

63

Input: single-turn request

Output: apps that are able to support the required functionality

Intent Prediction – Single-Turn Request

[Figure: reasoning with feature-enriched MF. The test utterance "i would like to contact alex" has word observations (contact, email, message) and intended-app candidates (Gmail, Outlook, Skype); IR over app descriptions ("… your email calendar contacts …", "… check and send emails msgs …") retrieves app candidates and self-train utterances, and feature enrichment adds inferred semantics such as communication (weight .90)]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents

64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input multi-turn interaction

Output apps the user plans to launch

Challenge: language ambiguity
1) User preference
2) App-level contexts

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

[Example: "send to vivian" → Email vs. Message (Communication apps)]

Idea: behavioral patterns in history can help intent prediction

previous turn

65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input multi-turn interaction

Output apps the user plans to launch

[Figure: train dialogues with lexical features, intended apps, and behavior history, e.g. "take this photo" → CAMERA then "tell vivian this is me in the lab" → IM, and "check my grades on websites" → CHROME then "send an email to professor" → EMAIL; test dialogue "take a photo of this" → CAMERA, "send it to alice" → IM. Reasoning with feature-enriched MF fills in app probabilities (e.g. .85, .70, .95, .80, .55)]

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction

66

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / – | 26.1 / –

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / – | 55.5 / –

LM-Based IR Model (unsupervised); Multinomial Logistic Regression (supervised)

67

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (−0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.

68

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 / – | 33.3 / –
Word + Type-Embedding-Based Semantics | 31.5 / – | 32.9 / –

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (−0.2%)
Word + Behavioral Patterns | 53.9 / – | 56.6 / –

Semantic enrichment provides rich cues to improve performance.

69

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 / 34.2 (+6.8%) | 33.3 / 33.3 (−0.2%)
Word + Type-Embedding-Based Semantics | 31.5 / 32.2 (+2.1%) | 32.9 / 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (−0.2%)
Word + Behavioral Patterns | 53.9 / 55.7 (+3.3%) | 56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.

70

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction: Feature-Enriched MF-SLU for Intent Prediction is able to
1) unify the knowledge at different levels,
2) learn inference relations between various features,
3) and create personalized models by leveraging contextual behaviors.

71

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Databases, Back-end Data Services and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"

72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

73

Conclusions: The work shows the feasibility of, and the potential for, improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances
Better high-level intent prediction about follow-up behaviors

74

Future Work: Apply the proposed technology to domain discovery

Find domains not yet covered by current systems that users are interested in, to guide which domains to develop next

Improve the proposed approach by handling uncertainty:

[Diagram: recognition errors from ASR affect SLU Modeling; unreliable knowledge affects Knowledge Acquisition]

75

[Figure: word sequence x (w1, w2, …, wd) → word vectors lw → convolutional layer lc (convolution matrix Wc) → pooling → utterance vector lf and slot vectors lf → knowledge graph propagation layer lp (propagation matrix Wp) → semantic layer y (semantic projection matrix Ws), yielding semantic relations R(U, S1), …, R(U, Sn) and posterior probabilities P(S1 | U), …, P(Sn | U) over slot candidates]

Towards Unsupervised Deep Learning

Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning

76

Take Home Message: Available big data w/o annotations

Challenge how to acquire and organize important knowledge and further utilize it for applications

Language understanding for AI:

language → action: understand voice to control music, lights, etc.; teach the assistant to let friends in by face recognition, etc.

Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A: THANKS FOR YOUR ATTENTION

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)

• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

  • Statistical Learning from Dialogues for Intelligence Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants
  • Why do we need them
  • Why do we need them (2)
  • Why do companies care
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymaxrsquos intelligence
  • SDS Architecture
  • Interaction Example
  • SDS Process ndash Available Domain Ontology
  • SDS Process ndash Available Domain Ontology (2)
  • SDS Process ndash Available Domain Ontology (3)
  • SDS Process ndash Spoken Language Understanding (SLU)
  • SDS Process ndash Spoken Language Understanding (SLU) (2)
  • SDS Process ndash Dialogue Management (DM)
  • SDS Process ndash Dialogue Management (DM) (2)
  • SDS Process ndash Dialogue Management (DM) (3)
  • SDS Process ndash Dialogue Management (DM) (4)
  • SDS Process ndash Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture ndash Contributions
  • SDS Flowchart
  • SDS Flowchart ndash Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLPrsquo15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a]
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
  • 1st Issue How to adapt generic slots to a domain-specific sett
  • Semantic Decoding [ACL-IJCNLPrsquo15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement SlotWord Embeddings Training (Levy and
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLPrsquo15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue How to model the unobserved hidden semantics Matrix
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLPrsquo15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding Quality of Semantics Estimatio
  • Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
  • Experiments of Semantic Decoding Effectiveness of Relations
  • Experiments for Structure Learning Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart ndash Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLTrsquo14c]
  • Intent Prediction ndash Single-Turn Request
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q amp A
Page 24: Statistical Learning from Dialogues for Intelligent Assistants

24

SDS Process – Dialogue Management (DM)

User

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Din Tai Fung, Boiling Point, …

Predicted intent: navigation

Intelligent Agent

Intent Prediction

find a cheap eating place for taiwanese food


25

SDS Process – Natural Language Generation (NLG)

User

Intelligent Agent

Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which one do you want to choose? I can help you go there (navigation).

find a cheap eating place for taiwanese food


26

Required Knowledge

[Ontology figure: slots seeking, target, food, price connected by dependency relations AMOD, NN, PREP_FOR]

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Predicted intent: navigation

User

Required Domain-Specific Information

find a cheap eating place for taiwanese food


27

Challenges for SDS: an SDS in a new domain requires
1) a hand-crafted domain ontology,
2) utterances labelled with semantic representations, and
3) an SLU component for mapping utterances into semantic representations.

Manual work results in high cost, long duration, and poor scalability of system development.

The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, in order to handle open-domain requests.

seeking="find", target="eating place", price="cheap", food="asian food"

find a cheap eating place for asian food

fully unsupervised

Prior Focus


28

Contributions

[Ontology figure: slots seeking, target, food, price connected by dependency relations AMOD, NN, PREP_FOR]

SELECT restaurant { restaurant.price="cheap", restaurant.food="asian food" }

Predicted intent: navigation

User: find a cheap eating place for taiwanese food

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

(natural language)

(inter-slot relation)

(semantic slot)


29

Contributions

User

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

find a cheap eating place for taiwanese food


30

Ontology Induction Structure Learning Surface Form Derivation

Semantic Decoding Intent Prediction

Contributions

User

Knowledge Acquisition SLU Modeling

find a cheap eating place for taiwanese food


31

Knowledge Acquisition: 1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

Restaurant Asking

Conversations

[Ontology figure: induced slots seeking, target, food, price, quantity linked by PREP_FOR, NN, and AMOD relations]

Organized Domain Knowledge

Unlabelled Collection

Knowledge Acquisition

Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation


32

SLU Modeling: 2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

Organized Domain Knowledge

price="cheap", target="restaurant", intent=navigation

SLU Modeling

SLU Component

"can i have a cheap restaurant"

SLU Modeling: Semantic Decoding, Intent Prediction


33

SDS Architecture – Contributions

ASR / SLU / DM / NLG / Domain

Knowledge Acquisition SLU Modeling

current bottleneck


34

SDS Flowchart

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling


35

SDS Flowchart – Semantic Decoding

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling


36

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work


37

Semantic Decoding [ACL-IJCNLP'15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant"

Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology Induction: Fw, Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation


38

Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistically semantic resource based on the frame-semantics theory; words/phrases can be represented as frames. E.g., in "low fat milk", "milk" evokes the "food" frame, and "low fat" fills the descriptor frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser, trained on manually annotated FrameNet sentences.
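The role a parser like SEMAFOR plays here can be sketched in a few lines; the dictionary format below is an illustrative assumption, not SEMAFOR's actual output schema. Each word/phrase of the utterance evokes a frame, and every evoked frame becomes a slot candidate:

```python
# Hypothetical frame-semantic parse of "can i have a cheap restaurant",
# in the spirit of SEMAFOR output (the exact format is an assumption).
parse = [
    {"frame": "capability", "target": "can i have"},
    {"frame": "expensiveness", "target": "cheap"},
    {"frame": "locale_by_use", "target": "restaurant"},
]

def slot_candidates(frames):
    """Treat every evoked frame as a domain slot candidate."""
    return [f["frame"] for f in frames]

print(slot_candidates(parse))
```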


39

Ontology Induction [ASRU'13, SLT'14a]

can i have a cheap restaurant

Frame: capability

Frame: expensiveness (good slot candidate)

Frame: locale_by_use (good slot candidate)

1st Issue: differentiate domain-specific frames from generic frames for SDSs

Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.

slot candidate

Best Student Paper Award


40

[Matrix figure: rows are utterances, columns are word observations (cheap, restaurant, food) and slot candidates (expensiveness, locale_by_use, food). Train: "i would like a cheap restaurant" observes cheap, restaurant with slots expensiveness, locale_by_use; "find a restaurant with chinese food" observes restaurant, food with slots locale_by_use, food. Test: "show me a list of cheap restaurants" receives estimated slot scores ≈ .97 and .95 via frame-semantic parsing.]

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

Idea increase weights of domain-specific slots and decrease weights of others


41

1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model Assumption domain-specific wordsslots have more dependencies to each other

Word Relation Model Slot Relation Model

word relation matrix

slot relation matrix

[Matrix figure: the word/slot observation matrix (as above) is multiplied (×) by the word relation matrix and the slot relation matrix.]

Slot Induction

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph so that domain-specific wordsslots have higher scores after matrix multiplication
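The propagation idea above can be sketched numerically; the relation matrix values and the mixing weight `alpha` below are toy assumptions, not the paper's learned weights. Multiplying a score vector by a row-normalized relation matrix raises the scores of well-connected (domain-specific) slots and lowers those of weakly connected generic ones:

```python
# Sketch: one score-propagation step over a slot relation matrix.
def propagate(scores, relation, alpha=0.9):
    n = len(scores)
    # row-normalize the relation matrix so each node distributes its mass
    norm = [[relation[i][j] / max(sum(relation[i]), 1e-9) for j in range(n)]
            for i in range(n)]
    # keep (1 - alpha) of the original score, add alpha of neighbor mass
    return [(1 - alpha) * scores[i] +
            alpha * sum(norm[j][i] * scores[j] for j in range(n))
            for i in range(n)]

# slots: [locale_by_use, expensiveness, capability]; capability is generic
# and weakly connected, so its score drops after propagation.
relation = [[0.0, 1.0, 0.1],
            [1.0, 0.0, 0.1],
            [0.1, 0.1, 0.0]]
new = propagate([1.0, 1.0, 1.0], relation)
```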

[Graph figure: word nodes (i, like, cheap, restaurant, food, …) and slot nodes (capability, locale_by_use, expensiveness, seeking, relational_quantity, desiring) connected in the knowledge graph; train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food"; test utterance "show me a list of cheap restaurants".]


42

Semantic Decoding [ACL-IJCNLP'15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant"

Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology Induction: Fw, Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation


43

Knowledge Graph Construction Syntactic dependency parsing on utterances

"can i have a cheap restaurant" – dependencies: ccomp, nsubj, dobj, det, amod; evoked frames: capability, expensiveness, locale_by_use

Word-based lexical knowledge graph

Slot-based semantic knowledge graph

[Graph figures: word-based lexical KG over the words {can, i, have, a, cheap, restaurant}; slot-based semantic KG over the slots {capability, locale_by_use, expensiveness}.]
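Building the word-based graph from typed dependencies can be sketched as follows; the dependency triples are read off the parse shown above, while the adjacency-list storage format is an assumption made for illustration:

```python
# Sketch: word-based lexical knowledge graph from typed dependencies of
# "can i have a cheap restaurant".
deps = [("have", "ccomp", "can"),
        ("have", "nsubj", "i"),
        ("have", "dobj", "restaurant"),
        ("restaurant", "det", "a"),
        ("restaurant", "amod", "cheap")]

graph = {}
for head, rel, dep in deps:
    # store edges in both directions: the word graph is undirected
    graph.setdefault(head, []).append((dep, rel))
    graph.setdefault(dep, []).append((head, rel))
```

The slot-based semantic graph is built the same way, with frames in place of words.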


44

Dependency-based word embeddings

Dependency-based slot embeddings

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

[Figure: dependency-based embeddings are trained for words (can, have, …) and for slots (expensiveness, capability, …) from the dependency-parsed utterance "can i have a cheap restaurant".]

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.


45

Edge Weight Measurement: compute edge weights to represent relation importance.

Slot-to-slot semantic relation: similarity between slot embeddings
Slot-to-slot dependency relation: dependency score between slot embeddings
Word-to-word semantic relation: similarity between word embeddings
Word-to-word dependency relation: dependency score between word embeddings

[Graph figure: word nodes w1–w7 and slot nodes s1–s3 connected by weighted edges that combine (+) the relation scores above.]
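The semantic-relation weights reduce to a similarity between trained embeddings; the 3-d vectors below are toy assumptions standing in for real dependency-based embeddings:

```python
# Sketch: edge weight as cosine similarity between slot embeddings.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

emb = {"expensiveness": [0.9, 0.1, 0.0],   # toy vectors, not trained ones
       "locale_by_use": [0.8, 0.2, 0.1],
       "capability":    [0.0, 0.1, 0.9]}

# related domain slots get a heavy edge; a generic slot gets a light one
w1 = cosine(emb["expensiveness"], emb["locale_by_use"])
w2 = cosine(emb["expensiveness"], emb["capability"])
```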


46

Word Relation Model Slot Relation Model

word relation matrix

slot relation matrix

times

1

Word Observation Slot Candidate

Train

cheap restaurant foodexpensiveness

1

locale_by_use

11

1 1

food

1 1

1 Test1

1

Slot Induction

Knowledge Graph Propagation Model119877119908

119878119863

119877119904119878119863

Structure information is integrated to make the self-training data more reliable


47

Ontology Induction

SLU: Fw, Fs

Structure Learning

[Matrix figure: word observations and slot candidates for train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and test utterance "show me a list of cheap restaurants"; estimated slot scores ≈ .97/.90 (expensiveness) and .95/.85 (locale_by_use); the hidden semantics of the test utterance are not directly observed.]

2nd Issue: unobserved semantics may benefit understanding

Semantic Decoding [ACL-IJCNLP'15]


48

Reasoning with Matrix Factorization

Word Relation Model Slot Relation Model

word relation matrix

slot relation matrix

[Matrix figure: the observation matrix, multiplied (×) by the relation matrices, is completed by MF; missing cells are filled with estimated probabilities (e.g., .97, .95, .93, .92, .98 for likely slots, and low scores such as .05 elsewhere).]

Slot Induction

Feature Model + Knowledge Graph Propagation Model

(with word relation matrix R_w and slot relation matrix R_s)

Idea MF completes a partially-missing matrix based on a low-rank latent semantics assumption which is able to model hidden semantics and more robust to noisy data


49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and wordsslots respectively

The product of two matrices fills the probability of hidden semantics

[Matrix figure: the |U| × (|W|+|S|) observation matrix is factorized as the product of a |U| × d matrix and a d × (|W|+|S|) matrix; the product fills in probabilities (e.g., .97, .95, .93, .92, .98) for the hidden semantics.]

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.


50

Bayesian Personalized Ranking for MF: model implicit feedback; do not treat unobserved facts as negative samples (true or false), but give observed facts higher scores than unobserved facts.

Objective

For each utterance u, an observed fact f⁺ should score higher than an unobserved fact f⁻; i.e., maximize Σ ln σ( f⁺ − f⁻ ).

The objective is to learn a set of well-ranked semantic slots per utterance.
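A single BPR gradient step on this objective can be sketched as below; the latent dimension, learning rate, and the two example slots are illustrative assumptions, not values from the paper:

```python
# Sketch: one BPR update pushes an observed (utterance, slot+) pair
# above an unobserved (utterance, slot-) pair in the MF model.
import math
import random

random.seed(0)
d = 4                                                  # latent dimension
U = [random.uniform(-0.1, 0.1) for _ in range(d)]      # utterance factors
S = {s: [random.uniform(-0.1, 0.1) for _ in range(d)]  # slot factors
     for s in ("expensiveness", "capability")}

def score(u, s):
    return sum(a * b for a, b in zip(u, s))

def bpr_step(u, s_pos, s_neg, lr=0.1):
    x = score(u, s_pos) - score(u, s_neg)
    g = 1.0 / (1.0 + math.exp(x))          # gradient coefficient sigma(-x)
    for k in range(d):
        u[k] += lr * g * (s_pos[k] - s_neg[k])
        s_pos[k] += lr * g * u[k]
        s_neg[k] -= lr * g * u[k]

before = score(U, S["expensiveness"]) - score(U, S["capability"])
for _ in range(50):
    bpr_step(U, S["expensiveness"], S["capability"])
after = score(U, S["expensiveness"]) - score(U, S["capability"])
```

After training, the margin between the observed slot and the unobserved one has grown, which is exactly the well-ranked-slots objective.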


51

Ontology Induction

SLU: Fw, Fs

Structure Learning

[Matrix figure as above; for the test utterance "show me a list of cheap restaurants", MF-SLU fills in slot probabilities (e.g., .97/.90 for expensiveness, .95/.85 for locale_by_use).]

Matrix Factorization SLU (MF-SLU)

MF-SLU can estimate probabilities for slot candidates given test utterances


52

Semantic Decoding [ACL-IJCNLP'15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

SLU Model

target="restaurant", price="cheap"

"can I have a cheap restaurant"

Frame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology Induction: Fw, Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation

Idea utilize the acquired knowledge to decode utterance semantics (fully unsupervised)


53

Experimental Setup. Dataset: Cambridge University SLU Corpus

Restaurant recommendation (WER = 37%); 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type

Metric: MAP of all estimated slot probabilities over all utterances. A mapping table between induced and reference slots is used for evaluation.

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
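The MAP metric used throughout the experiments can be sketched as below; the rankings and gold slot sets are toy assumptions, not corpus data:

```python
# Sketch: Mean Average Precision over per-utterance slot rankings.
def average_precision(ranked, gold):
    hits, total = 0, 0.0
    for i, slot in enumerate(ranked, start=1):
        if slot in gold:
            hits += 1
            total += hits / i       # precision at each relevant rank
    return total / max(len(gold), 1)

def mean_average_precision(rankings, golds):
    return sum(average_precision(r, g)
               for r, g in zip(rankings, golds)) / len(rankings)

rankings = [["food", "pricerange", "area"], ["area", "food"]]
golds = [{"food", "area"}, {"food"}]
map_score = mean_average_precision(rankings, golds)
```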


54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8


55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

The MF-SLU effectively models implicit information to decode semantics

The structure information further improves the results

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
Proposed MF-SLU: Feature Model | 37.6 | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)

the result is significantly better than the MLR baseline with p < 0.05 in a t-test


56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

In the integrated structure information both semantic and dependency relations are useful for understanding

Approach | ASR | Transcripts
Feature Model | 37.6 | 45.3
Feature + Knowledge Graph Propagation: Semantic | 41.4 | 51.6
Feature + Knowledge Graph Propagation: Dependency | 41.6 | 49.0
Feature + Knowledge Graph Propagation: All | 43.5 (+15.7%) | 53.4 (+17.9%)

the result is significantly better than the MLR baseline with p < 0.05 in a t-test


Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs

The reference ontology with the most frequent syntactic dependencies

[Graph figure: the induced ontology links locale_by_use to food (NN), expensiveness (AMOD), relational_quantity (AMOD), seeking (PREP_FOR), and desiring (DOBJ); the reference ontology links type to food and pricerange (AMOD), task (DOBJ), and area (PREP_IN).]

The automatically learned domain ontology aligns well with the reference one

57

The data-driven one is more objective while expert-annotated one is more subjective

58

Contributions of Semantic Decoding

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, 3) and then allow systems to model implicit semantics for better understanding.


59

Low- and High-Level Understanding: semantic concepts for individual utterances do not consider high-level semantics (user intents).

The follow-up behaviors usually correspond to user intents

price="cheap", target="restaurant"

SLU Model

"can i have a cheap restaurant"

intent=navigation

restaurant="legume", time="tonight"

SLU Model

"i plan to dine in legume tonight"

intent=reservation


60

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart – Intent Prediction


61

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work


62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app

Output: the apps supporting the required functionality

Intent Identification: popular domains in Google Play

please dial a phone call to alex

Skype, Hangout, etc.

Intent Prediction of Mobile Apps [SLT'14c]

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.


63

Input: single-turn request

Output: apps that are able to support the required functionality

Intent Prediction – Single-Turn Request

[Matrix figure: rows are app descriptions retrieved by IR as app candidates (e.g., Outlook "…check and send emails, msgs…", Gmail "…your email, calendar, contacts…"), self-train utterances, and the test utterance "i would like to contact alex"; columns are word observations (contact, message, email, …), enriched semantic features (communication ≈ .90), and intended apps (Gmail, Outlook, Skype). Feature-enriched MF fills in the missing scores (e.g., .85–.97).]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
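The feature-enrichment step can be sketched as follows; the word-to-class lexicon and the 0.9 weight are illustrative assumptions, standing in for the automatically inferred semantics:

```python
# Sketch: augment a test utterance's word features with enriched
# semantic-class features before matrix factorization.
SEMANTIC_CLASSES = {"contact": "communication",
                    "email": "communication",
                    "message": "communication"}

def enrich_features(words):
    feats = {w: 1.0 for w in words}           # observed word features
    for w in words:
        cls = SEMANTIC_CLASSES.get(w)
        if cls:
            # soft weight for the inferred class, one extra matrix column
            feats[cls] = max(feats.get(cls, 0.0), 0.9)
    return feats

feats = enrich_features(["i", "would", "like", "to", "contact", "alex"])
```

Each enriched utterance then becomes one row of the feature matrix, alongside the app-description rows.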


64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

Challenge: language ambiguity; 1) user preference, 2) app-level contexts

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

send to vivianvs

Email, Message (Communication)

Idea: behavioral patterns in history can help intent prediction

previous turn


65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

[Matrix figure: train dialogues pair user utterances with intended apps and behavior history (e.g., "take this photo" → CAMERA; "tell vivian this is me in the lab" → IM; "check my grades on websites" → CHROME; "send an email to professor" → EMAIL); for the test dialogue "take a photo of this" / "send it to alice" (CAMERA → IM), the factorization fills in scores (e.g., .55–.95) over lexical features, behavior history (null, camera, chrome, email), and intended apps.]

Reasoning with Feature-Enriched MF

Test Dialogue

"take a photo of this", "send it to alice", …

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
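The app-level context above amounts to adding the previously launched app as a feature of the current turn; the `prev_app=` feature naming is an illustrative assumption:

```python
# Sketch: enrich each turn's features with the previous turn's app
# (behavioral context) for multi-turn intent prediction.
def enrich(turns):
    feats, prev_app = [], None
    for words, app in turns:
        f = set(words)
        if prev_app:
            f.add("prev_app=" + prev_app)   # behavioral-pattern feature
        feats.append(f)
        prev_app = app
    return feats

dialogue = [(["take", "this", "photo"], "CAMERA"),
            (["send", "it", "to", "alice"], "IM")]
features = enrich(dialogue)
```

With such features, "send it to alice" after a CAMERA turn can be disambiguated toward IM rather than Email.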


66

Single-Turn Request: Mean Average Precision (MAP)

Multi-Turn Interaction: Mean Average Precision (MAP)

Single-turn: Feature Matrix | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1 | – | 26.1 | –

Multi-turn: Feature Matrix | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1 | – | 55.5 | –

LM-Based IR Model (unsupervised)

Multinomial Logistic Regression (supervised)

Experiments for Intent Prediction


67

Single-Turn Request: Mean Average Precision (MAP)

Multi-Turn Interaction: Mean Average Precision (MAP)

Single-turn: Feature Matrix | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1 | 29.2 (+16.2%) | 26.1 | 30.4 (+16.4%)

Multi-turn: Feature Matrix | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1 | 52.7 (+1.2%) | 55.5 | 55.4 (−0.2%)

Modeling hidden semantics helps intent prediction especially for noisy data

Experiments for Intent Prediction


68

Single-Turn Request: Mean Average Precision (MAP)

Multi-Turn Interaction: Mean Average Precision (MAP)

Single-turn: Feature Matrix | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1 | 29.2 (+16.2%) | 26.1 | 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 | | 33.3 |
Word + Type-Embedding-Based Semantics | 31.5 | | 32.9 |

Multi-turn: Feature Matrix | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1 | 52.7 (+1.2%) | 55.5 | 55.4 (−0.2%)
Word + Behavioral Patterns | 53.9 | | 56.6 |

Semantic enrichment provides rich cues to improve performance

Experiments for Intent Prediction


69

Single-Turn Request: Mean Average Precision (MAP)

Multi-Turn Interaction: Mean Average Precision (MAP)

Single-turn: Feature Matrix | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1 | 29.2 (+16.2%) | 26.1 | 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 | 34.2 (+6.8%) | 33.3 | 33.3 (−0.2%)
Word + Type-Embedding-Based Semantics | 31.5 | 32.2 (+2.1%) | 32.9 | 34.0 (+3.4%)

Multi-turn: Feature Matrix | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1 | 52.7 (+1.2%) | 55.5 | 55.4 (−0.2%)
Word + Behavioral Patterns | 53.9 | 55.7 (+3.3%) | 56.6 | 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics

Experiments for Intent Prediction


70

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction: Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, 3) and create personalized models by leveraging contextual behaviors.


71

Personal Intelligent Architecture

Reactive Assistance

ASR → LU → Dialog → LG → TTS

Proactive Assistance

Inferences, User Modeling, Suggestions

Data Bases, Back-end Data Services, and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"


72

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work


73

Conclusions: The work shows the feasibility of, and the potential for, improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances; better high-level intent prediction about follow-up behaviors.


74

Future Work: apply the proposed technology to domain discovery

Find domains not covered by the current systems but that users are interested in, to guide the next domains to develop.

Improve the proposed approach by handling the uncertainty

ASR → SLU: recognition errors; Knowledge Acquisition → SLU Modeling: unreliable knowledge


75

[Architecture figure: word sequence x = w1, w2, …, wd → word vector l_w → convolutional layer l_c (convolution matrix W_c) with a pooling operation → utterance vector and slot vectors l_f (semantic projection matrix W_s) → knowledge graph propagation layer l_p (propagation matrix W_p) → semantic layer y, yielding semantic relation scores R(U, S1), …, R(U, Sn) and posterior probabilities P(S1 | U), …, P(Sn | U) for utterance U and slot candidates S1, …, Sn.]

Towards Unsupervised Deep Learning


Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning

76

Take Home Message: big data is available w/o annotations.

Challenge: how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI: language → action; understand voice to control music, lights, etc.; teach the system to let friends in by face recognition, etc.


Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A. THANKS FOR YOUR ATTENTION!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)

• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

  • Statistical Learning from Dialogues for Intelligence Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants
  • Why do we need them
  • Why do we need them (2)
  • Why do companies care
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymaxrsquos intelligence
  • SDS Architecture
  • Interaction Example
  • SDS Process ndash Available Domain Ontology
  • SDS Process ndash Available Domain Ontology (2)
  • SDS Process ndash Available Domain Ontology (3)
  • SDS Process ndash Spoken Language Understanding (SLU)
  • SDS Process ndash Spoken Language Understanding (SLU) (2)
  • SDS Process ndash Dialogue Management (DM)
  • SDS Process ndash Dialogue Management (DM) (2)
  • SDS Process ndash Dialogue Management (DM) (3)
  • SDS Process ndash Dialogue Management (DM) (4)
  • SDS Process ndash Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture ndash Contributions
  • SDS Flowchart
  • SDS Flowchart ndash Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLPrsquo15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a]
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
  • 1st Issue How to adapt generic slots to a domain-specific sett
  • Semantic Decoding [ACL-IJCNLPrsquo15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement SlotWord Embeddings Training (Levy and
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLPrsquo15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue How to model the unobserved hidden semantics Matrix
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLPrsquo15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding Quality of Semantics Estimatio
  • Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
  • Experiments of Semantic Decoding Effectiveness of Relations
  • Experiments for Structure Learning Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart ndash Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLTrsquo14c]
  • Intent Prediction ndash Single-Turn Request
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q amp A
Page 25: Statistical Learning from Dialogues for Intelligent Assistants

25

SDS Process – Natural Language Generation (NLG)

User

Intelligent Agent

Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. Which one do you want to choose? I can help you go there (navigation).

find a cheap eating place for taiwanese food

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

26

Required Knowledge

(figure: required domain ontology – slots seeking, target, food, price linked by dependency relations PREP_FOR, AMOD, NN)

SELECT restaurant { restaurant.price="cheap", restaurant.food="taiwanese" }

Predicted intent: navigation

User

Required Domain-Specific Information

find a cheap eating place for taiwanese food

27

Challenges for SDS: An SDS in a new domain requires

1) a hand-crafted domain ontology
2) utterances labelled with semantic representations
3) an SLU component for mapping utterances into semantic representations

Manual work results in high cost, long development time, and poor scalability of system development

The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, in order to handle open-domain requests

seeking="find", target="eating place", price="cheap", food="asian food"

find a cheap eating place for asian food

fully unsupervised

Prior Focus

28

Contributions

(figure: induced domain ontology – slots seeking, target, food, price linked by dependency relations PREP_FOR, AMOD, NN)

SELECT restaurant { restaurant.price="cheap", restaurant.food="asian food" }

Predicted intent: navigation

User: find a cheap eating place for taiwanese food

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

(natural language)

(inter-slot relation)

(semantic slot)

29

Contributions

User

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

find a cheap eating place for taiwanese food

30

Ontology Induction Structure Learning Surface Form Derivation

Semantic Decoding Intent Prediction

Contributions

User

Knowledge Acquisition SLU Modeling

find a cheap eating place for taiwanese food

31

Knowledge Acquisition: 1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

(figure: an unlabelled collection of restaurant-asking conversations is organized into domain knowledge – slots seeking, target, food, price, quantity connected by PREP_FOR, NN, and AMOD relations)

Organized Domain Knowledge

Unlabelled Collection

Knowledge Acquisition

Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation

32

SLU Modeling: 2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

Organized Domain

Knowledge

price="cheap", target="restaurant", intent=navigation

SLU Modeling

SLU Component

"can i have a cheap restaurant"

SLU Modeling: Semantic Decoding, Intent Prediction

33

SDS Architecture – Contributions

(figure: SDS pipeline ASR → SLU → DM → NLG, backed by the Domain ontology)

Knowledge Acquisition SLU Modeling

current bottleneck

34

SDS Flowchart

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

35

SDS Flowchart – Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

36

Outline: Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

37

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015

(figure: the proposed framework – frame-semantic parsing on an unlabeled collection drives ontology induction into the feature model Fw, Fs; word and slot relation models Rw, Rs built from lexical and semantic knowledge graphs form the knowledge graph propagation model via structure learning; the MF-SLU (SLU modeling by matrix factorization) decodes "can I have a cheap restaurant" into the semantic representation target="restaurant", price="cheap")

38

Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory; words/phrases can be represented as frames. For "low fat milk", "milk" evokes the "food" frame and "low fat" fills the "descriptor" frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantic parser trained on manually annotated FrameNet sentences

39

Ontology Induction [ASRU'13, SLT'14a]

can i have a cheap restaurant

Frame: capability / Frame: expensiveness / Frame: locale_by_use

1st issue: differentiate domain-specific frames from generic frames for SDSs

Das et al., "Frame-semantic parsing," Computational Linguistics, 2014

slot candidate

Best Student Paper Award

40

(figure: word-observation / slot-candidate training matrix – utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" have binary entries for observed words {cheap, restaurant, food} and induced slot candidates {expensiveness, locale_by_use, food}; frame-semantic parsing of the test utterance "show me a list of cheap restaurants" yields estimated slot probabilities such as .97 and .95)

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

Idea: increase the weights of domain-specific slots and decrease the weights of the others

41

1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model – assumption: domain-specific words/slots have more dependencies to each other

(figure: word relation model and slot relation model – the word relation matrix and the slot relation matrix multiply the word-observation / slot-candidate training matrix for slot induction)

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication
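The propagation idea can be sketched in a few lines. This is an illustrative toy (a single multiplication with a hand-made relation matrix over three slots), not the exact formulation used in the talk:

```python
# Toy sketch of knowledge-graph score propagation: each node keeps its own
# score and accumulates its neighbors' scores through the relation matrix,
# so densely connected domain-specific nodes gain weight.

def propagate(scores, relation):
    """One propagation step: scores' = scores + relation * scores."""
    n = len(scores)
    return [scores[i] + sum(relation[i][j] * scores[j] for j in range(n))
            for i in range(n)]

# slots: expensiveness and locale_by_use are linked in the knowledge graph,
# while a generic slot (e.g. relational_quantity) is isolated
relation = [[0, 1, 0],
            [1, 0, 0],
            [0, 0, 0]]
scores = propagate([1.0, 1.0, 1.0], relation)
# the two connected domain slots now outscore the isolated generic slot
```

Iterating this step (with normalization) spreads evidence further than one hop, which is the effect the relation matrices provide in the model.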

(figure: slot-based knowledge graph over locale_by_use, food, expensiveness, seeking, relational_quantity, desiring, capability, with the training utterances "i would like a cheap restaurant", "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants")

42

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015

(figure: the proposed framework – frame-semantic parsing on an unlabeled collection drives ontology induction into the feature model Fw, Fs; word and slot relation models Rw, Rs built from lexical and semantic knowledge graphs form the knowledge graph propagation model via structure learning; the MF-SLU decodes "can I have a cheap restaurant" into the semantic representation target="restaurant", price="cheap")

43

Knowledge Graph Construction: syntactic dependency parsing on utterances

(figure: dependency parse of "can i have a cheap restaurant" with arcs ccomp, nsubj, dobj, det, amod; "can", "cheap", and "restaurant" evoke the frames capability, expensiveness, and locale_by_use)

Word-based lexical knowledge graph: word nodes (can, i, have, a, cheap, restaurant)

Slot-based semantic knowledge graph: slot nodes (capability, locale_by_use, expensiveness)
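As a sketch, the two graphs for this utterance can be assembled from the parser output. The dependency arcs and frame labels below are hard-coded from the figure; a real system would obtain them from a syntactic parser and a frame-semantic parser such as SEMAFOR:

```python
# Build the word-based lexical KG and the slot-based semantic KG for one
# utterance. Dependency arcs and frame labels are hard-coded here for
# illustration; in practice they come from parsers.

deps = [("have", "can", "ccomp"), ("have", "i", "nsubj"),
        ("have", "restaurant", "dobj"), ("restaurant", "a", "det"),
        ("restaurant", "cheap", "amod")]

# frame-semantic labels for the frame-evoking words
word2slot = {"can": "capability", "cheap": "expensiveness",
             "restaurant": "locale_by_use"}

def build_graph(edges):
    """Adjacency list: head -> list of (dependent, relation)."""
    graph = {}
    for head, dep, rel in edges:
        graph.setdefault(head, []).append((dep, rel))
    return graph

# word-based lexical knowledge graph
lexical_kg = build_graph(deps)

# slot-based semantic knowledge graph: project word arcs onto slots
slot_edges = [(word2slot[h], word2slot[d], rel)
              for h, d, rel in deps
              if h in word2slot and d in word2slot]
semantic_kg = build_graph(slot_edges)
```

Projecting the word arcs onto slots is what turns one lexical graph into the slot-level graph used for structure learning.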

44

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

(figure: dependency-based word embeddings, e.g. for "can" and "have", and dependency-based slot embeddings, e.g. for expensiveness and capability, are trained on the parsed utterance "can i have a cheap restaurant")

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL 2014

45

Edge Weight Measurement: compute edge weights to represent relation importance

• Slot-to-slot semantic relation: similarity between slot embeddings
• Slot-to-slot dependency relation: dependency score between slot embeddings
• Word-to-word semantic relation: similarity between word embeddings
• Word-to-word dependency relation: dependency score between word embeddings

(figure: a graph with word nodes w1–w7 and slot nodes s1–s3, whose edge weights combine the semantic and dependency relation scores)
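A minimal sketch of the similarity-based edge weights, assuming hand-made 3-dimensional slot embeddings (real weights would come from the dependency-based embeddings trained above):

```python
import math

# Cosine similarity between slot embeddings as a semantic edge weight.
# The embeddings here are hypothetical toy vectors for illustration.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

emb = {  # hypothetical 3-d slot embeddings
    "expensiveness": [0.9, 0.1, 0.0],
    "locale_by_use": [0.8, 0.3, 0.1],
    "capability":    [0.0, 0.2, 0.9],
}

w1 = cosine(emb["expensiveness"], emb["locale_by_use"])
w2 = cosine(emb["expensiveness"], emb["capability"])
# related restaurant-domain slots get a heavier edge than an unrelated slot
```

The dependency-score edges are computed analogously, replacing cosine similarity with a score derived from how often the two nodes are connected by a dependency arc.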

46

(figure: word relation model and slot relation model – the word-observation / slot-candidate training matrix is multiplied by the word relation matrix R_w^SD and the slot relation matrix R_s^SD for slot induction)

Knowledge Graph Propagation Model

Structure information is integrated to make the self-training data more reliable

47

(figure: ontology induction and structure learning feed the feature matrices Fw, Fs into the SLU model; the test utterance "show me a list of cheap restaurants" carries hidden semantics that are not directly observed)

2nd issue: unobserved semantics may benefit understanding

Semantic Decoding [ACL-IJCNLP'15]

48

Reasoning with Matrix Factorization

(figure: feature model + knowledge graph propagation model – after factorization, the matrix holds estimated probabilities, e.g. .97, .90, .95, .85, .93, .92, .98, .05, for both observed and hidden cells)

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data

49

2nd Issue: How to model the unobserved hidden semantics? Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and words/slots respectively

The product of the two matrices fills in the probability of hidden semantics

(figure: the |U| × (|W|+|S|) observation matrix is approximated by the product of a |U| × d matrix and a d × (|W|+|S|) matrix)

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI 2009
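To make the low-rank idea concrete, here is a small sketch that factorizes a partially observed binary matrix with plain SGD on squared error over the observed cells. This is a deliberate simplification (not the BPR objective of the talk): the point is only that the reconstruction assigns a score to an unobserved cell:

```python
import random

# Factorize observed cells of a utterance x (word/slot) matrix as M = U.V,
# then read the score of a hidden cell off the reconstruction.

def mf(observed, n_rows, n_cols, d=2, lr=0.1, epochs=1000, seed=0):
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(n_rows)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(n_cols)]
    for _ in range(epochs):
        for (i, j), value in observed.items():
            pred = sum(U[i][k] * V[j][k] for k in range(d))
            err = value - pred
            for k in range(d):
                u, v = U[i][k], V[j][k]
                U[i][k] += lr * err * v
                V[j][k] += lr * err * u
    return U, V

# utterances x (words + slots); cell (1, 2) is unobserved hidden semantics
observed = {(0, 0): 1, (0, 2): 1, (1, 0): 1, (2, 1): 1}
U, V = mf(observed, n_rows=3, n_cols=3)
hidden = sum(U[1][k] * V[2][k] for k in range(2))
# utterance 1 shares column 0 with utterance 0, so the factorization assigns
# a relatively high score to the unobserved cell (1, 2)
```

Because utterance 1 behaves like utterance 0 on the observed columns, the low-rank constraint transfers utterance 0's evidence for column 2 onto utterance 1 – exactly the "hidden semantics" effect the slide describes.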

50

Bayesian Personalized Ranking for MF: model implicit feedback

• do not treat unobserved facts as negative samples (true or false)
• give observed facts higher scores than unobserved facts

Objective: for each utterance u_x, maximize Σ ln σ(f⁺ − f⁻) over pairs of an observed fact f⁺ and an unobserved fact f⁻, where σ is the logistic sigmoid; the objective is to learn a set of well-ranked semantic slots per utterance
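One BPR-style gradient step can be sketched as follows, assuming latent vectors u_x for the utterance and v⁺, v⁻ for an observed and an unobserved slot. This is an illustrative simplification of the update in Rendle et al. (2009), without regularization:

```python
import math, random

# One BPR step: push the score difference f(x, f+) - f(x, f-) upward
# through the logistic loss, so observed slots outrank unobserved ones.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bpr_step(u_x, v_pos, v_neg, lr=0.05):
    diff = sum(u * (p - n) for u, p, n in zip(u_x, v_pos, v_neg))
    g = 1.0 - sigmoid(diff)           # gradient weight of -ln sigmoid(diff)
    u_new = [u + lr * g * (p - n) for u, p, n in zip(u_x, v_pos, v_neg)]
    p_new = [p + lr * g * u for u, p in zip(u_x, v_pos)]
    n_new = [n - lr * g * u for u, n in zip(u_x, v_neg)]
    return u_new, p_new, n_new

rng = random.Random(1)
u = [rng.uniform(-0.1, 0.1) for _ in range(2)]
pos = [rng.uniform(-0.1, 0.1) for _ in range(2)]
neg = [rng.uniform(-0.1, 0.1) for _ in range(2)]
for _ in range(200):
    u, pos, neg = bpr_step(u, pos, neg)
score_pos = sum(a * b for a, b in zip(u, pos))
score_neg = sum(a * b for a, b in zip(u, neg))
# after training, the observed slot is ranked above the unobserved one
```

Note that the loss only constrains the *ordering* of the two scores, which is why unobserved facts are never forced to be false.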

51

(figure: ontology induction and structure learning build the training matrix over the feature matrices Fw, Fs; for the test utterance "show me a list of cheap restaurants" the model fills in slot probabilities such as .97, .90, .95, .85)

Matrix Factorization SLU (MF-SLU)

MF-SLU can estimate probabilities for slot candidates given test utterances

52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015

(figure: the proposed framework – frame-semantic parsing on an unlabeled collection drives ontology induction into the feature model Fw, Fs; word and slot relation models Rw, Rs built from lexical and semantic knowledge graphs form the knowledge graph propagation model via structure learning; the MF-SLU decodes "can I have a cheap restaurant" into the semantic representation target="restaurant", price="cheap")

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)

53

Experimental Setup

Dataset: Cambridge University SLU Corpus – restaurant recommendation (WER = 37%), 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type

Metric: MAP of all estimated slot probabilities over all utterances (using a mapping table between induced and reference slots)

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT 2012

54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus; Metric: MAP of all estimated slot probabilities for all utterances

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
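The MAP metric used throughout these tables can be sketched directly: slot candidates are ranked by estimated probability, average precision is computed per utterance against the reference slots, and the per-utterance values are averaged. The slot names below are illustrative:

```python
# Mean average precision over ranked slot lists.

def average_precision(ranked, relevant):
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(predictions, references):
    aps = [average_precision(ranked, ref)
           for ranked, ref in zip(predictions, references)]
    return sum(aps) / len(aps)

# two utterances: slots ranked by estimated probability vs. reference slots
predictions = [["price", "food", "area"], ["area", "price"]]
references = [{"price", "area"}, {"price"}]
map_score = mean_average_precision(predictions, references)
```

Because the metric is rank-based, it rewards models that push correct slots toward the top even when the absolute probabilities are uncalibrated, which suits the BPR training objective.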

55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus; Metric: MAP of all estimated slot probabilities for all utterances

The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
Proposed MF-SLU: Feature Model | 37.6 | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)

(the result is significantly better than the MLR baseline with p < 0.05 in a t-test)

56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus; Metric: MAP of all estimated slot probabilities for all utterances

In the integrated structure information, both semantic and dependency relations are useful for understanding

Approach | ASR | Transcripts
Feature Model | 37.6 | 45.3
Feature + Knowledge Graph Propagation (Semantic) | 41.4 | 51.6
Feature + Knowledge Graph Propagation (Dependency) | 41.6 | 49.0
Feature + Knowledge Graph Propagation (All) | 43.5 (+15.7%) | 53.4 (+17.9%)

(the result is significantly better than the MLR baseline with p < 0.05 in a t-test)

Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs

The reference ontology with the most frequent syntactic dependencies

(figure: discovered inter-slot relation graph – slots locale_by_use, food, expensiveness, seeking, relational_quantity, desiring connected by PREP_FOR, NN, AMOD, and DOBJ relations – compared with the reference ontology over type, food, pricerange, area, task with the most frequent syntactic dependencies)

The automatically learned domain ontology aligns well with the reference one

57

The data-driven ontology is more objective, while the expert-annotated one is more subjective

58

Contributions of Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, 3) and then allow systems to model implicit semantics for better understanding

59

Low- and High-Level Understanding

Semantic concepts for individual utterances do not consider high-level semantics (user intents); the follow-up behaviors usually correspond to user intents

price="cheap", target="restaurant"

SLU Model

"can i have a cheap restaurant"

intent=navigation

restaurant="legume", time="tonight"

SLU Model

"i plan to dine in legume tonight"

intent=reservation

60

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart – Intent Prediction

61

Outline: Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app

Output: the apps supporting the required functionality

Intent identification across popular domains in Google Play

please dial a phone call to alex → Skype, Hangout, etc.

Intent Prediction of Mobile Apps [SLT'14c]

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT 2014

63

Input: single-turn request

Output: apps that are able to support the required functionality

Intent Prediction – Single-Turn Request

(figure: feature-enriched MF – the test utterance "i would like to contact alex" is enriched with the semantic feature "communication"; IR over app descriptions (e.g. Gmail: "check and send emails, msgs", Outlook: "your email, calendar, contacts") retrieves app candidates and self-trained utterances, and the matrix predicts intended apps such as Gmail, Outlook, Skype with estimated probabilities around .85–.97)

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents

64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

Challenge: language ambiguity – 1) user preference, 2) app-level contexts

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015. Data available at http://AppDialogue.com

(figure: "send to vivian" could launch Email or Message – both Communication apps; the previous turn helps disambiguate)

Idea: behavioral patterns in history can help intent prediction

65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

(figure: feature-enriched training matrix – training dialogues pair utterances with intended apps, e.g. "take this photo" → CAMERA, "tell vivian this is me in the lab" → IM, "check my grades on websites" → CHROME, "send an email to professor" → EMAIL; lexical features plus behavior-history features let the model score intended apps for the test dialogue "take a photo of this" / "send it to alice" with probabilities such as .95, .85, .80, .70, .55)

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015. Data available at http://AppDialogue.com

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
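As an illustration of the feature enrichment, a turn can be encoded with binary lexical features plus behavior-history features for previously launched apps. All vocabularies and feature names below are made up for the example:

```python
# Encode one dialogue turn as lexical features + behavior-history features,
# the kind of row a feature-enriched MF model would factorize.

lexical_vocab = ["photo", "check", "camera", "tell", "send"]
app_vocab = ["CAMERA", "IM", "CHROME", "EMAIL"]

def featurize(words, previous_apps):
    """Binary lexical features followed by binary behavior-history features."""
    lexical = [1 if w in words else 0 for w in lexical_vocab]
    behavior = [1 if app in previous_apps else 0 for app in app_vocab]
    return lexical + behavior

# turn 2 of the test dialogue: "send it to alice" after using the camera
row = featurize({"send"}, previous_apps={"CAMERA"})
```

The behavior-history columns are what let the model learn patterns such as "CAMERA is often followed by IM", which is the contextual signal the slide describes.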

66

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix | ASR (LM) | ASR (MF-SLU) | Transcripts (LM) | Transcripts (MF-SLU)
Word Observation | 25.1 | | 26.1 |

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix | ASR (MLR) | ASR (MF-SLU) | Transcripts (MLR) | Transcripts (MF-SLU)
Word Observation | 52.1 | | 55.5 |

LM: LM-based IR model (unsupervised); MLR: multinomial logistic regression (supervised)

Experiments for Intent Prediction

67

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix | ASR (LM) | ASR (MF-SLU) | Transcripts (LM) | Transcripts (MF-SLU)
Word Observation | 25.1 | 29.2 (+16.2%) | 26.1 | 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix | ASR (MLR) | ASR (MF-SLU) | Transcripts (MLR) | Transcripts (MF-SLU)
Word Observation | 52.1 | 52.7 (+1.2%) | 55.5 | 55.4 (−0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data

Experiments for Intent Prediction

68

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix | ASR (LM) | ASR (MF-SLU) | Transcripts (LM) | Transcripts (MF-SLU)
Word Observation | 25.1 | 29.2 (+16.2%) | 26.1 | 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 | | 33.3 |
Word + Type-Embedding-Based Semantics | 31.5 | | 32.9 |

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix | ASR (MLR) | ASR (MF-SLU) | Transcripts (MLR) | Transcripts (MF-SLU)
Word Observation | 52.1 | 52.7 (+1.2%) | 55.5 | 55.4 (−0.2%)
Word + Behavioral Patterns | 53.9 | | 56.6 |

Semantic enrichment provides rich cues to improve performance

Experiments for Intent Prediction

69

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix | ASR (LM) | ASR (MF-SLU) | Transcripts (LM) | Transcripts (MF-SLU)
Word Observation | 25.1 | 29.2 (+16.2%) | 26.1 | 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 | 34.2 (+6.8%) | 33.3 | 33.3 (−0.2%)
Word + Type-Embedding-Based Semantics | 31.5 | 32.2 (+2.1%) | 32.9 | 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix | ASR (MLR) | ASR (MF-SLU) | Transcripts (MLR) | Transcripts (MF-SLU)
Word Observation | 52.1 | 52.7 (+1.2%) | 55.5 | 55.4 (−0.2%)
Word + Behavioral Patterns | 53.9 | 55.7 (+3.3%) | 56.6 | 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics

Experiments for Intent Prediction

70

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction

The Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, 3) and create personalized models by leveraging contextual behaviors

71

Personal Intelligent Architecture

Reactive Assistance: ASR, LU, Dialog, LG, TTS

Proactive Assistance: Inferences, User Modeling, Suggestions

Data: Back-end Data Bases, Services, and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"

72

Outline: Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

73

Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances; better high-level intent prediction about follow-up behaviors

74

Future Work: apply the proposed technology to domain discovery

Domains not covered by the current systems but that users are interested in can guide which domains to develop next

Improve the proposed approach by handling the uncertainty

(figure: ASR recognition errors and unreliable acquired knowledge introduce uncertainty into SLU modeling and knowledge acquisition)

75

(figure: extending MF toward a convolutional architecture – a word sequence w1 … wd passes through a convolutional layer lc (convolution matrix Wc) and a pooling operation into an utterance vector lf; slot vectors lf, a semantic projection matrix Ws, and a knowledge graph propagation layer lp (matrix Wp) produce a semantic layer y with relation scores R(U, S1) … R(U, Sn) and posterior probabilities P(S1 | U) … P(Sn | U) over slot candidates)

Towards Unsupervised Deep Learning

Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning

76

Take-Home Message: big data is available, but without annotations

Challenge: how to acquire and organize important knowledge, and further utilize it for applications

Language understanding for AI: from language to action – understand voice commands to control music, lights, etc.; teach the system to let friends in by face recognition, etc.

Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A – Thanks for your attention!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU 2013 (Best Student Paper Award)

• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT 2015
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT 2014
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," Extended Abstract, NIPS-SLU 2015
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP 2016

  • Statistical Learning from Dialogues for Intelligent Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants
  • Why do we need them
  • Why do we need them (2)
  • Why do companies care
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymaxrsquos intelligence
  • SDS Architecture
  • Interaction Example
  • SDS Process ndash Available Domain Ontology
  • SDS Process ndash Available Domain Ontology (2)
  • SDS Process ndash Available Domain Ontology (3)
  • SDS Process ndash Spoken Language Understanding (SLU)
  • SDS Process ndash Spoken Language Understanding (SLU) (2)
  • SDS Process ndash Dialogue Management (DM)
  • SDS Process ndash Dialogue Management (DM) (2)
  • SDS Process ndash Dialogue Management (DM) (3)
  • SDS Process ndash Dialogue Management (DM) (4)
  • SDS Process ndash Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture ndash Contributions
  • SDS Flowchart
  • SDS Flowchart ndash Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLPrsquo15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a]
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
  • 1st Issue How to adapt generic slots to a domain-specific sett
  • Semantic Decoding [ACL-IJCNLPrsquo15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement SlotWord Embeddings Training (Levy and
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLPrsquo15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue How to model the unobserved hidden semantics Matrix
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLPrsquo15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding Quality of Semantics Estimatio
  • Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
  • Experiments of Semantic Decoding Effectiveness of Relations
  • Experiments for Structure Learning Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart ndash Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLTrsquo14c]
  • Intent Prediction ndash Single-Turn Request
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q amp A
Page 26: Statistical Learning from Dialogues for Intelligent Assistants

26

Required Knowledge

(Figure: for the user utterance "find a cheap eating place for taiwanese food", the required domain-specific information includes the slot structure, seeking, target, price, food, linked by AMOD, NN, and PREP_FOR dependencies, the query "SELECT restaurant" with restaurant.price="cheap" and restaurant.food="taiwanese", and the predicted intent: navigation.)

27

Challenges for SDS

An SDS in a new domain requires:
1) a hand-crafted domain ontology
2) utterances labelled with semantic representations
3) an SLU component for mapping utterances into semantic representations

Manual work results in high cost, long duration, and poor scalability of system development.

The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, in order to handle open-domain requests, fully unsupervised.

(Example: "find a cheap eating place for asian food" → seeking="find", target="eating place", price="cheap", food="asian food")

28

Contributions

User: find a cheap eating place for taiwanese food

(Figure: Ontology Induction (semantic slot), Structure Learning (inter-slot relation), and Surface Form Derivation (natural language) yield the slot structure, seeking, target, price, food, linked by AMOD, NN, and PREP_FOR; Semantic Decoding yields "SELECT restaurant" with restaurant.price="cheap" and restaurant.food="asian food"; Intent Prediction yields the predicted intent: navigation.)

29

Contributions

User: find a cheap eating place for taiwanese food

(Figure: the contribution pipeline, Ontology Induction, Structure Learning, Surface Form Derivation, Semantic Decoding, and Intent Prediction.)

30

Contributions

User: find a cheap eating place for taiwanese food

(Figure: Ontology Induction, Structure Learning, and Surface Form Derivation form Knowledge Acquisition; Semantic Decoding and Intent Prediction form SLU Modeling.)

31

Knowledge Acquisition

1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

(Figure: an unlabelled collection of restaurant-asking conversations is turned by Knowledge Acquisition (Ontology Induction, Structure Learning, Surface Form Derivation) into organized domain knowledge: seeking, target, price, food, quantity, linked by PREP_FOR, NN, and AMOD relations.)

32

SLU Modeling

2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

(Figure: given "can i have a cheap restaurant", the SLU component built by SLU Modeling (Semantic Decoding, Intent Prediction) over the organized domain knowledge outputs price="cheap", target="restaurant", intent=navigation.)

33

SDS Architecture – Contributions

(Figure: the SDS pipeline, ASR, SLU, DM, NLG, with its domain ontology; Knowledge Acquisition and SLU Modeling address the current bottleneck.)

34

SDS Flowchart

(Figure: Ontology Induction and Structure Learning form Knowledge Acquisition; Semantic Decoding and Intent Prediction form SLU Modeling.)

35

SDS Flowchart – Semantic Decoding

(Figure: the same flowchart with Semantic Decoding highlighted.)

36

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

37

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

(Figure: from an unlabeled collection, frame-semantic parsing with Ontology Induction builds the feature model (Fw, Fs); the word relation model Rw (from a lexical KG) and the slot relation model Rs (from a semantic KG), obtained by Structure Learning, form the knowledge graph propagation model; MF-SLU, SLU modeling by matrix factorization, maps "can I have a cheap restaurant" to the semantic representation target="restaurant", price="cheap".)

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015

38

Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory; words/phrases can be represented as frames. For example, in "low fat milk", "milk" evokes the "food" frame and "low fat" fills the descriptor frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences.
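As a toy illustration of this representation (not SEMAFOR's actual output format; all names here are made up for the sketch), a frame-semantic parse can be held in a plain data structure, and the evoked frame names then serve as slot candidates:

```python
# Hand-built frame-semantic parse for the slide's example "low fat milk".
# Illustrative structure only, not the real SEMAFOR API.

def parse_low_fat_milk():
    """Return a hand-written frame-semantic parse for 'low fat milk'."""
    return [
        {
            "frame": "food",                        # frame evoked by the target word
            "target": "milk",                       # lexical unit evoking the frame
            "elements": {"descriptor": "low fat"},  # frame elements filled by spans
        }
    ]

def slot_candidates(parse):
    """Frame names act as slot candidates for ontology induction."""
    return [f["frame"] for f in parse]

candidates = slot_candidates(parse_low_fat_milk())
print(candidates)  # ['food']
```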

39

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

"can i have a cheap restaurant" → Frame: capability; Frame: expensiveness; Frame: locale_by_use (slot candidates; expensiveness and locale_by_use are good domain-specific candidates, capability is generic)

1st Issue: differentiate domain-specific frames from generic frames for SDSs

Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014

40

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

(Figure: word-observation / slot-candidate matrix built by frame-semantic parsing. Training utterances, e.g., Utterance 1 "i would like a cheap restaurant" and Utterance 2 "find a restaurant with chinese food", have observed 1s for their words and slot candidates (expensiveness, locale_by_use, food); for the test utterance "show me a list of cheap restaurants", slot probabilities such as 0.97 and 0.95 are estimated.)

Idea: increase weights of domain-specific slots and decrease weights of others.

41

1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.

(Figure: the word relation matrix and the slot relation matrix multiply the word-observation / slot-candidate matrix for slot induction, over the training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants"; the slot-based knowledge graph connects locale_by_use, food, expensiveness, capability, seeking, desiring, and relational_quantity.)

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.

42

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

(Figure: from an unlabeled collection, frame-semantic parsing with Ontology Induction builds the feature model (Fw, Fs); the word relation model Rw (from a lexical KG) and the slot relation model Rs (from a semantic KG), obtained by Structure Learning, form the knowledge graph propagation model; MF-SLU, SLU modeling by matrix factorization, maps "can I have a cheap restaurant" to the semantic representation target="restaurant", price="cheap".)

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015

43

Knowledge Graph Construction: syntactic dependency parsing on utterances

(Figure: the parse of "can i have a cheap restaurant", with dependencies ccomp, nsubj, dobj, amod, det, yields a word-based lexical knowledge graph over the words (can, i, have, a, cheap, restaurant) and a slot-based semantic knowledge graph over capability, expensiveness, and locale_by_use.)
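This construction can be sketched with plain Python structures; a hand-written dependency list and word-to-slot mapping stand in for a real parser and frame-semantic parse (the direct-neighbor rule below is a simplification of the full construction):

```python
# Sketch: build the word-based lexical KG and slot-based semantic KG from one
# dependency-parsed utterance ("can i have a cheap restaurant", per the slide).

deps = [  # (head, relation, dependent)
    ("have", "ccomp", "can"),
    ("have", "nsubj", "i"),
    ("have", "dobj", "restaurant"),
    ("restaurant", "amod", "cheap"),
    ("restaurant", "det", "a"),
]

# word -> induced slot (from frame-semantic parsing), as on the slide
word2slot = {"can": "capability", "cheap": "expensiveness", "restaurant": "locale_by_use"}

def build_graphs(deps, word2slot):
    word_edges, slot_edges = set(), set()
    for head, _rel, dep in deps:
        word_edges.add((head, dep))
        # connect slots whose evoking words are directly related in the parse
        if head in word2slot and dep in word2slot:
            slot_edges.add((word2slot[head], word2slot[dep]))
    return word_edges, slot_edges

word_g, slot_g = build_graphs(deps, word2slot)
print(sorted(slot_g))  # [('locale_by_use', 'expensiveness')]
```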

44

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

(Figure: from the dependency-parsed utterance "can i have a cheap restaurant" (ccomp, nsubj, dobj, amod, det), dependency-based word embeddings are trained for words such as "can" and "have", and dependency-based slot embeddings for slots such as expensiveness and capability.)

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL 2014

45

Edge Weight Measurement: compute edge weights to represent relation importance

- Slot-to-slot semantic relation: similarity between slot embeddings
- Slot-to-slot dependency relation: dependency score between slot embeddings
- Word-to-word semantic relation: similarity between word embeddings
- Word-to-word dependency relation: dependency score between word embeddings

(Figure: the combined graph with word nodes w1 through w7 and slot nodes s1 through s3.)
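A minimal sketch of the semantic-relation weights, using toy 4-dimensional vectors in place of trained dependency-based embeddings:

```python
# Sketch: edge weight between two slots as cosine similarity of their embeddings.
# Toy vectors stand in for trained dependency-based slot embeddings.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

emb = {
    "expensiveness": np.array([0.9, 0.1, 0.0, 0.2]),
    "locale_by_use": np.array([0.8, 0.2, 0.1, 0.3]),
    "capability":    np.array([0.0, 0.1, 0.9, 0.1]),
}

# semantic relation weight = similarity between slot embeddings
w_domain  = cosine(emb["expensiveness"], emb["locale_by_use"])
w_generic = cosine(emb["expensiveness"], emb["capability"])
print(w_domain > w_generic)  # True: the domain-specific slots are closer
```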

46

Knowledge Graph Propagation Model

(Figure: the word relation matrix R_w^(SD) and the slot relation matrix R_s^(SD) multiply the word-observation / slot-candidate matrix for slot induction over the training and test utterances.)

Structure information is integrated to make the self-training data more reliable
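The propagation idea can be sketched with toy matrices (numbers are illustrative, not the paper's; a row-normalized relation matrix stands in for R_s^(SD)):

```python
# Sketch: multiplying the observation matrix by a row-normalized slot relation
# matrix lets related slots share scores, boosting domain-specific entries.
import numpy as np

# feature matrix F: rows = utterances, cols = slot candidates
# cols: [expensiveness, locale_by_use, capability]
F = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 0.0]])

# slot relation matrix (symmetric, with self-loops): expensiveness and
# locale_by_use are strongly related; capability is only weakly connected
R_s = np.array([[1.0, 0.8, 0.1],
                [0.8, 1.0, 0.1],
                [0.1, 0.1, 1.0]])
R_s = R_s / R_s.sum(axis=1, keepdims=True)  # row-normalize

F_prop = F @ R_s  # propagate scores along slot relations
# the second utterance now also assigns some score to 'expensiveness'
print(np.round(F_prop[1], 3))
```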

47

Semantic Decoding [ACL-IJCNLP'15]

(Figure: Ontology Induction and Structure Learning produce the feature matrices Fw, Fs for SLU. Training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" give observed word and slot features; for the test utterance "show me a list of cheap restaurants", slot scores such as 0.97, 0.90, 0.95, 0.85 are estimated, but the utterance also carries hidden semantics.)

2nd Issue: unobserved semantics may benefit understanding

48

Feature Model + Knowledge Graph Propagation Model: Reasoning with Matrix Factorization

(Figure: the feature matrix combined with the word and slot relation matrices R_w^(SD), R_s^(SD); MF fills estimated probabilities, e.g., 0.97, 0.90, 0.95, 0.85, 0.93, 0.92, 0.98, 0.05, into unobserved cells for slot induction.)

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which can model hidden semantics and is more robust to noisy data.

49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and words/slots, respectively.

The product of the two matrices fills in the probability of the hidden semantics:

M (|U| × (|W|+|S|)) ≈ U (|U| × d) × V (d × (|W|+|S|))

(Figure: the partially-observed word-observation / slot-candidate matrix, with estimated probabilities, e.g., 0.97, 0.90, 0.95, 0.85, 0.93, 0.92, 0.98, 0.05, filled into unobserved cells.)

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI 2009
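A minimal sketch of matrix completion by low-rank factorization, using plain squared-error SGD on a toy matrix (the paper instead optimizes the BPR objective described on the next slide):

```python
# Sketch: complete a partially-observed matrix M with a rank-d factorization
# M ≈ U V, fitting only the observed cells by SGD. Toy data and plain squared
# error, not the paper's actual training setup.
import numpy as np

rng = np.random.default_rng(0)
# observed binary matrix; missing cells marked as np.nan
M = np.array([[1.0, 1.0, np.nan],
              [1.0, np.nan, 1.0],
              [np.nan, 1.0, 1.0]])
d = 2
U = 0.1 * rng.standard_normal((3, d))
V = 0.1 * rng.standard_normal((d, 3))

obs = [(i, j) for i in range(3) for j in range(3) if not np.isnan(M[i, j])]
for _ in range(2000):
    for i, j in obs:
        err = M[i, j] - U[i] @ V[:, j]   # residual on an observed cell
        U[i]    += 0.05 * err * V[:, j]  # gradient step on the utterance factor
        V[:, j] += 0.05 * err * U[i]     # gradient step on the word/slot factor

completed = U @ V  # missing cells now carry estimated values
print(np.round(completed, 2))
```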

50

Bayesian Personalized Ranking for MF: model implicit feedback
- do not treat unobserved facts as negative samples (true or false)
- give observed facts higher scores than unobserved facts

Objective: for each utterance u, maximize Σ ln σ( f⁺ − f⁻ ), i.e., the score f⁺ of an observed fact should exceed the score f⁻ of an unobserved fact.

The objective is to learn a set of well-ranked semantic slots per utterance.
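One BPR-style stochastic update can be sketched as follows (toy latent vectors; in the real model these factors come from the feature and relation matrices):

```python
# Sketch: gradient ascent on ln sigmoid(f_plus - f_minus) for one utterance,
# one observed slot, and one unobserved slot. Toy vectors, illustrative only.
import numpy as np

rng = np.random.default_rng(1)
d = 4
u = rng.standard_normal(d)        # latent vector for the utterance
s_plus = rng.standard_normal(d)   # latent vector for an observed slot
s_minus = rng.standard_normal(d)  # latent vector for an unobserved slot
lr = 0.1

def margin():
    return u @ s_plus - u @ s_minus  # f_plus - f_minus

before = margin()
for _ in range(100):
    g = 1.0 / (1.0 + np.exp(margin()))  # d/dx ln sigmoid(x) = sigmoid(-x)
    u_old = u.copy()
    u       += lr * g * (s_plus - s_minus)
    s_plus  += lr * g * u_old
    s_minus -= lr * g * u_old

print(round(margin(), 2))  # the observed slot is now ranked higher
```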

51

Matrix Factorization SLU (MF-SLU)

(Figure: Ontology Induction provides the feature matrices Fw, Fs and Structure Learning the relation matrices for MF-based SLU; given the training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food", slot probabilities such as 0.97 and 0.95 are estimated for the test utterance "show me a list of cheap restaurants".)

MF-SLU can estimate probabilities for slot candidates given test utterances

52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

(Figure: from an unlabeled collection, frame-semantic parsing with Ontology Induction builds the feature model (Fw, Fs); the word relation model Rw (from a lexical KG) and the slot relation model Rs (from a semantic KG), obtained by Structure Learning, form the knowledge graph propagation model; MF-SLU, SLU modeling by matrix factorization, maps "can I have a cheap restaurant" to the semantic representation target="restaurant", price="cheap".)

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).

53

Experimental Setup

Dataset: Cambridge University SLU Corpus (Henderson et al., 2012)
- restaurant recommendation (WER = 37%)
- 2,166 dialogues, 15,453 utterances
- dialogue slots: addr, area, food, name, phone, postcode, price range, task, type

Metric: MAP of all estimated slot probabilities over all utterances, using a mapping table between induced and reference slots

Henderson et al., "Discriminative spoken language understanding using word confusion networks," in Proc. of SLT 2012
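The metric can be sketched directly (toy ranked lists and reference slots, not the Cambridge corpus):

```python
# Sketch: mean average precision (MAP) over utterances. Each utterance has
# slot candidates ranked by estimated probability and a set of reference slots.

def average_precision(ranked, relevant):
    """AP for one utterance: precision at each hit, averaged over references."""
    hits, score = 0, 0.0
    for k, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            score += hits / k
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(ranked_lists, relevant_sets):
    aps = [average_precision(r, rel) for r, rel in zip(ranked_lists, relevant_sets)]
    return sum(aps) / len(aps)

ranked = [["food", "area", "pricerange"], ["pricerange", "task", "food"]]
gold = [{"food", "pricerange"}, {"pricerange"}]
print(mean_average_precision(ranked, gold))  # ((1 + 2/3)/2 + 1) / 2
```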

54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

Approach (Baseline SLU)          | ASR  | Transcripts
Support Vector Machine           | 32.5 | 36.6
Multinomial Logistic Regression  | 34.0 | 38.8

55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

Approach                                                  | ASR            | Transcripts
Baseline SLU: Support Vector Machine                      | 32.5           | 36.6
Baseline SLU: Multinomial Logistic Regression             | 34.0           | 38.8
Proposed MF-SLU: Feature Model                            | 37.6           | 45.3
Proposed MF-SLU: Feature Model + KG Propagation           | 43.5* (+27.9%) | 53.4* (+37.6%)

The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results.

*: the result is significantly better than the MLR baseline with p < 0.05 in t-test

56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

Approach                                   | ASR            | Transcripts
Feature Model                              | 37.6           | 45.3
Feature + KG Propagation (Semantic)        | 41.4           | 51.6
Feature + KG Propagation (Dependency)      | 41.6           | 49.0
Feature + KG Propagation (All)             | 43.5* (+15.7%) | 53.4* (+17.9%)

In the integrated structure information, both semantic and dependency relations are useful for understanding.

*: the result is significantly better than the MLR baseline with p < 0.05 in t-test

Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs.

(Figure: the induced ontology, locale_by_use, food, expensiveness, seeking, desiring, relational_quantity linked by AMOD, NN, PREP_FOR, and DOBJ dependencies, next to the reference ontology with the most frequent syntactic dependencies, type, food, price range, area, task linked by AMOD, DOBJ, and PREP_IN.)

The automatically learned domain ontology aligns well with the reference one

The data-driven ontology is more objective, while the expert-annotated one is more subjective.

58

Contributions of Semantic Decoding

(Figure: the flowchart with Ontology Induction, Structure Learning, and Semantic Decoding highlighted.)

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.

MF-SLU for Semantic Decoding is able to:
1) unify the automatically acquired knowledge,
2) adapt to a domain-specific setting,
3) and then allow systems to model implicit semantics for better understanding.

59

Low- and High-Level Understanding

Semantic concepts for individual utterances do not consider high-level semantics (user intents).

The follow-up behaviors usually correspond to user intents:

"can i have a cheap restaurant" → SLU Model → price="cheap", target="restaurant", intent=navigation

"i plan to dine in legume tonight" → SLU Model → restaurant="legume", time="tonight", intent=reservation

60

SDS Flowchart – Intent Prediction

(Figure: the same flowchart with Intent Prediction highlighted.)

61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

62

Intent Prediction of Mobile Apps [SLT'14c] [Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances for making requests about launching an app

Output: the apps supporting the required functionality

Intent identification over popular domains in Google Play, e.g., "please dial a phone call to alex" → Skype, Hangout, etc.

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT 2014

63

Intent Prediction – Single-Turn Request

Input: single-turn request

Output: apps that are able to support the required functionality

(Figure: for the test utterance "i would like to contact alex", IR retrieves app candidates from app descriptions ("... your email, calendar, contacts ...", "... check and send emails, msgs ..." for Outlook and Gmail); feature enrichment adds the semantic concept "communication" (0.90); reasoning with feature-enriched MF then scores intended apps such as Gmail, Outlook, and Skype in the word-observation / intended-app matrix built from self-train and test utterances.)

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents

64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

Challenge: language ambiguity. 1) User preference. 2) App-level contexts.

(Figure: "send to vivian" is ambiguous between Email and Message apps in the Communication domain; the previous turn helps disambiguate.)

Idea: behavioral patterns in history can help intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015. Data available at http://AppDialogue.com

65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

(Figure: training dialogues pair user utterances with intended apps and behavior history: "take this photo" → CAMERA, "tell vivian this is me in the lab" → IM; "check my grades on websites" → CHROME, "send an email to professor" → EMAIL; for the test dialogue "take a photo of this", "send it to alice", reasoning with feature-enriched MF predicts CAMERA and then IM from lexical features and behavior history.)

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015. Data available at http://AppDialogue.com

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.

66

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix    | ASR (LM) | Transcripts (LM)
Word Observation  | 25.1     | 26.1
(LM-based IR model, unsupervised)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix    | ASR (MLR) | Transcripts (MLR)
Word Observation  | 52.1      | 55.5
(Multinomial Logistic Regression, supervised)

67

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix    | ASR: LM / MF-SLU     | Transcripts: LM / MF-SLU
Word Observation  | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix    | ASR: MLR / MF-SLU    | Transcripts: MLR / MF-SLU
Word Observation  | 52.1 / 52.7 (+1.2%)  | 55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.

68

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                         | ASR: LM / MF-SLU     | Transcripts: LM / MF-SLU
Word Observation                       | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics       | 32.0 /               | 33.3 /
Word + Type-Embedding-Based Semantics  | 31.5 /               | 32.9 /

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix              | ASR: MLR / MF-SLU   | Transcripts: MLR / MF-SLU
Word Observation            | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns  | 53.9 /              | 56.6 /

Semantic enrichment provides rich cues to improve performance.

69

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                         | ASR: LM / MF-SLU     | Transcripts: LM / MF-SLU
Word Observation                       | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics       | 32.0 / 34.2 (+6.8%)  | 33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics  | 31.5 / 32.2 (+2.1%)  | 32.9 / 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix              | ASR: MLR / MF-SLU   | Transcripts: MLR / MF-SLU
Word Observation            | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns  | 53.9 / 55.7 (+3.3%) | 56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.

70

Contributions of Intent Prediction

(Figure: the flowchart with Intent Prediction highlighted.)

The Feature-Enriched MF-SLU for Intent Prediction is able to:
1) unify the knowledge at different levels,
2) learn inference relations between various features,
3) and create personalized models by leveraging contextual behaviors.

71

Personal Intelligent Architecture

(Figure: Reactive Assistance (ASR, LU, Dialog, LG, TTS) and Proactive Assistance (Inferences, User Modeling, Suggestions) operate over back-end data: data bases, services, and client signals. The user experience, e.g., "call taxi", spans device/service end-points: phone, PC, Xbox, web browser, messaging apps.)

72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

73

Conclusions

The work shows the feasibility and the potential for improving generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:
1) better semantic representations for individual utterances
2) better high-level intent prediction about follow-up behaviors

74

Future Work

Apply the proposed technology to domain discovery: domains not covered by current systems but that users are interested in can guide which domains to develop next.

Improve the proposed approach by handling uncertainty:

(Figure: recognition errors from ASR and unreliable knowledge from Knowledge Acquisition both feed into SLU modeling.)

75

Towards Unsupervised Deep Learning

(Figure: a deep architecture for SLU. The word sequence x = w1, w2, ..., wd is mapped through word vectors l_w, a convolutional layer l_c with convolution matrix W_c, and a pooling operation into an utterance vector l_f; with slot vectors for the slot candidates, a semantic projection matrix W_s gives the semantic layer y, and a knowledge graph propagation layer l_p with matrix W_p produces the semantic relations R(U, S1), ..., R(U, Sn), yielding posterior probabilities P(S1 | U), ..., P(Sn | U) over slot candidates.)

Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.

76

Take Home Message

Big data is available without annotations. The challenge is how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI: language → action, e.g., understand voice to control music, lights, etc.; teach the system to let friends in by face recognition, etc.

Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A. THANKS FOR YOUR ATTENTION!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU 2013 (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT 2015
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT 2014
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU 2015
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP 2016

  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q amp A
Page 27: Statistical Learning from Dialogues for Intelligent Assistants

27

Challenges for SDS. An SDS in a new domain requires:
1) a hand-crafted domain ontology,
2) utterances labelled with semantic representations, and
3) an SLU component for mapping utterances into semantic representations (the prior focus).

Manual work results in high cost, long duration, and poor scalability of system development.

The goal is to enable an SDS to 1) automatically infer domain knowledge and then 2) create the data for SLU modeling, in order to handle open-domain requests, fully unsupervised.

Example: "find a cheap eating place for asian food" → seeking="find", target="eating place", price="cheap", food="asian food"

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

28

Contributions

User: "find a cheap eating place for taiwanese food"
→ Ontology Induction (semantic slot): seeking, target, price, food
→ Structure Learning (inter-slot relations): AMOD, NN, PREP_FOR
→ Surface Form Derivation (natural language)
→ Semantic Decoding: SELECT restaurant { restaurant.price="cheap", restaurant.food="asian food" }
→ Intent Prediction: predicted intent = navigation

29

Contributions

User: "find a cheap eating place for taiwanese food" → Ontology Induction → Structure Learning → Surface Form Derivation → Semantic Decoding → Intent Prediction

30

Contributions

User: "find a cheap eating place for taiwanese food"
• Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation
• SLU Modeling: Semantic Decoding, Intent Prediction

31

Knowledge Acquisition. 1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

[Figure: an unlabelled collection of restaurant-asking conversations is turned into organized domain knowledge: a graph of slots (seeking, target, food, price, quantity) connected by relations such as PREP_FOR, NN, and AMOD.]

Knowledge Acquisition covers Ontology Induction, Structure Learning, and Surface Form Derivation.

32

SLU Modeling. 2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

"can i have a cheap restaurant" + organized domain knowledge → SLU component → price="cheap", target="restaurant", intent=navigation

SLU Modeling covers Semantic Decoding and Intent Prediction.

33

SDS Architecture – Contributions

[Pipeline: ASR → SLU → DM → NLG, backed by the domain ontology. Knowledge Acquisition and SLU Modeling address the current bottleneck: the SLU component and its domain knowledge.]

34

SDS Flowchart

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling


35

SDS Flowchart – Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling


36

Outline

• Intelligent Assistant: What are they? Why do we need them? Why do companies care?
• Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions
• Semantic Decoding
• Intent Prediction
• Conclusions & Future Work

37

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances. Output: semantic concepts included in each individual utterance.

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015.

[Framework: an unlabeled collection is frame-semantically parsed; Ontology Induction yields the feature model (word matrix Fw, slot matrix Fs); Structure Learning over a lexical knowledge graph and a semantic knowledge graph yields the word relation model Rw and slot relation model Rs of the knowledge graph propagation model; MF-SLU (SLU modeling by matrix factorization) then produces the semantic representation, e.g., "can I have a cheap restaurant" → target="restaurant", price="cheap".]

38

Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistically-grounded semantic resource based on frame-semantics theory; words and phrases can be represented as frames, e.g., in "low fat milk", "milk" evokes the "food" frame and "low fat" fills the "descriptor" frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser, trained on manually annotated FrameNet sentences.
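To make the parser's role concrete, here is a minimal sketch of treating each evoked frame as a slot candidate. The parse list below is hand-written for the talk's running example; SEMAFOR's real output format differs.

```python
# Toy frame-semantic parse: each element is (frame_name, target_span).
# The frames follow the talk's running example; a real parse would come
# from SEMAFOR, whose actual output is richer than this flat list.
def slot_candidates(parse):
    """Treat every evoked frame as a slot candidate (ontology induction's input)."""
    return sorted({frame for frame, _span in parse})

parse = [
    ("capability", "can i have"),
    ("expensiveness", "cheap"),
    ("locale_by_use", "restaurant"),
]
print(slot_candidates(parse))  # generic and domain-specific frames alike
```

Note that the candidate list still mixes generic frames (capability) with domain-specific ones; separating them is exactly the first issue addressed below.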


39

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

"can i have a cheap restaurant" evokes the frames capability, expensiveness, and locale_by_use; each frame is a slot candidate.

1st Issue: differentiate domain-specific frames (good slot candidates, e.g., expensiveness, locale_by_use) from generic frames (e.g., capability) for SDSs.

Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.

40

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

[Matrix illustration: rows are utterances; columns are word observations (cheap, restaurant) and slot candidates (expensiveness, locale_by_use, food) from frame-semantic parsing. Training rows, e.g., "i would like a cheap restaurant" and "find a restaurant with chinese food", hold binary observations; for the test utterance "show me a list of cheap restaurants", the model estimates slot probabilities (e.g., .97 and .95).]

Idea: increase the weights of domain-specific slots and decrease the weights of others.

41

1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.

[Illustration: the word/slot observation matrix (from utterances such as "i would like a cheap restaurant", "find a restaurant with chinese food", and the test utterance "show me a list of cheap restaurants") is multiplied by a word relation matrix and a slot relation matrix for slot induction. Relation matrices let nodes propagate scores to their neighbors in the knowledge graph, so domain-specific words/slots (locale_by_use, food, expensiveness) obtain higher scores after matrix multiplication than generic ones (capability, seeking, desiring, relational_quantity).]
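The propagation step is essentially a matrix product: scores flow along weighted edges, so well-connected (domain-specific) items accumulate mass. A pure-Python sketch with made-up relation weights:

```python
# Propagate observation scores through a word relation matrix, as in F' = F · R.
# The weights are illustrative: "cheap" and "restaurant" are strongly related
# in the dependency-based graph, while "can" has few in-domain neighbors.
def matvec(vec, mat):
    n = len(mat[0])
    return [sum(vec[i] * mat[i][j] for i in range(len(vec))) for j in range(n)]

words = ["can", "cheap", "restaurant"]
f = [1.0, 1.0, 1.0]      # raw word observations for one utterance
R = [                    # word relation matrix (assumed edge weights)
    [1.0, 0.0, 0.1],
    [0.0, 1.0, 0.9],     # cheap ~ restaurant (amod dependency)
    [0.1, 0.9, 1.0],
]
f_prop = matvec(f, R)
# Domain-specific words end up with higher propagated scores:
print(dict(zip(words, f_prop)))
```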


42

Semantic Decoding [ACL-IJCNLP'15] (framework recap)

Input: user utterances. Output: semantic concepts included in each individual utterance. This pass highlights Structure Learning: the lexical and semantic knowledge graphs yield the word relation model Rw and the slot relation model Rs used by the knowledge graph propagation model.

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015.

43

Knowledge Graph Construction: syntactic dependency parsing on utterances.

"can i have a cheap restaurant" (evoked frames: capability, expensiveness, locale_by_use) with dependencies ccomp, nsubj, dobj, det, amod.

• Word-based lexical knowledge graph: words (can, i, have, a, cheap, restaurant) connected by their dependency edges.
• Slot-based semantic knowledge graph: slots (capability, locale_by_use, expensiveness) connected according to the dependencies between the words that evoke them.
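The two graphs can be sketched from a hand-written dependency parse of the example utterance (the edge list and frame labels below are illustrative; a real system would obtain them from a dependency parser and SEMAFOR):

```python
from collections import defaultdict

# Dependency edges (head, dependent, relation) for "can i have a cheap restaurant".
dep_edges = [("have", "can", "ccomp"), ("have", "i", "nsubj"),
             ("have", "restaurant", "dobj"), ("restaurant", "a", "det"),
             ("restaurant", "cheap", "amod")]
frame_of = {"can": "capability", "cheap": "expensiveness",
            "restaurant": "locale_by_use"}  # word -> evoked slot (frame)

word_graph, slot_graph = defaultdict(set), defaultdict(set)
for head, dep, _rel in dep_edges:
    word_graph[head].add(dep)
    word_graph[dep].add(head)
    # Simplification: add a slot edge only when both words evoke a frame;
    # the full construction also follows multi-word dependency paths.
    if head in frame_of and dep in frame_of:
        s1, s2 = frame_of[head], frame_of[dep]
        slot_graph[s1].add(s2)
        slot_graph[s2].add(s1)

print(sorted(slot_graph["locale_by_use"]))
```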


44

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

For "can i have a cheap restaurant" (dependencies ccomp, nsubj, dobj, det, amod; frames capability, expensiveness, locale_by_use):
• Dependency-based word embeddings: each word (e.g., can, have) is trained with its syntactic-dependency contexts.
• Dependency-based slot embeddings: each slot (e.g., expensiveness, capability) is trained analogously over the frame-labeled parse.

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL 2014.
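The training pairs behind dependency-based embeddings can be sketched as follows: each word is paired with its syntactic neighbors, labeled by the dependency relation (with an inverse label on the head side), instead of a linear window.

```python
# Extract (word, context) pairs in the style of dependency-based embeddings:
# the dependent becomes a context "dep/rel" of the head, and the head becomes
# an inverse context "head/rel-1" of the dependent.
def dep_contexts(dep_edges):
    pairs = []
    for head, dep, rel in dep_edges:
        pairs.append((head, f"{dep}/{rel}"))
        pairs.append((dep, f"{head}/{rel}-1"))
    return pairs

edges = [("have", "restaurant", "dobj"), ("restaurant", "cheap", "amod")]
print(dep_contexts(edges))
```

These pairs would then feed a skip-gram-style trainer; the same extraction applied to the frame-labeled parse yields slot embeddings.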


45

Edge Weight Measurement: compute edge weights to represent relation importance.

• Slot-to-slot semantic relation: similarity between slot embeddings
• Slot-to-slot dependency relation: dependency score between slot embeddings
• Word-to-word semantic relation: similarity between word embeddings
• Word-to-word dependency relation: dependency score between word embeddings

[Illustration: a word graph over w1–w7 and a slot graph over s1–s3, with edge weights combining both relation types.]
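A sketch of the similarity-based edge weight, using cosine similarity between slot embeddings; the 3-dimensional vectors are made up for illustration, whereas real ones would be the dependency-based embeddings trained above.

```python
import math

def cosine(u, v):
    """Similarity between two embedding vectors, used as a graph edge weight."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Illustrative slot embeddings (assumed values, not trained vectors).
emb = {"expensiveness": [0.9, 0.1, 0.2],
       "locale_by_use": [0.8, 0.2, 0.1],
       "capability":    [0.1, 0.9, 0.3]}

# In-domain slots should end up closer than a generic slot:
edge_weight = cosine(emb["expensiveness"], emb["locale_by_use"])
```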


46

Knowledge Graph Propagation Model

[Illustration: the observation matrix (word observations + slot candidates for train and test utterances) is multiplied by the word relation matrix R_w^(SD) and the slot relation matrix R_s^(SD) for slot induction.]

Structure information is integrated to make the self-training data more reliable.

47

Semantic Decoding [ACL-IJCNLP'15]

[Illustration: Ontology Induction supplies Fw and Fs to the SLU model, combined with Structure Learning. For the test utterance "show me a list of cheap restaurants", observed evidence yields high probabilities for matched slots (e.g., .97, .95), but the hidden semantics of the utterance remain unobserved.]

2nd Issue: unobserved semantics may benefit understanding.

48

Feature Model + Knowledge Graph Propagation Model: Reasoning with Matrix Factorization

[Illustration: the word/slot observation matrix, combined with the relation matrices R_w^(SD) and R_s^(SD) for slot induction; MF fills in the previously missing cells with estimated probabilities (e.g., .93, .92) alongside the observed estimates.]

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.

49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)
• The decomposed matrices represent latent semantics for utterances and words/slots, respectively.
• The product of the two matrices fills in the probability of the hidden semantics.

[Illustration: the |U| × (|W|+|S|) observation matrix is approximated by the product of a |U| × d utterance matrix and a d × (|W|+|S|) word/slot matrix, so empty cells receive estimated probabilities (e.g., .93, .92) alongside observed ones.]

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI 2009.
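A minimal MF-completion sketch: factor a tiny partially observed matrix and read off predictions for any cell. For brevity it optimizes squared error with SGD; the talk's MF-SLU instead optimizes the BPR ranking objective, and all numbers here are toy values.

```python
import random

random.seed(0)
# Observed (utterance, column) cells of a 2 x 4 binary matrix; columns stand
# for, say, [cheap, restaurant, expensiveness, locale_by_use] (toy layout).
M = {(0, 0): 1, (0, 2): 1,
     (1, 1): 1, (1, 3): 1}
n_rows, n_cols, d = 2, 4, 2          # d = latent dimension (low rank)
U = [[random.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(n_rows)]
V = [[random.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(n_cols)]

def pred(i, j):
    """Reconstructed value of cell (i, j): dot product of latent factors."""
    return sum(U[i][k] * V[j][k] for k in range(d))

cells = list(M.items())
for _ in range(2000):                # SGD on squared error over observed cells
    (i, j), y = random.choice(cells)
    err = y - pred(i, j)
    for k in range(d):
        u_k, v_k = U[i][k], V[j][k]
        U[i][k] += 0.1 * err * v_k
        V[j][k] += 0.1 * err * u_k

# Observed cells are reconstructed; unobserved cells get estimates "for free".
print(round(pred(0, 0), 2), round(pred(0, 1), 2))
```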


50

Bayesian Personalized Ranking for MF: model implicit feedback.
• Do not treat unobserved facts as negative samples (true or false).
• Give observed facts higher scores than unobserved facts.

Objective: for each utterance u_x, maximize
  Σ ln σ( f⁺ − f⁻ )
summed over pairs of observed facts f⁺ and unobserved facts f⁻, where σ is the sigmoid function.

The objective is to learn a set of well-ranked semantic slots per utterance.
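One SGD step on this objective can be sketched as follows (toy factors, illustrative learning rate): the gradient of ln σ(f⁺ − f⁻) moves the utterance vector toward the observed slot's factors and away from the unobserved one's.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bpr_step(u, v_pos, v_neg, lr=0.1):
    """u: utterance factors; v_pos/v_neg: factors of an observed/unobserved slot."""
    x = sum(a * (p - n) for a, p, n in zip(u, v_pos, v_neg))   # f+ - f-
    g = 1.0 - sigmoid(x)                                        # d ln sigma(x) / dx
    u2  = [a + lr * g * (p - n) for a, p, n in zip(u, v_pos, v_neg)]
    v_p = [p + lr * g * a for a, p in zip(u, v_pos)]
    v_n = [n - lr * g * a for a, n in zip(u, v_neg)]
    return u2, v_p, v_n

u, vp, vn = [0.1, 0.2], [0.1, 0.1], [0.1, 0.1]   # toy initial factors
for _ in range(50):
    u, vp, vn = bpr_step(u, vp, vn)
margin = sum(a * (p - n) for a, p, n in zip(u, vp, vn))
```

After a few steps the margin f⁺ − f⁻ turns positive, i.e., the observed slot outranks the unobserved one for this utterance.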


51

Matrix Factorization SLU (MF-SLU)

[Illustration: Ontology Induction supplies Fw and Fs to the SLU model, combined with Structure Learning; for the test utterance "show me a list of cheap restaurants", MF-SLU estimates probabilities for the slot candidates (e.g., .97, .95).]

MF-SLU can estimate probabilities for slot candidates given test utterances.

52

Semantic Decoding [ACL-IJCNLP'15] (framework recap)

Idea: utilize the acquired knowledge to decode utterance semantics, fully unsupervised. Frame-semantic parsing of the unlabeled collection, Ontology Induction (Fw, Fs), Structure Learning (Rw, Rs), and MF-SLU together produce the semantic representation, e.g., "can I have a cheap restaurant" → target="restaurant", price="cheap".

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015.

53

Experimental Setup

Dataset: Cambridge University SLU Corpus — restaurant recommendation (WER = 37%), 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type.

Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots.

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT 2012.
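The metric can be sketched as follows: per utterance, compute the average precision of the ranked slot list against the reference slots, then average over utterances (the two-utterance example is hypothetical).

```python
# Mean Average Precision over utterances for ranked slot lists.
def average_precision(ranked, relevant):
    hits, score = 0, 0.0
    for rank, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            score += hits / rank          # precision at each relevant hit
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(per_utt):
    return sum(average_precision(r, rel) for r, rel in per_utt) / len(per_utt)

# (ranked slots by estimated probability, reference slot set) per utterance:
data = [(["expensiveness", "capability", "locale_by_use"],
         {"expensiveness", "locale_by_use"}),
        (["food", "locale_by_use"], {"food"})]
print(mean_average_precision(data))
```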


54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus. Metric: MAP of all estimated slot probabilities for all utterances.

Approach (Baseline SLU)          | ASR  | Transcripts
Support Vector Machine           | 32.5 | 36.6
Multinomial Logistic Regression  | 34.0 | 38.8

55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus. Metric: MAP of all estimated slot probabilities for all utterances.

Approach                                              | ASR           | Transcripts
Baseline SLU: Support Vector Machine                  | 32.5          | 36.6
Baseline SLU: Multinomial Logistic Regression         | 34.0          | 38.8
Proposed MF-SLU: Feature Model                        | 37.6          | 45.3
Proposed MF-SLU: Feature Model + KG Propagation       | 43.5 (+27.9%) | 53.4 (+37.6%)

The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results. (Results significantly better than MLR, p < 0.05, t-test.)

56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus. Metric: MAP of all estimated slot probabilities for all utterances.

Approach                                   | ASR           | Transcripts
Feature Model                              | 37.6          | 45.3
Feature + KG Propagation (Semantic)        | 41.4          | 51.6
Feature + KG Propagation (Dependency)      | 41.6          | 49.0
Feature + KG Propagation (All)             | 43.5 (+15.7%) | 53.4 (+17.9%)

In the integrated structure information, both semantic and dependency relations are useful for understanding. (Results significantly better than MLR, p < 0.05, t-test.)

Experiments for Structure Learning: Relation Discovery Analysis

Discovered inter-slot relations connect important slot pairs.

[Learned ontology: locale_by_use, food, expensiveness, seeking, relational_quantity, and desiring linked by PREP_FOR, NN, AMOD, and DOBJ. Reference ontology (with the most frequent syntactic dependencies): type, food, price range, area, and task linked by DOBJ, AMOD, and PREP_IN.]

The automatically learned domain ontology aligns well with the reference one; the data-driven one is more objective, while the expert-annotated one is more subjective.

58

Contributions of Semantic Decoding

(SDS flowchart: Ontology Induction, Structure Learning → Knowledge Acquisition; Semantic Decoding, Intent Prediction → SLU Modeling)

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.

MF-SLU for Semantic Decoding is able to:
1) unify the automatically acquired knowledge,
2) adapt to a domain-specific setting,
3) and then allow systems to model implicit semantics for better understanding.

59

Low- and High-Level Understanding

Semantic concepts for individual utterances do not consider high-level semantics (user intents); the follow-up behaviors usually correspond to user intents.

"can i have a cheap restaurant" → SLU model → price="cheap", target="restaurant", intent=navigation
"i plan to dine in legume tonight" → SLU model → restaurant="legume", time="tonight", intent=reservation

60

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart – Intent Prediction


61

Outline

• Intelligent Assistant: What are they? Why do we need them? Why do companies care?
• Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions
• Semantic Decoding
• Intent Prediction
• Conclusions & Future Work

62

Intent Prediction of Mobile Apps [SLT'14c]
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app. Output: the apps supporting the required functionality.

Intent identification covers popular domains in Google Play, e.g., "please dial a phone call to alex" → Skype, Hangout, etc.

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT 2014.

63

Intent Prediction – Single-Turn Request

Input: single-turn request. Output: apps that are able to support the required functionality.

[Illustration: reasoning with feature-enriched MF. IR retrieves app-candidate descriptions as training rows (e.g., Outlook: "... your email calendar contacts ...", Gmail: "... check and send emails msgs ..."), together with self-train utterances; the test utterance "i would like to contact alex" is enriched with the semantic feature communication (.90), and intended-app columns (Gmail, Outlook, Skype) are scored from word observations such as contact, message, and email.]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.

64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction. Output: apps the user plans to launch.

Challenge: language ambiguity. 1) User preference; 2) app-level contexts. Example: "send to vivian" could target Email, Message, or another communication app; the previous turn disambiguates.

Idea: behavioral patterns in history can help intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015. Data available at http://AppDialogue.com

65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction. Output: apps the user plans to launch.

[Illustration: reasoning with feature-enriched MF. Training dialogues pair utterances, lexical features (photo, check, camera, tell, send, email, chrome), behavior history, and intended apps, e.g., "take this photo" → CAMERA then "tell vivian this is me in the lab" → IM (history: camera); "check my grades on websites" → CHROME then "send an email to professor" → EMAIL (history: chrome). For the test dialogue "take a photo of this" / "send it to alice", the model scores the intended apps (e.g., CAMERA .85, IM .70) using both lexical features and behavioral patterns.]

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015. Data available at http://AppDialogue.com

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.
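How such a feature-enriched row might be assembled: lexical, behavioral-history, and intended-app features live in one shared feature space, so the factorization can learn relations among all of them. The feature names and vocabulary below are made up for illustration.

```python
# Build one binary row of the feature-enriched matrix for a dialogue turn.
def feature_row(utterance, app_history, intended_apps, vocab):
    feats = {f"w:{w}" for w in utterance.split()}        # lexical features
    feats |= {f"hist:{a}" for a in app_history}          # behavioral patterns
    feats |= {f"app:{a}" for a in intended_apps}         # intended-app columns
    return [1 if f in feats else 0 for f in vocab]

# Toy shared feature space (real vocabularies are learned from the data).
vocab = ["w:send", "w:photo", "hist:camera", "hist:null", "app:IM", "app:CAMERA"]
row = feature_row("send it to alice", ["camera"], ["IM"], vocab)
print(row)
```

At test time the app columns are left empty and MF fills them in, exactly as the slot columns were filled for semantic decoding.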


66

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):
Feature Matrix    | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation  | 25.1    |             | 26.1            |

Multi-Turn Interaction, Mean Average Precision (MAP):
Feature Matrix    | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation  | 52.1     |             | 55.5             |

(LM: language-model-based IR, unsupervised; MLR: multinomial logistic regression, supervised.)
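The unsupervised LM baseline can be sketched as query-likelihood retrieval: rank each app by P(utterance | app description) under an add-one-smoothed unigram model. The app descriptions below are invented for illustration.

```python
import math

def score(query, doc_tokens, vocab_size):
    """Log query likelihood of the utterance under the app-description LM."""
    counts = {}
    for t in doc_tokens:
        counts[t] = counts.get(t, 0) + 1
    n = len(doc_tokens)
    # Add-one (Laplace) smoothing over the shared vocabulary.
    return sum(math.log((counts.get(t, 0) + 1) / (n + vocab_size))
               for t in query.split())

apps = {"Gmail": "check and send emails".split(),
        "Camera": "take photos and videos".split()}
vocab = {t for d in apps.values() for t in d}
best = max(apps, key=lambda a: score("send emails", apps[a], len(vocab)))
print(best)
```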


67

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):
Feature Matrix    | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation  | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)

Multi-Turn Interaction, Mean Average Precision (MAP):
Feature Matrix    | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation  | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.

68

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):
Feature Matrix                         | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation                       | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)
Word + Embedding-Based Semantics       | 32.0    |               | 33.3            |
Word + Type-Embedding-Based Semantics  | 31.5    |               | 32.9            |

Multi-Turn Interaction, Mean Average Precision (MAP):
Feature Matrix              | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation            | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)
Word + Behavioral Patterns  | 53.9     |              | 56.6             |

Semantic enrichment provides rich cues to improve performance.

69

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):
Feature Matrix                         | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation                       | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)
Word + Embedding-Based Semantics       | 32.0    | 34.2 (+6.8%)  | 33.3            | 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics  | 31.5    | 32.2 (+2.1%)  | 32.9            | 34.0 (+3.4%)

Multi-Turn Interaction, Mean Average Precision (MAP):
Feature Matrix              | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation            | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)
Word + Behavioral Patterns  | 53.9     | 55.7 (+3.3%) | 56.6             | 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.

70

Contributions of Intent Prediction

(SDS flowchart: Ontology Induction, Structure Learning → Knowledge Acquisition; Semantic Decoding, Intent Prediction → SLU Modeling)

Feature-Enriched MF-SLU for Intent Prediction is able to:
1) unify the knowledge at different levels,
2) learn inference relations between various features,
3) and create personalized models by leveraging contextual behaviors.

71

Personal Intelligent Architecture

• Reactive Assistance: ASR, LU, Dialog, LG, TTS (user experience, e.g., "call taxi")
• Proactive Assistance: Inferences, User Modeling, Suggestions
• Data: back-end databases, services, and client signals
• Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

72

Outline

• Intelligent Assistant: What are they? Why do we need them? Why do companies care?
• Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions
• Semantic Decoding
• Intent Prediction
• Conclusions & Future Work

73

Conclusions

The work shows the feasibility of, and the potential for, improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:
• better semantic representations for individual utterances;
• better high-level intent prediction about follow-up behaviors.

74

Future Work

Apply the proposed technology to domain discovery: domains not covered by the current systems but of interest to users can guide which domains to develop next.

Improve the proposed approach by handling uncertainty: recognition errors from ASR, and unreliable acquired knowledge in SLU modeling.

75

Towards Unsupervised Deep Learning

[Architecture: a word sequence x = w1 w2 ... wd is mapped to word vectors l_w; a convolutional layer l_c (convolution matrix Wc) with a pooling operation produces the utterance vector l_f and slot vectors l_f; a knowledge graph propagation layer l_p (propagation matrix Wp) and a semantic layer y (semantic projection matrix Ws) yield relation scores R(U, S1), ..., R(U, Sn) and semantic-relation posterior probabilities P(S1 | U), ..., P(Sn | U) over slot candidates S1 ... Sn.]

Treating MF as a one-layer neural net, we can add more layers in the model, towards unsupervised deep learning.

76

Take Home Message

Big data is available without annotations. The challenge: how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI: language → action, e.g., understand voice commands to control music, lights, etc., or teach the system to let friends in via face recognition.

Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A. Thanks for your attention!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP 2016.

  • Statistical Learning from Dialogues for Intelligent Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants?
  • Why do we need them?
  • Why do we need them? (2)
  • Why do companies care?
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymax's intelligence?
  • SDS Architecture
  • Interaction Example
  • SDS Process – Available Domain Ontology
  • SDS Process – Available Domain Ontology (2)
  • SDS Process – Available Domain Ontology (3)
  • SDS Process – Spoken Language Understanding (SLU)
  • SDS Process – Spoken Language Understanding (SLU) (2)
  • SDS Process – Dialogue Management (DM)
  • SDS Process – Dialogue Management (DM) (2)
  • SDS Process – Dialogue Management (DM) (3)
  • SDS Process – Dialogue Management (DM) (4)
  • SDS Process – Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture – Contributions
  • SDS Flowchart
  • SDS Flowchart – Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLP'15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRU'13, SLT'14a]
  • Ontology Induction [ASRU'13, SLT'14a] (2)
  • 1st Issue: How to adapt generic slots to a domain-specific setting
  • Semantic Decoding [ACL-IJCNLP'15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLP'15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue: How to model the unobserved hidden semantics? Matrix Factorization
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLP'15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding: Quality of Semantics Estimation
  • Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
  • Experiments of Semantic Decoding: Effectiveness of Relations
  • Experiments for Structure Learning: Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart – Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLT'14c]
  • Intent Prediction – Single-Turn Request
  • Intent Prediction – Multi-Turn Interaction [ICMI'15]
  • Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q & A
28

Contributions

User: "find a cheap eating place for taiwanese food"

[Figure: contribution overview. Ontology Induction (semantic slot), Structure Learning (inter-slot relation, e.g. AMOD, NN, PREP_FOR over target, food, price, seeking), and Surface Form Derivation (natural language) feed Semantic Decoding, yielding SELECT restaurant { restaurant.price="cheap", restaurant.food="asian food" }, and Intent Prediction, yielding the predicted intent: navigation.]

29

Contributions

User: "find a cheap eating place for taiwanese food"

[Figure: the same overview, highlighting the five components: Ontology Induction, Structure Learning, Surface Form Derivation, Semantic Decoding, and Intent Prediction.]

30

Contributions

User: "find a cheap eating place for taiwanese food"

[Figure: the same overview, grouping Ontology Induction, Structure Learning, and Surface Form Derivation under Knowledge Acquisition, and Semantic Decoding and Intent Prediction under SLU Modeling.]

31

Knowledge Acquisition

1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

[Figure: an unlabelled collection of restaurant-asking conversations is turned into organized domain knowledge: a graph over slots such as target, food, price, seeking, and quantity, linked by relations such as PREP_FOR, NN, and AMOD.]

Knowledge Acquisition = Ontology Induction + Structure Learning + Surface Form Derivation

32

SLU Modeling

2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

[Figure: using the organized domain knowledge, the SLU component maps "can i have a cheap restaurant" to price="cheap", target="restaurant", intent=navigation.]

SLU Modeling = Semantic Decoding + Intent Prediction

33

SDS Architecture – Contributions

[Figure: the ASR → SLU → DM → NLG pipeline with its domain ontology; SLU is the current bottleneck, addressed by Knowledge Acquisition and SLU Modeling.]

34

SDS Flowchart

[Figure: Ontology Induction and Structure Learning (Knowledge Acquisition) feed Semantic Decoding and Intent Prediction (SLU Modeling).]

35

SDS Flowchart – Semantic Decoding

[Figure: the same flowchart with the Semantic Decoding stage highlighted.]

36

Outline

 Intelligent Assistant: What are they? Why do we need them? Why do companies care?
 Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions
 Semantic Decoding
 Intent Prediction
 Conclusions & Future Work

37

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances
Output: semantic concepts included in each individual utterance

[Figure: from an unlabeled collection, frame-semantic parsing drives Ontology Induction (feature model: Fw, Fs) and Structure Learning over lexical and semantic knowledge graphs (knowledge graph propagation model: word relation model Rw, slot relation model Rs); MF-SLU (SLU modeling by matrix factorization) then maps "can I have a cheap restaurant" to the semantic representation target="restaurant", price="cheap".]

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

38

Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory; words and phrases can be represented as frames. In "low fat milk", "milk" evokes the "food" frame and "low fat" fills the descriptor frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser, trained on manually annotated FrameNet sentences.
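As a toy illustration of how a frame-semantic parse turns into slot candidates (the dictionary structure below is our own invention for this sketch, not SEMAFOR's actual output format):

```python
# Hypothetical parser output for "can i have a cheap restaurant":
# each evoked frame, with the span that evokes it.
parsed = [
    {"frame": "capability", "span": "can i have"},
    {"frame": "expensiveness", "span": "cheap"},
    {"frame": "locale_by_use", "span": "restaurant"},
]

# Ontology induction treats every evoked frame as a slot candidate.
slot_candidates = [f["frame"] for f in parsed]
print(slot_candidates)  # → ['capability', 'expensiveness', 'locale_by_use']
```

The next slides address which of these candidates are actually domain-specific.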

39

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

"can i have a cheap restaurant" evokes the frames capability, expensiveness, and locale_by_use; each is a slot candidate. Expensiveness and locale_by_use are good domain-specific candidates, while capability is generic.

1st Issue: differentiate domain-specific frames from generic frames for SDSs.

Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.

40

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

[Figure: a binary matrix over word observations (e.g. "cheap", "restaurant", "food") and slot candidates (e.g. expensiveness, locale_by_use, food), built from training utterances ("i would like a cheap restaurant", "find a restaurant with chinese food") and, via frame-semantic parsing, from the test utterance "show me a list of cheap restaurants".]

Idea: increase the weights of domain-specific slots and decrease the weights of the others.

41

1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.

[Figure: the word observation / slot candidate matrix is multiplied by a word relation matrix (word relation model) and a slot relation matrix (slot relation model); slots shown include capability, locale_by_use, food, expensiveness, seeking, desiring, and relational_quantity, over the utterances "i would like a cheap restaurant", "find a restaurant with chinese food", and the test utterance "show me a list of cheap restaurants".]

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
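The propagation step above can be sketched numerically. The slot names come from the running example; the relation matrix and its weights are made up for illustration, encoding the assumption that the three domain-specific slots are densely connected:

```python
import numpy as np

# Hypothetical slot candidates, all starting with the same score.
slots = ["expensiveness", "locale_by_use", "food", "capability"]
scores = np.array([1.0, 1.0, 1.0, 1.0])

# Assumed relation matrix R[i, j]: edge weight between slot i and slot j.
# The first three (domain-specific) slots are strongly interconnected;
# the generic "capability" slot is weakly connected to everything.
R = np.array([
    [1.0, 0.8, 0.7, 0.1],
    [0.8, 1.0, 0.9, 0.1],
    [0.7, 0.9, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
])

# Row-normalize so each node distributes its score, then propagate once.
R_norm = R / R.sum(axis=1, keepdims=True)
propagated = scores @ R_norm

# Densely connected (domain-specific) slots end up with higher scores.
ranked = [s for _, s in sorted(zip(propagated, slots), reverse=True)]
print(ranked[-1])  # → capability
```

After one multiplication the generic slot falls to the bottom of the ranking, which is exactly the weighting effect the slide describes.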

42

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances
Output: semantic concepts included in each individual utterance

[Figure: the same pipeline as before, now focusing on Structure Learning: frame-semantic parsing and Ontology Induction (Fw, Fs) feed the knowledge graph propagation model (word relation model Rw over a lexical knowledge graph, slot relation model Rs over a semantic knowledge graph), and MF-SLU produces the semantic representation.]

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

43

Knowledge Graph Construction

Syntactic dependency parsing on utterances: for "can i have a cheap restaurant" (evoking capability, expensiveness, and locale_by_use), the parse yields ccomp, nsubj, dobj, det, and amod relations.

Word-based lexical knowledge graph: word nodes (can, i, have, a, cheap, restaurant) connected by their dependency edges.

Slot-based semantic knowledge graph: slot nodes (capability, locale_by_use, expensiveness) connected by the dependencies between their evoking words.
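A minimal sketch of building the two graphs from one parsed utterance. The dependency triples follow the example above; the dictionary representation is our own simplification:

```python
# (head, relation, dependent) triples for "can i have a cheap restaurant".
deps = [
    ("have", "ccomp", "can"),
    ("have", "nsubj", "i"),
    ("have", "dobj", "restaurant"),
    ("restaurant", "det", "a"),
    ("restaurant", "amod", "cheap"),
]
# Which words evoke which slot candidates (from frame-semantic parsing).
evokes = {"can": "capability", "cheap": "expensiveness",
          "restaurant": "locale_by_use"}

# Word-based lexical knowledge graph: edges between word nodes.
word_graph = {(h, d): rel for h, rel, d in deps}

# Slot-based semantic knowledge graph: an edge between two slots whenever
# their evoking words are linked by a dependency.
slot_graph = {
    (evokes[h], evokes[d]): rel
    for h, rel, d in deps if h in evokes and d in evokes
}
print(slot_graph)  # → {('locale_by_use', 'expensiveness'): 'amod'}
```

Only the amod edge between "restaurant" and "cheap" survives at the slot level, linking locale_by_use and expensiveness.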

44

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

Dependency-based word embeddings: each word (e.g. "can", "have") is trained with its syntactic neighbors as contexts, using the parse of "can i have a cheap restaurant" (ccomp, nsubj, dobj, det, amod).

Dependency-based slot embeddings: each slot (e.g. expensiveness, capability) is trained analogously over the slot-level parse.

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.
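A rough sketch of the (word, context) pairs that dependency-based embedding training consumes. The labeling scheme below is a simplification of Levy and Goldberg's (their contexts also handle prepositions specially):

```python
# Dependency triples from the running example (subset).
deps = [("have", "ccomp", "can"), ("have", "dobj", "restaurant"),
        ("restaurant", "amod", "cheap")]

# Contexts are syntactic neighbors labeled with the relation, rather than
# linear-window neighbors; each arc yields a pair in both directions.
pairs = []
for head, rel, dep in deps:
    pairs.append((head, f"{rel}_{dep}"))   # head sees the labeled dependent
    pairs.append((dep, f"{rel}I_{head}"))  # dependent sees the inverse context
print(pairs[0])  # → ('have', 'ccomp_can')
```

Feeding such pairs to a skip-gram trainer yields embeddings in which syntactically substitutable words (and, analogously, slots) end up close together.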

45

Edge Weight Measurement

Compute edge weights to represent relation importance:
 Slot-to-slot semantic relation: similarity between slot embeddings
 Slot-to-slot dependency relation: dependency score between slot embeddings
 Word-to-word semantic relation: similarity between word embeddings
 Word-to-word dependency relation: dependency score between word embeddings

[Figure: a semantic knowledge graph whose slot nodes (s1, s2, s3) aggregate the relations among their evoking words (w1 through w7).]
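Cosine similarity between embeddings is the natural reading of "similarity" here; the sketch below is a minimal stand-in for the semantic edge weight, with made-up 3-dimensional vectors:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Invented slot embeddings: two restaurant-domain slots and a generic one.
emb = {
    "expensiveness": [0.9, 0.1, 0.2],
    "food": [0.8, 0.2, 0.1],
    "capability": [0.1, 0.9, 0.1],
}
w_semantic = cosine(emb["expensiveness"], emb["food"])
w_generic = cosine(emb["expensiveness"], emb["capability"])
print(w_semantic > w_generic)  # → True
```

Edges between in-domain slots thus receive higher weights than edges to generic slots, which is what the propagation model exploits.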

46

Knowledge Graph Propagation Model

[Figure: the slot-induction matrix (word observations and slot candidates over the training and test utterances) is multiplied by the word relation matrix R_w^(SD) and the slot relation matrix R_s^(SD) from the word and slot relation models.]

Structure information is integrated to make the self-training data more reliable.

47

Semantic Decoding [ACL-IJCNLP'15]

[Figure: Ontology Induction feeds the SLU matrix (Fw, Fs) together with Structure Learning; training utterances ("i would like a cheap restaurant", "find a restaurant with chinese food") and the test utterance "show me a list of cheap restaurants" populate the word-observation and slot-candidate columns, with estimated scores such as .97, .90, .95, .85.]

2nd Issue: unobserved hidden semantics may benefit understanding.

48

Reasoning with Matrix Factorization

[Figure: the feature model combined with the knowledge graph propagation model (the word observation / slot candidate matrix multiplied by R_w^(SD) and R_s^(SD)) now fills in scores (e.g. .97, .90, .95, .85, .98, .93, .92, .05) for unobserved cells.]

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.

49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and for words/slots, respectively; the product of the two matrices fills in the probability of the hidden semantics:

  M (|U| × (|W|+|S|)) ≈ U (|U| × d) · V (d × (|W|+|S|))

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
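The low-rank completion idea can be shown in a few lines. This is an illustrative toy, not the paper's trained model: a tiny binary observation matrix is factorized with rank 2 by gradient descent over the observed cells only, and the product then assigns a score to an unobserved cell:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical utterance-by-(word+slot) observations. Row 2 matches row 0
# on its observed cells, but cell (2, 3) was never observed directly.
M = np.array([
    [1.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],  # hidden semantics: (2, 3) should be filled in
])
observed = M > 0

d = 2  # latent dimension
U = rng.normal(scale=0.1, size=(3, d))
V = rng.normal(scale=0.1, size=(d, 4))

# Joint gradient descent on squared error over observed cells only.
for _ in range(5000):
    E = (U @ V - M) * observed
    gU, gV = E @ V.T, U.T @ E
    U -= 0.05 * gU
    V -= 0.05 * gV

pred = U @ V
print(round(float(pred[2, 3]), 2))
```

Because row 2 shares row 0's latent profile, the product fills the missing (2, 3) cell with a high score even though it never appeared in training.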

50

Bayesian Personalized Ranking for MF

Model implicit feedback:
 do not treat unobserved facts as negative samples (true or false);
 give observed facts higher scores than unobserved facts.

Objective: for each utterance, maximize Σ ln σ(f⁺ − f⁻), where f⁺ scores an observed fact and f⁻ an unobserved one.

The objective is to learn a set of well-ranked semantic slots per utterance u_x.
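A hedged sketch of one BPR-style stochastic update under that objective (the vectors and learning rate are invented; a real implementation would also update the slot factors and sample many pairs):

```python
import math

def bpr_update(u, v_pos, v_neg, lr=0.1):
    """One ascent step on ln σ(<u,v+> − <u,v−>) for the utterance factor u."""
    f_pos = sum(a * b for a, b in zip(u, v_pos))
    f_neg = sum(a * b for a, b in zip(u, v_neg))
    # d/df ln σ(f) = 1 − σ(f): push the observed/unobserved pair apart.
    g = 1.0 - 1.0 / (1.0 + math.exp(-(f_pos - f_neg)))
    u_new = [a + lr * g * (p - n) for a, p, n in zip(u, v_pos, v_neg)]
    return u_new, f_pos - f_neg

u = [0.1, -0.2]                       # latent vector for utterance u_x
v_pos, v_neg = [0.3, 0.1], [0.2, 0.4]  # observed vs. unobserved slot factors
for _ in range(50):
    u, margin = bpr_update(u, v_pos, v_neg)
print(margin > 0)  # → True
```

Unobserved facts are never labeled false; they are only pushed to rank below observed ones, which is the point of modeling implicit feedback.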

51

Matrix Factorization SLU (MF-SLU)

[Figure: the same Ontology Induction + Structure Learning matrix as before; at test time, MF fills in slot scores (e.g. .97, .90, .95, .85) for "show me a list of cheap restaurants".]

MF-SLU can estimate probabilities for slot candidates given test utterances.

52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances
Output: semantic concepts included in each individual utterance

[Figure: the full pipeline: frame-semantic parsing, Ontology Induction (Fw, Fs), Structure Learning over the lexical and semantic knowledge graphs (Rw, Rs), and MF-SLU producing the semantic representation target="restaurant", price="cheap" for "can I have a cheap restaurant".]

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

53

Experimental Setup

Dataset: Cambridge University SLU corpus, restaurant recommendation (WER = 37%); 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type.

Metric: MAP of all estimated slot probabilities over all utterances, using a mapping table between induced and reference slots.

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
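The MAP metric, as we read it here, is the mean over utterances of the average precision of each utterance's ranked slot list. A small self-contained version, with made-up rankings:

```python
def average_precision(ranked, relevant):
    """AP of one ranked slot list against the set of reference slots."""
    hits, total = 0, 0.0
    for i, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            total += hits / i  # precision at each relevant rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Two toy utterances: (ranked slots, reference slots).
runs = [
    (["food", "area", "phone"], {"food", "phone"}),  # AP = (1/1 + 2/3) / 2
    (["price", "food"], {"food"}),                   # AP = 1/2
]
print(round(mean_average_precision(runs), 3))  # → 0.667
```

Because AP rewards placing correct slots early, MAP directly measures the quality of the estimated slot probabilities as a ranking.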

54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU corpus
Metric: MAP of all estimated slot probabilities for all utterances

  Approach                           ASR    Transcripts
  Baseline SLU:
    Support Vector Machine           32.5   36.6
    Multinomial Logistic Regression  34.0   38.8

55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU corpus
Metric: MAP of all estimated slot probabilities for all utterances

  Approach                           ASR             Transcripts
  Baseline SLU:
    Support Vector Machine           32.5            36.6
    Multinomial Logistic Regression  34.0            38.8
  Proposed MF-SLU:
    Feature Model                    37.6            45.3
    Feature Model +
      Knowledge Graph Propagation   *43.5 (+27.9%)  *53.4 (+37.6%)

  *: significantly better than the MLR baseline (p < 0.05, t-test)

The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results.

56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU corpus
Metric: MAP of all estimated slot probabilities for all utterances

  Approach                                 ASR             Transcripts
  Feature Model                            37.6            45.3
  Feature + Knowledge Graph Propagation:
    Semantic                               41.4            51.6
    Dependency                             41.6            49.0
    All                                   *43.5 (+15.7%)  *53.4 (+17.9%)

  *: significantly better than the MLR baseline (p < 0.05, t-test)

In the integrated structure information, both semantic and dependency relations are useful for understanding.

Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs.

[Figure: the learned ontology (slots locale_by_use, food, expensiveness, seeking, desiring, relational_quantity linked by PREP_FOR, NN, AMOD, and DOBJ) beside the reference ontology with the most frequent syntactic dependencies (type, food, price range, area, task linked by DOBJ, AMOD, PREP_IN).]

The automatically learned domain ontology aligns well with the reference one; the data-driven ontology is more objective, while the expert-annotated one is more subjective.

58

Contributions of Semantic Decoding

[Figure: the flowchart with Ontology Induction, Structure Learning, Semantic Decoding, and Intent Prediction.]

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.

MF-SLU for Semantic Decoding is able to:
 1) unify the automatically acquired knowledge,
 2) adapt to a domain-specific setting,
 3) and then allow systems to model implicit semantics for better understanding.

59

Low- and High-Level Understanding

Semantic concepts for individual utterances do not capture high-level semantics (user intents); the follow-up behaviors usually correspond to user intents.

"can i have a cheap restaurant" → SLU model → price="cheap", target="restaurant", intent=navigation
"i plan to dine in legume tonight" → SLU model → restaurant="legume", time="tonight", intent=reservation

60

SDS Flowchart – Intent Prediction

[Figure: the same flowchart with the Intent Prediction stage highlighted.]

61

Outline

 Intelligent Assistant: What are they? Why do we need them? Why do companies care?
 Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions
 Semantic Decoding
 Intent Prediction
 Conclusions & Future Work

62

Intent Prediction of Mobile Apps [SLT'14c] [Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality

Intent identification over popular domains in Google Play, e.g. "please dial a phone call to alex" → Skype, Hangout, etc.

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.

63

Intent Prediction – Single-Turn Request

Input: single-turn request
Output: apps that are able to support the required functionality

[Figure: reasoning with feature-enriched MF: a matrix over word observations ("contact", "message", "email"), enriched semantics (e.g. "communication"), and intended apps (Gmail, Outlook, Skype). Training rows come from app descriptions retrieved by IR ("… check and send emails, msgs …" for Gmail; "… your email, calendar, contacts …" for Outlook) and self-trained utterances; the test utterance "i would like to contact alex" receives app scores (e.g. .90, .85, .97, .95).]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
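The "unified matrix" idea can be sketched concretely. The vocabulary, app names, and rows below are invented; the point is only that app descriptions (with known labels) and user utterances (with unknown labels) share one feature space of word columns plus app columns:

```python
# Toy feature-enriched matrix: columns are words plus intended apps.
vocab = ["contact", "email", "photo"]
apps = ["Gmail", "Camera"]
columns = vocab + apps

# Rows: (text, known intended app or None for a test utterance).
rows = [
    ("check and send emails", "Gmail"),      # app description, labeled
    ("i would like to contact alex", None),  # test utterance, unlabeled
]

matrix = []
for text, app in rows:
    feats = [1 if w in text else 0 for w in vocab]    # word observations
    labels = [1 if app == a else 0 for a in apps]     # app columns
    matrix.append(feats + labels)
print(matrix[1])  # → [1, 0, 0, 0, 0]
```

Matrix factorization over this joint matrix can then fill in the empty app columns of the test row, which is how the intended app is predicted.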

64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch

Challenge: language ambiguity, from 1) user preference and 2) app-level contexts. For example, "send to vivian" may mean Email or Message (Communication); the previous turn helps disambiguate.

Idea: behavioral patterns in history can help intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com.

65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch

[Figure: reasoning with feature-enriched MF over lexical features ("photo", "check", "tell", "send"), behavior history (camera, chrome, email), and intended apps. Training dialogues include "take this photo" → CAMERA, "tell vivian this is me in the lab" → IM, "check my grades on websites" → CHROME, "send an email to professor" → EMAIL; for the test dialogue "take a photo of this" / "send it to alice", the model scores CAMERA and IM highly (e.g. .85, .95, .80) against alternatives (e.g. .70, .55).]

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com.
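One way to read "behavioral patterns as features" is that the previously launched app becomes an extra column next to the lexical ones. The feature layout and names below are hypothetical, not the paper's exact design:

```python
# Invented vocabulary and app inventory for the sketch.
vocab = ["send", "photo"]
apps = ["CAMERA", "IM", "EMAIL"]

def featurize(utterance, prev_app):
    """Lexical indicator features plus a one-hot of the previous app."""
    lexical = [1 if w in utterance else 0 for w in vocab]
    history = [1 if prev_app == a else 0 for a in apps]
    return lexical + history

# "send it to alice" right after using the camera: the history feature is
# what lets a model prefer IM over EMAIL in this context.
x = featurize("send it to alice", "CAMERA")
print(x)  # → [1, 0, 1, 0, 0]
```

The same utterance after browsing a website would produce a different history slice, so the learned model can personalize to app-level context.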

66

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP); LM-based IR model (unsupervised):

  Feature Matrix    ASR: LM / MF-SLU   Transcripts: LM / MF-SLU
  Word Observation  25.1 / –           26.1 / –

Multi-Turn Interaction, Mean Average Precision (MAP); Multinomial Logistic Regression (supervised):

  Feature Matrix    ASR: MLR / MF-SLU  Transcripts: MLR / MF-SLU
  Word Observation  52.1 / –           55.5 / –

67

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):

  Feature Matrix    ASR: LM / MF-SLU       Transcripts: LM / MF-SLU
  Word Observation  25.1 / 29.2 (+16.2%)   26.1 / 30.4 (+16.4%)

Multi-Turn Interaction, Mean Average Precision (MAP):

  Feature Matrix    ASR: MLR / MF-SLU      Transcripts: MLR / MF-SLU
  Word Observation  52.1 / 52.7 (+1.2%)    55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.

68

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):

  Feature Matrix                         ASR: LM / MF-SLU       Transcripts: LM / MF-SLU
  Word Observation                       25.1 / 29.2 (+16.2%)   26.1 / 30.4 (+16.4%)
  Word + Embedding-Based Semantics       32.0 / –               33.3 / –
  Word + Type-Embedding-Based Semantics  31.5 / –               32.9 / –

Multi-Turn Interaction, Mean Average Precision (MAP):

  Feature Matrix              ASR: MLR / MF-SLU      Transcripts: MLR / MF-SLU
  Word Observation            52.1 / 52.7 (+1.2%)    55.5 / 55.4 (-0.2%)
  Word + Behavioral Patterns  53.9 / –               56.6 / –

Semantic enrichment provides rich cues to improve performance.

69

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):

  Feature Matrix                         ASR: LM / MF-SLU       Transcripts: LM / MF-SLU
  Word Observation                       25.1 / 29.2 (+16.2%)   26.1 / 30.4 (+16.4%)
  Word + Embedding-Based Semantics       32.0 / 34.2 (+6.8%)    33.3 / 33.3 (-0.2%)
  Word + Type-Embedding-Based Semantics  31.5 / 32.2 (+2.1%)    32.9 / 34.0 (+3.4%)

Multi-Turn Interaction, Mean Average Precision (MAP):

  Feature Matrix              ASR: MLR / MF-SLU      Transcripts: MLR / MF-SLU
  Word Observation            52.1 / 52.7 (+1.2%)    55.5 / 55.4 (-0.2%)
  Word + Behavioral Patterns  53.9 / 55.7 (+3.3%)    56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.

70

Contributions of Intent Prediction

[Figure: the flowchart with the Intent Prediction stage highlighted.]

Feature-enriched MF-SLU for Intent Prediction is able to:
 1) unify the knowledge at different levels,
 2) learn inference relations between various features,
 3) and create personalized models by leveraging contextual behaviors.

71

Personal Intelligent Architecture

[Figure: Reactive Assistance (ASR, LU, Dialog, LG, TTS) and Proactive Assistance (inferences, user modeling, suggestions) sit over back-end data (databases, services, and client signals) and are delivered through device/service end-points (phone, PC, Xbox, web browser, messaging apps). User experience: "call taxi".]

72

Outline

 Intelligent Assistant: What are they? Why do we need them? Why do companies care?
 Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions
 Semantic Decoding
 Intent Prediction
 Conclusions & Future Work

73

Conclusions

The work shows the feasibility of, and the potential for, improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:
 better semantic representations for individual utterances;
 better high-level intent prediction about follow-up behaviors.

74

Future Work

Apply the proposed technology to domain discovery: domains not covered by the current systems but of interest to users can guide which domains are developed next.

Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition, both of which flow into SLU modeling.

75

Towards Unsupervised Deep Learning

[Figure: a convolutional architecture: word sequence x → word vectors l_w → convolutional layer l_c (convolution matrix W_c) → pooling → utterance vector l_f → knowledge graph propagation layer l_p (propagation matrix W_p) → semantic layer y (semantic projection matrix W_s); slot vectors for candidates S1 … Sn yield semantic relation scores R(U, S1) … R(U, Sn) and posterior probabilities P(S1 | U) … P(Sn | U).]

Treating MF as a one-layer neural net, we can add more layers to the model, towards unsupervised deep learning.

76

Take Home Message

Big data is available without annotations; the challenge is how to acquire and organize important knowledge and further utilize it for applications.

Language understanding for AI maps language to action: understand voice to control music, lights, etc.; teach the assistant to let friends in by face recognition, etc.

Unsupervised or weakly-supervised methods will be the future trend.

Deep language understanding is an emerging field.

77

Q & A — THANKS FOR YOUR ATTENTION

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

  • Statistical Learning from Dialogues for Intelligence Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants
  • Why do we need them
  • Why do we need them (2)
  • Why do companies care
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymaxrsquos intelligence
  • SDS Architecture
  • Interaction Example
  • SDS Process ndash Available Domain Ontology
  • SDS Process ndash Available Domain Ontology (2)
  • SDS Process ndash Available Domain Ontology (3)
  • SDS Process ndash Spoken Language Understanding (SLU)
  • SDS Process ndash Spoken Language Understanding (SLU) (2)
  • SDS Process ndash Dialogue Management (DM)
  • SDS Process ndash Dialogue Management (DM) (2)
  • SDS Process ndash Dialogue Management (DM) (3)
  • SDS Process ndash Dialogue Management (DM) (4)
  • SDS Process ndash Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture ndash Contributions
  • SDS Flowchart
  • SDS Flowchart ndash Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLPrsquo15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a]
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
  • 1st Issue How to adapt generic slots to a domain-specific sett
  • Semantic Decoding [ACL-IJCNLPrsquo15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement SlotWord Embeddings Training (Levy and
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLPrsquo15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue How to model the unobserved hidden semantics Matrix
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLPrsquo15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding Quality of Semantics Estimatio
  • Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
  • Experiments of Semantic Decoding Effectiveness of Relations
  • Experiments for Structure Learning Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart ndash Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLTrsquo14c]
  • Intent Prediction ndash Single-Turn Request
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q amp A
Page 29: Statistical Learning from Dialogues for Intelligent Assistants

29

ContributionsUser

Ontology Induction

Structure Learning

Surface Form Derivation

Semantic Decoding

Intent Prediction

find a cheap eating place for taiwanese food

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

30

Ontology Induction Structure Learning Surface Form Derivation

Semantic Decoding Intent Prediction

Contributions

User

Knowledge Acquisition SLU Modeling

find a cheap eating place for taiwanese food


31

Knowledge Acquisition: 1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

[Figure: restaurant-asking conversations (an unlabelled collection) are turned into organized domain knowledge: a slot graph over target, food, price, seeking, and quantity connected by PREP_FOR, NN, and AMOD dependency relations.]

Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation


32

SLU Modeling: 2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

Organized Domain

Knowledge

price="cheap", target="restaurant", intent=navigation

SLU Modeling

SLU Component

"can i have a cheap restaurant"

SLU Modeling: Semantic Decoding, Intent Prediction


33

SDS Architecture – Contributions

ASR → SLU → DM → NLG (Domain)

NLG

Knowledge Acquisition SLU Modeling

current bottleneck


34

SDS Flowchart

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling


35

SDS Flowchart – Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling


36

Outline: Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


37

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances. Output: semantic concepts included in each individual utterance.

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[Figure: "can I have a cheap restaurant" → SLU Model → target="restaurant", price="cheap". Frame-semantic parsing over an unlabeled collection drives Ontology Induction (feature matrices Fw, Fs for the Feature Model) and, via lexical and semantic knowledge graphs, Structure Learning (relation matrices Rw, Rs for the Knowledge Graph Propagation Model); MF-SLU performs SLU modeling by matrix factorization to produce the semantic representation.]


38

Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory; words/phrases can be represented as frames. In "low fat milk", "milk" evokes the "food" frame and "low fat" fills the descriptor frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantic parser trained on manually annotated FrameNet sentences.


39

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

"can i have a cheap restaurant": "can" evokes the capability frame, "cheap" the expensiveness frame, and "restaurant" the locale_by_use frame; each frame is a slot candidate, but only expensiveness and locale_by_use are good domain-specific candidates here.

1st Issue: differentiate domain-specific frames from generic frames for SDSs.

Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.


40

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

[Figure: frame-semantic parsing builds a word-observation / slot-candidate matrix over the training utterances ("i would like a cheap restaurant", "find a restaurant with chinese food") and the test utterance ("show me a list of cheap restaurants"); slot candidates include expensiveness, locale_by_use, and food, with estimated weights such as .97 and .95.]

Idea: increase the weights of domain-specific slots and decrease the weights of the others.


41

1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.

[Figure: the word-observation / slot-candidate matrix for the training utterances ("i would like a cheap restaurant", "find a restaurant with chinese food") and the test utterance ("show me a list of cheap restaurants") is multiplied by a word relation matrix and a slot relation matrix; slot candidates include capability, locale_by_use, expensiveness, food, seeking, relational_quantity, and desiring.]

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.


42

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances. Output: semantic concepts included in each individual utterance.

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[Figure: "can I have a cheap restaurant" → SLU Model → target="restaurant", price="cheap". Frame-semantic parsing over an unlabeled collection drives Ontology Induction (feature matrices Fw, Fs for the Feature Model) and, via lexical and semantic knowledge graphs, Structure Learning (relation matrices Rw, Rs for the Knowledge Graph Propagation Model); MF-SLU performs SLU modeling by matrix factorization to produce the semantic representation.]


43

Knowledge Graph Construction: syntactic dependency parsing on utterances.

[Figure: dependency parse of "can i have a cheap restaurant" (ccomp, nsubj, dobj, det, amod), where "can" evokes capability, "cheap" expensiveness, and "restaurant" locale_by_use; the parse yields a word-based lexical knowledge graph over the words and a slot-based semantic knowledge graph over capability, locale_by_use, and expensiveness.]
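The dependency-to-graph step can be sketched as follows. The triples are hand-written stand-ins for parser output (the slides use a syntactic dependency parser), so every name below is illustrative only.

```python
from collections import defaultdict

def build_lexical_kg(dep_triples):
    """Accumulate a word-based lexical knowledge graph from
    (head, relation, dependent) dependency triples."""
    graph = defaultdict(lambda: defaultdict(int))
    for head, rel, dep in dep_triples:
        graph[head][(rel, dep)] += 1
    return {word: dict(edges) for word, edges in graph.items()}

# Hand-written parse of "can i have a cheap restaurant" (illustrative only).
triples = [
    ("have", "ccomp", "can"),
    ("have", "nsubj", "i"),
    ("have", "dobj", "restaurant"),
    ("restaurant", "det", "a"),
    ("restaurant", "amod", "cheap"),
]
kg = build_lexical_kg(triples)
```

Counting repeated triples over a whole corpus (rather than one utterance) is what makes frequent dependencies stand out as strong edges.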


44

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

[Figure: dependency-based word embeddings (e.g., for "can", "have") and dependency-based slot embeddings (e.g., for expensiveness, capability) are trained from the dependency-parsed utterance "can i have a cheap restaurant".]

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.


45

Edge Weight Measurement: compute edge weights to represent relation importance.

Slot-to-slot semantic relation: similarity between slot embeddings. Slot-to-slot dependency relation: dependency score between slot embeddings. Word-to-word semantic relation: similarity between word embeddings. Word-to-word dependency relation: dependency score between word embeddings.

[Figure: a knowledge graph whose word nodes (w1–w7) and slot nodes (s1–s3) are linked by the relations above.]
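One way to realize the similarity-based edge weights is cosine similarity between embeddings. The vectors below are toy stand-ins for trained dependency-based slot embeddings, not values from the slides.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity, used here as a semantic edge weight."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy vectors standing in for trained dependency-based slot embeddings.
emb = {
    "expensiveness": np.array([0.9, 0.1, 0.0, 0.2]),
    "locale_by_use": np.array([0.8, 0.2, 0.1, 0.3]),
    "capability":    np.array([0.0, 0.9, 0.8, 0.1]),
}
# Two domain-specific slots end up with a heavier edge than a generic one.
w_semantic = cosine(emb["expensiveness"], emb["locale_by_use"])
```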


46

Knowledge Graph Propagation Model

[Figure: the training/test word-observation and slot-candidate matrix is multiplied by the word relation matrix Rw^(SD) and the slot relation matrix Rs^(SD) for slot induction.]

Structure information is integrated to make the self-training data more reliable.
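A minimal sketch of the score-propagation idea: multiplying observed slot scores by a row-normalized relation matrix raises well-connected domain-specific slots and suppresses weakly connected generic ones. The matrix values are invented for illustration.

```python
import numpy as np

# Row-normalized slot relation matrix: expensiveness and locale_by_use are
# strongly connected; capability is mostly isolated (values are invented).
R = np.array([
    [0.6, 0.4, 0.0],   # expensiveness
    [0.4, 0.6, 0.0],   # locale_by_use
    [0.1, 0.1, 0.8],   # capability
])
scores = np.array([1.0, 1.0, 0.2])  # observed slot scores for one utterance

# One propagation step: neighbors reinforce the domain-specific slots.
propagated = scores @ R
```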


47

Semantic Decoding [ACL-IJCNLP'15]

[Figure: Ontology Induction supplies the feature matrices (Fw, Fs) over word observations and slot candidates for the training utterances ("i would like a cheap restaurant", "find a restaurant with chinese food") and the test utterance "show me a list of cheap restaurants", which contains hidden semantics; estimated slot probabilities include .97 and .90.]

2nd Issue: unobserved semantics may benefit understanding.


48

Reasoning with Matrix Factorization: Feature Model + Knowledge Graph Propagation Model

[Figure: the word-observation / slot-candidate matrix is combined with the word relation matrix Rw^(SD) and the slot relation matrix Rs^(SD); MF fills in cell probabilities such as .97, .90, .95, .85, .93, .92, .98, and .05.]

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.


49

2nd Issue: How to model the unobserved hidden semantics? Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and words/slots, respectively; the product of the two matrices fills in the probabilities of the hidden semantics. The |U| × (|W|+|S|) observation matrix is approximated by the product of a |U| × d matrix and a d × (|W|+|S|) matrix, where d is the latent dimension.

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
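The low-rank completion idea can be sketched with plain SGD on a tiny binary matrix. This is an illustrative stand-in, not the model from the paper (which ranks with BPR rather than minimizing squared error).

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny binary utterance x (word+slot) matrix; the cell at (2, 3) is the
# "hidden semantics" we want the low-rank model to recover.
M = np.array([
    [1.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 0.0, 1.0, 0.0],   # last cell treated as unobserved
])
d = 2                        # latent dimension
U = rng.normal(0.0, 0.1, (3, d))
V = rng.normal(0.0, 0.1, (4, d))
observed = [(i, j) for i in range(3) for j in range(4) if (i, j) != (2, 3)]

for _ in range(2000):        # plain SGD on squared error over observed cells
    for i, j in observed:
        err = M[i, j] - U[i] @ V[j]
        U[i], V[j] = U[i] + 0.05 * err * V[j], V[j] + 0.05 * err * U[i]

estimate = U[2] @ V[3]       # recovered score for the unobserved cell
```

Because row 2 matches row 0 on every observed column, the rank-2 factorization pushes the missing cell toward the value row 0 has there.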


50

Bayesian Personalized Ranking for MF: model implicit feedback.

Do not treat unobserved facts as negative samples (true or false); instead, give observed facts higher scores than unobserved facts.

Objective: for each utterance u_x, maximize Σ ln σ(f⁺ − f⁻) over pairs where f⁺ ranges over observed facts and f⁻ over unobserved facts. The objective is to learn a set of well-ranked semantic slots per utterance.
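The BPR-style pairwise update can be sketched on a single pair of scores. Real training updates latent factors, so this scalar version only shows the direction of the gradient of ln σ(f⁺ − f⁻).

```python
import math

def bpr_step(f_pos, f_neg, lr=0.1):
    """Ascend ln sigma(f_pos - f_neg): push an observed fact's score
    above an unobserved fact's score (scalar illustration)."""
    g = 1.0 / (1.0 + math.exp(f_pos - f_neg))  # gradient of ln sigma
    return f_pos + lr * g, f_neg - lr * g

f_pos = f_neg = 0.0
for _ in range(200):
    f_pos, f_neg = bpr_step(f_pos, f_neg)
```

As the gap f⁺ − f⁻ widens, the sigmoid gradient shrinks, so well-ranked pairs stop moving; that is exactly why BPR cares about ranking rather than absolute truth values.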


51

Matrix Factorization SLU (MF-SLU)

[Figure: Ontology Induction and Structure Learning feed the SLU matrix (Fw, Fs, relation matrices); for the test utterance "show me a list of cheap restaurants", MF estimates slot probabilities such as .97 and .90.]

MF-SLU can estimate probabilities for slot candidates given test utterances.


52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances. Output: semantic concepts included in each individual utterance.

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[Figure: "can I have a cheap restaurant" → SLU Model → target="restaurant", price="cheap". Frame-semantic parsing over an unlabeled collection drives Ontology Induction (feature matrices Fw, Fs for the Feature Model) and, via lexical and semantic knowledge graphs, Structure Learning (relation matrices Rw, Rs for the Knowledge Graph Propagation Model); MF-SLU performs SLU modeling by matrix factorization to produce the semantic representation.]

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).


53

Experimental Setup. Dataset: Cambridge University SLU Corpus.

Restaurant recommendation (WER = 37%); 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, pricerange, task, type.

Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots.

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
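The MAP metric used throughout the experiments can be computed as the mean of per-utterance average precision. The rankings and gold slots below are hypothetical, not from the corpus.

```python
def average_precision(ranked, relevant):
    """Average precision for one utterance's ranked slot list."""
    hits, total = 0, 0.0
    for k, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            total += hits / k
    return total / max(len(relevant), 1)

def mean_average_precision(rankings, gold):
    return sum(average_precision(r, g)
               for r, g in zip(rankings, gold)) / len(rankings)

# Hypothetical slot rankings for two utterances against reference slots.
rankings = [["food", "pricerange", "area"], ["area", "food", "task"]]
gold = [{"food", "area"}, {"food"}]
map_score = mean_average_precision(rankings, gold)
```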


54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus. Metric: MAP of all estimated slot probabilities for all utterances.

Approach (MAP, ASR / Transcripts):
Baseline SLU: Support Vector Machine: 32.5 / 36.6
Baseline SLU: Multinomial Logistic Regression: 34.0 / 38.8


55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus. Metric: MAP of all estimated slot probabilities for all utterances.

Approach (MAP, ASR / Transcripts):
Baseline SLU: Support Vector Machine: 32.5 / 36.6
Baseline SLU: Multinomial Logistic Regression: 34.0 / 38.8
Proposed MF-SLU: Feature Model: 37.6 / 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation: 43.5 (+27.9%) / 53.4 (+37.6%)

The MF-SLU effectively models implicit information to decode semantics, and the structure information further improves the results. The result is significantly better than the MLR baseline (p < 0.05, t-test).


56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus. Metric: MAP of all estimated slot probabilities for all utterances.

Approach (MAP, ASR / Transcripts):
Feature Model: 37.6 / 45.3
Feature + Knowledge Graph Propagation (Semantic relations): 41.4 / 51.6
Feature + Knowledge Graph Propagation (Dependency relations): 41.6 / 49.0
Feature + Knowledge Graph Propagation (All): 43.5 (+15.7%) / 53.4 (+17.9%)

In the integrated structure information, both semantic and dependency relations are useful for understanding. The result is significantly better than the MLR baseline (p < 0.05, t-test).


57

Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs.

[Figure: the induced ontology (locale_by_use, food, expensiveness, seeking, relational_quantity, desiring linked by PREP_FOR, NN, AMOD, and DOBJ relations) compared against the reference ontology (type, food, pricerange, area, task linked by DOBJ, AMOD, and PREP_IN), annotated with the most frequent syntactic dependencies.]

The automatically learned domain ontology aligns well with the reference one. The data-driven one is more objective, while the expert-annotated one is more subjective.

58

Contributions of Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.


59

Low- and High-Level Understanding: semantic concepts for individual utterances do not consider high-level semantics (user intents).

The follow-up behaviors usually correspond to user intents

price="cheap", target="restaurant"

SLU Model

"can i have a cheap restaurant"

intent=navigation

restaurant="legume", time="tonight"

SLU Model

"i plan to dine in legume tonight"

intent=reservation


60

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart – Intent Prediction


61

Outline: Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input spoken utterances for making requests about launching an app

Output the apps supporting the required functionality

Intent Identification: popular domains in Google Play

"please dial a phone call to alex" → Skype, Hangout, etc.

Intent Prediction of Mobile Apps [SLT'14c]

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.


63

Intent Prediction – Single-Turn Request

Input: single-turn request. Output: apps that are able to support the required functionality.

[Figure: for the utterance "i would like to contact alex", IR over app descriptions (e.g., Outlook: "your email calendar contacts"; Gmail: "check and send emails msgs") collects app candidates; feature enrichment adds the semantic class "communication" (score .90), and reasoning with feature-enriched MF scores intended apps such as Gmail, Outlook, and Skype.]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
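A rough sketch of the retrieval step that collects app candidates for a request: a unigram-overlap score stands in for the LM-based IR model, and the app descriptions below are invented for the example, not real store listings.

```python
from collections import Counter

def overlap(query, doc):
    """Unigram overlap: a crude stand-in for the LM-based retrieval score."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum(min(q[w], d[w]) for w in q)

# Invented app descriptions (not from any real store listing).
apps = {
    "Outlook": "manage your email calendar and contact list",
    "Gmail": "check and send emails and messages",
    "Camera": "take photos and videos",
}
query = "i would like to contact alex"
best = max(apps, key=lambda name: overlap(query, apps[name]))
```

In the full model these retrieved candidates only seed the matrix; the MF step then infers apps whose descriptions share no surface words with the request.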


64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction. Output: apps the user plans to launch.

Challenge: language ambiguity. 1) User preference; 2) app-level contexts.

[Figure: for "send to vivian", the previous turn's app context disambiguates between Email and Message apps in the Communication category.]

Idea: behavioral patterns in history can help intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com.


65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction. Output: apps the user plans to launch.

[Figure: training dialogues pair utterances with intended apps ("take this photo" → CAMERA, "tell vivian this is me in the lab" → IM; "check my grades on websites" → CHROME, "send an email to professor" → EMAIL); lexical features and behavior history (null, camera, chrome, email) feed reasoning with feature-enriched MF, which scores intended apps for the test dialogue "take a photo of this" / "send it to alice" (e.g., .85, .95, .70).]

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com.

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
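The behavioral-pattern idea can be sketched as app-transition counts over multi-turn sessions. The sessions mirror the slide's examples, but this counting model is an illustrative simplification of the feature-enriched MF.

```python
from collections import Counter, defaultdict

def train_transitions(sessions):
    """Count app-to-app transitions across multi-turn sessions."""
    trans = defaultdict(Counter)
    for session in sessions:
        for prev, nxt in zip(session, session[1:]):
            trans[prev][nxt] += 1
    return trans

# Sessions mirroring the slide's examples (CAMERA -> IM, CHROME -> EMAIL).
sessions = [["CAMERA", "IM"], ["CHROME", "EMAIL"], ["CAMERA", "IM"]]
trans = train_transitions(sessions)
next_app = trans["CAMERA"].most_common(1)[0][0]  # prior for the next turn
```

Such transition counts act as a context prior: after an ambiguous request like "send it to alice", the previous app (CAMERA) tips the prediction toward IM.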


66

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP, ASR / Transcripts), with the LM-based IR model (unsupervised) as baseline:
Word Observation: LM 25.1 / 26.1

Multi-Turn Interaction, MAP (ASR / Transcripts), with Multinomial Logistic Regression (supervised) as baseline:
Word Observation: MLR 52.1 / 55.5


67

Experiments for Intent Prediction

Single-Turn Request, MAP (ASR: LM / MF-SLU; Transcripts: LM / MF-SLU):
Word Observation: 25.1 / 29.2 (+16.2%); 26.1 / 30.4 (+16.4%)

Multi-Turn Interaction, MAP (ASR: MLR / MF-SLU; Transcripts: MLR / MF-SLU):
Word Observation: 52.1 / 52.7 (+1.2%); 55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.


68

Experiments for Intent Prediction

Single-Turn Request, MAP (ASR: LM / MF-SLU; Transcripts: LM / MF-SLU):
Word Observation: 25.1 / 29.2 (+16.2%); 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics: LM 32.0 / 33.3
Word + Type-Embedding-Based Semantics: LM 31.5 / 32.9

Multi-Turn Interaction, MAP (ASR: MLR / MF-SLU; Transcripts: MLR / MF-SLU):
Word Observation: 52.1 / 52.7 (+1.2%); 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns: MLR 53.9 / 56.6

Semantic enrichment provides rich cues to improve performance.


69

Experiments for Intent Prediction

Single-Turn Request, MAP (ASR: LM / MF-SLU; Transcripts: LM / MF-SLU):
Word Observation: 25.1 / 29.2 (+16.2%); 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics: 32.0 / 34.2 (+6.8%); 33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics: 31.5 / 32.2 (+2.1%); 32.9 / 34.0 (+3.4%)

Multi-Turn Interaction, MAP (ASR: MLR / MF-SLU; Transcripts: MLR / MF-SLU):
Word Observation: 52.1 / 52.7 (+1.2%); 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns: 53.9 / 55.7 (+3.3%); 56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.


70

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction: the Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.


71

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Back-end Data Bases, Services, and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"


72

Outline: Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


73

Conclusions: this work shows the feasibility of and the potential for improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors


74

Future Work: apply the proposed technology to domain discovery, i.e., domains not covered by the current systems but that users are interested in, to guide which domains to develop next.

Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition both propagate into SLU modeling.


75

Towards Unsupervised Deep Learning

[Figure: a network maps a word sequence x = w1, w2, …, wd through word vectors l_w, a convolutional layer l_c (convolution matrix W_c), a pooling operation, an utterance vector l_f, a knowledge graph propagation layer l_p (matrix W_p), and a semantic projection matrix W_s to a semantic layer y, producing posterior probabilities P(S1 | U), P(S2 | U), …, P(Sn | U) from semantic relation scores R(U, S1), …, R(U, Sn) between the utterance U and the slot candidates.]


Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.

76

Take Home Message: big data is available without annotations.

Challenge: how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI: language → action, e.g., understand voice commands to control music, lights, etc., or teach the system to let friends in by face recognition.


Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A – Thanks for your attention!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.


  • Statistical Learning from Dialogues for Intelligence Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants
  • Why do we need them
  • Why do we need them (2)
  • Why do companies care
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymaxrsquos intelligence
  • SDS Architecture
  • Interaction Example
  • SDS Process ndash Available Domain Ontology
  • SDS Process ndash Available Domain Ontology (2)
  • SDS Process ndash Available Domain Ontology (3)
  • SDS Process ndash Spoken Language Understanding (SLU)
  • SDS Process ndash Spoken Language Understanding (SLU) (2)
  • SDS Process ndash Dialogue Management (DM)
  • SDS Process ndash Dialogue Management (DM) (2)
  • SDS Process ndash Dialogue Management (DM) (3)
  • SDS Process ndash Dialogue Management (DM) (4)
  • SDS Process ndash Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture ndash Contributions
  • SDS Flowchart
  • SDS Flowchart ndash Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLPrsquo15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a]
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
  • 1st Issue How to adapt generic slots to a domain-specific sett
  • Semantic Decoding [ACL-IJCNLPrsquo15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement SlotWord Embeddings Training (Levy and
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLPrsquo15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue How to model the unobserved hidden semantics Matrix
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLPrsquo15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding Quality of Semantics Estimatio
  • Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
  • Experiments of Semantic Decoding Effectiveness of Relations
  • Experiments for Structure Learning Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart ndash Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLTrsquo14c]
  • Intent Prediction ndash Single-Turn Request
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q amp A
Page 30: Statistical Learning from Dialogues for Intelligent Assistants

30

Ontology Induction Structure Learning Surface Form Derivation

Semantic Decoding Intent Prediction

ContributionsUser

Knowledge Acquisition SLU Modeling

find a cheap eating place for taiwanese food

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

31

Knowledge Acquisition1) Given unlabelled conversations how can a system automatically

induce and organize domain-specific concepts

Restaurant Asking

Conversations

target

foodprice

seeking

quantity

PREP_FOR

PREP_FOR

NN AMOD

AMODAMOD

Organized Domain Knowledge

Unlabelled Collection

Knowledge Acquisition

Knowledge Acquisition Ontology Induction Structure Learning Surface Form Derivation

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

32

SLU Modeling2) With the automatically acquired knowledge how can a system

understand utterance semantics and user intents

Organized Domain

Knowledge

price=ldquocheaprdquo target=ldquorestaurantrdquointent=navigation

SLU Modeling

SLU Component

ldquocan i have a cheap restaurantrdquo

SLU Modeling Semantic Decoding Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

33

SDS Architecture ndash Contributions

DomainDMASR SLU

NLG

Knowledge Acquisition SLU Modeling

current bottleneck

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

34

SDS Flowchart

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

35

SDS Flowchart ndash Semantic Decoding

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

36

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

37

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances. Output: semantic concepts included in each individual utterance.

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[Figure: MF-SLU pipeline. Frame-semantic parsing on an unlabeled collection feeds Ontology Induction (feature matrices Fw, Fs for the Feature Model) and Structure Learning over a lexical KG and a semantic KG (relation matrices Rw, Rs for the Knowledge Graph Propagation Model: Word Relation Model and Slot Relation Model); MF-SLU (SLU modeling by matrix factorization) produces the semantic representation, e.g., "can I have a cheap restaurant" → target="restaurant", price="cheap".]

38

Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistically semantic resource based on the frame-semantics theory; words/phrases can be represented as frames. "low fat milk": "milk" evokes the "food" frame, and "low fat" fills the descriptor frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences.

39

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

"can i have a cheap restaurant" → Frame: capability; Frame: expensiveness; Frame: locale_by_use (slot candidates)

1st Issue: differentiate domain-specific frames (e.g., the good candidates expensiveness and locale_by_use) from generic frames for SDSs.

Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.

40

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

[Figure: word-by-slot matrix from frame-semantic parsing. Train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" have observed words (cheap, restaurant, food) and slot candidates (expensiveness, locale_by_use, food); the test utterance "show me a list of cheap restaurants" receives estimated slot scores (e.g., .97, .95).]

Idea: increase weights of domain-specific slots and decrease weights of others.

41

1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.

[Figure: for slot induction, the word-by-slot observation matrix (train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food", test utterance "show me a list of cheap restaurants") is multiplied by a word relation matrix and a slot relation matrix; knowledge graph nodes include capability, locale_by_use, food, expensiveness, seeking, desiring, and relational_quantity.]

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
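The propagation step can be sketched as multiplying a score vector by a row-normalized relation matrix; the relation weights, slot names, and normalization below are illustrative assumptions, not values from the corpus.

```python
import numpy as np

# Toy knowledge graph over 4 slot candidates:
# 0: locale_by_use, 1: expensiveness, 2: food, 3: capability (generic)
# R[i, j] = illustrative edge weight between slot i and slot j.
R = np.array([
    [1.0, 0.8, 0.7, 0.1],
    [0.8, 1.0, 0.6, 0.1],
    [0.7, 0.6, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
])

# Row-normalize so each node distributes its score to its neighbors.
R_norm = R / R.sum(axis=1, keepdims=True)

# Initial observation: every candidate fires once in the utterance.
scores = np.ones(4)

# One propagation step: the densely inter-connected domain-specific
# slots end up with higher scores than the weakly connected generic one.
scores = R_norm.T @ scores
assert all(scores[i] > scores[3] for i in range(3))
```

One multiplication already separates the domain-specific slots from the generic one; the full model folds such relation matrices into the factorized matrix instead of iterating them explicitly.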

42

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances. Output: semantic concepts included in each individual utterance.

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[Figure: MF-SLU pipeline. Frame-semantic parsing on an unlabeled collection feeds Ontology Induction (Fw, Fs for the Feature Model) and Structure Learning over a lexical KG and a semantic KG (Rw, Rs for the Knowledge Graph Propagation Model); MF-SLU (SLU modeling by matrix factorization) produces the semantic representation for "can I have a cheap restaurant" → target="restaurant", price="cheap".]

43

Knowledge Graph Construction: syntactic dependency parsing on utterances

[Figure: dependency parse of "can i have a cheap restaurant" (nsubj, ccomp, det, amod, dobj) with evoked frames capability, expensiveness, and locale_by_use; a word-based lexical knowledge graph connects the words, and a slot-based semantic knowledge graph connects the slots.]

44

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

[Figure: dependency-based word embeddings (e.g., for "can" and "have") and dependency-based slot embeddings (e.g., for expensiveness and capability) are trained from the dependency-parsed utterance "can i have a cheap restaurant".]

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.

45

Edge Weight Measurement: compute edge weights to represent relation importance.

Slot-to-slot semantic relation: similarity between slot embeddings. Slot-to-slot dependency relation: dependency score between slot embeddings. Word-to-word semantic relation: similarity between word embeddings. Word-to-word dependency relation: dependency score between word embeddings.

[Figure: a graph with word nodes w1–w7 and slot nodes s1–s3 connected by weighted edges combining both relation types.]
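A minimal sketch of turning embedding similarity into a semantic edge weight; the random embeddings and vocabulary here are illustrative assumptions, not the trained dependency-based embeddings.

```python
import numpy as np

def cosine(u, v):
    """Semantic-relation edge weight: cosine similarity of two embeddings."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Illustrative stand-ins for trained dependency-based word embeddings.
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in ["cheap", "expensive", "restaurant"]}

# Edge weight between two word nodes in the lexical knowledge graph.
w_cheap_expensive = cosine(emb["cheap"], emb["expensive"])
assert -1.0 <= w_cheap_expensive <= 1.0
```

The same function applied to slot embeddings gives the slot-to-slot semantic weights; the dependency-relation weights come from dependency scores instead of cosine similarity.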

46

Knowledge Graph Propagation Model: relation matrices R_w^(SD) and R_s^(SD)

[Figure: for slot induction, the word-by-slot matrix (word observations and slot candidates over train and test utterances) is multiplied by the word relation matrix R_w^(SD) and the slot relation matrix R_s^(SD).]

Structure information is integrated to make the self-training data more reliable.

47

Semantic Decoding [ACL-IJCNLP'15]

2nd Issue: unobserved semantics may benefit understanding.

[Figure: Ontology Induction (Fw, Fs) and Structure Learning feed the word-by-slot matrix over train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants"; the test utterance carries hidden semantics beyond its observed cells (estimated scores, e.g., .97, .90, .95, .85).]

48

Reasoning with Matrix Factorization

Feature Model + Knowledge Graph Propagation Model

[Figure: the word-by-slot matrix is multiplied by the relation matrices R_w^(SD) and R_s^(SD); MF fills the missing cells of train and test utterances with probabilities (e.g., .97, .90, .95, .85, .93, .92, .98, .05).]

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.

49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and words/slots, respectively. The product of the two matrices fills in the probability of the hidden semantics.

[Figure: the |U| × (|W|+|S|) observation matrix is approximated by the product of a |U| × d matrix and a d × (|W|+|S|) matrix, so every cell, observed or not, receives a score (e.g., .97, .90, .95, .85, .93, .92, .98, .05).]

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
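A toy sketch of the low-rank completion idea; the matrix values and rank are illustrative assumptions, and a truncated SVD stands in for the actual BPR-trained factorizer.

```python
import numpy as np

# Toy utterance-by-(word+slot) matrix: rows = utterances,
# columns = word observations and slot candidates (1 = observed).
M = np.array([
    [1, 0, 1, 1, 0],
    [0, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],   # test row: its slot columns are unobserved
], dtype=float)

# Rank-d factorization M ≈ U (|U| x d) @ V (d x (|W|+|S|)).
d = 2
u, s, vt = np.linalg.svd(M, full_matrices=False)
U = u[:, :d] * s[:d]
V = vt[:d, :]
M_hat = U @ V

# M_hat fills every cell, including unobserved ones, with a score
# that can be read as the strength of the hidden semantics.
print(M_hat.round(2))
```

Swapping the SVD for the BPR-trained factors below gives scores that are calibrated for ranking observed facts above unobserved ones rather than for squared-error reconstruction.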

50

Bayesian Personalized Ranking for MF: model implicit feedback

Do not treat unobserved facts as negative samples (true or false); give observed facts higher scores than unobserved facts.

Objective: for each utterance u_x, maximize Σ ln σ(f⁺ − f⁻) over pairs of an observed fact f⁺ and an unobserved fact f⁻.

The objective is to learn a set of well-ranked semantic slots per utterance.
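A toy sketch of a BPR-style SGD update for one (observed, unobserved) fact pair; the dimensionality, learning rate, regularization, and initialization are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
d = 8
u = rng.normal(scale=0.1, size=d)        # latent vector of utterance u_x
v_pos = rng.normal(scale=0.1, size=d)    # latent vector of observed fact f+
v_neg = rng.normal(scale=0.1, size=d)    # latent vector of unobserved fact f-
lr, reg = 0.05, 0.01

# SGD on the BPR objective: maximize ln sigma(f+ - f-), i.e.,
# push the observed fact's score above the unobserved fact's score.
for _ in range(200):
    diff = u @ v_pos - u @ v_neg
    g = sigmoid(-diff)                   # gradient scale of ln sigma(diff)
    u     += lr * (g * (v_pos - v_neg) - reg * u)
    v_pos += lr * (g * u - reg * v_pos)
    v_neg += lr * (-g * u - reg * v_neg)

assert u @ v_pos > u @ v_neg             # observed fact now ranks higher
```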

51

Matrix Factorization SLU (MF-SLU)

[Figure: Ontology Induction (Fw, Fs) and Structure Learning feed the word-by-slot matrix over train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food"; for the test utterance "show me a list of cheap restaurants", MF fills estimated probabilities (e.g., .97, .90, .95, .85).]

MF-SLU can estimate probabilities for slot candidates given test utterances.

52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances. Output: semantic concepts included in each individual utterance.

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[Figure: the full MF-SLU pipeline: frame-semantic parsing on an unlabeled collection, Ontology Induction (Fw, Fs), Structure Learning over lexical/semantic knowledge graphs (Rw, Rs), and SLU modeling by matrix factorization produce the semantic representation.]

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).

53

Experimental Setup

Dataset: Cambridge University SLU Corpus, restaurant recommendation (WER = 37%); 2166 dialogues, 15453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type.

Metric: MAP of all estimated slot probabilities over all utterances; a mapping table between induced and reference slots is used for evaluation.

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
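The MAP metric can be sketched as follows; the slot names and rankings are illustrative, not taken from the corpus.

```python
def average_precision(ranked, relevant):
    """AP of one utterance: slots ranked by estimated probability."""
    hits, score = 0, 0.0
    for k, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            score += hits / k
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(utterances):
    """Mean of per-utterance APs over all (ranking, reference-slots) pairs."""
    return sum(average_precision(r, rel) for r, rel in utterances) / len(utterances)

# Toy example: two utterances with slots ranked by estimated probability.
data = [
    (["pricerange", "food", "area"], {"pricerange", "food"}),  # AP = 1.0
    (["area", "food", "task"], {"food"}),                      # AP = 0.5
]
print(mean_average_precision(data))  # 0.75
```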

54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus. Metric: MAP of all estimated slot probabilities for all utterances.

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8

55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus. Metric: MAP of all estimated slot probabilities for all utterances.

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
Proposed MF-SLU: Feature Model | 37.6 | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)

The MF-SLU effectively models implicit information to decode semantics. The structure information further improves the results.

The result is significantly better than the MLR with p < 0.05 in a t-test.

56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus. Metric: MAP of all estimated slot probabilities for all utterances.

Approach | ASR | Transcripts
Feature Model | 37.6 | 45.3
Feature + Knowledge Graph Propagation: Semantic | 41.4 | 51.6
Feature + Knowledge Graph Propagation: Dependency | 41.6 | 49.0
Feature + Knowledge Graph Propagation: All | 43.5 (+15.7%) | 53.4 (+17.9%)

In the integrated structure information, both semantic and dependency relations are useful for understanding.

The result is significantly better than the MLR with p < 0.05 in a t-test.

Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs.

[Figure: the induced ontology links locale_by_use, food, expensiveness, seeking, desiring, and relational_quantity via syntactic dependencies (NN, AMOD, DOBJ, PREP_FOR); the reference ontology with the most frequent syntactic dependencies links type, food, pricerange, task, and area (AMOD, DOBJ, PREP_IN).]

The automatically learned domain ontology aligns well with the reference one.

57

The data-driven one is more objective, while the expert-annotated one is more subjective.

58

Contributions of Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge.

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, 3) and then allow systems to model implicit semantics for better understanding.

59

Low- and High-Level Understanding: semantic concepts for individual utterances do not consider high-level semantics (user intents).

The follow-up behaviors usually correspond to user intents

price="cheap", target="restaurant"

SLU Model

"can i have a cheap restaurant"

intent=navigation

restaurant="legume", time="tonight"

SLU Model

"i plan to dine in legume tonight"

intent=reservation

60

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart – Intent Prediction

61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app. Output: the apps supporting the required functionality.

Intent Identification: popular domains in Google Play

"please dial a phone call to alex" → Skype, Hangout, etc.

Intent Prediction of Mobile Apps [SLT'14c]

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.

63

Intent Prediction – Single-Turn Request

Input: single-turn request. Output: apps that are able to support the required functionality.

[Figure: feature-enriched matrix for reasoning with MF. Rows: app descriptions retrieved by IR as app candidates (e.g., Outlook: "... check and send emails, msgs ...", Gmail: "... your email, calendar, contacts ..."), self-train utterances, and the test utterance "i would like to contact alex". Columns: word observations (contact, message, email), enriched semantics (communication), and intended apps (Gmail, Outlook, Skype); MF fills estimated scores (e.g., .90, .85, .97, .95).]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.

64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction. Output: apps the user plans to launch.

Challenge: language ambiguity. 1) User preference; 2) App-level contexts.

[Figure: "send to vivian" could be Email or Message (Communication); the previous turn helps disambiguate.]

Idea: behavioral patterns in history can help intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction. Output: apps the user plans to launch.

[Figure: feature-enriched matrix for reasoning with MF. Train dialogues pair user utterances with intended apps ("take this photo" → CAMERA, "tell vivian this is me in the lab" → IM, "check my grades on website" → CHROME, "send an email to professor" → EMAIL); lexical features (photo, check, camera, tell, send, email, chrome) and behavior-history features (null, camera, chrome, email) form the columns. For the test dialogue ("take a photo of this" → CAMERA, "send it to alice" → IM), MF fills estimated scores (e.g., .85, .70, .95, .80, .55).]

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.

66

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / – | 26.1 / –

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / – | 55.5 / –

LM: LM-based IR model (unsupervised). MLR: multinomial logistic regression (supervised).

67

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.

68

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 / – | 33.3 / –
Word + Type-Embedding-Based Semantics | 31.5 / – | 32.9 / –

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 / – | 56.6 / –

Semantic enrichment provides rich cues to improve performance.

69

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix | ASR (LM / MF-SLU) | Transcripts (LM / MF-SLU)
Word Observation | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 / 34.2 (+6.8%) | 33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics | 31.5 / 32.2 (+2.1%) | 32.9 / 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix | ASR (MLR / MF-SLU) | Transcripts (MLR / MF-SLU)
Word Observation | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 / 55.7 (+3.3%) | 56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.

70

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction

Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, 3) and create personalized models by leveraging contextual behaviors.

71

Personal Intelligent Architecture

Reactive Assistance: ASR, LU, Dialog, LG, TTS

Proactive Assistance: Inferences, User Modeling, Suggestions

Data: Back-end Data Bases, Services, and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"

72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

73

Conclusions

The work shows the feasibility and the potential for improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.

74

Future Work

Apply the proposed technology to domain discovery: domains not covered by the current systems but that users are interested in, in order to guide the next developed domains.

Improve the proposed approach by handling the uncertainty: recognition errors from ASR in SLU modeling, and unreliable knowledge in knowledge acquisition.

75

Towards Unsupervised Deep Learning

[Figure: a network that maps the word sequence x = w1, w2, ..., wd through word vectors l_w, a convolutional layer l_c (convolution matrix Wc) with a pooling operation into an utterance vector l_f, a knowledge graph propagation layer l_p (propagation matrix Wp), and a semantic projection matrix Ws into a semantic layer y; slot candidates S1, ..., Sn have slot vectors l_f, and the model outputs semantic relations R(U, S1), ..., R(U, Sn) and posterior probabilities P(S1 | U), ..., P(Sn | U).]

Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.

76

Take Home Message

Available: big data w/o annotations. Challenge: how to acquire and organize important knowledge and further utilize it for applications.

Language understanding for AI: language → action, e.g., understand voice to control music, lights, etc.; teach the assistant to let friends in by face recognition, etc.

Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A. THANKS FOR YOUR ATTENTION!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.


31

Knowledge Acquisition: 1) Given unlabelled conversations, how can a system automatically induce and organize domain-specific concepts?

[Figure: restaurant-asking conversations (unlabelled collection) are turned by knowledge acquisition into organized domain knowledge: an ontology linking target, food, price, seeking, and quantity via syntactic dependencies (NN, AMOD, PREP_FOR).]

Unlabelled Collection → Knowledge Acquisition → Organized Domain Knowledge

Knowledge Acquisition: Ontology Induction, Structure Learning, Surface Form Derivation

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

32

SLU Modeling2) With the automatically acquired knowledge how can a system

understand utterance semantics and user intents

Organized Domain

Knowledge

price=ldquocheaprdquo target=ldquorestaurantrdquointent=navigation

SLU Modeling

SLU Component

ldquocan i have a cheap restaurantrdquo

SLU Modeling Semantic Decoding Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

33

SDS Architecture ndash Contributions

DomainDMASR SLU

NLG

Knowledge Acquisition SLU Modeling

current bottleneck

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

34

SDS Flowchart

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

35

SDS Flowchart ndash Semantic Decoding

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

36

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

37

Semantic Decoding [ACL-IJCNLPrsquo15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015

SLU Model

target=ldquorestaurantrdquoprice=ldquocheaprdquo

ldquocan I have a cheap restaurantrdquoFrame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology InductionFw Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

38

[Baker et al 1998 Das et al 2014]Frame-Semantic Parsing

FrameNet [Baker et al 1998] a linguistically semantic resource based on the frame-semantics theory wordsphrases can be represented as frames ldquolow fat milkrdquo ldquomilkrdquo evokes the ldquofoodrdquo frame

ldquolow fatrdquo fills the descriptor frame element

SEMAFOR [Das et al 2014] a state-of-the-art frame-semantics parser trained on manually annotated

FrameNet sentences

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

39

Ontology Induction [ASRUrsquo13 SLTrsquo14a]

can i have a cheap restaurant

Frame capability

Frame expensiveness

Frame locale by use

1st Issue differentiate domain-specific frames from generic frames for SDSs

GoodGood

Das et al Frame-semantic parsing in Proc of Computational Linguistics 2014

slot candidate

Best Student Paper Award

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

40

1

Utterance 1i would like a cheap restaurant Train

hellip hellip

hellip

cheap restaurant foodexpensiveness

1

locale_by_use

11

find a restaurant with chinese foodUtterance 2

1 1

food

1 1

1 Test

1 97 95

Frame Semantic Parsing

show me a list of cheap restaurantsTest Utterance

Word Observation Slot Candidate

Ontology Induction [ASRUrsquo13 SLTrsquo14a]Best Student Paper Award

Idea increase weights of domain-specific slots and decrease weights of others

1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.

[Matrix figure: the word-observation / slot-candidate matrix is multiplied by a word relation matrix (Word Relation Model) and a slot relation matrix (Slot Relation Model); the knowledge graph links words (i, like, cheap, restaurant, ...) and slots (capability, locale_by_use, food, expensiveness, seeking, relational_quantity, desiring)]

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
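The propagation step can be sketched as a toy matrix product (illustrative vocabulary and weights only):

```python
import numpy as np

# Rows = utterances, columns = words [cheap, restaurant, food] (toy vocab).
F = np.array([[1., 1., 0.],   # "... a cheap restaurant"
              [0., 1., 1.]])  # "... a restaurant with chinese food"

# Word relation matrix: entries connect neighboring words in the knowledge
# graph; domain-specific words are more densely connected (toy weights).
R_w = np.array([[1.0, 0.8, 0.0],
                [0.8, 1.0, 0.6],
                [0.0, 0.6, 1.0]])

# After multiplication, each word passes its score to its neighbors, so
# words related to observed domain words gain score.
F_prop = F @ R_w
```

Note how the first utterance, which never mentions "food", still receives a nonzero score for it through its neighbor "restaurant".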

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances. Output: semantic concepts included in each individual utterance.

[Framework figure: "can I have a cheap restaurant" → frame-semantic parsing over an unlabeled collection → ontology induction over a semantic KG (feature model: Fw, Fs) and structure learning over lexical/semantic KGs (knowledge graph propagation model: Rw, Rs) → MF-SLU (SLU modeling by matrix factorization) → semantic representation: target="restaurant", price="cheap"]

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

Knowledge Graph Construction: syntactic dependency parsing on utterances

"can i have a cheap restaurant" (frames: capability, expensiveness, locale_by_use; dependencies: nsubj, ccomp, det, amod, dobj)

Word-based lexical knowledge graph: nodes can, i, have, a, cheap, restaurant, connected by word-word edges.

Slot-based semantic knowledge graph: nodes capability, locale_by_use, expensiveness, connected by slot-slot edges.
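A small sketch of turning dependency parses into the word-based graph (the edge labels here are assumptions; the exact relations depend on the parser):

```python
# Hypothetical dependency edges (head, dependent, relation) for
# "can i have a cheap restaurant".
deps = [("have", "can", "aux"),
        ("have", "i", "nsubj"),
        ("have", "restaurant", "dobj"),
        ("restaurant", "a", "det"),
        ("restaurant", "cheap", "amod")]

# Word-based lexical knowledge graph as an adjacency map; edges are added
# in both directions so scores can propagate either way.
graph = {}
for head, dep, rel in deps:
    graph.setdefault(head, []).append((dep, rel))
    graph.setdefault(dep, []).append((head, rel))
```

The slot-based graph is built the same way, with slots standing in for the words they annotate.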

Slot/Word Embeddings Training for Edge Weight Measurement (Levy and Goldberg, 2014)

Dependency-based word embeddings: train vectors for words (e.g., can, have) from their dependency contexts in the parsed utterances.

Dependency-based slot embeddings: train vectors for slots (e.g., expensiveness, capability) the same way, treating slots as tokens in the parsed utterances.

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.

Edge Weight Measurement: compute edge weights to represent relation importance

- Word-to-word semantic relation: similarity between word embeddings
- Word-to-word dependency relation: dependency score between word embeddings
- Slot-to-slot semantic relation: similarity between slot embeddings
- Slot-to-slot dependency relation: dependency score between slot embeddings

[Graph figure: a word graph (w1-w7) and a slot graph (s1-s3) with edge weights combining the semantic and dependency relations]
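For the similarity-based weights, a standard choice is cosine similarity between embeddings (the vectors below are made up for illustration; the real ones are dependency-based, per Levy and Goldberg):

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy slot embeddings (hypothetical values).
emb = {
    "expensiveness": np.array([0.9, 0.1, 0.3]),
    "locale_by_use": np.array([0.7, 0.2, 0.4]),
}
edge_weight = cosine(emb["expensiveness"], emb["locale_by_use"])
```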

Knowledge Graph Propagation Model

[Matrix figure: the training matrix of word observations and induced slot candidates is multiplied by the word relation matrix R_w^{SD} and the slot relation matrix R_s^{SD} (S: semantic relations, D: dependency relations)]

Structure information is integrated to make the self-training data more reliable.

Semantic Decoding [ACL-IJCNLP'15]

[Matrix figure: ontology induction (Fw, Fs) and structure learning feed the SLU matrix; the test utterance "show me a list of cheap restaurants" carries hidden semantics that are not directly observed]

2nd Issue: unobserved semantics may benefit understanding.

Reasoning with Matrix Factorization

[Matrix figure: the feature model is combined with the knowledge graph propagation model (R_w^{SD}, R_s^{SD}); MF fills in scores (e.g., .97, .95, .93, ...) for unobserved cells of the slot-induction matrix]

Idea: MF completes a partially-missing matrix based on a low-rank latent-semantics assumption, which is able to model hidden semantics and is more robust to noisy data.
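A minimal sketch of the completion idea, using a truncated SVD as a stand-in for the learned MF factors (toy column labels and values):

```python
import numpy as np

# Rows = utterances; columns = [cheap, restaurant, expensiveness, locale_by_use].
# The test utterance (last row) has observed words but unobserved slots (zeros).
M = np.array([[1., 1., 1., 1.],   # train: "a cheap restaurant" + parsed slots
              [1., 0., 1., 0.],   # train: "cheap" evokes expensiveness
              [1., 1., 0., 0.]])  # test: slot cells hidden

# Low-rank reconstruction: the product of the factors fills in scores for
# the hidden slot cells of the test utterance.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
M_hat = np.outer(U[:, 0] * s[0], Vt[0])   # rank-1 approximation

slot_score = M_hat[2, 2]  # estimated "expensiveness" score for the test row
```

The hidden cell that was zero in M receives a positive score because the test row's word pattern resembles the training rows where the slot was observed.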

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The |U| × (|W| + |S|) matrix (utterances by words plus slots) is approximated by the product of a |U| × d matrix and a d × (|W| + |S|) matrix. The decomposed matrices represent latent semantics for utterances and words/slots, respectively; the product of the two matrices fills in the probability of the hidden semantics.

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.

Bayesian Personalized Ranking for MF: model implicit feedback

- do not treat unobserved facts as negative samples (true or false)
- give observed facts higher scores than unobserved facts

Objective: for each utterance u, maximize the sum of ln σ(f+ - f-) over pairs of observed facts (scored f+) and unobserved facts (scored f-); the objective is to learn a set of well-ranked semantic slots per utterance.
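A toy single-pair sketch of this objective (randomly initialized latent vectors and a hand-picked learning rate, not the paper's full training loop):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d = 4
u  = rng.normal(scale=0.1, size=d)   # latent vector for one utterance
vp = rng.normal(scale=0.1, size=d)   # an observed ("positive") slot
vn = rng.normal(scale=0.1, size=d)   # an unobserved slot

# Gradient ascent on ln sigma(f+ - f-), pushing the observed fact's score
# f+ = u.vp above the unobserved fact's score f- = u.vn.
lr = 0.1
for _ in range(200):
    g = sigmoid(-(u @ vp - u @ vn))
    u  += lr * g * (vp - vn)
    vp += lr * g * u
    vn -= lr * g * u
```

After training, the observed fact is ranked above the unobserved one, without ever labeling the unobserved fact as false.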

Matrix Factorization SLU (MF-SLU)

[Matrix figure: ontology induction and structure learning feed the factorized SLU matrix; the test utterance "show me a list of cheap restaurants" receives estimated slot scores (e.g., .97/.90, .95/.85)]

MF-SLU can estimate probabilities for slot candidates given test utterances.

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances. Output: semantic concepts included in each individual utterance.

[Framework figure: the full pipeline again (frame-semantic parsing, ontology induction with feature model Fw/Fs, structure learning with propagation model Rw/Rs, and MF-SLU) producing target="restaurant", price="cheap"]

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

Experimental Setup

Dataset: Cambridge University SLU Corpus, restaurant recommendation (WER = 37%); 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type.

Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots.

Henderson et al., "Discriminative spoken language understanding using word confusion networks," in Proc. of SLT, 2012.
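The MAP metric can be sketched as follows (toy ranking; the reference-slot sets play the role of the mapped gold annotations):

```python
def average_precision(ranked, relevant):
    # Precision at each rank where a reference slot is retrieved, averaged
    # over the reference slots.
    hits, total = 0, 0.0
    for rank, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            total += hits / rank
    return total / max(len(relevant), 1)

def mean_average_precision(rankings, references):
    # MAP over all utterances: mean of the per-utterance average precision.
    aps = [average_precision(r, ref) for r, ref in zip(rankings, references)]
    return sum(aps) / len(aps)
```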

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus. Metric: MAP of all estimated slot probabilities for all utterances.

Approach (Baseline SLU)           | ASR  | Transcripts
Support Vector Machine            | 32.5 | 36.6
Multinomial Logistic Regression   | 34.0 | 38.8

Experiments of Semantic Decoding: Quality of Semantics Estimation

Approach                                            | ASR           | Transcripts
Baseline SLU: Support Vector Machine                | 32.5          | 36.6
Baseline SLU: Multinomial Logistic Regression       | 34.0          | 38.8
Proposed MF-SLU: Feature Model                      | 37.6          | 45.3
Proposed MF-SLU: Feature Model + KG Propagation     | 43.5 (+27.9%) | 53.4 (+37.6%)

(The marked results are significantly better than the MLR baseline with p < 0.05 in a t-test.)

The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results.

Experiments of Semantic Decoding: Effectiveness of Relations

Approach                                 | ASR           | Transcripts
Feature Model                            | 37.6          | 45.3
Feature + KG Propagation (Semantic)      | 41.4          | 51.6
Feature + KG Propagation (Dependency)    | 41.6          | 49.0
Feature + KG Propagation (All)           | 43.5 (+15.7%) | 53.4 (+17.9%)

(The marked results are significantly better than the MLR baseline with p < 0.05 in a t-test.)

In the integrated structure information, both semantic and dependency relations are useful for understanding.

Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs.

[Ontology figures: the induced ontology (locale_by_use, food, expensiveness, seeking, relational_quantity, desiring, linked by NN, AMOD, PREP_FOR, and DOBJ dependencies) beside the reference ontology (type, food, price range, area, task, linked by DOBJ, AMOD, and PREP_IN), each annotated with the most frequent syntactic dependencies]

The automatically learned domain ontology aligns well with the reference one. The data-driven one is more objective, while the expert-annotated one is more subjective.

Contributions of Semantic Decoding

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeding SLU Modeling (Semantic Decoding, Intent Prediction)]

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt it to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.

Low- and High-Level Understanding

Semantic concepts for individual utterances do not consider high-level semantics (user intents); the follow-up behaviors usually correspond to user intents.

"can i have a cheap restaurant" → SLU Model → price="cheap", target="restaurant", intent=navigation

"i plan to dine in legume tonight" → SLU Model → restaurant="legume", time="tonight", intent=reservation

SDS Flowchart – Intent Prediction

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) → SLU Modeling (Semantic Decoding, Intent Prediction), now focusing on Intent Prediction]

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

Intent Prediction of Mobile Apps [SLT'14c] (Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015)

Input: spoken utterances making requests about launching an app. Output: the apps supporting the required functionality.

Intent identification over popular domains in Google Play. Example: "please dial a phone call to alex" → Skype, Hangout, etc.

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.

Intent Prediction – Single-Turn Request

Input: a single-turn request. Output: apps that are able to support the required functionality.

[Matrix figure: the test utterance "i would like to contact alex" is enriched with inferred semantics (communication, score .90); rows also include self-training utterances drawn from app descriptions retrieved by IR (e.g., Gmail: "... your email, calendar, contacts ..."; Outlook: "... check and send emails, msgs ..."); columns are word observations (contact, message, email, ...) and intended apps (Gmail, Outlook, Skype, ...)]

Reasoning with feature-enriched MF: the feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction. Output: apps the user plans to launch.

Challenge: language ambiguity, addressed with 1) user preference and 2) app-level contexts. Example: "send to vivian" (given the previous turn) could map to Email, Message, or another communication app.

Idea: behavioral patterns in history can help intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

Intent Prediction – Multi-Turn Interaction [ICMI'15]

[Matrix figure: rows are dialogue turns with lexical features (photo, check, camera, tell, send, ...), behavior-history features (null, camera, chrome, email, ...), and intended apps; training dialogues include "take this photo" → CAMERA, "tell vivian this is me in the lab" → IM, "check my grades on websites" → CHROME, "send an email to professor" → EMAIL; the test dialogue "take a photo of this", "send it to alice" receives estimated app scores (e.g., .85, .95, .70)]

Reasoning with feature-enriched MF: the feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):
Feature Matrix     | ASR: LM / MF-SLU | Transcripts: LM / MF-SLU
Word Observation   | 25.1 / –         | 26.1 / –

Multi-Turn Interaction, Mean Average Precision (MAP):
Feature Matrix     | ASR: MLR / MF-SLU | Transcripts: MLR / MF-SLU
Word Observation   | 52.1 / –          | 55.5 / –

(LM: LM-based IR model, unsupervised; MLR: multinomial logistic regression, supervised)

Experiments for Intent Prediction

Single-Turn Request (MAP):
Word Observation | ASR: LM 25.1 / MF-SLU 29.2 (+16.2%) | Transcripts: LM 26.1 / MF-SLU 30.4 (+16.4%)

Multi-Turn Interaction (MAP):
Word Observation | ASR: MLR 52.1 / MF-SLU 52.7 (+1.2%) | Transcripts: MLR 55.5 / MF-SLU 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.

Experiments for Intent Prediction

Single-Turn Request (MAP), ASR: LM / MF-SLU; Transcripts: LM / MF-SLU
Word Observation                        | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics        | 32.0 / –             | 33.3 / –
Word + Type-Embedding-Based Semantics   | 31.5 / –             | 32.9 / –

Multi-Turn Interaction (MAP), ASR: MLR / MF-SLU; Transcripts: MLR / MF-SLU
Word Observation             | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns   | 53.9 / –            | 56.6 / –

Semantic enrichment provides rich cues to improve performance.

Experiments for Intent Prediction

Single-Turn Request (MAP), ASR: LM / MF-SLU; Transcripts: LM / MF-SLU
Word Observation                        | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics        | 32.0 / 34.2 (+6.8%)  | 33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics   | 31.5 / 32.2 (+2.1%)  | 32.9 / 34.0 (+3.4%)

Multi-Turn Interaction (MAP), ASR: MLR / MF-SLU; Transcripts: MLR / MF-SLU
Word Observation             | 52.1 / 52.7 (+1.2%) | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns   | 53.9 / 55.7 (+3.3%) | 56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.

Contributions of Intent Prediction

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeding SLU Modeling (Semantic Decoding, Intent Prediction)]

Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.

Personal Intelligent Architecture

Reactive Assistance: ASR, LU, Dialog, LG, TTS (user experience: "call taxi")

Proactive Assistance: Inferences, User Modeling, Suggestions

Data: back-end databases, services, and client signals

Device/Service End-points: Phone, PC, Xbox, Web Browser, Messaging Apps

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

Conclusions

The work shows the feasibility and the potential for improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.

Future Work

Apply the proposed technology to domain discovery: domains not covered by the current systems but that users are interested in can guide which domains to develop next.

Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition both propagate into SLU modeling.

Towards Unsupervised Deep Learning

[Architecture figure: word sequence x = w1 ... wd → word vectors lw → convolutional layer lc (convolution matrix Wc) → pooling operation → utterance vector lf and slot vectors lf → knowledge graph propagation layer lp (propagation matrix Wp) → semantic projection matrix Ws → semantic layer y, yielding posterior probabilities P(S1|U), ..., P(Sn|U) and semantic relation scores R(U, S1), ..., R(U, Sn) over slot candidates S1 ... Sn]

Treating MF as a one-layer neural net, we can add more layers in the model, moving towards unsupervised deep learning.

Take Home Message

Big data is available without annotations; the challenge is how to acquire and organize important knowledge and further utilize it for applications.

Language understanding for AI maps language to action: understand voice commands to control music, lights, etc.; teach the system to let friends in by face recognition, etc.

Unsupervised or weakly-supervised methods will be the future trend. Deep language understanding is an emerging field.

Q & A: Thanks for your attention!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013 (Best Student Paper Award).
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.



SLU Modeling

2) With the automatically acquired knowledge, how can a system understand utterance semantics and user intents?

"can i have a cheap restaurant" + organized domain knowledge → SLU component → price="cheap", target="restaurant", intent=navigation

SLU Modeling: Semantic Decoding and Intent Prediction
SDS Architecture – Contributions

[Diagram: ASR → SLU → DM → NLG, with the domain ontology; Knowledge Acquisition and SLU Modeling target the current bottleneck]

SDS Flowchart

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) → SLU Modeling (Semantic Decoding, Intent Prediction)]

SDS Flowchart – Semantic Decoding

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) → SLU Modeling (Semantic Decoding, Intent Prediction), focusing on Semantic Decoding]

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances. Output: semantic concepts included in each individual utterance.

[Framework figure: "can I have a cheap restaurant" → frame-semantic parsing over an unlabeled collection → ontology induction over a semantic KG (feature model: Fw, Fs) and structure learning over lexical/semantic KGs (knowledge graph propagation model: Rw, Rs) → MF-SLU (SLU modeling by matrix factorization) → semantic representation: target="restaurant", price="cheap"]

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

38

[Baker et al 1998 Das et al 2014]Frame-Semantic Parsing

FrameNet [Baker et al 1998] a linguistically semantic resource based on the frame-semantics theory wordsphrases can be represented as frames ldquolow fat milkrdquo ldquomilkrdquo evokes the ldquofoodrdquo frame

ldquolow fatrdquo fills the descriptor frame element

SEMAFOR [Das et al 2014] a state-of-the-art frame-semantics parser trained on manually annotated

FrameNet sentences

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

39

Ontology Induction [ASRUrsquo13 SLTrsquo14a]

can i have a cheap restaurant

Frame capability

Frame expensiveness

Frame locale by use

1st Issue differentiate domain-specific frames from generic frames for SDSs

GoodGood

Das et al Frame-semantic parsing in Proc of Computational Linguistics 2014

slot candidate

Best Student Paper Award

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

40

1

Utterance 1i would like a cheap restaurant Train

hellip hellip

hellip

cheap restaurant foodexpensiveness

1

locale_by_use

11

find a restaurant with chinese foodUtterance 2

1 1

food

1 1

1 Test

1 97 95

Frame Semantic Parsing

show me a list of cheap restaurantsTest Utterance

Word Observation Slot Candidate

Ontology Induction [ASRUrsquo13 SLTrsquo14a]Best Student Paper Award

Idea increase weights of domain-specific slots and decrease weights of others

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

41

1st Issue How to adapt generic slots to a domain-specific setting

Knowledge Graph Propagation Model Assumption domain-specific wordsslots have more dependencies to each other

Word Relation Model Slot Relation Model

word relation matrix

slot relation matrix

times

1

Word Observation Slot CandidateTrain

cheap restaurant foodexpensiveness

1

locale_by_use

11

1 1

food

1 1

1 Test

1

1

Slot Induction

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph so that domain-specific wordsslots have higher scores after matrix multiplication

i like

1 1

capability

1

locale_by_use

food expensiveness

seeking

relational_quantitydesiring

Utterance 1i would like a cheap restaurant

hellip hellip

find a restaurant with chinese foodUtterance 2

show me a list of cheap restaurantsTest Utterance

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

42

Semantic Decoding [ACL-IJCNLPrsquo15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015

SLU Model

target=ldquorestaurantrdquoprice=ldquocheaprdquo

ldquocan I have a cheap restaurantrdquoFrame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology InductionFw Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

43

Knowledge Graph Construction Syntactic dependency parsing on utterances

ccomp

amoddobjnsubj det

can i have a cheap restaurantcapability expensiveness locale_by_use

Word-based lexical knowledge graph

Slot-based semantic knowledge graph

restaurantcan

have

i

acheap

w

w

capabilitylocale_by_use expensiveness

s

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

44

Dependency-based word embeddings

Dependency-based slot embeddings

Edge Weight MeasurementSlotWord Embeddings Training (Levy and Goldberg 2014)

can = have =

expensiveness = capability =

can i have a cheap restaurant

ccomp

amoddobjnsubj det

have acapability expensiveness locale_by_use

ccomp

amoddobjnsubj det

Levy and Goldberg Dependency-Based Word Embeddings in Proc of ACL 2014

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

45

Edge Weight Measurement Compute edge weights to represent relation importance

Slot-to-slot semantic relation similarity between slot embeddings Slot-to-slot dependency relation dependency score between slot embeddings Word-to-word semantic relation similarity between word embeddings Word-to-word dependency relation dependency score between word embeddings

+

+

w1

w2

w3

w4

w5

w6

w7

s2

s1 s3

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

46

Word Relation Model Slot Relation Model

word relation matrix

slot relation matrix

times

1

Word Observation Slot Candidate

Train

cheap restaurant foodexpensiveness

1

locale_by_use

11

1 1

food

1 1

1 Test1

1

Slot Induction

Knowledge Graph Propagation Model119877119908

119878119863

119877119904119878119863

Structure information is integrated to make the self-training data more reliable

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

47

Semantic Decoding [ACL-IJCNLP'15]

(Figure: the word-observation/slot-candidate matrix built from training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" via ontology induction; the test utterance "show me a list of cheap restaurants" also carries hidden semantics beyond its observed words.)

2nd Issue: unobserved semantics may benefit understanding


48

Reasoning with Matrix Factorization

Feature Model + Knowledge Graph Propagation Model

(Figure: the word relation matrix R_w and slot relation matrix R_s are multiplied with the word-observation/slot-candidate matrix; MF then fills in probabilities such as .97, .90, .95, .85 for unobserved cells.)

Idea: MF completes a partially-missing matrix based on a low-rank latent-semantics assumption, which is able to model hidden semantics and is more robust to noisy data.


49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and words/slots respectively.
The product of the two matrices fills in the probability of hidden semantics.

The |U| × (|W|+|S|) observation matrix is approximated by the product of an |U| × d matrix and a d × (|W|+|S|) matrix.

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
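A minimal sketch of the low-rank idea with hand-picked toy factors; the latent dimension d and the factor matrices here are illustrative, not learned.

```python
# Sketch: an |U| x (|W|+|S|) matrix approximated as the product of an
# |U| x d utterance-factor matrix and a d x (|W|+|S|) word/slot-factor
# matrix. The product assigns a score to every cell, including cells that
# were never observed in training, which is how hidden semantics get filled.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

d = 2                     # latent dimension (toy)
U = [[1.0, 0.0],          # utterance 1: "cheap restaurant" topic
     [0.9, 0.1]]          # utterance 2: a similar topic
V = [[0.9, 0.8, 0.9],     # latent dim 1 -> cheap, restaurant, expensiveness
     [0.1, 0.2, 0.0]]
M = matmul(U, V)          # M[u][f] = predicted score of feature f for utterance u
```

Even if utterance 2 never co-occurred with the slot in column 3, its predicted score M[1][2] is high because the utterances share latent semantics.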


50

Bayesian Personalized Ranking for MF: model implicit feedback

Do not treat unobserved facts as negative samples (true or false); give observed facts higher scores than unobserved facts.

Objective: for each utterance u_x, maximize the sum of ln σ(f⁺ − f⁻) over pairs of an observed fact f⁺ and an unobserved fact f⁻.

The objective is to learn a set of well-ranked semantic slots per utterance.
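The pairwise objective can be sketched as follows; the scores are invented toy values, and this omits the regularization and gradient updates of a full BPR trainer.

```python
# Sketch: BPR-style pairwise objective for one utterance. Observed facts f+
# should score higher than unobserved facts f-, so we sum ln sigmoid(f+ - f-)
# over all (observed, unobserved) pairs; a larger (less negative) value means
# the slots are better ranked.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bpr_objective(observed_scores, unobserved_scores):
    return sum(math.log(sigmoid(fp - fn))
               for fp in observed_scores
               for fn in unobserved_scores)

well_ranked = bpr_objective([2.0, 1.5], [0.1, -0.3])   # f+ well above f-
badly_ranked = bpr_objective([0.1, -0.3], [2.0, 1.5])  # ranking inverted
```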


51

Matrix Factorization SLU (MF-SLU)

(Figure: the word-observation/slot-candidate matrix over training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants," combined with ontology induction (Fw, Fs) and structure learning; MF fills in slot probabilities such as .97, .90, .95, .85.)

MF-SLU can estimate probabilities for slot candidates given test utterances.


52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances
Output: semantic concepts included in each individual utterance

(Flowchart: frame-semantic parsing over an unlabeled collection feeds ontology induction (Fw, Fs) into the feature model, and structure learning over the lexical and semantic knowledge graphs (Rw, Rs) into the knowledge graph propagation model; MF-SLU performs SLU modeling by matrix factorization, decoding "can I have a cheap restaurant" into target="restaurant", price="cheap".)

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.


53

Experimental Setup

Dataset: Cambridge University SLU Corpus
Restaurant recommendation domain (WER = 37%)
2,166 dialogues, 15,453 utterances
Dialogue slots: addr, area, food, name, phone, postcode, price range, task, type

Metric: MAP of all estimated slot probabilities over all utterances, using a mapping table between induced and reference slots.

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
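The MAP metric over per-utterance slot rankings can be computed as below; the ranked lists and gold sets are toy examples, not corpus data.

```python
# Sketch: Mean Average Precision as used here. For each utterance, rank slot
# candidates by estimated probability, average the precision at each relevant
# (reference) slot, then average over utterances.

def average_precision(ranked, relevant):
    hits, total = 0, 0.0
    for i, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(ranked_lists, relevant_sets):
    aps = [average_precision(r, s) for r, s in zip(ranked_lists, relevant_sets)]
    return sum(aps) / len(aps)

ranked = [["expensiveness", "locale_by_use", "capability"],
          ["food", "capability", "locale_by_use"]]
gold = [{"expensiveness", "locale_by_use"}, {"food", "locale_by_use"}]
map_score = mean_average_precision(ranked, gold)
```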


54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                      | ASR  | Transcripts
Baseline SLU: Support Vector Machine          | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8


55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                                      | ASR           | Transcripts
Baseline SLU: Support Vector Machine                          | 32.5          | 36.6
Baseline SLU: Multinomial Logistic Regression                 | 34.0          | 38.8
Proposed MF-SLU: Feature Model                                | 37.6          | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation  | 43.5 (+27.9%) | 53.4 (+37.6%)

The MF-SLU effectively models implicit information to decode semantics.
The structure information further improves the results.
(The result is significantly better than the MLR baseline with p < 0.05 in a t-test.)


56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                           | ASR           | Transcripts
Feature Model                                      | 37.6          | 45.3
Feature + Knowledge Graph Propagation (Semantic)   | 41.4          | 51.6
Feature + Knowledge Graph Propagation (Dependency) | 41.6          | 49.0
Feature + Knowledge Graph Propagation (All)        | 43.5 (+15.7%) | 53.4 (+17.9%)

In the integrated structure information, both semantic and dependency relations are useful for understanding.
(The result is significantly better than the MLR baseline with p < 0.05 in a t-test.)


Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs.

(Figure: the induced ontology links slots such as locale_by_use, food, expensiveness, seeking, relational_quantity, and desiring through frequent syntactic dependencies, e.g., AMOD, NN, DOBJ, PREP_FOR, mirroring the reference ontology over type, food, price range, area, and task with AMOD, DOBJ, and PREP_IN links.)

The automatically learned domain ontology aligns well with the reference one.


The data-driven ontology is more objective, while the expert-annotated one is more subjective.

58

Contributions of Semantic Decoding

(Flowchart: Ontology Induction and Structure Learning under Knowledge Acquisition; Semantic Decoding and Intent Prediction under SLU Modeling.)

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.


59

Low- and High-Level Understanding

Semantic concepts for individual utterances do not consider high-level semantics (user intents).
The follow-up behaviors usually correspond to user intents.

"can i have a cheap restaurant" → SLU Model → price="cheap", target="restaurant" → intent=navigation
"i plan to dine in legume tonight" → SLU Model → restaurant="legume", time="tonight" → intent=reservation


60

SDS Flowchart – Intent Prediction

(Flowchart: Ontology Induction and Structure Learning under Knowledge Acquisition; Semantic Decoding and Intent Prediction under SLU Modeling, with Intent Prediction highlighted.)


61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work


62

Intent Prediction of Mobile Apps [SLT'14c]
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances for making requests about launching an app
Output: the apps supporting the required functionality

Intent identification covers popular domains in Google Play, e.g., "please dial a phone call to alex" → Skype, Hangout, etc.

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.


63

Intent Prediction – Single-Turn Request

Input: single-turn request
Output: apps that are able to support the required functionality

(Figure: for test utterance 1, "i would like to contact alex," IR retrieves app candidates from app descriptions, e.g., Outlook: "... your email calendar contacts ...", Gmail: "... check and send emails, msgs ..."; feature enrichment adds semantics such as "communication," and reasoning with feature-enriched MF fills in scores for intended apps such as Gmail, Outlook, and Skype, with self-train and test utterances sharing the matrix.)

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
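The "IR for app candidates" step can be sketched as simple word-overlap retrieval over app descriptions; the descriptions and the scoring function here are hypothetical stand-ins for the real store data and retrieval model.

```python
# Sketch: retrieve app candidates for an utterance by word overlap with app
# descriptions, the retrieval step that precedes MF reasoning. Descriptions
# are invented for illustration.
from collections import Counter

apps = {
    "Outlook": "your email calendar contacts",
    "Gmail": "check and send emails msgs",
    "Camera": "take photos and record video",
}

def overlap(query, description):
    q, d = Counter(query.split()), Counter(description.split())
    return sum(min(q[w], d[w]) for w in q)  # shared-word count

def retrieve(query):
    return sorted(apps, key=lambda a: overlap(query, apps[a]), reverse=True)

result = retrieve("check my email")  # communication apps outrank Camera
```

A real system would use TF-IDF or language-model scoring rather than raw overlap, but the candidate-generation role is the same.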


64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch

Challenge: language ambiguity, from 1) user preference and 2) app-level contexts
Example: "send to vivian" may mean Email or Message (both Communication); the previous turn helps disambiguate.

Idea: behavioral patterns in history can help intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com


65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch

(Figure: training dialogues pair utterances with intended apps and behavior history, e.g., "take this photo" / "tell vivian this is me in the lab" → CAMERA, IM and "check my grades on website" / "send an email to professor" → CHROME, EMAIL; for the test dialogue "take a photo of this" / "send it to alice," reasoning with feature-enriched MF scores the intended apps from lexical features plus the behavioral patterns.)

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
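A minimal sketch of how one feature row might combine lexical observations with behavior history, as in the figure's matrix; the `feature_row` helper, the vocabulary, and the app set are illustrative assumptions, not the paper's feature set.

```python
# Sketch: a feature row for the multi-turn model concatenates binary lexical
# features of the current utterance with a binary indicator of the previously
# launched app (the behavior history), so MF can learn lexical-behavioral
# inference relations.

def feature_row(utterance, prev_app, vocab, app_set):
    words = set(utterance.split())
    lexical = [1 if w in words else 0 for w in vocab]
    behavior = [1 if a == prev_app else 0 for a in app_set]
    return lexical + behavior

vocab = ["take", "photo", "send", "check"]
app_set = ["CAMERA", "CHROME", "EMAIL"]
row = feature_row("send it to alice", prev_app="CAMERA",
                  vocab=vocab, app_set=app_set)
```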


66

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP), with the LM-based IR model (unsupervised) as baseline
Feature Matrix     | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation   | 25.1    | -           | 26.1            | -

Multi-Turn Interaction: Mean Average Precision (MAP), with Multinomial Logistic Regression (supervised) as baseline
Feature Matrix     | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation   | 52.1     | -           | 55.5             | -


67

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix     | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation   | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix     | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation   | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.


68

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix                          | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation                        | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)
Word + Embedding-Based Semantics        | 32.0    | -             | 33.3            | -
Word + Type-Embedding-Based Semantics   | 31.5    | -             | 32.9            | -

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix             | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation           | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9     | -            | 56.6             | -

Semantic enrichment provides rich cues to improve performance.


69

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)
Feature Matrix                          | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation                        | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)
Word + Embedding-Based Semantics        | 32.0    | 34.2 (+6.8%)  | 33.3            | 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics   | 31.5    | 32.2 (+2.1%)  | 32.9            | 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)
Feature Matrix             | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation           | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9     | 55.7 (+3.3%) | 56.6             | 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.


70

Contributions of Intent Prediction

(Flowchart: Ontology Induction and Structure Learning under Knowledge Acquisition; Semantic Decoding and Intent Prediction under SLU Modeling.)

Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.


71

Personal Intelligent Architecture

Reactive Assistance: ASR, LU, Dialog, LG, TTS
Proactive Assistance: Inferences, User Modeling, Suggestions
Data: Back-end Data Bases, Services and Client Signals
Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience: "call taxi"


72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work


73

Conclusions

The work shows the feasibility and the potential for improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:
better semantic representations for individual utterances
better high-level intent prediction about follow-up behaviors


74

Future Work

Apply the proposed technology to domain discovery:
find domains not covered by the current systems but that users are interested in
use them to guide the next domains to develop

Improve the proposed approach by handling uncertainty:
recognition errors from ASR
unreliable knowledge from knowledge acquisition


75

Towards Unsupervised Deep Learning

(Figure: a word sequence x = w1, w2, ..., wd is fed through a convolutional layer lc (convolution matrix Wc) and a pooling operation over word vectors lw to form an utterance vector lf; a knowledge graph propagation layer lp (propagation matrix Wp) and a semantic projection matrix Ws map it to the semantic layer y, yielding semantic-relation scores R(U, S1), ..., R(U, Sn) and posterior probabilities P(S1 | U), ..., P(Sn | U) over the slot candidates, each with its own slot vector lf.)

Treating MF as a one-layer neural net, we can add more layers in the model towards unsupervised deep learning.

76

Take Home Message

Available big data w/o annotations. Challenge: how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI: language → action, e.g., understand voice to control music, lights, etc., or teach the system to let friends in by face recognition, etc.

Unsupervised or weakly-supervised methods will be the future trend.

Deep language understanding is an emerging field.

77

Q & A
THANKS FOR YOUR ATTENTION!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.


  • Statistical Learning from Dialogues for Intelligent Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants
  • Why do we need them
  • Why do we need them (2)
  • Why do companies care
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymax's intelligence
  • SDS Architecture
  • Interaction Example
  • SDS Process – Available Domain Ontology
  • SDS Process – Available Domain Ontology (2)
  • SDS Process – Available Domain Ontology (3)
  • SDS Process – Spoken Language Understanding (SLU)
  • SDS Process – Spoken Language Understanding (SLU) (2)
  • SDS Process – Dialogue Management (DM)
  • SDS Process – Dialogue Management (DM) (2)
  • SDS Process – Dialogue Management (DM) (3)
  • SDS Process – Dialogue Management (DM) (4)
  • SDS Process – Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture – Contributions
  • SDS Flowchart
  • SDS Flowchart – Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLP'15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRU'13, SLT'14a]
  • Ontology Induction [ASRU'13, SLT'14a] (2)
  • 1st Issue: How to adapt generic slots to a domain-specific setting
  • Semantic Decoding [ACL-IJCNLP'15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLP'15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue: How to model the unobserved hidden semantics? Matrix Factorization
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLP'15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding: Quality of Semantics Estimation
  • Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
  • Experiments of Semantic Decoding: Effectiveness of Relations
  • Experiments for Structure Learning: Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart – Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLT'14c]
  • Intent Prediction – Single-Turn Request
  • Intent Prediction – Multi-Turn Interaction [ICMI'15]
  • Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q & A
Page 33: Statistical Learning from Dialogues for Intelligent Assistants

33

SDS Architecture ndash Contributions

DomainDMASR SLU

NLG

Knowledge Acquisition SLU Modeling

current bottleneck

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

34

SDS Flowchart

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

35

SDS Flowchart ndash Semantic Decoding

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

36

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

37

Semantic Decoding [ACL-IJCNLPrsquo15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015

SLU Model

target=ldquorestaurantrdquoprice=ldquocheaprdquo

ldquocan I have a cheap restaurantrdquoFrame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology InductionFw Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

38

[Baker et al 1998 Das et al 2014]Frame-Semantic Parsing

FrameNet [Baker et al 1998] a linguistically semantic resource based on the frame-semantics theory wordsphrases can be represented as frames ldquolow fat milkrdquo ldquomilkrdquo evokes the ldquofoodrdquo frame

ldquolow fatrdquo fills the descriptor frame element

SEMAFOR [Das et al 2014] a state-of-the-art frame-semantics parser trained on manually annotated

FrameNet sentences

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

39

Ontology Induction [ASRUrsquo13 SLTrsquo14a]

can i have a cheap restaurant

Frame capability

Frame expensiveness

Frame locale by use

1st Issue differentiate domain-specific frames from generic frames for SDSs

GoodGood

Das et al Frame-semantic parsing in Proc of Computational Linguistics 2014

slot candidate

Best Student Paper Award

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

40

1

Utterance 1i would like a cheap restaurant Train

hellip hellip

hellip

cheap restaurant foodexpensiveness

1

locale_by_use

11

find a restaurant with chinese foodUtterance 2

1 1

food

1 1

1 Test

1 97 95

Frame Semantic Parsing

show me a list of cheap restaurantsTest Utterance

Word Observation Slot Candidate

Ontology Induction [ASRUrsquo13 SLTrsquo14a]Best Student Paper Award

Idea increase weights of domain-specific slots and decrease weights of others

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

41

1st Issue How to adapt generic slots to a domain-specific setting

Knowledge Graph Propagation Model Assumption domain-specific wordsslots have more dependencies to each other

Word Relation Model Slot Relation Model

word relation matrix

slot relation matrix

times

1

Word Observation Slot CandidateTrain

cheap restaurant foodexpensiveness

1

locale_by_use

11

1 1

food

1 1

1 Test

1

1

Slot Induction

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph so that domain-specific wordsslots have higher scores after matrix multiplication

i like

1 1

capability

1

locale_by_use

food expensiveness

seeking

relational_quantitydesiring

Utterance 1i would like a cheap restaurant

hellip hellip

find a restaurant with chinese foodUtterance 2

show me a list of cheap restaurantsTest Utterance

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

42

Semantic Decoding [ACL-IJCNLPrsquo15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015

SLU Model

target=ldquorestaurantrdquoprice=ldquocheaprdquo

ldquocan I have a cheap restaurantrdquoFrame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology InductionFw Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

43

Knowledge Graph Construction Syntactic dependency parsing on utterances

ccomp

amoddobjnsubj det

can i have a cheap restaurantcapability expensiveness locale_by_use

Word-based lexical knowledge graph

Slot-based semantic knowledge graph

restaurantcan

have

i

acheap

w

w

capabilitylocale_by_use expensiveness

s

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

44

Dependency-based word embeddings

Dependency-based slot embeddings

Edge Weight MeasurementSlotWord Embeddings Training (Levy and Goldberg 2014)

can = have =

expensiveness = capability =

can i have a cheap restaurant

ccomp

amoddobjnsubj det

have acapability expensiveness locale_by_use

ccomp

amoddobjnsubj det

Levy and Goldberg Dependency-Based Word Embeddings in Proc of ACL 2014

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

45

Edge Weight Measurement Compute edge weights to represent relation importance

Slot-to-slot semantic relation similarity between slot embeddings Slot-to-slot dependency relation dependency score between slot embeddings Word-to-word semantic relation similarity between word embeddings Word-to-word dependency relation dependency score between word embeddings

+

+

w1

w2

w3

w4

w5

w6

w7

s2

s1 s3

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

46

Word Relation Model Slot Relation Model

word relation matrix

slot relation matrix

times

1

Word Observation Slot Candidate

Train

cheap restaurant foodexpensiveness

1

locale_by_use

11

1 1

food

1 1

1 Test1

1

Slot Induction

Knowledge Graph Propagation Model119877119908

119878119863

119877119904119878119863

Structure information is integrated to make the self-training data more reliable

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

47

Ontology Induction

SLUFw Fs

Structure Learning

times

1

Utterance 1i would like a cheap restaurant

Word Observation Slot Candidate

Train

hellip

cheap restaurant foodexpensiveness

1

locale_by_use

11

find a restaurant with chinese foodUtterance 2

1 1

food

1 1

1

Test1 9790 9585

Ontology Induction

show me a list of cheap restaurantsTest Utterance hidden semantics

2nd Issue unobserved semantics may benefit understanding

Semantic Decoding [ACL-IJCNLPrsquo15]

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

48

Reasoning with Matrix Factorization

Word Relation Model Slot Relation Model

word relation matrix

slot relation matrix

times

1

Word Observation Slot Candidate

Train

cheap restaurant foodexpensiveness

1

locale_by_use

11

1 1

food

1 1

1 Test1

1

9790 9585

93 929805 05

Slot Induction

Feature Model + Knowledge Graph Propagation Model

119877119908119878119863

119877119904119878119863

Idea MF completes a partially-missing matrix based on a low-rank latent semantics assumption which is able to model hidden semantics and more robust to noisy data

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

49

2nd Issue How to model the unobserved hidden semantics

Matrix Factorization (MF) (Rendle et al 2009)

The decomposed matrices represent latent semantics for utterances and wordsslots respectively

The product of two matrices fills the probability of hidden semantics

1

Word Observation Slot Candidate

Train

cheap restaurant foodexpensiveness

1

locale_by_use

11

1 1

food

1 1

1 Test

1

1

9790 9585

93 929805 05

|119932|

|119934|+|119930|

asymp|119932|times119941 119941times (|119934|+|119930|)times

Rendle et al ldquoBPR Bayesian Personalized Ranking from Implicit Feedback in Proc of UAI 2009

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

50

Bayesian Personalized Ranking for MF Model implicit feedback

not treat unobserved facts as negative samples (true or false) give observed facts higher scores than unobserved facts

Objective

1

119891 +iquest iquest119891 minus119891 minus

The objective is to learn a set of well-ranked semantic slots per utterance

119906119909

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

51

Ontology Induction

SLUFw Fs

Structure Learning

times

1

Utterance 1i would like a cheap restaurant

Word Observation Slot Candidate

Train

hellip

cheap restaurant foodexpensiveness

1

locale_by_use

11

find a restaurant with chinese foodUtterance 2

1 1

food

1 1

1

Test1 9790 9585

Ontology Induction

show me a list of cheap restaurantsTest Utterance

Matrix Factorization SLU (MF-SLU)

MF-SLU can estimate probabilities for slot candidates given test utterances

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

52

Semantic Decoding [ACL-IJCNLPrsquo15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015

SLU Model

target=ldquorestaurantrdquoprice=ldquocheaprdquo

ldquocan I have a cheap restaurantrdquoFrame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology InductionFw Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation

Idea utilize the acquired knowledge to decode utterance semantics (fully unsupervised)

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

53

Experimental Setup

Dataset: Cambridge University SLU Corpus
- restaurant recommendation (WER = 37%); 2,166 dialogues; 15,453 utterances
- dialogue slots: addr, area, food, name, phone, postcode, price range, task, type

Metric: MAP of all estimated slot probabilities over all utterances
(A mapping table aligns the induced slots with the reference slots.)

Henderson et al., "Discriminative spoken language understanding using word confusion networks," in Proc. of SLT, 2012
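As a concrete illustration of the MAP metric used here, the sketch below ranks slots by their estimated probabilities and averages per-utterance average precision. The slot names, probabilities, and gold labels are invented for the example.

```python
def average_precision(ranked_slots, gold):
    """AP for one utterance, given slots ranked by estimated probability."""
    hits, total = 0, 0.0
    for k, slot in enumerate(ranked_slots, start=1):
        if slot in gold:
            hits += 1
            total += hits / k            # precision at each relevant rank
    return total / len(gold) if gold else 0.0

def mean_average_precision(per_utterance):
    """per_utterance: iterable of (slot -> probability dict, gold slot set)."""
    aps = []
    for probs, gold in per_utterance:
        ranked = sorted(probs, key=probs.get, reverse=True)
        aps.append(average_precision(ranked, gold))
    return sum(aps) / len(aps)

# toy utterances with made-up probabilities and gold slots
data = [
    ({"food": 0.97, "expensiveness": 0.90, "locale_by_use": 0.50},
     {"expensiveness", "locale_by_use"}),
    ({"food": 0.95, "expensiveness": 0.20}, {"food"}),
]
print(round(mean_average_precision(data), 3))   # 0.792
```

The first utterance ranks a wrong slot on top, so its AP is (1/2 + 2/3)/2 ≈ 0.583; the second is perfect (AP = 1.0), giving MAP ≈ 0.792.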

54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                        ASR    Transcripts
Baseline SLU: Support Vector Machine            32.5   36.6
Baseline SLU: Multinomial Logistic Regression   34.0   38.8

55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                        ASR             Transcripts
Baseline SLU: Support Vector Machine            32.5            36.6
Baseline SLU: Multinomial Logistic Regression   34.0            38.8
Proposed MF-SLU: Feature Model                  37.6            45.3
Proposed MF-SLU: Feature Model +
  Knowledge Graph Propagation                   43.5 (+27.9%)   53.4 (+37.6%)

(the result is significantly better than the MLR baseline with p < 0.05 in a t-test)

The MF-SLU effectively models implicit information to decode semantics.
The structure information further improves the results.

56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                        ASR             Transcripts
Feature Model                                   37.6            45.3
Feature + Knowledge Graph Propagation
  Semantic relations                            41.4            51.6
  Dependency relations                          41.6            49.0
  All relations                                 43.5 (+15.7%)   53.4 (+17.9%)

(the result is significantly better than the MLR baseline with p < 0.05 in a t-test)

In the integrated structure information, both semantic and dependency relations are useful for understanding.

57

Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs.

[Figure: the reference ontology with the most frequent syntactic dependencies. Slots such as locale_by_use, food, expensiveness, seeking, desiring, relational_quantity, type, area, task, and pricerange are linked by dependencies including PREP_FOR, PREP_IN, NN, AMOD, and DOBJ.]

The automatically learned domain ontology aligns well with the reference one.
The data-driven ontology is more objective, while the expert-annotated one is more subjective.

58

Contributions of Semantic Decoding

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeding SLU Modeling (Semantic Decoding, Intent Prediction)]

Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge.

MF-SLU for Semantic Decoding is able to:
1) unify the automatically acquired knowledge,
2) adapt to a domain-specific setting,
3) and then allow systems to model implicit semantics for better understanding.

59

Low- and High-Level Understanding

Semantic concepts for individual utterances do not consider high-level semantics (user intents).
The follow-up behaviors usually correspond to user intents.

"can i have a cheap restaurant" → SLU Model → price="cheap", target="restaurant"; intent=navigation
"i plan to dine in legume tonight" → SLU Model → restaurant="legume", time="tonight"; intent=reservation

60

SDS Flowchart – Intent Prediction

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) and SLU Modeling (Semantic Decoding, Intent Prediction); this part addresses Intent Prediction.]

61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work

62

Intent Prediction of Mobile Apps [SLT'14c]
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality

Intent identification covers popular domains in Google Play, e.g., "please dial a phone call to alex" → Skype, Hangout, etc.

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014

63

Intent Prediction – Single-Turn Request

Input: single-turn request
Output: apps that are able to support the required functionality

[Figure: reasoning with feature-enriched MF. A joint matrix over word observations (contact, message, email, ...) and intended apps (Gmail, Outlook, Skype, ...) covers the test utterance "i would like to contact alex" and self-train utterances retrieved from app descriptions (Outlook: "your email calendar contacts"; Gmail: "check and send emails, msgs"); feature enrichment adds semantic classes such as "communication", IR supplies the app candidates, and MF fills in scores (e.g., .90, .85, .97, .95).]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
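A minimal sketch of the retrieval-plus-enrichment idea: raw word observations are augmented with semantic-class features before matching against app descriptions. The class lexicon, app descriptions, and overlap scoring below are all hypothetical stand-ins for the system's embedding-based enrichment and IR components.

```python
# Hypothetical semantic-class lexicon; the real system induces classes automatically.
CLASS_LEXICON = {"contact": "communication", "contacts": "communication",
                 "email": "communication", "emails": "communication",
                 "photo": "camera", "photos": "camera"}

# Hypothetical app descriptions standing in for Google Play text.
APP_DESC = {
    "Gmail": "check and send emails and messages",
    "Outlook": "your email calendar contacts",
    "Camera": "take photos and record videos",
}

def enrich(tokens):
    """Add semantic-class features on top of the raw word observations."""
    feats = set(tokens)
    feats |= {CLASS_LEXICON[t] for t in tokens if t in CLASS_LEXICON}
    return feats

def rank_apps(utterance):
    """Retrieve app candidates whose enriched descriptions overlap the request."""
    feats = enrich(utterance.lower().split())
    scored = [(len(feats & enrich(desc.lower().split())), app)
              for app, desc in APP_DESC.items()]
    return [app for score, app in sorted(scored, reverse=True) if score > 0]

print(rank_apps("i would like to contact alex"))
```

Although the request shares no surface words with either email app, the enriched "communication" class links "contact" to both descriptions, so Gmail and Outlook are retrieved while Camera is not.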

64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch

Challenge: language ambiguity, due to 1) user preference and 2) app-level contexts.

Example: "send to vivian" could mean Email or Message (any Communication app); the previous turn helps disambiguate.
Idea: behavioral patterns in history can help intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
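The idea that the previous turn's app disambiguates "send to vivian" can be sketched as mixing lexical evidence with a transition prior. The transition counts, lexical weights, and mixing weight alpha below are invented for illustration and are not the paper's model.

```python
# Hypothetical counts of app transitions mined from usage logs: after the
# camera, this user mostly launches a messaging app.
TRANSITIONS = {"CAMERA": {"IM": 8, "EMAIL": 3, "CHROME": 1}}

# lexical evidence alone leaves "send" ambiguous between messaging and email
LEXICAL = {"send": {"IM": 0.5, "EMAIL": 0.5}}

def predict(words, prev_app, alpha=0.5):
    """Mix lexical evidence with a behavior prior from the previous turn."""
    lex = {}
    for w in words:
        for app, p in LEXICAL.get(w, {}).items():
            lex[app] = lex.get(app, 0.0) + p
    prior = TRANSITIONS.get(prev_app, {})
    norm = sum(prior.values()) or 1
    apps = set(lex) | set(prior)
    scores = {a: alpha * lex.get(a, 0.0) + (1 - alpha) * prior.get(a, 0) / norm
              for a in apps}
    return max(scores, key=scores.get)

print(predict(["send", "to", "vivian"], prev_app="CAMERA"))   # IM
```

With the camera as the previous turn, the behavior prior breaks the IM/EMAIL tie in favor of the messaging app.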

65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch

[Figure: reasoning with feature-enriched MF over dialogues. A train dialogue ("take this photo" → CAMERA; "tell vivian this is me in the lab" → IM; "check my grades on websites" → CHROME; "send an email to professor" → EMAIL) contributes lexical features (photo, check, camera, tell, send, email) and behavior-history features (null, camera, chrome, email); for the test dialogue ("take a photo of this" → CAMERA; "send it to alice" → IM), MF fills in scores (e.g., .85, .70, .95, .80, .55).]

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

66

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix      ASR (LM / MF-SLU)    Transcripts (LM / MF-SLU)
Word Observation    25.1 / –             26.1 / –

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix      ASR (MLR / MF-SLU)   Transcripts (MLR / MF-SLU)
Word Observation    52.1 / –             55.5 / –

(LM: LM-based IR model, unsupervised; MLR: multinomial logistic regression, supervised)

67

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix      ASR (LM / MF-SLU)        Transcripts (LM / MF-SLU)
Word Observation    25.1 / 29.2 (+16.2%)     26.1 / 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix      ASR (MLR / MF-SLU)       Transcripts (MLR / MF-SLU)
Word Observation    52.1 / 52.7 (+1.2%)      55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.

68

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                          ASR (LM / MF-SLU)        Transcripts (LM / MF-SLU)
Word Observation                        25.1 / 29.2 (+16.2%)     26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics        32.0 / –                 33.3 / –
Word + Type-Embedding-Based Semantics   31.5 / –                 32.9 / –

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix                          ASR (MLR / MF-SLU)       Transcripts (MLR / MF-SLU)
Word Observation                        52.1 / 52.7 (+1.2%)      55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns              53.9 / –                 56.6 / –

Semantic enrichment provides rich cues to improve performance.

69

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                          ASR (LM / MF-SLU)        Transcripts (LM / MF-SLU)
Word Observation                        25.1 / 29.2 (+16.2%)     26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics        32.0 / 34.2 (+6.8%)      33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics   31.5 / 32.2 (+2.1%)      32.9 / 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix                          ASR (MLR / MF-SLU)       Transcripts (MLR / MF-SLU)
Word Observation                        52.1 / 52.7 (+1.2%)      55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns              53.9 / 55.7 (+3.3%)      56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.

70

Contributions of Intent Prediction

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) and SLU Modeling (Semantic Decoding, Intent Prediction); this part addresses Intent Prediction.]

Feature-Enriched MF-SLU for Intent Prediction is able to:
1) unify the knowledge at different levels,
2) learn inference relations between various features,
3) and create personalized models by leveraging contextual behaviors.

71

Personal Intelligent Architecture

[Diagram: Reactive Assistance (ASR → LU → Dialog → LG → TTS) and Proactive Assistance (Inferences, User Modeling, Suggestions), both drawing on Data (back-end data bases, services, and client signals), delivered through Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps) as the User Experience, e.g., "call taxi".]

72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?
Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions
Semantic Decoding
Intent Prediction
Conclusions & Future Work

73

Conclusions

This work shows the feasibility of, and the potential for, improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:
- better semantic representations for individual utterances
- better high-level intent prediction about follow-up behaviors

74

Future Work

Apply the proposed technology to domain discovery: find domains that are not covered by the current systems but that users are interested in, to guide which domains to develop next.

Improve the proposed approach by handling uncertainty: recognition errors from ASR, and unreliable knowledge from knowledge acquisition.

75

Towards Unsupervised Deep Learning

[Diagram: a convolutional architecture over the word sequence w1, ..., wd of utterance x: word vectors lw, a convolution matrix Wc and convolutional layer lc, a pooling operation producing utterance and slot vectors lf, a knowledge graph propagation matrix Wp and layer lp, and a semantic projection matrix Ws yielding the semantic layer y with relation scores R(U, S1), ..., R(U, Sn) and posterior probabilities P(S1 | U), ..., P(Sn | U) over slot candidates.]

Treating MF as a one-layer neural net, we can add more layers in the model towards unsupervised deep learning.
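The "MF as a one-layer network" view can be made concrete in a few lines: a rank-d factorization is a linear map factored into two matrices, and inserting nonlinearities (and further layers) between the factors yields a deeper model. The dimensions and weights below are random toy values, not a trained system.

```python
import numpy as np

rng = np.random.default_rng(1)
n_words, n_slots, d = 6, 3, 4

x = rng.random(n_words)                  # bag-of-words vector for one utterance

# MF view: slot scores are a rank-d bilinear function of the input,
# i.e., a single linear layer factored into two matrices
W_in = rng.normal(size=(d, n_words))     # maps words into the latent space
W_out = rng.normal(size=(n_slots, d))    # maps latent space to slot scores
mf_scores = W_out @ (W_in @ x)

# Deeper view: a nonlinearity between the factors turns the same
# computation into a two-layer network; more layers can be stacked
def relu(z):
    return np.maximum(z, 0.0)

deep_scores = W_out @ relu(W_in @ x)
print(mf_scores.shape, deep_scores.shape)
```

Both variants output one score per slot candidate; only the depth of the mapping between words and slots changes.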

76

Take Home Message

Available big data w/o annotations.
Challenge: how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI: from language to action, e.g., understand voice commands to control music, lights, etc., or teach the system to let friends in by face recognition.

Unsupervised or weakly-supervised methods will be the future trend.
Deep language understanding is an emerging field.

77

Q & A

THANKS FOR YOUR ATTENTION!

- Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
- Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
- Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
- Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
- Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
- Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
- Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.

  • Statistical Learning from Dialogues for Intelligent Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants?
  • Why do we need them?
  • Why do we need them? (2)
  • Why do companies care?
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymax's intelligence?
  • SDS Architecture
  • Interaction Example
  • SDS Process – Available Domain Ontology
  • SDS Process – Available Domain Ontology (2)
  • SDS Process – Available Domain Ontology (3)
  • SDS Process – Spoken Language Understanding (SLU)
  • SDS Process – Spoken Language Understanding (SLU) (2)
  • SDS Process – Dialogue Management (DM)
  • SDS Process – Dialogue Management (DM) (2)
  • SDS Process – Dialogue Management (DM) (3)
  • SDS Process – Dialogue Management (DM) (4)
  • SDS Process – Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture – Contributions
  • SDS Flowchart
  • SDS Flowchart – Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLP'15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRU'13, SLT'14a]
  • Ontology Induction [ASRU'13, SLT'14a] (2)
  • 1st Issue: How to adapt generic slots to a domain-specific setting
  • Semantic Decoding [ACL-IJCNLP'15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLP'15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue: How to model the unobserved hidden semantics? Matrix Factorization (MF)
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLP'15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding: Quality of Semantics Estimation
  • Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
  • Experiments of Semantic Decoding: Effectiveness of Relations
  • Experiments for Structure Learning: Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart – Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLT'14c]
  • Intent Prediction – Single-Turn Request
  • Intent Prediction – Multi-Turn Interaction [ICMI'15]
  • Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q & A
50

Bayesian Personalized Ranking for MF Model implicit feedback

not treat unobserved facts as negative samples (true or false) give observed facts higher scores than unobserved facts

Objective

1

119891 +iquest iquest119891 minus119891 minus

The objective is to learn a set of well-ranked semantic slots per utterance

119906119909

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

51

Ontology Induction

SLUFw Fs

Structure Learning

times

1

Utterance 1i would like a cheap restaurant

Word Observation Slot Candidate

Train

hellip

cheap restaurant foodexpensiveness

1

locale_by_use

11

find a restaurant with chinese foodUtterance 2

1 1

food

1 1

1

Test1 9790 9585

Ontology Induction

show me a list of cheap restaurantsTest Utterance

Matrix Factorization SLU (MF-SLU)

MF-SLU can estimate probabilities for slot candidates given test utterances

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

52

Semantic Decoding [ACL-IJCNLPrsquo15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015

SLU Model

target=ldquorestaurantrdquoprice=ldquocheaprdquo

ldquocan I have a cheap restaurantrdquoFrame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology InductionFw Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation

Idea utilize the acquired knowledge to decode utterance semantics (fully unsupervised)

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

53

Experimental Setup Dataset Cambridge University SLU Corpus

Restaurant recommendation (WER = 37) 2166 dialogues 15453 utterances dialogue slot addr area food name phone postcode price range task type

Metric MAP of all estimated slot probabilities over all utterancesThe mapping table between induced and reference slots

Henderson et al Discriminative spoken language understanding using word confusion networks in Proc of SLT 2012

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

54

Experiments of Semantic DecodingQuality of Semantics Estimation

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

Approach ASR TranscriptsBaseline

SLUSupport Vector Machine 325 366

Multinomial Logistic Regression 340 388

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

55

Experiments of Semantic DecodingQuality of Semantics Estimation

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

The MF-SLU effectively models implicit information to decode semantics

The structure information further improves the results

Approach ASR Transcripts

Baseline SLU

Support Vector Machine 325 366Multinomial Logistic Regression 340 388

Proposed MF-SLU

Feature Model 376 453

Feature Model +Knowledge Graph Propagation

435

(+279)534

(+376)

the result is significantly better than the MLR with p lt 005 in t-test

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

56

Experiments of Semantic DecodingEffectiveness of Relations

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

In the integrated structure information both semantic and dependency relations are useful for understanding

Approach ASR Transcripts

Feature Model 376 453

Feature + Knowledge Graph Propagation

Semantic 414 516

Dependency 416 490

All 435 (+157) 534 (+179)

the result is significantly better than the MLR with p lt 005 in t-test

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Experiments for Structure LearningRelation Discovery Analysis

Discover inter-slot relations connecting important slot pairs

The reference ontology with the most frequent syntactic dependencies

locale_by_use

food expensiveness

seeking

relational_quantity

PREP_FOR

PREP_FOR

NN AMOD

AMOD

AMODdesiring

DOBJ

type

food pricerange

DOBJ

AMOD AMOD

AMOD

taskarea

PREP_IN

The automatically learned domain ontology aligns well with the reference one

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 57

The data-driven one is more objective while expert-annotated one is more subjective

58

Contributions of Semantic Decoding

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge

MF-SLU for Semantic Decoding is able to1) unify the automatically

acquired knowledge2) adapt to a domain-

specific setting 3) and then allows

systems to model implicit semantics for better understanding

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

59

Low- and High-Level Understanding Semantic concepts for individual utterances do not consider high-level semantics (user intents)

The follow-up behaviors usually correspond to user intents

price=ldquocheaprdquo target=ldquorestaurantrdquo

SLU Model

ldquocan i have a cheap restaurantrdquo

intent=navigation

restaurant=ldquolegumerdquo time=ldquotonightrdquo

SLU Model

ldquoi plan to dine in legume tonightrdquo

intent=reservation

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

60

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart ndash Intent Prediction


61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app

Output: the apps supporting the required functionality

Intent identification over popular domains in Google Play

"please dial a phone call to alex" → Skype, Hangout, etc.

Intent Prediction of Mobile Apps [SLT'14c]

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.


63

Intent Prediction – Single-Turn Request

Input: single-turn request

Output: apps that are able to support the required functionality

[Figure: feature-enriched matrix relating word observations (contact, message, email, ...) and intended apps (Gmail, Outlook, Skype, ...). Training rows come from app descriptions (e.g. Outlook: "... your email calendar contacts ...", Gmail: "... check and send emails msgs ...") and self-train utterances retrieved by IR over app descriptions; the test utterance "i would like to contact alex" is enriched with semantic features such as "communication" before reasoning with the feature-enriched MF]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
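The unification step can be pictured as concatenating word observations, enriched semantic features, and app labels into one matrix row. A minimal sketch, with invented feature names and weights for illustration only:

```python
# Hypothetical sketch: build one feature-enriched matrix row for the utterance
# "i would like to contact alex" (all feature names and weights are invented).
word_features = {"contact": 1}                   # observed words
enriched = {"semantics:communication": 0.9}      # automatically inferred semantics
intended_apps = {"app:Skype": 1}                 # label side of the row

columns = ["contact", "message", "email",
           "semantics:communication", "app:Gmail", "app:Outlook", "app:Skype"]
row = {**word_features, **enriched, **intended_apps}
vector = [row.get(c, 0) for c in columns]
print(vector)  # → [1, 0, 0, 0.9, 0, 0, 1]
```

Every row of the matrix, whether from an app description, a self-train utterance, or a test utterance, is assembled the same way, so MF can reason over all feature types jointly.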


64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

Challenge: language ambiguity. Cues: 1) user preference; 2) app-level contexts

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com.

"send to vivian" → Email? Message? (Communication)

Idea: behavioral patterns in history (the previous turn) can help intent prediction.


65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

[Figure: feature-enriched matrix over user utterances, lexical features (photo, check, camera, tell, send, email, chrome), behavior history, and intended apps. Training dialogues include "take this photo / tell vivian this is me in the lab" (CAMERA → IM) and "check my grades on website / send an email to professor" (CHROME → EMAIL); the test dialogue "take a photo of this / send it to alice" is reasoned over with the feature-enriched MF]

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction


66

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix     ASR (LM)   Transcripts (LM)
Word Observation   25.1       26.1

LM = LM-based IR model (unsupervised)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix     ASR (MLR)  Transcripts (MLR)
Word Observation   52.1       55.5

MLR = multinomial logistic regression (supervised)

Experiments for Intent Prediction


67

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix     ASR: LM / MF-SLU       Transcripts: LM / MF-SLU
Word Observation   25.1 / 29.2 (+16.2%)   26.1 / 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix     ASR: MLR / MF-SLU      Transcripts: MLR / MF-SLU
Word Observation   52.1 / 52.7 (+1.2%)    55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction especially for noisy data

Experiments for Intent Prediction


68

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                          ASR: LM / MF-SLU       Transcripts: LM / MF-SLU
Word Observation                        25.1 / 29.2 (+16.2%)   26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics        32.0 / -               33.3 / -
Word + Type-Embedding-Based Semantics   31.5 / -               32.9 / -

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix               ASR: MLR / MF-SLU      Transcripts: MLR / MF-SLU
Word Observation             52.1 / 52.7 (+1.2%)    55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns   53.9 / -               56.6 / -

Semantic enrichment provides rich cues to improve performance

Experiments for Intent Prediction


69

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                          ASR: LM / MF-SLU       Transcripts: LM / MF-SLU
Word Observation                        25.1 / 29.2 (+16.2%)   26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics        32.0 / 34.2 (+6.8%)    33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics   31.5 / 32.2 (+2.1%)    32.9 / 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix               ASR: MLR / MF-SLU      Transcripts: MLR / MF-SLU
Word Observation             52.1 / 52.7 (+1.2%)    55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns   53.9 / 55.7 (+3.3%)    56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics

Experiments for Intent Prediction


70

[Diagram: Knowledge Acquisition (Ontology Induction, Structure Learning) and SLU Modeling (Semantic Decoding, Intent Prediction)]

Contributions of Intent Prediction

Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.


71

Personal Intelligent Architecture

Reactive Assistance: ASR → LU → Dialog → LG → TTS
Proactive Assistance: Inferences, User Modeling, Suggestions
Data: back-end databases, services, and client signals
Device/Service End-points: Phone, PC, Xbox, Web Browser, Messaging Apps
User Experience: "call taxi"


72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


73

Conclusions: This work shows the feasibility of and potential for improving the generalization, maintenance efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances; better high-level intent prediction about follow-up behaviors.


74

Future Work

Apply the proposed technology to domain discovery: find domains not covered by current systems but that users are interested in, to guide which domains to develop next.

Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition, both of which feed SLU modeling.


75

[Figure: extending MF toward a deep architecture: word sequence x (w1, w2, ..., wd) → word vectors lw → convolutional layer lc (convolution matrix Wc) → pooling → utterance vector lf → knowledge graph propagation layer lp (propagation matrix Wp) → semantic layer y (semantic projection matrix Ws), producing scores R(U, S1), ..., R(U, Sn) and posteriors P(S1 | U), ..., P(Sn | U) over slot candidates]

Towards Unsupervised Deep Learning


Treating MF as a one-layer neural net, we can add more layers to the model, moving toward unsupervised deep learning.

76

Take Home Message

Big data is available without annotations. Challenge: how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI turns language into action: understand voice commands to control music, lights, etc., or teach the assistant to let friends in by face recognition.


Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A. Thanks for your attention!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.


  • Statistical Learning from Dialogues for Intelligent Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants?
  • Why do we need them?
  • Why do we need them? (2)
  • Why do companies care?
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymax's intelligence?
  • SDS Architecture
  • Interaction Example
  • SDS Process – Available Domain Ontology
  • SDS Process – Available Domain Ontology (2)
  • SDS Process – Available Domain Ontology (3)
  • SDS Process – Spoken Language Understanding (SLU)
  • SDS Process – Spoken Language Understanding (SLU) (2)
  • SDS Process – Dialogue Management (DM)
  • SDS Process – Dialogue Management (DM) (2)
  • SDS Process – Dialogue Management (DM) (3)
  • SDS Process – Dialogue Management (DM) (4)
  • SDS Process – Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture – Contributions
  • SDS Flowchart
  • SDS Flowchart – Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLP'15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRU'13, SLT'14a]
  • Ontology Induction [ASRU'13, SLT'14a] (2)
  • 1st Issue: How to adapt generic slots to a domain-specific setting
  • Semantic Decoding [ACL-IJCNLP'15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLP'15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue: How to model the unobserved hidden semantics? Matrix Factorization
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLP'15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding: Quality of Semantics Estimation
  • Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
  • Experiments of Semantic Decoding: Effectiveness of Relations
  • Experiments for Structure Learning: Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart – Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLT'14c]
  • Intent Prediction – Single-Turn Request
  • Intent Prediction – Multi-Turn Interaction [ICMI'15]
  • Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q & A
Page 35: Statistical Learning from Dialogues for Intelligent Assistants

35

SDS Flowchart – Semantic Decoding

[Diagram: Knowledge Acquisition (Ontology Induction, Structure Learning) and SLU Modeling (Semantic Decoding, Intent Prediction)]


36

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


37

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[Figure: "can I have a cheap restaurant" → SLU Model → target="restaurant", price="cheap". From an unlabeled collection, frame-semantic parsing drives ontology induction (feature model: Fw, Fs over a semantic KG), and structure learning over lexical and semantic KGs yields the knowledge graph propagation model (word relation model Rw, slot relation model Rs); MF-SLU performs SLU modeling by matrix factorization to produce the semantic representation]


38

Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory; words/phrases can be represented as frames. In "low fat milk", "milk" evokes the "food" frame and "low fat" fills the descriptor frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences.


39

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

"can i have a cheap restaurant" → frames: capability, expensiveness, locale_by_use. Each frame is a slot candidate; expensiveness and locale_by_use are good domain-specific candidates.

1st Issue: differentiate domain-specific frames from generic frames for SDSs.

Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.


40

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

[Figure: word-observation / slot-candidate matrix: training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" over words (cheap, restaurant, food) and induced slots (expensiveness, locale_by_use, food); frame-semantic parsing fills the observed cells, and the test utterance "show me a list of cheap restaurants" receives estimated slot probabilities]

Idea: increase weights of domain-specific slots and decrease weights of others.


41

1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.

Word Relation Model (word relation matrix) and Slot Relation Model (slot relation matrix)

[Figure: the word and slot relation matrices multiply the word-observation / slot-candidate matrix over the training utterances and the test utterance for slot induction]

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph so that domain-specific wordsslots have higher scores after matrix multiplication
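The propagation step itself is just a matrix-vector product. A minimal sketch with an invented three-node graph and made-up edge weights (not the actual learned matrices):

```python
import numpy as np

# Hypothetical 3-node graph: "cheap", "restaurant", and the generic word "like".
# Edge weights (invented) encode how strongly related two nodes are.
R = np.array([
    [1.0, 0.8, 0.1],   # cheap is strongly related to restaurant
    [0.8, 1.0, 0.1],
    [0.1, 0.1, 1.0],   # "like" is generic, so it is weakly connected
])

scores = np.array([1.0, 1.0, 1.0])  # initial observation scores
propagated = R @ scores             # each node accumulates its neighbors' scores

# Domain-specific nodes end up with higher scores than the generic one.
assert propagated[0] > propagated[2] and propagated[1] > propagated[2]
```

With uniform initial scores, the well-connected domain nodes receive 1.9 while the generic node only reaches 1.2, which is the weighting effect the propagation model relies on.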

[Figure: slot-based semantic knowledge graph linking capability, locale_by_use, food, expensiveness, seeking, relational_quantity, and desiring; training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food", test utterance "show me a list of cheap restaurants"]


42

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[Figure: "can I have a cheap restaurant" → SLU Model → target="restaurant", price="cheap". From an unlabeled collection, frame-semantic parsing drives ontology induction (feature model: Fw, Fs over a semantic KG), and structure learning over lexical and semantic KGs yields the knowledge graph propagation model (word relation model Rw, slot relation model Rs); MF-SLU performs SLU modeling by matrix factorization to produce the semantic representation]


43

Knowledge Graph Construction: syntactic dependency parsing on utterances

Example: "can i have a cheap restaurant" parsed with dependencies (nsubj, ccomp, dobj, det, amod); the words evoke the slots capability, expensiveness, and locale_by_use.

[Figure: a word-based lexical knowledge graph over the words (can, i, have, a, cheap, restaurant) and a slot-based semantic knowledge graph over the slots (capability, locale_by_use, expensiveness)]


44

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

Dependency-based word embeddings and dependency-based slot embeddings are trained on the dependency-parsed utterances (e.g. "can i have a cheap restaurant" with nsubj, ccomp, dobj, det, amod), yielding vectors for words such as "can" and "have" and for slots such as "expensiveness" and "capability".

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.


45

Edge Weight Measurement: compute edge weights to represent relation importance

Slot-to-slot semantic relation: similarity between slot embeddings
Slot-to-slot dependency relation: dependency score between slot embeddings
Word-to-word semantic relation: similarity between word embeddings
Word-to-word dependency relation: dependency score between word embeddings
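The similarity-based weights can be sketched with cosine similarity between embedding vectors. The toy vectors below are invented; real embeddings come from the dependency-based training above:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical toy embeddings for illustration only.
emb = {
    "cheap":      [0.9, 0.1, 0.0],
    "expensive":  [0.8, 0.2, 0.1],
    "restaurant": [0.0, 0.1, 0.9],
}

# A semantic edge between two word nodes is weighted by embedding similarity,
# so related words ("cheap", "expensive") get a heavier edge.
assert cosine(emb["cheap"], emb["expensive"]) > cosine(emb["cheap"], emb["restaurant"])
```

The dependency-score weights are computed analogously, just with the dependency-based vectors instead of the plain semantic ones.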



46

Knowledge Graph Propagation Model

Word Relation Model (word relation matrix Rw) and Slot Relation Model (slot relation matrix Rs)

[Figure: Rw and Rs multiply the word-observation / slot-candidate matrix for slot induction over the train and test utterances]

Structure information is integrated to make the self-training data more reliable


47

Semantic Decoding [ACL-IJCNLP'15]

[Figure: ontology induction (Fw, Fs) and structure learning feed the word-observation / slot-candidate matrix over the training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food"; for the test utterance "show me a list of cheap restaurants", some semantics remain unobserved (hidden)]

2nd Issue: unobserved semantics may benefit understanding.


48

Reasoning with Matrix Factorization: Feature Model + Knowledge Graph Propagation Model

[Figure: the word relation matrix Rw and slot relation matrix Rs are applied to the word-observation / slot-candidate matrix; MF fills the unobserved cells of the test utterance with estimated probabilities]

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.


49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and words/slots, respectively. The product of the two matrices fills in the probabilities of the hidden semantics:

M (|U| × (|W|+|S|)) ≈ (|U| × d) · (d × (|W|+|S|))

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
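The low-rank completion idea can be sketched with plain gradient descent on a toy matrix. All numbers below are invented for illustration; the actual model factorizes the feature-enriched matrix built above:

```python
import numpy as np

# Toy utterance-by-(word+slot) matrix; 1.0 = observed feature,
# 0.0 = known absent, np.nan = unobserved (hidden) entry.
M = np.array([
    [1.0, 1.0, 0.0, 1.0],
    [1.0, 0.0, 1.0, np.nan],
    [0.0, 1.0, np.nan, 1.0],
])
observed = ~np.isnan(M)

d = 2  # latent dimension
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(M.shape[0], d))  # |U| x d utterance factors
V = rng.normal(scale=0.1, size=(M.shape[1], d))  # (|W|+|S|) x d word/slot factors

lr, reg = 0.05, 0.01
for _ in range(5000):
    E = np.where(observed, M - U @ V.T, 0.0)  # error on observed cells only
    U = U + lr * (E @ V - reg * U)
    V = V + lr * (E.T @ U - reg * V)

completed = U @ V.T  # the low-rank product also fills in the hidden entries
```

After fitting, `completed` reproduces the observed cells closely while assigning scores to the `np.nan` cells, which is exactly how the hidden semantics get their probabilities.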


50

Bayesian Personalized Ranking for MF: model implicit feedback

Do not treat unobserved facts as negative samples (true or false); instead, give observed facts higher scores than unobserved facts.

Objective: maximize the sum of ln σ(f⁺ − f⁻) over pairs of an observed fact f⁺ and an unobserved fact f⁻. The objective is to learn a set of well-ranked semantic slots per utterance.
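The pairwise objective can be sketched directly; the scores below are invented, and the sign is flipped so the objective is written as a loss to minimize:

```python
import numpy as np

def bpr_loss(f_pos, f_neg):
    """Pairwise BPR objective as a loss: -sum ln sigmoid(f_pos - f_neg).
    Minimizing it pushes observed facts (f_pos) above unobserved ones (f_neg)."""
    x = np.asarray(f_pos, dtype=float) - np.asarray(f_neg, dtype=float)
    return float(-np.sum(np.log(1.0 / (1.0 + np.exp(-x)))))

# Invented scores: two (observed, unobserved) slot pairs for one utterance.
well_ranked = bpr_loss([2.0, 2.0], [-1.0, 0.5])    # observed slots score higher
badly_ranked = bpr_loss([-1.0, 0.5], [2.0, 2.0])   # ranking inverted
assert well_ranked < badly_ranked
```

Because only the score differences enter the loss, the model learns a ranking of slots per utterance rather than absolute true/false labels, matching the implicit-feedback setting.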


51

Matrix Factorization SLU (MF-SLU)

[Figure: ontology induction (Fw, Fs) and structure learning feed the word-observation / slot-candidate matrix over the training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food"; given the test utterance "show me a list of cheap restaurants", MF estimates slot probabilities]

MF-SLU can estimate probabilities for slot candidates given test utterances


52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[Figure: "can I have a cheap restaurant" → SLU Model → target="restaurant", price="cheap". From an unlabeled collection, frame-semantic parsing drives ontology induction (feature model: Fw, Fs over a semantic KG), and structure learning over lexical and semantic KGs yields the knowledge graph propagation model (word relation model Rw, slot relation model Rs); MF-SLU performs SLU modeling by matrix factorization to produce the semantic representation]

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).


53

Experimental Setup

Dataset: Cambridge University SLU Corpus, restaurant recommendation (WER = 37%), 2166 dialogues, 15453 utterances. Dialogue slots: addr, area, food, name, phone, postcode, pricerange, task, type.

Metric: MAP of all estimated slot probabilities over all utterances, using a mapping table between induced and reference slots.

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
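The MAP metric can be made concrete with a small sketch; the slot rankings and gold sets below are invented for illustration:

```python
def average_precision(ranked, relevant):
    """Precision averaged over the ranks where a relevant item appears."""
    hits, precisions = 0, []
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(ranked_lists, relevant_sets):
    pairs = list(zip(ranked_lists, relevant_sets))
    return sum(average_precision(r, s) for r, s in pairs) / len(pairs)

# Invented example: slots ranked by estimated probability for two utterances.
ranked = [["food", "area", "pricerange"], ["area", "food"]]
gold = [{"food", "pricerange"}, {"food"}]
print(round(mean_average_precision(ranked, gold), 3))  # → 0.667
```

Each utterance's slot candidates are ranked by their estimated probability, scored against the reference slots via the mapping table, and the per-utterance average precisions are averaged into MAP.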


54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                        ASR    Transcripts
Baseline SLU  Support Vector Machine            32.5   36.6
Baseline SLU  Multinomial Logistic Regression   34.0   38.8


55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                                ASR            Transcripts
Baseline SLU      Support Vector Machine                32.5           36.6
Baseline SLU      Multinomial Logistic Regression       34.0           38.8
Proposed MF-SLU   Feature Model                         37.6           45.3
Proposed MF-SLU   Feature Model + KG Propagation        43.5 (+27.9%)  53.4 (+37.6%)

The MF-SLU effectively models implicit information to decode semantics, and the structure information further improves the results. The result is significantly better than the MLR baseline (p < 0.05, t-test).


56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                ASR            Transcripts
Feature Model                           37.6           45.3
Feature + KG Propagation  Semantic      41.4           51.6
Feature + KG Propagation  Dependency    41.6           49.0
Feature + KG Propagation  All           43.5 (+15.7%)  53.4 (+17.9%)

In the integrated structure information, both semantic and dependency relations are useful for understanding. The result is significantly better than the MLR baseline (p < 0.05, t-test).



between various features

3) and create personalized models by leveraging contextual behaviors

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

71

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data Back-end Data

Bases Services and Client Signals

DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)

User Experienceldquocall taxirdquo

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

72

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

73

Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

74

Future Work Apply the proposed technology to domain discovery

not covered by the current systems but users are interested in guide the next developed domains

Improve the proposed approach by handling the uncertainty

SLUSLUModelingASR Knowledge

Acquisitionrecognition

errorsunreliable knowledge

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

75

d d d

U S1 S2

P(S1 | U) P(S2 | U)

hellip

Semantic RelationPosterior Probability

Utterance

Slot Candidate

hellip

w1 w2 wdWord Sequence x

Word Vector lw

Pooling Operation

R(U S1) R(U S2)

Knowledge Graph Propagation Matrix Wp

Semantic Projection Matrix Ws

Semantic Layer y

Knowledge Graph Propagation Layer lp

d

Sn

P(Sn | U)

Utterance Vector lf

hellip

R(U Sn)

Slot Vector lf

Convolution Matrix Wc

Convolutional Layer lc

Towards Unsupervised Deep Learning

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning

76

Take Home Message Available big data wo annotations

Challenge how to acquire and organize important knowledge and further utilize it for applications

Language understanding for AI

language action understand voice to control music lights etc teach to let friends in by face recognition etc

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q amp ATHANKS FOR YOUR ATTENTIONS

bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)

bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

  • Statistical Learning from Dialogues for Intelligent Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants?
  • Why do we need them?
  • Why do we need them? (2)
  • Why do companies care?
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymax's intelligence?
  • SDS Architecture
  • Interaction Example
  • SDS Process – Available Domain Ontology
  • SDS Process – Available Domain Ontology (2)
  • SDS Process – Available Domain Ontology (3)
  • SDS Process – Spoken Language Understanding (SLU)
  • SDS Process – Spoken Language Understanding (SLU) (2)
  • SDS Process – Dialogue Management (DM)
  • SDS Process – Dialogue Management (DM) (2)
  • SDS Process – Dialogue Management (DM) (3)
  • SDS Process – Dialogue Management (DM) (4)
  • SDS Process – Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture – Contributions
  • SDS Flowchart
  • SDS Flowchart – Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLP'15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRU'13, SLT'14a]
  • Ontology Induction [ASRU'13, SLT'14a] (2)
  • 1st Issue: How to adapt generic slots to a domain-specific setting
  • Semantic Decoding [ACL-IJCNLP'15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLP'15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue: How to model the unobserved hidden semantics? Matrix Factorization
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLP'15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding: Quality of Semantics Estimation
  • Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
  • Experiments of Semantic Decoding: Effectiveness of Relations
  • Experiments for Structure Learning: Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart – Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLT'14c]
  • Intent Prediction – Single-Turn Request
  • Intent Prediction – Multi-Turn Interaction [ICMI'15]
  • Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q & A
Page 36: Statistical Learning from Dialogues for Intelligent Assistants

36

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

37

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Example: "can I have a cheap restaurant" → SLU Model → target="restaurant", price="cheap"

[Diagram: an unlabeled collection is processed by frame-semantic parsing; ontology induction over a semantic knowledge graph produces the feature model (Fw, Fs), while structure learning over lexical and semantic knowledge graphs produces the word relation model Rw and the slot relation model Rs of the knowledge graph propagation model; MF-SLU (SLU modeling by matrix factorization) combines them into the semantic representation]

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015

38

Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory; words/phrases can be represented as frames. For example, in "low fat milk", "milk" evokes the "food" frame, and "low fat" fills the descriptor frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences.

39

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

"can i have a cheap restaurant" → evoked frames: capability, expensiveness, locale_by_use (the latter two are good slot candidates)

1st Issue: differentiate domain-specific frames from generic frames for SDSs.

Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014

40

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

[Figure: a word-observation / slot-candidate matrix. Training utterances such as "i would like a cheap restaurant" and "find a restaurant with chinese food" fill binary word features (cheap, restaurant, food) and frame-semantic slot candidates (expensiveness, locale_by_use, food); the test utterance "show me a list of cheap restaurants" receives estimated slot scores]

Idea: increase the weights of domain-specific slots and decrease the weights of the others.

41

1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model — assumption: domain-specific words/slots have more dependencies to each other.

[Figure: the word-observation / slot-candidate matrix is multiplied by a word relation matrix (word relation model) and a slot relation matrix (slot relation model); after multiplication, domain-specific slots such as expensiveness, locale_by_use, and food score higher than generic ones such as capability, seeking, desiring, and relational_quantity for the test utterance "show me a list of cheap restaurants"]

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
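The propagation idea on this slide can be sketched as a simple score-diffusion step over a weighted graph. This is a minimal illustration, not the paper's exact formulation; the slot names and edge weights below are made up:

```python
def propagate(scores, graph, alpha=0.5):
    """One propagation step over a weighted knowledge graph: each node keeps
    alpha of its own score and receives (1 - alpha) of a weighted average of
    its neighbors' scores, so well-connected (domain-specific) nodes gain."""
    out = {}
    for node, score in scores.items():
        nbrs = graph.get(node, {})  # neighbor -> edge weight
        total = sum(nbrs.values())
        flow = (sum(scores[n] * w for n, w in nbrs.items()) / total) if total else 0.0
        out[node] = alpha * score + (1 - alpha) * flow
    return out

# Domain-specific slots are interconnected; the generic frame is isolated.
graph = {
    "expensiveness": {"locale_by_use": 1.0, "food": 0.8},
    "locale_by_use": {"expensiveness": 1.0, "food": 0.6},
    "food": {"expensiveness": 0.8, "locale_by_use": 0.6},
    "capability": {},  # generic frame, no neighbors
}
scores = propagate({s: 1.0 for s in graph}, graph)
# after one step the isolated generic frame has decayed relative to the others
```

Iterating this step (or, equivalently, multiplying by a normalized relation matrix, as on the slide) amplifies the gap between connected domain-specific slots and isolated generic ones.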

42

Semantic Decoding [ACL-IJCNLP'15] (recap)

Input: user utterances; output: semantic concepts included in each individual utterance, e.g. "can I have a cheap restaurant" → target="restaurant", price="cheap".

[Same architecture diagram as before: frame-semantic parsing and ontology induction feed the feature model (Fw, Fs); structure learning over the lexical and semantic knowledge graphs feeds the knowledge graph propagation model (Rw, Rs); MF-SLU combines them into the semantic representation]

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015

43

Knowledge Graph Construction

Syntactic dependency parsing on utterances, e.g. "can i have a cheap restaurant" (nsubj, ccomp, dobj, det, amod), with evoked frames capability, expensiveness, locale_by_use.

Word-based lexical knowledge graph: nodes are words (can, i, have, a, cheap, restaurant) connected by word-level edges.

Slot-based semantic knowledge graph: nodes are slots (capability, locale_by_use, expensiveness) connected by slot-level edges.

44

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

Dependency-based word embeddings (e.g. vectors for "can" and "have") are trained from the dependency parse of "can i have a cheap restaurant" (nsubj, ccomp, dobj, det, amod).

Dependency-based slot embeddings (e.g. vectors for expensiveness and capability) are trained from the same parses with words replaced by their evoked frames (capability, expensiveness, locale_by_use).

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014
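Dependency-based embeddings replace linear bag-of-words contexts with contexts read off the parse. A minimal sketch of the (word, context) pair extraction in the spirit of Levy and Goldberg (2014); the edge list and relation names below are illustrative:

```python
def dependency_contexts(edges):
    """Turn (head, relation, dependent) parse triples into (word, context)
    training pairs: a head sees the context 'rel_dependent', and the
    dependent sees the inverse context 'relI_head'."""
    pairs = []
    for head, rel, dep in edges:
        pairs.append((head, "%s_%s" % (rel, dep)))
        pairs.append((dep, "%sI_%s" % (rel, head)))
    return pairs

# Illustrative parse edges for "can i have a cheap restaurant"
edges = [
    ("have", "nsubj", "i"),
    ("can", "ccomp", "have"),
    ("have", "dobj", "restaurant"),
    ("restaurant", "det", "a"),
    ("restaurant", "amod", "cheap"),
]
pairs = dependency_contexts(edges)
```

Feeding these pairs to a skip-gram-style trainer yields embeddings in which syntactically related words (or frames, for the slot graph) end up close together.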

45

Edge Weight Measurement

Compute edge weights to represent relation importance:
- Slot-to-slot semantic relation: similarity between slot embeddings
- Slot-to-slot dependency relation: dependency score between slot embeddings
- Word-to-word semantic relation: similarity between word embeddings
- Word-to-word dependency relation: dependency score between word embeddings

[Figure: a slot-layer graph (s1, s2, s3) over a word-layer graph (w1–w7), with the semantic and dependency relations combined]
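The similarity-based edge weights above can be sketched as cosine similarity between the trained embeddings. The 3-dimensional vectors below are toy values for illustration, not trained embeddings:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two embedding vectors, used as the
    semantic-relation edge weight."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(a * a for a in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy slot embeddings (illustrative values only).
emb = {
    "expensiveness": [0.9, 0.1, 0.2],
    "locale_by_use": [0.8, 0.2, 0.1],
    "capability":    [0.1, 0.9, 0.0],
}
w_domain = cosine(emb["expensiveness"], emb["locale_by_use"])   # strong edge
w_generic = cosine(emb["expensiveness"], emb["capability"])     # weak edge
```

Two domain-specific slots that occur in similar dependency contexts get a heavy edge, while a generic frame gets a light one — exactly the property the propagation model exploits.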

46

Knowledge Graph Propagation Model

[Figure: the training word-observation / slot-candidate matrix (words: cheap, restaurant, food; slots: expensiveness, locale_by_use, food) is multiplied by the word relation matrix Rw (word relation model) and the slot relation matrix Rs (slot relation model) for slot induction]

Structure information is integrated to make the self-training data more reliable.

47

Semantic Decoding [ACL-IJCNLP'15]

[Figure: ontology induction feeds the feature matrices Fw, Fs into SLU, together with structure learning; in the matrix built from training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food", the test utterance "show me a list of cheap restaurants" has observed words but unobserved slot values — hidden semantics]

2nd Issue: unobserved semantics may benefit understanding.

48

Reasoning with Matrix Factorization: Feature Model + Knowledge Graph Propagation Model

[Figure: the word-observation / slot-candidate matrix, after multiplication by the word and slot relation matrices Rw and Rs for slot induction, is completed by matrix factorization; missing cells receive estimated probabilities — high for hidden but plausible slots, low for unlikely ones]

Idea: MF completes a partially-missing matrix based on a low-rank latent-semantics assumption, so it can model hidden semantics and is more robust to noisy data.

49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and for words/slots, respectively; the product of the two matrices fills in the probabilities of the hidden semantics:

M ≈ U · V, where M is |U| × (|W| + |S|), U is |U| × d, and V is d × (|W| + |S|).

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009
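The low-rank completion idea can be sketched with plain-Python SGD on a toy utterance × (word + slot) matrix; `None` marks unobserved cells. The matrix dimensions and entries below are illustrative, not the corpus data:

```python
import random

def mf_complete(M, d=2, lr=0.05, epochs=3000, seed=0):
    """Factor a partially observed matrix M (None = unobserved) into
    U (rows x d) and V (d x cols) by SGD on the observed cells, then
    return the dense product U.V, which fills in the missing cells."""
    rng = random.Random(seed)
    rows, cols = len(M), len(M[0])
    U = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(rows)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(d)]
    obs = [(i, j, M[i][j]) for i in range(rows)
           for j in range(cols) if M[i][j] is not None]
    for _ in range(epochs):
        for i, j, x in obs:
            err = x - sum(U[i][k] * V[k][j] for k in range(d))
            for k in range(d):
                u, v = U[i][k], V[k][j]
                U[i][k] += lr * err * v
                V[k][j] += lr * err * u
    return [[sum(U[i][k] * V[k][j] for k in range(d))
             for j in range(cols)] for i in range(rows)]

# columns: words (cheap, restaurant, food) + slots (expensiveness, locale_by_use, food)
M = [
    [1, 1, None, 1, 1, None],        # "i would like a cheap restaurant"
    [None, 1, 1, None, 1, 1],        # "find a restaurant with chinese food"
    [1, 1, None, None, None, None],  # test: "show me a list of cheap restaurants"
]
F = mf_complete(M)  # F[2][3], F[2][4] now carry scores for the hidden slots
```

The observed cells are reconstructed almost exactly, and the unobserved slot cells of the test row receive real-valued scores from the shared latent factors — the "hidden semantics" the slide refers to.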

50

Bayesian Personalized Ranking for MF

Model implicit feedback: do not treat unobserved facts as negative samples (true or false); instead, give observed facts higher scores than unobserved facts.

Objective: for each utterance x, maximize Σ ln σ(f⁺ − f⁻) over pairs of an observed fact f⁺ and an unobserved fact f⁻.

The objective is to learn a set of well-ranked semantic slots per utterance.
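The pairwise objective can be sketched directly: sum ln σ(f⁺ − f⁻) over observed/unobserved slot pairs for one utterance. The slot names and scores below are illustrative:

```python
from math import exp, log

def bpr_objective(scores, observed):
    """Sum of ln sigmoid(f_plus - f_minus) over all pairs of an observed
    slot (f_plus) and an unobserved slot (f_minus) for one utterance.
    Larger (closer to 0) is better: observed slots should outrank the rest."""
    sigmoid = lambda x: 1.0 / (1.0 + exp(-x))
    pos = [s for slot, s in scores.items() if slot in observed]
    neg = [s for slot, s in scores.items() if slot not in observed]
    return sum(log(sigmoid(fp - fn)) for fp in pos for fn in neg)

scores = {"expensiveness": 2.0, "locale_by_use": 1.5, "capability": -1.0}
good = bpr_objective(scores, observed={"expensiveness", "locale_by_use"})
bad = bpr_objective(scores, observed={"capability"})
```

A model whose observed slots already outrank the unobserved ones gets a much higher (less negative) objective, which is exactly what gradient ascent on this quantity encourages.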

51

Matrix Factorization SLU (MF-SLU)

[Figure: ontology induction fills the word-observation / slot-candidate matrix from training utterances ("i would like a cheap restaurant", "find a restaurant with chinese food"); with the feature matrices Fw, Fs, structure learning, and MF, the test utterance "show me a list of cheap restaurants" receives probability estimates for its slot candidates]

MF-SLU can estimate probabilities for slot candidates given test utterances.

52

Semantic Decoding [ACL-IJCNLP'15] (recap)

Input: user utterances; output: semantic concepts included in each individual utterance, e.g. "can I have a cheap restaurant" → target="restaurant", price="cheap".

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015

53

Experimental Setup

Dataset: Cambridge University SLU Corpus — restaurant recommendation (WER = 37%); 2,166 dialogues; 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type.

Metric: MAP of all estimated slot probabilities over all utterances, using a mapping table between induced and reference slots.

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012
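The MAP metric averages, over all utterances, the average precision of the ranked slot list against the reference slots. A minimal sketch (the slot names below are illustrative):

```python
def average_precision(ranked, relevant):
    """AP of one ranked slot list: mean of precision@k taken at every
    rank k where a reference slot appears."""
    hits, precisions = 0, []
    for k, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, references):
    """MAP over all utterances' (ranking, reference-slot-set) pairs."""
    return sum(average_precision(r, g)
               for r, g in zip(rankings, references)) / len(rankings)

ap = average_precision(["food", "area", "pricerange"], {"food", "pricerange"})
# hits at ranks 1 and 3: (1/1 + 2/3) / 2 = 5/6
```

Because AP rewards placing reference slots near the top, MAP directly measures how well the estimated slot probabilities rank the true semantics.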

54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus; metric: MAP of all estimated slot probabilities for all utterances.

  Approach (Baseline SLU)                         ASR     Transcripts
  Support Vector Machine                          32.5    36.6
  Multinomial Logistic Regression                 34.0    38.8

55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus; metric: MAP of all estimated slot probabilities for all utterances.

  Approach                                        ASR             Transcripts
  Baseline SLU: Support Vector Machine            32.5            36.6
  Baseline SLU: Multinomial Logistic Regression   34.0            38.8
  Proposed MF-SLU: Feature Model                  37.6            45.3
  Proposed MF-SLU: Feature Model +
    Knowledge Graph Propagation                   43.5* (+27.9%)  53.4* (+37.6%)

  (*: significantly better than the MLR baseline, p < 0.05, t-test)

The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results.

56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus; metric: MAP of all estimated slot probabilities for all utterances.

  Approach                                        ASR             Transcripts
  Feature Model                                   37.6            45.3
  Feature + KG Propagation: Semantic              41.4            51.6
  Feature + KG Propagation: Dependency            41.6            49.0
  Feature + KG Propagation: All                   43.5* (+15.7%)  53.4* (+17.9%)

  (*: significantly better than the MLR baseline, p < 0.05, t-test)

In the integrated structure information, both semantic and dependency relations are useful for understanding.

Experiments for Structure Learning: Relation Discovery Analysis

Discovered inter-slot relations connect important slot pairs. [Figure: the induced ontology links locale_by_use, food, and expensiveness via NN/AMOD edges, with seeking (DOBJ), desiring, and relational_quantity (PREP_FOR) attached; the reference ontology links type, food, price range, area, and task via the most frequent syntactic dependencies (AMOD, DOBJ, PREP_IN)]

The automatically learned domain ontology aligns well with the reference one; the data-driven ontology is more objective, while the expert-annotated one is more subjective.

57

58

Contributions of Semantic Decoding

[Diagram: knowledge acquisition (ontology induction, structure learning) feeds SLU modeling (semantic decoding, intent prediction)]

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.

MF-SLU for Semantic Decoding is able to: 1) unify the automatically acquired knowledge; 2) adapt to a domain-specific setting; and 3) allow systems to model implicit semantics for better understanding.

59

Low- and High-Level Understanding

Semantic concepts for individual utterances do not consider high-level semantics (user intents); the follow-up behaviors usually correspond to user intents.

"can i have a cheap restaurant" → SLU Model → price="cheap", target="restaurant"; intent=navigation

"i plan to dine in legume tonight" → SLU Model → restaurant="legume", time="tonight"; intent=reservation

60

SDS Flowchart – Intent Prediction

[Diagram: knowledge acquisition (ontology induction, structure learning) and SLU modeling (semantic decoding, intent prediction), now highlighting intent prediction]

61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

62

Intent Prediction of Mobile Apps [SLT'14c] [Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app; output: the apps supporting the required functionality.

Intent identification over popular domains in Google Play, e.g. "please dial a phone call to alex" → Skype, Hangout, etc.

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014

63

Intent Prediction – Single-Turn Request

Input: single-turn request; output: apps that are able to support the required functionality.

[Figure: reasoning with feature-enriched MF. App descriptions retrieved by IR ("… your email calendar contacts …" → Outlook; "… check and send emails, msgs …" → Gmail) serve as self-training utterances; the test utterance "i would like to contact alex" is enriched with the semantic feature "communication", and the matrix over word observations (contact, message, email) and intended apps (Gmail, Outlook, Skype) yields ranked app predictions]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.

64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction; output: apps the user plans to launch.

Challenge: language ambiguity — 1) user preference, 2) app-level contexts. For example, "send to vivian" after the previous turn could mean Email or Message (both communication apps).

Idea: behavioral patterns in history can help intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction; output: apps the user plans to launch.

[Figure: reasoning with feature-enriched MF. Training dialogues pair user utterances with intended apps and behavior history, e.g. "take this photo" → CAMERA, "tell vivian this is me in the lab" → IM; "check my grades on website" → CHROME, "send an email to professor" → EMAIL. Lexical features (photo, check, camera, tell, send, email, chrome) and behavioral-history features (null, camera, chrome, email) form the matrix; for the test dialogue "take a photo of this" / "send it to alice", the model ranks CAMERA and then IM highly]

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

66

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP) — LM-based IR model (unsupervised):

  Feature Matrix          ASR (LM)    Transcripts (LM)
  Word Observation        25.1        26.1

Multi-Turn Interaction, Mean Average Precision (MAP) — multinomial logistic regression (supervised):

  Feature Matrix          ASR (MLR)   Transcripts (MLR)
  Word Observation        52.1        55.5

67

Experiments for Intent Prediction

Single-Turn Request, MAP:

  Feature Matrix          ASR (LM / MF-SLU)       Transcripts (LM / MF-SLU)
  Word Observation        25.1 / 29.2 (+16.2%)    26.1 / 30.4 (+16.4%)

Multi-Turn Interaction, MAP:

  Feature Matrix          ASR (MLR / MF-SLU)      Transcripts (MLR / MF-SLU)
  Word Observation        52.1 / 52.7 (+1.2%)     55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.

68

Experiments for Intent Prediction

Single-Turn Request, MAP:

  Feature Matrix                           ASR (LM / MF-SLU)       Transcripts (LM / MF-SLU)
  Word Observation                         25.1 / 29.2 (+16.2%)    26.1 / 30.4 (+16.4%)
  Word + Embedding-Based Semantics         32.0 / –                33.3 / –
  Word + Type-Embedding-Based Semantics    31.5 / –                32.9 / –

Multi-Turn Interaction, MAP:

  Feature Matrix                ASR (MLR / MF-SLU)      Transcripts (MLR / MF-SLU)
  Word Observation              52.1 / 52.7 (+1.2%)     55.5 / 55.4 (-0.2%)
  Word + Behavioral Patterns    53.9 / –                56.6 / –

Semantic enrichment provides rich cues to improve performance.

69

Experiments for Intent Prediction

Single-Turn Request, MAP:

  Feature Matrix                           ASR (LM / MF-SLU)       Transcripts (LM / MF-SLU)
  Word Observation                         25.1 / 29.2 (+16.2%)    26.1 / 30.4 (+16.4%)
  Word + Embedding-Based Semantics         32.0 / 34.2 (+6.8%)     33.3 / 33.3 (-0.2%)
  Word + Type-Embedding-Based Semantics    31.5 / 32.2 (+2.1%)     32.9 / 34.0 (+3.4%)

Multi-Turn Interaction, MAP:

  Feature Matrix                ASR (MLR / MF-SLU)      Transcripts (MLR / MF-SLU)
  Word Observation              52.1 / 52.7 (+1.2%)     55.5 / 55.4 (-0.2%)
  Word + Behavioral Patterns    53.9 / 55.7 (+3.3%)     56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.

70

Contributions of Intent Prediction

[Diagram: knowledge acquisition (ontology induction, structure learning) feeds SLU modeling (semantic decoding, intent prediction)]

Feature-Enriched MF-SLU for Intent Prediction is able to: 1) unify knowledge at different levels; 2) learn inference relations between various features; and 3) create personalized models by leveraging contextual behaviors.

71

Personal Intelligent Architecture

The user experience, e.g. "call taxi", flows through device/service end-points (phone, PC, Xbox, web browser, messaging apps) to:
- Reactive assistance: ASR, LU, Dialog, LG, TTS
- Proactive assistance: inferences, user modeling, suggestions

backed by data bases, back-end services, and client signals.

72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

73

Conclusions

This work shows the feasibility of and the potential for improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.

74

Future Work

Apply the proposed technology to domain discovery: find domains not covered by the current systems but that users are interested in, to guide the next developed domains.

Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition, both of which feed SLU modeling.

75

Towards Unsupervised Deep Learning

[Architecture figure: a word sequence x = w1 w2 … wd is mapped to word vectors lw, a convolutional layer lc (convolution matrix Wc), and a pooling operation producing utterance and slot vectors lf; a semantic layer y (semantic projection matrix Ws) and a knowledge graph propagation layer lp (propagation matrix Wp) yield semantic relations R(U, S1), R(U, S2), …, R(U, Sn) and posterior probabilities P(S1 | U), P(S2 | U), …, P(Sn | U) for slot candidates S1 … Sn given utterance U.]

Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.

76
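The layered model above can be sketched as a single forward pass. This is a minimal illustration, not the trained system: the layer sizes, random weights, and identity propagation matrix below are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_word, d_conv, d_sem, n_slots = 50, 100, 300, 4

def forward(word_vectors, slot_vectors, Wc, Ws, Wp):
    """One utterance through conv -> pooling -> semantic -> propagation layers."""
    lc = np.tanh(word_vectors @ Wc)          # convolutional layer lc
    lf_u = lc.max(axis=0)                    # max-pooling -> utterance vector lf
    y_u = np.tanh(Ws @ lf_u)                 # semantic layer y
    y_u = Wp @ y_u                           # knowledge graph propagation layer lp
    scores = slot_vectors @ y_u              # semantic relations R(U, Sk)
    return 1.0 / (1.0 + np.exp(-scores))     # posteriors P(Sk | U)

words = rng.normal(size=(6, d_word))         # e.g. "show me a list of cheap restaurants"
slots = rng.normal(size=(n_slots, d_sem))    # slot candidate vectors S1..Sn
Wc = rng.normal(scale=0.1, size=(d_word, d_conv))
Ws = rng.normal(scale=0.1, size=(d_sem, d_conv))
Wp = np.eye(d_sem)                           # identity = no propagation, for illustration
p = forward(words, slots, Wc, Ws, Wp)
print(p.shape)
```

With a real propagation matrix Wp derived from the knowledge graph, the semantic layer output would be smoothed toward related slots before scoring.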

Take Home Message

Big data is available without annotations. Challenge: how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI maps language to action: understand voice commands to control music, lights, etc.; teach the system to let friends in by face recognition, etc.

Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A
THANKS FOR YOUR ATTENTION

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.


  • Statistical Learning from Dialogues for Intelligent Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants?
  • Why do we need them?
  • Why do we need them? (2)
  • Why do companies care?
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymax's intelligence?
  • SDS Architecture
  • Interaction Example
  • SDS Process – Available Domain Ontology
  • SDS Process – Available Domain Ontology (2)
  • SDS Process – Available Domain Ontology (3)
  • SDS Process – Spoken Language Understanding (SLU)
  • SDS Process – Spoken Language Understanding (SLU) (2)
  • SDS Process – Dialogue Management (DM)
  • SDS Process – Dialogue Management (DM) (2)
  • SDS Process – Dialogue Management (DM) (3)
  • SDS Process – Dialogue Management (DM) (4)
  • SDS Process – Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture – Contributions
  • SDS Flowchart
  • SDS Flowchart – Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLP'15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRU'13, SLT'14a]
  • Ontology Induction [ASRU'13, SLT'14a] (2)
  • 1st Issue: How to adapt generic slots to a domain-specific setting
  • Semantic Decoding [ACL-IJCNLP'15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLP'15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue: How to model the unobserved hidden semantics? Matrix Factorization
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLP'15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding: Quality of Semantics Estimation
  • Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
  • Experiments of Semantic Decoding: Effectiveness of Relations
  • Experiments for Structure Learning: Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart – Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLT'14c]
  • Intent Prediction – Single-Turn Request
  • Intent Prediction – Multi-Turn Interaction [ICMI'15]
  • Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q & A
Page 37: Statistical Learning from Dialogues for Intelligent Assistants

37

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[Pipeline figure: "can I have a cheap restaurant" → SLU Model → target="restaurant", price="cheap". An unlabeled collection feeds frame-semantic parsing; ontology induction derives feature models Fw, Fs over a semantic knowledge graph, and structure learning derives word/slot relation models Rw, Rs over lexical and semantic knowledge graphs; together they drive the knowledge graph propagation model and MF-SLU (SLU modeling by matrix factorization), producing the semantic representation.]

38

Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory; words/phrases can be represented as frames. In "low fat milk", "milk" evokes the "food" frame, and "low fat" fills the descriptor frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser trained on manually annotated FrameNet sentences.

39

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

[Example: parsing "can i have a cheap restaurant" yields the frames "capability", "expensiveness", and "locale_by_use" as slot candidates; only "expensiveness" and "locale_by_use" are good domain-specific slots.]

1st Issue: differentiate domain-specific frames from generic frames for SDSs.

Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014.

40

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

[Matrix figure: rows are utterances; columns are word observations ("cheap", "restaurant", "food") and slot candidates (expensiveness, locale_by_use, food). Frame-semantic parsing fills binary entries for the training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and for the test utterance "show me a list of cheap restaurants" (estimated slot scores such as .97 and .95).]

Idea: increase weights of domain-specific slots and decrease weights of others.

41

1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.

[Matrix figure: the word relation matrix (word relation model) and the slot relation matrix (slot relation model) multiply the word-observation/slot-candidate matrix built from the training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants"; slot candidates include capability, locale_by_use, food, expensiveness, seeking, relational_quantity, and desiring.]

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.

42

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

[Pipeline figure, as before: frame-semantic parsing over an unlabeled collection feeds ontology induction (Fw, Fs) and structure learning (Rw, Rs) over lexical and semantic knowledge graphs, driving the knowledge graph propagation model and MF-SLU to produce the semantic representation.]

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

43

Knowledge Graph Construction: syntactic dependency parsing on utterances.

[Example: "can i have a cheap restaurant" parsed with dependencies ccomp, nsubj, dobj, det, and amod, and frames capability, expensiveness, and locale_by_use. The parses yield a word-based lexical knowledge graph (nodes: can, i, have, a, cheap, restaurant) and a slot-based semantic knowledge graph (nodes: capability, locale_by_use, expensiveness).]
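A dependency parse like the one above can be turned into a word-based lexical knowledge graph. This sketch hard-codes one hand-written parse of the example utterance (in practice the arcs would come from any dependency parser) and builds an undirected adjacency map.

```python
# Hand-written dependency arcs for "can i have a cheap restaurant";
# in the real pipeline these come from a syntactic dependency parser.
arcs = [
    ("have", "can", "ccomp"),
    ("have", "i", "nsubj"),
    ("have", "restaurant", "dobj"),
    ("restaurant", "a", "det"),
    ("restaurant", "cheap", "amod"),
]

def build_graph(arcs):
    """Undirected adjacency map: word -> {neighbor: dependency label}."""
    graph = {}
    for head, dep, label in arcs:
        graph.setdefault(head, {})[dep] = label
        graph.setdefault(dep, {})[head] = label
    return graph

graph = build_graph(arcs)
print(sorted(graph["restaurant"]))   # ['a', 'cheap', 'have']
```

The slot-based semantic knowledge graph is built the same way, with frames as nodes instead of words.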

44

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

[Figure: dependency-based word embeddings (e.g., vectors for "can", "have") and dependency-based slot embeddings (e.g., vectors for "expensiveness", "capability") are trained from the dependency-parsed utterance "can i have a cheap restaurant".]

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.

45

Edge Weight Measurement: compute edge weights to represent relation importance.

Slot-to-slot semantic relation: similarity between slot embeddings. Slot-to-slot dependency relation: dependency score between slot embeddings. Word-to-word semantic relation: similarity between word embeddings. Word-to-word dependency relation: dependency score between word embeddings.

[Figure: a word-level graph (w1–w7) and a slot-level graph (s1–s3); each edge weight sums the semantic and the dependency relation scores.]
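The semantic-relation edge weights can be sketched as pairwise cosine similarity between embeddings. The three-dimensional slot vectors below are toy assumptions; the real system uses dependency-based embeddings.

```python
import numpy as np

def cosine_edge_weights(embeddings):
    """Edge weight matrix from pairwise cosine similarity of node embeddings."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    weights = unit @ unit.T
    np.fill_diagonal(weights, 0.0)   # no self-loops in the knowledge graph
    return weights

# Toy slot embeddings for [expensiveness, locale_by_use, capability].
vecs = np.array([[0.9, 0.1, 0.0],
                 [0.8, 0.3, 0.1],
                 [0.0, 0.1, 0.9]])
W = cosine_edge_weights(vecs)
print(W.shape)
```

Here the two restaurant-domain slots end up strongly connected, while the generic "capability" slot gets weak edges; the dependency-relation score would be added on top of this matrix.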

46

Knowledge Graph Propagation Model

[Matrix figure: the word relation matrix Rw and the slot relation matrix Rs multiply the word-observation/slot-candidate training matrix to perform slot induction on the test utterance.]

Structure information is integrated to make the self-training data more reliable.
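The propagation step itself is essentially a matrix product. This hedged sketch uses assumed relation weights: the two domain-specific slots are strongly connected, so the weakly observed one is boosted above the isolated generic slot after one propagation step.

```python
import numpy as np

def propagate(scores, relation, alpha=0.5):
    """One propagation step: mix each node's own score with its neighbors'."""
    row_sums = relation.sum(axis=1, keepdims=True)
    R_hat = np.divide(relation, row_sums,
                      out=np.zeros_like(relation), where=row_sums > 0)
    return (1 - alpha) * scores + alpha * (R_hat @ scores)

# Slots: [expensiveness, locale_by_use, capability]; the first two are
# strongly related (domain-specific), capability is weakly connected (generic).
relation = np.array([[0.0, 0.9, 0.1],
                     [0.9, 0.0, 0.1],
                     [0.1, 0.1, 0.0]])
scores = np.array([1.0, 0.2, 0.2])       # initial slot scores for one utterance
new_scores = propagate(scores, relation)
print(new_scores)
```

After propagation, locale_by_use (second entry) outscores capability (third entry) even though both started at 0.2, matching the intuition that domain-specific slots reinforce each other.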

47

Semantic Decoding [ACL-IJCNLP'15]

[Matrix figure: ontology induction fills word/slot features Fw, Fs for the training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food"; the test utterance "show me a list of cheap restaurants" carries hidden semantics that the observed features miss.]

2nd Issue: unobserved semantics may benefit understanding.

48

Reasoning with Matrix Factorization: Feature Model + Knowledge Graph Propagation Model

[Matrix figure: combining the feature model with the word and slot relation matrices, slot induction fills estimated probabilities (e.g., .97, .90, .95, .85) for both training and test utterances.]

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.

49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and words/slots respectively.

The product of the two matrices fills in the probability of hidden semantics.

[Matrix figure: the |U| × (|W|+|S|) observation matrix is approximated by the product of a |U| × d utterance matrix and a d × (|W|+|S|) word/slot matrix.]

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
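The low-rank completion idea can be sketched with a truncated SVD: a rank-d reconstruction of a partially observed utterance-by-word/slot matrix assigns nonzero scores to unobserved cells. The toy matrix and rank are assumptions for illustration; the actual model is learned with the BPR objective rather than SVD.

```python
import numpy as np

# Toy observation matrix: rows = utterances, cols = words + slot candidates.
# The last row is a test utterance whose slot entries are unobserved (0).
M = np.array([[1., 0., 1., 1., 0.],
              [0., 1., 1., 0., 1.],
              [1., 0., 1., 0., 0.]])

# Rank-2 truncated SVD reconstruction fills in the missing cells.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
d = 2
M_hat = (U[:, :d] * s[:d]) @ Vt[:d, :]

print(np.round(M_hat[2], 2))   # estimated scores, including hidden semantics
```

The reconstruction is the best rank-2 approximation of M, so observed entries stay close to 1 while structurally similar unobserved entries receive nonzero probability mass.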

50

Bayesian Personalized Ranking for MF: model implicit feedback

Do not treat unobserved facts as negative samples (true or false); instead, give observed facts higher scores than unobserved facts.

Objective: for each utterance u, maximize the ranking margin between observed facts f⁺ and unobserved facts f⁻, i.e., maximize Σ ln σ(f⁺ − f⁻).

The objective is to learn a set of well-ranked semantic slots per utterance.
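The BPR objective can be optimized with stochastic gradient ascent on sampled (observed, unobserved) pairs. The sizes, learning rate, and toy observation sets below are assumptions; this only sketches the update rule, not the full MF-SLU model.

```python
import numpy as np

rng = np.random.default_rng(1)
n_utt, n_items, d = 4, 6, 3                    # items = words + slot candidates
P = rng.normal(scale=0.1, size=(n_utt, d))     # utterance latent factors
Q = rng.normal(scale=0.1, size=(n_items, d))   # word/slot latent factors
observed = {0: [1, 2], 1: [2, 4], 2: [0, 2], 3: [3, 5]}  # toy observed facts

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr, reg = 0.05, 0.01
for _ in range(3000):
    u = int(rng.integers(n_utt))
    i = int(rng.choice(observed[u]))           # observed fact f+
    j = int(rng.integers(n_items))             # sample an unobserved fact f-
    while j in observed[u]:
        j = int(rng.integers(n_items))
    x = P[u] @ (Q[i] - Q[j])                   # ranking margin f+ - f-
    g = sigmoid(-x)                            # gradient scale of -ln sigmoid(x)
    p_u = P[u].copy()
    P[u] += lr * (g * (Q[i] - Q[j]) - reg * P[u])
    Q[i] += lr * (g * p_u - reg * Q[i])
    Q[j] += lr * (-g * p_u - reg * Q[j])

scores = P @ Q.T                               # ranking scores per utterance
print(scores.shape)
```

After training, each utterance's observed facts should on average outrank the unobserved ones, which is exactly the well-ranked slot list the objective asks for.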

51

Matrix Factorization SLU (MF-SLU)

[Matrix figure: ontology induction and structure learning build the feature matrix from the training utterances; for the test utterance "show me a list of cheap restaurants", MF-SLU fills estimated slot probabilities (e.g., .97, .90, .95, .85).]

MF-SLU can estimate probabilities for slot candidates given test utterances.

52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

[Pipeline figure, as before: frame-semantic parsing → ontology induction (Fw, Fs) and structure learning (Rw, Rs) → knowledge graph propagation model → MF-SLU → semantic representation.]

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

53

Experimental Setup

Dataset: Cambridge University SLU Corpus – restaurant recommendation (WER = 37%); 2,166 dialogues; 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type.

Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots.

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
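The MAP metric above is the mean, over utterances, of the average precision of each ranked slot list against the reference slots. The rankings and reference sets below are toy assumptions for illustration.

```python
def average_precision(ranked, relevant):
    """AP of one ranked slot list against the set of reference slots."""
    hits, score = 0, 0.0
    for k, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            score += hits / k          # precision at each relevant rank
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, references):
    aps = [average_precision(r, ref) for r, ref in zip(rankings, references)]
    return sum(aps) / len(aps)

# Toy example: two utterances with ranked slot candidates.
rankings = [["expensiveness", "capability", "locale_by_use"],
            ["food", "expensiveness", "area"]]
references = [{"expensiveness", "locale_by_use"}, {"food"}]
print(mean_average_precision(rankings, references))
```

Ranking a generic slot like "capability" above a reference slot lowers the score, which is why MAP rewards models that push domain-specific slots to the top.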

54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach (Baseline SLU)            ASR    Transcripts
Support Vector Machine             32.5   36.6
Multinomial Logistic Regression    34.0   38.8

55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                          ASR             Transcripts
Baseline SLU: Support Vector Machine              32.5            36.6
Baseline SLU: Multinomial Logistic Regression     34.0            38.8
Proposed MF-SLU: Feature Model                    37.6            45.3
Proposed MF-SLU: Feature Model +
  Knowledge Graph Propagation                     43.5 (+27.9%)   53.4 (+37.6%)

The MF-SLU effectively models implicit information to decode semantics, and the structure information further improves the results. (The result is significantly better than the MLR baseline with p < 0.05 in a t-test.)

56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                      ASR             Transcripts
Feature Model                                 37.6            45.3
Feature + Knowledge Graph Propagation
  Semantic relations only                     41.4            51.6
  Dependency relations only                   41.6            49.0
  All relations                               43.5 (+15.7%)   53.4 (+17.9%)

In the integrated structure information, both semantic and dependency relations are useful for understanding. (The result is significantly better than the MLR baseline with p < 0.05 in a t-test.)

Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs.

[Figure: the reference ontology with the most frequent syntactic dependencies – induced slots locale_by_use, food, expensiveness, seeking, desiring, and relational_quantity mapped to reference slots type, food, price range, task, and area, linked by dependencies such as NN, AMOD, DOBJ, PREP_FOR, and PREP_IN.]

The automatically learned domain ontology aligns well with the reference one; the data-driven one is more objective, while the expert-annotated one is more subjective.

57

58

Contributions of Semantic Decoding

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeding SLU Modeling (Semantic Decoding, Intent Prediction).]

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.

59

Low- and High-Level Understanding

Semantic concepts for individual utterances do not consider high-level semantics (user intents); the follow-up behaviors usually correspond to user intents.

[Examples: "can i have a cheap restaurant" → SLU Model → price="cheap", target="restaurant", intent=navigation; "i plan to dine in legume tonight" → SLU Model → restaurant="legume", time="tonight", intent=reservation.]

60

SDS Flowchart – Intent Prediction

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeding SLU Modeling (Semantic Decoding, Intent Prediction), with Intent Prediction highlighted.]

61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

62

Intent Prediction of Mobile Apps [SLT'14c] [Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app

Output: the apps supporting the required functionality

Intent identification over popular domains in Google Play, e.g., "please dial a phone call to alex" → Skype, Hangout, etc.

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.

63

Intent Prediction – Single-Turn Request

Input: a single-turn request

Output: apps that are able to support the required functionality

[Matrix figure: IR retrieves app candidates from app descriptions (Outlook: "… your email calendar contacts …", Gmail: "… check and send emails msgs …"); rows are app descriptions, self-train utterances, and the test utterance "i would like to contact alex"; columns are word observations (contact, message, email) and intended apps (Gmail, Outlook, Skype); feature enrichment adds semantic classes such as "communication", and reasoning with feature-enriched MF fills estimated scores (e.g., .90, .85, .97, .95).]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
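Feature enrichment can be sketched as appending semantic-class features to the bag-of-words vector before factorization. The class lexicon below is a toy assumption standing in for the embedding-based enrichment used in the actual system.

```python
# Toy semantic-class lexicon (hypothetical; the real system infers these
# enriched semantics automatically rather than from a hand-written list).
SEMANTIC_CLASSES = {
    "communication": {"contact", "call", "email", "message", "dial"},
    "photo": {"photo", "camera", "picture"},
}

def enrich(tokens):
    """Bag-of-words features plus soft semantic-class features."""
    features = {t: 1.0 for t in tokens}
    for cls, lexicon in SEMANTIC_CLASSES.items():
        overlap = len(set(tokens) & lexicon)
        if overlap:
            features[f"class:{cls}"] = overlap / len(tokens)
    return features

feats = enrich("i would like to contact alex".split())
print(sorted(k for k in feats if k.startswith("class:")))   # ['class:communication']
```

The enriched "communication" feature is what lets the matrix connect "contact alex" to apps whose descriptions never contain the word "contact".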

64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: a multi-turn interaction

Output: apps the user plans to launch

Challenge: language ambiguity – 1) user preference, 2) app-level contexts. Example: "send to vivian" could map to Email or Message (Communication); the previous turn disambiguates it.

Idea: behavioral patterns in history can help intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: a multi-turn interaction

Output: apps the user plans to launch

[Matrix figure: train dialogues ("take this photo" → CAMERA, "tell vivian this is me in the lab" → IM; "check my grades on websites" → CHROME, "send an email to professor" → EMAIL) and the test dialogue ("take a photo of this", "send it to alice" → CAMERA, IM); columns combine lexical features (photo, check, camera, send, tell), behavior history (null, camera, chrome, email), and the intended app; reasoning with feature-enriched MF fills estimated scores (e.g., .85, .70, .95, .80, .55).]

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

66

Experiments for Intent Prediction

Single-Turn Request – Mean Average Precision (MAP), LM-based IR model (unsupervised):

Feature Matrix      ASR: LM / MF-SLU   Transcripts: LM / MF-SLU
Word Observation    25.1 / –           26.1 / –

Multi-Turn Interaction – Mean Average Precision (MAP), Multinomial Logistic Regression (supervised):

Feature Matrix      ASR: MLR / MF-SLU   Transcripts: MLR / MF-SLU
Word Observation    52.1 / –            55.5 / –

67

Experiments for Intent Prediction

Single-Turn Request – Mean Average Precision (MAP):

Feature Matrix      ASR: LM / MF-SLU        Transcripts: LM / MF-SLU
Word Observation    25.1 / 29.2 (+16.2%)    26.1 / 30.4 (+16.4%)

Multi-Turn Interaction – Mean Average Precision (MAP):

Feature Matrix      ASR: MLR / MF-SLU       Transcripts: MLR / MF-SLU
Word Observation    52.1 / 52.7 (+1.2%)     55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.

68

Experiments for Intent Prediction

Single-Turn Request – Mean Average Precision (MAP):

Feature Matrix                           ASR: LM / MF-SLU        Transcripts: LM / MF-SLU
Word Observation                         25.1 / 29.2 (+16.2%)    26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics         32.0 / –                33.3 / –
Word + Type-Embedding-Based Semantics    31.5 / –                32.9 / –

Multi-Turn Interaction – Mean Average Precision (MAP):

Feature Matrix                ASR: MLR / MF-SLU      Transcripts: MLR / MF-SLU
Word Observation              52.1 / 52.7 (+1.2%)    55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns    53.9 / –               56.6 / –

Semantic enrichment provides rich cues to improve performance.

69

Experiments for Intent Prediction

Single-Turn Request – Mean Average Precision (MAP):

Feature Matrix                           ASR: LM / MF-SLU        Transcripts: LM / MF-SLU
Word Observation                         25.1 / 29.2 (+16.2%)    26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics         32.0 / 34.2 (+6.8%)     33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics    31.5 / 32.2 (+2.1%)     32.9 / 34.0 (+3.4%)

Multi-Turn Interaction – Mean Average Precision (MAP):

Feature Matrix                ASR: MLR / MF-SLU      Transcripts: MLR / MF-SLU
Word Observation              52.1 / 52.7 (+1.2%)    55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns    53.9 / 55.7 (+3.3%)    56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.

70

Contributions of Intent Prediction

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeding SLU Modeling (Semantic Decoding, Intent Prediction).]

Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.

71

Page 38: Statistical Learning from Dialogues for Intelligent Assistants

38

Frame-Semantic Parsing [Baker et al., 1998; Das et al., 2014]

FrameNet [Baker et al., 1998]: a linguistic semantic resource based on frame-semantics theory, in which words/phrases can be represented as frames. For example, in "low fat milk", "milk" evokes the "food" frame, and "low fat" fills the "descriptor" frame element.

SEMAFOR [Das et al., 2014]: a state-of-the-art frame-semantics parser, trained on manually annotated FrameNet sentences.

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

39

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

Example: "can i have a cheap restaurant" evokes the frames "capability" (can), "expensiveness" (cheap), and "locale_by_use" (restaurant); each induced frame becomes a slot candidate.

1st Issue: differentiate domain-specific frames (here "expensiveness" and "locale_by_use") from generic frames (here "capability") for SDSs.

Das et al., "Frame-Semantic Parsing," Computational Linguistics, 2014

40

Ontology Induction [ASRU'13, SLT'14a] (Best Student Paper Award)

(Figure: a binary word-observation / slot-candidate matrix built by frame-semantic parsing. Training rows (Utterance 1 "i would like a cheap restaurant"; Utterance 2 "find a restaurant with chinese food") mark word observations ("cheap", "restaurant", "food") and slot candidates ("expensiveness", "locale_by_use", "food"); the test utterance "show me a list of cheap restaurants" receives estimated slot scores such as 0.97 and 0.95.)

Idea: increase the weights of domain-specific slots and decrease the weights of the others.

41

1st Issue: How to adapt generic slots to a domain-specific setting?

Knowledge Graph Propagation Model. Assumption: domain-specific words/slots have more dependencies to each other.

(Figure: the word-observation / slot-candidate matrix for the training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants" is multiplied by a word relation matrix and a slot relation matrix for slot induction; the knowledge graph contains words such as "i", "like", "cheap", "restaurant" and slots such as "capability", "locale_by_use", "food", "expensiveness", "seeking", "relational_quantity", "desiring".)

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph, so that domain-specific words/slots have higher scores after matrix multiplication.
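The propagation step described above amounts to repeated multiplication with a row-normalized relation matrix. A minimal sketch; the toy graph, damping factor, and iteration count are illustrative assumptions, not values from the paper:

```python
import numpy as np

def propagate(scores, relation, alpha=0.9, iters=10):
    """Propagate node scores over a relation (adjacency) matrix.

    Each iteration mixes a node's initial score with the scores of its
    neighbors, so densely inter-connected (domain-specific) nodes keep
    high scores while isolated (generic) nodes decay.
    """
    # Row-normalize so each node averages over its neighbors.
    row_sums = relation.sum(axis=1, keepdims=True)
    R = relation / np.where(row_sums == 0, 1, row_sums)
    s = scores.astype(float)
    for _ in range(iters):
        s = (1 - alpha) * scores + alpha * R @ s
    return s

# Toy graph: nodes 0-2 are densely inter-connected (domain-specific),
# node 3 has no edges (generic).
relation = np.array([[0, 1, 1, 0],
                     [1, 0, 1, 0],
                     [1, 1, 0, 0],
                     [0, 0, 0, 0]], dtype=float)
final = propagate(np.ones(4), relation)
```

After propagation, the connected nodes retain their full score while the isolated node drops to the (1 − alpha) floor, which is exactly the effect the slide describes.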


42

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances. Output: semantic concepts included in each individual utterance, e.g., "can I have a cheap restaurant" → target="restaurant", price="cheap".

(Figure: from an unlabeled collection, frame-semantic parsing feeds Ontology Induction, which builds the feature model (F_w, F_s) from a semantic KG, and Structure Learning, which builds the knowledge graph propagation model (word relation model R_w from a lexical KG; slot relation model R_s from a semantic KG); MF-SLU then performs SLU modeling by matrix factorization to produce the semantic representation.)

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015

43

Knowledge Graph Construction: syntactic dependency parsing on utterances.

Example: "can i have a cheap restaurant" (frames: capability, expensiveness, locale_by_use) parsed with the dependencies ccomp, nsubj, dobj, det, amod.

Word-based lexical knowledge graph: nodes "can", "i", "have", "a", "cheap", "restaurant" connected by word-word relations.

Slot-based semantic knowledge graph: nodes "capability", "locale_by_use", "expensiveness" connected by slot-slot relations.

44

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

Dependency-based word embeddings and dependency-based slot embeddings are trained from the dependency-parsed utterances (e.g., "can i have a cheap restaurant" with relations ccomp, nsubj, dobj, det, amod), yielding one vector per word (e.g., "can", "have") and one vector per slot (e.g., "expensiveness", "capability").

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014
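Dependency-based embeddings replace linear bag-of-words contexts with syntactic contexts. A sketch of the context-extraction step in the spirit of Levy and Goldberg (2014); the parse below is hand-written for illustration:

```python
def dependency_contexts(edges):
    """Extract (word, context) training pairs from dependency edges.

    Each edge (head, relation, child) yields two pairs: the child as a
    relation-labeled context of the head, and the head as an inverse-
    relation context of the child, as in Levy & Goldberg (2014).
    """
    pairs = []
    for head, rel, child in edges:
        pairs.append((head, f"{rel}_{child}"))
        pairs.append((child, f"{rel}I_{head}"))  # 'I' marks the inverse
    return pairs

# Hand-written parse of "can i have a cheap restaurant".
edges = [("have", "ccomp", "can"),
         ("have", "nsubj", "i"),
         ("have", "dobj", "restaurant"),
         ("restaurant", "det", "a"),
         ("restaurant", "amod", "cheap")]
pairs = dependency_contexts(edges)
```

These (word, context) pairs then replace the window-based pairs in a skip-gram trainer, so "restaurant" learns from "amod_cheap" rather than from raw adjacency.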


45

Edge Weight Measurement: compute edge weights to represent relation importance.

Slot-to-slot semantic relation: similarity between slot embeddings.
Slot-to-slot dependency relation: dependency score between slot embeddings.
Word-to-word semantic relation: similarity between word embeddings.
Word-to-word dependency relation: dependency score between word embeddings.

(Figure: a word graph over w1-w7 and a slot graph over s1-s3; each edge weight combines the semantic and dependency measures.)
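The semantic edge weights above can be taken as cosine similarities between the trained embeddings. A sketch with made-up toy vectors (the values are invented for illustration):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embeddings, used as a semantic edge weight."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy slot embeddings (illustrative values only, not trained vectors).
emb = {
    "expensiveness": np.array([0.9, 0.1, 0.0]),
    "price_range":   np.array([0.8, 0.2, 0.1]),
    "capability":    np.array([0.0, 0.1, 0.9]),
}
w_related = cosine(emb["expensiveness"], emb["price_range"])
w_unrelated = cosine(emb["expensiveness"], emb["capability"])
```

Related slots thus get a heavy edge in the semantic knowledge graph, while unrelated slots get a near-zero one.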


46

Knowledge Graph Propagation Model

(Figure: the word-observation / slot-candidate matrix is multiplied by the word relation matrix R_w^(SD) and the slot relation matrix R_s^(SD) for slot induction.)

Structure information is integrated to make the self-training data more reliable.

47

Semantic Decoding [ACL-IJCNLP'15]

(Figure: Ontology Induction and Structure Learning feed the SLU feature matrices F_w and F_s; for the test utterance "show me a list of cheap restaurants", some semantics remain hidden because frame-semantic parsing does not observe them.)

2nd Issue: unobserved semantics may benefit understanding.

48

Feature Model + Knowledge Graph Propagation Model

Reasoning with Matrix Factorization: the feature matrix is combined with the word and slot relation matrices R_w^(SD) and R_s^(SD); MF fills previously missing cells with estimated probabilities (e.g., 0.97, 0.90, 0.95, 0.85).

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.

49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and words/slots, respectively. The product of the two matrices fills in the probability of the hidden semantics: the |U| × (|W|+|S|) observation matrix is approximated by the product of an |U| × d matrix and a d × (|W|+|S|) matrix, with a small latent dimension d.

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009
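The low-rank completion idea can be illustrated with a truncated SVD over a toy observation matrix (the actual MF-SLU learns its factors with BPR; the matrix below is invented for illustration):

```python
import numpy as np

# Rows: utterances; columns: words/slots. 1 = observed, 0 = unknown.
M = np.array([[1, 1, 0, 1, 0],
              [1, 0, 1, 1, 1],
              [0, 1, 0, 1, 0]], dtype=float)

# Factorize M ~= U (|U| x d) @ V (d x (|W|+|S|)) with small rank d.
d = 2
u, s, vt = np.linalg.svd(M, full_matrices=False)
U = u[:, :d] * s[:d]   # utterance latent factors
V = vt[:d, :]          # word/slot latent factors
M_hat = U @ V          # reconstructed scores, including unobserved cells
```

The reconstructed scores in the zero cells play the role of the "probability of hidden semantics" on the slide: observed cells stay close to 1, and unobserved cells receive graded scores instead of hard zeros.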


50

Bayesian Personalized Ranking for MF: model implicit feedback.

Do not treat unobserved facts as negative samples (true or false); instead, give observed facts higher scores than unobserved facts.

Objective: for each utterance u_x, maximize Σ ln σ(f⁺ − f⁻) over pairs of an observed fact f⁺ and an unobserved fact f⁻, where σ is the sigmoid function.

The objective is to learn a set of well-ranked semantic slots per utterance.
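One stochastic-gradient step of the BPR pairwise objective can be sketched as follows; the dimensions, random seed, and learning rate are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_step(u, v_pos, v_neg, lr=0.05):
    """One BPR update: push the observed fact f+ above the unobserved f-.

    Scores are inner products <u, v>; the gradient of ln sigma(x+ - x-)
    is scaled by sigma(-(x+ - x-)), so mis-ranked pairs get big updates.
    """
    x = u @ v_pos - u @ v_neg
    g = sigmoid(-x)  # large when the pair is badly ranked
    u_new = u + lr * g * (v_pos - v_neg)
    v_pos_new = v_pos + lr * g * u
    v_neg_new = v_neg - lr * g * u
    return u_new, v_pos_new, v_neg_new

rng = np.random.default_rng(0)
u, vp, vn = rng.normal(size=3), rng.normal(size=3), rng.normal(size=3)
before = u @ vp - u @ vn
u2, vp2, vn2 = bpr_step(u, vp, vn)
after = u2 @ vp2 - u2 @ vn2
```

Each step widens the margin between the observed and the unobserved fact, which is exactly the "well-ranked slots per utterance" objective.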


51

Matrix Factorization SLU (MF-SLU)

(Figure: with Ontology Induction and Structure Learning feeding the factorized model, the test utterance "show me a list of cheap restaurants" receives estimated slot probabilities, e.g., 0.97/0.90 and 0.95/0.85.)

MF-SLU can estimate probabilities for slot candidates given test utterances.

52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances. Output: semantic concepts included in each individual utterance, e.g., "can I have a cheap restaurant" → target="restaurant", price="cheap".

(Figure: the same pipeline as before: frame-semantic parsing over an unlabeled collection feeds Ontology Induction (F_w, F_s) and Structure Learning (R_w, R_s); MF-SLU performs SLU modeling by matrix factorization to produce the semantic representation.)

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015

53

Experimental Setup

Dataset: Cambridge University SLU Corpus, restaurant recommendation (WER = 37%); 2,166 dialogues; 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type.

Metric: MAP of all estimated slot probabilities over all utterances, using a mapping table between induced and reference slots.

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012
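The MAP metric averages per-utterance average precision over ranked slot lists; a sketch (the toy rankings below are illustrative, not corpus data):

```python
def average_precision(ranked, relevant):
    """AP of one ranked slot list against the reference slot set."""
    hits, score = 0, 0.0
    for i, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            score += hits / i  # precision at each relevant rank
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(rankings):
    """MAP over all utterances: mean of the per-utterance APs."""
    aps = [average_precision(r, rel) for r, rel in rankings]
    return sum(aps) / len(aps)

rankings = [
    (["pricerange", "food", "area"], {"pricerange", "food"}),  # AP = 1.0
    (["area", "food", "type"], {"food"}),                      # AP = 0.5
]
map_score = mean_average_precision(rankings)  # 0.75
```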


54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus. Metric: MAP of all estimated slot probabilities for all utterances.

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8

55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus. Metric: MAP of all estimated slot probabilities for all utterances.

Approach | ASR | Transcripts
Baseline SLU: Support Vector Machine | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8
Proposed MF-SLU: Feature Model | 37.6 | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)

The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results.

(The marked results are significantly better than the MLR baseline with p < 0.05 in a t-test.)

56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus. Metric: MAP of all estimated slot probabilities for all utterances.

Approach | ASR | Transcripts
Feature Model | 37.6 | 45.3
Feature + Knowledge Graph Propagation: Semantic relations | 41.4 | 51.6
Feature + Knowledge Graph Propagation: Dependency relations | 41.6 | 49.0
Feature + Knowledge Graph Propagation: All relations | 43.5 (+15.7%) | 53.4 (+17.9%)

In the integrated structure information, both semantic and dependency relations are useful for understanding.

(The marked results are significantly better than the MLR baseline with p < 0.05 in a t-test.)

57

Experiments for Structure Learning: Relation Discovery Analysis

Discovered inter-slot relations connect important slot pairs.

(Figure: the induced ontology links "locale_by_use" to "food", "expensiveness", "seeking", "relational_quantity", and "desiring" via dependencies such as PREP_FOR, NN, AMOD, and DOBJ; the reference ontology, annotated with the most frequent syntactic dependencies, links "type" to "food", "price range", "task", and "area" via DOBJ, AMOD, and PREP_IN.)

The automatically learned domain ontology aligns well with the reference one; the data-driven ontology is more objective, while the expert-annotated one is more subjective.

58

Contributions of Semantic Decoding

(Flowchart: Ontology Induction and Structure Learning form the Knowledge Acquisition part; Semantic Decoding and Intent Prediction form the SLU Modeling part.)

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.

MF-SLU for Semantic Decoding is able to: 1) unify the automatically acquired knowledge; 2) adapt to a domain-specific setting; and 3) allow systems to model implicit semantics for better understanding.

59

Low- and High-Level Understanding

Semantic concepts for individual utterances do not consider high-level semantics (user intents); the follow-up behaviors usually correspond to user intents.

"can i have a cheap restaurant" → SLU model → price="cheap", target="restaurant"; intent=navigation

"i plan to dine in legume tonight" → SLU model → restaurant="legume", time="tonight"; intent=reservation

60

SDS Flowchart – Intent Prediction

(Flowchart: Ontology Induction and Structure Learning form the Knowledge Acquisition part; Semantic Decoding and Intent Prediction form the SLU Modeling part; Intent Prediction is the focus of this section.)

61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

62

Intent Prediction of Mobile Apps [SLT'14c] [Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app, e.g., "please dial a phone call to alex".

Output: the apps supporting the required functionality, e.g., Skype, Hangout, etc.

Intent identification covers popular domains in Google Play.

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014

63

Intent Prediction – Single-Turn Request

Input: a single-turn request. Output: apps that are able to support the required functionality.

(Figure: reasoning with feature-enriched MF. Training rows combine self-trained utterances, e.g., Utterance 1 "i would like to contact alex" with word observations such as "contact", "message", "email" and intended apps Gmail, Outlook, Skype, with app descriptions retrieved by IR, e.g., Outlook "... your email calendar contacts ..." and Gmail "... check and send emails msgs ..."; feature enrichment adds semantics such as "communication" (0.90); the test utterance then receives estimated app scores, e.g., 0.90, 0.85, 0.97, 0.95.)

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
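The feature-enriched matrix sketched above, with utterance and app-description rows over word features plus intended-app labels, can be assembled like this; the vocabulary, apps, and rows are invented for illustration:

```python
import numpy as np

def build_matrix(rows, columns):
    """Binary observation matrix: rows are utterances or app descriptions,
    columns are word features plus intended-app labels."""
    index = {c: j for j, c in enumerate(columns)}
    M = np.zeros((len(rows), len(columns)))
    for i, feats in enumerate(rows):
        for f in feats:
            if f in index:
                M[i, index[f]] = 1.0
    return M

columns = ["contact", "email", "send", "app:Gmail", "app:Skype"]
rows = [
    ["contact", "app:Skype"],        # utterance with a self-trained app label
    ["email", "send", "app:Gmail"],  # retrieved app description for Gmail
]
M = build_matrix(rows, columns)
```

The same matrix-factorization machinery as in semantic decoding then completes the app-label columns for new utterances, so word evidence and app-description knowledge live in one model.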


64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: a multi-turn interaction. Output: apps the user plans to launch.

Challenge: language ambiguity, e.g., "send to vivian" may map to Email or Message (both Communication apps). Useful cues: 1) user preference; 2) app-level contexts (e.g., the previous turn).

Idea: behavioral patterns in the history can help intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

65

Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)

Input: a multi-turn interaction. Output: apps the user plans to launch.

(Figure: reasoning with feature-enriched MF over lexical features, intended apps, and behavior history. Training dialogues pair user utterances with intended apps, e.g., "take this photo" → CAMERA, "tell vivian this is me in the lab" → IM, "check my grades on websites" → CHROME, "send an email to professor" → EMAIL. For the test dialogue "take a photo of this / send it to alice", the model estimates app scores (e.g., 0.85, 0.70, 0.95) using behavior-history features such as the previously launched app.)

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

66

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP), LM-based IR model (unsupervised) baseline:
Word Observation | ASR: 25.1 | Transcripts: 26.1

Multi-Turn Interaction, Mean Average Precision (MAP), Multinomial Logistic Regression (supervised) baseline:
Word Observation | ASR: 52.1 | Transcripts: 55.5

67

Experiments for Intent Prediction (2)

Single-Turn Request, MAP:
Word Observation | ASR: LM 25.1, MF-SLU 29.2 (+16.2%) | Transcripts: LM 26.1, MF-SLU 30.4 (+16.4%)

Multi-Turn Interaction, MAP:
Word Observation | ASR: MLR 52.1, MF-SLU 52.7 (+1.2%) | Transcripts: MLR 55.5, MF-SLU 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy (ASR) data.

68

Experiments for Intent Prediction (3)

Single-Turn Request, MAP:
Word Observation | ASR: LM 25.1, MF-SLU 29.2 (+16.2%) | Transcripts: LM 26.1, MF-SLU 30.4 (+16.4%)
Word + Embedding-Based Semantics | ASR: LM 32.0 | Transcripts: LM 33.3
Word + Type-Embedding-Based Semantics | ASR: LM 31.5 | Transcripts: LM 32.9

Multi-Turn Interaction, MAP:
Word Observation | ASR: MLR 52.1, MF-SLU 52.7 (+1.2%) | Transcripts: MLR 55.5, MF-SLU 55.4 (-0.2%)
Word + Behavioral Patterns | ASR: MLR 53.9 | Transcripts: MLR 56.6

Semantic enrichment provides rich cues to improve performance.

69

Experiments for Intent Prediction (4)

Single-Turn Request, Mean Average Precision (MAP):
Word Observation | ASR: LM 25.1, MF-SLU 29.2 (+16.2%) | Transcripts: LM 26.1, MF-SLU 30.4 (+16.4%)
Word + Embedding-Based Semantics | ASR: LM 32.0, MF-SLU 34.2 (+6.8%) | Transcripts: LM 33.3, MF-SLU 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics | ASR: LM 31.5, MF-SLU 32.2 (+2.1%) | Transcripts: LM 32.9, MF-SLU 34.0 (+3.4%)

Multi-Turn Interaction, Mean Average Precision (MAP):
Word Observation | ASR: MLR 52.1, MF-SLU 52.7 (+1.2%) | Transcripts: MLR 55.5, MF-SLU 55.4 (-0.2%)
Word + Behavioral Patterns | ASR: MLR 53.9, MF-SLU 55.7 (+3.3%) | Transcripts: MLR 56.6, MF-SLU 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.

70

Contributions of Intent Prediction

(Flowchart: Ontology Induction and Structure Learning form the Knowledge Acquisition part; Semantic Decoding and Intent Prediction form the SLU Modeling part.)

Feature-enriched MF-SLU for Intent Prediction is able to: 1) unify knowledge at different levels; 2) learn inference relations between various features; and 3) create personalized models by leveraging contextual behaviors.

71

Personal Intelligent Architecture

Reactive Assistance: ASR, LU, Dialog, LG, TTS (user experience, e.g., "call taxi").

Proactive Assistance: inferences, user modeling, suggestions.

Data: back-end databases, services, and client signals.

Device/service end-points: phone, PC, Xbox, web browser, messaging apps.

72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

73

Conclusions

This work shows the feasibility of, and the potential for, improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge and allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.

74

Future Work

Apply the proposed technology to domain discovery: find domains that current systems do not cover but users are interested in, to guide which domains to develop next.

Improve the proposed approach by handling uncertainty: recognition errors from ASR, and unreliable knowledge from knowledge acquisition.

75

Towards Unsupervised Deep Learning

(Figure: a word sequence x = w1 w2 ... wd is mapped through word vectors l_w, a convolutional layer l_c (convolution matrix W_c), and a pooling operation to an utterance vector l_f; a knowledge graph propagation layer l_p (propagation matrix W_p) and a semantic projection matrix W_s produce the semantic layer y; matching against slot vectors for candidates S1, S2, ..., Sn yields semantic relations R(U, S1), ..., R(U, Sn) and posterior probabilities P(S1 | U), ..., P(Sn | U).)

Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.
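A minimal sketch of this "add more layers" idea: the MF prediction is a single linear projection, and inserting a nonlinear hidden layer yields a deeper model. The dimensions, activations, and random weights below are illustrative assumptions, not the architecture's actual parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

# One-layer view of MF: an utterance feature vector projected to slot scores.
x = rng.normal(size=(1, 8))   # utterance feature vector (cf. l_f)
W1 = rng.normal(size=(8, 4))  # first projection (cf. W_c / W_p)
W2 = rng.normal(size=(4, 5))  # semantic projection (cf. W_s) to 5 slots

hidden = np.tanh(x @ W1)                    # the extra nonlinear layer
scores = 1 / (1 + np.exp(-(hidden @ W2)))   # per-slot scores in (0, 1)
```

With only the linear projection, the model is equivalent to the factorized score; the tanh layer is what moves it from MF towards a deep network.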

76

Take Home Message

Big data is available without annotations. Challenge: how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI connects language to action: understand voice commands to control music, lights, etc.; teach the assistant to let friends in by face recognition, etc.

Unsupervised or weakly-supervised methods will be the future trend.

Deep language understanding is an emerging field.

77

Q & A — THANKS FOR YOUR ATTENTION!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013 (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016

  • Statistical Learning from Dialogues for Intelligence Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants
  • Why do we need them
  • Why do we need them (2)
  • Why do companies care
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymaxrsquos intelligence
  • SDS Architecture
  • Interaction Example
  • SDS Process ndash Available Domain Ontology
  • SDS Process ndash Available Domain Ontology (2)
  • SDS Process ndash Available Domain Ontology (3)
  • SDS Process ndash Spoken Language Understanding (SLU)
  • SDS Process ndash Spoken Language Understanding (SLU) (2)
  • SDS Process ndash Dialogue Management (DM)
  • SDS Process ndash Dialogue Management (DM) (2)
  • SDS Process ndash Dialogue Management (DM) (3)
  • SDS Process ndash Dialogue Management (DM) (4)
  • SDS Process ndash Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture ndash Contributions
  • SDS Flowchart
  • SDS Flowchart ndash Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLPrsquo15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a]
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
  • 1st Issue How to adapt generic slots to a domain-specific sett
  • Semantic Decoding [ACL-IJCNLPrsquo15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement SlotWord Embeddings Training (Levy and
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLPrsquo15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue How to model the unobserved hidden semantics Matrix
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLPrsquo15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding Quality of Semantics Estimatio
  • Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
  • Experiments of Semantic Decoding Effectiveness of Relations
  • Experiments for Structure Learning Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart ndash Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLTrsquo14c]
  • Intent Prediction ndash Single-Turn Request
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q amp A
Page 39: Statistical Learning from Dialogues for Intelligent Assistants

39

Ontology Induction [ASRUrsquo13 SLTrsquo14a]

can i have a cheap restaurant

Frame capability

Frame expensiveness

Frame locale by use

1st Issue differentiate domain-specific frames from generic frames for SDSs

GoodGood

Das et al Frame-semantic parsing in Proc of Computational Linguistics 2014

slot candidate

Best Student Paper Award

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

40

1

Utterance 1i would like a cheap restaurant Train

hellip hellip

hellip

cheap restaurant foodexpensiveness

1

locale_by_use

11

find a restaurant with chinese foodUtterance 2

1 1

food

1 1

1 Test

1 97 95

Frame Semantic Parsing

show me a list of cheap restaurantsTest Utterance

Word Observation Slot Candidate

Ontology Induction [ASRUrsquo13 SLTrsquo14a]Best Student Paper Award

Idea increase weights of domain-specific slots and decrease weights of others

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

41

1st Issue How to adapt generic slots to a domain-specific setting

Knowledge Graph Propagation Model Assumption domain-specific wordsslots have more dependencies to each other

Word Relation Model Slot Relation Model

word relation matrix

slot relation matrix

times

1

Word Observation Slot CandidateTrain

cheap restaurant foodexpensiveness

1

locale_by_use

11

1 1

food

1 1

1 Test

1

1

Slot Induction

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph so that domain-specific wordsslots have higher scores after matrix multiplication

i like

1 1

capability

1

locale_by_use

food expensiveness

seeking

relational_quantitydesiring

Utterance 1i would like a cheap restaurant

hellip hellip

find a restaurant with chinese foodUtterance 2

show me a list of cheap restaurantsTest Utterance

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

42

Semantic Decoding [ACL-IJCNLPrsquo15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015

SLU Model

target=ldquorestaurantrdquoprice=ldquocheaprdquo

ldquocan I have a cheap restaurantrdquoFrame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology InductionFw Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

43

Knowledge Graph Construction Syntactic dependency parsing on utterances

ccomp

amoddobjnsubj det

can i have a cheap restaurantcapability expensiveness locale_by_use

Word-based lexical knowledge graph

Slot-based semantic knowledge graph

restaurantcan

have

i

acheap

w

w

capabilitylocale_by_use expensiveness

s

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

44

Dependency-based word embeddings

Dependency-based slot embeddings

Edge Weight MeasurementSlotWord Embeddings Training (Levy and Goldberg 2014)

can = have =

expensiveness = capability =

can i have a cheap restaurant

ccomp

amoddobjnsubj det

have acapability expensiveness locale_by_use

ccomp

amoddobjnsubj det

Levy and Goldberg Dependency-Based Word Embeddings in Proc of ACL 2014

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

45

Edge Weight Measurement Compute edge weights to represent relation importance

Slot-to-slot semantic relation similarity between slot embeddings Slot-to-slot dependency relation dependency score between slot embeddings Word-to-word semantic relation similarity between word embeddings Word-to-word dependency relation dependency score between word embeddings

+

+

w1

w2

w3

w4

w5

w6

w7

s2

s1 s3

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

46

Word Relation Model Slot Relation Model

word relation matrix

slot relation matrix

times

1

Word Observation Slot Candidate

Train

cheap restaurant foodexpensiveness

1

locale_by_use

11

1 1

food

1 1

1 Test1

1

Slot Induction

Knowledge Graph Propagation Model119877119908

119878119863

119877119904119878119863

Structure information is integrated to make the self-training data more reliable

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

47

Ontology Induction

SLUFw Fs

Structure Learning

times

1

Utterance 1i would like a cheap restaurant

Word Observation Slot Candidate

Train

hellip

cheap restaurant foodexpensiveness

1

locale_by_use

11

find a restaurant with chinese foodUtterance 2

1 1

food

1 1

1

Test1 9790 9585

Ontology Induction

show me a list of cheap restaurantsTest Utterance hidden semantics

2nd Issue unobserved semantics may benefit understanding

Semantic Decoding [ACL-IJCNLPrsquo15]


48

Reasoning with Matrix Factorization

Feature Model + Knowledge Graph Propagation Model

(Figure: the feature model, i.e. the word-observation / slot-candidate matrix, is combined with the knowledge graph propagation model, i.e. the word and slot relation matrices R_w^SD and R_s^SD; factorization fills test cells with slot probabilities such as 0.97, 0.90, 0.95, 0.85 and hidden cells with values such as 0.93, 0.92, 0.98, 0.05.)

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data


49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and words/slots, respectively

The product of the two matrices fills in the probabilities of the hidden semantics

(Figure: the |U| × (|W|+|S|) matrix of word observations and slot candidates over train and test utterances is factorized as |U| × (|W|+|S|) ≈ (|U| × d) × (d × (|W|+|S|)); the reconstructed product fills unobserved cells with probabilities such as 0.97, 0.90, 0.95, 0.85 and 0.93, 0.92, 0.98.)

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
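The low-rank completion idea can be sketched with a tiny toy matrix: fitting a rank-1 factorization to the observed cells forces the unobserved cell toward a consistent value. This is a squared-loss sketch for illustration; the actual model in the talk is trained with the BPR ranking objective instead, and all numbers below are illustrative.

```python
# Toy sketch: matrix completion by rank-d factorization (here d=1) with
# gradient descent on the observed cells only.
import random

def mf_complete(M, mask, d=1, steps=2000, lr=0.05, seed=0):
    rng = random.Random(seed)
    n, m = len(M), len(M[0])
    # small positive init so the factors can grow toward the observed values
    U = [[rng.uniform(0.01, 0.1) for _ in range(d)] for _ in range(n)]
    V = [[rng.uniform(0.01, 0.1) for _ in range(d)] for _ in range(m)]
    for _ in range(steps):
        for i in range(n):
            for j in range(m):
                if not mask[i][j]:
                    continue  # only observed cells contribute to the loss
                pred = sum(U[i][k] * V[j][k] for k in range(d))
                err = M[i][j] - pred
                for k in range(d):
                    u, v = U[i][k], V[j][k]
                    U[i][k] += lr * err * v
                    V[j][k] += lr * err * u
    return U, V

# Columns: word "cheap", slot "expensiveness"; rows: two utterances. The slot
# is observed only for the first utterance.
M    = [[1.0, 1.0], [1.0, 0.0]]
mask = [[True, True], [True, False]]
U, V = mf_complete(M, mask)
hidden = sum(U[1][k] * V[1][k] for k in range(1))
print(round(hidden, 2))  # filled-in score for the unobserved cell
```

Because the second utterance shares the word "cheap" with the first, the rank-1 structure pushes the unobserved "expensiveness" cell toward a high score, which is exactly the hidden-semantics effect described on the slide.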


50

Bayesian Personalized Ranking for MF: model implicit feedback

Do not treat unobserved facts as negative samples (true or false); instead, give observed facts higher scores than unobserved facts

Objective: for each utterance u_x, maximize Σ ln σ(f+ - f-), so that every observed fact f+ scores higher than every unobserved fact f-

The objective is to learn a set of well-ranked semantic slots per utterance
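A minimal sketch of a BPR-style update for one utterance: sample an observed item f+ and an unobserved item f-, and take a gradient step that increases ln σ(f+ - f-). The dimensions, learning rate, and item indices are toy choices, not the paper's settings.

```python
# Sketch: BPR-style ranking updates for a matrix factorization model.
import math
import random

def bpr_train(n_items, observed, d=4, steps=4000, lr=0.05, seed=1):
    """Learn one utterance's latent vector u and item vectors V so that
    observed items score higher than unobserved ones."""
    rng = random.Random(seed)
    u = [rng.uniform(-0.1, 0.1) for _ in range(d)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(n_items)]
    items = list(range(n_items))
    for _ in range(steps):
        pos = rng.choice(sorted(observed))                      # an observed fact f+
        neg = rng.choice([j for j in items if j not in observed])  # an unobserved fact f-
        diff = sum(u[k] * (V[pos][k] - V[neg][k]) for k in range(d))
        g = 1.0 / (1.0 + math.exp(diff))  # sigma(-diff): gradient of ln sigma(diff)
        for k in range(d):
            uk = u[k]
            u[k] += lr * g * (V[pos][k] - V[neg][k])
            V[pos][k] += lr * g * uk
            V[neg][k] -= lr * g * uk
    return u, V

# Items 0 and 1 are observed slots for this utterance; 2 and 3 are not.
u, V = bpr_train(n_items=4, observed={0, 1})
scores = [sum(u[k] * V[j][k] for k in range(4)) for j in range(4)]
print(min(scores[0], scores[1]) > max(scores[2], scores[3]))  # observed ranked higher
```

Note that unobserved items are only pushed below observed ones, never labeled false, which matches the implicit-feedback treatment on the slide.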


51

Matrix Factorization SLU (MF-SLU)

(Figure: ontology induction supplies the SLU matrix features Fw and Fs, refined by structure learning; train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants" are factorized together over word observations and slot candidates, yielding slot probabilities such as 0.97, 0.90, 0.95, 0.85 for the test utterance.)

MF-SLU can estimate probabilities for slot candidates given test utterances


52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

(Figure: framework overview. Frame-semantic parsing runs over an unlabeled collection; ontology induction over a semantic KG yields the feature model (Fw, Fs), and structure learning over lexical and semantic KGs yields the knowledge graph propagation model (Rw, Rs) with its word and slot relation models; MF-SLU then performs SLU modeling by matrix factorization to produce the semantic representation, e.g. "can I have a cheap restaurant" → target="restaurant", price="cheap".)

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)


53

Experimental Setup
Dataset: Cambridge University SLU Corpus

Restaurant recommendation (WER = 37%); 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type

Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
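Since the evaluation metric is MAP over per-utterance slot rankings, a minimal sketch of how it is computed may help; the slot rankings and reference sets below are hypothetical.

```python
# Sketch: mean average precision (MAP) over per-utterance slot rankings.

def average_precision(ranked, relevant):
    """AP of one ranking: precision at each hit, averaged over relevant items."""
    hits, score = 0, 0.0
    for i, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            score += hits / i
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(predictions, references):
    return sum(average_precision(r, g)
               for r, g in zip(predictions, references)) / len(predictions)

# Slots ranked by estimated probability for two utterances (hypothetical).
predictions = [
    ["expensiveness", "locale_by_use", "food"],  # utterance 1
    ["food", "expensiveness", "locale_by_use"],  # utterance 2
]
references = [{"expensiveness", "locale_by_use"}, {"food"}]
print(mean_average_precision(predictions, references))  # 1.0
```

A perfect ranking of the reference slots at the top of each list gives MAP = 1.0; mis-ranked slots lower the per-utterance AP.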


54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

Approach: ASR / Transcripts
Baseline SLU, Support Vector Machine: 32.5 / 36.6
Baseline SLU, Multinomial Logistic Regression: 34.0 / 38.8


55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

The MF-SLU effectively models implicit information to decode semantics

The structure information further improves the results

Approach: ASR / Transcripts
Baseline SLU, Support Vector Machine: 32.5 / 36.6
Baseline SLU, Multinomial Logistic Regression: 34.0 / 38.8
Proposed MF-SLU, Feature Model: 37.6 / 45.3
Proposed MF-SLU, Feature Model + Knowledge Graph Propagation: 43.5 (+27.9%) / 53.4 (+37.6%)

The result is significantly better than the MLR baseline with p < 0.05 in a t-test


56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

In the integrated structure information, both semantic and dependency relations are useful for understanding

Approach: ASR / Transcripts
Feature Model: 37.6 / 45.3
Feature + Knowledge Graph Propagation (Semantic): 41.4 / 51.6
Feature + Knowledge Graph Propagation (Dependency): 41.6 / 49.0
Feature + Knowledge Graph Propagation (All): 43.5 (+15.7%) / 53.4 (+17.9%)

The result is significantly better than the MLR baseline with p < 0.05 in a t-test


Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs

The reference ontology with the most frequent syntactic dependencies

(Figure: the learned ontology connects slots locale_by_use, food, expensiveness, seeking, relational_quantity, and desiring via dependencies such as PREP_FOR, NN, AMOD, and DOBJ; the reference ontology connects type, food, pricerange, area, and task via AMOD, DOBJ, and PREP_IN.)

The automatically learned domain ontology aligns well with the reference one


The data-driven one is more objective, while the expert-annotated one is more subjective

58

Contributions of Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, 3) and then allow systems to model implicit semantics for better understanding


59

Low- and High-Level Understanding: semantic concepts for individual utterances do not consider high-level semantics (user intents)

The follow-up behaviors usually correspond to user intents

SLU Model: "can i have a cheap restaurant" → price="cheap", target="restaurant" → intent=navigation

SLU Model: "i plan to dine in legume tonight" → restaurant="legume", time="tonight" → intent=reservation


60

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart – Intent Prediction


61

Outline Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work


62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app

Output: the apps supporting the required functionality

Intent Identification: popular domains in Google Play

"please dial a phone call to alex" → Skype, Hangout, etc.

Intent Prediction of Mobile Apps [SLT'14c]

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.


63

Input: single-turn request

Output: apps that are able to support the required functionality

Intent Prediction – Single-Turn Request

(Figure: the test utterance "i would like to contact alex" (Utterance 1) is enriched with the semantic feature "communication" (weight 0.90); IR over app descriptions, e.g. Outlook "your email, calendar, contacts" and Gmail "check and send emails, msgs", retrieves app candidates and self-train utterances, and reasoning with feature-enriched MF fills in intended-app scores such as 0.90, 0.85, 0.97, 0.95 for apps like Gmail, Outlook, and Skype.)

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
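The "IR for app candidates" step can be sketched as a simple word-overlap retrieval over app descriptions. The `retrieve` helper and the (paraphrased) descriptions below are illustrative, not the system's actual retrieval model, which may use richer scoring.

```python
# Sketch: retrieving app candidates for a spoken request by word overlap
# with app descriptions.

def retrieve(query, app_descriptions):
    """Return apps ranked by word overlap with the query (zero-overlap apps dropped)."""
    q = set(query.lower().split())
    scored = []
    for app, desc in app_descriptions.items():
        overlap = len(q & set(desc.lower().split()))
        scored.append((overlap, app))
    return [app for overlap, app in sorted(scored, reverse=True) if overlap > 0]

apps = {
    "Outlook": "your email calendar contacts",
    "Gmail": "check and send emails msgs contact",
    "Camera": "take photos and videos",
}
print(retrieve("i would like to contact alex", apps))  # apps mentioning "contact"
```

The retrieved candidates then become columns of the feature-enriched MF matrix, so even apps never seen with this wording can be scored through the shared semantic features.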


64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

Challenge: language ambiguity. 1) User preference, 2) App-level contexts

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

"send to vivian" → Email vs. Message (Communication), disambiguated by the previous turn

Idea: behavioral patterns in history can help intent prediction


65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

(Figure: a train dialogue pairs user utterances with intended apps, e.g. "take this photo" → CAMERA, "tell vivian this is me in the lab" → IM, "check my grades on websites" → CHROME, "send an email to professor" → EMAIL; lexical features (photo, check, camera, tell, send, email, chrome) and the behavior history (null, camera, chrome, email) feed reasoning with feature-enriched MF, which scores intended apps for the test dialogue "take a photo of this / send it to alice" with values such as 0.85, 0.95, 0.80, 0.70, 0.55.)

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction


66

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP), with an LM-based IR model (unsupervised) as baseline:
Word Observation: ASR (LM) 25.1, Transcripts (LM) 26.1

Multi-Turn Interaction, Mean Average Precision (MAP), with Multinomial Logistic Regression (supervised) as baseline:
Word Observation: ASR (MLR) 52.1, Transcripts (MLR) 55.5


67

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):
Word Observation: ASR, LM 25.1 → MF-SLU 29.2 (+16.2%); Transcripts, LM 26.1 → MF-SLU 30.4 (+16.4%)

Multi-Turn Interaction, Mean Average Precision (MAP):
Word Observation: ASR, MLR 52.1 → MF-SLU 52.7 (+1.2%); Transcripts, MLR 55.5 → MF-SLU 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data


68

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):
Word Observation: ASR, LM 25.1 → MF-SLU 29.2 (+16.2%); Transcripts, LM 26.1 → MF-SLU 30.4 (+16.4%)
Word + Embedding-Based Semantics: ASR, LM 32.0; Transcripts, LM 33.3
Word + Type-Embedding-Based Semantics: ASR, LM 31.5; Transcripts, LM 32.9

Multi-Turn Interaction, Mean Average Precision (MAP):
Word Observation: ASR, MLR 52.1 → MF-SLU 52.7 (+1.2%); Transcripts, MLR 55.5 → MF-SLU 55.4 (-0.2%)
Word + Behavioral Patterns: ASR, MLR 53.9; Transcripts, MLR 56.6

Semantic enrichment provides rich cues to improve performance


69

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):
Word Observation: ASR, LM 25.1 → MF-SLU 29.2 (+16.2%); Transcripts, LM 26.1 → MF-SLU 30.4 (+16.4%)
Word + Embedding-Based Semantics: ASR, LM 32.0 → MF-SLU 34.2 (+6.8%); Transcripts, LM 33.3 → MF-SLU 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics: ASR, LM 31.5 → MF-SLU 32.2 (+2.1%); Transcripts, LM 32.9 → MF-SLU 34.0 (+3.4%)

Multi-Turn Interaction, Mean Average Precision (MAP):
Word Observation: ASR, MLR 52.1 → MF-SLU 52.7 (+1.2%); Transcripts, MLR 55.5 → MF-SLU 55.4 (-0.2%)
Word + Behavioral Patterns: ASR, MLR 53.9 → MF-SLU 55.7 (+3.3%); Transcripts, MLR 56.6 → MF-SLU 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics


70

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction: Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, 3) and create personalized models by leveraging contextual behaviors


71

Personal Intelligent Architecture

Reactive Assistance: ASR, LU, Dialog, LG, TTS

Proactive Assistance: Inferences, User Modeling, Suggestions

Data: Back-end Data Bases, Services and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"


72

Outline Intelligent Assistant

What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work


73

Conclusions: the work shows the feasibility and the potential for improving the generalization, maintenance, efficiency, and scalability of SDSs

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances
Better high-level intent prediction about follow-up behaviors


74

Future Work: apply the proposed technology to domain discovery

Domains not covered by the current systems but that users are interested in can guide the next developed domains

Improve the proposed approach by handling uncertainty in SLU: recognition errors from ASR and unreliable knowledge from knowledge acquisition


75

Towards Unsupervised Deep Learning

(Figure: a deep architecture over the word sequence x = w1, w2, ..., wd: word vectors l_w, a convolutional layer l_c with convolution matrix W_c, a pooling operation producing the utterance vector l_f, a knowledge graph propagation layer l_p with matrix W_p, and a semantic projection matrix W_s producing the semantic layer y; slot vectors for candidates S1, S2, ..., Sn give semantic relations R(U, S1), ..., R(U, Sn) and posterior probabilities P(S1 | U), ..., P(Sn | U).)


Treating MF as a one-layer neural net, we can add more layers in the model towards unsupervised deep learning
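The layered architecture can be sketched as a plain forward pass with toy weights: word vectors, a width-2 convolution, max pooling into an utterance vector, a propagation layer, and a projection to per-slot scores. All matrices and dimensions below are illustrative stand-ins for W_c, W_p, and W_s, not the trained model.

```python
# Sketch: forward pass matching the slide's layers, with toy 2-d weights.

def conv1d(word_vecs, W):
    """Width-2 convolution: W maps a concatenated 2*d window to d outputs."""
    out = []
    for i in range(len(word_vecs) - 1):
        window = word_vecs[i] + word_vecs[i + 1]
        out.append([sum(w * x for w, x in zip(row, window)) for row in W])
    return out

def max_pool(rows):
    """Column-wise max over the convolution outputs."""
    return [max(col) for col in zip(*rows)]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]       # l_w: one 2-d vector per word
Wc = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]  # convolution matrix W_c
Wp = [[1.0, 0.2], [0.2, 1.0]]                      # propagation matrix W_p
Ws = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]          # projection W_s to 3 slots

lc = conv1d(words, Wc)   # convolutional layer l_c
lf = max_pool(lc)        # utterance vector l_f
lp = matvec(Wp, lf)      # knowledge graph propagation layer l_p
y = matvec(Ws, lp)       # semantic layer y: one score R(U, S_i) per slot
print(len(y))            # one relevance score per slot candidate
```

With MF corresponding to a single linear layer, each added stage here (convolution, pooling, propagation) is one of the extra layers the slide proposes stacking.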

76

Take Home Message: available big data without annotations

Challenge: how to acquire and organize important knowledge and further utilize it for applications

Language understanding for AI

language → action: understand voice to control music, lights, etc.; teach it to let friends in by face recognition, etc.


Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A. THANKS FOR YOUR ATTENTION!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)

• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.


  • Statistical Learning from Dialogues for Intelligent Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants
  • Why do we need them
  • Why do we need them (2)
  • Why do companies care
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymax's intelligence
  • SDS Architecture
  • Interaction Example
  • SDS Process – Available Domain Ontology
  • SDS Process – Available Domain Ontology (2)
  • SDS Process – Available Domain Ontology (3)
  • SDS Process – Spoken Language Understanding (SLU)
  • SDS Process – Spoken Language Understanding (SLU) (2)
  • SDS Process – Dialogue Management (DM)
  • SDS Process – Dialogue Management (DM) (2)
  • SDS Process – Dialogue Management (DM) (3)
  • SDS Process – Dialogue Management (DM) (4)
  • SDS Process – Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture – Contributions
  • SDS Flowchart
  • SDS Flowchart – Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLP'15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRU'13, SLT'14a]
  • Ontology Induction [ASRU'13, SLT'14a] (2)
  • 1st Issue: How to adapt generic slots to a domain-specific setting
  • Semantic Decoding [ACL-IJCNLP'15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLP'15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue: How to model the unobserved hidden semantics? Matrix Factorization (MF)
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLP'15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding: Quality of Semantics Estimation
  • Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
  • Experiments of Semantic Decoding: Effectiveness of Relations
  • Experiments for Structure Learning: Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart – Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLT'14c]
  • Intent Prediction – Single-Turn Request
  • Intent Prediction – Multi-Turn Interaction [ICMI'15]
  • Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q & A
Page 40: Statistical Learning from Dialogues for Intelligent Assistants

40

1

Utterance 1i would like a cheap restaurant Train

hellip hellip

hellip

cheap restaurant foodexpensiveness

1

locale_by_use

11

find a restaurant with chinese foodUtterance 2

1 1

food

1 1

1 Test

1 97 95

Frame Semantic Parsing

show me a list of cheap restaurantsTest Utterance

Word Observation Slot Candidate

Ontology Induction [ASRUrsquo13 SLTrsquo14a]Best Student Paper Award

Idea increase weights of domain-specific slots and decrease weights of others

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

41

1st Issue How to adapt generic slots to a domain-specific setting

Knowledge Graph Propagation Model Assumption domain-specific wordsslots have more dependencies to each other

Word Relation Model Slot Relation Model

word relation matrix

slot relation matrix

times

1

Word Observation Slot CandidateTrain

cheap restaurant foodexpensiveness

1

locale_by_use

11

1 1

food

1 1

1 Test

1

1

Slot Induction

Relation matrices allow nodes to propagate scores to their neighbors in the knowledge graph so that domain-specific wordsslots have higher scores after matrix multiplication

i like

1 1

capability

1

locale_by_use

food expensiveness

seeking

relational_quantitydesiring

Utterance 1i would like a cheap restaurant

hellip hellip

find a restaurant with chinese foodUtterance 2

show me a list of cheap restaurantsTest Utterance

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

42

Semantic Decoding [ACL-IJCNLPrsquo15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015

SLU Model

target=ldquorestaurantrdquoprice=ldquocheaprdquo

ldquocan I have a cheap restaurantrdquoFrame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology InductionFw Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

43

Knowledge Graph Construction Syntactic dependency parsing on utterances

ccomp

amoddobjnsubj det

can i have a cheap restaurantcapability expensiveness locale_by_use

Word-based lexical knowledge graph

Slot-based semantic knowledge graph

restaurantcan

have

i

acheap

w

w

capabilitylocale_by_use expensiveness

s

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

44

Dependency-based word embeddings

Dependency-based slot embeddings

Edge Weight MeasurementSlotWord Embeddings Training (Levy and Goldberg 2014)

can = have =

expensiveness = capability =

can i have a cheap restaurant

ccomp

amoddobjnsubj det

have acapability expensiveness locale_by_use

ccomp

amoddobjnsubj det

Levy and Goldberg Dependency-Based Word Embeddings in Proc of ACL 2014

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

45

Edge Weight Measurement Compute edge weights to represent relation importance

Slot-to-slot semantic relation similarity between slot embeddings Slot-to-slot dependency relation dependency score between slot embeddings Word-to-word semantic relation similarity between word embeddings Word-to-word dependency relation dependency score between word embeddings

+

+

w1

w2

w3

w4

w5

w6

w7

s2

s1 s3

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

46

Word Relation Model Slot Relation Model

word relation matrix

slot relation matrix

times

1

Word Observation Slot Candidate

Train

cheap restaurant foodexpensiveness

1

locale_by_use

11

1 1

food

1 1

1 Test1

1

Slot Induction

Knowledge Graph Propagation Model119877119908

119878119863

119877119904119878119863

Structure information is integrated to make the self-training data more reliable

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

47

Ontology Induction

SLUFw Fs

Structure Learning

times

1

Utterance 1i would like a cheap restaurant

Word Observation Slot Candidate

Train

hellip

cheap restaurant foodexpensiveness

1

locale_by_use

11

find a restaurant with chinese foodUtterance 2

1 1

food

1 1

1

Test1 9790 9585

Ontology Induction

show me a list of cheap restaurantsTest Utterance hidden semantics

2nd Issue unobserved semantics may benefit understanding

Semantic Decoding [ACL-IJCNLPrsquo15]

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

48

Reasoning with Matrix Factorization

Word Relation Model Slot Relation Model

word relation matrix

slot relation matrix

times

1

Word Observation Slot Candidate

Train

cheap restaurant foodexpensiveness

1

locale_by_use

11

1 1

food

1 1

1 Test1

1

9790 9585

93 929805 05

Slot Induction

Feature Model + Knowledge Graph Propagation Model

119877119908119878119863

119877119904119878119863

Idea MF completes a partially-missing matrix based on a low-rank latent semantics assumption which is able to model hidden semantics and more robust to noisy data

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

49

2nd Issue How to model the unobserved hidden semantics

Matrix Factorization (MF) (Rendle et al 2009)

The decomposed matrices represent latent semantics for utterances and wordsslots respectively

The product of two matrices fills the probability of hidden semantics

1

Word Observation Slot Candidate

Train

cheap restaurant foodexpensiveness

1

locale_by_use

11

1 1

food

1 1

1 Test

1

1

9790 9585

93 929805 05

|119932|

|119934|+|119930|

asymp|119932|times119941 119941times (|119934|+|119930|)times

Rendle et al ldquoBPR Bayesian Personalized Ranking from Implicit Feedback in Proc of UAI 2009

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

50

Bayesian Personalized Ranking for MF Model implicit feedback

not treat unobserved facts as negative samples (true or false) give observed facts higher scores than unobserved facts

Objective

1

119891 +iquest iquest119891 minus119891 minus

The objective is to learn a set of well-ranked semantic slots per utterance

119906119909

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

51

Ontology Induction

SLUFw Fs

Structure Learning

times

1

Utterance 1i would like a cheap restaurant

Word Observation Slot Candidate

Train

hellip

cheap restaurant foodexpensiveness

1

locale_by_use

11

find a restaurant with chinese foodUtterance 2

1 1

food

1 1

1

Test1 9790 9585

Ontology Induction

show me a list of cheap restaurantsTest Utterance

Matrix Factorization SLU (MF-SLU)

MF-SLU can estimate probabilities for slot candidates given test utterances

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

52

Semantic Decoding [ACL-IJCNLPrsquo15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015

SLU Model

target=ldquorestaurantrdquoprice=ldquocheaprdquo

ldquocan I have a cheap restaurantrdquoFrame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology InductionFw Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation

Idea utilize the acquired knowledge to decode utterance semantics (fully unsupervised)

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

53

Experimental Setup Dataset Cambridge University SLU Corpus

Restaurant recommendation (WER = 37) 2166 dialogues 15453 utterances dialogue slot addr area food name phone postcode price range task type

Metric MAP of all estimated slot probabilities over all utterancesThe mapping table between induced and reference slots

Henderson et al Discriminative spoken language understanding using word confusion networks in Proc of SLT 2012

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

54

Experiments of Semantic DecodingQuality of Semantics Estimation

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

Approach ASR TranscriptsBaseline

SLUSupport Vector Machine 325 366

Multinomial Logistic Regression 340 388

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

55

Experiments of Semantic DecodingQuality of Semantics Estimation

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

The MF-SLU effectively models implicit information to decode semantics

The structure information further improves the results

Approach ASR Transcripts

Baseline SLU

Support Vector Machine 325 366Multinomial Logistic Regression 340 388

Proposed MF-SLU

Feature Model 376 453

Feature Model +Knowledge Graph Propagation

435

(+279)534

(+376)

the result is significantly better than the MLR with p lt 005 in t-test

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

56

Experiments of Semantic DecodingEffectiveness of Relations

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

In the integrated structure information both semantic and dependency relations are useful for understanding

Approach ASR Transcripts

Feature Model 376 453

Feature + Knowledge Graph Propagation

Semantic 414 516

Dependency 416 490

All 435 (+157) 534 (+179)

the result is significantly better than the MLR with p lt 005 in t-test

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Experiments for Structure LearningRelation Discovery Analysis

Discover inter-slot relations connecting important slot pairs

The reference ontology with the most frequent syntactic dependencies

locale_by_use

food expensiveness

seeking

relational_quantity

PREP_FOR

PREP_FOR

NN AMOD

AMOD

AMODdesiring

DOBJ

type

food pricerange

DOBJ

AMOD AMOD

AMOD

taskarea

PREP_IN

The automatically learned domain ontology aligns well with the reference one

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 57

The data-driven one is more objective while expert-annotated one is more subjective

58

Contributions of Semantic Decoding

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge

MF-SLU for Semantic Decoding is able to: 1) unify the automatically acquired knowledge; 2) adapt to a domain-specific setting; 3) and then allow systems to model implicit semantics for better understanding.

Low- and High-Level Understanding

Semantic concepts for individual utterances do not consider high-level semantics (user intents).

The follow-up behaviors usually correspond to user intents

"can i have a cheap restaurant" -> SLU Model -> price="cheap", target="restaurant", intent=navigation

"i plan to dine in legume tonight" -> SLU Model -> restaurant="legume", time="tonight", intent=reservation

SDS Flowchart - Intent Prediction

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) -> SLU Modeling (Semantic Decoding, Intent Prediction)]

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances for making requests about launching an app

Output: the apps supporting the required functionality

Intent Identification: popular domains in Google Play

"please dial a phone call to alex" -> Skype, Hangout, etc.

Intent Prediction of Mobile Apps [SLT'14c]

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
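The retrieval of app candidates can be approximated very simply: match the request's words against app-store descriptions. In this toy sketch the descriptions are invented, and plain word overlap deliberately stands in for the LM-based IR model actually used:

```python
# Invented app descriptions standing in for Google Play entries.
apps = {
    "Skype":   "make phone calls and video calls to contacts",
    "Hangout": "send messages and make voice calls",
    "Chrome":  "browse websites on the web",
}

def overlap(query, doc):
    # Word-overlap relevance: a deliberate simplification of the
    # LM-based IR model used for retrieving app candidates.
    return len(set(query.lower().split()) & set(doc.split()))

def retrieve(query):
    # Rank apps by how many request words their descriptions share.
    return sorted(apps, key=lambda name: overlap(query, apps[name]),
                  reverse=True)

ranking = retrieve("please dial a phone call to alex")
```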

Input: single-turn request

Output: apps that are able to support the required functionality

Intent Prediction - Single-Turn Request

[Figure: reasoning with feature-enriched MF. Train rows: app descriptions retrieved by IR as candidates (Gmail: "... check and send emails, msgs ...", Outlook: "... your email, calendar, contacts ...") plus self-train utterances; test row: Utterance 1, "i would like to contact alex". Columns: word observations (contact, email, message, ...) enriched with semantic features (communication) and intended apps (Gmail, Outlook, Skype); MF fills in scores for unobserved cells.]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
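The matrix-completion reasoning behind this can be sketched with a tiny low-rank factorization trained with a BPR-style ranking loss over implicit feedback. The toy matrix, latent dimension, and hyperparameters below are illustrative, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy utterance-by-feature matrix (rows: utterances; columns: words + apps).
# 1 = observed fact; 0 = unobserved, NOT assumed false (implicit feedback).
M = np.array([[1, 0, 1, 1, 0],
              [0, 1, 1, 0, 1],
              [1, 0, 1, 0, 0]], dtype=float)
d = 2                                               # latent dimension
U = 0.1 * rng.standard_normal((M.shape[0], d))
V = 0.1 * rng.standard_normal((M.shape[1], d))

lr, reg = 0.05, 0.01
for _ in range(2000):
    # BPR: sample a row, one observed and one unobserved column, and push
    # the observed column's score above the unobserved one's.
    r = rng.integers(M.shape[0])
    pos = rng.choice(np.flatnonzero(M[r] == 1))
    neg = rng.choice(np.flatnonzero(M[r] == 0))
    u = U[r].copy()
    g = 1.0 / (1.0 + np.exp(u @ (V[pos] - V[neg])))  # -d log sigmoid(x) / dx
    U[r] += lr * (g * (V[pos] - V[neg]) - reg * U[r])
    V[pos] += lr * (g * u - reg * V[pos])
    V[neg] -= lr * (g * u + reg * V[neg])

scores = U @ V.T        # completed matrix: unobserved cells now carry scores
```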

Intent Prediction - Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

Challenge: language ambiguity. Useful cues: 1) user preference; 2) app-level contexts.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

"send to vivian" -> Email vs. Message (both Communication): ambiguous without the previous turn.

Idea: behavioral patterns in history can help intent prediction.

Intent Prediction - Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

[Figure: reasoning with feature-enriched MF over dialogues. Train dialogues: "take this photo" -> CAMERA, "tell vivian this is me in the lab" -> IM; "check my grades on website" -> CHROME, "send an email to professor" -> EMAIL. Test dialogue: "take a photo of this" -> CAMERA, "send it to alice" -> IM. Features: lexical observations (photo, check, tell, send, ...), intended app, and behavior history (previous app: null, camera, chrome, email).]

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
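The behavior-history block can be appended to the lexical observations as extra binary columns. A minimal sketch, with an invented vocabulary and app inventory:

```python
def featurize(utterance, prev_app, vocab, apps):
    # Concatenate lexical observations with app-level context from the
    # previous turn, mirroring the behavior-history block of the matrix.
    words = set(utterance.lower().split())
    lexical = [1 if w in words else 0 for w in vocab]
    behavior = [1 if a == prev_app else 0 for a in apps]
    return lexical + behavior

# Invented vocabulary and app inventory for illustration.
vocab = ["take", "photo", "send", "email"]
apps = ["CAMERA", "IM", "EMAIL", "CHROME"]
vec = featurize("send it to alice", "CAMERA", vocab, apps)
```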

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP), LM-based IR model (unsupervised):

Feature Matrix | ASR (LM) | Transcripts (LM)
Word Observation | 25.1 | 26.1

Multi-Turn Interaction, Mean Average Precision (MAP), Multinomial Logistic Regression (supervised):

Feature Matrix | ASR (MLR) | Transcripts (MLR)
Word Observation | 52.1 | 55.5

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):

Feature Matrix | ASR (LM) | ASR (MF-SLU) | Transcripts (LM) | Transcripts (MF-SLU)
Word Observation | 25.1 | 29.2 (+16.2%) | 26.1 | 30.4 (+16.4%)

Multi-Turn Interaction, Mean Average Precision (MAP):

Feature Matrix | ASR (MLR) | ASR (MF-SLU) | Transcripts (MLR) | Transcripts (MF-SLU)
Word Observation | 52.1 | 52.7 (+1.2%) | 55.5 | 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):

Feature Matrix | ASR (LM) | ASR (MF-SLU) | Transcripts (LM) | Transcripts (MF-SLU)
Word Observation | 25.1 | 29.2 (+16.2%) | 26.1 | 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 | - | 33.3 | -
Word + Type-Embedding-Based Semantics | 31.5 | - | 32.9 | -

Multi-Turn Interaction, Mean Average Precision (MAP):

Feature Matrix | ASR (MLR) | ASR (MF-SLU) | Transcripts (MLR) | Transcripts (MF-SLU)
Word Observation | 52.1 | 52.7 (+1.2%) | 55.5 | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 | - | 56.6 | -

Semantic enrichment provides rich cues to improve performance.

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):

Feature Matrix | ASR (LM) | ASR (MF-SLU) | Transcripts (LM) | Transcripts (MF-SLU)
Word Observation | 25.1 | 29.2 (+16.2%) | 26.1 | 30.4 (+16.4%)
Word + Embedding-Based Semantics | 32.0 | 34.2 (+6.8%) | 33.3 | 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics | 31.5 | 32.2 (+2.1%) | 32.9 | 34.0 (+3.4%)

Multi-Turn Interaction, Mean Average Precision (MAP):

Feature Matrix | ASR (MLR) | ASR (MF-SLU) | Transcripts (MLR) | Transcripts (MF-SLU)
Word Observation | 52.1 | 52.7 (+1.2%) | 55.5 | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9 | 55.7 (+3.3%) | 56.6 | 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.

Contributions of Intent Prediction

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) -> SLU Modeling (Semantic Decoding, Intent Prediction)]

Feature-Enriched MF-SLU for Intent Prediction is able to: 1) unify the knowledge at different levels; 2) learn inference relations between various features; 3) and create personalized models by leveraging contextual behaviors.

Personal Intelligent Architecture

[Figure: Reactive Assistance (ASR, LU, Dialog, LG, TTS) and Proactive Assistance (Inferences, User Modeling, Suggestions) on top of Data (back-end databases, services, and client signals), delivered through device/service end-points (Phone, PC, Xbox, Web Browser, Messaging Apps); User Experience: "call taxi".]

Conclusions

The work shows the feasibility of, and the potential for, improving the generalization, maintenance efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.

Future Work

Apply the proposed technology to domain discovery: domains not covered by the current systems but that users are interested in can guide which domains to develop next.

Improve the proposed approach by handling uncertainty: recognition errors from ASR affect SLU modeling, and unreliable knowledge affects knowledge acquisition.

[Figure: deep architecture. Word sequence x = w1, w2, ..., wd -> word vectors lw -> convolutional layer lc (convolution matrix Wc) -> pooling -> utterance vector lf and slot vectors lf -> knowledge graph propagation layer lp (propagation matrix Wp) -> semantic projection matrix Ws -> semantic layer y, yielding relation scores R(U, S1), ..., R(U, Sn) and posteriors P(S1 | U), ..., P(Sn | U) for slot candidates S1 ... Sn given utterance U.]

Towards Unsupervised Deep Learning

Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.
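One way to read this remark: the MF score u·v is a single linear layer, and the deeper scorer inserts a feature/propagation stack before the semantic layer. A schematic sketch, where the shapes, the tanh nonlinearity, and a dense layer standing in for the convolution are all illustrative:

```python
import numpy as np

def mf_score(u_vec, v_vec):
    # One-layer view: the MF score is a bilinear product of latent factors.
    return float(u_vec @ v_vec)

def deep_score(x, W_c, W_p, W_s):
    # Deeper view: feature layer (a dense stand-in for the convolutional
    # layer lc) -> knowledge-graph propagation layer lp -> semantic
    # projection -> semantic layer y (one score per slot candidate).
    l_c = np.tanh(W_c @ x)
    l_p = W_p @ l_c
    return W_s @ l_p

rng = np.random.default_rng(0)
x = rng.standard_normal(4)            # utterance feature vector
W_c = rng.standard_normal((3, 4))
W_p = rng.standard_normal((3, 3))
W_s = rng.standard_normal((2, 3))     # two slot candidates
y = deep_score(x, W_c, W_p, W_s)
```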

Take Home Message

Big data is available without annotations. Challenge: how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI: language -> action, e.g., understand voice to control music, lights, etc.; teach the system to let friends in by face recognition, etc.

Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

Q & A: Thanks for your attention!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.

The MF-SLU effectively models implicit information to decode semantics

The structure information further improves the results

Approach ASR Transcripts

Baseline SLU

Support Vector Machine 325 366Multinomial Logistic Regression 340 388

Proposed MF-SLU

Feature Model 376 453

Feature Model +Knowledge Graph Propagation

435

(+279)534

(+376)

the result is significantly better than the MLR with p lt 005 in t-test

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

56

Experiments of Semantic DecodingEffectiveness of Relations

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

In the integrated structure information both semantic and dependency relations are useful for understanding

Approach ASR Transcripts

Feature Model 376 453

Feature + Knowledge Graph Propagation

Semantic 414 516

Dependency 416 490

All 435 (+157) 534 (+179)

the result is significantly better than the MLR with p lt 005 in t-test

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Experiments for Structure LearningRelation Discovery Analysis

Discover inter-slot relations connecting important slot pairs

The reference ontology with the most frequent syntactic dependencies

locale_by_use

food expensiveness

seeking

relational_quantity

PREP_FOR

PREP_FOR

NN AMOD

AMOD

AMODdesiring

DOBJ

type

food pricerange

DOBJ

AMOD AMOD

AMOD

taskarea

PREP_IN

The automatically learned domain ontology aligns well with the reference one

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 57

The data-driven one is more objective while expert-annotated one is more subjective

58

Contributions of Semantic Decoding

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge

MF-SLU for Semantic Decoding is able to1) unify the automatically

acquired knowledge2) adapt to a domain-

specific setting 3) and then allows

systems to model implicit semantics for better understanding

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

59

Low- and High-Level Understanding Semantic concepts for individual utterances do not consider high-level semantics (user intents)

The follow-up behaviors usually correspond to user intents

price=ldquocheaprdquo target=ldquorestaurantrdquo

SLU Model

ldquocan i have a cheap restaurantrdquo

intent=navigation

restaurant=ldquolegumerdquo time=ldquotonightrdquo

SLU Model

ldquoi plan to dine in legume tonightrdquo

intent=reservation

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

60

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart ndash Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

61

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

62

[Chen amp Rudnicky SLT 2014 Chen et al ICMI 2015]

Input spoken utterances for making requests about launching an app

Output the apps supporting the required functionality

Intent Identification popular domains in Google Play

please dial a phone call to alex

Skype Hangout etc

Intent Prediction of Mobile Apps [SLTrsquo14c]

Chen and Rudnicky Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings in Proc of SLT 2014

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

63

Input single-turn request

Output apps that are able to support the required functionality

Intent Prediction ndash Single-Turn Request

1

Enriched Semantics

communication

90

1

1

Utterance 1 i would like to contact alex

Word Observation Intended App

hellip hellip

contact message Gmail Outlook Skypeemail

Test

90

Reasoning with Feature-Enriched MF

Train

hellip your email calendar contactshellip

hellip check and send emails msgs hellip

Outlook

Gmail

IR for app candidates

App Desc

Self-Train Utterance

Test Utterance

1

1

1

1

1

1

1

1 1

1

1 90 85 97 95

FeatureEnrichment

Utterance 1 i would like to contact alexhellip

1

1

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

64

Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]

Input multi-turn interaction

Output apps the user plans to launch

Challenge language ambiguity1) User preference2) App-level contexts

Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom

send to vivianvs

Email MessageCommunication

Idea Behavioral patterns in history can help intent prediction

previous turn

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

65

Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]

Input multi-turn interaction

Output apps the user plans to launch

1

Lexical Intended Appphoto check camera IMtell

take this phototell vivian this is me in the lab

CAMERA

IMTrainDialogue

check my grades on websitesend an email to professor

hellip

CHROME

EMAIL

send

Behavior History

null camera

85

take a photo of thissend it to alice

CAMERA

IM

hellip

email

1

1

1 1

1

1 70

chrome

1

1

1

1

1

1

chrome email

11

1

1

95

80 55

User UtteranceIntended

App

Reasoning with Feature-Enriched MF

Test Dialogue

take a photo of thissend it to alicehellip

Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

66

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 261

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 555

LM-Based IR Model (unsupervised)

Multinomial Logistic Regression (supervised)

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

67

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)

Modeling hidden semantics helps intent prediction especially for noisy data

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

68

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 333Word + Type-Embedding-Based Semantics 315 329

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 566

Semantic enrichment provides rich cues to improve performance

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

69

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 342 (+68) 333 333 (-02)Word + Type-Embedding-Based Semantics 315 322 (+21) 329 340 (+34)

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 557 (+33) 566 577 (+19)

Intent prediction can benefit from both hidden information and low-level semantics

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

70

Contributions of Intent Prediction

[diagram: Knowledge Acquisition (Ontology Induction, Structure Learning) feeding SLU Modeling (Semantic Decoding, Intent Prediction)]

Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.

71

Personal Intelligent Architecture

[architecture diagram]
Reactive assistance: ASR, LU, Dialog, LG, TTS
Proactive assistance: inferences, user modeling, suggestions
Data: back-end data bases, services, and client signals
Device/service end-points: phone, PC, Xbox, web browser, messaging apps
User experience: "call taxi"

72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

73

Conclusions

The work shows the feasibility of and the potential for improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.

74

Future Work

Apply the proposed technology to domain discovery: find domains that are not covered by the current systems but that users are interested in, to guide the next domains to develop.

Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition, both of which propagate into SLU modeling.

75

Towards Unsupervised Deep Learning

[network diagram: the word sequence x = w1, w2, ..., wd passes through a word vector layer lw, a convolutional layer lc (convolution matrix Wc), a pooling operation producing utterance and slot vectors lf, a knowledge graph propagation layer lp (propagation matrix Wp), and a semantic layer y (semantic projection matrix Ws); the network outputs relevance scores R(U, S1), ..., R(U, Sn) and posterior probabilities P(S1 | U), ..., P(Sn | U) over slot candidates S1, ..., Sn for utterance U]

Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.

76

Take Home Message

Available big data w/o annotations. Challenge: how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI: language maps to action, e.g., understand voice commands to control music, lights, etc., or teach the system to let friends in by face recognition, etc.

Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A
THANKS FOR YOUR ATTENTION

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.

Page 42: Statistical Learning from Dialogues for Intelligent Assistants

42

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances
Output: semantic concepts included in each individual utterance

[flowchart: an unlabeled collection is processed by frame-semantic parsing; ontology induction yields the feature model (Fw, Fs); structure learning over a lexical KG and a semantic KG yields the word and slot relation models (Rw, Rs) of the knowledge graph propagation model; MF-SLU (SLU modeling by matrix factorization) maps "can I have a cheap restaurant" to the semantic representation target="restaurant", price="cheap"]

(Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.)

43

Knowledge Graph Construction: syntactic dependency parsing on utterances

[example parse: "can i have a cheap restaurant" with dependencies nsubj, ccomp, dobj, det, amod; words evoke the slots capability ("can"), expensiveness ("cheap"), and locale_by_use ("restaurant")]

Word-based lexical knowledge graph: word nodes (can, i, have, a, cheap, restaurant) connected by word-word relations.
Slot-based semantic knowledge graph: slot nodes (capability, expensiveness, locale_by_use) connected by slot-slot relations.

44

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

Dependency-based word embeddings: train a vector for each word (e.g., "can", "have") from its dependency contexts in parsed utterances such as "can i have a cheap restaurant" (nsubj, ccomp, dobj, det, amod).

Dependency-based slot embeddings: train a vector for each slot (e.g., expensiveness, capability) from the same dependency structure with words replaced by their slots (capability, expensiveness, locale_by_use).

(Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014.)

45

Edge Weight Measurement: compute edge weights to represent relation importance

Slot-to-slot semantic relation: similarity between slot embeddings.
Slot-to-slot dependency relation: dependency score between slot embeddings.
Word-to-word semantic relation: similarity between word embeddings.
Word-to-word dependency relation: dependency score between word embeddings.

[graph illustration: word nodes w1-w7 and slot nodes s1-s3 connected by weighted semantic and dependency edges]
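As a concrete sketch of the similarity-based edge weights above, cosine similarity between two embeddings can serve as the semantic-relation weight. This is only an illustration: the embedding values below are made-up toy vectors, not trained dependency-based embeddings.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy dependency-based slot embeddings (hypothetical numbers for illustration).
emb = {
    "expensiveness": [0.9, 0.1, 0.3],
    "locale_by_use": [0.8, 0.2, 0.4],
    "capability":    [0.1, 0.9, 0.2],
}

# Semantic-relation edge weight between two slot nodes in the semantic KG.
weight = cosine(emb["expensiveness"], emb["locale_by_use"])
```

The same scoring would apply to word-word edges in the lexical knowledge graph.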

46

Knowledge Graph Propagation Model

[matrix illustration: the word-observation/slot-candidate matrix (e.g., words "cheap", "restaurant", "food"; slots expensiveness, locale_by_use, food) over train and test utterances is multiplied by a word relation matrix Rw and a slot relation matrix Rs for slot induction]

Structure information is integrated to make the self-training data more reliable.

47

Semantic Decoding [ACL-IJCNLP'15]

[matrix illustration: ontology induction fills word observations and slot candidates for training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food"; for the test utterance "show me a list of cheap restaurants", some semantics remain hidden, with induced probabilities such as .97, .90, .95, .85]

2nd issue: unobserved hidden semantics may benefit understanding.

48

Reasoning with Matrix Factorization: Feature Model + Knowledge Graph Propagation Model

[matrix illustration: the word/slot observation matrix is combined with the word relation matrix Rw and the slot relation matrix Rs; MF fills unobserved cells with estimated probabilities (e.g., .97, .90, .95, .85, .93, .92, .98, .05) for slot induction]

Idea: MF completes a partially-missing matrix under a low-rank latent semantics assumption, which can model hidden semantics and is more robust to noisy data.

49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and words/slots, respectively.

The product of the two matrices fills in the probability of the hidden semantics.

[matrix illustration: the |U| × (|W|+|S|) matrix of word observations and slot candidates over train and test utterances is approximated by the product of a |U| × d utterance matrix and a d × (|W|+|S|) word/slot matrix; unobserved cells receive estimated probabilities such as .97, .90, .95, .85, .93, .92, .98]

(Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.)
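The low-rank completion idea can be sketched in a few lines of NumPy. This is a toy illustration with invented data and a plain squared-error objective (the paper trains with BPR instead): factor the |U| × (|W|+|S|) observation matrix into |U| × d and d × (|W|+|S|) factors over the observed cells only, and read predictions for hidden cells off their product.

```python
import numpy as np

# Columns: words [cheap, chinese, restaurant] + slots [expensiveness, food, locale_by_use].
# Rows: two training utterances and one test utterance (toy data, not the corpus).
M = np.array([
    [1, 0, 1, 1, 0, 1],   # "... cheap restaurant" -> expensiveness, locale_by_use
    [0, 1, 1, 0, 1, 1],   # "... restaurant with chinese food" -> food, locale_by_use
    [1, 0, 1, 1, 0, 1],   # test utterance; its slot cells are hidden below
], dtype=float)

W = np.ones_like(M)
W[2, 3] = W[2, 5] = 0.0   # hide the test utterance's slot cells

d = 2                      # latent dimension
rng = np.random.default_rng(0)
P = 0.1 * rng.standard_normal((M.shape[0], d))   # |U| x d utterance factors
Q = 0.1 * rng.standard_normal((d, M.shape[1]))   # d x (|W|+|S|) word/slot factors

# Gradient descent on squared error over observed cells only.
for _ in range(3000):
    E = W * (M - P @ Q)
    P += 0.05 * E @ Q.T
    Q += 0.05 * P.T @ E

pred = P @ Q   # the product fills in scores for the hidden slot cells
```

Because the test row shares its word pattern with the first training row, the completed matrix assigns high scores to its hidden slot cells.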

50

Bayesian Personalized Ranking for MF: model implicit feedback

Do not treat unobserved facts as negative samples (true or false); instead, give observed facts higher scores than unobserved facts.

Objective: for each utterance u_x, maximize the sum of ln σ(f⁺ − f⁻) over pairs of an observed fact f⁺ and an unobserved fact f⁻, where σ is the sigmoid function. The objective is to learn a set of well-ranked semantic slots per utterance.
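The pairwise ranking objective above can be sketched as a stochastic update. This is a generic BPR-style step with made-up latent vectors and learning rate, not the paper's training code: each step pushes the score of an observed fact above that of an unobserved one.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bpr_step(u, f_pos, f_neg, lr=0.05):
    """One BPR update: raise the score of the observed fact f_pos over the
    unobserved fact f_neg for the utterance's latent vector u."""
    x = u @ f_pos - u @ f_neg      # score difference between the pair
    g = sigmoid(-x)                # gradient weight: large when mis-ranked
    u += lr * g * (f_pos - f_neg)  # gradient of ln(sigmoid(x)) w.r.t. u
    f_pos += lr * g * u
    f_neg -= lr * g * u
    return u, f_pos, f_neg

rng = np.random.default_rng(1)
u, fp, fn = (0.1 * rng.standard_normal(4) for _ in range(3))
for _ in range(200):
    u, fp, fn = bpr_step(u, fp, fn)
```

After training, the observed fact outscores the unobserved one; the sigmoid weight makes the update self-limiting once the pair is well ranked.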

51

Matrix Factorization SLU (MF-SLU)

[matrix illustration: ontology induction fills word observations and slot candidates for the training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" and the test utterance "show me a list of cheap restaurants"; MF estimates slot probabilities such as .97, .90, .95, .85]

MF-SLU can estimate probabilities for slot candidates given test utterances.

52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances
Output: semantic concepts included in each individual utterance

[flowchart as before: frame-semantic parsing and ontology induction feed the feature model (Fw, Fs); structure learning over the lexical and semantic knowledge graphs feeds the knowledge graph propagation model (Rw, Rs); MF-SLU produces the semantic representation]

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).

(Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.)

53

Experimental Setup

Dataset: Cambridge University SLU Corpus: restaurant recommendation (WER = 37%), 2166 dialogues, 15453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type.

Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots.

(Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.)
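For reference, mean average precision over ranked slot lists can be computed as below. This is a standard MAP implementation with toy rankings, not output from the corpus.

```python
def average_precision(ranked, relevant):
    """AP of one ranked list: mean of precision@k at each relevant hit."""
    hits, precisions = 0, []
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(rankings):
    """MAP over (ranked_slots, reference_slots) pairs, one per utterance."""
    aps = [average_precision(r, rel) for r, rel in rankings]
    return sum(aps) / len(aps)

# Toy example: two utterances with slots ranked by estimated probability.
data = [
    (["expensiveness", "food", "locale_by_use"], {"expensiveness", "locale_by_use"}),
    (["food", "area", "pricerange"], {"food"}),
]
score = mean_average_precision(data)  # (5/6 + 1) / 2 = 11/12
```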

54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                      | ASR  | Transcripts
Baseline SLU: Support Vector Machine          | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8

55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                                     | ASR           | Transcripts
Baseline SLU: Support Vector Machine                         | 32.5          | 36.6
Baseline SLU: Multinomial Logistic Regression                | 34.0          | 38.8
Proposed MF-SLU: Feature Model                               | 37.6          | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)

The MF-SLU effectively models implicit information to decode semantics, and the structure information further improves the results. The results are significantly better than the MLR baseline with p < 0.05 in a t-test.

56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                          | ASR           | Transcripts
Feature Model                                     | 37.6          | 45.3
Feature + Knowledge Graph Propagation: Semantic   | 41.4          | 51.6
Feature + Knowledge Graph Propagation: Dependency | 41.6          | 49.0
Feature + Knowledge Graph Propagation: All        | 43.5 (+15.7%) | 53.4 (+17.9%)

In the integrated structure information, both semantic and dependency relations are useful for understanding. The results are significantly better than the MLR baseline with p < 0.05 in a t-test.

Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs.

The reference ontology with the most frequent syntactic dependencies:

[ontology diagram: slots seeking, desiring, type, task, area, locale_by_use, food, expensiveness, pricerange, and relational_quantity linked by dependency relations such as PREP_FOR, PREP_IN, NN, AMOD, and DOBJ]

The automatically learned domain ontology aligns well with the reference one. The data-driven ontology is more objective, while the expert-annotated one is more subjective.

57

58

Contributions of Semantic Decoding

[diagram: Knowledge Acquisition (Ontology Induction, Structure Learning) feeding SLU Modeling (Semantic Decoding, Intent Prediction)]

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.

59

Low- and High-Level Understanding

Semantic concepts for individual utterances do not consider high-level semantics (user intents). The follow-up behaviors usually correspond to user intents.

[examples: "can i have a cheap restaurant" → SLU Model → price="cheap", target="restaurant", intent=navigation; "i plan to dine in legume tonight" → SLU Model → restaurant="legume", time="tonight", intent=reservation]

60

SDS Flowchart – Intent Prediction

[flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) and SLU Modeling (Semantic Decoding, Intent Prediction), with Intent Prediction highlighted]

61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

62

Intent Prediction of Mobile Apps [SLT'14c]
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality

Intent identification covers popular domains in Google Play.

Example: "please dial a phone call to alex" → Skype, Hangout, etc.

(Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.)

63

Intent Prediction – Single-Turn Request

Input: single-turn request
Output: apps that are able to support the required functionality

[illustration of reasoning with feature-enriched MF: for the utterance "i would like to contact alex", IR retrieves app candidates from app descriptions (e.g., Outlook: "… your email calendar contacts …", Gmail: "… check and send emails msgs …"); feature enrichment adds semantics such as "communication" (weight .90); word observations (contact, message, email) and intended apps (Gmail, Outlook, Skype) over self-train and test utterances form one matrix, and MF fills in probabilities such as .90, .85, .97, .95]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
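The "IR for app candidates" step (retrieving apps whose store descriptions match the request) can be sketched with simple bag-of-words overlap scoring. The descriptions below are invented snippets and the scoring is a plain overlap ratio; a real system would use a weighted retrieval model such as the LM-based IR model in the experiments.

```python
# Toy app descriptions (invented snippets, not real store text).
apps = {
    "Outlook": "your email calendar contacts",
    "Gmail":   "check and send emails msgs email",
    "Camera":  "take photos and record video",
}

def score(query, desc):
    # Fraction of query words that also appear in the app description.
    q, d = set(query.lower().split()), set(desc.lower().split())
    return len(q & d) / len(q)

def app_candidates(query, k=2):
    # Rank apps by overlap with the spoken request; keep the top k.
    ranked = sorted(apps, key=lambda a: score(query, apps[a]), reverse=True)
    return ranked[:k]

cands = app_candidates("send an email to alex")
```

The retrieved candidates then populate the "Intended App" columns of the feature-enriched matrix.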

64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch

Challenge: language ambiguity, addressed with 1) user preference and 2) app-level contexts.

[example: "send to vivian" could mean Email or Message within the Communication domain; the previous turn helps disambiguate]

Idea: behavioral patterns in history can help intent prediction.

(Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com.)

65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch

[illustration of reasoning with feature-enriched MF: training dialogues pair user utterances with intended apps ("take this photo" → CAMERA; "tell vivian this is me in the lab" → IM; "check my grades on websites" → CHROME; "send an email to professor" → EMAIL); lexical features (photo, check, camera, tell, send, email) and behavior-history features (null, camera, chrome, email) are unified in one matrix, and MF fills in probabilities (e.g., .85, .70, .95, .80, .55) for the test dialogue "take a photo of this / send it to alice" → CAMERA, IM]

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.

(Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com.)

66

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):

Feature Matrix   | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1    |             | 26.1            |

Multi-Turn Interaction, Mean Average Precision (MAP):

Feature Matrix   | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1     |             | 55.5             |

LM: LM-based IR model (unsupervised); MLR: multinomial logistic regression (supervised)

67

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)

Modeling hidden semantics helps intent prediction especially for noisy data

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

68

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 333Word + Type-Embedding-Based Semantics 315 329

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 566

Semantic enrichment provides rich cues to improve performance

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

69

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 342 (+68) 333 333 (-02)Word + Type-Embedding-Based Semantics 315 322 (+21) 329 340 (+34)

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 557 (+33) 566 577 (+19)

Intent prediction can benefit from both hidden information and low-level semantics

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

70

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction Feature-Enriched MF-SLU for

Intent Prediction is able to1) unify the knowledge at

different levels2) learn inference relations

between various features

3) and create personalized models by leveraging contextual behaviors

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

71

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data Back-end Data

Bases Services and Client Signals

DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)

User Experienceldquocall taxirdquo

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

72

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

73

Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

74

Future Work Apply the proposed technology to domain discovery

not covered by the current systems but users are interested in guide the next developed domains

Improve the proposed approach by handling the uncertainty

SLUSLUModelingASR Knowledge

Acquisitionrecognition

errorsunreliable knowledge

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

75

d d d

U S1 S2

P(S1 | U) P(S2 | U)

hellip

Semantic RelationPosterior Probability

Utterance

Slot Candidate

hellip

w1 w2 wdWord Sequence x

Word Vector lw

Pooling Operation

R(U S1) R(U S2)

Knowledge Graph Propagation Matrix Wp

Semantic Projection Matrix Ws

Semantic Layer y

Knowledge Graph Propagation Layer lp

d

Sn

P(Sn | U)

Utterance Vector lf

hellip

R(U Sn)

Slot Vector lf

Convolution Matrix Wc

Convolutional Layer lc

Towards Unsupervised Deep Learning

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning

76

Take Home Message Available big data wo annotations

Challenge how to acquire and organize important knowledge and further utilize it for applications

Language understanding for AI

language action understand voice to control music lights etc teach to let friends in by face recognition etc

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q amp ATHANKS FOR YOUR ATTENTIONS

bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)

bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

  • Statistical Learning from Dialogues for Intelligence Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants
  • Why do we need them
  • Why do we need them (2)
  • Why do companies care
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymaxrsquos intelligence
  • SDS Architecture
  • Interaction Example
  • SDS Process ndash Available Domain Ontology
  • SDS Process ndash Available Domain Ontology (2)
  • SDS Process ndash Available Domain Ontology (3)
  • SDS Process ndash Spoken Language Understanding (SLU)
  • SDS Process ndash Spoken Language Understanding (SLU) (2)
  • SDS Process ndash Dialogue Management (DM)
  • SDS Process ndash Dialogue Management (DM) (2)
  • SDS Process ndash Dialogue Management (DM) (3)
  • SDS Process ndash Dialogue Management (DM) (4)
  • SDS Process ndash Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture ndash Contributions
  • SDS Flowchart
  • SDS Flowchart ndash Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLPrsquo15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a]
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
  • 1st Issue How to adapt generic slots to a domain-specific sett
  • Semantic Decoding [ACL-IJCNLPrsquo15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement SlotWord Embeddings Training (Levy and
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLPrsquo15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue How to model the unobserved hidden semantics Matrix
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLPrsquo15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding Quality of Semantics Estimatio
  • Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
  • Experiments of Semantic Decoding Effectiveness of Relations
  • Experiments for Structure Learning Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart ndash Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLTrsquo14c]
  • Intent Prediction ndash Single-Turn Request
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q amp A
Page 43: Statistical Learning from Dialogues for Intelligent Assistants

43

Knowledge Graph Construction Syntactic dependency parsing on utterances

ccomp

amoddobjnsubj det

can i have a cheap restaurantcapability expensiveness locale_by_use

Word-based lexical knowledge graph

Slot-based semantic knowledge graph

restaurantcan

have

i

acheap

w

w

capabilitylocale_by_use expensiveness

s

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

44

Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg, 2014)

Dependency-based word embeddings: each word (e.g., "can", "have") is embedded from its dependency contexts in parses such as "can i have a cheap restaurant" (nsubj, ccomp, det, amod, dobj)

Dependency-based slot embeddings: each slot (e.g., "capability", "expensiveness", "locale_by_use") is embedded the same way, from the parse with slot-bearing words replaced by their slots

Levy and Goldberg, "Dependency-Based Word Embeddings," in Proc. of ACL, 2014


45

Edge Weight Measurement: compute edge weights to represent relation importance

• Slot-to-slot semantic relation: similarity between slot embeddings
• Slot-to-slot dependency relation: dependency score between slot embeddings
• Word-to-word semantic relation: similarity between word embeddings
• Word-to-word dependency relation: dependency score between word embeddings

(Figure: word nodes w1, ..., w7 and slot nodes s1, s2, s3 connected by weighted semantic and dependency edges)
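The similarity-based weights can be computed directly from the embeddings. A minimal sketch with toy 3-d vectors (real embeddings would come from the dependency-based training above; the words and values here are assumptions):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy embeddings for illustration only.
emb = {
    "cheap":       [0.9, 0.1, 0.0],
    "inexpensive": [0.8, 0.2, 0.1],
    "can":         [0.0, 0.9, 0.4],
}

def semantic_edge_weight(a, b, emb):
    """Weight of a semantic-relation edge between graph nodes a and b."""
    return cosine(emb[a], emb[b])
```

With these vectors, semantically close nodes ("cheap", "inexpensive") get a much larger edge weight than unrelated ones ("cheap", "can").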


46

Knowledge Graph Propagation Model

Word Relation Model: word relation matrix R_w^(SD)
Slot Relation Model: slot relation matrix R_s^(SD)

(Figure: the word-observation / slot-candidate matrix, with train and test utterances as rows and words such as "cheap", "restaurant", "food" plus induced slot candidates such as "expensiveness", "locale_by_use", "food" as columns, is multiplied by the word and slot relation matrices for slot induction)

Structure information is integrated to make the self-training data more reliable


47

Semantic Decoding [ACL-IJCNLP'15]

2nd Issue: unobserved semantics may benefit understanding

(Figure: ontology induction and structure learning feed the word-observation / slot-candidate matrix. Train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" contribute observed 1s; the test utterance "show me a list of cheap restaurants" carries hidden semantics, with estimated slot probabilities such as .97, .90, .95, .85)


48

Reasoning with Matrix Factorization

Feature Model + Knowledge Graph Propagation Model

(Figure: the word-observation / slot-candidate matrix is multiplied by the word relation matrix R_w^(SD) and the slot relation matrix R_s^(SD); observed train/test cells are 1s, and MF fills the remaining cells with estimated values such as .97, .90, .95, .85, .93, .92, .98, .05)

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which lets it model hidden semantics and makes it more robust to noisy data


49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

• The decomposed matrices represent latent semantics for utterances and words/slots respectively
• The product of the two matrices fills in the probability of hidden semantics

M (|U| × (|W| + |S|)) ≈ U (|U| × d) × V (d × (|W| + |S|))

(Figure: the word-observation / slot-candidate matrix, with observed 1s and MF-estimated probabilities, factorized into a |U| × d utterance matrix and a d × (|W| + |S|) word/slot matrix)

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009


50

Bayesian Personalized Ranking for MF: model implicit feedback

• do not treat unobserved facts as negative samples (true or false)
• give observed facts higher scores than unobserved facts

Objective: for each utterance u with observation x, maximize Σ ln σ(f⁺ − f⁻) over pairs of observed slots f⁺ and unobserved slots f⁻

The objective is to learn a set of well-ranked semantic slots per utterance
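The pairwise objective above leads to a simple stochastic update: pick an observed/unobserved slot pair and follow the gradient of -ln σ(f⁺ − f⁻). A minimal sketch (the toy dimensions, learning rate, and iteration count are assumptions):

```python
import math, random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def bpr_step(U, V, u, pos, neg, lr=0.1, reg=0.01):
    """One BPR update: raise score(u, pos) above score(u, neg).
    Follows the gradient of -ln sigmoid(f+ - f-) in the latent factors."""
    x = dot(U[u], V[pos]) - dot(U[u], V[neg])
    g = 1.0 / (1.0 + math.exp(x))  # = sigmoid(-x), shrinks as the margin grows
    for k in range(len(U[u])):
        uk, pk, nk = U[u][k], V[pos][k], V[neg][k]
        U[u][k] += lr * (g * (pk - nk) - reg * uk)
        V[pos][k] += lr * (g * uk - reg * pk)
        V[neg][k] += lr * (-g * uk - reg * nk)

# One utterance, slot 0 observed (f+), slot 1 unobserved (f-).
rng = random.Random(0)
U = [[rng.gauss(0, 0.1) for _ in range(2)]]
V = [[rng.gauss(0, 0.1) for _ in range(2)] for _ in range(2)]
for _ in range(500):
    bpr_step(U, V, 0, pos=0, neg=1)
```

After training, the observed slot outscores the unobserved one, which is exactly the "well-ranked slots per utterance" the objective asks for.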


51

Matrix Factorization SLU (MF-SLU)

(Figure: ontology induction and structure learning feed the word-observation / slot-candidate matrix; train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" contribute observed 1s, and for the test utterance "show me a list of cheap restaurants" the model fills in slot probabilities such as .97, .90, .95, .85)

MF-SLU can estimate probabilities for slot candidates given test utterances


52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances
Output: semantic concepts included in each individual utterance

"can I have a cheap restaurant" → SLU Model → target="restaurant", price="cheap"

(Flowchart: an unlabeled collection goes through frame-semantic parsing and ontology induction over a semantic KG to build the feature model (Fw, Fs); structure learning over the lexical and semantic KGs builds the word and slot relation models (Rw, Rs) for knowledge graph propagation; together these form MF-SLU, SLU modeling by matrix factorization, which produces the semantic representation)

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015


53

Experimental Setup

Dataset: Cambridge University SLU Corpus
• Restaurant recommendation (WER = 37%), 2,166 dialogues, 15,453 utterances
• dialogue slots: addr, area, food, name, phone, postcode, price range, task, type

Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012
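The MAP metric can be made concrete: compute the average precision of each utterance's ranked slot list against its reference slot set (after mapping induced slots to reference slots), then average over utterances. A minimal sketch:

```python
def average_precision(ranked_slots, reference_slots):
    """AP of one utterance's ranked slot list vs. its reference slot set."""
    if not reference_slots:
        return 0.0
    hits, ap = 0, 0.0
    for rank, slot in enumerate(ranked_slots, start=1):
        if slot in reference_slots:
            hits += 1
            ap += hits / rank
    return ap / len(reference_slots)

def mean_average_precision(utterances):
    """utterances: list of (ranked_slots, reference_slots) pairs."""
    return sum(average_precision(r, ref) for r, ref in utterances) / len(utterances)
```

Ranking the reference slots near the top of each utterance's list is what drives MAP toward 1.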


54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                              ASR    Transcripts
Baseline SLU
  Support Vector Machine              32.5   36.6
  Multinomial Logistic Regression     34.0   38.8


55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                         ASR             Transcripts
Baseline SLU
  Support Vector Machine                         32.5            36.6
  Multinomial Logistic Regression                34.0            38.8
Proposed MF-SLU
  Feature Model                                  37.6            45.3
  Feature Model + Knowledge Graph Propagation    43.5 (+27.9%)   53.4 (+37.6%)

The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results

(the result is significantly better than the MLR baseline with p < 0.05 in t-test)


56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

Approach                                  ASR             Transcripts
Feature Model                             37.6            45.3
Feature + Knowledge Graph Propagation
  Semantic relations                      41.4            51.6
  Dependency relations                    41.6            49.0
  All                                     43.5 (+15.7%)   53.4 (+17.9%)

In the integrated structure information, both semantic and dependency relations are useful for understanding

(the result is significantly better than the MLR baseline with p < 0.05 in t-test)


Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs

(Figure: the reference ontology, with slots such as locale_by_use, food, expensiveness, seeking, relational_quantity, desiring, type, pricerange, area, and task, connected by the most frequent syntactic dependencies: AMOD, NN, DOBJ, PREP_FOR, PREP_IN)

The automatically learned domain ontology aligns well with the reference one

57

The data-driven ontology is more objective, while the expert-annotated one is more subjective

58

Contributions of Semantic Decoding

(Flowchart: Knowledge Acquisition = Ontology Induction + Structure Learning; SLU Modeling = Semantic Decoding + Intent Prediction)

Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge

MF-SLU for Semantic Decoding is able to:
1) unify the automatically acquired knowledge
2) adapt to a domain-specific setting
3) and then allow systems to model implicit semantics for better understanding


59

Low- and High-Level Understanding

Semantic concepts for individual utterances do not consider high-level semantics (user intents); the follow-up behaviors usually correspond to user intents

"can i have a cheap restaurant" → SLU Model → price="cheap", target="restaurant", intent=navigation

"i plan to dine in legume tonight" → SLU Model → restaurant="legume", time="tonight", intent=reservation


60

SDS Flowchart – Intent Prediction

(Flowchart: Knowledge Acquisition = Ontology Induction + Structure Learning; SLU Modeling = Semantic Decoding + Intent Prediction, with Intent Prediction highlighted)


61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


62

Intent Prediction of Mobile Apps [SLT'14c]
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality

Intent Identification: popular domains in Google Play

"please dial a phone call to alex" → Skype, Hangout, etc.

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014


63

Intent Prediction – Single-Turn Request

Input: single-turn request
Output: apps that are able to support the required functionality

(Figure: reasoning with feature-enriched MF over a word-observation / intended-app matrix. Train rows come from app descriptions retrieved as app candidates by IR, e.g., Gmail: "... your email, calendar, contacts ...", Outlook: "... check and send emails, msgs ...", plus self-train utterances. The test utterance "i would like to contact alex" is enriched with the semantic feature "communication" (.90), and scores such as .85, .97, .95 are estimated over intended apps such as Gmail, Outlook, and Skype)

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents


64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch

Challenge: language ambiguity
1) User preference
2) App-level contexts

Example: for "send to vivian", Email vs. Message is ambiguous; the previous turn (a Communication app) provides context

Idea: behavioral patterns in history can help intent prediction

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com


65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch

(Figure: reasoning with feature-enriched MF over lexical features ("photo", "check", "camera", "tell", "send", "email", "chrome"), behavior history, and intended apps. Train dialogues such as "take this photo / tell vivian this is me in the lab" (CAMERA, IM) and "check my grades on websites / send an email to professor" (CHROME, EMAIL) contribute observed 1s; for the test dialogue "take a photo of this / send it to alice" (CAMERA, IM) the model estimates app scores such as .85, .70, .95, .80, .55)

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com


66

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix       ASR: LM / MF-SLU     Transcripts: LM / MF-SLU
Word Observation     25.1 / -             26.1 / -

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix       ASR: MLR / MF-SLU    Transcripts: MLR / MF-SLU
Word Observation     52.1 / -             55.5 / -

LM: LM-based IR model (unsupervised); MLR: multinomial logistic regression (supervised)


67

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix       ASR: LM / MF-SLU          Transcripts: LM / MF-SLU
Word Observation     25.1 / 29.2 (+16.2%)      26.1 / 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix       ASR: MLR / MF-SLU         Transcripts: MLR / MF-SLU
Word Observation     52.1 / 52.7 (+1.2%)       55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data


68

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                            ASR: LM / MF-SLU          Transcripts: LM / MF-SLU
Word Observation                          25.1 / 29.2 (+16.2%)      26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics          32.0 / -                  33.3 / -
Word + Type-Embedding-Based Semantics     31.5 / -                  32.9 / -

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix               ASR: MLR / MF-SLU         Transcripts: MLR / MF-SLU
Word Observation             52.1 / 52.7 (+1.2%)       55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns   53.9 / -                  56.6 / -

Semantic enrichment provides rich cues to improve performance


69

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                            ASR: LM / MF-SLU          Transcripts: LM / MF-SLU
Word Observation                          25.1 / 29.2 (+16.2%)      26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics          32.0 / 34.2 (+6.8%)       33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics     31.5 / 32.2 (+2.1%)       32.9 / 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix               ASR: MLR / MF-SLU         Transcripts: MLR / MF-SLU
Word Observation             52.1 / 52.7 (+1.2%)       55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns   53.9 / 55.7 (+3.3%)       56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics


70

Contributions of Intent Prediction

(Flowchart: Knowledge Acquisition = Ontology Induction + Structure Learning; SLU Modeling = Semantic Decoding + Intent Prediction)

Feature-Enriched MF-SLU for Intent Prediction is able to:
1) unify the knowledge at different levels
2) learn inference relations between various features
3) and create personalized models by leveraging contextual behaviors


71

Personal Intelligent Architecture

Reactive Assistance: ASR, LU, Dialog, LG, TTS (user experience: "call taxi")

Proactive Assistance: Inferences, User Modeling, Suggestions

Data Back-end: Data Bases, Services and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)


72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


73

Conclusions

• The work shows the feasibility of, and the potential for, improving the generalization, maintenance, efficiency, and scalability of SDSs
• The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies
• The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:
  - better semantic representations for individual utterances
  - better high-level intent prediction about follow-up behaviors


74

Future Work

• Apply the proposed technology to domain discovery: domains not covered by current systems but of interest to users can guide which domains to develop next
• Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable acquired knowledge feeding into SLU modeling


75

Towards Unsupervised Deep Learning

(Figure: a convolutional architecture over an utterance U. The word sequence x (w1, w2, ..., wd) passes through word vectors lw, a convolution matrix Wc and convolutional layer lc, a pooling operation into an utterance vector lf, a knowledge graph propagation matrix Wp and layer lp, and a semantic projection matrix Ws into a semantic layer y, which scores R(U, S1), ..., R(U, Sn) and yields posterior probabilities P(S1 | U), ..., P(Sn | U) over slot candidates S1, ..., Sn, each with its own slot vector)

Treating MF as a one-layer neural net, we can add more layers in the model towards unsupervised deep learning

76

Take Home Message

• Available big data w/o annotations; the challenge is how to acquire and organize important knowledge, and further utilize it for applications
• Language understanding for AI: language → action (understand voice to control music, lights, etc.; teach the system to let friends in by face recognition, etc.)
• Unsupervised or weakly-supervised methods will be the future trend
• Deep language understanding is an emerging field

77

Q & A: THANKS FOR YOUR ATTENTION

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013 (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016



64

Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]

Input multi-turn interaction

Output apps the user plans to launch

Challenge language ambiguity1) User preference2) App-level contexts

Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom

send to vivianvs

Email MessageCommunication

Idea Behavioral patterns in history can help intent prediction

previous turn

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

65

Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]

Input multi-turn interaction

Output apps the user plans to launch

1

Lexical Intended Appphoto check camera IMtell

take this phototell vivian this is me in the lab

CAMERA

IMTrainDialogue

check my grades on websitesend an email to professor

hellip

CHROME

EMAIL

send

Behavior History

null camera

85

take a photo of thissend it to alice

CAMERA

IM

hellip

email

1

1

1 1

1

1 70

chrome

1

1

1

1

1

1

chrome email

11

1

1

95

80 55

User UtteranceIntended

App

Reasoning with Feature-Enriched MF

Test Dialogue

take a photo of thissend it to alicehellip

Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

66

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 261

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 555

LM-Based IR Model (unsupervised)

Multinomial Logistic Regression (supervised)

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

67

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)

Modeling hidden semantics helps intent prediction especially for noisy data

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

68

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 333Word + Type-Embedding-Based Semantics 315 329

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 566

Semantic enrichment provides rich cues to improve performance

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

69

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 342 (+68) 333 333 (-02)Word + Type-Embedding-Based Semantics 315 322 (+21) 329 340 (+34)

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 557 (+33) 566 577 (+19)

Intent prediction can benefit from both hidden information and low-level semantics

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

70

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction Feature-Enriched MF-SLU for

Intent Prediction is able to1) unify the knowledge at

different levels2) learn inference relations

between various features

3) and create personalized models by leveraging contextual behaviors

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

71

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data Back-end Data

Bases Services and Client Signals

DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)

User Experienceldquocall taxirdquo

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

72

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

73

Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

74

Future Work Apply the proposed technology to domain discovery

not covered by the current systems but users are interested in guide the next developed domains

Improve the proposed approach by handling the uncertainty

SLUSLUModelingASR Knowledge

Acquisitionrecognition

errorsunreliable knowledge

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

75

d d d

U S1 S2

P(S1 | U) P(S2 | U)

hellip

Semantic RelationPosterior Probability

Utterance

Slot Candidate

hellip

w1 w2 wdWord Sequence x

Word Vector lw

Pooling Operation

R(U S1) R(U S2)

Knowledge Graph Propagation Matrix Wp

Semantic Projection Matrix Ws

Semantic Layer y

Knowledge Graph Propagation Layer lp

d

Sn

P(Sn | U)

Utterance Vector lf

hellip

R(U Sn)

Slot Vector lf

Convolution Matrix Wc

Convolutional Layer lc

Towards Unsupervised Deep Learning

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning

76

Take Home Message Available big data wo annotations

Challenge how to acquire and organize important knowledge and further utilize it for applications

Language understanding for AI

language action understand voice to control music lights etc teach to let friends in by face recognition etc

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q amp ATHANKS FOR YOUR ATTENTIONS

bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)

bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

  • Statistical Learning from Dialogues for Intelligent Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants?
  • Why do we need them?
  • Why do we need them? (2)
  • Why do companies care?
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymax's intelligence?
  • SDS Architecture
  • Interaction Example
  • SDS Process – Available Domain Ontology
  • SDS Process – Available Domain Ontology (2)
  • SDS Process – Available Domain Ontology (3)
  • SDS Process – Spoken Language Understanding (SLU)
  • SDS Process – Spoken Language Understanding (SLU) (2)
  • SDS Process – Dialogue Management (DM)
  • SDS Process – Dialogue Management (DM) (2)
  • SDS Process – Dialogue Management (DM) (3)
  • SDS Process – Dialogue Management (DM) (4)
  • SDS Process – Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture – Contributions
  • SDS Flowchart
  • SDS Flowchart – Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLP'15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRU'13, SLT'14a]
  • Ontology Induction [ASRU'13, SLT'14a] (2)
  • 1st Issue: How to adapt generic slots to a domain-specific setting?
  • Semantic Decoding [ACL-IJCNLP'15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement: Slot/Word Embeddings Training (Levy and
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLP'15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue: How to model the unobserved hidden semantics? Matrix Factorization (MF)
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLP'15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding: Quality of Semantics Estimation
  • Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
  • Experiments of Semantic Decoding: Effectiveness of Relations
  • Experiments for Structure Learning: Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart – Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLT'14c]
  • Intent Prediction – Single-Turn Request
  • Intent Prediction – Multi-Turn Interaction [ICMI'15]
  • Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q & A
Page 45: Statistical Learning from Dialogues for Intelligent Assistants

45

Edge Weight Measurement: compute edge weights to represent relation importance.

Slot-to-slot semantic relation: similarity between slot embeddings
Slot-to-slot dependency relation: dependency score between slot embeddings
Word-to-word semantic relation: similarity between word embeddings
Word-to-word dependency relation: dependency score between word embeddings

(figure: knowledge graph with word nodes w1–w7 and slot nodes s1–s3 connected by weighted edges)

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
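The semantic edge weights above can be sketched with plain cosine similarity between embeddings (an illustrative stand-in with made-up vectors; the talk's actual embeddings are trained on the corpus):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy embeddings (hypothetical); real ones come from training on in-domain text.
emb = {
    "cheap": np.array([0.9, 0.1, 0.2]),
    "expensiveness": np.array([0.8, 0.2, 0.1]),
    "food": np.array([0.1, 0.9, 0.3]),
}

# Related concepts get a high edge weight, unrelated ones a low weight.
w = cosine(emb["cheap"], emb["expensiveness"])
```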

46

Knowledge Graph Propagation Model

(figure: the word observation / slot candidate training matrix, with words such as "cheap" and "restaurant" and induced slots such as expensiveness, locale_by_use, and food, is multiplied by a word relation matrix R_w^(SD) and a slot relation matrix R_s^(SD) built from the knowledge graph)

Structure information is integrated to make the self-training data more reliable.
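The propagation step can be pictured as smoothing the observation matrix with a row-normalized relation matrix (a minimal sketch with hypothetical relation weights, not the paper's exact formulation):

```python
import numpy as np

# Rows: utterances; columns: words ["cheap", "inexpensive", "food"].
F = np.array([[1.0, 0.0, 0.0]])            # only "cheap" is observed

# Word relation matrix (made-up weights), e.g. from embedding similarity.
R = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.1],
              [0.1, 0.1, 1.0]])
R = R / R.sum(axis=1, keepdims=True)       # row-normalize

F_smoothed = F @ R                         # propagate observations over relations
# "inexpensive" now receives mass although it was never observed.
```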

47

Semantic Decoding [ACL-IJCNLP'15]

(figure: ontology induction and structure learning feed word features Fw and slot features Fs into the factorized matrix; training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" contribute word observations such as cheap, restaurant, food and induced slot candidates such as expensiveness, locale_by_use, food)

Test utterance: "show me a list of cheap restaurants" (hidden semantics)

2nd Issue: unobserved semantics may benefit understanding.

48

Reasoning with Matrix Factorization

(figure: feature model plus knowledge graph propagation model; the word/slot observation matrix is combined with the word relation matrix R_w^(SD) and slot relation matrix R_s^(SD), and missing cells are filled with estimated probabilities such as .97, .90, .95, .85)

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which is able to model hidden semantics and is more robust to noisy data.

49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and words/slots, respectively. The product of the two matrices fills in the probability of the hidden semantics:

M(|U| × (|W|+|S|)) ≈ U(|U| × d) × V(d × (|W|+|S|))

(figure: the word observation / slot candidate matrix with observed 1s and filled-in probabilities such as .97, .90, .95, .85)

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI 2009.
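As a toy illustration of the low-rank completion (truncated SVD stands in for the learned factorization; the dimensions are made up):

```python
import numpy as np

# Toy utterance-by-(word+slot) matrix; 0 may mean "unobserved", not "false".
M = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 1.0]])

d = 2                                      # latent dimension
U, s, Vt = np.linalg.svd(M, full_matrices=False)
M_hat = U[:, :d] * s[:d] @ Vt[:d, :]       # rank-d reconstruction

# Unobserved cells now carry graded scores instead of hard zeros.
```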

50

Bayesian Personalized Ranking for MF

Model implicit feedback: do not treat unobserved facts as negative samples (true or false); instead, give observed facts higher scores than unobserved facts.

Objective: for each utterance x, maximize the sum of ln σ(f⁺ − f⁻) over pairs of an observed fact f⁺ and an unobserved fact f⁻.

The objective is to learn a set of well-ranked semantic slots per utterance.
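One BPR-style gradient step can be sketched as follows (simplified: a single fixed triple, no regularization, no stochastic sampling):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d = 4
u = rng.normal(size=d)        # latent vector of utterance x
v_pos = rng.normal(size=d)    # latent vector of an observed fact f+
v_neg = rng.normal(size=d)    # latent vector of an unobserved fact f-
lr = 0.1

for _ in range(200):
    diff = u @ v_pos - u @ v_neg      # score of f+ minus score of f-
    g = 1.0 - sigmoid(diff)           # gradient scale of ln sigma(diff)
    u += lr * g * (v_pos - v_neg)
    v_pos += lr * g * u
    v_neg -= lr * g * u

# After training, the observed fact outranks the unobserved one.
```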

51

Matrix Factorization SLU (MF-SLU)

(figure: ontology induction and structure learning feed word features Fw and slot features Fs into the factorized matrix; training utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" fill in word observations and slot candidates, and for the test utterance "show me a list of cheap restaurants" slot probabilities such as .97, .90, .95, .85 are estimated)

MF-SLU can estimate probabilities for slot candidates given test utterances.

52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances
Output: semantic concepts included in each individual utterance

(figure: frame-semantic parsing over an unlabeled collection feeds ontology induction (Fw, Fs) into the feature model; semantic and lexical knowledge graphs feed the word and slot relation models (Rw, Rs) of the knowledge graph propagation model via structure learning; MF-SLU, SLU modeling by matrix factorization, then produces the semantic representation, e.g. "can I have a cheap restaurant" → target="restaurant", price="cheap")

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015.

53

Experimental Setup

Dataset: Cambridge University SLU Corpus, restaurant recommendation (WER = 37%); 2,166 dialogues; 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type

Metric: MAP of all estimated slot probabilities over all utterances, using a mapping table between induced and reference slots

Henderson et al., "Discriminative spoken language understanding using word confusion networks," in Proc. of SLT 2012.
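The MAP metric used in the experiments can be computed with a small average-precision routine (illustrative, assuming binary relevance of slot candidates):

```python
def average_precision(ranked, relevant):
    """AP of a ranked list of slot candidates against a relevant set."""
    hits, score = 0, 0.0
    for i, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            score += hits / i
    return score / max(len(relevant), 1)

def mean_average_precision(rankings, gold):
    return sum(average_precision(r, g) for r, g in zip(rankings, gold)) / len(rankings)

# Toy example: two utterances with ranked slot candidates (hypothetical).
rankings = [["expensiveness", "food", "area"], ["food", "area", "expensiveness"]]
gold = [{"food"}, {"food", "area"}]
map_score = mean_average_precision(rankings, gold)  # (0.5 + 1.0) / 2 = 0.75
```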

54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

  Approach (Baseline SLU)            ASR    Transcripts
  Support Vector Machine             32.5   36.6
  Multinomial Logistic Regression    34.0   38.8

55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

  Approach                                         ASR            Transcripts
  Baseline SLU: Support Vector Machine             32.5           36.6
  Baseline SLU: Multinomial Logistic Regression    34.0           38.8
  Proposed MF-SLU: Feature Model                   37.6           45.3
  Proposed MF-SLU: Feature Model
    + Knowledge Graph Propagation                  43.5 (+27.9%)  53.4 (+37.6%)

The MF-SLU effectively models implicit information to decode semantics, and the structure information further improves the results.

The result is significantly better than the MLR baseline (p < 0.05, t-test).

56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

  Approach                                 ASR            Transcripts
  Feature Model                            37.6           45.3
  Feature + Knowledge Graph Propagation
    Semantic                               41.4           51.6
    Dependency                             41.6           49.0
    All                                    43.5 (+15.7%)  53.4 (+17.9%)

In the integrated structure information, both semantic and dependency relations are useful for understanding.

The result is significantly better than the MLR baseline (p < 0.05, t-test).

57

Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs, compared against the reference ontology annotated with the most frequent syntactic dependencies.

(figure: induced ontology in which the slots seeking, desiring, type, locale_by_use, food, expensiveness, price range, relational_quantity, task, and area are linked by dependencies such as PREP_FOR, PREP_IN, NN, AMOD, and DOBJ)

The automatically learned domain ontology aligns well with the reference one.

The data-driven ontology is more objective, while the expert-annotated one is more subjective.

58

Contributions of Semantic Decoding

(flowchart: Knowledge Acquisition covers Ontology Induction and Structure Learning; SLU Modeling covers Semantic Decoding and Intent Prediction)

Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge.

MF-SLU for Semantic Decoding is able to
1) unify the automatically acquired knowledge,
2) adapt to a domain-specific setting,
3) and then allow systems to model implicit semantics for better understanding.

59

Low- and High-Level Understanding

Semantic concepts for individual utterances do not consider high-level semantics (user intents). The follow-up behaviors usually correspond to user intents.

"can i have a cheap restaurant" → SLU Model → price="cheap", target="restaurant"; intent=navigation

"i plan to dine in legume tonight" → SLU Model → restaurant="legume", time="tonight"; intent=reservation

60

SDS Flowchart – Intent Prediction

(flowchart: Knowledge Acquisition covers Ontology Induction and Structure Learning; SLU Modeling covers Semantic Decoding and Intent Prediction, with Intent Prediction highlighted)

61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

62

Intent Prediction of Mobile Apps [SLT'14c]
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances for making requests about launching an app
Output: the apps supporting the required functionality

Intent Identification: popular domains in Google Play

"please dial a phone call to alex" → Skype, Hangout, etc.

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT 2014.

63

Intent Prediction – Single-Turn Request

Input: single-turn request
Output: apps that are able to support the required functionality

(figure: reasoning with feature-enriched MF; an IR step over app descriptions retrieves app candidates and self-train utterances, e.g. Gmail: "... your email calendar contacts ..." and Outlook: "... check and send emails msgs ..."; the test utterance "i would like to contact alex" is enriched with semantic features such as communication (weight .90), and word observations like contact, message, and email are linked to intended apps such as Gmail, Outlook, and Skype with estimated scores, e.g. .90, .85, .97, .95)

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
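The feature-enrichment idea can be pictured as adding semantic-type features whose embeddings lie close to the utterance words (a simplified stand-in with made-up vectors and a hypothetical threshold):

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical embeddings for utterance words and candidate semantic types.
word_vecs = {"contact": np.array([0.9, 0.1]), "alex": np.array([0.5, 0.5])}
type_vecs = {"communication": np.array([0.95, 0.05]), "navigation": np.array([0.1, 0.9])}

def enrich(words, threshold=0.9):
    """Add semantic-type features whose embedding is close to any utterance word."""
    feats = set(words)
    for w in words:
        for t, tv in type_vecs.items():
            if cosine(word_vecs[w], tv) > threshold:
                feats.add(t)
    return feats

features = enrich(["contact", "alex"])  # "communication" is added for "contact"
```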

64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch

Challenge: language ambiguity, from 1) user preference and 2) app-level contexts. For example, "send to vivian" in the previous turn could map to Email or Message (Communication).

Idea: behavioral patterns in history can help intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015. Data available at http://AppDialogue.com.

65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch

Train dialogue: "take this photo" → CAMERA; "tell vivian this is me in the lab" → IM; "check my grades on website" → CHROME; "send an email to professor" → EMAIL

Test dialogue: "take a photo of this" → CAMERA; "send it to alice" → IM

(figure: reasoning with feature-enriched MF; lexical features such as photo, check, camera, tell, send, email and behavior-history features such as null, camera, chrome, email are linked to the intended apps, with estimated scores such as .70, .80, .85, .95, .55)

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015. Data available at http://AppDialogue.com.

66

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP), LM-based IR model (unsupervised)

  Feature Matrix      ASR (LM)   Transcripts (LM)
  Word Observation    25.1       26.1

Multi-Turn Interaction: Mean Average Precision (MAP), multinomial logistic regression (supervised)

  Feature Matrix      ASR (MLR)   Transcripts (MLR)
  Word Observation    52.1        55.5
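The unsupervised LM-based IR baseline can be pictured as query-likelihood scoring of each app description (a textbook sketch with add-one smoothing, not the exact system):

```python
import math
from collections import Counter

# Hypothetical app descriptions standing in for Google Play text.
app_descriptions = {
    "Gmail": "check and send emails and messages".split(),
    "Camera": "take photos and record videos".split(),
}

def query_likelihood(query, doc, vocab_size=1000):
    """log P(query | doc unigram LM) with add-one smoothing."""
    counts = Counter(doc)
    total = len(doc)
    return sum(math.log((counts[w] + 1) / (total + vocab_size)) for w in query)

query = "send an email".split()
best = max(app_descriptions, key=lambda a: query_likelihood(query, app_descriptions[a]))
# "Gmail" wins because its description shares "send" with the request.
```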

67

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

  Feature Matrix      ASR: LM / MF-SLU       Transcripts: LM / MF-SLU
  Word Observation    25.1 / 29.2 (+16.2%)   26.1 / 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

  Feature Matrix      ASR: MLR / MF-SLU      Transcripts: MLR / MF-SLU
  Word Observation    52.1 / 52.7 (+1.2%)    55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.

68

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

  Feature Matrix                           ASR: LM / MF-SLU       Transcripts: LM / MF-SLU
  Word Observation                         25.1 / 29.2 (+16.2%)   26.1 / 30.4 (+16.4%)
  Word + Embedding-Based Semantics         32.0 / --              33.3 / --
  Word + Type-Embedding-Based Semantics    31.5 / --              32.9 / --

Multi-Turn Interaction: Mean Average Precision (MAP)

  Feature Matrix                ASR: MLR / MF-SLU      Transcripts: MLR / MF-SLU
  Word Observation              52.1 / 52.7 (+1.2%)    55.5 / 55.4 (-0.2%)
  Word + Behavioral Patterns    53.9 / --              56.6 / --

Semantic enrichment provides rich cues to improve performance.

69

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

  Feature Matrix                           ASR: LM / MF-SLU       Transcripts: LM / MF-SLU
  Word Observation                         25.1 / 29.2 (+16.2%)   26.1 / 30.4 (+16.4%)
  Word + Embedding-Based Semantics         32.0 / 34.2 (+6.8%)    33.3 / 33.3 (-0.2%)
  Word + Type-Embedding-Based Semantics    31.5 / 32.2 (+2.1%)    32.9 / 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

  Feature Matrix                ASR: MLR / MF-SLU      Transcripts: MLR / MF-SLU
  Word Observation              52.1 / 52.7 (+1.2%)    55.5 / 55.4 (-0.2%)
  Word + Behavioral Patterns    53.9 / 55.7 (+3.3%)    56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.

70

Contributions of Intent Prediction

(flowchart: Knowledge Acquisition covers Ontology Induction and Structure Learning; SLU Modeling covers Semantic Decoding and Intent Prediction)

Feature-Enriched MF-SLU for Intent Prediction is able to
1) unify the knowledge at different levels,
2) learn inference relations between various features,
3) and create personalized models by leveraging contextual behaviors.

71

Personal Intelligent Architecture

Reactive Assistance: ASR, LU, Dialog, LG, TTS

Proactive Assistance: Inferences, User Modeling, Suggestions

Data: Back-end Data Bases, Services and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"

72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

73

Conclusions

The work shows the feasibility and the potential for improving generalization, maintenance efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.

74

Future Work

Apply the proposed technology to domain discovery: domains not covered by the current systems but that users are interested in can guide the next developed domains.

Improve the proposed approach by handling the uncertainty: recognition errors from ASR for SLU modeling, and unreliable knowledge for knowledge acquisition.

75

Towards Unsupervised Deep Learning

(architecture: the word sequence x = w1, w2, ..., wd is mapped to word vectors l_w, passed through a convolutional layer l_c (convolution matrix W_c) and a pooling operation into an utterance vector l_f; slot candidates S1, ..., Sn have slot vectors l_f as well; a knowledge graph propagation layer l_p (matrix W_p) and a semantic projection matrix W_s produce the semantic layer y, yielding relation scores R(U, S1), ..., R(U, Sn) and posterior probabilities P(S1 | U), ..., P(Sn | U) over slot candidates)

Treating MF as a one-layer neural net, we can add more layers in the model towards unsupervised deep learning.
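Reading MF as a one-layer network suggests a deeper variant along the following lines (a toy forward pass with random weights, not the convolutional architecture above):

```python
import numpy as np

rng = np.random.default_rng(1)
vocab, d, n_slots = 20, 8, 5

x = rng.integers(0, 2, size=vocab).astype(float)   # bag-of-words utterance

# One-layer view of MF: a single linear map from words to slot scores.
W_mf = rng.normal(size=(vocab, n_slots))
scores_shallow = x @ W_mf

# Deeper variant: insert a hidden layer with a nonlinearity.
W1 = rng.normal(size=(vocab, d))
W2 = rng.normal(size=(d, n_slots))
hidden = np.tanh(x @ W1)
scores_deep = hidden @ W2
```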

76

Take Home Message

Big data is available without annotations. The challenge is how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI: language → action, e.g. understand voice to control music, lights, etc., or teach the assistant to let friends in by face recognition, etc.

Unsupervised or weakly supervised methods will be the future trend. Deep language understanding is an emerging field.

77

Q & A. Thanks for your attention!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP 2016.

  • Statistical Learning from Dialogues for Intelligence Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants
  • Why do we need them
  • Why do we need them (2)
  • Why do companies care
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymaxrsquos intelligence
  • SDS Architecture
  • Interaction Example
  • SDS Process ndash Available Domain Ontology
  • SDS Process ndash Available Domain Ontology (2)
  • SDS Process ndash Available Domain Ontology (3)
  • SDS Process ndash Spoken Language Understanding (SLU)
  • SDS Process ndash Spoken Language Understanding (SLU) (2)
  • SDS Process ndash Dialogue Management (DM)
  • SDS Process ndash Dialogue Management (DM) (2)
  • SDS Process ndash Dialogue Management (DM) (3)
  • SDS Process ndash Dialogue Management (DM) (4)
  • SDS Process ndash Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture ndash Contributions
  • SDS Flowchart
  • SDS Flowchart ndash Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLPrsquo15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a]
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
  • 1st Issue How to adapt generic slots to a domain-specific sett
  • Semantic Decoding [ACL-IJCNLPrsquo15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement SlotWord Embeddings Training (Levy and
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLPrsquo15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue How to model the unobserved hidden semantics Matrix
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLPrsquo15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding Quality of Semantics Estimatio
  • Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
  • Experiments of Semantic Decoding Effectiveness of Relations
  • Experiments for Structure Learning Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart ndash Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLTrsquo14c]
  • Intent Prediction ndash Single-Turn Request
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q amp A
Page 46: Statistical Learning from Dialogues for Intelligent Assistants

46

Word Relation Model / Slot Relation Model

[Figure: the word-observation/slot-candidate matrix (train rows over words "cheap", "restaurant" and slots expensiveness, locale_by_use, food, plus a test row) is multiplied by the word relation matrix R_w^(SD) and the slot relation matrix R_s^(SD) of the knowledge graph propagation model for slot induction.]

Structure information is integrated to make the self-training data more reliable.

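As a concrete (and purely illustrative) sketch of this integration step, the raw feature matrix can be smoothed with row-normalized relation matrices so that related but unobserved words/slots receive part of the observed evidence; the matrices and weights below are toy values, not the learned ones from the paper:

```python
import numpy as np

def row_normalize(R):
    """Normalize each row of a relation matrix so it acts as a smoother."""
    return R / R.sum(axis=1, keepdims=True)

# One utterance over 2 words ("cheap", "restaurant") + 2 slots
# (expensiveness, locale_by_use); 1 = observed, 0 = unobserved.
F = np.array([[1.0, 0.0, 1.0, 0.0]])

# Illustrative relation matrices: Rw relates words, Rs relates slots.
Rw = np.array([[1.0, 0.8],
               [0.8, 1.0]])
Rs = np.array([[1.0, 0.5],
               [0.5, 1.0]])

# Propagate: related-but-unobserved columns receive part of the evidence.
Fw, Fs = F[:, :2], F[:, 2:]
F_prop = np.hstack([Fw @ row_normalize(Rw), Fs @ row_normalize(Rs)])
```

After propagation the second word column and second slot column are non-zero even though they were never observed in this utterance.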

47

[Figure: ontology induction supplies word features F_w and slot features F_s to the SLU training matrix, and structure learning supplies the relation models. Train rows: Utterance 1 "i would like a cheap restaurant" (words cheap, restaurant; slots expensiveness, locale_by_use) and Utterance 2 "find a restaurant with chinese food" (words restaurant, food; slots locale_by_use, food). The test utterance "show me a list of cheap restaurants" carries hidden semantics, and its row is filled with estimated probabilities (e.g. .97, .90, .95, .85).]

2nd Issue: unobserved semantics may benefit understanding.

Semantic Decoding [ACL-IJCNLP'15]

48

Reasoning with Matrix Factorization

[Figure: the same word-observation/slot-candidate matrix, combined with the word relation matrix R_w^(SD) and slot relation matrix R_s^(SD) (Feature Model + Knowledge Graph Propagation Model), is completed by MF; the missing cells receive estimated probabilities (e.g. .97, .90, .95, .85, .93, .92, .98, .05, .05).]

Idea: MF completes a partially-missing matrix based on a low-rank latent semantics assumption, which lets it model hidden semantics and makes it more robust to noisy data.

49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and words/slots, respectively.

The product of the two matrices fills in the probabilities of the hidden semantics.

[Figure: the |U| × (|W|+|S|) word-observation/slot-candidate matrix is approximated by the product of a |U| × d matrix and a d × (|W|+|S|) matrix, where d is the latent dimensionality.]

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
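The factorization above can be sketched in a few lines of NumPy; the matrix, latent dimensionality, and plain squared-error updates are illustrative stand-ins (the actual model is trained with the BPR objective described on the next slide):

```python
import numpy as np

# Toy utterance-by-(word+slot) matrix in the spirit of the slide:
# rows = utterances, columns = words then slots; 1 = observed,
# 0 = unobserved (not negative). All values are illustrative.
M = np.array([[1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 1.]])
W = (M > 0).astype(float)            # only observed cells carry a loss

rng = np.random.default_rng(0)
d = 2                                # low latent dimensionality
U = rng.normal(scale=0.1, size=(M.shape[0], d))   # latent utterance factors
V = rng.normal(scale=0.1, size=(M.shape[1], d))   # latent word/slot factors

lr, reg = 0.05, 0.01
for _ in range(4000):                # gradient descent on observed cells only
    E = W * (M - U @ V.T)
    U, V = U + lr * (E @ V - reg * U), V + lr * (E.T @ U - reg * V)

completed = U @ V.T                  # dense scores: hidden cells are filled in
```

The product `U @ V.T` reproduces the observed 1s and assigns graded scores to the unobserved cells, which is exactly the "fills the probability of hidden semantics" behavior described above.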


50

Bayesian Personalized Ranking for MF: model implicit feedback

Do not treat unobserved facts as negative samples (true or false); instead, give observed facts higher scores than unobserved facts.

Objective: for each utterance u, maximize Σ ln σ(f⁺ − f⁻), where f⁺ is the model score of an observed fact and f⁻ the score of an unobserved fact.

The objective is to learn a set of well-ranked semantic slots per utterance.
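A minimal sketch of one BPR-style pass, assuming toy dimensions and hypothetical `pos`/`neg` mappings from each utterance row to its observed and unobserved columns:

```python
import numpy as np

def bpr_epoch(U, V, pos, neg, lr=0.05, reg=0.01, rng=None):
    """One epoch of BPR-style updates (after Rendle et al., 2009):
    ascend ln sigmoid(x_uf+ - x_uf-) so observed slots outrank unobserved."""
    rng = rng or np.random.default_rng(0)
    for u in range(U.shape[0]):
        fp = int(rng.choice(pos[u]))      # observed (positive) column f+
        fn = int(rng.choice(neg[u]))      # unobserved column f- (not "negative")
        x = U[u] @ (V[fp] - V[fn])        # score difference x_uf+ - x_uf-
        g = 1.0 / (1.0 + np.exp(x))       # gradient of ln sigmoid(x)
        U[u]  += lr * (g * (V[fp] - V[fn]) - reg * U[u])
        V[fp] += lr * (g * U[u] - reg * V[fp])
        V[fn] += lr * (-g * U[u] - reg * V[fn])

rng = np.random.default_rng(1)
U = rng.normal(scale=0.1, size=(1, 2))   # one utterance, 2 latent dims
V = rng.normal(scale=0.1, size=(2, 2))   # columns: slot 0 observed, slot 1 not
for _ in range(200):
    bpr_epoch(U, V, pos={0: [0]}, neg={0: [1]})
```

After training, the observed slot scores strictly higher than the unobserved one for that utterance, which is the ranking property the objective asks for.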


51

[Figure: ontology induction supplies F_w and F_s and structure learning supplies the relation models; the train rows come from Utterance 1 "i would like a cheap restaurant" and Utterance 2 "find a restaurant with chinese food", and the test utterance "show me a list of cheap restaurants" receives estimated slot probabilities (e.g. .97, .90, .95, .85).]

Matrix Factorization SLU (MF-SLU)

MF-SLU can estimate probabilities for slot candidates given test utterances.

52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

[Figure: "can I have a cheap restaurant" → SLU Model → target="restaurant", price="cheap". Pipeline: frame-semantic parsing over an unlabeled collection; ontology induction (F_w, F_s) with a semantic KG; the feature model plus the knowledge graph propagation model, whose word relation model R_w comes from a lexical KG and slot relation model R_s from a semantic KG via structure learning; MF-SLU (SLU modeling by matrix factorization) then produces the semantic representation.]

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).

53

Experimental Setup

Dataset: Cambridge University SLU Corpus (restaurant recommendation, WER = 37%); 2,166 dialogues; 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type.

Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots.

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
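The MAP metric used in the following tables can be computed with a standard implementation (generic evaluation code, not from the cited work):

```python
def average_precision(ranked, relevant):
    """AP for one utterance: average of precision@k at each relevant hit."""
    hits, precisions = 0, []
    for k, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(ranked_lists, relevant_sets):
    """MAP: mean of per-utterance average precisions."""
    aps = [average_precision(r, g) for r, g in zip(ranked_lists, relevant_sets)]
    return sum(aps) / len(aps)
```

For example, ranking the slots ["food", "area", "addr"] against the reference set {"food", "addr"} gives AP = (1/1 + 2/3) / 2 ≈ 0.833.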


54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

  Approach                                        ASR     Transcripts
  Baseline SLU: Support Vector Machine            32.5    36.6
  Baseline SLU: Multinomial Logistic Regression   34.0    38.8

55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results.

  Approach                                        ASR            Transcripts
  Baseline SLU: Support Vector Machine            32.5           36.6
  Baseline SLU: Multinomial Logistic Regression   34.0           38.8
  Proposed MF-SLU: Feature Model                  37.6           45.3
  Proposed MF-SLU: Feature Model +
    Knowledge Graph Propagation                   43.5 (+27.9%)  53.4 (+37.6%)

The results are significantly better than the MLR baseline with p < 0.05 (t-test).

56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus
Metric: MAP of all estimated slot probabilities for all utterances

In the integrated structure information, both semantic and dependency relations are useful for understanding.

  Approach                                   ASR            Transcripts
  Feature Model                              37.6           45.3
  Feature + KG Propagation: Semantic         41.4           51.6
  Feature + KG Propagation: Dependency       41.6           49.0
  Feature + KG Propagation: All              43.5 (+15.7%)  53.4 (+17.9%)

The results are significantly better than the MLR baseline with p < 0.05 (t-test).

57

Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs.

[Figure: the induced ontology connects slots locale_by_use, food, expensiveness, seeking, relational_quantity, and desiring via dependency relations (PREP_FOR, NN, AMOD, DOBJ); the reference ontology connects type, food, pricerange, task, and area via the most frequent syntactic dependencies (DOBJ, AMOD, PREP_IN).]

The automatically learned domain ontology aligns well with the reference one. The data-driven ontology is more objective, while the expert-annotated one is more subjective.

58

Contributions of Semantic Decoding

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) → SLU Modeling (Semantic Decoding, Intent Prediction)]

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.

59

Low- and High-Level Understanding

Semantic concepts for individual utterances do not consider high-level semantics (user intents). The follow-up behaviors usually correspond to user intents.

"can i have a cheap restaurant" → SLU Model → price="cheap", target="restaurant" → intent=navigation

"i plan to dine in legume tonight" → SLU Model → restaurant="legume", time="tonight" → intent=reservation

60

SDS Flowchart – Intent Prediction

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) → SLU Modeling (Semantic Decoding, Intent Prediction)]

61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app

Output: the apps supporting the required functionality

Intent identification covers popular domains in Google Play, e.g. "please dial a phone call to alex" → Skype, Hangout, etc.

Intent Prediction of Mobile Apps [SLT'14c]

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.

63

Input: a single-turn request

Output: apps that are able to support the required functionality

Intent Prediction – Single-Turn Request

[Figure: the test utterance "i would like to contact alex" is enriched with semantic features (e.g. communication ≈ .90); IR retrieves app candidates from descriptions (Outlook: "... your email calendar contacts ...", Gmail: "... check and send emails, msgs ..."), and a feature-enriched matrix over word observations (contact, message, email) and intended apps (Gmail, Outlook, Skype) is completed by reasoning with feature-enriched MF, filling scores (e.g. .90, .85, .97, .95) for self-train and test utterances.]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
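The "IR for app candidates" step can be sketched as a simple bag-of-words retrieval over app descriptions; the app names and description strings below are invented for illustration:

```python
import math
from collections import Counter

# Hypothetical app store descriptions (toy data, not real listings).
apps = {
    "Outlook": "your email calendar contacts",
    "Gmail":   "check and send emails msgs",
    "Camera":  "take photos and record video",
}

def cosine(a, b):
    """Cosine similarity of two texts under raw word counts."""
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def candidates(request, k=2):
    """Rank apps by similarity between the request and each description."""
    return sorted(apps, key=lambda a: cosine(request, apps[a]), reverse=True)[:k]
```

The retrieved candidates then become the columns of the feature-enriched matrix that MF completes.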


64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

Challenge: language ambiguity, due to 1) user preference and 2) app-level contexts.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com.

[Figure: "send to vivian" in the previous turn is ambiguous between Email and Message (communication).]

Idea: behavioral patterns in history can help intent prediction.

65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

[Figure: train dialogues pair utterances with intended apps, e.g. "take this photo" → CAMERA, "tell vivian this is me in the lab" → IM; "check my grades on websites" → CHROME, "send an email to professor" → EMAIL. Lexical features (photo, check, camera, tell, send, email, chrome) are combined with the behavior history (previous app: null, camera, chrome, email). The test dialogue "take a photo of this" / "send it to alice" (CAMERA → IM) is scored by reasoning with feature-enriched MF (e.g. .85, .70, .95, .80, .55).]

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.
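One way to see how behavioral patterns sharpen prediction: interpolate a lexical score with a transition probability estimated from the launch history. The function, the `alpha` weight, and the example apps below are hypothetical, not the paper's model:

```python
from collections import Counter

# Previously launched apps in the dialogue history (toy data).
history = ["CAMERA", "IM", "CAMERA", "EMAIL"]
transitions = Counter(zip(history, history[1:]))   # behavioral patterns

def score(app, lexical_score, prev_app, alpha=0.7):
    """Interpolate the lexical match with P(app | previous app) from history."""
    total = sum(c for (p, _), c in transitions.items() if p == prev_app) or 1
    behav = transitions[(prev_app, app)] / total
    return alpha * lexical_score + (1 - alpha) * behav
```

With equal lexical scores, an app that historically follows the current one (e.g. IM after CAMERA) outranks one that never did.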


66

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

  Feature Matrix      ASR: LM / MF-SLU    Transcripts: LM / MF-SLU
  Word Observation    25.1 / –            26.1 / –

Multi-Turn Interaction: Mean Average Precision (MAP)

  Feature Matrix      ASR: MLR / MF-SLU   Transcripts: MLR / MF-SLU
  Word Observation    52.1 / –            55.5 / –

LM = LM-based IR model (unsupervised); MLR = multinomial logistic regression (supervised).

67

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

  Feature Matrix      ASR: LM / MF-SLU        Transcripts: LM / MF-SLU
  Word Observation    25.1 / 29.2 (+16.2%)    26.1 / 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

  Feature Matrix      ASR: MLR / MF-SLU       Transcripts: MLR / MF-SLU
  Word Observation    52.1 / 52.7 (+1.2%)     55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.

68

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

  Feature Matrix                           ASR: LM / MF-SLU        Transcripts: LM / MF-SLU
  Word Observation                         25.1 / 29.2 (+16.2%)    26.1 / 30.4 (+16.4%)
  Word + Embedding-Based Semantics         32.0 / –                33.3 / –
  Word + Type-Embedding-Based Semantics    31.5 / –                32.9 / –

Multi-Turn Interaction: Mean Average Precision (MAP)

  Feature Matrix               ASR: MLR / MF-SLU       Transcripts: MLR / MF-SLU
  Word Observation             52.1 / 52.7 (+1.2%)     55.5 / 55.4 (-0.2%)
  Word + Behavioral Patterns   53.9 / –                56.6 / –

Semantic enrichment provides rich cues to improve performance.

69

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

  Feature Matrix                           ASR: LM / MF-SLU        Transcripts: LM / MF-SLU
  Word Observation                         25.1 / 29.2 (+16.2%)    26.1 / 30.4 (+16.4%)
  Word + Embedding-Based Semantics         32.0 / 34.2 (+6.8%)     33.3 / 33.3 (-0.2%)
  Word + Type-Embedding-Based Semantics    31.5 / 32.2 (+2.1%)     32.9 / 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

  Feature Matrix               ASR: MLR / MF-SLU       Transcripts: MLR / MF-SLU
  Word Observation             52.1 / 52.7 (+1.2%)     55.5 / 55.4 (-0.2%)
  Word + Behavioral Patterns   53.9 / 55.7 (+3.3%)     56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.

70

Contributions of Intent Prediction

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) → SLU Modeling (Semantic Decoding, Intent Prediction)]

The feature-enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.

71

Personal Intelligent Architecture

[Figure: Reactive Assistance (ASR, LU, Dialog, LG, TTS) and Proactive Assistance (Inferences, User Modeling, Suggestions) draw on Data (back-end databases, services, and client signals) and serve device/service end-points (phone, PC, Xbox, web browser, messaging apps) in the user experience, e.g. "call taxi".]

72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

73

Conclusions

The work shows the feasibility and the potential for improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.

74

Future Work

Apply the proposed technology to domain discovery: domains not covered by the current systems but of interest to users can guide which domains are developed next.

Improve the proposed approach by handling uncertainty: recognition errors from ASR feeding the SLU, and unreliable knowledge feeding knowledge acquisition.

75

Towards Unsupervised Deep Learning

[Figure: a convolutional architecture for estimating slot posteriors: the word sequence x = w1, w2, ..., wd is embedded into word vectors l_w; a convolutional layer l_c (convolution matrix W_c) with a pooling operation yields the utterance vector and slot vectors l_f; a semantic projection matrix W_s maps these to the semantic layer y; and a knowledge graph propagation layer l_p (propagation matrix W_p) produces the posterior probabilities P(S1 | U), ..., P(Sn | U) and semantic relations R(U, S1), ..., R(U, Sn) over slot candidates S1, ..., Sn.]

Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.

76

Take Home Message

Big data is available without annotations. The challenge is how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI turns language into action: understanding voice to control music, lights, etc., or teaching the system to let friends in by face recognition, etc.

Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A. Thanks for your attention!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013 (Best Student Paper Award).
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," extended abstract, NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.

  • Statistical Learning from Dialogues for Intelligent Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants?
  • Why do we need them?
  • Why do we need them? (2)
  • Why do companies care?
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymax's intelligence?
  • SDS Architecture
  • Interaction Example
  • SDS Process – Available Domain Ontology
  • SDS Process – Available Domain Ontology (2)
  • SDS Process – Available Domain Ontology (3)
  • SDS Process – Spoken Language Understanding (SLU)
  • SDS Process – Spoken Language Understanding (SLU) (2)
  • SDS Process – Dialogue Management (DM)
  • SDS Process – Dialogue Management (DM) (2)
  • SDS Process – Dialogue Management (DM) (3)
  • SDS Process – Dialogue Management (DM) (4)
  • SDS Process – Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture – Contributions
  • SDS Flowchart
  • SDS Flowchart – Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLP'15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRU'13, SLT'14a]
  • Ontology Induction [ASRU'13, SLT'14a] (2)
  • 1st Issue: How to adapt generic slots to a domain-specific setting
  • Semantic Decoding [ACL-IJCNLP'15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement: Slot/Word Embeddings Training (Levy and
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLP'15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue: How to model the unobserved hidden semantics? Matrix Factorization
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLP'15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding: Quality of Semantics Estimation
  • Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
  • Experiments of Semantic Decoding: Effectiveness of Relations
  • Experiments for Structure Learning: Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart – Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLT'14c]
  • Intent Prediction – Single-Turn Request
  • Intent Prediction – Multi-Turn Interaction [ICMI'15]
  • Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q & A
Page 47: Statistical Learning from Dialogues for Intelligent Assistants

47

Ontology Induction

SLUFw Fs

Structure Learning

times

1

Utterance 1i would like a cheap restaurant

Word Observation Slot Candidate

Train

hellip

cheap restaurant foodexpensiveness

1

locale_by_use

11

find a restaurant with chinese foodUtterance 2

1 1

food

1 1

1

Test1 9790 9585

Ontology Induction

show me a list of cheap restaurantsTest Utterance hidden semantics

2nd Issue unobserved semantics may benefit understanding

Semantic Decoding [ACL-IJCNLPrsquo15]

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

48

Reasoning with Matrix Factorization

Word Relation Model Slot Relation Model

word relation matrix

slot relation matrix

times

1

Word Observation Slot Candidate

Train

cheap restaurant foodexpensiveness

1

locale_by_use

11

1 1

food

1 1

1 Test1

1

9790 9585

93 929805 05

Slot Induction

Feature Model + Knowledge Graph Propagation Model

119877119908119878119863

119877119904119878119863

Idea MF completes a partially-missing matrix based on a low-rank latent semantics assumption which is able to model hidden semantics and more robust to noisy data

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

49

2nd Issue How to model the unobserved hidden semantics

Matrix Factorization (MF) (Rendle et al 2009)

The decomposed matrices represent latent semantics for utterances and wordsslots respectively

The product of two matrices fills the probability of hidden semantics

1

Word Observation Slot Candidate

Train

cheap restaurant foodexpensiveness

1

locale_by_use

11

1 1

food

1 1

1 Test

1

1

9790 9585

93 929805 05

|119932|

|119934|+|119930|

asymp|119932|times119941 119941times (|119934|+|119930|)times

Rendle et al ldquoBPR Bayesian Personalized Ranking from Implicit Feedback in Proc of UAI 2009

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

50

Bayesian Personalized Ranking for MF Model implicit feedback

not treat unobserved facts as negative samples (true or false) give observed facts higher scores than unobserved facts

Objective

1

119891 +iquest iquest119891 minus119891 minus

The objective is to learn a set of well-ranked semantic slots per utterance

119906119909

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

51

Ontology Induction

SLUFw Fs

Structure Learning

times

1

Utterance 1i would like a cheap restaurant

Word Observation Slot Candidate

Train

hellip

cheap restaurant foodexpensiveness

1

locale_by_use

11

find a restaurant with chinese foodUtterance 2

1 1

food

1 1

1

Test1 9790 9585

Ontology Induction

show me a list of cheap restaurantsTest Utterance

Matrix Factorization SLU (MF-SLU)

MF-SLU can estimate probabilities for slot candidates given test utterances

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

52

Semantic Decoding [ACL-IJCNLPrsquo15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015

SLU Model

target=ldquorestaurantrdquoprice=ldquocheaprdquo

ldquocan I have a cheap restaurantrdquoFrame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology InductionFw Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation

Idea utilize the acquired knowledge to decode utterance semantics (fully unsupervised)

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

53

Experimental Setup Dataset Cambridge University SLU Corpus

Restaurant recommendation (WER = 37) 2166 dialogues 15453 utterances dialogue slot addr area food name phone postcode price range task type

Metric MAP of all estimated slot probabilities over all utterancesThe mapping table between induced and reference slots

Henderson et al Discriminative spoken language understanding using word confusion networks in Proc of SLT 2012

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

54

Experiments of Semantic DecodingQuality of Semantics Estimation

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

Approach ASR TranscriptsBaseline

SLUSupport Vector Machine 325 366

Multinomial Logistic Regression 340 388

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

55

Experiments of Semantic DecodingQuality of Semantics Estimation

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

The MF-SLU effectively models implicit information to decode semantics

The structure information further improves the results

Approach ASR Transcripts

Baseline SLU

Support Vector Machine 325 366Multinomial Logistic Regression 340 388

Proposed MF-SLU

Feature Model 376 453

Feature Model +Knowledge Graph Propagation

435

(+279)534

(+376)

the result is significantly better than the MLR with p lt 005 in t-test

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

56

Experiments of Semantic DecodingEffectiveness of Relations

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

In the integrated structure information both semantic and dependency relations are useful for understanding

Approach ASR Transcripts

Feature Model 376 453

Feature + Knowledge Graph Propagation

Semantic 414 516

Dependency 416 490

All 435 (+157) 534 (+179)

the result is significantly better than the MLR with p lt 005 in t-test

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Experiments for Structure LearningRelation Discovery Analysis

Discover inter-slot relations connecting important slot pairs

The reference ontology with the most frequent syntactic dependencies

locale_by_use

food expensiveness

seeking

relational_quantity

PREP_FOR

PREP_FOR

NN AMOD

AMOD

AMODdesiring

DOBJ

type

food pricerange

DOBJ

AMOD AMOD

AMOD

taskarea

PREP_IN

The automatically learned domain ontology aligns well with the reference one

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 57

The data-driven one is more objective while expert-annotated one is more subjective

58

Contributions of Semantic Decoding

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeds SLU Modeling (Semantic Decoding, Intent Prediction).]

Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge.

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, 3) and then allow systems to model implicit semantics for better understanding.

59

Low- and High-Level Understanding: Semantic concepts for individual utterances do not consider high-level semantics (user intents).

The follow-up behaviors usually correspond to user intents.

"can i have a cheap restaurant" -> SLU Model -> price="cheap", target="restaurant"; intent=navigation

"i plan to dine in legume tonight" -> SLU Model -> restaurant="legume", time="tonight"; intent=reservation

60

SDS Flowchart – Intent Prediction

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) and SLU Modeling (Semantic Decoding, Intent Prediction); this part focuses on Intent Prediction.]

61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Intent Prediction of Mobile Apps [SLT'14c]

Input: spoken utterances making requests about launching an app

Output: the apps supporting the required functionality

Intent identification covers popular domains in Google Play, e.g. "please dial a phone call to alex" -> Skype, Hangout, etc.

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.

63

Intent Prediction – Single-Turn Request

Input: single-turn request

Output: apps that are able to support the required functionality

[Figure: reasoning with feature-enriched MF. Matrix columns are word observations (contact, message, email, ...), enriched semantic features (e.g. communication), and intended apps (Gmail, Outlook, Skype, ...). Training rows come from app descriptions retrieved by IR for app candidates (e.g. Outlook: "... your email calendar contacts ...", Gmail: "... check and send emails, msgs ...") and self-train utterances; the test row is the utterance "i would like to contact alex" after feature enrichment, and the factorization fills in probabilities over the intended apps.]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.

64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

Challenge: language ambiguity, addressed with 1) user preference and 2) app-level contexts. For example, "send to vivian" after the previous turn may map to Email, Message, or another communication app.

Idea: behavioral patterns in history can help intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

[Figure: reasoning with feature-enriched MF. Columns are lexical features (photo, check, send, tell, ...), behavior history (null, camera, chrome, email, ...), and intended apps (CAMERA, IM, CHROME, EMAIL, ...). Training dialogues include "take this photo" / "tell vivian this is me in the lab" (CAMERA, IM) and "check my grades on website" / "send an email to professor" (CHROME, EMAIL); for the test dialogue "take a photo of this" / "send it to alice", the factorization fills in intended-app probabilities.]

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.
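The matrix reasoning above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's trained model: the lexical features, behavior-history apps, and dialogues are invented stand-ins, and truncated SVD substitutes for the BPR-trained factorization.

```python
# Sketch: feature-enriched matrix for multi-turn intent prediction.
# Rows are turns; columns are lexical features, the previous-turn app
# (behavior history), and intended-app labels. All values illustrative.
import numpy as np

vocab   = ["photo", "check", "send", "tell"]        # lexical features
history = ["null", "camera", "chrome", "email"]     # previous-turn app
apps    = ["CAMERA", "IM", "CHROME", "EMAIL"]       # intended-app labels
cols = vocab + history + apps

def encode(words, prev_app, intended):
    row = np.zeros(len(cols))
    for w in words:
        row[cols.index(w)] = 1.0
    row[len(vocab) + history.index(prev_app)] = 1.0
    for a in intended:
        row[len(vocab) + len(history) + apps.index(a)] = 1.0
    return row

train = np.stack([
    encode(["photo", "tell"], "null",   ["CAMERA", "IM"]),
    encode(["check", "send"], "chrome", ["CHROME", "EMAIL"]),
])
# Test turn: the intended-app columns are unobserved (left at zero).
test = encode(["photo", "send"], "camera", [])
M = np.vstack([train, test])

# Low-rank completion via truncated SVD (stand-in for the trained MF).
U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 2
M_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

scores = M_hat[-1, len(vocab) + len(history):]      # app scores for test turn
ranked = [apps[i] for i in np.argsort(-scores)]
print(ranked)
```

Because the behavior-history column sits in the same matrix as the lexical features, the completed test row blends evidence from both, which is the point of the feature-enriched formulation.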

66

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP %, ASR / Transcripts)
  LM-Based IR Model (unsupervised):
    Word Observation: 25.1 / 26.1

Multi-Turn Interaction: Mean Average Precision (MAP %, ASR / Transcripts)
  Multinomial Logistic Regression (supervised):
    Word Observation: 52.1 / 55.5

67

Experiments for Intent Prediction

Single-Turn Request: MAP % (ASR: LM / MF-SLU; Transcripts: LM / MF-SLU)
  Word Observation: 25.1 / 29.2 (+16.2%); 26.1 / 30.4 (+16.4%)

Multi-Turn Interaction: MAP % (ASR: MLR / MF-SLU; Transcripts: MLR / MF-SLU)
  Word Observation: 52.1 / 52.7 (+1.2%); 55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.

68

Experiments for Intent Prediction

Single-Turn Request: MAP % (ASR: LM / MF-SLU; Transcripts: LM / MF-SLU)
  Word Observation: 25.1 / 29.2 (+16.2%); 26.1 / 30.4 (+16.4%)
  Word + Embedding-Based Semantics: 32.0 / -; 33.3 / -
  Word + Type-Embedding-Based Semantics: 31.5 / -; 32.9 / -

Multi-Turn Interaction: MAP % (ASR: MLR / MF-SLU; Transcripts: MLR / MF-SLU)
  Word Observation: 52.1 / 52.7 (+1.2%); 55.5 / 55.4 (-0.2%)
  Word + Behavioral Patterns: 53.9 / -; 56.6 / -

Semantic enrichment provides rich cues to improve performance.

69

Experiments for Intent Prediction

Single-Turn Request: MAP % (ASR: LM / MF-SLU; Transcripts: LM / MF-SLU)
  Word Observation: 25.1 / 29.2 (+16.2%); 26.1 / 30.4 (+16.4%)
  Word + Embedding-Based Semantics: 32.0 / 34.2 (+6.8%); 33.3 / 33.3 (-0.2%)
  Word + Type-Embedding-Based Semantics: 31.5 / 32.2 (+2.1%); 32.9 / 34.0 (+3.4%)

Multi-Turn Interaction: MAP % (ASR: MLR / MF-SLU; Transcripts: MLR / MF-SLU)
  Word Observation: 52.1 / 52.7 (+1.2%); 55.5 / 55.4 (-0.2%)
  Word + Behavioral Patterns: 53.9 / 55.7 (+3.3%); 56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.

70

Contributions of Intent Prediction

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeds SLU Modeling (Semantic Decoding, Intent Prediction).]

Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, 3) and create personalized models by leveraging contextual behaviors.

71

Personal Intelligent Architecture

Reactive Assistance: ASR, LU, Dialog, LG, TTS (user experience, e.g. "call taxi")

Proactive Assistance: Inferences, User Modeling, Suggestions

Data: back-end data bases, services, and client signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

73

Conclusions

The work shows the feasibility of, and the potential for, improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances; better high-level intent prediction about follow-up behaviors.

74

Future Work

Apply the proposed technology to domain discovery: find domains not covered by the current systems but that users are interested in, to guide which domains to develop next.

Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition, both of which propagate into SLU modeling.

75

Towards Unsupervised Deep Learning

[Figure: a deep architecture for semantic decoding. A word sequence x (w1, w2, ..., wd) is mapped to word vectors lw, passed through a convolution matrix Wc into a convolutional layer lc, and pooled into an utterance vector lf; slot candidates likewise get slot vectors lf. A semantic projection matrix Ws produces the semantic layer y, and a knowledge graph propagation matrix Wp produces the propagation layer lp, yielding relevance scores R(U, S1), ..., R(U, Sn) and posterior probabilities P(S1 | U), ..., P(Sn | U) over slot candidates S1, ..., Sn for utterance U.]

Treating MF as a one-layer neural net, we can add more layers in the model towards unsupervised deep learning.
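The slide's architecture can be sketched as a forward pass. This is a shape-level illustration only: the dimensions are arbitrary and the weights are random placeholders, not trained parameters of the described model.

```python
# Forward-pass sketch of the slide's architecture: convolution over word
# vectors, max-pooling into an utterance vector, then semantic projection
# and knowledge-graph propagation into per-slot posteriors.
import numpy as np

rng = np.random.default_rng(0)
d_word, d_conv, n_slots = 50, 30, 10
words = rng.normal(scale=0.1, size=(7, d_word))     # word vectors lw for a 7-word sequence x

Wc = rng.normal(scale=0.1, size=(d_word, d_conv))   # convolution matrix Wc
lc = np.maximum(words @ Wc, 0.0)                    # convolutional layer lc (ReLU)
lf = lc.max(axis=0)                                 # pooling -> utterance vector lf

Ws = rng.normal(scale=0.1, size=(d_conv, n_slots))  # semantic projection matrix Ws
Wp = rng.normal(scale=0.1, size=(n_slots, n_slots)) # knowledge graph propagation matrix Wp
y = lf @ Ws @ Wp                                    # semantic layer y: R(U, S_i) per slot

p = 1.0 / (1.0 + np.exp(-y))                        # P(S_i | U)
print(p.shape)  # (10,)
```

Dropping Wp (the propagation matrix) recovers the plain one-layer view of MF mentioned above; the extra layers are what make the model "deep".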

76

Take Home Message

Available: big data w/o annotations

Challenge: how to acquire and organize important knowledge, and further utilize it for applications

Language understanding for AI: map language to action, e.g. understand voice commands to control music, lights, etc.; teach the system to let friends in by face recognition, etc.

Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A — THANKS FOR YOUR ATTENTION!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.

  • Statistical Learning from Dialogues for Intelligent Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants?
  • Why do we need them?
  • Why do we need them? (2)
  • Why do companies care?
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymax's intelligence?
  • SDS Architecture
  • Interaction Example
  • SDS Process – Available Domain Ontology
  • SDS Process – Available Domain Ontology (2)
  • SDS Process – Available Domain Ontology (3)
  • SDS Process – Spoken Language Understanding (SLU)
  • SDS Process – Spoken Language Understanding (SLU) (2)
  • SDS Process – Dialogue Management (DM)
  • SDS Process – Dialogue Management (DM) (2)
  • SDS Process – Dialogue Management (DM) (3)
  • SDS Process – Dialogue Management (DM) (4)
  • SDS Process – Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture – Contributions
  • SDS Flowchart
  • SDS Flowchart – Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLP'15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRU'13, SLT'14a]
  • Ontology Induction [ASRU'13, SLT'14a] (2)
  • 1st Issue: How to adapt generic slots to a domain-specific setting
  • Semantic Decoding [ACL-IJCNLP'15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement: Slot/Word Embeddings Training (Levy and ...
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLP'15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue: How to model the unobserved hidden semantics? Matrix ...
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLP'15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding: Quality of Semantics Estimation
  • Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
  • Experiments of Semantic Decoding: Effectiveness of Relations
  • Experiments for Structure Learning: Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart – Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLT'14c]
  • Intent Prediction – Single-Turn Request
  • Intent Prediction – Multi-Turn Interaction [ICMI'15]
  • Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q & A
Page 48: Statistical Learning from Dialogues for Intelligent Assistants

48

Reasoning with Matrix Factorization

Word Relation Model and Slot Relation Model: the word relation matrix R_w^(SD) and the slot relation matrix R_s^(SD) are multiplied into the feature model (Feature Model + Knowledge Graph Propagation Model).

[Figure: the training matrix has word-observation columns (cheap, restaurant, food, ...) and slot-candidate columns from slot induction (expensiveness, locale_by_use, food, ...), with observed facts marked 1; for test utterances, the factorization fills in probabilities such as .97, .90, .95, .85, .93, .92, .98 for likely slots and .05 for unlikely ones.]

Idea: MF completes a partially-missing matrix based on a low-rank latent-semantics assumption, which is able to model hidden semantics and is more robust to noisy data.

49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and words/slots respectively.

The product of the two matrices fills in the probability of the hidden semantics:

M (|U| x (|W|+|S|)) ≈ U (|U| x d) · V (d x (|W|+|S|))

[Figure: the same word-observation / slot-candidate matrix as before, with training cells observed (1) and test cells estimated (e.g. .97, .90, .95, .85, .05).]

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009.
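The low-rank completion can be demonstrated in miniature with the slide's own utterances. This is a sketch, not the paper's model: truncated SVD stands in for the BPR-trained factorization, and the column names are the slide's examples.

```python
# Rows are utterances; columns are word observations followed by induced
# slot candidates. The test row observes only words; rank-d completion
# fills in scores for the unobserved slot cells.
import numpy as np

cols = ["cheap", "restaurant", "food_word",          # word observations
        "expensiveness", "locale_by_use", "food_slot"]  # slot candidates

M = np.array([
    [1, 1, 0, 1, 1, 0],   # "i would like a cheap restaurant"
    [0, 1, 1, 0, 1, 1],   # "find a restaurant with chinese food"
    [1, 1, 0, 0, 0, 0],   # test: "show me a list of cheap restaurants"
], dtype=float)

# Low-rank completion: keep the top-d singular directions, as in
# M ≈ U (|U| x d) · V (d x (|W|+|S|)).
U, s, Vt = np.linalg.svd(M, full_matrices=False)
d = 2
M_hat = U[:, :d] @ np.diag(s[:d]) @ Vt[:d, :]

for name, score in zip(cols[3:], M_hat[2, 3:]):
    print(f"{name}: {score:.2f}")
```

For the test utterance, the completed row scores expensiveness and locale_by_use well above the food slot, mirroring the filled-in probabilities in the figure.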

50

Bayesian Personalized Ranking for MF

Model implicit feedback: do not treat unobserved facts as negative samples (true or false); instead, give observed facts higher scores than unobserved facts.

Objective: for each utterance x, rank observed facts f+ above unobserved facts f- by maximizing the sum of ln σ(f+ - f-) over the pairs.

The objective is to learn a set of well-ranked semantic slots per utterance.
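A single BPR-style pairwise update can be sketched as below. The latent dimension, learning rate, and random initialization are arbitrary illustrative choices; the real model applies such updates over all (observed, unobserved) fact pairs in the matrix.

```python
# One observed fact f+ is pushed above one unobserved fact f- by
# gradient ascent on ln sigmoid(f+ - f-).
import numpy as np

rng = np.random.default_rng(1)
d = 8                                    # latent dimension (illustrative)
u  = rng.normal(scale=0.1, size=d)       # latent vector for utterance x
vp = rng.normal(scale=0.1, size=d)       # latent vector for observed fact f+
vn = rng.normal(scale=0.1, size=d)       # latent vector for unobserved fact f-

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.1
for _ in range(200):
    margin = u @ vp - u @ vn             # f+ score minus f- score
    g = 1.0 - sigmoid(margin)            # weight of the ranking gradient
    u  += lr * g * (vp - vn)
    vp += lr * g * u
    vn -= lr * g * u

print(u @ vp > u @ vn)  # True: observed fact now ranked above unobserved
```

Because g shrinks as the margin grows, the update naturally stops pushing pairs that are already well ranked, which is what "well-ranked slots per utterance" amounts to.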

51

Matrix Factorization SLU (MF-SLU)

[Figure: Ontology Induction and Structure Learning provide the feature matrices Fw and Fs for the SLU model. Training rows are utterances such as Utterance 1, "i would like a cheap restaurant", and Utterance 2, "find a restaurant with chinese food", with word-observation columns (cheap, restaurant, food, ...) and slot-candidate columns (expensiveness, locale_by_use, food, ...). For the test utterance "show me a list of cheap restaurants", the factorization estimates slot probabilities such as .97, .90, .95, .85.]

MF-SLU can estimate probabilities for slot candidates given test utterances.

52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Example: "can I have a cheap restaurant" -> SLU Model -> target="restaurant", price="cheap"

[Pipeline: frame-semantic parsing over an unlabeled collection feeds Ontology Induction (feature matrices Fw, Fs, forming the feature model) and Structure Learning over semantic and lexical knowledge graphs (word and slot relation models Rw, Rs, forming the knowledge graph propagation model); their product drives MF-SLU, SLU modeling by matrix factorization, which produces the semantic representation.]

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

53

Experimental Setup

Dataset: Cambridge University SLU Corpus (restaurant recommendation; WER = 37%; 2,166 dialogues; 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type)

Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots.

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.
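The MAP metric used throughout these experiments is the mean, over utterances, of the average precision of each utterance's ranked slot list. A small sketch (the rankings and reference sets below are illustrative, not from the corpus):

```python
def average_precision(ranked, relevant):
    """AP of one ranked slot list against the reference slot set."""
    hits, total = 0, 0.0
    for i, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, references):
    return sum(average_precision(r, g)
               for r, g in zip(rankings, references)) / len(rankings)

# Illustrative utterances (not from the corpus):
rankings   = [["expensiveness", "locale_by_use", "food"],
              ["food", "expensiveness", "locale_by_use"]]
references = [{"expensiveness", "locale_by_use"},
              {"food", "locale_by_use"}]
print(round(mean_average_precision(rankings, references), 4))  # 0.9167
```

In the tables that follow, these values are reported as percentages, so a MAP of 0.435 appears as 43.5.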

54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

Approach (MAP %, ASR / Transcripts):
  Baseline SLU:
    Support Vector Machine: 32.5 / 36.6
    Multinomial Logistic Regression: 34.0 / 38.8

  • Semantic Decoding [ACL-IJCNLPrsquo15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding Quality of Semantics Estimatio
  • Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
  • Experiments of Semantic Decoding Effectiveness of Relations
  • Experiments for Structure Learning Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart ndash Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLTrsquo14c]
  • Intent Prediction ndash Single-Turn Request
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q amp A
Page 49: Statistical Learning from Dialogues for Intelligent Assistants

49

2nd Issue: How to model the unobserved hidden semantics?

Matrix Factorization (MF) (Rendle et al., 2009)

The decomposed matrices represent latent semantics for utterances and words/slots, respectively.

The product of the two matrices fills in the probabilities of the hidden semantics.

[Figure: a word-observation / slot-candidate matrix over train and test utterances (words: cheap, restaurant; slot candidates: expensiveness, locale_by_use, food). Observed cells are 1s; the factorization fills the hidden cells with estimated probabilities (e.g., .97, .90, .95, .85). The |U| x (|W|+|S|) matrix is approximated by the product of a |U| x d matrix and a d x (|W|+|S|) matrix.]

Rendle et al., "BPR: Bayesian Personalized Ranking from Implicit Feedback," in Proc. of UAI, 2009
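The decomposition can be sketched on a toy version of the matrix above. The paper learns the factors with Bayesian Personalized Ranking (next slide); here, purely for illustration, a rank-d SVD plays the role of the low-rank factorization, and the toy utterances and d = 2 are assumptions:

```python
import numpy as np

# Toy matrix: rows = utterances, columns = words then slot candidates.
# A 1 marks an observed word or induced slot; test-utterance slots are hidden (0).
#             cheap restaurant food | expensiveness locale_by_use food(slot)
M = np.array([
    [1, 1, 0, 1, 1, 0],  # train: "i would like a cheap restaurant"
    [0, 1, 1, 0, 1, 1],  # train: "find a restaurant with chinese food"
    [1, 1, 0, 0, 0, 0],  # test:  "show me a list of cheap restaurants"
], dtype=float)

d = 2  # latent dimension
U, s, Vt = np.linalg.svd(M, full_matrices=False)
utter = U[:, :d] * s[:d]   # |U| x d latent utterance factors
wordslot = Vt[:d, :]       # d x (|W|+|S|) latent word/slot factors

M_hat = utter @ wordslot   # rank-d product fills in hidden cells
print(np.round(M_hat[2, 3:], 2))  # scores for the test utterance's slot columns
```

The low-rank product assigns the test utterance nonzero scores for slot columns it shares with similar training utterances, which is the "fills in the probability of hidden semantics" step.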


50

Bayesian Personalized Ranking for MF: model implicit feedback

Do not treat unobserved facts as negative samples (true or false); instead, give observed facts higher scores than unobserved facts.

Objective: maximize the sum over utterances of ln σ(f⁺ − f⁻), for each pair of an observed fact f⁺ and an unobserved fact f⁻.

The objective is to learn a set of well-ranked semantic slots per utterance.
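A minimal sketch of one BPR-style update, assuming dot-product scores f = ⟨u, v⟩ and hypothetical learning-rate and regularization values; this is a generic BPR ascent step, not the paper's exact training code:

```python
import numpy as np

def bpr_step(u, v_pos, v_neg, lr=0.05, reg=0.01):
    """One stochastic ascent step on ln sigma(f+ - f-), with f = <u, v>.
    Pushes the observed fact's score f+ above the unobserved fact's f-."""
    x = u @ (v_pos - v_neg)            # current margin f+ - f-
    g = 1.0 / (1.0 + np.exp(x))        # sigma(-x): gradient of ln sigma(x)
    u += lr * (g * (v_pos - v_neg) - reg * u)
    v_pos += lr * (g * u - reg * v_pos)
    v_neg += lr * (-g * u - reg * v_neg)
    return u @ (v_pos - v_neg)         # margin after the update

rng = np.random.default_rng(0)
u, v_pos, v_neg = (0.1 * rng.normal(size=8) for _ in range(3))
margins = [bpr_step(u, v_pos, v_neg) for _ in range(200)]
```

Repeated steps drive the margin f⁺ − f⁻ up, which is exactly the "observed facts score higher than unobserved facts" criterion above.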


51

Matrix Factorization SLU (MF-SLU)

[Figure: ontology induction and structure learning supply the word/slot feature matrices (Fw, Fs). Train utterances "i would like a cheap restaurant" and "find a restaurant with chinese food" contribute observed word and slot cells (cheap, restaurant, food; expensiveness, locale_by_use, food); for the test utterance "show me a list of cheap restaurants", the factorization fills in slot probabilities (e.g., .97, .90, .95, .85).]

MF-SLU can estimate probabilities for slot candidates given test utterances.


52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances
Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015

[Figure: the SLU model maps "can I have a cheap restaurant" to target="restaurant", price="cheap". Frame-semantic parsing over an unlabeled collection drives ontology induction (feature matrices Fw, Fs over a semantic KG) and structure learning (word and slot relation models Rw, Rs over lexical and semantic KGs); MF-SLU combines the feature model with the knowledge graph propagation model to produce the semantic representation.]

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised).
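The knowledge graph propagation step can be sketched as smoothing the observed feature vectors with relation matrices derived from the graphs. The tiny Rw/Rs values below are assumptions for illustration; the paper derives edge weights from embedding similarities and dependency relations:

```python
import numpy as np

# One utterance's observations: words [cheap, restaurant, inexpensive],
# slot candidates [expensiveness, locale_by_use].
Fw = np.array([[1.0, 1.0, 0.0]])   # "inexpensive" was not observed
Fs = np.array([[1.0, 0.0]])        # locale_by_use not induced for this utterance

# Relation matrices (rows sum to 1): edge weights from the lexical/semantic KGs.
Rw = np.array([[0.7, 0.1, 0.2],    # cheap ~ inexpensive
               [0.1, 0.8, 0.1],
               [0.3, 0.1, 0.6]])
Rs = np.array([[0.8, 0.2],         # expensiveness ~ locale_by_use (inter-slot relation)
               [0.2, 0.8]])

# Propagation spreads evidence to related words/slots:
lp = np.hstack([Fw @ Rw, Fs @ Rs])
print(np.round(lp, 2))  # unobserved "inexpensive" and locale_by_use now get mass
```

The point of the multiplication is visible in the output: features that were zero in the raw observation receive weight from their graph neighbors, which is what lets the model generalize past the literal words of an utterance.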


53

Experimental Setup

Dataset: Cambridge University SLU Corpus. Restaurant recommendation (WER = 37%); 2,166 dialogues (15,453 utterances); dialogue slots: addr, area, food, name, phone, postcode, price range, task, type.

Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots.

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012
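The MAP metric can be sketched as follows: rank each utterance's slot candidates by estimated probability, average the precision at each rank where a reference slot appears, then average over utterances. The slot scores and references below are toy values, not the corpus's:

```python
def average_precision(scores, relevant):
    """AP for one utterance: rank slot candidates by estimated probability,
    average the precision at each rank where a reference slot appears."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits, total = 0, 0.0
    for rank, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(per_utterance):
    return sum(average_precision(s, r) for s, r in per_utterance) / len(per_utterance)

utts = [
    ({"food": 0.9, "area": 0.7, "pricerange": 0.2}, {"food", "pricerange"}),
    ({"food": 0.6, "task": 0.4}, {"task"}),
]
print(round(mean_average_precision(utts), 3))  # 0.667
```

The first utterance scores (1/1 + 2/3)/2 = 5/6 and the second 1/2, giving a MAP of 2/3; the percentages in the following tables are this quantity over the whole corpus.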


54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

Approach (Baseline SLU)               ASR    Transcripts
Support Vector Machine                32.5   36.6
Multinomial Logistic Regression       34.0   38.8


55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

Approach                                            ASR              Transcripts
Baseline SLU: Support Vector Machine                32.5             36.6
Baseline SLU: Multinomial Logistic Regression       34.0             38.8
Proposed MF-SLU: Feature Model                      37.6             45.3
Proposed MF-SLU: Feature Model +
  Knowledge Graph Propagation                       43.5* (+27.9%)   53.4* (+37.6%)

*: significantly better than the MLR baseline (p < 0.05, t-test)

The MF-SLU effectively models implicit information to decode semantics. The structure information further improves the results.


56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

Approach                                  ASR              Transcripts
Feature Model                             37.6             45.3
Feature + Knowledge Graph Propagation
  Semantic                                41.4             51.6
  Dependency                              41.6             49.0
  All                                     43.5* (+15.7%)   53.4* (+17.9%)

*: significantly better than the MLR baseline (p < 0.05, t-test)

In the integrated structure information, both semantic and dependency relations are useful for understanding.


57

Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs.

[Figure: the induced ontology (slots such as locale_by_use, food, expensiveness, seeking, relational_quantity, desiring) connected by the most frequent syntactic dependencies (AMOD, NN, DOBJ, PREP_FOR, PREP_IN), shown against the reference ontology (type, food, pricerange, task, area).]

The automatically learned domain ontology aligns well with the reference one.

The data-driven ontology is more objective, while the expert-annotated one is more subjective.

58

Contributions of Semantic Decoding

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeds SLU Modeling (Semantic Decoding, Intent Prediction).]

Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge.

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, 3) and then allow systems to model implicit semantics for better understanding.


59

Low- and High-Level Understanding

Semantic concepts for individual utterances do not consider high-level semantics (user intents). The follow-up behaviors usually correspond to user intents.

Example: "can i have a cheap restaurant" → price="cheap", target="restaurant"; intent=navigation
Example: "i plan to dine in legume tonight" → restaurant="legume", time="tonight"; intent=reservation


60

SDS Flowchart – Intent Prediction

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeds SLU Modeling (Semantic Decoding, Intent Prediction); Intent Prediction is highlighted.]


61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


62

Intent Prediction of Mobile Apps [SLT'14c]
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality

Intent Identification: popular domains in Google Play
e.g., "please dial a phone call to alex" → Skype, Hangout, etc.

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014


63

Intent Prediction – Single-Turn Request

Input: single-turn request
Output: apps that are able to support the required functionality

[Figure: reasoning with feature-enriched MF. IR retrieves app candidates from app descriptions ("check and send emails, msgs, …" → Gmail; "your email, calendar, contacts, …" → Outlook) to build self-train utterances; the matrix spans word observations (contact, message, email, …), intended apps (Gmail, Outlook, Skype, …), and enriched semantic features (e.g., communication: .90) for the test utterance "i would like to contact alex", whose intended-app cells are estimated (e.g., .90, .85, .97, .95).]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.


64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch

Challenge: language ambiguity. 1) User preference; 2) App-level contexts. For example, "send to vivian" is ambiguous between Email and Message (Communication); the previous turn helps disambiguate.

Idea: behavioral patterns in history can help intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com


65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch

[Figure: reasoning with feature-enriched MF. Train dialogues pair user utterances with intended apps ("take this photo" → CAMERA; "tell vivian this is me in the lab" → IM; "check my grades on website" → CHROME; "send an email to professor" → EMAIL). The matrix combines lexical features (photo, check, camera, tell, send, …) with behavior-history features (null, camera, chrome, email), so the test dialogue "take a photo of this / send it to alice" is scored toward CAMERA and then IM (estimated cells, e.g., .85, .70, .95, .80, .55).]

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
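One plausible encoding of these behavioral patterns is to append the previously launched app as extra feature columns next to the lexical ones. The feature layout, vocabulary, and helper below are hypothetical, not the paper's exact matrix:

```python
VOCAB = ["take", "photo", "tell", "send", "check", "grades", "email"]
APPS = ["CAMERA", "IM", "CHROME", "EMAIL"]
COLUMNS = VOCAB + ["prev=" + a for a in APPS] + APPS

def encode(words, prev_app=None, intended=None):
    """One utterance -> one row: lexical | behavior history | intended app."""
    row = dict.fromkeys(COLUMNS, 0)
    for w in words:
        if w in VOCAB:
            row[w] = 1
    if prev_app is not None:
        row["prev=" + prev_app] = 1
    if intended is not None:
        row[intended] = 1
    return [row[c] for c in COLUMNS]

train = [
    encode("take this photo".split(), None, "CAMERA"),
    encode("tell vivian this is me in the lab".split(), "CAMERA", "IM"),
    encode("check my grades on website".split(), None, "CHROME"),
    encode("send an email to professor".split(), "CHROME", "EMAIL"),
]
# Test row: "send it to alice" right after CAMERA. The prev=CAMERA column is
# what lets a model prefer IM over EMAIL despite the ambiguous word "send".
test_row = encode("send it to alice".split(), "CAMERA")
```

Feeding such rows to the matrix factorization treats the behavior history exactly like any other feature, which is how user preference and app-level context enter the model.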


66

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP)
Feature Matrix       ASR: LM | MF-SLU    Transcripts: LM | MF-SLU
Word Observation     25.1    | –         26.1            | –
(LM = LM-based IR model, unsupervised)

Multi-Turn Interaction, Mean Average Precision (MAP)
Feature Matrix       ASR: MLR | MF-SLU   Transcripts: MLR | MF-SLU
Word Observation     52.1     | –        55.5             | –
(MLR = Multinomial Logistic Regression, supervised)
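The unsupervised LM baseline in the table can be sketched as query-likelihood retrieval over app descriptions. The descriptions and the Dirichlet smoothing parameter μ are toy assumptions echoing the earlier slide's examples:

```python
from collections import Counter
import math

apps = {
    "Gmail":   "check and send emails msgs",
    "Outlook": "your email calendar contacts",
    "Camera":  "take photos and record video",
}

coll = Counter(w for d in apps.values() for w in d.split())
coll_len = sum(coll.values())

def lm_score(query, desc, mu=10.0):
    """log P(query | app) under a unigram LM with Dirichlet smoothing.
    (Query words absent from the whole collection would need extra handling.)"""
    doc = Counter(desc.split())
    doc_len = sum(doc.values())
    return sum(
        math.log((doc[w] + mu * coll[w] / coll_len) / (doc_len + mu))
        for w in query.split()
    )

query = "send email"
ranked = sorted(apps, key=lambda a: lm_score(query, apps[a]), reverse=True)
```

On this toy data the communication apps outrank Camera for the query, mirroring how the baseline retrieves intended-app candidates from descriptions alone, with no labeled utterances.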


67

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP)
Feature Matrix       ASR: LM | MF-SLU           Transcripts: LM | MF-SLU
Word Observation     25.1    | 29.2 (+16.2%)    26.1            | 30.4 (+16.4%)

Multi-Turn Interaction, Mean Average Precision (MAP)
Feature Matrix       ASR: MLR | MF-SLU          Transcripts: MLR | MF-SLU
Word Observation     52.1     | 52.7 (+1.2%)    55.5             | 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.


68

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP)
Feature Matrix                            ASR: LM | MF-SLU           Transcripts: LM | MF-SLU
Word Observation                          25.1    | 29.2 (+16.2%)    26.1            | 30.4 (+16.4%)
Word + Embedding-Based Semantics          32.0    | –                33.3            | –
Word + Type-Embedding-Based Semantics     31.5    | –                32.9            | –

Multi-Turn Interaction, Mean Average Precision (MAP)
Feature Matrix               ASR: MLR | MF-SLU          Transcripts: MLR | MF-SLU
Word Observation             52.1     | 52.7 (+1.2%)    55.5             | 55.4 (-0.2%)
Word + Behavioral Patterns   53.9     | –               56.6             | –

Semantic enrichment provides rich cues to improve performance.


69

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP)
Feature Matrix                            ASR: LM | MF-SLU           Transcripts: LM | MF-SLU
Word Observation                          25.1    | 29.2 (+16.2%)    26.1            | 30.4 (+16.4%)
Word + Embedding-Based Semantics          32.0    | 34.2 (+6.8%)     33.3            | 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics     31.5    | 32.2 (+2.1%)     32.9            | 34.0 (+3.4%)

Multi-Turn Interaction, Mean Average Precision (MAP)
Feature Matrix               ASR: MLR | MF-SLU          Transcripts: MLR | MF-SLU
Word Observation             52.1     | 52.7 (+1.2%)    55.5             | 55.4 (-0.2%)
Word + Behavioral Patterns   53.9     | 55.7 (+3.3%)    56.6             | 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.


70

Contributions of Intent Prediction

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeds SLU Modeling (Semantic Decoding, Intent Prediction); Intent Prediction is highlighted.]

Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, 3) and create personalized models by leveraging contextual behaviors.


71

Personal Intelligent Architecture

[Diagram:
- Reactive Assistance: ASR, LU, Dialog, LG, TTS
- Proactive Assistance: Inferences, User Modeling, Suggestions
- Data Back-end: Data Bases, Services and Client Signals
- Device/Service End-points: Phone, PC, Xbox, Web Browser, Messaging Apps
- User Experience: e.g., "call taxi"]


72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


73

Conclusions

The work shows the feasibility and the potential of improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.


74

Future Work

Apply the proposed technology to domain discovery: identify domains that are not covered by the current systems but that users are interested in, to guide which domains to develop next.

Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition, both of which propagate into SLU modeling.


75

Towards Unsupervised Deep Learning

[Figure: a layered network over the utterance U (word sequence w1, w2, …, wd as word vectors lw): a convolutional layer lc (convolution matrix Wc) with a pooling operation produces the utterance vector lf; slot vectors lf and the knowledge graph propagation layer lp (propagation matrix Wp) feed the semantic projection matrix Ws and the semantic layer y, yielding semantic relations R(U, S1), …, R(U, Sn) and posterior probabilities P(S1 | U), …, P(Sn | U) over slot candidates S1, …, Sn.]

Treating MF as a one-layer neural net, we can add more layers in the model towards unsupervised deep learning.
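A loose numpy sketch of the layered architecture named on the slide. The random weights, layer sizes, activations, and the max-pooling choice are all assumptions for illustration, not the actual model:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_words, n_slots = 16, 5, 3

lw = rng.normal(size=(n_words, d))               # word vectors of utterance U
Wc = rng.normal(size=(d, d)) / np.sqrt(d)        # convolution matrix Wc
lc = np.tanh(lw @ Wc)                            # convolutional layer lc
lf = lc.max(axis=0)                              # pooling -> utterance vector lf
Wp = rng.normal(size=(d, d)) / np.sqrt(d)        # knowledge graph propagation matrix Wp
lp = np.tanh(lf @ Wp)                            # propagation layer lp
Ws = rng.normal(size=(d, n_slots)) / np.sqrt(d)  # semantic projection matrix Ws
y = 1.0 / (1.0 + np.exp(-(lp @ Ws)))             # semantic layer: y_n ~ P(S_n | U)
```

With a single linear layer in place of the stack, this reduces to the MF scoring above; adding the convolutional and propagation layers is the "more layers" direction the slide proposes.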

76

Take Home Message

Available: big data without annotations. Challenge: how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI: from language to action, e.g., understanding voice commands to control music, lights, etc., and teaching the system to let friends in by face recognition, etc.


Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A. Thanks for your attention!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013 (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016


Page 50: Statistical Learning from Dialogues for Intelligent Assistants

50

Bayesian Personalized Ranking for MF Model implicit feedback

not treat unobserved facts as negative samples (true or false) give observed facts higher scores than unobserved facts

Objective

1

119891 +iquest iquest119891 minus119891 minus

The objective is to learn a set of well-ranked semantic slots per utterance

119906119909

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

51

Ontology Induction

SLUFw Fs

Structure Learning

times

1

Utterance 1i would like a cheap restaurant

Word Observation Slot Candidate

Train

hellip

cheap restaurant foodexpensiveness

1

locale_by_use

11

find a restaurant with chinese foodUtterance 2

1 1

food

1 1

1

Test1 9790 9585

Ontology Induction

show me a list of cheap restaurantsTest Utterance

Matrix Factorization SLU (MF-SLU)

MF-SLU can estimate probabilities for slot candidates given test utterances

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

52

Semantic Decoding [ACL-IJCNLPrsquo15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015

SLU Model

target=ldquorestaurantrdquoprice=ldquocheaprdquo

ldquocan I have a cheap restaurantrdquoFrame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology InductionFw Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation

Idea utilize the acquired knowledge to decode utterance semantics (fully unsupervised)

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

53

Experimental Setup Dataset Cambridge University SLU Corpus

Restaurant recommendation (WER = 37) 2166 dialogues 15453 utterances dialogue slot addr area food name phone postcode price range task type

Metric MAP of all estimated slot probabilities over all utterancesThe mapping table between induced and reference slots

Henderson et al Discriminative spoken language understanding using word confusion networks in Proc of SLT 2012

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

54

Experiments of Semantic DecodingQuality of Semantics Estimation

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

Approach ASR TranscriptsBaseline

SLUSupport Vector Machine 325 366

Multinomial Logistic Regression 340 388

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

55

Experiments of Semantic DecodingQuality of Semantics Estimation

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

The MF-SLU effectively models implicit information to decode semantics

The structure information further improves the results

Approach ASR Transcripts

Baseline SLU

Support Vector Machine 325 366Multinomial Logistic Regression 340 388

Proposed MF-SLU

Feature Model 376 453

Feature Model +Knowledge Graph Propagation

435

(+279)534

(+376)

the result is significantly better than the MLR with p lt 005 in t-test

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

56

Experiments of Semantic DecodingEffectiveness of Relations

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

In the integrated structure information both semantic and dependency relations are useful for understanding

Approach ASR Transcripts

Feature Model 376 453

Feature + Knowledge Graph Propagation

Semantic 414 516

Dependency 416 490

All 435 (+157) 534 (+179)

the result is significantly better than the MLR with p lt 005 in t-test

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Experiments for Structure LearningRelation Discovery Analysis

Discover inter-slot relations connecting important slot pairs

The reference ontology with the most frequent syntactic dependencies

locale_by_use

food expensiveness

seeking

relational_quantity

PREP_FOR

PREP_FOR

NN AMOD

AMOD

AMODdesiring

DOBJ

type

food pricerange

DOBJ

AMOD AMOD

AMOD

taskarea

PREP_IN

The automatically learned domain ontology aligns well with the reference one

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 57

The data-driven one is more objective while expert-annotated one is more subjective

58

Contributions of Semantic Decoding

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge

MF-SLU for Semantic Decoding is able to1) unify the automatically

acquired knowledge2) adapt to a domain-

specific setting 3) and then allows

systems to model implicit semantics for better understanding

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

59

Low- and High-Level Understanding Semantic concepts for individual utterances do not consider high-level semantics (user intents)

The follow-up behaviors usually correspond to user intents

price=ldquocheaprdquo target=ldquorestaurantrdquo

SLU Model

ldquocan i have a cheap restaurantrdquo

intent=navigation

restaurant=ldquolegumerdquo time=ldquotonightrdquo

SLU Model

ldquoi plan to dine in legume tonightrdquo

intent=reservation

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

60

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart ndash Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

61

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

62

[Chen amp Rudnicky SLT 2014 Chen et al ICMI 2015]

Input spoken utterances for making requests about launching an app

Output the apps supporting the required functionality

Intent Identification popular domains in Google Play

please dial a phone call to alex

Skype Hangout etc

Intent Prediction of Mobile Apps [SLTrsquo14c]

Chen and Rudnicky Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings in Proc of SLT 2014

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

63

Input single-turn request

Output apps that are able to support the required functionality

Intent Prediction ndash Single-Turn Request

1

Enriched Semantics

communication

90

1

1

Utterance 1 i would like to contact alex

Word Observation Intended App

hellip hellip

contact message Gmail Outlook Skypeemail

Test

90

Reasoning with Feature-Enriched MF

Train

hellip your email calendar contactshellip

hellip check and send emails msgs hellip

Outlook

Gmail

IR for app candidates

App Desc

Self-Train Utterance

Test Utterance

1

1

1

1

1

1

1

1 1

1

1 90 85 97 95

FeatureEnrichment

Utterance 1 i would like to contact alexhellip

1

1

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

64

Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]

Input multi-turn interaction

Output apps the user plans to launch

Challenge language ambiguity1) User preference2) App-level contexts

Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom

send to vivianvs

Email MessageCommunication

Idea Behavioral patterns in history can help intent prediction

previous turn

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

65

Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]

Input multi-turn interaction

Output apps the user plans to launch

[Figure: feature-enriched MF for multi-turn interaction. Training dialogues pair user utterances with intended apps ("take this photo" → CAMERA; "tell vivian this is me in the lab" → IM; "check my grades on website" → CHROME; "send an email to professor" → EMAIL). Matrix columns combine lexical features (photo, check, camera, tell, send, ...) with behavior-history features (null, camera, chrome, email, ...); observed cells are marked 1, and the model infers probabilities (e.g., 0.85, 0.70, 0.95) for the test dialogue "take a photo of this" → CAMERA, "send it to alice" → IM.]

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
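A minimal sketch of the app-level context idea: append the app launched in the previous turn as extra features next to the lexical ones. The vocabulary, app inventory, and featurizer below are illustrative, not the paper's actual feature set:

```python
# Hypothetical feature builder: bag-of-words plus a one-hot of the app
# launched in the previous turn (the behavior history).
VOCAB = ["take", "photo", "send", "email", "grades"]
APPS = ["CAMERA", "IM", "CHROME", "EMAIL"]

def featurize(utterance, prev_app=None):
    words = set(utterance.lower().split())
    lexical = [1 if w in words else 0 for w in VOCAB]
    behavior = [1 if prev_app == a else 0 for a in APPS]
    return lexical + behavior

turn1 = featurize("take a photo of this")                 # no history yet
turn2 = featurize("send it to alice", prev_app="CAMERA")  # camera context
print(turn1)
print(turn2)
```

The second turn's vector carries the CAMERA context, which is exactly the signal a personalized model can exploit when "send it to alice" alone is ambiguous.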


66

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix   | ASR (LM) | Transcripts (LM)
Word Observation | 25.1     | 26.1

(LM-based IR model, unsupervised)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix   | ASR (MLR) | Transcripts (MLR)
Word Observation | 52.1      | 55.5

(Multinomial Logistic Regression, supervised)

Experiments for Intent Prediction
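MAP, the metric used in these tables, averages the precision of the ranked app list over all requests; a small self-contained version:

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant items."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, gold):
    return sum(average_precision(r, g) for r, g in zip(rankings, gold)) / len(rankings)

# Toy example: one relevant app ranked third, then a perfect ranking.
rankings = [["Gmail", "Skype", "Outlook"], ["Camera", "IM"]]
gold = [{"Outlook"}, {"Camera", "IM"}]
print(mean_average_precision(rankings, gold))  # (1/3 + 1.0) / 2 ≈ 0.667
```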


67

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix   | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix   | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction especially for noisy data

Experiments for Intent Prediction


68

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                        | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation                      | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)
Word + Embedding-Based Semantics      | 32.0    | -             | 33.3            | -
Word + Type-Embedding-Based Semantics | 31.5    | -             | 32.9            | -

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix             | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation           | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9     | -            | 56.6             | -

Semantic enrichment provides rich cues to improve performance

Experiments for Intent Prediction


69

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                        | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
Word Observation                      | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)
Word + Embedding-Based Semantics      | 32.0    | 34.2 (+6.8%)  | 33.3            | 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics | 31.5    | 32.2 (+2.1%)  | 32.9            | 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix             | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
Word Observation           | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)
Word + Behavioral Patterns | 53.9     | 55.7 (+3.3%) | 56.6             | 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics

Experiments for Intent Prediction


70

[Figure: contribution map. Knowledge Acquisition covers Ontology Induction and Structure Learning; SLU Modeling covers Semantic Decoding and Intent Prediction.]

Contributions of Intent Prediction: the feature-enriched MF-SLU for intent prediction is able to 1) unify knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.


71

Personal Intelligent Architecture

Reactive Assistance: ASR, LU, Dialog, LG, TTS

Proactive Assistance: Inferences, User Modeling, Suggestions

Data: Back-end Databases, Services and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"


72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


73

Conclusions: The work shows the feasibility and the potential of improving the generalization, maintenance efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge and allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.


74

Future Work: Apply the proposed technology to domain discovery: identify domains that current systems do not cover but that users are interested in, to guide which domains to develop next.

Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition, both of which propagate into SLU modeling.


75

[Figure: the MF model drawn as a deep network. A word sequence x = w1 w2 ... wd feeds a word vector layer lw; a convolutional layer lc (convolution matrix Wc) with a pooling operation yields the utterance vector lf. Slot vectors lf for slot candidates S1 ... Sn pass through a knowledge graph propagation layer lp (matrix Wp) and a semantic projection matrix Ws to the semantic layer y, producing relation scores R(U, S1) ... R(U, Sn) and posterior probabilities P(S1 | U) ... P(Sn | U).]

Towards Unsupervised Deep Learning


Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning
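The one-layer view can be made concrete: a bilinear MF score is a single linear layer, and a deeper variant simply inserts a nonlinear hidden layer before the output. All shapes and weights below are arbitrary untrained placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
n_utts, n_feats, rank = 4, 6, 3

# MF view: score matrix = utterance factors times feature factors,
# i.e., one linear layer with no activation.
U = rng.normal(size=(n_utts, rank))
V = rng.normal(size=(n_feats, rank))
one_layer = U @ V.T                      # shape (n_utts, n_feats)

# Deeper variant: a nonlinear hidden layer before the output, in the
# spirit of the convolutional architecture on the slide.
X = rng.random((n_utts, n_feats))        # observed feature matrix
W1 = rng.normal(size=(n_feats, 8))
W2 = rng.normal(size=(8, n_feats))
two_layer = np.maximum(X @ W1, 0) @ W2   # ReLU hidden layer, then projection

print(one_layer.shape, two_layer.shape)
```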

76

Take Home Message: Big data is available without annotations.

Challenge: how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI: language → action, e.g., understand voice commands to control music, lights, etc.; teach the assistant to let friends in by face recognition, etc.


Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A. THANKS FOR YOUR ATTENTION!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.


  • Statistical Learning from Dialogues for Intelligent Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants?
  • Why do we need them?
  • Why do we need them? (2)
  • Why do companies care?
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymax's intelligence?
  • SDS Architecture
  • Interaction Example
  • SDS Process – Available Domain Ontology
  • SDS Process – Available Domain Ontology (2)
  • SDS Process – Available Domain Ontology (3)
  • SDS Process – Spoken Language Understanding (SLU)
  • SDS Process – Spoken Language Understanding (SLU) (2)
  • SDS Process – Dialogue Management (DM)
  • SDS Process – Dialogue Management (DM) (2)
  • SDS Process – Dialogue Management (DM) (3)
  • SDS Process – Dialogue Management (DM) (4)
  • SDS Process – Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture – Contributions
  • SDS Flowchart
  • SDS Flowchart – Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLP'15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRU'13, SLT'14a]
  • Ontology Induction [ASRU'13, SLT'14a] (2)
  • 1st Issue: How to adapt generic slots to a domain-specific setting
  • Semantic Decoding [ACL-IJCNLP'15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement: Slot/Word Embeddings Training (Levy and Goldberg)
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLP'15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue: How to model the unobserved hidden semantics? Matrix Factorization
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLP'15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding: Quality of Semantics Estimation
  • Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
  • Experiments of Semantic Decoding: Effectiveness of Relations
  • Experiments for Structure Learning: Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart – Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLT'14c]
  • Intent Prediction – Single-Turn Request
  • Intent Prediction – Multi-Turn Interaction [ICMI'15]
  • Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q & A
  • Q amp A
Page 51: Statistical Learning from Dialogues for Intelligent Assistants

51

[Figure: MF-SLU training and test matrix. Ontology induction and structure learning provide the slot candidates and the feature matrices Fw and Fs. Training utterances (Utterance 1: "i would like a cheap restaurant"; Utterance 2: "find a restaurant with chinese food") are rows over word observations (cheap, restaurant, food) and induced slot candidates (expensiveness, locale_by_use, food), with observed cells marked 1. For the test utterance "show me a list of cheap restaurants," the model estimates slot probabilities (e.g., 0.97, 0.90, 0.95, 0.85).]

Matrix Factorization SLU (MF-SLU)

MF-SLU can estimate probabilities for slot candidates given test utterances
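Training uses Bayesian personalized ranking (per the deck's MF slides); in its standard form the objective prefers each observed cell x⁺ over unobserved cells x⁻. The notation below is illustrative rather than copied from the paper:

```latex
% BPR-style objective for MF: maximize the log-odds that an observed cell
% x+ scores higher than an unobserved cell x-, with L2 regularization.
\max_{\theta} \; \sum_{x^{+}} \sum_{x^{-}}
  \ln \sigma\!\left( f_{\theta}(x^{+}) - f_{\theta}(x^{-}) \right)
  \; - \; \lambda \lVert \theta \rVert^{2}
```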


52

Semantic Decoding [ACL-IJCNLP'15]

Input: user utterances

Output: semantic concepts included in each individual utterance

Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.

SLU Model: "can I have a cheap restaurant" → target="restaurant", price="cheap"

[Figure: system pipeline. Frame-semantic parsing over an unlabeled collection feeds ontology induction (feature matrices Fw for words and Fs for slots) and structure learning over semantic and lexical knowledge graphs (word relation model Rw, slot relation model Rs); the feature model and the knowledge graph propagation model combine in the MF-SLU (SLU modeling by matrix factorization) to produce the semantic representation.]

Idea: utilize the acquired knowledge to decode utterance semantics (fully unsupervised)
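One generic way to realize knowledge graph propagation is to smooth the observed feature matrix with a row-normalized relation matrix, so evidence for one slot spreads to related slots. This is a sketch of the general technique under assumed toy weights, not the paper's exact formulation:

```python
import numpy as np

# One utterance over 3 slot candidates; only slots 0 and 2 are observed.
F = np.array([[1.0, 0.0, 1.0]])

# Assumed relation weights between slots (symmetric toy values).
R = np.array([[1.0, 0.8, 0.0],
              [0.8, 1.0, 0.1],
              [0.0, 0.1, 1.0]])

# Row-normalize so each slot distributes its evidence over its neighbors.
R_norm = R / R.sum(axis=1, keepdims=True)

F_prop = F @ R_norm  # propagated slot evidence
print(F_prop.round(3))
```

Note that the middle slot, unobserved in F, receives positive evidence after propagation because it is related to both observed slots.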


53

Experimental Setup

Dataset: Cambridge University SLU Corpus — restaurant recommendation (WER = 37%); 2,166 dialogues, 15,453 utterances; dialogue slots: addr, area, food, name, phone, postcode, price range, task, type

Metric: MAP of all estimated slot probabilities over all utterances, using the mapping table between induced and reference slots

Henderson et al., "Discriminative Spoken Language Understanding Using Word Confusion Networks," in Proc. of SLT, 2012.


54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

Approach                                      | ASR  | Transcripts
Baseline SLU: Support Vector Machine          | 32.5 | 36.6
Baseline SLU: Multinomial Logistic Regression | 34.0 | 38.8


55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

Approach                                                     | ASR           | Transcripts
Baseline SLU: Support Vector Machine                         | 32.5          | 36.6
Baseline SLU: Multinomial Logistic Regression                | 34.0          | 38.8
Proposed MF-SLU: Feature Model                               | 37.6          | 45.3
Proposed MF-SLU: Feature Model + Knowledge Graph Propagation | 43.5 (+27.9%) | 53.4 (+37.6%)

The MF-SLU effectively models implicit information to decode semantics; the structure information further improves the results.

The result is significantly better than the MLR baseline with p < 0.05 in a t-test.


56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

Approach                                          | ASR           | Transcripts
Feature Model                                     | 37.6          | 45.3
Feature + Knowledge Graph Propagation: Semantic   | 41.4          | 51.6
Feature + Knowledge Graph Propagation: Dependency | 41.6          | 49.0
Feature + Knowledge Graph Propagation: All        | 43.5 (+15.7%) | 53.4 (+17.9%)

In the integrated structure information, both semantic and dependency relations are useful for understanding.

The result is significantly better than the MLR baseline with p < 0.05 in a t-test.


Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs.

[Figure: the reference ontology with the most frequent syntactic dependencies. Slot nodes (locale_by_use, food, expensiveness, seeking, relational_quantity, desiring, type, pricerange, task, area) are linked by dependency relations such as PREP_FOR, PREP_IN, NN, AMOD, and DOBJ.]

The automatically learned domain ontology aligns well with the reference one

57

The data-driven ontology is more objective, while the expert-annotated one is more subjective.

58

Contributions of Semantic Decoding

[Figure: contribution map. Knowledge Acquisition covers Ontology Induction and Structure Learning; SLU Modeling covers Semantic Decoding and Intent Prediction.]

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.

The MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.


59

Low- and High-Level Understanding

Semantic concepts for individual utterances do not consider high-level semantics (user intents).

The follow-up behaviors usually correspond to user intents.

SLU Model: "can i have a cheap restaurant" → price="cheap", target="restaurant"; intent=navigation

SLU Model: "i plan to dine in legume tonight" → restaurant="legume", time="tonight"; intent=reservation


60

[Figure: contribution map. Knowledge Acquisition covers Ontology Induction and Structure Learning; SLU Modeling covers Semantic Decoding and Intent Prediction.]

SDS Flowchart – Intent Prediction


61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app

Output: the apps supporting the required functionality

Intent Identification: popular domains in Google Play

"please dial a phone call to alex" → Skype, Hangout, etc.

Intent Prediction of Mobile Apps [SLT'14c]

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.


63

Page 52: Statistical Learning from Dialogues for Intelligent Assistants

52

Semantic Decoding [ACL-IJCNLPrsquo15]

Input user utterances

Output semantic concepts included in each individual utterance

Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015

SLU Model

target=ldquorestaurantrdquoprice=ldquocheaprdquo

ldquocan I have a cheap restaurantrdquoFrame-Semantic Parsing

Unlabeled Collection

Semantic KG

Ontology InductionFw Fs

Feature Model

Rw

Rs

Knowledge Graph Propagation Model

Word Relation Model

Lexical KG

Slot Relation Model

Structure Learning

times

Semantic KG

MF-SLU SLU Modeling by Matrix Factorization

Semantic Representation

Idea utilize the acquired knowledge to decode utterance semantics (fully unsupervised)

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

53

Experimental Setup Dataset Cambridge University SLU Corpus

Restaurant recommendation (WER = 37) 2166 dialogues 15453 utterances dialogue slot addr area food name phone postcode price range task type

Metric MAP of all estimated slot probabilities over all utterancesThe mapping table between induced and reference slots

Henderson et al Discriminative spoken language understanding using word confusion networks in Proc of SLT 2012

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

54

Experiments of Semantic DecodingQuality of Semantics Estimation

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

Approach ASR TranscriptsBaseline

SLUSupport Vector Machine 325 366

Multinomial Logistic Regression 340 388

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

55

Experiments of Semantic DecodingQuality of Semantics Estimation

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

The MF-SLU effectively models implicit information to decode semantics

The structure information further improves the results

Approach                                        ASR             Transcripts
Baseline SLU
  Support Vector Machine                        32.5            36.6
  Multinomial Logistic Regression               34.0            38.8
Proposed MF-SLU
  Feature Model                                 37.6            45.3
  Feature Model + Knowledge Graph Propagation   43.5 (+27.9%)   53.4 (+37.6%)

The result is significantly better than the MLR baseline with p < 0.05 in a t-test.


56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

In the integrated structure information, both semantic and dependency relations are useful for understanding.

Approach                                  ASR             Transcripts
Feature Model                             37.6            45.3
Feature + Knowledge Graph Propagation
  Semantic                                41.4            51.6
  Dependency                              41.6            49.0
  All                                     43.5 (+15.7%)   53.4 (+17.9%)

The result is significantly better than the MLR baseline with p < 0.05 in a t-test.


Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs

The reference ontology with the most frequent syntactic dependencies:

[Figure: induced slots (locale_by_use, food, expensiveness, seeking, relational_quantity, desiring, type, area, task, pricerange) linked by frequent syntactic dependencies (PREP_FOR, PREP_IN, NN, AMOD, DOBJ).]

The automatically learned domain ontology aligns well with the reference one

57

The data-driven ontology is more objective, while the expert-annotated one is more subjective.

58

Contributions of Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, 3) and then allow systems to model implicit semantics for better understanding.


59

Low- and High-Level Understanding: Semantic concepts for individual utterances do not consider high-level semantics (user intents).

The follow-up behaviors usually correspond to user intents

price="cheap", target="restaurant"

SLU Model

"can i have a cheap restaurant"

intent=navigation

restaurant="legume", time="tonight"

SLU Model

"i plan to dine in legume tonight"

intent=reservation


60

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart – Intent Prediction


61

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances for making requests about launching an app

Output: the apps supporting the required functionality

Intent Identification: popular domains in Google Play

please dial a phone call to alex

Skype, Hangout, etc.

Intent Prediction of Mobile Apps [SLT'14c]

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.


63

Input: single-turn request

Output: apps that are able to support the required functionality

Intent Prediction – Single-Turn Request

[Figure: feature-enriched matrix for single-turn requests. Rows: app descriptions retrieved by IR as app candidates (e.g. Gmail: "... check and send emails, msgs ...", Outlook: "... your email, calendar, contacts ..."), self-train utterances, and test utterances (Utterance 1: "i would like to contact alex"). Columns: word observations (contact, email, message, ...), feature-enriched semantics (communication), and intended apps (Gmail, Outlook, Skype). Training fills observed cells; at test time, reasoning with feature-enriched MF assigns scores (e.g. 0.90) to unobserved cells.]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
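A minimal sketch of how one feature-enriched row might be assembled, assuming hypothetical vocabularies for word features, enriched semantic features, and intended-app labels (the helper `build_row` and all entries are illustrative, not from the original system):

```python
# Each utterance (or app description) becomes a sparse binary row over
# word features, enriched semantic features, and intended-app labels.
def build_row(words, enriched, apps, vocab, semantics, app_ids):
    row = [0] * (len(vocab) + len(semantics) + len(app_ids))
    for w in words:                    # word observations
        if w in vocab:
            row[vocab[w]] = 1
    for s in enriched:                 # enriched semantic features
        if s in semantics:
            row[len(vocab) + semantics[s]] = 1
    for a in apps:                     # intended-app labels
        row[len(vocab) + len(semantics) + app_ids[a]] = 1
    return row

vocab = {"contact": 0, "email": 1, "message": 2}
semantics = {"communication": 0}
app_ids = {"Gmail": 0, "Outlook": 1, "Skype": 2}

row = build_row(["contact"], ["communication"], ["Skype"],
                vocab, semantics, app_ids)
```

Stacking such rows for app descriptions, self-train utterances, and test utterances yields the matrix that the factorization then completes.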


64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

Challenge: language ambiguity. 1) User preference; 2) App-level contexts.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com.

[Figure: the ambiguous request "send to vivian" could map to Email or Message (Communication apps), depending on the previous turn.]

Idea: Behavioral patterns in history can help intent prediction.


65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

[Figure: feature-enriched matrix for multi-turn interaction. Rows: train dialogue turns (e.g. "take this photo" → CAMERA; "tell vivian this is me in the lab" → IM; "check my grades on websites" → CHROME; "send an email to professor" → EMAIL) and test dialogue turns ("take a photo of this" → CAMERA; "send it to alice" → IM). Columns: lexical features (photo, check, camera, tell, send, ...), intended apps, and behavior history (null, camera, chrome, email). Reasoning with feature-enriched MF assigns scores (e.g. 0.85) to unobserved cells.]

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com.

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction


66

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix      ASR (LM)   Transcripts (LM)
Word Observation    25.1       26.1

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix      ASR (MLR)  Transcripts (MLR)
Word Observation    52.1       55.5

LM: LM-Based IR Model (unsupervised); MLR: Multinomial Logistic Regression (supervised)

Experiments for Intent Prediction


67

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix      ASR: LM / MF-SLU        Transcripts: LM / MF-SLU
Word Observation    25.1 / 29.2 (+16.2%)    26.1 / 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix      ASR: MLR / MF-SLU       Transcripts: MLR / MF-SLU
Word Observation    52.1 / 52.7 (+1.2%)     55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction especially for noisy data

Experiments for Intent Prediction


68

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                          ASR: LM / MF-SLU        Transcripts: LM / MF-SLU
Word Observation                        25.1 / 29.2 (+16.2%)    26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics        32.0 / -                33.3 / -
Word + Type-Embedding-Based Semantics   31.5 / -                32.9 / -

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix                          ASR: MLR / MF-SLU       Transcripts: MLR / MF-SLU
Word Observation                        52.1 / 52.7 (+1.2%)     55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns              53.9 / -                56.6 / -

Semantic enrichment provides rich cues to improve performance
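The embedding-based enrichment idea, i.e. adding semantically similar words as extra features, can be sketched with cosine similarity over toy vectors (the embedding values below are made up; a real system would use pretrained neural word embeddings):

```python
import numpy as np

# Toy 2-d embeddings with hypothetical values.
emb = {
    "contact": np.array([0.9, 0.1]),
    "email":   np.array([0.8, 0.2]),
    "photo":   np.array([0.1, 0.9]),
}

def enrich(word, k=1):
    """Return the k nearest vocabulary words by cosine similarity."""
    v = emb[word]
    sims = {
        w: float(v @ u / (np.linalg.norm(v) * np.linalg.norm(u)))
        for w, u in emb.items() if w != word
    }
    return sorted(sims, key=sims.get, reverse=True)[:k]

print(enrich("contact"))  # -> ['email']
```

The returned neighbors become additional binary features in the utterance row, giving the model cues beyond the literal words observed.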

Experiments for Intent Prediction


69

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                          ASR: LM / MF-SLU        Transcripts: LM / MF-SLU
Word Observation                        25.1 / 29.2 (+16.2%)    26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics        32.0 / 34.2 (+6.8%)     33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics   31.5 / 32.2 (+2.1%)     32.9 / 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix                          ASR: MLR / MF-SLU       Transcripts: MLR / MF-SLU
Word Observation                        52.1 / 52.7 (+1.2%)     55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns              53.9 / 55.7 (+3.3%)     56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics

Experiments for Intent Prediction


70

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction: The feature-enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, 3) and create personalized models by leveraging contextual behaviors.


71

Personal Intelligent Architecture

Reactive Assistance: ASR, LU, Dialog, LG, TTS

Proactive Assistance: Inferences, User Modeling, Suggestions

Data Back-end: Data Bases, Services, and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"


72

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


73

Conclusions: This work shows the feasibility of, and the potential for, improving the generalization, maintenance efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances; better high-level intent prediction about follow-up behaviors.


74

Future Work: Apply the proposed technology to domain discovery

Discover domains not covered by the current systems but that users are interested in, to guide which domains are developed next.

Improve the proposed approach by handling uncertainty:

[Figure: uncertainty sources in the pipeline: recognition errors from ASR and unreliable knowledge from Knowledge Acquisition, both feeding SLU Modeling.]


75

[Figure: towards unsupervised deep learning. A word sequence x (w1, w2, ..., wd) is mapped to word vectors lw, transformed by a convolution matrix Wc into a convolutional layer lc, and pooled into an utterance vector lf; a knowledge graph propagation layer lp (matrix Wp) and a semantic projection matrix Ws yield the semantic layer y, producing semantic relations R(U, S1), ..., R(U, Sn) and posterior probabilities P(S1 | U), ..., P(Sn | U) over slot candidates S1, ..., Sn.]

Towards Unsupervised Deep Learning


Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.
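A rough forward-pass sketch of such a deepened model, following the layer names used here (Wc, lc, pooling, lf, Wp, lp, Ws, y); the dimensions and random weights are purely illustrative and untrained:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_words, n_slots, conv_dim = 8, 5, 4, 16

x  = rng.normal(size=(n_words, d))          # word vectors w1..wd
Wc = rng.normal(size=(d, conv_dim))         # convolution matrix Wc (window = 1 here)
lc = np.tanh(x @ Wc)                        # convolutional layer lc
lf = lc.max(axis=0)                         # max-pooling -> utterance vector lf
Wp = rng.normal(size=(conv_dim, conv_dim))  # knowledge graph propagation matrix Wp
lp = np.tanh(lf @ Wp)                       # propagation layer lp
Ws = rng.normal(size=(conv_dim, n_slots))   # semantic projection matrix Ws
y  = lp @ Ws                                # semantic layer y: one score per slot
P  = np.exp(y) / np.exp(y).sum()            # softmax -> P(S_i | U)
```

Each matrix multiply corresponds to one layer of the figure, so the one-layer MF becomes the special case with no convolution or propagation layers.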

76

Take Home Message: Big data is available without annotations.

Challenge: how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI:

language → action: understand voice to control music, lights, etc.; teach the system to let friends in by face recognition, etc.


Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A
THANKS FOR YOUR ATTENTION

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.


  • Statistical Learning from Dialogues for Intelligence Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants
  • Why do we need them
  • Why do we need them (2)
  • Why do companies care
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymaxrsquos intelligence
  • SDS Architecture
  • Interaction Example
  • SDS Process ndash Available Domain Ontology
  • SDS Process ndash Available Domain Ontology (2)
  • SDS Process ndash Available Domain Ontology (3)
  • SDS Process ndash Spoken Language Understanding (SLU)
  • SDS Process ndash Spoken Language Understanding (SLU) (2)
  • SDS Process ndash Dialogue Management (DM)
  • SDS Process ndash Dialogue Management (DM) (2)
  • SDS Process ndash Dialogue Management (DM) (3)
  • SDS Process ndash Dialogue Management (DM) (4)
  • SDS Process ndash Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture ndash Contributions
  • SDS Flowchart
  • SDS Flowchart ndash Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLPrsquo15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a]
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
  • 1st Issue How to adapt generic slots to a domain-specific sett
  • Semantic Decoding [ACL-IJCNLPrsquo15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement SlotWord Embeddings Training (Levy and
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLPrsquo15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue How to model the unobserved hidden semantics Matrix
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLPrsquo15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding Quality of Semantics Estimatio
  • Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
  • Experiments of Semantic Decoding Effectiveness of Relations
  • Experiments for Structure Learning Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart ndash Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLTrsquo14c]
  • Intent Prediction ndash Single-Turn Request
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q amp A
Page 53: Statistical Learning from Dialogues for Intelligent Assistants

53

Experimental Setup Dataset Cambridge University SLU Corpus

Restaurant recommendation (WER = 37) 2166 dialogues 15453 utterances dialogue slot addr area food name phone postcode price range task type

Metric MAP of all estimated slot probabilities over all utterancesThe mapping table between induced and reference slots

Henderson et al Discriminative spoken language understanding using word confusion networks in Proc of SLT 2012

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

54

Experiments of Semantic DecodingQuality of Semantics Estimation

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

Approach ASR TranscriptsBaseline

SLUSupport Vector Machine 325 366

Multinomial Logistic Regression 340 388

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

55

Experiments of Semantic DecodingQuality of Semantics Estimation

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

The MF-SLU effectively models implicit information to decode semantics

The structure information further improves the results

Approach ASR Transcripts

Baseline SLU

Support Vector Machine 325 366Multinomial Logistic Regression 340 388

Proposed MF-SLU

Feature Model 376 453

Feature Model +Knowledge Graph Propagation

435

(+279)534

(+376)

the result is significantly better than the MLR with p lt 005 in t-test

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

56

Experiments of Semantic DecodingEffectiveness of Relations

Dataset Cambridge University SLU Corpus

Metric MAP of all estimated slot probabilities for all utterances

In the integrated structure information both semantic and dependency relations are useful for understanding

Approach ASR Transcripts

Feature Model 376 453

Feature + Knowledge Graph Propagation

Semantic 414 516

Dependency 416 490

All 435 (+157) 534 (+179)

the result is significantly better than the MLR with p lt 005 in t-test

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Experiments for Structure LearningRelation Discovery Analysis

Discover inter-slot relations connecting important slot pairs

The reference ontology with the most frequent syntactic dependencies

locale_by_use

food expensiveness

seeking

relational_quantity

PREP_FOR

PREP_FOR

NN AMOD

AMOD

AMODdesiring

DOBJ

type

food pricerange

DOBJ

AMOD AMOD

AMOD

taskarea

PREP_IN

The automatically learned domain ontology aligns well with the reference one

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 57

The data-driven one is more objective while expert-annotated one is more subjective

58

Contributions of Semantic Decoding

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge

MF-SLU for Semantic Decoding is able to1) unify the automatically

acquired knowledge2) adapt to a domain-

specific setting 3) and then allows

systems to model implicit semantics for better understanding

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

59

Low- and High-Level Understanding Semantic concepts for individual utterances do not consider high-level semantics (user intents)

The follow-up behaviors usually correspond to user intents

price=ldquocheaprdquo target=ldquorestaurantrdquo

SLU Model

ldquocan i have a cheap restaurantrdquo

intent=navigation

restaurant=ldquolegumerdquo time=ldquotonightrdquo

SLU Model

ldquoi plan to dine in legume tonightrdquo

intent=reservation

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

60

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart ndash Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

61

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

62

[Chen amp Rudnicky SLT 2014 Chen et al ICMI 2015]

Input spoken utterances for making requests about launching an app

Output the apps supporting the required functionality

Intent Identification popular domains in Google Play

please dial a phone call to alex

Skype Hangout etc

Intent Prediction of Mobile Apps [SLTrsquo14c]

Chen and Rudnicky Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings in Proc of SLT 2014

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

63

Input single-turn request

Output apps that are able to support the required functionality

Intent Prediction ndash Single-Turn Request

1

Enriched Semantics

communication

90

1

1

Utterance 1 i would like to contact alex

Word Observation Intended App

hellip hellip

contact message Gmail Outlook Skypeemail

Test

90

Reasoning with Feature-Enriched MF

Train

hellip your email calendar contactshellip

hellip check and send emails msgs hellip

Outlook

Gmail

IR for app candidates

App Desc

Self-Train Utterance

Test Utterance

1

1

1

1

1

1

1

1 1

1

1 90 85 97 95

FeatureEnrichment

Utterance 1 i would like to contact alexhellip

1

1

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

64

Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]

Input multi-turn interaction

Output apps the user plans to launch

Challenge language ambiguity1) User preference2) App-level contexts

Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom

send to vivianvs

Email MessageCommunication

Idea Behavioral patterns in history can help intent prediction

previous turn

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

65

Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]

Input multi-turn interaction

Output apps the user plans to launch

1

Lexical Intended Appphoto check camera IMtell

take this phototell vivian this is me in the lab

CAMERA

IMTrainDialogue

check my grades on websitesend an email to professor

hellip

CHROME

EMAIL

send

Behavior History

null camera

85

take a photo of thissend it to alice

CAMERA

IM

hellip

email

1

1

1 1

1

1 70

chrome

1

1

1

1

1

1

chrome email

11

1

1

95

80 55

User UtteranceIntended

App

Reasoning with Feature-Enriched MF

Test Dialogue

take a photo of thissend it to alicehellip

Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

66

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 261

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 555

LM-Based IR Model (unsupervised)

Multinomial Logistic Regression (supervised)

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

67

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)

Modeling hidden semantics helps intent prediction especially for noisy data

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

68

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 333Word + Type-Embedding-Based Semantics 315 329

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 566

Semantic enrichment provides rich cues to improve performance

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

69

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 342 (+68) 333 333 (-02)Word + Type-Embedding-Based Semantics 315 322 (+21) 329 340 (+34)

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 557 (+33) 566 577 (+19)

Intent prediction can benefit from both hidden information and low-level semantics

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

70

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction Feature-Enriched MF-SLU for

Intent Prediction is able to1) unify the knowledge at

different levels2) learn inference relations

between various features

3) and create personalized models by leveraging contextual behaviors

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

71

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data Back-end Data

Bases Services and Client Signals

DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)

User Experienceldquocall taxirdquo

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

72

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

73

Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

74

Future Work Apply the proposed technology to domain discovery

not covered by the current systems but users are interested in guide the next developed domains

Improve the proposed approach by handling the uncertainty

SLUSLUModelingASR Knowledge

Acquisitionrecognition

errorsunreliable knowledge

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

75

d d d

U S1 S2

P(S1 | U) P(S2 | U)

hellip

Semantic RelationPosterior Probability

Utterance

Slot Candidate

hellip

w1 w2 wdWord Sequence x

Word Vector lw

Pooling Operation

R(U S1) R(U S2)

Knowledge Graph Propagation Matrix Wp

Semantic Projection Matrix Ws

Semantic Layer y

Knowledge Graph Propagation Layer lp

d

Sn

P(Sn | U)

Utterance Vector lf

hellip

R(U Sn)

Slot Vector lf

Convolution Matrix Wc

Convolutional Layer lc

Towards Unsupervised Deep Learning

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning

76

Take Home Message Available big data wo annotations

Challenge how to acquire and organize important knowledge and further utilize it for applications

Language understanding for AI

language action understand voice to control music lights etc teach to let friends in by face recognition etc

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q amp ATHANKS FOR YOUR ATTENTIONS

bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)

bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Page 54: Statistical Learning from Dialogues for Intelligent Assistants

54

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

Approach: Baseline SLU                 ASR    Transcripts
  Support Vector Machine               32.5   36.6
  Multinomial Logistic Regression      34.0   38.8
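As a concrete illustration of the metric, a minimal MAP computation might look like the following sketch (the slot names and rankings here are hypothetical, not taken from the corpus): for each utterance, the slots are ranked by estimated probability, average precision is computed against the reference slots, and the mean is taken over all utterances.

```python
def average_precision(ranked_slots, reference_slots):
    """AP for one utterance: precision at each correctly ranked slot, averaged
    over the number of reference slots."""
    hits, total = 0, 0.0
    for rank, slot in enumerate(ranked_slots, start=1):
        if slot in reference_slots:
            hits += 1
            total += hits / rank
    return total / len(reference_slots) if reference_slots else 0.0

def mean_average_precision(ranked_lists, references):
    """MAP: mean of per-utterance average precisions."""
    aps = [average_precision(r, g) for r, g in zip(ranked_lists, references)]
    return sum(aps) / len(aps)

# Hypothetical example: slots ranked by estimated probability vs. reference slots
ranked = [["food", "area", "pricerange"], ["pricerange", "task"]]
gold = [{"food", "pricerange"}, {"pricerange"}]
print(mean_average_precision(ranked, gold))
```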


55

Experiments of Semantic Decoding: Quality of Semantics Estimation

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

The MF-SLU effectively models implicit information to decode semantics

The structure information further improves the results

Approach                                         ASR            Transcripts

Baseline SLU
  Support Vector Machine                         32.5           36.6
  Multinomial Logistic Regression                34.0           38.8

Proposed MF-SLU
  Feature Model                                  37.6           45.3
  Feature Model + Knowledge Graph Propagation    43.5 (+27.9%)  53.4 (+37.6%)

The result is significantly better than the MLR baseline with p < 0.05 in a t-test
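The low-rank reasoning behind the MF-SLU can be illustrated with a toy sketch. This is not the thesis's BPR-trained model; a deterministic truncated SVD stands in for the learned factorization, and the utterances and slots are invented for illustration. The point is that an unobserved slot can receive a high score because it co-occurs with observed slots elsewhere in the matrix.

```python
import numpy as np

# Toy utterance-by-slot matrix (1 = slot observed in the utterance, 0 = unobserved,
# which does not necessarily mean false). Columns: food, pricerange, area.
M = np.array([[1., 1., 0.],   # "cheap thai food"       -> food, pricerange
              [1., 1., 0.],   # "inexpensive chinese"   -> food, pricerange
              [0., 0., 1.],   # "somewhere in the east" -> area
              [1., 0., 0.]])  # "korean restaurant"     -> food only observed

# Rank-2 truncated SVD as a stand-in for the learned low-rank factorization.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
scores = (U[:, :2] * s[:2]) @ Vt[:2]   # reconstructed slot scores

# For the last utterance, the unobserved "pricerange" slot scores well above the
# unrelated "area" slot, because "pricerange" co-occurs with "food" elsewhere.
print(scores[3])
```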


56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

With the integrated structure information, both semantic and dependency relations are useful for understanding

Approach                                         ASR            Transcripts

Feature Model                                    37.6           45.3

Feature Model + Knowledge Graph Propagation
  Semantic relations                             41.4           51.6
  Dependency relations                           41.6           49.0
  All relations                                  43.5 (+15.7%)  53.4 (+17.9%)

The result is significantly better than the MLR baseline with p < 0.05 in a t-test


Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs

[Figure: the reference ontology with the most frequent syntactic dependencies — induced slots such as locale_by_use, food, expensiveness, pricerange, seeking, desiring, relational_quantity, type, and task/area, connected by dependency relations including AMOD, NN, DOBJ, PREP_FOR, and PREP_IN]

The automatically learned domain ontology aligns well with the reference one

57

The data-driven ontology is more objective, while the expert-annotated one is more subjective

58

Contributions of Semantic Decoding

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeding SLU Modeling (Semantic Decoding, Intent Prediction)]

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge

The MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding


59

Low- and High-Level Understanding

Semantic concepts for individual utterances do not consider high-level semantics (user intents)

The follow-up behaviors usually correspond to user intents

"can i have a cheap restaurant" → SLU Model → price="cheap", target="restaurant"; intent=navigation

"i plan to dine in legume tonight" → SLU Model → restaurant="legume", time="tonight"; intent=reservation


60

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeding SLU Modeling (Semantic Decoding, Intent Prediction)]

SDS Flowchart – Intent Prediction


61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Intent Prediction of Mobile Apps [SLT'14c]

Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality
Intent identification covers popular domains in Google Play

Example: "please dial a phone call to alex" → Skype, Hangout, etc.

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.


63

Intent Prediction – Single-Turn Request

Input: a single-turn request
Output: apps that are able to support the required functionality

[Figure: reasoning with the feature-enriched MF — a matrix whose rows are app descriptions retrieved by IR as candidates (e.g., Gmail: "… check and send emails, msgs …"; Outlook: "… your email, calendar, contacts …"), self-trained utterances, and test utterances (e.g., Utterance 1: "i would like to contact alex"), and whose columns are word observations (contact, email, message, …), enriched semantic features (e.g., communication), and intended apps (Gmail, Outlook, Skype)]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents


64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch
Challenge: language ambiguity, due to 1) user preference and 2) app-level contexts

Example: "send to vivian" (previous turn: camera) could target Email or Message apps (communication)

Idea: behavioral patterns in the interaction history can help intent prediction

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com


65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch

[Figure: reasoning with the feature-enriched MF over dialogues — rows are training dialogues (e.g., "take this photo / tell vivian this is me in the lab" → CAMERA, IM; "check my grades on websites / send an email to professor" → CHROME, EMAIL) and a test dialogue ("take a photo of this / send it to alice" → CAMERA, IM); columns are lexical features (photo, check, camera, tell, send, …), behavior-history features (null, camera, chrome, email), and the intended apps]

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com
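The feature-enriched matrix itself can be sketched as a simple concatenation of feature views. This is a hypothetical minimal construction (the vocabularies, turns, and the `encode` helper are illustrative, not from the paper): each row stacks lexical features, a behavior-history feature (the previously launched app), and the intended-app label, with the label left unobserved for test turns; a low-rank factorization over this matrix then scores candidate apps, as in the earlier sketch.

```python
import numpy as np

# Hypothetical vocabularies for the two feature views and the label set
words = ["take", "photo", "send", "check", "grades", "email"]
history_apps = ["null", "camera", "chrome"]   # app launched at the previous turn
apps = ["CAMERA", "IM", "CHROME", "EMAIL"]

def encode(turn_words, prev_app, intended_app=None):
    """One matrix row: lexical features + behavior-history feature + intended app."""
    row = np.zeros(len(words) + len(history_apps) + len(apps))
    for w in turn_words:
        row[words.index(w)] = 1.0
    row[len(words) + history_apps.index(prev_app)] = 1.0
    if intended_app is not None:
        row[len(words) + len(history_apps) + apps.index(intended_app)] = 1.0
    return row

# Training turns (after the example dialogues) and one test turn with no label
R = np.stack([
    encode(["take", "photo"], "null", "CAMERA"),
    encode(["send"], "camera", "IM"),              # "send" after CAMERA -> IM
    encode(["check", "grades"], "null", "CHROME"),
    encode(["send", "email"], "chrome", "EMAIL"),  # "send" after CHROME -> EMAIL
    encode(["send"], "camera"),                    # test: ambiguous "send it to alice"
])

# Truncated SVD as a stand-in for the learned factorization; with a matrix this
# small the scores are only suggestive, so no ranking is claimed here.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
scores = (U[:, :3] * s[:3]) @ Vt[:3]
app_scores = scores[-1, -len(apps):]   # candidate-app scores for the test turn
```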


66

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix        ASR             Transcripts
                      LM    MF-SLU    LM    MF-SLU
Word Observation      25.1  -         26.1  -

Baseline: LM-based IR model (unsupervised)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix        ASR             Transcripts
                      MLR   MF-SLU    MLR   MF-SLU
Word Observation      52.1  -         55.5  -

Baseline: multinomial logistic regression (supervised)


67

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix        ASR                   Transcripts
                      LM    MF-SLU          LM    MF-SLU
Word Observation      25.1  29.2 (+16.2%)   26.1  30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix        ASR                   Transcripts
                      MLR   MF-SLU          MLR   MF-SLU
Word Observation      52.1  52.7 (+1.2%)    55.5  55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data


68

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                          ASR                   Transcripts
                                        LM    MF-SLU          LM    MF-SLU
Word Observation                        25.1  29.2 (+16.2%)   26.1  30.4 (+16.4%)
Word + Embedding-Based Semantics        32.0  -               33.3  -
Word + Type-Embedding-Based Semantics   31.5  -               32.9  -

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix                          ASR                   Transcripts
                                        MLR   MF-SLU          MLR   MF-SLU
Word Observation                        52.1  52.7 (+1.2%)    55.5  55.4 (-0.2%)
Word + Behavioral Patterns              53.9  -               56.6  -

Semantic enrichment provides rich cues to improve performance


69

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                          ASR                   Transcripts
                                        LM    MF-SLU          LM    MF-SLU
Word Observation                        25.1  29.2 (+16.2%)   26.1  30.4 (+16.4%)
Word + Embedding-Based Semantics        32.0  34.2 (+6.8%)    33.3  33.3 (-0.2%)
Word + Type-Embedding-Based Semantics   31.5  32.2 (+2.1%)    32.9  34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix                          ASR                   Transcripts
                                        MLR   MF-SLU          MLR   MF-SLU
Word Observation                        52.1  52.7 (+1.2%)    55.5  55.4 (-0.2%)
Word + Behavioral Patterns              53.9  55.7 (+3.3%)    56.6  57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics


70

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) feeding SLU Modeling (Semantic Decoding, Intent Prediction)]

Contributions of Intent Prediction

The feature-enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors


71

Personal Intelligent Architecture

Reactive Assistance: ASR, LU, Dialog, LG, TTS
Proactive Assistance: Inferences, User Modeling, Suggestions
Data: back-end databases, services, and client signals
Device/Service End-points: Phone, PC, Xbox, Web Browser, Messaging Apps
User Experience: e.g., "call taxi"


72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


73

Conclusions

The work shows the feasibility of, and the potential for, improving the generalization, maintenance, efficiency, and scalability of SDSs

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:
- better semantic representations for individual utterances
- better high-level intent prediction about follow-up behaviors


74

Future Work

Apply the proposed technology to domain discovery: find domains that are not covered by the current systems but that users are interested in, to guide which domains to develop next

Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition, both of which propagate into SLU modeling


75

Towards Unsupervised Deep Learning

[Figure: a convolutional architecture for semantic decoding — the word sequence x = (w1, w2, …, wd) is mapped to word vectors lw, passed through a convolutional layer lc (convolution matrix Wc) and a pooling operation to form an utterance vector lf; each slot candidate S1, …, Sn has a slot vector lf; a knowledge graph propagation layer lp (propagation matrix Wp) and a semantic projection matrix Ws produce the semantic layer y, yielding semantic relation scores R(U, S1), …, R(U, Sn) and posterior probabilities P(S1 | U), …, P(Sn | U)]

Treating MF as a one-layer neural net, we can add more layers to the model, towards unsupervised deep learning

76

Take Home Message

Big data is available without annotations; the challenge is how to acquire and organize important knowledge, and further utilize it for applications

Language understanding for AI: from language to action, e.g., understanding voice commands to control music, lights, etc., or being taught to let friends in via face recognition

Unsupervised or weakly supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A

THANKS FOR YOUR ATTENTION

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013 (Best Student Paper Award).
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.


  • Semantic Decoding [ACL-IJCNLPrsquo15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding Quality of Semantics Estimatio
  • Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
  • Experiments of Semantic Decoding Effectiveness of Relations
  • Experiments for Structure Learning Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart ndash Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLTrsquo14c]
  • Intent Prediction ndash Single-Turn Request
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q amp A
Page 56: Statistical Learning from Dialogues for Intelligent Assistants

56

Experiments of Semantic Decoding: Effectiveness of Relations

Dataset: Cambridge University SLU Corpus

Metric: MAP of all estimated slot probabilities for all utterances

Approach                               ASR             Transcripts
Feature Model                          37.6            45.3
Feature + KG Propagation (Semantic)    41.4            51.6
Feature + KG Propagation (Dependency)  41.6            49.0
Feature + KG Propagation (All)         43.5 (+15.7%)   53.4 (+17.9%)

In the integrated structure, both semantic and dependency relations are useful for understanding.

The result is significantly better than the MLR baseline (p < 0.05, t-test).
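For concreteness, the MAP metric above can be computed with a short routine; a minimal sketch over hypothetical slot rankings (not the actual Cambridge corpus annotations):

```python
def average_precision(ranked, relevant):
    """AP for one utterance: `ranked` lists slots by estimated P(slot | utterance)."""
    hits, total = 0, 0.0
    for rank, slot in enumerate(ranked, start=1):
        if slot in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(predictions, gold):
    """MAP over all utterances; `predictions` maps utterance id -> ranked slot list."""
    return sum(average_precision(predictions[u], gold[u]) for u in gold) / len(gold)

# toy example with two utterances
preds = {"u1": ["food", "area", "pricerange"], "u2": ["pricerange", "task"]}
gold = {"u1": {"food", "pricerange"}, "u2": {"task"}}
print(round(mean_average_precision(preds, gold), 3))  # 0.667
```
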

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Experiments for Structure Learning: Relation Discovery Analysis

Discover inter-slot relations connecting important slot pairs

[Figure: the reference ontology with the most frequent syntactic dependencies: slots such as locale_by_use, food, expensiveness, seeking, relational_quantity, desiring, type, pricerange, task, and area, connected by dependency relations (AMOD, DOBJ, NN, PREP_FOR, PREP_IN).]

The automatically learned domain ontology aligns well with the reference one

57

The data-driven ontology is more objective, while the expert-annotated one is more subjective.

58

Contributions of Semantic Decoding

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge

MF-SLU for Semantic Decoding is able to:
1) unify the automatically acquired knowledge,
2) adapt to a domain-specific setting,
3) and then allow systems to model implicit semantics for better understanding.


59

Low- and High-Level Understanding: semantic concepts for individual utterances do not consider high-level semantics (user intents).

The follow-up behaviors usually correspond to user intents

Utterance: "can i have a cheap restaurant"
  SLU output: price="cheap", target="restaurant"
  User intent: navigation

Utterance: "i plan to dine in legume tonight"
  SLU output: restaurant="legume", time="tonight"
  User intent: reservation


60

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart – Intent Prediction


61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


62

Intent Prediction of Mobile Apps [SLT'14c]
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app

Output: the apps supporting the required functionality

Intent identification: popular domains in Google Play

"please dial a phone call to alex" → Skype, Hangout, etc.

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.


63

Intent Prediction – Single-Turn Request

Input: a single-turn request

Output: apps that are able to support the required functionality

[Figure: reasoning with feature-enriched MF. Rows are app descriptions retrieved by IR as candidates (Gmail: "... your email, calendar, contacts ...", Outlook: "... check and send emails, msgs ..."), self-train utterances, and test utterances (Utterance 1: "i would like to contact alex"); columns are word observations (contact, email, message, ...), enriched semantic features (e.g., communication), and intended apps (Gmail, Outlook, Skype, ...); MF fills the unobserved cells with scores such as 0.90.]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
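The matrix-completion reasoning above can be sketched in a few lines; a toy illustration with a hypothetical word/app vocabulary (the actual model is trained with Bayesian personalized ranking rather than a plain SVD):

```python
import numpy as np

# columns: word observations followed by intended apps (hypothetical vocabulary)
words = ["contact", "email", "message", "photo"]
apps = ["Gmail", "Outlook", "Skype", "Camera"]
cols = words + apps

# rows: app descriptions / self-train utterances (1 = observed feature or app),
# plus a test utterance ("i would like to contact alex") with words only
rows = [
    {"contact": 1, "email": 1, "Gmail": 1, "Outlook": 1},
    {"message": 1, "Skype": 1},
    {"photo": 1, "Camera": 1},
    {"contact": 1, "Gmail": 1},  # test utterance; the Outlook cell is unobserved
]
M = np.zeros((len(rows), len(cols)))
for i, feats in enumerate(rows):
    for name, value in feats.items():
        M[i, cols.index(name)] = value

# low-rank factorization M ~ U V fills in the unobserved (hidden) cells
U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 2
M_hat = (U[:, :k] * s[:k]) @ Vt[:k]

# scores for the app columns of the last (test) utterance
scores = {app: M_hat[-1, cols.index(app)] for app in apps}
```

Note how Outlook receives a positive score for the test utterance even though its cell was never observed: the factorization transfers the contact/email co-occurrence pattern from the training rows, which is the hidden-semantics effect described here.
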


64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

Challenge: language ambiguity, addressed with 1) user preference and 2) app-level contexts

Example: "send to vivian" → Email v.s. Message (both Communication)

Idea: behavioral patterns in the history (the previous turn) can help intent prediction

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com


65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

[Figure: reasoning with feature-enriched MF. Rows are turns of training dialogues ("take this photo" → CAMERA; "tell vivian this is me in the lab" → IM; "check my grades on website" → CHROME; "send an email to professor" → EMAIL) and of a test dialogue ("take a photo of this", "send it to alice"); columns combine lexical features (photo, check, camera, tell, send, ...), behavior-history features (null, camera, chrome, email), and the intended app (CAMERA, IM, CHROME, EMAIL); MF fills the unobserved cells with scores such as 0.85 and 0.95.]

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
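One way to read the figure is that behavior-history features are simply extra columns of the same matrix; a minimal sketch with a hypothetical feature layout:

```python
# hypothetical column layout: lexical words + previous-app history + intended apps
words = ["take", "photo", "send", "check", "grades", "email"]
history = ["prev=null", "prev=camera", "prev=chrome"]
apps = ["CAMERA", "IM", "CHROME", "EMAIL"]
cols = words + history + apps

def featurize(turn_words, prev_app, intended=None):
    """Build one row of the feature-enriched matrix for a dialogue turn."""
    feats = {"prev=%s" % prev_app: 1}
    for w in turn_words:
        feats[w] = 1
    if intended is not None:  # the intended app is observed only for training turns
        feats[intended] = 1
    return [feats.get(c, 0) for c in cols]

# "send it to alice" after a camera turn: the history column helps point to IM
row = featurize(["send", "it", "to", "alice"], "camera", intended="IM")
```
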


66

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):

Feature Matrix      ASR: LM / MF-SLU    Transcripts: LM / MF-SLU
Word Observation    25.1 / -            26.1 / -

(LM: LM-based IR model, unsupervised)

Multi-Turn Interaction, Mean Average Precision (MAP):

Feature Matrix      ASR: MLR / MF-SLU   Transcripts: MLR / MF-SLU
Word Observation    52.1 / -            55.5 / -

(MLR: multinomial logistic regression, supervised)
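The unsupervised LM-based IR baseline scores each app by the likelihood of the utterance under a smoothed unigram language model of its description; a minimal sketch with hypothetical app descriptions (Dirichlet smoothing is an assumption here, not a detail stated on the slide):

```python
import math
from collections import Counter

# hypothetical app descriptions standing in for Google Play text
docs = {
    "Gmail": "check and send emails from your email account",
    "Camera": "take photos and record videos",
}
background = Counter(" ".join(docs.values()).split())
bg_total = sum(background.values())

def query_likelihood(query, doc, mu=10.0):
    """Dirichlet-smoothed log P(query | doc) under a unigram LM."""
    tf = Counter(doc.split())
    dlen = len(doc.split())
    score = 0.0
    for w in query.split():
        p = (tf[w] + mu * background[w] / bg_total) / (dlen + mu)
        if p > 0:  # skip words unseen in the whole collection
            score += math.log(p)
    return score

ranked = sorted(docs, key=lambda a: query_likelihood("send an email", docs[a]),
                reverse=True)
# the description mentioning "send" and "email" ranks first
```
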


67

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):

Feature Matrix      ASR: LM / MF-SLU        Transcripts: LM / MF-SLU
Word Observation    25.1 / 29.2 (+16.2%)    26.1 / 30.4 (+16.4%)

Multi-Turn Interaction, Mean Average Precision (MAP):

Feature Matrix      ASR: MLR / MF-SLU       Transcripts: MLR / MF-SLU
Word Observation    52.1 / 52.7 (+1.2%)     55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.


68

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):

Feature Matrix                          ASR: LM / MF-SLU        Transcripts: LM / MF-SLU
Word Observation                        25.1 / 29.2 (+16.2%)    26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics        32.0 / -                33.3 / -
Word + Type-Embedding-Based Semantics   31.5 / -                32.9 / -

Multi-Turn Interaction, Mean Average Precision (MAP):

Feature Matrix               ASR: MLR / MF-SLU       Transcripts: MLR / MF-SLU
Word Observation             52.1 / 52.7 (+1.2%)     55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns   53.9 / -                56.6 / -

Semantic enrichment provides rich cues to improve performance.


69

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):

Feature Matrix                          ASR: LM / MF-SLU        Transcripts: LM / MF-SLU
Word Observation                        25.1 / 29.2 (+16.2%)    26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics        32.0 / 34.2 (+6.8%)     33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics   31.5 / 32.2 (+2.1%)     32.9 / 34.0 (+3.4%)

Multi-Turn Interaction, Mean Average Precision (MAP):

Feature Matrix               ASR: MLR / MF-SLU       Transcripts: MLR / MF-SLU
Word Observation             52.1 / 52.7 (+1.2%)     55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns   53.9 / 55.7 (+3.3%)     56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.


70

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction

Feature-Enriched MF-SLU for Intent Prediction is able to:
1) unify the knowledge at different levels,
2) learn inference relations between various features,
3) and create personalized models by leveraging contextual behaviors.


71

Personal Intelligent Architecture

[Figure: Reactive Assistance (ASR → LU → Dialog → LG → TTS) and Proactive Assistance (Inferences, User Modeling, Suggestions), built on Data Bases, Back-end Services, and Client Signals; device/service end-points include Phone, PC, Xbox, Web Browser, and Messaging Apps. User experience example: "call taxi".]


72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


73

Conclusions

The work shows the feasibility of, and the potential for, improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.


74

Future Work

Apply the proposed technology to domain discovery: find domains that current systems do not cover but users are interested in, to guide which domains to develop next.

Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition, both of which propagate into SLU modeling.


75

Towards Unsupervised Deep Learning

[Figure: the MF model drawn as a neural network. A word sequence x = w1 w2 ... wd is embedded into word vectors lw; a convolutional layer lc (convolution matrix Wc) with a pooling operation produces utterance and slot vectors lf; a knowledge graph propagation layer lp (propagation matrix Wp) and a semantic projection matrix Ws then map these to the semantic layer y, yielding semantic relation scores R(U, S1), ..., R(U, Sn) and posterior probabilities P(S1 | U), ..., P(Sn | U) over slot candidates S1, ..., Sn for utterance U.]

Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.

76
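That idea can be written down directly; a minimal numpy sketch in which the shapes, the extra weight matrix, and the tanh nonlinearity are illustrative assumptions rather than the exact model above:

```python
import numpy as np

rng = np.random.default_rng(0)
n_utts, n_slots, k = 100, 20, 8

# MF as a one-layer net: score(u, s) is a bilinear product of latent vectors
U = rng.normal(size=(n_utts, k))   # utterance embeddings
V = rng.normal(size=(n_slots, k))  # slot embeddings
scores_mf = U @ V.T                # standard matrix factorization prediction

# adding a hidden layer with a nonlinearity turns the bilinear model
# into a deeper, still label-free scoring network
W = rng.normal(size=(k, k))
scores_deep = np.tanh(U @ W) @ V.T
```
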

Take Home Message

Available: big data w/o annotations. Challenge: how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI: language → action, e.g., understand voice commands to control music, lights, etc., or teach the system to let friends in via face recognition.


Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A. Thanks for your attention!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)

• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.


Page 57: Statistical Learning from Dialogues for Intelligent Assistants

Experiments for Structure LearningRelation Discovery Analysis

Discover inter-slot relations connecting important slot pairs

The reference ontology with the most frequent syntactic dependencies

locale_by_use

food expensiveness

seeking

relational_quantity

PREP_FOR

PREP_FOR

NN AMOD

AMOD

AMODdesiring

DOBJ

type

food pricerange

DOBJ

AMOD AMOD

AMOD

taskarea

PREP_IN

The automatically learned domain ontology aligns well with the reference one

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS 57

The data-driven one is more objective while expert-annotated one is more subjective

58

Contributions of Semantic Decoding

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Ontology Induction and Structure Learning enable systems to automatically acquire open domain knowledge

MF-SLU for Semantic Decoding is able to1) unify the automatically

acquired knowledge2) adapt to a domain-

specific setting 3) and then allows

systems to model implicit semantics for better understanding

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

59

Low- and High-Level Understanding Semantic concepts for individual utterances do not consider high-level semantics (user intents)

The follow-up behaviors usually correspond to user intents

price=ldquocheaprdquo target=ldquorestaurantrdquo

SLU Model

ldquocan i have a cheap restaurantrdquo

intent=navigation

restaurant=ldquolegumerdquo time=ldquotonightrdquo

SLU Model

ldquoi plan to dine in legume tonightrdquo

intent=reservation

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

60

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart ndash Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

61

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

62

[Chen amp Rudnicky SLT 2014 Chen et al ICMI 2015]

Input spoken utterances for making requests about launching an app

Output the apps supporting the required functionality

Intent Identification popular domains in Google Play

please dial a phone call to alex

Skype Hangout etc

Intent Prediction of Mobile Apps [SLTrsquo14c]

Chen and Rudnicky Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings in Proc of SLT 2014

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

63

Input single-turn request

Output apps that are able to support the required functionality

Intent Prediction ndash Single-Turn Request

1

Enriched Semantics

communication

90

1

1

Utterance 1 i would like to contact alex

Word Observation Intended App

hellip hellip

contact message Gmail Outlook Skypeemail

Test

90

Reasoning with Feature-Enriched MF

Train

hellip your email calendar contactshellip

hellip check and send emails msgs hellip

Outlook

Gmail

IR for app candidates

App Desc

Self-Train Utterance

Test Utterance

1

1

1

1

1

1

1

1 1

1

1 90 85 97 95

FeatureEnrichment

Utterance 1 i would like to contact alexhellip

1

1

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

64

Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]

Input multi-turn interaction

Output apps the user plans to launch

Challenge language ambiguity1) User preference2) App-level contexts

Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom

send to vivianvs

Email MessageCommunication

Idea Behavioral patterns in history can help intent prediction

previous turn

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

65

Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]

Input multi-turn interaction

Output apps the user plans to launch

1

Lexical Intended Appphoto check camera IMtell

take this phototell vivian this is me in the lab

CAMERA

IMTrainDialogue

check my grades on websitesend an email to professor

hellip

CHROME

EMAIL

send

Behavior History

null camera

85

take a photo of thissend it to alice

CAMERA

IM

hellip

email

1

1

1 1

1

1 70

chrome

1

1

1

1

1

1

chrome email

11

1

1

95

80 55

User UtteranceIntended

App

Reasoning with Feature-Enriched MF

Test Dialogue

take a photo of thissend it to alicehellip

Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

66

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 261

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 555

LM-Based IR Model (unsupervised)

Multinomial Logistic Regression (supervised)

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

67

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)

Modeling hidden semantics helps intent prediction especially for noisy data

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

68

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 333Word + Type-Embedding-Based Semantics 315 329

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 566

Semantic enrichment provides rich cues to improve performance

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

69

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 342 (+68) 333 333 (-02)Word + Type-Embedding-Based Semantics 315 322 (+21) 329 340 (+34)

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 557 (+33) 566 577 (+19)

Intent prediction can benefit from both hidden information and low-level semantics

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

70

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction Feature-Enriched MF-SLU for

Intent Prediction is able to1) unify the knowledge at

different levels2) learn inference relations

between various features

3) and create personalized models by leveraging contextual behaviors

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

71

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data Back-end Data

Bases Services and Client Signals

DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)

User Experienceldquocall taxirdquo

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

72

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

73

Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

74

Future Work Apply the proposed technology to domain discovery

not covered by the current systems but users are interested in guide the next developed domains

Improve the proposed approach by handling the uncertainty

SLUSLUModelingASR Knowledge

Acquisitionrecognition

errorsunreliable knowledge

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

75

d d d

U S1 S2

P(S1 | U) P(S2 | U)

hellip

Semantic RelationPosterior Probability

Utterance

Slot Candidate

hellip

w1 w2 wdWord Sequence x

Word Vector lw

Pooling Operation

R(U S1) R(U S2)

Knowledge Graph Propagation Matrix Wp

Semantic Projection Matrix Ws

Semantic Layer y

Knowledge Graph Propagation Layer lp

d

Sn

P(Sn | U)

Utterance Vector lf

hellip

R(U Sn)

Slot Vector lf

Convolution Matrix Wc

Convolutional Layer lc

Towards Unsupervised Deep Learning

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning

76

Take Home Message Available big data wo annotations

Challenge how to acquire and organize important knowledge and further utilize it for applications

Language understanding for AI

language action understand voice to control music lights etc teach to let friends in by face recognition etc

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q amp ATHANKS FOR YOUR ATTENTIONS

bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)

bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

  • Statistical Learning from Dialogues for Intelligent Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants?
  • Why do we need them?
  • Why do we need them? (2)
  • Why do companies care?
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymax's intelligence?
  • SDS Architecture
  • Interaction Example
  • SDS Process – Available Domain Ontology
  • SDS Process – Available Domain Ontology (2)
  • SDS Process – Available Domain Ontology (3)
  • SDS Process – Spoken Language Understanding (SLU)
  • SDS Process – Spoken Language Understanding (SLU) (2)
  • SDS Process – Dialogue Management (DM)
  • SDS Process – Dialogue Management (DM) (2)
  • SDS Process – Dialogue Management (DM) (3)
  • SDS Process – Dialogue Management (DM) (4)
  • SDS Process – Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture – Contributions
  • SDS Flowchart
  • SDS Flowchart – Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLP'15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRU'13, SLT'14a]
  • Ontology Induction [ASRU'13, SLT'14a] (2)
  • 1st Issue: How to adapt generic slots to a domain-specific sett…
  • Semantic Decoding [ACL-IJCNLP'15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement: Slot/Word Embeddings Training (Levy and…
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLP'15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue: How to model the unobserved hidden semantics? Matrix…
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLP'15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding: Quality of Semantics Estimatio…
  • Experiments of Semantic Decoding: Quality of Semantics Estimatio… (2)
  • Experiments of Semantic Decoding: Effectiveness of Relations
  • Experiments for Structure Learning: Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart – Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLT'14c]
  • Intent Prediction – Single-Turn Request
  • Intent Prediction – Multi-Turn Interaction [ICMI'15]
  • Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q & A
Page 58: Statistical Learning from Dialogues for Intelligent Assistants

58

Contributions of Semantic Decoding

[Diagram: Knowledge Acquisition (Ontology Induction, Structure Learning) feeding SLU Modeling (Semantic Decoding, Intent Prediction)]

Ontology Induction and Structure Learning enable systems to automatically acquire open-domain knowledge.

MF-SLU for Semantic Decoding is able to 1) unify the automatically acquired knowledge, 2) adapt to a domain-specific setting, and 3) allow systems to model implicit semantics for better understanding.

59

Low- and High-Level Understanding

Semantic concepts for individual utterances do not consider high-level semantics (user intents). The follow-up behaviors usually correspond to user intents.

"can i have a cheap restaurant" → SLU Model → price="cheap", target="restaurant"; intent=navigation

"i plan to dine in legume tonight" → SLU Model → restaurant="legume", time="tonight"; intent=reservation

60

SDS Flowchart – Intent Prediction

[Flowchart: Knowledge Acquisition (Ontology Induction, Structure Learning) → SLU Modeling (Semantic Decoding, Intent Prediction), with Intent Prediction highlighted]

61

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

62

Intent Prediction of Mobile Apps [SLT'14c]
[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app
Output: the apps supporting the required functionality

Intent identification covers popular domains in Google Play, e.g., "please dial a phone call to alex" → Skype, Hangout, etc.

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.

63

Intent Prediction – Single-Turn Request

Input: single-turn request
Output: apps that are able to support the required functionality

[Figure: feature-enriched matrix. Rows are app descriptions retrieved by IR (e.g., Gmail: "... check and send emails, msgs ...", Outlook: "... your email, calendar, contacts ..."), self-train utterances, and test utterances; columns are word observations (contact, email, message, ...), enriched semantic features (communication, ...), and intended apps (Gmail, Outlook, Skype, ...). For the test utterance "i would like to contact alex", reasoning with feature-enriched MF fills in scores for the unobserved intended-app cells.]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.
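To make the "reasoning with feature-enriched MF" step concrete, here is a minimal sketch that completes a tiny utterance-by-feature matrix with weighted low-rank factorization and reads off intended-app scores for the test utterance. The vocabulary, apps, weights, and squared-loss training loop are all illustrative assumptions; the thesis model is trained with Bayesian Personalized Ranking rather than this objective.

```python
import numpy as np

# Toy feature matrix: rows are two retrieved app descriptions and one test
# utterance; columns are word/semantic features followed by intended apps.
# All names, weights, and hyperparameters here are illustrative assumptions.
cols = ["contact", "email", "msgs", "communication",   # features
        "Gmail", "Outlook", "Skype"]                    # intended apps
F = np.array([
    [0, 1, 1, 1, 1, 0, 0],  # Gmail desc: "... check and send emails, msgs ..."
    [1, 1, 0, 1, 0, 1, 0],  # Outlook desc: "... your email, calendar, contacts ..."
    [1, 0, 0, 1, 0, 0, 0],  # test: "i would like to contact alex" (apps unknown)
], dtype=float)

rng = np.random.default_rng(0)
k = 2                                              # latent dimension
U = rng.normal(scale=0.1, size=(F.shape[0], k))    # row factors
V = rng.normal(scale=0.1, size=(F.shape[1], k))    # column factors

# Weighted completion: observed 1s count fully, while unobserved cells are
# only weak negatives (they might be true but hidden), so they get small weight.
W = np.where(F > 0, 1.0, 0.1)
for _ in range(500):
    E = W * (F - U @ V.T)                          # weighted residual
    U += 0.1 * (E @ V - 0.01 * U)
    V += 0.1 * (E.T @ U - 0.01 * V)

scores = (U @ V.T)[2, 4:]                          # test row's app scores
print(dict(zip(cols[4:], scores.round(2))))
```

The low weight on zeros is what lets the factorization hypothesize hidden intents instead of treating every unobserved cell as a hard negative.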

64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch

Challenge: language ambiguity. 1) User preference; 2) app-level contexts. For example, "send to vivian" could be handled by an Email or a Message/Communication app; the previous turn helps disambiguate.

Idea: behavioral patterns in history can help intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch

[Figure: feature-enriched matrix whose columns are lexical features (photo, check, tell, send, email, ...), behavior-history features (null, camera, chrome, email, ...), and intended apps. Training dialogues: "take this photo / tell vivian this is me in the lab" (CAMERA → IM) and "check my grades on website / send an email to professor" (CHROME → EMAIL). For the test dialogue "take a photo of this / send it to alice" (CAMERA → IM), reasoning with feature-enriched MF scores the intended-app cells.]

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.
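One way to picture how a row of that matrix is built: the current turn contributes lexical features, and the app launched in the previous turn contributes a behavior-history feature. The helper below is a hypothetical illustration; the vocabulary and app names are made up, not taken from the corpus.

```python
# Minimal featurization of one multi-turn datapoint: the current turn's
# words plus the previously launched app as a behavior-history feature.
def featurize(utterance, prev_app, lexicon):
    row = {f"w:{w}": 1 for w in utterance.split() if w in lexicon}
    row[f"hist:{(prev_app or 'null').lower()}"] = 1
    return row

lexicon = {"take", "photo", "send", "tell", "check", "email"}

turn1 = featurize("take a photo of this", None, lexicon)
turn2 = featurize("send it to alice", "CAMERA", lexicon)
print(turn1)  # {'w:take': 1, 'w:photo': 1, 'hist:null': 1}
print(turn2)  # {'w:send': 1, 'hist:camera': 1}
```

Note how the second turn's words alone are ambiguous ("send" fits Email or IM), while the history feature `hist:camera` carries the disambiguating context.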

66

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):

  Feature Matrix     | ASR: LM | ASR: MF-SLU | Transcripts: LM | Transcripts: MF-SLU
  Word Observation   | 25.1    | -           | 26.1            | -

LM = LM-based IR model (unsupervised)

Multi-Turn Interaction, Mean Average Precision (MAP):

  Feature Matrix     | ASR: MLR | ASR: MF-SLU | Transcripts: MLR | Transcripts: MF-SLU
  Word Observation   | 52.1     | -           | 55.5             | -

MLR = multinomial logistic regression (supervised)

67

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):

  Feature Matrix     | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
  Word Observation   | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)

Multi-Turn Interaction, Mean Average Precision (MAP):

  Feature Matrix     | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
  Word Observation   | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.

68

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):

  Feature Matrix                          | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
  Word Observation                        | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)
  Word + Embedding-Based Semantics        | 32.0    | -             | 33.3            | -
  Word + Type-Embedding-Based Semantics   | 31.5    | -             | 32.9            | -

Multi-Turn Interaction, Mean Average Precision (MAP):

  Feature Matrix               | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
  Word Observation             | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)
  Word + Behavioral Patterns   | 53.9     | -            | 56.6             | -

Semantic enrichment provides rich cues to improve performance.

69

Experiments for Intent Prediction

Single-Turn Request, Mean Average Precision (MAP):

  Feature Matrix                          | ASR: LM | ASR: MF-SLU   | Transcripts: LM | Transcripts: MF-SLU
  Word Observation                        | 25.1    | 29.2 (+16.2%) | 26.1            | 30.4 (+16.4%)
  Word + Embedding-Based Semantics        | 32.0    | 34.2 (+6.8%)  | 33.3            | 33.3 (-0.2%)
  Word + Type-Embedding-Based Semantics   | 31.5    | 32.2 (+2.1%)  | 32.9            | 34.0 (+3.4%)

Multi-Turn Interaction, Mean Average Precision (MAP):

  Feature Matrix               | ASR: MLR | ASR: MF-SLU  | Transcripts: MLR | Transcripts: MF-SLU
  Word Observation             | 52.1     | 52.7 (+1.2%) | 55.5             | 55.4 (-0.2%)
  Word + Behavioral Patterns   | 53.9     | 55.7 (+3.3%) | 56.6             | 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.
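All of the tables above report mean average precision over ranked app predictions. As a reminder of how that metric is computed, here is a short sketch; the two example queries are made up for illustration.

```python
# Mean Average Precision (MAP) over ranked app predictions.
def average_precision(ranked, relevant):
    hits, total = 0, 0.0
    for rank, app in enumerate(ranked, start=1):
        if app in relevant:
            hits += 1
            total += hits / rank      # precision at each relevant rank
    return total / len(relevant) if relevant else 0.0

queries = [
    (["Skype", "Gmail", "Hangout"], {"Skype", "Hangout"}),  # AP = (1/1 + 2/3) / 2
    (["Chrome", "Email"], {"Email"}),                       # AP = 1/2
]
mean_ap = sum(average_precision(r, rel) for r, rel in queries) / len(queries)
print(round(mean_ap, 3))  # 0.667
```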

70

Contributions of Intent Prediction

[Diagram: Knowledge Acquisition (Ontology Induction, Structure Learning) feeding SLU Modeling (Semantic Decoding, Intent Prediction)]

Feature-Enriched MF-SLU for Intent Prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.

71

Personal Intelligent Architecture

[Diagram: User Experience ("call taxi") connects through Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps) to Reactive Assistance (ASR, LU, Dialog, LG, TTS) and Proactive Assistance (Inferences, User Modeling, Suggestions), backed by Data Bases, Back-end Services, and Client Signals.]

72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work

73

Conclusions

The work shows the feasibility and the potential of improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.

74

Future Work

Apply the proposed technology to domain discovery: domains not covered by the current systems but that users are interested in can guide which domains to develop next.

Improve the proposed approach by handling uncertainty: recognition errors from ASR and unreliable knowledge from knowledge acquisition, both of which affect SLU modeling.

75

Towards Unsupervised Deep Learning

[Figure: network architecture. A word sequence x = (w1, w2, ..., wd) is embedded into word vectors l_w, passed through a convolutional layer l_c (convolution matrix Wc) and a pooling operation to form an utterance vector l_f; slot candidates S1, ..., Sn have slot vectors l_f. A knowledge graph propagation layer l_p (propagation matrix Wp) and a semantic layer y (semantic projection matrix Ws) produce relation scores R(U, S1), ..., R(U, Sn) and posterior probabilities P(S1 | U), ..., P(Sn | U).]

Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.
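The following toy sketch contrasts that one-layer (MF-style) scorer with a two-layer variant. All shapes, data, and random weights are illustrative assumptions, not the thesis model.

```python
import numpy as np

# MF as a one-layer net vs. a deeper variant (illustrative shapes only).
rng = np.random.default_rng(0)
x = rng.random(20)             # bag-of-features vector for one utterance
V = rng.normal(size=(7, 8))    # embeddings of 7 slot/app columns

# One layer: scores = V (W1 x) is exactly a low-rank (MF-style) product.
W1 = rng.normal(size=(8, 20))
shallow_scores = V @ (W1 @ x)

# Adding a nonlinearity and a second weight matrix gives a deeper encoder
# that can still be trained with the same unsupervised ranking objective.
W2 = rng.normal(size=(8, 8))
deep_scores = V @ np.tanh(W2 @ np.tanh(W1 @ x))

print(shallow_scores.shape, deep_scores.shape)  # (7,) (7,)
```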

76

Take Home Message

Available: big data w/o annotations. Challenge: how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI maps language to action: understand voice to control music, lights, etc.; teach the system to let friends in by face recognition, etc.

Unsupervised or weakly-supervised methods will be the future trend.

Deep language understanding is an emerging field.

77

Q & A. Thanks for your attention!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.

  • Statistical Learning from Dialogues for Intelligence Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants
  • Why do we need them
  • Why do we need them (2)
  • Why do companies care
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymaxrsquos intelligence
  • SDS Architecture
  • Interaction Example
  • SDS Process ndash Available Domain Ontology
  • SDS Process ndash Available Domain Ontology (2)
  • SDS Process ndash Available Domain Ontology (3)
  • SDS Process ndash Spoken Language Understanding (SLU)
  • SDS Process ndash Spoken Language Understanding (SLU) (2)
  • SDS Process ndash Dialogue Management (DM)
  • SDS Process ndash Dialogue Management (DM) (2)
  • SDS Process ndash Dialogue Management (DM) (3)
  • SDS Process ndash Dialogue Management (DM) (4)
  • SDS Process ndash Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture ndash Contributions
  • SDS Flowchart
  • SDS Flowchart ndash Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLPrsquo15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a]
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
  • 1st Issue How to adapt generic slots to a domain-specific sett
  • Semantic Decoding [ACL-IJCNLPrsquo15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement SlotWord Embeddings Training (Levy and
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLPrsquo15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue How to model the unobserved hidden semantics Matrix
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLPrsquo15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding Quality of Semantics Estimatio
  • Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
  • Experiments of Semantic Decoding Effectiveness of Relations
  • Experiments for Structure Learning Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart ndash Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLTrsquo14c]
  • Intent Prediction ndash Single-Turn Request
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q amp A
Page 59: Statistical Learning from Dialogues for Intelligent Assistants

59

Low- and High-Level Understanding Semantic concepts for individual utterances do not consider high-level semantics (user intents)

The follow-up behaviors usually correspond to user intents

price=ldquocheaprdquo target=ldquorestaurantrdquo

SLU Model

ldquocan i have a cheap restaurantrdquo

intent=navigation

restaurant=ldquolegumerdquo time=ldquotonightrdquo

SLU Model

ldquoi plan to dine in legume tonightrdquo

intent=reservation

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

60

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart ndash Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

61

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

62

[Chen amp Rudnicky SLT 2014 Chen et al ICMI 2015]

Input spoken utterances for making requests about launching an app

Output the apps supporting the required functionality

Intent Identification popular domains in Google Play

please dial a phone call to alex

Skype Hangout etc

Intent Prediction of Mobile Apps [SLTrsquo14c]

Chen and Rudnicky Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings in Proc of SLT 2014

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

63

Input single-turn request

Output apps that are able to support the required functionality

Intent Prediction ndash Single-Turn Request

1

Enriched Semantics

communication

90

1

1

Utterance 1 i would like to contact alex

Word Observation Intended App

hellip hellip

contact message Gmail Outlook Skypeemail

Test

90

Reasoning with Feature-Enriched MF

Train

hellip your email calendar contactshellip

hellip check and send emails msgs hellip

Outlook

Gmail

IR for app candidates

App Desc

Self-Train Utterance

Test Utterance

1

1

1

1

1

1

1

1 1

1

1 90 85 97 95

FeatureEnrichment

Utterance 1 i would like to contact alexhellip

1

1

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

64

Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]

Input multi-turn interaction

Output apps the user plans to launch

Challenge language ambiguity1) User preference2) App-level contexts

Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom

send to vivianvs

Email MessageCommunication

Idea Behavioral patterns in history can help intent prediction

previous turn

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

65

Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]

Input multi-turn interaction

Output apps the user plans to launch

1

Lexical Intended Appphoto check camera IMtell

take this phototell vivian this is me in the lab

CAMERA

IMTrainDialogue

check my grades on websitesend an email to professor

hellip

CHROME

EMAIL

send

Behavior History

null camera

85

take a photo of thissend it to alice

CAMERA

IM

hellip

email

1

1

1 1

1

1 70

chrome

1

1

1

1

1

1

chrome email

11

1

1

95

80 55

User UtteranceIntended

App

Reasoning with Feature-Enriched MF

Test Dialogue

take a photo of thissend it to alicehellip

Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

66

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 261

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 555

LM-Based IR Model (unsupervised)

Multinomial Logistic Regression (supervised)

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

67

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)

Modeling hidden semantics helps intent prediction especially for noisy data

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

68

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 333Word + Type-Embedding-Based Semantics 315 329

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 566

Semantic enrichment provides rich cues to improve performance

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

69

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 342 (+68) 333 333 (-02)Word + Type-Embedding-Based Semantics 315 322 (+21) 329 340 (+34)

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 557 (+33) 566 577 (+19)

Intent prediction can benefit from both hidden information and low-level semantics

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

70

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction Feature-Enriched MF-SLU for

Intent Prediction is able to1) unify the knowledge at

different levels2) learn inference relations

between various features

3) and create personalized models by leveraging contextual behaviors

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

71

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data Back-end Data

Bases Services and Client Signals

DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)

User Experienceldquocall taxirdquo

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

72

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

73

Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

74

Future Work Apply the proposed technology to domain discovery

not covered by the current systems but users are interested in guide the next developed domains

Improve the proposed approach by handling the uncertainty

SLUSLUModelingASR Knowledge

Acquisitionrecognition

errorsunreliable knowledge

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

75

d d d

U S1 S2

P(S1 | U) P(S2 | U)

hellip

Semantic RelationPosterior Probability

Utterance

Slot Candidate

hellip

w1 w2 wdWord Sequence x

Word Vector lw

Pooling Operation

R(U S1) R(U S2)

Knowledge Graph Propagation Matrix Wp

Semantic Projection Matrix Ws

Semantic Layer y

Knowledge Graph Propagation Layer lp

d

Sn

P(Sn | U)

Utterance Vector lf

hellip

R(U Sn)

Slot Vector lf

Convolution Matrix Wc

Convolutional Layer lc

Towards Unsupervised Deep Learning

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning

76

Take Home Message Available big data wo annotations

Challenge how to acquire and organize important knowledge and further utilize it for applications

Language understanding for AI

language action understand voice to control music lights etc teach to let friends in by face recognition etc

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q amp ATHANKS FOR YOUR ATTENTIONS

bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)

bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

  • Statistical Learning from Dialogues for Intelligence Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants
  • Why do we need them
  • Why do we need them (2)
  • Why do companies care
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymaxrsquos intelligence
  • SDS Architecture
  • Interaction Example
  • SDS Process ndash Available Domain Ontology
  • SDS Process ndash Available Domain Ontology (2)
  • SDS Process ndash Available Domain Ontology (3)
  • SDS Process ndash Spoken Language Understanding (SLU)
  • SDS Process ndash Spoken Language Understanding (SLU) (2)
  • SDS Process ndash Dialogue Management (DM)
  • SDS Process ndash Dialogue Management (DM) (2)
  • SDS Process ndash Dialogue Management (DM) (3)
  • SDS Process ndash Dialogue Management (DM) (4)
  • SDS Process ndash Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture ndash Contributions
  • SDS Flowchart
  • SDS Flowchart ndash Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLPrsquo15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a]
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
  • 1st Issue How to adapt generic slots to a domain-specific sett
  • Semantic Decoding [ACL-IJCNLPrsquo15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement SlotWord Embeddings Training (Levy and
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLPrsquo15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue How to model the unobserved hidden semantics Matrix
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLPrsquo15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding Quality of Semantics Estimatio
  • Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
  • Experiments of Semantic Decoding Effectiveness of Relations
  • Experiments for Structure Learning Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart ndash Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLTrsquo14c]
  • Intent Prediction ndash Single-Turn Request
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q amp A
Page 60: Statistical Learning from Dialogues for Intelligent Assistants

60

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

SDS Flowchart ndash Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

61

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work


62

[Chen & Rudnicky, SLT 2014; Chen et al., ICMI 2015]

Input: spoken utterances making requests about launching an app

Output: the apps supporting the required functionality

Intent Identification: popular domains in Google Play

"please dial a phone call to alex" → Skype, Hangout, etc.

Intent Prediction of Mobile Apps [SLT'14c]

Chen and Rudnicky, "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.


63

Input: single-turn request

Output: apps that are able to support the required functionality

Intent Prediction – Single-Turn Request

[Figure: feature-enriched matrix for single-turn requests. IR first retrieves app candidates from app descriptions (e.g. Gmail: "… check and send emails, msgs …"; Outlook: "… your email, calendar, contacts …"); these descriptions, self-train utterances, and the test utterance "i would like to contact alex" form the rows. Columns span word observations (contact, email, message, …), enriched semantic features (e.g. "communication"), and intended apps (Gmail, Outlook, Skype, …). Observed train cells are 1; at test time, reasoning with feature-enriched MF yields graded scores (e.g. 0.90, 0.85, 0.97, 0.95).]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents
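As a rough sketch of how such a matrix can be completed (the toy data, shapes, and hyperparameters here are illustrative, not the exact model from the paper), Bayesian Personalized Ranking can factorize the binary feature matrix so that unobserved cells receive graded scores:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy feature matrix: rows = utterances / app descriptions,
# columns = word features + intended apps (1 = observed, 0 = unknown).
M = np.array([
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 1, 1, 0],
], dtype=float)

n_rows, n_cols = M.shape
k = 4                                        # latent dimension (illustrative)
P = 0.1 * rng.standard_normal((n_rows, k))   # row factors
Q = 0.1 * rng.standard_normal((n_cols, k))   # column factors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# BPR: for an observed cell (r, pos) and a sampled unobserved cell (r, neg),
# push score(r, pos) above score(r, neg) instead of fitting cells directly.
lr, reg = 0.05, 0.01
for _ in range(2000):
    r = rng.integers(n_rows)
    pos = rng.choice(np.flatnonzero(M[r] == 1))
    neg = rng.choice(np.flatnonzero(M[r] == 0))
    p_r = P[r].copy()
    g = sigmoid(-(p_r @ (Q[pos] - Q[neg])))  # gradient of -log sigmoid(x)
    P[r] += lr * (g * (Q[pos] - Q[neg]) - reg * p_r)
    Q[pos] += lr * (g * p_r - reg * Q[pos])
    Q[neg] += lr * (-g * p_r - reg * Q[neg])

scores = P @ Q.T   # dense scores: hidden semantics get graded values
print(scores.shape)
```

The pairwise objective is what lets the model rank never-observed intended apps above unrelated ones, which is the behavior the matrix figure illustrates.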


64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction

Output: apps the user plans to launch

Challenge: language ambiguity. 1) User preference 2) App-level contexts

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

"send to vivian" → Email vs. Message (Communication)?

Idea: behavioral patterns in the history (previous turns) can help intent prediction.


65

Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)

Input: multi-turn interaction

Output: apps the user plans to launch

[Figure: feature-enriched matrix for multi-turn interactions. Train dialogues pair user utterances with intended apps (e.g. "take this photo" → CAMERA, "tell vivian this is me in the lab" → IM; "check my grades on website" → CHROME, "send an email to professor" → EMAIL). Columns span lexical features (photo, check, tell, send, …), behavior history (null, camera, chrome, email, …), and the intended app. Observed train cells are 1; reasoning with feature-enriched MF scores the test dialogue "take a photo of this / send it to alice …" with graded values (e.g. 0.85, 0.70, 0.95, 0.80, 0.55).]

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction
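To make the matrix construction concrete (with a made-up mini vocabulary and app set, not the paper's actual feature inventory), one row per dialogue turn can concatenate lexical observations, the app-level behavior history from previous turns, and the intended-app label:

```python
# Build one row of the behavior-enriched feature matrix for a dialogue turn.
# Column names are illustrative placeholders, not the exact features used.
VOCAB = ["photo", "send", "check", "email"]          # lexical columns
APPS = ["camera", "im", "chrome", "email_app"]       # history + label columns

def turn_row(words, prev_apps, intended):
    lexical = [1 if w in words else 0 for w in VOCAB]      # words observed in turn
    history = [1 if a in prev_apps else 0 for a in APPS]   # apps launched earlier
    label = [1 if a == intended else 0 for a in APPS]      # intended app (train only)
    return lexical + history + label

# "send it to alice" after launching CAMERA in the previous turn → IM
row = turn_row({"send"}, {"camera"}, "im")
print(row)  # [0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
```

At test time the label block is left at zero and the factorized model fills it with scores, which is how the behavior-history columns end up influencing the predicted app.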


66

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP), LM-based IR model (unsupervised)

  Feature Matrix     ASR (LM)   Transcripts (LM)
  Word Observation   25.1       26.1

Multi-Turn Interaction: Mean Average Precision (MAP), multinomial logistic regression (supervised)

  Feature Matrix     ASR (MLR)  Transcripts (MLR)
  Word Observation   52.1       55.5
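MAP, the metric reported in these tables, averages per-utterance average precision over the ranked app list; a minimal reference implementation (the app names below are made up for the example):

```python
def average_precision(ranked, relevant):
    """AP for one utterance: ranked = predicted apps (best first),
    relevant = set of gold apps."""
    hits, score = 0, 0.0
    for i, app in enumerate(ranked, start=1):
        if app in relevant:
            hits += 1
            score += hits / i          # precision at each relevant rank
    return score / max(len(relevant), 1)

def mean_average_precision(all_ranked, all_relevant):
    aps = [average_precision(r, g) for r, g in zip(all_ranked, all_relevant)]
    return sum(aps) / len(aps)

# Toy check: gold app ranked 1st in one case, 2nd in the other.
print(mean_average_precision(
    [["skype", "gmail"], ["chrome", "email"]],
    [{"skype"}, {"email"}]))  # (1.0 + 0.5) / 2 = 0.75
```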


67

Experiments for Intent Prediction (2)

Single-Turn Request: Mean Average Precision (MAP)

  Feature Matrix     ASR (LM / MF-SLU)       Transcripts (LM / MF-SLU)
  Word Observation   25.1 / 29.2 (+16.2%)    26.1 / 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

  Feature Matrix     ASR (MLR / MF-SLU)      Transcripts (MLR / MF-SLU)
  Word Observation   52.1 / 52.7 (+1.2%)     55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.


68

Experiments for Intent Prediction (3)

Single-Turn Request: Mean Average Precision (MAP)

  Feature Matrix                          ASR (LM / MF-SLU)       Transcripts (LM / MF-SLU)
  Word Observation                        25.1 / 29.2 (+16.2%)    26.1 / 30.4 (+16.4%)
  Word + Embedding-Based Semantics        32.0 / –                33.3 / –
  Word + Type-Embedding-Based Semantics   31.5 / –                32.9 / –

Multi-Turn Interaction: Mean Average Precision (MAP)

  Feature Matrix               ASR (MLR / MF-SLU)      Transcripts (MLR / MF-SLU)
  Word Observation             52.1 / 52.7 (+1.2%)     55.5 / 55.4 (-0.2%)
  Word + Behavioral Patterns   53.9 / –                56.6 / –

Semantic enrichment provides rich cues to improve performance.


69

Experiments for Intent Prediction (4)

Single-Turn Request: Mean Average Precision (MAP)

  Feature Matrix                          ASR (LM / MF-SLU)       Transcripts (LM / MF-SLU)
  Word Observation                        25.1 / 29.2 (+16.2%)    26.1 / 30.4 (+16.4%)
  Word + Embedding-Based Semantics        32.0 / 34.2 (+6.8%)     33.3 / 33.3 (-0.2%)
  Word + Type-Embedding-Based Semantics   31.5 / 32.2 (+2.1%)     32.9 / 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

  Feature Matrix               ASR (MLR / MF-SLU)      Transcripts (MLR / MF-SLU)
  Word Observation             52.1 / 52.7 (+1.2%)     55.5 / 55.4 (-0.2%)
  Word + Behavioral Patterns   53.9 / 55.7 (+3.3%)     56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.


70

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction

The feature-enriched MF-SLU for intent prediction is able to 1) unify knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.


71

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data Bases | Back-end Data Services and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"


72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work


73

Conclusions

The work shows the feasibility and the potential of improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances; better high-level intent prediction of follow-up behaviors.


74

Future Work

Apply the proposed technology to domain discovery: domains not covered by the current systems but that users are interested in can guide which domains to develop next.

Improve the proposed approach by handling uncertainty: recognition errors from ASR (for SLU modeling) and unreliable acquired knowledge (for knowledge acquisition).


75

[Figure: convolutional architecture for semantic decoding. A word sequence x = w1 w2 … wd is mapped to word vectors lw, a convolutional layer lc (convolution matrix Wc), and a pooling operation, yielding an utterance vector lf and slot vectors lf for slot candidates S1 … Sn. A knowledge graph propagation layer lp (propagation matrix Wp) and a semantic layer y (semantic projection matrix Ws) then produce semantic relations R(U, S1) … R(U, Sn) and posterior probabilities P(S1 | U) … P(Sn | U).]

Towards Unsupervised Deep Learning


Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.
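A minimal numerical sketch of that idea (random weights and hypothetical dimensions, purely illustrative): the one-layer MF view scores slots with a single linear map, while the deeper variant inserts a propagation layer and a nonlinearity before the semantic projection:

```python
import numpy as np

rng = np.random.default_rng(1)

d_word, d_slot, d_hidden = 50, 20, 32
x = rng.random(d_word)            # pooled word-vector features for utterance U

# One-layer view of MF: slot scores are a single linear map of features.
Ws_shallow = rng.standard_normal((d_slot, d_word))
y_shallow = Ws_shallow @ x

# Deeper variant: insert a propagation layer (e.g. knowledge-graph
# smoothing) and a nonlinearity before the semantic projection.
Wp = rng.standard_normal((d_hidden, d_word))   # propagation layer weights
Ws = rng.standard_normal((d_slot, d_hidden))   # semantic projection weights
lp = np.tanh(Wp @ x)
y_deep = Ws @ lp

# Posterior over slot candidates P(S_i | U) via softmax.
p = np.exp(y_deep - y_deep.max())
p /= p.sum()
print(y_shallow.shape, y_deep.shape)
```

Stacking layers this way preserves the MF training signal at the top while letting intermediate representations be learned, which is the sense in which the model moves towards unsupervised deep learning.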

76

Take Home Message

Available: big data without annotations. Challenge: how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI: language → action, e.g. understand voice to control music, lights, etc.; teach the assistant to let friends in by face recognition, etc.


Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A. Thanks for your attention!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.



Improve the proposed approach by handling the uncertainty

SLUSLUModelingASR Knowledge

Acquisitionrecognition

errorsunreliable knowledge

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

75

d d d

U S1 S2

P(S1 | U) P(S2 | U)

hellip

Semantic RelationPosterior Probability

Utterance

Slot Candidate

hellip

w1 w2 wdWord Sequence x

Word Vector lw

Pooling Operation

R(U S1) R(U S2)

Knowledge Graph Propagation Matrix Wp

Semantic Projection Matrix Ws

Semantic Layer y

Knowledge Graph Propagation Layer lp

d

Sn

P(Sn | U)

Utterance Vector lf

hellip

R(U Sn)

Slot Vector lf

Convolution Matrix Wc

Convolutional Layer lc

Towards Unsupervised Deep Learning

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning

76

Take Home Message Available big data wo annotations

Challenge how to acquire and organize important knowledge and further utilize it for applications

Language understanding for AI

language action understand voice to control music lights etc teach to let friends in by face recognition etc

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q amp ATHANKS FOR YOUR ATTENTIONS

bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)

bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

  • Statistical Learning from Dialogues for Intelligence Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants
  • Why do we need them
  • Why do we need them (2)
  • Why do companies care
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymaxrsquos intelligence
  • SDS Architecture
  • Interaction Example
  • SDS Process ndash Available Domain Ontology
  • SDS Process ndash Available Domain Ontology (2)
  • SDS Process ndash Available Domain Ontology (3)
  • SDS Process ndash Spoken Language Understanding (SLU)
  • SDS Process ndash Spoken Language Understanding (SLU) (2)
  • SDS Process ndash Dialogue Management (DM)
  • SDS Process ndash Dialogue Management (DM) (2)
  • SDS Process ndash Dialogue Management (DM) (3)
  • SDS Process ndash Dialogue Management (DM) (4)
  • SDS Process ndash Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture ndash Contributions
  • SDS Flowchart
  • SDS Flowchart ndash Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLPrsquo15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a]
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
  • 1st Issue How to adapt generic slots to a domain-specific sett
  • Semantic Decoding [ACL-IJCNLPrsquo15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement SlotWord Embeddings Training (Levy and
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLPrsquo15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue How to model the unobserved hidden semantics Matrix
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLPrsquo15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding Quality of Semantics Estimatio
  • Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
  • Experiments of Semantic Decoding Effectiveness of Relations
  • Experiments for Structure Learning Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart ndash Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLTrsquo14c]
  • Intent Prediction ndash Single-Turn Request
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q amp A
Page 63: Statistical Learning from Dialogues for Intelligent Assistants

63

Intent Prediction – Single-Turn Request

Input: single-turn request
Output: apps that are able to support the required functionality

[Figure: feature-enriched matrix with utterances as rows and word observations (contact, email, message, ...) plus intended apps (Gmail, Outlook, Skype, ...) as columns. Training rows come from app descriptions ("check and send emails, msgs ..." for Gmail; "your email, calendar, contacts ..." for Outlook) retrieved as app candidates, plus self-trained utterances; semantic feature enrichment (e.g., "communication") adds further columns. Reasoning with the feature-enriched MF scores the test utterance "i would like to contact alex" against the intended-app columns.]

The feature-enriched MF-SLU unifies manually written knowledge and automatically inferred semantics to predict high-level intents.

"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS
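The matrix-completion reasoning on this slide can be sketched with a toy example. This is only an illustrative sketch: the actual model is trained with Bayesian Personalized Ranking, while here a plain truncated SVD stands in for the learned low-rank factors, and the features, apps, and utterances are all hypothetical.

```python
import numpy as np

# Toy feature-enriched matrix: rows = utterances, columns = word features + app labels.
# A "1" marks an observed word or an observed intended app (hypothetical data).
feats = ["contact", "email", "message", "photo"]   # word-observation columns
apps = ["OUTLOOK", "GMAIL", "CAMERA"]              # intended-app columns

M = np.array([
    # contact email message photo | OUTLOOK GMAIL CAMERA
    [1, 1, 0, 0,  1, 0, 0],  # "your email, calendar, contacts" -> OUTLOOK
    [0, 1, 1, 0,  0, 1, 0],  # "check and send emails, msgs"    -> GMAIL
    [0, 0, 0, 1,  0, 0, 1],  # "take this photo"                -> CAMERA
    [1, 0, 0, 0,  0, 0, 0],  # test: "i would like to contact alex" (app unobserved)
], dtype=float)

# Rank-k reconstruction: truncated SVD stands in for the BPR-trained factors.
k = 2
U, s, Vt = np.linalg.svd(M, full_matrices=False)
M_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Scores for the unobserved app cells of the test utterance: the hidden
# "contact" semantics shared with the Outlook training row pushes OUTLOOK up.
test_scores = dict(zip(apps, M_hat[3, len(feats):]))
best = max(test_scores, key=test_scores.get)
```

Because the test row shares latent structure with the communication rows, the reconstruction assigns its highest app score to OUTLOOK even though no app label was observed for it.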

64

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch

Challenge: language ambiguity
1) User preference
2) App-level contexts

Example: "send to vivian" (given the previous turn) could mean Email or Message (Communication).
Idea: Behavioral patterns in history can help intent prediction.

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com


65

Intent Prediction – Multi-Turn Interaction [ICMI'15]

Input: multi-turn interaction
Output: apps the user plans to launch

[Figure: feature-enriched matrix over lexical features (photo, check, camera, tell, send, ...), behavior-history features (null, camera, chrome, email, ...), and intended apps. Train dialogues: "take this photo" → CAMERA, then "tell vivian this is me in the lab" → IM; "check my grades on website" → CHROME, then "send an email to professor" → EMAIL. Test dialogue: "take a photo of this", then "send it to alice"; reasoning with the feature-enriched MF fills in scores for the unobserved intended-app cells (CAMERA, IM).]

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction.
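The behavioral features described above can be assembled into matrix rows as follows. A minimal sketch with hypothetical vocabularies: each dialogue turn contributes lexical one-hots plus a one-hot for the previously launched app, and (when observed) the intended-app label.

```python
# Toy construction of feature-enriched rows for dialogue turns:
# lexical observations plus the previously launched app as behavioral context.
lexical_vocab = ["photo", "check", "camera", "tell", "send"]
behavior_vocab = ["null", "camera", "chrome", "email"]
apps = ["CAMERA", "IM", "CHROME", "EMAIL"]

def turn_row(words, prev_app, intended_app=None):
    """One row of the matrix: [lexical one-hots | behavior one-hots | app labels]."""
    lex = [1 if w in words else 0 for w in lexical_vocab]
    beh = [1 if b == prev_app else 0 for b in behavior_vocab]
    app = [1 if a == intended_app else 0 for a in apps]
    return lex + beh + app

# Train turn: "take this photo" with no app launched yet -> CAMERA.
row1 = turn_row({"take", "this", "photo"}, "null", "CAMERA")

# Test turn: "send it to alice" right after the camera app; label unobserved,
# so its app cells stay zero and are filled in by the MF reasoning.
row2 = turn_row({"send", "it", "to", "alice"}, "camera")
```

Stacking such rows for train and test dialogues yields exactly the kind of matrix the feature-enriched MF factorizes.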


66

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP); baseline is an LM-based IR model (unsupervised)

Feature Matrix     | ASR: LM / MF-SLU | Transcripts: LM / MF-SLU
Word Observation   | 25.1 / -         | 26.1 / -

Multi-Turn Interaction: Mean Average Precision (MAP); baseline is multinomial logistic regression (supervised)

Feature Matrix     | ASR: MLR / MF-SLU | Transcripts: MLR / MF-SLU
Word Observation   | 52.1 / -          | 55.5 / -
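The unsupervised baseline above can be sketched as a query-likelihood retrieval model: each app's description induces a unigram language model, and apps are ranked by the likelihood of the spoken request under that model. The descriptions, smoothing weight, and query below are hypothetical.

```python
import math
from collections import Counter

# Toy app descriptions (hypothetical); the LM-based IR model retrieves apps
# whose description language model best "generates" the request.
descs = {
    "OUTLOOK": "check and send emails your calendar and contacts",
    "GMAIL": "check and send emails and messages",
    "CAMERA": "take photos and record videos",
}
collection = Counter(" ".join(descs.values()).split())
coll_len = sum(collection.values())

def query_likelihood(query, desc, lam=0.5):
    """log P(query | app) with Jelinek-Mercer smoothing against the collection."""
    doc = Counter(desc.split())
    dlen = sum(doc.values())
    score = 0.0
    for w in query.split():
        p = lam * doc[w] / dlen + (1 - lam) * collection[w] / coll_len
        if p == 0:  # word unseen everywhere; contributes equally to all apps
            continue
        score += math.log(p)
    return score

query = "send emails to alex"
ranked = sorted(descs, key=lambda a: query_likelihood(query, descs[a]), reverse=True)
```

Shorter descriptions that match the query words concentrate more probability mass on them, so GMAIL outranks OUTLOOK here, and CAMERA (no matching words) comes last.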


67

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix     | ASR: LM / MF-SLU     | Transcripts: LM / MF-SLU
Word Observation   | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix     | ASR: MLR / MF-SLU    | Transcripts: MLR / MF-SLU
Word Observation   | 52.1 / 52.7 (+1.2%)  | 55.5 / 55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.
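The MAP scores in these tables can be reproduced from ranked app lists as follows; the two toy requests and their gold intended apps are hypothetical.

```python
def average_precision(ranked, relevant):
    """AP for one request: mean of precision@k taken at each relevant hit."""
    hits, precisions = 0, []
    for k, app in enumerate(ranked, start=1):
        if app in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(results):
    """results: list of (ranked_apps, relevant_apps) pairs, one per utterance."""
    return sum(average_precision(r, rel) for r, rel in results) / len(results)

# Hypothetical example: two utterances with gold intended apps.
results = [
    (["GMAIL", "CAMERA", "OUTLOOK"], {"GMAIL", "OUTLOOK"}),  # AP = (1/1 + 2/3) / 2
    (["CAMERA", "IM"], {"IM"}),                              # AP = 1/2
]
map_score = mean_average_precision(results)  # (5/6 + 1/2) / 2 = 2/3
```

Averaging precision at every relevant rank rewards systems that place all intended apps near the top, which is why MAP is the metric of choice when multiple apps can satisfy a request.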


68

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                         | ASR: LM / MF-SLU     | Transcripts: LM / MF-SLU
Word Observation                       | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics       | 32.0 / -             | 33.3 / -
Word + Type-Embedding-Based Semantics  | 31.5 / -             | 32.9 / -

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix               | ASR: MLR / MF-SLU    | Transcripts: MLR / MF-SLU
Word Observation             | 52.1 / 52.7 (+1.2%)  | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns   | 53.9 / -             | 56.6 / -

Semantic enrichment provides rich cues to improve performance.


69

Experiments for Intent Prediction

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                         | ASR: LM / MF-SLU     | Transcripts: LM / MF-SLU
Word Observation                       | 25.1 / 29.2 (+16.2%) | 26.1 / 30.4 (+16.4%)
Word + Embedding-Based Semantics       | 32.0 / 34.2 (+6.8%)  | 33.3 / 33.3 (-0.2%)
Word + Type-Embedding-Based Semantics  | 31.5 / 32.2 (+2.1%)  | 32.9 / 34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix               | ASR: MLR / MF-SLU    | Transcripts: MLR / MF-SLU
Word Observation             | 52.1 / 52.7 (+1.2%)  | 55.5 / 55.4 (-0.2%)
Word + Behavioral Patterns   | 53.9 / 55.7 (+3.3%)  | 56.6 / 57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.


70

Contributions of Intent Prediction

[Diagram: Knowledge Acquisition (Ontology Induction, Structure Learning) and SLU Modeling (Semantic Decoding, Intent Prediction)]

Feature-Enriched MF-SLU for Intent Prediction is able to:
1) unify the knowledge at different levels,
2) learn inference relations between various features,
3) and create personalized models by leveraging contextual behaviors.


71

Personal Intelligent Architecture

Reactive Assistance: ASR, LU, Dialog, LG, TTS
Proactive Assistance: Inferences, User Modeling, Suggestions
Data: Databases, Back-end Services, and Client Signals
Device/Service End-points: Phone, PC, Xbox, Web Browser, Messaging Apps
User Experience: "call taxi"


72

Outline
  • Intelligent Assistant: What are they? Why do we need them? Why do companies care?
  • Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture; Current Challenges & Overview; Contributions
  • Semantic Decoding
  • Intent Prediction
  • Conclusions & Future Work


73

Conclusions

The work shows the feasibility and the potential of improving the generalization, maintenance efficiency, and scalability of SDSs.
  • The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.
  • The proposed MF-SLU unifies the automatically acquired knowledge and allows systems to consider implicit semantics for better understanding: better semantic representations for individual utterances, and better high-level intent prediction about follow-up behaviors.


74

Future Work

  • Apply the proposed technology to domain discovery: find domains that are not covered by the current systems but that users are interested in, to guide the next developed domains.
  • Improve the proposed approach by handling the uncertainty in the SLU pipeline: recognition errors from ASR, and unreliable knowledge from knowledge acquisition.


75

Towards Unsupervised Deep Learning

[Figure: the MF model drawn as a neural network. A word sequence x = w1, w2, ..., wd feeds word vectors lw; a convolutional layer lc (convolution matrix Wc) with a pooling operation produces the utterance vector lf; slot vectors lf, a semantic layer y (semantic projection matrix Ws), and a knowledge graph propagation layer lp (propagation matrix Wp) yield semantic relations R(U, S1), ..., R(U, Sn) and posterior probabilities P(S1 | U), ..., P(Sn | U) over slot candidates S1, ..., Sn for utterance U.]

Treating MF as a one-layer neural net, we can add more layers in the model towards unsupervised deep learning.
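The convolutional front-end in the figure can be sketched in plain numpy. The dimensions and random weights below are hypothetical placeholders for trained parameters; the point is only the shape flow from word vectors through convolution and pooling to a fixed-size utterance vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical): d words, word vectors of size e,
# convolution over windows of n words producing c feature maps.
d, e, n, c = 6, 8, 3, 4

x = rng.standard_normal((d, e))       # word vectors lw for the sequence w1..wd
Wc = rng.standard_normal((c, n * e))  # convolution matrix Wc

# Convolutional layer lc: one c-dimensional feature vector per n-word window.
windows = np.stack([x[i:i + n].ravel() for i in range(d - n + 1)])
lc = np.tanh(windows @ Wc.T)          # shape: (d - n + 1, c)

# Pooling operation -> fixed-size utterance vector lf, independent of d.
lf = lc.max(axis=0)                   # shape: (c,)
```

The max-pooling step is what lets utterances of any length map into the same semantic space, so the resulting lf can replace a bag-of-words row in the MF model.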

76

Take Home Message

  • Big data is available without annotations. Challenge: how to acquire and organize important knowledge, and further utilize it for applications.
  • Language understanding for AI: language → action, e.g., understanding voice commands to control music, lights, etc., or teaching the system to let friends in by face recognition.
  • Unsupervised or weakly-supervised methods will be the future trend.
  • Deep language understanding is an emerging field.

77

Q & A: THANKS FOR YOUR ATTENTION!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.


  • Statistical Learning from Dialogues for Intelligent Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants?
  • Why do we need them?
  • Why do we need them? (2)
  • Why do companies care?
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymax's intelligence?
  • SDS Architecture
  • Interaction Example
  • SDS Process – Available Domain Ontology
  • SDS Process – Available Domain Ontology (2)
  • SDS Process – Available Domain Ontology (3)
  • SDS Process – Spoken Language Understanding (SLU)
  • SDS Process – Spoken Language Understanding (SLU) (2)
  • SDS Process – Dialogue Management (DM)
  • SDS Process – Dialogue Management (DM) (2)
  • SDS Process – Dialogue Management (DM) (3)
  • SDS Process – Dialogue Management (DM) (4)
  • SDS Process – Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture – Contributions
  • SDS Flowchart
  • SDS Flowchart – Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLP'15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRU'13, SLT'14a]
  • Ontology Induction [ASRU'13, SLT'14a] (2)
  • 1st Issue: How to adapt generic slots to a domain-specific setting
  • Semantic Decoding [ACL-IJCNLP'15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement: Slot/Word Embeddings Training (Levy and
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLP'15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue: How to model the unobserved hidden semantics? Matrix
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLP'15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding: Quality of Semantics Estimation
  • Experiments of Semantic Decoding: Quality of Semantics Estimation (2)
  • Experiments of Semantic Decoding: Effectiveness of Relations
  • Experiments for Structure Learning: Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart – Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLT'14c]
  • Intent Prediction – Single-Turn Request
  • Intent Prediction – Multi-Turn Interaction [ICMI'15]
  • Intent Prediction – Multi-Turn Interaction [ICMI'15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q & A
Page 64: Statistical Learning from Dialogues for Intelligent Assistants

64

Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]

Input multi-turn interaction

Output apps the user plans to launch

Challenge language ambiguity1) User preference2) App-level contexts

Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom

send to vivianvs

Email MessageCommunication

Idea Behavioral patterns in history can help intent prediction

previous turn

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

65

Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]

Input multi-turn interaction

Output apps the user plans to launch

1

Lexical Intended Appphoto check camera IMtell

take this phototell vivian this is me in the lab

CAMERA

IMTrainDialogue

check my grades on websitesend an email to professor

hellip

CHROME

EMAIL

send

Behavior History

null camera

85

take a photo of thissend it to alice

CAMERA

IM

hellip

email

1

1

1 1

1

1 70

chrome

1

1

1

1

1

1

chrome email

11

1

1

95

80 55

User UtteranceIntended

App

Reasoning with Feature-Enriched MF

Test Dialogue

take a photo of thissend it to alicehellip

Chen et al Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015 Data Available at httpAppDialoguecom

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preference for better intent prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

66

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 261

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 555

LM-Based IR Model (unsupervised)

Multinomial Logistic Regression (supervised)

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

67

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)

Modeling hidden semantics helps intent prediction especially for noisy data

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

68

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 333Word + Type-Embedding-Based Semantics 315 329

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 566

Semantic enrichment provides rich cues to improve performance

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

69

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 342 (+68) 333 333 (-02)Word + Type-Embedding-Based Semantics 315 322 (+21) 329 340 (+34)

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 557 (+33) 566 577 (+19)

Intent prediction can benefit from both hidden information and low-level semantics

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

70

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction Feature-Enriched MF-SLU for

Intent Prediction is able to1) unify the knowledge at

different levels2) learn inference relations

between various features

3) and create personalized models by leveraging contextual behaviors

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

71

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data Back-end Data

Bases Services and Client Signals

DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)

User Experienceldquocall taxirdquo

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

72

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

73

Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

74

Future Work Apply the proposed technology to domain discovery

not covered by the current systems but users are interested in guide the next developed domains

Improve the proposed approach by handling the uncertainty

SLUSLUModelingASR Knowledge

Acquisitionrecognition

errorsunreliable knowledge

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

75

d d d

U S1 S2

P(S1 | U) P(S2 | U)

hellip

Semantic RelationPosterior Probability

Utterance

Slot Candidate

hellip

w1 w2 wdWord Sequence x

Word Vector lw

Pooling Operation

R(U S1) R(U S2)

Knowledge Graph Propagation Matrix Wp

Semantic Projection Matrix Ws

Semantic Layer y

Knowledge Graph Propagation Layer lp

d

Sn

P(Sn | U)

Utterance Vector lf

hellip

R(U Sn)

Slot Vector lf

Convolution Matrix Wc

Convolutional Layer lc

Towards Unsupervised Deep Learning

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning

76

Take Home Message Available big data wo annotations

Challenge how to acquire and organize important knowledge and further utilize it for applications

Language understanding for AI

language action understand voice to control music lights etc teach to let friends in by face recognition etc

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q amp ATHANKS FOR YOUR ATTENTIONS

bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)

bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

  • Statistical Learning from Dialogues for Intelligence Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants
  • Why do we need them
  • Why do we need them (2)
  • Why do companies care
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymaxrsquos intelligence
  • SDS Architecture
  • Interaction Example
  • SDS Process ndash Available Domain Ontology
  • SDS Process ndash Available Domain Ontology (2)
  • SDS Process ndash Available Domain Ontology (3)
  • SDS Process ndash Spoken Language Understanding (SLU)
  • SDS Process ndash Spoken Language Understanding (SLU) (2)
  • SDS Process ndash Dialogue Management (DM)
  • SDS Process ndash Dialogue Management (DM) (2)
  • SDS Process ndash Dialogue Management (DM) (3)
  • SDS Process ndash Dialogue Management (DM) (4)
  • SDS Process ndash Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture ndash Contributions
  • SDS Flowchart
  • SDS Flowchart ndash Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLPrsquo15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a]
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
  • 1st Issue How to adapt generic slots to a domain-specific sett
  • Semantic Decoding [ACL-IJCNLPrsquo15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement SlotWord Embeddings Training (Levy and
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLPrsquo15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue How to model the unobserved hidden semantics Matrix
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLPrsquo15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding Quality of Semantics Estimatio
  • Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
  • Experiments of Semantic Decoding Effectiveness of Relations
  • Experiments for Structure Learning Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart ndash Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLTrsquo14c]
  • Intent Prediction ndash Single-Turn Request
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q amp A
Page 65: Statistical Learning from Dialogues for Intelligent Assistants

65

Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]

Input multi-turn interaction

Output apps the user plans to launch

[Figure: feature-enriched matrix. Rows are dialogues; columns are lexical features (photo, check, camera, tell, send, email, chrome, …), intended apps (CAMERA, IM, CHROME, EMAIL), and behavior history (null, camera, email, chrome). Training dialogues, e.g., "take this photo / tell vivian this is me in the lab" → CAMERA, IM and "check my grades on website / send an email to professor" → CHROME, EMAIL, fill the observed cells with 1s; for the test dialogue "take a photo of this / send it to alice", the intended apps are inferred by reasoning with the feature-enriched MF (predicted scores such as .85, .70, .95, .80, .55).]

Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015. Data available at http://AppDialogue.com

The feature-enriched MF-SLU leverages behavioral patterns to model contextual information and user preferences for better intent prediction.
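The feature-enriched MF-SLU fills the unobserved cells of this matrix through low-rank factorization. A toy sketch of that effect (made-up data; plain squared-error gradient updates stand in for the BPR objective the paper actually optimizes):

```python
import numpy as np

# Toy feature-enriched matrix: rows = dialogue turns, columns = features
# (lexical words followed by intended apps). 1 = observed; 0 = unobserved,
# not necessarily negative. All data here is made up for illustration.
# Columns: take, photo, send, email, CAMERA, IM, EMAIL
F = np.array([
    [1, 1, 0, 0, 1, 0, 0],  # "take this photo"            -> CAMERA
    [0, 0, 1, 0, 0, 1, 0],  # "send it to alice"           -> IM
    [0, 0, 1, 1, 0, 0, 1],  # "send an email to professor" -> EMAIL
], dtype=float)

rank = 2
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(F.shape[0], rank))  # turn factors
V = rng.normal(scale=0.1, size=(F.shape[1], rank))  # feature factors

# Gradient descent on squared error with L2 regularization
# (the paper uses BPR instead; this only shows the low-rank effect).
lr, lam = 0.1, 0.01
for _ in range(1000):
    E = F - U @ V.T
    U, V = U + lr * (E @ V - lam * U), V + lr * (E.T @ U - lam * V)

scores = U @ V.T
# The rank constraint makes turns that share "send" leak score into each
# other's unobserved cells, e.g. scores[1, 3] ("send it to alice" x the
# unobserved "email" feature) becomes clearly positive.
print(round(scores[1, 3], 2))
```

The hidden-semantics inference the slide describes is exactly this leakage: a cell never observed for a turn still receives a score because the turn shares latent factors with related turns.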

"SORRY, I DIDN'T GET THAT!" -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

66

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix       ASR                Transcripts
                     LM      MF-SLU     LM      MF-SLU
Word Observation     25.1    --         26.1    --

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix       ASR                Transcripts
                     MLR     MF-SLU     MLR     MF-SLU
Word Observation     52.1    --         55.5    --

LM: LM-based IR model (unsupervised); MLR: multinomial logistic regression (supervised)

Experiments for Intent Prediction
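All of these results are reported in mean average precision (MAP). A minimal sketch of how MAP is computed over ranked app predictions (toy rankings, not the paper's data):

```python
def average_precision(ranked_apps, relevant):
    """AP for one utterance: mean of precision@k taken at each relevant hit."""
    hits, precisions = 0, []
    for k, app in enumerate(ranked_apps, start=1):
        if app in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, gold):
    """MAP: average the per-utterance AP scores."""
    return sum(average_precision(r, g) for r, g in zip(rankings, gold)) / len(rankings)

# Toy example: two turns, the system ranks candidate apps by score.
rankings = [["CAMERA", "IM", "EMAIL"], ["CHROME", "EMAIL", "IM"]]
gold     = [{"CAMERA", "IM"},          {"EMAIL"}]
print(mean_average_precision(rankings, gold))  # 0.75
```

Turn 1 ranks both relevant apps first (AP = 1.0); turn 2 ranks its single relevant app second (AP = 0.5), giving MAP = 0.75.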


67

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix       ASR                     Transcripts
                     LM      MF-SLU          LM      MF-SLU
Word Observation     25.1    29.2 (+16.2%)   26.1    30.4 (+16.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix       ASR                     Transcripts
                     MLR     MF-SLU          MLR     MF-SLU
Word Observation     52.1    52.7 (+1.2%)    55.5    55.4 (-0.2%)

Modeling hidden semantics helps intent prediction, especially for noisy data.

Experiments for Intent Prediction


68

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                          ASR                Transcripts
                                        LM      MF-SLU     LM      MF-SLU
Word Observation                        25.1    29.2 (+16.2%)   26.1    30.4 (+16.4%)
Word + Embedding-Based Semantics        32.0    --         33.3    --
Word + Type-Embedding-Based Semantics   31.5    --         32.9    --

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix               ASR                Transcripts
                             MLR     MF-SLU     MLR     MF-SLU
Word Observation             52.1    52.7 (+1.2%)    55.5    55.4 (-0.2%)
Word + Behavioral Patterns   53.9    --         56.6    --

Semantic enrichment provides rich cues to improve performance.

Experiments for Intent Prediction
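One plausible reading of the embedding-based enrichment rows above is that an utterance's word features are augmented with semantic features whenever a word is embedding-close to a slot or app cue word. A toy sketch (made-up 3-d vectors and a hypothetical `enrich` helper, not the paper's pipeline):

```python
import numpy as np

# Toy "pretrained" embeddings (made-up 3-d vectors for illustration only).
emb = {
    "photo":  np.array([0.9, 0.1, 0.0]),
    "camera": np.array([0.8, 0.2, 0.1]),
    "email":  np.array([0.0, 0.9, 0.2]),
    "send":   np.array([0.1, 0.8, 0.3]),
}

def cos(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def enrich(utterance_words, semantic_cues, threshold=0.8):
    """Add a semantic feature when an utterance word is embedding-close to a
    cue word -- i.e., extra nonzero cells in the feature matrix."""
    feats = set(utterance_words)
    for w in utterance_words:
        for s in semantic_cues:
            if w in emb and s in emb and cos(emb[w], emb[s]) >= threshold:
                feats.add("sem:" + s)
    return feats

print(sorted(enrich(["photo"], ["camera", "email"])))  # ['photo', 'sem:camera']
```

"photo" is close to "camera" but not to "email", so only the camera-related semantic cell is added; those extra observed cells are what give the MF model richer evidence.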


69

Single-Turn Request: Mean Average Precision (MAP)

Feature Matrix                          ASR                     Transcripts
                                        LM      MF-SLU          LM      MF-SLU
Word Observation                        25.1    29.2 (+16.2%)   26.1    30.4 (+16.4%)
Word + Embedding-Based Semantics        32.0    34.2 (+6.8%)    33.3    33.3 (-0.2%)
Word + Type-Embedding-Based Semantics   31.5    32.2 (+2.1%)    32.9    34.0 (+3.4%)

Multi-Turn Interaction: Mean Average Precision (MAP)

Feature Matrix               ASR                     Transcripts
                             MLR     MF-SLU          MLR     MF-SLU
Word Observation             52.1    52.7 (+1.2%)    55.5    55.4 (-0.2%)
Word + Behavioral Patterns   53.9    55.7 (+3.3%)    56.6    57.7 (+1.9%)

Intent prediction can benefit from both hidden information and low-level semantics.

Experiments for Intent Prediction


70

Ontology Induction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction: Feature-enriched MF-SLU for intent prediction is able to 1) unify the knowledge at different levels, 2) learn inference relations between various features, and 3) create personalized models by leveraging contextual behaviors.


71

Personal Intelligent Architecture

Reactive Assistance

ASR → LU → Dialog → LG → TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data: Back-end Data Bases, Services and Client Signals

Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)

User Experience: "call taxi"


72

Outline

Intelligent Assistant: What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS): Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


73

Conclusions: The work shows the feasibility and the potential for improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:

Better semantic representations for individual utterances

Better high-level intent prediction about follow-up behaviors


74

Future Work: Apply the proposed technology to domain discovery – domains not covered by the current systems but of interest to users can guide the next domains to develop.

Improve the proposed approach by handling uncertainty: recognition errors (from ASR) and unreliable knowledge (from knowledge acquisition and SLU modeling).


75

[Figure: network architecture – word sequence x (w1, w2, …, wd) → word vector lw → convolutional layer lc (convolution matrix Wc) → pooling operation → utterance vector lf and slot vectors lf for slot candidates S1, S2, …, Sn → knowledge graph propagation layer lp (propagation matrix Wp) → semantic layer y (semantic projection matrix Ws), producing relevance scores R(U, S1), …, R(U, Sn) and semantic-relation posterior probabilities P(S1 | U), …, P(Sn | U).]

Towards Unsupervised Deep Learning


Treating MF as a one-layer neural net, we can add more layers in the model towards unsupervised deep learning.
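The "MF as a one-layer net" view above can be sketched concretely: a low-rank factorization is a single linear layer with a rank-constrained weight matrix, and "adding layers" means inserting hidden layers and nonlinearities between input features and slot scores. This toy code (random, untrained weights; not the slide's trained architecture) only illustrates the structural relationship:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random((4, 6))  # 4 toy utterance feature vectors of dimension 6

# MF view: predicting slot scores y = x @ W with a low-rank W = A @ B
# is a single linear layer whose weight matrix is rank-constrained.
A = rng.normal(size=(6, 2))
B = rng.normal(size=(2, 5))
y_mf = x @ (A @ B)  # one-layer, rank-2 prediction over 5 slot candidates

# Deeper variant: insert a wider hidden layer and a nonlinearity, in the
# spirit of the slide's convolution + knowledge-graph-propagation stack.
W1 = rng.normal(size=(6, 8))
W2 = rng.normal(size=(8, 5))
y_deep = np.tanh(x @ W1) @ W2

print(y_mf.shape, np.linalg.matrix_rank(A @ B))
```

The point is only that the MF objective already trains one layer without labels, so the same unsupervised signal can in principle drive the deeper stack.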

76

Take Home Message: Big data is available, but without annotations.

Challenge: how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI: language → action – understand voice commands to control music, lights, etc.; teach the system to let friends in by face recognition, etc.


Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A – THANKS FOR YOUR ATTENTION!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)

• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.


  • Statistical Learning from Dialogues for Intelligence Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants
  • Why do we need them
  • Why do we need them (2)
  • Why do companies care
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymaxrsquos intelligence
  • SDS Architecture
  • Interaction Example
  • SDS Process ndash Available Domain Ontology
  • SDS Process ndash Available Domain Ontology (2)
  • SDS Process ndash Available Domain Ontology (3)
  • SDS Process ndash Spoken Language Understanding (SLU)
  • SDS Process ndash Spoken Language Understanding (SLU) (2)
  • SDS Process ndash Dialogue Management (DM)
  • SDS Process ndash Dialogue Management (DM) (2)
  • SDS Process ndash Dialogue Management (DM) (3)
  • SDS Process ndash Dialogue Management (DM) (4)
  • SDS Process ndash Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture ndash Contributions
  • SDS Flowchart
  • SDS Flowchart ndash Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLPrsquo15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a]
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
  • 1st Issue How to adapt generic slots to a domain-specific sett
  • Semantic Decoding [ACL-IJCNLPrsquo15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement SlotWord Embeddings Training (Levy and
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLPrsquo15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue How to model the unobserved hidden semantics Matrix
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLPrsquo15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding Quality of Semantics Estimatio
  • Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
  • Experiments of Semantic Decoding Effectiveness of Relations
  • Experiments for Structure Learning Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart ndash Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLTrsquo14c]
  • Intent Prediction ndash Single-Turn Request
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q amp A
Page 66: Statistical Learning from Dialogues for Intelligent Assistants

66

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 261

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 555

LM-Based IR Model (unsupervised)

Multinomial Logistic Regression (supervised)

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

67

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)

Modeling hidden semantics helps intent prediction especially for noisy data

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

68

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 333Word + Type-Embedding-Based Semantics 315 329

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 566

Semantic enrichment provides rich cues to improve performance

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

69

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 342 (+68) 333 333 (-02)Word + Type-Embedding-Based Semantics 315 322 (+21) 329 340 (+34)

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 557 (+33) 566 577 (+19)

Intent prediction can benefit from both hidden information and low-level semantics

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

70

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction Feature-Enriched MF-SLU for

Intent Prediction is able to1) unify the knowledge at

different levels2) learn inference relations

between various features

3) and create personalized models by leveraging contextual behaviors

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

71

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data Back-end Data

Bases Services and Client Signals

DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)

User Experienceldquocall taxirdquo

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

72

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

73

Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

74

Future Work Apply the proposed technology to domain discovery

not covered by the current systems but users are interested in guide the next developed domains

Improve the proposed approach by handling the uncertainty

SLUSLUModelingASR Knowledge

Acquisitionrecognition

errorsunreliable knowledge

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

75

d d d

U S1 S2

P(S1 | U) P(S2 | U)

hellip

Semantic RelationPosterior Probability

Utterance

Slot Candidate

hellip

w1 w2 wdWord Sequence x

Word Vector lw

Pooling Operation

R(U S1) R(U S2)

Knowledge Graph Propagation Matrix Wp

Semantic Projection Matrix Ws

Semantic Layer y

Knowledge Graph Propagation Layer lp

d

Sn

P(Sn | U)

Utterance Vector lf

hellip

R(U Sn)

Slot Vector lf

Convolution Matrix Wc

Convolutional Layer lc

Towards Unsupervised Deep Learning

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning

76

Take Home Message Available big data wo annotations

Challenge how to acquire and organize important knowledge and further utilize it for applications

Language understanding for AI

language action understand voice to control music lights etc teach to let friends in by face recognition etc

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q amp ATHANKS FOR YOUR ATTENTIONS

bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)

bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

  • Statistical Learning from Dialogues for Intelligence Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants
  • Why do we need them
  • Why do we need them (2)
  • Why do companies care
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymaxrsquos intelligence
  • SDS Architecture
  • Interaction Example
  • SDS Process ndash Available Domain Ontology
  • SDS Process ndash Available Domain Ontology (2)
  • SDS Process ndash Available Domain Ontology (3)
  • SDS Process ndash Spoken Language Understanding (SLU)
  • SDS Process ndash Spoken Language Understanding (SLU) (2)
  • SDS Process ndash Dialogue Management (DM)
  • SDS Process ndash Dialogue Management (DM) (2)
  • SDS Process ndash Dialogue Management (DM) (3)
  • SDS Process ndash Dialogue Management (DM) (4)
  • SDS Process ndash Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture ndash Contributions
  • SDS Flowchart
  • SDS Flowchart ndash Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLPrsquo15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a]
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
  • 1st Issue How to adapt generic slots to a domain-specific sett
  • Semantic Decoding [ACL-IJCNLPrsquo15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement SlotWord Embeddings Training (Levy and
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLPrsquo15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue How to model the unobserved hidden semantics Matrix
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLPrsquo15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding Quality of Semantics Estimatio
  • Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
  • Experiments of Semantic Decoding Effectiveness of Relations
  • Experiments for Structure Learning Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart ndash Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLTrsquo14c]
  • Intent Prediction ndash Single-Turn Request
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q amp A
Page 67: Statistical Learning from Dialogues for Intelligent Assistants

67

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)

Modeling hidden semantics helps intent prediction especially for noisy data

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

68

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 333Word + Type-Embedding-Based Semantics 315 329

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 566

Semantic enrichment provides rich cues to improve performance

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

69

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 342 (+68) 333 333 (-02)Word + Type-Embedding-Based Semantics 315 322 (+21) 329 340 (+34)

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 557 (+33) 566 577 (+19)

Intent prediction can benefit from both hidden information and low-level semantics

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

70

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction Feature-Enriched MF-SLU for

Intent Prediction is able to1) unify the knowledge at

different levels2) learn inference relations

between various features

3) and create personalized models by leveraging contextual behaviors

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

71

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data Back-end Data

Bases Services and Client Signals

DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)

User Experienceldquocall taxirdquo

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

72

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

73

Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

74

Future Work Apply the proposed technology to domain discovery

not covered by the current systems but users are interested in guide the next developed domains

Improve the proposed approach by handling the uncertainty

SLUSLUModelingASR Knowledge

Acquisitionrecognition

errorsunreliable knowledge

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

75

d d d

U S1 S2

P(S1 | U) P(S2 | U)

hellip

Semantic RelationPosterior Probability

Utterance

Slot Candidate

hellip

w1 w2 wdWord Sequence x

Word Vector lw

Pooling Operation

R(U S1) R(U S2)

Knowledge Graph Propagation Matrix Wp

Semantic Projection Matrix Ws

Semantic Layer y

Knowledge Graph Propagation Layer lp

d

Sn

P(Sn | U)

Utterance Vector lf

hellip

R(U Sn)

Slot Vector lf

Convolution Matrix Wc

Convolutional Layer lc

Towards Unsupervised Deep Learning

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning

76

Take Home Message Available big data wo annotations

Challenge how to acquire and organize important knowledge and further utilize it for applications

Language understanding for AI

language action understand voice to control music lights etc teach to let friends in by face recognition etc

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q amp ATHANKS FOR YOUR ATTENTIONS

bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)

bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

  • Statistical Learning from Dialogues for Intelligence Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants
  • Why do we need them
  • Why do we need them (2)
  • Why do companies care
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymaxrsquos intelligence
  • SDS Architecture
  • Interaction Example
  • SDS Process ndash Available Domain Ontology
  • SDS Process ndash Available Domain Ontology (2)
  • SDS Process ndash Available Domain Ontology (3)
  • SDS Process ndash Spoken Language Understanding (SLU)
  • SDS Process ndash Spoken Language Understanding (SLU) (2)
  • SDS Process ndash Dialogue Management (DM)
  • SDS Process ndash Dialogue Management (DM) (2)
  • SDS Process ndash Dialogue Management (DM) (3)
  • SDS Process ndash Dialogue Management (DM) (4)
  • SDS Process ndash Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture ndash Contributions
  • SDS Flowchart
  • SDS Flowchart ndash Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLPrsquo15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a]
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
  • 1st Issue How to adapt generic slots to a domain-specific sett
  • Semantic Decoding [ACL-IJCNLPrsquo15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement SlotWord Embeddings Training (Levy and
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLPrsquo15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue How to model the unobserved hidden semantics Matrix
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLPrsquo15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding Quality of Semantics Estimatio
  • Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
  • Experiments of Semantic Decoding Effectiveness of Relations
  • Experiments for Structure Learning Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart ndash Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLTrsquo14c]
  • Intent Prediction ndash Single-Turn Request
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q amp A
Page 68: Statistical Learning from Dialogues for Intelligent Assistants

68

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 333Word + Type-Embedding-Based Semantics 315 329

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 566

Semantic enrichment provides rich cues to improve performance

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

69

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 342 (+68) 333 333 (-02)Word + Type-Embedding-Based Semantics 315 322 (+21) 329 340 (+34)

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 557 (+33) 566 577 (+19)

Intent prediction can benefit from both hidden information and low-level semantics

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

70

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction Feature-Enriched MF-SLU for

Intent Prediction is able to1) unify the knowledge at

different levels2) learn inference relations

between various features

3) and create personalized models by leveraging contextual behaviors

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

71

Personal Intelligent Architecture

Reactive Assistance: ASR, LU, Dialog, LG, TTS
Proactive Assistance: Inferences, User Modeling, Suggestions
Data: Back-end Data Bases, Services and Client Signals
Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience: "call taxi"
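The reactive-assistance path above (ASR → LU → Dialog → LG, plus TTS) can be sketched as a simple function pipeline. Every stage here is a hypothetical stub for illustration, not one of the components in the talk:

```python
# Hypothetical stubs illustrating the reactive-assistance pipeline
# ASR -> LU -> Dialog -> LG (TTS omitted); not the talk's actual systems.
def asr(audio: bytes) -> str:
    return "call taxi"  # stub: speech recognition hypothesis

def lu(text: str) -> dict:
    # stub: language understanding -> intent + slots
    return {"intent": "request_taxi", "slots": {}}

def dialog(frame: dict) -> dict:
    # stub: dialogue manager chooses a system action
    return {"action": "confirm", "intent": frame["intent"]}

def lg(act: dict) -> str:
    return "Sure, booking a taxi for you."  # stub: language generation

def pipeline(audio: bytes) -> str:
    return lg(dialog(lu(asr(audio))))

print(pipeline(b""))
```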


72

Outline

Intelligent Assistant
  What are they? Why do we need them? Why do companies care?

Reactive Assistant – Spoken Dialogue System (SDS)
  Pipeline Architecture, Current Challenges & Overview, Contributions

Semantic Decoding

Intent Prediction

Conclusions & Future Work


73

Conclusions

This work shows the feasibility and the potential of improving the generalization, maintenance, efficiency, and scalability of SDSs.

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies.

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding:
  better semantic representations for individual utterances;
  better high-level intent prediction about follow-up behaviors.


74

Future Work

Apply the proposed technology to domain discovery: identify domains that are not covered by current systems but that users are interested in, to guide which domains to develop next.

Improve the proposed approach by handling uncertainty: recognition errors from ASR (affecting SLU modeling) and unreliable knowledge (affecting knowledge acquisition).


75

Towards Unsupervised Deep Learning

[Model diagram: word sequence x = w1 w2 … wd → word vectors l_w → convolutional layer l_c (convolution matrix W_c) → pooling operation → utterance vector l_f; slot candidates S1 … Sn with slot vectors l_f; knowledge graph propagation layer l_p (propagation matrix W_p) → semantic layer y (semantic projection matrix W_s), producing relevance scores R(U, S1) … R(U, Sn) and posterior probabilities P(S1 | U) … P(Sn | U) over semantic relations between the utterance U and each slot candidate.]


Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.

76

Take Home Message

Available: big data without annotations. Challenge: how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI: language → action, e.g., understanding voice commands to control music, lights, etc., or teaching the assistant to let friends in via face recognition.

Unsupervised or weakly-supervised methods will be the future trend.

Deep language understanding is an emerging field.

77

Q & A: Thanks for your attention!

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.


  • Statistical Learning from Dialogues for Intelligence Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants
  • Why do we need them
  • Why do we need them (2)
  • Why do companies care
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymaxrsquos intelligence
  • SDS Architecture
  • Interaction Example
  • SDS Process ndash Available Domain Ontology
  • SDS Process ndash Available Domain Ontology (2)
  • SDS Process ndash Available Domain Ontology (3)
  • SDS Process ndash Spoken Language Understanding (SLU)
  • SDS Process ndash Spoken Language Understanding (SLU) (2)
  • SDS Process ndash Dialogue Management (DM)
  • SDS Process ndash Dialogue Management (DM) (2)
  • SDS Process ndash Dialogue Management (DM) (3)
  • SDS Process ndash Dialogue Management (DM) (4)
  • SDS Process ndash Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture ndash Contributions
  • SDS Flowchart
  • SDS Flowchart ndash Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLPrsquo15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a]
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
  • 1st Issue How to adapt generic slots to a domain-specific sett
  • Semantic Decoding [ACL-IJCNLPrsquo15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement SlotWord Embeddings Training (Levy and
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLPrsquo15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue How to model the unobserved hidden semantics Matrix
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLPrsquo15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding Quality of Semantics Estimatio
  • Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
  • Experiments of Semantic Decoding Effectiveness of Relations
  • Experiments for Structure Learning Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart ndash Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLTrsquo14c]
  • Intent Prediction ndash Single-Turn Request
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q amp A
Page 69: Statistical Learning from Dialogues for Intelligent Assistants

69

Single-Turn Request Mean Average Precision (MAP)

Multi-Turn Interaction Mean Average Precision (MAP)

Feature MatrixASR Transcripts

LM MF-SLU LM MF-SLUWord Observation 251 292 (+162) 261 304 (+164)Word + Embedding-Based Semantics 320 342 (+68) 333 333 (-02)Word + Type-Embedding-Based Semantics 315 322 (+21) 329 340 (+34)

Feature MatrixASR Transcripts

MLR MF-SLU MLR MF-SLUWord Observation 521 527 (+12) 555 554 (-02)Word + Behavioral Patterns 539 557 (+33) 566 577 (+19)

Intent prediction can benefit from both hidden information and low-level semantics

Experiments for Intent Prediction

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

70

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction Feature-Enriched MF-SLU for

Intent Prediction is able to1) unify the knowledge at

different levels2) learn inference relations

between various features

3) and create personalized models by leveraging contextual behaviors

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

71

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data Back-end Data

Bases Services and Client Signals

DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)

User Experienceldquocall taxirdquo

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

72

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

73

Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

74

Future Work Apply the proposed technology to domain discovery

not covered by the current systems but users are interested in guide the next developed domains

Improve the proposed approach by handling the uncertainty

SLUSLUModelingASR Knowledge

Acquisitionrecognition

errorsunreliable knowledge

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

75

d d d

U S1 S2

P(S1 | U) P(S2 | U)

hellip

Semantic RelationPosterior Probability

Utterance

Slot Candidate

hellip

w1 w2 wdWord Sequence x

Word Vector lw

Pooling Operation

R(U S1) R(U S2)

Knowledge Graph Propagation Matrix Wp

Semantic Projection Matrix Ws

Semantic Layer y

Knowledge Graph Propagation Layer lp

d

Sn

P(Sn | U)

Utterance Vector lf

hellip

R(U Sn)

Slot Vector lf

Convolution Matrix Wc

Convolutional Layer lc

Towards Unsupervised Deep Learning

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning

76

Take Home Message Available big data wo annotations

Challenge how to acquire and organize important knowledge and further utilize it for applications

Language understanding for AI

language action understand voice to control music lights etc teach to let friends in by face recognition etc

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q amp ATHANKS FOR YOUR ATTENTIONS

bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)

bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

  • Statistical Learning from Dialogues for Intelligence Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants
  • Why do we need them
  • Why do we need them (2)
  • Why do companies care
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymaxrsquos intelligence
  • SDS Architecture
  • Interaction Example
  • SDS Process ndash Available Domain Ontology
  • SDS Process ndash Available Domain Ontology (2)
  • SDS Process ndash Available Domain Ontology (3)
  • SDS Process ndash Spoken Language Understanding (SLU)
  • SDS Process ndash Spoken Language Understanding (SLU) (2)
  • SDS Process ndash Dialogue Management (DM)
  • SDS Process ndash Dialogue Management (DM) (2)
  • SDS Process ndash Dialogue Management (DM) (3)
  • SDS Process ndash Dialogue Management (DM) (4)
  • SDS Process ndash Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture ndash Contributions
  • SDS Flowchart
  • SDS Flowchart ndash Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLPrsquo15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a]
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
  • 1st Issue How to adapt generic slots to a domain-specific sett
  • Semantic Decoding [ACL-IJCNLPrsquo15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement SlotWord Embeddings Training (Levy and
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLPrsquo15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue How to model the unobserved hidden semantics Matrix
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLPrsquo15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding Quality of Semantics Estimatio
  • Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
  • Experiments of Semantic Decoding Effectiveness of Relations
  • Experiments for Structure Learning Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart ndash Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLTrsquo14c]
  • Intent Prediction ndash Single-Turn Request
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q amp A
Page 70: Statistical Learning from Dialogues for Intelligent Assistants

70

OntologyInduction

Structure Learning

Semantic Decoding

Intent Prediction

Knowledge Acquisition

SLU Modeling

Contributions of Intent Prediction Feature-Enriched MF-SLU for

Intent Prediction is able to1) unify the knowledge at

different levels2) learn inference relations

between various features

3) and create personalized models by leveraging contextual behaviors

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

71

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data Back-end Data

Bases Services and Client Signals

DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)

User Experienceldquocall taxirdquo

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

72

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

73

Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

74

Future Work Apply the proposed technology to domain discovery

not covered by the current systems but users are interested in guide the next developed domains

Improve the proposed approach by handling the uncertainty

SLUSLUModelingASR Knowledge

Acquisitionrecognition

errorsunreliable knowledge

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

75

d d d

U S1 S2

P(S1 | U) P(S2 | U)

hellip

Semantic RelationPosterior Probability

Utterance

Slot Candidate

hellip

w1 w2 wdWord Sequence x

Word Vector lw

Pooling Operation

R(U S1) R(U S2)

Knowledge Graph Propagation Matrix Wp

Semantic Projection Matrix Ws

Semantic Layer y

Knowledge Graph Propagation Layer lp

d

Sn

P(Sn | U)

Utterance Vector lf

hellip

R(U Sn)

Slot Vector lf

Convolution Matrix Wc

Convolutional Layer lc

Towards Unsupervised Deep Learning

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning

76

Take Home Message Available big data wo annotations

Challenge how to acquire and organize important knowledge and further utilize it for applications

Language understanding for AI

language action understand voice to control music lights etc teach to let friends in by face recognition etc

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q amp ATHANKS FOR YOUR ATTENTIONS

bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)

bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

  • Statistical Learning from Dialogues for Intelligence Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants
  • Why do we need them
  • Why do we need them (2)
  • Why do companies care
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymaxrsquos intelligence
  • SDS Architecture
  • Interaction Example
  • SDS Process ndash Available Domain Ontology
  • SDS Process ndash Available Domain Ontology (2)
  • SDS Process ndash Available Domain Ontology (3)
  • SDS Process ndash Spoken Language Understanding (SLU)
  • SDS Process ndash Spoken Language Understanding (SLU) (2)
  • SDS Process ndash Dialogue Management (DM)
  • SDS Process ndash Dialogue Management (DM) (2)
  • SDS Process ndash Dialogue Management (DM) (3)
  • SDS Process ndash Dialogue Management (DM) (4)
  • SDS Process ndash Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture ndash Contributions
  • SDS Flowchart
  • SDS Flowchart ndash Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLPrsquo15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a]
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
  • 1st Issue How to adapt generic slots to a domain-specific sett
  • Semantic Decoding [ACL-IJCNLPrsquo15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement SlotWord Embeddings Training (Levy and
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLPrsquo15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue How to model the unobserved hidden semantics Matrix
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLPrsquo15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding Quality of Semantics Estimatio
  • Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
  • Experiments of Semantic Decoding Effectiveness of Relations
  • Experiments for Structure Learning Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart ndash Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLTrsquo14c]
  • Intent Prediction ndash Single-Turn Request
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q amp A
Page 71: Statistical Learning from Dialogues for Intelligent Assistants

71

Personal Intelligent Architecture

Reactive Assistance

ASR LU Dialog LG TTS

Proactive Assistance

Inferences User Modeling Suggestions

Data Back-end Data

Bases Services and Client Signals

DeviceService End-points(Phone PC Xbox Web Browser Messaging Apps)

User Experienceldquocall taxirdquo

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

72

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

73

Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

74

Future Work Apply the proposed technology to domain discovery

not covered by the current systems but users are interested in guide the next developed domains

Improve the proposed approach by handling the uncertainty

SLUSLUModelingASR Knowledge

Acquisitionrecognition

errorsunreliable knowledge

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

75

d d d

U S1 S2

P(S1 | U) P(S2 | U)

hellip

Semantic RelationPosterior Probability

Utterance

Slot Candidate

hellip

w1 w2 wdWord Sequence x

Word Vector lw

Pooling Operation

R(U S1) R(U S2)

Knowledge Graph Propagation Matrix Wp

Semantic Projection Matrix Ws

Semantic Layer y

Knowledge Graph Propagation Layer lp

d

Sn

P(Sn | U)

Utterance Vector lf

hellip

R(U Sn)

Slot Vector lf

Convolution Matrix Wc

Convolutional Layer lc

Towards Unsupervised Deep Learning

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Treating MF as a one-layer neural net we can add more layers in the model towards unsupervised deep learning

76

Take Home Message Available big data wo annotations

Challenge how to acquire and organize important knowledge and further utilize it for applications

Language understanding for AI

language action understand voice to control music lights etc teach to let friends in by face recognition etc

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q amp ATHANKS FOR YOUR ATTENTIONS

bull Chen et al Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing in Proc of ASRU 2013 (Best Student Paper Award)

bull Chen et al Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding in Proc of NAACL-HLT 2015bull Chen et al Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding in Proc of ACL-IJCNLP 2015bull Chen et al ldquoDynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddingsrdquo in Proc of SLT 2014bull Chen et al ldquoLeveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding in Proc of ICMI 2015bull Chen et al ldquoMatrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling in Extended Abstract of NIPS-SLU 2015bull Chen et al ldquoUnsupervised User Intent Modeling by Feature-Enriched Matrix Factorization in Proc of ICASSP 2016

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

  • Statistical Learning from Dialogues for Intelligence Assistants
  • My Background
  • Outline
  • Outline (2)
  • What are Intelligent Assistants
  • Why do we need them
  • Why do we need them (2)
  • Why do companies care
  • Personal Intelligent Architecture
  • Personal Intelligent Architecture (2)
  • Outline (3)
  • Spoken Dialogue System (SDS)
  • What is Baymaxrsquos intelligence
  • SDS Architecture
  • Interaction Example
  • SDS Process ndash Available Domain Ontology
  • SDS Process ndash Available Domain Ontology (2)
  • SDS Process ndash Available Domain Ontology (3)
  • SDS Process ndash Spoken Language Understanding (SLU)
  • SDS Process ndash Spoken Language Understanding (SLU) (2)
  • SDS Process ndash Dialogue Management (DM)
  • SDS Process ndash Dialogue Management (DM) (2)
  • SDS Process ndash Dialogue Management (DM) (3)
  • SDS Process ndash Dialogue Management (DM) (4)
  • SDS Process ndash Natural Language Generation (NLG)
  • Required Knowledge
  • Challenges for SDS
  • Contributions
  • Contributions (2)
  • Contributions (3)
  • Knowledge Acquisition
  • SLU Modeling
  • SDS Architecture ndash Contributions
  • SDS Flowchart
  • SDS Flowchart ndash Semantic Decoding
  • Outline (4)
  • Semantic Decoding [ACL-IJCNLPrsquo15]
  • Frame-Semantic Parsing
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a]
  • Ontology Induction [ASRUrsquo13 SLTrsquo14a] (2)
  • 1st Issue How to adapt generic slots to a domain-specific sett
  • Semantic Decoding [ACL-IJCNLPrsquo15] (2)
  • Knowledge Graph Construction
  • Edge Weight Measurement SlotWord Embeddings Training (Levy and
  • Edge Weight Measurement
  • Knowledge Graph Propagation Model
  • Semantic Decoding [ACL-IJCNLPrsquo15] (3)
  • Feature Model + Knowledge Graph Propagation Model
  • 2nd Issue How to model the unobserved hidden semantics Matrix
  • Bayesian Personalized Ranking for MF
  • Matrix Factorization SLU (MF-SLU)
  • Semantic Decoding [ACL-IJCNLPrsquo15] (4)
  • Experimental Setup
  • Experiments of Semantic Decoding Quality of Semantics Estimatio
  • Experiments of Semantic Decoding Quality of Semantics Estimatio (2)
  • Experiments of Semantic Decoding Effectiveness of Relations
  • Experiments for Structure Learning Relation Discovery Analysis
  • Contributions of Semantic Decoding
  • Low- and High-Level Understanding
  • SDS Flowchart ndash Intent Prediction
  • Outline (5)
  • Intent Prediction of Mobile Apps [SLTrsquo14c]
  • Intent Prediction ndash Single-Turn Request
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15]
  • Intent Prediction ndash Multi-Turn Interaction [ICMIrsquo15] (2)
  • Experiments for Intent Prediction
  • Experiments for Intent Prediction (2)
  • Experiments for Intent Prediction (3)
  • Experiments for Intent Prediction (4)
  • Contributions of Intent Prediction
  • Personal Intelligent Architecture (3)
  • Outline (6)
  • Conclusions
  • Future Work
  • Towards Unsupervised Deep Learning
  • Take Home Message
  • Q amp A
Page 72: Statistical Learning from Dialogues for Intelligent Assistants

72

Outline Intelligent Assistant

What are they Why do we need them Why do companies care

Reactive Assistant ndash Spoken Dialogue System (SDS) Pipeline Architecture Current Challenges amp Overview Contributions

Semantic Decoding

Intent Prediction

Conclusions amp Future Work

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

73

Conclusions The work shows the feasibility and the potential for improving generalization maintenance efficiency and scalability of SDSs

The proposed knowledge acquisition procedure enables systems to automatically produce domain-specific ontologies

The proposed MF-SLU unifies the automatically acquired knowledge and then allows systems to consider implicit semantics for better understanding

Better semantic representations for individual utterances Better high-level intent prediction about follow-up behaviors

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

74

Future Work Apply the proposed technology to domain discovery

not covered by the current systems but users are interested in guide the next developed domains

Improve the proposed approach by handling the uncertainty

SLUSLUModelingASR Knowledge

Acquisitionrecognition

errorsunreliable knowledge

SORRY I DIDNT GET THAT -- STATISTICAL LEARNING FROM DIALOGUES FOR INTELLIGENT ASSISTANTS

75

Towards Unsupervised Deep Learning

[Figure: proposed deep architecture – word sequence x = (w1, w2, …, wd) → word vectors l_w → convolutional layer l_c (convolution matrix W_c) → pooling operation → utterance vector l_f / slot vector l_f → knowledge graph propagation layer l_p (propagation matrix W_p) → semantic layer y (semantic projection matrix W_s), producing semantic relations R(U, S_1), …, R(U, S_n) and posterior probabilities P(S_1 | U), …, P(S_n | U) for utterance U and slot candidates S_1, …, S_n]


Treating MF as a one-layer neural net, we can add more layers to the model, moving towards unsupervised deep learning.
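To make the "one-layer" remark concrete, here is a minimal illustrative sketch (names and dimensions are my own, not from the talk): the MF score u·s is a single bilinear layer, and stacking a nonlinear hidden layer over the concatenated vectors yields a deeper scoring model of the same kind.

```python
import numpy as np

# Illustrative latent vectors; dimensions k and h are arbitrary choices.
rng = np.random.default_rng(1)
k, h = 8, 16
u_vec = rng.standard_normal(k)        # utterance latent vector
s_vec = rng.standard_normal(k)        # slot latent vector

# MF as a one-layer net: the score is a single dot-product (bilinear) layer.
score_mf = float(u_vec @ s_vec)

# Adding depth: concatenate the vectors, apply a hidden ReLU layer, then score.
W1 = rng.standard_normal((h, 2 * k)) / np.sqrt(2 * k)
w2 = rng.standard_normal(h) / np.sqrt(h)
hidden = np.maximum(0.0, W1 @ np.concatenate([u_vec, s_vec]))  # ReLU hidden layer
score_deep = float(w2 @ hidden)
```

The deeper variant can, in principle, be trained with the same ranking-style (unsupervised) objective as the one-layer MF model.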

76

Take Home Message: Big data is available, but without annotations.

Challenge: how to acquire and organize important knowledge, and further utilize it for applications.

Language understanding for AI:

language → action: understand voice commands to control music, lights, etc.; teach the assistant to let friends in by face recognition, etc.


Unsupervised or weakly-supervised methods will be the future trend

Deep language understanding is an emerging field

77

Q & A: THANKS FOR YOUR ATTENTION

• Chen et al., "Unsupervised Induction and Filling of Semantic Slots for Spoken Dialogue Systems Using Frame-Semantic Parsing," in Proc. of ASRU, 2013. (Best Student Paper Award)
• Chen et al., "Jointly Modeling Inter-Slot Relations by Random Walk on Knowledge Graphs for Unsupervised Spoken Language Understanding," in Proc. of NAACL-HLT, 2015.
• Chen et al., "Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding," in Proc. of ACL-IJCNLP, 2015.
• Chen et al., "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings," in Proc. of SLT, 2014.
• Chen et al., "Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding," in Proc. of ICMI, 2015.
• Chen et al., "Matrix Factorization with Domain Knowledge and Behavioral Patterns for Intent Modeling," in Extended Abstract of NIPS-SLU, 2015.
• Chen et al., "Unsupervised User Intent Modeling by Feature-Enriched Matrix Factorization," in Proc. of ICASSP, 2016.

