2007/08/02 call routing shih-hsiang lin. 2 references classifiers –vector-based bell labs[cl 1999]...
2007/08/02
Call Routing
Shih-Hsiang Lin
References
• Classifiers
  – Vector-based
    • [CL 1999] Vector-Based Natural Language Call Routing, Bell Labs
    • [ACL’98] Dialogue Management in Vector-Based Call Routing, Bell Labs
    • [ICSLP 2002] Natural Language Call Routing: A Robust, Self-Organizing Approach, Bell Labs
  – Maximum Entropy Classifier
    • [ICASSP 2003] Speech Utterance Classification, Microsoft
    • [ICASSP 2006] Speech Utterance Classification Model Training Without Manual Transcriptions, Microsoft
  – Multinomial Model for Keywords
    • [ICASSP 1999] Automatic Topic Identification for Two-Level Call Routing, BBN
    • [ICASSP 2002] Unsupervised Training Techniques for Natural Language Call Routing, BBN
    • [ICSLP 2002] Speech-Enabled Natural Language Call Routing: BBN Call Director, BBN
  – Boosting
    • [ML 2000] BoosTexter: A Boosting-based System for Text Categorization, AT&T
    • [ASRU 2001] BoosTexter for Text Categorization in Spoken Language Dialogue, AT&T
  – Phonotactic Models
    • [NAACL 2003] Effective Utterance Classification with Unsupervised Phonotactic Models, AT&T Labs
    • [SC 2005] Task Independent Call Routing, East Anglia Univ.
References (cont.)
  – N-gram Classifier
    • [Eurospeech 2003] Discriminative Training of N-gram Classifiers for Speech and Text Routing, Microsoft
  – Multiple Classifier
    • [ASRU 2001] Natural Language Call Routing: Towards Combination and Boosting of Classifiers, Bell Labs
    • [Eurospeech 2005] Exploiting Unlabeled Data using Multiple Classifiers for Improved Natural Language Call-Routing, IBM
• Improves the Classifier
  – Discriminative Term Selection
    • [ICSLP 2002] Improving Latent Semantic Indexing based Classifier with Information Gain, Avaya Labs
  – Discriminative Training
    • [ICASSP 2001] Simplifying Design Specification for Automatic Training of Robust Natural Language Call Router, Bell Labs
    • [ICSLP 2002] Discriminative Training for Call Classification and Routing, Bell Labs
    • [SAP 2003] Discriminative Training of Natural Routers, Bell Labs
    • [ICASSP 2007] A Discriminative Training Framework Using N-Best Speech Recognition Transcriptions and Scores for Spoken Utterance Classification, Microsoft
  – Others
    • [ICASSP 2004] Extending Boosting for Call Classification Using Word Confusion Networks, AT&T
References (cont.)
• Improves the ASR results
  • [ICASSP 2007] Improving Automatic Call Classification using Machine Translation, IBM
  • [ICCC 2004] Improved LSI-based Natural Language Call Routing using Speech Recognition Confidence Scores, Ohio Univ.
• Out-of-domain (OOD) Detection
  • [ICASSP 2004] Out-of-domain Detection Based on Confidence Measures From Multiple Topic Classification, ATR
  • [ASLP 2007] Out-of-domain Utterance Detection Using Classification Confidences of Multiple Topics, ATR
• Other
  • [SIGCHI 2002] A Comparative Study of Speech in The Call Center: Natural Language Call Routing vs Touch-Tone Menus, BBN
Touch-Tone IVR
Architecture of a natural language call router
Call Routing
• Callers, having dealt with many IVRs (Interactive Voice Response systems) that are difficult to use, dislike touch-tone IVRs and seek agent assistance at the first opportunity
  – Because of high agent costs, call center managers continue to seek automation with IVRs
• How may I help you?
  – The goal of call routing is to understand the caller’s request and take the appropriate action
    • Routed to the appropriate destination
    • Transferred to a human operator
    • Asked a disambiguation question
• Typically, natural language call routing requires two statistical models
  • The first performs speech recognition to transcribe what the caller says
  • The second is the Action Classification (AC) model, which takes the spoken utterance and predicts the correct action to fulfill the caller’s request
    – Vector-Space Model, Naïve Bayes Classifier (NBC), Support Vector Machines (SVM), Boosting, Maximum Entropy (ME), etc.
Vector-Based Call Routing
Vector-Based Call Routing (cont.)
• Using vector-based information retrieval techniques
  – Each destination is represented as a vector in n-dimensional space
  – Given a query, a query vector is computed and compared to the existing document vectors
    • Those destinations whose vectors are similar to the query vector are returned
• Three issues must be addressed
  – Determine the vector representation for each destination
  – Determine how a caller request will be mapped to the same vector space for comparison with the destination vectors
  – Decide how the similarity between the request vector and each destination vector will be measured
Vector-Based Call Routing (cont.)
Vector-Based Call Routing (cont.)
Routing Module
Vector-Based Call Routing (cont.)
• Morphological Filtering and Stop Word Filtering
  – Concerned with the semantics of the words present in a document
  – Morphological processor
    • Extract the root form of each word in the corpus
    • Reduce singulars, plurals, gerunds and various verb forms to their root forms
      – e.g. {service, services, servicing} → service
  – Stop Word Filtering
    • Ignore list
      – consists of noise words which are common in spontaneous speech and can be removed without altering the meaning of an utterance
      – e.g. um, uh and ah
    • Stop list
      – enumerates words that are ubiquitous and therefore do not contribute to discriminating between destinations
      – e.g. the, be, for and morning
Vector-Based Call Routing (cont.)
• Term Extraction
  – In order to capture word co-occurrence, n-gram terms are extracted from the filtered texts
    • A list of n-gram terms and their counts is generated
      – When an n-gram term is extracted, all of the lower-order k-grams, where $1 \le k < n$, are also extracted
      – Thresholds are then applied to the n-gram counts to select salient terms
        » Unigrams: 2, bigrams: 3, trigrams: 3
        » Resulted in 62 trigrams, 275 bigrams and 420 unigrams
• Term-Document Matrix Construction
  – Construct an m×n term-document frequency matrix A
  – The matrix A is normalized so that each term vector is of unit length
    $$B_{t,d} = \frac{A_{t,d}}{\left(\sum_{1 \le e \le n} A_{t,e}^{2}\right)^{1/2}}$$
  – An inverse document frequency (IDF) weighting scheme is used to emphasize terms which occur in only a few documents
    $$IDF(t) = \log_2 \frac{n}{d(t)}, \qquad C_{t,d} = IDF(t) \cdot B_{t,d}$$
    where d(t) is the number of documents containing the term t
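The term extraction and weighting steps can be sketched in a few lines of Python. This is a minimal illustration, not the Bell Labs implementation: the function names `extract_terms` and `weighted_matrix` and the two toy "destinations" are assumptions, and the count thresholds from the slide are omitted for brevity.

```python
import math
from collections import Counter

def extract_terms(tokens, max_n=3):
    """Return every k-gram (1 <= k <= max_n) found in a list of tokens."""
    return [" ".join(tokens[i:i + k])
            for k in range(1, max_n + 1)
            for i in range(len(tokens) - k + 1)]

def weighted_matrix(docs, max_n=3):
    """docs: one token list per destination.
    Returns (terms, C), where C[t][d] = IDF(t) * B[t][d]."""
    counts = [Counter(extract_terms(doc, max_n)) for doc in docs]
    terms = sorted(set().union(*counts))
    n = len(docs)
    C = []
    for t in terms:
        row = [c[t] for c in counts]                # A[t][d]: raw term frequency
        norm = math.sqrt(sum(a * a for a in row))   # length of the term vector
        d_t = sum(1 for a in row if a > 0)          # number of documents containing t
        idf = math.log2(n / d_t)
        C.append([idf * a / norm for a in row])
    return terms, C

# Hypothetical toy corpus: one "document" of training requests per destination.
docs = [["check", "my", "balance"], ["open", "my", "account"]]
terms, C = weighted_matrix(docs)
```

A term such as "my", which appears in every document, gets an IDF of zero and so contributes nothing to discrimination, which is the intended effect of the weighting.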
Vector-Based Call Routing (cont.)
• Singular Value Decomposition and Vector Representation
  – To provide a uniform representation of term and document vectors and to reduce the dimensionality of the document vectors
  – Representing documents by row vectors in $V_r \cdot S_r$ allows us to make comparisons between documents as well as between queries and documents
  – For a query vector $Q$, the representing vector is obtained by multiplying $Q^T \cdot U_r$
• Candidate Destination Selection
  – Using cosine similarity measure
    $$\cos(d, q) = \frac{d \cdot q^{T}}{\|d\|\,\|q\|} = \frac{\sum_{1 \le i \le n} d_i q_i}{\left(\sum_{1 \le i \le n} d_i^{2} \cdot \sum_{1 \le i \le n} q_i^{2}\right)^{1/2}}$$
  – Although the raw vector cosine scores give some indication of the closeness of a request to a destination, the absolute value of closeness does not translate directly into the likelihood of correct routing
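As a small illustration of the cosine measure, the sketch below ranks destinations by similarity to a request vector. The helper names and the two-dimensional example vectors are assumptions made for the example, not values from the paper.

```python
import math

def cosine(d, q):
    """Cosine of the angle between destination vector d and request vector q."""
    dot = sum(di * qi for di, qi in zip(d, q))
    norm = math.sqrt(sum(di * di for di in d) * sum(qi * qi for qi in q))
    return dot / norm if norm else 0.0

def rank_destinations(dest_vectors, q):
    """Return (destination, cosine) pairs, most similar first."""
    scores = {name: cosine(vec, q) for name, vec in dest_vectors.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical destination vectors in a reduced 2-dimensional space.
destinations = {"loans": [0.9, 0.1], "cards": [0.1, 0.9]}
ranking = rank_destinations(destinations, [1.0, 0.0])
```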
Vector-Based Call Routing (cont.)
  – A sigmoid function is applied in this paper
    $$conf(d_a, d_b, q) = \frac{1}{1 + e^{-(d_a x + d_b)}}$$
    where $x = \cos(d, q)$ and $d_a$, $d_b$ are the sigmoid parameters fitted for destination $d$
    • From each call in the training data, for each destination, a {cosine value, routing value} pair is used for finding the parameters of the sigmoid function
    • Raw cosine scores: 92.2%; sigmoid confidence fitting: 93.5%
• Once we have obtained a confidence value for each destination
  – The final step in the routing process is to compare the confidence values to a predetermined threshold
    • Return those destinations whose confidence values are greater than the threshold as candidate destinations
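The confidence mapping and thresholding step might look like the following sketch. The per-destination parameters `(a, b)` and the threshold of 0.5 are illustrative assumptions; in the paper the sigmoid parameters are fitted from the training calls.

```python
import math

def confidence(cos_score, a, b):
    """Map a raw cosine score to a confidence with a per-destination sigmoid."""
    return 1.0 / (1.0 + math.exp(-(a * cos_score + b)))

def candidates(cos_scores, params, threshold=0.5):
    """Return (destination, confidence) pairs whose confidence exceeds the threshold."""
    out = []
    for dest, x in cos_scores.items():
        a, b = params[dest]          # hypothetical fitted parameters for this destination
        c = confidence(x, a, b)
        if c > threshold:
            out.append((dest, c))
    return sorted(out, key=lambda p: p[1], reverse=True)

selected = candidates({"loans": 0.9, "cards": 0.1},
                      {"loans": (10.0, -5.0), "cards": (10.0, -5.0)})
```

If no destination clears the threshold, the list is empty, which a router could treat as a cue to transfer to a human operator or ask a disambiguation question.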
Vector-Based Call Routing (cont.)
• Experimental Evaluation
  – 389 requests and 23 destinations
Performance on Transcriptions Performance on ASR outputs
Vector-based Call Routing yields about 96.7% and 93% classification accuracy for manual transcriptions and ASR outputs, respectively
Maximum Entropy Classifier
• Most speech utterance classification systems adopt a data-driven statistical learning approach
  – Requires manual transcriptions of speech utterances and annotations of classification destinations for the utterances
    • These are time-consuming and expensive, and have become a bottleneck for the rapid development of spoken language applications
  – In this paper, they investigate classification model training based on automatic word transcriptions
• Maximum Entropy Classifier
  $$\hat{C} = \arg\max_{C} P(C \mid A) = \arg\max_{C} \sum_{W} P(C \mid W, A)\, P(W \mid A) \approx \arg\max_{C} \sum_{W} P(C \mid W)\, P(W \mid A) \approx \arg\max_{C} P\big(C \mid \arg\max_{W} P(W \mid A)\big)$$
  $$P(C \mid W) = \frac{1}{Z(W)} \exp\Big(\sum_{f_i \in F} \lambda_i f_i(W, C)\Big), \qquad Z(W) = \sum_{C} \exp\Big(\sum_{f_i \in F} \lambda_i f_i(W, C)\Big)$$
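To make the maximum entropy posterior concrete, here is a minimal sketch that assumes binary features of the form "word w occurs in W and the class is C". The feature design, the weights, and the class names are illustrative assumptions, not taken from the paper.

```python
import math

def maxent_posterior(words, classes, weights):
    """P(C | W) = exp(sum_i lambda_i f_i(W, C)) / Z(W).
    weights maps (word, class) -> lambda for a binary word-presence feature."""
    scores = {c: sum(weights.get((w, c), 0.0) for w in set(words)) for c in classes}
    z = sum(math.exp(s) for s in scores.values())      # normalizer Z(W)
    return {c: math.exp(s) / z for c, s in scores.items()}

# Hypothetical weights for two ATIS-like classes.
weights = {("fly", "flight"): 2.0, ("fare", "airfare"): 2.0}
post = maxent_posterior(["i", "want", "to", "fly"], ["flight", "airfare"], weights)
best = max(post, key=post.get)
```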
Maximum Entropy Classifier (cont.)
• A straightforward way to train an ME classifier without manual transcriptions is to use ASR transcriptions
  – However, in-domain transcriptions are not available for language model (LM) training
    • The mismatch of language models results in an over 50% increase in classification error rate
  – Language model adaptation is needed
• One way to adapt the language model involves a small amount of transcribed data
  – Interpolated with the background language model
    • However, this is a supervised adaptation and transcribed data is needed
• An alternative is self-adaptation, or unsupervised adaptation
  – The speech utterances are first recognized with the background LM
  – The recognized strings are then used to train a domain-specific LM
  – Then, iteratively, the newly interpolated model is used to perform recognition again
Maximum Entropy Classifier (cont.)
• One problem with the self-adaptation mechanism is that the recognition errors are fed back into the new language model
  – e.g. if “a b c” is misrecognized as “a b d” in the first iteration, it is almost hopeless to recover from this error in subsequent iterations
• In this paper, a two-fold cross-validation unsupervised self-adaptation mechanism is used
  – The training utterances are randomly partitioned into two disjoint sets A and B
  – The background LM is used to recognize utterances in A
  – It is then adapted with the recognized text for the recognition of the utterances in B
  – The text recognized from B is in turn used to adapt the background LM to recognize again the utterances in A
  – This process iterates until there is no improvement in classification accuracy on the development set
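The control flow of the two-fold procedure can be sketched as below. The recognizer, the LM adaptation step and the development-set evaluation are passed in as callables because they are outside the scope of the slide; `recognize`, `adapt` and `dev_accuracy` are hypothetical interfaces, and `max_iters` is an added safety limit.

```python
import random

def two_fold_self_adaptation(utterances, background_lm, recognize, adapt,
                             dev_accuracy, max_iters=10, seed=0):
    """Alternate recognition between two disjoint halves so that an utterance is
    never decoded with an LM adapted on its own recognized text."""
    rng = random.Random(seed)
    shuffled = list(utterances)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    set_a, set_b = shuffled[:half], shuffled[half:]

    lm_for_a = background_lm                      # first pass over A uses the background LM
    best_acc, best_text = float("-inf"), None
    for _ in range(max_iters):
        text_a = [recognize(u, lm_for_a) for u in set_a]
        lm_for_b = adapt(background_lm, text_a)   # adapted with text from A only
        text_b = [recognize(u, lm_for_b) for u in set_b]
        acc = dev_accuracy(text_a + text_b)       # e.g. train a classifier, score on dev set
        if acc <= best_acc:                       # stop when there is no improvement
            break
        best_acc, best_text = acc, text_a + text_b
        lm_for_a = adapt(background_lm, text_b)   # adapted with text from B only
    return best_text, best_acc
```

The point of the split is that errors made on A can only influence the recognition of B, and vice versa, which limits the direct feedback of an utterance's own errors.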
Maximum Entropy Classifier (cont.)
• Experimental Evaluation
Task             ATIS
Training Set     5798
Development Set  410
Test Set         914
Supervised adaptation
Maximum Entropy Classifier (cont.)
Unsupervised adaptation
This self-adaptation mechanism reduces the CER by 26% over the baseline, and it outperforms the approach of LM adaptation with partially transcribed in-domain data
• It is interesting to note that the WERs of the training set are about 30% higher than the corresponding WERs of the test set
  – This indicates that improvement can be achieved if we correctly address the error feedback problem in the self-adaptation method
The two-fold cross-validation reduces the error feedback problem in LM self-adaptation
BBN Call Director
• BBN Call Director uses a statistical language model for speech recognition and a statistical topic identification (TID) system to identify the topic of the call
  – Uses a multinomial model for keywords
  – Incorporates two different classifiers
    • Bayesian classifier
    • Log Odds classifier
BBN Call Director (cont.)
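• A caller’s request can be defined as a sequence of words $r = \{r_i\}$, where each word $r_i \in W = \{w_1, \ldots, w_M\}$. $W$ is a keyword set of $M$ words and includes a non-keyword symbol. $T = \{t_1, \ldots, t_N\}$ is the set of all system topics
• The PDF of the caller’s request conditioned on topic $t_j$ can be modeled as a multinomial distribution
  $$p(r \mid t_j) = \prod_{i=1}^{M} p(w_i \mid t_j)^{n_i(r)}$$
  where $n_i(r)$ is the number of times word $w_i$ occurs in $r$
• The parameters $p(w_i \mid t_j)$ of the multinomial model are trained using ML estimation:
  $$\hat{p}(w_i \mid t_j) = \frac{n_{ij} + M_j / M}{\sum_{i=1}^{M} n_{ij} + M_j}$$
  where $n_{ij}$ is the number of times $w_i$ occurs in the training data for topic $t_j$ and $M_j$ is the number of unique words that occur in topic $t_j$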
BBN Call Director (cont.)
• Design of classifier
  – Bayesian Classifier: maximizing the posterior probability
    $$p(t_j \mid r) = \frac{p(r \mid t_j)\, p(t_j)}{p(r)}$$
    • The TID system returns a list of topics whose probability is above the rejection threshold
  – Log Odds Classifier: maximizing the posterior topic log odds
    $$\log \frac{p(t_j \mid r)}{1 - p(t_j \mid r)} = \log \frac{p(t_j)}{1 - p(t_j)} + \log \frac{p(r \mid t_j)}{p(r \mid \bar{t}_j)}$$
    where $\bar{t}_j$ denotes all topics other than $t_j$
Misrouted calls: 28%
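A compact sketch of the multinomial topic model with both decision scores follows. It assumes the request has already been mapped onto the keyword set (with `<other>` as a hypothetical non-keyword symbol) and that the word probabilities are already estimated; the topics, probabilities and priors are made up for illustration.

```python
import math

def log_likelihood(request, word_probs):
    """log p(r | t_j) = sum over words of log p(w | t_j); repeats count n_i(r) times."""
    return sum(math.log(word_probs[w]) for w in request)

def posteriors(request, topics, priors):
    """Bayes rule, with p(r) computed as the sum of the joint over all topics."""
    joint = {t: math.exp(log_likelihood(request, topics[t])) * priors[t] for t in topics}
    p_r = sum(joint.values())
    return {t: joint[t] / p_r for t in topics}

def log_odds(posterior):
    """Posterior log odds log(p / (1 - p)); assumes 0 < p < 1 for every topic."""
    return {t: math.log(p / (1.0 - p)) for t, p in posterior.items()}

topics = {
    "billing": {"bill": 0.6, "card": 0.3, "<other>": 0.1},
    "cards":   {"bill": 0.1, "card": 0.8, "<other>": 0.1},
}
priors = {"billing": 0.5, "cards": 0.5}
post = posteriors(["card", "<other>", "card"], topics, priors)
odds = log_odds(post)
```

A rejection threshold can then be applied to either score to decide which topics are returned.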
Naïve Bayes Classifier
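• The estimated probability of word $w_t$ in the document, given the class the document belongs to, is as follows
  $$\hat{P}(w_t \mid c_j; \hat{\theta}) = \frac{1 + \sum_{i=1}^{|D|} N(w_t, d_i)\, P(c_j \mid d_i)}{|V| + \sum_{s=1}^{|V|} \sum_{i=1}^{|D|} N(w_s, d_i)\, P(c_j \mid d_i)}$$
• The class prior probability is computed similarly but without smoothing
  $$\hat{P}(c_j \mid \hat{\theta}) = \frac{\sum_{i=1}^{|D|} P(c_j \mid d_i)}{|D|}$$
• The probability of a class model given a document is calculated by
  $$P(c_j \mid d_i; \hat{\theta}) = \frac{\hat{P}(c_j \mid \hat{\theta})\, P(d_i \mid c_j; \hat{\theta})}{P(d_i \mid \hat{\theta})} = \frac{\hat{P}(c_j \mid \hat{\theta}) \prod_{k=1}^{|d_i|} \hat{P}(w_{d_i,k} \mid c_j; \hat{\theta})}{\sum_{r=1}^{|C|} \hat{P}(c_r \mid \hat{\theta}) \prod_{k=1}^{|d_i|} \hat{P}(w_{d_i,k} \mid c_r; \hat{\theta})}$$
The class that achieves the highest posterior probability for the test document is selected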
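The estimates above can be sketched for the fully labeled case, where $P(c_j \mid d_i)$ is 1 for the labeled class and 0 otherwise. The function names and the toy help-desk data are assumptions; words outside the training vocabulary are simply skipped at classification time, which is one possible convention.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Estimate class priors and add-one smoothed word probabilities."""
    vocab = sorted({w for d in docs for w in d})
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    for d, c in zip(docs, labels):
        word_counts[c].update(d)
    priors = {c: n / len(docs) for c, n in class_counts.items()}   # no smoothing
    cond = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        cond[c] = {w: (1 + word_counts[c][w]) / (len(vocab) + total) for w in vocab}
    return priors, cond

def classify_nb(doc, priors, cond):
    """Pick the class with the highest posterior (computed in the log domain)."""
    scores = {}
    for c in priors:
        scores[c] = math.log(priors[c]) + sum(
            math.log(cond[c][w]) for w in doc if w in cond[c])
    return max(scores, key=scores.get)

docs = [["reset", "password"], ["password", "locked"],
        ["printer", "jam"], ["printer", "offline"]]
labels = ["account", "account", "printer", "printer"]
priors, cond = train_nb(docs, labels)
```

The denominator of the posterior is the same for every class, so it can be dropped when only the best class is needed.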
Naïve Bayes Classifier (cont.)
• Experimental Evaluation
Task            Technical assistance
Training Set    27K
Test Set        5644
Target classes  35
• The results show that performance improves as labeled data increases
Boosting
• Boosting is an iterative method for improving the accuracy of any given learning algorithm
• The premise of boosting is to produce a very accurate prediction rule by combining moderately inaccurate (weak) rules
  – The algorithm operates by learning a weak rule at each iteration so as to minimize the training error rate
Boosting (cont.)
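Normalization factor:
$$Z_t = \sum_{i=1}^{|X|} D_t(i) \exp\big(-\alpha_t\, y_i\, h_t(x_i)\big)$$
• The output of the final classifier is given below
$$H(x) = \mathrm{sign}\Big(\sum_{t=1}^{T} \alpha_t h_t(x)\Big)$$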
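The boosting loop can be illustrated with a textbook AdaBoost sketch over one-dimensional threshold rules. This is the standard discrete AdaBoost formulation, assumed here for illustration; it is not necessarily the exact variant used in the cited papers, and the data are made up.

```python
import math

def train_adaboost(xs, ys, rounds=5):
    """xs: list of floats; ys: labels in {-1, +1}.
    Weak rule: predict `sign` if x > thr, otherwise `-sign`."""
    n = len(xs)
    D = [1.0 / n] * n                                  # example weights D_t(i)
    model = []
    for _ in range(rounds):
        best = None
        for thr in sorted(set(xs)):
            for sign in (1, -1):
                preds = [sign if x > thr else -sign for x in xs]
                err = sum(w for w, p, y in zip(D, preds, ys) if p != y)
                if best is None or err < best[0]:
                    best = (err, thr, sign, preds)
        err, thr, sign, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)          # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, thr, sign))
        D = [w * math.exp(-alpha * y * p) for w, y, p in zip(D, ys, preds)]
        Z = sum(D)                                     # normalization factor Z_t
        D = [w / Z for w in D]
    return model

def predict(model, x):
    """H(x) = sign(sum_t alpha_t * h_t(x))."""
    total = sum(alpha * (sign if x > thr else -sign) for alpha, thr, sign in model)
    return 1 if total >= 0 else -1

model = train_adaboost([1, 2, 3, 4, 5, 6], [-1, -1, -1, 1, 1, 1])
```

Misclassified examples receive larger weights after each round, so the next weak rule is pushed toward the examples the current combination gets wrong.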
Boosting (cont.)
“Base” refers to AC accuracy obtained on the test data using only limited labeled data
“Rover” refers to AC accuracy results on the test data where classifiers are trained using the augmented training material
“Bound” refers to lumping labeled data and unlabeled data with their true labels
Combining the classifiers improved action classification accuracy compared to single-classifier performance consistently across a wide range of labeled data amounts
BoosTexter
• The basic idea of boosting is to build a highly accurate classifier by combining many “weak” or “simple” base classifiers
  – Each one of which may only be moderately accurate
• The collection of base classifiers is constructed in rounds
  – On each round t, the base learner is used to generate a base classifier $h_t$
  – Besides supplying the base learner with training data, the boosting algorithm also provides a set of nonnegative weights $w_t$ over the training examples
    • The weights encode how important it is that $h_t$ correctly classify each training example
      – to force the base learner to focus on the “hardest” examples
• In this paper, they use confidence-rated classifiers
  – Classifier h outputs a real number rather than -1 or +1
BoosTexter (cont.)
BoosTexter (cont.)
• The real-valued predictions of the final classifier can be converted into probabilities by passing them through a logistic function
  – we can regard the quantity $\frac{1}{1 + e^{-f(x)}}$ as an estimate of the probability that x belongs to class +1
  – For instance, the base classifier might be: “If the word ‘yes’ occurs in the utterance, then predict +1.731, else predict -2.171.”
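The slide's example rule and the logistic conversion fit in a few lines. In this sketch a single base classifier's output stands in for $f(x)$; in a full system $f(x)$ would be the sum over all base classifiers. The function names are assumptions.

```python
import math

def base_classifier(utterance):
    """Confidence-rated rule from the slide: +1.731 if 'yes' occurs, else -2.171."""
    return 1.731 if "yes" in utterance.lower().split() else -2.171

def to_probability(f_x):
    """Logistic function: estimate of P(class = +1 | x) from a real-valued score."""
    return 1.0 / (1.0 + math.exp(-f_x))

p_yes = to_probability(base_classifier("Yes please"))
p_no = to_probability(base_classifier("no thanks"))
```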
BoosTexter (cont.)
• Evaluation Results
Task 1 Task 2
Training samples 50 ~ 1600 100 ~ 2675
Test samples 2991 2000
Target classes 15 25
Phonotactic Models
• A major bottleneck in building data-driven speech processing applications is the need to manually transcribe training utterances into words
  – This paper proposes an unsupervised method for utterance classification
Phonotactic Models (cont.)
• Training procedure is divided into two phases
  – First, train a phone n-gram model
  – Second, train a classification model mapping phone strings to actions
    • Using BoosTexter classifier
Iterative procedure (repeated $N_{max}$ times)
Phonotactic Models (cont.)
• Experimental Evaluations
  – Experimental conditions
    • The suffixes (M and H) in the condition names refer to whether the training phases use inputs produced by machine (M) or human (H)
Task 1 Task 2 Task 3
Training samples 40106 10470 14355
Test samples 9724 5005 5000
Target classes 56 54 93
live English product information order transactions
Task 1 Task 2 Task 3
The phone-based method with short “phrasal” contexts has classification accuracy that is close to that provided by the longer phrasal contexts of trigram word recognition and word-string classification.
Phonotactic Models (cont.)
Task 1 Task 2 Task 3
• The effectiveness of unsupervised training is shown as follows
For all three tasks, unsupervised recognition model training improves both recognition and classification accuracy compared with a simple phone loop.
N-gram Classifier
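• One-pass Utterance Classification
  – Utterance A is assigned a class $\hat{C}(A)$ concurrently with speech decoding that finds the word string $\hat{W}$
    $$(\hat{C}, \hat{W}) = \arg\max_{C, W} \big[\log P(A \mid W) + \log P(W \mid C) + \log P(C)\big]$$
  – In a one-pass scenario one builds a recognition network that stacks each of the class-specific language models $P(\cdot \mid C)$, i.e. $P(w_i \mid w_{i-1}, \ldots, w_{i-N+1}, C)$, in parallel
• Two-pass Utterance Classification
    $$\hat{W} = \arg\max_{W} \big[\log P(A \mid W) + \log P(W)\big]$$
    where $P(W)$ is a universal n-gram language model $P(w_i \mid w_{i-1}, \ldots, w_{i-N+1})$
    $$\hat{C}(A) = \arg\max_{C} \big[\log P(\hat{W} \mid C) + \log P(C)\big]$$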
Beta Classifier
• Each topic is represented by a word vocabulary, and for each word we compute its probability in the topic and its weight
• A query $W_1^N = w_1, w_2, \ldots, w_N$ is routed to the destination j with the highest similarity measure
  $$\hat{j} = \arg\max_{j} T_j(w_1, w_2, \ldots, w_N) = \arg\max_{j} \frac{\beta_j^{\gamma_2}}{N} \sum_{k=1}^{N} P(w_k \mid T_j)\, \lambda_{w_k}^{\gamma_1}$$
  – $\lambda_{w_k}$ is the weight assigned to word $w_k$
  – Parameters $\gamma_1$ and $\gamma_2$ are estimated on a development corpus to boost the accuracy
  – The term $\beta_j$ is the weight assigned to topic $T_j$
    $$\beta_j = \frac{\sum_{t=1}^{N_j} \lambda_{w_t}}{\sum_{k=1}^{J} \sum_{t=1}^{N_k} \lambda_{w_t}}$$
    where $N_k$ is the number of words in the k-th topic-vocabulary
Relevance Feedback Technique
• It is hard for an average user to formulate a “good query”– Aids for good query formulation should be provided to users
• Assume $q_{orig}$ represents the original user call, $T$ the number of topics, $t_i$ the vector representing the i-th topic, and $Rel$ the set of relevant topics
• Hence, the classifier starts by computing the R best topics of the user query, builds the set $Rel$, and then reformulates the query as follows
  $$q_{new} = q_{orig} + \alpha_1 \frac{1}{R} \sum_{t_j \in Rel} t_j - \alpha_2 \frac{1}{T - R} \sum_{t_i \notin Rel} t_i$$
• $\alpha_1$ and $\alpha_2$ denote interpolation parameters: $\alpha_1$ represents how far the new vector should be pushed toward the relevant docs, and $\alpha_2$ represents how far it should be pushed away from the non-relevant ones
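The query reformulation can be sketched as a Rocchio-style update. The parameter names `alpha1` and `alpha2`, their default values, and the two-dimensional topic vectors are illustrative assumptions; in the automatic variant the relevant set would be the R best topics returned by the initial classifier.

```python
def reformulate(q_orig, topic_vectors, relevant, alpha1=0.5, alpha2=0.25):
    """Push the query toward the mean of the relevant topic vectors and away
    from the mean of the non-relevant ones."""
    dims = len(q_orig)
    rel = [v for name, v in topic_vectors.items() if name in relevant]
    non = [v for name, v in topic_vectors.items() if name not in relevant]

    def mean(vectors):
        if not vectors:
            return [0.0] * dims
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]

    rel_mean, non_mean = mean(rel), mean(non)
    return [q + alpha1 * r - alpha2 * n
            for q, r, n in zip(q_orig, rel_mean, non_mean)]

topics = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
q_new = reformulate([0.5, 0.5], topics, relevant={"a"})
```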
Constrained Minimization Technique
• Suppose we have two uncorrelated classifiers C1 and C2 which predict the topics t1 and t2, respectively, for query q
  – When both classifiers agree (t1 = t2), the result is the topic they share
  – When they disagree, a third classifier is invoked as an arbiter
    • The third classifier may be explicitly trained on disagreements of the first two using minimum error training
    • It can also choose among only a subset of topics
      – This subset may be computed according to the N-best topics proposed by each of the first two classifiers or according to a confusion measure
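The agree-or-arbitrate logic can be sketched as follows, using the N-best variant for the restricted topic subset. The classifier interfaces are assumptions: `clf1` and `clf2` are taken to return ranked topic lists, and `arbiter` is taken to pick one topic from a given subset.

```python
def combine(query, clf1, clf2, arbiter, n_best=2):
    """Return the shared top topic when the two classifiers agree; otherwise let
    the arbiter choose among the union of their n_best proposals."""
    ranked1, ranked2 = clf1(query), clf2(query)
    if ranked1[0] == ranked2[0]:
        return ranked1[0]
    subset = []
    for t in ranked1[:n_best] + ranked2[:n_best]:
        if t not in subset:                 # keep order, drop duplicates
            subset.append(t)
    return arbiter(query, subset)

# Hypothetical stand-ins for trained classifiers.
agree = combine("q", lambda q: ["a", "b", "c"], lambda q: ["a", "c", "b"],
                arbiter=lambda q, s: s[-1])
disagree = combine("q", lambda q: ["a", "b", "c"], lambda q: ["c", "a", "b"],
                   arbiter=lambda q, s: s[-1])
```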
Experimental Results
• Experimental Evaluation
                Task 1        Task 2
Corpus          USAA banking  OASIS BT
Training Set    ???           7400
Test Set        4000          1000
Target classes  23            15
WER             30%           48.1%
DT: Discriminative Training
ARF: Automatic Relevance Feedback
• The better the initial classifier, the less the improvement from boosting
• The reformulation of the user request can help the classifier
Experimental Results (cont.)
LI: Linear Interpolation
CM_D: according to confusion measure
CM_N: according to N-best topics (N=2)
• Experiments show that combining these two classifiers is a good way to improve performance
• On the BT corpus the combination did not give a significant improvement. The reason is that the disagreement between the first two classifiers is quite high on the entire test set, about 65%
Discriminative Term Selection
• In the previous LSA (or LSI) based approach, terms are selected based on their occurrence statistics in the training data
  – Terms selected or discarded in this process may or may not be salient
• Therefore, term selection is an active research area
  – A subset of terms can be chosen based on the value of importance factors
    • e.g. Information Gain (IG), Mutual Information (MI), χ²-test, etc.
• In this paper, the discriminative power of a term is measured by the average entropy variation on the topics when the term is present or absent
  – Each term is assigned a numeric value that indicates the importance of the term
Discriminative Term Selection (cont.)
• The IG score of a term ti is calculated according to the following formulas
The right side of Formula (1) can be calculated as follows
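As a rough illustration of an entropy-based term score, the sketch below computes the textbook information gain of a term: the topic entropy minus the average topic entropy once it is known whether the term is present or absent. The function names, the toy `docs` data, and the choice of this particular IG definition are assumptions for illustration and may differ from the paper's own formulas.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, term):
    """Textbook IG of `term`; docs is a list of (set_of_terms, topic) pairs."""
    all_topics = Counter(topic for _, topic in docs)
    with_term = Counter(topic for terms, topic in docs if term in terms)
    without_term = Counter(topic for terms, topic in docs if term not in terms)

    n = len(docs)
    n_with = sum(with_term.values())
    n_without = n - n_with

    # Average topic entropy after observing presence/absence of the term.
    h_conditional = (
        (n_with / n) * entropy(with_term.values())
        + (n_without / n) * entropy(without_term.values())
    )
    return entropy(all_topics.values()) - h_conditional


# Hypothetical toy training data: (terms in the utterance, topic label).
docs = [
    ({"check", "balance"}, "accounts"),
    ({"my", "balance"}, "accounts"),
    ({"my", "loan"}, "loans"),
    ({"loan", "rate"}, "loans"),
]
ig_balance = information_gain(docs, "balance")  # term that separates the topics
ig_my = information_gain(docs, "my")            # term spread evenly over topics
```

In this toy data, "balance" occurs only in one topic and receives a high score, while "my" is spread evenly across topics and receives a score of zero, which is the behaviour a discriminative term selector relies on.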
Discriminative Term Selection (cont.)
• Experimental Evaluation
– Three sets of comparative experiments were performed
  • Baseline, term count approach, IG approach

| Task | Enterprise call centre |
|---|---|
| Training samples | 3510 / 1755 / 1404 |
| Test samples | 307 |
| Target classes | 21 |

The experimental results indicate that the proposed approach outperforms the other two approaches in all three conditions
Improving Automatic Call Classification using Machine Translation
• Utilize the translation model in statistical machine translation (SMT)
– To capture the relation between the truth and the ASR transcribed text
– The model is trained using the human transcribed text and the ASR transcribed text
– The ASR transcribed text is sanitized before feeding the classifier
• The sanitization process is thought of as a translation process
– SOURCE: ASR output, TARGET: human transcribed text
• The IBM statistical Translation models are based on the source-channel paradigm of communication theory
Noisy Communication Channel: clean sentence c (source language) → noisy sentence n (target language)

$$\hat{c} = \arg\max_{c}\, p(c \mid n) = \arg\max_{c}\, p(n \mid c)\, P(c)$$

p(n | c): translation model probability
P(c): language model probability
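The source-channel decision rule can be sketched as a search over candidate clean sentences, scoring each by translation model probability times language model probability (added in the log domain). The candidate list, the two toy probability tables, and all function names below are hypothetical; a real system would obtain these scores from trained models.

```python
def decode(noisy, candidates, translation_logprob, language_logprob):
    """Return the candidate c maximizing log p(n|c) + log P(c)."""
    return max(
        candidates,
        key=lambda c: translation_logprob(noisy, c) + language_logprob(c),
    )


# Hypothetical toy scores (natural-log probabilities).
LM = {"check my balance": -1.0, "check my valance": -9.0}
TM = {
    ("check my valance", "check my balance"): -2.0,  # log p(n | c)
    ("check my valance", "check my valance"): -0.1,
}

best = decode(
    "check my valance",
    list(LM),
    translation_logprob=lambda n, c: TM[(n, c)],
    language_logprob=lambda c: LM[c],
)
```

Here the language model penalizes the implausible candidate strongly enough that the clean sentence "check my balance" wins, even though copying the noisy input unchanged has the higher translation score.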
Improving Automatic Call Classification using Machine Translation (cont.)
• This paper used IBM Model 2 to learn the relationship between the clean utterances and the ASR transcribed text
– For a clean sentence c of length l, choose the length m of the noisy sentence from the distribution p(m | l) (length model)
– For each position j=1,2,…,m in the noisy sentence, choose a position a_j in the clean sentence from the distribution p(a_j | j, m, l) (alignment model)
– For each word at j=1,2,…,m in the noisy sentence, choose a word c_j from the manual transcription according to the distribution p(c_j | n_j) (lexicon model)
• The probability of generating the noisy sentence n = n_1 n_2 … n_m from the clean sentence c = c_1 c_2 … c_l is given by

$$p(n \mid c) = p(m \mid l) \prod_{j=1}^{m} \sum_{i=0}^{l} p(c_i \mid n_j)\, p(a_j = i \mid j, m, l)$$
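To make the three components concrete, here is a minimal sketch of the sentence-level probability in the textbook IBM Model 2 form: a length term, then for every noisy position a sum over clean positions of lexicon probability times alignment probability. It assumes the textbook lexicon direction (noisy word given aligned clean word), which may differ from the notation on the slide, and it treats position 0 as an empty `<NULL>` word. All names and toy probabilities are hypothetical.

```python
def model2_likelihood(noisy, clean, length_prob, lexicon, alignment):
    """p(n|c) = p(m|l) * prod_j sum_i lexicon(n_j | c_i) * alignment(i | j, m, l)."""
    l, m = len(clean), len(noisy)
    clean_with_null = ["<NULL>"] + clean  # index 0 is the empty word
    prob = length_prob(m, l)
    for j, n_word in enumerate(noisy, start=1):
        prob *= sum(
            lexicon.get((n_word, c_word), 0.0) * alignment(i, j, m, l)
            for i, c_word in enumerate(clean_with_null)
        )
    return prob


# Hypothetical toy parameters: keys are (noisy_word, clean_word).
lexicon = {
    ("check", "check"): 0.9,
    ("valance", "balance"): 0.3,
    ("balance", "balance"): 0.6,
}


def uniform_alignment(i, j, m, l):
    """Every clean position (including NULL) is equally likely."""
    return 1.0 / (l + 1)


likelihood = model2_likelihood(
    noisy=["check", "valance"],
    clean=["check", "balance"],
    length_prob=lambda m, l: 0.5,
    lexicon=lexicon,
    alignment=uniform_alignment,
)
```

With a uniform alignment distribution the model reduces to IBM Model 1; in practice the lexicon and alignment tables would be estimated with EM from paired manual and ASR transcriptions.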
Improving Automatic Call Classification using Machine Translation (cont.)
• Experimental Evaluation

| | Task 1 | Task 2 |
|---|---|---|
| Training samples | 7636 | 7848 |
| Test samples | 1300 | 1300 |
| Target classes | 36 | 28 |
| WER | 28% | 21% |

Classifier: TF-IDF
Enterprise call centre
• Manual training vs. manual testing gives the best performance (benchmark)
• N-best vs. N-best yields about 8% improvement
OOD Detection
• Definition of out-of-domain (OOD) for various systems
• Research on OOD detection is limited
– Conventional studies have typically focused on using recognition confidences for rejecting erroneous recognition outputs
  • There is no discrimination between in-domain utterances that have been incorrectly recognized and OOD utterances
OOD Detection (cont.)
• In this paper, three topic classification schemes are evaluated
– Word N-gram, LSA, SVM
• Classification confidence score is calculated as below
$$C(t_j \mid X) = \sum_{i=1}^{N} p(s_i \mid X)\, C(t_j \mid s_i)$$
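The confidence score can be read as a weighted sum over recognition hypotheses. The sketch below assumes that s_i is the i-th entry of the recognizer's N-best list, that p(s_i | X) is its posterior, and that C(t_j | s_i) is a per-hypothesis topic score from one of the classifiers; these readings, the keyword-based scorer, and all names are assumptions for illustration.

```python
def topic_confidences(nbest, topic_score, topics):
    """Return C(t_j | X) per topic as a posterior-weighted sum over hypotheses."""
    return {
        topic: sum(posterior * topic_score(topic, hyp) for hyp, posterior in nbest)
        for topic in topics
    }


# Hypothetical per-hypothesis scorer: high score if a topic keyword appears.
KEYWORDS = {"accommodation": {"hotel"}, "transport": {"taxi"}}


def keyword_score(topic, hypothesis):
    return 0.9 if KEYWORDS[topic] & set(hypothesis.split()) else 0.1


# Hypothetical N-best list: (hypothesis text, posterior p(s_i | X)).
nbest = [("book a hotel", 0.6), ("look a hotel", 0.3), ("book a motel", 0.1)]
scores = topic_confidences(nbest, keyword_score, ["accommodation", "transport"])
```

Summing over several hypotheses lets a topic keep a high confidence even when the top hypothesis contains a recognition error, as long as the competing hypotheses still support the same topic.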
OOD Detection (cont.)
• The final stage of OOD detection consists of applying an in-domain verification model to the vector of confidence scores generated during topic classification
– The in-domain verification model is trained using only in-domain data
– The model is trained by combining GPD and deleted interpolation
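$$G_{\text{in-domain}}(X) = \begin{cases} 1 \ (\text{in-domain}) & \text{if } \sum_{j=1}^{M} \lambda_j\, C(t_j \mid X) \ge \varphi \\ 0 \ (\text{OOD}) & \text{otherwise} \end{cases}$$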
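A minimal sketch of the verification step, assuming the model is a weighted sum of the M topic confidence scores compared against a threshold. The weight and threshold values below are made up; in the paper they would come from training on in-domain data (GPD with deleted interpolation), which is not shown here.

```python
def in_domain_verification(confidences, weights, threshold):
    """Return 1 (in-domain) if the weighted confidence sum reaches the threshold, else 0 (OOD)."""
    if len(confidences) != len(weights):
        raise ValueError("need one weight per topic confidence")
    score = sum(w * c for w, c in zip(weights, confidences))
    return 1 if score >= threshold else 0


# Hypothetical values for a three-topic system.
weights = [1.0, 1.0, 1.0]
threshold = 0.5

in_domain = in_domain_verification([0.82, 0.10, 0.05], weights, threshold)
out_of_domain = in_domain_verification([0.10, 0.10, 0.10], weights, threshold)
```

An utterance that no topic classifier claims with confidence produces a low weighted sum and is rejected as OOD, without the system ever having seen OOD training examples.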
OOD Detection (cont.)
• Experimental Evaluation
Speech Recognition Performance
• The SVM approach achieves the best performance
– A minimum EER of 19.6% was obtained when 3-gram features were used
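For readers unfamiliar with the metric, the equal error rate (EER) is the operating point at which the false rejection rate of in-domain utterances equals the false acceptance rate of OOD utterances. The sketch below estimates it by sweeping the decision threshold over the observed scores; the function name and the toy scores are hypothetical, and the approach is a simple approximation rather than the paper's evaluation code.

```python
def equal_error_rate(in_domain_scores, ood_scores):
    """Approximate EER: threshold where false rejection and false acceptance rates are closest."""
    best_gap, best_eer = None, None
    for th in sorted(set(in_domain_scores) | set(ood_scores)):
        # In-domain utterances scoring below the threshold are falsely rejected.
        frr = sum(s < th for s in in_domain_scores) / len(in_domain_scores)
        # OOD utterances scoring at or above the threshold are falsely accepted.
        far = sum(s >= th for s in ood_scores) / len(ood_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Hypothetical verification scores.
in_scores = [0.9, 0.8, 0.7, 0.4]
ood_scores = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(in_scores, ood_scores)
```

On this toy data one in-domain utterance scores below one OOD utterance, so the best threshold misclassifies one utterance of each kind.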
OOD Detection (cont.)
• The baseline system has an EER of 27.7%
• The proposed method provides an absolute reduction in EER of 6.5 points (to 21.2%)
– It offers comparable performance to the closed evaluation case while being trained with only in-domain data
• The EER of the proposed system when applied to the ASR results is 22.7%
– The small increase in EER suggests that the system is highly robust against recognition errors
10-best