
Page 1

Analysis for Spoken Language Translation Using Phrase-Level Parsing and Domain Action Classification

Chad Langley
Language Technologies Institute, Carnegie Mellon University

June 9, 2003

Page 2

Outline

• Interlingua-Based Machine Translation
• NESPOLE! MT System Overview
• Interchange Format Interlingua
• Hybrid Analysis Approach
• Evaluation
  – Domain Action Classification
  – End-to-End Translation
• Summary

Page 3

Interlingua-Based Machine Translation

[Diagram: analyzers for each source language (Arabic, Chinese, English, French, German, Italian, Japanese, Korean, Spanish) map input into a common Interlingua, from which generators produce output in each target language.]

Page 4

Interlingua-Based MT at Carnegie Mellon

• Long line of research in interlingua-based machine translation of spontaneous conversational speech
  – C-STAR I (appointment scheduling)
  – Enthusiast (passive Spanish→English)
  – C-STAR II (travel planning)
  – LingWear (wearable tourist assistance)
  – Babylon (handheld medical assistance)
  – NESPOLE! (travel & tourism and medical assistance)

Page 5

NESPOLE! Overview

• Human-to-human speech-to-speech machine translation over the Internet
• Domains:
  – Travel & Tourism
  – Medical Assistance
• Languages:
  – English – Carnegie Mellon University
  – German – Universität Karlsruhe
  – Italian – ITC-irst
  – French – Université Joseph Fourier
• Additional Partners:
  – AETHRA Telecommunications
  – APT Trentino Tourism Board

Page 6

NESPOLE! Architecture

• Mediator connects users to the translation servers
• Language-specific servers for each language exchange Interchange Format to perform translation

Page 7

NESPOLE! Language Servers

• Analysis Chain: Speech → Text → IF
• Generation Chain: IF → Text → Speech
• Connect the source language analysis chain to the target language generation chain to translate

Page 8

NESPOLE! User Interface

Page 9

Interchange Format Overview

• Interchange Format (IF) is a shallow semantic interlingua for task-oriented domains

• Captures speaker intention rather than literal meaning

• Abstracts away from language-specific syntax and predicate-argument structure

• Represents utterances as sequences of Semantic Dialogue Units (SDUs)

Page 10

Interchange Format Representation

• IF representation consists of four parts:
  1. Speaker
  2. Speech Act
  3. Concepts
  4. Arguments

speaker : speech_act +concept* (arguments*)

• Domain Action combines the domain-independent speech act and the domain-dependent concepts (the speech_act +concept* portion of the template above)

Page 11

Interchange Format Specification

• Defines the sets of speech acts, concepts, and arguments
  – 72 speech acts + 3 “prefix” speech acts
  – 144 concepts
  – 227 top-level arguments
• Defines constraints on how components can be combined
  – Domain actions are formed compositionally based on the constraints for combining speech acts and concepts
  – Arguments must be licensed by at least one element of the domain action

Page 12

Example

“Hello. I would like to take a vacation in Val di Fiemme.”

hello i would like to take a vacation in val di fiemme

c:greeting (greeting=hello)

c:give-information+disposition+trip
  (disposition=(who=i, desire),
   visit-spec=(identifiability=no, vacation),
   location=name-val_di_fiemme_area)

ENG: Hello! I want to travel for a vacation at Val di Fiemme.

ITA: Salve. Io vorrei una vacanza in Val di Fiemme.
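To make the IF template concrete, the following minimal Python sketch (not the NESPOLE! code) splits an IF string such as the second SDU above into its four parts. The function name split_if and the simple splitting of top-level arguments on unnested commas are illustrative assumptions.

import re

def split_if(if_string):
    """Split an IF string of the form  speaker : speech_act +concept* (arguments*)
    into speaker, speech act, concept list, and top-level argument list."""
    match = re.match(r"\s*(\w+)\s*:\s*([^ (]+)\s*(?:\((.*)\))?\s*$", if_string, re.S)
    if match is None:
        raise ValueError("not a well-formed IF string: %r" % if_string)
    speaker, domain_action, arg_body = match.groups()
    speech_act, *concepts = domain_action.split("+")
    body = arg_body or ""
    args, depth, start = [], 0, 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:       # split only on unnested commas
            args.append(body[start:i].strip())
            start = i + 1
    if body.strip():
        args.append(body[start:].strip())
    return speaker, speech_act, concepts, args

# The second SDU of the example above:
print(split_if("c:give-information+disposition+trip "
               "(disposition=(who=i, desire), "
               "visit-spec=(identifiability=no, vacation), "
               "location=name-val_di_fiemme_area)"))
# -> ('c', 'give-information', ['disposition', 'trip'],
#     ['disposition=(who=i, desire)', 'visit-spec=(identifiability=no, vacation)',
#      'location=name-val_di_fiemme_area'])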

Page 13

Why Hybrid Analysis?

• Goal: A portable and robust analyzer for task-oriented IF-based speech-to-speech MT

• Previous IF-based MT systems used full semantic grammars to parse complete DAs
  – Useful for parsing spoken language in restricted domains
  – Difficult to port to new domains

• Continue to use semantic grammars to parse small domain-independent DAs and phrase-level arguments

• Train classifiers to identify DAs

Page 14

Hybrid Analysis Approach

Use a combination of grammar-based phrase-level parsing and machine learning to produce interlingua (IF) representations

Page 15

Hybrid Analysis Approach

Input:  hello i would like to take a vacation in val di fiemme

Output:
  c:greeting (greeting=hello)
  c:give-information+disposition+trip (disposition=(who=i, desire), visit-spec=(identifiability=no, vacation), location=name-val_di_fiemme_area)

Page 16

[Argument parsing: the phrase-level grammars label greeting=, disposition=, visit-spec=, and location= over "hello i would like to take a vacation in val di fiemme".]

Page 17

[Segmentation: the parsed utterance is split into two Semantic Dialogue Units, SDU1 and SDU2.]

Page 18

[Domain action classification: SDU1 is assigned the DA greeting and SDU2 the DA give-information+disposition+trip.]

Page 19

Argument Parsing

• Parse utterances using phrase-level grammars

• SOUP Parser: Stochastic, chart-based, top-down robust parser designed for real-time analysis of spoken language

• Separate grammars based on the type of phrases that the grammar is intended to cover

Page 20

Grammars

• Argument grammar
  – Identifies arguments defined in the IF
      s[arg:activity-spec=]
        (*[object-ref=any] *[modifier=good] [biking])
  – Covers "any good biking", "any biking", "good biking", "biking", plus synonyms for all three words (see the sketch below)
• Pseudo-argument grammar
  – Groups common phrases with similar meanings into classes
      s[=arrival=] (*is *usually arriving)
  – Covers "arriving", "is arriving", "usually arriving", "is usually arriving", plus synonyms

Page 21

Grammars

• Cross-domain grammar
  – Identifies simple domain-independent DAs
      s[greeting]
        ([greeting=first_meeting] *[greet:to-whom=])
  – Covers "nice to meet you", "nice to meet you donna", "nice to meet you sir", plus synonyms
• Shared grammar
  – Contains low-level rules accessible by all other grammars

Page 22

Segmentation

• Goal: Split utterances into Semantic Dialogue Units so Domain Actions can be assigned
• Potential SDU boundaries occur between argument parse trees and/or unparsed words
• An SDU boundary is present if there is a parse tree from the cross-domain grammar on either side of a potential boundary position
• Otherwise, use a memory-based classifier to determine whether an SDU boundary is present (see the sketch below)
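A minimal sketch of this decision rule, assuming each side of a potential boundary is summarized by the name of the grammar that produced its parse (or None for an unparsed word); the classify callback stands in for the trained memory-based classifier and is not an actual analyzer interface.

def is_sdu_boundary(left_parse, right_parse, classify):
    """Decide whether a potential SDU boundary is an actual boundary.
    left_parse / right_parse: originating grammar of the adjacent parse tree
    ("cross-domain", "argument", "pseudo-argument") or None for an unparsed word.
    classify: fallback classifier returning True/False (e.g. a trained TiMBL model)."""
    # A parse from the cross-domain grammar on either side forces a boundary.
    if left_parse == "cross-domain" or right_parse == "cross-domain":
        return True
    # Otherwise defer to the learned segmentation classifier.
    return classify(left_parse, right_parse)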

Page 23

Segmentation Classifier

• The segmentation classifier is a memory-based classifier implemented using TiMBL
• Input: 10 features based on word and parse information surrounding a potential boundary
• Output: Binary decision about presence of an SDU boundary
• Training Data: Potential SDU boundaries extracted from utterances manually annotated with SDU boundaries and parsed with the phrase-level grammars

Page 24

Segmentation Features

• Preceding parse label (A-1)
• Probability a boundary follows A-1 (P(A-1 •))
• Preceding word (w-1)
• Probability a boundary follows w-1 (P(w-1 •))
• Number of words since the last boundary
• Number of argument parse trees since the last boundary
• Following parse label (A1)
• Probability a boundary precedes A1 (P(• A1))
• Following word (w1)
• Probability a boundary precedes w1 (P(• w1))
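The sketch below assembles these ten features for one potential boundary; the feature names, the probs lookup tables, and the default of 0.0 for unseen events are illustrative assumptions rather than the actual implementation.

def boundary_features(prev_label, prev_word, next_label, next_word,
                      words_since_boundary, parses_since_boundary, probs):
    """Build the 10-feature vector for one potential SDU boundary.
    probs holds the four count-based probability tables described on the
    next slide; unseen labels/words default to 0.0."""
    return {
        "A-1": prev_label,                                    # preceding parse label
        "P(A-1 .)": probs["after_label"].get(prev_label, 0.0),
        "w-1": prev_word,                                     # preceding word
        "P(w-1 .)": probs["after_word"].get(prev_word, 0.0),
        "words_since_boundary": words_since_boundary,
        "parses_since_boundary": parses_since_boundary,
        "A1": next_label,                                     # following parse label
        "P(. A1)": probs["before_label"].get(next_label, 0.0),
        "w1": next_word,                                      # following word
        "P(. w1)": probs["before_word"].get(next_word, 0.0),
    }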

Page 25

Segmentation Features

• Probability features are estimated using counts from the training data (see the sketch below)
  – P(A-1 •) = C(A-1 •) / C(A-1)
  – P(w-1 •) = C(w-1 •) / C(w-1)
  – P(• A1) = C(• A1) / C(A1)
  – P(• w1) = C(• w1) / C(w1)
• 3 segmentation training examples in “hello i would like to take a vacation in val di fiemme”
  – 1 positive (between “hello” and “i”)
  – 2 negative (between “to” and “take”; between “vacation” and “in”)
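A small sketch of the count-based estimation for one of the four tables (boundaries following a given preceding word); the other three tables are built analogously. The data structures are illustrative.

from collections import Counter

def boundary_probabilities(examples):
    """Estimate P(w-1 .) = C(w-1 .) / C(w-1) from training examples, where
    examples is a list of (preceding_word, is_boundary) pairs."""
    word_counts, boundary_counts = Counter(), Counter()
    for preceding_word, is_boundary in examples:
        word_counts[preceding_word] += 1
        if is_boundary:
            boundary_counts[preceding_word] += 1
    return {w: boundary_counts[w] / word_counts[w] for w in word_counts}

# The three potential boundaries in the example utterance:
examples = [("hello", True), ("to", False), ("vacation", False)]
print(boundary_probabilities(examples))
# -> {'hello': 1.0, 'to': 0.0, 'vacation': 0.0}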

Page 26

Evaluation: Segmentation

• Data: English and German in Travel & Tourism and Medical Assistance domains
• TiMBL parameters: IB1 (k-NN) algorithm with Gain Ratio feature weighting, k=5, and unweighted voting
• Evaluated using 20-fold cross validation with “in-turn” examples

                         English Travel   German Travel   English Medical   German Medical
Accuracy                 92.64%           93.04%          96.23%            93.26%
Training Examples        35690            46170           42187             7792
In-Turn Examples         23844            31234           27873             5522
Turn Boundary Examples   11846            14936           14314             2270

Page 27

Domain Action Classification

• Goal: Identify the DA for each SDU using TiMBL memory-based classifiers

• Split DA classification into two subtasks (Speech Act and Concept Sequence); see the sketch below
  – Reduces the number of classes for each classifier
  – Allows for different approaches and/or feature sets for each task
  – Allows for DAs that did not occur in the data
• Also classify the complete DA directly
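For concreteness, recombining the two subtask outputs into a complete DA is a string concatenation, assuming concept sequences carry their leading '+' (as in the corpus statistics that follow) and that the empty sequence corresponds to "No concepts"; this helper is illustrative, not part of the system.

def combine_sa_cs(speech_act, concept_sequence):
    """Form a complete domain action from a speech act and a concept
    sequence; an empty concept sequence yields the speech act alone."""
    return speech_act + concept_sequence if concept_sequence else speech_act

print(combine_sa_cs("give-information", "+disposition+trip"))  # give-information+disposition+trip
print(combine_sa_cs("acknowledge", ""))                        # acknowledge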

Page 28

DA Classification Data

Corpus Information
                     English Travel   German Travel   English Medical   German Medical
SDUs                 8289             8719            3664              2294
Domain Actions       972              1001            462               286
Speech Acts          70               70              50                43
Concept Sequences    615              638             305               179
Vocabulary Size      1946             2815            1694              1112

Most Frequent DAs, SAs, and Concept Sequences
     English Travel           German Travel            English Medical                   German Medical
DA   19.2% acknowledge        19.7% acknowledge        25.1% give-information+exp+h-s    27.2% acknowledge
SA   41.4% give-information   40.7% give-information   59.7% give-information            35.3% give-information
CS   38.9% No concepts        40.3% No concepts        35.0% +experience+health-status   47.3% No concepts

Page 29

DA Classifiers

• SA, CS, and DA classifiers implemented using TiMBL memory-based learner

• Input: Binary features indicate presence or absence of argument and pseudo-argument labels in the phrase-level parse (200-300 features); see the sketch below
  – CS classifier also uses the corresponding SA
• Output: Best class (SA, CS, or DA)
• Training Data: SDUs manually annotated with IF representations and parsed with the argument parser
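A minimal sketch of the input encoding: one binary feature per known argument or pseudo-argument label, indicating whether that label appears in the SDU's phrase-level parse. The label inventory shown is illustrative, not the full 200-300 features.

def parse_label_features(parse_labels, feature_labels):
    """Binary feature vector: 1 if the label occurs in the SDU's phrase-level
    parse, else 0. feature_labels is the fixed label inventory from training."""
    present = set(parse_labels)
    return [1 if label in present else 0 for label in feature_labels]

# SDU2 of the running example, against a tiny illustrative label inventory:
feature_labels = ["greeting=", "disposition=", "visit-spec=", "location=", "=arrival="]
print(parse_label_features(["disposition=", "visit-spec=", "location="], feature_labels))
# -> [0, 1, 1, 1, 0]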

Page 30

DA Classification

• Data: English and German in Travel & Tourism and Medical Assistance domains

• TiMBL parameters: IB1 (k-NN) algorithm with Gain Ratio feature weighting, k=1

• 20-fold cross validation

                          English Travel   German Travel   English Medical   German Medical
Speech Act                69.82%           67.57%          77.73%            68.61%
Concept Sequence          69.59%           67.08%          64.71%            69.93%
Domain Action (SA + CS)   49.63%           46.50%          51.53%            50.91%
Domain Action (direct)    49.69%           46.51%          51.56%            51.18%

Page 31

Comparison of Learning Approaches

• Learning Approaches
  – Memory-Based Learning (TiMBL)
  – Decision Trees (C4.5)
  – Neural Networks (SNNS)
  – Naïve Bayes (Rainbow)
• Important Considerations
  – Accuracy
  – Speed of training and classification
  – Accommodation of discrete and continuous features from multiple sources
  – Production of a ranked list of classes
  – Online server mode

Page 32

Comparison of Learning Approaches

• 20-fold cross validation setup
• All classifiers used the same feature set (grammar labels)
• SNNS may perform slightly better, but TiMBL is preferred when all factors are taken into account

Speech Act classifier accuracy
          English   German
TiMBL     69.82%    67.57%
C4.5      70.41%    67.90%
SNNS      71.52%    67.61%
Rainbow   51.39%    46.00%

Domain Action classifier accuracy
          English   German
TiMBL     49.69%    46.51%
C4.5      48.90%    46.58%
SNNS      49.39%    46.21%
Rainbow   39.74%    38.32%

Concept Sequence classifier accuracy
          English   German
TiMBL     69.59%    67.08%
C4.5      68.47%    66.45%
SNNS      71.35%    68.67%
Rainbow   51.64%    51.50%

Page 33

Adding Word Information

• Grammar label unigrams do not exploit the strengths of naïve Bayes classification

• Test naïve Bayes classifiers (Rainbow) trained on word bigrams

Rainbow accuracy with word bigrams
                   English   German
Domain Action      48.59%    48.09%
Speech Act         79.00%    77.46%
Concept Sequence   56.87%    57.77%

• Words provide useful information for the task, especially for Speech Act classification

Page 34

Adding Word Information

• Add word-based features to the TiMBL classifiers
  1. Binary features for the top 250 words sorted by mutual information (see the sketch below)
  2. Probabilities computed by Rainbow

Words+Parse SA classifier accuracy
                  English   German
TiMBL + words     78.59%    75.98%
TiMBL + Rainbow   81.25%    78.93%

Word+Parse DA classifier accuracy
                  English   German
TiMBL + words     56.48%    54.98%

DA accuracy of SA+CS classifiers
                              English   German
TiMBL SA + TiMBL CS           49.63%    46.50%
TiMBL+Rainbow SA + TiMBL CS   57.74%    53.93%
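A sketch of the first kind of word feature selection: rank words by a mutual-information-style score between the binary event "word occurs in the SDU" and the class label, and keep the top 250. The particular estimate used here (summing only over word-present cells) and the data structures are assumptions, not necessarily the exact computation used.

import math
from collections import Counter

def top_words_by_mutual_information(sdus, k=250):
    """sdus: list of (tokens, class_label) pairs. Returns the k words whose
    occurrence is most informative about the class under a simple MI score."""
    n = len(sdus)
    word_count, class_count, joint_count = Counter(), Counter(), Counter()
    for tokens, label in sdus:
        class_count[label] += 1
        for word in set(tokens):
            word_count[word] += 1
            joint_count[(word, label)] += 1
    scores = {}
    for word in word_count:
        p_w = word_count[word] / n
        mi = 0.0
        for label in class_count:
            p_c = class_count[label] / n
            p_wc = joint_count[(word, label)] / n
            if p_wc > 0:
                mi += p_wc * math.log(p_wc / (p_w * p_c))
        scores[word] = mi
    return [w for w, _ in sorted(scores.items(), key=lambda item: -item[1])[:k]]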

Page 35

Using the IF Specification

• Use knowledge from the IF specification during DA classification
  – Ensure that only legal DAs are produced
  – Guarantee that the DA and arguments combine to form a valid IF representation
• Strategy: Find the best DA that licenses the most arguments
  – Trust the parser to reliably label arguments
  – Retaining detailed argument information is important for translation

Page 36

Using the IF Specification

• Check if the best speech act and concept sequence form a legal DA

• If not, test alternative combinations of speech acts and concept sequences from the ranked lists of possibilities (see the sketch after this list)

• Select the best combination that licenses the most arguments

• Drop arguments not licensed by the best DA
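A sketch of this fallback search; is_legal_da and licenses stand in for lookups against the IF specification, and the rank-sum tie-break is an assumption (the slides only say that the best combination licensing the most arguments is selected).

def select_domain_action(ranked_sas, ranked_css, arguments, is_legal_da, licenses):
    """Among combinations of ranked speech acts and concept sequences that
    form a legal DA, pick the one licensing the most parsed arguments;
    unlicensed arguments are dropped. Returns (sa, cs, kept_args) or None."""
    best_key, best = None, None
    for sa_rank, sa in enumerate(ranked_sas):
        for cs_rank, cs in enumerate(ranked_css):
            if not is_legal_da(sa, cs):
                continue
            kept = [arg for arg in arguments if licenses(sa, cs, arg)]
            # Prefer more licensed arguments; break ties by classifier rank.
            key = (len(kept), -(sa_rank + cs_rank))
            if best_key is None or key > best_key:
                best_key, best = key, (sa, cs, kept)
    return best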

Page 37

Evaluation: IF Specification Fallback

• Test set contained 292 SDUs from 151 utterances
• 182 SDUs required classification
• 4% had illegal DAs
• 29% had illegal IFs
• Mean arguments per SDU: 1.47

Changed
Speech Act         5%
Concept Sequence   26%
Domain Action      29%

Arguments dropped per SDU
Without fallback   0.38
With fallback      0.07

Page 38

End-to-End Translation

• Speech input through text output
  – Reflects combined performance of speech recognition, analysis, and generation
• Travel & Tourism domain
• English-to-English and English-to-Italian
  – Test Set: 232 SDUs (110 utterances) from 2 unseen dialogues
• German-to-German and German-to-Italian
  – Test Set: 356 SDUs (246 utterances) from 2 unseen dialogues

• Analyzer used Segmentation, Speech Act, and Concept Sequence classifiers with IF specification fallback strategy

Page 39

End-to-End Translation

• Each SDU graded by 3 human graders as very good, good, bad, or very bad

• Acceptable = very good + good
• Unacceptable = bad + very bad
• Majority vote among 3 graders (i.e., a translation was considered acceptable if it received at least 2 Acceptable grades)

• Speech recognition hypotheses were also graded as if they were paraphrases produced by the translation system

Page 40

End-to-End Translation (Travel & Tourism)

Speech Recognition Word Accuracy Rates
English WAR   German WAR
56.4%         51.0%

Acceptable end-to-end translation for English travel input
                                 English Output   Italian Output
SR Hypotheses                    66.7%            --
Translation from SR Hypotheses   50.4%            50.2%

Acceptable end-to-end translation for German travel input
                                 German Output    Italian Output
SR Hypotheses                    61.6%            --
Translation from SR Hypotheses   53.4%            51.7%

Page 41

Work in Progress

• Evaluation of end-to-end translation for the Medical Assistance domain
• Evaluation of portability from the Travel & Tourism domain to the Medical Assistance domain
• Data ablation studies

Page 42

Summary

• I described an effective method for identifying domain actions that combines phrase-level parsing and machine learning.
• The hybrid analysis approach is fully integrated into the NESPOLE! English and German MT systems.
• Automatic classification of domain actions is feasible despite the large number of classes and relatively sparse, unevenly distributed data
  – <10,000 training examples
  – Most frequent classes have >1,000 examples
  – Many classes have only 1-2 examples

Page 43

Summary

• Word and argument information can be effectively combined to improve domain action classification performance.
• Preliminary indications are that the approach is quite portable.
  – English and German NESPOLE! systems were ported from Travel & Tourism to Medical Assistance.
• Annotation: ~125 person hours
• Grammar Development: ~140 person hours

Page 44

Page 45

Hybrid Analysis Approach

[Diagram repeated from Pages 15-18: the example utterance "hello i would like to take a vacation in val di fiemme" is argument-parsed (greeting=, disposition=, visit-spec=, location=), segmented into SDU1 and SDU2, and assigned the domain actions greeting and give-information+disposition+trip.]

Page 46

DA Classification Data

[Chart: cumulative coverage of the 100 most frequent DAs, SAs, and CSs (English Travel data); x-axis: class rank (1-100), y-axis: cumulative coverage (0-100%), one curve each for DA, SA, and CS.]