
Page 1

Analysis for Spoken Language Translation Using Phrase-Level Parsing and Domain Action Classification

Chad Langley
Language Technologies Institute, Carnegie Mellon University

June 9, 2003

Page 2

Outline

• Interlingua-Based Machine Translation
• NESPOLE! MT System Overview
• Interchange Format Interlingua
• Hybrid Analysis Approach
• Evaluation
  – Domain Action Classification
  – End-to-End Translation
• Summary

Page 3

Interlingua-Based Machine Translation

[Diagram: analyzers for each source language (Arabic, Chinese, English, French, German, Italian, Japanese, Korean, Spanish) map input into a common Interlingua, from which generators produce output in each target language.]

Page 4

Interlingua-Based MT at Carnegie Mellon

• Long line of research in interlingua-based machine translation of spontaneous conversational speech
  – C-STAR I (appointment scheduling)
  – Enthusiast (passive Spanish→English)
  – C-STAR II (travel planning)
  – LingWear (wearable tourist assistance)
  – Babylon (handheld medical assistance)
  – NESPOLE! (travel & tourism and medical assistance)

Page 5

NESPOLE! Overview

• Human-to-human speech-to-speech machine translation over the Internet
• Domains:
  – Travel & Tourism
  – Medical Assistance
• Languages:
  – English – Carnegie Mellon University
  – German – Universität Karlsruhe
  – Italian – ITC-irst
  – French – Université Joseph Fourier
• Additional Partners:
  – AETHRA Telecommunications
  – APT Trentino Tourism Board

Page 6

NESPOLE! Architecture

• Mediator connects users to the translation servers
• Language-specific servers for each language exchange Interchange Format to perform translation

Page 7

NESPOLE! Language Servers

• Analysis Chain: Speech → Text → IF
• Generation Chain: IF → Text → Speech
• Connect the source language analysis chain to the target language generation chain to translate

Page 8

NESPOLE! User Interface

Page 9

Interchange Format Overview

• Interchange Format (IF) is a shallow semantic interlingua for task-oriented domains

• Captures speaker intention rather than literal meaning

• Abstracts away from language-specific syntax and predicate-argument structure

• Represents utterances as sequences of Semantic Dialogue Units (SDUs)

Page 10

Interchange Format Representation

• IF representation consists of four parts:
  1. Speaker
  2. Speech Act
  3. Concepts
  4. Arguments

speaker : speech_act +concept* (arguments*)

• Domain Action combines the domain-independent speech act and the domain-dependent concepts (the speech_act +concept* portion of the template above)

Page 11

Interchange Format Specification

• Defines the sets of speech acts, concepts, and arguments
  – 72 speech acts + 3 “prefix” speech acts
  – 144 concepts
  – 227 top-level arguments
• Defines constraints on how components can be combined
  – Domain actions are formed compositionally based on the constraints for combining speech acts and concepts
  – Arguments must be licensed by at least one element of the domain action

Page 12

Example

“Hello. I would like to take a vacation in Val di Fiemme.”

hello i would like to take a vacation in val di fiemme

c:greeting (greeting=hello)

c:give-information+disposition+trip
  (disposition=(who=i, desire),
   visit-spec=(identifiability=no, vacation),
   location=name-val_di_fiemme_area)

ENG: Hello! I want to travel for a vacation at Val di Fiemme.

ITA: Salve. Io vorrei una vacanza in Val di Fiemme.
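To make the IF template concrete, the following minimal Python sketch (not the NESPOLE! code) splits an IF string such as the second SDU above into its four parts. The function name split_if and the simple splitting of top-level arguments on unnested commas are illustrative assumptions.

import re

def split_if(if_string):
    """Split an IF string of the form  speaker : speech_act +concept* (arguments*)
    into speaker, speech act, concept list, and top-level argument list."""
    match = re.match(r"\s*(\w+)\s*:\s*([^ (]+)\s*(?:\((.*)\))?\s*$", if_string, re.S)
    if match is None:
        raise ValueError("not a well-formed IF string: %r" % if_string)
    speaker, domain_action, arg_body = match.groups()
    speech_act, *concepts = domain_action.split("+")
    body = arg_body or ""
    args, depth, start = [], 0, 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:       # split only on unnested commas
            args.append(body[start:i].strip())
            start = i + 1
    if body.strip():
        args.append(body[start:].strip())
    return speaker, speech_act, concepts, args

# The second SDU of the example above:
print(split_if("c:give-information+disposition+trip "
               "(disposition=(who=i, desire), "
               "visit-spec=(identifiability=no, vacation), "
               "location=name-val_di_fiemme_area)"))
# -> ('c', 'give-information', ['disposition', 'trip'],
#     ['disposition=(who=i, desire)', 'visit-spec=(identifiability=no, vacation)',
#      'location=name-val_di_fiemme_area'])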

Page 13

Why Hybrid Analysis?

• Goal: A portable and robust analyzer for task-oriented IF-based speech-to-speech MT

• Previous IF-based MT systems used full semantic grammars to parse complete DAs
  – Useful for parsing spoken language in restricted domains
  – Difficult to port to new domains

• Continue to use semantic grammars to parse small domain-independent DAs and phrase-level arguments

• Train classifiers to identify DAs

Page 14

Hybrid Analysis Approach

Use a combination of grammar-based phrase-level parsing and machine learning to produce interlingua (IF) representations

Page 15

Hybrid Analysis Approach

Input:  hello i would like to take a vacation in val di fiemme

Output:
  c:greeting (greeting=hello)
  c:give-information+disposition+trip (disposition=(who=i, desire), visit-spec=(identifiability=no, vacation), location=name-val_di_fiemme_area)

Page 16

[Argument parsing: the phrase-level grammars label greeting=, disposition=, visit-spec=, and location= over "hello i would like to take a vacation in val di fiemme".]

Page 17

[Segmentation: the parsed utterance is split into two Semantic Dialogue Units, SDU1 and SDU2.]

Page 18

[Domain action classification: SDU1 is assigned the DA greeting and SDU2 the DA give-information+disposition+trip.]

Page 19

Argument Parsing

• Parse utterances using phrase-level grammars

• SOUP Parser: Stochastic, chart-based, top-down robust parser designed for real-time analysis of spoken language

• Separate grammars based on the type of phrases that the grammar is intended to cover

Page 20

Grammars

• Argument grammar
  – Identifies arguments defined in the IF
      s[arg:activity-spec=]
        (*[object-ref=any] *[modifier=good] [biking])
  – Covers "any good biking", "any biking", "good biking", "biking", plus synonyms for all three words (see the sketch below)
• Pseudo-argument grammar
  – Groups common phrases with similar meanings into classes
      s[=arrival=] (*is *usually arriving)
  – Covers "arriving", "is arriving", "usually arriving", "is usually arriving", plus synonyms

Page 21

Grammars

• Cross-domain grammar
  – Identifies simple domain-independent DAs
      s[greeting]
        ([greeting=first_meeting] *[greet:to-whom=])
  – Covers "nice to meet you", "nice to meet you donna", "nice to meet you sir", plus synonyms
• Shared grammar
  – Contains low-level rules accessible by all other grammars

Page 22

Segmentation

• Goal: Split utterances into Semantic Dialogue Units so Domain Actions can be assigned
• Potential SDU boundaries occur between argument parse trees and/or unparsed words
• An SDU boundary is present if there is a parse tree from the cross-domain grammar on either side of a potential boundary position
• Otherwise, use a memory-based classifier to determine whether an SDU boundary is present (see the sketch below)
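A minimal sketch of this decision rule, assuming each side of a potential boundary is summarized by the name of the grammar that produced its parse (or None for an unparsed word); the classify callback stands in for the trained memory-based classifier and is not an actual analyzer interface.

def is_sdu_boundary(left_parse, right_parse, classify):
    """Decide whether a potential SDU boundary is an actual boundary.
    left_parse / right_parse: originating grammar of the adjacent parse tree
    ("cross-domain", "argument", "pseudo-argument") or None for an unparsed word.
    classify: fallback classifier returning True/False (e.g. a trained TiMBL model)."""
    # A parse from the cross-domain grammar on either side forces a boundary.
    if left_parse == "cross-domain" or right_parse == "cross-domain":
        return True
    # Otherwise defer to the learned segmentation classifier.
    return classify(left_parse, right_parse)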

Page 23

Segmentation Classifier

• The segmentation classifier is a memory-based classifier implemented using TiMBL
• Input: 10 features based on word and parse information surrounding a potential boundary
• Output: Binary decision about presence of an SDU boundary
• Training Data: Potential SDU boundaries extracted from utterances manually annotated with SDU boundaries and parsed with the phrase-level grammars

Page 24

Segmentation Features

• Preceding parse label (A-1)
• Probability a boundary follows A-1 (P(A-1 •))
• Preceding word (w-1)
• Probability a boundary follows w-1 (P(w-1 •))
• Number of words since the last boundary
• Number of argument parse trees since the last boundary
• Following parse label (A1)
• Probability a boundary precedes A1 (P(• A1))
• Following word (w1)
• Probability a boundary precedes w1 (P(• w1))
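The sketch below assembles these ten features for one potential boundary; the feature names, the probs lookup tables, and the default of 0.0 for unseen events are illustrative assumptions rather than the actual implementation.

def boundary_features(prev_label, prev_word, next_label, next_word,
                      words_since_boundary, parses_since_boundary, probs):
    """Build the 10-feature vector for one potential SDU boundary.
    probs holds the four count-based probability tables described on the
    next slide; unseen labels/words default to 0.0."""
    return {
        "A-1": prev_label,                                    # preceding parse label
        "P(A-1 .)": probs["after_label"].get(prev_label, 0.0),
        "w-1": prev_word,                                     # preceding word
        "P(w-1 .)": probs["after_word"].get(prev_word, 0.0),
        "words_since_boundary": words_since_boundary,
        "parses_since_boundary": parses_since_boundary,
        "A1": next_label,                                     # following parse label
        "P(. A1)": probs["before_label"].get(next_label, 0.0),
        "w1": next_word,                                      # following word
        "P(. w1)": probs["before_word"].get(next_word, 0.0),
    }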

Page 25

Segmentation Features

• Probability features are estimated using counts from the training data (see the sketch below)
  – P(A-1 •) = C(A-1 •) / C(A-1)
  – P(w-1 •) = C(w-1 •) / C(w-1)
  – P(• A1) = C(• A1) / C(A1)
  – P(• w1) = C(• w1) / C(w1)
• 3 segmentation training examples in “hello i would like to take a vacation in val di fiemme”
  – 1 positive (between “hello” and “i”)
  – 2 negative (between “to” and “take”; between “vacation” and “in”)
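A small sketch of the count-based estimation for one of the four tables (boundaries following a given preceding word); the other three tables are built analogously. The data structures are illustrative.

from collections import Counter

def boundary_probabilities(examples):
    """Estimate P(w-1 .) = C(w-1 .) / C(w-1) from training examples, where
    examples is a list of (preceding_word, is_boundary) pairs."""
    word_counts, boundary_counts = Counter(), Counter()
    for preceding_word, is_boundary in examples:
        word_counts[preceding_word] += 1
        if is_boundary:
            boundary_counts[preceding_word] += 1
    return {w: boundary_counts[w] / word_counts[w] for w in word_counts}

# The three potential boundaries in the example utterance:
examples = [("hello", True), ("to", False), ("vacation", False)]
print(boundary_probabilities(examples))
# -> {'hello': 1.0, 'to': 0.0, 'vacation': 0.0}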

Page 26

Evaluation: Segmentation

• Data: English and German in Travel & Tourism and Medical Assistance domains
• TiMBL parameters: IB1 (k-NN) algorithm with Gain Ratio feature weighting, k=5, and unweighted voting
• Evaluated using 20-fold cross validation with “in-turn” examples

                         English Travel   German Travel   English Medical   German Medical
Accuracy                 92.64%           93.04%          96.23%            93.26%
Training Examples        35690            46170           42187             7792
In-Turn Examples         23844            31234           27873             5522
Turn Boundary Examples   11846            14936           14314             2270

Page 27

Domain Action Classification

• Goal: Identify the DA for each SDU using TiMBL memory-based classifiers

• Split DA classification into two subtasks (Speech Act and Concept Sequence); see the sketch below
  – Reduces the number of classes for each classifier
  – Allows for different approaches and/or feature sets for each task
  – Allows for DAs that did not occur in the data
• Also classify the complete DA directly
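For concreteness, recombining the two subtask outputs into a complete DA is a string concatenation, assuming concept sequences carry their leading '+' (as in the corpus statistics that follow) and that the empty sequence corresponds to "No concepts"; this helper is illustrative, not part of the system.

def combine_sa_cs(speech_act, concept_sequence):
    """Form a complete domain action from a speech act and a concept
    sequence; an empty concept sequence yields the speech act alone."""
    return speech_act + concept_sequence if concept_sequence else speech_act

print(combine_sa_cs("give-information", "+disposition+trip"))  # give-information+disposition+trip
print(combine_sa_cs("acknowledge", ""))                        # acknowledge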

Page 28

DA Classification Data

Corpus Information
                     English Travel   German Travel   English Medical   German Medical
SDUs                 8289             8719            3664              2294
Domain Actions       972              1001            462               286
Speech Acts          70               70              50                43
Concept Sequences    615              638             305               179
Vocabulary Size      1946             2815            1694              1112

Most Frequent DAs, SAs, and Concept Sequences
     English Travel           German Travel            English Medical                   German Medical
DA   19.2% acknowledge        19.7% acknowledge        25.1% give-information+exp+h-s    27.2% acknowledge
SA   41.4% give-information   40.7% give-information   59.7% give-information            35.3% give-information
CS   38.9% No concepts        40.3% No concepts        35.0% +experience+health-status   47.3% No concepts

Page 29

DA Classifiers

• SA, CS, and DA classifiers implemented using TiMBL memory-based learner

• Input: Binary features indicate presence or absence of argument and pseudo-argument labels in the phrase-level parse (200-300 features); see the sketch below
  – CS classifier also uses the corresponding SA
• Output: Best class (SA, CS, or DA)
• Training Data: SDUs manually annotated with IF representations and parsed with the argument parser
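A minimal sketch of the input encoding: one binary feature per known argument or pseudo-argument label, indicating whether that label appears in the SDU's phrase-level parse. The label inventory shown is illustrative, not the full 200-300 features.

def parse_label_features(parse_labels, feature_labels):
    """Binary feature vector: 1 if the label occurs in the SDU's phrase-level
    parse, else 0. feature_labels is the fixed label inventory from training."""
    present = set(parse_labels)
    return [1 if label in present else 0 for label in feature_labels]

# SDU2 of the running example, against a tiny illustrative label inventory:
feature_labels = ["greeting=", "disposition=", "visit-spec=", "location=", "=arrival="]
print(parse_label_features(["disposition=", "visit-spec=", "location="], feature_labels))
# -> [0, 1, 1, 1, 0]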

Page 30

DA Classification

• Data: English and German in Travel & Tourism and Medical Assistance domains

• TiMBL parameters: IB1 (k-NN) algorithm with Gain Ratio feature weighting, k=1

• 20-fold cross validation

                          English Travel   German Travel   English Medical   German Medical
Speech Act                69.82%           67.57%          77.73%            68.61%
Concept Sequence          69.59%           67.08%          64.71%            69.93%
Domain Action (SA + CS)   49.63%           46.50%          51.53%            50.91%
Domain Action (direct)    49.69%           46.51%          51.56%            51.18%

Page 31

Comparison of Learning Approaches

• Learning Approaches
  – Memory-Based Learning (TiMBL)
  – Decision Trees (C4.5)
  – Neural Networks (SNNS)
  – Naïve Bayes (Rainbow)
• Important Considerations
  – Accuracy
  – Speed of training and classification
  – Accommodation of discrete and continuous features from multiple sources
  – Production of a ranked list of classes
  – Online server mode

Page 32

Comparison of Learning Approaches

• 20-fold cross validation setup
• All classifiers used the same feature set (grammar labels)
• SNNS may perform slightly better, but TiMBL is preferred when all factors are taken into account

Speech Act classifier accuracy
          English   German
TiMBL     69.82%    67.57%
C4.5      70.41%    67.90%
SNNS      71.52%    67.61%
Rainbow   51.39%    46.00%

Domain Action classifier accuracy
          English   German
TiMBL     49.69%    46.51%
C4.5      48.90%    46.58%
SNNS      49.39%    46.21%
Rainbow   39.74%    38.32%

Concept Sequence classifier accuracy
          English   German
TiMBL     69.59%    67.08%
C4.5      68.47%    66.45%
SNNS      71.35%    68.67%
Rainbow   51.64%    51.50%

Page 33

Adding Word Information

• Grammar label unigrams do not exploit the strengths of naïve Bayes classification

• Test naïve Bayes classifiers (Rainbow) trained on word bigrams

Rainbow accuracy with word bigrams
                   English   German
Domain Action      48.59%    48.09%
Speech Act         79.00%    77.46%
Concept Sequence   56.87%    57.77%

• Words provide useful information for the task, especially for Speech Act classification

Page 34

Adding Word Information

• Add word-based features to the TiMBL classifiers
  1. Binary features for the top 250 words sorted by mutual information (see the sketch below)
  2. Probabilities computed by Rainbow

Words+Parse SA classifier accuracy
                  English   German
TiMBL + words     78.59%    75.98%
TiMBL + Rainbow   81.25%    78.93%

Word+Parse DA classifier accuracy
                  English   German
TiMBL + words     56.48%    54.98%

DA accuracy of SA+CS classifiers
                              English   German
TiMBL SA + TiMBL CS           49.63%    46.50%
TiMBL+Rainbow SA + TiMBL CS   57.74%    53.93%
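A sketch of the first kind of word feature selection: rank words by a mutual-information-style score between the binary event "word occurs in the SDU" and the class label, and keep the top 250. The particular estimate used here (summing only over word-present cells) and the data structures are assumptions, not necessarily the exact computation used.

import math
from collections import Counter

def top_words_by_mutual_information(sdus, k=250):
    """sdus: list of (tokens, class_label) pairs. Returns the k words whose
    occurrence is most informative about the class under a simple MI score."""
    n = len(sdus)
    word_count, class_count, joint_count = Counter(), Counter(), Counter()
    for tokens, label in sdus:
        class_count[label] += 1
        for word in set(tokens):
            word_count[word] += 1
            joint_count[(word, label)] += 1
    scores = {}
    for word in word_count:
        p_w = word_count[word] / n
        mi = 0.0
        for label in class_count:
            p_c = class_count[label] / n
            p_wc = joint_count[(word, label)] / n
            if p_wc > 0:
                mi += p_wc * math.log(p_wc / (p_w * p_c))
        scores[word] = mi
    return [w for w, _ in sorted(scores.items(), key=lambda item: -item[1])[:k]]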

Page 35

Using the IF Specification

• Use knowledge from the IF specification during DA classification
  – Ensure that only legal DAs are produced
  – Guarantee that the DA and arguments combine to form a valid IF representation
• Strategy: Find the best DA that licenses the most arguments
  – Trust the parser to reliably label arguments
  – Retaining detailed argument information is important for translation

Page 36

Using the IF Specification

• Check if the best speech act and concept sequence form a legal DA

• If not, test alternative combinations of speech acts and concept sequences from the ranked lists of possibilities (see the sketch after this list)

• Select the best combination that licenses the most arguments

• Drop arguments not licensed by the best DA
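A sketch of this fallback search; is_legal_da and licenses stand in for lookups against the IF specification, and the rank-sum tie-break is an assumption (the slides only say that the best combination licensing the most arguments is selected).

def select_domain_action(ranked_sas, ranked_css, arguments, is_legal_da, licenses):
    """Among combinations of ranked speech acts and concept sequences that
    form a legal DA, pick the one licensing the most parsed arguments;
    unlicensed arguments are dropped. Returns (sa, cs, kept_args) or None."""
    best_key, best = None, None
    for sa_rank, sa in enumerate(ranked_sas):
        for cs_rank, cs in enumerate(ranked_css):
            if not is_legal_da(sa, cs):
                continue
            kept = [arg for arg in arguments if licenses(sa, cs, arg)]
            # Prefer more licensed arguments; break ties by classifier rank.
            key = (len(kept), -(sa_rank + cs_rank))
            if best_key is None or key > best_key:
                best_key, best = key, (sa, cs, kept)
    return best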

Page 37

Evaluation: IF Specification Fallback

• Test set contained 292 SDUs from 151 utterances
• 182 SDUs required classification
• 4% had illegal DAs
• 29% had illegal IFs
• Mean arguments per SDU: 1.47

Changed
Speech Act         5%
Concept Sequence   26%
Domain Action      29%

Arguments dropped per SDU
Without fallback   0.38
With fallback      0.07

Page 38

End-to-End Translation

• Speech input through text output
  – Reflects combined performance of speech recognition, analysis, and generation
• Travel & Tourism domain
• English-to-English and English-to-Italian
  – Test Set: 232 SDUs (110 utterances) from 2 unseen dialogues
• German-to-German and German-to-Italian
  – Test Set: 356 SDUs (246 utterances) from 2 unseen dialogues

• Analyzer used Segmentation, Speech Act, and Concept Sequence classifiers with IF specification fallback strategy

Page 39

End-to-End Translation

• Each SDU graded by 3 human graders as very good, good, bad, or very bad

• Acceptable = very good + good
• Unacceptable = bad + very bad
• Majority vote among 3 graders (i.e., a translation was considered acceptable if it received at least 2 Acceptable grades)

• Speech recognition hypotheses were also graded as if they were paraphrases produced by the translation system

Page 40

End-to-End Translation (Travel & Tourism)

Speech Recognition Word Accuracy Rates
English WAR   German WAR
56.4%         51.0%

Acceptable end-to-end translation for English travel input
                                 English Output   Italian Output
SR Hypotheses                    66.7%            --
Translation from SR Hypotheses   50.4%            50.2%

Acceptable end-to-end translation for German travel input
                                 German Output    Italian Output
SR Hypotheses                    61.6%            --
Translation from SR Hypotheses   53.4%            51.7%

Page 41

Work in Progress

• Evaluation of end-to-end translation for the Medical Assistance domain
• Evaluation of portability from the Travel & Tourism domain to the Medical Assistance domain
• Data ablation studies

Page 42

Summary

• I described an effective method for identifying domain actions that combines phrase-level parsing and machine learning.
• The hybrid analysis approach is fully integrated into the NESPOLE! English and German MT systems.
• Automatic classification of domain actions is feasible despite the large number of classes and relatively sparse, unevenly distributed data
  – <10,000 training examples
  – Most frequent classes have >1,000 examples
  – Many classes have only 1-2 examples

Page 43

Summary

• Word and argument information can be effectively combined to improve domain action classification performance.
• Preliminary indications are that the approach is quite portable.
  – English and German NESPOLE! systems were ported from Travel & Tourism to Medical Assistance.
• Annotation: ~125 person hours
• Grammar Development: ~140 person hours

Page 44

Page 45

Hybrid Analysis Approach

[Diagram repeated from Pages 15-18: the example utterance "hello i would like to take a vacation in val di fiemme" is argument-parsed (greeting=, disposition=, visit-spec=, location=), segmented into SDU1 and SDU2, and assigned the domain actions greeting and give-information+disposition+trip.]

Page 46

DA Classification Data

[Chart: cumulative coverage of the 100 most frequent DAs, SAs, and CSs (English Travel data); x-axis: class rank (1-100), y-axis: cumulative coverage (0-100%), one curve each for DA, SA, and CS.]