Speech-to-Speech MT in NESPOLE! Design and Engineering Alon Lavie, Lori Levin Work with: Chad Langley, Tanja Schultz, Dorcas Wallace, Donna Gates, Kay Peterson, Kornel Laskowski MT Class, April 2, 2003


Page 1: Speech-to-Speech MT in NESPOLE!

Design and Engineering

Alon Lavie, Lori Levin

Work with: Chad Langley, Tanja Schultz, Dorcas Wallace, Donna Gates, Kay Peterson, Kornel Laskowski

MT Class, April 2, 2003

Page 2

• Speech-to-speech translation for E-Commerce applications

• Partners: CMU, Univ of Karlsruhe, ITC-irst, UJF-CLIPS, AETHRA, APT-Trentino

• Builds on successful collaboration within C-STAR
• Improved limited-domain speech translation
• Experiment with multimodality and with MEMT
• Showcase-1: Travel and Tourism in Trentino, completed in Nov-2001, demonstrated at IST and HLT
• Showcase-2: expanded travel + medical service

Page 3: NESPOLE! System Overview

• Human-to-human spoken language translation for e-commerce application (e.g. travel & tourism) (Lavie et al., 2002)

• English, German, Italian, and French
• Translation via interlingua
• Translation servers for each language exchange interlingua to perform translation:
  – Speech recognition (Speech → Text)
  – Analysis (Text → Interlingua)
  – Generation (Interlingua → Text)
  – Synthesis (Text → Speech)
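Because the servers meet only at the interlingua, adding a language requires only its own analysis and generation modules. A minimal sketch of that exchange, with toy lookup tables standing in for the real analysis and generation components (all names here are illustrative, not the NESPOLE! API):

```python
# Hypothetical sketch: two language servers communicating only via the
# interlingua (IF).  Toy lookup tables replace the real modules.

def english_analysis(text):
    """Toy English analyzer: maps a known utterance to an IF string."""
    table = {"hello": "c:greeting (greeting=hello)"}
    return table[text.lower()]

def italian_generation(interlingua):
    """Toy Italian generator: maps a known IF string to Italian text."""
    table = {"c:greeting (greeting=hello)": "ciao"}
    return table[interlingua]

def translate_en_to_it(text):
    # Each side sees only the interlingua produced by the other side.
    interlingua = english_analysis(text)    # Text -> Interlingua
    return italian_generation(interlingua)  # Interlingua -> Text

print(translate_en_to_it("Hello"))  # ciao
```

The design choice this illustrates: n languages need 2n modules (one analyzer, one generator each) instead of n(n-1) direct transfer components.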

Page 4: Speech-to-speech in E-commerce

• Augment current passive web E-commerce with live interaction capabilities

• Client starts via web, can easily connect to agent for specific detailed information

• “Thin client” - very little special hardware and software on client PC: browser, MS Netmeeting, Shared Whiteboard

Page 5: NESPOLE! User Interfaces

Page 6: NESPOLE! Translation Monitor

Page 7: NESPOLE! Architecture

Page 8: Distributed S2S Translation over the Internet

Page 9: Language-specific HLT Servers

Page 10: Our Parsing and Analysis Approach

• Goal: A portable and robust analyzer for task-oriented human-to-human speech, parsing utterances into interlingua representations

• Our earlier systems used full semantic grammars to parse complete DAs
  – Useful for parsing spoken language in restricted domains
  – Difficult to port to new domains

• Current focus is on improving portability to new domains (and new languages)

• Approach: Continue to use semantic grammars to parse domain-independent phrase-level arguments and train classifiers to identify DAs

Page 11: Interchange Format

• Interchange Format (IF) is a shallow semantic interlingua for task-oriented domains

• Utterances represented as sequences of semantic dialog units (SDUs)

• IF representation consists of four parts:
  – Speaker
  – Speech act
  – Concepts
  – Arguments

speaker : speech-act +concept* +argument*

The speech act plus the concept sequence together form the Domain Action (DA).
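A minimal sketch of assembling an IF string from those four parts (the helper function and its argument encoding are invented for illustration; the example output mirrors the IF shown on page 13):

```python
# Illustrative IF assembly: speaker : speech-act(+concept)* (arguments).
# The helper and its dict-based argument encoding are assumptions, not
# the NESPOLE! implementation.

def build_if(speaker, speech_act, concepts, arguments):
    """Assemble an IF string from speaker, speech act, concepts, args."""
    domain_action = "+".join([speech_act] + concepts)
    args = ", ".join(f"{k}={v}" for k, v in arguments.items())
    return f"{speaker}:{domain_action} ({args})" if args else f"{speaker}:{domain_action}"

sdu = build_if("c", "give-information", ["disposition", "trip"],
               {"location": "(place-name=val_di_fiemme)"})
print(sdu)
# c:give-information+disposition+trip (location=(place-name=val_di_fiemme))
```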

Page 12: Hybrid Analysis Approach

Text → Argument Parser → Text + Arguments → SDU Segmenter → Text + Arguments + SDUs → DA Classifier → IF

Use a combination of grammar-based phrase-level parsing and machine learning to produce interlingua (IF) representations
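As a rough sketch, the three stages can be chained as plain functions; all three components below are toy stand-ins (the real modules are the SOUP parser, a statistical segmenter, and TiMBL classifiers):

```python
# Toy three-stage analysis pipeline: parse arguments, segment into
# SDUs, classify each SDU's domain action.  All rules are invented.

def argument_parser(words):
    # Label words that match toy phrase grammars; '-' means unparsed.
    labels = {"hello": "greeting=", "vacation": "visit-spec="}
    return [(w, labels.get(w, "-")) for w in words]

def sdu_segmenter(parsed):
    # Toy rule: close an SDU right after a greeting argument.
    sdus, current = [], []
    for word, label in parsed:
        current.append((word, label))
        if label == "greeting=":
            sdus.append(current)
            current = []
    if current:
        sdus.append(current)
    return sdus

def da_classifier(sdu):
    # Choose a domain action from the argument labels present.
    labels = {label for _, label in sdu}
    return "greeting" if "greeting=" in labels else "give-information+trip"

def analyze(words):
    return [da_classifier(sdu) for sdu in sdu_segmenter(argument_parser(words))]

print(analyze(["hello", "a", "vacation"]))
# ['greeting', 'give-information+trip']
```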

Page 13: Hybrid Analysis Approach (Example)

Input: "Hello. I would like to take a vacation in Val di Fiemme."
IF output:
c:greeting (greeting=hello)
c:give-information+disposition+trip (disposition=(who=i, desire), visit-spec=(identifiability=no, vacation), location=(place-name=val_di_fiemme))

Processing stages for "hello i would like to take a vacation in val di fiemme":
1. Argument parsing labels the phrases: greeting=, disposition=, visit-spec=, location=
2. SDU segmentation splits the utterance into SDU1 ("hello") and SDU2 ("i would like to take a vacation in val di fiemme")
3. DA classification assigns greeting to SDU1 and give-information+disposition+trip to SDU2

Page 14: Argument Parsing

• Parse utterances using phrase-level grammars
• SOUP parser (Gavaldà, 2000): stochastic, chart-based, top-down robust parser designed for real-time analysis of spoken language

• Separate grammars based on the type of phrases that the grammar is intended to cover

Page 15: Domain Action Classification

• Identify the DA for each SDU using trainable classifiers

• Two TiMBL (k-NN) classifiers:
  – Speech act
  – Concept sequence

• Binary features indicate presence or absence of arguments and pseudo-arguments
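A toy version of such a classifier, with invented training data, shows the mechanics: binary features mark which arguments occur in an SDU, and the nearest training examples vote on the label:

```python
# Toy k-NN speech-act classifier over binary argument-presence
# features, in the spirit of the TiMBL classifiers (data invented).

TRAIN = [
    # (has greeting=, has disposition=, has location=), speech act
    ((1, 0, 0), "greeting"),
    ((0, 1, 1), "give-information"),
    ((0, 1, 0), "give-information"),
]

def hamming(a, b):
    """Number of feature positions where two vectors disagree."""
    return sum(x != y for x, y in zip(a, b))

def classify(features, k=1):
    neighbors = sorted(TRAIN, key=lambda ex: hamming(ex[0], features))[:k]
    # Majority vote among the k nearest training examples.
    labels = [label for _, label in neighbors]
    return max(set(labels), key=labels.count)

print(classify((0, 1, 1)))  # give-information
print(classify((1, 0, 0)))  # greeting
```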

Page 16: Using the IF Specification

• Use knowledge of the IF specification during DA classification
  – Ensure that only legal DAs are produced
  – Guarantee that the DA and arguments combine to form a valid IF representation
• Strategy: Find the best DA that licenses the most arguments
  – Trust the parser to reliably label arguments
  – Retaining detailed argument information is important for translation

Page 17: Evaluation: Classification Accuracy

• 20-fold cross-validation using the NESPOLE! travel domain database

The database:
                        English   German
  SDUs                   8289      8719
  Domain actions          972      1001
  Speech acts              70        70
  Concept sequences       615       638
  Vocabulary             1946      2815

Most frequent class:
                        English   German
  Speech act             41.4%     40.7%
  Concept sequence       38.9%     40.3%
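The most-frequent-class rows are a majority-class baseline: always predict the commonest label in the training data. A sketch with an invented label distribution (chosen to echo the 41.4% English speech-act figure):

```python
# Majority-class baseline: predict the most frequent label and report
# its share of the data.  The label multiset below is invented.

from collections import Counter

def majority_baseline(labels):
    counts = Counter(labels)
    label, freq = counts.most_common(1)[0]
    return label, freq / len(labels)

label, acc = majority_baseline(
    ["give-information"] * 414 + ["greeting"] * 300 + ["request"] * 286)
print(label, acc)  # give-information 0.414
```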

Page 18: Evaluation: Classification Accuracy

Classification accuracy:
                        English   German
  Speech acts           81.25%    78.93%
  Concept sequences     69.59%    67.08%

Page 19: Evaluation: End-to-End Translation

• English-to-English and English-to-Italian
• Training set: ~8000 SDUs from NESPOLE!
• Test set: 2 dialogs, only client utterances
• Uses IF specification fallback strategy
• Three graders, bilingual English/Italian speakers
• Each SDU graded as perfect, ok, bad, or very bad
• Acceptable translation = perfect + ok
• Majority scores
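The grading scheme can be sketched as a majority vote per SDU followed by the perfect+ok acceptability test (the grade lists below are invented examples):

```python
# Sketch of the grading scheme: three graders score each SDU, the
# majority grade is kept, and "perfect" or "ok" counts as acceptable.

from collections import Counter

ACCEPTABLE = {"perfect", "ok"}

def majority_grade(grades):
    return Counter(grades).most_common(1)[0][0]

def acceptable_rate(all_grades):
    majorities = [majority_grade(g) for g in all_grades]
    return sum(m in ACCEPTABLE for m in majorities) / len(majorities)

rate = acceptable_rate([
    ["ok", "ok", "bad"],             # majority ok       -> acceptable
    ["bad", "bad", "perfect"],       # majority bad      -> not
    ["perfect", "perfect", "ok"],    # majority perfect  -> acceptable
    ["very bad", "very bad", "ok"],  # majority very bad -> not
])
print(rate)  # 0.5
```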

Page 20: Evaluation: End-to-End Translation

Speech recognizer hypotheses, word accuracy rate (WAR): 66.7%, 56.4%

English source input     Target language   Acceptable (OK + Perfect)
Human transcription      English           68.1%
                         Italian           69.7%
SR hypothesis            English           50.4%
                         Italian           50.2%

Page 21: Evaluation: Data Ablation Experiment

[Chart: Classification accuracy (16-fold cross-validation) — mean accuracy vs. training set size (500 to 6009 SDUs), with curves for Speech Act, Concept Sequence, and Domain Action]

Page 22: Domain Portability

• Experimented with porting to a medical assistance domain in NESPOLE!

• Initial medical domain system up and running, with reasonable coverage of flu-like symptoms and chest pain

• Porting the interlingua, grammars, and modules for English, German, and Italian required about 6 person-months in total
  – Interlingua development: ~180 hours
  – Interlingua annotation: ~200 hours
  – Analysis grammars, training: ~250 hours
  – Generation development: ~250 hours

Page 23: New Development Tools

Page 24: Questions?

Page 25: Grammars

• Argument grammar
  – Identifies arguments defined in the IF:
    s[arg:activity-spec=]
      (*[object-ref=any] *[modifier=good] [biking])
  – Covers "any good biking", "any biking", "good biking", "biking", plus synonyms for all three words
• Pseudo-argument grammar
  – Groups common phrases with similar meanings into classes:
    s[=arrival=] (*is *usually arriving)
  – Covers "arriving", "is arriving", "usually arriving", "is usually arriving", plus synonyms
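Why a single rule with starred (optional) tokens covers several surface phrases can be seen by expanding the pattern; the expansion helper below is illustrative, not SOUP code:

```python
# Expand a SOUP-style pattern in which '*' marks an optional token,
# showing why (*any *good biking) covers exactly four phrases.

from itertools import product

def expand(pattern):
    """pattern: list of (token, optional) pairs -> all covered phrases."""
    choices = [[tok, None] if optional else [tok] for tok, optional in pattern]
    phrases = []
    for combo in product(*choices):
        # Drop the omitted optional tokens and join the rest.
        phrases.append(" ".join(t for t in combo if t))
    return phrases

print(expand([("any", True), ("good", True), ("biking", False)]))
# ['any good biking', 'any biking', 'good biking', 'biking']
```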

Page 26: Grammars

• Cross-domain grammar
  – Identifies simple domain-independent DAs:
    s[greeting]
      ([greeting=first_meeting] *[greet:to-whom=])
  – Covers "nice to meet you", "nice to meet you donna", "nice to meet you sir", plus synonyms
• Shared grammar
  – Contains low-level rules accessible by all other grammars

Page 27: Segmentation

• Identify SDU boundaries between argument parse trees

• Insert a boundary if either parse tree is from cross-domain grammar

• Otherwise, use a simple statistical model

F([A1][A2]) = C([A1][A2]) / ( C([A1]) C([A2]) )

where C(·) counts occurrences of argument labels and adjacent label pairs in the training data.
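One plausible count-based reading of the statistical model (cf. the segmentation work in Lavie et al., 1997); the counts, labels, and threshold below are invented for illustration:

```python
# Hedged sketch of a count-based boundary model: score an adjacent
# argument pair by how often it co-occurred within one SDU in training,
# relative to the counts of each argument alone; a low score suggests
# the pair rarely shares an SDU, so insert a boundary.  All numbers
# here are invented.

PAIR_COUNT = {("disposition=", "location="): 40}                     # C([A1][A2])
ARG_COUNT = {"greeting=": 50, "disposition=": 60, "location=": 50}   # C([A])

def cooccurrence(a1, a2):
    pair = PAIR_COUNT.get((a1, a2), 0)
    return pair / (ARG_COUNT[a1] * ARG_COUNT[a2])

def insert_boundary(a1, a2, threshold=0.005):
    # Low co-occurrence -> likely an SDU boundary between the pair.
    return cooccurrence(a1, a2) < threshold

print(insert_boundary("greeting=", "disposition="))  # True  (never co-occur)
print(insert_boundary("disposition=", "location="))  # False
```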

Page 28: Using the IF Specification

• Check if the best speech act and concept sequence form a legal IF

• If not, test alternative combinations of speech acts and concept sequences from ranked set of possibilities

• Select the best combination that licenses the most arguments

• Drop any arguments not licensed by the best DA
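The fallback selection described above can be sketched as a search over ranked speech acts and concept sequences against a toy legality table (the real system consults the IF specification; all names here are invented):

```python
# Sketch of the IF-specification fallback: test ranked (speech act,
# concept sequence) combinations, skip illegal DAs, and keep the legal
# DA that licenses the most parsed arguments.

LEGAL = {
    # (speech act, concept sequence) -> arguments the DA licenses
    ("give-information", "disposition+trip"): {"disposition", "location"},
    ("give-information", "trip"): {"location"},
}

def best_domain_action(ranked_sas, ranked_css, arguments):
    best, best_licensed = None, -1
    for sa in ranked_sas:
        for cs in ranked_css:
            licensed = LEGAL.get((sa, cs))
            if licensed is None:          # illegal DA: skip it
                continue
            n = len(set(arguments) & licensed)
            if n > best_licensed:
                best, best_licensed = (sa, cs), n
    return best, best_licensed

da, n = best_domain_action(["request", "give-information"],
                           ["disposition+trip", "trip"],
                           ["disposition", "location"])
print(da, n)  # ('give-information', 'disposition+trip') 2
```

Arguments outside the winning DA's licensed set would then be dropped, which is why the fallback keeps the drop rate low.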

Page 29: Grammar Development and Classifier Training

• Four steps:
  1. Write argument grammars
  2. Parse training data
  3. Obtain segmentation counts
  4. Train DA classifiers

• Steps 2-4 are automated to simplify testing new grammars

• Translation servers include a development mode for testing new grammars

Page 30: Evaluation: IF Specification Fallback

• 182 SDUs required classification
• 4% had illegal DAs
• 29% had illegal IFs
• Mean arguments per SDU: 1.47

Changed by fallback:
  Speech act          5%
  Concept sequence    26%
  Domain action       29%

Arguments dropped per SDU:
  Without fallback    0.38
  With fallback       0.07

Page 31: Evaluation: Data Ablation Experiment

• 16-fold cross-validation setup
• Test set size (# SDUs): 400
• Training set sizes (# SDUs): 500, 1000, 2000, 3000, 4000, 5000, 6009 (all data)
• Data from previous C-STAR system
• No use of IF specification

Page 32: Future Work

• Alternative segmentation models, feature sets, and classification methods
• Multiple argument parses
• Evaluate portability and robustness
  – Collect dialogues in a new domain
  – Create argument and full DA grammars for a small development set of dialogues
  – Assess portability by comparing grammar development times and examining grammar reusability
  – Assess robustness by comparing performance on unseen data

Page 33: References

• Cattoni, R., M. Federico, and A. Lavie. 2001. Robust Analysis of Spoken Input Combining Statistical and Knowledge-Based Information Sources. In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, Trento, Italy.

• Daelemans, W., J. Zavrel, K. van der Sloot, and A. van den Bosch. 2000. TiMBL: Tilburg Memory Based Learner, version 3.0, Reference Guide. ILK Technical Report 00-01. http://ilk.kub.nl/~ilk/papers/ilk0001.ps.gz

• Gavaldà, M. 2000. SOUP: A Parser for Real-World Spontaneous Speech. In Proceedings of the IWPT-2000, Trento, Italy.

• Gotoh, Y. and S. Renals. 2000. Sentence Boundary Detection in Broadcast Speech Transcripts. In Proceedings of the International Speech Communication Association Workshop: Automatic Speech Recognition: Challenges for the New Millennium, Paris.

• Lavie, A., F. Metze, F. Pianesi, et al. 2002. Enhancing the Usability and Performance of NESPOLE! – a Real-World Speech-to-Speech Translation System. In Proceedings of HLT-2002, San Diego, CA.

Page 34: References (continued)

• Lavie, A., C. Langley, A. Waibel, et al. 2001. Architecture and Design Considerations in NESPOLE!: a Speech Translation System for E-commerce Applications. In Proceedings of HLT-2001, San Diego, CA.

• Lavie, A., D. Gates, N. Coccaro, and L. Levin. 1997. Input Segmentation of Spontaneous Speech in JANUS: a Speech-to-speech Translation System. In Dialogue Processing in Spoken Language Systems: Revised Papers from ECAI-96 Workshop, E. Maier, M. Mast, and S. Luperfoy (eds.), LNCS series, Springer Verlag.

• Lavie, A. 1996. GLR*: A Robust Grammar-Focused Parser for Spontaneously Spoken Language. PhD dissertation, Technical Report CMU-CS-96-126, Carnegie Mellon University, Pittsburgh, PA.

• Munk, M. 1999. Shallow Statistical Parsing for Machine Translation. Diploma Thesis, Karlsruhe University.

• Stevenson, M. and R. Gaizauskas. 2000. Experiments on Sentence Boundary Detection. In Proceedings of ANLP and NAACL-2000, Seattle.

• Woszczyna, M., M. Broadhead, D. Gates, et al. 1998. A Modular Approach to Spoken Language Translation for Large Domains. In Proceedings of AMTA-98, Langhorne, PA.