Transcription System using Automatic Speech Recognition (ASR) for the Japanese Parliament (Diet) Tatsuya Kawahara (Kyoto University, Japan)

Post on 11-Jan-2016

Page 1

Transcription System using Automatic Speech Recognition (ASR) for the Japanese Parliament (Diet)

Tatsuya Kawahara (Kyoto University, Japan)

Page 2

Brief Biography

• 1995 Ph.D. (Information Science), Kyoto Univ.
• 1995 Associate Professor, Kyoto Univ.
• 1995-96 Visiting Researcher, Bell Labs., USA
• 2003- Professor, Kyoto Univ.
• 2003-06 IEEE SPS Speech TC member
• 2006- Technical Consultant, The House of Representatives, Japan

Published 150+ papers on automatic speech recognition (ASR) and its applications

Web: http://www.ar.media.kyoto-u.ac.jp/~kawahara/

Page 3

Contents

1. Review of ASR technology

2. ASR system for the Japanese Diet

3. Next-generation transcription system of the Japanese Diet

Page 4

Trend of ASR

[Figure: ASR tasks plotted by speaking style (formal to informal) against number of speakers (one to multiple). Tasks shown: Reading/Re-speaking, Broadcast news, Formal presentation, Classroom lectures, Parliament, Business meetings, Phone conversation. Labeled region: Spontaneous speech.]

Page 5

Review of ASR technology (1/2)

• Broadcast news [world-wide]
  – Professional anchors, mostly reading manuscripts
  – Accuracy over 90%

• Public speaking, oral presentations [Japan]
  – Ordinary people making fluent speech
  – Accuracy ~80% (close-talking mic.)

• Classroom lectures [world-wide]
  – More informal speaking
  – Accuracy ~60% (pin mic.)

Page 6

Review of ASR technology (2/2)

• Telephone conversations [US]
  – Ordinary people, speaking casually
  – Accuracy 60% → 85%

• Business meetings [Europe/US]
  – Ordinary people, speaking less formally
  – Accuracy 70% (close mic.), 60% (distant mic.)

• Parliamentary meetings [Europe/Japan]
  – Politicians speaking formally
  – EU: plenary sessions: 90%
  – Japan: committee meetings: 85%

Page 7

Deployment of ASR in Parliaments & Courts

• Some countries
  – Steno-mask & voice writing
  – Re-speaking → commercial dictation software

• Some local governments in Japan
  – Direct recognition of politicians' speech

• Japanese courts
  – ASR for efficient retrieval from recorded sessions

• Japanese Parliament (= Diet)
  – To introduce ASR: direct recognition of politicians' speech
  – Mostly in committee meetings … interactive, spontaneous, sometimes excited

Page 8

Language-specific Issues in Japanese

• Need to convert kana (phonetic symbols) to kanji
• Conversion is ambiguous → many homonyms
  (ex.) KAWAHARA (カワハラ) → 河原 (not 川原)
  – Very hard to type in real time
  – Only a limited number of stenographers, using special keyboards, can do it

• Difference between verbatim style and transcript style
  (ex.) おききしたいのですが ("I would like to ask, if I may") → ききたい(のです) ("I want to ask")
  – Re-speaking is not so simple
  – Need to rephrase in many cases
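
The kana-to-kanji ambiguity above can be illustrated with a small Python sketch. The table holds only the one example from the slide; the `known_names` lookup is a hypothetical stand-in for whatever context a real system (or a stenographer) would use to pick the right spelling.

```python
# One kana reading can map to several kanji spellings (homonyms).
# The single entry is the slide's own example; a real table would be huge.
HOMOPHONES = {"カワハラ": ["河原", "川原"]}

def candidates(reading):
    """Return every kanji spelling listed for a kana reading."""
    return HOMOPHONES.get(reading, [reading])

def convert(reading, known_names=frozenset()):
    """Pick a spelling, preferring one found in a (hypothetical) name list.

    Without context the first candidate is returned, which may be wrong:
    this is exactly the ambiguity the slide describes.
    """
    options = candidates(reading)
    for spelling in options:
        if spelling in known_names:
            return spelling
    return options[0]

print(candidates("カワハラ"))
print(convert("カワハラ", known_names={"川原"}))
```

The sketch only shows why real-time typing is hard: the reading alone does not determine the spelling, so some extra knowledge has to be consulted for every ambiguous word.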

Page 9

ASR Architecture

[Figure: block diagram. Signal processing → X → Recognition engine (decoder) → output W. The decoder draws on three knowledge sources:]

• Acoustic model P(X|P): phoneme models /a, i, u, e, o…/
• Dictionary P(P|W): word pronunciations, e.g. 京都 (Kyoto) → ky o: t o
• Language model P(W): word sequences, e.g. 京都 + の + 天気 ("Kyoto" + "'s" + "weather")

P(W|X) ∝ P(W) · P(X|W)

Output: W = argmax P(W|X)

Diagram annotations: "Depend on input condition", "Depend on application"
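
The decision rule on this slide, W = argmax P(W|X) with P(W|X) ∝ P(W) · P(X|W), can be sketched as a toy rescoring step in Python. The candidate word sequences, every probability value, and the `lm_weight` parameter below are invented for illustration; they are not output of any real acoustic or language model, and a real decoder searches a far larger hypothesis space.

```python
import math

# Hypothetical candidates for one utterance. Each carries an assumed
# language-model log-probability log P(W) and an assumed acoustic
# log-likelihood log P(X|W). All numbers are made up.
CANDIDATES = {
    ("京都", "の", "天気"): {"log_p_w": math.log(0.020), "log_p_x_given_w": -120.0},
    ("京都", "の", "電気"): {"log_p_w": math.log(0.001), "log_p_x_given_w": -119.5},
    ("今日", "の", "天気"): {"log_p_w": math.log(0.030), "log_p_x_given_w": -131.0},
}

def decode(candidates, lm_weight=1.0):
    """Return the word sequence maximizing lm_weight*log P(W) + log P(X|W)."""
    def score(item):
        _, scores = item
        return lm_weight * scores["log_p_w"] + scores["log_p_x_given_w"]
    best_words, _ = max(candidates.items(), key=score)
    return best_words

# With both models combined, the language model outweighs the slightly
# better acoustic score of the implausible second candidate.
print(" + ".join(decode(CANDIDATES)))
# With the language model switched off, the acoustic score alone decides.
print(" + ".join(decode(CANDIDATES, lm_weight=0.0)))
```

Working in log space turns the product P(W) · P(X|W) into a sum, which avoids numerical underflow on long utterances.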

Page 10

Current Status of ASR

• Problems unsolved
  – Spontaneous/conversational speech
  – Noisy environments
    • Including distant microphones

• Solutions ad-hoc
  – Collect large-scale "matched" data (corpus)
    • Same acoustic environment, speakers (10+ hours)
    • Cover same topics, vocabulary (~M words)
  – Prepare dedicated acoustic & language models
    • Huge cost in development & maintenance

Page 11

Contents

1. Review of ASR technology

2. ASR system for the Japanese Diet

3. Next-generation transcription system of the Japanese Diet

Page 12

ASR Research at Kyoto Univ.

• Since 1960s, one of the pioneers

• Development of free software Julius

• Research in spontaneous speech recognition
  – 1999- Oral presentations
  – 2001- TV discussions
  – 2004- Classroom lectures
  – 2003- Parliamentary meetings

Page 13

Free ASR Software: Julius

• Developed since 1997 at Kyoto Univ. & other sites
• Open-source
  – Multi-platform (Linux, Mac, Windows, iPhone)

• Open architecture
  – Independent from acoustic & language models
  – Ported to many languages
  – Ported to many applications (telephony, robot…)

• Standard model for Japanese
• Widely-used research platform

http://julius.sourceforge.jp

Page 14

Corpus of Parliamentary Meetings

• Cover all major committees and plenary sessions
• 200 hours, 2.4M words
• Faithful transcripts of utterances including fillers, which are aligned with official minutes

{ えー } それでは少し、今 { そのー } 最初に大臣からも、{ そのー } 貯蓄から投資へという流れの中に { ま } 資するんじゃないだろうかとかいうような話もありましたけれども、 { だけど / だけれども } 、 { まあ } あなたが言うと本当にうそらしくなる { んで / ので }{ ですね、えー } もう少し { ですね、あのー } これは { あー } 財務大臣に { えー } お尋ねをしたいんです { が } 。{ ま } その { あの } 見通しはどうかということでありますけれども、これについては、 { あのー } 委員御承知の{ その } 「改革と展望」の中で { ですね } 、我々の今 { あのー } 予測可能な範囲で { えー } 見通せるものについてはかなりはっきりと書かせていただいて ( い ) るつもりでございます。
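
The alignment between the faithful transcript and the official minutes can be made concrete with a short Python sketch. It assumes a reading of the markup that is inferred from the sample and not stated on the slide: `{X}` is spoken text dropped from the minutes (typically a filler), `{A/B}` is spoken form A replaced by B in the minutes, and `(X)` is text added in the minutes but not spoken. The function names are hypothetical.

```python
import re

BRACE = re.compile(r"\{\s*([^{}]*?)\s*\}")   # {filler} or {spoken/written}
PAREN = re.compile(r"\(\s*([^()]*?)\s*\)")   # (text present only in minutes)

def spoken_form(text):
    """Recover what was actually said, under the assumed markup."""
    text = BRACE.sub(lambda m: m.group(1).split("/")[0].strip(), text)
    text = PAREN.sub("", text)
    return re.sub(r"\s+", "", text)  # Japanese needs no inter-word spaces

def minutes_form(text):
    """Recover the cleaned, minutes-style text, under the assumed markup."""
    def brace(m):
        parts = m.group(1).split("/")
        # {A/B}: keep the written form B; {X}: drop the filler entirely.
        return parts[1].strip() if len(parts) == 2 else ""
    text = BRACE.sub(brace, text)
    text = PAREN.sub(lambda m: m.group(1), text)
    return re.sub(r"\s+", "", text)

# Two fragments copied from the sample above.
sample_a = "{ まあ } あなたが言うと本当にうそらしくなる { んで / ので }{ ですね、えー } もう少し"
sample_b = "書かせていただいて ( い ) るつもりでございます。"

print(spoken_form(sample_a))
print(minutes_form(sample_a))
print(spoken_form(sample_b))
print(minutes_form(sample_b))
```

Keeping both forms in one annotated string is what lets a single corpus serve two purposes: the spoken form for training recognition models and the minutes form as the target of the clean-up step.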

Page 15

ASR modules oriented for Spontaneous Speech

[Figure: the ASR block diagram (Signal processing, Acoustic model, Language model, Dictionary, Recognition engine (decoder); P(W|X) ∝ P(W) · P(X|W)), with the models built from the Corpus using innovative techniques. Callouts: "Cover pronunciation variations", "Cover poor articulation", "Cover disfluencies & colloquial expressions".]

Page 16

ASR Performance

• Accuracy
  – Word accuracy 85% (character accuracy 87%)
    • Plenary sessions 90%
    • Committee meetings 80~87%
  – 90% seems almost perfect
  – No commercial software can achieve this!!

• Real-time factor 1-3
  – Latency in 10 min.
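
The slide does not define its metrics, so the following Python sketch assumes the common definitions: accuracy = 1 − (substitutions + deletions + insertions) / reference length, computed by edit distance over words or over characters, and real-time factor = processing time / audio duration. The example sentences and timings are invented.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def accuracy(ref, hyp):
    """1 - (S + D + I) / N, with N the reference length."""
    return 1.0 - edit_distance(ref, hyp) / len(ref)

def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by audio duration."""
    return processing_seconds / audio_seconds

# Hypothetical reference and recognizer output: one wrong word.
ref_words = ["京都", "の", "天気", "は", "晴れ"]
hyp_words = ["京都", "の", "電気", "は", "晴れ"]

word_acc = accuracy(ref_words, hyp_words)
char_acc = accuracy("".join(ref_words), "".join(hyp_words))
print(f"word accuracy: {word_acc:.3f}, character accuracy: {char_acc:.3f}")
print(f"RTF: {real_time_factor(600.0, 300.0):.1f}")
```

In this toy case one wrong word costs a fifth of the word score but only one character in eight, which is one reason a character accuracy figure can sit above the word accuracy figure for the same output.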

Page 17

Related Techniques

• Noise suppression & dereverberation
  – Not serious once matched training data is available

• Speaker change detection
  – Preferred
  – Current technology level seems insufficient

• Auto-edit
  – Filler removal: easy
  – Colloquial expression replacement: non-trivial
  – Period insertion: still at the research stage

Page 18

Contents

1. Review of ASR technology

2. ASR system for the Japanese Diet

3. Next-generation transcription system of the Japanese Diet

Page 19

The House of Representatives in Japan

• 2005: terminated recruiting stenographers

• 2006: investigated ASR technology for the new transcription system

• 2007: developed a prototype system and made preliminary evaluations

• 2008: system design

• 2009: system implementation

• 2010: trial and deployment

Page 20

ASR system: Kyoto Univ. model integrated to NTT engine

[Figure: the ASR block diagram (Signal processing, Acoustic model P(X|P), Dictionary P(P|W), Language model P(W), Recognition engine (decoder); P(W|X) ∝ P(W) · P(X|W)), with parts of the system labeled "NTT Corp.", "Kyoto Univ." and "House".]

Page 21

Issues in Post-Editor

• For efficient correction of ASR errors and cleaning the transcript into document style

• Easy reference to original speech (+video)
  – By time, by utterance, by character (cursor)
  – Can speed up & slow down speech replay

• Word-processor interface (screen editor), not a line editor
  – To concentrate on making correct sentences
  – Serious misunderstanding between system developers and stenographers!!

Page 22

System Evaluation (@Kyoto)

• Subjects: 18 students
• Post-editing ASR outputs is more efficient than typing from scratch, regardless of the accuracy
  – Segments that are hard for ASR are also hard for humans

[Figure: edit time (min., axis 3 to 10) against ASR accuracy (50 to 95%), for two conditions: "Type from scratch" and "Post-edit ASR output".]

Page 23

System Evaluation (@Kyoto)

• Subjective evaluation correlates with ASR accuracy
• Threshold at 75% accuracy for ASR to be preferred

[Figure: usability score of ASR (axis 1 to 7) against ASR accuracy (50 to 95%).]

Page 24

System Evaluation (@House)

• Subjects: 8 stenographers

• System: prototype

• ASR-based system reduced the edit time, compared with the current shorthand system
  – 78 min. → 68 min. (for a 5 min. segment)

• Threshold in ASR accuracy of 80%
  – At 75%: degradation in edit time; half of the subjects are negative about using ASR
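
The edit-time figures above (78 min. and 68 min. for a 5 min. segment) can be restated as a relative saving and as editing effort per minute of speech. The short Python sketch below only rearranges the slide's three numbers; the function and constant names are hypothetical.

```python
SEGMENT_MIN = 5.0          # length of the speech segment (from the slide)
SHORTHAND_EDIT_MIN = 78.0  # edit time with the current shorthand system
ASR_EDIT_MIN = 68.0        # edit time with the ASR-based prototype

def relative_reduction(before, after):
    """Fraction of the original time that was saved."""
    return (before - after) / before

def effort_ratio(edit_min, speech_min):
    """Minutes of editing needed per minute of speech."""
    return edit_min / speech_min

saving = relative_reduction(SHORTHAND_EDIT_MIN, ASR_EDIT_MIN)
print(f"time saved: {saving:.1%}")
print(f"shorthand: {effort_ratio(SHORTHAND_EDIT_MIN, SEGMENT_MIN):.1f} min per min of speech")
print(f"ASR-based: {effort_ratio(ASR_EDIT_MIN, SEGMENT_MIN):.1f} min per min of speech")
```

The saving is 10 minutes out of 78, roughly 13%, so even with ASR the editing effort remains more than thirteen times the duration of the speech itself.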

Page 25

Side effect of ASR-based system

• Everything (text/speech/video) digitized and hyper-linked → efficient search & retrieval

• Less burden? May work on longer segments??

• Significantly less special training needed compared with the current shorthand system

Page 26

Conclusions

• ASR of parliamentary meetings is feasible, given a large collection of data
  – ~100 hours of speech
  – ~1G words of text (minutes)
  – Accuracy 85-90%

• Effective post-processing is still under investigation

• Automatic translation research is also ongoing