Spoken Language Systems: The Unfinished Agenda Raj Reddy School of Computer Science Carnegie Mellon University Pittsburgh September 21, 2006 The entire 67MB talk with video clips can be downloaded from http://www.rr.cs.cmu.edu/icslp.zip

Spoken Language Systems: The Unfinished Agenda

Raj Reddy
School of Computer Science
Carnegie Mellon University

Pittsburgh
September 21, 2006

The entire 67MB talk with video clips can be downloaded from http://www.rr.cs.cmu.edu/icslp.zip

Speech Language Systems

• Objective: Recognize, interpret, execute and respond to spoken language input to computer

• Background:
  – ATT, CMU, IBM, and MIT working on the problem for over 40 years
  – Other Key Contributors: BBN, Dragon Systems, Kurzweil, SRI, Japan Inc., Europe Inc.
  – Research and Development Level of Effort: About $200 million/year worldwide

• Long Term Goal : Make speech the preferred mode of communication to computers

Why Speech Processing Has Been Difficult?

• Too Many Sources of Variability
• Noise
• Microphones
• Speakers
• Different Speech Sounds
• Different Pronunciations
• Non Grammaticality
• Imprecision of Language

Why Speech Recognition Has Been Difficult? (Cont)

• And Many Sources of Knowledge
  – Acoustics
  – Phonetics and Phonology
  – Lexical Information
  – Syntax
  – Semantics
  – Context
  – Task Dependent Knowledge

Landmarks

• Dragon Dictate and Naturally Speaking
• IBM Via Voice dictation
• Nuance-based Tellme 800 services allow voice query for directory information, stocks, sports, news, weather, and horoscopes
• Microsoft Speech Server, e.g., voice dialing

Need for Interdisciplinary Teams

• Signal Processing
  – Fourier Transforms, DFT, FFT
• Acoustics
  – Physics of sounds & speech
  – Vocal tract model
• Phonetics and Linguistics
  – Sounds (Acoustic-Phonetics)
  – Words (Lexicon)
  – Grammar (Syntax)
  – Meaning (Semantics)
• Statistics
  – Probability Theory
  – Hidden Markov Models
  – Clustering
  – Dynamic Programming
• AI and Pattern Recognition
  – Knowledge Representation and Search
  – Approximate Matching
  – Natural Language Processing
• Human Computer Interaction
  – Cognitive Science
  – Design
  – Social Networks
• Computer Science
  – Hardware, Parallel Systems
  – Algorithms Optimization

The Unfinished Agenda

• Technical
• Application specific
• Societal

Technical Challenges

• Unrehearsed Spontaneous Speech
• Non Native Speakers of English
• Dynamic Learning from Sparse Data
  – New Words
  – New Speakers
  – New Grammatical Forms
  – New Languages
• No Silver Bullet on the Horizon!
• 50 more years?
  – Million times greater computational power, memory and bandwidth?

One Application Specific Challenge: The Million Book Digital Library Project

The Grand Challenge of Digital Libraries

Create Access to

• All published works online

• Instantly available

• In any language

• Anywhere in the world

• Searchable, browsable, navigable

• By humans and machines

One Step at a Time…

• Million Book DL
  – Only about 1% of all the world’s books

• Harvard University 12M

• Library of Congress 30M

• OCLC catalog 42M

• All Multilingual Books ~100M

• At the rate of digitization of the last decade it would take 100 years!

Million Book Project: Issues

• Time
  – At one page per second (20,000 pages per day shift), it will take 100 years (200 working days per year) to scan a million books of 400 pages each
• Cost
  – 100M books at US$100 per book would cost $10B
  – Even in India and China the cost will be $1B
  – The annual cost is currently expected to be close to $10M per year with support from US, India and China.
• Selection
  – Selection of appropriate books for scanning is time consuming and expensive
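
The time and cost figures on this slide follow from simple arithmetic. The sketch below reproduces them in Python using only the numbers quoted above. The variable names are illustrative, and the per-book cost for India and China is inferred from the $1B total rather than stated in the talk.

```python
# Back-of-the-envelope check of the scanning-time and cost figures.
# All inputs are the numbers quoted on the slide; names are illustrative.

PAGES_PER_BOOK = 400
BOOKS_TO_SCAN = 1_000_000
PAGES_PER_DAY_SHIFT = 20_000      # roughly one page per second for one shift
WORKING_DAYS_PER_YEAR = 200

total_pages = PAGES_PER_BOOK * BOOKS_TO_SCAN
working_days = total_pages / PAGES_PER_DAY_SHIFT
years_single_scanner = working_days / WORKING_DAYS_PER_YEAR

ALL_BOOKS = 100_000_000
COST_PER_BOOK_USD = 100
total_cost_usd = ALL_BOOKS * COST_PER_BOOK_USD

# Inferred, not stated in the talk: a $1B total implies about $10 per book.
implied_low_cost_per_book_usd = 1_000_000_000 / ALL_BOOKS

print(f"Years for one scanner: {years_single_scanner:.0f}")
print(f"Cost at $100 per book: ${total_cost_usd / 1e9:.0f}B")
print(f"Implied cost per book at $1B total: ${implied_low_cost_per_book_usd:.0f}")
```

The 100-year figure applies to a single scanning station working one shift, so running many stations in parallel shortens the calendar time proportionally.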

Million Book Project: Issues (cont)

• Logistics
  – Each container holds 10,000 to 20,000 books. Shipping and handling costs about $10,000
• Meta Data
  – Accessing and/or creating Meta data requires professionals trained in Library science
• Optical Character Recognition Technology
  – Essential for searching, translation and summarization
  – Many languages don’t have OCR
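
The logistics figures can be turned into a rough per-book shipping cost. The sketch below assumes the $10,000 shipping and handling figure applies to one container, which the slide implies but does not state outright. The one-million-book target is taken from the project name, and all variable names are illustrative.

```python
# Rough per-book shipping cost, ASSUMING the $10,000 figure is per container.

SHIPPING_COST_PER_CONTAINER_USD = 10_000   # assumption: cost is per container
MIN_BOOKS_PER_CONTAINER = 10_000
MAX_BOOKS_PER_CONTAINER = 20_000
TARGET_BOOKS = 1_000_000                   # from the project name

# Fewer books per container means a higher cost per book.
max_cost_per_book = SHIPPING_COST_PER_CONTAINER_USD / MIN_BOOKS_PER_CONTAINER
min_cost_per_book = SHIPPING_COST_PER_CONTAINER_USD / MAX_BOOKS_PER_CONTAINER

# Containers needed if every one of the million books had to be shipped.
min_containers = TARGET_BOOKS // MAX_BOOKS_PER_CONTAINER
max_containers = TARGET_BOOKS // MIN_BOOKS_PER_CONTAINER

print(f"Shipping per book: ${min_cost_per_book:.2f} to ${max_cost_per_book:.2f}")
print(f"Containers for a million books: {min_containers} to {max_containers}")
```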

Million Book Project: Status

• 18 Centers in India

• 22 centers in China

• 1 Center in Egypt

• Planned : Australia and Europe

• Over 200,000 books scanned
  – Over 50,000 accessible on the web
  – Uses 4TB of storage
  – 10 TB server at CMU Library
  – 500,000 books by the end of 2006
  – Capacity to scan a million pages a day
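
To put the stated capacity in perspective, the sketch below estimates how long the rest of the million books would take. It assumes the 400-page average book from the Issues slide and that the full million-pages-a-day capacity is used every day, which the talk does not claim. Variable names are illustrative.

```python
# Rough time to reach a million books at the stated scanning capacity.
# ASSUMPTIONS: 400 pages per book, capacity fully used every scanning day.

PAGES_PER_BOOK = 400                 # average quoted on the Issues slide
CAPACITY_PAGES_PER_DAY = 1_000_000   # stated capacity across all centers
BOOKS_SCANNED = 200_000              # "over 200,000 books scanned"
TARGET_BOOKS = 1_000_000

books_per_day = CAPACITY_PAGES_PER_DAY / PAGES_PER_BOOK
fraction_done = BOOKS_SCANNED / TARGET_BOOKS
remaining_books = TARGET_BOOKS - BOOKS_SCANNED
scanning_days_remaining = remaining_books / books_per_day

print(f"Books per day at full capacity: {books_per_day:.0f}")
print(f"Share of target already scanned: {fraction_done:.0%}")
print(f"Scanning days remaining: {scanning_days_remaining:.0f}")
```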

Title: Rig Veda
Author: Pandit Sriram Sharma Acharya
Language: Sanskrit
Subject: Philosophy
Publisher: Sanskriti Sansthan Bareli
Year:
Abstract: Rig Veda is the oldest of the Vedas. The Rig Veda is the oldest book in Sanskrit or any Indo-European language. Many great Yogis and scholars who have understood the astronomical references in the hymns date the Rig Veda as before 4000 B.C., perhaps as early as 12,000. Modern western scholars date it around 1500 B.C., though recent archaeological finds in India (like Dwaraka) now appear to require a much earlier date.

Title: Elementary Treatise on the Wave-Theory of Light
Author: Humphrey Lloyd, D.D., D.C.L.
Language: English
Subject: Physics
Publisher: Longmans, Green & Co
Year: 1873
Abstract: This book deals with the various aspects of the wave theory of light. It is a critical work which contains an analytical discussion of the most recent researches in Optics. It presents a clear and connected view of the subject.

Title: Mudalayiram Mulamum
Author: Periya Jeeyar
Language: Tamil
Subject: Religion
Publisher: Sri Vaishnava Sampirathaya Sanjeevikiri Sabayai
Year: 1909
Abstract: This volume is written in Tamil. It provides a detailed account of the origin of Vaishnava and is written by Periya Jeeyar.

Title: Gulzar-A-Badesha
Author: Khader Badesha
Language: Urdu
Subject: Literature
Publisher: Namipress, Chennai
Year: 1919
Abstract: Literature

Title: Jawahar Ali Joyviyah
Author: Dr. Ilyas lomas
Language: Arabic
Subject: Metrology
Publisher: Bakri and Issa
Year: 1876
Abstract: It is a book on Metrology, a study of measurements.

Title: Structure Des Molecules
Author: Victor Henri
Language: French
Subject: Chemistry
Publisher: Taylor and Francis
Year: 1925
Abstract: This is a unique book that explicates, in detail, the structure of molecules and touches upon certain specific characteristics of molecules with particular reference to Benzene.

Million Book Project: Research Challenges

• Providing Access to Billions every day
  – Distributed Cached Servers in every country and region

• Easy to use interfaces for Billions

• Multilingual Information Retrieval

• Translation

• Summarization

• Reading Assistant using Multilingual Speech Synthesis and Translation (e.g. for newspaper DL)

Bringing the World Closer: Robust Communication among the People of the World

Vision

• Preservation of minority languages, cultures and heritage
• Study of Human Language including
  – Translation
  – Summarization
  – Speech
  – Search
• Facilitate the use of ICT in languages other than English
  – In communication among uneducated people of the world
  – In commerce
  – Search and access to knowledge across all languages
• Globalization requires cross-border and cross-language communication
• Eliminate cultural and social barriers
• Language barriers can significantly slow down economic growth
• Access to rare (and potentially beneficial) knowledge requires eliminating the language divide

Research Agenda: What we must do

• Create technologies and solutions for overcoming the language barrier

• Create toolkits for rapid acquisition of new language capabilities
  – Character codes, optical character recognition, speech recognition, speech synthesis, translation, search engines, text mining, summarization, language tutoring, etc.

• Capture data, information and knowledge from masses

• Make fundamental advances in language processing algorithms, e.g.,
  – Deal with 1000 times more data
  – Conceptual advance in semantic information retrieval

The Research Plan: How we will do it

• Analogy to Human Genome Project
• Meticulous core-science based fundamentals
• Researcher toolkits for known methodologies
• Architecture supporting diversity of methodologies
• Long planning horizon to support development of novel and radical approaches

• Quantitative evaluation against a standard of steadily accumulating improvements in performance