introduction to machine translation csc 5930 machine translation fall 2012 dr. tom way 1

Introduction to Machine Translation

CSC 5930 Machine Translation

Fall 2012 Dr. Tom Way

HISTORY OF MACHINE TRANSLATION

History of Machine Translation (based on work by John Hutchins, mt-archive.info)

• Before the computer: in the mid-1930s, Georges Artsrouni, a French-Armenian, and Petr Troyanskii, a Russian, applied for patents for ‘translating machines’.

• The pioneers (1947-1954): the first public MT demonstration was given in 1954 by IBM and Georgetown University.

• Machine translation was one of the first applications envisioned for computers.

History of MT (2)

Warren Weaver, PhD, was an American scientist, mathematician, and science administrator. He is widely recognized as one of the pioneers of machine translation and as an important figure in building support for science in the United States.

History of MT (3)

MT was first demonstrated publicly by IBM (with Georgetown University) in 1954, using a basic word-for-word translation system.
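The slide does not list the words or rules of the 1954 system, so this is not a reconstruction of it. It is a minimal sketch of the general word-for-word idea, assuming a hypothetical toy Spanish-to-English dictionary; `TOY_DICTIONARY` and `word_for_word` are illustrative names only. Each source word is replaced independently, which shows why this approach breaks down when two languages order words differently.

```python
# Hypothetical toy bilingual dictionary (Spanish -> English); NOT taken from the 1954 system.
TOY_DICTIONARY = {
    "la": "the",
    "casa": "house",
    "blanca": "white",
    "es": "is",
    "grande": "big",
}


def word_for_word(sentence: str, dictionary: dict[str, str]) -> str:
    """Translate by substituting each source word with a single dictionary entry.

    Unknown words are copied through unchanged, and word order is never altered.
    """
    words = sentence.lower().split()
    return " ".join(dictionary.get(word, word) for word in words)


if __name__ == "__main__":
    print(word_for_word("La casa blanca es grande", TOY_DICTIONARY))
    # -> the house white is big
```

Because the lookup never reorders words, "la casa blanca" comes out as "the house white"; fixing that requires rules or statistical models that look beyond single words.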

History of MT (4)

• The decade of optimism (1954-1966) ended with the…

• ALPAC (Automatic Language Processing Advisory Committee) report in 1966: “There is no immediate or predictable prospect of useful machine translation.”

History of MT (5)

The ALPAC (Automatic Language Processing Advisory Committee) was a U.S. government committee of seven scientists.

Its 1966 report was highly skeptical of the progress in computational linguistics and machine translation.

The ALPAC Report

History of MT (6)

• The aftermath of the ALPAC report…

• In the United States, research on machine translation virtually stopped from 1966 to 1980.

History of MT (7)

• Then, a rebirth…

• The 1980s: interlingua-based and example-based machine translation

• The 1990s: statistical MT

• The 2000s: hybrid MT

• The 2010s: Google, real-time and mobile translation, crowdsourcing, more hybrid approaches

MACHINE TRANSLATION TODAY

Where are we now?

• Huge potential and need, driven by the internet, globalization, and international politics.

• Quick development time, thanks to statistical machine translation (SMT), the availability of parallel data, and computing power.

• Translation quality is reasonable for language pairs with large amounts of resources.

• Systems are starting to include more “minor” languages.

Rule-based MT

The Vauquois Triangle
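The Vauquois triangle describes rule-based MT as three stages: analysis of the source text, transfer to a target-language representation, and generation of the target text. Direct translation sits at the base of the triangle, syntactic and semantic transfer sit higher up, and an interlingua sits at the apex, where no transfer step is needed. The following is a minimal sketch of syntactic-level transfer, assuming a hypothetical five-word English-to-Spanish lexicon and a single reordering rule (adjective + noun becomes noun + adjective); `LEXICON`, `analyze`, `transfer`, `generate`, and `translate` are illustrative names, not part of any real system.

```python
# Hypothetical English -> Spanish lexicon: word -> (part of speech, Spanish form).
LEXICON = {
    "the": ("DET", "la"),
    "white": ("ADJ", "blanca"),
    "house": ("NOUN", "casa"),
    "is": ("VERB", "es"),
    "big": ("ADJ", "grande"),
}


def analyze(sentence: str) -> list[tuple[str, str]]:
    """Analysis: tag each source word with its part of speech."""
    return [(word, LEXICON[word][0]) for word in sentence.lower().split()]


def transfer(tagged: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Transfer: apply one structural rule (ADJ NOUN -> NOUN ADJ),
    then substitute the target-language word for each source word."""
    reordered = list(tagged)
    i = 0
    while i < len(reordered) - 1:
        if reordered[i][1] == "ADJ" and reordered[i + 1][1] == "NOUN":
            reordered[i], reordered[i + 1] = reordered[i + 1], reordered[i]
            i += 2  # skip past the pair we just swapped
        else:
            i += 1
    return [(LEXICON[word][1], pos) for word, pos in reordered]


def generate(target: list[tuple[str, str]]) -> str:
    """Generation: produce the target-language surface string."""
    return " ".join(word for word, _ in target)


def translate(sentence: str) -> str:
    return generate(transfer(analyze(sentence)))


if __name__ == "__main__":
    print(translate("The white house is big"))
    # -> la casa blanca es grande
```

Unlike plain word-for-word substitution, the transfer rule moves the adjective after the noun. A real system would need far larger grammars, for example to make adjectives agree in gender with their nouns; here `blanca` is simply stored in its feminine form, and an unknown word raises `KeyError`.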

Statistical MT

The Rosetta Stone
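Like the Rosetta Stone, which carries the same text in more than one script, statistical MT learns from parallel text. As background, and not necessarily the notation used in this course, the noisy-channel formulation behind the classic IBM statistical models chooses the English sentence $e$ that best explains a foreign sentence $f$. Bayes' rule splits this into a language model $P(e)$ and a translation model $P(f \mid e)$, and $P(f)$ can be dropped because it does not depend on $e$:

```latex
\hat{e} = \arg\max_{e} P(e \mid f)
        = \arg\max_{e} \frac{P(f \mid e)\, P(e)}{P(f)}
        = \arg\max_{e} P(f \mid e)\, P(e)
```

The translation model $P(f \mid e)$ is estimated from parallel text, while the language model $P(e)$ can be estimated from monolingual English text alone.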

What is MT good for?

• Rough translation (e.g., of web data)

• Computer-aided human translation

• Translation for limited domains

• Cross-lingual information retrieval (IR)

• Machines beat humans at:
  – Speed: much faster than humans
  – Memory: can easily memorize millions of word/phrase translations
  – Manpower: machines are much cheaper than humans
  – Fast learner: it takes minutes or hours to build a new system
  – Never complain, never get tired, …

Interest in Machine Translation (1)

• Commercial interest:
  – The U.S. has invested in machine translation (MT) for intelligence purposes.
  – MT is popular on the web: it is the most used of Google’s special features.
  – The EU spends more than $1 billion on translation costs each year.
  – (Semi-)automated translation could lead to huge savings.

Interest in Machine Translation (2)

• Academic interest:
  – One of the most challenging problems in NLP research
  – Requires knowledge from many NLP subareas, e.g., lexical semantics, syntactic parsing, morphological analysis, statistical modeling, …
  – Being able to establish links between two languages allows for transferring resources from one language to another

Goals & Uses

• Translating

• Summarizing

• Communicating

• Pre-editing

• Grammar analysis

• Analyzing text

• Understanding text and images

DO WE REALLY NEED MACHINE TRANSLATION?

Languages on the Internet

Languages on Twitter

Languages in Los Angeles

Why do we need MT?

Why is MT hard?

• For example…

• Commercial system “Language Weaver” created in 2002

• Uses statistical techniques from cryptography and machine learning to acquire statistical models from human translations

• Sold in 2010 for $42.5 million
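As a rough illustration of acquiring a statistical model from human translations, the following sketch implements IBM Model 1, the simplest of the IBM word-alignment models, trained with expectation-maximization (EM). This is a textbook technique chosen for illustration, not a description of Language Weaver's actual system, which the slide does not detail. The three-sentence German-English corpus and the names `train_ibm_model1` and `best_translation` are hypothetical, and the NULL word and all other refinements of the full model are omitted.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(f | e) from (foreign_tokens, english_tokens) pairs with EM.

    Simplified IBM Model 1 sketch: no NULL word, uniform initialization.
    """
    foreign_vocab = {f for fs, _ in pairs for f in fs}
    uniform = 1.0 / len(foreign_vocab)
    t = defaultdict(lambda: uniform)  # t[(f, e)] starts uniform

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        # E-step: share each foreign word's count among the English words
        # in its sentence, in proportion to the current t(f | e).
        for fs, es in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalize expected counts into probabilities t(f | e).
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def best_translation(t, e):
    """Return the foreign word f with the highest t(f | e)."""
    candidates = {f: p for (f, e2), p in t.items() if e2 == e}
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    corpus = [
        (["das", "haus"], ["the", "house"]),
        (["das", "buch"], ["the", "book"]),
        (["ein", "buch"], ["a", "book"]),
    ]
    t = train_ibm_model1(corpus, iterations=10)
    for e in ["the", "house", "book", "a"]:
        print(e, "->", best_translation(t, e))
    # the -> das, house -> haus, book -> buch, a -> ein
```

No word alignments are given as input: EM infers from co-occurrence alone that `haus` pairs with `house`, because `das` is already better explained by `the` across the other sentences.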

“Language Weaver” SMT System: Arabic-to-English Comparison

Versions compared: v.2.0 (October 2003), v.2.4 (October 2004), v.3.0 (February 2005)