introduction to computational...

32
Introduction to Computational Linguistics PD Dr. Frank Richter (all slides provided by Prof. Dr. Erhard W. Hinrichs) [email protected]. Seminar f ¨ ur Sprachwissenschaft Eberhard-Karls-Universit ¨ at T ¨ ubingen Germany NLP Intro – WS 2005/6 – p.1

Upload: others

Post on 08-Jan-2020

14 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

Introduction to ComputationalLinguisticsPD Dr. Frank Richter

(all slides provided by Prof. Dr. Erhard W. Hinrichs)

[email protected].

Seminar fur Sprachwissenschaft

Eberhard-Karls-Universitat Tubingen

Germany

NLP Intro – WS 2005/6 – p.1

Page 2: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

Definition of CL (1a)

Computational linguistics is the scientific studyof language from a computational perspective.

Computational linguists are interested inproviding computational models of various kindsof linguistic phenomena. These models may be"knowledge-based" ("hand-crafted") or "data-driven" ("statistical" or "empirical").

NLP Intro – WS 2005/6 – p.2

Page 3: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

Definition of CL (1b)

Work in computational linguistics is in some casesmotivated from a scientific perspective in that oneis trying to provide a computational explanation fora particular linguistic or psycholinguistic phenomenon;and in other cases the motivation may be more purelytechnological in that one wants to provide a workingcomponent of a speech or natural language system.

http://www.aclweb.org/archive/what.html

NLP Intro – WS 2005/6 – p.3

Page 4: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

Definition of CL (2)

Computational linguistics is the application of linguistictheories and computational techniques to problems ofnatural language processing.

http://www.ba.umist.ac.uk/public/

departments/registrars/academicoffice/

uga/lang.htm

NLP Intro – WS 2005/6 – p.4

Page 5: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

Definition of CL (3)

Computational linguistics is the science of languagewith particular attention given to the processingcomplexity constraints dictated by the human cognitivearchitecture. Like most sciences, computationallinguistics also has engineering applications.

http://www.cs.tcd.ie/courses/csll/

CSLLcourse.html

NLP Intro – WS 2005/6 – p.5

Page 6: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

Definition of CL (4)

Computational linguistics is the study of computersystems for understanding and generating naturallanguage.

Ralph Grishman, Computational

Linguistics: An Introduction,

Cambridge University Press 1986.

NLP Intro – WS 2005/6 – p.6

Page 7: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

Two Approaches in CL

Rule-Based SystemsExplicit encoding of linguistic knowledgeUsually consisting of a set of hand-crafted,grammatical rulesEasy to test and debugRequire considerable human effortOften based on limited inspection of the data with anemphasis on prototypical examplesOften fail to reach sufficient domain coverageOften lack sufficient robustness when input data arenoisy

NLP Intro – WS 2005/6 – p.7

Page 8: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

Two Approaches in CL

Data-Driven SystemsImplicit encoding of linguistic knowledgeOften using statistical methods or machine learningmethodsRequire less human effortAre data-driven and require large-scale data sourcesAchieve coverage directly proportional to therichness of the data sourceAre more adaptive to noisy data

NLP Intro – WS 2005/6 – p.8

Page 9: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

Central Goal of the Field

build psychologically adequate models of humanlanguage processing capabilities on the basis ofknowledge about the way in which humans acquire,store, and process language.

build functionally correct models of human languageprocessing capabilities on the basis of knowledge aboutthe world and about language elicited from people andstored in the system.

NLP Intro – WS 2005/6 – p.9

Page 10: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

Application Areas

machine translation

speech recognition

speech synthesis

man-machine interfaces

NLP Intro – WS 2005/6 – p.10

Page 11: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

Application Areas

intelligent word processing: spelling correction,grammar correction

document managementfind relevant documents in collectionsestablish authorship of documentscatch plagiarismextract information from documentsclassify documentssummarize documentssummarize document collections

NLP Intro – WS 2005/6 – p.11

Page 12: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

A bit of Philosophy of Science

Theory:A set of statements that determine the format andsemantics of descriptions of phenomena in the purviewof the theory

Methodology:An effective theory comes with an explicit methodologyfor acquiring these descriptions

Application:A theory associated with a methodology can be appliedto tasks for which the methodology is appropriate.

NLP Intro – WS 2005/6 – p.12

Page 13: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

Scientific Strategies

Method Oriented Approach:devise or import a tool, a procedure or a formalism,apply it to a task and develop it further. Then(optionally) see whether it works for additional tasks

Task oriented Approach:select a task; devise or import a method or severalmethods for its solution; integrate the methods asrequired to improve performance.

NLP Intro – WS 2005/6 – p.13

Page 14: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

Machine Translation

What makes Machine Translation an important applicationarea to study:

historically first application area, and for at least adecade the only application area, of computationallinguistics

requires all steps relevant to linguistic analysis of inputsentences and linguistic generation of output sentences

hence, machine translation is scientifically one of themost challenging and most comprehensive tasks incomputational linguistics

NLP Intro – WS 2005/6 – p.14

Page 15: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

Machine Translation

What makes Machine Translation an important applicationarea to study:

historically first application area, and for at least adecade the only application area, of computationallinguistics

requires all steps relevant to linguistic analysis of inputsentences and linguistic generation of output sentences

hence, machine translation is scientifically one of themost challenging and most comprehensive tasks incomputational linguistics

NLP Intro – WS 2005/6 – p.14

Page 16: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

Machine Translation

What makes Machine Translation an important applicationarea to study:

historically first application area, and for at least adecade the only application area, of computationallinguistics

requires all steps relevant to linguistic analysis of inputsentences and linguistic generation of output sentences

hence, machine translation is scientifically one of themost challenging and most comprehensive tasks incomputational linguistics

NLP Intro – WS 2005/6 – p.14

Page 17: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

The Purposes of Translation

Information Acquisition:e.g. Gather information on scientific articles ornewspapers written in a foreign language.

Information Dissemination:e.g. Translation of technical manuals, legal texts,weather reports, etc.

Literary Translation:e.g. Translation of novels, poems, etc.

NLP Intro – WS 2005/6 – p.15

Page 18: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

The Purposes of Translation

Information Acquisition:e.g. Gather information on scientific articles ornewspapers written in a foreign language.

Information Dissemination:e.g. Translation of technical manuals, legal texts,weather reports, etc.

Literary Translation:e.g. Translation of novels, poems, etc.

NLP Intro – WS 2005/6 – p.15

Page 19: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

The Purposes of Translation

Information Acquisition:e.g. Gather information on scientific articles ornewspapers written in a foreign language.

Information Dissemination:e.g. Translation of technical manuals, legal texts,weather reports, etc.

Literary Translation:e.g. Translation of novels, poems, etc.

NLP Intro – WS 2005/6 – p.15

Page 20: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

Relating Translation Purposes to MT

Information Acquisition:involves translation from a foreign to a nativelanguage

typically used by non-linguists with little or nolinguistic competence in the source languagepre-processing of the input not feasible due to lack oflinguistic competence by the user in the sourcelanguagemay require special-purpose lexicalow-quality translation is tolerable

NLP Intro – WS 2005/6 – p.16

Page 21: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

Relating Translation Purposes to MT

Information Acquisition:involves translation from a foreign to a nativelanguagetypically used by non-linguists with little or nolinguistic competence in the source language

pre-processing of the input not feasible due to lack oflinguistic competence by the user in the sourcelanguagemay require special-purpose lexicalow-quality translation is tolerable

NLP Intro – WS 2005/6 – p.16

Page 22: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

Relating Translation Purposes to MT

Information Acquisition:involves translation from a foreign to a nativelanguagetypically used by non-linguists with little or nolinguistic competence in the source languagepre-processing of the input not feasible due to lack oflinguistic competence by the user in the sourcelanguage

may require special-purpose lexicalow-quality translation is tolerable

NLP Intro – WS 2005/6 – p.16

Page 23: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

Relating Translation Purposes to MT

Information Acquisition:involves translation from a foreign to a nativelanguagetypically used by non-linguists with little or nolinguistic competence in the source languagepre-processing of the input not feasible due to lack oflinguistic competence by the user in the sourcelanguagemay require special-purpose lexica

low-quality translation is tolerable

NLP Intro – WS 2005/6 – p.16

Page 24: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

Relating Translation Purposes to MT

Information Acquisition:involves translation from a foreign to a nativelanguagetypically used by non-linguists with little or nolinguistic competence in the source languagepre-processing of the input not feasible due to lack oflinguistic competence by the user in the sourcelanguagemay require special-purpose lexicalow-quality translation is tolerable

NLP Intro – WS 2005/6 – p.16

Page 25: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

Relating Translation Purposes to MT(2)

Information Dissemination:involves translation from a native to a foreignlanguage

pre- and post-processing of the input feasible due tolinguistic competence by the translator in the sourcelanguagemay involve sublanguage with restricted vocabulary;e.g. translation of weather reportsoften involves special terminologies stored in aterminology database; e.g. for translation oftechnical manualspurely human translation for such tasks can betime-consuming, inconsistent, or tedious.

NLP Intro – WS 2005/6 – p.17

Page 26: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

Relating Translation Purposes to MT(2)

Information Dissemination:involves translation from a native to a foreignlanguagepre- and post-processing of the input feasible due tolinguistic competence by the translator in the sourcelanguage

may involve sublanguage with restricted vocabulary;e.g. translation of weather reportsoften involves special terminologies stored in aterminology database; e.g. for translation oftechnical manualspurely human translation for such tasks can betime-consuming, inconsistent, or tedious.

NLP Intro – WS 2005/6 – p.17

Page 27: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

Relating Translation Purposes to MT(2)

Information Dissemination:involves translation from a native to a foreignlanguagepre- and post-processing of the input feasible due tolinguistic competence by the translator in the sourcelanguagemay involve sublanguage with restricted vocabulary;e.g. translation of weather reports

often involves special terminologies stored in aterminology database; e.g. for translation oftechnical manualspurely human translation for such tasks can betime-consuming, inconsistent, or tedious.

NLP Intro – WS 2005/6 – p.17

Page 28: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

Relating Translation Purposes to MT(2)

Information Dissemination:involves translation from a native to a foreignlanguagepre- and post-processing of the input feasible due tolinguistic competence by the translator in the sourcelanguagemay involve sublanguage with restricted vocabulary;e.g. translation of weather reportsoften involves special terminologies stored in aterminology database; e.g. for translation oftechnical manuals

purely human translation for such tasks can betime-consuming, inconsistent, or tedious.

NLP Intro – WS 2005/6 – p.17

Page 29: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

Relating Translation Purposes to MT(2)

Information Dissemination:involves translation from a native to a foreignlanguagepre- and post-processing of the input feasible due tolinguistic competence by the translator in the sourcelanguagemay involve sublanguage with restricted vocabulary;e.g. translation of weather reportsoften involves special terminologies stored in aterminology database; e.g. for translation oftechnical manualspurely human translation for such tasks can betime-consuming, inconsistent, or tedious.

NLP Intro – WS 2005/6 – p.17

Page 30: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

Relating Translation Purposes to MT(3)

Literary Translationrequires stylistic elegance, often involvesmetaphorical and metonymic language

abundance of highly-trained human translatorstask rarely performed by machine translation

NLP Intro – WS 2005/6 – p.18

Page 31: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

Relating Translation Purposes to MT(3)

Literary Translationrequires stylistic elegance, often involvesmetaphorical and metonymic languageabundance of highly-trained human translators

task rarely performed by machine translation

NLP Intro – WS 2005/6 – p.18

Page 32: Introduction to Computational Linguisticsenglish-linguistics.de/fr/teaching/ws05-06/icl/slides/lecture2.pdf · Computational linguistics is the study of computer systems for understanding

Relating Translation Purposes to MT(3)

Literary Translationrequires stylistic elegance, often involvesmetaphorical and metonymic languageabundance of highly-trained human translatorstask rarely performed by machine translation

NLP Intro – WS 2005/6 – p.18