introduction to computational linguistics

Post on 05-Feb-2016


Introduction to Computational Linguistics

Misty Azara

Agenda

Introduction to Computational Linguistics (CL)

Common CL applications

Using CL in theoretical linguistics (computational modeling)

What is Computational Linguistics?

CL is interdisciplinary: Linguistics, Computer Science, Mathematics, Electrical Engineering, Psychology, Speech and Hearing Science

What is Computational Linguistics?

Computational Linguistics covers many areas

Essentially, CL is any task, model, algorithm, etc. that attempts to place any type of language processing (syntax, phonology, morphology, etc.) in a computational setting

Core Areas of CL: Machine Translation, Speech Recognition, Text-to-Speech, Natural Language Generation, Human-Computer Dialogs, Information Retrieval, Computational Modeling…

Machine Translation

Using computers to automate some or all of the process of translating from one language to another

Three general models or tasks:

Tasks for which a rough translation is adequate

Tasks where a human post-editor can be used to improve the output

Tasks limited to a small sublanguage

Machine Translation (cont.)

Linguistic knowledge is extremely useful in this area of CL

MT benefits from knowledge of language typology and language-specific linguistic information

Speech Recognition

Taking spoken language as input and outputting the corresponding text

Architecture

SR takes the source speech and produces “guesses” as to which words could correspond to the source via some type of acoustic model

The word with the highest probability is selected as the optimal candidate
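The selection step described above can be sketched in a few lines. This is only an illustration: the candidate words and their probabilities are invented, and the function name `best_candidate` is hypothetical, not part of any real recognizer.

```python
# Hypothetical acoustic-model output: candidate -> probability.
# Both the candidates and the numbers are made up for illustration.
candidates = {"recognize": 0.61, "wreck a nice": 0.27, "reckon eyes": 0.12}


def best_candidate(scores):
    """Return the candidate with the highest probability."""
    return max(scores, key=scores.get)


print(best_candidate(candidates))
```

A real system would score whole word sequences rather than isolated guesses, but the final choice is still the highest-scoring hypothesis.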

Why use SR?

Allow for hands-free human-computer interaction

Text-to-Speech

Taking text as input and outputting the corresponding spoken language

Three types of TTS

Articulatory- models the physiological characteristics of the vocal tract

Concatenative- uses pre-recorded segments to construct the utterance(s)

Three types of TTS (cont.)

Parametric/Formant- models the formant transitions of speech

[baj]

Why is TTS so difficult?

Spelling: through, rough (the same letters "ough" are pronounced differently)

Homographs: PERmit (n) vs. perMIT (v)

Prosody: pitch, duration of segments, phrasing of segments, intonational tune, emotion

“I am so angry at you. I have never been more enraged in my life!!”
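One way to picture the PERmit / perMIT problem is as a lookup that needs the part of speech as well as the spelling. The sketch below assumes a tiny hand-built lexicon; `LEXICON` and `stressed_form` are hypothetical names, and a real TTS system would also need a tagger to supply the part of speech.

```python
# Hypothetical mini-lexicon: (spelling, part of speech) -> stressed form.
LEXICON = {
    ("permit", "noun"): "PERmit",
    ("permit", "verb"): "perMIT",
}


def stressed_form(word, pos):
    """Look up the stress pattern for a word given its part of speech.

    Falls back to the plain spelling when the pair is not listed.
    """
    return LEXICON.get((word.lower(), pos), word)


print(stressed_form("permit", "noun"))
print(stressed_form("permit", "verb"))
```

The spelling alone is not enough: the same six letters map to two different pronunciations, so the system has to analyze the sentence before it can speak the word.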

Why use TTS?

Allows for text to be read automatically

Extremely useful for the visually impaired

Natural Language Generation

Constructing linguistic outputs from non-linguistic inputs

Natural Language Generation

Maps meaning to text

Nature of the input varies greatly from one application to another (e.g., documenting the structure of a computer program)

The job of the NLG system is to extract the necessary information to drive the generation process

NLG systems have to make choices:

Content selection- the system must choose the appropriate content for input, basing its decision on a pre-specified communicative goal

Lexical selection- the system must choose the lexical item most appropriate for expressing a concept

Sentence Structure

Aggregation- the system must apportion the content into phrase-, clause-, and sentence-sized chunks

Referential expression- the system must determine how to refer to the objects under discussion (not a trivial task)

Discourse structure- many NLG systems have to deal with multi-sentence discourses, which must have a coherent structure

Sample NLG output

To save a file
1. Choose save from the file menu
2. Choose the appropriate folder
3. Type the file name
4. Click the save button

The system will save the document.
…
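Output like the sample above could come from a very simple template-based generator. The following is a minimal sketch under that assumption; the function `generate_instructions` and its parameters are hypothetical, and it skips the content and lexical selection choices a real NLG system has to make.

```python
def generate_instructions(goal, steps, result):
    """Render a goal, ordered steps, and an outcome as numbered text."""
    lines = [f"To {goal}"]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    lines.append(result)
    return "\n".join(lines)


# Non-linguistic input: a goal, an ordered list of actions, an outcome.
text = generate_instructions(
    "save a file",
    [
        "Choose save from the file menu",
        "Choose the appropriate folder",
        "Type the file name",
        "Click the save button",
    ],
    "The system will save the document.",
)
print(text)
```

Here the structured input (goal, steps, result) plays the role of the meaning, and the function maps it to text.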

Human-Computer Dialogs

Uses a mix of SR, TTS, and pre-recorded prompts to achieve some goal

Human-Computer Dialogs

Uses speech recognition, or a combination of SR and touch tone as input to the system

The system processes the spoken information and outputs appropriate TTS or pre-recorded prompts

Dialog systems have specific tasks, which limit the domain of conversation

This makes the SR problem much easier, as the potential responses become very constrained

Sample dialog system for banking

…

Sys: Would you like information for checking or savings?

User: Checking, please.

Sys: Your current balance is $2,568.92. Would you like another transaction?

User: Yes, has check #2431 cleared?

…
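The effect of a constrained domain on recognition can be sketched as a filter over the recognizer's guesses. Everything here is an assumption made for illustration: the hypothesis list, the probabilities, and the names `ALLOWED` and `pick_response` do not come from any real dialog system.

```python
# Responses the banking prompt "checking or savings?" expects.
ALLOWED = {"checking", "savings"}


def pick_response(hypotheses, allowed=ALLOWED):
    """Keep only in-domain hypotheses, then return the most probable one.

    Returns None when no hypothesis is in the expected vocabulary.
    """
    in_domain = [(word, p) for word, p in hypotheses if word in allowed]
    if not in_domain:
        return None
    return max(in_domain, key=lambda pair: pair[1])[0]


# Hypothetical recognizer guesses as (word, probability) pairs.
hyps = [("chicken", 0.40), ("checking", 0.35), ("savings", 0.05)]
print(pick_response(hyps))
```

Although "chicken" has the highest raw score in this made-up example, it is outside the task vocabulary, so the system settles on "checking".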

Linguistic knowledge in dialog systems

Discourse structure- ensuring natural flowing discourse interaction

Building appropriate vocabularies/lexicons for the tasks

Ensuring prosodic consistencies (i.e. questions sound like questions and spliced prompts sound continuous)

Why use human-computer systems?

Automate simple tasks- no need for a teller to be on the other end of the line!

Allow access to system information from anywhere, via the telephone

Information Retrieval

Storage, analysis, and retrieval of text documents

Information Retrieval

Most current IR systems are based on some interpretation of compositional semantics

IR is the core of web-based searching, e.g., Google, AltaVista, etc.

Information Retrieval Architecture

User inputs a word or string of words

System processes the words and retrieves documents corresponding to the request
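The two steps above can be sketched with a small inverted index, one common way to organize retrieval. The three-document collection and the names `build_index` and `retrieve` are hypothetical; the slides do not say which data structure a given system uses.

```python
from collections import defaultdict

# Hypothetical three-document collection.
DOCS = {
    1: "machine translation of text",
    2: "speech recognition and text to speech",
    3: "information retrieval of text documents",
}


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def retrieve(index, query):
    """Return sorted ids of documents containing every query word."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


INDEX = build_index(DOCS)
print(retrieve(INDEX, "text speech"))
```

Requiring every query word to appear is just one possible matching policy; ranked retrieval would score documents instead of filtering them.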

“Bag of Words”

The dominant approach to IR systems is to ignore syntactic information and process the meaning of individual words only

Thus, “I see what I eat” and “I eat what I see” would mean exactly the same thing to the system!
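The point about word order is easy to check with a word-count representation. This is a minimal sketch of a bag of words, assuming simple whitespace tokenization and lowercasing; `bag_of_words` is a hypothetical helper name.

```python
from collections import Counter


def bag_of_words(sentence):
    """Count word occurrences, discarding word order entirely."""
    return Counter(sentence.lower().split())


a = bag_of_words("I see what I eat")
b = bag_of_words("I eat what I see")
print(a == b)
```

Both sentences contain the same words the same number of times, so the two bags compare equal even though the sentences mean different things.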

Linguistic Knowledge in IR

Semantics: compositional, lexical

Syntax (depending on the model used)

Computational Modeling

Computational approaches to problem solving, modeling, and development of theories

How can we use computational modeling?

Test our theories of language change (synchronic or diachronic)

Develop working models of language evolution

Model speech perception, production, and processing

Almost any theoretical model can have a computational counterpart

Why Use Computational Modeling?

Forces explicitness – no black boxes or behind the scenes “magic”

Allows for modeling that would otherwise be impossible

Allows for modeling that would otherwise be unethical

Conclusions

CL applications utilize linguistic knowledge from all of the major subfields of theoretical linguistics

Computational modeling can aid linguists’ theories of language processing and structure
