LING 388: Language and Computers Sandiway Fong Lecture 1: 8/22


Page 1: LING 388: Language and Computers Sandiway Fong Lecture 1: 8/22

LING 388: Language and Computers

Sandiway Fong

Lecture 1: 8/22

Page 2: LING 388: Language and Computers Sandiway Fong Lecture 1: 8/22

Administrivia

• Where
– Harvill 313
• When
– TR 3:30-4:45PM
• No Class
– Thursday September 14th
– Thursday September 28th
– Thursday November 23rd (Thanksgiving)
• Office Hours
– catch me after class, or
– by appointment
– Location: Douglass 311

Page 3: LING 388: Language and Computers Sandiway Fong Lecture 1: 8/22

Administrivia

• Map
– Classroom (Harvill)
– Office (Douglass)
– Lab (SS 224)

Page 4: LING 388: Language and Computers Sandiway Fong Lecture 1: 8/22

Administrivia

• Email
– [email protected]
• Class mailing list
– [email protected]
• Homepage
– http://dingo.sbs.arizona.edu/~sandiway
• Lecture slides:
– available on homepage after each class
– in both PowerPoint (.ppt) and Adobe PDF formats
• .ppt slides may contain animation
– slides from previous years are available online
• caution: there will be changes from last year

Page 5: LING 388: Language and Computers Sandiway Fong Lecture 1: 8/22

Administrivia

• Tips on how to take this class
– No required textbook
• save time
– Lecture slides contain everything you need to know in order to do the homeworks
• To understand the slides, you need to attend classes to “grok” the concepts
– Unclear on something?
• You are encouraged to ask questions in or after class
• Ask while the question is still fresh in your mind
– Have an idea, want to go over some of the material again, or have more in-depth questions?
• Make an appointment

Page 6: LING 388: Language and Computers Sandiway Fong Lecture 1: 8/22

Administrivia

• Course Objectives
– Theoretical
• Introduction to natural language processing techniques
– Practical
• Be able to write a natural language grammar that runs on a computer
• Get an idea of what’s hard and what’s easy to do on a computer

Goal: by the end of the course, you will have built a small machine translation engine

Page 7: LING 388: Language and Computers Sandiway Fong Lecture 1: 8/22

Administrivia

• Class demographics: LING 388
[Chart of class composition; labels: LING, PSYCH, MATH, EAS, PRPH, ENGL, ENGR, EPH, FREN, NMS]

Page 8: LING 388: Language and Computers Sandiway Fong Lecture 1: 8/22

Administrivia

• Laboratory Exercises
– Some lectures will be laboratory sessions
• (typically Thursdays)
– We will do exercises on the computer in class
– Homework questions will be handed out in these sessions
– Homework questions are designed to extend the exercises done in the lab
– You may do the homework exercises on your own computer or at the computer laboratory

Page 9: LING 388: Language and Computers Sandiway Fong Lecture 1: 8/22

Administrivia

• Grading
– 6~7 homeworks
– Mandatory and Extra Credit Questions:
• extra credit questions may be applied to the current homework
• they may also bump you up a grade if you are borderline at the end of the semester
– Homeworks are due 1 week after they are handed out
– Homeworks must be submitted by email (by midnight)
– Example:
• a homework given out on Thursday will be due next Thursday at midnight
• Ethics
– You may discuss the homeworks with your classmates
– However, you must do the work and write them up independently
– Sources must be acknowledged (students, webpage)
– Cheaters will be sanctioned

Page 10: LING 388: Language and Computers Sandiway Fong Lecture 1: 8/22

Administrivia

• Homework tips
– Homeworks are based on lab exercises

• make sure you show up for the lab lectures

– Possible time-saving strategy: stay on after the lecture and do the homework questions right there

• exercises are fresh in your mind

• may even be possible to complete the homework in an hour right there …

– Nightmare strategy: wait until the evening the homework is due, scratch your head over the lecture notes, have tons of questions and start panicking

• your computer crashes, the net goes down …

Page 11: LING 388: Language and Computers Sandiway Fong Lecture 1: 8/22

Administrivia

• Late Policy
– All homeworks are mandatory
– deduction if handed in late
– If you know you’re going to be late or have an upcoming emergency, let me know ahead of time

Page 12: LING 388: Language and Computers Sandiway Fong Lecture 1: 8/22

Administrivia

• Homework Disaster Repair Policy
– You “tank” on a homework
• do badly or way worse than you expected
• don’t panic
– Strategies
• always attempt any extra credit questions
• get help and explanations from me
– plus an extra question or two to demonstrate your understanding
– Philosophy
• You are not penalized for learning or making an unfortunate mistake

Page 13: LING 388: Language and Computers Sandiway Fong Lecture 1: 8/22

Administrivia

• There is a laptop being passed around

• Fill out Excel spreadsheet entries:
– Name
– Email
– Year
– Major
– Relevant background

Page 14: LING 388: Language and Computers Sandiway Fong Lecture 1: 8/22

Natural Language Processing (NLP)
Human Language Technology (HLT)
Computational Linguistics

• Question:
– How to process natural languages on a computer
• Intersects with:
– Computer science (CS)
– Mathematics/Statistics
– Artificial intelligence (AI)
– Linguistic Theory
– Psychology: Psycholinguistics
• e.g. the human sentence processor

Page 15: LING 388: Language and Computers Sandiway Fong Lecture 1: 8/22

Applications

• Information retrieval
– information is stored and accessed using language (keywords etc.)
– document classification (email, news)
• Machine translation
– babelfish
• http://babelfish.altavista.com/
– Google
• Language Comprehension
– document summarization
• Speech
– automated 800 toll-free directory (800 555 1212)
– cellphones (handsfree dialing)
– car navigation (voice-synthesized directions)

Page 16: LING 388: Language and Computers Sandiway Fong Lecture 1: 8/22

Applications

– technology is still in development
• computers can’t really understand language (yet)
– see babelfish or google webpage translation
– well, it’s free!
• even if we are willing to pay...
– machine translation has been worked on since after World War II (1950s)
– still not perfected today
– why?
– what are the properties of human languages that make it hard?

Page 17: LING 388: Language and Computers Sandiway Fong Lecture 1: 8/22

Natural Language Properties

• Which ones are going to be difficult for computers to deal with?

• Grammar (Rules for putting words together into sentences)
– How many rules are there?
• 100, 1000, 10000, more …
– Portions learnt or innate
– Do we have all the rules written down somewhere?
• Lexicon (Dictionary)
– How many words do we need to know?
• 1000, 10000, 100000 …
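
To make the grammar/lexicon split concrete, here is a minimal sketch in Python (not the Prolog used later in the course). The three rules and seven lexicon entries are hypothetical, chosen only for illustration; a real grammar and lexicon would be far larger, which is the point of the questions above.

```python
# Toy grammar: rules for putting categories together (hypothetical example).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Name"]],
    "VP": [["V", "NP"], ["V"]],
}

# Toy lexicon: each word maps to the categories it can belong to (hypothetical).
LEXICON = {
    "the": {"Det"},
    "a": {"Det"},
    "report": {"N"},
    "secretary": {"N"},
    "john": {"Name"},
    "filed": {"V"},
    "hired": {"V"},
}


def parse(symbol, words, start):
    """Return the set of positions where `symbol` can end if it begins at `start`."""
    if symbol not in GRAMMAR:
        # Lexical category: consume one word if the lexicon allows it.
        if start < len(words) and symbol in LEXICON.get(words[start], set()):
            return {start + 1}
        return set()
    ends = set()
    for rhs in GRAMMAR[symbol]:
        positions = {start}
        for child in rhs:
            # Thread every reachable position through the next child category.
            positions = {e for p in positions for e in parse(child, words, p)}
        ends |= positions
    return ends


def is_sentence(text):
    """True if the whole word sequence can be analyzed as an S."""
    words = text.lower().split()
    return len(words) in parse("S", words, 0)
```

With these assumptions, `is_sentence("John filed the report")` succeeds, while a scrambled order such as `"John the report filed"` is rejected. Any word missing from the lexicon makes the sentence fail, which hints at why coverage is hard.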

Page 18: LING 388: Language and Computers Sandiway Fong Lecture 1: 8/22

Computers vs. Humans

• Knowledge of language
– Computers are way faster than humans
• They kill us at arithmetic and chess
– But human beings are so good at language, we often take our ability for granted
• Processed without conscious thought
• Do pretty complex things

Page 19: LING 388: Language and Computers Sandiway Fong Lecture 1: 8/22

Examples

• Knowledge
– Which report did you file without reading?
– (Parasitic gap sentence)

Page 20: LING 388: Language and Computers Sandiway Fong Lecture 1: 8/22

Examples

• Changes in interpretation
• John is too stubborn to talk to
• John is too stubborn to talk to Bill

Page 21: LING 388: Language and Computers Sandiway Fong Lecture 1: 8/22

Examples

• Ambiguity
– Where can I see the bus stop?
– stop: verb or part of the noun-noun compound bus stop
– Context (Discourse or situation)
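
A rough illustration of why this is a problem for a computer: if the lexicon lists "stop" as both a noun and a verb, a program has to consider both readings. The sketch below is in Python, and the mini-lexicon and tag names are assumptions made up for this example.

```python
from itertools import product

# Hypothetical mini-lexicon: each word maps to the parts of speech it can have.
TAGS = {
    "where": ["Wh"],
    "can": ["Aux"],
    "i": ["Pron"],
    "see": ["V"],
    "the": ["Det"],
    "bus": ["N"],
    "stop": ["N", "V"],  # the ambiguous word from the slide
}


def tag_sequences(sentence):
    """List every way of assigning one part of speech to each word."""
    words = sentence.lower().rstrip("?.!").split()
    options = [TAGS[w] for w in words]
    return [list(zip(words, combo)) for combo in product(*options)]
```

Calling `tag_sequences("Where can I see the bus stop?")` yields two candidate analyses that differ only in the tag for "stop". Nothing in the word list itself says which one is intended; that is where context comes in.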

Page 22: LING 388: Language and Computers Sandiway Fong Lecture 1: 8/22

Examples

• Ungrammaticality
– *Which book did you file the report without reading?
– * = ungrammatical
• relative
– ungrammatical vs. incomprehensible

Page 23: LING 388: Language and Computers Sandiway Fong Lecture 1: 8/22

Example

• The human parser has quirks
• Ian told the man that he hired a story
• Ian told the man that he hired a secretary
• Garden-pathing
• Temporary ambiguity
• tell: someone something vs. …

Page 24: LING 388: Language and Computers Sandiway Fong Lecture 1: 8/22

Examples

• More subtle differences
• The reporter who the senator attacked admitted the error
• The reporter who attacked the senator admitted the error
– Processing time
– Subject vs. object relative clauses
– Q: Do we want to mimic the human parser completely?

Page 25: LING 388: Language and Computers Sandiway Fong Lecture 1: 8/22

Next time …

• this Thursday
– Class meets in the SBS RI Lab (Social Sciences 224)
• We begin our gentle introduction (from scratch) to a logic-based computer language
– Series of six lectures
– Name: PROLOG
– Variant: SWI-PROLOG (free software)
– Download: http://www.swi-prolog.org/
– Based on logic
– “Natural” and easy to learn but powerful
– Contains lots of nifty built-in features for writing grammars
• language was originally designed for this purpose

Page 26: LING 388: Language and Computers Sandiway Fong Lecture 1: 8/22

Your Homework for Today

• Install SWI-Prolog on your PC

Page 27: LING 388: Language and Computers Sandiway Fong Lecture 1: 8/22

Prolog Resources

• Some background in logic or programming?

• Useful Online Tutorials
– An introduction to Prolog
• (Michel Loiseleur & Nicolas Vigier)
• http://invaders.mars-attacks.org/~boklm/prolog/
– Learn Prolog Now!
• (Patrick Blackburn, Johan Bos & Kristina Striegnitz)
• http://www.coli.uni-saarland.de/~kris/learn-prolog-now/lpnpage.php?pageid=online