ling 388: language and computers sandiway fong lecture 22: 11/10

28
LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10

Post on 20-Dec-2015

224 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10

LING 388: Language and Computers

Sandiway Fong

Lecture 22: 11/10

Page 2: LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10

Last Time

• Homework 4: Translator– today: some time-saving hints for using the

debugger

Page 3: LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10

Prolog Debugger Hints

• sometimes, tracing step-by-step is both tedious and unnecessary

• you can make use of other tracing commands• see the HELP system• example

– ?- debug. • puts the Prolog interpreter into debug (but not step-by-step tracing)

mode– ?- spy predicate/n.– e.g. ?- spy sbar/3.

• tells Prolog (in debug mode) to stop at sbar/3

– debug command l (Leap)• tells Prolog to continue (non-step-by-step) execution

Page 4: LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10

Prolog Debugger Hints

• example• | ?- debug.• | ?- spy mapPA/2.• | ?- translate(X,[taroo,ga,hon,o,katta]).

silently runs sbar/3 generating English predicate-argument structure• + 19 2 Call: mapPA(buy(who,who),_689) ? l• ?+ 19 2 Exit: mapPA(buy(who,who),katta(dare,dare)) ? l• + 19 2 Redo: mapPA(buy(who,who),katta(dare,dare)) ? l• + 19 2 Fail: mapPA(buy(who,who),_689) ? l• + 62 2 Call: mapPA(buy(what,who),_689) ? l• ?+ 62 2 Exit: mapPA(buy(what,who),katta(nani,dare)) ? l• + 62 2 Redo: mapPA(buy(what,who),katta(nani,dare)) ? l• + 62 2 Fail: mapPA(buy(what,who),_689) ? l• + 106 2 Call: mapPA(buy(john,who),_689) ? l• ?+ 106 2 Exit: mapPA(buy(john,who),katta(taroo,dare)) ? l• + 106 2 Redo: mapPA(buy(john,who),katta(taroo,dare)) ? l• + 106 2 Fail: mapPA(buy(john,who),_689) ? l• + 152 2 Call: mapPA(buy(bucket,who),_689) ? l• + 152 2 Fail: mapPA(buy(bucket,who),_689) ?

Page 5: LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10

Moving on to Other Topics

• Natural Language Processing is a very broad field

– the next area we’ll be exploring will be word semantics

– but first a brief look at the inherent complexity of human language

Page 6: LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10

Computational Implementation

• good news! practical knowledge– you now have practical knowledge on how to write grammars to

account for various syntactic phenomena• e.g. we’ve covered...

– passivization, progressive aspect, determiner-noun/subject-verb agreement, wh-questions, yes/no-questions, different word orders etc.

• at this point in the course– you have all the basic tools and grammar programming techniques

for implementing almost anything in the sense– you could (in principle) implement all the constructions people can

document

• questions– what would that grammar look like?– is it a good model?

Page 7: LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10

The Puzzle of Language

• the bad news• language is a complex system

– in terms of shades of meaning– in terms of the syntax– in terms of what is allowed and what is not

• language is part of a generative system– you can compose constructions and create new sentences– people can have razor-sharp judgments about data they

have never encountered before– not just in terms of grammaticality/ungrammaticality– but also in terms of semantic interpretation

Page 8: LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10

The Puzzle of Language

• compositionality of constructions– active: the militia arrested John– passive: John was arrested– simple: John is sad– raising: John seems to be sad– raising+passive:

• John seems to have been arrested– *passive+raising:

• *John was seemed to be arrested

Page 9: LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10

The Puzzle of Language

• what’s allowed and what’s not– subject relative clause:

• the man that knows me (is not a liar)

– object relative clause: • the man that I know (is not a liar)

• omission of the relative pronoun (that)– subject relative clause:

• *the man knows me (is not a liar)

– object relative clause: • the man I know (is not a liar)

• why?– is it an arbitrary rule? Who came up with it? Why do we all agree?

Page 10: LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10

The Puzzle of Language

• (The King’s English: Fowler 1908)– the omission of the relative in isolated clauses (as opposed

to coordinates) is a question not of correctness but of taste, so far as there is any question at all. [...]

– the omission of a defining relative subject is often effective in verse, but in prose is either an archaism or a provincialism. It may, moreover, result in obscurity ...

• Now it would be some fresh insect won its way to a temporary fatal new development.

• (H. G. Wells)

– but when the defining relative is object, or has a preposition, there is no limit to the omission ...

Page 11: LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10

The Puzzle of Language

• 2nd language learners of English worry about these rules a lot

– this is the student did it– ‘zero’-subject relatives common in Hong Kong English (Gisborne 2000)

– this is the student who did it

Page 12: LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10

The Puzzle of Language

• for semantics– we’re not just talking about (famous) sentences

like• colorless green ideas sleep furiously

(Chomsky 1957)

– but also many sentences for which we take the rules of interpretation for granted

• perhaps suggests that we’re operating with rules or principles which we’re not conscious or aware of

Page 13: LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10

An Example

• consider the object wh-question– Which report did you file without reading?

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

1

Page 14: LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10

An Example

• the object wh-question– Which report did you file without reading?

• is actually a pretty complicated sentence for any computer grammar writer to deal with

• let’s look at one problem necessary for interpretation• gap-filling

– file is a verb, there is a filer and something being filed• file(filer,something)

– the thing being filed is the report in question• file(filer,[which report])

Page 15: LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10

An Example

• consider the object wh-question– Which report did you file without reading?

• also– read is a verb, there is a reader and something being read

• read(reader,something)

• also implicit– the reader must be the same person referred to by the pronoun you

• file(filer[you],[which report])• & read(reader,something)• & filer=reader

– the thing being read must be the same thing being filed, which must be the report in question• file(filer[you],[which report])• & read(reader,something)• & filer=reader• & something = [which report]

• there are no other possible interpretations – in this case

Page 16: LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10

An Example

• consider the object wh-question– Which report did you file without reading?

• there are no other possible interpretations– meaning (for example) that

• we cannot be asking about some report that you filed but someone else read

Page 17: LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10

An Example

– Which report did you file without reading?

• so the only interpretation is– Which report did you file [the report] without [you] reading [the

report]?

• can be viewed as a form of “compression”– Which report did you file [the report] without [you] reading [the

report]?

• in other words– there is an understanding between speaker and hearer that the

hearer can decode and recover the missing bits because they share the same “grammar”

• recall– the grammar with traces in earlier lectures...

Page 18: LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10

An Example

– Which report did you file without reading?

• only interpretation is– Which report did you file [the report] without [you] reading

[the report]?

• a computer program has to know the rules of gap filling – (for this so-called parasitic gap sentence)

• questions– what are the rules of gap filling?– were you taught these rules in school?– say you wanted to implement them, can you find them in a

grammar book?

Page 19: LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10

An Example

• rules of gap filling must account for– Which report did you file without reading?

– *Which book did you file the report without reading

– imagine: you file the book without reading which book

– *The report was filed without reading

– *The report was filed after Bill read

– compare with: The report was filed without being read

– These papers are easy to file without reading

– This book is not worth reading without attempting to analyze deeply

• given partial data– can you be confident of coming up with the right rules or

mechanism?

Page 20: LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10

The “Rules”

• what do the rules look like?– surely not Prolog DCG

• are we sure we covered all the cases?• how about

– *Who left without insulting?– who left without insulting John?

• debate– How come “everyone” acquired the same rule systems?– Are these rules innate knowledge or learnt?

Page 21: LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10

The “Rules”

• How is the knowledge of language acquired?• From (Chomsky 1986)• Standard belief 30+ years ago

– language acquisition is a case of “overlearning”– language is a habit system assumed to be overdetermined by

available evidence

• Plato’s Problem– the problem of “poverty of stimulus”– accounting for the richness, complexity and specificity of shared

knowledge given the limitations of the data available– poverty of evidence

Page 22: LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10

The “Rules”

• idea then that– we’re pre-wired to learn language– data like the sentences we’ve been looking at are (in part)

determined by the architecture and machinery of the language faculty

– we’re not acquiring these rules from scratch– the pre-wiring is part of our genetic endowment– reasonable to assume what is pre-wired must be universal– if so, the pre-wiring must be flexible enough to account for

language variation

– yet reduce the learning burden

Page 23: LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10

The “Rules”

Minimalist Program (MP)• current linguistic technology (research area)• language is a computational system• even fewer mechanisms

Principles-and-Parameters Framework (GB)• reduction of construction rules to• fundamental principles (the atoms of theory)• explanatory adequacy

Rule-based systems• construction-based• monostratal, e.g. context-free grammars• multiple levels. e.g. transformational grammars

Page 24: LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10

• example– colorless green ideas sleep furiously

• First hit

Interesting things to GoogleQuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

Page 25: LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10

Interesting things to Google

• example– colorless green ideas sleep furiously

• first hit– compositional semantics– a green idea is, according to well established usage of the word "green" is

one that is an idea that is new and untried. – again, a colorless idea is one without vividness, dull and unexciting. – so it follows that a colorless green idea is a new, untried idea that is

without vividness, dull and unexciting. – to sleep is, among other things, is to be in a state of dormancy or inactivity,

or in a state of unconsciousness. – to sleep furiously may seem a puzzling turn of phrase but one reflects that

the mind in sleep often indeed moves furiously with ideas and images flickering in and out.

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

Page 26: LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10

Interesting things to Google

• example– colorless green ideas sleep furiously

• another hit: (a story)– "So this is our ranking system," said Chomsky. "As you can see,

the highest rank is yellow." – "And the new ideas?" – "The green ones? Oh, the green ones don't get a color until

they've had some seasoning. These ones, anyway, are still too angry. Even when they're asleep, they're furious. We've had to kick them out of the dormitories - they're just unmanageable."

– "So where are they?" – "Look," said Chomsky, and pointed out of the window. There below,

on the lawn, the colorless green ideas slept, furiously.

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

Page 27: LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10

Interesting things to Google

• examples– (1) colorless green ideas sleep furiously– (2) furiously sleep ideas green colorless

• Chomsky (1957):– . . . It is fair to assume that neither sentence (1) nor (2) (nor indeed any part of these sentences) has ever occurred in an English discourse. Hence, in any statistical model for grammaticalness, these sentences will be ruled out on identical grounds as equally `remote' from English. Yet (1), though nonsensical, is grammatical, while (2) is not.

• idea– (1) is syntactically valid, (2) is word salad

• Statistical Experiment (Pereira 2002)

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

Page 28: LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10

Interesting things to Google

• examples– (1) colorless green ideas sleep furiously– (2) furiously sleep ideas green colorless

• Statistical Experiment (Pereira 2002)

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

wiwi-1

bigram language model