Templates in Linguistics: Why Garbage Garbage? Presented by: Hussein Ghaly

Post on 21-Jan-2017


Page 1: Templates in linguistics - Why Garbage Garbage

Templates in Linguistics: Why Garbage Garbage?

Presented by: Hussein Ghaly

Page 2: Templates in linguistics - Why Garbage Garbage

1- Garbage Disposal

Yaseen Ghaly, 3 years old:

• Papy laih zebala zebala? (Dad, why garbage garbage?) > Why are you carrying two garbage bags?

• Papy laih bang bang? (Dad, why bang bang?) > Why are you making this “bang bang”/hammering sound?

Page 3: Templates in linguistics - Why Garbage Garbage

2- Making a Template

• Yaseen seems to be using this template:

• “Papy, laih X?” (Dad, why X?)

• Where X can be anything:

– Garbage Garbage
– Bang Bang
– Sleeping
– …
– 天空是绿色的 (“the sky is green”; foreign words/code switching)
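The slot-filling idea on this slide can be sketched in a few lines of Python. This is only an illustration, assuming the template is stored as a string with one named placeholder; the names `TEMPLATE` and `fill_template` are hypothetical and not part of the presentation.

```python
# Hypothetical sketch: the sentence template "Papy, laih X?" with one placeholder.
TEMPLATE = "Papy, laih {X}?"


def fill_template(template: str, **slots: str) -> str:
    """Return the template with each named placeholder replaced by its filler."""
    return template.format(**slots)


# Fillers taken from the slide: anything can go into X, even a foreign phrase.
fillers = ["zebala zebala", "bang bang", "天空是绿色的"]
sentences = [fill_template(TEMPLATE, X=filler) for filler in fillers]

for sentence in sentences:
    print(sentence)
```

The template itself never changes; only the content of the placeholder does.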

Page 4: Templates in linguistics - Why Garbage Garbage

3- To build a language

• How can linguistic expression grow from simple sentences into a language such as ours?

• Answer: Recursion

Page 5: Templates in linguistics - Why Garbage Garbage

4- Recursively

• An example of recursion found by Salma Ghaly, 6 Years old.

Page 6: Templates in linguistics - Why Garbage Garbage

Main Claim

• Language is built using simple (idiomatic) templates. The complexity comes from recursion.
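One way to picture the main claim is a filler that is itself a filled template. The sketch below is an assumption-laden illustration, not the presenter's formalism: the template "{} thinks that {}" and the function `realize` are invented for the example.

```python
from typing import List, Tuple, Union

# A node is either a plain phrase or a (template, fillers) pair.
# Fillers may themselves be nodes, which is where recursion enters.
Node = Union[str, Tuple[str, List["Node"]]]

# Hypothetical template, not taken from the slides.
THINKS = "{} thinks that {}"


def realize(node: Node) -> str:
    """Recursively fill a template whose fillers may be other filled templates."""
    if isinstance(node, str):
        return node
    template, fillers = node
    return template.format(*(realize(filler) for filler in fillers))


# The same simple template embedded inside itself.
nested: Node = (THINKS, ["Salma", (THINKS, ["Yaseen", "Dad is carrying garbage"])])
sentence = realize(nested)
print(sentence)
```

A single simple template, reused inside its own placeholder, already yields sentences of unbounded depth.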

Page 7: Templates in linguistics - Why Garbage Garbage

Outline

• Starting Assumptions

• Learning Templates (Language Acquisition)

• Cross-Linguistic Template Linearity

• Selecting a Template (Semantic-Pragmatic Prompt)

• Extending a Template (Template Malleability)

• Applications of Templates (Information Extraction and Machine Translation)

Page 8: Templates in linguistics - Why Garbage Garbage

Starting Assumptions - Syntax

• In the syntax literature, language consists of a lexicon of words and a computational system that arranges these words into a grammatical sentence.

[Diagram: Lexicon, Computation System]

Page 9: Templates in linguistics - Why Garbage Garbage

Starting Assumptions – Templates Framework

• The “lexicon”, which is stored in memory, is extended with a list of templates, also stored in memory.

• The computational system only manages what fills the placeholders within templates.

[Diagram: Word Lexicon, Template Lexicon, Computation System]

Page 10: Templates in linguistics - Why Garbage Garbage

Learning Templates

• The “garbage garbage” example indicates:

– A child can intuitively form a template for plurals, one that is used in some human languages such as Bahasa Malaysia (e.g. kanak-kanak = children).

– A child can put anything in the placeholder X within the sentence template “Dad, why X?”

• But these hypotheses would need further evidence from First Language Acquisition.

Page 11: Templates in linguistics - Why Garbage Garbage

Template Linearity

• English
– I love you.
– I miss you.
– I need you.

• French
– Je t’aime.
– Tu me manques.
– J’ai besoin de toi.

Clearly, the linear order of equivalent constructions differs considerably between languages.

This should prompt us to think about how these constructions are generated.

Page 12: Templates in linguistics - Why Garbage Garbage

Semantic-Pragmatic Prompt

• An area of overlap between the reason, context, and information content of a sentence.

• Start with a list of arguments (X1: I, X2: You).

• I want to express [+feeling] [+positive] [+distance], therefore:
– In English, we invoke the template “I miss X2”.
– In French, we invoke the template “X2 me manque” (with some adjustments depending on pronouns, etc., as in “Tu me manques”).

• So I can utter the sentence after filling the template:
– I miss Randa.
– Randa me manque.
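Template selection by prompt can be sketched as a lookup keyed on the language and the feature bundle. This is a minimal illustration under assumptions: the dictionary `TEMPLATE_LEXICON`, the function `utter`, and the encoding of features as a frozen set are all hypothetical choices, and pronoun adjustments are ignored.

```python
from typing import Dict, FrozenSet, Tuple

# The feature bundle from the slide, encoded (by assumption) as a frozen set.
MISSING = frozenset({"+feeling", "+positive", "+distance"})

# Hypothetical template lexicon: (language, prompt) -> template.
TEMPLATE_LEXICON: Dict[Tuple[str, FrozenSet[str]], str] = {
    ("en", MISSING): "I miss {X2}.",
    ("fr", MISSING): "{X2} me manque.",
}


def utter(language: str, prompt: FrozenSet[str], **arguments: str) -> str:
    """Select the template for this language and prompt, then fill its placeholders."""
    template = TEMPLATE_LEXICON[(language, prompt)]
    # str.format ignores arguments the template does not mention (here X1).
    return template.format(**arguments)


english = utter("en", MISSING, X1="I", X2="Randa")
french = utter("fr", MISSING, X1="I", X2="Randa")
print(english)
print(french)
```

The same prompt and the same arguments produce different linear orders because each language stores a different template.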

Page 13: Templates in linguistics - Why Garbage Garbage

Template Variability

• Almost everything can be said in an alternative way:
– Godzilla destroyed the city, which is unfortunate.
– It is unfortunate that Godzilla destroyed the city.
– The destruction of the city by Godzilla is unfortunate.

• So, there are different templates to express the relation between these four entities (being unfortunate, the destruction, Godzilla, the city). This again feeds into the argument for the non-linearity of templates, this time within the same language.

Page 14: Templates in linguistics - Why Garbage Garbage

Template Malleability

• How easily a template can be reshaped. This includes the following:

– Tense malleability:
• John was eating fish.
• John has been eating fish.

– Synonym malleability:
• Sarah cannot tolerate this anymore.
• Sarah cannot put up with this anymore.

• The idea of malleability lets us avoid accounting for hundreds of millions of combinations of basic templates.

Page 15: Templates in linguistics - Why Garbage Garbage

Using Templates

• For Information Extraction (e.g. Banko and Etzioni 2008), where templates were used to extract (is-a) relationships between entities.
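As a rough illustration of template-based extraction, a single surface pattern “X is a Y” can be written as a regular expression. This is a toy sketch and not the method of Banko and Etzioni (2008); the pattern `IS_A`, the function `extract_is_a`, and the sample text are assumptions made for the example.

```python
import re
from typing import List, Tuple

# Hypothetical surface template "X is a/an Y", with X a capitalized word.
IS_A = re.compile(r"(?P<x>[A-Z]\w+) is an? (?P<y>\w+)")


def extract_is_a(text: str) -> List[Tuple[str, str]]:
    """Return (entity, category) pairs matched by the is-a template."""
    return [(m.group("x"), m.group("y")) for m in IS_A.finditer(text)]


pairs = extract_is_a("Godzilla is a monster. Paris is a city.")
print(pairs)
```

A realistic extractor would need many such templates plus constraints on what may fill each placeholder.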

Page 16: Templates in linguistics - Why Garbage Garbage

Using Templates in Machine Translation

• First suggested by Nagao (1984) under the name of Example-Based Machine Translation. He also indicated that this approach is relevant to Second Language Acquisition.

Page 17: Templates in linguistics - Why Garbage Garbage

Using Templates in Machine Translation

• Current state-of-the-art Phrase-Based Statistical Machine Translation techniques use contiguous chunks.

(Koehn, 2010)

Page 18: Templates in linguistics - Why Garbage Garbage

Using Templates in Machine Translation

• But using contiguous chunks misses many phrases where there is a difference in word order between the two languages.
– needs a lot of training data

• To compensate for this, a statistical reordering model is used.
– can make the output unintelligible

Page 19: Templates in linguistics - Why Garbage Garbage

Using Templates in Machine Translation

Chunk:
Michael assumes that he will stay in the house -> Michael geht davon aus, dass er im Haus bleibt

Subchunks:
Michael -> Michael
in the house -> im Haus

So by removing (stenciling) subchunks from the chunk we get a translation template:

X1 assumes that he will stay X2 -> X1 geht davon aus, dass er X2 bleibt

– preserves word order
– can apply to many sentences not seen before
– requires less training data
– can set restrictions on the type of placeholders (X1: NP, X2: PP)
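The stenciling step can be sketched with plain string replacement. This is a simplified illustration that assumes the aligned subchunk pairs are already known; the functions `stencil` and `apply_template` and the second example sentence are hypothetical, and placeholder type restrictions (NP, PP) are not checked.

```python
from typing import List, Tuple


def stencil(src: str, tgt: str, subchunks: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Replace each aligned subchunk pair with a numbered placeholder X1, X2, ..."""
    for i, (src_part, tgt_part) in enumerate(subchunks, start=1):
        slot = f"X{i}"
        src = src.replace(src_part, slot, 1)
        tgt = tgt.replace(tgt_part, slot, 1)
    return src, tgt


def apply_template(template: Tuple[str, str], fillers: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Fill the placeholders of a translation template with new aligned phrases."""
    src, tgt = template
    for i, (src_part, tgt_part) in enumerate(fillers, start=1):
        slot = f"X{i}"
        src = src.replace(slot, src_part, 1)
        tgt = tgt.replace(slot, tgt_part, 1)
    return src, tgt


template = stencil(
    "Michael assumes that he will stay in the house",
    "Michael geht davon aus, dass er im Haus bleibt",
    [("Michael", "Michael"), ("in the house", "im Haus")],
)
# A sentence pair not seen before (invented for illustration).
unseen = apply_template(template, [("Peter", "Peter"), ("in the office", "im Büro")])
print(template)
print(unseen)
```

Because the fixed material of the template carries the word order of each language, the filled output keeps the German verb-final order without a separate reordering model.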

Page 20: Templates in linguistics - Why Garbage Garbage

•Thank X1!

(X1 = You )