Introduction to Syntax, with Part-of-Speech Tagging

Owen Rambow, September 17 & 19


Admin Stuff

• These slides are available at:
o http://www.cs.columbia.edu/~rambow/teaching.html

• For the Eliza program in the homework, you can use a tagger or chunker if you want; details at:
o http://www.cs.columbia.edu/~ani/cs4705.html

• Special office hours (Ani): today after class, tomorrow at 10am in CEPSR 721


Statistical POS Tagging

• Want to choose the most likely string of tags $T$, given the string of words $W$

• $W = w_1, w_2, \ldots, w_n$

• $T = t_1, t_2, \ldots, t_n$

• I.e., want $\operatorname*{argmax}_T p(T \mid W)$

• Problem: sparse data


Statistical POS Tagging (ctd)

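• $p(T \mid W) = \dfrac{p(T, W)}{p(W)} = \dfrac{p(W \mid T)\, p(T)}{p(W)}$

• $p(W)$ is the same for every candidate tag string $T$, so it drops out of the argmax:

$$\operatorname*{argmax}_T p(T \mid W) = \operatorname*{argmax}_T \frac{p(W \mid T)\, p(T)}{p(W)} = \operatorname*{argmax}_T p(W \mid T)\, p(T)$$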


Statistical POS Tagging (ctd)

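$$
\begin{aligned}
p(T) &= p(t_1, t_2, \ldots, t_{n-1}, t_n) \\
&= p(t_n \mid t_1, \ldots, t_{n-1})\, p(t_1, \ldots, t_{n-1}) \\
&= p(t_n \mid t_1, \ldots, t_{n-1})\, p(t_{n-1} \mid t_1, \ldots, t_{n-2})\, p(t_1, \ldots, t_{n-2}) \\
&= \prod_i p(t_i \mid t_1, \ldots, t_{i-1}) \\
&\approx \prod_i p(t_i \mid t_{i-2}, t_{i-1}) \qquad \text{trigram ($n$-gram)}
\end{aligned}
$$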
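A minimal Python sketch of the trigram approximation to $p(T)$. It assumes a hypothetical padding tag `<s>` placed twice before the first tag, so that $t_1$ and $t_2$ also have a two-tag history, and stores the parameters in a plain dictionary keyed by $(t_{i-2}, t_{i-1}, t_i)$. The numbers are made up, and unseen trigrams simply get probability 0 (no smoothing).

```python
from math import prod

START = "<s>"  # hypothetical padding tag for positions before t_1

def tag_sequence_prob(tags, trans):
    """Trigram approximation p(T) ≈ ∏_i p(t_i | t_{i-2}, t_{i-1}).

    `trans` maps (t_{i-2}, t_{i-1}, t_i) to a probability; trigrams
    missing from the table get probability 0 (no smoothing here).
    """
    padded = [START, START] + list(tags)
    return prod(
        trans.get((padded[i - 2], padded[i - 1], padded[i]), 0.0)
        for i in range(2, len(padded))
    )

# Made-up toy parameters, for illustration only.
trans = {
    (START, START, "Det"): 0.6,
    (START, "Det", "N"): 0.9,
    ("Det", "N", "V"): 0.5,
}
print(tag_sequence_prob(["Det", "N", "V"], trans))  # 0.6 * 0.9 * 0.5
```

Each factor conditions on only the two preceding tags, which is exactly the step from the full chain-rule product to the trigram product.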


Statistical POS Tagging (ctd)

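$$
\begin{aligned}
p(W \mid T) &= p(w_1, w_2, \ldots, w_n \mid t_1, t_2, \ldots, t_n) \\
&= \prod_i p(w_i \mid w_1, \ldots, w_{i-1}, t_1, t_2, \ldots, t_n) \\
&\approx \prod_i p(w_i \mid t_i)
\end{aligned}
$$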
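The approximation says each word depends only on its own tag. A minimal Python sketch, assuming the emission parameters sit in a hypothetical dictionary keyed by (word, tag) with made-up values; unseen pairs get probability 0.

```python
from math import prod

def words_given_tags_prob(words, tags, emit):
    """Approximation p(W | T) ≈ ∏_i p(w_i | t_i).

    `emit` maps (word, tag) to p(word | tag); pairs missing from the
    table get probability 0 (no smoothing here).
    """
    if len(words) != len(tags):
        raise ValueError("need exactly one tag per word")
    return prod(emit.get((w, t), 0.0) for w, t in zip(words, tags))

# Made-up toy parameters, for illustration only.
emit = {("the", "Det"): 0.5, ("boy", "N"): 0.01, ("likes", "V"): 0.02}
print(words_given_tags_prob(["the", "boy", "likes"], ["Det", "N", "V"], emit))
```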


Statistical POS Tagging (ctd)

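$$
\begin{aligned}
\operatorname*{argmax}_T p(T \mid W) &= \operatorname*{argmax}_T p(W \mid T)\, p(T) \\
&\approx \operatorname*{argmax}_T \prod_i p(w_i \mid t_i)\, p(t_i \mid t_{i-2}, t_{i-1})
\end{aligned}
$$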

• Relatively easy to get data for parameter estimation (next slide)

• But: need smoothing for unseen words

• Easy to determine the argmax (Viterbi algorithm, in time linear in sentence length)
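A minimal Python sketch of Viterbi decoding for this trigram model. It assumes the parameters are stored in plain dictionaries, a hypothetical padding tag `<s>` before the first word, no end-of-sentence tag, and no smoothing (missing entries count as probability 0). The dynamic-programming state is the pair of the two most recent tags, and scores are kept as log probabilities to avoid underflow. The toy parameters are made up.

```python
from math import inf, log

START = "<s>"  # hypothetical padding tag for positions before the first word

def viterbi_trigram(words, tagset, trans, emit):
    """argmax_T ∏_i p(w_i | t_i) p(t_i | t_{i-2}, t_{i-1}) by dynamic programming.

    `trans` maps (t_{i-2}, t_{i-1}, t_i) and `emit` maps (w_i, t_i) to
    probabilities; missing entries count as 0 (no smoothing). Returns the
    best tag list, or None if every tagging has probability 0.
    """
    def logp(table, key):
        p = table.get(key, 0.0)
        return log(p) if p > 0 else -inf

    if not words:
        return []
    # best[(u, v)]: best log score of a tag prefix whose last two tags are u, v
    best = {(START, START): 0.0}
    backptrs = []  # backptrs[i][(t_{i-1}, t_i)] = best t_{i-2}
    for w in words:
        new_best, bp = {}, {}
        for (u, v), score in best.items():
            for t in tagset:
                s = score + logp(trans, (u, v, t)) + logp(emit, (w, t))
                if s > new_best.get((v, t), -inf):
                    new_best[(v, t)] = s
                    bp[(v, t)] = u
        best = new_best
        backptrs.append(bp)
    if not best:
        return None
    prev, last = max(best, key=best.get)
    tags = [last] if prev == START else [last, prev]  # built right to left
    for i in range(len(words) - 1, 1, -1):
        tags.append(backptrs[i][(tags[-1], tags[-2])])
    return tags[::-1]

# Made-up toy parameters, for illustration only.
tagset = ["Det", "N", "V"]
trans = {(START, START, "Det"): 1.0, (START, "Det", "N"): 1.0,
         ("Det", "N", "V"): 0.6, ("Det", "N", "N"): 0.4,
         ("N", "V", "Det"): 1.0, ("V", "Det", "N"): 1.0}
emit = {("the", "Det"): 0.6, ("a", "Det"): 0.4, ("boy", "N"): 0.5,
        ("girl", "N"): 0.4, ("likes", "V"): 0.7, ("likes", "N"): 0.1}
print(viterbi_trigram("the boy likes a girl".split(), tagset, trans, emit))
# ['Det', 'N', 'V', 'Det', 'N']
```

Each word position examines every (previous two tags, next tag) combination, so the running time is linear in sentence length (and cubic in the size of the tagset).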


Probability Estimation for trigram POS Tagging

Maximum-likelihood estimation, where $c(\cdot)$ is a count in the tagged training corpus:

• $p'(w_i \mid t_i) = c(w_i, t_i) / c(t_i)$

• $p'(t_i \mid t_{i-2}, t_{i-1}) = c(t_i, t_{i-2}, t_{i-1}) / c(t_{i-2}, t_{i-1})$
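These relative frequencies can be collected in one pass over a tagged corpus. A minimal Python sketch, assuming sentences are lists of (word, tag) pairs and a hypothetical padding tag `<s>` before the first word; the one-sentence corpus is made up. The history count $c(t_{i-2}, t_{i-1})$ is taken only over positions where the pair is followed by a tag, so each conditional distribution sums to 1.

```python
from collections import Counter

START = "<s>"  # hypothetical padding tag for positions before the first word

def mle_estimates(tagged_sentences):
    """Relative-frequency (maximum-likelihood) estimates from a tagged corpus.

    Returns (emit, trans) with
      emit[(w, t)]     = c(w, t) / c(t)
      trans[(u, v, t)] = c(u, v, t) / c(u, v)
    where c(u, v) counts the tag pair u, v as the history of a following tag.
    """
    word_tag, tag = Counter(), Counter()
    trigram, history = Counter(), Counter()
    for sentence in tagged_sentences:
        for w, t in sentence:
            word_tag[(w, t)] += 1
            tag[t] += 1
        tags = [START, START] + [t for _, t in sentence]
        for i in range(2, len(tags)):
            trigram[(tags[i - 2], tags[i - 1], tags[i])] += 1
            history[(tags[i - 2], tags[i - 1])] += 1
    emit = {(w, t): c / tag[t] for (w, t), c in word_tag.items()}
    trans = {(u, v, t): c / history[(u, v)] for (u, v, t), c in trigram.items()}
    return emit, trans

corpus = [[("the", "Det"), ("boy", "N"), ("likes", "V"),
           ("a", "Det"), ("girl", "N")]]
emit, trans = mle_estimates(corpus)
print(emit[("the", "Det")], trans[("Det", "N", "V")])  # 0.5 1.0
```

Raw relative frequencies assign zero probability to anything not seen in training, which is why smoothing is needed for unseen words.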


Statistical POS Tagging

• Method common to many tasks in speech & NLP

• “Noisy Channel Model”, Hidden Markov Model


Back to Syntax

• (((the/Det) boy/N) likes/V ((a/Det) girl/N))

[Figure: phrase-structure tree [S [NP [DetP the] boy] likes [NP [DetP a] girl]]; nonterminal symbols = constituents, terminal symbols = words]


Phrase Structure and Dependency Structure

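[Figure: dependency tree with root likes/V, dependents boy/N and girl/N, and the/Det depending on boy/N and a/Det on girl/N; shown beside the phrase-structure tree [S [NP [DetP the] boy] likes [NP [DetP a] girl]]]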
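The bracketed notation (((the/Det) boy/N) likes/V ((a/Det) girl/N)) already contains this dependency tree if the one word sitting directly inside each pair of parentheses (not in a nested pair) is read as that constituent's head. A minimal Python sketch of that reading, assuming every bracket contains exactly one such bare word; `parse_brackets` and `dependencies` are hypothetical helper names.

```python
import re

def parse_brackets(text):
    """Turn '(((the/Det) boy/N) likes/V ...)' into nested Python lists."""
    stack = [[]]
    for token in re.findall(r"\(|\)|[^\s()]+", text):
        if token == "(":
            stack.append([])          # open a new constituent
        elif token == ")":
            group = stack.pop()       # close it and attach it to its parent
            stack[-1].append(group)
        else:
            stack[-1].append(token)   # a word/POS token
    return stack[0][0]

def dependencies(group, arcs=None):
    """Return (head, arcs) for a bracketed group.

    Assumes the single bare word directly inside each group is its head;
    the head of every nested group becomes a dependent of that head.
    """
    arcs = [] if arcs is None else arcs
    head = next(x for x in group if isinstance(x, str))
    for x in group:
        if isinstance(x, list):
            dependent, _ = dependencies(x, arcs)
            arcs.append((head, dependent))
    return head, arcs

root, arcs = dependencies(parse_brackets("(((the/Det) boy/N) likes/V ((a/Det) girl/N))"))
print(root)  # likes/V
for head, dependent in arcs:
    print(head, "->", dependent)
```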


Types of Dependency

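[Figure: labeled dependency tree rooted in likes/V, over the nodes boy/N, girl/N, the/Det, a/Det, small/Adj, very/Adv, and sometimes/Adv; arc labels are Subj, Obj, Adj(unct), Fw (function word), and Adj]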


Grammatical Relations

• Types of relations between words
o Arguments: subject, object, indirect object, prepositional object
o Adjuncts: temporal, locative, causal, manner, …
o Function words


Subcategorization

• List of arguments of a word (typically, a verb), with features about their realization (POS, perhaps case, verb form, etc.)

• In canonical order: Subject-Object-IndObj

• Example:
o like: N-N, N-V(to-inf)
o see: N, N-N, N-N-V(inf)

• Note: J&M talk about subcategorization only within the VP
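A subcategorization lexicon is essentially a table from words to frames. A minimal Python sketch using the two example entries above; the dictionary layout, the string labels such as "V(to-inf)", and the helper `licenses` are hypothetical, and a frame matches only as a whole, in canonical order.

```python
# Frames from the example, as tuples in canonical order (Subject-Object-IndObj).
SUBCAT = {
    "like": [("N", "N"), ("N", "V(to-inf)")],
    "see": [("N",), ("N", "N"), ("N", "N", "V(inf)")],
}

def licenses(verb, realized_args):
    """True if the realized arguments match one of the verb's frames exactly."""
    return tuple(realized_args) in SUBCAT.get(verb, [])

print(licenses("like", ["N", "V(to-inf)"]))  # True, e.g. "the boy likes to swim"
print(licenses("like", ["N"]))               # False, e.g. *"the boy likes"
```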


Where is the VP?

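[Figure: two phrase-structure trees for "the boy likes a girl": the flat tree [S [NP [DetP the] boy] likes [NP [DetP a] girl]] and a tree with a VP constituent, [S [NP [DetP the] boy] [VP likes [NP [DetP a] girl]]]]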


Where is the VP?

• The existence of a VP is a linguistic (empirical) claim, not a methodological one

• Semantic evidence???

• Syntactic evidence:
o VP-fronting (and quickly clean the carpet he did!)
o VP-ellipsis (He cleaned the carpets quickly, and so did she)
o Adjuncts can appear before and after the VP, but not inside it (He often eats beans, *he eats often beans)

• Note: in all right-branching structures, the issue is different again


Penn Treebank, Again

• Syntactically annotated corpus (phrase structure)

• PTB is not naturally occurring data!

• Represents a particular linguistic theory (but a fairly “vanilla” one)

• Particularities:
o Very indirect representation of grammatical relations (need for head percolation tables)
o Completely flat structure in NP (brown bag lunch, pink-and-yellow child seat)
o Has flat Ss, flat VPs
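Head percolation tables recover heads, and from them grammatical relations, from phrase structure. A minimal Python sketch: the four-entry table is invented for the toy trees in these slides and is not the table actually used with the Penn Treebank. For each parent label it gives a search direction and a priority list of child labels, with the first child in the search direction as a fallback. Attaching the head of every non-head child to its parent's head turns the flat tree for "the boy likes a girl" into a dependency tree.

```python
# A tiny, hypothetical head-percolation table (not the real Penn Treebank one):
# parent label -> (search direction, child labels in priority order).
HEAD_TABLE = {
    "S": ("left", ["VP", "V"]),
    "VP": ("left", ["V", "VP"]),
    "NP": ("right", ["N", "NP"]),
    "DetP": ("left", ["Det"]),
}

# Trees: a leaf is (word, tag); an internal node is (label, [children]).
def is_leaf(tree):
    return isinstance(tree[1], str)

def label_of(tree):
    return tree[1] if is_leaf(tree) else tree[0]

def head_child(children, label):
    """Index of the head child, chosen by the percolation table."""
    direction, priorities = HEAD_TABLE.get(label, ("left", []))
    n = len(children)
    order = range(n) if direction == "left" else range(n - 1, -1, -1)
    for cat in priorities:
        for i in order:
            if label_of(children[i]) == cat:
                return i
    return order[0]  # fallback: first child in the search direction

def find_head(tree):
    """Percolate heads up: return the leaf (word, tag) that heads `tree`."""
    if is_leaf(tree):
        return tree
    label, children = tree
    return find_head(children[head_child(children, label)])

def head_dependencies(tree, arcs=None):
    """(head word, dependent word) arcs: each non-head child's head
    depends on the head of its parent."""
    arcs = [] if arcs is None else arcs
    if is_leaf(tree):
        return arcs
    label, children = tree
    h = head_child(children, label)
    head_word = find_head(children[h])[0]
    for i, child in enumerate(children):
        if i != h:
            arcs.append((head_word, find_head(child)[0]))
        head_dependencies(child, arcs)
    return arcs

tree = ("S", [("NP", [("DetP", [("the", "Det")]), ("boy", "N")]),
              ("likes", "V"),
              ("NP", [("DetP", [("a", "Det")]), ("girl", "N")])])
print(head_dependencies(tree))
# [('likes', 'boy'), ('boy', 'the'), ('likes', 'girl'), ('girl', 'a')]
```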


Context-Free Grammars

• Defined in formal language theory (comp sci)

• Terminals, nonterminals, start symbol, rules

• String-rewriting system

• Start with start symbol, rewrite using rules, done when only terminals left


CFG: Example

• Rules
o S → NP VP
o VP → V NP
o NP → Det N | AdjP NP
o AdjP → Adj | Adv AdjP
o N → boy | girl
o V → sees | likes
o Adj → big | small
o Adv → very
o Det → a | the

the very small boy likes a girl
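String rewriting can be made concrete in a few lines. A minimal Python sketch, assuming the rules above are stored in a dictionary and that a derivation is given as the index of the right-hand side chosen at each step (a hypothetical encoding). It always rewrites the leftmost nonterminal and stops when only terminals are left; the example derives "the boy likes a girl".

```python
# The example grammar; right-hand sides are lists of symbols.
RULES = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["Det", "N"], ["AdjP", "NP"]],
    "AdjP": [["Adj"], ["Adv", "AdjP"]],
    "N": [["boy"], ["girl"]],
    "V": [["sees"], ["likes"]],
    "Adj": [["big"], ["small"]],
    "Adv": [["very"]],
    "Det": [["a"], ["the"]],
}

def leftmost_derivation(choices, start="S"):
    """Rewrite the leftmost nonterminal at each step, using the given
    right-hand-side indices in order; return every sentential form."""
    form = [start]
    forms = [" ".join(form)]
    for k in choices:
        i = next((i for i, sym in enumerate(form) if sym in RULES), None)
        if i is None:
            raise ValueError("too many choices: only terminals are left")
        form = form[:i] + RULES[form[i]][k] + form[i + 1:]
        forms.append(" ".join(form))
    if any(sym in RULES for sym in form):
        raise ValueError("derivation incomplete: nonterminals remain")
    return forms

for step in leftmost_derivation([0, 0, 1, 0, 0, 1, 0, 0, 1]):
    print(step)
```

The printed forms run from "S" through "NP VP" and "Det N VP" down to "the boy likes a girl", one rule application per line.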


Derivations of CFGs

• String rewriting system: we derive a string (=derived structure)

• But derivation history represented by phrase-structure tree (=derivation structure)!
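The derivation history can be recorded as the sequence of rule choices made in a leftmost derivation. Because a leftmost derivation expands nonterminals in the same order as a preorder walk of the tree, that sequence determines the phrase-structure tree. A minimal Python sketch with the example grammar, where the choice encoding and helper names are hypothetical; the yield of the tree is the derived string.

```python
# The example grammar; right-hand sides are lists of symbols.
RULES = {
    "S": [["NP", "VP"]], "VP": [["V", "NP"]],
    "NP": [["Det", "N"], ["AdjP", "NP"]], "AdjP": [["Adj"], ["Adv", "AdjP"]],
    "N": [["boy"], ["girl"]], "V": [["sees"], ["likes"]],
    "Adj": [["big"], ["small"]], "Adv": [["very"]], "Det": [["a"], ["the"]],
}

def derivation_tree(symbol, choices):
    """Build the tree recorded by a leftmost derivation.

    `choices` is an iterator of right-hand-side indices in derivation
    order; this recursion consumes them in preorder, i.e. leftmost first.
    """
    if symbol not in RULES:
        return symbol  # terminal symbol: a leaf
    rhs = RULES[symbol][next(choices)]
    return (symbol, [derivation_tree(s, choices) for s in rhs])

def bracketed(tree):
    """Labeled bracketing, e.g. [NP [Det the] [N boy]]."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    return "[" + label + " " + " ".join(bracketed(c) for c in children) + "]"

def derived_string(tree):
    """The yield: the terminals of the tree, left to right."""
    if isinstance(tree, str):
        return tree
    return " ".join(derived_string(c) for c in tree[1])

tree = derivation_tree("S", iter([0, 0, 1, 0, 0, 1, 0, 0, 1]))
print(derived_string(tree))  # the boy likes a girl
print(bracketed(tree))       # [S [NP [Det the] [N boy]] [VP [V likes] [NP [Det a] [N girl]]]]
```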


Grammar Equivalence and Normal Form

• Can have different grammars that generate same set of strings (weak equivalence)

• Can have different grammars that have the same set of derivation trees (strong equivalence)
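For small grammars, this difference can be checked by brute force. A minimal Python sketch using two hypothetical toy grammars that are not from the slides: a right-branching one (A → a A | a) and a left-branching one (A → A a | a). The enumerator lists every derivation tree with a given number of words and assumes no empty rules and no cycles of unit rules, so that it terminates.

```python
from itertools import product

def compositions(n, k):
    """All ways to split n into an ordered sum of k positive integers."""
    if k == 1:
        yield (n,)
        return
    for first in range(1, n - k + 2):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def trees(grammar, symbol, n):
    """All derivation trees for `symbol` whose yield has exactly n words.

    Assumes no empty rules and no cycles of unit rules (so recursion ends).
    """
    if symbol not in grammar:  # terminal
        return [symbol] if n == 1 else []
    result = []
    for rhs in grammar[symbol]:
        for sizes in compositions(n, len(rhs)):
            subtrees = [trees(grammar, s, m) for s, m in zip(rhs, sizes)]
            for children in product(*subtrees):
                result.append((symbol, children))
    return result

def leaves(tree):
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1] for w in leaves(child)]

RIGHT = {"A": [["a", "A"], ["a"]]}  # right-branching
LEFT = {"A": [["A", "a"], ["a"]]}   # left-branching

for n in range(1, 5):
    r, l = trees(RIGHT, "A", n), trees(LEFT, "A", n)
    same_strings = {" ".join(leaves(t)) for t in r} == {" ".join(leaves(t)) for t in l}
    same_trees = set(r) == set(l)
    print(n, same_strings, same_trees)
```

For one word the two grammars produce the same tree; from two words on, the string sets still match but the tree sets do not, so the grammars are weakly equivalent without having the same derivation trees.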


Nobody Uses CFGs Only (Except Intro NLP Courses)

o All major syntactic theories (Chomsky, LFG, HPSG, TAG-based theories) represent both phrase structure and dependency, in one way or another

o All successful parsers currently use statistics about phrase structure and about dependency


Massive Ambiguity of Syntax

• For a standard sentence, and a grammar with wide coverage, there are 1000s of derivations!

• Example:
o The large head master told the man that he gave money and shares in a letter on Wednesday


Some Syntactic Constructions: Wh-Movement


Control


Raising