
Page 1: Part-of-Speech Tagging

Part-of-Speech Tagging

Torbjörn Lager
Department of Linguistics
Stockholm University

Page 2: Part-of-Speech Tagging


Part-of-Speech Tagging: Definition

From Jurafsky & Martin 2000:

Part-of-speech tagging is the process of assigning a part-of-speech or other lexical class marker to each word in a corpus.

The input to a tagging algorithm is a string of words and a specified tagset. The output is a single best tag for each word.

A bit too narrow for my taste...

Page 3: Part-of-Speech Tagging


Part-of-Speech Tagging: Example 1

Input

He can can a can

Output

He/pron can/aux can/vb a/det can/n

Another possible output

He/{pron} can/{aux,n} can/{vb} a/{det} can/{n,vb}

Page 4: Part-of-Speech Tagging


Tag Sets

The Penn Treebank tag set (see appendix in handout)

Page 5: Part-of-Speech Tagging


Why Part-of-Speech Tagging?

A first step towards parsing

A first step towards word sense disambiguation

Provides clues to pronunciation: "object" -> OBject or obJECT (but note: Swedish BAnan vs baNAN)

Research in Corpus Linguistics

Page 6: Part-of-Speech Tagging


Part-of-Speech Tagging: Example 2

I can light a fire and you can open a can of beans. Now the can is open and we can eat in the light of the fire.

Page 7: Part-of-Speech Tagging


Relevant Information

Lexical information
Local contextual information

Page 8: Part-of-Speech Tagging


Part-of-Speech Tagging: Example 2

I can light a fire and you can open a can of beans. Now the can is open and we can eat in the light of the fire.

I/PRP can/MD light/VB a/DT fire/NN and/CC you/PRP can/MD open/VB a/DT can/NN of/IN beans/NNS ./. Now/RB the/DT can/NN is/VBZ open/JJ and/CC we/PRP can/MD eat/VB in/IN the/DT light/NN of/IN the/DT fire/NN ./.

Page 9: Part-of-Speech Tagging


Part-of-Speech Tagging

[Diagram: Text -> Processor -> POS tagged text, where the Processor draws on Knowledge]

Needed:
- Some strategy for representing the knowledge
- Some method for acquiring the knowledge
- Some method of applying the knowledge
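One very simple way to see these three pieces in action is a unigram baseline: the knowledge is just a word-to-most-frequent-tag table, and applying it is a lookup. A minimal Prolog sketch (my illustration, not part of the course material; the lexicon entries and tags are made up):

% Knowledge: each word's most frequent tag (illustrative entries only).
tag(he,  pron).
tag(can, aux).
tag(a,   det).

% Applying the knowledge: look each word up; unknown words default to n.
baseline_tag([], []).
baseline_tag([Word|Words], [Word/Tag|Rest]) :-
    ( tag(Word, Tag) -> true ; Tag = n ),
    baseline_tag(Words, Rest).

% ?- baseline_tag([he,can,can,a,can], T).
% T = [he/pron, can/aux, can/aux, a/det, can/aux].

Acquiring the knowledge would here amount to counting tag frequencies in a corpus; the sketch also shows why local context matters, since every "can" gets the same tag.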

Page 10: Part-of-Speech Tagging


Approaches to PoS Tagging

The bold approach: 'Use all the information you have and guess!'

The whimsical approach: 'Guess first, then change your mind if necessary!'

The cautious approach: 'Don't guess, just eliminate the impossible!'

Page 11: Part-of-Speech Tagging


Some POS-Tagging Issues

Accuracy
Speed
Space requirements
Learning
Intelligibility

[Diagram: Text -> Processor -> POS tagged text, where the Processor draws on Knowledge]

Page 12: Part-of-Speech Tagging


Cutting the Cake

Tagging methods:
- Rule based
- Statistical
- Mixed
- Other methods

Learning methods:
- Supervised learning
- Unsupervised learning

Page 13: Part-of-Speech Tagging


HMM Tagging

The bold approach: 'Use all the information you have and guess!'

Statistical method
Supervised (or unsupervised) learning

Page 14: Part-of-Speech Tagging


Page 15: Part-of-Speech Tagging


The Naive Approach and its Problem

Traverse all the paths compatible with the input and then pick the most probable one

Problem: There are 27 paths in the HMM for S = "he can can a can"
Doubling the length of S (with a conjunction in between) -> 729 paths
Doubling S again -> 531441 paths!
Exponential time complexity!
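A quick check of the arithmetic (assuming "can" has three possible tags and the other words one each): the number of paths is the product of the ambiguity of each word,

\[ \#\text{paths} = \prod_{i=1}^{n} |T(w_i)| = 1 \cdot 3 \cdot 3 \cdot 1 \cdot 3 = 27, \qquad 27 \cdot 1 \cdot 27 = 729, \qquad 729 \cdot 1 \cdot 729 = 531441. \]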

Page 16: Part-of-Speech Tagging


Solution

Use the Viterbi algorithm

Tagging can be done in time proportional to the length of input.

How and Why does the Viterbi algorithm work? We save this for later...

Page 17: Part-of-Speech Tagging


Training an HMM

Estimate probabilities from relative frequencies.
- Output probabilities P(w|t): the number of occurrences of w tagged as t, divided by the number of occurrences of t.
- Transition probabilities P(t_i|t_{i-1}): the number of occurrences of t_{i-1} followed by t_i, divided by the number of occurrences of t_{i-1}.

Use smoothing to overcome the sparse data problem (unknown words, uncommon words, uncommon contexts).
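Putting the two estimates together, the tagger looks for the tag sequence that maximizes the standard bigram HMM objective (the textbook formulation from Jurafsky & Martin, not spelled out on the slide):

\[ \hat{t}_1^{\,n} = \arg\max_{t_1 \ldots t_n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1}), \qquad P(w \mid t) \approx \frac{C(w,t)}{C(t)}, \qquad P(t_i \mid t_{i-1}) \approx \frac{C(t_{i-1}\, t_i)}{C(t_{i-1})}. \]

This product over the sentence is exactly what the Viterbi algorithm maximizes in time linear in n.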

Page 18: Part-of-Speech Tagging


Transformation-Based Learning

The whimsical approach: 'Guess first, then change your mind if necessary!'

Rule based tagging, statistical learning
Supervised learning
Method due to Eric Brill (1995)

Page 19: Part-of-Speech Tagging


A Small PoS Tagging Example

Rules

tag:NN>VB <- tag:TO@[-1]
tag:VB>NN <- tag:DT@[-1]
...

Input

She decided to table her data

Lexicon

data:NN

decided:VB

her:PN

she:PN

table:NN VB

to:TO

Initial tagging (from the lexicon): she/PN decided/VB to/TO table/NN her/PN data/NN
After the rule tag:NN>VB <- tag:TO@[-1]: she/PN decided/VB to/TO table/VB her/PN data/NN
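To make the rule notation concrete, here is a minimal Prolog sketch (my illustration, not the µ-TBL implementation) of applying the first rule, tag:NN>VB <- tag:TO@[-1], to a tagged word list: change NN to VB whenever the preceding tag is TO.

% nn_to_vb_after_to(+Tagged0, -Tagged): apply tag:NN>VB <- tag:TO@[-1]
% left to right, carrying the (possibly updated) previous tag as context.
nn_to_vb_after_to([], []).
nn_to_vb_after_to([First|Rest0], [First|Rest]) :-
    nn_to_vb_after_to_(Rest0, First, Rest).

nn_to_vb_after_to_([], _, []).
nn_to_vb_after_to_([W/'NN'|Rest0], _/'TO', [W/'VB'|Rest]) :- !,
    nn_to_vb_after_to_(Rest0, W/'VB', Rest).
nn_to_vb_after_to_([WT|Rest0], _, [WT|Rest]) :-
    nn_to_vb_after_to_(Rest0, WT, Rest).

% ?- nn_to_vb_after_to([she/'PN',decided/'VB',to/'TO',table/'NN',her/'PN',data/'NN'], T).
% T = [she/'PN', decided/'VB', to/'TO', table/'VB', her/'PN', data/'NN'].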

Page 20: Part-of-Speech Tagging


Lexicon for Brill Tagging

I PRP
Now RB
a DT
and CC
beans NNS
can MD
eat VB
fire NN VB
in IN
is VBZ
light NN JJ VB
of IN
open JJ VB
the DT
we PRP
you PRP
. .

Page 21: Part-of-Speech Tagging


A Rule Sequence

tag:'NN'>'VB' <- tag:'TO'@[-1]
tag:'VBP'>'VB' <- tag:'MD'@[-1,-2,-3]
tag:'NN'>'VB' <- tag:'MD'@[-1,-2]
tag:'VB'>'NN' <- tag:'DT'@[-1,-2]
tag:'VBD'>'VBN' <- tag:'VBZ'@[-1,-2,-3]
tag:'VBN'>'VBD' <- tag:'PRP'@[-1]
tag:'POS'>'VBZ' <- tag:'PRP'@[-1]
tag:'VB'>'VBP' <- tag:'NNS'@[-1]
tag:'IN'>'RB' <- wd:as@[0] & wd:as@[2]
tag:'IN'>'WDT' <- tag:'VB'@[1,2]
tag:'VB'>'VBP' <- tag:'PRP'@[-1]
tag:'IN'>'WDT' <- tag:'VBZ'@[1]
...

Page 22: Part-of-Speech Tagging


Transformation-Based Painting

[Figure: the 'transformation-based painting' analogy -- an initial rough colouring of regions (blue, green, red, brown, yellow) is successively corrected by repainting rules. K. Samuel 1998]

Page 23: Part-of-Speech Tagging


Transformation-Based Learning

[Diagram: the Learner takes an Initial Corpus, rule Templates, and a Hand-Coded Corpus, and produces a sequence of Rules together with the Tagged Corpus]

Page 24: Part-of-Speech Tagging


Transformation-Based Learning

see appendix in handout

Page 25: Part-of-Speech Tagging


Constraint-Grammar Tagging

Due to Fred Karlsson et al.
The cautious approach: 'Don't guess, just eliminate the impossible!'
Rule based
No learning ('learning by injection')

Page 26: Part-of-Speech Tagging


Constraint Grammar Example

I can light a fire and you can open a can of beans. Now the can is open and we can eat in the light of the fire.

I/{PRP} can/{MD,NN} light/{JJ,NN,VB} a/{DT} fire/{NN} and/{CC} you/{PRP} can/{MD,NN} open/{JJ,VB} a/{DT} can/{MD,NN} of/{IN} beans/{NNS} ./{.} Now/{RB} the/{DT} can/{MD,NN} is/{VBZ} open/{JJ,VB} and/{CC} we/{PRP} can/{MD,NN} eat/{VB} in/{IN} the/{DT} light/{JJ,NN,VB} of/{IN} the/{DT} fire/{NN} ./{.}

Page 27: Part-of-Speech Tagging


Constraint Grammar Example

tag:red 'RP' <- wd:in@[0] & tag:'NN'@[-1]
tag:red 'RB' <- wd:in@[0] & tag:'NN'@[-1]
tag:red 'VB' <- tag:'DT'@[-1]
tag:red 'NP' <- wd:'The'@[0]
tag:red 'VBN' <- wd:said@[0]
tag:red 'VBP' <- tag:'TO'@[-1,-2]
tag:red 'VBP' <- tag:'MD'@[-1,-2,-3]
tag:red 'VBZ' <- wd:'\'s'@[0] & tag:'NN'@[1]
tag:red 'RP' <- wd:in@[0] & tag:'NNS'@[-1]
tag:red 'RB' <- wd:in@[0] & tag:'NNS'@[-1]
...
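A minimal Prolog sketch (my illustration of the idea, not the Constraint Grammar system) of one 'remove' rule in this style, tag:red 'VB' <- tag:'DT'@[-1]: each word carries its set of remaining readings, the VB reading is discarded when the preceding word is unambiguously DT, and a word's last remaining reading is never removed.

:- use_module(library(lists)).   % for selectchk/3 (SWI-Prolog)

remove_vb_after_dt([], []).
remove_vb_after_dt([First|Rest0], [First|Rest]) :-
    remove_vb_after_dt_(Rest0, First, Rest).

remove_vb_after_dt_([], _, []).
remove_vb_after_dt_([W/Tags0|Rest0], _/['DT'], [W/Tags|Rest]) :-
    selectchk('VB', Tags0, Tags),
    Tags \== [],                      % never remove the last reading
    !,
    remove_vb_after_dt_(Rest0, W/Tags, Rest).
remove_vb_after_dt_([WT|Rest0], _, [WT|Rest]) :-
    remove_vb_after_dt_(Rest0, WT, Rest).

% ?- remove_vb_after_dt([the/['DT'], light/['JJ','NN','VB']], T).
% T = [the/['DT'], light/['JJ','NN']].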

Page 28: Part-of-Speech Tagging


Constraint Grammar Example

I can light a fire and you can open a can of beans. Now the can is open and we can eat in the light of the fire.

I/{PP} can/{MD} light/{JJ,VB} a/{DT} fire/{NN} and/{CC} you/{PP} can/{MD} open/{JJ,VB} a/{DT} can/{MD,NN} of/{IN} beans/{NNS} ./{.} Now/{RB} the/{DT} can/{MD,NN} is/{VBZ} open/{JJ} and/{CC} we/{PP} can/{MD} eat/{VB} in/{IN} the/{DT} light/{NN} of/{IN} the/{DT} fire/{NN} ./{.}

Page 29: Part-of-Speech Tagging


Evaluation

Two reasons for evaluating:
- Compare with other people's methods/systems
- Compare with earlier versions of your own system

Accuracy (recall and precision)

Baseline

Ceiling

N-fold cross-validation methodology => Good use of the data + More statistically reliable results.
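For a tagger that assigns exactly one tag per word, recall and precision coincide and reduce to plain accuracy (my summary of the usual definition):

\[ \text{accuracy} = \frac{\#\,\text{correctly tagged words}}{\#\,\text{words in the test data}} \]

The baseline (e.g. always picking each word's most frequent tag) and the ceiling (e.g. inter-annotator agreement) bound the range in which the score is meaningful.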

Page 30: Part-of-Speech Tagging


Assessing the Taggers

Accuracy

Speed

Space requirements

Learning

Intelligibility

Page 31: Part-of-Speech Tagging


Demo Taggers

Transformation-Based Tagger: www.ling.gu.se/~lager/Home/brilltagger_ui.html

Constraint-Grammar Tagger www.ling.gu.se/~lager/Home/cgtagger_ui.html

Featuring tracing facilities! Try it yourself!

Page 32: Part-of-Speech Tagging

Parsing

Torbjörn Lager
Department of Linguistics
Stockholm University

Page 33: Part-of-Speech Tagging


Parsing

Parsing with a phrase structure grammar
Shallow parsing

Page 34: Part-of-Speech Tagging


A Simple Phrase Structure Grammar

Fragment

lisa springer ('Lisa runs')
lisa skjuter en älg ('Lisa shoots a moose')

Grammar

s --> np, vp.

np --> pn.
np --> det, n.

vp --> v.
vp --> v, np.

pn --> [kalle].
pn --> [lisa].

det --> [en].

n --> [älg].

v --> [springer].
v --> [skjuter].

Page 35: Part-of-Speech Tagging


Recognition and Parsing

Recognition

?- s([lisa,springer],[]).
yes
?- s([springer,lisa],[]).
no

Parsing

?- s(Tree,[lisa,springer],[]).
Tree = s(np(pn(lisa)),vp(v(springer)))
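The parsing query presupposes a version of the grammar that actually builds trees; the grammar on the previous slide has no tree argument. A minimal sketch of such a grammar (my reconstruction, not shown on the slides):

s(s(NP,VP))    --> np(NP), vp(VP).
np(np(PN))     --> pn(PN).
np(np(DET,N))  --> det(DET), n(N).
vp(vp(V))      --> v(V).
vp(vp(V,NP))   --> v(V), np(NP).
pn(pn(kalle))  --> [kalle].
pn(pn(lisa))   --> [lisa].
det(det(en))   --> [en].
n(n(älg))      --> [älg].
v(v(springer)) --> [springer].
v(v(skjuter))  --> [skjuter].

% ?- s(Tree, [lisa,springer], []).
% Tree = s(np(pn(lisa)), vp(v(springer))).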

Page 36: Part-of-Speech Tagging


A Top-Down Parser in Prolog

parse(A, P0, P, A/Trees) :-
    (A --> B),
    parse(B, P0, P, Trees).

parse((B,Bs), P0, P, (Tree,Trees)) :-
    parse(B, P0, P1, Tree),
    parse(Bs, P1, P, Trees).

parse([Word], [Word|P], P, Word).
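A usage note (my assumption about how the rules are stored, not stated on the slide): the first clause calls (A --> B) as an ordinary goal, so the grammar rules must be available as plain -->/2 facts rather than being compiled away by the usual DCG translation, for instance by asserting them at run time:

% ?- maplist(assertz, [(s --> np, vp), (np --> pn), (vp --> v),
%                      (pn --> [lisa]), (v --> [springer])]).
% true.
% ?- parse(s, [lisa,springer], [], Tree).
% Tree = s/(np/(pn/lisa), vp/(v/springer)).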

Page 37: Part-of-Speech Tagging


Trying It Out

s --> np, vp.
np --> pn.
np --> det, n.
vp --> v, np.
pn --> [lisa].
det --> [en].
n --> [älg].
v --> [skjuter].

?- parse(s, [lisa,skjuter,en,älg], [], Tree).
Tree = s/(np/(pn/lisa), vp/(v/skjuter, np/(det/en, n/älg)))

Page 38: Part-of-Speech Tagging


The Resulting Tree

Tree = s/
         np/
           pn/lisa,
         vp/
           v/skjuter,
           np/
             det/en,
             n/älg

Page 39: Part-of-Speech Tagging


Syntactic Ambiguity

Den gamla damen träffade killen med handväskan ('The old lady hit the guy with the handbag')

John saw a man in the park with a telescope

Råttan åt upp osten och hunden och katten jagade råttan ('The rat ate the cheese and the dog and the cat chased the rat')

Page 40: Part-of-Speech Tagging


Local Ambiguity

The old man the boats

The horse raced past the barn fell

Page 41: Part-of-Speech Tagging


Indeterminism and Search

A depth-first, top-down, left-to-right, backtracking parser can handle both forms of ambiguity.

Parsing as a form of search

Page 42: Part-of-Speech Tagging


A Problem

Left-recursive rules

np --> np, pp.
np --> np, conj, np.

Indirect left-recursion

A --> B, C.
B --> A, D.
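A top-down parser expands np --> np, pp by calling np again before consuming any input, so it loops forever. One standard remedy (my sketch, not from the slides) is to rewrite such rules right-recursively:

% Same NP language as "np --> np, pp." but without left recursion
% (det and n as on the earlier slide; p is an assumed preposition category):
np  --> basenp, pps.
basenp --> det, n.
pps --> [].           % zero or more pp modifiers
pps --> pp, pps.
pp  --> p, np.
det --> [en].
n   --> [älg].
p   --> [i].          % Swedish 'i' ("in"), an assumption for illustration

% ?- phrase(np, [en, älg]).
% true.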

Page 43: Part-of-Speech Tagging


Another Problem

s --> np, vp.
vp --> v, np.
vp --> v, np, pp.
vp --> v, np, vp.
...

Ex: John saw the man talk with the actress

Parsing is exponential in the worst case!

Page 44: Part-of-Speech Tagging


Solution

Use a table (chart) in which parsed constituents are stored. A constituent that is already in the chart is never added again.

Parsing can be done in O(n³) time (where n is the length of the input).
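In Prolog, one way to get this chart-like sharing (a sketch based on tabling, which the slides do not mention) is to table the DCG-translated predicates, e.g. in SWI-Prolog or XSB: tabled calls are answered from a table, so repeated sub-parses are reused and the left-recursive np rule from the earlier slide terminates.

:- table np/2, pp/2.     % DCG nonterminals become /2 predicates

np --> [lisa].
np --> [stockholm].
np --> np, pp.           % left-recursive, but fine under tabling
pp --> [i], np.          % Swedish 'i' = "in"

% ?- phrase(np, [lisa, i, stockholm]).
% true.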

Page 45: Part-of-Speech Tagging


Some Parsing Issues

Accuracy
Speed
Space requirements
Robustness
Learning

[Diagram: Text -> Processor -> Parsed text, where the Processor draws on Knowledge]

Page 46: Part-of-Speech Tagging


Problems with Traditional Parsers

Bad coverage

Brittleness

Slowness

Too many trees!

Page 47: Part-of-Speech Tagging


Problems with Traditional Parsers

Correct low-level parses are often rejected because they do not fit into a global parse -> brittleness

Ambiguity -> indeterminism -> search -> slow parsers

Ambiguity -> sometimes hundreds of thousands of parse trees, and what can we do with these?

Page 48: Part-of-Speech Tagging


Another strategy (Abney)

Start with the simplest constructions (’easy-first parsing’) and be as careful as possible when parsing them -> ’islands of certainty’

’islands of certainty’ -> do not reject these parses even if they do not fit into a global parse -> robustness

When you are almost sure of how to resolve an ambiguity, do it! -> determinism

When you are uncertain of how to resolve an ambiguity, don’t even try! -> ’containment of ambiguity’ -> determinism

determinism -> no search -> speed

Page 49: Part-of-Speech Tagging


Shallow Parsers

Works on part-of-speech tagged data
Analyses are less complete than conventional parser output:
- Identifies some phrasal constituents (e.g. NPs), without indicating their internal structure or their function in the sentence
- Or identifies the functional role of some of the words, such as the main verb and its direct arguments

Page 50: Part-of-Speech Tagging


Deterministic bottom-up parsing

Adapted from Karttunen 1996:

define NP [(d) a* n+] ;

regex NP @-> "[NP" ... "]"
      .o.
      v "[NP" NP "]" @-> "[VP" ... "]" ;

apply down dannvaan
[NP dann][VP v [NP aan]]

Note the use of the longest-match operator!