
Page 1: Part-of-Speech Tagging

Part-of-Speech Tagging

Torbjörn Lager
Department of Linguistics
Stockholm University

Page 2: Part-of-Speech Tagging


Part-of-Speech Tagging: Definition

From Jurafsky & Martin 2000:

Part-of-speech tagging is the process of assigning a part-of-speech or other lexical class marker to each word in a corpus.

The input to a tagging algorithm is a string of words and a specified tagset. The output is a single best tag for each word.

A bit too narrow for my taste...

Page 3: Part-of-Speech Tagging


Part-of-Speech Tagging: Example 1

Input

He can can a can

Output

He/pron can/aux can/vb a/det can/n

Another possible output

He/{pron} can/{aux,n} can/{vb} a/{det} can/{n,vb}

Page 4: Part-of-Speech Tagging


Tag Sets

The Penn Treebank tag set (see appendix in handout)

Page 5: Part-of-Speech Tagging


Why Part-of-Speech Tagging?

A first step towards parsing

A first step towards word sense disambiguation

Provides clues to pronunciation: "object" -> OBject or obJECT (but note: Swedish BAnan vs baNAN)

Research in Corpus Linguistics

Page 6: Part-of-Speech Tagging


Part-of-Speech Tagging: Example 2

I can light a fire and you can open a can of beans. Now the can is open and we can eat in the light of the fire.

Page 7: Part-of-Speech Tagging


Relevant Information

Lexical information
Local contextual information

Page 8: Part-of-Speech Tagging


Part-of-Speech Tagging: Example 2

I can light a fire and you can open a can of beans. Now the can is open and we can eat in the light of the fire.

I/PRP can/MD light/VB a/DT fire/NN and/CC you/PRP can/MD open/VB a/DT can/NN of/IN beans/NNS ./. Now/RB the/DT can/NN is/VBZ open/JJ and/CC we/PRP can/MD eat/VB in/IN the/DT light/NN of/IN the/DT fire/NN ./.

Page 9: Part-of-Speech Tagging


Part-of-Speech Tagging

[Diagram: Text -> Processor -> POS tagged text, where the Processor draws on Knowledge]

Needed:
- Some strategy for representing the knowledge
- Some method for acquiring the knowledge
- Some method of applying the knowledge
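One very simple way to see these three pieces in action is a unigram baseline: the knowledge is just a word-to-most-frequent-tag table, and applying it is a lookup. A minimal Prolog sketch (my illustration, not part of the course material; the lexicon entries and tags are made up):

% Knowledge: each word's most frequent tag (illustrative entries only).
tag(he,  pron).
tag(can, aux).
tag(a,   det).

% Applying the knowledge: look each word up; unknown words default to n.
baseline_tag([], []).
baseline_tag([Word|Words], [Word/Tag|Rest]) :-
    ( tag(Word, Tag) -> true ; Tag = n ),
    baseline_tag(Words, Rest).

% ?- baseline_tag([he,can,can,a,can], T).
% T = [he/pron, can/aux, can/aux, a/det, can/aux].

Acquiring the knowledge would here amount to counting tag frequencies in a corpus; the sketch also shows why local context matters, since every "can" gets the same tag.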

Page 10: Part-of-Speech Tagging


Approaches to PoS Tagging

The bold approach: 'Use all the information you have and guess!'

The whimsical approach: 'Guess first, then change your mind if necessary!'

The cautious approach: 'Don't guess, just eliminate the impossible!'

Page 11: Part-of-Speech Tagging


Some POS-Tagging Issues

Accuracy
Speed
Space requirements
Learning
Intelligibility

[Diagram: Text -> Processor -> POS tagged text, where the Processor draws on Knowledge]

Page 12: Part-of-Speech Tagging


Cutting the Cake

Tagging methods:
- Rule based
- Statistical
- Mixed
- Other methods

Learning methods:
- Supervised learning
- Unsupervised learning

Page 13: Part-of-Speech Tagging


HMM Tagging

The bold approach: 'Use all the information you have and guess!'

Statistical method
Supervised (or unsupervised) learning

Page 14: Part-of-Speech Tagging


Page 15: Part-of-Speech Tagging


The Naive Approach and its Problem

Traverse all the paths compatible with the input and then pick the most probable one

Problem: There are 27 paths in the HMM for S = "he can can a can"
Doubling the length of S (with a conjunction in between) -> 729 paths
Doubling S again -> 531441 paths!
Exponential time complexity!
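A quick check of the arithmetic (assuming "can" has three possible tags and the other words one each): the number of paths is the product of the ambiguity of each word,

\[ \#\text{paths} = \prod_{i=1}^{n} |T(w_i)| = 1 \cdot 3 \cdot 3 \cdot 1 \cdot 3 = 27, \qquad 27 \cdot 1 \cdot 27 = 729, \qquad 729 \cdot 1 \cdot 729 = 531441. \]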

Page 16: Part-of-Speech Tagging


Solution

Use the Viterbi algorithm

Tagging can be done in time proportional to the length of input.

How and Why does the Viterbi algorithm work? We save this for later...

Page 17: Part-of-Speech Tagging


Training an HMM

Estimate probabilities from relative frequencies.
- Output probabilities P(w|t): the number of occurrences of w tagged as t, divided by the number of occurrences of t.
- Transition probabilities P(t_i|t_{i-1}): the number of occurrences of t_{i-1} followed by t_i, divided by the number of occurrences of t_{i-1}.

Use smoothing to overcome the sparse data problem (unknown words, uncommon words, uncommon contexts).
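Putting the two estimates together, the tagger looks for the tag sequence that maximizes the standard bigram HMM objective (the textbook formulation from Jurafsky & Martin, not spelled out on the slide):

\[ \hat{t}_1^{\,n} = \arg\max_{t_1 \ldots t_n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1}), \qquad P(w \mid t) \approx \frac{C(w,t)}{C(t)}, \qquad P(t_i \mid t_{i-1}) \approx \frac{C(t_{i-1}\, t_i)}{C(t_{i-1})}. \]

This product over the sentence is exactly what the Viterbi algorithm maximizes in time linear in n.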

Page 18: Part-of-Speech Tagging


Transformation-Based Learning

The whimsical approach: 'Guess first, then change your mind if necessary!'

Rule based tagging, statistical learning
Supervised learning
Method due to Eric Brill (1995)

Page 19: Part-of-Speech Tagging


A Small PoS Tagging Example

Rules

tag:NN>VB <- tag:TO@[-1]
tag:VB>NN <- tag:DT@[-1]
...

Input

She decided to table her data

Lexicon

data:NN

decided:VB

her:PN

she:PN

table:NN VB

to:TO

Initial tagging (from the lexicon): she/PN decided/VB to/TO table/NN her/PN data/NN
After the rule tag:NN>VB <- tag:TO@[-1]: she/PN decided/VB to/TO table/VB her/PN data/NN
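To make the rule notation concrete, here is a minimal Prolog sketch (my illustration, not the µ-TBL implementation) of applying the first rule, tag:NN>VB <- tag:TO@[-1], to a tagged word list: change NN to VB whenever the preceding tag is TO.

% nn_to_vb_after_to(+Tagged0, -Tagged): apply tag:NN>VB <- tag:TO@[-1]
% left to right, carrying the (possibly updated) previous tag as context.
nn_to_vb_after_to([], []).
nn_to_vb_after_to([First|Rest0], [First|Rest]) :-
    nn_to_vb_after_to_(Rest0, First, Rest).

nn_to_vb_after_to_([], _, []).
nn_to_vb_after_to_([W/'NN'|Rest0], _/'TO', [W/'VB'|Rest]) :- !,
    nn_to_vb_after_to_(Rest0, W/'VB', Rest).
nn_to_vb_after_to_([WT|Rest0], _, [WT|Rest]) :-
    nn_to_vb_after_to_(Rest0, WT, Rest).

% ?- nn_to_vb_after_to([she/'PN',decided/'VB',to/'TO',table/'NN',her/'PN',data/'NN'], T).
% T = [she/'PN', decided/'VB', to/'TO', table/'VB', her/'PN', data/'NN'].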

Page 20: Part-of-Speech Tagging


Lexicon for Brill Tagging

I PRP
Now RB
a DT
and CC
beans NNS
can MD
eat VB
fire NN VB
in IN
is VBZ
light NN JJ VB
of IN
open JJ VB
the DT
we PRP
you PRP
. .

Page 21: Part-of-Speech Tagging


A Rule Sequence

tag:'NN'>'VB' <- tag:'TO'@[-1]
tag:'VBP'>'VB' <- tag:'MD'@[-1,-2,-3]
tag:'NN'>'VB' <- tag:'MD'@[-1,-2]
tag:'VB'>'NN' <- tag:'DT'@[-1,-2]
tag:'VBD'>'VBN' <- tag:'VBZ'@[-1,-2,-3]
tag:'VBN'>'VBD' <- tag:'PRP'@[-1]
tag:'POS'>'VBZ' <- tag:'PRP'@[-1]
tag:'VB'>'VBP' <- tag:'NNS'@[-1]
tag:'IN'>'RB' <- wd:as@[0] & wd:as@[2]
tag:'IN'>'WDT' <- tag:'VB'@[1,2]
tag:'VB'>'VBP' <- tag:'PRP'@[-1]
tag:'IN'>'WDT' <- tag:'VBZ'@[1]
...

Page 22: Part-of-Speech Tagging


Transformation-Based Painting

[Figure: the 'transformation-based painting' analogy -- an initial rough colouring of regions (blue, green, red, brown, yellow) is successively corrected by repainting rules. K. Samuel 1998]

Page 23: Part-of-Speech Tagging


Transformation-Based Learning

[Diagram: the Learner takes an Initial Corpus, rule Templates, and a Hand-Coded Corpus, and produces a sequence of Rules together with the Tagged Corpus]

Page 24: Part-of-Speech Tagging


Transformation-Based Learning

see appendix in handout

Page 25: Part-of-Speech Tagging


Constraint-Grammar Tagging

Due to Fred Karlsson et al.
The cautious approach: 'Don't guess, just eliminate the impossible!'
Rule based
No learning ('learning by injection')

Page 26: Part-of-Speech Tagging


Constraint Grammar Example

I can light a fire and you can open a can of beans. Now the can is open and we can eat in the light of the fire.

I/{PRP} can/{MD,NN} light/{JJ,NN,VB} a/{DT} fire/{NN} and/{CC} you/{PRP} can/{MD,NN} open/{JJ,VB} a/{DT} can/{MD,NN} of/{IN} beans/{NNS} ./{.} Now/{RB} the/{DT} can/{MD,NN} is/{VBZ} open/{JJ,VB} and/{CC} we/{PRP} can/{MD,NN} eat/{VB} in/{IN} the/{DT} light/{JJ,NN,VB} of/{IN} the/{DT} fire/{NN} ./{.}

Page 27: Part-of-Speech Tagging


Constraint Grammar Example

tag:red 'RP' <- wd:in@[0] & tag:'NN'@[-1]
tag:red 'RB' <- wd:in@[0] & tag:'NN'@[-1]
tag:red 'VB' <- tag:'DT'@[-1]
tag:red 'NP' <- wd:'The'@[0]
tag:red 'VBN' <- wd:said@[0]
tag:red 'VBP' <- tag:'TO'@[-1,-2]
tag:red 'VBP' <- tag:'MD'@[-1,-2,-3]
tag:red 'VBZ' <- wd:'\'s'@[0] & tag:'NN'@[1]
tag:red 'RP' <- wd:in@[0] & tag:'NNS'@[-1]
tag:red 'RB' <- wd:in@[0] & tag:'NNS'@[-1]
...
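A minimal Prolog sketch (my illustration of the idea, not the Constraint Grammar system) of one 'remove' rule in this style, tag:red 'VB' <- tag:'DT'@[-1]: each word carries its set of remaining readings, the VB reading is discarded when the preceding word is unambiguously DT, and a word's last remaining reading is never removed.

:- use_module(library(lists)).   % for selectchk/3 (SWI-Prolog)

remove_vb_after_dt([], []).
remove_vb_after_dt([First|Rest0], [First|Rest]) :-
    remove_vb_after_dt_(Rest0, First, Rest).

remove_vb_after_dt_([], _, []).
remove_vb_after_dt_([W/Tags0|Rest0], _/['DT'], [W/Tags|Rest]) :-
    selectchk('VB', Tags0, Tags),
    Tags \== [],                      % never remove the last reading
    !,
    remove_vb_after_dt_(Rest0, W/Tags, Rest).
remove_vb_after_dt_([WT|Rest0], _, [WT|Rest]) :-
    remove_vb_after_dt_(Rest0, WT, Rest).

% ?- remove_vb_after_dt([the/['DT'], light/['JJ','NN','VB']], T).
% T = [the/['DT'], light/['JJ','NN']].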

Page 28: Part-of-Speech Tagging


Constraint Grammar Example

I can light a fire and you can open a can of beans. Now the can is open and we can eat in the light of the fire.

I/{PP} can/{MD} light/{JJ,VB} a/{DT} fire/{NN} and/{CC} you/{PP} can/{MD} open/{JJ,VB} a/{DT} can/{MD,NN} of/{IN} beans/{NNS} ./{.} Now/{RB} the/{DT} can/{MD,NN} is/{VBZ} open/{JJ} and/{CC} we/{PP} can/{MD} eat/{VB} in/{IN} the/{DT} light/{NN} of/{IN} the/{DT} fire/{NN} ./{.}

Page 29: Part-of-Speech Tagging


Evaluation

Two reasons for evaluating:
- Compare with other people's methods/systems
- Compare with earlier versions of your own system

Accuracy (recall and precision)

Baseline

Ceiling

N-fold cross-validation methodology => Good use of the data + More statistically reliable results.
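For a tagger that assigns exactly one tag per word, recall and precision coincide and reduce to plain accuracy (my summary of the usual definition):

\[ \text{accuracy} = \frac{\#\,\text{correctly tagged words}}{\#\,\text{words in the test data}} \]

The baseline (e.g. always picking each word's most frequent tag) and the ceiling (e.g. inter-annotator agreement) bound the range in which the score is meaningful.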

Page 30: Part-of-Speech Tagging


Assessing the Taggers

Accuracy

Speed

Space requirements

Learning

Intelligibility

Page 31: Part-of-Speech Tagging


Demo Taggers

Transformation-Based Tagger: www.ling.gu.se/~lager/Home/brilltagger_ui.html

Constraint-Grammar Tagger www.ling.gu.se/~lager/Home/cgtagger_ui.html

Featuring tracing facilities! Try it yourself!

Page 32: Part-of-Speech Tagging

Parsing

Torbjörn Lager
Department of Linguistics
Stockholm University

Page 33: Part-of-Speech Tagging


Parsing

Parsing with a phrase structure grammar
Shallow parsing

Page 34: Part-of-Speech Tagging


A Simple Phrase Structure Grammar

Fragment

lisa springer ('Lisa runs')
lisa skjuter en älg ('Lisa shoots a moose')

Grammar

s --> np, vp.

np --> pn.
np --> det, n.

vp --> v.
vp --> v, np.

pn --> [kalle].
pn --> [lisa].

det --> [en].

n --> [älg].

v --> [springer].
v --> [skjuter].

Page 35: Part-of-Speech Tagging


Recognition and Parsing

Recognition

?- s([lisa,springer],[]).
yes
?- s([springer,lisa],[]).
no

Parsing

?- s(Tree,[lisa,springer],[]).
Tree = s(np(pn(lisa)),vp(v(springer)))
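The parsing query presupposes a version of the grammar that actually builds trees; the grammar on the previous slide has no tree argument. A minimal sketch of such a grammar (my reconstruction, not shown on the slides):

s(s(NP,VP))    --> np(NP), vp(VP).
np(np(PN))     --> pn(PN).
np(np(DET,N))  --> det(DET), n(N).
vp(vp(V))      --> v(V).
vp(vp(V,NP))   --> v(V), np(NP).
pn(pn(kalle))  --> [kalle].
pn(pn(lisa))   --> [lisa].
det(det(en))   --> [en].
n(n(älg))      --> [älg].
v(v(springer)) --> [springer].
v(v(skjuter))  --> [skjuter].

% ?- s(Tree, [lisa,springer], []).
% Tree = s(np(pn(lisa)), vp(v(springer))).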

Page 36: Part-of-Speech Tagging


A Top-Down Parser in Prolog

parse(A, P0, P, A/Trees) :-
    (A --> B),
    parse(B, P0, P, Trees).

parse((B,Bs), P0, P, (Tree,Trees)) :-
    parse(B, P0, P1, Tree),
    parse(Bs, P1, P, Trees).

parse([Word], [Word|P], P, Word).
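A usage note (my assumption about how the rules are stored, not stated on the slide): the first clause calls (A --> B) as an ordinary goal, so the grammar rules must be available as plain -->/2 facts rather than being compiled away by the usual DCG translation, for instance by asserting them at run time:

% ?- maplist(assertz, [(s --> np, vp), (np --> pn), (vp --> v),
%                      (pn --> [lisa]), (v --> [springer])]).
% true.
% ?- parse(s, [lisa,springer], [], Tree).
% Tree = s/(np/(pn/lisa), vp/(v/springer)).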

Page 37: Part-of-Speech Tagging


Trying It Out

s --> np, vp.
np --> pn.
np --> det, n.
vp --> v, np.
pn --> [lisa].
det --> [en].
n --> [älg].
v --> [skjuter].

?- parse(s, [lisa,skjuter,en,älg], [], Tree).
Tree = s/(np/(pn/lisa), vp/(v/skjuter, np/(det/en, n/älg)))

Page 38: Part-of-Speech Tagging


The Resulting Tree

Tree = s/
         np/
           pn/lisa,
         vp/
           v/skjuter,
           np/
             det/en,
             n/älg

Page 39: Part-of-Speech Tagging


Syntactic Ambiguity

Den gamla damen träffade killen med handväskan ('The old lady hit the guy with the handbag')

John saw a man in the park with a telescope

Råttan åt upp osten och hunden och katten jagade råttan ('The rat ate the cheese and the dog and the cat chased the rat')

Page 40: Part-of-Speech Tagging


Local Ambiguity

The old man the boats

The horse raced past the barn fell

Page 41: Part-of-Speech Tagging


Indeterminism and Search

A depth-first, top-down, left-to-right, backtracking parser can handle both forms of ambiguity.

Parsing as a form of search

Page 42: Part-of-Speech Tagging


A Problem

Left-recursive rules

np --> np, pp.
np --> np, conj, np.

Indirect left-recursion

A --> B, C.
B --> A, D.
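A top-down parser expands np --> np, pp by calling np again before consuming any input, so it loops forever. One standard remedy (my sketch, not from the slides) is to rewrite such rules right-recursively:

% Same NP language as "np --> np, pp." but without left recursion
% (det and n as on the earlier slide; p is an assumed preposition category):
np  --> basenp, pps.
basenp --> det, n.
pps --> [].           % zero or more pp modifiers
pps --> pp, pps.
pp  --> p, np.
det --> [en].
n   --> [älg].
p   --> [i].          % Swedish 'i' ("in"), an assumption for illustration

% ?- phrase(np, [en, älg]).
% true.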

Page 43: Part-of-Speech Tagging


Another Problem

s --> np, vp.
vp --> v, np.
vp --> v, np, pp.
vp --> v, np, vp.
...

Ex: John saw the man talk with the actress

Parsing is exponential in the worst case!

Page 44: Part-of-Speech Tagging


Solution

Use a table (chart) in which parsed constituents are stored. A constituent that is already in the chart is never added again.

Parsing can be done in O(n³) time (where n is the length of the input).
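In Prolog, one way to get this chart-like sharing (a sketch based on tabling, which the slides do not mention) is to table the DCG-translated predicates, e.g. in SWI-Prolog or XSB: tabled calls are answered from a table, so repeated sub-parses are reused and the left-recursive np rule from the earlier slide terminates.

:- table np/2, pp/2.     % DCG nonterminals become /2 predicates

np --> [lisa].
np --> [stockholm].
np --> np, pp.           % left-recursive, but fine under tabling
pp --> [i], np.          % Swedish 'i' = "in"

% ?- phrase(np, [lisa, i, stockholm]).
% true.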

Page 45: Part-of-Speech Tagging


Some Parsing Issues

Accuracy
Speed
Space requirements
Robustness
Learning

[Diagram: Text -> Processor -> Parsed text, where the Processor draws on Knowledge]

Page 46: Part-of-Speech Tagging


Problems with Traditional Parsers

Bad coverage

Brittleness

Slowness

Too many trees!

Page 47: Part-of-Speech Tagging


Problems with Traditional Parsers

Correct low-level parses are often rejected because they do not fit into a global parse -> brittleness

Ambiguity -> indeterminism -> search -> slow parsers

Ambiguity -> sometimes hundreds of thousands of parse trees, and what can we do with these?

Page 48: Part-of-Speech Tagging


Another strategy (Abney)

Start with the simplest constructions (’easy-first parsing’) and be as careful as possible when parsing them -> ’islands of certainty’

’islands of certainty’ -> do not reject these parses even if they do not fit into a global parse -> robustness

When you are almost sure of how to resolve an ambiguity, do it! -> determinism

When you are uncertain of how to resolve an ambiguity, don’t even try! -> ’containment of ambiguity’ -> determinism

determinism -> no search -> speed

Page 49: Part-of-Speech Tagging


Shallow Parsers

Works on part-of-speech tagged data
Analyses are less complete than conventional parser output:
- Identifies some phrasal constituents (e.g. NPs), without indicating their internal structure or their function in the sentence
- Or identifies the functional role of some of the words, such as the main verb and its direct arguments

Page 50: Part-of-Speech Tagging


Deterministic bottom-up parsing

Adapted from Karttunen 1996:

define NP [(d) a* n+] ;

regex NP @-> "[NP" ... "]"
      .o.
      v "[NP" NP "]" @-> "[VP" ... "]" ;

apply down dannvaan
[NP dann][VP v [NP aan]]

Note the use of the longest-match operator!