Part of Speech Tagging
Informatics 2A: Lecture 15

Mirella Lapata
School of Informatics, University of Edinburgh
21 October 2011



1 Automatic POS Tagging
    Motivation
    Corpus Annotation
    Tags and Tokens

2 HMM Part-of-Speech Tagging


Benefits of Part of Speech Tagging

Can help in determining authorship: were any two documents written by the same person? ⇒ forensic linguistics.

Can help in speech synthesis and recognition. For example, say the following out loud:

1 Have you read ’The Wind in the Willows’? (noun)

2 The clock has stopped. Please wind it up. (verb)

3 The students tried to protest. (verb)

4 The students are pleased that their protest was successful.(noun)


Corpus Annotation

Annotation adds information that is not explicit in a corpus and so increases its usefulness (often in an application-specific way).

To annotate a corpus with Part-of-Speech (POS) classes we must define a tag set: the inventory of labels for marking up a corpus.

Example: part of speech tag sets

1 CLAWS tag set (used for the BNC); 62 tags;

2 Brown tag set (used for the Brown corpus); 87 tags;

3 Penn tag set (used for the Penn Treebank); 45 tags.


POS Tag Sets for English

Category              Examples           CLAWS  Brown  Penn
Adjective             happy, bad         AJ0    JJ     JJ
Noun singular         woman, book        NN1    NN     NN
Noun plural           women, books       NN2    NNS    NNS
Noun proper singular  London, Michael    NP0    NP     NNP
Noun proper plural    Finns, Hearts      NP0    NPS    NNPS
Reflexive pro         itself, ourselves  PNX
Plural reflexive pro  ourselves, ...            PPLS
Verb past participle  given, found       VVN    VBN    VBN
Verb base form        give, make         VVB    VB     VB
Verb simple past      ate, gave          VVD    VBD    VBD

All words must be assigned at least one tag. Differences between the tag sets reflect which distinctions each one does or does not draw.


Tags and Tokens

In POS-tagged corpora, tokens and their POS tags are usually given in the form text/tag:

Our/PRP$ enemies/NNS are/VBP innovative/JJ and/CC resourceful/JJ ,/, and/CC so/RB are/VB we/PRP ./. They/PRP never/RB stop/VB thinking/VBG about/IN new/JJ ways/NNS to/TO harm/VB our/PRP$ country/NN and/CC our/PRP$ people/NN ,/, and/CC neither/DT do/VB we/PRP


Extent of POS Ambiguity

POS-tagging a large corpus by hand is a lot of work.

We’d prefer to automate but how hard can it be?

Many words may appear in several categories.

But most words appear most of the time in one category.

POS Ambiguity in the Brown corpus

The Brown corpus (1M words) has 39,440 different word types:

35,340 (89.6%) have only 1 POS tag anywhere in the corpus

4,100 (10.4%) have 2–7 POS tags

Why does 10.4% POS-tag ambiguity by word type lead to difficulty?


Extent of POS Ambiguity

Words in a large corpus have a Zipfian distribution.

Many high frequency words have more than one POS tag.

More than 40% of the word tokens are ambiguous.

He wants to/TO go.
He went to/IN the store.

He wants that/DT hat.
It is obvious that/CS he wants a hat.
He wants a hat that/WPS fits.

How about guessing the most common tag for each word? This gives about 90% accuracy (the state of the art is 96–98%).
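The most-common-tag baseline can be sketched in a few lines. This is a minimal illustration, not the lecture's implementation: the function names `train_baseline` and `tag_baseline`, the toy corpus, the default tag for unseen words, and the Brown-style tags on words other than "to" are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Map each (lowercased) word to the tag it carries most often in training."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tag_counts.most_common(1)[0][0]
            for word, tag_counts in counts.items()}

def tag_baseline(words, model, default_tag="NN"):
    """Tag every word independently of its context.

    Unseen words receive default_tag (an assumption of this sketch).
    """
    return [(w, model.get(w.lower(), default_tag)) for w in words]

# Hypothetical toy corpus; a real experiment would train on a tagged corpus.
toy_corpus = [
    [("He", "PPS"), ("wants", "VBZ"), ("to", "TO"), ("go", "VB")],
    [("He", "PPS"), ("went", "VBD"), ("to", "IN"), ("the", "AT"), ("store", "NN")],
    [("She", "PPS"), ("wants", "VBZ"), ("to", "TO"), ("race", "VB")],
]

model = train_baseline(toy_corpus)
print(tag_baseline("He went to the store".split(), model))
```

Because "to" is tagged TO more often than IN in the toy data, the baseline tags the preposition in "went to the store" as TO. Errors of this kind, on frequent ambiguous words, are what a context-sensitive tagger aims to fix.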


Clicker Question

What is the difference between word types and tokens?

1 Word types are part of speech tags, tokens are just the words.

2 Word types are the number of times words appear in the corpus, whereas word tokens are unique occurrences of words in the corpus.

3 Word types are the vocabulary (what different words are there), whereas word tokens refer to the frequency of each word type.

4 Word types and tokens are the same thing.
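To make the type/token distinction concrete, the sketch below counts both on a tiny hand-made list of tagged tokens and compares ambiguity measured per type with ambiguity measured per token. The mini-corpus is hypothetical and only illustrates why a small share of ambiguous types can cover a large share of running text.

```python
from collections import defaultdict

# Hypothetical mini-corpus of (word, tag) tokens; tags are illustrative.
tagged_tokens = [
    ("to", "TO"), ("go", "VB"), ("to", "IN"), ("the", "AT"), ("store", "NN"),
    ("that", "DT"), ("hat", "NN"), ("that", "CS"), ("to", "TO"),
]

# Collect the set of tags observed for each distinct word (type).
tags_by_type = defaultdict(set)
for word, tag in tagged_tokens:
    tags_by_type[word].add(tag)

n_tokens = len(tagged_tokens)   # every running word counts
n_types = len(tags_by_type)     # each distinct word counts once
ambiguous_types = {w for w, tags in tags_by_type.items() if len(tags) > 1}
ambiguous_tokens = sum(1 for w, _ in tagged_tokens if w in ambiguous_types)

print(f"{n_tokens} tokens, {n_types} types")
print(f"ambiguous types:  {len(ambiguous_types)}/{n_types}")
print(f"ambiguous tokens: {ambiguous_tokens}/{n_tokens}")
```

In this toy data only 2 of 6 types are ambiguous, but they account for 5 of 9 tokens, because the ambiguous words are also the frequent ones.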


Sequence Labeling

Find the best sequence of tags that corresponds to:

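Secretariat  is   expected  to  race  tomorrow
NNP          VBZ  VBN       TO  VB    NN
NNP          VBZ  VBN       TO  NN    NN

\[
\begin{aligned}
\hat{t}_1^n &= \operatorname*{argmax}_{t_1^n} P(t_1^n \mid w_1^n) \\
            &= \operatorname*{argmax}_{t_1^n} \frac{P(w_1^n \mid t_1^n)\, P(t_1^n)}{P(w_1^n)} && \text{using Bayes' rule} \\
            &= \operatorname*{argmax}_{t_1^n} P(w_1^n \mid t_1^n)\, P(t_1^n) && \text{denominator does not change}
\end{aligned}
\]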


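Sequence Labeling

\[
\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} \underbrace{P(w_1^n \mid t_1^n)}_{\text{likelihood}}\; \underbrace{P(t_1^n)}_{\text{prior}}
\approx \operatorname*{argmax}_{t_1^n} \prod_{i=1}^{n} P(w_i \mid t_i) \prod_{i=1}^{n} P(t_i \mid t_{i-1})
\]

\[
P(w_1^n \mid t_1^n) \approx \prod_{i=1}^{n} P(w_i \mid t_i)
\]

\[
P(t_1^n) \approx \prod_{i=1}^{n} P(t_i \mid t_{i-1})
\]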


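Sequence Labeling

\[
\hat{t}_1^n \approx \operatorname*{argmax}_{t_1^n} \underbrace{\prod_{i=1}^{n} P(w_i \mid t_i)}_{\text{emission probability}}\; \underbrace{\prod_{i=1}^{n} P(t_i \mid t_{i-1})}_{\text{transition probability}}
\]

\[
P(w_i \mid t_i) = \frac{C(t_i, w_i)}{C(t_i)}
\qquad
P(t_i \mid t_{i-1}) = \frac{C(t_{i-1}, t_i)}{C(t_{i-1})}
\]

\[
P(\text{is} \mid \text{VBZ}) = \frac{C(\text{VBZ}, \text{is})}{C(\text{VBZ})} = \frac{10{,}073}{21{,}627} = .47
\]

\[
P(\text{NN} \mid \text{DT}) = \frac{C(\text{DT}, \text{NN})}{C(\text{DT})} = \frac{56{,}509}{116{,}454} = .49
\]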
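The count-based estimates above can be sketched as follows. The function name `estimate_hmm`, the start symbol, and the three-sentence corpus are assumptions made for illustration; the sketch does no smoothing, so unseen word/tag and tag/tag pairs simply have no entry.

```python
from collections import Counter

def estimate_hmm(tagged_sentences, start="<s>"):
    """Relative-frequency estimates of emission and transition probabilities."""
    tag_counts = Counter()    # C(t)
    emit_counts = Counter()   # C(t, w)
    trans_counts = Counter()  # C(t_prev, t)
    for sentence in tagged_sentences:
        prev = start
        tag_counts[start] += 1
        for word, tag in sentence:
            tag_counts[tag] += 1
            emit_counts[(tag, word)] += 1
            trans_counts[(prev, tag)] += 1
            prev = tag
    # P(w | t) = C(t, w) / C(t)
    emission = {(t, w): c / tag_counts[t] for (t, w), c in emit_counts.items()}
    # P(t | t_prev) = C(t_prev, t) / C(t_prev)
    transition = {(p, t): c / tag_counts[p] for (p, t), c in trans_counts.items()}
    return emission, transition

# Hypothetical toy corpus, only to exercise the function.
toy_corpus = [
    [("the", "DT"), ("dog", "NN"), ("is", "VBZ"), ("here", "RB")],
    [("the", "DT"), ("race", "NN"), ("is", "VBZ"), ("over", "RB")],
    [("a", "DT"), ("big", "JJ"), ("dog", "NN"), ("runs", "VBZ")],
]
emission, transition = estimate_hmm(toy_corpus)
print(emission[("VBZ", "is")])     # C(VBZ, is) / C(VBZ) = 2/3
print(transition[("DT", "NN")])    # C(DT, NN) / C(DT)   = 2/3

# The two worked examples from the slide, recomputed from their counts.
print(round(10_073 / 21_627, 2), round(56_509 / 116_454, 2))
```

Note that the denominator C(t_prev) counts every occurrence of the previous tag, including sentence-final ones, exactly as in the formula; without an explicit end state the transition estimates out of a tag need not sum to one.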


Hidden Markov Models

A finite automaton is defined by a set of states and a set of transitions between states that are taken according to the input observations.

A weighted finite automaton has probabilities or weights on the arcs.

In a Markov chain, the input sequence uniquely determines which states the automaton will go through.

In a Hidden Markov Model, the sequence of states given the input is hidden, i.e., ambiguous.

In POS tagging, we observe the input words but not the POS tags themselves.


Definition of Hidden Markov Models

$Q = q_1, q_2, \ldots, q_N$: a set of $N$ states.

$A = a_{11} a_{12} \ldots a_{n1} \ldots a_{nn}$: a transition probability matrix $A$; each $a_{ij}$ represents the probability of moving from state $i$ to state $j$, s.t. $\sum_{j=1}^{n} a_{ij} = 1 \;\; \forall i$.

$O = o_1, o_2, \ldots, o_T$: a sequence of $T$ observations drawn from a vocabulary $V = v_1, v_2, \ldots, v_V$.

$B = b_i(o_t)$: a sequence of emission probabilities, each expressing the probability of observation $o_t$ being generated from state $i$.

$q_0, q_F$: a start state and a final state.
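One possible way to hold these components in code is sketched below. The class name `HMM`, its field names, the `validate` method and the two-state toy model are all assumptions of this sketch; the numbers are invented and serve only to show the sum-to-one constraint on each row of A, with an analogous check on each state's emission distribution.

```python
from dataclasses import dataclass

@dataclass
class HMM:
    states: list        # Q: the hidden states (here, tags)
    transition: dict    # A: transition[i][j] = a_ij
    emission: dict      # B: emission[i][o] = b_i(o)
    start: str = "<s>"  # q_0
    final: str = "</s>" # q_F

    def validate(self, tol=1e-9):
        """Check that every row of A and every emission distribution sums to 1."""
        for i in [self.start] + self.states:
            total = sum(self.transition[i].values())
            if abs(total - 1.0) > tol:
                raise ValueError(f"transitions out of {i!r} sum to {total}, not 1")
        for i in self.states:
            total = sum(self.emission[i].values())
            if abs(total - 1.0) > tol:
                raise ValueError(f"emissions of {i!r} sum to {total}, not 1")

# Invented two-state example (not the lecture's numbers).
toy_hmm = HMM(
    states=["DT", "NN"],
    transition={
        "<s>": {"DT": 0.8, "NN": 0.2, "</s>": 0.0},
        "DT":  {"DT": 0.0, "NN": 1.0, "</s>": 0.0},
        "NN":  {"DT": 0.2, "NN": 0.3, "</s>": 0.5},
    },
    emission={
        "DT": {"the": 0.7, "a": 0.3},
        "NN": {"dog": 0.5, "race": 0.5},
    },
)
toy_hmm.validate()  # raises ValueError if a distribution does not sum to 1
```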

Transition Probabilities

[Figure not preserved in the transcript]

Emission Probabilities

[Figure not preserved in the transcript]

Transition and Emission Probabilities

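Transition probabilities P(t_i | t_{i-1}) (rows: previous tag; columns: current tag)

        VB      TO       NN       PPSS
<s>     .019    .0043    .041     .067
VB      .0038   .035     .047     .0070
TO      .83     0        .00047   0
NN      .0040   .016     .087     .0045
PPSS    .23     .00079   .0012    .00014

Emission probabilities P(w_i | t_i) (rows: tag; columns: word)

        I       want      to      race
VB      0       .0093     0       .00012
TO      0       0         .99     0
NN      0       .000054   0       .00057
PPSS    .37     0         0       0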


How Do we Search for Best Tag Sequence?

We have defined an HMM, but how do we use it? We are given a word sequence and must find its corresponding tag sequence.

It is easy to compute the probability of a specific tag sequence:

\[
P(w_1^n \mid t_1^n)\, P(t_1^n) \approx \prod_{i=1}^{n} P(w_i \mid t_i) \prod_{i=1}^{n} P(t_i \mid t_{i-1})
\]

But how do we find the most likely tag sequence?

We can do this efficiently using dynamic programming and the Viterbi algorithm.
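Scoring one given tag sequence is a direct product of emission and transition probabilities, as sketched below. The dictionaries `TRANS` and `EMIT` and the function `sequence_probability` are names chosen for this sketch. The numbers come from the lecture's tables, with three entries set to the values that the Viterbi slides' own computations use: `<s>`→PPSS = .067, TO→NN = .00047 and PPSS→NN = .0012. Transitions into a final state are not modelled, since the tables do not give them.

```python
# TRANS[prev][tag] = P(tag | prev); EMIT[tag][word] = P(word | tag)
TRANS = {
    "<s>":  {"VB": .019,  "TO": .0043,  "NN": .041,   "PPSS": .067},
    "VB":   {"VB": .0038, "TO": .035,   "NN": .047,   "PPSS": .0070},
    "TO":   {"VB": .83,   "TO": 0.0,    "NN": .00047, "PPSS": 0.0},
    "NN":   {"VB": .0040, "TO": .016,   "NN": .087,   "PPSS": .0045},
    "PPSS": {"VB": .23,   "TO": .00079, "NN": .0012,  "PPSS": .00014},
}
EMIT = {
    "VB":   {"I": 0.0, "want": .0093,   "to": 0.0, "race": .00012},
    "TO":   {"I": 0.0, "want": 0.0,     "to": .99, "race": 0.0},
    "NN":   {"I": 0.0, "want": .000054, "to": 0.0, "race": .00057},
    "PPSS": {"I": .37, "want": 0.0,     "to": 0.0, "race": 0.0},
}

def sequence_probability(words, tags, start="<s>"):
    """Product over i of P(w_i | t_i) * P(t_i | t_{i-1}) for one tag sequence."""
    prob = 1.0
    prev = start
    for word, tag in zip(words, tags):
        prob *= TRANS[prev][tag] * EMIT[tag][word]
        prev = tag
    return prob

words = "I want to race".split()
p_verb = sequence_probability(words, ["PPSS", "VB", "TO", "VB"])
p_noun = sequence_probability(words, ["PPSS", "VB", "TO", "NN"])
print(p_verb, p_noun)
```

With these values the reading of "race" as a verb scores a few hundred times higher than the reading as a noun.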


Clicker Question

Given n words and on average |T| tag choices per word, how many tag sequences do we have to evaluate?

1 |T| tag sequences

2 n tag sequences

3 |T| × n tag sequences

4 |T|^n tag sequences
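A quick way to check an answer is to enumerate the candidate tag sequences for a short sentence. The sketch below does this for the four tags and four words used later in the lecture; the 45-tag figure in the loop is taken from the Penn tag set size quoted earlier, and the sentence lengths are arbitrary choices for illustration.

```python
from itertools import product

tags = ["VB", "TO", "NN", "PPSS"]
words = "I want to race".split()

# Every way of assigning one tag to each word.
candidates = list(product(tags, repeat=len(words)))
print(len(candidates), len(tags) ** len(words))

# The same count for a 45-tag set and longer sentences.
for n in (5, 10, 20):
    print(n, 45 ** n)
```

The count grows exponentially with sentence length, which is why exhaustive scoring of every sequence is not practical for realistic sentences.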

The HMM Trellis

[Figure: trellis for the sentence "I want to race". A START node is followed by one column per word; each column contains one node for each of the tags NN, TO, VB and PPSS.]

The Viterbi Algorithm

q_end  end
q4     NN      0
q3     TO      0
q2     VB      0
q1     PPSS    0
q0     start   1.0
               <s>    I     want    to    race
               o0     o1    o2      o3    o4

1 Create a probability matrix, with one column for each observation (i.e., word) and one row for each state (i.e., tag).

2 Proceed by filling cells, column by column.


The Viterbi Algorithm

q_end  end
q4     NN      0      1.0 × .041 × 0
q3     TO      0      1.0 × .0043 × 0
q2     VB      0      1.0 × .019 × 0
q1     PPSS    0      1.0 × .067 × .37
q0     start   1.0
               <s>    I                    want    to    race
               o0     o1                   o2      o3    o4

For each state q_j at time t, compute

\[
v_t(j) = \max_{i=1}^{N} v_{t-1}(i)\, a_{ij}\, b_j(o_t)
\]

where v_{t-1}(i) is the previous Viterbi path probability, a_{ij} is the transition probability, and b_j(o_t) is the emission probability (the state observation likelihood).


The Viterbi Algorithm

q_end  end
q4     NN      0      0       .025 × .0012 × .000054
q3     TO      0      0       .025 × .00079 × 0
q2     VB      0      0       .025 × .23 × .0093
q1     PPSS    0      .025    .025 × .00014 × 0
q0     start   1.0
               <s>    I       want                      to    race
               o0     o1      o2                        o3    o4


The Viterbi Algorithm

q_end  end
q4     NN      0      0       .000000002    .000053 × .047 × 0
q3     TO      0      0       0             .000053 × .035 × .99
q2     VB      0      0       .000053       .000053 × .0038 × 0
q1     PPSS    0      .025    0             .000053 × .0070 × 0
q0     start   1.0
               <s>    I       want          to                      race
               o0     o1      o2            o3                      o4


The Viterbi Algorithm

q_end  end
q4     NN      0      0       .000000002    0           .0000018 × .00047 × .00057
q3     TO      0      0       0             .0000018    .0000018 × 0 × 0
q2     VB      0      0       .000053       0           .0000018 × .83 × .00012
q1     PPSS    0      .025    0             0           .0000018 × 0 × 0
q0     start   1.0
               <s>    I       want          to          race
               o0     o1      o2            o3          o4


The Viterbi Algorithm

q_end  end
q4     NN      0      0       .000000002    0           4.8222e-13
q3     TO      0      0       0             .0000018    0
q2     VB      0      0       .000053       0           1.7928e-10
q1     PPSS    0      .025    0             0           0
q0     start   1.0
               <s>    I       want          to          race
               o0     o1      o2            o3          o4

The highest value in the final column is 1.7928e-10 for VB, so the best tag sequence ends in VB; tracing back through the best predecessors gives PPSS VB TO VB.
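The column-by-column computation can be sketched as a short function. This is an illustrative sketch rather than the lecture's code: the names `viterbi`, `TRANS` and `EMIT` are assumptions, the transition into a final state is omitted because the tables do not give it, and the table entries `<s>`→PPSS = .067, TO→NN = .00047 and PPSS→NN = .0012 follow the values used in the slides' own calculations.

```python
TAGS = ["VB", "TO", "NN", "PPSS"]
# TRANS[prev][tag] = a_ij; EMIT[tag][word] = b_j(o_t)
TRANS = {
    "<s>":  {"VB": .019,  "TO": .0043,  "NN": .041,   "PPSS": .067},
    "VB":   {"VB": .0038, "TO": .035,   "NN": .047,   "PPSS": .0070},
    "TO":   {"VB": .83,   "TO": 0.0,    "NN": .00047, "PPSS": 0.0},
    "NN":   {"VB": .0040, "TO": .016,   "NN": .087,   "PPSS": .0045},
    "PPSS": {"VB": .23,   "TO": .00079, "NN": .0012,  "PPSS": .00014},
}
EMIT = {
    "VB":   {"want": .0093, "race": .00012},
    "TO":   {"to": .99},
    "NN":   {"want": .000054, "race": .00057},
    "PPSS": {"I": .37},
}

def viterbi(words, tags, trans, emit, start="<s>"):
    """Return the most probable tag sequence and its probability."""
    v = [{} for _ in words]      # v[t][j]: best path probability ending in tag j
    back = [{} for _ in words]   # back[t][j]: best predecessor of tag j at time t
    for j in tags:               # first column: transitions out of the start state
        v[0][j] = trans[start][j] * emit[j].get(words[0], 0.0)
        back[0][j] = None
    for t in range(1, len(words)):
        for j in tags:
            # max over predecessors i of v_{t-1}(i) * a_ij
            best_i = max(tags, key=lambda i: v[t - 1][i] * trans[i][j])
            v[t][j] = v[t - 1][best_i] * trans[best_i][j] * emit[j].get(words[t], 0.0)
            back[t][j] = best_i
    # Read off the best final tag, then follow the back-pointers.
    last = max(tags, key=lambda j: v[-1][j])
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, v[-1][last]

best_path, best_prob = viterbi("I want to race".split(), TAGS, TRANS, EMIT)
print(best_path, best_prob)
```

The sketch keeps full floating-point precision, so its final probability (about 1.83e-10) differs slightly from the slide's 1.7928e-10, which was computed from rounded intermediate values. Each column costs on the order of N² operations, so the whole sentence costs on the order of N²·T rather than N^T.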


Summary

A number of POS tag sets exist for English (e.g. Brown, CLAWS, Penn).

Automatic POS tagging makes errors because many high-frequency words are part-of-speech ambiguous.

POS tagging can be performed automatically using Hidden Markov Models.

Reading: J&M (2nd edition), Chapter 5
NLTK Book: Chapter 5, Categorizing and Tagging Words

Next lecture: Phrase structure and parsing as search
