Part-of-Speech Tagging - AU Portals · 2020. 12. 23. · Part-of-speech tagging (or just tagging for...



Post on 26-Feb-2021


Page 1

Part-of-Speech Tagging

Part-of-speech tagging

- Rule-based tagging

- Statistical model tagging

- Transformation-based tagging

(Mostly) English Word Classes

- Closed class, open class

- Nouns: proper and common nouns

- Verbs, adjectives, adverbs

Tagsets for English

- Penn Treebank part-of-speech tags

Part-of-Speech Tagging

- Tagging, ambiguity

- Brown corpus

Rule-based Part-of-Speech Tagging

- First stage: a dictionary

- Second stage: large lists of hand-written disambiguation rules

HMM Part-of-Speech Tagging

- Prior probability

- Likelihood of a tag sequence

- Computing the most likely tag sequence: an example

- Formalizing Hidden Markov Model taggers

Transformation-based Tagging

- Transformation-based learning

- How TBL rules are applied

@Copyrights: Natural Language Processing (NLP) Organized by Dr. Ahmad Jalal (http://portals.au.edu.pk/imc/)

Page 2

A description of the eight parts of speech:

- Noun: a word (other than a pronoun) used to identify any of a class of people, places, or things (common noun), e.g., goat, relationship, water.

- Verb: a word used to describe an action, state, or occurrence, and forming the main part of the predicate of a sentence, e.g., hear, become, happen.

- Pronoun: a word that can function as a noun phrase used by itself and that refers either to the participants in the discourse (e.g., I, you) or to someone or something mentioned elsewhere in the discourse (e.g., she, it, this).

- Preposition: Prepositions are usually used in front of nouns or pronouns and they show the relationship between the noun or pronoun and other words in a sentence. e.g., after, in, to, on, and with.

- Adverb: a word or phrase that modifies the meaning of an adjective, verb, or other adverb, expressing manner, place, time, or degree (e.g., gently, here, now, very).

1. Part-of-Speech Tagging (Some Concepts)

Page 3

A description of the eight parts of speech (continued):

- Conjunction: a word used to connect clauses or sentences or to coordinate words in the same clause (e.g., and, but, if).

- Participle: a word formed from a verb (e.g., going, gone, being, been) and used as an adjective (e.g., working woman, burnt toast) or a noun (e.g., good breeding).

In English, participles are also used to make compound verb forms (e.g., is going, has been).

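- Article: Articles are words that define a noun as specific or unspecific. Consider the following examples:

Example 1: After the long day, the cup of tea tasted particularly good.

Example 2: After a long day, a cup of tea tastes particularly good.

In Example 1, the definite article the refers to one specific day and cup; in Example 2, the indefinite article a refers to any long day and any cup.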

1. Part-of-Speech Tagging (Some Concepts) (Cont…)

Page 4

More recent lists of parts of speech (or tagsets) have many more word classes:

- 45 for the Penn Treebank (Marcus et al., 1993).

- 87 for the Brown corpus (Francis, 1979).

- 146 for the C7 tagset (Garside et al., 1997).

The significance of parts of speech (POS) and tagsets:

- They give a large amount of information about a word and its neighbors.

For example, tagsets distinguish between possessive pronouns (my, your, his, her, its) and personal pronouns (I, you, he, me).

- Knowing whether a word is a possessive pronoun or a personal pronoun can tell us what words are likely to occur in its vicinity: possessive pronouns are likely to be followed by a noun, personal pronouns by a verb.

1. Part-of-Speech Tagging (Some Concepts) (Cont…)

Page 5

Part-of-speech tagging (or just tagging for short) is the process of assigning a part of speech or other syntactic class marker to each word in a corpus.

Because tags are generally also applied to punctuation, tagging requires that punctuation marks (period, comma, etc.) be separated from the words.
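Here is a minimal Python sketch of this pre-tagging step. It assumes a simple regular-expression tokenizer; the pattern and the name `tokenize` are illustrative choices, not taken from the slides. Runs of letters and digits stay together, and an internal apostrophe keeps a contraction such as isn't whole. Every other non-space character becomes its own token, so it can receive its own tag, as in `./.`.

```python
import re

# Hypothetical tokenizer pattern: a token is either a run of word characters
# (optionally with one internal apostrophe, e.g. "isn't") or a single
# character that is neither a word character nor whitespace (e.g. "." or ",").
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")

def tokenize(sentence: str) -> list[str]:
    """Split a sentence into word tokens and separate punctuation tokens."""
    return TOKEN_RE.findall(sentence)

print(tokenize("Book that flight."))               # ['Book', 'that', 'flight', '.']
print(tokenize("Does that flight serve dinner?"))
```

Real tokenizers are more careful than this. For instance, the Penn Treebank conventions split isn't into is and n't, which this sketch does not attempt.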

Computational algorithms for assigning parts of speech to words (part-of-speech tagging) fall into three groups:

1. Hand-written rules (rule-based tagging),

2. Statistical methods (HMM tagging and maximum entropy tagging),

3. Transformation-based tagging and memory-based tagging.

Rule-based taggers generally involve a large database of hand-written disambiguation rules, which specify, for example, that an ambiguous word is a noun rather than a verb if it follows a determiner.

1. Part-of-Speech Tagging (Some Concepts) (Cont…)

Page 6

Parts of speech can be divided into two broad supercategories:

(1) Closed class type:

- Closed classes have relatively fixed membership (new prepositions, for example, are rarely coined). Closed class words are also generally function words,

- which tend to be very short, occur frequently, and often have structuring uses in grammar.

e.g., of, it, and, or, you.

(2) Open class type:

- Four major open classes occur in the languages of the world: nouns, verbs, adjectives, and adverbs.

(a) Noun: the name given to the syntactic class in which the words for most people, places, or things occur.

- Grouped into (i) proper nouns and (ii) common nouns.

(i) Proper nouns like Regina, Colorado, and IBM.

2. English Word Classes

Page 7

(ii) Common nouns are divided into:

1) Count nouns are those that allow grammatical enumeration; that is,

- they can occur in both the singular and plural (goat/goats, relationship/relationships), and they can be counted (one goat, two goats).

2) Mass nouns are used when something is conceptualized as a homogeneous group.

- e.g., words like snow, salt, and water are not counted (i.e., *two snows or *two waters, where * marks an ungrammatical phrase).

2. English Word Classes (Cont…)

Page 8

(b) Verb: includes most of the words referring to actions and processes.

- English verbs have a number of morphological forms:

(i) Non-third-person-singular (eat)

(ii) Third-person-singular (eats)

(iii) Progressive (eating)

(iv) Past participle (eaten)

(c) Adjectives: include many terms that describe properties or qualities.

- e.g., color (white, black), age (old, young), and value (good, bad).

2. English Word Classes (Cont…)

Page 9

(d) Adverbs: the final open class form, adverbs, is rather a hodge-podge, both semantically and formally.

For example, in the following sentence the words Unfortunately, home, extremely, slowly, and yesterday are all adverbs:

Unfortunately, John walked home extremely slowly yesterday.

(i) Directional or locative adverbs:

- Specify the direction or location of some action, e.g., home, downhill.

(ii) Degree adverbs:

- Specify the extent of some action, process, or property, e.g., extremely, very, somewhat.

(iii) Manner adverbs:

- Describe the manner of some action or process, e.g., slowly, delicately.

(iv) Temporal adverbs:

- Describe the time that some action or event took place, e.g., yesterday, Monday.

2. English Word Classes (Cont…)

Page 10

There are a small number of popular tagsets for English, many of which evolved from the 87-tag tagset used for the Brown corpus.

Two of the most commonly used tagsets are:

- the small 45-tag Penn Treebank tagset.

- the medium-sized 61-tag C5 tagset.

Example: some tagged sentences from the Penn Treebank version of the Brown corpus:

a) The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.

b) There/EX are/VBP 70/CD children/NNS there/RB
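The word/TAG notation above is easy to read back into (word, tag) pairs. The Python sketch below is minimal and assumes whitespace-separated tokens; the function name `parse_tagged` is hypothetical. It splits each token on its last slash, which keeps tokens such as `./.` intact.

```python
def parse_tagged(sentence: str) -> list[tuple[str, str]]:
    """Turn 'word/TAG word/TAG ...' into [(word, TAG), ...].

    Each token is split on its *last* slash, so './.' gives ('.', '.')
    and a word that itself contains a slash keeps it.
    """
    pairs = []
    for token in sentence.split():
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs

for word, tag in parse_tagged("There/EX are/VBP 70/CD children/NNS there/RB"):
    print(f"{word:10} {tag}")
```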

3. Tagsets for English

Page 11

a) Although preliminary findings were reported more than a year ago, the latest results appear in today’s New England Journal of Medicine.

b) Mrs. Shaefer never got around to joining.

c) All we gotta do is go around the corner.

d) She told off her friends.

e) She stepped off the train.

f) They were married by the Justice of the Peace yesterday at 5:00.

3. Tagsets for English (Class Participation)

Page 12

Part-of-speech tagging is the process of assigning a part of speech or other syntactic class marker to each word in a corpus.

Problem:

Book/VB that/DT flight/NN ./.

Does/VBZ that/DT flight/NN serve/VB dinner/NN ?/.

Book is ambiguous.

- That is, it has more than one possible usage and part-of-speech.

(i) It can be a verb ( as in book that flight or to book the suspect).

(ii) It can also be a noun (as in hand me that book or a book of matches).

Solution:

The problem of POS-tagging is to resolve these ambiguities, choosing the proper tag for the context.

A larger, finer-grained tagset used for POS tagging is the 87-tag Brown corpus tagset.

4. Part-of-Speech Tagging

Page 13

Example:

Go away!

He sometimes goes to the cafe.

All the cakes have gone.

We went on the excursion.

4. Part-of-Speech Tagging (Brown Corpus Tags) (Cont…)

Figure: 87-tag Brown corpus tagset.

Page 14

My aunt’s can opener can open a drum should look like this:

The old car broke down in the car park

At least two men broke in and stole my TV

The horses were broken in and ridden in two weeks

Kim and Sandy both broke up with their partners

The horse which Kim sometimes rides is more bad tempered than mine

The horse as well as the rabbits which we wanted to eat has escaped

It was my aunt’s car which we sold at auction last year in February

The only rabbit that I ever liked was eaten by my parents one summer

The veterans who I thought that we would meet at the reunion were dead

4. Part-of-Speech Tagging (Penn Treebank tagset vs Brown Corpus Tags) (Class Participation)

Natural disasters – storms, flooding, hurricanes – occur infrequently but cause devastation that strains resources to breaking point

Letters delivered on time by old-fashioned means are increasingly rare, so it is as well that that is not the only option available

It won’t rain but there might be snow on high ground if the temperature stays about the same over the next 24 hours

The long and lonely road to redemption begins with self-reflection: the need to delve inwards to deconstruct layers of psychological obfuscation

My wildest dream is to build a POS tagger which processes 10K words per second and uses only 1MB of RAM, but it may prove too hard

Page 15

• The earliest algorithms for automatically assigning parts of speech were based on a two-stage architecture (Harris, 1962; Klein and Simmons, 1963).

(1) The first stage used a dictionary to assign each word a list of potential parts of speech.

(2) The second stage used large lists of hand-written disambiguation rules to narrow down this list to a single part of speech for each word.
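Here is a toy Python sketch of this two-stage architecture. It assumes a hypothetical four-word lexicon for stage 1 and two illustrative elimination rules for stage 2:

- a word right after an unambiguous determiner is not a verb;
- a word that can be a verb and precedes a determiner is a verb.

Real rule-based taggers use far larger lexicons and thousands of constraints. To keep the example small, "that" is treated here as a determiner only.

```python
# Stage 1 data: a hypothetical toy lexicon listing every possible tag per word.
LEXICON = {
    "the": {"DT"},
    "that": {"DT"},          # simplification: really DT/WDT/IN/RB
    "book": {"NN", "VB"},
    "flight": {"NN"},
}

def noun_after_determiner(i, cands):
    """Hypothetical rule: a word right after an unambiguous DT is not a verb."""
    if i > 0 and cands[i - 1] == {"DT"} and cands[i] - {"VB"}:
        cands[i] = cands[i] - {"VB"}

def verb_before_determiner(i, cands):
    """Hypothetical rule: a possible verb that precedes a DT is a verb."""
    if i + 1 < len(cands) and cands[i + 1] == {"DT"} and "VB" in cands[i]:
        cands[i] = {"VB"}

RULES = [noun_after_determiner, verb_before_determiner]

def rule_based_tag(words):
    """Two-stage tagging: dictionary lookup, then rule-based elimination."""
    # Stage 1: copy all candidate tags (unknown words are assumed to be NN).
    cands = [set(LEXICON.get(w.lower(), {"NN"})) for w in words]
    # Stage 2: apply every rule at every position, left to right.
    for i in range(len(words)):
        for rule in RULES:
            rule(i, cands)
    return cands

print(rule_based_tag(["Book", "that", "flight"]))  # [{'VB'}, {'DT'}, {'NN'}]
print(rule_based_tag(["the", "book"]))             # [{'DT'}, {'NN'}]
```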

5. Rule-Based Part-Of-Speech Tagging (Algorithm) (Cont…)

Page 16

(1) First stage: a dictionary to assign each word a tag

Choose the most likely tag for each ambiguous word, independent of previous words.

- i.e., assign each token the POS category it occurred as most often in the training set.

- e.g., race: which POS is more likely in a corpus?

This strategy gives 90% accuracy in controlled tests.

- So this “unigram baseline” is what any tagger must always be compared against.
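Here is a minimal Python sketch of this most-frequent-tag ("unigram") baseline. It assumes a tiny invented training set and an arbitrary fallback tag NN for unseen words; neither comes from the slides. Because each word is tagged in isolation, the baseline tags race in "the race" as VB simply because VB is more frequent in training. That is exactly the kind of error context-sensitive taggers are meant to fix.

```python
from collections import Counter, defaultdict

def train_most_frequent_tag(tagged_sentences):
    """Map each word to the tag it occurred with most often in training."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def baseline_tag(words, most_frequent, unknown_tag="NN"):
    """Tag each word independently of its neighbours (the unigram baseline)."""
    return [most_frequent.get(w.lower(), unknown_tag) for w in words]

# Hypothetical miniature training set: "race" is VB twice and NN once.
train = [
    [("I", "PRP"), ("want", "VBP"), ("to", "TO"), ("race", "VB")],
    [("the", "DT"), ("race", "NN"), ("is", "VBZ"), ("over", "RB")],
    [("we", "PRP"), ("will", "MD"), ("race", "VB"), ("tomorrow", "NN")],
]
model = train_most_frequent_tag(train)
print(baseline_tag(["the", "race"], model))  # ['DT', 'VB'] (context ignored)
```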

5. Rule-Based Part-Of-Speech Tagging (Algorithm) (Cont…)

Page 17

(1) First stage: dictionary to assign each word ( Example)

Which POS is more likely in a corpus (1,273,000 tokens)?

Tag counts for race: NN = 400, VB = 600, Total = 1000

P(NN|race) = P(race&NN) / P(race) by the definition of conditional probability

- P(race) ≅ 1000/1,273,000 = .0008

- P(race&NN) ≅ 400/1,273,000 =.0003

- P(race&VB) ≅ 600/1,273,000 = .0005

And so we obtain:

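- P(NN|race) = P(race&NN)/P(race) = .0003/.0008 = .375

- P(VB|race) = P(race&VB)/P(race) = .0005/.0008 = .625

(These values use the rounded probabilities above. Because the corpus size cancels, the exact values are P(NN|race) = 400/1000 = .4 and P(VB|race) = 600/1000 = .6.)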
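The same arithmetic can be run as a short Python check, using only the counts from the slide. Printing both the rounded intermediate values and the final ratios shows how much the rounding of the intermediates shifts the result.

```python
CORPUS_TOKENS = 1_273_000
race_counts = {"NN": 400, "VB": 600}  # counts of "race" by tag, from the slide

p_race = sum(race_counts.values()) / CORPUS_TOKENS                   # P(race)
p_joint = {tag: n / CORPUS_TOKENS for tag, n in race_counts.items()}  # P(race & tag)
p_tag_given_race = {tag: p / p_race for tag, p in p_joint.items()}    # P(tag | race)

print(round(p_race, 4), {tag: round(p, 4) for tag, p in p_joint.items()})
# 0.0008 {'NN': 0.0003, 'VB': 0.0005}
print({tag: round(p, 3) for tag, p in p_tag_given_race.items()})
# {'NN': 0.4, 'VB': 0.6}
```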

5. Rule-Based Part-Of-Speech Tagging (Algorithm) (Cont…)

Page 18

(2) Second stage: hand-written disambiguation rules

Uses a 56,000-word lexicon that lists the possible parts of speech for each word (using two-level morphology).

Uses up to 3,744 rules, or constraints, for POS disambiguation.

ADV-that rule [sentence: it isn’t that old]

Given input “that” (ADV/PRON/DET/COMP)

If (+1 A/ADV/QUANT) #next word is an adjective, adverb, or quantifier

(+2 SENT_LIM) #and the following word is a sentence boundary

(NOT -1 SVOC/A) #and the previous word is not a verb like consider, which allows adjectives as object complements

Then eliminate non-ADV tags

Else eliminate ADV tag
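The minimal Python sketch below shows how this single constraint could be applied. It assumes each token carries a set of candidate tags and that the sentence end is an explicit token tagged SENT_LIM. The tag names follow the rule above; the tag V for isn't and the function name `adv_that_rule` are hypothetical. The second sentence, I consider that odd, shows the SVOC/A clause blocking the adverb reading.

```python
# Tag labels from the rule; their set-based representation is an assumption.
ADJ_ADV_QUANT = {"A", "ADV", "QUANT"}
THAT_TAGS = {"ADV", "PRON", "DET", "COMP"}

def adv_that_rule(tokens):
    """Apply the ADV-that constraint to each 'that' in a sentence.

    tokens: list of (word, set of candidate tags); the sentence boundary is
    an explicit token whose tag set contains "SENT_LIM".
    Returns a new list; the input sets are not modified.
    """
    out = [(word, set(tags)) for word, tags in tokens]
    for i, (word, tags) in enumerate(out):
        if word.lower() != "that":
            continue
        next_is_a_adv_quant = i + 1 < len(out) and bool(out[i + 1][1] & ADJ_ADV_QUANT)
        then_boundary = i + 2 < len(out) and "SENT_LIM" in out[i + 2][1]
        prev_is_svoc_a = i > 0 and "SVOC/A" in out[i - 1][1]
        if next_is_a_adv_quant and then_boundary and not prev_is_svoc_a:
            out[i] = (word, {"ADV"})          # eliminate non-ADV tags
        else:
            out[i] = (word, tags - {"ADV"})   # eliminate the ADV tag
    return out

s1 = [("it", {"PRON"}), ("isn't", {"V"}), ("that", THAT_TAGS),
      ("old", {"A"}), ("<SENT>", {"SENT_LIM"})]
s2 = [("I", {"PRON"}), ("consider", {"V", "SVOC/A"}), ("that", THAT_TAGS),
      ("odd", {"A"}), ("<SENT>", {"SENT_LIM"})]
print(dict(adv_that_rule(s1))["that"])          # {'ADV'}
print(sorted(dict(adv_that_rule(s2))["that"]))  # ['COMP', 'DET', 'PRON']
```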

5. Rule-Based Part-Of-Speech Tagging (Algorithm) (Cont…)

Adj = attribute of a noun, e.g., sweet color, red car, sixteen candles.

ADV = modifies a verb, an adjective, or another adverb, e.g., very tall, too quickly.

QUANT = a determiner or pronoun indicative of quantity, e.g., all people, both parties.

PRON = a word that can function as a noun phrase (e.g., I, you).

DET = acting as determiner: a modifying word that determines the kind of reference a noun or noun group has (e.g., a person, the game, every moment).

COMP = acting as complementizer: a word (such as “that”) that introduces a clause completing the meaning of an expression (e.g., I think that he is old).

Page 19

Algorithm description:

The first two clauses of this rule check that the “that” directly precedes a sentence-final adjective, adverb, or quantifier.

In all other cases the adverb reading is eliminated.

The last clause eliminates cases preceded by verbs like consider or believe, which can take a noun and an adjective; this is to avoid tagging the following instance of “that” as an adverb: I consider that odd.

5. Rule-Based Part-Of-Speech Tagging (Algorithm) (Cont…)

Page 20

In part-of-speech tagging, probability-based (statistical) tagging plays a major role in place of rule-based (hand-written rule) tagging.

Machines can learn from examples

- Learning can be supervised or unsupervised.

Given training data, machines analyze the data and learn rules that generalize to new examples.

- Learning can be sub-symbolic (a rule may be a mathematical function), e.g., neural nets.

- Or it can be symbolic (rules are in a representation similar to the representation used for hand-coded rules).

In general, machine learning approaches allow for more tuning to the needs of a corpus, and can be reused across corpora.

6. Statistical Tagging (HMM Part-of-speech tagging)

Page 21

In a classification task, we are given some observation(s) and our job is to determine which of a set of classes it belongs to.

Part-of-speech tagging is generally treated as a sequence classification task.

- the observation is a sequence of words (let’s say a sentence), and it is our job to assign them a sequence of part-of-speech tags.

For example, say we are given a sentence like

- “He will race”.

• What is the best sequence of tags which corresponds to this sequence of words?

- The Bayesian interpretation of this task starts by considering all possible sequences of classes (in this case, all possible sequences of tags).

- Out of this universe of tag sequences, we want to choose the tag sequence that is most probable given the observation sequence of n words $w_1^n$ (i.e., $w_1 \ldots w_n$).

6. Statistical Tagging (HMM Part-of-speech tagging) (Cont…)

Page 22

What you want to do is find the “best sequence” of POS tags T=T1..Tn for a sentence W=W1..Wn.

- (Here T1 is pos_tag(W1)).

Find a sequence of POS tags T that maximizes P(T|W).

Using Bayes’ rule, we can say

P(T|W) = P(W|T) * P(T) / P(W)

We want to find the value of T that maximizes the right-hand side.

=> The denominator P(W) is the same for every T, so it can be discarded.

=> Find the T that maximizes P(W|T) * P(T).

6. Statistical Tagging (HMM Part-of-speech tagging) [Example]

Example: He will race

Possible sequences:

- He/PRP will/MD race/NN

- He/PRP will/NN race/NN

- He/PRP will/MD race/VB

- He/PRP will/NN race/VB

W = W1 W2 W3

= He will race

T = T1 T2 T3

- Choices:

• T= PRP MD NN

• T= PRP NN NN

• T = PRP MD VB

• T = PRP NN VB

This gives 4 candidate tag sequences, each with its own probability, for “He will race”.
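The four candidates are every combination of one possible tag per word. Their number is therefore the product of the per-word ambiguity (1 × 2 × 2 = 4 here), and it grows exponentially with sentence length. Here is a short Python sketch of the enumeration, using only the candidate tags listed above.

```python
from itertools import product

# Candidate tags per word, as listed on the slide.
candidates = {"He": ["PRP"], "will": ["MD", "NN"], "race": ["NN", "VB"]}
words = ["He", "will", "race"]

# Every combination of one candidate tag per word: 1 * 2 * 2 = 4 sequences.
sequences = list(product(*(candidates[w] for w in words)))
for seq in sequences:
    print(" ".join(f"{w}/{t}" for w, t in zip(words, seq)))
```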

Page 23

Assumption 1:

Assume that the current event depends only on the previous n-1 events (for a bigram model, only on the previous event):

$$P(T_1 \ldots T_n) \approx \prod_{i=1}^{n} P(T_i \mid T_{i-1})$$

- This assumes that the event of a POS tag occurring is independent of the event of any other POS tag occurring, except for the immediately previous POS tag.

=> From a linguistic standpoint, this seems an unreasonable assumption, because of long-distance dependencies (e.g., Ali and his friends go or goes? The choice depends on the whole subject, not only on the tag of the previous word).

Assumption 2:

$$P(W_1 \ldots W_n \mid T_1 \ldots T_n) \approx \prod_{i=1}^{n} P(W_i \mid T_i)$$

- This assumes that the event of a word appearing in a category is independent of the event of any surrounding word or tag, except for the tag at this position.

6. Statistical Tagging (HMM Part-of-speech tagging) [Independence Assumptions]

Page 24

Linguists know both these assumptions are incorrect!

- Nevertheless, statistical approaches based on these assumptions work quite well for part-of-speech tagging.

In particular, with Hidden Markov Models (HMMs):

- Very widely used in both POS tagging and speech recognition, among other problems.

- A Markov model, or Markov chain, is just a weighted finite-state automaton.

6. Statistical Tagging (HMM Part-of-speech tagging) [Independence Assumptions](Cont…)

Page 25

POS Tagging Based on Bigrams

Problem: Find T which maximizes P(W | T) * P(T)

- Here W=W1..Wn and T=T1..Tn

Using the bigram model, we get:

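(a) Transition probabilities (probability of transitioning from one state/tag to another):

• $P(T_1 \ldots T_n) \approx \prod_{i=1}^{n} P(T_i \mid T_{i-1})$

(b) Emission probabilities (probability of emitting a word at a given state):

• $P(W_1 \ldots W_n \mid T_1 \ldots T_n) \approx \prod_{i=1}^{n} P(W_i \mid T_i)$

So, we want to find the value of $T_1 \ldots T_n$ that maximizes (with $T_0$ a special start-of-sentence tag):

$$\prod_{i=1}^{n} P(W_i \mid T_i)\, P(T_i \mid T_{i-1})$$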
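The minimal Python sketch below maximizes this product by brute force. It assumes small hypothetical transition and emission tables. The transition values are chosen so that PRP MD NN gets 1 × .8 × .4, mirroring the bigram example that follows. All other numbers are invented for illustration, and only the candidate tags come from the example. The sketch scores each candidate sequence with the product above and keeps the best one. Enumeration is exponential in sentence length, which is why practical decoders use dynamic programming (the Viterbi algorithm) instead.

```python
from itertools import product

# Hypothetical bigram transition probabilities P(T_i | T_{i-1}); "<s>" = start.
TRANS = {("<s>", "PRP"): 1.0, ("PRP", "MD"): 0.8, ("PRP", "NN"): 0.2,
         ("MD", "NN"): 0.4, ("MD", "VB"): 0.6, ("NN", "NN"): 0.5, ("NN", "VB"): 0.5}
# Hypothetical emission probabilities P(W_i | T_i).
EMIT = {("He", "PRP"): 0.3, ("will", "MD"): 0.5, ("will", "NN"): 0.01,
        ("race", "NN"): 0.001, ("race", "VB"): 0.003}

def score(words, tags):
    """Product over i of P(W_i | T_i) * P(T_i | T_{i-1}), with T_0 = "<s>"."""
    p, prev = 1.0, "<s>"
    for w, t in zip(words, tags):
        p *= EMIT.get((w, t), 0.0) * TRANS.get((prev, t), 0.0)
        prev = t
    return p

words = ["He", "will", "race"]
options = {"He": ["PRP"], "will": ["MD", "NN"], "race": ["NN", "VB"]}
candidates = list(product(*(options[w] for w in words)))
best = max(candidates, key=lambda tags: score(words, tags))
for tags in candidates:
    print(" ".join(tags), score(words, tags))
print("best:", " ".join(best))  # best: PRP MD VB
```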

6. Statistical Tagging (HMM Part-of-speech tagging)

Page 26

POS Tagging Based on Bigrams

(a) Transition probabilities: $P(T_1 \ldots T_n) \approx \prod_{i=1}^{n} P(T_i \mid T_{i-1})$

Example: He will race

Choices for T=T1..T3

- T= PRP MD NN

- T= PRP NN NN

- T = PRP MD VB

- T = PRP NN VB

POS bigram probabilities from the training corpus can be used for P(T):

P(PRP MD NN) = P(PRP | start) * P(MD | PRP) * P(NN | MD) = 1 * .8 * .4 = .32

6. Statistical Tagging (HMM Part-of-speech tagging)

Page 27

(a) Transition probabilities

From the training corpus, we need to find the tags $T_1 \ldots T_n$ that maximize

$$\prod_{i=1}^{n} P(W_i \mid T_i)\, P(T_i \mid T_{i-1})$$

So we also need to factor in the lexical generation (emission) probabilities:

6. Statistical Tagging (HMM Part-of-speech tagging) (Cont…)

Choices for T=T1..T3

- T= PRP MD NN

- T= PRP NN NN

- T = PRP MD VB

- T = PRP NN VB

Page 28

(b) Adding Emission probabilities

6. Statistical Tagging (HMM Part-of-speech tagging) (Cont…)

Page 29

HMM part-of-speech tagging uses two kinds of probabilities:

(a) Tag transition probabilities (b) Word likelihood probabilities

(a) Tag transition probabilities

Tag-to-tag combination.

(The tag transition probabilities, P(ti|ti−1), represent the probability of a tag given the previous tag.)

For example (This/DT book/NN is interesting): in the 45-tag Treebank Brown corpus, the tag DT occurs 116,454 times. Of these, DT is followed by NN 56,509 times. Thus the MLE estimate of the transition probability is calculated as follows:

$$P(NN \mid DT) = \frac{C(DT, NN)}{C(DT)} = \frac{56509}{116454} \approx .49$$
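Here is a minimal Python sketch of this maximum-likelihood estimate, computed from a tagged corpus rather than from precomputed counts. The toy corpus and the start/end markers `<s>` and `</s>` are assumptions of the sketch. The last line just reproduces the slide's division.

```python
from collections import Counter

def transition_mle(tagged_sentences):
    """MLE transition probabilities P(cur | prev) = C(prev, cur) / C(prev).

    "<s>" and "</s>" mark sentence start and end, so every real tag has a
    successor and C(prev) counts all occurrences of prev.
    """
    bigrams, prev_counts = Counter(), Counter()
    for sentence in tagged_sentences:
        tags = ["<s>"] + [tag for _, tag in sentence] + ["</s>"]
        prev_counts.update(tags[:-1])
        bigrams.update(zip(tags, tags[1:]))
    return {(prev, cur): n / prev_counts[prev] for (prev, cur), n in bigrams.items()}

# Hypothetical toy corpus: DT occurs 3 times and is followed by NN twice.
corpus = [
    [("This", "DT"), ("book", "NN"), ("is", "VBZ"), ("interesting", "JJ")],
    [("That", "DT"), ("flight", "NN"), ("is", "VBZ"), ("late", "JJ")],
    [("the", "DT"), ("old", "JJ"), ("car", "NN")],
]
probs = transition_mle(corpus)
print(round(probs[("DT", "NN")], 3))  # 0.667
print(round(56509 / 116454, 2))       # 0.49, the slide's P(NN|DT)
```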

7. HMM Part-of-speech tagging (Tag Transition Probability)

Page 30

(b) Word likelihood probabilities

Word-to-tag combination.

(The word likelihood probabilities, P(wi|ti), represent the probability, given that we see a given tag, that it will be associated with a given word.)

For example (This book is/VBZ interesting): in the Treebank Brown corpus, the tag VBZ occurs 21,627 times, and VBZ is the tag for “is” 10,073 times. Thus

$$P(is \mid VBZ) = \frac{C(VBZ, is)}{C(VBZ)} = \frac{10073}{21627} \approx .47$$
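The word likelihoods can be estimated the same way. Here is a minimal Python sketch, assuming an invented toy corpus and lower-cased words. The last line reproduces the slide's division.

```python
from collections import Counter

def emission_mle(tagged_sentences):
    """MLE word likelihoods P(word | tag) = C(tag, word) / C(tag).

    Words are lower-cased, a simplifying assumption of this sketch.
    """
    pair_counts, tag_counts = Counter(), Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            pair_counts[(tag, word.lower())] += 1
            tag_counts[tag] += 1
    return {(tag, word): n / tag_counts[tag] for (tag, word), n in pair_counts.items()}

# Hypothetical toy corpus: VBZ occurs 4 times, 2 of them as "is".
corpus = [
    [("This", "DT"), ("book", "NN"), ("is", "VBZ"), ("interesting", "JJ")],
    [("Ali", "NNP"), ("is", "VBZ"), ("intelligent", "JJ")],
    [("He", "PRP"), ("reads", "VBZ"), ("books", "NNS")],
    [("She", "PRP"), ("has", "VBZ"), ("gone", "VBN")],
]
emission = emission_mle(corpus)
print(emission[("VBZ", "is")])  # 0.5
print(round(10073 / 21627, 2))  # 0.47, the slide's P(is|VBZ)
```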

7. HMM Part-of-speech tagging (Word likelihood Probability)

Page 31

Example of tagging the word “race” (as in “to race”) both as VB and as NN.

[Figure: tag-to-tag combination and word-to-tag combination probabilities for the two readings]

7. HMM Part-of-speech tagging (Example)

Page 32

The HMM is an extension of the finite automaton. A finite automaton is defined by a set of states and a set of transitions between states that are taken based on the input observations.

A weighted finite-state automaton is a simple augmentation of the finite automaton in which each arc is associated with a probability, indicating how likely that path is to be taken. The probability on all the arcs leaving a node must sum to 1.

A Markov chain is a special case of a weighted automaton in which the input sequence uniquely determines which states the automaton will go through. Because it can’t represent inherently ambiguous problems, a Markov chain is only useful for assigning probabilities to unambiguous sequences. While the Markov chain is appropriate for situations where we can see the actual conditioning events, it is not appropriate for part-of-speech tagging, because while we observe the words in the input, we do not observe the part-of-speech tags.

Thus we can’t condition any probabilities on, say, a previous part-of-speech tag, because we cannot be completely certain exactly which tag applied to the previous word.

A Hidden Markov Model (HMM) allows us to talk about both observed events (like words that we see in the input) and hidden events (like part-of-speech tags) that we think of as causal factors in our probabilistic model. An HMM is specified by the following components:
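The component list itself is not reproduced in this transcript. As a hedged illustration only, the Python sketch below bundles the parameters discussed so far into one hypothetical `HMM` class. It holds three parts:

- a start distribution, standing in for a start state;
- tag-to-tag transition probabilities;
- word-given-tag emission probabilities.

`validate` checks the constraint stated above that the probabilities leaving each state sum to 1, and applies the same check to each emission distribution. `joint` computes P(W, T) with the bigram product used earlier. The toy numbers are invented.

```python
from dataclasses import dataclass

@dataclass
class HMM:
    """Hypothetical container for HMM tagger parameters."""
    start: dict[str, float]             # P(t_1): leaving the start state
    trans: dict[str, dict[str, float]]  # trans[prev][cur] = P(cur | prev)
    emit: dict[str, dict[str, float]]   # emit[tag][word] = P(word | tag)

    def validate(self, tol: float = 1e-9) -> None:
        """Raise ValueError unless every probability distribution sums to 1."""
        dists = {"start": self.start}
        dists.update({f"trans[{t}]": row for t, row in self.trans.items()})
        dists.update({f"emit[{t}]": row for t, row in self.emit.items()})
        for name, dist in dists.items():
            total = sum(dist.values())
            if abs(total - 1.0) > tol:
                raise ValueError(f"{name} sums to {total}")

    def joint(self, words: list[str], tags: list[str]) -> float:
        """P(words, tags) for a non-empty sentence under the bigram HMM."""
        p = self.start.get(tags[0], 0.0)
        for i, (w, t) in enumerate(zip(words, tags)):
            if i > 0:
                p *= self.trans.get(tags[i - 1], {}).get(t, 0.0)
            p *= self.emit.get(t, {}).get(w, 0.0)
        return p

toy = HMM(
    start={"NNP": 0.8, "VBZ": 0.1, "JJ": 0.1},
    trans={"NNP": {"VBZ": 0.9, "NNP": 0.1},
           "VBZ": {"JJ": 0.7, "NNP": 0.3},
           "JJ": {"NNP": 0.5, "JJ": 0.5}},
    emit={"NNP": {"Ali": 1.0}, "VBZ": {"is": 1.0}, "JJ": {"intelligent": 1.0}},
)
toy.validate()
print(toy.joint(["Ali", "is", "intelligent"], ["NNP", "VBZ", "JJ"]))
```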

8. Formalizing Hidden Markov Model Taggers

Page 33

8. Formalizing Hidden Markov Model Taggers (Variable definitions)

Page 34

8. Formalizing Hidden Markov Model Taggers (Apply Chain rule)

Page 35

• Example: “Ali is intelligent”.

[Figure: HMM state diagram with states Start, NNP, VB, TO, and End; transition arcs labeled a01, a02, a03, a11, a12, a13, a14, a31, a32, a33, a34; observed word: Ali]

8. Formalizing Hidden Markov Model Taggers (Example)

Page 36

Apply the chain rule of HMM taggers to the following sentences:

• Secretariat is expected to race tomorrow.

• is Secretariat expected to race tomorrow.

• expected Secretariat is to race tomorrow.

• to Secretariat is expected race tomorrow.

• race Secretariat is expected to tomorrow.

• tomorrow Secretariat is expected to race .

8. Formalizing Hidden Markov Model Taggers (Class Participation)

Page 37

For any model, such as an HMM, that contains hidden variables,

- the task of determining which sequence of variables is the underlying source of some sequence of observations is called the decoding task.

The Viterbi algorithm is perhaps the most common decoding algorithm used for HMMs, whether for part-of-speech tagging or for speech recognition.

- It looks a lot like the minimum edit distance algorithm: both are dynamic programming algorithms that fill in a table of partial results.

The slightly simplified version of the Viterbi algorithm presented here:

- takes as input a single HMM and a sequence of observed words $O = (o_1 o_2 o_3 \ldots o_T)$, and

- returns the most probable state/tag sequence $Q = (q_1 q_2 q_3 \ldots q_T)$, together with its probability.

Let the HMM be defined by the two tables on the next slide. The first expresses the $a_{ij}$ probabilities,

- the transition probabilities between hidden states (i.e., part-of-speech tags).
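Here is a compact Python sketch of this simplified Viterbi decoder, under stated assumptions. The HMM is given as plain dictionaries for the start, transition ($a_{ij}$), and emission ($b_i(o_t)$) probabilities, and there is no explicit end state. The toy parameters for "He will race" are invented rather than read from the tables that follow. For each word and tag, the decoder keeps only the best path into that tag plus a back-pointer. This makes the search linear in sentence length instead of exponential.

```python
def viterbi(words, tags, start, trans, emit):
    """Most probable tag sequence for `words` under a bigram HMM.

    start[t] = P(t | start), trans[p][t] = P(t | p), emit[t][w] = P(w | t);
    missing entries count as probability 0. Returns (path, probability).
    """
    def a(p, t):
        return trans.get(p, {}).get(t, 0.0)

    def b(t, w):
        return emit.get(t, {}).get(w, 0.0)

    # v[i][t]: probability of the best tag path for words[:i+1] ending in t.
    v = [{t: start.get(t, 0.0) * b(t, words[0]) for t in tags}]
    back = [{}]  # back[i][t]: best previous tag on that path
    for i in range(1, len(words)):
        v.append({})
        back.append({})
        for t in tags:
            best_prev = max(tags, key=lambda p: v[i - 1][p] * a(p, t))
            v[i][t] = v[i - 1][best_prev] * a(best_prev, t) * b(t, words[i])
            back[i][t] = best_prev
    # Choose the best final tag, then follow back-pointers to the start.
    last = max(tags, key=lambda t: v[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1], v[-1][last]

# Hypothetical toy parameters for "He will race" (not taken from the slides).
TAGS = ["PRP", "MD", "NN", "VB"]
START = {"PRP": 1.0}
TRANS = {"PRP": {"MD": 0.8, "NN": 0.2},
         "MD": {"NN": 0.4, "VB": 0.6},
         "NN": {"NN": 0.5, "VB": 0.5}}
EMIT = {"PRP": {"He": 0.3},
        "MD": {"will": 0.5},
        "NN": {"will": 0.01, "race": 0.001},
        "VB": {"race": 0.003}}

print(viterbi(["He", "will", "race"], TAGS, START, TRANS, EMIT))
```

With these numbers the decoder returns PRP MD VB with probability of about .000216. Scoring all four candidate sequences by hand gives the same winner.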

9. The Viterbi Algorithm for HMM Tagging

Page 38

Tag-to-tag combination Matrix

Word-to-tag combination Matrix

9. The Viterbi Algorithm for HMM Tagging (Cont…)

Page 39

9. The Viterbi Algorithm for HMM Tagging (Cont…)

Figure expresses the $b_i(o_t)$ probabilities, the observation likelihoods of words given tags.
