Part of Speech Tagging

Importance

Resolving ambiguities by assigning lower probabilities to words that don’t fit

Applying grammatical rules to language to parse the meanings of sentences and phrases

Part of Speech Tagging

Determine a word’s lexical class based on context

Approaches to POS Tagging

• Initialize and maintain tagging criteria
– Supervised: uses pre-tagged corpora
– Unsupervised: automatically induces classes using probability and learning algorithms
– Partially supervised: combines the above approaches

• Algorithms
– Rule based: use pre-defined grammatical rules
– Stochastic: use HMMs and other probabilistic algorithms
– Neural: use neural nets to learn the probabilities

Example

The man ate the fish on the boat in the morning

Word      Tag
The       Determiner
Man       Noun
Ate       Verb
The       Determiner
Fish      Noun
On        Preposition
The       Determiner
Boat      Noun
In        Preposition
The       Determiner
Morning   Noun

Word Class Categories

Note: Personal pronouns are often tagged PRP, possessive pronouns often PRP$

Word Classes

– Open (classes that frequently spawn new words):
• common nouns, verbs, adjectives, adverbs

– Closed (classes that don’t often spawn new words):
• prepositions: on, under, over, …
• particles: up, down, on, off, …
• determiners: a, an, the, …
• pronouns: she, he, I, who, …
• conjunctions: and, but, or, …
• auxiliary verbs: can, may, should, …
• numerals: one, two, three, third, …

Particle: An uninflected item with a grammatical function but without clearly belonging to a major part of speech. Example: He looked up the word.

The Linguistics Problem

• Words are often in multiple classes.
• Example: this
– This is a nice day = pronoun
– This day is nice = determiner
– You can go this far = adverb

• Accuracy
– 96–97% is a baseline for new algorithms
– 100% is impossible even for human annotators

Number of words by number of possible tags (DeRose, 1988):

Tags per word       Words
1 (unambiguous)     35,340
2                    3,760
3                      264
4                       61
5                       12
6                        2
7                        1
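The share of ambiguous words implied by these counts can be checked with a little arithmetic. The sketch below assumes each row counts distinct word forms; the variable names are illustrative and not from the slides.

```python
# Counts by number of possible tags, copied from the DeRose (1988) figures.
words_by_tag_count = {1: 35340, 2: 3760, 3: 264, 4: 61, 5: 12, 6: 2, 7: 1}

total_words = sum(words_by_tag_count.values())
# A word is ambiguous when it can take more than one tag.
ambiguous_words = sum(n for tags, n in words_by_tag_count.items() if tags > 1)
ambiguous_share = ambiguous_words / total_words

print(f"{ambiguous_words} of {total_words} words are ambiguous "
      f"({ambiguous_share:.1%})")
```

Under that assumption roughly one word in ten is ambiguous, although the ambiguous words tend to be common ones, so they matter more in running text than this share suggests.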

Rule-Based Tagging

• Basic Idea:
– Assign all possible tags to words
– Remove tags according to a set of rules

o Example rule:

IF word+1 is an adjective, adverb, or quantifier ending a sentence
AND word-1 is not a verb like “consider”
THEN eliminate non-adverb tags
ELSE eliminate the adverb tag

– Such taggers can have more than 1,000 hand-written rules

Stage 1: Rule-Based Tagging

• First Stage:

FOR each word
    Get all possible parts of speech using a morphological analysis algorithm

• Example

Word:  She   promised   to   back   the   bill
Tags:  PRP   VBD        TO   VB     DT    NN
             VBN             JJ           VB
                             RB
                             NN

Stage 2: Rule-Based Tagging

• Apply rules to remove possibilities
• Example Rule:

IF VBD is an option and VBN|VBD follows “<start> PRP”
THEN eliminate VBN

Word:  She   promised   to   back   the   bill
Tags:  PRP   VBD        TO   VB     DT    NN
             VBN             JJ           VB
                             RB
                             NN

Applying the rule removes VBN from the options for “promised”, leaving VBD.
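The two stages can be sketched in a few lines. This is a minimal illustration, not a real tagger: the toy lexicon stands in for the morphological analysis step, and the function names are hypothetical. The candidate tags are the ones shown in the example lattice.

```python
# Toy lexicon standing in for morphological analysis (an assumption).
LEXICON = {
    "she": ["PRP"],
    "promised": ["VBD", "VBN"],
    "to": ["TO"],
    "back": ["VB", "JJ", "RB", "NN"],
    "the": ["DT"],
    "bill": ["NN", "VB"],
}

def assign_all_tags(words):
    """Stage 1: give every word its full set of candidate tags."""
    return [set(LEXICON[w.lower()]) for w in words]

def eliminate_vbn_after_initial_prp(candidates):
    """Stage 2 example rule: if VBD is an option and the VBN|VBD word
    follows a sentence-initial PRP, eliminate VBN."""
    if len(candidates) >= 2 and candidates[0] == {"PRP"}:
        if {"VBN", "VBD"} <= candidates[1]:
            candidates[1].discard("VBN")
    return candidates

words = "She promised to back the bill".split()
candidates = eliminate_vbn_after_initial_prp(assign_all_tags(words))
for word, tags in zip(words, candidates):
    print(word, sorted(tags))
```

Only "promised" is disambiguated by this one rule; "back" and "bill" keep several candidates, which is why a practical rule set needs many more rules.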

Stochastic Tagging

• Use the probability of a certain tag occurring given various possibilities
• Requires a training corpus
• Problems to overcome
– Algorithm to assign a type to words that are not in the corpus
– Naive Method
    • Choose the most frequent tag in the training text for each word
    • Result: 90% accuracy
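The naive method is short enough to sketch directly. The corpus below is a made-up toy, and tagging unknown words as NN is an assumption chosen here to show one simple way of handling words that are not in the corpus; the function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_most_frequent_tag(tagged_sentences):
    """Count tags per word and keep the most frequent tag for each word."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def tag_baseline(words, model, default_tag="NN"):
    """Tag each word with its most frequent tag; unknown words get default_tag."""
    return [(w, model.get(w.lower(), default_tag)) for w in words]

# Toy training corpus (invented for illustration).
toy_corpus = [
    [("the", "DT"), ("man", "NN"), ("ate", "VBD"), ("the", "DT"), ("fish", "NN")],
    [("they", "PRP"), ("fish", "VBP"), ("in", "IN"), ("the", "DT"), ("morning", "NN")],
    [("the", "DT"), ("fish", "NN"), ("swam", "VBD")],
]

model = train_most_frequent_tag(toy_corpus)
tagged = tag_baseline("The boat fish".split(), model)
print(tagged)
```

The baseline ignores context entirely, so "fish" is always tagged NN here even where it is used as a verb.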

HMM Stochastic Tagging

• Intuition: Pick the most likely tag based on context
• Maximize the formula using an HMM
– P(word|tag) × P(tag|previous n tags)
• Observed: W = w1, w2, …, wn
• Hidden: T = t1, t2, …, tn
• Goal: Find the sequence of tags T that most likely generated the observed sequence of words W
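One common way to find the most likely hidden tag sequence is the Viterbi algorithm. The sketch below assumes n = 1 (each tag depends only on the previous tag) and uses hand-picked toy probabilities rather than values estimated from a corpus; all names and numbers are illustrative.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the tag sequence maximizing the product of
    P(word|tag) * P(tag|previous tag) over the sentence."""
    def logp(table, key):
        # Unseen events get a tiny floor probability instead of zero.
        return math.log(table.get(key, floor))

    # best[i][t]: best log-probability of any tag path ending in t at word i
    best = [{t: logp(start_p, t) + logp(emit_p, (words[0], t)) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p] + logp(trans_p, (p, t)))
            best[i][t] = (best[i - 1][prev] + logp(trans_p, (prev, t))
                          + logp(emit_p, (words[i], t)))
            back[i][t] = prev
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

tags = ["DT", "NN", "VB"]
start_p = {"DT": 0.8, "NN": 0.1, "VB": 0.1}
trans_p = {("DT", "NN"): 0.9, ("DT", "VB"): 0.05, ("DT", "DT"): 0.05,
           ("NN", "VB"): 0.6, ("NN", "NN"): 0.3, ("NN", "DT"): 0.1,
           ("VB", "DT"): 0.6, ("VB", "NN"): 0.3, ("VB", "VB"): 0.1}
emit_p = {("the", "DT"): 0.9, ("man", "NN"): 0.4, ("man", "VB"): 0.1,
          ("ate", "VB"): 0.5, ("fish", "NN"): 0.3, ("fish", "VB"): 0.4}

best_path = viterbi("the man ate the fish".split(), tags, start_p, trans_p, emit_p)
print(best_path)
```

In these toy numbers "fish" is more likely to be emitted by VB than by NN, but the transition from DT makes NN the better choice in context, which is the intuition stated above.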

Transformation-Based Tagging (TBL)

(Brill Tagging)

• Combines rule-based and stochastic tagging approaches
– Uses rules to guess at tags
– Uses machine learning with a tagged corpus as input

• Basic Idea: Later rules correct errors made by earlier rules
– Set the most probable tag for each word as a start value
– Change tags according to rules of type:

IF word-1 is a determiner and word is a verb THEN change the tag to noun

• Training uses a tagged corpus
– Step 1: Write a set of rule templates
– Step 2: Order the rules based on corpus accuracy

TBL: The Algorithm

• Step 1: Use a dictionary to label every word with its most likely tag
• Step 2: Select the transformation rule that most improves tagging
• Step 3: Re-tag the corpus by applying the rule
• Repeat Steps 2-3 until accuracy reaches a threshold
• RESULT: Sequence of transformation rules
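The training loop can be sketched with a single rule template, "change tag X to tag Y when the previous tag is Z". This is a simplified illustration under that assumption, not Brill's full algorithm; the toy corpus, dictionary, and function names are invented for the example.

```python
def apply_rule(tags, rule):
    """Rule (from_tag, to_tag, prev_tag): change from_tag to to_tag
    wherever the preceding tag is prev_tag."""
    from_tag, to_tag, prev_tag = rule
    new_tags = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            new_tags[i] = to_tag
    return new_tags

def accuracy(tags, gold):
    return sum(t == g for t, g in zip(tags, gold)) / len(gold)

def learn_rules(words, gold, most_likely_tag, threshold=1.0, max_rules=10):
    # Step 1: label every word with its most likely tag.
    tags = [most_likely_tag[w] for w in words]
    learned = []
    while accuracy(tags, gold) < threshold and len(learned) < max_rules:
        # Instantiate the template from the current tagging errors.
        candidates = {(tags[i], gold[i], tags[i - 1])
                      for i in range(1, len(tags)) if tags[i] != gold[i]}
        if not candidates:
            break
        # Step 2: select the rule that most improves tagging.
        best = max(sorted(candidates),
                   key=lambda r: accuracy(apply_rule(tags, r), gold))
        retagged = apply_rule(tags, best)
        if accuracy(retagged, gold) <= accuracy(tags, gold):
            break  # no rule helps; stop rather than loop forever
        # Step 3: re-tag the corpus with the chosen rule.
        tags = retagged
        learned.append(best)
    return learned, tags

words = "the fish swam she can fish the back hurt the can fell".split()
gold = ["DT", "NN", "VBD", "PRP", "MD", "VB",
        "DT", "NN", "VBD", "DT", "NN", "VBD"]
most_likely_tag = {"the": "DT", "fish": "VB", "swam": "VBD", "she": "PRP",
                   "can": "MD", "back": "VB", "hurt": "VBD", "fell": "VBD"}

rules, final_tags = learn_rules(words, gold, most_likely_tag)
print(rules)
```

The first rule learned corresponds to the example rule type: a verb tag after a determiner is changed to a noun. The second rule is learned later because it fixes fewer errors.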

TBL: Problems

• Problems
– Rules may interact, and infinite loops are possible
– Training and execution are slower than with an HMM

• Advantages
– It is possible to constrain the set of transformations with “templates”:

IF tag Z or word W is in position *-k
THEN replace tag X with another tag

– Learns a small number of simple, non-stochastic rules
– Speed optimizations are possible using finite state transducers
– TBL is the best performing algorithm on unknown words
– The rules are compact and can be inspected by humans

• Accuracy
– First 100 rules achieve 96.8% accuracy
– First 200 rules achieve 97.0% accuracy

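Neural Network

Digital approximation of biological neurons

Digital Neuron

[Figure: a neuron. Each input is multiplied by a weight (W = weight), the weighted inputs are summed (Σ), and the sum is passed through an activation function f(n) to produce the outputs.]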

Transfer Functions

SIGMOID: $f(n) = \frac{1}{1 + e^{-n}}$

LINEAR: $f(n) = n$

[Plot: output versus input, with the output axis marked at 0 and 1]
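A single digital neuron with these two transfer functions, f(n) = 1 / (1 + e^(-n)) for the sigmoid and f(n) = n for the linear case, can be sketched as follows. The function names and the example weights are illustrative assumptions.

```python
import math

def sigmoid(n):
    """Sigmoid transfer function: squashes any input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))

def linear(n):
    """Linear transfer function: passes the weighted sum through unchanged."""
    return n

def neuron(inputs, weights, transfer=sigmoid):
    """Sum the weighted inputs, then apply the transfer function."""
    n = sum(x * w for x, w in zip(inputs, weights))
    return transfer(n)

inputs = [1.0, 2.0]
weights = [0.5, -0.25]   # weighted sum is 0.5 - 0.5 = 0.0
print(neuron(inputs, weights, linear))
print(neuron(inputs, weights, sigmoid))
```

With a weighted sum of zero, the linear neuron outputs 0.0 and the sigmoid neuron outputs 0.5, the midpoint of its range.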

Networks without feedback

Multiple Inputs and Single Layer

Multiple Inputs and Layers

Feedback (Recurrent Networks)

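Supervised Learning

[Figure: inputs from the environment feed both the actual system and the neural network. The expected output (+) from the actual system and the actual output (-) from the network are combined (Σ) to give the error.]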

Training

Run a set of training data through the network and compare the outputs to the expected results. Backpropagate the errors to update the neural weights until the outputs match what is expected.
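The compare-and-update loop can be illustrated on the smallest possible case: a single sigmoid neuron trained by gradient descent on the squared error. This is a simplification of backpropagation, which applies the same idea across several layers. The OR data set, the learning rate, and the function names are assumptions made for the sketch.

```python
import math

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

def predict(inputs, weights):
    """Actual output of the neuron; the last weight acts as a bias."""
    x = list(inputs) + [1.0]
    return sigmoid(sum(xi * wi for xi, wi in zip(x, weights)))

def train_neuron(samples, learning_rate=1.0, max_epochs=10000):
    weights = [0.0] * (len(samples[0][0]) + 1)
    for _ in range(max_epochs):
        for inputs, expected in samples:
            actual = predict(inputs, weights)
            error = expected - actual
            # The error is scaled by the slope of the sigmoid at this output.
            delta = error * actual * (1.0 - actual)
            x = list(inputs) + [1.0]
            weights = [w + learning_rate * delta * xi
                       for w, xi in zip(weights, x)]
        # Stop once every rounded output matches the expected result.
        if all(round(predict(i, weights)) == e for i, e in samples):
            break
    return weights

# Logical OR as a toy training set: (inputs, expected output).
or_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
trained = train_neuron(or_samples)
for inputs, expected in or_samples:
    print(inputs, expected, round(predict(inputs, trained), 3))
```

A single neuron can only learn linearly separable functions such as OR, which is one reason for using the multilayer networks described next.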

Multilayer Perceptron

Definition: A network of neurons in which the output(s) of some neurons are connected through weighted connections to the input(s) of other neurons.

[Figure: inputs feed a first hidden layer, then a second hidden layer, then the output layer. Function signals flow forward through the layers; error signals flow backward (backpropagation of errors).]
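The forward flow of function signals through such a network can be sketched as repeated weighted sums and activations. The layer sizes, the weights, and the use of a sigmoid in every layer are assumptions for illustration; the backward pass of error signals is omitted.

```python
import math

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

def layer_forward(inputs, weight_rows):
    """One layer: each row of weights defines one neuron fed by all inputs."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, row)))
            for row in weight_rows]

def mlp_forward(inputs, layers):
    """Pass the signal through each layer in turn; the outputs of one
    layer become the inputs of the next."""
    signal = inputs
    for weight_rows in layers:
        signal = layer_forward(signal, weight_rows)
    return signal

# Hypothetical weights: 3 inputs -> 2 neurons -> 2 neurons -> 1 output.
layers = [
    [[0.2, -0.4, 0.1], [0.5, 0.3, -0.2]],   # first hidden layer
    [[0.7, -0.6], [-0.1, 0.4]],             # second hidden layer
    [[0.3, 0.9]],                           # output layer
]

output = mlp_forward([1.0, 0.5, -1.0], layers)
print(output)
```

Training would adjust every weight in `layers` by propagating the output error backward through the same connections.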
