part of speech tagging


Upload: sakura

Post on 24-Feb-2016


Page 1: Part of Speech Tagging

Part of Speech Tagging

Importance

• Resolving ambiguities by assigning lower probabilities to words that don’t fit

• Applying grammatical rules to language to parse the meanings of sentences and phrases

Page 2: Part of Speech Tagging

Part of Speech Tagging

Approaches to POS Tagging

Determine a word’s lexical class based on context

Page 3: Part of Speech Tagging

Approaches to POS Tagging

• Initialize and maintain tagging criteria
– Supervised: uses pre-tagged corpora
– Unsupervised: automatically induces classes using probability and learning algorithms
– Partially supervised: combines the above approaches

• Algorithms
– Rule-based: use pre-defined grammatical rules
– Stochastic: use HMMs and other probabilistic algorithms
– Neural: use neural nets to learn the probabilities

Page 4: Part of Speech Tagging

Example

The man ate the fish on the boat in the morning

Word      Tag
The       Determiner
Man       Noun
Ate       Verb
The       Determiner
Fish      Noun
On        Preposition
The       Determiner
Boat      Noun
In        Preposition
The       Determiner
Morning   Noun

Page 5: Part of Speech Tagging

Word Class Categories

Note: Personal pronoun often PRP, Possessive Pronoun often PRP$

Page 6: Part of Speech Tagging

Word Classes

– Open (classes that frequently spawn new words):
• common nouns, verbs, adjectives, adverbs

– Closed (classes that don’t often spawn new words):
• prepositions: on, under, over, …
• particles: up, down, on, off, …
• determiners: a, an, the, …
• pronouns: she, he, I, who, …
• conjunctions: and, but, or, …
• auxiliary verbs: can, may, should, …
• numerals: one, two, three, third, …

Particle: An uninflected item with a grammatical function but without clearly belonging to a major part of speech. Example: He looked up the word.

Page 7: Part of Speech Tagging

The Linguistics Problem

• Words are often in multiple classes.

• Example: this
– This is a nice day = pronoun
– This day is nice = determiner
– You can go this far = adverb

• Accuracy
– 96–97% is a baseline for new algorithms
– 100% is impossible even for human annotators

Number of possible tags per word (DeRose, 1988):

Tags              Words
1 (unambiguous)   35,340
2                 3,760
3                 264
4                 61
5                 12
6                 2
7                 1
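As a quick worked check on the counts above, the sketch below adds them up to see how much of the vocabulary is ambiguous. It assumes the figures are counts of distinct word types; the variable names are illustrative only.

```python
# Counts of words by number of possible tags, copied from the list above
# (DeRose, 1988). Assumption: each count is a number of distinct word types.
types_by_tag_count = {1: 35340, 2: 3760, 3: 264, 4: 61, 5: 12, 6: 2, 7: 1}

total_types = sum(types_by_tag_count.values())
ambiguous_types = sum(n for tags, n in types_by_tag_count.items() if tags > 1)
ambiguous_share = ambiguous_types / total_types

print(f"{ambiguous_types} of {total_types} word types are ambiguous "
      f"({ambiguous_share:.1%})")
```

Only about one word type in ten is ambiguous under this count, but the ambiguous ones tend to be very common words such as "this" or "back", which is why tagging running text is still hard.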

Page 8: Part of Speech Tagging

Rule-Based Tagging

• Basic Idea:
– Assign all possible tags to words
– Remove tags according to a set of rules

o Example rule:

IF word+1 is an adjective, adverb, or quantifier ending a sentence
AND word-1 is not a verb like “consider”
THEN eliminate non-adverb tags
ELSE eliminate the adverb tag

– There are more than 1000 hand-written rules

Page 9: Part of Speech Tagging

Stage 1: Rule-based tagging

• First Stage:

FOR each word
  Get all possible parts of speech using a morphological analysis algorithm

• Example: She promised to back the bill

Word       Possible tags
She        PRP
promised   VBN, VBD
to         TO
back       VB, JJ, RB, NN
the        DT
bill       NN, VB

Page 10: Part of Speech Tagging

Stage 2: Rule-based Tagging

• Apply rules to remove possibilities

• Example Rule:

IF VBD is an option and VBN|VBD follows “<start>PRP”
THEN eliminate VBN

• Example: She promised to back the bill

Word       Possible tags
She        PRP
promised   VBD (VBN eliminated by the rule)
to         TO
back       VB, JJ, RB, NN
the        DT
bill       NN, VB
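The two stages can be sketched in a few lines of Python. This is a minimal illustration, not a real rule-based tagger: the `LEXICON` dictionary is a hypothetical stand-in for the morphological analysis step, and only the single example rule from the slide is implemented.

```python
# Stage 1 stand-in: a toy lexicon instead of a morphological analyzer.
# (Hypothetical data; the tag sets follow the example sentence.)
LEXICON = {
    "she": {"PRP"},
    "promised": {"VBN", "VBD"},
    "to": {"TO"},
    "back": {"VB", "JJ", "RB", "NN"},
    "the": {"DT"},
    "bill": {"NN", "VB"},
}


def candidate_tags(words):
    """Stage 1: attach every possible tag to each word."""
    return [set(LEXICON[w.lower()]) for w in words]


def eliminate_vbn_after_initial_prp(candidates):
    """Stage 2 example rule: IF VBD is an option and the VBN|VBD word
    follows a sentence-initial PRP, THEN eliminate VBN."""
    if len(candidates) >= 2 and candidates[0] == {"PRP"}:
        if {"VBN", "VBD"} <= candidates[1]:
            candidates[1].discard("VBN")
    return candidates


words = "She promised to back the bill".split()
lattice = eliminate_vbn_after_initial_prp(candidate_tags(words))
for word, tags in zip(words, lattice):
    print(word, sorted(tags))
```

After this one rule, "promised" is resolved to VBD, while "back" and "bill" stay ambiguous until further rules are applied.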

Page 11: Part of Speech Tagging

Stochastic Tagging

• Use the probability of a certain tag occurring given various possibilities

• Requires a training corpus

• Problems to overcome
– Algorithm to assign a type to words that are not in the corpus
– Naive Method
  • Choose the most frequent tag in the training text for each word!
  • Result: 90% accuracy
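The naive method is easy to sketch. The version below is a minimal illustration on a tiny made-up corpus; the function names are hypothetical, and tagging unknown words with the overall most frequent tag is an assumption added here, since the slide leaves that problem open.

```python
from collections import Counter, defaultdict


def train_most_frequent_tag(tagged_sentences):
    """Count (word, tag) pairs and keep the most frequent tag per word."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
            all_tags[tag] += 1
    best = {word: c.most_common(1)[0][0] for word, c in counts.items()}
    # Assumption: unknown words get the most frequent tag overall.
    default_tag = all_tags.most_common(1)[0][0]
    return best, default_tag


def tag(words, best, default_tag):
    """Tag each word independently, ignoring context."""
    return [(w, best.get(w.lower(), default_tag)) for w in words]


# Toy training corpus (made up for illustration).
corpus = [
    [("the", "DT"), ("man", "NN"), ("ate", "VBD"), ("the", "DT"), ("fish", "NN")],
    [("they", "PRP"), ("fish", "VBP"), ("in", "IN"), ("the", "DT"), ("morning", "NN")],
    [("the", "DT"), ("fish", "NN"), ("ate", "VBD"), ("bread", "NN")],
]
best, default_tag = train_most_frequent_tag(corpus)
result = tag(["The", "fish", "ate", "the", "boat"], best, default_tag)
print(result)
```

Because each word is tagged in isolation, "fish" is always tagged NN here, even in a sentence like "they fish", which is exactly the kind of error that context-aware methods aim to fix.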

Page 12: Part of Speech Tagging

HMM Stochastic Tagging

• Intuition: Pick the most likely tag based on context

• Maximize the formula using an HMM
– P(word|tag) × P(tag|previous n tags)

• Observe: W = w1, w2, …, wn

• Hidden: T = t1, t2, …, tn

• Goal: Find the sequence of tags that most likely generated the observed sequence of words
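One common way to carry out this maximization is the Viterbi algorithm. The sketch below assumes a bigram model (n = 1, so each tag depends only on the previous tag); the probability tables are made-up toy values, and raw probabilities are multiplied directly, which is fine for short sentences but would normally be replaced by log probabilities and smoothing.

```python
def viterbi(words, tags, trans, emit, start="<s>"):
    """Return the most likely tag sequence under a bigram HMM.

    trans[prev][tag] = P(tag | prev), emit[tag][word] = P(word | tag).
    """
    # best[i][t]: probability of the best tag path ending in tag t at word i
    best = [{t: trans[start].get(t, 0.0) * emit[t].get(words[0], 0.0)
             for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p] * trans[p].get(t, 0.0))
            best[i][t] = (best[i - 1][prev] * trans[prev].get(t, 0.0)
                          * emit[t].get(words[i], 0.0))
            back[i][t] = prev
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))


# Hypothetical toy parameters, not estimated from any real corpus.
TAGS = ["DT", "NN", "VB"]
TRANS = {
    "<s>": {"DT": 0.5, "NN": 0.2, "VB": 0.3},
    "DT": {"DT": 0.05, "NN": 0.9, "VB": 0.05},
    "NN": {"DT": 0.2, "NN": 0.3, "VB": 0.5},
    "VB": {"DT": 0.6, "NN": 0.3, "VB": 0.1},
}
EMIT = {
    "DT": {"the": 0.9},
    "NN": {"back": 0.2, "bill": 0.5},
    "VB": {"back": 0.4, "bill": 0.1},
}
tag_sequence = viterbi(["back", "the", "bill"], TAGS, TRANS, EMIT)
print(tag_sequence)
```

With these toy numbers, "bill" comes out as a noun because a noun is far more likely than a verb after a determiner, even though both tags are possible for the word.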

Page 13: Part of Speech Tagging

Transformation-Based Tagging (TBL) (Brill Tagging)

• Combine rule-based and stochastic tagging approaches
– Uses rules to guess at tags
– Machine learning using a tagged corpus as input

• Basic Idea: Later rules correct errors made by earlier rules
– Set the most probable tag for each word as a start value
– Change tags according to rules of type:

IF word-1 is a determiner and word is a verb THEN change the tag to noun

• Training uses a tagged corpus
– Step 1: Write a set of rule templates
– Step 2: Order the rules based on corpus accuracy

Page 14: Part of Speech Tagging

TBL: The Algorithm

• Step 1: Use dictionary to label every word with the most likely tag

• Step 2: Select the transformation rule which most improves tagging

• Step 3: Re-tag the corpus by applying the selected rule

• Repeat steps 2–3 until accuracy reaches a threshold

• RESULT: A sequence of transformation rules
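The training loop can be sketched as follows. This is a simplified illustration, not Brill's actual implementation: it assumes a single rule template ("change tag X to Y when the previous tag is Z"), treats the corpus as one flat token list, and stops when no rule reduces the error count rather than at an accuracy threshold. All names and data are hypothetical.

```python
from itertools import product


def apply_rule(tags, rule):
    """Apply 'IF previous tag is prev_tag AND tag is old THEN change to new'."""
    prev_tag, old, new = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i - 1] == prev_tag and tags[i] == old:
            out[i] = new
    return out


def errors(tags, gold):
    """Number of positions where the tagging disagrees with the gold tags."""
    return sum(t != g for t, g in zip(tags, gold))


def learn_rules(words, gold, lexicon, tagset, max_rules=10):
    # Step 1: label every word with its most likely tag from the dictionary.
    current = [lexicon[w] for w in words]
    learned = []
    while len(learned) < max_rules:
        # Step 2: pick the rule that most reduces the error count.
        candidates = [r for r in product(tagset, repeat=3) if r[1] != r[2]]
        best = min(candidates, key=lambda r: errors(apply_rule(current, r), gold))
        if errors(apply_rule(current, best), gold) >= errors(current, gold):
            break  # no rule helps any more
        # Step 3: re-tag the corpus with the selected rule.
        current = apply_rule(current, best)
        learned.append(best)
    return learned, current


# Toy data: 'back' is most often a verb, but is a noun after a determiner.
words = "she promised to back the bill the back hurt".split()
gold = ["PRP", "VBD", "TO", "VB", "DT", "NN", "DT", "NN", "VBD"]
lexicon = {"she": "PRP", "promised": "VBD", "to": "TO", "back": "VB",
           "the": "DT", "bill": "NN", "hurt": "VBD"}
tagset = ["DT", "NN", "PRP", "TO", "VB", "VBD"]

learned, final_tags = learn_rules(words, gold, lexicon, tagset)
print(learned)
```

On this toy corpus the learner recovers a rule of the same shape as the determiner-then-verb example given earlier in the deck.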

Page 15: Part of Speech Tagging

TBL: Problems

• Problems
– Rules may interact and cause infinite loops
– Training and execution are slower than with an HMM

• Advantages
– It is possible to constrain the set of transformations with “templates”

IF tag Z or word W is in position *-k
THEN replace tag X with tag

– Learns a small number of simple, non-stochastic rules
– Speed optimizations are possible using finite state transducers
– TBL is the best performing algorithm on unknown words
– The rules are compact and can be inspected by humans

• Accuracy
– First 100 rules achieve 96.8% accuracy
– First 200 rules achieve 97.0% accuracy

Page 16: Part of Speech Tagging

Neural Network

Digital approximation of biological neurons

Page 17: Part of Speech Tagging

Digital Neuron

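[Figure: a neuron. Each input is multiplied by a weight (W = weight), the weighted inputs are summed (Σ), and the activation function f(n) is applied to the sum to produce the outputs.]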
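The digital neuron amounts to a weighted sum followed by an activation function. A minimal sketch, with made-up inputs and weights and an identity activation as the default (an assumption for illustration):

```python
def neuron(inputs, weights, activation=lambda n: n):
    """Sum the weighted inputs, then apply the activation function f(n)."""
    n = sum(x * w for x, w in zip(inputs, weights))
    return activation(n)


# Illustrative values only: three inputs with one weight each.
output = neuron([1.0, 0.5, -1.0], [0.2, 0.4, 0.1])
print(output)  # weighted sum: 1.0*0.2 + 0.5*0.4 - 1.0*0.1
```

Any other activation can be passed in as the third argument without changing the summation step.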

Page 18: Part of Speech Tagging

Transfer Functions

SIGMOID: $f(n) = \frac{1}{1 + e^{-n}}$

LINEAR: $f(n) = n$

[Figure: output plotted against input; the output axis runs from 0 to 1.]
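The two transfer functions are short enough to write out directly. The sketch below evaluates both at a few sample inputs to show that the sigmoid squashes its input into the range 0 to 1, while the linear function passes it through unchanged.

```python
import math


def sigmoid(n):
    """SIGMOID: f(n) = 1 / (1 + e^(-n)), always between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-n))


def linear(n):
    """LINEAR: f(n) = n."""
    return n


for n in (-5.0, 0.0, 5.0):
    print(f"n={n:+.1f}  sigmoid={sigmoid(n):.4f}  linear={linear(n):+.1f}")
```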

Page 19: Part of Speech Tagging

Networks without feedback

Multiple Inputs and Single Layer

Multiple Inputs and layers

Page 20: Part of Speech Tagging

Feedback (Recurrent Networks)

[Figure: a network with feedback connections.]

Page 21: Part of Speech Tagging

Supervised Learning

[Figure: inputs from the environment feed both the actual system and the neural network. A summing node (Σ) takes the expected output (+) and the network's actual output (-) to produce the error, which drives training.]

Training

Run a set of training data through the network and compare the outputs to the expected results. Backpropagate the errors to update the neural weights until the outputs match what is expected.
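For a single neuron, this training loop reduces to a simple error-driven weight update. The sketch below is a minimal illustration that learns the logical OR function; the bias term, learning rate, epoch count, and the choice of OR as the task are all assumptions made for the example, and a fixed number of passes stands in for "until the outputs match".

```python
import math


def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))


def predict(inputs, weights, bias):
    """Actual output of one sigmoid neuron."""
    return sigmoid(sum(x * w for x, w in zip(inputs, weights)) + bias)


def train(samples, epochs=2000, rate=0.5):
    """Repeatedly compare actual and expected outputs and adjust the weights."""
    weights = [0.0, 0.0]
    bias = 0.0  # assumption: an extra bias weight, not shown in the slides
    for _ in range(epochs):
        for inputs, expected in samples:
            error = expected - predict(inputs, weights, bias)
            # Move each weight in the direction that shrinks the error.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias


# Training data: inputs paired with expected outputs (logical OR).
OR_DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train(OR_DATA)
predictions = [round(predict(inputs, weights, bias)) for inputs, _ in OR_DATA]
print(predictions)
```

A multilayer network uses the same idea, except that the error at the output has to be passed backward through the hidden layers to work out each weight's share of it.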

Page 22: Part of Speech Tagging

Multilayer Perceptron

Definition: A network of neurons in which the output(s) of some neurons are connected through weighted connections to the input(s) of other neurons.

[Figure: inputs, a first hidden layer, a second hidden layer, and an output layer.]
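The definition translates into a forward pass in which each layer's outputs become the next layer's inputs. The sketch below is a minimal illustration with made-up weights, a sigmoid activation in every layer, and no bias terms; the layer sizes are arbitrary.

```python
import math


def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))


def layer(inputs, weights):
    """One layer: each row of `weights` holds one neuron's input weights."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, row))) for row in weights]


def forward(inputs, layers):
    """Feed the signal through every layer in order."""
    signal = inputs
    for weights in layers:
        signal = layer(signal, weights)
    return signal


# Hypothetical weights: 2 inputs -> 3 neurons -> 2 neurons -> 1 output.
NETWORK = [
    [[0.5, -0.5], [0.3, 0.8], [-0.2, 0.1]],  # first hidden layer
    [[0.1, 0.2, 0.3], [-0.3, 0.4, 0.5]],     # second hidden layer
    [[0.7, -0.6]],                           # output layer
]
output = forward([1.0, 0.0], NETWORK)
print(output)
```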

Page 23: Part of Speech Tagging

Backpropagation of Errors

[Figure: function signals propagate forward through the network; error signals propagate backward.]