Natural Language Processing - Lecture 5: POS Tagging Algorithms


Page 1: Natural Language Processing - Lecture 5: POS Tagging Algorithms


Natural Language Processing - Lecture 5

POS Tagging Algorithms

Ido Dagan
Department of Computer Science
Bar-Ilan University

Page 2: Natural Language Processing - Lecture 5: POS Tagging Algorithms


Supervised Learning Scheme

[Diagram: "Labeled" Examples → Training Algorithm → Classification Model; New Examples + Classification Model → Classification Algorithm → Classifications]

Page 3: Natural Language Processing - Lecture 5: POS Tagging Algorithms


Transformation-Based Learning (TBL) for Tagging

• Introduced by Brill (1995)
• Can exploit a wider range of lexical and syntactic regularities via transformation rules: a triggering environment and a rewrite rule
• Tagger (see the sketch below):
  - Construct an initial tag sequence for the input: the most frequent tag for each word
  - Iteratively refine the tag sequence by applying "transformation rules" in rank order
• Learner:
  - Construct an initial tag sequence for the training corpus
  - Loop until done:
    • Try all possible rules and compare to the known tags; apply the best rule r* to the sequence and add it to the rule ranking
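A minimal sketch of the tagging phase under these definitions. The rule representation (from_tag, to_tag, trigger) and the helper names are illustrative assumptions, not the slides' implementation.

    # Minimal sketch of the TBL tagging phase (illustrative, not the original code).
    # A rule is assumed to be (from_tag, to_tag, trigger), where trigger is a
    # predicate over the current tag sequence and a position.

    def initial_tags(words, most_frequent_tag, default="NN"):
        """Assign each word its most frequent tag from the training corpus."""
        return [most_frequent_tag.get(w, default) for w in words]

    def apply_rule(tags, rule):
        """Apply one transformation rule left-to-right over the tag sequence."""
        from_tag, to_tag, trigger = rule
        return [to_tag if t == from_tag and trigger(tags, i) else t
                for i, t in enumerate(tags)]

    def tbl_tag(words, most_frequent_tag, ranked_rules):
        """Tag a sentence: initial tagging, then apply learned rules in rank order."""
        tags = initial_tags(words, most_frequent_tag)
        for rule in ranked_rules:
            tags = apply_rule(tags, rule)
        return tags

    # Example: rule 1 from the next slide -- change NN to VB if the previous tag is TO.
    rule_nn_to_vb = ("NN", "VB", lambda tags, i: i > 0 and tags[i - 1] == "TO")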

Page 4: Natural Language Processing - Lecture 5: POS Tagging Algorithms


Some examples

1. Change NN to VB if the previous tag is TO
   - to/TO conflict/NN with → VB
2. Change VBP to VB if MD is in the previous three tags
   - might/MD vanish/VBP → VB
3. Change NN to VB if MD is in the previous two tags
   - might/MD reply/NN → VB
4. Change VB to NN if DT is in the previous two tags
   - the/DT reply/VB → NN

Page 5: Natural Language Processing - Lecture 5: POS Tagging Algorithms


Transformation Templates

Specify which transformations are possible. For example: change tag A to tag B when:

1. The preceding (following) tag is Z
2. The tag two before (after) is Z
3. One of the two previous (following) tags is Z
4. One of the three previous (following) tags is Z
5. The preceding tag is Z and the following tag is W
6. The preceding (following) tag is Z and the tag two before (after) is W

Page 6: Natural Language Processing - Lecture 5: POS Tagging Algorithms


Lexicalization

New templates include dependency on the surrounding words (not just tags). Change tag A to tag B when:

1. The preceding (following) word is w
2. The word two before (after) is w
3. One of the two preceding (following) words is w
4. The current word is w
5. The current word is w and the preceding (following) word is v
6. The current word is w and the preceding (following) tag is X (notice: a word-tag combination)
7. etc.

Page 7: Natural Language Processing - Lecture 5: POS Tagging Algorithms


Initializing Unseen Words

• How to choose the most likely tag for unseen words?
Transformation-based approach:
- Start with NP for capitalized words, NN for others
- Learn "morphological" transformations of the form: change the tag from X to Y if:
1. Deleting the prefix (suffix) x results in a known word
2. The first (last) characters of the word are x
3. Adding x as a prefix (suffix) results in a known word
4. Word W ever appears immediately before (after) the word
5. Character Z appears in the word
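As a rough illustration (an assumption for this writeup, not the slides' code), a few of these morphological triggers can be checked against a lexicon of known words like this:

    # Illustrative checks for the unseen-word triggers above, against a known-word lexicon.

    def suffix_deletion_known(word, suffix, lexicon):
        """Trigger 1: deleting suffix x results in a known word."""
        return word.endswith(suffix) and word[:-len(suffix)] in lexicon

    def has_suffix(word, suffix):
        """Trigger 2: the last characters of the word are x."""
        return word.endswith(suffix)

    def suffix_addition_known(word, suffix, lexicon):
        """Trigger 3: adding x as a suffix results in a known word."""
        return (word + suffix) in lexicon

    lexicon = {"walk", "play", "read"}
    print(suffix_deletion_known("walked", "ed", lexicon))   # True: "walked" minus "ed" is known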

Page 8: Natural Language Processing - Lecture 5: POS Tagging Algorithms


TBL Learning Scheme

[Diagram: Unannotated Input Text → Setting Initial State → Annotated Text; Annotated Text + Ground Truth for Input Text → Learning Algorithm → Rules]

Page 9: Natural Language Processing - Lecture 5: POS Tagging Algorithms


Greedy Learning Algorithm

• Initial tagging of the training corpus: the most frequent tag per word
• At each iteration (sketched below):
  - Identify rules that fix errors and compute the "error reduction" for each transformation rule:
    • #errors fixed - #errors introduced
  - Find the best rule; if its error reduction is greater than a threshold (to avoid overfitting):
    • Apply the best rule to the training corpus
    • Append the best rule to the ordered list of transformations
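A minimal sketch of this greedy loop, assuming rules are (from_tag, to_tag, trigger) tuples as in the tagger sketch above and that a candidate-rule generator already exists; all names are illustrative.

    # Sketch of the greedy TBL learning pass (illustrative names and structure).

    def apply_rule(tags, rule):
        from_tag, to_tag, trigger = rule
        return [to_tag if t == from_tag and trigger(tags, i) else t
                for i, t in enumerate(tags)]

    def error_reduction(rule, tags, gold):
        """(#errors fixed) - (#errors introduced) when applying the rule to the current tags."""
        new = apply_rule(tags, rule)
        fixed = sum(1 for t, n, g in zip(tags, new, gold) if t != g and n == g)
        introduced = sum(1 for t, n, g in zip(tags, new, gold) if t == g and n != g)
        return fixed - introduced

    def learn_rules(tags, gold, candidate_rules, threshold=0):
        """Greedily pick the best rule each round until no rule beats the threshold."""
        ranked = []
        while candidate_rules:
            best = max(candidate_rules, key=lambda r: error_reduction(r, tags, gold))
            if error_reduction(best, tags, gold) <= threshold:
                break
            tags = apply_rule(tags, best)
            ranked.append(best)
        return ranked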

Page 10: Natural Language Processing - Lecture 5: POS Tagging Algorithms


Stochastic POS Tagging

• POS tagging: for a given sentence W = w1…wn, find the matching POS tags T = t1…tn
• In a statistical framework:

  T' = argmax_T P(T|W)

Page 11: Natural Language Processing - Lecture 5: POS Tagging Algorithms


The full derivation (with the slide's annotations in brackets):

  T' = argmax_{t1..n} P(t1..n | w1..n)
     = argmax_{t1..n} P(w1..n | t1..n) P(t1..n) / P(w1..n)        [Bayes' Rule]
     = argmax_{t1..n} P(w1..n | t1..n) P(t1..n)                   [denominator doesn't depend on tags]
     = argmax_{t1..n} P(t1) P(t2|t1) P(t3|t1 t2) … P(tn|t1..n-1) · Π_{i=1..n} P(wi | t1..n, w1..i-1)   [chaining rule]
     = argmax_{t1..n} P(t1) P(t2|t1) … P(tn|tn-1) · Π_{i=1..n} P(wi|ti)
       [Markovian assumptions; words are independent of each other; a word's identity depends only on its own tag]
     = argmax_{t1..n} Π_{i=1..n} P(wi|ti) P(ti|ti-1)

Notation: P(t1) = P(t1 | t0)
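To make the final factorization concrete, here is a naive sketch that scores and enumerates tag sequences directly; the table layout (trans[(prev_tag, tag)], emit[(tag, word)]) and the "<s>" start symbol standing in for t0 are assumptions for illustration. Later slides replace this brute-force search with Viterbi.

    from itertools import product

    # Naive illustration of argmax_T prod_i P(wi|ti) * P(ti|ti-1), with P(t1) = P(t1|t0).

    def sequence_prob(words, tags, trans, emit):
        p, prev = 1.0, "<s>"
        for w, t in zip(words, tags):
            p *= trans.get((prev, t), 0.0) * emit.get((t, w), 0.0)
            prev = t
        return p

    def brute_force_tag(words, tagset, trans, emit):
        """Enumerate all |tagset|**n tag sequences -- exponential, for illustration only."""
        return max(product(tagset, repeat=len(words)),
                   key=lambda tags: sequence_prob(words, tags, trans, emit))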

Page 12: Natural Language Processing - Lecture 5: POS Tagging Algorithms


The Markovian assumptions

• Limited horizon:
  - P(Xi+1 = tk | X1,…,Xi) = P(Xi+1 = tk | Xi)
• Time invariant:
  - P(Xi+1 = tk | Xi) = P(Xj+1 = tk | Xj)

Page 13: Natural Language Processing - Lecture 5: POS Tagging Algorithms


Maximum Likelihood Estimation

• In order to estimate P(wi|ti) and P(ti|ti-1) we can use maximum likelihood estimation (a counting sketch follows below):
  - P(wi|ti) = c(wi, ti) / c(ti)
  - P(ti|ti-1) = c(ti-1 ti) / c(ti-1)
• Notice the estimation for i = 1
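A small counting sketch; the corpus layout (a list of sentences, each a list of (word, tag) pairs) and the "<s>" start symbol, which handles the i = 1 case noted above, are assumptions for illustration.

    from collections import Counter

    # Collect MLE counts from a tagged corpus: a list of sentences,
    # each a list of (word, tag) pairs. "<s>" stands in for t0.

    def mle_counts(corpus):
        emit_c, trans_c, tag_c = Counter(), Counter(), Counter()
        for sentence in corpus:
            prev = "<s>"
            tag_c[prev] += 1              # c(t0) = number of sentences
            for word, tag in sentence:
                emit_c[(tag, word)] += 1  # c(wi, ti)
                trans_c[(prev, tag)] += 1 # c(ti-1 ti)
                tag_c[tag] += 1           # c(ti)
                prev = tag
        return emit_c, trans_c, tag_c

    def p_emit(word, tag, emit_c, tag_c):
        """P(wi|ti) = c(wi,ti) / c(ti)."""
        return emit_c[(tag, word)] / tag_c[tag] if tag_c[tag] else 0.0

    def p_trans(tag, prev, trans_c, tag_c):
        """P(ti|ti-1) = c(ti-1 ti) / c(ti-1)."""
        return trans_c[(prev, tag)] / tag_c[prev] if tag_c[prev] else 0.0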

Page 14: Natural Language Processing - Lecture 5: POS Tagging Algorithms


Unknown Words

• Many words will not appear in the training corpus.

• Unknown words are a major problem for taggers (!)

• Solutions:
  - Incorporate morphological analysis
  - Consider words appearing once in the training data as UNKNOWNs

Page 15: Natural Language Processing - Lecture 5: POS Tagging Algorithms


“Add-1/Add-Constant” Smoothing

Maximum likelihood estimate:
  p_MLE(x) = c(x) / N
  - c(x): the count for event x (e.g. a word occurrence)
  - N: the total count over all x ∈ X (e.g. the corpus length)
  - p_MLE(x) = 0 for many low-probability events (sparseness)

Smoothing: discounting and redistribution:
  p_S(x) = (c(x) + λ) / (N + λ|X|)
  - λ = 1: Laplace, assuming a uniform prior
  - For natural language events, usually λ < 1
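A minimal sketch of the smoothed estimate above; the function and parameter names are just the notation used here.

    # Add-constant smoothing: p_S(x) = (c(x) + lam) / (N + lam * |X|)

    def smoothed_prob(count_x, total_n, vocab_size, lam=1.0):
        """lam = 1.0 gives Laplace (add-1); language models usually use lam < 1."""
        return (count_x + lam) / (total_n + lam * vocab_size)

    # Example: an unseen event (count 0) now receives a small non-zero probability.
    print(smoothed_prob(0, total_n=10000, vocab_size=50000, lam=0.5))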

Page 16: Natural Language Processing - Lecture 5: POS Tagging Algorithms


Smoothing for Tagging

• For P(ti|ti-1)

• Optionally: for P(wi|ti)

Page 17: Natural Language Processing - Lecture 5: POS Tagging Algorithms


Viterbi

• Finding the most probable tag sequence can be done with the Viterbi algorithm (sketched below).

• No need to calculate every single possible tag sequence (!)
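A compact sketch of Viterbi decoding under the model above. The table layout (trans[(prev, tag)], emit[(tag, word)]) and the "<s>" start state are assumptions carried over from the earlier sketches; dynamic programming keeps only the best score per tag at each position instead of enumerating all sequences.

    # Viterbi decoding: O(n * |tagset|^2) work instead of |tagset|^n sequences.

    def viterbi(words, tagset, trans, emit):
        # best[t] = (score of the best tag sequence ending in tag t, that sequence)
        best = {t: (trans.get(("<s>", t), 0.0) * emit.get((t, words[0]), 0.0), [t])
                for t in tagset}
        for w in words[1:]:
            new_best = {}
            for t in tagset:
                # choose the best previous tag p for reaching t
                score, path = max(
                    ((best[p][0] * trans.get((p, t), 0.0), best[p][1]) for p in tagset),
                    key=lambda sp: sp[0])
                new_best[t] = (score * emit.get((t, w), 0.0), path + [t])
            best = new_best
        return max(best.values(), key=lambda sp: sp[0])[1]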

Page 18: Natural Language Processing - Lecture 5: POS Tagging Algorithms


HMMs

• Assume a state machine with:
  - Nodes that correspond to tags
  - A start and end state
  - Arcs corresponding to transition probabilities: P(ti|ti-1)
  - A set of observation likelihoods for each state: P(wi|ti)

Page 19: Natural Language Processing - Lecture 5: POS Tagging Algorithms


[Diagram: an example HMM over tag states NN, VBZ, NNS, AT, VB, RB; emission probabilities such as P(like)=0.2, P(fly)=0.3, …, P(eat)=0.36 for VB, P(likes)=0.3, P(flies)=0.1, …, P(eats)=0.5 for VBZ, and P(the)=0.4, P(a)=0.3, P(an)=0.2 for AT; transition probabilities such as 0.6 and 0.4]

Page 20: Natural Language Processing - Lecture 5: POS Tagging Algorithms


HMMs

• An HMM is similar to an automaton augmented with probabilities

• Note that the states in an HMM do not correspond to the input symbols.

• The input symbols don’t uniquely determine the next state.

Page 21: Natural Language Processing - Lecture 5: POS Tagging Algorithms


HMM Definition

• HMM = (S, K, A, B)
  - Set of states S = {s1,…,sn}
  - Output alphabet K = {k1,…,kn}
  - State transition probabilities A = {aij}, i,j ∈ S
  - Symbol emission probabilities B = b(i,k), i ∈ S, k ∈ K
  - Start and end states (non-emitting)
• Alternatively: initial state probabilities
• Note: for a given i, Σj aij = 1 and Σk b(i,k) = 1
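A tiny sketch of the (S, K, A, B) definition as plain dictionaries, with a check of the normalization note above; the toy states, words, and probabilities are made up for illustration, and start/end handling is omitted.

    # HMM = (S, K, A, B) as dictionaries; toy example only.
    S = ["AT", "NN", "VBZ"]                   # states (tags)
    K = ["the", "dog", "barks"]               # output alphabet (words)
    A = {"AT":  {"NN": 1.0, "VBZ": 0.0, "AT": 0.0},
         "NN":  {"VBZ": 0.7, "NN": 0.3, "AT": 0.0},
         "VBZ": {"AT": 0.5, "NN": 0.5, "VBZ": 0.0}}
    B = {"AT":  {"the": 1.0, "dog": 0.0, "barks": 0.0},
         "NN":  {"the": 0.0, "dog": 1.0, "barks": 0.0},
         "VBZ": {"the": 0.0, "dog": 0.0, "barks": 1.0}}

    # Note from the slide: for a given state i, sum_j a_ij = 1 and sum_k b(i,k) = 1.
    for i in S:
        assert abs(sum(A[i].values()) - 1.0) < 1e-9
        assert abs(sum(B[i].values()) - 1.0) < 1e-9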

Page 22: Natural Language Processing - Lecture 5: POS Tagging Algorithms


Why Hidden?

• Because we only observe the input - the underlying states are hidden
• Decoding: the problem of part-of-speech tagging can be viewed as a decoding problem: given an observation sequence W = w1,…,wn, find a state sequence T = t1,…,tn that best explains the observation.

Page 23: Natural Language Processing - Lecture 5: POS Tagging Algorithms


Homework