CSA2050: Introduction to Computational Linguistics Part of Speech (POS) Tagging I Introduction Tagsets Approaches


Page 1: CSA2050: Introduction to Computational Linguistics

CSA2050: Introduction to Computational Linguistics

Part of Speech (POS) Tagging I

Introduction Tagsets Approaches

Page 2: CSA2050: Introduction to Computational Linguistics

April 2005 CLINT Lecture IV 2

Acknowledgment

Most slides taken from Bonnie Dorr’s course notes: www.umiacs.umd.edu/~bonnie/courses/cmsc723-03

In turn based on Jurafsky & Martin Chapter 8

Page 3: CSA2050: Introduction to Computational Linguistics

Bibliography

R. Weischedel, R. Schwartz, J. Palmucci, M. Meteer, L. Ramshaw, Coping with Ambiguity and Unknown Words through Probabilistic Models, Computational Linguistics 19(2), pp. 359-382, 1993

C. Samuelsson, Morphological Tagging Based Entirely on Bayesian Inference, in Proceedings of the 9th Nordic Conference on Computational Linguistics (NODALIDA-93), Stockholm, 1993

A. Ratnaparkhi, A Maximum Entropy Model for Part-of-Speech Tagging, in Proceedings of the Conference on Empirical Methods in Natural Language Processing, 1996

Page 4: CSA2050: Introduction to Computational Linguistics


Outline

- The tagging task
- Tagsets
- Three different approaches

Page 5: CSA2050: Introduction to Computational Linguistics


Definition: PoS-Tagging

“Part-of-Speech Tagging is the process of assigning a part-of-speech or other lexical class marker to each word in a corpus” (Jurafsky and Martin)

WORDS: the  girl  kissed  the  boy  on  the  cheek
TAGS:  DET  N     V       DET  N    P   DET  N
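The tagging task on this example can be sketched as a simple lexicon lookup. The toy lexicon below is hypothetical, covers only this sentence, and ignores ambiguity entirely; real taggers must choose among multiple possible tags, as the later slides show.

```python
# Toy illustration of the tagging task: map each word of the example
# sentence to a tag from the small tagset {DET, N, V, P}.
# This hypothetical lexicon assigns exactly one tag per word.
LEXICON = {
    "the": "DET", "girl": "N", "kissed": "V",
    "boy": "N", "on": "P", "cheek": "N",
}

def tag(words):
    """Assign each word its (unique) tag from the lexicon."""
    return [(w, LEXICON[w]) for w in words]

print(tag("the girl kissed the boy on the cheek".split()))
```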

Page 6: CSA2050: Introduction to Computational Linguistics


Motivation

Corpus analysis of tagged corpora yields useful information

- Speech synthesis — pronunciation: CONtent (N) vs. conTENT (Adj)
- Speech recognition — word-class-based N-grams predict the category of the next word
- Information retrieval — stemming, selection of high-content words
- Word-sense disambiguation

Page 7: CSA2050: Introduction to Computational Linguistics

English Parts of Speech

1. Pronoun: any substitute for a noun or noun phrase
2. Adjective: any qualifier of a noun
3. Verb: any action or state of being
4. Adverb: any qualifier of an adjective or verb
5. Preposition: any establisher of relation and syntactic context
6. Conjunction: any syntactic connector
7. Interjection: any emotional greeting (or "exclamation")

Page 8: CSA2050: Introduction to Computational Linguistics


Tagsets: how detailed?

Swedish SUC: 25 tags

Penn Treebank: 46 tags

German STTS: 50 tags

Lancaster BNC: 61 tags

Lancaster Full: 146 tags

Page 9: CSA2050: Introduction to Computational Linguistics


Penn Treebank Tagset

[Penn Treebank tag table omitted; visible residue includes PRP (personal pronoun) and PRP$ (possessive pronoun)]

Page 10: CSA2050: Introduction to Computational Linguistics


Example of Penn Treebank Tagging of a Brown Corpus Sentence:

The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.

Book/VB that/DT flight/NN ./.

Does/VBZ that/DT flight/NN serve/VB dinner/NN ?/.

Page 11: CSA2050: Introduction to Computational Linguistics

2 Problems

- Multiple tags for the same word
- Unknown words

Page 12: CSA2050: Introduction to Computational Linguistics


Multiple tags for the same word

1. He can can a can.

2. I can light a fire and you can open a can of beans. Now the can is open, and we can eat in the light of the fire.

3. Flying planes can be dangerous.

Page 13: CSA2050: Introduction to Computational Linguistics


Multiple tags for the same word

Words often belong to more than one word class, e.g. this:
- "This is a nice day" = PRP (pronoun)
- "This day is nice" = DT (determiner)
- "You can go this far" = RB (adverb)

Many of the most common words (by volume of text) are ambiguous

Page 14: CSA2050: Introduction to Computational Linguistics


How Hard is the Tagging Task?

In the Brown Corpus:
- 11.5% of word types are ambiguous
- 40% of word tokens are ambiguous

Most words in English are unambiguous, but many of the most common words are ambiguous. Typically the possible tags of an ambiguous word are not equally probable.

Page 15: CSA2050: Introduction to Computational Linguistics


Word Class Ambiguity(in the Brown Corpus)

Unambiguous (1 tag): 35,340 types

Ambiguous (2-7 tags): 4,100 types

  2 tags: 3,760
  3 tags: 264
  4 tags: 61
  5 tags: 12
  6 tags: 2
  7 tags: 1

(DeRose, 1988)
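The type vs. token distinction behind these counts can be illustrated on a small tagged corpus. The mini-corpus below is invented for illustration (the figures are not DeRose's): a type is ambiguous if it occurs with more than one tag, and token ambiguity counts every occurrence of such a type.

```python
from collections import defaultdict

# Hypothetical mini-corpus of (word, tag) pairs, just to show how
# type vs. token ambiguity percentages are computed.
corpus = [
    ("the", "DT"), ("can", "NN"), ("can", "MD"), ("hold", "VB"),
    ("a", "DT"), ("can", "NN"), ("of", "IN"), ("beans", "NNS"),
]

# Collect the set of tags observed for each word type.
tags_of = defaultdict(set)
for word, t in corpus:
    tags_of[word].add(t)

ambiguous = {w for w, ts in tags_of.items() if len(ts) > 1}
type_ambiguity = len(ambiguous) / len(tags_of)                       # over types
token_ambiguity = sum(1 for w, _ in corpus if w in ambiguous) / len(corpus)  # over tokens
print(f"{type_ambiguity:.1%} of types, {token_ambiguity:.1%} of tokens ambiguous")
```

Note how one frequent ambiguous type ("can") inflates the token figure well above the type figure, which is exactly the pattern in the Brown Corpus numbers above.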

Page 16: CSA2050: Introduction to Computational Linguistics


3 Approaches to Tagging

1. Rule-based tagger: the EngCG tagger (Voutilainen 1995, 1999)

2. Stochastic tagger: HMM-based tagger

3. Transformation-based tagger: the Brill tagger (Brill 1995)
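The stochastic approach can be sketched in miniature: an HMM bigram tagger chooses the tag sequence maximising the product of transition probabilities P(tag_i | tag_i-1) and emission probabilities P(word | tag), using the Viterbi algorithm. All probabilities below are invented toy values, not estimates from any real corpus.

```python
# Minimal HMM bigram tagger sketch with hypothetical probabilities.

# Transition P(tag_i | tag_{i-1}); "<s>" is the start-of-sentence state.
trans = {
    ("<s>", "DT"): 0.6, ("<s>", "NN"): 0.2, ("<s>", "VB"): 0.2,
    ("DT", "NN"): 0.9, ("DT", "VB"): 0.1,
    ("NN", "VB"): 0.5, ("NN", "NN"): 0.5,
    ("VB", "DT"): 0.7, ("VB", "NN"): 0.3,
}
# Emission P(word | tag); unlisted pairs have probability 0.
emit = {
    ("DT", "the"): 0.7, ("NN", "dog"): 0.4,
    ("VB", "barks"): 0.3, ("NN", "barks"): 0.1,
}
TAGS = ["DT", "NN", "VB"]

def viterbi(words):
    """Return the most probable tag sequence for the word sequence."""
    # best[t] = (probability of best path ending in tag t, that path)
    best = {t: (trans.get(("<s>", t), 0.0) * emit.get((t, words[0]), 0.0), [t])
            for t in TAGS}
    for w in words[1:]:
        new = {}
        for t in TAGS:
            p, path = max(
                ((best[prev][0] * trans.get((prev, t), 0.0)
                  * emit.get((t, w), 0.0), best[prev][1]) for prev in TAGS),
                key=lambda x: x[0])
            new[t] = (p, path + [t])
        best = new
    return max(best.values(), key=lambda x: x[0])[1]

print(viterbi(["the", "dog", "barks"]))
```

A real HMM tagger estimates these tables from a tagged training corpus and works in log space to avoid underflow; the dynamic-programming structure is the same.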

Page 17: CSA2050: Introduction to Computational Linguistics

Unknown Words

1. Assume an unknown word is ambiguous among all possible tags.
   Advantage: simplicity.
   Disadvantage: ignores the fact that unknown words are unlikely to belong to a closed class.

2. Assume that the probability distribution of unknown words is the same as that of words seen just once (hapax legomena).

3. Make use of morphological information.
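Strategy 2 can be sketched directly: estimate the tag distribution for unknown words from the words seen exactly once in training. The training corpus below is hypothetical.

```python
from collections import Counter

# Hypothetical training corpus of (word, tag) pairs.
corpus = [("run", "VB"), ("run", "NN"), ("zebra", "NN"),
          ("quickly", "RB"), ("jump", "VB")]

# Words seen exactly once (hapax legomena) stand in for unknown words.
word_counts = Counter(w for w, _ in corpus)
hapax_tags = Counter(t for w, t in corpus if word_counts[w] == 1)

# P(tag | unknown word) estimated from the hapax tag distribution.
total = sum(hapax_tags.values())
p_unknown = {t: c / total for t, c in hapax_tags.items()}
print(p_unknown)
```

The intuition is that rare words in training resemble the words that will be unseen at test time: both are dominated by open-class categories such as nouns and verbs.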

Page 18: CSA2050: Introduction to Computational Linguistics

Combining Features

The last method makes use of different features, e.g. ending in -ed (suggests a past-tense verb) or an initial capital (suggests a proper noun).

Typically, a given tag is correlated with a combination of such features. These have to be incorporated into the statistical model.

Page 19: CSA2050: Introduction to Computational Linguistics

Combining Tag-Predicting Features in Unknown Words

HMM models: Weischedel et al. (1993): for each feature f and tag t (e.g. proper noun), build a probability estimator p(f|t). Assume the features are independent and multiply their probabilities together.

Samuelsson (1993), rather than preselecting features, considers all possible suffixes up to length 10 as features for predicting tags.
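The Weischedel et al. idea can be sketched as follows. The feature detectors and the p(f|t) values below are invented for illustration; in practice the estimates come from training data.

```python
# Score each tag for an unknown word by multiplying independent
# per-feature estimates p(f | t), following the Weischedel et al.
# independence assumption. Probabilities here are hypothetical.

def features(word):
    """Binary feature detectors for an unknown word."""
    return {
        "ends_in_ed": word.endswith("ed"),
        "init_cap": word[0].isupper(),
    }

# Hypothetical p(feature = True | tag).
p_f_given_t = {
    "VBD": {"ends_in_ed": 0.6, "init_cap": 0.02},   # past-tense verb
    "NNP": {"ends_in_ed": 0.01, "init_cap": 0.9},   # proper noun
}

def score(word, tag):
    """p(features | tag) under the independence assumption."""
    p = 1.0
    for f, present in features(word).items():
        pf = p_f_given_t[tag][f]
        p *= pf if present else (1.0 - pf)
    return p

for tag in p_f_given_t:
    print(tag, score("walked", tag), score("London", tag))
```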

Page 20: CSA2050: Introduction to Computational Linguistics

Combining Tag-Predicting Features in Unknown Words

Maximum Entropy (ME) Models. An ME model is a classifier which assigns a class to an observation by computing a probability from an exponential function of a weighted set of features of the observation.

An MEMM uses the Viterbi Algorithm to extend the application of ME to labelling a sequence of observations.

For further details see Ratnaparkhi (1996)
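The exponential form can be sketched directly: the probability of each tag is proportional to exp of a weighted sum of the active features, normalised over all tags (a softmax). The weights below are invented; in a real ME tagger they are learned from data, as described by Ratnaparkhi (1996).

```python
import math

# Hypothetical per-tag feature weights for an ME classifier.
weights = {
    "VBD": {"ends_in_ed": 2.0, "init_cap": -1.0},
    "NNP": {"ends_in_ed": -1.5, "init_cap": 2.5},
}

def p_tag_given_features(feats):
    """P(tag | active features) via the exponential (softmax) form."""
    scores = {t: math.exp(sum(w[f] for f in feats if f in w))
              for t, w in weights.items()}
    z = sum(scores.values())          # normalising constant
    return {t: s / z for t, s in scores.items()}

print(p_tag_given_features({"ends_in_ed"}))
```

Unlike the independence-based product of p(f|t) estimates, the ME model lets correlated features share the work through their learned weights, which is why it handles overlapping features gracefully.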

Page 21: CSA2050: Introduction to Computational Linguistics

Summary

External parameters to the tagging task are (i) the size of the chosen tagset and (ii) the coverage of the lexicon, which gives the possible tags for each word.

Two main problems: (i) disambiguation of tags and (ii) dealing with unknown words

Several methods are available for dealing with (ii): HMMs and MEMMs