CSA2050: Introduction to Computational Linguistics

Posted on 19-Jan-2016

  • CSA2050: Introduction to Computational Linguistics

    Part of Speech (POS) Tagging I: Introduction, Tagsets, Approaches

    CLINT Lecture IV

  • Acknowledgment (April 2005)

    Most slides taken from Bonnie Dorr's course notes: www.umiacs.umd.edu/~bonnie/courses/cmsc723-03, in turn based on Jurafsky & Martin, Chapter 8.


  • Bibliography

    R. Weischedel, R. Schwartz, J. Palmucci, M. Meteer and L. Ramshaw, "Coping with Ambiguity and Unknown Words through Probabilistic Models", Computational Linguistics 19(2), pp. 359-382, 1993.

    C. Samuelsson, "Morphological Tagging Based Entirely on Bayesian Inference", in 9th Nordic Conference on Computational Linguistics, NODALIDA-93, Stockholm, 1993.

    A. Ratnaparkhi, "A Maximum Entropy Model for Part-of-Speech Tagging", in Proceedings of the Conference on Empirical Methods in Natural Language Processing, 1996.


  • Outline

    The tagging task
    Tagsets
    Three different approaches


  • Definition: POS Tagging

    Part-of-speech tagging is the process of assigning a part-of-speech or other lexical class marker to each word in a corpus (Jurafsky and Martin).


  • Motivation

    Corpus analysis of tagged corpora yields useful information.
    Speech synthesis: pronunciation, e.g. CONtent (N) vs. conTENT (Adj).
    Speech recognition: word-class-based N-grams predict the category of the next word.
    Information retrieval: stemming; selection of high-content words.
    Word-sense disambiguation.


  • English Parts of Speech

    Pronoun: any substitute for a noun or noun phrase
    Adjective: any qualifier of a noun
    Verb: any action or state of being
    Adverb: any qualifier of an adjective or verb
    Preposition: any establisher of relation and syntactic context
    Conjunction: any syntactic connector
    Interjection: any emotional greeting (or "exclamation")


  • Tagsets: how detailed?

    Swedish SUC: 25 tags
    Penn Treebank: 46 tags
    German STTS: 50 tags
    Lancaster BNC: 61 tags
    Lancaster Full: 146 tags


  • Penn Treebank Tagset

    (Tagset table not reproduced; it includes tags such as PRP for personal pronoun and PRP$ for possessive pronoun.)


  • Example of Penn Treebank Tagging of Brown Corpus Sentence

    The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.

    Book/VB that/DT flight/NN ./.

    Does/VBZ that/DT flight/NN serve/VB dinner/NN ?/.
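A tagged sentence like those above is conventionally represented as a list of word/tag pairs. A minimal sketch in Python (the helper name `parse_tagged` is illustrative, not from the slides):

```python
# Parse a slash-delimited tagged sentence into (word, tag) pairs.
# The "word/TAG" format follows the Penn Treebank examples above.

def parse_tagged(sentence):
    """Split 'The/DT grand/JJ ...' into a list of (word, tag) tuples."""
    pairs = []
    for token in sentence.split():
        # rpartition splits on the LAST '/', so a token like './.'
        # yields word '.' and tag '.'.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs

tagged = parse_tagged("Book/VB that/DT flight/NN ./.")
print(tagged)  # [('Book', 'VB'), ('that', 'DT'), ('flight', 'NN'), ('.', '.')]
```

Splitting on the last slash matters because the punctuation tags themselves look like words ("./.").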


  • Two Problems

    Multiple tags for the same word
    Unknown words


  • Multiple tags for the same word

    He can can a can.

    I can light a fire and you can open a can of beans. Now the can is open, and we can eat in the light of the fire.

    Flying planes can be dangerous.


  • Multiple tags for the same word

    Words often belong to more than one word class, e.g. "this":
    This is a nice day = PRP (pronoun)
    This day is nice = DT (determiner)
    You can go this far = RB (adverb)

    Many of the most common words (by volume of text) are ambiguous.
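The ambiguity of a word like "this" can be modelled as a lexicon mapping each word to its set of possible tags. A toy sketch, with entries drawn from the examples on these slides (the lexicon itself is an illustrative assumption, not a real resource):

```python
# Toy lexicon: each word maps to the set of Penn Treebank tags it can take.
# Entries are illustrative, based on the "this" and "can" examples above.
LEXICON = {
    "this": {"PRP", "DT", "RB"},   # pronoun, determiner, adverb
    "can":  {"MD", "VB", "NN"},    # modal, verb ("can a can"), noun
    "nice": {"JJ"},
    "day":  {"NN"},
}

def is_ambiguous(word):
    """A word is ambiguous if the lexicon lists more than one possible tag."""
    return len(LEXICON.get(word, set())) > 1

print(is_ambiguous("this"))  # True
print(is_ambiguous("nice"))  # False
```

The tagging task is then to pick, for each token, one tag out of the set the lexicon allows.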


  • How Hard is the Tagging Task?

    In the Brown Corpus:
    11.5% of word types are ambiguous
    40% of word tokens are ambiguous

    Most words in English are unambiguous, but many of the most common words are ambiguous, and the possible tags of an ambiguous word are typically not equally probable.
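The observation that an ambiguous word's tags are not equally probable underlies the simplest statistical tagger: always assign each word its most frequent tag. A sketch with invented counts (the numbers stand in for corpus statistics and are not from the Brown Corpus):

```python
from collections import Counter

# Hypothetical per-word tag counts, standing in for corpus statistics.
TAG_COUNTS = {
    "can":    Counter({"MD": 90, "VB": 7, "NN": 3}),
    "flight": Counter({"NN": 100}),
}

def most_frequent_tag(word, default="NN"):
    """Unigram baseline: return the word's most frequent tag,
    falling back to a default tag for unknown words."""
    counts = TAG_COUNTS.get(word)
    if not counts:
        return default
    return counts.most_common(1)[0][0]

print(most_frequent_tag("can"))      # MD
print(most_frequent_tag("zyzzyva"))  # NN (unknown-word fallback)
```

Because skewed tag distributions are the norm, even this baseline gets most ambiguous tokens right, which is why it is a standard point of comparison for the taggers discussed below.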


  • Word Class Ambiguity (in the Brown Corpus)

    Unambiguous (1 tag): 35,340 types
    Ambiguous (2-7 tags): 4,100 types

    2 tags: 3,760 types
    3 tags: 264 types
    4 tags: 61 types
    5 tags: 12 types
    6 tags: 2 types
    7 tags: 1 type

    (DeRose, 1988)
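From the DeRose figures above, the proportion of ambiguous types can be computed directly. By this count it comes to about 10.4% (the 11.5% figure quoted earlier presumably reflects a different counting method):

```python
# Type counts from DeRose (1988), as quoted on the slide.
unambiguous = 35340
ambiguous_by_degree = {2: 3760, 3: 264, 4: 61, 5: 12, 6: 2, 7: 1}

ambiguous = sum(ambiguous_by_degree.values())  # total ambiguous types
total = unambiguous + ambiguous                # all word types
print(ambiguous, total, round(100 * ambiguous / total, 1))
# 4100 39440 10.4
```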


  • Three Approaches to Tagging

    Rule-Based Tagger: EngCG Tagger (Voutilainen 1995, 1999)

    Stochastic Tagger: HMM-based Tagger

    Transformation-Based Tagger: Brill Tagger (Brill 1995)
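As a preview of the stochastic approach, the core of an HMM tagger is Viterbi decoding: choosing the tag sequence that maximises the product of transition probabilities P(tag | previous tag) and emission probabilities P(word | tag). A toy sketch; all probabilities below are invented for illustration, not derived from any corpus:

```python
# Minimal Viterbi decoder over a toy HMM tagger.
# All probabilities are invented for illustration only.

TAGS = ["MD", "VB", "NN", "DT"]
START = {"MD": 0.3, "VB": 0.3, "NN": 0.2, "DT": 0.2}   # P(tag at position 0)
TRANS = {  # P(next_tag | tag)
    "MD": {"VB": 0.7, "DT": 0.2, "NN": 0.05, "MD": 0.05},
    "VB": {"DT": 0.5, "NN": 0.4, "VB": 0.05, "MD": 0.05},
    "NN": {"MD": 0.4, "VB": 0.3, "NN": 0.2, "DT": 0.1},
    "DT": {"NN": 0.8, "VB": 0.1, "MD": 0.05, "DT": 0.05},
}
EMIT = {  # P(word | tag)
    "MD": {"can": 0.6},
    "VB": {"can": 0.1, "book": 0.5},
    "NN": {"can": 0.1, "flight": 0.5},
    "DT": {"that": 0.6, "a": 0.4},
}

def viterbi(words):
    """Return the most probable tag sequence for the word sequence."""
    # trellis[i][tag] = (best probability of reaching tag at position i, backpointer)
    trellis = [{t: (START[t] * EMIT[t].get(words[0], 0.0), None) for t in TAGS}]
    for word in words[1:]:
        col = {}
        for t in TAGS:
            best = max((trellis[-1][p][0] * TRANS[p].get(t, 0.0), p) for p in TAGS)
            col[t] = (best[0] * EMIT[t].get(word, 0.0), best[1])
        trellis.append(col)
    # Backtrace from the best final state.
    tag = max(TAGS, key=lambda t: trellis[-1][t][0])
    path = [tag]
    for col in reversed(trellis[1:]):
        tag = col[tag][1]
        path.append(tag)
    return list(reversed(path))

print(viterbi(["book", "that", "flight"]))  # ['VB', 'DT', 'NN']
```

Note how the emission probabilities disambiguate "book" (VB here, since DT cannot emit it) while the transition probabilities enforce plausible tag orderings such as DT followed by NN.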


  • Unknown Words

    Assume each unknown word is ambiguous among all possible tags.
    Advantage: simplicity.
    Disadvantage: ignores the fact that unknown words are unlikely to belong to a closed class.

    Assume that the tag probability distribution of unknown words is the same as that of words seen just once.

    Make use of morphological information.
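The morphological strategy in the last point can be sketched as a few shape heuristics. The specific rules and thresholds below are illustrative assumptions, not taken from the slides or any particular tagger:

```python
def guess_tag(word, sentence_initial=False):
    """Guess a Penn Treebank tag for an unknown word from its shape.
    The heuristics (suffixes and capitalisation) are illustrative only."""
    if word[0].isupper() and not sentence_initial:
        return "NNP"   # mid-sentence capital suggests a proper noun
    if word.endswith("ed"):
        return "VBD"   # -ed suggests a past-tense verb
    if word.endswith("ing"):
        return "VBG"   # -ing suggests a gerund / present participle
    if word.endswith("ly"):
        return "RB"    # -ly suggests an adverb
    if word.endswith("s"):
        return "NNS"   # -s suggests a plural noun
    return "NN"        # default: open-class common noun

print(guess_tag("blicketed"))  # VBD
print(guess_tag("Framton"))    # NNP
```

Defaulting to open-class tags reflects the disadvantage noted above: new words entering the language are almost never prepositions or determiners.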


  • Combining Features

    The last method makes use of different features, e.g. ending in -ed (suggests a verb) or an initial capital (suggests a proper noun). Typically, a given tag is correlated with a combination of such features, which have to be incorporated into the statistical model.


  • Combining Tag-Predicting Features in Unknown Words: HMM Models

    Weischedel et al. (1993): for each feature f and tag t (e.g. proper noun), build a probability estimator p(f|t); assume independence and multiply the probabilities together.

    Samuelsson (1993): rather than preselecting features, considers all possible suffixes up to length 10 as features for predicting tags.
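The independence assumption in the Weischedel et al. approach amounts to scoring each tag by p(t) times the product of per-feature estimates p(f|t). A sketch with invented probability values (the numbers and feature names are illustrative, not from the paper):

```python
# Invented per-feature estimates p(feature | tag), in the spirit of
# Weischedel et al. (1993); all numbers are illustrative.
P_FEATURE_GIVEN_TAG = {
    "NNP": {"initial_capital": 0.9, "ends_in_ed": 0.01},
    "VBD": {"initial_capital": 0.05, "ends_in_ed": 0.6},
}
P_TAG = {"NNP": 0.1, "VBD": 0.08}  # invented unknown-word tag priors

def score(tag, features):
    """p(t) * product of p(f|t), under the independence assumption.
    'features' maps feature name -> whether it is present in the word."""
    p = P_TAG[tag]
    for f, present in features.items():
        pf = P_FEATURE_GIVEN_TAG[tag][f]
        p *= pf if present else (1.0 - pf)
    return p

features = {"initial_capital": True, "ends_in_ed": False}
best = max(P_TAG, key=lambda t: score(t, features))
print(best)  # NNP
```

Multiplying independent estimates keeps the model simple, at the cost of ignoring correlations between features, which is exactly the gap the maximum entropy models on the next slide address.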


  • Combining Tag-Predicting Features in Unknown Words: Maximum Entropy (ME) Models

    An ME model is a classifier which assigns a class to an observation by computing a probability from an exponential function of a weighted set of features of the observation.

    An MEMM uses the Viterbi algorithm to extend the application of ME to labelling a sequence of observations.

    For further details see Ratnaparkhi (1996).
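Concretely, the "exponential function of a weighted set of features" is a normalised exponential (softmax) over weighted feature sums. A self-contained sketch with invented weights (the weight values and feature names are illustrative assumptions):

```python
import math

# Invented feature weights lambda(tag, feature) for a toy ME classifier.
WEIGHTS = {
    ("VBD", "ends_in_ed"): 2.0,
    ("NNP", "initial_capital"): 2.5,
    ("NN", "lowercase"): 1.0,
}
TAGS = ["VBD", "NNP", "NN"]

def me_probs(features):
    """p(t | observation) = exp(sum of weights of active features) / Z,
    where Z normalises the scores into a probability distribution."""
    scores = {
        t: math.exp(sum(WEIGHTS.get((t, f), 0.0) for f in features))
        for t in TAGS
    }
    z = sum(scores.values())
    return {t: s / z for t, s in scores.items()}

probs = me_probs({"ends_in_ed"})
print(max(probs, key=probs.get))  # VBD
```

Unlike the independence-based HMM estimators above, the weights here can be trained jointly, so correlated features do not get double-counted.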


  • Summary

    External parameters to the tagging task are (i) the size of the chosen tagset and (ii) the coverage of the lexicon, which gives the possible tags for each word.

    Two main problems: (i) disambiguation of tags and (ii) dealing with unknown words.

    Several methods are available for dealing with (ii): HMMs and MEMMs.


    (Speaker notes)

    POS tagging is useful for determining which words are likely to occur in the vicinity of other words. For example, if we know the difference between possessive pronouns (my, your, etc.) and personal pronouns (I, you, he, etc.), we have some idea of what the next word will be: a possessive pronoun is likely to be followed by a noun, a personal pronoun by a verb.

    This is useful for speech processing: speech synthesis pronunciation (polish vs. Polish, one NN and one JJ; CONtent [NN] vs. conTENT [JJ]); speech recognition with class-based N-grams (we saw this with [n iy], which could be "need" after PRP "I" or "knee" after PRP$ "my").

    Other NLP applications: information retrieval, where stemming benefits (knowing the POS helps tell us which morphological affixes a word takes) and high-content words can be selected (allowing us to ignore function words like "the"); word-sense disambiguation, e.g. flies [NN] vs. flies [VBZ]; partial parsing of text; and corpus analysis and lexicography, finding instances or frequencies of particular constructions in large corpora, which can help us build lexicons.

    The word with 7 tags is still according to J&M, p. 299.
