CSA2050: Natural Language Processing, Tagging 2 (CSA3050: Tagging II, February 2007)


Post on 17-Dec-2015


TRANSCRIPT

  • Slide 1
  • February 2007, CSA3050: Tagging II. CSA2050: Natural Language Processing. Tagging 2: Rule-Based Tagging; Stochastic Tagging; Hidden Markov Models (HMMs); N-Grams
  • Slide 2
  • Tagging 2. Lecture slides based on notes by Mike Rosner and Marti Hearst, with additions from the NLTK tutorials.
  • Slide 3
  • Rule-Based Tagger. Basic idea: assign all possible tags to each word, then remove tags according to a set of rules. Example rule: if word+1 is an adjective, adverb, or quantifier, and the word after that is a sentence boundary, and word-1 is not a verb like "consider", then eliminate the non-adverb tags; else eliminate the adverb tag. Taggers of this kind typically use more than 1000 hand-written rules, but the rules may also be machine-learned.
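
The two steps on this slide (assign every possible tag, then eliminate tags by rule) can be sketched in Python as follows. This is a minimal illustration, not the tagger used in the course. The toy lexicon, the tag names (`ADJ`, `ADV`, `QUANT`, `SENT-LIM`, `SVOC/A`), and the function names are all hypothetical, and the sketch assumes the example rule is applied to the ambiguous word "that", which the slide does not state.

```python
# Hypothetical toy lexicon: each word maps to every tag it may carry.
# "SVOC/A" marks a verb like "consider" that takes adjective complements.
LEXICON = {
    "i": {"PRON"},
    "it": {"PRON"},
    "isn't": {"V"},
    "consider": {"SVOC/A"},
    "that": {"ADV", "PRON", "DET", "CS"},
    "odd": {"ADJ"},
    ".": {"SENT-LIM"},
}


def assign_all_tags(tokens):
    """Step 1: give every token all tags listed for it in the lexicon."""
    return [set(LEXICON.get(tok.lower(), {"UNK"})) for tok in tokens]


def apply_example_rule(candidates, i):
    """Step 2: apply the slide's example rule to the token at position i."""

    def tags_at(j):
        # Positions outside the sentence carry no tags.
        return candidates[j] if 0 <= j < len(candidates) else set()

    next_is_modifier = bool(tags_at(i + 1) & {"ADJ", "ADV", "QUANT"})
    then_boundary = "SENT-LIM" in tags_at(i + 2)
    prev_not_consider = "SVOC/A" not in tags_at(i - 1)

    if next_is_modifier and then_boundary and prev_not_consider:
        kept = candidates[i] & {"ADV"}   # eliminate non-adverb tags
    else:
        kept = candidates[i] - {"ADV"}   # eliminate the adverb tag
    if kept:                             # never leave a token with no tags
        candidates[i] = kept


def rule_based_tag(tokens, target="that"):
    """Tag tokens; `target` (an assumption) is the word the rule applies to."""
    candidates = assign_all_tags(tokens)
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            apply_example_rule(candidates, i)
    return list(zip(tokens, candidates))


if __name__ == "__main__":
    for sentence in (["it", "isn't", "that", "odd", "."],
                     ["I", "consider", "that", "odd", "."]):
        tagged = dict(rule_based_tag(sentence))
        print(sentence, "->", sorted(tagged["that"]))
```

In the first sentence all three conditions hold, so only the adverb reading of "that" survives. In the second, the preceding verb is "consider", so the adverb reading is removed and the remaining readings are left for other rules to resolve.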
  • Slide 4
  • ENGTWOL. Based on two-level morphology. 56,000 entries for English word stems. Each entry is annotated with morphological and syntactic features.
  • Slide 5
  • Sample ENGTWOL Lexicon
  • Slide 6