Parts of Speech - Carnegie Mellon University
demo.clab.cs.cmu.edu/11711fa19/slides/fa19...



  • Parts of Speech

  • More Fine-Grained Classes

    Actually, I ran home extremely quickly yesterday

  • The closed classes

  • Example of POS tagging

  • The Penn Treebank Part-of-Speech Tagset

  • The Universal POS tagset

    https://universaldependencies.org/
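As a quick illustration of the two tagsets, here is a minimal sketch using NLTK (assuming NLTK and its tagger models are installed; resource names vary slightly across NLTK versions):

```python
# Contrast Penn Treebank tags with the coarser Universal POS tags.
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)
nltk.download("universal_tagset", quiet=True)

tokens = nltk.word_tokenize("There are 70 children there")
print(nltk.pos_tag(tokens))                      # Penn Treebank tags, e.g. EX, VBP, CD, NNS, RB
print(nltk.pos_tag(tokens, tagset="universal"))  # Universal tags, e.g. DET, VERB, NUM, NOUN, ADV
```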

  • POS tagging

    goal: resolve POS ambiguities


  • Most Frequent Class Baseline

    Training on the WSJ corpus and testing on sections 22-24 of the same corpus, the most-frequent-tag baseline achieves an accuracy of 92.34%.

    ▪ ~97% tag accuracy is achievable by most algorithms (HMMs, MEMMs, neural networks, rule-based algorithms)
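The baseline is simple enough to fit in a few lines. A minimal sketch, assuming the training data is a flat list of (word, tag) pairs (a real run would read the WSJ sections):

```python
# Most-frequent-tag baseline: tag each word with the tag it appeared with
# most often in training; unknown words fall back to the globally most
# frequent tag (one common, simple choice).
from collections import Counter, defaultdict

def train_baseline(tagged_corpus):
    counts = defaultdict(Counter)          # word -> Counter over its tags
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
    global_best = Counter(t for _, t in tagged_corpus).most_common(1)[0][0]
    best_tag = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    return lambda w: best_tag.get(w, global_best)

tag = train_baseline([("the", "DT"), ("back", "NN"),
                      ("back", "RB"), ("back", "NN")])
print(tag("back"), tag("unseen-word"))     # NN NN
```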

  • Why POS tagging

    ▪ Text-to-speech: record, lead, protest

    ▪ Lemmatization: saw/V → see, saw/N → saw

    ▪ Preprocessing for harder disambiguation problems
      ▪ syntactic parsing
      ▪ semantic parsing

  • Generative sequence labeling: Hidden Markov Models

  • Hidden Markov Models

    ▪ In the real world many events are not observable
      ▪ Speech recognition: we observe acoustic features but not the phones
      ▪ POS tagging: we observe words but not the POS tags

    [Diagram: hidden states q1, q2, ..., qn emitting observations o1, o2, ..., on]

  • HMM

    From J&M

  • HMM example

    From J&M

  • HMMs: Algorithms

    From J&M

    Forward (likelihood: computing P(O | λ))

    Viterbi (decoding: finding the most likely hidden state sequence)

    Forward–Backward / Baum–Welch (learning: estimating the HMM parameters)

  • HMM tagging as decoding

    How many possible choices?
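The slide's question has a concrete answer: with a tagset of size |T| and an n-word sentence there are |T|^n candidate tag sequences, far too many to enumerate. Decoding searches this space for the best one; in J&M's bigram-HMM formulation:

```latex
\hat{t}_{1:n} = \operatorname*{argmax}_{t_{1:n}} P(t_{1:n} \mid w_{1:n})
             \approx \operatorname*{argmax}_{t_{1:n}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
```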

  • Part of speech tagging example

    Slide credit: Noah Smith

  • The Viterbi Algorithm
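A minimal sketch of Viterbi decoding for a bigram HMM tagger. The toy start/transition/emission tables are made up for illustration; a real tagger would estimate them from a treebank:

```python
# Viterbi decoding with log-probabilities and backpointers.
import math

tags = ["NN", "VB"]
start = {"NN": 0.7, "VB": 0.3}                    # P(tag | <s>)
trans = {"NN": {"NN": 0.3, "VB": 0.7},            # P(tag_i | tag_{i-1})
         "VB": {"NN": 0.8, "VB": 0.2}}
emit = {"NN": {"back": 0.6, "fish": 0.4},         # P(word | tag)
        "VB": {"back": 0.4, "fish": 0.6}}

def viterbi(words):
    # v[i][t] = log-prob of the best tag sequence for words[:i+1] ending in t
    v = [{t: math.log(start[t]) + math.log(emit[t][words[0]]) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        v.append({})
        back.append({})
        for t in tags:
            best_prev = max(tags, key=lambda p: v[i-1][p] + math.log(trans[p][t]))
            v[i][t] = (v[i-1][best_prev] + math.log(trans[best_prev][t])
                       + math.log(emit[t][words[i]]))
            back[i][t] = best_prev
    # follow backpointers from the best final state
    last = max(tags, key=lambda t: v[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

print(viterbi(["fish", "back"]))  # ['VB', 'NN'] with these toy numbers
```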

  • Beam search
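Beam search approximates Viterbi by keeping only the k best partial tag sequences at each position instead of the full chart. A sketch reusing the toy start/trans/emit tables and the math import from the Viterbi sketch above:

```python
# Beam search decoding: keep only the beam_width best partial hypotheses
# (path, log-score) at each time step.
def beam_search(words, beam_width=2):
    beam = [([t], math.log(start[t]) + math.log(emit[t][words[0]]))
            for t in tags]
    beam = sorted(beam, key=lambda h: h[1], reverse=True)[:beam_width]
    for w in words[1:]:
        candidates = [(path + [t],
                       score + math.log(trans[path[-1]][t])
                             + math.log(emit[t][w]))
                      for path, score in beam for t in tags]
        beam = sorted(candidates, key=lambda h: h[1], reverse=True)[:beam_width]
    return beam[0][0]  # best surviving complete hypothesis
```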

  • HMMs: Algorithms

    From J&M

    Forward

    Viterbi

    Forward–Backward; Baum–Welch

  • The Forward Algorithm

    sum instead of max
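In J&M's notation the only change from Viterbi is the aggregation over previous states, max becomes sum:

```latex
v_t(j) = \max_{i=1}^{N} v_{t-1}(i)\, a_{ij}\, b_j(o_t)
\qquad\text{vs.}\qquad
\alpha_t(j) = \sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij}\, b_j(o_t)
```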

  • Viterbi

    ▪ n-best decoding
    ▪ relationship to sequence alignment

  • Extending the HMM Algorithm to Trigrams
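A trigram HMM replaces the bigram transition with P(t_i | t_{i-2}, t_{i-1}); because trigram tag counts are sparse, taggers such as Brants (2000) smooth them with deleted interpolation:

```latex
P(t_i \mid t_{i-2}, t_{i-1}) =
\lambda_3\, \hat{P}(t_i \mid t_{i-2}, t_{i-1})
+ \lambda_2\, \hat{P}(t_i \mid t_{i-1})
+ \lambda_1\, \hat{P}(t_i),
\qquad \lambda_1 + \lambda_2 + \lambda_3 = 1
```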

  • Unknown Words

    ▪ Word shape
      ▪ lower case → x
      ▪ upper case → X
      ▪ numbers → d
      ▪ punctuation → .
      ▪ I.M.F → X.X.X
      ▪ DC10-30 → XXdd-dd

    ▪ Shorter word shape: runs of consecutive identical character types are collapsed
      ▪ DC10-30 → Xd-d

    ▪ Prefixes & suffixes
      ▪ -s, -ed, -ing
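A minimal sketch of these shape features, assuming punctuation is left as-is (which matches the slide's DC10-30 example):

```python
# Word-shape features for unknown-word handling: shape() maps character
# classes, short_shape() collapses runs of the same class.
import re

def shape(word):
    s = re.sub(r"[a-z]", "x", word)
    s = re.sub(r"[A-Z]", "X", s)
    s = re.sub(r"[0-9]", "d", s)
    return s          # punctuation kept as-is, per the slide's examples

def short_shape(word):
    return re.sub(r"(.)\1+", r"\1", shape(word))  # collapse repeated classes

print(shape("DC10-30"))        # XXdd-dd
print(short_shape("DC10-30"))  # Xd-d
print(shape("I.M.F"))          # X.X.X
```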

  • Brants (2000)

    ▪ a trigram HMM
    ▪ handling unknown words
    ▪ 96.7% on the Penn Treebank

  • Generative vs. Discriminative models

    ▪ Generative models specify a joint distribution over the labels and the data. With such a model you can generate new data.

    ▪ Discriminative models specify the conditional distribution of the label y given the data x. These models focus on how to discriminate between the classes.

    From Bamman
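In symbols, the distinction on this slide is:

```latex
\text{generative: model } P(x, y) = P(y)\,P(x \mid y)
\qquad\qquad
\text{discriminative: model } P(y \mid x) \text{ directly}
```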

  • Maximum Entropy Markov Models (MEMM)

    ▪ HMM

    ▪ MEMM
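The two bullets correspond to the standard factorizations (in J&M's notation): the HMM scores each step with an emission probability times a transition probability, while the MEMM conditions the tag directly on the word:

```latex
\text{HMM:}\quad \hat{T} = \operatorname*{argmax}_{T} \prod_{i} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
\qquad
\text{MEMM:}\quad \hat{T} = \operatorname*{argmax}_{T} \prod_{i} P(t_i \mid w_i, t_{i-1})
```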

  • Features in a MEMM

    ▪ well-dressed

  • Decoding and Training MEMMs

  • Decoding MEMMs

    ▪ greedy approach: doesn’t use evidence from future decisions
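A sketch of greedy MEMM decoding. Here `p_tag` is a hypothetical stand-in for a trained MaxEnt classifier that returns a distribution over tags given the current word and previous tag; it is not a library call:

```python
# Greedy left-to-right decoding: commit to the best tag at each position.
def greedy_decode(words, p_tag):
    prev, tags_out = "<s>", []
    for w in words:
        dist = p_tag(w, prev)            # hypothetical: {tag: P(tag | w, prev)}
        prev = max(dist, key=dist.get)   # commit immediately
        tags_out.append(prev)
    return tags_out                      # no lookahead: early errors propagate
```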

  • Decoding MEMMs

    ▪ Viterbi
    ▪ filling the chart with the per-step scores
      ▪ HMM: P(w_i | t_i) P(t_i | t_{i-1})
      ▪ MEMM: P(t_i | w_i, t_{i-1})

  • Bidirectionality

    ▪ Label bias or observation bias problem
      ▪ will/NN to/TO fight/VB

    ▪ Remedies:
      ▪ Linear-chain CRF (Lafferty et al. 2001)
      ▪ A bidirectional version of the MEMM (Toutanova et al. 2003)
      ▪ bi-LSTM
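For reference, the linear-chain CRF normalizes over whole tag sequences rather than per state, which is what sidesteps the label bias problem (Z(W) is the global partition function):

```latex
P(T \mid W) = \frac{1}{Z(W)} \exp\!\Bigl(\sum_{i=1}^{n} \sum_{k} \lambda_k\, f_k(t_{i-1}, t_i, W, i)\Bigr)
```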

  • Neural sequence tagger

    ▪ Lample et al. 2016: Neural Architectures for NER
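A minimal bi-LSTM tagger sketch in PyTorch. The dimensions and the plain softmax output layer are illustrative; Lample et al.'s full model additionally uses character-level embeddings and a CRF output layer:

```python
# Minimal bi-LSTM sequence tagger (a sketch, not the full Lample et al.
# architecture).
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, tagset_size, emb_dim=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(2 * hidden, tagset_size)  # 2x: both directions

    def forward(self, word_ids):                 # (batch, seq_len)
        h, _ = self.lstm(self.embed(word_ids))   # (batch, seq_len, 2*hidden)
        return self.out(h)                       # per-token tag logits

model = BiLSTMTagger(vocab_size=10000, tagset_size=17)
logits = model(torch.randint(0, 10000, (1, 6)))  # one 6-token sentence
print(logits.argmax(-1))                         # predicted tag ids
```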

  • Multilingual POS tagging

    ▪ In morphologically-rich languages like Czech, Hungarian, Turkish:
      ▪ a 250,000 word token corpus of Hungarian has more than twice as many word types as a similarly sized corpus of English
      ▪ a 10 million word token corpus of Turkish contains four times as many word types as a similarly sized English corpus

    ▪ ⇒ many UNKs
    ▪ more information is coded in morphology

  • Multilingual POS tagging

    ▪ In non-word-space languages like Chinese, word segmentation is either applied before tagging or done jointly
    ▪ UNKs are difficult: the majority of unknown words are common nouns and verbs because of extensive compounding

    ▪ The Universal POS tagset accounts for cross-linguistic differences

  • Named Entity Recognition

  • Named Entity tags

  • Ambiguity in NER

  • NER as Sequence Labeling

    IOB tagging scheme
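A sketch of the IOB scheme: entity spans become B- (begin) and I- (inside) tags and everything else O. The helper and its span format below are illustrative, not a standard API:

```python
# Convert token-level entity spans to IOB tags; spans are illustrative
# (start_token, end_token_exclusive, type) tuples.
def to_iob(tokens, spans):
    tags_out = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags_out[start] = "B-" + etype
        for i in range(start + 1, end):
            tags_out[i] = "I-" + etype
    return tags_out

tokens = ["American", "Airlines", "praised", "Tim", "Wagner"]
print(to_iob(tokens, [(0, 2, "ORG"), (3, 5, "PER")]))
# ['B-ORG', 'I-ORG', 'O', 'B-PER', 'I-PER']
```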

  • A feature-based algorithm for NER

    ▪ gazetteers
      ▪ a list of place names providing millions of entries for locations with detailed geographical and political information
      ▪ binary indicator features

  • Evaluation of NER

    ▪ F-score
    ▪ segmentation is a confound
      ▪ e.g., American/B-ORG Airlines: tagging Airlines as O instead of I-ORG counts as 2 errors, a false positive for O and a false negative for I-ORG
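Entity-level evaluation compares whole (type, span) entities rather than token tags, which is why a segmentation error costs a full entity. A minimal sketch:

```python
# Entity-level F1: an entity counts as correct only if both its span and
# its type match the gold annotation exactly.
def entities(iob_tags):
    ents, start = set(), None
    for i, t in enumerate(iob_tags + ["O"]):     # sentinel flushes last span
        if start is not None and not t.startswith("I-"):
            ents.add((iob_tags[start][2:], start, i))
            start = None
        if t.startswith("B-"):
            start = i
    return ents

def f1(gold, pred):
    g, p = entities(gold), entities(pred)
    tp = len(g & p)
    prec = tp / len(p) if p else 0.0
    rec = tp / len(g) if g else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

print(f1(["B-ORG", "I-ORG", "O"], ["B-ORG", "O", "O"]))  # 0.0: wrong span
```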

  • HMMs in Automatic Speech Recognition

    ssssssssppppeeeeeeetshshshshllllaeaeaebbbbb

    “speech lab” (one phone symbol per acoustic frame; each phone spans many frames)

  • HMMs in Automatic Speech Recognition

    [Diagram: words w1, w2 (language model) generate a sequence of sound types s1-s7, which generate acoustic observations a1-a7 (acoustic model)]
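The diagram corresponds to the standard noisy-channel decomposition for speech recognition, with the language model scoring word sequences and the acoustic model scoring the observed acoustics given the words:

```latex
\hat{W} = \operatorname*{argmax}_{W} P(W \mid A)
        = \operatorname*{argmax}_{W}\ \underbrace{P(A \mid W)}_{\text{acoustic model}}\ \underbrace{P(W)}_{\text{language model}}
```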