CS388: Natural Language Processing Lecture 4: Sequence Models
Eunsol Choi
Parts of this lecture adapted from Greg Durrett, Yejin Choi, Yoav Artzi
Logistics
‣ HW1 due today midnight
‣ HW2 will be released tomorrow, due September 30th
‣ Materials needed to do HW2 will be covered by next Tuesday
Sequence Models
‣ Topics for the next three lectures and HW2
‣ We will return to neural sequence models in a few weeks
Overview
‣ Sequence Modeling Problems in NLP
‣ Generative Model: Hidden Markov Models (HMM)
‣ Discriminative Models: Maximum Entropy Markov Models (MEMM), Conditional Random Fields (CRF)
‣ Unsupervised Learning: Expectation Maximization
Reading
‣ Collins: HMMs → Generative sequence tagging model
‣ Collins: MEMMs → Discriminative sequence tagging models
‣ Collins: EM → Expectation Maximization
‣ J&M: Chapter 8 (optional)
  ‣ Covers both HMM and MEMM
The Structure of Language
‣ Language is tree-structured
I ate the spaghetti with chopsticks        I ate the spaghetti with meatballs
‣ But a labelled sequence can provide shallow analysis
I/PRP ate/VBD the/DT spaghetti/NN with/IN chopsticks/NNS        I/PRP ate/VBD the/DT spaghetti/NN with/IN meatballs/NNS
Sequence Modeling Problems in NLP
‣ Parts of Speech Tagging (POS)
I/PRP ate/VBD the/DT spaghetti/NN with/IN chopsticks/NNS        I/PRP ate/VBD the/DT spaghetti/NN with/IN meatballs/NNS
‣ Named Entity Recognition (NER): segment text into spans with certain properties (person, organization, …)
[Germany]LOC ’s representative to the [European Union]ORG ’s veterinary committee [Werner Zwingman]PER said on Wednesday consumers should…
Germany/BL ’s/NA representative/NA to/NA the/NA European/BO Union/CO ’s/NA veterinary/NA committee/NA Werner/BP Zwingman/CP said/NA on/NA Wednesday/NA consumers/NA should/NA…
Parts of Speech
Slide credit: Dan Klein
‣ Categorization of words into types
| Tag | Description | Examples |
| --- | --- | --- |
| CC | conjunction, coordinating | and both but either or |
| CD | numeral, cardinal | mid-1890 nine-thirty 0.5 one |
| DT | determiner | a all an every no that the |
| EX | existential there | there |
| FW | foreign word | gemeinschaft hund ich jeux |
| IN | preposition or conjunction, subordinating | among whether out on by if |
| JJ | adjective or numeral, ordinal | third ill-mannered regrettable |
| JJR | adjective, comparative | braver cheaper taller |
| JJS | adjective, superlative | bravest cheapest tallest |
| MD | modal auxiliary | can may might will would |
| NN | noun, common, singular or mass | cabbage thermostat investment subhumanity |
| NNP | noun, proper, singular | Motown Cougar Yvette Liverpool |
| NNPS | noun, proper, plural | Americans Materials States |
| NNS | noun, common, plural | undergraduates bric-a-brac averages |
| POS | genitive marker | ' 's |
| PRP | pronoun, personal | hers himself it we them |
| PRP$ | pronoun, possessive | her his mine my our ours their thy your |
| RB | adverb | occasionally maddeningly adventurously |
| RBR | adverb, comparative | further gloomier heavier less-perfectly |
| RBS | adverb, superlative | best biggest nearest worst |
| RP | particle | aboard away back by on open through |
| TO | "to" as preposition or infinitive marker | to |
| UH | interjection | huh howdy uh whammo shucks heck |
| VB | verb, base form | ask bring fire see take |
| VBD | verb, past tense | pleaded swiped registered saw |
| VBG | verb, present participle or gerund | stirring focusing approaching erasing |
| VBN | verb, past participle | dilapidated imitated reunified unsettled |
| VBP | verb, present tense, not 3rd person singular | twist appear comprise mold postpone |
| VBZ | verb, present tense, 3rd person singular | bases reconstructs marks uses |
| WDT | WH-determiner | that what whatever which whichever |
| WP | WH-pronoun | that what whatever which who whom |
| WP$ | WH-pronoun, possessive | whose |
| WRB | WH-adverb | however whenever where why |
POS Tagging
‣ The POS tagging problem is to determine the POS tag for a particular instance of a word.
‣ Many words have more than one POS, depending on their context:
The back door = JJ (adjective)
On my back = NN (noun)
Win the voters back = RB (adverb)
Promised to back the bill = VB (verb)
Sources of Information
‣ Knowledge of neighboring words
‣ Knowledge of word probabilities
  ‣ the, a, an are almost always articles
  ‣ man is frequently a noun, rarely used as a verb
Time flies like an arrow; Fruit flies like a banana
‣ If we choose the most frequent tag: over 90% accuracy
‣ About 40% of word tokens are ambiguous
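The most-frequent-tag baseline can be written in a few lines; a minimal sketch, where the toy training corpus and the NN fallback for unseen words are illustrative assumptions:

```python
from collections import Counter, defaultdict

# Most-frequent-tag baseline: tag each word with the tag it received
# most often in a tagged training corpus (toy data for illustration).
train = [[("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
         [("the", "DT"), ("back", "NN"), ("door", "NN")],
         [("go", "VB"), ("back", "RB"), ("now", "RB")],
         [("the", "DT"), ("back", "NN"), ("room", "NN")]]

counts = defaultdict(Counter)
for sent in train:
    for word, tag in sent:
        counts[word][tag] += 1

def baseline_tag(word):
    # Unseen words fall back to NN, the most common open-class tag.
    return counts[word].most_common(1)[0][0] if word in counts else "NN"

print(baseline_tag("back"))  # "back" is NN twice, RB once -> NN
```

Note that this baseline ignores context entirely, which is exactly what the ambiguous "back" examples above show it cannot handle in general.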
What is this good for?
‣ Preprocessing step for syntactic parsers
‣ Domain-independent disambiguation for other tasks
‣ (Very) shallow information extraction: write regular expressions like (Det) Adj* N+ over the output to match phrases
POS tag sets in different languages
[Petrov et al. 2012]
Universal POS Tag Set
‣ Universal POS tagset (~12 tags), cross-lingual model works well!
Gillick et al. 2016
Today
‣ Sequence Modeling Problems in NLP
‣ Hidden Markov Models (HMM)
‣ Inference (Viterbi)
‣ HMM parameter estimation
Classic Solution: Hidden Markov Models
‣ Input x = (x1, ..., xn); Output y = (y1, ..., yn)
Two simplifying assumptions
‣ Markov Assumption (the future is conditionally independent of the past given the present):

P(y_i | y_1, y_2, …, y_{i−1}) = P(y_i | y_{i−1})

‣ Independence Assumption (each observation depends only on its own state):

P(x_i | x, y) = P(x_i | y_i)
HMM for POS
The Georgia branch had taken on loan commitments …
DT NNP NN VBD VBN RP NN NNS
‣ States Y = {DT, NNP, NN, ...} are the POS tags
‣ Observations X = V are words
‣ Transition distribution q(y_i | y_{i−1}) models the tag sequences
‣ Emission distribution e(x_i | y_i) models words given their POS
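Under these assumptions, the joint probability of a tagged sentence factorizes into a product of transition terms q and emission terms e. A minimal sketch, where the toy probability tables are made up for illustration:

```python
# Joint probability of a tagged sentence under an HMM:
#   p(x, y) = q(y1|START) e(x1|y1) q(y2|y1) e(x2|y2) ... q(STOP|yn)
# Toy transition and emission tables (made-up numbers for illustration).
q = {("START", "DT"): 0.5, ("DT", "NN"): 0.8, ("NN", "STOP"): 0.4}
e = {("DT", "the"): 0.6, ("NN", "dog"): 0.1}

def joint_prob(words, tags):
    p = 1.0
    prev = "START"
    for word, tag in zip(words, tags):
        p *= q[(prev, tag)] * e[(tag, word)]
        prev = tag
    return p * q[(prev, "STOP")]  # transition into the STOP state

print(joint_prob(["the", "dog"], ["DT", "NN"]))  # 0.5*0.6*0.8*0.1*0.4 = 0.0096
```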
HMM Learning and Inference
‣ Learning: maximum likelihood estimation of the transition distribution q and emission distribution e
‣ Inference: Viterbi algorithm computes argmax_y P(y | x)
Learning: Maximum Likelihood
‣ Supervised learning: estimate transitions and emissions by counting:

q(y′ | y) = count(y → y′) / count(y)        e(x | y) = count(y, x) / count(y)

‣ Any concerns about the quality of any of these estimates?
Sparsity again!
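These maximum-likelihood estimates are just normalized counts over a tagged corpus. A sketch, where the corpus format and the START/STOP bookkeeping are illustrative assumptions:

```python
from collections import Counter

# Maximum-likelihood HMM estimation from tagged sentences:
#   q(y' | y) = count(y -> y') / count(y)
#   e(x | y)  = count(y, x)   / count(y)
def estimate_hmm(tagged_sentences):
    trans, emit, tag_count = Counter(), Counter(), Counter()
    for sent in tagged_sentences:
        prev = "START"
        for word, tag in sent:
            trans[(prev, tag)] += 1
            emit[(tag, word)] += 1
            tag_count[tag] += 1
            prev = tag
        trans[(prev, "STOP")] += 1  # every sentence ends with a STOP transition
    # Each tag's outgoing-transition count equals its occurrence count;
    # START occurs once per sentence.
    q = {(y, y2): c / (tag_count[y] if y != "START" else len(tagged_sentences))
         for (y, y2), c in trans.items()}
    e = {(y, x): c / tag_count[y] for (y, x), c in emit.items()}
    return q, e

corpus = [[("the", "DT"), ("dog", "NN")], [("a", "DT"), ("cat", "NN")]]
q, e = estimate_hmm(corpus)
print(e[("DT", "the")])  # "the" is 1 of 2 DT emissions -> 0.5
```

Any (tag, word) pair unseen in training gets probability zero here, which is exactly the sparsity concern raised above.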
Learning: Low-Frequency Words
‣ Replace low-frequency words with word classes [Bikel et al., 1999] (named-entity recognition):

| Word class | Example | Intuition |
| --- | --- | --- |
| twoDigitNum | 90 | Two-digit year |
| fourDigitNum | 1990 | Four-digit year |
| containsDigitAndAlpha | A8956-67 | Product code |
| containsDigitAndDash | 09-96 | Date |
| containsDigitAndSlash | 11/9/89 | Date |
| containsDigitAndComma | 23,000.00 | Monetary amount |
| containsDigitAndPeriod | 1.00 | Monetary amount, percentage |
| othernum | 456789 | Other number |
| allCaps | BBN | Organization |
| capPeriod | M. | Person name initial |
| firstWord | first word of sentence | No useful capitalization information |
| initCap | Sally | Capitalized word |
| lowercase | can | Uncapitalized word |
| other | , | Punctuation marks, all other words |
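A sketch of mapping an infrequent word to one of these classes; the rule order is an illustrative assumption, and the original system's exact rules may differ:

```python
import re

# Coarse word classes in the style of [Bikel et al., 1999] (a sketch).
def word_class(word, is_first_word=False):
    if word.isdigit():
        if len(word) == 2:
            return "twoDigitNum"
        if len(word) == 4:
            return "fourDigitNum"
        return "othernum"
    has_digit = any(c.isdigit() for c in word)
    if has_digit and any(c.isalpha() for c in word):
        return "containsDigitAndAlpha"
    if has_digit and "-" in word:
        return "containsDigitAndDash"
    if has_digit and "/" in word:
        return "containsDigitAndSlash"
    if has_digit and "," in word:
        return "containsDigitAndComma"
    if has_digit and "." in word:
        return "containsDigitAndPeriod"
    if re.fullmatch(r"[A-Z]\.", word):  # check before allCaps: "M." is all caps too
        return "capPeriod"
    if word.isupper():
        return "allCaps"
    if is_first_word:
        return "firstWord"
    if word[:1].isupper():
        return "initCap"
    if word.islower():
        return "lowercase"
    return "other"

print(word_class("A8956-67"))  # containsDigitAndAlpha
```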
Inference (Decoding)
‣ Inference problem: given input x = (x1, ..., xn), find the best output y = (y1, ..., yn)

argmax_y P(y | x) = argmax_y P(y, x) / P(x)

‣ We could list all possible y and then pick the best one! Any problems?
‣ There are |T|^n candidate tag sequences, exponential in the sentence length
‣ First solution: beam search
  ‣ A beam is a set of partial hypotheses
  ‣ Start with a single empty trajectory
  ‣ At each step, consider all continuations, discard most, keep top K
‣ But this does not guarantee the optimal answer…
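The beam procedure can be sketched over tag sequences as follows; the local scoring function `score` is a hypothetical stand-in, e.g. log q(tag | prev) + log e(word | tag):

```python
# Beam search over tag sequences (a sketch).
# score(prev_tag, tag, words, i) is a local log-score supplied by the model.
def beam_search(words, tags, score, k=2):
    beam = [(0.0, ["START"])]  # (log-score, partial tag sequence)
    for i, _ in enumerate(words):
        # Consider every continuation of every hypothesis in the beam ...
        candidates = [(s + score(seq[-1], t, words, i), seq + [t])
                      for s, seq in beam for t in tags]
        # ... then discard most, keeping only the top-k.
        beam = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    best_score, best_seq = max(beam, key=lambda c: c[0])
    return best_seq[1:], best_score  # drop the START symbol
```

A hypothesis pruned early can never be recovered, which is why the slide notes the answer is not guaranteed optimal.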
The Viterbi Algorithm
‣ Dynamic program for computing the max score π(i, y_i) of a tag sequence of length i ending in tag y_i
‣ Now this is an efficient algorithm!
‣ Dynamic program for computing π(i, y_i) for all i
‣ Iterative computation, for i = 1 … n and each tag y_i:
  ‣ Store score: π(i, y_i) = max over y_{i−1} of π(i−1, y_{i−1}) · q(y_i | y_{i−1}) · e(x_i | y_i)
  ‣ Store back-pointer: bp(i, y_i) = the argmax y_{i−1} of the same expression
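The iterative computation above can be sketched directly, using dictionary-based q and e tables as in the earlier estimation step (missing entries are treated as probability zero):

```python
# Viterbi decoding for an HMM (a sketch).
# q[(prev, tag)] and e[(tag, word)] are transition/emission probabilities.
def viterbi(words, tags, q, e):
    pi = [{}]  # pi[i][t]: max score of a length-(i+1) sequence ending in tag t
    bp = [{}]  # bp[i][t]: best previous tag (back-pointer)
    for t in tags:
        pi[0][t] = q.get(("START", t), 0.0) * e.get((t, words[0]), 0.0)
        bp[0][t] = "START"
    for i in range(1, len(words)):
        pi.append({})
        bp.append({})
        for t in tags:
            best_prev, best = max(
                ((p, pi[i - 1][p] * q.get((p, t), 0.0)) for p in tags),
                key=lambda pair: pair[1])
            pi[i][t] = best * e.get((t, words[i]), 0.0)
            bp[i][t] = best_prev
    # Score the transition into STOP, then follow back-pointers.
    last = max(tags, key=lambda t: pi[-1][t] * q.get((t, "STOP"), 0.0))
    seq = [last]
    for i in range(len(words) - 1, 0, -1):
        seq.append(bp[i][seq[-1]])
    return list(reversed(seq))
```

Each position considers every (previous tag, current tag) pair once, which is where the runtime discussed below comes from.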
Time flies like an arrow; Fruit flies like a banana
‣ Worked example: "Fruit flies like bananas", tag set {N, V, IN}
‣ The trellis scores π(i, y) are filled in column by column, from START to STOP (values shown on the slides):
  ‣ i = 1 (Fruit): {0.03, 0.01, 0}
  ‣ i = 2 (Flies): {0.005, 0.007, 0}
  ‣ i = 3 (Like): {0.0007, 0.0003, 0.0001}
  ‣ i = 4 (Bananas): {0.00001, 0, 0.00003}
‣ After the transition into STOP, back-pointers are followed to recover the best tag sequence
‣ Why does this find the max p(·)? What is the runtime?
The Viterbi Algorithm: Runtime
‣ Linear in sentence length n
‣ Quadratic in the number of possible tags |T|
‣ Total runtime: O(n |T|²)
‣ Would there be any scenarios where we would choose beam search?
Tagsets in Different Languages
294² = 86436
45² = 2025
11² = 121
Trigram HMM Taggers
‣ Trigram model: each state is a tag pair, y1 = (<S>, NNP), y2 = (NNP, VBZ), …
‣ P((VBZ, NN) | (NNP, VBZ)): more context! (noun-verb-noun, S-V-O)
‣ Tradeoff between model capacity and data size (sparsity)
‣ Trigrams are a "sweet spot" for POS tagging
HMM POS Tagging
‣ Baseline: assign each word its most frequent tag: ~90% accuracy
‣ Trigram HMM: ~95% accuracy / 55% on unknown words
‣ TnT tagger (Brants 1998, tuned HMM): 96.2% accuracy / 86.0% on unks
Slide credit: Dan Klein
‣ State-of-the-art (BiLSTM-CRFs): 97.5% / 89%+
Can we do better?
‣ HMM is a generative model; estimation relies on counting! Remind you of something?
‣ Can we build a discriminative model, incorporating rich features?
Named Entity Recognition (NER)
Barack Obama will travel to Hangzhou today for the G20 meeting .
B-PER I-PER O O O B-LOC O O O B-ORG O O
‣ BIO tagset: begin, inside, outside (PERSON, LOC, ORG, …)
‣ Why might an HMM not do so well here?
  ‣ Lots of O's
  ‣ Sequence of tags: should we use an HMM?
  ‣ Insufficient features/capacity with multinomials (especially for unks)
Emission Features for NER
[Leicestershire]LOC is a nice place to visit…
I took a vacation to [Boston]LOC
[Apple]ORG released a new version…
According to the [New York Times]ORG…
[Texas]LOC governor [Greg Abbott]PER said…
[Leonardo DiCaprio]PER won an award…
Emission Features for NER
‣ Context features
  ‣ Words before/after
‣ Word features
  ‣ Capitalization
  ‣ Word shape
  ‣ Prefixes/suffixes
  ‣ Lexical indicators
‣ Word clusters
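A sketch of extracting such emission features for the word at position i; all feature names here are illustrative:

```python
# Emission-feature sketch for NER (feature names are illustrative).
def emission_features(words, i):
    w = words[i]
    # Word shape: map letters/digits to X/x/d, keep other characters.
    shape = "".join(
        "X" if c.isupper() else "x" if c.islower() else "d" if c.isdigit() else c
        for c in w)
    return {
        "word=" + w.lower(): 1.0,
        # Context features: neighboring words (with sentence-boundary markers).
        "prev=" + (words[i - 1].lower() if i > 0 else "<s>"): 1.0,
        "next=" + (words[i + 1].lower() if i + 1 < len(words) else "</s>"): 1.0,
        # Word features.
        "capitalized": 1.0 if w[:1].isupper() else 0.0,
        "shape=" + shape: 1.0,
        "prefix=" + w[:3].lower(): 1.0,
        "suffix=" + w[-3:].lower(): 1.0,
    }

print(sorted(emission_features(["to", "Boston", "."], 1)))
```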
Maximum Entropy Markov Models (MEMM)
‣ Log-linear model for the sequence tagging problem:

P(y | x) = ∏_{i=1}^{n} P(y_i | y_1, …, y_{i−1}, x)   (chain rule)
         = ∏_{i=1}^{n} P(y_i | y_{i−1}, x)   (independence assumption)

‣ Learning: train as a discrete log-linear model p(y_i | y_{i−1}, x_1, …, x_n)
‣ Scoring: p(y_i | y_{i−1}, x) ∝ exp(w · f(x, i, y_i, y_{i−1}))
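The local distribution p(y_i | y_{i−1}, x) is a softmax over feature scores; a sketch, where the feature function and weight names are illustrative:

```python
import math

# MEMM local distribution (a sketch):
#   p(y_i | y_{i-1}, x) = exp(w . f(x, i, y_i, y_{i-1})) / Z(y_{i-1}, x)
def memm_local_probs(weights, feature_fn, words, i, prev_tag, tags):
    scores = {t: sum(weights.get(name, 0.0) * value
                     for name, value in feature_fn(words, i, t, prev_tag).items())
              for t in tags}
    z = sum(math.exp(s) for s in scores.values())  # local normalizer
    return {t: math.exp(s) / z for t, s in scores.items()}

# Toy example: one indicator feature per candidate tag.
def feats(words, i, tag, prev_tag):
    return {"tag=" + tag: 1.0}

probs = memm_local_probs({"tag=A": 1.0}, feats, ["word"], 0, "START", ["A", "B"])
print(probs["A"])  # e / (e + 1), about 0.731
```

Because the normalizer Z is computed per position rather than over whole sequences, each local distribution can condition on arbitrary features of the entire input x.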