NLP: a peek into a day of a computational linguist
Mariana Romanyshyn, Grammarly, Inc.
Contents
1. NLP applications in our world
2. What computational linguists do
3. Language levels
4. A closer look at part-of-speech tagging
5. A closer look at syntactic parsing
6. Let’s build something: error correction
Disclaimer
1. NLP applications in our world
What NLP applications do you know?
Types of NLP Applications
• Analysis
• Transformation
• Misc
Types of NLP Applications
ANALYSIS
Spam Filtering
Search Engines
Sentiment Analysis
…
Sentiment maps
Sentiment Analysis
It tastes amazing!
It tastes horrible!
It tastes normal.
ABC tastes much better than DEF.
It tastes like beer!
It tastes interesting!
It tastes like my mom said it would!
If it was served with milk, it would taste great!
Terminal cases
“That young girl is one of the least benightedly unintelligent organic life forms [that] it has been my profound lack of pleasure not to be able to avoid meeting.”
- Douglas Adams
Sentiment Analysis
Types of NLP Applications
ANALYSIS
Spam Filtering
Search Engines
Sentiment Analysis
Sarcasm Detection
…
Quite interesting
Types of NLP Applications
ANALYSIS
Spam Filtering
Search Engines
Sentiment Analysis
Sarcasm Detection
Essay Grading
Good/Evil Characters
…
Types of NLP Applications
TRANSFORMATION
Machine Translation
…
Transformations in MT
Types of NLP Applications
TRANSFORMATION
Machine Translation
Error Correction
…
GEC should be smart
Types of NLP Applications
TRANSFORMATION
Machine Translation
Error Correction
Speech to Text / Text to Speech
Question Answering
Text Summarization
...
Types of NLP Applications
MISC
News reports generation
Conversational Agents
…
Siri
“I remember the first time we loaded these data sources into Siri. I typed “start over” into the system, and Siri came back saying, “Looking for businesses named ‘Over’ in Start, Louisiana.””
- Adam Cheyer
The story of Tay
Types of NLP Applications
MISC
News reports generation
Conversational Agents
Language learning
…
Duolingo
Types of NLP Applications
MISC
News & weather reports generation
Conversational Agents
Language learning
Story Cloze Task
…
Story Cloze
Tom and Sheryl have been together for two years. One day, they went to a carnival. Tom won Sheryl several stuffed bears. When they reached the Ferris wheel, he got down on one knee.
Which ending is more probable?
• Tom asked Sheryl to marry him.
• He wiped mud off of his boot.
2. What computational linguists do
Just FYI
3. Language levels
“Noam-enclature” and the structural linguistics
Language Levels
1) Language has a structure
2) Language is a system of signs
Units of language levels
Written text
Paragraph
Sentence
Word
Morpheme
Letter
Splitting problems
How do we split...
• text into paragraphs?
    bullet points, word wrapping
• paragraph into sentences?
    Dr. Jones lectures at U.C.L.A.
• sentence into words?
    computer-aided, the d.t.s, San Francisco, 3$B deal
• word into morphemes?
    misadventure, mislead, mistake - ?
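The sentence-splitting problem can be made concrete with a small sketch. It is not from the talk or its repository: the abbreviation list is a hypothetical stand-in, and the second splitter still cannot tell when "U.C.L.A." really does end a sentence.

```python
import re

def naive_split(text):
    """Split after every '.', '!' or '?' that is followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip())

# Hypothetical abbreviation list; a real one would be much longer.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "u.c.l.a."}

def split_with_abbreviations(text):
    """Same idea, but never end a sentence on a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token[-1] in ".!?" and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

text = "Dr. Jones lectures at U.C.L.A. every week. His talks are popular."
print(naive_split(text))               # breaks after "Dr." and "U.C.L.A."
print(split_with_abbreviations(text))  # two sentences
```

The naive version over-splits on abbreviations; the list-based version under-splits whenever an abbreviation is also the last word of a sentence, which is exactly why this step is usually learned from data.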
Features
Quantitative features:
• number of sentences, words, words per sentence, etc.
• size and arrangement of paragraphs
• word length
• word position in a sentence
• number of syllables in a word
• ratio of vowels to consonants
• depth of the word in the dependency tree of the sentence
• number of word senses
• n-grams
N-grams
Sequences of elements and their frequencies:
• unigrams, bigrams, 3-grams, 4-grams, … n-grams
• at different language levels
    – token n-grams:
        • (“handsome”, “man”): 160,000
        • (“pretty”, “man”): 5,000
    – character n-grams:
        • “st”: 14,000; “ct”: 4,000
        • “str”: 1,500; “ctr”: 50
        • “stra”: 400; “ctra”: 0
• adding grammar
    – parts of speech:
        • (“go”, IN, “school”): 600,000
        • (“go”, RB, “school”): 10
    – syntactic relations:
        • (“go”, nsubj, “kids”): 200,000
        • (“go”, nsubj, “school”): 20,000
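A minimal sketch of how such n-gram counts could be collected. The toy sentence, the hand-assigned tags, and the helper names (`ngrams`, `mixed_trigrams`) are illustrative assumptions; the frequencies on the slide come from a large corpus, not from code like this.

```python
from collections import Counter

def ngrams(items, n):
    """Return all contiguous n-grams of a sequence as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

# Token n-grams
tokens = "the kids go to school and the kids go home".split()
token_bigrams = Counter(ngrams(tokens, 2))

# Character n-grams (a string is just a sequence of characters)
char_trigrams = Counter("".join(g) for g in ngrams("street strap", 3))

def mixed_trigrams(tagged):
    """(word, POS tag of the middle token, word) patterns such as ("go", IN, "school")."""
    return [(tagged[i][0], tagged[i + 1][1], tagged[i + 2][0])
            for i in range(len(tagged) - 2)]

# Hand-tagged toy input (an assumption, not tagger output)
tagged = [("kids", "NNS"), ("go", "VBP"), ("after", "IN"), ("school", "NN")]

print(token_bigrams.most_common(2))
print(char_trigrams["str"])
print(mixed_trigrams(tagged))
```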
Features
Grammatical features:
• POS tag
• morphemes: affixes, roots, endings
• constituency spans
• dependency relations
• coreference
• grammatical characteristics of various parts of speech:
    – countability of nouns
    – tense of verbs
    – degree of comparison of adjectives
    – pronoun type
    – connector type
Features
Spelling features:
• capitalized word?
• hyphenated word?
• compound word?
Lexical-semantic features:
• WordNet
• VerbNet
• dictionaries and thesauri
• word embeddings
• modality of verbs
4. A closer look at part-of-speech tagging
POS: recap
Goal: categorize words by their functions.
English:
• notional: noun, verb, adjective, adverb, pronoun (?), numeral (?)
• functional: determiner, preposition, conjunction, particle, and interjection
POS: practice
Wow, two hungry cats chased down the mouse to the corner and quickly ate it!
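POS: more practice
All you need is love . Love is all
at the way you love me all the time
. And never mind that noise you heard .
fire and of things that will bite , yeah
було так давно , коли в руках тримаю цей (Ukrainian: "it was so long ago, when I hold this in my hands")
Просто налийте трохи коли на пошкоджену ділянку . (Ukrainian: "Just pour a little cola on the damaged area.")
ударом . Я хочу мати всьо , і всьо на (Ukrainian: "with a blow. I want to have everything, and everything on")
а на полі спозаранку мати жито жала , та (Ukrainian: "and in the field at dawn mother was reaping rye, and")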
POS: impossible cases
Time flies like an arrow.
I saw her duck with a telescope.
She is calculating.
We watched an Indian dance.
They can fish.
More lies ahead...
Це мало мало значення. (Ukrainian: "It had little significance.")
Коло друзів та незнайомців. (Ukrainian: "A circle of friends and strangers." or "Near friends and strangers.")
POS: impossible cases
Time flies[Verb/Noun] like[Preposition/Verb] an arrow.
I saw her duck[Verb/Noun] with a telescope.
She is calculating[Verb/Adjective].
We watched an Indian[Adjective/Noun] dance.
They can[Modal Verb/Verb] fish[Verb/Noun].
More lies[Verb/Noun] ahead...
Це мало[Verb/Adverb] мало[Verb/Adverb] значення.
Коло[Noun/Preposition] друзів та незнайомців.
POS: disputable cases
What POS should gotta be?
I gotta tell you something.
I’ve gotta fix that thingy for her, Jack.
So, she gotta this gorgeous dress.
So, she gotta gun.
POS: disputable cases
What POS should gotta be?
I gotta[modal verb] tell you something.
I’ve gotta[verb, 3rd form] fix that thingy for her, Jack.
So, she gotta[verb, 2nd form] this gorgeous dress.
So, she gotta[verb, 2nd form] gun.
If you don’t know, how would the machine know?
So, what do we do?
POS: tagsets
Penn Treebank tagset:
• noun: NN, NNS, NNP, NNPS
• verb: VB, VBP, VBZ, VBG, VBD, VBN, MD
• adjective: JJ, JJR, JJS
• adverb: RB, RBR, RBS
• preposition and sub. conjunction: IN
• pronoun: PRP, PRP$
• determiner: DT
• numeral: CD
• particle: RP, TO
• interjection: UH
• coord. conjunction: CC
• wh-words: WDT, WP, WP$, WRB
• more: PDT, POS, SYM, FW, EX, LS, $, |,|, |.|, |:|, |''|, |``|, -RRB-, -LRB-
POS: corpora
Very_RB peculiar_JJ retribution_NN indeed_RB seems_VBZ to_TO overtake_VB such_JJ jokers_NNS ._.
Have_VBP you_PRP ever_RB heard_VBN of_IN Thuggee_NNP ?_.
Sort_NN of_IN remorseless_JJ ,_, is_VBZ n't_RB it_PRP ?_.
In_IN short_JJ ,_, and_CC to_TO borrow_VB an_DT arboreal_JJ phrase_NN ,_, slash_VB timber_NN ._.
As_IN you_PRP can_MD count_VB on_IN me_PRP to_TO do_VB the_DT same_JJ ._.
Compassionately_RB yours_PRP ,_, S.J._NNP Perelman_NNP
We_PRP caught_VBD the_DT early_JJ train_NN to_IN New_NNP York_NNP ._.
Petite_JJ ,_, lovely_JJ Yvette_NNP Chadroe_NNP plays_VBZ the_DT nymphomaniac_NN engagingly_RB ._.
He_PRP looked_VBD so_RB comfortable_JJ being_VBG straight_JJ ._.
They_PRP wanted_VBD to_TO touch_VB the_DT mystery_NN ._.
...
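A tagged corpus in this word_TAG format is easy to read into counts. The sketch below uses two lines from the sample above; the helper name `parse_tagged` and the choice to lowercase words are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def parse_tagged(line):
    """Split 'word_TAG word_TAG ...' into (word, tag) pairs.

    rsplit is used so that a token like '._.' still yields ('.', '.').
    """
    return [tuple(token.rsplit("_", 1)) for token in line.split()]

corpus = [
    "We_PRP caught_VBD the_DT early_JJ train_NN to_IN New_NNP York_NNP ._.",
    "They_PRP wanted_VBD to_TO touch_VB the_DT mystery_NN ._.",
]

tag_counts = Counter()             # how often each tag occurs
tag_dictionary = defaultdict(set)  # which tags each word was seen with

for line in corpus:
    for word, tag in parse_tagged(line):
        tag_counts[tag] += 1
        tag_dictionary[word.lower()].add(tag)

print(tag_counts.most_common(3))
print(sorted(tag_dictionary["to"]))  # "to" appears with two different tags
```

Counts like these are the raw material for the probabilities used later in the tagger, and the tag dictionary is one way to limit candidate tags per word.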
POS: Classification
• Use a classifier to tag each word independently
• Features
    – left/right context: words, POS tags, words + POS tags
    – probability of word + POS tag
    – additional:
        • possible tags for the word
        • morphological characteristics (tense, plurality, degree of comparison)
        • the word’s spelling (suffixes, capitalization, hyphenation)
Input: Chewie[NNP] ,[,] we[PRP] 're[VBP] home[NN/RB] - ? .[.]
Output: RB
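The feature list above could be turned into a feature dictionary along these lines. This is a sketch only: the feature names, the `token_features` helper, and the tiny tag dictionary are assumptions, and the classifier that would consume the features is left out.

```python
def token_features(words, tags_so_far, i, tag_dictionary):
    """Features for classifying the tag of words[i] (all names are illustrative)."""
    word = words[i]
    return {
        "word": word.lower(),
        "prev_word": words[i - 1].lower() if i > 0 else "<S>",
        "next_word": words[i + 1].lower() if i + 1 < len(words) else "</S>",
        "prev_tag": tags_so_far[i - 1] if i > 0 else "<S>",
        "suffix3": word[-3:].lower(),
        "is_capitalized": word[:1].isupper(),
        "has_hyphen": "-" in word,
        "possible_tags": tuple(sorted(tag_dictionary.get(word.lower(), ()))),
    }

words = ["Chewie", ",", "we", "'re", "home", "."]
tags_so_far = ["NNP", ",", "PRP", "VBP"]       # tags already decided to the left
toy_dictionary = {"home": {"NN", "RB"}}         # hypothetical tag dictionary

features = token_features(words, tags_so_far, 4, toy_dictionary)
print(features)
```

A trained classifier would map this dictionary to one tag, here ideally RB for "home".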
POS: Sequence Labelling
• Map the sentence to the most probable POS tag sequence
• Features
    – left/right context: words, POS tags, words + POS tags
    – probability of word + POS tag
    – additional:
        • possible tags for the word
        • morphological characteristics (tense, plurality, degree of comparison)
        • the word’s spelling (suffixes, capitalization, hyphenation)
Input: Chewie , we 're home .
Output: NNP , PRP VBP RB .
Hidden Markov Models
Notation:
• V - vocabulary
• T - POS tags
• x - sentence (observations)
• y - tag sequence (states)
• S - all sentence/tag-sequence pairs {x_1 … x_n, y_1 … y_n}
    – n > 0
    – x_i ∈ V
    – y_i ∈ T
Hidden Markov Models
S - all sentence/tag-sequence pairs {x_1 … x_n, y_1 … y_n}
x: Chewie , we 're home .
y: NNP , PRP VBP RB .
   NN , PRP VBP RB .
   NNP , PRP VBP NN .
   NN , PRP VBP NN .
   …
Aim: for a given sentence x, find the pair {x_1 … x_n, y_1 … y_n} with the highest probability.
HMM: assumptions
• Markov assumption: "The future is independent of the past given the present."
    – Trigram HMM: each state depends only on the previous two states in the sequence
• Independence assumption:
    – each observation x_i depends only on its state y_i, independent of the previous observations and states
Trigram HMM: parameters
• q(s|u, v) - the probability of tag s after the tags (u, v)
    – s, u, v ∈ T
• e(x|s) - the probability of observation x paired with state s
    – x ∈ V, s ∈ T
For example
x: Chewie , we 're home .
y: NNP , PRP VBP RB .
How do we get p(x, y)?
p(x, y) = c(NNP,|,|,PRP)/c(NNP,|,|)
        * c(|,|,PRP,VBP)/c(|,|,PRP)
        * c(PRP,VBP,RB)/c(PRP,VBP)
        * c(VBP,RB,|.|)/c(VBP,RB)
        * c(NNP->Chewie)/c(NNP)
        * c(|,|->,)/c(|,|)
        * c(PRP->we)/c(PRP)
        * c(VBP->’re)/c(VBP)
        * c(RB->home)/c(RB)
        * c(|.|->.)/c(|.|)
One thing missing
x: Chewie , we 're home .
y: <S> <S> NNP , PRP VBP RB . </S>
p(x, y) = c(NNP,|,|,PRP)/c(NNP,|,|)
        * c(|,|,PRP,VBP)/c(|,|,PRP)
        * c(PRP,VBP,RB)/c(PRP,VBP)
        * c(VBP,RB,|.|)/c(VBP,RB)
        * c(<S>,<S>,NNP)/c(<S>,<S>)
        * c(<S>,NNP,|,|)/c(<S>,NNP)
        * c(RB,|.|,</S>)/c(RB,|.|)
        * c(NNP->Chewie)/c(NNP)
        * c(|,|->,)/c(|,|)
        * c(PRP->we)/c(PRP)
        * c(VBP->’re)/c(VBP)
        * c(RB->home)/c(RB)
        * c(|.|->.)/c(|.|)
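The product above can be computed directly from counts. The sketch below assumes a two-sentence toy corpus (the second sentence is invented) and unsmoothed maximum-likelihood estimates; function names are illustrative and not taken from the talk's repository.

```python
from collections import Counter

START, STOP = "<S>", "</S>"

def count_events(tagged_sentences):
    """Collect tag trigram, tag context (bigram), emission and tag counts."""
    trigram, context, emission, tag_count = Counter(), Counter(), Counter(), Counter()
    for sentence in tagged_sentences:
        tags = [START, START] + [tag for _, tag in sentence] + [STOP]
        for i in range(2, len(tags)):
            trigram[tuple(tags[i - 2:i + 1])] += 1
            context[tuple(tags[i - 2:i])] += 1
        for word, tag in sentence:
            emission[(tag, word)] += 1
            tag_count[tag] += 1
    return trigram, context, emission, tag_count

def joint_probability(sentence, counts):
    """p(x, y) = product of q(s|u, v) terms times product of e(x|s) terms."""
    trigram, context, emission, tag_count = counts
    tags = [START, START] + [tag for _, tag in sentence] + [STOP]
    p = 1.0
    for i in range(2, len(tags)):
        ctx = tuple(tags[i - 2:i])
        if context[ctx] == 0:
            return 0.0  # unseen context: zero probability without smoothing
        p *= trigram[tuple(tags[i - 2:i + 1])] / context[ctx]
    for word, tag in sentence:
        if tag_count[tag] == 0:
            return 0.0
        p *= emission[(tag, word)] / tag_count[tag]
    return p

s1 = [("Chewie", "NNP"), (",", ","), ("we", "PRP"), ("'re", "VBP"), ("home", "RB"), (".", ".")]
s2 = [("Han", "NNP"), (",", ","), ("we", "PRP"), ("'re", "VBP"), ("late", "JJ"), (".", ".")]
counts = count_events([s1, s2])

print(joint_probability(s1, counts))
```

With this toy corpus only two factors differ from 1: q(RB|PRP, VBP) = 1/2 and e(Chewie|NNP) = 1/2.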
HMM: problem 1
Enumerating all possible tag sequences is not feasible: there are |T|^n of them.
E.g.: 44 tags, 6-token sentence: 44^6 = 7,256,313,856 tag sequences
Ideas:
• use dynamic programming (the Viterbi algorithm)
• limit the number of candidates with a dictionary
HMM: the Viterbi algorithm
Idea: remember decisions on the way, which takes n * |T|^3 steps.
x:          Chewie  ,  we    're    home  .
y: <S> <S>  NN      ,  RB    NNP    VBP   .  </S>
            NNP     ,  CD    WP     VB    .
            NNS     ,  EX    PRP$   RB    .
            NNPS    ,  CC    VBP    NN    .
            JJ      ,  IN    PRP    JJ    .
            JJR     ,  NNP   JJS    TO    .
            RRB     ,  PRP   RBS    RP    .
            VBZ     ,  LS    CD     IN    .
            ...
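A compact sketch of trigram Viterbi decoding, assuming q and e are supplied as functions. The probability tables are made up for the "They can fish" example and are not estimated from any corpus.

```python
START, STOP = "<S>", "</S>"

def viterbi(words, tags, q, e):
    """Best tag sequence under a trigram HMM.

    q(s, u, v): probability of tag s after the tags (u, v)
    e(x, s):    probability of word x given tag s
    Runs in O(n * |T|^3) instead of enumerating |T|^n sequences.
    """
    n = len(words)

    def candidates(k):
        # Positions 0 and -1 hold the start symbol; real positions are 1..n.
        return [START] if k <= 0 else tags

    pi = {(0, START, START): 1.0}  # best score of a path ending in (u, v) at position k
    back = {}                      # tag at position k-2 on that best path
    for k in range(1, n + 1):
        for u in candidates(k - 1):
            for v in candidates(k):
                best_p, best_w = 0.0, None
                for w in candidates(k - 2):
                    p = pi.get((k - 1, w, u), 0.0) * q(v, w, u) * e(words[k - 1], v)
                    if p > best_p:
                        best_p, best_w = p, w
                pi[(k, u, v)] = best_p
                back[(k, u, v)] = best_w

    # Termination: account for the transition into </S>.
    best_p, best_uv = 0.0, None
    for u in candidates(n - 1):
        for v in candidates(n):
            p = pi.get((n, u, v), 0.0) * q(STOP, u, v)
            if p > best_p:
                best_p, best_uv = p, (u, v)
    if best_uv is None:
        return None, 0.0  # every path has zero probability

    y = [None] * (n + 1)
    y[n - 1], y[n] = best_uv
    for k in range(n - 2, 0, -1):
        y[k] = back[(k + 2, y[k + 1], y[k + 2])]
    return y[1:], best_p

# Made-up toy parameters, keyed as (s, u, v) and (word, tag).
Q = {("PRP", START, START): 1.0,
     ("MD", START, "PRP"): 0.6, ("VBP", START, "PRP"): 0.4,
     ("VB", "PRP", "MD"): 0.9, ("NN", "PRP", "MD"): 0.1,
     ("NN", "PRP", "VBP"): 0.8, ("VB", "PRP", "VBP"): 0.2,
     (STOP, "MD", "VB"): 1.0, (STOP, "MD", "NN"): 1.0,
     (STOP, "VBP", "NN"): 1.0, (STOP, "VBP", "VB"): 1.0}
E = {("they", "PRP"): 1.0, ("can", "MD"): 0.9, ("can", "VBP"): 0.1,
     ("fish", "VB"): 0.3, ("fish", "NN"): 0.7}

best_tags, best_score = viterbi(
    ["they", "can", "fish"], ["PRP", "MD", "VBP", "VB", "NN"],
    lambda s, u, v: Q.get((s, u, v), 0.0),
    lambda x, s: E.get((x, s), 0.0))
print(best_tags, best_score)
```

Restricting `candidates(k)` to the tags a dictionary lists for the word at position k would implement the second idea from the previous slide.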
HMM: with dictionary
Idea: use a dictionary, which takes n * 8^3 steps. (Worst case is still n * |T|^3.)
x: Chewie , we 're home .
y: <S> <S> NNP , PRP VBP VB . </S>
NN VBP RB NN
HMM: problem 2
Zero probabilities can occur because of out-of-vocabulary (OOV) or rare words.
Idea: use smoothing!
• add-1: pretend you saw each word one more time
  (P.S. It’s usually a horrible choice, but we’ll use it today. Don’t tell anyone.)
• Good-Turing: reallocate the probability of n-grams that occur r+1 times to the n-grams that occur r times
• Kneser-Ney: when the bigram count is near 0, rely on unigram
• ...
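Add-1 smoothing for emission probabilities might look like the sketch below. The counts and the vocabulary size are invented for illustration; the only point is that an unseen word no longer gets probability zero.

```python
from collections import Counter

def add_one_emission(emission_counts, tag_counts, vocab_size):
    """Return e(word, tag) with add-1 (Laplace) smoothing.

    e(x|s) = (c(s -> x) + 1) / (c(s) + |V|)
    """
    def e(word, tag):
        return (emission_counts[(tag, word)] + 1) / (tag_counts[tag] + vocab_size)
    return e

# Hypothetical counts: "home" seen 3 times as RB and once as NN.
emission_counts = Counter({("RB", "home"): 3, ("NN", "home"): 1})
tag_counts = Counter({"RB": 10, "NN": 20})
vocab_size = 5  # assumed vocabulary size, including a slot for unknown words

e = add_one_emission(emission_counts, tag_counts, vocab_size)
print(e("home", "RB"))    # seen pair: (3 + 1) / (10 + 5)
print(e("Chewie", "NN"))  # unseen pair: (0 + 1) / (20 + 5)
```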
Implementation
https://github.com/mariana-scorp/one-day-with-cling
Conclusion
“Data is ten times more powerful than algorithms.”
- Peter Norvig, The Unreasonable Effectiveness of Data
http://youtu.be/yvDCzhbjYWs
5. A closer look at syntactic parsing
Syntax: recap
Goal: categorize sentence parts by their functions and define dependencies.
Sentence:
• main clause
• subordinate clause
Clause:
• subject
• predicate
• direct/indirect/prepositional object
• modifier
• complement
Syntax: practice
Sentence:
If you want to receive e-mails about my upcoming shows, then please give me money so I can buy a computer.
Clauses:
• [[you] want [to receive [e-mails about my upcoming shows]]]
• [please give [me] [money]]
• [[I] can buy [a computer]]
Syntax: the subject
Identify the subject:
• The walrus and the carpenter were walking close at hand.
• The greatest trick the devil ever pulled was convincing the world he didn't exist.
• What we've got here is a failure to communicate.
• Actually being funny is mostly telling the truth about things.
• To be idle is a short road to death, and to be diligent is a way of life.
• Sitting in a tree at the bottom of the garden was a huge black bird with long blue tail feathers.
Syntax: the infinitives
Identify the role of the infinitive:
• The two politicians failed [to communicate].
• What we've got here is a failure [to communicate].
• [To be idle] is a short road to death, and [to be diligent] is a way of life.
• [To become extroverted], you need to go out and socialize.
• You have [to be able [to actually quote the line]] for it [to be a memorable quote].
How do we formalize the syntactic structure?
Answer:
Syntactic Trees (or Parse Trees)
Types:
• constituency tree
    – every token is a part of some phrase constituent (parent node)
    – includes terminal and non-terminal nodes
    – shows relations among the constituents
• dependency tree
    – for every token, there is one node
    – includes only terminal nodes
    – shows relations among words
Constituency Tree
If you want to receive e-mails about my upcoming shows, then please give me money so I can buy a computer.
Constituency Treebank
(TOP (S (NP (ADJP (RB Very) (JJ peculiar)) (NN retribution)) (ADVP (RB indeed)) (VP (VBZ seems) (S (VP (TO to) (VP (VB overtake) (NP (JJ such) (NNS jokers)))))) (. .)))
(TOP (SQ (VBP Have) (NP (PRP you)) (ADVP (RB ever)) (VP (VBN heard) (PP (IN of) (NP (NNP Thuggee)))) (. ?)))
(TOP (UCP (ADJP (ADVP (NN Sort) (IN of)) (JJ remorseless)) (, ,) (SQ (VBZ is) (RB n't) (NP (PRP it))) (. ?)))
(TOP (SBAR (IN As) (S (NP (PRP you)) (VP (MD can) (VP (VB count) (PP (IN on) (NP (PRP me))) (S (VP (TO to) (VP (VB do) (NP (DT the) (JJ same)))))))) (. .)))
(TOP (FRAG (ADJP (RB Compassionately) (PRP yours)) (, ,) (NP (NNP S.J.) (NNP Perelman))))
(TOP (S (NP (PRP We)) (VP (VBD caught) (NP (NP (DT the) (JJ early) (NN train)) (PP (IN to) (NP (NNP New) (NNP York))))) (. .)))
(TOP (S (NP (JJ Petite) (, ,) (JJ lovely) (NNP Yvette) (NNP Chadroe)) (VP (VBZ plays) (NP (DT the) (NN nymphomaniac)) (ADVP (RB engagingly))) (. .)))
...
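Bracketed trees like these can be read with a few lines of code. The sketch below is a hand-rolled reader written for illustration (the names `read_tree` and `tagged_leaves` are assumptions); it represents each node as a (label, children) pair.

```python
import re

def read_tree(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    position = 0

    def parse():
        nonlocal position
        assert tokens[position] == "("
        position += 1
        label = tokens[position]
        position += 1
        children = []
        while tokens[position] != ")":
            if tokens[position] == "(":
                children.append(parse())          # nested constituent
            else:
                children.append(tokens[position])  # a word (terminal)
                position += 1
        position += 1  # consume ")"
        return (label, children)

    return parse()

def tagged_leaves(tree):
    """Collect (word, POS tag) pairs from the pre-terminal nodes, left to right."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    return [pair for child in children for pair in tagged_leaves(child)]

tree = read_tree("(TOP (S (NP (PRP We)) (VP (VBD caught) (NP (NP (DT the) (JJ early) "
                 "(NN train)) (PP (IN to) (NP (NNP New) (NNP York))))) (. .)))")
print(tagged_leaves(tree))
```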
Constituency Labels
Penn Treebank tagset:
• top level: TOP
• sentence: S, SBAR, SQ, SBARQ, SINV
• fragment: FRAG
• noun phrase: NP
• verb phrase: VP
• prepositional phrase: PP
• adjectival phrase: ADJP
• adverbial phrase: ADVP
• compound conjunction: CONJP
• wh-phrases: WHNP, WHPP, WHADJP, WHADVP
• more: LST, PRT, INTJ, NAC, PRN, QP, RRC, UCP, X
Constituency Parsing
• Algorithms:
    – top-down
    – chart
    – bottom-up
• Features include:
    – grammar (a.k.a. transitions)
    – spans of nodes
    – labels
    – right context, left context, or both
    – split point, etc.
• Weights are trained on the treebank.
Shift-reduce constituency parsing
• Data
    – queue: the words of the sentence
    – stack: partially completed trees
• Actions
    – shift: move the word from the queue onto the stack
    – reduce: add a new label on top of the top n constituents on the stack
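The two actions can be sketched as follows. Here the action sequence is written by hand for a made-up four-word sentence; in a real parser a trained model would choose each action, and that part is not shown.

```python
def shift_reduce(words, actions):
    """Apply a given sequence of shift/reduce actions and return (stack, queue).

    ("shift",)           moves the next word from the queue onto the stack
    ("reduce", label, n) replaces the top n stack items with one labelled node
    """
    queue = list(words)
    stack = []
    for action in actions:
        if action[0] == "shift":
            stack.append(queue.pop(0))
        else:
            _, label, n = action
            children = stack[-n:]
            del stack[-n:]
            stack.append((label, children))
    return stack, queue

words = ["I", "buy", "a", "computer"]
actions = [
    ("shift",), ("reduce", "NP", 1),   # [NP I]
    ("shift",), ("shift",), ("shift",),
    ("reduce", "NP", 2),               # [NP a computer]
    ("reduce", "VP", 2),               # [VP buy [NP a computer]]
    ("reduce", "S", 2),                # [S NP VP]
]

stack, queue = shift_reduce(words, actions)
print(stack)
```

Parsing succeeds when the queue is empty and the stack holds a single tree.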
Syntax: impossible cases
Most cats and dogs with fleas live in the neighbourhood.
Wanted: a nurse for a baby about twenty years old.
I shot an elephant in my pajamas.
I once saw a deer riding my bicycle.
I’m glad I’m a man, and so is Lola.
Dependency Relations
Universal dependencies:
• subject: NSUBJ, NSUBJPASS, CSUBJ, CSUBJPASS
• object: DATIVE, DOBJ, AGENT, OPRD
• complement: ACOMP, CCOMP, XCOMP, PCOMP
• auxiliary: AUX, AUXPASS
• clausal modifier: ACL, ADVCL, RELCL
• other modifiers: ADVMOD, NPADVMOD, AMOD, COMPOUND, NEG, NUMMOD, QUANTMOD
• determiner: DET, PREDET
• apposition: APPOS
• coordinating conjunction and conjunct: CC, CONJ
• prepositional modifier and its object: PREP, POBJ
• more: POSS, CASE, DEP, EXPL, INTJ, MARK, PRECONJ, PRT, PUNCT, PARATAXIS
Dependency Tree
If you want to receive e-mails about my upcoming shows, then please give me money so I can buy a computer.
Algorithms
• Graph-Based Parsing
    – find the highest-scoring tree in a complete graph
    – slow, but performs better on long-distance dependencies
    – e.g., MSTParser
• Transition-Based Parsing
    – apply transition actions one by one
    – faster, but performs better on short-distance dependencies
    – e.g., MaltParser, the Stanford Parser, ZPar
Graph-Based Parsing
Transition-Based Parsing
• Data
    – queue: the words of the sentence
    – stack: partially completed trees
• Actions:
    – shift: move the word from the queue onto the stack
    – reduce: pop the stack, removing only its top item, as long as that item has a head
    – right-arc: create a right dependency arc between the word on top of the stack and the next token in the queue
    – left-arc: create a left dependency arc between the word on top of the stack and the next token in the queue
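These four actions match the arc-eager transition system, so the sketch below follows its usual conventions as an assumption: right-arc makes the next queue token a dependent of the stack top and pushes it, and left-arc makes the stack top a dependent of the next queue token and pops it. Arc labels are omitted, and the action sequence is given by hand rather than predicted.

```python
def arc_eager(words, actions):
    """Apply a given action sequence; return {dependent index: head index}.

    Tokens are numbered from 1; index 0 is an artificial ROOT node.
    """
    queue = list(range(1, len(words) + 1))
    stack = [0]
    heads = {}
    for action in actions:
        if action == "shift":
            stack.append(queue.pop(0))
        elif action == "left-arc":
            dependent = stack.pop()          # stack top gets its head and leaves
            heads[dependent] = queue[0]
        elif action == "right-arc":
            dependent = queue.pop(0)         # next token attaches to the stack top
            heads[dependent] = stack[-1]
            stack.append(dependent)
        elif action == "reduce":
            assert stack[-1] in heads, "only items that already have a head may be reduced"
            stack.pop()
    return heads

words = ["We", "eat", "pizza"]
actions = ["shift",      # stack: ROOT We
           "left-arc",   # We <- eat
           "right-arc",  # ROOT -> eat
           "right-arc"]  # eat -> pizza

heads = arc_eager(words, actions)
print({words[d - 1]: ("ROOT" if h == 0 else words[h - 1]) for d, h in heads.items()})
```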
Features
Implementation
https://github.com/mariana-scorp/one-day-with-cling
Conclusion
Syntax: impossible cases
We eat pizza with anchovy.
Насильство твій макіяж не приховає! (Ukrainian; reads either as "Your makeup will not hide the violence!" or as "Violence will not hide your makeup!")
6. Let’s build something: error correction
Subject-verb disagreement
We likes pizza with anchovy.
Children like and cherishes her kindness and cooking skills.
Some is watching the way she knits and loving it.
Colorless green ideas sleeps furiously.
Barry and Mary, whom I met at the New Year's party, is just the cutest people.
There is two cats and a dog.
Rule-based Toy Solution
Text processing: tokenization, POS tagging, syntactic parsing, etc.
Detection: find a VBZ
Rules: if the verb has an nsubj relation and the subject does not have a conjunct, we should correct it…
Correction: use a dictionary of transformations
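A rough sketch of the detect, check, correct pipeline. The rule on the slide is truncated, so the condition used here (plural subject or a subject with a conjunct) is an illustrative assumption rather than the talk's actual rule. Tokens are annotated by hand instead of by a tagger and parser, and both the pronoun set and the transformation dictionary are hypothetical.

```python
PLURAL_PRONOUNS = {"we", "they", "you"}                                # hypothetical list
VBZ_TO_VBP = {"likes": "like", "is": "are", "sleeps": "sleep"}         # toy dictionary

def correct_agreement(tokens):
    """tokens: dicts with 'text', 'tag', 'dep' and 'head' (index of the head token)."""
    output = [token["text"] for token in tokens]
    for i, token in enumerate(tokens):
        if token["tag"] != "VBZ":                     # detection: find a VBZ
            continue
        subjects = [j for j, t in enumerate(tokens)
                    if t["head"] == i and t["dep"] == "nsubj"]
        if not subjects:
            continue
        s = subjects[0]
        has_conjunct = any(t["head"] == s and t["dep"] == "conj" for t in tokens)
        is_plural = (tokens[s]["tag"] in {"NNS", "NNPS"}
                     or tokens[s]["text"].lower() in PLURAL_PRONOUNS)
        verb = token["text"].lower()
        if (is_plural or has_conjunct) and verb in VBZ_TO_VBP:
            output[i] = VBZ_TO_VBP[verb]              # correction: dictionary lookup
    return " ".join(output)

def tok(text, tag, dep, head):
    return {"text": text, "tag": tag, "dep": dep, "head": head}

we_likes = [tok("We", "PRP", "nsubj", 1), tok("likes", "VBZ", "ROOT", 1),
            tok("pizza", "NN", "dobj", 1)]
barry_mary = [tok("Barry", "NNP", "nsubj", 3), tok("and", "CC", "cc", 0),
              tok("Mary", "NNP", "conj", 0), tok("is", "VBZ", "ROOT", 3),
              tok("cute", "JJ", "acomp", 3)]
she_likes = [tok("She", "PRP", "nsubj", 1), tok("likes", "VBZ", "ROOT", 1),
             tok("pizza", "NN", "dobj", 1)]

print(correct_agreement(we_likes))
print(correct_agreement(barry_mary))
print(correct_agreement(she_likes))
```

Sentences such as "There is two cats and a dog." have an expletive subject and would need a separate rule.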
ML-based Toy Solution
Text processing: tokenization, POS tagging, syntactic parsing, etc.
Detection: find a VBZ
Classifier + features: POS tag of the subject, does the subject have a conjunct...
Correction: use a dictionary of transformations
Implementation
github.com/mariana-scorp/one-day-with-cling
Contact us
Presenter: Mariana [email protected]
With the help of:
Oksana [email protected]
Khrystyna [email protected]
Tetiana [email protected]
Tetiana [email protected]