nlp: a peek into a day of a computational linguist

NLP: a peek into a day of a computational linguist Mariana Romanyshyn Grammarly, Inc.

Upload: mariana-romanyshyn

Post on 13-Apr-2017


Page 1: NLP: a peek into a day of a computational linguist

NLP: a peek into a day of a computational linguist

Mariana Romanyshyn
Grammarly, Inc.

Page 2: NLP: a peek into a day of a computational linguist

1. NLP applications in our world

2. What computational linguists do

3. Language levels

4. A closer look at part-of-speech tagging

5. A closer look at syntactic parsing

6. Let’s build something: error correction

2

Contents

Page 3: NLP: a peek into a day of a computational linguist

3

Disclaimer

Page 4: NLP: a peek into a day of a computational linguist

1. NLP applications in our world

Page 5: NLP: a peek into a day of a computational linguist

5

What NLP applications do you know?

Page 6: NLP: a peek into a day of a computational linguist

• Analysis

• Transformation

• Misc

6

Types of NLP Applications

Page 7: NLP: a peek into a day of a computational linguist

ANALYSIS

Spam Filtering…

7

Types of NLP Applications

Page 8: NLP: a peek into a day of a computational linguist

ANALYSIS

• Spam Filtering
• Search Engines
• …

8

Types of NLP Applications

Page 9: NLP: a peek into a day of a computational linguist

ANALYSIS

• Spam Filtering
• Search Engines
• Sentiment Analysis
• …

9

Types of NLP Applications

Page 10: NLP: a peek into a day of a computational linguist

Sentiment maps

10

Page 11: NLP: a peek into a day of a computational linguist

11

It tastes amazing!
It tastes horrible!
It tastes normal.
ABC tastes much better than DEF.

Sentiment Analysis

Page 12: NLP: a peek into a day of a computational linguist

12

It tastes amazing!
It tastes horrible!
It tastes normal.
ABC tastes much better than DEF.

It tastes like beer!
It tastes interesting!
It tastes like my mom said it would!
If it was served with milk, it would taste great!

Sentiment Analysis

Page 13: NLP: a peek into a day of a computational linguist

13

“That young girl is one of the least benightedly unintelligent organic life forms [that] it has been my profound lack of pleasure not to be able to avoid meeting.”— Douglas Adams

Terminal cases

Page 15: NLP: a peek into a day of a computational linguist

15

Sentiment Analysis

Page 16: NLP: a peek into a day of a computational linguist

ANALYSIS

• Spam Filtering
• Search Engines
• Sentiment Analysis
• Sarcasm Detection
• …

16

Types of NLP Applications

Page 17: NLP: a peek into a day of a computational linguist

17

Quite interesting

Page 18: NLP: a peek into a day of a computational linguist

ANALYSIS

• Spam Filtering
• Search Engines
• Sentiment Analysis
• Sarcasm Detection
• Essay Grading
• …

18

Types of NLP Applications

Page 19: NLP: a peek into a day of a computational linguist

ANALYSIS

• Spam Filtering
• Search Engines
• Sentiment Analysis
• Sarcasm Detection
• Essay Grading
• Good/Evil Characters
• …

19

Types of NLP Applications

Page 20: NLP: a peek into a day of a computational linguist

TRANSFORMATION

Machine Translation…

20

Types of NLP Applications

Page 21: NLP: a peek into a day of a computational linguist

Transformations in MT

21

Page 22: NLP: a peek into a day of a computational linguist

TRANSFORMATION

• Machine Translation
• Error Correction
• …

22

Types of NLP Applications

Page 23: NLP: a peek into a day of a computational linguist

GEC should be smart

23

Page 24: NLP: a peek into a day of a computational linguist

TRANSFORMATION

• Machine Translation
• Error Correction
• Speech to Text / Text to Speech
• …

24

Types of NLP Applications

Page 25: NLP: a peek into a day of a computational linguist

TRANSFORMATION

• Machine Translation
• Error Correction
• Speech to Text / Text to Speech
• Question Answering
• ...

25

Types of NLP Applications

Page 26: NLP: a peek into a day of a computational linguist

TRANSFORMATION

• Machine Translation
• Error Correction
• Speech to Text / Text to Speech
• Question Answering
• Text Summarization
• ...

26

Types of NLP Applications

Page 27: NLP: a peek into a day of a computational linguist

MISC

News reports generation…

27

Types of NLP Applications

Page 28: NLP: a peek into a day of a computational linguist

MISC

• News reports generation
• Conversational Agents
• …

28

Types of NLP Applications

Page 29: NLP: a peek into a day of a computational linguist

“I remember the first time we loaded these data sources into Siri. I typed ‘start over’ into the system, and Siri came back saying, ‘Looking for businesses named “Over” in Start, Louisiana.’”
(Adam Cheyer)

29

Siri

Page 30: NLP: a peek into a day of a computational linguist

30

The story of Tay

Page 31: NLP: a peek into a day of a computational linguist

MISC

• News reports generation
• Conversational Agents
• Language learning
• …

31

Types of NLP Applications

Page 32: NLP: a peek into a day of a computational linguist

32

Duolingo

Page 33: NLP: a peek into a day of a computational linguist

33

Duolingo

Page 34: NLP: a peek into a day of a computational linguist

MISC

• News & weather reports generation
• Conversational Agents
• Language learning
• Story Cloze Task
• …

34

Types of NLP Applications

Page 35: NLP: a peek into a day of a computational linguist

Tom and Sheryl have been together for two years. One day, they went to a carnival. Tom won Sheryl several stuffed bears. When they reached the Ferris wheel, he got down on one knee.

Which ending is more probable?
• Tom asked Sheryl to marry him.
• He wiped mud off of his boot.

35

Story Cloze

Page 36: NLP: a peek into a day of a computational linguist

2. What computational linguists do

Page 37: NLP: a peek into a day of a computational linguist

37

Page 38: NLP: a peek into a day of a computational linguist

38

Page 39: NLP: a peek into a day of a computational linguist

39

Just FYI

Page 40: NLP: a peek into a day of a computational linguist

3. Language levels

Page 41: NLP: a peek into a day of a computational linguist

“Noam-enclature” and the structural linguistics

41

Language Levels

1) Language has a structure

2) Language is a system of signs

Page 42: NLP: a peek into a day of a computational linguist

42

Units of language levels

Written text ?

Page 43: NLP: a peek into a day of a computational linguist

Written text → Paragraph → Sentence → Word → Morpheme → Letter

43

Units of language levels

Page 48: NLP: a peek into a day of a computational linguist

48

Splitting problems

How do we split...
• text into paragraphs? (bullet points, word wrapping)
• paragraph into sentences? (Dr. Jones lectures at U.C.L.A.)
• sentence into words? (computer-aided, the d.t.s, San Francisco, 3$B deal)
• word into morphemes? (misadventure, mislead, mistake - ?)
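To make the sentence-splitting problem concrete, here is a minimal sketch of a naive splitter. The abbreviation list and the acronym pattern are illustrative assumptions, not something taken from the talk, and a real splitter would need much more than this.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (an assumption for this sketch).
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "prof.", "e.g.", "i.e."}


def split_sentences(text):
    """Split on ., ! or ? unless the token looks like an abbreviation or acronym."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        ends_sentence = token.endswith((".", "!", "?"))
        is_abbreviation = token.lower() in ABBREVIATIONS
        # Tokens such as "U.C.L.A." (two or more single letters followed by dots).
        is_acronym = re.fullmatch(r"(?:[A-Za-z]\.){2,}", token) is not None
        if ends_sentence and not is_abbreviation and not is_acronym:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


good = split_sentences("Dr. Jones lectures at U.C.L.A. every week. He likes it.")
bad = split_sentences("Dr. Jones lectures at U.C.L.A. He likes it.")
```

In the first call the rules behave as hoped and two sentences come back. In the second call the acronym itself ends the sentence, so the same rule that protects "U.C.L.A." glues both sentences together, which is exactly the kind of case the slide points at.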

Page 49: NLP: a peek into a day of a computational linguist

49

Features

Quantitative features:
• number of sentences, words, words per sentence, etc.
• size and arrangement of paragraphs
• word length
• word position in a sentence
• number of syllables in a word
• ratio of vowels vs consonants
• depth of the word in the dependency tree of the sentence
• number of word senses
• ngrams

Page 50: NLP: a peek into a day of a computational linguist

50

Ngrams

Sequences of elements and their frequencies:
• unigrams, bigrams, 3-grams, 4-grams, … n-grams
• at different language levels
  – token ngrams: ("handsome", "man"): 160,000; ("pretty", "man"): 5,000
  – character ngrams: "st": 14,000; "ct": 4,000; "str": 1,500; "ctr": 50; "stra": 400; "ctra": 0
• adding grammar
  – parts of speech: ("go", IN, "school"): 600,000; ("go", RB, "school"): 10
  – syntactic relations: ("go", nsubj, "kids"): 200,000; ("go", nsubj, "school"): 20,000
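The counts on this slide come from large corpora. As a minimal sketch of how such counts are collected, the snippet below counts token bigrams and character trigrams with the standard library; the example sentence is made up for illustration.

```python
from collections import Counter


def ngrams(items, n):
    """Return all contiguous n-grams of a sequence as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Toy data, invented for this sketch.
tokens = "the kids go to school and the kids go home".split()

token_bigrams = Counter(ngrams(tokens, 2))    # token-level bigrams
char_trigrams = Counter(ngrams("street", 3))  # character-level trigrams
```

The same `ngrams` helper works at any language level, because it only needs a sequence: a list of tokens, a string of characters, or a list of POS tags.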

Page 51: NLP: a peek into a day of a computational linguist

51

Features

Grammatical features:
• POS tag
• morphemes: affixes, roots, endings
• constituency spans
• dependency relations
• coreference
• grammatical characteristics of various parts of speech:
  – countability of nouns
  – tense of verbs
  – degree of comparison of adjectives
  – pronoun type
  – connector type

Page 52: NLP: a peek into a day of a computational linguist

52

Features

Spelling features:
• capitalized word?
• hyphenated word?
• compound word?

Lexical-semantic features:
• WordNet
• VerbNet
• dictionaries and thesauri
• word embeddings
• modality of verbs

Page 53: NLP: a peek into a day of a computational linguist

4. A closer look at part-of-speech tagging

Page 54: NLP: a peek into a day of a computational linguist

Goal: categorize words by their functions.

English:
• notional: noun, verb, adjective, adverb, pronoun (?), numeral (?)
• functional: determiner, preposition, conjunction, particle, and interjection

54

POS: recap

Page 55: NLP: a peek into a day of a computational linguist

Wow, two hungry cats chased down the mouse to the corner and quickly ate it!

55

POS: practice

Page 56: NLP: a peek into a day of a computational linguist

All you need is love . Love is all at the way you love me all the time

. And never mind that noise you heard . fire and of things that will bite , yeah

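було так давно , коли в руках тримаю цей [Ukrainian: "it was so long ago, when I hold this ... in my hands"; "коли" = "when"]
Просто налийте трохи коли на пошкоджену ділянку . [Ukrainian: "Just pour a little cola on the damaged area."; "коли" = "cola"]

ударом . Я хочу мати всьо , і всьо наа [Ukrainian: "... I want to have everything ..."; "мати" = "to have"]
на полі спозаранку мати жито жала , та [Ukrainian: "in the field at dawn mother was reaping rye"; "мати" = "mother"]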

56

POS: more practice

Page 57: NLP: a peek into a day of a computational linguist

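Time flies like an arrow.
I saw her duck with a telescope.
She is calculating.
We watched an Indian dance.
They can fish.
More lies ahead...

Це мало мало значення. [Ukrainian: "It had little significance."]
Коло друзів та незнайомців. [Ukrainian: "A circle of friends and strangers." or "Near friends and strangers."]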

57

POS: impossible cases

Page 58: NLP: a peek into a day of a computational linguist

Time flies[Verb/Noun] like[Preposition/Verb] an arrow.
I saw her duck[Verb/Noun] with a telescope.
She is calculating[Verb/Adjective].
We watched an Indian[Adjective/Noun] dance.
They can[Modal Verb/Verb] fish[Verb/Noun].
More lies[Verb/Noun] ahead...

Це мало[Verb/Adverb] мало[Verb/Adverb] значення. [Ukrainian: "It had little significance."]
Коло[Noun/Preposition] друзів та незнайомців. [Ukrainian: "A circle of friends and strangers." or "Near friends and strangers."]

58

POS: impossible cases

Page 59: NLP: a peek into a day of a computational linguist

59

Page 60: NLP: a peek into a day of a computational linguist

What POS should gotta be?

I gotta tell you something.
I’ve gotta fix that thingy for her, Jack.
So, she gotta this gorgeous dress.
So, she gotta gun.

60

POS: disputable cases

Page 61: NLP: a peek into a day of a computational linguist

What POS should gotta be?

I gotta[modal verb] tell you something.
I’ve gotta[verb, 3rd form] fix that thingy for her, Jack.
So, she gotta[verb, 2nd form] this gorgeous dress.
So, she gotta[verb, 2nd form] gun.

61

POS: disputable cases

Page 62: NLP: a peek into a day of a computational linguist

62

If you don’t know, how would the machine know?

Page 63: NLP: a peek into a day of a computational linguist

63

So, what do we do?

Page 64: NLP: a peek into a day of a computational linguist

Penn Treebank tagset:
• noun: NN, NNS, NNP, NNPS
• verb: VB, VBP, VBZ, VBG, VBD, VBN, MD
• adjective: JJ, JJR, JJS
• adverb: RB, RBR, RBS
• preposition and sub. conjunction: IN
• pronoun: PRP, PRP$
• determiner: DT
• numeral: CD
• particle: RP, TO
• interjection: UH
• coord. conjunction: CC
• wh-words: WDT, WP, WP$, WRB
• more: PDT, POS, SYM, FW, EX, LS, $, |,|, |.|, |:|, |''|, |``|, -RRB-, -LRB-

64

POS: tagsets

Page 65: NLP: a peek into a day of a computational linguist

Very_RB peculiar_JJ retribution_NN indeed_RB seems_VBZ to_TO overtake_VB such_JJ jokers_NNS ._.
Have_VBP you_PRP ever_RB heard_VBN of_IN Thuggee_NNP ?_.
Sort_NN of_IN remorseless_JJ ,_, is_VBZ n't_RB it_PRP ?_.
In_IN short_JJ ,_, and_CC to_TO borrow_VB an_DT arboreal_JJ phrase_NN ,_, slash_VB timber_NN ._.
As_IN you_PRP can_MD count_VB on_IN me_PRP to_TO do_VB the_DT same_JJ ._.
Compassionately_RB yours_PRP ,_, S.J._NNP Perelman_NNP
We_PRP caught_VBD the_DT early_JJ train_NN to_IN New_NNP York_NNP ._.
Petite_JJ ,_, lovely_JJ Yvette_NNP Chadroe_NNP plays_VBZ the_DT nymphomaniac_NN engagingly_RB ._.
He_PRP looked_VBD so_RB comfortable_JJ being_VBG straight_JJ ._.
They_PRP wanted_VBD to_TO touch_VB the_DT mystery_NN ._.
...

65

POS: corpora
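A corpus in this `word_TAG` format is easy to load. The sketch below parses one line from the slide into (word, tag) pairs and collects the tag and emission counts that a tagger is later trained on; the helper name `parse_tagged` is an assumption made for this example.

```python
from collections import Counter


def parse_tagged(line):
    """Split 'word_TAG word_TAG ...' into (word, tag) pairs.

    rsplit on the last underscore keeps words that contain '_' intact.
    """
    return [tuple(item.rsplit("_", 1)) for item in line.split()]


line = "We_PRP caught_VBD the_DT early_JJ train_NN to_IN New_NNP York_NNP ._."
pairs = parse_tagged(line)

tag_counts = Counter(tag for _, tag in pairs)  # how often each tag occurs
emission_counts = Counter(pairs)               # how often a word carries a tag
```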

Page 66: NLP: a peek into a day of a computational linguist

• Use a classifier to tag each word independently
• Features
  – left/right context: words, POS tags, words + POS tags
  – probability of word + POS tag
  – additional:
    • possible tags for the word
    • morphological characteristics (tense, plurality, degree of comparison)
    • the word’s spelling (suffixes, capitalization, hyphenation)

Input: Chewie[NNP] ,[,] we[PRP] 're[VBP] home[NN/RB] - ? .[.]

Output: RB

66

POS: Classification
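As a rough illustration of the feature list above, the sketch below builds a feature dictionary for one word in context. The feature names and the choice of a three-character suffix are assumptions for this example; the talk does not prescribe a specific feature set or classifier.

```python
def word_features(tokens, tags, i):
    """Features for tokens[i]; tags holds the tags assigned so far (left context)."""
    word = tokens[i]
    return {
        "word": word.lower(),
        "prev_word": tokens[i - 1].lower() if i > 0 else "<S>",
        "next_word": tokens[i + 1].lower() if i + 1 < len(tokens) else "</S>",
        "prev_tag": tags[i - 1] if i > 0 else "<S>",
        "suffix3": word[-3:].lower(),          # spelling: suffix
        "is_capitalized": word[0].isupper(),   # spelling: capitalization
        "has_hyphen": "-" in word,             # spelling: hyphenation
    }


tokens = ["Chewie", ",", "we", "'re", "home", "."]
tags = ["NNP", ",", "PRP", "VBP", None, "."]  # "home" is the word to classify
features = word_features(tokens, tags, 4)
```

A dictionary like `features` can be fed to any classifier that accepts sparse categorical features, which then chooses between candidate tags such as NN and RB for "home".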

Page 67: NLP: a peek into a day of a computational linguist

• Map the sentence to the most probable POS tag sequence
• Features
  – left/right context: words, POS tags, words + POS tags
  – probability of word + POS tag
  – additional:
    • possible tags for the word
    • morphological characteristics (tense, plurality, degree of comparison)
    • the word’s spelling (suffixes, capitalization, hyphenation)

Input: Chewie , we 're home .
Output: NNP , PRP VBP RB .

67

POS: Sequence Labelling

Page 68: NLP: a peek into a day of a computational linguist

Notation:
• V - vocabulary
• T - POS tags
• x - sentence (observation)
• y - tag sequence (state)
• S - all sentence/tag-sequence pairs $\{x_1 \ldots x_n, y_1 \ldots y_n\}$
  – $n > 0$
  – $x_i \in V$
  – $y_i \in T$

68

Hidden Markov Models

Page 69: NLP: a peek into a day of a computational linguist

S - all sentence/tag-sequence pairs $\{x_1 \ldots x_n, y_1 \ldots y_n\}$

x: Chewie , we 're home .
y: NNP , PRP VBP RB .
   NN , PRP VBP RB .
   NNP , PRP VBP NN .
   NN , PRP VBP NN .
   …

Aim: find $\{x_1 \ldots x_n, y_1 \ldots y_n\}$ with the highest probability.

69

Hidden Markov Models

Page 70: NLP: a peek into a day of a computational linguist

• Markov assumption: "The future is independent of the past given the present."
  – Trigram HMM: each state depends only on the previous two states in the sequence
• Independence assumption:
  – the observation $x_i$ depends only on the state $y_i$, independent of the previous observations and states

70

HMM: assumptions

Page 71: NLP: a peek into a day of a computational linguist

S - all sentence/tag-sequence pairs $\{x_1 \ldots x_n, y_1 \ldots y_n\}$

x: Chewie , we 're home .
y: NNP , PRP VBP RB .
   NN , PRP VBP RB .
   NNP , PRP VBP NN .
   NN , PRP VBP NN .
   ...

71

HMM: assumptions

Page 72: NLP: a peek into a day of a computational linguist

• $q(s \mid u, v)$ - the probability of tag s after the tags (u, v)
  – $s, u, v \in T$
• $e(x \mid s)$ - the probability of observation x paired with state s
  – $x \in V$, $s \in T$

72

Trigram HMM: parameters

Page 74: NLP: a peek into a day of a computational linguist

74

For example

x: Chewie , we 're home .
y: NNP , PRP VBP RB .

How do we get p(x, y)?

Page 75: NLP: a peek into a day of a computational linguist

75

For example

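x: Chewie , we 're home .
y: NNP , PRP VBP RB .

$$
\begin{aligned}
p(x, y) ={} & \frac{c(\mathrm{NNP}, |,|, \mathrm{PRP})}{c(\mathrm{NNP}, |,|)} \cdot \frac{c(|,|, \mathrm{PRP}, \mathrm{VBP})}{c(|,|, \mathrm{PRP})} \cdot \frac{c(\mathrm{PRP}, \mathrm{VBP}, \mathrm{RB})}{c(\mathrm{PRP}, \mathrm{VBP})} \cdot \frac{c(\mathrm{VBP}, \mathrm{RB}, |.|)}{c(\mathrm{VBP}, \mathrm{RB})} \\
& \cdot \frac{c(\mathrm{NNP} \to \text{Chewie})}{c(\mathrm{NNP})} \cdot \frac{c(|,| \to \text{,})}{c(|,|)} \cdot \frac{c(\mathrm{PRP} \to \text{we})}{c(\mathrm{PRP})} \cdot \frac{c(\mathrm{VBP} \to \text{'re})}{c(\mathrm{VBP})} \cdot \frac{c(\mathrm{RB} \to \text{home})}{c(\mathrm{RB})} \cdot \frac{c(|.| \to \text{.})}{c(|.|)}
\end{aligned}
$$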

Page 76: NLP: a peek into a day of a computational linguist

76

One thing missing

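x: Chewie , we 're home .
y: <S> <S> NNP , PRP VBP RB . </S>

$$
\begin{aligned}
p(x, y) ={} & \frac{c(\mathrm{NNP}, |,|, \mathrm{PRP})}{c(\mathrm{NNP}, |,|)} \cdot \frac{c(|,|, \mathrm{PRP}, \mathrm{VBP})}{c(|,|, \mathrm{PRP})} \cdot \frac{c(\mathrm{PRP}, \mathrm{VBP}, \mathrm{RB})}{c(\mathrm{PRP}, \mathrm{VBP})} \cdot \frac{c(\mathrm{VBP}, \mathrm{RB}, |.|)}{c(\mathrm{VBP}, \mathrm{RB})} \\
& \cdot \frac{c(\langle\mathrm{S}\rangle, \langle\mathrm{S}\rangle, \mathrm{NNP})}{c(\langle\mathrm{S}\rangle, \langle\mathrm{S}\rangle)} \cdot \frac{c(\langle\mathrm{S}\rangle, \mathrm{NNP}, |,|)}{c(\langle\mathrm{S}\rangle, \mathrm{NNP})} \cdot \frac{c(\mathrm{RB}, |.|, \langle/\mathrm{S}\rangle)}{c(\mathrm{RB}, |.|)} \\
& \cdot \frac{c(\mathrm{NNP} \to \text{Chewie})}{c(\mathrm{NNP})} \cdot \frac{c(|,| \to \text{,})}{c(|,|)} \cdot \frac{c(\mathrm{PRP} \to \text{we})}{c(\mathrm{PRP})} \cdot \frac{c(\mathrm{VBP} \to \text{'re})}{c(\mathrm{VBP})} \cdot \frac{c(\mathrm{RB} \to \text{home})}{c(\mathrm{RB})} \cdot \frac{c(|.| \to \text{.})}{c(|.|)}
\end{aligned}
$$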
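The product above can be computed directly from corpus counts. The following is a minimal sketch with unsmoothed maximum-likelihood estimates; the two-sentence training corpus and the function names `train` and `joint_probability` are invented for illustration and are not taken from the talk's repository.

```python
from collections import Counter

START, STOP = "<S>", "</S>"


def train(tagged_sentences):
    """Collect tag trigram, tag bigram (context), emission and tag counts."""
    trigrams, contexts, emissions, tag_totals = Counter(), Counter(), Counter(), Counter()
    for sentence in tagged_sentences:
        tags = [START, START] + [tag for _, tag in sentence] + [STOP]
        for i in range(2, len(tags)):
            trigrams[tuple(tags[i - 2:i + 1])] += 1
            contexts[tuple(tags[i - 2:i])] += 1
        for word, tag in sentence:
            emissions[(tag, word)] += 1
            tag_totals[tag] += 1
    return trigrams, contexts, emissions, tag_totals


def joint_probability(words, tags, model):
    """p(x, y) = product of q(s | u, v) times product of e(x | s), no smoothing."""
    trigrams, contexts, emissions, tag_totals = model
    padded = [START, START] + list(tags) + [STOP]
    p = 1.0
    for i in range(2, len(padded)):
        context = tuple(padded[i - 2:i])
        if contexts[context] == 0:
            return 0.0  # unseen context: the estimate is undefined, treat as 0
        p *= trigrams[context + (padded[i],)] / contexts[context]
    for word, tag in zip(words, tags):
        if tag_totals[tag] == 0:
            return 0.0
        p *= emissions[(tag, word)] / tag_totals[tag]
    return p


# Toy corpus (an assumption): two tagged sentences.
corpus = [
    [("Chewie", "NNP"), (",", ","), ("we", "PRP"), ("'re", "VBP"), ("home", "RB"), (".", ".")],
    [("we", "PRP"), ("'re", "VBP"), ("home", "NN"), (".", ".")],
]
model = train(corpus)
words = ["Chewie", ",", "we", "'re", "home", "."]
p_seen = joint_probability(words, ["NNP", ",", "PRP", "VBP", "RB", "."], model)
p_unseen = joint_probability(words, ["NNP", ",", "PRP", "VBP", "JJ", "."], model)
```

With this toy corpus the tag sequence from the slide gets probability 0.25, while a sequence containing the unseen tag JJ gets 0, which is the zero-probability problem that smoothing addresses later in the deck.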

Page 77: NLP: a peek into a day of a computational linguist

Enumerating all possible tag sequences is not feasible: there are $|T|^n$ of them.

E.g., 44 tags, 6-token sentence: $44^6$ = 7,256,313,856 tag sequences

Ideas:• use dynamic programming (the Viterbi algorithm)• limit the number of candidates with a dictionary

77

HMM: problem 1

Page 78: NLP: a peek into a day of a computational linguist

78

HMM: the Viterbi algorithm

Idea: remember decisions on the way, which takes $n \cdot |T|^3$ steps.

x: Chewie , we 're home .
y: <S> <S> NN , RB NNP VBP . </S>
           NNP , CD WP VB .
           NNS , EX PRP$ RB .
           NNPS , CC VBP NN .
           JJ , IN PRP JJ .
           JJR , NNP JJS TO .
           RRB , PRP RBS RP .
           VBZ , LS CD IN .
           ...
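A minimal sketch of the Viterbi algorithm for a trigram HMM follows. It keeps, for every position and every pair of last two tags, only the best-scoring path, which is where the cost of roughly n times |T| cubed comes from. The transition and emission tables are made-up numbers for the sentence "They can fish", not estimates from a corpus, and the default score of 0.01 is also an assumption.

```python
START, STOP = "<S>", "</S>"


def viterbi(words, tagset, q, e):
    """Trigram Viterbi. q(s, u, v) = P(s | u, v); e(word, s) = P(word | s)."""
    n = len(words)

    def tags_at(k):
        # Positions -1 and 0 hold the start symbol only.
        return [START] if k <= 0 else tagset

    # pi[k][(u, v)]: best score of a tag sequence for words[:k] ending in tags u, v.
    pi = [{(START, START): 1.0}]
    back = [{}]
    for k in range(1, n + 1):
        pi.append({})
        back.append({})
        for u in tags_at(k - 1):
            for v in tags_at(k):
                best_p, best_w = 0.0, None
                for w in tags_at(k - 2):
                    p = pi[k - 1].get((w, u), 0.0) * q(v, w, u) * e(words[k - 1], v)
                    if p > best_p:
                        best_p, best_w = p, w
                if best_w is not None:
                    pi[k][(u, v)] = best_p
                    back[k][(u, v)] = best_w

    # Close the sequence with the transition to the stop symbol.
    best_p, best_pair = 0.0, None
    for (u, v), p in pi[n].items():
        p *= q(STOP, u, v)
        if p > best_p:
            best_p, best_pair = p, (u, v)
    if best_pair is None:
        return [], 0.0

    # Follow the back-pointers from the end of the sentence.
    tags = [None] * (n + 1)
    tags[n - 1], tags[n] = best_pair
    for k in range(n - 2, 0, -1):
        tags[k] = back[k + 2][(tags[k + 1], tags[k + 2])]
    return tags[1:], best_p


# Hypothetical scores, invented for this sketch.
TRANS = {
    (START, START, "PRP"): 0.8, (START, "PRP", "MD"): 0.5, (START, "PRP", "VB"): 0.4,
    ("PRP", "MD", "VB"): 0.9, ("PRP", "MD", "NN"): 0.05, ("PRP", "VB", "NN"): 0.6,
    ("MD", "VB", STOP): 0.5, ("VB", "NN", STOP): 0.5, ("MD", "NN", STOP): 0.5,
}
EMIT = {
    ("they", "PRP"): 1.0, ("can", "MD"): 0.9, ("can", "VB"): 0.05, ("can", "NN"): 0.05,
    ("fish", "VB"): 0.3, ("fish", "NN"): 0.7,
}


def q(s, u, v):
    return TRANS.get((u, v, s), 0.01)


def e(word, s):
    return EMIT.get((word, s), 0.0)


best_tags, best_p = viterbi(["they", "can", "fish"], ["PRP", "MD", "VB", "NN"], q, e)
```

Restricting `tags_at` to the tags a dictionary allows for each word is one way to shrink the inner loops further.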

Page 79: NLP: a peek into a day of a computational linguist

79

HMM: with dictionary

Idea: use a dictionary, which brings the cost down to $n \cdot 8^3$. (Worst case is still $n \cdot |T|^3$.)

x: Chewie , we 're home .
y: <S> <S> NNP , PRP VBP VB . </S>

NN VBP RB NN

Page 80: NLP: a peek into a day of a computational linguist

Zero probabilities can occur because of OOV or rare words.

Idea: use smoothing!
• add-1: pretend you saw each word one more time (P.S. It’s usually a horrible choice, but we’ll use it today. Don’t tell anyone.)
• Good-Turing: reallocate the probability of n-grams that occur r+1 times to the n-grams that occur r times
• Kneser-Ney: when the bigram count is near 0, rely on unigram
• ...

80

HMM: problem 2
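As a small illustration of add-1 smoothing applied to emission probabilities, the sketch below adds one to every count and enlarges the denominator by the vocabulary size. The counts, the extra `<UNK>` slot for unseen words, and the function name are assumptions for this example.

```python
from collections import Counter


def add_one_emission(word, tag, emissions, tag_totals, vocab_size):
    """P(word | tag) with add-1 (Laplace) smoothing."""
    return (emissions[(tag, word)] + 1) / (tag_totals[tag] + vocab_size)


# Toy counts, invented for this sketch.
emissions = Counter({("RB", "home"): 1, ("NN", "home"): 3, ("NN", "train"): 1})
tag_totals = Counter({"RB": 1, "NN": 4})
vocabulary = {"home", "train", "<UNK>"}  # <UNK> stands in for any unseen word

p_home = add_one_emission("home", "NN", emissions, tag_totals, len(vocabulary))
p_oov = add_one_emission("Wookiee", "NN", emissions, tag_totals, len(vocabulary))
total = sum(
    add_one_emission(w, "NN", emissions, tag_totals, len(vocabulary))
    for w in vocabulary
)
```

An out-of-vocabulary word now gets a small non-zero probability instead of wiping out the whole product, and the smoothed probabilities over the vocabulary still sum to one.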

Page 81: NLP: a peek into a day of a computational linguist

81

Implementation
https://github.com/mariana-scorp/one-day-with-cling

Page 82: NLP: a peek into a day of a computational linguist

Conclusion

82

“Data is ten times more powerful than algorithms.”
Peter Norvig, The Unreasonable Effectiveness of Data
http://youtu.be/yvDCzhbjYWs

Page 83: NLP: a peek into a day of a computational linguist

5. A closer look at syntactic parsing

Page 84: NLP: a peek into a day of a computational linguist

Goal: categorize sentence parts by their functions and define dependencies.

Sentence:
• main clause
• subordinate clause

Clause:
• subject
• predicate
• direct/indirect/prepositional object
• modifier
• complement

84

Syntax: recap

Page 85: NLP: a peek into a day of a computational linguist

Sentence:
If you want to receive e-mails about my upcoming shows, then please give me money so I can buy a computer.

85

Syntax: practice

Page 86: NLP: a peek into a day of a computational linguist

Sentence:
If you want to receive e-mails about my upcoming shows, then please give me money so I can buy a computer.

Clauses:
• [[you] want [to receive [e-mails about my upcoming shows]]]
• [please give [me] [money]]
• [[I] can buy [a computer]]

86

Syntax: practice

Page 87: NLP: a peek into a day of a computational linguist

Identify the subject:

• The walrus and the carpenter were walking close at hand.

• The greatest trick the devil ever pulled was convincing the world he didn't exist.

• What we've got here is a failure to communicate.

• Actually being funny is mostly telling the truth about things.

• To be idle is a short road to death, and to be diligent is a way of life.

• Sitting in a tree at the bottom of the garden was a huge black bird with long blue tail feathers.

87

Syntax: the subject

Page 90: NLP: a peek into a day of a computational linguist

Identify the role of the infinitive:

• The two politicians failed [to communicate].
• What we've got here is a failure [to communicate].
• [To be idle] is a short road to death, and [to be diligent] is a way of life.
• [To become extroverted], you need to go out and socialize.
• You have [to be able [to actually quote the line]] for it [to be a memorable quote].

90

Syntax: the infinitives

Page 91: NLP: a peek into a day of a computational linguist

91

How do we formalize the syntactic structure?

Page 92: NLP: a peek into a day of a computational linguist

92

Answer:

Page 93: NLP: a peek into a day of a computational linguist

Types:
• constituency tree
  – every token is a part of some phrase constituent (parent node)
  – includes terminal and non-terminal nodes
  – shows relations among the constituents
• dependency tree
  – for every token, there is one node
  – includes only terminal nodes
  – shows relations among words

93

Syntactic Trees (or Parse Trees)

Page 94: NLP: a peek into a day of a computational linguist

If you want to receive e-mails about my upcoming shows, then please give me money so I can buy a computer.

94

Constituency Tree

Page 95: NLP: a peek into a day of a computational linguist

If you want to receive e-mails about my upcoming shows, then please give me money so I can buy a computer.

95

Constituency Tree

Page 96: NLP: a peek into a day of a computational linguist

96

Constituency Treebank

(TOP (S (NP (ADJP (RB Very) (JJ peculiar)) (NN retribution)) (ADVP (RB indeed)) (VP (VBZ seems) (S (VP (TO to) (VP (VB overtake) (NP (JJ such) (NNS jokers)))))) (. .)))
(TOP (SQ (VBP Have) (NP (PRP you)) (ADVP (RB ever)) (VP (VBN heard) (PP (IN of) (NP (NNP Thuggee)))) (. ?)))
(TOP (UCP (ADJP (ADVP (NN Sort) (IN of)) (JJ remorseless)) (, ,) (SQ (VBZ is) (RB n't) (NP (PRP it))) (. ?)))
(TOP (SBAR (IN As) (S (NP (PRP you)) (VP (MD can) (VP (VB count) (PP (IN on) (NP (PRP me))) (S (VP (TO to) (VP (VB do) (NP (DT the) (JJ same)))))))) (. .)))
(TOP (FRAG (ADJP (RB Compassionately) (PRP yours)) (, ,) (NP (NNP S.J.) (NNP Perelman))))
(TOP (S (NP (PRP We)) (VP (VBD caught) (NP (NP (DT the) (JJ early) (NN train)) (PP (IN to) (NP (NNP New) (NNP York))))) (. .)))
(TOP (S (NP (JJ Petite) (, ,) (JJ lovely) (NNP Yvette) (NNP Chadroe)) (VP (VBZ plays) (NP (DT the) (NN nymphomaniac)) (ADVP (RB engagingly))) (. .)))
...
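The bracketed treebank format can be read with a short recursive parser. This is a minimal sketch that turns one tree string into nested (label, children) tuples and recovers the sentence from the leaves; the representation and the function names are assumptions for this example.

```python
import re


def parse_tree(text):
    """Parse '(LABEL child child ...)' into a (label, children) tuple."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "a constituent must start with '('"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())       # nested constituent
            else:
                children.append(tokens[pos])  # a word (terminal node)
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    return read()


def leaves(tree):
    """Return the words of the tree from left to right."""
    _, children = tree
    words = []
    for child in children:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words


tree = parse_tree(
    "(TOP (S (NP (PRP We)) (VP (VBD caught) (NP (NP (DT the) (JJ early) (NN train)) "
    "(PP (IN to) (NP (NNP New) (NNP York))))) (. .)))"
)
sentence = leaves(tree)
```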

Page 97: NLP: a peek into a day of a computational linguist

Penn Treebank tagset:
• top level: TOP
• sentence: S, SBAR, SQ, SBARQ, SINV
• fragment: FRAG
• noun phrase: NP
• verb phrase: VP
• prepositional phrase: PP
• adjectival phrase: ADJP
• adverbial phrase: ADVP
• compound conjunction: CONJP
• wh-phrases: WHNP, WHPP, WHADJP, WHADVP
• more: LST, PRT, INTJ, NAC, PRN, QP, RRC, UCP, X

97

Constituency Labels

Page 98: NLP: a peek into a day of a computational linguist

• Algorithms:
  – top-down
  – chart
  – bottom-up
• Features include:
  – grammar (a.k.a. transitions)
  – spans of nodes
  – labels
  – right/left/right and left context
  – split point, etc.
• Weights are trained on the treebank.

98

Constituency Parsing

Page 99: NLP: a peek into a day of a computational linguist

99

Shift-reduce constituency parsing

• Data
  – queue: the words of the sentence
  – stack: partially completed trees
• Actions
  – shift: move the word from the queue onto the stack
  – reduce: add a new label on top of the first n constituents on the stack
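The two actions above can be sketched as a tiny interpreter that replays a given action sequence. In a real parser a trained model chooses each action; here the sequence is written by hand, and the tuple encoding of actions and trees is an assumption for this example.

```python
from collections import deque


def shift_reduce(tagged_words, actions):
    """Replay shift/reduce actions and return the final stack of trees."""
    # Each queue item is a pre-terminal node: (POS tag, [word]).
    queue = deque((tag, [word]) for word, tag in tagged_words)
    stack = []
    for action in actions:
        if action == "shift":
            stack.append(queue.popleft())
        else:
            _, label, n = action  # ("reduce", label, n)
            if n < 1 or n > len(stack):
                raise ValueError("reduce needs between 1 and len(stack) items")
            children = stack[-n:]
            del stack[-n:]
            stack.append((label, children))  # new label on top of n constituents
    return stack, queue


tagged = [("We", "PRP"), ("caught", "VBD"), ("the", "DT"), ("train", "NN")]
actions = [
    "shift", ("reduce", "NP", 1),  # (NP (PRP We))
    "shift", "shift", "shift",     # caught, the, train
    ("reduce", "NP", 2),           # (NP (DT the) (NN train))
    ("reduce", "VP", 2),           # (VP (VBD caught) (NP ...))
    ("reduce", "S", 2),            # (S (NP ...) (VP ...))
]
stack, queue = shift_reduce(tagged, actions)
```

After the last action the queue is empty and the stack holds a single tree rooted in S, which is the usual stopping condition for this kind of parser.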

Page 101: NLP: a peek into a day of a computational linguist

101

Syntax: impossible cases
Most cats and dogs with fleas live in the neighbourhood.

Page 103: NLP: a peek into a day of a computational linguist

103

Syntax: impossible cases
Wanted: a nurse for a baby about twenty years old.

Page 105: NLP: a peek into a day of a computational linguist

105

Syntax: impossible cases
I shot an elephant in my pajamas.

Page 107: NLP: a peek into a day of a computational linguist

107

Syntax: impossible cases
I once saw a deer riding my bicycle.

Page 109: NLP: a peek into a day of a computational linguist

109

Syntax: impossible cases
I’m glad I’m a man, and so is Lola.

Page 111: NLP: a peek into a day of a computational linguist

Types:
• constituency tree
  – every token is a part of some phrase constituent (parent node)
  – includes terminal and non-terminal nodes
  – shows relations among the constituents
• dependency tree
  – for every token, there is one node
  – includes only terminal nodes
  – shows relations among words

111

Syntactic Trees (or Parse Trees)

Page 112: NLP: a peek into a day of a computational linguist

Universal dependencies:
• subject: NSUBJ, NSUBJPASS, CSUBJ, CSUBJPASS
• object: DATIVE, DOBJ, AGENT, OPRD
• complement: ACOMP, CCOMP, XCOMP, PCOMP
• auxiliary: AUX, AUXPASS
• clausal modifier: ACL, ADVCL, RELCL
• different modifier: ADVMOD, NPADVMOD, AMOD, COMPOUND, NEG, NUMMOD, QUANTMOD
• determiner: DET, PREDET
• apposition: APPOS
• coordinating conjunction and conjunct: CC, CONJ
• prepositional modifier and its object: PREP, POBJ
• more: POSS, CASE, DEP, EXPL, INTJ, MARK, PRECONJ, PRT, PUNCT, PARATAXIS

112

Dependency Relations

Page 113: NLP: a peek into a day of a computational linguist

113

Dependency Tree
If you want to receive e-mails about my upcoming shows, then please give me money so I can buy a computer.

Page 115: NLP: a peek into a day of a computational linguist

• Graph-Based Parsing
  – find the highest-scoring tree from a complete graph
  – slow, but performs better on long-distance dependencies
  – e.g., MSTParser
• Transition-Based Parsing
  – apply transition actions one by one
  – faster, but performs better on short-distance dependencies
  – e.g., MaltParser, the Stanford Parser, ZPAR

115

Algorithms

Page 116: NLP: a peek into a day of a computational linguist

116

Graph-Based Parsing

Page 117: NLP: a peek into a day of a computational linguist

• Data
  – queue: the words of the sentence
  – stack: partially completed trees
• Actions:
  – shift: move the word from the queue onto the stack
  – reduce: pop the stack, removing only its top item, as long as that item has a head
  – right-arc: create a right dependency arc between the word on top of the stack and the next token in the queue
  – left-arc: create a left dependency arc between the word on top of the stack and the next token in the queue

117

Transition-Based Parsing
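The four actions can be sketched as a small state machine that replays a hand-written action sequence. This follows the common arc-eager convention, which is an assumption here: an artificial ROOT node (index 0) starts on the stack, right-arc also pushes the new dependent onto the stack, and left-arc pops the dependent. A real parser would predict each action with a trained classifier.

```python
from collections import deque


def parse(words, actions):
    """Replay transitions; return {dependent index: head index}, with 0 = ROOT."""
    stack = [0]                              # ROOT is an assumption of this sketch
    queue = deque(range(1, len(words) + 1))  # word indices, 1-based
    heads = {}
    for action in actions:
        if action == "shift":
            stack.append(queue.popleft())
        elif action == "left-arc":
            # The next token in the queue becomes the head of the stack top.
            dependent = stack.pop()
            heads[dependent] = queue[0]
        elif action == "right-arc":
            # The stack top becomes the head of the next token in the queue.
            dependent = queue.popleft()
            heads[dependent] = stack[-1]
            stack.append(dependent)
        elif action == "reduce":
            if stack[-1] not in heads:
                raise ValueError("reduce requires the top item to have a head")
            stack.pop()
        else:
            raise ValueError(f"unknown action: {action}")
    return heads


words = ["We", "eat", "pizza"]
heads = parse(words, ["shift", "left-arc", "right-arc", "right-arc"])
```

For this sentence the result says that "We" and "pizza" both depend on "eat", and "eat" depends on ROOT.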

Page 119: NLP: a peek into a day of a computational linguist

Features

119

Page 120: NLP: a peek into a day of a computational linguist

120

Implementation
https://github.com/mariana-scorp/one-day-with-cling

Page 121: NLP: a peek into a day of a computational linguist

121

Conclusion

Page 122: NLP: a peek into a day of a computational linguist

122

Syntax: impossible cases
We eat pizza with anchovy.

Page 124: NLP: a peek into a day of a computational linguist

124

Syntax: impossible cases
Насильство твій макіяж не приховає! [Ukrainian: "Your makeup will not hide the violence!"; the word order also allows "Violence will not hide your makeup!"]

Page 126: NLP: a peek into a day of a computational linguist

6. Let’s build something: error correction

Page 127: NLP: a peek into a day of a computational linguist

We likes pizza with anchovy.
Children like and cherishes her kindness and cooking skills.
Some is watching the way she knits and loving it.
Colorless green ideas sleeps furiously.
Barry and Mary, whom I met at the New Year 's party, is just the cutest people.
There is two cats and a dog.

127

Subject-verb disagreement

Page 128: NLP: a peek into a day of a computational linguist

Text processing: tokenization, POS tagging, syntactic parsing, etc.

Detection: find a VBZ

Rules: if the verb has nsubj relation and the subject does not have a conjunct, we should correct it…

Correction: use a dictionary of transformations

128

Rule-based Toy Solution
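One possible reading of this pipeline is sketched below. It assumes the text has already been tagged and parsed, so each token is a hand-annotated dictionary. The plural test (plural noun tag, plural pronoun, or a subject with a conjunct) and the small transformation dictionary are assumptions made for this example and are not the rules from the talk's repository.

```python
PLURAL_PRONOUNS = {"we", "they", "you"}  # assumption: pronouns that take the base form
TRANSFORMATIONS = {                      # hypothetical dictionary of corrections
    "likes": "like", "is": "are", "cherishes": "cherish", "sleeps": "sleep",
}


def correct_agreement(tokens):
    """tokens: dicts with 'word', 'pos', 'dep' and 'head' (index of the head token)."""
    corrected = [token["word"] for token in tokens]
    for i, token in enumerate(tokens):
        if token["pos"] != "VBZ":  # detection: find a VBZ
            continue
        for j, candidate in enumerate(tokens):
            if candidate["dep"] != "nsubj" or candidate["head"] != i:
                continue
            has_conjunct = any(
                t["dep"] == "conj" and t["head"] == j for t in tokens
            )
            is_plural = (
                candidate["pos"] in {"NNS", "NNPS"}
                or candidate["word"].lower() in PLURAL_PRONOUNS
                or has_conjunct
            )
            verb = token["word"].lower()
            if is_plural and verb in TRANSFORMATIONS:
                corrected[i] = TRANSFORMATIONS[verb]  # correction via dictionary
    return corrected


we_likes = [
    {"word": "We", "pos": "PRP", "dep": "nsubj", "head": 1},
    {"word": "likes", "pos": "VBZ", "dep": "ROOT", "head": 1},
    {"word": "pizza", "pos": "NN", "dep": "dobj", "head": 1},
]
she_likes = [
    {"word": "She", "pos": "PRP", "dep": "nsubj", "head": 1},
    {"word": "likes", "pos": "VBZ", "dep": "ROOT", "head": 1},
    {"word": "pizza", "pos": "NN", "dep": "dobj", "head": 1},
]
fixed = correct_agreement(we_likes)
unchanged = correct_agreement(she_likes)
```

Sentences such as "There is two cats and a dog" would need further rules, since the logical subject there is not attached with an nsubj relation in the way this sketch expects.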

Page 129: NLP: a peek into a day of a computational linguist

Text processing: tokenization, POS tagging, syntactic parsing, etc.

Detection: find a VBZ

Classifier + features: POS tag of the subject, does the subject have a conjunct...

Correction: use a dictionary of transformations

129

ML-based Toy Solution

Page 130: NLP: a peek into a day of a computational linguist

130

Implementation
github.com/mariana-scorp/one-day-with-cling

Page 132: NLP: a peek into a day of a computational linguist

132