
Page 1:

Computational Lexicology, Morphology and Syntax

Course 6

Diana Trandabăț

Academic year: 2019-2020

Page 2:

Word Classes

• Words that somehow ‘behave’ alike:

–Appear in similar contexts

–Perform similar functions in sentences

–Undergo similar transformations

• ~10 traditional word classes, or parts of speech (POS)

–Noun, verb, adjective, preposition, adverb, article, interjection, pronoun, conjunction, numeral

Page 3:

Some Examples

• N noun chair, bandwidth, pacing

• V verb study, debate, munch

• ADJ adjective purple, tall, ridiculous

• ADV adverb unfortunately, slowly

• P preposition of, by, to

• PRO pronoun I, me, mine

• DET determiner the, a, that, those

Page 4:

Ambiguity in POS Tagging

• “Like” can be a verb or a preposition – I like/VBP candy.

– Time flies like/IN an arrow.

• “Around” can be a preposition, particle, or adverb

– I bought it at the shop around/IN the corner.

– I never got around/RP to getting a car.

– A new Prius costs around/RB $25K.
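To see this kind of ambiguity resolved in context, here is a minimal sketch using NLTK's off-the-shelf tagger (an assumption: the nltk package is installed and the 'punkt' and 'averaged_perceptron_tagger' resources have been downloaded; this tagger is not part of the course material, just a convenient demonstration).

# Sketch: observing POS ambiguity with NLTK's default tagger.
# Assumes: pip install nltk, then nltk.download('punkt') and
# nltk.download('averaged_perceptron_tagger') have been run once.
import nltk

sentences = [
    "I like candy.",
    "Time flies like an arrow.",
    "I bought it at the shop around the corner.",
    "A new Prius costs around $25K.",
]

for sent in sentences:
    tokens = nltk.word_tokenize(sent)
    # pos_tag returns (word, Penn Treebank tag) pairs; the tag chosen for
    # "like" or "around" depends on the surrounding context.
    print(nltk.pos_tag(tokens))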

Page 5:

Closed vs. Open class of words

• Closed class categories are composed of a small, fixed set of grammatical words for a given language.
– Prepositions: of, in, by, …
– Auxiliary verbs: may, can, will, had, been, …
– Pronouns: I, you, she, mine, his, them, …
– Usually function words (short, common words which play a role in grammar)

• Open class categories have a large number of words, and new ones are easily invented.
– English has 4 open categories: Nouns, Verbs, Adjectives, Adverbs
– Nouns: Googler
– Verbs: Google
– Adjectives: geeky

Page 6:

Open Class Words

• Nouns

–Proper nouns

• Columbia University, New York City, Arthi Ramachandran, Metropolitan Transit Center

• English (and Romanian, and many other languages) capitalizes these

• Many have abbreviations

–Common nouns

• All the rest

Page 7:

Count nouns vs. mass nouns
• Count: have plurals, are countable: goat/goats, one goat, two goats
• Mass: not countable (fish, salt, communism) (?two fishes)

Invariable nouns (same form for singular and plural)
• Romanian: ochi 'eye(s)', ardei 'pepper(s)', nume 'name(s)'

Singularia tantum (nouns having only a singular form)
• Romanian: dreptate 'justice', studenţime 'student body', aur 'gold', soare 'sun'

Pluralia tantum (nouns having only a plural form)
• Romanian: icre 'roe', ochelari 'glasses', rechizite 'school supplies'

Page 8:

• Adjectives: identify properties or qualities of nouns

–Color, size, age, …

–Adjective ordering restrictions in English:

• Old blue book, not blue old book

– In Korean, adjectives are realized as verbs

• Adverbs: also modify things (verbs, adjectives, adverbs)

–The very happy man walked home extremely slowly yesterday.

Page 9:

– Directional/locative adverbs (here, home, downhill)
– Degree adverbs (extremely, very, somewhat)
– Manner adverbs (slowly, silkily, delicately)
– Temporal adverbs (Monday, tomorrow)

• Verbs:
– In most languages take morphological affixes (eat/eats/eaten)
– Represent actions (walk, ate), processes (provide, see), and states (be, seem)
– Many subclasses, e.g.
• eat/VB, eat/VBP, eats/VBZ, ate/VBD, eaten/VBN, eating/VBG, ...
• Reflect morphological form & syntactic function

Page 10:

How Do We Assign Words to Open or Closed?

• Nouns denote people, places and things and can be preceded by articles. But…

My typing is very bad. *The Mary loves John.

• Verbs are used to refer to actions, processes, states
– But some are closed class and some are open
I will have emailed everyone by noon.

• Adverbs modify actions – Is Monday a temporal adverbial or a noun?

Page 11:

Closed Class Words

• Closed class words (Prep, Det, Pron, Conj, Aux, Part, Num) are generally easy to process, since we can enumerate them… but
– Is it a particle or a preposition?
• George eats up his dinner/George eats his dinner up.

• George eats up the street/*George eats the street up.

–Articles come in 2 flavors: definite (the) and indefinite (a, an)

Page 12:

Choosing a POS Tagset

• To do POS tagging, first need to choose a set of tags

• Could pick very coarse (small) tagsets –N, V, Adj, Adv.

• Commonly used: Brown Corpus (Francis & Kucera ‘82), 1M words, 87 tags – more informative but more difficult to tag

• Most commonly used: Penn Treebank: hand-annotated corpus of Wall Street Journal, 1M words, 45 tags

Page 13:

English Parts of Speech

• Noun (person, place or thing)

– Singular (NN): dog, fork

– Plural (NNS): dogs, forks

– Proper (NNP, NNPS): John, Springfields

• Pronouns
– Personal pronoun (PRP): I, you, he, she, it

– Wh-pronoun (WP): who, what

• Verb (actions and processes)
– Base, infinitive (VB): eat

– Past tense (VBD): ate

– Gerund (VBG): eating

– Past participle (VBN): eaten

– Non 3rd person singular present tense (VBP): eat

– 3rd person singular present tense (VBZ): eats

– Modal (MD): should, can

– To (TO): to (to eat)

Page 14:

English Parts of Speech (cont.)

• Adjective (modify nouns)
– Basic (JJ): red, tall
– Comparative (JJR): redder, taller
– Superlative (JJS): reddest, tallest

• Adverb (modify verbs)
– Basic (RB): quickly
– Comparative (RBR): quicker
– Superlative (RBS): quickest

• Preposition (IN): on, in, by, to, with

• Determiner:
– Basic (DT): a, an, the
– WH-determiner (WDT): which, that

• Coordinating Conjunction (CC): and, but, or

• Particle (RP): off (took off), up (put up)

Page 15:

Penn Treebank Tagset

Page 16:

Defining POS Tagging

• The process of assigning a part-of-speech or lexical class marker to each word in a corpus:

WORDS: the koala put the keys on the table

TAGS: {N, V, P, DET}

Page 17:

Applications for POS Tagging

• Speech synthesis pronunciation
– lead (noun) vs. lead (verb)

– INsult inSULT

– OBject obJECT

– OVERflow overFLOW

– DIScount disCOUNT

– CONtent conTENT

• Word Sense Disambiguation: e.g. Time flies like an arrow – Is flies an N or V?

• Word prediction in speech recognition
– Possessive pronouns (my, your, her) are likely to be followed by nouns

– Personal pronouns (I, you, he) are likely to be followed by verbs

• Machine Translation

Page 18:

Part of Speech (POS) Tagging

• Lowest level of syntactic analysis.

John saw the saw and decided to take it to the table.

NNP VBD DT NN CC VBD TO VB PRP IN DT NN

Page 19:

POS Tagging Process

• Usually assume a separate initial tokenization process that separates and/or disambiguates punctuation, including detecting sentence boundaries.

• Degree of ambiguity in English (based on Brown corpus) – 11.5% of word types are ambiguous.

– 40% of word tokens are ambiguous.

• Average POS tagging disagreement amongst expert human judges for the Penn Treebank was 3.5%
– Based on correcting the output of an initial automated tagger, which was deemed to be more accurate than tagging from scratch.

• Baseline: picking the most frequent tag for each specific word type gives about 90% accuracy
– 93.7% if a model for unknown words is used, for the Penn Treebank tagset.
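A rough sketch of this most-frequent-tag baseline (not the exact experimental setup behind the published figures) can be built on NLTK's Brown corpus sample, which is assumed here to be downloaded already; the fallback tag and the train/test split are arbitrary choices.

# Sketch: most-frequent-tag baseline on the Brown news sample.
# Assumes nltk.download('brown') has been run once.
import nltk
from collections import Counter, defaultdict

tagged_sents = nltk.corpus.brown.tagged_sents(categories='news')
split = int(len(tagged_sents) * 0.9)
train, test = tagged_sents[:split], tagged_sents[split:]

# Count (word, tag) frequencies on the training portion.
counts = defaultdict(Counter)
for sent in train:
    for word, tag in sent:
        counts[word][tag] += 1
most_frequent = {w: c.most_common(1)[0][0] for w, c in counts.items()}

# Tag test tokens with each word's most frequent training tag;
# fall back to 'NN' for unknown words.
correct = total = 0
for sent in test:
    for word, gold in sent:
        guess = most_frequent.get(word, 'NN')
        correct += (guess == gold)
        total += 1
print(f"Most-frequent-tag baseline accuracy: {correct / total:.3f}")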

Page 20:

POS Tagging Approaches

• Rule-Based: Human-crafted rules based on lexical and other linguistic knowledge.

• Learning-Based: Trained on human-annotated corpora like the Penn Treebank.

– Statistical models: Hidden Markov Model (HMM), Maximum Entropy Markov Model (MEMM), Conditional Random Field (CRF), Neural Networks (NN)

– Rule learning: Transformation Based Learning (TBL)

• Generally, learning-based approaches have been found to be more effective overall, taking into account the total amount of human expertise and effort involved.

Page 21:

Simple taggers

• Default tagger has one tag per word, and assigns it on the basis of dictionary lookup

– Tags may indicate ambiguity but not resolve it, e.g. NVB for noun-or-verb

• Words may be assigned different tags with associated probabilities

– Tagger will assign the most probable tag unless there is some way to identify when a less probable tag is in fact correct

• Tag sequences may be defined by regular expressions, and assigned probabilities (including 0 for illegal sequences – negative rules)
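NLTK exposes simple taggers of exactly this kind; a minimal sketch (again assuming the Brown sample is available) chains a dictionary-lookup unigram tagger with a default tag as backoff for unknown words.

# Sketch: default tagger + dictionary-lookup tagger with backoff.
# Assumes nltk.download('brown') has been run once.
import nltk

tagged_sents = nltk.corpus.brown.tagged_sents(categories='news')
train = tagged_sents[:4000]

# Default tagger: the same tag for everything (a crude fallback).
default = nltk.DefaultTagger('NN')
# Unigram tagger: per-word lookup of the most probable tag seen in training,
# backing off to the default tagger for unknown words.
lookup = nltk.UnigramTagger(train, backoff=default)

print(lookup.tag("She promised to back the bill".split()))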

Page 22:

Rule-based taggers

• Earliest type of tagging: two stages

• Stage 1: look up word in lexicon to give list of potential POSs

• Stage 2: Apply rules which certify or disallow tag sequences

• Rules originally handwritten; more recently Machine Learning methods can be used

Page 23:

Start with a POS Dictionary

• she: PRP

• promised: VBN,VBD

• to: TO

• back: VB, JJ, RB, NN

• the: DT

• bill: NN, VB

• Etc… for the ~100,000 words of English

Page 24:

Assign All Possible POS to Each Word

She: PRP
promised: VBN, VBD
to: TO
back: VB, JJ, RB, NN
the: DT
bill: NN, VB

She promised to back the bill

Page 25:

Apply Rules Eliminating Some POS

E.g., eliminate VBN if VBD is an option when VBN|VBD follows “<start> PRP”

She: PRP
promised: VBN, VBD
to: TO
back: VB, JJ, RB, NN
the: DT
bill: NN, VB

She promised to back the bill

Page 26:

Apply Rules Eliminating Some POS

E.g., eliminate VBN if VBD is an option when VBN|VBD follows “<start> PRP”

She: PRP
promised: VBD
to: TO
back: VB, JJ, RB, NN
the: DT
bill: NN, VB

She promised to back the bill
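A toy version of this two-stage lookup-and-eliminate scheme, using the dictionary from Page 23 and only the single rule above, might look like the sketch below; the function name and data representation are illustrative, not taken from any particular tagger.

# Toy rule-based tagging: dictionary lookup, then a constraint rule that
# eliminates tags from each word's candidate set.
lexicon = {
    'she': ['PRP'],
    'promised': ['VBN', 'VBD'],
    'to': ['TO'],
    'back': ['VB', 'JJ', 'RB', 'NN'],
    'the': ['DT'],
    'bill': ['NN', 'VB'],
}

def eliminate_vbn_after_sentence_initial_prp(candidates):
    # Rule: eliminate VBN if VBD is an option, when VBN|VBD follows "<start> PRP".
    for i in range(1, len(candidates)):
        prev_is_start_prp = (i == 1 and candidates[0] == ['PRP'])
        if prev_is_start_prp and 'VBN' in candidates[i] and 'VBD' in candidates[i]:
            candidates[i] = [t for t in candidates[i] if t != 'VBN']
    return candidates

words = "She promised to back the bill".lower().split()
candidates = [list(lexicon[w]) for w in words]        # Stage 1: lookup
candidates = eliminate_vbn_after_sentence_initial_prp(candidates)  # Stage 2: rules
for w, tags in zip(words, candidates):
    print(w, tags)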

Page 27:

EngCG (ENGTWOL) Tagger

• Richer dictionary includes morphological and syntactic features (e.g. subcategorization frames) as well as possible POS

• Uses two-level morphological analysis on input and returns all possible POS

• Apply negative constraints (> 3744) to rule out incorrect POS

Page 28:

Sample ENGTWOL Dictionary

Page 29:

ENGTWOL Tagging: Stage 1

• First Stage: Run words through FST morphological analyzer to get POS info from morph

• E.g.: Pavlov had shown that salivation …

Pavlov      PAVLOV N NOM SG PROPER
had         HAVE V PAST VFIN SVO
            HAVE PCP2 SVO
shown       SHOW PCP2 SVOO SVO SV
that        THAT ADV
            THAT PRON DEM SG
            THAT DET CENTRAL DEM SG
            THAT CS
salivation  SALIVATION N NOM SG

Page 30:

ENGTWOL Tagging: Stage 2

• Second Stage: Apply NEGATIVE constraints

• E.g., Adverbial that rule

–Eliminate all readings of that except the one in It isn’t that odd.

Given input: that

If (+1 A/ADV/QUANT) ; if next word is adj/adv/quantifier

(+2 SENT-LIM) ; followed by E-O-S

(NOT -1 SVOC/A) ; and the previous word is not a verb like consider which allows adjective complements (e.g. I consider that odd)

Then eliminate non-ADV tags Else eliminate ADV
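Read as code over a window of candidate readings, the same constraint might be sketched as below; the token representation and the feature labels (ADJ/ADV/QUANT, SENT-LIM, SVOC/A) are simplified stand-ins for the much richer ENGTWOL analyses.

# Sketch of the "adverbial that" constraint as a function over token readings.
# Each token is a dict: {'word': ..., 'readings': set of candidate labels}.

def adverbial_that_constraint(tokens, i):
    if tokens[i]['word'].lower() != 'that':
        return
    nxt = tokens[i + 1] if i + 1 < len(tokens) else None
    nxt2 = tokens[i + 2] if i + 2 < len(tokens) else None
    prev = tokens[i - 1] if i > 0 else None

    next_is_adj_adv_quant = nxt is not None and nxt['readings'] & {'ADJ', 'ADV', 'QUANT'}
    followed_by_sentence_end = nxt2 is None or 'SENT-LIM' in nxt2['readings']
    prev_licenses_adj_complement = prev is not None and 'SVOC/A' in prev['readings']

    if next_is_adj_adv_quant and followed_by_sentence_end and not prev_licenses_adj_complement:
        tokens[i]['readings'] = {'ADV'}        # keep only the adverbial reading
    else:
        tokens[i]['readings'].discard('ADV')   # otherwise rule the adverbial reading out

# "It isn't that odd." -- 'that' should keep only its ADV reading.
tokens = [
    {'word': 'It',    'readings': {'PRON'}},
    {'word': "isn't", 'readings': {'V'}},
    {'word': 'that',  'readings': {'ADV', 'DET', 'PRON', 'CS'}},
    {'word': 'odd',   'readings': {'ADJ'}},
    {'word': '.',     'readings': {'SENT-LIM'}},
]
adverbial_that_constraint(tokens, 2)
print(tokens[2]['readings'])   # expected: {'ADV'}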

Page 31:

How do they work?

• Tagger must be “trained”

• Many different techniques, but typically …

• Small “training corpus” hand-tagged

• Tagging rules learned automatically

• Rules define most likely sequence of tags

• Rules based on

– Internal evidence (morphology)

– External evidence (context)

– Probabilities

Page 32:

What probabilities do we have to learn?

(a) Individual word probabilities:

Probability that a given tag t is appropriate for a given word w

– Easy (in principle): learn from training corpus:

P(t|w) = f(t,w) / f(w)

E.g., run occurs 4800 times in the training corpus, 3600 times as a verb and 1200 times as a noun: P(verb|run) = 3600/4800 = 0.75

– Problem of “sparse data”:

• Add a small amount to each count, so we get no zeros:

P(t|w) = (f(t,w) + 1) / (f(w) + V)

where V is the size of the vocabulary
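A minimal sketch of these estimates, reproducing the run example and the add-one formula exactly as written above; the counts and the value chosen for V are stand-ins.

# Toy stand-in for counts extracted from a tagged training corpus.
# f[w][t] = number of times word w was tagged t.
from collections import Counter, defaultdict

f = defaultdict(Counter)
f['run']['verb'] = 3600
f['run']['noun'] = 1200
V = 50_000                       # vocabulary size (stand-in value)

def p_tag_given_word(t, w):
    # Unsmoothed maximum-likelihood estimate: P(t|w) = f(t,w) / f(w)
    return f[w][t] / sum(f[w].values())

def p_tag_given_word_smoothed(t, w):
    # Add-one smoothing as on the slide: P(t|w) = (f(t,w) + 1) / (f(w) + V)
    return (f[w][t] + 1) / (sum(f[w].values()) + V)

print(p_tag_given_word('verb', 'run'))           # 0.75
print(p_tag_given_word_smoothed('verb', 'run'))  # smoothed estimate
print(p_tag_given_word_smoothed('adj', 'run'))   # unseen tag: small but non-zero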

Page 33:

(b) Tag sequence probability:

Probability that a given tag sequence t1,t2,…,tn is appropriate for a given word sequence w1,w2,…,wn

– P(t1,t2,…,tn | w1,w2,…,wn ) = ???

– Too hard to calculate for entire sequence:

P(t1,t2,t3,t4,...) = P(t1) P(t2|t1) P(t3|t1,t2) P(t4|t1,t2,t3) …

– Subsequence is more tractable

– Sequence of 2 or 3 should be enough:

Bigram model: P(t1,t2) = P(t2|t1)

Trigram model: P(t1,t2,t3) = P(t2|t1) P(t3|t2)

N-gram model: P(t1,…,tn) = ∏ i=1..n P(ti|ti−1)
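The bigram transition probabilities P(ti|ti−1) can be estimated by counting tag pairs in a tagged corpus; a small sketch with NLTK's conditional frequency distributions (the Brown sample is assumed to be downloaded; 'AT' and 'NN' are Brown-tagset tags for article and singular noun).

# Sketch: maximum-likelihood bigram tag transition probabilities.
# Assumes nltk.download('brown') has been run once.
import nltk

tagged_sents = nltk.corpus.brown.tagged_sents(categories='news')

# Collect (previous_tag, current_tag) pairs, with a <s> marker at sentence start.
pairs = []
for sent in tagged_sents:
    prev = '<s>'
    for _, tag in sent:
        pairs.append((prev, tag))
        prev = tag

cfd = nltk.ConditionalFreqDist(pairs)
cpd = nltk.ConditionalProbDist(cfd, nltk.MLEProbDist)

# P(current tag | previous tag), e.g. how likely an article is followed by a noun.
print(cpd['AT'].prob('NN'))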

Page 34:

More complex taggers

• Bigram taggers assign tags on the basis of sequences of two words (usually assigning the tag to word_n on the basis of word_n−1)

• An nth-order tagger assigns tags on the basis of sequences of n words

• As the value of n increases, so does the complexity of the statistical calculation involved in comparing probability combinations.

Page 35:

Transformation-based tagging

• Eric Brill (1993)

• Start from an initial tagging, and apply a series of transformations

• Transformations are learned as well, from the training data

• Captures the tagging data in far fewer parameters than stochastic models

• The transformations learned (often) have linguistic “reality”

Page 36:

Transformation-based tagging

• Three stages:

– Lexical look-up

– Lexical rule application for unknown words

– Contextual rule application to correct mis-taggings

Page 37:

Transformation-based learning

• Change tag a to b when:

– Internal evidence (morphology)

– Contextual evidence

• One or more of the preceding/following words has a specific tag

• One or more of the preceding/following words is a specific word

• One or more of the preceding/following words has a certain form

• Order of rules is important

– Rules can change a correct tag into an incorrect tag, so another rule might correct that “mistake”

Page 38:

Transformation-based tagging: examples

• if a word is currently tagged NN, and has a suffix of length 1 which consists of the letter 's', change its tag to NNS

• if a word has a suffix of length 2 consisting of the letter sequence 'ly', change its tag to RB (regardless of the initial tag)

• change VBN to VBD if previous word is tagged as NN

• Change VBD to VBN if previous word is ‘by’

Page 39:

Transformation-based tagging: example

Example after lexical lookup:
Booth/NP killed/VBN Abraham/NP Lincoln/NP
Abraham/NP Lincoln/NP was/BEDZ shot/VBD by/BY Booth/NP
He/PPS witnessed/VBD Lincoln/NP killed/VBN by/BY Booth/NP

Example after application of contextual rule ’vbn vbd PREVTAG np’:
Booth/NP killed/VBD Abraham/NP Lincoln/NP
Abraham/NP Lincoln/NP was/BEDZ shot/VBD by/BY Booth/NP
He/PPS witnessed/VBD Lincoln/NP killed/VBD by/BY Booth/NP

Example after application of contextual rule ’vbd vbn NEXTWORD by’:
Booth/NP killed/VBD Abraham/NP Lincoln/NP
Abraham/NP Lincoln/NP was/BEDZ shot/VBN by/BY Booth/NP
He/PPS witnessed/VBD Lincoln/NP killed/VBN by/BY Booth/NP

Page 40:

Templates for TBL

Page 41:

Sample TBL Rule Application

• Labels every word with its most-likely tag
– E.g. race occurrences in the Brown corpus:
• P(NN|race) = .98
• P(VB|race) = .02
• is/VBZ expected/VBN to/TO race/NN tomorrow/NN

• Then TBL applies the following rule
– “Change NN to VB when previous tag is TO”

… is/VBZ expected/VBN to/TO race/NN tomorrow/NN
becomes
… is/VBZ expected/VBN to/TO race/VB tomorrow/NN

Page 42:

TBL Tagging Algorithm

• Step 1: Label every word with most likely tag (from dictionary)

• Step 2: Check every possible transformation & select the one which most improves tagging accuracy (compared with the gold standard)

• Step 3: Re-tag corpus applying this rule, and add rule to end of rule set

• Repeat 2-3 until some stopping criterion is reached, e.g., X% correct with respect to training corpus

• RESULT: Ordered set of transformation rules to use on new data tagged only with most likely POS tags
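NLTK ships an implementation of this algorithm (Brill tagging); a minimal sketch that learns a handful of transformation rules on top of a unigram initial tagger and prints them (the Brown sample is assumed to be downloaded; the corpus slice, rule count, and template set are arbitrary choices).

# Sketch: transformation-based (Brill) tagger training with NLTK.
# Assumes nltk.download('brown') has been run once.
import nltk
from nltk.tag.brill_trainer import BrillTaggerTrainer
from nltk.tag.brill import brill24

tagged_sents = nltk.corpus.brown.tagged_sents(categories='news')
train = tagged_sents[:4000]

# Step 1 equivalent: the initial tagger assigns each word its most likely tag.
initial = nltk.UnigramTagger(train, backoff=nltk.DefaultTagger('NN'))

# Steps 2-3: greedily learn transformation rules from a standard template set.
trainer = BrillTaggerTrainer(initial, brill24(), trace=0)
brill_tagger = trainer.train(train, max_rules=20)

# The learned rules are an ordered, human-readable list.
for rule in brill_tagger.rules():
    print(rule)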

Page 43:

TBL Issues

• Problem: Could keep applying (new) transformations ad infinitum

• Problem: Rules are learned in ordered sequence

• Problem: Rules may interact

• But: Rules are compact and can be inspected by humans

Page 44:

Evaluating Tagging Approaches

• For any NLP problem, we need to know how to evaluate our solutions

• Possible Gold Standards – ceiling:

– Annotated naturally occurring corpus

– Human task performance (96-97%)

• How well do humans agree?

• Kappa statistic: avg pairwise agreement corrected for chance agreement

– Can be hard to obtain for some tasks: sometimes humans don’t agree
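For two annotators, kappa is (P_o − P_e) / (1 − P_e), where P_o is the observed agreement and P_e the agreement expected by chance; a small self-contained sketch with made-up annotations:

# Sketch: Cohen's kappa for two annotators over the same tokens.
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    n = len(labels_a)
    # Observed agreement: fraction of items labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: probability both pick the same label independently,
    # based on each annotator's label distribution.
    dist_a, dist_b = Counter(labels_a), Counter(labels_b)
    p_e = sum((dist_a[l] / n) * (dist_b[l] / n) for l in set(labels_a) | set(labels_b))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical tag decisions by two annotators on the same six tokens.
ann1 = ['NN', 'VB', 'NN', 'DT', 'NN', 'JJ']
ann2 = ['NN', 'VB', 'NN', 'DT', 'VB', 'JJ']
print(cohen_kappa(ann1, ann2))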

Page 45:

• Baseline: how well does simple method do?

– For tagging, most common tag for each word (90%)

– How much improvement do we get over baseline?

Page 46:

Methodology: Error Analysis

• Confusion matrix:

– E.g. which tags did we most often confuse with which other tags?

– How much of the overall error does each confusion account for?

[Confusion matrix with rows and columns labelled VB, TO, NN: cell (i, j) counts how often gold tag i was predicted as tag j]
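A minimal sketch that builds such a matrix from gold and predicted tag sequences (the two tag lists below are made up purely for illustration):

# Sketch: confusion counts between gold and predicted tags.
from collections import Counter

def confusion_matrix(gold, predicted):
    # counts[(gold_tag, predicted_tag)] = number of tokens with that confusion
    return Counter(zip(gold, predicted))

gold      = ['VB', 'TO', 'NN', 'VB', 'NN', 'TO']
predicted = ['VB', 'TO', 'VB', 'VB', 'NN', 'IN']

matrix = confusion_matrix(gold, predicted)
for (g, p), count in matrix.most_common():
    marker = '' if g == p else '   <-- confusion'
    print(f"gold={g:3} predicted={p:3} count={count}{marker}")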

Page 47:

More Complex Issues

• Tag indeterminacy: when ‘truth’ isn’t clear (Caribbean cooking, child seat)

• Tagging multipart words: wouldn’t → would/MD n’t/RB

• How to handle unknown words
– Assume all tags equally likely
– Assume same tag distribution as all other singletons in the corpus
– Use morphology, word length, …

Page 48:

Classification Learning

• Typical machine learning addresses the problem of classifying a feature-vector description into a fixed number of classes.

• There are many standard learning methods for this task:

– Decision Trees and Rule Learning

– Naïve Bayes and Bayesian Networks

– Logistic Regression / Maximum Entropy (MaxEnt)

– Perceptron and Neural Networks

– Support Vector Machines (SVMs)

– Nearest-Neighbor / Instance-Based

Page 49:

Beyond Classification Learning

• Standard classification problem assumes individual cases are disconnected and independent (i.i.d.: independently and identically distributed).

• Many NLP problems do not satisfy this assumption and involve making many connected decisions, each resolving a different ambiguity, but which are mutually dependent.

• More sophisticated learning and inference techniques are needed to handle such situations in general.

Page 50:

Sequence Labeling Problem

• Many NLP problems can be viewed as sequence labeling.

• Each token in a sequence is assigned a label.

• Labels of tokens are dependent on the labels of other tokens in the sequence, particularly their neighbors (not i.i.d).

foo bar blam zonk zonk bar blam

Page 51:

Information Extraction

• Identify phrases in language that refer to specific types of entities and relations in text.

• Named entity recognition is the task of identifying names of people, places, organizations, etc. in text.

Labels: people, organizations, places

– Michael Dell is the CEO of Dell Computer Corporation and lives in Austin Texas.

• Extract pieces of information relevant to a specific application, e.g. used car ads:

Labels: make, model, year, mileage, price

– For sale, 2002 Toyota Prius, 20,000 mi, $15K or best offer. Available starting July 30, 2006.

Page 52:

Semantic Role Labeling

• For each clause, determine the semantic role played by each noun phrase that is an argument to the verb.

Labels: agent, patient, source, destination, instrument

– John drove Mary from Austin to Dallas in his Toyota Prius.

– The hammer broke the window.

• Also referred to as “case role analysis,” “thematic analysis,” and “shallow semantic parsing”

Page 53:

Bioinformatics

• Sequence labeling is also valuable for labeling genetic sequences in genome analysis.

Labels: exon, intron

– AGCTAACGTTCGATACGGATTACAGCCT

Page 54:

Sequence Labeling as Classification

• Classify each token independently, but use information about the surrounding tokens as input features (sliding window).

John saw the saw and decided to take it to the table.

• (Pages 54–65 repeat this slide, sliding the window one token to the right each time.) The classifier's outputs, token by token:

NNP VBD DT NN CC VBD TO VB PRP IN DT NN
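A minimal sketch of this setup, with a one-word window on each side as features and scikit-learn's logistic regression standing in for the classifier; the feature template, corpus slice, and model choice are all illustrative assumptions (NLTK's Brown sample and scikit-learn are assumed installed).

# Sketch: sliding-window token classification for POS tagging.
# Assumes nltk.download('brown') has been run once, and scikit-learn is installed.
import nltk
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tagged_sents = nltk.corpus.brown.tagged_sents(categories='news')[:500]

def window_features(words, i):
    # Features for token i: the token itself plus its immediate neighbours.
    return {
        'word': words[i].lower(),
        'prev_word': words[i - 1].lower() if i > 0 else '<s>',
        'next_word': words[i + 1].lower() if i < len(words) - 1 else '</s>',
        'suffix2': words[i][-2:].lower(),
    }

X, y = [], []
for sent in tagged_sents:
    words = [w for w, _ in sent]
    for i, (_, tag) in enumerate(sent):
        X.append(window_features(words, i))
        y.append(tag)

model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=200))
model.fit(X, y)

test = "John saw the saw and decided to take it to the table .".split()
pred = model.predict([window_features(test, i) for i in range(len(test))])
print(list(zip(test, pred)))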

Page 66:

Sequence Labeling as Classification Using Outputs as Inputs

• Better input features are usually the categories of the surrounding tokens, but these are not available yet.

• Can use category of either the preceding or succeeding tokens by going forward or back and using previous output.

Page 67:

Forward Classification

John saw the saw and decided to take it to the table.

• (Pages 67–78 repeat this slide, moving left to right; at each step the classifier also sees the tags already assigned to the preceding tokens.) The predicted tags accumulate as:

NNP
NNP VBD
NNP VBD DT
…
NNP VBD DT NN CC VBD TO VB PRP IN DT NN

Page 79:

Backward Classification

• Disambiguating “to” in this case would be even easier backward.

John saw the saw and decided to take it to the table.

• (Pages 79–90 repeat this slide, moving right to left; at each step the classifier also sees the tags already assigned to the following tokens.) The predicted tags accumulate from the right:

NN
DT NN
IN DT NN
…
NNP VBD DT VBD CC VBD TO VB PRP IN DT NN

Page 91:

Problems with Sequence Labeling as Classification

• Not easy to integrate information from category of tokens on both sides.

• Difficult to propagate uncertainty between decisions and “collectively” determine the most likely joint assignment of categories to all of the tokens in a sequence.

Page 92:

Stochastic taggers

• Nowadays, pretty much all taggers are statistics-based, and have been since the 1980s (some primitive algorithms were already published in the 60s and 70s)

• Commonly based on Hidden Markov Models (also found in speech processing, etc.)

Page 93:

(Hidden) Markov Models

• Probability calculations imply Markov models: we assume that P(t|w) depends only on the previous word (or on a short sequence of previous words)

• (Informally) Markov models are the class of probabilistic models that assume we can predict the future without taking into account too much of the past

• Markov chains can be modelled by finite state automata: the next state in a Markov chain is always dependent on some finite history of previous states

• Model is “hidden” if it is actually a succession of Markov models, whose intermediate states are of no interest
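NLTK includes an HMM tagger along these lines; a minimal supervised sketch on the Brown sample (assumed downloaded) is below. The same trainer class also offers a train_unsupervised method, which corresponds to the forward-backward training discussed on Page 95.

# Sketch: supervised HMM tagger training with NLTK.
# Assumes nltk.download('brown') has been run once.
import nltk
from nltk.tag.hmm import HiddenMarkovModelTrainer
from nltk.probability import LidstoneProbDist

tagged_sents = nltk.corpus.brown.tagged_sents(categories='news')
train = tagged_sents[:4000]

# Supervised training: transition P(tag_i | tag_i-1) and emission P(word | tag)
# are estimated from the labelled corpus; a smoothed estimator avoids zero
# probabilities for rare events.
trainer = HiddenMarkovModelTrainer()
hmm_tagger = trainer.train_supervised(
    train, estimator=lambda fd, bins: LidstoneProbDist(fd, 0.1, bins))

print(hmm_tagger.tag("She promised to back the bill".split()))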

Page 94:

Supervised vs unsupervised training

• Learning tagging rules from a marked-up corpus (supervised learning) gives very good results (98% accuracy)

– Though simply assigning the most probable tag, and “proper noun” to unknown words, will already give 90%

• But it depends on having a corpus already marked up to a high quality

• If this is not available, we have to try something else:

– “forward-backward” algorithm

– A kind of “bootstrapping” approach

Page 95:

Forward-backward (Baum-Welch) algorithm

• Start with initial probabilities

– If nothing known, assume all Ps equal

• Adjust the individual probabilities so as to increase the overall probability.

• Re-estimate the probabilities on the basis of the last iteration

• Continue until convergence

– i.e. there is no improvement, or improvement is below a threshold

• All this can be done automatically

Page 96:

Conclusions

• POS tagging is the lowest level of syntactic analysis.

• It is an instance of sequence labeling, a collective classification task that also has applications in information extraction, phrase chunking, semantic role labeling, and bioinformatics.