Algorithms for NLP (CMU 11-711, Fall 2017), Lecture 10: Parsing II
Tagging / Parsing. Taylor Berg-Kirkpatrick (CMU). Slides: Dan Klein (UC Berkeley). Algorithms for NLP.


Page 1: Algorithms for NLP - cs.cmu.edutbergkir/11711fa17/FA17 11-711 lecture 10 -- parsing II.pdf · Global Discriminative Taggers §Newer, higher-powered discriminative sequence models

Tagging / Parsing
Taylor Berg-Kirkpatrick, CMU

Slides: Dan Klein, UC Berkeley

Algorithms for NLP

Page 2:

So How Well Does It Work?
§ Choose the most common tag:
  § 90.3% with a bad unknown word model
  § 93.7% with a good one
§ TnT (Brants, 2000):
  § A carefully smoothed trigram tagger
  § Suffix trees for emissions
  § 96.7% on WSJ text (SOTA is 97+%)
§ Noise in the data:
  § Many errors in the training and test corpora
  § Probably about 2% guaranteed error from noise (on this data)

Possible taggings of the same phrase:
NN NN NN    chief executive officer
JJ NN NN    chief executive officer
JJ JJ NN    chief executive officer
NN JJ NN    chief executive officer

DT  NN      IN NN        VBD     NNS   VBD
The average of interbank offered rates plummeted …
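The "choose the most common tag" baseline from this slide is easy to sketch. A minimal version (toy data and function names are my own, not from the slides; unknown words fall back to the globally most common tag, i.e. a deliberately bad unknown-word model):

```python
from collections import Counter, defaultdict

def train_most_frequent(tagged_sents):
    """Count (word, tag) pairs and the overall tag distribution."""
    word_tags = defaultdict(Counter)
    all_tags = Counter()
    for sent in tagged_sents:
        for word, tag in sent:
            word_tags[word][tag] += 1
            all_tags[tag] += 1
    return word_tags, all_tags

def tag(sent, word_tags, all_tags):
    """Tag each word with its most frequent training tag;
    unknown words get the globally most common tag."""
    default = all_tags.most_common(1)[0][0]
    return [word_tags[w].most_common(1)[0][0] if w in word_tags else default
            for w in sent]

train = [[("the", "DT"), ("can", "NN"), ("was", "VBD"), ("rusted", "VBN")],
         [("the", "DT"), ("average", "NN"), ("plummeted", "VBD")]]
wt, at = train_most_frequent(train)
print(tag(["the", "can", "plummeted"], wt, at))  # ['DT', 'NN', 'VBD']
```

On real WSJ data this is roughly the ~90% baseline the slide quotes; the interesting work is all in the unknown-word model.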

Page 3:

Overview: Accuracies
§ Roadmap of (known / unknown) accuracies:
  § Most freq tag:   ~90% / ~50%
  § Trigram HMM:     ~95% / ~55%
  § TnT (HMM++):     96.2% / 86.0%
  § Maxent P(t|w):   93.7% / 82.6%
  § MEMM tagger:     96.9% / 86.9%
  § State-of-the-art: 97+% / 89+%
  § Upper bound:     ~98%

Most errors are on unknown words.

Page 4:

Common Errors
§ Common errors [from Toutanova & Manning 00]:

NN/JJ NN          official knowledge
VBD RP/IN DT NN   made up the story
RB VBD/VBN NNS    recently sold shares

Page 5:

Richer Features

Page 6:

Better Features
§ Can do surprisingly well just looking at a word by itself:
  § Word:            the: the → DT
  § Lowercased word: Importantly: importantly → RB
  § Prefixes:        unfathomable: un- → JJ
  § Suffixes:        Surprisingly: -ly → RB
  § Capitalization:  Meridian: CAP → NNP
  § Word shapes:     35-year: d-x → JJ
§ Then build a maxent (or whatever) model to predict the tag
§ Maxent P(t|w): 93.7% / 82.6%

[Figure: tag s3 predicted from word w3 alone.]

Page 7:

Why Local Context is Useful
§ Lots of rich local information!
§ We could fix this with a feature that looked at the next word
§ We could fix this by linking capitalized words to their lowercase versions
§ Solution: discriminative sequence models (MEMMs, CRFs)
§ Reality check:
  § Taggers are already pretty good on newswire text…
  § What the world needs is taggers that work on other text!

PRP  VBD  IN RB   IN PRP VBD .
They left as soon as he  arrived .   (the first "as" should be RB)

NNP       NNS   VBD      VBN .
Intrinsic flaws remained undetected .   ("Intrinsic" should be JJ)

Page 8:

Sequence-Free Tagging?
§ What about looking at a word and its environment, but no sequence information?
  § Add in previous / next word:    the __
  § Previous / next word shapes:    X __ X
  § Crude entity detection:         __ ….. (Inc.|Co.)
  § Phrasal verb in sentence?       put …… __
  § Conjunctions of these things
§ All features except sequence: 96.6% / 86.8%
§ Uses lots of features: >200K

[Figure: tag t3 predicted from words w2, w3, w4.]

Page 9:

Named Entity Recognition
§ Other sequence tasks use similar models
§ Example: named entity recognition (NER)

Local Context:
        Prev   Cur    Next
State   ???    LOC    ???
Word    at     Grace  Road
Tag     IN     NNP    NNP
Sig     x      Xx     Xx

Tim Boon has signed a contract extension with Leicestershire which will keep him at Grace Road .
PER PER  O   O      O O        O         O    ORG            O     O    O    O   O  LOC   LOC  O

Page 10:

MEMM Taggers
§ Idea: left-to-right local decisions, condition on previous tags and also the entire input
§ Train up P(t_i | w, t_{i-1}, t_{i-2}) as a normal maxent model, then use it to score sequences
§ This is referred to as an MEMM tagger [Ratnaparkhi 96]
§ Beam search is effective! (Why?)
§ What about beam size 1?
§ Subtle issues with local normalization (cf. Lafferty et al. 01)
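The left-to-right decisions with a beam can be sketched as follows. The local scoring function here is a hypothetical stand-in for a trained maxent P(t_i | w, t_{i-1}); the toy score table and all names are illustrative, not from the slides:

```python
def beam_decode(words, tags, local_score, beam_size=2):
    """Left-to-right beam search for an MEMM-style tagger.
    local_score(words, i, prev_tag, tag) stands in for P(t_i | w, t_{i-1})."""
    beam = [((), 1.0)]  # (partial tag sequence, score)
    for i in range(len(words)):
        candidates = []
        for seq, score in beam:
            prev = seq[-1] if seq else "<s>"
            for t in tags:
                candidates.append((seq + (t,), score * local_score(words, i, prev, t)))
        beam = sorted(candidates, key=lambda c: -c[1])[:beam_size]
    return list(beam[0][0])

# Toy local model: determiners start sentences, nouns follow determiners,
# verbs follow nouns; everything else gets a small floor probability.
def toy_score(words, i, prev, t):
    table = {("<s>", "DT"): 0.9, ("DT", "NN"): 0.9, ("NN", "VBD"): 0.8}
    return table.get((prev, t), 0.05)

print(beam_decode(["the", "can", "rusted"], ["DT", "NN", "VBD"], toy_score))
# ['DT', 'NN', 'VBD']
```

Beam size 1 makes this a purely greedy tagger, which is the degenerate case the slide asks about.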

Page 11:

NER Features

Feature type           Feature    PERS    LOC
Previous word          at         -0.73    0.94
Current word           Grace       0.03    0.00
First char of word     G           0.45   -0.04
Current POS tag        NNP         0.47    0.45
Prev and cur tags      IN NNP     -0.10    0.14
Previous state         Other      -0.70   -0.92
Current signature      Xx          0.80    0.46
Prev state, cur sig    O-Xx        0.68    0.37
Prev-cur-next sig      x-Xx-Xx    -0.69    0.37
P. state - p-cur sig   O-x-Xx     -0.20    0.82
…
Total:                            -0.58    2.68

Local Context:
        Prev   Cur    Next
State   Other  LOC    ???
Word    at     Grace  Road
Tag     IN     NNP    NNP
Sig     x      Xx     Xx

Feature Weights: because of the regularization term, the more common prefixes have larger weights even though entire-word features are more specific.

Page 12:

Decoding
§ Decoding MEMM taggers:
  § Just like decoding HMMs, but with different local scores
  § Viterbi, beam search, posterior decoding
§ Viterbi algorithm (HMMs):   v_i(s) = max_{s'} v_{i-1}(s') · P(s | s') · P(w_i | s)
§ Viterbi algorithm (MEMMs):  v_i(s) = max_{s'} v_{i-1}(s') · P(s | s', w)
§ General:                    v_i(s) = max_{s'} v_{i-1}(s') · φ_i(s', s)
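The general recurrence, with any local score plugged in, can be written once and reused for HMMs and MEMMs. A minimal sketch (toy transition/emission tables and all names are my own, not the slides'):

```python
def viterbi(words, states, start, local_score):
    """Generic Viterbi: local_score(i, prev, s) plays the role of
    P(s|s')·P(w_i|s) for an HMM, or a conditional local score for an MEMM."""
    best = {start: (1.0, [])}          # state -> (best score, best path)
    for i in range(len(words)):
        new = {}
        for s in states:
            p, path = max(((q * local_score(i, sp, s), path)
                           for sp, (q, path) in best.items()),
                          key=lambda c: c[0])
            new[s] = (p, path + [s])
        best = new
    return max(best.values(), key=lambda c: c[0])[1]

# Toy HMM over two states for "Fed raises".
trans = {("^", "N"): 0.8, ("^", "V"): 0.2, ("N", "N"): 0.3, ("N", "V"): 0.7,
         ("V", "N"): 0.6, ("V", "V"): 0.4}
emit = {("N", "Fed"): 0.9, ("V", "Fed"): 0.1,
        ("N", "raises"): 0.3, ("V", "raises"): 0.7}
sent = ["Fed", "raises"]
print(viterbi(sent, ["N", "V"], "^",
              lambda i, sp, s: trans[(sp, s)] * emit[(s, sent[i])]))
# ['N', 'V']
```

Switching to an MEMM only changes the `local_score` closure; the dynamic program is identical, which is the point of the "General" line above.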

Page 13:

Conditional Random Fields (and Friends)

Page 14:

Maximum Entropy II
§ Remember: the maximum entropy objective
§ Problem: lots of features allow a perfect fit to the training set
§ Regularization (compare to smoothing)

Page 15:

Derivative for Maximum Entropy

∂L/∂λ_n = Σ_i f_n(x_i, y_i)  −  Σ_i Σ_y P(y | x_i) f_n(x_i, y)  −  λ_n / σ²

(total count of feature n in correct candidates) − (expected count of feature n in predicted candidates) − (regularization: big weights are bad)

Page 16:

Perceptron Review

Page 17:

Perceptron
§ Linear models:
  § … that decompose along the sequence
  § … allow us to predict with the Viterbi algorithm
  § … which means we can train with the perceptron algorithm (or related updates, like MIRA)

[Collins 01]
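The structured perceptron update in [Collins 01] is: decode with Viterbi, then add the features of the gold sequence and subtract the features of the guessed sequence. A minimal sketch (the feature function and names are illustrative assumptions):

```python
from collections import Counter

def perceptron_update(weights, features, gold_tags, guess_tags, sentence):
    """One structured-perceptron step: w += f(gold) - f(guess).
    features(sentence, i, prev, cur) returns a list of feature names."""
    def seq_feats(tags):
        counts, prev = Counter(), "<s>"
        for i, t in enumerate(tags):
            counts.update(features(sentence, i, prev, t))
            prev = t
        return counts
    gold, guess = seq_feats(gold_tags), seq_feats(guess_tags)
    # Counter subtraction drops negatives, so apply both directions.
    for f, c in (gold - guess).items():
        weights[f] = weights.get(f, 0.0) + c
    for f, c in (guess - gold).items():
        weights[f] = weights.get(f, 0.0) - c
    return weights

feats = lambda s, i, prev, cur: ["tag=" + cur,
                                 "bigram=%s,%s" % (prev, cur),
                                 "word_tag=%s,%s" % (s[i], cur)]
w = perceptron_update({}, feats, ["DT", "NN"], ["DT", "VB"], ["the", "can"])
print(w["tag=NN"], w["tag=VB"])  # 1.0 -1.0
```

Features shared by gold and guess (here everything involving the correct DT) cancel and are never touched, which is what makes the update cheap.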

Page 18:

Conditional Random Fields
§ Make a maxent model over entire taggings
§ MEMM:  P(t | w) = Π_i P(t_i | t_{i-1}, w)
§ CRF:   P(t | w) = (1 / Z(w)) · Π_i exp( λ · f(t_{i-1}, t_i, w, i) )

Page 19:

CRFs
§ Like any maxent model, the derivative is: observed feature counts minus expected feature counts
§ So all we need is to be able to compute the expectation of each feature under the model distribution (for example, the number of times the label pair DT-NN occurs, or the number of times NN-interest occurs)
§ Critical quantity: counts of posterior marginals, P(t_i = s | w)

Page 20:

Computing Posterior Marginals
§ How many (expected) times is word w tagged with s?
§ How to compute that marginal?

[State lattice figure: a column of states {^, N, V, J, D, $} at each position of "START Fed raises interest rates END".]

Page 21:

Global Discriminative Taggers
§ Newer, higher-powered discriminative sequence models
  § CRFs (also perceptrons, M3Ns)
  § Do not decompose training into independent local regions
  § Can be deathly slow to train: they require repeated inference on the training set
§ Differences tend not to be too important for POS tagging
§ Differences are more substantial on other sequence tasks
§ However: one issue worth knowing about in local models
  § "Label bias" and other explaining-away effects
  § MEMM taggers' local scores can be near one without having both good "transitions" and "emissions"
  § This means that often evidence doesn't flow properly
  § Why isn't this a big deal for POS tagging?
  § Also: in decoding, condition on predicted, not gold, histories

Page 22:

Transformation-Based Learning
§ [Brill 95] presents a transformation-based tagger
§ Label the training set with the most frequent tags:

DT  MD  VBD VBD    .
The can was rusted .

§ Add transformation rules which reduce training mistakes:
  § MD → NN : DT __
  § VBD → VBN : VBD __ .
§ Stop when no transformations do sufficient good
§ Does this remind anyone of anything?
§ Probably the most widely used tagger (esp. outside NLP)
§ … but definitely not the most accurate: 96.6% / 82.0%
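Applying the learned transformations is the simple half of Brill tagging (the expensive half is searching for the rules). A minimal sketch of rule application, with the two rules from this slide in a simplified form (context predicates and names are my own):

```python
def apply_transformations(tags, words, rules):
    """Apply Brill-style transformation rules in order, left to right.
    Each rule is (from_tag, to_tag, context) where context(i, words, tags)
    says whether the trigger environment holds."""
    tags = list(tags)
    for frm, to, context in rules:
        for i in range(len(tags)):
            if tags[i] == frm and context(i, words, tags):
                tags[i] = to
    return tags

# The slide's two rules, simplified to left-context triggers:
# MD -> NN after DT, and VBD -> VBN after VBD.
rules = [
    ("MD", "NN", lambda i, w, t: i > 0 and t[i - 1] == "DT"),
    ("VBD", "VBN", lambda i, w, t: i > 0 and t[i - 1] == "VBD"),
]
words = ["The", "can", "was", "rusted", "."]
tagged = apply_transformations(["DT", "MD", "VBD", "VBD", "."], words, rules)
print(tagged)  # ['DT', 'NN', 'VBD', 'VBN', '.']
```

Each rule patches the residual errors of the rules before it, which is why the slide hints this should remind you of something (boosting-style greedy error reduction).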

Page 23:

Learned Transformations
§ What gets learned? [from Brill 95]

Page 24:

EngCG Tagger
§ English constraint grammar tagger
§ [Tapanainen and Voutilainen 94]
§ Something else you should know about
§ Hand-written and knowledge-driven
§ "Don't guess if you know" (a general point about modeling more structure!)
§ Tagset doesn't make all of the hard distinctions of the standard tagset (e.g. JJ/NN)
§ They get stellar accuracies: 99% on their tagset
§ Linguistic representation matters…
§ … but it's easier to win when you make up the rules

Page 25:

Domain Effects
§ Accuracies degrade outside of domain
  § Up to triple the error rate
  § Usually make the most errors on the things you care about in the domain (e.g. protein names)
§ Open questions:
  § How to effectively exploit unlabeled data from a new domain (what could we gain?)
  § How to best incorporate domain lexica in a principled way (e.g. UMLS specialist lexicon, ontologies)

Page 26:

Unsupervised Tagging

Page 27:

Unsupervised Tagging?
§ AKA part-of-speech induction
§ Task:
  § Raw sentences in
  § Tagged sentences out
§ Obvious thing to do:
  § Start with a (mostly) uniform HMM
  § Run EM
  § Inspect results

Page 28:

EM for HMMs: Process
§ Alternate between recomputing distributions over hidden variables (the tags) and re-estimating parameters
§ Crucial step: we want to tally up how many (fractional) counts of each kind of transition and emission we have under the current params
§ Same quantities we needed to train a CRF!

Page 29:

EM for HMMs: Quantities
§ Total path values (correspond to probabilities here)

Page 30:

The State Lattice / Trellis

[State lattice figure: a column of states {^, N, V, J, D, $} at each position of "START Fed raises interest rates END".]

Page 31:

EM for HMMs: Process
§ From these quantities, we can compute expected transitions
§ And emissions

Page 32:

Merialdo: Setup
§ Some (discouraging) experiments [Merialdo 94]
§ Setup:
  § You know the set of allowable tags for each word
  § Fix k training examples to their true labels
    § Learn P(w | t) on these examples
    § Learn P(t | t-1, t-2) on these examples
  § On n examples, re-estimate with EM
§ Note: we know the allowed tags but not their frequencies

Page 33:

Merialdo: Results

Page 34:

Distributional Clustering

president   the __ of ; the __ said
governor    the __ of ; the __ appointed
said        sources __ ¨ ; president __ that
reported    sources __ ¨

Clusters: {president, governor}, {said, reported}, {the, a}

¨ the president said that the downturn was over ¨

[Finch and Chater 92, Schütze 93, many others]

Page 35:

Distributional Clustering
§ Three main variants on the same idea:
  § Pairwise similarities and heuristic clustering
    § E.g. [Finch and Chater 92]
    § Produces dendrograms
  § Vector space methods
    § E.g. [Schütze 93]
    § Models of ambiguity
  § Probabilistic methods
    § Various formulations, e.g. [Lee and Pereira 99]

Page 36:

Nearest Neighbors

Page 37:

Dendrograms

Page 38:

Dendrograms

Page 39:

Vector Space Version
§ [Schütze 93] clusters words as points in R^n
§ Vectors too sparse; use SVD to reduce:

[Figure: the word-by-context count matrix M factored by SVD as U S V; each word w keeps its row of the reduced representation.]

Cluster these 50-200 dim vectors instead.
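The SVD reduction step can be sketched in a few lines. The toy count matrix below is invented for illustration (two words that share "the __ of"-style contexts, two that share "sources __"-style contexts); it is not data from the slides:

```python
import numpy as np

# Toy word-by-context count matrix M (rows: words, cols: context types).
words = ["president", "governor", "said", "reported"]
M = np.array([[5.0, 3.0, 0.0, 0.0],
              [4.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 6.0, 4.0],
              [0.0, 0.0, 5.0, 3.0]])

# SVD: M = U S V^T; keep the top-k dimensions and cluster rows of U[:, :k] * S[:k].
U, S, Vt = np.linalg.svd(M, full_matrices=False)
k = 2
reduced = U[:, :k] * S[:k]

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Words with similar context distributions land near each other in the
# reduced space, so a standard clusterer can group them.
print(cos(reduced[0], reduced[1]) > abs(cos(reduced[0], reduced[2])))  # True
```

In Schütze's setting the full matrix is huge and sparse; the point of the SVD is that the 50-200 dense dimensions are cheap to cluster and smooth over unseen contexts.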

Page 40:

A Probabilistic Version?

P(S, C) = Π_i P(c_i | c_{i-1}) · P(w_i | c_i)

P(S, C) = Π_i P(c_i) · P(w_i | c_i) · P(w_{i-1}, w_{i+1} | c_i)

¨ the president said that the downturn was over ¨
   c1  c2        c3   c4   c5  c6       c7  c8

(the same sentence, with a class c_i per word, illustrates both models)

Page 41:

What Else?
§ Various newer ideas:
  § Context distributional clustering [Clark 00]
  § Morphology-driven models [Clark 03]
  § Contrastive estimation [Smith and Eisner 05]
  § Feature-rich induction [Haghighi and Klein 06]
§ Also:
  § What about ambiguous words?
  § Using wider context signatures has been used for learning synonyms (what's wrong with this approach?)
  § Can extend these ideas for grammar induction (later)

Page 42:

Computing Marginals

P(t_i = s | w) = (sum of all paths through s at i) / (sum of all paths)

Page 43:

Forward Scores

α_i(s) = Σ_{s'} α_{i-1}(s') · P(s | s') · P(w_i | s)

Page 44:

Backward Scores

β_i(s) = Σ_{s'} P(s' | s) · P(w_{i+1} | s') · β_{i+1}(s')

Page 45:

Total Scores

P(w) = Σ_s α_i(s) · β_i(s),  for any position i
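The forward, backward, and marginal computations on these slides fit in one short function. A minimal sketch for an HMM with explicit start/stop transitions (the toy parameters are invented for illustration):

```python
def forward_backward(words, states, trans, emit, start="^", stop="$"):
    """Posterior marginals P(tag at i = s | words) for an HMM:
    alpha[i][s] = total score of paths into (i, s) including emission at i,
    beta[i][s]  = total score of continuations out of (i, s),
    marginal    = alpha * beta / total score of all paths."""
    n = len(words)
    alpha = [{} for _ in range(n)]
    for i in range(n):
        for s in states:
            inc = (sum(alpha[i-1][sp] * trans[(sp, s)] for sp in states)
                   if i > 0 else trans[(start, s)])
            alpha[i][s] = inc * emit[(s, words[i])]
    beta = [{} for _ in range(n)]
    for s in states:
        beta[n-1][s] = trans[(s, stop)]
    for i in range(n - 2, -1, -1):
        for s in states:
            beta[i][s] = sum(trans[(s, sn)] * emit[(sn, words[i+1])] * beta[i+1][sn]
                             for sn in states)
    total = sum(alpha[n-1][s] * beta[n-1][s] for s in states)
    return [{s: alpha[i][s] * beta[i][s] / total for s in states}
            for i in range(n)]

trans = {("^", "N"): 0.8, ("^", "V"): 0.2, ("N", "N"): 0.3, ("N", "V"): 0.7,
         ("V", "N"): 0.6, ("V", "V"): 0.4, ("N", "$"): 0.5, ("V", "$"): 0.5}
emit = {("N", "Fed"): 0.9, ("V", "Fed"): 0.1,
        ("N", "raises"): 0.3, ("V", "raises"): 0.7}
marg = forward_backward(["Fed", "raises"], ["N", "V"], trans, emit)
print(round(sum(marg[0].values()), 6))  # 1.0
```

These same fractional counts drive both CRF training (expected feature counts) and the E-step of EM for HMMs, as the earlier slides note. Real implementations work in log space or rescale per position to avoid underflow.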

Page 46:

Syntax

Page 47:

Parse Trees

The move followed a round of similar increases by other lenders, reflecting a continuing decline in that market

Page 48:

Phrase Structure Parsing
§ Phrase structure parsing organizes syntax into constituents or brackets
§ In general, this involves nested trees
§ Linguists can, and do, argue about details
§ Lots of ambiguity
§ Not the only kind of syntax…

[Tree figure over "new art critics write reviews with computers", with constituents labeled S, NP, N', VP, NP, PP.]

Page 49:

Constituency Tests
§ How do we know what nodes go in the tree?
§ Classic constituency tests:
  § Substitution by proform
  § Question answers
  § Semantic grounds
    § Coherence
    § Reference
    § Idioms
  § Dislocation
  § Conjunction
§ Cross-linguistic arguments, too

Page 50:

Conflicting Tests
§ Constituency isn't always clear
§ Units of transfer:
  § think about ~ penser à
  § talk about ~ hablar de
§ Phonological reduction:
  § I will go → I'll go
  § I want to go → I wanna go
  § à le centre → au centre
§ Coordination:
  § He went to and came from the store.

La vélocité des ondes sismiques

Page 51:

Classical NLP: Parsing
§ Write symbolic or logical rules:
§ Use deduction systems to prove parses from words
  § Minimal grammar on the "Fed raises" sentence: 36 parses
  § Simple 10-rule grammar: 592 parses
  § Real-size grammar: many millions of parses
§ This scaled very badly and didn't yield broad-coverage tools

Grammar (CFG):
ROOT → S
S → NP VP
NP → DT NN
NP → NN NNS
NP → NP PP
VP → VBP NP
VP → VBP NP PP
PP → IN NP

Lexicon:
NN → interest
NNS → raises
VBP → interest
VBZ → raises

Page 52:

Ambiguities

Page 53:

Ambiguities: PP Attachment

Page 54:

Attachments
§ I cleaned the dishes from dinner
§ I cleaned the dishes with detergent
§ I cleaned the dishes in my pajamas
§ I cleaned the dishes in the sink

Page 55:

Syntactic Ambiguities I
§ Prepositional phrases: They cooked the beans in the pot on the stove with handles.
§ Particle vs. preposition: The puppy tore up the staircase.
§ Complement structures: The tourists objected to the guide that they couldn't hear. / She knows you like the back of her hand.
§ Gerund vs. participial adjective: Visiting relatives can be boring. / Changing schedules frequently confused passengers.

Page 56:

Syntactic Ambiguities II
§ Modifier scope within NPs: impractical design requirements / plastic cup holder
§ Multiple gap constructions: The chicken is ready to eat. / The contractors are rich enough to sue.
§ Coordination scope: Small rats and mice can squeeze into holes or cracks in the wall.

Page 57:

Dark Ambiguities
§ Dark ambiguities: most analyses are shockingly bad (meaning, they don't have an interpretation you can get your mind around)
§ Unknown words and new usages
§ Solution: we need mechanisms to focus attention on the best ones; probabilistic techniques do this

This analysis corresponds to the correct parse of "This will panic buyers!"

Page 58:

Ambiguities as Trees

Page 59:

PCFGs

Page 60:

Probabilistic Context-Free Grammars
§ A context-free grammar is a tuple <N, T, S, R>
  § N: the set of non-terminals
    § Phrasal categories: S, NP, VP, ADJP, etc.
    § Parts-of-speech (pre-terminals): NN, JJ, DT, VB
  § T: the set of terminals (the words)
  § S: the start symbol
    § Often written as ROOT or TOP
    § Not usually the sentence non-terminal S
  § R: the set of rules
    § Of the form X → Y1 Y2 … Yk, with X, Yi ∈ N
    § Examples: S → NP VP, VP → VP CC VP
    § Also called rewrites, productions, or local trees
§ A PCFG adds:
  § A top-down production probability per rule, P(Y1 Y2 … Yk | X)

Page 61:

Treebank Sentences

Page 62:

Treebank Grammars
§ Need a PCFG for broad coverage parsing.
§ Can take a grammar right off the trees (doesn't work well):

ROOT → S        1
S → NP VP .     1
NP → PRP        1
VP → VBD ADJP   1
…

§ Better results by enriching the grammar (e.g., lexicalization).
§ Can also get state-of-the-art parsers without lexicalization.

Page 63:

Treebank Grammar Scale
§ Treebank grammars can be enormous
§ As FSAs, the raw grammar has ~10K states, excluding the lexicon
§ Better parsers usually make the grammars larger, not smaller

[Figure: a fragment of the NP grammar as an FSA, with arcs labeled DET, ADJ, NOUN, PLURAL NOUN, NP, PP, CONJ.]

Page 64:

Chomsky Normal Form
§ Chomsky normal form:
  § All rules of the form X → Y Z or X → w
§ In principle, this is no limitation on the space of (P)CFGs
  § N-ary rules introduce new non-terminals
  § Unaries / empties are "promoted"
§ In practice it's kind of a pain:
  § Reconstructing n-aries is easy
  § Reconstructing unaries is trickier
  § The straightforward transformations don't preserve tree scores
§ Makes parsing algorithms simpler!

[Figure: VP → VBD NP PP PP binarized with intermediate symbols [VP → VBD NP •] and [VP → VBD NP PP •].]
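The left-to-right binarization in the figure can be sketched directly, using the slide's dotted-symbol naming for the introduced non-terminals (function name and representation are my own):

```python
def binarize(lhs, rhs):
    """Left-to-right binarization of an n-ary rule into CNF binary rules,
    introducing dotted intermediate symbols like [VP -> VBD NP .]."""
    if len(rhs) <= 2:
        return [(lhs, tuple(rhs))]
    rules = []
    prev = rhs[0]
    for k in range(1, len(rhs) - 1):
        inter = "[%s -> %s .]" % (lhs, " ".join(rhs[:k + 1]))
        rules.append((inter, (prev, rhs[k])))   # intermediate covers a prefix
        prev = inter
    rules.append((lhs, (prev, rhs[-1])))        # original LHS closes the rule
    return rules

for r in binarize("VP", ["VBD", "NP", "PP", "PP"]):
    print(r)
```

Because each intermediate symbol names exactly the prefix of children it covers, reconstructing the original n-ary rule afterwards is the easy direction, as the slide says; unary chains are the trickier part.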

Page 65:

CKY Parsing

Page 66:

A Recursive Parser
§ Will this parser work?
§ Why or why not?
§ Memory requirements?

bestScore(X,i,j,s)
  if (j == i+1)
    return tagScore(X, s[i])
  else
    return max over k, X->YZ of
      score(X->YZ) * bestScore(Y,i,k,s) * bestScore(Z,k,j,s)

Page 67:

A Memoized Parser
§ One small change:

bestScore(X,i,j,s)
  if (scores[X][i][j] == null)
    if (j == i+1)
      score = tagScore(X, s[i])
    else
      score = max over k, X->YZ of
        score(X->YZ) * bestScore(Y,i,k,s) * bestScore(Z,k,j,s)
    scores[X][i][j] = score
  return scores[X][i][j]

Page 68:

A Bottom-Up Parser (CKY)
§ Can also organize things bottom-up

bestScore(s)
  for (i : [0, n-1])
    for (X : tags[s[i]])
      score[X][i][i+1] = tagScore(X, s[i])
  for (diff : [2, n])
    for (i : [0, n-diff])
      j = i + diff
      for (X->YZ : rule)
        for (k : [i+1, j-1])
          score[X][i][j] = max(score[X][i][j],
                               score(X->YZ) * score[Y][i][k] * score[Z][k][j])

[Figure: X over span [i,j] built from Y over [i,k] and Z over [k,j].]
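The bottom-up loops above translate almost line for line into runnable code. A minimal Viterbi-CKY sketch over a CNF grammar (the toy lexicon, rules, and probabilities are invented for illustration, not a real treebank grammar):

```python
from collections import defaultdict

def cky(words, lexicon, rules):
    """Bottom-up CKY: lexicon[word] is a list of (tag, score);
    rules is a list of binary rules (X, Y, Z, score)."""
    n = len(words)
    score = defaultdict(float)                  # score[(X, i, j)], default 0
    for i, w in enumerate(words):
        for tag, p in lexicon[w]:
            score[(tag, i, i + 1)] = p
    for diff in range(2, n + 1):                # span length
        for i in range(0, n - diff + 1):
            j = i + diff
            for X, Y, Z, p in rules:
                for k in range(i + 1, j):       # split point
                    cand = p * score[(Y, i, k)] * score[(Z, k, j)]
                    if cand > score[(X, i, j)]:
                        score[(X, i, j)] = cand
    return score

lexicon = {"Fed": [("NN", 0.8)],
           "raises": [("VBZ", 0.6), ("NNS", 0.4)],
           "interest": [("NN", 0.9)]}
rules = [("NP", "NN", "NNS", 1.0),
         ("VP", "VBZ", "NN", 1.0),
         ("S", "NN", "VP", 1.0)]               # toy rule so S covers the span
chart = cky(["Fed", "raises", "interest"], lexicon, rules)
print(round(chart[("S", 0, 3)], 3))  # 0.432
```

Adding backpointers alongside the max turns the score chart into a parse-recovery chart; the loop structure is exactly the |rules|·n³ computation analyzed on the "Time: Theory" slide.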

Page 69:

Unary Rules
§ Unary rules?

bestScore(X,i,j,s)
  if (j == i+1)
    return tagScore(X, s[i])
  else
    return max( max over k, X->YZ of
                  score(X->YZ) * bestScore(Y,i,k,s) * bestScore(Z,k,j,s),
                max over X->Y of
                  score(X->Y) * bestScore(Y,i,j,s) )

Page 70:

CNF + Unary Closure
§ We need unaries to be non-cyclic
§ Can address by pre-calculating the unary closure
§ Rather than having zero or more unaries, always have exactly one
  § Alternate unary and binary layers
  § Reconstruct unary chains afterwards

[Figure: example trees (NP → DT NN, VP → VBD NP, and an SBAR → S → VP chain) before and after collapsing unary chains.]

Page 71:

Alternating Layers

bestScoreU(X,i,j,s)
  if (j == i+1)
    return tagScore(X, s[i])
  else
    return max over X->Y of
      score(X->Y) * bestScoreB(Y,i,j,s)

bestScoreB(X,i,j,s)
  return max over k, X->YZ of
    score(X->YZ) * bestScoreU(Y,i,k,s) * bestScoreU(Z,k,j,s)

Page 72:

Analysis

Page 73:

Memory
§ How much memory does this require?
  § Have to store the score cache
  § Cache size: |symbols| · n² doubles
  § For the plain treebank grammar: X ~ 20K symbols, n = 40, double ~ 8 bytes → ~256 MB
  § Big, but workable.
§ Pruning: Beams
  § score[X][i][j] can get too large (when?)
  § Can keep beams (truncated maps score[i][j]) which only store the best few scores for the span [i,j]
§ Pruning: Coarse-to-Fine
  § Use a smaller grammar to rule out most X[i,j]
  § Much more on this later…

Page 74:

Time: Theory
§ How much time will it take to parse?
  § For each diff (<= n)
    § For each i (<= n)
      § For each rule X → Y Z
        § For each split point k: do constant work
§ Total time: |rules| · n³
§ Something like 5 sec for an unoptimized parse of a 20-word sentence

[Figure: X over span [i,j] split at k into Y and Z.]

Page 75:

Time: Practice
§ Parsing with the vanilla treebank grammar (~20K rules; not an optimized parser!)
§ Observed exponent: 3.6
§ Why's it worse in practice?
  § Longer sentences "unlock" more of the grammar
  § All kinds of systems issues don't scale

Page 76:

Same-Span Reachability

[Figure: same-span reachability among non-terminals. One large mutually reachable group: ADJP, ADVP, FRAG, INTJ, NP, PP, PRN, QP, S, SBAR, UCP, VP, WHNP. Separate or peripheral: TOP, LST, CONJP, WHADJP, WHADVP, WHPP, NX, NAC, SBARQ, SINV, RRC, SQ, X, PRT.]

Page 77:

Rule State Reachability
§ Many states are more likely to match larger spans!

Example: NP CC •
[Figure: over a length-n sentence, the dotted rule NP CC • has one alignment per end position.]

Example: NP CC NP •
[Figure: NP CC NP • has many more alignments, since the final NP can take multiple widths.]

Page 78:

Efficient CKY
§ Lots of tricks to make CKY efficient
§ Some of them are little engineering details:
  § E.g., first choose k, then enumerate through the Y:[i,k] which are non-zero, then loop through rules by left child.
  § Optimal layout of the dynamic program depends on the grammar, the input, even system details.
§ Another kind is more important (and interesting):
  § Many X[i,j] can be suppressed on the basis of the input string
  § We'll see this next class as figures-of-merit, A* heuristics, coarse-to-fine, etc.

Page 79:

Agenda-Based Parsing

Page 80:

Agenda-Based Parsing
§ Agenda-based parsing is like graph search (but over a hypergraph)
§ Concepts:
  § Numbering: we number fenceposts between words
  § "Edges" or items: spans with labels, e.g. PP[3,5], represent the sets of trees over those words rooted at that label (cf. search states)
  § A chart: records edges we've expanded (cf. closed set)
  § An agenda: a queue which holds edges (cf. a fringe or open set)

0 critics 1 write 2 reviews 3 with 4 computers 5   (PP[3,5] spans "with computers")

Page 81:

Word Items
§ Building an item for the first time is called discovery. Items go into the agenda on discovery.
§ To initialize, we discover all word items (with score 1.0).

0 critics 1 write 2 reviews 3 with 4 computers 5

AGENDA: critics[0,1], write[1,2], reviews[2,3], with[3,4], computers[4,5]
CHART: [EMPTY]

Page 82:

Unary Projection
§ When we pop a word item, the lexicon tells us the tag item successors (and scores), which go on the agenda

0 critics 1 write 2 reviews 3 with 4 computers 5
critics[0,1] write[1,2] reviews[2,3] with[3,4] computers[4,5]
NNS[0,1] VBP[1,2] NNS[2,3] IN[3,4] NNS[4,5]

Page 83:

Item Successors
§ When we pop items off of the agenda:
§ Graph successors: unary projections (NNS → critics, NP → NNS)
  Y[i,j] with X → Y forms X[i,j]
§ Hypergraph successors: combine with items already in our chart
  Y[i,j] and Z[j,k] with X → Y Z form X[i,k]
§ Enqueue / promote resulting items (if not in chart already)
§ Record backtraces as appropriate
§ Stick the popped edge in the chart (closed set)
§ Queries a chart must support:
§ Is edge X[i,j] in the chart? (What score?)
§ What edges with label Y end at position j?
§ What edges with label Z start at position i?
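The pop-combine-enqueue loop above can be sketched as a small uniform-cost agenda parser. The dict-based grammar encoding, function name, and scoring are illustrative assumptions, not from the slides; it returns best (Viterbi) scores for each discovered item.

```python
import heapq

def agenda_parse(words, lexicon, unary, binary):
    """Uniform-cost agenda parsing over a hypergraph of labeled spans.
    lexicon: {(tag, word): p}; unary: {(parent, child): p};
    binary: {(parent, left, right): p}. Returns best score per item (X, i, j)."""
    chart = {}                                  # closed set: (X, i, j) -> best score
    agenda = []                                 # max-heap via negated scores
    for i, w in enumerate(words):               # discover all word items
        for (tag, word), p in lexicon.items():
            if word == w:
                heapq.heappush(agenda, (-p, (tag, i, i + 1)))
    while agenda:
        neg, (x, i, j) = heapq.heappop(agenda)
        score = -neg
        if (x, i, j) in chart:                  # already expanded with a better score
            continue
        chart[(x, i, j)] = score                # stick the popped edge in the chart
        for (parent, child), p in unary.items():        # graph successors (unary)
            if child == x:
                heapq.heappush(agenda, (-(score * p), (parent, i, j)))
        for (parent, left, right), p in binary.items(): # hypergraph successors
            if left == x:                               # combine rightward with chart items
                for (y, k, l), s2 in list(chart.items()):
                    if y == right and k == j:
                        heapq.heappush(agenda, (-(score * s2 * p), (parent, i, l)))
            if right == x:                              # combine leftward with chart items
                for (y, k, l), s2 in list(chart.items()):
                    if y == left and l == i:
                        heapq.heappush(agenda, (-(s2 * score * p), (parent, k, j)))
    return chart
```

With a toy grammar over "critics write reviews", the loop discovers tags, then NPs, then VP[1,3] and S[0,3], matching the discovery order sketched on the next slide.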

Page 84:

An Example

0 critics 1 write 2 reviews 3 with 4 computers 5

Discovery order (tags first, then phrases built bottom-up):
NNS[0,1] VBP[1,2] NNS[2,3] IN[3,4] NNS[4,5]
NP[0,1] NP[2,3] NP[4,5]
VP[1,2] S[0,2] ROOT[0,2]
PP[3,5] VP[1,3] S[0,3] ROOT[0,3]
VP[1,5] NP[2,5] S[0,5] ROOT[0,5]

Page 85:

Empty Elements
§ Sometimes we want to posit nodes in a parse tree that don't contain any pronounced words:

  I want you to parse this sentence
  I want [ ] to parse this sentence

§ These are easy to add to an agenda-based parser!
§ For each position i, add the "word" edge e[i,i]
§ Add rules like NP → e to the grammar
§ That's it!

0 I 1 like 2 to 3 parse 4 empties 5    (with an e edge at every position)

Page 86:

UCS / A*
§ With weighted edges, order matters
§ Must expand optimal parse from bottom up (subparses first)
§ CKY does this by processing smaller spans before larger ones
§ UCS pops items off the agenda in order of decreasing Viterbi score
§ A* search also well defined
§ You can also speed up the search without sacrificing optimality
§ Can select which items to process first
§ Can do with any "figure of merit" [Charniak 98]
§ If your figure-of-merit is a valid A* heuristic, no loss of optimality [Klein and Manning 03]

Page 87:

(Speech) Lattices
§ There was nothing magical about words spanning exactly one position.
§ When working with speech, we generally don't know how many words there are, or where they break.
§ We can represent the possibilities as a lattice and parse these just as easily.

[Figure: a word lattice with paths through "I", "awe", "of", "van", "eyes", "saw", "a", "'ve", "an", "Ivan"]

Page 88:

Learning PCFGs

Page 89:

Treebank PCFGs
§ Use PCFGs for broad coverage parsing
§ Can take a grammar right off the trees (doesn't work well):

ROOT → S          1
S → NP VP .       1
NP → PRP          1
VP → VBD ADJP     1
…

Model     F1
Baseline  72.0

[Charniak 96]
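"Taking a grammar right off the trees" is just counting local rules and relative-frequency normalizing per left-hand side. A minimal sketch, assuming trees encoded as nested (label, children...) tuples with string leaves (an encoding chosen here for illustration):

```python
from collections import Counter, defaultdict

def read_pcfg(trees):
    """Count rules X -> rhs from (label, children...) tuples and
    relative-frequency normalize per left-hand side."""
    counts = Counter()
    for t in trees:
        stack = [t]
        while stack:
            node = stack.pop()
            if isinstance(node, str):          # a word: no rule here
                continue
            label, *children = node
            rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
            counts[(label, rhs)] += 1          # one count per local tree
            stack.extend(children)
    totals = defaultdict(int)
    for (lhs, rhs), c in counts.items():
        totals[lhs] += c
    return {rule: c / totals[rule[0]] for rule, c in counts.items()}
```

Each rule probability is count(X → rhs) / count(X), exactly the maximum-likelihood estimate behind the 72.0 F1 baseline.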

Page 90:

Conditional Independence?
§ Not every NP expansion can fill every NP slot
§ A grammar with symbols like "NP" won't be context-free
§ Statistically, conditional independence too strong

Page 91:

Non-Independence
§ Independence assumptions are often too strong.
§ Example: the expansion of an NP is highly dependent on the parent of the NP (i.e., subjects vs. objects).
§ Also: the subject and object expansions are correlated!

Expansion   All NPs   NPs under S   NPs under VP
NP PP       11%       9%            23%
DT NN       9%        9%            7%
PRP         6%        21%           4%

Page 92:

Grammar Refinement
§ Example: PP attachment

Page 93:

Grammar Refinement
§ Structure Annotation [Johnson '98, Klein & Manning '03]
§ Lexicalization [Collins '99, Charniak '00]
§ Latent Variables [Matsuzaki et al. '05, Petrov et al. '06]

Page 94:

Structural Annotation

Page 95:

The Game of Designing a Grammar
§ Annotation refines base treebank symbols to improve statistical fit of the grammar
§ Structural annotation

Page 96:

Typical Experimental Setup
§ Corpus: Penn Treebank, WSJ
  Training: sections 02-21
  Development: section 22 (here, first 20 files)
  Test: section 23
§ Accuracy (F1): harmonic mean of per-node labeled precision and recall.
§ Here: also size, the number of symbols in the grammar.

Page 97:

Vertical Markovization
§ Vertical Markov order: rewrites depend on past k ancestor nodes. (cf. parent annotation)

[Charts: F1 rises from ~72% to ~79%, and grammar size grows to ~25,000 symbols, as the vertical Markov order increases over {1, 2v, 2, 3v, 3}]
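Order-2 vertical Markovization is parent annotation: every nonterminal is re-labeled with its parent's symbol, so an NP under S becomes NP^S. A minimal sketch, assuming trees encoded as nested (label, children...) tuples with string leaves (the encoding is illustrative):

```python
def annotate_parents(tree, parent="TOP"):
    """Order-2 vertical Markovization: relabel each nonterminal X under
    parent P as X^P. Leaves (words) are left unannotated."""
    if isinstance(tree, str):              # a word: leave it alone
        return tree
    label, *children = tree
    new_label = label if parent == "TOP" else f"{label}^{parent}"
    return (new_label, *[annotate_parents(c, label) for c in children])
```

Running this over the treebank before reading off rules yields the larger, better-fitting grammars in the charts above; higher orders would thread in grandparents the same way.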

Page 98:

Horizontal Markovization

[Charts: F1 (~70-74%) and grammar size (~3,000-12,000 symbols) across horizontal Markov orders {0, 1, 2v, 2, ∞}]

Page 99:

Unary Splits
§ Problem: unary rewrites are used to transmute categories so a high-probability rule can be used.
§ Solution: mark unary rewrite sites with -U.

Annotation  F1    Size
Base        77.8  7.5K
UNARY       78.3  8.0K

Page 100:

Tag Splits
§ Problem: treebank tags are too coarse.
§ Example: sentential, PP, and other prepositions are all marked IN.
§ Partial solution: subdivide the IN tag.

Annotation  F1    Size
Previous    78.3  8.0K
SPLIT-IN    80.3  8.1K

Page 101:

A Fully Annotated (Unlex) Tree

Page 102:

Some Test Set Results

Parser         LP    LR    F1    CB    0 CB
Magerman 95    84.9  84.6  84.7  1.26  56.6
Collins 96     86.3  85.8  86.0  1.14  59.9
Unlexicalized  86.9  85.7  86.3  1.10  60.3
Charniak 97    87.4  87.5  87.4  1.00  62.1
Collins 99     88.7  88.6  88.6  0.90  67.1

§ Beats "first generation" lexicalized parsers.
§ Lots of room to improve; more complex models next.

Page 103:

Efficient Parsing for Structural Annotation

Page 104:

Grammar Projections

Coarse grammar: NP → DT N'
Fine grammar: NP^S → DT^NP N'[…DT]^NP

Note: X-bar grammars are projections with rules like XP → Y X' or XP → X' Y or X' → X

Page 105:

Coarse-to-Fine Pruning
§ For each coarse chart item X[i,j], compute its posterior probability.
§ E.g. consider the span 5 to 12: coarse symbols … QP NP VP … are scored, and the refined versions of any symbol whose posterior is < threshold are pruned.
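The pruning step itself is a one-line filter once the coarse posteriors are in hand. A minimal sketch (data layout and function name are illustrative assumptions):

```python
def coarse_to_fine_keep(posteriors, projection, refined_items, threshold=1e-4):
    """Keep a refined item X_k[i,j] only if its coarse projection X[i,j]
    has posterior probability at or above threshold.
    posteriors: {(coarse_label, i, j): p}; projection: refined -> coarse label."""
    return [(x, i, j) for (x, i, j) in refined_items
            if posteriors.get((projection[x], i, j), 0.0) >= threshold]
```

All refined variants of a symbol stand or fall together with their coarse projection, which is what makes the coarse pass cheap relative to the savings in the fine pass.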

Page 106:

Computing (Max-)Marginals

Page 107:

Inside and Outside Scores
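The inside score IN(X, i, j) sums the probability of every derivation of X over words i..j; together with the outside score it gives the posteriors used for pruning. A minimal sketch of the inside pass for a binarized PCFG (the dict-based grammar encoding is an illustrative assumption):

```python
def inside(words, lexicon, binary):
    """Inside scores for a binary PCFG plus a lexicon.
    IN[(X, i, j)] = total probability of all trees rooted at X over words[i:j].
    lexicon: {(tag, word): p}; binary: {(X, Y, Z): p} for rules X -> Y Z."""
    n = len(words)
    IN = {}
    for i, w in enumerate(words):                       # base case: width-1 spans
        for (tag, word), p in lexicon.items():
            if word == w:
                IN[(tag, i, i + 1)] = IN.get((tag, i, i + 1), 0.0) + p
    for width in range(2, n + 1):                       # smaller spans before larger
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):                   # split point
                for (x, y, z), p in binary.items():
                    left = IN.get((y, i, k), 0.0)
                    right = IN.get((z, k, j), 0.0)
                    if left and right:
                        IN[(x, i, j)] = IN.get((x, i, j), 0.0) + p * left * right
    return IN
```

The outside pass runs the same recurrences top-down; the posterior of X[i,j] is then IN(X,i,j) * OUT(X,i,j) divided by the total sentence probability.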

Page 108:

Page 109:

Pruning with A*
§ You can also speed up the search without sacrificing optimality
§ For agenda-based parsers:
§ Can select which items to process first
§ Can do with any "figure of merit" [Charniak 98]
§ If your figure-of-merit is a valid A* heuristic, no loss of optimality [Klein and Manning 03]

Page 110:

A* Parsing

Page 111:

Lexicalization

Page 112:

The Game of Designing a Grammar
§ Annotation refines base treebank symbols to improve statistical fit of the grammar
§ Structural annotation [Johnson '98, Klein and Manning '03]
§ Head lexicalization [Collins '99, Charniak '00]

Page 113:

Problems with PCFGs
§ If we do no annotation, these trees differ only in one rule:
§ VP → VP PP
§ NP → NP PP
§ Parse will go one way or the other, regardless of words
§ We addressed this in one way with unlexicalized grammars (how?)
§ Lexicalization allows us to be sensitive to specific words

Page 114:

Problems with PCFGs
§ What's different between basic PCFG scores here?
§ What (lexical) correlations need to be scored?

Page 115:

Lexicalized Trees
§ Add "head words" to each phrasal node
§ Syntactic vs. semantic heads
§ Headship not in (most) treebanks
§ Usually use head rules, e.g.:
§ NP: take leftmost NP; take rightmost N*; take rightmost JJ; take right child
§ VP: take leftmost VB*; take leftmost VP; take left child
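Head rules like the NP and VP rules above are just a priority-ordered list of (direction, target-labels) searches with a default child as fallback. A minimal sketch; the rule encoding is an illustrative assumption, and `startswith` stands in for patterns like "N*" and "VB*":

```python
def find_head(label, children, head_rules):
    """Pick the head child of a local tree using priority-ordered head rules.
    head_rules: {parent: [(direction, {label_prefixes}), ...]}.
    Falls back to the left child for VP and the right child otherwise."""
    for direction, targets in head_rules.get(label, []):
        order = children if direction == "left" else reversed(children)
        for child in order:
            if any(child.startswith(t) for t in targets):
                return child
    return children[0] if label == "VP" else children[-1]
```

For example, with the slide's rules, the head of (NP DT JJ NN) is NN (rightmost N*), and the head of (VP VBD NP) is VBD (leftmost VB*).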

Page 116:

Lexicalized PCFGs?
§ Problem: we now have to estimate probabilities like: [fully lexicalized rule probability]
§ Never going to get these atomically off of a treebank
§ Solution: break up derivation into smaller steps

Page 117:

Lexical Derivation Steps
§ A derivation of a local tree [Collins 99]:
§ Choose a head tag and word
§ Choose a complement bag
§ Generate children (incl. adjuncts)
§ Recursively derive children

Page 118:

Lexicalized CKY

bestScore(X, i, j, h):
  if (j == i+1):
    return tagScore(X, s[i])
  else:
    return max over k, h', X -> Y Z of:
      score(X[h] -> Y[h] Z[h']) * bestScore(Y, i, k, h) * bestScore(Z, k, j, h'),
      score(X[h] -> Y[h'] Z[h]) * bestScore(Y, i, k, h') * bestScore(Z, k, j, h)

Example combination: (VP -> VBD •)[saw] + NP[her] forms (VP -> VBD...NP •)[saw]

Page 119:

Efficient Parsing for Lexical Grammars

Page 120:

Quartic Parsing
§ Turns out, you can do (a little) better [Eisner 99]: combine the headed child Y[h] with an unheaded Z first, rather than enumerating both heads h and h' at once.
§ Gives an O(n^4) algorithm
§ Still prohibitive in practice if not pruned

Page 121:

Pruning with Beams
§ The Collins parser prunes with per-cell beams [Collins 99]
§ Essentially, run the O(n^5) CKY
§ Remember only a few hypotheses for each span <i,j>
§ If we keep K hypotheses at each span, then we do at most O(nK^2) work per span (why?)
§ Keeps things more or less cubic (and in practice is more like linear!)
§ Also: certain spans are forbidden entirely on the basis of punctuation (crucial for speed)

Page 122:

Pruning with a PCFG
§ The Charniak parser prunes using a two-pass, coarse-to-fine approach [Charniak 97+]
§ First, parse with the base grammar
§ For each X:[i,j] calculate P(X | i, j, s)
§ This isn't trivial, and there are clever speedups
§ Second, do the full O(n^5) CKY
§ Skip any X:[i,j] which had low (say, < 0.0001) posterior
§ Avoids almost all work in the second phase!
§ Charniak et al 06: can use more passes
§ Petrov et al 07: can use many more passes

Page 123:

Results
§ Some results:
§ Collins 99: 88.6 F1 (generative lexical)
§ Charniak and Johnson 05: 89.7 / 91.3 F1 (generative lexical / reranked)
§ Petrov et al 06: 90.7 F1 (generative unlexical)
§ McClosky et al 06: 92.1 F1 (gen + rerank + self-train)
§ However:
§ Bilexical counts rarely make a difference (why?)
§ Gildea 01: removing bilexical counts costs < 0.5 F1

Page 124:

Latent Variable PCFGs

Page 125:

The Game of Designing a Grammar
§ Annotation refines base treebank symbols to improve statistical fit of the grammar
§ Parent annotation [Johnson '98]

Page 126:

The Game of Designing a Grammar
§ Annotation refines base treebank symbols to improve statistical fit of the grammar
§ Parent annotation [Johnson '98]
§ Head lexicalization [Collins '99, Charniak '00]

Page 127:

The Game of Designing a Grammar
§ Annotation refines base treebank symbols to improve statistical fit of the grammar
§ Parent annotation [Johnson '98]
§ Head lexicalization [Collins '99, Charniak '00]
§ Automatic clustering?

Page 128:

Latent Variable Grammars

[Figure: a parse tree and a sentence; the parameters define a distribution over the derivations that refine the tree]

Page 129:

Learning Latent Annotations

EM algorithm (just like Forward-Backward for HMMs):
§ Brackets are known
§ Base categories are known
§ Only induce subcategories

[Figure: a tree over "He was right ." with latent node labels X1-X7]

Page 130:

Refinement of the DT tag

[Figure: DT split into subcategories DT-1, DT-2, DT-3, DT-4]

Page 131:

Hierarchical refinement

Page 132:

Hierarchical Estimation Results

[Chart: parsing accuracy (F1), 74 to 90, against total number of grammar symbols, 100 to 1700]

Model                  F1
Flat Training          87.3
Hierarchical Training  88.4

Page 133:

Refinement of the "," tag
§ Splitting all categories equally is wasteful:

Page 134:

Adaptive Splitting
§ Want to split complex categories more
§ Idea: split everything, roll back splits which were least useful

Page 135:

Adaptive Splitting Results

Model             F1
Previous          88.4
With 50% Merging  89.5

Page 136:

Number of Phrasal Subcategories

[Chart: number of learned subcategories (0-40) per phrasal category, highest for NP, VP, and PP, lowest for categories like WHADJP, X, ROOT, and LST]

Page 137:

Number of Lexical Subcategories

[Chart: number of learned subcategories (0-70) per POS tag, highest for NNP, JJ, NNS, and NN, lowest for tags like SYM, RP, LS, and #]

Page 138:

Learned Splits
§ Proper nouns (NNP):

NNP-14  Oct.  Nov.       Sept.
NNP-12  John  Robert     James
NNP-2   J.    E.         L.
NNP-1   Bush  Noriega    Peters
NNP-15  New   San        Wall
NNP-3   York  Francisco  Street

§ Personal pronouns (PRP):

PRP-0   It    He    I
PRP-1   it    he    they
PRP-2   it    them  him

Page 139:

Learned Splits
§ Relative adverbs (RBR):

RBR-0   further  lower    higher
RBR-1   more     less     More
RBR-2   earlier  Earlier  later

§ Cardinal numbers (CD):

CD-7    one      two      Three
CD-4    1989     1990     1988
CD-11   million  billion  trillion
CD-0    1        50       100
CD-3    1        30       31
CD-9    78       58       34

Page 140:

Final Results (Accuracy)

                                          ≤ 40 words F1   all F1
ENG  Charniak & Johnson '05 (generative)  90.1            89.6
ENG  Split / Merge                        90.6            90.1
GER  Dubey '05                            76.3            -
GER  Split / Merge                        80.8            80.1
CHN  Chiang et al. '02                    80.0            76.6
CHN  Split / Merge                        86.3            83.4

Still higher numbers from reranking / self-training methods

Page 141:

Efficient Parsing for Hierarchical Grammars

Page 142:

Coarse-to-Fine Inference
§ Example: PP attachment

Page 143:

Hierarchical Pruning

coarse:         … QP NP VP …
split in two:   … QP1 QP2 NP1 NP2 VP1 VP2 …
split in four:  … QP1 QP2 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4 …
split in eight: … (each symbol split again) …

Page 144:

Bracket Posteriors

Page 145:

[Parsing times per pass: 1621 min, 111 min, 35 min, 15 min (no search error)]

Page 146:

Unsupervised Tagging

Page 147:

Unsupervised Tagging?
§ AKA part-of-speech induction
§ Task:
§ Raw sentences in
§ Tagged sentences out
§ Obvious thing to do:
§ Start with a (mostly) uniform HMM
§ Run EM
§ Inspect results

Page 148:

EM for HMMs: Process
§ Alternate between recomputing distributions over hidden variables (the tags) and reestimating parameters
§ Crucial step: we want to tally up how many (fractional) counts of each kind of transition and emission we have under current params:
§ Same quantities we needed to train a CRF!
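The fractional transition counts come from a forward-backward pass. A minimal E-step sketch for a first-order HMM over one sentence (parameter layout and function name are illustrative; a real implementation would also work in log space and accumulate emission counts the same way):

```python
def expected_counts(obs, states, pi, trans, emit):
    """Forward-backward over one sentence: return expected transition
    counts E[c(s -> s')] under the current HMM parameters.
    pi: initial probs; trans[s][s']: transition; emit[s][w]: emission."""
    T = len(obs)
    fwd = [{s: pi[s] * emit[s][obs[0]] for s in states}]        # forward pass
    for t in range(1, T):
        fwd.append({s: emit[s][obs[t]] * sum(fwd[t-1][r] * trans[r][s] for r in states)
                    for s in states})
    bwd = [dict.fromkeys(states, 1.0) for _ in range(T)]        # backward pass
    for t in range(T - 2, -1, -1):
        for s in states:
            bwd[t][s] = sum(trans[s][r] * emit[r][obs[t+1]] * bwd[t+1][r] for r in states)
    Z = sum(fwd[T-1][s] for s in states)        # total sentence probability (assumed > 0)
    counts = {(s, r): 0.0 for s in states for r in states}
    for t in range(T - 1):                      # posterior mass on each arc
        for s in states:
            for r in states:
                counts[(s, r)] += fwd[t][s] * trans[s][r] * emit[r][obs[t+1]] * bwd[t+1][r] / Z
    return counts
```

The M-step then renormalizes these counts into new transition probabilities, and the two steps alternate until convergence.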

Page 149:

Merialdo: Setup
§ Some (discouraging) experiments [Merialdo 94]
§ Setup:
§ You know the set of allowable tags for each word
§ Fix k training examples to their true labels
§ Learn P(w|t) on these examples
§ Learn P(t|t-1,t-2) on these examples
§ On n examples, re-estimate with EM
§ Note: we know allowed tags but not frequencies

Page 150:

EM for HMMs: Quantities
§ Total path values (correspond to probabilities here):

Page 151:

The State Lattice / Trellis

[Figure: the trellis for "START Fed raises interest rates END", with states ^, N, V, J, D, $ at each position]

Page 152:

EM for HMMs: Process
§ From these quantities, can compute expected transitions:
§ And emissions:

Page 153:

Merialdo: Results

Page 154:

Projection-Based A*

[Figure: the parse of "Factory payrolls fell in Sept." with its SYNTACTIC projection (S, VP, NP, PP over the words) and its SEMANTIC projection (head words: S:fell, VP:fell, NP:payrolls, PP:in)]

Page 155:

A* Speedup
§ Total time dominated by calculation of A* tables in each projection … O(n^3)

[Chart: time (sec, 0-60) vs. sentence length (0-40) for the PCFG phase, dependency phase, and combined phase]

Page 156:

Breaking Up the Symbols
§ We can relax independence assumptions by encoding dependencies into the PCFG symbols:
§ Parent annotation [Johnson 98]
§ Marking possessive NPs
§ What are the most useful "features" to encode?

Page 157:

Other Tag Splits

Annotation  F1    Size
UNARY-DT    80.4  8.1K   mark demonstratives as DT^U ("the X" vs. "those")
UNARY-RB    80.5  8.1K   mark phrasal adverbs as RB^U ("quickly" vs. "very")
TAG-PA      81.2  8.5K   mark tags with non-canonical parents ("not" is an RB^VP)
SPLIT-AUX   81.6  9.0K   mark auxiliary verbs with -AUX [cf. Charniak 97]
SPLIT-CC    81.7  9.1K   separate "but" and "&" from other conjunctions
SPLIT-%     81.8  9.3K   "%" gets its own tag