

ML4NLP: Introduction to Syntax and Parsing

CS 590NLP

Dan Goldwasser, Purdue University

dgoldwas@purdue.edu

“I shot an elephant in my pajamas”

“How he got into my pajamas, I'll never know.”

Groucho Marx

“I shot an elephant in my pajamas.”

“I shot an elephant in the zoo.”

Parsing

Language is not just a stream of words. We want to represent linguistic structure!

• Two views:
– Constituency Parsing: build a hierarchical phrase structure

– Dependency Parsing: show dependencies between words (dependency = modifier or argument)

Dependency and constituent parsing

Constituency Parsing, “the good old days..”: Write a program!

S → NP VP
NP → Det N
NP → NP PP
VP → V NP
VP → VP PP
PP → P NP

NP → John
NP → Mary
N → binoculars
N → dog
V → saw
P → with
Det → a

Can you parse: “Mary saw John with binoculars”? How about: “Mary saw a dog with binoculars”?

Constituency Parsing

• “The good old days..”: Write a program!
• Can you treat natural and formal languages in the same way?

Constituency Parsing

“Fed raises interest rates 0.5% in effort to control inflation”

How many parsing options? Millions of possible parses in a broad-coverage grammar!

This explains the popularity of statistical methods in NLP: millions of options, but only a few are likely!

CFG

• Formally, a context-free grammar is G = (T, N, S, R):
– T: terminal symbols
– N: non-terminal symbols
– S: start symbol
– R: production rules X → Y (where X ∈ N, and Y is a string of terminals and non-terminals)

• A grammar G generates a language L

PCFG

• Formally, a probabilistic context-free grammar is G = (T, N, S, R, P):
– T: terminal symbols
– N: non-terminal symbols
– S: start symbol
– R: production rules X → Y (where X ∈ N, and Y is a string of terminals and non-terminals)
– P: a probability function over R

∀X ∈ N:  Σ_{X → Y ∈ R} P(X → Y) = 1

PCFG example

S → NP VP 1.0
NP → Det N 0.6
NP → NP PP 0.4
VP → V NP 0.6
VP → VP PP 0.4
PP → P NP 1.0

NP → John 0.3
NP → Mary 0.3
N → binoculars 0.2
N → dog 0.2
V → saw 0.2
P → with 0.4
Det → a 0.4
…

P(Tree) – The probability of a tree is the product of the probabilities of the rules used to generate it.
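As a concrete illustration (a minimal sketch, using the rule probabilities from the toy grammar above), the probability of a parse of “Mary saw a dog” is just the product of the probabilities of the rules in its derivation:

```python
# Probability of a parse tree under a PCFG: product of rule probabilities.
# Probabilities taken from the toy grammar above.
rule_prob = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Mary",)): 0.3,
    ("VP", ("V", "NP")): 0.6,
    ("V", ("saw",)): 0.2,
    ("NP", ("Det", "N")): 0.6,
    ("Det", ("a",)): 0.4,
    ("N", ("dog",)): 0.2,
}

def tree_prob(rules):
    """Multiply the probabilities of all rules used in the derivation."""
    p = 1.0
    for r in rules:
        p *= rule_prob[r]
    return p

# Derivation of "Mary saw a dog":
derivation = [
    ("S", ("NP", "VP")),
    ("NP", ("Mary",)),
    ("VP", ("V", "NP")),
    ("V", ("saw",)),
    ("NP", ("Det", "N")),
    ("Det", ("a",)),
    ("N", ("dog",)),
]
print(tree_prob(derivation))  # 1.0 * 0.3 * 0.6 * 0.2 * 0.6 * 0.4 * 0.2
```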

Constituency Parsing as Structured Learning

• Can you define PCFG parsing as a structured prediction problem?
– How would you define the prediction problem?
– What are the dependencies in the model?
– What are the parameters you need to learn?
– What are good features?

CKY Algorithm

• Dynamic programming algorithm for parsing
• Given a CFG G and a string w, determine: can G parse w?

• We assume G is in CNF (Chomsky Normal Form):
– Each rule has at most 2 symbols on the right: A → BC, A → B, or A → a

• The algorithm maintains a triangular DP table:
– Bottom row: parse substrings of length 1
– Second row from the bottom: parse substrings of length 2
– …
– Top row: parse the entire sentence!

DP Triangular Table

X1,5
X1,4  X2,5
X1,3  X2,4  X3,5
X1,2  X2,3  X3,4  X4,5
X1,1  X2,2  X3,3  X4,4  X5,5
 w1    w2    w3    w4    w5

Table for a string w of length 5 (cell Xi,j holds the non-terminals deriving the substring wi..wj)

Constructing The Triangular Table

{B}   {A,C}   {A,C}   {B}   {A,C}
 b      a       a      b     a

At each point, consider the possible rules and their probabilities.

S → AB | BC
A → BA | a
B → CC | b
C → AB | a

CKY Algorithm

• Similar to Viterbi: keep backpointers to reconstruct the parse tree from the table.

• The rule activation scoring function depends on the dependency assumptions:
– Look at the probability of the previous row's activations, and consider the conditional probability of the rule given the previous parses.

• Overall complexity: O(n³ |G|)
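The table-filling procedure above can be sketched as a small recognizer (a minimal sketch, not the lecture's code; it runs on the example CNF grammar S → AB | BC, A → BA | a, B → CC | b, C → AB | a):

```python
from itertools import product

def cky_recognize(grammar, lexical, s):
    """CKY recognizer for a CNF grammar.
    grammar: dict mapping (B, C) -> set of heads A for rules A -> B C
    lexical: dict mapping terminal -> set of heads A for rules A -> a
    Returns True iff the start symbol 'S' derives s."""
    n = len(s)
    # table[i][j] = set of non-terminals deriving s[i..j] (inclusive)
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, a in enumerate(s):                 # bottom row: length-1 spans
        table[i][i] = set(lexical.get(a, set()))
    for length in range(2, n + 1):            # longer spans, bottom-up
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):             # split point
                for B, C in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= grammar.get((B, C), set())
    return "S" in table[0][n - 1]

# Example grammar from the slide, in CNF:
binary = {
    ("A", "B"): {"S", "C"}, ("B", "C"): {"S"},
    ("B", "A"): {"A"}, ("C", "C"): {"B"},
}
lexical = {"a": {"A", "C"}, "b": {"B"}}

print(cky_recognize(binary, lexical, "baaba"))  # True: 'baaba' is in L(G)
```

The three nested loops over span length, start position, and split point, times the rule lookups, give the O(n³·|G|) complexity mentioned above.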

Option 2: dependency parsing

• Key idea: syntactic structure represented as relations between lexical items, called dependencies

Dependencies can be represented as a graph, where the nodes are words and the edges are dependencies, which are (1) directional and (2) often typed.

[Figure: dependency tree for “Root Mary ate a banana”: Root → ate (main verb), ate → Mary (Subj), ate → banana (Obj), banana → a (Det)]

Non-Projective Structure

Root Mary ate a banana today that was yellow

Projective structure: no crossing edges

Are those really needed?

However, we will often assume projectivity:
- It makes life easier
- Non-projectivity doesn't occur often

Dependency Parsing

• We need to answer two questions:

– How can you make parsing decisions? (i.e., inference)

– How do you learn the parameters to score these decisions?

Dependency Parsing

• Parser: for each word, choose which other word it depends on.
– You can choose to label these dependencies.

• Constraints:
– Only one root
– No cycles

⇒ Essentially, force a tree structure
• Additional constraint: no crossing dependencies
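A minimal sketch of checking these constraints on a proposed set of dependency edges (hypothetical helper functions, not from the lecture): one root, no cycles, and, for the additional projectivity constraint, no crossing edges:

```python
def is_tree(n, heads):
    """heads maps each word 1..n to its head; head 0 is the artificial Root."""
    if set(heads) != set(range(1, n + 1)):       # every word has exactly one head
        return False
    if any(h < 0 or h > n for h in heads.values()):
        return False
    if list(heads.values()).count(0) != 1:       # only one word attaches to Root
        return False
    for w in heads:                              # no cycles: follow heads upward
        seen, cur = set(), w
        while cur != 0:
            if cur in seen:
                return False
            seen.add(cur)
            cur = heads[cur]
    return True

def is_projective(heads):
    """Additional constraint: no crossing edges (spans must nest or be disjoint)."""
    spans = [(min(d, h), max(d, h)) for d, h in heads.items()]
    return not any(a < c < b < e for a, b in spans for c, e in spans)

# "Mary ate a banana": Mary=1, ate=2, a=3, banana=4; ate attaches to Root.
gold = {2: 0, 1: 2, 4: 2, 3: 4}
crossing = {1: 2, 2: 0, 3: 1, 4: 2}   # edges (1,3) and (2,4) cross
print(is_tree(4, gold), is_projective(gold))          # True True
print(is_tree(4, crossing), is_projective(crossing))  # True False
```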

Parsing Approaches

• Two competing approaches:

• Exact inference: mostly graph-based algorithms (e.g., spanning tree), but also ILP

• Approximate inference: linear-time transition parsers

• Transition-based parsers are very popular!

Greedy Transition-based Dependency Parsing (Nivre '03)

• The parser operates by maintaining two data structures:
– A stack and a buffer

• Parsing is done via a sequence of operations: pushing words from the buffer onto the stack, and associating dependency edges over words in the stack.

– Shift: take the word at the front of the buffer, and put it on top of the stack.

– Left/Right Arc: associate a dependency between the top two words on the stack, and remove the dependent word from the stack.

• The parsing sequence ends when the stack and buffer are empty

Stack                  Buffer           Operation
[Root]                 I like lettuce
[Root I]               like lettuce     Shift
[Root I like]          lettuce          Shift
[Root like]            lettuce          Left Arc (like → I)
[Root like lettuce]                     Shift
[Root like]                             Right Arc (like → lettuce)
[Root]                                  Right Arc (Root → like)
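The trace above can be replayed with a minimal arc-standard-style sketch (hypothetical code, assuming unlabeled arcs stored as (head, dependent) pairs):

```python
def parse(words, operations):
    """Replay a sequence of transition operations;
    collect the (head, dependent) arcs produced."""
    stack, buffer, arcs = ["Root"], list(words), set()
    for op in operations:
        if op == "shift":              # move buffer front to top of stack
            stack.append(buffer.pop(0))
        elif op == "left_arc":         # top of stack heads the word below it
            arcs.add((stack[-1], stack[-2]))
            del stack[-2]
        elif op == "right_arc":        # word below heads the top of stack
            arcs.add((stack[-2], stack[-1]))
            stack.pop()
    return stack, buffer, arcs

ops = ["shift", "shift", "left_arc", "shift", "right_arc", "right_arc"]
stack, buffer, arcs = parse(["I", "like", "lettuce"], ops)
# arcs: like -> I, like -> lettuce, Root -> like; stack ends as [Root]
```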

Learning for Dependency Parsing

• Learning a transition parser: use data to build a scoring function for parser operations.
– This should sound familiar..

• Break the data into a sequence of decisions, and train a “next-state” function.
– Local learning; (greedy) inference only at test time.

• Traditionally: SVM, logistic regression, ..
– Essentially a multiclass classifier over the current state of the parser.

Learning for Dependency Parsing

• Which features would you consider?

Deep Learning for Dependency Parsing (Chen & Manning '14)

From Syntax to Semantics

• The syntactic structure of the sentence captures some semantic properties (e.g., recall the PP-attachment problem).

• However, it does not account for meaning in a broad sense.

• Interesting question: what is a computational model for meaning?

What is the meaning of meaning?

Semantics

• We distinguish between:

• Lexical semantics: the meaning of words

• Compositional semantics: combining individual units to form the meaning of larger units.

Applications

• Semantics is what we really care about:
– Question answering
– Intelligent information access
– Robot communication
– Summarization
– …

Deep vs. Shallow Semantics

• Surprisingly, we tend to believe that dogs understand much more!

• Similarly, shallow NLP performs surprisingly well!

Semantics

• We will look at two semantics problems:

– Formal Semantic Representation: find a mathematical representation of meaning

– Information Extraction: the “machine reading” view: populate a DB of facts from text.

Formal Models of Meaning


• A formal model for compositional semantics:
– Form the semantics of a parent from the semantics of its children

• We assume a dictionary of items:
– Constant symbols
– Functions

S
├─ NP: John
└─ VP: smokes

Smokes(John)

Constants and Functions

• Constants:
– Purdue University
– Barack Obama

• Properties:
– Red(x), Small(x), ..

• Relations:
– Love(x,y), PresidentOf(BarackObama, USA)

Generating a meaning representation

• We assume that syntactic representations and compositional semantics are highly dependent.

• Simple algorithm:
– Create a parse tree
– Find the semantic representation of the words (leaf nodes)
– Combine the semantics of the children into the parent node (bottom-up)

Semantic Parser

• Key idea: augment syntactic parsing with meaning!

S → NP VP
NP → Det N
NP → NP PP
VP → V NP
VP → VP PP
PP → P NP

NP → John
NP → Mary
N → binoculars
N → dog
V → saw
P → with
Det → a

Semantic Parser

• Key idea: augment syntactic parsing with meaning!

S [SM] → NP [NPM] VP [VPM], Apply(SM, NPM, VPM)
VP [VPM] → V [VM] NP [NPM], Apply(VPM, VM, NPM)
NP [NPM] → N [NM], Apply(NPM, NM)
..

N [john] → John
N [mary] → Mary
V [λx.saw(x)] → saw

This is sometimes called a lexicon
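A minimal sketch of this bottom-up composition (hypothetical Python, representing word meanings as constants and lambda functions, in the spirit of the lexicon above):

```python
# Lexicon: each word maps to its meaning representation.
# Constants are strings; predicates are lambdas that build logical forms.
lexicon = {
    "John":   "john",
    "Mary":   "mary",
    "smokes": lambda x: f"smokes({x})",              # λx.smokes(x)
    "saw":    lambda x: lambda y: f"saw({y},{x})",   # λx.λy.saw(y,x)
}

# S -> NP VP, with Apply(SM, NPM, VPM) = VP meaning applied to NP meaning.
def apply_s(np_meaning, vp_meaning):
    return vp_meaning(np_meaning)

# "John smokes": VP meaning is λx.smokes(x), NP meaning is the constant john.
print(apply_s(lexicon["John"], lexicon["smokes"]))  # smokes(john)

# "Mary saw John": VP -> V NP composes first, then S -> NP VP.
vp = lexicon["saw"](lexicon["John"])   # λy.saw(y,john)
print(apply_s(lexicon["Mary"], vp))    # saw(mary,john)
```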

Generating a meaning representation

Learning for Semantic Parsing

• Similar to syntactic parsing, there are many possible meaning derivations for a single sentence.
– Each could result in a different semantic representation!

• To help us disambiguate the meaning of a sentence, we can define a probabilistic parser:

N [john] → John 0.4
N [mary] → Mary 0.2
V [λx.saw(x)] → saw 0.9

Grounded Language Interpretation

• Compositionality: constructing meaning by composing the meaning of lower-level units.
– The lowest-level units (“leaves”) are typically constant symbols

• Where do the symbols come from?
• We assume a world model, providing the relevant set of symbols:
– People on your smartphone, transactions in a DB, entities on Wikipedia, real-world objects (”pick up that block”)

• Analyzing the difficulty of semantic interpretation:
– Complexity of the input language, complexity of the set of symbols, complexity of their mapping

Grounded Language Interpretation
• ”Pick up the green piece”
• ”Pick up the green piece that’s next to the blue piece”
• ”Pick up the green piece that’s shaped like a lettuce”
• “Pick up the green piece that’s at the left end of the bottom row.”

A Joint Model of Language and Perception for Grounded Attribute Learning. Matuszek et al., 2012

Grounded Language Interpretation

• Create a meaning representation capturing the mapping from language to percepts in the real world.

The (probability of the) truth value of these predicates depends on real-world grounding.

Grounded Language Interpretation

• Create a scoring function, connecting the two representations

Natural Language Communication with Robots. Bisk et al., 2016

World representation

Grounded Language Interpretation

• Two competing approaches:
– Create an explicit meaning representation
– Create a scoring function that ranks meaning representations (or their outcomes)

• We discussed it in the context of grounded representations (“real-world objects”)
– A similar discussion applies to different settings (e.g., DB access).

• Which one will be easier to learn? What kind of supervision effort is needed in either case?

Scalingup

Voters go to the polls in four states on Tuesday, with Michigan the biggest prize for both parties. Donald J. Trump seeks to strengthen his position as the Republican front-runner, while his rivals look to slow his drive toward the nomination. For the Democrats, Senator Bernie Sanders of Vermont faces a crucial test in his upstart campaign to derail Hillary Clinton. Here are some of the things we will be watching in the contests in Hawaii, Idaho, Michigan and Mississippi.

NYTimes article

Machine Reading

• A more realistic task: given unstructured text, create structured knowledge.

• Simple examples:
– Named Entity Recognition

• More complicated:
– Relationships between entities

IE Example

For the Democrats, Senator Bernie Sanders of Vermont faces a crucial test in his upstart campaign to derail Hillary Clinton. Here are some of the things we will be watching in the contests in Hawaii, Idaho, Michigan and Mississippi.

BernieSanders is-a democrat

BernieSanders is-from Vermont

Relation Extraction

• We make a distinction between closed and open IE

• Closed: focus on a small set of relations
– Easy to think about as a supervised task

• Open: find all relations

Relation Extraction

• Popular task: ACE 2003 defined 4 types:
– Role: member, owner, affiliate, client
– Part: subsidiary, physical part-of, set membership
– At: location, based-in, residence
– Social: parent, sibling, spouse

• Realistic settings: Freebase has thousands of relations!

Building Relation Extractors

• Simple pattern recognition

Hearst (1992)

Agar is a substance prepared from a mixture of red algae, such as Gelidium, for laboratory or industrial use.

What does “Gelidium” mean?
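A Hearst-style “Y, such as X” pattern can be sketched with a simple regex (a hypothetical, deliberately naive pattern; real systems use many such templates, often over parsed text):

```python
import re

def such_as_pairs(sentence):
    """Extract (hyponym, hypernym) pairs from a 'Y, such as X' pattern.
    Naively takes the two words before ', such as' as the hypernym."""
    pairs = []
    for m in re.finditer(r"(\w+ \w+), such as (\w+)", sentence):
        hypernym, hyponym = m.group(1), m.group(2)
        pairs.append((hyponym, hypernym))
    return pairs

s = ("Agar is a substance prepared from a mixture of red algae, "
     "such as Gelidium, for laboratory or industrial use.")
print(such_as_pairs(s))  # [('Gelidium', 'red algae')]
```

So the pattern lets us infer that Gelidium is a kind of red algae, without ever having seen the word before.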

Pattern-based Relation Extraction

Bootstrapping
• Simple idea:
– Given a small seed set of relations (e.g., by mining patterns)

– And A LOT of unsupervised text
– Find mentions of the relations in the text
– Use the mentions to come up with new patterns!
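One round of this loop can be sketched as follows (hypothetical code with toy string matching; real bootstrapping systems score and filter the candidate patterns to avoid semantic drift):

```python
def bootstrap_round(seed_pairs, corpus):
    """One bootstrapping iteration:
    1. find sentences mentioning a seed pair,
    2. turn the text between the two entities into a new pattern,
    3. apply the new patterns to harvest new pairs."""
    patterns = set()
    for x, y in seed_pairs:                  # steps 1 + 2: mine patterns
        for s in corpus:
            if x in s and y in s and s.index(x) < s.index(y):
                patterns.add(s[s.index(x) + len(x):s.index(y)])
    new_pairs = set()
    for p in patterns:                       # step 3: apply patterns
        for s in corpus:
            if p in s:
                left, _, right = s.partition(p)
                if left.split() and right.split():
                    new_pairs.add((left.split()[-1], right.split()[0]))
    return patterns, new_pairs

seed = {("Sanders", "Vermont")}
corpus = [
    "Sanders , senator of Vermont , spoke on Tuesday .",
    "Stabenow , senator of Michigan , spoke next .",
]
patterns, pairs = bootstrap_round(seed, corpus)
print(pairs)  # recovers the seed pair plus the new pair (Stabenow, Michigan)
```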

Supervised Relation Extraction

• Given a sentence, find the list of entities, and predict whether there is a relation.

• Key problem: finding a good feature representation

Zhou et al., 2005

Scaling up RE

• Key problem: realistic machine reading requires dealing with thousands of relations.

• Directly annotating for this task is not reasonable. How can we scale up?

• Key idea: distant supervision
– Similar to bootstrapping + learning

Distant Supervision

• Assume we have a collection of relations
– Easy! (e.g., Freebase, Wikipedia, ..)

• ..and that if two entities appear in a relation, sentences containing these two entities will express this relationship.

• Use such sentences as noisy training data!
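The labeling heuristic can be sketched as follows (hypothetical code; real systems also generate negative examples and train a classifier over the noisy labels):

```python
# Knowledge base: known relation instances (e.g., mined from Freebase).
kb = {
    ("Bernie Sanders", "Vermont"): "is-from",
}

def distant_label(corpus):
    """Label every sentence containing both entities of a KB pair
    with that pair's relation -- noisy, but free, training data."""
    examples = []
    for (e1, e2), rel in kb.items():
        for s in corpus:
            if e1 in s and e2 in s:
                examples.append((s, e1, e2, rel))
    return examples

corpus = [
    "Senator Bernie Sanders of Vermont faces a crucial test.",
    "Bernie Sanders campaigned in Michigan on Tuesday.",
]
for ex in distant_label(corpus):
    print(ex)  # only the first sentence mentions both entities
```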

Distant Supervision Example
