ccg parsing - cs.utexas.edu
TRANSCRIPT
CCG Parsing
Zettlemoyer and Collins (2005)
‣ "What" is a very complex type: needs a noun and needs an S\NP to form a sentence. S\NP is basically a verb phrase (border Texas)
‣ "What" in this case knows that there are two predicates (states and border Texas). This is not a general thing
CCG Parsing
‣ These questions are compositional: we can build bigger ones out of smaller pieces
What states border Texas?
What states border states bordering Texas?
What states border states bordering states bordering Texas?
‣ In general, answering these does require parsing and not just slot-filling
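The compositional build-up of the questions above can be mimicked with a toy recursive generator (my own sketch, not from the lecture; a real grammar would existentially quantify the inner set-valued terms, here they are just nested as-is to show the recursion):

```python
def query_lf(depth):
    """Logical form for 'What states border (states bordering)^(depth-1) Texas?'
    Toy illustration: inner lambda terms stand in for nested restrictions."""
    term = "e89"  # Texas
    for i in range(depth):
        v = f"x{i}"
        term = f"lambda {v} . state({v}) & borders({v}, {term})"
    return term
```

Each extra level of "states bordering" just wraps the previous logical form in one more restriction, which is exactly why a parser that builds meanings bottom-up handles all of these with the same rules.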
CCG Parsing
Slide credit: Dan Klein
‣ "to" needs an NP (destination) and N (parent)
‣ "Show me" is a no-op
CCG Parsing
Zettlemoyer and Collins (2005)
‣ Many ways to build these parsers
‣ One approach: run a "supertagger" (tags the sentence with complex labels), then run the parser
‣ Parsing is easy once you have the tags, so we've reduced it to a (hard) tagging problem
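To see why parsing is easy once supertags are fixed, here is a minimal sketch (my own toy code, not the lecture's system) where each word carries one CCG category and the parser only checks forward (X/Y Y → X) and backward (Y X\Y → X) application:

```python
# Categories: an atom string ("S", "NP") or a tuple (result, slash, argument).
VP = ("S", "\\", "NP")   # S\NP, e.g. "border Texas"
TV = (VP, "/", "NP")     # (S\NP)/NP, e.g. the transitive verb "border"

def combine(left, right):
    """Try forward then backward application on two categories."""
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        return left[0]   # forward:  X/Y  Y  -> X
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        return right[0]  # backward: Y  X\Y -> X
    return None

# "border Texas": (S\NP)/NP + NP -> S\NP
assert combine(TV, "NP") == VP
# "states (border Texas)": NP + S\NP -> S  (treating N ~ NP to keep the toy tiny)
assert combine("NP", VP) == "S"
```

With the supertags given, the hard search over categories is gone; all ambiguity has been pushed into the tagging step.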
Training CCG Parsers
‣ Training data looks like pairs of sentences and logical forms
What states border Texas → λx. state(x) ∧ borders(x, e89)
What borders Texas → λx. borders(x, e89) …
‣ What can we learn from these?
‣ Texas corresponds to NP : e89 in the logical form (easy to figure out)
‣ What corresponds to (S/(S\NP))/N : λf. λg. λx. f(x) ∧ g(x)
‣ How do we infer that without being told it?
‣ Problem: we don't know the derivation
Lexicon
Zettlemoyer and Collins (2005)
What states border Texas → λx. state(x) ∧ borders(x, e89)
‣ GENLEX: takes sentence S and logical form L. Break up the logical form into chunks C(L), assume any substring of S might map to any chunk
‣ Chunks inferred from the logical form based on rules:
‣ NP : e89
‣ (S\NP)/NP : λx. λy. borders(x, y)
‣ Any substring can parse to any of these in the lexicon:
‣ Texas → NP : e89 (correct)
‣ border Texas → NP : e89
‣ What states border Texas → NP : e89 …
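The GENLEX over-generation step can be sketched in a few lines (a simplification of my own; real GENLEX uses category templates to produce the chunks):

```python
def substrings(words):
    """All contiguous substrings of a word list, joined back into strings."""
    return [" ".join(words[i:j])
            for i in range(len(words))
            for j in range(i + 1, len(words) + 1)]

def genlex(sentence, chunks):
    """Pair every substring of the sentence with every logical-form chunk."""
    return {(s, c) for s in substrings(sentence.split()) for c in chunks}

chunks = ["NP : e89", "(S\\NP)/NP : \\x.\\y. borders(x,y)"]
lex = genlex("What states border Texas", chunks)
# The correct entry ("Texas", "NP : e89") is in lex, but so are many
# spurious pairs like ("border Texas", "NP : e89"); learning must prune these.
```

A 4-word sentence has 10 substrings, so even this tiny example yields 20 candidate lexicon entries, which is why the subsequent learning step is needed to find the good ones.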
Learning
Zettlemoyer and Collins (2005)
‣ Iterative procedure: estimate "best" parses that derive each logical form, retrain the parser using these parses with supervised learning
‣ Unsupervised learning of correspondences, like word alignment
‣ Eventually we converge on the right parses at the same time that we learn a model to build them
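The loop described above can be written as EM-style pseudocode (the function names here are placeholders of mine, not Zettlemoyer and Collins' actual implementation):

```python
# Pseudocode sketch of the iterative training procedure (hypothetical names):
def train(pairs, genlex_lexicon, n_iters):
    model = init_uniform(genlex_lexicon)
    for _ in range(n_iters):
        derivations = []
        for sentence, logical_form in pairs:
            # "E-like" step: best derivation of the gold logical form
            derivations.append(argmax_parse(model, sentence, logical_form))
        # "M-like" step: supervised retraining on those guessed derivations
        model = fit_supervised(derivations)
    return model
```

As with hard-EM word alignment, the model's current guesses about derivations and its parameters improve together across iterations.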
Seq2seq Semantic Parsing
Semantic Parsing as Translation
Jia and Liang (2016)
‣ Write down a linearized form of the semantic parse, train seq2seq models to directly translate into this representation
"what states border Texas"
lambda x ( state ( x ) and border ( x , e89 ) )
‣ What might be some concerns about this approach? How do we mitigate them?
‣ What are some benefits of this approach compared to grammar-based?
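Linearization itself is trivial (a small helper of my own, not Jia and Liang's code): turn the logical form into a flat token sequence the decoder can emit, plus a check for one obvious concern, that a free decoder can produce unbalanced parentheses:

```python
import re

def linearize(lf):
    """Flatten a logical form into whitespace-level output tokens."""
    lf = lf.replace("λ", "lambda ").replace("∧", "and")
    return re.findall(r"\w+|[().,]", lf)

def balanced(tokens):
    """True iff parentheses in the token sequence are well-nested."""
    depth = 0
    for t in tokens:
        depth += (t == "(") - (t == ")")
        if depth < 0:
            return False
    return depth == 0

toks = linearize("λx. state(x) ∧ borders(x, e89)")
```

Nothing in a vanilla seq2seq decoder enforces `balanced`; that is exactly the kind of invariant a grammar-based parser gets for free and a translation model must learn (or have enforced by constrained decoding).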
Handling Invariances
"what states border Texas" / "what states border Ohio"
‣ Parsing-based approaches handle these the same way
‣ Possible divergences: features, different weights in the lexicon
‣ Can we get seq2seq semantic parsers to handle these the same way?
‣ Key idea: don't change the model, change the data
‣ "Data augmentation": encode invariances by automatically generating new training examples
Data Augmentation
Jia and Liang (2016)
‣ Abstract out entities: now we can "remix" examples and encode invariance to entity ID. More complicated remixes too
‣ Lets us synthesize a "what states border ohio?" example
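A toy version of the entity-abstraction remix (my own code; the entity ID `e77` for Ohio is made up for illustration): replace a known entity with a typed placeholder in both the question and the logical form, then refill the slot with other entities of the same type:

```python
# Hypothetical entity table; e77 is an invented ID, only e89 is from the slides.
ENTITIES = {"texas": "e89", "ohio": "e77"}

def augment(question, lf, entity):
    """Synthesize new (question, logical form) pairs by swapping entities."""
    q_tmpl = question.replace(entity, "STATE")
    lf_tmpl = lf.replace(ENTITIES[entity], "STATE_ID")
    return [(q_tmpl.replace("STATE", name), lf_tmpl.replace("STATE_ID", eid))
            for name, eid in ENTITIES.items()]

pairs = augment("what states border texas",
                "lambda x . state(x) and borders(x, e89)", "texas")
# pairs now contains the synthesized "what states border ohio" example
```

Because both sides of the pair are rewritten consistently, the seq2seq model sees direct evidence that swapping the entity changes only the entity ID in the output.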
Semantic Parsing as Translation
Jia and Liang (2016)
‣ Prolog
‣ Lambda calculus
‣ Other DSLs
‣ Handle all of these with uniform machinery!
Semantic Parsing as Translation
Jia and Liang (2016)
‣ Three forms of data augmentation all help
‣ Results on these tasks are still not as strong as hand-tuned systems from 10 years ago, but the same simple model can do well at all problems
Applications
‣ GeoQuery (Zelle and Mooney, 1996): answering questions about states (~80% accuracy)
‣ Jobs: answering questions about job postings (~80% accuracy)
‣ ATIS: flight search
‣ Can do well on all of these tasks if you handcraft systems and use plenty of training data: these domains aren't that rich
Regex Prediction
Locascio et al. (2016)
‣ Can use for other semantic parsing-like tasks
‣ Predict regexes from text
‣ Problem: requires a lot of data: 10,000 examples needed to get ~60% accuracy on pretty simple regexes
SQL Generation
Zhong et al. (2017)
‣ Convert natural language description into a SQL query against some DB
‣ How to ensure that well-formed SQL is generated?
‣ Three seq2seq models
‣ How to capture column names + constants?
‣ Pointer mechanisms
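Why pointers help can be shown with a toy decoder output space (my own sketch, not Zhong et al.'s model): the decoder chooses either a token from a small fixed SQL vocabulary or a *position* in its input (which, in settings like this, includes the table's column headers), so column names and constants never need to be in the output vocabulary:

```python
SQL_KEYWORDS = ["SELECT", "COUNT", "FROM", "WHERE", "="]

def realize(actions, input_tokens):
    """Turn a mix of keyword tokens and integer pointer actions into SQL text."""
    out = []
    for a in actions:
        if isinstance(a, int):
            out.append(input_tokens[a])   # pointer: copy input token i verbatim
        else:
            assert a in SQL_KEYWORDS      # generate from the fixed vocabulary
            out.append(a)
    return " ".join(out)

# Encoder input: column headers followed by the question (a common setup).
inp = "player state team how many players from Texas".split()
# Hypothetical action sequence: point at column "state" (1) and constant "Texas" (7).
query = realize(["SELECT", "COUNT", "WHERE", 1, "=", 7], inp)
```

Copying by position sidesteps the unbounded vocabulary of table schemas and literals, which is exactly what makes plain seq2seq generation of SQL brittle.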