![Page 1: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/1.jpg)
Compositional Captioning: Describing Novel Object Categories without Paired Training Data
MLSLP 2016
Lisa Anne Hendricks (1), Subhashini Venugopalan (2), Marcus Rohrbach (1), Raymond Mooney (2), Kate Saenko (3), Trevor Darrell (1)
(1) University of California, Berkeley; (2) University of Texas at Austin; (3) Boston University
![Page 2: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/2.jpg)
Visual Description
Berkeley LRCN: A brown bear standing on top of a lush green field.
MS CaptionBot: A large brown bear walking through a forest.
LRCN: Donahue, Jeff et al. CVPR 2015. Microsoft CaptionBot: http://captionbot.ai/
![Page 3: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/3.jpg)
A brown bear walking across a lush green field.
A large brown bear walking through a forest.
A brown bear sitting on top of a green field.
A brown bear walks in the grass in front of trees.
A brown bear walking on a grassy field next to trees.
A large brown bear walking across a lush green field.
![Page 4: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/4.jpg)
Problems with Visual Description
LRCN: Donahue, Jeff et al. CVPR 2015. CaptionBot: http://captionbot.ai/
Berkeley LRCN: “A black bear is standing in the grass.”
MS CaptionBot: “A bear that is eating some grass.”
Ours: “A anteater is standing in the grass.”
![Page 5: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/5.jpg)
We present the Deep Compositional Captioner (DCC), which can compose descriptions about novel objects in context.
Existing Methods
Paired Image-Sentence Data: “A green and white bus driving down the street.” “A brown table with lots of bottles on it.”
Deep Compositional Captioner
Unpaired Image Data: bottle, otter, toad, bus
Unpaired Text Data: “A bus is a road vehicle designed to carry many passengers.” “Otters live in a variety of aquatic environments.”
![Page 6: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/6.jpg)
DCC Key Insights
1. Effectively train with outside data
Learn image features with unpaired image data
Learn language features with unpaired text data
[Diagram: a Multimodal Unit combines the image feature $f_I$ and the language feature $f_L$; input: Previous Word; output: Predicted Word]
2. Transfer knowledge between related concepts
giraffe / impala
dress / tutu
cake / scone
![Page 7: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/7.jpg)
Lexical Classifier
Training Data: Unpaired Image Data
Network: VGG + multilabel loss (sigmoid cross entropy)
Feature: Vector with activations corresponding to scores for visual concepts in an image.
[Diagram: CNN → Classification Layer → $f_I$, for example Impala: 0.86, Sunny: 0.72, …, Bus: 0.04]
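The multilabel loss named on this slide can be sketched in a few lines. The snippet below is a minimal pure-Python illustration of sigmoid cross entropy over independent concept labels. The logits, the three-concept vocabulary, and the function names are assumptions made for illustration (the logits were picked so the sigmoid scores land on the slide's example values); none of it is taken from the authors' VGG implementation.

```python
import math

def sigmoid(z):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def multilabel_sigmoid_cross_entropy(logits, targets):
    """Mean binary cross entropy, one independent term per visual concept."""
    assert len(logits) == len(targets)
    total = 0.0
    for z, y in zip(logits, targets):
        # Stable rewrite of -[y*log(s) + (1 - y)*log(1 - s)] with s = sigmoid(z).
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)

# Hypothetical logits for one image; the labels say "impala" and "sunny" are present.
concepts = ["impala", "sunny", "bus"]
logits = [1.8, 0.95, -3.2]
targets = [1, 1, 0]

scores = [round(sigmoid(z), 2) for z in logits]  # plays the role of f_I
loss = multilabel_sigmoid_cross_entropy(logits, targets)
print(dict(zip(concepts, scores)), round(loss, 4))
```

Because each concept gets its own sigmoid, several concepts can be active for the same image, which is what lets the feature vector score "impala" and "sunny" together.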
![Page 8: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/8.jpg)
Language Model
Training Data: Unpaired Text Data
Network: Embed layer + LSTM unit. Model trained to predict a word, $w_t$, given the previous words in a sentence, $w_{0:t-1}$.
Feature: Vector which encodes previous words in the sentence.
[Diagram: Previous Word → Embed → LSTM → $W_L$ → Predicted Word; language feature $f_L$]
![Page 9: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/9.jpg)
[Diagram: the three DCC components]
Language Model: Previous Word → Embed → LSTM → $W_L$ → Predicted Word; feature $f_L$. Trained with unpaired text data.
Lexical Classifier: CNN → Classification Layer; feature $f_I$. Trained with unpaired image data.
Caption Model: Previous Word, $f_I$ and $f_L$ → Multimodal Unit ($W_I$, $W_L$) → Predicted Word. Trained with paired image-sentence data.
![Page 10: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/10.jpg)
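Multimodal Unit
[Diagram: the Language Feature $f_L$ and the Image Feature $f_I$ enter the Multimodal Unit through $W_L$ and $W_I$; given the previous words “A brown”, the unit outputs the Predicted Word]
$$S(w_t \mid I, w_{0:t-1}) = f_L W_L + f_I W_I + b$$
$f_L W_L$ large for: Giraffe, Horse, Couch, …, Standing
$f_I W_I$ large for: Giraffe, Trees, Standing, …, Couch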
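To make the additive score concrete, here is a small pure-Python sketch of $f_L W_L + f_I W_I + b$ followed by a softmax. The four-word vocabulary, the feature sizes, and every number are toy assumptions picked so that the language term favours nouns after "A brown" while the image term favours what is visible; the slide only specifies the form of the sum.

```python
import math

def matvec(f, W):
    """Row vector f (length d) times matrix W (d x V) -> list of V scores."""
    return [sum(f[k] * W[k][j] for k in range(len(f))) for j in range(len(W[0]))]

def multimodal_scores(f_L, W_L, f_I, W_I, b):
    """S = f_L W_L + f_I W_I + b, one score per vocabulary word."""
    lang = matvec(f_L, W_L)
    img = matvec(f_I, W_I)
    return [l + i + bias for l, i, bias in zip(lang, img, b)]

def softmax(scores):
    """Turn scores into a probability distribution over the vocabulary."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["giraffe", "horse", "couch", "standing"]

# Toy language feature: after "A brown", nouns look likely and "standing" does not.
f_L = [1.0, 0.0]
W_L = [[2.0, 1.8, 1.5, -1.0],
       [0.0, 0.0, 0.0, 0.0]]

# Toy image feature: lexical scores for (giraffe, standing, couch).
f_I = [0.9, 0.8, 0.1]
W_I = [[2.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 2.0],
       [0.0, 0.0, 2.0, 0.0]]
b = [0.0, 0.0, 0.0, 0.0]

scores = multimodal_scores(f_L, W_L, f_I, W_I, b)
probs = softmax(scores)
best = vocab[probs.index(max(probs))]
print(best, [round(p, 3) for p in probs])
```

In this toy setting only "giraffe" is supported by both terms, so it wins even though "horse" is plausible to the language term and "standing" to the image term.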
![Page 13: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/13.jpg)
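Weight Transfer
Transfer pair chosen using word2vec: giraffe → impala
$$S(w_t = \text{giraffe} \mid I, w_{0:t-1}) = f_L W_L[:, v_g] + f_I W_I[:, v_g] + b_g$$
$$S(w_t = \text{impala} \mid I, w_{0:t-1}) = f_L W_L[:, v_i] + f_I W_I[:, v_i] + b_i$$
[Diagram: Multimodal Unit with inputs $f_L$ and $f_I$; the columns $W_L[:, v_g]$ and $W_I[:, v_g]$ for “giraffe” are shown next to $W_L[:, v_i]$ and $W_I[:, v_i]$ for “impala”; two entries are marked 0]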
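The two steps on this slide, picking a transfer pair by embedding similarity and reusing the known word's weights for the novel word, can be sketched as follows. This is a hedged illustration only: the three-dimensional vectors stand in for word2vec embeddings, all weights are made up, the helper names are hypothetical, and the extra handling suggested by the entries marked 0 in the diagram is not reproduced.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def closest_known_word(novel, known_words, emb):
    """Pick the known word whose embedding is most similar to the novel word."""
    return max(known_words, key=lambda w: cosine(emb[novel], emb[w]))

def transfer_column(W, src, dst):
    """Copy column src of W (a list of rows) into column dst, in place."""
    for row in W:
        row[dst] = row[src]

def score(f_L, W_L, f_I, W_I, b, v):
    """S for vocabulary index v: f_L W_L[:, v] + f_I W_I[:, v] + b[v]."""
    return (sum(f * row[v] for f, row in zip(f_L, W_L))
            + sum(f * row[v] for f, row in zip(f_I, W_I))
            + b[v])

# Toy stand-ins for word2vec vectors (assumed values, not real embeddings).
emb = {
    "impala":  [0.9, 0.1, 0.0],
    "giraffe": [0.8, 0.2, 0.1],
    "couch":   [0.0, 0.1, 0.9],
    "bus":     [0.1, 0.9, 0.2],
}
vocab = ["giraffe", "couch", "bus", "impala"]
index = {w: i for i, w in enumerate(vocab)}

# Caption-model weights; the "impala" column starts at zero because the word
# never appears in the paired image-sentence data.
W_L = [[1.2, 0.3, 0.1, 0.0], [0.4, 0.9, 0.2, 0.0]]
W_I = [[2.0, 0.0, 0.1, 0.0], [0.1, 1.5, 0.0, 0.0]]
b = [0.5, 0.1, 0.2, 0.0]

source = closest_known_word("impala", ["giraffe", "couch", "bus"], emb)
v_g, v_i = index[source], index["impala"]
transfer_column(W_L, v_g, v_i)
transfer_column(W_I, v_g, v_i)
b[v_i] = b[v_g]
print(source, score([0.3, 0.7], W_L, [0.6, 0.2], W_I, b, v_i))
```

After the plain copy, "impala" scores exactly like "giraffe" for any input, so something else has to separate the two words at caption time; that part is outside this sketch.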
![Page 14: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/14.jpg)
Evaluation
MSCOCO Paired Image-Sentence Data
MSCOCO Unpaired Image Data
MSCOCO Unpaired Text Data
“An elephant galloping in the green grass”
“Two people playing ball in a field”
“A black train stopped on the tracks”
“Someone is about to eat some pizza”
Elephant, Galloping, Green, Grass
People, Playing, Ball, Field
Black, Train, Tracks
Eat, Pizza
“An elephant galloping in the green grass”
“Two people playing ball in a field”
“A black train stopped on the tracks”
“Someone is about to eat some pizza”
“A microwave is sitting on top of a kitchen counter”
“A kitchen counter with a microwave on it”
Kitchen, Microwave
![Page 15: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/15.jpg)
Evaluation
MSCOCO Paired Image-Sentence Data
MSCOCO Unpaired Image Data
MSCOCO Unpaired Text Data
“An elephant galloping in the green grass”
“Two people playing ball in a field”
“A black train stopped on the tracks”
“Someone is about to eat some pizza”
Elephant, Galloping, Green, Grass
People, Playing, Ball, Field
Black, Train, Tracks
Pizza
“An elephant galloping in the green grass”
“Two people playing ball in a field”
“A black train stopped on the tracks”
“Someone is about to eat some pizza”
“A microwave is sitting on top of a kitchen counter”
“A kitchen counter with a microwave on it”
Microwave
Held-out dataset
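One way to picture the held-out setup is as a filter over the paired data: any image-caption pair that mentions a held-out object is withheld from caption training. The sketch below assumes exact token matching and a hypothetical `split_paired_data` helper; a real split would also need to handle plurals and synonyms, which the slides do not describe.

```python
def tokenize(caption):
    """Lowercase a caption and strip simple punctuation from each token."""
    return {tok.strip(".,!?").lower() for tok in caption.split()}

def split_paired_data(pairs, held_out_words):
    """Separate (image_id, caption) pairs that mention any held-out word."""
    held = {w.lower() for w in held_out_words}
    train, held_out = [], []
    for image_id, caption in pairs:
        if tokenize(caption) & held:
            held_out.append((image_id, caption))
        else:
            train.append((image_id, caption))
    return train, held_out

# Captions from the slide; the image ids are made up for the example.
pairs = [
    (1, "An elephant galloping in the green grass"),
    (2, "Two people playing ball in a field"),
    (3, "A black train stopped on the tracks"),
    (4, "Someone is about to eat some pizza"),
    (5, "A microwave is sitting on top of a kitchen counter"),
    (6, "A kitchen counter with a microwave on it"),
]

train, held_out = split_paired_data(pairs, ["pizza", "microwave"])
print([i for i, _ in train], [i for i, _ in held_out])
```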
![Page 22: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/22.jpg)
Results: MSCOCO In-Domain

| Metric | LRCN | DCC (No Transfer) | DCC (Ours) |
|---|---|---|---|
| Efficacy (F1) | 0.00 | 0.00 | 39.78 |
| Sentence Quality (METEOR) | 19.33 | 19.90 | 21.00 |

Comparison of DCC to LRCN and DCC with no transfer.
- High F1 score indicates DCC can describe words outside of paired image-sentence data.
- Increased METEOR indicates DCC produces better sentences.
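The F1 row measures whether a model actually uses a held-out word when it should. The slides do not spell out the exact protocol, so the sketch below is an assumption: a generated caption counts as a positive if it contains the word, and the ground truth is a per-image flag saying whether the reference captions mention it. The captions marked as invented are illustrative only.

```python
def mentions(word, caption):
    """True if the caption contains the word as a whole token."""
    return word.lower() in {tok.strip(".,!?").lower() for tok in caption.split()}

def novel_word_f1(word, generated, should_mention):
    """F1 of 'the generated caption uses the word' against reference flags."""
    tp = fp = fn = 0
    for caption, expected in zip(generated, should_mention):
        said = mentions(word, caption)
        if said and expected:
            tp += 1
        elif said:
            fp += 1
        elif expected:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Flags are assumed: the first two images contain a microwave, the last two do not.
should_mention = [True, True, False, False]

with_transfer = [
    "A white microwave sitting on a brick wall.",
    "A white and black cat is sitting on a toilet.",
    "A microwave on a city street.",             # invented false positive
    "Two zebra grazing in a green grass field.",
]
never_says_it = [
    "A white and black cat is sitting on a toilet.",
    "A kitchen with a stove.",                    # invented caption
    "A green and white street sign on a city street.",
    "Two giraffes are eating grass in the field.",
]

f1_transfer = novel_word_f1("microwave", with_transfer, should_mention)
f1_none = novel_word_f1("microwave", never_says_it, should_mention)
print(f1_transfer, f1_none)
```

Under this definition a model that never emits the held-out word has no true positives, so its F1 is 0 regardless of how fluent its sentences are, which is consistent with the 0.00 entries for LRCN and DCC (No Transfer).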
![Page 23: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/23.jpg)
Empirical Evaluation
MSCOCO Paired Image-Sentence Data
MSCOCO Unpaired Image Data
MSCOCO Unpaired Text Data
“An elephant galloping in the green grass”
“Two people playing ball in a field”
“A black train stopped on the tracks”
“Someone is about to eat some pizza”
Elephant, Galloping, Green, Grass
People, Playing, Ball, Field
Black, Train, Tracks
“An elephant galloping in the green grass”
“Two people playing ball in a field”
“A black train stopped on the tracks”
“A kitchen counter with a microwave on it”
Out-of-Domain
Held Out Dataset
Pizza: “Pepperoni is a popular pizza topping.”
Microwave: “All microwaves use a timer for the cooking time”
![Page 26: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/26.jpg)
Results: MSCOCO Out-of-Domain

| Model | Unpaired Image Data | Unpaired Text Data | METEOR | F1 |
|---|---|---|---|---|
| LRCN | N/A | N/A | 19.33 | 0.00 |
| DCC (No Transfer) | MSCOCO | MSCOCO | 19.90 | 0.00 |
| DCC (Ours) | MSCOCO | MSCOCO | 21.00 | 39.78 |
| DCC (Ours) | ImageNet | MSCOCO | 20.71 | 33.60 |
| DCC (Ours) | ImageNet | Caption Txt | 20.66 | 35.53 |
| DCC (Ours) | ImageNet | WebCorpus | 20.66 | 34.94 |

DCC performs well when using out-of-domain data to train the lexical classifier and language model.
![Page 27: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/27.jpg)
No transfer: A green and white street sign on a city street.
DCC: A green and white bus parked on the side of the street.
No transfer: A dog lying on a bed with a large brown dog.
DCC: A dog lying on a couch with a large window in the background.
No transfer: Two giraffes are eating grass in the field.
DCC: Two zebra grazing in a green grass field.
No transfer: A white and black cat is sitting on a toilet.
DCC: A white microwave sitting on a brick wall.
![Page 28: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/28.jpg)
DCC can describe over 300 ImageNet visual concepts in diverse contexts.
DCC: A person is holding a gecko in their hand.
Berkeley LRCN: A person holding a piece of food in their hand.
MS CaptionBot: A close up of a person holding a baby.
DCC: A gecko is standing on a branch of a tree.
Berkeley LRCN: A bird is standing on the edge of a rock.
MS CaptionBot: A bird that is standing in the water.
![Page 29: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/29.jpg)
DCC can describe over 300 ImageNet visual concepts in diverse contexts.
A woman in a chiffon tutu.
A white centrifuge is sitting on the table.
A bunch of a lychee are in a market.
A group of people standing around a baobab in a field.
A brown bobcat in a green field.
A close up of a wooden table with a bottle of whisky.
A close up of a scone on a plate.
A black and white photo of a candelabra in a room.
![Page 30: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/30.jpg)
Failure Cases
A woman is riding a unicycle on a unicycle.
A group of people standing around a foxhunting on a field.
![Page 31: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/31.jpg)
Results: Video Description

| Model | METEOR | F1 |
|---|---|---|
| Baseline (No Transfer) | 28.80 | 0.0 |
| + DCC (ours) | 28.9 | 6.0 |
| + ILSVRC Videos (No Transfer) | 29.0 | 0.0 |
| + DCC (ours) + ILSVRC Videos | 29.10 | 22.2 |
![Page 32: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/32.jpg)
Novel Object Captioner
“Captioning Images with Diverse Objects”, Venugopalan 2016, http://arxiv.org/abs/1606.07770
![Page 33: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/33.jpg)
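DCC Issue: Not End-to-End Trainable
[Diagram: the three DCC components]
Language Model: Previous Word → Embed → LSTM → $W_L$ → Predicted Word; feature $f_L$.
Lexical Classifier: CNN → Classification Layer; feature $f_I$.
Caption Model: Previous Word, $f_I$ and $f_L$ → Multimodal Unit ($W_I$, $W_L$) → Predicted Word.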
![Page 34: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/34.jpg)
NOC Solution: Joint Objective Loss
[Diagram: a network built from CNN, Embed, and LSTM layers maps the image and the Previous Word to the Predicted Word; an Image-Specific Loss, an Image-Text Loss, and a Text-Specific Loss are combined into a Joint Objective Loss]
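The joint objective can be pictured as one scalar that adds the three named losses, so a single backward pass updates all parts of the network together. The sketch below is an assumption about the form only: it treats each term as a softmax cross entropy over a toy three-word vocabulary and sums them with equal weights; the `weights` parameter and all probabilities are hypothetical.

```python
import math

def cross_entropy(probs, target):
    """Negative log-probability assigned to the target index."""
    return -math.log(probs[target])

def joint_objective(image_loss, image_text_loss, text_loss, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the image-specific, image-text, and text-specific losses."""
    w_img, w_cap, w_txt = weights
    return w_img * image_loss + w_cap * image_text_loss + w_txt * text_loss

# Toy predicted distributions over a 3-word vocabulary; index 0 is the target.
image_loss = cross_entropy([0.7, 0.2, 0.1], 0)       # image-only branch
image_text_loss = cross_entropy([0.6, 0.3, 0.1], 0)  # captioning branch
text_loss = cross_entropy([0.5, 0.25, 0.25], 0)      # text-only branch

total = joint_objective(image_loss, image_text_loss, text_loss)
print(round(total, 4))
```

Summing the terms means the image-only and text-only data keep shaping the shared layers while the captioning term is optimised, rather than being used only in a separate pre-training stage.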
![Page 35: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/35.jpg)
DCC Issue: Transfer Mechanism
A man is playing racket on a racket.
![Page 36: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/36.jpg)
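NOC Solution: Semantic Embedding
[Diagram: the language model Previous Word → Embed → LSTM → Embed → Predicted Word is shown alongside a version whose embedding layers use GloVe weights: Previous Word → $W_{glove}$ → LSTM → $W_{glove}^{T}$ → Predicted Word]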
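Using $W_{glove}$ on the input side and $W_{glove}^{T}$ on the output side means the score of each candidate word is a dot product between the LSTM output and that word's embedding, so words with similar embeddings receive similar scores. The sketch below illustrates this with made-up three-dimensional vectors standing in for GloVe; the vector `h` is a hypothetical LSTM output already in embedding space, and any projection between the LSTM and the output layer is left out.

```python
def embed(word, vocab, W_glove):
    """Input side: look up the row of W_glove for the previous word."""
    return W_glove[vocab.index(word)]

def output_scores(h, W_glove):
    """Output side: h times W_glove^T, a dot product with every word vector."""
    return [sum(a * b for a, b in zip(h, row)) for row in W_glove]

vocab = ["giraffe", "impala", "couch"]
# Toy embedding matrix, one row per word (assumed values, not real GloVe).
W_glove = [
    [0.8, 0.2, 0.1],  # giraffe
    [0.9, 0.1, 0.0],  # impala
    [0.0, 0.1, 0.9],  # couch
]

x = embed("impala", vocab, W_glove)   # what the LSTM would receive as input
h = [1.0, 0.2, 0.0]                   # hypothetical LSTM output
scores = output_scores(h, W_glove)
print(dict(zip(vocab, [round(s, 2) for s in scores])))
```

Here "giraffe" and "impala" both score high and "couch" scores low, without any explicit copying of weights between word pairs.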
![Page 37: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/37.jpg)
Training
[Diagram: CNN, Embed, and LSTM branches mapping the image and the Previous Word to the Predicted Word, trained with the Image-Specific Loss, the Image-Text Loss, and the Text-Specific Loss, which are combined into the Joint Objective Loss]
![Page 38: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/38.jpg)
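F1 Scores for NOC and DCC

| Model | Bottle | Bus | Couch | Microwave | Pizza | Racket | Suitcase | Zebra | Average |
|---|---|---|---|---|---|---|---|---|---|
| DCC | 4.63 | 29.79 | 45.87 | 28.09 | 64.59 | 52.24 | 13.16 | 79.88 | 39.78 |
| NOC | 19.02 | 69.34 | 33.25 | 26.46 | 69.16 | 62.45 | 34.65 | 89.78 | 50.51 |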
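As a quick arithmetic check, the Average column is the plain mean of the eight per-object F1 scores. The snippet below recomputes both averages from the numbers on the slide and lists the objects where NOC scores below DCC; the variable names are only for illustration.

```python
objects = ["Bottle", "Bus", "Couch", "Microwave", "Pizza", "Racket", "Suitcase", "Zebra"]
dcc = [4.63, 29.79, 45.87, 28.09, 64.59, 52.24, 13.16, 79.88]
noc = [19.02, 69.34, 33.25, 26.46, 69.16, 62.45, 34.65, 89.78]

def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)

dcc_avg = round(mean(dcc), 2)
noc_avg = round(mean(noc), 2)

# Objects where NOC's F1 is lower than DCC's.
regressions = [name for name, d, n in zip(objects, dcc, noc) if n < d]
print(dcc_avg, noc_avg, regressions)
```

The recomputed means match the slide (39.78 and 50.51), and the gain is not uniform: NOC is lower than DCC on Couch and Microwave.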
![Page 39: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/39.jpg)
Ablation: Auxiliary Objective
Columns: Contributing Factor, Glove, LM Pretrain, Image Pretrain, Auxiliary Objective, Meteor, F1
Pretraining & Glove: X X X; Meteor 19.80, F1 25.38
Fix Image Model: X X Fixed; Meteor 18.91, F1 39.70
All: X X X X; Meteor 20.69, F1 50.51
![Page 40: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/40.jpg)
Ablation: Glove Embedding
Columns: Contributing Factor, Glove, LM Pretrain, Image Pretrain, Auxiliary Objective, Meteor, F1
Auxiliary Objective: X X; Meteor 15.78, F1 14.41
Glove: X X X; Meteor 19.69, F1 47.02
All: X X X X; Meteor 20.69, F1 50.51
![Page 41: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/41.jpg)
Training with Outside Data

| Image Data | Text Data | Meteor | F1 |
|---|---|---|---|
| MSCOCO | MSCOCO | 20.69 | 50.51 |
| MSCOCO | WebCorpus | 19.15 | 41.74 |
| ImageNet | WebCorpus | 17.55 | 36.50 |
![Page 42: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/42.jpg)
Describing ImageNet
A otter is sitting on a rock in the sun.
A large flounder is resting on a rock.
A table with a plate of sashimi and vegetables.
A large glacier with a mountain in the background.
A man is standing on a beach holding a snapper.
A group of people standing around a large white warship.
![Page 43: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/43.jpg)
Errors
A chainsaw is sitting on a chainsaw near a chainsaw.
A volcano view of a volcano in the sun.
![Page 44: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired](https://reader034.vdocuments.mx/reader034/viewer/2022051907/5ffa354f322ff8580f7a37bd/html5/thumbnails/44.jpg)
Our Team:
Lisa Anne Hendricks
Subhashini Venugopalan
Marcus Rohrbach
Raymond Mooney
Kate Saenko
Trevor Darrell
Existing Methods
LRCN: A black bear is standing in the grass.
CaptionBot: A bear that is eating some grass.
Compositional Captioner
A anteater is standing in the grass.
Paired Image-Sentence Data, Unpaired Image Data, Unpaired Text Data