
An Overview of Residual Networks

Presented by Peidong Wang, 03/30/2017

Content

• Review of Convolutional Neural Network (CNN)
• Residual Network
• Wide Residual Network
• Wide Residual Network for Robust Speech Recognition
• Summary

Content

• Review of Convolutional Neural Network (CNN)
  • "CS231n Convolutional Neural Networks for Visual Recognition", http://cs231n.github.io/convolutional-networks/, 2017.
• Residual Network
• Wide Residual Network
• Wide Residual Network for Robust Speech Recognition
• Summary

Review of Convolutional Neural Network

• Deep CNN
  • Evidence shows that network depth is of crucial importance, and the leading results on the challenging ImageNet dataset all exploit "very deep" models.
  • Driven by the significance of depth, a question arises: is learning better networks as easy as stacking more layers?

Review of Convolutional Neural Network

• Problems of Deep CNN
  • The notorious problem of vanishing/exploding gradients
    • Description: the values of the back-propagated gradients get smaller as they are propagated downwards.
    • Solutions: normalized initialization and intermediate normalization layers (see the sketch below).
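A minimal sketch of those two remedies, assuming PyTorch (the slides name no framework): normalized (He/Kaiming) initialization of the convolution weights and an intermediate batch normalization layer between the convolution and the nonlinearity.

```python
import torch.nn as nn

class ConvBlock(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                              padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)  # intermediate normalization layer
        self.relu = nn.ReLU(inplace=True)
        # normalized (He/Kaiming) initialization, suited to ReLU nonlinearities
        nn.init.kaiming_normal_(self.conv.weight, mode='fan_out', nonlinearity='relu')

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))
```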

Review of Convolutional Neural Network

• Problems of Deep CNN (cont'd)
  • Degradation problem of deep CNNs
    • Description: With the network depth increasing, accuracy gets saturated and then degrades rapidly. Such degradation is not caused by overfitting, and adding more layers to a suitably deep model leads to higher training error.
    • Solution: ?

Content

• Review of Convolutional Neural Network (CNN)
• Residual Network
  • He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
• Wide Residual Network
• Wide Residual Network for Robust Speech Recognition
• Summary

Residual Network

• Idea
  • The degradation indicates that not all systems are similarly easy to optimize.
  • Let us consider a shallower architecture and its deeper counterpart that adds more layers onto it. There exists a solution by construction: the added layers are identity mappings, and the other layers are copied from the learned shallower model.
  • The existence of this constructed solution indicates that a deeper model should produce no higher training error than its shallower counterpart.

Residual Network

• Residual Network
  • Instead of hoping each few stacked layers directly fit a desired underlying mapping, we explicitly let these layers fit a residual mapping.
  • Formally, denoting the desired underlying mapping as H(x), we let the stacked nonlinear layers fit another mapping F(x) := H(x) - x. The original mapping is recast into F(x) + x (see the block sketch below).
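A minimal sketch of such a residual block, assuming PyTorch (not code from the paper): the stacked convolutions learn F(x), and the identity shortcut adds x back so the block outputs F(x) + x.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))  # F(x)
        return self.relu(residual + x)                                       # F(x) + x
```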

Residual Network

• Residual Network (cont'd)
  • We hypothesize that it is easier to optimize the residual mapping than to optimize the original, unreferenced mapping. To the extreme, if an identity mapping were optimal, it would be easier to push the residual to zero than to fit an identity mapping by a stack of nonlinear layers.

Residual Network

• Experiments
  • Dataset: CIFAR-10
  • Dataset Details:

    • 32x32 color images
    • drawn from 10 classes
    • split into 50k training images and 10k testing images (a loading sketch follows this slide)

• Experimental Results
  • The Residual Network won 1st place in ILSVRC 2015 (dataset: ImageNet) and in COCO 2015.
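As referenced above, a minimal sketch of obtaining the CIFAR-10 split described on this slide, assuming torchvision (which the slides do not mention): 32x32 color images from 10 classes, 50k for training and 10k for testing.

```python
import torchvision
import torchvision.transforms as transforms

transform = transforms.ToTensor()
train_set = torchvision.datasets.CIFAR10(root='./data', train=True,
                                         download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(root='./data', train=False,
                                        download=True, transform=transform)
print(len(train_set), len(test_set))  # 50000 10000
```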

Residual Network

• Experimental Results (cont'd)

Residual Network

• Problem of Residual Network
  • Diminishing feature reuse
    • Description: As the gradient flows through the network, nothing forces it to go through the residual block weights, so a block can avoid learning anything during training. As a result, it is possible that only a few blocks learn useful representations, or that many blocks share very little information and contribute little to the final goal.
    • Solution: ?

Content

• Review of Convolutional Neural Network (CNN)
• Residual Network
• Wide Residual Network
  • Zagoruyko, Sergey, and Nikos Komodakis. "Wide residual networks." arXiv preprint arXiv:1605.07146 (2016).
• Wide Residual Network for Robust Speech Recognition
• Summary

Wide Residual Network

• Idea
  • To reduce depth and increase the representation power of residual blocks, there are three simple ways:
    • add more convolutional layers per block
    • widen the convolutional layers by adding more feature planes
    • increase filter sizes in convolutional layers

Wide Residual Network

• Wide Residual Network
  • "To widen the convolutional layers by adding more feature planes" can be interpreted as increasing the number of filters in each convolution layer (see the sketch below). [*]

[*] Added by the presenter
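A minimal sketch of the widening idea, assuming PyTorch and a widening factor k (neither is taken from the authors' code): every convolution in the block simply uses k times more filters (feature planes), while the depth of the block is unchanged.

```python
import torch.nn as nn

def wide_basic_stack(base_channels, k):
    """Convolutional stack of a widened residual block (identity shortcut omitted)."""
    width = base_channels * k  # the widening factor k multiplies the number of filters
    return nn.Sequential(
        nn.BatchNorm2d(width),
        nn.ReLU(inplace=True),
        nn.Conv2d(width, width, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(width),
        nn.ReLU(inplace=True),
        nn.Conv2d(width, width, kernel_size=3, padding=1, bias=False),
    )

# e.g. wide_basic_stack(16, k=1) is a standard block; wide_basic_stack(16, k=8) is 8x wider
```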

Wide Residual Network

• Experiments
  • Dataset: CIFAR-10 and CIFAR-100
  • Dataset Details:

    • 32x32 color images
    • drawn from 10 classes and 100 classes, respectively
    • split into 50k training images and 10k testing images

Wide Residual Network

• Experimental Results:

Content

• Review of Convolutional Neural Network (CNN)
• Residual Network
• Wide Residual Network
• Wide Residual Network for Robust Speech Recognition
  • Jahn Heymann, Lukas Drude, and Reinhold Haeb-Umbach. "Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition."

• Summary

Wide Residual Network for Robust ASR

• Wide Residual Network for Robust ASR
  • State-of-the-art model for the CHiME4 single-channel task with the baseline language model
  • Combines a wide residual network with bidirectional LSTMs (BLSTMs)
  • The wide residual network is used to refine the features, and the BLSTMs are used to incorporate context information (see the sketch after this slide). [*]

[*] Added by the presenter
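A minimal, heavily simplified sketch of this pairing, assuming PyTorch; the layer sizes (n_features, width, hidden, n_outputs) are placeholder assumptions, not values from the paper. A single widened residual block refines the features, and a BLSTM then models temporal context over the frame sequence.

```python
import torch.nn as nn

class WideResBLSTM(nn.Module):
    def __init__(self, n_features=80, width=64, hidden=512, n_outputs=2000):
        super().__init__()
        # wide residual front end (one widened block, for illustration only)
        self.conv_in = nn.Conv2d(1, width, kernel_size=3, padding=1, bias=False)
        self.block = nn.Sequential(
            nn.BatchNorm2d(width), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(width), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, kernel_size=3, padding=1, bias=False),
        )
        # bidirectional LSTM over the time axis to incorporate context
        self.blstm = nn.LSTM(input_size=width * n_features, hidden_size=hidden,
                             batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_outputs)  # frame-level output layer

    def forward(self, x):                  # x: (batch, time, n_features)
        h = x.unsqueeze(1)                 # -> (batch, 1, time, n_features)
        h = self.conv_in(h)
        h = h + self.block(h)              # residual connection: F(h) + h
        b, c, t, f = h.shape
        h = h.permute(0, 2, 1, 3).reshape(b, t, c * f)  # flatten channels per frame
        h, _ = self.blstm(h)
        return self.out(h)                 # (batch, time, n_outputs)
```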

Wide Residual Network for Robust ASR

• Wide Residual Network for Robust ASR (cont'd)

Wide Residual Network for Robust ASR

• Experiment
  • Dataset: CHiME4
  • Dataset Details:

    • Training set: 1600 (real) + 7138 (simulated) = 8738 noisy utterances
    • Development set: 410 (real) x 4 (environments) + 410 (simulated) x 4 (environments) = 3280 utterances
    • Test set: 330 (real) x 4 (environments) + 330 (simulated) x 4 (environments) = 2640 utterances
    • Environments: bus (BUS), cafe (CAF), pedestrian area (PED), and street junction (STR)

Wide Residual Network for Robust ASR

• Experimental Results

Content

• Review of Convolutional Neural Network (CNN)
• Residual Network
• Wide Residual Network
• Wide Residual Network for Robust Speech Recognition
• Summary

Summary

• We reviewed the history of an important branch of CNNs: Residual Networks.
• We learned not only the network frameworks but also the logic and ideas behind each advance.

Thank You!