TRANSCRIPT
Content
• Review of Convolutional Neural Network (CNN)
• Residual Network
• Wide Residual Network
• Wide Residual Network for Robust Speech Recognition
• Summary
Content
• Review of Convolutional Neural Network (CNN)
  • "CS231n Convolutional Neural Networks for Visual Recognition", http://cs231n.github.io/convolutional-networks/, 2017.
• Residual Network
• Wide Residual Network
• Wide Residual Network for Robust Speech Recognition
• Summary
Review of Convolutional Neural Network
• Deep CNN
  • Evidence shows that network depth is of crucial importance, and the leading results on the challenging ImageNet dataset all exploit "very deep" models.
  • Driven by the significance of depth, a question arises: is learning better networks as easy as stacking more layers?
Review of Convolutional Neural Network
• Problems of Deep CNN
  • The notorious problem of vanishing/exploding gradients
    • Description: the values of the back-propagated gradients get smaller and smaller as they are propagated down toward the earlier layers.
    • Solutions: normalized initialization and intermediate normalization layers (a minimal sketch follows below).
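A minimal sketch of what these two mitigations can look like in practice, assuming a PyTorch setting (tooling not specified in the slides): He ("kaiming") initialization of the convolution weights plus a Batch Normalization layer between convolutions. The helper name is illustrative.

```python
import torch.nn as nn

# Hedged sketch (not from the slides): a plain convolutional stage that applies
# the two mitigations named above -- normalized (He/"kaiming") initialization
# and an intermediate Batch Normalization layer.
def make_conv_stage(in_channels: int, out_channels: int) -> nn.Sequential:
    conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, bias=False)
    nn.init.kaiming_normal_(conv.weight, nonlinearity="relu")  # normalized initialization
    return nn.Sequential(
        conv,
        nn.BatchNorm2d(out_channels),  # intermediate normalization layer
        nn.ReLU(inplace=True),
    )
```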
Review of Convolutional Neural Network
• Problems of Deep CNN (cont'd)
  • Degradation problem of deep CNNs
    • Description: with the network depth increasing, accuracy gets saturated and then degrades rapidly. Such degradation is not caused by overfitting, and adding more layers to a suitably deep model leads to higher training error.
    • Solution: ?
Content
• Review of Convolutional Neural Network (CNN)
• Residual Network
  • He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
• Wide Residual Network
• Wide Residual Network for Robust Speech Recognition
• Summary
Residual Network
• Idea
  • The degradation indicates that not all systems are similarly easy to optimize.
  • Consider a shallower architecture and its deeper counterpart that adds more layers onto it. There exists a solution by construction: the added layers are identity mappings, and the other layers are copied from the learned shallower model.
  • The existence of this constructed solution indicates that a deeper model should produce no higher training error than its shallower counterpart.
Residual Network
• Residual Network
  • Instead of hoping that each few stacked layers directly fit a desired underlying mapping, we explicitly let these layers fit a residual mapping.
  • Formally, denoting the desired underlying mapping as H(x), we let the stacked nonlinear layers fit another mapping F(x) := H(x) - x. The original mapping is recast into F(x) + x.
Residual Network
• Residual Network (cont'd)
  • We hypothesize that it is easier to optimize the residual mapping than to optimize the original, unreferenced mapping. In the extreme, if an identity mapping were optimal, it would be easier to push the residual to zero than to fit an identity mapping by a stack of nonlinear layers. (A minimal residual-block sketch follows below.)
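A minimal sketch of a basic residual block in PyTorch (an assumption; the slides give no code). The stacked layers compute F(x), and the identity shortcut adds x back, so the block outputs F(x) + x exactly as in the formulation above. Channel counts and the pre-/post-activation ordering are illustrative choices, not the paper's only variant.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal sketch: the stacked layers fit F(x) = H(x) - x,
    and the block outputs F(x) + x via the identity shortcut."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))  # F(x)
        return self.relu(residual + x)                                       # F(x) + x
```

If an identity mapping is indeed optimal for a block, the optimizer only has to drive the two convolutions toward zero, rather than learn an identity function through them.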
Residual Network
• Experiments
  • Dataset: CIFAR-10
  • Dataset details:
    • 32x32 color images
    • drawn from 10 classes
    • split into 50k training images and 10k testing images
• Experimental Results
  • The Residual Network won 1st place in ILSVRC 2015 (dataset: ImageNet) and COCO 2015. (A minimal data-loading sketch for CIFAR-10 follows below.)
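For completeness, a hedged sketch of loading the CIFAR-10 splits described above with torchvision (an assumed tool; the slides do not specify one). The "./data" directory is arbitrary.

```python
import torchvision
import torchvision.transforms as T

# CIFAR-10 as described above: 32x32 color images, 10 classes,
# 50k training images and 10k testing images.
transform = T.ToTensor()
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                        download=True, transform=transform)
print(len(train_set), len(test_set))  # 50000 10000
```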
Residual Network
• Problem of Residual Network
  • Diminishing feature reuse
    • Description: as the gradient flows through the network, there is nothing to force it to go through the residual block weights, so a block can avoid learning anything during training. It is therefore possible that only a few blocks learn useful representations, or that many blocks share very little information and contribute little to the final goal.
    • Solution: ?
Content
• Review of Convolutional Neural Network (CNN)
• Residual Network
• Wide Residual Network
  • Zagoruyko, Sergey, and Nikos Komodakis. "Wide residual networks." arXiv preprint arXiv:1605.07146 (2016).
• Wide Residual Network for Robust Speech Recognition
• Summary
Wide Residual Network
• Idea
  • To reduce depth and increase the representation power of residual blocks, there are three simple ways:
    • add more convolutional layers per block
    • widen the convolutional layers by adding more feature planes
    • increase the filter sizes in the convolutional layers
Wide Residual Network
• Wide Residual Network
  • "Widening the convolutional layers by adding more feature planes" can be interpreted as increasing the number of filters in each convolution layer (see the sketch below). [*]

[*] Added by the presenter
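A hedged sketch of widening, again in PyTorch: the widening factor k multiplies the number of filters (feature planes) in every convolution of the block; k = 1 recovers a standard block, and the paper's widely used WRN-28-10 configuration uses k = 10. Only the residual branch is shown; the shortcut addition (and a projection when in_planes differs from the widened width) is omitted for brevity.

```python
import torch.nn as nn

# Hedged sketch: the residual branch of a wide basic block with widening factor k,
# using the BN-ReLU-Conv (pre-activation) ordering. Names are illustrative.
def wide_basic_branch(in_planes: int, base_planes: int, k: int) -> nn.Sequential:
    planes = base_planes * k  # "more feature planes" = more filters per convolution
    return nn.Sequential(
        nn.BatchNorm2d(in_planes),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_planes, planes, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(planes),
        nn.ReLU(inplace=True),
        nn.Conv2d(planes, planes, kernel_size=3, padding=1, bias=False),
    )
```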
Wide Residual Network
• Experiments
  • Datasets: CIFAR-10 and CIFAR-100
  • Dataset details:
    • 32x32 color images
    • drawn from 10 classes and 100 classes, respectively
    • split into 50k training images and 10k testing images
Content
• Review of Convolutional Neural Network (CNN)
• Residual Network
• Wide Residual Network
• Wide Residual Network for Robust Speech Recognition
  • Jahn Heymann, Lukas Drude, and Reinhold Haeb-Umbach. "Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust Speech Recognition."
• Summary
Wide Residual Network for Robust ASR
• Wide Residual Network for Robust ASR
  • State-of-the-art model for the CHiME-4 single-channel task with the baseline language model
  • Combines a wide residual network with bidirectional LSTMs (BLSTMs)
  • The wide residual network is used to refine the features, and the BLSTMs are used to incorporate context information (a hypothetical sketch follows below). [*]
[*] Added by the presenter
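A hypothetical PyTorch sketch of that division of labor, not the authors' exact architecture: a wide residual convolutional front-end refines per-frame features, a BLSTM then models temporal context, and a linear layer produces per-frame acoustic scores. All sizes (80 frequency bins, width 128, 512 LSTM units, 2000 targets) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class WRNBLSTMAcousticModel(nn.Module):
    """Hypothetical sketch: wide residual CNN for feature refinement,
    BLSTM for context, linear output layer for per-frame scores."""

    def __init__(self, num_freq_bins: int = 80, width: int = 128,
                 lstm_hidden: int = 512, num_targets: int = 2000):
        super().__init__()
        # Stem convolution followed by one wide residual block (BN-ReLU-Conv order).
        self.stem = nn.Conv2d(1, width, kernel_size=3, padding=1, bias=False)
        self.res_branch = nn.Sequential(
            nn.BatchNorm2d(width), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(width), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, kernel_size=3, padding=1, bias=False),
        )
        self.blstm = nn.LSTM(input_size=width * num_freq_bins,
                             hidden_size=lstm_hidden,
                             batch_first=True, bidirectional=True)
        self.output = nn.Linear(2 * lstm_hidden, num_targets)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, freq) filterbank features; add a channel axis for the CNN.
        x = self.stem(feats.unsqueeze(1))                # (batch, width, time, freq)
        x = x + self.res_branch(x)                       # wide residual block: F(x) + x
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)   # one refined vector per frame
        x, _ = self.blstm(x)                             # BLSTM incorporates context
        return self.output(x)                            # per-frame acoustic scores
```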
Wide Residual Network for Robust ASR
• Experiment
  • Dataset: CHiME-4
  • Dataset details:
    • Training set: 1600 (real) + 7138 (simulated) = 8738 noisy utterances
    • Development set: 410 (real) x 4 (environments) + 410 (simulated) x 4 (environments) = 3280 utterances
    • Test set: 330 (real) x 4 (environments) + 330 (simulated) x 4 (environments) = 2640 utterances
    • Environments: bus (BUS), cafe (CAF), pedestrian area (PED), and street junction (STR)
Content
• Review of Convolutional Neural Network (CNN)
• Residual Network
• Wide Residual Network
• Wide Residual Network for Robust Speech Recognition
• Summary
Summary
• We reviewed the history of an important branch of CNNs: Residual Networks.
• We learned not only the network frameworks but also the logic and ideas behind each advance.