TRANSCRIPT
Convolutional Neural Networks III
October 2nd, 2019
Yong Jae Lee, UC Davis
Many slides from Rob Fergus, Svetlana Lazebnik, Jia-Bin Huang, Derek Hoiem, Adriana Kovashka, Andrej Karpathy
Announcements
• Sign up for paper presentations
• First paper review due Thurs 11:59 PM
Gradient descent
• We'll update weights iteratively
• Move in direction opposite to gradient: w ← w − η · ∂L/∂w, where η is the learning rate
[Figures: training loss decreasing over time; loss function landscape over weights (W_1, W_2), showing the original W and the negative gradient direction]
Figure from Andrej Karpathy
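A minimal NumPy sketch of this update rule. The quadratic loss, data, and learning rate below are illustrative stand-ins, not the lecture's code; only the update `w ← w − η · ∂L/∂w` is the point.

```python
import numpy as np

# Hypothetical quadratic loss L(w) = ||X w - y||^2, used only to illustrate the update rule.
def loss_and_grad(w, X, y):
    residual = X @ w - y
    loss = np.sum(residual ** 2)
    grad = 2 * X.T @ residual          # dL/dw
    return loss, grad

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.normal(size=100)
w = np.zeros(5)
eta = 0.01                             # learning rate

for step in range(100):
    loss, grad = loss_and_grad(w, X, y)
    w = w - eta * grad                 # move in direction opposite to the gradient
```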
Gradient descent in multi-layer nets
• We'll update weights
• Move in direction opposite to the gradient
• How to update the weights at all layers?
• Answer: backpropagation of loss from higher layers to lower layers
Backpropagation: Graphic example
• First calculate error of output units and use this to change the top layer of weights.
[Figure: three-layer network with output, hidden, and input layers (units labeled k, j, i) and weight layers w(2), w(1); caption: "Update weights into j"]
Adapted from Ray Mooney
Backpropagation: Graphic example
• Next calculate error for hidden units based on errors on the output units it feeds into.
[Figure: same network (output, hidden, input; units k, j, i); errors propagated from output units back to hidden units]
Adapted from Ray Mooney
Backpropagation: Graphic example
• Finally update bottom layer of weights based on errors calculated for hidden units.
[Figure: same network; caption: "Update weights into i"]
Adapted from Ray Mooney
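To make the three steps above concrete, here is a minimal NumPy sketch of backpropagation through a one-hidden-layer network. The sigmoid hidden layer, linear output, squared-error loss, and layer sizes are illustrative assumptions, not the lecture's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny one-hidden-layer network: input -> hidden (sigmoid) -> output (linear)
x = rng.normal(size=(4,))        # input activations
t = rng.normal(size=(2,))        # target output
W1 = rng.normal(size=(3, 4))     # bottom layer of weights (into hidden units)
W2 = rng.normal(size=(2, 3))     # top layer of weights (into output units)
eta = 0.1                        # learning rate

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Forward pass
h = sigmoid(W1 @ x)              # hidden activations
y = W2 @ h                       # output activations
loss = 0.5 * np.sum((y - t) ** 2)

# 1. Error of output units -> gradient for the top layer of weights
delta_out = y - t                                 # dL/dy
grad_W2 = np.outer(delta_out, h)

# 2. Error of hidden units, from the errors of the output units they feed into
delta_hidden = (W2.T @ delta_out) * h * (1 - h)   # chain rule through the sigmoid

# 3. Gradient for the bottom layer of weights
grad_W1 = np.outer(delta_hidden, x)

# Gradient descent step on both layers
W2 -= eta * grad_W2
W1 -= eta * grad_W1
```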
Backpropagation
• Easier if we use computational graphs, especially when we have complicated functions typical in deep neural networks
Figure from Andrej Karpathy
Backpropagation: a simple example
• f(x, y, z) = (x + y) · z, e.g. x = -2, y = 5, z = -4
• Want: ∂f/∂x, ∂f/∂y, ∂f/∂z
• Forward pass: q = x + y = 3, then f = q · z = -12
• Backward pass with the chain rule, working from the output toward the inputs:
  ∂f/∂f = 1; ∂f/∂z = q = 3; ∂f/∂q = z = -4
  ∂f/∂x = ∂f/∂q · ∂q/∂x = -4; ∂f/∂y = ∂f/∂q · ∂q/∂y = -4
• At each node, the gradient flowing back is the upstream gradient times the local gradient.
Andrej Karpathy (slides from Fei-Fei Li, Andrej Karpathy & Justin Johnson, Lecture 4, 13 Jan 2016)
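A short plain-Python sketch of the forward and backward pass through this graph with the example values; it simply carries out the chain-rule arithmetic above.

```python
# Forward pass through the graph f(x, y, z) = (x + y) * z
x, y, z = -2.0, 5.0, -4.0
q = x + y          # q = 3
f = q * z          # f = -12

# Backward pass: multiply each upstream gradient by the local gradient
df_df = 1.0
df_dz = q * df_df              # local gradient of f w.r.t. z is q   -> 3
df_dq = z * df_df              # local gradient of f w.r.t. q is z   -> -4
df_dx = 1.0 * df_dq            # dq/dx = 1                           -> -4
df_dy = 1.0 * df_dq            # dq/dy = 1                           -> -4

print(df_dx, df_dy, df_dz)     # -4.0 -4.0 3.0
```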
Gradients flow through each node in the graph
• In the forward pass, a node f receives input activations and produces an output activation.
• In the backward pass, the node receives the upstream gradient on its output, multiplies it by its "local gradient" (the derivative of its output with respect to each input), and passes the resulting gradients back along each input.
[Figure: a single node f, with activations flowing forward and gradients flowing backward]
Andrej Karpathy (slides from Fei-Fei Li, Andrej Karpathy & Justin Johnson, Lecture 4, 13 Jan 2016)
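One common way to implement this pattern is a small "gate" object with forward and backward methods. This is a hedged plain-Python sketch of the idea, not code from the lecture.

```python
class MultiplyGate:
    """A single node f(a, b) = a * b in a computational graph."""

    def forward(self, a, b):
        # Cache the input activations; each input's local gradient is the other input.
        self.a, self.b = a, b
        return a * b

    def backward(self, upstream):
        # Downstream gradient = upstream gradient * local gradient.
        grad_a = upstream * self.b
        grad_b = upstream * self.a
        return grad_a, grad_b

gate = MultiplyGate()
out = gate.forward(3.0, -4.0)          # forward activation: -12.0
grad_a, grad_b = gate.backward(1.0)    # gradients on the inputs: (-4.0, 3.0)
```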
Backpropagation: another example
Andrej Karpathy
Convolutional Neural Networks (CNN)
• Neural network with specialized connectivity structure
• Stack multiple stages of feature extractors
• Higher stages compute more global, more invariant, more abstract features
• Classification layer at the end
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86(11): 2278–2324, 1998.
Adapted from Rob Fergus
• Feed-forward feature extraction:
  1. Convolve input with learned filters
  2. Apply non-linearity
  3. Spatial pooling (downsample)
• Supervised training of convolutional filters by back-propagating classification error
Adapted from Lana Lazebnik
Convolutional Neural Networks (CNN)
[Pipeline: Input Image → Convolution (Learned) → Non-linearity → Spatial pooling → … → Output (class probs)]
Convolutions: More detail
[Figure: a 32x32x3 image, with width 32, height 32, depth 3]
Andrej Karpathy
Convolutions: More detail
[Figure: a 32x32x3 image and a 5x5x3 filter]
• Convolve the filter with the image, i.e. "slide over the image spatially, computing dot products"
Andrej Karpathy
Convolution Layer
[Figure: 32x32x3 image, 5x5x3 filter]
• 1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e. 5*5*3 = 75-dimensional dot product + bias)
Convolutions: More detail
Andrej Karpathy
Convolution Layer
[Figure: 32x32x3 image convolved with a 5x5x3 filter, producing a 28x28x1 activation map]
• Convolve (slide) over all spatial locations
Convolutions: More detail
Andrej Karpathy
Convolution Layer
[Figure: 32x32x3 image, 5x5x3 filters, two 28x28x1 activation maps]
• Convolve (slide) over all spatial locations
• Consider a second, green filter
Convolutions: More detail
Andrej Karpathy
Convolution Layer
• For example, if we had 6 5x5 filters, we'll get 6 separate activation maps:
[Figure: 32x32x3 image → six 28x28 activation maps]
• We stack these up to get a "new image" of size 28x28x6!
Convolutions: More detail
Andrej Karpathy
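A minimal NumPy sketch of this convolution layer (stride 1, no padding): six 5x5x3 filters slid over a 32x32x3 image to produce a 28x28x6 output. The explicit loops are for clarity rather than speed, and the random image and filters are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.normal(size=(32, 32, 3))        # 32x32x3 input image
filters = rng.normal(size=(6, 5, 5, 3))     # 6 filters, each 5x5x3
biases = np.zeros(6)

H, W, _ = image.shape
num_filters, fh, fw, _ = filters.shape
out_h, out_w = H - fh + 1, W - fw + 1        # 28 x 28 with stride 1, no padding
activation_maps = np.zeros((out_h, out_w, num_filters))

# Slide each filter over all spatial locations, taking 75-dimensional dot products + bias
for k in range(num_filters):
    for i in range(out_h):
        for j in range(out_w):
            chunk = image[i:i + fh, j:j + fw, :]          # 5x5x3 chunk of the image
            activation_maps[i, j, k] = np.sum(chunk * filters[k]) + biases[k]

print(activation_maps.shape)   # (28, 28, 6): the "new image" of size 28x28x6
```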
Preview: ConvNet is a sequence of Convolution Layers, interspersed with activation functions
[Figure: 32x32x3 input → CONV, ReLU (e.g. 6 5x5x3 filters) → 28x28x6]
Convolutions: More detail
Andrej Karpathy
Preview: ConvNet is a sequence of Convolutional Layers, interspersed with activation functions
[Figure: 32x32x3 input → CONV, ReLU (e.g. 6 5x5x3 filters) → 28x28x6 → CONV, ReLU (e.g. 10 5x5x6 filters) → 24x24x10 → CONV, ReLU → ….]
Convolutions: More detail
Andrej Karpathy
Preview:
[Figure: example ConvNet architecture]
Convolutions: More detail
Andrej Karpathy
Figure from http://www.mdpi.com/2072-4292/7/11/14680/htm
A Common Architecture: AlexNet
Case Study: VGGNet [Simonyan and Zisserman, 2014]
• Only 3x3 CONV stride 1, pad 1 and 2x2 MAX POOL stride 2
• 11.2% top-5 error in ILSVRC 2013 → 7.3% top-5 error (best model)
Andrej Karpathy
Case Study: GoogLeNet [Szegedy et al., 2014]
• Inception module
• ILSVRC 2014 winner (6.7% top-5 error)
Andrej Karpathy
Case Study: ResNet [He et al., 2015]
• ILSVRC 2015 winner (3.6% top-5 error)
• 2-3 weeks of training on an 8-GPU machine
• At runtime: faster than a VGGNet! (even though it has 8x more layers)
Slides from Kaiming He's presentation: https://www.youtube.com/watch?v=1PGLj-uKT1w
Andrej Karpathy
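ResNet's central idea is the residual block, where a stack of layers learns a residual F(x) that is added back to its input: y = F(x) + x. Below is a hedged PyTorch sketch of a basic residual block; the channel count and layer details are illustrative choices, not He et al.'s exact configuration.

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """y = F(x) + x, where F is two 3x3 convolutions with batch norm."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.relu(self.bn1(self.conv1(x)))
        residual = self.bn2(self.conv2(residual))
        return self.relu(residual + x)   # skip connection adds the input back

block = BasicResidualBlock(channels=64)
out = block(torch.randn(1, 64, 56, 56))  # output shape matches the input: (1, 64, 56, 56)
```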
Practical matters
Comments on training algorithm
• Not guaranteed to converge to zero training error, may converge to local optima or oscillate indefinitely.
• However, in practice, does converge to low error for many large networks on real data.
• Thousands of epochs (epoch = network sees all training data once) may be required, hours or days to train.
• To avoid local-minima problems, run several trials starting with different random weights (random restarts), and take results of the trial with the lowest training set error.
• May be hard to set the learning rate and to select the number of hidden units and layers.
• Neural networks had fallen out of fashion in the '90s and early 2000s; back with a new name and significantly improved performance (deep networks trained with dropout and lots of data).
Ray Mooney, Carlos Guestrin, Dhruv Batra
Over-training prevention
• Running too many epochs can result in over-fitting.
• Keep a hold-out validation set and test accuracy on it after every epoch. Stop training when additional epochs actually increase validation error.
[Figure: error vs. # training epochs: error on training data keeps decreasing, while error on test data eventually rises]
Adapted from Ray Mooney
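A minimal sketch of this early-stopping rule in plain Python. Here `train_one_epoch`, `validation_error`, and `model.copy()` are hypothetical stand-ins for your own training, evaluation, and snapshotting code.

```python
def train_with_early_stopping(model, train_data, val_data,
                              train_one_epoch, validation_error,
                              max_epochs=1000, patience=5):
    """Stop when validation error has not improved for `patience` epochs."""
    best_error = float("inf")
    best_weights = None
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        train_one_epoch(model, train_data)          # one pass over all training data
        err = validation_error(model, val_data)     # error on the hold-out validation set

        if err < best_error:
            best_error = err
            best_weights = model.copy()             # assumes the model can snapshot itself
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                               # additional epochs are increasing validation error

    return best_weights, best_error
```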
Training: Best practices
• Use mini-batch
• Use regularization
• Use cross-validation for your parameters
• Use ReLU or leaky ReLU, don't use sigmoid
• Center (subtract mean from) your data
• Learning rate: too high? too low?
• Use BatchNorm
Data Augmentation (Jittering)
• Create virtual training samples
  – Horizontal flip
  – Random crop
  – Color casting
  – Geometric distortion
Jia-bin Huang, Image: https://github.com/aleju/imgaug
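A minimal NumPy sketch of two of these jittering operations (horizontal flip and random crop) applied to an image array; the image and crop sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def horizontal_flip(image):
    # Reverse the width axis of an H x W x C image.
    return image[:, ::-1, :]

def random_crop(image, crop_h, crop_w):
    # Take a random crop_h x crop_w window from an H x W x C image.
    H, W, _ = image.shape
    top = rng.integers(0, H - crop_h + 1)
    left = rng.integers(0, W - crop_w + 1)
    return image[top:top + crop_h, left:left + crop_w, :]

image = rng.random(size=(32, 32, 3))
virtual_samples = [horizontal_flip(image), random_crop(image, 28, 28)]
```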
Regularization: Dropout
Dropout: A simple way to prevent neural networks from overfitting [Srivastava, JMLR 2014]
• Randomly turn off some neurons
• Allows individual neurons to independently be responsible for performance
Adapted from Jia-bin Huang
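A minimal NumPy sketch of (inverted) dropout applied to a layer's activations at training time; the keep probability and layer size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, keep_prob=0.5, train=True):
    """Randomly turn off neurons; scale at train time so nothing changes at test time."""
    if not train:
        return activations
    mask = rng.random(activations.shape) < keep_prob   # which neurons stay on
    return activations * mask / keep_prob              # "inverted" dropout scaling

h = rng.normal(size=(128,))        # hidden-layer activations
h_train = dropout(h, keep_prob=0.5, train=True)
h_test = dropout(h, train=False)   # identity at test time
```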
Transfer Learning
"You need a lot of data if you want to train/use CNNs"
Andrej Karpathy
Transfer Learning with CNNs
• The more weights you need to learn, the more data you need
• That's why with a deeper network, you need more data for training than for a shallower network
• One possible solution: set the lower layers to the already learned weights from another network, and learn the remaining layers on your own task
[Figure: 1. Train on ImageNet. 2. Small dataset: freeze the pretrained layers, train only the final classifier. 3. Medium dataset: finetuning; more data = retrain more of the network (or all of it), freezing the earliest layers and training the rest.]
Adapted from Andrej Karpathy
Transfer Learning with CNNs
Source: classification on ImageNet. Target: some other task/data.
[Figure: features in lower layers are more generic, features in higher layers are more specific]
• Very little data:
  – very similar dataset → use linear classifier on top layer
  – very different dataset → you're in trouble… try linear classifier from different stages
• Quite a lot of data:
  – very similar dataset → finetune a few layers
  – very different dataset → finetune a larger number of layers
Andrej Karpathy
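A hedged PyTorch sketch of the small-dataset case: start from a network pretrained on ImageNet, freeze its weights, and train only a new linear classifier on top. The specific model (resnet18), class count, and optimizer settings are illustrative choices (assuming a recent torchvision), not the lecture's.

```python
import torch
import torch.nn as nn
import torchvision.models as models

num_classes = 10                                   # classes in your own (small) target dataset

# 1. Start from weights already learned on ImageNet
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# 2. Freeze the pretrained layers
for param in model.parameters():
    param.requires_grad = False

# 3. Replace the final layer and train only it (a linear classifier on the top-layer features)
model.fc = nn.Linear(model.fc.in_features, num_classes)
optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.01)

# For a medium-sized dataset, unfreeze some of the later layers as well and finetune them too.
```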
Summary
• We use deep neural networks because of their strong performance in practice
• Convolutional neural networks (CNN)
  – Convolution, non-linearity, max pooling
• Training deep neural nets
  – We need an objective function that measures and guides us towards good performance
  – We need a way to minimize the loss function: stochastic gradient descent
  – We need backpropagation to propagate error through all layers and change their weights
• Practices for preventing overfitting
  – Dropout; BatchNorm; data augmentation; transfer learning
Questions?
See you Friday!