

Convolutional Neural Networks and Supervised Learning

Eilif Solberg

August 30, 2018


Outline

Convolutional Architectures
- Convolutional neural networks

Training
- Loss
- Optimization
- Regularization
- Hyperparameter search

Architecture search
- NAS1
- NAS2

Bibliography


Convolutional Architectures


Template matching

Figure: Illustration from http://pixuate.com/technology/template-matching/

1. Try to match the template at each location by "sliding a window" over the image

2. Threshold for detection

For 2D objects this is kind of possible, but difficult.
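To make the sliding-window idea concrete, here is a minimal NumPy sketch (not from the slides); the template, test image and detection threshold are made up for illustration, and the matching score is a plain normalized cross-correlation.

```python
import numpy as np

def match_template(image, template, threshold=0.8):
    """Slide the template over the image (step 1) and keep the locations where the
    normalized cross-correlation score exceeds the threshold (step 2)."""
    ih, iw = image.shape
    th, tw = template.shape
    t = (template - template.mean()) / (template.std() + 1e-8)
    detections = []
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            patch = image[y:y + th, x:x + tw]
            p = (patch - patch.mean()) / (patch.std() + 1e-8)
            score = float((p * t).mean())   # correlation between patch and template
            if score > threshold:
                detections.append((y, x, score))
    return detections

# Toy example: plant the template in an otherwise empty image and find it again.
template = np.array([[0., 1., 0.],
                     [1., 2., 1.],
                     [0., 1., 0.]])
image = np.zeros((32, 32))
image[10:13, 20:23] = template
print(match_template(image, template))      # detection at (10, 20) with score ~1.0
```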


Convolution

Which filter produces the activation map on the right?


Convolutional layer

-> Glorified template matching

- Many templates (a.k.a. output filters)

- We learn the templates: the weights are the templates

- Intermediate detection results are only a means to an end
  - treat them as features, which we again match new templates against

- Starting from the second layer we have "nonlinear filters"


Hyperparameters of convolutional layer

1. Kernel height and width - template size

2. Stride - skips between template matches

3. Dilation rate
   - "Holes" in the template where we don't care
   - Larger field-of-view without more weights

4. Number of output filters - number of templates

5. Padding - expand the image, typically with zeros

Figure: Image from http://neuralnetworksanddeeplearning.com/
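As a concrete illustration (a sketch assuming PyTorch; the specific numbers are arbitrary), each of the five hyperparameters maps directly onto a constructor argument of a 2-D convolution layer:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(
    in_channels=3,        # depth of the input, e.g. an RGB image
    out_channels=64,      # 4. number of output filters (number of templates)
    kernel_size=(3, 5),   # 1. kernel height and width (template size)
    stride=2,             # 2. skips between template matches
    dilation=2,           # 3. "holes" in the template -> larger field of view, same weight count
    padding=4,            # 5. expand the image, here with zeros
)

x = torch.randn(1, 3, 32, 32)   # one 32x32 RGB image
print(conv(x).shape)            # the output spatial size follows from the settings above
```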


Detector / activation function

- Non-saturating activation functions such as ReLU and leaky ReLU dominate

Figure: Sigmoid function

Figure: Tanh function

Figure: ReLU function
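For reference, the plotted functions are one-liners; this small NumPy sketch (added here, not in the slides) also includes leaky ReLU and notes where each one saturates:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))      # saturates for large |x| -> vanishing gradients

def tanh(x):
    return np.tanh(x)                     # zero-centered, but also saturates

def relu(x):
    return np.maximum(0.0, x)             # non-saturating for x > 0

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)  # small negative slope instead of a hard zero

x = np.linspace(-5.0, 5.0, 5)
for f in (sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, np.round(f(x), 3))
```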


Basic CNN architecture for image classification

Image -> [Conv -> ReLU] x N -> Fully Connected -> Softmax

- Increase the filter depth when using stride

Improve with:

- Batch normalization
- Skip connections à la ResNet or DenseNet
- No fully connected layer: average pool the predictions instead
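A minimal sketch of this recipe (assuming PyTorch; the widths and depth are arbitrary), including the suggested improvements of batch normalization and global average pooling in place of a large fully connected head:

```python
import torch
import torch.nn as nn

def make_cnn(num_classes=10, widths=(32, 64, 128)):
    """Image -> [Conv -> BatchNorm -> ReLU] x N -> global average pool -> Linear.
    Strided convs halve the spatial size while the filter depth increases."""
    layers, in_ch = [], 3
    for out_ch in widths:
        layers += [
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        ]
        in_ch = out_ch
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(in_ch, num_classes)]
    return nn.Sequential(*layers)   # softmax is usually folded into the loss at training time

model = make_cnn()
logits = model(torch.randn(8, 3, 32, 32))   # batch of 8 32x32 RGB images
print(logits.shape)                          # (8, 10)
```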


Training


How do we fit the model?

How do we find the parameters θ for our network?


Supervised learning

- Training data comes as (X, Y) pairs, where Y is the target

- We want to learn f(x) ≈ p(y|x), the conditional distribution of Y given X

- Define a differentiable surrogate loss function, e.g. for a single sample:

l(f(X), Y) = (f(X) − Y)²        (regression)        (1)

l(f(X), Y) = −∑_c Y_c log f(X)_c        (classification)        (2)
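Written out in code, the two surrogate losses for a single sample look like this (a sketch; f_x is assumed to be the model output, a scalar for regression and a probability vector for classification):

```python
import numpy as np

def squared_error(f_x, y):
    """Regression loss (1): (f(X) - Y)^2."""
    return float((f_x - y) ** 2)

def cross_entropy(f_x, y_onehot):
    """Classification loss (2): -sum_c Y_c * log f(X)_c, with f(X) a probability vector."""
    return float(-np.sum(y_onehot * np.log(f_x + 1e-12)))

print(squared_error(2.3, 2.0))                                        # 0.09
print(cross_entropy(np.array([0.7, 0.2, 0.1]), np.array([1, 0, 0])))  # -log(0.7) ≈ 0.357
```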


Gradient

- The direction in which the function increases the most

Figure: Gradient of the function f(x, y) = x / e^(x² + y²). [By Vivekj78, CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0), from Wikimedia Commons]


Backpropagation

- An efficient bookkeeping scheme for applying the chain rule of differentiation

- Biologically implausible?
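The "bookkeeping" amounts to storing the intermediate results of the forward pass and reusing them when the chain rule is unrolled backwards. A tiny sketch for a one-hidden-layer ReLU network (illustrative only; real frameworks derive this automatically from a computation graph):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))             # batch of 4 inputs
y = rng.normal(size=(4, 1))             # regression targets
W1, W2 = rng.normal(size=(3, 5)), rng.normal(size=(5, 1))

# Forward pass: keep the intermediate results we will need again.
h_pre = x @ W1                           # pre-activation
h = np.maximum(0.0, h_pre)               # ReLU
pred = h @ W2
loss = np.mean((pred - y) ** 2)

# Backward pass: chain rule, reusing the stored intermediates.
d_pred = 2.0 * (pred - y) / len(y)       # dL/dpred
d_W2 = h.T @ d_pred                      # dL/dW2
d_h = d_pred @ W2.T                      # dL/dh
d_hpre = d_h * (h_pre > 0)               # back through the ReLU
d_W1 = x.T @ d_hpre                      # dL/dW1

print(loss, d_W1.shape, d_W2.shape)
```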


(Stochastic) gradient descent

Taking steps in the opposite direction of the gradient.

Figure: [By Vivekj78, CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0), from Wikimedia Commons]

- The full gradient is too expensive / not necessary:

∑_{i=1}^{N} ∇_θ l(f(X_i), Y_i) ≈ ∑_{i=1}^{n} ∇_θ l(f(X_{P(i)}), Y_{P(i)})        (3)

for a random permutation P (up to a scaling factor of N/n).

Many different extensions to standard SGD: SGD with momentum, RMSprop, Adam.
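One epoch of plain mini-batch SGD follows equation (3): draw a random permutation, then step against the average gradient of each mini-batch. The gradient function and the linear-regression toy problem below are placeholders for illustration:

```python
import numpy as np

def sgd_epoch(theta, X, Y, grad_fn, batch_size=32, lr=0.1):
    """One pass over the data: random permutation P, then steps on mini-batches.
    `grad_fn(theta, X_batch, Y_batch)` is assumed to return the average gradient
    of the loss over the batch."""
    P = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = P[start:start + batch_size]
        theta = theta - lr * grad_fn(theta, X[idx], Y[idx])   # step against the gradient
    return theta

# Toy usage: linear regression, where the gradient has a closed form.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_theta = np.array([1.0, -2.0, 0.5])
Y = X @ true_theta + 0.01 * rng.normal(size=1000)

def grad_fn(theta, Xb, Yb):
    return 2.0 * Xb.T @ (Xb @ theta - Yb) / len(Xb)

theta = np.zeros(3)
for _ in range(20):
    theta = sgd_epoch(theta, X, Y, grad_fn)
print(theta)     # approaches [1.0, -2.0, 0.5]
```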


Network, loss, optimization

- A weight penalty added to the loss term, usually a squared L2 norm applied uniformly to all parameters:

l(θ) + λ‖θ‖₂²        (4)

- Dropout

- Batch normalization
  - At the intersection of optimization and generalization
  - Your best friend and your worst enemy
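In practice the penalty in (4) is either added to the loss by hand or delegated to the optimizer. A sketch assuming PyTorch, with a placeholder model and an arbitrary λ:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)                    # placeholder network
criterion = nn.CrossEntropyLoss()
lam = 1e-4                                  # λ in l(θ) + λ‖θ‖²

x, y = torch.randn(16, 10), torch.randint(0, 2, (16,))

# Option 1: add λ‖θ‖² to the loss explicitly, uniformly over all parameters.
loss = criterion(model(x), y)
loss = loss + lam * sum((p ** 2).sum() for p in model.parameters())
loss.backward()

# Option 2: let the optimizer apply weight decay (equivalent up to a constant factor).
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=lam)
```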


More on batch normalization

For a tensor of shape [batch_size × height × width × depth], normalize the "template matching scores" for each template d by

μ_d ← (1 / (N·H·W)) ∑_{i=1}^{N} ∑_{h=1}^{H} ∑_{w=1}^{W} x_{i,h,w,d}        (5)

σ²_d ← (1 / (N·H·W)) ∑_{i=1}^{N} ∑_{h=1}^{H} ∑_{w=1}^{W} (x_{i,h,w,d} − μ_d)²        (6)

x̂_{i,h,w,d} ← (x_{i,h,w,d} − μ_d) / √(σ²_d + ε)        (7)

y_{i,h,w,d} ← γ · x̂_{i,h,w,d} + β        (8)

where N, H and W denote the batch size, height and width.

- Intuition: "is this template/feature more present than usual or not?"

- During inference we use stored values for μ_d and σ_d.
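Equations (5)-(8) translate almost line for line into code. This NumPy sketch covers training mode only; the stored values used at inference are omitted, and γ, β are one scalar per channel:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """x has shape [N, H, W, D]; normalize each channel d with equations (5)-(8)."""
    mu = x.mean(axis=(0, 1, 2))              # (5): per-channel mean over N*H*W positions
    var = x.var(axis=(0, 1, 2))              # (6): per-channel variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # (7): normalize
    return gamma * x_hat + beta              # (8): learned scale and shift

x = np.random.randn(8, 16, 16, 32) * 3.0 + 1.0
y = batch_norm(x, gamma=np.ones(32), beta=np.zeros(32))
print(y.mean(axis=(0, 1, 2))[:3], y.std(axis=(0, 1, 2))[:3])   # ≈ 0 and ≈ 1 per channel
```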


Data augmentation

Idea: apply a random transformation to X that does not alter Y.

- Normally you would like the result X′ to be plausible, i.e. it could have been a sample from the distribution of interest

- Which transformations you may use is application-dependent

Image data

- Horizontal mirroring (an issue for objects that are not left/right symmetric)
- Random crop
- Scale
- Aspect ratio
- Lighting etc.

Text data

- Synonym insertion
- Back-translation: translate and translate back with e.g. Google Translate!
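A sketch of label-preserving image augmentations (mirroring, random crop, lighting) for an [H, W, C] array; all ranges and probabilities are illustrative choices:

```python
import numpy as np

def augment(image, rng, crop=28):
    """Random label-preserving transforms for an [H, W, C] image in [0, 1]:
    horizontal mirroring, random crop, and a lighting (brightness) jitter."""
    if rng.random() < 0.5:                      # horizontal mirroring
        image = image[:, ::-1, :]
    h, w, _ = image.shape
    top = rng.integers(0, h - crop + 1)         # random crop
    left = rng.integers(0, w - crop + 1)
    image = image[top:top + crop, left:left + crop, :]
    image = image * rng.uniform(0.8, 1.2)       # simple lighting change
    return np.clip(image, 0.0, 1.0)

rng = np.random.default_rng(0)
x = rng.random((32, 32, 3))                     # dummy image
print(augment(x, rng).shape)                    # (28, 28, 3); the label Y is unchanged
```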


Hyperparameters to search

- Learning rate (and learning rate schedule)

- Regularization parameters: L2 penalty, (dropout)


Search strategies

- Random search rather than grid search

- Use a log scale when appropriate

- Be careful if the best values lie on the border of the search range

- You may refine the search afterwards
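A sketch of these strategies together: random (not grid) search, sampling the learning rate and L2 strength on a log scale, with a note on refining the ranges. `train_and_evaluate` stands in for an actual training run and is a placeholder here:

```python
import numpy as np

def sample_config(rng):
    """Draw one hyperparameter configuration; rates and penalties on a log scale."""
    return {
        "learning_rate": 10 ** rng.uniform(-4, -1),   # 1e-4 .. 1e-1
        "l2": 10 ** rng.uniform(-6, -2),              # 1e-6 .. 1e-2
    }

def random_search(train_and_evaluate, n_trials=20, seed=0):
    rng = np.random.default_rng(seed)
    scored = [(train_and_evaluate(cfg), cfg)
              for cfg in (sample_config(rng) for _ in range(n_trials))]
    best_score, best_cfg = max(scored, key=lambda t: t[0])
    # If the best values sit on the border of the sampling ranges, widen the ranges and
    # re-run; otherwise narrow the ranges around the best configuration and refine.
    return best_cfg, best_score

# `train_and_evaluate` would train a model with the config and return validation accuracy;
# the lambda below is only a stand-in so the sketch runs end to end.
print(random_search(lambda cfg: -abs(np.log10(cfg["learning_rate"]) + 2.5)))
```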


Architecture search


Architecture search

1. Define the search space.

2. Decide on the optimization algorithm: random search, reinforcement learning, genetic algorithms.


Neural architecture search

Figure: An overview of Neural Architecture Search. Figure and caption from [?].


NAS1 - search space

Fixed structure:

- The architecture is a series of layers of the form

conv2D(FH, FW, N) −→ batch-norm −→ ReLU

Degrees of freedom:

- Parameters of the conv layer: filter height, filter width and number of output filters

- The input layers to each conv layer
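For intuition, sampling one architecture description from a search space of this shape could look as follows; the candidate filter sizes, filter counts and skip-connection rule are illustrative assumptions, not the exact choices or controller of the NAS paper:

```python
import numpy as np

def sample_architecture(num_layers, rng):
    """Randomly sample one architecture: for each layer, pick the conv parameters
    (FH, FW, N) and the set of earlier layers whose outputs feed into it.
    Each layer is conv2D(FH, FW, N) -> batch-norm -> ReLU."""
    fh_choices = fw_choices = [1, 3, 5, 7]
    n_choices = [24, 36, 48, 64]
    layers = []
    for i in range(num_layers):
        inputs = [j for j in range(i) if rng.random() < 0.5]
        if i > 0 and not inputs:
            inputs = [i - 1]                      # always connect to at least one earlier layer
        layers.append({
            "FH": int(rng.choice(fh_choices)),    # filter height
            "FW": int(rng.choice(fw_choices)),    # filter width
            "N": int(rng.choice(n_choices)),      # number of output filters
            "inputs": inputs,                     # concatenated in depth if more than one
        })
    return layers

rng = np.random.default_rng(0)
for layer in sample_architecture(4, rng):
    print(layer)
```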


NAS1 - discovered architecture

Figure: FH is the filter height, FW is the filter width and N is the number of filters. If one layer has many input layers, then all input layers are concatenated in the depth dimension. Figure from [?].


NAS2 - search space

Fixed structure:

Figure: Architecture for CIFAR-10 and ImageNet. Figure from [?].

Degrees of freedom:

- Some freedom in the normal cell and the reduction cell, as we shall see shortly


NAS2 - discovered convolutional cells

[Figure: diagrams of the Normal Cell (left) and Reduction Cell (right), built from separable convolutions (3x3, 5x5, 7x7), average and max pooling (3x3) and identity operations, combined through add nodes and a final concat over the hidden states h_{i−1}, h_i, h_{i+1}.]

Figure: NASNet-A identified with CIFAR-10. Figure and caption from [?].


NAS2 - Performance (computational cost)

[Plot: accuracy (precision @ 1) versus # Mult-Add operations (millions) for VGG-16, Inception-v1/v2/v3/v4, Inception-ResNet-v2, ResNet-152, ResNeXt-101, Xception, MobileNet, ShuffleNet, PolyNet, DPN-131, SENet and several NASNet-A variants.]

Figure: Performance on ILSVRC12 as a function of the number of floating-point multiply-add operations needed to process an image. Figure from [?].


NAS2 - Performance (# parameters)

[Plot: accuracy (precision @ 1) versus # parameters (millions) for the same set of models.]

Figure: Performance on ILSVRC12 as a function of the number of parameters. Figure from [?].


Bibliography


Bibliography I