
  • CEng 783 – Deep Learning

    Week 8 – Applications of Convolutional Neural Networks

    Fall 2017

    Emre Akbas

  • 3

    Today

    A brief note on normalization

    Applications of ConvNets
    – Image classification (ResNets, ResNeXt, DenseNet)
    – Object detection
    – Artistic style transfer
    – Image segmentation (FCNs, “deconvolution”)
    – Visualizing ConvNet classifications
    – ConvNets for NLP

  • 4

    Normalization

    Remember batch norm? What happens as m (the mini-batch size) gets smaller? Also, what do you do at test time? (See the sketch below.)
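    To make the two questions concrete, here is a minimal batch-norm sketch (plain PyTorch, illustrative only; not the lecture's own code). The statistics are computed over the m examples of the mini-batch, and running averages are kept for test time:

```python
import torch

def batch_norm(x, gamma, beta, running_mean, running_var,
               training=True, momentum=0.1, eps=1e-5):
    # x: (m, d) mini-batch of m examples with d features each.
    if training:
        # Normalize with the statistics of the current mini-batch.
        # As m gets smaller, these estimates get noisier and noisier.
        mean = x.mean(dim=0)
        var = x.var(dim=0, unbiased=False)
        # Keep exponential moving averages for use at test time.
        running_mean.mul_(1 - momentum).add_(momentum * mean)
        running_var.mul_(1 - momentum).add_(momentum * var)
    else:
        # At test time there may be no batch at all (m = 1), so we
        # fall back on the running statistics collected in training.
        mean, var = running_mean, running_var
    x_hat = (x - mean) / torch.sqrt(var + eps)
    return gamma * x_hat + beta   # learned per-feature scale and shift
```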

  • 5 [From “Group normalization”, Wu et al. ECCV 2018]

  • 6

    [From “Group normalization”, Wu et al. ECCV 2018]
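    Group norm sidesteps the small-m problem by computing statistics over groups of channels within each example, so nothing depends on the batch size. A minimal sketch of the computation (NCHW layout assumed; illustrative, not the paper's reference code):

```python
import torch

def group_norm(x, gamma, beta, G=32, eps=1e-5):
    # x: (N, C, H, W). Statistics are computed per sample over
    # groups of C // G channels, so they are independent of the
    # batch size N -- unlike batch norm.
    N, C, H, W = x.shape
    x = x.view(N, G, C // G, H, W)
    mean = x.mean(dim=(2, 3, 4), keepdim=True)
    var = x.var(dim=(2, 3, 4), keepdim=True, unbiased=False)
    x = (x - mean) / torch.sqrt(var + eps)
    x = x.view(N, C, H, W)
    # gamma, beta: per-channel affine parameters of shape (1, C, 1, 1).
    return gamma * x + beta
```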

  • 7

    Applications of ConvNets

  • 8

    Image classification

  • 9

    Image Classification

    ILSVRC benchmark/challenge
    – ImageNet dataset: 1.2 million images, 1000 categories
    – Held since 2010
    – The task: given an image, make 5 predictions for the dominant object in the image. If one of them is correct, it is counted as a success.

  • 10

    Image Classification (cont’d)

    Newer datasets have appeared since then, e.g. Google’s Open Images Dataset [https://storage.googleapis.com/openimages/web/index.html]
    ● 9 million images, 6000 categories
    ● annotated with image-level labels, object bounding boxes, object segmentation masks, and visual relationships.

  • 11

    The second success story

    [Slide from the 1st week]

    Source: G. Hinton’s talk at the Royal Society, May 22, 2015. https://youtu.be/izrG86jycck

  • 12

    Top-5 error rate over time
    ● 2012: AlexNet 16.5%
    ● 2013: ZF 11.7%
    ● 2014: VGG 7.3%
    ● 2014: GoogLeNet 6.7%
    ● 2015: ResNet 3.6%
    ● Aug 2016: GoogLeNet-v4 3.1%

    Human error rate: 5.1% [http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/]

  • 13

    AlexNet [Krizhevsky et al. NIPS 2012]

    5 convolutional layers + 3 fully-connected layers

    Each convolutional layer consists of: convolution + ReLU + normalization + max-pooling (sketched below)
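    A sketch of one such layer in PyTorch (the sizes are those of AlexNet's first layer; treat the exact hyperparameters as illustrative):

```python
import torch.nn as nn

# One AlexNet-style convolutional layer:
# convolution + ReLU + local response normalization + max-pooling.
conv_block = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4),
    nn.ReLU(inplace=True),
    nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
    nn.MaxPool2d(kernel_size=3, stride=2),
)
```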

  • 14

    Top-5 error rate over time
    ● 2012: AlexNet 16.5% [Krizhevsky et al. (2012)]
    ● 2013: ZF 11.7% [Zeiler & Fergus (2014)]
    ● 2014: VGG 7.3%
    ● 2014: GoogLeNet 6.7%
    ● 2015: ResNet 3.6%
    ● Aug 2016: GoogLeNet-v4 3.1%

    ZF net: “It was an improvement on AlexNet by tweaking the architecture hyperparameters, in particular by expanding the size of the middle convolutional layers and making the stride and filter size on the first layer smaller.” [Source: http://cs231n.github.io/convolutional-networks/#case]

  • 15

    Top-5 error rate over time
    ● 2012: AlexNet 16.5% [Krizhevsky et al. (2012)]
    ● 2013: ZF 11.7% [Zeiler & Fergus (2014)]
    ● 2014: VGG 7.3% [Simonyan & Zisserman (2014)]
    ● 2014: GoogLeNet 6.7%
    ● 2015: ResNet 3.6%
    ● Aug 2016: GoogLeNet-v4 3.1%

    VGG: “Main contribution: depth is critical. They used 16 layers. Extremely homogeneous architecture: only 3x3 convolutions and 2x2 pooling. But, very expensive to evaluate and requires more memory.” [Source: http://cs231n.github.io/convolutional-networks/#case]
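    To see what this homogeneity means in code, here is a sketch of one VGG “stage” (channel counts are illustrative; the real configurations use 64, 128, 256, 512, 512):

```python
import torch.nn as nn

def vgg_stage(in_ch, out_ch, num_convs):
    """A VGG stage: num_convs 3x3 convolutions (stride 1, padding 1,
    so spatial size is preserved), each followed by ReLU, then a 2x2
    max-pool that halves the resolution."""
    layers = []
    for i in range(num_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                             kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

stage1 = vgg_stage(3, 64, num_convs=2)   # e.g. the first stage of VGG16
```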

  • 16

    VGG or VGGNet (Visual Geometry Group at Oxford)

    [Simonyan & Zisserman, 2014]

    [Figure by D. Frossard: https://www.cs.toronto.edu/~frossard/post/vgg16/]

    Is the network in the figure a VGG16 or a VGG19?

  • 17

    Top-5 error rate over time
    ● 2012: AlexNet 16.5% [Krizhevsky et al. (2012)]
    ● 2013: ZF 11.7% [Zeiler & Fergus (2014)]
    ● 2014: VGG 7.3% [Simonyan & Zisserman (2014)]
    ● 2014: GoogLeNet 6.7% [Szegedy et al. (2015)]
    ● 2015: ResNet 3.6%
    ● Aug 2016: GoogLeNet-v4 3.1%

    GoogLeNet: “Main contribution: Inception Module that dramatically reduced the number of parameters in the network (4M, compared to AlexNet with 60M). Increased # layers to 22. Uses Average Pooling instead of Fully Connected layers at the top of the ConvNet, eliminating a large amount of parameters that do not seem to matter much.” [Source: http://cs231n.github.io/convolutional-networks/#case]

  • 18

    Inception module

    Multiscale processing + a wider network [Figure from Szegedy et al. (2015)]

    But very costly! The solution is to reduce dimensionality (next slide).

  • 19

    Inception module

    Dimensionality is reduced using 1x1 convolutions (see the sketch below).
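    A sketch of the module with the 1x1 reductions in place (channel counts are those of the paper's first module, “inception 3a”; ReLUs omitted for brevity):

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Each branch first shrinks depth with a cheap 1x1 convolution,
    then the branch outputs are concatenated along the channel dim."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)       # 1x1
        self.b2 = nn.Sequential(                            # 1x1 -> 3x3
            nn.Conv2d(in_ch, 96, kernel_size=1),
            nn.Conv2d(96, 128, kernel_size=3, padding=1))
        self.b3 = nn.Sequential(                            # 1x1 -> 5x5
            nn.Conv2d(in_ch, 16, kernel_size=1),
            nn.Conv2d(16, 32, kernel_size=5, padding=2))
        self.b4 = nn.Sequential(                            # pool -> 1x1
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, kernel_size=1))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x),
                          self.b3(x), self.b4(x)], dim=1)
```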

  • 20

    Top-5 error rate over time
    ● 2012: AlexNet 16.5% [Krizhevsky et al. (2012)]
    ● 2013: ZF 11.7% [Zeiler & Fergus (2014)]
    ● 2014: VGG 7.3% [Simonyan & Zisserman (2014)]
    ● 2014: GoogLeNet 6.7% [Szegedy et al. (2015)]
    ● 2015: ResNet 3.6%
    ● 2016: GoogLeNet-v4 3.1% [Szegedy et al. (2016)]

    v4 of Google’s Inception network was the best as of Fall 2016. It uses better-crafted inception modules + residual connections.

  • 21

    Top-5 error rate over time
    ● 2012: AlexNet 16.5% [Krizhevsky et al. (2012)]
    ● 2013: ZF 11.7% [Zeiler & Fergus (2014)]
    ● 2014: VGG 7.3% [Simonyan & Zisserman (2014)]
    ● 2014: GoogLeNet 6.7% [Szegedy et al. (2015)]
    ● 2015: ResNet 3.6% [He et al. (2015)]
    ● 2016: GoogLeNet-v4 3.1% [Szegedy et al. (2016)]

    The last ILSVRC was held in 2017; the winning top-5 error rate was 2.3%.

  • 22

    Slide from Kaiming He's talk at ICCV 2015 ImageNet and COCO joint workshop

  • 23

    Is learning better networks as simple as stacking more layers?

    Slide from Kaiming He's talk at ICCV 2015 ImageNet and COCO joint workshop

    http://image-net.org/challenges/talks/ilsvrc2015_deep_residual_learning_kaiminghe.pdf

  • 24

    Slide from Kaiming He's talk at ICCV 2015 ImageNet and COCO joint workshop

    http://image-net.org/challenges/talks/ilsvrc2015_deep_residual_learning_kaiminghe.pdf

  • 25

    Plain network vs. residual network

    F(x) = W_2 σ(W_1 x)

    http://image-net.org/challenges/talks/ilsvrc2015_deep_residual_learning_kaiminghe.pdf

  • 26

    Plain network vs. residual network

    F(x) = W_2 σ(W_1 x)

    Skip connection: the block outputs H(x) = F(x) + x (sketched below)
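    A minimal residual block in PyTorch (identity shortcut only; the real networks also use batch norm and projection shortcuts when shapes change):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """The stacked layers learn F(x) = W2 σ(W1 x); the skip
    connection adds x back, so the block outputs H(x) = F(x) + x."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        f = self.conv2(self.relu(self.conv1(x)))  # F(x)
        return self.relu(f + x)                   # H(x) = F(x) + x
```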

  • 27

    Slide from Kaiming He's talk at ICCV 2015 ImageNet and COCO joint workshop

  • 28

    Slide from Kaiming He's talk at ICCV 2015 ImageNet and COCO joint workshop

    http://image-net.org/challenges/talks/ilsvrc2015_deep_residual_learning_kaiminghe.pdf

  • 29

    Think about how the skip connection helps with the “vanishing gradients” problem. (Hint: since H(x) = F(x) + x, the gradient ∂H/∂x = ∂F/∂x + I contains an identity term, so gradients can flow backwards through the shortcut unattenuated.)

    http://image-net.org/challenges/talks/ilsvrc2015_deep_residual_learning_kaiminghe.pdf

  • 31

    ResNets have had huge impact.

    “Deep residual learning for image recognition”

    K He, X Zhang, S Ren, J Sun

    CVPR 2016

    More than 32K citations (source: Google Scholar).

    ResNets have inspired other architectures.

  • 32

    ResNeXt [Xie et al. CVPR 2017] “Aggregated Residual Transformations for Deep Neural Networks”

  • 33

    ResNeXt [Xie et al. CVPR 2017] “Aggregated Residual Transformations for Deep Neural Networks”

  • 34

    DenseNet [Huang et al. CVPR 2017] “Densely Connected Convolutional Networks”

    ● Traditional CNNs with L layers have L connections.

    ● DenseNets have L(L+1)/2 direct connections.

  • 35

    DenseNet [Huang et al. CVPR 2017] “Densely Connected Convolutional Networks”

    ● Traditional CNNs with L layers have L connections.

    ● DenseNets have L(L+1)/2 direct connections.

    ● For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers (sketched below).

    ● DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters.
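    A sketch of a dense block showing the concatenation pattern (plain 3x3 convolutions only; the real networks use BN-ReLU-Conv ordering and 1x1 bottlenecks, omitted here):

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Layer i sees the concatenation of the block input and the
    feature-maps of all preceding layers; its own growth_rate new
    feature-maps are appended for all later layers."""
    def __init__(self, in_ch, growth_rate=12, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Conv2d(in_ch + i * growth_rate, growth_rate,
                      kernel_size=3, padding=1)
            for i in range(num_layers)])

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))
            features.append(out)
        return torch.cat(features, dim=1)
```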

  • 36

    DenseNet [Huang et al. CVPR 2017]

    [Results figures: on CIFAR-10 and on ImageNet]

  • 37

    Object detection

  • 38

    Object detection

    Task: given an image and an object class (e.g. “aeroplane”), find its instance(s).

    [Figure: an example image and the desired detection result]

  • 39

    Colab Notebook on basic object detection:

    http://bit.do/basicod

  • 40

    The ConvNet approach to object detection

    Pipeline: image → object proposals/candidates → ConvNet on each proposal → estimated class of the proposal

    Object proposals or candidates:
    ● Generic (class-independent)
    ● Typically around 1000s (much larger than the # of object instances in the image, but much smaller than the # of total sliding windows in the image)

    A minimal sketch of this pipeline is given below.
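    A hypothetical sketch of this loop in the R-CNN spirit. Note that `propose_boxes` and `classifier` are placeholder names, not library functions: the former stands in for a real proposal method (e.g. selective search), the latter for a trained classification ConvNet:

```python
import torch
import torch.nn.functional as F

def detect(image, propose_boxes, classifier, crop_size=224):
    """image: a (C, H, W) tensor. Each proposal is cropped, warped
    to a fixed size, and classified independently by the ConvNet."""
    detections = []
    with torch.no_grad():
        for (x1, y1, x2, y2) in propose_boxes(image):
            crop = image[:, y1:y2, x1:x2].unsqueeze(0)     # (1, C, h, w)
            crop = F.interpolate(crop, size=(crop_size, crop_size),
                                 mode='bilinear', align_corners=False)
            scores = classifier(crop)                      # (1, num_classes)
            cls = scores.argmax(dim=1).item()
            detections.append(((x1, y1, x2, y2), cls))
    return detections
```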
