Compacting ConvNets for end to end Learning

Jose M. Alvarez. Joint work with Lars Petersson, Hao Zhou, Fatih Porikli.

NICTA Copyright 2012. From imagination to impact.


Page 1: Compacting ConvNets for end to end Learning (juxi.net/workshop/deeplearning-applications-vision...)

Compacting ConvNets

for end to end Learning

Jose M. Alvarez

Joint work with Lars Petersson, Hao Zhou, Fatih Porikli.

Page 2:

Success of CNN

Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012.

Image Classification

Page 3:

Success of CNN

From Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv:1506.01497.

Object Detection

Page 4:

Success of CNN

Jifeng Dai, Kaiming He, Jian Sun. BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation. arXiv:1503.01640.

Semantic Segmentation

Page 5:

Success of CNN

Andrej Karpathy, Li Fei-Fei. Deep Visual-Semantic Alignments for Generating Image Descriptions. CVPR 2015.

Image Captioning

Video classification …

Page 6:

Key of success

• Better training algorithms

– Batch normalization

– Initializations

– Momentum
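The momentum update named above can be sketched in a few lines of NumPy. This is a generic illustration, not code from the talk; `sgd_momentum_step` and the toy quadratic loss are hypothetical names chosen for the example.

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr=0.01, mu=0.9):
    """One SGD-with-momentum step: v is a decaying running sum of
    past gradients, and w moves along v instead of the raw gradient."""
    v = mu * v - lr * grad
    w = w + v
    return w, v

# Toy quadratic loss L(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(200):
    w, v = sgd_momentum_step(w, v, grad=w)
print(np.linalg.norm(w))  # close to 0: the iterates converge
```

The velocity term damps oscillations across steep loss directions and accelerates progress along shallow ones, which is why it helps train deep nets.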

Page 7:

Key of success

• Better training algorithms

• Large amount of data / labels

Page 8:

Key of success

• Better training algorithms

• Large amount of data / labels

• Hardware / Storage

– GPU, parallel systems

Chart: GPU memory (in GB) for GTX-580, Titan Black ('14) and Titan X ('15).

Page 9:

Key of success

• Better training algorithms

• Large amount of data / labels

• Hardware / Storage

• Larger community of researchers

Page 10:

Key of success

• Enabled larger networks

Chart: number of parameters (in millions) for LeNet-5, AlexNet and VGGNet-16.


Page 14:

Challenges

Embedded devices with limited resources / power

2014: Jetson TK1. 2015/16: Jetson TX1.

Page 15:

Challenges

Embedded devices with limited resources / power

- Memory is a limiting factor
- Real-time operation

Page 16:

Computational Cost

The forward pass is time consuming (AlexNet).

Page 17:

Computational Cost

Memory bottleneck (AlexNet)

Page 18:

Computational Cost

Memory bottleneck (VGGNet-16)

conv3-64 x 2 : 38,720
conv3-128 x 2 : 221,440
conv3-256 x 3 : 1,475,328
conv3-512 x 3 : 5,899,776
conv3-512 x 3 : 7,079,424
fc1 : 102,764,544
fc2 : 16,781,312
fc3 : 4,097,000
TOTAL : 138,357,544
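The per-layer counts above can be reproduced directly from the VGG-16 architecture, assuming the standard 224x224 input (so the last feature map is 7x7x512) and 1000 output classes:

```python
def conv_params(c_in, c_out, k=3):
    return k * k * c_in * c_out + c_out   # weights + biases

def fc_params(n_in, n_out):
    return n_in * n_out + n_out

# (in_channels, out_channels) of the 13 conv layers of VGGNet-16
cfg = [(3, 64), (64, 64),
       (64, 128), (128, 128),
       (128, 256), (256, 256), (256, 256),
       (256, 512), (512, 512), (512, 512),
       (512, 512), (512, 512), (512, 512)]

conv_total = sum(conv_params(i, o) for i, o in cfg)
fc_total = (fc_params(7 * 7 * 512, 4096)   # fc1: 7x7x512 after 5 poolings
            + fc_params(4096, 4096)        # fc2
            + fc_params(4096, 1000))       # fc3
print(conv_total + fc_total)  # 138357544
```

Note that the three fully connected layers alone account for roughly 89% of the total, which is why they are the memory bottleneck.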

Page 19:

Do we need all these parameters?

Page 20:

Over-Parameterization

• ‘Needed for highly non-convex optimization’¹

¹ Anna Choromanska, Mikael Henaff, Michael Mathieu, Gérard Ben Arous, Yann LeCun. The Loss Surfaces of Multilayer Networks. AISTATS 2015.

Page 21:

Over-Parameterization

• ‘Needed for highly non-convex optimization’

• Deeper structures, larger learning capacity¹

¹ Guido Montúfar, Razvan Pascanu, Kyunghyun Cho, Yoshua Bengio. On the Number of Linear Regions of Deep Neural Networks. NIPS 2014.

Page 22:

Over-Parameterization

• ‘Needed for highly non-convex optimization’

• Deeper structures, larger learning capacity

• From images to video -> even larger nets?

A. Karpathy et al. Large-scale Video Classification with Convolutional Neural Networks. CVPR 2014.

Page 23:

Compacting CNN

Page 24:

Compacting CNN

• Network distillation

• Network pruning

• Structured parameters

– Ours

Page 25:

Compacting CNN

• Network distillation

Page 26:

Compacting CNN

• Network distillation

– Large network learns from data

– Generate labels using the trained network

– Train smaller nets using the output of the soft layer (soft targets)

Geoffrey Hinton, Oriol Vinyals, Jeff Dean. Distilling the Knowledge in a Neural Network. NIPSw 2015.
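The three steps above can be sketched as follows. This is a minimal NumPy illustration of soft targets only; the actual recipe of Hinton et al. also mixes in a hard-label loss, and `distillation_loss` is an illustrative name.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Cross-entropy between temperature-softened teacher and student
    distributions: the student is pushed toward the soft targets."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return float(-np.sum(p_t * np.log(p_s + 1e-12)))

teacher = np.array([5.0, 1.0, -2.0])   # logits from the large trained net
# Loss is smallest when the student reproduces the teacher's logits:
print(distillation_loss(teacher, teacher) <
      distillation_loss(-teacher, teacher))
```

Raising the temperature T spreads probability mass over the wrong classes, exposing the teacher's "dark knowledge" about class similarities to the smaller student.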

Page 27:

Compacting CNN

• Network distillation (II)

– Use intermediate layers to guide the training

Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta and Yoshua Bengio. FitNets: Hints for Thin Deep Nets. ICLR 2015.

Page 28:

Compacting CNN

• Pros

– In general, better generalization and faster inference.

– Equal or slightly better performance

• Cons

– Requires a larger network to learn from.

Page 29:

Compacting CNN

• Network distillation

• Network pruning

– Directly remove unimportant parameters during training
  • Requires second derivatives.
– Remove parameters + quantization¹
  • Good compression rates (orthogonal to other approaches)

¹ S. Han, H. Mao, and W. J. Dally. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. CoRR, abs/1510.00149, 2015.
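Pruning plus quantization can be sketched in the spirit of Deep Compression. This is a toy sketch: `magnitude_prune` and `quantize` are illustrative names, the paper retrains after pruning, clusters weights with k-means rather than uniform levels, and then Huffman-codes the result.

```python
import numpy as np

def magnitude_prune(w, sparsity=0.9):
    """Zero the smallest-magnitude weights; the surviving sparse
    connections would then be retrained."""
    k = int(sparsity * w.size)
    thresh = np.sort(np.abs(w), axis=None)[k]
    return np.where(np.abs(w) < thresh, 0.0, w)

def quantize(w, n_levels=16):
    """Crude weight sharing: snap each weight to the nearest of
    n_levels evenly spaced values (16 levels = 4-bit indices)."""
    levels = np.linspace(w.min(), w.max(), n_levels)
    idx = np.abs(w[..., None] - levels).argmin(axis=-1)
    return levels[idx]

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
w_pruned = magnitude_prune(w, sparsity=0.9)
print((w_pruned == 0).mean())               # about 0.9 of weights removed
print(np.unique(quantize(w_pruned)).size)   # at most 16 distinct values
```

Storing only the nonzero 4-bit indices plus a 16-entry codebook is where the large compression rates come from.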

Page 30:

Compacting CNN

• Network distillation

• Network pruning

• Structured parameters

Page 31:

Compacting CNN: Structured parameters

Max Jaderberg, Andrea Vedaldi, Andrew Zisserman. Speeding up Convolutional Neural Networks with Low Rank Expansions. BMVC 2014.

• Low rank approximations

Page 32:

Compacting CNN: Structured parameters

Emily Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun, Rob Fergus. Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation. NIPS 2014.

• Low rank approximations (II)

Page 33:

Compacting CNN: Structured parameters

• Low rank approximations (III)

– Weights are approximated by a sum of rank-1 tensors.

Emily Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun, Rob Fergus. Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation. NIPS 2014.
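The rank-1 decomposition can be illustrated with a truncated SVD of a flattened filter bank. This is a generic sketch, not the paper's exact factorization; the matrix shape is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# A hypothetical conv layer flattened to a matrix:
# rows = kernel * input-channel dims, cols = output filters.
W = rng.normal(size=(36, 64))

U, s, Vt = np.linalg.svd(W, full_matrices=False)

def rank_r(r):
    """Approximate W by a sum of r rank-1 terms s_i * u_i v_i^T."""
    return sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(r))

for r in (4, 16, 36):
    err = np.linalg.norm(W - rank_r(r)) / np.linalg.norm(W)
    stored = r * (W.shape[0] + W.shape[1])   # vs W.size == 2304 entries
    print(r, round(err, 3), stored)
```

Each rank-1 term costs only rows + cols parameters, so a small r trades a controlled reconstruction error for a large parameter reduction.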

Page 34:

Compacting CNN: Structured parameters

• Weak-Points

– Needs a fully trained full-rank network.
– Not all filters can be approximated.
– Theoretical speed-ups come with a drop in performance.

Emily Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun, Rob Fergus. Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation. NIPS 2014.

Page 35:

Compacting CNN: Structured parameters

• Weak-Points

– Needs a fully trained full-rank network.
– Not all filters can be approximated.
– Drop in performance.

• Strengths

– Potential to aid regularization during or after training.
– Parameter sharing within the layer.

Page 36:

Compacting CNN: Structured parameters

K. Simonyan, A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. ICLR 2015.

• Low rank approximations (IV)

– VGG nets restrict filters during training.
– Same ‘receptive field’
– Deeper networks (more nonlinearities)
– Fewer parameters (49C² vs 3·(3×3)C² = 27C²)
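The parameter comparison above can be checked with a quick calculation: a single 7x7 layer with C input and C output channels versus three stacked 3x3 layers covering the same 7x7 receptive field (biases ignored; C = 512 is just an example value).

```python
C = 512   # channels in and out, e.g. a late VGG stage

params_7x7 = 7 * 7 * C * C               # one 7x7 layer: 49 C^2 weights
params_3x3_stack = 3 * (3 * 3 * C * C)   # three 3x3 layers: 27 C^2 weights

print(params_7x7, params_3x3_stack)      # 12845056 7077888
print(params_3x3_stack / params_7x7)     # 27/49, about 0.55
```

So the stack needs roughly 45% fewer weights while inserting two extra nonlinearities.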

Page 37:

Compacting CNN: Structured parameters

• Low rank approximations (Ours¹)

– Filter restriction during training.
– Larger receptive fields
– Deeper networks (more nonlinearities)
– Parameter sharing
– Fewer parameters

¹ Joint work with Lars Petersson. Under review.

Page 38:

Compacting CNN: Structured parameters

• Low rank approximations (Ours)

– ImageNet Results (AlexNet).

Baseline: Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012.

Page 39:

Compacting CNN: Structured parameters

• Low rank approximations (Ours)

– Stereo Matching.

Ours-1 (32K), Ours-1 (48K), Ours-3 (32K)

Baseline: Jure Zbontar, Yann LeCun. Computing the Stereo Matching Cost With a Convolutional Neural Network. CVPR 2015.

Page 40:

Memory?

Page 43:

Memory Bottleneck

• Sparse constraints during training (Ours²)

– Directly reduce the number of neurons.
– Select the optimum number of neurons.
– Significant memory reductions with a minor drop in performance.

² Joint work with Hao Zhou, Fatih Porikli. Under review.
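One common way to remove whole neurons with a sparse constraint is a group (l2,1) penalty with a proximal shrinkage step. The sketch below is a hypothetical illustration of that general idea, not the method under review; `prox_group` and all sizes are made up for the example.

```python
import numpy as np

def prox_group(W, lam):
    """Proximal step for a group (l2,1) penalty on neurons: shrink each
    neuron's incoming weight vector toward zero; vectors whose norm
    falls below lam are zeroed, i.e. that neuron is removed entirely."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return W * scale

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(64, 128))   # 64 neurons, 128 inputs each
W[:40] *= 0.05                              # most neurons barely contribute

W_sparse = prox_group(W, lam=0.3)
alive = int((np.linalg.norm(W_sparse, axis=1) > 0).sum())
print(alive, "of 64 neurons survive")
```

Because entire rows become exactly zero, the surviving network is genuinely smaller in memory, not just sparser within dense matrices.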

Page 44:

Memory Bottleneck

• Sparse constraints during training (Ours²)

² Joint work with Hao Zhou, Fatih Porikli. Under review.

Page 45:

Do we need all these parameters?

Page 46:

Compacting ConvNets

for end to end Learning

Jose M. Alvarez

Joint work with Lars Petersson, Hao Zhou, Fatih Porikli.