Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

@DocXavi Module 3 - Lecture 10 Deep Convnets for Video Processing 28 January 2016 Xavier Giró-i-Nieto [http://pagines.uab.cat/mcv/ ]


TRANSCRIPT

Page 1: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

@DocXavi

Module 3 - Lecture 10

Deep Convnets for Video Processing
28 January 2016

Xavier Giró-i-Nieto

[http://pagines.uab.cat/mcv/]

Acknowledgments


Linked slides

Motivation

Motivation

[Website]

Outline

1. Recognition
2. Optical Flow
3. Object Tracking
4. Learn more

Recognition

Demo Clarifai

MIT Technology Review, "A start-up's Neural Network Can Understand Video" (3/2/2015)

Recognition

Figure: Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE.

Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015.

Previous lectures with Jose M. Álvarez

Slides extracted from ReadCV seminar by Victor Campos

Recognition: DeepVideo

Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE.

Recognition: DeepVideo: Demo

Recognition: DeepVideo: Architectures

Unsupervised learning [Le et al. '11]. Supervised learning [Karpathy et al. '14].

Recognition: DeepVideo: Features

Recognition: DeepVideo: Multiscale

Recognition: DeepVideo: Results

Recognition: C3D

Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015.

Recognition: C3D: Demo

Recognition: C3D: Spatial dimension

Spatial dimensions (XY) of the kernels are fixed to 3x3, following Simonyan & Zisserman (ICLR 2015).

K. Simonyan, A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition", ICLR 2015.

Recognition: C3D: Temporal dimension

3D ConvNets are more suitable for spatiotemporal feature learning than 2D ConvNets.

[Figure: temporal depth, compared against 2D ConvNets]

A homogeneous architecture with small 3x3x3 convolution kernels in all layers is among the best-performing architectures for 3D ConvNets.

No gain when varying the temporal depth across layers.
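To make the 3x3x3 kernel concrete, here is a minimal sketch of a single 3D convolution (strictly, a cross-correlation, as in most ConvNet libraries) sliding over time as well as over X and Y. The clip size, the averaging kernel and the function name `conv3d_valid` are illustrative assumptions, not taken from the C3D paper or its code.

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D cross-correlation of a (T, H, W) clip with a (kt, kh, kw) kernel."""
    T, H, W = clip.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                # The kernel spans kt frames, so each response mixes space and time.
                out[t, y, x] = np.sum(clip[t:t + kt, y:y + kh, x:x + kw] * kernel)
    return out

rng = np.random.default_rng(0)
clip = rng.random((16, 8, 8))          # toy 16-frame clip of 8x8 pixels (assumed sizes)
kernel = np.ones((3, 3, 3)) / 27.0     # 3x3x3 averaging kernel, for illustration only
response = conv3d_valid(clip, kernel)
print(response.shape)
```

A 2D ConvNet applied frame by frame would correspond to a temporal kernel depth of 1, so its responses would carry no motion information.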

Recognition: C3D: Architecture

[Figure: C3D architecture, with the feature vector labelled]

Recognition: C3D: Feature vector

The video sequence is split into 16-frame clips with an 8-frame overlap between consecutive clips.

Recognition: C3D: Feature vector

[Figure: the features of each 16-frame clip are averaged into a 4096-dim video descriptor, which is then L2-normalized into the final 4096-dim video descriptor]
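The clip-level pooling described on these two slides can be sketched as follows. The per-clip features are random stand-ins for the network output, and the helper names `clip_starts` and `video_descriptor` are hypothetical; only the 16-frame clip length, the 8-frame overlap, the averaging, the L2 normalization and the 4096 dimensions come from the slides.

```python
import numpy as np

def clip_starts(num_frames, clip_len=16, overlap=8):
    """Start indices of clips of clip_len frames, with `overlap` frames shared by neighbours."""
    stride = clip_len - overlap
    return list(range(0, num_frames - clip_len + 1, stride))

def video_descriptor(clip_features):
    """Average the per-clip features, then L2-normalize the result."""
    mean = np.mean(clip_features, axis=0)
    norm = np.linalg.norm(mean)
    return mean / norm if norm > 0 else mean

starts = clip_starts(40)                         # a 40-frame video (assumed length)
rng = np.random.default_rng(0)
clip_features = rng.random((len(starts), 4096))  # stand-in for the per-clip network features
descriptor = video_descriptor(clip_features)
print(starts, descriptor.shape)
```

The resulting descriptor has a fixed size regardless of the video length, which is what allows it to be fed to a simple linear classifier.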

Recognition: C3D: Visualization

Based on Deconvnets by Zeiler and Fergus [ECCV 2014]. See [ReadCV Slides] for more details.

Recognition: C3D: Compactness

Recognition: C3D: Performance

Convolutional 3D (C3D) combined with a simple linear classifier outperforms state-of-the-art methods on 4 different benchmarks and is comparable with state-of-the-art methods on the other 2 benchmarks.

Recognition: C3D: Software

Implementation by Michael Gygli (GitHub)

Recognition: ImageNet Video

[ILSVRC 2015 Slides and videos]

Kai Kang et al., "Object Detection in Videos with TubeLets and Multi-Context Cues" (ILSVRC 2015) [video] [poster]

Optical Flow

Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE.

Optical Flow: Small vs Large

Optical Flow

Classic approach: rigid matching of HoG or SIFT descriptors.

Deep Matching: allow each subpatch to move independently in a limited range depending on its size.

Optical Flow: Deep Matching

Optical Flow: 2D correlation

Source: Matlab R2015b documentation for normxcorr2, by MathWorks.

[Figure: an image, a sub-image, and the offset of the sub-image with respect to the image origin [0,0]]
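A minimal sketch of the 2D normalized cross-correlation idea, recovering the offset of a sub-image inside an image. This is a plain NumPy illustration restricted to positions where the sub-image fits entirely inside the image; it is not MATLAB's `normxcorr2`, which also scores partially overlapping positions. The function name and the toy sizes are assumptions.

```python
import numpy as np

def normalized_cross_correlation(image, template):
    """Score every position where `template` fits fully inside `image` (values in [-1, 1])."""
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt(np.sum(t ** 2))
    out = np.zeros((ih - th + 1, iw - tw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            window = image[y:y + th, x:x + tw]
            w = window - window.mean()
            denom = np.sqrt(np.sum(w ** 2)) * t_norm
            out[y, x] = np.sum(w * t) / denom if denom > 0 else 0.0
    return out

rng = np.random.default_rng(0)
image = rng.random((20, 20))
template = image[5:10, 7:12].copy()     # sub-image cut out at row 5, column 7
scores = normalized_cross_correlation(image, template)
offset = np.unravel_index(np.argmax(scores), scores.shape)
print(offset)
```

The peak of the correlation map gives the offset of the sub-image with respect to the image origin, which is the quantity the slide illustrates.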

Optical Flow: Deep Matching

Instead of pre-trained filters, a convolution is defined between each patch of the reference image and the target image. As a result, a correlation map is generated for each reference patch.

[Figure: the most discriminative response map and the least discriminative response map]

Optical Flow: Deep Matching

Key idea: build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search.

Pyramid levels: 4x4, 8x8, 16x16 and 32x32 patches. Bottom-up extraction (BU) builds the pyramid; top-down matching (TD) searches it.
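One bottom-up step of such a pyramid can be sketched as follows: the correlation maps of four child patches are combined into the map of their parent patch. This is a simplified illustration under stated assumptions, not the DeepFlow implementation: it uses a 3x3 max filter to let each child move by one pixel and omits the subsampling and non-linearity of the paper. The helper names `max_filter3`, `shift` and `aggregate` are hypothetical.

```python
import numpy as np

def max_filter3(m):
    """3x3 max filter (stride 1): lets a sub-patch move by up to one pixel."""
    h, w = m.shape
    padded = np.pad(m, 1, mode="constant", constant_values=-np.inf)
    out = np.full((h, w), -np.inf)
    for dy in range(3):
        for dx in range(3):
            out = np.maximum(out, padded[dy:dy + h, dx:dx + w])
    return out

def shift(m, dy, dx):
    """Return s with s[y, x] = m[y + dy, x + dx], and zero where that is out of range."""
    h, w = m.shape
    out = np.zeros((h, w))
    dst_y, src_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    dst_x, src_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = m[src_y, src_x]
    return out

def aggregate(children, child_size):
    """Parent map = mean of the four max-filtered child maps, aligned by quadrant offset."""
    offsets = [(0, 0), (0, child_size), (child_size, 0), (child_size, child_size)]
    return np.mean([shift(max_filter3(c), dy, dx) for c, (dy, dx) in zip(children, offsets)], axis=0)

# Toy child maps for an 8x8 parent matching at position (3, 5); the fourth 4x4 child
# is displaced by one pixel to mimic a small non-rigid deformation.
p0, size = (3, 5), 4
children = []
for k, (oy, ox) in enumerate([(0, 0), (0, size), (size, 0), (size, size)]):
    c = np.zeros((16, 16))
    c[p0[0] + oy + (1 if k == 3 else 0), p0[1] + ox] = 1.0
    children.append(c)
parent = aggregate(children, size)
print(parent[p0])
```

With rigid matching (no max filter) the displaced child would not contribute at the parent position, so the parent score would drop; the max filter is what allows each sub-patch to move independently within a limited range.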

Optical Flow: Deep Matching (BU)

[Figure: bottom-up extraction (BU) over 4x4, 8x8, 16x16 and 32x32 patches]

Optical Flow: Deep Matching (TD)

[Figure: top-down matching (TD) over 4x4, 8x8, 16x16 and 32x32 patches]

Each local maximum in the top layer corresponds to a shift of one of the biggest (32x32) patches. If we focus on a local maximum, we can retrieve the corresponding responses one scale below and focus on the shift of the sub-patches that generated it.

Optical Flow: Deep Matching

[Figure: ground truth, dense HOG [Brox & Malik 2011] and Deep Matching]

Optical Flow: FlowNet

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D., and Brox, T. (2015). FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).

End-to-end supervised learning of optical flow.

Optical Flow: FlowNet (contracting)

Option A: stack both input images together and feed them through a generic network.

Option B: create two separate, yet identical, processing streams for the two images and combine them at a later stage.

Correlation layer: convolution of data patches from the layers to combine.
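A minimal sketch of what such a correlation layer computes, assuming single-pixel patches and a small displacement range: for every location in the first feature map, it takes the dot product with the second feature map at each candidate displacement. The function name, the `max_disp` parameter and the unit-normalized toy features are assumptions; the layer in the FlowNet paper has further details (patch size, strides) that are not reproduced here.

```python
import numpy as np

def correlation_layer(f1, f2, max_disp=2):
    """f1, f2: (C, H, W) feature maps. Returns ((2*max_disp+1)**2, H, W) correlation maps."""
    C, H, W = f1.shape
    D = max_disp
    padded = np.pad(f2, ((0, 0), (D, D), (D, D)))   # zero padding around f2
    out = np.zeros(((2 * D + 1) ** 2, H, W))
    idx = 0
    for dy in range(-D, D + 1):
        for dx in range(-D, D + 1):
            # shifted[:, y, x] equals f2[:, y + dy, x + dx] (zero outside the map)
            shifted = padded[:, D + dy:D + dy + H, D + dx:D + dx + W]
            out[idx] = np.sum(f1 * shifted, axis=0)
            idx += 1
    return out

rng = np.random.default_rng(0)
f1 = rng.normal(size=(8, 10, 10))
f1 /= np.linalg.norm(f1, axis=0, keepdims=True)     # unit-length feature at every pixel
f2 = np.roll(f1, 1, axis=1)                         # second map: content moved down one row
corr = correlation_layer(f1, f2, max_disp=2)
best = int(np.argmax(corr[:, 5, 5]))
print(best, divmod(best, 5))
```

The output channel with the highest response encodes the displacement; here index 17 corresponds to (dy, dx) = (1, 0) once the offset of `max_disp` is subtracted from each component.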

Optical Flow: FlowNet (expanding)

Upconvolutional layers: unpooling of feature maps followed by a convolution. Upconvolved feature maps are concatenated with the corresponding map from the contractive part.

Optical Flow: FlowNet: Data augmentation

Since existing ground-truth datasets are not sufficiently large to train a ConvNet, a synthetic Flying Dataset is generated... and augmented (translation, rotation and scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma and color).

ConvNets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI.
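The photometric part of the augmentation (additive Gaussian noise, brightness and gamma changes) can be sketched as below. The parameter ranges and the function name `photometric_augment` are illustrative assumptions, not the values used by the FlowNet authors. Geometric transformations are left out because they would also require transforming the ground-truth flow.

```python
import numpy as np

def photometric_augment(img, rng, noise_std=0.02, brightness=0.1, gamma_range=(0.7, 1.5)):
    """Apply noise, a brightness offset and a gamma change to an image with values in [0, 1]."""
    out = img + rng.normal(0.0, noise_std, img.shape)     # additive Gaussian noise
    out = out + rng.uniform(-brightness, brightness)      # global brightness shift
    out = np.clip(out, 0.0, 1.0)                          # keep a valid range before gamma
    return out ** rng.uniform(*gamma_range)               # gamma change

rng = np.random.default_rng(0)
frame = rng.random((32, 32, 3))                           # toy RGB frame (assumed size)
augmented = photometric_augment(frame, rng)
print(augmented.shape, float(augmented.min()), float(augmented.max()))
```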

Object tracking: MDNet

Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015).

Object tracking: MDNet: Architecture

Domain-specific layers are used during training for each sequence, but are replaced by a single one at test time.

Object tracking: MDNet: Online update

MDNet is updated online at test time with hard negative mining, that is, selecting the negative samples with the highest positive score.
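Hard negative mining as described here reduces to a ranking step, sketched below. The scores and labels are made-up toy values and the function name `hard_negatives` is hypothetical; the real tracker obtains the scores from the network for candidate boxes around the target.

```python
import numpy as np

def hard_negatives(scores, is_positive, k):
    """Indices of the k negative samples that the classifier scores most like the target."""
    neg_idx = np.flatnonzero(~is_positive)
    order = np.argsort(scores[neg_idx])[::-1]    # highest positive score first
    return neg_idx[order[:k]]

scores = np.array([0.9, 0.8, 0.1, 0.7, 0.3])     # positive-class scores (toy values)
is_positive = np.array([True, False, False, False, False])
selected = hard_negatives(scores, is_positive, k=2)
print(selected.tolist())
```

Only the selected negatives would be used in the online update, so training effort concentrates on the distractors that are most easily confused with the target.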

Object tracking: FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Focus on conv4-3 and conv5-3 of the VGG-16 network pre-trained for ImageNet image classification.

Object tracking: FCNT: Specialization

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence.

Object tracking: FCNT: Localization

Although trained for image classification, feature maps in conv5-3 enable object localization... but they are not discriminative enough to tell apart different objects of the same category.

Object tracking: Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object detectors emerge in deep scene CNNs." ICLR 2015. [Slides from ReadCV]

Object tracking: FCNT: Localization

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation...

Object tracking: FCNT: Architecture

SNet = Specific Network (online update)

GNet = General Network (fixed)

Object tracking: FCNT: Results

ConvNets: Software

Caffe http://caffe.berkeleyvision.org

Torch (Overfeat) http://torch.ch

Theano http://deeplearning.net/software/theano

TensorFlow https://www.tensorflow.org

MatConvNet (VLFeat) http://www.vlfeat.org/matconvnet

CNTK (Microsoft) http://www.cntk.ai

ConvNets: Learn more

Seminar Series: Compacting ConvNets for End to End Learning

Jose M. Álvarez

Tuesday, February 2, 4pm

D5-010, Campus Nord

ConvNets: Learn more

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

Online course: Deep Learning, "Taking machine learning to the next level"

ReadCV seminar: friendly reviews of SoA papers. Spring 2016, Tuesdays at 11am.

Barcelona Convolucionada: "Deep Learning a l'abast de tothom" (Deep Learning within everyone's reach). Monday, February 1, 7pm, FIB, Campus Nord UPC.

Grup d'estudi de machine learning Barcelona

Summer course: Deep Learning for Computer Vision (2.5 ECTS for MSc & PhD). July 4-8, 3-7pm.

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, "Deep Learning: Past, Present & Future"

"Machine learning" sub-Reddit

Check profile requirements for summer internships (disclaimer: offered to PhD students by default).

Company      Avg. salary / hour    Avg. salary / month
Yahoo        $43                   ($43 x 160 = $6880)
Apple        $37                   ($37 x 160 = $5920)
Google       $29.54-$31.32         $7151
Facebook     $22.92                $6150-$7378
Microsoft    $22.63                $6506-$7171

Source: Glassdoor.com (internships in California; no stipends included)

ConvNets: Learn more

Video: Cristian Canton's talk "From Catalonia to America: notes on how to achieve a successful post-PhD career", ACMCV 2015 & UPC

Li Fei-Fei, "How we're teaching computers to understand pictures", TEDTalks 2014

Jeremy Howard, "The wonderful and terrifying implications of computers that can learn", TEDTalks 2014

Neil Lawrence, "OpenAI won't benefit humanity without open data sharing" (The Guardian, 14/12/2015)

ConvNets: Discussion

Is Computer Vision solved?

Sports: Do you know them?

ConvNets: Do you know them?

Antonio Torralba, MIT (former UPC)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

...and MANY MORE I am missing in the page (apologies)

ConvNets: Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution    Last deadlines (on 28/1/2016)
FI (Catalonia)         22/09/2015
FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web

Our past research

Image Classification (ChaLearn Workshop)

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., "Cultural Event Recognition with Visual ConvNets and Temporal Models", in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

Our current research

Saliency Prediction (LSUN Challenge)

J. Pan and Giró-i-Nieto, X., "End-to-end Convolutional Network for Saliency Prediction", in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

Our current research

Sentiment Analysis (CNN)

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., "Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction", in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015. [Slides]

Our current research

Instance Search in Video

V.-T. Nguyen, -Dinh-Le, D., Salvador, A., -Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S. ichi, and Giró-i-Nieto, X., "NII-HITACHI-UIT at TRECVID 2015 Instance Search", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S. ichi, O'Connor, N., and Smeaton, A. F., "Insight DCU at TRECVID 2015", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

Page 2: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

Acknowledgments

2

Linked slides

Motivation

Motivation

[Website]

Outline

1 Recognition2 Optical Flow3 Object Tracking4 Learn more

6

Recognition

Demo Clarifai

MIT Technology Review ldquoA start-uprsquos Neural Network Can Understand Videordquo (322015)7

Figure Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE

8

Recognition

9

Recognition

Figure Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

10

Recognition

Figure Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Previous lectures with Jose M Aacutelvarez

11

Recognition

Figure Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE

Slides extracted from ReadCV seminar by Victor Campos 12

Recognition DeepVideo

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE 13

Recognition DeepVideo Demo

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE 14

Recognition DeepVideo Architectures

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE 15

Unsupervised learning [Le at alrsquo11] Supervised learning [Karpathy et alrsquo14]

Recognition DeepVideo Features

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE 16

Recognition DeepVideo Multiscale

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE 17

Recognition DeepVideo Results

18

Recognition

Figure Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

19

Recognition C3D

Figure Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

20Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Demo

21K Simonyan A Zisserman Very Deep Convolutional Networks for Large-Scale Image Recognition ICLR 2015

Recognition C3D Spatial dimensionSpatial dimensions (XY) of the used kernels are fixed to 3x3 following Symonian amp Zisserman (ICLR 2015)

22Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Temporal dimension3D ConvNets are more suitable for spatiotemporal feature learning compared to 2D ConvNets

Temporal depth

2D ConvNets

23Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

A homogeneous architecture with small 3 times 3 times 3 convolution kernels in all layers is among the best performing architectures for 3D ConvNets

Recognition C3D Temporal dimension

24Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

No gain when varying the temporal depth across layers

Recognition C3D Temporal dimension

25Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

No gain when varying the temporal depth across layers

Recognition C3D Architecture

Featurevector

26Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Feature vector

Video sequence

16 frames-long clips

8 frames-long overlap

27Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Feature vector

16-frame clip

16-frame clip

16-frame clip

16-frame clip

Average

4096

-dim

vid

eo d

escr

ipto

r

4096

-dim

vid

eo d

escr

ipto

r

L2 norm

28Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D VisualizationBased on Deconvnets by Zeiler and Fergus [ECCV 2014] - See [ReadCV Slides] for more details

Recognition C3D Compactness

Convolutional 3D (C3D) combined with a simple linear classifier outperforms state-of-the-art methods on 4 different benchmarks and is comparable with state-of-the-art methods on 2 other benchmarks

Recognition C3D Performance

Recognition C3D Software

Implementation by Michael Gygli (GitHub)

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]


Recognition ImageNet Video

Kai Kang et al., "Object Detection in Videos with TubeLets and Multi-Context Cues" (ILSVRC 2015) [video] [poster]

Optical Flow

Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE.

Optical Flow Small vs Large


Optical Flow

Classic approach: Rigid matching of HoG or SIFT descriptors

Deep Matching: Allow each subpatch to move independently in a limited range depending on its size

Optical Flow Deep Matching

Source: Matlab R2015b documentation for normxcorr2 by MathWorks

Optical Flow 2D correlation

Image

Sub-Image

Offset of the sub-image with respect to the image [0,0]
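Locating a sub-image inside an image by 2D correlation can be sketched in plain Python. This is not MATLAB's `normxcorr2`: it evaluates only positions where the sub-image fits entirely inside the image, and the name `normxcorr_offset` is hypothetical.

```python
import math


def normxcorr_offset(image, template):
    """Return ((row, col), score) of the best zero-mean normalized correlation."""
    H, W = len(image), len(image[0])
    h, w = len(template), len(template[0])
    t_vals = [v for row in template for v in row]
    t_mean = sum(t_vals) / len(t_vals)
    t_zero = [v - t_mean for v in t_vals]
    t_norm = math.sqrt(sum(v * v for v in t_zero))
    best, best_score = None, -2.0
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            win = [image[r + i][c + j] for i in range(h) for j in range(w)]
            win_mean = sum(win) / len(win)
            win_zero = [v - win_mean for v in win]
            win_norm = math.sqrt(sum(v * v for v in win_zero))
            if t_norm == 0 or win_norm == 0:
                continue  # flat window or template: correlation is undefined
            score = sum(a * b for a, b in zip(t_zero, win_zero)) / (t_norm * win_norm)
            if score > best_score:
                best, best_score = (r, c), score
    return best, best_score


image = [[0, 0, 0, 0, 0],
         [0, 0, 1, 2, 0],
         [0, 0, 3, 4, 0],
         [0, 0, 0, 0, 0]]
offset, score = normxcorr_offset(image, [[1, 2], [3, 4]])
```

The returned offset is measured from the top-left corner of the image, which plays the role of the [0,0] reference on the slide.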

Instead of pre-trained filters, a convolution is defined between each patch of the reference image and the target image. As a result, a correlation map is generated for each reference patch.
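The idea of using every reference patch as its own filter can be sketched as follows. This is a simplification: it takes raw dot products on pixel intensities, whereas the paper correlates patch descriptors. The name `correlation_maps` and the choice of non-overlapping patches are assumptions made for illustration.

```python
def correlation_maps(reference, target, patch=4):
    """Map each reference patch (keyed by its top-left corner) to its response
    map over all valid positions in the target image."""
    def window(img, r, c):
        return [img[r + i][c + j] for i in range(patch) for j in range(patch)]

    rows = len(target) - patch + 1
    cols = len(target[0]) - patch + 1
    maps = {}
    for pr in range(0, len(reference) - patch + 1, patch):
        for pc in range(0, len(reference[0]) - patch + 1, patch):
            kernel = window(reference, pr, pc)  # the patch itself acts as the filter
            maps[(pr, pc)] = [
                [sum(k * v for k, v in zip(kernel, window(target, r, c)))
                 for c in range(cols)]
                for r in range(rows)
            ]
    return maps


reference = [[r * 8 + c for c in range(8)] for r in range(8)]
target = [[1] * 8 for _ in range(8)]
maps = correlation_maps(reference, target)
```

An 8x8 reference with 4x4 patches gives four response maps, one per patch.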

Optical Flow Deep Matching


Optical Flow Deep Matching

The most discriminative response map

The least discriminative response map

Key idea: Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching (TD)

Bottom-up extraction (BU)
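One bottom-up step of the pyramid can be sketched as combining the response maps of four sub-patches into the response map of their parent patch, letting each sub-patch move within a small radius. This is a simplified sketch: the subsampling and rectification used in the paper are omitted, and `aggregate_children`, `offsets` and `radius` are hypothetical names.

```python
def aggregate_children(child_maps, offsets, radius=1):
    """Average, over the children, the local maximum of each child map around
    the position where that child is expected for a given parent position."""
    H, W = len(child_maps[0]), len(child_maps[0][0])

    def local_max(m, r, c):
        return max(m[i][j]
                   for i in range(max(0, r - radius), min(H, r + radius + 1))
                   for j in range(max(0, c - radius), min(W, c + radius + 1)))

    parent = [[0.0] * W for _ in range(H)]
    for r in range(H):
        for c in range(W):
            total = 0.0
            for m, (dr, dc) in zip(child_maps, offsets):
                rr, cc = r + dr, c + dc
                if 0 <= rr < H and 0 <= cc < W:  # children falling outside add 0
                    total += local_max(m, rr, cc)
            parent[r][c] = total / len(child_maps)
    return parent


def peak_map(pos, size=6):
    m = [[0.0] * size for _ in range(size)]
    m[pos[0]][pos[1]] = 1.0
    return m


# Four 4x4 children of an 8x8 parent; the second child is displaced by one row.
offsets = [(0, 0), (0, 4), (4, 0), (4, 4)]
children = [peak_map((1, 1)), peak_map((2, 5)), peak_map((5, 1)), peak_map((5, 5))]
parent = aggregate_children(children, offsets)
```

The parent still responds fully at position (1, 1) although one child moved, which is the tolerance to non-rigid deformation that Deep Matching is after.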


Optical Flow Deep Matching (BU)


Optical Flow Deep Matching (TD)

Each local maximum in the top layer corresponds to a shift of one of the biggest (32x32) patches.

If we focus on a local maximum, we can retrieve the corresponding responses one scale below and focus on the shifts of the sub-patches that generated it.
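The top-down step can be sketched as a backtracking routine: given the position of a maximum in a parent map, look up where each sub-patch responded best one scale below. This is a simplified, single-level sketch; `backtrack`, `offsets` and `radius` are hypothetical names, and the maps are assumed to share the parent's resolution.

```python
def backtrack(child_maps, offsets, parent_pos, radius=1):
    """For a parent maximum at parent_pos, return the best (row, col) of each child."""
    H, W = len(child_maps[0]), len(child_maps[0][0])
    r, c = parent_pos
    matches = []
    for m, (dr, dc) in zip(child_maps, offsets):
        candidates = [
            (m[i][j], (i, j))
            for i in range(max(0, r + dr - radius), min(H, r + dr + radius + 1))
            for j in range(max(0, c + dc - radius), min(W, c + dc + radius + 1))
        ]
        # None marks a child whose search window lies outside the map.
        matches.append(max(candidates)[1] if candidates else None)
    return matches


def peak_map(pos, size=6):
    m = [[0.0] * size for _ in range(size)]
    m[pos[0]][pos[1]] = 1.0
    return m


offsets = [(0, 0), (0, 4), (4, 0), (4, 4)]
children = [peak_map((1, 1)), peak_map((2, 5)), peak_map((5, 1)), peak_map((5, 5))]
matches = backtrack(children, offsets, (1, 1))
```

Applying the same routine recursively down to the 4x4 level would give the final set of patch correspondences.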

Optical Flow Deep Matching

Ground truth

Dense HOG [Brox & Malik 2011]

Deep Matching

Optical Flow

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D., and Brox, T. (2015). FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).

Optical Flow FlowNet

End-to-end supervised learning of optical flow

Optical Flow FlowNet (contracting)

Option A: Stack both input images together and feed them through a generic network
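Stacking the two frames amounts to concatenating them along the channel axis, so the first convolution sees a 6-channel input. A minimal sketch, assuming images are nested lists of shape H x W x 3; the name `stack_pair` is hypothetical.

```python
def stack_pair(img1, img2):
    """Concatenate two H x W x 3 images along the channel axis -> H x W x 6."""
    return [[p1 + p2 for p1, p2 in zip(row1, row2)]
            for row1, row2 in zip(img1, img2)]


frame_a = [[[1, 2, 3], [4, 5, 6]]]        # 1 x 2 x 3
frame_b = [[[7, 8, 9], [10, 11, 12]]]     # 1 x 2 x 3
stacked = stack_pair(frame_a, frame_b)    # 1 x 2 x 6
```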

Optical Flow FlowNet (contracting)

Option B: Create two separate yet identical processing streams for the two images and combine them at a later stage

Correlation layer: Convolution of data patches from the layers to combine
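A correlation layer can be sketched as comparing the feature vector at each position of the first map with the feature vectors of the second map within a maximum displacement. This sketch uses 1x1 patches and dense displacements; `correlation_layer` and `max_disp` are hypothetical names, and no claim is made that this matches the FlowNet implementation in detail.

```python
def correlation_layer(f1, f2, max_disp=1):
    """f1, f2: H x W x C feature maps. Returns an H x W x (2*max_disp+1)**2
    volume of dot products plus the list of displacements, one per channel."""
    H, W = len(f1), len(f1[0])
    shifts = [(dy, dx)
              for dy in range(-max_disp, max_disp + 1)
              for dx in range(-max_disp, max_disp + 1)]
    out = [[[0.0] * len(shifts) for _ in range(W)] for _ in range(H)]
    for y in range(H):
        for x in range(W):
            for k, (dy, dx) in enumerate(shifts):
                yy, xx = y + dy, x + dx
                if 0 <= yy < H and 0 <= xx < W:  # out-of-range positions stay 0
                    out[y][x][k] = sum(a * b for a, b in zip(f1[y][x], f2[yy][xx]))
    return out, shifts


# The feature at (0, 0) in the first map reappears at (0, 1) in the second map.
f1 = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
f2 = [[[0.0, 0.0], [1.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
corr, shifts = correlation_layer(f1, f2)
```

The strongest response at position (0, 0) sits in the channel of displacement (0, 1), which is the motion between the two maps.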

Optical Flow FlowNet (expanding)

Upconvolutional layers: Unpooling feature maps + convolution

Upconvolved feature maps are concatenated with the corresponding map from the contractive part
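The expanding step can be sketched as unpooling a coarse map to twice its resolution and concatenating it with the matching map from the contracting part. This sketch assumes nearest-neighbour unpooling and leaves out the convolution that follows it; `unpool2x` and `concat_channels` are hypothetical names.

```python
def unpool2x(fmap):
    """Nearest-neighbour unpooling of an H x W x C map to 2H x 2W x C."""
    out = []
    for row in fmap:
        wide = [list(px) for px in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append([list(px) for px in wide])              # repeat each row
    return out


def concat_channels(a, b):
    """Concatenate two maps of equal spatial size along the channel axis."""
    return [[pa + pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


coarse = [[[0.5], [0.9]]]                         # 1 x 2 x 1, from the expanding part
skip = [[[1, 2]] * 4, [[3, 4]] * 4]               # 2 x 4 x 2, from the contracting part
merged = concat_channels(unpool2x(coarse), skip)  # 2 x 4 x 3
```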

Optical Flow FlowNet

Since existing ground truth datasets are not sufficiently large to train a ConvNet, a synthetic Flying Chairs dataset is generated… and augmented (translation, rotation and scaling transformations, additive Gaussian noise, changes in brightness, contrast, gamma and color).

ConvNets trained on this unrealistic data generalize well to existing datasets such as Sintel and KITTI.

Data augmentation
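The photometric part of such an augmentation can be sketched as below. All parameter ranges are hypothetical, not taken from the paper, and the geometric transformations are left out because they would also require transforming the ground-truth flow. Images are assumed to be grayscale with values in [0, 1].

```python
import random


def augment_pair(img1, img2, rng):
    """Apply the same random gain, brightness and gamma to both frames,
    plus independent additive Gaussian noise per pixel."""
    gain = rng.uniform(0.8, 1.2)     # contrast-like gain (assumed range)
    bias = rng.uniform(-0.1, 0.1)    # brightness shift (assumed range)
    gamma = rng.uniform(0.7, 1.5)    # gamma change (assumed range)
    sigma = rng.uniform(0.0, 0.04)   # noise standard deviation (assumed range)

    def apply(img):
        out = []
        for row in img:
            new_row = []
            for v in row:
                v = min(1.0, max(0.0, gain * v + bias)) ** gamma
                v += rng.gauss(0.0, sigma)
                new_row.append(min(1.0, max(0.0, v)))  # clamp back to [0, 1]
            out.append(new_row)
        return out

    return apply(img1), apply(img2)


rng = random.Random(0)
aug1, aug2 = augment_pair([[0.0, 0.5], [1.0, 0.25]], [[0.1, 0.2], [0.3, 0.4]], rng)
```

Sharing the gain, bias and gamma between the two frames keeps their appearance consistent, so the flow between them is unchanged.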


Optical Flow FlowNet

Object tracking MDNet

Nam, Hyeonseob, and Bohyung Han. Learning multi-domain convolutional neural networks for visual tracking. ICCV VOT Workshop (2015).

Object tracking MDNet Architecture


Domain-specific layers are used during training for each sequence but are replaced by a single one at test time

Object tracking MDNet Online update

MDNet is updated online at test time with hard negative mining, that is, selecting the negative samples with the highest positive scores
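Hard negative mining as described here reduces to ranking the negative samples by the positive score the current model gives them and keeping the top ones. A minimal sketch; the name `hard_negatives` and the toy scores are hypothetical.

```python
def hard_negatives(neg_scores, k):
    """Return the indices of the k negative samples with the highest positive score."""
    order = sorted(range(len(neg_scores)), key=lambda i: neg_scores[i], reverse=True)
    return order[:k]


# Positive-class scores that the current model assigns to four negative samples.
selected = hard_negatives([0.1, 0.9, 0.4, 0.8], k=2)
```

The selected samples are the ones the tracker currently confuses with the target, so they are the most informative for the online update.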

Object tracking FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. Visual Tracking with Fully Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Focus on conv4-3 and conv5-3 of VGG-16 network pre-trained for ImageNet image classification

conv4-3 conv5-3

Object tracking FCNT Specialization


Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence

Object tracking FCNT Localization

Although trained for image classification, feature maps in conv5-3 enable object localization… but are not discriminative enough to distinguish different objects of the same category

Object tracking Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object detectors emerge in deep scene CNNs." ICLR 2015. [Slides from ReadCV]

Object tracking FCNT Localization

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation…

conv4-3 conv5-3

Object tracking FCNT Architecture


SNet=Specific Network (online update)

GNet=General Network (fixed)

Object tracking FCNT Results


ConvNets Software

Caffe http://caffe.berkeleyvision.org

Torch (Overfeat) http://torch.ch

Theano http://deeplearning.net/software/theano

TensorFlow https://www.tensorflow.org

MatConvNet (VLFeat) http://www.vlfeat.org/matconvnet

CNTK (Microsoft) http://www.cntk.ai

Seminar Series: Compacting ConvNets for End to End Learning

Tuesday, February 2, 4pm

D5-010, Campus Nord

ConvNets Learn more

Jose M. Álvarez

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

Online course: Deep Learning. Taking machine learning to the next level

ReadCV seminar: Friendly reviews of SoA papers

Spring 2016, Tuesdays at 11am

Barcelona Convolucionada

Deep Learning a l'abast de tothom (Deep Learning within everyone's reach)

Monday, February 1, 7pm, FIB, Campus Nord UPC

Grup d'estudi de machine learning Barcelona

Summer course

Deep Learning for Computer Vision

(2.5 ECTS for MSc & PhD)

July 4-8, 3-7pm

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, "Deep Learning: Past, Present & Future"

"Machine learning" sub-Reddit

Check profile requirements for Summer internship (disclaimer: offered to PhD students by default)

Company      Avg. salary/hour    Avg. salary/month
Yahoo        $43                 ($43 x 160 = $6,880)
Apple        $37                 ($37 x 160 = $5,920)
Google       $29.54-$31.32       $7,151
Facebook     $22.92              $6,150-$7,378
Microsoft    $22.63              $6,506-$7,171

Source: Glassdoor.com (internships in California. No stipends included)

Video: Cristian Canton's talk "From Catalonia to America: notes on how to achieve a successful post-PhD career", ACMCV 2015 & UPC

Li Fei-Fei, "How we're teaching computers to understand pictures", TEDTalks 2014

Jeremy Howard, "The wonderful and terrifying implications of computers that can learn", TEDTalks 2014

Neil Lawrence, "OpenAI won't benefit humanity without open data sharing" (The Guardian, 14/12/2015)

Is Computer Vision solved?

ConvNets Discussion

Sports Do you know them?

ConvNets Do you know them?

Antonio Torralba, MIT (former UPC)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

and MANY MORE I am missing in the page (apologies)

ConvNets Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution    Last deadlines (on 28/1/2016)
FI (Catalonia)         22/09/2015
FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web

Image Classification

Our past research

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., "Cultural Event Recognition with Visual ConvNets and Temporal Models", in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

ChaLearn Workshop

Saliency Prediction

J. Pan and Giró-i-Nieto, X., "End-to-end Convolutional Network for Saliency Prediction", in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

Our current research

LSUN Challenge

Sentiment Analysis

Our current research

[Slides]

CNN

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., "Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction", in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015.

Our current research

Instance Search in Video

V.-T. Nguyen, Dinh-Le, D., Salvador, A., Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S. ichi, and Giró-i-Nieto, X., "NII-HITACHI-UIT at TRECVID 2015 Instance Search", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S. ichi, O'Connor, N., and Smeaton, A. F., "Insight DCU at TRECVID 2015", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

23Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

A homogeneous architecture with small 3 times 3 times 3 convolution kernels in all layers is among the best performing architectures for 3D ConvNets

Recognition C3D Temporal dimension

24Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

No gain when varying the temporal depth across layers

Recognition C3D Temporal dimension

25Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

No gain when varying the temporal depth across layers

Recognition C3D Architecture

Featurevector

26Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Feature vector

Video sequence

16 frames-long clips

8 frames-long overlap

27Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Feature vector

16-frame clip

16-frame clip

16-frame clip

16-frame clip

Average

4096

-dim

vid

eo d

escr

ipto

r

4096

-dim

vid

eo d

escr

ipto

r

L2 norm

28Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D VisualizationBased on Deconvnets by Zeiler and Fergus [ECCV 2014] - See [ReadCV Slides] for more details

29Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Compactness

30Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Convolutional 3D(C3D) combined with a simple linear classifier outperforms state-of-the-art methods on 4 different benchmarks and are comparable with state of the art methods on other 2 benchmarks

Recognition C3D Performance

31Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D SoftwareImplementation by Michael Gygli (GitHub)

32

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

33

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

34

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

35

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

36

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

37

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

38

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

39

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

Optical Flow

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 40

Optical Flow Small vs Large

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 41

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 42

Optical FlowClassic approachRigid matching of HoG or SIFT descriptors

Deep MatchingAllow each subpatch to move

independently in a limited range

depending on its size

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 43

Optical Flow Deep Matching

Source Matlab R2015b documentation for normxcorr2 by Mathworks44

Optical Flow 2D correlation

Image

Sub-Image

Offset of the sub-image with respect to the image [00]

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 45

Instead of pre-trained filters a convolution is defined between each

patch of the reference image target image

as a results a correlation map is generated for each reference patch

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 46

Optical Flow Deep Matching

The most discriminative response map

The less discriminative

response map

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 47

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching

(TD)Bottom-upextraction

(BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 48

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Bottom-upextraction

(BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 49

Optical Flow Deep Matching (BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 50

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching (TD)

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching

(TD)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 51

Optical Flow Deep Matching (TD)Each local maxima in the top layer corresponds to a shift of one of the biggest (32x32) patchesIf we focus on local maximum we can retrieve the corresponding responses one scale below and focus on shift of the sub-patches that generated it

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 52

Optical Flow Deep Matching (TD)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 53

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 54

Ground truth

Dense HOG[Brox amp Malik 2011]

Deep Matching

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 55

Optical Flow Deep Matching

Optical Flow: FlowNet

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D., & Brox, T. (2015). FlowNet: Learning Optical Flow with Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).

End-to-end supervised learning of optical flow.

Optical Flow: FlowNet (contracting)

Option A: Stack both input images together and feed them through a generic network.

Option B: Create two separate, yet identical, processing streams for the two images and combine them at a later stage.

Correlation layer: the two streams are combined by convolving data patches from one feature map with data patches from the other.
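A minimal sketch of such a correlation layer, assuming 1x1 patches (a single feature vector per location) and a small maximum displacement. The function name, the nested-list layout `[row][col][channel]` and the zero padding are illustrative assumptions, not the FlowNet code.

```python
def correlation_layer(f1, f2, max_disp=1):
    """Compare every location of f1 with the nearby locations of f2.

    f1, f2: feature maps as nested lists indexed [row][col][channel].
    Returns (out, disps) where out[row][col][k] is the dot product between
    f1 at (row, col) and f2 at (row + dy, col + dx), with (dy, dx) = disps[k].
    """
    rows, cols = len(f1), len(f1[0])
    disps = [(dy, dx)
             for dy in range(-max_disp, max_disp + 1)
             for dx in range(-max_disp, max_disp + 1)]
    out = []
    for y in range(rows):
        out_row = []
        for x in range(cols):
            scores = []
            for dy, dx in disps:
                y2, x2 = y + dy, x + dx
                if 0 <= y2 < rows and 0 <= x2 < cols:
                    scores.append(sum(a * b for a, b in zip(f1[y][x], f2[y2][x2])))
                else:
                    scores.append(0.0)  # zero padding outside the feature map
            out_row.append(scores)
        out.append(out_row)
    return out, disps
```

Unlike a regular convolution, this layer has no trainable weights: the "filter" applied to one feature map is the data of the other one. Limiting the displacement keeps the output at (2 * max_disp + 1)^2 channels per location.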

Optical Flow: FlowNet (expanding)

Upconvolutional layers: unpooling of feature maps + convolution.

Upconvolved feature maps are concatenated with the corresponding feature map from the contractive part.
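One refinement step of the expanding part can be sketched as follows, for a single-channel map. The helper names are hypothetical, and nearest-neighbour repetition is only one possible way to unpool, chosen here as an assumption for brevity.

```python
def unpool2x(fmap):
    """Double the resolution by repeating every value in a 2x2 block."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def conv3x3(fmap, kernel):
    """3x3 convolution (cross-correlation form) with zero padding, same output size."""
    rows, cols = len(fmap), len(fmap[0])
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    yy, xx = y + ky - 1, x + kx - 1
                    if 0 <= yy < rows and 0 <= xx < cols:
                        acc += kernel[ky][kx] * fmap[yy][xx]
            out[y][x] = acc
    return out


def upconv_and_concat(coarse, skip, kernel):
    """Upconvolve `coarse`, then stack it with the same-sized contracting-part map."""
    up = conv3x3(unpool2x(coarse), kernel)
    if len(up) != len(skip) or len(up[0]) != len(skip[0]):
        raise ValueError("skip map must match the upconvolved resolution")
    return [up, skip]  # two channels: upconvolved features + skip features
```

The concatenation gives the next layer both the coarse, high-level estimate and the fine local detail that pooling had discarded.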

Optical Flow: FlowNet

Since existing ground-truth datasets are not large enough to train a ConvNet, a synthetic Flying Chairs dataset is generated… and augmented (translation, rotation and scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma and color).
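When augmenting optical flow data, the ground-truth flow must stay consistent with the transformed images. The sketch below is a hedged illustration with hypothetical names and parameter ranges: it applies a relative horizontal translation to the second image (which changes the flow) and a photometric change plus Gaussian noise (which does not).

```python
import random


def shift_right(img, tx):
    """Translate an image tx >= 0 pixels to the right, filling with zeros."""
    return [[0.0] * tx + row[:len(row) - tx] for row in img]


def augment_pair(img1, img2, flow, rng):
    """img1, img2: grayscale images in [0, 1]; flow[row][col] = (u, v) from img1 to img2."""
    tx = rng.randint(0, 2)            # relative translation, in pixels (assumed range)
    gain = rng.uniform(0.8, 1.2)      # brightness change (assumed range)
    gamma = rng.uniform(0.7, 1.5)     # gamma change (assumed range)
    sigma = 0.02                      # std of the additive Gaussian noise (assumed)

    def photometric(img):
        return [[min(1.0, max(0.0, gain * p ** gamma + rng.gauss(0.0, sigma)))
                 for p in row] for row in img]

    # Moving the second image by tx moves every correspondence by tx as well.
    new_img2 = shift_right(img2, tx)
    new_flow = [[(u + tx, v) for (u, v) in row] for row in flow]
    return photometric(img1), photometric(new_img2), new_flow, tx
```

Correspondences pushed outside the frame by the translation are no longer valid; a fuller pipeline would mask them out of the loss.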

ConvNets trained on this unrealistic data generalize well to existing datasets such as Sintel and KITTI.

Object tracking: MDNet

Nam, Hyeonseob, and Bohyung Han. Learning multi-domain convolutional neural networks for visual tracking. ICCV VOT Workshop (2015).

Object tracking: MDNet (Architecture)

Domain-specific layers are used during training, one for each sequence, but they are replaced by a single new one at test time.
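The split between shared and domain-specific layers can be illustrated with a toy model. Everything here is a hypothetical stand-in (a single weight vector plays the role of the shared layers, a scalar weight and bias play the role of each domain-specific layer); it only shows which parameters are kept and which are replaced.

```python
import random


class MultiDomainClassifier:
    """Toy multi-domain model: shared features plus one small head per domain."""

    def __init__(self, n_features, n_domains, seed=0):
        self._rng = random.Random(seed)
        # Stand-in for the shared layers, trained on every sequence.
        self.shared = [self._rng.uniform(-1.0, 1.0) for _ in range(n_features)]
        # One domain-specific head per training sequence.
        self.heads = [self._new_head() for _ in range(n_domains)]

    def _new_head(self):
        return {"w": self._rng.uniform(-1.0, 1.0), "b": 0.0}

    def score(self, x, domain):
        """Target-vs-background score of sample x under the given domain head."""
        hidden = sum(w * xi for w, xi in zip(self.shared, x))
        head = self.heads[domain]
        return head["w"] * hidden + head["b"]

    def start_tracking(self):
        """Test time: discard the training heads and attach a single fresh one."""
        self.heads = [self._new_head()]
```

The shared part carries what is common to all sequences, while the fresh head is what gets adapted to the new target during tracking.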

Object tracking: MDNet (Online update)

MDNet is updated online at test time with hard negative mining, that is, by selecting the negative samples with the highest positive score.
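The selection rule itself is short. A minimal sketch, assuming each candidate window already has a positive-class score and a label (1 for target, 0 for background); the function name is hypothetical.

```python
def hard_negatives(scores, labels, k):
    """Return the indices of the k background samples scored most like the target."""
    negatives = [i for i, label in enumerate(labels) if label == 0]
    return sorted(negatives, key=lambda i: scores[i], reverse=True)[:k]
```

For example, with scores `[0.9, 0.1, 0.8, 0.3, 0.7]` and labels `[1, 0, 0, 0, 1]`, the two hardest negatives are samples 2 and 3: they are background, yet the current model scores them highest as target.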

Object tracking: FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. Visual Tracking with Fully Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Focus on conv4-3 and conv5-3 of a VGG-16 network pre-trained for ImageNet image classification.

Figure panels: conv4-3, conv5-3

Object tracking: FCNT (Specialization)

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence.
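This observation motivates keeping only the feature maps that respond to the target. The sketch below ranks maps by the share of their activation that falls inside the target box. Note that this ratio is a simple stand-in chosen for illustration and is not the selection criterion used in the FCNT paper; all names are hypothetical.

```python
def foreground_ratio(fmap, box):
    """Share of the map's absolute activation that lies inside box = (y0, x0, y1, x1)."""
    y0, x0, y1, x1 = box  # half-open ranges: rows y0..y1-1, columns x0..x1-1
    total = sum(abs(v) for row in fmap for v in row)
    inside = sum(abs(fmap[y][x]) for y in range(y0, y1) for x in range(x0, x1))
    return inside / total if total > 0 else 0.0


def select_maps(fmaps, box, keep):
    """Return the (sorted) indices of the `keep` most foreground-related maps."""
    ranked = sorted(range(len(fmaps)),
                    key=lambda i: foreground_ratio(fmaps[i], box),
                    reverse=True)
    return sorted(ranked[:keep])
```

Discarding the unrelated maps removes noise from the background and reduces the computation of the layers built on top.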

Object tracking: FCNT (Localization)

Although trained for image classification, feature maps in conv5-3 enable object localization… but they are not discriminative enough to tell apart different objects of the same category.

Object tracking: Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Object detectors emerge in deep scene CNNs. ICLR 2015. [Slides from ReadCV]

Object tracking: FCNT (Localization)

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation…

Figure panels: conv4-3, conv5-3

Object tracking: FCNT (Architecture)

SNet = Specific Network (online update)

GNet = General Network (fixed)

Object tracking: FCNT (Results)

ConvNets: Software

Caffe: http://caffe.berkeleyvision.org

Torch (Overfeat): http://torch.ch

Theano: http://deeplearning.net/software/theano

TensorFlow: https://www.tensorflow.org

MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet

CNTK (Microsoft): http://www.cntk.ai

ConvNets: Learn more

Seminar Series: Compacting ConvNets for End to End Learning

Jose M. Álvarez

Tuesday, February 2, 4pm

D5-010, Campus Nord

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

Online course

Deep Learning: Taking machine learning to the next level

ReadCV seminar: Friendly reviews of SoA papers

Spring 2016, Tuesdays at 11am

Barcelona Convolucionada

Deep Learning a l'abast de tothom (Deep Learning within everyone's reach)

Monday, February 1, 7pm, FIB, Campus Nord UPC

Grup d'estudi de machine learning Barcelona (Barcelona machine learning study group)

Summer course

Deep Learning for Computer Vision

(2.5 ECTS for MSc & PhD)

July 4-8, 3-7pm

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, "Deep Learning: Past, Present & Future"

"Machine learning" subreddit

Check profile requirements for summer internships (disclaimer: offered to PhD students by default).

Company     Avg. salary / hour   Avg. salary / month
Yahoo       $43                  ($43 x 160 = $6880)
Apple       $37                  ($37 x 160 = $5920)
Google      $29.54-$31.32        $7151
Facebook    $22.92               $6150-$7378
Microsoft   $22.63               $6506-$7171

Source: Glassdoor.com (internships in California; no stipends included)

Video: Cristian Canton's talk "From Catalonia to America: notes on how to achieve a successful post-PhD career", ACMCV 2015 & UPC

Li Fei-Fei, "How we're teaching computers to understand pictures", TEDTalks 2014

Jeremy Howard, "The wonderful and terrifying implications of computers that can learn", TEDTalks 2014

Neil Lawrence, "OpenAI won't benefit humanity without open data sharing" (The Guardian, 14/12/2015)

ConvNets: Discussion

Is Computer Vision solved?

Sports: Do you know them?

ConvNets: Do you know them?

Antonio Torralba, MIT (former UPC)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

…and MANY MORE I am missing on this page (apologies).

ConvNets: Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding is available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution   Last deadlines (on 28/1/2016)
FI (Catalonia)        22/09/2015
FPU (Spain)           15/01/2016

Check our activity at https://imatge.upc.edu/web

Our past research

Image Classification (ChaLearn Workshop)

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., "Cultural Event Recognition with Visual ConvNets and Temporal Models", in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

Our current research

Saliency Prediction (LSUN Challenge)

J. Pan and Giró-i-Nieto, X., "End-to-end Convolutional Network for Saliency Prediction", in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

Our current research

Sentiment Analysis

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., "Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction", in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015. [Slides]

Our current research

Instance Search in Video

V.-T. Nguyen, Dinh-Le, D., Salvador, A., Zhu, C., Nguyen, D.-L., Tran, M.-T., Ngo Duc, T., Anh Duong, D., Satoh, S., and Giró-i-Nieto, X., "NII-HITACHI-UIT at TRECVID 2015 Instance Search", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S., O'Connor, N., and Smeaton, A. F., "Insight DCU at TRECVID 2015", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching (TD)

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching

(TD)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 51

Optical Flow Deep Matching (TD)Each local maxima in the top layer corresponds to a shift of one of the biggest (32x32) patchesIf we focus on local maximum we can retrieve the corresponding responses one scale below and focus on shift of the sub-patches that generated it

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 52

Optical Flow Deep Matching (TD)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 53

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 54

Ground truth

Dense HOG[Brox amp Malik 2011]

Deep Matching

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 55

Optical Flow Deep Matching

Optical Flow

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 56

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 57

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 58

End to end supervised learning of optical flow

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 59

Option A Stack both input images together and feed them through a generic network

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 60

Option B Create two separate yet identical processing streams for the two images and combine them at a later stage

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 61

Option B Create two separate yet identical processing streams for the two images and combine them at a later stage

Correlation layer Convolution of data patches from the layers to combine

Optical Flow FlowNet (expanding)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 62

Upconvolutional layers Unpooling features maps + convolutionUpconvolutioned feature maps are concatenated with the corresponding map from the contractive part

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 63

Since existing ground truth datasets are not sufficiently large to train a Convnet a synthetic Flying Dataset is generatedhellip and augmented (translation rotation scaling transformations additive Gaussian noise changes in brightness contrast gamma and color)

Convnets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI

Data augmentation

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 64

Optical Flow FlowNet

Object tracking MDNet

65Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Object tracking MDNet

66Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Object tracking MDNet Architecture

67Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Domain-specific layers are used during training for each sequence but are replaced by a single one at test time

Object tracking MDNet Online update

68Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

MDNet is updated online at test time with hard negative mining that is selecting negative samples with the highest positive score

Object tracking FCNT

69Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Object tracking FCNT

70Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Focus on conv4-3 and conv5-3 of VGG-16 network pre-trained for ImageNet image classification

conv4-3 conv5-3

Object tracking FCNT Specialization

71Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence

Object tracking FCNT Localization

72Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Although trained for image classification feature maps in conv5-3 enable object localizationhellipbut is not discriminative enough to different objects of the same category

Object tracking Localization

73Zhou Bolei Aditya Khosla Agata Lapedriza Aude Oliva and Antonio Torralba Object detectors emerge in deep scene cnns ICLR 2015

[Zhou et al ICLR 2015] ldquoObject detectors emerge in deep scene CNNsrdquo [Slides from ReadCV]

Object tracking FCNT Localization

74Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

On the other hand feature maps from conv4-3 are more sensitive to intra-class appearance variationhellip

conv4-3 conv5-3

Object tracking FCNT Architecture

75Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

SNet=Specific Network (online update)

GNet=General Network (fixed)

Object tracking FCNT Results

76Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

ConvNets Software

Caffe httpcaffeberkeleyvisionorg

Torch (Overfeat) httptorchch

Theano httpdeeplearningnetsoftwaretheano

Tensor Flow httpswwwtensorfloworg

MatconvNet (VLFeat) httpwwwvlfeatorgmatconvnet

CNTK (Mcrosoft) httpwwwcntkai 77

Seminar Series Compacting ConvNets

for End to End Learning

Tuesday February 2 4pm

D5-010 Campus Nord

ConvNets Learn more

78

Jose M Aacutelvarez

Stanford course CS231n

Convolutional Neural Networks for Visual

Recognition

ConvNets Learn more

79

ConvNets Learn more

Online course

Deep Learning

Taking machine learning to the next

level

80

ReadCV seminar: Friendly reviews of SoA papers

Spring 2016, Tuesdays at 11am

Barcelona Convolucionada

Deep Learning a l'abast de tothom (Deep Learning within everyone's reach)

Monday, February 1, 7pm, FIB, Campus Nord UPC

Grup d'estudi de machine learning Barcelona

Summer course: Deep Learning for Computer Vision (2.5 ECTS for MSc & PhD)

July 4-8, 3-7pm

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, "Deep Learning: Past, Present & Future"

"Machine learning" sub-Reddit

Check profile requirements for summer internships (disclaimer: offered to PhD students by default).

Company      Avg. salary / hour    Avg. salary / month
Yahoo        $43                   ($43 x 160 = $6880)
Apple        $37                   ($37 x 160 = $5920)
Google       $29.54-$31.32         $7151
Facebook     $22.92                $6150-$7378
Microsoft    $22.63                $6506-$7171

Source: Glassdoor.com (internships in California; no stipends included)

Video: Cristian Canton's talk "From Catalonia to America: notes on how to achieve a successful post-PhD career", ACMCV 2015 & UPC

Li Fei-Fei, "How we're teaching computers to understand pictures", TEDTalks 2014

Jeremy Howard, "The wonderful and terrifying implications of computers that can learn", TEDTalks 2014

Neil Lawrence, "OpenAI won't benefit humanity without open data sharing" (The Guardian, 14/12/2015)

ConvNets Discussion

Is Computer Vision solved?

Sports Do you know them?

ConvNets Do you know them?

Antonio Torralba, MIT (former UPC)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

...and MANY MORE I am missing on the page (apologies)

ConvNets Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding is available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution    Last deadline (as of 28/1/2016)
FI (Catalonia)         22/09/2015
FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web

Our past research

Image Classification (ChaLearn Workshop)

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., "Cultural Event Recognition with Visual ConvNets and Temporal Models", in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

Our current research

Saliency Prediction (LSUN Challenge)

J. Pan and Giró-i-Nieto, X., "End-to-end Convolutional Network for Saliency Prediction", in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

Sentiment Analysis

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., "Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction", in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015. [Slides]

Instance Search in Video

V.-T. Nguyen, Dinh-Le, D., Salvador, A., Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S. ichi, and Giró-i-Nieto, X., "NII-HITACHI-UIT at TRECVID 2015 Instance Search", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S. ichi, O'Connor, N., and Smeaton, A. F., "Insight DCU at TRECVID 2015", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

Motivation

[Website]

Outline

1. Recognition
2. Optical Flow
3. Object Tracking
4. Learn more

Recognition

Demo Clarifai

MIT Technology Review, "A Start-up's Neural Network Can Understand Video" (3/2/2015)

Figure: Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE.

Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015.

Previous lectures with Jose M. Álvarez

Slides extracted from ReadCV seminar by Victor Campos

Recognition DeepVideo

Recognition DeepVideo Demo

Recognition DeepVideo Architectures

Recognition DeepVideo Features

Unsupervised learning [Le et al. '11] vs. supervised learning [Karpathy et al. '14]

Recognition DeepVideo Multiscale

Recognition DeepVideo Results

Recognition C3D

Recognition C3D Demo

Recognition C3D Spatial dimension

Spatial dimensions (XY) of the kernels are fixed to 3x3, following Simonyan & Zisserman (ICLR 2015).

K. Simonyan, A. Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition." ICLR 2015.

Recognition C3D Temporal dimension

3D ConvNets are more suitable for spatiotemporal feature learning than 2D ConvNets.

[Figure labels: Temporal depth; 2D ConvNets]

A homogeneous architecture with small 3x3x3 convolution kernels in all layers is among the best-performing architectures for 3D ConvNets.
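
To make the 3x3x3 kernel concrete, here is a minimal NumPy sketch of a single-channel 3D convolution over a stack of frames. The function name `conv3d_valid`, the toy clip size and the averaging kernel are illustrative assumptions, not the C3D implementation; real networks use many channels and learned weights.

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D convolution (cross-correlation, as in most ConvNet libraries).

    clip:   array of shape (T, H, W), a single-channel stack of frames
    kernel: array of shape (kt, kh, kw), e.g. (3, 3, 3)
    """
    T, H, W = clip.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                # The kernel spans kt consecutive frames, so time is filtered jointly with space.
                out[t, y, x] = np.sum(clip[t:t + kt, y:y + kh, x:x + kw] * kernel)
    return out

rng = np.random.default_rng(0)
clip = rng.random((16, 8, 8))            # 16 frames of 8x8 pixels (toy sizes)
kernel = np.ones((3, 3, 3)) / 27.0       # hypothetical 3x3x3 averaging kernel
features = conv3d_valid(clip, kernel)    # shape (14, 6, 6)
```

Unlike a 2D convolution applied frame by frame, the output keeps a temporal axis, so later layers can still reason about motion.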

Recognition C3D Temporal dimension

No gain when varying the temporal depth across layers.

Recognition C3D Architecture

[Figure label: Feature vector]

Recognition C3D Feature vector

The video sequence is split into 16-frame clips with an 8-frame overlap.

[Figure: 16-frame clips -> Average -> 4096-dim video descriptor -> L2 norm -> 4096-dim video descriptor]
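
The clip splitting, averaging and L2 normalization described on the feature vector slides can be sketched as follows. The helper names and the random stand-in features are assumptions for illustration; in practice the per-clip vectors would come from the 3D ConvNet.

```python
import numpy as np

def split_into_clips(num_frames, clip_len=16, overlap=8):
    """Return (start, end) frame indices of clips of clip_len frames with the given overlap."""
    stride = clip_len - overlap
    return [(s, s + clip_len) for s in range(0, num_frames - clip_len + 1, stride)]

def video_descriptor(clip_features):
    """Average the per-clip features and L2-normalize the result."""
    mean = np.mean(clip_features, axis=0)
    return mean / np.linalg.norm(mean)

clips = split_into_clips(40)             # [(0, 16), (8, 24), (16, 32), (24, 40)]

# Stand-in for the 4096-dim activations a 3D ConvNet would output for each clip.
rng = np.random.default_rng(0)
clip_features = rng.random((len(clips), 4096))
descriptor = video_descriptor(clip_features)
```

The resulting descriptor has unit length regardless of how many clips the video contains, which makes videos of different durations comparable.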

Recognition C3D Visualization

Based on Deconvnets by Zeiler and Fergus [ECCV 2014]. See [ReadCV Slides] for more details.

Recognition C3D Compactness

Recognition C3D Performance

Convolutional 3D (C3D) features combined with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with state-of-the-art methods on 2 other benchmarks.

Recognition C3D Software

Implementation by Michael Gygli (GitHub)

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

Kai Kang et al., "Object Detection in Videos with TubeLets and Multi-Context Cues" (ILSVRC 2015) [video] [poster]

Optical Flow

Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE.

Optical Flow Small vs Large

Optical Flow

Classic approach: Rigid matching of HoG or SIFT descriptors.

Deep Matching: Allow each subpatch to move independently in a limited range depending on its size.

Optical Flow Deep Matching

Optical Flow 2D correlation

[Figure: Image; Sub-Image; Offset of the sub-image with respect to the image [0, 0]]

Source: Matlab R2015b documentation for normxcorr2, by MathWorks

Optical Flow Deep Matching

Instead of pre-trained filters, a convolution is defined between each patch of the reference image and the target image. As a result, a correlation map is generated for each reference patch.
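
A minimal sketch of one such correlation map, assuming plain normalized cross-correlation on raw pixel values (in the spirit of the normxcorr2 illustration cited for the 2D correlation slide). The paper's actual patch descriptors are not reproduced here, and the function name is hypothetical.

```python
import numpy as np

def correlation_map(patch, target):
    """Normalized correlation of one reference patch against every position of the target image."""
    ph, pw = patch.shape
    H, W = target.shape
    p = patch - patch.mean()
    p_norm = np.linalg.norm(p)
    out = np.zeros((H - ph + 1, W - pw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            window = target[y:y + ph, x:x + pw]
            w = window - window.mean()
            denom = p_norm * np.linalg.norm(w)
            out[y, x] = np.sum(p * w) / denom if denom > 0 else 0.0
    return out

rng = np.random.default_rng(0)
target = rng.random((12, 12))
patch = target[5:9, 3:7].copy()          # 4x4 patch cut out at row 5, column 3
cmap = correlation_map(patch, target)
best = np.unravel_index(np.argmax(cmap), cmap.shape)   # location of the strongest response
```

Here the reference patch plays the role of the filter, so one map is produced per reference patch and its peak marks the most likely position of that patch in the target image.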

Optical Flow Deep Matching

[Figure: the most discriminative response map; the least discriminative response map]

Optical Flow Deep Matching

Key idea: Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search.

[Figure: pyramid of 4x4, 8x8, 16x16 and 32x32 patches; bottom-up extraction (BU) and top-down matching (TD)]

Optical Flow Deep Matching (BU)

Optical Flow Deep Matching (TD)
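
One bottom-up step of the pyramid can be sketched as below, under simplifying assumptions: each of the four child patches is allowed to move by one pixel (a 3x3 max filter), and the parent response is the average of the children's relaxed responses at their expected offsets. The function names are hypothetical, and the paper's sub-sampling and non-linearity are left out.

```python
import numpy as np

def max_filter3(m):
    """3x3 max filter with edge padding: lets a child patch move by +-1 pixel."""
    padded = np.pad(m, 1, mode="edge")
    H, W = m.shape
    shifted = [padded[dy:dy + H, dx:dx + W] for dy in range(3) for dx in range(3)]
    return np.max(shifted, axis=0)

def aggregate(children, n):
    """Combine the correlation maps of four n x n quadrants into the map of their 2n x 2n parent.

    children: maps ordered top-left, top-right, bottom-left, bottom-right,
              each indexed by the top-left corner of the child in the target image.
    """
    offsets = [(0, 0), (0, n), (n, 0), (n, n)]
    H, W = children[0].shape
    out = np.zeros((H - n, W - n))
    for child, (dy, dx) in zip(children, offsets):
        relaxed = max_filter3(child)
        out += relaxed[dy:dy + H - n, dx:dx + W - n]
    return out / 4.0

# Toy example: a parent patch located at (2, 3) whose last quadrant is displaced by one pixel.
n = 4
peaks = [(2, 3), (2, 7), (6, 3), (7, 8)]
children = []
for (py, px) in peaks:
    m = np.zeros((10, 10))
    m[py, px] = 1.0
    children.append(m)
parent = aggregate(children, n)
```

Even though the fourth quadrant is off by one pixel, the parent map still reaches its maximum response at (2, 3), which is the tolerance to non-rigid deformation that the slides attribute to Deep Matching.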

Optical Flow Deep Matching (TD)

Each local maximum in the top layer corresponds to a shift of one of the biggest (32x32) patches. If we focus on a local maximum, we can retrieve the corresponding responses one scale below and focus on the shifts of the sub-patches that generated it.

Optical Flow Deep Matching

[Figure: Ground truth; Dense HOG [Brox & Malik 2011]; Deep Matching]

Optical Flow

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D., and Brox, T. (2015). FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).

Optical Flow FlowNet

End-to-end supervised learning of optical flow.

Optical Flow FlowNet (contracting)

Option A: Stack both input images together and feed them through a generic network.

Option B: Create two separate yet identical processing streams for the two images and combine them at a later stage.

Correlation layer: Convolution of data patches from the layers to combine.
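
A minimal sketch of a correlation layer, assuming a patch size of one feature vector and a small maximum displacement. The function name and the toy sizes are illustrative assumptions rather than FlowNet's actual configuration.

```python
import numpy as np

def correlation_layer(f1, f2, max_disp=2):
    """Correlate each feature vector of f1 with the vectors of f2 in a (2*max_disp+1)^2 neighbourhood.

    f1, f2: feature maps of shape (C, H, W) from the two processing streams.
    Returns an array of shape ((2*max_disp+1)**2, H, W); out-of-image positions contribute zeros.
    """
    C, H, W = f1.shape
    d = max_disp
    f2_padded = np.pad(f2, ((0, 0), (d, d), (d, d)))
    out = []
    for dy in range(-d, d + 1):
        for dx in range(-d, d + 1):
            # shifted[:, y, x] equals f2[:, y + dy, x + dx] (zero outside the image)
            shifted = f2_padded[:, d + dy:d + dy + H, d + dx:d + dx + W]
            out.append(np.sum(f1 * shifted, axis=0))
    return np.stack(out)

rng = np.random.default_rng(0)
f1 = rng.random((8, 6, 6))
f1 /= np.linalg.norm(f1, axis=0, keepdims=True)   # unit-length feature vectors
f2 = np.roll(f1, 1, axis=1)                       # second map: content moved down by one row
corr = correlation_layer(f1, f2)
```

Each output channel corresponds to one candidate displacement, so the channel with the strongest response at a location hints at the motion there. Unlike Option A, no filters are learned in this step: the layer only compares the two streams.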

Optical Flow FlowNet (expanding)

Upconvolutional layers: Unpooling of feature maps + convolution. Upconvolved feature maps are concatenated with the corresponding map from the contracting part.
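
The unpooling, convolution and concatenation steps can be sketched as follows. Nearest-neighbour unpooling, the 3x3 kernel size, random weights and all function names are assumptions made for illustration; the trained FlowNet layers differ.

```python
import numpy as np

def unpool2x(x):
    """Nearest-neighbour 2x unpooling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def conv3x3_same(x, weights):
    """3x3 'same' convolution. x: (C_in, H, W); weights: (C_out, C_in, 3, 3)."""
    C_in, H, W = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((weights.shape[0], H, W))
    for dy in range(3):
        for dx in range(3):
            window = padded[:, dy:dy + H, dx:dx + W]
            out += np.einsum("oc,chw->ohw", weights[:, :, dy, dx], window)
    return out

def upconv_block(coarse, skip, weights):
    """Unpool + convolve the coarse map, then concatenate the contracting-part map."""
    up = conv3x3_same(unpool2x(coarse), weights)
    return np.concatenate([up, skip], axis=0)

rng = np.random.default_rng(0)
coarse = rng.random((8, 4, 4))           # low-resolution features from the expanding part
skip = rng.random((4, 8, 8))             # matching-resolution features from the contracting part
weights = rng.random((6, 8, 3, 3))       # hypothetical filters (learned in a real network)
refined = upconv_block(coarse, skip, weights)   # shape (6 + 4, 8, 8)
```

The concatenation is what lets the expanding part combine coarse, high-level information with the fine local detail preserved in the earlier layers.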

Optical Flow FlowNet

Since existing ground truth datasets are not sufficiently large to train a ConvNet, a synthetic Flying Chairs dataset is generated... and augmented (translation, rotation and scaling transformations, additive Gaussian noise, and changes in brightness, contrast, gamma and color).

ConvNets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI.

Data augmentation
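
As an illustration of the photometric part of such augmentation, the sketch below applies one random brightness, contrast and gamma change to both frames of a pair and adds independent Gaussian noise. The parameter ranges and the function name are assumptions, not the values used by the authors. Geometric transformations are omitted because they would also require transforming the ground-truth flow.

```python
import numpy as np

def photometric_augment(img1, img2, rng):
    """Apply the same random brightness, contrast and gamma change to both frames, plus noise.

    Images are float arrays in [0, 1]. The parameter ranges are illustrative only.
    """
    brightness = rng.uniform(-0.1, 0.1)
    contrast = rng.uniform(0.8, 1.2)
    gamma = rng.uniform(0.7, 1.5)
    out = []
    for img in (img1, img2):
        x = np.clip((img - 0.5) * contrast + 0.5 + brightness, 0.0, 1.0)
        x = x ** gamma
        x = x + rng.normal(0.0, 0.02, size=img.shape)   # additive Gaussian noise
        out.append(np.clip(x, 0.0, 1.0))
    return out

rng = np.random.default_rng(0)
frame1 = rng.random((16, 16, 3))
frame2 = rng.random((16, 16, 3))
aug1, aug2 = photometric_augment(frame1, frame2, rng)
```

Because only pixel intensities change, the ground-truth flow field of the pair can be reused unchanged for the augmented sample.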

Object tracking MDNet

Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015).

Object tracking MDNet Architecture

Domain-specific layers are used during training for each sequence, but are replaced by a single one at test time.

Object tracking MDNet Online update

MDNet is updated online at test time with hard negative mining, that is, selecting the negative samples with the highest positive scores.
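
The selection step of hard negative mining can be sketched in a few lines. The scores and labels below are made-up toy values and the function name is hypothetical; in a tracker the scores would come from the network's positive (target) output.

```python
import numpy as np

def hard_negatives(positive_scores, is_negative, k):
    """Return the indices of the k negative samples with the highest positive score."""
    neg_idx = np.flatnonzero(is_negative)
    order = np.argsort(positive_scores[neg_idx])[::-1]   # highest score first
    return neg_idx[order[:k]]

# Toy scores for six candidate windows; True marks windows labelled as background.
scores = np.array([0.9, 0.2, 0.7, 0.1, 0.6, 0.8])
is_negative = np.array([False, True, True, True, True, False])
selected = hard_negatives(scores, is_negative, k=2)      # indices 2 and 4
```

Training on these confusing background samples, rather than on all negatives, concentrates each online update on the distractors the tracker is most likely to drift to.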

Object tracking FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Focus on conv4-3 and conv5-3 of a VGG-16 network pre-trained for ImageNet image classification.

[Figure: conv4-3; conv5-3]

Object tracking FCNT Specialization

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence.

Object tracking FCNT Localization

Page 6: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

Outline

1 Recognition2 Optical Flow3 Object Tracking4 Learn more

6

Recognition

Demo Clarifai

MIT Technology Review ldquoA start-uprsquos Neural Network Can Understand Videordquo (322015)7

Figure Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE

8

Recognition

9

Recognition

Figure Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

10

Recognition

Figure Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Previous lectures with Jose M Aacutelvarez

11

Recognition

Figure Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE

Slides extracted from ReadCV seminar by Victor Campos 12

Recognition DeepVideo

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE 13

Recognition DeepVideo Demo

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE 14

Recognition DeepVideo Architectures

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE 15

Unsupervised learning [Le at alrsquo11] Supervised learning [Karpathy et alrsquo14]

Recognition DeepVideo Features

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE 16

Recognition DeepVideo Multiscale

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE 17

Recognition DeepVideo Results

18

Recognition

Figure Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

19

Recognition C3D

Recognition C3D Demo

K. Simonyan, A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. ICLR 2015.

Recognition C3D Spatial dimension

Spatial dimensions (XY) of the kernels are fixed to 3x3, following Simonyan & Zisserman (ICLR 2015).

Recognition C3D Temporal dimension

3D ConvNets are more suitable for spatiotemporal feature learning than 2D ConvNets.

Temporal depth

2D ConvNets

A homogeneous architecture with small 3x3x3 convolution kernels in all layers is among the best performing architectures for 3D ConvNets.
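To make the 3x3x3 kernel concrete, here is a minimal pure-Python sketch of a single-channel 3D filter sliding over time, height and width. The function name conv3d_valid and the toy sizes are illustrative assumptions, not code from C3D; the sketch uses no padding (so the output shrinks) and, like most ConvNet libraries, it computes a cross-correlation.

```python
def conv3d_valid(volume, kernel):
    """Naive 'valid' 3D cross-correlation.

    volume and kernel are nested lists indexed [t][y][x].
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # The kernel spans several frames, so each output value
                # mixes spatial and temporal neighbours.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += volume[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out

# Toy clip: 4 frames of 5x5 pixels, all ones; toy 3x3x3 kernel of ones.
clip = [[[1.0] * 5 for _ in range(5)] for _ in range(4)]
kernel = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
features = conv3d_valid(clip, kernel)
out_shape = (len(features), len(features[0]), len(features[0][0]))
```

A 2D filter applied to the same clip would collapse the temporal axis after the first layer; the 3D filter keeps a (shorter) temporal axis in its output, which is the property the slide refers to.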

Recognition C3D Temporal dimension

No gain when varying the temporal depth across layers

Recognition C3D Temporal dimension

No gain when varying the temporal depth across layers

Recognition C3D Architecture

Feature vector

Recognition C3D Feature vector

Video sequence

16 frames-long clips

8 frames-long overlap

Recognition C3D Feature vector

[Figure: four 16-frame clips → Average → 4096-dim video descriptor → L2 norm → 4096-dim video descriptor]
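The two "Feature vector" slides describe a simple pipeline: cut the video into 16-frame clips with an 8-frame overlap, extract one feature vector per clip, average the vectors and L2-normalise the result. A minimal sketch of that bookkeeping follows; the function names are hypothetical and the 2-dim toy vectors stand in for the 4096-dim clip features.

```python
import math

def clip_ranges(num_frames, clip_len=16, overlap=8):
    """Return [start, end) frame ranges of the overlapping clips."""
    stride = clip_len - overlap
    return [(s, s + clip_len)
            for s in range(0, num_frames - clip_len + 1, stride)]

def video_descriptor(clip_features):
    """Average the per-clip vectors, then L2-normalise the mean."""
    n = len(clip_features)
    dim = len(clip_features[0])
    mean = [sum(f[i] for f in clip_features) / n for i in range(dim)]
    norm = math.sqrt(sum(v * v for v in mean))
    return [v / norm for v in mean] if norm > 0 else mean

# A 40-frame video yields four overlapping clips.
ranges = clip_ranges(40)

# Toy 2-dim clip features (assumption: real ones would be 4096-dim).
descriptor = video_descriptor([[3.0, 0.0], [3.0, 8.0]])
```

Frames at the end of the video that do not fill a whole clip are simply dropped in this sketch; padding the last clip would be an equally reasonable choice.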

Recognition C3D Visualization

Based on Deconvnets by Zeiler and Fergus [ECCV 2014]. See [ReadCV Slides] for more details.

Recognition C3D Compactness

Convolutional 3D (C3D) features combined with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with state-of-the-art methods on the other 2 benchmarks.

Recognition C3D Performance

Recognition C3D Software

Implementation by Michael Gygli (GitHub)

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

Kai Kang et al. Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

Optical Flow

Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE.

Optical Flow Small vs Large

Optical Flow

Classic approach: rigid matching of HoG or SIFT descriptors.

Deep Matching: allow each subpatch to move independently in a limited range depending on its size.

Optical Flow Deep Matching

Source: Matlab R2015b documentation for normxcorr2 by MathWorks

Optical Flow 2D correlation

Image

Sub-Image

Offset of the sub-image with respect to the image [00]
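As a rough illustration of the 2D correlation idea on this slide, the sketch below computes a zero-mean normalised cross-correlation between a small template and every position of an image where the template fits entirely, then reads the offset from the best score. It is a simplified stand-in written for this example (hypothetical names ncc_map and best_offset), not a reimplementation of normxcorr2, which also scores partially overlapping positions.

```python
import math

def ncc_map(image, template):
    """Zero-mean normalised cross-correlation at every 'valid' offset."""
    H, W = len(image), len(image[0])
    h, w = len(template), len(template[0])
    t_vals = [v for row in template for v in row]
    t_mean = sum(t_vals) / len(t_vals)
    t_dev = [v - t_mean for v in t_vals]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    scores = []
    for y in range(H - h + 1):
        row = []
        for x in range(W - w + 1):
            win = [image[y + dy][x + dx] for dy in range(h) for dx in range(w)]
            w_mean = sum(win) / len(win)
            w_dev = [v - w_mean for v in win]
            w_norm = math.sqrt(sum(d * d for d in w_dev))
            denom = t_norm * w_norm
            # Flat windows have no defined correlation; score them as 0.
            score = sum(a * b for a, b in zip(t_dev, w_dev)) / denom if denom > 0 else 0.0
            row.append(score)
        scores.append(row)
    return scores

def best_offset(scores):
    """(row, col) of the highest correlation score."""
    positions = [(y, x) for y in range(len(scores)) for x in range(len(scores[0]))]
    return max(positions, key=lambda p: scores[p[0]][p[1]])

image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 1, 0],
    [0, 0, 2, 7, 0],
    [0, 0, 0, 0, 0],
]
template = [[9, 1], [2, 7]]  # the sub-image, cut from rows 1-2, columns 2-3
scores = ncc_map(image, template)
offset = best_offset(scores)
```

Here offset is the (row, column) position of the sub-image inside the image, and the score at that position is 1 up to rounding because the window matches the template exactly.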

Instead of pre-trained filters, a convolution is defined between each patch of the reference image and the target image. As a result, a correlation map is generated for each reference patch.

Optical Flow Deep Matching

Optical Flow Deep Matching

The most discriminative response map

The least discriminative response map

Key idea: build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search.

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching (TD)

Bottom-up extraction (BU)

Key idea: build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search.

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Bottom-up extraction (BU)
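One level of the bottom-up step can be sketched as follows: the correlation map of a parent patch is the average of its four children's maps, where each child is allowed to score its best response within a small neighbourhood of its expected position. This is a simplified illustration under stated assumptions: the names are hypothetical, correlation values are assumed non-negative, and the subsampling and rectification steps of the DeepFlow paper are omitted.

```python
def peak_map(size, peak):
    """size x size map of zeros with a single 1.0 at peak = (row, col)."""
    m = [[0.0] * size for _ in range(size)]
    m[peak[0]][peak[1]] = 1.0
    return m

def aggregate(child_maps, child_offsets, radius=1):
    """Parent map = mean over children of the local max of each child map
    around the position where that child is expected."""
    H, W = len(child_maps[0]), len(child_maps[0][0])
    parent = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            total = 0.0
            for cmap, (oy, ox) in zip(child_maps, child_offsets):
                cy, cx = y + oy, x + ox
                best = 0.0  # positions outside the map contribute 0
                for qy in range(max(0, cy - radius), min(H, cy + radius + 1)):
                    for qx in range(max(0, cx - radius), min(W, cx + radius + 1)):
                        best = max(best, cmap[qy][qx])
                total += best
            parent[y][x] = total / len(child_maps)
    return parent

# Expected child positions relative to the parent centre (toy units).
offsets = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
# Three children match exactly where expected for a parent at (4, 4);
# the fourth has drifted one cell to the right.
children = [peak_map(8, (3, 3)), peak_map(8, (3, 5)),
            peak_map(8, (5, 3)), peak_map(8, (5, 6))]

tolerant = aggregate(children, offsets, radius=1)
rigid = aggregate(children, offsets, radius=0)
```

With radius=1 the drifted child still contributes its full response at (4, 4); with radius=0 (rigid matching) it contributes nothing there, which is the difference between Deep Matching and the classic approach.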

Optical Flow Deep Matching (BU)

Key idea: build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search.

Optical Flow Deep Matching (TD)

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching (TD)

Optical Flow Deep Matching (TD)

Each local maximum in the top layer corresponds to a shift of one of the biggest (32x32) patches.

If we focus on a local maximum, we can retrieve the corresponding responses one scale below and focus on the shift of the sub-patches that generated it.
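The top-down step can be sketched as a backtracking routine: given the position of a maximum in a parent map, look one scale below and, for each child, pick the position inside its allowed neighbourhood that produced the strongest response. The names and toy data below are assumptions made for illustration; a full implementation would recurse down to the smallest patches.

```python
def peak_map(size, peak):
    """size x size map of zeros with a single 1.0 at peak = (row, col)."""
    m = [[0.0] * size for _ in range(size)]
    m[peak[0]][peak[1]] = 1.0
    return m

def backtrack(child_maps, child_offsets, parent_pos, radius=1):
    """For a parent match at parent_pos, return where each child matched:
    the argmax of its map within `radius` of its expected position
    (None if that neighbourhood lies completely outside the map)."""
    H, W = len(child_maps[0]), len(child_maps[0][0])
    py, px = parent_pos
    matches = []
    for cmap, (oy, ox) in zip(child_maps, child_offsets):
        cy, cx = py + oy, px + ox
        candidates = [(qy, qx)
                      for qy in range(max(0, cy - radius), min(H, cy + radius + 1))
                      for qx in range(max(0, cx - radius), min(W, cx + radius + 1))]
        if candidates:
            matches.append(max(candidates, key=lambda q: cmap[q[0]][q[1]]))
        else:
            matches.append(None)
    return matches

offsets = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
children = [peak_map(8, (3, 3)), peak_map(8, (3, 5)),
            peak_map(8, (5, 3)), peak_map(8, (5, 6))]

matches = backtrack(children, offsets, parent_pos=(4, 4))
# Shift of each sub-patch relative to its expected (rigid) position.
shifts = [(m[0] - (4 + oy), m[1] - (4 + ox))
          for m, (oy, ox) in zip(matches, offsets)]
```

In this toy case three sub-patches stay in place and the fourth is recovered one cell to the right of its rigid position.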

Optical Flow Deep Matching (TD)

Optical Flow Deep Matching

Ground truth

Dense HOG [Brox & Malik 2011]

Deep Matching

Optical Flow Deep Matching

Optical Flow Deep Matching

Optical Flow

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D., and Brox, T. 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).

Optical Flow FlowNet

Optical Flow FlowNet

End-to-end supervised learning of optical flow

Optical Flow FlowNet (contracting)

Option A: Stack both input images together and feed them through a generic network.

Optical Flow FlowNet (contracting)

Option B: Create two separate yet identical processing streams for the two images and combine them at a later stage.

Optical Flow FlowNet (contracting)

Option B: Create two separate yet identical processing streams for the two images and combine them at a later stage.

Correlation layer: convolution of data patches from the layers to combine.
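A minimal sketch of what such a correlation layer computes, assuming the simplest case of 1x1 patches: for every position in the first feature map, take the dot product of its feature vector with the feature vectors of the second map at every displacement up to a maximum. The function name, the zero padding and the toy sizes are assumptions for illustration, not FlowNet's actual implementation.

```python
def correlation_layer(f1, f2, max_disp=1):
    """f1, f2: feature maps indexed [y][x][channel].

    Returns out[y][x] = list of dot products <f1(y, x), f2(y+dy, x+dx)>
    for all |dy|, |dx| <= max_disp, in row-major order of (dy, dx).
    """
    H, W = len(f1), len(f1[0])
    out = []
    for y in range(H):
        row = []
        for x in range(W):
            scores = []
            for dy in range(-max_disp, max_disp + 1):
                for dx in range(-max_disp, max_disp + 1):
                    y2, x2 = y + dy, x + dx
                    if 0 <= y2 < H and 0 <= x2 < W:
                        scores.append(sum(a * b for a, b in zip(f1[y][x], f2[y2][x2])))
                    else:
                        scores.append(0.0)  # zero padding outside the map
            row.append(scores)
        out.append(row)
    return out

# Toy 3x3 maps with 2 channels: one distinctive feature that moves
# one cell to the right between the two maps.
size = 3
f1 = [[[0.0, 0.0] for _ in range(size)] for _ in range(size)]
f2 = [[[0.0, 0.0] for _ in range(size)] for _ in range(size)]
f1[1][1] = [1.0, 2.0]
f2[1][2] = [1.0, 2.0]

corr = correlation_layer(f1, f2, max_disp=1)
best = max(range(9), key=lambda i: corr[1][1][i])
displacement = (best // 3 - 1, best % 3 - 1)  # (dy, dx) of the best match
```

Unlike an ordinary convolution, this layer has no trainable weights: one stream's data acts as the filter for the other's, which is why the output has one channel per candidate displacement.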

Optical Flow FlowNet (expanding)

Upconvolutional layers: unpooling of feature maps + convolution.

Upconvolved feature maps are concatenated with the corresponding map from the contractive part.
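The two operations named here can be sketched on a single feature map: enlarge the coarse map (unpooling), filter it (convolution), and stack the result with the same-resolution map saved from the contracting part. The nearest-neighbour unpooling, the identity kernel and all names below are assumptions chosen to keep the example small; in a trained network the kernel would be learned.

```python
def unpool2x(fmap):
    """Nearest-neighbour 2x unpooling of one feature map indexed [y][x]."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def conv3x3_same(fmap, kernel):
    """3x3 cross-correlation with zero padding (output size = input size)."""
    H, W = len(fmap), len(fmap[0])
    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    yy, xx = y + ky - 1, x + kx - 1
                    if 0 <= yy < H and 0 <= xx < W:
                        acc += fmap[yy][xx] * kernel[ky][kx]
            out[y][x] = acc
    return out

coarse = [[1.0, 2.0], [3.0, 4.0]]              # 2x2 map from the deeper layer
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]   # placeholder for learned weights
upconv = conv3x3_same(unpool2x(coarse), identity)

skip = [[0.5] * 4 for _ in range(4)]           # 4x4 map from the contracting part
merged = [upconv, skip]                        # channel-wise concatenation
```

The concatenation lets the next layer combine coarse, high-level information with the finer detail preserved in the earlier layer.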

Optical Flow FlowNet

Since existing ground truth datasets are not sufficiently large to train a ConvNet, a synthetic Flying Chairs dataset is generated and augmented (translation, rotation and scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma and color).

ConvNets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI.
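The photometric part of such an augmentation can be sketched in a few lines. The parameter ranges below are arbitrary assumptions, not the values used for FlowNet, and the geometric transformations are left out because they would also require transforming the ground-truth flow consistently.

```python
import random

def augment(image, rng):
    """Photometric augmentation of a grey image with values in [0, 1].

    rng is a random.Random instance, so results are reproducible.
    """
    brightness = rng.uniform(-0.2, 0.2)   # additive change
    contrast = rng.uniform(0.8, 1.2)      # scaling around mid-grey
    gamma = rng.uniform(0.7, 1.5)
    out = []
    for row in image:
        new_row = []
        for v in row:
            v = (v - 0.5) * contrast + 0.5 + brightness
            v = min(1.0, max(0.0, v)) ** gamma
            v += rng.gauss(0.0, 0.02)     # additive Gaussian noise
            new_row.append(min(1.0, max(0.0, v)))
        out.append(new_row)
    return out

image = [[0.0, 0.25, 0.5], [0.75, 1.0, 0.5]]
aug_a = augment(image, random.Random(0))
aug_b = augment(image, random.Random(0))
```

Brightness, contrast and gamma are drawn once per image, while the noise is drawn per pixel; when augmenting an image pair for flow estimation, the same per-image parameters would typically be shared by both frames.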

Data augmentation

Optical Flow FlowNet

Object tracking MDNet

Nam, Hyeonseob, and Bohyung Han. Learning multi-domain convolutional neural networks for visual tracking. ICCV VOT Workshop (2015).

Object tracking MDNet

Object tracking MDNet Architecture

Domain-specific layers are used during training for each sequence, but are replaced by a single one at test time.

Object tracking MDNet Online update

MDNet is updated online at test time with hard negative mining, that is, selecting the negative samples with the highest positive score.
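The selection step of hard negative mining is short enough to sketch directly: score every negative candidate with the current model and keep the ones it most confidently mistakes for the target. The sample names and scores below are made up for illustration, and the function is a hypothetical helper rather than MDNet code.

```python
def hard_negatives(neg_samples, positive_score, k):
    """Keep the k negatives that the current model scores as most positive."""
    ranked = sorted(neg_samples, key=positive_score, reverse=True)
    return ranked[:k]

# Toy positive scores assigned by the current model to background patches.
scores = {"bg_a": 0.05, "bg_b": 0.91, "bg_c": 0.40, "bg_d": 0.77}
hard = hard_negatives(list(scores), scores.get, k=2)
```

Training on these hard examples rather than on random background patches focuses each online update on the distractors that are most likely to cause drift.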

Object tracking FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. Visual Tracking with Fully Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Object tracking FCNT

Focus on conv4-3 and conv5-3 of VGG-16 network pre-trained for ImageNet image classification

conv4-3 conv5-3

Object tracking FCNT Specialization

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence

Object tracking FCNT Localization

Although trained for image classification, feature maps in conv5-3 enable object localization... but are not discriminative enough to distinguish different objects of the same category.

Object tracking Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. “Object detectors emerge in deep scene CNNs.” ICLR 2015. [Slides from ReadCV]

Object tracking FCNT Localization

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation...

conv4-3 conv5-3

Object tracking FCNT Architecture

SNet=Specific Network (online update)

GNet=General Network (fixed)

Object tracking FCNT Results

ConvNets Software

Caffe http://caffe.berkeleyvision.org

Torch (Overfeat) http://torch.ch

Theano http://deeplearning.net/software/theano

TensorFlow https://www.tensorflow.org

MatConvNet (VLFeat) http://www.vlfeat.org/matconvnet

CNTK (Microsoft) http://www.cntk.ai

Seminar Series Compacting ConvNets

for End to End Learning

Tuesday February 2 4pm

D5-010 Campus Nord

ConvNets Learn more

Jose M. Álvarez

Stanford course CS231n

Convolutional Neural Networks for Visual

Recognition

ConvNets Learn more

ConvNets Learn more

Online course

Deep Learning

Taking machine learning to the next

level

ReadCV seminar

Friendly reviews of SoA papers

Spring 2016

Tuesdays at 11am

ConvNets Learn more

Barcelona Convolucionada

Deep Learning a l’abast de tothom

Monday February 1 7pm FIB Campus Nord UPC

ConvNets Learn more

Grup d’estudi de machine learning Barcelona

Summer course

Deep Learning for Computer Vision

(2.5 ECTS for MSc & PhD)

July 4-8 3-7pm

ConvNets Learn more

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, “Deep Learning: Past, Present & Future”

ConvNets Learn more

ConvNets Learn more

“Machine learning” sub-Reddit

ConvNets Learn more

ConvNets Learn more

Check profile requirements for summer internships (disclaimer: offered to PhD students by default)

Company      Avg. salary/hour    Avg. salary/month
Yahoo        $43                 ($43 x 160 = $6880)
Apple        $37                 ($37 x 160 = $5920)
Google       $29.54-$31.32       $7151
Facebook     $22.92              $6150-$7378
Microsoft    $22.63              $6506-$7171

Source: Glassdoor.com (internships in California; no stipends included)

ConvNets Learn more

Video: Cristian Canton’s talk “From Catalonia to America: notes on how to achieve a successful post-PhD career”, ACMCV 2015 & UPC

Li Fei-Fei, “How we’re teaching computers to understand pictures”, TEDTalks 2014

ConvNets Learn more

Jeremy Howard, “The wonderful and terrifying implications of computers that can learn”, TEDTalks 2014

ConvNets Learn more

ConvNets Learn more

Neil Lawrence, “OpenAI won’t benefit humanity without open data sharing” (The Guardian, 14/12/2015)

Is Computer Vision solved?

ConvNets Discussion

Sports: Do you know them?

ConvNets: Do you know them?

Antonio Torralba, MIT (former UPC)

and MANY MORE I am missing in the page (apologies)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

ConvNets Where you are studying

VisioCat dinner CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution    Last deadlines (on 28/1/2016)
FI (Catalonia)         22/09/2015
FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web

Image Classification

Our past research

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., “Cultural Event Recognition with Visual ConvNets and Temporal Models”, in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

ChaLearn Workshop

Saliency Prediction

J. Pan and Giró-i-Nieto, X., “End-to-end Convolutional Network for Saliency Prediction”, in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

Our current research

LSUN Challenge

Sentiment Analysis

Our current research

[Slides]

CNN

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., “Diving Deep into Sentiment Understanding: Fine-tuned CNNs for Visual Sentiment Prediction”, in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015.

Our current research

Instance Search in Video

V.-T. Nguyen, -Dinh-Le, D., Salvador, A., -Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S. ’ichi, and Giró-i-Nieto, X., “NII-HITACHI-UIT at TRECVID 2015 Instance Search”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S. ’ichi, O’Connor, N., and Smeaton, A. F., “Insight DCU at TRECVID 2015”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

Page 7: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

Recognition

Demo Clarifai

MIT Technology Review ldquoA start-uprsquos Neural Network Can Understand Videordquo (322015)7

Figure Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE

8

Recognition

9

Recognition

Figure Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

10

Recognition

Figure Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Previous lectures with Jose M Aacutelvarez

11

Recognition

Figure Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE

Slides extracted from ReadCV seminar by Victor Campos 12

Recognition DeepVideo

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE 13

Recognition DeepVideo Demo

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE 14

Recognition DeepVideo Architectures

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE 15

Unsupervised learning [Le at alrsquo11] Supervised learning [Karpathy et alrsquo14]

Recognition DeepVideo Features

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE 16

Recognition DeepVideo Multiscale

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE 17

Recognition DeepVideo Results

18

Recognition

Figure Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

19

Recognition C3D

Figure Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

20Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Demo

21K Simonyan A Zisserman Very Deep Convolutional Networks for Large-Scale Image Recognition ICLR 2015

Recognition C3D Spatial dimensionSpatial dimensions (XY) of the used kernels are fixed to 3x3 following Symonian amp Zisserman (ICLR 2015)

22Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Temporal dimension3D ConvNets are more suitable for spatiotemporal feature learning compared to 2D ConvNets

Temporal depth

2D ConvNets

23Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

A homogeneous architecture with small 3 times 3 times 3 convolution kernels in all layers is among the best performing architectures for 3D ConvNets

Recognition C3D Temporal dimension

24Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

No gain when varying the temporal depth across layers

Recognition C3D Temporal dimension

25Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

No gain when varying the temporal depth across layers

Recognition C3D Architecture

Featurevector

26Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Feature vector

Video sequence

16 frames-long clips

8 frames-long overlap

27Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Feature vector

16-frame clip

16-frame clip

16-frame clip

16-frame clip

Average

4096

-dim

vid

eo d

escr

ipto

r

4096

-dim

vid

eo d

escr

ipto

r

L2 norm

28Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D VisualizationBased on Deconvnets by Zeiler and Fergus [ECCV 2014] - See [ReadCV Slides] for more details

29Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Compactness

30Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Convolutional 3D(C3D) combined with a simple linear classifier outperforms state-of-the-art methods on 4 different benchmarks and are comparable with state of the art methods on other 2 benchmarks

Recognition C3D Performance

31Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D SoftwareImplementation by Michael Gygli (GitHub)

32

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

33

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

34

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

35

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

36

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

37

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

38

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

39

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

Optical Flow

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 40

Optical Flow Small vs Large

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 41

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 42

Optical FlowClassic approachRigid matching of HoG or SIFT descriptors

Deep MatchingAllow each subpatch to move

independently in a limited range

depending on its size

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 43

Optical Flow Deep Matching

Source Matlab R2015b documentation for normxcorr2 by Mathworks44

Optical Flow 2D correlation

Image

Sub-Image

Offset of the sub-image with respect to the image [00]

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 45

Instead of pre-trained filters a convolution is defined between each

patch of the reference image target image

as a results a correlation map is generated for each reference patch

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 46

Optical Flow Deep Matching

The most discriminative response map

The less discriminative

response map

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 47

Key idea: build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search.

Optical Flow: Deep Matching

Pyramid levels: 4x4 patches, 8x8 patches, 16x16 patches, 32x32 patches

Top-down matching (TD)

Bottom-up extraction (BU)

Bottom-up extraction (BU): 4x4 patches, 8x8 patches, 16x16 patches, 32x32 patches

Optical Flow Deep Matching (BU)
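One bottom-up step of the pyramid can be sketched as follows: the response map of a parent patch is obtained from the maps of its four quadrants, each max-pooled so that a quadrant may move slightly on its own. This is a simplified sketch under stated assumptions: maps are indexed by the top-left corner of the patch in the target image, pooling is 3x3 with stride 1, and the subsampling and non-linear rectification of the original method are left out. The helper names (`max_pool3`, `shift`, `aggregate`, `peak_map`) are hypothetical.

```python
import numpy as np

def max_pool3(m):
    """3x3 max filter, stride 1, edge padding."""
    padded = np.pad(m, 1, mode="edge")
    h, w = m.shape
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.max(windows, axis=0)

def shift(m, dy, dx):
    """Return out with out[y, x] = m[y + dy, x + dx]; zero where that falls outside m."""
    h, w = m.shape
    out = np.zeros_like(m)
    ys = slice(max(0, -dy), min(h, h - dy))
    xs = slice(max(0, -dx), min(w, w - dx))
    out[ys, xs] = m[ys.start + dy:ys.stop + dy, xs.start + dx:xs.stop + dx]
    return out

def aggregate(children, size):
    """Average the pooled maps of the 4 quadrants (TL, TR, BL, BR), each `size` pixels wide."""
    offsets = [(0, 0), (0, size), (size, 0), (size, size)]
    pooled = [max_pool3(c) for c in children]
    return np.mean([shift(p, dy, dx) for p, (dy, dx) in zip(pooled, offsets)], axis=0)

def peak_map(pos, shape=(12, 12)):
    """Toy response map with a single perfect match at `pos`."""
    m = np.zeros(shape)
    m[pos] = 1.0
    return m

# Four 4x4 quadrants matching consistently with an 8x8 parent placed at (2, 2).
aligned = aggregate([peak_map((2, 2)), peak_map((2, 6)), peak_map((6, 2)), peak_map((6, 6))], 4)
# Top-right quadrant displaced by one pixel: still absorbed by the pooling.
deformed = aggregate([peak_map((2, 2)), peak_map((2, 7)), peak_map((6, 2)), peak_map((6, 6))], 4)
# Top-right quadrant displaced by three pixels: it no longer supports the parent.
far = aggregate([peak_map((2, 2)), peak_map((2, 9)), peak_map((6, 2)), peak_map((6, 6))], 4)
```

Repeating the step on the 8x8 maps would give 16x16 maps, and so on up to the 32x32 level.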

Optical Flow: Deep Matching (TD)

Top-down matching (TD), from the 32x32 patches down to the 16x16, 8x8 and 4x4 patches

Optical Flow: Deep Matching (TD)

Each local maximum in the top layer corresponds to a shift of one of the biggest (32x32) patches.

If we focus on a local maximum, we can retrieve the corresponding responses one scale below and focus on the shifts of the sub-patches that generated it.

Optical Flow: Deep Matching (TD)

Optical Flow: Deep Matching

Figure panels: Ground truth, Dense HOG [Brox & Malik 2011], Deep Matching

Optical Flow: Deep Matching

Optical Flow

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D., & Brox, T. (2015). FlowNet: Learning Optical Flow with Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).

Optical Flow: FlowNet

End-to-end supervised learning of optical flow

Optical Flow: FlowNet (contracting)

Option A: Stack both input images together and feed them through a generic network.
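Option A amounts to concatenating the two frames along the channel axis before the first convolution. A minimal NumPy sketch, where the 384x512 frame size and the variable names are assumptions made for illustration:

```python
import numpy as np

# Two consecutive RGB frames, stored as height x width x channels.
frame_a = np.zeros((384, 512, 3), dtype=np.float32)
frame_b = np.ones((384, 512, 3), dtype=np.float32)

# Stack along the channel axis: the network sees a single 6-channel input.
stacked = np.concatenate([frame_a, frame_b], axis=-1)
```

With this input the network has to discover by itself how to compare the two frames, which is what Option B tries to make easier.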

Optical Flow: FlowNet (contracting)

Option B: Create two separate, yet identical, processing streams for the two images and combine them at a later stage.

Optical Flow: FlowNet (contracting)

Correlation layer (Option B): rather than convolving data with a learned filter, it convolves patches of data from the two streams to be combined.
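A correlation layer of this kind can be sketched as a dot product between the feature vector at each position of the first stream and the feature vectors at displaced positions of the second stream. The sketch below assumes 1x1 patches, zero padding at the borders and a hypothetical `max_disp` parameter for the search range; it has no trainable weights.

```python
import numpy as np

def correlation_layer(f1, f2, max_disp):
    """f1, f2: feature maps of shape (H, W, C). Returns (H, W, (2 * max_disp + 1) ** 2)."""
    h, w, _ = f1.shape
    d = max_disp
    padded = np.pad(f2, ((d, d), (d, d), (0, 0)))   # zero padding around f2
    out = np.zeros((h, w, (2 * d + 1) ** 2), dtype=f1.dtype)
    k = 0
    for dy in range(-d, d + 1):
        for dx in range(-d, d + 1):
            # shifted[y, x] == f2[y + dy, x + dx] (zero outside the map)
            shifted = padded[d + dy:d + dy + h, d + dx:d + dx + w]
            out[:, :, k] = (f1 * shifted).sum(axis=-1)
            k += 1
    return out

rng = np.random.default_rng(0)
f1 = rng.normal(size=(6, 7, 4))
f1 /= np.linalg.norm(f1, axis=-1, keepdims=True)   # unit-length feature vectors
f2 = np.roll(f1, 1, axis=1)                         # same content moved one pixel to the right

corr = correlation_layer(f1, f2, max_disp=2)
best = int(np.argmax(corr[3, 3]))
dy, dx = divmod(best, 5)
dy, dx = dy - 2, dx - 2                             # decode channel index into a displacement
```

At an interior pixel the strongest channel decodes to the displacement (0, 1), the one-pixel shift applied to the second feature map.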

Optical Flow: FlowNet (expanding)

Upconvolutional layers: unpooling of feature maps followed by a convolution.

Upconvolved feature maps are concatenated with the corresponding feature map from the contracting part.
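The two steps of the expanding part can be sketched as unpool, convolve, then concatenate with the skip connection. To stay short, this sketch assumes nearest-neighbour unpooling and a 1x1 convolution (a per-pixel channel mix) with random weights; the learned upconvolution in the network uses larger kernels. All shapes and names are illustrative.

```python
import numpy as np

def unpool2x(x):
    """Nearest-neighbour unpooling: each value is repeated in a 2x2 block."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def conv1x1(x, weights):
    """1x1 convolution: mixes channels independently at every pixel."""
    return x @ weights            # (H, W, C_in) @ (C_in, C_out)

rng = np.random.default_rng(0)
coarse = rng.random((4, 5, 8))    # coarse feature map from the expanding part
skip = rng.random((8, 10, 16))    # matching-resolution map from the contracting part
weights = rng.normal(size=(8, 4))

up = conv1x1(unpool2x(coarse), weights)          # shape (8, 10, 4)
merged = np.concatenate([up, skip], axis=-1)     # shape (8, 10, 20)
```

The concatenation lets later layers combine coarse, high-level information with the finer detail kept in the contracting part.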

Optical Flow: FlowNet

Since existing ground-truth datasets are not sufficiently large to train a ConvNet, a synthetic Flying Chairs dataset is generated... and augmented (translation, rotation and scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma and color).

ConvNets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI.
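The photometric part of the augmentation listed above (noise, brightness, contrast, gamma) can be sketched as below. The parameter ranges are assumptions chosen for illustration, not the values used by the authors, and the geometric transformations are left out because they would also require transforming the ground-truth flow.

```python
import numpy as np

def augment(img, rng):
    """Photometric augmentation of a float image in [0, 1] with shape (H, W, 3)."""
    out = img + rng.normal(0.0, 0.02, img.shape)        # additive Gaussian noise
    out = out + rng.uniform(-0.1, 0.1)                  # brightness shift
    contrast = rng.uniform(0.8, 1.2)
    mean = out.mean()
    out = (out - mean) * contrast + mean                # contrast change around the mean
    out = np.clip(out, 0.0, 1.0)
    gamma = rng.uniform(0.7, 1.5)
    return out ** gamma                                 # gamma change keeps values in [0, 1]

rng = np.random.default_rng(0)
img = rng.random((8, 8, 3))
aug = augment(img, rng)
```

For a pair of frames, one would presumably draw similar parameters for both images so that the pair stays photometrically consistent.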

Data augmentation

Optical Flow: FlowNet

Object tracking: MDNet

Nam, Hyeonseob, and Bohyung Han. Learning multi-domain convolutional neural networks for visual tracking. ICCV VOT Workshop (2015).

Object tracking: MDNet Architecture

Domain-specific layers are used during training, one for each sequence, but are replaced by a single one at test time.
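The idea of shared layers plus one domain-specific layer per training sequence can be sketched with a toy model. Everything here is hypothetical (the class name, a single shared fully connected layer, random weights); it only illustrates how the per-sequence branches are dropped and replaced by a single new one at test time.

```python
import numpy as np

class MultiDomainNet:
    """Toy network: shared layer + one binary (target/background) head per domain."""

    def __init__(self, in_dim, hidden_dim, num_domains, rng):
        self.shared = rng.normal(0.0, 0.1, (in_dim, hidden_dim))
        self.heads = [rng.normal(0.0, 0.1, (hidden_dim, 2)) for _ in range(num_domains)]

    def features(self, x):
        return np.maximum(x @ self.shared, 0.0)         # shared representation (ReLU)

    def scores(self, x, domain):
        return self.features(x) @ self.heads[domain]    # domain-specific scores

    def start_tracking(self, rng):
        """Test time: keep the shared layer, replace all heads by a single new one."""
        hidden_dim = self.shared.shape[1]
        self.heads = [rng.normal(0.0, 0.1, (hidden_dim, 2))]

rng = np.random.default_rng(0)
net = MultiDomainNet(in_dim=16, hidden_dim=8, num_domains=5, rng=rng)
train_scores = net.scores(rng.random((3, 16)), domain=2)
shared_before = net.shared.copy()
heads_during_training = len(net.heads)
net.start_tracking(rng)
test_scores = net.scores(rng.random((3, 16)), domain=0)
```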

Object tracking: MDNet Online update

MDNet is updated online at test time with hard negative mining, that is, by selecting the negative samples with the highest positive score.
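Hard negative mining as described above reduces to ranking the negative samples by their positive (target) score and keeping the top ones. A minimal sketch, where the function name and the scores are made up for illustration:

```python
import numpy as np

def hard_negatives(positive_scores, k):
    """Indices of the k negative samples that the network scores most like the target."""
    order = np.argsort(positive_scores)[::-1]    # highest positive score first
    return order[:k]

# Positive scores the current network assigns to five background samples.
scores = np.array([0.05, 0.80, 0.10, 0.65, 0.30])
picked = hard_negatives(scores, k=2)
```

Samples 1 and 3 are selected: they are background patches that the network currently confuses with the target, so they are the most informative for the next update.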

Object tracking: FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. Visual Tracking with Fully Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Object tracking: FCNT

Focus on conv4-3 and conv5-3 of the VGG-16 network, pre-trained for ImageNet image classification.

conv4-3 conv5-3

Object tracking: FCNT Specialization

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence.

Object tracking: FCNT Localization

Although trained for image classification, feature maps in conv5-3 enable object localization... but they are not discriminative enough to tell apart different objects of the same category.

Object tracking: Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object detectors emerge in deep scene CNNs." ICLR 2015. [Slides from ReadCV]

Object tracking: FCNT Localization

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation...

conv4-3 conv5-3

Object tracking: FCNT Architecture

SNet = Specific Network (online update)

GNet = General Network (fixed)

Object tracking: FCNT Results

ConvNets: Software

Caffe: http://caffe.berkeleyvision.org

Torch (Overfeat): http://torch.ch

Theano: http://deeplearning.net/software/theano

TensorFlow: https://www.tensorflow.org

MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet

CNTK (Microsoft): http://www.cntk.ai

Seminar Series: Compacting ConvNets for End to End Learning

Tuesday, February 2, 4pm

D5-010, Campus Nord

ConvNets: Learn more

Jose M. Álvarez

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

ConvNets: Learn more

ConvNets: Learn more

Online course: Deep Learning. Taking machine learning to the next level

ReadCV seminar: friendly reviews of SoA papers

Spring 2016, Tuesdays at 11am

ConvNets: Learn more

Barcelona Convolucionada: Deep Learning a l'abast de tothom (Deep Learning within everyone's reach)

Monday, February 1, 7pm, FIB, Campus Nord UPC

ConvNets: Learn more

Grup d'estudi de machine learning Barcelona (Barcelona machine learning study group)

Summer course: Deep Learning for Computer Vision (2.5 ECTS for MSc & PhD)

July 4-8, 3-7pm

ConvNets: Learn more

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, "Deep Learning: Past, Present & Future"

ConvNets: Learn more

"Machine learning" sub-Reddit

ConvNets: Learn more

Check profile requirements for summer internships (disclaimer: offered to PhD students by default).

Company     Avg. salary / hour    Avg. salary / month
Yahoo       $43                   ($43 x 160 = $6,880)
Apple       $37                   ($37 x 160 = $5,920)
Google      $29.54-$31.32         $7,151
Facebook    $22.92                $6,150-$7,378
Microsoft   $22.63                $6,506-$7,171

Source: Glassdoor.com (internships in California. No stipends included)

ConvNets: Learn more

Video: Cristian Canton's talk "From Catalonia to America: notes on how to achieve a successful post-PhD career", ACMCV 2015 & UPC

Li Fei-Fei, "How we're teaching computers to understand pictures", TEDTalks 2014

ConvNets: Learn more

Jeremy Howard, "The wonderful and terrifying implications of computers that can learn", TEDTalks 2014

ConvNets: Learn more

Neil Lawrence, "OpenAI won't benefit humanity without open data sharing" (The Guardian, 14/12/2015)

Is Computer Vision solved?

ConvNets: Discussion

Sports: Do you know them?

ConvNets: Do you know them?

Antonio Torralba, MIT (former UPC)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

...and MANY MORE I am missing in the page (apologies)

ConvNets: Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding is available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution    Last deadlines (on 28/1/2016)
FI (Catalonia)         22/09/2015
FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web

Image Classification

Our past research

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., "Cultural Event Recognition with Visual ConvNets and Temporal Models", in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

ChaLearn Workshop

Saliency Prediction

J. Pan and Giró-i-Nieto, X., "End-to-end Convolutional Network for Saliency Prediction", in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

Our current research

LSUN Challenge

Sentiment Analysis

Our current research

[Slides]

CNN

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., "Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction", in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015.

Our current research

Instance Search in Video

V.-T. Nguyen, Dinh-Le, D., Salvador, A., Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S. 'ichi, and Giró-i-Nieto, X., "NII-HITACHI-UIT at TRECVID 2015 Instance Search", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S. 'ichi, O'Connor, N., and Smeaton, A. F., "Insight DCU at TRECVID 2015", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE.

Slides extracted from ReadCV seminar by Victor Campos

Recognition: DeepVideo

Recognition: DeepVideo Demo

Recognition: DeepVideo Architectures

Unsupervised learning [Le et al. '11] vs. supervised learning [Karpathy et al. '14]

Recognition: DeepVideo Features

Recognition: DeepVideo Multiscale

Recognition: DeepVideo Results

Recognition

Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. Learning spatiotemporal features with 3D convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015.

Recognition: C3D

Recognition: C3D Demo

K. Simonyan, A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. ICLR 2015.

Recognition: C3D Spatial dimension

Spatial dimensions (XY) of the kernels used are fixed to 3x3, following Simonyan & Zisserman (ICLR 2015).

Recognition: C3D Temporal dimension

3D ConvNets are more suitable for spatiotemporal feature learning than 2D ConvNets.

Temporal depth

2D ConvNets

A homogeneous architecture with small 3x3x3 convolution kernels in all layers is among the best performing architectures for 3D ConvNets.
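What a single 3x3x3 kernel does to a clip can be sketched with a naive loop: the kernel slides over time as well as over the two spatial axes, so the output keeps a temporal dimension. This is a plain NumPy illustration with one input channel, no padding and a made-up averaging kernel, not the C3D implementation.

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Valid 3D cross-correlation. clip: (T, H, W); kernel: (kt, kh, kw)."""
    kt, kh, kw = kernel.shape
    t, h, w = clip.shape
    out = np.zeros((t - kt + 1, h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = (clip[i:i + kt, j:j + kh, k:k + kw] * kernel).sum()
    return out

clip = np.ones((16, 8, 8))              # 16 frames of 8x8 pixels
kernel = np.ones((3, 3, 3)) / 27.0      # 3x3x3 averaging kernel
out = conv3d_valid(clip, kernel)        # shape (14, 6, 6)
```

A 2D convolution applied to the same clip would collapse the temporal axis after the first layer, whereas here 14 of the 16 time steps survive.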

Recognition: C3D Temporal dimension

No gain when varying the temporal depth across layers

Recognition: C3D Architecture

Feature vector

Recognition: C3D Feature vector

Video sequence

16-frame-long clips

8-frame-long overlap
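Splitting a video into 16-frame clips with an 8-frame overlap means starting a new clip every 8 frames. A small sketch, where the function name and the 40-frame example are assumptions:

```python
def clip_starts(num_frames, clip_len=16, overlap=8):
    """Index of the first frame of every full clip in the video."""
    stride = clip_len - overlap
    return list(range(0, num_frames - clip_len + 1, stride))

starts = clip_starts(40)                       # [0, 8, 16, 24]
clips = [(s, s + 16) for s in starts]          # (first frame, one past the last frame)
```

A 40-frame video therefore yields four clips, the last one covering frames 24 to 39.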

Recognition: C3D Feature vector

Figure: 16-frame clips (x4) -> Average -> 4096-dim video descriptor -> L2 norm -> 4096-dim video descriptor
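The pipeline in the figure (average the clip features, then L2-normalize) can be sketched as follows. Random vectors stand in for the 4096-dim clip features, and the function name is hypothetical.

```python
import numpy as np

def video_descriptor(clip_features):
    """Average per-clip features of shape (num_clips, dim), then L2-normalize."""
    mean = np.mean(clip_features, axis=0)
    norm = np.linalg.norm(mean)
    return mean / norm if norm > 0 else mean

rng = np.random.default_rng(0)
clip_features = rng.random((4, 4096))    # stand-in for the features of four 16-frame clips
desc = video_descriptor(clip_features)
```

The result is a single unit-length 4096-dim vector per video, whatever the number of clips.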

Recognition: C3D Visualization

Based on Deconvnets by Zeiler and Fergus [ECCV 2014]. See [ReadCV Slides] for more details.

Recognition: C3D Compactness

Convolutional 3D (C3D) combined with a simple linear classifier outperforms state-of-the-art methods on 4 different benchmarks and is comparable with state-of-the-art methods on 2 other benchmarks.

Recognition: C3D Performance

Recognition: C3D Software

Implementation by Michael Gygli (GitHub)

Recognition: ImageNet Video

[ILSVRC 2015 Slides and videos]

Recognition: ImageNet Video

Kai Kang et al., Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

Optical Flow

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 40

Optical Flow Small vs Large

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 41

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 42

Optical FlowClassic approachRigid matching of HoG or SIFT descriptors

Deep MatchingAllow each subpatch to move

independently in a limited range

depending on its size

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 43

Optical Flow Deep Matching

Source Matlab R2015b documentation for normxcorr2 by Mathworks44

Optical Flow 2D correlation

Image

Sub-Image

Offset of the sub-image with respect to the image [00]

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 45

Instead of pre-trained filters a convolution is defined between each

patch of the reference image target image

as a results a correlation map is generated for each reference patch

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 46

Optical Flow Deep Matching

The most discriminative response map

The less discriminative

response map

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 47

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching

(TD)Bottom-upextraction

(BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 48

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Bottom-upextraction

(BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 49

Optical Flow Deep Matching (BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 50

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching (TD)

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching

(TD)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 51

Optical Flow Deep Matching (TD)Each local maxima in the top layer corresponds to a shift of one of the biggest (32x32) patchesIf we focus on local maximum we can retrieve the corresponding responses one scale below and focus on shift of the sub-patches that generated it

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 52

Optical Flow Deep Matching (TD)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 53

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 54

Ground truth

Dense HOG[Brox amp Malik 2011]

Deep Matching

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 55

Optical Flow Deep Matching

Optical Flow

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 56

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 57

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 58

End to end supervised learning of optical flow

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 59

Option A Stack both input images together and feed them through a generic network

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 60

Option B Create two separate yet identical processing streams for the two images and combine them at a later stage

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 61

Option B Create two separate yet identical processing streams for the two images and combine them at a later stage

Correlation layer Convolution of data patches from the layers to combine

Optical Flow FlowNet (expanding)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 62

Upconvolutional layers Unpooling features maps + convolutionUpconvolutioned feature maps are concatenated with the corresponding map from the contractive part

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 63

Since existing ground truth datasets are not sufficiently large to train a Convnet a synthetic Flying Dataset is generatedhellip and augmented (translation rotation scaling transformations additive Gaussian noise changes in brightness contrast gamma and color)

Convnets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI

Data augmentation

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 64

Optical Flow FlowNet

Object tracking MDNet

65Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Object tracking MDNet

66Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Object tracking MDNet Architecture

67Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Domain-specific layers are used during training for each sequence but are replaced by a single one at test time

Object tracking MDNet Online update

68Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

MDNet is updated online at test time with hard negative mining that is selecting negative samples with the highest positive score

Object tracking FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. Visual Tracking with Fully Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Focus on conv4-3 and conv5-3 of VGG-16 network pre-trained for ImageNet image classification

conv4-3 conv5-3

Object tracking FCNT Specialization


Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence

Object tracking FCNT Localization

Although trained for image classification, feature maps in conv5-3 enable object localization… but are not discriminative enough to distinguish different objects of the same category

Object tracking Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object detectors emerge in deep scene CNNs." ICLR 2015. [Slides from ReadCV]

Object tracking FCNT Localization

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation…

conv4-3 conv5-3

Object tracking FCNT Architecture


SNet=Specific Network (online update)

GNet=General Network (fixed)

Object tracking FCNT Results


ConvNets Software

Caffe: http://caffe.berkeleyvision.org

Torch (Overfeat): http://torch.ch

Theano: http://deeplearning.net/software/theano

TensorFlow: https://www.tensorflow.org

MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet

CNTK (Microsoft): http://www.cntk.ai

Seminar Series Compacting ConvNets

for End to End Learning

Tuesday February 2 4pm

D5-010 Campus Nord

ConvNets Learn more

78

Jose M. Álvarez

Stanford course CS231n

Convolutional Neural Networks for Visual

Recognition

ConvNets Learn more

79

ConvNets Learn more

Online course

Deep Learning

Taking machine learning to the next

level

80

ReadCV seminar: Friendly reviews of SoA papers

Spring 2016

Tuesdays at 11am

ConvNets Learn more

81

Barcelona Convolucionada

Deep Learning a l'abast de tothom

Monday February 1 7pm FIB Campus Nord UPC

ConvNets Learn more

82

Grup d'estudi de machine learning Barcelona

Summer course

Deep Learning for Computer Vision

(25 ECTS for MSc & PhD)

July 4-8 3-7pm

ConvNets Learn more

83

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, "Deep Learning: Past, Present & Future"

ConvNets Learn more

84

ConvNets Learn more

85

"Machine learning" sub-Reddit

ConvNets Learn more

86

ConvNets Learn more

87

Check profile requirements for Summer internships (disclaimer: offered to PhD students by default)

Company Avg Salary hour Avg Salary month

Yahoo $43 ($43x160=$6880)

Apple $37 ($37x160=$5920)

Google $2954-$3132 $7151

Facebook $2292 $6150-$7378

Microsoft $2263 $6506-$7171

Source: Glassdoor.com (internships in California. No stipends included)

ConvNets Learn more

88

Video: Cristian Canton's talk "From Catalonia to America: notes on how to achieve a successful post-PhD career", ACMCV 2015 & UPC

Li Fei-Fei, "How we're teaching computers to understand pictures", TEDTalks 2014

ConvNets Learn more

89

Jeremy Howard, "The wonderful and terrifying implications of computers that can learn", TEDTalks 2014

ConvNets Learn more

90

ConvNets Learn more

91

Neil Lawrence, "OpenAI won't benefit humanity without open data sharing" (The Guardian, 14/12/2015)

Is Computer Vision solved?

ConvNets Discussion

92

Sports Do you know them

93

ConvNets Do you know them

94

Antonio Torralba, MIT (former UPC)

and MANY MORE I am missing in the page (apologies)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

95

ConvNets Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution / Last deadlines (on 28/1/2016)

FI (Catalonia): 22/09/2015

FPU (Spain): 15/01/2016

Check our activity at https://imatge.upc.edu/web

Image Classification

97

Our past research

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., "Cultural Event Recognition with Visual ConvNets and Temporal Models", in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

ChaLearn Workshop

Saliency Prediction

J. Pan and Giró-i-Nieto, X., "End-to-end Convolutional Network for Saliency Prediction", in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

Our current research

LSUN Challenge

Sentiment Analysis

99

Our current research

[Slides]

CNN

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., "Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction", in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015.

Our current research

Instance Search in Video

100

V. - T. Nguyen, -Dinh-Le, D., Salvador, A., -Zhu, C., Nguyen, D. - L., Tran, M. - T., Duc, T. Ngo, Duong, D. Anh, Satoh, S. ichi, and Giró-i-Nieto, X., "NII-HITACHI-UIT at TRECVID 2015 Instance Search", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S. ichi, O'Connor, N., and Smeaton, A. F., "Insight DCU at TRECVID 2015", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

Recognition

Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. Learning spatiotemporal features with 3D convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015.

Previous lectures with Jose M. Álvarez

Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE.

Slides extracted from ReadCV seminar by Victor Campos

Recognition DeepVideo

Recognition DeepVideo Demo

Recognition DeepVideo Architectures

Unsupervised learning [Le et al. '11]. Supervised learning [Karpathy et al. '14]

Recognition DeepVideo Features

Recognition DeepVideo Multiscale

Recognition DeepVideo Results

Recognition C3D

Recognition C3D Demo

K. Simonyan, A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. ICLR 2015.

Recognition C3D Spatial dimension

Spatial dimensions (XY) of the kernels are fixed to 3x3, following Simonyan & Zisserman (ICLR 2015)

Recognition C3D Temporal dimension

3D ConvNets are more suitable for spatiotemporal feature learning than 2D ConvNets

Temporal depth

2D ConvNets

A homogeneous architecture with small 3 × 3 × 3 convolution kernels in all layers is among the best performing architectures for 3D ConvNets
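To make the 3 × 3 × 3 kernel concrete, the following sketch slides one kernel over time, height and width of a single-channel clip. It is a naive illustration under simplifying assumptions (one input channel, one filter, no padding, stride 1), not the C3D implementation; the function name `conv3d_valid` is made up for the example. As in most deep learning libraries, the operation is a cross-correlation (the kernel is not flipped).

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Naive single-channel 3D correlation ('valid' mode, stride 1).

    clip:   (T, H, W) array of frames
    kernel: (kt, kh, kw) array, e.g. 3 x 3 x 3
    """
    T, H, W = clip.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                # Each output value mixes kt frames and a kh x kw neighbourhood.
                out[t, y, x] = np.sum(clip[t:t + kt, y:y + kh, x:x + kw] * kernel)
    return out

clip = np.ones((16, 8, 8))                      # 16 frames of 8 x 8 pixels
response = conv3d_valid(clip, np.ones((3, 3, 3)))
```

Unlike a 2D convolution applied to stacked frames, the output keeps a temporal axis, so deeper layers can continue to model motion.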

Recognition C3D Temporal dimension

No gain when varying the temporal depth across layers

Recognition C3D Architecture

Feature vector

Recognition C3D Feature vector

Video sequence split into 16-frame-long clips with an 8-frame-long overlap

Recognition C3D Feature vector

Figure: the features of the 16-frame clips are averaged into a 4096-dim video descriptor, which is then L2-normalized
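The clip sampling and pooling described on these slides can be sketched as follows. The per-clip features would come from the C3D network; here they are replaced by random vectors, and the function names `clip_starts` and `video_descriptor` are assumptions made for this illustration.

```python
import numpy as np

def clip_starts(num_frames, clip_len=16, overlap=8):
    """Start indices of clip_len-frame clips with the given overlap."""
    stride = clip_len - overlap
    return list(range(0, num_frames - clip_len + 1, stride))

def video_descriptor(clip_features):
    """Average per-clip feature vectors, then L2-normalize the result."""
    mean = np.mean(np.asarray(clip_features, dtype=float), axis=0)
    norm = np.linalg.norm(mean)
    return mean / norm if norm > 0 else mean

starts = clip_starts(40)                          # a 40-frame video
rng = np.random.default_rng(0)
features = rng.random((len(starts), 4096))        # stand-in for C3D clip features
descriptor = video_descriptor(features)
```

A 40-frame video yields clips starting at frames 0, 8, 16 and 24, and the final descriptor has unit L2 norm regardless of the number of clips.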

Recognition C3D Visualization

Based on Deconvnets by Zeiler and Fergus [ECCV 2014]. See [ReadCV Slides] for more details.

Recognition C3D Compactness

Convolutional 3D (C3D) features combined with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with state-of-the-art methods on the other 2 benchmarks

Recognition C3D Performance

Recognition C3D Software

Implementation by Michael Gygli (GitHub)

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

Kai Kang et al. Object Detection in Videos with Tubelets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

Optical Flow

Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE.

Optical Flow Small vs Large

Optical Flow

Classic approach: Rigid matching of HoG or SIFT descriptors

Deep Matching: Allow each subpatch to move independently in a limited range depending on its size

Optical Flow Deep Matching

Source: Matlab R2015b documentation for normxcorr2 by Mathworks

Optical Flow 2D correlation

Image

Sub-Image

Offset of the sub-image with respect to the image [0,0]

Instead of pre-trained filters, a convolution is defined between each patch of the reference image and the target image. As a result, a correlation map is generated for each reference patch.
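A correlation map of this kind can be sketched with a plain normalized cross-correlation. This is a simplified illustration: it works on raw pixel values and only on fully overlapping positions, whereas Deep Matching correlates patch descriptors and normxcorr2 also returns partially overlapping offsets. The function name `ncc_map` is an assumption.

```python
import numpy as np

def ncc_map(image, patch):
    """Normalized cross-correlation of a patch at every valid offset."""
    ph, pw = patch.shape
    p = patch - patch.mean()
    p_norm = np.linalg.norm(p)
    out = np.zeros((image.shape[0] - ph + 1, image.shape[1] - pw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            w = image[y:y + ph, x:x + pw]
            w = w - w.mean()
            denom = p_norm * np.linalg.norm(w)
            out[y, x] = np.sum(p * w) / denom if denom > 0 else 0.0
    return out

rng = np.random.default_rng(0)
target = rng.random((20, 20))
patch = target[5:9, 7:11].copy()      # 4 x 4 reference patch cut at row 5, column 7
corr = ncc_map(target, patch)
peak = np.unravel_index(np.argmax(corr), corr.shape)
```

The maximum of the map recovers the offset where the patch was cut, which is the information the matching stage exploits.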

Optical Flow Deep Matching

The most discriminative response map

The least discriminative response map

Key idea: Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

Pyramid levels: 4x4 patches, 8x8 patches, 16x16 patches, 32x32 patches

Bottom-up extraction (BU)

Top-down matching (TD)

Optical Flow Deep Matching (BU)

Optical Flow Deep Matching (TD)

Each local maximum in the top layer corresponds to a shift of one of the biggest (32x32) patches. If we focus on a local maximum, we can retrieve the corresponding responses one scale below and focus on the shift of the sub-patches that generated it.

Optical Flow Deep Matching

Ground truth

Dense HOG [Brox & Malik 2011]

Deep Matching

Optical Flow FlowNet

End-to-end supervised learning of optical flow

Optical Flow FlowNet (contracting)

Option A: Stack both input images together and feed them through a generic network

Option B: Create two separate yet identical processing streams for the two images and combine them at a later stage

Correlation layer: Convolution of data patches from the layers to combine
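The idea behind the correlation layer can be sketched as a dot product between feature vectors of the two streams over a limited range of displacements. This is a simplified illustration (single-pixel patches, stride 1, no learned weights), not the FlowNet code; `correlation_layer` and `max_disp` are names chosen for the example.

```python
import numpy as np

def correlation_layer(f1, f2, max_disp):
    """Dot-product correlation between two feature maps of shape (C, H, W).

    Returns an array of shape ((2*max_disp+1)**2, H, W); channel index
    (dy + max_disp) * (2*max_disp + 1) + (dx + max_disp) holds
    <f1[:, y, x], f2[:, y + dy, x + dx]>, with zeros outside the image.
    """
    C, H, W = f1.shape
    d = max_disp
    padded = np.pad(f2, ((0, 0), (d, d), (d, d)))   # zero padding
    out = np.zeros(((2 * d + 1) ** 2, H, W))
    for i, dy in enumerate(range(-d, d + 1)):
        for j, dx in enumerate(range(-d, d + 1)):
            shifted = padded[:, d + dy:d + dy + H, d + dx:d + dx + W]
            out[i * (2 * d + 1) + j] = np.sum(f1 * shifted, axis=0)
    return out

rng = np.random.default_rng(0)
feat1 = rng.standard_normal((128, 10, 10))
feat2 = np.roll(feat1, shift=(1, 2), axis=(1, 2))   # content moved by dy=1, dx=2
corr = correlation_layer(feat1, feat2, max_disp=3)
best = int(np.argmax(corr[:, 5, 5]))
dy, dx = best // 7 - 3, best % 7 - 3
```

At the central pixel the strongest response is found at the displacement used to generate the second feature map, which is the cue the following layers turn into a flow estimate.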

Optical Flow FlowNet (expanding)

Upconvolutional layers: Unpooling feature maps + convolution

Upconvolved feature maps are concatenated with the corresponding map from the contractive part


Page 10: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

10

Recognition

Figure Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Previous lectures with Jose M Aacutelvarez

11

Recognition

Figure Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE

Slides extracted from ReadCV seminar by Victor Campos 12

Recognition DeepVideo

Recognition DeepVideo Demo

Recognition DeepVideo Architectures

Unsupervised learning [Le et al. '11], supervised learning [Karpathy et al. '14]

Recognition DeepVideo Features

Recognition DeepVideo Multiscale

Recognition DeepVideo Results

Recognition

Recognition C3D

Recognition C3D Demo

Recognition C3D Spatial dimension

Spatial dimensions (XY) of the kernels are fixed to 3x3, following Simonyan & Zisserman (ICLR 2015).

K. Simonyan, A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition", ICLR 2015

Recognition C3D Temporal dimension

3D ConvNets are more suitable for spatiotemporal feature learning than 2D ConvNets.

Temporal depth

2D ConvNets

A homogeneous architecture with small 3x3x3 convolution kernels in all layers is among the best performing architectures for 3D ConvNets.
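To make the 3x3x3 kernel concrete, here is a minimal sketch of a single 3D convolution over a short clip, written with plain Python lists so it has no dependencies. The function name `conv3d_valid`, the toy clip and the hand-written temporal-difference kernel are illustrative assumptions, not code from the C3D paper; in C3D the kernel weights are learned.

```python
def conv3d_valid(clip, kernel):
    """Slide a 3D kernel over a (T, H, W) clip without padding.

    As in most deep learning libraries, this is a cross-correlation:
    the kernel is not flipped.
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Toy clip (assumption): 4 frames of 5x5 pixels, intensity grows by 1 per frame.
clip = [[[float(t)] * 5 for _ in range(5)] for t in range(4)]
# Hand-made 3x3x3 kernel: -1 on the first frame, 0 on the middle one, +1 on the last.
kernel = [[[w] * 3 for _ in range(3)] for w in (-1.0, 0.0, 1.0)]
response = conv3d_valid(clip, kernel)  # shape (2, 3, 3)
```

The kernel spans three consecutive frames, so its response depends on how the intensity changes over time, which a 2D kernel applied to one frame cannot measure.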

Recognition C3D Temporal dimension

No gain when varying the temporal depth across layers.

Recognition C3D Architecture

Feature vector

Recognition C3D Feature vector

Video sequence, split into 16-frame clips with an 8-frame overlap

Figure: four 16-frame clips -> Average -> 4096-dim video descriptor -> L2 norm -> 4096-dim video descriptor
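The descriptor pipeline on these slides (overlapping 16-frame clips, averaging, L2 normalization) can be sketched as follows. The helper names `clip_starts` and `video_descriptor` are hypothetical, and the toy features are 2-dimensional instead of 4096-dimensional to keep the example readable; the per-clip features themselves would come from the network.

```python
import math


def clip_starts(num_frames, clip_len=16, overlap=8):
    """Start indices of clips of `clip_len` frames sharing `overlap` frames."""
    stride = clip_len - overlap
    return list(range(0, num_frames - clip_len + 1, stride))


def video_descriptor(clip_features):
    """Average the per-clip feature vectors, then L2-normalize the mean."""
    n = len(clip_features)
    dim = len(clip_features[0])
    mean = [sum(f[i] for f in clip_features) / n for i in range(dim)]
    norm = math.sqrt(sum(v * v for v in mean))
    return [v / norm for v in mean] if norm > 0 else mean


starts = clip_starts(40)  # a 40-frame video yields 4 overlapping clips
descriptor = video_descriptor([[3.0, 0.0], [0.0, 4.0]])  # toy 2-dim features
```

With an 8-frame overlap the stride is 8 frames, so every frame except those in the first and last 8 frames of the video is covered by two clips.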

Recognition C3D Visualization

Based on Deconvnets by Zeiler and Fergus [ECCV 2014]. See [ReadCV Slides] for more details.

Recognition C3D Compactness

Convolutional 3D (C3D) combined with a simple linear classifier outperforms state-of-the-art methods on 4 different benchmarks and is comparable with state-of-the-art methods on 2 other benchmarks.

Recognition C3D Performance

Recognition C3D Software

Implementation by Michael Gygli (GitHub)

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

Kai Kang et al., "Object Detection in Videos with TubeLets and Multi-Context Cues" (ILSVRC 2015) [video] [poster]

Optical Flow

Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE.

Optical Flow Small vs Large

Optical Flow

Classic approach: Rigid matching of HoG or SIFT descriptors

Deep Matching: Allow each subpatch to move independently in a limited range depending on its size

Optical Flow Deep Matching

Optical Flow 2D correlation

Source: Matlab R2015b documentation for normxcorr2 by MathWorks

Image

Sub-Image

Offset of the sub-image with respect to the image: [0, 0]
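As an illustration of how 2D correlation recovers the offset of a sub-image, the sketch below computes a normalized cross-correlation map in plain Python. It is not a port of `normxcorr2`: it only evaluates positions where the template fits entirely inside the image, whereas the MATLAB function also returns the partially overlapping border positions. The names `ncc_map` and `best_offset` and the toy image are assumptions.

```python
import math


def ncc_map(image, template):
    """Normalized cross-correlation of `template` at every fully-contained offset."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    t_vals = [v for row in template for v in row]
    t_mean = sum(t_vals) / len(t_vals)
    t_dev = [v - t_mean for v in t_vals]
    t_energy = math.sqrt(sum(d * d for d in t_dev))
    scores = []
    for y in range(ih - th + 1):
        row = []
        for x in range(iw - tw + 1):
            w_vals = [image[y + dy][x + dx] for dy in range(th) for dx in range(tw)]
            w_mean = sum(w_vals) / len(w_vals)
            w_dev = [v - w_mean for v in w_vals]
            w_energy = math.sqrt(sum(d * d for d in w_dev))
            denom = t_energy * w_energy
            # Flat windows carry no pattern: score them as 0 instead of dividing by 0.
            score = sum(a * b for a, b in zip(w_dev, t_dev)) / denom if denom > 0 else 0.0
            row.append(score)
        scores.append(row)
    return scores


def best_offset(scores):
    """(row, column) of the highest correlation score."""
    best = max((s, y, x) for y, row in enumerate(scores) for x, s in enumerate(row))
    return best[1], best[2]


image = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 9, 1, 0, 0],
    [0, 2, 7, 0, 0],
    [0, 0, 0, 0, 0],
]
template = [[9, 1], [2, 7]]  # the sub-image cut out at row 2, column 1
scores = ncc_map(image, template)
offset = best_offset(scores)
```

Subtracting the mean and dividing by the energies makes the score insensitive to brightness and contrast changes, so a perfect match scores 1 regardless of overall intensity.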

Instead of using pre-trained filters, a convolution is defined between each patch of the reference image and the target image. As a result, a correlation map is generated for each reference patch.
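A minimal sketch of this idea: every non-overlapping patch of the reference image is used as a filter over the target image, giving one correlation map per reference patch. For simplicity it assumes raw intensities with L2 normalization, whereas Deep Matching correlates pixel descriptors, and it uses 2x2 patches on a 4x4 toy image. All function names are hypothetical.

```python
import math


def unit(values):
    """L2-normalize a flat list (all-zero input is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else list(values)


def window(img, y, x, size):
    """Flatten the size x size window whose top-left corner is (y, x)."""
    return [img[y + dy][x + dx] for dy in range(size) for dx in range(size)]


def correlation_maps(reference, target, size):
    """One correlation map over `target` per non-overlapping reference patch."""
    th, tw = len(target), len(target[0])
    maps = {}
    for py in range(0, len(reference) - size + 1, size):
        for px in range(0, len(reference[0]) - size + 1, size):
            kernel = unit(window(reference, py, px, size))
            maps[(py, px)] = [
                [sum(k * v for k, v in zip(kernel, unit(window(target, y, x, size))))
                 for x in range(tw - size + 1)]
                for y in range(th - size + 1)
            ]
    return maps


def peak(cmap):
    """(row, column) of the strongest response in a correlation map."""
    best = max((v, y, x) for y, row in enumerate(cmap) for x, v in enumerate(row))
    return best[1], best[2]


reference = [
    [1, 2, 0, 0],
    [3, 4, 0, 0],
    [0, 0, 5, 1],
    [0, 0, 1, 5],
]
# Target (assumption): the reference shifted one pixel to the right.
target = [
    [0, 1, 2, 0],
    [0, 3, 4, 0],
    [0, 0, 0, 5],
    [0, 0, 0, 1],
]
maps = correlation_maps(reference, target, size=2)
top_left_match = peak(maps[(0, 0)])
```

The peak of the map for the top-left reference patch sits one column to the right of its original position, which is exactly the displacement applied to the target.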

Optical Flow Deep Matching

The most discriminative response map

The least discriminative response map

Key idea: Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Bottom-up extraction (BU)

Top-down matching (TD)
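One bottom-up step of the pyramid can be sketched as follows: the correlation map of a patch is the average of the maps of its four quadrants, where each quadrant may take the best response in a small neighbourhood around its expected position. This is a simplified sketch under stated assumptions: it omits the subsampling and the non-linear rectification used in the paper, and the name `aggregate` and the `radius` parameter are hypothetical.

```python
def pooled(cmap, y, x, radius):
    """Best response within `radius` pixels of (y, x), clipped to the map."""
    h, w = len(cmap), len(cmap[0])
    return max(
        cmap[yy][xx]
        for yy in range(max(0, y - radius), min(h, y + radius + 1))
        for xx in range(max(0, x - radius), min(w, x + radius + 1))
    )


def aggregate(children, child_size, radius=1):
    """Build a parent correlation map from its 4 quadrant maps.

    `children` is ordered top-left, top-right, bottom-left, bottom-right;
    entry [y][x] of each map is the response with that quadrant's top-left
    corner placed at (y, x) in the target image.
    """
    offsets = [(0, 0), (0, child_size), (child_size, 0), (child_size, child_size)]
    height = len(children[0]) - child_size
    width = len(children[0][0]) - child_size
    return [
        [sum(pooled(c, y + oy, x + ox, radius)
             for c, (oy, ox) in zip(children, offsets)) / 4.0
         for x in range(width)]
        for y in range(height)
    ]


def one_hot(y, x, size=4):
    """Toy correlation map with a single perfect response at (y, x)."""
    return [[1.0 if (r, c) == (y, x) else 0.0 for c in range(size)] for r in range(size)]


# Two quadrants respond exactly where a rigid 4x4 patch at (0, 0) expects them;
# the top-right and bottom-right quadrants are displaced by one pixel.
children = [one_hot(0, 0), one_hot(0, 3), one_hot(2, 0), one_hot(3, 3)]
rigid = aggregate(children, child_size=2, radius=0)
flexible = aggregate(children, child_size=2, radius=1)
```

With `radius=0` the quadrants must stay rigidly aligned and the parent score at (0, 0) drops to 0.5; with `radius=1` each quadrant can move by one pixel and the score is 1.0, which is the flexibility that distinguishes Deep Matching from rigid descriptor matching.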

Optical Flow Deep Matching (BU)

Optical Flow Deep Matching (TD)

Each local maximum in the top layer corresponds to a shift of one of the largest (32x32) patches. Focusing on one local maximum, we can retrieve the corresponding responses one scale below and look at the shifts of the sub-patches that generated it.

Optical Flow Deep Matching

Ground truth

Dense HOG [Brox & Malik 2011]

Deep Matching

Optical Flow

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D., and Brox, T. (2015). FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).

Optical Flow FlowNet

End-to-end supervised learning of optical flow

Optical Flow FlowNet (contracting)

Option A: Stack both input images together and feed them through a generic network
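Option A amounts to concatenating the two frames along the channel axis, so two RGB images become one 6-channel input. A minimal sketch with nested lists in (channels, height, width) layout; `stack_frames` and `make_frame` are hypothetical helpers, not part of any FlowNet code.

```python
def stack_frames(frame_a, frame_b):
    """Concatenate two (C, H, W) frames along the channel axis."""
    shape_a = (len(frame_a[0]), len(frame_a[0][0]))
    shape_b = (len(frame_b[0]), len(frame_b[0][0]))
    if shape_a != shape_b:
        raise ValueError("frames must share the same height and width")
    return list(frame_a) + list(frame_b)


def make_frame(value, channels=3, height=2, width=2):
    """Toy constant-valued frame, used only for this example."""
    return [[[value] * width for _ in range(height)] for _ in range(channels)]


stacked = stack_frames(make_frame(0.0), make_frame(1.0))  # 6 channels
```

The network's first convolution then sees both frames at once and has to learn by itself how to compare them.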

Option B: Create two separate, yet identical processing streams for the two images and combine them at a later stage

Correlation layer: Convolution of data patches from the layers to combine
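A correlation layer can be sketched as a dot product between the feature vector at each location of the first map and the feature vectors at displaced locations of the second map, producing one output channel per displacement. The sketch below assumes single-pixel patches, a small maximum displacement, and zeros outside the image; the function name and parameters are hypothetical.

```python
def correlation_layer(feat_a, feat_b, max_disp=1):
    """Correlate two (C, H, W) feature maps.

    Returns (2 * max_disp + 1) ** 2 planes of shape (H, W), ordered by
    (dy, dx) from (-max_disp, -max_disp) to (+max_disp, +max_disp).
    """
    channels, height, width = len(feat_a), len(feat_a[0]), len(feat_a[0][0])
    out = []
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            plane = [[0.0] * width for _ in range(height)]
            for y in range(height):
                for x in range(width):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < height and 0 <= xx < width:
                        plane[y][x] = sum(
                            feat_a[c][y][x] * feat_b[c][yy][xx] for c in range(channels)
                        )
            out.append(plane)
    return out


def zeros(channels, height, width):
    return [[[0.0] * width for _ in range(height)] for _ in range(channels)]


# Toy features (assumption): one distinctive vector that moves one pixel right.
feat_a, feat_b = zeros(2, 3, 3), zeros(2, 3, 3)
feat_a[0][1][1], feat_a[1][1][1] = 1.0, 2.0
feat_b[0][1][2], feat_b[1][1][2] = 1.0, 2.0
corr = correlation_layer(feat_a, feat_b, max_disp=1)
```

At location (1, 1) only the plane for displacement (dy=0, dx=+1), index 5, responds, so the output directly encodes where each feature moved. Unlike a regular convolution, this layer has no trainable weights.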

Optical Flow FlowNet (expanding)

Upconvolutional layers: Unpooling of feature maps + convolution. Upconvolved feature maps are concatenated with the corresponding map from the contracting part.
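The data flow of one expanding step can be sketched as follows: upsample the coarse feature maps by a factor of 2, then concatenate them with the feature maps of matching resolution from the contracting part. The sketch assumes nearest-neighbour unpooling and leaves out the learned convolution; `unpool2x` and `expand_step` are hypothetical names.

```python
def unpool2x(fmap):
    """Nearest-neighbour 2x upsampling of one (H, W) feature map."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def expand_step(coarse, skip):
    """coarse: (C1, H, W); skip: (C2, 2H, 2W). Returns (C1 + C2, 2H, 2W)."""
    upsampled = [unpool2x(channel) for channel in coarse]
    up_shape = (len(upsampled[0]), len(upsampled[0][0]))
    skip_shape = (len(skip[0]), len(skip[0][0]))
    if up_shape != skip_shape:
        raise ValueError("skip features must have twice the coarse resolution")
    return upsampled + list(skip)


coarse = [[[1.0, 2.0], [3.0, 4.0]]]                      # 1 channel, 2x2
skip = [[[0.0] * 4 for _ in range(4)] for _ in range(2)]  # 2 channels, 4x4
merged = expand_step(coarse, skip)                       # 3 channels, 4x4
```

The concatenation lets the expanding part combine coarse, high-level information with the finer local detail kept in the earlier layers.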

Optical Flow FlowNet

Since existing ground truth datasets are not sufficiently large to train a ConvNet, a synthetic Flying Chairs dataset is generated... and augmented (translation, rotation and scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma and color).

ConvNets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI.

Data augmentation
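The photometric part of such an augmentation (brightness, contrast, gamma and additive Gaussian noise) can be sketched as below for a grayscale image with values in [0, 1]. The parameter ranges are illustrative assumptions, not the values used for FlowNet, and the function name is hypothetical. Geometric transformations are not shown: they would also have to be applied consistently to the ground-truth flow.

```python
import random


def augment(image, rng):
    """Randomly perturb a grayscale (H, W) image with values in [0, 1]."""
    brightness = rng.uniform(-0.1, 0.1)   # additive shift (assumed range)
    contrast = rng.uniform(0.8, 1.2)      # scaling around mid-gray (assumed range)
    gamma = rng.uniform(0.7, 1.5)         # exponent (assumed range)
    sigma = rng.uniform(0.0, 0.04)        # noise standard deviation (assumed range)
    out = []
    for row in image:
        new_row = []
        for v in row:
            v = (v - 0.5) * contrast + 0.5 + brightness
            v = min(max(v, 0.0), 1.0) ** gamma
            v += rng.gauss(0.0, sigma)
            new_row.append(min(max(v, 0.0), 1.0))
        out.append(new_row)
    return out


image = [[0.0, 0.5], [0.75, 1.0]]
augmented = augment(image, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible, which helps when debugging a training pipeline.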

Object tracking MDNet

Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015).

Object tracking MDNet Architecture

Domain-specific layers are used during training for each sequence, but are replaced by a single one at test time.

Object tracking MDNet Online update

MDNet is updated online at test time with hard negative mining, that is, by selecting the negative samples with the highest positive score.
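Hard negative mining as described here reduces to ranking the negative samples by the score the network gives them for the positive (target) class and keeping the top ones. A minimal sketch; the sample identifiers, scores and the function name are made up for illustration.

```python
def hard_negatives(negatives, k):
    """Keep the k background samples the network most confuses with the target.

    `negatives` is a list of (sample_id, positive_score) pairs for samples
    known to be background.
    """
    ranked = sorted(negatives, key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical scores assigned by the current network to background samples.
candidates = [("bg_0", 0.05), ("bg_1", 0.91), ("bg_2", 0.40), ("bg_3", 0.77)]
minibatch = hard_negatives(candidates, k=2)
```

Training on the highest-scoring negatives concentrates each update on the distractors that are currently most likely to cause a tracking failure.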

Object tracking FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Focus on conv4-3 and conv5-3 of the VGG-16 network pre-trained for ImageNet image classification.

conv4-3, conv5-3

Object tracking FCNT Specialization

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence.

Object tracking FCNT Localization

Although trained for image classification, feature maps in conv5-3 enable object localization... but they are not discriminative enough to distinguish different objects of the same category.

Object tracking Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object detectors emerge in deep scene CNNs." ICLR 2015. [Slides from ReadCV]

Object tracking FCNT Localization

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation...

conv4-3, conv5-3

Object tracking FCNT Architecture

SNet = Specific Network (online update)

GNet = General Network (fixed)

Object tracking FCNT Results

ConvNets Software

Caffe: http://caffe.berkeleyvision.org

Torch (Overfeat): http://torch.ch

Theano: http://deeplearning.net/software/theano

TensorFlow: https://www.tensorflow.org

MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet

CNTK (Microsoft): http://www.cntk.ai

ConvNets Learn more

Seminar Series: Compacting ConvNets for End to End Learning. Tuesday, February 2, 4pm, D5-010 Campus Nord.

Jose M. Álvarez

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

Online course: Deep Learning, Taking machine learning to the next level

ReadCV seminar: Friendly reviews of SoA papers. Spring 2016, Tuesdays at 11am.

Barcelona Convolucionada: Deep Learning a l'abast de tothom (Deep Learning within everyone's reach). Monday, February 1, 7pm, FIB, Campus Nord UPC.

Grup d'estudi de machine learning Barcelona

Summer course: Deep Learning for Computer Vision (25 ECTS for MSc & PhD). July 4-8, 3-7pm.

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, "Deep Learning: Past, Present & Future"

"Machine learning" sub-Reddit

Check profile requirements for Summer internship (disclaimer: offered to PhD students by default)

Company      Avg. salary / hour    Avg. salary / month

Yahoo        $43                   ($43 x 160 = $6880)

Apple        $37                   ($37 x 160 = $5920)

Google       $29.54-$31.32         $7151

Facebook     $22.92                $6150-$7378

Microsoft    $22.63                $6506-$7171

Source: Glassdoor.com (internships in California. No stipends included)

ConvNets Learn more

Video: Cristian Canton's talk "From Catalonia to America: notes on how to achieve a successful post-PhD career", ACMCV 2015 & UPC

Li Fei-Fei, "How we're teaching computers to understand pictures", TEDTalks 2014

Jeremy Howard, "The wonderful and terrifying implications of computers that can learn", TEDTalks 2014

Neil Lawrence, "OpenAI won't benefit humanity without open data sharing" (The Guardian, 14/12/2015)

ConvNets Discussion

Is Computer Vision solved?

Sports: Do you know them?

ConvNets: Do you know them?

Antonio Torralba, MIT (former UPC)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

...and MANY MORE I am missing in the page (apologies)

ConvNets Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution    Last deadlines (on 28/1/2016)

FI (Catalonia)         22/09/2015

FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web

Image Classification

Our past research

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., "Cultural Event Recognition with Visual ConvNets and Temporal Models", in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

ChaLearn Workshop

Saliency Prediction

J. Pan and Giró-i-Nieto, X., "End-to-end Convolutional Network for Saliency Prediction", in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

Our current research

LSUN Challenge

Sentiment Analysis

Our current research

[Slides]

CNN

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., "Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction", in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015.

Our current research

Instance Search in Video

V. - T. Nguyen, -Dinh-Le, D., Salvador, A., -Zhu, C., Nguyen, D. - L., Tran, M. - T., Duc, T. Ngo, Duong, D. Anh, Satoh, S. 'ichi, and Giró-i-Nieto, X., "NII-HITACHI-UIT at TRECVID 2015 Instance Search", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S. 'ichi, O'Connor, N., and Smeaton, A. F., "Insight DCU at TRECVID 2015", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

Page 11: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

11

Recognition

Figure Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE

Slides extracted from ReadCV seminar by Victor Campos 12

Recognition DeepVideo

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE 13

Recognition DeepVideo Demo

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE 14

Recognition DeepVideo Architectures

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE 15

Unsupervised learning [Le at alrsquo11] Supervised learning [Karpathy et alrsquo14]

Recognition DeepVideo Features

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE 16

Recognition DeepVideo Multiscale

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE 17

Recognition DeepVideo Results

18

Recognition

Figure Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

19

Recognition C3D

Figure Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

20Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Demo

21K Simonyan A Zisserman Very Deep Convolutional Networks for Large-Scale Image Recognition ICLR 2015

Recognition C3D Spatial dimensionSpatial dimensions (XY) of the used kernels are fixed to 3x3 following Symonian amp Zisserman (ICLR 2015)

22Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Temporal dimension3D ConvNets are more suitable for spatiotemporal feature learning compared to 2D ConvNets

Temporal depth

2D ConvNets

23Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

A homogeneous architecture with small 3 times 3 times 3 convolution kernels in all layers is among the best performing architectures for 3D ConvNets

Recognition C3D Temporal dimension

24Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

No gain when varying the temporal depth across layers

Recognition C3D Temporal dimension

25Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

No gain when varying the temporal depth across layers

Recognition C3D Architecture

Featurevector

26Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Feature vector

Video sequence

16 frames-long clips

8 frames-long overlap

27Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Feature vector

16-frame clip

16-frame clip

16-frame clip

16-frame clip

Average

4096

-dim

vid

eo d

escr

ipto

r

4096

-dim

vid

eo d

escr

ipto

r

L2 norm

28Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D VisualizationBased on Deconvnets by Zeiler and Fergus [ECCV 2014] - See [ReadCV Slides] for more details

29Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Compactness

30Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Convolutional 3D(C3D) combined with a simple linear classifier outperforms state-of-the-art methods on 4 different benchmarks and are comparable with state of the art methods on other 2 benchmarks

Recognition C3D Performance

31Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D SoftwareImplementation by Michael Gygli (GitHub)

32

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

33

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

34

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

35

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

36

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

37

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

38

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

39

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

Optical Flow

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 40

Optical Flow Small vs Large

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 41

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 42

Optical FlowClassic approachRigid matching of HoG or SIFT descriptors

Deep MatchingAllow each subpatch to move

independently in a limited range

depending on its size

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 43

Optical Flow Deep Matching

Source Matlab R2015b documentation for normxcorr2 by Mathworks44

Optical Flow 2D correlation

Image

Sub-Image

Offset of the sub-image with respect to the image [00]

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 45

Instead of pre-trained filters a convolution is defined between each

patch of the reference image target image

as a results a correlation map is generated for each reference patch

Optical Flow Deep Matching

Optical Flow Deep Matching

The most discriminative response map

The least discriminative response map

Key idea: Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search.

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching (TD)

Bottom-up extraction (BU)
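One bottom-up step of the pyramid can be sketched as follows. This is a simplified illustration under stated assumptions: the correlation map of a larger patch is taken as the average of the 3x3 max-pooled maps of its four quadrants, and other steps described in the DeepFlow paper (such as subsampling and non-linear rectification) are left out. The names `max_pool3` and `aggregate` are hypothetical.

```python
def max_pool3(cmap):
    """3x3 max-pooling with stride 1, so each child may shift by one pixel."""
    h, w = len(cmap), len(cmap[0])
    return [[max(cmap[j][i]
                 for j in range(max(0, y - 1), min(h, y + 2))
                 for i in range(max(0, x - 1), min(w, x + 2)))
             for x in range(w)]
            for y in range(h)]

def aggregate(children, size):
    """Combine four child correlation maps into the map of their parent patch.

    children: maps of the top-left, top-right, bottom-left and bottom-right
              quadrants, each indexed by target position [y][x].
    size:     side length of a child patch (4 when building 8x8 from 4x4).
    """
    offsets = [(0, 0), (0, size), (size, 0), (size, size)]
    pooled = [max_pool3(c) for c in children]
    h, w = len(children[0]), len(children[0][0])
    parent = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for cmap, (oy, ox) in zip(pooled, offsets):
                yy, xx = y + oy, x + ox
                if yy < h and xx < w:  # quadrants falling outside contribute 0
                    total += cmap[yy][xx]
            parent[y][x] = total / 4.0
    return parent

def peak_map(h, w, y, x):
    """Helper for the toy example: an all-zero map with one response of 1.0."""
    m = [[0.0] * w for _ in range(h)]
    m[y][x] = 1.0
    return m

# Quadrants of size 2; the bottom-right child is off by one pixel (4,4 vs 3,3).
children = [peak_map(6, 6, 1, 1), peak_map(6, 6, 1, 3),
            peak_map(6, 6, 3, 1), peak_map(6, 6, 4, 4)]
parent = aggregate(children, size=2)
```

Thanks to the pooling, the parent still responds fully at position (1, 1) even though one quadrant moved by a pixel, which is the "each subpatch moves independently in a limited range" behaviour.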

Optical Flow Deep Matching (BU)

Optical Flow Deep Matching (TD)

Each local maximum in the top layer corresponds to a shift of one of the biggest (32x32) patches. If we focus on one local maximum, we can retrieve the corresponding responses one scale below and focus on the shifts of the sub-patches that generated it.
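The top-down step can be illustrated with two small helpers. This is a sketch under assumptions: local maxima are detected with a strict 8-neighbour test, and the sub-patch shift is recovered by looking for the strongest response in a 3x3 neighbourhood of the expected position one scale below. Both function names are hypothetical.

```python
def local_maxima(cmap, min_score=0.0):
    """Return (y, x, score) for every strict local maximum above min_score."""
    h, w = len(cmap), len(cmap[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = cmap[y][x]
            if v <= min_score:
                continue
            neighbours = [cmap[j][i]
                          for j in range(max(0, y - 1), min(h, y + 2))
                          for i in range(max(0, x - 1), min(w, x + 2))
                          if (j, i) != (y, x)]
            if all(v > n for n in neighbours):
                peaks.append((y, x, v))
    return peaks

def best_child_position(child_map, y, x):
    """Position of the strongest child response within one pixel of (y, x)."""
    h, w = len(child_map), len(child_map[0])
    candidates = [(child_map[j][i], j, i)
                  for j in range(max(0, y - 1), min(h, y + 2))
                  for i in range(max(0, x - 1), min(w, x + 2))]
    _, j, i = max(candidates)
    return (j, i)

top = [[0.0, 0.0, 0.0, 0.0],
       [0.0, 0.9, 0.2, 0.0],
       [0.0, 0.1, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.5]]
peaks = local_maxima(top)

child = [[0.0] * 4 for _ in range(4)]
child[2][3] = 0.8
shift = best_child_position(child, 1, 2)
```

Repeating `best_child_position` for each quadrant and each scale walks from a 32x32 match down to 4x4 correspondences.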

Optical Flow Deep Matching (TD)

Optical Flow Deep Matching

Ground truth

Dense HOG [Brox & Malik 2011]

Deep Matching

Optical Flow Deep Matching

Optical Flow Deep Matching

Optical Flow

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D., & Brox, T. (2015). FlowNet: Learning Optical Flow with Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).

Optical Flow FlowNet

Optical Flow FlowNet

End-to-end supervised learning of optical flow

Optical Flow FlowNet (contracting)

Option A: Stack both input images together and feed them through a generic network.

Optical Flow FlowNet (contracting)

Option B: Create two separate, yet identical, processing streams for the two images and combine them at a later stage.

Optical Flow FlowNet (contracting)

Option B: Create two separate, yet identical, processing streams for the two images and combine them at a later stage.

Correlation layer: Convolution of data patches from the layers to combine.
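A correlation layer of this kind can be sketched in plain Python. The example assumes the simplest setting, where each "patch" is a single feature vector (patch size 1x1) and displacements are limited to `max_disp` pixels; the function name and the nested-list layout `[H][W][C]` are assumptions made for illustration.

```python
def correlation_layer(f1, f2, max_disp=1):
    """Compare every feature vector of f1 with the vectors of f2 found
    within +/- max_disp pixels. Returns [H][W][(2 * max_disp + 1) ** 2]."""
    h, w = len(f1), len(f1[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            scores = []
            for dy in range(-max_disp, max_disp + 1):
                for dx in range(-max_disp, max_disp + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        # Dot product between the two feature vectors.
                        scores.append(sum(a * b for a, b in zip(f1[y][x], f2[yy][xx])))
                    else:
                        scores.append(0.0)  # displacement leaves the map
            row.append(scores)
        out.append(row)
    return out

# A feature present at (0, 0) in the first map moves one pixel right in the second.
zero = [0.0, 0.0]
f1 = [[[1.0, 0.0], zero], [zero, zero]]
f2 = [[zero, [1.0, 0.0]], [zero, zero]]
corr = correlation_layer(f1, f2, max_disp=1)
```

In `corr[0][0]` the only non-zero entry is at index 5, which corresponds to the displacement (dy, dx) = (0, +1); no weights are learned in this layer.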

Optical Flow FlowNet (expanding)

Upconvolutional layers: Unpooling feature maps + convolution. Upconvolved feature maps are concatenated with the corresponding map from the contractive part.
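The three operations named on this slide (unpooling, convolution, concatenation) can be sketched as below. Assumptions for the example: feature maps are nested lists `[C][H][W]`, unpooling is a nearest-neighbour 2x upsampling, and the convolution is a 3x3, stride-1, zero-padded one. All function names are hypothetical.

```python
def unpool2x(fmap):
    """[H][W] -> [2H][2W]: repeat every value in a 2x2 block."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def conv3x3(channels, kernels):
    """channels: [C_in][H][W]; kernels: [C_out][C_in][3][3]."""
    h, w = len(channels[0]), len(channels[0][0])
    out = []
    for per_output in kernels:
        fmap = [[0.0] * w for _ in range(h)]
        for channel, kernel in zip(channels, per_output):
            for y in range(h):
                for x in range(w):
                    acc = 0.0
                    for dy in range(3):
                        for dx in range(3):
                            yy, xx = y + dy - 1, x + dx - 1
                            if 0 <= yy < h and 0 <= xx < w:
                                acc += kernel[dy][dx] * channel[yy][xx]
                    fmap[y][x] += acc
        out.append(fmap)
    return out

def upconv_and_concat(coarse, skip, kernels):
    """Upsample the coarse maps, convolve them, then append the skip maps
    coming from the contracting part along the channel axis."""
    upsampled = conv3x3([unpool2x(c) for c in coarse], kernels)
    return upsampled + skip

coarse = [[[1.0, 2.0]]]                      # 1 channel, 1x2
skip = [[[9.0] * 4, [9.0] * 4]]              # 1 channel, 2x4
identity = [[[[0, 0, 0], [0, 1, 0], [0, 0, 0]]]]
merged = upconv_and_concat(coarse, skip, identity)
```

`merged` has two channels of size 2x4: the upconvolved coarse map followed by the higher-resolution map from the contracting part.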

Optical Flow FlowNet

Since existing ground truth datasets are not sufficiently large to train a ConvNet, a synthetic Flying Chairs dataset is generated… and augmented (translation, rotation and scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma and color).

ConvNets trained on this unrealistic data generalize well to existing datasets such as Sintel and KITTI.
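The photometric part of such an augmentation can be sketched as follows. The parameter ranges are illustrative assumptions, not the values used in the FlowNet paper, and the function name `augment` is hypothetical. Geometric transformations (translation, rotation, scaling) are not shown, because they would also require transforming the ground-truth flow.

```python
import random

def augment(image, rng, noise_std=0.02):
    """Randomly change brightness, contrast and gamma of a grayscale image
    with values in [0, 1], then add Gaussian noise."""
    brightness = rng.uniform(-0.2, 0.2)   # assumed range
    contrast = rng.uniform(0.8, 1.2)      # assumed range
    gamma = rng.uniform(0.7, 1.5)         # assumed range
    out = []
    for row in image:
        new_row = []
        for v in row:
            v = (v - 0.5) * contrast + 0.5 + brightness
            v = min(max(v, 0.0), 1.0) ** gamma
            v += rng.gauss(0.0, noise_std)
            new_row.append(min(max(v, 0.0), 1.0))
        out.append(new_row)
    return out

image = [[0.0, 0.5], [0.25, 1.0]]
augmented = augment(image, random.Random(0))
```

Passing a seeded `random.Random` keeps the augmentation reproducible, which helps when debugging a training pipeline.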

Data augmentation

Optical Flow FlowNet

Object tracking MDNet

Nam, Hyeonseob, and Bohyung Han. “Learning multi-domain convolutional neural networks for visual tracking.” ICCV VOT Workshop (2015).

Object tracking MDNet

Object tracking MDNet Architecture

Domain-specific layers are used during training, one for each sequence, but are replaced by a single one at test time.

Object tracking MDNet Online update

MDNet is updated online at test time with hard negative mining, that is, selecting the negative samples with the highest positive scores.
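Hard negative mining as described here reduces to a sort on the classifier score. A minimal sketch, assuming the candidates are already known to be negatives and that `positive_score` is any callable returning the network's positive score for a sample (both names are hypothetical):

```python
def hard_negatives(negatives, positive_score, k):
    """Keep the k negative samples that the current model most confidently
    mistakes for the target."""
    ranked = sorted(negatives, key=positive_score, reverse=True)
    return ranked[:k]

# Toy scores standing in for the output of the tracking network.
scores = {"a": 0.1, "b": 0.9, "c": 0.6, "d": 0.3}
selected = hard_negatives(list(scores), scores.get, k=2)
```

Only the selected samples would then be used as negatives in the next online update.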

Object tracking FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. “Visual Tracking with Fully Convolutional Networks.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Object tracking FCNT

Focus on conv4-3 and conv5-3 of a VGG-16 network pre-trained for ImageNet image classification.

conv4-3 conv5-3

Object tracking FCNT Specialization

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence

Object tracking FCNT Localization

Although trained for image classification, feature maps in conv5-3 enable object localization… but are not discriminative enough to distinguish different objects of the same category.

Object tracking Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. “Object detectors emerge in deep scene CNNs.” ICLR 2015.

[Zhou et al., ICLR 2015] “Object detectors emerge in deep scene CNNs” [Slides from ReadCV]

Object tracking FCNT Localization

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation…

conv4-3 conv5-3

Object tracking FCNT Architecture

SNet=Specific Network (online update)

GNet=General Network (fixed)

Object tracking FCNT Results

ConvNets Software

Caffe: http://caffe.berkeleyvision.org/

Torch (Overfeat): http://torch.ch/

Theano: http://deeplearning.net/software/theano/

TensorFlow: https://www.tensorflow.org/

MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet/

CNTK (Microsoft): http://www.cntk.ai/

Seminar Series: Compacting ConvNets for End to End Learning

Tuesday, February 2, 4pm

D5-010, Campus Nord

ConvNets Learn more

78

Jose M. Álvarez

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

ConvNets Learn more

79

ConvNets Learn more

Online course

Deep Learning: Taking machine learning to the next level

80

ReadCV seminar: Friendly reviews of SoA papers

Spring 2016

Tuesdays at 11am

ConvNets Learn more

81

Barcelona Convolucionada

Deep Learning a l’abast de tothom (Deep Learning within everyone's reach)

Monday, February 1, 7pm, FIB, Campus Nord UPC

ConvNets Learn more

82

Grup d’estudi de machine learning Barcelona

Summer course

Deep Learning for Computer Vision

(2.5 ECTS for MSc & PhD)

July 4-8, 3-7pm

ConvNets Learn more

83

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, “Deep Learning: Past, Present & Future”

ConvNets Learn more

84

ConvNets Learn more

85

“Machine learning” sub-Reddit

ConvNets Learn more

86

ConvNets Learn more

87

Check profile requirements for summer internships (disclaimer: offered to PhD students by default).

Company | Avg. salary per hour | Avg. salary per month
Yahoo | $43 | ($43 x 160 = $6,880)
Apple | $37 | ($37 x 160 = $5,920)
Google | $29.54-$31.32 | $7,151
Facebook | $22.92 | $6,150-$7,378
Microsoft | $22.63 | $6,506-$7,171

Source: Glassdoor.com (internships in California. No stipends included)

ConvNets Learn more

88

Video: Cristian Canton’s talk “From Catalonia to America: notes on how to achieve a successful post-PhD career”, ACMCV 2015 & UPC

Li Fei-Fei, “How we’re teaching computers to understand pictures”, TEDTalks 2014

ConvNets Learn more

89

Jeremy Howard, “The wonderful and terrifying implications of computers that can learn”, TEDTalks 2014

ConvNets Learn more

90

ConvNets Learn more

91

Neil Lawrence, “OpenAI won’t benefit humanity without open data sharing” (The Guardian, 14/12/2015)

Is Computer Vision solved?

ConvNets Discussion

92

Sports: Do you know them?

93

ConvNets: Do you know them?

94

Antonio Torralba, MIT (former UPC)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

and MANY MORE I am missing in the page (apologies)

95

ConvNets Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding is available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution | Last deadlines (on 28/1/2016)
FI (Catalonia) | 22/09/2015
FPU (Spain) | 15/01/2016

Check our activity at https://imatge.upc.edu/web

Image Classification

97

Our past research

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., “Cultural Event Recognition with Visual ConvNets and Temporal Models”, in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

ChaLearn Workshop

Saliency Prediction

J. Pan and Giró-i-Nieto, X., “End-to-end Convolutional Network for Saliency Prediction”, in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

Our current research

LSUN Challenge

Sentiment Analysis

99

Our current research

[Slides]

CNN

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., “Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction”, in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015.

Our current research

Instance Search in Video

100

V.-T. Nguyen, Dinh-Le, D., Salvador, A., Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S. ichi, and Giró-i-Nieto, X., “NII-HITACHI-UIT at TRECVID 2015 Instance Search”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S. ichi, O’Connor, N., and Smeaton, A. F., “Insight DCU at TRECVID 2015”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

101

Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE.

Slides extracted from ReadCV seminar by Victor Campos

Recognition DeepVideo

Recognition DeepVideo Demo

Recognition DeepVideo Architectures

Unsupervised learning [Le et al. ’11]

Supervised learning [Karpathy et al. ’14]

Recognition DeepVideo Features

Recognition DeepVideo Multiscale

Recognition DeepVideo Results

18

Recognition

Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. “Learning spatiotemporal features with 3D convolutional networks.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015.

19

Recognition C3D

Recognition C3D Demo

K. Simonyan, A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition”, ICLR 2015.

Recognition C3D Spatial dimension

Spatial dimensions (XY) of the kernels used are fixed to 3x3, following Simonyan & Zisserman (ICLR 2015).

Recognition C3D Temporal dimension

3D ConvNets are more suitable for spatiotemporal feature learning than 2D ConvNets.

Temporal depth

2D ConvNets

A homogeneous architecture with small 3x3x3 convolution kernels in all layers is among the best-performing architectures for 3D ConvNets.
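What a 3x3x3 kernel does can be shown with a single-channel 3D convolution. This is only an illustrative sketch in plain Python ("valid" padding, stride 1, no bias); the function name `conv3d_valid` is hypothetical and it is not the C3D implementation.

```python
def conv3d_valid(volume, kernel):
    """volume: [T][H][W] stack of frames; kernel: [kt][kh][kw].
    The kernel slides over time as well as over the two spatial axes."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    return [[[sum(kernel[a][b][c] * volume[t + a][y + b][x + c]
                  for a in range(kt)
                  for b in range(kh)
                  for c in range(kw))
              for x in range(W - kw + 1)]
             for y in range(H - kh + 1)]
            for t in range(T - kt + 1)]

# Four 4x4 frames filled with ones, filtered by an all-ones 3x3x3 kernel.
clip = [[[1.0] * 4 for _ in range(4)] for _ in range(4)]
kernel = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
response = conv3d_valid(clip, kernel)
```

The output keeps a temporal axis (here of length 2), whereas a 2D convolution applied to the stacked frames would collapse time into a single map after the first layer.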

Recognition C3D Temporal dimension

No gain when varying the temporal depth across layers

Recognition C3D Temporal dimension

No gain when varying the temporal depth across layers

Recognition C3D Architecture

Feature vector

Recognition C3D Feature vector

Video sequence

16-frame-long clips

8-frame-long overlap

Recognition C3D Feature vector

[Figure: four 16-frame clips → Average → 4096-dim video descriptor → L2 norm → 4096-dim video descriptor]
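The pipeline in this figure (overlapping clips, averaging, L2 normalization) can be sketched as follows. The per-clip features would come from the network; here they are passed in as plain lists, and both function names are hypothetical.

```python
import math

def split_into_clips(num_frames, clip_len=16, overlap=8):
    """Return (start, end) frame indices of consecutive overlapping clips."""
    stride = clip_len - overlap
    return [(s, s + clip_len)
            for s in range(0, num_frames - clip_len + 1, stride)]

def video_descriptor(clip_features):
    """Average the per-clip feature vectors, then L2-normalize the result."""
    n = len(clip_features)
    dim = len(clip_features[0])
    mean = [sum(f[i] for f in clip_features) / n for i in range(dim)]
    norm = math.sqrt(sum(v * v for v in mean))
    return [v / norm for v in mean] if norm > 0 else mean

clips = split_into_clips(40)
# Two toy 2-dim clip features instead of 4096-dim network activations.
descriptor = video_descriptor([[3.0, 0.0], [3.0, 8.0]])
```

A 40-frame video yields four clips starting at frames 0, 8, 16 and 24, and the resulting descriptor has unit length regardless of how many clips were averaged.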

Recognition C3D Visualization

Based on Deconvnets by Zeiler and Fergus [ECCV 2014]. See [ReadCV Slides] for more details.

Recognition C3D Compactness

Convolutional 3D (C3D) features combined with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with state-of-the-art methods on 2 other benchmarks.

Recognition C3D Performance

Recognition C3D Software

Implementation by Michael Gygli (GitHub)

32

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

33

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

34

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

35

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

36

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

37

Recognition ImageNet Video


D5-010 Campus Nord

ConvNets Learn more

78

Jose M Aacutelvarez

Stanford course CS231n

Convolutional Neural Networks for Visual

Recognition

ConvNets Learn more

79

ConvNets Learn more

Online course

Deep Learning

Taking machine learning to the next

level

80

ReadCV seminarFriendly reviews of SoA papers

Spring 2016

Tuesdays at 11am

ConvNets Learn more

81

Barcelona Convolucionada

Deep Learning a lrsquoabast de tothom

Monday February 1 7pm FIB Campus Nord UPC

ConvNets Learn more

82

Grup drsquoestudi de machine learning Barcelona

Summer course

Deep Learning for Computer Vision

(25 ECTS for MSc amp Phd)

July 4-8 3-7pm

ConvNets Learn more

83

Deep learning methos for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho ldquoDeep Learning Past Present amp Futurerdquo

ConvNets Learn more

84

ConvNets Learn more

85

ldquoMachine learningrdquo sub-Reddit

ConvNets Learn more

86

ConvNets Learn more

87

Check profile requirements for Summer internship (disclaimer offered to Phd students by default)

Company Avg Salary hour Avg Salary month

Yahoo $43 ($43x160=$6880)

Apple $37 ($37x160=$5920)

Google $2954-$3132 $7151

Facebook $2292 $6150-$7378

Microsoft $2263 $6506-$7171

Source Glassdoorcom (internships in California No stipends included)

ConvNets Learn more

88

Video Cristian Cantonrsquos talk ldquoFrom Catalonia to America notes on how to achieve a successful post-Phd career rdquo ACMCV 2015 amp UPC

Li Fei-Fei ldquoHow wersquore teaching computers to understand picturesrdquo TEDTalks 2014

ConvNets Learn more

89

Jeremy Howard ldquoThe wonderful and terrifying implications of computers that can learnrdquo TEDTalks 2014

ConvNets Learn more

90

ConvNets Learn more

91

Neil Lawrence OpenAI wonrsquot benefit humanity without open data sharing (The Guardian 14122015)

Is Computer Vision solved

ConvNets Discussion

92

Sports Do you know them

93

ConvNets Do you know them

94

Antonio Torralba MIT(former UPC)

and MANY MORE I am missing in the page (apologies)

Oriol Vinyals Google(former UPC)

Jose M Aacutelvarez NICTA(former URL amp UAB)

Joan Bruna Berkeley(former UPC)

95

ConvNets Where you are studyingVisioCat dinner CVPR 2015

Considering a Phd at GPI-UPC Currently no direct funding available (check in the future)We can support your application to scholarships

External grant listings UPC UPF

Funding institution Last deadlines (on 2812016)

FI (Catalonia) 22092015

FPU (Spain) 15012016

Check our activity at httpsimatgeupceduweb 96

Image Classification

97

Our past research

A Salvador Zeppelzauer M Manchon-Vizuete D Calafell-Oroacutes A and Giroacute-i-Nieto X ldquoCultural Event Recognition with Visual ConvNets and Temporal Modelsrdquo in CVPR ChaLearn Looking at People Workshop 2015 2015 [slides]

ChaLearn Worshop

Saliency Prediction

J Pan and Giroacute-i-Nieto X ldquoEnd-to-end Convolutional Network for Saliency Predictionrdquo in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops Boston MA (USA) 2015 [Slides] 98

Our current research

LSUN Challenge

Sentiment Analysis

99

Our current research

[Slides]

CNN

V Campos Salvador A Jou B and Giroacute-i-Nieto X ldquoDiving Deep into Sentiment Understanding Fine-tuned CNNs for Visual Sentiment Predictionrdquo in 1st International Workshop on Affect and Sentiment in Multimedia Brisbane Australia 2015

Our current research

Instance Search in Video

100

V - T Nguyen -Dinh-Le D Salvador A -Zhu C Nguyen D - L Tran M - T Duc T Ngo Duong D Anh Satoh S ichi and Giroacute-i-Nieto X ldquoNII-HITACHI-UIT at TRECVID 2015 Instance Searchrdquo in TRECVID 2015 Workshop Gaithersburg MD USA 2015

K McGuinness Mohedano E Salvador A Zhang Z X Marsden M Wang P Jargalsaikhan I Antony J Giroacute-i-Nieto X Satoh S ichi OConnor N and Smeaton A F ldquoInsight DCU at TRECVID 2015rdquo in TRECVID 2015 Workshop Gaithersburg MD USA 2015

Thank you

Slides available on and

httpsimatgeupceduwebpeoplexavier-giro

httpbitsearchblogspotcom

httpstwittercomDocXavi

httpswwwfacebookcomProfessorXavi

xaviergiroupcedu

101

Page 13: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE.

Recognition: DeepVideo: Demo

Recognition: DeepVideo: Architectures

Unsupervised learning [Le et al. '11]. Supervised learning [Karpathy et al. '14].

Recognition: DeepVideo: Features

Recognition: DeepVideo: Multiscale

Recognition: DeepVideo: Results

Recognition

Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. Learning spatiotemporal features with 3D convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015.

Recognition: C3D

Recognition: C3D: Demo

K. Simonyan, A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. ICLR 2015.

Recognition: C3D: Spatial dimension

The spatial dimensions (XY) of the kernels are fixed to 3x3, following Simonyan & Zisserman (ICLR 2015).

Recognition: C3D: Temporal dimension

3D ConvNets are more suitable for spatiotemporal feature learning than 2D ConvNets.

[Figure labels: temporal depth; 2D ConvNets]

A homogeneous architecture with small 3x3x3 convolution kernels in all layers is among the best-performing architectures for 3D ConvNets.
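To make the 3x3x3 kernel concrete, here is a minimal NumPy sketch of one 3D convolution over a single-channel clip. It is an illustration only: the toy clip size, the averaging kernel and the function name `conv3d_valid` are assumptions, and a real C3D layer also has many input and output channels, padding and a bias.

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D cross-correlation of a (T, H, W) clip with a (t, h, w) kernel."""
    t, h, w = kernel.shape
    # Every t x h x w window of the clip, shape (T-t+1, H-h+1, W-w+1, t, h, w).
    windows = np.lib.stride_tricks.sliding_window_view(clip, (t, h, w))
    return np.einsum("abcijk,ijk->abc", windows, kernel)

clip = np.random.default_rng(0).random((16, 8, 8))   # 16 frames of 8x8 pixels (toy size)
kernel = np.ones((3, 3, 3)) / 27.0                    # hypothetical 3x3x3 averaging kernel
out = conv3d_valid(clip, kernel)
print(out.shape)  # (14, 6, 6)
```

The kernel slides over time as well as over space, so each output value mixes 3 consecutive frames; a 2D kernel would mix none.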

Recognition: C3D: Temporal dimension

No gain when varying the temporal depth across layers.

Recognition: C3D: Architecture

Feature vector

Recognition: C3D: Feature vector

Video sequence: 16-frame-long clips with an 8-frame-long overlap.
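The clip sampling above can be sketched as follows. The function name `split_into_clips` is hypothetical, and dropping trailing frames that do not fill a whole clip is an assumption of this sketch.

```python
def split_into_clips(num_frames, clip_len=16, overlap=8):
    """Return (start, end) frame index pairs of overlapping clips; end is exclusive."""
    stride = clip_len - overlap
    return [(s, s + clip_len) for s in range(0, num_frames - clip_len + 1, stride)]

clips = split_into_clips(40)
print(clips)  # [(0, 16), (8, 24), (16, 32), (24, 40)]
```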

Recognition: C3D: Feature vector

[Figure: four 16-frame clips -> average -> 4096-dim video descriptor -> L2 norm -> 4096-dim video descriptor]
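A minimal sketch of the pooling shown in the figure, assuming each 16-frame clip has already been mapped to a 4096-dim feature vector (random values stand in for real C3D activations; `video_descriptor` is a hypothetical name):

```python
import numpy as np

def video_descriptor(clip_features):
    """Average per-clip feature vectors and L2-normalize the result."""
    mean = np.asarray(clip_features, dtype=float).mean(axis=0)
    norm = np.linalg.norm(mean)
    return mean / norm if norm > 0 else mean

clip_features = np.random.default_rng(0).random((4, 4096))   # 4 clips, 4096-dim each
descriptor = video_descriptor(clip_features)
print(descriptor.shape)  # (4096,)
```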

Recognition: C3D: Visualization

Based on Deconvnets by Zeiler and Fergus [ECCV 2014]. See [ReadCV Slides] for more details.

Recognition: C3D: Compactness

Convolutional 3D (C3D) combined with a simple linear classifier outperforms state-of-the-art methods on 4 different benchmarks and is comparable with state-of-the-art methods on 2 other benchmarks.

Recognition: C3D: Performance

Recognition: C3D: Software

Implementation by Michael Gygli (GitHub).

Recognition: ImageNet Video

[ILSVRC 2015 Slides and videos]

Recognition: ImageNet Video

Kai Kang et al., Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

Optical Flow

Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE.

Optical Flow: Small vs Large

Optical Flow

Classic approach: rigid matching of HoG or SIFT descriptors.

Deep Matching: allow each subpatch to move independently in a limited range depending on its size.

Optical Flow: Deep Matching

Optical Flow: 2D correlation

Source: Matlab R2015b documentation for normxcorr2, by MathWorks.

[Figure labels: image; sub-image; offset of the sub-image with respect to the image [0,0]]
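The idea of locating a sub-image by 2D correlation can be sketched in NumPy. This is not the Matlab function itself: unlike normxcorr2 it only evaluates positions where the sub-image fits entirely inside the image, and the function name and toy data are assumptions.

```python
import numpy as np

def normalized_cross_correlation(image, template):
    """Correlation coefficient of the template with every same-size window of the image."""
    th, tw = template.shape
    windows = np.lib.stride_tricks.sliding_window_view(image, (th, tw))
    win_zero = windows - windows.mean(axis=(-2, -1), keepdims=True)
    tpl_zero = template - template.mean()
    num = (win_zero * tpl_zero).sum(axis=(-2, -1))
    den = np.sqrt((win_zero ** 2).sum(axis=(-2, -1)) * (tpl_zero ** 2).sum())
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

rng = np.random.default_rng(0)
image = rng.random((32, 32))
sub_image = image[10:18, 5:13]          # 8x8 crop taken at row 10, column 5
ncc = normalized_cross_correlation(image, sub_image)
offset = np.unravel_index(np.argmax(ncc), ncc.shape)   # position of the correlation peak
```

The peak of the correlation map gives the offset of the sub-image, here the row and column where the crop was taken.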

Optical Flow: Deep Matching

Instead of pre-trained filters, a convolution is defined between each patch of the reference image and the target image. As a result, a correlation map is generated for each reference patch.
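A rough sketch of "one correlation map per reference patch", assuming raw pixel patches compared by cosine similarity (the paper works on descriptors rather than raw pixels; the function name and sizes are illustrative):

```python
import numpy as np

def patch_correlation_maps(reference, target, patch=4):
    """One correlation map per non-overlapping patch of the reference image."""
    def unit(x):
        # L2-normalize each patch so that correlations are cosine similarities.
        n = np.linalg.norm(x, axis=(-2, -1), keepdims=True)
        return x / np.maximum(n, 1e-12)

    view = np.lib.stride_tricks.sliding_window_view
    ref_patches = unit(view(reference, (patch, patch))[::patch, ::patch])
    tgt_windows = unit(view(target, (patch, patch)))
    # maps[i, j, y, x]: similarity of reference patch (i, j) with the target window at (y, x).
    return np.einsum("ijab,yxab->ijyx", ref_patches, tgt_windows)

rng = np.random.default_rng(0)
reference = rng.random((16, 16))
target = np.roll(reference, shift=(2, 3), axis=(0, 1))   # reference shifted by (2, 3)
maps = patch_correlation_maps(reference, target)
peak = np.unravel_index(np.argmax(maps[1, 1]), maps[1, 1].shape)
```

Reference patch (1, 1) starts at pixel (4, 4), so after the (2, 3) shift its map peaks at (6, 7).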

Optical Flow: Deep Matching

[Figure labels: the most discriminative response map; the least discriminative response map]

Optical Flow: Deep Matching

Key idea: build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search.

[Figure: pyramid levels of 4x4, 8x8, 16x16 and 32x32 patches; bottom-up extraction (BU); top-down matching (TD)]

Optical Flow: Deep Matching (BU)

Optical Flow: Deep Matching (TD)
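One bottom-up step of the pyramid can be sketched as below: the maps of four child patches are combined into the map of their parent patch. This is a simplified illustration, not the authors' implementation: the 3x3 max filter, the plain average and the names `max_filter3` and `aggregate` are assumptions, and the original method includes further steps (such as subsampling) that are omitted here.

```python
import numpy as np

def max_filter3(m):
    """3x3 max filter (same-size output): lets each child move by one pixel."""
    padded = np.pad(m, 1, mode="constant", constant_values=-np.inf)
    return np.lib.stride_tricks.sliding_window_view(padded, (3, 3)).max(axis=(-2, -1))

def aggregate(children, size):
    """Combine the maps of four child patches (2x2 layout, each `size` pixels wide)
    into the map of their parent patch, indexed by the parent's top-left corner."""
    c00, c01, c10, c11 = (max_filter3(c) for c in children)
    h, w = c00.shape[0] - size, c00.shape[1] - size
    return (c00[:h, :w] + c01[:h, size:size + w]
            + c10[size:size + h, :w] + c11[size:size + h, size:size + w]) / 4.0

size = 4
children = [np.zeros((20, 20)) for _ in range(4)]
# Peaks consistent with a parent patch at (5, 6); the last child is displaced by one pixel.
for child, (y, x) in zip(children, [(5, 6), (5, 10), (9, 6), (10, 11)]):
    child[y, x] = 1.0
parent = aggregate(children, size)
print(parent.shape)  # (16, 16)
```

Thanks to the max filter, the parent still responds fully at (5, 6) even though one child moved by a pixel, which is the "each subpatch moves independently in a limited range" behaviour.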

Optical Flow: Deep Matching (TD)

Each local maximum in the top layer corresponds to a shift of one of the biggest (32x32) patches. If we focus on a local maximum, we can retrieve the corresponding responses one scale below and focus on the shifts of the sub-patches that generated it.

Optical Flow: Deep Matching

[Figure labels: ground truth; Dense HOG [Brox & Malik 2011]; Deep Matching]

Optical Flow

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).

Optical Flow: FlowNet

End-to-end supervised learning of optical flow.

Optical Flow: FlowNet (contracting)

Option A: Stack both input images together and feed them through a generic network.
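Stacking simply means concatenating the two frames along the channel axis. A minimal sketch, with an arbitrary frame size chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
frame1 = rng.random((384, 512, 3))   # first RGB frame (height, width, channels)
frame2 = rng.random((384, 512, 3))   # second RGB frame

# Stack along the channel axis: the network sees a single 6-channel input.
stacked = np.concatenate([frame1, frame2], axis=-1)
print(stacked.shape)  # (384, 512, 6)
```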

Optical Flow: FlowNet (contracting)

Option B: Create two separate, yet identical, processing streams for the two images and combine them at a later stage.

Correlation layer: convolution of data patches from the layers to combine.
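A correlation layer can be sketched as a dot product between the feature vector at each position of the first map and the feature vectors of the second map at a set of displacements. The version below is an assumption-laden simplification (single-pixel patches, a small displacement range, no stride, hypothetical function name), not the FlowNet implementation.

```python
import numpy as np

def correlation_layer(f1, f2, max_disp=2):
    """Correlate (H, W, C) feature maps f1 and f2 over displacements in [-max_disp, max_disp]."""
    h, w, _ = f1.shape
    d = max_disp
    padded = np.pad(f2, ((d, d), (d, d), (0, 0)))   # zero padding around f2
    out = np.empty((h, w, (2 * d + 1) ** 2))
    k = 0
    for dy in range(-d, d + 1):
        for dx in range(-d, d + 1):
            shifted = padded[d + dy:d + dy + h, d + dx:d + dx + w]
            # Dot product over channels between f1(y, x) and f2(y + dy, x + dx).
            out[..., k] = (f1 * shifted).sum(axis=-1)
            k += 1
    return out

rng = np.random.default_rng(0)
f1 = rng.standard_normal((16, 16, 8))
f2 = np.roll(f1, shift=1, axis=0)        # f2(y + 1, x) = f1(y, x)
corr = correlation_layer(f1, f2, max_disp=2)
best = int(corr[2:-2, 2:-2].sum(axis=(0, 1)).argmax())   # strongest displacement channel
```

Each of the 25 output channels corresponds to one displacement (dy, dx); the channel with the strongest response here is the one for dy = +1, dx = 0, matching the shift used to build `f2`.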

Optical Flow: FlowNet (expanding)

Upconvolutional layers: unpooling of feature maps + convolution. Upconvolved feature maps are concatenated with the corresponding map from the contracting part.
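One expanding step can be sketched as unpooling, convolution and concatenation with the skip connection. Nearest-neighbor unpooling, the channel counts, the random weights and the function names are all assumptions for illustration, and the nonlinearity is omitted.

```python
import numpy as np

def unpool2x(x):
    """Nearest-neighbor 2x unpooling of an (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def conv3x3_same(x, weights):
    """3x3 convolution with zero padding; weights has shape (3, 3, C_in, C_out)."""
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    # windows has shape (H, W, C_in, 3, 3).
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3), axis=(0, 1))
    return np.einsum("hwcij,ijco->hwo", windows, weights)

rng = np.random.default_rng(0)
coarse = rng.random((4, 4, 8))            # coarse map from the expanding part
skip = rng.random((8, 8, 16))             # matching map from the contracting part
weights = rng.standard_normal((3, 3, 8, 4))

up = conv3x3_same(unpool2x(coarse), weights)    # upconvolved map, (8, 8, 4)
merged = np.concatenate([up, skip], axis=-1)    # concatenated map, (8, 8, 20)
```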

Optical Flow: FlowNet

Since existing ground truth datasets are not sufficiently large to train a ConvNet, a synthetic Flying Chairs dataset is generated... and augmented (translation, rotation and scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma and color).

ConvNets trained on this unrealistic data generalize well to existing datasets such as Sintel and KITTI.

[Figure label: data augmentation]
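A sketch of the photometric part of such an augmentation (noise, brightness, gamma) on an image pair. The parameter ranges and the function name are illustrative assumptions, not the values used by the authors. Geometric transformations (translation, rotation, scaling) are left out because they would also require transforming the ground-truth flow.

```python
import numpy as np

def photometric_augment(img1, img2, rng):
    """Apply the same random brightness and gamma change, plus independent noise,
    to a pair of float images with values in [0, 1]."""
    brightness = rng.normal(0.0, 0.2)      # additive brightness shift (assumed range)
    gamma = rng.uniform(0.7, 1.5)          # gamma exponent (assumed range)
    out = []
    for img in (img1, img2):
        noisy = img + rng.normal(0.0, 0.02, size=img.shape)   # additive Gaussian noise
        out.append(np.clip(noisy + brightness, 0.0, 1.0) ** gamma)
    return out

rng = np.random.default_rng(0)
img1 = rng.random((32, 32, 3))
img2 = rng.random((32, 32, 3))
aug1, aug2 = photometric_augment(img1, img2, rng)
```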

Optical Flow: FlowNet

Object tracking: MDNet

Nam, Hyeonseob, and Bohyung Han. Learning multi-domain convolutional neural networks for visual tracking. ICCV VOT Workshop (2015).

Object tracking: MDNet: Architecture

Domain-specific layers are used during training for each sequence, but are replaced by a single one at test time.

Object tracking: MDNet: Online update

MDNet is updated online at test time with hard negative mining, that is, selecting the negative samples with the highest positive score.
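Hard negative mining as described above reduces to ranking the negative samples by their positive score and keeping the top ones. A minimal sketch with made-up scores (the function name and the numbers are hypothetical):

```python
import numpy as np

def hard_negatives(positive_scores, num_hard):
    """Indices of the negative samples that the classifier scores highest as positive."""
    order = np.argsort(positive_scores)[::-1]   # descending positive score
    return order[:num_hard]

scores = np.array([0.05, 0.80, 0.30, 0.95, 0.10])   # positive scores of 5 negative samples
print(hard_negatives(scores, 2))  # [3 1]
```

The selected samples are the ones the current model confuses most with the target, so they are the most informative for the next update.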

Object tracking: FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. Visual Tracking with Fully Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Focus on conv4-3 and conv5-3 of the VGG-16 network pre-trained for ImageNet image classification.

[Figure labels: conv4-3; conv5-3]

Object tracking: FCNT: Specialization

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence.

Object tracking: FCNT: Localization

Although trained for image classification, feature maps in conv5-3 enable object localization... but they are not discriminative enough to distinguish different objects of the same category.

Object tracking: Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Object detectors emerge in deep scene CNNs. ICLR 2015. [Slides from ReadCV]

Object tracking: FCNT: Localization

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation...

[Figure labels: conv4-3; conv5-3]

Object tracking: FCNT: Architecture

SNet = Specific Network (online update)

GNet = General Network (fixed)

Object tracking: FCNT: Results

ConvNets: Software

Caffe: http://caffe.berkeleyvision.org

Torch (Overfeat): http://torch.ch

Theano: http://deeplearning.net/software/theano

TensorFlow: https://www.tensorflow.org

MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet

CNTK (Microsoft): http://www.cntk.ai

ConvNets: Learn more

Seminar Series: Compacting ConvNets for End to End Learning. Tuesday, February 2, 4pm. D5-010, Campus Nord.

Jose M. Álvarez

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

Online course: Deep Learning. Taking machine learning to the next level.

ReadCV seminar: friendly reviews of SoA papers. Spring 2016, Tuesdays at 11am.

Barcelona Convolucionada: "Deep Learning a l'abast de tothom" (Deep Learning within everyone's reach). Monday, February 1, 7pm, FIB, Campus Nord UPC.

Grup d'estudi de machine learning Barcelona (Barcelona machine learning study group)

Summer course: Deep Learning for Computer Vision (2.5 ECTS for MSc & PhD). July 4-8, 3-7pm.

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, "Deep Learning: Past, Present & Future"

"Machine learning" sub-Reddit

Check profile requirements for summer internships (disclaimer: offered to PhD students by default).

Company      Avg. salary / hour    Avg. salary / month
Yahoo        $43                   ($43 x 160 = $6,880)
Apple        $37                   ($37 x 160 = $5,920)
Google       $29.54-$31.32         $7,151
Facebook     $22.92                $6,150-$7,378
Microsoft    $22.63                $6,506-$7,171

Source: Glassdoor.com (internships in California; no stipends included).

ConvNets: Learn more

Video: Cristian Canton's talk, "From Catalonia to America: notes on how to achieve a successful post-PhD career", ACMCV 2015 & UPC.

Li Fei-Fei, "How we're teaching computers to understand pictures", TEDTalks 2014.

Jeremy Howard, "The wonderful and terrifying implications of computers that can learn", TEDTalks 2014.

Neil Lawrence, "OpenAI won't benefit humanity without open data sharing" (The Guardian, 14/12/2015).

ConvNets: Discussion

Is Computer Vision solved?

Sports: Do you know them?

ConvNets: Do you know them?

Antonio Torralba, MIT (former UPC)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

...and MANY MORE I am missing on the page (apologies).

ConvNets: Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding is available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution    Last deadline (as of 28/1/2016)
FI (Catalonia)         22/09/2015
FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web

Our past research

Image Classification

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., "Cultural Event Recognition with Visual ConvNets and Temporal Models", in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

ChaLearn Workshop

Our current research

Saliency Prediction

J. Pan and Giró-i-Nieto, X., "End-to-end Convolutional Network for Saliency Prediction", in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

LSUN Challenge

Our current research

Sentiment Analysis

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., "Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction", in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015. [Slides]

Our current research

Instance Search in Video

V.-T. Nguyen, -Dinh-Le, D., Salvador, A., -Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S. 'ichi, and Giró-i-Nieto, X., "NII-HITACHI-UIT at TRECVID 2015 Instance Search", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S. 'ichi, O'Connor, N., and Smeaton, A. F., "Insight DCU at TRECVID 2015", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

Page 14: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE 14

Recognition DeepVideo Architectures

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE 15

Unsupervised learning [Le at alrsquo11] Supervised learning [Karpathy et alrsquo14]

Recognition DeepVideo Features

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE 16

Recognition DeepVideo Multiscale

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE 17

Recognition DeepVideo Results

18

Recognition

Figure Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

19

Recognition C3D

Figure Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

20Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Demo

21K Simonyan A Zisserman Very Deep Convolutional Networks for Large-Scale Image Recognition ICLR 2015

Recognition C3D Spatial dimensionSpatial dimensions (XY) of the used kernels are fixed to 3x3 following Symonian amp Zisserman (ICLR 2015)

22Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Temporal dimension3D ConvNets are more suitable for spatiotemporal feature learning compared to 2D ConvNets

Temporal depth

2D ConvNets

A homogeneous architecture with small 3x3x3 convolution kernels in all layers is among the best performing architectures for 3D ConvNets
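To make the 3x3x3 kernel concrete, here is a minimal NumPy sketch of a single 3D convolution over a toy clip. It is an illustration only, not the C3D implementation: the clip size, the averaging weights and the function name `conv3d_valid` are assumptions chosen for the example.

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D cross-correlation of a (T, H, W) clip with a (kt, kh, kw) kernel."""
    T, H, W = clip.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                # The kernel spans 3 frames AND a 3x3 spatial window at once
                out[t, y, x] = np.sum(clip[t:t + kt, y:y + kh, x:x + kw] * kernel)
    return out

clip = np.random.rand(16, 8, 8)      # toy single-channel clip: 16 frames of 8x8 pixels
kernel = np.ones((3, 3, 3)) / 27.0   # 3x3x3 averaging kernel (hypothetical weights)
features = conv3d_valid(clip, kernel)
print(features.shape)                # (14, 6, 6)
```

Unlike a 2D convolution applied frame by frame, the output keeps a temporal axis, so motion information can propagate to later layers.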

Recognition C3D Temporal dimension


No gain when varying the temporal depth across layers

Recognition C3D Temporal dimension


Recognition C3D Architecture

Feature vector

Recognition C3D Feature vector

Video sequence

16-frame clips

8-frame overlap
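The clip layout above can be sketched as a list of start indices. This is a minimal sketch assuming that trailing frames which do not fill a whole clip are simply dropped; the function name `clip_starts` is hypothetical.

```python
def clip_starts(num_frames, clip_len=16, overlap=8):
    """Start index of every clip_len-frame clip, consecutive clips sharing `overlap` frames."""
    stride = clip_len - overlap
    return list(range(0, num_frames - clip_len + 1, stride))

print(clip_starts(40))  # [0, 8, 16, 24]
```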


Recognition C3D Feature vector

[Figure: each 16-frame clip is passed through C3D; the resulting clip features are averaged and L2-normalized into a single 4096-dim video descriptor]
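The averaging and L2 normalization step can be sketched as follows. The random clip features stand in for real network activations, and `video_descriptor` is a hypothetical helper name.

```python
import numpy as np

def video_descriptor(clip_features):
    """Average per-clip features of shape (num_clips, dim) and L2-normalize the result."""
    mean = np.mean(clip_features, axis=0)
    return mean / np.linalg.norm(mean)

clip_features = np.random.rand(4, 4096)   # placeholder for 4 clips x 4096-dim features
desc = video_descriptor(clip_features)
print(desc.shape)                          # (4096,)
```

After normalization the descriptor has unit length, so videos with different numbers of clips remain comparable under a linear classifier.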

Recognition C3D Visualization

Based on Deconvnets by Zeiler and Fergus [ECCV 2014] - See [ReadCV Slides] for more details


Recognition C3D Compactness

Convolutional 3D (C3D) features combined with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with state-of-the-art methods on the other 2 benchmarks

Recognition C3D Performance

Recognition C3D Software

Implementation by Michael Gygli (GitHub)


Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]


Recognition ImageNet Video

Kai Kang et al., "Object Detection in Videos with TubeLets and Multi-Context Cues" (ILSVRC 2015) [video] [poster]


Optical Flow

Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE.

Optical Flow Small vs Large


Optical Flow

Classic approach: Rigid matching of HoG or SIFT descriptors

Deep Matching: Allow each subpatch to move independently in a limited range depending on its size

Optical Flow Deep Matching

Source: Matlab R2015b documentation for normxcorr2 by MathWorks

Optical Flow 2D correlation

Image

Sub-Image

Offset of the sub-image with respect to the image [0,0]
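A minimal NumPy sketch of the same idea: slide the sub-image over the image, score each position with normalized cross-correlation, and read the offset from the peak. Unlike MATLAB's normxcorr2, this hypothetical `normxcorr2_valid` only scores positions where the sub-image fits entirely inside the image.

```python
import numpy as np

def normxcorr2_valid(template, image):
    """Normalized cross-correlation of a 2D template with every fully-overlapping window."""
    th, tw = template.shape
    H, W = image.shape
    t = template - template.mean()
    t_norm = np.sqrt(np.sum(t ** 2))
    out = np.zeros((H - th + 1, W - tw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            win = image[y:y + th, x:x + tw]
            w = win - win.mean()
            denom = t_norm * np.sqrt(np.sum(w ** 2))
            out[y, x] = np.sum(t * w) / denom if denom > 0 else 0.0
    return out

rng = np.random.default_rng(0)
image = rng.random((32, 32))
sub_image = image[10:18, 5:13]            # sub-image cut at row 10, column 5
scores = normxcorr2_valid(sub_image, image)
offset = tuple(int(i) for i in np.unravel_index(np.argmax(scores), scores.shape))
print(offset)                             # (10, 5)
```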

Instead of pre-trained filters, a convolution is defined between each patch of the reference image and the target image.

As a result, a correlation map is generated for each reference patch.
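As a rough illustration, each reference patch can be used as a filter over the target image. This sketch works on raw pixel values with cosine similarity, which is an assumption made for brevity rather than the descriptor used in the paper; `patch_correlation_maps` is a hypothetical name.

```python
import numpy as np

def patch_correlation_maps(reference, target, patch=4):
    """One correlation map per non-overlapping patch of `reference`, scored over `target`."""
    H, W = reference.shape
    th, tw = target.shape
    maps = {}
    for py in range(0, H - patch + 1, patch):
        for px in range(0, W - patch + 1, patch):
            kernel = reference[py:py + patch, px:px + patch]
            kernel = kernel / (np.linalg.norm(kernel) + 1e-8)
            out = np.zeros((th - patch + 1, tw - patch + 1))
            for y in range(out.shape[0]):
                for x in range(out.shape[1]):
                    win = target[y:y + patch, x:x + patch]
                    # Cosine similarity between the reference patch and this target window
                    out[y, x] = np.sum(kernel * win) / (np.linalg.norm(win) + 1e-8)
            maps[(py, px)] = out
    return maps

rng = np.random.default_rng(0)
reference = rng.random((16, 16))
target = np.roll(reference, shift=(2, 3), axis=(0, 1))   # reference moved 2 rows down, 3 columns right
maps = patch_correlation_maps(reference, target)
peak = np.unravel_index(np.argmax(maps[(4, 4)]), maps[(4, 4)].shape)
print(len(maps), tuple(int(i) for i in peak))            # 16 (6, 7)
```

The peak of the map for the patch at (4, 4) sits at (6, 7), that is, the original position plus the (2, 3) displacement.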

Optical Flow Deep Matching


Optical Flow Deep Matching

The most discriminative response map

The least discriminative response map

Key idea: Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching (TD)

Bottom-up extraction (BU)


Optical Flow Deep Matching (BU)
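One bottom-up step can be sketched as follows: the correlation map of a 2n x 2n patch is built from the maps of its four n x n sub-patches, each allowed to move by one pixel through a 3x3 max filter. This is a simplified sketch under stated assumptions; the published method also subsamples and rectifies the maps, which is omitted here, and the names `max_filter3` and `aggregate` are hypothetical.

```python
import numpy as np

def max_filter3(m):
    """3x3 max filter with stride 1: lets a sub-patch move by +-1 pixel."""
    p = np.pad(m, 1, mode="edge")
    H, W = m.shape
    return np.max([p[dy:dy + H, dx:dx + W] for dy in range(3) for dx in range(3)], axis=0)

def aggregate(children, n):
    """Combine the maps of the 4 sub-patches (top-left, top-right, bottom-left, bottom-right).

    Entry [y, x] of a child map is its response with the sub-patch corner at (y, x).
    """
    offsets = [(0, 0), (0, n), (n, 0), (n, n)]
    H, W = children[0].shape
    out_h, out_w = H - n, W - n
    parent = np.zeros((out_h, out_w))
    for child, (oy, ox) in zip(children, offsets):
        pooled = max_filter3(child)
        parent += pooled[oy:oy + out_h, ox:ox + out_w]
    return parent / 4.0

# Toy maps for four 4x4 sub-patches; the top-right one is displaced by one extra pixel
children = [np.zeros((13, 13)) for _ in range(4)]
for child, pos in zip(children, [(2, 2), (2, 7), (6, 2), (6, 6)]):
    child[pos] = 1.0
parent = aggregate(children, n=4)
print(parent.shape, parent[2, 2])   # (9, 9) 1.0
```

The parent still scores 1.0 at (2, 2) even though one sub-patch is off by a pixel, which is the non-rigid tolerance that distinguishes this scheme from rigid descriptor matching.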


Optical Flow Deep Matching (TD)

Each local maximum in the top layer corresponds to a shift of one of the biggest (32x32) patches.

If we focus on a local maximum, we can retrieve the corresponding responses one scale below and focus on the shifts of the sub-patches that generated it.
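A single top-down step could look like the sketch below: starting from a maximum of the parent map, each sub-patch is located by taking the best response in a 3x3 neighbourhood around its expected position one scale below. The function name `backtrack`, the toy maps and the one-pixel search range are assumptions for illustration.

```python
import numpy as np

def backtrack(children, n, p):
    """Given a parent maximum at p = (y, x), return the best position of each n x n sub-patch."""
    py, px = p
    offsets = [(0, 0), (0, n), (n, 0), (n, n)]
    result = []
    for child, (oy, ox) in zip(children, offsets):
        cy, cx = py + oy, px + ox                  # expected position if the match were rigid
        y0, x0 = max(cy - 1, 0), max(cx - 1, 0)
        window = child[y0:cy + 2, x0:cx + 2]       # 3x3 neighbourhood (clipped at the border)
        dy, dx = np.unravel_index(np.argmax(window), window.shape)
        result.append((y0 + int(dy), x0 + int(dx)))
    return result

# Toy sub-patch maps: the top-right sub-patch is displaced by one pixel
children = [np.zeros((13, 13)) for _ in range(4)]
for child, pos in zip(children, [(2, 2), (2, 7), (6, 2), (6, 6)]):
    child[pos] = 1.0
print(backtrack(children, n=4, p=(2, 2)))   # [(2, 2), (2, 7), (6, 2), (6, 6)]
```

Repeating this step down to the smallest patches yields the final set of correspondences.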


Optical Flow Deep Matching (TD)


Optical Flow Deep Matching


Ground truth

Dense HOG [Brox & Malik 2011]

Deep Matching

Optical Flow Deep Matching


Optical Flow Deep Matching

Optical Flow

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D., and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).

Optical Flow FlowNet


Optical Flow FlowNet

End-to-end supervised learning of optical flow

Optical Flow FlowNet (contracting)

Option A: Stack both input images together and feed them through a generic network
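Stacking simply means concatenating the two frames along the channel axis, so the first convolution sees six input channels. A minimal sketch with random placeholder frames (the 64x64 size is an arbitrary assumption):

```python
import numpy as np

frame1 = np.random.rand(3, 64, 64)   # first RGB frame, (channels, height, width)
frame2 = np.random.rand(3, 64, 64)   # second RGB frame
stacked = np.concatenate([frame1, frame2], axis=0)
print(stacked.shape)                 # (6, 64, 64)
```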

Optical Flow FlowNet (contracting)

Option B: Create two separate yet identical processing streams for the two images and combine them at a later stage

Optical Flow FlowNet (contracting)

Correlation layer: Convolution of data patches from the layers to combine
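A minimal sketch of such a correlation layer, assuming single-pixel patches and a small maximum displacement (both are simplifications chosen for the example, and `correlation_layer` is a hypothetical name). For every location in the first feature map it computes the dot product with the second feature map at each displacement in the search window.

```python
import numpy as np

def correlation_layer(f1, f2, max_disp=2):
    """f1, f2: (C, H, W) feature maps. Returns ((2d+1)^2, H, W) correlation scores.

    Output channel k holds <f1[:, y, x], f2[:, y + dy, x + dx]> for the k-th displacement.
    """
    C, H, W = f1.shape
    d = max_disp
    padded = np.pad(f2, ((0, 0), (d, d), (d, d)))   # zero padding outside the image
    out = []
    for dy in range(-d, d + 1):
        for dx in range(-d, d + 1):
            shifted = padded[:, d + dy:d + dy + H, d + dx:d + dx + W]
            out.append(np.sum(f1 * shifted, axis=0))
    return np.stack(out)

rng = np.random.default_rng(0)
f1 = rng.standard_normal((8, 10, 10))
f1 /= np.linalg.norm(f1, axis=0, keepdims=True)     # unit-length feature vector per pixel
f2 = np.roll(f1, shift=(1, 2), axis=(1, 2))         # second map: content moved by (1, 2)
corr = correlation_layer(f1, f2)
best = int(np.argmax(corr[:, 5, 5]))
print(corr.shape, divmod(best, 5))                  # (25, 10, 10) (3, 4)
```

Index (3, 4) in the 5x5 displacement grid corresponds to (dy, dx) = (1, 2), the shift that was applied.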

Optical Flow FlowNet (expanding)

Upconvolutional layers: Unpooling feature maps + convolution

Upconvolved feature maps are concatenated with the corresponding map from the contractive part
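One expanding step can be sketched with nearest-neighbour unpooling, a 1x1 convolution and a channel-wise concatenation. The 1x1 kernel, the random weights and all shapes are assumptions made to keep the example short; they are not the layer sizes of the network.

```python
import numpy as np

def unpool2x(x):
    """Nearest-neighbour 2x unpooling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def conv1x1(x, weights):
    """1x1 convolution: mixes channels with a (C_out, C_in) weight matrix."""
    return np.tensordot(weights, x, axes=([1], [0]))

coarse = np.random.rand(8, 4, 4)     # deep, low-resolution features
skip = np.random.rand(4, 8, 8)       # matching map from the contractive part
weights = np.random.rand(4, 8)       # hypothetical learned weights

up = conv1x1(unpool2x(coarse), weights)       # (4, 8, 8)
merged = np.concatenate([up, skip], axis=0)   # (8, 8, 8)
print(up.shape, merged.shape)
```

The concatenation gives the next layer both the coarse, high-level estimate and the fine local detail from the earlier layer.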

Optical Flow FlowNet

Since existing ground truth datasets are not sufficiently large to train a Convnet, a synthetic Flying Chairs dataset is generated... and augmented (translation, rotation and scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma and color)

Convnets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI

Data augmentation
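The photometric part of this augmentation can be sketched as below. All parameter ranges are hypothetical, and `augment` is an illustrative name. Geometric transformations (translation, rotation, scaling) are left out because they would also have to be applied consistently to the ground truth flow.

```python
import numpy as np

def augment(image, rng):
    """Photometric augmentation of a float image in [0, 1] with shape (H, W, 3)."""
    out = image + rng.normal(0.0, 0.02, size=image.shape)           # additive Gaussian noise
    out = out + rng.uniform(-0.1, 0.1)                              # brightness shift
    out = (out - out.mean()) * rng.uniform(0.8, 1.2) + out.mean()   # contrast change
    out = np.clip(out, 0.0, 1.0) ** rng.uniform(0.7, 1.5)           # gamma change
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
augmented = augment(image, rng)
print(augmented.shape)   # (32, 32, 3)
```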


Optical Flow FlowNet

Object tracking MDNet

Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015).

Object tracking MDNet


Object tracking MDNet Architecture


Domain-specific layers are used during training for each sequence but are replaced by a single one at test time

Object tracking MDNet Online update

MDNet is updated online at test time with hard negative mining, that is, selecting the negative samples with the highest positive scores
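The selection step can be sketched in a few lines: score every negative candidate with the current classifier and keep the ones that look most like the target. The scores and the helper name `hard_negatives` are made up for illustration.

```python
import numpy as np

def hard_negatives(positive_scores, k):
    """Indices of the k negative samples with the highest positive score."""
    order = np.argsort(positive_scores)[::-1]   # highest score first
    return order[:k]

# Positive-class scores the current network assigns to 5 negative candidates
scores = np.array([0.10, 0.85, 0.30, 0.92, 0.05])
print(hard_negatives(scores, 2))   # [3 1]
```

Training on these confusing negatives focuses the online update on the distractors the tracker is most likely to drift to.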

Object tracking FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Object tracking FCNT


Focus on conv4-3 and conv5-3 of VGG-16 network pre-trained for ImageNet image classification

conv4-3 conv5-3

Object tracking FCNT Specialization


Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence

Object tracking FCNT Localization

Although trained for image classification, feature maps in conv5-3 enable object localization... but are not discriminative enough to distinguish different objects of the same category

Object tracking Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object detectors emerge in deep scene CNNs." ICLR 2015.

[Zhou et al., ICLR 2015] "Object detectors emerge in deep scene CNNs" [Slides from ReadCV]

Object tracking FCNT Localization

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation...

conv4-3 conv5-3

Object tracking FCNT Architecture


SNet=Specific Network (online update)

GNet=General Network (fixed)

Object tracking FCNT Results


ConvNets Software

Caffe http://caffe.berkeleyvision.org

Torch (Overfeat) http://torch.ch

Theano http://deeplearning.net/software/theano

TensorFlow https://www.tensorflow.org

MatConvNet (VLFeat) http://www.vlfeat.org/matconvnet

CNTK (Microsoft) http://www.cntk.ai

Seminar Series: Compacting ConvNets for End to End Learning

Tuesday February 2 4pm

D5-010 Campus Nord

ConvNets Learn more

Jose M. Álvarez

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

ConvNets Learn more


ConvNets Learn more

Online course

Deep Learning

Taking machine learning to the next level

ReadCV seminar: Friendly reviews of SoA papers

Spring 2016

Tuesdays at 11am

ConvNets Learn more

Barcelona Convolucionada

Deep Learning a l'abast de tothom (Deep Learning within everyone's reach)

Monday February 1 7pm FIB Campus Nord UPC

ConvNets Learn more

Grup d'estudi de machine learning Barcelona (Barcelona machine learning study group)

Summer course

Deep Learning for Computer Vision

(2.5 ECTS for MSc & PhD)

July 4-8 3-7pm

ConvNets Learn more

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, "Deep Learning: Past, Present & Future"

ConvNets Learn more


ConvNets Learn more

"Machine learning" sub-Reddit

ConvNets Learn more


ConvNets Learn more

Check profile requirements for Summer internship (disclaimer: offered to PhD students by default)

Company      Avg. salary/hour    Avg. salary/month
Yahoo        $43                 ($43 x 160 = $6880)
Apple        $37                 ($37 x 160 = $5920)
Google       $29.54-$31.32       $7151
Facebook     $22.92              $6150-$7378
Microsoft    $22.63              $6506-$7171

Source: Glassdoor.com (internships in California. No stipends included)

ConvNets Learn more

Video: Cristian Canton's talk "From Catalonia to America: notes on how to achieve a successful post-PhD career", ACMCV 2015 & UPC

Li Fei-Fei, "How we're teaching computers to understand pictures", TEDTalks 2014

ConvNets Learn more

Jeremy Howard, "The wonderful and terrifying implications of computers that can learn", TEDTalks 2014

ConvNets Learn more


ConvNets Learn more

Neil Lawrence, "OpenAI won't benefit humanity without open data sharing" (The Guardian, 14/12/2015)

Is Computer Vision solved?

ConvNets Discussion

Sports Do you know them?

ConvNets Do you know them?

Antonio Torralba, MIT (former UPC)

and MANY MORE I am missing in the page (apologies)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

ConvNets Where you are studying

VisioCat dinner CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding available (check in the future).

We can support your application to scholarships.

External grant listings UPC UPF

Funding institution    Last deadlines (on 28/1/2016)
FI (Catalonia)         22/09/2015
FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web

Image Classification


Our past research

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., "Cultural Event Recognition with Visual ConvNets and Temporal Models", in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

ChaLearn Workshop

Saliency Prediction

J. Pan and Giró-i-Nieto, X., "End-to-end Convolutional Network for Saliency Prediction", in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

Our current research

LSUN Challenge

Sentiment Analysis


Our current research

[Slides]

CNN

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., "Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction", in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015.

Our current research

Instance Search in Video

V. - T. Nguyen, -Dinh-Le, D., Salvador, A., -Zhu, C., Nguyen, D. - L., Tran, M. - T., Duc, T. Ngo, Duong, D. Anh, Satoh, S. ichi, and Giró-i-Nieto, X., "NII-HITACHI-UIT at TRECVID 2015 Instance Search", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S. ichi, O'Connor, N., and Smeaton, A. F., "Insight DCU at TRECVID 2015", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

Page 15: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE 15

Unsupervised learning [Le at alrsquo11] Supervised learning [Karpathy et alrsquo14]

Recognition DeepVideo Features

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE 16

Recognition DeepVideo Multiscale

Karpathy A Toderici G Shetty S Leung T Sukthankar R amp Fei-Fei L (2014 June) Large-scale video classification with convolutional neural networks In Computer Vision and Pattern Recognition (CVPR) 2014 IEEE Conference on (pp 1725-1732) IEEE 17

Recognition DeepVideo Results

18

Recognition

Figure Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

19

Recognition C3D

Figure Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

20Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Demo

21K Simonyan A Zisserman Very Deep Convolutional Networks for Large-Scale Image Recognition ICLR 2015

Recognition C3D Spatial dimensionSpatial dimensions (XY) of the used kernels are fixed to 3x3 following Symonian amp Zisserman (ICLR 2015)

22Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Temporal dimension3D ConvNets are more suitable for spatiotemporal feature learning compared to 2D ConvNets

Temporal depth

2D ConvNets

23Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

A homogeneous architecture with small 3 times 3 times 3 convolution kernels in all layers is among the best performing architectures for 3D ConvNets

Recognition C3D Temporal dimension

24Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

No gain when varying the temporal depth across layers

Recognition C3D Temporal dimension

25Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

No gain when varying the temporal depth across layers

Recognition C3D Architecture

Featurevector

26Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Feature vector

Video sequence

16 frames-long clips

8 frames-long overlap

27Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Feature vector

16-frame clip

16-frame clip

16-frame clip

16-frame clip

Average

4096

-dim

vid

eo d

escr

ipto

r

4096

-dim

vid

eo d

escr

ipto

r

L2 norm

28Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D VisualizationBased on Deconvnets by Zeiler and Fergus [ECCV 2014] - See [ReadCV Slides] for more details

29Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Compactness

30Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Convolutional 3D(C3D) combined with a simple linear classifier outperforms state-of-the-art methods on 4 different benchmarks and are comparable with state of the art methods on other 2 benchmarks

Recognition C3D Performance

31Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D SoftwareImplementation by Michael Gygli (GitHub)

32

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

33

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

34

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

35

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

36

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

37

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

38

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

39

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

Optical Flow

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 40

Optical Flow Small vs Large

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 41

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 42

Optical FlowClassic approachRigid matching of HoG or SIFT descriptors

Deep MatchingAllow each subpatch to move

independently in a limited range

depending on its size

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 43

Optical Flow Deep Matching

Source Matlab R2015b documentation for normxcorr2 by Mathworks44

Optical Flow 2D correlation

Image

Sub-Image

Offset of the sub-image with respect to the image [00]

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 45

Instead of pre-trained filters a convolution is defined between each

patch of the reference image target image

as a results a correlation map is generated for each reference patch

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 46

Optical Flow Deep Matching

The most discriminative response map

The less discriminative

response map

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 47

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching

(TD)Bottom-upextraction

(BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 48

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Bottom-upextraction

(BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 49

Optical Flow Deep Matching (BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 50

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching (TD)

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching

(TD)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 51

Optical Flow Deep Matching (TD)Each local maxima in the top layer corresponds to a shift of one of the biggest (32x32) patchesIf we focus on local maximum we can retrieve the corresponding responses one scale below and focus on shift of the sub-patches that generated it

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 52

Optical Flow Deep Matching (TD)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 53

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 54

Ground truth

Dense HOG[Brox amp Malik 2011]

Deep Matching

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 55

Optical Flow Deep Matching

Optical Flow

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 56

Optical Flow FlowNet

Optical Flow FlowNet

End-to-end supervised learning of optical flow.

Optical Flow FlowNet (contracting)

Option A: Stack both input images together and feed them through a generic network.

Optical Flow FlowNet (contracting)

Option B: Create two separate yet identical processing streams for the two images and combine them at a later stage.

Optical Flow FlowNet (contracting)

Option B: Create two separate yet identical processing streams for the two images and combine them at a later stage.

Correlation layer: convolution between data patches taken from the two layers to be combined.
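As a rough sketch of what such a layer computes, the snippet below compares each feature vector of the first stream with the feature vectors of the second stream at a few nearby displacements. It assumes 1x1 patches, a displacement range of one pixel, and zero scores outside the map; `correlation_layer` and the toy feature maps are hypothetical names and data, not the FlowNet code.

```python
def correlation_layer(f1, f2, max_disp=1):
    """Dot products between f1[y][x] and f2 at every displacement in range.

    f1 and f2 are H x W x C nested lists. Returns an H x W map whose
    entries hold one score per displacement, plus the displacement list.
    """
    H, W = len(f1), len(f1[0])
    disps = [(dy, dx)
             for dy in range(-max_disp, max_disp + 1)
             for dx in range(-max_disp, max_disp + 1)]
    out = []
    for y in range(H):
        row = []
        for x in range(W):
            scores = []
            for dy, dx in disps:
                yy, xx = y + dy, x + dx
                if 0 <= yy < H and 0 <= xx < W:
                    scores.append(sum(a * b for a, b in zip(f1[y][x], f2[yy][xx])))
                else:
                    scores.append(0.0)  # displacement falls outside the map
            row.append(scores)
        out.append(row)
    return out, disps


# Toy 2 x 3 feature maps with 2 channels: the pattern in f2 is f1 moved right by one.
f1 = [[[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
      [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]]
f2 = [[[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]],
      [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]]
corr, disps = correlation_layer(f1, f2)
scores = corr[0][0]
print(disps[scores.index(max(scores))])
```

Unlike an ordinary convolution, there are no learned filters here: the "filter" at each position is the other stream's data.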

Optical Flow FlowNet (expanding)

Upconvolutional layers: unpooling of feature maps followed by a convolution. The upconvolved feature maps are concatenated with the corresponding maps from the contracting part.
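A minimal single-channel sketch of that expanding step follows. It assumes nearest-neighbour 2x unpooling, a fixed 3x3 kernel with zero padding (in a real network the kernel would be learned), and concatenation expressed as a list of channels; all names and values are illustrative.

```python
def unpool2x(fmap):
    """Nearest-neighbour 2x upsampling of an H x W map (assumed unpooling rule)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def conv_same(fmap, kernel):
    """Zero-padded 2D convolution (no kernel flip) that keeps the map size."""
    H, W = len(fmap), len(fmap[0])
    k = len(kernel) // 2
    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            acc = 0.0
            for dy in range(-k, k + 1):
                for dx in range(-k, k + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < H and 0 <= xx < W:
                        acc += fmap[yy][xx] * kernel[dy + k][dx + k]
            out[y][x] = acc
    return out


coarse = [[1.0, 2.0]]                      # low-resolution decoder map
skip = [[0.1, 0.2, 0.3, 0.4],              # same-resolution map from the
        [0.5, 0.6, 0.7, 0.8]]              # contracting part
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]

upconv = conv_same(unpool2x(coarse), identity)
decoder_input = [upconv, skip]             # concatenation along channels
print(len(decoder_input), len(upconv), len(upconv[0]))
```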

Optical Flow FlowNet

Since existing ground-truth datasets are not sufficiently large to train a ConvNet, a synthetic Flying Chairs dataset is generated… and augmented (translation, rotation and scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma and color).

ConvNets trained on this unrealistic data generalize well to existing datasets such as Sintel and KITTI.
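The photometric part of such an augmentation could look like the sketch below, which draws one set of parameters and applies it to both images of a pair. The parameter ranges, the function name `photometric_augment` and the pixel range [0, 1] are assumptions for illustration, not the values used by the authors.

```python
import random


def photometric_augment(pair, rng):
    """Apply the same random brightness, contrast and gamma to both images.

    Images are H x W nested lists with values in [0, 1]. Gaussian noise is
    drawn independently per pixel. All ranges below are illustrative.
    """
    brightness = rng.gauss(0.0, 0.2)
    contrast = rng.uniform(0.8, 1.4)
    gamma = rng.uniform(0.7, 1.5)
    sigma = rng.uniform(0.0, 0.04)

    def clip(v):
        return min(max(v, 0.0), 1.0)

    def transform(img):
        out = []
        for row in img:
            new_row = []
            for v in row:
                v = (v - 0.5) * contrast + 0.5 + brightness
                v = clip(v) ** gamma
                v += rng.gauss(0.0, sigma)
                new_row.append(clip(v))
            out.append(new_row)
        return out

    return [transform(img) for img in pair]


rng = random.Random(0)
pair = [[[0.2, 0.4], [0.6, 0.8]],
        [[0.25, 0.45], [0.65, 0.85]]]
augmented = photometric_augment(pair, rng)
print(augmented[0][0])
```

Photometric changes leave the ground-truth flow untouched. Geometric changes (translation, rotation, scaling) would also require transforming the flow field, which this sketch does not attempt.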

Data augmentation

Optical Flow FlowNet

Object tracking MDNet

65

Nam, Hyeonseob, and Bohyung Han. Learning multi-domain convolutional neural networks for visual tracking. ICCV VOT Workshop (2015).

Object tracking MDNet

Object tracking MDNet Architecture

Domain-specific layers are used during training, one for each sequence, but are replaced by a single one at test time.
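The idea of shared layers with per-sequence branches can be sketched with a toy class. Everything here is a hypothetical simplification: `MultiDomainTracker`, its linear branches and the identity feature extractor are assumptions for illustration, not the MDNet implementation.

```python
class MultiDomainTracker:
    """Shared feature extractor plus one linear branch per training sequence."""

    def __init__(self, shared_features, feature_dim, num_training_sequences):
        self.shared_features = shared_features
        self.feature_dim = feature_dim
        # One domain-specific branch (weight vector) per training sequence.
        self.branches = [[0.0] * feature_dim
                         for _ in range(num_training_sequences)]

    def score(self, patch, domain):
        """Target-vs-background score of a patch under one domain's branch."""
        feats = self.shared_features(patch)
        return sum(w * f for w, f in zip(self.branches[domain], feats))

    def start_test_sequence(self):
        """Drop the training branches and create a single fresh one."""
        self.branches = [[0.0] * self.feature_dim]


tracker = MultiDomainTracker(lambda patch: patch, feature_dim=3,
                             num_training_sequences=5)
tracker.start_test_sequence()
tracker.branches[0] = [1.0, 0.0, 0.0]      # stand-in for an online update
print(tracker.score([2.0, 5.0, 7.0], domain=0))
```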

Object tracking MDNet Online update

MDNet is updated online at test time with hard negative mining, that is, by selecting the negative samples with the highest positive score.
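The selection rule itself is short enough to sketch. The sample names, the scores and the helper `mine_hard_negatives` are made up for illustration; in practice the positive score would come from the network.

```python
def mine_hard_negatives(neg_samples, positive_score, k):
    """Return the k negative samples that the classifier scores most like the target."""
    ranked = sorted(neg_samples, key=positive_score, reverse=True)
    return ranked[:k]


# Toy negatives as (name, positive score) pairs.
negatives = [("bg_sky", 0.05), ("distractor", 0.80),
             ("bg_road", 0.10), ("occluder", 0.65)]
hard = mine_hard_negatives(negatives, positive_score=lambda s: s[1], k=2)
print([name for name, _ in hard])
```

Only these confusing samples are then used as negatives in the update, rather than the many easy background patches.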

Object tracking FCNT

69

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. Visual Tracking with Fully Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Object tracking FCNT

Focus on conv4-3 and conv5-3 of a VGG-16 network pre-trained for ImageNet image classification.

conv4-3 conv5-3

Object tracking FCNT Specialization

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence

Object tracking FCNT Localization

Although trained for image classification, feature maps in conv5-3 enable object localization… but they are not discriminative enough to tell apart different objects of the same category.

Object tracking Localization

73

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Object detectors emerge in deep scene CNNs. ICLR 2015.

[Zhou et al., ICLR 2015] “Object detectors emerge in deep scene CNNs” [Slides from ReadCV]

Object tracking FCNT Localization

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation…

conv4-3 conv5-3

Object tracking FCNT Architecture

SNet=Specific Network (online update)

GNet=General Network (fixed)

Object tracking FCNT Results

ConvNets Software

Caffe http://caffe.berkeleyvision.org

Torch (Overfeat) http://torch.ch

Theano http://deeplearning.net/software/theano

TensorFlow https://www.tensorflow.org

MatConvNet (VLFeat) http://www.vlfeat.org/matconvnet

CNTK (Microsoft) http://www.cntk.ai

77

Seminar Series: Compacting ConvNets for End to End Learning

Tuesday February 2 4pm

D5-010 Campus Nord

ConvNets Learn more

78

Jose M. Álvarez

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

ConvNets Learn more

79

ConvNets Learn more

Online course: Deep Learning. Taking machine learning to the next level

80

ReadCV seminar: Friendly reviews of SoA papers

Spring 2016

Tuesdays at 11am

ConvNets Learn more

81

Barcelona Convolucionada

Deep Learning a l’abast de tothom

Monday February 1 7pm FIB Campus Nord UPC

ConvNets Learn more

82

Grup d’estudi de machine learning Barcelona

Summer course: Deep Learning for Computer Vision (2.5 ECTS for MSc & PhD)

July 4-8 3-7pm

ConvNets Learn more

83

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, “Deep Learning: Past, Present & Future”

ConvNets Learn more

84

ConvNets Learn more

85

“Machine learning” subreddit

ConvNets Learn more

86

ConvNets Learn more

87

Check profile requirements for summer internships (disclaimer: offered to PhD students by default)

Company     Avg. salary/hour    Avg. salary/month
Yahoo       $43                 ($43 x 160 = $6880)
Apple       $37                 ($37 x 160 = $5920)
Google      $29.54-$31.32       $7151
Facebook    $22.92              $6150-$7378
Microsoft   $22.63              $6506-$7171

Source: Glassdoor.com (internships in California; no stipends included)

ConvNets Learn more

88

Video: Cristian Canton’s talk “From Catalonia to America: notes on how to achieve a successful post-PhD career”, ACMCV 2015 & UPC

Li Fei-Fei, “How we’re teaching computers to understand pictures”, TEDTalks 2014

ConvNets Learn more

89

Jeremy Howard, “The wonderful and terrifying implications of computers that can learn”, TEDTalks 2014

ConvNets Learn more

90

ConvNets Learn more

91

Neil Lawrence, “OpenAI won’t benefit humanity without open data sharing” (The Guardian, 14/12/2015)

Is Computer Vision solved?

ConvNets Discussion

92

Sports: Do you know them?

93

ConvNets: Do you know them?

94

Antonio Torralba, MIT (former UPC)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

and MANY MORE I am missing on this page (apologies)

95

ConvNets Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution    Last deadlines (on 28/1/2016)
FI (Catalonia)         22/09/2015
FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web

96

Image Classification

97

Our past research

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., “Cultural Event Recognition with Visual ConvNets and Temporal Models”, in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

ChaLearn Workshop

Saliency Prediction

J. Pan and Giró-i-Nieto, X., “End-to-end Convolutional Network for Saliency Prediction”, in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

98

Our current research

LSUN Challenge

Sentiment Analysis

99

Our current research

[Slides]

CNN

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., “Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction”, in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015.

Our current research

Instance Search in Video

100

V - T Nguyen -Dinh-Le D Salvador A -Zhu C Nguyen D - L Tran M - T Duc T Ngo Duong D Anh Satoh S ichi and Giró-i-Nieto X, “NII-HITACHI-UIT at TRECVID 2015 Instance Search”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S. ichi, O’Connor, N., and Smeaton, A. F., “Insight DCU at TRECVID 2015”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

101

Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE.

16

Recognition DeepVideo Multiscale

Recognition DeepVideo Results

18

Recognition

Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. Learning spatiotemporal features with 3D convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015.

19

Recognition C3D

Recognition C3D Demo

21

K. Simonyan, A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. ICLR 2015.

Recognition C3D Spatial dimension

Spatial dimensions (XY) of the kernels used are fixed to 3x3, following Simonyan & Zisserman (ICLR 2015).

Recognition C3D Temporal dimension

3D ConvNets are more suitable for spatiotemporal feature learning than 2D ConvNets.

Temporal depth

2D ConvNets

A homogeneous architecture with small 3x3x3 convolution kernels in all layers is among the best-performing architectures for 3D ConvNets.
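To make the 3x3x3 kernel concrete, here is a minimal plain-Python sketch of a single-channel 3D convolution over a (time, height, width) volume, with no padding and stride 1. The function name `conv3d_valid` and the toy data are illustrative assumptions, not the C3D implementation.

```python
def conv3d_valid(volume, kernel):
    """Slide a 3D kernel over a T x H x W volume (no padding, stride 1).

    As in ConvNets, the kernel is not flipped (cross-correlation).
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += volume[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# 4 frames of 5 x 5 pixels, all ones, filtered with a 3x3x3 kernel of ones.
volume = [[[1.0] * 5 for _ in range(5)] for _ in range(4)]
kernel = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
out = conv3d_valid(volume, kernel)
print(len(out), len(out[0]), len(out[0][0]))
```

The output keeps a temporal axis (here 2 frames deep), which is the difference from a 2D convolution applied to stacked frames, where time is collapsed after the first layer.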

Recognition C3D Temporal dimension

No gain when varying the temporal depth across layers.

Recognition C3D Architecture

Feature vector

Recognition C3D Feature vector

Video sequence

16-frame-long clips

8-frame-long overlap
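The clip layout follows directly from those two numbers: a new clip starts every 16 - 8 = 8 frames. A small sketch is shown below; the function name and the choice to drop trailing frames that do not fill a whole clip are assumptions.

```python
def split_into_clips(num_frames, clip_len=16, overlap=8):
    """Return (start, end) frame indices of each clip, with end exclusive."""
    stride = clip_len - overlap
    clips = []
    start = 0
    while start + clip_len <= num_frames:
        clips.append((start, start + clip_len))
        start += stride
    return clips


print(split_into_clips(40))
```

For a 40-frame video this gives four clips starting at frames 0, 8, 16 and 24.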

Recognition C3D Feature vector

[Figure: four 16-frame clips -> Average -> 4096-dim video descriptor -> L2 norm -> 4096-dim video descriptor]
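The averaging and normalization steps in the figure amount to a few lines. The sketch below uses toy 3-dimensional clip features instead of 4096-dimensional ones, and the function name `video_descriptor` is an assumption.

```python
import math


def video_descriptor(clip_features):
    """Average the per-clip feature vectors, then L2-normalize the result."""
    n = len(clip_features)
    dim = len(clip_features[0])
    mean = [sum(f[i] for f in clip_features) / n for i in range(dim)]
    norm = math.sqrt(sum(v * v for v in mean))
    if norm == 0.0:
        return mean  # avoid dividing by zero for an all-zero average
    return [v / norm for v in mean]


toy_clips = [[2.0, 0.0, 4.0], [4.0, 0.0, 4.0]]
descriptor = video_descriptor(toy_clips)
print(descriptor)
```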

Recognition C3D Visualization

Based on Deconvnets by Zeiler and Fergus [ECCV 2014]. See [ReadCV Slides] for more details.

Recognition C3D Compactness

Convolutional 3D (C3D) combined with a simple linear classifier outperforms state-of-the-art methods on 4 different benchmarks and is comparable with state-of-the-art methods on 2 other benchmarks.

Recognition C3D Performance

Recognition C3D Software

Implementation by Michael Gygli (GitHub)

32

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

33

Recognition ImageNet Video

34

Recognition ImageNet Video

35

Recognition ImageNet Video

36

Recognition ImageNet Video

Kai Kang et al., Object Detection in Videos with Tubelets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

37

Recognition ImageNet Video

38

Recognition ImageNet Video

39

Recognition ImageNet Video

Optical Flow

Optical Flow Small vs Large

Optical Flow

Classic approach: Rigid matching of HoG or SIFT descriptors

Deep Matching: Allow each subpatch to move independently in a limited range depending on its size

Optical Flow Deep Matching

Source: Matlab R2015b documentation for normxcorr2 by MathWorks

44

Optical Flow 2D correlation

Image

Sub-Image

Offset of the sub-image with respect to the image [0,0]

Instead of pre-trained filters, a convolution is defined between each patch of the reference image and the target image. As a result, a correlation map is generated for each reference patch.
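A bare-bones sketch of one such correlation map is given below. It assumes raw intensities rather than descriptors and skips the normalization that functions such as normxcorr2 apply, so it only illustrates the sliding dot product; `correlation_map`, `argmax2d` and the toy image are illustrative names and data.

```python
def correlation_map(patch, image):
    """Dot product of the patch with every same-sized window of the image."""
    ph, pw = len(patch), len(patch[0])
    H, W = len(image), len(image[0])
    cmap = []
    for y in range(H - ph + 1):
        row = []
        for x in range(W - pw + 1):
            row.append(sum(patch[dy][dx] * image[y + dy][x + dx]
                           for dy in range(ph) for dx in range(pw)))
        cmap.append(row)
    return cmap


def argmax2d(cmap):
    """(row, column) of the highest response in the map."""
    _, y, x = max((v, y, x) for y, r in enumerate(cmap) for x, v in enumerate(r))
    return y, x


patch = [[1.0, 1.0],
         [1.0, 1.0]]
image = [[0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 1.0],
         [0.0, 0.0, 1.0, 1.0],
         [0.0, 0.0, 0.0, 0.0]]
cmap = correlation_map(patch, image)
print(argmax2d(cmap))
```

The peak of the map gives the offset at which the reference patch best matches the target image, here row 1, column 2.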

Optical Flow Deep Matching

Optical Flow Deep Matching

The most discriminative response map

The least discriminative response map

Key idea: Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search.

Optical Flow Deep Matching

[Figure: pyramid of 4x4, 8x8, 16x16 and 32x32 patches, with bottom-up extraction (BU) and top-down matching (TD)]

Optical Flow Deep Matching

[Figure: pyramid of 4x4, 8x8, 16x16 and 32x32 patches; bottom-up extraction (BU)]

Optical Flow Deep Matching (BU)

Optical Flow Deep Matching (TD)

[Figure: pyramid of 4x4, 8x8, 16x16 and 32x32 patches; top-down matching (TD)]

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 51

Optical Flow Deep Matching (TD)Each local maxima in the top layer corresponds to a shift of one of the biggest (32x32) patchesIf we focus on local maximum we can retrieve the corresponding responses one scale below and focus on shift of the sub-patches that generated it

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 52

Optical Flow Deep Matching (TD)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 53

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 54

Ground truth

Dense HOG[Brox amp Malik 2011]

Deep Matching

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 55

Optical Flow Deep Matching

Optical Flow

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 56

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 57

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 58

End to end supervised learning of optical flow

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 59

Option A Stack both input images together and feed them through a generic network

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 60

Option B Create two separate yet identical processing streams for the two images and combine them at a later stage

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 61

Option B Create two separate yet identical processing streams for the two images and combine them at a later stage

Correlation layer Convolution of data patches from the layers to combine

Optical Flow FlowNet (expanding)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 62

Upconvolutional layers Unpooling features maps + convolutionUpconvolutioned feature maps are concatenated with the corresponding map from the contractive part

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 63

Since existing ground truth datasets are not sufficiently large to train a Convnet a synthetic Flying Dataset is generatedhellip and augmented (translation rotation scaling transformations additive Gaussian noise changes in brightness contrast gamma and color)

Convnets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI

Data augmentation

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 64

Optical Flow FlowNet

Object tracking MDNet

65Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Object tracking MDNet

66Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Object tracking MDNet Architecture

67Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Domain-specific layers are used during training for each sequence but are replaced by a single one at test time

Object tracking MDNet Online update

68Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

MDNet is updated online at test time with hard negative mining that is selecting negative samples with the highest positive score

Object tracking FCNT

69Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Object tracking FCNT

70Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Focus on conv4-3 and conv5-3 of VGG-16 network pre-trained for ImageNet image classification

conv4-3 conv5-3

Object tracking FCNT Specialization

71Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence

Object tracking FCNT Localization

72Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Although trained for image classification feature maps in conv5-3 enable object localizationhellipbut is not discriminative enough to different objects of the same category

Object tracking Localization

73Zhou Bolei Aditya Khosla Agata Lapedriza Aude Oliva and Antonio Torralba Object detectors emerge in deep scene cnns ICLR 2015

[Zhou et al ICLR 2015] ldquoObject detectors emerge in deep scene CNNsrdquo [Slides from ReadCV]

Object tracking FCNT Localization

74Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

On the other hand feature maps from conv4-3 are more sensitive to intra-class appearance variationhellip

conv4-3 conv5-3

Object tracking FCNT Architecture

75Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

SNet=Specific Network (online update)

GNet=General Network (fixed)

Object tracking FCNT Results

76Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

ConvNets Software

Caffe httpcaffeberkeleyvisionorg

Torch (Overfeat) httptorchch

Theano httpdeeplearningnetsoftwaretheano

Tensor Flow httpswwwtensorfloworg

MatconvNet (VLFeat) httpwwwvlfeatorgmatconvnet

CNTK (Mcrosoft) httpwwwcntkai 77

Seminar Series Compacting ConvNets

for End to End Learning

Tuesday February 2 4pm

D5-010 Campus Nord

ConvNets Learn more

78

Jose M Aacutelvarez

Stanford course CS231n

Convolutional Neural Networks for Visual

Recognition

ConvNets Learn more

79

ConvNets Learn more

Online course

Deep Learning

Taking machine learning to the next

level

80

ReadCV seminarFriendly reviews of SoA papers

Spring 2016

Tuesdays at 11am

ConvNets Learn more

81

Barcelona Convolucionada

Deep Learning a lrsquoabast de tothom

Monday February 1 7pm FIB Campus Nord UPC

ConvNets Learn more

82

Grup drsquoestudi de machine learning Barcelona

Summer course

Deep Learning for Computer Vision

(25 ECTS for MSc amp Phd)

July 4-8 3-7pm

ConvNets Learn more

83

Deep learning methos for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho ldquoDeep Learning Past Present amp Futurerdquo

ConvNets Learn more

84

ConvNets Learn more

85

ldquoMachine learningrdquo sub-Reddit

ConvNets Learn more

86

ConvNets Learn more

87

Check profile requirements for Summer internship (disclaimer offered to Phd students by default)

Company Avg Salary hour Avg Salary month

Yahoo $43 ($43x160=$6880)

Apple $37 ($37x160=$5920)

Google $2954-$3132 $7151

Facebook $2292 $6150-$7378

Microsoft $2263 $6506-$7171

Source Glassdoorcom (internships in California No stipends included)

ConvNets Learn more

88

Video Cristian Cantonrsquos talk ldquoFrom Catalonia to America notes on how to achieve a successful post-Phd career rdquo ACMCV 2015 amp UPC

Li Fei-Fei ldquoHow wersquore teaching computers to understand picturesrdquo TEDTalks 2014

ConvNets Learn more

89

Jeremy Howard ldquoThe wonderful and terrifying implications of computers that can learnrdquo TEDTalks 2014

ConvNets Learn more

90

ConvNets Learn more

91

Neil Lawrence OpenAI wonrsquot benefit humanity without open data sharing (The Guardian 14122015)

Is Computer Vision solved

ConvNets Discussion

92

Sports Do you know them

93

ConvNets Do you know them

94

Antonio Torralba MIT(former UPC)

and MANY MORE I am missing in the page (apologies)

Oriol Vinyals Google(former UPC)

Jose M Aacutelvarez NICTA(former URL amp UAB)

Joan Bruna Berkeley(former UPC)

95

ConvNets Where you are studyingVisioCat dinner CVPR 2015

Considering a Phd at GPI-UPC Currently no direct funding available (check in the future)We can support your application to scholarships

External grant listings UPC UPF

Funding institution Last deadlines (on 2812016)

FI (Catalonia) 22092015

FPU (Spain) 15012016

Check our activity at httpsimatgeupceduweb 96

Image Classification

Our past research

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., "Cultural Event Recognition with Visual ConvNets and Temporal Models", in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

ChaLearn Workshop

Saliency Prediction

J. Pan and Giró-i-Nieto, X., "End-to-end Convolutional Network for Saliency Prediction", in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

Our current research

LSUN Challenge

Sentiment Analysis

Our current research

[Slides]

CNN

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., "Diving Deep into Sentiment Understanding: Fine-tuned CNNs for Visual Sentiment Prediction", in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015.

Our current research

Instance Search in Video

V.-T. Nguyen, Dinh-Le, D., Salvador, A., Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S., and Giró-i-Nieto, X., "NII-HITACHI-UIT at TRECVID 2015 Instance Search", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S., O'Connor, N., and Smeaton, A. F., "Insight DCU at TRECVID 2015", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu


Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE.

Recognition DeepVideo Results

Recognition

Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015.

Recognition C3D

Recognition C3D Demo

Recognition C3D Spatial dimension

Spatial dimensions (XY) of the kernels are fixed to 3x3, following Simonyan & Zisserman (ICLR 2015).

K. Simonyan, A. Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition." ICLR 2015.

Recognition C3D Temporal dimension

3D ConvNets are more suitable for spatiotemporal feature learning than 2D ConvNets.

[Figure labels: temporal depth; 2D ConvNets]

Recognition C3D Temporal dimension

A homogeneous architecture with small 3x3x3 convolution kernels in all layers is among the best-performing architectures for 3D ConvNets.

Recognition C3D Temporal dimension

No gain when varying the temporal depth across layers.
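The contrast between 2D and 3D convolutions drawn on these slides can be illustrated with a minimal NumPy sketch. It is not the C3D implementation: the clip size, the single channel, the random kernels and the helper name `correlate3d_valid` are all assumptions made for illustration. A 3x3x3 kernel slides along time and keeps a temporal axis in its output, whereas a kernel that spans all frames (which is what a 2D convolution does when frames are stacked as channels) collapses time after a single layer.

```python
import numpy as np

def correlate3d_valid(clip, kernel):
    """Valid-mode 3D cross-correlation of a (T, H, W) clip with a (d, k, k) kernel."""
    # windows has shape (T-d+1, H-k+1, W-k+1, d, k, k)
    windows = np.lib.stride_tricks.sliding_window_view(clip, kernel.shape)
    return np.einsum("thwijk,ijk->thw", windows, kernel)

rng = np.random.default_rng(0)
clip = rng.standard_normal((16, 32, 32))                      # 16 grayscale frames (assumed size)
kernel_3d = rng.standard_normal((3, 3, 3))                    # temporal depth 3
kernel_frames_as_channels = rng.standard_normal((16, 3, 3))   # spans all 16 frames

out_3d = correlate3d_valid(clip, kernel_3d)
out_2d = correlate3d_valid(clip, kernel_frames_as_channels)
print(out_3d.shape)  # (14, 30, 30): a temporal axis survives
print(out_2d.shape)  # (1, 30, 30): time is collapsed after one layer
```

The output of the 3D kernel can be fed to another 3D layer that still sees motion over time, which is the property the slides attribute to 3D ConvNets.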

Recognition C3D Architecture

[Figure: C3D architecture, with the output labelled "Feature vector"]

Recognition C3D Feature vector

The video sequence is split into 16-frame-long clips with an 8-frame-long overlap.

Recognition C3D Feature vector

[Figure: the features of the 16-frame clips are averaged into a 4096-dim video descriptor, which is then L2-normalized into the final 4096-dim video descriptor]
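The clip splitting and pooling described on these two slides can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, and random vectors stand in for the per-clip network activations, since no real C3D forward pass is run here.

```python
import numpy as np

def split_into_clips(num_frames, clip_len=16, overlap=8):
    """Return (start, end) frame indices of clips with the given overlap."""
    stride = clip_len - overlap
    return [(s, s + clip_len) for s in range(0, num_frames - clip_len + 1, stride)]

def video_descriptor(clip_features):
    """Average the per-clip features, then L2-normalize the result."""
    mean = np.mean(clip_features, axis=0)
    return mean / np.linalg.norm(mean)

rng = np.random.default_rng(0)
clips = split_into_clips(num_frames=40)
# Placeholder features: random vectors stand in for real per-clip activations.
features = rng.standard_normal((len(clips), 4096))
descriptor = video_descriptor(features)

print(clips)             # [(0, 16), (8, 24), (16, 32), (24, 40)]
print(descriptor.shape)  # (4096,)
```

A 40-frame video yields four overlapping clips, matching the four "16-frame clip" boxes in the figure, and the final descriptor has unit L2 norm.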

Recognition C3D Visualization

Based on Deconvnets by Zeiler and Fergus [ECCV 2014]. See [ReadCV Slides] for more details.

Recognition C3D Compactness

Recognition C3D Performance

Convolutional 3D (C3D) features combined with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with state-of-the-art methods on the other 2 benchmarks.

Recognition C3D Software

Implementation by Michael Gygli (GitHub)

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

Recognition ImageNet Video

Kai Kang et al., "Object Detection in Videos with TubeLets and Multi-Context Cues" (ILSVRC 2015) [video] [poster]

Optical Flow

Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE.

Optical Flow Small vs Large

Optical Flow

Classic approach: rigid matching of HoG or SIFT descriptors.

Deep Matching: allow each subpatch to move independently in a limited range depending on its size.

Optical Flow Deep Matching

Optical Flow 2D correlation

[Figure: an image, a sub-image, and the offset of the sub-image with respect to the image origin [0,0]]

Source: Matlab R2015b documentation for normxcorr2 by Mathworks.
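The 2D correlation slide can be made concrete with a small NumPy sketch of normalized cross-correlation. It is not a port of MATLAB's normxcorr2: this version only evaluates placements where the sub-image fits entirely inside the image ("valid" mode), and the function name, image size and cut position are assumptions chosen for the example.

```python
import numpy as np

def normalized_cross_correlation(image, template):
    """Valid-mode NCC map: one score in [-1, 1] per template placement."""
    windows = np.lib.stride_tricks.sliding_window_view(image, template.shape)
    t = template - template.mean()
    w = windows - windows.mean(axis=(-2, -1), keepdims=True)
    numerator = np.einsum("ijkl,kl->ij", w, t)
    denominator = np.sqrt(np.sum(w**2, axis=(-2, -1)) * np.sum(t**2))
    return numerator / np.maximum(denominator, 1e-12)

rng = np.random.default_rng(0)
image = rng.random((40, 50))
template = image[12:20, 30:38].copy()   # sub-image cut at offset (12, 30)

ncc = normalized_cross_correlation(image, template)
offset = tuple(int(v) for v in np.unravel_index(np.argmax(ncc), ncc.shape))
print(offset)  # (12, 30): offset of the sub-image w.r.t. the image origin
```

The peak of the correlation map recovers the offset at which the sub-image was cut, which is the quantity the slide labels as the offset with respect to [0,0].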

Optical Flow Deep Matching

Instead of using pre-trained filters, a convolution is defined between each patch of the reference image and the target image. As a result, a correlation map is generated for each reference patch.

Optical Flow Deep Matching

[Figure labels: the most discriminative response map; the least discriminative response map]

Optical Flow Deep Matching

Key idea: build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search.

[Figure: pyramid of correlation maps for 4x4, 8x8, 16x16 and 32x32 patches, built by bottom-up extraction (BU) and searched by top-down matching (TD)]

Optical Flow Deep Matching (BU)
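One level of the bottom-up step can be sketched in NumPy as below. This is a heavily simplified illustration, not the DeepMatching algorithm itself: it uses raw pixel dot products, a stride-1 3x3 max filter and a plain average of the four children, and it omits the subsampling and rectification of the published method. All function names and sizes are assumptions. The idea it keeps is that the correlation map of a large patch is assembled from the maps of its four sub-patches, each allowed to move slightly on its own.

```python
import numpy as np

def correlation_map(patch, target):
    """Dot-product response of one reference patch at every target position."""
    windows = np.lib.stride_tricks.sliding_window_view(target, patch.shape)
    return np.einsum("ijkl,kl->ij", windows, patch)

def max_pool_3x3(corr):
    """3x3 max filter (stride 1): lets a sub-patch move by one pixel."""
    padded = np.pad(corr, 1, mode="constant", constant_values=-np.inf)
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3))
    return windows.max(axis=(-2, -1))

def parent_map(reference, target, top, left, size):
    """Correlation map of a (2*size)x(2*size) patch built from its 4 children."""
    out_h = target.shape[0] - 2 * size + 1
    out_w = target.shape[1] - 2 * size + 1
    total = np.zeros((out_h, out_w))
    for dy in (0, size):
        for dx in (0, size):
            child = reference[top + dy: top + dy + size, left + dx: left + dx + size]
            pooled = max_pool_3x3(correlation_map(child, target))
            # A child at offset (dy, dx) inside the parent is read at p + (dy, dx).
            total += pooled[dy: dy + out_h, dx: dx + out_w]
    return total / 4.0

rng = np.random.default_rng(0)
reference = rng.standard_normal((48, 48))
target = np.roll(reference, shift=(2, 3), axis=(0, 1))   # known displacement (2, 3)
parent = parent_map(reference, target, top=16, left=16, size=8)
print(parent.shape)                        # (33, 33)
print(parent[18, 19] == parent.max())      # the true position attains the peak score
```

Because of the max filter, the peak is a small plateau around the true position rather than a single pixel; the top-down pass described next is what refines it by revisiting the sub-patch responses.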

Optical Flow Deep Matching (TD)

Each local maximum in the top layer corresponds to a shift of one of the biggest (32x32) patches. If we focus on a local maximum, we can retrieve the corresponding responses one scale below and focus on the shifts of the sub-patches that generated it.

Optical Flow Deep Matching

[Figure: ground truth vs. Dense HOG [Brox & Malik 2011] vs. Deep Matching]

Optical Flow

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D., and Brox, T. (2015). FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).

Optical Flow FlowNet

End-to-end supervised learning of optical flow.

Optical Flow FlowNet (contracting)

Option A: stack both input images together and feed them through a generic network.

Optical Flow FlowNet (contracting)

Option B: create two separate, yet identical, processing streams for the two images and combine them at a later stage.

Correlation layer: convolution of data patches from the layers to combine.
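A minimal NumPy sketch of a correlation layer in the spirit of Option B follows. It makes several assumptions that are not taken from the slides: patches are reduced to single feature vectors (1x1 patches), the displacement range `max_disp` is arbitrary, scores are divided by the number of channels, and the function name is hypothetical. For every position of the first feature map it compares the feature vector with the vectors of the second map at each displacement in a small neighbourhood.

```python
import numpy as np

def correlation_layer(f1, f2, max_disp=2):
    """f1, f2: (C, H, W) feature maps. Returns ((2d+1)^2, H, W) correlation scores."""
    channels, height, width = f1.shape
    padded = np.pad(f2, ((0, 0), (max_disp, max_disp), (max_disp, max_disp)))
    outputs = []
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            # shifted[c, y, x] == f2[c, y + dy, x + dx] (zero outside the map)
            shifted = padded[:, max_disp + dy: max_disp + dy + height,
                             max_disp + dx: max_disp + dx + width]
            outputs.append(np.sum(f1 * shifted, axis=0) / channels)
    return np.stack(outputs)

rng = np.random.default_rng(0)
f1 = rng.standard_normal((8, 16, 16))
f2 = np.roll(f1, shift=(1, -1), axis=(1, 2))   # content moves 1 down, 1 left

corr = correlation_layer(f1, f2, max_disp=2)
best = int(np.argmax(corr.mean(axis=(1, 2))))
dy, dx = best // 5 - 2, best % 5 - 2
print(corr.shape)  # (25, 16, 16)
print((dy, dx))    # (1, -1)
```

Each output channel corresponds to one candidate displacement, so the later layers of the network receive explicit matching evidence instead of having to learn it from stacked pixels.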

Optical Flow FlowNet (expanding)

Upconvolutional layers: unpooling of feature maps + convolution. Upconvolved feature maps are concatenated with the corresponding map from the contractive part.

Optical Flow FlowNet

Since existing ground truth datasets are not sufficiently large to train a ConvNet, a synthetic Flying Chairs dataset is generated... and augmented (translation, rotation and scaling transformations, additive Gaussian noise, and changes in brightness, contrast, gamma and color).

ConvNets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI.

Optical Flow FlowNet

Data augmentation
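The photometric part of the augmentation list above can be sketched as follows. The parameter ranges and the function name are assumptions chosen for illustration, not the values used by the FlowNet authors. The sketch applies the same brightness, contrast and gamma change to both frames of a pair and adds independent Gaussian noise to each.

```python
import numpy as np

def photometric_augment(img1, img2, rng):
    """Shared brightness/contrast/gamma change plus per-frame Gaussian noise."""
    brightness = rng.normal(0.0, 0.2)   # assumed range
    contrast = rng.uniform(0.8, 1.4)    # assumed range
    gamma = rng.uniform(0.7, 1.5)       # assumed range
    augmented = []
    for img in (img1, img2):
        mean = img.mean()
        x = (img - mean) * contrast + mean + brightness
        x = np.clip(x, 0.0, 1.0) ** gamma
        x = x + rng.normal(0.0, 0.02, size=img.shape)   # assumed noise level
        augmented.append(np.clip(x, 0.0, 1.0))
    return augmented[0], augmented[1]

rng = np.random.default_rng(0)
frame1 = rng.random((64, 64, 3))
frame2 = rng.random((64, 64, 3))
aug1, aug2 = photometric_augment(frame1, frame2, rng)
print(aug1.shape, aug2.shape)
```

Photometric changes leave the ground-truth flow untouched. The geometric transformations in the list (translation, rotation, scaling) are different: they would also have to be applied to the flow field, which this sketch does not attempt.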

Object tracking MDNet

Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015).

Object tracking MDNet Architecture

Domain-specific layers are used during training for each sequence, but are replaced by a single one at test time.

Object tracking MDNet Online update

MDNet is updated online at test time with hard negative mining, that is, selecting the negative samples with the highest positive score.
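The selection rule stated above can be sketched in a few lines of NumPy. The scores and labels are made-up toy values and the function name is hypothetical; in a tracker the scores would come from the network's target-vs-background classifier.

```python
import numpy as np

def hard_negative_mining(positive_scores, is_negative, num_hard):
    """Indices of the negative samples that the classifier scores most like the target."""
    negative_idx = np.flatnonzero(is_negative)
    order = np.argsort(positive_scores[negative_idx])[::-1]   # highest score first
    return negative_idx[order[:num_hard]]

# Toy example: sample 0 is the true target, the rest are background samples.
scores = np.array([0.9, 0.1, 0.7, 0.3, 0.8, 0.05])
is_negative = np.array([False, True, True, True, True, True])

hard = hard_negative_mining(scores, is_negative, num_hard=2)
print(hard)  # [4 2]
```

Samples 4 and 2 are background but receive high target scores, so they are the most informative negatives to include in the next online update.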

Object tracking FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Focus on conv4-3 and conv5-3 of a VGG-16 network pre-trained for ImageNet image classification.

Object tracking FCNT Specialization

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence.
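The observation that only a few feature maps respond to the tracked object can be illustrated with a simple proxy. This is not the selection procedure of FCNT: it is an assumed heuristic that ranks maps by the fraction of their activation energy falling inside the target box, run here on synthetic maps. The function name and box convention are hypothetical.

```python
import numpy as np

def foreground_relevance(feature_maps, box):
    """Fraction of each map's activation energy that falls inside the box."""
    top, left, bottom, right = box
    energy = feature_maps ** 2
    inside = energy[:, top:bottom, left:right].sum(axis=(1, 2))
    total = energy.sum(axis=(1, 2)) + 1e-12
    return inside / total

rng = np.random.default_rng(0)
maps = rng.random((6, 20, 20)) * 0.1       # six synthetic, weakly active maps
maps[2, 5:10, 5:10] += 1.0                 # maps 2 and 4 fire on the target region
maps[4, 5:10, 5:10] += 1.0

relevance = foreground_relevance(maps, box=(5, 5, 10, 10))
selected = sorted(np.argsort(relevance)[::-1][:2].tolist())
print(selected)  # [2, 4]
```

Keeping only the highest-ranked maps discards responses unrelated to the foreground, which is the motivation for specializing the pre-trained features to each tracking sequence.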

Object tracking FCNT Localization

Although trained for image classification, feature maps in conv5-3 enable object localization... but they are not discriminative enough to distinguish different objects of the same category.

Object tracking Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object detectors emerge in deep scene CNNs." ICLR 2015. [Slides from ReadCV]

Object tracking FCNT Localization

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation...

Object tracking FCNT Architecture

SNet = Specific Network (online update)

GNet = General Network (fixed)

Object tracking FCNT Results

ConvNets Software

Caffe: http://caffe.berkeleyvision.org

Torch (Overfeat): http://torch.ch

Theano: http://deeplearning.net/software/theano

TensorFlow: https://www.tensorflow.org

MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet

CNTK (Microsoft): http://www.cntk.ai

ConvNets Learn more

Seminar Series: "Compacting ConvNets for End to End Learning", Jose M. Álvarez
Tuesday, February 2, 4pm, D5-010 Campus Nord

ConvNets Learn more

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

ConvNets Learn more

Online course: "Deep Learning: Taking machine learning to the next level"

ConvNets Learn more

ReadCV seminar: friendly reviews of SoA papers
Spring 2016, Tuesdays at 11am

ConvNets Learn more

Barcelona Convolucionada: "Deep Learning a l'abast de tothom" (Deep Learning within everyone's reach)
Monday, February 1, 7pm, FIB, Campus Nord UPC

Grup d'estudi de machine learning Barcelona (Barcelona machine learning study group)

ConvNets Learn more

Summer course: Deep Learning for Computer Vision (2.5 ECTS for MSc & PhD)
July 4-8, 3-7pm

ConvNets Learn more

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, "Deep Learning: Past, Present & Future"

ConvNets Learn more

"Machine learning" sub-Reddit


Page 18: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

18

Recognition

Figure Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

19

Recognition C3D

Figure Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

20Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Demo

21K Simonyan A Zisserman Very Deep Convolutional Networks for Large-Scale Image Recognition ICLR 2015

Recognition C3D Spatial dimensionSpatial dimensions (XY) of the used kernels are fixed to 3x3 following Symonian amp Zisserman (ICLR 2015)

22Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Temporal dimension3D ConvNets are more suitable for spatiotemporal feature learning compared to 2D ConvNets

Temporal depth

2D ConvNets

23Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

A homogeneous architecture with small 3 times 3 times 3 convolution kernels in all layers is among the best performing architectures for 3D ConvNets

Recognition C3D Temporal dimension

24Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

No gain when varying the temporal depth across layers

Recognition C3D Temporal dimension

25Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

No gain when varying the temporal depth across layers

Recognition C3D Architecture

Featurevector

26Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Feature vector

Video sequence

16 frames-long clips

8 frames-long overlap

27Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Feature vector

16-frame clip

16-frame clip

16-frame clip

16-frame clip

Average

4096

-dim

vid

eo d

escr

ipto

r

4096

-dim

vid

eo d

escr

ipto

r

L2 norm

28Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D VisualizationBased on Deconvnets by Zeiler and Fergus [ECCV 2014] - See [ReadCV Slides] for more details

29Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Compactness

30Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Convolutional 3D(C3D) combined with a simple linear classifier outperforms state-of-the-art methods on 4 different benchmarks and are comparable with state of the art methods on other 2 benchmarks

Recognition C3D Performance

31Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D SoftwareImplementation by Michael Gygli (GitHub)

32

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

33

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

34

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

35

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

36

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

37

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

38

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

39

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

Optical Flow

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 40

Optical Flow Small vs Large

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 41

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 42

Optical FlowClassic approachRigid matching of HoG or SIFT descriptors

Deep MatchingAllow each subpatch to move

independently in a limited range

depending on its size

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 43

Optical Flow Deep Matching

Source Matlab R2015b documentation for normxcorr2 by Mathworks44

Optical Flow 2D correlation

Image

Sub-Image

Offset of the sub-image with respect to the image [00]

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 45

Instead of pre-trained filters a convolution is defined between each

patch of the reference image target image

as a results a correlation map is generated for each reference patch

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 46

Optical Flow Deep Matching

The most discriminative response map

The less discriminative

response map

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 47

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching

(TD)Bottom-upextraction

(BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 48

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Bottom-upextraction

(BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 49

Optical Flow Deep Matching (BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 50

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching (TD)

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching (TD)


Optical Flow Deep Matching (TD)

Each local maximum in the top layer corresponds to a shift of one of the biggest (32x32) patches.

If we focus on a local maximum, we can retrieve the corresponding responses one scale below and focus on the shift of the sub-patches that generated it.
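The two operations described above (picking local maxima at the top, then looking one scale below) might be sketched like this. It is only an illustration under assumptions: a 3x3 neighbourhood defines both the local maxima and the range in which a sub-patch may have moved, and the helper names `local_maxima` and `refine_child` are hypothetical.

```python
import numpy as np

def local_maxima(m, threshold):
    """Positions that dominate their 3x3 neighbourhood and exceed threshold."""
    p = np.pad(m, 1, mode="constant", constant_values=-np.inf)
    H, W = m.shape
    neigh = np.max([p[dy:dy + H, dx:dx + W] for dy in range(3) for dx in range(3)], axis=0)
    ys, xs = np.nonzero((m == neigh) & (m > threshold))
    return list(zip(ys.tolist(), xs.tolist()))

def refine_child(child_map, y, x):
    """Best response of a sub-patch within the 3x3 neighbourhood of (y, x)."""
    y0, x0 = max(y - 1, 0), max(x - 1, 0)
    window = child_map[y0:y + 2, x0:x + 2]
    wy, wx = np.unravel_index(np.argmax(window), window.shape)
    return (y0 + int(wy), x0 + int(wx))

top = np.zeros((5, 5))
top[2, 3] = 0.9          # strong match of a large patch
top[0, 0] = 0.4          # weak match, below the threshold
peaks = local_maxima(top, threshold=0.5)

child = np.zeros((6, 6))
child[3, 4] = 1.0        # the sub-patch actually matched one pixel away
print(peaks, refine_child(child, 2, 3))
```

Repeating `refine_child` level by level, down to the smallest patches, would yield the final set of correspondences.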


Optical Flow Deep Matching (TD)


Optical Flow Deep Matching


Ground truth

Dense HOG [Brox & Malik, 2011]

Deep Matching

Optical Flow Deep Matching


Optical Flow Deep Matching

Optical Flow

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D., and Brox, T. (2015). FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).

Optical Flow FlowNet


Optical Flow FlowNet


End-to-end supervised learning of optical flow

Optical Flow FlowNet (contracting)


Option A: Stack both input images together and feed them through a generic network.
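As a small illustration of Option A, stacking two RGB frames along the channel axis gives a single 6-channel input. The frame size used here is arbitrary and the arrays are placeholders, not real data.

```python
import numpy as np

frame1 = np.zeros((384, 512, 3), dtype=np.float32)  # first RGB frame (placeholder)
frame2 = np.ones((384, 512, 3), dtype=np.float32)   # second RGB frame (placeholder)

# Concatenate along the channel axis: the network sees one 6-channel image.
stacked = np.concatenate([frame1, frame2], axis=-1)
print(stacked.shape)  # (384, 512, 6)
```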

Optical Flow FlowNet (contracting)


Option B: Create two separate yet identical processing streams for the two images and combine them at a later stage.

Optical Flow FlowNet (contracting)


Option B: Create two separate yet identical processing streams for the two images and combine them at a later stage.

Correlation layer: Convolution between data patches from the two feature maps to combine, rather than between data and a learned filter.
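A correlation layer of this kind can be sketched in NumPy. This is a simplified version under stated assumptions: patches are reduced to a single feature vector per location (patch size 1), displacements are limited to `max_disp` pixels, and `correlation_layer` is a hypothetical name rather than an API of any framework.

```python
import numpy as np

def correlation_layer(f1, f2, max_disp):
    """f1, f2: (H, W, C) feature maps. Returns (H, W, D*D), D = 2*max_disp + 1.

    Output channel k holds the dot product between f1[y, x] and
    f2[y + dy, x + dx] for the k-th displacement (dy, dx).
    """
    H, W, C = f1.shape
    d = max_disp
    f2p = np.pad(f2, ((d, d), (d, d), (0, 0)))   # zero padding at the borders
    out = np.zeros((H, W, (2 * d + 1) ** 2), dtype=f1.dtype)
    k = 0
    for dy in range(-d, d + 1):
        for dx in range(-d, d + 1):
            shifted = f2p[d + dy:d + dy + H, d + dx:d + dx + W]
            out[..., k] = (f1 * shifted).sum(axis=-1)
            k += 1
    return out

rng = np.random.default_rng(0)
f1 = rng.random((6, 6, 4))
f2 = np.roll(f1, shift=(1, 2), axis=(0, 1))   # second map = first map moved by (1, 2)
corr = correlation_layer(f1, f2, max_disp=2)
print(corr.shape)
```

Note that the layer has no trainable weights: it only compares the two feature maps with each other.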

Optical Flow FlowNet (expanding)


Upconvolutional layers: Unpooling feature maps + convolution.

Upconvolved feature maps are concatenated with the corresponding map from the contracting part.
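The expanding step can be illustrated with a toy example. The assumptions here are loud ones: nearest-neighbour upsampling stands in for unpooling, a 1x1 convolution with random weights stands in for the learned convolution, and all shapes are arbitrary.

```python
import numpy as np

def unpool2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def conv1x1(x, weights):
    """1x1 convolution: (H, W, Cin) times (Cin, Cout) gives (H, W, Cout)."""
    return x @ weights

rng = np.random.default_rng(0)
coarse = rng.random((4, 6, 8))      # coarse map from the expanding part
skip = rng.random((8, 12, 16))      # matching map from the contracting part
weights = rng.random((8, 4))        # stand-in for learned filters

up = conv1x1(unpool2x(coarse), weights)
merged = np.concatenate([up, skip], axis=-1)
print(merged.shape)  # (8, 12, 20)
```

The concatenation lets the finer layers combine coarse, high-level information with the local detail kept in the contracting part.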

Optical Flow FlowNet


Since existing ground truth datasets are not sufficiently large to train a ConvNet, a synthetic Flying Chairs dataset is generated... and augmented (translation, rotation and scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma and color).

ConvNets trained on this unrealistic data generalize well to existing datasets such as Sintel and KITTI.
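The photometric part of such an augmentation could look like the sketch below. The parameter ranges are illustrative assumptions, not the values used by the authors, and `photometric_augment` is a hypothetical helper. Geometric transformations are left out because they would also require transforming the ground-truth flow field.

```python
import numpy as np

def photometric_augment(img1, img2, rng):
    """Same random brightness, contrast and gamma for both frames,
    plus independent additive Gaussian noise. Images are floats in [0, 1]."""
    brightness = rng.normal(0.0, 0.2)
    contrast = rng.uniform(0.8, 1.4)
    gamma = rng.uniform(0.7, 1.5)
    sigma = rng.uniform(0.0, 0.04)
    out = []
    for img in (img1, img2):
        x = (img - 0.5) * contrast + 0.5 + brightness
        x = np.clip(x, 0.0, 1.0) ** gamma
        x = x + rng.normal(0.0, sigma, size=img.shape)
        out.append(np.clip(x, 0.0, 1.0))
    return out

rng = np.random.default_rng(0)
img1 = rng.random((32, 32, 3))
img2 = rng.random((32, 32, 3))
aug1, aug2 = photometric_augment(img1, img2, rng)
print(aug1.shape, aug2.shape)
```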

Data augmentation


Optical Flow FlowNet

Object tracking MDNet

Nam, Hyeonseob, and Bohyung Han. Learning multi-domain convolutional neural networks for visual tracking. ICCV VOT Workshop (2015).

Object tracking MDNet


Object tracking MDNet Architecture


Domain-specific layers are used during training, one for each sequence, but are replaced by a single one at test time.
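The split between shared and domain-specific layers can be sketched with a toy model. Everything here is an assumption for illustration: a single linear layer with ReLU stands in for the shared layers, each branch is a linear layer with two outputs (target vs. background), and the class name `MultiDomainNet` and all sizes are hypothetical.

```python
import numpy as np

class MultiDomainNet:
    def __init__(self, in_dim, hidden_dim, num_domains, rng):
        self.rng = rng
        # Shared layers: common to all training sequences (domains).
        self.shared = rng.normal(0, 0.1, (in_dim, hidden_dim))
        # One domain-specific branch per training sequence.
        self.branches = [rng.normal(0, 0.1, (hidden_dim, 2))
                         for _ in range(num_domains)]

    def forward(self, x, branch):
        h = np.maximum(x @ self.shared, 0.0)   # shared representation (ReLU)
        return h @ branch                       # target vs. background scores

    def new_test_branch(self):
        """Replace all training branches by a single, freshly initialized one."""
        self.branches = [self.rng.normal(0, 0.1, (self.shared.shape[1], 2))]
        return self.branches[0]

rng = np.random.default_rng(0)
net = MultiDomainNet(in_dim=16, hidden_dim=8, num_domains=3, rng=rng)
x = rng.random((5, 16))
train_scores = net.forward(x, net.branches[1])   # training: branch of sequence 1
test_branch = net.new_test_branch()              # test time: one new branch
test_scores = net.forward(x, test_branch)
print(train_scores.shape, len(net.branches))
```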

Object tracking MDNet Online update


MDNet is updated online at test time with hard negative mining, that is, selecting the negative samples with the highest positive score.
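Hard negative mining as described above reduces to a top-k selection over the scores of the negative samples. A minimal sketch, with hypothetical scores and a hypothetical helper name:

```python
def hard_negatives(scores, k):
    """Indices of the k negative samples with the highest positive score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

# Positive-class scores that the current model assigns to negative samples.
neg_scores = [0.05, 0.80, 0.10, 0.65, 0.30]
print(hard_negatives(neg_scores, 2))  # [1, 3]
```

The selected samples are the background regions the model currently confuses with the target, so they are the most informative ones for the next update.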

Object tracking FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. Visual Tracking with Fully Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Object tracking FCNT


Focus on conv4-3 and conv5-3 of a VGG-16 network pre-trained for ImageNet image classification.

conv4-3 conv5-3

Object tracking FCNT Specialization


Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence

Object tracking FCNT Localization


Although trained for image classification, feature maps in conv5-3 enable object localization... but they are not discriminative enough to distinguish different objects of the same category.

Object tracking Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object detectors emerge in deep scene CNNs". ICLR 2015. [Slides from ReadCV]

Object tracking FCNT Localization


On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation...

conv4-3 conv5-3

Object tracking FCNT Architecture


SNet=Specific Network (online update)

GNet=General Network (fixed)

Object tracking FCNT Results


ConvNets Software

Caffe: http://caffe.berkeleyvision.org

Torch (Overfeat): http://torch.ch

Theano: http://deeplearning.net/software/theano

TensorFlow: https://www.tensorflow.org

MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet

CNTK (Microsoft): http://www.cntk.ai

Seminar Series: Compacting ConvNets for End to End Learning

Tuesday February 2 4pm

D5-010 Campus Nord

ConvNets Learn more

78

Jose M. Álvarez

Stanford course CS231n

Convolutional Neural Networks for Visual Recognition

ConvNets Learn more

79

ConvNets Learn more

Online course

Deep Learning

Taking machine learning to the next level

80

ReadCV seminar

Friendly reviews of SoA papers

Spring 2016

Tuesdays at 11am

ConvNets Learn more

81

Barcelona Convolucionada

Deep Learning a l'abast de tothom

Monday February 1 7pm FIB Campus Nord UPC

ConvNets Learn more

82

Grup d'estudi de machine learning Barcelona

Summer course

Deep Learning for Computer Vision

(25 ECTS for MSc & PhD)

July 4-8 3-7pm

ConvNets Learn more

83

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, "Deep Learning Past Present & Future"

ConvNets Learn more

84

ConvNets Learn more

85

"Machine learning" sub-Reddit

ConvNets Learn more

86

ConvNets Learn more

87

Check profile requirements for Summer internships (disclaimer: offered to PhD students by default)

Company | Avg. salary per hour | Avg. salary per month
Yahoo | $43 | ($43 x 160 = $6880)
Apple | $37 | ($37 x 160 = $5920)
Google | $29.54-$31.32 | $7151
Facebook | $22.92 | $6150-$7378
Microsoft | $22.63 | $6506-$7171

Source: Glassdoor.com (internships in California. No stipends included)

ConvNets Learn more

88

Video: Cristian Canton's talk "From Catalonia to America: notes on how to achieve a successful post-PhD career", ACMCV 2015 & UPC

Li Fei-Fei, "How we're teaching computers to understand pictures", TEDTalks 2014

ConvNets Learn more

89

Jeremy Howard, "The wonderful and terrifying implications of computers that can learn", TEDTalks 2014

ConvNets Learn more

90

ConvNets Learn more

91

Neil Lawrence, "OpenAI won't benefit humanity without open data sharing" (The Guardian, 14/12/2015)

Is Computer Vision solved?

ConvNets Discussion

92

Sports Do you know them?

ConvNets Do you know them?

Antonio Torralba, MIT (former UPC)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

and MANY MORE I am missing in the page (apologies)

ConvNets Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding is available (check in the future).

We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution | Last deadline (as of 28/1/2016)
FI (Catalonia) | 22/09/2015
FPU (Spain) | 15/01/2016

Check our activity at https://imatge.upc.edu/web

Image Classification

97

Our past research

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., "Cultural Event Recognition with Visual ConvNets and Temporal Models", in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

ChaLearn Workshop

Saliency Prediction

J. Pan and Giró-i-Nieto, X., "End-to-end Convolutional Network for Saliency Prediction", in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

Our current research

LSUN Challenge

Sentiment Analysis

99

Our current research

[Slides]

CNN

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., "Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction", in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015.

Our current research

Instance Search in Video

100

V.-T. Nguyen, -Dinh-Le, D., Salvador, A., -Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S. ichi, and Giró-i-Nieto, X., "NII-HITACHI-UIT at TRECVID 2015 Instance Search", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S. ichi, O'Connor, N., and Smeaton, A. F., "Insight DCU at TRECVID 2015", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu


Recognition C3D

Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. Learning spatiotemporal features with 3D convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015.


Recognition C3D Demo

K. Simonyan, A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. ICLR 2015.

Recognition C3D Spatial dimension

Spatial dimensions (XY) of the kernels are fixed to 3x3, following Simonyan & Zisserman (ICLR 2015).


Recognition C3D Temporal dimension

3D ConvNets are more suitable for spatiotemporal feature learning than 2D ConvNets.

Temporal depth

2D ConvNets


A homogeneous architecture with small 3x3x3 convolution kernels in all layers is among the best performing architectures for 3D ConvNets.
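To make the 3x3x3 kernel concrete, here is a naive sketch of a single-channel 3D convolution over a clip (time, height, width). It assumes no padding and one grayscale channel, and, as is usual in ConvNets, it computes a cross-correlation; `conv3d_valid` is a hypothetical helper, not part of the C3D code.

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Slide a (t, h, w) kernel over a (T, H, W) clip without padding."""
    T, H, W = clip.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = (clip[i:i + t, j:j + h, k:k + w] * kernel).sum()
    return out

clip = np.ones((16, 8, 8))            # 16 frames of 8x8 pixels
kernel = np.ones((3, 3, 3)) / 27.0    # 3x3x3 averaging kernel
out = conv3d_valid(clip, kernel)
print(out.shape)  # (14, 6, 6)
```

Unlike a 2D convolution applied frame by frame, the output keeps a temporal axis, so motion information can propagate to deeper layers.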

Recognition C3D Temporal dimension


No gain when varying the temporal depth across layers

Recognition C3D Temporal dimension


No gain when varying the temporal depth across layers

Recognition C3D Architecture

Feature vector


Recognition C3D Feature vector

Video sequence

16-frame-long clips

8-frame-long overlap
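The clip extraction above amounts to a sliding window with a stride of 8 frames. A small sketch, where `split_into_clips` is a hypothetical helper and clips are given as (start, end) frame indices with the end excluded:

```python
def split_into_clips(num_frames, clip_len=16, overlap=8):
    """Return (start, end) frame ranges of overlapping clips."""
    stride = clip_len - overlap
    return [(s, s + clip_len)
            for s in range(0, num_frames - clip_len + 1, stride)]

print(split_into_clips(40))  # [(0, 16), (8, 24), (16, 32), (24, 40)]
```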


Recognition C3D Feature vector

[Diagram: four 16-frame clips, Average, 4096-dim video descriptor, L2 norm, 4096-dim video descriptor]
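The pooling shown in the diagram can be sketched in a few lines. The clip features here are random stand-ins for real C3D activations, and `video_descriptor` is a hypothetical helper name.

```python
import numpy as np

def video_descriptor(clip_features):
    """Average the per-clip features, then L2-normalize the result."""
    mean = np.mean(clip_features, axis=0)
    return mean / np.linalg.norm(mean)

rng = np.random.default_rng(0)
clip_features = rng.random((4, 4096))   # stand-in for features of 4 clips
desc = video_descriptor(clip_features)
print(desc.shape, np.linalg.norm(desc))
```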


Recognition C3D Visualization

Based on Deconvnets by Zeiler and Fergus [ECCV 2014]. See [ReadCV Slides] for more details.


Recognition C3D Compactness


Convolutional 3D (C3D) features combined with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with state-of-the-art methods on 2 other benchmarks.

Recognition C3D Performance


Recognition C3D Software

Implementation by Michael Gygli (GitHub)

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

Kai Kang et al. Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

Optical Flow


Optical Flow Small vs Large


Optical Flow

Classic approach: Rigid matching of HoG or SIFT descriptors

Deep Matching: Allow each subpatch to move independently in a limited range depending on its size


Optical Flow Deep Matching

Source: Matlab R2015b documentation for normxcorr2 by MathWorks

Optical Flow 2D correlation

Image

Sub-Image

Offset of the sub-image with respect to the image [0,0]
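The 2D correlation illustrated on this slide can be reproduced with a naive NumPy sketch of normalized cross-correlation. Unlike `normxcorr2`, this version only evaluates positions where the sub-image fully overlaps the image; the function name `ncc_map` and the toy data are assumptions made for the example.

```python
import numpy as np

def ncc_map(image, template):
    """Normalized cross-correlation for every fully overlapping offset."""
    H, W = image.shape
    h, w = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    out = np.zeros((H - h + 1, W - w + 1))
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            window = image[y:y + h, x:x + w]
            wz = window - window.mean()
            denom = np.sqrt((wz ** 2).sum()) * t_norm
            out[y, x] = (wz * t).sum() / denom if denom > 0 else 0.0
    return out

rng = np.random.default_rng(0)
image = rng.random((20, 20))
template = image[5:9, 7:11].copy()    # sub-image cut at row 5, column 7
scores = ncc_map(image, template)
offset = np.unravel_index(np.argmax(scores), scores.shape)
print(offset)  # offset of the sub-image with respect to the image origin
```

The peak of the correlation map recovers the offset at which the sub-image was cut from the image.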


Thank you

Slides available on and

httpsimatgeupceduwebpeoplexavier-giro

httpbitsearchblogspotcom

httpstwittercomDocXavi

httpswwwfacebookcomProfessorXavi

xaviergiroupcedu

101

Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015.

Recognition C3D Demo

Recognition C3D Spatial dimension

Spatial dimensions (XY) of the kernels are fixed to 3x3, following Simonyan & Zisserman (ICLR 2015).

K. Simonyan, A. Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition." ICLR 2015.

Recognition C3D Temporal dimension

3D ConvNets are more suitable for spatiotemporal feature learning than 2D ConvNets.

[Figure labels: temporal depth; 2D ConvNets]

Recognition C3D Temporal dimension

A homogeneous architecture with small 3x3x3 convolution kernels in all layers is among the best performing architectures for 3D ConvNets.
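To make the 3x3x3 kernel concrete, here is a minimal NumPy sketch of one kernel sliding over a short clip. The function name `conv3d_valid`, the single-channel input, the toy clip size and the absence of padding are assumptions made for illustration. C3D itself stacks many such kernels per layer, with padding and pooling in between.

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Cross-correlate a (T, H, W) clip with a (kt, kh, kw) kernel, no padding."""
    T, H, W = clip.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                # Each output value sees kt frames and a kh x kw spatial window.
                out[t, y, x] = np.sum(clip[t:t + kt, y:y + kh, x:x + kw] * kernel)
    return out

clip = np.ones((16, 8, 8))           # 16 frames of 8x8 pixels (toy size)
kernel = np.ones((3, 3, 3)) / 27.0   # 3x3x3 averaging kernel
features = conv3d_valid(clip, kernel)
```

With a kernel of temporal size 1, the same function reduces to a per-frame 2D convolution, which is the 2D ConvNet case these slides compare against.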

Recognition C3D Temporal dimension

No gain when varying the temporal depth across layers.

Recognition C3D Architecture

Feature vector

Recognition C3D Feature vector

The video sequence is split into 16-frame clips, with an 8-frame overlap between consecutive clips.

Recognition C3D Feature vector

[Figure: the features of the 16-frame clips are averaged into a 4096-dim video descriptor, which is then L2-normalized into the final 4096-dim video descriptor]
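The clip splitting, averaging and L2 normalization described on these two slides can be sketched as follows. This is only an illustration: the helper names are hypothetical, and random vectors stand in for the 4096-dim activations that a trained C3D network would produce for each clip.

```python
import numpy as np

def split_into_clips(num_frames, clip_len=16, overlap=8):
    """Return (start, end) frame indices of clips with the given overlap."""
    stride = clip_len - overlap
    return [(s, s + clip_len) for s in range(0, num_frames - clip_len + 1, stride)]

def video_descriptor(clip_features):
    """Average per-clip features and L2-normalize the result."""
    mean = np.mean(clip_features, axis=0)
    norm = np.linalg.norm(mean)
    return mean / norm if norm > 0 else mean

clips = split_into_clips(num_frames=40)
# Stand-in for the per-clip network activations (assumption: 4096-dim).
rng = np.random.default_rng(0)
clip_features = rng.random((len(clips), 4096))
descriptor = video_descriptor(clip_features)
```

The descriptor has the same dimension whatever the video length, which is what allows a simple linear classifier to be trained on top of it.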

Recognition C3D Visualization

Based on Deconvnets by Zeiler and Fergus [ECCV 2014]. See [ReadCV Slides] for more details.

Recognition C3D Compactness

Recognition C3D Performance

Convolutional 3D (C3D) features combined with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with state-of-the-art methods on the other 2 benchmarks.

Recognition C3D Software

Implementation by Michael Gygli (GitHub)

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

Kai Kang et al., "Object Detection in Videos with TubeLets and Multi-Context Cues" (ILSVRC 2015) [video] [poster]

Optical Flow

Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE.

Optical Flow Small vs Large

Optical Flow

Classic approach: Rigid matching of HoG or SIFT descriptors.

Deep Matching: Allow each subpatch to move independently in a limited range depending on its size.

Optical Flow Deep Matching

Optical Flow 2D correlation

[Figure: an image, a sub-image, and the offset of the sub-image with respect to the image, starting at [0, 0]]

Source: MATLAB R2015b documentation for normxcorr2, by MathWorks.
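The 2D correlation illustrated here can be sketched in NumPy as a normalized cross-correlation between a sub-image and every window of the image. This is a simplified stand-in, not MATLAB's normxcorr2: the function name is hypothetical, and only offsets where the sub-image fits entirely inside the image are scored.

```python
import numpy as np

def normalized_cross_correlation(image, template):
    """NCC score of `template` at every fully-overlapping offset in `image`."""
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt(np.sum(t ** 2))
    out = np.zeros((image.shape[0] - th + 1, image.shape[1] - tw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            window = image[y:y + th, x:x + tw]
            w = window - window.mean()
            denom = np.sqrt(np.sum(w ** 2)) * t_norm
            # Flat windows have zero variance; score them as 0.
            out[y, x] = np.sum(w * t) / denom if denom > 0 else 0.0
    return out

rng = np.random.default_rng(0)
image = rng.random((20, 20))
template = image[5:9, 7:11].copy()   # sub-image cut out at offset (5, 7)
scores = normalized_cross_correlation(image, template)
offset = np.unravel_index(np.argmax(scores), scores.shape)
```

The position of the maximum in `scores` recovers the offset at which the sub-image was cut out of the image.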

Optical Flow Deep Matching

Instead of pre-trained filters, a convolution is defined between each patch of the reference image and the target image. As a result, a correlation map is generated for each reference patch.

Optical Flow Deep Matching

[Figure labels: the most discriminative response map; the least discriminative response map]

Optical Flow Deep Matching

Key idea: Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search.

[Figure: pyramid levels for 4x4, 8x8, 16x16 and 32x32 patches, built by bottom-up extraction (BU) and searched by top-down matching (TD)]

Optical Flow Deep Matching (BU)

Bottom-up extraction (BU): the pyramid of correlation maps is built from the 4x4 patches up to the 8x8, 16x16 and 32x32 patches.
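One bottom-up step can be sketched as follows: the correlation map of a patch is obtained from the maps of its four quadrants, each max-pooled so that the quadrant may move slightly, then shifted to its place and averaged. This is a simplified illustration under stated assumptions. The function names are hypothetical, the maps are indexed by the top-left corner of the matched window, and the subsampling and non-linear rectification that the DeepFlow paper applies between levels are left out.

```python
import numpy as np

def max_pool3x3(m):
    """3x3 max filter with stride 1 (borders padded with -inf)."""
    p = np.pad(m, 1, constant_values=-np.inf)
    shifted = [p[dy:dy + m.shape[0], dx:dx + m.shape[1]]
               for dy in range(3) for dx in range(3)]
    return np.max(shifted, axis=0)

def aggregate(children, n):
    """Combine the maps of the four n x n quadrants into the parent's map.

    children: float maps ordered top-left, top-right, bottom-left,
    bottom-right, all of the same shape.
    """
    offsets = [(0, 0), (0, n), (n, 0), (n, n)]
    h, w = children[0].shape
    pooled = [max_pool3x3(c) for c in children]
    # Align each quadrant's map with the parent's top-left corner.
    parts = [p[dy:h - n + dy, dx:w - n + dx]
             for p, (dy, dx) in zip(pooled, offsets)]
    return np.mean(parts, axis=0)

# Toy example: an 8x8 parent matched at (3, 2); the top-right quadrant
# is displaced by one pixel with respect to a rigid match.
maps = [np.zeros((12, 12)) for _ in range(4)]
maps[0][3, 2] = 1.0
maps[1][4, 6] = 1.0   # rigid position would be (3, 6)
maps[2][7, 2] = 1.0
maps[3][7, 6] = 1.0
parent = aggregate(maps, n=4)
```

Thanks to the max-pooling, the parent still gets a full response at (3, 2) even though one quadrant moved by a pixel, which is the non-rigid tolerance that Deep Matching aims for.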

Optical Flow Deep Matching (TD)

Top-down matching (TD): the pyramid is searched from the 32x32 patches down to the 16x16, 8x8 and 4x4 patches.

Each local maximum in the top layer corresponds to a shift of one of the biggest (32x32) patches. If we focus on a local maximum, we can retrieve the corresponding responses one scale below and focus on the shifts of the sub-patches that generated it.

Optical Flow Deep Matching

[Figure: matching results compared for ground truth, dense HOG [Brox & Malik 2011], and Deep Matching]

Optical Flow

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).

Optical Flow FlowNet

End-to-end supervised learning of optical flow.

Optical Flow FlowNet (contracting)

Option A: Stack both input images together and feed them through a generic network.
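Option A amounts to a concatenation along the channel axis before the first convolution. A minimal sketch, assuming RGB frames stored as (H, W, 3) arrays; the function name and the frame size are illustrative:

```python
import numpy as np

def stack_frames(frame_a, frame_b):
    """Concatenate two (H, W, 3) frames along the channel axis -> (H, W, 6)."""
    return np.concatenate([frame_a, frame_b], axis=-1)

frame_a = np.zeros((384, 512, 3), dtype=np.float32)
frame_b = np.ones((384, 512, 3), dtype=np.float32)
network_input = stack_frames(frame_a, frame_b)
```

The network then has to discover by itself how to compare the two frames, since nothing in this input layout tells it which channels belong to which image.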

Optical Flow FlowNet (contracting)

Option B: Create two separate, yet identical, processing streams for the two images and combine them at a later stage.

Correlation layer: Convolution of data patches from the layers to combine.
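A correlation layer of this kind can be sketched as a dot product between the feature vector at each position of the first map and the feature vectors of the second map at nearby positions. The sketch below is an illustration under assumptions: the function name is hypothetical, patches are reduced to a single position (1x1), and the maximum displacement of 2 is chosen arbitrarily.

```python
import numpy as np

def correlation_layer(f1, f2, max_disp=2):
    """Dot product between f1 at x and f2 at x + d for every displacement d.

    f1, f2: feature maps of shape (H, W, C).
    Returns an array of shape (H, W, (2 * max_disp + 1) ** 2).
    """
    h, w, _ = f1.shape
    d = max_disp
    f2_padded = np.pad(f2, ((d, d), (d, d), (0, 0)))   # zeros outside the map
    out = []
    for dy in range(-d, d + 1):
        for dx in range(-d, d + 1):
            # shifted[y, x] holds f2[y + dy, x + dx]
            shifted = f2_padded[d + dy:d + dy + h, d + dx:d + dx + w]
            out.append(np.sum(f1 * shifted, axis=-1))
    return np.stack(out, axis=-1)

rng = np.random.default_rng(0)
f1 = rng.standard_normal((6, 6, 4))
f1 /= np.linalg.norm(f1, axis=-1, keepdims=True)   # unit-length features
f2 = np.roll(f1, shift=(1, 2), axis=(0, 1))        # content moved by (1, 2)
corr = correlation_layer(f1, f2, max_disp=2)
best = int(np.argmax(corr[2, 2]))                  # index of the best displacement
```

Unlike a regular convolution, this layer has no trainable weights: it compares data with data, and the following layers learn how to read the resulting matching scores.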

Optical Flow FlowNet (expanding)

Upconvolutional layers: Unpooling of feature maps + convolution. Upconvolved feature maps are concatenated with the corresponding map from the contractive part.
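One refinement step of the expanding part can be sketched as unpooling, convolution and concatenation. The sketch makes several simplifying assumptions: unpooling is done by nearest-neighbour repetition, the convolution is reduced to a 1x1 kernel that only mixes channels, and all names, shapes and weights are hypothetical.

```python
import numpy as np

def unpool2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def conv1x1(x, weights):
    """Mix channels at every pixel: (H, W, C_in) x (C_in, C_out) -> (H, W, C_out)."""
    return x @ weights

def refine(coarse, skip, weights):
    """Upconvolve `coarse` and concatenate it with the contracting-part map."""
    up = conv1x1(unpool2x(coarse), weights)
    return np.concatenate([up, skip], axis=-1)

rng = np.random.default_rng(0)
coarse = rng.random((4, 4, 8))    # low-resolution features
skip = rng.random((8, 8, 16))     # matching map from the contracting part
weights = rng.random((8, 4))      # hypothetical 1x1 kernel, 8 -> 4 channels
refined = refine(coarse, skip, weights)
```

The concatenation lets each refinement step combine coarse, high-level information with the finer local detail kept in the contracting part.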

Optical Flow FlowNet

Data augmentation

Since existing ground truth datasets are not sufficiently large to train a ConvNet, a synthetic Flying Chairs dataset is generated… and augmented (translation, rotation and scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma and color).

ConvNets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI.
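The photometric part of such an augmentation can be sketched as below. The sampling ranges are made-up values for illustration, not the ones used by the FlowNet authors, and the function name is hypothetical. Geometric transformations (translation, rotation, scaling) are left out because they also require transforming the ground truth flow.

```python
import numpy as np

def augment_pair(img_a, img_b, rng):
    """Apply the same random photometric changes to both frames in [0, 1]."""
    gamma = rng.uniform(0.7, 1.5)        # hypothetical range
    brightness = rng.normal(0.0, 0.2)    # hypothetical spread
    contrast = rng.uniform(0.8, 1.4)     # hypothetical range
    out = []
    for img in (img_a, img_b):
        x = np.clip(img, 0.0, 1.0) ** gamma                # gamma change
        x = (x - x.mean()) * contrast + x.mean()           # contrast change
        x = x + brightness                                 # brightness change
        x = x + rng.normal(0.0, 0.02, size=x.shape)        # additive Gaussian noise
        out.append(np.clip(x, 0.0, 1.0))
    return out[0], out[1]

rng = np.random.default_rng(0)
img_a = rng.random((8, 8, 3))
img_b = rng.random((8, 8, 3))
aug_a, aug_b = augment_pair(img_a, img_b, rng)
```

Sharing the gamma, brightness and contrast values between the two frames keeps the pair photometrically consistent, while the noise is drawn independently for each frame.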

Optical Flow FlowNet

Object tracking MDNet

Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015).

Object tracking MDNet Architecture

Domain-specific layers are used during training for each sequence, but are replaced by a single one at test time.
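The split between shared and domain-specific layers can be sketched with a toy model: one shared layer, followed by one binary (target vs background) classifier per training sequence. The class name, the layer sizes and the random initialization are assumptions for illustration, and training itself is not shown.

```python
import numpy as np

class MultiDomainNet:
    """Shared feature layer followed by one binary classifier per domain."""

    def __init__(self, in_dim, feat_dim, num_domains, rng):
        self.rng = rng
        self.shared = rng.standard_normal((in_dim, feat_dim)) * 0.01
        self.branches = [self._new_branch() for _ in range(num_domains)]

    def _new_branch(self):
        feat_dim = self.shared.shape[1]
        return self.rng.standard_normal((feat_dim, 2)) * 0.01

    def forward(self, x, domain):
        features = np.maximum(x @ self.shared, 0.0)   # shared layers (ReLU)
        return features @ self.branches[domain]       # target vs background scores

    def reset_for_test(self):
        """Drop the training branches and start a single new one."""
        self.branches = [self._new_branch()]

rng = np.random.default_rng(0)
net = MultiDomainNet(in_dim=512, feat_dim=64, num_domains=5, rng=rng)
x = rng.standard_normal((10, 512))
train_scores = net.forward(x, domain=3)   # scores for training sequence 3
shared_before = net.shared.copy()
net.reset_for_test()                      # new sequence: a single fresh branch
test_scores = net.forward(x, domain=0)
```

Only the branches are discarded at test time; the shared layers keep what was learned across all the training sequences.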

Object tracking MDNet Online update

MDNet is updated online at test time with hard negative mining, that is, selecting negative samples with the highest positive score.
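The selection rule can be sketched in a few lines, assuming the network has already produced a positive score for every candidate sample. The function name and the toy scores are illustrative.

```python
import numpy as np

def hard_negatives(positive_scores, is_negative, k):
    """Indices of the k negative samples with the highest positive score."""
    neg_idx = np.flatnonzero(is_negative)
    order = np.argsort(positive_scores[neg_idx])[::-1]   # highest score first
    return neg_idx[order[:k]]

scores = np.array([0.9, 0.1, 0.8, 0.3, 0.7, 0.05])
is_negative = np.array([False, True, True, True, True, True])
selected = hard_negatives(scores, is_negative, k=2)
```

The selected negatives are the ones the current model is most likely to confuse with the target, so they are the most informative samples for the online update.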

Object tracking FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Focus on conv4-3 and conv5-3 of the VGG-16 network pre-trained for ImageNet image classification.

[Figure labels: conv4-3, conv5-3]

Object tracking FCNT Specialization

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence

Object tracking FCNT Localization

Although trained for image classification, feature maps in conv5-3 enable object localization… but they are not discriminative enough to tell apart different objects of the same category.

Object tracking Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object detectors emerge in deep scene CNNs." ICLR 2015. [Slides from ReadCV]

Object tracking FCNT Localization

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation…

[Figure labels: conv4-3, conv5-3]

Object tracking FCNT Architecture

SNet = Specific Network (online update)

GNet = General Network (fixed)

Object tracking FCNT Results

ConvNets Software

Caffe: http://caffe.berkeleyvision.org/

Torch (Overfeat): http://torch.ch/

Theano: http://deeplearning.net/software/theano/

TensorFlow: https://www.tensorflow.org/

MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet/

CNTK (Microsoft): http://www.cntk.ai/

ConvNets Learn more

Seminar Series: "Compacting ConvNets for End to End Learning"

Jose M. Álvarez

Tuesday, February 2, 4pm, D5-010, Campus Nord

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

Online course: Deep Learning. Taking machine learning to the next level

ReadCV seminar: Friendly reviews of SoA papers. Spring 2016, Tuesdays at 11am

Barcelona Convolucionada: "Deep Learning a l'abast de tothom" (Deep Learning within everyone's reach). Monday, February 1, 7pm, FIB, Campus Nord, UPC. Grup d'estudi de machine learning Barcelona (Barcelona machine learning study group)

Summer course: Deep Learning for Computer Vision (2.5 ECTS for MSc & PhD). July 4-8, 3-7pm

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, "Deep Learning: Past, Present & Future"

"Machine learning" sub-Reddit

Check profile requirements for Summer internship (disclaimer: offered to PhD students by default)

Company     Avg. salary/hour    Avg. salary/month
Yahoo       $43                 ($43 x 160 = $6,880)
Apple       $37                 ($37 x 160 = $5,920)
Google      $29.54-$31.32       $7,151
Facebook    $22.92              $6,150-$7,378
Microsoft   $22.63              $6,506-$7,171

Source: Glassdoor.com (internships in California. No stipends included)

Video: Cristian Canton's talk "From Catalonia to America: notes on how to achieve a successful post-PhD career", ACMCV 2015 & UPC

Li Fei-Fei, "How we're teaching computers to understand pictures", TEDTalks 2014

Jeremy Howard, "The wonderful and terrifying implications of computers that can learn", TEDTalks 2014

ConvNets Discussion

Neil Lawrence, "OpenAI won't benefit humanity without open data sharing" (The Guardian, 14/12/2015)

Is Computer Vision solved?

Sports: Do you know them?

ConvNets: Do you know them?

Antonio Torralba, MIT (former UPC)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

…and MANY MORE I am missing on the page (apologies)

ConvNets Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding is available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution    Last deadlines (on 28/1/2016)
FI (Catalonia)         22/09/2015
FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web

Our past research

Image Classification

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., "Cultural Event Recognition with Visual ConvNets and Temporal Models", in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

ChaLearn Workshop

Our current research

Saliency Prediction

J. Pan and Giró-i-Nieto, X., "End-to-end Convolutional Network for Saliency Prediction", in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

LSUN Challenge

Our current research

Sentiment Analysis

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., "Diving Deep into Sentiment Understanding: Fine-tuned CNNs for Visual Sentiment Prediction", in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015. [Slides]

Our current research

Instance Search in Video

V.-T. Nguyen, Dinh-Le, D., Salvador, A., Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S.'ichi, and Giró-i-Nieto, X., "NII-HITACHI-UIT at TRECVID 2015 Instance Search", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S.'ichi, O'Connor, N., and Smeaton, A. F., "Insight DCU at TRECVID 2015", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

Page 21: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

21K Simonyan A Zisserman Very Deep Convolutional Networks for Large-Scale Image Recognition ICLR 2015

Recognition C3D Spatial dimensionSpatial dimensions (XY) of the used kernels are fixed to 3x3 following Symonian amp Zisserman (ICLR 2015)

22Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Temporal dimension3D ConvNets are more suitable for spatiotemporal feature learning compared to 2D ConvNets

Temporal depth

2D ConvNets

23Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

A homogeneous architecture with small 3 times 3 times 3 convolution kernels in all layers is among the best performing architectures for 3D ConvNets

Recognition C3D Temporal dimension

24Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

No gain when varying the temporal depth across layers

Recognition C3D Temporal dimension

25Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

No gain when varying the temporal depth across layers

Recognition C3D Architecture

Featurevector

26Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Feature vector

Video sequence

16 frames-long clips

8 frames-long overlap

27Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Feature vector

16-frame clip

16-frame clip

16-frame clip

16-frame clip

Average

4096

-dim

vid

eo d

escr

ipto

r

4096

-dim

vid

eo d

escr

ipto

r

L2 norm

28Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D VisualizationBased on Deconvnets by Zeiler and Fergus [ECCV 2014] - See [ReadCV Slides] for more details

29Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Compactness

30Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Convolutional 3D(C3D) combined with a simple linear classifier outperforms state-of-the-art methods on 4 different benchmarks and are comparable with state of the art methods on other 2 benchmarks

Recognition C3D Performance

31Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D SoftwareImplementation by Michael Gygli (GitHub)

32

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

33

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

34

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

35

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

36

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

37

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

38

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

39

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

Optical Flow

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 40

Optical Flow Small vs Large

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 41

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 42

Optical FlowClassic approachRigid matching of HoG or SIFT descriptors

Deep MatchingAllow each subpatch to move

independently in a limited range

depending on its size

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 43

Optical Flow Deep Matching

Source Matlab R2015b documentation for normxcorr2 by Mathworks44

Optical Flow 2D correlation

Image

Sub-Image

Offset of the sub-image with respect to the image [00]

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 45

Instead of pre-trained filters a convolution is defined between each

patch of the reference image target image

as a results a correlation map is generated for each reference patch

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 46

Optical Flow Deep Matching

The most discriminative response map

The less discriminative

response map

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 47

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching

(TD)Bottom-upextraction

(BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 48

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Bottom-upextraction

(BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 49

Optical Flow: Deep Matching (BU)

Optical Flow: Deep Matching (TD)

Each local maximum in the top layer corresponds to a shift of one of the biggest (32x32) patches. If we focus on a local maximum, we can retrieve the corresponding responses one scale below and focus on the shifts of the sub-patches that generated it.

Optical Flow: Deep Matching

Figure panels: Ground truth, Dense HOG [Brox & Malik 2011], Deep Matching

Optical Flow

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).

Optical Flow: FlowNet

End-to-end supervised learning of optical flow.

Optical Flow: FlowNet (contracting)

Option A: Stack both input images together and feed them through a generic network.
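
Option A can be illustrated with a minimal pure-Python sketch, assuming each frame is a nested list indexed [row][column][channel]. The helper name `stack_frames` and the toy 2 x 2 frames are illustrative assumptions and do not come from the FlowNet code.

```python
def stack_frames(frame_a, frame_b):
    """Concatenate two H x W x 3 frames along the channel axis (H x W x 6)."""
    return [
        [pix_a + pix_b for pix_a, pix_b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]


# Two toy 2 x 2 RGB frames stored as nested lists indexed [row][column][channel].
frame_a = [[[0.1, 0.2, 0.3] for _ in range(2)] for _ in range(2)]
frame_b = [[[0.4, 0.5, 0.6] for _ in range(2)] for _ in range(2)]
stacked = stack_frames(frame_a, frame_b)
print(len(stacked), len(stacked[0]), len(stacked[0][0]))  # 2 2 6
```

The network then sees a single 6-channel input, so it is left to the first convolutions to discover how the two frames relate.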

Option B: Create two separate, yet identical, processing streams for the two images and combine them at a later stage.

Correlation layer: convolution of data patches from the two layers being combined.
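
A simplified sketch of such a correlation layer is given below, under the assumption that each "patch" is a single feature vector (patch size 1) and that positions outside the image count as zero. The function name `correlation_layer`, the `max_disp` parameter and the toy one-hot features are illustrative; the layer in the paper also supports larger patches and strides.

```python
def correlation_layer(feat_a, feat_b, max_disp=1):
    """Dot products between feature vectors of two H x W x C maps.

    For each position in feat_a, compare with feat_b at every displacement
    (dy, dx) in [-max_disp, max_disp]; out-of-image positions count as zero.
    Returns an H x W x (2 * max_disp + 1) ** 2 nested list.
    """
    h, w = len(feat_a), len(feat_a[0])
    offsets = [(dy, dx) for dy in range(-max_disp, max_disp + 1)
               for dx in range(-max_disp, max_disp + 1)]
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            scores = []
            for dy, dx in offsets:
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:
                    scores.append(sum(a * b for a, b in zip(feat_a[y][x], feat_b[yy][xx])))
                else:
                    scores.append(0.0)
            row.append(scores)
        out.append(row)
    return out


# Toy 2 x 2 maps with one-hot features; feat_b is feat_a shifted one pixel right.
zero = [0, 0, 0, 0]
feat_a = [[[1, 0, 0, 0], [0, 1, 0, 0]],
          [[0, 0, 1, 0], [0, 0, 0, 1]]]
feat_b = [[zero, [1, 0, 0, 0]],
          [zero, [0, 0, 1, 0]]]
corr = correlation_layer(feat_a, feat_b)
print(corr[0][0].index(max(corr[0][0])))  # 5, i.e. displacement (dy=0, dx=+1)
```

The output has one channel per candidate displacement, so the strongest channel at a pixel already hints at where that pixel moved.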

Optical Flow: FlowNet (expanding)

Upconvolutional layers: unpooling of feature maps + convolution. Upconvolved feature maps are concatenated with the corresponding map from the contracting part.
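
The unpooling and concatenation steps can be sketched as follows. This is only an illustration: it assumes nearest-neighbour 2x unpooling, omits the convolution that follows, and uses hypothetical helper names (`unpool2x`, `concat_channels`) with maps stored as nested lists indexed [row][column][channel].

```python
def unpool2x(feature_map):
    """Nearest-neighbour 2x upsampling of an H x W x C nested list."""
    out = []
    for row in feature_map:
        wide = [list(pix) for pix in row for _ in range(2)]
        out.append(wide)
        out.append([list(pix) for pix in wide])
    return out


def concat_channels(map_a, map_b):
    """Concatenate two maps of equal spatial size along the channel axis."""
    return [[pa + pb for pa, pb in zip(ra, rb)] for ra, rb in zip(map_a, map_b)]


coarse = [[[1.0, 2.0]]]                                 # 1 x 1 x 2, expanding part
skip = [[[0.5] for _ in range(2)] for _ in range(2)]    # 2 x 2 x 1, contracting part
merged = concat_channels(unpool2x(coarse), skip)
print(merged[1][1])  # [1.0, 2.0, 0.5]
```

Concatenating the skip connection lets the expanding part combine coarse, high-level information with the finer local detail kept in the contracting part.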

Optical Flow: FlowNet

Since existing ground truth datasets are not sufficiently large to train a ConvNet, a synthetic Flying Chairs dataset is generated... and augmented (translation, rotation and scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma and color).

ConvNets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI.

Data augmentation
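
The photometric part of such an augmentation might look like the sketch below. The parameter ranges, the grayscale nested-list frames and the name `augment_pair` are assumptions made for illustration, not the values used by the authors. Geometric transformations (translation, rotation, scaling) are not shown, since they would also require transforming the ground-truth flow.

```python
import random


def augment_pair(frame_a, frame_b, rng):
    """Apply the same random gamma and brightness change to both frames,
    plus independent additive Gaussian noise per pixel.

    Frames are nested lists of grayscale intensities in [0, 1].
    """
    gamma = rng.uniform(0.7, 1.5)        # illustrative range
    brightness = rng.gauss(0.0, 0.1)     # illustrative spread
    noise_sigma = 0.02                   # illustrative noise level

    def transform(frame):
        return [
            [min(1.0, max(0.0, v ** gamma + brightness + rng.gauss(0.0, noise_sigma)))
             for v in row]
            for row in frame
        ]

    return transform(frame_a), transform(frame_b)


rng = random.Random(0)
frame_a = [[0.2, 0.4], [0.6, 0.8]]
frame_b = [[0.3, 0.5], [0.7, 0.9]]
aug_a, aug_b = augment_pair(frame_a, frame_b, rng)
```

Sharing gamma and brightness across the two frames keeps the pair photometrically consistent, while the per-pixel noise differs between them.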

Optical Flow: FlowNet

Object tracking: MDNet

Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015).

Object tracking: MDNet Architecture

Domain-specific layers are used during training for each sequence, but are replaced by a single one at test time.

Object tracking: MDNet Online update

MDNet is updated online at test time with hard negative mining, that is, selecting negative samples with the highest positive score.
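
The selection step of hard negative mining can be sketched in a few lines. The function name `mine_hard_negatives`, the bounding boxes and the scores below are hypothetical; in practice the scores would come from the current network.

```python
def mine_hard_negatives(samples, positive_scores, num_hard):
    """Keep the negative samples that the classifier scores highest as positive."""
    order = sorted(range(len(samples)), key=lambda i: positive_scores[i], reverse=True)
    return [samples[i] for i in order[:num_hard]]


# Hypothetical negative candidates (bounding boxes as x, y, w, h) and their
# positive-class scores from the current network.
negatives = [(10, 10, 40, 40), (60, 20, 40, 40), (15, 80, 40, 40), (90, 90, 40, 40)]
scores = [0.10, 0.85, 0.40, 0.70]
hard = mine_hard_negatives(negatives, scores, num_hard=2)
print(hard)  # [(60, 20, 40, 40), (90, 90, 40, 40)]
```

Training on these confusing negatives focuses the update on the distractors the tracker is most likely to drift to.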

Object tracking: FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Focus on conv4-3 and conv5-3 of a VGG-16 network pre-trained for ImageNet image classification.

Object tracking: FCNT Specialization

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence.

Object tracking: FCNT Localization

Although trained for image classification, feature maps in conv5-3 enable object localization... but they are not discriminative enough to distinguish different objects of the same category.

Object tracking: Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object detectors emerge in deep scene CNNs." ICLR 2015. [Slides from ReadCV]

Object tracking: FCNT Localization

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation...

Object tracking: FCNT Architecture

SNet = Specific Network (online update)

GNet = General Network (fixed)

Object tracking: FCNT Results

ConvNets: Software

Caffe: http://caffe.berkeleyvision.org

Torch (Overfeat): http://torch.ch

Theano: http://deeplearning.net/software/theano

TensorFlow: https://www.tensorflow.org

MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet

CNTK (Microsoft): http://www.cntk.ai

ConvNets: Learn more

Seminar series: Compacting ConvNets for End to End Learning

Tuesday, February 2, 4pm

D5-010, Campus Nord

Jose M. Álvarez

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

Online course: Deep Learning. Taking machine learning to the next level

ReadCV seminar: friendly reviews of SoA papers

Spring 2016

Tuesdays at 11am

Barcelona Convolucionada: Deep Learning a l'abast de tothom (Deep Learning within everyone's reach)

Monday, February 1, 7pm, FIB, Campus Nord UPC

Grup d'estudi de machine learning Barcelona (Barcelona machine learning study group)

Summer course: Deep Learning for Computer Vision (2.5 ECTS for MSc & PhD)

July 4-8, 3-7pm

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, "Deep Learning: Past, Present & Future"

"Machine learning" sub-Reddit

Check profile requirements for summer internships (disclaimer: offered to PhD students by default).

Company | Avg. salary per hour | Avg. salary per month
Yahoo | $43 | ($43 x 160 = $6,880)
Apple | $37 | ($37 x 160 = $5,920)
Google | $29.54-$31.32 | $7,151
Facebook | $22.92 | $6,150-$7,378
Microsoft | $22.63 | $6,506-$7,171

Source: Glassdoor.com (internships in California; no stipends included).

Video: Cristian Canton's talk "From Catalonia to America: notes on how to achieve a successful post-PhD career", ACMCV 2015 & UPC

Li Fei-Fei, "How we're teaching computers to understand pictures", TEDTalks 2014

Jeremy Howard, "The wonderful and terrifying implications of computers that can learn", TEDTalks 2014

Neil Lawrence, "OpenAI won't benefit humanity without open data sharing" (The Guardian, 14/12/2015)

Is Computer Vision solved?

ConvNets: Discussion

Sports: Do you know them?

ConvNets: Do you know them?

Antonio Torralba, MIT (former UPC)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

...and MANY MORE I am missing in the page (apologies)

ConvNets: Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding is available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution | Last deadline (as of 28/1/2016)
FI (Catalonia) | 22/09/2015
FPU (Spain) | 15/01/2016

Check our activity at https://imatge.upc.edu/web

Our past research

Image Classification (ChaLearn Workshop)

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., "Cultural Event Recognition with Visual ConvNets and Temporal Models", in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

Our current research

Saliency Prediction (LSUN Challenge)

J. Pan and Giró-i-Nieto, X., "End-to-end Convolutional Network for Saliency Prediction", in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

Sentiment Analysis

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., "Diving Deep into Sentiment Understanding: Fine-tuned CNNs for Visual Sentiment Prediction", in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015. [Slides]

Instance Search in Video

V.-T. Nguyen, Le, D.-D., Salvador, A., Zhu, C., Nguyen, D.-L., Tran, M.-T., Ngo, T. D., Duong, D. A., Satoh, S., and Giró-i-Nieto, X., "NII-HITACHI-UIT at TRECVID 2015 Instance Search", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S., O'Connor, N., and Smeaton, A. F., "Insight DCU at TRECVID 2015", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu


Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015.

Recognition: C3D Temporal dimension

3D ConvNets are more suitable for spatiotemporal feature learning than 2D ConvNets.

Figure labels: temporal depth, 2D ConvNets

A homogeneous architecture with small 3 x 3 x 3 convolution kernels in all layers is among the best performing architectures for 3D ConvNets.
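
To make the 3 x 3 x 3 kernel concrete, the following is a minimal pure-Python sketch of a single-channel 3D convolution (no padding, stride 1) over a clip stored as nested lists indexed [time][row][column]. The function name `conv3d_valid` and the toy sizes are illustrative assumptions and are not taken from the C3D code.

```python
def conv3d_valid(volume, kernel):
    """Single-channel 3D cross-correlation, no padding, stride 1.

    volume: nested lists indexed [t][y][x]; kernel: nested lists [kt][ky][kx].
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += volume[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Toy clip of 4 frames of 5 x 5 pixels, all ones, and an all-ones 3 x 3 x 3 kernel.
clip = [[[1.0] * 5 for _ in range(5)] for _ in range(4)]
kernel = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
response = conv3d_valid(clip, kernel)
print(len(response), len(response[0]), len(response[0][0]))  # 2 3 3
```

Unlike a 2D convolution applied frame by frame, the output keeps a temporal axis (here of length 2), so later layers can still model motion.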

No gain when varying the temporal depth across layers.

Recognition: C3D Architecture

Feature vector

Recognition: C3D Feature vector

The video sequence is split into 16-frame-long clips with an 8-frame overlap between consecutive clips.

The features of the 16-frame clips are averaged into a 4096-dim video descriptor, which is then L2-normalized.
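
The clip splitting, averaging and L2 normalization can be sketched as below. This is a minimal illustration: the helper names are hypothetical, toy 2-dim vectors stand in for the 4096-dim clip features, and trailing frames that do not fill a whole clip are simply dropped (an assumption, not something stated on the slide).

```python
import math


def split_into_clips(num_frames, clip_len=16, overlap=8):
    """Return (start, end) frame indices of clips; `end` is exclusive."""
    stride = clip_len - overlap
    return [(s, s + clip_len) for s in range(0, num_frames - clip_len + 1, stride)]


def video_descriptor(clip_features):
    """Average per-clip feature vectors, then L2-normalize the result."""
    dim = len(clip_features[0])
    mean = [sum(f[i] for f in clip_features) / len(clip_features) for i in range(dim)]
    norm = math.sqrt(sum(v * v for v in mean))
    return [v / norm for v in mean] if norm > 0 else mean


clips = split_into_clips(40)
print(clips)  # [(0, 16), (8, 24), (16, 32), (24, 40)]

# Toy 2-dim features standing in for the 4096-dim clip features.
descriptor = video_descriptor([[3.0, 0.0], [0.0, 4.0]])
print(descriptor)  # approximately [0.6, 0.8]
```

The result is one fixed-length vector per video regardless of its duration, which is what allows a simple linear classifier to be trained on top.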

Recognition: C3D Visualization

Based on Deconvnets by Zeiler and Fergus [ECCV 2014]. See [ReadCV Slides] for more details.

Recognition: C3D Compactness

Recognition: C3D Performance

Convolutional 3D (C3D) combined with a simple linear classifier outperforms state-of-the-art methods on 4 different benchmarks and is comparable with state-of-the-art methods on 2 other benchmarks.

Recognition: C3D Software

Implementation by Michael Gygli (GitHub)

Recognition: ImageNet Video

[ILSVRC 2015 Slides and videos]

Kai Kang et al., "Object Detection in Videos with TubeLets and Multi-Context Cues" (ILSVRC 2015) [video] [poster]

Optical Flow

Optical Flow: Small vs Large

Optical Flow

Classic approach: rigid matching of HoG or SIFT descriptors.

Deep Matching: allow each subpatch to move independently in a limited range depending on its size.

Optical Flow: Deep Matching

Optical Flow: 2D correlation

Figure labels: image, sub-image, offset of the sub-image with respect to the image [0, 0]

Source: Matlab R2015b documentation for normxcorr2, by MathWorks

Instead of using pre-trained filters, a convolution is defined between each patch of the reference image and the target image. As a result, a correlation map is generated for each reference patch.
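
A correlation map for one reference patch can be sketched in pure Python as below. This is a simplified stand-in: it computes the cosine similarity on raw pixel values without mean subtraction, whereas normxcorr2 subtracts local means and Deep Matching operates on descriptors. The name `correlation_map` and the toy 4 x 4 image are assumptions.

```python
import math


def correlation_map(patch, target):
    """Cosine similarity between `patch` and every same-sized window of `target`."""
    ph, pw = len(patch), len(patch[0])
    th, tw = len(target), len(target[0])
    patch_norm = math.sqrt(sum(v * v for row in patch for v in row)) or 1.0
    result = []
    for y in range(th - ph + 1):
        row = []
        for x in range(tw - pw + 1):
            dot = 0.0
            window_sq = 0.0
            for dy in range(ph):
                for dx in range(pw):
                    t = target[y + dy][x + dx]
                    dot += patch[dy][dx] * t
                    window_sq += t * t
            window_norm = math.sqrt(window_sq) or 1.0
            row.append(dot / (patch_norm * window_norm))
        result.append(row)
    return result


target = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 1, 2, 3],
    [4, 5, 6, 7],
]
# Reference patch cut from rows 1-2, columns 2-3 of the target itself.
patch = [row[2:4] for row in target[1:3]]
cmap = correlation_map(patch, target)
best = max((v, (y, x)) for y, row in enumerate(cmap) for x, v in enumerate(row))
print(best[1])  # (1, 2)
```

The peak of the map recovers the position the patch was cut from; repeating this for every reference patch yields one correlation map per patch.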

Optical Flow: Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 46

Optical Flow Deep Matching

The most discriminative response map

The less discriminative

response map

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 47

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching

(TD)Bottom-upextraction

(BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 48

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Bottom-upextraction

(BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 49

Optical Flow Deep Matching (BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 50

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching (TD)

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching

(TD)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 51

Optical Flow Deep Matching (TD)Each local maxima in the top layer corresponds to a shift of one of the biggest (32x32) patchesIf we focus on local maximum we can retrieve the corresponding responses one scale below and focus on shift of the sub-patches that generated it

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 52

Optical Flow Deep Matching (TD)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 53

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 54

Ground truth

Dense HOG[Brox amp Malik 2011]

Deep Matching

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 55

Optical Flow Deep Matching

Optical Flow

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 56

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 57

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 58

End to end supervised learning of optical flow

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 59

Option A Stack both input images together and feed them through a generic network

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 60

Option B Create two separate yet identical processing streams for the two images and combine them at a later stage

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 61

Option B Create two separate yet identical processing streams for the two images and combine them at a later stage

Correlation layer Convolution of data patches from the layers to combine

Optical Flow FlowNet (expanding)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 62

Upconvolutional layers Unpooling features maps + convolutionUpconvolutioned feature maps are concatenated with the corresponding map from the contractive part

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 63

Since existing ground truth datasets are not sufficiently large to train a Convnet a synthetic Flying Dataset is generatedhellip and augmented (translation rotation scaling transformations additive Gaussian noise changes in brightness contrast gamma and color)

Convnets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI

Data augmentation

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 64

Optical Flow FlowNet

Object tracking MDNet

65Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Object tracking MDNet

66Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Object tracking MDNet Architecture

67Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Domain-specific layers are used during training for each sequence but are replaced by a single one at test time

Object tracking MDNet Online update

68Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

MDNet is updated online at test time with hard negative mining that is selecting negative samples with the highest positive score

Object tracking FCNT

69Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Object tracking FCNT

70Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Focus on conv4-3 and conv5-3 of VGG-16 network pre-trained for ImageNet image classification

conv4-3 conv5-3

Object tracking FCNT Specialization

71Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence

Object tracking FCNT Localization

72Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Although trained for image classification feature maps in conv5-3 enable object localizationhellipbut is not discriminative enough to different objects of the same category

Object tracking Localization

73Zhou Bolei Aditya Khosla Agata Lapedriza Aude Oliva and Antonio Torralba Object detectors emerge in deep scene cnns ICLR 2015

[Zhou et al ICLR 2015] ldquoObject detectors emerge in deep scene CNNsrdquo [Slides from ReadCV]

Object tracking FCNT Localization

74Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

On the other hand feature maps from conv4-3 are more sensitive to intra-class appearance variationhellip

conv4-3 conv5-3

Object tracking FCNT Architecture

75Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

SNet=Specific Network (online update)

GNet=General Network (fixed)

Object tracking FCNT Results

76Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

ConvNets Software

Caffe httpcaffeberkeleyvisionorg

Torch (Overfeat) httptorchch

Theano httpdeeplearningnetsoftwaretheano

Tensor Flow httpswwwtensorfloworg

MatconvNet (VLFeat) httpwwwvlfeatorgmatconvnet

CNTK (Mcrosoft) httpwwwcntkai 77

Seminar Series Compacting ConvNets

for End to End Learning

Tuesday February 2 4pm

D5-010 Campus Nord

ConvNets Learn more

78

Jose M Aacutelvarez

Stanford course CS231n

Convolutional Neural Networks for Visual

Recognition

ConvNets Learn more

79

ConvNets Learn more

Online course

Deep Learning

Taking machine learning to the next

level

80

ReadCV seminarFriendly reviews of SoA papers

Spring 2016

Tuesdays at 11am

ConvNets Learn more

81

Barcelona Convolucionada

Deep Learning a lrsquoabast de tothom

Monday February 1 7pm FIB Campus Nord UPC

ConvNets Learn more

82

Grup drsquoestudi de machine learning Barcelona

Summer course

Deep Learning for Computer Vision

(25 ECTS for MSc amp Phd)

July 4-8 3-7pm

ConvNets Learn more

83

Deep learning methos for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho ldquoDeep Learning Past Present amp Futurerdquo

ConvNets Learn more

84

ConvNets Learn more

85

ldquoMachine learningrdquo sub-Reddit

ConvNets Learn more

86

ConvNets Learn more

87

Check profile requirements for Summer internship (disclaimer offered to Phd students by default)

Company Avg Salary hour Avg Salary month

Yahoo $43 ($43x160=$6880)

Apple $37 ($37x160=$5920)

Google $2954-$3132 $7151

Facebook $2292 $6150-$7378

Microsoft $2263 $6506-$7171

Source Glassdoorcom (internships in California No stipends included)

ConvNets Learn more

88

Video Cristian Cantonrsquos talk ldquoFrom Catalonia to America notes on how to achieve a successful post-Phd career rdquo ACMCV 2015 amp UPC

Li Fei-Fei ldquoHow wersquore teaching computers to understand picturesrdquo TEDTalks 2014

ConvNets Learn more

89

Jeremy Howard ldquoThe wonderful and terrifying implications of computers that can learnrdquo TEDTalks 2014

ConvNets Learn more

90

ConvNets Learn more

91

Neil Lawrence OpenAI wonrsquot benefit humanity without open data sharing (The Guardian 14122015)

Is Computer Vision solved

ConvNets Discussion

92

Sports Do you know them

93

ConvNets Do you know them

94

Antonio Torralba MIT(former UPC)

and MANY MORE I am missing in the page (apologies)

Oriol Vinyals Google(former UPC)

Jose M Aacutelvarez NICTA(former URL amp UAB)

Joan Bruna Berkeley(former UPC)

95

ConvNets Where you are studyingVisioCat dinner CVPR 2015

Considering a Phd at GPI-UPC Currently no direct funding available (check in the future)We can support your application to scholarships

External grant listings UPC UPF

Funding institution Last deadlines (on 2812016)

FI (Catalonia) 22092015

FPU (Spain) 15012016

Check our activity at httpsimatgeupceduweb 96

Our past research: Image Classification

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., "Cultural Event Recognition with Visual ConvNets and Temporal Models", in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

ChaLearn Workshop

Our current research: Saliency Prediction

J. Pan and Giró-i-Nieto, X., "End-to-end Convolutional Network for Saliency Prediction", in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

LSUN Challenge

Our current research: Sentiment Analysis

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., "Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction", in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015. [Slides]

Our current research: Instance Search in Video

V.-T. Nguyen, Dinh-Le, D., Salvador, A., Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S.'ichi, and Giró-i-Nieto, X., "NII-HITACHI-UIT at TRECVID 2015 Instance Search", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S.'ichi, O'Connor, N., and Smeaton, A. F., "Insight DCU at TRECVID 2015", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you!

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu


Recognition: C3D (temporal dimension)

Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015.

A homogeneous architecture with small 3 x 3 x 3 convolution kernels in all layers is among the best performing architectures for 3D ConvNets.
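To make the 3 x 3 x 3 kernel concrete, the following is a minimal NumPy sketch of a single 3D convolution (implemented as cross-correlation, as is usual in ConvNets) over a single-channel clip. The function name conv3d_valid, the toy clip size and the averaging kernel are assumptions for illustration; they are not taken from the C3D paper or its code.

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D cross-correlation of a (T, H, W) clip with a (kt, kh, kw) kernel."""
    T, H, W = clip.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                # The kernel spans 3 frames in time and a 3 x 3 window in space.
                out[t, y, x] = np.sum(clip[t:t + kt, y:y + kh, x:x + kw] * kernel)
    return out

clip = np.ones((16, 8, 8))            # toy 16-frame clip, one channel (assumed size)
kernel = np.full((3, 3, 3), 1 / 27)   # hypothetical 3 x 3 x 3 averaging kernel
features = conv3d_valid(clip, kernel)
print(features.shape)
```

Unlike a 2D convolution applied frame by frame, the output keeps a temporal axis, so temporal information is still available to the next layer.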

No gain when varying the temporal depth across layers.

Recognition: C3D (architecture)

Feature vector

Recognition: C3D (feature vector)

Video sequence: 16-frame clips with an 8-frame overlap

16-frame clips -> Average -> 4096-dim video descriptor -> L2 norm -> 4096-dim video descriptor
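The clip pipeline above can be sketched as follows. This is a minimal illustration, assuming the per-clip 4096-dim activations are already available; here they are replaced by random numbers, and the helper names split_into_clips and video_descriptor are hypothetical.

```python
import numpy as np

def split_into_clips(num_frames, clip_len=16, overlap=8):
    """Return (start, end) frame indices of clips with the given length and overlap."""
    stride = clip_len - overlap
    return [(s, s + clip_len) for s in range(0, num_frames - clip_len + 1, stride)]

def video_descriptor(clip_features):
    """Average the per-clip feature vectors, then L2-normalize the result."""
    mean = np.mean(clip_features, axis=0)
    return mean / np.linalg.norm(mean)

clips = split_into_clips(40)                      # a 40-frame toy video
rng = np.random.default_rng(0)
clip_features = rng.random((len(clips), 4096))    # stand-in for the network activations
descriptor = video_descriptor(clip_features)
print(clips, descriptor.shape)
```

With a 40-frame video this yields four clips, each sharing 8 frames with its neighbour, and one unit-length 4096-dim descriptor for the whole video.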

Recognition: C3D (visualization)

Based on Deconvnets by Zeiler and Fergus [ECCV 2014]. See [ReadCV Slides] for more details.

Recognition: C3D (compactness)

Recognition: C3D (performance)

Convolutional 3D (C3D) features combined with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with state-of-the-art methods on 2 other benchmarks.

Recognition: C3D (software)

Implementation by Michael Gygli (GitHub)

Recognition: ImageNet Video

[ILSVRC 2015 Slides and videos]

Kai Kang et al., "Object Detection in Videos with TubeLets and Multi-Context Cues" (ILSVRC 2015) [video] [poster]

Optical Flow

Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE.

Optical Flow: Small vs Large

Optical Flow

Classic approach: rigid matching of HoG or SIFT descriptors.

Deep Matching: allow each subpatch to move independently, in a limited range depending on its size.

Optical Flow: Deep Matching

Optical Flow: 2D correlation

Image; Sub-Image; offset of the sub-image with respect to the image: [0, 0]

Source: Matlab R2015b documentation for normxcorr2, by MathWorks.
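A 2D normalized cross-correlation of this kind can be sketched in NumPy as below. This is a naive illustration in the spirit of the slide, not MathWorks' implementation of normxcorr2; the function name normalized_xcorr and the toy image are assumptions.

```python
import numpy as np

def normalized_xcorr(image, template):
    """Naive normalized cross-correlation over all 'valid' template positions."""
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt(np.sum(t ** 2))
    H, W = image.shape
    out = np.zeros((H - th + 1, W - tw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + th, x:x + tw]
            p = patch - patch.mean()
            denom = np.sqrt(np.sum(p ** 2)) * t_norm
            out[y, x] = np.sum(p * t) / denom if denom > 0 else 0.0
    return out

rng = np.random.default_rng(0)
image = rng.random((20, 20))
template = image[5:10, 7:12].copy()   # sub-image cut out at row 5, column 7
scores = normalized_xcorr(image, template)
offset = np.unravel_index(np.argmax(scores), scores.shape)
print(offset)
```

The position of the highest score is the estimated offset of the sub-image with respect to the image.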

Optical Flow: Deep Matching

Instead of using pre-trained filters, a convolution is defined between each patch of the reference image and the target image. As a result, a correlation map is generated for each reference patch.

The most discriminative response map

The least discriminative response map
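The idea of one correlation map per reference patch can be sketched as follows. This is a simplified illustration on raw pixel values with cosine similarity; the function names, the non-overlapping 4x4 patch grid and the synthetic shift are assumptions, and DeepFlow itself operates on richer descriptors.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def unit_norm(x, axes):
    """Scale each patch to unit L2 norm over the given axes."""
    norm = np.sqrt(np.sum(x ** 2, axis=axes, keepdims=True))
    return x / np.maximum(norm, 1e-12)

def correlation_maps(reference, target, size=4):
    """One correlation map over the target image per size x size reference patch."""
    H, W = reference.shape
    patches = np.array([reference[y:y + size, x:x + size]
                        for y in range(0, H - size + 1, size)
                        for x in range(0, W - size + 1, size)])
    windows = sliding_window_view(target, (size, size))   # (H', W', size, size)
    patches = unit_norm(patches, axes=(1, 2))
    windows = unit_norm(windows, axes=(2, 3))
    # Each reference patch acts as a filter that is slid over the target image.
    return np.einsum('pkl,ijkl->pij', patches, windows)

rng = np.random.default_rng(0)
reference = rng.random((16, 16))
target = np.roll(reference, shift=(2, 3), axis=(0, 1))   # reference shifted by (2, 3)
maps = correlation_maps(reference, target)
best = np.unravel_index(np.argmax(maps[0]), maps[0].shape)
print(maps.shape, best)
```

The map of the top-left reference patch peaks at the position where that patch reappears in the shifted target image.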

Optical Flow: Deep Matching

Key idea: build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search.

Pyramid levels: 4x4 patches, 8x8 patches, 16x16 patches, 32x32 patches

Bottom-up extraction (BU); top-down matching (TD)

Optical Flow: Deep Matching (TD)

Each local maximum in the top layer corresponds to a shift of one of the biggest (32x32) patches. If we focus on a local maximum, we can retrieve the corresponding responses one scale below and focus on the shifts of the sub-patches that generated it.

Optical Flow: Deep Matching

Comparison: Ground truth; Dense HOG [Brox & Malik 2011]; Deep Matching

Optical Flow: FlowNet

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D., and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).

End-to-end supervised learning of optical flow.

Optical Flow: FlowNet (contracting)

Option A: Stack both input images together and feed them through a generic network.
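Stacking the two input images simply means concatenating them along the channel axis, so the first layer of the network sees a 6-channel input. A minimal sketch, assuming height x width x channel arrays; the function name and the image size are hypothetical.

```python
import numpy as np

def stack_image_pair(first, second):
    """Concatenate two RGB frames along the channel axis."""
    if first.shape != second.shape:
        raise ValueError("both frames must have the same shape")
    return np.concatenate([first, second], axis=-1)

frame_a = np.zeros((384, 512, 3), dtype=np.float32)   # assumed frame size
frame_b = np.ones((384, 512, 3), dtype=np.float32)
stacked = stack_image_pair(frame_a, frame_b)
print(stacked.shape)
```

With this option the network has to learn by itself how to relate the two frames, since nothing in the input format pairs up corresponding pixels.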

Option B: Create two separate yet identical processing streams for the two images and combine them at a later stage.

Correlation layer: convolution between data patches of the two feature maps to be combined.
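A correlation layer of this kind can be sketched as below: for every location in the first feature map, it compares the feature vector with those of the second map at all displacements within a limited range. This is a simplified sketch with single-pixel patches and no striding; the function name and the max_disp parameter are assumptions, not the FlowNet code.

```python
import numpy as np

def correlation_layer(f1, f2, max_disp=2):
    """Dot products between f1[y, x] and f2[y + dy, x + dx] for |dy|, |dx| <= max_disp."""
    H, W, C = f1.shape
    D = 2 * max_disp + 1
    # Zero-pad the second map so that displaced positions outside it contribute 0.
    padded = np.pad(f2, ((max_disp, max_disp), (max_disp, max_disp), (0, 0)))
    out = np.zeros((H, W, D * D))
    for i, dy in enumerate(range(-max_disp, max_disp + 1)):
        for j, dx in enumerate(range(-max_disp, max_disp + 1)):
            shifted = padded[max_disp + dy:max_disp + dy + H,
                             max_disp + dx:max_disp + dx + W]
            out[:, :, i * D + j] = np.sum(f1 * shifted, axis=-1)
    return out

rng = np.random.default_rng(0)
f1 = rng.random((6, 7, 4))   # toy feature maps from the two streams
f2 = rng.random((6, 7, 4))
corr = correlation_layer(f1, f2)
print(corr.shape)
```

Each output channel corresponds to one displacement, so the layer has no trainable weights: the "filter" is the data from the other stream.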

Optical Flow: FlowNet (expanding)

Upconvolutional layers: unpooling of feature maps + convolution. The upconvolved feature maps are concatenated with the corresponding feature maps from the contracting part.
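One expanding step can be sketched as unpooling, convolution and concatenation. In this minimal sketch the unpooling is a nearest-neighbour 2x upsampling and the convolution is reduced to a 1x1 convolution with random weights; both are simplifying assumptions, as are all function names and shapes.

```python
import numpy as np

def unpool2x(features):
    """Nearest-neighbour 2x unpooling of an (H, W, C) feature map."""
    return features.repeat(2, axis=0).repeat(2, axis=1)

def conv1x1(features, weights):
    """1x1 convolution: the same (C_in, C_out) linear map at every location."""
    return features @ weights

def expand_step(coarse, skip, weights):
    """Unpool + convolve the coarse map, then concatenate the contracting-part map."""
    up = conv1x1(unpool2x(coarse), weights)
    return np.concatenate([up, skip], axis=-1)

rng = np.random.default_rng(0)
coarse = rng.random((4, 5, 8))     # low-resolution features
skip = rng.random((8, 10, 16))     # features from the contracting part
weights = rng.random((8, 4))       # hypothetical 1x1 convolution weights
refined = expand_step(coarse, skip, weights)
print(refined.shape)
```

The concatenation lets the network combine coarse, high-level information with the finer detail kept in the earlier feature maps.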

Optical Flow: FlowNet

Data augmentation

Since existing ground truth datasets are not sufficiently large to train a ConvNet, a synthetic Flying Chairs dataset is generated... and augmented (translation, rotation and scaling transformations, additive Gaussian noise, and changes in brightness, contrast, gamma and color).

ConvNets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI.
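The photometric part of such an augmentation can be sketched as below. All parameter ranges are hypothetical and are not the values used by the FlowNet authors; geometric transformations are left out because they would also require transforming the ground truth flow.

```python
import numpy as np

def augment_pair(frame_a, frame_b, rng, noise_std=0.02):
    """Apply the same random brightness, contrast and gamma to both frames, plus noise."""
    brightness = rng.uniform(-0.2, 0.2)   # assumed ranges, for illustration only
    contrast = rng.uniform(0.8, 1.2)
    gamma = rng.uniform(0.7, 1.5)

    def apply(frame):
        out = frame + brightness
        out = (out - 0.5) * contrast + 0.5
        out = np.clip(out, 0.0, 1.0) ** gamma
        # Additive Gaussian noise is drawn independently for each frame.
        out = out + rng.normal(0.0, noise_std, size=frame.shape)
        return np.clip(out, 0.0, 1.0)

    return apply(frame_a), apply(frame_b)

rng = np.random.default_rng(0)
frame_a = rng.random((32, 48, 3))   # toy frames with values in [0, 1]
frame_b = rng.random((32, 48, 3))
aug_a, aug_b = augment_pair(frame_a, frame_b, rng)
print(aug_a.shape, aug_b.shape)
```

Sharing the photometric parameters between the two frames keeps the pair consistent, so the ground truth flow remains valid after augmentation.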

Object tracking: MDNet

Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015).

Object tracking: MDNet (architecture)

Domain-specific layers are used during training, one for each sequence, but they are replaced by a single one at test time.

Object tracking: MDNet (online update)

MDNet is updated online at test time with hard negative mining, that is, by selecting the negative samples with the highest positive scores.
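The selection step of hard negative mining can be sketched in a few lines. The scores below are made up, and the function name and the num_hard parameter are hypothetical; in a tracker the scores would come from the network's positive (target) output for each negative sample.

```python
import numpy as np

def mine_hard_negatives(positive_scores, num_hard):
    """Indices of the negative samples that the network scores most like the target."""
    order = np.argsort(positive_scores)[::-1]   # highest positive score first
    return order[:num_hard]

# Positive scores assigned by the current network to five negative samples.
scores = np.array([0.05, 0.80, 0.10, 0.65, 0.30])
hard = mine_hard_negatives(scores, num_hard=2)
print(hard)
```

Only these hardest negatives are then used for the online update, so training effort concentrates on the distractors the tracker is most likely to confuse with the target.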

Object tracking: FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Focus on conv4-3 and conv5-3 of the VGG-16 network pre-trained for ImageNet image classification.

Object tracking: FCNT (specialization)

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence.

Object tracking: FCNT (localization)

Although trained for image classification, feature maps in conv5-3 enable object localization... but they are not discriminative enough to distinguish different objects of the same category.

Object tracking: Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object detectors emerge in deep scene CNNs." ICLR 2015. [Slides from ReadCV]

Object tracking: FCNT (localization)

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation...

Object tracking: FCNT (architecture)

SNet = Specific Network (online update)

GNet = General Network (fixed)

Object tracking: FCNT (results)

ConvNets: Software

Caffe: http://caffe.berkeleyvision.org

Torch (Overfeat): http://torch.ch

Theano: http://deeplearning.net/software/theano

TensorFlow: https://www.tensorflow.org

MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet

CNTK (Microsoft): http://www.cntk.ai

ConvNets: Learn more

Seminar Series: "Compacting ConvNets for End to End Learning", Jose M. Álvarez. Tuesday, February 2, 4pm, D5-010, Campus Nord.

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

Online course: Deep Learning, Taking machine learning to the next level

ReadCV seminar: friendly reviews of SoA papers. Spring 2016, Tuesdays at 11am.

Barcelona Convolucionada: "Deep Learning a l'abast de tothom" (Deep Learning within everyone's reach). Monday, February 1, 7pm, FIB, Campus Nord UPC.

Grup d'estudi de machine learning Barcelona

Summer course: Deep Learning for Computer Vision (2.5 ECTS for MSc & PhD). July 4-8, 3-7pm.

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)


Page 24: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

24Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

No gain when varying the temporal depth across layers

Recognition C3D Temporal dimension

25Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

No gain when varying the temporal depth across layers

Recognition C3D Architecture

Featurevector

26Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Feature vector

Video sequence

16 frames-long clips

8 frames-long overlap

27Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Feature vector

16-frame clip

16-frame clip

16-frame clip

16-frame clip

Average

4096

-dim

vid

eo d

escr

ipto

r

4096

-dim

vid

eo d

escr

ipto

r

L2 norm

28Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D VisualizationBased on Deconvnets by Zeiler and Fergus [ECCV 2014] - See [ReadCV Slides] for more details

29Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Compactness

30Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Convolutional 3D(C3D) combined with a simple linear classifier outperforms state-of-the-art methods on 4 different benchmarks and are comparable with state of the art methods on other 2 benchmarks

Recognition C3D Performance

31Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D SoftwareImplementation by Michael Gygli (GitHub)

32

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

33

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

34

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

35

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

36

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

37

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

38

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

39

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

Optical Flow

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 40

Optical Flow Small vs Large

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 41

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 42

Optical FlowClassic approachRigid matching of HoG or SIFT descriptors

Deep MatchingAllow each subpatch to move

independently in a limited range

depending on its size

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 43

Optical Flow Deep Matching

Source Matlab R2015b documentation for normxcorr2 by Mathworks44

Optical Flow 2D correlation

Image

Sub-Image

Offset of the sub-image with respect to the image [00]

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 45

Instead of pre-trained filters a convolution is defined between each

patch of the reference image target image

as a results a correlation map is generated for each reference patch

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 46

Optical Flow Deep Matching

The most discriminative response map

The less discriminative

response map

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 47

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching

(TD)Bottom-upextraction

(BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 48

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Bottom-upextraction

(BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 49

Optical Flow Deep Matching (BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 50

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching (TD)

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching

(TD)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 51

Optical Flow Deep Matching (TD)Each local maxima in the top layer corresponds to a shift of one of the biggest (32x32) patchesIf we focus on local maximum we can retrieve the corresponding responses one scale below and focus on shift of the sub-patches that generated it

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 52

Optical Flow Deep Matching (TD)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 53

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 54

Ground truth

Dense HOG[Brox amp Malik 2011]

Deep Matching

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 55

Optical Flow Deep Matching

Optical Flow

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 56

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 57

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 58

End to end supervised learning of optical flow

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 59

Option A Stack both input images together and feed them through a generic network

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 60

Option B Create two separate yet identical processing streams for the two images and combine them at a later stage

Optical Flow FlowNet (contracting)

Option B: Create two separate, yet identical, processing streams for the two images and combine them at a later stage.

Correlation layer: combines the two streams by convolving patches of one feature map with patches of the other, instead of convolving data with a learned filter.
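The correlation layer can be illustrated with a minimal sketch in plain Python. It assumes 1x1 patches (a single feature vector per position) and a small, hypothetical `max_disp` search range; the function name and data layout (nested lists of shape H x W x C) are illustrative choices, not taken from the FlowNet code.

```python
def correlation_layer(f1, f2, max_disp=1):
    """Compare every feature vector of f1 with the vectors of f2 that lie
    within max_disp pixels, using a dot product (no trainable weights).

    f1, f2: feature maps as nested lists of shape H x W x C.
    Returns a nested list of shape H x W x (2 * max_disp + 1) ** 2.
    """
    h, w = len(f1), len(f1[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            responses = []
            for dy in range(-max_disp, max_disp + 1):
                for dx in range(-max_disp, max_disp + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        dot = sum(a * b for a, b in zip(f1[y][x], f2[yy][xx]))
                        responses.append(dot)
                    else:
                        # Displacements that leave the map contribute zero.
                        responses.append(0.0)
            row.append(responses)
        out.append(row)
    return out
```

Each output position holds one response per candidate displacement, so later layers can read off which shift of the second image best explains the first.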

Optical Flow FlowNet (expanding)

Upconvolutional layers: unpooling of the feature maps followed by a convolution.

The upconvolved feature maps are concatenated with the corresponding feature maps from the contracting part.
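A rough sketch of one refinement step, in plain Python. It only covers the unpooling and the concatenation with the contracting-part feature maps; the learned convolution is left out, and nearest-neighbour upsampling is used as a stand-in for unpooling. The names `unpool2x` and `refine` are hypothetical.

```python
def unpool2x(channel):
    """Nearest-neighbour 2x upsampling of one H x W channel (a stand-in for unpooling)."""
    out = []
    for row in channel:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def refine(coarse, skip):
    """Upsample the coarse channels and stack them with the skip channels.

    coarse: list of channels, each H x W.
    skip:   list of channels from the contracting part, each 2H x 2W.
    Returns the concatenated list of channels at resolution 2H x 2W.
    """
    upsampled = [unpool2x(ch) for ch in coarse]
    for ch in upsampled:
        if len(ch) != len(skip[0]) or len(ch[0]) != len(skip[0][0]):
            raise ValueError("skip channels must have twice the coarse resolution")
    return upsampled + skip
```

Concatenating along the channel axis lets the next layer see both the coarse, high-level information and the fine, local detail preserved by the contracting part.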

Optical Flow FlowNet

Since existing ground-truth datasets are not sufficiently large to train a ConvNet, a synthetic Flying Chairs dataset is generated... and augmented (translation, rotation and scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma and color).

ConvNets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI.
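A small sketch of the photometric part of such an augmentation, in plain Python. All parameter ranges below are made-up placeholders, not the values used by the FlowNet authors, and the geometric transformations are omitted because they would also require updating the ground-truth flow. Pixel values are assumed to lie in [0, 1].

```python
import random


def clip01(v):
    """Clamp a pixel value to the [0, 1] range."""
    return min(1.0, max(0.0, v))


def photometric_augment(frame_a, frame_b, rng):
    """Apply the same brightness, contrast and gamma change to both frames,
    plus independent additive Gaussian noise per pixel."""
    brightness = rng.uniform(-0.1, 0.1)  # hypothetical range
    contrast = rng.uniform(0.8, 1.2)     # hypothetical range
    gamma = rng.uniform(0.8, 1.25)       # hypothetical range

    def transform(frame):
        out = []
        for row in frame:
            new_row = []
            for v in row:
                v = clip01((v - 0.5) * contrast + 0.5 + brightness)
                v = v ** gamma
                v = clip01(v + rng.gauss(0.0, 0.02))
                new_row.append(v)
            out.append(new_row)
        return out

    return transform(frame_a), transform(frame_b)
```

Sharing the brightness, contrast and gamma parameters between the two frames keeps the pair photometrically consistent, so the flow ground truth remains valid.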

Data augmentation


Optical Flow FlowNet

Object tracking MDNet

Nam, Hyeonseob, and Bohyung Han. Learning multi-domain convolutional neural networks for visual tracking. ICCV VOT Workshop (2015).

Object tracking MDNet


Object tracking MDNet Architecture

Domain-specific layers are used during training, one for each sequence, but they are replaced by a single one at test time.
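The idea of shared layers plus per-sequence branches can be sketched as a toy model in plain Python. The class name, the single shared linear layer with ReLU, and the random initialization are all simplifying assumptions; the real MDNet uses convolutional and fully connected layers.

```python
import random


class MultiDomainNet:
    """Toy network: one shared layer and one two-way (target vs. background)
    branch per training sequence."""

    def __init__(self, in_dim, hidden_dim, num_domains, seed=0):
        self._rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.shared = [
            [self._rng.gauss(0.0, 0.1) for _ in range(in_dim)]
            for _ in range(hidden_dim)
        ]
        self.branches = [self._new_branch() for _ in range(num_domains)]

    def _new_branch(self):
        # Two outputs: target score and background score.
        return [
            [self._rng.gauss(0.0, 0.1) for _ in range(self.hidden_dim)]
            for _ in range(2)
        ]

    def features(self, x):
        """Shared representation (linear layer followed by ReLU)."""
        return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in self.shared]

    def scores(self, x, domain):
        """Scores of input x under the branch of the given domain."""
        h = self.features(x)
        return [sum(w * v for w, v in zip(row, h)) for row in self.branches[domain]]

    def prepare_for_tracking(self):
        """At test time: keep the shared layers, drop all training branches
        and add a single fresh branch for the new sequence."""
        self.branches = [self._new_branch()]
```

Only the shared layers carry over to a new video, which is why they are expected to capture sequence-independent information.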

Object tracking MDNet Online update

MDNet is updated online at test time with hard negative mining, that is, by selecting the negative samples with the highest positive score.
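Hard negative mining as described here reduces to a ranking step. A minimal sketch in plain Python, where `hard_negatives` and its arguments are hypothetical names:

```python
def hard_negatives(negative_samples, positive_scores, k):
    """Return the k negative samples that the classifier scores highest as
    positives, i.e. the ones it is most likely to confuse with the target."""
    order = sorted(
        range(len(negative_samples)),
        key=lambda i: positive_scores[i],
        reverse=True,
    )
    return [negative_samples[i] for i in order[:k]]
```

For example, `hard_negatives(["a", "b", "c", "d"], [0.1, 0.9, 0.4, 0.7], 2)` keeps `"b"` and `"d"`, the two background samples with the highest positive score.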

Object tracking FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. Visual Tracking with Fully Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Object tracking FCNT

Focus on conv4-3 and conv5-3 of a VGG-16 network pre-trained for ImageNet image classification.

conv4-3 conv5-3

Object tracking FCNT Specialization

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence.

Object tracking FCNT Localization

Although trained for image classification, feature maps in conv5-3 enable object localization... but they are not discriminative enough to distinguish different objects of the same category.

Object tracking Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. “Object detectors emerge in deep scene CNNs.” ICLR 2015. [Slides from ReadCV]

Object tracking FCNT Localization

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation...

conv4-3 conv5-3

Object tracking FCNT Architecture


SNet=Specific Network (online update)

GNet=General Network (fixed)

Object tracking FCNT Results


ConvNets Software

Caffe: http://caffe.berkeleyvision.org

Torch (Overfeat): http://torch.ch

Theano: http://deeplearning.net/software/theano

TensorFlow: https://www.tensorflow.org

MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet

CNTK (Microsoft): http://www.cntk.ai

Seminar Series: Compacting ConvNets for End to End Learning

Tuesday, February 2, 4pm

D5-010, Campus Nord

ConvNets Learn more

Jose M. Álvarez

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

ConvNets Learn more

Online course: Deep Learning. Taking machine learning to the next level

ReadCV seminar: Friendly reviews of SoA papers

Spring 2016, Tuesdays at 11am

ConvNets Learn more

Barcelona Convolucionada: Deep Learning a l’abast de tothom (Deep Learning within everyone's reach)

Monday, February 1, 7pm, FIB, Campus Nord UPC

ConvNets Learn more

Grup d’estudi de machine learning Barcelona (Barcelona machine learning study group)

Summer course: Deep Learning for Computer Vision (2.5 ECTS for MSc & PhD)

July 4-8, 3-7pm

ConvNets Learn more

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, “Deep Learning: Past, Present & Future”

ConvNets Learn more

“Machine learning” sub-Reddit

Check profile requirements for summer internships (disclaimer: offered to PhD students by default)

Company      Avg. salary / hour    Avg. salary / month
Yahoo        $43                   ($43 x 160 = $6,880)
Apple        $37                   ($37 x 160 = $5,920)
Google       $29.54-$31.32         $7,151
Facebook     $22.92                $6,150-$7,378
Microsoft    $22.63                $6,506-$7,171

Source: Glassdoor.com (internships in California; no stipends included)

ConvNets Learn more

Video: Cristian Canton’s talk “From Catalonia to America: notes on how to achieve a successful post-PhD career”, ACMCV 2015 & UPC

Li Fei-Fei, “How we’re teaching computers to understand pictures”, TEDTalks 2014

ConvNets Learn more

Jeremy Howard, “The wonderful and terrifying implications of computers that can learn”, TEDTalks 2014

Neil Lawrence, “OpenAI won’t benefit humanity without open data sharing” (The Guardian, 14/12/2015)

Is Computer Vision solved?

ConvNets Discussion

Sports Do you know them?

ConvNets Do you know them?

Antonio Torralba, MIT (former UPC)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

and MANY MORE I am missing on this page (apologies)

ConvNets Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding is available (check in the future).

We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution    Last deadlines (as of 28/1/2016)
FI (Catalonia)         22/09/2015
FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web

Image Classification

Our past research

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., “Cultural Event Recognition with Visual ConvNets and Temporal Models”, in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

ChaLearn Workshop

Saliency Prediction

J. Pan and Giró-i-Nieto, X., “End-to-end Convolutional Network for Saliency Prediction”, in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

Our current research

LSUN Challenge

Sentiment Analysis

Our current research

[Slides]

CNN

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., “Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction”, in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015.

Our current research

Instance Search in Video

V.-T. Nguyen, -Dinh-Le, D., Salvador, A., -Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S. ichi, and Giró-i-Nieto, X., “NII-HITACHI-UIT at TRECVID 2015 Instance Search”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S. ichi, O'Connor, N., and Smeaton, A. F., “Insight DCU at TRECVID 2015”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu


Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. Learning spatiotemporal features with 3D convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015.

No gain when varying the temporal depth across layers

Recognition C3D Architecture

Feature vector

Recognition C3D Feature vector

Video sequence

16-frame-long clips

8-frame-long overlap

Recognition C3D Feature vector

[Figure: four 16-frame clips; Average; 4096-dim video descriptor; L2 norm; 4096-dim video descriptor]
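The steps named on this slide (overlapping 16-frame clips, averaging of the per-clip features, L2 normalization) can be sketched in plain Python. The per-clip features would come from the C3D network; here they are simply passed in as lists, and both function names are hypothetical.

```python
import math


def split_into_clips(num_frames, clip_len=16, overlap=8):
    """Return (start, end) frame indices of overlapping clips (end exclusive)."""
    stride = clip_len - overlap
    return [
        (start, start + clip_len)
        for start in range(0, num_frames - clip_len + 1, stride)
    ]


def video_descriptor(clip_features):
    """Average the per-clip feature vectors, then L2-normalize the result."""
    n = len(clip_features)
    dim = len(clip_features[0])
    mean = [sum(f[d] for f in clip_features) / n for d in range(dim)]
    norm = math.sqrt(sum(v * v for v in mean))
    return [v / norm for v in mean] if norm > 0 else mean
```

A 40-frame video yields four clips starting at frames 0, 8, 16 and 24, and the resulting descriptor has the same dimensionality as a single clip feature (4096 in the slide).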

Recognition C3D Visualization

Based on Deconvnets by Zeiler and Fergus [ECCV 2014]. See [ReadCV Slides] for more details.

Recognition C3D Compactness

Convolutional 3D (C3D) combined with a simple linear classifier outperforms state-of-the-art methods on 4 different benchmarks and is comparable with state-of-the-art methods on 2 other benchmarks.

Recognition C3D Performance

Recognition C3D Software

Implementation by Michael Gygli (GitHub)

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

Kai Kang et al., Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

Optical Flow

Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE.

Optical Flow Small vs Large

Optical Flow

Classic approach: rigid matching of HoG or SIFT descriptors.

Deep Matching: allow each subpatch to move independently in a limited range depending on its size.
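The difference between rigid matching and letting sub-patches move can be illustrated with a simplified sketch in plain Python. It assumes that a correlation map is already available for each sub-patch and scores a parent patch by averaging, over its sub-patches, the best response found within a small radius of the expected position. The function names and the fixed `radius` are assumptions; the actual Deep Matching aggregation has additional steps.

```python
def local_max(score_map, y, x, radius):
    """Best score within `radius` of (y, x); positions outside the map are ignored."""
    h, w = len(score_map), len(score_map[0])
    candidates = [
        score_map[yy][xx]
        for yy in range(max(0, y - radius), min(h, y + radius + 1))
        for xx in range(max(0, x - radius), min(w, x + radius + 1))
    ]
    return max(candidates) if candidates else 0.0


def parent_score(child_maps, child_offsets, y, x, radius=1):
    """Score of a parent patch placed at (y, x) in the target image.

    child_maps:    one correlation map per sub-patch.
    child_offsets: (dy, dx) of each sub-patch inside the parent patch.
    radius:        how far each sub-patch may move (0 means rigid matching).
    """
    total = 0.0
    for score_map, (oy, ox) in zip(child_maps, child_offsets):
        total += local_max(score_map, y + oy, x + ox, radius)
    return total / len(child_maps)
```

With `radius=0` every sub-patch must sit exactly where the rigid layout puts it; a larger radius tolerates small local deformations.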


Optical Flow Deep Matching

Source: MATLAB R2015b documentation for normxcorr2, by MathWorks.

Optical Flow 2D correlation

Image

Sub-Image

Offset of the sub-image with respect to the image: [0, 0]
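A minimal sketch of normalized 2D cross-correlation in plain Python, to make the notion of "offset of the sub-image" concrete. Unlike MATLAB's normxcorr2, which returns a padded, full-size correlation map, this version only evaluates offsets where the sub-image lies entirely inside the image. The function names are hypothetical.

```python
import math


def normalized_xcorr(image, template):
    """Zero-mean normalized correlation of `template` at every valid offset."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    t_vals = [v for row in template for v in row]
    t_mean = sum(t_vals) / len(t_vals)
    t_zero = [v - t_mean for v in t_vals]
    t_norm = math.sqrt(sum(v * v for v in t_zero))
    scores = []
    for y in range(ih - th + 1):
        row_scores = []
        for x in range(iw - tw + 1):
            w_vals = [image[y + dy][x + dx] for dy in range(th) for dx in range(tw)]
            w_mean = sum(w_vals) / len(w_vals)
            w_zero = [v - w_mean for v in w_vals]
            w_norm = math.sqrt(sum(v * v for v in w_zero))
            denom = t_norm * w_norm
            num = sum(a * b for a, b in zip(t_zero, w_zero))
            # Flat windows (zero variance) get a score of 0.
            row_scores.append(num / denom if denom > 0 else 0.0)
        scores.append(row_scores)
    return scores


def best_offset(scores):
    """Return the (row, col) offset with the highest correlation score."""
    positions = [(y, x) for y in range(len(scores)) for x in range(len(scores[0]))]
    return max(positions, key=lambda p: scores[p[0]][p[1]])
```

The peak of the score map gives the offset at which the sub-image best matches the image.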

Instead of using pre-trained filters, a convolution is defined between each patch of the reference image and the target image. As a result, a correlation map is generated for each reference patch.

Optical Flow Deep Matching


Optical Flow Deep Matching

The most discriminative response map

The least discriminative response map

Key idea: Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search.

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching (TD)

Bottom-up extraction (BU)

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Bottom-up extraction (BU)

Optical Flow Deep Matching (BU)


Optical Flow Deep Matching (TD)

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching (TD)

Optical Flow Deep Matching (TD)

Each local maximum in the top layer corresponds to a shift of one of the biggest (32x32) patches.

If we focus on one local maximum, we can retrieve the corresponding responses one scale below and focus on the shifts of the sub-patches that generated it.

Optical Flow Deep Matching (TD)

Optical Flow Deep Matching

Page 26: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

26Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Feature vector

Video sequence

16 frames-long clips

8 frames-long overlap

27Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Feature vector

16-frame clip

16-frame clip

16-frame clip

16-frame clip

Average

4096

-dim

vid

eo d

escr

ipto

r

4096

-dim

vid

eo d

escr

ipto

r

L2 norm

28Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D VisualizationBased on Deconvnets by Zeiler and Fergus [ECCV 2014] - See [ReadCV Slides] for more details

29Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Compactness

30Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Convolutional 3D(C3D) combined with a simple linear classifier outperforms state-of-the-art methods on 4 different benchmarks and are comparable with state of the art methods on other 2 benchmarks

Recognition C3D Performance

31Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D SoftwareImplementation by Michael Gygli (GitHub)

32

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

33

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

34

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

35

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

36

Recognition ImageNet Video

Kai Kang et al., "Object Detection in Videos with TubeLets and Multi-Context Cues" (ILSVRC 2015) [video] [poster]

37

Recognition ImageNet Video

38

Recognition ImageNet Video

39

Recognition ImageNet Video

Optical Flow

Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE.

40

Optical Flow Small vs Large

41

42

Optical Flow

Classic approach: Rigid matching of HoG or SIFT descriptors

Deep Matching: Allow each subpatch to move independently in a limited range depending on its size

43

Optical Flow Deep Matching

Source: Matlab R2015b documentation for normxcorr2 by MathWorks

44

Optical Flow 2D correlation

Image

Sub-Image

Offset of the sub-image with respect to the image [0,0]
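A minimal plain-Python sketch of the 2D correlation idea on this slide: slide the sub-image over the image and keep the offset with the highest normalized correlation score. The function name `ncc_offset` and the toy arrays are illustrative assumptions. Unlike Matlab's normxcorr2, which also evaluates partially overlapping positions, this sketch only scores windows that lie fully inside the image.

```python
import math


def ncc_offset(image, template):
    """Return (row, col, score) of the best normalized cross-correlation match."""
    H, W = len(image), len(image[0])
    h, w = len(template), len(template[0])
    t_vals = [v for row in template for v in row]
    t_mean = sum(t_vals) / len(t_vals)
    t_dev = [v - t_mean for v in t_vals]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    best = (0, 0, -2.0)  # scores live in [-1, 1], so -2 is "nothing found yet"
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            p_vals = [image[r + i][c + j] for i in range(h) for j in range(w)]
            p_mean = sum(p_vals) / len(p_vals)
            p_dev = [v - p_mean for v in p_vals]
            p_norm = math.sqrt(sum(d * d for d in p_dev))
            if p_norm == 0 or t_norm == 0:
                continue  # flat window or template: correlation is undefined
            score = sum(a * b for a, b in zip(p_dev, t_dev)) / (p_norm * t_norm)
            if score > best[2]:
                best = (r, c, score)
    return best


# Toy example: the 2x2 sub-image sits at row 1, column 2 of the image.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 2, 0],
    [0, 0, 3, 4, 0],
    [0, 0, 0, 0, 0],
]
template = [[1, 2], [3, 4]]
row, col, score = ncc_offset(image, template)
```

The returned (row, col) is the offset of the sub-image with respect to the image origin, which is the quantity the figure on this slide illustrates.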

45

Instead of pre-trained filters, a convolution is defined between each patch of the reference image and the target image. As a result, a correlation map is generated for each reference patch.

Optical Flow Deep Matching

46

Optical Flow Deep Matching

The most discriminative response map

The least discriminative response map

47

Key idea: Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching (TD)

Bottom-up extraction (BU)

48

Optical Flow Deep Matching

Bottom-up extraction (BU)

49

Optical Flow Deep Matching (BU)

50

Optical Flow Deep Matching (TD)

Top-down matching (TD)

51

Optical Flow Deep Matching (TD)

Each local maximum in the top layer corresponds to a shift of one of the biggest (32x32) patches. If we focus on a local maximum, we can retrieve the corresponding responses one scale below and focus on the shifts of the sub-patches that generated it.
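A small sketch of the first step of the top-down search described above: finding the local maxima of a top-level response map. It only covers the detection of the maxima, not the backtracking to the sub-patches one scale below. The function name `local_maxima`, the 8-neighbour rule and the toy map are assumptions made for illustration.

```python
def local_maxima(response, threshold=0.0):
    """Return (row, col, value) for cells strictly greater than all 8 neighbours."""
    H, W = len(response), len(response[0])
    peaks = []
    for r in range(H):
        for c in range(W):
            v = response[r][c]
            if v <= threshold:
                continue
            neighbours = [
                response[rr][cc]
                for rr in range(max(0, r - 1), min(H, r + 2))
                for cc in range(max(0, c - 1), min(W, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(v > n for n in neighbours):
                peaks.append((r, c, v))
    # Strongest responses first, so the search can start from the best shifts.
    return sorted(peaks, key=lambda p: -p[2])


# Toy top-level response map (hypothetical values).
top_map = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.1, 0.2, 0.3],
]
peaks = local_maxima(top_map)
```

Each returned peak would be the starting point for retrieving the responses of the four sub-patches at the next lower level of the pyramid.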

52

Optical Flow Deep Matching (TD)

53

Optical Flow Deep Matching

54

Ground truth

Dense HOG [Brox & Malik 2011]

Deep Matching

Optical Flow Deep Matching

55

Optical Flow Deep Matching

Optical Flow

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D., and Brox, T. 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).

56

Optical Flow FlowNet

57

Optical Flow FlowNet

58

End-to-end supervised learning of optical flow

Optical Flow FlowNet (contracting)

59

Option A: Stack both input images together and feed them through a generic network

Optical Flow FlowNet (contracting)

60

Option B: Create two separate yet identical processing streams for the two images and combine them at a later stage

Optical Flow FlowNet (contracting)

61

Option B: Create two separate yet identical processing streams for the two images and combine them at a later stage

Correlation layer: Convolution of data patches from the layers to combine
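A simplified sketch of what a correlation layer computes, under stated assumptions: feature maps are H x W x C nested lists, each location of the first map is compared (dot product) with locations of the second map within a small displacement range, and positions outside the map contribute zero. For brevity it compares single feature vectors rather than patches. The names `correlation_layer`, `max_disp` and `zeros` are hypothetical and not taken from the FlowNet code.

```python
def zeros(h, w, c):
    """Build an h x w x c feature map filled with zeros."""
    return [[[0.0] * c for _ in range(w)] for _ in range(h)]


def correlation_layer(f1, f2, max_disp=1):
    """Correlate two H x W x C feature maps over displacements in [-max_disp, max_disp]."""
    H, W = len(f1), len(f1[0])
    shifts = [(dy, dx)
              for dy in range(-max_disp, max_disp + 1)
              for dx in range(-max_disp, max_disp + 1)]
    out = [[[0.0] * len(shifts) for _ in range(W)] for _ in range(H)]
    for y in range(H):
        for x in range(W):
            for k, (dy, dx) in enumerate(shifts):
                yy, xx = y + dy, x + dx
                if 0 <= yy < H and 0 <= xx < W:
                    # Dot product between the two feature vectors.
                    out[y][x][k] = sum(a * b for a, b in zip(f1[y][x], f2[yy][xx]))
    return out, shifts


# Toy example: a feature at (1, 1) in the first map moves one pixel right in the second.
f1 = zeros(3, 3, 2)
f2 = zeros(3, 3, 2)
f1[1][1] = [1.0, 2.0]
f2[1][2] = [1.0, 2.0]
volume, shifts = correlation_layer(f1, f2)
responses = volume[1][1]
best_shift = shifts[responses.index(max(responses))]
```

The output has one channel per candidate displacement, so later layers can read off which shift matches best at each location.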

Optical Flow FlowNet (expanding)

62

Upconvolutional layers: Unpooling feature maps + convolution. Upconvolved feature maps are concatenated with the corresponding map from the contractive part.
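A rough sketch of two of the steps named on this slide: unpooling a coarse feature map and concatenating it with the matching map from the contractive part. The convolution that follows the unpooling is omitted, and nearest-neighbour 2x upsampling stands in for unpooling; both are simplifying assumptions, as are the function names.

```python
def unpool2x(fmap):
    """Nearest-neighbour 2x upsampling of an H x W x C feature map."""
    out = []
    for row in fmap:
        wide = [list(cell) for cell in row for _ in range(2)]
        out.append(wide)
        out.append([list(cell) for cell in wide])  # duplicate the row
    return out


def concat_channels(a, b):
    """Concatenate two feature maps of equal spatial size along the channel axis."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("spatial sizes differ")
    return [[ca + cb for ca, cb in zip(ra, rb)] for ra, rb in zip(a, b)]


# Toy example: a 1 x 2 x 1 coarse map and a 2 x 4 x 2 map from the contractive part.
coarse = [[[1.0], [2.0]]]
skip = [[[0.1, 0.2] for _ in range(4)], [[0.3, 0.4] for _ in range(4)]]
merged = concat_channels(unpool2x(coarse), skip)
```

The merged map keeps the fine spatial detail of the contractive features next to the coarser, more abstract ones.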

Optical Flow FlowNet

63

Since existing ground truth datasets are not sufficiently large to train a Convnet, a synthetic Flying Chairs dataset is generated… and augmented (translation, rotation and scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma and color).

Convnets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI.
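A minimal sketch of the photometric part of such an augmentation (brightness, contrast, gamma and additive Gaussian noise) on a grayscale image with values in [0, 1]. The parameter ranges and the function name `augment` are assumptions chosen for illustration, not the values used for FlowNet. Geometric transformations are left out because they would also require transforming the ground-truth flow.

```python
import random


def augment(image, rng, noise_sigma=0.02):
    """Apply random brightness, contrast, gamma and Gaussian noise to a [0, 1] image."""
    brightness = rng.uniform(-0.1, 0.1)   # hypothetical range
    contrast = rng.uniform(0.8, 1.2)      # hypothetical range
    gamma = rng.uniform(0.7, 1.5)         # hypothetical range
    out = []
    for row in image:
        new_row = []
        for v in row:
            v = (v - 0.5) * contrast + 0.5 + brightness
            v = min(max(v, 0.0), 1.0) ** gamma
            v += rng.gauss(0.0, noise_sigma)
            new_row.append(min(max(v, 0.0), 1.0))  # clip back to [0, 1]
        out.append(new_row)
    return out


# Toy example with a fixed seed so the result is reproducible.
img = [[0.0, 0.5], [0.25, 1.0]]
aug = augment(img, random.Random(0))
```

Since these changes only alter pixel values, the same flow field remains valid for the augmented image pair.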

Data augmentation

64

Optical Flow FlowNet

Object tracking MDNet

65

Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015).

Object tracking MDNet

66

Object tracking MDNet Architecture

67

Domain-specific layers are used during training for each sequence, but are replaced by a single one at test time.

Object tracking MDNet Online update

68

MDNet is updated online at test time with hard negative mining, that is, selecting the negative samples with the highest positive score.
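The selection step of hard negative mining can be sketched in a few lines: among the samples labelled as negative, keep the ones the classifier currently scores highest as positive. The function name `hard_negatives`, the label convention (0 = negative) and the toy scores are assumptions for illustration only.

```python
def hard_negatives(scores, labels, k):
    """Return indices of the k negative samples with the highest positive score."""
    negatives = [i for i, y in enumerate(labels) if y == 0]
    negatives.sort(key=lambda i: scores[i], reverse=True)
    return negatives[:k]


# Toy example: positive-class scores for five candidate windows.
scores = [0.9, 0.8, 0.1, 0.7, 0.3]
labels = [1, 0, 0, 0, 0]   # 1 = target, 0 = background
hard = hard_negatives(scores, labels, k=2)
```

Training on these confusing background windows, rather than on easy ones, is what makes the online update more discriminative.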

Object tracking FCNT

69

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Object tracking FCNT

70

Focus on conv4-3 and conv5-3 of the VGG-16 network pre-trained for ImageNet image classification

conv4-3 conv5-3

Object tracking FCNT Specialization

71

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence

Object tracking FCNT Localization

72

Although trained for image classification, feature maps in conv5-3 enable object localization… but are not discriminative enough to distinguish different objects of the same category

Object tracking Localization

73

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object detectors emerge in deep scene CNNs." ICLR 2015. [Slides from ReadCV]

Object tracking FCNT Localization

74

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation…

conv4-3 conv5-3

Object tracking FCNT Architecture

75

SNet = Specific Network (online update)

GNet = General Network (fixed)

Object tracking FCNT Results

76

ConvNets Software

Caffe: http://caffe.berkeleyvision.org

Torch (Overfeat): http://torch.ch

Theano: http://deeplearning.net/software/theano

TensorFlow: https://www.tensorflow.org

MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet

CNTK (Microsoft): http://www.cntk.ai

77

Seminar Series: Compacting ConvNets for End to End Learning

Tuesday, February 2, 4pm

D5-010, Campus Nord

ConvNets Learn more

78

Jose M. Álvarez

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

ConvNets Learn more

79

ConvNets Learn more

Online course

Deep Learning: Taking machine learning to the next level

80

ReadCV seminar

Friendly reviews of SoA papers

Spring 2016

Tuesdays at 11am

ConvNets Learn more

81

Barcelona Convolucionada

Deep Learning a l'abast de tothom

Monday, February 1, 7pm, FIB, Campus Nord UPC

ConvNets Learn more

82

Grup d'estudi de machine learning Barcelona

Summer course

Deep Learning for Computer Vision

(2.5 ECTS for MSc & PhD)

July 4-8 3-7pm

ConvNets Learn more

83

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, "Deep Learning: Past, Present & Future"

ConvNets Learn more

84

ConvNets Learn more

85

"Machine learning" sub-Reddit

ConvNets Learn more

86

ConvNets Learn more

87

Check profile requirements for Summer internship (disclaimer: offered to PhD students by default)

Company     Avg. salary / hour    Avg. salary / month
Yahoo       $43                   ($43 x 160 = $6,880)
Apple       $37                   ($37 x 160 = $5,920)
Google      $29.54-$31.32         $7,151
Facebook    $22.92                $6,150-$7,378
Microsoft   $22.63                $6,506-$7,171

Source: Glassdoor.com (internships in California. No stipends included.)

ConvNets Learn more

88

Video: Cristian Canton's talk "From Catalonia to America: notes on how to achieve a successful post-PhD career", ACMCV 2015 & UPC

Li Fei-Fei, "How we're teaching computers to understand pictures", TEDTalks 2014

ConvNets Learn more

89

Jeremy Howard, "The wonderful and terrifying implications of computers that can learn", TEDTalks 2014

ConvNets Learn more

90

ConvNets Learn more

91

Neil Lawrence, "OpenAI won't benefit humanity without open data sharing" (The Guardian, 14/12/2015)

Is Computer Vision solved?

ConvNets Discussion

92

Sports Do you know them?

93

ConvNets Do you know them?

94

Antonio Torralba, MIT (former UPC)

…and MANY MORE I am missing in the page (apologies)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

95

ConvNets Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution    Last deadlines (on 28/1/2016)
FI (Catalonia)         22/09/2015
FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web

96

Image Classification

97

Our past research

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., "Cultural Event Recognition with Visual ConvNets and Temporal Models", in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

ChaLearn Workshop

Saliency Prediction

J. Pan and Giró-i-Nieto, X., "End-to-end Convolutional Network for Saliency Prediction", in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

98

Our current research

LSUN Challenge

Sentiment Analysis

99

Our current research

[Slides]

CNN

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., "Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction", in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015.

Our current research

Instance Search in Video

100

V.-T. Nguyen, -Dinh-Le, D., Salvador, A., -Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S. ichi, and Giró-i-Nieto, X., "NII-HITACHI-UIT at TRECVID 2015 Instance Search", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S. ichi, O'Connor, N., and Smeaton, A. F., "Insight DCU at TRECVID 2015", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

101



97

Our past research

A Salvador Zeppelzauer M Manchon-Vizuete D Calafell-Oroacutes A and Giroacute-i-Nieto X ldquoCultural Event Recognition with Visual ConvNets and Temporal Modelsrdquo in CVPR ChaLearn Looking at People Workshop 2015 2015 [slides]

ChaLearn Worshop

Saliency Prediction

J Pan and Giroacute-i-Nieto X ldquoEnd-to-end Convolutional Network for Saliency Predictionrdquo in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops Boston MA (USA) 2015 [Slides] 98

Our current research

LSUN Challenge

Sentiment Analysis

99

Our current research

[Slides]

CNN

V Campos Salvador A Jou B and Giroacute-i-Nieto X ldquoDiving Deep into Sentiment Understanding Fine-tuned CNNs for Visual Sentiment Predictionrdquo in 1st International Workshop on Affect and Sentiment in Multimedia Brisbane Australia 2015

Our current research

Instance Search in Video

100

V - T Nguyen -Dinh-Le D Salvador A -Zhu C Nguyen D - L Tran M - T Duc T Ngo Duong D Anh Satoh S ichi and Giroacute-i-Nieto X ldquoNII-HITACHI-UIT at TRECVID 2015 Instance Searchrdquo in TRECVID 2015 Workshop Gaithersburg MD USA 2015

K McGuinness Mohedano E Salvador A Zhang Z X Marsden M Wang P Jargalsaikhan I Antony J Giroacute-i-Nieto X Satoh S ichi OConnor N and Smeaton A F ldquoInsight DCU at TRECVID 2015rdquo in TRECVID 2015 Workshop Gaithersburg MD USA 2015

Thank you

Slides available on and

httpsimatgeupceduwebpeoplexavier-giro

httpbitsearchblogspotcom

httpstwittercomDocXavi

httpswwwfacebookcomProfessorXavi

xaviergiroupcedu

101

Page 28: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

Recognition C3D Visualization

Based on Deconvnets by Zeiler and Fergus [ECCV 2014]. See [ReadCV Slides] for more details.

Recognition C3D Compactness

Recognition C3D Performance

Convolutional 3D (C3D) features combined with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with state-of-the-art methods on the other 2 benchmarks.

Recognition C3D Software

Implementation by Michael Gygli (GitHub)

Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. Learning spatiotemporal features with 3D convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015.

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

Kai Kang et al. Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

Optical Flow

Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE.

Optical Flow Small vs Large

Optical Flow

Classic approach: Rigid matching of HoG or SIFT descriptors

Deep Matching: Allow each subpatch to move independently in a limited range depending on its size

Optical Flow Deep Matching

Optical Flow 2D correlation

Figure labels: Image, Sub-Image, offset of the sub-image with respect to the image [0,0]

Source: Matlab R2015b documentation for normxcorr2 by MathWorks
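The 2D correlation idea behind this slide can be sketched in a few lines. This is a minimal pure-Python illustration of normalized cross-correlation, not MATLAB's normxcorr2 implementation: the function names `normalized_xcorr` and `best_offset` and the toy arrays are assumptions made for the example, and only offsets where the sub-image fits completely inside the image are scored.

```python
import math

def normalized_xcorr(image, template):
    """Map of normalized correlation scores in [-1, 1].

    Entry [r][c] scores the template placed with its top-left corner
    at row r, column c of the image (fully overlapping positions only).
    """
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    t_vals = [v for row in template for v in row]
    t_mean = sum(t_vals) / len(t_vals)
    t_dev = [v - t_mean for v in t_vals]
    t_norm = math.sqrt(sum(d * d for d in t_dev))

    scores = []
    for r in range(ih - th + 1):
        row_scores = []
        for c in range(iw - tw + 1):
            w_vals = [image[r + i][c + j] for i in range(th) for j in range(tw)]
            w_mean = sum(w_vals) / len(w_vals)
            w_dev = [v - w_mean for v in w_vals]
            w_norm = math.sqrt(sum(d * d for d in w_dev))
            denom = t_norm * w_norm
            # A flat window has no defined correlation; score it as 0 (assumption).
            dot = sum(a * b for a, b in zip(t_dev, w_dev))
            row_scores.append(dot / denom if denom > 0 else 0.0)
        scores.append(row_scores)
    return scores

def best_offset(scores):
    """(row, col) offset of the highest correlation score."""
    best = max((s, r, c) for r, row in enumerate(scores) for c, s in enumerate(row))
    return best[1], best[2]

# Toy data: the sub-image appears in the image at offset (1, 1).
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 0, 0],
    [0, 3, 9, 0, 0],
    [0, 0, 0, 0, 0],
]
template = [[1, 2], [3, 9]]
scores = normalized_xcorr(image, template)
offset = best_offset(scores)
```

Subtracting the mean and dividing by the norms makes the score insensitive to brightness and contrast changes, which is why the peak of the map is a reasonable estimate of the sub-image offset.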

Instead of pre-trained filters, a convolution is defined between each patch of the reference image and the target image. As a result, a correlation map is generated for each reference patch.

Optical Flow Deep Matching

The most discriminative response map

The least discriminative response map

Key idea: Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

Pyramid levels: 4x4 patches, 8x8 patches, 16x16 patches, 32x32 patches

Bottom-up extraction (BU)

Top-down matching (TD)
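One bottom-up step of such a pyramid can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it keeps the maps at full resolution (no subsampling), omits any non-linear rectification, and the helper names `max_pool3`, `aggregate` and `peak_map` are hypothetical. The response map of a parent patch is the average of its four children's maps, each max-pooled so that a child may move by one pixel on its own.

```python
def max_pool3(resp):
    """3x3 max filter (stride 1): tolerates a one-pixel move of a child patch."""
    h, w = len(resp), len(resp[0])
    return [[max(resp[rr][cc]
                 for rr in range(max(0, r - 1), min(h, r + 2))
                 for cc in range(max(0, c - 1), min(w, c + 2)))
             for c in range(w)] for r in range(h)]

def aggregate(children, shift, pool=True):
    """Average four child response maps into the parent's response map.

    `children` maps a quadrant sign (dy, dx) to that child's response map;
    the child is expected at parent position + (dy * shift, dx * shift).
    Positions outside the map contribute zero (assumption).
    """
    first = next(iter(children.values()))
    h, w = len(first), len(first[0])
    parent = [[0.0] * w for _ in range(h)]
    for (dy, dx), resp in children.items():
        src = max_pool3(resp) if pool else resp
        for r in range(h):
            for c in range(w):
                rr, cc = r + dy * shift, c + dx * shift
                if 0 <= rr < h and 0 <= cc < w:
                    parent[r][c] += src[rr][cc] / len(children)
    return parent

def peak_map(size, row, col):
    """Toy response map: 1.0 at (row, col), 0.0 elsewhere."""
    return [[1.0 if (r, c) == (row, col) else 0.0 for c in range(size)]
            for r in range(size)]

# Three children match exactly where a rigid parent at (3, 3) expects them;
# the fourth one drifted one pixel diagonally.
children = {
    (-1, -1): peak_map(7, 2, 2),
    (-1, +1): peak_map(7, 2, 4),
    (+1, -1): peak_map(7, 4, 2),
    (+1, +1): peak_map(7, 5, 5),
}
with_pooling = aggregate(children, shift=1)
without_pooling = aggregate(children, shift=1, pool=False)
```

With pooling the parent still reaches the full score at (3, 3); without it the drifting child is lost and the score drops, which is the rigid-matching behaviour the method is meant to avoid.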

Optical Flow Deep Matching (TD)

Each local maximum in the top layer corresponds to a shift of one of the biggest (32x32) patches.

If we focus on a local maximum, we can retrieve the corresponding responses one scale below and focus on the shifts of the sub-patches that generated it.

Optical Flow Deep Matching

Figure panels: Ground truth, Dense HOG [Brox & Malik 2011], Deep Matching

Optical Flow

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D., and Brox, T. (2015). FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).

Optical Flow FlowNet

End to end supervised learning of optical flow

Optical Flow FlowNet (contracting)

Option A: Stack both input images together and feed them through a generic network

Option B: Create two separate yet identical processing streams for the two images and combine them at a later stage

Correlation layer: Convolution of data patches from the layers to combine
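A correlation layer of this kind can be sketched as below. This is a minimal illustration, assuming 1x1 patches (a single feature vector per position) and a small displacement range; the function name `correlation_layer`, the `max_disp` parameter and the toy feature maps are assumptions for the example and do not reproduce the network's actual configuration.

```python
def correlation_layer(f1, f2, max_disp):
    """Correlate each position of f1 with nearby positions of f2.

    f1, f2: feature maps as [row][col][channel] nested lists of equal shape.
    Returns (out, disps) where out[row][col][i] is the dot product between
    f1 at (row, col) and f2 at (row + dy, col + dx) for disps[i] == (dy, dx).
    Positions that fall outside f2 are scored 0.0 (assumption).
    """
    h, w = len(f1), len(f1[0])
    disps = [(dy, dx)
             for dy in range(-max_disp, max_disp + 1)
             for dx in range(-max_disp, max_disp + 1)]
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            vals = []
            for dy, dx in disps:
                rr, cc = r + dy, c + dx
                if 0 <= rr < h and 0 <= cc < w:
                    vals.append(sum(a * b for a, b in zip(f1[r][c], f2[rr][cc])))
                else:
                    vals.append(0.0)
            row.append(vals)
        out.append(row)
    return out, disps

# Toy feature maps (3x3, 2 channels): one distinctive feature vector that
# moves one pixel to the right between the two streams.
f1 = [[[0.0, 0.0] for _ in range(3)] for _ in range(3)]
f2 = [[[0.0, 0.0] for _ in range(3)] for _ in range(3)]
f1[1][1] = [1.0, 2.0]
f2[1][2] = [1.0, 2.0]

corr, disps = correlation_layer(f1, f2, max_disp=1)
best = max(range(len(disps)), key=lambda i: corr[1][1][i])
best_disp = disps[best]
```

The layer has no trainable weights: the "filters" are the feature vectors of one stream, which is what distinguishes it from an ordinary convolution.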

Optical Flow FlowNet (expanding)

Upconvolutional layers: Unpooling feature maps + convolution

Upconvolved feature maps are concatenated with the corresponding map from the contractive part
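The unpool, convolve and concatenate sequence can be sketched as follows. This is a simplified illustration: nearest-neighbour unpooling and a 1x1 convolution stand in for the learned upconvolution filters, and the helper names `unpool2x`, `conv1x1` and `concat_channels` as well as the toy weights are assumptions made for the example.

```python
def unpool2x(fmap):
    """Nearest-neighbour 2x upsampling of a [row][col][channel] feature map."""
    out = []
    for row in fmap:
        wide = [list(px) for px in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append([list(px) for px in wide])              # repeat each row
    return out

def conv1x1(fmap, weights):
    """Per-pixel linear map; weights[o][i] links input channel i to output o."""
    return [[[sum(w * v for w, v in zip(w_out, px)) for w_out in weights]
             for px in row] for row in fmap]

def concat_channels(a, b):
    """Concatenate two maps of equal spatial size along the channel axis."""
    return [[pa + pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

# Coarse 1x2 map (1 channel) from the end of the contracting part, and a
# 2x4 map (1 channel) taken from an earlier contracting layer (toy values).
coarse = [[[1.0], [2.0]]]
skip = [[[9.0] for _ in range(4)] for _ in range(2)]

upconv = conv1x1(unpool2x(coarse), weights=[[0.5]])
merged = concat_channels(upconv, skip)
```

Concatenating with the earlier map gives the expanding part access to fine spatial detail that was lost during pooling, while the upsampled map carries the coarse context.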

Optical Flow FlowNet

Data augmentation

Since existing ground truth datasets are not sufficiently large to train a Convnet, a synthetic Flying Chairs dataset is generated... and augmented (translation, rotation and scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma and color)

Convnets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI
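The photometric part of such an augmentation can be sketched as below. This is a minimal illustration for a single grayscale image with values in [0, 1]: the function name `augment` and all parameter ranges are assumptions chosen for the example, not the values used to train FlowNet. Geometric transformations are left out because they would also have to be applied to the ground-truth flow.

```python
import random

def clip01(v):
    """Clamp a value to the [0, 1] intensity range."""
    return min(1.0, max(0.0, v))

def augment(image, rng, brightness=0.2, contrast=(0.8, 1.2),
            gamma=(0.7, 1.5), noise_sigma=0.02):
    """Randomly change contrast, brightness and gamma, then add Gaussian noise.

    All ranges are illustrative assumptions.
    """
    b = rng.uniform(-brightness, brightness)   # additive brightness shift
    k = rng.uniform(*contrast)                 # contrast factor around the mean
    g = rng.uniform(*gamma)                    # gamma exponent
    mean = sum(sum(row) for row in image) / (len(image) * len(image[0]))
    out = []
    for row in image:
        new_row = []
        for v in row:
            v = clip01((v - mean) * k + mean + b)
            v = v ** g
            v = clip01(v + rng.gauss(0.0, noise_sigma))
            new_row.append(v)
        out.append(new_row)
    return out

image = [[0.1, 0.5], [0.9, 0.3]]
augmented = augment(image, random.Random(0))
repeated = augment(image, random.Random(0))   # same seed, same result
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible, which helps when debugging a training pipeline.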

Object tracking MDNet

Nam, Hyeonseob, and Bohyung Han. Learning multi-domain convolutional neural networks for visual tracking. ICCV VOT Workshop (2015).

Object tracking MDNet Architecture

Domain-specific layers are used during training for each sequence but are replaced by a single one at test time

Object tracking MDNet Online update

MDNet is updated online at test time with hard negative mining, that is, selecting the negative samples with the highest positive score.
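Hard negative mining as described here amounts to ranking negatives by how positive the current model believes them to be. The snippet below is a minimal sketch of that selection step only: the function name `mine_hard_negatives`, the sample identifiers and the scores are hypothetical, and in a real tracker the scores would come from the network itself.

```python
def mine_hard_negatives(candidates, positive_score, num_keep):
    """Keep the negatives that the current model scores as most positive."""
    ranked = sorted(candidates, key=positive_score, reverse=True)
    return ranked[:num_keep]

# Toy negatives: (sample id, positive score assigned by the current model).
# Both ids and scores are made up for illustration.
negatives = [("bg_a", 0.05), ("bg_b", 0.91), ("bg_c", 0.40), ("bg_d", 0.77)]

hard = mine_hard_negatives(negatives, positive_score=lambda s: s[1], num_keep=2)
hard_ids = [sample_id for sample_id, _ in hard]
```

Training on these samples focuses each update on the distractors most likely to be confused with the target, rather than on easy background patches.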

Object tracking FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. Visual Tracking with Fully Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Focus on conv4-3 and conv5-3 of a VGG-16 network pre-trained for ImageNet image classification

Object tracking FCNT Specialization

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence
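One way to picture this observation is to rank feature maps by how much of their activation falls inside the target box. The sketch below is a deliberately simplified proxy and not the selection procedure of the paper: the names `foreground_ratio` and `select_maps`, the box convention (top, left, bottom, right with exclusive end) and the toy maps are all assumptions made for the example.

```python
def foreground_ratio(fmap, box):
    """Share of a map's absolute activation that lies inside the target box."""
    top, left, bottom, right = box  # bottom and right are exclusive
    total = sum(abs(v) for row in fmap for v in row)
    if total == 0:
        return 0.0
    inside = sum(abs(fmap[r][c])
                 for r in range(top, bottom)
                 for c in range(left, right))
    return inside / total

def select_maps(fmaps, box, k):
    """Indices of the k maps most concentrated on the target region."""
    ranked = sorted(range(len(fmaps)),
                    key=lambda i: foreground_ratio(fmaps[i], box),
                    reverse=True)
    return ranked[:k]

# Three toy 4x4 feature maps and a target box covering the central 2x2 area.
fmaps = [
    [[0, 0, 0, 0], [0, 5, 5, 0], [0, 5, 5, 0], [0, 0, 0, 0]],  # target only
    [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]],  # everywhere
    [[3, 3, 3, 3], [3, 0, 0, 3], [3, 0, 0, 3], [3, 3, 3, 3]],  # background only
]
box = (1, 1, 3, 3)
selected = select_maps(fmaps, box, k=2)
```

Discarding maps that respond mostly to background reduces both computation and the noise passed on to the later stages of a tracker.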

Object tracking FCNT Localization

Although trained for image classification, feature maps in conv5-3 enable object localization... but they are not discriminative enough to distinguish different objects of the same category

Object tracking Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Object detectors emerge in deep scene CNNs. ICLR 2015. [Slides from ReadCV]

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation...

Object tracking FCNT Architecture

SNet = Specific Network (online update)

GNet = General Network (fixed)

Object tracking FCNT Results

ConvNets Software

Caffe: http://caffe.berkeleyvision.org

Torch (Overfeat): http://torch.ch

Theano: http://deeplearning.net/software/theano

TensorFlow: https://www.tensorflow.org

MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet

CNTK (Microsoft): http://www.cntk.ai

ConvNets Learn more

Seminar Series: Compacting ConvNets for End to End Learning
Jose M. Álvarez
Tuesday, February 2, 4pm
D5-010, Campus Nord

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

Online course: Deep Learning. Taking machine learning to the next level

ReadCV seminar: Friendly reviews of SoA papers
Spring 2016, Tuesdays at 11am

Barcelona Convolucionada
Deep Learning a l’abast de tothom (Deep Learning within everyone's reach)
Monday, February 1, 7pm, FIB, Campus Nord UPC
Grup d’estudi de machine learning Barcelona (Barcelona machine learning study group)

Summer course: Deep Learning for Computer Vision
(2.5 ECTS for MSc & PhD)
July 4-8, 3-7pm

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, “Deep Learning: Past, Present & Future”

“Machine learning” sub-Reddit

Check profile requirements for Summer internship (disclaimer: offered to PhD students by default)

Company      Avg. salary/hour    Avg. salary/month
Yahoo        $43                 ($43 x 160 = $6,880)
Apple        $37                 ($37 x 160 = $5,920)
Google       $29.54-$31.32       $7,151
Facebook     $22.92              $6,150-$7,378
Microsoft    $22.63              $6,506-$7,171

Source: Glassdoor.com (internships in California; no stipends included)

Video: Cristian Canton’s talk “From Catalonia to America: notes on how to achieve a successful post-PhD career”, ACMCV 2015 & UPC

Li Fei-Fei, “How we’re teaching computers to understand pictures”, TEDTalks 2014

Jeremy Howard, “The wonderful and terrifying implications of computers that can learn”, TEDTalks 2014

Neil Lawrence, “OpenAI won’t benefit humanity without open data sharing” (The Guardian, 14/12/2015)

ConvNets Discussion

Is Computer Vision solved?

Sports Do you know them?

ConvNets Do you know them?

Antonio Torralba, MIT (former UPC)
Oriol Vinyals, Google (former UPC)
Jose M. Álvarez, NICTA (former URL & UAB)
Joan Bruna, Berkeley (former UPC)
and MANY MORE I am missing in the page (apologies)

ConvNets Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution    Last deadlines (on 28/1/2016)
FI (Catalonia)         22/09/2015
FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web

Our past research

Image Classification (ChaLearn Workshop)

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., “Cultural Event Recognition with Visual ConvNets and Temporal Models”, in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

Our current research

Saliency Prediction (LSUN Challenge)

J. Pan and Giró-i-Nieto, X., “End-to-end Convolutional Network for Saliency Prediction”, in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

Sentiment Analysis

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., “Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction”, in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015. [Slides]

Instance Search in Video

V.-T. Nguyen, Dinh-Le, D., Salvador, A., Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S., and Giró-i-Nieto, X., “NII-HITACHI-UIT at TRECVID 2015 Instance Search”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S., O’Connor, N., and Smeaton, A. F., “Insight DCU at TRECVID 2015”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

Page 29: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

29Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D Compactness

30Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Convolutional 3D(C3D) combined with a simple linear classifier outperforms state-of-the-art methods on 4 different benchmarks and are comparable with state of the art methods on other 2 benchmarks

Recognition C3D Performance

31Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D SoftwareImplementation by Michael Gygli (GitHub)

32

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

33

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

34

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

35

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

36

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

37

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

38

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

39

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

Optical Flow

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 40

Optical Flow Small vs Large

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 41

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 42

Optical FlowClassic approachRigid matching of HoG or SIFT descriptors

Deep MatchingAllow each subpatch to move

independently in a limited range

depending on its size

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 43

Optical Flow Deep Matching

Source Matlab R2015b documentation for normxcorr2 by Mathworks44

Optical Flow 2D correlation

Image

Sub-Image

Offset of the sub-image with respect to the image [00]

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 45

Instead of pre-trained filters a convolution is defined between each

patch of the reference image target image

as a results a correlation map is generated for each reference patch

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 46

Optical Flow Deep Matching

The most discriminative response map

The less discriminative

response map

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 47

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching

(TD)Bottom-upextraction

(BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 48

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Bottom-upextraction

(BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 49

Optical Flow Deep Matching (BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 50

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching (TD)

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching

(TD)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 51

Optical Flow Deep Matching (TD)Each local maxima in the top layer corresponds to a shift of one of the biggest (32x32) patchesIf we focus on local maximum we can retrieve the corresponding responses one scale below and focus on shift of the sub-patches that generated it

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 52

Optical Flow Deep Matching (TD)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 53

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 54

Ground truth

Dense HOG[Brox amp Malik 2011]

Deep Matching

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 55

Optical Flow Deep Matching

Optical Flow

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 56

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 57

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 58

End to end supervised learning of optical flow

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 59

Option A Stack both input images together and feed them through a generic network

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 60

Option B Create two separate yet identical processing streams for the two images and combine them at a later stage

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 61

Option B Create two separate yet identical processing streams for the two images and combine them at a later stage

Correlation layer Convolution of data patches from the layers to combine

Optical Flow FlowNet (expanding)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 62

Upconvolutional layers Unpooling features maps + convolutionUpconvolutioned feature maps are concatenated with the corresponding map from the contractive part

Optical Flow: FlowNet

Data augmentation: Since existing ground-truth datasets are not sufficiently large to train a ConvNet, a synthetic Flying Chairs dataset is generated... and augmented (translation, rotation and scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma and color).

ConvNets trained on this unrealistic data generalize well to existing datasets such as Sintel and KITTI.
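The photometric part of such an augmentation could look as follows. This is only a sketch: the parameter ranges are invented for illustration, images are grayscale nested lists with values in [0, 1], and the function names are hypothetical. Geometric transformations (translation, rotation, scaling) are left out because they would also require transforming the ground-truth flow.

```python
import random


def photometric_augment(img, brightness, contrast, gamma, noise_sigma, rng):
    """Apply contrast, brightness, gamma and additive Gaussian noise to one image."""
    out = []
    for row in img:
        new_row = []
        for v in row:
            v = (v - 0.5) * contrast + 0.5 + brightness   # contrast, then brightness
            v = min(max(v, 0.0), 1.0) ** gamma            # clamp, then gamma
            v += rng.gauss(0.0, noise_sigma)              # additive Gaussian noise
            new_row.append(min(max(v, 0.0), 1.0))
        out.append(new_row)
    return out


def augment_pair(img1, img2, seed=0):
    """Sample one set of (assumed) parameters and apply it to both frames."""
    rng = random.Random(seed)
    params = dict(
        brightness=rng.uniform(-0.2, 0.2),
        contrast=rng.uniform(0.8, 1.2),
        gamma=rng.uniform(0.7, 1.5),
        noise_sigma=rng.uniform(0.0, 0.04),
    )
    return (photometric_augment(img1, rng=rng, **params),
            photometric_augment(img2, rng=rng, **params))


frame = [[0.0, 0.25, 0.5], [0.5, 0.75, 1.0]]
aug1, aug2 = augment_pair(frame, frame)
```

Sharing the brightness, contrast and gamma parameters between the two frames keeps the pair photometrically consistent, while the noise is drawn independently per pixel.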


Object tracking: MDNet

Nam, Hyeonseob, and Bohyung Han. Learning multi-domain convolutional neural networks for visual tracking. ICCV VOT Workshop (2015).

Object tracking: MDNet Architecture

Domain-specific layers are used during training for each sequence, but are replaced by a single one at test time.
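The shared-plus-domain-specific structure can be sketched as below. This is a toy stand-in, not the MDNet code: the shared layers are reduced to a single random linear map with ReLU, each domain branch is a single weight vector, and the class and method names are hypothetical.

```python
import random


class MultiDomainNet:
    """Shared layers plus one scoring branch per training sequence (domain)."""

    def __init__(self, num_domains, feat_dim, seed=0):
        self.rng = random.Random(seed)
        self.feat_dim = feat_dim
        # Stand-in for the shared layers, kept across all domains.
        self.shared = [[self.rng.gauss(0, 0.1) for _ in range(feat_dim)]
                       for _ in range(feat_dim)]
        # One domain-specific branch per training sequence.
        self.branches = [self._new_branch() for _ in range(num_domains)]

    def _new_branch(self):
        return [self.rng.gauss(0, 0.1) for _ in range(self.feat_dim)]

    def features(self, x):
        return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in self.shared]

    def score(self, x, domain):
        return sum(w * f for w, f in zip(self.branches[domain], self.features(x)))

    def start_test_sequence(self):
        """Keep the shared layers; replace all branches by a single fresh one."""
        self.branches = [self._new_branch()]


net = MultiDomainNet(num_domains=3, feat_dim=4)
shared_before = [row[:] for row in net.shared]
net.start_test_sequence()
test_score = net.score([1.0, 0.0, 0.0, 0.0], domain=0)
```

Only the branch is re-initialized for the new sequence, so whatever the shared layers learned across the training sequences is reused.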

Object tracking: MDNet Online update

MDNet is updated online at test time with hard negative mining, that is, by selecting the negative samples with the highest positive score.
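Selecting hard negatives amounts to a top-k selection by positive score. A minimal sketch, where the sample names, the scores and the function name `hard_negatives` are all made up for illustration:

```python
def hard_negatives(neg_samples, positive_score, k):
    """Return the k negative samples that the current model scores highest as positive."""
    return sorted(neg_samples, key=positive_score, reverse=True)[:k]


# Hypothetical negatives as (name, positive score given by the current model).
candidates = [("bg_sky", 0.05), ("bg_tree", 0.40), ("distractor", 0.85), ("bg_road", 0.10)]
hard = hard_negatives(candidates, positive_score=lambda s: s[1], k=2)
```

Training on these samples focuses the update on the negatives the tracker is most likely to confuse with the target.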

Object tracking: FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. Visual Tracking with Fully Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Focus on the conv4-3 and conv5-3 layers of a VGG-16 network pre-trained for ImageNet image classification.

[Figure panels: conv4-3, conv5-3]

Object tracking: FCNT Specialization

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence.

Object tracking: FCNT Localization

Although trained for image classification, feature maps in conv5-3 enable object localization... but they are not discriminative enough to distinguish different objects of the same category.

Object tracking: Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object detectors emerge in deep scene CNNs." ICLR 2015. [Slides from ReadCV]

Object tracking: FCNT Localization

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation...

[Figure panels: conv4-3, conv5-3]

Object tracking: FCNT Architecture

SNet = Specific Network (online update)

GNet = General Network (fixed)

Object tracking: FCNT Results

ConvNets: Software

Caffe: http://caffe.berkeleyvision.org

Torch (Overfeat): http://torch.ch

Theano: http://deeplearning.net/software/theano

TensorFlow: https://www.tensorflow.org

MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet

CNTK (Microsoft): http://www.cntk.ai

ConvNets: Learn more

Seminar Series: "Compacting ConvNets for End to End Learning"

Jose M. Álvarez

Tuesday, February 2, 4pm

D5-010, Campus Nord

ConvNets: Learn more

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

ConvNets: Learn more

Online course: "Deep Learning: Taking machine learning to the next level"

ConvNets: Learn more

ReadCV seminar: Friendly reviews of SoA papers

Spring 2016, Tuesdays at 11am

ConvNets: Learn more

Barcelona Convolucionada: "Deep Learning a l'abast de tothom" (Deep Learning within everyone's reach)

Monday, February 1, 7pm, FIB, Campus Nord UPC

Grup d'estudi de machine learning Barcelona (Barcelona machine learning study group)

ConvNets: Learn more

Summer course: Deep Learning for Computer Vision (2.5 ECTS for MSc & PhD)

July 4-8, 3-7pm

ConvNets: Learn more

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, "Deep Learning: Past, Present & Future"

"Machine learning" sub-Reddit

ConvNets: Learn more

Check profile requirements for summer internships (disclaimer: offered to PhD students by default).

Company      Avg. salary/hour    Avg. salary/month
Yahoo        $43                 ($43 x 160 = $6,880)
Apple        $37                 ($37 x 160 = $5,920)
Google       $29.54-$31.32       $7,151
Facebook     $22.92              $6,150-$7,378
Microsoft    $22.63              $6,506-$7,171

Source: Glassdoor.com (internships in California; no stipends included).

ConvNets: Learn more

Video: Cristian Canton's talk "From Catalonia to America: notes on how to achieve a successful post-PhD career", ACMCV 2015 & UPC

Li Fei-Fei, "How we're teaching computers to understand pictures", TEDTalks 2014

Jeremy Howard, "The wonderful and terrifying implications of computers that can learn", TEDTalks 2014

Neil Lawrence, "OpenAI won't benefit humanity without open data sharing" (The Guardian, 14/12/2015)

ConvNets: Discussion

Is Computer Vision solved?

Sports: Do you know them?

ConvNets: Do you know them?

Antonio Torralba, MIT (former UPC)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

...and MANY MORE I am missing in the page (apologies)

ConvNets: Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding is available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution    Last deadline (as of 28/1/2016)
FI (Catalonia)         22/09/2015
FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web

Our past research

Image Classification (ChaLearn Workshop)

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., "Cultural Event Recognition with Visual ConvNets and Temporal Models", in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

Our current research

Saliency Prediction (LSUN Challenge)

J. Pan and Giró-i-Nieto, X., "End-to-end Convolutional Network for Saliency Prediction", in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

Our current research

Sentiment Analysis

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., "Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction", in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015. [Slides]

Our current research

Instance Search in Video

V.-T. Nguyen, Dinh-Le, D., Salvador, A., Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S., and Giró-i-Nieto, X., "NII-HITACHI-UIT at TRECVID 2015 Instance Search", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S., O'Connor, N., and Smeaton, A. F., "Insight DCU at TRECVID 2015", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu


Recognition: C3D Performance

Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. Learning spatiotemporal features with 3D convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015.

Convolutional 3D (C3D) features combined with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with state-of-the-art methods on the other 2 benchmarks.

Recognition: C3D Software

Implementation by Michael Gygli (GitHub)

Recognition: ImageNet Video

[ILSVRC 2015 Slides and videos]

Kai Kang et al., Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

Optical Flow

Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE.

Optical Flow: Small vs Large

Optical Flow

Classic approach: Rigid matching of HoG or SIFT descriptors.

Deep Matching: Allow each subpatch to move independently in a limited range depending on its size.

Optical Flow: 2D correlation

Source: Matlab R2015b documentation for normxcorr2, by MathWorks.

[Figure: image, sub-image, and the offset of the sub-image with respect to the image, [0, 0]]
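As an illustration of 2D correlation, the sketch below locates a sub-image inside an image with a normalized cross-correlation score. It is a plain-Python stand-in written for this example, not the Matlab routine: it only evaluates positions where the sub-image fits entirely inside the image, and the function names are hypothetical.

```python
import math


def ncc_map(image, template):
    """Normalized cross-correlation of `template` at every valid offset in `image`."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    t_vals = [v for row in template for v in row]
    t_mean = sum(t_vals) / len(t_vals)
    t_dev = [v - t_mean for v in t_vals]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    out = []
    for y in range(ih - th + 1):
        row = []
        for x in range(iw - tw + 1):
            w_vals = [image[y + j][x + i] for j in range(th) for i in range(tw)]
            w_mean = sum(w_vals) / len(w_vals)
            w_dev = [v - w_mean for v in w_vals]
            w_norm = math.sqrt(sum(d * d for d in w_dev))
            denom = t_norm * w_norm
            row.append(sum(a * b for a, b in zip(t_dev, w_dev)) / denom if denom > 0 else 0.0)
        out.append(row)
    return out


def best_offset(image, template):
    """Offset (row, column) of the highest correlation score."""
    scores = ncc_map(image, template)
    positions = [(y, x) for y in range(len(scores)) for x in range(len(scores[0]))]
    return max(positions, key=lambda p: scores[p[0]][p[1]])


image = [[0, 0, 0, 0],
         [0, 1, 2, 0],
         [0, 3, 4, 0],
         [0, 0, 0, 0]]
sub_image = [[1, 2],
             [3, 4]]
offset = best_offset(image, sub_image)
peak = ncc_map(image, sub_image)[offset[0]][offset[1]]
```

The score peaks where the sub-image matches the image content exactly, which gives the offset of the sub-image with respect to the image.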

Optical Flow: Deep Matching

Instead of pre-trained filters, a convolution is defined between each patch of the reference image and the target image. As a result, a correlation map is generated for each reference patch.
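A small sketch of this step, assuming raw pixel values instead of the descriptors used in the paper, non-overlapping 2x2 patches, and hypothetical function names:

```python
def extract_patches(image, size):
    """Non-overlapping size x size patches, keyed by their top-left corner."""
    h, w = len(image), len(image[0])
    return {(y, x): [row[x:x + size] for row in image[y:y + size]]
            for y in range(0, h - size + 1, size)
            for x in range(0, w - size + 1, size)}


def correlate(target, patch):
    """Slide `patch` over `target` (valid positions only) and sum the products."""
    size = len(patch)
    h, w = len(target), len(target[0])
    return [[sum(patch[j][i] * target[y + j][x + i]
                 for j in range(size) for i in range(size))
             for x in range(w - size + 1)]
            for y in range(h - size + 1)]


def correlation_maps(reference, target, size=2):
    """One correlation map over the target per reference patch."""
    return {corner: correlate(target, patch)
            for corner, patch in extract_patches(reference, size).items()}


reference = [[1, 0, 0, 0],
             [0, 1, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 0, 0]]
target = [[0, 0, 0, 0],      # same diagonal pattern, moved by (1, 1)
          [0, 1, 0, 0],
          [0, 0, 1, 0],
          [0, 0, 0, 0]]
maps = correlation_maps(reference, target)
top_left_map = maps[(0, 0)]
```

Each reference patch acts as its own filter, so the number of correlation maps equals the number of reference patches.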

Optical Flow: Deep Matching

[Figure: the most discriminative response map and the least discriminative response map]

Optical Flow: Deep Matching

Key idea: Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search.

Pyramid levels: 4x4 patches, 8x8 patches, 16x16 patches, 32x32 patches.

Bottom-up extraction (BU), top-down matching (TD).
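One bottom-up aggregation step can be sketched as follows. This is a simplified illustration under stated assumptions, not the published algorithm: the parent map is the average of its four child maps, each read at its quadrant offset and max-pooled over a small neighbourhood so that the child may move independently. Any sub-sampling or non-linearity used in the original method is left out, and the names `local_max` and `aggregate` are hypothetical.

```python
def local_max(cmap, y, x, radius=1):
    """Maximum of `cmap` in a (2*radius+1)^2 window around (y, x), clipped at the border."""
    h, w = len(cmap), len(cmap[0])
    return max(cmap[j][i]
               for j in range(max(0, y - radius), min(h, y + radius + 1))
               for i in range(max(0, x - radius), min(w, x + radius + 1)))


def aggregate(children, size):
    """Build the correlation map of a parent patch of side 2*size.

    children: dict {(oy, ox): map} with one correlation map per quadrant,
              where (oy, ox) in {0, 1}^2 is the quadrant position.
    """
    any_map = next(iter(children.values()))
    h, w = len(any_map) - size, len(any_map[0]) - size
    parent = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [local_max(cmap, y + oy * size, x + ox * size)
                    for (oy, ox), cmap in children.items()]
            row.append(sum(vals) / len(vals))
        parent.append(row)
    return parent


# Four 2x2 child patches correlated over a 4x4 target give 3x3 maps;
# the 4x4 parent patch then has a single valid position.
peak = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
children = {(0, 0): peak, (0, 1): peak, (1, 0): peak, (1, 1): peak}
parent_map = aggregate(children, size=2)
```

Repeating this step on the parent maps yields the next pyramid level, doubling the patch size each time.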

Optical Flow: Deep Matching (BU)

Optical Flow: Deep Matching (TD)

Each local maximum in the top layer corresponds to a shift of one of the biggest (32x32) patches. If we focus on a local maximum, we can retrieve the corresponding responses one scale below and focus on the shifts of the sub-patches that generated it.
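The top-down search can be illustrated with two small helpers: one that finds local maxima in a correlation map, and one that looks one scale below for the best nearby response of a sub-patch. The maps, the search radius and the function names are assumptions made for this sketch.

```python
def local_maxima(cmap):
    """Positions with a positive value that is >= all of their 8 neighbours."""
    h, w = len(cmap), len(cmap[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            neigh = [cmap[j][i]
                     for j in range(max(0, y - 1), min(h, y + 2))
                     for i in range(max(0, x - 1), min(w, x + 2))
                     if (j, i) != (y, x)]
            if cmap[y][x] > 0 and all(cmap[y][x] >= n for n in neigh):
                peaks.append((y, x))
    return peaks


def backtrack(child_map, y, x, radius=1):
    """Best position of a sub-patch near (y, x), one scale below."""
    h, w = len(child_map), len(child_map[0])
    candidates = [(j, i)
                  for j in range(max(0, y - radius), min(h, y + radius + 1))
                  for i in range(max(0, x - radius), min(w, x + radius + 1))]
    return max(candidates, key=lambda p: child_map[p[0]][p[1]])


top_map = [[0.1, 0.2, 0.1],
           [0.2, 0.9, 0.3],
           [0.1, 0.3, 0.2]]
child_map = [[0.0, 0.1, 0.0],
             [0.0, 0.2, 0.8],
             [0.0, 0.1, 0.0]]
peaks = local_maxima(top_map)
child_pos = backtrack(child_map, *peaks[0])
```

Applying `backtrack` recursively down the pyramid would refine a coarse match of a large patch into shifts of its smallest sub-patches.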

Optical Flow: Deep Matching

[Figure panels: ground truth; dense HOG [Brox & Malik 2011]; Deep Matching]

Optical Flow: FlowNet

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D., and Brox, T. (2015). FlowNet: Learning Optical Flow with Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).

End-to-end supervised learning of optical flow.
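Supervised learning needs an error between the predicted and the ground-truth flow. The endpoint error (EPE) is the standard measure for optical flow; assuming it as the training objective, a minimal sketch is shown below. The flow layout (nested lists of `(u, v)` tuples) and the function name are choices made for this example.

```python
import math


def endpoint_error(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth flow vectors."""
    errors = [math.hypot(pu - gu, pv - gv)
              for prow, grow in zip(pred, gt)
              for (pu, pv), (gu, gv) in zip(prow, grow)]
    return sum(errors) / len(errors)


pred_flow = [[(3.0, 4.0), (0.0, 0.0)]]   # 1x2 flow field of (u, v) vectors
true_flow = [[(0.0, 0.0), (0.0, 0.0)]]
epe = endpoint_error(pred_flow, true_flow)
```

Here the first pixel is off by 5 pixels and the second is exact, so the mean endpoint error is 2.5 pixels.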


Page 31: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

31Tran Du Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri Learning spatiotemporal features with 3D convolutional networks In Proceedings of the IEEE International Conference on Computer Vision pp 4489-4497 2015

Recognition C3D SoftwareImplementation by Michael Gygli (GitHub)

32

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

33

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

34

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

35

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

36

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

37

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

38

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

39

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

Optical Flow

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 40

Optical Flow Small vs Large

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 41

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 42

Optical FlowClassic approachRigid matching of HoG or SIFT descriptors

Deep MatchingAllow each subpatch to move

independently in a limited range

depending on its size

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 43

Optical Flow Deep Matching

Source Matlab R2015b documentation for normxcorr2 by Mathworks44

Optical Flow 2D correlation

Image

Sub-Image

Offset of the sub-image with respect to the image [00]

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 45

Instead of pre-trained filters a convolution is defined between each

patch of the reference image target image

as a results a correlation map is generated for each reference patch

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 46

Optical Flow Deep Matching

The most discriminative response map

The less discriminative

response map

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 47

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching

(TD)Bottom-upextraction

(BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 48

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Bottom-upextraction

(BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 49

Optical Flow Deep Matching (BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 50

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching (TD)

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching

(TD)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 51

Optical Flow Deep Matching (TD)Each local maxima in the top layer corresponds to a shift of one of the biggest (32x32) patchesIf we focus on local maximum we can retrieve the corresponding responses one scale below and focus on shift of the sub-patches that generated it

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 52

Optical Flow: Deep Matching

Figure panels: Ground truth, Dense HOG [Brox & Malik 2011], Deep Matching

Optical Flow: FlowNet

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D., & Brox, T. (2015). FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).

End-to-end supervised learning of optical flow.

Optical Flow: FlowNet (contracting)

Option A: Stack both input images together and feed them through a generic network.
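As a small illustration of Option A, stacking can be read as concatenating the two frames along the channel axis, so that two RGB frames become one 6-channel input. The sketch below assumes H x W x 3 arrays; the helper name stack_frames is hypothetical and the network itself is omitted.

```python
import numpy as np

def stack_frames(frame_a, frame_b):
    """Concatenate two H x W x 3 frames along the channel axis -> H x W x 6."""
    if frame_a.shape != frame_b.shape:
        raise ValueError("both frames must have the same shape")
    return np.concatenate([frame_a, frame_b], axis=-1)

# Two consecutive (here: dummy) frames become one network input.
first = np.zeros((4, 5, 3))
second = np.ones((4, 5, 3))
stacked = stack_frames(first, second)
```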

Optical Flow: FlowNet (contracting)

Option B: Create two separate, yet identical, processing streams for the two images and combine them at a later stage.

Correlation layer: the two streams are combined by convolving patches of data from one stream with patches of data from the other.
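A correlation layer of this kind can be sketched as a dot product between the feature vector at each position of the first stream and the feature vectors at nearby positions of the second stream. The sketch assumes 1x1 patches, zero padding and a small maximum displacement; the function name correlation_layer and the parameter max_disp are hypothetical, and the settings of the original network are not reproduced here.

```python
import numpy as np

def correlation_layer(feat_a, feat_b, max_disp=1):
    """Correlate feat_a with displaced copies of feat_b.

    feat_a, feat_b: feature maps of shape (H, W, C).
    Returns an (H, W, D) array with D = (2 * max_disp + 1) ** 2,
    one channel per displacement (dy, dx) in row-major order.
    """
    h, w, _ = feat_a.shape
    pad = ((max_disp, max_disp), (max_disp, max_disp), (0, 0))
    padded = np.pad(feat_b, pad, mode="constant")
    maps = []
    for dy in range(2 * max_disp + 1):
        for dx in range(2 * max_disp + 1):
            shifted = padded[dy:dy + h, dx:dx + w]
            maps.append(np.sum(feat_a * shifted, axis=-1))
    return np.stack(maps, axis=-1)
```

Unlike a regular convolution, this layer has no trainable weights: the "filters" are the data of the other stream.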

Optical Flow: FlowNet (expanding)

Upconvolutional layers: unpooling of the feature maps followed by a convolution. The upconvolved feature maps are concatenated with the corresponding feature maps from the contracting part.
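One expanding step can be sketched as follows. This is a simplified illustration: unpooling is assumed to be nearest-neighbour 2x upsampling and the convolution is reduced to a 1x1 (per-pixel) linear map to keep the code short; unpool2x, conv1x1 and expand_step are hypothetical names.

```python
import numpy as np

def unpool2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def conv1x1(x, weights):
    """Per-pixel linear map: (H, W, C_in) @ (C_in, C_out) -> (H, W, C_out)."""
    return x @ weights

def expand_step(coarse, skip, weights):
    """Upconvolve the coarse map, then concatenate the contracting-part map."""
    up = conv1x1(unpool2x(coarse), weights)
    return np.concatenate([up, skip], axis=-1)
```

The concatenation is what lets the expanding part reuse the finer spatial detail that the contracting part would otherwise have discarded.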

Optical Flow: FlowNet

Data augmentation

Since existing ground-truth datasets are not large enough to train a ConvNet, a synthetic Flying Chairs dataset is generated... and augmented (translation, rotation and scaling transformations, additive Gaussian noise, and changes in brightness, contrast, gamma and color).

ConvNets trained on this unrealistic data generalize well to existing datasets such as Sintel and KITTI.
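The photometric part of such an augmentation can be sketched as below. The parameter ranges are illustrative assumptions, not the values used by the authors, and augment_pair is a hypothetical name. Geometric transformations (translation, rotation, scaling) are left out because they would also require transforming the ground-truth flow field.

```python
import numpy as np

def augment_pair(img_a, img_b, rng):
    """Apply the same random gamma, contrast and brightness to both frames.

    Images are float arrays with values in [0, 1]; noise is drawn per frame.
    """
    gamma = rng.uniform(0.7, 1.5)
    contrast = rng.uniform(0.8, 1.2)
    brightness = rng.normal(0.0, 0.2)
    out = []
    for img in (img_a, img_b):
        x = np.clip(img, 0.0, 1.0) ** gamma
        x = (x - x.mean()) * contrast + x.mean() + brightness
        x = x + rng.normal(0.0, 0.02, size=x.shape)  # additive Gaussian noise
        out.append(np.clip(x, 0.0, 1.0))
    return out[0], out[1]
```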

Object tracking: MDNet

Nam, Hyeonseob, and Bohyung Han. Learning multi-domain convolutional neural networks for visual tracking. ICCV VOT Workshop (2015).

Object tracking: MDNet Architecture

Domain-specific layers are used during training for each sequence, but are replaced by a single one at test time.
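The structure described above (shared layers plus one branch per training sequence, swapped for a single new branch at test time) can be sketched as follows. This only illustrates the bookkeeping: the network is reduced to one shared linear layer with a ReLU, no training loop is shown, and the class and method names are hypothetical.

```python
import numpy as np

class MultiDomainNet:
    """Shared layer plus one binary (target vs. background) branch per domain."""

    def __init__(self, in_dim, feat_dim, num_domains, seed=0):
        self.rng = np.random.default_rng(seed)
        self.shared = self.rng.normal(0.0, 0.1, (in_dim, feat_dim))
        self.branches = [self._new_branch() for _ in range(num_domains)]

    def _new_branch(self):
        return self.rng.normal(0.0, 0.1, (self.shared.shape[1], 2))

    def scores(self, x, domain):
        features = np.maximum(x @ self.shared, 0.0)  # shared layer + ReLU
        return features @ self.branches[domain]      # domain-specific layer

    def prepare_for_test(self):
        """Drop the training branches; keep the shared layer, add one new branch."""
        self.branches = [self._new_branch()]
```

During training, each mini-batch would come from one sequence and only update the shared layer and that sequence's branch; that loop is omitted here.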

Object tracking: MDNet Online update

MDNet is updated online at test time with hard negative mining, that is, by selecting the negative samples with the highest positive score.
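The selection step of hard negative mining can be sketched as a top-k over the negative samples, ranked by the score the current model gives to the positive class. The function name hard_negatives and its arguments are hypothetical.

```python
import numpy as np

def hard_negatives(positive_scores, is_negative, k):
    """Indices of the k negative samples with the highest positive-class score."""
    scores = np.asarray(positive_scores, dtype=float)
    neg_idx = np.flatnonzero(np.asarray(is_negative, dtype=bool))
    order = np.argsort(-scores[neg_idx], kind="stable")
    return neg_idx[order[:k]]

# Sample 0 is a positive; among the negatives, samples 2 and 4 fool the model most.
picked = hard_negatives([0.9, 0.1, 0.8, 0.4, 0.7],
                        [False, True, True, True, True], k=2)
```

The selected samples are the ones the tracker currently confuses with the target, so they are the most informative ones for the next update.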

Object tracking: FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. Visual Tracking with Fully Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Focus on the conv4-3 and conv5-3 layers of a VGG-16 network pre-trained for ImageNet image classification.

Object tracking: FCNT Specialization

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence.

Object tracking: FCNT Localization

Although trained for image classification, the feature maps in conv5-3 enable object localization... but they are not discriminative enough to tell apart different objects of the same category.

Object tracking: Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object detectors emerge in deep scene CNNs". ICLR 2015. [Slides from ReadCV]

Object tracking: FCNT Localization

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation...

Object tracking: FCNT Architecture

SNet = Specific Network (online update)

GNet = General Network (fixed)

Object tracking: FCNT Results

ConvNets: Software

Caffe: http://caffe.berkeleyvision.org

Torch (Overfeat): http://torch.ch

Theano: http://deeplearning.net/software/theano

TensorFlow: https://www.tensorflow.org

MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet

CNTK (Microsoft): http://www.cntk.ai

ConvNets: Learn more

Seminar Series: Compacting ConvNets for End to End Learning. Jose M. Álvarez. Tuesday, February 2, 4pm, D5-010, Campus Nord.

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

Online course: Deep Learning. Taking machine learning to the next level.

ReadCV seminar: Friendly reviews of SoA papers. Spring 2016, Tuesdays at 11am.

Barcelona Convolucionada: Deep Learning a l'abast de tothom (Deep Learning within everyone's reach). Monday, February 1, 7pm, FIB, Campus Nord UPC.

Grup d'estudi de machine learning Barcelona (Barcelona machine learning study group)

Summer course: Deep Learning for Computer Vision (2.5 ECTS for MSc & PhD). July 4-8, 3-7pm.

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, "Deep Learning: Past, Present & Future"

"Machine learning" sub-Reddit

Check profile requirements for summer internships (disclaimer: offered to PhD students by default).

Company      Avg. salary / hour    Avg. salary / month
Yahoo        $43                   ($43 x 160 = $6880)
Apple        $37                   ($37 x 160 = $5920)
Google       $29.54-$31.32         $7151
Facebook     $22.92                $6150-$7378
Microsoft    $22.63                $6506-$7171

Source: Glassdoor.com (internships in California; no stipends included)

Video: Cristian Canton's talk "From Catalonia to America: notes on how to achieve a successful post-PhD career", ACMCV 2015 & UPC

Li Fei-Fei, "How we're teaching computers to understand pictures", TEDTalks 2014

Jeremy Howard, "The wonderful and terrifying implications of computers that can learn", TEDTalks 2014

Neil Lawrence, "OpenAI won't benefit humanity without open data sharing" (The Guardian, 14/12/2015)

ConvNets: Discussion

Is Computer Vision solved?

Sports: Do you know them?

ConvNets: Do you know them?

Antonio Torralba, MIT (former UPC)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

...and MANY MORE I am missing in the page (apologies)

ConvNets: Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding is available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution    Last deadline (as of 28/1/2016)
FI (Catalonia)         22/09/2015
FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web

Our past research

Image Classification (ChaLearn Workshop)

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., "Cultural Event Recognition with Visual ConvNets and Temporal Models", in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

Our current research

Saliency Prediction (LSUN Challenge)

J. Pan and Giró-i-Nieto, X., "End-to-end Convolutional Network for Saliency Prediction", in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

Our current research

Sentiment Analysis

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., "Diving Deep into Sentiment Understanding: Fine-tuned CNNs for Visual Sentiment Prediction", in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015. [Slides]

Our current research

Instance Search in Video

V.-T. Nguyen, Dinh-Le, D., Salvador, A., Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S., and Giró-i-Nieto, X., "NII-HITACHI-UIT at TRECVID 2015 Instance Search", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S., O'Connor, N., and Smeaton, A. F., "Insight DCU at TRECVID 2015", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Bottom-upextraction

(BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 49

Optical Flow Deep Matching (BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 50

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching (TD)

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching

(TD)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 51

Optical Flow Deep Matching (TD)Each local maxima in the top layer corresponds to a shift of one of the biggest (32x32) patchesIf we focus on local maximum we can retrieve the corresponding responses one scale below and focus on shift of the sub-patches that generated it

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 52

Optical Flow Deep Matching (TD)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 53

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 54

Ground truth

Dense HOG[Brox amp Malik 2011]

Deep Matching

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 55

Optical Flow Deep Matching

Optical Flow

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 56

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 57

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 58

End to end supervised learning of optical flow

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 59

Option A Stack both input images together and feed them through a generic network

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 60

Option B Create two separate yet identical processing streams for the two images and combine them at a later stage

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 61

Option B Create two separate yet identical processing streams for the two images and combine them at a later stage

Correlation layer Convolution of data patches from the layers to combine

Optical Flow FlowNet (expanding)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 62

Upconvolutional layers Unpooling features maps + convolutionUpconvolutioned feature maps are concatenated with the corresponding map from the contractive part

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 63

Since existing ground truth datasets are not sufficiently large to train a Convnet a synthetic Flying Dataset is generatedhellip and augmented (translation rotation scaling transformations additive Gaussian noise changes in brightness contrast gamma and color)

Convnets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI

Data augmentation

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 64

Optical Flow FlowNet

Object tracking MDNet

65Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Object tracking MDNet

66Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Object tracking MDNet Architecture

67Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Domain-specific layers are used during training for each sequence but are replaced by a single one at test time

Object tracking MDNet Online update

68Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

MDNet is updated online at test time with hard negative mining that is selecting negative samples with the highest positive score

Object tracking FCNT

69Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Object tracking FCNT

70Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Focus on conv4-3 and conv5-3 of VGG-16 network pre-trained for ImageNet image classification

conv4-3 conv5-3

Object tracking FCNT Specialization

71Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence

Object tracking FCNT Localization

72Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Although trained for image classification feature maps in conv5-3 enable object localizationhellipbut is not discriminative enough to different objects of the same category

Object tracking Localization

73Zhou Bolei Aditya Khosla Agata Lapedriza Aude Oliva and Antonio Torralba Object detectors emerge in deep scene cnns ICLR 2015

[Zhou et al ICLR 2015] ldquoObject detectors emerge in deep scene CNNsrdquo [Slides from ReadCV]

Object tracking FCNT Localization

74Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

On the other hand feature maps from conv4-3 are more sensitive to intra-class appearance variationhellip

conv4-3 conv5-3

Object tracking FCNT Architecture

75Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

SNet=Specific Network (online update)

GNet=General Network (fixed)

Object tracking FCNT Results

76Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

ConvNets Software

Caffe httpcaffeberkeleyvisionorg

Torch (Overfeat) httptorchch

Theano httpdeeplearningnetsoftwaretheano

Tensor Flow httpswwwtensorfloworg

MatconvNet (VLFeat) httpwwwvlfeatorgmatconvnet

CNTK (Mcrosoft) httpwwwcntkai 77

Seminar Series Compacting ConvNets

for End to End Learning

Tuesday February 2 4pm

D5-010 Campus Nord

ConvNets Learn more

78

Jose M Aacutelvarez

Stanford course CS231n

Convolutional Neural Networks for Visual

Recognition

ConvNets Learn more

79

ConvNets Learn more

Online course

Deep Learning

Taking machine learning to the next

level

80

ReadCV seminarFriendly reviews of SoA papers

Spring 2016

Tuesdays at 11am

ConvNets Learn more

81

Barcelona Convolucionada

Deep Learning a lrsquoabast de tothom

Monday February 1 7pm FIB Campus Nord UPC

ConvNets Learn more

82

Grup drsquoestudi de machine learning Barcelona

Summer course

Deep Learning for Computer Vision

(25 ECTS for MSc amp Phd)

July 4-8 3-7pm

ConvNets Learn more

83

Deep learning methos for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho ldquoDeep Learning Past Present amp Futurerdquo

ConvNets Learn more

84

ConvNets Learn more

85

ldquoMachine learningrdquo sub-Reddit

ConvNets Learn more

86

ConvNets Learn more

87

Check profile requirements for Summer internship (disclaimer offered to Phd students by default)

Company Avg Salary hour Avg Salary month

Yahoo $43 ($43x160=$6880)

Apple $37 ($37x160=$5920)

Google $2954-$3132 $7151

Facebook $2292 $6150-$7378

Microsoft $2263 $6506-$7171

Source Glassdoorcom (internships in California No stipends included)

ConvNets Learn more

88

Video Cristian Cantonrsquos talk ldquoFrom Catalonia to America notes on how to achieve a successful post-Phd career rdquo ACMCV 2015 amp UPC

Li Fei-Fei ldquoHow wersquore teaching computers to understand picturesrdquo TEDTalks 2014

ConvNets Learn more

89

Jeremy Howard ldquoThe wonderful and terrifying implications of computers that can learnrdquo TEDTalks 2014

ConvNets Learn more

90

ConvNets Learn more

91

Neil Lawrence OpenAI wonrsquot benefit humanity without open data sharing (The Guardian 14122015)

Is Computer Vision solved

ConvNets Discussion

92

Sports Do you know them

93

ConvNets Do you know them

94

Antonio Torralba MIT(former UPC)

and MANY MORE I am missing in the page (apologies)

Oriol Vinyals Google(former UPC)

Jose M Aacutelvarez NICTA(former URL amp UAB)

Joan Bruna Berkeley(former UPC)

95

ConvNets Where you are studyingVisioCat dinner CVPR 2015

Considering a Phd at GPI-UPC Currently no direct funding available (check in the future)We can support your application to scholarships

External grant listings UPC UPF

Funding institution Last deadlines (on 2812016)

FI (Catalonia) 22092015

FPU (Spain) 15012016

Check our activity at httpsimatgeupceduweb 96

Image Classification

97

Our past research

A Salvador Zeppelzauer M Manchon-Vizuete D Calafell-Oroacutes A and Giroacute-i-Nieto X ldquoCultural Event Recognition with Visual ConvNets and Temporal Modelsrdquo in CVPR ChaLearn Looking at People Workshop 2015 2015 [slides]

ChaLearn Worshop

Saliency Prediction

J Pan and Giroacute-i-Nieto X ldquoEnd-to-end Convolutional Network for Saliency Predictionrdquo in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops Boston MA (USA) 2015 [Slides] 98

Our current research

LSUN Challenge

Sentiment Analysis

99

Our current research

[Slides]

CNN

V Campos Salvador A Jou B and Giroacute-i-Nieto X ldquoDiving Deep into Sentiment Understanding Fine-tuned CNNs for Visual Sentiment Predictionrdquo in 1st International Workshop on Affect and Sentiment in Multimedia Brisbane Australia 2015

Our current research

Instance Search in Video

100

V - T Nguyen -Dinh-Le D Salvador A -Zhu C Nguyen D - L Tran M - T Duc T Ngo Duong D Anh Satoh S ichi and Giroacute-i-Nieto X ldquoNII-HITACHI-UIT at TRECVID 2015 Instance Searchrdquo in TRECVID 2015 Workshop Gaithersburg MD USA 2015

K McGuinness Mohedano E Salvador A Zhang Z X Marsden M Wang P Jargalsaikhan I Antony J Giroacute-i-Nieto X Satoh S ichi OConnor N and Smeaton A F ldquoInsight DCU at TRECVID 2015rdquo in TRECVID 2015 Workshop Gaithersburg MD USA 2015

Thank you

Slides available on and

httpsimatgeupceduwebpeoplexavier-giro

httpbitsearchblogspotcom

httpstwittercomDocXavi

httpswwwfacebookcomProfessorXavi

xaviergiroupcedu

101

Page 33: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

33

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

34

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

35

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

36

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

37

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

38

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

39

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

Optical Flow

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 40

Optical Flow Small vs Large

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 41

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 42

Optical FlowClassic approachRigid matching of HoG or SIFT descriptors

Deep MatchingAllow each subpatch to move

independently in a limited range

depending on its size

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 43

Optical Flow Deep Matching

Source Matlab R2015b documentation for normxcorr2 by Mathworks44

Optical Flow 2D correlation

Image

Sub-Image

Offset of the sub-image with respect to the image [00]

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 45

Instead of pre-trained filters a convolution is defined between each

patch of the reference image target image

as a results a correlation map is generated for each reference patch

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 46

Optical Flow Deep Matching

The most discriminative response map

The less discriminative

response map

Optical Flow: Deep Matching

Key idea: Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search.

Pyramid levels: 4x4 patches, 8x8 patches, 16x16 patches, 32x32 patches.

Bottom-up extraction (BU); top-down matching (TD).

Optical Flow: Deep Matching (BU)

Bottom-up extraction (BU): 4x4 patches, 8x8 patches, 16x16 patches, 32x32 patches.
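One level of the bottom-up pass can be sketched as follows: the correlation map of a larger patch is the average of the maps of its four children, each max-pooled so that a child may shift by one pixel. This is a simplified sketch under my own assumptions (no subsampling and no non-linearity between levels, hypothetical function names, synthetic one-hot child maps), not the authors' implementation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def max_pool_3x3(score_map):
    """3x3 max filter with stride 1; -inf padding keeps the input shape."""
    padded = np.pad(score_map, 1, mode="constant", constant_values=-np.inf)
    return sliding_window_view(padded, (3, 3)).max(axis=(-2, -1))


def aggregate_level(child_maps, child_size):
    """child_maps[a, b] is the correlation map of the child patch in grid cell (a, b).
    Returns the maps of the parent patches, each built from a 2x2 group of children."""
    n_rows, n_cols, height, width = child_maps.shape
    out_h, out_w = height - child_size, width - child_size
    parents = np.zeros((n_rows // 2, n_cols // 2, out_h, out_w))
    for a in range(n_rows // 2):
        for b in range(n_cols // 2):
            for da in (0, 1):
                for db in (0, 1):
                    pooled = max_pool_3x3(child_maps[2 * a + da, 2 * b + db])
                    # A child sits child_size pixels below / right of the parent corner.
                    y0, x0 = da * child_size, db * child_size
                    parents[a, b] += pooled[y0:y0 + out_h, x0:x0 + out_w]
    return parents / 4.0


# Synthetic example: four 4x4 children whose best matches are almost, but not
# exactly, on a rigid 2x2 grid anchored at (2, 2) in the target image.
child_maps = np.zeros((2, 2, 13, 13))
peaks = {(0, 0): (2, 2), (0, 1): (2, 7), (1, 0): (5, 2), (1, 1): (7, 5)}
for cell, (row, col) in peaks.items():
    child_maps[cell][row, col] = 1.0
parent_maps = aggregate_level(child_maps, child_size=4)
```

Despite the one-pixel deviations of three children, the 8x8 parent map still reaches its maximum at (2, 2), which is the non-rigid tolerance the pyramid is meant to provide.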

Optical Flow: Deep Matching (TD)

Top-down matching (TD): 4x4 patches, 8x8 patches, 16x16 patches, 32x32 patches.

Each local maximum in the top layer corresponds to a shift of one of the biggest (32x32) patches. If we focus on a local maximum, we can retrieve the corresponding responses one scale below and focus on the shift of the sub-patches that generated it.
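The top-down step can be sketched as a backtracking routine: given a maximum of a parent map, look one scale below and, for each of the four sub-patches, pick the best response within one pixel of its rigid position. The function name, the one-pixel search range and the synthetic child maps are assumptions made for illustration.

```python
import numpy as np


def backtrack_children(child_maps, parent_cell, parent_pos, child_size):
    """Return {child grid cell: (row, col)} with the best position of each sub-patch."""
    a, b = parent_cell
    y, x = parent_pos
    matches = {}
    for da in (0, 1):
        for db in (0, 1):
            cmap = child_maps[2 * a + da, 2 * b + db]
            # Position the child would take if the parent moved rigidly.
            cy, cx = y + da * child_size, x + db * child_size
            y_lo, y_hi = max(cy - 1, 0), min(cy + 2, cmap.shape[0])
            x_lo, x_hi = max(cx - 1, 0), min(cx + 2, cmap.shape[1])
            window = cmap[y_lo:y_hi, x_lo:x_hi]
            wy, wx = np.unravel_index(np.argmax(window), window.shape)
            matches[(2 * a + da, 2 * b + db)] = (int(y_lo + wy), int(x_lo + wx))
    return matches


# Synthetic child maps: one peak per 4x4 child, slightly off the rigid grid.
child_maps = np.zeros((2, 2, 13, 13))
peaks = {(0, 0): (2, 2), (0, 1): (2, 7), (1, 0): (5, 2), (1, 1): (7, 5)}
for cell, (row, col) in peaks.items():
    child_maps[cell][row, col] = 1.0

matches = backtrack_children(child_maps, parent_cell=(0, 0), parent_pos=(2, 2), child_size=4)
```

Applying the same routine recursively down to the 4x4 level would yield one correspondence per smallest patch, each allowed its own small displacement.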

Optical Flow: Deep Matching

Figure panels: Ground truth; Dense HOG [Brox & Malik, 2011]; Deep Matching.

Optical Flow: FlowNet

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D., and Brox, T. (2015). FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).

End-to-end supervised learning of optical flow.

Optical Flow: FlowNet (contracting)

Option A: Stack both input images together and feed them through a generic network.

Option B: Create two separate yet identical processing streams for the two images and combine them at a later stage.

Correlation layer: Convolution of data patches from the layers to combine.
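A correlation layer of this kind can be sketched in NumPy as a set of shifted dot products between the two feature maps, one output channel per candidate displacement. This is a simplified sketch under my own assumptions: single-pixel patches, a small maximum displacement, zero padding, and a hypothetical function name. It is not the FlowNet code.

```python
import numpy as np


def correlation_layer(feat_a, feat_b, max_disp=2):
    """feat_a, feat_b: (C, H, W) feature maps from the two streams.
    Returns ((2 * max_disp + 1) ** 2, H, W): one channel per displacement (dy, dx)."""
    channels, height, width = feat_a.shape
    pad = max_disp
    padded = np.pad(feat_b, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros(((2 * max_disp + 1) ** 2, height, width))
    k = 0
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            # shifted[:, y, x] equals feat_b[:, y + dy, x + dx] (zero outside the image)
            shifted = padded[:, pad + dy:pad + dy + height, pad + dx:pad + dx + width]
            out[k] = (feat_a * shifted).sum(axis=0)
            k += 1
    return out


rng = np.random.default_rng(0)
feat_a = rng.standard_normal((16, 10, 10))
feat_a /= np.linalg.norm(feat_a, axis=0, keepdims=True)  # unit-length feature vectors
feat_b = np.roll(feat_a, shift=(1, 2), axis=(1, 2))      # content moved by (dy, dx) = (1, 2)
corr = correlation_layer(feat_a, feat_b, max_disp=2)
```

In this toy example the strongest channel at an interior pixel is the one indexed by the true displacement (1, 2), i.e. channel (1 + 2) * 5 + (2 + 2) = 19.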

Optical Flow: FlowNet (expanding)

Upconvolutional layers: Unpooling of feature maps + convolution. Upconvolved feature maps are concatenated with the corresponding map from the contractive part.
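The expanding step can be sketched as three operations: unpool, convolve, concatenate. The sketch below assumes nearest-neighbour 2x unpooling, a 3x3 convolution with zero padding and random weights, and hypothetical function names; the layer sizes are arbitrary and not those of FlowNet.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def unpool_2x(features):
    """Nearest-neighbour unpooling: (C, H, W) -> (C, 2H, 2W)."""
    return features.repeat(2, axis=1).repeat(2, axis=2)


def conv3x3(features, weights):
    """3x3 convolution (cross-correlation form). weights: (C_out, C_in, 3, 3)."""
    padded = np.pad(features, ((0, 0), (1, 1), (1, 1)))
    windows = sliding_window_view(padded, (3, 3), axis=(1, 2))  # (C_in, H, W, 3, 3)
    return np.einsum("oikl,ihwkl->ohw", weights, windows)


def upconv_block(coarse, skip, weights):
    """Unpool + convolve the coarse map, then concatenate the contracting-part map."""
    upsampled = conv3x3(unpool_2x(coarse), weights)
    return np.concatenate([upsampled, skip], axis=0)


rng = np.random.default_rng(0)
coarse = rng.standard_normal((8, 4, 5))    # deep, low-resolution features
skip = rng.standard_normal((6, 8, 10))     # features from the contracting part
weights = rng.standard_normal((4, 8, 3, 3)) * 0.1
refined = upconv_block(coarse, skip, weights)  # shape (4 + 6, 8, 10)
```

The concatenation is what lets the expanding part combine coarse, high-level information with the finer spatial detail kept by the earlier layers.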

Optical Flow: FlowNet

Data augmentation: Since existing ground truth datasets are not sufficiently large to train a Convnet, a synthetic Flying Chairs dataset is generated… and augmented (translation, rotation and scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma and color).

Convnets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI.
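The photometric part of such an augmentation can be sketched as below. The parameter ranges and the function name are illustrative assumptions, not the values used by the FlowNet authors. Geometric transformations (translation, rotation, scaling) are left out because they would also require transforming the ground-truth flow.

```python
import numpy as np


def augment_pair(img_a, img_b, rng):
    """Apply the same brightness / contrast / gamma change to both frames, plus
    independent additive Gaussian noise. Images are floats in [0, 1]."""
    gamma = rng.uniform(0.7, 1.5)
    contrast = rng.uniform(0.8, 1.2)
    brightness = rng.normal(0.0, 0.1)
    augmented = []
    for img in (img_a, img_b):
        x = img + rng.normal(0.0, 0.04, size=img.shape)      # additive Gaussian noise
        x = (x - x.mean()) * contrast + x.mean() + brightness
        x = np.clip(x, 0.0, 1.0) ** gamma
        augmented.append(x)
    return augmented[0], augmented[1]


rng = np.random.default_rng(0)
frame_1 = rng.random((3, 32, 48))
frame_2 = rng.random((3, 32, 48))
aug_1, aug_2 = augment_pair(frame_1, frame_2, rng)
```

Because the pixel positions are untouched, the optical flow label of the pair stays valid after this kind of augmentation.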


Object tracking: MDNet

Nam, Hyeonseob, and Bohyung Han. Learning multi-domain convolutional neural networks for visual tracking. ICCV VOT Workshop (2015).

Object tracking: MDNet Architecture

Domain-specific layers are used during training for each sequence, but are replaced by a single one at test time.
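The multi-domain idea can be sketched with a toy model: one set of shared weights, one small branch per training sequence, and a single fresh branch at test time. The class name, layer sizes and random initialisation are assumptions for illustration only; the real network uses convolutional and fully connected layers and is trained by back-propagation.

```python
import numpy as np


class MultiDomainNet:
    """Toy multi-domain scorer: shared layer + one target/background branch per domain."""

    def __init__(self, n_domains, feat_dim, hidden_dim, rng):
        self.rng = rng
        self.hidden_dim = hidden_dim
        self.shared = rng.standard_normal((feat_dim, hidden_dim)) * 0.1
        self.branches = [self._new_branch() for _ in range(n_domains)]

    def _new_branch(self):
        # Two outputs per sample: target score and background score.
        return self.rng.standard_normal((self.hidden_dim, 2)) * 0.1

    def scores(self, features, domain):
        hidden = np.maximum(features @ self.shared, 0.0)  # shared layer + ReLU
        return hidden @ self.branches[domain]

    def start_test_sequence(self):
        """Drop all training branches and keep a single, newly initialised one."""
        self.branches = [self._new_branch()]


rng = np.random.default_rng(0)
net = MultiDomainNet(n_domains=3, feat_dim=32, hidden_dim=16, rng=rng)
samples = rng.standard_normal((5, 32))
train_scores = net.scores(samples, domain=2)   # training: branch of sequence 2
shared_before = net.shared.copy()
net.start_test_sequence()
test_scores = net.scores(samples, domain=0)    # test: the only branch left
```

The shared weights survive the switch, so whatever is common to all training sequences is reused for the new one.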

Object tracking: MDNet Online update

MDNet is updated online at test time with hard negative mining, that is, selecting negative samples with the highest positive score.
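The selection rule stated here is simple enough to sketch directly: among the negative samples, keep those that the current model scores most like the target. The function name and the example scores are hypothetical.

```python
import numpy as np


def mine_hard_negatives(positive_scores, n_keep):
    """Return the indices of the n_keep negatives with the highest positive score."""
    order = np.argsort(positive_scores)[::-1]  # highest score first
    return order[:n_keep]


# Positive-class scores given by the current model to five negative samples.
scores = np.array([0.10, 0.90, 0.30, 0.70, 0.05])
hard = mine_hard_negatives(scores, n_keep=2)
```

Here the negatives at indices 1 and 3 are the ones the model is most confused about, so they are the ones used for the next update.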

Object tracking: FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. Visual Tracking with Fully Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Focus on conv4-3 and conv5-3 of the VGG-16 network pre-trained for ImageNet image classification.

Figure panels: conv4-3; conv5-3.

Object tracking: FCNT Specialization

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence.

Object tracking: FCNT Localization

Although trained for image classification, feature maps in conv5-3 enable object localization… but they are not discriminative enough to distinguish different objects of the same category.

Object tracking: Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object detectors emerge in deep scene CNNs." ICLR 2015. [Slides from ReadCV]

Object tracking: FCNT Localization

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation…

Figure panels: conv4-3; conv5-3.

Object tracking: FCNT Architecture

SNet = Specific Network (online update)

GNet = General Network (fixed)

Object tracking: FCNT Results

ConvNets: Software

Caffe: http://caffe.berkeleyvision.org

Torch (Overfeat): http://torch.ch

Theano: http://deeplearning.net/software/theano

TensorFlow: https://www.tensorflow.org

MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet

CNTK (Microsoft): http://www.cntk.ai

ConvNets: Learn more

Seminar Series: Compacting ConvNets for End to End Learning

Jose M. Álvarez

Tuesday, February 2, 4pm

D5-010, Campus Nord

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

Online course: Deep Learning. Taking machine learning to the next level

ReadCV seminar: Friendly reviews of SoA papers

Spring 2016, Tuesdays at 11am

Barcelona Convolucionada: Deep Learning a l'abast de tothom

Monday, February 1, 7pm, FIB, Campus Nord UPC

Grup d'estudi de machine learning Barcelona

Summer course: Deep Learning for Computer Vision (2.5 ECTS for MSc & PhD)

July 4-8, 3-7pm

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, "Deep Learning: Past, Present & Future"

"Machine learning" sub-Reddit

Check profile requirements for Summer internship (disclaimer: offered to PhD students by default)

Company      Avg. salary / hour    Avg. salary / month
Yahoo        $43                   ($43 x 160 = $6,880)
Apple        $37                   ($37 x 160 = $5,920)
Google       $29.54-$31.32         $7,151
Facebook     $22.92                $6,150-$7,378
Microsoft    $22.63                $6,506-$7,171

Source: Glassdoor.com (internships in California. No stipends included)

Video: Cristian Canton's talk "From Catalonia to America: notes on how to achieve a successful post-PhD career", ACMCV 2015 & UPC

Li Fei-Fei, "How we're teaching computers to understand pictures", TEDTalks 2014

Jeremy Howard, "The wonderful and terrifying implications of computers that can learn", TEDTalks 2014

Neil Lawrence, "OpenAI won't benefit humanity without open data sharing" (The Guardian, 14/12/2015)

ConvNets: Discussion

Is Computer Vision solved?

Sports: Do you know them?

ConvNets: Do you know them?

Antonio Torralba, MIT (former UPC)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

…and MANY MORE I am missing in the page (apologies)

ConvNets: Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution    Last deadlines (on 28/1/2016)
FI (Catalonia)         22/09/2015
FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web

Our past research

Image Classification

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., "Cultural Event Recognition with Visual ConvNets and Temporal Models", in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

ChaLearn Workshop

Our current research

Saliency Prediction

J. Pan and Giró-i-Nieto, X., "End-to-end Convolutional Network for Saliency Prediction", in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

LSUN Challenge

Our current research

Sentiment Analysis

CNN

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., "Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction", in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015. [Slides]

Instance Search in Video

V.-T. Nguyen, Dinh-Le, D., Salvador, A., Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S., and Giró-i-Nieto, X., "NII-HITACHI-UIT at TRECVID 2015 Instance Search", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S., O'Connor, N., and Smeaton, A. F., "Insight DCU at TRECVID 2015", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

Page 35: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

35

Recognition ImageNet Video

[ILSVRC 2015 Slides and videos]

36

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

37

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

38

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

39

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

Optical Flow

Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE.

Optical Flow: Small vs Large

Optical Flow

Classic approach: rigid matching of HoG or SIFT descriptors.

Deep Matching: allow each subpatch to move independently within a limited range that depends on its size.

Optical Flow: Deep Matching

Optical Flow: 2D correlation

[Figure: an image, a sub-image, and the offset of the sub-image with respect to the image origin [0, 0]]

Source: MATLAB R2015b documentation for normxcorr2, by MathWorks.
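The 2D correlation idea on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not a port of MATLAB's normxcorr2: the names `ncc_valid` and `find_offset` are hypothetical, and only positions where the sub-image fully overlaps the image are scored (an assumption made to keep the sketch short).

```python
import numpy as np

def ncc_valid(template, image):
    """Normalized cross-correlation at every position where the template fits entirely."""
    template = np.asarray(template, dtype=float)
    image = np.asarray(image, dtype=float)
    th, tw = template.shape
    ih, iw = image.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    scores = np.zeros((ih - th + 1, iw - tw + 1))
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            window = image[y:y + th, x:x + tw]
            w = window - window.mean()
            denom = t_norm * np.sqrt((w ** 2).sum())
            # Flat windows have zero variance; score them as 0 instead of dividing by 0.
            scores[y, x] = (t * w).sum() / denom if denom > 0 else 0.0
    return scores

def find_offset(template, image):
    """Offset (row, col) of the best match, measured from the image origin [0, 0]."""
    scores = ncc_valid(template, image)
    y, x = np.unravel_index(np.argmax(scores), scores.shape)
    return int(y), int(x)
```

Cropping a sub-image from a random image and passing both to `find_offset` should return the crop's top-left corner, since the score there reaches 1.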

Instead of pre-trained filters, a convolution is defined between each patch of the reference image and the target image. As a result, a correlation map is generated for each reference patch.
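A rough sketch of that step, assuming non-overlapping 4x4 patches and raw pixel values compared by cosine similarity (the choice of descriptor and the name `patch_correlation_maps` are assumptions for illustration, not taken from the paper):

```python
import numpy as np

def unit(v):
    """Scale an array to unit L2 norm (left unchanged if it is all zeros)."""
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def patch_correlation_maps(reference, target, patch=4):
    """maps[i, j] is the correlation map of reference patch (i, j) over the target."""
    reference = np.asarray(reference, dtype=float)
    target = np.asarray(target, dtype=float)
    rows, cols = reference.shape[0] // patch, reference.shape[1] // patch
    out_h = target.shape[0] - patch + 1
    out_w = target.shape[1] - patch + 1
    maps = np.zeros((rows, cols, out_h, out_w))
    for i in range(rows):
        for j in range(cols):
            # Each reference patch plays the role of a convolution filter.
            kernel = unit(reference[i * patch:(i + 1) * patch,
                                    j * patch:(j + 1) * patch])
            for y in range(out_h):
                for x in range(out_w):
                    window = unit(target[y:y + patch, x:x + patch])
                    maps[i, j, y, x] = np.sum(kernel * window)
    return maps
```

When reference and target are the same image, the map of patch (i, j) peaks at the patch's own position, (i * patch, j * patch).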

Optical Flow: Deep Matching

[Figure: the most discriminative response map and the least discriminative response map]

Key idea: build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search.

Optical Flow: Deep Matching

[Figure: pyramid of 4x4, 8x8, 16x16 and 32x32 patches, built by bottom-up extraction (BU) and searched by top-down matching (TD)]
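One bottom-up step of such a pyramid might look as follows. This is a simplified sketch under stated assumptions: the four child correlation maps are indexed by patch centre, each child may move by one pixel (3x3 max filter), and the parent map is their average. The subsampling and non-linear rectification used in the published method are left out, and the names `max_pool3`, `shift` and `aggregate` are hypothetical.

```python
import numpy as np

def max_pool3(m):
    """3x3 max filter with stride 1: lets a child patch move by one pixel."""
    m = np.asarray(m, dtype=float)
    h, w = m.shape
    padded = np.pad(m, 1, mode="constant", constant_values=-np.inf)
    views = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.max(views, axis=0)

def shift(m, dy, dx):
    """Return out with out[y, x] = m[y + dy, x + dx], and 0 where that falls outside."""
    h, w = m.shape
    out = np.zeros_like(m)
    out[max(0, -dy):min(h, h - dy), max(0, -dx):min(w, w - dx)] = \
        m[max(0, dy):min(h, h + dy), max(0, dx):min(w, w + dx)]
    return out

def aggregate(children, half):
    """Combine four child maps (top-left, top-right, bottom-left, bottom-right)
    into the map of the parent patch; `half` is half the child patch size."""
    offsets = [(-half, -half), (-half, half), (half, -half), (half, half)]
    moved = [shift(max_pool3(c), dy, dx) for c, (dy, dx) in zip(children, offsets)]
    return np.mean(moved, axis=0)
```

Applying `aggregate` repeatedly (4x4 to 8x8, 8x8 to 16x16, 16x16 to 32x32) gives the levels of the pyramid. The max filter is what allows a parent to score highly even when one of its children is displaced by a pixel.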


Optical Flow: Deep Matching (TD)

Each local maximum in the top layer corresponds to a shift of one of the biggest (32x32) patches. If we focus on one local maximum, we can retrieve the corresponding responses one scale below and focus on the shifts of the sub-patches that generated it.
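A single top-down step can be sketched as below, assuming child maps are indexed by patch centre and each child was allowed to move by at most one pixel around its expected position when the parent was built. The function name and the one-pixel window are illustrative assumptions.

```python
import numpy as np

def backtrack_child(child_map, parent_yx, offset_yx):
    """Find where a child patch matched, given the parent's position.

    The child is expected at parent + offset; the best response in the 3x3
    neighbourhood around that point is taken as the child's position.
    """
    h, w = child_map.shape
    cy = parent_yx[0] + offset_yx[0]
    cx = parent_yx[1] + offset_yx[1]
    best, best_yx = -np.inf, None
    for y in range(max(0, cy - 1), min(h, cy + 2)):
        for x in range(max(0, cx - 1), min(w, cx + 2)):
            if child_map[y, x] > best:
                best, best_yx = child_map[y, x], (y, x)
    return best_yx  # None if the expected position lies outside the map
```

Repeating this from the 32x32 level down to the 4x4 level turns one maximum at the top into a set of small-patch correspondences.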

Optical Flow: Deep Matching

[Figure: comparison of ground truth, dense HOG [Brox & Malik 2011] and Deep Matching]

Optical Flow

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D., & Brox, T. (2015). FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).

Optical Flow: FlowNet

End-to-end supervised learning of optical flow.

Optical Flow: FlowNet (contracting)

Option A: stack both input images together and feed them through a generic network.

Option B: create two separate yet identical processing streams for the two images and combine them at a later stage.

Correlation layer: convolution between data patches taken from the two feature maps being combined.
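A correlation layer of this kind can be sketched with NumPy as follows. The sketch assumes single-pixel "patches" (a dot product between feature vectors) and a square search range of `max_disp` pixels; both choices, and the name `correlation_layer`, are assumptions for illustration.

```python
import numpy as np

def correlation_layer(f1, f2, max_disp):
    """Correlate two feature maps of shape (H, W, C).

    out[y, x, k] is the dot product between f1[y, x] and f2[y + dy, x + dx],
    where (dy, dx) = disps[k]; positions outside f2 count as zeros.
    """
    h, w, _ = f1.shape
    padded = np.pad(f2, ((max_disp, max_disp), (max_disp, max_disp), (0, 0)))
    disps = [(dy, dx)
             for dy in range(-max_disp, max_disp + 1)
             for dx in range(-max_disp, max_disp + 1)]
    out = np.empty((h, w, len(disps)))
    for k, (dy, dx) in enumerate(disps):
        shifted = padded[max_disp + dy:max_disp + dy + h,
                         max_disp + dx:max_disp + dx + w]
        out[:, :, k] = np.sum(f1 * shifted, axis=2)
    return out, disps
```

Unlike an ordinary convolution, this layer has no trainable weights: one stream's features act as the filter applied to the other stream.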

Optical Flow: FlowNet (expanding)

Upconvolutional layers: unpooling of feature maps + convolution. Upconvolved feature maps are concatenated with the corresponding map from the contracting part.
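The shape bookkeeping of one expanding step can be sketched as below. Nearest-neighbour unpooling and a 1x1 convolution (a matrix product over channels) stand in for the learned layers; these simplifications and the name `upconv_block` are assumptions, not the network's actual configuration.

```python
import numpy as np

def upconv_block(coarse, skip, weights):
    """coarse: (H, W, C_in); skip: (2H, 2W, C_skip); weights: (C_in, C_out)."""
    # Unpooling: every coarse cell is repeated over a 2x2 block.
    unpooled = coarse.repeat(2, axis=0).repeat(2, axis=1)
    # 1x1 convolution followed by a ReLU.
    convolved = np.maximum(unpooled @ weights, 0.0)
    # Concatenate with the same-resolution map from the contracting part.
    return np.concatenate([convolved, skip], axis=2)
```

The output has resolution 2H x 2W and C_out + C_skip channels, so fine detail from the contracting part is available when the flow is refined.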

Optical Flow: FlowNet

Data augmentation

Since existing ground-truth datasets are not sufficiently large to train a convnet, a synthetic Flying Chairs dataset is generated... and augmented (translation, rotation and scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma and color).

Convnets trained on this unrealistic data generalize well to existing datasets such as Sintel and KITTI.
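The photometric part of such an augmentation can be sketched as follows for an image pair with values in [0, 1]. All parameter ranges below are made-up placeholders, and `augment_pair` is a hypothetical name; the same gamma, contrast and brightness are applied to both frames so that their appearance stays consistent.

```python
import numpy as np

def augment_pair(img1, img2, rng):
    """Apply additive Gaussian noise, gamma, contrast and brightness to both frames."""
    gamma = rng.uniform(0.7, 1.5)        # assumed range
    contrast = rng.uniform(0.8, 1.4)     # assumed range
    brightness = rng.normal(0.0, 0.2)    # assumed spread
    sigma = rng.uniform(0.0, 0.04)       # assumed noise level
    out = []
    for img in (img1, img2):
        x = img + rng.normal(0.0, sigma, size=img.shape)   # independent noise per frame
        x = np.clip(x, 0.0, 1.0) ** gamma
        x = (x - 0.5) * contrast + 0.5 + brightness
        out.append(np.clip(x, 0.0, 1.0))
    return out[0], out[1]
```

Geometric augmentations (translation, rotation, scaling) are not shown because they also require transforming the ground-truth flow field accordingly.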

Object tracking: MDNet

Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015).

Object tracking: MDNet Architecture

Domain-specific layers are used during training, one for each sequence, but are replaced by a single one at test time.
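The layer arrangement can be sketched with a toy model: one shared feature layer plus a list of binary (target vs background) heads, one per training sequence. The class name, layer sizes and initialisation are hypothetical, and no training loop is shown.

```python
import numpy as np

class MultiDomainNet:
    """Shared layers plus one domain-specific output layer per training sequence."""

    def __init__(self, n_domains, in_dim, feat_dim, seed=0):
        self.rng = np.random.default_rng(seed)
        self.feat_dim = feat_dim
        self.shared = self.rng.normal(scale=0.1, size=(in_dim, feat_dim))
        self.heads = [self._new_head() for _ in range(n_domains)]

    def _new_head(self):
        # Two outputs: target score and background score.
        return self.rng.normal(scale=0.1, size=(self.feat_dim, 2))

    def scores(self, x, domain=0):
        features = np.maximum(x @ self.shared, 0.0)   # shared representation
        return features @ self.heads[domain]

    def prepare_for_test(self):
        # Keep the shared layers; replace all training heads by one fresh head.
        self.heads = [self._new_head()]
```

During training each mini-batch would update the shared layers and only the head of its own sequence; at test time the single new head is the part trained on the sequence being tracked.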

Object tracking: MDNet Online update

MDNet is updated online at test time with hard negative mining, that is, by selecting the negative samples with the highest positive scores.
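The selection step itself is short. In the sketch below, `pos_scores` holds the network's positive (target) score for each candidate negative sample, and `k` is how many hard negatives to keep; both names are illustrative.

```python
import numpy as np

def mine_hard_negatives(pos_scores, k):
    """Indices of the k negative samples the network most mistakes for the target."""
    order = np.argsort(pos_scores)[::-1]   # highest positive score first
    return order[:k]
```

For scores `[0.1, 0.9, 0.5, 0.7]` and `k = 2`, the samples at indices 1 and 3 are kept for the next update.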

Object tracking: FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Focus on the conv4-3 and conv5-3 layers of a VGG-16 network pre-trained for ImageNet image classification.

[Figure: feature maps from conv4-3 and conv5-3]

Object tracking: FCNT Specialization

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence.
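One simple way to act on this observation is to rank feature maps by how much of their activation falls inside the target region and keep only the top few. The energy-ratio criterion below is a hypothetical stand-in chosen for brevity; it is not the selection procedure described in the FCNT paper.

```python
import numpy as np

def select_feature_maps(feature_maps, fg_mask, k):
    """feature_maps: (C, H, W) non-negative activations; fg_mask: (H, W) booleans.

    Returns the indices of the k maps with the largest share of activation
    inside the foreground mask.
    """
    fmaps = np.asarray(feature_maps, dtype=float)
    mask = np.asarray(fg_mask, dtype=bool)
    total = fmaps.sum(axis=(1, 2))
    inside = (fmaps * mask).sum(axis=(1, 2))
    # Maps with no activation at all get a ratio of 0.
    ratio = np.divide(inside, total, out=np.zeros_like(total), where=total > 0)
    return np.argsort(ratio)[::-1][:k]
```

Discarding the remaining maps removes responses that mostly describe the background and reduces the computation of later stages.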

Object tracking: FCNT Localization

Although trained for image classification, feature maps in conv5-3 enable object localization... but they are not discriminative enough to tell apart different objects of the same category.

Object tracking: Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object detectors emerge in deep scene CNNs." ICLR 2015. [Slides from ReadCV]

Object tracking: FCNT Localization

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation...

[Figure: feature maps from conv4-3 and conv5-3]

Object tracking: FCNT Architecture

SNet = Specific Network (online update)

GNet = General Network (fixed)

Object tracking: FCNT Results

ConvNets: Software

Caffe: http://caffe.berkeleyvision.org

Torch (Overfeat): http://torch.ch

Theano: http://deeplearning.net/software/theano

TensorFlow: https://www.tensorflow.org

MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet

CNTK (Microsoft): http://www.cntk.ai

ConvNets: Learn more

Seminar Series: Compacting ConvNets for End to End Learning
Jose M. Álvarez
Tuesday, February 2, 4pm
D5-010, Campus Nord

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

Online course: "Deep Learning: Taking machine learning to the next level"

ReadCV seminar: friendly reviews of SoA papers
Spring 2016, Tuesdays at 11am

Barcelona Convolucionada: "Deep Learning a l'abast de tothom" (Deep Learning within everyone's reach)
Monday, February 1, 7pm, FIB, Campus Nord UPC
Grup d'estudi de machine learning Barcelona (Barcelona machine learning study group)

Summer course: Deep Learning for Computer Vision (2.5 ECTS for MSc & PhD)
July 4-8, 3-7pm

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, "Deep Learning: Past, Present & Future"

"Machine learning" sub-Reddit

Check profile requirements for Summer internship (disclaimer: offered to PhD students by default)

Company      Avg salary per hour    Avg salary per month
Yahoo        $43                    ($43 x 160 = $6880)
Apple        $37                    ($37 x 160 = $5920)
Google       $29.54-$31.32          $7151
Facebook     $22.92                 $6150-$7378
Microsoft    $22.63                 $6506-$7171

Source: Glassdoor.com (internships in California. No stipends included)

Video: Cristian Canton's talk "From Catalonia to America: notes on how to achieve a successful post-PhD career", ACMCV 2015 & UPC

Li Fei-Fei, "How we're teaching computers to understand pictures", TEDTalks 2014

Jeremy Howard, "The wonderful and terrifying implications of computers that can learn", TEDTalks 2014

Neil Lawrence, "OpenAI won't benefit humanity without open data sharing" (The Guardian, 14/12/2015)

ConvNets: Discussion

Is Computer Vision solved?

Sports: Do you know them?

ConvNets: Do you know them?

Antonio Torralba, MIT (formerly UPC)
Oriol Vinyals, Google (formerly UPC)
Jose M. Álvarez, NICTA (formerly URL & UAB)
Joan Bruna, Berkeley (formerly UPC)
...and MANY MORE I am missing on this page (apologies)

ConvNets: Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding is available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution    Last deadlines (on 28/1/2016)
FI (Catalonia)         22/09/2015
FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web

Our past research: Image Classification

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., "Cultural Event Recognition with Visual ConvNets and Temporal Models", in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

ChaLearn Workshop

Our current research: Saliency Prediction

J. Pan and Giró-i-Nieto, X., "End-to-end Convolutional Network for Saliency Prediction", in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

LSUN Challenge

Our current research: Sentiment Analysis

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., "Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction", in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015. [Slides]

Our current research: Instance Search in Video

V.-T. Nguyen, -Dinh-Le, D., Salvador, A., -Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S. 'ichi, and Giró-i-Nieto, X., "NII-HITACHI-UIT at TRECVID 2015 Instance Search", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S. 'ichi, O'Connor, N., and Smeaton, A. F., "Insight DCU at TRECVID 2015", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

Page 36: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

36

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

37

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

38

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

39

Recognition ImageNet Video

Kai Kang et al Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

Optical Flow

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 40

Optical Flow Small vs Large

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 41

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 42

Optical FlowClassic approachRigid matching of HoG or SIFT descriptors

Deep MatchingAllow each subpatch to move

independently in a limited range

depending on its size

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 43

Optical Flow Deep Matching

Source Matlab R2015b documentation for normxcorr2 by Mathworks44

Optical Flow 2D correlation

Image

Sub-Image

Offset of the sub-image with respect to the image [00]

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 45

Instead of pre-trained filters a convolution is defined between each

patch of the reference image target image

as a results a correlation map is generated for each reference patch

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 46

Optical Flow Deep Matching

The most discriminative response map

The less discriminative

response map

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 47

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching

(TD)Bottom-upextraction

(BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 48

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Bottom-upextraction

(BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 49

Optical Flow Deep Matching (BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 50

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching (TD)

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching

(TD)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 51

Optical Flow Deep Matching (TD)Each local maxima in the top layer corresponds to a shift of one of the biggest (32x32) patchesIf we focus on local maximum we can retrieve the corresponding responses one scale below and focus on shift of the sub-patches that generated it

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 52

Optical Flow Deep Matching (TD)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 53

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 54

Ground truth

Dense HOG[Brox amp Malik 2011]

Deep Matching

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 55

Optical Flow Deep Matching

Optical Flow

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 56

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 57

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 58

End to end supervised learning of optical flow

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 59

Option A Stack both input images together and feed them through a generic network

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 60

Option B Create two separate yet identical processing streams for the two images and combine them at a later stage

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 61

Option B Create two separate yet identical processing streams for the two images and combine them at a later stage

Correlation layer Convolution of data patches from the layers to combine

Optical Flow FlowNet (expanding)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 62

Upconvolutional layers Unpooling features maps + convolutionUpconvolutioned feature maps are concatenated with the corresponding map from the contractive part

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 63

Since existing ground truth datasets are not sufficiently large to train a Convnet a synthetic Flying Dataset is generatedhellip and augmented (translation rotation scaling transformations additive Gaussian noise changes in brightness contrast gamma and color)

Convnets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI

Data augmentation

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 64

Optical Flow FlowNet

Object tracking MDNet

65Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Object tracking MDNet

66Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Object tracking MDNet Architecture

67Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Domain-specific layers are used during training for each sequence but are replaced by a single one at test time

Object tracking MDNet Online update

68Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

MDNet is updated online at test time with hard negative mining that is selecting negative samples with the highest positive score

Object tracking FCNT

69Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Object tracking FCNT

70Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Focus on conv4-3 and conv5-3 of VGG-16 network pre-trained for ImageNet image classification

conv4-3 conv5-3

Object tracking FCNT Specialization

71Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence

Object tracking FCNT Localization

72Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Although trained for image classification feature maps in conv5-3 enable object localizationhellipbut is not discriminative enough to different objects of the same category

Object tracking Localization

73Zhou Bolei Aditya Khosla Agata Lapedriza Aude Oliva and Antonio Torralba Object detectors emerge in deep scene cnns ICLR 2015

[Zhou et al ICLR 2015] ldquoObject detectors emerge in deep scene CNNsrdquo [Slides from ReadCV]

Object tracking FCNT Localization

74Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

On the other hand feature maps from conv4-3 are more sensitive to intra-class appearance variationhellip

conv4-3 conv5-3

Object tracking FCNT Architecture

75Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

SNet=Specific Network (online update)

GNet=General Network (fixed)

Object tracking FCNT Results

76Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

ConvNets Software

Caffe httpcaffeberkeleyvisionorg

Torch (Overfeat) httptorchch

Theano httpdeeplearningnetsoftwaretheano

Tensor Flow httpswwwtensorfloworg

MatconvNet (VLFeat) httpwwwvlfeatorgmatconvnet

CNTK (Mcrosoft) httpwwwcntkai 77

Seminar Series Compacting ConvNets

for End to End Learning

Tuesday February 2 4pm

D5-010 Campus Nord

ConvNets Learn more

78

Jose M Aacutelvarez

Stanford course CS231n

Convolutional Neural Networks for Visual

Recognition

ConvNets Learn more

79

ConvNets Learn more

Online course

Deep Learning

Taking machine learning to the next

level

80

ReadCV seminarFriendly reviews of SoA papers

Spring 2016

Tuesdays at 11am

ConvNets Learn more

81

Barcelona Convolucionada

Deep Learning a lrsquoabast de tothom

Monday February 1 7pm FIB Campus Nord UPC

ConvNets Learn more

82

Grup drsquoestudi de machine learning Barcelona

Summer course

Deep Learning for Computer Vision

(25 ECTS for MSc amp Phd)

July 4-8 3-7pm

ConvNets Learn more

83

Deep learning methos for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho ldquoDeep Learning Past Present amp Futurerdquo

ConvNets Learn more

84

ConvNets Learn more

85

ldquoMachine learningrdquo sub-Reddit

ConvNets Learn more

86

ConvNets Learn more

87

Check profile requirements for Summer internship (disclaimer offered to Phd students by default)

Company Avg Salary hour Avg Salary month

Yahoo $43 ($43x160=$6880)

Apple $37 ($37x160=$5920)

Google $2954-$3132 $7151

Facebook $2292 $6150-$7378

Microsoft $2263 $6506-$7171

Source Glassdoorcom (internships in California No stipends included)

ConvNets Learn more

88

Video Cristian Cantonrsquos talk ldquoFrom Catalonia to America notes on how to achieve a successful post-Phd career rdquo ACMCV 2015 amp UPC

Li Fei-Fei ldquoHow wersquore teaching computers to understand picturesrdquo TEDTalks 2014

ConvNets Learn more

89

Jeremy Howard ldquoThe wonderful and terrifying implications of computers that can learnrdquo TEDTalks 2014

ConvNets Learn more

90

ConvNets Learn more

91

Neil Lawrence OpenAI wonrsquot benefit humanity without open data sharing (The Guardian 14122015)

Is Computer Vision solved

ConvNets Discussion

92

Sports Do you know them

93

ConvNets Do you know them

94

Antonio Torralba MIT(former UPC)

and MANY MORE I am missing in the page (apologies)

Oriol Vinyals Google(former UPC)

Jose M Aacutelvarez NICTA(former URL amp UAB)

Joan Bruna Berkeley(former UPC)

95

ConvNets Where you are studyingVisioCat dinner CVPR 2015

Considering a Phd at GPI-UPC Currently no direct funding available (check in the future)We can support your application to scholarships

External grant listings UPC UPF

Funding institution Last deadlines (on 2812016)

FI (Catalonia) 22092015

FPU (Spain) 15012016

Check our activity at httpsimatgeupceduweb 96

Our past research: Image Classification

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., “Cultural Event Recognition with Visual ConvNets and Temporal Models”, in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

ChaLearn Workshop

Our current research: Saliency Prediction

J. Pan and Giró-i-Nieto, X., “End-to-end Convolutional Network for Saliency Prediction”, in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

LSUN Challenge

Our current research: Sentiment Analysis

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., “Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction”, in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015. [Slides]

Our current research: Instance Search in Video

V.-T. Nguyen, Dinh-Le, D., Salvador, A., Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S. ’ichi, and Giró-i-Nieto, X., “NII-HITACHI-UIT at TRECVID 2015 Instance Search”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S. ’ichi, O’Connor, N., and Smeaton, A. F., “Insight DCU at TRECVID 2015”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu


Recognition: ImageNet Video

Kai Kang et al., “Object Detection in Videos with TubeLets and Multi-Context Cues” (ILSVRC 2015) [video] [poster]

Optical Flow

Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE.

Optical Flow: Small vs Large

Optical Flow: Deep Matching

Classic approach: rigid matching of HoG or SIFT descriptors.

Deep Matching: allow each subpatch to move independently, within a limited range that depends on its size.

Optical Flow: 2D correlation

Figure labels: image; sub-image; offset of the sub-image with respect to the image, [0, 0].

Source: MATLAB R2015b documentation for normxcorr2, by MathWorks.

Instead of using pre-trained filters, a convolution is defined between each patch of the reference image and the target image. As a result, a correlation map is generated for each reference patch.
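
The correlation map of a single reference patch can be sketched in plain Python. This is only an illustration under stated assumptions: it uses raw grey-level values and a zero-mean normalized cross-correlation in the spirit of normxcorr2, whereas the published method may correlate richer patch descriptors. The names `ncc_map` and `best_offset` are hypothetical.

```python
import math


def ncc_map(image, patch):
    """Normalized cross-correlation of `patch` at every valid offset of `image`."""
    ih, iw = len(image), len(image[0])
    ph, pw = len(patch), len(patch[0])
    n = ph * pw
    p_mean = sum(map(sum, patch)) / n
    p_dev = [[v - p_mean for v in row] for row in patch]
    p_norm = math.sqrt(sum(v * v for row in p_dev for v in row))
    scores = []
    for y in range(ih - ph + 1):
        row_scores = []
        for x in range(iw - pw + 1):
            window = [image[y + j][x:x + pw] for j in range(ph)]
            w_mean = sum(map(sum, window)) / n
            num, w_sq = 0.0, 0.0
            for j in range(ph):
                for i in range(pw):
                    d = window[j][i] - w_mean
                    num += d * p_dev[j][i]
                    w_sq += d * d
            denom = p_norm * math.sqrt(w_sq)
            # Flat windows have no variance; give them a neutral score.
            row_scores.append(num / denom if denom > 0 else 0.0)
        scores.append(row_scores)
    return scores


def best_offset(scores):
    """(row, column) of the highest correlation score."""
    _, y, x = max((v, y, x) for y, row in enumerate(scores) for x, v in enumerate(row))
    return y, x


# Toy target image that contains the reference patch at offset (1, 1).
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 0, 0],
    [0, 3, 9, 0, 0],
    [0, 0, 0, 0, 0],
]
patch = [[1, 2], [3, 9]]
scores = ncc_map(image, patch)
print(best_offset(scores))
```

Each reference patch produces one such map, and the maps of all patches form the lowest level of the pyramid discussed in the next slides.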

Optical Flow: Deep Matching

Figure labels: the most discriminative response map; the least discriminative response map.

Optical Flow: Deep Matching

Key idea: build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search.

Pyramid levels: 4x4 patches, 8x8 patches, 16x16 patches, 32x32 patches.

Bottom-up extraction (BU), followed by top-down matching (TD).
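
One bottom-up step can be sketched as follows. This is a simplified illustration, not the published algorithm: it assumes that a parent patch is made of four quadrant sub-patches, that each sub-patch may shift by at most one position (a 3x3 max filter), and that the parent score is the plain average of its children. Other details of the real method are omitted, and the helper names are hypothetical.

```python
def max_pool3(score_map):
    """3x3 max filter (stride 1): lets a sub-patch shift by up to one position."""
    h, w = len(score_map), len(score_map[0])
    return [[max(score_map[j][i]
                 for j in range(max(0, y - 1), min(h, y + 2))
                 for i in range(max(0, x - 1), min(w, x + 2)))
             for x in range(w)]
            for y in range(h)]


def aggregate(children, n, pool=True):
    """Combine the maps of four n x n quadrants into the map of a 2n x 2n patch.

    `children` is ordered [top-left, top-right, bottom-left, bottom-right];
    map[y][x] is the score of a patch whose top-left corner is at (y, x).
    """
    offsets = [(0, 0), (0, n), (n, 0), (n, n)]
    maps = [max_pool3(c) if pool else c for c in children]
    h, w = len(children[0]), len(children[0][0])
    parent = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for m, (dy, dx) in zip(maps, offsets):
                yy, xx = y + dy, x + dx
                if yy < h and xx < w:  # quadrants outside the image add nothing
                    total += m[yy][xx]
            parent[y][x] = total / 4
    return parent


def peak_map(h, w, y, x):
    """Toy correlation map with a single perfect match at (y, x)."""
    m = [[0.0] * w for _ in range(h)]
    m[y][x] = 1.0
    return m


# Four 2x2 quadrants of a 4x4 patch; the top-right one moved by one pixel.
children = [
    peak_map(6, 6, 1, 1),  # top-left
    peak_map(6, 6, 1, 4),  # top-right (rigid position would be (1, 3))
    peak_map(6, 6, 3, 1),  # bottom-left
    peak_map(6, 6, 3, 3),  # bottom-right
]
parent = aggregate(children, n=2)
rigid = aggregate(children, n=2, pool=False)
print(parent[1][1], rigid[1][1])
```

With pooling the parent patch still reaches a full score at (1, 1) even though one quadrant moved, while the rigid variant loses that quadrant's contribution. Repeating the step builds the 8x8, 16x16 and 32x32 levels.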


Optical Flow: Deep Matching (TD)

Each local maximum in the top layer corresponds to a shift of one of the biggest (32x32) patches. If we focus on a local maximum, we can retrieve the corresponding responses one scale below and focus on the shifts of the sub-patches that generated it.

Optical Flow: Deep Matching

Figure labels: ground truth; dense HOG [Brox & Malik 2011]; Deep Matching.

Optical Flow: FlowNet

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D., and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).

End-to-end supervised learning of optical flow.

Optical Flow: FlowNet (contracting)

Option A: Stack both input images together and feed them through a generic network.
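
Stacking can be read as a channel-wise concatenation of the two frames. The sketch below assumes images stored as nested lists of shape height x width x channels; `stack_frames` is a hypothetical helper, not a function from the FlowNet code.

```python
def stack_frames(frame_a, frame_b):
    """Concatenate two H x W x C frames along the channel axis (H x W x 2C)."""
    if len(frame_a) != len(frame_b) or len(frame_a[0]) != len(frame_b[0]):
        raise ValueError("both frames must have the same height and width")
    return [[list(pa) + list(pb) for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]


# Two tiny 1 x 2 RGB frames become one 1 x 2 input with 6 channels.
frame_a = [[[1, 2, 3], [4, 5, 6]]]
frame_b = [[[7, 8, 9], [10, 11, 12]]]
stacked = stack_frames(frame_a, frame_b)
print(stacked[0][0])
```

The network then has to discover by itself how to compare the first three channels with the last three.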

Option B: Create two separate, yet identical, processing streams for the two images and combine them at a later stage.

Correlation layer: convolution of data patches from the layers to combine.
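
A correlation layer of this kind can be sketched as a dot product between feature vectors of the two streams over a range of displacements. The version below is a minimal illustration that assumes single-pixel patches and a small maximum displacement; `correlation_layer` and `max_disp` are hypothetical names.

```python
def correlation_layer(f1, f2, max_disp):
    """Correlate two H x W x C feature maps over displacements in [-max_disp, max_disp].

    Returns (out, disps) where out[y][x][k] is the dot product between
    f1[y][x] and f2[y + dy][x + dx] for disps[k] == (dy, dx).
    """
    h, w = len(f1), len(f1[0])
    disps = [(dy, dx)
             for dy in range(-max_disp, max_disp + 1)
             for dx in range(-max_disp, max_disp + 1)]
    out = [[[0.0] * len(disps) for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for k, (dy, dx) in enumerate(disps):
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:  # out-of-range positions stay at 0
                    out[y][x][k] = sum(a * b for a, b in zip(f1[y][x], f2[yy][xx]))
    return out, disps


# The same feature vector appears one pixel to the right in the second map.
f1 = [[[0.0, 0.0] for _ in range(3)] for _ in range(3)]
f2 = [[[0.0, 0.0] for _ in range(3)] for _ in range(3)]
f1[1][1] = [1.0, 2.0]
f2[1][2] = [1.0, 2.0]
corr, disps = correlation_layer(f1, f2, max_disp=1)
best = max(range(len(disps)), key=lambda k: corr[1][1][k])
print(disps[best], corr[1][1][best])
```

Unlike an ordinary convolution, this operation has no trainable weights: one stream acts as the filter for the other.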

Optical Flow: FlowNet (expanding)

Upconvolutional layers: unpooling of feature maps followed by a convolution. The upconvolved feature maps are concatenated with the corresponding feature maps from the contracting part.
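
The unpooling and concatenation steps can be sketched as below. This is an illustration under assumptions: unpooling is implemented as nearest-neighbour repetition, the convolution that follows it is omitted, and `unpool2x` and `concat_skip` are hypothetical names.

```python
def unpool2x(fmap):
    """Double height and width of an H x W x C map by repeating each feature vector."""
    out = []
    for row in fmap:
        wide = [list(px) for px in row for _ in range(2)]
        out.append(wide)
        out.append([list(px) for px in wide])
    return out


def concat_skip(upsampled, skip):
    """Concatenate the upsampled map with the contracting-part map along channels."""
    if len(upsampled) != len(skip) or len(upsampled[0]) != len(skip[0]):
        raise ValueError("spatial sizes must match before concatenation")
    return [[pa + list(pb) for pa, pb in zip(ra, rb)]
            for ra, rb in zip(upsampled, skip)]


coarse = [[[1.0], [2.0]]]                                 # 1 x 2 map, 1 channel
skip = [[[0.0, 0.5] for _ in range(4)] for _ in range(2)]  # 2 x 4 map, 2 channels
merged = concat_skip(unpool2x(coarse), skip)
print(merged[0][2])
```

The concatenation gives the expanding part access to both coarse, high-level information and fine local detail from the earlier layers.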

Optical Flow: FlowNet, data augmentation

Since existing ground-truth datasets are not sufficiently large to train a ConvNet, a synthetic Flying Chairs dataset is generated… and augmented (translation, rotation and scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma and color).

ConvNets trained on this unrealistic data generalize well to existing datasets such as Sintel and KITTI.
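
The photometric part of such an augmentation can be sketched with the standard library alone. The parameter ranges below are illustrative assumptions, not the values used by the authors, and `augment_photometric` is a hypothetical name. Geometric transformations are left out because they would also have to be applied to the ground-truth flow.

```python
import random


def clip01(v):
    """Keep a pixel value inside the [0, 1] range."""
    return min(1.0, max(0.0, v))


def augment_photometric(image, rng, noise_sigma=0.02):
    """Random brightness, contrast, gamma and additive Gaussian noise.

    `image` is a 2D list of grey levels in [0, 1]; `rng` is a random.Random.
    """
    brightness = rng.gauss(0.0, 0.1)   # additive shift (assumed range)
    contrast = rng.uniform(0.8, 1.2)   # scaling around mid-grey (assumed range)
    gamma = rng.uniform(0.7, 1.5)      # gamma exponent (assumed range)
    augmented = []
    for row in image:
        new_row = []
        for v in row:
            v = clip01((v - 0.5) * contrast + 0.5 + brightness)
            v = v ** gamma
            v = clip01(v + rng.gauss(0.0, noise_sigma))
            new_row.append(v)
        augmented.append(new_row)
    return augmented


image = [[0.0, 0.25, 0.5], [0.75, 1.0, 0.1]]
augmented = augment_photometric(image, random.Random(0))
print(augmented)
```

Seeding the generator makes the augmentation reproducible, which helps when debugging a training pipeline.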


Object tracking: MDNet

Nam, Hyeonseob, and Bohyung Han. “Learning multi-domain convolutional neural networks for visual tracking.” ICCV VOT Workshop (2015).

Object tracking: MDNet Architecture

Domain-specific layers are used during training, one for each sequence, but are replaced by a single one at test time.
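
The split between shared and domain-specific layers can be sketched with a toy model. Everything here is an assumption made for illustration: a single shared linear layer with ReLU, two-output branches (target versus background), random weights and no training loop. `MultiDomainNet` is a hypothetical class, not the authors' code.

```python
import random


class MultiDomainNet:
    """Toy network: one shared layer plus one small branch per training sequence."""

    def __init__(self, in_dim, feat_dim, n_domains, seed=0):
        self._rng = random.Random(seed)
        self.feat_dim = feat_dim
        self.shared = [[self._rng.gauss(0, 0.1) for _ in range(in_dim)]
                       for _ in range(feat_dim)]
        self.branches = [self._new_branch() for _ in range(n_domains)]

    def _new_branch(self):
        # Two outputs per branch: target score and background score (assumed).
        return [[self._rng.gauss(0, 0.1) for _ in range(self.feat_dim)]
                for _ in range(2)]

    def features(self, x):
        """Shared representation, common to every sequence (linear + ReLU)."""
        return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in self.shared]

    def scores(self, x, domain):
        """Scores from the branch that belongs to one sequence."""
        f = self.features(x)
        return [sum(w * v for w, v in zip(row, f)) for row in self.branches[domain]]

    def start_tracking(self):
        """Test time: keep the shared layers, replace all branches by a fresh one."""
        self.branches = [self._new_branch()]


net = MultiDomainNet(in_dim=4, feat_dim=3, n_domains=5)
shared_before = [row[:] for row in net.shared]
n_training_branches = len(net.branches)
net.start_tracking()
print(n_training_branches, len(net.branches))
```

The shared layers carry what was learned across all training sequences, while the fresh branch is the part that adapts to the new target.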

Object tracking: MDNet Online update

MDNet is updated online at test time with hard negative mining, that is, by selecting the negative samples with the highest positive score.
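
The selection step of hard negative mining reduces to a top-k over the scores of the negative samples. The sketch below assumes the positive scores have already been computed by the network; `mine_hard_negatives` is a hypothetical helper.

```python
def mine_hard_negatives(positive_scores, k):
    """Indices of the k negative samples that the network scores most like the target."""
    order = sorted(range(len(positive_scores)),
                   key=lambda i: positive_scores[i],
                   reverse=True)
    return order[:k]


# Positive-class scores given by the current network to five background samples.
positive_scores = [0.10, 0.90, 0.40, 0.70, 0.05]
hard = mine_hard_negatives(positive_scores, k=2)
print(hard)
```

Only the selected samples would then be used as negatives in the online update, so the update concentrates on the distractors the tracker is most likely to confuse with the target.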

Object tracking: FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. “Visual Tracking with Fully Convolutional Networks.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Focus on conv4-3 and conv5-3 of the VGG-16 network pre-trained for ImageNet image classification.

Object tracking: FCNT Specialization

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence.

Object tracking: FCNT Localization

Although trained for image classification, the feature maps in conv5-3 enable object localization… but they are not discriminative enough to distinguish different objects of the same category.

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. “Object detectors emerge in deep scene CNNs.” ICLR 2015. [Slides from ReadCV]

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation…

Object tracking: FCNT Architecture

SNet = Specific Network (online update)

GNet = General Network (fixed)

Object tracking: FCNT Results

ConvNets: Software

Caffe: http://caffe.berkeleyvision.org

Torch (Overfeat): http://torch.ch

Theano: http://deeplearning.net/software/theano

TensorFlow: https://www.tensorflow.org

MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet

CNTK (Microsoft): http://www.cntk.ai

ConvNets: Learn more

Seminar Series: “Compacting ConvNets for End to End Learning”, Jose M. Álvarez. Tuesday, February 2, 4pm, D5-010, Campus Nord

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

Online course: “Deep Learning: Taking machine learning to the next level”

ReadCV seminar: friendly reviews of SoA papers. Spring 2016, Tuesdays at 11am

Barcelona Convolucionada: “Deep Learning a l’abast de tothom”. Monday, February 1, 7pm, FIB, Campus Nord UPC

Grup d’estudi de machine learning Barcelona

Summer course: Deep Learning for Computer Vision (2.5 ECTS for MSc & PhD). July 4-8, 3-7pm

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, “Deep Learning: Past, Present & Future”

“Machine learning” sub-Reddit



K McGuinness Mohedano E Salvador A Zhang Z X Marsden M Wang P Jargalsaikhan I Antony J Giroacute-i-Nieto X Satoh S ichi OConnor N and Smeaton A F ldquoInsight DCU at TRECVID 2015rdquo in TRECVID 2015 Workshop Gaithersburg MD USA 2015

Thank you

Slides available on and

httpsimatgeupceduwebpeoplexavier-giro

httpbitsearchblogspotcom

httpstwittercomDocXavi

httpswwwfacebookcomProfessorXavi

xaviergiroupcedu

101


Recognition: ImageNet Video

Kai Kang et al., "Object Detection in Videos with Tubelets and Multi-Context Cues" (ILSVRC 2015) [video] [poster]

Optical Flow

Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE.

Optical Flow: Small vs Large displacement

Optical Flow

Classic approach: Rigid matching of HoG or SIFT descriptors.

Deep Matching: Allow each subpatch to move independently in a limited range depending on its size.

Optical Flow: Deep Matching

Optical Flow: 2D correlation

Source: MATLAB R2015b documentation for normxcorr2, by MathWorks.

Image

Sub-Image

Offset of the sub-image with respect to the image: [0, 0]
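The 2D correlation idea can be sketched in a few lines of plain Python. This is only an illustration, not the MATLAB normxcorr2 implementation: the function names `normxcorr2_valid` and `best_offset` are hypothetical, images are lists of rows, and only positions where the sub-image fits entirely inside the image are scored (normxcorr2 also returns the padded borders).

```python
import math

def normxcorr2_valid(template, image):
    """Normalized cross-correlation at every position where `template`
    fits entirely inside `image` (both are lists of rows of numbers)."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    t_vals = [v for row in template for v in row]
    t_mean = sum(t_vals) / len(t_vals)
    t_dev = [v - t_mean for v in t_vals]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    scores = []
    for y in range(ih - th + 1):
        row_scores = []
        for x in range(iw - tw + 1):
            w_vals = [image[y + j][x + i] for j in range(th) for i in range(tw)]
            w_mean = sum(w_vals) / len(w_vals)
            w_dev = [v - w_mean for v in w_vals]
            w_norm = math.sqrt(sum(d * d for d in w_dev))
            denom = t_norm * w_norm
            # A flat window (or template) has no structure to correlate with.
            score = sum(a * b for a, b in zip(t_dev, w_dev)) / denom if denom else 0.0
            row_scores.append(score)
        scores.append(row_scores)
    return scores

def best_offset(scores):
    """(row, column) of the highest correlation score."""
    positions = ((y, x) for y in range(len(scores)) for x in range(len(scores[0])))
    return max(positions, key=lambda p: scores[p[0]][p[1]])

image = [
    [1, 2, 3, 4, 5],
    [6, 9, 1, 7, 2],
    [3, 8, 4, 0, 6],
    [5, 1, 7, 2, 9],
]
sub_image = [row[2:4] for row in image[1:3]]  # cut out at offset (1, 2)
scores = normxcorr2_valid(sub_image, image)
offset = best_offset(scores)
```

The peak of the correlation map gives the offset at which the sub-image was cut out of the image.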

Instead of pre-trained filters, a convolution is defined between each patch of the reference image and the target image. As a result, a correlation map is generated for each reference patch.
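A minimal sketch of "one correlation map per reference patch", under simplifying assumptions that are not in the slides: patches are non-overlapping, raw intensities are correlated without normalization (Deep Matching itself works on descriptors), and the names `correlation_map` and `patch_correlation_maps` are hypothetical.

```python
def correlation_map(patch, target):
    """Plain (unnormalized) correlation of one reference patch with every
    position of the target image where the patch fits."""
    ph, pw = len(patch), len(patch[0])
    th, tw = len(target), len(target[0])
    return [[sum(patch[j][i] * target[y + j][x + i]
                 for j in range(ph) for i in range(pw))
             for x in range(tw - pw + 1)]
            for y in range(th - ph + 1)]

def patch_correlation_maps(reference, target, size):
    """One correlation map per non-overlapping size x size reference patch,
    keyed by the patch's top-left corner in the reference image."""
    maps = {}
    for y in range(0, len(reference) - size + 1, size):
        for x in range(0, len(reference[0]) - size + 1, size):
            patch = [row[x:x + size] for row in reference[y:y + size]]
            maps[(y, x)] = correlation_map(patch, target)
    return maps

# Toy example: a diagonal pattern matched against itself with 2x2 patches.
reference = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
maps = patch_correlation_maps(reference, reference, size=2)
```

Here the patch plays the role of the filter, so no filter has to be learned: the reference image supplies them.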

Optical Flow: Deep Matching

The most discriminative response map

The least discriminative response map

Key idea: Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search.

Optical Flow: Deep Matching

Pyramid levels: 4x4 patches, 8x8 patches, 16x16 patches, 32x32 patches

Bottom-up extraction (BU)

Top-down matching (TD)
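One bottom-up step of the pyramid can be sketched as follows. This is a simplified illustration rather than the DeepFlow implementation: each child map is max-pooled over a 3x3 neighbourhood so the child may move by one pixel, then the pooled maps are shifted by the child's offset and averaged. Subsampling and any non-linearity applied between levels are left out, and the names `max_pool_3x3`, `aggregate` and `single_peak` are hypothetical.

```python
def max_pool_3x3(m):
    """Each cell becomes the maximum of its 3x3 neighbourhood (borders clipped)."""
    h, w = len(m), len(m[0])
    return [[max(m[j][i]
                 for j in range(max(0, y - 1), min(h, y + 2))
                 for i in range(max(0, x - 1), min(w, x + 2)))
             for x in range(w)]
            for y in range(h)]

def aggregate(children):
    """Combine child correlation maps into the parent's correlation map.

    `children` maps each child's (dy, dx) offset from the parent position
    to that child's correlation map. All maps are assumed to share one
    size; lookups outside a map count as 0.
    """
    pooled = {off: max_pool_3x3(m) for off, m in children.items()}
    any_map = next(iter(children.values()))
    h, w = len(any_map), len(any_map[0])

    def at(m, y, x):
        return m[y][x] if 0 <= y < h and 0 <= x < w else 0.0

    return [[sum(at(p, y + dy, x + dx) for (dy, dx), p in pooled.items()) / len(pooled)
             for x in range(w)]
            for y in range(h)]

def single_peak(h, w, pos):
    """Toy correlation map: 1.0 at `pos`, 0.0 elsewhere."""
    m = [[0.0] * w for _ in range(h)]
    m[pos[0]][pos[1]] = 1.0
    return m

# Four 2x2 children of a 4x4 parent placed at (3, 3).
# The last child matched one pixel away from its rigid position (5, 5).
children = {
    (0, 0): single_peak(8, 8, (3, 3)),
    (0, 2): single_peak(8, 8, (3, 5)),
    (2, 0): single_peak(8, 8, (5, 3)),
    (2, 2): single_peak(8, 8, (5, 6)),
}
parent = aggregate(children)
```

Thanks to the pooling, the parent still reaches the full score at (3, 3) even though one sub-patch moved independently; a rigid average would have dropped to 0.75.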

Optical Flow: Deep Matching (BU)

Optical Flow: Deep Matching (TD)

Each local maximum in the top layer corresponds to a shift of one of the biggest (32x32) patches. If we focus on one local maximum, we can retrieve the corresponding responses one scale below and focus on the shifts of the sub-patches that generated it.
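The top-down step can be sketched under the same toy assumptions: find the local maxima of a top-level map, then, for one maximum, look one scale below for the position where each sub-patch responded best within one pixel of its expected location. The names `local_maxima`, `backtrack` and `single_peak` are hypothetical, and the search radius of one pixel is an assumption made for this example.

```python
def local_maxima(m):
    """Positions with a positive value that no 8-neighbour exceeds."""
    h, w = len(m), len(m[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            neighbours = [m[j][i]
                          for j in range(max(0, y - 1), min(h, y + 2))
                          for i in range(max(0, x - 1), min(w, x + 2))
                          if (j, i) != (y, x)]
            if m[y][x] > 0 and all(m[y][x] >= n for n in neighbours):
                peaks.append((y, x))
    return peaks

def backtrack(pos, children):
    """For one parent maximum at `pos`, find where each child responded best
    within the 3x3 neighbourhood of its expected position.
    Expected positions are assumed to lie inside the child maps."""
    y, x = pos
    result = {}
    for (dy, dx), m in children.items():
        h, w = len(m), len(m[0])
        candidates = [(j, i)
                      for j in range(max(0, y + dy - 1), min(h, y + dy + 2))
                      for i in range(max(0, x + dx - 1), min(w, x + dx + 2))]
        result[(dy, dx)] = max(candidates, key=lambda p: m[p[0]][p[1]])
    return result

def single_peak(h, w, pos):
    """Toy correlation map: 1.0 at `pos`, 0.0 elsewhere."""
    m = [[0.0] * w for _ in range(h)]
    m[pos[0]][pos[1]] = 1.0
    return m

top_level = [
    [0, 0, 0, 0],
    [0, 5, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 3],
]
peaks = local_maxima(top_level)

children = {
    (0, 0): single_peak(8, 8, (3, 3)),
    (0, 2): single_peak(8, 8, (3, 5)),
    (2, 0): single_peak(8, 8, (5, 3)),
    (2, 2): single_peak(8, 8, (5, 6)),  # moved one pixel to the right
}
matches = backtrack((3, 3), children)
```

Repeating `backtrack` level by level down to the smallest patches would yield the final correspondences.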

Optical Flow: Deep Matching

Figure: matching results compared for Ground truth, Dense HOG [Brox & Malik 2011] and Deep Matching.

Optical Flow

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D., & Brox, T. (2015). FlowNet: Learning Optical Flow with Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).

Optical Flow: FlowNet

End-to-end supervised learning of optical flow.

Optical Flow: FlowNet (contracting)

Option A: Stack both input images together and feed them through a generic network.
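Stacking the two frames amounts to concatenating them along the channel axis, so that two RGB frames become one six-channel input. A minimal sketch, where the function name `stack_frames` and the nested-list frame layout (height x width x channels) are assumptions made for this example:

```python
def stack_frames(frame_a, frame_b):
    """Concatenate two H x W x C frames along the channel axis -> H x W x 2C."""
    if len(frame_a) != len(frame_b) or len(frame_a[0]) != len(frame_b[0]):
        raise ValueError("frames must have the same height and width")
    return [[pixel_a + pixel_b for pixel_a, pixel_b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]

# Two tiny 2x2 RGB frames.
frame_1 = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
           [[0.7, 0.8, 0.9], [1.0, 0.0, 0.5]]]
frame_2 = [[[0.9, 0.8, 0.7], [0.6, 0.5, 0.4]],
           [[0.3, 0.2, 0.1], [0.0, 1.0, 0.5]]]
stacked = stack_frames(frame_1, frame_2)
```

With this option the network has to discover by itself how to relate the two frames.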

Option B: Create two separate yet identical processing streams for the two images and combine them at a later stage.

Correlation layer: Convolution of data patches from the layers to combine.
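The correlation layer can be sketched as a dot product between the feature vector at each position of the first stream and the feature vectors of the second stream within a maximum displacement. This sketch assumes 1x1 patches and a dictionary output per position; the function name `correlation_layer` and the toy one-hot features are hypothetical and not taken from the FlowNet code.

```python
def correlation_layer(feat_a, feat_b, max_disp):
    """For every position of feat_a, dot products with feat_b at all
    displacements within +/- max_disp. Positions outside feat_b score 0.
    Returns an H x W grid of dicts {(dy, dx): score}."""
    h, w = len(feat_a), len(feat_a[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            scores = {}
            for dy in range(-max_disp, max_disp + 1):
                for dx in range(-max_disp, max_disp + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        scores[(dy, dx)] = sum(
                            a * b for a, b in zip(feat_a[y][x], feat_b[yy][xx]))
                    else:
                        scores[(dy, dx)] = 0.0
            row.append(scores)
        out.append(row)
    return out

# Toy features: a distinct one-hot vector per position of a 3x3 map.
h = w = 3
feat_a = [[[1.0 if k == y * w + x else 0.0 for k in range(h * w)]
           for x in range(w)] for y in range(h)]
# Second stream: the same content shifted one pixel to the right.
feat_b = [[feat_a[y][x - 1] if x >= 1 else [0.0] * (h * w)
           for x in range(w)] for y in range(h)]

corr = correlation_layer(feat_a, feat_b, max_disp=1)
best_displacement = max(corr[1][1], key=corr[1][1].get)
```

The strongest response at the centre position is found at displacement (0, 1), which is exactly the shift between the two toy feature maps.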

Optical Flow: FlowNet (expanding)

Upconvolutional layers: Unpooling of feature maps + convolution. The upconvolved feature maps are concatenated with the corresponding map from the contractive part.
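One expanding step can be sketched as unpooling, convolution and concatenation. Two assumptions are made here that the slides do not state: unpooling is done by nearest-neighbour repetition, and a 1x1 convolution stands in for the learned filter. All function names are hypothetical.

```python
def unpool2x(fmap):
    """Nearest-neighbour 2x upsampling of an H x W x C feature map."""
    out = []
    for row in fmap:
        wide = [list(px) for px in row for _ in range(2)]
        out.append(wide)
        out.append([list(px) for px in wide])
    return out

def conv1x1(fmap, weights):
    """Per-pixel linear map; `weights` holds C_out rows of C_in coefficients."""
    return [[[sum(wi * v for wi, v in zip(w_row, px)) for w_row in weights]
             for px in row]
            for row in fmap]

def concat_channels(a, b):
    """Concatenate two feature maps of equal height and width along channels."""
    return [[pa + pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

def upconv_block(coarse, skip, weights):
    """Unpool + convolve the coarse map, then append the skip features."""
    return concat_channels(conv1x1(unpool2x(coarse), weights), skip)

coarse = [[[1.0, 2.0]]]                      # 1 x 1 x 2 map from the deeper layer
skip = [[[9.0], [9.0]], [[9.0], [9.0]]]      # 2 x 2 x 1 map from the contractive part
weights = [[1.0, 1.0]]                       # 2 input channels -> 1 output channel
fine = upconv_block(coarse, skip, weights)
```

The concatenation lets the expanding part reuse the fine spatial detail that pooling removed in the contractive part.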

Optical Flow: FlowNet

Data augmentation: Since existing ground truth datasets are not sufficiently large to train a ConvNet, a synthetic Flying Chairs dataset is generated... and augmented (translation, rotation and scaling transformations, additive Gaussian noise, and changes in brightness, contrast, gamma and color).

ConvNets trained on this unrealistic data generalize well to existing datasets such as Sintel and KITTI.
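The photometric part of the augmentation can be sketched as below. The parameter ranges and the function name `photometric_augment` are hypothetical, not the values used for FlowNet. Geometric transformations (translation, rotation, scaling) are left out on purpose, because they would also require transforming the ground-truth flow.

```python
import random

def photometric_augment(frame_a, frame_b, rng):
    """Apply the same random brightness, contrast and gamma change to both
    frames, plus independent Gaussian noise per pixel. Frames are H x W
    grids of intensities in [0, 1]; the ground-truth flow is unaffected."""
    brightness = rng.uniform(-0.1, 0.1)  # hypothetical ranges
    contrast = rng.uniform(0.8, 1.2)
    gamma = rng.uniform(0.7, 1.5)
    noise_sigma = 0.02

    def transform(v):
        v = (v - 0.5) * contrast + 0.5 + brightness
        v = min(1.0, max(0.0, v)) ** gamma
        v += rng.gauss(0.0, noise_sigma)
        return min(1.0, max(0.0, v))

    return ([[transform(v) for v in row] for row in frame_a],
            [[transform(v) for v in row] for row in frame_b])

frame_1 = [[0.0, 0.25, 0.5], [0.75, 1.0, 0.5]]
frame_2 = [[0.1, 0.35, 0.6], [0.65, 0.9, 0.4]]
aug_1, aug_2 = photometric_augment(frame_1, frame_2, random.Random(0))
again_1, again_2 = photometric_augment(frame_1, frame_2, random.Random(0))
```

Seeding the generator makes the augmentation reproducible, which helps when debugging a training pipeline.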

Object tracking: MDNet

Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015).

Object tracking: MDNet Architecture

Domain-specific layers are used during training for each sequence, but are replaced by a single one at test time.
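The multi-domain layout can be sketched as shared layers followed by one binary (target vs background) head per training sequence. This toy model uses a single shared linear layer with random weights and no training loop; the class name `MultiDomainNet` and its methods are hypothetical and only illustrate how the heads are swapped at test time.

```python
import random

class MultiDomainNet:
    """Shared layers plus one binary (target vs background) head per domain."""

    def __init__(self, in_dim, feat_dim, num_domains, seed=0):
        self._rng = random.Random(seed)
        self.feat_dim = feat_dim
        self.shared = self._matrix(feat_dim, in_dim)
        self.heads = [self._matrix(2, feat_dim) for _ in range(num_domains)]

    def _matrix(self, rows, cols):
        return [[self._rng.gauss(0.0, 0.1) for _ in range(cols)]
                for _ in range(rows)]

    @staticmethod
    def _linear(weights, x):
        return [sum(w * v for w, v in zip(row, x)) for row in weights]

    def features(self, x):
        """Shared representation (linear layer followed by ReLU)."""
        return [max(0.0, v) for v in self._linear(self.shared, x)]

    def forward(self, x, domain):
        """Two scores (target, background) from the head of `domain`."""
        return self._linear(self.heads[domain], self.features(x))

    def reset_for_tracking(self):
        """Keep the shared layers; swap all training heads for one new head."""
        self.heads = [self._matrix(2, self.feat_dim)]

net = MultiDomainNet(in_dim=4, feat_dim=3, num_domains=5)
sample = [1.0, 0.0, -1.0, 0.5]
train_scores = net.forward(sample, domain=3)
shared_before = [row[:] for row in net.shared]
net.reset_for_tracking()
test_scores = net.forward(sample, domain=0)
```

Only the head is specific to a sequence, so the shared layers can carry what was learned from all training videos into a new one.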

Object tracking: MDNet Online update

MDNet is updated online at test time with hard negative mining, that is, selecting the negative samples with the highest positive score.
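Hard negative mining as described above reduces to ranking the negative samples by the positive score that the current model gives them. A minimal sketch, where the function name `hard_negatives` and the sample labels are hypothetical:

```python
def hard_negatives(negative_samples, positive_scores, k):
    """Return the k negative samples that the current model scores most
    like the target, i.e. the ones it is most likely to confuse with it."""
    order = sorted(range(len(negative_samples)),
                   key=lambda i: positive_scores[i],
                   reverse=True)
    return [negative_samples[i] for i in order[:k]]

negatives = ["bg0", "bg1", "bg2", "bg3"]
scores = [0.1, 0.9, 0.4, 0.7]   # positive score of each negative sample
hardest = hard_negatives(negatives, scores, k=2)
```

Training on these samples focuses the update on the distractors that matter most, instead of on easy background.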

Object tracking: FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Focus on conv4-3 and conv5-3 of a VGG-16 network pre-trained for ImageNet image classification.

Figure panels: conv4-3, conv5-3

Object tracking: FCNT Specialization

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence.

Object tracking: FCNT Localization

Although trained for image classification, feature maps in conv5-3 enable object localization... but they are not discriminative enough to tell apart different objects of the same category.

Object tracking: Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object detectors emerge in deep scene CNNs." ICLR 2015. [Slides from ReadCV]

Object tracking: FCNT Localization

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation...

Object tracking: FCNT Architecture

SNet = Specific Network (online update)

GNet = General Network (fixed)

Object tracking: FCNT Results

ConvNets: Software

Caffe: http://caffe.berkeleyvision.org/

Torch (Overfeat): http://torch.ch/

Theano: http://deeplearning.net/software/theano/

TensorFlow: https://www.tensorflow.org/

MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet/

CNTK (Microsoft): http://www.cntk.ai/

ConvNets: Learn more

Seminar Series: Compacting ConvNets for End to End Learning

Tuesday, February 2, 4pm. D5-010, Campus Nord.

Jose M. Álvarez

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

Online course: Deep Learning. Taking machine learning to the next level

ReadCV seminar: Friendly reviews of SoA papers

Spring 2016, Tuesdays at 11am

Barcelona Convolucionada: "Deep Learning a l'abast de tothom" (Deep Learning within everyone's reach)

Monday, February 1, 7pm. FIB, Campus Nord UPC.

Grup d'estudi de machine learning Barcelona

Summer course: Deep Learning for Computer Vision (2.5 ECTS for MSc & PhD)

July 4-8, 3-7pm

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, "Deep Learning: Past, Present & Future"

"Machine learning" subreddit

Check profile requirements for Summer internship (disclaimer: offered to PhD students by default)

Company      Avg. salary / hour    Avg. salary / month
Yahoo        $43                   ($43 x 160 = $6880)
Apple        $37                   ($37 x 160 = $5920)
Google       $29.54-$31.32         $7151
Facebook     $22.92                $6150-$7378
Microsoft    $22.63                $6506-$7171

Source: Glassdoor.com (internships in California. No stipends included)

Video: Cristian Canton's talk "From Catalonia to America: notes on how to achieve a successful post-PhD career", ACMCV 2015 & UPC

Li Fei-Fei, "How we're teaching computers to understand pictures", TEDTalks 2014

Jeremy Howard, "The wonderful and terrifying implications of computers that can learn", TEDTalks 2014

ConvNets: Discussion

Is Computer Vision solved?

Neil Lawrence, "OpenAI won't benefit humanity without open data sharing" (The Guardian, 14/12/2015)

Sports: Do you know them?

ConvNets: Do you know them?

Antonio Torralba, MIT (former UPC)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

...and MANY MORE I am missing in the page (apologies)

ConvNets: Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution    Last deadlines (on 28/1/2016)
FI (Catalonia)         22/09/2015
FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web/

Our past research

Image Classification

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., "Cultural Event Recognition with Visual ConvNets and Temporal Models", in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

ChaLearn Workshop

Our current research

Saliency Prediction

J. Pan and Giró-i-Nieto, X., "End-to-end Convolutional Network for Saliency Prediction", in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

LSUN Challenge

Sentiment Analysis

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., "Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction", in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015. [Slides]

Instance Search in Video

V.-T. Nguyen, Dinh-Le, D., Salvador, A., Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S., and Giró-i-Nieto, X., "NII-HITACHI-UIT at TRECVID 2015 Instance Search", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S., O'Connor, N., and Smeaton, A. F., "Insight DCU at TRECVID 2015", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

Page 40: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

Optical Flow

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 40

Optical Flow Small vs Large

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 41

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 42

Optical FlowClassic approachRigid matching of HoG or SIFT descriptors

Deep MatchingAllow each subpatch to move

independently in a limited range

depending on its size

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 43

Optical Flow Deep Matching

Source Matlab R2015b documentation for normxcorr2 by Mathworks44

Optical Flow 2D correlation

Image

Sub-Image

Offset of the sub-image with respect to the image [00]

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 45

Instead of pre-trained filters a convolution is defined between each

patch of the reference image target image

as a results a correlation map is generated for each reference patch

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 46

Optical Flow Deep Matching

The most discriminative response map

The less discriminative

response map

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 47

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching

(TD)Bottom-upextraction

(BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 48

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Bottom-upextraction

(BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 49

Optical Flow Deep Matching (BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 50

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching (TD)

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching

(TD)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 51

Optical Flow Deep Matching (TD)Each local maxima in the top layer corresponds to a shift of one of the biggest (32x32) patchesIf we focus on local maximum we can retrieve the corresponding responses one scale below and focus on shift of the sub-patches that generated it

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 52

Optical Flow Deep Matching (TD)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 53

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 54

Ground truth

Dense HOG[Brox amp Malik 2011]

Deep Matching

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 55

Optical Flow Deep Matching

Optical Flow

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 56

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 57

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 58

End to end supervised learning of optical flow

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 59

Option A Stack both input images together and feed them through a generic network

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 60

Option B Create two separate yet identical processing streams for the two images and combine them at a later stage

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 61

Option B Create two separate yet identical processing streams for the two images and combine them at a later stage

Correlation layer Convolution of data patches from the layers to combine

Optical Flow FlowNet (expanding)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 62

Upconvolutional layers Unpooling features maps + convolutionUpconvolutioned feature maps are concatenated with the corresponding map from the contractive part

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 63

Since existing ground truth datasets are not sufficiently large to train a Convnet a synthetic Flying Dataset is generatedhellip and augmented (translation rotation scaling transformations additive Gaussian noise changes in brightness contrast gamma and color)

Convnets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI

Data augmentation

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 64

Optical Flow FlowNet

Object tracking MDNet

65Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Object tracking MDNet

66Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Object tracking MDNet Architecture

67Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Domain-specific layers are used during training for each sequence but are replaced by a single one at test time

Object tracking MDNet Online update

68Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

MDNet is updated online at test time with hard negative mining that is selecting negative samples with the highest positive score

Object tracking FCNT

69Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Object tracking FCNT

70Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Focus on conv4-3 and conv5-3 of VGG-16 network pre-trained for ImageNet image classification

conv4-3 conv5-3

Object tracking FCNT Specialization

71Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence

Object tracking FCNT Localization

72Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Although trained for image classification feature maps in conv5-3 enable object localizationhellipbut is not discriminative enough to different objects of the same category

Object tracking Localization

73Zhou Bolei Aditya Khosla Agata Lapedriza Aude Oliva and Antonio Torralba Object detectors emerge in deep scene cnns ICLR 2015

[Zhou et al ICLR 2015] ldquoObject detectors emerge in deep scene CNNsrdquo [Slides from ReadCV]

Object tracking FCNT Localization

74Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

On the other hand feature maps from conv4-3 are more sensitive to intra-class appearance variationhellip

conv4-3 conv5-3

Object tracking FCNT Architecture

75Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

SNet=Specific Network (online update)

GNet=General Network (fixed)

Object tracking FCNT Results

76Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

ConvNets Software

Caffe httpcaffeberkeleyvisionorg

Torch (Overfeat) httptorchch

Theano httpdeeplearningnetsoftwaretheano

Tensor Flow httpswwwtensorfloworg

MatconvNet (VLFeat) httpwwwvlfeatorgmatconvnet

CNTK (Mcrosoft) httpwwwcntkai 77

Seminar Series Compacting ConvNets

for End to End Learning

Tuesday February 2 4pm

D5-010 Campus Nord

ConvNets Learn more

78

Jose M Aacutelvarez

Stanford course CS231n

Convolutional Neural Networks for Visual

Recognition

ConvNets Learn more

79

ConvNets Learn more

Online course

Deep Learning

Taking machine learning to the next

level

80

ReadCV seminarFriendly reviews of SoA papers

Spring 2016

Tuesdays at 11am

ConvNets Learn more

81

Barcelona Convolucionada

Deep Learning a lrsquoabast de tothom

Monday February 1 7pm FIB Campus Nord UPC

ConvNets Learn more

82

Grup drsquoestudi de machine learning Barcelona

Summer course: Deep Learning for Computer Vision

(2.5 ECTS for MSc & PhD)

July 4-8, 3-7pm

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, “Deep Learning: Past, Present & Future”

“Machine learning” sub-Reddit

Check profile requirements for Summer internship (disclaimer: offered to PhD students by default)

Company      Avg. salary / hour    Avg. salary / month
Yahoo        $43                   ($43 x 160 = $6,880)
Apple        $37                   ($37 x 160 = $5,920)
Google       $29.54-$31.32         $7,151
Facebook     $22.92                $6,150-$7,378
Microsoft    $22.63                $6,506-$7,171

Source: Glassdoor.com (internships in California. No stipends included)

Video: Cristian Canton’s talk “From Catalonia to America: notes on how to achieve a successful post-PhD career”, ACMCV 2015 & UPC

Li Fei-Fei, “How we’re teaching computers to understand pictures”, TEDTalks 2014

Jeremy Howard, “The wonderful and terrifying implications of computers that can learn”, TEDTalks 2014

ConvNets: Discussion

Is Computer Vision solved?

Neil Lawrence, “OpenAI won’t benefit humanity without open data sharing” (The Guardian, 14/12/2015)

Sports: Do you know them?

ConvNets: Do you know them?

Antonio Torralba, MIT (former UPC)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

and MANY MORE I am missing in the page (apologies)

ConvNets: Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution    Last deadlines (on 28/1/2016)
FI (Catalonia)         22/09/2015
FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web

Our past research: Image Classification

A. Salvador, M. Zeppelzauer, D. Manchon-Vizuete, A. Calafell-Orós, and X. Giró-i-Nieto, “Cultural Event Recognition with Visual ConvNets and Temporal Models”, in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

ChaLearn Workshop

Our current research: Saliency Prediction

J. Pan and X. Giró-i-Nieto, “End-to-end Convolutional Network for Saliency Prediction”, in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

LSUN Challenge

Our current research: Sentiment Analysis

V. Campos, A. Salvador, B. Jou, and X. Giró-i-Nieto, “Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction”, in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015. [Slides]

Our current research: Instance Search in Video

V.-T. Nguyen, D.-D. Le, A. Salvador, C. Zhu, D.-L. Nguyen, M.-T. Tran, T. Ngo Duc, D. Anh Duong, S. Satoh, and X. Giró-i-Nieto, “NII-HITACHI-UIT at TRECVID 2015 Instance Search”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, E. Mohedano, A. Salvador, Z. X. Zhang, M. Marsden, P. Wang, I. Jargalsaikhan, J. Antony, X. Giró-i-Nieto, S. Satoh, N. O’Connor, and A. F. Smeaton, “Insight DCU at TRECVID 2015”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

Page 41: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

Optical Flow: Small vs Large

Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE.

Optical Flow

Classic approach: Rigid matching of HoG or SIFT descriptors

Deep Matching: Allow each subpatch to move independently in a limited range depending on its size

Optical Flow: Deep Matching

Optical Flow: 2D correlation

Source: Matlab R2015b documentation for normxcorr2 by Mathworks

Image

Sub-Image

Offset of the sub-image with respect to the image [0,0]
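The 2D correlation idea can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function names `normalized_xcorr` and `best_offset` and the toy arrays are made up for this example, and, unlike Matlab's normxcorr2, only offsets where the sub-image lies fully inside the image are evaluated.

```python
import math

def normalized_xcorr(template, image):
    """Normalized cross-correlation of `template` at every valid offset of `image`.

    Both inputs are lists of rows. Entry [dy][dx] of the result is the
    correlation coefficient (between -1 and 1) when the top-left corner of the
    template is placed at row dy, column dx of the image.
    """
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    t_vals = [v for row in template for v in row]
    t_mean = sum(t_vals) / len(t_vals)
    t_dev = [v - t_mean for v in t_vals]
    t_norm = math.sqrt(sum(d * d for d in t_dev))

    scores = []
    for dy in range(ih - th + 1):
        row_scores = []
        for dx in range(iw - tw + 1):
            w_vals = [image[dy + y][dx + x] for y in range(th) for x in range(tw)]
            w_mean = sum(w_vals) / len(w_vals)
            w_dev = [v - w_mean for v in w_vals]
            w_norm = math.sqrt(sum(d * d for d in w_dev))
            denom = t_norm * w_norm
            # Flat windows have zero variance and carry no structure: score 0.
            score = sum(a * b for a, b in zip(t_dev, w_dev)) / denom if denom > 0 else 0.0
            row_scores.append(score)
        scores.append(row_scores)
    return scores

def best_offset(scores):
    """Return the (row, column) offset with the highest correlation."""
    positions = [(dy, dx) for dy in range(len(scores)) for dx in range(len(scores[0]))]
    return max(positions, key=lambda p: scores[p[0]][p[1]])

# Toy data: the sub-image was cut out of the image at row 1, column 2.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 2, 0],
    [0, 0, 3, 9, 0],
    [0, 0, 0, 0, 0],
]
sub_image = [[1, 2], [3, 9]]
scores = normalized_xcorr(sub_image, image)
offset = best_offset(scores)
```

In this toy case the peak of the correlation surface sits at the offset where the sub-image was extracted, which is the property Deep Matching relies on when it correlates patches of one frame against the other.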

Optical Flow: Deep Matching

Instead of pre-trained filters, a convolution is defined between each patch of the reference image and the target image. As a result, a correlation map is generated for each reference patch.

Optical Flow: Deep Matching

The most discriminative response map

The least discriminative response map

Optical Flow: Deep Matching

Key idea: Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Bottom-up extraction (BU)

Top-down matching (TD)

Optical Flow: Deep Matching (BU)

Bottom-up extraction (BU): correlation maps are computed for the 4x4 patches first and then aggregated into maps for the 8x8, 16x16 and 32x32 patches.
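One bottom-up step can be sketched as follows. This is a simplified illustration, not the paper's exact procedure: the helper names (`peak_map`, `max_pool`, `aggregate`), the top-left-corner coordinate convention, the stride-1 pooling and the plain averaging are all assumptions made for this example, and the subsampling and non-linearity of the original method are left out.

```python
def peak_map(size, peak):
    """Toy correlation map: zero everywhere except a 1.0 at `peak`."""
    grid = [[0.0] * size for _ in range(size)]
    grid[peak[0]][peak[1]] = 1.0
    return grid

def max_pool(corr, radius):
    """Max over a (2*radius+1)^2 neighbourhood, stride 1, clipped at borders."""
    h, w = len(corr), len(corr[0])
    return [
        [
            max(
                corr[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]

def aggregate(children, radius=1):
    """Correlation map of a parent patch from the maps of its four children.

    `children` maps each child's offset inside the parent (row, col) to its
    correlation map. Pooling lets every child move by up to `radius` pixels
    around the position a rigid shift of the parent would give it.
    """
    pooled = {off: max_pool(corr, radius) for off, corr in children.items()}
    first = next(iter(children.values()))
    h, w = len(first), len(first[0])
    parent = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for (oy, ox), pm in pooled.items():
                yy, xx = y + oy, x + ox
                if yy < h and xx < w:  # children falling outside contribute 0
                    total += pm[yy][xx]
            parent[y][x] = total / len(pooled)
    return parent

# Four 2x2 children of a 4x4 parent. Three peak where a rigid shift of the
# parent to (1, 1) predicts; the last one is displaced by one extra row.
children = {
    (0, 0): peak_map(6, (1, 1)),
    (0, 2): peak_map(6, (1, 3)),
    (2, 0): peak_map(6, (3, 1)),
    (2, 2): peak_map(6, (4, 3)),
}
deformable = aggregate(children, radius=1)
rigid = aggregate(children, radius=0)
```

With `radius=0` the parent behaves like a rigid descriptor and loses the displaced child, while `radius=1` still collects the full response. Repeating the step on the resulting maps would give the next level of the pyramid.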

Optical Flow: Deep Matching (TD)

Top-down matching (TD)

Each local maximum in the top layer corresponds to a shift of one of the biggest (32x32) patches. If we focus on a local maximum, we can retrieve the corresponding responses one scale below and focus on the shifts of the sub-patches that generated it.
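A single top-down step might look like the sketch below. It assumes toy correlation maps stored as nested lists, a top-left-corner coordinate convention, and a search radius of one pixel; the names `peak_map` and `backtrack` are hypothetical and chosen only for this illustration.

```python
def peak_map(size, peak):
    """Toy correlation map: zero everywhere except a 1.0 at `peak`."""
    grid = [[0.0] * size for _ in range(size)]
    grid[peak[0]][peak[1]] = 1.0
    return grid

def backtrack(parent_pos, children, radius=1):
    """Given a parent match at `parent_pos`, find where each child matched.

    `children` maps a child's offset inside the parent to its correlation map.
    Each child is searched only within `radius` pixels of the position that a
    rigid shift of the parent would assign to it.
    """
    y, x = parent_pos
    matches = {}
    for (oy, ox), corr in children.items():
        h, w = len(corr), len(corr[0])
        cy, cx = y + oy, x + ox  # rigid prediction for this child
        candidates = [
            (yy, xx)
            for yy in range(max(0, cy - radius), min(h, cy + radius + 1))
            for xx in range(max(0, cx - radius), min(w, cx + radius + 1))
        ]
        if candidates:  # a child predicted far outside the map gets no match
            matches[(oy, ox)] = max(candidates, key=lambda p: corr[p[0]][p[1]])
    return matches

# Four 2x2 children of a 4x4 parent matched at (1, 1); the last child moved
# one row further than the rigid prediction (3, 3).
children = {
    (0, 0): peak_map(6, (1, 1)),
    (0, 2): peak_map(6, (1, 3)),
    (2, 0): peak_map(6, (3, 1)),
    (2, 2): peak_map(6, (4, 3)),
}
matches = backtrack((1, 1), children)
```

Applying the same step recursively to every recovered child would walk down the pyramid until the smallest patches are reached, yielding one correspondence per small patch.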

Optical Flow: Deep Matching

Ground truth

Dense HOG [Brox & Malik 2011]

Deep Matching

Optical Flow

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D., and Brox, T. 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).

Optical Flow: FlowNet

End to end supervised learning of optical flow

Optical Flow: FlowNet (contracting)

Option A: Stack both input images together and feed them through a generic network
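Stacking amounts to concatenating the two frames along the channel axis, so that two RGB images become a single 6-channel input. A minimal sketch, assuming frames stored as nested lists of shape height x width x channels (the function name `stack_frames` and the pixel values are made up for illustration):

```python
def stack_frames(frame_a, frame_b):
    """Concatenate two H x W x C frames along the channel axis (H x W x 2C)."""
    if len(frame_a) != len(frame_b) or len(frame_a[0]) != len(frame_b[0]):
        raise ValueError("frames must share height and width")
    return [
        [pix_a + pix_b for pix_a, pix_b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]

# Two tiny 1x2 RGB frames.
frame_t0 = [[[255, 0, 0], [0, 255, 0]]]
frame_t1 = [[[250, 5, 0], [0, 250, 5]]]
stacked = stack_frames(frame_t0, frame_t1)
```

The network that consumes `stacked` then has to discover by itself how the first three channels relate to the last three, since nothing in the input tells it they are two views of the same scene.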

Option B: Create two separate, yet identical processing streams for the two images and combine them at a later stage

Correlation layer: Convolution of data patches from the layers to combine
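The correlation layer can be illustrated with a small sketch. It makes several simplifying assumptions: the compared patches are reduced to single feature vectors (1x1 patches), the maximum displacement is one pixel, positions outside the map are zero-padded, and the name `correlation_layer` is chosen for this example only.

```python
def correlation_layer(feat_a, feat_b, max_disp=1):
    """Compare each feature vector of `feat_a` with its neighbours in `feat_b`.

    feat_a, feat_b: H x W x C nested lists from the two processing streams.
    Returns (out, disps) where out[y][x][i] is the dot product between
    feat_a[y][x] and feat_b at the position displaced by disps[i].
    """
    h, w = len(feat_a), len(feat_a[0])
    disps = [
        (dy, dx)
        for dy in range(-max_disp, max_disp + 1)
        for dx in range(-max_disp, max_disp + 1)
    ]
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vec_a = feat_a[y][x]
            scores = []
            for dy, dx in disps:
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:
                    vec_b = feat_b[yy][xx]
                    scores.append(float(sum(p * q for p, q in zip(vec_a, vec_b))))
                else:
                    scores.append(0.0)  # zero padding outside the feature map
            row.append(scores)
        out.append(row)
    return out, disps

def one_hot_map(size, active):
    """size x size x 2 feature map with [1, 0] at `active`, zeros elsewhere."""
    return [
        [[1.0, 0.0] if (y, x) == active else [0.0, 0.0] for x in range(size)]
        for y in range(size)
    ]

# The active feature moves one pixel to the right between the two streams.
feat_a = one_hot_map(3, (1, 1))
feat_b = one_hot_map(3, (1, 2))
out, disps = correlation_layer(feat_a, feat_b)
best = max(range(len(disps)), key=lambda i: out[1][1][i])
best_disp = disps[best]
```

Unlike a regular convolution, this operation has no trainable weights: one stream's features act as the filter applied to the other stream, and the output channels index candidate displacements.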

Optical Flow: FlowNet (expanding)

Upconvolutional layers: Unpooling feature maps + convolution. Upconvolutioned feature maps are concatenated with the corresponding map from the contractive part.
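The unpooling and concatenation steps of the expanding part can be sketched as below. This is an illustration under assumptions: unpooling is implemented as nearest-neighbour replication by a factor of two, the learned convolution that follows it is left out, and the names `unpool2x` and `concat_channels` are hypothetical.

```python
def unpool2x(fmap):
    """Nearest-neighbour 2x unpooling of an H x W x C map into 2H x 2W x C."""
    out = []
    for row in fmap:
        wide = [list(pix) for pix in row for _ in range(2)]  # repeat columns
        out.append(wide)
        out.append([list(pix) for pix in wide])  # repeat the widened row
    return out

def concat_channels(map_a, map_b):
    """Concatenate two feature maps of equal height and width along channels."""
    if len(map_a) != len(map_b) or len(map_a[0]) != len(map_b[0]):
        raise ValueError("feature maps must share height and width")
    return [
        [pix_a + pix_b for pix_a, pix_b in zip(row_a, row_b)]
        for row_a, row_b in zip(map_a, map_b)
    ]

# Coarse 1x2 map from the end of the contracting part (1 channel) and the
# finer 2x4 map saved earlier in the contracting part (1 channel).
coarse = [[[1.0], [2.0]]]
skip = [
    [[0.1], [0.2], [0.3], [0.4]],
    [[0.5], [0.6], [0.7], [0.8]],
]
upsampled = unpool2x(coarse)
merged = concat_channels(upsampled, skip)
```

The concatenation gives the next layer both the coarse, high-level estimate and the fine local detail from the contracting part, which is what allows the predicted flow to regain resolution.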

Optical Flow: FlowNet

Data augmentation

Since existing ground truth datasets are not sufficiently large to train a Convnet, a synthetic Flying Chairs dataset is generated… and augmented (translation, rotation and scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma and color).

Convnets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI.
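The photometric part of such an augmentation could be sketched as follows. The parameter ranges, the grayscale toy frames in [0, 1] and the function name `augment_pair` are assumptions made for this example and are not taken from the paper.

```python
import random

def augment_pair(frame_a, frame_b, rng):
    """Photometric augmentation of a grayscale frame pair with values in [0, 1].

    Brightness, contrast and gamma are drawn once and shared by both frames;
    Gaussian noise is drawn independently for every pixel.
    """
    brightness = rng.uniform(-0.2, 0.2)   # hypothetical range
    contrast = rng.uniform(0.8, 1.2)      # hypothetical range
    gamma = rng.uniform(0.7, 1.5)         # hypothetical range
    noise_sigma = rng.uniform(0.0, 0.04)  # hypothetical range

    def clamp(v):
        return min(1.0, max(0.0, v))

    def transform(frame):
        out = []
        for row in frame:
            new_row = []
            for v in row:
                v = clamp((v - 0.5) * contrast + 0.5 + brightness)
                v = v ** gamma
                v = clamp(v + rng.gauss(0.0, noise_sigma))
                new_row.append(v)
            out.append(new_row)
        return out

    return transform(frame_a), transform(frame_b)

frame_a = [[0.0, 0.25, 0.5], [0.75, 1.0, 0.5]]
frame_b = [[0.25, 0.5, 0.0], [1.0, 0.5, 0.75]]
aug_a, aug_b = augment_pair(frame_a, frame_b, random.Random(0))
```

Photometric changes leave the ground-truth flow untouched. Geometric transformations such as translation, rotation or scaling would also have to be applied to the flow field, which is why they are not covered by this sketch.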

Object tracking: MDNet

Nam, Hyeonseob, and Bohyung Han. Learning multi-domain convolutional neural networks for visual tracking. ICCV VOT Workshop (2015).

Object tracking: MDNet Architecture

Domain-specific layers are used during training for each sequence, but are replaced by a single one at test time.
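The split between shared and domain-specific layers can be pictured with a toy structure. Everything here is a stand-in chosen for illustration: the class `MultiDomainNet`, its single shared layer, the random weights and the sequence names are assumptions, and no training is performed.

```python
import random

class MultiDomainNet:
    """Toy model: one shared layer plus one scoring head per domain (sequence)."""

    def __init__(self, num_features, domains, seed=0):
        self.rng = random.Random(seed)
        self.num_features = num_features
        # Shared weights, meant to be learned from all training sequences.
        self.shared = [self.rng.uniform(-1, 1) for _ in range(num_features)]
        # One domain-specific head per training sequence.
        self.heads = {name: self._new_head() for name in domains}

    def _new_head(self):
        return [self.rng.uniform(-1, 1) for _ in range(self.num_features)]

    def score(self, features, domain):
        """Target-vs-background score of one candidate under a given head."""
        hidden = [max(0.0, w * f) for w, f in zip(self.shared, features)]
        return sum(w * h for w, h in zip(self.heads[domain], hidden))

    def start_tracking(self, name="test_sequence"):
        """Drop all training heads and attach a single fresh one."""
        self.heads = {name: self._new_head()}

net = MultiDomainNet(4, ["seq_a", "seq_b", "seq_c"])
shared_before = list(net.shared)
heads_during_training = sorted(net.heads)
net.start_tracking()
test_score = net.score([1.0, 2.0, 3.0, 4.0], "test_sequence")
```

The shared weights survive the switch to test time while every training head is discarded, which mirrors the idea that generic tracking knowledge lives in the shared layers and sequence-specific knowledge in the heads.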

Object tracking: MDNet Online update

MDNet is updated online at test time with hard negative mining, that is, selecting negative samples with the highest positive score.
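The selection step of hard negative mining is essentially a sort. A minimal sketch, assuming each negative candidate already carries the positive score assigned by the current model (the function name `mine_hard_negatives`, the sample identifiers and the scores are made up for illustration):

```python
def mine_hard_negatives(negatives, num_keep):
    """Keep the `num_keep` negatives that the model scores most like the target.

    negatives: list of (sample_id, positive_score) pairs for samples known to
    be background.
    """
    ranked = sorted(negatives, key=lambda item: item[1], reverse=True)
    return ranked[:num_keep]

negatives = [("n0", 0.05), ("n1", 0.92), ("n2", 0.40), ("n3", 0.77)]
hard = mine_hard_negatives(negatives, num_keep=2)
```

Training on these confusing background samples, rather than on easy ones the model already rejects, focuses each online update on the distractors most likely to cause drift.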

Object tracking: FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. Visual Tracking with Fully Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Focus on conv4-3 and conv5-3 of the VGG-16 network pre-trained for ImageNet image classification

conv4-3 conv5-3

Object tracking: FCNT Specialization

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence.

Object tracking: FCNT Localization

Although trained for image classification, feature maps in conv5-3 enable object localization… but they are not discriminative enough to distinguish different objects of the same category.


97

Our past research

A Salvador Zeppelzauer M Manchon-Vizuete D Calafell-Oroacutes A and Giroacute-i-Nieto X ldquoCultural Event Recognition with Visual ConvNets and Temporal Modelsrdquo in CVPR ChaLearn Looking at People Workshop 2015 2015 [slides]

ChaLearn Worshop

Saliency Prediction

J Pan and Giroacute-i-Nieto X ldquoEnd-to-end Convolutional Network for Saliency Predictionrdquo in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops Boston MA (USA) 2015 [Slides] 98

Our current research

LSUN Challenge

Sentiment Analysis

99

Our current research

[Slides]

CNN

V Campos Salvador A Jou B and Giroacute-i-Nieto X ldquoDiving Deep into Sentiment Understanding Fine-tuned CNNs for Visual Sentiment Predictionrdquo in 1st International Workshop on Affect and Sentiment in Multimedia Brisbane Australia 2015

Our current research

Instance Search in Video

100

V - T Nguyen -Dinh-Le D Salvador A -Zhu C Nguyen D - L Tran M - T Duc T Ngo Duong D Anh Satoh S ichi and Giroacute-i-Nieto X ldquoNII-HITACHI-UIT at TRECVID 2015 Instance Searchrdquo in TRECVID 2015 Workshop Gaithersburg MD USA 2015

K McGuinness Mohedano E Salvador A Zhang Z X Marsden M Wang P Jargalsaikhan I Antony J Giroacute-i-Nieto X Satoh S ichi OConnor N and Smeaton A F ldquoInsight DCU at TRECVID 2015rdquo in TRECVID 2015 Workshop Gaithersburg MD USA 2015

Thank you

Slides available on and

httpsimatgeupceduwebpeoplexavier-giro

httpbitsearchblogspotcom

httpstwittercomDocXavi

httpswwwfacebookcomProfessorXavi

xaviergiroupcedu

101

Page 43: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

Optical Flow: Deep Matching

Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE.

Optical Flow: 2D correlation

Figure labels: Image; Sub-Image; Offset of the sub-image with respect to the image [0,0].

Source: Matlab R2015b documentation for normxcorr2, by MathWorks.
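As a rough illustration of the 2D correlation on this slide, the sketch below slides a sub-image over an image and reports the offset with the highest normalized cross-correlation. It is plain Python written for this note, not the normxcorr2 implementation: the names `ncc` and `best_offset` are made up for the example, and only fully overlapping placements are scored (normxcorr2 also scores partial overlaps).

```python
import math


def ncc(image, template, top, left):
    """Normalized cross-correlation of template with the image window at (top, left)."""
    th, tw = len(template), len(template[0])
    window = [image[top + r][left + c] for r in range(th) for c in range(tw)]
    patch = [template[r][c] for r in range(th) for c in range(tw)]
    mean_w = sum(window) / len(window)
    mean_p = sum(patch) / len(patch)
    num = sum((w - mean_w) * (p - mean_p) for w, p in zip(window, patch))
    den = math.sqrt(sum((w - mean_w) ** 2 for w in window)
                    * sum((p - mean_p) ** 2 for p in patch))
    # A flat window or template has zero variance; score it as 0.
    return num / den if den > 0 else 0.0


def best_offset(image, template):
    """Return ((row, col), score) of the best fully overlapping placement."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = (None, -2.0)  # NCC lies in [-1, 1], so -2 is below any real score
    for top in range(ih - th + 1):
        for left in range(iw - tw + 1):
            score = ncc(image, template, top, left)
            if score > best[1]:
                best = ((top, left), score)
    return best


image = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 0, 0],
    [0, 3, 4, 0, 0],
    [0, 0, 0, 0, 0],
]
sub_image = [[1, 2], [3, 4]]
offset, score = best_offset(image, sub_image)
print(offset, round(score, 3))
```

In this toy input the sub-image was cut from row 1, column 1 of the image, so that is the offset the search should recover.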

Optical Flow: Deep Matching

Instead of using pre-trained filters, a convolution is defined between each patch of the reference image and the target image. As a result, a correlation map is generated for each reference patch.

Figure labels: the most discriminative response map; the least discriminative response map.

Key idea: build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search.

Pyramid levels: 4x4 patches, 8x8 patches, 16x16 patches and 32x32 patches, linked by bottom-up extraction (BU) and top-down matching (TD).

Optical Flow: Deep Matching (BU)

Bottom-up extraction (BU): the pyramid of correlation maps is built from the 4x4 patches up through the 8x8, 16x16 and 32x32 patches.
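One way to picture a single bottom-up step is to average the correlation maps of four child patches into the map of the parent patch they tile. The sketch below does only that, under the assumption that a parent placed at (y, x) has its children at (y, x), (y, x + n), (y + n, x) and (y + n, x + n); the further processing used in the DeepFlow paper (for example max-pooling and subsampling between levels) is left out, and the names `aggregate` and `peak_map` are invented for this note.

```python
def aggregate(child_maps, child_size):
    """Average four child correlation maps into one parent map.

    child_maps: [top_left, top_right, bottom_left, bottom_right], each a 2D
    list where entry [y][x] scores placing that child patch at (y, x) in the
    target image. n = child_size is the side of a child patch.
    """
    n = child_size
    shifts = [(0, 0), (0, n), (n, 0), (n, n)]
    # Only parent positions whose four children stay inside the maps are kept.
    height = len(child_maps[0]) - n
    width = len(child_maps[0][0]) - n
    parent = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = sum(m[y + dy][x + dx] for m, (dy, dx) in zip(child_maps, shifts))
            parent[y][x] = total / 4.0
    return parent


def peak_map(size, y, x):
    """Toy correlation map: zero everywhere except a single peak of 1.0."""
    grid = [[0.0] * size for _ in range(size)]
    grid[y][x] = 1.0
    return grid


# Four children whose peaks agree on a parent placed at (1, 2).
children = [peak_map(4, 1, 2), peak_map(4, 1, 3), peak_map(4, 2, 2), peak_map(4, 2, 3)]
parent = aggregate(children, child_size=1)
print(parent[1][2])
```

Because the four toy peaks are mutually consistent, the parent map reaches its maximum exactly where all children agree.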

Optical Flow: Deep Matching (TD)

Top-down matching (TD): the search runs from the 32x32 patches down to the 4x4 patches.

Each local maximum in the top layer corresponds to a shift of one of the biggest (32x32) patches. If we focus on a local maximum, we can retrieve the corresponding responses one scale below and focus on the shift of the sub-patches that generated it.
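The first part of the top-down step, finding local maxima in a response map and moving to the positions one scale below, can be sketched as follows. This is a simplified illustration: `local_maxima` and `child_positions` are hypothetical helpers, and the children are assumed to sit at fixed positions relative to the parent, whereas the actual method lets them move slightly.

```python
def local_maxima(response, threshold=0.0):
    """Return [(score, y, x)] for entries greater than all their neighbours, best first."""
    height, width = len(response), len(response[0])
    peaks = []
    for y in range(height):
        for x in range(width):
            value = response[y][x]
            if value <= threshold:
                continue
            neighbours = [
                response[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            ]
            if all(value > other for other in neighbours):
                peaks.append((value, y, x))
    return sorted(peaks, reverse=True)


def child_positions(y, x, child_size):
    """Positions of the four sub-patches of a parent placed at (y, x)."""
    n = child_size
    return [(y, x), (y, x + n), (y + n, x), (y + n, x + n)]


top_level = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.1, 0.4, 0.5],
]
peaks = local_maxima(top_level)
for score, y, x in peaks:
    # A 32x32 parent splits into four 16x16 children one scale below.
    print(score, (y, x), child_positions(y, x, child_size=16))
```

Each peak would then be refined by looking at the four child response maps around the listed positions, and so on down to the 4x4 patches.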

Optical Flow: Deep Matching

Figure panels: ground truth, dense HOG [Brox & Malik 2011], Deep Matching.

Optical Flow: FlowNet

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D., & Brox, T. (2015). FlowNet: Learning Optical Flow with Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).

End-to-end supervised learning of optical flow.

Optical Flow: FlowNet (contracting)

Option A: stack both input images together and feed them through a generic network.

Option B: create two separate yet identical processing streams for the two images and combine them at a later stage.

Correlation layer: a convolution between data patches of the two streams' feature maps, used to combine them.
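A minimal sketch of such a correlation layer, assuming 1x1 patches (a single feature vector per location) and a small displacement range, is given below. The function name `correlation_layer`, the nested-list layout and the zero padding are choices made for this example; the layer described in the FlowNet paper works on larger patches with strides.

```python
def correlation_layer(feat_a, feat_b, max_disp=1):
    """Correlate each location of feat_a with nearby locations of feat_b.

    feat_a, feat_b: feature maps as nested lists indexed [y][x][channel].
    Returns out[y][x] = list of dot products, one per displacement (dy, dx)
    with |dy|, |dx| <= max_disp, in row-major order. Displacements that fall
    outside feat_b contribute 0 (zero padding).
    """
    height, width = len(feat_a), len(feat_a[0])
    disps = [(dy, dx)
             for dy in range(-max_disp, max_disp + 1)
             for dx in range(-max_disp, max_disp + 1)]
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            scores = []
            for dy, dx in disps:
                yy, xx = y + dy, x + dx
                if 0 <= yy < height and 0 <= xx < width:
                    scores.append(sum(a * b for a, b in zip(feat_a[y][x], feat_b[yy][xx])))
                else:
                    scores.append(0.0)
            row.append(scores)
        out.append(row)
    return out


# A feature that sits at x = 0 in the first map and at x = 1 in the second.
feat_a = [[[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]]
feat_b = [[[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]]
out = correlation_layer(feat_a, feat_b, max_disp=1)
scores = out[0][0]
print(scores.index(max(scores)))  # position of the strongest displacement
```

With `max_disp=1` there are nine displacements per location, and the strongest response at location (0, 0) is the one for displacement (0, +1), which matches the one-pixel shift between the two toy maps.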

Optical Flow: FlowNet (expanding)

Upconvolutional layers: unpooling of the feature maps followed by a convolution. The upconvolved feature maps are concatenated with the corresponding feature map from the contracting part.
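The unpool, convolve and concatenate sequence can be sketched on tiny nested lists. This is an illustration under simplifying assumptions: unpooling is nearest-neighbour 2x, one fixed 3x3 kernel is applied to every channel separately (a real layer learns kernels that mix channels), and the helper names are invented for the example.

```python
def unpool2x(channel):
    """Nearest-neighbour 2x unpooling of one 2D channel."""
    out = []
    for row in channel:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def conv3x3(channel, kernel):
    """3x3 filtering of one 2D channel with zero padding (same output size)."""
    height, width = len(channel), len(channel[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    yy, xx = y + ky - 1, x + kx - 1
                    if 0 <= yy < height and 0 <= xx < width:
                        acc += kernel[ky][kx] * channel[yy][xx]
            out[y][x] = acc
    return out


def upconv_block(coarse, skip, kernel):
    """Unpool and filter each coarse channel, then append the skip channels."""
    upsampled = [conv3x3(unpool2x(ch), kernel) for ch in coarse]
    return upsampled + skip


coarse = [[[1.0, 2.0], [3.0, 4.0]]]            # one 2x2 channel from the expanding part
skip = [[[0.5] * 4 for _ in range(4)]]         # one 4x4 channel from the contracting part
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]   # kernel that leaves the input unchanged
result = upconv_block(coarse, skip, identity)
print(len(result), len(result[0]), len(result[0][0]))
```

The output has the spatial size of the skip connection and one channel more than the skip input, which is the concatenation the slide describes.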

Optical Flow: FlowNet

Data augmentation: since existing ground truth datasets are not sufficiently large to train a ConvNet, a synthetic Flying Chairs dataset is generated and augmented (translation, rotation and scaling transformations, additive Gaussian noise, and changes in brightness, contrast, gamma and color).

ConvNets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI.
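The photometric part of such an augmentation (brightness, contrast, gamma and additive Gaussian noise) might look like the sketch below. The parameter ranges are illustrative guesses rather than the values used to train FlowNet, the helper names are hypothetical, and the geometric transformations are left out because they would also require transforming the ground truth flow.

```python
import random


def sample_params(rng):
    """Draw one set of photometric parameters (ranges are assumptions)."""
    return {
        "brightness": rng.uniform(-0.1, 0.1),
        "contrast": rng.uniform(0.8, 1.2),
        "gamma": rng.uniform(0.7, 1.5),
    }


def apply_photometric(image, params, rng, noise_sigma=0.02):
    """Apply contrast, brightness and gamma, then add Gaussian noise.

    image: 2D list of grey levels in [0, 1]; the result is clipped to [0, 1].
    """
    out = []
    for row in image:
        new_row = []
        for value in row:
            v = (value - 0.5) * params["contrast"] + 0.5 + params["brightness"]
            v = min(max(v, 0.0), 1.0) ** params["gamma"]
            v += rng.gauss(0.0, noise_sigma)
            new_row.append(min(max(v, 0.0), 1.0))
        out.append(new_row)
    return out


def augment_pair(frame_a, frame_b, flow, rng):
    """Use the same photometric parameters for both frames; the flow is unchanged."""
    params = sample_params(rng)
    aug_a = apply_photometric(frame_a, params, rng)
    aug_b = apply_photometric(frame_b, params, rng)
    return aug_a, aug_b, flow


rng = random.Random(0)
frame_a = [[0.0, 0.5], [1.0, 0.25]]
frame_b = [[0.5, 0.0], [0.25, 1.0]]
flow = [[(0, 1), (0, -1)], [(0, 1), (0, -1)]]
aug_a, aug_b, same_flow = augment_pair(frame_a, frame_b, flow, rng)
print(aug_a)
```

Sharing the brightness, contrast and gamma parameters between the two frames keeps the pair photometrically consistent, while the noise is drawn independently per pixel.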


Object tracking: MDNet

Nam, Hyeonseob, and Bohyung Han. Learning multi-domain convolutional neural networks for visual tracking. ICCV VOT Workshop (2015).

Object tracking: MDNet Architecture

Domain-specific layers are used during training, one for each sequence, but are replaced by a single one at test time.
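The split between shared layers and per-sequence branches can be illustrated with a toy linear model. Everything here (the class name `MultiDomainNet`, the layer sizes, the random initialisation) is an assumption made for the sketch; it only shows the bookkeeping of keeping the shared weights and swapping the branches, not the MDNet architecture or its training.

```python
import random


class MultiDomainNet:
    """Toy model: one shared layer followed by one scoring branch per domain."""

    def __init__(self, num_inputs, num_features, num_domains, seed=0):
        self.rng = random.Random(seed)
        self.num_features = num_features
        # Shared layer: reused for every training sequence and at test time.
        self.shared = [[self.rng.gauss(0, 0.1) for _ in range(num_inputs)]
                       for _ in range(num_features)]
        # One domain-specific branch per training sequence.
        self.branches = [self._new_branch() for _ in range(num_domains)]

    def _new_branch(self):
        return [self.rng.gauss(0, 0.1) for _ in range(self.num_features)]

    def features(self, x):
        # Shared linear layer followed by a ReLU.
        return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in self.shared]

    def score(self, x, domain):
        # Target-vs-background score from the branch of the given domain.
        return sum(w * f for w, f in zip(self.branches[domain], self.features(x)))

    def reset_for_test(self):
        """Drop the training branches and start a single new one for the test sequence."""
        self.branches = [self._new_branch()]


net = MultiDomainNet(num_inputs=4, num_features=3, num_domains=5)
shared_before = [row[:] for row in net.shared]
sample = [1.0, 0.5, -0.5, 2.0]
train_score = net.score(sample, domain=2)   # uses the branch of training sequence 2
net.reset_for_test()
test_score = net.score(sample, domain=0)    # only one branch remains at test time
print(len(net.branches), net.shared == shared_before)
```

The shared weights survive the reset untouched, which is the point of the design: what is common to all sequences is kept, and only the sequence-specific part is replaced.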

Object tracking: MDNet Online update

MDNet is updated online at test time with hard negative mining, that is, by selecting the negative samples with the highest positive score.
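The selection rule in that sentence fits in a few lines. In this sketch the candidate tuples and the `positive_score` callback are placeholders for whatever the tracker uses to score a background sample as if it were the target.

```python
def mine_hard_negatives(candidates, positive_score, num_keep):
    """Keep the num_keep negative samples that the model scores highest as positive."""
    ranked = sorted(candidates, key=positive_score, reverse=True)
    return ranked[:num_keep]


# Hypothetical negative samples: (label, positive score given by the current model).
negatives = [("bg_a", 0.05), ("bg_b", 0.80), ("bg_c", 0.40), ("bg_d", 0.65)]
hard = mine_hard_negatives(negatives, positive_score=lambda s: s[1], num_keep=2)
print(hard)
```

The two samples kept are the background patches the model is most likely to confuse with the target, so they are the most informative ones for the next update.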

Object tracking: FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. Visual Tracking with Fully Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Focus on the conv4-3 and conv5-3 layers of a VGG-16 network pre-trained for ImageNet image classification.

Object tracking: FCNT Specialization

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence.
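To make the observation concrete, one could rank feature maps by how much of their activation falls inside the target box and keep only the top ones. The sketch below is a stand-in heuristic written for this note, not the selection method of the FCNT paper; it assumes non-negative activations, and `foreground_ratio` and `select_maps` are hypothetical names.

```python
def foreground_ratio(feature_map, box):
    """Fraction of a map's total activation inside box = (top, left, bottom, right)."""
    top, left, bottom, right = box
    total = sum(sum(row) for row in feature_map)
    if total <= 0:
        return 0.0
    inside = sum(feature_map[y][x]
                 for y in range(top, bottom)
                 for x in range(left, right))
    return inside / total


def select_maps(feature_maps, box, num_keep):
    """Indices of the num_keep maps most concentrated on the target box."""
    order = sorted(range(len(feature_maps)),
                   key=lambda i: foreground_ratio(feature_maps[i], box),
                   reverse=True)
    return order[:num_keep]


feature_maps = [
    [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],  # fires on the background only
    [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]],  # half background, half target
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]],  # fires on the target only
]
target_box = (1, 1, 3, 3)  # rows 1-2, columns 1-2
kept = select_maps(feature_maps, target_box, num_keep=2)
print(kept)
```

Maps that respond mostly to the background are discarded, which matches the motivation on the slide for specializing the pre-trained features to the tracked object.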

Object tracking: FCNT Localization

Although trained for image classification, feature maps in conv5-3 enable object localization, but they are not discriminative enough to tell apart different objects of the same category.

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Object detectors emerge in deep scene CNNs. ICLR 2015. [Slides from ReadCV]

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation.

Object tracking: FCNT Architecture

SNet = Specific Network (online update)

GNet = General Network (fixed)

ConvNets: Software

Caffe: http://caffe.berkeleyvision.org
Torch (Overfeat): http://torch.ch
Theano: http://deeplearning.net/software/theano
TensorFlow: https://www.tensorflow.org
MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet
CNTK (Microsoft): http://www.cntk.ai

ConvNets: Learn more

Seminar Series: Compacting ConvNets for End to End Learning. Jose M. Álvarez. Tuesday, February 2, 4pm. D5-010, Campus Nord.

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition.

Online course: Deep Learning. Taking machine learning to the next level.

ReadCV seminar: friendly reviews of SoA papers. Spring 2016, Tuesdays at 11am.

Barcelona Convolucionada: "Deep Learning a l'abast de tothom" (Deep learning within everyone's reach). Monday, February 1, 7pm, FIB, Campus Nord UPC. Grup d'estudi de machine learning Barcelona (Barcelona machine learning study group).

Summer course: Deep Learning for Computer Vision (2.5 ECTS for MSc & PhD). July 4-8, 3-7pm.

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, "Deep Learning: Past, Present & Future"

"Machine learning" sub-Reddit

Check profile requirements for summer internships (disclaimer: offered to PhD students by default).

Company      Avg. salary/hour    Avg. salary/month
Yahoo        $43                 ($43 x 160 = $6,880)
Apple        $37                 ($37 x 160 = $5,920)
Google       $29.54-$31.32       $7,151
Facebook     $22.92              $6,150-$7,378
Microsoft    $22.63              $6,506-$7,171

Source: Glassdoor.com (internships in California; no stipends included).

Video: Cristian Canton's talk "From Catalonia to America: notes on how to achieve a successful post-PhD career", ACMCV 2015 & UPC.

Li Fei-Fei, "How we're teaching computers to understand pictures", TEDTalks 2014.

Jeremy Howard, "The wonderful and terrifying implications of computers that can learn", TEDTalks 2014.

Neil Lawrence, "OpenAI won't benefit humanity without open data sharing" (The Guardian, 14/12/2015).

ConvNets: Discussion

Is Computer Vision solved?

Sports: Do you know them?

ConvNets: Do you know them?

Antonio Torralba, MIT (former UPC)
Oriol Vinyals, Google (former UPC)
Jose M. Álvarez, NICTA (former URL & UAB)
Joan Bruna, Berkeley (former UPC)
...and MANY MORE I am missing on this page (apologies).

ConvNets: Where you are studying

VisioCat dinner, CVPR 2015.

Considering a PhD at GPI-UPC? Currently no direct funding is available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF.

Funding institution    Last deadline (as of 28/1/2016)
FI (Catalonia)         22/09/2015
FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web

Our past research

Image Classification (ChaLearn Workshop): A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., "Cultural Event Recognition with Visual ConvNets and Temporal Models", in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

Our current research

Saliency Prediction (LSUN Challenge): J. Pan and Giró-i-Nieto, X., "End-to-end Convolutional Network for Saliency Prediction", in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

Sentiment Analysis: V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., "Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction", in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015. [Slides]

Instance Search in Video:

V.-T. Nguyen, Le, D.-D., Salvador, A., Zhu, C.-Z., Nguyen, D.-L., Tran, M.-T., Ngo Duc, T., Duong, D. A., Satoh, S., and Giró-i-Nieto, X., "NII-HITACHI-UIT at TRECVID 2015 Instance Search", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S., O'Connor, N., and Smeaton, A. F., "Insight DCU at TRECVID 2015", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

Page 44: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

Source Matlab R2015b documentation for normxcorr2 by Mathworks44

Optical Flow 2D correlation

Image

Sub-Image

Offset of the sub-image with respect to the image [00]

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 45

Instead of pre-trained filters a convolution is defined between each

patch of the reference image target image

as a results a correlation map is generated for each reference patch

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 46

Optical Flow Deep Matching

The most discriminative response map

The less discriminative

response map

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 47

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching

(TD)Bottom-upextraction

(BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 48

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Bottom-upextraction

(BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 49

Optical Flow Deep Matching (BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 50

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching (TD)

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching

(TD)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 51

Optical Flow Deep Matching (TD)Each local maxima in the top layer corresponds to a shift of one of the biggest (32x32) patchesIf we focus on local maximum we can retrieve the corresponding responses one scale below and focus on shift of the sub-patches that generated it

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 52

Optical Flow Deep Matching (TD)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 53

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 54

Ground truth

Dense HOG[Brox amp Malik 2011]

Deep Matching

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 55

Optical Flow Deep Matching

Optical Flow

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 56

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 57

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 58

End to end supervised learning of optical flow

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 59

Option A Stack both input images together and feed them through a generic network

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 60

Option B Create two separate yet identical processing streams for the two images and combine them at a later stage

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 61

Option B Create two separate yet identical processing streams for the two images and combine them at a later stage

Correlation layer Convolution of data patches from the layers to combine

Optical Flow FlowNet (expanding)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 62

Upconvolutional layers Unpooling features maps + convolutionUpconvolutioned feature maps are concatenated with the corresponding map from the contractive part

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 63

Since existing ground truth datasets are not sufficiently large to train a Convnet a synthetic Flying Dataset is generatedhellip and augmented (translation rotation scaling transformations additive Gaussian noise changes in brightness contrast gamma and color)

Convnets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI

Data augmentation

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 64

Optical Flow FlowNet

Object tracking MDNet

65Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Object tracking MDNet

66Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Object tracking MDNet Architecture

67Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Domain-specific layers are used during training for each sequence but are replaced by a single one at test time

Object tracking MDNet Online update

68Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

MDNet is updated online at test time with hard negative mining that is selecting negative samples with the highest positive score

Object tracking FCNT

69Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Object tracking FCNT

70Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Focus on conv4-3 and conv5-3 of VGG-16 network pre-trained for ImageNet image classification

conv4-3 conv5-3

Object tracking FCNT Specialization

71Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence

Object tracking FCNT Localization

72Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Although trained for image classification feature maps in conv5-3 enable object localizationhellipbut is not discriminative enough to different objects of the same category

Object tracking Localization

73Zhou Bolei Aditya Khosla Agata Lapedriza Aude Oliva and Antonio Torralba Object detectors emerge in deep scene cnns ICLR 2015

[Zhou et al ICLR 2015] ldquoObject detectors emerge in deep scene CNNsrdquo [Slides from ReadCV]

Object tracking FCNT Localization

74Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

On the other hand feature maps from conv4-3 are more sensitive to intra-class appearance variationhellip

conv4-3 conv5-3

Object tracking FCNT Architecture

75Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

SNet=Specific Network (online update)

GNet=General Network (fixed)

Object tracking FCNT Results

76Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

ConvNets Software

Caffe httpcaffeberkeleyvisionorg

Torch (Overfeat) httptorchch

Theano httpdeeplearningnetsoftwaretheano

Tensor Flow httpswwwtensorfloworg

MatconvNet (VLFeat) httpwwwvlfeatorgmatconvnet

CNTK (Mcrosoft) httpwwwcntkai 77

Seminar Series Compacting ConvNets

for End to End Learning

Tuesday February 2 4pm

D5-010 Campus Nord

ConvNets Learn more

78

Jose M Aacutelvarez

Stanford course CS231n

Convolutional Neural Networks for Visual

Recognition

ConvNets Learn more

79

ConvNets Learn more

Online course

Deep Learning

Taking machine learning to the next

level

80

ReadCV seminarFriendly reviews of SoA papers

Spring 2016

Tuesdays at 11am

ConvNets Learn more

81

Barcelona Convolucionada

Deep Learning a lrsquoabast de tothom

Monday February 1 7pm FIB Campus Nord UPC

ConvNets Learn more

82

Grup drsquoestudi de machine learning Barcelona

Summer course

Deep Learning for Computer Vision

(25 ECTS for MSc amp Phd)

July 4-8 3-7pm

ConvNets Learn more

83

Deep learning methos for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho ldquoDeep Learning Past Present amp Futurerdquo

ConvNets Learn more

84

ConvNets Learn more

85

ldquoMachine learningrdquo sub-Reddit

ConvNets Learn more

86

ConvNets Learn more

87

Check profile requirements for Summer internship (disclaimer offered to Phd students by default)

Company Avg Salary hour Avg Salary month

Yahoo $43 ($43x160=$6880)

Apple $37 ($37x160=$5920)

Google $2954-$3132 $7151

Facebook $2292 $6150-$7378

Microsoft $2263 $6506-$7171

Source Glassdoorcom (internships in California No stipends included)

ConvNets Learn more

88

Video: Cristian Canton’s talk “From Catalonia to America: notes on how to achieve a successful post-PhD career”, ACMCV 2015 & UPC

Li Fei-Fei, “How we’re teaching computers to understand pictures”, TEDTalks 2014

Jeremy Howard, “The wonderful and terrifying implications of computers that can learn”, TEDTalks 2014

ConvNets: Discussion

Neil Lawrence, “OpenAI won’t benefit humanity without open data sharing” (The Guardian, 14/12/2015)

Is Computer Vision solved?

Sports: Do you know them?

ConvNets: Do you know them?

Antonio Torralba, MIT (former UPC)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

and MANY MORE I am missing on this page (apologies)

ConvNets: Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding is available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution    Last deadlines (on 28/1/2016)
FI (Catalonia)         22/09/2015
FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web

Our past research: Image Classification

Salvador, A., Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A. and Giró-i-Nieto, X., “Cultural Event Recognition with Visual ConvNets and Temporal Models”, in CVPR ChaLearn Looking at People Workshop, 2015. [slides]

ChaLearn Workshop

Our current research: Saliency Prediction

Pan, J. and Giró-i-Nieto, X., “End-to-end Convolutional Network for Saliency Prediction”, in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

LSUN Challenge

Our current research: Sentiment Analysis

Campos, V., Salvador, A., Jou, B. and Giró-i-Nieto, X., “Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction”, in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015. [Slides]

Our current research: Instance Search in Video

Nguyen, V.-T., Dinh-Le, D., Salvador, A., Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S. and Giró-i-Nieto, X., “NII-HITACHI-UIT at TRECVID 2015 Instance Search”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

McGuinness, K., Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S., O'Connor, N. and Smeaton, A. F., “Insight DCU at TRECVID 2015”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

Page 45: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

Weinzaepfel, P., Revaud, J., Harchaoui, Z. & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE.

Optical Flow: Deep Matching

Instead of pre-trained filters, a convolution is defined between each patch of the reference image and the target image. As a result, a correlation map is generated for each reference patch.
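The correlation of one reference patch with the target image can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: images are assumed to be 2D lists of grey values, and the helper names (`extract_patch`, `normalize`, `correlation_map`) are hypothetical. Patches are L2-normalized here so that a perfect match scores 1.0, which is an assumption of this sketch.

```python
import math

def extract_patch(image, top, left, size):
    """Return a size x size patch of a 2D image (list of rows) as a flat list."""
    return [image[top + dy][left + dx] for dy in range(size) for dx in range(size)]

def normalize(vec):
    """Scale a vector to unit L2 norm (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def correlation_map(reference, target, top, left, size=4):
    """Correlate one reference patch with every same-sized patch of the target."""
    ref = normalize(extract_patch(reference, top, left, size))
    rows = len(target) - size + 1
    cols = len(target[0]) - size + 1
    corr = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            tgt = normalize(extract_patch(target, y, x, size))
            # Dot product of the two normalized patches (cosine similarity).
            corr[y][x] = sum(r * t for r, t in zip(ref, tgt))
    return corr
```

Calling `correlation_map(reference, target, top, left)` once per reference patch yields one map per patch; the position of the largest value in a map is the most likely location of that patch in the target image.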

The most discriminative response map

The least discriminative response map

Key idea: Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search.

Pyramid levels: 4x4 patches, 8x8 patches, 16x16 patches, 32x32 patches

Bottom-up extraction (BU), followed by top-down matching (TD)


Optical Flow: Deep Matching (BU)
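One bottom-up step of the pyramid can be sketched roughly as below. The slides do not spell out the operations, so the details here are assumptions of this sketch: each child correlation map is max-pooled (3x3 window, stride 2), the four children of a parent patch are read at quadrant offsets and averaged, and a power non-linearity (exponent 1.4, an illustrative value) is applied. Function names are hypothetical.

```python
def max_pool(corr, size=3, stride=2):
    """Max-pool a 2D correlation map (window clipped at the borders)."""
    h, w = len(corr), len(corr[0])
    half = size // 2
    pooled = []
    for y in range(0, h, stride):
        row = []
        for x in range(0, w, stride):
            window = [corr[yy][xx]
                      for yy in range(max(0, y - half), min(h, y + half + 1))
                      for xx in range(max(0, x - half), min(w, x + half + 1))]
            row.append(max(window))
        pooled.append(row)
    return pooled

def aggregate_children(children, power=1.4):
    """Combine the pooled maps of four child patches into the parent patch map.

    children: pooled maps ordered top-left, top-right, bottom-left, bottom-right.
    Each child is read at the offset of its quadrant (clamped at the borders).
    """
    offsets = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    h, w = len(children[0]), len(children[0][0])
    parent = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for child, (dy, dx) in zip(children, offsets):
                yy = min(max(y + dy, 0), h - 1)
                xx = min(max(x + dx, 0), w - 1)
                total += child[yy][xx]
            mean = total / len(children)
            # Rectify and sharpen the averaged response.
            parent[y][x] = max(mean, 0.0) ** power
    return parent
```

Repeating this step turns the maps of 4x4 patches into maps for 8x8, 16x16 and 32x32 patches, halving the resolution each time.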


Optical Flow: Deep Matching (TD)

Each local maximum in the top layer corresponds to a shift of one of the biggest (32x32) patches. If we focus on a local maximum, we can retrieve the corresponding responses one scale below and focus on the shifts of the sub-patches that generated it.
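The top-down step can be illustrated with a small sketch that finds local maxima in a top-level map and lists where to look one level below. It assumes that each level halves the resolution and that the four sub-patches sit at quadrant offsets of one cell; both the layout and the function names are assumptions of this sketch, not details given on the slides.

```python
def local_maxima(corr, threshold=0.0):
    """Return (row, col, value) for entries strictly greater than all neighbours."""
    h, w = len(corr), len(corr[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            value = corr[y][x]
            if value <= threshold:
                continue
            neighbours = [corr[yy][xx]
                          for yy in range(max(0, y - 1), min(h, y + 2))
                          for xx in range(max(0, x - 1), min(w, x + 2))
                          if (yy, xx) != (y, x)]
            if all(value > n for n in neighbours):
                peaks.append((y, x, value))
    # Strongest maxima first.
    return sorted(peaks, key=lambda p: -p[2])

def children_positions(y, x):
    """Positions to inspect one level below for a parent maximum at (y, x)."""
    return [(2 * y + dy, 2 * x + dx) for dy in (-1, 1) for dx in (-1, 1)]
```

Starting from each maximum returned by `local_maxima`, the search descends through `children_positions` until it reaches the level of the smallest patches, so only promising shifts are explored.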


Optical Flow: Deep Matching

Comparison of matches: ground truth, Dense HOG [Brox & Malik 2011] and Deep Matching

Optical Flow: FlowNet

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T. (2015). FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).

End-to-end supervised learning of optical flow.

Optical Flow: FlowNet (contracting)

Option A: Stack both input images together and feed them through a generic network.
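Option A amounts to a channel-wise concatenation of the two frames before the first layer. A minimal sketch, assuming channels-first images stored as nested lists (C x H x W); the function name is hypothetical:

```python
def stack_frames(frame_a, frame_b):
    """Concatenate two channels-first images (C x H x W lists) along channels."""
    same_height = len(frame_a[0]) == len(frame_b[0])
    same_width = len(frame_a[0][0]) == len(frame_b[0][0])
    if not (same_height and same_width):
        raise ValueError("both frames must share the same height and width")
    # Copy rows so that the stacked input does not alias the original frames.
    return [[row[:] for row in channel] for channel in frame_a + frame_b]
```

Two RGB frames therefore become a single 6-channel input, and the network is left to discover by itself how to compare them.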

Option B: Create two separate yet identical processing streams for the two images and combine them at a later stage.

Correlation layer: Convolution of data patches from the layers to combine.
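A correlation layer of this kind can be sketched as below. This simplified version assumes 1x1 patches (a single feature vector per position), a small maximum displacement, and zero contribution outside the image; these choices and the function name are assumptions of this sketch rather than the exact configuration used in the paper.

```python
def correlation_layer(feat_a, feat_b, max_disp=1):
    """Correlate two C x H x W feature maps over a neighbourhood of displacements.

    Returns (2 * max_disp + 1) ** 2 maps of size H x W, one per displacement
    (dy, dx). Each holds the channel-wise dot product of feat_a at (y, x) and
    feat_b at (y + dy, x + dx). Positions outside feat_b contribute 0.
    """
    channels, h, w = len(feat_a), len(feat_a[0]), len(feat_a[0][0])
    out = []
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            plane = [[0.0] * w for _ in range(h)]
            for y in range(h):
                for x in range(w):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        plane[y][x] = sum(feat_a[c][y][x] * feat_b[c][yy][xx]
                                          for c in range(channels))
            out.append(plane)
    return out
```

Unlike a regular convolution, this layer has no trainable weights: it convolves data with data, so the output channel with the largest response at a position hints at the displacement of that position between the two frames.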

Optical Flow: FlowNet (expanding)

Upconvolutional layers: Unpooling of feature maps + convolution. Upconvolved feature maps are concatenated with the corresponding maps from the contracting part.
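The unpooling, convolution and concatenation steps can be sketched as follows. Several simplifications are assumed here: unpooling is nearest-neighbour 2x upsampling, each map is filtered independently with one fixed kernel (a real layer would mix channels with learned weights), and all names are hypothetical.

```python
def unpool2x(feature):
    """Nearest-neighbour 2x upsampling of one H x W map."""
    out = []
    for row in feature:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def conv2d_same(feature, kernel):
    """Zero-padded 'same' filtering of one H x W map with an odd-sized kernel."""
    h, w = len(feature), len(feature[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - kh // 2, x + j - kw // 2
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += feature[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out

def upconv_block(coarse, skip, kernel):
    """Unpool and filter each coarse map, then concatenate with the skip maps."""
    upsampled = [conv2d_same(unpool2x(ch), kernel) for ch in coarse]
    return upsampled + [[row[:] for row in ch] for ch in skip]
```

The concatenation with the `skip` maps is what lets the expanding part combine coarse, high-level information with the finer detail kept in the contracting part.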

Optical Flow: FlowNet (data augmentation)

Since existing ground truth datasets are not sufficiently large to train a ConvNet, a synthetic Flying Chairs dataset is generated and augmented (translation, rotation and scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma and color).

ConvNets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI.
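The photometric part of the augmentation listed above (brightness, contrast, gamma and additive Gaussian noise) can be sketched as below. The parameter ranges are illustrative assumptions, not the values used by the authors, and the function name is hypothetical. Geometric transformations are left out because they would also require transforming the ground-truth flow.

```python
import random

def augment_photometric(image, rng, noise_std=0.02):
    """Randomly change brightness, contrast and gamma and add Gaussian noise.

    image: H x W list of values in [0, 1]. rng: a random.Random instance.
    The result is clipped back to [0, 1].
    """
    brightness = rng.uniform(-0.1, 0.1)
    contrast = rng.uniform(0.8, 1.2)
    gamma = rng.uniform(0.7, 1.5)
    out = []
    for row in image:
        new_row = []
        for value in row:
            v = (value - 0.5) * contrast + 0.5 + brightness
            v = min(max(v, 0.0), 1.0) ** gamma
            v += rng.gauss(0.0, noise_std)
            new_row.append(min(max(v, 0.0), 1.0))
        out.append(new_row)
    return out
```

Passing a seeded generator such as `random.Random(0)` makes the augmentation reproducible, which is convenient when debugging a training pipeline.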

Object tracking: MDNet

Nam, Hyeonseob and Bohyung Han. Learning multi-domain convolutional neural networks for visual tracking. ICCV VOT Workshop (2015).

Object tracking: MDNet Architecture

Domain-specific layers are used during training for each sequence, but are replaced by a single one at test time.
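The split between shared and domain-specific layers can be illustrated with a toy model. This is only a structural sketch under strong assumptions: the shared part is a single linear layer with ReLU, each domain head is one weight vector, and the class and method names are hypothetical.

```python
class MultiDomainNet:
    """Toy multi-domain model: shared layers plus one binary head per domain."""

    def __init__(self, shared_weights):
        self.shared_weights = shared_weights  # shared across all sequences
        self.heads = {}                       # domain name -> head weights

    def add_domain(self, name, head_weights):
        """Register the head of one training sequence (domain)."""
        self.heads[name] = head_weights

    def features(self, x):
        """Shared layers: here a single linear map followed by ReLU."""
        return [max(0.0, sum(w * v for w, v in zip(row, x)))
                for row in self.shared_weights]

    def score(self, name, x):
        """Target-vs-background score from the head of the given domain."""
        feats = self.features(x)
        return sum(w * f for w, f in zip(self.heads[name], feats))

    def reset_for_test(self, name, head_weights):
        """Drop all training heads and keep a single new head for the test sequence."""
        self.heads = {name: head_weights}
```

During training each sequence updates its own head and the shared layers; at test time `reset_for_test` keeps the shared layers and starts from one fresh head for the new sequence.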

Object tracking: MDNet Online update

MDNet is updated online at test time with hard negative mining, that is, by selecting the negative samples with the highest positive score.
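The selection step of hard negative mining is short enough to sketch directly. The function name and the `positive_score` callable (any function returning the model's positive score for a sample) are hypothetical:

```python
def mine_hard_negatives(negatives, positive_score, num_hard):
    """Keep the num_hard negative samples that the current model scores highest."""
    ranked = sorted(negatives, key=positive_score, reverse=True)
    return ranked[:num_hard]
```

The selected samples are the background regions the tracker currently confuses with the target, so training on them is more informative than training on easy negatives.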

Object tracking: FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang and Huchuan Lu. Visual Tracking with Fully Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127, 2015. [code]

Focus on the conv4-3 and conv5-3 layers of a VGG-16 network pre-trained for ImageNet image classification.

Object tracking: FCNT Specialization

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence.

Object tracking: FCNT Localization

Although trained for image classification, feature maps in conv5-3 enable object localization, but they are not discriminative enough to distinguish different objects of the same category.

Object tracking: Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva and Antonio Torralba. “Object detectors emerge in deep scene CNNs”. ICLR 2015. [Slides from ReadCV]

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation.

Object tracking: FCNT Architecture

SNet = Specific Network (online update)

GNet = General Network (fixed)

Object tracking: FCNT Results



Page 47: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE.

Key idea: Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search.

Optical Flow: Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching (TD)

Bottom-up extraction (BU)

Optical Flow: Deep Matching (BU)

Optical Flow: Deep Matching (TD)

Each local maximum in the top layer corresponds to a shift of one of the biggest (32x32) patches. If we focus on a local maximum, we can retrieve the corresponding responses one scale below and focus on the shift of the sub-patches that generated it.
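The bottom-up / top-down idea can be illustrated with a minimal two-level sketch. It is a simplification under stated assumptions, not the DeepFlow implementation: images are grayscale NumPy arrays, patches are compared with normalized cross-correlation, and an 8x8 patch is scored by averaging the maps of its four 4x4 children (the max-pooling, subsampling and non-linearity of the paper are omitted). All function names are hypothetical.

```python
import numpy as np

def ncc_map(patch, image):
    """Normalized cross-correlation of `patch` with every same-sized window of `image`."""
    ph, pw = patch.shape
    p = patch - patch.mean()
    p_norm = np.linalg.norm(p)
    out = np.zeros((image.shape[0] - ph + 1, image.shape[1] - pw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            w = image[y:y + ph, x:x + pw]
            w = w - w.mean()
            out[y, x] = np.sum(p * w) / (p_norm * np.linalg.norm(w) + 1e-12)
    return out

def child_maps(ref, image, top, left, size=4):
    """Bottom-up, level 0: one correlation map per size x size child of the
    (2*size) x (2*size) reference patch whose corner is (top, left)."""
    return {
        (dy, dx): ncc_map(ref[top + dy:top + dy + size, left + dx:left + dx + size], image)
        for dy in (0, size) for dx in (0, size)
    }

def parent_map(maps, image_shape, size=4):
    """Bottom-up, level 1: average the four child maps, each read at its own offset."""
    n_y = image_shape[0] - 2 * size + 1
    n_x = image_shape[1] - 2 * size + 1
    return sum(m[dy:dy + n_y, dx:dx + n_x] for (dy, dx), m in maps.items()) / 4.0

def best_match(score):
    """Position (row, col) of the global maximum of a score map."""
    y, x = np.unravel_index(np.argmax(score), score.shape)
    return int(y), int(x)

def refine(maps, y, x, radius=1):
    """Top-down: given a parent match at (y, x), find each child's best position
    within `radius` pixels of where the parent match would place it."""
    result = {}
    for (dy, dx), m in maps.items():
        y0, x0 = max(y + dy - radius, 0), max(x + dx - radius, 0)
        window = m[y0:y + dy + radius + 1, x0:x + dx + radius + 1]
        iy, ix = best_match(window)
        result[(dy, dx)] = (y0 + iy, x0 + ix)
    return result
```

A typical call sequence would be `maps = child_maps(ref, image, top, left)`, then `best_match(parent_map(maps, image.shape))` to locate the coarse match, then `refine(maps, y, x)` to recover the shift of each sub-patch.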

Optical Flow: Deep Matching

Ground truth

Dense HOG [Brox & Malik 2011]

Deep Matching

Optical Flow

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D., & Brox, T. (2015). FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).

Optical Flow: FlowNet

End-to-end supervised learning of optical flow.

Optical Flow: FlowNet (contracting)

Option A: Stack both input images together and feed them through a generic network.
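As a small illustration of Option A, the sketch below stacks two frames along the channel axis before they would be fed to a network. It assumes frames stored as H x W x 3 NumPy arrays; the function name `stack_pair` is hypothetical and the network itself is not shown.

```python
import numpy as np

def stack_pair(frame_a, frame_b):
    """Concatenate two H x W x 3 frames along the channel axis into one H x W x 6 input."""
    if frame_a.shape != frame_b.shape:
        raise ValueError("both frames must have the same shape")
    return np.concatenate([frame_a, frame_b], axis=-1)
```

With this layout the first convolutional layer sees both frames at every pixel, so it is left to the network to learn how to compare them.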

Option B: Create two separate yet identical processing streams for the two images and combine them at a later stage.

Correlation layer: convolution of data patches from one stream with data patches from the other, at the layers to be combined (data against data, rather than data against a learned filter).
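A minimal sketch of such a correlation layer is given below, assuming H x W x C feature maps as NumPy arrays. It is a simplification: each "patch" is a single pixel's feature vector, and only displacements up to `max_disp` pixels are compared. The function name and the row-major ordering of the output channels are choices made for this example.

```python
import numpy as np

def correlation_layer(f1, f2, max_disp=1):
    """Correlate f1 with shifted copies of f2.

    f1, f2: H x W x C feature maps.
    Returns H x W x (2*max_disp + 1)**2, where output channel k holds the dot
    product <f1[y, x], f2[y + dy, x + dx]> and k enumerates (dy, dx) row by row.
    """
    h, w, _ = f1.shape
    d = max_disp
    # Zero padding so that every displacement is defined at the borders.
    padded = np.pad(f2, ((d, d), (d, d), (0, 0)))
    out = np.empty((h, w, (2 * d + 1) ** 2))
    k = 0
    for dy in range(-d, d + 1):
        for dx in range(-d, d + 1):
            shifted = padded[d + dy:d + dy + h, d + dx:d + dx + w]
            out[:, :, k] = np.sum(f1 * shifted, axis=-1)
            k += 1
    return out
```

The output has no learned weights: it only measures how well each location of the first map matches nearby locations of the second, which later layers can turn into a flow estimate.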

Optical Flow: FlowNet (expanding)

Upconvolutional layers: unpooling of feature maps + convolution. Upconvolved feature maps are concatenated with the corresponding map from the contracting part.
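One expanding step can be sketched as follows, under simplifying assumptions: unpooling is replaced by nearest-neighbour 2x upsampling, and the convolution by a 1x1 convolution (a per-pixel linear map over channels). The function names and the `weights` argument are hypothetical; a trained network would use learned filters with larger kernels.

```python
import numpy as np

def unpool2x(x):
    """Nearest-neighbour 2x upsampling of an H x W x C map (stand-in for unpooling)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def conv1x1(x, weights):
    """Per-pixel linear map over channels: (H, W, C_in) @ (C_in, C_out) -> (H, W, C_out)."""
    return x @ weights

def expand_step(coarse, skip, weights):
    """Upsample `coarse`, convolve it, and concatenate it with the skip
    feature map of matching resolution from the contracting part."""
    up = conv1x1(unpool2x(coarse), weights)
    if up.shape[:2] != skip.shape[:2]:
        raise ValueError("skip map must have twice the spatial size of the coarse map")
    return np.concatenate([up, skip], axis=-1)
```

The concatenation is what lets the refined output combine coarse, high-level information with the finer local detail kept in the contracting part.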

Optical Flow: FlowNet

Since existing ground truth datasets are not sufficiently large to train a ConvNet, a synthetic Flying Chairs dataset is generated... and augmented (translation, rotation and scaling transformations, additive Gaussian noise, and changes in brightness, contrast, gamma and color).

ConvNets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI.
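The photometric part of such an augmentation could look like the sketch below. It assumes a float image in [0, 1]; the parameter ranges are illustrative values chosen for this example, not those of the paper. Geometric transformations (translation, rotation, scaling) are left out because they would also require transforming the ground-truth flow.

```python
import numpy as np

def photometric_augment(image, rng, noise_std=0.02, brightness=0.1,
                        contrast=0.2, gamma=0.3):
    """Randomly perturb a float image in [0, 1]; all ranges are illustrative."""
    # Additive Gaussian noise.
    out = image + rng.normal(0.0, noise_std, size=image.shape)
    # Brightness: one random offset for the whole image.
    out = out + rng.uniform(-brightness, brightness)
    # Contrast: scale the deviation from the mean intensity.
    mean = out.mean()
    out = (out - mean) * (1.0 + rng.uniform(-contrast, contrast)) + mean
    # Gamma: clip first so the base of the power is never negative.
    out = np.clip(out, 0.0, 1.0) ** (1.0 + rng.uniform(-gamma, gamma))
    return np.clip(out, 0.0, 1.0)
```

Passing a seeded generator such as `np.random.default_rng(0)` makes the perturbation reproducible.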

Data augmentation

Optical Flow: FlowNet

Object tracking: MDNet

Nam, Hyeonseob, and Bohyung Han. Learning multi-domain convolutional neural networks for visual tracking. ICCV VOT Workshop (2015).

Object tracking: MDNet Architecture

Domain-specific layers are used during training for each sequence, but are replaced by a single one at test time.

Object tracking: MDNet Online update

MDNet is updated online at test time with hard negative mining, that is, selecting negative samples with the highest positive score.
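The selection step of hard negative mining can be sketched in a few lines. This is an illustration of the selection rule only, not the MDNet code: `scores` is assumed to hold the positive-class score the network assigns to each candidate sample, and `is_positive` its label; both names are hypothetical.

```python
def hardest_negatives(scores, is_positive, k):
    """Indices of the k negative samples with the highest positive score."""
    negatives = [i for i, positive in enumerate(is_positive) if not positive]
    negatives.sort(key=lambda i: scores[i], reverse=True)
    return negatives[:k]
```

These are the negatives the current model is most likely to confuse with the target, so they are the most informative ones to use in the next update.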

Object tracking: FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. Visual Tracking with Fully Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Focus on conv4-3 and conv5-3 of the VGG-16 network pre-trained for ImageNet image classification.

conv4-3 conv5-3

Object tracking: FCNT Specialization

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence.

Object tracking: FCNT Localization

Although trained for image classification, feature maps in conv5-3 enable object localization... but they are not discriminative enough between different objects of the same category.

Object tracking: Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Object detectors emerge in deep scene CNNs. ICLR 2015. [Slides from ReadCV]

Object tracking: FCNT Localization

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation...

conv4-3 conv5-3

Object tracking: FCNT Architecture

SNet = Specific Network (online update)

GNet = General Network (fixed)

Object tracking: FCNT Results

ConvNets: Software

Caffe: http://caffe.berkeleyvision.org

Torch (Overfeat): http://torch.ch

Theano: http://deeplearning.net/software/theano

TensorFlow: https://www.tensorflow.org

MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet

CNTK (Microsoft): http://www.cntk.ai

ConvNets: Learn more

Seminar Series: Compacting ConvNets for End to End Learning

Tuesday, February 2, 4pm

D5-010, Campus Nord

Jose M. Álvarez

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

Online course: Deep Learning. Taking machine learning to the next level

ReadCV seminar: Friendly reviews of SoA papers

Spring 2016

Tuesdays at 11am

Barcelona Convolucionada

Deep Learning a l’abast de tothom (Deep Learning within everyone's reach)

Monday, February 1, 7pm, FIB, Campus Nord UPC

Grup d’estudi de machine learning Barcelona (Barcelona machine learning study group)

Summer course: Deep Learning for Computer Vision

(2.5 ECTS for MSc & PhD)

July 4-8, 3-7pm

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, “Deep Learning: Past, Present & Future”

“Machine learning” sub-Reddit

Check profile requirements for summer internships (disclaimer: offered to PhD students by default).

Company      Avg. salary/hour    Avg. salary/month
Yahoo        $43                 ($43 x 160 = $6880)
Apple        $37                 ($37 x 160 = $5920)
Google       $29.54-$31.32       $7151
Facebook     $22.92              $6150-$7378
Microsoft    $22.63              $6506-$7171

Source: Glassdoor.com (internships in California; no stipends included).

Video: Cristian Canton’s talk “From Catalonia to America: notes on how to achieve a successful post-PhD career”, ACMCV 2015 & UPC

Li Fei-Fei, “How we’re teaching computers to understand pictures”, TEDTalks 2014

Jeremy Howard, “The wonderful and terrifying implications of computers that can learn”, TEDTalks 2014

Neil Lawrence, “OpenAI won’t benefit humanity without open data sharing” (The Guardian, 14/12/2015)

Is Computer Vision solved?

ConvNets: Discussion

Sports: Do you know them?

ConvNets: Do you know them?

Antonio Torralba, MIT (former UPC)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

and MANY MORE I am missing on the page (apologies)

ConvNets: Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution    Last deadlines (on 28/1/2016)
FI (Catalonia)         22/09/2015
FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web

Image Classification

Our past research

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., “Cultural Event Recognition with Visual ConvNets and Temporal Models”, in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

ChaLearn Workshop

Saliency Prediction

J. Pan and Giró-i-Nieto, X., “End-to-end Convolutional Network for Saliency Prediction”, in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

Our current research

LSUN Challenge

Sentiment Analysis

Our current research

[Slides]

CNN

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., “Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction”, in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015.

Our current research

Instance Search in Video

V.-T. Nguyen, -Dinh-Le, D., Salvador, A., -Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S. ichi, and Giró-i-Nieto, X., “NII-HITACHI-UIT at TRECVID 2015 Instance Search”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S. ichi, O’Connor, N., and Smeaton, A. F., “Insight DCU at TRECVID 2015”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

Page 48: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 48

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Bottom-upextraction

(BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 49

Optical Flow Deep Matching (BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 50

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching (TD)

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching

(TD)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 51

Optical Flow Deep Matching (TD)Each local maxima in the top layer corresponds to a shift of one of the biggest (32x32) patchesIf we focus on local maximum we can retrieve the corresponding responses one scale below and focus on shift of the sub-patches that generated it

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 52

Optical Flow Deep Matching (TD)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 53

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 54

Ground truth

Dense HOG[Brox amp Malik 2011]

Deep Matching

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 55

Optical Flow Deep Matching

Optical Flow

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 56

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 57

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 58

End to end supervised learning of optical flow

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 59

Option A Stack both input images together and feed them through a generic network

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 60

Option B Create two separate yet identical processing streams for the two images and combine them at a later stage

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 61

Option B Create two separate yet identical processing streams for the two images and combine them at a later stage

Correlation layer Convolution of data patches from the layers to combine

Optical Flow FlowNet (expanding)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 62

Upconvolutional layers Unpooling features maps + convolutionUpconvolutioned feature maps are concatenated with the corresponding map from the contractive part

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 63

Since existing ground truth datasets are not sufficiently large to train a Convnet a synthetic Flying Dataset is generatedhellip and augmented (translation rotation scaling transformations additive Gaussian noise changes in brightness contrast gamma and color)

Convnets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI

Data augmentation

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 64

Optical Flow FlowNet

Object tracking MDNet

65Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Object tracking MDNet

66Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Object tracking MDNet Architecture

67Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Domain-specific layers are used during training for each sequence but are replaced by a single one at test time

Object tracking MDNet Online update

68Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

MDNet is updated online at test time with hard negative mining that is selecting negative samples with the highest positive score

Object tracking FCNT

69Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Object tracking FCNT

70Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Focus on conv4-3 and conv5-3 of VGG-16 network pre-trained for ImageNet image classification

conv4-3 conv5-3

Object tracking FCNT Specialization

71Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence

Object tracking FCNT Localization

72Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Although trained for image classification feature maps in conv5-3 enable object localizationhellipbut is not discriminative enough to different objects of the same category

Object tracking Localization

73Zhou Bolei Aditya Khosla Agata Lapedriza Aude Oliva and Antonio Torralba Object detectors emerge in deep scene cnns ICLR 2015

[Zhou et al ICLR 2015] ldquoObject detectors emerge in deep scene CNNsrdquo [Slides from ReadCV]

Object tracking FCNT Localization

74Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

On the other hand feature maps from conv4-3 are more sensitive to intra-class appearance variationhellip

conv4-3 conv5-3

Object tracking FCNT Architecture

75Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

SNet=Specific Network (online update)

GNet=General Network (fixed)

Object tracking FCNT Results

76Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

ConvNets Software

Caffe httpcaffeberkeleyvisionorg

Torch (Overfeat) httptorchch

Theano httpdeeplearningnetsoftwaretheano

Tensor Flow httpswwwtensorfloworg

MatconvNet (VLFeat) httpwwwvlfeatorgmatconvnet

CNTK (Mcrosoft) httpwwwcntkai 77

Seminar Series Compacting ConvNets

for End to End Learning

Tuesday February 2 4pm

D5-010 Campus Nord

ConvNets Learn more

78

Jose M Aacutelvarez

Stanford course CS231n

Convolutional Neural Networks for Visual

Recognition

ConvNets Learn more

79

ConvNets Learn more

Online course

Deep Learning

Taking machine learning to the next

level

80

ReadCV seminarFriendly reviews of SoA papers

Spring 2016

Tuesdays at 11am

ConvNets Learn more

81

Barcelona Convolucionada

Deep Learning a lrsquoabast de tothom

Monday February 1 7pm FIB Campus Nord UPC

ConvNets Learn more

82

Grup drsquoestudi de machine learning Barcelona

Summer course

Deep Learning for Computer Vision

(25 ECTS for MSc amp Phd)

July 4-8 3-7pm

ConvNets Learn more

83

Deep learning methos for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho ldquoDeep Learning Past Present amp Futurerdquo

ConvNets Learn more

84

ConvNets Learn more

85

ldquoMachine learningrdquo sub-Reddit

ConvNets Learn more

86

ConvNets Learn more

87

Check profile requirements for Summer internship (disclaimer offered to Phd students by default)

Company Avg Salary hour Avg Salary month

Yahoo $43 ($43x160=$6880)

Apple $37 ($37x160=$5920)

Google $2954-$3132 $7151

Facebook $2292 $6150-$7378

Microsoft $2263 $6506-$7171

Source Glassdoorcom (internships in California No stipends included)

ConvNets Learn more

88

Video Cristian Cantonrsquos talk ldquoFrom Catalonia to America notes on how to achieve a successful post-Phd career rdquo ACMCV 2015 amp UPC

Li Fei-Fei ldquoHow wersquore teaching computers to understand picturesrdquo TEDTalks 2014

ConvNets Learn more

89

Jeremy Howard ldquoThe wonderful and terrifying implications of computers that can learnrdquo TEDTalks 2014

ConvNets Learn more

90

ConvNets Learn more

91

Neil Lawrence OpenAI wonrsquot benefit humanity without open data sharing (The Guardian 14122015)

Is Computer Vision solved

ConvNets Discussion

92

Sports Do you know them

93

ConvNets Do you know them

94

Antonio Torralba MIT(former UPC)

and MANY MORE I am missing in the page (apologies)

Oriol Vinyals Google(former UPC)

Jose M Aacutelvarez NICTA(former URL amp UAB)

Joan Bruna Berkeley(former UPC)

95

ConvNets Where you are studyingVisioCat dinner CVPR 2015

Considering a Phd at GPI-UPC Currently no direct funding available (check in the future)We can support your application to scholarships

External grant listings UPC UPF

Funding institution Last deadlines (on 2812016)

FI (Catalonia) 22092015

FPU (Spain) 15012016

Check our activity at httpsimatgeupceduweb 96

Image Classification

97

Our past research

A Salvador Zeppelzauer M Manchon-Vizuete D Calafell-Oroacutes A and Giroacute-i-Nieto X ldquoCultural Event Recognition with Visual ConvNets and Temporal Modelsrdquo in CVPR ChaLearn Looking at People Workshop 2015 2015 [slides]

ChaLearn Worshop

Saliency Prediction

J Pan and Giroacute-i-Nieto X ldquoEnd-to-end Convolutional Network for Saliency Predictionrdquo in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops Boston MA (USA) 2015 [Slides] 98

Our current research

LSUN Challenge

Sentiment Analysis

99

Our current research

[Slides]

CNN

V Campos Salvador A Jou B and Giroacute-i-Nieto X ldquoDiving Deep into Sentiment Understanding Fine-tuned CNNs for Visual Sentiment Predictionrdquo in 1st International Workshop on Affect and Sentiment in Multimedia Brisbane Australia 2015

Our current research

Instance Search in Video

100

V - T Nguyen -Dinh-Le D Salvador A -Zhu C Nguyen D - L Tran M - T Duc T Ngo Duong D Anh Satoh S ichi and Giroacute-i-Nieto X ldquoNII-HITACHI-UIT at TRECVID 2015 Instance Searchrdquo in TRECVID 2015 Workshop Gaithersburg MD USA 2015

K McGuinness Mohedano E Salvador A Zhang Z X Marsden M Wang P Jargalsaikhan I Antony J Giroacute-i-Nieto X Satoh S ichi OConnor N and Smeaton A F ldquoInsight DCU at TRECVID 2015rdquo in TRECVID 2015 Workshop Gaithersburg MD USA 2015

Thank you

Slides available on and

httpsimatgeupceduwebpeoplexavier-giro

httpbitsearchblogspotcom

httpstwittercomDocXavi

httpswwwfacebookcomProfessorXavi

xaviergiroupcedu

101

Page 49: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 49

Optical Flow Deep Matching (BU)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 50

Key idea Build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search

Optical Flow Deep Matching (TD)

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching

(TD)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 51

Optical Flow Deep Matching (TD)Each local maxima in the top layer corresponds to a shift of one of the biggest (32x32) patchesIf we focus on local maximum we can retrieve the corresponding responses one scale below and focus on shift of the sub-patches that generated it

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 52

Optical Flow Deep Matching (TD)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 53

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 54

Ground truth

Dense HOG[Brox amp Malik 2011]

Deep Matching

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 55

Optical Flow Deep Matching

Optical Flow

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 56

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 57

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 58

End to end supervised learning of optical flow

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 59

Option A Stack both input images together and feed them through a generic network

Option B: Create two separate yet identical processing streams for the two images and combine them at a later stage.

Correlation layer: convolution of data patches from the two streams being combined (data convolved with data, rather than with a learned filter).
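To make the correlation layer more concrete, here is a minimal sketch in plain Python. It compares the feature vector at each position of the first map with the vectors at displaced positions of the second map, using dot products over 1x1 patches. The function name `correlation_layer`, the nested-list layout [row][col][channel] and the `max_disp` parameter are assumptions made for this illustration; they are not taken from the slides or from the FlowNet code.

```python
def correlation_layer(f1, f2, max_disp=1):
    """Correlate two feature maps stored as nested lists [row][col][channel].

    For every position (y, x) in f1 and every displacement (dy, dx) with
    |dy|, |dx| <= max_disp, store the dot product between f1[y][x] and
    f2[y + dy][x + dx]. Displacements that fall outside f2 score 0.0.
    """
    height, width = len(f1), len(f1[0])
    disps = [(dy, dx)
             for dy in range(-max_disp, max_disp + 1)
             for dx in range(-max_disp, max_disp + 1)]
    out = [[[0.0] * len(disps) for _ in range(width)] for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for k, (dy, dx) in enumerate(disps):
                yy, xx = y + dy, x + dx
                if 0 <= yy < height and 0 <= xx < width:
                    out[y][x][k] = sum(a * b for a, b in zip(f1[y][x], f2[yy][xx]))
    return out, disps


# Toy example: f2 is f1 shifted one pixel to the right.
f1 = [[[0.0], [1.0], [0.0], [0.0]]]
f2 = [[[0.0], [0.0], [1.0], [0.0]]]
corr, disps = correlation_layer(f1, f2, max_disp=1)
best = max(range(len(disps)), key=lambda k: corr[0][1][k])
print(disps[best])  # (0, 1): the best match is one pixel to the right
```

The output has one channel per displacement, so later layers can read the most likely motion at each position from the strongest channel.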

Optical Flow: FlowNet (expanding)

Upconvolutional layers: unpooling of feature maps followed by a convolution. The upconvolved feature maps are concatenated with the corresponding feature maps from the contracting part.
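The expanding step can be sketched as follows in plain Python. This is only an illustration under stated assumptions: unpooling is done by nearest-neighbour repetition, the 3x3 kernel is fixed by hand (in a real network it would be learned), and each map has a single channel. The helper names `unpool2x`, `conv3x3` and `upconv_and_concat` are hypothetical.

```python
def unpool2x(fmap):
    """Nearest-neighbour unpooling: repeat every value in a 2x2 block."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def conv3x3(fmap, kernel):
    """'Same' 3x3 convolution (cross-correlation form) with zero padding."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    yy, xx = y + ky - 1, x + kx - 1
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[ky][kx] * fmap[yy][xx]
            out[y][x] = acc
    return out


def upconv_and_concat(coarse, skip, kernel):
    """Unpool and convolve the coarse map, then stack it with the skip map."""
    up = conv3x3(unpool2x(coarse), kernel)
    assert len(up) == len(skip) and len(up[0]) == len(skip[0])
    return [up, skip]  # channels: [upconvolved map, contracting-part map]


identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # placeholder for a learned kernel
coarse = [[1.0, 2.0], [3.0, 4.0]]
skip = [[0.5] * 4 for _ in range(4)]
channels = upconv_and_concat(coarse, skip, identity)
print(len(channels), channels[0][0], channels[0][3])
```

Concatenating with the contracting-part map lets the next layer combine coarse, high-level information with finer local detail.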

Optical Flow: FlowNet

Since existing ground-truth datasets are not large enough to train a ConvNet, a synthetic Flying Chairs dataset is generated… and augmented (translation, rotation and scaling transformations, additive Gaussian noise, and changes in brightness, contrast, gamma and color).

ConvNets trained on this unrealistic data generalize well to existing datasets such as Sintel and KITTI.

Data augmentation
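A minimal sketch of the photometric part of such an augmentation, in plain Python, is shown below. The parameter ranges, the grayscale [0, 1] image format and the function names are assumptions chosen for illustration, not the settings used to train FlowNet. Geometric transformations (translation, rotation, scaling) are left out because they would also have to be applied consistently to the ground-truth flow.

```python
import random


def sample_params(rng):
    """Draw one set of photometric parameters (ranges are illustrative)."""
    return {
        "brightness": rng.uniform(-0.1, 0.1),
        "contrast": rng.uniform(0.8, 1.2),
        "gamma": rng.uniform(0.7, 1.5),
        "noise_sigma": rng.uniform(0.0, 0.04),
    }


def apply_photometric(image, params, rng):
    """Apply contrast, brightness, gamma and additive Gaussian noise."""
    out = []
    for row in image:
        new_row = []
        for v in row:
            v = (v - 0.5) * params["contrast"] + 0.5 + params["brightness"]
            v = min(max(v, 0.0), 1.0) ** params["gamma"]   # gamma on clipped value
            v += rng.gauss(0.0, params["noise_sigma"])     # per-pixel noise
            new_row.append(min(max(v, 0.0), 1.0))
        out.append(new_row)
    return out


def augment_pair(frame1, frame2, rng):
    """Use the same photometric parameters for both frames of a pair."""
    params = sample_params(rng)
    return (apply_photometric(frame1, params, rng),
            apply_photometric(frame2, params, rng))


rng = random.Random(0)
frame1 = [[0.2, 0.4], [0.6, 0.8]]
frame2 = [[0.4, 0.2], [0.8, 0.6]]
aug1, aug2 = augment_pair(frame1, frame2, rng)
print(aug1)
```

Sharing the parameters between the two frames keeps their appearance consistent, so the ground-truth flow of the pair stays valid.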

Object tracking: MDNet

Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015).

Object tracking: MDNet (architecture)

Domain-specific layers are used during training, one for each sequence, but are replaced by a single one at test time.
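The multi-domain idea can be sketched with a toy model in plain Python: one set of shared weights, plus one small classification branch per training sequence. Everything here is a simplifying assumption made for illustration: the class name `MultiDomainNet`, the perceptron-style update, and the fact that only the branch is updated (in the real network the shared layers are also trained by backpropagation).

```python
class MultiDomainNet:
    """Shared layers plus one binary-classification branch per sequence."""

    def __init__(self, feat_dim, num_domains):
        # Stand-in for the shared layers: one weight per input feature.
        self.shared = [1.0] * feat_dim
        # One domain-specific branch (a linear scorer) per training sequence.
        self.branches = [[0.0] * feat_dim for _ in range(num_domains)]

    def features(self, x):
        return [max(0.0, w * v) for w, v in zip(self.shared, x)]  # ReLU(w * x)

    def score(self, x, branch):
        return sum(b * f for b, f in zip(branch, self.features(x)))

    def train_step(self, x, label, domain, lr=0.1):
        """label is +1 (target) or -1 (background) in the given sequence."""
        branch = self.branches[domain]
        if label * self.score(x, branch) <= 0:  # misclassified sample
            for i, f in enumerate(self.features(x)):
                branch[i] += lr * label * f

    def start_tracking(self):
        """At test time, drop the training branches and use one new branch."""
        self.branches = [[0.0] * len(self.shared)]
        return self.branches[0]


net = MultiDomainNet(feat_dim=2, num_domains=3)
# The same patch can be the target in one sequence and background in another.
net.train_step([1.0, 0.0], +1, domain=0)
net.train_step([1.0, 0.0], -1, domain=1)
print(net.branches)
test_branch = net.start_tracking()
print(len(net.branches))
```

Keeping the branches separate lets the shared part learn what is common to all sequences, while conflicting labels between sequences are absorbed by the per-sequence branches.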

Object tracking: MDNet (online update)

MDNet is updated online at test time with hard negative mining, that is, by selecting the negative samples with the highest positive score.
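Hard negative mining as described here amounts to ranking the negative samples by their positive score and keeping the top ones. A minimal sketch in plain Python follows; the function name `mine_hard_negatives`, the sample names and the scores are hypothetical values made up for the example.

```python
def mine_hard_negatives(candidates, score_fn, num_hard):
    """Return the num_hard negative samples that the classifier scores highest."""
    ranked = sorted(candidates, key=score_fn, reverse=True)
    return ranked[:num_hard]


# Hypothetical positive scores assigned by the classifier to background patches.
scores = {"sky": 0.05, "similar_car": 0.80, "road": 0.10, "car_shadow": 0.55}
hard = mine_hard_negatives(list(scores), scores.get, num_hard=2)
print(hard)  # ['similar_car', 'car_shadow']
```

The selected samples are the background patches the classifier is most likely to confuse with the target, so updating on them is more informative than updating on easy negatives.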

Object tracking: FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Focus on the conv4-3 and conv5-3 layers of a VGG-16 network pre-trained for ImageNet image classification.

conv4-3 conv5-3

Object tracking: FCNT (specialization)

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence.
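One simple way to see which feature maps relate to the foreground is to measure how much of each map's activation falls inside the target box. The sketch below does this in plain Python. It is a simplified proxy chosen for illustration, not the feature map selection procedure of the FCNT paper; the function names and the box format (top, left, bottom, right, with exclusive bottom and right) are assumptions, and activations are assumed to be non-negative.

```python
def foreground_ratio(fmap, box):
    """Fraction of a feature map's total activation that lies inside box."""
    top, left, bottom, right = box
    total = sum(sum(row) for row in fmap)
    if total == 0:
        return 0.0
    inside = sum(fmap[y][x] for y in range(top, bottom) for x in range(left, right))
    return inside / total


def select_feature_maps(fmaps, box, keep):
    """Indices of the keep maps whose activation concentrates on the target."""
    order = sorted(range(len(fmaps)),
                   key=lambda i: foreground_ratio(fmaps[i], box),
                   reverse=True)
    return sorted(order[:keep])


fmaps = [
    [[0, 0, 0], [0, 5, 0], [0, 0, 0]],   # fires on the target (centre)
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],   # fires everywhere
    [[3, 0, 0], [0, 0, 0], [0, 0, 3]],   # fires on the background only
]
print(select_feature_maps(fmaps, box=(1, 1, 2, 2), keep=1))  # [0]
```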

Object tracking: FCNT (localization)

Although trained for image classification, feature maps in conv5-3 enable object localization… but they are not discriminative enough to distinguish different objects of the same category.

Object tracking: Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object detectors emerge in deep scene CNNs." ICLR 2015.

[Zhou et al., ICLR 2015] "Object detectors emerge in deep scene CNNs" [Slides from ReadCV]

Object tracking: FCNT (localization)

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation…

conv4-3 conv5-3

Object tracking: FCNT (architecture)

SNet = Specific Network (online update)

GNet = General Network (fixed)

Object tracking: FCNT (results)

ConvNets: Software

Caffe: http://caffe.berkeleyvision.org

Torch (Overfeat): http://torch.ch

Theano: http://deeplearning.net/software/theano

TensorFlow: https://www.tensorflow.org

MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet

CNTK (Microsoft): http://www.cntk.ai

ConvNets: Learn more

Seminar Series: Compacting ConvNets for End to End Learning

Jose M. Álvarez

Tuesday, February 2, 4pm

D5-010, Campus Nord

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

Online course: Deep Learning

Taking machine learning to the next level

ReadCV seminar: Friendly reviews of SoA papers

Spring 2016

Tuesdays at 11am

Barcelona Convolucionada: Deep Learning a l'abast de tothom (Deep Learning within everyone's reach)

Monday, February 1, 7pm, FIB, Campus Nord UPC

Grup d'estudi de machine learning Barcelona (Barcelona machine learning study group)

Summer course: Deep Learning for Computer Vision

(2.5 ECTS for MSc & PhD)

July 4-8, 3-7pm

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, "Deep Learning: Past, Present & Future"

"Machine learning" sub-Reddit

Check profile requirements for Summer internship (disclaimer: offered to PhD students by default).

Company     Avg. salary / hour    Avg. salary / month
Yahoo       $43                   ($43 x 160 = $6,880)
Apple       $37                   ($37 x 160 = $5,920)
Google      $29.54-$31.32         $7,151
Facebook    $22.92                $6,150-$7,378
Microsoft   $22.63                $6,506-$7,171

Source: Glassdoor.com (internships in California. No stipends included)

Video: Cristian Canton's talk "From Catalonia to America: notes on how to achieve a successful post-PhD career", ACMCV 2015 & UPC

Li Fei-Fei, "How we're teaching computers to understand pictures", TEDTalks 2014

Jeremy Howard, "The wonderful and terrifying implications of computers that can learn", TEDTalks 2014

Neil Lawrence, "OpenAI won't benefit humanity without open data sharing" (The Guardian, 14/12/2015)

ConvNets: Discussion

Is Computer Vision solved?

Sports: Do you know them?

ConvNets: Do you know them?

Antonio Torralba, MIT (former UPC)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

…and MANY MORE I am missing on this page (apologies)

ConvNets: Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding is available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution    Last deadline (as of 28/1/2016)
FI (Catalonia)         22/09/2015
FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web

Our past research

Image Classification

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., "Cultural Event Recognition with Visual ConvNets and Temporal Models", in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

ChaLearn Workshop

Our current research

Saliency Prediction

J. Pan and Giró-i-Nieto, X., "End-to-end Convolutional Network for Saliency Prediction", in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

LSUN Challenge

Our current research

Sentiment Analysis

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., "Diving Deep into Sentiment Understanding: Fine-tuned CNNs for Visual Sentiment Prediction", in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015. [Slides]

Our current research

Instance Search in Video

V.-T. Nguyen, Dinh-Le, D., Salvador, A., Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S. 'ichi, and Giró-i-Nieto, X., "NII-HITACHI-UIT at TRECVID 2015 Instance Search", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S. 'ichi, O'Connor, N., and Smeaton, A. F., "Insight DCU at TRECVID 2015", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

Page 50: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE.

Key idea: build (bottom-up) a pyramid of correlation maps to run an efficient (top-down) search.

Optical Flow: Deep Matching (TD)

4x4 patches

8x8 patches

16x16 patches

32x32 patches

Top-down matching (TD)

Each local maximum in the top layer corresponds to a shift of one of the biggest (32x32) patches. If we focus on one local maximum, we can retrieve the corresponding responses one scale below and focus on the shifts of the sub-patches that generated it.
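The bottom-up aggregation and the top-down retrieval can be sketched for a single pair of pyramid levels in plain Python. This is a heavily simplified illustration under stated assumptions: every child correlation map is indexed directly by candidate shift, each sub-patch may move by at most one cell around the parent's shift, and the subsampling and non-linear rectification used in the actual method are omitted. The function names are hypothetical.

```python
def neighbourhood_max(cmap, y, x):
    """Max response and its position within the 3x3 neighbourhood of (y, x)."""
    best = None
    for yy in range(max(0, y - 1), min(len(cmap), y + 2)):
        for xx in range(max(0, x - 1), min(len(cmap[0]), x + 2)):
            if best is None or cmap[yy][xx] > best[0]:
                best = (cmap[yy][xx], (yy, xx))
    return best


def aggregate(children):
    """Bottom-up: the parent response at a shift is the average of the best
    child responses found near that shift."""
    h, w = len(children[0]), len(children[0][0])
    return [[sum(neighbourhood_max(c, y, x)[0] for c in children) / len(children)
             for x in range(w)] for y in range(h)]


def backtrack(parent, children):
    """Top-down: take the parent's maximum and recover each child's shift."""
    h, w = len(parent), len(parent[0])
    y, x = max(((yy, xx) for yy in range(h) for xx in range(w)),
               key=lambda p: parent[p[0]][p[1]])
    return (y, x), [neighbourhood_max(c, y, x)[1] for c in children]


def peak_map(size, peak):
    """Toy correlation map with a single response of 1.0 at peak."""
    return [[1.0 if (y, x) == peak else 0.0 for x in range(size)]
            for y in range(size)]


# Four sub-patches whose best shifts differ slightly (e.g. a small zoom).
children = [peak_map(5, p) for p in [(1, 1), (1, 3), (3, 1), (3, 3)]]
parent = aggregate(children)
best_shift, child_shifts = backtrack(parent, children)
print(best_shift, child_shifts)
```

In the toy example the parent responds most strongly at shift (2, 2), and backtracking recovers the individual shifts of the four sub-patches that produced that response.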

Page 51: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 51

Optical Flow Deep Matching (TD)Each local maxima in the top layer corresponds to a shift of one of the biggest (32x32) patchesIf we focus on local maximum we can retrieve the corresponding responses one scale below and focus on shift of the sub-patches that generated it

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 52

Optical Flow Deep Matching (TD)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 53

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 54

Ground truth

Dense HOG[Brox amp Malik 2011]

Deep Matching

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 55

Optical Flow Deep Matching

Optical Flow

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 56

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 57

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 58

End to end supervised learning of optical flow

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 59

Option A Stack both input images together and feed them through a generic network

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 60

Option B Create two separate yet identical processing streams for the two images and combine them at a later stage

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 61

Option B Create two separate yet identical processing streams for the two images and combine them at a later stage

Correlation layer Convolution of data patches from the layers to combine

Optical Flow FlowNet (expanding)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 62

Upconvolutional layers Unpooling features maps + convolutionUpconvolutioned feature maps are concatenated with the corresponding map from the contractive part

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 63

Since existing ground truth datasets are not sufficiently large to train a Convnet a synthetic Flying Dataset is generatedhellip and augmented (translation rotation scaling transformations additive Gaussian noise changes in brightness contrast gamma and color)

Convnets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI

Data augmentation

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 64

Optical Flow FlowNet

Object tracking MDNet

65Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Object tracking MDNet

66Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Object tracking MDNet Architecture

67Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Domain-specific layers are used during training for each sequence but are replaced by a single one at test time

Object tracking MDNet Online update

68Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

MDNet is updated online at test time with hard negative mining that is selecting negative samples with the highest positive score

Object tracking FCNT

69Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Object tracking FCNT

70Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Focus on conv4-3 and conv5-3 of VGG-16 network pre-trained for ImageNet image classification

conv4-3 conv5-3

Object tracking FCNT Specialization

71Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence

Object tracking FCNT Localization

72Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Although trained for image classification feature maps in conv5-3 enable object localizationhellipbut is not discriminative enough to different objects of the same category

Object tracking Localization

73Zhou Bolei Aditya Khosla Agata Lapedriza Aude Oliva and Antonio Torralba Object detectors emerge in deep scene cnns ICLR 2015

[Zhou et al ICLR 2015] ldquoObject detectors emerge in deep scene CNNsrdquo [Slides from ReadCV]

Object tracking FCNT Localization

74Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

On the other hand feature maps from conv4-3 are more sensitive to intra-class appearance variationhellip

conv4-3 conv5-3

Object tracking FCNT Architecture

75Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

SNet=Specific Network (online update)

GNet=General Network (fixed)

Object tracking FCNT Results

76Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

ConvNets Software

Caffe httpcaffeberkeleyvisionorg

Torch (Overfeat) httptorchch

Theano httpdeeplearningnetsoftwaretheano

Tensor Flow httpswwwtensorfloworg

MatconvNet (VLFeat) httpwwwvlfeatorgmatconvnet

CNTK (Mcrosoft) httpwwwcntkai 77

Seminar Series: Compacting ConvNets for End to End Learning

Tuesday, February 2, 4pm

D5-010, Campus Nord

ConvNets Learn more

78

Jose M. Álvarez

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

ConvNets Learn more

79

ConvNets Learn more

Online course: Deep Learning. Taking machine learning to the next level

80

ReadCV seminar: Friendly reviews of SoA papers

Spring 2016

Tuesdays at 11am

ConvNets Learn more

81

Barcelona Convolucionada

Deep Learning a l’abast de tothom (Deep Learning within everyone’s reach)

Monday, February 1, 7pm, FIB, Campus Nord UPC

ConvNets Learn more

82

Grup d’estudi de machine learning Barcelona (Barcelona machine learning study group)

Summer course

Deep Learning for Computer Vision

(2.5 ECTS for MSc & PhD)

July 4-8, 3-7pm

ConvNets Learn more

83

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, “Deep Learning: Past, Present & Future”

ConvNets Learn more

84

ConvNets Learn more

85

“Machine learning” subreddit

ConvNets Learn more

86

ConvNets Learn more

87

Check profile requirements for summer internships (disclaimer: offered to PhD students by default)

Company      Avg. salary / hour    Avg. salary / month
Yahoo        $43                   ($43 x 160 = $6880)
Apple        $37                   ($37 x 160 = $5920)
Google       $29.54-$31.32         $7151
Facebook     $22.92                $6150-$7378
Microsoft    $22.63                $6506-$7171

Source: Glassdoor.com (internships in California; no stipends included)

ConvNets Learn more

88

Video: Cristian Canton’s talk “From Catalonia to America: notes on how to achieve a successful post-PhD career”, ACMCV 2015 & UPC

Li Fei-Fei, “How we’re teaching computers to understand pictures”, TEDTalks 2014

ConvNets Learn more

89

Jeremy Howard, “The wonderful and terrifying implications of computers that can learn”, TEDTalks 2014

ConvNets Learn more

90

ConvNets Learn more

91

Neil Lawrence, “OpenAI won’t benefit humanity without open data sharing” (The Guardian, 14/12/2015)

Is Computer Vision solved?

ConvNets Discussion

92

Sports Do you know them?

93

ConvNets Do you know them?

94

Antonio Torralba, MIT (former UPC)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

and MANY MORE I am missing in the page (apologies)

95

ConvNets Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution    Last deadline (as of 28/1/2016)
FI (Catalonia)         22/09/2015
FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web

Image Classification

97

Our past research

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., “Cultural Event Recognition with Visual ConvNets and Temporal Models”, in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

ChaLearn Workshop

Saliency Prediction

J. Pan and Giró-i-Nieto, X., “End-to-end Convolutional Network for Saliency Prediction”, in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

Our current research

LSUN Challenge

Sentiment Analysis

99

Our current research

[Slides]

CNN

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., “Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction”, in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015.

Our current research

Instance Search in Video

100

V.-T. Nguyen, Dinh-Le, D., Salvador, A., Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S., and Giró-i-Nieto, X., “NII-HITACHI-UIT at TRECVID 2015 Instance Search”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S., O’Connor, N., and Smeaton, A. F., “Insight DCU at TRECVID 2015”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

101


Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013, December). DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on (pp. 1385-1392). IEEE.

Optical Flow Deep Matching (TD)

Optical Flow Deep Matching

Ground truth

Dense HOG [Brox & Malik 2011]

Deep Matching

Optical Flow Deep Matching

Optical Flow Deep Matching

Optical Flow

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D., and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).

Optical Flow FlowNet

Optical Flow FlowNet

End-to-end supervised learning of optical flow
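Supervised training needs a measure of how far a predicted flow field is from the ground truth. A common choice for optical flow is the endpoint error, the mean Euclidean distance between predicted and true displacement vectors. The sketch below assumes flow fields stored as nested lists of (u, v) tuples, which is a layout chosen only for this example.

```python
import math


def endpoint_error(pred_flow, true_flow):
    """Mean Euclidean distance between predicted and ground-truth flow vectors.

    Both arguments are 2D grids (lists of rows) of (u, v) displacement tuples.
    """
    total, count = 0.0, 0
    for pred_row, true_row in zip(pred_flow, true_flow):
        for (pu, pv), (tu, tv) in zip(pred_row, true_row):
            total += math.hypot(pu - tu, pv - tv)
            count += 1
    return total / count


# Toy 2x2 flow fields: two pixels are predicted exactly, two are off by 5 pixels.
truth = [[(1.0, 0.0), (0.0, 0.0)], [(0.0, 2.0), (3.0, 4.0)]]
guess = [[(1.0, 0.0), (3.0, 4.0)], [(0.0, 2.0), (0.0, 0.0)]]
loss = endpoint_error(guess, truth)
```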

Optical Flow FlowNet (contracting)

Option A: Stack both input images together and feed them through a generic network
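Stacking amounts to concatenating the two frames along the channel axis, so two RGB images become one 6-channel input. A minimal sketch, assuming frames stored as height x width x channel nested lists; the helper name `stack_frames` is made up for this example.

```python
def stack_frames(frame_a, frame_b):
    """Concatenate two H x W x C frames along the channel axis."""
    if len(frame_a) != len(frame_b) or len(frame_a[0]) != len(frame_b[0]):
        raise ValueError("frames must share height and width")
    return [[pix_a + pix_b for pix_a, pix_b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]


# Two 1x2 RGB frames (pixel values are made up).
first = [[[10, 20, 30], [40, 50, 60]]]
second = [[[11, 21, 31], [41, 51, 61]]]
stacked = stack_frames(first, second)
```

The network then has to discover on its own how the first three channels relate to the last three.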

Optical Flow FlowNet (contracting)

Option B: Create two separate yet identical processing streams for the two images and combine them at a later stage

Optical Flow FlowNet (contracting)

Option B: Create two separate yet identical processing streams for the two images and combine them at a later stage

Correlation layer: convolution of data patches from the layers to combine (data is convolved with other data instead of with a learned filter)
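The idea of correlating the two streams can be sketched as a dot product between each feature vector of the first map and the feature vectors around the same position in the second map, giving one output channel per displacement. This simplified version assumes single-pixel patches, zero padding and a small `max_disp`; those choices and the function name are assumptions of the sketch, not the FlowNet configuration.

```python
def correlation_layer(feat_a, feat_b, max_disp=1):
    """Correlate each feature vector of feat_a with nearby vectors of feat_b.

    feat_a and feat_b are H x W x C nested lists. The output is H x W x D with
    D = (2 * max_disp + 1) ** 2, one channel per displacement (dy, dx).
    Out-of-range neighbours contribute a score of zero.
    """
    height, width = len(feat_a), len(feat_a[0])
    offsets = [(dy, dx)
               for dy in range(-max_disp, max_disp + 1)
               for dx in range(-max_disp, max_disp + 1)]
    output = []
    for y in range(height):
        row = []
        for x in range(width):
            scores = []
            for dy, dx in offsets:
                yy, xx = y + dy, x + dx
                if 0 <= yy < height and 0 <= xx < width:
                    scores.append(sum(a * b for a, b in
                                      zip(feat_a[y][x], feat_b[yy][xx])))
                else:
                    scores.append(0.0)
            row.append(scores)
        output.append(row)
    return output, offsets


# A feature at (0, 0) in the first map appears one pixel to the right in the second.
feat_a = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
feat_b = [[[0.0, 0.0], [1.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
corr, offsets = correlation_layer(feat_a, feat_b)
best = offsets[corr[0][0].index(max(corr[0][0]))]
```

The strongest response at position (0, 0) sits in the channel for displacement (0, 1), which is the motion built into the toy data. No weights are learned in this layer.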

Optical Flow FlowNet (expanding)

Upconvolutional layers: unpooling feature maps + convolution. Upconvolved feature maps are concatenated with the corresponding map from the contracting part
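The unpooling and concatenation steps can be sketched on nested lists. Nearest-neighbour unpooling is used here as a simple stand-in and the convolution that follows unpooling is omitted, so this shows only the data flow, under those assumptions; both helper names are made up for the example.

```python
def unpool2x(fmap):
    """Nearest-neighbour 2x unpooling of an H x W x C map (a simple stand-in)."""
    out = []
    for row in fmap:
        wide = [list(pix) for pix in row for _ in range(2)]
        out.append(wide)
        out.append([list(pix) for pix in wide])
    return out


def concat_channels(map_a, map_b):
    """Concatenate two maps of equal spatial size along the channel axis."""
    return [[pa + pb for pa, pb in zip(ra, rb)]
            for ra, rb in zip(map_a, map_b)]


coarse = [[[7.0]]]                       # 1 x 1 x 1 map from deep in the network
skip = [[[1.0], [2.0]], [[3.0], [4.0]]]  # 2 x 2 x 1 map from an earlier layer
upsampled = unpool2x(coarse)
merged = concat_channels(upsampled, skip)
```

Each output pixel now carries both the coarse, high-level value and the finer detail from the earlier layer.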

Optical Flow FlowNet

Since existing ground truth datasets are not sufficiently large to train a ConvNet, a synthetic Flying Chairs dataset is generated… and augmented (translation, rotation and scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma and color)

ConvNets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI
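The photometric part of such an augmentation pipeline can be sketched in a few lines. The parameter ranges below are illustrative guesses rather than the values used for FlowNet, and the geometric transformations are left out because they would also have to be applied consistently to both frames and to the flow field.

```python
import random


def augment(image, rng, noise_sigma=0.02):
    """Apply random gamma, contrast, brightness and Gaussian noise to an image.

    image is a 2D list of intensities in [0, 1]; results are clamped to [0, 1].
    All parameter ranges are assumptions made for this sketch.
    """
    gamma = rng.uniform(0.7, 1.5)
    contrast = rng.uniform(0.8, 1.2)
    brightness = rng.uniform(-0.1, 0.1)
    result = []
    for row in image:
        new_row = []
        for value in row:
            value = value ** gamma
            value = (value - 0.5) * contrast + 0.5 + brightness
            value += rng.gauss(0.0, noise_sigma)
            new_row.append(min(1.0, max(0.0, value)))
        result.append(new_row)
    return result


rng = random.Random(0)  # fixed seed so the example is reproducible
image = [[0.0, 0.25], [0.5, 1.0]]
augmented = augment(image, rng)
```

Drawing fresh parameters per training sample yields many variations of each synthetic pair.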

Data augmentation

Optical Flow FlowNet

Object tracking MDNet

Nam, Hyeonseob, and Bohyung Han. “Learning multi-domain convolutional neural networks for visual tracking.” ICCV VOT Workshop (2015).

Object tracking MDNet

Object tracking MDNet Architecture

Domain-specific layers are used during training, one for each sequence, but are replaced by a single one at test time
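The bookkeeping behind this arrangement can be sketched as a set of shared weights plus a dictionary of per-sequence branches that is swapped for one fresh branch before tracking. The class, its scalar "branches" and the sequence names are hypothetical simplifications for illustration and not MDNet's actual layers.

```python
class MultiDomainNet:
    """Shared feature weights plus one scoring branch per training sequence."""

    def __init__(self, shared_weights, domain_names):
        self.shared = list(shared_weights)
        # Each branch is reduced to a single bias term in this sketch.
        self.branches = {name: 0.0 for name in domain_names}

    def features(self, x):
        return sum(w * v for w, v in zip(self.shared, x))

    def score(self, x, domain):
        return self.features(x) + self.branches[domain]

    def prepare_for_test(self, new_domain="test"):
        """Drop every training branch and attach a single fresh one."""
        self.branches = {new_domain: 0.0}


net = MultiDomainNet([0.2, -0.1], ["seq_a", "seq_b", "seq_c"])
branches_in_training = sorted(net.branches)
net.prepare_for_test()
branches_at_test = sorted(net.branches)
```

The shared weights survive the switch, which is what lets knowledge from the training sequences carry over to a new video.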

Object tracking MDNet Online update

MDNet is updated online at test time with hard negative mining, that is, selecting the negative samples with the highest positive score
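The selection step described here is a top-k over the negatives, ranked by the score the current model gives them for the positive class. A minimal sketch, with made-up boxes and scores and a hypothetical `mine_hard_negatives` helper:

```python
def mine_hard_negatives(candidates, positive_score, keep):
    """Return the `keep` negative samples the model scores most like the target."""
    ranked = sorted(candidates, key=positive_score, reverse=True)
    return ranked[:keep]


# Hypothetical negative samples with a precomputed positive-class score.
negatives = [
    {"box": (0, 0, 10, 10), "score": 0.05},
    {"box": (5, 5, 15, 15), "score": 0.80},
    {"box": (40, 40, 50, 50), "score": 0.10},
    {"box": (8, 6, 18, 16), "score": 0.65},
]
hard = mine_hard_negatives(negatives, lambda s: s["score"], keep=2)
hard_boxes = [s["box"] for s in hard]
```

Training on these confusing samples, rather than on easy background patches, concentrates the update on the cases the tracker currently gets wrong.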

Object tracking FCNT

Object tracking FCNT

Focus on conv4-3 and conv5-3 of VGG-16 network pre-trained for ImageNet image classification

conv4-3 conv5-3

Page 54: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 54

Ground truth

Dense HOG[Brox amp Malik 2011]

Deep Matching

Optical Flow Deep Matching

Weinzaepfel P Revaud J Harchaoui Z amp Schmid C (2013 December) DeepFlow Large displacement optical flow with deep matching In Computer Vision (ICCV) 2013 IEEE International Conference on (pp 1385-1392) IEEE 55

Optical Flow Deep Matching

Optical Flow

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 56

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 57

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 58

End to end supervised learning of optical flow

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 59

Option A Stack both input images together and feed them through a generic network

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 60

Option B Create two separate yet identical processing streams for the two images and combine them at a later stage

Optical Flow FlowNet (contracting)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 61

Option B Create two separate yet identical processing streams for the two images and combine them at a later stage

Correlation layer Convolution of data patches from the layers to combine

Optical Flow FlowNet (expanding)

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 62

Upconvolutional layers Unpooling features maps + convolutionUpconvolutioned feature maps are concatenated with the corresponding map from the contractive part

Optical Flow FlowNet

Since existing ground-truth datasets are not sufficiently large to train a ConvNet, a synthetic Flying Chairs dataset is generated... and augmented (translation, rotation and scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma and color).

ConvNets trained on this unrealistic data generalize well to existing datasets such as Sintel and KITTI.
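
The photometric part of such an augmentation could look like the sketch below. The parameter ranges and the function name `augment_pair` are assumptions chosen for illustration, not the values used by the authors. Geometric transformations (translation, rotation, scaling) are left out because they would also require transforming the ground-truth flow field.

```python
import numpy as np

def augment_pair(img1, img2, rng):
    """Apply photometric changes to a pair of frames with values in [0, 1].

    Brightness, contrast and gamma are shared by both frames; Gaussian noise
    is drawn independently per frame. All ranges are illustrative assumptions.
    """
    brightness = rng.uniform(-0.2, 0.2)
    contrast = rng.uniform(0.8, 1.2)
    gamma = rng.uniform(0.7, 1.5)
    out = []
    for img in (img1, img2):
        x = (img - 0.5) * contrast + 0.5 + brightness
        x = np.clip(x, 0.0, 1.0) ** gamma
        x = x + rng.normal(0.0, 0.02, size=x.shape)
        out.append(np.clip(x, 0.0, 1.0))
    return out[0], out[1]

rng = np.random.default_rng(0)
img1 = rng.random((4, 5, 3))
img2 = rng.random((4, 5, 3))
aug1, aug2 = augment_pair(img1, img2, rng)
```

Because no pixel changes position, the ground-truth flow of the original pair remains valid for the augmented pair.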

Data augmentation

Optical Flow FlowNet

Object tracking MDNet

Nam, Hyeonseob, and Bohyung Han. Learning multi-domain convolutional neural networks for visual tracking. ICCV VOT Workshop (2015).

Object tracking MDNet Architecture

Domain-specific layers are used during training, one for each sequence, but are replaced by a single one at test time.
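
The shared/domain-specific split can be sketched with a toy model. Everything here is an assumption made for illustration: a single shared linear layer stands in for the shared layers, each branch is a 2-way linear classifier, and the class and method names are hypothetical.

```python
import numpy as np

class MultiDomainNet:
    """Toy stand-in: one shared linear layer plus one 2-way branch per domain."""

    def __init__(self, in_dim, feat_dim, num_domains, rng):
        self.rng = rng
        self.feat_dim = feat_dim
        self.shared = rng.standard_normal((in_dim, feat_dim)) * 0.1
        self.branches = [self._new_branch() for _ in range(num_domains)]

    def _new_branch(self):
        # Each branch scores (background, target) for its own sequence.
        return self.rng.standard_normal((self.feat_dim, 2)) * 0.1

    def forward(self, x, domain):
        features = np.maximum(x @ self.shared, 0.0)  # shared layers (ReLU)
        return features @ self.branches[domain]

    def reset_for_test(self):
        """Drop the training branches and keep a single fresh one."""
        self.branches = [self._new_branch()]

rng = np.random.default_rng(0)
net = MultiDomainNet(in_dim=16, feat_dim=8, num_domains=3, rng=rng)
x = rng.standard_normal((5, 16))
train_scores = net.forward(x, domain=2)   # training: branch of sequence 2
shared_before = net.shared.copy()
net.reset_for_test()
test_scores = net.forward(x, domain=0)    # test: the single new branch
```

The shared weights survive the switch to test time; only the per-sequence branches are discarded.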

Object tracking MDNet Online update

MDNet is updated online at test time with hard negative mining, that is, selecting the negative samples with the highest positive score.
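
As a small illustration of the selection rule, assuming the positive scores of the candidate negative samples are already available (the scores and the function name `hard_negatives` below are made up for the example):

```python
import numpy as np

def hard_negatives(positive_scores, num_keep):
    """Return indices of the negatives the classifier scores most like the target."""
    order = np.argsort(positive_scores)[::-1]  # highest positive score first
    return order[:num_keep]

# Hypothetical positive scores for five negative samples.
scores = np.array([0.05, 0.90, 0.30, 0.75, 0.10])
picked = hard_negatives(scores, num_keep=2)
print(picked)  # [1 3]
```

Only the selected samples would be used in the update, so training effort goes to the distractors the model currently confuses with the target.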

Object tracking FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. Visual Tracking with Fully Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Focus on conv4-3 and conv5-3 of VGG-16 network pre-trained for ImageNet image classification

conv4-3 conv5-3

Object tracking FCNT Specialization

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence
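
One simple way to see which maps relate to the target is to rank them by how much of their activation falls inside the target box. The sketch below is a stand-in heuristic chosen for illustration; it is not the selection method of the FCNT paper, and the function name and toy data are assumptions.

```python
import numpy as np

def rank_by_foreground_energy(feature_maps, box):
    """Rank C x H x W maps by the fraction of activation inside box=(y0, x0, y1, x1)."""
    y0, x0, y1, x1 = box
    energy = np.abs(feature_maps)
    inside = energy[:, y0:y1, x0:x1].sum(axis=(1, 2))
    total = energy.sum(axis=(1, 2)) + 1e-12  # avoid division by zero
    return np.argsort(inside / total)[::-1]

maps = np.zeros((3, 6, 6))
maps[0, 2:4, 2:4] = 1.0   # fires only on the target
maps[1, :, :] = 1.0       # fires everywhere
maps[2, 0, 0] = 1.0       # fires only on the background
order = rank_by_foreground_energy(maps, box=(2, 2, 4, 4))
print(order)  # [0 1 2]
```

Keeping only the top-ranked maps discards responses that are noise with respect to the tracked object.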

Object tracking FCNT Localization

Although trained for image classification, feature maps in conv5-3 enable object localization... but they are not discriminative enough to tell apart different objects of the same category.

Object tracking Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Object detectors emerge in deep scene CNNs. ICLR 2015.

[Zhou et al., ICLR 2015] "Object detectors emerge in deep scene CNNs" [Slides from ReadCV]

Object tracking FCNT Localization

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation...

Object tracking FCNT Architecture

SNet = Specific Network (online update)

GNet = General Network (fixed)

Object tracking FCNT Results

ConvNets Software

Caffe: http://caffe.berkeleyvision.org

Torch (Overfeat): http://torch.ch

Theano: http://deeplearning.net/software/theano

TensorFlow: https://www.tensorflow.org

MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet

CNTK (Microsoft): http://www.cntk.ai

ConvNets Learn more

Seminar Series: Compacting ConvNets for End to End Learning

Jose M. Álvarez

Tuesday, February 2, 4pm

D5-010, Campus Nord

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

Online course

Deep Learning: Taking machine learning to the next level

ReadCV seminar: Friendly reviews of SoA papers

Spring 2016

Tuesdays at 11am

Barcelona Convolucionada

Deep Learning a l'abast de tothom

Monday, February 1, 7pm, FIB, Campus Nord UPC

Grup d'estudi de machine learning Barcelona

Summer course

Deep Learning for Computer Vision

(2.5 ECTS for MSc & PhD)

July 4-8, 3-7pm

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, "Deep Learning: Past, Present & Future"

"Machine learning" sub-Reddit

Check profile requirements for Summer internship (disclaimer: offered to PhD students by default)

Company     Avg. salary / hour    Avg. salary / month
Yahoo       $43                   ($43 x 160 = $6,880)
Apple       $37                   ($37 x 160 = $5,920)
Google      $29.54-$31.32         $7,151
Facebook    $22.92                $6,150-$7,378
Microsoft   $22.63                $6,506-$7,171

Source: Glassdoor.com (internships in California. No stipends included)

Video: Cristian Canton's talk "From Catalonia to America: notes on how to achieve a successful post-PhD career", ACMCV 2015 & UPC

Li Fei-Fei, "How we're teaching computers to understand pictures", TEDTalks 2014

Jeremy Howard, "The wonderful and terrifying implications of computers that can learn", TEDTalks 2014

Neil Lawrence, "OpenAI won't benefit humanity without open data sharing" (The Guardian, 14/12/2015)

ConvNets Discussion

Is Computer Vision solved?

Sports Do you know them?

ConvNets Do you know them?

Antonio Torralba, MIT (former UPC)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

...and MANY MORE I am missing in the page (apologies)

ConvNets Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution    Last deadlines (on 28/1/2016)
FI (Catalonia)         22/09/2015
FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web

Our past research

Image Classification

ChaLearn Workshop

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., "Cultural Event Recognition with Visual ConvNets and Temporal Models", in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

Our current research

Saliency Prediction

LSUN Challenge

J. Pan and Giró-i-Nieto, X., "End-to-end Convolutional Network for Saliency Prediction", in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

Our current research

Sentiment Analysis

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., "Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction", in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015. [Slides]

Our current research

Instance Search in Video

V.-T. Nguyen, Dinh-Le, D., Salvador, A., Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S., and Giró-i-Nieto, X., "NII-HITACHI-UIT at TRECVID 2015 Instance Search", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S., O'Connor, N., and Smeaton, A. F., "Insight DCU at TRECVID 2015", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

Page 57: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 57

End-to-end supervised learning of optical flow.

Optical Flow: FlowNet (contracting)

Option A: Stack both input images together and feed them through a generic network.
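Option A can be illustrated with a few lines of NumPy. This is only a sketch of the input stacking step, assuming frames stored as height x width x channel arrays; the helper name `stack_image_pair` is hypothetical and the network that consumes the stacked tensor is not shown.

```python
import numpy as np

def stack_image_pair(frame_a, frame_b):
    """Concatenate two H x W x 3 frames along the channel axis (H x W x 6)."""
    if frame_a.shape != frame_b.shape:
        raise ValueError("both frames must have the same shape")
    return np.concatenate([frame_a, frame_b], axis=-1)

# Toy frames: an all-zero image and an all-one image.
frame_a = np.zeros((4, 5, 3), dtype=np.float32)
frame_b = np.ones((4, 5, 3), dtype=np.float32)
stacked = stack_image_pair(frame_a, frame_b)
print(stacked.shape)  # (4, 5, 6)
```

The first three channels of `stacked` hold the first frame and the last three hold the second, so a generic convolutional network sees both frames at every pixel from its first layer on.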

Option B: Create two separate yet identical processing streams for the two images and combine them at a later stage.

Correlation layer: convolution of data patches from the two layers to combine, that is, patches of one feature map are compared with patches of the other instead of with a learned filter.
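The correlation layer can be sketched as follows. This is a simplified illustration, not the FlowNet implementation: it assumes 1 x 1 patches (a plain dot product between feature vectors), a small maximum displacement, and zero padding at the borders; the names `correlation_layer` and `max_disp` are hypothetical.

```python
import numpy as np

def correlation_layer(feat_a, feat_b, max_disp=1):
    """Compare each feature vector of feat_a with displaced vectors of feat_b.

    feat_a, feat_b: H x W x C arrays. Returns H x W x (2 * max_disp + 1) ** 2,
    one output channel per displacement (dy, dx).
    """
    h, w, _ = feat_a.shape
    side = 2 * max_disp + 1
    # Zero-pad the second map so displaced lookups stay in bounds.
    padded = np.pad(feat_b, ((max_disp, max_disp), (max_disp, max_disp), (0, 0)),
                    mode="constant")
    out = np.zeros((h, w, side * side), dtype=feat_a.dtype)
    k = 0
    for dy in range(side):
        for dx in range(side):
            # shifted[y, x] equals feat_b[y + dy - max_disp, x + dx - max_disp]
            shifted = padded[dy:dy + h, dx:dx + w, :]
            out[:, :, k] = np.sum(feat_a * shifted, axis=-1)
            k += 1
    return out

rng = np.random.default_rng(0)
feat_a = rng.standard_normal((6, 7, 4))
# Second map: the first one moved one pixel to the right.
feat_b = np.zeros_like(feat_a)
feat_b[:, 1:, :] = feat_a[:, :-1, :]
corr = correlation_layer(feat_a, feat_b, max_disp=1)
print(corr.shape)  # (6, 7, 9)
```

With `max_disp=1`, channel 5 corresponds to the displacement (dy, dx) = (0, +1), so for this toy pair it holds the squared norm of each feature vector of `feat_a` (except in the last column, where the match falls outside the image).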

Optical Flow: FlowNet (expanding)

Upconvolutional layers: unpooling of feature maps + convolution. The upconvolved feature maps are concatenated with the corresponding map from the contractive part.
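One expanding step might look like the sketch below. It assumes nearest-neighbour 2x unpooling and a 3 x 3 convolution with zero padding; both choices, the random weights, and the helper names are illustrative assumptions rather than details taken from the slides.

```python
import numpy as np

def unpool2x(x):
    """Nearest-neighbour unpooling: H x W x C -> 2H x 2W x C."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def conv3x3(x, weights):
    """3 x 3 convolution with zero 'same' padding.

    x: H x W x Cin, weights: 3 x 3 x Cin x Cout.
    """
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)), mode="constant")
    out = np.zeros((h, w, weights.shape[-1]), dtype=x.dtype)
    for dy in range(3):
        for dx in range(3):
            # tensordot contracts the input-channel axis with the filter.
            out += np.tensordot(padded[dy:dy + h, dx:dx + w, :],
                                weights[dy, dx], axes=([2], [0]))
    return out

def upconv_block(coarse, skip, weights):
    """Unpool + convolve, then concatenate with the contracting-part map."""
    up = conv3x3(unpool2x(coarse), weights)
    return np.concatenate([up, skip], axis=-1)

rng = np.random.default_rng(0)
coarse = rng.standard_normal((4, 4, 8))      # low-resolution features
skip = rng.standard_normal((8, 8, 3))        # map from the contracting part
weights = rng.standard_normal((3, 3, 8, 5))  # untrained, for shape checking only
merged = upconv_block(coarse, skip, weights)
print(merged.shape)  # (8, 8, 8)
```

The output has 5 upconvolved channels followed by the 3 skip channels, which is how the fine detail of the contracting part reaches the higher-resolution prediction.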

Optical Flow: FlowNet

Since existing ground truth datasets are not sufficiently large to train a ConvNet, a synthetic Flying Chairs dataset is generated... and augmented.

ConvNets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI.

Data augmentation: translation, rotation and scaling transformations, additive Gaussian noise, and changes in brightness, contrast, gamma and color.
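The photometric part of this augmentation could be sketched as below. The parameter ranges are arbitrary placeholders, not the values used by the FlowNet authors, and the function name `augment_pair` is hypothetical. Geometric transformations are left out because they would also have to be applied to the ground-truth flow field.

```python
import numpy as np

def augment_pair(img_a, img_b, rng):
    """Apply the same random photometric changes to both frames.

    Images are H x W x 3 float arrays with values in [0, 1].
    All ranges below are illustrative assumptions.
    """
    color = rng.uniform(0.8, 1.2, size=3)    # per-channel color scaling
    gamma = rng.uniform(0.7, 1.5)
    contrast = rng.uniform(0.8, 1.2)
    brightness = rng.uniform(-0.1, 0.1)
    sigma = rng.uniform(0.0, 0.04)           # noise standard deviation
    augmented = []
    for img in (img_a, img_b):
        x = np.clip(img * color, 0.0, 1.0) ** gamma
        x = (x - 0.5) * contrast + 0.5 + brightness
        # Noise is drawn independently for each frame.
        x = x + rng.normal(0.0, sigma, size=x.shape)
        augmented.append(np.clip(x, 0.0, 1.0))
    return augmented[0], augmented[1]

rng = np.random.default_rng(0)
img_a = rng.uniform(0.0, 1.0, size=(8, 8, 3))
img_b = rng.uniform(0.0, 1.0, size=(8, 8, 3))
aug_a, aug_b = augment_pair(img_a, img_b, rng)
print(aug_a.shape, aug_b.shape)
```

Sharing the color, gamma, contrast and brightness parameters across the two frames keeps the pair photometrically consistent, so the flow label remains valid.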

Object tracking: MDNet

Nam, Hyeonseob, and Bohyung Han. “Learning multi-domain convolutional neural networks for visual tracking.” ICCV VOT Workshop (2015).

Object tracking: MDNet (architecture)

Domain-specific layers are used during training for each sequence, but are replaced by a single one at test time.
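The multi-domain idea can be sketched with a toy model: one set of shared weights plus one binary (target vs. background) branch per training sequence, with the branches swapped for a single fresh one at test time. The class below is a hypothetical illustration with a single shared linear layer and random weights, not the MDNet architecture.

```python
import numpy as np

class MultiDomainNet:
    """Shared layer plus one domain-specific output branch per sequence."""

    def __init__(self, feat_dim, hidden_dim, num_domains, rng):
        self.rng = rng
        self.hidden_dim = hidden_dim
        self.shared = rng.standard_normal((feat_dim, hidden_dim)) * 0.01
        self.branches = [self._new_branch() for _ in range(num_domains)]

    def _new_branch(self):
        # Two outputs: target score and background score.
        return self.rng.standard_normal((self.hidden_dim, 2)) * 0.01

    def forward(self, x, domain):
        hidden = np.maximum(x @ self.shared, 0.0)  # shared layer + ReLU
        return hidden @ self.branches[domain]

    def reset_for_test(self):
        # Keep the shared weights, replace all branches by a single new one.
        self.branches = [self._new_branch()]

rng = np.random.default_rng(0)
net = MultiDomainNet(feat_dim=16, hidden_dim=8, num_domains=3, rng=rng)
samples = rng.standard_normal((5, 16))
train_scores = net.forward(samples, domain=2)
shared_before = net.shared.copy()
net.reset_for_test()
test_scores = net.forward(samples, domain=0)
print(train_scores.shape, test_scores.shape, len(net.branches))
```

Only the branch list changes in `reset_for_test`; the shared weights carry over what was learned across all training sequences.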

Object tracking: MDNet (online update)

MDNet is updated online at test time with hard negative mining, that is, by selecting the negative samples with the highest positive score.
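The selection step of hard negative mining is simple to sketch. The example assumes the classifier has already produced a positive (target) score for each negative sample; the function name and the scores are made up for illustration.

```python
import numpy as np

def mine_hard_negatives(positive_scores, num_keep):
    """Return indices of the negatives scored most like the target."""
    order = np.argsort(positive_scores)[::-1]  # highest score first
    return order[:num_keep]

# Positive scores assigned by the current model to five background samples.
scores = np.array([0.10, 0.85, 0.40, 0.92, 0.05])
hard = mine_hard_negatives(scores, num_keep=2)
print(hard)  # [3 1]
```

Samples 3 and 1 are the background patches the model currently confuses most with the target, so they are the ones kept for the next update.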

Object tracking: FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. “Visual Tracking with Fully Convolutional Networks.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Focus on the conv4-3 and conv5-3 layers of a VGG-16 network pre-trained for ImageNet image classification.

Object tracking: FCNT (specialization)

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence.

Object tracking: FCNT (localization)

Although trained for image classification, feature maps in conv5-3 enable object localization... but they are not discriminative enough to distinguish different objects of the same category.

Object tracking: Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. “Object detectors emerge in deep scene CNNs.” ICLR 2015. [Slides from ReadCV]

Object tracking: FCNT (localization)

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation...

Object tracking: FCNT (architecture)

SNet = Specific Network (online update)
GNet = General Network (fixed)

Object tracking: FCNT (results)

ConvNets: Software

Caffe: http://caffe.berkeleyvision.org
Torch (Overfeat): http://torch.ch
Theano: http://deeplearning.net/software/theano
TensorFlow: https://www.tensorflow.org
MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet
CNTK (Microsoft): http://www.cntk.ai

ConvNets: Learn more

Seminar Series: Compacting ConvNets for End to End Learning
Jose M. Álvarez
Tuesday, February 2, 4pm
D5-010, Campus Nord

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

Online course: Deep Learning. Taking machine learning to the next level

ReadCV seminar: Friendly reviews of SoA papers
Spring 2016, Tuesdays at 11am

Barcelona Convolucionada: Deep Learning a l’abast de tothom (Deep Learning within everyone’s reach)
Monday, February 1, 7pm, FIB, Campus Nord UPC
Grup d’estudi de machine learning Barcelona (Barcelona machine learning study group)

Summer course: Deep Learning for Computer Vision (2.5 ECTS for MSc & PhD)
July 4-8, 3-7pm

Deep learning methods for vision (CVPR 2012)
Tutorial on deep learning for vision (CVPR 2014)
Kyunghyun Cho, “Deep Learning: Past, Present & Future”

“Machine learning” sub-Reddit

Check profile requirements for Summer internship (disclaimer: offered to PhD students by default)

Company      Avg. salary per hour    Avg. salary per month
Yahoo        $43                     ($43 x 160 = $6,880)
Apple        $37                     ($37 x 160 = $5,920)
Google       $29.54-$31.32           $7,151
Facebook     $22.92                  $6,150-$7,378
Microsoft    $22.63                  $6,506-$7,171

Source: Glassdoor.com (internships in California. No stipends included)

Video: Cristian Canton’s talk “From Catalonia to America: notes on how to achieve a successful post-PhD career”, ACMCV 2015 & UPC
Li Fei-Fei, “How we’re teaching computers to understand pictures”, TEDTalks 2014
Jeremy Howard, “The wonderful and terrifying implications of computers that can learn”, TEDTalks 2014

Neil Lawrence, “OpenAI won’t benefit humanity without open data sharing” (The Guardian, 14/12/2015)

ConvNets: Discussion

Is Computer Vision solved?

Sports: Do you know them?

ConvNets: Do you know them?

ConvNets: Where you are studying

Antonio Torralba, MIT (former UPC)
Oriol Vinyals, Google (former UPC)
Jose M. Álvarez, NICTA (former URL & UAB)
Joan Bruna, Berkeley (former UPC)
and MANY MORE I am missing in the page (apologies)

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution    Last deadlines (on 28/1/2016)
FI (Catalonia)         22/09/2015
FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web

Our past research: Image Classification

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., “Cultural Event Recognition with Visual ConvNets and Temporal Models”, in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

ChaLearn Workshop

Our current research: Saliency Prediction

J. Pan and Giró-i-Nieto, X., “End-to-end Convolutional Network for Saliency Prediction”, in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

LSUN Challenge

Our current research: Sentiment Analysis

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., “Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction”, in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015. [Slides]

Our current research: Instance Search in Video

V.-T. Nguyen, Dinh-Le, D., Salvador, A., Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S. ’ichi, and Giró-i-Nieto, X., “NII-HITACHI-UIT at TRECVID 2015 Instance Search”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S. ’ichi, O’Connor, N., and Smeaton, A. F., “Insight DCU at TRECVID 2015”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro
http://bitsearch.blogspot.com
https://twitter.com/DocXavi
https://www.facebook.com/ProfessorXavi
xavier.giro@upc.edu

Page 60: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

Optical Flow: FlowNet (contracting)

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766).

Option B: Create two separate yet identical processing streams for the two images and combine them at a later stage.

Correlation layer: Convolution of data patches from the layers to combine.
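The correlation layer can be pictured with a small NumPy sketch. This is only an illustration under simplifying assumptions, not the FlowNet implementation: patches are reduced to single pixels, the function name `correlation_layer` and the parameter `max_disp` are hypothetical, and borders are zero-padded.

```python
import numpy as np

def correlation_layer(f1, f2, max_disp=2):
    """Compare every position of f1 with displaced positions of f2.

    f1, f2: feature maps of shape (C, H, W) from the two streams.
    Returns an array of shape (D*D, H, W), D = 2*max_disp + 1, holding
    one dot product per displacement (dy, dx). Assumption: 1x1 patches.
    """
    c, h, w = f1.shape
    d = 2 * max_disp + 1
    # Zero-pad f2 so that displaced reads never leave the array.
    f2p = np.pad(f2, ((0, 0), (max_disp, max_disp), (max_disp, max_disp)))
    out = np.zeros((d * d, h, w), dtype=f1.dtype)
    for i, dy in enumerate(range(-max_disp, max_disp + 1)):
        for j, dx in enumerate(range(-max_disp, max_disp + 1)):
            # shifted[:, y, x] equals f2[:, y + dy, x + dx] (zero outside).
            shifted = f2p[:, max_disp + dy:max_disp + dy + h,
                          max_disp + dx:max_disp + dx + w]
            # Data is multiplied with data; there is no learned filter here.
            out[i * d + j] = (f1 * shifted).sum(axis=0)
    return out
```

Each output channel corresponds to one candidate displacement, so later layers can read off which shift of the second image best matches each position of the first.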

Optical Flow: FlowNet (expanding)

Upconvolutional layers: Unpooling of feature maps + convolution. Upconvolved feature maps are concatenated with the corresponding map from the contractive part.
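A minimal NumPy sketch of one expanding step follows. It assumes nearest-neighbour unpooling and a 1x1 convolution to keep the code short; the real layers use larger learned filters, and all function names here are hypothetical.

```python
import numpy as np

def unpool2x(x):
    """Nearest-neighbour unpooling: (C, H, W) -> (C, 2H, 2W)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def conv1x1(x, weight):
    """1x1 convolution. weight has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def upconv_and_concat(coarse, skip, weight):
    """Upsample the coarse map, convolve it, then stack it with the
    same-resolution feature map from the contracting part."""
    up = conv1x1(unpool2x(coarse), weight)
    return np.concatenate([up, skip], axis=0)
```

The concatenation lets the expanding part combine coarse, high-level information with the finer local detail kept in the contracting part.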

Optical Flow: FlowNet

Data augmentation

Since existing ground-truth datasets are not sufficiently large to train a ConvNet, a synthetic Flying Chairs dataset is generated… and augmented (translation, rotation and scaling transformations, additive Gaussian noise, and changes in brightness, contrast, gamma and color).

ConvNets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI.
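The photometric part of the augmentation can be sketched as below. The parameter ranges and the function name `photometric_augment` are assumptions chosen for illustration, not values from the paper. Geometric transformations are left out because they would also require transforming the ground-truth flow.

```python
import numpy as np

def photometric_augment(img1, img2, rng, noise_sigma=0.02):
    """Apply the same random photometric change to both frames.

    img1, img2: float arrays in [0, 1] with shape (H, W, 3).
    All ranges below are illustrative assumptions.
    """
    gamma = rng.uniform(0.7, 1.5)            # gamma change
    gain = rng.uniform(0.8, 1.2)             # contrast change
    bias = rng.uniform(-0.1, 0.1)            # brightness change
    color = rng.uniform(0.9, 1.1, size=3)    # per-channel color change
    augmented = []
    for img in (img1, img2):
        x = np.clip(img, 0.0, 1.0) ** gamma
        x = (x - 0.5) * gain + 0.5 + bias
        x = x * color
        # Additive Gaussian noise is drawn independently per frame.
        x = x + rng.normal(0.0, noise_sigma, size=x.shape)
        augmented.append(np.clip(x, 0.0, 1.0))
    return augmented[0], augmented[1]
```

Because these changes do not move any pixel, the flow ground truth of the pair stays valid after augmentation.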

Object tracking: MDNet

Nam, Hyeonseob, and Bohyung Han. Learning multi-domain convolutional neural networks for visual tracking. ICCV VOT Workshop (2015).

Object tracking: MDNet Architecture

Domain-specific layers are used during training for each sequence, but are replaced by a single one at test time.
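The shared-plus-branches idea can be sketched with a toy model. Everything here is a hypothetical simplification: one linear shared layer stands in for the shared convolutional layers, each branch is a 2-way (target vs background) linear classifier, and the class and method names are invented for the example.

```python
import numpy as np

class MultiDomainNet:
    """Toy multi-domain network: shared features, one branch per sequence."""

    def __init__(self, in_dim, feat_dim, num_domains, rng):
        self.rng = rng
        # Shared layers are trained on all sequences.
        self.shared = rng.normal(0.0, 0.1, size=(feat_dim, in_dim))
        # One domain-specific branch per training sequence.
        self.branches = [rng.normal(0.0, 0.1, size=(2, feat_dim))
                         for _ in range(num_domains)]

    def features(self, x):
        return np.maximum(self.shared @ x, 0.0)  # ReLU

    def score(self, x, domain=0):
        """Return (background, target) scores from one branch."""
        return self.branches[domain] @ self.features(x)

    def reset_for_test(self):
        """Drop all training branches and start a single new one."""
        feat_dim = self.shared.shape[0]
        self.branches = [self.rng.normal(0.0, 0.1, size=(2, feat_dim))]
```

At test time only the new branch starts from scratch, while the shared weights carry over what was learned across the training sequences.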

Object tracking: MDNet Online update

MDNet is updated online at test time with hard negative mining, that is, selecting the negative samples with the highest positive score.
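Hard negative mining as described on the slide reduces to a top-k selection. The sketch below assumes the positive (target) scores of the negative samples have already been computed; the function name is hypothetical.

```python
import numpy as np

def mine_hard_negatives(positive_scores, k):
    """Return the indices of the k negative samples that the current
    model scores most strongly as targets (the hardest negatives)."""
    order = np.argsort(positive_scores)[::-1]  # highest score first
    return order[:k].tolist()
```

For example, with scores `[0.1, 0.9, 0.4, 0.7]` and `k=2`, the samples at indices 1 and 3 are selected for the next update.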

Object tracking: FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. Visual Tracking with Fully Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Focus on conv4-3 and conv5-3 of the VGG-16 network pre-trained for ImageNet image classification.

Figure panels: conv4-3, conv5-3

Object tracking: FCNT Specialization

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence.
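One way to picture discarding unrelated feature maps is the heuristic below. It is a stand-in for illustration, not the selection method of the FCNT paper: it assumes non-negative (post-ReLU) activations and ranks each map by the fraction of its activation that falls inside the target region. The function name and arguments are hypothetical.

```python
import numpy as np

def select_feature_maps(fmaps, fg_mask, k):
    """Rank feature maps by how much of their activation is foreground.

    fmaps: (C, H, W) non-negative activations.
    fg_mask: (H, W) binary mask of the target region.
    Returns the indices of the k most foreground-related maps.
    """
    energy_in = (fmaps * fg_mask).sum(axis=(1, 2))
    energy_all = fmaps.sum(axis=(1, 2)) + 1e-8  # avoid division by zero
    ratio = energy_in / energy_all
    return np.argsort(ratio)[::-1][:k].tolist()
```

Keeping only the top-ranked maps removes responses that fire mostly on the background.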

Object tracking: FCNT Localization

Although trained for image classification, feature maps in conv5-3 enable object localization… but they are not discriminative enough to separate different objects of the same category.

Object tracking: Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Object detectors emerge in deep scene CNNs. ICLR 2015. [Slides from ReadCV]

Object tracking: FCNT Localization

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation…

Figure panels: conv4-3, conv5-3

Object tracking: FCNT Architecture

SNet = Specific Network (online update)

GNet = General Network (fixed)
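The split between a fixed and an updated network can be sketched with two tiny heat-map predictors. This is a hypothetical simplification: each head is a single linear combination of feature maps trained with one gradient step on a squared error, and the class name `HeatmapHead` is invented for the example.

```python
import numpy as np

class HeatmapHead:
    """Linear heat-map predictor over C feature maps."""

    def __init__(self, num_channels, trainable):
        self.w = np.zeros(num_channels)
        self.trainable = trainable

    def predict(self, fmaps):
        # Weighted sum of the (C, H, W) maps gives an (H, W) heat map.
        return np.tensordot(self.w, fmaps, axes=1)

    def update(self, fmaps, target, lr=0.1):
        """One gradient step on 0.5 * mean((prediction - target)^2)."""
        if not self.trainable:
            return  # GNet-like head: stays fixed during tracking
        err = self.predict(fmaps) - target
        grad = np.tensordot(fmaps, err, axes=([1, 2], [0, 1])) / err.size
        self.w -= lr * grad
```

A head created with `trainable=False` plays the role of GNet, and one created with `trainable=True` plays the role of SNet, which adapts to the target as tracking proceeds.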

Object tracking: FCNT Results

ConvNets: Software

Caffe: http://caffe.berkeleyvision.org

Torch (Overfeat): http://torch.ch

Theano: http://deeplearning.net/software/theano

TensorFlow: https://www.tensorflow.org

MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet

CNTK (Microsoft): http://www.cntk.ai

ConvNets: Learn more

Seminar Series: Compacting ConvNets for End to End Learning

Jose M. Álvarez

Tuesday, February 2, 4pm

D5-010, Campus Nord

ConvNets: Learn more

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

ConvNets: Learn more

Online course: Deep Learning. Taking machine learning to the next level

ConvNets: Learn more

ReadCV seminar: Friendly reviews of SoA papers

Spring 2016, Tuesdays at 11am

ConvNets: Learn more

Barcelona Convolucionada: Deep Learning a l’abast de tothom (Deep Learning within everyone’s reach)

Monday, February 1, 7pm, FIB, Campus Nord UPC

Grup d’estudi de machine learning Barcelona (Barcelona machine learning study group)

ConvNets: Learn more

Summer course: Deep Learning for Computer Vision

(2.5 ECTS for MSc & PhD)

July 4-8, 3-7pm

ConvNets: Learn more

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, “Deep Learning: Past, Present & Future”

ConvNets: Learn more

“Machine learning” sub-Reddit

ConvNets: Learn more

Check profile requirements for Summer internship (disclaimer: offered to PhD students by default)

Company      Avg. salary/hour    Avg. salary/month
Yahoo        $43                 ($43 x 160 = $6,880)
Apple        $37                 ($37 x 160 = $5,920)
Google       $29.54-$31.32       $7,151
Facebook     $22.92              $6,150-$7,378
Microsoft    $22.63              $6,506-$7,171

Source: Glassdoor.com (internships in California. No stipends included)

ConvNets: Learn more

Video: Cristian Canton’s talk “From Catalonia to America: notes on how to achieve a successful post-PhD career”, ACMCV 2015 & UPC

Li Fei-Fei, “How we’re teaching computers to understand pictures”, TEDTalks 2014

ConvNets: Learn more

Jeremy Howard, “The wonderful and terrifying implications of computers that can learn”, TEDTalks 2014

ConvNets: Learn more

Neil Lawrence, “OpenAI won’t benefit humanity without open data sharing” (The Guardian, 14/12/2015)

ConvNets: Discussion

Is Computer Vision solved?

Sports: Do you know them?

ConvNets: Do you know them?

Antonio Torralba, MIT (former UPC)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

…and MANY MORE I am missing in the page (apologies)

ConvNets: Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution    Last deadlines (on 28/1/2016)
FI (Catalonia)         22/09/2015
FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web

Our past research

Image Classification

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., “Cultural Event Recognition with Visual ConvNets and Temporal Models”, in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

ChaLearn Workshop

Our current research

Saliency Prediction

J. Pan and Giró-i-Nieto, X., “End-to-end Convolutional Network for Saliency Prediction”, in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

LSUN Challenge

Our current research

Sentiment Analysis

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., “Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction”, in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015. [Slides]

Our current research

Instance Search in Video

V.-T. Nguyen, -Dinh-Le, D., Salvador, A., -Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S. ’ichi, and Giró-i-Nieto, X., “NII-HITACHI-UIT at TRECVID 2015 Instance Search”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S. ’ichi, O’Connor, N., and Smeaton, A. F., “Insight DCU at TRECVID 2015”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

Page 63: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

Optical Flow FlowNet

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 63

Since existing ground truth datasets are not sufficiently large to train a Convnet a synthetic Flying Dataset is generatedhellip and augmented (translation rotation scaling transformations additive Gaussian noise changes in brightness contrast gamma and color)

Convnets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI

Data augmentation

Dosovitskiy A Fischer P Ilg E Hausser P Hazirbas C Golkov V van der Smagt P Cremers D and Brox T 2015 FlowNet Learning Optical Flow With Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision (pp 2758-2766) 64

Optical Flow FlowNet

Object tracking MDNet

65Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Object tracking MDNet

66Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Object tracking MDNet Architecture

67Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Domain-specific layers are used during training for each sequence but are replaced by a single one at test time

Object tracking MDNet Online update

68Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

MDNet is updated online at test time with hard negative mining that is selecting negative samples with the highest positive score

Object tracking FCNT

69Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Object tracking FCNT

70Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Focus on conv4-3 and conv5-3 of VGG-16 network pre-trained for ImageNet image classification

conv4-3 conv5-3

Object tracking FCNT Specialization

71Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence

Object tracking FCNT Localization

72Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Although trained for image classification feature maps in conv5-3 enable object localizationhellipbut is not discriminative enough to different objects of the same category

Object tracking: Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object detectors emerge in deep scene CNNs." ICLR 2015. [Slides from ReadCV]

Object tracking: FCNT (Localization)

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation…

[Figure panels: conv4-3, conv5-3]

Object tracking: FCNT (Architecture)

SNet = Specific Network (online update)

GNet = General Network (fixed)

Object tracking: FCNT (Results)

ConvNets: Software

Caffe: http://caffe.berkeleyvision.org

Torch (Overfeat): http://torch.ch

Theano: http://deeplearning.net/software/theano

TensorFlow: https://www.tensorflow.org

MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet

CNTK (Microsoft): http://www.cntk.ai

ConvNets: Learn more

Seminar Series: "Compacting ConvNets for End to End Learning"

Jose M. Álvarez

Tuesday, February 2, 4pm

D5-010, Campus Nord

ConvNets: Learn more

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

ConvNets: Learn more

Online course: "Deep Learning. Taking machine learning to the next level"

ConvNets: Learn more

ReadCV seminar: friendly reviews of SoA papers

Spring 2016

Tuesdays at 11am

ConvNets: Learn more

Barcelona Convolucionada

"Deep Learning a l'abast de tothom" (Deep Learning within everyone's reach)

Monday, February 1, 7pm, FIB, Campus Nord UPC

Grup d'estudi de machine learning Barcelona

ConvNets: Learn more

Summer course: Deep Learning for Computer Vision

(2.5 ECTS for MSc & PhD)

July 4-8, 3-7pm

ConvNets: Learn more

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, "Deep Learning: Past, Present & Future"

ConvNets: Learn more

"Machine learning" sub-Reddit

ConvNets: Learn more

Check the profile requirements for summer internships (disclaimer: offered to PhD students by default).

Company      Avg. salary/hour    Avg. salary/month
Yahoo        $43                 ($43 x 160 = $6880)
Apple        $37                 ($37 x 160 = $5920)
Google       $29.54-$31.32       $7151
Facebook     $22.92              $6150-$7378
Microsoft    $22.63              $6506-$7171

Source: Glassdoor.com (internships in California; no stipends included)

ConvNets: Learn more

Video: Cristian Canton's talk "From Catalonia to America: notes on how to achieve a successful post-PhD career", ACMCV 2015 & UPC

Li Fei-Fei, "How we're teaching computers to understand pictures", TEDTalks 2014

ConvNets: Learn more

Jeremy Howard, "The wonderful and terrifying implications of computers that can learn", TEDTalks 2014

Neil Lawrence, "OpenAI won't benefit humanity without open data sharing" (The Guardian, 14/12/2015)

ConvNets: Discussion

Is Computer Vision solved?

Sports: Do you know them?

ConvNets: Do you know them?

Antonio Torralba, MIT (former UPC)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

…and MANY MORE I am missing in the page (apologies)

ConvNets: Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding is available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution    Last deadline (as of 28/1/2016)
FI (Catalonia)         22/09/2015
FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web

Our past research: Image Classification

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., "Cultural Event Recognition with Visual ConvNets and Temporal Models", in CVPR ChaLearn Looking at People Workshop, 2015. [slides]

ChaLearn Workshop

Our current research: Saliency Prediction

J. Pan and Giró-i-Nieto, X., "End-to-end Convolutional Network for Saliency Prediction", in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

LSUN Challenge

Our current research: Sentiment Analysis

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., "Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction", in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015. [Slides]

Our current research: Instance Search in Video

V.-T. Nguyen, Dinh-Le, D., Salvador, A., Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S.'ichi, and Giró-i-Nieto, X., "NII-HITACHI-UIT at TRECVID 2015 Instance Search", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S.'ichi, O'Connor, N., and Smeaton, A. F., "Insight DCU at TRECVID 2015", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

Page 67: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

Object tracking MDNet Architecture

67Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

Domain-specific layers are used during training for each sequence but are replaced by a single one at test time

Object tracking MDNet Online update

68Nam Hyeonseob and Bohyung Han Learning multi-domain convolutional neural networks for visual tracking ICCV VOT Workshop (2015)

MDNet is updated online at test time with hard negative mining that is selecting negative samples with the highest positive score

Object tracking FCNT

69Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Object tracking FCNT

70Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Focus on conv4-3 and conv5-3 of VGG-16 network pre-trained for ImageNet image classification

conv4-3 conv5-3

Object tracking FCNT Specialization

71Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence

Object tracking FCNT Localization

72Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

Although trained for image classification feature maps in conv5-3 enable object localizationhellipbut is not discriminative enough to different objects of the same category

Object tracking Localization

73Zhou Bolei Aditya Khosla Agata Lapedriza Aude Oliva and Antonio Torralba Object detectors emerge in deep scene cnns ICLR 2015

[Zhou et al ICLR 2015] ldquoObject detectors emerge in deep scene CNNsrdquo [Slides from ReadCV]

Object tracking FCNT Localization

74Wang Lijun Wanli Ouyang Xiaogang Wang and Huchuan Lu Visual Tracking with Fully Convolutional Networks In Proceedings of the IEEE International Conference on Computer Vision pp 3119-3127 2015 [code]

On the other hand feature maps from conv4-3 are more sensitive to intra-class appearance variationhellip

conv4-3 conv5-3

Object tracking FCNT Architecture

SNet=Specific Network (online update)

GNet=General Network (fixed)

Object tracking FCNT Results

ConvNets: Software

Caffe: http://caffe.berkeleyvision.org

Torch (Overfeat): http://torch.ch

Theano: http://deeplearning.net/software/theano

TensorFlow: https://www.tensorflow.org

MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet

CNTK (Microsoft): http://www.cntk.ai

ConvNets: Learn more

Seminar Series: Compacting ConvNets for End to End Learning

Jose M. Álvarez

Tuesday, February 2, 4pm

D5-010, Campus Nord

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

Online course: Deep Learning (Taking machine learning to the next level)

ReadCV seminar: Friendly reviews of SoA papers

Spring 2016, Tuesdays at 11am

Barcelona Convolucionada: Deep Learning a l'abast de tothom (Deep Learning within everyone's reach)

Monday, February 1, 7pm, FIB, Campus Nord, UPC

Grup d'estudi de machine learning Barcelona (Barcelona machine learning study group)

Summer course: Deep Learning for Computer Vision (2.5 ECTS for MSc & PhD)

July 4-8, 3-7pm

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, "Deep Learning: Past, Present & Future"

"Machine learning" sub-Reddit

Check profile requirements for summer internships (disclaimer: offered to PhD students by default)

Company      Avg. salary / hour    Avg. salary / month
Yahoo        $43                   ($43 x 160 = $6880)
Apple        $37                   ($37 x 160 = $5920)
Google       $29.54-$31.32         $7151
Facebook     $22.92                $6150-$7378
Microsoft    $22.63                $6506-$7171

Source: Glassdoor.com (internships in California; no stipends included)

Video: Cristian Canton's talk "From Catalonia to America: notes on how to achieve a successful post-PhD career", ACMCV 2015 & UPC

Li Fei-Fei, "How we're teaching computers to understand pictures", TEDTalks 2014

Jeremy Howard, "The wonderful and terrifying implications of computers that can learn", TEDTalks 2014

Neil Lawrence, "OpenAI won't benefit humanity without open data sharing" (The Guardian, 14/12/2015)

ConvNets: Discussion

Is Computer Vision solved?

Sports: Do you know them?

ConvNets: Do you know them?

Antonio Torralba, MIT (former UPC)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

and MANY MORE I am missing in the page (apologies)

ConvNets: Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding is available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution    Last deadline (as of 28/1/2016)
FI (Catalonia)         22/09/2015
FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web

Our past research

Image Classification

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., "Cultural Event Recognition with Visual ConvNets and Temporal Models", in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

ChaLearn Workshop

Our current research

Saliency Prediction

J. Pan and Giró-i-Nieto, X., "End-to-end Convolutional Network for Saliency Prediction", in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

LSUN Challenge

Our current research

Sentiment Analysis

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., "Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction", in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015. [Slides]

Our current research

Instance Search in Video

V.-T. Nguyen, Le, D.-D., Salvador, A., Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S., and Giró-i-Nieto, X., "NII-HITACHI-UIT at TRECVID 2015 Instance Search", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S., O'Connor, N., and Smeaton, A. F., "Insight DCU at TRECVID 2015", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

Object tracking MDNet Online update

Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015).

MDNet is updated online at test time with hard negative mining, that is, selecting the negative samples with the highest positive score.
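
The selection step described above can be sketched in a few lines: among the samples labelled negative, keep the ones the current classifier scores highest as the target. This is a minimal illustration, not the MDNet implementation. The function name `mine_hard_negatives`, the 0/1 label convention, and the example scores are all assumptions.

```python
from typing import List, Sequence


def mine_hard_negatives(
    scores: Sequence[float], labels: Sequence[int], k: int
) -> List[int]:
    """Return indices of the k negative samples with the highest positive score.

    scores[i] is the classifier's positive (target) score for sample i;
    labels[i] is 1 for a positive sample and 0 for a negative one.
    """
    negatives = [i for i, label in enumerate(labels) if label == 0]
    # Hardest negatives first: the ones the model most confuses with the target.
    negatives.sort(key=lambda i: scores[i], reverse=True)
    return negatives[:k]


scores = [0.95, 0.80, 0.10, 0.65, 0.30]
labels = [1, 0, 0, 0, 0]
print(mine_hard_negatives(scores, labels, k=2))  # [1, 3]
```

Only the selected indices would then be used as negatives in the online update, so each update concentrates on the distractors the model currently gets wrong.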

Page 72: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

Object tracking: FCNT, Localization

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. “Visual Tracking with Fully Convolutional Networks.” In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015. [code]

Although trained for image classification, feature maps in conv5-3 enable object localization… but they are not discriminative enough to tell apart different objects of the same category.

Object tracking: Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. “Object detectors emerge in deep scene CNNs.” ICLR 2015. [Slides from ReadCV]

Object tracking: FCNT, Localization

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation…

Figure panels: conv4-3, conv5-3

Object tracking: FCNT, Architecture

SNet = Specific Network (online update)

GNet = General Network (fixed)
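
The interplay between the two networks can be illustrated with a small sketch. It assumes each network outputs a 2-D foreground heat map, that the general map (GNet) is trusted by default, and that the specific map (SNet) takes over when the general map shows a lot of response away from its own peak, which suggests a same-category distractor. The function names, the square window and the threshold value are hypothetical choices made for illustration; they are not taken from the slides or from the authors' code.

```python
from typing import List, Tuple

HeatMap = List[List[float]]


def peak(heat_map: HeatMap) -> Tuple[int, int]:
    """Return (row, col) of the largest value in a 2-D heat map."""
    best = (0, 0)
    for r, row in enumerate(heat_map):
        for c, value in enumerate(row):
            if value > heat_map[best[0]][best[1]]:
                best = (r, c)
    return best


def outside_ratio(heat_map: HeatMap, center: Tuple[int, int], radius: int) -> float:
    """Ratio of heat-map mass outside a square window to the mass inside it."""
    inside = 0.0
    outside = 0.0
    for r, row in enumerate(heat_map):
        for c, value in enumerate(row):
            if abs(r - center[0]) <= radius and abs(c - center[1]) <= radius:
                inside += value
            else:
                outside += value
    return outside / inside if inside > 0 else float("inf")


def localize(gnet_map: HeatMap, snet_map: HeatMap,
             radius: int = 1, threshold: float = 0.5):
    """Pick the target position, falling back to SNet when GNet looks distracted.

    radius and threshold are illustrative values, not tuned parameters.
    """
    g_peak = peak(gnet_map)
    if outside_ratio(gnet_map, g_peak, radius) > threshold:
        # Strong response far from the GNet peak: assume a distractor is present.
        return peak(snet_map), "SNet"
    return g_peak, "GNet"


# Toy example: GNet fires on two objects of the same category,
# while SNet responds mostly to the tracked instance at (1, 5).
gnet_two_objects = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.0, 0.0, 0.0, 0.8, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
gnet_single_object = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.9, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
snet_map = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.0, 0.0, 0.9, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]

print(localize(gnet_two_objects, snet_map))
print(localize(gnet_single_object, snet_map))
```

In the first call the general map has two comparable peaks, so the sketch defers to the specific map; in the second call there is a single peak and the general map is used directly.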

Object tracking: FCNT, Results

ConvNets: Software

Caffe: http://caffe.berkeleyvision.org

Torch (Overfeat): http://torch.ch

Theano: http://deeplearning.net/software/theano

TensorFlow: https://www.tensorflow.org

MatConvNet (VLFeat): http://www.vlfeat.org/matconvnet

CNTK (Microsoft): http://www.cntk.ai

ConvNets: Learn more

Seminar Series: Compacting ConvNets for End to End Learning
Jose M. Álvarez
Tuesday, February 2, 4pm, D5-010, Campus Nord

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

Online course: Deep Learning, Taking machine learning to the next level

ReadCV seminar: Friendly reviews of SoA papers. Spring 2016, Tuesdays at 11am

Barcelona Convolucionada: Deep Learning a l’abast de tothom [Deep Learning within everyone’s reach]. Monday, February 1, 7pm, FIB, Campus Nord UPC

Grup d’estudi de machine learning Barcelona [Barcelona machine learning study group]

Summer course: Deep Learning for Computer Vision (2.5 ECTS for MSc & PhD). July 4-8, 3-7pm

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, “Deep Learning: Past, Present & Future”

“Machine learning” sub-Reddit

Check profile requirements for summer internships (disclaimer: offered to PhD students by default)

Company | Avg. salary per hour | Avg. salary per month
Yahoo | $43 | ($43 x 160 = $6,880)
Apple | $37 | ($37 x 160 = $5,920)
Google | $29.54-$31.32 | $7,151
Facebook | $22.92 | $6,150-$7,378
Microsoft | $22.63 | $6,506-$7,171

Source: Glassdoor.com (internships in California; no stipends included)
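
The monthly figures shown in parentheses for Yahoo and Apple are derived from the hourly rate using 160 working hours per month. A minimal sketch of that conversion follows; the function name is hypothetical, and the reading of 160 as 40 hours per week over 4 weeks is an assumption, since the slide only states the factor.

```python
HOURS_PER_MONTH = 160  # factor used on the slide; assumed here to mean 40 h/week x 4 weeks


def monthly_from_hourly(hourly_usd, hours=HOURS_PER_MONTH):
    """Estimate a monthly salary from an hourly rate."""
    return hourly_usd * hours


# Hourly rates for the two companies whose monthly figure is derived on the slide.
for company, hourly in [("Yahoo", 43), ("Apple", 37)]:
    print(company, monthly_from_hourly(hourly))
```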

Video: Cristian Canton’s talk “From Catalonia to America: notes on how to achieve a successful post-PhD career”, ACMCV 2015 & UPC

Li Fei-Fei, “How we’re teaching computers to understand pictures”, TEDTalks 2014

Jeremy Howard, “The wonderful and terrifying implications of computers that can learn”, TEDTalks 2014

Neil Lawrence, “OpenAI won’t benefit humanity without open data sharing” (The Guardian, 14/12/2015)

ConvNets: Discussion

Is Computer Vision solved?

Sports: Do you know them?

ConvNets: Do you know them?

Antonio Torralba, MIT (former UPC)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

...and MANY MORE I am missing on this page (apologies)

ConvNets: Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding is available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution: last deadlines (as of 28/1/2016)

FI (Catalonia): 22/09/2015

FPU (Spain): 15/01/2016

Check our activity at https://imatge.upc.edu/web

Our past research: Image Classification (ChaLearn Workshop)

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., “Cultural Event Recognition with Visual ConvNets and Temporal Models”, in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

Our current research: Saliency Prediction (LSUN Challenge)

J. Pan and Giró-i-Nieto, X., “End-to-end Convolutional Network for Saliency Prediction”, in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

Our current research: Sentiment Analysis

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., “Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction”, in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015. [Slides]

Our current research: Instance Search in Video

V.-T. Nguyen, Dinh-Le, D., Salvador, A., Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S. ’ichi, and Giró-i-Nieto, X., “NII-HITACHI-UIT at TRECVID 2015 Instance Search”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S. ’ichi, O’Connor, N., and Smeaton, A. F., “Insight DCU at TRECVID 2015”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu






Page 78: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

Seminar Series Compacting ConvNets

for End to End Learning

Tuesday February 2 4pm

D5-010 Campus Nord

ConvNets Learn more

78

Jose M Aacutelvarez

Stanford course CS231n: Convolutional Neural Networks for Visual Recognition

ConvNets: Learn more

79

ConvNets: Learn more

Online course

Deep Learning: Taking machine learning to the next level

80

ReadCV seminar: Friendly reviews of SoA papers

Spring 2016

Tuesdays at 11am

ConvNets: Learn more

81

Barcelona Convolucionada

Deep Learning a l'abast de tothom (Deep Learning within everyone's reach)

Monday, February 1, 7pm, FIB, Campus Nord UPC

ConvNets: Learn more

82

Grup d'estudi de machine learning Barcelona (Barcelona machine learning study group)

Summer course

Deep Learning for Computer Vision

(2.5 ECTS for MSc & PhD)

July 4-8, 3-7pm

ConvNets: Learn more

83

Deep learning methods for vision (CVPR 2012)

Tutorial on deep learning for vision (CVPR 2014)

Kyunghyun Cho, "Deep Learning: Past, Present & Future"

ConvNets: Learn more

84

ConvNets: Learn more

85

"Machine learning" subreddit

ConvNets: Learn more

86

ConvNets: Learn more

87

Check profile requirements for summer internships (disclaimer: offered to PhD students by default)

Company      Avg. salary/hour    Avg. salary/month
Yahoo        $43                 ($43 x 160 = $6,880)
Apple        $37                 ($37 x 160 = $5,920)
Google       $29.54-$31.32       $7,151
Facebook     $22.92              $6,150-$7,378
Microsoft    $22.63              $6,506-$7,171

Source: Glassdoor.com (internships in California. No stipends included)

ConvNets: Learn more

88

Video: Cristian Canton's talk "From Catalonia to America: notes on how to achieve a successful post-PhD career", ACMCV 2015 & UPC

Li Fei-Fei, "How we're teaching computers to understand pictures", TEDTalks 2014

ConvNets: Learn more

89

Jeremy Howard, "The wonderful and terrifying implications of computers that can learn", TEDTalks 2014

ConvNets: Learn more

90

ConvNets: Learn more

91

Neil Lawrence, "OpenAI won't benefit humanity without open data sharing" (The Guardian, 14/12/2015)

Is Computer Vision solved?

ConvNets: Discussion

92

Sports: Do you know them?

93

ConvNets: Do you know them?

94

Antonio Torralba, MIT (former UPC)

Oriol Vinyals, Google (former UPC)

Jose M. Álvarez, NICTA (former URL & UAB)

Joan Bruna, Berkeley (former UPC)

and MANY MORE I am missing in the page (apologies)

95

ConvNets: Where you are studying

VisioCat dinner, CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution    Last deadline (as of 28/1/2016)
FI (Catalonia)         22/09/2015
FPU (Spain)            15/01/2016

Check our activity at https://imatge.upc.edu/web

96

Image Classification

97

Our past research

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., "Cultural Event Recognition with Visual ConvNets and Temporal Models", in CVPR ChaLearn Looking at People Workshop 2015, 2015. [slides]

ChaLearn Workshop

Saliency Prediction

J. Pan and Giró-i-Nieto, X., "End-to-end Convolutional Network for Saliency Prediction", in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

98

Our current research

LSUN Challenge

Sentiment Analysis

99

Our current research

[Slides]

CNN

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., "Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction", in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015.

Our current research

Instance Search in Video

100

V.-T. Nguyen, Le, D.-D., Salvador, A., Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S., and Giró-i-Nieto, X., "NII-HITACHI-UIT at TRECVID 2015 Instance Search", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S., O'Connor, N., and Smeaton, A. F., "Insight DCU at TRECVID 2015", in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu

101

Page 86: Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)

ConvNets Learn more

86

ConvNets Learn more

87

Check profile requirements for Summer internship (disclaimer offered to Phd students by default)

Company Avg Salary hour Avg Salary month

Yahoo $43 ($43x160=$6880)

Apple $37 ($37x160=$5920)

Google $2954-$3132 $7151

Facebook $2292 $6150-$7378

Microsoft $2263 $6506-$7171

Source Glassdoorcom (internships in California No stipends included)

ConvNets Learn more

88

Video Cristian Cantonrsquos talk ldquoFrom Catalonia to America notes on how to achieve a successful post-Phd career rdquo ACMCV 2015 amp UPC

Li Fei-Fei ldquoHow wersquore teaching computers to understand picturesrdquo TEDTalks 2014

ConvNets Learn more

89

Jeremy Howard ldquoThe wonderful and terrifying implications of computers that can learnrdquo TEDTalks 2014

ConvNets Learn more

90

ConvNets Learn more

91

Neil Lawrence OpenAI wonrsquot benefit humanity without open data sharing (The Guardian 14122015)

Is Computer Vision solved

ConvNets Discussion

92

Sports Do you know them

93

ConvNets Do you know them

94

Antonio Torralba MIT(former UPC)

and MANY MORE I am missing in the page (apologies)

Oriol Vinyals Google(former UPC)

Jose M Aacutelvarez NICTA(former URL amp UAB)

Joan Bruna Berkeley(former UPC)

95

ConvNets Where you are studyingVisioCat dinner CVPR 2015

Considering a PhD at GPI-UPC? Currently no direct funding is available (check in the future). We can support your application to scholarships.

External grant listings: UPC, UPF

Funding institution | Last deadline (as of 28/1/2016)
FI (Catalonia) | 22/09/2015
FPU (Spain) | 15/01/2016

Check our activity at https://imatge.upc.edu/web

Our past research: Image Classification

A. Salvador, Zeppelzauer, M., Manchon-Vizuete, D., Calafell-Orós, A., and Giró-i-Nieto, X., “Cultural Event Recognition with Visual ConvNets and Temporal Models”, in CVPR ChaLearn Looking at People Workshop 2015, 2015. [Slides]

ChaLearn Workshop

Our current research: Saliency Prediction

J. Pan and Giró-i-Nieto, X., “End-to-end Convolutional Network for Saliency Prediction”, in Large-scale Scene Understanding Challenge (LSUN) at CVPR Workshops, Boston, MA (USA), 2015. [Slides]

LSUN Challenge

Our current research: Sentiment Analysis

V. Campos, Salvador, A., Jou, B., and Giró-i-Nieto, X., “Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction”, in 1st International Workshop on Affect and Sentiment in Multimedia, Brisbane, Australia, 2015. [Slides]

Our current research: Instance Search in Video

V.-T. Nguyen, Le, D.-D., Salvador, A., Zhu, C., Nguyen, D.-L., Tran, M.-T., Duc, T. Ngo, Duong, D. Anh, Satoh, S., and Giró-i-Nieto, X., “NII-HITACHI-UIT at TRECVID 2015 Instance Search”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

K. McGuinness, Mohedano, E., Salvador, A., Zhang, Z. X., Marsden, M., Wang, P., Jargalsaikhan, I., Antony, J., Giró-i-Nieto, X., Satoh, S., O’Connor, N., and Smeaton, A. F., “Insight DCU at TRECVID 2015”, in TRECVID 2015 Workshop, Gaithersburg, MD, USA, 2015.

Thank you

Slides available on and

https://imatge.upc.edu/web/people/xavier-giro

http://bitsearch.blogspot.com

https://twitter.com/DocXavi

https://www.facebook.com/ProfessorXavi

xavier.giro@upc.edu
