deep learning for team image understanding - ipal › ... › aura_deeplearning_poster.pdf ·...


Image & Pervasive Access Lab CNRS UMI 2955 - Singapore www.ipal.cnrs.fr

Deep Learning for Image Understanding

Olivier Morère1, Julie Petta2, Jie Lin3, Vijay Chandrasekhar3, Antoine Veillard1

1Université Pierre et Marie Curie, 2Supélec, 3A-Star Institute for Infocomm Research

Team Web & Data Science

Image Classification Video Summarization

Compact Image Representations for Image Similarity Search

[Figure: DeepHash pipeline.
Global feature extraction: the input image is passed through a deep convolutional neural network or a Fisher Vector, giving a high-dimensional image descriptor (8K-64K dim.).
Training phase 1 (unsupervised): stacked regularized RBMs with weights W1, W2, ..., WL.
Training phase 2 (fine-tuning): deep Siamese network with weights W1, W2, ..., WL and losses Loss1, ..., LossL, trained on matching and non-matching pairs.
Hashing (testing): the trained DeepHash model (weights W1, W2, ..., WL) is transferred to testing, where it maps an image descriptor to a compact binary hash of 64-1K bits.]

[Figure: video summaries for α = 1, 0 < α < 1 and α = 0, with the labels "More subject-centric" and "More scene-centric".]

[Figure: CNN architectures (recovered labels).
GoogLeNet [Szegedy et al., 2014].
Oxford VGG [Simonyan & Zisserman, 2014]: Input Image, conv-128, conv-256, conv-512, FC-4096, FC-4096, FC-1000, Softmax Loss.
AlexNet / Clarifai [Krizhevsky et al., 2012; Zeiler & Fergus, 2013]: Input Image, Softmax Loss.]

ImageNet 2014 Challenge

LIMITED RESOURCES
• NVIDIA GTX580 (1.5GB Memory)
• Two-Month Effort

OPTIMIZATION
• Multi-Crop Pooling
• Model Fusion

RESULTS

[Figure: multi-crop pooling and model fusion. Labels: Query Image, Multiple Crops, CNN Model 1, CNN Model 2, ..., CNN Model N, Pooling, Pooled Scores, Model Fusion, Fused Scores. Reported figure: 15.4%.]
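The two optimizations named above, multi-crop pooling and model fusion, can be sketched as follows. This is a minimal illustration and not the group's implementation: the poster does not say which pooling or fusion operator was used, so averaging softmax probabilities over crops and taking a weighted average over models are both assumptions, and the function names and toy logits are hypothetical.

```python
import math

def softmax(logits):
    """Softmax over one list of logits, shifted by the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pool_crops(crop_logits):
    """Average class probabilities over the crops of one image (one model).

    Averaging is an assumption; the poster only says "pooling".
    """
    probs = [softmax(row) for row in crop_logits]
    n = len(probs)
    return [sum(col) / n for col in zip(*probs)]

def fuse_models(pooled_scores, weights=None):
    """Weighted average of the pooled scores of several models (assumed fusion rule)."""
    if weights is None:
        weights = [1.0] * len(pooled_scores)
    total = sum(weights)
    return [
        sum(w * s for w, s in zip(weights, col)) / total
        for col in zip(*pooled_scores)
    ]

def top_k(scores, k=5):
    """Indices of the k highest-scoring classes, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Toy example: 2 hypothetical models, 2 crops each, 3 classes.
model_1 = [[2.0, 0.5, 0.1], [1.5, 0.7, 0.2]]
model_2 = [[0.3, 1.9, 0.4], [1.2, 1.1, 0.1]]
pooled = [pool_crops(model_1), pool_crops(model_2)]
fused = fuse_models(pooled)
print(top_k(fused, k=2))
```

Because each pooled vector is an average of probability distributions, the fused scores still sum to one, so models with different logit scales can be combined without extra normalization.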

Learning Multimodal Representations

Tunable Automatic Video Summaries

For each video, a compact and multimodal subject-scene subspace is learnt from high-dimensional CNN descriptors using novel unsupervised deep learning methods.

The multimodal representations are used to automatically generate compact summaries from videos. Subject-scene centricity can be tuned with a single parameter.
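To make the idea of a single tuning parameter concrete, here is a small sketch. The poster does not state how the parameter enters the method, so everything below is an assumption made for illustration: the parameter (called `alpha` here) blends a subject distance and a scene distance, `alpha = 1` is taken to mean subject-only, and keyframes are picked by greedy farthest-point selection. The feature values are made up.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def blended_distance(i, j, subject, scene, alpha):
    """Frame distance; alpha=1 uses only subject features, alpha=0 only scene (assumed)."""
    return (alpha * euclidean(subject[i], subject[j])
            + (1.0 - alpha) * euclidean(scene[i], scene[j]))

def summarize(subject, scene, alpha, n_keyframes):
    """Greedy farthest-point selection of keyframes under the blended distance."""
    selected = [0]  # start from the first frame
    while len(selected) < min(n_keyframes, len(subject)):
        best = max(
            (i for i in range(len(subject)) if i not in selected),
            key=lambda i: min(blended_distance(i, s, subject, scene, alpha)
                              for s in selected),
        )
        selected.append(best)
    return sorted(selected)

# Toy video: 4 frames with hypothetical 1-D latent features.
subject = [[0.0], [0.0], [5.0], [5.0]]  # subject changes at frame 2
scene = [[0.0], [9.0], [0.0], [1.0]]    # scene changes most at frame 1

print(summarize(subject, scene, alpha=1.0, n_keyframes=2))
print(summarize(subject, scene, alpha=0.0, n_keyframes=2))
```

With these toy features, the subject-only setting keeps the frame where the subject changes, while the scene-only setting keeps the frame where the scene changes; intermediate values of `alpha` trade the two off.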

DEEPHASH
• Binary descriptors (hash) from images
• Unsupervised and supervised deep learning pipelines
• Application to image similarity search

RESULTS
• Very compact binary descriptors in the 32-1024 bits range
• State-of-the-art retrieval results on many publicly available datasets
• Enabling similarity search from internet-scale databases
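The reason compact binary descriptors suit large-scale similarity search can be shown with a short sketch: codes are compared with the Hamming distance, which is a XOR followed by a bit count. This is not the DeepHash model. Random hyperplanes stand in for the learned hashing layers, and all names and sizes are hypothetical.

```python
import random

def make_projection(dim, n_bits, seed=0):
    """Random hyperplanes standing in for a learned hashing model (assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def binary_hash(descriptor, projection):
    """Pack one sign bit per hyperplane into a Python int."""
    code = 0
    for row in projection:
        dot = sum(w * x for w, x in zip(row, descriptor))
        code = (code << 1) | (1 if dot > 0 else 0)
    return code

def hamming(a, b):
    """Number of differing bits between two hash codes."""
    return bin(a ^ b).count("1")

def search(query_code, database_codes, k=3):
    """Indices of the k database codes closest to the query in Hamming distance."""
    order = sorted(range(len(database_codes)),
                   key=lambda i: hamming(query_code, database_codes[i]))
    return order[:k]

# Toy example: 64-bit codes for 100 random 32-dimensional descriptors.
rng = random.Random(1)
proj = make_projection(dim=32, n_bits=64)
db = [[rng.gauss(0.0, 1.0) for _ in range(32)] for _ in range(100)]
codes = [binary_hash(d, proj) for d in db]

# Query with a slightly perturbed copy of descriptor 7.
query = [x + 0.01 * rng.gauss(0.0, 1.0) for x in db[7]]
print(search(binary_hash(query, proj), codes, k=3))
```

A 64-bit code takes 8 bytes per image, which is why a database of this kind can stay in memory even at very large scale; the linear scan in `search` is the simplest possible lookup and would be replaced by an index in practice.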

Automatic image understanding with human-like accuracy is the new frontier of artificial intelligence research, and deep learning neural nets are front-running the race. While striving to reach and maintain state-of-the-art performance in large-scale image classification, the deep learning group at IPAL is also exploring how the deep image models can be used to push the limits in various other fields of application such as image compression, similarity-based image search and automatic video summarization. Feel free to approach us for demos!

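[Figure: multimodal representation learning. A subject DCNN and a scene DCNN produce a DCNN subject descriptor and a DCNN scene descriptor. Each descriptor feeds an RBM, giving a latent subject space and a latent scene space. The two RBMs carry the labels "Regularize with scene" and "Regularize with subjects".]

[CNN architecture figure labels: 16 layers, 138M parameters; 8 layers, 60M parameters. These figures match Oxford VGG and AlexNet respectively.]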
