Deep Learning for Data Mining
University of Rome "La Sapienza", Dept. of Computer, Control and Management Engineering A. Ruberti
Valsamis Ntouskos, ALCOR Lab
Outline
• Introduction - Motivation
• Theoretical aspects
• Convolutional Neural Networks
• Recurrent Neural Networks
• Generative models
• Further directions
Deep Learning with CNNs
Compositional Models
Learned End-to-End
Hierarchy of representations
- vision: pixel, motif, part, object
- text: character, word, clause, sentence
- speech: audio, band, phone, word
(features progress from concrete to abstract through learning)
Slides from Caffe framework tutorial @ CVPR2015
Deep Learning with CNNs
Compositional Models
Learned End-to-End
Back-propagation jointly learns
all of the model parameters to
optimize the output for the task.
Slides from Caffe framework tutorial @ CVPR2015
Motivation of CNNs
Inputs are usually treated as generic feature vectors
In some cases inputs have special structure:
• Audio
• Images
• Videos
Signals: numerical representations of physical quantities
Deep learning can be applied directly to signals by using suitable operators
Motivation
Audio
1D data - (variable-length) vectors of samples, e.g.:
. . . 0.0468 0.0468 0.0468 0.0390 0.0390 0.0390 0.0546 0.0625 0.0625 0.0390 0.0312 0.0468 0.0625 . . .
Motivation
Images
2D data - matrices
Video
A sequence of images sampled through time - 3D data
What is a CNN?
Some theory
Convolution
• Image filtering is based on convolution with special kernels
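For a 2D image I and kernel K, the discrete convolution reads (standard definition; note that most deep-learning frameworks actually implement the cross-correlation variant and still call it "convolution"):

    (I * K)(i, j) = \sum_m \sum_n I(i - m, j - n) \, K(m, n)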
Some theory
Pooling
• Introduces subsampling
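A minimal sketch of 2x2 max pooling in PyTorch (illustrative only; shapes follow the usual NCHW convention):

    import torch
    import torch.nn.functional as F

    x = torch.randn(1, 3, 8, 8)          # a batch of one 3-channel 8x8 "image"
    y = F.max_pool2d(x, kernel_size=2)   # keep the max of each 2x2 window
    print(y.shape)                       # torch.Size([1, 3, 4, 4]): subsampled by 2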
Some theory
Activation
Standard way to model a neuron:
f(x) = tanh(x) or f(x) = (1 + e^{-x})^{-1}
• saturating nonlinearities make training very slow
Non-saturating nonlinearity (ReLU): f(x) = max(0, x)
• quick to train
Some theory
Regularization
Dropout
• applied on the fully-connected layers
• during training, each node is dropped with probability α
• at test time all nodes are kept and their outputs are scaled by the retention probability 1 − α (see the sketch below)
Image from Srivastava et al., "Dropout: A Simple Way to Prevent Neural Networks from Overfitting"
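A minimal PyTorch sketch (illustrative; note that nn.Dropout implements "inverted" dropout, scaling kept activations by 1/(1 − α) during training so that no rescaling is needed at test time):

    import torch
    import torch.nn as nn

    drop = nn.Dropout(p=0.5)   # drop each activation with probability 0.5
    x = torch.ones(1, 8)

    drop.train()               # training mode: random units zeroed, the rest scaled by 2
    print(drop(x))             # e.g. tensor([[2., 0., 2., 2., 0., 0., 2., 2.]])

    drop.eval()                # evaluation mode: dropout becomes the identity
    print(drop(x))             # tensor([[1., 1., 1., 1., 1., 1., 1., 1.]])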
Some theory
Every convolutional layer of a CNN transforms the 3D input
volume to a 3D output volume of neuron activations.
(figure: a regular 3-layer neural network, shown for comparison)
Material from Fei-Fei’s group
Some theory
• Each neuron is connected to a local region of the input volume spatially, but to all channels in depth (see the sketch below)
• The neurons still compute a dot product of their weights with the input, followed by a non-linearity
Material from Fei-Fei’s group
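A small PyTorch sketch of this local connectivity (sizes are hypothetical, chosen for illustration):

    import torch
    import torch.nn as nn

    # 16 filters, each looking at a 5x5 spatial region across all 3 input channels
    conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=5, padding=2)

    x = torch.randn(1, 3, 32, 32)   # input volume: 3 x 32 x 32
    y = torch.relu(conv(x))         # dot product per local patch, then a non-linearity
    print(y.shape)                  # torch.Size([1, 16, 32, 32]): output volume 16 x 32 x 32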
Algorithms
• Each* neuron/layer is differentiable!
• Just apply backpropagation (chain rule)
• Use standard gradient-based optimization algorithms (SGD, AdaGrad, …); see the sketch below
• The devil lies in the details though …
▪ Choosing hyperparameters / loss function
▪ Exploding/vanishing gradients – batch normalization
▪ Overfitting – regularization
▪ Cost of performing experiments
▪ Convergence
▪ …
*what about max-pooling?
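A minimal supervised training loop in PyTorch (illustrative only; the model, data and hyperparameters are placeholders):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    x = torch.randn(64, 10)            # dummy batch of 64 samples
    y = torch.randint(0, 2, (64,))     # dummy labels

    for step in range(100):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)    # forward pass
        loss.backward()                # backpropagation (chain rule)
        optimizer.step()               # gradient-based update (here plain SGD)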
Kernels and Feature Maps
Material from Fei-Fei’s group
Case study: image classification
Slides from Caffe framework tutorial @ CVPR2015
Case study: image classification
Slides from Caffe framework tutorial @ CVPR2015
Brief history of CNNs
Foundational work was done in the middle of the 20th century:
• 1940s-1960s: Cybernetics [McCulloch and Pitts 1943,
Hebb 1949, Rosenblatt 1958]
• 1980s-mid 1990s: Connectionism [Rumelhart 1986,
Hinton 1989]
• 1990s: modern convolutional networks [LeCun et al. 1998], LSTM [Hochreiter & Schmidhuber 1997], MNIST and other large datasets
Brief history of CNNs
Hubel & Wiesel [1960s]: simple & complex cells architecture
Fukushima's Neocognitron [1970s]
Yann LeCun's early CNNs [1980s]
Brief history of CNNs
Convolutional Networks: 1989
LeNet: a layered model composed of convolution and subsampling operations, followed by a holistic representation and ultimately a classifier, applied to handwritten digits. [LeNet]
Recent success
• Parallel Computation (GPU)
• Larger training sets
• International Competitions
• Theoretical advancements
– Dropout
– ReLUs
– Batch Normalization
Recent success
Better hardware – GPUs
• CUDA (Jetson TX1, TK1)
• Android lib, demo
• OpenCL branch
Recent success
Larger training sets
ImageNet
• over 15M labeled high-resolution images
• roughly 22K categories
• collected from the web and labeled via Amazon Mechanical Turk
Recent success
Competitions
ILSVRC
• annual large-scale image classification competition
• 1.2M images in 1K categories
• classification: make 5 guesses about the image label
(example of fine-grained classes: Entlebucher vs. Appenzeller mountain dogs)
CNNs in Computer Vision
• Image classification
Evolution of CNNs for image classification
Convolutional Nets: 2012
AlexNet: a layered model composed of convolution, subsampling, and further operations, followed by a holistic representation and all-in-all a landmark classifier on ILSVRC12. [AlexNet]
Evolution of CNNs for image classification
Convolutional Nets: 2014
ILSVRC14 winners: ~6.6% top-5 error
- GoogLeNet: a composition of multi-scale, dimension-reduced ("Inception") modules
+ depth
+ data
+ dimensionality reduction
Evolution of CNNs for image classification
Convolutional Nets: 2014
ILSVRC14 winners: ~6.6% top-5 error
- VGG-16: 13 convolutional layers of 3x3 filters interleaved with max pooling, followed by 3 fully-connected layers (16 weight layers in total; see the block sketch below)
+ depth
+ data
+ dimensionality reduction
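A sketch of a single VGG-style block in PyTorch (channel sizes are hypothetical; the full network stacks several such blocks before the fully-connected layers):

    import torch
    import torch.nn as nn

    # two 3x3 convolutions followed by 2x2 max pooling, as in the early VGG blocks
    block = nn.Sequential(
        nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=2, stride=2),   # halves the spatial resolution
    )

    x = torch.randn(1, 64, 56, 56)
    print(block(x).shape)                        # torch.Size([1, 128, 28, 28])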
Evolution of CNNs for image classification
Convolutional Nets: 2015
ResNet
ILSVRC15 winner: ~3.6% top-5 error
Intuition: it is easier to push a residual toward zero than to learn an identity mapping from scratch (see the sketch below)
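A minimal residual block sketch in PyTorch (simplified: real ResNet blocks also include batch normalization):

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

        def forward(self, x):
            residual = self.conv2(torch.relu(self.conv1(x)))
            return torch.relu(x + residual)   # identity shortcut: layers learn only the residual

If the best update is "do nothing", the block only has to drive the residual to zero.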
Reasonable questions
• Is this just for a particular dataset? – No!
Slides from ICCV 2015 Math of Deep Learning tutorial
Reasonable questions
• Is this just for a particular task? – No!
Object localization [R-CNN, HyperColumns, OverFeat, etc.]
Pose estimation [Tompson et al., CVPR'15]
Slides from ICCV 2015 Math of Deep Learning tutorial
Reasonable questions
• Is this just for a particular task? – No!
Semantic segmentation [Pinheiro, Collobert, Dollár, ICCV'15]
Slides from ICCV 2015 Math of Deep Learning tutorial
Fine Tuning
Take a pre-trained model and fine-tune it to new tasks [DeCAF] [Zeiler-Fergus] [OverFeat] (recipe sketched below)
Example: the Kaggle "Dogs vs. Cats" competition – top 10 in 10 minutes (© kaggle.com)
(diagram: lots of data (ImageNet) → pre-trained network → fine-tuned to your task, e.g. style recognition)
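A typical fine-tuning recipe sketched in PyTorch (assumes a recent torchvision; the 2-class head is a placeholder for the new task):

    import torch.nn as nn
    from torchvision import models

    model = models.resnet18(weights="IMAGENET1K_V1")   # backbone pre-trained on ImageNet

    for param in model.parameters():
        param.requires_grad = False                    # freeze the pre-trained features

    model.fc = nn.Linear(model.fc.in_features, 2)      # new head, e.g. for dogs vs. cats

    # train as usual: only the new head's parameters receive gradient updates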
RNNs
Dealing with sequences
• data points are related
• order is important!
(C)NNs disregard this information
RNNs
Recurrent Neural Networks (RNNs) address this problem by introducing cells with loops
• allow information to persist
• the value of the output h_t depends both on the input x_t and on the previous output h_{t−1}
Material from Colah’s blog
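For reference, the vanilla RNN cell computes (standard formulation, with learned weights W_x, W_h and bias b):

    h_t = \tanh(W_x x_t + W_h h_{t-1} + b)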
RNNs
Another way to see RNNs is to unfold the loop
• each copy of the cell processes one element of the sequence
• the output h_t is computed based also on h_{t−1}
RNNs
RNN structure
Material from Colah’s blog
RNNs
RNNs (and variants) are very effective in various domains:
• Speech recognition
• Translation
• Image/Video captioning
• Video summarization
• Activity recognition
• ...
RNNs
a) "The clouds are in the _."
13/12/2018Deep Learning for Data Mining
ProblemRNNs cannot capture long-term dependencies• so-called vanishing/exploding gradients
problem
b) "I grew up in France... I speak fluently _."
LSTMs
Long Short-Term Memory networks (LSTMs)
• RNNs with a special cell structure
• designed to capture long-term dependencies
Hochreiter & Schmidhuber (1997) Long short-term memory
LSTMs
Main idea: separate the cell output h_t from the cell state C_t
• the state is modified through a series of structures called gates
LSTMs – step by step
1st gate: forget mechanism
• drop elements of C_{t−1} based on the values of h_{t−1} and the input x_t
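In the standard notation (as in Hochreiter & Schmidhuber and Colah's blog), the forget gate is a sigmoid layer:

    f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)

Each entry of f_t lies in (0, 1) and multiplies the corresponding entry of C_{t−1}: 0 means "forget", 1 means "keep".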
LSTMs – step by step
2nd and 3rd gate: update mechanism
i. decide which elements of C_{t−1} to update
ii. compute the candidate update vector C̃_t
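In the same notation, the input gate and the candidate values are:

    i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)
    \tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)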
LSTMs – step by step
2nd and 3rd gate: update mechanism
iii. update the elements selected via i_t with the values computed in C̃_t
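Combining the forget and update mechanisms, the new cell state is:

    C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t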
LSTMs – step by step
Last step: compute output
• based on C_t, h_{t−1} and x_t
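In the standard formulation:

    o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)
    h_t = o_t \odot \tanh(C_t)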
LSTM variants 1/3
Add peephole connections
All gate layers "see" the cell state C_{t−1}
Gers & Schmidhuber (2000) Recurrent nets that time and count
LSTM variants 2/3
Coupled input and forget gates:
• only forget something if you are going to input something else
LSTM variants 3/3
Gated Recurrent Units (GRUs)
• no separate cell state: the gates operate directly on the output h_t
• just two gates instead of three
Cho et al. (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation; Yao et al. (2015) Depth-gated recurrent neural networks
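For reference, the standard GRU equations (z_t is the update gate, r_t the reset gate):

    z_t = \sigma(W_z \cdot [h_{t-1}, x_t])
    r_t = \sigma(W_r \cdot [h_{t-1}, x_t])
    \tilde{h}_t = \tanh(W \cdot [r_t \odot h_{t-1}, x_t])
    h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t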
LSTMs
Further improvements:
• Further modifications of gates/structure
• Bi-directional LSTMs
• Attention mechanisms
• ...
LSTMs – typical use cases
One-to-one – classical NNs (no RNN)
One-to-many – sequence output (e.g. image captioning)
Many-to-one – sequence input (e.g. sentiment analysis)
Many-to-many – sequence to sequence (e.g. machine translation)
Many-to-many synced – e.g. frame per frame video analysis
LSTMs – use case examples
Visual sequence tasks [Jeff Donahue et al., CVPR'15]
• based on Long Short-Term Memory (LSTM) layers
Using LSTMs
Some videos:
- Facebook AI
- Music classification
- Action recognition (SecondHands project)
Beyond Regression/Classification
Generative models
• Variational Auto-Encoders (VAEs)
– focus on learning latent space structure
• Generative Adversarial Networks (GANs)
– focus on learning a distribution
– no latent space (in general)
GANs
Goal
Sample from the input data distribution 𝒳
Idea
Invert a (Convolutional) Neural Network
GANs
Encoder (conventional CNN)
• processes an image and produces a vector/code
GANs
Decoder
• receives a code and produces an image
• uses "deconvolutional" (transposed convolution) layers
Material from Frans’ blog
GANs
Problem
How to train the decoder to produce meaningful data?
Idea
Adversarial Training
GANs
A GAN is a combination of two networks:
1. a generator network (decoder)
2. a discriminator network (critic)
GANs
Roles
• Generator: produces realistic samples of 𝒳
• Discriminator: identifies if a sample actually comes from the (unknown) 𝒳 or not
Training
make the networks compete with each other
• the generator tries to fool the discriminator into believing that its samples are 'real'
• the discriminator tries to distinguish 'real' from 'fake' samples as well as possible
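Formally, this competition is the minimax game of Goodfellow et al. (2014):

    \min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]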
GANs
Example on CIFAR dataset (32x32 images)
Are these real images or generated?
GANs
Example on CIFAR dataset (32x32 images)
Results at 300, 900 and 5700 iterations
GANs
Example: Celebrity faces (1024x1024 images)
VAEs
Goal
• modify data in specific directions
• identify meaningful directions in latent space
Examples
- ‘distort’ faces (change expression, add glasses)
- produce digits from different hand-writing styles
- distort 3D meshes
- ...
VAEs
What is an auto-encoder?
- a combination of an encoder and a decoder
- trained based on reconstruction loss
- provides low-dimensional representation
Material from Kurita’s blog
VAEs
Goal
Feed vectors and get realistic samples of 𝒳
Problem with plain AEs: we don't know which codes correspond to valid samples
VAEs
Main idea:
Encoder produces a distribution instead of a vector
Decoder operates on samples from this distribution
VAEs
Problems
1. How to produce a distribution?
2. How to prevent degeneration?
Solutions
1. parametric distributions – typically Gaussian
– produce a mean μ and a variance Σ
2. add a loss term based on the Kullback-Leibler divergence
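Together these give the standard VAE objective (the negative evidence lower bound), with prior p(z) = N(0, I):

    \mathcal{L}(x) = -\mathbb{E}_{q(z|x)}[\log p(x|z)] + \mathrm{KL}\left(q(z|x) \,\|\, p(z)\right)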
VAEs
One more problem:
- Sampling operation is not differentiable
Solution:
- re-parametrization (sketched below)
Images from Keita Kurita’s blog
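The re-parametrization trick in a minimal PyTorch sketch (mu and log_var are assumed to be the encoder's outputs):

    import torch

    def reparametrize(mu, log_var):
        # sample z ~ N(mu, sigma^2) in a differentiable way
        std = torch.exp(0.5 * log_var)   # sigma = exp(log(sigma^2) / 2)
        eps = torch.randn_like(std)      # noise drawn outside the computation graph
        return mu + eps * std            # gradients flow through mu and std, not eps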
VAEs
Example: faces (Hou et al., 2016)
VAEs
Example: 3D Mesh deformation (Tan et al., 2018)
Further directions
Few-shot and zero-shot learning
• meta-learner architectures
• Prototypical networks
• examples: one-shot learning: Mullet; zero-shot learning: Zebra (from horse)
Learning on manifolds
• learning functions defined on surfaces
• learning functions defined on graphs
Learning with constraints
• Frank-Wolfe networks
Resources
Frameworks:
• TensorFlow (Google) | C/C++, Python, Java, Go
• Torch/PyTorch (Facebook) | Lua/Python
• Caffe/Caffe 2 (UC Berkeley) | C/C++, Python, Matlab
• Theano (U Montreal) | Python
• CNTK (Microsoft) | Python, C++, C#/.Net, Java
• MxNet (DMLC) | Python, C++, R, Perl, …
• Darknet (Redmon J.) | C
• …
Resources
High-level libraries:
• Keras | Backends: TensorFlow (TF), Theano
Models:
• Depends on the framework, e.g.
– https://github.com/BVLC/caffe/wiki/Model-Zoo (Caffe)
– https://github.com/tensorflow/models/tree/master/research (TF)
Interactive Interfaces:
• DIGITS (NVIDIA) | Caffe, TF, Torch
• TensorBoard (TF)
Tools:
• http://ethereon.github.io/netscope (for networks defined in protobuf)
Thank you!