Representation Learning and Deep Neural Networks
Davide Bacciu
Dipartimento di Informatica, Università di Pisa, [email protected]
Applied Brain Science - Computational Neuroscience (CNS)
Learning Representations in the Brain
Sensory information is represented by neural activity
- Response selectivity of individual neurons
- Distribution of activation in neural population
- Something we have seen so far
Neural representation is hierarchical
- Brain cortex processes information incrementally
- Increasing levels of abstraction
Representation Learning
How can we obtain articulated hierarchical representations of information in computational models?
Representation Learning - A Computational View
Learning the unknown causes underlying data
- Causes explain the data we observe
- Are also the basic building blocks which, when combined, generate the data we observe
Something we have already seen with RBM
Try explaining input data v using the unknown causes h. Learning is
- Generative, as causes can reconstruct the data
- Unsupervised, as interest is in representing information rather than, e.g., predicting a class
Representation Learning - A Classical View
Representation learning as density estimation: learn a probability distribution for the data v that uses latent variables h
Learning of a Gaussian Mixture Model
- Data likelihood P(v|h)
- Posterior P(h|v)
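As a concrete instance (standard GMM notation, not spelled out on the slide, with π_k, μ_k, Σ_k the mixing weights, means and covariances of the components), the posterior over the latent component follows from Bayes' rule:

P(h = k | v) = π_k N(v | μ_k, Σ_k) / Σ_j π_j N(v | μ_j, Σ_j)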
Several models in this classical view: PCA, ICA, factor analysis, clustering, ...
Representation Learning - A Modern View
Learn representations (a.k.a. causes or features) that are
- Hierarchical: representations at increasing levels of abstraction
- Distributed: information is encoded by a multiplicity of causes
- Shared among tasks
- Sparse: enforcing neural selectivity
- Characterized by simple dependencies: a simple (linear) combination of the representations should be sufficient to generate data
Deep models follow this modern view of representation learning
What is Deep Learning?
Machine learning algorithms inspired by brain organization, based on learning multiple levels of representation and abstraction
Learning models with many layers trained layer-wise
- Build a hierarchical feature space through layering
- Reduce the need for supervised information
- Unsupervised discovery of features in the internal layers
- Final layer performs the supervised step
Learning Features
The traditional shallow way
The deep way
Why Deep Learning?
No, Seriously... Why Deep Learning?
Deep learning is THE hot topic now: it revolutionized performance in
- Speech recognition
- Machine vision

Now expanding to other topics:
- Natural language
- Reinforcement learning
- Robotics
Its origins and inspiration can be traced back to brain/cortex organization
Hierarchical Representation - HMAX Model
Explaining the structure of the visual cortex in mammals
M. Riesenhuber, T. Poggio, Hierarchical Models of Object Recognition in Cortex. Nature Neuroscience 2:1019-1025, 1999
Sparse Representation - Neuroscience
Sparse coding theory in the brain: sensory information in the brain is represented by a relatively small number of simultaneously active neurons out of a large population (B.A. Olshausen, D.J. Field, 1996)
Sparse Representation - Machine Learning
The machine learning perspective: a single-layer network learns to generate a target output better if the input has a sparse representation (Willshaw and Dayan, 1990)
Deep Networks
Deep architecture with multiple layers focused on learning sparse encodings of input data
- Unsupervised training between layers to decompose the problem into distributed sub-problems with increasing levels of abstraction

Deep network types:
- Deep Generative Models
- Stacked Neural Autoencoders
- Deep convolutional neural networks
- Recurrent neural networks
- Hybrids of the above
Training Deep Networks
Problem with conventional backpropagation training:
- Strongly relies on labeled training data
- Learning does not scale well to multiple hidden layers

Greedy layer-wise training (see the sketch after this list):
- Unsupervised (internal) layer-by-layer representation learning (pre-training)
- Supervised training of the last layer (read-out) plus optional fine-tuning

Key advantages:
- Give full learning focus to each layer
- Exploit unlabeled data, using supervised training only for fine-tuning
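A minimal sketch of the greedy layer-wise scheme just described, assuming hypothetical helper routines train_layer and encode (e.g. fitting an RBM or an autoencoder for one layer); this is illustrative, not the slides' own code.

def greedy_layerwise_pretrain(X_unlabeled, layer_sizes, train_layer, encode):
    # X_unlabeled: unlabeled inputs; train_layer/encode: user-supplied routines
    # that fit one unsupervised layer and map data through it.
    layers = []
    H = X_unlabeled
    for size in layer_sizes:
        layer = train_layer(H, size)   # unsupervised pre-training of this layer
        layers.append(layer)
        H = encode(layer, H)           # representation passed to the next layer
    return layers

# A supervised read-out is then trained on the top-level representation H,
# optionally followed by end-to-end fine-tuning with backpropagation.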
Deep Generative Models
Intuition - Train a network where hidden units h, organized into a hierarchy, extract a good representation of the data from visible units x
Architecture - A network of stacked Restricted Boltzmann Machines (RBM) plus a supervised read-out layer for making predictions
Deep Belief Networks (DBN)
Pre-training of the individual RBMs by Contrastive Divergence; these are then unrolled to form the deep architecture, which is fine-tuned by error backpropagation
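A minimal numpy sketch of one Contrastive Divergence (CD-1) update for a single binary RBM; variable names and the learning rate are illustrative assumptions, not taken from the slides.

import numpy as np

def cd1_update(v0, W, b, c, lr=0.1, rng=np.random.default_rng(0)):
    # v0: batch of visible vectors (n, n_vis); W: weights (n_vis, n_hid);
    # b: visible biases (n_vis,); c: hidden biases (n_hid,)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

    # Positive phase: hidden probabilities and a binary sample given the data
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)

    # Negative phase: one Gibbs step (reconstruct visibles, then hiddens)
    pv1 = sigmoid(h0 @ W.T + b)
    ph1 = sigmoid(pv1 @ W + c)

    # Approximate log-likelihood gradient: data statistics minus model statistics
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    b += lr * (v0 - pv1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c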
Deep (Restricted) Boltzmann Machines
DBM are undirected generative models ⇒ To train a Deep RBM we need a pre-training trick
Sampling h1 by averaging the bottom-up and top-down contributions:

h1_j ∼ σ( Σ_i M_ij x_i + Σ_m M_jm h2_m )

For multiple layers:
- Apply the modified training to the first and last layer
- Halve the weights of the inner layers
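The sampling rule above, read literally as a small numpy sketch (the array names x, h2, M1, M2 are assumptions):

import numpy as np

def sample_h1(x, h2, M1, M2, rng=np.random.default_rng(0)):
    # Bottom-up input x through M1 (n_vis, n_h1) and top-down input h2 through
    # M2 (n_h1, n_h2), matching h1_j ~ sigma(sum_i M_ij x_i + sum_m M_jm h2_m)
    p = 1.0 / (1.0 + np.exp(-(x @ M1 + h2 @ M2.T)))
    return (rng.random(p.shape) < p).astype(float)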
Neural Autoencoder Networks
Non-accretive associative memories for feature discovery and data compression
E.g. linear hidden units with MSE perform PCA (guess why?)
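A toy numpy sketch of a linear autoencoder with tied weights trained on MSE (names and hyperparameters are illustrative); with linear hidden units this loss is minimized by the subspace spanned by the top principal components, which is the PCA connection hinted at above.

import numpy as np

def train_linear_autoencoder(X, n_hidden, lr=0.01, epochs=200, seed=0):
    # X: (n, d) data matrix, assumed centred; tied weights W: (d, n_hidden)
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(X.shape[1], n_hidden))
    for _ in range(epochs):
        H = X @ W                      # encode
        X_rec = H @ W.T                # decode with the same (tied) weights
        err = X_rec - X                # reconstruction error
        grad = (X.T @ (err @ W) + err.T @ (X @ W)) / len(X)   # gradient of MSE w.r.t. W
        W -= lr * grad
    return W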
Sparse Autoencoders
Autoencoders using more features, with a significant number being 0 when encoding an input
Using regularization approaches to enforce sparsity
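One common way to realize this (an assumed mechanism, not necessarily the slides' specific choice) is to add an L1 penalty on the hidden activations to the reconstruction loss:

import numpy as np

def sparse_autoencoder_loss(x, x_rec, h, sparsity_weight=1e-3):
    # Reconstruction MSE plus an L1 term pushing hidden activations h toward 0
    recon = np.mean((x_rec - x) ** 2)
    sparsity = sparsity_weight * np.mean(np.abs(h))
    return recon + sparsity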
Stacked Neural Autoencoders
Stack autoencoders with level-wise training as with DBM
Train each autoencoder in isolation and drop the decoding layer when training is completed
Convolutional Neural Networks (CNNs) - LeNet
Very much inspired by the visual processing pipeline in the brain
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition,Proceedings of the IEEE 86(11): 2278-2324, 1998
Strategies to Control Model Complexity (Regularization)
Weight Sharing
- Exploit symmetry or stationarity assumptions to reduce the number of parameters
- Convolutional NN, recurrent and recursive NN
Dropout
- Randomly disconnect hidden neurons during training
- Need to scale hidden weights at test time
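A small numpy sketch of dropout as described above, with hidden activations rescaled by the keep probability at test time (the keep probability value is illustrative):

import numpy as np

def dropout(h, keep_prob=0.5, train=True, rng=np.random.default_rng(0)):
    if train:
        mask = (rng.random(h.shape) < keep_prob).astype(h.dtype)
        return h * mask        # randomly disconnect a subset of hidden units
    return h * keep_prob       # scale at test time so expectations match training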
Stochastic Gradient Descent
Mini-batches
- Compute the stochastic gradient on small batches of data rather than on a single sample
- Stabler and quicker learning; facilitates GPU computing as a byproduct
Momentum Method
Δw(t+1) = α Δw(t) − ε ∂E/∂w(t+1)

- Changes the velocity of the weight particle instead of the position
- May quicken convergence due to avoiding oscillations
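The momentum update above written out as a small Python sketch (hyperparameter values illustrative):

def sgd_momentum_step(w, velocity, grad, lr=0.01, momentum=0.9):
    # Classical momentum: update the velocity of the weight "particle", then move.
    # Matches Δw(t+1) = α Δw(t) − ε ∂E/∂w with α = momentum, ε = lr
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity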
Document Encoding - Deep Belief Network
Finding sparse features for > 800K newswire stories
DBN document encoding vs. Latent Semantic Analysis (figure)
Hinton, Salakhutdinov, Reducing the dimensionality of data with neural networks, Science, 2006
DeepFace: a.k.a. beating humans at recognizing faces
DeepFace recognition accuracy is ≈ 97.35%
- 23% better than previous results
- CNN with more than 120 million parameters
Taigman et al, Deepface: Closing the gap to human-level performance in faceverification, CVPR 2014
Learning to Play Atari (I)
Integrating CNN with Reinforcement Learning
Mnih et al, Human-level control through deep reinforcement learning, Nature 2015
Learning to Play Atari (II)
Deep Learning and Robotics
Learning where to grasp objects with robotic manipulators
- Deep auto-encoder network
- Predicts which grasping box works better
Lenz et al, Deep Learning for Detecting Robotic Grasps, IJRR 2014
Deep Learning APIs
Caffe - DL framework for computer vision: http://demo.caffe.berkeleyvision.org/
Theano - Python library with GPU support: http://deeplearning.net/software/theano/
Torch - Scientific computing environment: http://torch.ch/
MatConvNet - CNN in Matlab with GPU support: http://www.vlfeat.org/matconvnet/
CNTK - Microsoft NN and DL toolkit: http://www.cntk.ai/
Tensorflow - Google NN and DL library: https://www.tensorflow.org/
Make sure you have a good GPU!
Want to try something easy out?
Google Tensorflow Playground: http://playground.tensorflow.org
Language learning and generation: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Caffe image classification demo: http://demo.caffe.berkeleyvision.org/
Nvidia Digits deep learning GUI: https://developer.nvidia.com/digits
Where is DL heading?
Image and video understanding by integration with natural language descriptions
- Bringing in biological models of attention

Natural language understanding and generation
- Emotion understanding
- Moving from sequential representation of text to parse trees

Will soon impact heavily in robotics
- Sensorimotor learning
- Policy learning: deep reinforcement learning
- Cloud robotics: deep-learned knowledge available to connected devices
- Onboard GPU computing
- Self-driving cars, drones, connected devices
Take Home Messages
Deep learning is about learning features of complex data
- Features ≡ hidden stochastic neurons
- Strongly rooted in hierarchical information processing in the brain
Stacking and level-wise training
- Let the deep network discover the features and then place your preferred learning model on top to perform your task
Breakthrough performance in several learning tasks/application areas
- Complex spatio-temporal structured data
Do we really know what is going on with the encoding? How many layers?