Scene classification using convolutional neural networks - Jayani Withanawasam
Outline
• Computer vision as an AI problem
• Importance of scene classification and its challenges
• Traditional machine learning vs. deep learning
• Convolutional Neural Networks (CNNs)
• Using Caffe for implementing CNNs
• Important resources to proceed with…
Why should we understand visual data?
• Billions of views are generated on YouTube on a daily basis
• Facebook receives hundreds of millions of photo uploads per day
Can humans manually process such large volumes of data generated at this rate to instantly find
useful insights?
Computer vision as an AI problem
• Intelligent behavior of an agent requires the ability to effectively interact with and manipulate its environment
• Detailed understanding of the external environment is achieved using visual perception
• Computer vision provides methods to analyze images in order to understand objects and scenes
Using the forest to see the trees! (Torralba et al.)
Source: Using the forest to see the trees: exploiting context for visual object recognition and localization, Torralba et al.
Scene classification in computer vision
• Main focus areas in computer vision
  – Computer graphics
  – Image recognition
• Image recognition is based on concepts related to artificial intelligence and cognitive science
• Scene classification falls under image recognition
• The scene classification problem differs from the object recognition problem because a scene (context) is composed of multiple objects
In 1966, Marvin Minsky at MIT asked his undergraduate student Gerald Jay Sussman to
spend the summer linking a camera to a computer and getting the computer to describe
what it saw. We now know the problem is slightly more difficult than that ;)
Szeliski 2009, Computer vision
Challenges of scene classification
Source: Learning deep features for scene recognition using Places database, Zhou et al.
Scene classification: then and now
Then: labeling segmentations of the scene (part-based models)
Now: analyzing the entire scene as a whole and training on the large volumes of data available
Deep Learning
• Traditional machine learning algorithms
  – Do not perform well in high-dimensional spaces
  – Require expert knowledge to hand-engineer features
  – High computational cost
• Deep learning algorithms
  – Specialized form of artificial neural network
  – Representation learning for high-dimensional data
  – Use GPUs to accelerate learning
Inspired by nature…
Source: Hubel and Wiesel experiment
• Local receptive fields
• Simple cells
• Complex cells
Convolutional Neural Networks (CNNs)
• Deep learning technique to recognize spatial patterns in data
• Hierarchical organization of different abstraction levels of image features
• Type of Artificial Neural Network (ANN)
Assumption: You are familiar with basic Artificial Neural Networks (ANNs) and machine learning concepts
Historical CNN architectures
Source: Gradient-based learning applied to document recognition, LeCun et al., 1998
Source: ImageNet classification with deep convolutional neural networks, Krizhevsky et al., 2012
CNN architecture
• Convolution layers
• Sub-sampling (pooling) layers
• Non-linearity layers (activation function)
• Fully connected (FC) layer (optional)
Source: https://adeshpande3.github.io/adeshpande3.github.io/A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks/
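The four layer types listed above can be sketched as a single forward pass. This is a minimal illustration in plain Python, assuming one single-channel input and one filter; the function names, the toy image, and the weights are all hypothetical and are not taken from the slides or from Caffe.

```python
def conv2d(image, kernel, stride=1):
    """Convolution layer: slide one filter over one channel (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [
        [
            sum(image[i * stride + a][j * stride + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Non-linearity layer: clamp negative activations to zero."""
    return [[max(v, 0.0) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Sub-sampling layer: keep the maximum of each non-overlapping size x size block."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [
        [
            max(fmap[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def fully_connected(fmap, weights, biases):
    """FC layer: flatten the feature map and compute one score per class."""
    flat = [v for row in fmap for v in row]
    return [sum(w * v for w, v in zip(row, flat)) + b
            for row, b in zip(weights, biases)]


# Toy 6x6 image with a vertical edge: left half dark (0), right half bright (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hypothetical 3x3 filter that responds to dark-to-bright vertical edges.
kernel = [[-1, 0, 1] for _ in range(3)]

pooled = max_pool(relu(conv2d(image, kernel)))  # 6x6 -> 4x4 -> 2x2
scores = fully_connected(pooled, [[0.25] * 4, [-0.25] * 4], [0.0, 0.0])
```

In a trained network the filter and the FC weights are learned from data rather than written by hand, and each convolution layer holds many filters over many input channels.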
Important hyperparameters for CNNs
• Number of filters (kernels)
• Stride
• Size of the filter
• Amount of padding
• Other (not CNN specific)
  – Learning rate (and its decay)
  – Batch size
  – Momentum
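How the filter size, stride, and padding interact can be made concrete with the standard output-size relation, output = (input - filter + 2 × padding) / stride + 1. The sketch below assumes square inputs and filters; the helper names are hypothetical, and raising an error on a non-integer result is a choice made here for clarity (frameworks often round down instead).

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial size (width or height) of a convolution layer's output."""
    span = input_size - filter_size + 2 * padding
    if span < 0 or span % stride != 0:
        raise ValueError("filter does not tile the padded input with this stride")
    return span // stride + 1


def conv_param_count(num_filters, filter_size, in_channels):
    """Learnable parameters: one weight per filter cell per input channel, plus one bias per filter."""
    return num_filters * (filter_size * filter_size * in_channels + 1)


# A 32x32 input with a 5x5 filter, stride 1, and no padding shrinks to 28x28.
no_padding = conv_output_size(32, 5)
# Padding of 2 on each side keeps the output at the input size.
same_size = conv_output_size(32, 5, padding=2)
```

The number of filters does not change the spatial size; it sets the depth of the output volume and, through `conv_param_count`, the number of weights to learn.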
Caffe for CNN implementation
• Convolutional Architecture for Fast Feature Embedding
• Deep learning framework by the Berkeley Vision and Learning Center http://caffe.berkeleyvision.org/
• Reference models in the Caffe Model Zoo
• Input (e.g., LMDB)
• Net: layers (data, loss, convolution), e.g., lenet_train.prototxt
• Solver (learning rate, net, model snapshots, validation), e.g., lenet_solver.prototxt
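To give a feel for what a solver definition holds, the sketch below renders the solver settings named above (net, learning rate, snapshots, validation) as protobuf-style text. The field names follow Caffe's solver parameters as far as I know them, but every value here is illustrative rather than taken from the slides, and `render_solver` is a hypothetical helper, not part of Caffe. Check the result against the Caffe documentation before using it.

```python
# Fields whose values are protobuf enums and therefore written without quotes.
ENUM_FIELDS = {"solver_mode"}


def render_solver(settings):
    """Render a dict of solver settings as 'key: value' lines in protobuf text style."""
    lines = []
    for key, value in settings.items():
        if isinstance(value, str) and key not in ENUM_FIELDS:
            value = f'"{value}"'  # string fields are quoted
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


solver_text = render_solver({
    "net": "lenet_train.prototxt",   # network definition named on the slide
    "test_iter": 100,                # validation: batches per test pass (illustrative)
    "test_interval": 500,            # validation: run a test pass every N iterations
    "base_lr": 0.01,                 # initial learning rate (illustrative)
    "momentum": 0.9,
    "lr_policy": "step",             # learning rate decay schedule
    "gamma": 0.1,                    # multiply the rate by gamma ...
    "stepsize": 5000,                # ... every stepsize iterations
    "max_iter": 10000,
    "snapshot": 5000,                # save a model snapshot every N iterations
    "snapshot_prefix": "lenet",      # hypothetical output prefix
    "solver_mode": "GPU",
})
```

Writing `solver_text` to a file such as lenet_solver.prototxt would give a starting point that can be edited by hand; Caffe itself only needs the text file, not this helper.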
MIT Places for scene recognition
• MIT Places database
• Places2 Challenge
• MIT Scene Recognition Demo
• http://places.csail.mit.edu
Important resources
• CS231n: Convolutional Neural Networks for Visual Recognition, Fei-Fei Li, Andrej Karpathy, Justin Johnson, Stanford University. http://cs231n.stanford.edu/
• Deep Learning book, Ian Goodfellow, Yoshua Bengio, Aaron Courville. http://www.deeplearningbook.org/