scene classification using convolutional neural networks - jayani withanawasam

Scene Classification using Convolutional Neural Networks

Jayani Withanawasam

Outline

•  Computer vision as an AI problem
•  Importance of scene classification and its challenges
•  Traditional machine learning vs. deep learning
•  Convolutional Neural Networks (CNNs)
•  Using Caffe to implement CNNs
•  Important resources for further study


Is this exercise familiar to you?

Scene understanding is a primary school task!

What do you see?

Photo credits: Kaushalya Madhawa

What do computers see?

Source: http://www.cs.washington.edu/research/metip/about/digital.html
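The slide's point can be made concrete in a few lines: to a computer, a grayscale image is just a grid of pixel intensities, commonly 8-bit values from 0 (black) to 255 (white), and a color image adds one such grid per channel (red, green, blue). The 4×4 "image" below is a made-up example, not data from the source.

```python
# A hypothetical 4x4 grayscale image: each entry is an 8-bit pixel
# intensity (0 = black, 255 = white). This grid is all a computer "sees".
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]

height = len(image)
width = len(image[0])
pixels = [p for row in image for p in row]

print(f"size: {height}x{width}")
print(f"min={min(pixels)}, max={max(pixels)}")

# A color image adds a channel axis (height x width x 3); one pure-red pixel:
red_pixel = (255, 0, 0)
```

Nothing in these numbers says "dark left half, bright right half"; recovering such structure from raw values is exactly the job of the methods in the rest of the talk.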

Why should we understand visual data?

•  Billions of views are generated on YouTube on a daily basis

•  On Facebook, hundreds of millions of photos are uploaded per day

Can humans manually process such large volumes of data, generated at this rate, and instantly find useful insights?


Computer vision as an AI problem

•  Intelligent behavior requires an agent to interact effectively with, and manipulate, its environment

•  A detailed understanding of the external environment is achieved through visual perception

•  Computer vision provides methods to analyze images to understand objects and scenes


Using the forest to see the trees! (Torralba et al.)


Source: Using the forest to see the trees: exploiting context for visual object recognition and localization, Torralba et al.

Scene classification in computer vision

•  Main focus areas in computer vision:
   – Computer graphics
   – Image recognition

•  Image recognition draws on concepts from artificial intelligence and cognitive science

•  Scene classification falls under image recognition

•  The scene classification problem differs from the object recognition problem because a scene (context) is composed of multiple objects


Scene classification in computer vision (continued)


Source: Srinivasa Narasimhan’s slide

In 1966, Marvin Minsky at MIT asked his undergraduate student Gerald Jay Sussman to spend the summer linking a camera to a computer and getting the computer to describe what it saw. We now know the problem is slightly more difficult than that ;)

Source: Szeliski 2009, Computer Vision: Algorithms and Applications


Challenges of scene classification


Source: Learning deep features for scene recognition using Places database, Zhou et al.

Scene classification: then and now

Then: labeling segmentations of the scene (part-based models)

Now: analyzing the entire scene as a whole, trained on the large volumes of available data


Deep Learning

•  Traditional machine learning algorithms:
   – Do not perform well in high-dimensional spaces
   – Require expert knowledge to hand-engineer features
   – High computational cost

•  Deep learning algorithms:
   – Specialized form of artificial neural network
   – Representation learning for high-dimensional data
   – Use of GPUs to accelerate learning

Inspired by nature…

Source: Hubel and Wiesel experiment

•  Local receptive fields
•  Simple cells
•  Complex cells
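A rough computational analogy, not a model of the biology: a "simple cell" can be treated as a small oriented filter applied to its local receptive field, and a "complex cell" as taking the maximum over several neighboring simple cells, so it keeps firing when the edge shifts slightly. This is the intuition commonly cited behind convolution and pooling in CNNs. The 1-D signals and filter weights below are made up for illustration.

```python
# Hypothetical 1-D "retina" signals: the same dark-to-bright edge at two positions.
edge_at_2 = [0, 0, 1, 1, 1, 1]
edge_at_3 = [0, 0, 0, 1, 1, 1]

# A "simple cell" tuned to a dark-to-bright transition inside a
# local receptive field of 2 inputs (illustrative weights -1, +1).
receptive_field = [-1, 1]


def simple_cells(signal, weights):
    """Apply the same weights to every local receptive field (a 1-D convolution)."""
    k = len(weights)
    return [
        sum(w * x for w, x in zip(weights, signal[i:i + k]))
        for i in range(len(signal) - k + 1)
    ]


def complex_cell(responses):
    """Pool over neighboring simple cells: respond if any of them responds."""
    return max(responses)


for name, signal in [("edge_at_2", edge_at_2), ("edge_at_3", edge_at_3)]:
    s = simple_cells(signal, receptive_field)
    print(name, "simple:", s, "complex:", complex_cell(s))
```

Here the simple-cell peak moves from index 1 to index 2 when the edge shifts by one position, while the complex-cell output stays 1 in both cases: a small amount of position invariance.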

Convolutional Neural Networks (CNNs)

•  Deep learning technique for recognizing spatial patterns in data

•  Hierarchical organization of image features at different levels of abstraction

•  Type of artificial neural network (ANN)

Assumption: you are familiar with basic artificial neural networks (ANNs) and machine learning concepts


Historical CNN architectures


Source: Gradient-based learning applied to document recognition, LeCun et al., 1998

Source: ImageNet classification with deep convolutional neural networks, Krizhevsky et al., 2012

CNN architecture


•  Convolution layers
•  Sub-sampling (pooling) layers
•  Non-linearity layers (activation function)
•  Fully connected (FC) layer (optional)

Source: https://adeshpande3.github.io/adeshpande3.github.io/A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks/
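A minimal, dependency-free sketch of how the four layer types transform data, assuming a single-channel input, one 2×2 filter with stride 1 and no padding, ReLU as the non-linearity, and 2×2 max pooling with stride 2. The input, filter, and fully connected weights are made-up values; real frameworks such as Caffe implement these layers with optimized, multi-channel code. As in most deep learning frameworks, the "convolution" here is really cross-correlation (the kernel is not flipped).

```python
def conv2d(image, kernel, stride=1):
    """Convolution layer: valid (unpadded) 2-D cross-correlation, one channel, one filter."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [
        [
            sum(kernel[a][b] * image[i * stride + a][j * stride + b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Non-linearity layer: max(0, x) applied element-wise."""
    return [[max(0, x) for x in row] for row in fmap]


def max_pool(fmap, size=2, stride=2):
    """Sub-sampling (pooling) layer: keep the maximum of each size x size window."""
    out_h = (len(fmap) - size) // stride + 1
    out_w = (len(fmap[0]) - size) // stride + 1
    return [
        [
            max(fmap[i * stride + a][j * stride + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def fully_connected(fmap, weights, bias):
    """FC layer: flatten the feature map, then one weighted sum plus bias per output."""
    flat = [x for row in fmap for x in row]
    return [sum(w * x for w, x in zip(w_row, flat)) + b
            for w_row, b in zip(weights, bias)]


# Hypothetical 5x5 input containing a vertical dark-to-bright edge.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1],
          [-1, 1]]  # responds to vertical dark-to-bright edges

fmap = relu(conv2d(image, kernel))  # 4x4 feature map
pooled = max_pool(fmap)             # 2x2 after pooling
scores = fully_connected(pooled,
                         weights=[[1, 1, 1, 1], [-1, -1, -1, -1]],  # 2 hypothetical classes
                         bias=[0, 0])
print(fmap, pooled, scores)
```

With this input the feature map is 4×4 with a column of 2s where the edge is, pooling reduces it to [[2, 0], [2, 0]], and the hypothetical FC weights give scores [4, -4].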

Important hyperparameters for CNNs

•  Number of filters (kernels)
•  Stride
•  Size of the filter
•  Amount of padding
•  Other (not CNN-specific):
   – Learning rate (and its decay)
   – Batch size
   – Momentum
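The filter size, stride, padding and number of filters together fix each convolutional layer's output size and parameter count. Using the standard relation (as presented, for example, in CS231n), with input width W, filter size F, padding P, stride S, K filters and C input channels (the +1 counts one bias per filter), and illustrative values W = 28, F = 5, P = 0, S = 1, K = 20, C = 1:

```latex
\begin{aligned}
O &= \frac{W - F + 2P}{S} + 1, & N_{\text{params}} &= K\,(F^{2} C + 1) \\
O &= \frac{28 - 5 + 2 \cdot 0}{1} + 1 = 24, & N_{\text{params}} &= 20\,(5^{2} \cdot 1 + 1) = 520
\end{aligned}
```

This layer would therefore output a 24 × 24 × 20 volume. If W - F + 2P is not divisible by S, the settings do not tile the input evenly and the framework has to round the result.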


Caffe for CNN implementation

•  Convolutional Architecture for Fast Feature Embedding
•  Deep learning framework by the Berkeley Vision and Learning Center: http://caffe.berkeleyvision.org/

•  Reference models in the Caffe Model Zoo
•  Input (e.g., LMDB)
•  Net: layers (data, loss, convolution), e.g., lenet_train.prototxt
•  Solver (learning rate, net, model snapshots, validation), e.g., lenet_solver.prototxt
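In Caffe's LeNet example the loss layer is of type `SoftmaxWithLoss`, which turns the last layer's class scores into probabilities with a softmax and then takes the negative log-probability of the correct label. A plain-Python sketch of that computation for a single example; the three scores are made-up values, not output from a trained net.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Multinomial logistic loss: negative log-probability of the correct class."""
    return -math.log(probs[label])


scores = [2.0, 1.0, 0.1]  # hypothetical outputs of the last fully connected layer
probs = softmax(scores)
loss = cross_entropy(probs, label=0)
print([round(p, 3) for p in probs], round(loss, 3))
```

For these scores the probabilities are roughly 0.66, 0.24 and 0.10, and the loss for label 0 is about 0.42; it shrinks toward 0 as the correct class's probability approaches 1. During training Caffe averages this loss over the mini-batch.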


lenet_solver.prototxt
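The solver file controls how the learning rate evolves and how the weights are updated. As a sketch, assuming the values used in Caffe's bundled MNIST LeNet example solver (`base_lr: 0.01`, `momentum: 0.9`, `lr_policy: "inv"`, `gamma: 0.0001`, `power: 0.75`; check them against your own lenet_solver.prototxt), the "inv" policy and SGD with momentum look like this. Weight decay is left out for brevity, and the one-parameter quadratic loss is a made-up stand-in for a real network.

```python
def inv_lr(iteration, base_lr=0.01, gamma=0.0001, power=0.75):
    """Caffe's "inv" policy: base_lr * (1 + gamma * iter) ^ (-power)."""
    return base_lr * (1 + gamma * iteration) ** (-power)


def sgd_momentum_step(w, v, grad, lr, momentum=0.9):
    """One SGD-with-momentum update: V <- mu * V - lr * grad, then W <- W + V."""
    v = momentum * v - lr * grad
    return w + v, v


# Hypothetical 1-parameter loss L(w) = (w - 3)^2 with gradient 2 * (w - 3).
w, v = 0.0, 0.0
for it in range(10000):  # assumed max_iter of 10000
    w, v = sgd_momentum_step(w, v, grad=2 * (w - 3), lr=inv_lr(it))

print(round(inv_lr(0), 6), round(inv_lr(10000), 6), round(w, 4))
```

With these values the learning rate decays from 0.01 at iteration 0 to about 0.0059 at iteration 10,000 (0.01 × 2^-0.75), and the toy weight converges to 3. The "inv" policy gives a smooth decay instead of the sudden drops of a step schedule.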


lenet_train.prototxt (a few important layers)


Data layer

Pooling layer

Convolutional layer
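To see how the data, convolution and pooling layers fit together, the sketch below traces tensor shapes and learnable parameter counts through the LeNet variant in Caffe's MNIST example. It assumes that example's usual settings, recalled from the Caffe repository rather than taken from this slide: 1×28×28 inputs scaled by 0.00390625 (1/256) in the data layer, conv1 with 20 5×5 filters, conv2 with 50 5×5 filters, 2×2 max pooling with stride 2, then fully connected ("InnerProduct") layers of 500 and 10 units. Verify them against your own lenet_train.prototxt.

```python
import math


def out_size(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling window: (W - F + 2P) / S + 1."""
    return (size - kernel + 2 * pad) // stride + 1


# (name, kind, settings), assumed from Caffe's MNIST LeNet example.
# The ReLU after ip1 is omitted: it changes neither shape nor parameter count.
layers = [
    ("conv1", "conv", {"num_output": 20, "kernel": 5, "stride": 1}),
    ("pool1", "pool", {"kernel": 2, "stride": 2}),
    ("conv2", "conv", {"num_output": 50, "kernel": 5, "stride": 1}),
    ("pool2", "pool", {"kernel": 2, "stride": 2}),
    ("ip1", "fc", {"num_output": 500}),
    ("ip2", "fc", {"num_output": 10}),
]


def trace(channels=1, size=28):
    """Print each layer's output shape and parameter count; return final shape and total."""
    shape = (channels, size, size)
    total_params = 0
    print("data  ", shape)
    for name, kind, cfg in layers:
        if kind == "conv":
            c, h, _ = shape
            k = cfg["kernel"]
            params = cfg["num_output"] * (k * k * c + 1)  # weights + one bias per filter
            side = out_size(h, k, cfg["stride"])
            shape = (cfg["num_output"], side, side)
        elif kind == "pool":
            c, h, _ = shape
            params = 0  # pooling has no learnable weights
            side = out_size(h, cfg["kernel"], cfg["stride"])
            shape = (c, side, side)
        else:  # fully connected: every input unit connects to every output unit
            params = cfg["num_output"] * (math.prod(shape) + 1)
            shape = (cfg["num_output"],)
        total_params += params
        print(f"{name:6s} {str(shape):14s} params={params}")
    return shape, total_params


# The data layer's scale maps raw pixel values 0..255 into [0, 1).
print("max scaled pixel:", 255 * 0.00390625)
final_shape, total = trace()
print("total params:", total)
```

With these settings the shapes run 1×28×28 → 20×24×24 → 20×12×12 → 50×8×8 → 50×4×4 → 500 → 10, for 431,080 learnable parameters, about 93% of them in ip1: a common pattern in which the fully connected layers, not the convolutions, dominate the parameter count.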

MIT Places for scene recognition

•  MIT Places database
•  Places2 Challenge
•  MIT Scene Recognition Demo
•  http://places.csail.mit.edu


Important resources

•  CS231n: Convolutional Neural Networks for Visual Recognition, Fei-Fei Li, Andrej Karpathy, Justin Johnson, Stanford University. http://cs231n.stanford.edu/

•  Deep Learning (book), Ian Goodfellow, Yoshua Bengio, Aaron Courville. http://www.deeplearningbook.org/


We are not there yet…

Source: Concise Computer Vision


Contact me

•  LinkedIn: https://www.linkedin.com/in/jayaniwithanawasam

•  Email: [email protected]


Thank you
