
Building High-Level Features Using Large-Scale Unsupervised Learning

Anh Nguyen, Bay-yuan Hsu
CS290D – Data Mining (Spring 2014)
University of California, Santa Barbara

Slides adapted from Andrew Ng (Stanford) and Nando de Freitas (UBC)

Agenda

1. Motivation
2. Approach
   1. Sparse Deep Auto-encoder
   2. Local Receptive Field
   3. L2 Pooling
   4. Local Contrast Normalization
   5. Overall Model
3. Parallelism
4. Evaluation
5. Discussion

1. Motivation


Motivation

• Feature learning
  • Supervised learning
    • Needs a large number of labeled examples
  • Unsupervised learning
    • Example: build a face detector without having any labeled face images
• Goal: building high-level features using unlabeled data

Motivation

• Previous work
  • Auto-encoders
  • Sparse coding
  • Result: learned only low-level features
  • Reason: computational constraints
• This paper's approach: scale up the dataset, the model, and the computational resources

2. Approach


Sparse Deep Auto-encoder

• Auto-encoder
  • Neural network
  • Unsupervised learning
  • Trained with back-propagation

Sparse Deep Auto-encoder (cont'd)

• Sparse Coding
  • Input: images x(1), x(2), ..., x(m)
  • Learn: bases (features) f1, f2, ..., fk, so that each input x can be approximately decomposed as x ≈ Σj aj fj, where the coefficients aj are mostly zero ("sparse")
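As a concrete illustration, here is a minimal sparse-coding sketch using scikit-learn's dictionary learner; the random "patches" and all hyperparameters are placeholders, not the paper's setup.

    # Minimal sparse-coding sketch: learn bases f_j and sparse codes a_j.
    import numpy as np
    from sklearn.decomposition import MiniBatchDictionaryLearning

    rng = np.random.default_rng(0)
    patches = rng.standard_normal((500, 64))   # 500 flattened 8x8 patches

    learner = MiniBatchDictionaryLearning(n_components=32, alpha=1.0,
                                          random_state=0)
    codes = learner.fit_transform(patches)     # coefficients a_j, mostly zero
    bases = learner.components_                # learned bases f_j

    # Each patch is approximately reconstructed as codes @ bases = sum_j a_j f_j
    print(codes.shape, bases.shape)            # (500, 32) (32, 64)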


Sparse Deep Auto-encoder (cont'd)

• Sparse Coding
  • A regularizer on the coefficients enforces sparsity

Sparse Deep Auto-encoder (cont'd)

• Sparse Deep Auto-encoder
  • Multiple hidden layers, used to achieve particular characteristics in the learned features
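A minimal PyTorch sketch of the idea, assuming an L1 activation penalty as the sparsity regularizer (an assumption of this writeup; the paper's actual model is far larger and locally connected):

    # Sparse auto-encoder sketch: reconstruct the input while penalizing
    # hidden activations (L1) so that features activate sparsely.
    import torch
    import torch.nn as nn

    class SparseAutoencoder(nn.Module):
        def __init__(self, n_in=784, n_hidden=256):
            super().__init__()
            self.encode = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid())
            self.decode = nn.Linear(n_hidden, n_in)

        def forward(self, x):
            h = self.encode(x)
            return self.decode(h), h

    model = SparseAutoencoder()
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x = torch.rand(64, 784)                    # dummy input batch
    lam = 1e-3                                 # sparsity weight (illustrative)

    for step in range(10):
        x_hat, h = model(x)
        loss = ((x_hat - x) ** 2).mean() + lam * h.abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()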

Local Receptive Field

• Definition: each feature in the auto-encoder connects only to a small region of the lower layer
• Goals:
  • Learn features efficiently
  • Enable parallelism
• Training on small image patches
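A small numpy sketch of the definition above: one group of features sees only an 18×18 patch of the frame (sizes follow the slides; the random weights stand in for learned, untied filters).

    # Local receptive field sketch: 8 features connect to one 18x18 patch
    # only, not to the whole 200x200 frame. Weights are random stand-ins.
    import numpy as np

    rng = np.random.default_rng(0)
    image = rng.random((200, 200))
    rf = 18                                    # receptive field size (slides)
    W = rng.standard_normal((8, rf * rf))      # 8 neurons per patch, untied

    r, c = 40, 60                              # top-left corner of one patch
    patch = image[r:r + rf, c:c + rf].reshape(-1)
    features = W @ patch                       # 8 responses for this location
    print(features.shape)                      # (8,)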

L2 Pooling

• Goal: robustness to local distortions
• Approach: group similar features together to achieve invariance
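Concretely, an L2-pooled unit outputs the square root of the summed squares of the units in its group. A short numpy sketch over non-overlapping blocks (the block size is illustrative):

    # L2 pooling sketch: each pooled output is sqrt(sum of squares) of a
    # group of inputs, which gives some invariance to local distortions.
    import numpy as np

    def l2_pool(h, group=5):
        n = h.shape[0] - h.shape[0] % group    # trim to a multiple of group
        b = h[:n, :n].reshape(n // group, group, n // group, group)
        return np.sqrt((b ** 2).sum(axis=(1, 3)))

    h = np.random.default_rng(0).standard_normal((20, 20))
    print(l2_pool(h).shape)                    # (4, 4)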


Local Contrast Normalization

• Goal: robustness to variation in light intensity
• Approach: normalize local contrast
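A sketch of the normalization step, assuming a simple box filter for the local mean and variance (the real model uses a small weighted neighborhood, 5×5 per the slides):

    # Local contrast normalization sketch: subtract a local mean, then
    # divide by a local standard deviation (floored to avoid blow-up).
    import numpy as np
    from scipy.ndimage import uniform_filter

    def lcn(img, size=5, eps=1e-2):
        centered = img - uniform_filter(img, size)
        local_std = np.sqrt(uniform_filter(centered ** 2, size))
        return centered / np.maximum(local_std, eps)

    img = np.random.default_rng(0).random((200, 200))
    out = lcn(img)
    print(out.mean(), out.std())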


Overall Model

• 3 layers, each with three sublayers:
  • Simple (filtering): 18×18 px receptive fields, 8 neurons per patch
  • Complex (L2 pooling): 5×5 px
  • LCN: 5×5 px
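One layer of this stack can be sketched in PyTorch as below. Two hedges: Conv2d shares weights across locations, whereas the paper's filters are untied, and LocalResponseNorm is only a stand-in for true local contrast normalization.

    # One layer = simple (filtering) -> complex (L2 pooling) -> normalization.
    import torch
    import torch.nn as nn

    layer = nn.Sequential(
        nn.Conv2d(1, 8, kernel_size=18),          # "simple": 18x18 fields, 8 maps
        nn.LPPool2d(norm_type=2, kernel_size=5),  # "complex": L2 pooling, 5x5
        nn.LocalResponseNorm(2),                  # stand-in for 5x5 LCN
    )

    x = torch.rand(1, 1, 200, 200)                # one 200x200 frame
    print(layer(x).shape)                         # torch.Size([1, 8, 36, 36])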


Overall Model

• Training:
  • Reconstruct the input of each layer
  • Optimization function (sketched below)
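For reference, the objective in Le et al. (reference 1) couples a reconstruction term with the pooled-sparsity term; roughly, with W_e/W_d the encoding/decoding weights, H_j the fixed L2-pooling weights, and λ the trade-off:

    \min_{W_d,\, W_e} \sum_{i=1}^{m} \Big( \big\lVert W_d W_e x^{(i)} - x^{(i)} \big\rVert_2^2
        + \lambda \sum_{j=1}^{k} \sqrt{\epsilon + H_j \big( W_e x^{(i)} \big)^2} \Big)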

Overall Model

• Complex model?


3. Parallelism


Asynchronous SGD

Two recent lines of research in speeding up large learning problems:

• Parallel/distributed computing
• Online (and mini-batch) learning algorithms: stochastic gradient descent, perceptron, MIRA, stepwise EM

How can we bring together the benefits of parallel computing and online learning?

Asynchronous SGD

SGD (Stochastic Gradient Descent):

• Choose an initial parameter vector W and learning rate α
• Repeat until an approximate minimum is obtained:
  • Randomly shuffle the examples in the training set
  • For each example i, update W ← W − α ∇Qi(W)
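A runnable toy version of these steps on a least-squares problem (the data, loss, and hyperparameters are illustrative):

    # SGD sketch: shuffle the examples, then step W along the negative
    # gradient of one example's loss at a time.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 3))
    y = X @ np.array([1.0, -2.0, 0.5])

    W = np.zeros(3)                            # initial parameter vector
    alpha = 0.1                                # learning rate

    for epoch in range(20):
        for i in rng.permutation(len(X)):      # randomly shuffle examples
            grad = 2 * (X[i] @ W - y[i]) * X[i]
            W -= alpha * grad                  # W <- W - alpha * grad_i
    print(W)                                   # ~ [1.0, -2.0, 0.5]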

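Asynchronous SGD, in a toy single-process form: several workers apply lock-free updates to a shared parameter vector, in the spirit of the distributed parameter-server setup the paper relies on (the data, rates, and thread counts here are illustrative).

    # Asynchronous SGD sketch: workers update shared W without locking.
    # A real system distributes this across machines with a parameter server.
    import threading
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 3))
    y = X @ np.array([1.0, -2.0, 0.5])

    W = np.zeros(3)                            # shared parameters
    alpha = 0.05

    def worker(seed):
        r = np.random.default_rng(seed)
        for _ in range(2000):
            i = r.integers(len(X))
            grad = 2 * (X[i] @ W - y[i]) * X[i]
            W[:] = W - alpha * grad            # racy, lock-free update

    threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(W)                                   # close to [1.0, -2.0, 0.5]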

Model Parallelism

• Weights are divided according to locality in the image and stored on different machines
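The effect can be sketched in numpy by partitioning a layer's weight matrix and computing each slice independently. Note the paper partitions by image locality; this toy splits by output block for simplicity.

    # Model parallelism sketch: split W across 4 "machines", compute each
    # slice independently, then gather. Matches the single-machine product.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.random(100 * 100)                  # flattened input
    W = rng.standard_normal((64, x.size))      # full weight matrix

    parts = np.array_split(W, 4, axis=0)       # one slice per machine
    partial = [Wp @ x for Wp in parts]         # computed in parallel in practice
    h = np.concatenate(partial)

    assert np.allclose(h, W @ x)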

4. Evaluation

Evaluation

• 10M unlabeled YouTube frames of size 200×200
• 1B parameters
• 1,000 machines
• 16,000 cores

Experiment on Faces

• Test set
  • 37,000 images, of which 13,026 are face images
• Report the accuracy of the best single neuron

Experiment on Faces (cont'd)

• Visualization
  • Top stimuli (images) for the face neuron
  • Optimal stimulus for the face neuron

Experiment on Faces (cont'd)

• Invariance properties


Experiment on Cat/Human Body

• Test set
  • Cat: 10,000 positive, 18,409 negative
  • Human body: 13,026 positive, 23,974 negative
• Accuracy

ImageNet classification

• Recognizing images
• Dataset
  • 20,000 categories
  • 14M images
• Accuracy
  • 15.8%, versus 9.3% for the previous state of the art

5. Discussion

Discussion

• Deep learning
  • Unsupervised feature learning
  • Learning multiple layers of representation
• Accuracy increased by invariance (pooling) and contrast normalization
• Scalability

6. References

References

1. Quoc Le et al., “Building High-level Features using Large Scale Unsupervised Learning”
2. Nando de Freitas, “Deep Learning”, https://www.youtube.com/watch?v=g4ZmJJWR34Q
3. Andrew Ng, “Sparse autoencoder”, http://www.stanford.edu/class/archive/cs/cs294a/cs294a.1104/sparseAutoencoder.pdf
4. Andrew Ng, “Machine Learning and AI via Brain Simulations”, https://forum.stanford.edu/events/2011slides/plenary/2011plenaryNg.pdf
5. Andrew Ng, “Deep Learning”, http://www.ipam.ucla.edu/publications/gss2012/gss2012_10595.pdf
