Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure (final)
TRANSCRIPT
Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure
Ruslan Salakhutdinov and Geoffrey E. Hinton. AISTATS '07, San Juan, Puerto Rico.
Presenter:
WooSung Choi ([email protected])
DataKnow. Lab, Korea Univ.
Background
(k-) Nearest Neighbor Query
kNN (k-Nearest Neighbor) Query
kNN (k-Nearest Neighbor) Classification
Salakhutdinov, Ruslan, and Geoffrey E. Hinton. "Learning a nonlinear embedding by preserving class neighbourhood structure." International Conference on Artificial Intelligence and Statistics. 2007.
NN    Class
1-NN  6
2-NN  6
3-NN  6
4-NN  6
5-NN  0
<Result of 5-NN>
Result of 5-NN Classification: 6 (80%)
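The 5-NN vote above (four neighbours of class 6, one of class 0, giving 6 at 80% confidence) can be sketched as a plain majority-vote classifier. The 2-D points below are hypothetical stand-ins, not data from the slides.

```python
from collections import Counter
import math

def knn_classify(query, points, labels, k):
    """Classify `query` by majority vote among its k nearest neighbours
    (Euclidean distance), as in the 5-NN example above."""
    dists = sorted((math.dist(query, p), lbl) for p, lbl in zip(points, labels))
    votes = Counter(lbl for _, lbl in dists[:k])
    label, count = votes.most_common(1)[0]
    return label, count / k

# Hypothetical 2-D points: four neighbours of class 6, one far point of class 0.
points = [(1, 1), (1, 2), (2, 1), (2, 2), (9, 9)]
labels = [6, 6, 6, 6, 0]
print(knn_classify((1.5, 1.5), points, labels, k=5))  # (6, 0.8)
```

With k=5 all five points vote, reproducing the slide's result: class 6 with 4/5 = 80% of the votes.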
Motivating Example
• MNIST
  Dimensionality: 28 x 28 = 784
  50,000 training images, 10,000 test images
• Error: 2.77%
• Query response: 108 ms
Reality Check
• Curse of dimensionality: poor performance when the number of dimensions is high
  [Qin Lv et al., Image Similarity Search with Compact Data Structures @ CIKM '04]
  [Roger Weber et al., A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces @ VLDB '98]
Locality Sensitive Hashing, Data Sensitive Hashing
Method                       Curse of dimensionality   Recall   Considers data distribution   Based on
Scan                         X (none)                  1        △                             N/A
RTree-based solution         O (strong)                1        O                             index: tree
Locality Sensitive Hashing   △ (less)                  -        X                             hashing + mathematics
Data Sensitive Hashing       △ (less)                  -        O                             hashing + machine learning
Abstract
• How to pre-train and fine-tune a multilayer neural network (MNN)
  to learn a nonlinear transformation from the input space to a low-dimensional feature space
  where kNN classification performs well
• Can be improved using unlabeled data
Introduction
Notation
• Transformation to low-dimensional feature space
  Input vectors: x_a ∈ R^D
  Transformation function: f(·|W), parameterized by weights W
  Output vectors: f(x_a|W)
• Similarity measure
  Input vectors: x_a, x_b → Output: d_ab = ||f(x_a|W) − f(x_b|W)||²
Objective (informal)
• Goal: learn W so that kNN classification in the low-dimensional feature space performs well

Objective (formal)
• Goal: maximizing the expected number of correctly classified points on the training data,
  O_NCA = Σ_a Σ_{b: c_b = c_a} p_ab
Related Work: Linear Transformation
• Linear Transformation [8,9,18]
  Weakness 1: limited number of parameters
    e.g. mapping 784-dim input to 30 dims, the matrix must be 30 by 784 (23,520 parameters);
    in this paper: 785·500 + 501·500 + 501·2000 + 2001·30 parameters
  Weakness 2: cannot model higher-order correlations
• Deep Autoencoder [14], DBN [12]
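The parameter counts can be checked quickly. The arithmetic below assumes the "+1" in each product on the slide is a bias unit, and that the deep net is the 784-500-500-2000-30 stack described in the overview.

```python
# Linear map R^784 -> R^30: a 30 x 784 weight matrix.
linear = 30 * 784
print(linear)  # 23520

# Deep net 784-500-500-2000-30; each layer has (fan_in + 1) * fan_out
# parameters, the +1 being the bias unit (assumption).
layers = [(784, 500), (500, 500), (500, 2000), (2000, 30)]
deep = sum((fan_in + 1) * fan_out for fan_in, fan_out in layers)
print(deep)  # 1705030
```

So the nonlinear network has roughly 70 times more parameters than the linear map, which is the point of the "limited number of parameters" weakness.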
In this paper
• Non-linear transformation
• Overview
  Pre-training: similar to [12,14], a stack of RBMs
    RBM1: 784-500, RBM2: 500-500, RBM3: 500-2000, RBM4: 2000-30
  Fine-tuning: backpropagation to maximize the objective function,
    i.e. the expected number of correctly classified points on the training data
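The greedy pre-training step can be sketched as follows: train one Bernoulli RBM with one-step contrastive divergence (CD-1), push the data through it, and train the next RBM on the resulting hidden activations. This is a minimal sketch, not the paper's implementation: learning rate, epoch count, and the random stand-in data are all assumptions, and mean-field probabilities are used for the reconstruction.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.1, epochs=1, batch=100):
    """One-step contrastive divergence (CD-1) for a Bernoulli RBM."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v = np.zeros(n_visible)   # visible biases
    b_h = np.zeros(n_hidden)    # hidden biases
    for _ in range(epochs):
        for i in range(0, len(data), batch):
            v0 = data[i:i + batch]
            h0 = sigmoid(v0 @ W + b_h)                 # positive phase
            h_sample = (rng.random(h0.shape) < h0) * 1.0
            v1 = sigmoid(h_sample @ W.T + b_v)         # mean-field reconstruction
            h1 = sigmoid(v1 @ W + b_h)                 # negative phase
            W += lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
            b_v += lr * (v0 - v1).mean(axis=0)
            b_h += lr * (h0 - h1).mean(axis=0)
    return W, b_h

# Greedy stack 784-500-500-2000-30 as in the overview
# (random binary data and one epoch here, just so the sketch runs quickly).
sizes = [784, 500, 500, 2000, 30]
x = (rng.random((200, 784)) < 0.5) * 1.0   # stand-in for MNIST
weights = []
for n_hid in sizes[1:]:
    W, b_h = train_rbm(x, n_hid)
    weights.append((W, b_h))
    x = sigmoid(x @ W + b_h)               # hidden activations feed the next RBM
```

After this loop, `weights` holds the four pre-trained layers used to initialize the encoder before NCA fine-tuning.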
Objective (formal)
• Goal: maximizing O_NCA = Σ_a Σ_{b: c_b = c_a} p_ab
2. Learning Nonlinear NCA
Neighbourhood Component Analysis
Notation
Symbol            Definition
a, b              index
x_a               training vector (d-dimensional data)
c_a ∈ {1,2,…,C}   label of training vector x_a
N                 number of labeled training cases
f(x_a|W)          output of the multilayer neural network parameterized by W
d_ab              Euclidean distance metric, d_ab = ||f(x_a|W) − f(x_b|W)||²
p_ab              the probability that point a selects one of its neighbours b in the transformed feature space
Example: five points with labels (0, 1, 5, 7, 7), where a is the first point.
e^{−d_ab}: (1, 0.3678, 0.0497, 0.0002, 0.0002) — the self term e^{−d_aa} = 1 is excluded
p_ab:      (0, 0.88, 0.11, 0, 0)
p_ab = 0.3678 / (0.3678 + 0.0497 + 0.0002 + 0.0002) ≈ 0.88
Notation
Symbol       Definition
p_ab         the probability that point a selects one of its neighbours b in the transformed feature space
p(c_a = k)   the probability that point a belongs to class k: p(c_a = k) = Σ_{b: c_b = k} p_ab
O_NCA        the expected number of correctly classified points on the training data

Example: labels (N/A, 3, 3, 2, 1) with p_ab = (0, 0.88, 0.11, 0, 0) give
p(c_a = 3) = 0.88 + 0.11 = 0.99, p(c_a = 2) ≈ 0, p(c_a = 1) ≈ 0
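The two worked examples above can be reproduced in a few lines: p_ab is a softmax over negative squared distances, and p(c_a = k) sums p_ab over neighbours of class k. The four e^{−d} values are the ones on the slide; everything else follows from them.

```python
import numpy as np

# exp(-d_ab) for point a's four neighbours, labelled 3, 3, 2, 1
# (values taken from the slide's worked example).
exp_neg_d = np.array([0.3678, 0.0497, 0.0002, 0.0002])
labels = np.array([3, 3, 2, 1])

p_ab = exp_neg_d / exp_neg_d.sum()   # softmax over negative squared distances
# p_ab[0] = 0.3678 / 0.4179 ≈ 0.88, matching the slide

# p(c_a = k) = sum of p_ab over neighbours b with label k
p_class = {k: p_ab[labels == k].sum() for k in (1, 2, 3)}
# p(c_a = 3) ≈ 0.999 (the slide truncates to 0.99); classes 1 and 2 get almost nothing
```

Maximizing O_NCA means pushing each point's p(c_a = correct class) toward 1, exactly as in this example.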
Learning Rule
• Backpropagation to maximize O_NCA
• Derivation: writing y_a = f(x_a|W) and O_a = Σ_{b: c_b = c_a} p_ab,
  ∂O_NCA/∂y_a = 2 Σ_{z≠a} [ p_az (O_a − [c_z = c_a]) + p_za (O_z − [c_a = c_z]) ] (y_a − y_z)
  where [·] is 1 if the condition holds and 0 otherwise; then
  ∂O_NCA/∂W = Σ_a (∂O_NCA/∂y_a) · (∂y_a/∂W)
• Standard backpropagation
  Output layer: receives ∂O_NCA/∂y_a as its error signal
  Inner layers: chain rule through the network, as usual
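The gradient with respect to the embedded points can be sanity-checked numerically. The sketch below (an assumption-level reconstruction, using squared Euclidean distances in the exponent as in the notation) computes O_NCA and the analytic gradient, then compares against central finite differences on random data.

```python
import numpy as np

def nca_objective_and_grad(y, labels):
    """O_NCA = sum_a sum_{b: c_b = c_a} p_ab and its gradient w.r.t. the
    embedded points y (n x d), with p_ab = softmax over -||y_a - y_b||^2."""
    n = len(y)
    d2 = ((y[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # squared distances
    e = np.exp(-d2)
    np.fill_diagonal(e, 0.0)                              # b != a
    p = e / e.sum(axis=1, keepdims=True)
    same = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(same, 0.0)
    O_a = (p * same).sum(axis=1)                          # prob. of correct class
    grad = np.zeros_like(y)
    for m in range(n):
        # coefficient of (y_m - y_z): p_mz (O_m - [same]) + p_zm (O_z - [same])
        coef = p[m] * (O_a[m] - same[m]) + p[:, m] * (O_a - same[:, m])
        grad[m] = 2.0 * (coef[:, None] * (y[m] - y)).sum(axis=0)
    return O_a.sum(), grad

# Finite-difference check on random data (hypothetical, not from the slides).
rng = np.random.default_rng(0)
y = rng.standard_normal((6, 2))
labels = np.array([0, 0, 0, 1, 1, 1])
O, g = nca_objective_and_grad(y, labels)
num = np.zeros_like(y)
eps = 1e-6
for i in range(y.shape[0]):
    for j in range(y.shape[1]):
        yp = y.copy(); yp[i, j] += eps
        ym = y.copy(); ym[i, j] -= eps
        num[i, j] = (nca_objective_and_grad(yp, labels)[0]
                     - nca_objective_and_grad(ym, labels)[0]) / (2 * eps)
print(np.abs(g - num).max())   # small: analytic gradient matches finite differences
```

The signs behave as the objective demands: gradient ascent pulls same-class points together (the indicator makes their coefficient negative) and pushes different-class points apart.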
Details
• Pre-training
  Mini-batches, each containing 100 cases; epochs: 50
• Fine-tuning
  Method: conjugate gradients on larger mini-batches of 5,000, with three line searches performed for each mini-batch
  Epochs: 50
• Dataset
  60,000 training images, 10,000 for validation
Experiment
Result
Appendix
Regularized Nonlinear NCA
Application
• Learn compact binary codes that allow efficient retrieval
  Gist descriptor + Locality Sensitive Hashing scheme + non-linear NCA
  Dataset: LabelMe, 22,000 images; labels: {human, woman, man, etc.}
Torralba, Antonio, Rob Fergus, and Yair Weiss. "Small codes and large image databases for recognition." IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2008.
http://labelme2.csail.mit.edu/Release3.0/browserTools/php/publications.php
Neural Network
Toy Example: AND gate, XOR gate
AND gate
[Figure: a single neuron with inputs x, y and bias input 1, weights w0, w1, w2, a sigmoid output sigm(z), and a truth table with columns x, y, t]
z = x·w0 + y·w1 + 1·w2
sigm(x) = 1 / (1 + e^{−x})
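With suitable weights, the single neuron above computes AND. The weights w0 = w1 = 20, w2 = −30 are hand-picked for illustration (they do not come from the slides); any weights that keep z well below 0 except when both inputs are 1 would do.

```python
import math

def sigm(x):
    return 1.0 / (1.0 + math.exp(-x))

def and_gate(x, y, w0=20.0, w1=20.0, w2=-30.0):
    # z = x*w0 + y*w1 + 1*w2, as on the slide; weights are hand-picked
    return sigm(x * w0 + y * w1 + w2)

for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, y, round(and_gate(x, y)))   # truth table: only (1, 1) -> 1
```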
XOR gate
[Figure: a two-layer network with a truth table (columns x, y, t).
 Hidden unit 1: z1 = x·w00 + y·w01 + 1·w02, output sigm(z1)
 Hidden unit 2: z2 = x·w10 + y·w11 + 1·w12, output sigm(z2)
 Output unit:   z3 = sigm(z1)·w20 + sigm(z2)·w21 + 1·w22, output sigm(z3)]
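XOR is not linearly separable, which is why the slide's network needs a hidden layer. One classic hand-weighted construction (the weights are illustrative assumptions, not from the slides) makes the hidden units compute OR and NAND, and the output unit AND them together:

```python
import math

def sigm(x):
    return 1.0 / (1.0 + math.exp(-x))

def xor_gate(x, y):
    # hidden unit z1 ~ OR, hidden unit z2 ~ NAND; output z3 ~ AND(z1, z2)
    h1 = sigm(20 * x + 20 * y - 10)      # OR
    h2 = sigm(-20 * x - 20 * y + 30)     # NAND
    return sigm(20 * h1 + 20 * h2 - 30)  # AND

for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, y, round(xor_gate(x, y)))   # 0, 1, 1, 0
```

Training would find some such weights by backpropagation; this fixed-weight version just shows that the two-layer architecture on the slide can represent XOR at all.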
Implementations
• Toy example: training algorithm for logic gates
• NLNCA for MNIST