Get Compact Representation Using Deep Networks
DESCRIPTION
This presentation focuses on the motivation, methods, and applications of getting compact representations using deep networks.
TRANSCRIPT
Get Compact Representation Using Deep Networks: Method and Application
Zhengbo Li
Shanghai Jiao Tong University
November 19, 2015
Overview
Motivation
Method
Performance
Application
Future work
Motivation: why do we need compact representations?
Useful: a compact representation of the original data needs less computational and spatial resources.
Interesting: we want to know what the compact representations are (essentially the same as asking what the gates learn).
Dataset
A low-resolution version of MNIST.
Convert the 28 by 28 pictures to 14 by 14 pictures, so each input is a 196-dimensional vector.
This is due to limited computational resources and time.
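Concretely, each picture can be shrunk by pooling 2 by 2 blocks of pixels. A minimal sketch in Python (the talk does not say which downsampling method was used, so average pooling is an assumption):

```python
import numpy as np

def downsample(img28):
    """Shrink a 28x28 MNIST image to 14x14 by averaging each 2x2
    block of pixels, then flatten it into a 196-dimensional vector."""
    img14 = img28.reshape(14, 2, 14, 2).mean(axis=(1, 3))
    return img14.reshape(196)
```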
Method: Autoencoder
Dilemma:
Shallow autoencoders (a single hidden layer or a few hidden layers):
  Advantage: easy to find a good local minimum.
  Disadvantage: not complex enough to learn good representations.
Deep autoencoders (more hidden layers):
  Advantage: complex enough that good representations are possible.
  Disadvantage: very likely to get stuck in poor local minima.
(Both architectures are sketched below.)
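For reference, the two architectures might look like this in PyTorch (a sketch; the framework, the sigmoid activations, and the intermediate layer sizes are assumptions, since the slides only fix the 196-dimensional input):

```python
import torch.nn as nn

# Shallow: a single hidden layer of 4 gates. Easy to train, but the
# map from 196 pixels down to 4 numbers may be too simple.
shallow = nn.Sequential(
    nn.Linear(196, 4), nn.Sigmoid(),   # encoder: 196 -> 4
    nn.Linear(4, 196), nn.Sigmoid(),   # decoder: 4 -> 196
)

# Deep: the same 4-dimensional bottleneck reached through several
# layers. Expressive enough, but from random initialization it is
# very likely to get stuck in a poor local minimum.
deep = nn.Sequential(
    nn.Linear(196, 100), nn.Sigmoid(),
    nn.Linear(100, 4),   nn.Sigmoid(),
    nn.Linear(4, 100),   nn.Sigmoid(),
    nn.Linear(100, 196), nn.Sigmoid(),
)
```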
Method: Combine the Advantages
Example: get a 4-dimensional representation of the 196-dimensional handwritten digits, i.e., use 4 real numbers to represent a picture.
Step 1: Use the 196-dimensional original input to train a 100-dimensional representation.
Step 2: Use the 100-dimensional representation to train a 50-dimensional representation.
(A code sketch of both steps follows the figure.)
Figure 1: Step 1 (left), Step 2 (right)
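A minimal sketch of Steps 1 and 2 (PyTorch, sigmoid activations, and the SGD hyperparameters are all assumptions; `x` is a placeholder for the matrix of training images):

```python
import torch
import torch.nn as nn

def train_autoencoder(data, d_in, d_hidden, epochs=50, lr=0.1):
    """Train a one-hidden-layer autoencoder d_in -> d_hidden -> d_in
    and return its encoder and decoder halves."""
    enc = nn.Sequential(nn.Linear(d_in, d_hidden), nn.Sigmoid())
    dec = nn.Sequential(nn.Linear(d_hidden, d_in), nn.Sigmoid())
    opt = torch.optim.SGD(list(enc.parameters()) + list(dec.parameters()), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        # sum-of-squares reconstruction error, averaged over inputs
        loss = ((dec(enc(data)) - data) ** 2).sum(dim=1).mean()
        loss.backward()
        opt.step()
    return enc, dec

x = torch.rand(1000, 196)  # placeholder for the 14x14 training images

# Step 1: train 196 -> 100 -> 196 on the raw inputs.
enc1, dec1 = train_autoencoder(x, 196, 100)
# Step 2: train 100 -> 50 -> 100 on the codes produced by Step 1.
codes = enc1(x).detach()
enc2, dec2 = train_autoencoder(codes, 100, 50)
```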
Method: Combine the Advantages, cont.
Step 3: Combine these two networks. Use the red and blue weights we got as initial weights and continue training; this gives a 50-dimensional representation of the original 196-dimensional input. (A sketch follows the figure.)
Figure 2: Step 3
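Continuing the sketch above, Step 3 stacks the pretrained halves into one 196 -> 100 -> 50 -> 100 -> 196 network and keeps training end to end (the epoch count and learning rate remain illustrative):

```python
def fine_tune(model, data, epochs=50, lr=0.1):
    """Continue training a stacked autoencoder end to end, starting
    from the pretrained weights it was assembled with."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = ((model(data) - data) ** 2).sum(dim=1).mean()
        loss.backward()
        opt.step()

# Step 3: the red and blue weights from Steps 1-2 are the initial
# weights of the combined network; training then continues.
model = nn.Sequential(enc1, enc2, dec2, dec1)
fine_tune(model, x)
```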
Method: Combine the Advantages, cont.
Step 4: Use the 50-dimensional representation to train a 20-dimensional representation.
Step 5: Combine the networks. Use the red’, blue’, and green weights we got as initial weights and continue training; this gives a 20-dimensional representation of the original 196-dimensional input.
Figure 3: Step 4 (left), Step 5 (right)
Method: Combine the Advantages, cont.
Keep inserting hidden layers in the middle to get more compact representations (the full loop is sketched below).
Final network structure: [196, 100, 50, 20, 10, 4, 10, 20, 50, 100, 196]
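The whole procedure then fits in one loop, reusing `train_autoencoder` and `fine_tune` from the sketches above (still an assumed reconstruction, not the author's exact code):

```python
# Grow the network one bottleneck at a time until it reaches
# [196, 100, 50, 20, 10, 4, 10, 20, 50, 100, 196].
dims = [196, 100, 50, 20, 10, 4]
encoders, decoders = [], []
codes = x
for d_in, d_hidden in zip(dims, dims[1:]):
    enc, dec = train_autoencoder(codes, d_in, d_hidden)  # pretrain the new pair
    encoders.append(enc)
    decoders.insert(0, dec)                              # decoders mirror encoders
    fine_tune(nn.Sequential(*encoders, *decoders), x)    # end-to-end pass, as in Step 3
    codes = nn.Sequential(*encoders)(x).detach()         # codes for the next round
```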
Performance
To evaluate performance, the cost is the sum of squared differences between input and output, averaged over all inputs.
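In symbols (my reading of the slide, with x_i the i-th input, its reconstruction written with a hat, and N the number of inputs):

```latex
\mathrm{cost} = \frac{1}{N} \sum_{i=1}^{N} \lVert x_i - \hat{x}_i \rVert^2
```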
Method                                                   Cost
Top 4 principal components (SVD)                      12.1284
Single-hidden-layer autoencoder with 4 hidden gates    6.1094
Same architecture, but all layers trained together    10.0036
Our method                                             2.2951
Table 1: Cost comparison for different methods
Application: Generating samples
Dimension reduction has many applications; they are omitted here.
Pick a random 4-dimensional vector. With high probability it corresponds to a handwritten digit once decoded (see the sketch below the figure).
Figure 4: Generated Numbers
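A sketch of the sampling step, using the `decoders` stack from the loop above (the distribution of the random code is an assumption; the talk does not say how the vectors were drawn):

```python
decoder = nn.Sequential(*decoders)  # trained 4 -> 10 -> 20 -> 50 -> 100 -> 196 half
z = torch.rand(1, 4)                # random 4-dimensional code
img = decoder(z).detach().reshape(14, 14)  # decoded 14x14 picture
```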
Future work
Try other datasets.
See what these 4 hidden gates learn (i.e., why the 4-dimensional representation achieves such a low cost).
Understand why deep networks so easily get stuck in poor local minima.
Thank you for listening.