grasp learning a kernel matrix for nonlinear dimensionality reduction kilian q. weinberger, fei sha...

49
GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer and Information Science

Upload: branden-robinson

Post on 05-Jan-2016

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Learning a Kernel Matrix for Nonlinear Dimensionality Reduction

Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul

ICML’04

Department of Computer and Information Science

Page 2: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

The Big Picture

Given high dimensional data sampled from a low dimensional manifold,

how to compute a faithful embedding?

Page 3: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Outline

• Part I: kernel PCA

• Part II: Manifold Learning

• Part III: Algorithm • Part IV: Experimental Results

Page 4: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Part I.Part I. kernel PCAkernel PCA

Page 5: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Nearby points remain nearby,distant points remain distant.

Estimate d.

Input:

Output:

Problem:

Embedding:

Page 6: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Subspaces

D=3d=2

D=2d=1

Page 7: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Principal Component Analysis

Project data into subspace of maximum variance:

Can be solved as eigenvalue problem:

Page 8: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Using the kernel trick

Do PCA in a higher dimensional feature space

Can be defined implicitly through

kernel matrix

Page 9: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

• Linear

• Gaussian

• Polynomial

Common Kernels

Do very well for classification.How about manifold learning?

Page 10: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Linear KernelQuickTime™ and a

Photo - JPEG decompressorare needed to see this picture.

Page 11: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Gaussian KernelsQuickTime™ and a

Photo - JPEG decompressorare needed to see this picture.

Page 12: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Gaussian KernelsQuickTime™ and a

Photo - JPEG decompressorare needed to see this picture.

Feature vectors span as many dimensions as number of spheres with radius needed to enclose input vectors.

Page 13: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Polynomial KernelsQuickTime™ and a

Photo - JPEG decompressorare needed to see this picture.

Page 14: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Part II. Manifold Learningvia Semidefinite Programming

Page 15: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Local Isometry

A smooth, invertible mapping that preserves distances and looks locally like a rotation plus translation.

Page 16: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Local Isometry

A smooth, invertible mapping that preserves distances and looks locally like a rotation plus translation.

Page 17: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Neighborhood graphConnect each point toits k nearest neighbors.

Discretized manifolds

Page 18: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Preserve local distancesApproximation of local isometry:

Constraint

Neighborhoodindicator

Page 19: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

• Goal:

• Problem:

• Heuristic:

Objective Function?

Find Minimum Rank Kernel Matrix

Computationally Hard

Maximize Pairwise Distances

Page 20: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

QuickTime™ and aPhoto - JPEG decompressor

are needed to see this picture.

Objective Function? (Cont’d)

What happens if we maximize the pairwise distances?

Page 21: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Semidefinite Programming Problem:

Maximize:

subject to: Preserve local neighborhoods

Unfold manifold

Center output

Semipositivedefinite

Page 22: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Part IIISemidefinite Embedding

in three easy steps(Also known as “Maximum Variance Unfolding”

[Sun, Boyd, Xiao, Diaconis])

Page 23: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

1. Step: K-Nearest Neighbors

Compute nearest neighbors and the Gram matrix for each neighborhood

Page 24: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

2. Step: Semidefinite programming

Compute centered, locally isometric dot-product matrix with maximal trace

Page 25: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Estimate d from eigenvalue spectrum. Top eigenvectors give embedding

3. Step: kernel PCA

Page 26: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Part IV. Experimental Results

Page 27: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

QuickTime™ and aPhoto - JPEG decompressor

are needed to see this picture.

Trefoil Knot

N=539k=4D=3d=2

Page 28: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

QuickTime™ and aPhoto - JPEG decompressor

are needed to see this picture.

Trefoil Knot

N=539k=4D=3d=2

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

RB

F

Poly

nom

ial

Lin

ear

SDE

% V

aria

nce

Page 29: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Teapot (full rotation) N=400k=4

D=23028d=2

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Bef

ore

Aft

er

% V

aria

nce

QuickTime™ and aPhoto - JPEG decompressor

are needed to see this picture.

Page 30: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

N=200k=4

D=23028d=2

Teapot (half rotation)

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Bef

ore

Aft

er

% V

aria

nce

Page 31: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

FacesN=1000

k=4D=540

d=2

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Bef

ore

Aft

er

% V

aria

nce

Page 32: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Twos vs. Threes

N=953k=3

D=256d=2

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

RBF

POLY

NOMIA

L

LINEA

RSD

E

Page 33: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Part V.Supervised Experimental Results

Page 34: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Large Margin Classification

• SDE Kernel used in SVM

• Task: Binary Digit Classification

• Input: USPS Data Set

• Training / Testing set: 810/90

• Neighborhood Size: k=4

Page 35: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

0

2

4

6

8

10

12

1 vs 2 1 vs 3 2 vs 8 8 vs 9

LinearPolynomialGaussianSDE

SVM Kernel

SDE is not well-suited for SVMs

Page 36: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

SVM Kernel (cont’d)

Non-Linear decision boundaryLinear decision boundary

Unfolding does not necessarily help classification • Reducing the dimensionality is counter-intuitive.• Needs linear decision boundary on manifold.

Page 37: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Part VI. Conclusion

Page 38: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Previous Work

Isomap and LLE can both be seen from a kernel view [Jihun Ham et al., ICML’04]

Page 39: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Previous Work (Isomap)

Isomap and LLE can both be seen from a kernel view [Jihun Ham et al., ICML’04]

Matrix not necessarily semi-positive definite

SDE Isomap

Page 40: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Previous Work (Isomap)

Isomap and LLE can both be seen from a kernel view [Jihun Ham et al., ICML’04]

Matrix not necessarily semi-positive definite

SDE Isomap

Page 41: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Previous Work (LLE)

Isomap and LLE can both be seen from a kernel view [Jihun Ham et al., ICML’04]

Eigenvalues do not reveal true dimensionality

SDE LLE

Page 42: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Conclusion

Semidefinite Embedding (SDE)+ extends kernel PCA to do manifold learning+ uses semidefinite programming+ has a guaranteed unique solution- not well suited for support vector machines- exact solution (so far) limited to N=2000

Page 43: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Page 44: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Semidefinite Programming Problem:

Maximize:

subject to: Preserve local neighborhoods

Unfold Manifold

Center Output

semi-positivedefinite

Page 45: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Semidefinite Programming Problem:

Maximize:

subject to: Preserve local neighborhoods

Unfold Manifold

Center Output

semi-positivedefinite

Introduce Slack

Page 46: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

QuickTime™ and aPhoto - JPEG decompressor

are needed to see this picture.

Swiss Roll

N=800k=4D=3d=2

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Bef

ore

Aft

er

% V

aria

nce

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

% V

aria

nce

Page 47: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Applications

• Visualization of Data

• Natural Language Processing

Page 48: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Trefoil Knot

N=539k=4D=3d=2

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

RB

F

Poly

nom

ial

Lin

ear

SDE

% V

aria

nce

RBFPolynomial

SDE

Page 49: GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer

GRASP

Motivation

• Similar vectorized pictures lie on a non-linear manifolds

• Linear Methods don’t work here