spatial data: dimensionality reduction · spatial data: dimensionality reduction cs444 techniques,...

20
Spatial Data: Dimensionality Reduction CS444 Techniques, Lecture 3

Upload: others

Post on 01-Jan-2021

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Spatial Data: Dimensionality Reduction · Spatial Data: Dimensionality Reduction CS444 Techniques, Lecture 3. In this subfield, we think of a data point as a vector in R^n (what

Spatial Data: Dimensionality Reduction

CS444Techniques, Lecture 3

Page 2: Spatial Data: Dimensionality Reduction · Spatial Data: Dimensionality Reduction CS444 Techniques, Lecture 3. In this subfield, we think of a data point as a vector in R^n (what

In this subfield, we think of a data point as a

vector in R^n

(what could possibly go wrong?)

Page 3: Spatial Data: Dimensionality Reduction · Spatial Data: Dimensionality Reduction CS444 Techniques, Lecture 3. In this subfield, we think of a data point as a vector in R^n (what

“Linear” dimensionality reduction:

Reduction is achieved by is a single matrix for

every point.

Page 4: Spatial Data: Dimensionality Reduction · Spatial Data: Dimensionality Reduction CS444 Techniques, Lecture 3. In this subfield, we think of a data point as a vector in R^n (what

Regular Scatterplots• Every data point is a vector:

• Every scatterplot is produced by a very simple matrix:

2

664

v0v1v2v3

3

775

1 0 0 00 0 1 0

1 0 0 00 1 0 0

Page 5: Spatial Data: Dimensionality Reduction · Spatial Data: Dimensionality Reduction CS444 Techniques, Lecture 3. In this subfield, we think of a data point as a vector in R^n (what

What about other matrices?

Page 6: Spatial Data: Dimensionality Reduction · Spatial Data: Dimensionality Reduction CS444 Techniques, Lecture 3. In this subfield, we think of a data point as a vector in R^n (what

Grand Tour (Asimov, 1985)

http://cscheid.github.io/lux/demos/tour/tour.html

Page 7: Spatial Data: Dimensionality Reduction · Spatial Data: Dimensionality Reduction CS444 Techniques, Lecture 3. In this subfield, we think of a data point as a vector in R^n (what

Is there a best matrix?

How do we think about that?

Page 8: Spatial Data: Dimensionality Reduction · Spatial Data: Dimensionality Reduction CS444 Techniques, Lecture 3. In this subfield, we think of a data point as a vector in R^n (what

Linear Algebra review• Vectors

• Inner Products

• Lengths

• Angles

• Bases

• Linear Transformations and Eigenvectors

Page 9: Spatial Data: Dimensionality Reduction · Spatial Data: Dimensionality Reduction CS444 Techniques, Lecture 3. In this subfield, we think of a data point as a vector in R^n (what

Principal Component Analysis

Sepal.LengthSepal.Width

Petal.LengthPetal.Width

−0.2

−0.1

0.0

0.1

0.2

−0.10 −0.05 0.00 0.05 0.10 0.15PC1

PC2

Species

setosa

versicolor

virginica

Page 10: Spatial Data: Dimensionality Reduction · Spatial Data: Dimensionality Reduction CS444 Techniques, Lecture 3. In this subfield, we think of a data point as a vector in R^n (what

Principal Component Analysis• Algorithm:

• Given data set as matrix X in R^(d x n),

• Center matrix:

• Compute eigendecomposition of

• The principal components are the first few rows of

X̃ = X(I �~1

n~1T ) = XH

X̃T X̃ = U⌃UT

X̃T X̃

U⌃1/2

Page 11: Spatial Data: Dimensionality Reduction · Spatial Data: Dimensionality Reduction CS444 Techniques, Lecture 3. In this subfield, we think of a data point as a vector in R^n (what

What if we don’t have coordinates, but distances?

“Classical” Multidimensional Scaling

Page 12: Spatial Data: Dimensionality Reduction · Spatial Data: Dimensionality Reduction CS444 Techniques, Lecture 3. In this subfield, we think of a data point as a vector in R^n (what

http://www.math.pku.edu.cn/teachers/yaoy/Fall2011/lecture11.pdf

Page 13: Spatial Data: Dimensionality Reduction · Spatial Data: Dimensionality Reduction CS444 Techniques, Lecture 3. In this subfield, we think of a data point as a vector in R^n (what

Borg and Groenen, Modern Multidimensional Scaling

Page 14: Spatial Data: Dimensionality Reduction · Spatial Data: Dimensionality Reduction CS444 Techniques, Lecture 3. In this subfield, we think of a data point as a vector in R^n (what

Borg and Groenen, Modern Multidimensional Scaling

Page 15: Spatial Data: Dimensionality Reduction · Spatial Data: Dimensionality Reduction CS444 Techniques, Lecture 3. In this subfield, we think of a data point as a vector in R^n (what

“Classical” Multidimensional Scaling

• Algorithm:

• Given , create

• PCA of B is equal to the PCA of X

• Huh?!

Dij = |Xi �Xj |2 B = �1

2HDHT

Page 16: Spatial Data: Dimensionality Reduction · Spatial Data: Dimensionality Reduction CS444 Techniques, Lecture 3. In this subfield, we think of a data point as a vector in R^n (what

“Nonlinear” dimensionality

reduction

(ie: projection is not a matrix operation)

Page 17: Spatial Data: Dimensionality Reduction · Spatial Data: Dimensionality Reduction CS444 Techniques, Lecture 3. In this subfield, we think of a data point as a vector in R^n (what

Data might have “high-order” structure

Page 18: Spatial Data: Dimensionality Reduction · Spatial Data: Dimensionality Reduction CS444 Techniques, Lecture 3. In this subfield, we think of a data point as a vector in R^n (what

http://isomap.stanford.edu/Supplemental_Fig.pdf

Page 19: Spatial Data: Dimensionality Reduction · Spatial Data: Dimensionality Reduction CS444 Techniques, Lecture 3. In this subfield, we think of a data point as a vector in R^n (what

We might want to minimize something else besides “difference

between squared distances”

t-SNE: difference between neighbor ordering

Why not distances?

Page 20: Spatial Data: Dimensionality Reduction · Spatial Data: Dimensionality Reduction CS444 Techniques, Lecture 3. In this subfield, we think of a data point as a vector in R^n (what

The curse of Dimensionality

• High dimensional space looks nothing like low-dimensional space

• Most distances become meaningless