
MLCC 2018: Dimensionality Reduction and PCA

Lorenzo Rosasco, UNIGE-MIT-IIT


Outline

PCA & Reconstruction

PCA and Maximum Variance

PCA and Associated Eigenproblem

Beyond the First Principal Component

PCA and Singular Value Decomposition

Kernel PCA


Dimensionality Reduction

In many practical applications it is of interest to reduce the dimensionality of the data:

- data visualization

- data exploration: for investigating the "effective" dimensionality of the data


Dimensionality Reduction (cont.)

This problem of dimensionality reduction can be seen as the problem of defining a map

M : X = R^D → R^k,   k ≪ D,

according to some suitable criterion.

In the following, data reconstruction will be our guiding principle.
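For instance, a linear map M is represented by a k × D matrix applied to each point. A minimal NumPy sketch (the matrix W below is an arbitrary placeholder, not yet the PCA solution):

```python
import numpy as np

D, k = 10, 2                    # ambient and reduced dimensions
x = np.random.randn(D)          # a data point in R^D

# any k x D matrix defines a linear dimensionality reduction map
W = np.random.randn(k, D)
z = W @ x                       # M(x) in R^k
print(z.shape)                  # (2,)
```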


Principal Component Analysis

PCA is arguably the most popular dimensionality reduction procedure.

It is a data-driven procedure that, given an unsupervised sample

S = (x1, . . . , xn),

derives a dimensionality reduction defined by a linear map M.

PCA can be derived from several perspectives; here we give a geometric derivation.


Dimensionality Reduction by Reconstruction

Recall that, if w ∈ R^D, ‖w‖ = 1,

then (w^T x) w is the orthogonal projection of x on w.


Dimensionality Reduction by Reconstruction (cont.)

First, consider k = 1. The associated reconstruction error is

‖x − (w^T x) w‖^2

(that is, how much we lose by projecting x along the direction w)

Problem: find the direction p allowing the best reconstruction of the training set.


Dimensionality Reduction by Reconstruction (cont.)

Let S^{D−1} = {w ∈ R^D | ‖w‖ = 1} be the unit sphere in D dimensions. Consider the empirical reconstruction minimization problem

min_{w ∈ S^{D−1}}  (1/n) ∑_{i=1}^n ‖x_i − (w^T x_i) w‖^2.

The solution p of the above problem is called the first principal component of the data.
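A minimal NumPy sketch of this empirical objective, assuming the data are stored row-wise in an n × D array (names are illustrative):

```python
import numpy as np

def reconstruction_error(X, w):
    """Average squared distance between each x_i and its projection (w^T x_i) w."""
    w = w / np.linalg.norm(w)           # enforce ||w|| = 1
    proj = np.outer(X @ w, w)           # rows are (w^T x_i) w
    return np.mean(np.sum((X - proj) ** 2, axis=1))

X = np.random.randn(100, 5)             # toy data, n = 100, D = 5
w = np.random.randn(5)                  # a candidate direction
print(reconstruction_error(X, w))
```

The first principal component is the unit vector minimizing this quantity over the sphere.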


An Equivalent Formulation

A direct computation shows that ‖x_i − (w^T x_i) w‖^2 = ‖x_i‖^2 − (w^T x_i)^2.

Then, the problem

min_{w ∈ S^{D−1}}  (1/n) ∑_{i=1}^n ‖x_i − (w^T x_i) w‖^2

is equivalent to

max_{w ∈ S^{D−1}}  (1/n) ∑_{i=1}^n (w^T x_i)^2
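The identity behind this equivalence is easy to verify numerically, e.g. with NumPy:

```python
import numpy as np

x = np.random.randn(5)
w = np.random.randn(5)
w /= np.linalg.norm(w)                   # ||w|| = 1

lhs = np.sum((x - (w @ x) * w) ** 2)     # ||x - (w^T x) w||^2
rhs = np.sum(x ** 2) - (w @ x) ** 2      # ||x||^2 - (w^T x)^2
print(np.isclose(lhs, rhs))              # True
```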


Reconstruction and Variance

Assume the data to be centered, x̄ = (1/n) ∑_{i=1}^n x_i = 0; then we can interpret the term

(w^T x)^2

as the variance of x in the direction w.

The first PC can be seen as the direction along which the data have maximum variance:

max_{w ∈ S^{D−1}}  (1/n) ∑_{i=1}^n (w^T x_i)^2


Centering

If the data are not centered, we should consider

max_{w ∈ S^{D−1}}  (1/n) ∑_{i=1}^n (w^T (x_i − x̄))^2     (1)

equivalent to

max_{w ∈ S^{D−1}}  (1/n) ∑_{i=1}^n (w^T x_i^c)^2

with x^c = x − x̄.
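In practice, centering simply amounts to subtracting the empirical mean from every point before measuring the variance along w; a short NumPy sketch (data stored row-wise):

```python
import numpy as np

X = np.random.randn(100, 5) + 3.0        # toy data with a nonzero mean
Xc = X - X.mean(axis=0)                  # centered data, x_i^c = x_i - x_bar

w = np.random.randn(5)
w /= np.linalg.norm(w)
variance_along_w = np.mean((Xc @ w) ** 2)
print(variance_along_w)
```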


Centering and Reconstruction

If we consider the effect of centering on reconstruction, it is easy to see that we get

min_{w ∈ S^{D−1}, b ∈ R^D}  (1/n) ∑_{i=1}^n ‖x_i − ((w^T (x_i − b)) w + b)‖^2

where (w^T (x_i − b)) w + b is an affine (rather than an orthogonal) projection.


PCA as an Eigenproblem

A further manipulation shows that PCA corresponds to an eigenvalue problem.

Using the symmetry of the inner product,

(1/n) ∑_{i=1}^n (w^T x_i)^2 = (1/n) ∑_{i=1}^n w^T x_i w^T x_i = (1/n) ∑_{i=1}^n w^T x_i x_i^T w = w^T ( (1/n) ∑_{i=1}^n x_i x_i^T ) w.

Then, we can consider the problem

max_{w ∈ S^{D−1}}  w^T C_n w,    C_n = (1/n) ∑_{i=1}^n x_i x_i^T


PCA as an Eigenproblem (cont.)

We make two observations:

- The ("covariance") matrix C_n = (1/n) X_n^T X_n = (1/n) ∑_{i=1}^n x_i x_i^T is symmetric and positive semi-definite.

- The objective function of PCA can be written as

  (w^T C_n w) / (w^T w),

  the so-called Rayleigh quotient.

Note that, if C_n u = λu, then (u^T C_n u) / (u^T u) = λ, since u is normalized.

Indeed, it is possible to show that the Rayleigh quotient achieves its maximum at an eigenvector corresponding to the maximum eigenvalue of C_n.
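Both observations are easy to check numerically; a small NumPy sketch on toy (centered) data:

```python
import numpy as np

X = np.random.randn(100, 5)
X = X - X.mean(axis=0)
Cn = (X.T @ X) / X.shape[0]              # "covariance" matrix C_n

evals, evecs = np.linalg.eigh(Cn)        # eigenvalues in ascending order
u, lam = evecs[:, -1], evals[-1]         # top eigenvector / eigenvalue

print(np.isclose(u @ Cn @ u / (u @ u), lam))   # Rayleigh quotient at u equals lambda

w = np.random.randn(5)
w /= np.linalg.norm(w)
print(w @ Cn @ w <= lam + 1e-12)         # no unit vector does better
```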


PCA as an Eigenproblem (cont.)

Computing the first principal component of the data reduces to computing the largest eigenvalue of the covariance matrix and the corresponding eigenvector:

C_n u = λu,    C_n = (1/n) ∑_{i=1}^n x_i x_i^T = (1/n) X_n^T X_n
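A minimal NumPy sketch of this computation (toy data; variable names are illustrative):

```python
import numpy as np

X = np.random.randn(200, 10)
X = X - X.mean(axis=0)                   # center the data

Cn = (X.T @ X) / X.shape[0]              # Cn = (1/n) X_n^T X_n
evals, evecs = np.linalg.eigh(Cn)        # eigenvalues in ascending order
p = evecs[:, -1]                         # first principal component
projections = X @ p                      # one-dimensional representation of the data
```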


Beyond the First Principal Component

We discuss how to consider more than one principal component (k > 1):

M : X = R^D → R^k,   k ≪ D.

The idea is simply to iterate the previous reasoning.


Residual Reconstruction

The idea is to consider the one-dimensional projection that can best reconstruct the residuals

r_i = x_i − (p^T x_i) p.

An associated minimization problem is given by

min_{w ∈ S^{D−1}, w ⊥ p}  (1/n) ∑_{i=1}^n ‖r_i − (w^T r_i) w‖^2.

(note: the constraint w ⊥ p)


Residual Reconstruction (cont.)

Note that for all i = 1, . . . , n,

‖r_i − (w^T r_i) w‖^2 = ‖r_i‖^2 − (w^T r_i)^2 = ‖r_i‖^2 − (w^T x_i)^2,

since w ⊥ p.

Then, we can consider the following equivalent problem

max_{w ∈ S^{D−1}, w ⊥ p}  (1/n) ∑_{i=1}^n (w^T x_i)^2 = w^T C_n w.


PCA as an Eigenproblem

max_{w ∈ S^{D−1}, w ⊥ p}  (1/n) ∑_{i=1}^n (w^T x_i)^2 = w^T C_n w.

Again, we have to maximize the Rayleigh quotient of the covariance matrix, with the extra constraint w ⊥ p.

Similarly to before, it can be proved that the solution of the above problem is given by the second eigenvector of C_n, with the attained value given by the corresponding eigenvalue.
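One way to see this concretely is to compute the residuals explicitly and compare the best direction for the residuals with the second eigenvector of C_n; a NumPy sketch under these assumptions:

```python
import numpy as np

X = np.random.randn(200, 10)
X = X - X.mean(axis=0)

Cn = (X.T @ X) / X.shape[0]
evals, evecs = np.linalg.eigh(Cn)
p = evecs[:, -1]                          # first principal component

R = X - np.outer(X @ p, p)                # residuals r_i = x_i - (p^T x_i) p
Cr = (R.T @ R) / R.shape[0]
w = np.linalg.eigh(Cr)[1][:, -1]          # best direction for the residuals

p2 = evecs[:, -2]                         # second eigenvector of Cn
print(np.isclose(abs(w @ p2), 1.0))       # same direction, up to sign
```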


PCA as an Eigenproblem (cont.)

C_n u = λu,    C_n = (1/n) ∑_{i=1}^n x_i x_i^T

The reasoning generalizes to more than two components: computing k principal components reduces to finding the k largest eigenvalues, and corresponding eigenvectors, of C_n.
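A sketch of the resulting k-dimensional map, taking the top k eigenvectors of C_n as the columns of a D × k matrix (a minimal NumPy version, with illustrative names):

```python
import numpy as np

def pca_map(X, k):
    """Top-k principal components and the reduced representation of X."""
    Xc = X - X.mean(axis=0)
    Cn = (Xc.T @ Xc) / Xc.shape[0]
    evals, evecs = np.linalg.eigh(Cn)          # ascending order
    P = evecs[:, ::-1][:, :k]                  # D x k matrix of top-k eigenvectors
    return P, Xc @ P                           # M(x_i) = P^T (x_i - x_bar)

P, Z = pca_map(np.random.randn(300, 20), k=3)
print(P.shape, Z.shape)                        # (20, 3) (300, 3)
```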


Remarks

- Computational complexity: roughly O(kD^2) (the complexity of forming C_n is O(nD^2)). If we have n points in D dimensions and n ≪ D, can we compute PCA in less than O(nD^2)?

- The dimensionality reduction induced by PCA is a linear projection. Can we generalize PCA to nonlinear dimensionality reduction?


Singular Value Decomposition

Consider the data matrix X_n; its singular value decomposition is given by

X_n = U Σ V^T

where:

- U is an n × k orthogonal matrix,

- V is a D × k orthogonal matrix,

- Σ is a diagonal matrix such that Σ_{i,i} = √λ_i, i = 1, . . . , k, and k ≤ min{n, D}.

The columns of U and the columns of V are the left and right singular vectors, and the diagonal entries of Σ the singular values.
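The decomposition can be computed, for instance, with NumPy's linalg.svd; a sketch in its "thin" form, so the shapes match the description above:

```python
import numpy as np

n, D = 100, 20
Xn = np.random.randn(n, D)

U, s, Vt = np.linalg.svd(Xn, full_matrices=False)   # thin SVD
# U: n x k, s: the k singular values, Vt: k x D, with k = min(n, D)
print(U.shape, s.shape, Vt.shape)                    # (100, 20) (20,) (20, 20)
print(np.allclose(Xn, U @ np.diag(s) @ Vt))          # True: Xn = U Sigma V^T
```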


Singular Value Decomposition (cont.)

The SVD can be equivalently described by the equations

C_n p_j = λ_j p_j,    (1/n) K_n u_j = λ_j u_j,

X_n p_j = √λ_j u_j,    (1/n) X_n^T u_j = √λ_j p_j,

for j = 1, . . . , d, where C_n = (1/n) X_n^T X_n and K_n = X_n X_n^T.


PCA and Singular Value Decomposition

If n ≪ D we can consider the following procedure:

- form the matrix K_n, which is O(Dn^2);

- find the first k eigenvectors of K_n, which is O(kn^2);

- compute the principal components using

  p_j = (1/√λ_j) X_n^T u_j = (1/√λ_j) ∑_{i=1}^n x_i u_j^i,    j = 1, . . . , k,

  where u_j = (u_j^1, . . . , u_j^n). This is O(knD) if we consider k principal components.
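A NumPy sketch of this route, which only forms and decomposes the n × n matrix K_n; the principal components are renormalized to unit length explicitly, since the eigenvectors returned by the solver are unit-norm:

```python
import numpy as np

n, D, k = 50, 500, 3                         # few points in many dimensions (n << D)
X = np.random.randn(n, D)
X = X - X.mean(axis=0)

Kn = X @ X.T                                 # n x n matrix, O(D n^2)
evals, evecs = np.linalg.eigh(Kn / n)        # (1/n) K_n u_j = lambda_j u_j
U = evecs[:, ::-1][:, :k]                    # top-k eigenvectors u_j

P = X.T @ U                                  # p_j proportional to X_n^T u_j, O(k n D)
P = P / np.linalg.norm(P, axis=0)            # renormalize to unit length

# sanity check against the direct D x D eigendecomposition of C_n
Cn = (X.T @ X) / n
p1 = np.linalg.eigh(Cn)[1][:, -1]
print(np.isclose(abs(p1 @ P[:, 0]), 1.0))    # same first direction, up to sign
```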


Beyond Linear Dimensionality Reduction?

By considering PCA we are implicitly assuming the data to lie on a linear subspace...

...but it is easy to think of situations where this assumption might be violated.

Can we use kernels to obtain a nonlinear generalization of PCA?


From SVD to KPCA

Using the SVD, the projection of a point x onto a principal component p_j, for j = 1, . . . , d, is

(M(x))_j = x^T p_j = (1/√λ_j) x^T X_n^T u_j = (1/√λ_j) ∑_{i=1}^n x^T x_i u_j^i.

Recall

C_n p_j = λ_j p_j,    (1/n) K_n u_j = λ_j u_j,

X_n p_j = √λ_j u_j,    (1/n) X_n^T u_j = √λ_j p_j.


PCA and Feature Maps

(M(x))_j = (1/√λ_j) ∑_{i=1}^n x^T x_i u_j^i

What if we consider a nonlinear feature map Φ : X → F before performing PCA?

(M(x))_j = Φ(x)^T p_j = (1/√λ_j) ∑_{i=1}^n Φ(x)^T Φ(x_i) u_j^i,

where K_n u_j = σ_j u_j and (K_n)_{i,j} = Φ(x_i)^T Φ(x_j).


Kernel PCA

(M(x))_j = Φ(x)^T p_j = (1/√λ_j) ∑_{i=1}^n Φ(x)^T Φ(x_i) u_j^i.

If the feature map is defined by a positive definite kernel K : X × X → R, then

(M(x))_j = (1/√λ_j) ∑_{i=1}^n K(x, x_i) u_j^i,

where K_n u_j = σ_j u_j and (K_n)_{i,j} = K(x_i, x_j).
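A minimal kernel PCA sketch with a Gaussian kernel (the kernel choice and bandwidth are illustrative, the eigenvalues of K_n are used for the 1/√λ normalization, and centering in feature space is omitted, as on the slide):

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """K(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for all pairs of rows of A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * sigma**2))

X = np.random.randn(100, 5)                     # training data
Kn = gaussian_kernel(X, X)                      # (K_n)_{i,j} = K(x_i, x_j)

lam, U = np.linalg.eigh(Kn)                     # K_n u_j = lambda_j u_j
lam, U = lam[::-1], U[:, ::-1]                  # sort in decreasing order

k = 2
x_new = np.random.randn(1, 5)                   # a new point to embed
kx = gaussian_kernel(x_new, X)                  # the row (K(x, x_1), ..., K(x, x_n))
M_x = (kx @ U[:, :k]) / np.sqrt(lam[:k])        # (M(x))_j = (1/sqrt(lambda_j)) sum_i K(x, x_i) u_j^i
print(M_x)                                      # k-dimensional nonlinear representation of x
```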


Wrapping Up

In this class we introduced PCA as a basic tool for dimensionality reduction. We discussed computational aspects and the extension to nonlinear dimensionality reduction (KPCA).


Next Class

In the next class, beyond dimensionality reduction, we ask how we can devise interpretable data models, and discuss a class of methods based on the concept of sparsity.
