CS7015 (Deep Learning) : Lecture 6
Eigen Values, Eigen Vectors, Eigen Value Decomposition, Principal Component Analysis, Singular Value Decomposition

Prof. Mitesh M. Khapra
Department of Computer Science and Engineering
Indian Institute of Technology Madras


Module 6.1 : Eigenvalues and Eigenvectors

(Figure: the vectors x and Ax plotted in the x-y plane.)

$x = \begin{bmatrix} 1 \\ 3 \end{bmatrix}, \quad A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}, \quad Ax = \begin{bmatrix} 7 \\ 5 \end{bmatrix}$

What happens when a matrix hits a vector?

The vector gets transformed into a new vector (it strays from its path).

The vector may also get scaled (elongated or shortened) in the process.


(Figure: x and Ax = 3x lie along the same direction in the x-y plane.)

$x = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}, \quad Ax = \begin{bmatrix} 3 \\ 3 \end{bmatrix} = 3 \begin{bmatrix} 1 \\ 1 \end{bmatrix}$

For a given square matrix A, there exist special vectors which refuse to stray from their path.

These vectors are called eigenvectors.

More formally,

$Ax = \lambda x$   [direction remains the same]

The vector will only get scaled but will not change its direction.

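As a quick sanity check of the example above, here is a minimal NumPy sketch (my addition, not part of the original slides) that applies A to a generic vector and to the eigenvector [1, 1], and then asks NumPy for all eigenvalues/eigenvectors of A.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

x_generic = np.array([1.0, 3.0])   # a generic vector: it changes direction
x_eig = np.array([1.0, 1.0])       # an eigenvector: it only gets scaled

print(A @ x_generic)               # [7. 5.]  -> not a multiple of [1, 3]
print(A @ x_eig)                   # [3. 3.]  -> exactly 3 * [1, 1], so lambda = 3

# NumPy recovers the same structure: eigenvalues 3 and -1 (order not guaranteed)
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)
print(eigvecs)                     # columns are (normalized) eigenvectors
```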


So what is so special about eigenvectors? Why are they always in the limelight?

It turns out that several properties of matrices can be analyzed based on their eigenvalues (for example, see spectral graph theory).

We will now see two cases where eigenvalues/eigenvectors will help us in this course.


Let us assume that on day 0, k1 students eat Chinese food, and k2 students eat Mexican food. (Of course, no one eats in the mess!)

On each subsequent day i, a fraction p of the students who ate Chinese food on day (i - 1) continue to eat Chinese food on day i, and (1 - p) shift to Mexican food.

Similarly, a fraction q of the students who ate Mexican food on day (i - 1) continue to eat Mexican food on day i, and (1 - q) shift to Chinese food.

$v^{(0)} = \begin{bmatrix} k_1 \\ k_2 \end{bmatrix}$

$v^{(1)} = \begin{bmatrix} pk_1 + (1-q)k_2 \\ (1-p)k_1 + qk_2 \end{bmatrix} = \begin{bmatrix} p & 1-q \\ 1-p & q \end{bmatrix} \begin{bmatrix} k_1 \\ k_2 \end{bmatrix}$

$v^{(1)} = Mv^{(0)}$

$v^{(2)} = Mv^{(1)} = M^2 v^{(0)}$

In general, $v^{(n)} = M^n v^{(0)}$

The number of customers in the two restaurants is thus given by the following series:

$v^{(0)}, Mv^{(0)}, M^2v^{(0)}, M^3v^{(0)}, \ldots$

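To make the recurrence v(n) = M^n v(0) concrete, here is a small NumPy sketch (mine, with illustrative values p = 0.8, q = 0.9 and k1 = k2 = 100 that are not from the slides) that builds M and iterates the update for a few days.

```python
import numpy as np

# Illustrative values (not from the slides): retention probabilities and day-0 counts
p, q = 0.8, 0.9
k1, k2 = 100.0, 100.0

M = np.array([[p,     1 - q],
              [1 - p, q    ]])      # each column sums to 1
v = np.array([k1, k2])              # v(0)

for day in range(1, 11):
    v = M @ v                       # v(n) = M v(n-1) = M^n v(0)
    print(day, v)

# The counts stop changing after a few days, i.e. the system reaches a
# steady state -- which is what the eigenvalue argument below explains.
```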


(Figure: state-transition diagram between the two restaurants, with self-loop probabilities p and q and cross-over probabilities 1 - p and 1 - q.)

This is a problem for the two restaurant owners.

The number of patrons is changing constantly.

Or is it? Will the system eventually reach a steady state? (i.e., will the number of customers in the two restaurants become constant over time?)

Turns out they will!

Let's see how.



Definition
Let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the eigenvalues of an n × n matrix A. $\lambda_1$ is called the dominant eigenvalue of A if

$|\lambda_1| \geq |\lambda_i|, \quad i = 2, \ldots, n$

Definition
A matrix M is called a stochastic matrix if all the entries are positive and the sum of the elements in each column is equal to 1. (Note that the matrix in our example is a stochastic matrix.)

Theorem
The largest (dominant) eigenvalue of a stochastic matrix is 1. (See proof here.)

Theorem
If A is an n × n square matrix with a dominant eigenvalue, then the sequence of vectors given by $Av_0, A^2v_0, \ldots, A^nv_0, \ldots$ approaches a multiple of the dominant eigenvector of A. (The theorem is slightly misstated here for ease of explanation.)
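A quick check of the two statements about stochastic matrices, using the same illustrative M as before (p = 0.8, q = 0.9; my values, not the slides'): the dominant eigenvalue is 1, and the corresponding eigenvector, rescaled to the total population, is the steady state the earlier iteration approaches.

```python
import numpy as np

p, q = 0.8, 0.9                      # same illustrative values as before
M = np.array([[p,     1 - q],
              [1 - p, q    ]])

eigvals, eigvecs = np.linalg.eig(M)
print(eigvals)                       # contains 1.0 (dominant) and p + q - 1 = 0.7

# The eigenvector for eigenvalue 1, rescaled to preserve the total population,
# is the steady state that the iteration above converges to.
i = np.argmax(np.abs(eigvals))
e_d = np.real(eigvecs[:, i])
steady = e_d / e_d.sum() * 200.0     # 200 = k1 + k2 from the earlier sketch
print(steady)                        # approx. [66.67, 133.33]
```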

Let $e_d$ be the dominant eigenvector of M and $\lambda_d = 1$ the corresponding dominant eigenvalue.

Given the previous definitions and theorems, what can you say about the sequence $Mv^{(0)}, M^2v^{(0)}, M^3v^{(0)}, \ldots$ ?

There exists an n such that

$v^{(n)} = M^n v^{(0)} = k e_d$   (some multiple of $e_d$)

Now what happens at time step (n + 1)?

$v^{(n+1)} = Mv^{(n)} = M(ke_d) = k(Me_d) = k(\lambda_d e_d) = ke_d$

The population in the two restaurants becomes constant after time step n. (See proof here.)


Now, instead of a stochastic matrix, let us consider any square matrix A.

Let p be the time step at which the sequence $x_0, Ax_0, A^2x_0, \ldots$ approaches a multiple of $e_d$ (the dominant eigenvector of A):

$A^p x_0 = k e_d$

$A^{p+1} x_0 = A(A^p x_0) = kAe_d = k\lambda_d e_d$

$A^{p+2} x_0 = A(A^{p+1} x_0) = k\lambda_d Ae_d = k\lambda_d^2 e_d$

$A^{p+n} x_0 = k\lambda_d^n e_d$

In general, if $\lambda_d$ is the dominant eigenvalue of a matrix A, what would happen to the sequence $x_0, Ax_0, A^2x_0, \ldots$ if

$|\lambda_d| > 1$ (will explode)

$|\lambda_d| < 1$ (will vanish)

$|\lambda_d| = 1$ (will reach a steady state)

(We will use this in the course at some point.)

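A small sketch (mine, with made-up matrices except for the [[1, 2], [2, 1]] example from the slides) illustrating the three regimes: the norm of A^n x0 explodes when |λd| > 1, vanishes when |λd| < 1, and settles when |λd| = 1.

```python
import numpy as np

def iterate_norms(A, x0, steps=10):
    """Return ||A^n x0|| for n = 1..steps via repeated multiplication."""
    x, norms = x0.astype(float), []
    for _ in range(steps):
        x = A @ x
        norms.append(np.linalg.norm(x))
    return norms

x0 = np.array([1.0, 1.0])
A_explode = np.array([[1.0, 2.0], [2.0, 1.0]])   # dominant eigenvalue 3   -> |lambda_d| > 1
A_vanish  = A_explode / 10.0                      # dominant eigenvalue 0.3 -> |lambda_d| < 1
A_steady  = np.array([[0.8, 0.1], [0.2, 0.9]])    # stochastic, dominant eigenvalue 1

for name, A in [("explode", A_explode), ("vanish", A_vanish), ("steady", A_steady)]:
    print(name, iterate_norms(A, x0)[-3:])        # last few norms show the trend
```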

Module 6.2 : Linear Algebra - Basic Definitions

We will see some more examples where eigenvectors are important, but before that let's revisit some basic definitions from linear algebra.


Basis
A set of vectors ∈ R^n is called a basis if they are linearly independent and every vector ∈ R^n can be expressed as a linear combination of these vectors.

Linearly independent vectors
A set of n vectors v1, v2, . . . , vn is linearly independent if no vector in the set can be expressed as a linear combination of the remaining n − 1 vectors. In other words, the only solution to

$c_1v_1 + c_2v_2 + \cdots + c_nv_n = 0$  is  $c_1 = c_2 = \cdots = c_n = 0$   (the ci's are scalars)

(Figure: the unit vectors x = (1, 0) and y = (0, 1) along the co-ordinate axes.)

For example, consider the space R^2, and the vectors

$x = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $y = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$

Any vector $\begin{bmatrix} a \\ b \end{bmatrix} \in \mathbb{R}^2$ can be expressed as a linear combination of these two vectors, i.e.,

$\begin{bmatrix} a \\ b \end{bmatrix} = a \begin{bmatrix} 1 \\ 0 \end{bmatrix} + b \begin{bmatrix} 0 \\ 1 \end{bmatrix}$

Further, x and y are linearly independent (the only solution to c1 x + c2 y = 0 is c1 = c2 = 0).


In fact, it turns out that x and y are unit vectors in the direction of the co-ordinate axes.

And indeed we are used to representing all vectors in R^2 as a linear combination of these two vectors.

But there is nothing sacrosanct about the particular choice of x and y.

We could have chosen any 2 linearly independent vectors in R^2 as the basis vectors.

For example, consider the linearly independent vectors [2, 3]^T and [5, 7]^T. See how any vector [a, b]^T ∈ R^2 can be expressed as a linear combination of these two vectors:

$\begin{bmatrix} a \\ b \end{bmatrix} = x_1 \begin{bmatrix} 2 \\ 3 \end{bmatrix} + x_2 \begin{bmatrix} 5 \\ 7 \end{bmatrix}$

$a = 2x_1 + 5x_2$
$b = 3x_1 + 7x_2$

We can find x1 and x2 by solving this system of linear equations.

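As a concrete check (my sketch, using illustrative values a = 4, b = 5 that are not from the slides), the coefficients x1, x2 for the basis {[2, 3]^T, [5, 7]^T} come out of a 2×2 linear solve:

```python
import numpy as np

# Columns of U are the chosen basis vectors [2, 3]^T and [5, 7]^T
U = np.array([[2.0, 5.0],
              [3.0, 7.0]])
target = np.array([4.0, 5.0])     # an arbitrary [a, b]^T (my choice, not the slides')

x = np.linalg.solve(U, target)    # solves a = 2*x1 + 5*x2 and b = 3*x1 + 7*x2
print(x)                          # the coefficients x1, x2
print(U @ x)                      # reconstructs [a, b] as x1*[2,3] + x2*[5,7]
```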

(Figure: a vector z = [z1, z2]^T expressed in terms of two basis vectors u1 and u2.)

In general, given a set of linearly independent vectors u1, u2, . . . , un ∈ R^n, we can express any vector z ∈ R^n as a linear combination of these vectors:

$z = \alpha_1 u_1 + \alpha_2 u_2 + \cdots + \alpha_n u_n$

$\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{bmatrix} = \alpha_1 \begin{bmatrix} u_{11} \\ u_{12} \\ \vdots \\ u_{1n} \end{bmatrix} + \alpha_2 \begin{bmatrix} u_{21} \\ u_{22} \\ \vdots \\ u_{2n} \end{bmatrix} + \cdots + \alpha_n \begin{bmatrix} u_{n1} \\ u_{n2} \\ \vdots \\ u_{nn} \end{bmatrix}$

$\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{bmatrix} = \begin{bmatrix} u_{11} & u_{21} & \ldots & u_{n1} \\ u_{12} & u_{22} & \ldots & u_{n2} \\ \vdots & \vdots & & \vdots \\ u_{1n} & u_{2n} & \ldots & u_{nn} \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{bmatrix}$

(Basically rewriting in matrix form.)

We can now find the αi's using Gaussian Elimination (time complexity: O(n^3)).

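A short sketch (mine) of the general case: stack the basis vectors as columns of a matrix U and solve U α = z. np.linalg.solve uses an LU factorization, i.e. Gaussian elimination, which is the O(n^3) step mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
U = rng.standard_normal((n, n))   # columns u1..un; random columns are linearly
                                  # independent with probability 1
z = rng.standard_normal(n)

alpha = np.linalg.solve(U, z)     # LU / Gaussian elimination, O(n^3)
print(np.allclose(U @ alpha, z))  # True: z = alpha_1*u1 + ... + alpha_n*un
```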

(Figure: z = [a, b]^T decomposed along an orthonormal basis u1, u2, with θ the angle between z and u1 and α1, α2 the projections onto u1 and u2.)

Now let us see what happens if we have an orthonormal basis:

$u_i^T u_j = 0 \;\; \forall\, i \neq j \quad \text{and} \quad u_i^T u_i = \|u_i\|^2 = 1$

Again we have:

$z = \alpha_1 u_1 + \alpha_2 u_2 + \cdots + \alpha_n u_n$

$u_1^T z = \alpha_1 u_1^T u_1 + \cdots + \alpha_n u_1^T u_n = \alpha_1$

We can directly find each αi using a dot product between z and ui (time complexity O(n)).

The total complexity will be O(n^2).

Geometrically, $\alpha_1 = |\vec{z}|\cos\theta = |\vec{z}|\,\frac{z^T u_1}{|\vec{z}|\,|u_1|} = z^T u_1$, and similarly $\alpha_2 = z^T u_2$.

When u1 and u2 are unit vectors along the co-ordinate axes,

$z = \begin{bmatrix} a \\ b \end{bmatrix} = a \begin{bmatrix} 1 \\ 0 \end{bmatrix} + b \begin{bmatrix} 0 \\ 1 \end{bmatrix}$


The total complexity will be O(N2)

Prof. Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 6

Page 87: CS7015 (Deep Learning) : Lecture 61/71 CS7015 (Deep Learning) : Lecture 6 Eigen Values, Eigen Vectors, Eigen Value Decomposition, Principal Component Analysis, Singular Value Decomposition
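A minimal NumPy sketch of this shortcut (the orthonormal basis below is an arbitrary rotation, chosen only for illustration): with an orthonormal basis each α_i is a single dot product, and all n of them together are one matrix-vector product U^T z, i.e. O(n^2) work instead of O(n^3).

```python
import numpy as np

theta = np.pi / 6  # an arbitrary angle, just for illustration
# Columns are an orthonormal basis u_1, u_2 (a rotation of the coordinate axes).
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
z = np.array([2.0, 1.0])

# alpha_i = u_i^T z; stacking all the dot products is just U^T z.
alpha = U.T @ z

print(alpha)      # coefficients of z in the new basis
print(U @ alpha)  # reconstructs z, since U U^T = I for a square orthonormal basis
```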

18/71

Remember

An orthogonal basis is the most convenient basis that one can hope for.


19/71

But what does any of this have to do with eigenvectors?

It turns out that the eigenvectors can form a basis.

In fact, the eigenvectors of a square symmetric matrix are even more special (they are orthogonal, as Theorem 2 below states). Thus they form a very convenient basis.

Why would we want to use the eigenvectors as a basis instead of the more natural co-ordinate axes? We will answer this question soon.

Theorem 1
The eigenvectors of a matrix A ∈ R^{n×n} having distinct eigenvalues are linearly independent.
Proof: See here

Theorem 2
The eigenvectors of a square symmetric matrix are orthogonal.
Proof: See here

20/71

Module 6.3 : Eigenvalue Decomposition


21/71

Before proceeding, let's do a quick recap of eigenvalue decomposition.


22/71

Let u_1, u_2, . . . , u_n be the eigenvectors of a matrix A and let λ_1, λ_2, . . . , λ_n be the corresponding eigenvalues.

Consider a matrix U whose columns are u_1, u_2, . . . , u_n. Now

AU = A \begin{bmatrix} \uparrow & \uparrow & & \uparrow \\ u_1 & u_2 & \dots & u_n \\ \downarrow & \downarrow & & \downarrow \end{bmatrix} = \begin{bmatrix} \uparrow & \uparrow & & \uparrow \\ Au_1 & Au_2 & \dots & Au_n \\ \downarrow & \downarrow & & \downarrow \end{bmatrix} = \begin{bmatrix} \uparrow & \uparrow & & \uparrow \\ λ_1 u_1 & λ_2 u_2 & \dots & λ_n u_n \\ \downarrow & \downarrow & & \downarrow \end{bmatrix}

= \begin{bmatrix} \uparrow & \uparrow & & \uparrow \\ u_1 & u_2 & \dots & u_n \\ \downarrow & \downarrow & & \downarrow \end{bmatrix} \begin{bmatrix} λ_1 & 0 & \dots & 0 \\ 0 & λ_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \dots & 0 & λ_n \end{bmatrix} = UΛ

where Λ is a diagonal matrix whose diagonal elements are the eigenvalues of A.

23/71

AU = UΛ

If U^{-1} exists, then we can write

A = U Λ U^{-1} [eigenvalue decomposition]

U^{-1} A U = Λ [diagonalization of A]

Under what conditions would U^{-1} exist?

If the columns of U are linearly independent [See proof here]
i.e. if A has n linearly independent eigenvectors
i.e. if A has n distinct eigenvalues [sufficient condition, proof: Slide 19, Theorem 1]
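A small NumPy check of these identities (the matrix A below is an arbitrary example with distinct eigenvalues, made up only for illustration):

```python
import numpy as np

# An arbitrary (non-symmetric) matrix with distinct eigenvalues 2 and 3.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

eigvals, U = np.linalg.eig(A)   # columns of U are the eigenvectors
Lam = np.diag(eigvals)          # Lambda: diagonal matrix of eigenvalues

print(np.allclose(A @ U, U @ Lam))                    # AU = U Lambda
print(np.allclose(A, U @ Lam @ np.linalg.inv(U)))     # A = U Lambda U^{-1}
print(np.allclose(np.linalg.inv(U) @ A @ U, Lam))     # U^{-1} A U = Lambda
```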

24/71

If A is symmetric, then the situation is even more convenient.

The eigenvectors are orthogonal [proof: Slide 19, Theorem 2].

Further, let's assume that the eigenvectors have been normalized [u_i^T u_i = 1].

Q = U^T U = \begin{bmatrix} \leftarrow & u_1 & \rightarrow \\ \leftarrow & u_2 & \rightarrow \\ & \vdots & \\ \leftarrow & u_n & \rightarrow \end{bmatrix} \begin{bmatrix} \uparrow & \uparrow & & \uparrow \\ u_1 & u_2 & \dots & u_n \\ \downarrow & \downarrow & & \downarrow \end{bmatrix}

Each cell of this matrix is given by Q_{ij} = u_i^T u_j:

Q_{ij} = u_i^T u_j = 0 if i ≠ j
       = 1 if i = j

∴ U^T U = I (the identity matrix)

So U^T is the inverse of U (very convenient to calculate).
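A minimal NumPy sketch of the symmetric case (the symmetric matrix below is made up for illustration): np.linalg.eigh returns normalized, mutually orthogonal eigenvectors, so U^T U = I and U^T can be used in place of U^{-1}.

```python
import numpy as np

# A made-up symmetric matrix, only for illustration.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

eigvals, U = np.linalg.eigh(A)   # eigh: for symmetric (Hermitian) matrices
Lam = np.diag(eigvals)

print(np.allclose(U.T @ U, np.eye(3)))   # U^T U = I (orthonormal eigenvectors)
print(np.allclose(A, U @ Lam @ U.T))     # A = U Lambda U^T, no explicit inverse needed
```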

25/71

Something to think about

Given the EVD A = UΣU^T, what can you say about the sequence x_0, Ax_0, A^2 x_0, . . . in terms of the eigenvalues of A?

(Hint: You should arrive at the same conclusion we saw earlier.)
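If you want to explore this question numerically before working it out, here is a tiny sketch (the matrix and the starting vector are made up; this experiment is not part of the lecture) that prints the normalized iterates A^k x_0 so you can see which direction they settle into:

```python
import numpy as np

# Made-up symmetric matrix and starting vector, purely to experiment with.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
x = np.array([1.0, 1.0])

for k in range(10):
    x = A @ x
    x = x / np.linalg.norm(x)   # normalize so the numbers stay readable
    print(k + 1, x)

# Compare the final direction with the eigenvectors of A:
eigvals, U = np.linalg.eigh(A)
print("eigenvalues:", eigvals)
print("eigenvectors (columns):")
print(U)
```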

26/71

Theorem (one more important property of eigenvectors)

If A is a square symmetric N × N matrix, then the solution to the following optimization problem is given by the eigenvector corresponding to the largest eigenvalue of A:

max_x x^T A x  s.t. ‖x‖ = 1

and the solution to

min_x x^T A x  s.t. ‖x‖ = 1

is given by the eigenvector corresponding to the smallest eigenvalue of A.

Proof: Next slide.

27/71

This is a constrained optimization problem that can be solved using Lagrange multipliers:

L = x^T A x − λ (x^T x − 1)

\frac{∂L}{∂x} = 2Ax − 2λx = 0 ⇒ Ax = λx

Hence x must be an eigenvector of A with eigenvalue λ.

Multiplying by x^T:

x^T A x = λ x^T x = λ (since x^T x = 1)

Therefore, at each critical point of this constrained problem the objective x^T A x equals an eigenvalue of A.

The maximum value is the largest eigenvalue (attained at the corresponding eigenvector), while the minimum value is the smallest eigenvalue.
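A quick numerical sanity check of this theorem (the symmetric matrix is made up, and the random unit vectors are only for comparison): the value of x^T A x over unit vectors never exceeds the largest eigenvalue and never falls below the smallest one.

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up symmetric matrix, only for illustration.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

eigvals, U = np.linalg.eigh(A)   # eigenvalues in ascending order

# Evaluate x^T A x on many random unit vectors.
X = rng.normal(size=(10000, 2))
X = X / np.linalg.norm(X, axis=1, keepdims=True)
vals = ((X @ A) * X).sum(axis=1)   # row-wise x^T A x

print("smallest eigenvalue:", eigvals[0],  " min over unit vectors:", vals.min())
print("largest  eigenvalue:", eigvals[-1], " max over unit vectors:", vals.max())
print("value at the top eigenvector:", U[:, -1] @ A @ U[:, -1])
```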

28/71

The story so far...

The eigenvectors corresponding to different eigenvalues are linearly independent.

The eigenvectors of a square symmetric matrix are orthogonal.

The eigenvectors of a square symmetric matrix can thus form a convenient basis.

We will put all of this to use.

29/71

Module 6.4 : Principal Component Analysis and its Interpretations


30/71

The story ahead...

Over the next few slides we will introduce Principal Component Analysis and see three different interpretations of it.

31/71

[Figure: a scatter of data points in the x-y plane]

Consider the following data.

Each point (vector) here is represented using a linear combination of the x and y axes (i.e. using the point's x and y co-ordinates).

In other words, we are using x and y as the basis.

What if we choose a different basis?

32/71

[Figure: the same data with a rotated basis u_1, u_2; the points lie almost entirely along u_1]

For example, what if we use u_1 and u_2 as a basis instead of x and y?

We observe that all the points have a very small component in the direction of u_2 (almost noise).

It seems that the same data which was originally in R^2 (x, y) can now be represented in R^1 (u_1) by making a smarter choice for the basis.

33/71

[Figure: the same data with the basis u_1, u_2]

Let's try stating this more formally.

Why do we not care about u_2?

Because the variance in the data in this direction is very small (all data points have almost the same value in the u_2 direction).

If we were to build a classifier on top of this data, then u_2 would not contribute to the classifier as the points are not distinguishable along this direction.

34/71

In general, we are interested in representing the data using fewer dimensions such that the data has high variance along these dimensions.

Is that all? No, there is something else that we desire. Let's see what.

35/71

Consider the following data:

x      y      z
1      1      1
0.5    0      0
0.25   1      1
0.35   1.5    1.5
0.45   1      1
0.57   2      2.1
0.62   1.1    1
0.73   0.75   0.76
0.72   0.86   0.87

ρ_{yz} = \frac{\sum_{i=1}^{n} (y_i − ȳ)(z_i − z̄)}{\sqrt{\sum_{i=1}^{n} (y_i − ȳ)^2} \sqrt{\sum_{i=1}^{n} (z_i − z̄)^2}}

Is z adding any new information beyond what is already contained in y?

The two columns are highly correlated (or they have a high covariance).

In other words, the column z is redundant since it is (almost) linearly dependent on y.
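A short NumPy sketch that computes ρ_yz for the table above (this computation is mine, not from the slides, so treat it only as an illustration):

```python
import numpy as np

# The y and z columns from the table above.
y = np.array([1, 0, 1, 1.5, 1, 2.0, 1.1, 0.75, 0.86])
z = np.array([1, 0, 1, 1.5, 1, 2.1, 1.0, 0.76, 0.87])

yc, zc = y - y.mean(), z - z.mean()
rho = (yc @ zc) / (np.sqrt(yc @ yc) * np.sqrt(zc @ zc))

print(rho)                      # very close to 1: z is (almost) redundant given y
print(np.corrcoef(y, z)[0, 1])  # same value using NumPy's built-in corrcoef
```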

36/71

In general, we are interested in representing the data using fewer dimensions such that

the data has high variance along these dimensions

the dimensions are linearly independent (uncorrelated)

(even better if they are orthogonal, because that is a very convenient basis)

37/71

Let p_1, p_2, · · · , p_n be a set of n such linearly independent orthonormal vectors. Let P be an n × n matrix such that p_1, p_2, · · · , p_n are the columns of P.

Let x_1, x_2, · · · , x_m ∈ R^n be m data points and let X be a matrix such that x_1, x_2, · · · , x_m are the rows of this matrix. Further, let us assume that the data is 0-mean and unit variance.

We want to represent each x_i using this new basis P:

x_i = α_{i1} p_1 + α_{i2} p_2 + α_{i3} p_3 + · · · + α_{in} p_n

For an orthonormal basis we know that we can find these α_{ij}'s using

α_{ij} = x_i^T p_j = \begin{bmatrix} \leftarrow & x_i & \rightarrow \end{bmatrix} \begin{bmatrix} \uparrow \\ p_j \\ \downarrow \end{bmatrix}

38/71

In general, the transformed data point x̂_i is given by

x̂_i = \begin{bmatrix} \leftarrow & x_i^T & \rightarrow \end{bmatrix} \begin{bmatrix} \uparrow & & \uparrow \\ p_1 & \dots & p_n \\ \downarrow & & \downarrow \end{bmatrix} = x_i^T P

and

X̂ = XP (X̂ is the matrix of transformed points)

39/71

Theorem: If X is a matrix such that its columns have zero mean and if X̂ = XP, then the columns of X̂ will also have zero mean.

Proof: For any matrix A, 1^T A gives us a row vector whose i-th element contains the sum of the i-th column of A (this is easy to see using the row-column picture of matrix multiplication). Consider

1^T X̂ = 1^T X P = (1^T X) P

But 1^T X is the row vector containing the sums of the columns of X. Thus 1^T X = 0. Therefore, 1^T X̂ = 0. Hence the transformed matrix also has columns with sum = 0.

Theorem: X^T X is a symmetric matrix.

Proof: We can write (X^T X)^T = X^T (X^T)^T = X^T X.
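A tiny NumPy check of the first theorem (the data matrix and the orthonormal P below are randomly generated, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=(100, 3))
X = X - X.mean(axis=0)                         # make the columns zero mean

P, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # a random orthonormal basis (columns of P)

X_hat = X @ P
print(np.allclose(np.ones(100) @ X_hat, 0))    # 1^T X_hat = 0
print(np.allclose(X_hat.mean(axis=0), 0))      # columns of X_hat are zero mean
```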

40/71

Definition: If X is a matrix whose columns are zero mean, then Σ = (1/m) X^T X is the covariance matrix. In other words, each entry Σ_{ij} stores the covariance between columns i and j of X.

Explanation: Let C be the covariance matrix of X. Let μ_i, μ_j denote the means of the i-th and j-th columns of X respectively. Then by the definition of covariance, we can write:

C_{ij} = \frac{1}{m} \sum_{k=1}^{m} (X_{ki} − μ_i)(X_{kj} − μ_j)

= \frac{1}{m} \sum_{k=1}^{m} X_{ki} X_{kj} (∵ μ_i = μ_j = 0)

= \frac{1}{m} X_i^T X_j = \frac{1}{m} (X^T X)_{ij}
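A short NumPy sketch of this definition (random zero-mean data, made up for illustration). Note that np.cov divides by m - 1 by default, so bias=True is passed to match the 1/m convention used here:

```python
import numpy as np

rng = np.random.default_rng(0)

m = 500
X = rng.normal(size=(m, 3))
X = X - X.mean(axis=0)          # zero-mean columns

Sigma = (X.T @ X) / m           # covariance matrix as defined above

# np.cov treats rows as observations when rowvar=False; bias=True uses 1/m.
print(np.allclose(Sigma, np.cov(X, rowvar=False, bias=True)))
```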

41/71

X̂ = XP

Using the previous theorem & definition, we get 1mX̂

T X̂ is the covariance matrix ofthe transformed data. We can write :

1

mX̂T X̂ =

1

m(XP )

TXP =

1

mPTXTXP = PT

(1

mXTX

)P = PT ΣP

Each cell i, j of the covariance matrix 1mX̂

T X̂ stores the covariance between columns

i and j of X̂.

Ideally we want, (1

mX̂T X̂

)ij

= 0 i 6= j ( covariance = 0)(1

mX̂T X̂

)ij

6= 0 i = j ( variance 6= 0)

In other words, we want1

mX̂T X̂ = PT ΣP = D [ where D is a diagonal matrix ]


We want, P^T Σ P = D

But Σ is a square matrix and P is an orthogonal matrix

Which orthogonal matrix satisfies the following condition?

P^T Σ P = D

In other words, which orthogonal matrix P diagonalizes Σ?

Answer: A matrix P whose columns are the eigen vectors of Σ = (1/m) X^T X (equivalently, the eigen vectors of X^T X, since the 1/m scaling does not change them) [by Eigen Value Decomposition]

Thus, the new basis P used to transform X is the basis consisting of the eigen vectors of X^T X
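A minimal sketch of this step. np.linalg.eigh is the right routine here because Σ is symmetric; it returns the eigen values in ascending order and the corresponding orthonormal eigen vectors as columns (the data below is random and purely illustrative).

```python
import numpy as np

m, n = 500, 4
X = np.random.randn(m, n)
X = X - X.mean(axis=0)
Sigma = (X.T @ X) / m

eigvals, P = np.linalg.eigh(Sigma)         # columns of P are eigen vectors of Sigma
D = P.T @ Sigma @ P                        # should be (numerically) diagonal

print(np.allclose(D, np.diag(eigvals)))    # True
print(np.allclose(P.T @ P, np.eye(n)))     # True: P is orthogonal
```

The eigen vectors of Σ and of X^T X coincide; only the eigen values differ by the factor 1/m.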


Why is this a good basis?

Because the eigen vectors of X^T X are linearly independent (proof: Slide 19, Theorem 1)

And because the eigen vectors of X^T X are orthogonal (∵ X^T X is symmetric - saw proof earlier)

This method is called Principal Component Analysis: it transforms the data to a new basis where the dimensions are non-redundant (low covariance) and not noisy (high variance)

In practice, we select only the top-k dimensions along which the variance is high (this will become clearer when we look at an alternate interpretation of PCA)
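Putting the pieces so far together, a minimal end-to-end sketch of PCA as described above (centre the data, eigendecompose X^T X, keep the top-k eigen vectors). The random data, the choice k = 3 and the helper name pca are illustrative assumptions.

```python
import numpy as np

def pca(X, k):
    """Project the data onto the top-k eigen vectors of X^T X (columns zero-meaned)."""
    X = X - X.mean(axis=0)                      # zero-mean columns
    eigvals, eigvecs = np.linalg.eigh(X.T @ X)
    order = np.argsort(eigvals)[::-1]           # largest eigen values first
    P_k = eigvecs[:, order[:k]]                 # top-k eigen vectors as columns
    return X @ P_k, P_k                         # transformed data and the basis

X = np.random.randn(1000, 10)
X_hat, P_k = pca(X, k=3)

# The retained dimensions are uncorrelated: their covariance matrix is diagonal
C_hat = (X_hat.T @ X_hat) / len(X_hat)
print(np.allclose(C_hat, np.diag(np.diag(C_hat))))   # True
```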


Module 6.5 : PCA : Interpretation 2


Given n orthogonal, linearly independent vectors P = p_1, p_2, · · · , p_n we can represent x_i exactly as a linear combination of these vectors.

x_i = \sum_{j=1}^{n} \alpha_{ij} p_j    [we know how to estimate the \alpha_{ij}'s but we will come back to that later]

But we are interested only in the top-k dimensions (we want to get rid of noisy & redundant dimensions)

\hat{x}_i = \sum_{j=1}^{k} \alpha_{ij} p_j

We want to select the p_i's such that we minimise the reconstruction error

e = \sum_{i=1}^{m} (x_i - \hat{x}_i)^T (x_i - \hat{x}_i)
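For an orthonormal basis the coefficients the slide defers are simply α_{ij} = x_i^T p_j. A small sketch (the basis here is an arbitrary orthonormal one generated via QR, just for illustration):

```python
import numpy as np

n, k = 5, 2
P = np.linalg.qr(np.random.randn(n, n))[0]   # p_1, ..., p_n as orthonormal columns
x = np.random.randn(n)

alpha = P.T @ x                              # alpha_j = x^T p_j
print(np.allclose(P @ alpha, x))             # exact reconstruction using all n vectors

x_hat = P[:, :k] @ alpha[:k]                 # keep only the first k terms
print(np.sum((x - x_hat) ** 2))              # the (squared) reconstruction error
```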


e = \sum_{i=1}^{m} (x_i - \hat{x}_i)^T (x_i - \hat{x}_i)

  = \sum_{i=1}^{m} \Big( \sum_{j=1}^{n} \alpha_{ij} p_j - \sum_{j=1}^{k} \alpha_{ij} p_j \Big)^2

  = \sum_{i=1}^{m} \Big( \sum_{j=k+1}^{n} \alpha_{ij} p_j \Big)^2

  = \sum_{i=1}^{m} \Big( \sum_{j=k+1}^{n} \alpha_{ij} p_j \Big)^T \Big( \sum_{j=k+1}^{n} \alpha_{ij} p_j \Big)

  = \sum_{i=1}^{m} (\alpha_{i,k+1} p_{k+1} + \alpha_{i,k+2} p_{k+2} + \dots + \alpha_{i,n} p_n)^T (\alpha_{i,k+1} p_{k+1} + \alpha_{i,k+2} p_{k+2} + \dots + \alpha_{i,n} p_n)

  = \sum_{i=1}^{m} \sum_{j=k+1}^{n} \alpha_{ij} p_j^T p_j \alpha_{ij} + \sum_{i=1}^{m} \sum_{j=k+1}^{n} \sum_{L=k+1, L \neq j}^{n} \alpha_{ij} p_j^T p_L \alpha_{iL}

  = \sum_{i=1}^{m} \sum_{j=k+1}^{n} \alpha_{ij}^2    (\because p_j^T p_j = 1,\; p_i^T p_j = 0 \;\forall\, i \neq j)

  = \sum_{i=1}^{m} \sum_{j=k+1}^{n} (x_i^T p_j)^2    [using \alpha_{ij} = x_i^T p_j for an orthonormal basis]

  = \sum_{i=1}^{m} \sum_{j=k+1}^{n} (p_j^T x_i)(x_i^T p_j)

  = \sum_{j=k+1}^{n} p_j^T \Big( \sum_{i=1}^{m} x_i x_i^T \Big) p_j

  = \sum_{j=k+1}^{n} p_j^T \, mC \, p_j    \Big[ \because \frac{1}{m} \sum_{i=1}^{m} x_i x_i^T = \frac{X^T X}{m} = C \Big]
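The closed form at the end of the derivation, e = \sum_{j=k+1}^{n} p_j^T (mC) p_j, can be checked numerically for any orthonormal basis (a minimal sketch with random illustrative data):

```python
import numpy as np

m, n, k = 200, 6, 3
X = np.random.randn(m, n)
X = X - X.mean(axis=0)
C = (X.T @ X) / m

P = np.linalg.qr(np.random.randn(n, n))[0]    # any orthonormal basis p_1, ..., p_n

alpha = X @ P                                 # alpha[i, j] = x_i^T p_j
X_rec = alpha[:, :k] @ P[:, :k].T             # reconstruct each x_i from the first k vectors
e_direct = np.sum((X - X_rec) ** 2)

e_formula = sum(P[:, j] @ (m * C) @ P[:, j] for j in range(k, n))
print(np.allclose(e_direct, e_formula))       # True
```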


We want to minimize e

\min_{p_{k+1}, p_{k+2}, \cdots, p_n} \; \sum_{j=k+1}^{n} p_j^T \, mC \, p_j    s.t.    p_j^T p_j = 1    \forall j = k+1, k+2, \cdots, n

The solution to the above problem is given by the eigen vectors corresponding to the smallest eigen values of C (Proof: refer Slide 26).

Thus we select P = p_1, p_2, · · · , p_n as the eigen vectors of C and retain only the top-k eigen vectors to express the data [or discard the eigen vectors k+1, · · · , n]
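A small sketch of the selection rule. When the discarded p_j are the eigen vectors with the smallest eigen values, the reconstruction error above collapses to m times the sum of the discarded eigen values (random illustrative data; the columns are scaled so the variances differ):

```python
import numpy as np

m, n, k = 500, 8, 3
X = np.random.randn(m, n) * np.arange(1, n + 1)   # columns with unequal variances
X = X - X.mean(axis=0)
C = (X.T @ X) / m

eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]                 # largest eigen values first
P, lam = eigvecs[:, order], eigvals[order]

alpha = X @ P
X_rec = alpha[:, :k] @ P[:, :k].T                 # keep only the top-k directions
e = np.sum((X - X_rec) ** 2)

print(np.allclose(e, m * lam[k:].sum()))          # error = m * (sum of discarded eigen values)
```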


Key Idea

Minimize the error in reconstructing x_i after projecting the data onto a new basis.


Let’s look at the ‘Reconstruction Error’ in the context of our toy example


[Figure: the point (3.3, 3) plotted in the x-y plane along with the new basis directions u_1 and u_2]

u_1 = [1, 1] and u_2 = [−1, 1] are the new basis vectors

Let us convert them to unit vectors: u_1 = [1/√2, 1/√2] and u_2 = [−1/√2, 1/√2]

Consider the point x = [3.3, 3] in the original data

α_1 = x^T u_1 = 6.3/√2
α_2 = x^T u_2 = −0.3/√2

The perfect reconstruction of x is given by (using n = 2 dimensions)

x = α_1 u_1 + α_2 u_2 = [3.3  3]

But we are going to reconstruct it using fewer dimensions (only k = 1 < n, ignoring the low-variance u_2 dimension)

x̂ = α_1 u_1 = [3.15  3.15]    (reconstruction with minimum error)
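These numbers can be reproduced in a couple of lines (a direct check of the toy example):

```python
import numpy as np

u1 = np.array([1.0, 1.0]) / np.sqrt(2)
u2 = np.array([-1.0, 1.0]) / np.sqrt(2)
x = np.array([3.3, 3.0])

a1, a2 = x @ u1, x @ u2
print(a1 * np.sqrt(2), a2 * np.sqrt(2))   # 6.3 and -0.3, as on the slide

print(a1 * u1 + a2 * u2)                  # [3.3, 3.0]   exact reconstruction (k = n = 2)
print(a1 * u1)                            # [3.15, 3.15] reconstruction with k = 1
```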


Recap

The eigen vectors of a matrix with distinct eigen values are linearly independent

The eigen vectors of a square symmetric matrix are orthogonal

PCA exploits this fact by representing the data using a new basis comprising only the top-k eigen vectors

The n − k dimensions which contribute very little to the reconstruction error are discarded

These are also the directions along which the variance is minimum


Module 6.6 : PCA : Interpretation 3


We started off with the following wishlist

We are interested in representing the data using fewer dimensions such that

the dimensions have low covariance

the dimensions have high variance

So far we have paid a lot of attention to the covariance

It has indeed played a central role in all our analysis

But what about variance? Have we achieved our stated goal of high variance along dimensions?

To answer this question we will see yet another interpretation of PCA


The i-th dimension of the transformed data X̂ is given by

\hat{X}_i = X p_i

The variance along this dimension is given by

\frac{\hat{X}_i^T \hat{X}_i}{m} = \frac{1}{m} p_i^T X^T X p_i

 = \frac{1}{m} p_i^T \lambda_i p_i    [\because p_i is an eigen vector of X^T X]

 = \frac{1}{m} \lambda_i \, p_i^T p_i = \frac{\lambda_i}{m}    [\because p_i^T p_i = 1]

Thus the variance along the i-th dimension (the i-th eigen vector of X^T X) is given by the corresponding (scaled) eigen value.

Hence, we did the right thing by discarding the dimensions (eigen vectors) corresponding to lower eigen values!
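This too is a one-line check: the per-dimension variances of X̂ = XP match the eigen values of X^T X divided by m (random illustrative data):

```python
import numpy as np

m, n = 1000, 5
X = np.random.randn(m, n) * np.arange(1, n + 1)   # columns with unequal variances
X = X - X.mean(axis=0)

eigvals, P = np.linalg.eigh(X.T @ X)              # eigen vectors of X^T X
X_hat = X @ P

var_along_dims = np.sum(X_hat ** 2, axis=0) / m   # variance of each new dimension
print(np.allclose(var_along_dims, eigvals / m))   # True: variance_i = lambda_i / m
```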


A Quick Summary

We have seen 3 different interpretations of PCA

It ensures that the covariance between the new dimensions is minimized

It picks dimensions such that the data exhibits a high variance across these dimensions

It ensures that the data can be represented using a smaller number of dimensions


Module 6.7 : PCA : Practical Example


Suppose we are given a large number of images of human faces (say, m images)

Each image is 100 × 100 [10K dimensions]

We would like to represent and store the images using much fewer dimensions (around 50-200)

We construct a matrix X ∈ R^{m×10K}

Each row of the matrix corresponds to one image

Each image is represented using 10K dimensions


X ∈ R^{m×10K} (as explained on the previous slide)

We retain the top 100 dimensions corresponding to the top 100 eigen vectors of X^T X

Note that X^T X is an n × n matrix, so its eigen vectors will be n-dimensional (n = 10K in this case)

We can convert each eigen vector into a 100 × 100 matrix and treat it as an image

Let's see what we get

What we have plotted here are the first 16 eigen vectors of X^T X (basically, treating each 10K-dimensional eigen vector as a 100 × 100 image)
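A rough sketch of how these eigenfaces would be computed. Everything here is an illustrative assumption rather than the lecture's actual pipeline: the `faces` array is a random stand-in for the real images, the images are downscaled to 32 × 32 so the eigendecomposition stays cheap (the 100 × 100 case is identical, just slower), and the mean face is subtracted so that the zero-mean assumption used throughout holds.

```python
import numpy as np

m, h, w = 400, 32, 32
faces = np.random.rand(m, h, w)             # stand-in for the real grayscale face images

X = faces.reshape(m, -1)                    # one image per row, as in the slides
mean_face = X.mean(axis=0)
X = X - mean_face                           # zero-mean columns

eigvals, eigvecs = np.linalg.eigh(X.T @ X)  # eigen vectors of X^T X
order = np.argsort(eigvals)[::-1]
k = 100
P_k = eigvecs[:, order[:k]]                 # top-k eigen vectors, the "eigenfaces"

eigenfaces = P_k.T.reshape(k, h, w)         # each eigen vector viewed as an image
alphas = X @ P_k                            # k coefficients stored per image
approx = alphas @ P_k.T + mean_face         # faces reconstructed from the eigenfaces
```

Storing the k eigenfaces once plus k scalars per image is what gives the storage saving discussed on the next slide.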

Prof. Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 6

Page 250: CS7015 (Deep Learning) : Lecture 61/71 CS7015 (Deep Learning) : Lecture 6 Eigen Values, Eigen Vectors, Eigen Value Decomposition, Principal Component Analysis, Singular Value Decomposition

58/71

X ∈ Rm×10K (as explained on the previousslide)

We retain the top 100 dimensionscorresponding to the top 100 eigen vectors ofXTX

Note that XTX is a n× n matrix so its eigenvectors will be n dimensional (n = 10K in thiscase)

We can convert each eigen vector into a 100×100 matrix and treat it as an image

Let’s see what we get

What we have plotted here are the first 16eigen vectors of XTX (basically, treating each10K dimensional eigen vector as a 100 × 100dimensional image)

Prof. Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 6

Page 251: CS7015 (Deep Learning) : Lecture 61/71 CS7015 (Deep Learning) : Lecture 6 Eigen Values, Eigen Vectors, Eigen Value Decomposition, Principal Component Analysis, Singular Value Decomposition

58/71

X ∈ Rm×10K (as explained on the previousslide)

We retain the top 100 dimensionscorresponding to the top 100 eigen vectors ofXTX

Note that XTX is a n× n matrix so its eigenvectors will be n dimensional (n = 10K in thiscase)

We can convert each eigen vector into a 100×100 matrix and treat it as an image

Let’s see what we get

What we have plotted here are the first 16eigen vectors of XTX (basically, treating each10K dimensional eigen vector as a 100 × 100dimensional image)

Prof. Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 6

Page 252: CS7015 (Deep Learning) : Lecture 61/71 CS7015 (Deep Learning) : Lecture 6 Eigen Values, Eigen Vectors, Eigen Value Decomposition, Principal Component Analysis, Singular Value Decomposition

58/71

X ∈ Rm×10K (as explained on the previousslide)

We retain the top 100 dimensionscorresponding to the top 100 eigen vectors ofXTX

Note that XTX is a n× n matrix so its eigenvectors will be n dimensional (n = 10K in thiscase)

We can convert each eigen vector into a 100×100 matrix and treat it as an image

Let’s see what we get

What we have plotted here are the first 16eigen vectors of XTX (basically, treating each10K dimensional eigen vector as a 100 × 100dimensional image)

Prof. Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 6

Page 253: CS7015 (Deep Learning) : Lecture 61/71 CS7015 (Deep Learning) : Lecture 6 Eigen Values, Eigen Vectors, Eigen Value Decomposition, Principal Component Analysis, Singular Value Decomposition

58/71

X ∈ Rm×10K (as explained on the previousslide)

We retain the top 100 dimensionscorresponding to the top 100 eigen vectors ofXTX

Note that XTX is a n× n matrix so its eigenvectors will be n dimensional (n = 10K in thiscase)

We can convert each eigen vector into a 100×100 matrix and treat it as an image

Let’s see what we get

What we have plotted here are the first 16eigen vectors of XTX (basically, treating each10K dimensional eigen vector as a 100 × 100dimensional image)

Prof. Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 6

Page 254: CS7015 (Deep Learning) : Lecture 61/71 CS7015 (Deep Learning) : Lecture 6 Eigen Values, Eigen Vectors, Eigen Value Decomposition, Principal Component Analysis, Singular Value Decomposition

58/71

X ∈ Rm×10K (as explained on the previousslide)

We retain the top 100 dimensionscorresponding to the top 100 eigen vectors ofXTX

Note that XTX is a n× n matrix so its eigenvectors will be n dimensional (n = 10K in thiscase)

We can convert each eigen vector into a 100×100 matrix and treat it as an image

Let’s see what we get

What we have plotted here are the first 16eigen vectors of XTX (basically, treating each10K dimensional eigen vector as a 100 × 100dimensional image)

Prof. Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 6
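As a rough illustration of this step, here is a minimal NumPy sketch (not the lecture's code; the data is a random stand-in for the real face matrix and all sizes are assumed) that extracts the top eigen vectors of X^T X via the SVD of X and reshapes them into 100 × 100 images:

import numpy as np

# Minimal sketch (illustrative): m face images, each flattened to 10K pixels,
# stacked as the rows of X. The data here is random, standing in for real faces.
m, side = 500, 100
X = np.random.rand(m, side * side)

# The rows of Vt are the eigen vectors of X^T X (equivalently, the right
# singular vectors of X); using the SVD of X avoids forming the 10K x 10K matrix.
_, _, Vt = np.linalg.svd(X, full_matrices=False)

top_k = 100                              # keep the top 100 eigen vectors
eigenfaces = Vt[:top_k]                  # each row is a 10K-dimensional eigen vector

# Reshape the first 16 eigen vectors into 100 x 100 images, as plotted on the slide
first_16 = eigenfaces[:16].reshape(16, side, side)
print(first_16.shape)                    # (16, 100, 100)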

59/71

These images are called eigenfaces and form a basis for representing any face in our database

In other words, we can now represent a given image (face) as a linear combination of these eigenfaces

[The slides show image 1 reconstructed as ∑_{i=1}^{k} α_{1i} p_i for k = 1, 2, 4, 8, 12 and 16 eigenfaces]

In practice, we just need to store p_1, p_2, · · · , p_k (one-time storage)

Then for each image i we just need to store the scalar values α_{i1}, α_{i2}, · · · , α_{ik}

This significantly reduces the storage cost without much loss in image quality
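To make the storage argument concrete, here is a small sketch (hypothetical sizes, random stand-in data, not the lecture's code): store the k eigenfaces once, store k coefficients per image, and rebuild each face as a linear combination:

import numpy as np

# Minimal sketch (illustrative): store the eigenfaces p_1 ... p_k once, and for
# each image only its coefficients alpha_{i1} ... alpha_{ik}.
m, n, k = 500, 10_000, 100
X = np.random.rand(m, n)                 # stand-in for the m x 10K face matrix

_, _, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt[:k]                               # k x n : eigenfaces (one-time storage)

alphas = X @ P.T                         # m x k : coefficients of each face in the eigenface basis
reconstruction = alphas @ P              # m x n : each face rebuilt as sum_j alpha_{ij} p_j

# Storage drops from m*n numbers to k*n (eigenfaces) + m*k (coefficients)
print(m * n, k * n + m * k)              # 5000000 vs 1050000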

60/71

Module 6.8 : Singular Value Decomposition

61/71

Let us get some more perspective on eigen vectors before moving ahead

62/71

Let v_1, v_2, · · · , v_n be the eigen vectors of A and let λ_1, λ_2, · · · , λ_n be the corresponding eigen values

Av_1 = λ_1 v_1, Av_2 = λ_2 v_2, · · · , Av_n = λ_n v_n

If a vector x in R^n is represented using v_1, v_2, · · · , v_n as the basis then

x = ∑_{i=1}^{n} α_i v_i

Now, Ax = ∑_{i=1}^{n} α_i A v_i = ∑_{i=1}^{n} α_i λ_i v_i

The matrix multiplication reduces to a scalar multiplication if the eigen vectors of A are used as a basis.
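A quick numeric check of this fact (an illustrative sketch with a toy symmetric matrix, not part of the lecture):

import numpy as np

# Writing x in the eigen basis of A turns Ax into per-coordinate scaling
# by the eigen values.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # symmetric, so its eigen vectors are orthonormal
lam, V = np.linalg.eigh(A)          # columns of V are the eigen vectors v_1, v_2

x = np.array([3.0, -1.0])
alpha = V.T @ x                     # coordinates alpha_i of x in the eigen basis

lhs = A @ x                         # ordinary matrix-vector product
rhs = V @ (lam * alpha)             # sum_i alpha_i * lambda_i * v_i
print(np.allclose(lhs, rhs))        # True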

63/71

So far all the discussion was centered around square matrices (A ∈ R^{n×n})

What about rectangular matrices A ∈ R^{m×n}? Can they have eigen vectors?

Is it possible to have A_{m×n} x_{n×1} = λ x_{n×1}? Not possible!

The result of A_{m×n} x_{n×1} is a vector belonging to R^m (whereas x ∈ R^n)

So do we miss out on the advantage that a basis of eigen vectors provides for square matrices (i.e., converting matrix multiplications into scalar multiplications)?

We will see the answer to this question over the next few slides

64/71

Note that matrix A_{m×n} provides a transformation R^n → R^m

What if we could have pairs of vectors (v_1, u_1), (v_2, u_2), · · · , (v_k, u_k) such that v_i ∈ R^n, u_i ∈ R^m and Av_i = σ_i u_i

Further let's assume that v_1, · · · , v_k, · · · , v_n are orthogonal & thus form a basis V in R^n

Similarly let's assume that u_1, · · · , u_k, · · · , u_m are orthogonal & thus form a basis U in R^m

Now what if every vector x ∈ R^n is represented using the basis V

x = ∑_{i=1}^{k} α_i v_i [note we are using k instead of n; will clarify this in a minute]

Ax = ∑_{i=1}^{k} α_i A v_i = ∑_{i=1}^{k} α_i σ_i u_i

Once again the matrix multiplication reduces to a scalar multiplication
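Such pairs do exist: they are the singular vector pairs of A. The following sketch (random data, illustrative only, not the lecture's code) checks both properties numerically:

import numpy as np

# For a rectangular A, the singular vector pairs (v_i, u_i) satisfy A v_i = sigma_i u_i,
# and Ax becomes a scalar operation when x is written in the v-basis.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))            # an m x n matrix with m = 5, n = 3

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
for i in range(len(sigma)):
    print(np.allclose(A @ Vt[i], sigma[i] * U[:, i]))   # True for every pair

x = rng.standard_normal(3)
alpha = Vt @ x                             # coordinates of x in the basis v_1 ... v_k
print(np.allclose(A @ x, U @ (sigma * alpha)))          # True: Ax = sum_i alpha_i sigma_i u_i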

65/71

Let's look at a geometric interpretation of this

66/71

[Figure: A maps R^n to R^m; the rowspace of A sits inside R^n and the columnspace of A inside R^m, and both have dimension k = rank(A)]

R^n - space of all vectors which can multiply with A to give Ax [this is the space of inputs of the function]

R^m - space of all vectors which are outputs of the function Ax

We are interested in finding bases U, V such that

V - basis for inputs
U - basis for outputs

such that if the inputs and outputs are represented using these bases then the operation Ax reduces to a scalar operation

67/71

What do we mean by saying that the dimension of the rowspace is k? If x ∈ R^n then why is the dimension not n?

It means that of all the possible vectors in R^n, only those lying in a k-dimensional subspace (the rowspace) contribute a non-zero output to Ax. The component of x in the remaining (n − k)-dimensional subspace produces a zero output

Hence we need only k dimensions to represent x (as far as computing Ax is concerned)

x = ∑_{i=1}^{k} α_i v_i
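An illustrative sketch of this point (toy rank-1 matrix, not from the lecture): the nullspace component of x is mapped to zero, so only the k rowspace coordinates of x matter.

import numpy as np

# Only the component of x lying in the rowspace of A affects Ax;
# the nullspace component maps to zero.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])            # rank 1, so k = 1 and n - k = 2
U, sigma, Vt = np.linalg.svd(A)

k = int(np.sum(sigma > 1e-10))             # numerical rank
row_basis, null_basis = Vt[:k], Vt[k:]     # orthonormal bases of rowspace / nullspace

x = np.array([1.0, -1.0, 0.5])
x_row = row_basis.T @ (row_basis @ x)      # projection of x onto the rowspace
x_null = null_basis.T @ (null_basis @ x)   # projection of x onto the nullspace

print(np.allclose(A @ x_null, 0))          # True: the nullspace part contributes nothing
print(np.allclose(A @ x, A @ x_row))       # True: only k coordinates of x matter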

68/71

Let's look at a way of writing this as a matrix operation

Av_1 = σ_1 u_1, Av_2 = σ_2 u_2, · · · , Av_k = σ_k u_k

A_{m×n} V_{n×k} = U_{m×k} Σ_{k×k}    [Σ_{k×k} is a diagonal matrix]

If we have k orthogonal vectors (V_{n×k}) then using Gram-Schmidt orthogonalization we can find n − k more orthogonal vectors to complete the basis for R^n [we can do the same for U]

A_{m×n} V_{n×n} = U_{m×m} Σ_{m×n}

U^T A V = Σ    [U^{-1} = U^T]
A = U Σ V^T    [V^{-1} = V^T]

Σ is a diagonal matrix with only the first k diagonal elements as non-zero

Now the question is how do we find V, U and Σ

69/71

Suppose V, U and Σ exist, then

A^T A = (U Σ V^T)^T (U Σ V^T) = V Σ^T U^T U Σ V^T = V Σ^2 V^T

What does this look like? The eigen value decomposition of A^T A

Similarly we can show that

A A^T = U Σ^2 U^T

Thus the columns of U and V are the eigen vectors of A A^T and A^T A respectively, and Σ^2 = Λ where Λ is the diagonal matrix containing the eigen values of A^T A
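A short numeric confirmation of this relationship (random matrix, illustrative only): the eigen values of A^T A are the squared singular values of A, and its eigen vectors match the right singular vectors up to sign.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

eigvals, eigvecs = np.linalg.eigh(A.T @ A)        # returned in ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

print(np.allclose(np.sqrt(eigvals), sigma))       # sigma_i = sqrt(lambda_i)
# Eigen vectors are determined only up to sign, so compare |<v_i, w_i>| with 1
print(np.allclose(np.abs(np.sum(Vt * eigvecs.T, axis=1)), 1.0))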

70/71

A_{m×n} = U_{m×k} Σ_{k×k} V^T_{k×n} = ∑_{i=1}^{k} σ_i u_i v_i^T

where the columns of U_{m×k} are u_1, · · · , u_k, Σ_{k×k} is the diagonal matrix with entries σ_1, · · · , σ_k, and the rows of V^T_{k×n} are v_1^T, · · · , v_k^T

Theorem: σ_1 u_1 v_1^T is the best rank-1 approximation of the matrix A. ∑_{i=1}^{2} σ_i u_i v_i^T is the best rank-2 approximation of matrix A. In general, ∑_{i=1}^{k} σ_i u_i v_i^T is the best rank-k approximation of matrix A. In other words, over all matrices B of rank k, the solution to min ‖A − B‖²_F is given by B = U_{·,k} Σ_{k,k} V^T_{k,·} (it minimizes the reconstruction error of A)
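An illustrative sketch of the theorem (random matrix, not from the lecture): keeping the top-k terms of the SVD gives a rank-k approximation whose Frobenius error is the square root of the sum of the discarded squared singular values.

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 6))

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

def rank_k_approx(k):
    # sum_{i=1}^{k} sigma_i u_i v_i^T, written as a product of truncated factors
    return U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k]

for k in (1, 2, 4, 6):
    err = np.linalg.norm(A - rank_k_approx(k), 'fro')
    print(k, np.isclose(err, np.sqrt(np.sum(sigma[k:] ** 2))))   # True for each k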

71/71

σ_i = √λ_i = singular value of A

U = left singular matrix of A

V = right singular matrix of A